
Equality and Zero Detection in Carry Lookahead Adders
Joel D. Wigton, Member, IEEE, Brian M. Werst
Abstract: This paper presents a method for performing equality detection on both input sources of an existing adder, and zero detection on one of its input sources, which allows a reduction in circuitry. The method takes advantage of existing circuitry in any carry lookahead adder (CLA), and may add only a few gates off the critical path to support equality detection. Typical implementations of equality and zero detection require dedicated NOR or XNOR/AND trees operating in parallel to the adder. The presented method is desirable in that these detections can be implemented with negligible impact on area, power, and timing, while the aforementioned NOR and AND trees are eliminated completely.
Index Terms: Branch Predicate Logic, Control Design, Carry Lookahead Adders, Computational Logic, Equality Detection, Zero Detection

INTRODUCTION

Most instruction set architectures provide instructions for detecting whether two register entries are equal, as well as for detecting when a register entry equals zero. These are used as qualifiers for branch instructions, whether directly on a Reduced Instruction Set Computer (RISC) machine or indirectly in an architecture that supports instruction predication.
In this paper, we outline a simple method to perform zero detection or equality detection on hardware that exists in nearly all modern processors: the carry lookahead adder (CLA) [1]. The methods presented here differ from other methods in that they do not require large additional tree circuitry, and they work on fundamental principles of binary arithmetic.

BACKGROUND
A typical RISC architecture [2] might define:

    beq  $rs, $rt, LABEL;   # branch if $rs == $rt

and

    beqz $rs, LABEL;        # branch if $rs == 0

The most straightforward way to detect the beq case would be to XNOR each bit of $rs with the corresponding bit of $rt; each XNOR output is 1 if the bits are equal. A large AND tree following the XNORs then determines whether the entire register values are equal.
Although a simplified Instruction Set Architecture (ISA) could alias the beqz instruction to beq with one register hardcoded to 0, many ISAs additionally offer an explicit instruction in which specialized hardware performs the comparison. In this case, the most straightforward approach would be to construct a large NOR tree across all bits of register $rs, so that the final output is high only if all bits are 0.
Detecting when registers are equal, or equal to zero, also comes up in instruction predication. Instruction predication is a technique to turn control dependencies into data dependencies, and is used in both the IA-64 and ARM architectures [3], [4]. In simplified terms, architectures that support predication execute the instructions on both paths of a conditional branch and keep only the results of the path whose predicate is true. The arithmetic units that set the predicate values typically support equality and zero comparisons.
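For reference, the two conventional detectors described above can be modeled behaviorally. The sketch below is written in Python rather than a hardware description language. It assumes an 8-bit register width (N_BITS), and the helper names are hypothetical. The AND and NOR trees are collapsed into flat reductions, so the sketch captures the logic function but not the tree depth.

```python
from functools import reduce
from operator import and_, or_

N_BITS = 8  # assumed register width, for illustration only


def bits(x, n=N_BITS):
    """Return the n low-order bits of x, least significant bit first."""
    return [(x >> i) & 1 for i in range(n)]


def xnor_and_isequal(rs, rt, n=N_BITS):
    """beq-style check: XNOR each bit pair, then AND all XNOR outputs."""
    xnors = [1 - (a ^ b) for a, b in zip(bits(rs, n), bits(rt, n))]
    return reduce(and_, xnors, 1)  # an AND tree in hardware; a flat reduction here


def nor_tree_iszero(rs, n=N_BITS):
    """beqz-style check: NOR of all bits, high only if every bit is 0."""
    return 1 - reduce(or_, bits(rs, n), 0)


print(xnor_and_isequal(0b100101, 0b100101))  # 1
print(xnor_and_isequal(0b100101, 0b100100))  # 0
print(nor_tree_iszero(0))                    # 1
```

These two trees are exactly the circuitry that the method in this paper aims to eliminate.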

J.D. Wigton is with Intel Corp., Fort Collins, CO.
E-mail: joel.d.wigton@intel.com
B.M. Werst is with Intel Corp., Fort Collins, CO.
E-mail: brian.werst@intel.com
Manuscript received August 3, 2009.

PRINCIPLE OF OPERATION
a. Equality Detection
Consider an n-bit CLA with input sources A (src0) and B (src1). We will refer to input source An:A0 simply as A, and similarly for B. We can detect whether the input sources are equal using the following two steps:
1. Invert one of the input sources (B for illustration).
2. Set carryin = 0.

The idea behind this is as follows. Suppose that the two sources A and B are indeed equal. With carryin set low, the adder computes A + ~B. At every bit position the two adder inputs are therefore opposite, because their non-inverted values are equal. Because the inputs are opposite, the propagate signal at every bit position is asserted (see Fig. 1). But because carryin is low, there is nothing to propagate, hence carryout will be low. (Recall that in CLAs, Gi = Ai · Bi and Pi = Ai + Bi. Since the two adder inputs at each bit are complements, every Gi is 0, so we do not need to consider generate bits.)

    100101 (A)
  + 011010 (~B)
  --------
    111111 (P)

Fig. 1. Equal case. Adder inputs and bit propagate signals shown.

However, we cannot simply say that the sources are equal if there is no carry-out. For example, we run into difficulty when the two inputs are not equal and we get two zeroes in the same bit position (see Fig. 2). Such a position also kills any possible carry. In this case carryout is low because not all of the propagate bits are set, not because carryin is low. So we cannot conclude that A and B are equal just because there is no carry-out.

    11101 (A)
  + 00000 (~B)
  -------
    11101 (P)

Fig. 2. Case where numbers are not equal, but carryout is still low.
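Both figures can be reproduced with a short bit-level model. The sketch below is in Python, and the helper names pg and carry_out are hypothetical. It computes the OR-based propagate and generate vectors and the carry-out for adder inputs A and ~B with carryin = 0. The output shows that carryout is low in both the equal case (Fig. 1) and the unequal case (Fig. 2). The carry is rippled for readability; a real CLA evaluates the same recurrence with a lookahead tree.

```python
def pg(a, b_adder, n):
    """OR-based bit propagates P_i = A_i + B_i and generates G_i = A_i . B_i,
    packed as n-bit integers, for adder inputs a and b_adder."""
    mask = (1 << n) - 1
    return (a | b_adder) & mask, (a & b_adder) & mask


def carry_out(a, b_adder, cin, n):
    """Carry-out via the recurrence c_{i+1} = G_i + P_i . c_i (rippled here)."""
    p, g = pg(a, b_adder, n)
    c = cin
    for i in range(n):
        c = ((g >> i) & 1) | (((p >> i) & 1) & c)
    return c


# Fig. 1: A = 100101, ~B = 011010 (A equal to B)
print(format(pg(0b100101, 0b011010, 6)[0], "06b"), carry_out(0b100101, 0b011010, 0, 6))  # 111111 0
# Fig. 2: A = 11101, ~B = 00000 (A not equal to B)
print(format(pg(0b11101, 0b00000, 5)[0], "05b"), carry_out(0b11101, 0b00000, 0, 5))      # 11101 0
```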

To resolve this, we notice that, unlike the first case, the propagate signals are no longer high at every bit. What we need is a way to know whether all of the propagate signals are high. Conveniently, propagate signals are already generated for each bit position as part of a standard CLA implementation, and many of these signals are already combined into group propagates. The propagate tree only needs to be completed in order to use the full group propagate to qualify the carryout signal and generate the isequal expression. This is often as simple as ANDing the top four or so (design-dependent) bit propagates into the group propagate.
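The propagate tree can be sketched as repeated AND reductions. This sketch is in Python, and the function propagate_tree is hypothetical. It assumes radix-4 AND gates, which is an illustrative choice; real carry trees differ in radix and structure, and most lower levels already exist in the CLA as group propagates. The point it shows is that the full group propagate P is the AND of every bit propagate, however the tree is partitioned.

```python
def propagate_tree(p_bits, radix=4):
    """Reduce bit propagates with radix-input AND gates, one tree level at a
    time, until a single group propagate remains. Returns (P, levels)."""
    if not p_bits:
        raise ValueError("need at least one propagate bit")
    level = [int(p) for p in p_bits]
    levels = 0
    while len(level) > 1:
        level = [int(all(level[i:i + radix])) for i in range(0, len(level), radix)]
        levels += 1
    return level[0], levels


# 64 bit propagates: 64 -> 16 -> 4 -> 1 takes 3 levels of 4-input ANDs.
p = [1] * 64
print(propagate_tree(p))  # (1, 3)
p[60] = 0
print(propagate_tree(p))  # (0, 3)
```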
Therefore, two conditions are required to show that the inputs are equal: the entire propagate tree is high and the carryout is low. We see from Table 1 (where the group propagate P63:0 is written simply as P for brevity) that the expression is:

    isequal = ~carryout · P

Propagation Tree   Carry Out      State
P = 0              carryout = 0   Not Equal
P = 0              carryout = 1   Not Equal
P = 1              carryout = 0   Equal
P = 1              carryout = 1   Not Equal

Table 1. Definitions for four possible states.

b. Zero Detection
Zero detection is simpler than equality detection. Using the same CLA, we have defined our iszero instruction to look for all zeroes on src1 (B). When we configure the input controls in a particular way, we shall see that iszero = carryout. This result is obtained without any large NOR tree. The required control-logic conditions are three-fold:

1. Set src0 to all zeroes.
2. Set src1 to ~B.
3. Set carryin to 1.

An example is shown in Fig. 3.

Fig. 3. Zero detection.

The only way to get a carry-out of the adder is if B = 0. In the case where B is zero, ~B is all 1s, so the carry-in propagates all the way to the carry-out. When B ≠ 0, at least one bit position in ~B will be a 0 and will prevent the propagation of the carry. In other words, 1s in B absorb the carry and carryout will be zero. Hence iszero = carryout.
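The three control settings can be checked with a behavioral sketch in Python. The function name cla_iszero is hypothetical, and the carry is rippled bit by bit for clarity; a CLA produces the same carry-out through its lookahead tree. With src0 = 0, every Gi is 0 and Pi = ~Bi, so the carry-in survives only if every bit of ~B is 1.

```python
def cla_iszero(b, n):
    """iszero on src1 via the adder carry-out, with src0 = 0, src1 = ~B and
    carryin = 1. Carries are rippled here for clarity."""
    mask = (1 << n) - 1
    src0, src1, c = 0, (~b) & mask, 1
    for i in range(n):
        a_i, b_i = (src0 >> i) & 1, (src1 >> i) & 1
        g_i, p_i = a_i & b_i, a_i | b_i  # with src0 = 0: G_i = 0 and P_i = ~B_i
        c = g_i | (p_i & c)
    return c


print(cla_iszero(0b00000000, 8))  # 1: ~B is all ones, so carryin reaches carryout
print(cla_iszero(0b00010000, 8))  # 0: the 1 in B absorbs the carry
```

Equivalently, carryout is bit n of the integer sum 0 + ~B + 1, which equals 2^n only when ~B is all ones.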

PROOF OF OPERATION
The isequal operation can be proven by straightforward logical deduction. First we examine the case when the two input sources are equal. Given A, ~B, and carryin = 0 as inputs to the CLA, if A = B then the generate Gi at every bit position will be 0 and the propagate Pi at every bit position will be 1. Thus the propagate tree P0 · P1 · … · Pn = 1. With carryin = 0 and no carry generated at any bit position (every Gi = 0), the carryout will be 0. The propagate tree P = 1 and carryout = 0 are the two conditions from Table 1 that are needed to detect equality.
    100101 (A)
  + 011010 (~B)
  --------
    111111 (P)

    100101 (A)
  + 011010 (~B)
  --------
    000000 (G)

Fig. 4. Propagates and generates for equal case.

Next we examine the case when the two input sources are not equal. For any bit position where the sources differ, there are only two possibilities after source B is inverted: both adder inputs are 0, or both are 1. If both adder inputs are 0 for a given bit position, then the propagate for that bit position will be 0, and in turn the propagate tree will also be 0. This is sufficient to detect the inequality regardless of the values of the other bits or of the carryout.
For the case where both adder inputs at a given bit position equal 1, both the generate and propagate bits for that position will be 1. If all of the more significant (downstream) bits produce a propagate bit equal to 1, then the generate bit from that position will manifest as a carryout, and the inequality will be detected as given in Table 1. The only other possibility is for another downstream bit to be unequal with both adder inputs 0. That bit then has a propagate of 0, and the propagate tree is therefore 0. This would be detected as an inequality regardless of the carryout value, because the propagate tree is zeroed out.
In addition to outlining each case above to demonstrate the logical correctness of the isequal operation, we have also run a brute-force simulation on a Verilog model of an 8-bit adder to demonstrate correct operation. For every possible value of A, we looped through every possible value of B and compared the output with the expected answer. There is nothing unique about an 8-bit adder that prevents us from extending the principle to an n-bit adder.
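The authors' brute-force check used a Verilog model. The sketch below is an analogous exhaustive check written in Python, and the function name cla_isequal is hypothetical. It works at the behavioral level: carryout is taken from the integer sum A + ~B + 0 rather than from a gate-level CLA. It therefore verifies the expression isequal = ~carryout · P, not any particular adder netlist.

```python
N = 8  # same width as the 8-bit brute-force simulation described above


def cla_isequal(a, b, n=N):
    """isequal = ~carryout . P for adder inputs A and ~B with carryin = 0."""
    mask = (1 << n) - 1
    b_adder = (~b) & mask              # step 1: invert source B
    p = (a | b_adder) & mask           # OR-based bit propagates
    group_p = int(p == mask)           # full propagate tree: AND of every P_i
    carryout = (a + b_adder + 0) >> n  # step 2: carryin = 0
    return (1 - carryout) & group_p


mismatches = [(a, b)
              for a in range(1 << N)
              for b in range(1 << N)
              if cla_isequal(a, b) != int(a == b)]
print(len(mismatches))  # 0
```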
The iszero operation is simple enough that it is proved above by observation. It does not require the number system to be two's complement, although this is common. Two's complement may be a requirement for the CLA itself to perform subtraction, but it is not needed for zero detection.

COMPARISON TO OTHER METHODS


a. Equality Detection
As previously mentioned, the standard implementation of equality detection is typically an XNOR between the Ai and Bi inputs to show equality at each bit position, followed by an n-input AND tree to give the final answer. For 2-input gates, the AND tree requires log2(n) stages of logic, plus one more stage for the XNORs, for a total of log2(n) + 1 gate delays. In terms of gate count, the traditional method requires n - 1 gates for the AND tree plus n XNOR gates, leading to a total of 2n - 1 gates to perform equality detection.
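For the 64-bit case discussed in this section, the stage and gate counts quoted above work out as follows. This is a worked instance of the formulas with 2-input gates, not a measured result.

```latex
\begin{aligned}
\text{stages} &= \log_2 n + 1 = \log_2 64 + 1 = 7,\\
\text{gates}  &= \underbrace{(n - 1)}_{\text{AND tree}} + \underbrace{n}_{\text{XNORs}} = 2n - 1 = 127, \qquad n = 64.
\end{aligned}
```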

Fig. 5. State of the art for equality detection: XNOR then AND tree speedpath for 64 bits.

In contrast, our method allows almost all of the circuitry we need to come from the already existing CLA. For a CLA implementation using conditional-sum outputs, the only propagate terms not already included in the naturally occurring AND tree are the high-order bits above the last conditional-sum carry. Depending on how the carry tree was constructed, some additional ANDing may be required to complete the propagate tree. These bits can be ANDed separately and off the critical path. The number of gates depends on the size of the conditional-sum blocks, but in general only a few extra gates are needed. Finally, the isequal = ~carryout · Pn:0 logic can be generated using a single two-input gate, for an additional gate delay. The isequal signal can also be generated in parallel to the final carry using a complex AOI gate.
The advantage of our method is that it allows us to remove the XNOR and AND tree, saving 2n - 1 gates and thus reducing area and leakage power, while reducing the load on src0 and src1. We add a handful of gates off the critical path to enable this. For even moderate-sized adders this is a large saving. For example, the isequal signal can be generated for a 64-bit input by adding as few as four 2-input NAND gates and two 2-input NOR gates, instead of the 127 gates of the traditional tree structure.
One advantage of our design is that it works with both CLA implementations, OR-based propagate and XOR-based propagate, though almost all high-speed implementations use an OR-propagate definition. Interestingly, for the XOR-propagate implementation, inverting input B makes the group propagate signal compute isequal directly, so neither the carryin setting nor the carryout qualification is needed.
b. Zero Detection
A typical implementation of zero detection is a large fan-in NOR gate, which is commonly implemented as a multi-stage tree. As with equality detection, an n-bit NOR tree requires log2(n) + 1 stages of logic and a total of n gates. This NOR tree is connected to the src1 input and can be considered a separate side-path circuit from the rest of the CLA.
The advantage of our method is that it uses the already existing adder to get the same functionality, with minimal additional circuitry. This allows us to remove the NOR tree, saving area and leakage power. It also reduces the CLA source loading and is comparable in speed to the NOR tree. Most high-speed CLAs take log2(n) stages to compute the carry for the highest conditional-sum block. We need one more gate to generate carryout, which brings the total to the same log2(n) + 1 stages of logic as typically taken by the NOR tree, making timing comparable.

Fig. 6. State of the art for zero detection: NOR tree speedpath for 64 bits.
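The XOR-propagate remark in part (a) above can be illustrated with a small behavioral sketch in Python. The function name group_propagate and its xor_propagate flag are hypothetical. The example uses A = 11 and B = 01, an unequal pair for which the OR-based group propagate is still high. By contrast, the XOR-based group propagate is high exactly when A = B, because Ai XOR ~Bi = XNOR(Ai, Bi).

```python
def group_propagate(a, b, n, xor_propagate):
    """Group propagate (AND of all P_i) for adder inputs A and ~B, using either
    XOR-based (P_i = A_i XOR ~B_i) or OR-based (P_i = A_i + ~B_i) propagates."""
    mask = (1 << n) - 1
    b_adder = (~b) & mask
    p = (a ^ b_adder) if xor_propagate else (a | b_adder)
    return int((p & mask) == mask)


# A = 11, B = 01: the OR-based group propagate is high even though A != B,
# so it must be qualified by carryout; the XOR-based one needs no qualification.
print(group_propagate(0b11, 0b01, 2, xor_propagate=False))  # 1
print(group_propagate(0b11, 0b01, 2, xor_propagate=True))   # 0
```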

CONCLUSION
The method presented in this paper allows us to use
less area, generate less leakage power, and reduce CLA
input loading, which leads to a more efficient overall design that does not compromise timing. In addition, we
get these benefits practically for free, by configuring
control logic properly and adding only a handful of gates.

Operation   Gates (trees)   Gates (CLA)
isequal     2n - 1          ~6
iszero      n               None extra

Table 2. Comparison to typical (AND/NOR tree) solution.

REFERENCES
[1] A. Weinberger and J.L. Smith, "A Logic for High-Speed Addition," National Bureau of Standards Circular 591, pp. 3-12, 1958.
[2] D.A. Patterson and J.L. Hennessy, Computer Organization and Design: The Hardware/Software Interface, 3rd ed., San Francisco, CA: Morgan Kaufmann, 2005.
[3] H. Sharangpani and K. Arora, "Itanium Processor Microarchitecture," IEEE Micro, vol. 20, no. 5, pp. 24-43, Sep.-Oct. 2000.
[4] D. Seal, ARM Architecture Reference Manual, 2nd ed., Reading, MA: Addison-Wesley, 2000.

Joel D. Wigton (M'08) received his Bachelor of Science in electrical engineering from the University of Michigan in 2005 and his Master of Science in electrical engineering from the University of Michigan in 2007. He has been a Design Engineer at Intel, Fort Collins, CO, since 2007.
Brian M. Werst received his Bachelor of Science in electrical engineering from Wright State University in 1996 and his Master of Science in electrical engineering from Wright State University in 1998. He has been a Design Engineer at Intel, Fort Collins, CO, since 2005. Before that, he was a Design Engineer at Hewlett-Packard, Fort Collins, CO, for 6 years.
