# A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products

Sabyasachi Das, University of Colorado, Boulder

Sunil P. Khatri, Texas A&M University

What is a Sum-of-Product (SOP)
• An arithmetic Sum-of-Product block (SOP) consists of an arbitrary number of product terms and sum terms.
• General form of SOP: z = p + q + e + f, where p = a * b and q = c * d
Examples of SOP Blocks
• Multiplier {assign z = a * b}: found in microprocessors
• Multiply-Accumulator {assign z = (a * b) + c}: found in cryptographic applications
• Squarer {assign z = a * a}: found in DSP processors
• Addition Tree {assign z = a + b + c + d}: found in ALUs, wireless applications
• Generalized SOP {assign z = (a * b) + (c * d)}: found in FIR filters, IIR filters
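As a behavioral illustration only (not the synthesized hardware), each of these blocks can be viewed as a special case of one function that adds a list of products to a list of plain addends. The function name `sop` and its arguments are hypothetical and introduced here just for the sketch.

```python
def sop(products, addends=()):
    """Evaluate a generalized sum-of-products: sum(x * y) + sum(addends)."""
    return sum(x * y for x, y in products) + sum(addends)

a, b, c, d = 3, 4, 5, 6
multiplier = sop([(a, b)])              # z = a * b
mac = sop([(a, b)], [c])                # z = (a * b) + c
squarer = sop([(a, a)])                 # z = a * a
addition_tree = sop([], [a, b, c, d])   # z = a + b + c + d
generalized = sop([(a, b), (c, d)])     # z = (a * b) + (c * d)
```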
Synthesis of Sum-of-Products
• Synthesis of Sum-of-Product blocks is done in 3 steps (in the order of data flow):
• Creation of Partial Products
• Reduction of Partial Products into 2 operands
• Computation of Final Sum by adding the 2 operands

[Figure: data flow from Inputs through Creation of Partial Products, Reduction of Partial Products, and Computation of Final Sum to Output]
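The three steps can be sketched behaviorally for an unsigned multiplier z = a * b. This is an illustrative model, not the authors' synthesis flow: the reduction step here uses carry-save (3:2) compression, which is an assumption because the slides do not say how the reduction is performed, and all function names are hypothetical.

```python
def partial_products(a, b, width):
    """Step 1: one shifted copy of a for every set bit of b (b < 2**width)."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(width)]

def carry_save(x, y, z):
    """3:2 compressor applied to whole words: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def reduce_to_two(operands):
    """Step 2: compress the operand list until only 2 operands remain."""
    ops = list(operands)
    while len(ops) > 2:
        s, c = carry_save(ops.pop(), ops.pop(), ops.pop())
        ops += [s, c]
    while len(ops) < 2:
        ops.append(0)
    return ops

def sop_multiply(a, b, width=8):
    x, y = reduce_to_two(partial_products(a, b, width))
    return x + y  # Step 3: the final adder, the subject of these slides
```

Only the last line corresponds to the final adder; the arrival times of `x` and `y` at that adder are what the rest of the presentation exploits.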

Motivation and Problem Statement
• SOP blocks are widely used and computationally intensive
• The final adder accounts for about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for a SOP expression
• Stand-alone adder architectures do not work well in SOP:
• Ripple-Carry: area-efficient but slow; timing-efficient if the inputs have skewed arrival times
• Parallel-Prefix architecture (Brent-Kung, Kogge-Stone): faster, but requires more area
• Carry-Select: large area overhead (often >100%); better delay if the Cin signal arrives late
• None of these is well suited to Sum-of-Products. Why?
Special Arrival-time Property
• The 2 operands of the final adder in a SOP exhibit a peculiar arrival-time pattern: bits near the LSB arrive earliest, bits near the middle arrive latest, and bits near the MSB arrive earlier than the middle bits
• As a result, traditional monolithic adders, which are optimized for equal arrival times, do not work well in SOP
• Hence it is critical to synthesize an efficient hybrid adder that is designed specifically for SOP blocks and exploits this arrival-time pattern

[Figure: the n-bit hybrid adder, split from LSB to MSB into a Ripple-Carry sub-adder (w1 bits), a Kogge-Stone sub-adder (w2 bits), and two Carry-Select sub-adders (w3 and w4 bits)]

• Ripple-Carry architecture near the LSB
• Fast Kogge-Stone architecture near the middle
• 2 Carry-Selects (based on Brent-Kung) near the MSB
• Goal: find $w_1$, $w_2$, $w_3$ and $w_4$ algorithmically
Notations
• We use the following notations:
• The bit-width of SubAdder1 (Ripple) is $w_1$ bits
• The bit-width of SubAdder2 (Kogge-Stone) is $w_2$ bits
• The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is $w_3$ bits
• The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is $w_4$ bits
• $w_1 + w_2 + w_3 + w_4 = n$ (total width of the hybrid adder)
• $T(a_i)$ = time when input signal $a_i$ is available
• $T(S_i)$ = time when output signal $S_i$ ($\mathrm{Sum}_i$) is available
• $T(C_i)$ = time when output signal $C_i$ ($\mathrm{Carry}_i$) is available

[Figure: ripple-carry adder, a chain of full adders (FA) with inputs x0..xk and y0..yk and outputs z0..zk+1]

• Most area-efficient architecture
• Very slow
• Timing-efficient if the input arrival times are skewed. We use it for a few bits near the LSB (which arrive earliest)

• In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
• For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit: $G_i = a_i \cdot b_i$
• For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit: $P_i = a_i \oplus b_i$
• The Generate and Propagate concept extends to blocks comprising multiple bits, as we discuss next

[Figure: the carry operator combines $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$ into $(G_{left,right}, P_{left,right})$]

• If two blocks (comprising one or more bits) have the GP value-pairs $(G_{left}, P_{left})$ and $(G_{right}, P_{right})$, then the combined block has the following GP values:
• $G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$
• $P_{left,right} = P_{left} \cdot P_{right}$
• The above computation is performed by a carry-operator or "o"-operator
• Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)
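The Generate/Propagate pairs and the "o"-operator can be sketched in a few lines. This is a minimal behavioral model under two assumptions: Propagate is taken in its XOR form (so the sum bit is the XOR of Propagate and the incoming carry), and there is no external carry-in. The prefix scan here is serial for clarity; a parallel-prefix adder evaluates the same operator in a tree. Function names are hypothetical.

```python
def carry_op(left, right):
    """The "o"-operator: combine the (G, P) pair of the left (more
    significant) block with that of the right block."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)

def add_with_prefix(a, b, n):
    """Add two n-bit numbers using per-bit (G, P) pairs and a serial scan."""
    gp = [((a >> i) & (b >> i) & 1, ((a >> i) ^ (b >> i)) & 1)
          for i in range(n)]
    carries = [0]                 # carry into bit 0 (assumed: no carry-in)
    block = None
    for i in range(n):
        block = gp[i] if block is None else carry_op(gp[i], block)
        carries.append(block[0])  # group Generate of bits i..0 = carry out of bit i
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i   # sum bit = P_i XOR carry into bit i
    return total | (carries[n] << n)
```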

[Figure: 8-bit Kogge-Stone (KS) parallel-prefix network with inputs GP0..GP7 and carry outputs C1..C8]


• Kogge-Stone Parallel prefix architecture
• Delay: $\log_2 n$ levels of "o"-operators
• Area: $(n \log_2 n) - n + 1$ "o"-operators

Kogge and Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations”, IEEE Transactions on Computers, 1973
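The delay and area figures can be checked by building the network in software. The sketch below is a behavioral model of a Kogge-Stone prefix network (assuming n is a power of two and no external carry-in); it counts levels and "o"-operators while computing the group (G, P) pair for every bit position. The function name and return shape are hypothetical.

```python
def kogge_stone(gp):
    """Return (prefixes, levels, operator_count) for a Kogge-Stone network.

    gp[i] is the (G, P) pair of bit i; prefixes[i] covers bits i..0, so
    prefixes[i][0] is the carry out of bit i.
    """
    n = len(gp)
    cur = list(gp)
    levels = ops = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        for i in range(dist, n):
            g_l, p_l = cur[i]
            g_r, p_r = cur[i - dist]
            nxt[i] = (g_l | (p_l & g_r), p_l & p_r)  # one "o"-operator
            ops += 1
        cur = nxt
        levels += 1
        dist *= 2
    return cur, levels, ops
```

For n = 8 the loop runs with distances 1, 2 and 4, using 7 + 6 + 4 = 17 operators in 3 levels, which matches the formulas above.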

Brent-Kung (BK)

[Figure: 8-bit Brent-Kung parallel-prefix network with inputs GP0..GP7 and carry outputs C1..C8]


• Brent-Kung Parallel prefix architecture
• Delay: $(2 \log_2 n) - 2$ levels of "o"-operators
• Area: $2n - 2 - \log_2 n$ "o"-operators

Brent and Kung, “A regular layout for parallel adders”, IEEE Transactions on Computers, 1982
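To make the speed/area trade-off concrete, the formulas quoted on the Kogge-Stone and Brent-Kung slides can be evaluated for a few widths. This is plain arithmetic on those formulas, assuming n is a power of two; the helper names are hypothetical.

```python
from math import log2

def ks_cost(n):
    """(levels, operators) of Kogge-Stone, per the slide formulas."""
    k = int(log2(n))
    return k, n * k - n + 1

def bk_cost(n):
    """(levels, operators) of Brent-Kung, per the slide formulas."""
    k = int(log2(n))
    return 2 * k - 2, 2 * n - 2 - k

for n in (8, 16, 32):
    print(n, ks_cost(n), bk_cost(n))
# 8 (3, 17) (4, 11)
# 16 (4, 49) (6, 26)
# 32 (5, 129) (8, 57)
```

At 32 bits, Kogge-Stone needs 5 levels against 8 for Brent-Kung, but uses 129 operators against 57.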

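[Figure: carry-select adder. Two adders take inputs x and y, one with carry-in 1'b0 producing z0 and one with carry-in 1'b1 producing z1; a Mux controlled by cin selects the output z]

• Used as a special case, since Cin arrives late
• Speed depends on the architecture of the two adders
• But these adders need not be KS (rather, we use BK)
• The arrival times of the inputs of SubAdder3 and SubAdder4 are earlier than those for SubAdder2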
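A behavioral sketch of the carry-select idea follows: both candidate sums are computed without waiting for the carry-in, and the late-arriving cin only drives the final selection. The function name and signature are hypothetical; in hardware the two additions would be separate adders (Brent-Kung in this design) running in parallel.

```python
def carry_select_add(x, y, cin, width):
    """Behavioral carry-select adder: returns (sum, carry_out)."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)        # adder with carry-in tied to 1'b0
    z1 = (x & mask) + (y & mask) + 1    # adder with carry-in tied to 1'b1
    z = z1 if cin else z0               # the mux, driven by the late cin
    return z & mask, z >> width
```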

• To find the Ripple Adder width, at every bit i compute $T(C_{i+1})$ and check whether
• $T(C_{i+1}) \le T(a_{i+1})$
• $T(C_{i+1}) \le T(b_{i+1})$
• If the check passes, set i = i+1
• Otherwise, continue checking until 3 consecutive bits fail the check (Hill Climbing)
• Return the value i as the Ripple Adder width
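The width search can be sketched as follows. Several details are assumptions made for illustration and are not stated on the slide: the carry timing model (the carry out of bit i is ready one full-adder delay after the latest of its two inputs and its incoming carry), a carry-in available at time 0, and the reading of "hill climbing" as "remember the last passing bit and stop after 3 consecutive failures". The function and parameter names are hypothetical.

```python
def ripple_width(t_a, t_b, fa_delay=1, patience=3):
    """Estimate the ripple sub-adder width from input arrival times.

    t_a[i], t_b[i] are the arrival times T(a_i), T(b_i) of the two operands.
    """
    n = len(t_a)
    t_carry = 0      # assumed arrival time of the carry into bit 0
    width = 0        # last position at which the check passed
    fails = 0        # consecutive failures seen so far
    for i in range(n - 1):
        # Assumed timing model for T(C_{i+1}).
        t_carry = max(t_a[i], t_b[i], t_carry) + fa_delay
        if t_carry <= t_a[i + 1] and t_carry <= t_b[i + 1]:
            width = i + 1   # carry is ready before bit i+1's inputs
            fails = 0
        else:
            fails += 1
            if fails >= patience:
                break
    return width
```

With arrival times `[0, 2, 4, 6, 6, 6, 6, 6]` on both operands, the carry keeps up with the inputs for the first three bits and then falls behind, so the sketch returns a width of 3.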
• The latest-arriving signals are part of SubAdder2 (Kogge-Stone)
• Hence keep this adder wide, while ensuring that this does not leave very narrow Carry-Select adders for SubAdder3 and SubAdder4
• We determine the width with the following equations:
• $w_2 = n - w_1$ if $(n - w_1) \le 8$
• $w_2 = 2^p$, where $p = \lfloor \log_2 (n - w_1) \rfloor$, if $(n - w_1) > 8$
• Example: if $n = 32$ and $w_1 = 7$, then $w_2 = 16$
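The width rule is short enough to state as code. The sketch assumes that the exponent is rounded down to an integer, an interpretation inferred from the worked example (n = 32, w1 = 7 gives 16, the largest power of two not exceeding 25). The function name is hypothetical.

```python
def kogge_stone_width(n, w1):
    """Width w2 of the Kogge-Stone sub-adder for an n-bit hybrid adder."""
    rest = n - w1
    if rest <= 8:
        return rest
    # Largest power of two not exceeding rest, i.e. 2**floor(log2(rest)).
    return 1 << (rest.bit_length() - 1)
```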

[Figure: the hybrid adder (Ripple-Carry w1, Kogge-Stone w2, Carry-Select w3, Carry-Select w4) annotated with the output arrival times T(S2), T(S3), T(S4) and T(C4)]

$T_{hybrid} = \max(T(C_4), T(S_4), T(S_3), T(S_2))$


• Width of the two Carry-Select adders
• Initial width configuration: $w_3 = (n - w_1 - w_2)/2$ and $w_4 = n - w_1 - w_2 - w_3$
• With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide)
• Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest-delay configuration
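One possible shape for this search is sketched below. The slides give only the initial split and the idea of a binary-search-like exploration, so everything else is an assumption: integer division for the initial split, the step-halving strategy, and the `delay` callback, which stands in for the hybrid-adder delay estimate and must be supplied by the caller. All names are hypothetical.

```python
def initial_widths(n, w1, w2):
    """Initial split of the remaining bits between the two carry-select adders."""
    rest = n - w1 - w2
    w3 = rest // 2          # assumed: integer division
    return w3, rest - w3

def refine_widths(n, w1, w2, delay):
    """Move the w3/w4 boundary while the estimated delay improves.

    delay(w3, w4) is a caller-supplied estimate of the hybrid adder delay.
    """
    rest = n - w1 - w2
    w3, w4 = initial_widths(n, w1, w2)
    best = delay(w3, w4)
    step = max(rest // 4, 1)
    while step >= 1:
        moved = False
        for cand in (w3 - step, w3 + step):
            if 0 <= cand <= rest and delay(cand, rest - cand) < best:
                best, w3, moved = delay(cand, rest - cand), cand, True
                break
        if not moved:
            step //= 2      # no improvement at this step size: search finer
    return w3, rest - w3
```

For n = 32, w1 = 7 and w2 = 16 the initial split is (4, 5). With a made-up delay model such as `lambda w3, w4: max(2 * w3, w4)`, the search settles on (3, 6).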
Experimental Setup
• To test our approach, we used:
• Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
• Two process technologies (0.13 µm and 0.09 µm)
• Two commercial library vendors
• Two different arrival time constraints
• We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.
Results

On average, 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty)

Summary