A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products

### A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products

Sabyasachi Das

University of Colorado, Boulder

Sunil P. Khatri

Texas A&M University

What is a Sum-of-Products (SOP)

- An arithmetic Sum-of-Products (SOP) block consists of an arbitrary number of product terms and sum terms.
- General form of SOP:

[Figure: example SOP data flow. Inputs a and b are multiplied (p = a * b), inputs c and d are multiplied (q = c * d), and the output is z = p + q + e + f.]

Examples of SOP Blocks

- Multiplier {assign z = a * b}: found in microprocessors
- Multiply-Accumulator {assign z = (a * b) + c}: found in cryptographic applications
- Squarer {assign z = a * a}: found in DSP processors
- Addition Tree {assign z = a + b + c + d}: found in ALUs and wireless applications
- Generalized SOP {assign z = (a * b) + (c * d)}: found in FIR filters and IIR filters

Synthesis of Sum-of-Products

- Synthesis of Sum-of-Products blocks is done in 3 steps (in the order of data flow):
- Creation of Partial Products
- Reduction of Partial Products into 2 operands
- Computation of Final Sum by adding the 2 operands

[Figure: data flow from Inputs through Creation of Partial Products, Reduction of Partial Products and Computation of Final Sum to Output.]
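
The three steps can be illustrated with a small software model. This is only a sketch under stated assumptions: operands are unsigned, the reduction uses plain 3:2 carry-save compression in an arbitrary order, and the function names (`partial_products`, `carry_save`, `reduce_to_two`, `sop`) are hypothetical and do not come from the presentation.

```python
def partial_products(a: int, b: int, width: int) -> list[int]:
    """Step 1: one shifted copy of a for every set bit of b (unsigned, b < 2**width)."""
    return [a << i for i in range(width) if (b >> i) & 1]


def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compression applied to every bit position: x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_to_two(rows: list[int]) -> tuple[int, int]:
    """Step 2: compress three rows into two until only two operands remain."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save(rows[0], rows[1], rows[2])
        rows = rows[3:] + [s, c]
    while len(rows) < 2:
        rows.append(0)  # pad so that the final adder always sees two operands
    return rows[0], rows[1]


def sop(a: int, b: int, c: int, d: int, e: int, width: int = 8) -> int:
    """Model of z = (a * b) + (c * d) + e built from the three synthesis steps."""
    rows = partial_products(a, b, width) + partial_products(c, d, width) + [e]
    x, y = reduce_to_two(rows)
    return x + y  # Step 3: the final adder, which is the subject of this work
```

Only the last line corresponds to the final adder; the rest models the logic whose output timing the final adder has to cope with.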

Motivation and Problem Statement

- SOP blocks are widely used and computationally-intensive
- The final adder accounts for about 30% to 40% of the delay of the SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression
- Stand-alone adder architectures do not work well in SOP

Stand-alone Adder Architectures

- Frequently used adder architectures
- Ripple-Carry
- Area-efficient, but slow
- Timing-efficient if inputs have skewed arrival time
- Parallel-Prefix architecture (Brent-Kung, Kogge-Stone)
- Faster architecture
- Requires more area
- Carry-Select
- Large area overhead (often >100%)
- Better delay if Cin signal arrives late.
- None of these are very suitable in Sum-of-Products
- Why?

Special Arrival-time Property

- The 2 operands of the final adder in an SOP exhibit a peculiar arrival-time pattern: bits near the LSB arrive earliest, bits near the middle arrive latest, and bits near the MSB arrive earlier than the middle bits
- Traditional monolithic adders are optimized for equal arrival times, so they do not work well in SOP
- Hence it is critical to synthesize an efficient hybrid adder that is designed specifically for SOP blocks and exploits this arrival-time pattern

Proposed 4-Stage Hybrid Adder

[Figure: n-bit hybrid adder built from SubAdder1 (Ripple-Carry, w1 bits), SubAdder2 (Kogge-Stone, w2 bits), SubAdder3 (Carry-Select, w3 bits) and SubAdder4 (Carry-Select, w4 bits).]

- Ripple-Carry architecture near LSB
- Fast Kogge-Stone architecture near Middle
- 2 Carry-Selects (based on Brent-Kung) near MSB
- Goal: find w1, w2, w3 and w4 algorithmically

Notations

- We use the following notations:
- The bit-width of SubAdder1 (Ripple) is w1 bits
- The bit-width of SubAdder2 (Kogge-Stone) is w2 bits
- The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is w3 bits
- The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is w4 bits
- w1 + w2 + w3 + w4 = n (total width of the hybrid adder)
- T(ai) = Time when input signal ai is available
- T(Si) = Time when output signal Si (Sumi) is available
- T(Ci) = Time when output signal Ci (Carryi) is available

SubAdder1 (Ripple-Carry)

- Most area-efficient architecture
- Very slow
- Timing-efficient if input arrival time is skewed. We use it for a few bits near LSB (which arrive earliest)

[Figure: ripple-carry chain of full adders (FA) with inputs x and y and outputs z0 to zk+1.]

Parallel-Prefix Adders (KS, BK)

- In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
- For each bit i of the adder, Generate ($G_i$) indicates whether a carry is generated from that bit
- $G_i = a_i \cdot b_i$ (AND)
- For each bit i of the adder, Propagate ($P_i$) indicates whether a carry is propagated through that bit
- $P_i = a_i \oplus b_i$ (XOR)
- The Generate and Propagate concept is extendable to blocks comprising multiple bits, as we discuss next

Parallel-Prefix Adders (KS, BK)

- If two blocks (each comprising one or more bits) have the GP value pairs (Gleft, Pleft) and (Gright, Pright), then the combined block has the following GP values, where + denotes OR and · denotes AND:
- $G_{left,right} = G_{left} + (P_{left} \cdot G_{right})$
- $P_{left,right} = P_{left} \cdot P_{right}$
- The above computation is performed by a carry-operator or "o"-operator
- Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)

[Figure: "o"-operator combining (Gleft, Pleft) and (Gright, Pright) into (Gleft,right, Pleft,right).]
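
As a rough illustration of the carry operator, the sketch below applies it serially from the LSB upward and then forms the sum bits. It assumes the common definitions G = a AND b and P = a XOR b and a carry-in of 0; the function names are hypothetical. A real parallel-prefix adder applies the same operator in a tree rather than in this serial loop.

```python
def bit_gp(x: int, y: int, i: int) -> tuple[int, int]:
    """(Generate, Propagate) of bit i, assuming G = a AND b and P = a XOR b."""
    a, b = (x >> i) & 1, (y >> i) & 1
    return a & b, a ^ b


def combine(left: tuple[int, int], right: tuple[int, int]) -> tuple[int, int]:
    """The "o"-operator: left is the more significant block, right the less significant."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def add_with_carry_operator(x: int, y: int, n: int) -> int:
    """n-bit addition with carry-in 0, using only bit_gp and combine."""
    gp = [bit_gp(x, y, i) for i in range(n)]
    carries = [0]          # carries[i] is the carry into bit i
    block = None           # GP pair of the block covering bits i..0
    for i in range(n):
        block = gp[i] if block is None else combine(gp[i], block)
        carries.append(block[0])   # group generate of bits i..0 is the carry out of bit i
    total = 0
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i   # sum bit = P XOR carry-in
    return total | (carries[n] << n)            # carry-out becomes bit n
```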

SubAdder2 (Kogge-Stone)

[Figure: 8-bit Kogge-Stone prefix network taking the generate/propagate pairs GP0 to GP7 and producing the carries C1 to C8.]

- Kogge-Stone parallel-prefix architecture
- Delay: log2(n) levels of "o"-operators
- Area: (n*log2(n))-n+1 "o"-operators

Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
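
The level and operator counts can be checked with a small model of the Kogge-Stone network. This is a sketch, assuming n is a power of two, carry-in 0, and G = a AND b, P = a XOR b; `kogge_stone` and `ks_add` are hypothetical names.

```python
def combine(left, right):
    """The "o"-operator: left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def kogge_stone(gp):
    """Return (prefix GP pairs, number of levels, number of "o"-operators)."""
    n = len(gp)
    cur = list(gp)
    levels = ops = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        for i in range(dist, n):
            # every position at least `dist` from the LSB gets one operator per level
            nxt[i] = combine(cur[i], cur[i - dist])
            ops += 1
        cur = nxt
        levels += 1
        dist *= 2
    return cur, levels, ops


def ks_add(x, y, n):
    """n-bit addition (carry-in 0); returns (sum, levels, operators)."""
    bits = [((x >> i) & 1, (y >> i) & 1) for i in range(n)]
    gp = [(a & b, a ^ b) for a, b in bits]
    prefix, levels, ops = kogge_stone(gp)
    carries = [0] + [g for g, _ in prefix]      # carries[i] is the carry into bit i
    total = carries[n] << n
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, levels, ops
```

For n = 8 the model uses 3 levels and 17 operators, matching log2(n) and (n*log2(n))-n+1.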

Brent-Kung (BK)

[Figure: 8-bit Brent-Kung prefix network taking the generate/propagate pairs GP0 to GP7 and producing the carries C1 to C8.]

- Brent-Kung parallel-prefix architecture
- Delay: (2*log2(n))-2 levels of "o"-operators
- Area: (2*n)-2-log2(n) "o"-operators

Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982
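
A similar model can be written for Brent-Kung. The sketch below assumes n is a power of two (n ≥ 4 for the delay formula), carry-in 0, and that the delay is counted as the longest chain of dependent "o"-operators, which is our reading of the level count above. The names `brent_kung` and `bk_add` are hypothetical.

```python
def combine(left, right):
    """The "o"-operator: left is the more significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return g_l | (p_l & g_r), p_l & p_r


def brent_kung(gp):
    """Return (prefix GP pairs, longest operator chain, number of "o"-operators)."""
    n = len(gp)
    x = list(gp)
    depth = [0] * n    # operators on the longest path ending at each position
    ops = 0

    def apply(i, j):
        nonlocal ops
        x[i] = combine(x[i], x[j])
        depth[i] = max(depth[i], depth[j]) + 1
        ops += 1

    d = 1
    while d < n:                       # up-sweep: build power-of-two blocks
        for i in range(2 * d - 1, n, 2 * d):
            apply(i, i - d)
        d *= 2
    d = n // 4
    while d >= 1:                      # down-sweep: fill in the remaining prefixes
        for i in range(3 * d - 1, n, 2 * d):
            apply(i, i - d)
        d //= 2
    return x, max(depth), ops


def bk_add(x_val, y_val, n):
    """n-bit addition (carry-in 0); returns (sum, levels, operators)."""
    bits = [((x_val >> i) & 1, (y_val >> i) & 1) for i in range(n)]
    gp = [(a & b, a ^ b) for a, b in bits]
    prefix, levels, ops = brent_kung(gp)
    carries = [0] + [g for g, _ in prefix]
    total = carries[n] << n
    for i in range(n):
        total |= (gp[i][1] ^ carries[i]) << i
    return total, levels, ops
```

For n = 8 this gives 4 levels and 11 operators, against 3 levels and 17 operators for Kogge-Stone: fewer operators at the cost of a longer path.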

SubAdder3 & SubAdder4 (Carry-Select)

- Large area overhead
- Used as a special case, since Cin arrives late
- Speed depends on the architecture of two adders
- But these adders need not be KS (rather, we use BK)
- The arrival times of the inputs of SubAdder3 and SubAdder4 are earlier than those for SubAdder2

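[Figure: carry-select adder. Two adders (Adder0 and Adder1) add x and y with the constant carry-ins 1'b0 and 1'b1, producing z0 and z1; a Mux controlled by cin selects the output z.]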
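
A behavioural sketch of the carry-select idea follows. It assumes that Adder0 is the adder with carry-in 0 and Adder1 the one with carry-in 1, and it models both adders with ordinary integer addition rather than with Brent-Kung networks; `carry_select_add` is a hypothetical name.

```python
def carry_select_add(x: int, y: int, cin: int, width: int) -> int:
    """Add two width-bit operands; the result includes the carry-out as bit `width`."""
    mask = (1 << width) - 1
    x, y = x & mask, y & mask
    z0 = x + y        # Adder0: computed assuming carry-in 1'b0
    z1 = x + y + 1    # Adder1: computed assuming carry-in 1'b1
    # Both sums are ready before cin arrives; the late cin only drives the mux.
    return z1 if cin else z0
```

Because z0 and z1 do not depend on cin, the only delay after a late cin is the multiplexer, which is why the doubled adder area can be worthwhile here.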

Determination of width of SubAdder1

- Width of the Ripple adder (SubAdder1)
- At every bit i, compute $T(C_{i+1})$ and check if
- $T(C_{i+1}) \le T(a_{i+1})$
- $T(C_{i+1}) \le T(b_{i+1})$
- If the check passes, i = i+1
- Else continue checking until 3 consecutive bits fail the check (Hill Climbing)
- Return the value i as the Ripple Adder width
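
One possible reading of this procedure is sketched below. Several details are assumptions and not taken from the presentation: both inequalities must hold for the check to pass, the carry is modelled with a fixed full-adder delay `fa_delay`, the carry-in of bit 0 arrives at time 0, and the returned width is the last width at which the check passed. The name `ripple_width` is hypothetical.

```python
def ripple_width(t_a, t_b, fa_delay=1.0, patience=3):
    """Estimate w1 from the input arrival times t_a[i], t_b[i] (index 0 = LSB)."""
    n = len(t_a)
    t_carry = 0.0      # assumed arrival time of the carry into bit 0
    width = 0          # widest ripple adder whose carry-out was not on the critical path
    fails = 0
    for i in range(n - 1):
        # T(C_{i+1}) under a simple full-adder delay model (assumption)
        t_carry = max(t_a[i], t_b[i], t_carry) + fa_delay
        if t_carry <= t_a[i + 1] and t_carry <= t_b[i + 1]:
            width = i + 1          # the carry is ready before the next inputs: keep rippling
            fails = 0
        else:
            fails += 1
            if fails == patience:  # 3 consecutive failures: stop climbing
                break
    return width
```

With arrival times that rise steeply near the LSB and flatten out later, for example `[0, 2, 4, 6, 7, 7.5, 7.8, 8]` on both operands, the sketch selects a 4-bit ripple adder; with equal arrival times it selects a width of 0.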

Determination of width of SubAdder2

- Width of Kogge-Stone Adder (SubAdder2)
- The latest arriving signals are part of this adder
- Hence keep this adder wide, while ensuring that this does not result in a very narrow Carry-Select adder for SubAdder3 and SubAdder4
- We determine the widths with the following equation:
- $w_2 = n - w_1$ if $(n - w_1) \le 8$
- $w_2 = 2^p$, where $p = \lfloor \log_2 (n - w_1) \rfloor$, if $(n - w_1) > 8$
- Example: If n=32 and w1=7, then n-w1=25, p=4 and w2=16
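
The width rule is easy to express in code. The sketch assumes that p is rounded down to an integer, which is what the n=32, w1=7 example requires; `kogge_stone_width` is a hypothetical name.

```python
def kogge_stone_width(n: int, w1: int) -> int:
    """Width w2 of SubAdder2 for an n-bit hybrid adder with a w1-bit ripple adder."""
    remaining = n - w1
    if remaining <= 8:
        return remaining              # everything that is left goes to Kogge-Stone
    # largest power of two not exceeding `remaining` (assumed rounding)
    return 1 << (remaining.bit_length() - 1)
```

For n=32 and w1=7 this returns 16, leaving 9 bits to be split between the two carry-select adders.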

Delay of the Hybrid Adder

[Figure: the hybrid adder block diagram (SubAdder1 Ripple-Carry, SubAdder2 Kogge-Stone, SubAdder3 and SubAdder4 Carry-Select, with widths w1 to w4) annotated with the output arrival times T(S2), T(S3), T(S4) and T(C4).]

$T_{hybrid} = \max(T(C_4), T(S_4), T(S_3), T(S_2))$

Determination of widths of SubAdder3 and SubAdder4

- Width of the two Carry-Select adders
- Initial width configuration
- w3 = (n-w1-w2)/2
- w4 = (n-w1-w2-w3)
- With this initial configuration, estimate delay of the overall hybrid adder (based on the previous slide)
- Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest delay configuration
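
The presentation does not spell out the iterative search, so the following is only one plausible sketch of it. It assumes a caller-supplied function `delay(w3, w4)` that estimates the hybrid-adder delay for a given split, starts from the even split, and moves in whichever direction lowers the delay while halving the step when neither direction helps. The function name and the step schedule are hypothetical.

```python
from typing import Callable


def split_carry_select(total: int, delay: Callable[[int, int], float]) -> tuple[int, int]:
    """Split `total` = n - w1 - w2 bits into (w3, w4) with a step-halving local search."""
    w3 = total // 2                       # initial configuration from the list above
    best = delay(w3, total - w3)
    step = max(total // 4, 1)
    while step >= 1:
        improved = False
        for cand in (w3 - step, w3 + step):
            if 1 <= cand <= total - 1:    # keep both carry-select adders non-empty
                d = delay(cand, total - cand)
                if d < best:
                    w3, best, improved = cand, d, True
                    break
        if not improved:
            step //= 2                    # no better neighbour at this distance
    return w3, total - w3


def toy_delay(w3: int, w4: int) -> float:
    """Made-up delay model for illustration only; not from the presentation."""
    return max(w3 + 2, w4 + 5)
```

With the toy model and 9 bits to distribute, `split_carry_select(9, toy_delay)` moves from the initial (4, 5) split to (6, 3). In practice `delay` would be the estimate of the hybrid-adder delay from the previous slide, evaluated with library timing data.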

Experimental Setup

- To test our approach, we used:
- Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
- Two process technologies (0.13 µm and 0.09 µm)
- Two commercial library vendors
- Two different arrival time constraints
- We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.

Results

On average, 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty)

Summary

- Hybrid adder consists of 4 SubAdders
- SubAdder1 has Ripple-Carry architecture
- SubAdder2 has Kogge-Stone architecture
- SubAdder3 and SubAdder4 have Carry-Select (based on Brent-Kung) architecture
- Widths of all SubAdders are computed based on a timing-driven analysis
- On average, 14.31% faster than the commercial synthesis tool (with a 6.62% area penalty)
