A Timing-Driven Synthesis Approach of a Fast Four-Stage Hybrid Adder in Sum-of-Products


Sabyasachi Das

University of Colorado, Boulder

Sunil P. Khatri

Texas A&M University

What is a Sum-of-Products (SOP)?
  • An arithmetic Sum-of-Products (SOP) block consists of an arbitrary number of product terms and sum terms.
  • General form of an SOP, as drawn in the slide's figure (inputs a, b, c, d, e, f; output z):
    • p = a * b
    • q = c * d
    • z = p + q + e + f

Examples of SOP Blocks
  • Multiplier {assign z = a * b}: found in microprocessors
  • Multiply-Accumulator {assign z = (a * b) + c}: found in cryptographic applications
  • Squarer {assign z = a * a}: found in DSP processors
  • Addition Tree {assign z = a + b + c + d}: found in ALUs and wireless applications
  • Generalized SOP {assign z = (a * b) + (c * d)}: found in FIR and IIR filters

Synthesis of Sum-of-Products
  • Synthesis of a Sum-of-Products block is done in 3 steps (in the order of data flow):
    • Creation of Partial Products
    • Reduction of Partial Products into 2 operands
    • Computation of Final Sum by adding the 2 operands
  • Figure: Inputs -> Creation of Partial Products -> Reduction of Partial Products -> Computation of Final Sum -> Output
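
The three steps can be illustrated with a small functional model. This is only a sketch under stated assumptions: the slides do not say how partial products are created or reduced, so the AND-array rows and the word-level 3:2 carry-save compressor below are illustrative choices, and the function names (`partial_products`, `carry_save`, `reduce_to_two`, `sop`) are hypothetical. Inputs are assumed to be non-negative integers, with multiplier operands that fit in `width` bits.

```python
def partial_products(a, b, width):
    """Step 1: one shifted copy of a for every set bit of b (AND-array rows)."""
    return [a << i for i in range(width) if (b >> i) & 1]


def carry_save(x, y, z):
    """Word-level 3:2 compressor: returns (s, c) such that x + y + z == s + c."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c


def reduce_to_two(rows):
    """Step 2: compress all rows into exactly 2 operands (assumed carry-save scheme)."""
    rows = list(rows)
    while len(rows) > 2:
        s, c = carry_save(rows.pop(), rows.pop(), rows.pop())
        rows += [s, c]
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def sop(a, b, c, d, e, f, width=8):
    """Evaluate z = a*b + c*d + e + f through the three synthesis steps."""
    rows = partial_products(a, b, width) + partial_products(c, d, width) + [e, f]
    x, y = reduce_to_two(rows)
    return x + y  # Step 3: the final adder, which the rest of the slides optimize


print(sop(3, 5, 7, 9, 11, 13))  # 3*5 + 7*9 + 11 + 13
```

In this model the two operands `x` and `y` returned by `reduce_to_two` play the role of the final adder's inputs; the model captures functionality only, not arrival times.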

Motivation and Problem Statement
  • SOP blocks are widely used and computationally intensive
  • The final adder accounts for about 30% to 40% of the delay of an SOP block. This paper focuses on the synthesis of an efficient final adder for an SOP expression
  • Stand-alone adder architectures do not work well in an SOP
Stand-alone Adder Architectures
  • Frequently used adder architectures
    • Ripple-Carry
      • Area-efficient, but slow
      • Timing-efficient if inputs have skewed arrival time
    • Parallel-Prefix architecture (Brent-Kung, Kogge-Stone)
      • Faster architecture
      • Requires more area
    • Carry-Select
      • Large area overhead (often >100%)
      • Better delay if Cin signal arrives late.
  • None of these are very suitable in Sum-of-Products
    • Why?

Special Arrival-time Property
  • The 2 operands of the final adder in an SOP exhibit a characteristic arrival-time pattern: the bits near the LSB arrive earliest, the bits in the middle arrive latest, and the bits near the MSB arrive earlier than the middle bits
  • Traditional monolithic adders are optimized for equal arrival times, so they do not work well in an SOP
  • Hence it is critical to synthesize an efficient hybrid adder that exploits this arrival-time pattern and is designed specifically for SOP blocks

Proposed 4-Stage Hybrid Adder
  • Ripple-Carry architecture near the LSB
  • Fast Kogge-Stone architecture in the middle
  • 2 Carry-Select adders (based on Brent-Kung) near the MSB
  • GOAL: Find w1, w2, w3 and w4 algorithmically
  • Figure: the two n-bit operands are split, from LSB to MSB, into slices of w1, w2, w3 and w4 bits that feed SubAdder1 (Ripple-Carry), SubAdder2 (Kogge-Stone), SubAdder3 (Carry-Select) and SubAdder4 (Carry-Select), which produce output slices of the same widths
Notations
  • We use the following notations:
    • The bit-width of SubAdder1 (Ripple) is w1 bits
    • The bit-width of SubAdder2 (Kogge-Stone) is w2 bits
    • The bit-width of SubAdder3 (Carry-Select, Brent-Kung) is w3 bits
    • The bit-width of SubAdder4 (Carry-Select, Brent-Kung) is w4 bits
    • w1 + w2 + w3 + w4 = n (total width of the hybrid adder)
    • T(a_i) = Time when input signal a_i is available
    • T(S_i) = Time when output signal S_i (Sum_i) is available
    • T(C_i) = Time when output signal C_i (Carry_i) is available

SubAdder1 (Ripple-Carry)
  • Most area-efficient architecture
  • Very slow
  • Timing-efficient if the input arrival times are skewed. We use it for a few bits near the LSB (which arrive earliest)
  • Figure: chain of full adders (FA); the FA at bit i takes x_i and y_i and produces z_i, for bits 0, 1, 2, ..., k, and the last FA also produces z_(k+1)


Parallel-Prefix Adders (KS, BK)
  • In a Parallel-Prefix adder, the carry for each bit is computed by an efficient tree structure (using the Generate and Propagate concept).
  • For each bit i of the adder, Generate (G_i) indicates whether a carry is generated from that bit
    • G_i = a_i AND b_i
  • For each bit i of the adder, Propagate (P_i) indicates whether a carry is propagated through that bit
    • P_i = a_i XOR b_i
  • The Generate and Propagate concept is extendable to blocks comprising multiple bits, as we discuss next

Parallel-Prefix Adders (KS, BK)
  • If two blocks (comprising one or more bits) have the GP value-pairs (G_left, P_left) and (G_right, P_right), then the combined block has the following GP values:
    • G_left,right = G_left OR (P_left AND G_right)
    • P_left,right = P_left AND P_right
  • The above computation is performed by a carry-operator or "o"-operator
  • Once we obtain the carry for each bit, it is trivial to compute the sum output of each bit (XOR and NAND)
  • Figure: an "o"-operator combining (G_left, P_left) and (G_right, P_right) into (G_left,right, P_left,right)
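
A minimal sketch of the Generate/Propagate pairs and the "o"-operator follows. It assumes the common definitions G_i = a_i AND b_i and P_i = a_i XOR b_i (the operator symbols did not survive in the transcript), and it applies the operator serially from the LSB for clarity. The tree adders in the following slides apply the same associative operator in a different order. The names `gp_bit`, `combine`, `carries` and `add` are hypothetical.

```python
def gp_bit(a, b):
    """(G, P) pair of a single bit position: G = a AND b, P = a XOR b (assumed)."""
    return (a & b, a ^ b)


def combine(left, right):
    """The "o"-operator: merge a more-significant block with a less-significant one."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def carries(a_bits, b_bits, cin=0):
    """Return [C_1, ..., C_n] for LSB-first bit lists, applying the operator serially."""
    out = []
    block = (cin, 0)  # carry-in modelled as a block that generates cin and propagates nothing
    for a, b in zip(a_bits, b_bits):
        block = combine(gp_bit(a, b), block)
        out.append(block[0])  # G of bits i..0 is the carry into bit i+1
    return out


def add(a, b, n):
    """n-bit addition: sum bit i is P_i XOR C_i, and C_n is the carry-out."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    c = [0] + carries(a_bits, b_bits)
    s = sum(((a_bits[i] ^ b_bits[i]) ^ c[i]) << i for i in range(n))
    return s + (c[n] << n)


print(add(11, 6, 4))  # compare with 11 + 6
```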


SubAdder2 (Kogge-Stone)
  • Kogge-Stone Parallel-Prefix architecture
    • Delay: log2(n) levels of "o"-operators
    • Area: (n*log2(n)) - n + 1 "o"-operators
  • Figure: 8-bit Kogge-Stone prefix network with inputs GP0 to GP7 and carry outputs C1 to C8

Kogge and Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Transactions on Computers, 1973
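
The level and operator counts quoted for Kogge-Stone can be checked with a small model of the prefix network. This is a sketch, not the authors' implementation: it assumes n is a power of two, a carry-in of 0, and P_i = a_i XOR b_i; the names `combine`, `kogge_stone` and `ks_add` are hypothetical.

```python
def combine(left, right):
    """The "o"-operator on (G, P) pairs; left is the more-significant block."""
    g_l, p_l = left
    g_r, p_r = right
    return (g_l | (p_l & g_r), p_l & p_r)


def kogge_stone(gp):
    """Run a Kogge-Stone prefix network over LSB-first (G, P) pairs.

    Returns (prefix, levels, operators), where prefix[i] covers bits i..0.
    """
    n = len(gp)
    cur = list(gp)
    levels = ops = 0
    dist = 1
    while dist < n:
        nxt = list(cur)
        for i in range(dist, n):
            nxt[i] = combine(cur[i], cur[i - dist])  # span doubles at every level
            ops += 1
        cur = nxt
        levels += 1
        dist *= 2
    return cur, levels, ops


def ks_add(a, b, n):
    """Add two n-bit numbers; also report the levels and operators used."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    gp = [(x & y, x ^ y) for x, y in zip(a_bits, b_bits)]
    prefix, levels, ops = kogge_stone(gp)
    carry = [0] + [g for g, _ in prefix]  # C_0 = 0, C_(i+1) = G of bits i..0
    total = sum((gp[i][1] ^ carry[i]) << i for i in range(n)) + (carry[n] << n)
    return total, levels, ops


print(ks_add(181, 107, 8))  # (sum, levels, operators) for n = 8
```

For n = 8 the model uses 3 levels and 17 operators, which agrees with log2(n) and (n*log2(n)) - n + 1.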


Brent-Kung (BK)
  • Brent-Kung Parallel-Prefix architecture
    • Delay: (2*log2(n)) - 2 levels of "o"-operators
    • Area: (2*n) - 2 - log2(n) "o"-operators
  • Figure: 8-bit Brent-Kung prefix network with inputs GP0 to GP7 and carry outputs C1 to C8

Brent and Kung, "A regular layout for parallel adders", IEEE Transactions on Computers, 1982


SubAdder3 & SubAdder4 (Carry-Select)
  • Large area overhead
  • Used as a special case, since Cin arrives late
  • Speed depends on the architecture of the two internal adders
    • These adders need not be KS (rather, we use BK)
    • The inputs of SubAdder3 and SubAdder4 arrive earlier than those of SubAdder2
  • Figure: operands x and y feed both Adder0 (carry-in tied to 1'b0) and Adder1 (carry-in tied to 1'b1); a Mux controlled by cin selects between their results z0 and z1 to produce z
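
The carry-select structure can be sketched functionally as follows. The sketch models only the behaviour (two precomputed results and a late selection), not the Brent-Kung internals or any timing; the function name `carry_select_add` and the `width` parameter are hypothetical.

```python
def carry_select_add(x, y, cin, width):
    """Return (sum, carry_out) of a width-bit carry-select adder."""
    mask = (1 << width) - 1
    z0 = (x & mask) + (y & mask)      # Adder0: carry-in tied to 0
    z1 = (x & mask) + (y & mask) + 1  # Adder1: carry-in tied to 1
    z = z1 if cin else z0             # Mux: the late-arriving cin only drives the select
    return z & mask, z >> width


print(carry_select_add(0b0111, 0b1000, 1, 4))  # 7 + 8 + 1 in 4 bits
```

Because both candidate results are ready before cin arrives, the delay after cin is that of the Mux alone, which is why this structure suits a late carry-in at the cost of duplicating the adder.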


Determination of width of SubAdder1
  • Width of the Ripple adder (SubAdder1)
    • At every bit i, compute T(C_(i+1)) and check whether
      • T(C_(i+1)) ≤ T(a_(i+1))
      • T(C_(i+1)) ≤ T(b_(i+1))
    • If the check passes, i = i+1
    • Else continue checking until 3 consecutive bits fail the check (Hill Climbing)
    • Return the value i as the Ripple adder width
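
One possible reading of this procedure is sketched below. Several details are assumptions rather than statements from the slides: the carry into bit 0 is taken to be available at time 0, every full adder is given the same carry delay `carry_delay`, the carry time is modelled as T(C_(i+1)) = max(T(a_i), T(b_i), T(C_i)) + carry_delay, and the returned width is the number of bits up to the last bit that passed the check. The function name `ripple_width` is hypothetical.

```python
def ripple_width(t_a, t_b, carry_delay, max_fails=3):
    """Estimate w1 from per-bit arrival times (LSB first) of the two operands."""
    n = len(t_a)
    t_carry = 0.0  # assumed: carry-in of bit 0 is available at time 0
    width = 0
    fails = 0
    for i in range(n - 1):
        # T(C_(i+1)): the carry out of bit i under the assumed delay model
        t_carry = max(t_a[i], t_b[i], t_carry) + carry_delay
        if t_carry <= t_a[i + 1] and t_carry <= t_b[i + 1]:
            width = i + 1  # the carry is not late for bit i+1, so keep rippling
            fails = 0
        else:
            fails += 1     # tolerate short bumps in the arrival profile (hill climbing)
            if fails == max_fails:
                break
    return width


arrivals = [0, 2, 2.5, 10, 12, 12, 12, 12]
print(ripple_width(arrivals, arrivals, carry_delay=1))
```

In the example, bit 1 fails the check once, but the scan continues and later bits pass again, so the ripple section extends beyond the first failure.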

Determination of width of SubAdder2
  • Width of the Kogge-Stone adder (SubAdder2)
    • The latest-arriving signals are part of this adder
    • Hence keep this adder wide, while ensuring that this does not result in very narrow Carry-Select adders for SubAdder3 and SubAdder4
    • We determine the width with the following equation:
      • w2 = n - w1 if (n - w1) ≤ 8
      • w2 = 2^p, where p = floor(log2(n - w1)), if (n - w1) > 8
    • Example: If n=32 and w1=7 then w2=16
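
The width rule can be written as a short function. It assumes that w2 is the largest power of two not exceeding n - w1 when n - w1 > 8, that is w2 = 2^p with p = floor(log2(n - w1)), which is the reading that reproduces the slide's example. The function name `kogge_stone_width` is hypothetical.

```python
def kogge_stone_width(n, w1):
    """Width w2 of SubAdder2 for an n-bit hybrid adder whose ripple part is w1 bits."""
    remaining = n - w1
    if remaining <= 8:
        return remaining  # small remainder: Kogge-Stone takes all of it
    # largest power of two <= remaining, i.e. 2 ** floor(log2(remaining))
    return 1 << (remaining.bit_length() - 1)


print(kogge_stone_width(32, 7))  # the slide's example: n = 32, w1 = 7
```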

Delay of the Hybrid Adder
  • T_hybrid = max(T(C4), T(S4), T(S3), T(S2))
  • Figure: the four-stage hybrid adder (SubAdder1 Ripple-Carry, SubAdder2 Kogge-Stone, SubAdder3 and SubAdder4 Carry-Select, with widths w1 to w4), annotated with the times T(S2), T(S3) and T(S4) of the sum outputs of SubAdder2 to SubAdder4 and the time T(C4) of the carry-out of SubAdder4


Determination of widths of SubAdder3 and SubAdder4
  • Width of the two Carry-Select adders
    • Initial width configuration
      • w3 = (n-w1-w2)/2
      • w4 = (n-w1-w2-w3)
    • With this initial configuration, estimate the delay of the overall hybrid adder (based on the previous slide)
    • Use an iterative approach to explore in the appropriate direction (similar to Binary Search) and converge on the smallest-delay configuration
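
The slides do not spell out the iteration, so the following is only one plausible interpretation: start from the even split, try moving the boundary between SubAdder3 and SubAdder4 by a step in either direction, keep any move that lowers the estimated delay, and halve the step when neither direction helps. The delay estimator is passed in as a function; `split_carry_select` and `toy_delay` are hypothetical, and the toy model exists only to exercise the search, not to represent real gate delays.

```python
def split_carry_select(remaining, delay):
    """Choose (w3, w4) with w3 + w4 == remaining by a step-halving local search.

    delay(w3, w4) must return the estimated hybrid-adder delay for that split.
    """
    if remaining < 2:
        raise ValueError("need at least 2 bits to form two Carry-Select adders")
    w3 = remaining // 2                      # initial configuration from the slide
    best = delay(w3, remaining - w3)
    step = max(1, remaining // 4)
    while step >= 1:
        moved = False
        for cand in (w3 - step, w3 + step):  # explore both directions
            if 1 <= cand <= remaining - 1:
                d = delay(cand, remaining - cand)
                if d < best:
                    w3, best, moved = cand, d, True
                    break
        if not moved:
            step //= 2                       # narrow the search, as in binary search
    return w3, remaining - w3, best


def toy_delay(w3, w4):
    """Illustrative delay model only: each adder's delay grows linearly with its width."""
    return max(1.0 + 0.25 * w3, 1.0 + 0.5 * w4)


print(split_carry_select(9, toy_delay))
```

With a delay estimate that has a single minimum over the split point, this search ends at that minimum; for irregular estimates it may stop at a local minimum, and an exhaustive scan over w3 is a simple fallback because the range is small.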

Experimental Setup
  • To test our approach, we used:
    • Adders in several different types of SOP blocks (Multipliers, MAC, generalized SOP and Squarer)
    • Two process technologies (0.13 µm and 0.09 µm)
    • Two commercial library vendors
    • Two different arrival-time constraints
  • We compared the results of our hybrid adder with the adder produced by a commercial datapath synthesis tool.

Results

On average, 14.31% faster than the result of the commercial synthesis tool (with a 6.62% area penalty)

Summary
  • Hybrid adder consists of 4 SubAdders
  • SubAdder1 has Ripple-Carry architecture
  • SubAdder2 has Kogge-Stone architecture
  • SubAdder3 and SubAdder4 have Carry-Select (based on Brent-Kung) architecture
  • Widths of all SubAdders are computed based on a timing-driven analysis
  • On average, 14.31% faster (with a 6.62% area penalty)