CSE246 Adder – Part II

Learn about Zimmerman's Heuristic Approach for generating a parallel prefix adder of minimum size with depth constraint. Explore the advantages, disadvantages, and dynamic programming involved in constructing the fastest prefix adder under arbitrary input arrival time profiles.

olivec

Presentation Transcript


  1. CSE246 Adder – Part II • Instructor: Prof. Chung-Kuan Cheng

  2. Zimmerman’s Heuristic Approach • Problem formulation: given a depth constraint, generate a parallel prefix adder of minimum size • Two-step heuristic, starting from a serial prefix adder • Step 1: compress to a fastest prefix structure at the cost of increasing size (LSB to MSB, low level to high level) • Step 2: expand to reduce size, subject to the depth constraint (MSB to LSB, high level to low level)

  3. Zimmerman’s Heuristic Approach • Local compression/expansion operation • Up/down shift

  4. Zimmerman’s Heuristic Approach • Advantages • Simple and fast • Produces depth-size optimal results in many cases • Handles non-uniform input arrival times • Disadvantage • No guarantee of optimality

  5. Prefix Adder with Arbitrary Input Arrival Time Profile • Non-uniform input arrival times are represented as real numbers • How to construct the fastest prefix adder under an arbitrary input arrival time profile?

  6. Cont’d • Timing model • All (G,P) generators have the same delay C • Denote the output timing of generator (G,P)[i:j] as t[i:j] • Suppose that, in the prefix graph, (G,P)[i:j] is generated from (G,P)[i:k] and (G,P)[k-1:j]; then t[i:j] = max{t[i:k], t[k-1:j]} + C
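The timing model can be tried out on the simplest structure, the serial prefix chain, where each (G,P)[i:0] is built from (G,P)[i:i] and (G,P)[i-1:0]. The sketch below is only an illustration: the function name, the arrival times, and the delay C = 2 are assumptions, not values given on this slide.

```python
def serial_prefix_timing(arrival, delay):
    """Output times t[i:0] of a serial (ripple) prefix chain.

    arrival[i] is the arrival time of bit i's (g, p) pair (index 0 = LSB).
    delay is the generator delay C from the timing model.
    """
    times = [arrival[0]]  # t[0:0] is an input, no generator needed
    for i in range(1, len(arrival)):
        # (G,P)[i:0] is generated from (G,P)[i:i] and (G,P)[i-1:0]
        times.append(max(arrival[i], times[-1]) + delay)
    return times


# Illustrative (assumed) arrival times for bits 0..4 and C = 2.
print(serial_prefix_timing([0, 1, 3, 4, 2], delay=2))  # [0, 3, 5, 7, 9]
```

Each stage waits for the previous prefix result, so the last output time grows with the chain length unless the inputs themselves arrive late.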

  7. Dynamic Programming – The idea • Imagine a full array of partial prefix results • All (G,P) signals of length i are on level i • Rightmost signals are the wanted prefix results • Generate all the (G,P) signals row by row, from lower level to higher level • For each (G,P) signal, find the scheme that leads to the best timing, i.e., with (G,P)[i:j] = (G,P)[i:k] ∘ (G,P)[k-1:j], find the partition point k such that t[i:j] = min over k of {max{t[i:k], t[k-1:j]} + C} • [Figure: triangular array of timings. Level 1: t[n:n], t[n-1:n-1], …, t[2:2], t[1:1]; Level 2: t[n:n-1], …, t[2:1]; Level 3: t[n:n-2], …, t[3:1]; …; Level n-1: t[n:2], t[n-1:1]; Level n: t[n:1]]

  8. Dynamic Programming • A 5-bit example • Level 1 (inputs): 2 (g4p4), 4 (g3p3), 3 (g2p2), 1 (g1p1), 0 (G0) • Level 2: 6, 6, 5, 3 (GP[1,0]) • Level 3: 8, 7, 5 (GP[2,0]) • Level 4: 8, 7 (GP[3,0]) • Level 5: 8 (GP[4,0])
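The row-by-row dynamic program can be sketched in a few lines of Python. This is a minimal sketch, not the lecture's own code: the function name and data layout are assumptions, and the generator delay C = 2 is an inferred value that happens to reproduce the timings of the 5-bit example (the slide does not state C).

```python
def fastest_prefix_timing(arrival, delay):
    """Dynamic program over all partial prefix signals.

    arrival[i] is the arrival time of bit i (index 0 = LSB).
    Returns (t, split): t[(i, j)] is the best output time of (G,P)[i:j],
    split[(i, j)] is one partition point k that achieves it.
    """
    n = len(arrival)
    t = {(i, i): arrival[i] for i in range(n)}  # level 1: the inputs
    split = {}
    for length in range(2, n + 1):  # level = signal length
        for j in range(0, n - length + 1):
            i = j + length - 1
            # Try every partition (G,P)[i:k] with (G,P)[k-1:j], j < k <= i.
            best_k = min(range(j + 1, i + 1),
                         key=lambda k: max(t[(i, k)], t[(k - 1, j)]))
            t[(i, j)] = max(t[(i, best_k)], t[(best_k - 1, j)]) + delay
            split[(i, j)] = best_k
    return t, split


# Arrival times of the 5-bit example, listed LSB first; C = 2 is assumed.
t, split = fastest_prefix_timing([0, 1, 3, 4, 2], delay=2)
print([t[(i, 0)] for i in range(5)])  # [0, 3, 5, 7, 8]
```

The recorded `split` values are what a later pass would walk backwards to decide which partial results are actually needed.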

  9. Dynamic Programming • Complexity • For (G,P)[i:j], search (i-j) combinations • Overall O(n^3) • Hints for reducing complexity • For (G,P)[i:j], there might be more than one optimal partition point, but we want just one • At least one optimal partition point of (G,P)[i:j] is bounded by the optimal partition points of (G,P)[i-1:j] and (G,P)[i:j+1]

  10. Backward Reduction I • Some of the partial prefix results are not used, hence can be removed • [Figure: panels (a) and (b), each showing Levels 1 to 5 of the prefix array]

  11. Backward Reduction II • Some nodes may be over-tightened, and can be relaxed to reduce area • [Figure: 4-bit example with input arrival times 3 (g4p4), 6 (g3p3), 7 (g2p2), 11 (g1p1) and internal nodes (G,P)[2,1], (G,P)[3,1], (G,P)[4,2], (G,P)[4,1], drawn twice (presumably before and after relaxation); node timing values, some in parentheses, range from 8 to 13]

  12. A missing detail • (G,P) signals are allowed to overlap → the search space increases • However, allowing overlap does not produce better timing • Overlapping form: (G,P)[i:j] = (G,P)[i:k] ∘ (G,P)[l:j], with l ≥ k

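  13. Function level optimization • Carry Skip Adder • If p3,0 = p3p2p1p0 = 1, then x = cin • [Figure: 12-bit carry-skip adder built from three 4-bit blocks A2, A1, A0 with inputs a11,8/b11,8, a7,4/b7,4, a3,0/b3,0; the block propagate signals p11,8, p7,4, p3,0 control 2:1 MUXes that produce the carries c12, c8, c4 from cin; x is the output of the first MUX]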

  14. False Path • A1 <- MUX <- A0 <- cin is a false path • If the carry comes from cin, then the block must have p3p2p1p0 = 1 • Since p3,0 = 1, g3,0 must be 0 • The carry is not generated from A0 • The carry need not propagate via A0; it goes through the MUX
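The false-path argument rests on two facts that can be checked exhaustively for a 4-bit block: whenever the block propagate is 1 the block generate is 0, and a MUX that picks cin when the block propagates (and the block's own generate otherwise) always yields the true carry-out. The sketch below is a behavioural model only; the function names are assumptions and it does not model gate delays.

```python
from itertools import product


def block_signals(a, b, width=4):
    """Return (G, P): group generate and group propagate of one block."""
    G, P = 0, 1
    for i in range(width):  # LSB to MSB
        ai, bi = (a >> i) & 1, (b >> i) & 1
        g, p = ai & bi, ai ^ bi
        G = g | (p & G)  # a carry is generated here or passed up from below
        P = p & P        # every bit so far must propagate
    return G, P


def skip_mux_carry(a, b, cin, width=4):
    """Carry out of a carry-skip block: cin bypasses the block when P = 1."""
    G, P = block_signals(a, b, width)
    return cin if P else G


for a, b, cin in product(range(16), range(16), (0, 1)):
    G, P = block_signals(a, b)
    assert not (P and G)  # P = 1 forces G = 0
    assert skip_mux_carry(a, b, cin) == (a + b + cin) >> 4
print("carry-skip block checks passed")
```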

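  15. False Path: Cycles • Cycles of false paths, e.g., 1’s complement number addition • Positive: x; negative: (2^n - 1) - x • Addition: ((2^n - 1) - x) + ((2^n - 1) - y) = 2^n + (2^n - 1) - (x + y) - 1 • [Figure: 4-bit adder with inputs A3,0 and B3,0, sum output S3,0, and Cout fed back to Cin]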

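  16. Example • 0 + 0 = 0: 11111 (0) + 11111 (0) = 1 11110; adding the carry-out back in gives 11110 + 1 = 11111 (0) • -3 - 5 = -8: 11100 (-3) + 11010 (-5) = 1 10110; adding the carry-out back in gives 10110 + 1 = 10111 (-8)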
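Both examples can be replayed with a short behavioural model of 1's complement addition, in which the carry-out is added back into the sum (the end-around carry). The helper names `encode`, `decode`, and `ones_complement_add` are assumptions made for this sketch.

```python
def encode(x, n):
    """n-bit 1's complement code: x for x >= 0, (2^n - 1) - |x| for x < 0."""
    mask = (1 << n) - 1
    return x if x >= 0 else mask + x


def decode(v, n):
    """Inverse of encode; the all-ones pattern decodes to 0."""
    mask = (1 << n) - 1
    return v if (v >> (n - 1)) == 0 else -(mask - v)


def ones_complement_add(a, b, n):
    """Add two n-bit codes and feed the carry-out back into the sum."""
    mask = (1 << n) - 1
    total = a + b
    carry_out = total >> n
    return ((total & mask) + carry_out) & mask


s = ones_complement_add(encode(-3, 5), encode(-5, 5), 5)
print(format(s, "05b"), decode(s, 5))  # 10111 -8

z = ones_complement_add(0b11111, 0b11111, 5)
print(format(z, "05b"), decode(z, 5))  # 11111 0
```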

  17. Multi-Operand Addition • Carry save adder: a (3,2) counter

  18. Example • A (3,2) counter compresses X rows to (2/3)X rows each time • Tree structure in implementation
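A word-level model makes the row compression concrete: each (3,2) step turns three operand rows into a sum row and a shifted carry row without propagating carries, and the steps repeat until two rows remain. This is a sketch under assumed names (`carry_save_add`, `reduce_rows`) and assumed operand values; it groups rows in list order and does not model the actual tree wiring.

```python
def carry_save_add(x, y, z):
    """One layer of (3,2) counters applied bitwise to three rows."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1  # carries move up one bit
    return sum_row, carry_row


def reduce_rows(rows):
    """Compress operand rows with (3,2) counters until at most two remain."""
    rows = list(rows)
    while len(rows) > 2:
        next_rows = []
        for i in range(0, len(rows) - 2, 3):
            next_rows.extend(carry_save_add(*rows[i:i + 3]))
        # Rows that did not fill a group of three pass through unchanged.
        next_rows.extend(rows[len(rows) - len(rows) % 3:])
        rows = next_rows
    return rows


operands = [13, 7, 9, 4, 11, 6, 2]  # illustrative values
final_rows = reduce_rows(operands)
print(len(final_rows), sum(final_rows) == sum(operands))  # 2 True
```

Only the final two rows need a carry-propagate adder, such as the prefix adders discussed earlier.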

  19. Other Counters • (7,3) counter • (5,3) counter [figure labels: S2, S1, S0; Ca, Cb, S0] • Design of (5,3) counter using full adders [figure outputs: Ca, Cb, S0]
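The slide's figure is not recoverable from the transcript, so the following is only one plausible arrangement, assumed for illustration: two chained full adders producing a weight-1 output S0 and two weight-2 carries Ca and Cb, matching the output labels listed above. The function names are hypothetical.

```python
from itertools import product


def full_adder(x, y, z):
    """A (3,2) counter: returns (sum, carry)."""
    return x ^ y ^ z, (x & y) | (x & z) | (y & z)


def counter_5_3(x1, x2, x3, x4, x5):
    """Assumed (5,3) arrangement: two full adders in series.

    Returns (Ca, Cb, S0) with Ca and Cb of weight 2 and S0 of weight 1.
    """
    s, ca = full_adder(x1, x2, x3)
    s0, cb = full_adder(s, x4, x5)
    return ca, cb, s0


# Exhaustive check: the weighted outputs equal the number of 1 inputs.
for bits in product((0, 1), repeat=5):
    ca, cb, s0 = counter_5_3(*bits)
    assert s0 + 2 * (ca + cb) == sum(bits)
print("(5,3) counter check passed")
```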
