
An Extra-Regular, Compact, Low-Power Multiplier Design Using Triple-Expansion Schemes and Borrow Parallel Counter Circuits

Rong Lin (SUNY at Geneseo) and Ronald B. Alonzo (University of Rochester). ISCA-WCED, San Diego, CA, June 2003.


Presentation Transcript


  1. An Extra-Regular, Compact, Low-Power Multiplier Design Using Triple-Expansion Schemes and Borrow Parallel Counter Circuits. Rong Lin (SUNY at Geneseo), Ronald B. Alonzo (University of Rochester). ISCA-WCED, San Diego, CA, June 2003

  2. The focus of the presentation: a complexity-reduced multiplier design approach offering superior layout compactness, small area, low power, and high performance, with potential for self-testability.
     Contents
     • Background
     • Overview of the building block circuits
     • Overview of the intermediate block circuits
     • Overview of the triple expanded multiplier architecture
     • Experimental work
     • Concluding remarks

  3. 1. Background

  4. Traditional approach
     Stage 1: generation of a large partial-product bit matrix, usually with Booth recoding.
     Stage 2: reduction of the partial-product matrix to two numbers, usually with binary CSA adders based on (3,2) and (4,2) counters.
     Stage 3: final addition with a standard fast adder.
     Recently proposed designs:
     • Rectangular-styled Wallace tree [Ref. 2] (Itoh et al., 2001): two groups of partial-product bits
     • Limited switch dynamic logic [Ref. 1] (Montoye et al., 2003): merging precharged dynamic logic into the input of every latch
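The three traditional stages can be modeled at the integer level. The sketch below is a simplified, hypothetical model, not taken from the slides: Stage 1 builds plain AND-array rows without Booth recoding, Stage 2 uses only (3,2) counters applied word-wide in Wallace-tree fashion (no (4,2) counters), and Stage 3 is ordinary integer addition standing in for the fast adder. The names `csa`, `partial_products`, `wallace_reduce`, and `multiply` are illustrative.

```python
import random

def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """(3,2) counters applied at every bit position at once (a carry-save adder).
    Invariant: x + y + z == sum + carry."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1
    return s, c

def partial_products(a: int, b: int, n: int) -> list[int]:
    """Stage 1 (assumption: non-Booth): one shifted copy of a per set bit of b."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]

def wallace_reduce(rows: list[int]) -> tuple[list[int], int]:
    """Stage 2: reduce the rows to at most two numbers and count the CSA levels."""
    rows, levels = list(rows), 0
    while len(rows) > 2:
        full = len(rows) // 3 * 3
        nxt: list[int] = []
        for k in range(0, full, 3):
            nxt.extend(csa(rows[k], rows[k + 1], rows[k + 2]))
        nxt.extend(rows[full:])  # 0-2 leftover rows pass straight through
        rows, levels = nxt, levels + 1
    return rows, levels

def multiply(a: int, b: int, n: int) -> tuple[int, int]:
    rows, levels = wallace_reduce(partial_products(a, b, n))
    return sum(rows), levels  # Stage 3: final carry-propagate addition

if __name__ == "__main__":
    rng = random.Random(0)
    a, b = rng.getrandbits(54), rng.getrandbits(54)
    product, levels = multiply(a, b, 54)
    print(product == a * b, levels)  # True 9
```

With 54 unrecoded rows the (3,2) tree needs 9 levels; radix-4 Booth recoding would roughly halve the number of rows entering Stage 2.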

  5. Our approach
     Stage 1: generation of many small partial-product bit matrices in parallel (81 for a 54 x 54-b multiplier), without Booth recoding.
     Stage 2: reduction of the partial-product matrices to two numbers with non-binary, 4-b 1-hot encoded counters (called borrow parallel counters), which are larger than the binary (3,2) and (4,2) counters.
     Stage 3: final addition with a standard fast adder.
     Complexity is reduced significantly: simple CMOS technology, smaller area, minimal custom design, repeatable and modular structure, self-testability, and low power.

  6. 2. The Circuits Of Building Blocks

  7. The building block circuits: borrow parallel counters. Figure: the 5_1 borrow parallel counter

  8. About the large parallel counter 5_1: it receives 5 binary input bits, one of which is weighted 2 (called the borrow bit) while the others are weighted 1. It produces 2 output bits plus 3 in-stage carry-in and carry-out bits, such that the weighted sum of all incoming bits equals the weighted sum of all outgoing bits. It is a CMOS pass-transistor circuit that processes 4-b 1-hot encoded signals, each representing an integer value from 0 to 3.
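The 4-b 1-hot encoding and the weighted-sum invariant can be illustrated with a small model. This is a hypothetical sketch of the encoding, not the authors' 5_1 circuit: it assumes a value 0 to 3 is carried on four lines with exactly one line high, that adding a small integer moves the hot line cyclically (a pure re-routing of the four lines), and that wrapping past 3 emits a carry of weight 4. The names `encode`, `decode`, `add`, and `weighted_counter` are illustrative.

```python
from itertools import product

# Hypothetical model: four lines, exactly one high, encoding an integer 0..3.
# It illustrates the encoding only, not the 5_1 pass-transistor circuit.
OneHot = tuple[int, int, int, int]

def encode(v: int) -> OneHot:
    if not 0 <= v <= 3:
        raise ValueError("a 4-b 1-hot value must be in 0..3")
    return tuple(int(k == v) for k in range(4))

def decode(h: OneHot) -> int:
    if sum(h) != 1:
        raise ValueError("not a valid 1-hot signal")
    return h.index(1)

def add(h: OneHot, x: int) -> tuple[OneHot, int]:
    """Add x (0..3): the hot line moves x places cyclically;
    wrapping past 3 emits a carry of weight 4."""
    if not 0 <= x <= 3:
        raise ValueError("x must be in 0..3")
    carry = int(decode(h) + x > 3)
    rotated = h[-x:] + h[:-x] if x else h
    return rotated, carry

def weighted_counter(ones: tuple[int, ...], borrow: int) -> tuple[OneHot, int]:
    """Hypothetical counter: weight-1 input bits plus one weight-2 borrow bit.
    Returns a 1-hot residue (0..3) and the number of weight-4 carries."""
    h, carries = encode(0), 0
    for bit in ones:
        h, c = add(h, bit)
        carries += c
    h, c = add(h, 2 * borrow)
    return h, carries + c

if __name__ == "__main__":
    print(weighted_counter((1, 1, 1, 1), 1))  # ((0, 0, 1, 0), 1): 6 = 2 + 4*1
    # Weighted sum of the inputs equals the weighted sum of the outputs.
    for bits in product((0, 1), repeat=4):
        for b in (0, 1):
            h, carries = weighted_counter(bits, b)
            assert decode(h) + 4 * carries == sum(bits) + 2 * b
```

Note that at most one of the four lines is high at any time, which is the sense in which a 1-hot data path has few "hot lines".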

  9. (1) Low switching activity; (2) fewer hot lines (data paths); (3) low transistor count: 78 transistors, equivalent to about 3.3 full adders (FAs)

  10. (4) A very compact layout, due to good transistor distribution and 4 identical paths processed in parallel (binary logic does not have these advantages)

  11. The borrow bit (in red): (1) simplifies the logic and reduces the number of transistors; (2) reduces the number of cascaded pass transistors (no more than 4, including 1 within the input inverter); (3) allows the input bits of small multipliers to be rearranged and balanced
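Because 2^(j+1) = 2 * 2^j, a bit from column j+1 can enter a column-j counter as a weight-2 input without changing the value of the bit matrix. The sketch below illustrates only this arithmetic identity, under the assumption that this is what the weight-2 borrow bit represents; the helpers `pp_matrix`, `value`, and `borrow` are hypothetical, and the actual bit assignment in the authors' multipliers is not given in this transcript.

```python
# Hypothetical column-wise bit matrix: column j holds (bit, weight) pairs,
# each contributing bit * weight * 2**j to the total value.
Column = list[tuple[int, int]]

def pp_matrix(a: int, b: int, n: int) -> list[Column]:
    """Non-Booth partial-product bit matrix of two n-bit numbers, by column."""
    cols: list[Column] = [[] for _ in range(2 * n - 1)]
    for i in range(n):
        for k in range(n):
            cols[i + k].append((((a >> i) & (b >> k)) & 1, 1))
    return cols

def value(cols: list[Column]) -> int:
    return sum((bit * w) << j for j, col in enumerate(cols) for bit, w in col)

def borrow(cols: list[Column], j: int) -> list[Column]:
    """Move one weight-1 bit from column j+1 into column j as a weight-2 bit."""
    out = [list(col) for col in cols]
    bit, w = out[j + 1].pop()
    if w != 1:
        raise ValueError("this sketch only borrows weight-1 bits")
    out[j].append((bit, 2))
    return out

if __name__ == "__main__":
    cols = pp_matrix(5, 7, 3)
    moved = borrow(cols, 1)
    print([len(c) for c in cols], [len(c) for c in moved])  # [1, 2, 3, 2, 1] [1, 3, 2, 2, 1]
    print(value(cols), value(moved))  # 35 35
```

Column heights change while the represented value does not, which is what makes such rearrangement available to the designer.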

  12. The embedded full adder adds two 4-b 1-hot encoded bits (s0 at column j+2, s1 at column j+1) and 1 binary bit (q at column j) directly: they have the same weight, so no type conversion is needed


  14. 3. The Circuits Of Intermediate Blocks

  15. The 6 x 6-b borrow parallel multiplier
     Ovals of the same color form an embedded FA (or an HA, or a single binary bit): a (3,2) counter is 3 ovals, a (2,2) counter is 2 ovals, and a single bit is 1 oval.
     Input: two 6-b numbers. Output: two numbers, p10–p0 and q10–q5 (CSA-style output, because the block serves as an intermediate block).
     • An array of borrow parallel counters (virtually eliminating all area needed for inter-counter connections)
     • The height of the block is very small (important for triple expansion)
     • Inherits all advantages of borrow parallel counters: delay = a single counter delay; height = a single counter height
     • Extra compact: virtually no inter-counter connections

  16. Comparison of inter-block connections of 6 x 6 multipliers (figure panels: borrow parallel approach; traditional approach): 30% area reduction!

  17. 4. The Triple Expanded Multipliers

  18. The partial product bit matrix trisect-decomposition and first-level multiplier triple expansion (figure: triple 6 x 6-b => 18 x 18-b multiplier)
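The trisect decomposition rests on the standard identity below. It is written here for the first level, with slice names A_i and B_j chosen for this illustration: each 18-bit operand is split into three 6-bit slices, and each of the nine terms is the output of one 6 x 6-b multiplier, shifted by 6(i+j) columns.

```latex
\begin{aligned}
A &= A_2\,2^{12} + A_1\,2^{6} + A_0, \qquad B = B_2\,2^{12} + B_1\,2^{6} + B_0, \qquad 0 \le A_i, B_j < 2^{6},\\
A\,B &= \sum_{i=0}^{2}\sum_{j=0}^{2} A_i B_j\, 2^{6(i+j)}.
\end{aligned}
```

Applying the same identity to 54-bit operands with 18-bit slices (shifts of 18(i+j)) gives 9 x 9 = 81 base 6 x 6-b products, the count quoted for the 54 x 54-b design.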

  19. Second-level multiplier triple expansion (figure: triple 18 x 18-b => 54 x 54-b multiplier)
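The two expansion levels can be checked with a recursive software model. This is a sketch under stated assumptions: `triple_multiply` and `BASE` are hypothetical names, the base case uses Python's `*` as a stand-in for the 6 x 6-b hardware block, and the shifted sub-products are simply added, whereas in hardware the blocks work in parallel and their carry-save outputs go to the reduction stage.

```python
import random

BASE = 6  # operand width of the base block (the 6 x 6-b multiplier)

def triple_multiply(a: int, b: int, width: int) -> tuple[int, int]:
    """Return (a * b, number of base-block products used) by recursive trisection."""
    if width == BASE:
        return a * b, 1  # stand-in for one 6 x 6-b hardware block
    if width < BASE or width % 3:
        raise ValueError("width must be BASE * 3**m")
    k = width // 3
    mask = (1 << k) - 1
    a_sl = [(a >> (k * i)) & mask for i in range(3)]
    b_sl = [(b >> (k * j)) & mask for j in range(3)]
    total, blocks = 0, 0
    for i in range(3):
        for j in range(3):
            p, n = triple_multiply(a_sl[i], b_sl[j], k)
            total += p << (k * (i + j))  # slice product lands at offset k*(i+j)
            blocks += n
    return total, blocks

if __name__ == "__main__":
    rng = random.Random(2003)
    a, b = rng.getrandbits(54), rng.getrandbits(54)
    p, blocks = triple_multiply(a, b, 54)
    print(p == a * b, blocks)  # True 81
```

The recursion 54 => 18 => 6 uses 9 blocks per level, hence 9 x 9 = 81 base multipliers.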

  20. The typical simulation data

  21. The summary of multipliers

  22. 5. The Experimental Work: Layout And Tests

  23. The 5_1 borrow parallel counter (with output buffers)

  24. The 6 x 6 multiplier: wiring at this level is very simple (Manhattan cell structure)

  25. The 4 x 4 multiplier with (4,2), (3,2), and (2,2) counters: wiring is very irregular

  26. 6. Concluding Remarks

  27. Concluding remarks: a complexity-reduced multiplier design with new arithmetic circuits and schemes, achieving low power and high performance through a novel logic approach that includes:
     (1) 4-b 1-hot data paths dominate (lower switching activity in each logic stage)
     (2) fewer hot lines generated during logic processing (lower power and leakage power)
     (3) lower transistor count
     (4) higher circuit regularity and lower layout complexity
     (5) lower complexity of component interconnection

  28. Concluding remarks (cont’d)
     (6) Utilizing borrow bits for simple circuits and high speed; more importantly, borrow bits reduce pass-transistor path length (no more than 4) and allow the input bits to each column of the small multipliers to be rearranged and balanced.
     (7) Utilizing partial-product bit matrix decomposition for component repetition and full self-testability, achieving high observability and controllability of the component circuits (the small multipliers are exhaustively testable).
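A 6 x 6-b block has only 2^12 = 4096 input combinations, so every combination can be applied in a test. The sketch below is hypothetical: `block6x6` is a simple carry-save array model standing in for the borrow parallel 6 x 6 multiplier (it does not reproduce that circuit), and `exhaustive_test` checks that the block's two output numbers always sum to the product.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Bitwise full adders (carry-save): x + y + z == sum + carry."""
    return x ^ y ^ z, ((x & y) | (x & z) | (y & z)) << 1

def block6x6(a: int, b: int) -> tuple[int, int]:
    """Hypothetical 6 x 6-b block model with CSA-style (two-number) output."""
    s, c = 0, 0
    for i in range(6):
        row = (a << i) if (b >> i) & 1 else 0
        s, c = csa(s, c, row)
    return s, c

def exhaustive_test(block, width: int = 6) -> int:
    """Apply every operand pair; raise on the first mismatch, else return the vector count."""
    vectors = 0
    for a in range(1 << width):
        for b in range(1 << width):
            s, c = block(a, b)
            if s + c != a * b:
                raise AssertionError(f"mismatch at a={a}, b={b}")
            vectors += 1
    return vectors

if __name__ == "__main__":
    print(exhaustive_test(block6x6))  # 4096
```

The same harness is infeasible for a monolithic 54 x 54-b multiplier (2^108 vectors), which is why decomposition into small, exhaustively testable blocks matters for self-testability.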
