
Generation of Optimal Bit-Width Topology of Fast Hybrid Adder in a Parallel Multiplier

Sabyasachi Das (Synplicity Inc.) and Sunil P. Khatri (Texas A&M University, sunilkhatri@tamu.edu). Presented by David Pan, UT Austin.


Presentation Transcript


  1. Generation of Optimal Bit-Width Topology of Fast Hybrid Adder in a Parallel Multiplier
     Sabyasachi Das, Synplicity Inc.
     Sunil P. Khatri, Texas A&M University (sunilkhatri@tamu.edu)
     Presented by David Pan, UT Austin

  2. What is a Multiplier?
     • IC block that performs the multiplication operation
     • Well-known logic architectures
     • Computationally intensive
     • Wide usage in DSP, graphics, and microprocessors

  3. Structure of Multiplier
     • A multiplier block consists of 3 parts (listed in data-flow order):
       • Partial Product Generator (PPGen)
       • Partial Product Reduction Tree (PPRT)
       • Final Carry-Propagation Adder (CPA)
     [Figure: Inputs → Partial Product Generator (PPGen) → Partial Product Reduction Tree (PPRT) → Final Carry-Propagation Adder (CPA) → Output]
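The three-part data flow on this slide can be illustrated with a small bit-level model. This is only a sketch under stated assumptions: it uses a plain AND-array partial product generator and layers of 3:2 compressors for the reduction tree, neither of which is specified in the slides, and the function names are hypothetical.

```python
def partial_products(a: int, b: int, n: int) -> list[int]:
    """PPGen: one shifted copy of `a` per set bit of the n-bit operand `b`."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """One layer of 3:2 compressors: three rows in, a sum row and a carry row out."""
    sum_row = x ^ y ^ z
    carry_row = ((x & y) | (x & z) | (y & z)) << 1
    return sum_row, carry_row


def reduce_rows(rows: list[int]) -> tuple[int, int]:
    """PPRT: compress rows three at a time until only two rows remain."""
    rows = list(rows)
    while len(rows) > 2:
        groups = len(rows) // 3
        reduced = []
        for g in range(groups):
            reduced.extend(carry_save_add(*rows[3 * g:3 * g + 3]))
        reduced.extend(rows[3 * groups:])  # rows that did not fill a group
        rows = reduced
    while len(rows) < 2:
        rows.append(0)
    return rows[0], rows[1]


def multiply(a: int, b: int, n: int) -> int:
    """PPGen -> PPRT -> CPA for unsigned n-bit operands."""
    row0, row1 = reduce_rows(partial_products(a, b, n))
    return row0 + row1  # the final carry-propagation add


print(multiply(13, 11, 4))
```

The final `row0 + row1` is the step the rest of the deck is about: in hardware its two input rows arrive at different times per bit, which is what motivates a hybrid adder.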

  4. Final Adder in a Multiplier
     • Frequently used adder architectures:
       • Ripple-Carry
         • Area-efficient, but slow
         • Timing-efficient if inputs have skewed arrival times
       • Parallel-Prefix architecture (Brent-Kung, Kogge-Stone)
         • Faster architecture
         • Requires more area
       • Carry-Select
         • Large area overhead (often >100%)
         • Better delay if the Cin signal arrives late

  5. 3-stage Hybrid Adder
     • Multipliers exhibit a typical arrival-time pattern at the inputs of the CPA
     • A hybrid adder produces the best result for multipliers
     • It outperforms all stand-alone architectures
     Reference: Stelling et al., "Design Strategies for optimal hybrid final adders in a parallel multiplier", Journal of VLSI Signal Processing, 1996

  6. 3-Stage Hybrid Adder
     [Figure: SubAdder1 (Ripple, width wrpl), SubAdder2 (Brent-Kung, width wbk), SubAdder3 (Carry-Select, width wcs)]
     • There are many possible configurations (w1, w2 and w3)
     • Exhaustive exploration is not feasible (huge runtime)
     • How to identify the best configuration?

  7. Identification of Optimal Topology
     • Width of the Ripple adder
       • At every bit $i$, compute $T(C_{i+1})$ and check whether
         $T(C_{i+1}) \le T(a_{i+1})$ or $T(C_{i+1}) \le T(b_{i+1})$
       • If the check passes, $w_{rpl} = i+1$
       • Otherwise, continue checking until 3 consecutive bits fail the check (hill climbing)
       • Return the value $i$ as the ripple adder width
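The ripple-width search on this slide can be sketched as follows. Several details are assumptions rather than part of the slides: the carry arrival time is modelled as the latest of the bit's two inputs and the incoming carry plus a fixed per-bit delay `d_carry`, the carry-in is assumed to arrive at time 0, and the slide's "return i" is read as returning the last width recorded by a passing check. The function name and parameters are hypothetical.

```python
def ripple_width(t_a, t_b, d_carry=1.0, max_fails=3):
    """Pick the ripple sub-adder width from per-bit input arrival times.

    t_a[i], t_b[i]: arrival times of operand bits a_i and b_i.
    d_carry: assumed delay of one ripple-carry stage (simplified model).
    max_fails: consecutive failing bits tolerated before the search stops.
    """
    width = 0
    fails = 0
    t_carry = 0.0  # assumed arrival time of the carry-in
    for i in range(len(t_a) - 1):
        # T(C_{i+1}): carry out of bit i under the simplified delay model.
        t_carry = max(t_a[i], t_b[i], t_carry) + d_carry
        if t_carry <= t_a[i + 1] or t_carry <= t_b[i + 1]:
            width = i + 1  # the carry is ready before the next bit's inputs
            fails = 0
        else:
            fails += 1
            if fails == max_fails:
                break  # stop after max_fails consecutive failing bits
    return width


# Arrival times that rise for the low bits and then flatten out.
arrivals = [0, 2, 4, 6, 8, 8, 8, 8, 8]
print(ripple_width(arrivals, arrivals))
```

Tolerating a few failing bits before stopping keeps the search from halting at a single local dip in the arrival profile, which is presumably why the slide describes it as hill climbing.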

  8. Delay of the Hybrid Adder
     [Figure: SubAdder1 (Ripple, width wrpl), SubAdder2 (Brent-Kung, width wbk), SubAdder3 (Carry-Select, width wcs); output paths annotated Ts2, Tco2 + Dmx and Ts3 + Dmx]
     $T_{hybrid} = \max(T_{s2},\ T_{co2} + D_{mx},\ T_{s3} + D_{mx})$
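The delay expression on this slide is a maximum over three paths, which a short helper can evaluate while also reporting the limiting path. The slides do not define the symbols; the sketch below assumes Ts2 and Ts3 are the sum-output arrival times of SubAdder2 and SubAdder3, Tco2 is the carry-out arrival time of SubAdder2, and Dmx is the delay of the carry-select multiplexer. The function name and the sample numbers are hypothetical.

```python
def hybrid_delay(t_s2, t_co2, t_s3, d_mx):
    """Return (delay, limiting path) for Thybrid = max(Ts2, Tco2 + Dmx, Ts3 + Dmx)."""
    paths = {
        "Ts2": t_s2,
        "Tco2 + Dmx": t_co2 + d_mx,
        "Ts3 + Dmx": t_s3 + d_mx,
    }
    limiting = max(paths, key=paths.get)
    return paths[limiting], limiting


# Made-up arrival times, for illustration only.
print(hybrid_delay(t_s2=10, t_co2=9, t_s3=8, d_mx=2))
```

Knowing which term dominates suggests the direction in which to shift width between the Brent-Kung and carry-select stages.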

  9. Identification of Optimal Topology
     • Width of the BK and Carry-Select adders
       • Initial configuration
         • $w_{bk} = 2^p$, where $p = \lfloor \log_2 (n - w_{rpl}) \rfloor$
         • $w_{cs} = n - w_{bk} - w_{rpl}$
         • Example: if $n = 32$ and $w_{rpl} = 7$, then $w_{bk} = 16$ and $w_{cs} = 9$
       • Iterative approach
         • Estimate the delay of a configuration and explore in the appropriate direction (similar to binary search)
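The initial configuration can be computed directly. The sketch below takes p as the floor of the base-2 logarithm, which is the reading that reproduces the slide's example (n = 32, wrpl = 7 gives wbk = 16 and wcs = 9). The function name is hypothetical, and the iterative refinement step is not shown because the slides do not give its details.

```python
def initial_widths(n: int, w_rpl: int) -> tuple[int, int]:
    """Starting Brent-Kung and carry-select widths for an n-bit hybrid adder."""
    remaining = n - w_rpl
    if remaining < 1:
        raise ValueError("the ripple adder already covers every bit")
    p = remaining.bit_length() - 1  # floor(log2(remaining)) for positive ints
    w_bk = 1 << p                   # largest power of two that fits
    w_cs = n - w_bk - w_rpl         # carry-select takes the leftover bits
    return w_bk, w_cs


print(initial_widths(32, 7))  # the slide's example
```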

  10. Results
     • For different adder widths, our approach always found the best configuration in a very short runtime.
     • Runtime example for a 32-bit adder:
       • Trying all possible configurations (561) takes 16-23 hours of runtime
       • Our approach takes 4-18 minutes of runtime and always computes the best configuration
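The figure of 561 configurations can be reproduced by counting width triples. One counting that gives this number, assumed here because the slides do not state how the figure was derived, lets each of the three sub-adders have any width from 0 up to n, with the widths summing to n. The function name is hypothetical.

```python
from math import comb


def count_configurations(n: int) -> int:
    """Count (w_rpl, w_bk, w_cs) with non-negative widths summing to n."""
    count = 0
    for w_rpl in range(n + 1):
        for w_bk in range(n + 1 - w_rpl):
            # w_cs = n - w_rpl - w_bk is then fixed, so each pair is one configuration.
            count += 1
    return count


# Closed form for the same count: choose 2 dividers among n + 2 slots.
assert count_configurations(32) == comb(34, 2)
print(count_configurations(32))
```

At 16-23 hours for 561 candidates, each configuration costs roughly two minutes to evaluate, so the 4-18 minute runtime reported on the slide corresponds to evaluating only a handful of configurations.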

  11. Results • Now, it is feasible to use this powerful hybrid-adder architecture during synthesis (~12% faster adder).

  12. Thank you
