1 / 12

A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products

A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products. Sabyasachi Das Synplicity Inc Sunil P. Khatri Texas A&M University. e. d. f. b. c. a. q = c * d. p = a * b. q. p. z = p + q + e + f. z. What is a Sum-of-Product?.

march
Download Presentation

A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A Timing-Driven Hybrid-Compression Algorithm for Faster Sum-of-Products Sabyasachi Das Synplicity Inc Sunil P. Khatri Texas A&M University

  2. e d f b c a q = c * d p = a * b q p z = p + q + e + f z What is a Sum-of-Product? • IC block that performs addition of multiple product and sum terms • Computationally-intensive • Wide usage in DSP, Graphics, Microprocessors

  3. Multiplication {assign z = a * b} MAC {assign z = (a * b) + c} 2-operand Addition {assign z = a + b} Squarer {assign z = a * a} Adder-Tree {assign z = a + b + c + d} Generalized SOP {assign z = (a * b) + (c * d) + (e * f) + g + h + k} Examples of Sum-of-Product Blocks

  4. Structure of Sum-of-Products Inputs • Sum-of-Products block consists of 3 parts (written in the order of data-flow) • Partial Product Generator (PPGen) • Partial Product Reduction Tree (PPRT) • Final Carry-Propagation Adder (CPA) Partial Product Generator (PPGen) Partial Product Reduction Tree (PPRT) Final Carry Propagation Adder (CPA) Output

  5. Partial Product Reduction Tree • In Partial Product Reduction Tree, total number of elements in each bit gets reduced to upto two • Partial Product Reduction Tree (PPRT) consumes >50% delay of the SOP block • Hence the performance of PPRT is crucial to the performance of the SOP block

  6. ai bi Ci+1 Si ai bi ci Ci+1 Si Two Reduction Counters in PPRT • (2:2) Counter • Reduces 2 inputs (ai and bi) to 2 outputs (Si and Ci+1) • (3:2) Counter • Reduces 3 inputs (ai, bi and ci) to 2 outputs (Si and Ci+1)

  7. ai bi ci di Ci+2Ci+1 Si 4:3 Reduction Counters • (4:3) Counter • 4 inputs to 3 outputs • The functionality of the Ci+2 is a 4-input AND gate. • Faster reduction at ith column • Produces element to (i+2)th column at an earlier time • Has larger area than other two counters Key idea is to use (4:3) counter as much as possible in conjunction with the (3:2) and (2:2) counters

  8. Explanation of our approach • Perform column-wise reduction (LSB to MSB) • For each column (or BitSlice/BitCluster) • Sort inputs based on arrival time • Is (2:2) reduction fast? If yes, instantiate that • Else is (3:2) reduction fast? If yes, instantiate that • Else instantiate (4:3) reduction • After each reduction, re-sort the signals and continue

  9. An example of our approach P07 P06 P05 P04 P03 P02 P01 P00 P17 P16 P15 P14 P13 P12 P11 P10 P27 P26 P25 P24 P23 P22 P21 P20 P37 P36 P35 P34 P33 P32 P31 P30 P47 P46 P45 P44 P43 P42 P41 P40 P57 P56 P55 P54 P53 P52 P51 P50 C02 C01 S00 C11 S10 C03 S01 C13 C12 S02

  10. Results On an average, our approach produces about 3.5% speed improvement with 4.3% area penalty

  11. Summary • A 4:3 reduction counter is designed • Reduces elements in the given column at a faster pace • Produces an element to the (i+2)th column at an earlier time • 4:3 reduction counter is used extensively (in conjunction with the existing 3:2 and 2:2 counters) • A timing-driven algorithm selects the correct type of counter that needs to be instantiated • On an average, 3.5% improvement in speed with 4.3% area penalty.

  12. Thank you

More Related