
Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization



  1. Synthesizing Datapath Circuits for FPGAs With Emphasis on Area Minimization Andy Ye, David Lewis, Jonathan Rose Department of Electrical and Computer Engineering, University of Toronto {yeandy, lewis, jayar}@eecg.utoronto.ca

  2. Motivation: Datapath Regularity • Larger FPGAs • Larger applications on FPGAs • More datapath logic in larger applications • Datapath logic is highly regular • Utilize regularity to improve logic density

  3. Utilizing Datapath Regularity • A new datapath-oriented FPGA • New CAD tools supporting the new FPGA • Synthesis • Packing • Placement • Routing • This talk focuses on synthesis

  4. Background: Datapath-oriented FPGA • Architected to utilize datapath regularity • Architectural features • Capture regularity using special logic blocks • Increase logic density by coarse grain routing

  5. Background: FPGA Overview [Figure: array of logic clusters (L) and switch boxes (S) joined by routing channels that contain coarse grain and fine grain routing tracks]

  6. Background: Logic Cluster [Figure: a logic cluster made of Subclusters 1 to 4; each subcluster contains BLEs and a local routing network; a Basic Logic Element (BLE) is drawn with a LUT, a DFF, and a MUX]

  7. Background: FPGA Overview [Figure: array of logic clusters (L) and switch boxes (S) joined by routing channels that contain coarse grain and fine grain routing tracks]

  8. Background: Coarse Grain Routing Tracks [Figure: a logic cluster with four subclusters next to a switch box; labels: fine grain routing, coarse grain routing, M]

  9. Datapath Synthesis • Synthesis • The first step in a fully automated CAD flow • Transforms high level descriptions into logic • Conventional synthesis (flat synthesis) • Minimizes area and delay metrics • Destroys datapath regularity • Datapath synthesis • Preserves datapath regularity • Supports downstream CAD tools

  10. Datapath Representation • Datapath circuits are represented by netlists of datapath components (VHDL or Verilog) • Datapath component library • Multiplexers • Adders/subtracters • Shifters • Comparators • Registers • Each component consists of identical bit-slices

  11. Hard Boundary Hierarchical Synthesis • Optimize within the boundaries of bit-slices • Keep identical bit-slices identical • Optimized 15 datapath circuits from Pico-java processor using Synopsys [sun] • Good regularity • Bad area - 38% area inflation • FPGA architecture – increase logic density • Need a better synthesis tool

  12. Causes of Area Inflation • Examined circuits to determine the causes • Constraint of preserving bit-slice boundaries • Common sub-expressions exist across bit-slices • Harder to discover in datapath synthesis • Constraint of preserving datapath regularity • Identical bit-slices have different external connections • Some bit-slices have more optimization opportunities • These opportunities are missed if all bit-slices must be kept identical

  13. Enhanced Module Compaction [Flow diagram: Netlist of Datapath Components → Word-level Optimization (manual operation) → Module Compaction → Bit-slice Netlist I/O Optimization → Flat Synthesis & Optimization Within Bit-slice Boundaries → Netlist of Synthesized Bit-slices]

  14. Word-level Optimization • Currently done manually; will be automated • Optimizes across bit-slice boundaries • Uses the functionality of each datapath component to create optimization opportunities • Two optimizations are performed • Multiplexer tree collapsing • Operation reordering • More in the future

  15. Multiplexer Tree Collapsing • Datapath circuits contain multiplexers in a tree topology • Collapses several multiplexers in a multiplexer tree into a single multiplexer • Collapsing operation creates common sub-expressions • Extracts common expressions out of multiple bit-slices to save area

  16. An Example [Figure: multiplexer tree collapsing; mux1 and mux2 with input A, select signals S1 and S2, random logic (rl), flip-flops (FF), and output R]
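
The collapsing step on slide 15 can be illustrated with a minimal sketch. It assumes a two-level tree of 2:1 multiplexers with three data inputs; the function names (`mux_tree`, `decode_select`, `collapsed_mux`, `collapsed_word`) are hypothetical and are not taken from the authors' tool. The select decoding is computed once per word, which is the kind of common sub-expression that can be pulled out of the individual bit-slices.

```python
from itertools import product


def mux2(sel, d0, d1):
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux_tree(s1, s2, a, b, c):
    """Two-level tree: pick between a and b, then between that result and c."""
    return mux2(s2, mux2(s1, a, b), c)


def decode_select(s1, s2):
    """Select decoding for the collapsed multiplexer (independent of the data bits)."""
    if s2:
        return 2
    return 1 if s1 else 0


def collapsed_mux(index, a, b, c):
    """Single 3-input multiplexer driven by the decoded select."""
    return (a, b, c)[index]


def collapsed_word(s1, s2, a_bits, b_bits, c_bits):
    """Apply the collapsed multiplexer to every bit-slice of a word."""
    index = decode_select(s1, s2)  # computed once, shared by all bit-slices
    return [collapsed_mux(index, a, b, c)
            for a, b, c in zip(a_bits, b_bits, c_bits)]


def equivalent():
    """Exhaustively check that the tree and the collapsed form agree on one bit."""
    return all(
        mux_tree(s1, s2, a, b, c) == collapsed_mux(decode_select(s1, s2), a, b, c)
        for s1, s2, a, b, c in product((0, 1), repeat=5)
    )
```

Calling `equivalent()` checks all 32 input combinations of a single bit-slice. Whether the collapsed form is actually smaller depends on the later flat synthesis step, which this sketch does not model.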

  17. Operation Reordering • Transforms result selection into operand selection • Accepts the transformation if it results in smaller area

  18. An Example [Figure: operation reordering; word-level view with inputs a, b, c, d, select s, multiplexers, adders, and output e, alongside bit-slice 0 with a0, b0, c0, d0, s0, carry signals cin0 and cout0, sum and carry outputs, and e0]
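
A minimal sketch of the transformation on slide 17, assuming the circuit selects between the sums a + b and c + d with a select signal s. The 3-bit width and the function names are illustrative assumptions, not details from the talk. Result selection uses two adders and one multiplexer; operand selection uses two multiplexers and one adder.

```python
from itertools import product

WIDTH = 3                    # assumed word width, kept small for exhaustive checking
MASK = (1 << WIDTH) - 1      # sums wrap around at WIDTH bits


def result_selection(s, a, b, c, d):
    """Two adders compute both sums, then one multiplexer picks a result."""
    sum_ab = (a + b) & MASK
    sum_cd = (c + d) & MASK
    return sum_cd if s else sum_ab


def operand_selection(s, a, b, c, d):
    """Two multiplexers pick the operands, then a single adder computes the sum."""
    x = c if s else a
    y = d if s else b
    return (x + y) & MASK


def reordering_is_safe():
    """Exhaustively check that both forms compute the same function."""
    values = range(1 << WIDTH)
    return all(
        result_selection(s, a, b, c, d) == operand_selection(s, a, b, c, d)
        for s in (0, 1)
        for a, b, c, d in product(values, repeat=4)
    )
```

The sketch only establishes functional equivalence. The area comparison that decides whether the transformation is accepted is left to synthesis and is not modeled here.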

  19. Module Compaction • Merges bit-slices into larger bit-slices • Based on connectivity between datapath components • Larger bit-slices have more optimization opportunities for flat synthesis • Avoids merging based on carry chains • Similar to the algorithm proposed by Koch

  20. An Example [Figure: module compaction applied to FA0 to FA4 and mux0 to mux3]
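
The merging idea on slide 19 can be sketched with a union-find over bit-slices. This is not the algorithm proposed by Koch or the authors' implementation; it is a simplified illustration that assumes each connection is tagged as either `"data"` or `"carry"`, and it merges slices along data connections only. The node names and the `compact` function are hypothetical.

```python
def compact(nodes, edges):
    """Group bit-slices joined by non-carry connections into larger bit-slices."""
    parent = {n: n for n in nodes}

    def find(n):
        # Walk to the representative, halving the path along the way.
        while parent[n] != n:
            parent[n] = parent[parent[n]]
            n = parent[n]
        return n

    for src, dst, kind in edges:
        if kind == "carry":
            continue  # merging along a carry chain would fuse a whole adder
        parent[find(src)] = find(dst)

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


# Assumed example: a 4-bit adder whose sum bits feed a 4-bit multiplexer.
nodes = [f"FA{i}" for i in range(4)] + [f"mux{i}" for i in range(4)]
edges = [(f"FA{i}", f"mux{i}", "data") for i in range(4)]
edges += [(f"FA{i}", f"FA{i + 1}", "carry") for i in range(3)]

slices = compact(nodes, edges)
```

With these inputs, `slices` holds four merged bit-slices, each pairing one adder bit with the multiplexer bit it drives, so the merged slices stay identical to one another.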

  21. Bit-slice I/O Optimization • Granularity of bit-slice I/O optimization, m • Breaks datapath components into m-bit wide chunks • m bit-slices are kept identical to each other • Allows some bit-slices in a datapath component to be optimized more than others

  22. Bit-slice I/O Optimization • Converts bit-slice I/O signals into internal signals if all m bit-slices meet an optimization criterion • More optimization opportunities for flat synthesis • Four types of I/O optimizations • Constant absorption • Feedback absorption • Duplicated input absorption • Unused output absorption
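
A minimal sketch of how the granularity m gates one of these optimizations. It assumes that the criterion for constant absorption is that the same input of all m bit-slices in a chunk is tied to the same constant; the slides do not spell out the criterion, so this is an assumption, and `absorbable_chunks` is a hypothetical helper.

```python
def absorbable_chunks(inputs, m):
    """Return (start_bit, constant) for each m-bit chunk whose input is one constant.

    `inputs` lists, per bit-slice, what drives a given input port:
    "0" or "1" for a constant, any other string for a signal name.
    """
    result = []
    for start in range(0, len(inputs), m):
        chunk = inputs[start:start + m]
        # All m slices must see the same constant, so they remain identical
        # to each other after the constant is absorbed.
        if len(chunk) == m and len(set(chunk)) == 1 and chunk[0] in ("0", "1"):
            result.append((start, chunk[0]))
    return result


# Assumed 8-bit example: bits 4 and 5 are driven by signals, the rest by constant 0.
example_inputs = ["0", "0", "0", "0", "x4", "x5", "0", "0"]
```

For `example_inputs`, m = 4 allows absorption only in the chunk starting at bit 0, while m = 2 also covers the chunks starting at bits 2 and 6. A smaller m exposes more optimization opportunities at the cost of keeping fewer bit-slices identical.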

  23. Experimental Results • Fifteen benchmark circuits • From the Pico-java processor • Synthesized into 4-LUTs and DFFs • Experiments • Area • Regularity • Area against m (the granularity of bit-slice I/O optimization)

  24. Area • m (granularity of bit-slice I/O optimization) = 4 • Compare datapath synthesis with flat synthesis

  25. Post-synthesis Area (LUT Count)

  26. Regularity • m (granularity of bit-slice I/O optimization) = 4 • Two-terminal connections captured by • 4-bit wide buses • 4-bit wide control groups

  27. Regularity [Figure: three groups of blocks labeled S1 to S4, illustrating a 4-bit wide bus and a 4-bit wide control group]

  28. Regularity Results • 94% of LUTs remain in regular datapath components

  29. Granularity (m) Vs. Area • Higher m (the granularity of bit-slice I/O optimization) • Keeps more bit-slices identical • Preserves more regularity • Higher area cost

  30. Granularity Vs. Area Inflation

  31. Conclusion • Presented a datapath-oriented FPGA architecture • Presented an enhanced module compaction algorithm • Empirically demonstrated the area efficiency of the algorithm • 3%-8% area inflation • Good regularity • 48% of two-terminal connections are in 4-bit wide buses • 35% of two-terminal connections are in 4-bit wide control groups
