
Marvin Tom University of British Columbia Department of Electrical and Computer Engineering

Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays. Marvin Tom, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada.


Presentation Transcript


  1. Channel Width Reduction Techniques for System-on-Chip Circuits in Field-Programmable Gate Arrays • Marvin Tom, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada

  2. Contributions • Two new FPGA benchmark circuit “suites” • Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs • Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand • Two new FPGA CAD flows • DHPack: Design Hierarchy Packing • Identify congested IP blocks → depopulate → reduced interconnect demand • Conference paper: “Logic Block Clustering…”, published at DAC 2005 • Un/DoPack: UnPack and DoPack • Find “local” interconnect congestion → depopulate → reduced interconnect demand • Conference paper, submitted to DAC 2006 • Discoveries… • “Non-uniform” depopulation limits area inflation • “BLE limiting” gives better interconnect controllability than “Input limiting” • “Interconnect variation” is important for area inflation and FPGA architecture design • “Routing closure” achieved by re-clustering and incremental place & route • UNROUTABLE circuits made ROUTABLE: buy an FPGA with MORE LOGIC!

  3. Mesh-Based FPGA Architecture • 16 logic blocks • 4 wires per channel • 4*4=16 total horizontal tracks • 9 logic blocks • 4 wires per channel • 3*4=12 total horizontal tracks • Larger FPGAs have more “aggregate” interconnect
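
The track counts on this slide can be reproduced with a short sketch. It assumes a square array and counts one horizontal channel per row of logic blocks, as the slide's arithmetic does; the function name is illustrative and not taken from the talk.

```python
import math


def total_horizontal_tracks(num_logic_blocks: int, wires_per_channel: int) -> int:
    """Count horizontal tracks as the slide does: rows * wires per channel."""
    rows = math.isqrt(num_logic_blocks)
    if rows * rows != num_logic_blocks:
        raise ValueError("expected a square array of logic blocks")
    return rows * wires_per_channel


print(total_horizontal_tracks(16, 4))  # 4 * 4 = 16
print(total_horizontal_tracks(9, 4))   # 3 * 4 = 12
```

With the same wires per channel, the larger array ends up with more tracks in total, which is the “aggregate” interconnect point made above.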

  4. Logic Utilization vs. Channel Width • Trade-off logic utilization for channel width • User can always buy more logic… (not more wires) • Trade-off: CLB count for channel width [Figure: FPGA 1 vs. FPGA 2] • But… can we achieve lower Total Area (= SIZE * CLB Count)? No, but we can break even!

  5. Logic Element: BLE and CLB • Basic Logic Element (BLE) • ‘k’-input LUT + FF • Configurable Logic Block (CLB) • ‘N’ BLEs, ‘N’ outputs • ‘I’ shared inputs • Note: I < k*N [Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]

  6. CLB Depopulation • General Approach • Use existing clustering tools • Do not fill CLB while clustering • Input-Limited • E.g., maximum 67% input utilization per CLB • Might use all BLEs • BLE-Limited • E.g., maximum 60% BLE utilization per CLB • Might use all inputs [Figure: CLB containing BLE #1 to BLE #5, with ‘I’ inputs and ‘N’ outputs]

  7. Reducing Channel Width: Results (max cluster size 16, max num inputs 51) • Input-Limited • No channel width control • BLE-Limited • (almost) monotonically increasing → good channel width control
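
The two depopulation styles can be expressed as per-CLB caps handed to a clustering tool. The sketch below uses the cluster size (16) and input count (51) from this slide and the example percentages from the previous one. Rounding down with `floor` is an assumption, and the class and function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ClbLimits:
    max_bles: int    # BLEs the clusterer may place in one CLB
    max_inputs: int  # distinct CLB inputs the clusterer may use


def input_limited(n: int, i: int, fraction: float) -> ClbLimits:
    """Cap input utilization only; all N BLEs may still be used."""
    return ClbLimits(max_bles=n, max_inputs=math.floor(i * fraction))


def ble_limited(n: int, i: int, fraction: float) -> ClbLimits:
    """Cap BLE utilization only; all I inputs may still be used."""
    return ClbLimits(max_bles=math.floor(n * fraction), max_inputs=i)


print(input_limited(16, 51, 0.67))  # ClbLimits(max_bles=16, max_inputs=34)
print(ble_limited(16, 51, 0.60))    # ClbLimits(max_bles=9, max_inputs=51)
```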

  8. Meta Benchmark Circuit Creation • Mimic process of creating large designs • “IP Blocks” <==> MCNC Circuits • SoC <==> Randomly integrate/stitch together “IP Blocks” • IP Blocks have varied interconnect needs • Considered 3 stitching schemes… • Independent • IP Blocks are not connected to each other • Pipeline • Outputs of one IP block connected to inputs of next IP block • Clique • Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
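
The three stitching schemes differ only in which IP blocks feed which. A minimal sketch at block level follows. It does not model how individual output signals are assigned to input pins, which the slide does not specify, and the `stitch` function is illustrative rather than the author's tool.

```python
def stitch(num_blocks: int, scheme: str) -> list[tuple[int, int]]:
    """Return directed (source block, destination block) connections."""
    if scheme == "independent":
        # IP blocks are not connected to each other.
        return []
    if scheme == "pipeline":
        # Outputs of block i drive inputs of block i + 1.
        return [(i, i + 1) for i in range(num_blocks - 1)]
    if scheme == "clique":
        # Every block drives every other block.
        return [(i, j) for i in range(num_blocks)
                for j in range(num_blocks) if i != j]
    raise ValueError(f"unknown stitching scheme: {scheme}")


print(stitch(3, "pipeline"))     # [(0, 1), (1, 2)]
print(len(stitch(4, "clique")))  # 12
```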

  9. DHPack: Meta Circuit P&R • Use VPR FPGA tools from University of Toronto • Observation 1 • VPR placer successfully groups IP blocks from random initial placement • Observation 2 • VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

  10. DHPack: Meta Circuit P&R Results • Clique MetaCircuit • P&R channel width results closely match “constraints” • Shrink channel width by ~20% (from 95 to 75): NO AREA INCREASE • Shrink channel width by ~50% (from 95 to 50): 1.7x area increase [Figures: routed channel width vs. channel width constraint; normalized area vs. channel width constraint]
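
The rounded percentages on this slide follow directly from the channel widths it quotes. A quick check, using a helper name chosen here for illustration:

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


print(round(reduction_percent(95, 75), 1))  # 21.1, quoted as ~20%
print(round(reduction_percent(95, 50), 1))  # 47.4, quoted as ~50%
```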

  11. Meta Circuits vs. Stdev Circuits • Meta Circuit Drawbacks • Design hierarchy boundaries not well-defined • Coarse-grained IP block boundary • Stitching unrealistic • Flip-flop placed at every output • Connections only have FO1 • Stdev Circuits (created using GNL) • Synthetic clone of Meta circuits • Hierarchical → specify Rent parameter of each partition • Root → # I/Os, # IP blocks • Second Level → 20 IP blocks, # LEs, Rent parameter

  12. Stdev Circuits: Rent Parameters • 7 benchmark circuits • 240/120 primary inputs/outputs, approx 52,000 CLBs • Rent parameter: Average 0.62, vary Stdev 0.0 to 0.12
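
For readers unfamiliar with the Rent parameter: Rent's rule estimates the external terminals of a partition as T = t * G^p, where G is the number of blocks, t the average terminals per block, and p the Rent parameter. The sketch below shows the rule and one way to produce 20 per-block parameters with a chosen mean and standard deviation. The evenly spaced spread is an assumption made for illustration; the slide does not say how the actual values were chosen.

```python
import statistics


def rent_terminals(avg_pins_per_block: float, num_blocks: int, rent_p: float) -> float:
    """Rent's rule: T = t * G**p."""
    return avg_pins_per_block * num_blocks ** rent_p


def spread_rent_parameters(count: int, mean: float, stdev: float) -> list[float]:
    """Evenly spaced values with the requested mean and population stdev."""
    if stdev == 0:
        return [mean] * count
    raw = [i - (count - 1) / 2 for i in range(count)]  # centered on zero
    scale = stdev / statistics.pstdev(raw)
    return [mean + scale * x for x in raw]


params = spread_rent_parameters(20, 0.62, 0.12)
print(round(statistics.mean(params), 2), round(statistics.pstdev(params), 2))
print(rent_terminals(4, 100, 0.5))
```

A higher standard deviation gives some IP blocks a much larger Rent parameter than others, which is how the Stdev suite varies interconnect demand across a single circuit.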

  13. Un/DoPack Flow • Iterative non-uniform cluster depopulation tool • Step 1: Traditional SIS/VPR • Step 2: UnPack: • Congestion Calculator • Step 3: DoPack: • Incremental Re-Cluster • Step 4,5: Fast Place/Route
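
The five steps form a loop, sketched below under stated assumptions. Each tool is passed in as a callable because the real SIS/VPR invocations are not given in the talk, and stopping once the routed channel width meets a target is an assumed termination rule. All parameter names are hypothetical.

```python
from typing import Any, Callable


def un_do_pack(
    netlist: Any,
    target_width: int,
    initial_flow: Callable[[Any], Any],      # step 1: traditional SIS/VPR
    routed_width: Callable[[Any], int],      # channel width of a routed solution
    unpack: Callable[[Any], Any],            # step 2: congestion calculator
    dopack: Callable[[Any, Any], Any],       # step 3: incremental re-cluster
    fast_place_route: Callable[[Any], Any],  # steps 4 and 5
    max_iterations: int = 100,
) -> tuple[Any, int]:
    """Iterate depopulation until the target width is met (assumed stop rule)."""
    solution = initial_flow(netlist)
    iterations = 0
    while routed_width(solution) > target_width and iterations < max_iterations:
        region = unpack(solution)
        clustering = dopack(solution, region)
        solution = fast_place_route(clustering)
        iterations += 1
    return solution, iterations
```

Passing the tools as callables keeps the sketch independent of any particular command line and lets the loop be exercised with toy stand-ins.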

  14. Un/DoPack Flow: SIS/VPR • Step 1: Traditional SIS/VPR

  15. Un/DoPack Flow: SIS/VPR • Step 1: Traditional SIS/VPR

  16. Un/DoPack Flow: SIS/VPR • Step 1: Traditional SIS/VPR

  17. Un/DoPack Flow: UnPack • Step 2: UnPack • Generate Congestion Map • CLB Label = Largest CW occ in 4 adjacent channels

  18. Un/DoPack Flow: UnPack • Step 2: UnPack: Depop Center = Largest CLB label M X M Array

  19. Un/DoPack Flow: UnPack • Step 2: UnPack: Depop Radius = M/4 Depop Amt: 1 new row/col in array M X M Array
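
The UnPack rules on the last three slides can be sketched as follows. Several details are assumptions not stated in the talk: the channel occupancy layout (an (M+1) x M grid of horizontal segments and an M x (M+1) grid of vertical segments), the row-major tie-break for the center, integer division for M/4, and Manhattan distance for the radius.

```python
def clb_labels(h_occ: list[list[int]], v_occ: list[list[int]]) -> list[list[int]]:
    """Label each CLB with the largest occupancy among its 4 adjacent channels."""
    m = len(v_occ)
    return [
        [max(h_occ[r][c], h_occ[r + 1][c], v_occ[r][c], v_occ[r][c + 1])
         for c in range(m)]
        for r in range(m)
    ]


def depop_center(labels: list[list[int]]) -> tuple[int, int]:
    """CLB with the largest label; ties go to the first in row-major order."""
    m = len(labels)
    cells = ((r, c) for r in range(m) for c in range(m))
    return max(cells, key=lambda rc: labels[rc[0]][rc[1]])


def depop_region(center: tuple[int, int], m: int) -> list[tuple[int, int]]:
    """CLBs within radius M/4 of the center (Manhattan distance assumed)."""
    radius = m // 4
    cr, cc = center
    return [(r, c) for r in range(m) for c in range(m)
            if abs(r - cr) + abs(c - cc) <= radius]


m = 4
h = [[0] * m for _ in range(m + 1)]
v = [[0] * (m + 1) for _ in range(m)]
v[2][2] = 9  # one congested vertical channel segment
center = depop_center(clb_labels(h, v))
print(center, depop_region(center, m))
```

One reading of “1 new row/col in array” is that each iteration depopulates enough CLBs in this region to grow the array from M x M to (M+1) x (M+1); that interpretation is not confirmed by the slide.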

  20. Un/DoPack Flow: DoPack • Step 3: DoPack: • Incremental Re-Cluster

  21. Un/DoPack Flow: Fast P&R • Steps 4, 5: Fast Place/Route • Fast Placement • UBC Incremental Placer (under development) • VPR “–fast” option • Router • Use full routed solution • Slow but reliable

  22. Before: 120/79/27 (Peak / Avg / Stddev) • After: 100/79/20 (Peak / Avg / Stddev)
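
The three figures reported here are summary statistics over per-location occupancy values. A minimal sketch of how such a triple could be computed follows. Using the population standard deviation is an assumption, and the input list is made-up illustration data, not data from the talk.

```python
import statistics


def peak_avg_stddev(occupancies: list[float]) -> tuple[float, float, float]:
    """Return (peak, average, population standard deviation)."""
    return (
        max(occupancies),
        statistics.mean(occupancies),
        statistics.pstdev(occupancies),
    )


# Hypothetical occupancy values, for illustration only.
print(peak_avg_stddev([2, 4, 4, 4, 5, 5, 7, 9]))
```

In the before and after figures above, the average is unchanged while peak and standard deviation drop, which is consistent with depopulation spreading local congestion rather than removing demand.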

  23. Normalized Area of GNL Benchmarks

  24. Absolute Area of GNL Benchmarks

  25. Interconnect Variation: Impact on FPGA Architecture Design • High variation circuits require wide channel width

  26. Contributions • Two new FPGA benchmark circuit “suites” • Meta Circuit: mimics “System-on-Chip” design by randomly “stitching” real designs • Stdev: synthetic clones of Meta Circuit, used to vary interconnect demand • Two new FPGA CAD flows • DHPack: Design Hierarchy Packing • Identify congested IP blocks → depopulate → reduced interconnect demand • Conference paper: “Logic Block Clustering…”, published at DAC 2005 • Un/DoPack: UnPack and DoPack • Find “local” interconnect congestion → depopulate → reduced interconnect demand • Conference paper, submitted to DAC 2006 • Discoveries… • “Non-uniform” depopulation limits area inflation • “BLE limiting” gives better interconnect controllability than “Input limiting” • “Interconnect variation” is important for area inflation and FPGA architecture design • “Routing closure” achieved by re-clustering and incremental place & route • UNROUTABLE circuits made ROUTABLE: buy an FPGA with MORE LOGIC!

  27. End of Talk
