
Clustering of Large Designs for Channel-Width Constrained FPGAs

Marvin Tom and Guy Lemieux, University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada.


Presentation Transcript


  1. Clustering of Large Designs for Channel-Width Constrained FPGAs • Marvin Tom, Guy Lemieux • University of British Columbia, Department of Electrical and Computer Engineering, Vancouver, BC, Canada

  2. Overview • Introduction, Goals and Motivation • Reduce channel width, lower cost, make circuits “routable” • Reducing Channel Width By Depopulation • Large Benchmark Circuits • New Clustering Technique • Selective Depopulation • Conclusions and Future Work

  3. Mesh-Based FPGA Architecture • Channel width • Number of routing tracks per channel • Larger FPGA devices: more tiles • Channel width is fixed • [Figure: mesh of logic blocks (L) separated by routing channels]

  4. Motivation: Area of FPGA Devices • MCNC Circuits Mapped onto an FPGA • Total Layout AREA = SIZE * Number • SIZE = size of layout tile • Number = number of layout tiles

  5. Motivation: Channel Width Demand • MCNC Circuits Mapped onto an FPGA • Devices built for worst-case channel width (fixed width) • Interconnect cost dominates (>70%) • Logic Range: user buys bigger device • Interconnect Range: user has no choice!

  6. Goal: Reduce Channel Width • Altera Cyclone • Channel width constraint of 80 routing tracks • Constrained FPGA • Channel width constraint of 60 routing tracks • Smaller area, lower cost for low-channel-width circuits • But { apex4, elliptic, frisc, ex1010, spla, pdc } are unroutable… • Can we make them routable in a Constrained FPGA?

  7. Possible Solution • Trade off logic utilization for channel width • User can always buy more logic… (not more wires) • Trade-off: CLB count for channel width • But… can we achieve lower Total Area (= SIZE * CLB Count)? • [Figure: two meshes of logic blocks (L), labeled FPGA 1 and FPGA 2]
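The trade-off on this slide can be sketched numerically. The following is a minimal sketch, assuming a tile whose size is a fixed logic area plus a routing area that grows linearly with channel width. That linear model, the coefficients, and the CLB counts are all illustration values and are not taken from the talk.

```python
def tile_size(channel_width, logic_area=30.0, area_per_track=1.0):
    """Size of one layout tile: fixed logic area plus routing area per track.

    Both default coefficients are made-up illustration values, chosen so that
    routing is roughly 73% of an 80-track tile.
    """
    return logic_area + area_per_track * channel_width


def total_area(clb_count, channel_width):
    """Total layout area = SIZE of tile * number of tiles (CLB count)."""
    return tile_size(channel_width) * clb_count


def break_even_clb_growth(wide, narrow):
    """Largest CLB-count ratio for which the narrow device is no larger."""
    return tile_size(wide) / tile_size(narrow)


# Hypothetical devices: FPGA 1 is fully packed, FPGA 2 is depopulated.
fpga1 = total_area(clb_count=1000, channel_width=80)
fpga2 = total_area(clb_count=1150, channel_width=60)
print(fpga1, fpga2, break_even_clb_growth(80, 60))
```

Under these assumed numbers, the narrower device comes out smaller as long as depopulation inflates the CLB count by less than the break-even ratio.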

  8. Logic Element: BLE and CLB • Basic Logic Element (BLE) • ‘k’-input LUT + FF • Clustered Logic Block (CLB) • ‘N’ BLEs, ‘N’ outputs • ‘I’ shared inputs • Note: I < k*N • [Figure: CLB containing BLE #1 to BLE #5, with ‘I’ Inputs and ‘N’ Outputs]

  9. CLB Depopulation • Normally: CLBs fully packed • Reduces total # of CLBs needed for circuit • CLB Depopulation: Tessier, DeHon • Do not use all BLEs → Increase # CLBs used → Decrease channel width → Decrease overall area • Problem • Increase in # CLBs high for large circuits • Our work: limits # CLB increase • [Figure: CLB containing BLE #1 to BLE #5, with ‘I’ Inputs and ‘N’ Outputs]

  10. Uniform Depopulation • Previous work • Depopulate each CLB by equal amount • But… circuit observations • Regions of high routing demand • Regions of low routing demand • Depopulate in low-congestion areas? • Unnecessary increase in area

  11. Non-Uniform Depopulation • Our depopulation method: • Assume congestion is localized • Depopulate only congested areas • We show non-uniform de-population • Effective method of channel width reduction • Graceful tradeoff between channel width and area • Makes unroutable circuits routable

  12. Depopulation Methods to Reduce Channel Width

  13. CLB Depopulation • General Approach • Use existing clustering tools • Do not fill CLB while clustering • Input-Limited • E.g. maximum 67% input utilization per CLB • Might use all BLEs • BLE-Limited • E.g. maximum 60% BLE utilization per CLB • Might use all inputs • [Figure: CLB containing BLE #1 to BLE #5, with ‘I’ Inputs and ‘N’ Outputs]
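The two limits can be illustrated with a toy packer. This is a minimal sketch, assuming a naive first-fit pass over BLEs in netlist order; it stands in for the existing clustering tools mentioned on the slide and is not their algorithm. The function names, the BLE representation, and the small example netlist are hypothetical.

```python
def cluster_inputs(bles):
    """Distinct nets entering a cluster from outside it.

    Each BLE is (output_net, set_of_input_nets); nets produced inside the
    cluster do not consume a cluster input pin.
    """
    produced = {name for name, _ in bles}
    used = set().union(*(ins for _, ins in bles))
    return used - produced


def pack(bles, n, i, ble_limit_pct=100, input_limit_pct=100):
    """First-fit packing with a BLE limit and an input limit (percent of N, I)."""
    max_bles = max(1, n * ble_limit_pct // 100)
    max_inputs = max(1, i * input_limit_pct // 100)
    clusters, current = [], []
    for ble in bles:
        trial = current + [ble]
        too_big = len(trial) > max_bles or len(cluster_inputs(trial)) > max_inputs
        if current and too_big:
            clusters.append(current)  # close the cluster, start a new one
            current = [ble]
        else:
            current = trial           # a lone BLE is always accepted
    if current:
        clusters.append(current)
    return clusters


# Hypothetical netlist: a chain of 10 BLEs, each fed by the previous BLE
# and by one primary input.
chain = [("b0", {"x0"})] + [(f"b{j}", {f"b{j-1}", f"x{j}"}) for j in range(1, 10)]

full = pack(chain, n=5, i=8)                              # fully packed
ble_limited = pack(chain, n=5, i=8, ble_limit_pct=60)     # at most 3 BLEs
input_limited = pack(chain, n=5, i=8, input_limit_pct=67) # at most 5 inputs
print(len(full), len(ble_limited), len(input_limited))
```

In this toy case the input-limited run still fills its first cluster with all 5 BLEs, while the BLE-limited run never exceeds 3 BLEs per cluster, which mirrors the "might use all BLEs" and "might use all inputs" remarks.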

  14. Reducing Channel Width Results (max cluster size 16) • Input-Limited • No channel width control • BLE-Limited • (Almost) monotonically increasing → good channel width control

  15. Benchmark Circuit Creation (We want BIG circuits!) (What do REALLY BIG circuits look like?)

  16. Benchmarking Circuits: Some Observations • Altera has bigger benchmarks than academics • We noted similar characteristics: • Some LARGE circuits routable with NARROW routing channels • Some SMALL circuits need WIDE routing channels • What if each circuit is an IP block in a larger system?

  17. Benchmark Creation – IP Blocks • Mimic process of creating large designs • “IP Blocks” <==> MCNC Circuits • SoC <==> Randomly integrate/stitch together “IP Blocks” • IP Blocks have varied interconnect needs • Real-life large designs: System-on-Chip Methodology • IP blocks (own, 3rd party) • Re-use improves productivity • Primarily integration and verification effort

  18. Benchmark Creation – Large Designs • Considered 3 stitching schemes… • Independent • IP Blocks are not connected to each other • Pipeline • Outputs of one IP block connected to inputs of next IP block • Clique • Outputs of each IP block are uniformly distributed to inputs of all other IP blocks
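The three stitching schemes can be written down at the block level. This is a minimal sketch with hypothetical function names; in particular, dealing outputs round-robin is only one assumed reading of "uniformly distributed", and the talk does not say how the authors assigned individual pins.

```python
def stitch(blocks, scheme):
    """Return (source_block, destination_block) pairs for a stitching scheme."""
    if scheme == "independent":
        return []                                   # no inter-block connections
    if scheme == "pipeline":
        return list(zip(blocks, blocks[1:]))        # each block feeds the next
    if scheme == "clique":
        return [(a, b) for a in blocks for b in blocks if a != b]
    raise ValueError(f"unknown scheme: {scheme}")


def distribute_outputs(outputs, destinations):
    """Deal one block's outputs round-robin over its destination blocks."""
    assignment = {d: [] for d in destinations}
    for index, net in enumerate(outputs):
        assignment[destinations[index % len(destinations)]].append(net)
    return assignment


blocks = ["tseng", "clma", "pdc"]
pipeline_edges = stitch(blocks, "pipeline")
clique_edges = stitch(blocks, "clique")
tseng_fanout = distribute_outputs(["o0", "o1", "o2", "o3"], ["clma", "pdc"])
print(pipeline_edges, len(clique_edges), tseng_fanout)
```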

  19. MetaCircuit: Reducing Routed Channel Width? • Observations • IP blocks are tightly-connected internally • IP blocks have varied channel width needs • Hypotheses • Placement keeps each “IP block” together • An IP block has large routed channel width → MetaCircuit has large routed channel width

  20. Hypothesis Testing: MetaCircuit P&R Results • Use VPR FPGA tools from University of Toronto • Hypothesis 1 • VPR placer successfully groups IP blocks from random initial placement • Hypothesis 2 • VPR router confirms channel width of MetaCircuit is dominated by a few IP blocks { pdc, clma, ex1010 }

  21. Consequences of Hypothesis 2 • Question • Shrink channel width of a few IP blocks → shrink channel width of MetaCircuit? • How to shrink channel widths? • Selective CLB Depopulation! • Depopulate hard-to-route IP blocks the most • How much to depopulate? • Channel width profiling of IP block…

  22. Meeting Channel Width Constraints: Selective Depopulation • Step 1: Channel Width Profiling of IP Blocks (Congestion Estimation) • Step 2: Re-cluster Only Congested IP Blocks (Selective Depopulation)

  23. IP Block Properties • Cluster IP Blocks into N=16, k=6 • VPR: determine minimum channel width for each IP Block • Sort IP Blocks based on channel width • [Figure: IP blocks sorted by channel width, from Hard-to-Route Circuits to Easy-to-Route Circuits]

  24. Channel Width Profiling of IP Block • Cluster sizes • NA = FPGA Architecture Cluster Size (fixed) • NC = BLE-Limit Size (variable) • Sweep NC for each IP block

  25. Analysis with Constraint • Given channel-width constraint of 60 tracks • tseng routable (easy) • clma routable for NC <= 10 • clma not routable for NC > 10
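The analysis step reduces to a lookup over each block's profile. Below is a minimal sketch, assuming a profile is a mapping from BLE-limit NC to the minimum routable channel width found by the sweep. The width values are invented so that the outcome matches the slide (clma routable only for NC <= 10, tseng always routable); they are not measured results.

```python
def max_routable_nc(profile, constraint):
    """Largest NC whose minimum channel width meets the constraint, or None."""
    feasible = [nc for nc, width in profile.items() if width <= constraint]
    return max(feasible) if feasible else None


# Hypothetical profiles: NC -> minimum channel width (illustration values only).
profiles = {
    "tseng": {8: 30, 10: 33, 12: 36, 14: 38, 16: 40},
    "clma": {8: 52, 10: 60, 12: 66, 14: 72, 16: 78},
}

constraint = 60  # tracks, as on the slide
chosen = {name: max_routable_nc(p, constraint) for name, p in profiles.items()}
print(chosen)
```

A block whose profile never meets the constraint returns None, which would flag it as unroutable at any swept NC.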

  26. Our Technique: Selective Depopulation • Step 1: Channel Width Profiling of IP Blocks (Congestion Estimation) • Step 2: Re-cluster Only Congested IP Blocks (Selective Depopulation)

  27. Uniform Depopulation • Minimum NC Cluster Size • De-populate all clusters equally • Eg, use NC=10 for both IP Blocks

  28. Non-Uniform Depopulation • Maximal NC Cluster Size • Depopulate each IP block according to maximal cluster size • Eg, clma NC=10, tseng NC=16
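The CLB cost of the two policies can be compared with simple arithmetic. This is a minimal sketch, assuming each block needs ceil(BLEs / NC) CLBs; the BLE counts are round hypothetical numbers, and only the NC values (clma 10, tseng 16) and NA = 16 come from the slides.

```python
import math

NA = 16  # architecture cluster size (fixed), as on the slides

# Hypothetical BLE counts per IP block (not taken from the talk).
ble_count = {"clma": 8000, "tseng": 1000}
# Maximal routable NC per block under the 60-track constraint (slide values).
max_nc = {"clma": 10, "tseng": 16}


def clbs_needed(nc_per_block):
    """Total CLBs when each block is clustered with its own BLE limit."""
    return sum(math.ceil(ble_count[b] / nc) for b, nc in nc_per_block.items())


def lut_utilization(clbs):
    """Fraction of the BLE slots in the used CLBs that are occupied."""
    return sum(ble_count.values()) / (clbs * NA)


uniform_nc = min(max_nc.values())                   # every block uses NC=10
uniform = clbs_needed({b: uniform_nc for b in ble_count})
non_uniform = clbs_needed(max_nc)                   # clma NC=10, tseng NC=16
print(uniform, non_uniform, lut_utilization(uniform), lut_utilization(non_uniform))
```

With these assumed sizes the non-uniform policy needs fewer CLBs and gives higher LUT utilization, because the easy block is not depopulated.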

  29. Uniform vs. Non-Uniform • Non-Uniform depopulation better than Uniform • Lower CLB count • Higher LUT utilization • [Figure: two plots against Channel Width Constraint, each comparing Uniform and Non-Uniform: Total CLBs Needed (x 1,000) and LUT Utilization]

  30. MetaCircuit Clustering Results • Depopulate the most-congested IP blocks • BLE-Limit of each IP block shown (max = 16) • Some IP blocks are depopulated more than others

  31. MetaCircuit P&R Results • Clique MetaCircuit • P&R channel width results closely match “constraints” • Shrink Channel Width • by ~20% (from 95 to 75), NO AREA INCREASE • by ~50% (from 95 to 50), 1.7x area increase • [Figure: Routed Channel Width (with Constraint line) and Normalized Area, each plotted against Channel Width Constraint]

  32. Other MetaCircuit Results * These latest results are better than those given in paper

  33. Critical Path Delay and Average Wirelength • Expect critical path delay to increase under tighter constraints • Delay “noise” due to instability of floorplan locations • Average wirelength / net increases under tighter constraints

  34. Conclusion • System-level technique to map large System-on-Chip (SoC) designs to channel-width constrained FPGAs using fewer routing resources • Depopulating CLBs effective at reducing channel width • Non-uniform depopulation important to limit area inflation • Channel width reduced • by 0-20% with < 5% area increase • by up to 50% with 3.3x area increase • Effective solution to trade off CLBs for interconnect! • UNROUTABLE circuits (channel width TOO LARGE) can be made ROUTABLE (reduced channel width) by buying an FPGA with MORE LOGIC!

  35. End of Talk

  36. Future Work • Real-Life SoC Benchmark • Licensed IP: Bluetooth baseband processor • 325,000 ASIC gates • Numerous IP blocks of varying complexity • Needed to authenticate “Synthetic” results • Automated technique to find “hard” IP blocks • Granularity is based on design hierarchy (?) • Replaces time-consuming Step 1 of process
