
Exploration of Pipelined FPGA Interconnect Structures

ACME LAB. Scott Hauck, Akshay Sharma, Carl Ebeling (University of Washington); Katherine Compton (University of Wisconsin - Madison). PipeRoute: Pipelining-aware Router for FPGAs (FPGA 2003).


Presentation Transcript


  1. ACME LAB • Exploration of Pipelined FPGA Interconnect Structures • Scott Hauck, Akshay Sharma, Carl Ebeling, University of Washington • Katherine Compton, University of Wisconsin - Madison

  2. PipeRoute • FPGA 2003: Pipelining-aware Router for FPGAs • Architecture-adaptive, based on Pathfinder • Uses an optimal 2-terminal, 1-delay router • Greedy formulation for multi-delay, multi-terminal routing

  3. RaPiD • Coarse-grained, 1-D, 16-bit, with DSP units • Carl Ebeling @ UW-CSE • Pipelined interconnect via Bus Connectors (BCs) • (Figure: a RaPiD cell, a row of GPR, RAM, MULT, and ALU units)

  4. Pipelined Routing Results • Area expansion due to pipelining • Normalized to unpipelined circuit area • Average: 75% cost

  5. Contributions • Optimized PipeRoute • Supports multiple delays per BC (greedy preprocessor) • Timing-driven: uses Pathfinder's criticality, taking the worst-case criticality across each signal • RouteCost = Criticality * delay_cost + (1 - Criticality) * area_cost • Architecture exploration of RaPiD pipelined interconnects • Registered logic blocks (input/output/none) • BC track length • Delays per register/BC • BC/non-BC routing mix • Register-only logic blocks • Goal: more efficient support of pipelined interconnects
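The cost blend on this slide can be sketched as follows. This is a minimal illustration, assuming each sink of a signal already carries a criticality in [0, 1] (how that value is derived is not shown on the slide) and that `delay_cost` and `area_cost` are per-resource costs supplied by the router; the helper names `signal_criticality` and `route_cost` are hypothetical.

```python
def signal_criticality(sink_criticalities):
    """Worst-case criticality across a signal: the maximum over its sinks."""
    return max(sink_criticalities)


def route_cost(criticality, delay_cost, area_cost):
    """RouteCost = Criticality * delay_cost + (1 - Criticality) * area_cost."""
    # Assumption: criticality is normalized to [0, 1].
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


# One near-critical sink makes the whole signal weight delay heavily.
crit = signal_criticality([0.2, 0.9, 0.5])
print(route_cost(crit, delay_cost=10.0, area_cost=2.0))
```

Because one criticality value is applied to every connection of the signal, a single critical sink pulls the entire net toward delay-optimized resources, while a fully non-critical net (criticality 0) is routed purely on area cost.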

  6. Methodology • Benchmarks: retimed, not C-slowed • Graphs: architecture increased to fit (cells, tracks/cell) • Variation around local minima

  7. Registers in Logic Blocks • Output registers • No registers • Input registers • Chart: 5%, 20%, 23%

  8. Delays per Register/BC • 1 delay/BC • 2 delays/BC • Chart: 15%, 20%, 30%

  9. BC Track Length • Length-16 BC wires • Length-8 BC wires • Chart: 17%, 64%, 69%

  10. Routing Resource Mix (BC vs. non-BC) • 5/7 BC tracks • 7/7 BC tracks • Chart: 19%, 17%, 18%

  11. GPRs per Cell • GPR roles: registers from computation; pass-through for changing tracks • 6 per cell • 9 per cell • Chart: 6%, 23%, 22%

  12. Overall vs. RaPiD-I • RaPiD-I: 1 BC/cell (13 LBs long), 5/7 BC tracks, 3 registers/BC, 6 GPRs/cell, registered outputs • Post-Explore: 1 BC/cell (16 LBs long), 5/7 BC tracks, 3 registers/BC, 9 GPRs/cell, registered inputs • Average: 1%, 18%, 19%
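The slide does not label its three averages. If they are read as area, delay, and area × delay improvements over RaPiD-I (an interpretation suggested by the conclusion's "19% area*delay improvement ... (primarily delay)", not stated on this slide), the numbers are mutually consistent, because the two improvements multiply:

```latex
(1 - 0.01)\,(1 - 0.18) = 0.99 \times 0.82 = 0.8118
\quad\Rightarrow\quad 1 - 0.8118 = 0.1882 \approx 19\%
```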

  13. Overall: Pipelining Cost • Average: 18% cost

  14. Conclusions • Router for arbitrary pipelined architectures • Timing-driven • Supports multiple delays at each register site • Good quality: within 18% of the pseudo-lower-bound (non-pipelined) area • Architecture exploration of RaPiD • Parameters: registered inputs on functional units; length-16 wires; 3 delays per BC/register; 2/7 non-registered, 5/7 registered wires; 9 GPRs/cell to improve flexibility • Delay: register spacing is CRITICAL; registers too close together are better than too far apart • 19% area × delay improvement over RaPiD-I (primarily from delay)

  15. *** End of Talk Marker ***

  16-22. 1-Delay Two Terminal • Can do optimal routing for 1-delay routes via BFS
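A minimal sketch of the 1-delay two-terminal idea, assuming a hypothetical routing-graph model: `graph` maps each node (wire) to the nodes it drives, and `registers` is the set of nodes that can insert one clock of delay. BFS runs over (node, delays_taken) states, so the first time it reaches (T, 1) it has found a route with the fewest wire hops that passes through exactly one register. The name `route_1_delay` is illustrative, not PipeRoute's; a real router uses Pathfinder costs instead of hop counts, and this simplified version does not stop the pre-register and post-register halves from sharing a wire, which a real pipelined route must avoid.

```python
from collections import deque


def route_1_delay(graph, registers, src, dst):
    """Shortest (in hops) src->dst route taking exactly one delay.

    Returns (pre, post): the zero-delay half ending at register r and the
    half starting at r, or None if no 1-delay route exists.
    """
    start, goal = (src, 0), (dst, 1)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        node, d = state
        successors = [(nbr, d) for nbr in graph.get(node, [])]
        if d == 0 and node in registers:
            # Take the single delay here. Every route to the goal uses this
            # step exactly once, so BFS order still ranks routes by wire hops.
            successors.append((node, 1))
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    if goal not in parent:
        return None
    states = []
    s = goal
    while s is not None:
        states.append(s)
        s = parent[s]
    states.reverse()
    pre = [n for n, d in states if d == 0]
    post = [n for n, d in states if d == 1]
    return pre, post


g = {'S': ['A', 'B'], 'A': ['T'], 'B': ['C'], 'C': ['T']}
print(route_1_delay(g, {'B'}, 'S', 'T'))
```

With only `B` able to register, the sketch must take the longer S-B-C-T branch; adding `A` to `registers` lets it use the shorter S-A-T branch instead.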

  23-28. N-Delay Two Terminal • Greedy approximation via the 1-delay router • Find a 1-delay route • While the route does not have enough delay: replace any 0-delay segment with its cheapest 1-delay replacement
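The greedy N-delay loop can be sketched on top of a 1-delay BFS helper (included so the block runs on its own), under the same hypothetical graph model: `graph` maps nodes to the nodes they drive and `registers` is a set of delay-capable nodes. A route with k delays is kept as k + 1 zero-delay segments that meet at the registers where delays are taken. Each iteration asks the 1-delay router for a replacement of every segment, skipping registers already in use, and splices in the one that adds the fewest hops. Names are illustrative; the cost is hop count rather than PipeRoute's RouteCost, wire sharing between segments is ignored, and like the slide's formulation the result is a greedy approximation, not a guaranteed optimum.

```python
from collections import deque


def route_1_delay(graph, registers, src, dst):
    """BFS over (node, delays_taken); returns (pre, post) halves or None."""
    start, goal = (src, 0), (dst, 1)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            break
        node, d = state
        successors = [(nbr, d) for nbr in graph.get(node, [])]
        if d == 0 and node in registers:
            successors.append((node, 1))  # take the delay at this register
        for nxt in successors:
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    if goal not in parent:
        return None
    states, s = [], goal
    while s is not None:
        states.append(s)
        s = parent[s]
    states.reverse()
    return [n for n, d in states if d == 0], [n for n, d in states if d == 1]


def hops(segment):
    return len(segment) - 1


def greedy_n_delay(graph, registers, src, dst, n_delays):
    """Greedy N-delay routing; registers must be a set. Returns segments or None."""
    if n_delays < 1:
        raise ValueError("this sketch starts from a 1-delay route")
    first = route_1_delay(graph, registers, src, dst)
    if first is None:
        return None
    segments = list(first)  # k delays <=> k + 1 zero-delay segments
    while len(segments) - 1 < n_delays:
        used = {seg[0] for seg in segments[1:]}  # registers already holding a delay
        best = None
        for i, seg in enumerate(segments):
            repl = route_1_delay(graph, registers - used, seg[0], seg[-1])
            if repl is None:
                continue
            extra = hops(repl[0]) + hops(repl[1]) - hops(seg)
            if best is None or extra < best[0]:  # ties keep the earliest segment
                best = (extra, i, repl)
        if best is None:
            return None  # no segment can absorb another delay
        _, i, (pre, post) = best
        segments[i:i + 1] = [pre, post]
    return segments


chain = {'S': ['A'], 'A': ['B'], 'B': ['C'], 'C': ['T']}
print(greedy_n_delay(chain, {'A', 'B', 'C'}, 'S', 'T', 2))
```

On this chain the sketch places delays at `C` and then `B`; asking for four delays returns None because only three register sites exist and the sketch does not allow several delays at one site.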
