This presentation describes an optimized version of PipeRoute, a pipelining-aware router for FPGAs based on Pathfinder, and uses it to explore pipelined interconnect structures for the RaPiD architecture. The router is timing-driven and supports multiple delays at each register site. In the initial routing results, pipelining costs an average of 75% in area, normalized to unpipelined circuit area; after the architecture exploration, the average pipelining cost is 18%. The exploration covers registered logic blocks (input, output, or none), bus connector (BC) track length, delays per register/BC, the mix of BC and non-BC routing, and the number of GPRs per cell. The resulting parameters give a 19% area*delay improvement over RaPiD-I, primarily from delay.
ACME LAB • Exploration of Pipelined FPGA Interconnect Structures • Scott Hauck, Akshay Sharma, Carl Ebeling (University of Washington) • Katherine Compton (University of Wisconsin - Madison)
PipeRoute • FPGA’2003: Pipelining-aware Router for FPGAs • Architecture-adaptive, based on Pathfinder • Uses optimal 2-terminal, 1-delay router • Greedy formulation for multi-delay, multi-terminal routing
RaPiD • Coarse-grained, 1D, 16-bit, w/DSP Units • Carl Ebeling @ UW-CSE • Pipelined interconnect via Bus Connectors (BCs) • [Figure: RaPiD functional units: GPR, RAM, MULT, ALU]
Pipelined Routing Results • Area expansion due to pipelining • Normalized to unpipelined circuit area • Ave: 75% cost
Contributions • Optimized PipeRoute • Support multiple delays per BC (greedy preprocessor) • Timing driven – Pathfinder’s, worst-case criticality across signal • RouteCost = Criticality * delay_cost + (1-criticality) * area_cost • Arch. Exploration of RaPiD Pipelined Interconnects • Registered logic block (input/output/none) • BC track length • Delays per register/BC • BC/non-BC routing mix • Register-only logic blocks • Goal: More efficient support of pipelined interconnects
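The RouteCost formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and reading "worst-case criticality across signal" as the largest criticality over a signal's sinks is an assumption, not something the slides spell out.

```python
def route_cost(criticality, delay_cost, area_cost):
    """RouteCost = criticality * delay_cost + (1 - criticality) * area_cost."""
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost


def signal_criticality(sink_criticalities):
    """Assumed reading of 'worst-case criticality across signal':
    use the largest criticality found among the signal's sinks."""
    return max(sink_criticalities)


# Hypothetical numbers: a signal with three sinks and made-up costs.
crit = signal_criticality([0.25, 0.5, 0.75])
print(route_cost(crit, delay_cost=4.0, area_cost=8.0))  # 0.75*4 + 0.25*8 = 5.0
```

With criticality near 1 the router pays almost only for delay; with criticality near 0 it pays almost only for area, so non-critical signals are free to take longer, cheaper routes.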
Methodology • Benchmarks: Retimed, not C-slowed • Graphs: Increase arch to fit (cells, tracks/cell) • Variation around local minima
Registers in Logic Blocks • Output Registers • No Registers • Input Registers • [Figure: results chart; labeled values 5%, 20%, 23%]
Delays per Register/BC • 1 Delay/BC • 2 Delays/BC • [Figure: results chart; labeled values 15%, 20%, 30%]
BC Track Length • Length 16 BC wires • Length 8 BC wires • [Figure: results chart; labeled values 17%, 64%, 69%]
Routing Resource Mix (BC vs. non-BC) • 5/7 • 7/7 • [Figure: results chart; labeled values 19%, 17%, 18%]
GPRs per Cell • GPR roles: • Registers from computation • Passthrough for changing tracks • 6 per cell • 9 per cell • [Figure: results chart; labeled values 6%, 23%, 22%]
Overall – vs. RaPiD-I • RaPiD-I: 1 BC / cell (13 LBs long), 5/7 BC tracks, 3 registers / BC, 6 GPRs / cell, registered outputs • Post-Explore: 1 BC / cell (16 LBs long), 5/7 BC tracks, 3 registers / BC, 9 GPRs / cell, registered inputs • Ave: 1%, 18%, 19%
Overall – Pipelining Cost • Ave: 18% cost
Conclusions • Router for arbitrary pipelined architectures • Timing-driven • Supports multiple delays at each register site • Good quality: <18% area overhead vs. pseudo-lower bound (non-pipelined area) • Architecture Exploration of RaPiD • Parameters: • Registered inputs on functional units • Length 16 wires • 3 delays per BC/register • 2/7 non-registered, 5/7 registered wires • 9 GPRs/cell to improve flexibility • Delay: spacing of registers CRITICAL, too close better than too far • 19% area*delay improvement over RaPiD-I (primarily delay)
1-Delay Two Terminal • Can do optimal routing for 1-delay routes via BFS • [Figure: route from source S to sink T]
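One way to picture a BFS-based 1-delay search is a breadth-first search over (node, delays picked up) states, sketched below. Everything here is an assumption made for illustration: the graph format, the names, unit edge costs (fewest hops rather than PipeRoute's actual cost function), and the rule that a register site may be either used or bypassed. The sketch also does not check that the pre-register and post-register halves of the route avoid sharing nodes, which a real router has to guarantee.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Fewest-hop path from source to sink that is latched at exactly one
    register site, or None if no such path exists.

    graph:     dict node -> list of successor nodes (hypothetical format)
    registers: set of nodes that can latch the signal
    """
    start = (source, 0)           # state = (node, delays picked up so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == (sink, 1):
            path = []
            while state is not None:      # walk parents back to the source
                path.append(state[0])
                state = parent[state]
            return path[::-1]
        node, delays = state
        for nxt in graph.get(node, ()):
            options = [(nxt, delays)]     # pass through without latching
            if delays == 0 and nxt in registers:
                options.append((nxt, 1))  # latch the signal at this site
            for option in options:
                if option not in parent:
                    parent[option] = state
                    queue.append(option)
    return None


# Hypothetical example: the short route S-a-T has no register,
# so the search must take the longer route through register r.
graph = {"S": ["a", "r"], "a": ["T"], "r": ["b"], "b": ["T"]}
print(one_delay_route(graph, {"r"}, "S", "T"))  # ['S', 'r', 'b', 'T']
```

Because the state records whether the delay has been picked up, reaching the sink with zero delays does not end the search.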
N-Delay Two Terminal • Greedy Approximation via 1-Delay Router • Find 1-delay route • While not enough delay on route • Replace any 0-delay segment with cheapest 1-delay replacement • [Figure: route from source S to sink T]
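The greedy loop on this slide can be sketched as follows. This is a simplified illustration, not PipeRoute itself: the names and graph format are hypothetical, cost is counted in hops, the source and sink are assumed not to be register sites, and "cheapest 1-delay replacement" is read as the replacement that adds the fewest hops over all 0-delay segments.

```python
from collections import deque


def one_delay_segment(graph, registers, src, dst, banned=frozenset()):
    """Fewest-hop path src -> dst latched at exactly one register site.
    Returns (path, latched_node) or None; nodes in `banned` are avoided."""
    start = (src, 0)                        # state = (node, delays so far)
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == (dst, 1):
            states = []
            while state is not None:
                states.append(state)
                state = parent[state]
            states.reverse()
            latch = next(n for n, d in states if d == 1)
            return [n for n, _ in states], latch
        node, delays = state
        for nxt in graph.get(node, ()):
            if nxt in banned:
                continue
            options = [(nxt, delays)]
            if delays == 0 and nxt in registers:
                options.append((nxt, 1))    # latch at this register site
            for option in options:
                if option not in parent:
                    parent[option] = state
                    queue.append(option)
    return None


def n_delay_route(graph, registers, src, dst, n):
    """Greedy N-delay route: returns (path, latched registers in order) or None."""
    found = one_delay_segment(graph, registers, src, dst)
    if found is None:
        return None
    path, latch = found
    latched = [latch]
    while len(latched) < n:
        # 0-delay segments lie between source, latched registers, and sink.
        cuts = [0] + [path.index(r) for r in latched] + [len(path) - 1]
        best = None
        for i, (a, b) in enumerate(zip(cuts, cuts[1:])):
            outside = set(path[:a]) | set(path[b + 1:])
            cand = one_delay_segment(graph, registers - set(latched),
                                     path[a], path[b], banned=outside)
            if cand is None:
                continue
            added = len(cand[0]) - (b - a + 1)   # extra hops this swap costs
            if best is None or added < best[0]:
                best = (added, a, b, i, cand[0], cand[1])
        if best is None:
            return None                     # no segment can absorb a delay
        _, a, b, i, segment, new_latch = best
        path = path[:a] + segment + path[b + 1:]
        latched.insert(i, new_latch)
    return path, latched


# Hypothetical example with two register sites, r1 and r2.
graph = {"S": ["a", "r1"], "a": ["T"], "r1": ["b"], "b": ["r2", "T"], "r2": ["T"]}
print(n_delay_route(graph, {"r1", "r2"}, "S", "T", 2))
# (['S', 'r1', 'b', 'r2', 'T'], ['r1', 'r2'])
```

Each pass adds exactly one delay, so a route needing N delays takes N - 1 replacement passes after the initial 1-delay route. Being greedy, the result is an approximation and may miss a cheaper route that a joint search would find.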