Exploration of Pipelined FPGA Interconnect Structures

ACME LAB: Exploration of Pipelined FPGA Interconnect Structures. Scott Hauck, Akshay Sharma, Carl Ebeling (University of Washington); Katherine Compton (University of Wisconsin - Madison)

PipeRoute • FPGA’2003: Pipelining-aware Router for FPGAs • Architecture-adaptive, based on Pathfinder • Uses optimal 2-terminal, 1-delay router • Greedy formulation for multi-delay, multi-terminal routing

RaPiD • Coarse-grained, 1D, 16-bit, with DSP units • Carl Ebeling @ UW-CSE • Pipelined interconnect via Bus Connectors (BCs) [Figure: RaPiD cell built from GPR, RAM, MULT, and ALU units]

Pipelined Routing Results • Area expansion due to pipelining • Normalized to unpipelined circuit area • Ave: 75% cost

Contributions • Optimized PipeRoute • Support multiple delays per BC (greedy preprocessor) • Timing-driven (Pathfinder’s formulation, worst-case criticality across signal) • RouteCost = criticality * delay_cost + (1 - criticality) * area_cost • Architecture exploration of RaPiD pipelined interconnects • Registered logic block (input/output/none) • BC track length • Delays per register/BC • BC/non-BC routing mix • Register-only logic blocks • Goal: more efficient support of pipelined interconnects
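The RouteCost formula on this slide can be illustrated with a minimal sketch. The function names `route_cost` and `signal_criticality` are hypothetical, and taking the maximum over per-sink criticalities is one reading of "worst-case criticality across signal"; the slides do not show the router's actual code.

```python
def signal_criticality(sink_criticalities):
    """Worst-case criticality across a signal's sinks (assumed here to be the maximum)."""
    return max(sink_criticalities)


def route_cost(criticality, delay_cost, area_cost):
    """Blend delay and area cost as in the slide:
    RouteCost = criticality * delay_cost + (1 - criticality) * area_cost.
    """
    if not 0.0 <= criticality <= 1.0:
        raise ValueError("criticality must lie in [0, 1]")
    return criticality * delay_cost + (1.0 - criticality) * area_cost
```

With criticality near 1 the router pays almost only for delay; with criticality near 0 it pays almost only for area (congestion), which is the usual Pathfinder-style trade-off.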

Methodology • Benchmarks • Retimed, not C-slowed • Graphs • Increase arch to fit (cells, tracks/cell) • Variation around local minima

Registers in Logic Blocks • Output Registers • No Registers • Input Registers • 5%, 20%, 23%

Delays per Register/BC • 1 Delay/BC • 2 Delays/BC • 15%, 20%, 30%

BC Track Length • Length 16 BC wires • Length 8 BC wires • 17%, 64%, 69%

Routing Resource Mix (BC vs. non-BC) • 5/7 • 7/7 • 19%, 17%, 18%

GPRs per Cell • GPR roles: • Registers from computation • Passthrough for changing tracks • 6 per cell • 9 per cell • 6%, 23%, 22%

Overall – vs. RaPiD-I • RaPiD-I: 1 BC / cell (13 LBs long), 5/7 BC tracks, 3 registers / BC, 6 GPRs / cell, registered outputs • Post-Explore: 1 BC / cell (16 LBs long), 5/7 BC tracks, 3 registers / BC, 9 GPRs / cell, registered inputs • Ave: 1%, 18%, 19%

Overall – Pipelining Cost • Ave: 18% cost

Conclusions • Router for arbitrary pipelined architectures • Timing-driven • Supports multiple delays at each register site • Good quality: <18% of pseudo-lower bound (non-pipelined) area • Architecture Exploration of RaPiD • Parameters: • Registered inputs on functional units • Length 16 wires • 3 delays per BC/register • 2/7 non-registered, 5/7 registered wires • 9 GPRs/cell to improve flexibility • Delay: spacing of registers CRITICAL, too close better than too far • 19% area*delay improvement over RaPiD-I (primarily delay)

*** End of Talk Marker ***

1-Delay Two Terminal • Can do optimal routing for 1-delay routes via BFS
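One way to picture a BFS-based 1-delay search is a breadth-first search over (node, registers used) states. The sketch below is an illustration under stated assumptions, not PipeRoute's actual algorithm: the graph is a directed adjacency dict with unit edge costs, every register node entered counts as one delay, the source and sink are not registers, and the name `one_delay_route` is hypothetical.

```python
from collections import deque


def one_delay_route(graph, registers, source, sink):
    """Shortest source->sink node list that enters exactly one register node.

    Returns None if no such route is found.
    """
    start = (source, 0)
    parent = {start: None}          # state -> predecessor state
    queue = deque([start])
    while queue:
        node, used = queue.popleft()
        if node == sink:
            if used != 1:
                continue            # reached the sink without a delay: dead end
            path, state = [], (node, used)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            path.reverse()
            # Sketch limitation: a walk that revisits a node is rejected,
            # not repaired; a real router must handle this case properly.
            return path if len(set(path)) == len(path) else None
        for nxt in graph.get(node, ()):
            nxt_used = used + (1 if nxt in registers else 0)
            if nxt_used > 1:
                continue            # would pick up a second delay
            state = (nxt, nxt_used)
            if state not in parent:
                parent[state] = (node, used)
                queue.append(state)
    return None
```

Because each node can be visited once with zero registers and once with one register, the search space is at most twice the size of the routing graph.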

N-Delay Two Terminal • Greedy Approximation via 1-Delay Router • Find 1-delay route • While not enough delay on route • Replace any 0-delay segment with cheapest 1-delay replacement
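The greedy loop on this slide can be sketched as follows. This is a simplified illustration, not the PipeRoute implementation: it assumes a directed graph with unit edge costs, treats every register node on the route as one delay, assumes the source and sink are not registers, and measures "cheapest" as the fewest added nodes. All function names are hypothetical.

```python
from collections import deque


def one_delay_route(graph, registers, src, dst, blocked=frozenset()):
    """Shortest src->dst path entering exactly one register (dst itself not counted)."""
    start = (src, 0)
    parent = {start: None}
    queue = deque([start])
    while queue:
        node, used = queue.popleft()
        if node == dst:
            if used != 1:
                continue
            path, state = [], (node, used)
            while state is not None:
                path.append(state[0])
                state = parent[state]
            path.reverse()
            # Sketch limitation: walks that revisit a node are rejected.
            return path if len(set(path)) == len(path) else None
        for nxt in graph.get(node, ()):
            if nxt in blocked:
                continue
            counts = nxt in registers and nxt != dst
            nxt_used = used + (1 if counts else 0)
            state = (nxt, nxt_used)
            if nxt_used <= 1 and state not in parent:
                parent[state] = (node, used)
                queue.append(state)
    return None


def n_delay_route(graph, registers, source, sink, n):
    """Greedy N-delay route: start from a 1-delay route, then repeatedly swap
    one 0-delay segment for its cheapest 1-delay replacement."""
    route = one_delay_route(graph, registers, source, sink)
    while route is not None:
        last = len(route) - 1
        # Cut the route at its interior register nodes; each piece has 0 delays.
        cuts = [0] + [i for i in range(1, last) if route[i] in registers] + [last]
        if len(cuts) - 2 >= n:
            return route                      # enough delays picked up
        best = None
        for lo, hi in zip(cuts, cuts[1:]):
            blocked = set(route[:lo]) | set(route[hi + 1:])
            repl = one_delay_route(graph, registers, route[lo], route[hi], blocked)
            if repl is None:
                continue
            extra = len(repl) - (hi - lo + 1)  # nodes added by the swap
            if best is None or extra < best[0]:
                best = (extra, lo, hi, repl)
        if best is None:
            return None                       # no segment can absorb a delay
        _, lo, hi, repl = best
        route = route[:lo] + repl + route[hi + 1:]
    return None


# Hypothetical example data: D1, D2, D3 are register sites.
example_graph = {"S": ["D1", "D2"], "D2": ["D1"], "D1": ["T", "x"],
                 "x": ["D3"], "D3": ["T"], "T": []}
example_registers = {"D1", "D2", "D3"}
```

Blocking the nodes outside the segment being replaced keeps each replacement from colliding with the rest of the route. As with any greedy scheme, the result is an approximation and may miss a cheaper N-delay route that a global search would find.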
