
The Optimization of Interconnection Networks in FPGAs



  1. The Optimization of Interconnection Networks in FPGAs. Dr. Yajun Ha, Assistant Professor, Department of Electrical & Computer Engineering, National University of Singapore. Dagstuhl Seminar.

  2. Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion

  3. FPGA Research Challenges • Research challenges for FPGA architectures and tools are closely linked. • FPGA challenges stem from the underlying semiconductor technologies. • Scaling semiconductor technologies brings the following new challenges, each with a corresponding architectural or tool response: • Leakage power: dual-Vt, dual-Vdd, or subthreshold architectures; • Process variations: reconfigurability for variability, fault tolerance; • Substantially more transistors: scalable, multi-core, secure architectures and SLD. [Diagram: Technology drives both Architecture and Tools.]

  4. Motivation • Logic and interconnect are unbalanced in FPGAs. • Qualitatively: • “PLDs are 90% routing and 10% logic.” • Prof. Jonathan Rose, Design of Interconnection Networks for Programmable Logic, Kluwer Academic Publishers, 2004, page xix; • “…(in FPGAs) programmable interconnect comes at a substantial cost in area, performance and power.” • Prof. Jan Rabaey, Digital Integrated Circuits, 2nd Edition, Prentice-Hall, 2003, page 413; • Quantitatively: • Area: logic area vs. routing area; • Delay: logic delay vs. net delay; • Power: dynamic power consumed by logic vs. by interconnect.

  5. Imbalance: Area [Figure: relative weight of routing area vs. logic area for the 20 largest MCNC benchmark circuits, assuming the PTM 90nm CMOS process. Data produced by VPR v5.0.2.]

  6. Imbalance: Delay [Figure: delay breakdown along the critical path for the 20 largest MCNC benchmarks, assuming the PTM 90nm CMOS process. Data produced by VPR v5.0.2.]

  7. Imbalance: Power [Figure: dynamic power breakdown for a real circuit [1], assuming the Xilinx Virtex-II FPGAs. Note: Double: the length-2 wires; Hex: the length-6 wires; Long: the long wires spanning the whole chip; IXbar & OXbar: the crossbars at the input and output pins of the logic blocks.] [1] L. Shang, A. Kaviani and K. Bathala, “Dynamic power consumption in Virtex-II FPGA family,” ACM FPGA, 2002.

  8. Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion

  9. Intra-Clock-Cycle Idleness • The clock cycle is constrained by the critical path delay, so many wires are idle for a significant fraction of each clock cycle. • An example (quantified in the sketch below): • clma: the largest circuit (~8400 4-input LUTs) in the MCNC benchmark suite; • Use VPR v5.0.2 to implement it on an island-style FPGA (ten 4-input LUTs per CLB, 100% length-4 wires), assuming the PTM 90nm CMOS process; • Timing results after P&R: • critical path delay = 9.50 ns; • the delays of most nets (~96.5%) are less than 1 ns; • Expensive wires are often underutilized.
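The statistics above can be reproduced from any post-P&R timing report. The sketch below is a minimal illustration (ours, not from the slides) of how such idleness figures are computed; the delay list is a hypothetical placeholder for the per-net delays that VPR v5.0.2 reports.

```python
# Minimal sketch: quantify intra-clock-cycle idleness from per-net delays.
# The delay values below are hypothetical placeholders; real values would
# be parsed from a VPR v5.0.2 post-P&R timing report.

def idleness_stats(net_delays_ns, critical_path_ns):
    """Share of nets settling under 1 ns, and the mean idle fraction.

    A net that settles in t ns leaves its wire idle for the remaining
    (critical_path_ns - t) ns of every clock cycle, since the cycle
    length is set by the critical path.
    """
    share_fast = sum(d < 1.0 for d in net_delays_ns) / len(net_delays_ns)
    mean_idle = sum(1.0 - d / critical_path_ns
                    for d in net_delays_ns) / len(net_delays_ns)
    return share_fast, mean_idle

delays = [0.3, 0.5, 0.8, 0.9, 2.1, 9.5]               # hypothetical net delays
print(idleness_stats(delays, critical_path_ns=9.50))  # 9.50 ns: clma's critical path
```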

  10. Time-Multiplexing [Figure: nets N1 and N2 between CLBs. With conventional switches, the two nets use two routing wires; with multi-context switches, the two nets share one wire.] • Use switches with multiple contexts to time-multiplex wires and keep them busy; • This can potentially save wire area and achieve better timing performance (see the behavioral sketch below).
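To make the sharing mechanism concrete, here is a small behavioral model (our sketch with invented names, not the actual switch circuit) of a switch with multiple configuration contexts: the active context rotates each micro-cycle, so two nets can take turns on one physical wire.

```python
# Behavioral sketch of a multi-context routing switch. Each context maps
# input pins to output pins; the active context rotates every micro-cycle,
# so different nets can drive the same wire in different micro-cycles.

class MultiContextSwitch:
    def __init__(self, contexts):
        self.contexts = contexts  # contexts[i]: pin map for micro-cycle i

    def route(self, micro_cycle, input_pin):
        config = self.contexts[micro_cycle % len(self.contexts)]
        return config.get(input_pin)  # wire driven in this micro-cycle, if any

# Two contexts: net N1 gets the shared wire in even micro-cycles,
# net N2 in odd ones (two nets, one wire).
switch = MultiContextSwitch([{"N1_src": "wire0"}, {"N2_src": "wire0"}])
assert switch.route(0, "N1_src") == "wire0"
assert switch.route(1, "N2_src") == "wire0"
assert switch.route(1, "N1_src") is None  # N1 is off the wire this micro-cycle
```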

  11. Preliminary Results • Bring time-multiplexing enhancements to existing CAD tools; • Preliminary studies show positive results: • for 16 MCNC benchmark circuits, ~11.5% saving in the minimum required number of wires, at the cost of ~1.5% timing overhead; • for the same 16 circuits, ~8.2% reduction in critical path delay using the same number of wires; • See [1] [2] for details. [1] H. Liu et al., “An Area-Efficient Timing-Driven Routing Algorithm for Scalable FPGAs with Time-Multiplexed Interconnects,” FCCM 2008. [2] H. Liu et al., “An Architecture and Timing-Driven Routing Algorithm for Area-Efficient FPGAs with Time-Multiplexed Interconnects,” FPL 2008.

  12. TM FPGA Challenges and Ongoing Work • The TM rate cannot be too high if the TM clock rate is to stay reasonable; we are currently targeting a rate of 2-4 (see the sketch below). • The nets that qualify for TM are limited, since most nets finish within the first micro-cycle. • Dual-Vt architectures are proposed to adjust delays, achieving lower power and more TM opportunities.
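The TM-rate constraint is simple arithmetic: the interconnect is reconfigured once per micro-cycle, so the TM clock must run TM-rate times faster than the user clock. A back-of-the-envelope sketch (reusing the clma numbers from slide 9; the trade-off framing is ours):

```python
# Why the TM rate is kept at 2-4: the TM clock scales linearly with the rate.

def micro_cycle_ns(user_period_ns, tm_rate):
    return user_period_ns / tm_rate

for rate in (2, 3, 4, 8):
    mc = micro_cycle_ns(9.50, rate)  # 9.50 ns: clma critical path delay
    print(f"TM rate {rate}: micro-cycle = {mc:.2f} ns, "
          f"TM clock = {1000 / mc:.0f} MHz")
# At a rate of 8 the TM clock would already exceed 800 MHz, and only nets
# that settle within a single micro-cycle qualify for time-multiplexing.
```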

  13. Outline • Background and Motivation • Time-multiplexed interconnects in FPGAs • sFPGA2 architecture • Conclusion

  14. Motivation • In current FPGAs, the switching requirement grows superlinearly with the number of logic resources; in other words, the current architecture scales poorly (a scaling illustration follows below). • To address this, we need to organize FPGA interconnect wires hierarchically to achieve scalability [3]. [3] Rizwan Syed et al., “sFPGA2 - A Scalable GALS FPGA Architecture and Design Methodology,” FPL 2009.
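The superlinear trend can be illustrated with a Rent's-rule-style estimate (our assumption for illustration; the slides state only the trend). With a Rent exponent p > 0.5, the required channel width in a 2-D array grows roughly as N^(p - 0.5), so total switching resources grow as N^(p + 0.5), superlinear in the number of logic blocks N:

```python
# Illustration of superlinear switching growth under a Rent's-rule-style
# model: channel width ~ N**(p - 0.5), total switch area ~ N**(p + 0.5).

def relative_switch_area(n_blocks, p=0.65):  # p = 0.65: a typical Rent exponent
    return n_blocks ** (p + 0.5)

for n in (1_000, 10_000, 100_000):
    per_block = relative_switch_area(n) / n  # switch area per logic block
    print(f"N = {n:>7}: relative switch area per logic block = {per_block:.1f}")
# The per-block switching cost keeps rising with N, which is exactly the
# poor scaling that a hierarchical interconnect is meant to break.
```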

  15. How Are Multiple FPGAs Connected? [Figure: MGT-based serial switched interconnect; PCI Express.] • Serial, switch-based interconnects are the future of peripheral interconnect!

  16. sFPGA2 Is an On-Chip Version • sFPGA2 is a scalable FPGA architecture with a hierarchical routing network that employs high-speed serial links and switches to route multiple nets simultaneously [3]. • It consists of two levels: • a base level (e.g., A0…A7, S0); • higher levels (e.g., X0). [Architecture block diagram] [3] Rizwan Syed et al., “sFPGA2 - A Scalable GALS FPGA Architecture and Design Methodology,” FPL 2009.

  17. sFPGA2 Architecture (Contd) [Figure courtesy of Xilinx (Virtex-II Pro)] • A0…A7 are FPGA tiles (similar to current FPGAs). S0 contains very high speed transceivers capable of aggregating multiple high-speed serial links into one very high speed link.

  18. sFPGA2 (Contd) • Routing uses either of the two methodologies shown in the figure (sketched in code below). • Intra-cluster routing uses only the switch blocks and channels in that level. • Inter-cluster routing uses the very high speed links and switches.
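A minimal sketch of that routing decision (cluster names follow the block diagram on slide 16; the function itself is our illustration, not the published algorithm):

```python
# Two-level routing decision in an sFPGA2-style hierarchy: intra-cluster
# nets stay on local switch blocks and channels; inter-cluster nets are
# serialized over the high-speed links through the level-1 switch.

def route(src_cluster: str, dst_cluster: str) -> list[str]:
    if src_cluster == dst_cluster:
        return [src_cluster]  # local switch blocks and channels only
    # Cross the hierarchy: serialize at the source tile, traverse the
    # high-speed serial switch (e.g. S0), deserialize at the destination.
    return [src_cluster, "S0 (serial link/switch)", dst_cluster]

print(route("A0", "A0"))  # stays inside tile A0
print(route("A0", "A7"))  # crosses via S0's serial links
```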

  19. Design Methodology [Figure: a data-flow graph with nodes v0…vn (NOP, *, +, <, and - operations); one edge is highlighted as an inter-tile net.] • The new step is dealing with inter-tile nets!

  20. Preliminary Results • Successfully implemented a JPEG engine and demonstrated transporting groups of nets on an emulation platform built from 3 Xilinx Virtex-II Pro FPGA boards; serial communication was emulated with MGTs. • Preliminary studies show that transport latency is very high, mainly due to high-latency transceivers, which limits the application domain to GALS designs. However, as transceivers improve, the approach can be extended to purely synchronous designs as well.

  21. Conclusion • The logic/interconnect imbalance in FPGAs makes optimizing the interconnection network important. • Significant intra-clock-cycle idleness exists in FPGA routing wires. • Time-multiplexing increases resource utilization, and can potentially save area and achieve better timing. • The current FPGA interconnection network is not scalable. • An on-chip network of switches and serial links can improve scalability. • Promising preliminary results justify our approaches; future work needs to thoroughly investigate the impact of the architecture changes.

  22. Multi-FPGA or Multi-Core? [Figure: FPGA tiles and uP tiles connected by NoCs.] • Building multi-FPGA or multi-core chips will not be difficult as semiconductor technology develops. • We (hardware engineers) know more about programming multi-FPGA systems than about programming multi-core processors. • Should we use VHDL/Verilog as the (intermediate) programming language for both multi-FPGA and multi-core?

  23. Thank you!

  24. See also • VPR v5.0.2 – Versatile Placement & Routing tool for heterogeneous FPGAs: http://www.eecg.utoronto.ca/vpr/ • Predictive Technology Model (PTM): http://www.eas.asu.edu/~ptm
