
The Case for Embedded NoCs on FPGAs


Presentation Transcript


  1. The Case for Embedded NoCs on FPGAs. Mohamed ABDELFATTAH, Vaughn BETZ

  2. Outline: 1) Why NoCs on FPGAs? 2) Embedded NoCs. 3) Area & Power Analysis. 4) Comparison Against P2P/Buses.

  3. 1. Why NoCs on FPGAs? Motivation: today's FPGA consists of logic blocks connected by an interconnect of switch blocks and wires.

  4. 1. Why NoCs on FPGAs? Motivation: besides logic blocks, switch blocks, and wires, modern FPGAs also contain hard blocks: memories, multipliers, and processors.

  5. 1. Why NoCs on FPGAs? Motivation: hard interfaces (DDR, PCIe, ...) now run at ~1600 MHz and hard blocks (memories, multipliers, processors) at ~800 MHz, yet the interconnect is still the same, with logic and wires running at ~200 MHz.

  6. 1. Why NoCs on FPGAs? Problems: • Bandwidth requirements for hard logic/interfaces: the DDR3 PHY and controller, PCIe controller, and Gigabit Ethernet (800-1600 MHz) must be served by a ~200 MHz fabric. • Timing closure.

  7. 1. Why NoCs on FPGAs? Problems: • Bandwidth requirements for hard logic/interfaces (DDR3 PHY and controller, PCIe controller, Gigabit Ethernet). • Timing closure. • High interconnect utilization: a huge CAD problem, slow compilation, and high power/area utilization. • Wire speed is not scaling: delay is interconnect-dominated.

  8. Analogy (source: Google Earth): Los Angeles vs. Barcelona. Keep the "roads" (the existing interconnect among logic clusters), but add "freeways" (dedicated paths for heavy traffic to and from the hard blocks).

  9. 1. Why NoCs on FPGAs? FPGA with NoC: routers and links are overlaid on the fabric alongside the DDR3 PHY and controller, PCIe controller, and Gigabit Ethernet. A router forwards each data packet over the NoC links, and the destination router moves the data onto the local interconnect. (Problem list as on slide 7.)

  10. 1. Why NoCs on FPGAs? FPGA with NoC: • The high-bandwidth endpoints (DDR3, PCIe, Gigabit Ethernet) are known, so the NoC can be pre-designed to their requirements. • NoC links are "re-usable" and the NoC is heavily "pipelined". • The NoC abstraction favours modularity: parallel compilation, partial reconfiguration, multi-chip interconnect. (Problem list as on slide 7.)

  11. 1. Why NoCs on FPGAs? NoCs can simplify FPGA design: they provide latency-tolerant communication and an abstraction that favours modularity (parallel compilation, partial reconfiguration, multi-chip interconnect). Remaining questions: How do we integrate NoCs into FPGAs? Does the NoC abstraction come at a high area/power cost? How do embedded NoCs compare to current interconnects?

  12. Outline: 1) Why NoCs on FPGAs? 2) Embedded NoCs: mixed NoCs and hard NoCs. 3) Area & Power Analysis. 4) Comparison Against P2P/Buses.

  13. 2. Embedded NoCs. Three ways to build the NoC: "Soft" NoC = soft routers + soft links; "Mixed" NoC = hard routers + soft links; "Hard" NoC = hard routers + hard links.

  14. Methodology: the soft NoC is characterized with the FPGA CAD tools, while the mixed and hard NoCs use ASIC CAD tools: Design Compiler for area and speed, gate-level simulation with toggle rates for power, and HSPICE for the wires.

  15. 2. Embedded NoCs: Mixed NoCs. The FPGA consists of logic blocks connected by programmable "soft" interconnect; the baseline router is hardened and embedded in this fabric. "Mixed" NoC = hard routers + soft links.

  16. 2. Embedded NoCs: Mixed NoCs. The hard routers are connected to each other through the FPGA's regular soft interconnect. "Mixed" NoC = hard routers + soft links.

  17. 2. Embedded NoCs: Mixed NoCs. Special feature: configurable topology. A mesh is assumed here, but since the links are soft, the routers can form any topology.

  18. 2. Embedded NoCs: Hard NoCs. The FPGA consists of logic blocks with programmable "soft" interconnect; here the routers are connected by dedicated "hard" interconnect instead. "Hard" NoC = hard routers + hard links.

  19. 2. Embedded NoCs: Hard NoCs. Hard routers joined by dedicated hard links are embedded in the FPGA. "Hard" NoC = hard routers + hard links.

  20. 2. Embedded NoCs: Hard NoCs. Special feature: low-V mode. Running the routers at 0.9 V instead of 1.1 V saves ~33% of dynamic power while being ~15% slower. "Hard" NoC = hard routers + hard links.
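The ~33% figure matches first-order dynamic-power scaling with the square of the supply voltage; a minimal back-of-the-envelope sketch (the quadratic V dependence with activity, capacitance, and frequency held constant is an assumption for illustration, not something stated on the slide):

    # Low-V saving check, assuming P_dyn ~ alpha * C * f * V^2 with alpha, C, f fixed.
    v_nominal = 1.1                              # volts
    v_low = 0.9                                  # volts
    power_ratio = (v_low / v_nominal) ** 2       # dynamic power relative to nominal
    print(f"power at 0.9 V: {power_ratio:.2f} of nominal")   # ~0.67
    print(f"saving: {1.0 - power_ratio:.0%}")                # ~33%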

  21. 2. Embedded NoCs: Fabric Port. The fabric port bridges the NoC and the FPGA fabric: width adaptation, frequency adaptation, voltage adaptation, and a bus protocol such as AXI.

  22. Outline: 1) Why NoCs on FPGAs? 2) Embedded NoCs. 3) Area & Power Analysis: system area/power; soft vs. mixed vs. hard. 4) Comparison Against P2P/Buses.

  23. 3. Area/Power Analysis: Router Microarchitecture. • A state-of-the-art router architecture from Stanford is used: the NoC community has excelled at building on-chip routers, so we simply reuse their work. • A high-performance router is chosen to meet FPGA bandwidth requirements. • Complex functionality such as virtual channels is kept, since assigning traffic priority could be useful.
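As a rough illustration of how virtual channels enable traffic priority, here is a minimal sketch in plain Python (an illustrative toy, not the Stanford router's actual allocator): each input port keeps one FIFO per VC, and the highest-priority VC with a flit ready is always drained first.

    from collections import deque

    class InputPort:
        """Toy router input port with one FIFO per virtual channel (VC).
        Lower VC index = higher priority. Illustrative only."""
        def __init__(self, num_vcs=2):
            self.vcs = [deque() for _ in range(num_vcs)]

        def accept(self, vc, flit):
            self.vcs[vc].append(flit)

        def next_flit(self):
            # Strict-priority selection: the highest-priority non-empty VC wins.
            for vc, fifo in enumerate(self.vcs):
                if fifo:
                    return vc, fifo.popleft()
            return None

    # Low-priority bulk traffic on VC 1 cannot block a VC 0 control flit.
    port = InputPort(num_vcs=2)
    port.accept(1, "bulk data flit")
    port.accept(0, "control flit")
    print(port.next_flit())   # -> (0, 'control flit')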

  24. 3. Area/Power Analysis: Routers and Links. Hard router vs. soft router: 30X smaller, 6X faster, 14X lower power. Hard links vs. soft links: 9X smaller, 2.4X faster, 1.4X lower power.

  25. 3. Area/Power Analysis: Soft, Mixed and Hard [65 nm], for a 64-node NoC on Stratix III. • Area: hard 448 LBs, mixed 576 LBs (roughly 1.5% of the FPGA each); soft ~12,500 LBs (33% of the FPGA). • NoC speed: hard/mixed 730-940 MHz; soft 166 MHz. • Bisection bandwidth: ~50 GB/s vs. ~10 GB/s for the soft NoC.
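For intuition, the bisection-bandwidth figures are roughly what an 8x8 mesh of 32-bit routers provides: cutting the mesh in half crosses eight links, each moving one flit per cycle in each direction. A minimal sketch of that estimate (the mesh shape, 32-bit flit width, and counting both link directions are illustrative assumptions, so it lands slightly above the quoted ~50 GB/s):

    def mesh_bisection_bw(nodes_per_side, flit_bits, freq_ghz, duplex=True):
        """Peak bisection bandwidth of a square mesh NoC, in GB/s."""
        links_cut = nodes_per_side          # links crossing the middle of the mesh
        directions = 2 if duplex else 1     # count both directions of each link
        return links_cut * directions * (flit_bits / 8) * freq_ghz

    # 64-node (8x8) mesh with 32-bit flits
    print(mesh_bisection_bw(8, 32, 0.94))    # hard NoC near 940 MHz -> ~60 GB/s
    print(mesh_bisection_bw(8, 32, 0.166))   # soft NoC at 166 MHz   -> ~10.6 GB/s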

  26. 3. Area/Power Analysis: Soft, Mixed and Hard [65 nm], 64-node NoC on Stratix III. Same comparison as slide 25, with two callouts: the hard (low-V) NoC provides ~50 GB/s of peak bisection bandwidth, and it is very cheap: the entire NoC costs less than 3 soft NoC nodes.

  27. 3. Area/Power Analysis: NoC Power Budget. For 250 GB/s of total NoC bandwidth, how much of a typical FPGA's dynamic power (17.4 W for the largest Stratix III device) goes to system-level communication? A soft NoC would need 123%, more than the whole budget.

  28. 3. Area/Power Analysis: NoC Power Budget. For the same 250 GB/s of total bandwidth, the mixed NoC needs 15% of the 17.4 W typical FPGA dynamic power (vs. 123% for the soft NoC).

  29. 3. Area/Power Analysis: NoC Power Budget. The hard NoC brings this down to 11% (vs. 15% mixed and 123% soft) of the 17.4 W typical FPGA dynamic power for 250 GB/s of total bandwidth.

  30. 3. Area/Power Analysis: NoC Power Budget. With low-V mode, the hard NoC needs only 7% (vs. 11% hard at nominal voltage, 15% mixed, and 123% soft) of the 17.4 W typical FPGA dynamic power for 250 GB/s of total bandwidth.
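In absolute terms those fractions are small for the embedded NoCs; a quick conversion to watts (the mapping of percentages to NoC variants follows the build-up across slides 27-30):

    # Quoted power-budget fractions at 250 GB/s total NoC bandwidth, converted to watts
    # relative to the 17.4 W typical dynamic power of the largest Stratix III device.
    fpga_dynamic_w = 17.4
    noc_budget = {"soft": 1.23, "mixed": 0.15, "hard": 0.11, "hard, low-V": 0.07}

    for variant, fraction in noc_budget.items():
        print(f"{variant:>12}: {fraction:>4.0%} -> {fraction * fpga_dynamic_w:.1f} W")
    # soft 21.4 W, mixed 2.6 W, hard 1.9 W, hard low-V 1.2 W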

  31. 3. Area/Power Analysis: Bandwidth in Perspective. Example system: DDR3, PCIe, Module 1, and Module 2 exchange four streams at 14.6 GB/s (the full theoretical DDR3 bandwidth) and four streams at 17 GB/s, some crossing the whole chip, for an aggregate bandwidth of 126 GB/s. NoC power budget for this traffic: 3.5%.
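The 126 GB/s aggregate is simply the sum of those streams; a quick check (the four-plus-four stream breakdown is read off the slide's figure):

    # Aggregate bandwidth of the slide-31 example system.
    streams_gb_s = [14.6] * 4 + [17.0] * 4   # four DDR3-rate streams + four 17 GB/s streams
    print(f"{sum(streams_gb_s):.1f} GB/s")   # -> 126.4 GB/s, the quoted ~126 GB/s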

  32. Outline: 1) Why NoCs on FPGAs? 2) Embedded NoCs. 3) Area & Power Analysis. 4) Comparison Against P2P/Buses: point-to-point links and Qsys buses.

  33. 4. Comparison: FPGA Interconnect. System-level interconnect on today's FPGAs is either just wires (point-to-point links) or wires plus logic (buses built from multiplexers and arbiters to serve multiple masters, multiple masters and multiple slaves, or broadcast). The NoC is another form of interconnect; here we compare the "just wires" point-to-point interconnect to NoCs.
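To make the "wires plus logic" option concrete, here is a minimal behavioural sketch of a multi-master shared bus: a round-robin arbiter deciding which master the multiplexer forwards each cycle (plain Python and purely illustrative; the logic Qsys actually generates is more involved).

    class SharedBus:
        """Toy multi-master, single-slave bus: a round-robin arbiter picks which
        master's request the multiplexer forwards each cycle. Illustrative only."""
        def __init__(self, num_masters):
            self.num_masters = num_masters
            self.last_grant = num_masters - 1   # so master 0 is considered first

        def arbitrate(self, requests):
            """requests[i] is the word master i wants to send this cycle, or None."""
            for offset in range(1, self.num_masters + 1):
                master = (self.last_grant + offset) % self.num_masters
                if requests[master] is not None:
                    self.last_grant = master
                    return master, requests[master]   # mux output for this cycle
            return None                               # bus idle

    bus = SharedBus(num_masters=3)
    print(bus.arbitrate(["A0", None, "C0"]))   # -> (0, 'A0')
    print(bus.arbitrate(["A1", None, "C0"]))   # -> (2, 'C0'); round-robin moves on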

  34. 4. Comparison: NoC Power vs. FPGA Interconnect. The hard NoC is high-performance and packet-switched, runs at 730-943 MHz (vs. ~200 MHz for the soft interconnect), adds about 1% area overhead on a Stratix V, and, over the length of one NoC link, its power is on par with the simplest FPGA interconnect. Hard and mixed NoCs are therefore area/power efficient.

  35. 4. Comparison: DDR3 with a Qsys Bus vs. an Embedded NoC. Embedded NoC: 16 nodes with hard routers and links. Qsys bus: a logical bus built from the soft fabric.

  36. 4. Comparison: Design Effort. Steps to close timing using Qsys when the communicating module is placed close to the DDR3 interface.

  37. 4. Comparison: Design Effort. Steps to close timing using Qsys when the module is placed far from the DDR3 interface.

  38. 4. Comparison: Design Effort. Steps to close timing using Qsys when the module is placed far away; timing closure can be simplified with an embedded NoC.

  39. 4. Comparison Area Comparison

  40. 4. Comparison Area Comparison

  41. 4. Comparison Area Comparison Entire NoC smaller than bus for 3 modules!

  42. 4. Comparison: Area Comparison. With only 1/8 of the hard NoC's bandwidth used, it already takes less area than the bus for most systems.

  43. 4. Comparison: Power Comparison. The hard NoC saves power even for the simplest systems.

  44. 1) Why NoCs on FPGAs? A big city needs freeways to handle its traffic. 2) Embedded NoCs, mixed & hard (vs. a soft NoC): power 9-15X, area 20-23X, speed 5-6X better. 3) Area & Power Analysis: area budget for 64 nodes ~1% of the FPGA; power budget for 100 GB/s: 3-7%. 4) Comparison Against P2P/Buses: raw efficiency close to the simplest P2P links; the NoC is more efficient with lower design effort.

  45. eecg.utoronto.ca/~mohamed/noc_designer.html

  46. Thank You! eecg.utoronto.ca/~mohamed/noc_designer.html

  47. 2. Embedded NoCs: Fabric Port. How do we connect a 200 MHz, 128-bit module to a 900 MHz, 32-bit router? A configurable time-domain mux/demux matches the bandwidth, and an asynchronous FIFO crosses the clock domains, giving the full NoC bandwidth without clock restrictions on the modules.
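The bandwidth matching works because 128 bits at 200 MHz is slightly less than 32 bits at 900 MHz; below is a minimal sketch of that check and of the width-conversion idea, splitting each 128-bit word into four 32-bit flits (an illustration only, not the fabric port's actual RTL; the asynchronous FIFO for the clock-domain crossing is not modelled).

    # Bandwidth check: module side vs. router side of the fabric port (Gb/s).
    module_bw = 128 * 0.200    # one 128-bit word per 200 MHz cycle -> 25.6 Gb/s
    router_bw = 32 * 0.900     # one 32-bit flit per 900 MHz cycle  -> 28.8 Gb/s
    assert module_bw <= router_bw            # the router side can absorb the module's data

    def demux_128_to_32(word_128):
        """Time-domain demux: split one 128-bit word into four 32-bit flits,
        least-significant flit first."""
        return [(word_128 >> (32 * i)) & 0xFFFFFFFF for i in range(4)]

    flits = demux_128_to_32(0x0123456789ABCDEF_FEDCBA9876543210)
    print([hex(f) for f in flits])   # ['0x76543210', '0xfedcba98', '0x89abcdef', '0x1234567']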

  48. 1. Why NoCs on FPGAs? Compute Acceleration: reported FPGA speedups against GPUs and CPUs. Maxeler: geoscience (14x, 70x), financial analysis (5x, 163x). Altera OpenCL: video compression (3x, 114x), information filtering (5.5x).

  49. 1. Why NoCs on FPGAs? Compute Acceleration

  50. 1. Why NoCs on FPGAs? Compute Acceleration
