
Hardwired networks on chip for FPGAs and their applications

Kees Goossens (TU Delft, NXP), Muhammad Aqeel Wahlah (TU Delft). Overview: applications, network on chip, FPGA, key ideas, hardwired NOC, unified interconnect.


Presentation Transcript


  1. Hardwired networks on chip for FPGAs and their applications. Kees Goossens (TU Delft, NXP), Muhammad Aqeel Wahlah (TU Delft)

  2. overview • applications • network on chip • FPGA • key ideas • hardwired NOC • unified interconnect • data coercion / type casting • application: dynamic partial reconfiguration • multiple concurrent applications • multiplex sub-applications (“hardware tasks”) • example • conclusions

  3. applications • task / function mapped on IP • includes local storage / buffering • application: set of communicating IPs / tasks / ... • data, control, code • communication via connections • use case: set of concurrent applications (diagram: task graphs with nodes BA, A1, A2, BAC, C1, C2, C3, T1, T2, T3)

  4. network on chip (NOC) • connects ports on hardware blocks (IP) • data, control • connections: virtual wires • real-time / quality of service • programmable at run-time • set up & remove connections by programming control registers in the NOC • styles of communication • address-based / memory-mapped • streaming (diagram: IPs T1, T2, T3, A1, A2, BA, BAC attached through network interfaces (NI) to the routers (R) of the NOC)
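The run-time programming model on this slide can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Aethereal register map: the class `NocModel`, the register names (`remote`, `slots`, `enable`) and the port names are all hypothetical. They are chosen only to illustrate that setting up or removing a connection amounts to a handful of memory-mapped register writes.

```python
class NocModel:
    """Toy model of a NOC whose connections are set up by register writes."""

    def __init__(self):
        self.regs = {}     # (NI port, register name) -> value
        self.writes = 0    # number of MMIO writes issued so far

    def write(self, port, reg, value):
        self.regs[(port, reg)] = value
        self.writes += 1

    def setup_connection(self, src, dst, slots):
        # Program both network interfaces, then enable the connection.
        self.write(src, "remote", dst)
        self.write(dst, "remote", src)
        self.write(src, "slots", slots)   # hypothetical bandwidth reservation
        self.write(src, "enable", 1)

    def remove_connection(self, src):
        self.write(src, "enable", 0)

    def connected(self, src, dst):
        return (self.regs.get((src, "enable")) == 1
                and self.regs.get((src, "remote")) == dst)


noc = NocModel()
noc.setup_connection("T1.out", "T2.in", slots=2)
was_connected = noc.connected("T1.out", "T2.in")
noc.remove_connection("T1.out")
still_connected = noc.connected("T1.out", "T2.in")
print(was_connected, still_connected, noc.writes)
```

In this sketch no bitstream is involved at any point, which is the property the later slides rely on when they compare programming footprint against configuration footprint.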

  5. FPGA fabric • soft IP are configured in • configurable elements (LUT) • and switch boxes (not shown) • with a given configuration granularity (frame) using the configuration interconnect (ICAP) • hard IP • CPU • on-chip memories (BRAM, ...) • off-chip memory interfaces • decryption IP • etc. • configuration: bitstream loading • programming / control: set MMIO registers • Xilinx terminology (frames, ICAP, etc.) (diagram: LUT fabric with IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  6. application on FPGA • design an application as for ASIC • IPs, interconnect, storage, sw • but map on soft & hard IP resources • traditionally have separate soft data and control interconnects • could also use soft NOC for both (diagram: IPs BA, A1, A2, BAC mapped onto LUT frames and joined by a soft control interconnect and a soft data interconnect, next to IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  7. multiple applications on FPGA • interconnects and IPs of different applications share reconfiguration regions (frames) • dynamic reconfiguration is global, not partial (diagram: IPs BA, A1, A2, BAC, T1, T2, T3 spread over the LUT fabric and joined by a soft control interconnect and a soft data interconnect, next to IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  8. overview • application • network on chip • FPGA • key ideas • hardwired NOC → improved performance : cost • unified interconnect → flexibility • data coercion / type casting → cool (and useful) applications • application: dynamic partial reconfiguration • multiple concurrent applications • multiplex sub-applications (“hardware tasks”) • example • conclusions

  9. 1. hardwired interconnect • replace soft interconnect(s) by hard interconnect(s) • connect reconfigurable regions of LUTs (CFR) • bit-level reconfigurability (CFR) • switch boxes • transaction-level reconfigurability (NOC) • routers, NIs • memory mapped / streaming [Hecht FPL’05] (diagram: hard interconnect(s) joining CFRs holding A1, A2, BA, BAC, T1, T2, T3 with IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  10. 1. hardwired interconnect • ~35× smaller area • ~3.5× higher speed • ~150× better perf:cost ratio (bits/sec/area) • ~200× smaller configuration footprint (program MMIO, no bitstream) • ~200× faster soft IP load & boot • dynamic partial reconfiguration • no constraints on soft IP placement due to communication • loss of flexibility • fewer LUTs • CFR = frame → 7% hard NOC [based on Virtex4 & Aethereal NOC, Goossens NOCS’08] (diagram: hard interconnect(s) joining CFRs holding C1, C2, C3, BAC, T1, T2, T3 with IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  11. performance & cost • essentially, it all depends on • area soft:hard ≈ 35:1 • speed soft:hard ≈ 3.5:1 • configuration footprint of soft NOC (bitstream) : programming footprint of hard NOC (MMIO registers) ≈ 214:1 • resulting in • boot time soft:hard ≈ 1:200 • functional performance:cost (bit/sec:area) soft:hard ≈ 1:147

  12. performance & cost • configuration speed • 1.9 Gb/s for dedicated configuration interconnect (ICAP) • 8 Gb/s for hard NOC • programming speed • 118 MHz soft NOC • 500 MHz hard NOC • configuration footprint for soft NOC • 1.8 Mb (8300 LUTs per router+NI) • programming footprint for hard NOC • 2100 bit per connection • thus to configure & program an NI • 1 msec for soft NOC • 10.6 μsec for hard NOC
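Two of the derived figures on slides 11 and 12 can be checked with back-of-the-envelope arithmetic. The sketch below uses only numbers quoted on the slides. The slides do not spell out how the ≈147 performance:cost figure is obtained. One reading, which is an assumption here, is that it is the 35:1 area ratio multiplied by the clock ratio from this slide (500 MHz : 118 MHz) rather than by the rounded 3.5:1 speed ratio. The 10.6 μsec hard-NOC figure is not re-derived, because the slides do not state how many connections it covers.

```python
# Figures quoted on the slides.
soft_footprint_bits = 1.8e6   # bitstream for one soft router + NI
icap_bits_per_s = 1.9e9       # dedicated configuration interconnect (ICAP)
soft_clock_mhz = 118
hard_clock_mhz = 500
area_ratio = 35               # soft : hard

# Time to load the soft NOC bitstream through ICAP, in milliseconds.
soft_config_ms = soft_footprint_bits / icap_bits_per_s * 1e3

# ASSUMPTION: performance per area scales as (clock ratio) x (area ratio).
clock_ratio = hard_clock_mhz / soft_clock_mhz
perf_per_area_gain = clock_ratio * area_ratio

print(f"soft NOC configuration time: {soft_config_ms:.2f} ms")
print(f"clock ratio hard:soft:       {clock_ratio:.2f}")
print(f"perf:cost gain hard:soft:    {perf_per_area_gain:.0f}")
```

Under these assumptions the configuration time comes out just under 1 ms and the performance:cost gain close to 148, in line with the "1 msec" and "≈ 1:147" figures on the slides.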

  13. 2. unified interconnect • one interconnect (e.g. NOC) for • data for functional mode • control for programming • bitstreams for configuration • dynamic partitioning of different interconnects (diagram: single hard interconnect joining CFRs holding A1, A2, BA, BAC, T1, T2, T3 with IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and ICAP)

  14. 3. data coercion • data = control = bitstream = test = … • connect a data port to a configuration port • decrypt bitstreams (diagram: bitstream and data flows over the single hard interconnect between IO processor, CPU, de/encrypt accelerator, off-chip memory, on-chip memories and CFRs)

  15. 3. data coercion • data = control = bitstream = test = … • connect a data port to a configuration port • decrypt bitstreams • relocate bitstreams • run-time compute / optimise bitstreams • JIT, peephole (diagram: bitstream flow over the single hard interconnect between IO processor, CPU, PH, de/encrypt accelerator, off-chip memory, on-chip memories, IP and CFRs)

  16. 3. data coercion • data = control = bitstream = test = … • connect a data port to a configuration port • decrypt bitstreams • relocate bitstreams • run-time compute / optimise bitstreams • JIT, peephole • data port to test port (NOC as TAM) • on-line (structural) testing • on-chip test-vector generation (diagram: bitstream flow over the single hard interconnect between IO processor, CPU, PH, de/encrypt accelerator, off-chip memory, on-chip memories, IP and CFRs)
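The data-coercion idea, that words leaving a data port can be routed through an accelerator and delivered to a configuration port, can be sketched as a chain of Python generators. Everything named here is hypothetical: `memory_source`, `decrypt` and `ConfigPort` are illustrative stand-ins, the word values are arbitrary, and the XOR step only marks where the de/encrypt accelerator would sit. It is not a real cipher and not a real bitstream format.

```python
def memory_source(words):
    """Data port of a memory: yields the stored words in order."""
    yield from words


def decrypt(stream, key):
    """Stand-in for the de/encrypt accelerator (XOR is NOT a real cipher)."""
    for word in stream:
        yield word ^ key


class ConfigPort:
    """Configuration port of a CFR: whatever arrives is treated as bitstream."""

    def __init__(self):
        self.frames = []

    def sink(self, stream):
        for word in stream:
            self.frames.append(word)


key = 0x5A5A5A5A
plain_bitstream = [0x11111111, 0x22222222, 0x33333333]   # arbitrary example words
encrypted = [w ^ key for w in plain_bitstream]           # as stored in memory

# "Connection": memory data port -> accelerator -> CFR configuration port.
cfr = ConfigPort()
cfr.sink(decrypt(memory_source(encrypted), key))
print(cfr.frames == plain_bitstream)
```

Swapping the `decrypt` stage for a relocation or optimisation stage would model the other bullets in the same way, since the interconnect does not distinguish data from bitstream.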

  17. overview • applications • network on chip • FPGA • key ideas • hardwired NOC • unified interconnect • data coercion / type casting • application: dynamic partial reconfiguration • multiple concurrent applications • multiplex sub-applications (“hardware tasks”) • example • conclusions

  18. dynamic partial reconfiguration: idea • “hardware operating system” implements run-time scheduling of • multiple concurrent applications • independent applications on own virtual platform • no communication, no interference • “performance virtualisation” • activation given by user, environment, etc. (diagram: task graphs BA, A1, A2, BAC, C1, C2, C3, T1, T2, T3 and a timeline showing app T, A, AC and app D)

  19. dynamic partial reconfiguration: idea • “hardware operating system” implements run-time scheduling of • multiple concurrent applications • parts of single applications (soft IP, “hardware tasks”) • multiplex parts of a single application on same resources (diagram: sub-app A or sub-app C, with task graphs BA, A1, A2, C1, C2, C3 and a timeline showing app T, A, C and app D)

  20. dynamic partial reconfiguration: idea • “hardware operating system” implements run-time scheduling of • multiple concurrent applications • parts of single applications (soft IP, “hardware tasks”) • multiplex parts of a single application on same resources • internal state (diagram: task graphs BA, A1, A2, BAC, C1, C2, C3 and a timeline showing state carried between A and C, alongside app T and app D)

  21. dynamic partial reconfiguration: implementation • system manager • resource management (CFR, NOC, memory, …) • inter-application virtual platforms (diagram: timeline of system manager and application managers for T and for A, C, BAC)

  22. dynamic partial reconfiguration: implementation • system manager • resource management (CFR, NOC, memory, …) • inter-application virtual platforms • intra-application phases • NOC programming • soft IP / (sub)-application configuration (incl. clock, reset) • bottleneck? (diagram: timeline of system manager and application manager for A, C, BAC)

  23. dynamic partial reconfiguration: implementation • system manager • application manager • application programming (diagram: timeline of system manager and application managers for T and for A, C, BAC)

  24. dynamic partial reconfiguration: implementation • system manager • application manager • application programming • intra-application persistent data management (diagram: task graphs BA, A1, A2, BAC, C1, C2, C3 and a timeline of system manager and application manager for A, C, BAC with state carried between phases)
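The split between system manager and application manager described on slides 21 to 24 can be sketched as two small classes. This is an illustrative model only, not the authors' implementation: the class and method names, the CFR names and the `{"frame": ...}` state are all hypothetical. The system manager hands out disjoint CFRs per application (the "virtual platform"), and the application manager multiplexes sub-applications on its own CFRs while keeping their persistent state.

```python
class ApplicationManager:
    """Multiplexes sub-applications on the CFRs of one application."""

    def __init__(self, app, cfrs):
        self.app, self.cfrs = app, cfrs
        self.active = None
        self.state = {}    # persistent state per sub-application

    def switch_to(self, sub_app, state_out=None):
        if self.active is not None:
            self.state[self.active] = state_out   # save outgoing state
        self.active = sub_app
        return self.state.get(sub_app)            # restore incoming state, if any


class SystemManager:
    """Hands out CFRs so that applications never share a region."""

    def __init__(self, cfrs):
        self.free = set(cfrs)
        self.owner = {}    # CFR -> application name

    def start(self, app, needed):
        if needed > len(self.free):
            raise RuntimeError(f"not enough free CFRs for {app}")
        granted = sorted(self.free)[:needed]
        for cfr in granted:
            self.free.remove(cfr)
            self.owner[cfr] = app
        return ApplicationManager(app, granted)

    def stop(self, mgr):
        for cfr in mgr.cfrs:
            del self.owner[cfr]
            self.free.add(cfr)


system = SystemManager(["CFR0", "CFR1", "CFR2", "CFR3"])
app_t = system.start("T", needed=1)
app_ac = system.start("AC", needed=2)

app_ac.switch_to("A")
app_ac.switch_to("C", state_out={"frame": 7})    # A is swapped out, its state kept
restored = app_ac.switch_to("A", state_out={"frame": 0})
print(restored)
```

Keeping the per-sub-application state inside the application manager mirrors the slide's "intra-application persistent data management"; where that state physically lives on the FPGA is left open here.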

  25. overview • applications • FPGA • network on chip • key ideas • hardwired NOC • unified interconnect • data coercion / type casting • application: dynamic partial reconfiguration • multiple concurrent applications • multiplex sub-applications (“hardware tasks”) • example • conclusions

  26. modelling • SystemC • bit & cycle accurate NOC model • behavioural CFR models • accurate bitstream structure • behavioural hard IP models • model • starting / stopping of applications • dynamic, based on user input • starting / stopping of sub-applications • dynamic, based on flow of data • configuration: loading of bitstreams for soft IP; clock & reset • programming: of NOC, system & sub-application managers • management of persistent state

  27. example • system manager • program NOC for configuration (diagram: single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)

  28. example • system manager • program NOC for configuration • configure: load bitstreams • including bitstream syntax, etc. (diagram: bitstream, programming and data flows over the single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)

  29. example • system manager • program NOC for configuration • configure: load bitstreams • program NOC for (sub)-application A (diagram: bitstream, programming and data flows over the single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)

  30. example • system manager • program NOC for configuration • configure: load bitstreams • program NOC for (sub)-application A • program & start application manager • including clocking & reset (diagram: bitstream, programming and data flows over the single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)

  31. example • system manager • program NOC for configuration • configure: load bitstreams • program NOC for (sub)-application A • program & start application manager • application manager • programs & starts sub-app A • soft IP fn is modelled by CFR (diagram: bitstream, programming and data flows over the single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)

  32. example • system manager • program NOC for configuration • configure: load bitstreams • program NOC for (sub)-application A • program & start application manager • application manager • programs & starts sub-app A • sub-application A runs (diagram: bitstream, programming and data flows over the single hard interconnect joining IO processor, CPU running the system manager, de/encrypt accelerator, off-chip memory, on-chip memories, application manager and CFRs for A1, A2, BA, BAC)
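The sequence built up over slides 27 to 32 can be written down as an ordered trace. The sketch below only records the order of the steps named on the slides; the function names, the CFR names and the mapping of bitstreams to CFRs are hypothetical, and nothing here models timing or the actual bitstream contents.

```python
log = []


def program_noc(purpose):
    log.append(f"program NOC for {purpose}")


def load_bitstream(cfr, name):
    log.append(f"configure {cfr} with bitstream {name}")


def start_application_manager(app):
    log.append(f"program & start application manager for {app} (clock, reset)")


def application_manager_run(sub_app):
    log.append(f"application manager programs & starts sub-app {sub_app}")
    log.append(f"sub-app {sub_app} runs")


def boot_sub_application(sub_app, bitstreams):
    # System manager part.
    program_noc("configuration")
    for cfr, name in bitstreams.items():
        load_bitstream(cfr, name)
    program_noc(f"(sub)-application {sub_app}")
    start_application_manager(sub_app)
    # Application manager part.
    application_manager_run(sub_app)


boot_sub_application("A", {"CFR0": "A1", "CFR1": "A2"})
for step in log:
    print(step)
```

The NOC is programmed twice in this trace, first to carry bitstreams and then to carry the application's own traffic, which is the unified-interconnect idea in miniature.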

  33. conclusions • ideas: • hardwired NOC → performance:cost • unified interconnects → hardware multi-tasking • data coercion / type casting → cool & useful • very detailed model • many simplifications & restrictions • many open issues • design flow: soft IP placement, binding, relocation, etc. [Madsen?] • application model: • extend use-case model with intra-application dynamism • more general notions of persistent state • implementation: separation of system & application managers
