Augmenting FPGAs with Embedded Networks-on-Chip

Presentation Transcript


Augmenting FPGAs with Embedded Networks-on-Chip

Mohamed ABDELFATTAH

Vaughn BETZ


Outline

1

Why NoCs on FPGAs?

2

Embedded NoCs

3

Comparison Against Buses


1. Why NoCs on FPGAs?

Motivation

Logic Blocks

Switch Blocks

Wires

Interconnect


1. Why NoCs on FPGAs?

Motivation

Logic Blocks

Switch Blocks

  • Hard Blocks:

  • Memory

  • Multiplier

  • Processor

Wires


1. Why NoCs on FPGAs?

Motivation

1600 MHz

Hard Interfaces

DDR/PCIe ..

Logic Blocks

800 MHz

Switch Blocks

Interconnect still the same

  • Hard Blocks:

  • Memory

  • Multiplier

  • Processor

Wires

200 MHz


1. Why NoCs on FPGAs?

Motivation

1600 MHz

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

DDR3 PHY and Controller

PCIe Controller

800 MHz

200 MHz

Gigabit Ethernet


1. Why NoCs on FPGAs?

Motivation

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

DDR3 PHY and Controller

PCIe Controller

Gigabit Ethernet


Source: Google Earth

Los Angeles

Barcelona

Keep the “roads”, but add “freeways”.

Logic Cluster

Hard Blocks


1. Why NoCs on FPGAs?

FPGA with NoC

NoC

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

DDR3 PHY and Controller

Router forwards data packet

PCIe Controller

Links

Router moves data to local interconnect

Routers

Gigabit Ethernet
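
The picture on this slide, where routers forward a packet hop by hop until one of them hands the data to the local interconnect, can be sketched in a few lines. This is a minimal sketch assuming a 2D mesh and dimension-order (XY) routing; the slides do not name a routing algorithm, and `xy_route` and the coordinate scheme are hypothetical names chosen only for illustration.

```python
def xy_route(src, dst):
    """Return the router coordinates a packet visits from src to dst.

    Assumes a 2D mesh with dimension-order (XY) routing: travel along X
    first, then along Y. The routing policy is an assumption, not something
    the slides specify.
    """
    x, y = src
    path = [(x, y)]
    # Move along the X dimension until the column matches.
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    # Then move along the Y dimension until the row matches.
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path


if __name__ == "__main__":
    route = xy_route((0, 0), (2, 1))
    # Every router but the last forwards the packet over a link; the last
    # one hands the data to the local (soft) interconnect.
    for hop in route[:-1]:
        print(f"router {hop}: forward packet")
    print(f"router {route[-1]}: deliver to local interconnect")
```

Each hop is a registered router-to-router transfer, which is one way to read the later point that the NoC is heavily pipelined.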


1. Why NoCs on FPGAs?

FPGA with NoC

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

  • Abstraction favours modularity:

    • Parallel compilation

    • Partial reconfiguration

    • Multi-chip interconnect

DDR3 PHY and Controller

PCIe Controller

  • High bandwidth endpoints known

  • Pre-design NoC to requirements

Gigabit Ethernet

  • NoC links are “re-usable”

  • NoC is heavily “pipelined”

  • NoC abstraction favors modularity


1. Why NoCs on FPGAs?

FPGA with NoC

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

  • Abstraction favours modularity:

    • Parallel compilation

    • Partial reconfiguration

    • Multi-chip interconnect

DDR3 PHY and Controller

PCIe Controller

Gigabit Ethernet

  • Latency-tolerant communication

  • NoC abstraction favors modularity


1. Why NoCs on FPGAs?

Compute Acceleration

GPU

CPU

  • Maxeler

    • Geoscience (14x, 70x)

    • Financial analysis (5x, 163x)

  • Altera OpenCL

    • Video compression (3x, 114x)

    • Information filtering (5.5x)


1. Why NoCs on FPGAs?

Compute Acceleration



1. Why NoCs on FPGAs?

Compute Acceleration

NoC


Outline

1

Why NoCs on FPGAs?

2

Embedded NoCs

Mixed NoCs

Hard NoCs

3

Comparison Against Buses


2. Embedded NoCs

Embedded NoCs

=

+

“Soft” NoC

Soft Routers

Soft Links

=

+

“Mixed” NoC

Hard Routers

Soft Links

=

+

“Hard” NoC

Hard Routers

Hard Links


Methodology

Soft

Mixed

Hard

FPGA CAD Tools

ASIC CAD Tools

Area

Speed

Design Compiler

Power?

Power

HSPICE

Gate-level simulation

Gate-level simulation

Toggle rates


2. Embedded NoCs

Mixed NoCs

Logic blocks

FPGA

Programmable

“soft” interconnect

Router

Baseline Router

=

+

“Mixed” NoC

Hard Routers

Soft Links


2. Embedded NoCs

Mixed NoCs

FPGA

Router

=

+

“Mixed” NoC

Hard Routers

Soft Links



2. Embedded NoCs

Mixed NoCs

FPGA

Router

Special Feature

Configurable topology

Assumed a mesh → Can form any topology


2. Embedded NoCs

Hard NoCs

Logic blocks

FPGA

Programmable

“soft” interconnect

Dedicated “hard” interconnect

Router

=

+

“Hard” NoC

Hard Routers

Hard Links



2. Embedded NoCs

Hard NoCs

FPGA

Router

=

+

“Hard” NoC

Hard Routers

Hard Links



2. Embedded NoCs

Hard NoCs

1.1 V

0.9 V

FPGA

Router

Special Feature

Low-V mode

Save 33% Dynamic Power

~15% slower

=

+

“Hard” NoC

Hard Routers

Hard Links
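
The 33% saving quoted for the low-voltage mode is consistent with the usual first-order model in which dynamic power scales with the square of the supply voltage. The check below is a sketch under that assumption (fixed clock frequency and switching activity); the slides give only the two voltages and the result, and `dynamic_power_saving` is a hypothetical helper name.

```python
def dynamic_power_saving(v_nominal, v_low):
    """Fractional dynamic power saved by lowering the supply voltage.

    Assumes the first-order CMOS model P_dyn ~ C * V^2 * f with capacitance,
    frequency and activity held constant, so only the V^2 term changes.
    """
    return 1.0 - (v_low / v_nominal) ** 2


saving = dynamic_power_saving(1.1, 0.9)
print(f"dynamic power saved going from 1.1 V to 0.9 V: {saving:.1%}")
```

The ~15% slowdown is a separate effect of the lower voltage and is not modelled here.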



3. Area/Power Analysis

Soft, Mixed and Hard

64-node NoC on Stratix III [65 nm]

  • Area: Hard 448 LBs, Mixed 576 LBs (~ 1.5% of FPGA); Soft ~12,500 LBs (33% of FPGA)

  • Speed: Mixed and Hard 730 – 940 MHz; Soft 166 MHz

  • Bisection BW: Mixed and Hard ~ 50 GB/s; Soft ~ 10 GB/s
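
The bisection bandwidth figures can be roughly reproduced from the clock speeds. The sketch below assumes an 8x8 mesh for the 64 nodes, 32-bit links, and counts both directions of every link that crosses the cut. The mesh shape, link width and counting convention are assumptions made for illustration, not values taken from the slides.

```python
def mesh_bisection_gb_per_s(side, link_bits, clock_mhz):
    """Peak bisection bandwidth of a side x side mesh, in GB/s.

    Cutting the mesh in half severs `side` bidirectional links, i.e.
    2 * side one-way channels, each moving link_bits per clock cycle.
    """
    channels = 2 * side
    bytes_per_cycle = link_bits / 8
    return channels * bytes_per_cycle * clock_mhz * 1e6 / 1e9


for label, mhz in [("soft", 166), ("embedded, low end", 730), ("embedded, high end", 940)]:
    bw = mesh_bisection_gb_per_s(side=8, link_bits=32, clock_mhz=mhz)
    print(f"{label:20s} {mhz:4d} MHz -> {bw:5.1f} GB/s")
```

Under these assumptions the 166 MHz case lands near the ~10 GB/s on the slide, and the 730 to 940 MHz range brackets the ~50 GB/s figure.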


3. Area/Power Analysis

Soft, Mixed and Hard

64-node NoC on Stratix III [65 nm]

  • Area: Hard (Low-V) 448 LBs, Mixed 576 LBs (~ 1.5% of FPGA); Soft ~12,500 LBs (33% of FPGA)

  • Speed: Mixed and Hard (Low-V) 730 – 940 MHz; Soft 166 MHz

  • Bisection BW: Mixed and Hard (Low-V) ~ 50 GB/s; Soft ~ 10 GB/s

Provides ~50GB/s peak bisection bandwidth

Very Cheap! Less than cost of 3 soft nodes
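
The "less than cost of 3 soft nodes" remark follows from the logic block counts on this slide. A quick check, assuming the soft NoC area divides evenly across its 64 nodes (an approximation) and using the two embedded NoC areas listed (448 and 576 LBs); the variable names are illustrative only.

```python
SOFT_NOC_LBS = 12_500            # ~12,500 logic blocks for the 64-node soft NoC
NODES = 64
EMBEDDED_NOC_LBS = [448, 576]    # whole 64-node embedded NoCs, from the slide

# Approximate area of one soft node, assuming an even split across nodes.
soft_lbs_per_node = SOFT_NOC_LBS / NODES
three_soft_nodes = 3 * soft_lbs_per_node

for lbs in EMBEDDED_NOC_LBS:
    verdict = "smaller" if lbs < three_soft_nodes else "larger"
    print(f"{lbs} LBs for 64 embedded nodes is {verdict} than "
          f"{three_soft_nodes:.0f} LBs for 3 soft nodes")
```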


3. Area/Power Analysis

NoC Power Budget

250 GB/s total

bandwidth

123%

How much is used for system-level communication?

17.4 W

Largest Stratix-III device

Typical FPGA Dynamic Power


3. Area/Power Analysis

NoC Power Budget

250 GB/s total

bandwidth

123%

15%

NoC

17.4 W

Typical FPGA Dynamic Power


3. Area/Power Analysis

NoC Power Budget

250 GB/s total

bandwidth

11%

123%

15%

NoC

17.4 W

Typical FPGA Dynamic Power


3. Area/Power Analysis

NoC Power Budget

250 GB/s total

bandwidth

7%

11%

123%

15%

NoC

17.4 W

Typical FPGA Dynamic Power


3. Area/Power Analysis

Bandwidth in Perspective

DDR3  Module 1

PCIe Module 2

14.6 GB/s

Full theoretical BW

14.6 GB/s

Cross whole chip!

17 GB/s

17 GB/s

17 GB/s

17 GB/s

14.6 GB/s

Aggregate Bandwidth

126 GB/s

14.6 GB/s

NoC Power Budget

3.5%
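
The aggregate bandwidth and the 3.5% power figure can be cross-checked against the numbers on the earlier power budget slides. The sketch below sums the eight flows shown here (four at 17 GB/s, four at 14.6 GB/s) and scales the lowest budget from the earlier slide (7% at 250 GB/s) linearly with bandwidth. Both the linear scaling and the choice of the 7% figure are assumptions.

```python
FLOWS_GB_S = [17.0] * 4 + [14.6] * 4   # per-flow bandwidths shown on the slide

REFERENCE_BW_GB_S = 250.0              # total bandwidth on the power budget slides
REFERENCE_BUDGET_PCT = 7.0             # lowest budget shown there (assumed to apply)

aggregate = sum(FLOWS_GB_S)
# Assume NoC power grows linearly with the bandwidth actually carried.
scaled_budget = REFERENCE_BUDGET_PCT * aggregate / REFERENCE_BW_GB_S

print(f"aggregate bandwidth: {aggregate:.1f} GB/s")
print(f"scaled power budget: {scaled_budget:.1f}% of typical FPGA dynamic power")
```

With these inputs the sum comes to about 126 GB/s and the scaled budget to about 3.5%, matching the slide.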


Outline

1

Why NoCs on FPGAs?

2

Embedded NoCs

3

Comparison Against Buses

Area/Power

Efficiency

Design Effort


4. Comparison

DDR3: Qsys Bus vs. NoC

Embedded NoC:

16 Nodes, hard routers & links

Qsys bus:

Build logical bus from fabric


4. Comparison

DDR3: Qsys Bus vs. NoC

“The Case for Embedded Networks-on-Chip on FPGAs”

To appear in IEEE Micro Magazine (February)

Embedded NoC:

16 Nodes, hard routers & links

Qsys bus:

Build logical bus from fabric


4. Comparison

Design Effort

close

  • Steps to close timing using Qsys

FPGA



4. Comparison

Design Effort

far

  • Steps to close timing using Qsys

FPGA

Timing closure can be simplified with an embedded NoC


4. Comparison

Area Comparison



4. Comparison

Area Comparison

Entire NoC smaller than bus for 3 modules!


4. Comparison

Area Comparison

1/8 Hard NoC BW used → already less area for most systems


4. Comparison

Power Comparison

Hard NoC saves power for even the simplest systems


Why NoCs on FPGAs?

1

Big city needs freeways to handle traffic

Embedded NoCs: Mixed & Hard

2

Power: 9-15X

Area: 20-23X

Speed: 5-6X

  • Area Budget for 64 nodes: ~1%

  • Power Budget for 100 GB/s: 3-7%

3

Comparison Against P2P/Buses

  • Raw efficiency close to simplest P2P links

  • NoC more efficient & lower design effort.


Thank You!

www.eecg.utoronto.ca/~mohamed

