Augmenting FPGAs with Embedded Networks-on-Chip

Mohamed ABDELFATTAH

Vaughn BETZ


Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs

3. Comparison Against Buses


1. Why NoCs on FPGAs?

Motivation

  • Logic Blocks

  • Switch Blocks

  • Wires

  • Hard Blocks:

    • Memory

    • Multiplier

    • Processor

  • Hard Interfaces: DDR/PCIe ..

Clock frequencies shown: 1600 MHz, 800 MHz, 200 MHz

Interconnect still the same


1. Why NoCs on FPGAs?

Motivation

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet



Source: Google Earth

Los Angeles

Barcelona

Keep the “roads”, but add “freeways”.

Logic Cluster

Hard Blocks


1. Why NoCs on FPGAs?

FPGA with NoC

NoC:

  • Routers

  • Links

  • Router forwards data packet

  • Router moves data to local interconnect

Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet
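
To make the router's two jobs concrete (forwarding a packet along the links, or handing it to the local interconnect at its destination), here is a minimal sketch of dimension-order (XY) routing on a mesh. The slides do not say which routing algorithm the NoC uses. XY routing, the (x, y) coordinate scheme, the port names, and the functions `next_hop` and `route` are all assumptions chosen for illustration.

```python
def next_hop(current, dest):
    """Pick an output port under XY (dimension-order) routing.

    current, dest: (x, y) router coordinates in the mesh (hypothetical scheme).
    Returns "local" when the packet has arrived and should be handed to the
    local interconnect, otherwise the direction of the next link.
    """
    cx, cy = current
    dx, dy = dest
    if cx != dx:                      # resolve the X dimension first
        return "east" if dx > cx else "west"
    if cy != dy:                      # then the Y dimension
        return "north" if dy > cy else "south"
    return "local"


def route(src, dest):
    """List the routers a packet visits from src to dest, inclusive."""
    step = {"east": (1, 0), "west": (-1, 0), "north": (0, 1), "south": (0, -1)}
    path = [src]
    while (port := next_hop(path[-1], dest)) != "local":
        sx, sy = step[port]
        x, y = path[-1]
        path.append((x + sx, y + sy))
    return path


print(route((0, 0), (2, 1)))
```

Each intermediate router only forwards the packet; the last one returns "local", which corresponds to moving the data onto the local interconnect.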


1. Why NoCs on FPGAs?

FPGA with NoC

Problems:

  • Bandwidth requirements for hard logic/interfaces

  • Timing closure

  • High interconnect utilization:

    • Huge CAD Problem

    • Slow compilation

    • Power/area utilization

  • Wire speed not scaling:

    • Delay is interconnect-dominated

With a NoC:

  • High bandwidth endpoints known

  • Pre-design NoC to requirements

  • NoC links are “re-usable”

  • NoC is heavily “pipelined”

  • Latency-tolerant communication

  • NoC abstraction favors modularity:

    • Parallel compilation

    • Partial reconfiguration

    • Multi-chip interconnect

Hard interfaces shown: DDR3 PHY and Controller, PCIe Controller, Gigabit Ethernet



1. Why NoCs on FPGAs?

Compute Acceleration

GPU

CPU

  • Maxeler

    • Geoscience (14x, 70x)

    • Financial analysis (5x, 163x)

  • Altera OpenCL

    • Video compression (3x, 114x)

    • Information filtering (5.5x)




Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs

  • Mixed NoCs

  • Hard NoCs

3. Comparison Against Buses


2. Embedded NoCs

Embedded NoCs

  • “Soft” NoC = Soft Routers + Soft Links

  • “Mixed” NoC = Hard Routers + Soft Links

  • “Hard” NoC = Hard Routers + Hard Links


Methodology

  • NoC types: Soft, Mixed, Hard

  • Tools: FPGA CAD Tools, ASIC CAD Tools (Design Compiler), HSPICE

  • Metrics: Area, Speed, Power

  • Power: gate-level simulation, toggle rates


2. Embedded NoCs

Mixed NoCs

“Mixed” NoC = Hard Routers + Soft Links

  • FPGA: logic blocks and programmable “soft” interconnect

  • Router: Baseline Router

  • Special feature: configurable topology

    • Assumed a mesh → can form any topology


2. Embedded NoCs

Hard NoCs

“Hard” NoC = Hard Routers + Hard Links

  • FPGA: logic blocks and programmable “soft” interconnect

  • Dedicated “hard” interconnect between routers

  • Special feature: low-V mode

    • 1.1 V → 0.9 V

    • Save 33% Dynamic Power

    • ~15% slower
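
The 33% saving is consistent with the usual first-order model in which dynamic power scales with the square of the supply voltage. Below is a quick check, assuming P_dyn ∝ C·V²·f with capacitance and frequency held fixed. The model and the function name `dynamic_power_saving` are assumptions; the slides state only the two voltages and the result.

```python
def dynamic_power_saving(v_nominal, v_low):
    """Fractional dynamic-power saving when only the supply voltage changes.

    Assumes dynamic power is proportional to V^2 (C and f unchanged).
    The ~15% slowdown quoted on the slide is NOT modelled here.
    """
    return 1.0 - (v_low / v_nominal) ** 2


saving = dynamic_power_saving(1.1, 0.9)
print(f"{saving:.1%}")  # prints 33.1%
```

If the clock were also lowered to match the ~15% slowdown, the power saving under this model would be larger; the 33% figure corresponds to the voltage term alone.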


3. Area/Power Analysis

Soft, Mixed and Hard

64-node NoC on Stratix III [65 nm]

                  Hard / Mixed           Soft
  Area            448 LBs / 576 LBs      ~12,500 LBs
  Share of FPGA   ~ 1.5% of FPGA         33% of FPGA
  Speed           730 – 940 MHz          166 MHz
  Bisection BW    ~ 50 GB/s              ~ 10 GB/s

Hard (Low-V):

  • Provides ~50 GB/s peak bisection bandwidth

  • Very cheap! Less than cost of 3 soft nodes
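
The "less than cost of 3 soft nodes" claim and the bisection-bandwidth figures can be sanity-checked with a few lines of arithmetic. This is a sketch only: the logic-block counts, node count and clock speeds come from the slide, but the 8 x 8 mesh layout and the 32-bit link width are hypothetical parameters that the slides do not state, and `mesh_bisection_gbytes` is an illustrative helper.

```python
NODES = 64
SOFT_NOC_LBS = 12_500   # "~12,500 LBs" on the slide
MIXED_NOC_LBS = 576
HARD_NOC_LBS = 448

# Area: average cost of one soft node, and of three of them.
soft_node_lbs = SOFT_NOC_LBS / NODES
three_soft_nodes = 3 * soft_node_lbs
print(f"one soft node ~ {soft_node_lbs:.0f} LBs, three ~ {three_soft_nodes:.0f} LBs")
print("hard NoC cheaper than 3 soft nodes:", HARD_NOC_LBS < three_soft_nodes)


def mesh_bisection_gbytes(side, link_bits, clock_mhz):
    """Peak bisection bandwidth of a side x side mesh, in GB/s.

    Cutting the mesh in half severs `side` links in each direction.
    link_bits is a HYPOTHETICAL parameter (not given on the slides).
    """
    links_cut = 2 * side
    return links_cut * (link_bits / 8) * clock_mhz * 1e6 / 1e9


for label, mhz in [("soft", 166), ("embedded, low end", 730), ("embedded, high end", 940)]:
    print(label, round(mesh_bisection_gbytes(8, 32, mhz), 1), "GB/s")
```

Under these assumptions the estimate is about 10.6 GB/s at 166 MHz and roughly 47 to 60 GB/s across 730 to 940 MHz, in line with the ~10 GB/s and ~50 GB/s quoted on the slide.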


3. Area/Power Analysis

NoC Power Budget

  • 250 GB/s total bandwidth

  • Typical FPGA Dynamic Power: 17.4 W (Largest Stratix-III device)

  • How much is used for system-level communication?

  • NoC power budget values shown: 123%, 15%, 11%, 7%


3. Area/Power Analysis

Bandwidth in Perspective

  • Modules shown: DDR3 (Module 1), PCIe (Module 2)

  • Link bandwidths shown: 14.6 GB/s (×4), 17 GB/s (×4)

  • Full theoretical BW

  • Cross whole chip!

  • Aggregate Bandwidth: 126 GB/s

  • NoC Power Budget: 3.5%
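
The aggregate figure follows from summing the eight bandwidth labels on the slide, and the 3.5% budget can be reproduced by scaling an earlier number. The sketch below rests on two assumptions: that NoC power scales linearly with the bandwidth carried, and that the 7% value on the "NoC Power Budget" slide is the budget at 250 GB/s that this slide scales from. The slides do not confirm either.

```python
# Bandwidth labels as listed on the slide (GB/s): four at 14.6 and four at 17.
stream_gbytes = [14.6] * 4 + [17.0] * 4
aggregate = sum(stream_gbytes)

# ASSUMPTION: 7% of typical FPGA dynamic power at 250 GB/s is the reference
# point, and power scales linearly with the bandwidth actually carried.
REFERENCE_BUDGET_PCT = 7.0
REFERENCE_GBYTES = 250.0
budget_pct = REFERENCE_BUDGET_PCT * aggregate / REFERENCE_GBYTES

print(f"aggregate = {aggregate:.1f} GB/s")
print(f"power budget = {budget_pct:.1f}%")
```

The sum is 126.4 GB/s, which the slide rounds to 126 GB/s, and the scaled budget rounds to 3.5%.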


Outline

1. Why NoCs on FPGAs?

2. Embedded NoCs

3. Comparison Against Buses

  • Area/Power Efficiency

  • Design Effort


4. Comparison

DDR3: Qsys Bus vs. NoC

  • Embedded NoC: 16 Nodes, hard routers & links

  • Qsys bus: Build logical bus from fabric

“The Case for Embedded Networks-on-Chip on FPGAs”

To appear in IEEE Micro Magazine (February)


4. Comparison

Design Effort

  • Steps to close timing using Qsys (cases shown on the FPGA: close, far)

Timing closure can be simplified with an embedded NoC


4. Comparison

Area Comparison

  • Entire NoC smaller than bus for 3 modules!

  • 1/8 Hard NoC BW used → already less area for most systems



4. Comparison

Power Comparison

Hard NoC saves power for even the simplest systems


1. Why NoCs on FPGAs?

  • Big city needs freeways to handle traffic

2. Embedded NoCs: Mixed & Hard

  • Power: 9-15X

  • Area: 20-23X

  • Speed: 5-6X

  • Area Budget for 64 nodes: ~1%

  • Power Budget for 100 GB/s: 3-7%

3. Comparison Against P2P/Buses

  • Raw efficiency close to simplest P2P links

  • NoC more efficient & lower design effort.



Thank You!

www.eecg.utoronto.ca/~mohamed

