A Novel 3D Layer-Multiplexed On-Chip Network

Rohit Sunkam Ramanujam

Bill Lin

Electrical and Computer Engineering

University of California, San Diego


Networks-on-Chip

  • Chip-multiprocessors (CMPs) increasingly popular

  • 2D-mesh networks often used as on-chip fabric

[Figure: example tiled CMPs, Tilera Tile64 and Intel 80-core. Labels: die dimensions 21.72mm x 12.64mm, single tile 1.5mm x 2.0mm, I/O area]


3D Integrated Circuits

[Figure: cross-section of a 3D IC with ≥ 2 active device layers (device layer 1, device layer 2) connected by through-silicon vias over short inter-layer distances]

  • Reduced chip footprint

  • Reduced wire delays

  • High inter-layer bandwidth

  • Heterogeneous system integration


Natural Progression: 3D Mesh for 3D CMPs

[Figure: a 2D mesh next to a 3D mesh]

What routing algorithms to use for 3D mesh networks?


Outline

  • Oblivious routing on a 3D mesh

  • Layer-multiplexed 3D architecture

  • Evaluation


Oblivious Routing Objectives

  • Maximize throughput

    • Distribute traffic evenly on network links

    • Maximize worst-case throughput as traffic is application dependent

  • Minimize hop count

    • Minimize routing delay between source and destination

    • Reduce power


Routing Algorithms for 3D Mesh Networks

  • Valiant Routing (VAL)

    • Optimal worst-case throughput

    • Poor latency

  • Dimension Ordered Routing (DOR)

    • Minimal latency

    • Poor worst-case throughput

  • O1TURN Routing

    • Minimal latency

    • Poor worst-case throughput

  • Ideal routing algorithm

    • Minimal latency

    • Maximum worst-case throughput

[Figure: average hop count (normalized to minimal; axis ticks 1 and 2) versus worst-case throughput (fraction of network capacity; axis ticks 0.25 and 0.5), with points for VAL, DOR, O1TURN and IDEAL]


Randomized Partially-Minimal Routing (RPM)

[Figure: RPM route from source to destination on a 3D mesh with axes X, Y and Z]

  • Phase-1Z: source to a random intermediate layer

  • XY or YX routing on the intermediate layer

  • Phase-2Z: intermediate layer to the destination


Main Idea

  • Load-balance uniformly across the vertical layers

    • 2 phases of vertical routing

  • Minimal XY/YX routing used on each layer


Routing Algorithms for 3D Mesh Networks

  • Randomized Partially Minimal Routing (RPM)

    • Near-optimal worst-case throughput

    • Low latency

[Figure: average hop count (normalized to minimal; axis ticks 1, 1.1 and 2) versus worst-case throughput (fraction of network capacity; axis ticks 0.25 and 0.5), with points for VAL, RPM, DOR, O1TURN and IDEAL]


RPM has Near-optimal Worst-case Throughput

RPM is optimal for even radix k, and within 1/k^2 of optimal for odd radix.


Performance of RPM: Average-case Throughput


Outline

  • Oblivious routing on a 3D mesh

  • Layer-multiplexed (LM) 3D architecture

  • Evaluation


Unique Features of 3D ICs

[Figure: device layers connected by a TSV; labeled distances 50 μm (inter-layer) and 1500 μm (adjacent tiles)]

  • Inter-layer distances are very small (~50 μm)

    • Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)

    • Vertical interconnects implemented using Through-Silicon-Vias (TSVs) have very low delay


Unique Features of 3D ICs (cont’d)

  • Vertical bandwidth is abundant, as TSVs can be densely packed in 2D with a small via pitch (~4 μm)

Unique Features of 3D ICs (cont’d)

  • Number of device layers likely to remain small (4-5 layers) due to thermal and manufacturing issues


RPM on a 3D Mesh

[Figure: RPM route from source to destination on a 3D mesh with axes X, Y and Z]

  • Phase-1Z: source to a random intermediate layer

  • XY or YX routing on the intermediate layer

  • Phase-2Z: intermediate layer to the destination


Proposed Layer-Multiplexed Architecture

RPM routing adapted to the LM architecture: RPM-LM

[Figure: RPM-LM route from source to destination among processors P1-P4, drawn with axes X, Y and Z]

  • Phase-1Z: source to a random intermediate layer

  • XY or YX routing on the intermediate layer

  • Phase-2Z: intermediate layer to the destination


Power and Area Savings

[Figure: connections of processors P1-P4 in a conventional 3D mesh versus the layer-multiplexed architecture, which uses a packet injection demultiplexer and a packet ejection multiplexer]

  • Decouple vertical routing from horizontal routing

  • Restrict vertical routing to packet injection and packet ejection

  • 5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh
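To give a feel for the crossbar difference, the snippet below counts crosspoints in a full crossbar. It is a rough proxy only: the port names in the comments are the usual ones for mesh routers and are an assumption here, and crosspoint count is not the power or area model used in the evaluation.

```python
def crosspoints(ports):
    """Number of crosspoints in a full ports x ports crossbar."""
    return ports * ports

MESH_3D_PORTS = 7  # assumed: north, south, east, west, up, down, local
LM_PORTS = 5       # assumed: north, south, east, west, local

ratio = crosspoints(LM_PORTS) / crosspoints(MESH_3D_PORTS)
print(crosspoints(MESH_3D_PORTS), crosspoints(LM_PORTS), round(ratio, 2))  # 49 25 0.51
```

Under this simple count, dropping the two vertical ports roughly halves the crossbar.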


Single Hop Vertical Communication

  • Single hop vertical routing more power efficient than one-layer-per-hop routing

    • Leverages short inter-layer distances in 3D ICs

    • Better utilizes available vertical bandwidth
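As a small illustration of the difference, the sketch below counts vertical router hops when a packet moves one layer per hop, averaged over all source, intermediate and destination layers. The equal weighting of all layer triples and the helper names are assumptions made for illustration; in the LM architecture each vertical phase is a single transfer regardless of layer distance.

```python
from itertools import product

def mesh_vertical_hops(src_z, mid_z, dst_z):
    """Vertical router hops when moving one layer per hop (conventional 3D mesh)."""
    return abs(src_z - mid_z) + abs(mid_z - dst_z)

# In LM: one vertical transfer at injection and one at ejection.
LM_VERTICAL_TRANSFERS = 2

def average_mesh_vertical_hops(num_layers):
    """Average over all (source, intermediate, destination) layer triples,
    assumed equally likely."""
    triples = list(product(range(num_layers), repeat=3))
    return sum(mesh_vertical_hops(*t) for t in triples) / len(triples)

print(average_mesh_vertical_hops(4))  # 2.5
```

This only compares hop counts; it does not model the power of a router traversal versus a demultiplexer or multiplexer traversal.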


Packet Injection Demultiplexer

[Figure: block diagram of the packet injection demultiplexer. Inputs: processors P1-P4. Internal blocks: route selection/load balancing, VC allocation, flit counters, switch arbitration. Credits come in from the injection ports of the routers on layers 1-4; outputs go to the injection ports of the Layer 1 through Layer 4 routers]


Packet Ejection Multiplexer

[Figure: block diagram of the packet ejection multiplexers for processors P1-P4, drawn in detail for P1 and P4. For P1, an arbiter selects among buffers L1-P1, L2-P1, L3-P1 and L4-P1, fed by the router on Layer 1 and by packets from layers 2, 3 and 4, with a VCID signal; credits go out for each of the four buffers. The P4 multiplexer is drawn the same way with buffers L1-P4, L2-P4, L3-P4 and L4-P4]


Outline

  • Oblivious routing on a 3D mesh

  • Layer-multiplexed 3D architecture

  • Evaluation

    • Power and Area

    • Performance


Power and Area Evaluation

  • Used Orion 2.0 models for router power and area estimation.

  • 65nm process at 1V and 1GHz

  • Buffers

    • 4 VCs/port, 5 flits/VC for routers

    • 5 flits/port for packet injection demultiplexer

    • 5 flits/port for each packet ejection multiplexer


Power Comparison

  • 3D mesh

    • One 7-port router per tile

  • LM

    • One 5-port router per tile

    • One packet injection demultiplexer for every 4 tiles

    • One packet ejection multiplexer per tile


Power Evaluation

27% power reduction


Area Evaluation

26.5% area reduction


Outline

  • Oblivious routing on a 3D mesh

  • Layer-multiplexed 3D architecture

  • Evaluation

    • Power and Area

    • Performance


RPM on a 3D mesh vs. RPM-LM

  • Worst-case throughput

    • RPM-LM achieves same (near-optimal) worst-case throughput as RPM

  • Average-case throughput


Flit-Level Simulation

  • Ideal throughput evaluation assumes

    • Ideal single-cycle router

    • Infinite buffers

    • No contention in switches, no flow control

  • Flit-level simulation

    • PopNet network simulator

    • 5-stage router pipeline

    • Credit-based flow control

    • 8 virtual channels, each 5 flits deep

    • Multi-flit packets injected into the network (5 flits/packet)
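For readers unfamiliar with credit-based flow control, here is a minimal sketch of the idea for a single virtual channel that is 5 flits deep. It is a generic illustration, not PopNet code: the class name `CreditedChannel` is hypothetical, and credit return is modeled as instantaneous, which a real router pipeline would not do.

```python
from collections import deque

class CreditedChannel:
    """One virtual channel with credit-based flow control (simplified)."""

    def __init__(self, depth=5):
        self.credits = depth   # free slots in the downstream buffer
        self.buffer = deque()  # flits waiting downstream

    def can_send(self):
        return self.credits > 0

    def send(self, flit):
        """Upstream side: spend one credit per flit sent."""
        if not self.can_send():
            raise RuntimeError("no credits: downstream buffer is full")
        self.credits -= 1
        self.buffer.append(flit)

    def consume(self):
        """Downstream side: forward a flit and return one credit upstream."""
        flit = self.buffer.popleft()
        self.credits += 1
        return flit

if __name__ == "__main__":
    ch = CreditedChannel(depth=5)
    for i in range(5):        # one 5-flit packet fills the channel
        ch.send(i)
    print(ch.can_send())      # False
    ch.consume()
    print(ch.can_send())      # True
```

The sender never overruns the downstream buffer because it stalls as soon as its credit count reaches zero.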


Flit-Level Simulation (cont’d)

  • Network configurations simulated

    • 4 x 4 x 4 mesh

    • 8 x 8 x 4 mesh

  • Four different traffic traces used

    • Uniform traffic

    • Transpose traffic: (x,y,z) → (y,z,x)

    • Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1)

    • Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
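The destination mappings above translate directly into code. The sketch below assumes a cubic k x k x k mesh with coordinates in 0..k-1 so that every mapping stays in range; the function names are illustrative, and drawing the uniform destination independently per coordinate is an assumption.

```python
import random

def uniform(node, k, rng=random):
    """Uniform traffic: destination drawn uniformly at random (assumed per coordinate)."""
    return (rng.randrange(k), rng.randrange(k), rng.randrange(k))

def transpose(node, k):
    """Transpose traffic: (x,y,z) -> (y,z,x)."""
    x, y, z = node
    return (y, z, x)

def complement(node, k):
    """Complement traffic: (x,y,z) -> (k-x-1, k-y-1, k-z-1)."""
    x, y, z = node
    return (k - x - 1, k - y - 1, k - z - 1)

def dor_worst_case(node, k):
    """DOR-WC traffic: (x,y,z) -> (k-z-1, k-y-1, k-x-1)."""
    x, y, z = node
    return (k - z - 1, k - y - 1, k - x - 1)

if __name__ == "__main__":
    src = (0, 1, 3)
    for pattern in (transpose, complement, dor_worst_case):
        print(pattern.__name__, pattern(src, 4))
```

The three deterministic patterns are permutations: each node sends to exactly one destination and receives from exactly one source.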


Uniform Traffic: 8x8x4 Mesh


Transpose Traffic: 8x8x4 Mesh


Worst-case Traffic for DOR: 8x8x4 Mesh


Summary of Contributions

Proposed a 3D Layer-multiplexed architecture which is an optimization of a 3D mesh

Exploits the optimality of RPM together with the high vertical bandwidth enabled in 3D technology

LM architecture consumes 27% less power, occupies 26% less area than a 3D mesh

RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh


Thank you!!

