A Novel 3D Layer-Multiplexed On-Chip Network


Rohit Sunkam Ramanujam

Bill Lin

Electrical and Computer Engineering

University of California, San Diego

Networks-on-Chip
  • Chip-multiprocessors (CMPs) increasingly popular
  • 2D-mesh networks often used as on-chip fabric

[Figure: die photos of the Tilera Tile64 and the Intel 80-core chip. Labels: I/O area, single tile (1.5mm × 2.0mm), die dimensions 12.64mm × 21.72mm.]

3D Integrated Circuits

[Figure: two stacked device layers (Device layer 1, Device layer 2) connected by Through-Silicon Vias. Annotations: ≥ 2 active device layers; short inter-layer distances.]

  • Reduced chip footprint
  • Reduced wire delays
  • High inter-layer bandwidth
  • Heterogeneous system integration
Natural Progression: 3D Mesh for 3D CMPs

[Figure: a 2D mesh beside a 3D mesh.]

What routing algorithms to use for 3D mesh networks?

Outline

  • Oblivious routing on a 3D mesh
  • Layer-multiplexed 3D architecture
  • Evaluation

Oblivious Routing Objectives
  • Maximize throughput
    • Distribute traffic evenly on network links
    • Maximize worst-case throughput as traffic is application dependent
  • Minimize hop count
    • Minimize routing delay between source and destination
    • Reduce power
Routing Algorithms for 3D Mesh Networks
  • Valiant Routing (VAL)
    • Optimal worst-case throughput
    • Poor latency
  • Dimension Ordered Routing (DOR)
    • Minimal latency
    • Poor worst-case throughput
  • O1TURN Routing
    • Minimal latency
    • Poor worst-case throughput
  • Ideal routing algorithm
    • Minimal latency
    • Maximum worst-case throughput

[Figure: average hop count (normalized to minimal; ticks at 1 and 2) plotted against worst-case throughput (fraction of network capacity; ticks at 0.25 and 0.5), with points for VAL, DOR, O1TURN and IDEAL.]

Randomized Partially-Minimal Routing (RPM)

[Figure: a 3D mesh with X, Y and Z axes showing a route from the source to the destination through a random intermediate layer.]

  • Phase-1Z: source to the intermediate layer
  • XY or YX routing on the intermediate layer
  • Phase-2Z: intermediate layer to the destination

Main Idea
  • Load-balance uniformly across the vertical layers
    • 2 phases of vertical routing
  • Minimal XY/YX routing used on each layer
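
The three steps of RPM can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function name `rpm_route`, the `(x, y, z)` tuple representation, and the choice of XY versus YX with equal probability are assumptions made here; the slides only state that the intermediate layer is random and that XY or YX routing is used on it.

```python
import random


def rpm_route(src, dst, num_layers, rng=random):
    """Return the nodes visited by one RPM route from src to dst.

    Nodes are (x, y, z) tuples, where z is the layer index.
    """
    pos = list(src)
    path = [tuple(pos)]

    def walk(axis, target):
        # Move one hop at a time along a single axis until target is reached.
        step = 1 if target > pos[axis] else -1
        while pos[axis] != target:
            pos[axis] += step
            path.append(tuple(pos))

    # Pick the intermediate layer uniformly at random.
    z_mid = rng.randrange(num_layers)
    # ASSUMPTION: XY and YX orders are chosen with equal probability.
    order = (0, 1) if rng.random() < 0.5 else (1, 0)

    walk(2, z_mid)              # Phase-1Z: source to the intermediate layer
    for axis in order:          # minimal XY or YX routing on that layer
        walk(axis, dst[axis])
    walk(2, dst[2])             # Phase-2Z: intermediate layer to the destination
    return path
```

For example, `rpm_route((0, 0, 0), (3, 2, 1), num_layers=4)` returns a path whose X and Y hops are minimal, while the number of Z hops depends on the randomly chosen layer.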
Routing Algorithms for 3D Mesh Networks

  • Randomized Partially Minimal Routing (RPM)
    • Near-optimal worst-case throughput
    • Low latency

[Figure: average hop count (normalized to minimal; ticks at 1, 1.1 and 2) plotted against worst-case throughput (fraction of network capacity; ticks at 0.25 and 0.5), with points for VAL, RPM, DOR, O1TURN and IDEAL.]
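
The hop-count axis of this comparison can be approximated with a short calculation. The sketch below assumes uniform random traffic over all ordered source-destination pairs (including a node sending to itself) and a uniformly random intermediate node or layer; the function names are hypothetical. The resulting ratios depend on the mesh dimensions and the traffic pattern, so they need not match the values plotted on the slide.

```python
from itertools import product


def mean_abs_diff(k):
    """Mean |a - b| over all ordered pairs a, b in 0..k-1."""
    return sum(abs(a - b) for a, b in product(range(k), repeat=2)) / k**2


def average_hops(kx, ky, kz):
    """Average hop counts on a kx x ky x kz mesh under uniform traffic."""
    dx, dy, dz = mean_abs_diff(kx), mean_abs_diff(ky), mean_abs_diff(kz)
    minimal = dx + dy + dz     # DOR and O1TURN always take minimal paths
    valiant = 2 * minimal      # two minimal legs through a random node
    rpm = dx + dy + 2 * dz     # minimal in X and Y, two random legs in Z
    return {"minimal": minimal, "VAL": valiant, "RPM": rpm}


hops = average_hops(8, 8, 4)
normalized = {name: h / hops["minimal"] for name, h in hops.items()}
```

Under these assumptions the extra cost of RPM comes only from the Z dimension, which is why it shrinks as the X and Y dimensions grow relative to the number of layers.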

RPM has Near-optimal Worst-case Throughput

RPM is optimal for even radix k, and within 1/k² of optimal for odd radix.

Outline

  • Oblivious routing on a 3D mesh
  • Layer-multiplexed (LM) 3D architecture
  • Evaluation

Unique Features of 3D ICs

[Figure: stacked device layers connected by a TSV. Annotations: 50 μm inter-layer distance, 1500 μm between adjacent tiles, 4 μm via pitch.]

  • Inter-layer distances are very small (~50 μm)
    • Order of magnitude lower than distances between adjacent tiles on a 2D plane (~1500 μm)
    • Vertical interconnects implemented using Through-Silicon Vias (TSVs) have very low delay
  • Vertical bandwidth is abundant, as TSVs can be densely packed in 2D with a small via pitch (~4 μm)
  • Number of device layers is likely to remain small (4-5 layers) due to thermal and manufacturing issues
RPM on a 3D Mesh

[Figure: RPM on a 3D mesh with X, Y and Z axes: Phase-1Z from the source to a random intermediate layer, XY or YX routing on the intermediate layer, then Phase-2Z from the intermediate layer to the destination.]

Proposed Layer-Multiplexed Architecture

[Figure: the layer-multiplexed architecture with X, Y and Z axes and nodes labeled P1-P4. A packet travels from the source to a random intermediate layer (Phase-1Z), uses XY or YX routing on the intermediate layer, then travels from the intermediate layer to the destination (Phase-2Z).]

RPM routing adapted to the LM architecture: RPM-LM

Power and Area Savings

[Figure: a conventional 3D mesh beside the layer-multiplexed architecture, each with nodes labeled P1-P4. The layer-multiplexed side is annotated with a packet injection demultiplexer and a packet ejection multiplexer.]

  • Decouple vertical routing from horizontal routing
  • Restrict vertical routing to packet injection and packet ejection
  • 5x5 crossbar in LM vs. 7x7 crossbar in 3D mesh
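
A rough sense of the crossbar saving follows from counting crosspoints. The sketch below assumes a full crossbar with one crosspoint per input-output pair, so the count grows with the square of the port count; the helper name `crosspoints` is hypothetical. It covers the crossbar alone and is not an estimate of total router power or area.

```python
def crosspoints(ports):
    """Crosspoints in a full ports x ports crossbar (one per input-output pair)."""
    return ports * ports


mesh_3d = crosspoints(7)   # 7x7 crossbar in the 3D mesh router
lm = crosspoints(5)        # 5x5 crossbar in the LM router
reduction = 1 - lm / mesh_3d

print(f"3D mesh: {mesh_3d}, LM: {lm}, reduction: {reduction:.0%}")
```

Under this assumption the LM crossbar has 25 crosspoints against 49 for the 3D mesh.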
Single Hop Vertical Communication
  • Single hop vertical routing more power efficient than one-layer-per-hop routing
    • Leverages short inter-layer distances in 3D ICs
    • Better utilizes available vertical bandwidth
Packet Injection Demultiplexer

[Figure: packet injection demultiplexer. Blocks: Route Selection/Load Balancing, VC Allocation, Switch Arbitration, Flit Counters. Inputs: P1-P4, plus credits in from the injection ports of the routers on layers 1-4. Outputs: to the injection ports of the Layer 1 through Layer 4 routers.]

Packet Ejection Multiplexer

[Figure: packet ejection multiplexers on Layer 1, one per node P1-P4. The multiplexer for P1 has inputs L1-P1 (from the router on Layer 1), L2-P1, L3-P1 and L4-P1 (packets from layers 2, 3 and 4), an arbiter, a VCID signal, and credits out for L1-P1, L2-P1, L3-P1 and L4-P1. The multiplexer for P4 has the same structure, with inputs L1-P4 through L4-P4.]

Outline
  • Oblivious routing on a 3D mesh
  • Layer-multiplexed 3D architecture
  • Evaluation
    • Power and Area
    • Performance
Power and Area Evaluation
  • Used Orion 2.0 models for router power and area estimation.
  • 65 nm process at 1 V and 1 GHz
  • Buffers
    • 4 VCs/port, 5 flits/VC for routers
    • 5 flits/port for packet injection demultiplexer
    • 5 flits/port for each packet ejection multiplexer
Power Comparison
  • 3D mesh
    • One 7-port router per tile
  • LM
    • One 5-port router per tile
    • One packet injection demultiplexer for every 4 tiles
    • One packet ejection multiplexer per tile
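
The buffer settings from the evaluation setup give a back-of-the-envelope comparison of buffer storage per tile. In the sketch below the router figures (4 VCs/port, 5 flits/VC) and the 5 flits/port for the multiplexer and demultiplexer come from the slides, as does the sharing of one injection demultiplexer among 4 tiles. The port count of 4 for the demultiplexer and for each ejection multiplexer is an assumption based on a 4-layer stack, and all names are hypothetical.

```python
VCS_PER_PORT = 4          # from the slides: 4 VCs/port for routers
FLITS_PER_VC = 5          # from the slides: 5 flits/VC for routers
MUX_FLITS_PER_PORT = 5    # from the slides: 5 flits/port for demux and mux
TILES_PER_DEMUX = 4       # from the slides: one injection demux per 4 tiles
MUX_PORTS = 4             # ASSUMPTION: one buffered port per layer in a 4-layer stack


def router_buffer_flits(ports):
    """Total buffer capacity, in flits, of a router with the given port count."""
    return ports * VCS_PER_PORT * FLITS_PER_VC


mesh_3d_flits = router_buffer_flits(7)
lm_flits = (
    router_buffer_flits(5)
    + MUX_PORTS * MUX_FLITS_PER_PORT / TILES_PER_DEMUX  # share of the injection demux
    + MUX_PORTS * MUX_FLITS_PER_PORT                    # ejection mux, one per tile
)

print(mesh_3d_flits, lm_flits)
```

This counts buffer capacity only. The power and area results in the slides come from Orion 2.0 models, which this sketch does not attempt to reproduce.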
Power Evaluation

27% power reduction

Area Evaluation

26.5% area reduction

Outline
  • Oblivious routing on a 3D mesh
  • Layer-multiplexed 3D architecture
  • Evaluation
    • Power and Area
    • Performance
RPM on a 3D mesh vs. RPM-LM
  • Worst-case throughput
    • RPM-LM achieves same (near-optimal) worst-case throughput as RPM
  • Average-case throughput
Flit-Level Simulation
  • Ideal throughput evaluation assumes
    • Ideal single-cycle router
    • Infinite buffers
    • No contention in switches, no flow control
  • Flit-level simulation
    • PopNet network simulator
    • 5-stage router pipeline
    • Credit-based flow control
    • 8 virtual channels, each 5 flits deep
    • Multi-flit packets injected into the network (5 flits/packet)
Flit-Level Simulation (cont’d)
  • Network configurations simulated
    • 4 x 4 x 4 mesh
    • 8 x 8 x 4 mesh
  • Four different traffic traces used
    • Uniform traffic
    • Transpose traffic: (x,y,z) → (y,z,x)
    • Complement traffic: (x,y,z) → (k-x-1, k-y-1, k-z-1)
    • Worst-case traffic pattern for DOR (DOR-WC): (x,y,z) → (k-z-1, k-y-1, k-x-1)
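
The three permutation patterns above translate directly into destination functions. The sketch below assumes a cubic k x k x k mesh such as the 4 x 4 x 4 configuration, since the slides do not say how the formulas are adapted to the 8 x 8 x 4 mesh; the function names are hypothetical.

```python
from itertools import product


def transpose(node, k):
    """Transpose traffic: (x, y, z) -> (y, z, x). k is unused but kept for a uniform signature."""
    x, y, z = node
    return (y, z, x)


def complement(node, k):
    """Complement traffic: (x, y, z) -> (k-x-1, k-y-1, k-z-1)."""
    x, y, z = node
    return (k - x - 1, k - y - 1, k - z - 1)


def dor_worst_case(node, k):
    """DOR-WC traffic: (x, y, z) -> (k-z-1, k-y-1, k-x-1)."""
    x, y, z = node
    return (k - z - 1, k - y - 1, k - x - 1)


def is_permutation(pattern, k):
    """Check that every node of a k x k x k mesh is the destination of exactly one source."""
    nodes = set(product(range(k), repeat=3))
    return {pattern(n, k) for n in nodes} == nodes
```

Each pattern maps every source to a distinct destination, which `is_permutation(pattern, 4)` confirms for the 4 x 4 x 4 case.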

Summary of Contributions

  • Proposed a 3D layer-multiplexed (LM) architecture, an optimization of the 3D mesh
  • Exploits the optimality of RPM together with the high vertical bandwidth enabled by 3D technology
  • LM architecture consumes 27% less power and occupies 26% less area than a 3D mesh
  • RPM-LM has comparable (marginally better) performance to RPM on a 3D mesh
