# Computer architecture II


## Computer architecture II

Network topologies

Computer Architecture II

### Plan for today: Scalable interconnection networks

• Basic concepts, definitions

• Topologies

• Switching

• Routing

• Performance


### Outline

• Basic concepts, definitions

• Topologies

• Switching

• Routing

• Performance


### Formalism

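• Graph $G = (V, E)$

V: switches and nodes

E: communication channels (edges), $E \subseteq V \times V$

• Route: $(v_0, \ldots, v_k)$, a path of length $k$ between nodes $v_0$ and $v_k$, where $(v_i, v_{i+1}) \in E$

• Routing distance: number of links on the route between two nodes

• Diameter: the maximal length of a shortest route between two nodes

• Average distance: mean routing distance over all pairs of nodes

• Degree: number of input (output) channels of a node

• Bisection width: minimal number of channels that must be cut to divide the network into two equal halves; it bounds the number of parallel connections between the halves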
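The definitions above can be checked on a small graph. The following is a minimal sketch, assuming an undirected, connected network stored as an adjacency dictionary; the function names and the 6-node ring used as sample input are illustrative and not part of the slides.

```python
from collections import deque

def bfs_distances(adj, start):
    """Hop counts from start to every reachable node of an unweighted graph."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def graph_metrics(adj):
    """Return (diameter, average distance, maximum degree) of a connected graph."""
    nodes = list(adj)
    # Shortest-path lengths over all ordered pairs of distinct nodes.
    all_dist = [d for s in nodes
                for t, d in bfs_distances(adj, s).items() if t != s]
    diameter = max(all_dist)
    average = sum(all_dist) / len(all_dist)
    degree = max(len(adj[u]) for u in nodes)
    return diameter, average, degree

# Hypothetical sample: a ring of 6 nodes, node i linked to i+1 and i-1 modulo 6.
ring = {i: [(i + 1) % 6, (i - 1) % 6] for i in range(6)}
print(graph_metrics(ring))
```

Breadth-first search is used because every channel counts as one hop; a weighted network would need a different shortest-path method.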


### What characterizes a network?

• Bandwidth (offered bandwidth): $b = w f$

• where $w$ is the channel width (in bytes) and $f = 1/t$ is the signaling rate (in Hz)

• Latency

• Time a message travels between two nodes

• Throughput (delivered bandwidth)

• How much of the offered bandwidth is effectively used


### What characterizes a network?

• Topology

• physical interconnection structure of the network graph

• Routing Algorithm

• restricts the set of paths that messages may follow

• many algorithms with different properties

• Switching Strategy

• how data in a message traverses a route

• circuit switching vs. packet switching

• Flow Control Mechanism

• what happens when a message, or portions of it, encounters traffic while traversing a route?


### Goals

• Latency as small as possible

• High Throughput

• As many concurrent transfers as possible

• Bisection width gives the potential number of parallel connections

• Cost as low as possible


### Bus (e.g. Ethernet)

• Degree = 1

• diameter = 1

• No routing necessary

• bisection width = 1

CSMA/CD protocol limits the bus length

Simplest and cheapest dynamic network


### Complete graph

• degree = $n - 1$ (too expensive for big networks)

• diameter = 1

• bisection width = $\lfloor n/2 \rfloor \lceil n/2 \rceil$

Static network: a connection between each pair of nodes

When the network is cut into two halves, each node has a connection to the $n/2$ nodes of the other half, and there are $n/2$ such nodes.


### Ring

• degree = 2

• diameter = $\lfloor n/2 \rfloor$ (slow for big networks)

• bisection width = 2

Static network: node $i$ is linked with nodes $i+1$ and $i-1$ modulo $n$.

• Examples: FDDI, SCI, Fibre Channel Arbitrated Loop, KSR1

### d-dimensional grid

[Figure: 3x3 grid with nodes labelled 1,1 to 3,3]

For $d$ dimensions and $n$ nodes, with $\sqrt[d]{n}$ nodes along each dimension:

• degree = $2d$ (interior nodes)

• diameter = $d\,(\sqrt[d]{n} - 1)$

• bisection width = $(\sqrt[d]{n})^{d-1}$

Static network

• Examples: Cray T3D and T3E

### Crossbar

• fast and expensive ($n^2$ switches)

• Mostly used to connect processors with memory modules

• degree = 1

• diameter = 2

• bisection width = $n/2$

• Examples: 4x4, 8x8, 16x16

Dynamic network


### Hypercube (1)

Hamming distance = number of bits in which the binary representations of two numbers differ

Two nodes are connected if their Hamming distance is 1

Routing from x to y proceeds by decreasing the Hamming distance at each hop

Static network
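Routing by decreasing Hamming distance can be sketched as follows. This is an illustrative example, not the slides' own code; it assumes that differing bits are corrected from the lowest to the highest dimension, which is only one of several valid orders.

```python
def hamming_distance(x, y):
    """Number of bit positions in which x and y differ."""
    return bin(x ^ y).count("1")

def hypercube_route(src, dst, k):
    """Nodes visited from src to dst in a k-dimensional hypercube.

    Assumption: differing bits are corrected from dimension 0 upward.
    """
    path = [src]
    current = src
    for bit in range(k):
        mask = 1 << bit
        if (current ^ dst) & mask:
            current ^= mask          # one hop along this dimension
            path.append(current)
    return path

route = hypercube_route(0b0000, 0b0110, 4)
print([format(node, "04b") for node in route])
```

Each hop flips exactly one differing bit, so the number of hops equals the Hamming distance between source and destination.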


### Hypercube (2)

$k$ dimensions, $n = 2^k$ nodes

• degree = $k$

• diameter = $k$

• bisection width = $n/2$

Two $(k-1)$-hypercubes are linked through $n/2$ edges to form a $k$-hypercube

• Examples: Intel iPSC/860, SGI Origin 2000


### Omega-Network (1)

• Building block: 2x2 Shuffle

• Perfect shuffle: target = cyclic left shift of the source address bits
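The perfect shuffle can be written as a one-bit rotation of the address. A minimal sketch, assuming addresses are plain integers of a fixed bit width; the function name is illustrative.

```python
def perfect_shuffle(address, bits):
    """Rotate the `bits`-wide address left by one position."""
    msb = (address >> (bits - 1)) & 1
    return ((address << 1) & ((1 << bits) - 1)) | msb

# Wiring of an 8-input shuffle stage: source -> target.
for source in range(8):
    print(format(source, "03b"), "->", format(perfect_shuffle(source, 3), "03b"))
```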


### Omega-Network (2)

• $\log_2 n$ levels of 2x2 shuffle building blocks

• dynamic network

Level $i$ looks at bit $i$ of the destination address:

If it is 0, the message goes up

If it is 1, the message goes down

Example: 100 sending to 110
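The level-by-level rule can be traced in a few lines. This is a sketch under two assumptions that the slide does not spell out: a perfect shuffle precedes every level, and level $i$ reads destination bit $i$ counted from the most significant bit. "Up" is modelled as the even output of a 2x2 block and "down" as the odd output.

```python
def omega_route(src, dst, bits):
    """Positions of a message after each level of an omega network
    with 2**bits inputs (assumptions as stated in the text above)."""
    mask = (1 << bits) - 1
    position = src
    trace = []
    for level in range(bits):
        # Perfect shuffle: cyclic left shift of the current position.
        position = ((position << 1) & mask) | (position >> (bits - 1))
        # 2x2 block: upper output for destination bit 0, lower output for 1.
        dst_bit = (dst >> (bits - 1 - level)) & 1
        position = (position & ~1) | dst_bit
        trace.append(position)
    return trace

print([format(p, "03b") for p in omega_route(0b100, 0b110, 3)])
```

After the last level the position equals the destination address, whatever the source was, because every bit of the position has been overwritten by a destination bit.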


### Omega-Network (3)

$n$ nodes, $(n/2)\log_2 n$ building blocks

• degree = 2 for nodes, 4 for building blocks

• diameter = $\log_2 n$

• bisection width = $n/2$

• for a random permutation, $n/2$ messages are expected to cross the network in parallel

• Extremes

• If all the nodes want to send to node 0, only one message crosses at a time

• If each node sends a message to itself, $n$ messages cross in parallel


### Fat Tree /Clos-Network (1)

• Nodes = leaves of a tree

• Tree has the diameter $2\log_2 n$ (from the leftmost leaf over the root to the rightmost leaf)

• Simple tree has bisection width = 1 (bottleneck)

• Fat Tree:

• Edges at level $i$ have twice the capacity of edges at level $i-1$

• At level $i$, expensive switches with $2^i$ inputs and $2^i$ outputs

• Known as Clos networks


### Fat Tree/Clos-Network (2)

• Routing:

• Direct way over the lowest common parent

• When alternative exists, choose randomly.

• Tolerance to node failure

• diameter $2\log_2 n$, bisection width $n$

• Example: CM-5
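The route over the lowest common parent can be computed from the leaf numbers alone. A minimal sketch, assuming a complete binary tree whose leaves are numbered left to right starting at 0; real fat trees may use a higher branching factor, and the function names are illustrative.

```python
def common_ancestor_level(src, dst):
    """Level of the lowest common parent of two leaves (leaves are level 0).

    Assumption: complete binary tree, leaves numbered left to right.
    """
    return (src ^ dst).bit_length()

def tree_route_length(src, dst):
    """Hops for the direct way up to the lowest common parent and down again."""
    return 2 * common_ancestor_level(src, dst)

# In a tree with n = 8 leaves, the farthest pair crosses the root.
print(tree_route_length(0, 7))
```

For neighbouring leaves under the same parent the route takes 2 hops; for the leftmost and rightmost leaves of an 8-leaf tree it takes $2\log_2 8 = 6$ hops, matching the diameter given above.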


### Switching

• How a message traverses the network from one node to the other

• Circuit switching

• One path from source to destination established

• All packets will take that way

• Like the telephone system

• Packet switching

• A message broken into a sequence of packets which can be sent across different routes

• Better utilization of network resources


### Packet Routing

• There are two basic approaches to routing packets, based on what a switch does when the packet begins arriving

• Store-and-forward

• Cut-through, in two variants:

• Virtual cut-through

• Wormhole

### Packet routing: Store-and-Forward

• A packet is completely stored at a switch before being forwarded

• The packet is always on at most two nodes

• Problem: switches need a lot of memory for storing the incoming packets

• Switching takes place step by step, so the danger of blocking is small


### Packet routing: Cut through

• A packet may come partially into the switch and leave its tail on other nodes

• It may reside on more than 2 switches

• The decision to forward the packet may be taken right away

• What to do with the rest of the packet if the head blocks?

• Virtual cut-through: gather the tail where the head is

• It degenerates into store-and-forward under high contention

• Wormhole: if the head blocks, the whole “worm” blocks


### Store&Forward vs Cut-Through Routing

$h\,(n/b + D)$ vs. $n/b + h\,D$

$h$: number of hops, $n$: message size

$b$: bandwidth, $D$: routing delay per hop
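The two expressions can be compared numerically. A small sketch with made-up values (5 hops, a 1000-byte message, 100 bytes per time unit, 2 time units of routing delay per hop); the function names are illustrative.

```python
def store_and_forward_latency(h, n, b, delay):
    """The whole packet is received at every hop before it is forwarded."""
    return h * (n / b + delay)

def cut_through_latency(h, n, b, delay):
    """The packet is pipelined; only the routing delay is paid per hop."""
    return n / b + h * delay

# Hypothetical values, chosen only for illustration.
h, n, b, delay = 5, 1000, 100, 2
print(store_and_forward_latency(h, n, b, delay))  # 5 * (10 + 2)
print(cut_through_latency(h, n, b, delay))        # 10 + 5 * 2
```

The gap grows with the number of hops and with the message size, since store-and-forward pays the transfer time $n/b$ once per hop.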


### Routing Algorithm

• How do I know where a packet should go?

• Topology does NOT determine routing

• Routing algorithms

• Arithmetic

• Source-based

• Table lookup

• Adaptive—route based on network state (e.g., contention)

### (1) Arithmetic Routing

• For regular topology, use simple arithmetic to determine route

• E.g., 3D Torus xy-routing

• Packet header contains signed offset to destination (per dimension)

• At each hop, the switch adds or subtracts 1 to reduce the offset in one dimension

• When x == 0 and y == 0, the packet is at the correct processor

• Drawbacks

• Requires ALU in switch

• Must re-compute CRC at each hop
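The offset-based scheme can be sketched as follows. This is an illustrative example, assuming the header is a tuple of signed offsets and that dimensions are corrected one after another, one unit step per hop; wraparound links of a torus are not modelled.

```python
def route_by_offset(offset):
    """Return the header offset after each hop until it is all zeros.

    Assumption: dimensions are corrected in order, one unit step per hop.
    """
    offset = list(offset)
    hops = []
    for dim in range(len(offset)):
        while offset[dim] != 0:
            # The switch adds or subtracts 1 so the offset moves toward zero.
            offset[dim] += -1 if offset[dim] > 0 else 1
            hops.append(tuple(offset))
    return hops

print(route_by_offset((2, -1)))
```

Because the header changes at every hop, any checksum that covers the header has to be recomputed, which is the CRC drawback listed above.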

[Figure: cube with nodes labelled (0,0,0) to (1,1,1)]

### (2) Source Based & (3) Table Lookup Routing

Source Based

• Source specifies output port for each switch in route

• Very simple switches

• No control state

• Strip output port off header

• Myrinet uses this

Table Lookup

• Very small header: contains a field that is an index into a table for the output port

• Big tables, must be kept up-to-date
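The difference between the two schemes shows up in what the header carries. A minimal sketch with hypothetical data structures: a list of output ports for source-based routing and a per-switch dictionary for table lookup. It does not reflect the actual Myrinet header format.

```python
def source_route(header_ports):
    """Source-based: each switch strips the first output port off the header.

    Returns, per switch, the port taken and the header that is forwarded.
    """
    remaining = list(header_ports)
    taken = []
    while remaining:
        port = remaining.pop(0)
        taken.append((port, tuple(remaining)))
    return taken

def table_lookup(switch_table, header_index):
    """Table lookup: the header holds only an index; the switch maps it to a port."""
    return switch_table[header_index]

print(source_route([2, 0, 1]))
print(table_lookup({7: 3, 8: 1}, 7))
```

With source routing the switch keeps no routing state but the header shrinks along the way; with table lookup the header stays small and constant while every switch holds a table that must be kept up to date.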


• Deterministic—follows a pre-specified route

• k-ary d-cube: dimension-order routing

• $(x_1, y_1) \to (x_2, y_2)$

• First $\Delta x = x_2 - x_1$,

• then $\Delta y = y_2 - y_1$

• Tree: common ancestor

• Adaptive—route determined by contention for output port

• Essential for fault tolerance

• At least multipath

• Can improve utilization of the network

• Simple deterministic algorithms easily run into bad permutations
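The last point can be illustrated with dimension-order routing on a small mesh. This is a sketch, assuming a 4x4 mesh without wraparound and the transpose permutation $(x, y) \to (y, x)$ as the traffic pattern; both choices are illustrative and not taken from the slides.

```python
from collections import Counter

def xy_path(src, dst):
    """Dimension-order route in a 2D mesh: first along x, then along y."""
    x, y = src
    path = [(x, y)]
    while x != dst[0]:
        x += 1 if dst[0] > x else -1
        path.append((x, y))
    while y != dst[1]:
        y += 1 if dst[1] > y else -1
        path.append((x, y))
    return path

def max_link_load(k, permutation):
    """Largest number of routes sharing one directed link in a k x k mesh."""
    load = Counter()
    for x in range(k):
        for y in range(k):
            path = xy_path((x, y), permutation((x, y)))
            load.update(zip(path, path[1:]))  # consecutive nodes form a link
    return max(load.values(), default=0)

def transpose(node):
    return (node[1], node[0])

print(max_link_load(4, transpose))
```

Under the transpose pattern every node of a row heads for the same diagonal node, so the links next to the diagonal carry several routes at once, while an adaptive algorithm could spread part of that traffic over other paths.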


### Contention

• Two packets trying to use the same link at same time

• limited buffering

• drop?

• Most networks of parallel machines block in place

• Traffic may back up toward the source

• tree saturation: traffic backs up all the way along the paths leading to the destination


### Communication Perf: Latency

• $\text{Time}(n)_{s-d}$ = overhead + routing delay + channel occupancy + contention delay

• Overhead: time necessary for initiating the sending and reception of a message

• occupancy = $(n + n_e) / b$

• $n_e$: packet envelope size

• Routing delay

• Contention
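The latency sum can be written out directly. A minimal sketch with made-up numbers; the parameter names are illustrative, and routing delay and contention delay are passed in as given values because the slide does not break them down further.

```python
def message_latency(n, n_e, b, overhead, routing_delay, contention_delay=0.0):
    """Time(n) from source to destination as the sum of its four components."""
    occupancy = (n + n_e) / b  # payload plus envelope over the channel bandwidth
    return overhead + routing_delay + occupancy + contention_delay

# Hypothetical values: 1000-byte payload, 24-byte envelope, 128 bytes per time unit.
print(message_latency(n=1000, n_e=24, b=128, overhead=2.0, routing_delay=1.0))
```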


### Bandwidth

• What affects local bandwidth?

• packet density: $b \times n / (n + n_e)$

• routing delay: $b \times n / (n + n_e + wD)$

$D$: number of cycles spent waiting for a routing decision

$w$: width of the channel

• contention

• endpoints

• within the network

• Aggregate bandwidth

• bisection bandwidth

• sum of bandwidth of smallest set of links that partition the network

• Bad if not uniform distribution of communication

• total bandwidth of all the channels
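The two local-bandwidth expressions differ only in the $wD$ term, so one function covers both. A small sketch with made-up numbers; the function and parameter names are illustrative.

```python
def effective_bandwidth(b, n, n_e, w=0, d_cycles=0):
    """Payload bandwidth left after envelope and routing-delay losses.

    With w * d_cycles == 0 this is the packet-density expression.
    """
    return b * n / (n + n_e + w * d_cycles)

# Hypothetical values: b = 100, 96-byte payload, 4-byte envelope.
print(effective_bandwidth(100, 96, 4))                    # envelope only
print(effective_bandwidth(100, 96, 4, w=4, d_cycles=25))  # plus routing delay
```

The channel stays idle for $D$ cycles of width $w$ while the routing decision is made, which is why the delay enters the denominator as $wD$ extra bytes.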


### Interconnects


### Myrinet

• Offered bandwidth 2+2 Gbit/s, full duplex

• 5-7 µs latency

• Arbitrary topology, Fat Tree/Clos network preferable

• Routing: wormhole, source routing

• Cable (8+1 bit parallel) or fiber optics

• Programmable RISC processor, 333 MHz

• PCI/PCI-X connection, up to 133 MHz, 64 bit

• 8 Gb/s over the PCI-X bus, unidirectional

• 2 MB SRAM

### Myrinet Fat Tree (128 node)

[Figure: fat tree; label: 16x16 crossbar]

[Figure: network interface block diagram: cable connector, network interface, Net-DMA, 2 MB SRAM, Host-DMA, PCI bridge, LANai CPU]

• 2 MB SRAM

• PCI(-X) bridge, 64 bit, 66-133 MHz

• LANai RISC, 333 MHz

• 2 fiber-optic (LWL) connectors, both duplex

### Myrinet 16x16 crossbar

• 8 computers connected on the front side (2 channels)

• On the back side, 8 outputs (2 channels) toward the next level of the Clos network

• 32x32, two


### 128-nodes Clos

Building block from earlier

### Myrinet 256+256-Clos-Network

• Routing network with bisection width 256

• Front side: 256 computer connections

• Back side: 256 connections to the next-level routing units

### Clos-Network with full bisection width: 64 nodes and 32 nodes
