Interconnection Networks

PowerPoint Slideshow about 'Interconnection Networks' - aaron


Presentation Transcript
Overview
  • Physical Layer and Message Switching
  • Network Topologies
  • Metrics
  • Deadlock & Livelock
  • Routing Layer
  • The Messaging Layer
Interconnection Networks
  • Fabric for scalable, multiprocessor architectures
  • Distinct from traditional networking architectures such as Internet Protocol (IP) based systems
  • We are interested in applications to large clusters as well as embedded systems
CLUX: A Beowulf Cluster

[Figures: interconnection network cables; Myrinet switch]

Images from the Clux cluster at http://www.fyslab.hut.fi/clux/

The Practical Problem

From: Ambuj Goyal, “Computer Science Grand Challenge – Simplicity of Design,” Computing Research Association Conference on "Grand Research Challenges" in Computer Science and Engineering, June 2002

Example: Embedded Devices

picoChip: http://www.picochip.com/

  • Issues
    • Execution performance
    • Power dissipation
    • Number of chip types
    • Size and form factor

PACT XPP Technologies: http://www.pactcorp.com/

Messaging Hierarchy
  • Routing Layer
    • Where?: Destination decisions, i.e., which output port
  • Switching Layer
    • When?: When is data forwarded
  • Physical Layer
    • How?: Synchronization of data transfer

  • This organization is distinct from traditional networking implementations
  • Emphasis is on low latency communication
    • Only recently have standards been evolving
      • Infiniband: http://www.infinibandta.org/home
The Physical Layer
  • Data is transmitted based on a hierarchical data structuring mechanism
    • Messages → packets → flits → phits
    • While flits and phits are fixed size, packets and data may be variable sized
  • Flit: flow control digit
  • Phit: physical flow control digit

[Figure: data divided into packets, each with a header and a checksum; packets divided into flits; flits divided into phits]
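The hierarchy above can be sketched in a few lines. This is only an illustration: the sizes (`PHIT_BYTES`, `FLIT_BYTES`, `MAX_PAYLOAD`), the two-byte header layout, and the toy checksum are assumptions made for the example, not values from the slides.

```python
from dataclasses import dataclass
from typing import List

PHIT_BYTES = 2      # assumed physical channel width
FLIT_BYTES = 8      # assumed flit size (a multiple of the phit size)
MAX_PAYLOAD = 32    # assumed maximum packet payload


@dataclass
class Packet:
    header: bytes     # hypothetical header: destination and sequence number
    payload: bytes
    checksum: int     # toy checksum: sum of payload bytes modulo 256


def packetize(message: bytes, dest: int) -> List[Packet]:
    """Split a variable-sized message into packets of bounded payload."""
    packets = []
    for seq, start in enumerate(range(0, len(message), MAX_PAYLOAD)):
        payload = message[start:start + MAX_PAYLOAD]
        packets.append(Packet(bytes([dest, seq]), payload, sum(payload) % 256))
    return packets


def to_flits(packet: Packet) -> List[bytes]:
    """Serialize a packet and cut it into fixed-size flits (zero padded)."""
    raw = packet.header + packet.payload + bytes([packet.checksum])
    raw += bytes(-len(raw) % FLIT_BYTES)
    return [raw[i:i + FLIT_BYTES] for i in range(0, len(raw), FLIT_BYTES)]


def to_phits(flit: bytes) -> List[bytes]:
    """Cut one flit into the units moved across the physical channel per cycle."""
    return [flit[i:i + PHIT_BYTES] for i in range(0, len(flit), PHIT_BYTES)]


message = bytes(range(70))
packets = packetize(message, dest=3)
flits = to_flits(packets[0])
print(len(packets), len(flits), len(to_phits(flits[0])))  # 3 5 4
```

With these assumed sizes, a 70-byte message becomes three packets; the first packet (35 bytes with header and checksum) is padded to five 8-byte flits, and each flit crosses the link as four phits.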

Flow Control
  • Flow control digit: synchronized transfer of a unit of information
    • Based on buffer management
  • Asynchronous vs. synchronous flow control
  • Flow control occurs at multiple levels
    • message flow control
    • physical flow control
  • Mechanisms
    • Credit based flow control
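A minimal sketch of credit-based flow control, assuming one credit per free buffer slot at the receiver. The class and method names (`CreditLink`, `try_send`, `drain_one`) are hypothetical and chosen only for this example.

```python
from collections import deque


class CreditLink:
    """Sender-side view of one channel with credit-based flow control."""

    def __init__(self, buffer_slots: int):
        self.credits = buffer_slots   # one credit per free downstream slot
        self.downstream = deque()     # the receiver's flit buffer

    def try_send(self, flit) -> bool:
        if self.credits == 0:
            return False              # no free slot downstream: sender stalls
        self.credits -= 1
        self.downstream.append(flit)
        return True

    def drain_one(self):
        """Receiver forwards one flit onward and returns a credit to the sender."""
        flit = self.downstream.popleft()
        self.credits += 1
        return flit


link = CreditLink(buffer_slots=2)
sent = [link.try_send(f) for f in ("f0", "f1", "f2")]
link.drain_one()
retry = link.try_send("f2")
print(sent, retry)  # [True, True, False] True
```

The third flit is refused until the receiver frees a slot, which shows how buffer management drives the flow control decision.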
Switching Layer
  • Comprises three sets of techniques
    • switching techniques
    • flow control
    • buffer management
  • Organization and operation of routers are largely determined by the switching layer
  • Connection Oriented vs. Connectionless communication
Generic Router Architecture

[Figure: generic router, annotated with wire delay, switching delay, and routing delay]

Virtual Channels
  • Each virtual channel is a pair of unidirectional channels
  • Independently managed buffers multiplexed over the physical channel
  • De-couples buffers from physical channels
  • Originally introduced to break cyclic dependencies
  • Improves performance through reduction of blocking delay
  • Virtual lanes vs. virtual channels
  • As the number of virtual channels increases, the increased channel multiplexing has two effects
    • decrease in header delay
    • increase in average data flit delay
  • Impact on router performance
    • switch complexity
Circuit Switching
  • Hardware path setup by a routing header or probe
  • End-to-end acknowledgment initiates transfer at full hardware bandwidth
  • Source routing vs. distributed routing
  • System is limited by signaling rate along the circuits

[Figure: time-space diagram of link busy time for a header probe, an acknowledgment, and data. Labels: t_r, t_s, t_setup, t_data]

Packet Switching
  • Blocking delays in circuit switching avoided in packet switched networks → full link utilization in the presence of data
  • Increased storage requirements at the nodes
  • Packetization and in-order delivery requirements
  • Buffering
    • use of local processor memory
    • central queues

[Figure: time-space diagram of link busy time for a message header followed by message data. Labels: t_r, t_packet]
Virtual Cut-Through
  • Messages cut-through to the next router when feasible
  • In the absence of blocking, messages are pipelined
    • pipeline cycle time is the larger of intra-router and inter-router flow control delays
  • When the header is blocked, the complete message is buffered
  • High load behavior approaches that of packet switching

[Figure: time-space diagram of link busy time; the packet header cuts through the router ahead of the message packet. Labels: t_w, t_blocking, t_r, t_s]

Wormhole Switching
  • Messages are pipelined, but buffer space is on the order of a few flits
  • Small buffers + message pipelining → small compact buffers
  • Supports variable sized messages
  • Messages cannot be interleaved over a channel: routing information is only associated with the header
  • Base Latency is equivalent to that of virtual cut-through

[Figure: time-space diagram of link busy time for a header flit followed by single flits. Labels: t_r, t_s, t_wormhole]

Comparison of Switching Techniques
  • Packet switching and virtual cut-through
    • consume network bandwidth proportional to network load
    • predictable demands
    • VCT behaves like wormhole at low loads and like packet switching at high loads
    • link level error control for packet switching
  • Wormhole switching
    • provides low latency
    • lower saturation point
    • higher variance of message latency than packet or VCT switching
  • Virtual channels
    • blocking delay vs. data delay
    • router flow control latency
  • Optimistic vs. conservative flow control
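The base (no-contention) latencies of the switching techniques can be compared with a simplified textbook-style model. The symbols t_r (routing delay), t_s (switch delay), and t_w (wire delay) follow the figure labels above; D (hops), L (message length in bits), W (channel width in bits), the one-header-flit assumption, and the exact form of each expression are assumptions of this sketch rather than formulas stated on the slides.

```python
import math


def flits(length_bits: int, width_bits: int) -> int:
    return math.ceil(length_bits / width_bits)


def circuit_switching(D, L, W, t_r, t_s, t_w):
    # Probe travels D hops, acknowledgment returns, then data streams over the circuit.
    t_setup = D * (t_r + 2 * (t_s + t_w))
    t_data = flits(L, W) * t_w      # assumes one flit per wire delay once set up
    return t_setup + t_data


def packet_switching(D, L, W, t_r, t_s, t_w):
    # The whole packet (data plus one header flit) is stored and forwarded per hop.
    return D * (t_r + (t_s + t_w) * flits(L + W, W))


def cut_through(D, L, W, t_r, t_s, t_w):
    # Virtual cut-through and wormhole share this base latency: the header pays
    # the per-hop delay, and the data flits follow in a pipeline.
    return D * (t_r + t_s + t_w) + max(t_s, t_w) * flits(L, W)


args = dict(D=4, L=256, W=16, t_r=1, t_s=1, t_w=1)
print(circuit_switching(**args), packet_switching(**args), cut_through(**args))
# 36 140 28
```

Under these assumed unit delays, store-and-forward latency grows with the product of distance and message length, while the pipelined techniques pay distance and length additively.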
Motivation
  • Crossbars provide full connectivity among ports, but cost and complexity grow quadratically in the number of ports
  • Buses provide minimal connectivity and do not provide scalable performance
  • Network topologies span a spectrum of solutions that trade-off cost, performance (latency & bandwidth), reliability, and implementation complexity
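To make the quadratic growth concrete, the sketch below counts crosspoints in a full crossbar and, as one assumed point on the topology spectrum, in a multistage network built from 2 x 2 switches (log2 N stages of N/2 switches). The function names are hypothetical, and the multistage design is not non-blocking in general, so this is a cost comparison only.

```python
import math


def crossbar_crosspoints(ports: int) -> int:
    return ports * ports


def min_crosspoints(ports: int, k: int = 2) -> int:
    """Crosspoints in a multistage network of k x k switches (ports = k**stages)."""
    stages = round(math.log(ports, k))
    assert k ** stages == ports, "ports must be a power of k"
    return stages * (ports // k) * k * k


for n in (16, 64, 256, 1024):
    print(n, crossbar_crosspoints(n), min_crosspoints(n))
# 16 256 128
# 64 4096 768
# 256 65536 4096
# 1024 1048576 20480
```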
Direct Networks
  • Fixed degree
  • Modular
  • Topologies
    • Meshes
    • Multidimensional tori
    • Special case of tori – the binary hypercube
Indirect Networks
  • Indirect networks
    • uniform base latency
    • centralized or distributed control
    • Engineering approximations to direct networks

[Figure: multistage network (backward and forward directions) and fat tree network, with nodes labeled 0000 through 1111]

Fat tree: bandwidth increases as you go up the tree

Specific MINs

Switch sizes and interstage interconnect establish distinct MINs

Majority of interesting MINs have been shown to be topologically equivalent

[Figure: 8-port multistage networks with ports labeled 000 through 111 at each stage]
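As one concrete instance of a MIN, the sketch below assumes an Omega network (a perfect shuffle between stages of 2 x 2 switches, a topology not named on the slide) and routes with destination tags: at each stage the switch output is chosen by the next bit of the destination address, most significant bit first.

```python
def rotl(x: int, n: int) -> int:
    """Perfect shuffle on an n-bit address: rotate left by one bit."""
    return ((x << 1) | (x >> (n - 1))) & ((1 << n) - 1)


def omega_route(src: int, dst: int, n: int):
    """Return the line position after each of the n stages."""
    pos, path = src, []
    for stage in range(n):
        pos = rotl(pos, n)                    # interstage shuffle
        bit = (dst >> (n - 1 - stage)) & 1    # destination tag, MSB first
        pos = (pos & ~1) | bit                # 2x2 switch: 0 = upper, 1 = lower output
        path.append(pos)
    return path


print([format(p, "03b") for p in omega_route(0b101, 0b010, 3)])
# ['010', '101', '010']
```

After n stages every bit of the position has been replaced by a destination bit, so the message arrives at the destination regardless of the source.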

Evaluation Metrics
  • Latency
    • Message transit time
    • Determined by switching technique and traffic patterns
  • Node degree (channel width)
    • Number of input/output channels
    • This metric is determined by packaging constraints
      • pin/wiring constraints
  • Diameter
  • Path diversity
    • A measure of reliability
Evaluation Metrics
  • Bisection bandwidth
    • This is the minimum bandwidth across any bisection of the network
  • Bisection bandwidth is a limiting attribute of performance
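For regular topologies these metrics have closed forms. The sketch below assumes bidirectional links, a k-ary n-cube torus with even k greater than 2, and a binary hypercube; bisection is counted in links, so multiply by the per-link bandwidth to obtain bisection bandwidth. The function names are hypothetical.

```python
def kary_ncube_metrics(k: int, n: int) -> dict:
    """Metrics for a k-ary n-cube torus (k even, k > 2, bidirectional links)."""
    return {
        "nodes": k ** n,
        "degree": 2 * n,
        "diameter": n * (k // 2),
        # Halving one dimension cuts each of the k**(n-1) rings in two places.
        "bisection_links": 2 * k ** (n - 1),
    }


def hypercube_metrics(n: int) -> dict:
    """Metrics for a binary n-dimensional hypercube."""
    return {
        "nodes": 2 ** n,
        "degree": n,
        "diameter": n,
        "bisection_links": 2 ** (n - 1),
    }


print(kary_ncube_metrics(32, 2))
print(kary_ncube_metrics(10, 3))
print(hypercube_metrics(4))
```

For roughly a thousand nodes, the 32-ary 2-cube has lower degree but a diameter of 32 and 64 bisection links, while the 10-ary 3-cube has a diameter of 15 and 200 bisection links.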
Latency Under Contention

[Figure: latency under contention, 32-ary 2-cube vs. 10-ary 3-cube]

Deadlock and Livelock

Deadlock freedom can be ensured by enforcing constraints

For example, following dimension order routing in 2D meshes

[Figure: routers connected by virtual channels]
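Dimension order routing in a 2D mesh can be sketched as follows: the message first corrects its X coordinate completely, then its Y coordinate, so it never turns from Y back to X. The coordinate representation and the function name are assumptions of the example.

```python
def xy_route(src, dst):
    """Dimension-order route in a 2D mesh: correct X fully, then Y."""
    (x, y), (dx, dy) = src, dst
    path = [(x, y)]
    while x != dx:
        x += 1 if dx > x else -1
        path.append((x, y))
    while y != dy:
        y += 1 if dy > y else -1
        path.append((x, y))
    return path


print(xy_route((0, 0), (2, 1)))
# [(0, 0), (1, 0), (2, 0), (2, 1)]
```

Because the route is fixed by the source and destination pair, this is also an example of deterministic routing.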

Occurrence of Deadlock
  • Deadlock is caused by dependencies between buffers

[Figure: deadlocked configuration with elements numbered 1 to 4]
Deadlock Avoidance: Principle
  • Deadlock is caused by dependencies between buffers
Routing Constraints on Virtual Channels
  • Add multiple virtual channels to each physical channel
  • Place routing restrictions between virtual channels
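A small experiment illustrates both points. It assumes a unidirectional ring (not a topology taken from the slides), builds the channel dependency graph over all source and destination pairs, and checks it for cycles. With one channel per link the dependencies form a cycle; with two virtual channels and the restriction that a message moves to the second one after crossing the wraparound link (often called a dateline scheme), the graph becomes acyclic.

```python
from itertools import product


def ring_dependencies(n: int, use_dateline: bool):
    """Channel dependency edges on a unidirectional ring of n nodes."""
    edges = set()
    for src, dst in product(range(n), repeat=2):
        if src == dst:
            continue
        node, vc, prev = src, 0, None
        while node != dst:
            chan = (node, vc)              # link node -> node+1 on virtual channel vc
            if prev is not None:
                edges.add((prev, chan))    # message holds prev while requesting chan
            prev = chan
            if use_dateline and node == n - 1:
                vc = 1                     # crossed the wraparound link: switch VC
            node = (node + 1) % n
    return edges


def has_cycle(edges) -> bool:
    """Depth-first search for a cycle in the dependency graph."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, [])
    state = {v: 0 for v in graph}          # 0 = unvisited, 1 = on stack, 2 = done

    def visit(v):
        state[v] = 1
        for w in graph[v]:
            if state[w] == 1 or (state[w] == 0 and visit(w)):
                return True
        state[v] = 2
        return False

    return any(state[v] == 0 and visit(v) for v in graph)


print(has_cycle(ring_dependencies(4, use_dateline=False)))  # True
print(has_cycle(ring_dependencies(4, use_dateline=True)))   # False
```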
Routing Protocols

Routing Algorithms
  • Number of Destinations
    • Unicast Routing
    • Multicast Routing
  • Routing Decisions
    • Centralized Routing
    • Source Routing
    • Distributed Routing
    • Multiphase Routing
  • Implementation
    • Table Lookup
    • Finite State Machine
  • Adaptivity
    • Deterministic Routing
    • Adaptive Routing
  • Progressiveness
    • Progressive
    • Backtracking
  • Minimality
    • Profitable
    • Misrouting
  • Number of Paths
    • Complete
    • Partial

Source: J. Duato, S. Yalamanchili, and L. Ni, “Interconnection Networks,” Morgan Kaufmann 2003.

Key Routing Categories
  • Deterministic
    • The path is fixed by the source destination pair
  • Source Routing
    • Path is looked up prior to message injection
    • May differ each time the network and NIs are initialized
  • Adaptive routing
    • Path is determined by run-time network conditions
  • Unicast
    • Single source to single destination
  • Multicast
    • Single source to multiple destinations
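Source routing can be sketched as a header that lists the output port to take at every hop. The source computes the list before injection, and each router only strips the leading entry. The 2D mesh, the port names E/W/N/S, and the function names are assumptions made for this illustration.

```python
from collections import deque

# Hypothetical port names mapped to coordinate steps in a 2D mesh.
MOVES = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def build_source_route(src, dst):
    """Computed once at the source: output ports to take, X first, then Y."""
    dx, dy = dst[0] - src[0], dst[1] - src[1]
    return ["E" if dx > 0 else "W"] * abs(dx) + ["N" if dy > 0 else "S"] * abs(dy)


def forward(node, header: deque):
    """A router strips the leading port from the header and forwards accordingly."""
    mx, my = MOVES[header.popleft()]
    return (node[0] + mx, node[1] + my)


header = deque(build_source_route((0, 0), (2, -1)))
node = (0, 0)
while header:
    node = forward(node, header)
print(node)  # (2, -1)
```

The routers make no routing decision of their own here, which is the contrast with distributed routing, where each router computes the output port from the destination address.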
Generic Router Architecture

[Figure: generic router with physical input channels, input queues (virtual channels), switch, address decoder, output queues (virtual channels), multiplexers, physical output channels, and a connection from/to the local processor]
The Message Layer
  • Message layer background
    • Cluster computers
    • Myrinet SAN
    • Design properties
  • End-to-End communication path
    • Injection
    • Network transmission
    • Ejection
  • Overall performance
Cluster Computers
  • Cost-effective alternative to supercomputers
    • Number of commodity workstations
    • Specialized network hardware and software
  • Result: Large pool of host processors

[Figure: four hosts, each with CPU, memory, I/O bus, and network interface, attached to a common network]

Courtesy of C. Ulmer

Myrinet
  • Descendant of Caltech Mosaic project
    • Wormhole network
    • Source routing
    • High-speed, Ultra-reliable network
    • Configurable topology: Switches, NICs, and cables

[Figure: seven hosts (CPU + NI) connected through three switches (X)]

Courtesy of C. Ulmer

Myrinet Switches & Links
  • 16 Port crossbar chip
    • 2.0+2.0 Gbps per port
    • ~300 ns Latency
  • Line card
    • 8 Network ports
    • 8 Backplane ports
  • Backplane cabinet
    • 17 line card slots
    • 128 Hosts

[Figure: line card built around a 16-port crossbar (Xbar), with fiber ports for 8 hosts per line card and links to the backplane; line cards joined through backplane crossbars (X)]

Courtesy of C. Ulmer

Myrinet NI Architecture
  • Custom RISC CPU
    • 33-200 MHz
    • Big endian
    • gcc is available
  • SRAM
    • 1-9 MB
    • No CPU cache
  • DMA Engines
    • PCI / SRAM
    • SRAM / Tx
    • Rx / SRAM

[Figure: network interface card block diagram: LANai processor (RISC CPU, host DMA, SAN DMA, Tx, Rx), SRAM, and PCI]

Courtesy of C. Ulmer


Message Layers

Courtesy of C. Ulmer

“Message Layer” Communication Software
  • Message layers are enabling technology for clusters
    • Enable cluster to function as single image multiprocessor system
    • Responsible for transferring messages between resources
    • Hide hardware details from end users

[Figure: CPUs joined by a cluster message layer]

Courtesy of C. Ulmer

Message Layer Design Issues
  • Performance is critical
    • Competing with SMPs, where overhead is <1 µs
  • Use every trick to get performance
    • Single cluster user: remove device sharing overhead
    • Little protection: co-operative environment
    • Reliable hardware: optimize for common case of few errors
    • Smart hardware: offload host communication
    • Arch hacks: x86 is a turkey, use MMX, SSE, WC, ...

Courtesy of C. Ulmer

Message Layer Organization
  • User-space application
  • User-space message layer library (communication library)
    • Maintains cluster info
    • Message passing API
    • Device interface
  • NI device driver (kernel)
    • Physical access
    • DMA transfers
    • ISR
  • NI firmware
    • Monitor network wire
    • Send/Receive messages

Courtesy of C. Ulmer

End User’s Perspective

[Figure: Processor A calls send( dest, data, size ); the message layer delivers Msg to Processor B, which retrieves it with Msg = extract();]

Courtesy of C. Ulmer
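The user-visible interface in the figure can be mimicked with a toy, in-process stand-in. This is not the API of any real message layer: the class `ToyMessageLayer` and the `node` argument to `extract` are hypothetical, added only because a single process plays both endpoints here.

```python
from collections import deque


class ToyMessageLayer:
    """In-process stand-in for a cluster message layer (no real network involved)."""

    def __init__(self, num_nodes: int):
        self.inbox = [deque() for _ in range(num_nodes)]

    def send(self, dest: int, data: bytes, size: int) -> None:
        # Copy the first `size` bytes into the destination's receive queue.
        self.inbox[dest].append(bytes(data[:size]))

    def extract(self, node: int):
        """Poll for the next message; returns None when nothing has arrived."""
        return self.inbox[node].popleft() if self.inbox[node] else None


layer = ToyMessageLayer(num_nodes=2)
layer.send(dest=1, data=b"hello", size=5)
msg = layer.extract(node=1)
print(msg)  # b'hello'
```

The point of the interface is that the caller sees only `send` and `extract`; the device driver, firmware, and network details stay hidden behind them.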

End-to-End Communication Path
  • Three phases of data transfer
    • Injection
    • Network
    • Ejection

[Figure: message passing and remote memory operations between a source and a destination node, each with CPU, memory, and NI, connected by the SAN; the path is marked in three numbered steps]

Courtesy of C. Ulmer

TPIL Performance: LANai 9 NI with Pentium III-550 MHz Host

[Figure: bandwidth (MBytes/s) vs. injection size (bytes)]

Courtesy of C. Ulmer

The Message Path
  • Wire bandwidth is not the bottleneck!
  • Operating system and/or user level software limits performance

[Figure: message path between two hosts through memory (M), CPU, OS, PCI, and NI on each side, connected by the network]

Universal Performance Metrics

[Figure: timeline of a message transfer. The sender processor is busy during the sender overhead; time of flight and transmission time (size ÷ bandwidth) make up the transport latency; the receiver processor is busy during the receiver overhead. Total latency spans all of these.]

Total Latency = Sender Overhead + Time of Flight + Message Size ÷ BW + Receiver Overhead

Includes header/trailer in BW calculation?

From Patterson, CS252, UCB

Simplified Latency Model
  • Total Latency ≈ Overhead + Message Size / BW
  • Overhead = Sender Overhead + Time of Flight + Receiver Overhead
  • Can relate overhead to network bandwidth utilization

From Patterson, CS252, UCB
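The model can be evaluated directly. In the sketch below the function names and the example numbers (250 MB/s raw bandwidth, 10 µs total overhead) are assumptions chosen for illustration; effective bandwidth is defined here as message size divided by total latency.

```python
def total_latency(size_bytes, bw_bytes_per_s, sender_oh, time_of_flight, receiver_oh):
    """Total latency in seconds under the simplified model."""
    overhead = sender_oh + time_of_flight + receiver_oh
    return overhead + size_bytes / bw_bytes_per_s


def effective_bandwidth(size_bytes, bw_bytes_per_s, overhead_s):
    """Delivered bandwidth once the fixed overhead is charged to the message."""
    return size_bytes / (overhead_s + size_bytes / bw_bytes_per_s)


def half_power_size(bw_bytes_per_s, overhead_s):
    """Message size at which effective bandwidth is half of the raw bandwidth."""
    return overhead_s * bw_bytes_per_s


BW = 250e6          # assumed raw bandwidth, bytes per second
OVERHEAD = 10e-6    # assumed total overhead, seconds

for size in (64, 1024, 16384, 262144):
    utilization = effective_bandwidth(size, BW, OVERHEAD) / BW
    print(size, round(utilization, 3))
print(half_power_size(BW, OVERHEAD))
```

Small messages are dominated by overhead, so utilization of the raw bandwidth only approaches 1 for messages much larger than the half-power size (overhead × bandwidth).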
Scalable Switching Fabrics for Internet Routers
  • Internet bandwidth growth → routers with
    • large numbers of ports
    • high bisection bandwidth
  • Historically these solutions have used
    • Backplanes
    • Crossbar switches
  • White paper: Scalable Switching Fabrics for Internet Routers, by W. J. Dally, http://www.avici.com/technology/whitepapers/
Requirements
  • Scalable
    • Incremental
    • Economical → cost linear in the number of nodes
  • Robust
    • Fault tolerant → path diversity + reconfiguration
    • Non-blocking features
  • Performance
    • High bisection bandwidth
    • Quality of Service (QoS)
      • Bounded delay
Switching Fabric
  • Three components
    • Topology → 3D torus
    • Routing → source routing with randomization
    • Flow control → virtual channels and virtual networks
  • Maximum configuration: 14 x 8 x 5 = 560
  • Channel speed is 10 Gbps
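The arithmetic behind the maximum configuration can be checked with a short sketch of a mixed-radix 3D torus. The function names and the coordinate representation are assumptions; the dimensions 14 x 8 x 5 come from the slide.

```python
from math import prod


def torus_nodes(dims):
    return prod(dims)


def torus_hops(a, b, dims):
    """Minimal hop count between nodes a and b, using wraparound links where shorter."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y, k in zip(a, b, dims))


def torus_diameter(dims):
    """Largest minimal hop count: at most half-way around each dimension."""
    return sum(k // 2 for k in dims)


dims = (14, 8, 5)
print(torus_nodes(dims), torus_diameter(dims), torus_hops((0, 0, 0), (13, 4, 3), dims))
# 560 13 7
```

Even at 560 nodes, no pair of nodes is more than 13 hops apart, and the wraparound links make the node at coordinate 13 a single hop from coordinate 0 in the first dimension.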
Packaging
  • Uniformly short wires between adjacent nodes
    • Can be built in passive backplanes
    • Run at high speed
      • Bandwidth inversely proportional to square of wire length
    • Cabling costs
    • Power costs

Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

Properties
  • Path diversity
    • Avoids tree saturation
    • Edge disjoint paths for fault tolerance
      • Heart beat checks (100 microsecs) + deflecting while tables are updated

Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

Properties

Figures are from Scalable Switching Fabrics for Internet Routers, by W. J. Dally (can be found at www.avici.com)

Use of Virtual Channels
  • Virtual channels aggregated into virtual networks
    • Two networks for each output port
  • Distinct networks prevent undesirable coupling
    • Only bandwidth on a link is shared
    • Fair arbitration mechanisms
  • Distinct networks enable QoS constraints to be met
    • Separate best effort and constant bit rate traffic
Summary
  • Distinguish between traditional networking and high performance multiprocessor communication
  • Hierarchy of implementations
    • Physical, switching and routing
    • Protocol families and protocol layers (the protocol stack)
  • Datapath and architecture of the switches
  • Metrics
    • Bisection bandwidth
    • Reliability
    • Traditional latency and bandwidth