1 / 30

300 likes | 433 Views

Wire Efficient Topology. In the 3D world. For n nodes, bisection area is O(n 2/3 ) For large n, bisection bandwidth is limited to O(n 2/3 ) Bill Dally, IEEE TPDS, [Dal90a] For fixed bisection bandwidth, low-dimensional k-ary n-cubes are better (otherwise higher is better)

Download Presentation
## Wire Efficient Topology

**An Image/Link below is provided (as is) to download presentation**
Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.
Content is provided to you AS IS for your information and personal use only.
Download presentation by click this link.
While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.
During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

**In the 3D world**• For n nodes, bisection area is O(n2/3 ) • For large n, bisection bandwidth is limited to O(n2/3 ) • Bill Dally, IEEE TPDS, [Dal90a] • For fixed bisection bandwidth, low-dimensional k-ary n-cubes are better (otherwise higher is better) • i.e., a few short fat wires are better than many long thin wires**Equal cost in k-ary d-cubes**• Equal number of nodes? • Equal number of pins/wires? • Equal bisection bandwidth? • Equal area? Equal wire length? What do we know? • switch degree: d diameter = dk/2 • total links = Nd • pins per node = 2wd • bisection = 2kd-1= 2N/k links in each directions • 2Nw/k wires cross the middle**Latency(d) for N with Equal Link Width**• total links(N) = Nd • switch degree, number of channels, and channel length : for free**Latency with Equal Pin Count (total # of wires per node**constant) • Baseline d=2, has w = 32 (128 wires per node) • fix2dw pins => w(d) = 64/d: routing delayD=2 cyc • With more dimension, more channels, but thinner • distance down with d, but channel time increases**Wire Efficient Topology**• Selecting a network that gives us the lowest latency, T(0,L), for a given wiring density, BW, number of nodes, N, and message length, L. • trade off channel bandwidth Wagainst diameter D • wormhole routing makes latency the sum of two components D and L/W • the zero-load latency of the network • assuming a constant BW(N=2Nw/k), average distance : (k-1)d/2, channel width k/2 • the minimum latency occurs at a smaller d (larger k) than the point where D = L/W**Latency with Equal Bisection Width**Distance dominatesMessage length dominates Low-dimensional networksare even more attractive than this comparison indicates - It is not feasible to scale Bw linearly with N Bw = Nr: 0.5 £r£ 0.66 is more reasonable. - This comparison assumes that all channels are the same speed. Lower dimensional networks have shorter wires. - Low-dimensional networks better exploit locality. - Router complexity is proportional to the number of dimensions. T 256 nodes 160 140 120 100 80 60 40 20 16K nodes 1M nodes 0 5 10 15 20 d L=150, Bw =N**Larger Routing Delay (w/ equal pin)**• Dally’s conclusions strongly influenced by assumption of small routing delay D=20 D=2**Latency under Contention**• Optimal packet size? Channel utilization? 1000 nodes distance m4-- Message size: 4 phits Equal channel width N=1024(d2,k32), 1000(d3,k10)**Saturation**• Fatter links shorten queuing delays**350**300 D=2 250 200 Latency 150 100 50 n8, d3, k10 n8, d2, k32 0 0 0.05 0.1 0.15 0.2 0.25 Phits per cycle per processor Phits per cycle • higher degree network has larger available bandwidth • cost?**Discussion**• Rich set of topological alternatives with deep relationships • Design point depends heavily on cost model • nodes, pins, area, ... • Wire length or wire delay metrics favor small dimension • Long (pipelined) links increase optimal dimension • Need a consistent framework and analysis to separate opinion from design • Optimal point changes with technology**Outline**• Routing • Switch Design • Flow Control • Case Studies**Routing**• Recall: routing algorithm determines • which of the possible paths are used as routes • how the route is determined • R: N x N -> C, which at each switch maps the destination node nd to the next channel on the route • Issues: • Routing mechanism • arithmetic • source-based port select • table driven • general computation • Properties of the routes • Deadlock free**Routing Mechanism**• need to select output port for each input packet • in a few cycles • Simple arithmetic in regular topologies • ex: Dx, Dy routing in a grid • west (-x) Dx < 0 • east (+x) Dx > 0 • south (-y) Dx = 0, Dy < 0 • north (+y) Dx = 0, Dy > 0 • processor Dx = 0, Dy = 0 • Reduce relative address of each dimension in order • Dimension-order routing in k-ary d-cubes • e-cube routing in n-cube**Routing Mechanism (cont)**• Source-based • message header carries series of port selects • used and stripped en route • CRC? Packet Format? • CS-2, Myrinet, MIT Artic • Table-driven • message header carried index for next port at next switch o = R[i] • table also gives index for following hop o, i’ = R[i ] • ATM, HPPI P3 P2 P1 P0**Properties of Routing Algorithms**• Deterministic • route determined by (source, dest), not intermediate state (i.e. traffic) • Adaptive • route influenced by traffic along the way • Minimal • only selects shortest paths • Non-minimal • Deadlock free • no traffic pattern can lead to a situation where no packets move forward**Deadlock Freedom**• How can it arise? • necessary conditions: • shared resource • cyclic dependency • incrementally allocated • non-preemptible • think of a channel as a shared resource that is acquired incrementally • source buffer then dest. buffer • channels along a route • How do you avoid it? • constrain how channel resources are allocated • ex: dimension order • How do you prove that a routing algorithm is deadlock free?**Proof Technique**• resources are logically associated with channels • messages introduce dependences between resources as they move forward • need to articulate the possible dependences that can arise between channels • show that there are no cycles in Channel Dependence Graph • find a numbering of channel resources such that every legal route follows a monotonic sequence => no traffic pattern can lead to deadlock • network need not be acyclic, on channel dependence graph**Example: k-ary 2D array**• Thm: x,y routing is deadlock free • Numbering • +x channel (i,y) -> (i+1,y) gets i+1 • similarly for -x with 0 as most positive edge • +y channel (x,j) -> (x,j+1) gets N+j+1 • similary for -y channels • any routing sequence: x direction, turn, y direction is increasing**More examples:**• Why is the obvious routing on X deadlock free? • butterfly? • tree? • fat tree? • Any assumptions about routing mechanism? amount of buffering? • What about wormhole routing on a ring? 2 1 0 3 7 4 6 5**Deadlock free wormhole networks?**• Basic dimension order routing techniques don’t work for k-ary d-cubes • only for k-ary d-arrays (bi-directional) • Idea: add channels! • provide multiple “virtual channels” to break the dependence cycle • good for BW too! • Do not need to add links, or xbar, only buffer resources • This adds nodes the the CDG, remove edges?**Up*-Down* routing**• Given any bidirectional network • Construct a spanning tree • Number of the nodes increasing from leaves to roots • UP increase node numbers • Any Source -> Dest by UP*-DOWN* route • up edges, single turn, down edges • Performance? • Some numberings and routes much better than others • interacts with topology in strange ways**Turn Restrictions in X,Y**• XY routing forbids 4 of 8 turns and leaves no room for adaptive routing • Can you allow more turns and still be deadlock free +Y -X +X -Y**Minimal turn restrictions in 2D**+y -x +x north-last -y negative first**Example legal west-first routes**• Can route around failures or congestion • Can combine turn restrictions with virtual channels**Adaptive Routing**• Essential for fault tolerance • at least multipath • Can improve utilization of the network • Simple deterministic algorithms easily run into bad permutations • fully/partially adaptive, minimal/non-minimal • can introduce complexity or anomalies • little adaptation goes a long way!

More Related