1 / 21

Advanced Computer Architecture 5MD00 / 5Z032 Multi-Processing 2

Advanced Computer Architecture 5MD00 / 5Z032 Multi-Processing 2. Henk Corporaal www.ics.ele.tue.nl/~heco/courses/aca h.corporaal@tue.nl TUEindhoven 2012. Multi-Processor design decision . We have already discussed: Shared memory versus Message passing

acarrillo
Download Presentation

Advanced Computer Architecture 5MD00 / 5Z032 Multi-Processing 2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Advanced Computer Architecture5MD00 / 5Z032Multi-Processing 2 Henk Corporaal www.ics.ele.tue.nl/~heco/courses/aca h.corporaal@tue.nl TUEindhoven 2012

  2. Multi-Processor design decision We have already discussed: • Shared memory versus Message passing • Coherence, Consistency and Synchronization issues Other extremely important decisions: • Processing units: • Homogeneous versus Heterogeneous? • Generic versus Application specific ? • Interconnect: • Bus versus Network ? • Type (topology) of network • Focus on Performance, Power or Cost ? • Memory organization ? ACA H.Corporaal

  3. Homogeneous or Heterogeneous Global Network-on-Chip Local inter-node connections • Homogenous: • replication effect • easy to desing • fault tolerance can be built-in • process migration possible • solve realization issuesonce and for all (i.e. highly tuning of nodes and network) • less flexible nodes • Reasoning: future chips are memory dominated any way, so adding some logic to make all nodes equal does not matter ACA H.Corporaal

  4. Homogeneous or Heterogeneous • Heterogeneous • better fit to application domain • smaller increments • more costly design • no replication advantage ACA H.Corporaal

  5. Homogeneous or Heterogeneous • Middle of the road approach • Flexible tiles • Fixed tile structure at top level ACA H.Corporaal

  6. Example: NoC with 2x4 mesh routing network node node node node R R R R node node node node R R R R Bus (shared) or Network (switched) • Network: • claimed to be more scalable • no bus arbitration • point-to-point connections • but router overhead ACA H.Corporaal

  7. Historical Perspective • Early machines were: • Collection of microprocessors • Communication was performed using bi-directional queues between nearest neighbors • Messages were forwarded by processors on path • “Store and forward” networking • There was a strong emphasis on topology in algorithms, in order to minimize the number of hops => minimize time ACA H.Corporaal

  8. Design Characteristics of a Network • Topology (how things are connected): • Crossbar, ring, 2-D and 3-D meshes or torus, hypercube, tree, butterfly, perfect shuffle, .... • Routing algorithm (path used): • Example in 2D torus: all east-west then all north-south (avoids deadlock) • Switching strategy: • Circuit switching: full path reserved for entire message, like the telephone. • Packet switching: message broken into separately-routed packets, like the post office. • Flow control and buffering (what if there is congestion): • Stall, store data temporarily in buffers • re-route data to other nodes • tell source node to temporarily halt, discard, etc. • QoS guarantees • Error handling • etc, etc. ACA H.Corporaal

  9. Switch / Network Topology Topology determines: • Degree: number of links from a node • Diameter: max number of links crossed between nodes • Average distance: number of links to random destination • Bisection: minimum number of links that separate the network into two halves • Bisection bandwidth = link bandwidth * bisection ACA H.Corporaal

  10. Bisection Bandwidth • Bisection bandwidth: bandwidth across smallest cut that divides network into two equal halves • Bandwidth across “narrowest” part of the network not a bisection cut bisection cut bisection bw= link bw bisection bw = sqrt(n) * link bw • Bisection bandwidth is important for algorithms in which all processors need to communicate with all others ACA H.Corporaal

  11. Common Topologies Type Degree Diameter Ave Dist Bisection 1D mesh 2 N-1 N/3 1 2D mesh 4 2(N1/2 - 1) 2N1/2 / 3 N1/2 3D mesh 6 3(N1/3 - 1) 3N1/3 / 3 N2/3 nD mesh 2n n(N1/n - 1) nN1/n / 3 N(n-1) / n Ring 2 N/2 N/4 2 2D torus 4 N1/2 N1/2 / 2 2N1/2 Hypercube Log2N n=Log2N n/2 N/2 2D Tree 3 2Log2N ~2Log2 N 1 Crossbar N-1 1 1 N2/2 N = number of nodes, n = dimension ACA H.Corporaal

  12. Linear and Ring Topologies • Linear array • Diameter = n-1; average distance ~n/3 • Bisection bandwidth = 1 (in units of link bandwidth) • Torus or Ring • Diameter = n/2; average distance ~ n/4 • Bisection bandwidth = 2 • Natural for algorithms that work with 1D arrays ACA H.Corporaal

  13. Two dimensional mesh Diameter = 2 * (sqrt( n ) – 1) Bisection bandwidth = sqrt(n) Two dimensional torus Diameter = sqrt(n ) Bisection bandwidth = 2* sqrt(n) Meshes and Tori • Generalizes to higher dimensions • Natural for algorithms that work with 2D and/or 3D arrays ACA H.Corporaal

  14. 110 111 010 011 100 101 000 001 Hypercubes • Number of nodes n = 2d for dimension d • Diameter = d • Bisection bandwidth = n/2 0d 1d 2d 3d 4d • Popular in early machines (Intel iPSC, NCUBE, CM) • Lots of clever algorithms • Extension: k-ary n-cubes • Greycode addressing: • Each node connected to others with 1 bit different ACA H.Corporaal

  15. Trees • Diameter = log n. • Bisection bandwidth = 1 • Easy layout as planar graph • Many tree algorithms (e.g., summation) • Fat trees avoid bisection bandwidth problem: • More (or wider) links near top • Example: Thinking Machines CM-5 ACA H.Corporaal

  16. Fat Tree example • A multistage fat tree (CM-5) avoids congestion at the root node • Randomly assign packets to different paths on way up to spread the load • Increase degree near root, decrease congestion ACA H.Corporaal

  17. O 1 O 1 O 1 O 1 Butterflies with n = (k-1)2k switches • Connecting 2k processors, with Bisection bandwidth = 2*2k • Cost: lots of wires • 2log(k) hop-distance for all connections, however blocking possible • Used in BBN Butterfly • Natural for FFT Switch PE Multistage butterfly network: k=3 Butterfly switch ACA H.Corporaal

  18. Topologies in Real Machines older newer Many of these are approximations: E.g., the X1 is really a “quad bristled hypercube” and some of the fat trees are not as fat as they should be at the top ACA H.Corporaal

  19. More examples Hypercube 2D-Grid/Mesh 2D-Torus Assume 64 nodes: ACA H.Corporaal

  20. QoS: Quality-of-Service • Hard and Soft Real-time applications require QoS guarantees • Predicatable delays • Guaranteed throughput Issues: • Different inter processor traffic service types: • GT: guaranteed throughput / latency traffic • BE: best effort • Resource manager • interface between applications and platform resources (processing elements, network, memory, i/o) • Do we allow caches • software controlled ACA H.Corporaal

  21. Generic or Specialized? Computational Efficiency 1 pJ/op ACA H.Corporaal

More Related