
Destination-Based Adaptive Routing for 2D Mesh Networks ANCS 2010

Rohit Sunkam Ramanujam, Bill Lin. Electrical and Computer Engineering, University of California, San Diego. Networks-on-Chip: chip-multiprocessors (CMPs) are increasingly popular, and 2D-mesh networks are often used as the on-chip fabric.


Presentation Transcript


  1. Destination-Based Adaptive Routing for 2D Mesh Networks • ANCS 2010 • Rohit Sunkam Ramanujam, Bill Lin • Electrical and Computer Engineering, University of California, San Diego

  2. Networks-on-Chip • Chip-multiprocessors (CMPs) increasingly popular • 2D-mesh networks often used as on-chip fabric • Routing algorithm central in determining performance • [Figures: Intel 48-core data center on die (ISSCC 2010); Tilera Tile64]

  3. Classes of Routing Algorithms • Oblivious routing • Simple and fast router designs • Poor load balancing under bursty traffic • Adaptive routing • Better performance (throughput, latency) • Better fault tolerance • Higher router complexity

  4. Related Work • Oblivious Routing [Valiant, ROMM, O1TURN, Optimal oblivious routing] • Optimize for worst and average-case performance • Adaptive routing commercially used in multiprocessors from IBM, Cray, Compaq • On-chip routing very different from off-chip: • Lower power • Lower area • Lower router complexity

  5. Outline • Introduction • Motivation • Destination-Based Adaptive Routing (DAR) • Evaluation

  6. Outline • Introduction • Motivation • Destination-Based Adaptive Routing (DAR) • Distributed delay measurement • Split ratio adaptation • Scaling • Evaluation

  7. Distributed Delay Measurement • A node maintains: • Per-destination traffic split ratio through candidate output ports: W[p][j] • Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p]

  8. Distributed Delay Measurement • Every node estimates average delay to all other nodes in the network • Delay from node 10 to itself: Avg10[10] = l10[Ej] • Avg10[10] propagated to neighbors • Nodes 6, 9, 14, 11 add local delay to Avg10[10] to compute delay to node 10 • For example, at node 9: L[E][10] = l[E] + Avg10[10], Avg9[10] = L[E][10] • [Figure: 4x4 mesh with nodes numbered 0-15 row by row from the bottom; node 10 sends Avg10[10] to its four neighbors]
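
The propagation described on this slide can be sketched for the whole 4x4 mesh. This is a simplified illustration, not the authors' implementation: it assumes minimal (shortest-path) candidate ports, a uniform link delay, an equal traffic split between candidate ports, and it computes the estimates by recursion rather than by periodic neighbor-to-neighbor updates. The names `candidate_ports` and `estimate_delays` are hypothetical.

```python
from functools import lru_cache

K = 4  # mesh is K x K; node id = K * row + col, with row 0 at the bottom


def candidate_ports(node, dest):
    """Minimal-path output ports from node toward dest, mapped to next-hop ids."""
    r, c = divmod(node, K)
    dr, dc = divmod(dest, K)
    ports = {}
    if dc > c:
        ports["E"] = node + 1
    if dc < c:
        ports["W"] = node - 1
    if dr > r:
        ports["N"] = node + K
    if dr < r:
        ports["S"] = node - K
    return ports


def estimate_delays(dest, link_delay=1.0, eject_delay=1.0):
    """Return Avg_n[dest] for every node n (assumed uniform delays, equal split)."""

    @lru_cache(maxsize=None)
    def avg(node):
        if node == dest:
            return eject_delay  # Avg_dest[dest] = l[Ej]
        ports = candidate_ports(node, dest)
        w = 1.0 / len(ports)  # assumed equal split ratio W[p][dest]
        # L[p][dest] = l[p] + Avg_next[dest]; Avg = sum of W * L over ports
        return sum(w * (link_delay + avg(nxt)) for nxt in ports.values())

    return {n: avg(n) for n in range(K * K)}


avg_to_10 = estimate_delays(10)
print(avg_to_10[9], avg_to_10[5], avg_to_10[0])
```

With uniform delays, each estimate is simply the hop count plus the ejection delay; in the scheme on the slides the per-port delays are measured and the split ratios are adapted, so the values differ per port.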

  9. Distributed Delay Measurement • Every node estimates delay to all other nodes in the network • Nodes 6, 9, 14, 11 propagate estimated delay to node 10 to upstream neighbors • For example, node 5 receives two delay updates, from nodes 9 and 6: A[E][10] = Avg6[10], A[N][10] = Avg9[10] • Node 5 adds local link delay to received delay update: L[E][10] = A[E][10] + l[E], L[N][10] = A[N][10] + l[N] • Finally, average delay from node 5 to node 10 is computed as: Avg5[10] = W[E][10]L[E][10] + W[N][10]L[N][10] • [Figure: 4x4 mesh, nodes 0-15; nodes 6, 9, 11, 14 send Avg6[10], Avg9[10], Avg11[10], Avg14[10] to their upstream neighbors]

  10. Distributed Delay Measurement • Every node estimates delay to all other nodes in the network • Nodes 6, 9, 14, 11 propagate estimated delay to node 10 to upstream neighbors • For example, node 5 receives two delay updates, from nodes 9 and 6: A[E][10] = Avg6[10], A[N][10] = Avg9[10] • Node 5 adds local link delay to received delay update: L[E][10] = A[E][10] + l[E], L[N][10] = A[N][10] + l[N] • Finally, average delay from node 5 to node 10 is computed as: Avg5[10] = W[E][10]L[E][10] + W[N][10]L[N][10] • [Figure: 4x4 mesh, nodes 0-15]
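
As a worked instance of the node 5 computation, the sketch below plugs in numbers. All delay and split-ratio values are made up for illustration, and the helper names `path_delays` and `average_delay` are hypothetical; only the two formulas come from the slide.

```python
def path_delays(local_delay, neighbor_avg):
    """L[p][j] = A[p][j] + l[p] for each candidate output port p."""
    return {p: neighbor_avg[p] + local_delay[p] for p in neighbor_avg}


def average_delay(split_ratio, path_delay):
    """Avg[j] = sum over candidate ports p of W[p][j] * L[p][j]."""
    return sum(split_ratio[p] * path_delay[p] for p in path_delay)


# Node 5 routing to node 10: candidate ports are E (via node 6) and N (via node 9).
l = {"E": 2.0, "N": 4.0}    # local delays l[p] (illustrative values)
A = {"E": 6.0, "N": 5.0}    # received updates Avg6[10] and Avg9[10] (illustrative)
W = {"E": 0.75, "N": 0.25}  # split ratios W[p][10] (illustrative)

L = path_delays(l, A)              # L[E][10] = 8.0, L[N][10] = 9.0
avg_5_to_10 = average_delay(W, L)  # 0.75 * 8.0 + 0.25 * 9.0 = 8.25
print(L, avg_5_to_10)
```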

  11. Outline • Introduction • Motivation • Destination-Based Adaptive Routing (DAR) • Distributed delay measurement • Split ratio adaptation • Scaling • Evaluation

  12. Adaptation of Split Ratio • Objective: Equalize delay on candidate output ports • If only one candidate output, split ratio is 1 • If two candidate outputs: • Let ph be the port with higher delay to destination j • Let pl be the port with lower delay to destination j • W[ph][j] + W[pl][j] = 1 • Δ traffic shifted from ph to pl every T cycles • Δ proportional to (L[ph][j] - L[pl][j]) / L[ph][j]
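
One adaptation step for a single destination might look like the sketch below. The slide only states that Δ is proportional to the relative delay difference, so the proportionality constant `gain` and the clamping that keeps the ratios in [0, 1] are assumptions, and `adapt_split_ratio` is a hypothetical name.

```python
def adapt_split_ratio(W, L, gain=0.25):
    """Shift traffic from the higher-delay port to the lower-delay port.

    W: dict port -> split ratio for one destination (ratios sum to 1)
    L: dict port -> measured delay to that destination through the port
    gain: assumed proportionality constant (not specified on the slide)
    """
    if len(W) == 1:
        return {p: 1.0 for p in W}  # single candidate output: ratio is 1
    ph = max(L, key=L.get)  # port with higher delay
    pl = min(L, key=L.get)  # port with lower delay
    if ph == pl or L[ph] <= 0:
        return dict(W)  # delays already equal: nothing to shift
    delta = gain * (L[ph] - L[pl]) / L[ph]
    delta = min(delta, W[ph])  # assumed clamp: cannot shift more than ph carries
    new_W = dict(W)
    new_W[ph] = W[ph] - delta
    new_W[pl] = 1.0 - new_W[ph]  # keep W[ph] + W[pl] = 1
    return new_W


# Called once every T cycles per destination; one step shown here.
step = adapt_split_ratio({"E": 0.5, "N": 0.5}, {"E": 8.0, "N": 4.0})
print(step)  # E loses 0.25 * (8 - 4) / 8 = 0.125 of the traffic
```

Normalizing by L[ph] keeps Δ between 0 and `gain`, so the size of each step is bounded regardless of the absolute delay values.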

  13. Outline • Introduction • Motivation • Destination-Based Adaptive Routing (DAR) • Distributed delay measurement • Split ratio adaptation • Scaling • Evaluation

  14. Look-ahead Window • Node S maintains delay estimate for MxM window centered at S • Any node outside window mapped to closest node within window • A packet's look-ahead window shifts as it is routed from source to destination • [Figure: mesh annotated with per-node values, showing the look-ahead window centered at S]
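
The mapping of an out-of-window destination to the closest node inside the window can be sketched as a per-coordinate clamp. This assumes nodes are addressed by (x, y) coordinates and that M is odd so the window is centered on the current node; `map_to_window` is a hypothetical name, and the slides do not say how the mapping is implemented in hardware.

```python
def map_to_window(cur, dst, M):
    """Map destination dst to the closest node in the MxM window centered at cur.

    cur, dst: (x, y) mesh coordinates; M: odd window width (assumption).
    """
    r = (M - 1) // 2  # window reaches r hops from cur in each direction

    def clamp(value, center):
        return max(center - r, min(center + r, value))

    return (clamp(dst[0], cur[0]), clamp(dst[1], cur[1]))


# 7x7 window centered at (8, 8) covers x and y in 5..11.
print(map_to_window((8, 8), (15, 2), 7))   # outside the window -> (11, 5)
print(map_to_window((8, 8), (9, 10), 7))   # inside the window -> unchanged
```

Because the clamped coordinate always lies between the current node and the real destination, the mapped node is a valid mesh node and is on a minimal path toward the destination.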

  15. Window Size • Destination D guaranteed to be within window when packet is (M-1)/2 hops away from D • Intuition: Packet has (M-1)/2 hops to route around congestion hot spots • 7x7 look-ahead window in 16x16 mesh has comparable performance to DAR (equivalent to 31x31 look-ahead window)
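
The numbers on this slide follow from simple arithmetic, sketched below. The function names are hypothetical, and the entry counts are only an illustration of the window idea (one delay estimate per window position versus one per mesh node), not storage figures taken from the slides.

```python
def lookahead_hops(M):
    """Hops from D at which D is guaranteed to be inside an MxM window."""
    return (M - 1) // 2


def full_coverage_window(n):
    """Smallest odd window that covers an n x n mesh from any node.

    A corner node must reach n - 1 hops in each dimension: M = 2(n - 1) + 1.
    """
    return 2 * n - 1


print(lookahead_hops(7))          # 3 hops to route around hot spots
print(full_coverage_window(16))   # 31, the 31x31 window the slide equates with DAR
print(7 * 7, 16 * 16)             # 49 window positions vs 256 mesh nodes
```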

  16. Outline • Introduction • Related work • Destination-Based Adaptive Routing (DAR) • Evaluation

  17. Experimental setup • Compare DAR with RCA-1D, RCA-quadrant, Local adaptive • SPLASH-2 benchmarks + synthetic traffic patterns (uniform, transpose, shuffle) • Cycle-accurate NoC simulator models 3-stage router pipeline • 8 VC, 5 flit deep • 1 VC used as escape VC for deadlock prevention

  18. Splash results – 7x7 mesh 41%

  19. Splash results – 7x7 mesh 65%

  20. Uniform traffic – 8x8 mesh

  21. Transpose traffic – 8x8 mesh

  22. Shuffle traffic – 8x8 mesh

  23. SDAR - 16x16 mesh, 7x7 window • [Figure: Average latency over 100 permutation traffic patterns at 18% injection load] • [Figure: Network saturation statistics at 18% injection load]

  24. Summary • Destination-based Adaptive Routing (DAR) for 2D mesh networks • Scalable DAR (SDAR) uses a look-ahead window and easily scales to large networks • DAR outperforms existing adaptive and oblivious routing • SDAR achieves comparable performance with significantly lower overhead

  25. Thank you!!

  26. Key implementation details • Simple router implementation: low storage, low bandwidth • Synchronize delay updates to reuse delay computation and weight adaptation hardware • Approximate computations to simplify implementation

  27. Router architecture – Kim et al., DAC '05 • [Figure: router block diagram with input virtual channels VC-1 … VC-v; ports N, S, E, W, Ej; Routing Unit; Port Pre-select with Quadrant and Override inputs; Preferred Output Registers; Congestion Value Registers; VC Allocator; XB Allocator; Credits]

  28. DAR Router • [Figure: router block diagram with storage overhead and logic overhead marked. Storage: Avg[0] … Avg[N-1]; per-destination path delays L[px][0..N-1] and L[py][0..N-1]; per-destination split ratios W[px, py][0..N-1]; preferred output registers p[0] … p[N-1]; exponentially averaged local delay l[0] … l[P-1] with counters cnt[0] … cnt[P-1]; received updates A[px][0..N-1] and A[py][0..N-1]. Logic: Latency Propagation, Adapt Weights, Latency measurement (Increment/Decrement), Port Pre-select, VC Allocator, XB Allocator. Ports N, S, E, W, Ej; input virtual channels VC-1 … VC-v]

  29. Distributed delay measurement • A node maintains: • Per-destination traffic split ratio through candidate output ports: W[p][j] • Delay to next-hop router/ejection interface through each output port (N, S, E, W, Ej): l[p] • Using updates received from downstream nodes, a node computes: • L[p][j]: Average delay from current node to node j through output port p • Avg[j]: Average delay from current node to node j
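
The DAR router diagram labels l[p] as an exponentially averaged local delay. A minimal sketch of such an update is shown below; the smoothing factor `alpha` and the function name `ewma_update` are assumptions, since the slides do not give the averaging constant or the exact update rule. A power-of-two `alpha` is chosen here only because it would reduce to a shift in hardware.

```python
def ewma_update(current, sample, alpha=0.125):
    """Exponentially weighted moving average of the local delay l[p].

    current: previous estimate of l[p]
    sample: newly measured delay through port p
    alpha: assumed smoothing factor (1/8 here)
    """
    return current + alpha * (sample - current)


l_E = 8.0                      # previous estimate for port E (illustrative)
l_E = ewma_update(l_E, 16.0)   # new sample of 16 cycles pulls it to 9.0
print(l_E)
```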

  30. Destination-based Adaptive Routing (DAR) • Every router maintains per-destination split ratios which control traffic distribution to output ports • Split ratios adjusted every T cycles based on measured delay to D through the two ports • [Figure: paths from source S to destination D with links marked as low, moderate, or high congestion and labeled with split ratios 1, 1, 0.2, 0.8, 0.7, 0.3]
