Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks

John Kim, William J. Dally & Dennis Abts. Presented by: Evan Su.

Presentation Transcript


  1. Flattened Butterfly: A Cost-Efficient Topology for High-Radix Networks • John Kim, William J. Dally & Dennis Abts • Presented by: Evan Su

  2. Outline • Basic metrics • Basic topologies • Why high-radix • Router microarchitecture • High-radix topologies

  3. Topology Overview • Interconnection networks are used to connect processors and memories in multiprocessors, as switching fabrics for high-end routers and switches, and for connecting I/O devices. • Definition: the topology determines the arrangement of channels and nodes in the network (like a road map) • Choosing it is often the first step in network design

  4. Performance Metrics • Average Hop Count • Average Latency • Throughput • Bisection Bandwidth

  5. Hop Count • The number of links traversed between source and destination

  6. Latency • Defined as the time it takes for a packet to traverse the network • Latency = header latency + serialization latency • Header latency: time until the head of the packet arrives at the destination's input port • Serialization latency: time for the rest of the packet to catch up with the head
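
The two latency terms can be sketched in a few lines of Python. This is a minimal sketch, not taken from the slides: it assumes header latency is hop count times per-router delay (wire time of flight is ignored), and the function name and example numbers are hypothetical.

```python
def packet_latency(hops, router_delay, packet_length, channel_bandwidth):
    """Zero-load latency = header latency + serialization latency.

    Assumption: header latency is modeled as hops * router_delay.
    """
    header_latency = hops * router_delay
    serialization_latency = packet_length / channel_bandwidth
    return header_latency + serialization_latency


# Hypothetical numbers: 3 hops, 20 ns per router, 128-bit packet, 16 bits/ns channel.
example = packet_latency(hops=3, router_delay=20.0, packet_length=128, channel_bandwidth=16.0)
print(example)  # 68.0 (60 ns header + 8 ns serialization)
```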

  7. Throughput • Data rate (bits/sec) that the network accepts per input port • Offered load: traffic presented to the network, expressed as a percentage of network capacity

  8. Bisection Bandwidth • Split the N nodes into two groups of N/2 nodes such that the bandwidth between the two groups is minimum • Why it is relevant: if traffic is completely random, the probability of a message crossing between the two halves is ½, so the bisection bandwidth tells how much traffic the network can support (½ of the total traffic must cross the bisection)
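
The ½ argument can be turned into a rough throughput bound. The symbols here are assumptions not used on the slide: λ is the injection rate per node and B_B is the bisection bandwidth counted over both directions of the cut. Under uniform random traffic, half of the total injected traffic Nλ must cross the bisection:

```latex
\frac{N\lambda}{2} \le B_B
\quad\Longrightarrow\quad
\lambda \le \frac{2\,B_B}{N}
```

This is only an upper bound for uniform random traffic; other traffic patterns can load the bisection more or less heavily.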

  9. Topology Examples • Hypercube • Grid • Torus

  10. Why High Radix? • Radix: number of inputs/outputs for each router • For the past 20 years, networks used low-radix k-ary n-cubes (torus) • Routers didn’t have enough bandwidth to support high radix • Network routers have a growth curve that obeys Moore’s law • Bandwidth has increased • Packet length has stayed the same • Latency has gone down

  11. Why High Radix? • Approximately an order of magnitude increase in bandwidth every 5 years • Bandwidth growth is the result of: • Increase in signaling rate • Increase in number of signals

  12. High-Radix Routers

  13. High-Radix vs. Low-Radix • Cost • Power dissipation • Latency

  14. Cost • Increasing radix of routers monotonically reduces overall cost • Network cost proportional to total router bandwidth • Router pins • Connectors • For fixed bisection bandwidth, cost proportional to hop count • High-radix => lower hop count

  15. Cost

  16. Power • Power dissipated decreases with increasing radix • Power is proportional to the number of router nodes • As radix increases, hop count decreases and the number of router nodes decreases as well • Independent of individual router node • Router power is due to I/O circuits and switch bandwidth • Arbitration logic is more complex with higher radix but is a negligible fraction of total power

  17. Latency • H = number of hops • t_r = delay in router • L = length of packets • b = channel bandwidth • B = total router bandwidth • k = radix • The bandwidth B is divided among 2k input and output channels, so b = B/(2k)
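
The equation on this slide did not survive in the transcript. A sketch of the latency expression these definitions suggest is given below. It assumes the worst-case hop count H = 2 log_k N (the Clos hop count quoted on slide 34), with N the number of nodes; that choice of H is an assumption, not something this slide states.

```latex
T = H\,t_r + \frac{L}{b},
\qquad b = \frac{B}{2k}
\quad\Longrightarrow\quad
T = 2\,t_r \log_k N + \frac{2kL}{B}
```

The first term falls as k grows (fewer hops), while the second rises (each channel gets a smaller share of B), which is why an optimal radix exists.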

  18. Aspect Ratio • Differentiate T with respect to k and set dT/dk equal to zero • The expression on the right-hand side of the resulting equation (the aspect ratio) determines the router radix that minimizes network latency
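
A numerical sketch of this step follows. It assumes the latency model T(k) = 2·t_r·log_k N + 2kL/B, which is not spelled out in the transcript. Setting dT/dk = 0 under that model gives k·ln²(k) = B·t_r·ln(N)/L, where the right-hand side is the aspect ratio. The function names and parameter values are hypothetical.

```python
import math


def latency(k, N, t_r, L, B):
    """Assumed model: T(k) = 2*t_r*log_k(N) + 2*k*L/B."""
    return 2.0 * t_r * math.log(N) / math.log(k) + 2.0 * k * L / B


def optimal_radix(N, t_r, L, B, lo=2.0, hi=1e6):
    """Solve k * ln(k)**2 = B*t_r*ln(N)/L by bisection.

    The left side is increasing for k > 1, so bisection on [lo, hi] is safe
    as long as the root lies inside that interval.
    """
    aspect_ratio = B * t_r * math.log(N) / L
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid * math.log(mid) ** 2 < aspect_ratio:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters: 1024 nodes, 20 ns router delay, 128-bit packets, 1 Tb/s router.
k_opt = optimal_radix(N=1024, t_r=20e-9, L=128, B=1e12)
print(round(k_opt, 1), latency(k_opt, 1024, 20e-9, 128, 1e12))
```

In practice the radix would be rounded to a nearby integer (usually a power of two); the continuous solution only indicates where the minimum lies.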

  19. Optimal Latency

  20. Optimal Latency

  21. Optimal Latency

  22. Router Microarchitecture (VC) • Route computation (RC) – based on info stored in the header, select the output port • Virtual-channel allocation (VA) – the packet must gain exclusive access to a virtual channel of the output port • Switch allocation (SA) – if there is a free buffer in the channel, the flit can vie for access to the crossbar • Switch traversal (ST) – transfers the flit from input to output buffers

  23. Router Microarchitecture (VC)

  24. Microarchitecture for High-radix • Routing computation – linear function of bandwidth • VC allocator – quadratic function of input/output ports because it takes bids from all ports • Switch allocator – quadratic function of ports

  25. Baseline Performance • Performance is limited by head-of-line (HoL) blocking • Previously, the switch could be overprovisioned because its cost was low

  26. Fully buffered crossbar • Separates the queuing at each crosspoint • Previously, a flit had to compete for both the input and the output of the switch • Crosspoint buffers decouple the two allocations, so flits always make forward progress

  27. Fully buffered crossbar • Trade performance for cost • Crosspoint buffering dominates chip area (quadratic)

  28. Hierarchical crossbar • Using subswitches, area grows as O(vk²/p) • Decouples allocation, reduces HoL blocking • v = number of virtual channels, k = radix, p = subswitch size (ports per subswitch)

  29. Hierarchical crossbar • (Figures: performance under uniform random traffic and under worst-case traffic)

  30. Okay! Back to Topologies • Butterfly • Clos • Flattened Butterfly

  31. Butterfly Network • k-ary n-fly: k^n network nodes • Example: 2-ary 3-fly • Routing from 000 to 010 • The destination address is used to directly route the packet • Digit n of the destination address selects the output port at stage n • (Figure: 2-ary 3-fly with terminals 0 to 7 and switches 00 to 23)
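
Destination-based routing in a butterfly can be sketched as follows. This is a minimal illustration rather than code from the presentation: the function name is hypothetical, and it assumes the most significant digit of the destination is consumed at the first stage.

```python
def destination_tag_route(dest, k, n):
    """Return the output port chosen at each of the n stages of a k-ary n-fly.

    Assumption: the most significant base-k digit of `dest` is used first.
    The source address plays no role in the port selection.
    """
    digits = []
    for _ in range(n):
        digits.append(dest % k)  # extract least significant digit
        dest //= k
    digits.reverse()             # most significant digit first
    return digits


# The slide's example: routing to destination 010 in a 2-ary 3-fly.
print(destination_tag_route(0b010, k=2, n=3))  # [0, 1, 0]
```

Because the ports depend only on the destination, every source/destination pair has exactly one path, which is the lack of path diversity noted on the next slide.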

  32. Butterfly Network • Pros • Low hop count: H = log_k N • Cons • Deterministic routing / no path diversity • Doesn’t exploit traffic locality

  33. Clos Network • (Figure: three-stage Clos network built from n×m input switches, r×r switches, and m×n output switches)

  34. Clos Network • Butterfly folded back on itself • Pros • Path diversity (good performance on both benign and adversarial traffic) • Cons • Double the cost of a butterfly • H = 2 log_k N
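
The two hop-count formulas (H = log_k N for the butterfly, H = 2 log_k N for the Clos) can be compared for different radices. This is a small sketch with a hypothetical helper; it assumes the logarithm is rounded up to a whole number of stages when N is not an exact power of k, and uses integer arithmetic to avoid floating-point rounding.

```python
def stages(num_nodes, radix):
    """Smallest n with radix**n >= num_nodes, i.e. ceil(log_k N)."""
    n, reach = 0, 1
    while reach < num_nodes:
        reach *= radix
        n += 1
    return n


# Radix, butterfly hops, Clos hops for a 1024-node network.
for radix in (2, 8, 32):
    n = stages(1024, radix)
    print(radix, n, 2 * n)
# 2 10 20
# 8 4 8
# 32 2 4
```

The drop from 10 to 2 stages between radix 2 and radix 32 is the hop-count reduction that the cost argument for high radix relies on.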

  35. Folded Clos Network (Fat Tree) • Similar to Clos • Exploits locality

  36. Flattened Butterfly Network • Routers in each row are combined • (Figure: flattening of a 4-ary 2-fly and of a 2-ary 4-fly)

  37. Flattened Butterfly Network • Routers in each row are combined

  38. Flattened Butterfly Network • On benign traffic • Approaches performance/cost of Butterfly • ½ cost of Clos network • Eliminates redundant hops when no need for load balancing • On adversarial traffic • Matches cost/performance of folded Clos • Order of magnitude better performance than Butterfly • Use non-minimal global-adaptive routing

  39. Routing • In the figure, there are two minimal routes between node 0 (0000₂) and node 10 (1010₂). • In general, if two nodes a and b have addresses that differ in j digits, then there are j! minimal routes between a and b. • This path diversity derives from the fact that a packet routed in a flattened butterfly is able to traverse the dimensions in any order.
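
The j! count can be checked by enumerating the orders in which the differing digits are corrected. The sketch below uses hypothetical function names and assumes each hop corrects exactly one address digit, so a route is fully described by the order of dimensions it traverses.

```python
from itertools import permutations


def to_digits(node, k, n):
    """Address of `node` as n base-k digits, most significant first."""
    return [(node // k ** i) % k for i in reversed(range(n))]


def minimal_routes(src, dst, k, n):
    """All minimal routes from src to dst; each route is a list of digit tuples."""
    a, b = to_digits(src, k, n), to_digits(dst, k, n)
    differing = [i for i in range(n) if a[i] != b[i]]
    routes = []
    for order in permutations(differing):
        current = list(a)
        path = [tuple(current)]
        for dim in order:
            current[dim] = b[dim]  # one hop corrects one digit
            path.append(tuple(current))
        routes.append(path)
    return routes


# The slide's example: node 0 (0000) to node 10 (1010) differ in 2 digits.
for route in minimal_routes(0b0000, 0b1010, k=2, n=4):
    print(route)
```

For the slide's example this prints two routes, one passing through 1000 and the other through 0010.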

  40. Routing Algorithms • (Figures: routing algorithm performance under uniform random traffic and under the worst-case traffic pattern) • VAL = Valiant’s non-minimal oblivious algorithm • MIN = minimal adaptive algorithm • UGAL = non-minimal adaptive algorithm • UGAL-S = UGAL using sequential allocation • CLOS AD = non-minimal adaptive routing in a flattened Clos

  41. Routing Algorithms • Valiant – picks a random middle node b, and routes minimally from s to b and then from b to d. Achieves only ½ of network capacity, regardless of traffic • Minimal Adaptive – chooses a minimal route • Adaptive Clos – minimal routing on benign traffic, folded-Clos routing on adversarial traffic
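
Valiant's two-phase idea can be sketched on digit-tuple addresses. This is an illustrative sketch, not the routing implementation evaluated in the presentation: the function names are hypothetical, the minimal phase corrects digits in a fixed left-to-right order, and the intermediate node is drawn uniformly at random.

```python
import random


def minimal_route(src, dst):
    """Correct differing digits left to right; addresses are tuples of digits."""
    path, current = [src], list(src)
    for dim, digit in enumerate(dst):
        if current[dim] != digit:
            current[dim] = digit
            path.append(tuple(current))
    return path


def valiant_route(src, dst, k, rng=random):
    """Route minimally src -> random intermediate b -> dst."""
    b = tuple(rng.randrange(k) for _ in src)  # random middle node
    first = minimal_route(src, b)
    second = minimal_route(b, dst)
    return first + second[1:]                 # drop duplicated b


rng = random.Random(0)  # seeded only to make the example repeatable
print(valiant_route((0, 0, 0, 0), (1, 0, 1, 0), k=2, rng=rng))
```

The detour through b is taken even when the traffic is benign, which is the reason for the ½-capacity penalty noted in the bullet above.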

  42. Performance Comparison • To compare performance, a network of 1024 nodes is constructed using each of the following topologies while maintaining a constant bisection bandwidth.

  43. Performance Comparison • (Figures: performance under uniform random traffic and under worst-case traffic)

  44. Cost Comparison

  45. Conclusion • Use high-radix routers to take advantage of increased router bandwidth • Flattened Butterfly exploits high-radix routers and global adaptive routing to give cost-effective network • Flattened butterfly has lower hop count than folded Clos and better path diversity than conventional Butterfly • On adversarial traffic, exploits global adaptive routing to match performance of folded Clos with ½ the cost