
DARD: Distributed Adaptive Routing for Datacenter Networks


Presentation Transcript


  1. DARD: Distributed Adaptive Routing for Datacenter Networks Xin Wu, Xiaowei Yang

  2. Multiple equal-cost paths in DCN [diagram: fat-tree with core, Agg, and ToR layers; src and dst sit in different pods] • Scale-out topology -> horizontal expansion -> more paths

  3. Suboptimal scheduling -> hot spot [diagram: flows src1 -> dst1 and src2 -> dst2 sharing a link and creating a hot spot] • Unavoidable intra-datacenter traffic • Common services: DNS, search, storage • Auto-scaling: dynamic application instances

  4. To prevent hot spots • Distributed • ECMP & VL2: flow-level hashing in switches • Centralized • Hedera: compute optimal scheduling in ONE server [design-space chart: Distributed = robust but not efficient; Centralized = efficient but not robust]

  5. Goal: practical, efficient, robust • Practical • Using well-proven technologies • Efficient • Close to optimal traffic scheduling • Robust • No single point of failure [design-space chart: Centralized = efficient but not robust; Distributed = robust but not efficient; DARD aims to be distributed, robust, and efficient]

  6. Contributions • Explore the possibility of distributed yet close-to-optimal flow scheduling in DCNs. • A working implementation on a testbed. • A proven upper bound on convergence.

  7. Intuition: minimize the maximum number of flows via a link [diagram: three flows, src1 -> dst1, src2 -> dst2, src3 -> dst3] Step 0: maximum # of flows via a link = 3

  8. Intuition: minimize the maximum number of flows via a link [diagram: same flows, one moved to a less loaded path] Step 1: maximum # of flows via a link = 2

  9. Intuition: minimize the maximum number of flows via a link [diagram: same flows after another reroute] Step 2: maximum # of flows via a link = 1
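The objective behind slides 7-9 can be made concrete with a small sketch (my own illustration, not the authors' code): treat each flow as the list of links it traverses and minimize the maximum number of flows that share any single link.

    from collections import Counter

    def max_flows_per_link(flow_paths):
        # flow_paths: dict mapping flow_id -> list of link IDs the flow traverses
        link_load = Counter()
        for path in flow_paths.values():
            link_load.update(path)
        return max(link_load.values()) if link_load else 0

    # Toy instance mirroring the slides: moving one flow off the shared link
    # lowers the bottleneck from 3 flows to 2.
    flows = {
        "f1": ["tor1-agg1", "agg1-core1"],
        "f2": ["tor1-agg1", "agg1-core1"],
        "f3": ["tor1-agg1", "agg1-core1"],
    }
    print(max_flows_per_link(flows))           # 3
    flows["f3"] = ["tor1-agg2", "agg2-core3"]  # reroute one flow
    print(max_flows_per_link(flows))           # 2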

  10. Architecture • Control loop runs on every server independently: monitor network states -> compute next scheduling -> change the flow's path
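A minimal sketch of that control loop (function names, data shapes, and the cycle-length handling are placeholders, not DARD's actual code); the random extra delay anticipates slide 16.

    import random
    import time

    CONTROL_CYCLE = 10.0  # seconds; slide 20 reports one cycle of roughly 10 s

    def control_loop(monitor_network_states, compute_next_scheduling, change_flow_path):
        # Runs independently on every server: monitor -> compute -> change path.
        while True:
            states = monitor_network_states()              # per-link flow counts and bandwidth
            for flow_id, new_path in compute_next_scheduling(states):
                change_flow_path(flow_id, new_path)        # switch the flow's src-dst address pair
            # a random extra delay desynchronizes servers and helps prevent oscillation
            time.sleep(CONTROL_CYCLE + random.uniform(0.0, CONTROL_CYCLE))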

  11. Monitor network states • src asks the switches for the #_of_flows and bandwidth of each link to dst. • src assembles the link states to identify the most and least congested paths to dst.
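One way to turn those per-link reports into the "most and least congested path" (a sketch under the assumption that a path's congestion is the flow count on its most loaded link; the slide says DARD also collects bandwidth):

    def rank_paths(paths, link_flow_count):
        # paths: dict path_id -> list of link IDs; link_flow_count: dict link -> # of flows
        congestion = {
            pid: max(link_flow_count.get(link, 0) for link in links)
            for pid, links in paths.items()
        }
        p_busy = max(congestion, key=congestion.get)   # most congested path to dst
        p_free = min(congestion, key=congestion.get)   # least congested path to dst
        return p_busy, p_free, congestion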

  12. Distributed computation • Runs on every server:
    for each dst {
        Pbusy: the most congested path from src to dst;
        Pfree: the least congested path from src to dst;
        if (moving one flow from Pbusy to Pfree won't create a path more congested than Pbusy)
            move one flow from Pbusy to Pfree;
    }
  • The number of steps to convergence is bounded.
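The same pseudocode as a runnable Python sketch (my rendering; the data layout is an assumption, and congestion_of(path, extra=0) is a hypothetical helper that returns the flow count on the path's most loaded link, counting `extra` additional flows placed on it):

    def pick_flow_moves(flows_by_dst, paths_to, congestion_of):
        # flows_by_dst: dict dst -> {flow_id: current path_id}
        # paths_to:     dict dst -> list of candidate path_ids
        moves = []
        for dst, flows in flows_by_dst.items():
            candidates = paths_to[dst]
            p_busy = max(candidates, key=congestion_of)    # most congested path to dst
            p_free = min(candidates, key=congestion_of)    # least congested path to dst
            on_busy = [f for f, p in flows.items() if p == p_busy]
            # move one flow only if p_free, carrying one more flow, would still be
            # less congested than p_busy is now (slide 12's condition)
            if on_busy and congestion_of(p_free, 1) < congestion_of(p_busy):
                moves.append((on_busy[0], p_free))
        return moves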

  13. Change path: using a different src-dst pair
  [diagram: fat-tree with core1-core4 owning 1.0.0.0/8-4.0.0.0/8, aggregation switches owning the /16 sub-prefixes (e.g., agg1: 1.1.0.0/16, 2.1.0.0/16), and ToR switches owning the /24 sub-prefixes; src holds the aliases 1.1.1.2, 2.1.1.2, 3.1.1.2, 4.1.1.2 and dst holds 1.2.1.2, 2.2.1.2, 3.2.1.2, 4.2.1.2]
  agg1's down-hill table (dst -> next hop): 1.1.1.0/24 -> tor1; 1.1.2.0/24 -> tor2; 2.1.1.0/24 -> tor1; 2.1.2.0/24 -> tor2
  agg1's up-hill table (src -> next hop): 1.0.0.0/8 -> core1; 2.0.0.0/8 -> core2
  • A src-dst address pair uniquely encodes a path
  • Forwarding tables are static
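A sketch of the encoding idea, using the addresses shown on the slide (with the /8 prefixes above, the first octet of the chosen pair selects the core; the helper itself is hypothetical):

    SRC_ALIASES = ["1.1.1.2", "2.1.1.2", "3.1.1.2", "4.1.1.2"]  # src's IP aliases, one per core
    DST_ALIASES = ["1.2.1.2", "2.2.1.2", "3.2.1.2", "4.2.1.2"]  # dst's IP aliases, one per core

    def address_pair_for_core(core_index):
        # Pick the (src, dst) address pair whose /8 prefix belongs to core<core_index> (1-4).
        return SRC_ALIASES[core_index - 1], DST_ALIASES[core_index - 1]

    print(address_pair_for_core(1))  # ('1.1.1.2', '1.2.1.2') -> forwarded via core1
    print(address_pair_for_core(3))  # ('3.1.1.2', '3.2.1.2') -> forwarded via core3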

  14. Forwarding example: E2 -> E1
  Packet header: src: 1.2.1.2, dst: 1.1.1.2
  [diagram: the packet travels from E2 (1.2.1.2) through core1 and agg1 down to E1 (1.1.1.2)]
  agg1's down-hill table (dst -> next hop): 1.1.1.0/24 -> tor1; 1.1.2.0/24 -> tor2; 2.1.1.0/24 -> tor1; 2.1.2.0/24 -> tor2
  agg1's up-hill table (src -> next hop): 1.0.0.0/8 -> core1; 2.0.0.0/8 -> core2

  15. Forwarding example: E1 -> E2
  Packet header: src: 1.1.1.2, dst: 1.2.1.2
  [diagram: the packet travels from E1 (1.1.1.2) up through tor1 and agg1 to core1, then down to E2 (1.2.1.2)]
  agg1's down-hill table (dst -> next hop): 1.1.1.0/24 -> tor1; 1.1.2.0/24 -> tor2; 2.1.1.0/24 -> tor1; 2.1.2.0/24 -> tor2
  agg1's up-hill table (src -> next hop): 1.0.0.0/8 -> core1; 2.0.0.0/8 -> core2
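The two examples suggest a two-stage lookup at agg1; the sketch below encodes that reading (the match order and table layout are my assumptions): try the down-hill table on the destination first, and fall back to the up-hill table keyed on the source prefix.

    import ipaddress

    DOWN_HILL = {  # dst prefix -> next hop (agg1's table from the slides)
        "1.1.1.0/24": "tor1", "1.1.2.0/24": "tor2",
        "2.1.1.0/24": "tor1", "2.1.2.0/24": "tor2",
    }
    UP_HILL = {    # src prefix -> next hop (agg1's table from the slides)
        "1.0.0.0/8": "core1", "2.0.0.0/8": "core2",
    }

    def next_hop(src, dst):
        for prefix, hop in DOWN_HILL.items():      # destination in this pod: go down
            if ipaddress.ip_address(dst) in ipaddress.ip_network(prefix):
                return hop
        for prefix, hop in UP_HILL.items():        # otherwise: go up, core chosen by src prefix
            if ipaddress.ip_address(src) in ipaddress.ip_network(prefix):
                return hop
        raise LookupError("no route")

    print(next_hop("1.2.1.2", "1.1.1.2"))  # 'tor1'  (slide 14: E2 -> E1, going down)
    print(next_hop("1.1.1.2", "1.2.1.2"))  # 'core1' (slide 15: E1 -> E2, going up)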

  16. Randomness: prevent path oscillation • Add a random time interval to the control cycle

  17. Implementation • DeterLab testbed • 16-end-host fat-tree • Monitoring: OpenFlow API • Computation: daemon on end hosts • One NIC, multiple addresses: IP alias • Static routes: OpenFlow forwarding table • Multipath: IP-in-IP encapsulation • ns-2 simulator • For different & larger topologies

  18. DARD fully utilizes the bisection bandwidth • Simulation, 1024-end-host fat-tree • pVLB: periodic flow-level VLB [chart: bisection bandwidth (Gbps) across traffic patterns]

  19. DARD improves large file transfer time • Testbed, 16-end-host fat-tree [chart: DARD vs. ECMP improvement against # of new files per second, for inter-pod dominant, intra-pod dominant, and random traffic patterns]

  20. DARD converges in 2~3 control cycles • Simulation, 1024-end-host fat-tree, static traffic patterns • One control cycle ≈ 10 seconds [chart: convergence time (seconds) for inter-pod dominant, intra-pod dominant, and random traffic patterns]

  21. Randomness prevents path oscillation • Simulation, 128-end-host fat-tree [chart: number of times a flow switches its path, for intra-pod dominant, inter-pod dominant, and random traffic patterns]

  22. DARD's control overhead is bounded by the topology • control_traffic = #_of_servers x #_of_switches • Simulation, 128-end-host fat-tree [chart: control traffic (MB/s) against # of simultaneous flows, DARD vs. Hedera]
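A worked instance of that bound (the fat-tree sizes follow the standard k-ary fat-tree formulas, k^3/4 servers and 5k^2/4 switches; the result is just the per-cycle count of src-to-switch link-state exchanges implied by the formula, not a measured figure):

    def fat_tree_sizes(k):
        # Standard k-ary fat-tree: k^3/4 servers, 5k^2/4 switches.
        return k**3 // 4, 5 * k**2 // 4

    servers, switches = fat_tree_sizes(8)          # the 128-end-host fat-tree from the simulation
    print(servers, switches, servers * switches)   # 128 80 10240 exchanges per control cycle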

  23. Conclusion • DARD: Distributed Adaptive Routing for Datacenters • Practical: well-proven end-host-based technologies • Efficient: close to optimal traffic scheduling • Robust: no single point of failure [diagram: control loop: monitor network states -> compute next scheduling -> change the flow's path]

  24. Thank You! Questions and comments: xinwu@cs.duke.edu
