CoFlow Scheduling: Optimizing Data Transfer in Cloud Data Centers

Explore the concept of CoFlow scheduling and its application in optimizing data transfer in cloud data centers. Learn about different scheduling algorithms and their impact on performance.

mahmed

Presentation Transcript


  1. CS434/534: Topics in Network Systems • CoFlow Scheduling; Network Function Virtualization: Click • Yang (Richard) Yang • Computer Science Department, Yale University • 208A Watson • Email: yry@cs.yale.edu • http://zoo.cs.yale.edu/classes/cs434/

  2. Outline • Admin and recap • Cloud data center (CDC) applications/services • Fine-grained dataflow programming (e.g., Web apps) • Coarse-grained dataflow (e.g., data analytics) • Distributed machine learning using parameter server • DC cluster resource scheduling • DC transport scheduling • Overview • DCTCP • Fastpass • WAN scheduling • CoFlow scheduling • Carrier cloud • overview • network function programming • click

  3. Admin • Instructor office hours • Tuesday: 1:30-2:30 • Thursday: 3:00-4:00 • Friday: 1:30-2:30 pm • Projects • Milestones (exactly 4 weeks left) • 4/17 (T+1 week): Finish reading major related work; a Google Doc listing related papers (at least 4 papers) • 4/24 (T+2 weeks): Finish architecture design (slides/write-up of architecture, including all key components) • 5/1 (T+3 weeks): Initial, preliminary evaluations (slides/write-up about experiment/analysis setup) • 5/8 (T+4 weeks), 5:30 pm: final report due • Remaining topics

  4. Recap: WAN Scheduling • Wide-area links are typically more expensive, less reliable and hence need better resource scheduling

  5. Recap: WAN Scheduling • Google B4 and MS SWAN address largely the same issues but arrive at quite different designs • How to specify resource requirements? • B4: bandwidth share; SWAN: priority • How to allocate bandwidth and compute routes? • B4: progressive filling (max-min fairness) and path generation • SWAN: 15 paths, approximate max-min fairness • How to scale? • B4: FG at site level -> super node level; SWAN: approximation algorithm, pre-generated paths • How to update? • Both consider consistency; SWAN introduces a notion called scratch bandwidth
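Progressive filling, mentioned above for B4, can be illustrated on a single shared link. The sketch below is a simplification and not B4's implementation: the function name `max_min_fair_share`, the single-link setting, and the demand values are all assumptions made for illustration.

```python
def max_min_fair_share(capacity, demands):
    """Progressive filling on one link: raise every allocation equally until
    a demand is satisfied or the capacity is used up."""
    allocation = [0.0] * len(demands)
    remaining = float(capacity)
    # Visit demands from smallest to largest so satisfied flows drop out first.
    order = sorted(range(len(demands)), key=lambda i: demands[i])
    for position, i in enumerate(order):
        # Equal split of what is left among the flows not yet allocated.
        fair_share = remaining / (len(demands) - position)
        allocation[i] = min(demands[i], fair_share)
        remaining -= allocation[i]
    return allocation


# Hypothetical demands (units of bandwidth) competing for a link of capacity 10.
shares = max_min_fair_share(10, [8, 2, 4])
print(shares)
```

The small demands are met in full, and the largest demand receives whatever equal share is left.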

  6. Recap: Solution Space • Flow: transfer of data from a source to a destination in a data center • [Figure: timeline (1980s, 1990s, 2000s, 2005, 2010, 2015) of schemes: GPS, WFQ, CSFQ, RED, ECN, XCP, RCP, DCTCP, D2TCP, D3, PDQ, pFabric, DeTail, FCP; goals shown: per-flow fairness, flow completion time] • Where does B4/SWAN fit?

  7. Outline • Admin and recap • Cloud data center (CDC) applications/services • Fine-grained dataflow programming (e.g., Web apps) • Coarse-grained dataflow (e.g., data analytics) • Distributed machine learning using parameter server • DC cluster resource scheduling • DC transport scheduling • Overview • DCTCP • Fastpass • WAN scheduling • CoFlow scheduling

  8. Data Parallel Applications • Structure: • Multi-stage dataflow • Computation interleaved with communication • Computation Stage (e.g., Map, Reduce) • Distributed across many machines • Tasks run in parallel • Communication Stage (e.g., Shuffle) • Between successive computation stages • A communication stage cannot complete until all the data have been transferred • [Figure: Map stage -> Shuffle -> Reduce stage]

  9. Per-flow vs Co-flow Scheduling • [Figure: Coflow 1 and Coflow 2 share two links; Link 1 carries flows of 2 units and 3 units, Link 2 carries a flow of 6 units; three schedules are plotted per link (L1, L2) over time] • Fair Sharing: Coflow 1 comp. time = 5, Coflow 2 comp. time = 6 • Smallest-Flow First [1, 2]: Coflow 1 comp. time = 5, Coflow 2 comp. time = 6 • The Optimal: Coflow 1 comp. time = 3, Coflow 2 comp. time = 6 • [1] Finishing Flows Quickly with Preemptive Scheduling, SIGCOMM 2012. [2] pFabric: Minimal Near-Optimal Datacenter Transport, SIGCOMM 2013.
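The completion times on this slide can be reproduced with a small simulation. It is a sketch under an assumption about the figure: Coflow 1 has one 3-unit flow on Link 1, and Coflow 2 has a 2-unit flow on Link 1 plus a 6-unit flow on Link 2, with every link having unit capacity. That layout is inferred from the listed numbers, and all function names are hypothetical.

```python
def serial_finish_times(flows, order_key):
    """Serve flows one at a time on a unit-capacity link, in sorted order."""
    finish, clock = {}, 0.0
    for coflow, size in sorted(flows, key=order_key):
        clock += size
        finish[coflow] = clock
    return finish


def fair_share_finish_times(flows):
    """Split a unit-capacity link equally among the flows still active."""
    finish, clock, sent = {}, 0.0, 0.0
    by_size = sorted(flows, key=lambda f: f[1])
    for k, (coflow, size) in enumerate(by_size):
        active = len(by_size) - k          # flows still sharing the link
        clock += (size - sent) * active    # time until this flow finishes
        sent = size
        finish[coflow] = clock
    return finish


def coflow_completion(per_link_finish):
    """A coflow completes when its last flow on any link completes."""
    cct = {}
    for finish in per_link_finish:
        for coflow, t in finish.items():
            cct[coflow] = max(cct.get(coflow, 0.0), t)
    return cct


# Assumed layout of the figure: (coflow, flow size) per link.
links = {"L1": [("C1", 3), ("C2", 2)], "L2": [("C2", 6)]}

fair = coflow_completion(fair_share_finish_times(f) for f in links.values())
smallest_first = coflow_completion(
    serial_finish_times(f, lambda x: x[1]) for f in links.values())
coflow_aware = coflow_completion(
    serial_finish_times(f, lambda x: x[0]) for f in links.values())
print(fair, smallest_first, coflow_aware)
```

Serving Coflow 1 first on Link 1 costs Coflow 2 nothing, because Coflow 2 is held back by its 6-unit flow on Link 2 anyway. That is the coflow-level insight that per-flow policies miss.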

  10. Example CoFlows • Q: What is a co-flow? • Examples: Single Flow, Parallel Flows, Broadcast, Aggregation, Shuffle, All-to-All

  11. CoFlow API

  12. Architecture • Centralized master-slave architecture • Applications use a client library to communicate with the master • Coflow transport timing and rates are determined by the coflow scheduler • [Figure labels: Sender, Receiver, and Driver computation tasks calling the Varys client library (Put, Get, Reg); Varys daemons; Varys master with coflow scheduler, usage estimator, topology monitor, and network interface; (distributed) file system]

  13. Discussion • How may you compute the rate allocation to schedule a coflow?

  14. Varys Scheduling Architecture • A two-step algorithm • Ordering heuristic: scheduling algorithms are typically based on ordering, i.e., considering the jobs in some order (called permutation scheduling) • Discussion: what are potential metrics for ordering the coflows? • Rate allocation algorithm: allocates the minimum required resources to each coflow to finish in minimum time

  15. Varys Ordering Metric: Bottleneck • Assume the bottleneck is only at the (ingress or egress) edge • Let dij denote the amount of data from i to j • An estimate of the finishing time for the coflow: [equation not captured in the transcript]
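One plausible reading of the bottleneck estimate, given the edge-only assumption above, is the largest time any sender needs to push out its data or any receiver needs to take in its data. The sketch below computes that quantity; the function name, the per-endpoint bandwidth arguments, and the demand matrix are assumptions for illustration, and the transcript does not confirm that this is the exact formula on the slide.

```python
def bottleneck_time(demand, sender_bw, receiver_bw):
    """Estimate a coflow's finishing time from its edge bottleneck.

    demand[i][j] is the amount of data from sender i to receiver j;
    sender_bw[i] and receiver_bw[j] are the available edge bandwidths.
    """
    senders = range(len(demand))
    receivers = range(len(demand[0]))
    # Time for the busiest sender to transmit everything it holds.
    send_time = max(sum(demand[i]) / sender_bw[i] for i in senders)
    # Time for the busiest receiver to take in everything destined to it.
    recv_time = max(
        sum(demand[i][j] for i in senders) / receiver_bw[j] for j in receivers)
    return max(send_time, recv_time)


# Hypothetical 2x2 shuffle with unit-bandwidth edges.
demand = [[4, 2],
          [0, 6]]
gamma = bottleneck_time(demand, sender_bw=[1, 1], receiver_bw=[1, 1])
print(gamma)
```

Here the second receiver must take in 8 units, so it is the bottleneck. Ordering coflows by this estimate, smallest first, is one candidate ordering metric.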

  16. Allocation Algorithm • Given the estimate of the finishing time for a coflow: • How much rate should be allocated to each member flow (dij) of the coflow?
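One answer to the question above is to give each flow just enough rate to finish exactly at the coflow's estimated finishing time, since finishing any single flow earlier does not finish the coflow earlier. The sketch below assumes that finishing time (called `gamma` here) is already known; the names and numbers are illustrative, not taken from the slides.

```python
def allocate_rates(demand, gamma):
    """Give flow (i, j) the rate d_ij / gamma so all flows finish together."""
    return [[d / gamma for d in row] for row in demand]


# Hypothetical demand matrix and its bottleneck finishing time on unit links.
demand = [[4, 2],
          [0, 6]]
gamma = 8.0
rates = allocate_rates(demand, gamma)

# Load placed on each receiver; it must not exceed the unit link capacity.
receiver_load = [sum(row[j] for row in rates) for j in range(len(rates[0]))]
print(rates, receiver_load)
```

Only the bottleneck receiver is fully used; bandwidth left over on the other edges can go to the next coflow in the order.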

  17. 19

  18. Discussion • What do you take away from the Varys design? • What are the issues with Varys?

  19. Key Issues of Varys • Assumes • The size of each flow is known • But pipelining may change the size • The total number of flows is known • But real systems perform speculative execution • The endpoints are known • But failure recovery may change endpoints

  20. Outline • Admin and recap • Cloud data center (CDC) applications/services • Fine-grained dataflow programming (e.g., Web apps) • Coarse-grained dataflow (e.g., data analytics) • Distributed machine learning using parameter server • DC cluster resource scheduling • DC transport scheduling • Overview • DCTCP • Fastpass • WAN scheduling • CoFlow scheduling • Varys • Aalo

  21. Aalo Goal • Dynamic scheduling of coflows according to their current states • An online framework • Non-blocking API • No longer waits for all flows’ info to be registered before starting

  22. Aalo Design I: Least-Attained Service (LAS) Ordering • Prioritize the coflow that has sent the fewest total bytes • The more a coflow has sent, the lower its priority => smaller coflows finish faster • Problems: • Can lead to starvation • Suboptimal for similar-size coflows

  23. Suboptimal for Similar Coflows • LAS on Coflow 1 and Coflow 2: • Reduces to fair sharing • Doesn’t minimize average completion time • Coflow 1 comp. time = 6, Coflow 2 comp. time = 6 • FIFO works well for similar coflows • Optimal when coflows are identical • Coflow 1 comp. time = 3, Coflow 2 comp. time = 6
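The numbers on this slide follow from a short calculation. The sketch below assumes two identical coflows of 3 units each on a single unit-capacity link, which is consistent with the completion times listed; the function names are hypothetical.

```python
def fifo_completion(sizes):
    """Serve coflows one after another in arrival order."""
    clock, done = 0.0, []
    for size in sizes:
        clock += size
        done.append(clock)
    return done


def fair_share_completion(sizes):
    """Split the link equally among unfinished coflows (what LAS reduces to
    when coflows are similar). Returns completion times in input order."""
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])
    done = [0.0] * len(sizes)
    clock, sent = 0.0, 0.0
    for k, i in enumerate(order):
        clock += (sizes[i] - sent) * (len(sizes) - k)
        sent = sizes[i]
        done[i] = clock
    return done


sizes = [3, 3]  # assumed sizes of Coflow 1 and Coflow 2
fifo = fifo_completion(sizes)
fair = fair_share_completion(sizes)
print(fifo, sum(fifo) / len(fifo))
print(fair, sum(fair) / len(fair))
```

FIFO lowers the average completion time from 6 to 4.5 without delaying the second coflow at all.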

  24. Between a “Rock” and a “Hard Place” • FIFO schedule similar coflows (do not ping-pong among them) • Prioritize across dissimilar (elephant and mice) coflows

  25. Aalo Idea: Discretized Coflow-Aware LAS (D-CLAS) • Priority discretization • Change priority when total # of bytes sent exceeds predefined thresholds • Scheduling policies • FIFO within the same queue • Prioritization across queues • Weighted sharing across queues • Guarantees starvation avoidance • [Figure: FIFO queues Q1 (highest-priority queue), Q2, …, QK (lowest-priority queue)]
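The two scheduling policies above (FIFO within a queue, prioritization across queues) amount to a two-level sort. The sketch below shows only that ordering and leaves out the weighted sharing across queues; the `Coflow` record and the sample values are assumptions for illustration.

```python
from collections import namedtuple

# queue: 1 is the highest-priority queue; arrival: time the coflow started.
Coflow = namedtuple("Coflow", ["name", "queue", "arrival"])


def dclas_order(coflows):
    """Lower queue index first; within a queue, earlier arrival first (FIFO)."""
    return sorted(coflows, key=lambda c: (c.queue, c.arrival))


coflows = [
    Coflow("elephant", queue=3, arrival=0.0),
    Coflow("mouse-b", queue=1, arrival=2.0),
    Coflow("mouse-a", queue=1, arrival=1.0),
    Coflow("medium", queue=2, arrival=0.5),
]
order = [c.name for c in dclas_order(coflows)]
print(order)
```

With strict priority alone the elephant could starve, which is why the slide pairs the ordering with weighted sharing across queues.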

  26. Aalo Discretize Priorities • Q: Which queue is a coflow in when it first starts? • Exponentially spaced thresholds: A × E^i • A, E: constants • 1 ≤ i ≤ K: threshold constant • K: number of queues • [Figure: Q1 (highest-priority queue) spans 0 to AE, Q2 spans AE to AE^2, …, QK (lowest-priority queue) spans AE^(K-1) to ∞]
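The threshold layout in the figure maps a coflow's bytes sent to a queue index. The sketch below follows that layout (Q1 covers 0 up to A×E, each later queue is E times wider, and QK is unbounded); the function name and the constants in the example are illustrative assumptions.

```python
def queue_index(bytes_sent, A, E, K):
    """Return the queue (1..K) for a coflow that has sent `bytes_sent` bytes."""
    queue = 1
    threshold = A * E              # upper bound of Q1
    while queue < K and bytes_sent >= threshold:
        queue += 1                 # demote to the next lower-priority queue
        threshold *= E             # next bound is E times larger
    return queue


# Illustrative constants: A = 1, E = 10, K = 4 queues.
placements = [queue_index(b, A=1, E=10, K=4) for b in (0, 9, 10, 99, 100, 1000, 10**9)]
print(placements)
```

A coflow that has sent nothing lands in Q1, the highest-priority queue, and moves down only as it sends more. Exponential spacing keeps K small even when coflow sizes span many orders of magnitude.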

  27. Remaining Issue: Computing Total # of Bytes Sent by a CoFlow • Why an issue? • D-CLAS needs to know the total # of bytes sent over all flows of a coflow, but such distributed aggregation can be challenging over small time scales • D-LAS (decision on # of bytes sent locally) has worse performance • [Figure: Coflow 1 and Coflow 2 on Link 1 (3 units, 2 units) and Link 2 (6 units), scheduled per link (L1, L2) over time under D-LAS and D-CLAS; the two panels list Coflow 1 comp. time = 3, Coflow 2 comp. time = 6 and Coflow 1 comp. time = 6, Coflow 2 comp. time = 6]
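The gap between local and global knowledge can be made concrete. In the sketch below each worker reports the bytes it has sent per coflow and a coordinator sums them; the report format and names are assumptions, and the numbers reuse the link sizes from the figure under the assumption that Coflow 2 owns the 2-unit and 6-unit flows.

```python
from collections import Counter


def aggregate(worker_reports):
    """Sum per-coflow byte counts reported by all workers."""
    total = Counter()
    for report in worker_reports:
        total.update(report)   # adds counts key by key
    return dict(total)


# One report per worker: bytes sent so far, keyed by coflow.
reports = [{"C1": 3, "C2": 2},   # worker on Link 1
           {"C2": 6}]            # worker on Link 2
totals = aggregate(reports)

local_smallest = min(reports[0], key=reports[0].get)
global_smallest = min(totals, key=totals.get)
print(totals, local_smallest, global_smallest)
```

The worker on Link 1, looking only at its own counts, would favor C2, while the aggregated totals show C1 is the smaller coflow. Collecting those totals quickly enough is the distributed aggregation challenge the slide refers to.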

  28. Aalo Architecture • [Figure labels: Coordinator; Worker on Sender 1 and on Sender 2; D-CLAS; Network Interface; local/global scheduling timescale: µs and milliseconds]

  29. Varys vs Aalo • Similar for large coflows because they are in slow-moving queues • Performance loss for medium coflows from mis-scheduling them • Improvements for small coflows • [Figure: CDF, fraction of coflows (0 to 1) vs. coflow completion time in seconds (0.01 to 1000), for Varys, Non-Clairvoyant Scheduler, and Aalo]

  30. Offline Reading: Sincronia

  31. Summary: Cloud Datacenter Programming and Resource Scheduling • Key issues • Programming models • Acceleration/performance scaling techniques • Control architecture • Resource isolation (scheduling) mechanisms • Security isolation

  32. Summary: DC Programming Models • Noria • MapReduce

  33. Summary: DC Programming Models • Spark • Parameter server

  34. Summary: Acceleration Techniques • Local scheduling (MapReduce) • Working set in memory (Spark) • Pipelining (Spark) • Caching (Noria) • …

  35. Summary: DC Orchestration/Control Architecture

  36. Summary: DC Resource Scheduling Mechanisms • Cluster • Delay scheduling • Spark stage based scheduling • Fairness • Max-min fairness • Hadoop hierarchical scheduling • Dominant resource fairness (DRF) • Coflow scheduling • Transport scheduling • DCTCP • Fastpass • WAN scheduling • Coflow scheduling

  37. Summary: DC Security Isolation • Not really covered, assuming VMs and/or containers

  38. Outline • Admin and recap • Cloud data center (CDC) • Carrier cloud

  39. Major Trend • Programming and managing carrier network (CN) infrastructures in a similar way to programming and managing DCs • Convert from an expensive, hardware-centric CN architecture to a software-centric CN architecture • The deployed functions are called network functions • May have a major impact on existing carrier networks • Essential to the 5G network architecture

  40. 5G Network Architecture

  41. Bigger Picture • [Figure labels: Cellular access; Residential access; Campus access (e.g., Ethernet, WiFi); ISP; ISP backbone; ISP data center]

  42. Discussion • What functions/apps/services may run in carrier clouds?

  43. Example: Cellular Architecture • MME/HSS: authentication, mobility management, … • PCRF: charging instruction, QoS info, … • SGW, PGW: standard network functions, NAT, QoS, policing, firewall, content cache, parental control, transcoding, …

  44. Discussion • What are some major differences between cloud data center and carrier network cloud?

  45. Discussion: Carrier Cloud Programming and Resource Scheduling • What may the following look like in carrier cloud? • Programming models • Acceleration/performance scaling techniques • Control architecture • Resource isolation (scheduling) mechanisms • Security isolation

  46. Outline • Admin and recap • Cloud data center (CDC) applications/services • Carrier cloud • overview • network function programming • click

  47. Click • One of the first major, modular network programming models • Highly influential for later designs

  48. Click Design Goals • Focus on single device • Flexibility • Easy to add new features • Enable experimentation • Openness • Allow users/researchers to build and extend (in contrast to most commercial routers) • Modularity • Simplify the reuse and composition of existing features • Speed/efficiency • Run in OS • [Figure labels: Control plane: user-level routing daemons; Forwarding plane: Click, Linux kernel]

  49. Click Programming Structure: A Graph of Network Elements • Large number of small elements • Each performing a simple packet function • E.g., IP look-up, TTL decrement, buffering • Connected together in a directed graph • Element inputs/outputs snapped together • Packet flow through the graph is the main organizational primitive • Constructing different graphs from the same elements is the main reusability primitive
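The element-graph idea can be sketched in a few lines. This is an illustration in Python and not Click's actual API or configuration language; the class names (`Element`, `DecTTL`, `PacketCounter`, `Queue`) and the dictionary packet format are all assumptions.

```python
class Element:
    """A small packet-processing unit with one output to a downstream element."""

    def __init__(self):
        self.next = None

    def connect(self, downstream):
        self.next = downstream
        return downstream          # allows chaining: a.connect(b).connect(c)

    def push(self, packet):
        packet = self.process(packet)
        # A result of None means the packet was dropped or consumed here.
        if packet is not None and self.next is not None:
            self.next.push(packet)

    def process(self, packet):
        return packet


class DecTTL(Element):
    def process(self, packet):
        if packet["ttl"] <= 1:
            return None            # expired: drop
        return dict(packet, ttl=packet["ttl"] - 1)


class PacketCounter(Element):
    def __init__(self):
        super().__init__()
        self.count = 0

    def process(self, packet):
        self.count += 1
        return packet


class Queue(Element):
    def __init__(self):
        super().__init__()
        self.packets = []

    def process(self, packet):
        self.packets.append(packet)   # buffer the packet; nothing flows onward
        return None


# Snap elements together into a graph: DecTTL -> PacketCounter -> Queue.
source, counter, queue = DecTTL(), PacketCounter(), Queue()
source.connect(counter).connect(queue)
source.push({"dst": "10.0.0.1", "ttl": 64})
source.push({"dst": "10.0.0.2", "ttl": 1})   # dropped by DecTTL
print(counter.count, queue.packets)
```

The same three element classes could be wired in a different order, or with extra counters, without changing any element's code, which is the reuse property the slide highlights.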

  50. Click Elements and Graph Specification
