1 / 40

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers. Yousi Zheng Dept. of ECE, The Ohio State University E-mail: zheng.540@osu.edu. J oint work with Ness B. Shroff Prasun Sinha. MapReduce. Data Centers Facility containing a very large number of machines

sandro
Download Presentation

A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A New Analytical Technique for Designing Provably Efficient MapReduce Schedulers Yousi Zheng Dept. of ECE, The Ohio State University E-mail: zheng.540@osu.edu Joint work with Ness B. Shroff PrasunSinha

  2. MapReduce • Data Centers • Facility containing a very large number of machines • They process very large datasets: MapReduce • MapReduce • Framework for processing highly parallel problems • Developed (popularized) by Google • Nearly ubiquitous (Google, IBM, Facebook, Microsoft, Yahoo, Amazon, eBay, twitter…) • Used in a variety of different applications • Distributed Grep, distributed sort, AI, scientific computation, Large scale pdf generation, Geographical data, Image processing…

  3. MapReduce • Map phase: • Takes an input job and divides into many small sub-problems • Operations can run in parallel on potentially different machines • Reduce phase • Combines the output of Map • Occurs after the Map phase is completed • Runs on parallel machines… Goal:Schedule these Map and Reduce tasks in order to minimize the total flow time in the system

  4. Model • Time is slotted • N “Machines”: Each machine runs 1 unit of workload per slot • Scheduling Constraint: Reduce job starts after all tasks in the Map job are completed. Each Reduce task can have multiple units of workload Ri(k)units of workload for task k for job i Each jobiconsists of Map tasks and Reduce tasks Ri(k) Each Map task has 1 unit of workload Total Mitasks for Map job i n jobs over T slots Reduce job i 1 Map Mi … units of workload for Reduce job i … Data Center with N machines

  5. Types of Schedulers: Treatment of Reduce Tasks Machines • Preemptive • Reduce tasks can be interrupted at the end of a slot • Remaining workload in the task can be executed on any machine(s) • Reasonable if overhead of data-migration is small • Non-preemptive R2 Machine C Machine B R1 R1 Machine A 0 1 2 3 4 5 6 7 Time

  6. Types of Schedulers: Treatment of Reduce Tasks Machines • Preemptive • Reduce tasks can be interrupted at the end of a slot • Remaining workload in the task can be executed on any machine(s) • Reasonable if overhead of data-migration is small • Non-preemptive • Once a Reduce task is started, it can’t be interrupted till the end of the task • An individual Reduce task cannot be completed on different machines • But different tasks from same job can be assigned to different machines • Note: Since Map tasks have unit workload, they cannot be interrupted. • In practice Map tasks may be > 1 unit, but small R2 R2 Machine C Machine B R1 R1 Machine A 0 1 2 3 4 5 6 7 Time

  7. Flow-Time Minimization Problem: Preemptive Scenario Minimize flow-time Total # machines is N Total map workload is Mifor job i Total reduce workload is Rifor job i Scheduling Constraint

  8. Flow-Time Minimization ProblemNon-Preemptive Scenario Minimize flow-time Total # machines is N Total map workload is Mifor job i Total reduce workload is R(k)ifor task k in job i Non-preemptive Both Problems are NP-hard in the strong sense

  9. Competitive Ratio • : set of all possible schedulers (including non-causal) • : Total Flow time of scheduler (sum of FT of all jobs) • Define • Competitive ratio: A given online policy Sis said to have a competitive ratio of c if for any arrival pattern Theorem For any constant c0 and any online scheduler S, there are sequences of arrivals and workloads in the non-preemptive case, such that the competitive ratio c is greater than c0  No policy gives a finite competitive ratio

  10. Efficiency Ratio Analysis • Efficiency ratio: A given online policy Sis said to have an efficiency ratio of  if (over some probability space of arrivals) Theorem • If the workload of Reduce tasks of each job is light-tailed distributed, then • Any work-conserving scheduler has a constant efficiency ratio, for both preemptive and non-preemptive scenarios.

  11. So Far • No scheduler has a finite competitive ratio in non-preemptive scenario • An evaluation metric efficiency ratio is introduced • All work conserving schedulers have a finite efficiency ratio (g) for both preemptive & non-preemptive scenarios • But the performance of these schedulers could still be quite poor (g could be very large) • Key Question: Can we design a simple scheduler with a provably small efficiency ratio in the MapReduce paradigm? Preemptive scenario: Yes! (ASRPT)

  12. ASRPT: Available Shortest Remaining Processing Time • SRPT: An infeasible scheduler • Priority given to jobs with smallest remaining workload • In SRPT, Map and Reduce for a given job can be scheduled in the same time-slot (infeasible in reality) • SRPT results in a lower flow time than any feasible (including non-causal) scheduler • ASRPT • Map tasks are scheduled according to SRPT (or sooner) • By running SRPT in a virtual manner • Reduce tasks greedily fill up the remaining machines • In the order determined by the shortest remaining processing time

  13. ASRPT: Available Shortest Remaining Processing Time 4 4 Theorem: If the job arrivals and workload are i.i.d., then ASRPT has an efficiency ratio of 2 in the preemptive scenario. R1 R1 R1 Number of machines, N Number of machines, N 3 3 R1 2 2 M1 M1 R2 1 1 M2 R1 R2 M2 1 1 3 3 2 4 2 4 Time Time SRPT : Total Flow-time 4 units : Total Flow-time 5 units ASRPT

  14. Simulation • N = 100 machines, total time slots T = 800 • #Reduce tasks in each job is 10 • Job arrival process: Poisson; λ = 2 jobs per time slot. • Preemptive Scenario • Schedulers • ASRPT • Fair • FIFO • ASRPT has smaller efficiency ratio than Fair and FIFO.

  15. Conclusion • Investigated the flow time minimization problem inMapReduce schedulers • No online policy has a finitecompetitive ratio in non-preemptive scenarios • All work-conserving schedulers have constant efficiency ratios • Preemptive and non-preemptive scenarios. • A specific algorithm (ASRPT) can guarantee an efficiency ratio of 2 in the preemptive scenario. • Is efficiency ratio of ASRPT similarly small in non-pre-emptive case? • Efficiency Ratio based analysis provides worse case (w.p.1) guarantees (compared to competitive ratio), but gives more design flexibility. • Has the potential to be applied to other problems/domains

  16. Future Work • Consider Shuffle Phase • Introduce more information (e.g., dependency Graph) to the scheduler • So that all reduce tasks don’t wait until Map tasks are completed • Consider cost of data migration • Combine preemptive and non-preemptive scenarios • Consider multiple phases • With phase precedence (for complex processing) • Develop Green Energy solutions • Combine with sleep-wake scheduling • Consider renewable energy fluctuations

  17. Thank You!

  18. Backup Slides

  19. Main Results • Introduced a metric Efficiency Ratio to evaluate the performance of schedulers in MapReduce • Proved that no online scheduler has a finite competitive ratio • Guarantee with probability 1 • May have greater modeling implications for analysis of algorithms • Proved that Efficiency Ratio is bounded by a constant for anywork-conserving scheduler • For both bounded and light-tailed workload for Reduce phase • For both preemptive and non-preemptive scenarios • Developed a specific work-conserving algorithm (ASRPT) that has an efficiency ratio of 2 for preemptive scenario.

  20. Efficiency Ratio: Preemptive & Non-preemptive • For any work-conserving scheduling policy: Preemptive: Non-preemptive: • Similar result holds but with different constants ( ) wherep0= Prob (a slot has no job arrivals)

  21. MapReduce:FIFO scheduler R1 FCFS scheduler with 4 machines, 2 jobs Map must finish before Reduce can start M1 M2 R2 4 R1 Number of machines 3 Flow-time of a job: time spent by job in the system FT of Job 1: 3 time slots FT of Job 2: 3 time slots Total FT : 6 time slots 2 M2 M1 1 R1 R2 1 3 2 4 Time

  22. MapReduce: Smarter Scheduler R1 M1 “Smarter Scheduler” M2 R2 4 R1 Number of machines 3 Flow-time: time in the system FT of Job 1: 3 time slots FT of Job 2: 2 time slots Total FT: 5 time slots R2 2 M1 R1 1 M2 1 3 2 4 Time Goal: Minimize total flow-time

  23. Future Work • Consider Shuffle Phase • Introduce more information (e.g., dependency Graph) to the scheduler • So that all reduce tasks don’t wait until Map tasks are completed • Consider cost of data migration • Combine preemptive and non-preemptive scenarios • Consider multiple phases • With phase precedence (for complex processing) • Develop Green Energy solutions • Combine with sleep-wake scheduling • Consider renewable energy fluctuations

  24. Analysis Outline • Divide up time into repeating intervals • Interval corresponds to the consecutive times slots until a time-slot occurs with an idle machine(s) • Bound the moments of these intervals with constants • Bound the efficiency ratio using the moments • Analysis applies to: • Preemptive and non-preemptive scenarios • Allows jobs to have infinite but light-tailed support

  25. Preemptive Scenario: An Example Incomplete time-slot (all machines not used) Observations for a job arriving in interval j(work-conserving) • Map task finishes in interval j (otherwise last time-slot would be complete) • Reduce tasks finish in interval jor interval j+1 •  Flow time per job is bounded, wp1, if the interval sizes (moments) are bounded •  g is bounded wp1. Complete time-slot (all machines used) Kj: Length of j-thinterval (ends in incomplete slot)

  26. Non-preemptive Scenario: An Example Observation for reduce tasks: • Unlike preemptive scenario, in the non-preemptive scenario, each job arriving in interval j will start (not finish) all its reduce tasks in intervaljor interval j+1 • Reduce tasks must be finished by the end of interval j+1 plus time Ri-1. • Hence, if (all moments) of the interval are bounded then the flow times (per job) are bounded wp1  g is bounded wp1.

  27. Bounds on the Moments Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j. where Rough idea: • Since the traffic intensity ρ<1, and the workload in each slot is light-tailed, the number of consecutive complete time slots will decay exponentially (Chernoff’s Theorem) •  The moments of interval K are bounded

  28. Many unanswered questions

  29. Main schedulers proposed for MapReduce framework • FIFO scheduler: • Default in Hadoop • Suffers from the well known head-of-line blocking problem • Fair scheduler [Zaharia’09] • Mitigate head-of-line blocking problem of FIFO • Jobs stick to the machines on which they are initially scheduled • Coupling scheduler [Tan’12] • Mitigate the problem that jobs stick to machines

  30. Main schedulers proposed for MapReduce framework • Consider machines with speed-up [Moseley’11] • O(1/ ε5) competitive algorithm with (1 + ε) speed for the online case, where 0 < ε ≤ 1 • Speed-up of machines is necessary in the algorithm (if ε decreases to 0, the competitive ratio will increase to infinity correspondingly). • Consider minimizing total completion time [Chang’11, Chen’12] • Minimizing total completion time = minimizing total flow time. • Competitive ratio of “minimizing total completion time” ≠ competitive ratio of “minimizing total flow time”.

  31. ASRPT • ASRPT runs SRPT in a virtual fashion and keeps track of how SRPT would have scheduled jobs in any given slot. • ASRPT assigns machines to the tasks in the following priority order (from high to low): • (Only in the non-preemptive scenario) The previously scheduled Reduce tasks which are not yet finished, • The Map tasks which are scheduled in the SRPT, • The available Reduce tasks, • The available Map tasks. • For each priority group, the algorithm greedily assign machines to the corresponding tasks through the sorted available workload.

  32. Light-tailed Distributed Workload where

  33. Simulations

  34. Main Results • Competitive Ratio: Deterministic guarantee under adversarial arrivals • Unbounded for all non-preemptive scheduler • Efficiency Ratio • Asymptotic • Guarantee with probability 1 • May have greater modeling implications for analysis of algorithms • Efficiency Ratio is bounded by a constant for anywork-conserving scheduler • For bounded workload for Reduce phase (Ri < Rmax) • For light-tailed workload for Reduce phase • An Algorithm (ASRPT) with efficiency ratio of 2 for preemptive scenario.

  35. Types of Schedulers: Treatment of Reduce Tasks • Non-preemptive • One task can only be allocated to one machine (preserves data locality) • Once started cannot be interrupted till the end of the task

  36. Types of Schedulers: Treatment of Reduce Tasks • Preemptive • Every task can be interrupted at the end of a slot • Remaining workload can be executed on any machine

  37. More in Practice Shuffle Phase

  38. ASRPT: Available Shortest Remaining Processing Time [Example] 4 4 R1 R1 R1 Number of machines, N Number of machines, N 3 3 R2 2 2 R1 M1 M1 1 1 M2 R2 M2 1 1 3 3 2 4 2 4 Time Time SRPT : Total Flow-time 4 units : Total Flow-time 5 units ASRPT

  39. ASRPT: Available Shortest Remaining Processing Time [Example] 4 4 R1 R1 Number of machines, N Number of machines, N 3 3 R2 2 2 M2 R1 M1 M1 1 1 R2 R1 M2 1 1 3 3 2 4 2 4 Time Time : Total Flow-time 6 units : Total Flow-time 5 units FIFO ASRPT

  40. Bounds on the Moments Lemma: If the scheduler is work-conserving, then for any given number , there exists a constant , such that , for all j. where

More Related