
Combating Outliers in map-reduce


Presentation Transcript


  1. Combating Outliers in map-reduce Srikanth Kandula, Ganesh Ananthanarayanan, Albert Greenberg, Ion Stoica, Yi Lu, Bikas Saha, Ed Harris

  2. [Plot: log(size of cluster) vs. log(size of dataset), from GB (10^9) through TB, PB, EB (10^18); map-reduce clusters (10^3 to 10^5 machines) sit above HPC and parallel databases; datasets e.g., the Internet, click logs, bio/genomic data] • map-reduce • decouples operations on data (user-code) from mechanisms to scale • is widely used • Cosmos (based on SVC’s Dryad) + Scope @ Bing • MapReduce @ Google • Hadoop inside Yahoo! and on Amazon’s Cloud (AWS)

  3. An Example. What the user says: SELECT Query, COUNT(*) AS Freq FROM QueryTable HAVING Freq > X. Goal: Find frequent search queries to Bing. How it works: [Diagram: a job manager assigns work and tracks progress; map tasks read file blocks 0-3 and write their output locally; reduce tasks read the map output and write output blocks 0-1]
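The query above can be read as a word-count style map-reduce. Below is a minimal sketch, assuming an in-memory iterable of logged queries; the threshold `QUERY_THRESHOLD` stands in for the slide's `X` and is illustrative, not the Scope/Cosmos implementation:

```python
from collections import defaultdict

# Hypothetical threshold standing in for the "X" in the query above.
QUERY_THRESHOLD = 1000

def map_phase(lines):
    """Map: emit (query, 1) for every logged search query."""
    for line in lines:
        query = line.strip()
        if query:
            yield (query, 1)

def reduce_phase(pairs):
    """Reduce: sum the counts per query and keep only the frequent ones."""
    counts = defaultdict(int)
    for query, count in pairs:
        counts[query] += count
    return {q: c for q, c in counts.items() if c > QUERY_THRESHOLD}

# Usage: frequent = reduce_phase(map_phase(open("query_log.txt")))
```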

  4. We find that: Outliers slow down map-reduce jobs. [Diagram: job workflow against the file system: Map.Read 22K, Map.Move 15K, Map 13K, Barrier, Reduce 51K] • Goals • speeding up jobs improves productivity • predictability supports SLAs • … while using resources efficiently

  5. This talk… Identify fundamental causes of outliers • concurrency leads to contention for resources • heterogeneity (e.g., disk loss rate) • map-reduce artifacts Current schemes duplicate long-running tasks Mantri: A cause-, resource-aware mitigation scheme • takes distinct actions based on cause • considers resource cost of actions Results from a production deployment

  6. Why bother? Frequency of Outliers. stragglers = tasks that take ≥ 1.5 times the median task in that phase; recomputes = tasks that are re-run because their output was lost • The median phase has 10% stragglers and no recomputes • 10% of the stragglers take >10X longer
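A minimal sketch of the straggler rule above, assuming per-task durations for one phase are available as a dict; the function and task names are illustrative:

```python
import statistics

def stragglers(task_durations, factor=1.5):
    """Return the tasks in a phase that run at least `factor` times
    longer than the phase's median task (the slide's straggler rule)."""
    median = statistics.median(task_durations.values())
    return [task for task, duration in task_durations.items()
            if duration >= factor * median]

# Example: durations in seconds for one phase.
print(stragglers({"t0": 40, "t1": 42, "t2": 45, "t3": 130}))  # -> ['t3']
```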

  7. Why bother? Cost of outliers (what-if analysis: replay logs in a trace-driven simulator). At the median, jobs slowed down by 35% due to outliers

  8. Why outliers? Problem: Due to unavailable input, tasks have to be recomputed. [Diagram: map, sort, reduce pipeline; a delay due to a recompute in one phase readily cascades to later phases]

  9. Why outliers? Problem: Due to unavailable input, tasks have to be recomputed (simple) Idea: Replicate intermediate data, use copy if original is unavailable Challenge(s) What data to replicate? Where? What if we still miss data? • Insights: • 50% of the recomputes are on 5% of machines

  10. Why outliers? Problem: Due to unavailable input, tasks have to be recomputed (simple) Idea: Replicate intermediate data, use copy if original is unavailable Challenge(s) What data to replicate? Where? What if we still miss data? • Insights: • 50% of the recomputes are on 5% of machines • cost to recompute vs. cost to replicate: with t = predicted runtime of a task, r = predicted probability of recompute at a machine, and t_rep = cost to copy the data over within the rack, for a task on M2 whose input was produced on M1, t_redo = r2 (t2 + t1_redo); replicate when t_rep < t_redo. Mantri preferentially acts on the more costly recomputes
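A small sketch of the cost comparison above for the two-machine chain M1 → M2; the function names and example numbers are illustrative, not Mantri's implementation:

```python
def t_redo(r, t, t_input_redo=0.0):
    """Predicted cost to recompute a task on a machine with recompute
    probability r and task runtime t; t_input_redo is the (cascaded)
    cost of regenerating its input if that is also lost."""
    return r * (t + t_input_redo)

def should_replicate(t_rep, r1, t1, r2, t2):
    """Replicate M1's output for the M1 -> M2 chain when copying it
    within the rack (t_rep) is cheaper than the expected recompute."""
    t1_redo = t_redo(r1, t1)            # cost of redoing the producer on M1
    t2_redo = t_redo(r2, t2, t1_redo)   # slide: t_redo = r2 * (t2 + t1_redo)
    return t_rep < t2_redo

# Example: a flaky consumer machine makes replication worthwhile.
print(should_replicate(t_rep=2.0, r1=0.05, t1=60, r2=0.3, t2=90))  # -> True
```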

  11. Why outliers? Problem: Tasks reading input over the network experience variable congestion. [Diagram: a reduce task reading map output from several racks] uneven placement is typical in production • reduce tasks are placed at the first available slot

  12. Why outliers? Problem: Tasks reading input over the network experience variable congestion Idea: Avoid hot-spots, keep traffic on a link proportional to bandwidth Challenge(s) Global co-ordination across jobs? Where is the congestion? • Insights: • local control is a good approximation (each job balances its traffic) • link utilizations average out on the long term and are steady on the short term. If rack i has d_i map output and u_i, v_i bandwidths available on its uplink and downlink, place a fraction a_i of the reduces on rack i so that the slowest cross-rack transfer, max over racks of max( d_i (1 - a_i) / u_i , a_i (D - d_i) / v_i ) with D the total map output, is minimized
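Minimizing the slowest cross-rack transfer subject to the a_i summing to one is a small linear program. A sketch using SciPy (an assumption, not the paper's solver), with hypothetical rack data:

```python
import numpy as np
from scipy.optimize import linprog

def place_reduces(d, u, v):
    """Choose fractions a_i of reduce tasks per rack so the slowest
    cross-rack transfer is minimized (a linear program):
      minimize z  s.t.  d_i*(1-a_i)/u_i <= z,  a_i*(D-d_i)/v_i <= z,
                        sum(a_i) = 1,  0 <= a_i <= 1.
    d: map output per rack; u, v: available up/downlink bandwidths."""
    d, u, v = map(np.asarray, (d, u, v))
    n, D = len(d), d.sum()
    c = np.r_[np.zeros(n), 1.0]                           # variables: a_1..a_n, z
    A_up = np.c_[np.diag(-d / u), -np.ones((n, 1))]       # -d_i/u_i*a_i - z <= -d_i/u_i
    A_dn = np.c_[np.diag((D - d) / v), -np.ones((n, 1))]  # (D-d_i)/v_i*a_i - z <= 0
    A_ub = np.vstack([A_up, A_dn])
    b_ub = np.r_[-d / u, np.zeros(n)]
    A_eq = np.r_[np.ones(n), 0.0].reshape(1, -1)          # sum(a_i) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, 1)] * n + [(0, None)])
    return res.x[:n]

# Example: rack 0 holds most of the map output but has a thin uplink.
print(place_reduces(d=[80, 10, 10], u=[1, 10, 10], v=[10, 10, 10]))
```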

  13. Why outliers? Persistently slow machines rarely cause outliers Cluster Software (Autopilot) quarantines persistently faulty machines

  14. Why outliers? Problem: About 25% of outliers occur due to more dataToProcess Solution: Ignoring these is better than the state-of-the-art! (duplicating) In an ideal world, we could divide work evenly… We schedule tasks in descending order of dataToProcess Theorem [due to Graham, 1969] Doing so is no more than 33% worse than the optimal
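A sketch of the descending-order schedule that Graham's bound applies to: sort by dataToProcess and greedily assign each task to the currently least-loaded slot. Slot counts and task sizes here are illustrative:

```python
import heapq

def lpt_schedule(data_to_process, num_slots):
    """Greedy schedule: sort tasks by dataToProcess (descending) and
    always assign the next task to the currently least-loaded slot.
    Graham (1969): the makespan is within 4/3 of optimal for identical slots."""
    slots = [(0.0, i, []) for i in range(num_slots)]   # (load, slot id, tasks)
    heapq.heapify(slots)
    for task, size in sorted(data_to_process.items(),
                             key=lambda kv: kv[1], reverse=True):
        load, slot_id, tasks = heapq.heappop(slots)
        tasks.append(task)
        heapq.heappush(slots, (load + size, slot_id, tasks))
    return sorted(slots, key=lambda s: s[1])

# Example: four tasks with skewed input sizes on two slots.
print(lpt_schedule({"t0": 7, "t1": 5, "t2": 4, "t3": 4}, num_slots=2))
```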

  15. Why outliers? Problem: 25% of outliers remain, likely due to contention at the machine. Idea: Restart tasks elsewhere in the cluster Challenge(s) The earlier the better, but restart the outlier or start a pending task? [Timeline: at the current time, the running task has predicted remaining time t_rem; a potential restart would take t_new] Save time and resources: (a) if the predicted time is much better, kill the original and restart elsewhere; (b) else, if other tasks are pending, duplicate iff it saves both time and resources; (c) else (no pending work), duplicate iff the expected savings are high. Continuously observe and kill wasteful copies
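A heavily simplified sketch of this decision logic; the thresholds (the factors of 3 and 2 below) are illustrative placeholders, not Mantri's actual predictors or constants:

```python
def outlier_action(t_rem, t_new, pending_tasks):
    """Simplified sketch of the restart-vs-duplicate choice above.
    t_rem: predicted remaining time of the running task,
    t_new: predicted runtime of a fresh copy elsewhere."""
    if t_new * 3 < t_rem:
        return "kill and restart elsewhere"   # restart is clearly much better
    if pending_tasks and 2 * t_new < t_rem:
        return "duplicate"    # saves both time and the resources of the extra copy
    if not pending_tasks and t_new < t_rem:
        return "duplicate"    # resources are idle; duplicate on expected savings
    return "leave running"    # later, kill whichever copy turns out to be wasteful

# Example: a task predicted to need 600s more vs. 100s on a fresh machine.
print(outlier_action(t_rem=600, t_new=100, pending_tasks=True))
```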

  16. Summary (a) preferentially replicate costly-to-recompute tasks (b) each job locally avoids network hot-spots (c) quarantine persistently faulty machines (d) schedule in descending order of data size (e) restart or duplicate tasks, cognizant of resource cost, and prune wasteful copies. Theme: cause-, resource-aware action. Explicit attempt to decouple solutions, partial success

  17. Results Deployed in production Cosmos clusters: prototype Jan ’10, baking on pre-production clusters, release May ’10. Trace-driven simulations: thousands of jobs; mimic workflow, task runtime, data skew, failure probability; compare with existing schemes and idealized oracles

  18. In production, restarts… improve on native cosmos by 25% while using fewer resources

  19. Comparing jobs in the wild: 340 jobs that each repeated at least five times during May 25-28 (release) vs. Apr 1-30 (pre-release) [Two CDF plots over % cluster resources]

  20. In trace-replay simulations, restarts are much better dealt with in a cause-, resource-aware manner [Two CDF plots over % cluster resources]

  21. Protecting against recomputes [CDF plot over % cluster resources]

  22. Outliers in map-reduce clusters • are a significant problem • happen due to many causes • interplay between storage, network and map-reduce • cause-, resource-aware mitigation improves on prior art

  23. Back-up

  24. Network-aware Placement
