
Reduce Task Suspension for Priority Scheduling in Hadoop


Presentation Transcript


  1. Reduce Task Suspension for Priority Scheduling in Hadoop Brian Cho and Philbert Lin

  2. Goal Suspend Hadoop reduce tasks

  3. Motivation
  [Figure: cost ($) of a production cluster vs. cluster load, plotting the cost of under-utilization (e.g. server costs) against the cost of missed deadlines; regions annotated as (safe region), (missed deadlines), and (ideal).]

  4. Example and Design Goals
  [Figure: production-cluster reduce trace [1], showing # reduce slots over time; the research job runs on unused slots.]
  Current approaches kill tasks, so we lose all our work. Instead:
  • Preempt tasks by suspending them (not killing them)
  • Production jobs get resources quickly
  • Research jobs don't lose work
  • Focus on reduce rather than map: reduce tasks take longer, so there is more work to lose (median map 19 s vs. reduce 231 s) [2]
  [1] Yahoo: private correspondence
  [2] Facebook: Zaharia et al., "Delay scheduling: a simple technique for achieving locality and fairness in cluster scheduling," EuroSys 2010

  5. MapReduce 2.0 (YARN) Overview
  RM = Resource Manager (contains the Scheduler and Application Manager), NM = Node Manager, AM = Application Master
  [Figure: one RM coordinating several NMs; each NM hosts task containers (Container1,1, Container1,2, Container2,1, ...) and per-application masters (AM1, AM2).]
  Based on: http://www.cloudera.com/blog/2012/02/mapreduce-2-0-in-hadoop-0-23/

  6. Reduce Stages
  [Figure: a reduce task's pipeline: copy sorted map outputs, merge them, then run reduce and write the result to outdir/part-00000.]
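The copy and merge stages above are handled by the Hadoop framework; user code runs only in the reduce stage, which receives each key together with its merged, sorted values and writes to the task's part-* file in the output directory. For reference, a minimal reducer against the standard Hadoop 2.x API (ordinary MapReduce code, not part of the suspension mechanism itself):

```java
import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

// Copy and merge are done by the framework; only this reduce stage is user code.
public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();                          // values arrive already copied and merged
    }
    context.write(key, new IntWritable(sum));  // ends up in the task's part-* file under outdir
  }
}
```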

  7. Suspend Lifecycle
  [Figure: the YARN cluster from slide 5. When production job 3 arrives, the scheduler suspends the research job's (job 2's) tasks, Container2,2 and Container2,3, to make room for AM3 and Container3,1; once resources free up again, job 2's tasks are resumed as Container2,2R and Container2,3R.]

  8. Suspending and Resuming Tasks
  [Figure: Suspend Task, Attempt 0: during copy / merge / reduce, the task logs the merged file location and its reduce progress, writes its partial output to outdir/part-00000-00, and exits. Resume Task, Attempt 1: the new attempt parses the suspended attempt's log, skips copy and merge, and continues the reduce, writing to outdir/part-00000-01.]

  9. Suspending and Resuming Tasks (same diagram as slide 8)
  • Suspend/resume takes advantage of existing intermediate data
  • File locations and progress indicators are written to local logs
  • Fast!
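As a rough illustration of the bookkeeping on slides 8 and 9, the sketch below writes the merged file location and reduce progress to a local log on suspend, and parses that log on resume so the next attempt can skip copy and merge. The class and field names (SuspendLog, mergedFilePath, reduceProgress) and the log format are assumptions for illustration, not the authors' actual implementation:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

/** Hypothetical suspend/resume bookkeeping; names and format are illustrative only. */
public class SuspendLog {
  public String mergedFilePath;   // location of the already-merged intermediate data
  public long reduceProgress;     // e.g. records already consumed by reduce()

  /** On suspend: persist the merge location and reduce progress to a local log file. */
  public void writeOnSuspend(Path logFile) throws IOException {
    Files.write(logFile, Arrays.asList(
        "merged=" + mergedFilePath,
        "progress=" + reduceProgress));
  }

  /** On resume: the next attempt parses the log, skips copy/merge, and continues the reduce. */
  public static SuspendLog parse(Path logFile) throws IOException {
    SuspendLog log = new SuspendLog();
    for (String line : Files.readAllLines(logFile)) {
      if (line.startsWith("merged=")) {
        log.mergedFilePath = line.substring("merged=".length());
      } else if (line.startsWith("progress=")) {
        log.reduceProgress = Long.parseLong(line.substring("progress=".length()));
      }
    }
    return log;
  }
}
```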

  10. Current Status and Limitations
  • Implemented suspension for the reduce stage
    • The most impactful stage; other stages will use a similar approach
  • If state is saved across reduce calls, it will be lost
  • Suspend/resume is currently driven through the job client
    • These functions can be used as-is within a scheduler

  11. Future Work: Preemptive Priority Scheduler Design
  • Suspend workflow
    • RM asks AM for containers within a deadline T
    • AM decides: should I kill or suspend this reduce task? Suspend if T_suspend < T_left to deadline
  • Resume workflow
    • RM tells AM that containers are available, either local or remote to the suspended task
    • AM decides: should I restart or resume this suspended task? Resume if
      • (local container) T_waiting for local container + T_resume < T_catch up on a new reduce
      • (remote container) T_migrating intermediate data + T_resume < T_catch up on a new reduce
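The two decision rules above can be read as simple threshold checks. The sketch below encodes them directly; the method and parameter names, and the assumption that these times are pre-estimated by the AM, are illustrative, since the slide gives only the inequalities:

```java
// Illustrative sketch of the suspend-or-kill and resume-or-restart decisions from slide 11.
public class PreemptionPolicy {
  /** Suspend workflow: suspend only if suspension still fits within the production deadline. */
  static boolean shouldSuspend(long tSuspend, long tLeftToDeadline) {
    return tSuspend < tLeftToDeadline;          // otherwise kill the reduce task
  }

  /** Resume workflow: resume only if it beats starting the reduce over from scratch. */
  static boolean shouldResume(boolean containerIsLocal,
                              long tWaitForLocalContainer,
                              long tMigrateIntermediateData,
                              long tResume,
                              long tCatchUpOnNewReduce) {
    long overhead = containerIsLocal ? tWaitForLocalContainer : tMigrateIntermediateData;
    return overhead + tResume < tCatchUpOnNewReduce;  // otherwise restart the task
  }
}
```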

  12. Big Picture
  • Design of a preemptive priority scheduler
    • Time-sensitive high-priority jobs can scale up instantly
    • Long-running low-priority jobs can scale down gracefully, without losing work
  • Design of estimated output over partial data
    • Jobs can output estimates when time is short
    • Naïve implementation: suspend() and use the partial data output up to that point
    • More rigorous estimates: a custom, programmer-provided estimate() function
  • … all combined into the same framework?
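One plausible shape for the programmer-provided estimate() hook is a small interface that takes the partial output produced before suspension plus the fraction of input actually processed, and returns an estimated full result. The interface name and signature below are assumptions for illustration; the slides only state that such a function would refine the naive use-partial-output approach:

```java
/** Hypothetical hook for turning partial reduce output into an estimated full result. */
public interface PartialResultEstimator<K, V> {
  /**
   * Given the partial value produced before suspension and the fraction of input
   * processed so far, return an estimate of the value over the full input.
   */
  V estimate(K key, V partialValue, double fractionProcessed);
}

/** Example: scale a partial sum up to an estimated total. */
class ScaledSumEstimator implements PartialResultEstimator<String, Long> {
  @Override
  public Long estimate(String key, Long partialSum, double fractionProcessed) {
    // Avoid dividing by zero if no input has been processed yet.
    return fractionProcessed > 0 ? Math.round(partialSum / fractionProcessed) : partialSum;
  }
}
```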

  13. Conclusions
  • Implemented a low-overhead mechanism to suspend/resume reduce tasks
  • Supports the claim that suspending tasks can be an essential tool for a responsive priority scheduler
  • Future work:
    • Implement suspend/resume for all stages of the reduce task
    • Add a user-defined function to save cross-reduce state
    • Design a scheduler that incorporates suspension
    • Experimental evaluation
