
Using Map-reduce to Support MPMD


Presentation Transcript


  1. Using Map-reduce to Support MPMD Peng (chenpeng@umail.iu.edu) Yuan (yuangao@umail.iu.edu)

  2. Our Motivation
  • The default job scheduler in Hadoop keeps a first-in-first-out queue of jobs for each priority level. The scheduler always assigns task slots to the first job in the highest-priority queue that is in need of tasks.
  • Problems:
  • It is difficult to share a MapReduce cluster between users (Multi-tasks).
  • It is difficult to implement a composite task having more than one job with inter-dependencies.
  • This is a strong motivation to improve the Hadoop framework to:
  • support Multi-tasks
  • support Composite-tasks

  3. Multi-tasks Problem
  • One solution is to create separate MapReduce clusters for different user groups with Hadoop On-Demand, but this hurts system utilization because a group's cluster may sit mostly idle for long periods of time.
  • More advanced solutions:
  • Facebook's Fair Scheduler
  • Yahoo's Capacity Scheduler

  4. Facebook Fair Scheduler
  • Jobs are placed into named "pools".
  • Each pool can have a "guaranteed capacity", specified through a config file as a minimum number of map slots and reduce slots to allocate to the pool. When the pool has pending jobs, it gets at least this many slots; when it has no jobs, the slots can be used by other pools.
  • Excess capacity beyond the pools' minimums is allocated between jobs using fair sharing.
  • Fair sharing splits compute time proportionally between the submitted jobs, emulating an "ideal" scheduler that gives each of N jobs 1/N of the available capacity.
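The allocation rule above (guaranteed minimums first, then even sharing of the surplus among pools that still have demand) can be sketched roughly as follows; the pool names, slot counts, and dictionary format are illustrative, not from the deck:

```python
def fair_share(pools, total_slots):
    """Give each pool its guaranteed minimum (capped by its demand),
    then split the remaining slots among pools that still want more."""
    alloc = {}
    for name, p in pools.items():
        alloc[name] = min(p["min"], p["demand"])  # guaranteed capacity
    remaining = total_slots - sum(alloc.values())
    active = [n for n in pools if pools[n]["demand"] > alloc[n]]
    while remaining > 0 and active:
        share = max(remaining // len(active), 1)  # even split per round
        for n in list(active):
            give = min(share, pools[n]["demand"] - alloc[n], remaining)
            alloc[n] += give
            remaining -= give
            if alloc[n] == pools[n]["demand"]:
                active.remove(n)  # demand satisfied, drop from sharing
    return alloc
```

Note that an idle pool's guaranteed slots are reclaimed here simply because its demand is zero, which matches the slide's "if it has no jobs, the slots can be used by other pools".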

  5. Yahoo Capacity Scheduler
  • Defines a number of named queues, each with a configurable number of map and reduce slots.
  • The scheduler gives each queue its capacity when it contains jobs, and shares any unused capacity between the queues. Within each queue, FIFO scheduling with priorities is used, with one exception: a limit can be placed on the percentage of running tasks per user, so that users share the cluster equally.
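The per-user limit within a FIFO queue can be sketched like this (a simplified illustration, assuming jobs arrive already sorted in FIFO/priority order; the field names are invented for the sketch):

```python
def pick_tasks(queue_jobs, queue_slots, user_limit_pct):
    """FIFO within one queue, but cap the share of running tasks
    any single user may hold (the per-user percent limit above)."""
    per_user_cap = max(1, queue_slots * user_limit_pct // 100)
    running = {}      # user -> tasks assigned so far
    scheduled = []
    for job in queue_jobs:  # already in FIFO/priority order
        user = job["user"]
        while (job["pending"] > 0
               and len(scheduled) < queue_slots
               and running.get(user, 0) < per_user_cap):
            scheduled.append(job["name"])
            running[user] = running.get(user, 0) + 1
            job["pending"] -= 1
    return scheduled
```

With a 50% limit and two users, the first job in the queue can no longer monopolize all slots: each user is held to half of the queue's capacity.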

  6. There is still a Problem!
  • Both Yahoo's and Facebook's schedulers assign dedicated map and reduce slots to tasks, so they are not in compliance with "moving computation to data".
  • Our solution:
  • Turn Hadoop into an MPMD framework (computation resource sharing):
  • Different users can submit multiple tasks, which are assigned to different mappers/reducers and run simultaneously.
  • Load balancing is achieved by keeping the computing nodes busy with tasks.

  7. Using the traditional Map-reduce to support MPMD
  [Diagram: input data sets Data 1 … Data n flow into RunnerMap tasks; each RunnerMap looks up the MapProcedure code registered for its data, and RunnerReduce tasks apply the corresponding ReduceProcedure to produce Output 1 … Output n.]
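The dispatch step the slide describes — a generic RunnerMap that looks up which user-supplied procedure to run on each input — might look roughly like this. This is a minimal Python sketch of the idea (the deck's actual framework is Java on Hadoop, and the registry format is an assumption):

```python
class MapProcedure:
    """Abstract base: user code plugs in by overriding map()."""
    def map(self, key, value):
        raise NotImplementedError

class WordCountMapProcedure(MapProcedure):
    """Illustrative user procedure: emit (word, 1) pairs."""
    def map(self, key, value):
        return [(word, 1) for word in value.split()]

# Registry pairing each input with the procedure to run on it,
# so different tasks can share the same pool of runner slots.
REGISTRY = {"wordcount_input_1.txt": WordCountMapProcedure}

def runner_map(input_name, key, value):
    proc = REGISTRY[input_name]()   # look up the code for this data
    return proc.map(key, value)
```

The point of the indirection is that the runners themselves are task-agnostic: the same RunnerMap slot can execute WordCount on one record and Blast on the next, which is what keeps nodes busy regardless of which user's task the data belongs to.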

  8. Running WordCount and Hadoop Blast using the extended framework
  [Diagram: inputs are bound to procedures, e.g. blast_input_1.fa:edu.indiana.cs.b649.BlastMapProcedure and wordcount_input_1.txt:edu.indiana.cs.b649.WordCountMapProcedure; WordCountMapProcedure and BlastMapProcedure extend the abstract class MapProcedure, WordCountReduceProcedure extends the abstract class ReduceProcedure, and RunnerMap/RunnerReduce drive them.]
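The "input:fully.qualified.Class" bindings on this slide suggest a reflection-style lookup. A minimal sketch of resolving such a binding, in Python for illustration (a Java framework would use Class.forName; the module used in the example below is just a stand-in, not part of the deck):

```python
import importlib

def load_procedure(binding):
    """Resolve an 'input:fully.qualified.Class' binding, like
    'blast_input_1.fa:edu.indiana.cs.b649.BlastMapProcedure',
    to an (input path, class object) pair."""
    path, fqcn = binding.rsplit(":", 1)      # split off the class name
    module, cls = fqcn.rsplit(".", 1)        # split package from class
    return path, getattr(importlib.import_module(module), cls)
```

Because the procedure class is named in data rather than compiled into the job, new task types can be added without changing the runner code at all.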

  9. Composite task problem
  • To support a composite task having more than one job with inter-dependencies.

  10. Support Composite-tasks
  [Diagram: a composite task with input and output directories; blast_input_0.fa (File 1) and blast_input_1.fa (File 2) each go through a Map stage, empty_file (File 3) is mapped to produce empty_file.out as a dependency marker, and a Reduce stage writes part-r-00000 (File 4).]
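One simple way to realize the inter-job dependency sketched above is to launch a job only after every job it depends on has completed. This is a hypothetical sketch of that ordering logic, not necessarily the deck's implementation:

```python
def run_composite(jobs, deps, run):
    """Run jobs respecting dependencies: a job starts only after all
    of its prerequisites finish (deps maps job -> prerequisite jobs)."""
    done, order = set(), []

    def launch(job):
        if job in done:
            return
        for pre in deps.get(job, []):
            launch(pre)           # finish prerequisites first
        run(job)                  # e.g. submit the MapReduce job and wait
        done.add(job)
        order.append(job)

    for job in jobs:
        launch(job)
    return order
```

In the slide's example, the empty marker file plays the role of the dependency edge: the downstream job's input only materializes (empty_file.out) once the upstream jobs are done.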

  11. Demo
  • Running Hadoop Blast + Advanced WordCount
  • Single-node mode: 2 mappers + 2 reducers
  • Input files:
  • blast_input_0.fa
  • blast_input_1.fa
  • wordcount_input_0.txt
  • wordcount_input_1.txt
  • empty_file
  • Output files:
  • blast_input_0.fa.out
  • blast_input_1.fa.out
  • empty_file.out

  12. Performance Test
  • Task execution time (ms) = job launching time + job execution time

  13. Roles of team members
  • Peng
  • Implemented the framework to support Multi-tasks
  • Yuan
  • Improved the framework to support Composite-tasks

  14. Q&A • Thanks!
