A Dynamic MapReduce Scheduler for Heterogeneous Workloads
Chao Tian, Haojie Zhou, Yongqiang He, Li Zha

Presenter: 董耀文 (first-year master's student, Computer Science and Information Engineering, Class A)





Outline

  • Background

  • Question?

  • So!

  • Related work

  • MapReduce procedure analysis

  • MR-Predict

  • Scheduling policies

  • Evaluation

  • Conclusion



Background

  • As the Internet keeps growing, many Internet service providers need to process enormous amounts of data.

  • The MapReduce framework is becoming a leading solution. It is designed for large clusters of thousands of nodes built from commodity hardware.



  • The performance of a parallel system such as MapReduce is closely tied to its task scheduler.

  • The current Hadoop scheduler uses a single queue and schedules jobs first come, first served (FCFS).

  • Yahoo's capacity scheduler and Facebook's fair scheduler use multiple queues to allocate different resources in the cluster.



  • In practice, different kinds of jobs often run simultaneously in the data center. These jobs place different workloads on the cluster, including I/O-bound and CPU-bound workloads.



  • Hadoop's scheduler is not aware of workload characteristics; it prefers to run map tasks from the job at the head of the queue simultaneously.

  • Because tasks from the same job always have the same characteristics, this may reduce the throughput of the whole system, which seriously affects the productivity of the data center.
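
The effect described in these two points can be illustrated with a minimal Python sketch. It is not Hadoop's scheduler code: the `Job` class, its fields, and `fcfs_assign` are hypothetical names, and the sketch assumes a single FCFS queue that fills free slots from the job at the head of the queue before looking at the next one.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    name: str            # hypothetical job label
    kind: str            # "cpu" or "io": the character shared by all tasks of the job
    pending_tasks: int   # map tasks that still need a slot


def fcfs_assign(queue, free_slots):
    """Fill free slots in FCFS order: drain the head job before the next one."""
    running = []
    for job in queue:
        while job.pending_tasks > 0 and len(running) < free_slots:
            job.pending_tasks -= 1
            running.append((job.name, job.kind))
        if len(running) == free_slots:
            break
    return running


queue = deque([Job("sort", "io", 20), Job("grep", "cpu", 20)])
running = fcfs_assign(queue, free_slots=8)
kinds = {kind for _, kind in running}
print(kinds)  # only one kind of task occupies all eight slots
```

With these made-up jobs, every slot goes to the I/O-bound job at the head of the queue, so the CPU sits mostly idle while the disks are saturated.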



Question?

  • How can the hardware utilization rate be improved when different kinds of workloads run on clusters in the MapReduce framework?



So!

  • They design a new triple-queue scheduler that consists of a workload prediction mechanism, MR-Predict, and three queues (a CPU-bound queue, an I/O-bound queue, and a wait queue).

  • They classify MapReduce workloads into three types, and the workload prediction mechanism automatically predicts the class of a newly arrived job based on this classification.

  • Jobs in the CPU-bound queue and the I/O-bound queue are assigned separately, so that different types of workloads run in parallel.

  • Their experiments show that the approach can increase system throughput by up to 30%.
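
The three-queue structure can be sketched in Python as follows. This is a toy model under stated assumptions, not the authors' implementation: `TripleQueueScheduler` and its methods are hypothetical names, `predict` is a stand-in for MR-Predict whose actual rule is not given in this transcript, and the sketch covers only the two classes that map to the two run queues.

```python
from collections import deque


class TripleQueueScheduler:
    """Toy model: a wait queue plus a CPU-bound and an I/O-bound queue."""

    def __init__(self, predict):
        # `predict` is a hypothetical stand-in for MR-Predict: it maps a job
        # to "cpu" or "io". The real prediction rule is not modeled here.
        self.predict = predict
        self.wait = deque()
        self.queues = {"cpu": deque(), "io": deque()}

    def submit(self, job):
        # New jobs start in the wait queue until their class is predicted.
        self.wait.append(job)

    def classify_waiting(self):
        # Move every waiting job into the queue matching its predicted class.
        while self.wait:
            job = self.wait.popleft()
            self.queues[self.predict(job)].append(job)

    def next_jobs(self):
        # Take the head of each non-empty queue, so a CPU-bound job and an
        # I/O-bound job can run side by side.
        return [q[0] for q in self.queues.values() if q]


scheduler = TripleQueueScheduler(predict=lambda job: job["kind"])
for job in [
    {"name": "sort", "kind": "io"},
    {"name": "grep", "kind": "cpu"},
    {"name": "sort2", "kind": "io"},
]:
    scheduler.submit(job)
scheduler.classify_waiting()
picked = [job["name"] for job in scheduler.next_jobs()]
print(picked)  # ['grep', 'sort']
```

Picking one job from each queue is the simplest way to pair the two workload types; how slots are actually split between the queues is left open here.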

Related work

  • Scheduling algorithms in parallel systems [11,…]

  • Applications have different workloads

    • large computation and I/O requirements [10].

  • How I/O-bound jobs affect system performance [6].

  • A gang scheduling algorithm that runs CPU-bound and I/O-bound jobs in parallel to increase hardware utilization [7].

  • The scheduling problem in MapReduce has attracted much attention [2,10].

  • Yahoo and Facebook designed the capacity scheduler [4] and the fair scheduler [5], respectively, for Hadoop.

MapReduce procedure analysis

  • Map-shuffle phase

    • Initialize input data

    • Compute map task

    • Store output results to local disk

    • Shuffle map task result data out

    • Shuffle reduce input data in

  • Reduce-Compute phase

    • Tasks run the application logic

MR-Predict


Scheduling policies

Evaluation


  • Environment

    • 6 nodes connected by Gigabit Ethernet.

    • Dell 1950

      • CPU: 2 × quad-core 2.0 GHz

      • Memory: 4 GB

      • Disk: 2 SATA disks

    • Input data: 15 GB

    • Map slots and reduce slots: 8

    • DIOR: 31.2 MB/s (without reduce phase in Hadoop)



  • Resource utilization


Total order sort (sequential I/O) benchmark

8 × (64 MB + 64 MB) / 8 >= 31.2 MB/s
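
The arithmetic in this inequality can be checked with a short Python sketch. It assumes the left-hand side reads as the number of concurrent map tasks, times the data each task moves, divided by the task duration, and that the result is compared with DIOR (read here as the measured disk I/O rate). The function and parameter names are hypothetical, and the 8 in the denominator is assumed to be seconds.

```python
def aggregate_io_rate(concurrent_tasks, mb_per_task, task_seconds):
    """Aggregate I/O rate in MB/s for tasks that run at the same time."""
    return concurrent_tasks * mb_per_task / task_seconds


DIOR_MB_S = 31.2  # value listed under Environment

# Total order sort: 8 tasks, 64 MB + 64 MB each, 8 (assumed seconds) per task.
sort_rate = aggregate_io_rate(8, 64 + 64, 8)
print(sort_rate, sort_rate >= DIOR_MB_S)  # 128.0 True
```

Under these assumptions the sort benchmark demands about four times the listed DIOR, which is consistent with it being described as a sequential I/O benchmark.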



  • Resource utilization


The benchmark uses [.]* as the regular expression.

8 × (64 MB + 1 MB + 1 MB + SID) / 92 >= 31.2 MB/s



  • Resource utilization


The benchmark splits the input text into words, shuffles every word in the map phase, and counts its occurrences in the reduce phase.

8 × (64 MB + 64 MB + 64 MB + SID) / 35 >= 31.2 MB/s
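
Because SID is left symbolic in the last two inequalities, it can help to solve each one for the smallest SID that satisfies it. The Python sketch below does that. `min_sid_mb` is a hypothetical helper, the denominators are assumed to be seconds, and the results are plain arithmetic on the numbers shown here, not measurements.

```python
def min_sid_mb(concurrent_tasks, known_mb, task_seconds, dior_mb_s=31.2):
    """Smallest SID (MB) so that tasks * (known_mb + SID) / seconds >= DIOR."""
    needed_mb = dior_mb_s * task_seconds / concurrent_tasks
    return max(0.0, needed_mb - known_mb)


# Regular-expression benchmark: 8 * (64 + 1 + 1 + SID) / 92
regex_min = min_sid_mb(8, 64 + 1 + 1, 92)
# Word-splitting benchmark: 8 * (64 + 64 + 64 + SID) / 35
words_min = min_sid_mb(8, 64 + 64 + 64, 35)

print(round(regex_min, 1), words_min)  # 292.8 0.0
```

Under these assumptions, the word-splitting inequality holds for any non-negative SID, while the regular-expression inequality needs an SID of roughly 292.8 MB per task.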



  • Triple-queue scheduler experiments

    • Every job runs five times, for a total of 15 job runs



Conclusion

  • The scheduler correctly distributes jobs into the different queues in most situations.

  • The triple-queue scheduler could

    • increase map task throughput by 30%

    • reduce the makespan by 20%


Thank you for listening.
