A Dynamic MapReduce Scheduler for Heterogeneous Workloads
Chao Tian, Haojie Zhou, Yongqiang He, Li Zha
Presenter: 董耀文, first-year master's student, Computer Science and Information Engineering, Class A

Outline
  • Background
  • Question?
  • So!
  • Related work
  • MapReduce procedure analysis
  • MR-Predict
  • Scheduling policies
  • Evaluation
  • Conclusion
Background
  • As the Internet keeps growing, many Internet service providers need to process enormous amounts of data.
  • The MapReduce framework has become a leading solution; it is designed for building large clusters of thousands of nodes from commodity hardware.
  • The performance of a parallel system such as MapReduce is closely tied to its task scheduler.
  • The default scheduler in Hadoop uses a single queue and schedules jobs first-come, first-served (FCFS).
  • Yahoo's Capacity Scheduler and Facebook's Fair Scheduler use multiple queues to divide the cluster's resources.
  • In practice, different kinds of jobs often run simultaneously in a data center. These jobs place different workloads on the cluster, including I/O-bound and CPU-bound workloads.
  • Hadoop's scheduler is unaware of these workload characteristics and prefers to run map tasks from the same job, the one at the head of the queue, simultaneously.
  • Because tasks from the same job always have the same character, running them together makes them compete for the same resource (CPU or disk). This can reduce the throughput of the whole system and seriously hurt the productivity of the data center.
Question?
  • How can hardware utilization be improved when different kinds of workloads run on a cluster in the MapReduce framework?
So!
  • The authors design a new triple-queue scheduler consisting of a workload prediction mechanism, MR-Predict, and three queues: a CPU-bound queue, an I/O-bound queue, and a wait queue.
  • They classify MapReduce workloads into three types, and MR-Predict automatically predicts the class of each newly submitted job based on this classification.
  • Jobs in the CPU-bound queue and the I/O-bound queue are dispatched separately so that different types of workloads run in parallel.
  • Their experiments show that this approach can increase system throughput by up to 30%.
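The transcript does not include the scheduling-policy details, so the following is only a minimal Python sketch of the triple-queue idea under assumed rules: new jobs sit in the wait queue until a `classify` hook (hypothetical, standing in for MR-Predict) labels them, and a node's free map slots are filled by alternating between the CPU-bound and I/O-bound queues so that both kinds of tasks share the node. All class, method, and field names are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass(eq=False)  # compare jobs by identity, not by field values
class Job:
    name: str
    pending_tasks: int  # map tasks not yet launched


class TripleQueueScheduler:
    """Toy model: a wait queue plus CPU-bound and I/O-bound queues."""

    def __init__(self) -> None:
        self.wait_q: deque[Job] = deque()  # new, not yet classified
        self.cpu_q: deque[Job] = deque()
        self.io_q: deque[Job] = deque()

    def submit(self, job: Job) -> None:
        # Assumption: new jobs wait here until they are classified.
        self.wait_q.append(job)

    def classify(self, job: Job, io_bound: bool) -> None:
        # Hypothetical hook standing in for MR-Predict's decision.
        self.wait_q.remove(job)
        (self.io_q if io_bound else self.cpu_q).append(job)

    @staticmethod
    def _take_task(q: deque[Job]) -> str | None:
        # Launch one map task from the job at the head of the queue.
        if not q:
            return None
        job = q[0]
        job.pending_tasks -= 1
        if job.pending_tasks == 0:
            q.popleft()
        return job.name

    def assign(self, free_slots: int) -> list[str]:
        """Fill a node's free map slots, alternating between the CPU-bound
        and I/O-bound queues (assumed policy) so both kinds of work overlap."""
        launched: list[str] = []
        queues = (self.cpu_q, self.io_q)
        turn = 0
        while len(launched) < free_slots and (self.cpu_q or self.io_q):
            name = self._take_task(queues[turn % 2])
            if name is not None:
                launched.append(name)
            turn += 1
        return launched


sched = TripleQueueScheduler()
job_a, job_b = Job("job_a", 3), Job("job_b", 3)
sched.submit(job_a)
sched.submit(job_b)
sched.classify(job_a, io_bound=True)
sched.classify(job_b, io_bound=False)
print(sched.assign(4))  # ['job_b', 'job_a', 'job_b', 'job_a']
```

Alternating is just one simple way to overlap CPU and disk demand on a node; the authors' actual policy may weigh the queues differently.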
Related work
  • Scheduling algorithms in parallel systems [11,…]
  • Applications have different workloads
    • Large computation and I/O requirements [10]
  • How I/O-bound jobs affect system performance [6]
  • A gang scheduling algorithm that runs CPU-bound and I/O-bound jobs in parallel to increase hardware utilization [7]
Related work
  • Scheduling in MapReduce has attracted much attention [2,10].
  • Yahoo and Facebook designed Hadoop schedulers: the Capacity Scheduler [4] and the Fair Scheduler [5], respectively.
MapReduce procedure analysis
  • Map-shuffle phase
    • Initialize input data
    • Compute the map task
    • Store output results to local disk
    • Shuffle map task output data out
    • Shuffle reduce input data in
MapReduce procedure analysis
  • Reduce-Compute phase
    • Tasks run the application logic
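As a rough illustration of the two phases above, here is a toy, single-process WordCount in Python whose functions are labeled with the steps from the slides. It is not Hadoop's API: the function names are hypothetical, an in-memory list stands in for the local disk, and CRC32-based partitioning is an arbitrary choice made for this sketch.

```python
from __future__ import annotations

import zlib
from collections import defaultdict

Pair = tuple[str, int]


def map_task(split: str) -> list[Pair]:
    # Compute the map task: emit (word, 1) for every word in the split.
    return [(word, 1) for word in split.split()]


def store_local(pairs: list[Pair], n_reducers: int) -> list[list[Pair]]:
    # Store output to "local disk": one bucket per reducer, chosen by a
    # deterministic hash (CRC32 is an arbitrary choice for this sketch).
    buckets: list[list[Pair]] = [[] for _ in range(n_reducers)]
    for key, value in pairs:
        buckets[zlib.crc32(key.encode()) % n_reducers].append((key, value))
    return buckets


def shuffle(map_outputs: list[list[list[Pair]]], n_reducers: int) -> list[dict[str, list[int]]]:
    # Shuffle map output out / reduce input in: reducer r gathers
    # bucket r from every map task, grouping values by key.
    reduce_inputs = [defaultdict(list) for _ in range(n_reducers)]
    for buckets in map_outputs:
        for r, bucket in enumerate(buckets):
            for key, value in bucket:
                reduce_inputs[r][key].append(value)
    return reduce_inputs


def reduce_task(grouped: dict[str, list[int]]) -> dict[str, int]:
    # Reduce-compute phase: run the application logic (sum the counts).
    return {key: sum(values) for key, values in grouped.items()}


def run_job(splits: list[str], n_reducers: int = 2) -> dict[str, int]:
    # Initialize input data: each string in `splits` stands for one input split.
    map_outputs = [store_local(map_task(s), n_reducers) for s in splits]
    result: dict[str, int] = {}
    for grouped in shuffle(map_outputs, n_reducers):
        result.update(reduce_task(grouped))
    return result


print(run_job(["a b a", "b c"]))  # a dict equal to {'a': 2, 'b': 2, 'c': 1}
```

Because every key is routed to exactly one reducer, merging the reducers' outputs with `dict.update` cannot overwrite a partial count.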
Evaluation
  • Environment
    • 6 nodes connected by Gigabit Ethernet
    • Dell 1950
      • CPU: 2 × quad-core 2.0 GHz
      • Memory: 4 GB
      • Disk: 2 SATA disks
    • Input data: 15 GB
    • Map slots and reduce slots: 8
    • DIOR: 31.2 MB/s (measured without the reduce phase in Hadoop)
  • Resource utilization
    • Total order sort (sequential I/O) benchmark:
      $$\frac{8\,(64\,\text{MB} + 64\,\text{MB})}{8} \geq 31.2\,\text{MB/s}$$
    • Grep benchmark, using [.]* as the regular expression:
      $$\frac{8\,(64\,\text{MB} + 1\,\text{MB} + 1\,\text{MB} + \text{SID})}{92} \geq 31.2\,\text{MB/s}$$
    • WordCount benchmark, which splits the input text into words, shuffles every word in the map phase, and counts its occurrences in the reduce phase:
      $$\frac{8\,(64\,\text{MB} + 64\,\text{MB} + 64\,\text{MB} + \text{SID})}{35} \geq 31.2\,\text{MB/s}$$

  • Triple-queue scheduler experiments
    • Each job runs five times, for a total of 15 jobs
  • The scheduler assigns jobs to the correct queues in most cases.
  • The triple-queue scheduler can
    • increase map task throughput by 30%
    • reduce the makespan by 20%
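The resource-utilization inequalities can be evaluated numerically. This Python sketch assumes that the leading factor 8 is the number of concurrent map slots per node (the environment lists 8), that the divisor is the map task completion time in seconds, and that the bracketed terms are per-task data volumes in MB; these readings and every name in the code are assumptions, not taken from the slides. Where SID appears it is not given, so it defaults to 0 and those rates are lower bounds.

```python
from __future__ import annotations

DIOR_MB_S = 31.2  # DIOR from the environment slide
MAP_SLOTS = 8     # assumed meaning of the leading factor 8


def aggregate_io_rate(volumes_mb: list[float], task_time_s: float,
                      sid_mb: float = 0.0, n_slots: int = MAP_SLOTS) -> float:
    """MB/s moved by n_slots concurrent map tasks, each handling
    sum(volumes_mb) + sid_mb megabytes in task_time_s seconds."""
    return n_slots * (sum(volumes_mb) + sid_mb) / task_time_s


# Per-task volumes (MB) and divisors as printed on the slides; SID omitted.
BENCHMARKS = {
    "total order sort": ([64, 64], 8),
    "grep": ([64, 1, 1], 92),
    "wordcount": ([64, 64, 64], 35),
}

for name, (volumes, seconds) in BENCHMARKS.items():
    rate = aggregate_io_rate(volumes, seconds)
    print(f"{name}: {rate:.1f} MB/s, >= DIOR: {rate >= DIOR_MB_S}")
# total order sort: 128.0 MB/s, >= DIOR: True
# grep: 5.7 MB/s, >= DIOR: False
# wordcount: 43.9 MB/s, >= DIOR: True
```

A nonzero SID only raises the grep and WordCount rates, so the printed values are the smallest each left-hand side can be under these readings.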