
A Dynamic MapReduce Scheduler for Heterogeneous Workloads
Chao Tian, Haojie Zhou, Yongqiang He, Li Zha

Presenter: 董耀文 (first-year M.S. student, Computer Science and Information Engineering)



Outline

  • Background

  • Question?

  • So!

  • Related work

  • MapReduce procedure analysis

  • MR-Predict

  • Scheduling policies

  • Evaluation

  • Conclusion



Background

  • As the Internet keeps growing, Internet service providers need to process enormous amounts of data.

  • The MapReduce framework has become a leading solution; it is designed for large clusters of thousands of nodes built from commodity hardware.



Background

  • The performance of a parallel system such as MapReduce is closely tied to its task scheduler.

  • The current scheduler in Hadoop uses a single queue and schedules jobs in FCFS order.

  • Yahoo's capacity scheduler and Facebook's fair scheduler use multiple queues to allocate different resources in the cluster.



Background

  • In practice, different kinds of jobs often run simultaneously in the data center. These jobs place different workloads on the cluster, including I/O-bound and CPU-bound workloads.



Background

  • Hadoop's scheduler is not aware of these workload characteristics; it prefers to simultaneously run map tasks from the job at the head of the queue.

  • Because tasks from the same job usually share the same characteristics, this may reduce the throughput of the whole system and seriously hurt the productivity of the data center.



Question

  • How can the hardware utilization rate be improved when different kinds of workloads run on the cluster under the MapReduce framework?



SO!

  • The authors design a new triple-queue scheduler that consists of a workload-prediction mechanism, MR-Predict, and three queues: a CPU-bound queue, an I/O-bound queue, and a wait queue (a structural sketch follows below).

  • They classify MapReduce workloads into three types, and MR-Predict automatically predicts the class of each newly arriving job based on this classification.

  • Jobs in the CPU-bound queue and the I/O-bound queue are scheduled separately so that different types of workloads run in parallel.

  • Their experiments show that the approach can increase system throughput by up to 30%.
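A minimal structural sketch of the triple-queue idea, written in Python rather than the authors' Hadoop implementation; the class and method names (TripleQueueScheduler, submit, reclassify, classify) are illustrative assumptions, not the paper's API:

```python
from collections import deque

class TripleQueueScheduler:
    """Sketch of the triple-queue idea: new jobs wait until MR-Predict
    classifies them, then move to the CPU-bound or I/O-bound queue."""

    def __init__(self, predictor):
        self.wait_queue = deque()   # newly submitted, not yet classified
        self.cpu_queue = deque()    # jobs predicted CPU-bound
        self.io_queue = deque()     # jobs predicted I/O-bound
        self.predictor = predictor  # MR-Predict-like component

    def submit(self, job):
        # Every new job starts in the wait queue.
        self.wait_queue.append(job)

    def reclassify(self):
        # Ask the predictor about each waiting job; once it has enough
        # information, move the job to the matching bound queue.
        for _ in range(len(self.wait_queue)):
            job = self.wait_queue.popleft()
            kind = self.predictor.classify(job)  # "cpu", "io", or None
            if kind == "cpu":
                self.cpu_queue.append(job)
            elif kind == "io":
                self.io_queue.append(job)
            else:
                self.wait_queue.append(job)      # not enough data yet
```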



Related work

  • Scheduling algorithms in parallel systems [11, …]

  • Applications have different workloads

    • large computation and I/O requirements [10].

  • How I/O-bound jobs affect system performance [6].

  • A gang-scheduling algorithm that runs CPU-bound jobs and I/O-bound jobs in parallel to increase hardware utilization [7].



Related work

  • The scheduling problem in MapReduce has attracted much attention [2, 10].

  • Yahoo and Facebook designed Hadoop schedulers: the capacity scheduler [4] and the fair scheduler [5].



MapReduce procedure analysis

  • Map-shuffle phase

    • Initialize input data

    • Compute the map task

    • Store output results to the local disk

    • Shuffle map task result data out

    • Shuffle reduce input data in



MapReduce procedure analysis

  • Reduce-Compute phase

    • Tasks run the application logic



MR-Predict
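The slide content here is a figure. As a hedged reconstruction of the prediction rule implied by the evaluation slides (the data volume moved by a node's concurrent map tasks divided by the map-task time, compared against the measured disk I/O rate, DIOR), the Python sketch below is an assumption about the rule's form, not the paper's exact algorithm; the paper distinguishes three workload types, while this sketch only shows the basic I/O-versus-CPU split:

```python
DIOR_MB_S = 31.2  # disk I/O rate measured in the evaluation environment

def predict_workload_type(map_slots, input_mb, output_mb, shuffle_mb,
                          map_task_seconds, dior_mb_s=DIOR_MB_S):
    """Guess whether a job's map-shuffle phase is I/O-bound or CPU-bound.

    The per-node data rate is the data moved by the concurrent map tasks
    (input read + map output write + shuffle traffic) divided by the map
    task duration; if it reaches the disk I/O rate, the disk is the
    bottleneck and the job is treated as I/O-bound.
    """
    data_rate = map_slots * (input_mb + output_mb + shuffle_mb) / map_task_seconds
    return "io-bound" if data_rate >= dior_mb_s else "cpu-bound"
```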



Scheduling policies



Scheduling policies
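The policy slides are also figures. One plausible reading of the policy stated earlier (serve the CPU-bound and I/O-bound queues separately so that both kinds of work run in parallel) is sketched below; the alternation strategy and the job.next_map_task() helper are assumptions, not the paper's exact rules:

```python
def assign_tasks(scheduler, free_map_slots):
    """Hand out map tasks for a node's free slots, alternating between the
    I/O-bound and CPU-bound queues so that disk-heavy and CPU-heavy tasks
    run side by side on the same node (a sketch, not the paper's policy)."""
    assignments = []
    prefer_io = True
    for _ in range(free_map_slots):
        primary, fallback = ((scheduler.io_queue, scheduler.cpu_queue)
                             if prefer_io else
                             (scheduler.cpu_queue, scheduler.io_queue))
        source = primary if primary else fallback
        if not source:
            break                                 # no runnable jobs in either queue
        job = source[0]                           # head-of-queue job keeps priority
        assignments.append(job.next_map_task())   # hypothetical helper on the job
        prefer_io = not prefer_io                 # alternate to keep CPU and disk busy
    # Removing jobs whose map tasks are exhausted is omitted in this sketch.
    return assignments
```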



Evaluation

  • Environment

    • 6 nodes connected by Gigabit Ethernet

    • Dell 1950 servers

      • CPU: 2 × quad-core 2.0 GHz

      • Memory: 4 GB

      • Disk: 2 SATA disks

    • Input data: 15 GB

    • Map slots & reduce slots: 8

    • DIOR (disk I/O rate): 31.2 MB/s (measured in Hadoop without a reduce phase)



Evaluation

  • Resource utilizations

TeraSort:

Total-order sort (sequential I/O) benchmark

8 × (64 MB + 64 MB) / 8 >= 31.2 MB/s



Evaluation

  • Resource utilizations

Grep-Count:

Uses [.]* as the regular expression.

8 × (64 MB + 1 MB + 1 MB + SID) / 92 >= 31.2 MB/s



Evaluation

  • Resource utilizations

WordCount:

It splits the input text into words, shuffles every word in the map phase, and counts each word's occurrences in the reduce phase.

8 × (64 MB + 64 MB + 64 MB + SID) / 35 >= 31.2 MB/s
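To make the three rate checks above concrete, the arithmetic from the slides is reproduced below for the fixed terms only; the SID terms for Grep-Count and WordCount are omitted because the slides do not give their values, and reading the denominators as map-task durations in seconds is an assumption:

```python
DIOR_MB_S = 31.2  # measured disk I/O rate of the test nodes

# (map slots, per-task data terms in MB, map task time in s) taken from the
# slides; Grep-Count and WordCount also have an unspecified SID term that is
# omitted here, so their computed rates are lower bounds.
benchmarks = {
    "TeraSort":   (8, [64, 64],     8),
    "Grep-Count": (8, [64, 1, 1],   92),
    "WordCount":  (8, [64, 64, 64], 35),
}

for name, (slots, terms_mb, seconds) in benchmarks.items():
    rate = slots * sum(terms_mb) / seconds
    verdict = ">=" if rate >= DIOR_MB_S else "<"
    print(f"{name}: {rate:.1f} MB/s {verdict} {DIOR_MB_S} MB/s")

# TeraSort:   128.0 MB/s >= 31.2 MB/s
# Grep-Count:   5.7 MB/s <  31.2 MB/s  (before adding SID)
# WordCount:   43.9 MB/s >= 31.2 MB/s  (before adding SID)
```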



Evaluation

  • Triple-queue scheduler experiments

    • Each job runs five times, for 15 jobs in total



Conclusion

  • The scheduler correctly distributes jobs into the different queues in most situations.

  • The Triple-Queue Scheduler can

    • increase map task throughput by 30%

    • reduce the makespan by 20%



Thank you for listening.

