ParaTimer: A Progress Indicator for MapReduce DAGs

Kristi Morton, Magdalena Balazinska, and Dan Grossman

Computer Science and Engineering Department, University of Washington

Advisor: Martin Theobald

Isha Khosla, Masters in Informatics

Parallel Database Management Systems
  • Designed to process massive-scale datasets.
  • Parallelism speeds up query execution.
  • Examples of programming models and frameworks that process data sets in parallel:
    • MapReduce
    • Pig Latin
MapReduce
  • MapReduce is a programming model for processing large data sets.
  • Each MapReduce job consists of seven phases of execution.
    • The split phase reads the data from a file and splits it.
    • The record reader phase iterates through the data set and generates key-value pairs.
    • The map function processes these records with the appropriate operators.

[Figure: map task: File → Split → Record Reader → Map → Combine]
MapReduce
  • The combine phase sorts and pre-aggregates the data and writes the records locally.
  • The copy phase copies the relevant data from the nodes where the map tasks executed.
  • The sort phase merges all the files and passes the data to the reduce phase.
  • The reducer applies the appropriate operators and writes the data to disk.

[Figure: map task (File → Split → Record Reader → Map → Combine → Storage) followed by reduce task (Copy → Sort → Reduce → Storage)]
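To make the seven phases concrete, here is a minimal single-process sketch of a word count that walks through them in order. It is only an illustration: the function names, the toy partitioner, and the in-memory lists standing in for files and storage are all assumptions, not part of Hadoop or of the presentation.

```python
from itertools import groupby


def split_file(lines, split_size):
    """Split phase: cut the input (a list of lines) into fixed-size splits."""
    return [lines[i:i + split_size] for i in range(0, len(lines), split_size)]


def partition_of(key, num_reducers):
    """Toy deterministic partitioner (hypothetical; real systems hash the key)."""
    return sum(ord(ch) for ch in key) % num_reducers


def map_task(split):
    # Record reader: turn each line into a (line_number, line) record.
    records = list(enumerate(split))
    # Map: emit (word, 1) for every word in every record.
    mapped = [(word, 1) for _, line in records for word in line.split()]
    # Combine: sort and pre-aggregate locally before "writing" the map output.
    mapped.sort()
    return [(word, sum(count for _, count in group))
            for word, group in groupby(mapped, key=lambda kv: kv[0])]


def reduce_task(map_outputs, partition, num_reducers):
    # Copy: fetch this reducer's share of every map task's output.
    copied = [kv for output in map_outputs for kv in output
              if partition_of(kv[0], num_reducers) == partition]
    # Sort: merge the copied records so that equal keys are adjacent.
    copied.sort()
    # Reduce: apply the operator (here, a sum) to the values of each key.
    return {word: sum(count for _, count in group)
            for word, group in groupby(copied, key=lambda kv: kv[0])}


def word_count(lines, split_size=2, num_reducers=2):
    map_outputs = [map_task(s) for s in split_file(lines, split_size)]
    result = {}
    for partition in range(num_reducers):
        result.update(reduce_task(map_outputs, partition, num_reducers))
    return result
```

For example, `word_count(["a b a", "b c", "a c c"])` runs two map tasks and two reduce tasks and merges their outputs into one dictionary of counts.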

Pig Latin
  • Extension of the MapReduce framework.
  • Provides a declarative interface to MapReduce.
  • A SQL query can be transformed into a Pig Latin query.

Suppose we have a table urls: (url, category, pagerank). The following is a simple SQL query that finds, for each sufficiently large category, the average pagerank of high-pagerank urls in that category.

SELECT category, AVG(pagerank)
FROM urls WHERE pagerank > 0.2
GROUP BY category HAVING COUNT(*) > 10^6

The equivalent Pig Latin program:

good_urls = FILTER urls BY pagerank > 0.2;
groups = GROUP good_urls BY category;
big_groups = FILTER groups BY COUNT(good_urls) > 10^6;
output = FOREACH big_groups GENERATE category, AVG(good_urls.pagerank);
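The step-by-step dataflow of this query can also be mirrored in plain Python, which may help show why it decomposes naturally into filter, group, and aggregate stages. This is a sketch only: the function name and the `min_pagerank` and `min_count` parameters are made up for illustration, and the size threshold is left as a parameter so the example can run on a tiny input.

```python
from collections import defaultdict
from statistics import mean


def avg_pagerank_of_big_categories(urls, min_pagerank, min_count):
    """urls: iterable of (url, category, pagerank) tuples."""
    # FILTER urls BY pagerank > min_pagerank
    good_urls = [row for row in urls if row[2] > min_pagerank]
    # GROUP good_urls BY category
    groups = defaultdict(list)
    for _url, category, pagerank in good_urls:
        groups[category].append(pagerank)
    # FILTER groups BY COUNT(good_urls) > min_count, then AVG per category
    return {category: mean(ranks)
            for category, ranks in groups.items()
            if len(ranks) > min_count}


sample = [
    ("a.example", "news", 0.5),
    ("b.example", "news", 0.25),
    ("c.example", "news", 0.1),   # dropped by the pagerank filter
    ("d.example", "blog", 0.9),   # its category is too small
]
result = avg_pagerank_of_big_categories(sample, min_pagerank=0.2, min_count=1)
```

Each intermediate variable corresponds to one named relation in the Pig Latin program, which is the property that lets Pig compile such scripts into a sequence of MapReduce jobs.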

Summarizing Pig Latin Query
  • visits = load '/data/visits' as (user, url, time);
  • gVisits = group visits by url;
  • visitCounts = foreach gVisits generate url, count(visits);
  • urlInfo = load '/data/urlInfo' as (url, category, pRank);
  • visitCounts = join visitCounts by url, urlInfo by url;
  • gCategories = group visitCounts by category;
  • topUrls = foreach gCategories generate top(visitCounts,10);
  • store topUrls into '/data/topUrls';
Parallel Environment
  • Areas in which parallel DBMSs can be improved:
    • Resource allocation.
    • Query debugging.
    • Tuning the cluster configuration.

What do we need?

Framing the situation
  • Given the magnitude of data and queries, users need more than efficient query processing.
  • What else do users need?
    • Accurate, time-based progress estimation.
    • Intra-query fault tolerance.
    • Query scheduling and resource management.
  • All this without too much runtime overhead.
Challenges
  • Accurate progress estimation in a parallel environment is a challenging task.
  • Parallel environments involve:
    • Distribution
    • Concurrency
    • Failures
    • Data skew
Parallax: Progress Indicator for Parallel Queries
  • Provides an accurate time-remaining estimate for parallel queries.
  • Why is accurate progress estimation important?
    • Users need to plan their time.
    • Users need to know when to stop queries.
  • Parallel queries are translated into a sequence of MapReduce jobs.
  • Assumptions: uniform data distribution and absence of node failures.
Parallax
  • Breaks the query into pipelines, which are groups of interconnected operators.
  • Of the seven phases of MapReduce, Parallax considers three pipelines.

[Figure: the seven MapReduce phases (Split, Record Reader, Map, Combine, Copy, Sort, Reduce) in the map task and reduce task, with the three pipelines numbered 1 to 3]

Time estimation - Parallax
  • Let N = total number of tuples that a pipeline must process.
  • K = number of tuples processed so far.
  • Work remaining: w = N − K.
  • For each pipeline p, we are given $N_p$, $K_p$, and $\alpha_p$, the estimated processing cost per tuple.
  • So, the time remaining for the pipeline is $\alpha_p (N_p - K_p)$.
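As a worked example of the per-pipeline estimate, the sketch below multiplies the remaining work (N minus K) by an estimated per-tuple cost. The function name, the parameter names, and the sample numbers are assumptions chosen for illustration; the per-tuple cost is written `alpha` here.

```python
def pipeline_time_remaining(alpha, n_total, k_done):
    """Time remaining for one pipeline.

    alpha:   estimated processing cost per tuple (e.g. milliseconds per tuple)
    n_total: N, the number of tuples the pipeline must process
    k_done:  K, the number of tuples processed so far
    """
    work_remaining = n_total - k_done  # w = N - K
    return alpha * work_remaining


# Hypothetical pipeline: 1000 tuples in total, 250 done, 0.5 ms per tuple.
remaining_ms = pipeline_time_remaining(alpha=0.5, n_total=1000, k_done=250)
```

The estimate is only as good as `alpha` and `n_total`, which is why the quality of the cardinality and cost estimates matters so much for a progress indicator.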
Time estimation - Parallax

Given $J$, the set of all MapReduce jobs, and $P_j$, the set of all pipelines within job $j \in J$, the progress of a computation is given by a formula in which the $N_{jp}$ and $K_{jp}$ values are aggregated across all partitions of the same pipeline and $Setup_{remaining}$ is the overhead for the unscheduled map and reduce tasks.
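One way to write such a formula, as a sketch consistent with the symbols defined above, is shown below. The per-tuple cost $\alpha_{jp}$ and the divisor $pipeline\_width_{jp}$ (the number of partitions of pipeline $p$ that run in parallel) are assumptions introduced here to turn aggregated work into wall-clock time; they are not spelled out in the slide text.

```latex
T_{remaining} = Setup_{remaining}
  + \sum_{j \in J} \sum_{p \in P_j}
    \frac{\alpha_{jp} \, (N_{jp} - K_{jp})}{pipeline\_width_{jp}}
```

With a single job, a single pipeline, and a width of one, this reduces to the per-pipeline estimate of cost per tuple times remaining tuples.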

Pig Progress Indicator
  • Considers only the record reader, copy, and reducer phases.
  • Limited accuracy.
  • Assumes that all operators (within and across jobs) perform the same amount of work.
  • Does not account for a high degree of parallelism.
None of the progress indicators discussed so far is accurate under a high degree of parallelism, and none accounts for node failures or non-uniform data distribution.

Problem Statement

We need a time-remaining indicator for a broader class of queries that handles real system challenges such as failures and data skew.

Overview
  • Motivation
  • Solution: ParaTimer

ParaTimer
  • A progress indicator for parallel queries that take the form of directed acyclic graphs (DAGs) of MapReduce jobs.
  • To handle complex-shaped query plans in the form of trees, ParaTimer identifies and tracks the critical path in the query plan.
Critical-Path based progress estimation
  • Step 1: Computing the task schedule.
    • A FIFO scheduler is assumed: jobs are launched one after the other in sequence.
    • Consider a cluster with a capacity of 5 concurrent map tasks and 5 concurrent reduce tasks.
    • Assume:
      • Job1 = 2 map tasks + 1 reduce task.
      • Job2 = 6 map tasks + 1 reduce task.
      • Job3 = 1 map task + 1 reduce task.

[Figure: Pig Latin query plan with a join operator, showing Job1, Job2, and Job3]

Critical-Path based progress estimation
  • Step 1: Computing the task schedule.
    • Job1 = 2 map tasks + 1 reduce task.
    • Job2 = 6 map tasks + 1 reduce task.
    • Job3 = 1 map task + 1 reduce task.
    • The cluster runs 5 concurrent map tasks and 5 concurrent reduce tasks.

Given a DAG of MapReduce jobs, ParaTimer computes a schedule S, such as the one shown.

[Figure: schedule S with map tasks m11, m12 (Job1), m21 to m26 (Job2), and m3, and reduce tasks r1, r2, and r3]
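A simplified sketch of how the map side of such a schedule could be derived is shown below. It assumes, purely for illustration, that Job1 and Job2 have no dependency on each other, that all map tasks take the same time, and that a FIFO scheduler fills free slots in job order; the function names are hypothetical and this is not ParaTimer's actual scheduler model.

```python
def fifo_map_rounds(jobs, slots):
    """Lay out map tasks in rounds of at most `slots` concurrent tasks.

    jobs: list of (job_id, num_map_tasks) in FIFO launch order.
    Task names follow the slide's convention, e.g. "m24" is map task 4 of Job2.
    """
    queue = [f"m{job_id}{i}"
             for job_id, num_maps in jobs
             for i in range(1, num_maps + 1)]
    return [queue[i:i + slots] for i in range(0, len(queue), slots)]


def slot_threads(rounds, slots):
    """For each slot, the sequence of tasks it runs across the rounds."""
    return [[r[s] for r in rounds if s < len(r)] for s in range(slots)]


rounds = fifo_map_rounds([(1, 2), (2, 6)], slots=5)
threads = slot_threads(rounds, slots=5)
```

Under these assumptions the eight map tasks of Job1 and Job2 need two rounds on five slots: the first round is full and the second is only partly used.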

Critical-Path based progress estimation
  • Step 2: Breaking a schedule into path fragments.
    • Typically, batches of tasks are scheduled at the same time.

Given a schedule S, a task round T is a set of tasks t ∈ S that all begin within a time $x_1$ of each other and end within a time $x_1$ of each other.

[Figure: schedule S grouped into Batch1, Batch2, and Batch3 over tasks m11, m12, m21 to m26, m3, r1, r2, and r3]
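The task-round definition can be sketched as a simple grouping pass over a schedule. The version below is a simplification and not ParaTimer's algorithm: it compares each task only with the first task of a round rather than with every member, and the task timings are invented for the example.

```python
def group_into_rounds(tasks, x1):
    """Group tasks into rounds by start and end time.

    tasks: list of (name, start, end) tuples.
    x1:    tolerance on both start and end times.
    Simplification: a task joins a round if it starts and ends within x1
    of the round's first task.
    """
    rounds = []  # each entry: (anchor_start, anchor_end, [task names])
    for name, start, end in sorted(tasks, key=lambda t: (t[1], t[2])):
        for anchor_start, anchor_end, names in rounds:
            if abs(start - anchor_start) <= x1 and abs(end - anchor_end) <= x1:
                names.append(name)
                break
        else:
            # No existing round fits, so this task opens a new round.
            rounds.append((start, end, [name]))
    return [names for _, _, names in rounds]


# Hypothetical timings, in seconds.
schedule = [
    ("m11", 0, 100), ("m12", 2, 101),
    ("m24", 103, 200), ("m25", 104, 202),
    ("r1", 205, 250),
]
task_rounds = group_into_rounds(schedule, x1=5)
```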

Critical-Path based progress estimation
  • Step 2: Breaking a schedule into path fragments.

A path fragment is a set of tasks, all of the same type (i.e., either maps or reduces), that execute in consecutive rounds. In a path fragment, all rounds have the same width (i.e., the same number of parallel tasks) except the last round, which can be either full or not.

    • Six path fragments are scheduled:

P1 = {m11, m12, m24, m25}
P2 = {m21, m22, m23, m26}
P3 = {r1}
P4 = {r2}
P5 = {m3}
P6 = {r3}

How do these path fragments represent parallel query execution?
  • Case 1: The query comprises only a sequence of MapReduce jobs.

[Figure: paths P1 to P4 correspond to Map1, Reduce1, Map2, and Reduce2]

The critical path is the sequence of all these path fragments. Here it is equivalent to Parallax.

How do these path fragments represent parallel query execution?
  • Case 2: The query comprises parallel MapReduce jobs.
  • Identify the critical path.
Critical-Path based progress estimation
  • Step 3: Identifying the critical path fragments.

Given a schedule and an assignment of tasks to path fragments, it is easy to derive a schedule in terms of path fragments, where each path fragment is accompanied by a start time and a duration.

Case 1: If two overlapping path fragments start at the same time, keep only the one expected to take longer. In the example, p1 and p2 execute in parallel. Hence, the shorter p1 fragment can be ignored.

  • P1 = {m11, m12, m24, m25}
  • P2 = {m21, m22, m23, m26}
Critical-Path based progress estimation
  • Step 3: Identifying the critical path fragments.

Case 2: If two overlapping path fragments start at different times, keep the one that starts earlier. Remove the other one, but add back its extra time. In our example, p2 and p3 overlap. Because the overlap is total, p3's time can be ignored. However, if r1 stretched past the end of m26, the extra time would be taken into account on the critical path.

  • P2 = {m21, m22, m23, m26}
  • P3 = {r1}

Critical path: P2 = {m21, m22, m23, m26}
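The two overlap rules can be sketched as a single sweep over path fragments sorted by start time. This is an illustrative reading of the rules, not ParaTimer's code: each fragment is reduced to a hypothetical (start, duration) pair, and the timings in the example are invented.

```python
def critical_path(fragments):
    """Apply the two overlap rules to path fragments.

    fragments: dict mapping fragment name -> (start, duration).
    Returns (kept_fragment_names, critical_path_time).
    """
    # Earliest start first; among equal starts, longest first (Case 1).
    ordered = sorted(fragments.items(), key=lambda kv: (kv[1][0], -kv[1][1]))
    kept, total, covered_until = [], 0, None
    for name, (start, duration) in ordered:
        end = start + duration
        if covered_until is None or start >= covered_until:
            # No overlap with what is already on the critical path.
            kept.append(name)
            total += duration
            covered_until = end
        elif end > covered_until:
            # Case 2: drop the later fragment but add back its extra time.
            total += end - covered_until
            covered_until = end
        # Otherwise the overlap is total and the fragment is ignored.
    return kept, total


# Hypothetical timings: p1 is shorter than p2, and p3 ends before p2 does.
example = {
    "p1": (0, 8), "p2": (0, 10), "p3": (4, 5),
    "p4": (10, 3), "p5": (13, 2), "p6": (15, 3),
}
kept, total = critical_path(example)
```

With these numbers p1 and p3 drop out and the critical path is p2, p4, p5, p6. If p3 instead ran from time 4 to time 12, its two extra time units would be added to the total even though p3 itself is not kept.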

Critical-Path based progress estimation
  • Step 4: Estimating the time remaining at run-time.

ParaTimer could monitor only a thread of tasks within the path fragment (or some subset of these threads), where a thread is a sequence of tasks from the beginning to the end of a path fragment.

Overview
  • Motivation
  • Solution: ParaTimer
  • Key Contributions

Contributions of ParaTimer
  • Handling failures
    • Failures affect progress estimation.
    • There is no way to accurately predict the running time of a query if failures occur.
    • How, then, can the remaining time for queries be estimated?
  • Proposed approach: comprehensive progress estimation. Users should be shown multiple guesses about the remaining query time.

Comprehensive Progress estimation
  • StdEstimator + Pessimistic Failure Estimator (PFE)
  • The Pessimistic Failure Estimator assumes the worst-case failure:
    • The longest remaining task must be the one to fail.
    • The task must fail right before finishing, as this adds the greatest delay.
    • The task must have been scheduled in the last round of tasks for the given job and phase.

[Figure: two schedules; PFE estimates the time remaining for one, and StdEstimator estimates the time remaining for the other]

Comprehensive Progress estimation
  • Pessimistic Failure Estimator

[Figure: pipelines are scheduled or blocked. A failure adds a path fragment but does not change latency.]

Handling Data Skew
  • Data skew is the uneven distribution of data across partitions.
  • In MapReduce, skew due to uneven distribution can occur only in reduce tasks.
  • Under skew, there are no longer wide path fragments.
  • Each slot in the cluster becomes its own path fragment.
Overview
  • Motivation
  • Solution: ParaTimer
  • Key Contributions
  • Evaluation

Experimental Setup
  • Configuration
    • 8-node cluster configured with Hadoop-17 and Pig Latin.
    • Each node: 2.00 GHz dual quad-core Intel Xeon CPU, 16 GB of RAM.
    • Parallelism: 16 concurrent map and reduce tasks.
  • Assumptions, for each pipeline
    • Input cardinality estimate: N
    • Estimated processing rate: α
  • Both Parallax and ParaTimer are evaluated in two forms:
    • Perfect: uses values from a prior run over the entire data set.
    • 1%: uses values collected from a single prior run over a 1% sampled subset.
Experimental Setup
  • Experiment script
    • ParaTimer is evaluated on a Pig Latin script that contains a join operator and yields a query plan with concurrent MapReduce jobs.
    • Job1: 1 GB of data, 4 parallel map tasks and 16 reduce tasks.
    • Job2: 4.2 GB of data, 17 map tasks and 16 reduce tasks.
Percent-time complete estimates for a parallel query with join, on 4.2 GB and 1 GB data sets

Instantaneous error is computed from the following quantities:

$f_i$: the percent-time-done estimate.

$t_i$: the current time.

$t_n$: the time when the query completes.
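One plausible form of this error is the absolute gap between the estimated and the actual percent of time elapsed. The sketch below assumes that time is measured from the start of the query, so the actual percent done at time t_i is 100 · t_i / t_n, and that f_i is expressed as a percentage; the function name is hypothetical and the exact expression used in the evaluation may differ.

```python
def instantaneous_error(f_i, t_i, t_n):
    """Absolute difference between estimated and actual percent-time done.

    f_i: percent-time-done estimate reported at time t_i (0 to 100)
    t_i: current time, measured from the start of the query
    t_n: time at which the query completes, on the same clock
    """
    actual_percent_done = 100.0 * t_i / t_n
    return abs(f_i - actual_percent_done)


# Hypothetical reading: 30 s into a 60 s query, the indicator shows 40%.
error = instantaneous_error(f_i=40.0, t_i=30.0, t_n=60.0)
```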

Overview
  • Motivation
  • Solution: ParaTimer
  • Key Contributions
  • Evaluation
Conclusion
  • What is ParaTimer?

A system for estimating the time remaining for parallel queries consisting of multiple MapReduce jobs running on a cluster.

  • Key idea:

Identifying the critical path for the entire query and producing multiple estimates.

Related work
  • Parallel DBMSs provide coarse-grained progress indicators for running parallel queries.
  • DB2. SQL/monitoring facility. http://www.sprdb2.com/SQLMFVSE.PDF, 2000.
  • DB2. DB2 Basics: The whys and how-tos of DB2 UDB monitoring. http://www.ibm.com/developerworks/db2/library/techarticle/dm-0408hubel/index.html, 2004.
  • Parallax: already discussed.
  • Pig progress indicator.