## Stochastic DAG Scheduling using Monte Carlo Approach

**Stochastic DAG Scheduling using Monte Carlo Approach**

Heterogeneous Computing Workshop (at IPDPS) 2012
Extended version: Elsevier JPDC (accepted July 2013, in press)

Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China
Rizos Sakellariou, School of Computer Science, The University of Manchester, UK

**Previous Presentation (9/06/13)**

• Research area: scheduling workflows in heterogeneous environments with variable performance.

**Introduction**

• General DAG scheduling assumption:
  • The estimated execution time of each task is known in advance.
  • Several estimation techniques exist, e.g. averaging over several runs.
  • Similarly, the estimated data transfer time is known in advance.
• A study\* has shown that there might be significant deviations in observed performance in Grids.
• To address these deviations, two approaches are prevalent:
  • Just-In-Time (high overhead)
  • RunTime (static schedule + runtime changes) (hypothesis\*\*: might waste resources and increase makespan if the static schedule is not very good)

\* A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, in: IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57.

\*\* R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12 (4) (2004) 253–262.

**Problem Addressed**

• Generating a better (lower-makespan) “static” schedule based on a stochastic model of the variations in the performance (execution time) of individual tasks in the graph.

**Background and Related Work**

• Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation):
  • List-based scheduling.
  • Prioritize tasks based on the “bLevel” (essentially, tasks on the critical path get higher priority).
  • Once a task is chosen, map it to the “best” available resource.
$$\mathrm{bLevel}(i) = w_i + \max_{j \in \mathrm{Succ}(i)} \{\, w_{i \to j} + \mathrm{bLevel}(j) \,\}$$

**Problem Description**

• $G = (N, E)$: a DAG with one entry node and one exit node.
• $R$: the set of heterogeneous resources.
• $ET_{i,p}$: random variable for the execution time of task $i \in N$ on resource $p \in R$.
• Assumption: network bandwidth is constant.
• $M$: makespan, i.e. the finish time of the exit node.

Goal: find a schedule $\Omega$ that minimizes the makespan (assign $N$ to $R$; no overlap, no preemption, no migration).

**Methodology**

• Assumption: analytical methods that solve the probabilistic optimization problem are too expensive.
• Use the Monte Carlo Sampling (MCS) method:
  1. Define a space comprising the possible input values: $I_G = \{ET_{i,p} : i \in N, p \in R\}$.
  2. Take an independent random sample from the space: $P_G = f_{smp}(I_G) = \{t_{i,p} : i \in N, p \in R\}$.
  3. Perform a deterministic computation using the sample input and store the result: $\Omega_G = \mathrm{Static\_Scheduling}_{HEFT}(G, P_G)$.
  4. Repeat steps 2 and 3 until some exit condition is met (number of repetitions).
  5. Aggregate the stored results of the individual computations into the final result.

**MCS Based Scheduling**

• Complexity:
  • Depends on the deterministic scheduling algorithm.
  • For HEFT it is $O(v + e \cdot r) = O(e \cdot r)$.
  • First loop: $O(e \cdot r \cdot m)$
  • Second loop: $O(e \cdot n \cdot k)$
  • Total: $O(e \cdot r \cdot m + e \cdot n \cdot k)$

**Example**

• 10,000 iterations - production phase (Gaussian distribution)
• 200 iterations - selection phase
• 20% reduction in makespan
• Absolute increase in algorithm time: 1.2 s

**Evaluation**

• Graphs

**Makespan performance evaluation**

• Static HEFT (baseline) with mean ET values
• Autopsy: static HEFT with known ET values
• MCS - Static
• ReStatic
• ReMCS
• Graph generation (random generator of a given type)
• Task execution time for different runs:
  • Select a “Mean” for each task.
  • Use a probability distribution to select the actual execution time. The variation is bounded by the Quality of Estimation (QoE), with 0 < QoE < 1.

**Summary**

• It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy, without too much overhead.
• MCS, which has a more robust procedure for selecting an initial schedule, generally results in better performance when rescheduling is applied.
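
The production and selection phases mentioned in the slides can be illustrated with a small Python sketch. It is not the authors' implementation. The toy DAG, the mean execution times, `REL_STD`, and all function names are hypothetical. The scheduler is a simplified HEFT-style list scheduler that ignores communication costs and HEFT's insertion policy. The selection rule (pick the candidate with the lowest average makespan over fresh samples) is an assumption about how the stored results are aggregated.

```python
import random
from statistics import mean

# Hypothetical toy DAG: task -> successors (one entry "a", one exit "d").
SUCC = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
PRED = {t: [u for u in SUCC if t in SUCC[u]] for t in SUCC}
RESOURCES = ["p0", "p1"]
# Assumed mean execution time of each task on each resource.
MEAN_ET = {
    "a": {"p0": 4, "p1": 6},
    "b": {"p0": 8, "p1": 5},
    "c": {"p0": 6, "p1": 7},
    "d": {"p0": 3, "p1": 4},
}
REL_STD = 0.3  # assumed relative standard deviation of the Gaussian noise


def sample_times(rng):
    """Draw one sample P_G: a concrete time t[i][p] for every task/resource pair."""
    return {i: {p: max(0.1, rng.gauss(m, REL_STD * m)) for p, m in row.items()}
            for i, row in MEAN_ET.items()}


def blevel_order(times):
    """Tasks by decreasing bLevel; w_i is the mean time over resources,
    and communication costs are taken as zero (simplification)."""
    w = {i: mean(times[i].values()) for i in times}
    bl = {}

    def visit(i):
        if i not in bl:
            bl[i] = w[i] + max((visit(j) for j in SUCC[i]), default=0.0)
        return bl[i]

    for i in SUCC:
        visit(i)
    return sorted(SUCC, key=lambda i: -bl[i])


def list_schedule(times):
    """Deterministic step: map each task, in bLevel order, to the resource
    that gives it the earliest finish time."""
    order = blevel_order(times)
    free = {p: 0.0 for p in RESOURCES}
    finish, mapping = {}, {}
    for i in order:
        ready = max((finish[j] for j in PRED[i]), default=0.0)
        best = min(RESOURCES, key=lambda p: max(ready, free[p]) + times[i][p])
        finish[i] = max(ready, free[best]) + times[i][best]
        free[best] = finish[i]
        mapping[i] = best
    return order, mapping


def makespan(schedule, times):
    """Replay a fixed schedule (task order + mapping) under the given times."""
    order, mapping = schedule
    free = {p: 0.0 for p in RESOURCES}
    finish = {}
    for i in order:
        p = mapping[i]
        ready = max((finish[j] for j in PRED[i]), default=0.0)
        finish[i] = max(ready, free[p]) + times[i][p]
        free[p] = finish[i]
    return max(finish.values())


def mcs_schedule(production_runs=200, selection_runs=50, seed=0):
    rng = random.Random(seed)
    # Production phase: schedule against sampled inputs, keep distinct results.
    candidates = {}
    for _ in range(production_runs):
        order, mapping = list_schedule(sample_times(rng))
        candidates[(tuple(order), tuple(sorted(mapping.items())))] = (order, mapping)
    # Selection phase: score every candidate on the same fresh samples.
    trials = [sample_times(rng) for _ in range(selection_runs)]

    def avg_makespan(s):
        return mean(makespan(s, t) for t in trials)

    best = min(candidates.values(), key=avg_makespan)
    return best, avg_makespan(best)


if __name__ == "__main__":
    static = list_schedule(MEAN_ET)  # baseline: schedule from mean values
    print("static schedule:", static)
    print("MCS schedule and its average makespan:", mcs_schedule())
```

Scoring all candidates on the same set of selection samples keeps the comparison between schedules fair, at the cost of one replay per candidate per sample.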