Stochastic DAG Scheduling Using Monte Carlo Approach for Heterogeneous Computing Systems

Stochastic DAG Scheduling using Monte Carlo Approach. Heterogeneous Computing Workshop (at IPDPS) 2012. Extended version: Elsevier JPDC (accepted July 2013, in press). Wei Zheng, Department of Computer Science, Xiamen University, Xiamen, China. Rizos Sakellariou, School of Computer Science, The University of Manchester, UK.

Previous Presentation (9/06/13) • Research Area: Scheduling workflows in heterogeneous environments with variable performance.

This Presentation

Introduction • General DAG scheduling assumption: • The estimated execution time of each task is known in advance. • Several estimation techniques exist, e.g. averaging over several runs. • Similarly, the estimated data transfer time is known in advance. • A study* has shown that observed performance in Grids may deviate significantly from such estimates. • Two approaches to address these deviations are prevalent: • Just-in-time scheduling (high overhead) • Run-time rescheduling (static schedule + run-time changes) (hypothesis**: might waste resources and increase makespan if the static schedule is not very good) • * A. Lastovetsky, J. Twamley, Towards a realistic performance model for networks of heterogeneous computers, in: M. Ng, A. Doncescu, L. Yang, T. Leng (Eds.), High Performance Computational Science and Engineering, IFIP International Federation for Information Processing, vol. 172, Springer, Boston, 2005, pp. 39–57. • ** R. Sakellariou, H. Zhao, A low-cost rescheduling policy for efficient mapping of workflows on grid systems, Sci. Program. 12 (4) (2004) 253–262.

Problem Addressed • Generating a better (lower-makespan) static schedule based on a stochastic model of the variations in the performance (execution time) of individual tasks in the graph.

Background and Related Work • Heterogeneous Earliest Finish Time (HEFT) heuristic (discussed in the previous presentation) • List-based scheduling. • Prioritize tasks by their “bLevel” (essentially, tasks on the critical path get higher priority). • Once a task is chosen, map it to the “best” available resource. • $bLevel(i) = w_i + \max_{j \in Succ(i)} \{ w_{i \to j} + bLevel(j) \}$, where $w_i$ is the computation cost of task $i$ and $w_{i \to j}$ is the communication cost of edge $i \to j$.
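The bLevel recursion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `b_levels`, the dictionary layout, and the toy DAG with its costs are hypothetical and do not come from the slides.

```python
from functools import lru_cache


def b_levels(w, succ, c):
    """Compute bLevel(i) = w[i] + max over successors j of (c[(i, j)] + bLevel(j)).

    w:    dict task -> computation cost (hypothetical toy values below)
    succ: dict task -> list of successor tasks
    c:    dict (i, j) -> communication cost of edge i -> j
    """
    @lru_cache(maxsize=None)
    def b(i):
        # The exit node has no successors, so its tail contribution is 0.
        tail = max((c[(i, j)] + b(j) for j in succ.get(i, [])), default=0.0)
        return w[i] + tail

    return {i: b(i) for i in w}


# Toy DAG (assumed for illustration): A -> B, A -> C, B -> D, C -> D
w = {"A": 2.0, "B": 3.0, "C": 1.0, "D": 2.0}
succ = {"A": ["B", "C"], "B": ["D"], "C": ["D"]}
c = {("A", "B"): 1.0, ("A", "C"): 4.0, ("B", "D"): 1.0, ("C", "D"): 1.0}

levels = b_levels(w, succ, c)
# List scheduling considers tasks in decreasing bLevel order.
order = sorted(w, key=levels.get, reverse=True)
print(levels, order)
```

Sorting by decreasing bLevel always yields a valid topological order when costs are positive, because a task's bLevel strictly exceeds that of each of its successors.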

Problem Description • $G = (N, E)$: a DAG with one entry node and one exit node. • $R$: the set of heterogeneous resources. • $ET_{i,p}$: random variable for the execution time of task $i$ on resource $p$. • Assumption: network bandwidth is constant. • $M$: makespan = finish time of the exit node. • Goal: find a schedule $\Omega$ that minimizes the makespan (assign N to R; no overlap, no preemption, no migration).

Methodology • Assumption: analytical methods that solve the probabilistic optimization problem are too expensive. • Use the Monte Carlo Sampling (MCS) method: • 1. Define a space comprising the possible input values: $I_G = \{ET_{i,p} : i \in N, p \in R\}$. • 2. Take an independent random sample from the space: $P_G = f_{smp}(I_G) = \{t_{i,p} : i \in N, p \in R\}$. • 3. Perform a deterministic computation using the sampled input and store the result: $\Omega_G = \mathrm{Static\_Scheduling}_{HEFT}(G, P_G)$. • 4. Repeat steps 2 and 3 until some exit condition holds (number of repetitions). • 5. Aggregate the stored results of the individual computations into the final result.
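The sample-schedule-store loop (steps 1 to 4) might look roughly like the Python sketch below. Several things here are assumptions rather than the authors' implementation: `list_schedule` is a simplified greedy stand-in for HEFT (tasks are taken in a given topological order instead of bLevel order, and communication costs are ignored), the Gaussian model with relative standard deviation `rel_sigma` is hypothetical, and all names and toy values are illustrative. The aggregation step is not shown.

```python
import random


def list_schedule(tasks, pred, times, resources):
    """Greedy earliest-finish-time mapping; a simplified stand-in for HEFT.

    tasks: task ids in a topological order (real HEFT orders by bLevel)
    pred:  dict task -> list of predecessor tasks
    times: dict (task, resource) -> execution time for this sample
    Communication costs are ignored to keep the sketch short.
    """
    finish, free_at, mapping = {}, {p: 0.0 for p in resources}, {}
    for i in tasks:
        ready = max((finish[j] for j in pred.get(i, [])), default=0.0)
        # Choose the resource on which task i finishes earliest.
        p = min(resources, key=lambda q: max(ready, free_at[q]) + times[(i, q)])
        finish[i] = max(ready, free_at[p]) + times[(i, p)]
        free_at[p] = finish[i]
        mapping[i] = p
    return tuple(mapping[i] for i in tasks)


def mcs_production(tasks, pred, mean_et, rel_sigma, resources, repetitions, seed=0):
    """Steps 2-4: sample execution times, schedule deterministically, store results."""
    rng = random.Random(seed)
    stored = set()  # distinct schedules produced so far
    for _ in range(repetitions):
        # Assumed sampling model: Gaussian around the mean, clipped at zero.
        sample = {key: max(0.0, rng.gauss(mu, rel_sigma * mu))
                  for key, mu in mean_et.items()}
        stored.add(list_schedule(tasks, pred, sample, resources))
    return stored


# Toy input (assumed): diamond DAG, two resources, p0 faster on average.
tasks = ["A", "B", "C", "D"]
pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
resources = ["p0", "p1"]
mean_et = {(i, p): m for i in tasks for p, m in zip(resources, (4.0, 5.0))}

schedules = mcs_production(tasks, pred, mean_et, 0.3, resources, repetitions=200)
no_noise = mcs_production(tasks, pred, mean_et, 0.0, resources, repetitions=5)
print(len(schedules), no_noise)
```

With `rel_sigma=0.0` every sample equals the mean, so the loop collapses to a single deterministic schedule; with noise, different samples can lead to different mappings, which is what gives the later aggregation step something to choose from.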

MCS Based Scheduling • Complexity: • Depends on the deterministic scheduling algorithm. • For HEFT it is O(v + e * r) = O(e * r). • First loop: O(e * r * m) • Second loop: O(e * n * k) • Total: O(e * r * m + e * n * k)
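The O(e * n * k) term suggests a second loop that evaluates each of n stored schedules against k sampled inputs, with each evaluation costing one pass over the graph. A minimal Python sketch of such a loop follows, assuming (the slides do not spell this out) that the selection criterion is the lowest average makespan. The function names, the Gaussian sampling model, and the toy values are hypothetical.

```python
import random
from statistics import mean as average


def makespan(tasks, pred, mapping, times):
    """Makespan of a fixed schedule; tasks run in the given topological order."""
    finish, free_at = {}, {}
    for i in tasks:
        p = mapping[i]
        ready = max((finish[j] for j in pred.get(i, [])), default=0.0)
        start = max(ready, free_at.get(p, 0.0))
        finish[i] = start + times[(i, p)]
        free_at[p] = finish[i]
    return max(finish.values())


def select_schedule(candidates, tasks, pred, mean_et, rel_sigma, k, seed=0):
    """Return the candidate with the lowest average makespan over k samples."""
    rng = random.Random(seed)
    # The same k samples are reused for every candidate so the comparison is fair.
    samples = [{key: max(0.0, rng.gauss(mu, rel_sigma * mu))
                for key, mu in mean_et.items()} for _ in range(k)]

    def avg_makespan(mapping):
        return average(makespan(tasks, pred, mapping, s) for s in samples)

    return min(candidates, key=avg_makespan)


# Toy input (assumed): diamond DAG, two resources, two candidate schedules.
tasks = ["A", "B", "C", "D"]
pred = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
mean_et = {(i, p): m for i in tasks for p, m in (("p0", 4.0), ("p1", 5.0))}
serial = {"A": "p0", "B": "p0", "C": "p0", "D": "p0"}
parallel = {"A": "p0", "B": "p0", "C": "p1", "D": "p0"}

best = select_schedule([serial, parallel], tasks, pred, mean_et, 0.0, k=10)
print(best)
```

With no noise the serial candidate has makespan 16 and the parallel one 13, so the parallel mapping is selected.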

Example

Example • 10,000 iterations in the production phase (Gaussian distribution) • 200 iterations in the selection phase • 20% reduction in makespan • Absolute increase in algorithm time: 1.2 s

Evaluation • Graphs

Threshold Calculation

Convergence (no. of repetitions)

Convergence

Makespan performance evaluation • Static HEFT (baseline) with mean ET values • Autopsy: static HEFT with known ET values • MCS - Static • ReStatic • ReMCS • Graph generation (random generator for a given graph type) • Task execution time for different runs: • Select a mean for each task. • Use a probability distribution to select the actual execution time. The variation is bounded by the Quality of Estimation (QoE) (0 < QoE < 1).
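One way to generate actual execution times for a run is sketched below in Python. It rests on two assumptions that the slides do not confirm: that "bounded by QoE" means the actual time lies within mean * (1 ± QoE), and that a uniform distribution is an acceptable stand-in for whatever distribution the evaluation used. The function name and toy values are hypothetical.

```python
import random


def sample_actual_times(mean_et, qoe, rng):
    """Draw one actual execution time per (task, resource) pair.

    Assumed interpretation: the actual time deviates from the mean by at most
    a fraction `qoe`, i.e. it lies in [mean * (1 - qoe), mean * (1 + qoe)].
    """
    if not 0.0 < qoe < 1.0:
        raise ValueError("QoE must satisfy 0 < QoE < 1")
    return {key: rng.uniform(mu * (1.0 - qoe), mu * (1.0 + qoe))
            for key, mu in mean_et.items()}


# Toy means (assumed) for two tasks on two resources.
mean_et = {("A", "p0"): 4.0, ("A", "p1"): 5.0, ("B", "p0"): 10.0, ("B", "p1"): 8.0}
rng = random.Random(42)
actual = sample_actual_times(mean_et, qoe=0.2, rng=rng)
print(actual)
```

Each simulated run would call `sample_actual_times` once and then execute every compared schedule against the same drawn times, so that differences in makespan come from the schedules rather than from the noise.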

Makespan performance evaluation

Summary • It is possible to obtain a good full-ahead static schedule that performs well under prediction inaccuracy without too much overhead. • MCS, which has a more robust procedure for selecting an initial schedule, generally results in better performance when rescheduling is applied.
