Predictions for parallel applications and systems
This presentation is the property of its rightful owner.
Sponsored Links
1 / 25

Predictions for Parallel Applications and Systems PowerPoint PPT Presentation


  • 54 Views
  • Uploaded on
  • Presentation posted in: General

Predictions for Parallel Applications and Systems. Sathish Vadhiyar Grid Applications Research Laboratory (GARL). GARL Research. Grid Applications Climate Modeling Gene Mutations Performance Modeling Rescheduling Others Prediction of queue wait times. GARL Research. Grid Applications

Download Presentation

Predictions for Parallel Applications and Systems

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript


Predictions for parallel applications and systems

Predictions for Parallel Applications and Systems

Sathish Vadhiyar

Grid Applications Research Laboratory (GARL)


Garl research

GARL Research

  • Grid Applications

    • Climate Modeling

    • Gene Mutations

  • Performance Modeling

  • Rescheduling

  • Others

    • Prediction of queue wait times


Garl research1

GARL Research

  • Grid Applications

    • Climate Modeling

    • Gene Mutations

  • Performance Modeling

  • Rescheduling

  • Others

    • Prediction of queue wait times


Rescheduling

Rescheduling

  • The base is a parallel checkpointing library called SRS

  • Checkpointing? – storing application’s state so as to continue from the previous state after interruption

  • Interruption either by a scheduler or system faults

  • SRS allows processor reconfiguration


Application progress

Application Progress

System 1

System 2

Storage


Optimal checkpoint interval

Optimal Checkpoint Interval

  • Storing checkpoints periodically will help in fault-tolerance

  • How periodic?

  • What is the optimal checkpoint interval?

    • More checkpointing will lead to increased checkpoint overhead

    • Less checkpointing frequency will lead to increase times for recovery from failures


Illustration

Illustration


Dynamic determination of optimal checkpointing intervals

Dynamic Determination of Optimal Checkpointing Intervals

  • Start the application on a set of resources

  • Predict the next failure on the set of resources

  • Checkpoint “just before” the next failure

  • The prediction has to be really accurate

  • But no prediction can be 100% accurate


Probability distribution of failures

Probability Distribution of Failures

  • Use a probability distribution of failures on the resources

  • Need to know: The next time of failure with x% certainty

  • But more certainty is also not good


Markov chains

Markov Chains

For parallel M-M checkpointing

In SRS, there is almost no system down phase

For sequential applications

In SRS, transition from state 0 can lead to many states


Garl research2

GARL Research

  • Grid Applications

    • Climate Modeling

    • Gene Mutations

  • Performance Modeling

  • Rescheduling

  • Others

    • Prediction of queue wait times


Motivation for queue wait times

Motivation for Queue Wait Times

  • A Grid consisting of number of batch queues

  • A meta system that will:

    • predict the wait times and execution times of jobs

    • Decide which queue is “most suitable” for the job


What is a good predictor

What is a good predictor?

  • There are number of prediction strategies

  • Evaluating a predictor’s goodness:

    • Mean Absolute Percentage Error (MAPE)

    • Upper bound for actual/predicted

    • Average of (actual-predicted) [absolute error]

    • Absolute error/actual wait time [relative error]

    • Average error/average queue wait time

    • Coefficient of correlation

  • Each of these metrics has flaws


Illustration1

Illustration

Method 1

Method 2

Metric 3 value of Method 1 < Metric 3 value of Method 2

i.e. Method 1 is better


Our goals

Our goals

  • To define useful metrics that can clearly say whether a method is “good” or “bad”

  • Goodness of predictors

    • In terms of absolute wait times

    • In terms of execution times

    • In terms of resource demand


Illustration prediction errors versus absolute wait times

Illustration:Prediction errors versus absolute wait times

y1

x1, y1

(A-P)/A%

f(x)

x2, y2

Wait times


Reality

Reality??


What we want to do

What we want to do…

  • Define metrics that can evaluate a method in the “absolute” sense, not “comparative” sense

    • Stare at a single graph and ask “Is this graph good” as much as possible

  • In some cases, it may just not be possible

    • Use comparisons

  • Evaluate the existing methods on these sets of metrics

  • Come up with a method that performs the best in terms of all of the defined metrics


Garl research3

GARL Research

  • Grid Applications

    • Climate Modeling

    • Gene Mutations

  • Performance Modeling

  • Rescheduling

  • Others

    • Prediction of queue wait times


Motivation

Motivation

  • Certain large computational phases of climate modeling (CCSM) are done only by some processors

  • Load balancing – offload work from these processors to other processors

    • Increased processor utilization

    • Decreased execution time

  • How much offloading?

    • Need to predict workload based on previous computations


What is happening

What is happening…

Proc 0

Proc 1

Proc 2

Proc 3

Proc 4

Phase 1

Phase 2


What should happen

What should happen…

Proc 0

Proc 1

Proc 2

Proc 3

Proc 4

Phase 1

Phase 2

For this, we need to know the workload in phase 1

We predict the workload based on previous time steps


Advantages

Advantages


Garlians

GARLians

  • Yadnyesh Joshi (M.Sc)

  • Karthikeyan Raman (M.Tech, jointly with Prof. Govindarajan)

  • H.A. Sanjay (Ph.D, jointly with Prof. Ravi Nanjundiah, CAOS)

  • Sivagama Sundari (Ph.D)

  • Ashish Srivatsava (Project Assistant)

  • Alumni

    • 1 student intern from INSA, Lyon, France

    • Summer interns

    • Project assistants

    • 2 M.Scs


Questions

Questions ????

Thank U

http://garl.serc.iisc.ernet.in


  • Login