
The Running Time Advisor: A Resource Signal-based Approach to Predicting Task Running Time and Its Applications

Peter A. Dinda

Carnegie Mellon University

http://www.cs.cmu.edu/~pdinda

High Level Goals

Build systems that use statistics to help distributed applications adapt to highly variable resource availability

Focus on information

  • Application-level performance predictions
    • Running time of compute-bound tasks
  • Adaptation advice
    • Host selection to meet soft real-time deadline
  • Resource signal approach
    • Host load signals

This Talk

Outline
  • Bird’s eye view
      • Adapting to highly variable resource availability
      • Dv/QuakeViz
      • Real-time scheduling advisor
      • Running time advisor
      • Confidence intervals
      • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
      • Traces, structure, linear models, evaluation
      • RPS Toolkit
  • Conclusion
A Universal Challenge in High Performance Distributed Applications

Highly variable resource availability

  • Shared resources
  • No reservations
  • No globally respected priorities
  • Competition from other users - “background workload”

Running time can vary drastically

Adaptation

A Universal Problem

Which host should the application send the task to so that its running time is appropriate?

[Figure: a task with known resource requirements, to be sent to one of several candidate hosts (?)]

What will the running time be if I...

DV Framework For Distributed Interactive Visualization
  • Large datasets (e.g., earthquake simulations)
  • Distributed VTK visualization pipelines
  • Active frames
      • Encapsulate data, computation, path through pipeline
      • Launched from server by user interaction
      • Annotated with deadline
      • Dynamically choose on which host each pipeline stage will execute and what quality settings to use

http://www.cs.cmu.edu/~dv

Example DV Pipeline for QuakeViz

[Figure: logical and physical views of the QuakeViz pipeline. Logical view: simulation output reading, interpolation, morphology reconstruction, isosurface extraction, scene synthesis, rendering, and local display and user (annotated with ROI, resolution, and contours). Physical view: active frames n, n+1, and n+2, each annotated with a deadline, pass through interpolation, isosurface extraction, and scene synthesis stages whose hosts are still to be chosen (?).]

Real-time Scheduling Advisor
  • Distributed interactive applications
      • Examples: CMU Dv/QuakeViz, BBN OpenMap
  • Assumptions
      • Sequential tasks initiated by user actions
      • Aperiodic arrivals
      • Resilient deadlines (soft real-time)
      • Compute-bound tasks
      • Known computational requirements
  • Best-effort semantics
      • Recommend host where deadline is likely to be met
      • Predict running time on that host
      • No guarantees
Running Time Advisor

Application notifies advisor of task’s computational requirements (nominal time)

Advisor predicts running time on each host

Application assigns task to most appropriate host

[Figure: a task with a given nominal time, and the predicted running time on each candidate host (?)]
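The sketch below makes the advisor's job concrete by turning a nominal time into a predicted running time. It assumes a simplified time-sharing model in which a compute-bound task receives a 1/(1 + load) share of the CPU. It takes a hypothetical list of per-interval host load predictions. `predict_running_time` is an illustrative name, and the sketch does not reproduce the talk's own estimator, which also uses load discounting and error covariances.

```python
def predict_running_time(nominal_time, load_predictions, step=1.0):
    """Estimate the wall-clock running time of a compute-bound task.

    Hypothetical simplified model: during each interval of length `step`
    seconds, the task receives a 1 / (1 + load) share of the CPU, where
    `load` is the predicted host load for that interval. The task finishes
    once its accumulated CPU time reaches `nominal_time` seconds.
    """
    remaining = nominal_time  # CPU seconds still needed
    elapsed = 0.0             # wall-clock seconds so far
    for load in load_predictions:
        share = 1.0 / (1.0 + max(load, 0.0))
        cpu_this_step = share * step
        if cpu_this_step >= remaining:
            # The task finishes partway through this interval.
            return elapsed + remaining / share
        remaining -= cpu_this_step
        elapsed += step
    # Past the prediction horizon: assume the last predicted load persists.
    last = max(load_predictions[-1], 0.0) if load_predictions else 0.0
    return elapsed + remaining * (1.0 + last)
```

Take a task with a nominal time of 2 seconds on a host whose load is predicted to stay at 1.0. Under this model it gets an estimate of 4 seconds.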

Real-time Scheduling Advisor

Application notifies advisor of task’s computational requirements (nominal time) and its deadline

Advisor acquires predicted task running times for all hosts

Advisor recommends one of the hosts where the deadline can be met

[Figure: a task with a nominal time and a deadline; the predicted running time on each candidate host is compared with the deadline]

Variability and Prediction

[Figure: a resource signal with high availability variability over time t feeds a predictor. The resulting prediction error over t has low variability, and that variability is characterized by its autocorrelation function (ACF).]

Exchange high resource availability variability for low prediction error variability and a characterization of that variability
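The characterization of prediction-error variability on this slide is an autocorrelation function (ACF). Below is a minimal sketch of the standard sample ACF estimator, applied to a sequence of prediction errors. `sample_acf` is a hypothetical helper and is not part of the RPS Toolkit.

```python
def sample_acf(x, max_lag):
    """Return sample autocorrelations r_0..r_max_lag of the sequence x.

    Standard (biased) estimator:
      r_k = sum_{i=0}^{n-k-1} (x_i - m)(x_{i+k} - m) / sum_i (x_i - m)^2
    where m is the sample mean of x.
    """
    n = len(x)
    if n == 0:
        raise ValueError("empty sequence")
    m = sum(x) / n
    dev = [v - m for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("constant sequence has undefined autocorrelation")
    return [
        sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
        for k in range(min(max_lag, n - 1) + 1)
    ]
```

If a predictor has captured the signal's structure, its errors should show little autocorrelation at nonzero lags.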

Confidence Intervals to Characterize Variability

“3 to 5 seconds with 95% confidence”

Application specifies confidence level (e.g., 95%)

Running time advisor predicts running times as a confidence interval (CI)

Real-time scheduling advisor chooses a host whose CI lies entirely below the deadline

CI captures variability to the extent the application is interested in it

[Figure: a task with a nominal time, a deadline, and a requested 95% confidence level; the predicted running-time CI on each candidate host is compared with the deadline]
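One simple way to produce such an interval is to take the point prediction plus or minus z times the error's standard deviation, where z ≈ 1.96 at 95% confidence. This assumes the prediction error is Gaussian with a known standard deviation. The sketch below builds that interval and then keeps only hosts whose entire CI lies below the deadline. The Gaussian assumption, the per-host dictionary layout, and the random choice among feasible hosts are all illustrative and are not taken from the talk.

```python
import random
from statistics import NormalDist


def confidence_interval(prediction, error_std, confidence=0.95):
    """Two-sided CI, assuming Gaussian prediction error (an assumption)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)  # about 1.96 for 95%
    half_width = z * error_std
    return (prediction - half_width, prediction + half_width)


def recommend_host(predictions, deadline, confidence=0.95, rng=random):
    """Pick a host whose whole running-time CI lies below the deadline.

    predictions maps host name -> (predicted running time, error std dev).
    Returns None when no host is expected to meet the deadline.
    """
    feasible = [
        host
        for host, (pred, std) in predictions.items()
        if confidence_interval(pred, std, confidence)[1] < deadline
    ]
    return rng.choice(feasible) if feasible else None
```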

Confidence Intervals And Predictor Quality

[Figure: predicted running-time CIs on several hosts, shown relative to the deadline. Bad predictor: no obvious choice. Good predictor: two good choices.]

Good predictors provide smaller CIs

Smaller CIs simplify scheduling decisions

Overview of Research Results
  • Predicting CIs is feasible
      • Host load prediction using AR(16) models
      • Running time estimation using host load predictions
  • Predicting CIs is practical
      • RPS Toolkit (incorporated in CMU Remos, BBN QuO)
      • Extremely low-overhead online system
  • Predicting CIs is useful
      • Performance of real-time scheduling advisor

Measured performance of real system

Statistically rigorous analysis and evaluation

Experimental Setup
  • Environment
    • Alphastation 255s, Digital Unix 4.0
    • Workload: host load trace playback
    • Prediction system on each host
  • Tasks
    • Nominal time ~ U(0.1,10) seconds
    • Interarrival time ~ U(5,15) seconds
  • Methodology
    • Predict CIs / Host recommendations
    • Run task and measure
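The task workload above is easy to reproduce. This sketch draws nominal times from U(0.1, 10) seconds and interarrival times from U(5, 15) seconds, as listed. The function name, the seed parameter, and starting the clock at zero are assumptions.

```python
import random


def generate_tasks(count, seed=None):
    """Generate (arrival_time, nominal_time) pairs for the experiment.

    Nominal time ~ U(0.1, 10) s and interarrival time ~ U(5, 15) s, per the
    setup above; the seeding and tuple layout are illustrative choices.
    """
    rng = random.Random(seed)
    tasks = []
    t = 0.0
    for _ in range(count):
        t += rng.uniform(5.0, 15.0)       # interarrival time
        nominal = rng.uniform(0.1, 10.0)  # task's CPU demand (nominal time)
        tasks.append((t, nominal))
    return tasks
```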
Predicting CIs is Feasible

Near-perfect CIs on typical hosts

3000 randomized tasks
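One way to quantify CI quality is coverage. Coverage is the fraction of tasks whose measured running time falls inside the predicted interval, and it should be close to the requested confidence level (e.g., 0.95). The helper below is a hypothetical sketch of that check. It is not the evaluation code behind these results.

```python
def ci_coverage(intervals, measured):
    """Fraction of measured running times that lie inside their predicted CI.

    intervals: list of (low, high) pairs, one per task.
    measured:  list of measured running times, in the same order.
    """
    if len(intervals) != len(measured) or not measured:
        raise ValueError("need exactly one CI per measurement")
    hits = sum(1 for (lo, hi), m in zip(intervals, measured) if lo <= m <= hi)
    return hits / len(measured)
```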

Predicting CIs is Practical - RPS System

<2% of CPU At Appropriate Rate

1-2 ms latency from measurement to prediction

2KB/sec transfer rate

Predicting CIs is Useful - Real-time Scheduling Advisor

[Figure: results for three host-selection strategies over 16000 tasks: host with lowest load, host where predicted CI < deadline, and random host]

Predicting CIs is Useful - Real-time Scheduling Advisor

[Figure: results for three host-selection strategies over 16000 tasks: host where predicted CI < deadline, host with lowest load, and random host]
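The two baselines in this comparison, and a natural score for it, can be sketched as follows. The slide does not say whether "lowest load" means the latest measurement or a prediction, so this sketch assumes the latest measurement. The fraction-of-deadlines-met metric and all function names are also assumptions.

```python
import random


def random_host(hosts, rng=random):
    """Baseline 1: ignore load information and pick any host."""
    return rng.choice(list(hosts))


def lowest_load_host(current_load):
    """Baseline 2: pick the host whose latest load measurement is lowest.

    current_load maps host name -> most recent measured load (assumed).
    """
    return min(current_load, key=current_load.get)


def fraction_deadlines_met(running_times, deadlines):
    """Share of tasks whose measured running time did not exceed the deadline."""
    if len(running_times) != len(deadlines) or not running_times:
        raise ValueError("need one deadline per task")
    met = sum(1 for t, d in zip(running_times, deadlines) if t <= d)
    return met / len(running_times)
```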

Outline
  • Bird’s eye view
      • Adapting to highly variable resource availability
      • Dv/QuakeViz
      • Real-time scheduling advisor
      • Running time advisor
      • Confidence intervals
      • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
      • Traces, structure, linear models, evaluation
      • RPS Toolkit
  • Conclusion
Design Space

Can the gap between the resources and the application be spanned? Yes!

Resource Signals
  • Characteristics
      • Easily measured, time-varying scalar quantities
      • Strongly correlated with resource availability
      • Periodically sampled (discrete-time signal)
  • Examples
      • Host load (Digital Unix 5 second load average)
      • Network flow bandwidth and latency

Leverage existing statistical signal analysis and prediction techniques

RPS Toolkit
  • Extensible toolkit for implementing resource signal prediction systems
  • Easy “buy-in” for users
      • C++ and sockets (no threads)
      • Prebuilt prediction components
      • Libraries (sensors, time series, communication)
  • Users have bought in
      • Incorporated in CMU Remos, BBN QuO
      • Research users: Bruce Lowekamp, Nancy Miller, LeMonte Green

http://www.cs.cmu.edu/~pdinda/RPS.html

Prototype System

RPS components can be composed in other ways

Research Results

Host load on real hosts has exploitable structure

Strong autocorrelation, self-similarity, epochal behavior

Trace database and host load trace playback

Host load is predictable using simple linear models

Recommendation: AR(16) models or better for 1-30 sec predictions

RPS Toolkit for low overhead systems (<2% of CPU)

C++, ported to 5 OSes, incorporated in CMU Remos, BBN QuO

Running time CIs can be computed from load predictions

Load discounting, error covariances

Effective real-time scheduling advice can be based on CIs

Know if deadline will be met before running task
Outline
  • Bird’s eye view
      • Adapting to highly variable resource availability
      • Dv/QuakeViz
      • Real-time scheduling advisor
      • Running time advisor
      • Confidence intervals
      • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
      • Traces, structure, linear models, evaluation
      • RPS Toolkit
  • Conclusion
Questions
  • What are the properties of host load?
  • Is host load predictable?
  • What predictive models are appropriate?
  • Are host load predictions useful?
Overview of Answers
  • Host load exhibits complex behavior
      • Strong autocorrelation, self-similarity, epochal behavior
  • Host load is predictable
      • 1 to 30 second timeframe
  • Simple linear models are sufficient
      • Recommend AR(16) or better
  • Predictions are useful
      • Can compute effective CIs from them
Host Load Traces
  • DEC Unix 5 second exponential average
      • Full bandwidth captured (1 Hz sample rate)
      • Long durations
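A Unix-style exponential load average smooths run-queue length samples with a decay factor. The sketch below uses the classic update rule. The 1-second update interval and 5-second time constant are illustrative assumptions, not Digital Unix's documented kernel constants.

```python
import math


def exponential_load_average(run_queue, dt=1.0, tau=5.0, initial=0.0):
    """Exponentially smoothed load computed from run-queue length samples.

    Classic Unix-style update (assumed here):
      L_n = L_{n-1} * exp(-dt / tau) + q_n * (1 - exp(-dt / tau))
    dt is the sampling interval and tau the averaging time constant, both
    in seconds; the defaults are illustrative, not DEC Unix's constants.
    """
    decay = math.exp(-dt / tau)
    load = initial
    out = []
    for q in run_queue:
        load = load * decay + q * (1.0 - decay)
        out.append(load)
    return out
```

Smoothing of this kind by itself makes successive samples correlated. Keep this in mind when reading autocorrelation plots of such traces.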
If Host Load Was “Random” (White Noise)...

[Figure panels: time domain, autocorrelation, frequency domain, spectrogram]

Host Load Has Exploitable Structure

[Figure panels: time domain, autocorrelation, frequency domain, spectrogram]

Linear Time Series Models

Pole-zero / state-space models capture autocorrelation parsimoniously

(2000 sample fits, largest models in study, 30 secs ahead)
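The recommended AR(p) models are simple to fit. The code below is a minimal sketch, not the RPS Toolkit's implementation. It fits AR(p) coefficients with the Yule-Walker equations, solved by the Levinson-Durbin recursion, and produces iterated 1- to w-step-ahead predictions. `fit_ar` and `predict_ar` are hypothetical names. For host load one would fit, e.g., `fit_ar(fit_interval, 16)`.

```python
def fit_ar(x, p):
    """Fit AR(p) by Yule-Walker, solved with the Levinson-Durbin recursion.

    Returns (mean, phi) for the model
      x_t - mean = sum_{j=1..p} phi[j-1] * (x_{t-j} - mean) + noise.
    """
    n = len(x)
    if n <= p:
        raise ValueError("need more samples than the model order")
    mean = sum(x) / n
    d = [v - mean for v in x]
    # Biased autocovariance estimates r_0..r_p
    r = [sum(d[i] * d[i + k] for i in range(n - k)) / n for k in range(p + 1)]
    phi, err = [], r[0]
    for k in range(1, p + 1):
        if err <= 0.0:  # already perfectly predictable; pad higher lags
            phi += [0.0] * (p - len(phi))
            break
        # Reflection coefficient for order k
        acc = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        kappa = acc / err
        phi = [phi[j] - kappa * phi[k - 2 - j] for j in range(k - 1)] + [kappa]
        err *= 1.0 - kappa * kappa
    return mean, phi


def predict_ar(history, mean, phi, steps):
    """Iterated 1..steps-ahead predictions; needs len(history) >= len(phi)."""
    p = len(phi)
    d = [v - mean for v in history[-p:]] if p else []
    preds = []
    for _ in range(steps):
        nxt = sum(phi[j] * d[-1 - j] for j in range(p))
        preds.append(nxt + mean)
        d.append(nxt)  # feed the prediction back in for the next step
    return preds
```

Iterating the one-step recursion is the standard way to get multi-step AR predictions. For a stationary model, the predictions decay toward the fitted mean as the horizon grows.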

Evaluation Methodology
  • Ran ~190,000 randomly chosen testcases on the traces
    • Evaluate models independently of prediction/evaluation framework
      • No monitoring
    • ~30 testcases per trace, model class, parameter set
  • Data-mine results

Offline and online systems implemented using RPS Toolkit

Testcases
  • Models
    • MEAN, LAST/BM(32)
    • Randomly chosen model from: AR(1..32), MA(1..8), ARMA(1..8,1..8), ARIMA(1..8,1..2,1..8), ARFIMA(1..8,d,1..8)
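The baseline models can be sketched in a few lines. MEAN predicts the mean of the fit interval, and LAST predicts the most recent measurement. For BM(p), this sketch assumes it predicts the mean of the last m measurements, with m ≤ p chosen to minimize one-step-ahead error on the fit interval. Under that reading, LAST is the m = 1 case. Treat this reading of BM(p), and the function names, as assumptions.

```python
def mean_predictor(fit):
    """MEAN: always predict the mean of the fit interval."""
    m = sum(fit) / len(fit)
    return lambda history: m


def last_predictor():
    """LAST: predict that the next value equals the latest measurement."""
    return lambda history: history[-1]


def bm_predictor(fit, p=32):
    """BM(p) (assumed reading): mean of the last m values, m <= p.

    m is chosen to minimize one-step-ahead mean squared error over the
    fit interval, evaluated on the same points for every candidate m.
    """
    def mse(m):
        errs = [(sum(fit[t - m:t]) / m - fit[t]) ** 2 for t in range(p, len(fit))]
        return sum(errs) / len(errs)

    if len(fit) <= p:
        raise ValueError("fit interval must be longer than p")
    best = min(range(1, p + 1), key=mse)
    return lambda history: sum(history[-best:]) / best
```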
Evaluating a Testcase

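[Figure: evaluating a testcase. The modeler takes a model type and the measurements in the fit interval, $\langle z_{t-m}, \ldots, z_{t-2}, z_{t-1} \rangle$, and produces a model (one-time use). The load predictor applies the model to the measurements in the test interval, $z_t, z_{t+1}, \ldots, z_{t+n-1}$. It produces a prediction stream of 1- to $w$-step-ahead predictions ($z'_{t,t+1}, z'_{t,t+2}, \ldots, z'_{t,t+w}$; $z'_{t+1,t+2}, \ldots$) together with error estimates (characterization of variation). The evaluator compares the predictions with the measurements to produce error metrics (measurement of variation).]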

Measured Prediction Variance: Mean Squared Error

[Figure: the load predictor consumes measurements $\ldots, z_{t+1}, z_t$ and emits three kinds of prediction streams: 1-step-ahead ($z'_{t,t+1}, z'_{t+1,t+2}, \ldots$), 2-step-ahead ($z'_{t,t+2}, z'_{t+1,t+3}, \ldots$), and so on through $w$-step-ahead ($z'_{t,t+w}, z'_{t+1,t+1+w}, \ldots$)]

Variance of $z$, where $\mu$ is the mean of the test-interval measurements:

$$\sigma_z^2 = \frac{1}{n}\sum_{i}\left(\mu - z_{t+i}\right)^2$$

$k$-step-ahead mean squared error, for $k = 1, 2, \ldots, w$:

$$\sigma_{a_k}^2 = \frac{1}{n}\sum_{i}\left(z'_{t+i,\,t+i+k} - z_{t+i+k}\right)^2$$

Good load predictor: $\sigma_{a_1}^2, \sigma_{a_2}^2, \ldots, \sigma_{a_w}^2 \ll \sigma_z^2$
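These metrics translate directly into code. In this sketch, `predictions[k-1][i]` holds the prediction made at time t+i for time t+i+k, matching the slide's z'(t+i, t+i+k) notation. The function name and data layout are hypothetical.

```python
def error_metrics(measured, predictions, w):
    """Compare 1..w step-ahead mean squared errors with the signal variance.

    measured:          test-interval samples z_t, ..., z_{t+n-1}
    predictions[k-1]:  list whose entry i predicts measured[i + k]
    Returns (variance of z, [msa_1, ..., msa_w]).
    """
    n = len(measured)
    if n <= w:
        raise ValueError("test interval must be longer than w")
    mu = sum(measured) / n
    var_z = sum((mu - z) ** 2 for z in measured) / n
    msa = []
    for k in range(1, w + 1):
        pairs = [(predictions[k - 1][i], measured[i + k]) for i in range(n - k)]
        msa.append(sum((p - z) ** 2 for p, z in pairs) / len(pairs))
    return var_z, msa
```

By the slide's criterion, a predictor is worthwhile when every entry of `msa` is much smaller than `var_z`.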

Unpaired Box Plot Comparisons

[Figure: box plots of mean squared error for Models A, B, and C, marking the 2.5%, 25%, 50%, 75%, and 97.5% points and the mean. Annotations: inconsistent low error, consistent high error, consistent low error.]

Good models achieve consistently low error
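Each box in these plots summarizes one model's mean squared errors across testcases. The sketch below computes the plotted statistics with Python's `statistics` module. The interpolation method used for the original plots is not stated, so the `"inclusive"` method here is an assumption.

```python
from statistics import mean, quantiles


def box_summary(mse_values):
    """Mean and 2.5/25/50/75/97.5 percent points of one model's MSEs.

    Uses statistics.quantiles with n=40 (cut points every 2.5%) and the
    "inclusive" interpolation method, which is an assumed choice.
    """
    cuts = quantiles(mse_values, n=40, method="inclusive")
    return {
        "2.5%": cuts[0],
        "25%": cuts[9],
        "50%": cuts[19],
        "75%": cuts[29],
        "97.5%": cuts[38],
        "mean": mean(mse_values),
    }
```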

1 second Predictions, All Hosts

[Figure: box plots of mean squared error for each model (2.5%, 25%, 50%, 75%, and 97.5% points and mean)]

Predictive models clearly worthwhile

30 second Predictions, All Hosts

[Figure: box plots of mean squared error for each model (2.5%, 25%, 50%, 75%, and 97.5% points and mean)]

Predictive models clearly beneficial even at long prediction horizons

30 Second Predictions, High Load, Dynamic Host

[Figure: box plots of mean squared error for each model (2.5%, 25%, 50%, 75%, and 97.5% points and mean)]

Predictive models clearly worthwhile

Begin to see differentiation between models

Outline
  • Bird’s eye view
      • Adapting to highly variable resource availability
      • Dv/QuakeViz
      • Real-time scheduling advisor
      • Running time advisor
      • Confidence intervals
      • Performance results (feasible, practical, useful)
  • Prototype system
  • Host load prediction
      • Traces, structure, linear models, evaluation
      • RPS Toolkit
  • Conclusion
Related Work
  • Distributed interactive applications
      • QuakeViz/Dv, Aeschlimann [PDPTA’99]
  • Quality of service
      • QuO, Zinky, Bakken, Schantz [TPOS, April ’97]
      • QRAM, Rajkumar, et al. [RTSS’97]
  • Distributed soft real-time systems
      • Lawrence, Jensen [assorted]
  • Workload studies for load balancing
      • Mutka, et al. [PerfEval ’91]
      • Harchol-Balter, et al. [SIGMETRICS ’96]
  • Resource signal measurement systems
      • Remos [HPDC’98]
      • Network Weather Service [HPDC’97, HPDC’99]
  • Host load prediction
      • Wolski, et al. [HPDC’99] (NWS)
      • Samadani, et al. [PODC’95]
      • Hailperin [’93]
  • Application-level scheduling
      • Berman, et al. [HPDC’96]
      • Stochastic Scheduling, Schopf [Supercomputing ’99]
Conclusions
  • Help applications adapt to highly variable resource availability
  • Resource signal prediction
  • Predict running times as confidence intervals
    • Predicting CIs is feasible
      • Host load prediction using AR(16) models
      • Running time estimation using host load predictions
    • Predicting CIs is practical
      • RPS Toolkit (incorporated in CMU Remos, BBN QuO)
      • Extremely low-overhead online system
    • Predicting CIs is useful
      • Performance of real-time scheduling advisor
Future Work
  • New resource signals
    • Network bandwidth and latency (Remos)
  • New prediction approaches
    • Wavelets, nonlinearity, cointegration
  • Resource scheduler models
    • Better Unix scheduler model
    • Network models
  • Adaptation advisors
  • Applications and workloads
    • DV/QuakeViz, GIMP, Instrumentation
Tools/Venues for Future Work
  • Resource signal methodology
  • RPS Toolkit
  • Remos
  • QuakeViz/DV
  • Grid Forum
Future Work (Long Term)
  • Experimental computer science research
  • Application-oriented view
  • Measurement studies and analysis
  • Statistical approach
  • Application services
  • Systems building

systems X applications X statistics

Teaching
  • “Signals, systems, and statistics for computer scientists”
  • “Performance data analysis”
  • “Introduction to computer systems”