

Prediction of CPU Idle-Busy Activity Pattern

Authors: Qian Diao, Justin Song
Presented by: Justin Song

Intel Corporation

14th International Symposium on High-Performance Computer Architecture

Salt Lake City, UT - Feb 18, 2008

Agenda
  • Introduction
  • Usage model
  • CPU idle pattern
  • Prediction algorithm
  • Result
  • Benefit analysis
  • Summary & future work
Problem
  • C-state: CPU idle state (no instructions being executed)
  • C-state based CPU power management: potentially big benefit
    • Workloads rarely saturate a multi-core CPU
    • C-state technology is maturing
      • lower power, higher compute efficiency, silicon support
  • How C-states are used today: broken
    • Today: only OSPM selects the C-state for each logical CPU (core/thread)
    • Many wrong decisions, causing performance regression or wasted power
    • Performance concerns may prevent deep C-states from being enabled

ACPI table

Linux C-state policy

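  • The decision depends on the last C-state entered:
  • C1: after 4 consecutive idle periods longer than C2's latency, choose C2 as the next C-state
  • C2: after 10 consecutive idle periods longer than C3's latency, choose C3 as the next C-state
  • C2/C3: if the last idle period was shorter than the C2/C3 latency, demote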
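
The promotion and demotion rules above can be sketched as a small state machine. This is a minimal illustration, not the Linux implementation: the class name `LadderPolicy` and the latency values in `LATENCY_US` are hypothetical, and only the promotion counts (4 and 10) come from the slide.

```python
from dataclasses import dataclass

# Hypothetical exit latencies in microseconds; the slides do not give values.
LATENCY_US = {"C1": 1, "C2": 20, "C3": 100}
# Consecutive long idle periods required before promotion, as listed on the slide.
PROMOTE_AFTER = {"C1": 4, "C2": 10}
ORDER = ["C1", "C2", "C3"]


@dataclass
class LadderPolicy:
    state: str = "C1"
    streak: int = 0  # consecutive idles longer than the deeper state's latency

    def next_state(self, last_idle_us: float) -> str:
        """Fold in the idle period that just ended; return the C-state to use next."""
        idx = ORDER.index(self.state)
        # Demotion: the last idle was shorter than the current state's latency.
        if idx > 0 and last_idle_us < LATENCY_US[self.state]:
            self.state = ORDER[idx - 1]
            self.streak = 0
            return self.state
        # Promotion: count consecutive idles that exceed the deeper state's latency.
        if idx < len(ORDER) - 1:
            deeper = ORDER[idx + 1]
            if last_idle_us > LATENCY_US[deeper]:
                self.streak += 1
                if self.streak >= PROMOTE_AFTER[self.state]:
                    self.state = deeper
                    self.streak = 0
            else:
                self.streak = 0
        return self.state
```

Note that such a policy only reacts to past idle periods: it needs several long idles before promoting and one short idle before demoting, which is where the wrong decisions mentioned earlier can arise.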
GOOD Prediction Helps
  • No worry about performance drop
    • Possible causes for deep C-states to degrade performance
      • Coming C0% too high (e.g. >90%); no headroom to accommodate deep C-states
        • Equivalent statement: the coming idle duration is too short, so the deep C-state's latency cannot be amortized
      • Deep C-states, under some circumstances, prevent proprietary silicon optimizations for performance compensation from working
      • Thread context loss
      • On-core/package cache flush
  • Deep C-state's power benefit is maximized
Our Methodology
  • Modeling problem
    • Use easy-to-observe metrics
    • Needs domain knowledge assistance (silicon power-management optimization)
  • Prediction Model: DBN (Dynamic Bayesian Networks)
    • Generalization of HMM and LDS (KFM)
    • Combines a natural mechanism for expressing domain knowledge with efficient algorithms for learning and inference
  • Model evaluation
  • Model simplification
    • For deployment in software/firmware/hardware
  • Power benefit / performance impact quantification
Usage Model

Use activity prediction result to direct C-state usage and performance compensation

CPU Package Activity State
  • Package activity: the idle-busy activity of all cores
    • All-core-idle
    • All-core-busy
    • Package partial idle (at least one core idle and one core busy)
      • All other 2^N-2 states (N = # of cores)
  • How PM benefits from the definition
    • The idle-busy pattern (not the OSPM-selected C-state) reflects the timing nature of the workload
    • Aligned with shared-power-plane design
      • Only when all cores are idle can the package's memory and I/O control logic go to a lower power state
      • Only when at least one core is idle can the active cores' performance be compensated
      • Break-down of package partial idle → core location information

Quad-core CPU package activity state change over time
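
As a minimal sketch of the definition above, the snippet below maps a per-core busy bitmask to one of the three package activity classes. The function names and the bitmask encoding (bit i set means core i is busy) are assumptions made for illustration; the slides do not say how states are encoded.

```python
def classify_package(busy_mask: int, n_cores: int) -> str:
    """Map a per-core busy bitmask (bit i set = core i busy) to a package activity class."""
    all_busy = (1 << n_cores) - 1
    if busy_mask == 0:
        return "all-idle"
    if busy_mask == all_busy:
        return "all-busy"
    # Every remaining mask has at least one idle core and one busy core.
    return "partial-idle"


def count_partial_states(n_cores: int) -> int:
    """Number of partial-idle states: all 2^N masks minus all-idle and all-busy."""
    return 2 ** n_cores - 2
```

For a quad-core package, enumerating all 16 masks with `classify_package` yields one all-idle state, one all-busy state, and 14 partial-idle states, matching `count_partial_states(4)`.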

CPU Idle Pattern
  • Definition: residency% of each package activity state during an observation time slot
  • How prediction benefits from the definition
    • Prediction of package idle pattern: the random variable becomes discrete
      • Prediction of idle duration: hard to use a discrete prediction model
    • Predicting a single core's idle duration cannot help power saving and performance compensation for the whole CPU package
      • Hard to know whether the cores' idle periods overlap

Dual-core CPU package activity pattern over time
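
The residency definition can be illustrated with a short sketch that turns a trace of state changes into a per-slot pattern. The trace format (sorted `(time_us, state)` pairs, each state holding until the next event) and the function name `idle_pattern` are assumptions; the slides only define the pattern itself.

```python
from collections import defaultdict


def idle_pattern(events, slot_start, slot_end):
    """Return the residency percentage of each state within [slot_start, slot_end).

    events: sorted list of (time_us, state) state changes; a state holds
    until the next event, and the last one holds until slot_end.
    """
    residency = defaultdict(float)
    for i, (t, state) in enumerate(events):
        t_next = events[i + 1][0] if i + 1 < len(events) else slot_end
        # Clip the interval to the observation slot.
        begin = max(t, slot_start)
        end = min(t_next, slot_end)
        if end > begin:
            residency[state] += end - begin
    length = slot_end - slot_start
    return {s: 100.0 * d / length for s, d in residency.items()}
```

For example, with a dual-core trace `[(0, "BB"), (100, "BI"), (250, "II"), (400, "BB")]` and a slot from 0 to 500 µs, the package spends 200 µs in `"BB"` and 150 µs each in `"BI"` and `"II"`, so the pattern is 40%, 30%, and 30%.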

Prediction Algorithm
  • Kalman Filter Model used for prediction
    • Time series (observed CPU package patterns) is a Markov process
    • Observation made every 500 µs
    • KFM generalized in Dynamic Bayesian Networks
      • Explicit probability definition (Bayesian theory)
      • Good network structure description (graph theory)
  • Algorithm
    • Inputs
      • T historical observations of CPU package patterns. Each state's percentage series is treated as an independent variable.
      • A-priori state transition, deviation, and observation covariance
    • Interim outputs
      • Hidden conditional probability distribution
    • Final outputs
      • Prediction for (T+1)th CPU package pattern
    • Inference
      • Forward operator (1 to T)
      • Backward operator (T+1 back to 1)
    • Complexity (T: # history observations; N: # of activity states)
      • O(TN^3)
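
The slides do not spell out the filter equations. As a reference point, the standard Kalman filter forward recursions are sketched below, assuming a linear Gaussian state-space model. The symbols A, C, Q, R, x̂, P, and K are textbook notation rather than anything taken from the presentation; the a-priori inputs listed above presumably play the roles of A, Q, and R. The backward (smoothing) pass is not shown.

```latex
% Assumed model (textbook linear Gaussian state space; notation is not from the slides)
\begin{aligned}
x_t &= A\,x_{t-1} + w_t, & w_t &\sim \mathcal{N}(0, Q) \\
y_t &= C\,x_t + v_t,     & v_t &\sim \mathcal{N}(0, R)
\end{aligned}

% Forward pass, t = 1, \dots, T
\begin{aligned}
\hat{x}_{t|t-1} &= A\,\hat{x}_{t-1|t-1} \\
P_{t|t-1} &= A\,P_{t-1|t-1}\,A^{\top} + Q \\
K_t &= P_{t|t-1}\,C^{\top}\bigl(C\,P_{t|t-1}\,C^{\top} + R\bigr)^{-1} \\
\hat{x}_{t|t} &= \hat{x}_{t|t-1} + K_t\bigl(y_t - C\,\hat{x}_{t|t-1}\bigr) \\
P_{t|t} &= (I - K_t C)\,P_{t|t-1}
\end{aligned}

% One-step-ahead prediction of the (T+1)th pattern
\hat{y}_{T+1|T} = C\,A\,\hat{x}_{T|T}
```

With N state variables, each step involves N×N matrix products and one matrix inversion, which cost O(N^3); repeating this over T observations gives the O(TN^3) total stated above.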
Algorithm Simplification
  • 2^N states → 3 states (all busy, all idle, partial idle)
  • One-step forward and backward computation
    • Forward: store the intermediate results from step T-1
    • Backward: compute only step T+1
  • Complexity of simplified algorithm
    • Best case: O(1)
    • Worst case: O(T), when the stored intermediate results must be discarded and the computation restarted

Co-processor based prediction time estimate
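
The O(1) best case and O(T) worst case can be illustrated with a scalar Kalman filter that stores only its latest estimate. This is a sketch under stated assumptions, not the authors' algorithm: it filters each of the three aggregated states independently, uses a random-walk model (`a=1.0`) with made-up noise variances `q` and `r`, and covers only the forward step. The names `IncrementalKalman` and `predict_pattern` are hypothetical.

```python
STATES = ("all_busy", "all_idle", "partial_idle")


class IncrementalKalman:
    """Scalar Kalman filter that keeps only the latest estimate (hypothetical helper)."""

    def __init__(self, a=1.0, q=1.0, r=4.0, x0=0.0, p0=100.0):
        # a: state transition, q: process variance, r: observation variance (all assumed).
        self.a, self.q, self.r = a, q, r
        self.x0, self.p0 = x0, p0
        self.reset()

    def reset(self):
        """Discard the stored intermediate results."""
        self.x, self.p = self.x0, self.p0

    def observe(self, y):
        """O(1): fold in one observation and return the prediction for the next slot."""
        x_pred = self.a * self.x
        p_pred = self.a * self.a * self.p + self.q
        k = p_pred / (p_pred + self.r)
        self.x = x_pred + k * (y - x_pred)
        self.p = (1.0 - k) * p_pred
        return self.a * self.x

    def replay(self, history):
        """O(T): start over and re-run the forward pass on the whole history."""
        self.reset()
        pred = self.a * self.x
        for y in history:
            pred = self.observe(y)
        return pred


def predict_pattern(filters, observed):
    """Feed one observed pattern (state -> residency %) and return the predicted next pattern."""
    return {s: filters[s].observe(observed[s]) for s in STATES}
```

In normal operation one would create `filters = {s: IncrementalKalman() for s in STATES}` once and call `predict_pattern` every observation slot, so each prediction reuses the stored estimate. `replay` corresponds to the worst case, where the stored results are thrown away and the history is processed again.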


Result

For DP CPU, 4 variables: (busy, busy)%, (busy, idle)%, (idle, busy)%, (idle, idle)%; 3 of them are independent; no aggregation for partial idles

Ground-truth value

Predicted value

Distance from ground truth is the prediction error

Smoothed value follows the observed value very well

Result – Cont’d

All states prediction: useful for location aware optimization

All-busy, all-idle, partial-idle prediction: useful for shared power plane optimization

Benefit Analysis Method
  • Trace idle-busy events on a real quad-core processor
  • Simulate OSPM C-state decision making (baseline)
  • Simulate C-state decisions based on the prediction result
    • Prediction error injected
  • Accumulate each C-state's power and transition energy cycle by cycle
  • Accumulated energy / run time = average power
  • Compare prediction-based C-state selection against the OSPM baseline
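
The energy accounting in the steps above can be sketched as follows. The power and transition-energy numbers are invented purely for illustration and do not come from the slides or the ACPI spec, the function name `average_power` is hypothetical, and the sketch accumulates per interval rather than cycle by cycle.

```python
# Hypothetical per-state power (watts) and entry/exit transition energy (microjoules).
POWER_W = {"C0": 10.0, "C1": 4.0, "C3": 1.0}
TRANSITION_UJ = {"C0": 0.0, "C1": 50.0, "C3": 200.0}


def average_power(trace):
    """trace: list of (state, duration_us) intervals. Returns average power in watts."""
    energy_uj = 0.0
    run_time_us = 0.0
    for state, duration_us in trace:
        # Residency energy (W x us = uJ) plus the one-off cost of entering the state.
        energy_uj += POWER_W[state] * duration_us + TRANSITION_UJ[state]
        run_time_us += duration_us
    return energy_uj / run_time_us


# Same 100 us idle period handled two ways (illustrative traces, not measured data).
baseline = [("C0", 100.0), ("C1", 100.0)]
predicted = [("C0", 100.0), ("C3", 100.0)]
saving_w = average_power(baseline) - average_power(predicted)
```

With these made-up numbers the deeper state wins despite its higher transition energy because the idle period is long enough to amortize it; a much shorter idle period would reverse the comparison.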
Benefit Result

*: power delta < transition energy (if OSPM selects C2/C3) or idle length > C2/C3 latency (if OSPM selects C1)

**: based on power numbers in figure 1 (source: ACPI spec [1]). They do not represent the real power of our experimentation processor.

***: based on latency numbers in figure 1 (source: ACPI spec [1]). They do not represent the real latency of our experimentation processor.

Summary & Future Work
  • Good problem modeling and prediction are key to taking full advantage of deep C-states' power benefit
  • KFM works for CPU package pattern prediction on SPECweb
  • Future work: evaluate more workloads under more general assumptions