Phase Detection

Phase Detection Jonathan Winter Casey Smith CS 612 04/05/05

Motivation
• Large-scale phases exist (on the order of millions of instructions)
• For many programs, if we look at any interesting metric (cache misses, IPC, etc.), we see repeating behavior
• Call the regions with similar behavior “phases”
• Knowledge of phase-based behavior can be used for adaptive optimization
• Current hardware doesn’t exploit phase behavior
• For instance:
  • A region of execution may only need a small cache: save power or increase performance by shrinking it
  • A region of execution may benefit from data structure reorganization

Basic Methodology
• Identify phase boundaries
• Classify phases
• Determine what optimizations to perform for each phase
When can each step be performed? Run time, compile time, or offline

Overview
• We’ll focus on two papers on phase detection:
  • Sherwood, Sair, and Calder, “Phase Tracking and Prediction,” ISCA 2003
  • Shen, Zhong, and Ding, “Locality Phase Prediction,” ASPLOS 2004

Sherwood et al. 2003
• Classifies the behavior of a program into phases based on code execution
• Finds strong correlations between code execution phases and important performance and energy metrics
• Simulates hardware for real-time detection and prediction of phases
• Demonstrates usefulness through a variety of optimization techniques made possible by phase detection

Definition of a Phase
• Previously (stemming from Denning 1972), a phase was defined as an interval of execution where a measured program metric stayed relatively constant.
• Sherwood et al. consider all sections of code with similar values for the program metric to be part of the same phase, even if the intervals are spread out over the course of the program’s execution.

Key Program Metrics
• Instructions per cycle (IPC), energy, branch prediction accuracy, data cache misses, instruction cache misses, and L2 cache misses are all vital statistics for optimizing speed and power consumption

Single Unified Metric
• Goal: find a single metric that
  • Uniquely distinguishes phases
  • Guides optimization and policy decisions
• Need some interval of execution over which to measure this metric: pick 10M instructions
  • Much longer time span than typical architectural techniques handle
  • Long enough to capture large-scale behavior
  • Short enough to capture detailed phase behavior
  • About the size of an OS timeslice

Metric for Classification
• Based on basic blocks
  • A basic block is a section of code with one entry point and one exit point
• Basic Block Vector (BBV)
  • Count the number of times each basic block is executed in the 10M-instruction interval
  • Each entry in the vector is the product of the number of times a basic block is executed and that block’s length (BB1*L1, BB2*L2, BB3*L3, …)
  • This vector is a signature of the phase which correlates well with other metrics of interest: IPC, cache misses, etc.
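A minimal sketch of the basic block vector for one interval, assuming the interval is available as a list of executed block names and that block lengths (in instructions) are known. The names `basic_block_vector`, `executed_blocks`, and `block_lengths` are illustrative and do not come from the paper.

```python
from collections import Counter


def basic_block_vector(executed_blocks, block_lengths):
    """Return {block: execution_count * block_length} for one interval.

    executed_blocks: sequence of block names in execution order (hypothetical input).
    block_lengths:   mapping from block name to its length in instructions.
    """
    counts = Counter(executed_blocks)
    return {bb: counts[bb] * block_lengths[bb] for bb in counts}


if __name__ == "__main__":
    trace = ["BB1", "BB2", "BB1"]
    lengths = {"BB1": 4, "BB2": 10}
    print(basic_block_vector(trace, lengths))
```

Because each entry is weighted by block length, the entries sum to the number of instructions executed in the interval.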

Advantages of BBVs
• Independent of architectural measures and thus unaffected by optimizations
• Weighting biases the signatures toward more frequently executed instructions
• Creates distinct signatures for intervals that execute the same code but in different proportions

Hardware Implementation
• Don’t want to store and examine the whole vector: compress to a 32-entry vector (footprint)
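The slide does not say how the compression is done. One plausible sketch, assuming each basic block is identified by its start address and folded into one of 32 buckets by a simple hash (the shift-and-modulo hash here is purely hypothetical):

```python
NUM_BUCKETS = 32  # footprint size from the slide


def footprint(weighted_blocks):
    """Fold a basic block vector into a 32-entry footprint.

    weighted_blocks: mapping from block start address to its weighted count
                     (execution count * block length).
    The hash (address >> 2) % 32 is an assumption made for illustration only.
    """
    buckets = [0] * NUM_BUCKETS
    for address, weight in weighted_blocks.items():
        buckets[(address >> 2) % NUM_BUCKETS] += weight
    return buckets


if __name__ == "__main__":
    fp = footprint({0x1000: 8, 0x1004: 10, 0x1080: 2})
    print(fp[:4])
```

Blocks that hash to the same bucket are merged, so the footprint is an approximation of the full vector.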

Visualization of the Footprints
[Figure: footprints for different intervals of gzip]

What do we do with our footprint?
• Store a small sample of representative footprints as phase signatures
• Compare the current footprint to previously stored footprints
• If we have a close enough match, we classify them as the same phase
• If not, we store the new footprint as the representative member of a new phase

Comparing Footprints
• To save space, only store the top 6 bits of each entry in the 32-entry vector
• The counters are 24-bit saturating counters
• The smallest value that the maximum entry could have occurs when all 10M instructions are distributed evenly across the 32 entries
• In that case, keeping the top six bits means that a counter value of 10M/32 is stored as 1
• Distance between footprints is defined as the Manhattan distance: the sum of the absolute differences between corresponding entries in two vectors
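The arithmetic above can be checked with a short sketch. It assumes “top 6 bits” means a right shift by 18 of a 24-bit saturating counter; the function names are illustrative.

```python
COUNTER_BITS = 24
KEPT_BITS = 6
COUNTER_MAX = (1 << COUNTER_BITS) - 1


def top_bits(counter):
    """Saturate to 24 bits, then keep only the 6 most significant bits."""
    saturated = min(counter, COUNTER_MAX)
    return saturated >> (COUNTER_BITS - KEPT_BITS)


def manhattan(a, b):
    """Sum of absolute differences between corresponding entries."""
    return sum(abs(x - y) for x, y in zip(a, b))


if __name__ == "__main__":
    even_share = 10_000_000 // 32       # 312,500 instructions per entry
    print(top_bits(even_share))          # 312,500 >> 18
    print(manhattan([1, 0, 3], [0, 2, 3]))
```

Since 2^18 = 262,144 is just below 10M/32 = 312,500, an evenly spread interval still leaves a nonzero value in every entry.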

Finding a Match
• If the Manhattan distance is less than a threshold, two footprints are classified as being in the same phase
• Choose the threshold by measuring false positives and false negatives against an offline oracle tool
• Threshold of 220 chosen
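A sketch of the match-or-allocate step, under stated assumptions: signatures are kept in a plain list, the closest signature wins when several are within the threshold (the slides do not say how ties are resolved), and the threshold is passed in by the caller. The value used in the example is a toy value for short vectors, not the one from the paper.

```python
def manhattan(a, b):
    """Sum of absolute differences between corresponding entries."""
    return sum(abs(x - y) for x, y in zip(a, b))


def classify(footprint, signatures, threshold):
    """Return the phase ID for a footprint, allocating a new phase if needed.

    signatures is mutated: a footprint with no stored signature within the
    threshold is appended as the representative of a new phase.
    """
    best_id, best_dist = None, None
    for phase_id, signature in enumerate(signatures):
        dist = manhattan(footprint, signature)
        if dist < threshold and (best_dist is None or dist < best_dist):
            best_id, best_dist = phase_id, dist
    if best_id is None:
        signatures.append(list(footprint))
        return len(signatures) - 1
    return best_id


if __name__ == "__main__":
    table = []
    for fp in ([10, 0, 0, 0], [9, 1, 0, 0], [0, 0, 10, 0]):
        print(classify(fp, table, threshold=4))
```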

Opportunity
• These classification methods are oversimplified
• Opportunity to apply better machine learning techniques

Within-Phase Homogeneity
• Within a phase, architectural metrics have nearly constant values (this is what we were aiming for)

Phase Prediction
• Once we’ve been through an interval, we can identify the phase easily
• But we want to know what phase we’re going to go to next
• We need to know what phase we will be in before the interval starts in order to perform useful optimizations (such as changing the cache size)

Simple Prediction
• We could just predict that the next phase would be the same as the current phase
• The program tends to change phases more slowly than our 10M intervals, so this actually gives reasonable accuracy
• However, we can do better
• Note: standard hardware predictors have not been tried (branch prediction, memory disambiguation, etc.)
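As an illustration of why the “same as current” guess works when phases span several intervals, here is a small sketch that scores it on a sequence of phase IDs. The function name and the example sequence are made up.

```python
def last_phase_accuracy(phases):
    """Fraction of intervals whose phase equals the previous interval's phase."""
    if len(phases) < 2:
        return 0.0
    hits = sum(1 for prev, cur in zip(phases, phases[1:]) if prev == cur)
    return hits / (len(phases) - 1)


if __name__ == "__main__":
    # Three intervals of phase A followed by two of phase B:
    # the only miss is at the A -> B transition.
    print(last_phase_accuracy(list("AAABB")))
```

Every miss corresponds to a phase transition, which is exactly where a prediction would be most useful.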

Markov Model Predictor
• Phase changes depend on the set of previous phases and the duration of their execution
• Phases tend to last many intervals, so studying recent history doesn’t provide more information than the current state
• Need to encode how long we’ve been in the current state
• Predict the length of a phase to be the same as it was previously

Run Length Encoding

Opportunity
• RLE Markov model is overly simple
• Better prediction techniques exist
• Make use of the order of previous states rather than just the length of the current state

Prediction Accuracy

Applications
• Frequent value locality
  • Certain data values form the bulk of loads
  • Compress to save energy
  • Specialize code segments to common values
• Dynamic cache size adaptation
  • Shrink cache size to save energy
• Dynamic processor width adaptation
  • Fetch/decode/issue fewer instructions per cycle when IPC will be low anyway

Frequent Value Locality

Cache Size Adaptation

Processor Width Adaptation

Summary of BBV Method
• Divide program into 10M-instruction intervals
• Characterize each interval by a footprint approximation to the basic block vector
• Classify intervals as phases based on footprint
• Predict future phases with the RLE Markov predictor
• Use information about phases to exploit frequent value locality and optimize cache size and processor width for performance/energy

Bottom Line
• Classifying phases based on the frequency of executed basic blocks is effective at partitioning the program into regions of homogeneous architectural behavior
• Significant energy savings with small performance degradation can be achieved by applying phase-specific optimizations

Shen et al. 2004
• Defines phases in a totally different way
• Phases have variable lengths (not 10M intervals)
• Detects phases by finding likely phase boundaries
• Uses offline analysis of programs on test inputs to predict behavior on other inputs

Metric of Interest
• For optimizing cache size, what we really care about is the locality of reference
• Measure the locality directly, and classify phases based on that
• Independent of optimizations performed: the phases recovered do not depend on the hardware the program runs on

Reuse Distance
• Define the reuse distance as the number of distinct data elements (locations in memory) touched between two consecutive references to the same element
• The distance is recorded at the second reference
• Example:
  trace:    a b c b b a c
  distance: - - - 1 0 2 2
• Also called LRU stack distance
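The definition can be checked against the slide’s example with a direct (quadratic-time) sketch. First references have no distance and are reported as `None`; the function name is illustrative.

```python
def reuse_distances(trace):
    """Reuse distance at each reference, or None for a first reference.

    The distance is the number of distinct elements touched strictly between
    the previous reference to the same element and the current one.
    """
    last_seen = {}
    distances = []
    for i, element in enumerate(trace):
        if element in last_seen:
            between = trace[last_seen[element] + 1:i]
            distances.append(len(set(between)))
        else:
            distances.append(None)
        last_seen[element] = i
    return distances


if __name__ == "__main__":
    print(reuse_distances("abcbbac"))
```

This naive version rescans the trace for every reuse; it is meant to mirror the definition, not to scale to real traces.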

Overview
• Simulate a test run and record reuse distance throughout the program
• Use this to separate the program into “phases”
• Insert phase markers into binary code
• Predict when phase changes will occur
• Use information about phases to adjust cache size or other hardware parameters

New Definition of Phase
• Here, a phase is a unit of repeating behavior, rather than a unit of nearly uniform behavior
• A phase change is an abrupt change in the data reuse pattern

Reuse Trace

Why Offline Analysis?
• Compilers cannot fully analyze data locality in programs with indirect referencing or dynamic structures
• Hardware methods like the one presented earlier require many severe approximations for real-time analysis
• Solution: take the method offline and analyze program behavior on test inputs

Phase Detection Process
• Record reuse trace
• Apply signal processing techniques to extract useful information from the trace
• Use the extracted information to find good places for phase transitions

1) Record Reuse Trace
• Nontrivial programs access data locations so many times that an actual full trace would be overwhelming
• Just sample a representative set of memory locations/reuse distances
• Threshold to reduce trace size and remove irrelevant data
  • Throw out short distances (e.g., C[i] = C[i] + 2)
  • Throw out references to nearby memory locations

2) Signal Processing
• Use wavelet filtering to find abrupt changes in reuse distance for each recorded memory location
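The slide does not name the wavelet or the filtering rule. As a stand-in, the sketch below uses the simplest possible choice, an undecimated one-level Haar detail (a scaled difference of neighbouring samples), and flags positions where its magnitude exceeds a caller-chosen threshold. Both the wavelet choice and the threshold are assumptions for illustration.

```python
import math


def haar_detail(signal):
    """Undecimated level-1 Haar detail: (s[i+1] - s[i]) / sqrt(2)."""
    return [(signal[i + 1] - signal[i]) / math.sqrt(2)
            for i in range(len(signal) - 1)]


def abrupt_changes(signal, threshold):
    """Indices where the signal jumps by more than the threshold (in detail units)."""
    return [i + 1 for i, d in enumerate(haar_detail(signal)) if abs(d) > threshold]


if __name__ == "__main__":
    # Reuse distances seen at one sampled memory location (made-up values).
    distances = [5, 5, 5, 50, 50, 50, 4, 4]
    print(abrupt_changes(distances, threshold=10))
```

Small fluctuations produce small detail coefficients and are filtered out, while a sudden shift in reuse distance produces a large one.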

3) Phase Partitioning
• Now we have points representing locations of abrupt changes in reuse distance for individual memory locations
• Want to divide the list with two things in mind:
  • Maximize phase length
  • Minimize repetitions of memory locations within a phase (no multiple abrupt changes)
• Example:
  abcdeefabdfccabef
  abcde | efabdfc | cabef
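One way to trade the two goals against each other is a dynamic program over cut points. The sketch below assumes a cost of 1 per phase (which favours long phases) plus `alpha` per repeated element inside a phase (which favours few repetitions). The cost function and the `alpha` weight are assumptions for illustration and are not taken from the slides, so the result need not match the example above.

```python
def partition(points, alpha):
    """Split a sequence into segments minimizing (segments + alpha * repeats)."""
    n = len(points)
    best = [0.0] + [float("inf")] * n   # best[j]: minimum cost of points[:j]
    cut = [0] * (n + 1)                 # cut[j]: start of the last segment
    for j in range(1, n + 1):
        for i in range(j):
            segment = points[i:j]
            repeats = len(segment) - len(set(segment))
            cost = best[i] + 1 + alpha * repeats
            if cost < best[j]:
                best[j], cut[j] = cost, i
    segments = []
    j = n
    while j > 0:
        segments.append(points[cut[j]:j])
        j = cut[j]
    return segments[::-1]


if __name__ == "__main__":
    print(partition("abab", alpha=1.0))
    print(partition("abab", alpha=0.0))
```

With `alpha` at 0 repetitions are free and everything collapses into one phase; raising it makes the partition cut wherever an element would otherwise repeat.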

Missing Link
• So now we have locations of phase transitions
• How do we detect which regions are the same phase? The paper doesn’t say
• Missing section in paper?
• Assume we can somehow classify the regions into phases

Phase Markers
• We know how often a phase occurs and approximately where its boundaries are
• Goal: find markers that tell us when we’re entering a particular phase
• For each phase, look for basic blocks that execute once near the beginning of each occurrence of the phase, and only near those beginnings
• Use that basic block as a marker to tell when the program enters that phase

Using Phases
• Now we know what basic blocks signal phase entry points
• Run the program with new input
• When we enter a phase for the first time, we record how long it lasts and its locality properties
• Assume that these properties will hold for all subsequent executions of the same phase

Phase Prediction Performance

Negative Examples
• Not all programs have phases of repeating behavior that can be identified from test runs

Applications
• Adaptive cache resizing
  • Potential performance increase
  • Potential power savings
• Memory remapping
  • Reorder data in memory to speed up execution

Adaptive Cache Resizing
• Shrink cache without increasing miss ratio
• Phases have repeating behavior, not uniform behavior
• Divide phases into 10K intervals
• First couple of times we execute a phase follow test properties
• Apply those cache sizes to subsequent executions of the phase
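To see how reuse distances translate into a cache size, recall that in a fully associative LRU cache holding C elements, a reference hits exactly when its reuse distance is less than C. The sketch below uses that property to find the smallest capacity that adds no misses relative to a maximum size. It models an idealized cache measured in elements, which is an assumption; real caches have lines, sets, and limited associativity.

```python
def miss_count(distances, capacity):
    """Misses in a fully associative LRU cache holding `capacity` elements.

    distances: reuse distance per reference, None for a first (cold) reference.
    """
    return sum(1 for d in distances if d is None or d >= capacity)


def smallest_capacity(distances, max_capacity):
    """Smallest capacity with no more misses than max_capacity gives."""
    baseline = miss_count(distances, max_capacity)
    for capacity in range(1, max_capacity + 1):
        if miss_count(distances, capacity) == baseline:
            return capacity
    return max_capacity


if __name__ == "__main__":
    # Reuse distances for the trace "abcbbac" from the earlier slide.
    distances = [None, None, None, 1, 0, 2, 2]
    print(smallest_capacity(distances, max_capacity=8))
```

For this trace only the three cold misses remain once the capacity reaches 3, so any larger cache is wasted for that stretch of execution.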

Cache Size Reductions

Cache Size Reductions with 5% Miss Increase

Memory Remapping
• Reorder data in memory to speed up execution
• For example, we might interleave arrays that tend to be accessed together
• Options:
  • Analyze whole program to find array affinities
  • Analyze by phase and reorganize data during execution (should take into account the cost of remapping, but the authors don’t)
