
Green Governors: A Framework for Continuously Adaptive DVFS

Presentation Transcript


  1. Green Governors: A Framework for Continuously Adaptive DVFS Vasileios Spiliopoulos, Stefanos Kaxiras Uppsala University, Sweden

  2. Introduction
  • Optimize power efficiency
    • Reduce power without harming performance
  • Goal: minimize power-efficiency metrics
    • Energy delay product (EDP), energy delay squared product (ED2P), etc.
  • Exploit memory slack
    • Applications with many LLC misses → memory becomes the bottleneck
    • Performance insensitive to processor frequency
    • Scaling frequency down → high energy benefit at low performance cost
  • Develop analytical models to predict the impact of frequency scaling
    • No empirical parameters
    • No training period
    • Suitable for run-time use

  3. Modeling DVFS
  • Theoretical (work in simulator)
    • Extend previous interval-based models (Karkhanis and Smith, ISCA 2004; Eyerman et al., ACM TOCS, 2010) → two models for runtime DVFS management
    • Miss-based & stall-based models → differ in accuracy and ease of implementation
    • Estimate energy benefits and performance loss
    • G. Keramidas, V. Spiliopoulos, and S. Kaxiras. "Interval-Based Models for Run-Time DVFS Orchestration in SuperScalar Processors." Proc. of Int. Conference on Computing Frontiers, 2010
  • Implementation in real hardware
    • Apply the model for power-performance adaptation in real processors
    • Case study: Intel Core i7
    • Approximate models based on available performance monitoring hardware
    • Estimate power characteristics of real hardware
    • V. Spiliopoulos, S. Kaxiras, and G. Keramidas. "Green Governors: A Framework for Continuously Adaptive DVFS." International Green Computing Conference (IGCC'11)

  4. Interval-based Performance Model
  • Break the execution time of a program into intervals
  • Steady-state intervals: the IPC is limited by the machine width and the program's ILP
  • Miss intervals: introduce stall cycles due to branch mispredictions, on-chip instruction/data misses, and LLC misses (off-chip misses)
  [Figure: instruction rate (IPC) over cycles, showing steady-state IPC and miss intervals for branch mispredictions, on-chip instruction/data misses, and off-chip LLC misses]

  5. Interval-based DVFS Model (step 1)
  • Miss intervals and frequency scaling (time measured in cycles)
    • Branch-misprediction miss intervals → same penalty (in cycles) at all frequencies
    • On-chip data/instruction miss intervals → same penalty (in cycles) at all frequencies
    • LLC (off-chip) miss intervals → for DVFS, only account for this interval type
  [Figure: instruction rate (IPC) over cycles, as in the previous slide, with the LLC (off-chip) miss interval highlighted]

  6. Interval-based DVFS Model (step 2)
  • LLC miss interval and frequency scaling
  • Model core frequency scaling as a change in memory latency measured in cycles
  • Example: memory access time = 100 ns
    • f = 1 GHz → T = 1 ns → mem_lat = 100 cycles
    • f = 500 MHz → T = 2 ns → mem_lat = 50 cycles
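
To make the conversion concrete, here is a small C sketch (my own illustration, not code from the talk; the helper name mem_lat_cycles is made up): the latency of a fixed-time memory access, expressed in core cycles, scales linearly with the core frequency.

```c
#include <stdio.h>

/* DRAM access time in nanoseconds is fixed under DVFS, so its cost in
 * core cycles scales with the core frequency: cycles = ns * GHz. */
static double mem_lat_cycles(double mem_access_ns, double freq_ghz)
{
    return mem_access_ns * freq_ghz;
}

int main(void)
{
    printf("f = 1 GHz   -> mem_lat = %.0f cycles\n", mem_lat_cycles(100.0, 1.0));
    printf("f = 500 MHz -> mem_lat = %.0f cycles\n", mem_lat_cycles(100.0, 0.5));
    return 0;
}
```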

  7. Interval-based DVFS Model (step 2)
  • LLC miss interval and frequency scaling
  • Model core frequency scaling as a change in memory latency in cycles
  [Figure: anatomy of an LLC (off-chip) miss interval over cycles: steady-state IPC, IQ drain, full stall for the memory latency, ROB fill, and ramp-up back to steady-state IPC]

  8. Frequency Scaling == Change in Memory Latency
  • Lower frequency: lower memory latency (in cycles), smaller full-stall area
  • Other areas (ROB fill, IQ drain and ramp-up) remain intact
  [Figure: the same LLC miss interval at two frequencies; the full-stall area shrinks with the memory latency while IQ drain, ROB fill and ramp-up are unchanged]

  9. DVFS Target: Eliminate the Slack
  • Scale frequency down until the memory latency (in cycles) shrinks to the ROB fill time
  • At that point there is no more available slack due to off-chip misses
  • Further frequency reduction → performance penalty
  [Figure: the LLC miss interval before and after scaling; the full-stall area disappears once the memory latency matches the ROB fill time]

  10. Elastic and Non-Elastic Areas
  • Target: eliminate "slack" by reducing memory latency, but:
    • ROB fill area: DOES NOT shrink → inelastic area
    • Full-stall, IQ drain and ramp-up: DO shrink → elastic areas
  [Figure: LLC miss interval with its elastic (full-stall, IQ drain, ramp-up) and inelastic (ROB fill) areas marked]

  11. Two Simple Interval-Based Models
  • Stall-based model
    • Fed by in-core information
    • Assumes all stalls scale with frequency
    • Disregards the ROB fill area
    • Can be used in real hardware
  • Miss-based model
    • Fed by information from the memory system
    • Accounts for both elastic and inelastic areas
    • Required information not available in current hardware

  12. Stall-based Model
  • Assume (all) stalls scale with f
    • Not true due to ROB fill
  • Exec cycles at f/k: c_init − stalls + stalls/k
  [Figure: LLC miss interval with the measured stall cycles marked]

  13. Miss-based Model
  • Assumes the whole miss interval scales with f
  • Exec cycles at f/k: c_init − misses*mem_lat + (misses*mem_lat)/k
  [Figure: LLC miss interval with the full memory latency marked]

  14. Miss-based Model, more…
  • Important implication for overlapping misses!
  • Stalls of misses that occur under an outstanding miss do not scale, because of the inelastic ROB fill
  • The miss-based model therefore predicts execution cycles based on the number of clusters of misses (both predictors are sketched below)
  [Figure: two misses (Miss1, Miss2) issued a distance d apart; their memory-latency windows overlap and form a single miss cluster]
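
Both predictors reduce to one-line formulas. The following C sketch is my own illustration (variable names such as c_init, stalls and clusters are mine, not the authors'); the miss-based variant counts clusters of overlapping misses, as described above.

```c
/* Predicted execution cycles when scaling the core from f to f/k.
 * c_init   : measured cycles at the original frequency f
 * stalls   : stall cycles attributed to LLC misses (stall-based input)
 * clusters : number of clusters of overlapping LLC misses (miss-based input)
 * mem_lat  : memory latency in cycles at frequency f
 */
static double cycles_stall_based(double c_init, double stalls, double k)
{
    /* Assumes every stall cycle is elastic (ignores the ROB fill area). */
    return c_init - stalls + stalls / k;
}

static double cycles_miss_based(double c_init, double clusters,
                                double mem_lat, double k)
{
    /* Scales one memory latency per miss cluster, so overlapping misses
     * are not double-counted. */
    return c_init - clusters * mem_lat + clusters * mem_lat / k;
}
```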

  15. Real Hardware Approximations
  • Cannot apply the miss-based model
    • No "clusters of misses" counter available
  • Cannot apply the stall-based model as is
    • No "stalls due to LLC misses" counter available
  • Approximate stall-based model
    • Approximate LLC stalls with the minimum of all pipeline stalls and the worst-case stalls due to LLC misses (LLC misses * mem_lat)
  • Good accuracy
    • Predict execution time going from f_min to f_max and vice versa
    • Less than 5% average error
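
A minimal C sketch of that approximation (illustrative only; the inputs stand for Core i7 counter readings, not exact event names):

```c
/* LLC-miss stall cycles are not directly countable on the Core i7, so
 * approximate them with the smaller of (a) all pipeline stall cycles and
 * (b) the worst case of every LLC miss exposing a full memory latency. */
static double approx_llc_stalls(double pipeline_stalls,
                                double llc_misses, double mem_lat)
{
    double worst_case = llc_misses * mem_lat;
    return pipeline_stalls < worst_case ? pipeline_stalls : worst_case;
}
```

The approximated stalls can then be fed to the stall-based predictor sketched after slide 14.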

  16. Measuring power

  17. Power prediction
  • Previous researchers correlated total power (P = a·C·f·V² + P_static) with performance-counter events
  • We correlate the effective capacitance C in P = a·C·f·V² + P_static with performance-counter events
  • Run a set of benchmarks
    • Compute the effective capacitance (the a·C product) of benchmark i from its measured power: C_i = (P_i − P_static) / (f·V²)
    • Estimate Ĉ_i as a weighted sum of performance-counter event rates
    • Minimize the squared error Σ_i (C_i − Ĉ_i)² to obtain the per-event weights
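
For concreteness, a sketch of the training objective (my own C illustration; the flat rates[] layout and the function name are assumptions, and the actual minimization can be done with any off-the-shelf least-squares solver):

```c
/* Sum of squared errors between the measured effective capacitances C[i]
 * and the model estimate sum_j w[j] * rates[i][j], where rates[i*n_events+j]
 * is the rate of performance-counter event j in benchmark i.  The weights
 * w[] are chosen offline, at a single frequency, to minimize this value. */
static double capacitance_fit_error(int n_bench, int n_events,
                                    const double *C, const double *rates,
                                    const double *w)
{
    double err = 0.0;
    for (int i = 0; i < n_bench; i++) {
        double c_hat = 0.0;
        for (int j = 0; j < n_events; j++)
            c_hat += w[j] * rates[i * n_events + j];
        double diff = C[i] - c_hat;
        err += diff * diff;
    }
    return err;
}
```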

  18. Power prediction
  • Only need to train the model at a single frequency
  • Prediction at other frequencies: reuse the counter-based estimate of C and evaluate P = C·f·V² + P_static with the target frequency and voltage
  • Events monitored:
    • Uops executed
    • L2 misses
    • L2 accesses
    • Resource stalls
    • FP operations
    • Branch mispredictions
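
At run time the counter-based capacitance estimate is reused at the target frequency and voltage. A hedged C sketch (the six-event layout and all names are assumptions based on the list above):

```c
/* Monitored events: uops executed, L2 misses, L2 accesses,
 * resource stalls, FP operations, branch mispredictions. */
#define N_EVENTS 6

/* Effective capacitance from per-cycle event rates and fitted weights. */
static double effective_capacitance(const double rates[N_EVENTS],
                                    const double weights[N_EVENTS])
{
    double c_eff = 0.0;
    for (int j = 0; j < N_EVENTS; j++)
        c_eff += weights[j] * rates[j];
    return c_eff;
}

/* Predicted power at a target operating point (f, V): P = C*f*V^2 + P_static. */
static double predict_power(double c_eff, double freq_hz, double volt,
                            double p_static)
{
    return c_eff * freq_hz * volt * volt + p_static;
}
```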

  19. Implementing Linux Frequency Governors
  • Linux kernel module that selects the frequency
  • Window-based approach
    • Run the application for a time window
    • Estimate performance (using the stall-based model) and power at any candidate frequency
    • Scale the frequency based on the policy of interest (a sketch of this selection loop follows below)
  • Implement different policies
    • Optimize EDP/ED2P with or without performance constraints
    • Single- & multi-process management
  • Experimental framework
    • Intel Core i7
    • SPEC2006 benchmark suite
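
Putting the sketches together, one possible shape of the per-window decision is shown below (illustrative C only; the real governor is a Linux kernel module and its structure is not given in the slides).

```c
/* Choose the P-state that minimizes predicted EDP for the last window.
 * freqs[]/volts[] : available frequencies (Hz) and voltages
 * f_cur           : frequency the window was measured at
 * c_init          : cycles executed during the window
 * llc_stalls      : approximated LLC stall cycles (see slide 15)
 * c_eff, p_static : effective capacitance estimate and static power
 */
static int pick_best_pstate(const double freqs[], const double volts[],
                            int n_states, double f_cur, double c_init,
                            double llc_stalls, double c_eff, double p_static)
{
    int best = 0;
    double best_edp = -1.0;

    for (int i = 0; i < n_states; i++) {
        double k = f_cur / freqs[i];                          /* target = f_cur/k */
        double cycles = c_init - llc_stalls + llc_stalls / k; /* stall-based model */
        double t = cycles / freqs[i];                         /* predicted runtime */
        double p = c_eff * freqs[i] * volts[i] * volts[i] + p_static;
        double edp = p * t * t;                               /* energy * delay */
        if (best_edp < 0.0 || edp < best_edp) {
            best_edp = edp;
            best = i;
        }
    }
    return best;   /* index of the EDP-optimal frequency */
}
```

An ED2P policy would weight delay once more (p * t * t * t), and a performance-constrained variant would skip candidate frequencies whose predicted slowdown exceeds the allowed limit.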

  20. Intel i7 single process (OptEDP)

  21. Intel i7 single process (OptEDPlimit)

  22. Intel i7 multi-process (OptEDP)

  23. Conclusions
  • DVFS modeling in simulators
  • Implement the model in real processors
    • Apply, explain and validate our model for SPEC2006
    • Contribution: optimize power efficiency using Linux frequency governors
  • Other uses of the models
    • PowerSleuth: combine the models with phase detection to characterize the power behavior of applications
  • Future work
    • Multi-threaded applications

  24. Thank You! Any questions?
