
Profile-based Dynamic Voltage Scheduling with Program Checkpoints

Profile-based Dynamic Voltage Scheduling with Program Checkpoints. The COPPER Team: Ana Azevedo, Ilya Issenin, Radu Cornea, Rajesh Gupta, Nikil Dutt, Alex Nicolau, Alex Veidenbaum. The COPPER Context. Compiler-controlled Power-Performance Management

Presentation Transcript


  1. Profile-based Dynamic Voltage Scheduling with Program Checkpoints
     The COPPER Team: Ana Azevedo, Ilya Issenin, Radu Cornea, Rajesh Gupta, Nikil Dutt, Alex Nicolau, Alex Veidenbaum

  2. The COPPER Context
     Compiler-controlled Power-Performance Management
     • Develop efficient architectural support and compiler techniques for power management
       • continuously, as an application runs
       • targeted at high-performance/VLIW machines
     • Coordinated management of multiple techniques
       • reduction in power with little or no loss of performance
     • Develop techniques for dynamic compilation to actively trade off performance and power consumption
     • Develop a retargetable, ADL-based, power-aware system simulation capability

  3. Approach
     • Compiler Strategies for Power Management
       • Compiler-directed architectural “configuration”
         • generate embedded “configuration code”
         • code “adapts” to new architectural organization at runtime
       • JIT vs. multi-version compilation techniques
         • dynamic, on-demand optimization
       • Code annotation for dynamic compilation
         • trade off compilation overhead for quality of generated code
     • Power-use Estimation for Compiler Control
       • static analysis to select “optimal” configuration
       • profile-based selection techniques
       • static or dynamic prediction methods

  4. Power/Performance “Knobs”
     • Memory hierarchy
     • Instruction issue logic and issue width for VLIW machines
     • Dynamic register file reconfiguration
     • Frequency and voltage scaling

  5. Timing Constraints
     • We consider timing constraints as bounds on operation intervals
       • upper and lower bounds
       • (the optimum interval separation can be determined statically)
     • Time constraints are specified via checkpoints
       • User-defined checkpoints are inserted in the source code, and time constraints between checkpoints are defined.
     • The problem addressed here:
       • Given a profile of power availability and constraints on specified operation intervals, minimize total processor energy consumption while meeting the timing and power profile constraints.

  6. Constrained Dynamic F/V Scaling
     • Power-performance profiling compiler
       • Estimates the maximum energy/cycle ratio and cycle count between checkpoints
       • Compiler-inserted checkpoints (frequency adjustment points) and user-inserted checkpoints (time constraints)
     • Run-time scheduler
       • Calculates the run-time frequency limit from the available power and the energy profile between the current checkpoint and all possible next checkpoints
       • Calculates the optimal target frequency from both the time constraints and the run-time frequency limit between the current checkpoint and all possible next checkpoints
       • Selects the final target frequency so that the code runs as slowly as possible within the imposed time constraints
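The run-time scheduler's three steps can be sketched in C as follows. This is a minimal illustration, not the COPPER implementation: the `transition_t` layout, the function names and the units (nJ per cycle, watts, MHz) are assumptions, and the power cap simply uses power = energy per cycle × frequency.

```c
#include <float.h>
#include <stddef.h>

/* Profile data for one possible next checkpoint (hypothetical layout). */
typedef struct {
    double max_cycles;        /* profiled worst-case cycles to reach it   */
    double max_time_ms;       /* time constraint on the transition, in ms */
    double max_nj_per_cycle;  /* profiled max energy/cycle ratio, in nJ   */
} transition_t;

/* Frequency cap from available power: P = E_cycle * f, so f <= P / E_cycle.
 * The worst energy/cycle ratio over all possible next checkpoints is used. */
double freq_limit_mhz(const transition_t *t, size_t n, double avail_power_w)
{
    double worst_nj = 0.0;
    for (size_t i = 0; i < n; i++)
        if (t[i].max_nj_per_cycle > worst_nj)
            worst_nj = t[i].max_nj_per_cycle;
    if (worst_nj <= 0.0)
        return DBL_MAX;                        /* no profile data: no cap */
    return avail_power_w / worst_nj * 1000.0;  /* W / nJ = 1000 MHz */
}

/* Slowest frequency that still meets every time constraint:
 * the largest cycles/time ratio over the possible next checkpoints. */
double required_freq_mhz(const transition_t *t, size_t n)
{
    double need = 0.0;
    for (size_t i = 0; i < n; i++) {
        double f = t[i].max_cycles / (t[i].max_time_ms * 1000.0); /* MHz */
        if (f > need)
            need = f;
    }
    return need;
}

/* Final target: run as slowly as the time constraints allow,
 * but never above the power-imposed limit. */
double target_freq_mhz(const transition_t *t, size_t n, double avail_power_w)
{
    double need = required_freq_mhz(t, n);
    double limit = freq_limit_mhz(t, n, avail_power_w);
    return need < limit ? need : limit;
}
```

When the limit is below the required frequency, this sketch returns the limit, so the time constraint can only be met if later intervals run faster. Handling that case properly is outside the sketch.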

  7. Program Checkpoints
     Program checkpoints are generated at compile time and indicate places in the code where the processor speed/voltage should be recalculated; checkpoints also carry user-defined time constraints.

     (a) Original code.

         foo() {
             read(i);
             if (i > 5) {
                 i = i - calc_new_i(i);
             } else {
                 a++;
             }
             i = 36;
             for (j = 0; j < i; j++) {
                 k = k*sin(j/100 + k/10);
             }
         }

         calc_new_i(int i) {
             for (k = 0; k < limit; k++) {
                 i += new_i[k];
                 show_value(i);
             }
         }

     (b) Transformed foo code with checkpoints 1, 2, 3 and 4 carrying time constraints.

         foo() {
             read(i);
             CHECKPOINT(1);
             if (i > 5) {
                 i = i - calc_new_i(i);
             } else {
                 a++;
             }
             i = 36;
             k = i + a;
             CHECKPOINT(2);
             for (j = 0; j < i; j++) {
                 CHECKPOINT(3);
                 k = k*sin(j/100 + k/10);
             }
             CHECKPOINT(4);
         }

     (c) Checkpoint Database (CDB).

         Checkpoint Transition   Min Time (ms)   Max Time (ms)
         1-2                     10              30
         2-3                     20              20
         3-3                     50              200
         3-4                     200             200
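The checkpoint database on this slide can be pictured as a small lookup table keyed by checkpoint transition. The sketch below is illustrative only: the type and function names are hypothetical, and the rows assume the flattened table reads as (transition, min time, max time).

```c
#include <stddef.h>

/* One CDB row; field names are hypothetical. */
typedef struct {
    int from, to;        /* checkpoint transition from -> to     */
    int min_ms, max_ms;  /* time bounds on the transition, in ms */
} cdb_entry_t;

/* Rows as read from the slide's CDB table (assumed column order). */
static const cdb_entry_t CDB[] = {
    {1, 2, 10, 30},
    {2, 3, 20, 20},
    {3, 3, 50, 200},
    {3, 4, 200, 200},
};
#define CDB_SIZE (sizeof CDB / sizeof CDB[0])

/* Return the constraint for a transition, or NULL if none is defined. */
const cdb_entry_t *cdb_lookup(int from, int to)
{
    for (size_t i = 0; i < CDB_SIZE; i++)
        if (CDB[i].from == from && CDB[i].to == to)
            return &CDB[i];
    return NULL;
}

/* Collect the transitions leaving `from`, i.e. the possible next
 * checkpoints a scheduler would have to consider. Returns the count. */
size_t cdb_successors(int from, const cdb_entry_t **out, size_t cap)
{
    size_t n = 0;
    for (size_t i = 0; i < CDB_SIZE && n < cap; i++)
        if (CDB[i].from == from)
            out[n++] = &CDB[i];
    return n;
}
```

From checkpoint 3, for example, `cdb_successors` yields both the loop back edge 3-3 and the loop exit 3-4, so both constraints apply when choosing a frequency inside the loop.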

  8. Basic Approach
     • Compiling phase: checkpoint profiling
       • Estimate max energy/cycle ratio and cycle count between checkpoints
       • Set time constraints
         • e.g., device response time, WCET
     • Scheduling phase
       • At program checkpoints and power profile change points, dynamically adjust frequency and voltage

  9. Example: Calculating optimal frequency
     [Figure: the frequency limit (determined by the available power profile) is lower than the potential optimal frequency.]

  10. Exploiting Runtime Slack

      (a) Transformed code with checkpoints carrying time constraints (0, 1, 3, 8, 9 and 10) and extra checkpoints for exploiting run-time slack.

          CHECKPOINT(0);
          read(i);
          CHECKPOINT(1);
          if (i > 5) {
              CHECKPOINT(2);
              i = i - calc_new_i(i);
          } else {
              CHECKPOINT(3);
              a++;
          }
          CHECKPOINT(4);
          i = 36;
          k = i + a;
          CHECKPOINT(5);
          for (j = 0; j < i; j++) {
              CHECKPOINT(6);
              k = k*sin(j/100 + k/10);
              CHECKPOINT(7);
          }
          CHECKPOINT(8);

          calc_new_i(i) {
              CHECKPOINT(9);
              for (k = 0; k < limit; k++) {
                  i += new_i[k];
                  show_value(i);
              }
              CHECKPOINT(10);
          }

      (b) Hierarchical control flow graph.
          [Figure: graph of checkpoint nodes 0 to 10 with "if", "func", "loop" and "end" nodes.]

      (c) Checkpoint Database (CDB).

          Checkpoint Transition   Max Time (ms)
          0-3                     50
          1-8                     300
          9-10                    10

  11. Slack-based Checkpointing
      • Compiling phase
        • Build a hierarchical CFG (HCFG) program representation
        • Insert checkpoints at function calls, loops, and if-statements
      • Checkpoint profiling and removal
        • Estimate the max energy/cycle ratio and cycle count between checkpoints, and the maximum iteration count for loops
        • Prune the HCFG, removing unnecessary checkpoints
          • Nodes with a low maximum execution cycle count
          • Nodes with small variation in the execution cycle count
        • Annotate the HCFG with the profiling information
      • Scheduling
        • Determine active checkpoint transitions from precomputed information
        • Estimate the number of cycles from the current node to the ends of the active time constraints. This is the minimum of the statically computed longest path to the time constraint and the execution delay updated from the profiling information (if available).
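The two pruning criteria can be expressed as a simple predicate over a node's profile. The sketch below is an assumption-laden illustration: the slides give no thresholds, so `min_worthwhile_cycles` and `min_variation` are hypothetical tuning parameters, and "variation" is taken here to mean the relative spread between the profiled minimum and maximum cycle counts.

```c
#include <stdbool.h>

/* Profiled execution cycle counts of one HCFG node (hypothetical layout). */
typedef struct {
    long min_cycles;  /* shortest profiled execution */
    long max_cycles;  /* longest profiled execution  */
} node_profile_t;

/* Decide whether the checkpoint guarding a node is worth keeping.
 * A node is pruned when it is too short to be worth rescheduling for,
 * or when its cycle count barely varies, so it exposes almost no slack. */
bool keep_checkpoint(const node_profile_t *p,
                     long min_worthwhile_cycles, /* assumed threshold       */
                     double min_variation)       /* assumed threshold, 0..1 */
{
    if (p->max_cycles <= 0 || p->max_cycles < min_worthwhile_cycles)
        return false;  /* low maximum execution cycle count */

    double variation =
        (double)(p->max_cycles - p->min_cycles) / (double)p->max_cycles;
    return variation >= min_variation;  /* prune on small variation */
}
```

With thresholds of 1000 cycles and 0.2, a node profiled at 9000 to 10000 cycles is pruned (variation 0.1), while one profiled at 2000 to 10000 cycles keeps its checkpoint.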

  12. Our Approach: Slack Algorithm
      • Algorithm at work
      [Figure: control flow graph with checkpoint nodes 1 to 10, path lengths labelled X1, X2 and Y1 cycles, and the current checkpoint (with I iterations left) marked.]

      Checkpoint Database (CDB)

          Time Constraint   Max Time
          1-9               T1
          6-10              T2

      Calculating estimated cycles C

          Method 1: C(7-10) = Y1
                    C(7-9)  = X2 + Y1 + I*cycle_per_iter
          Method 2: C(7-10) = cycle_per_iter - elapsed(6)
                    C(7-9)  = X1 - elapsed(5)
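The two estimates for the transition 7-9 can be combined as in the C sketch below, taking the smaller of them as the slack-based checkpointing slide describes. The struct, the function names and the reading of elapsed(5) as "cycles executed since checkpoint 5" are assumptions; X1, X2, Y1 and cycle_per_iter are the slide's own symbols.

```c
/* Profiled cycle counts from the slide's figure (hypothetical layout). */
typedef struct {
    double x1, x2, y1;      /* path lengths labelled X1, X2, Y1   */
    double cycle_per_iter;  /* worst-case cycles per loop iteration */
} slack_profile_t;

/* Method 1: static longest path from checkpoint 7 to checkpoint 9,
 * given I loop iterations still to run. */
double cycles_7_to_9_static(const slack_profile_t *p, int iters_left)
{
    return p->x2 + p->y1 + iters_left * p->cycle_per_iter;
}

/* Method 2: profiled bound X1 minus the cycles already spent
 * (assumed to be counted from checkpoint 5). */
double cycles_7_to_9_elapsed(const slack_profile_t *p, double elapsed_since_5)
{
    return p->x1 - elapsed_since_5;
}

/* Estimate used for scheduling: the smaller of the two bounds. */
double cycles_7_to_9(const slack_profile_t *p, int iters_left,
                     double elapsed_since_5)
{
    double a = cycles_7_to_9_static(p, iters_left);
    double b = cycles_7_to_9_elapsed(p, elapsed_since_5);
    return a < b ? a : b;
}
```

A smaller estimate of the remaining cycles means more slack, which in turn allows a lower target frequency for the same time constraint.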

  13. COPPER framework
      • MIPS R10K-like processor, Wattch power models
      [Figure: framework block diagram. Labels include Application, Time Constraints, Compiler, Code Versions, Power Profiler, Power Scheduler, Available Power, Chosen Code Version, Hardware Config, Cycle-by-Cycle Performance Simulator, Hardware Access Counts, Performance Estimate, Power Simulator, Parameterizable Cycle-Level Power Models, and Power Estimate.]

  14. Results
      • Power consumption highlighting time constraints for paraffins (f = 600 MHz)
      [Figure: power plot.]

  15. Results: Slack-based DVS for paraffins
      • Calculated target frequencies satisfying time and power constraints using Formula 1 for paraffins
      • Time constraint on checkpoint transition 4-7
      • 52% energy savings
      [Figure: frequency and power plots.]

  16. Results: Slack-based DVS for paraffins
      • Calculated target frequencies satisfying time and power constraints using Formula 2 for paraffins
      • 82% energy savings
      [Figure: frequency and power plots.]

  17. Summary
      • While average power reduction is important, effective control of dynamic power consumption is essential
        • especially for software management of power and performance
      • The hard problem here is
        • identification of effective architectural mechanisms and their deterministic control through software
      • COPPER approach
        • Use architectural features common to a range of processor architectures
          • memory hierarchy, register files, instruction issue
        • Coordinate with technology and OS strategies
          • frequency and voltage scaling

  18. Our Approach: Base Algorithm
      • Scheduling phase
        • Create list of events
        • Calculate frequency limit
        • Calculate optimal frequency
          • Case 1: One future checkpoint transition
          • Case 2: Frequency limit lower than potential optimal frequency
          • Case 3: Several possible future checkpoints

  19. Our Approach: Base Algorithm
      • Calculate optimal frequency (cont’d)
      [Figure: frequency (axis 0 to 800) plotted against time (axis 0 to 30), marking checkpoints 1, 2 and 3 and showing the frequency limit, the optimal frequency for ch1-ch2, the optimal frequency for ch1-ch3, and the final frequency values.]
      (a) Calculating optimal frequency, Case 1: One future checkpoint transition.
      (b) Calculating optimal frequency, Case 2: Frequency limit lower than potential optimal frequency.
      (c) Calculating optimal frequency, Case 3: Several possible future checkpoints.

  20. Baseline Architecture
      • A MIPS R10K-like processor
        • 4-wide issue, out-of-order (OOO) processor
        • 5-stage pipeline: fetch, dispatch, issue, writeback, commit
        • 32-bit integers, 64-bit floating-point numbers
        • Register files: 32 integer and 32 FP registers
        • 32K L1 instruction cache, 32K L1 data cache
          • 32B L1 line size
        • 512K L2 unified cache
          • 64B L2 line size
        • 2 integer ALUs, 1 FP adder, 1 FP multiplier
        • 512-entry BTB, 2K-entry branch predictor

  21. Power Management by F/V Scaling
      • 4 available versions: 600 MHz at 2.2 V, 500 MHz at 2.0 V, 400 MHz at 1.8 V, 300 MHz at 1.6 V
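Because only four frequency/voltage pairs exist, a computed target frequency has to be rounded up to the nearest available version. The C sketch below shows one way to do that, plus a rough energy comparison. The names are hypothetical, and the energy figure uses only the standard first-order model in which dynamic energy per cycle scales with the square of the supply voltage; it ignores leakage and switching overhead and is not the power model used for the reported results.

```c
#include <stddef.h>

/* One frequency/voltage operating point. */
typedef struct {
    int mhz;
    double volts;
} fv_level_t;

/* The four versions listed on the slide, slowest first. */
static const fv_level_t LEVELS[] = {
    {300, 1.6}, {400, 1.8}, {500, 2.0}, {600, 2.2},
};
#define NUM_LEVELS (sizeof LEVELS / sizeof LEVELS[0])

/* Slowest level that is still at least as fast as the target.
 * If the target exceeds every level, fall back to the fastest one. */
const fv_level_t *pick_level(double target_mhz)
{
    for (size_t i = 0; i < NUM_LEVELS; i++)
        if (LEVELS[i].mhz >= target_mhz)
            return &LEVELS[i];
    return &LEVELS[NUM_LEVELS - 1];
}

/* Dynamic energy per cycle relative to the fastest level,
 * under the first-order assumption E_cycle ~ V^2. */
double relative_energy_per_cycle(const fv_level_t *level)
{
    double ratio = level->volts / LEVELS[NUM_LEVELS - 1].volts;
    return ratio * ratio;
}
```

Under this model, a target of 350 MHz selects the 400 MHz, 1.8 V version, and running at 300 MHz and 1.6 V costs roughly half the dynamic energy per cycle of running at 600 MHz and 2.2 V.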

  22. Related Work
      • DVS theoretical studies and simulations
        • [Weiser94], [Govil95], [Yassura98], [Lee98], [Pering98], [Mosse00]
      • Practical DVS implementations
        • Transmeta Crusoe, Intel XScale, lpARM
      • Interval-based and inter-task DVS techniques under OS control
        • [Weiser94], [Govil95], [Yao95], [Ishihara98], [Hong99], [Manzak00], [Sinha01], [Poulwelse01]
      • Intra-task DVS techniques under compiler control
        • [Shin01], [Hsu01], [Krshna00], [Lee00]
