Power Control for Chip Multiprocessors. Xue Li, Oct 27, 2009. Outline: two ways to control the power of chip multiprocessors: (1) MPC with online model estimation; (2) simple closed-loop control with risk evaluation.
Power Control for Chip Multiprocessors Xue Li Oct 27, 2009 ECE 692 2009
Outline • Two ways to control the power of chip multiprocessors • MPC with online model estimation • Simple closed-loop control with risk evaluation
Temperature-Constrained Power Control for Chip Multiprocessors with Online Model Estimation Yefu Wang, Kai Ma, Xiaorui Wang
Introduction • Power and temperature are the major constraints on further throughput improvement of chip multiprocessors (CMPs). • Peak power consumption of a CMP should be controlled to enable higher computing densities. • The temperature of a CMP should be kept below a threshold to avoid thermal failures. • Performance delivered per watt needs to be maximized.
State of the Art • Power control for CMP • Open-loop search or optimization [Isci’06], [Teodorescu’08], etc.: highly dependent on the accuracy of the system model • Heuristics [Isci’06], [Meng’08], etc.: no theoretical guarantee of control accuracy or stability • Chip-wide DVFS (Dynamic Voltage and Frequency Scaling) [McGowen’06], [Floyd’07], etc.: suboptimal in performance • Dynamic thermal management • Heuristics or feedback control theory [Brooks’01], [Skadron’03], etc.: power and temperature are controlled separately
Challenges and Solutions • Multiple cores may need to be manipulated simultaneously to control both power and temperature. Solution: Multi-Input-Multi-Output (MIMO) control • Optimal control algorithms need to be designed for power shifting among different cores. Solution: model predictive control (MPC) theory • Different cores may be coupled together. Solution: specific design constraints • Workload is unpredictable at design time. Solution: online parameter estimation • Control accuracy and system stability are critical. Solution: theoretically guaranteed control performance and stability
Temperature-Constrained Power Control • MIMO control loop invoked periodically • Power monitor sends the chip-level power consumption to the controller • Controller reads temperature and performance metrics of each core • Controller computes new DVFS levels based on MPC control theory • New per-core DVFS levels are sent to the cores • Online model estimator updates the power model
Steps of Model Predictive Control • System modeling • Power model • Controller design • MPC controller design • Constraints: • Frequency range • Power budget • Temperature • Other design requirements • System stability analysis
System Modeling: Power Model (1) • Power consumption of one core (equation not preserved) • A: estimated system parameters • Initial value can be determined by system identification • May change for different workloads • Can be updated by online estimation
System Modeling: Power Model (2) • Total power consumption of the CMP (equation not preserved) • Power model validation [Figure: total power consumption of the chip]
Controller Design: MPC Controller • Control objective: minimize the cost function (equation not preserved) • Annotations on the cost function: control accuracy; performance optimization; model prediction; power measured from the power meter (feedback)
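Because the cost function itself did not survive on this slide, the following is only a generic MPC cost of the kind the annotations describe, not necessarily the authors' exact formula. All symbols are assumptions introduced here: $P$ is the prediction horizon, $M$ the control horizon, $P_s$ the power set point, $p(k+i \mid k)$ the chip power predicted by the model at step $k$, $f(k+i \mid k)$ the vector of per-core frequencies, $F_{\max}$ the vector of maximum frequencies, and $Q$, $R$ weighting matrices.

```latex
J(k) = \underbrace{\sum_{i=1}^{P} \bigl\| p(k+i \mid k) - P_s \bigr\|_{Q}^{2}}_{\text{control accuracy}}
     + \underbrace{\sum_{i=0}^{M-1} \bigl\| f(k+i \mid k) - F_{\max} \bigr\|_{R}^{2}}_{\text{performance optimization}}
```

In a form like this, the first term penalizes deviation of predicted power from the set point, and the second term pulls each core toward its highest frequency so that performance is not sacrificed unnecessarily. The measured power from the power meter enters as the starting point of the prediction.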
Controller Design: Constraints (1) • Physical frequency range • Power budget for each core • Other design requirements
Controller Design: Constraints (2) • Model between temperature and frequency • Temperature & power • Power & frequency • Temperature constraint
Controller Design: Stability Analysis • Stability: the system converges to the desired bounds from any initial condition • Unknown system gain g: ratio of the actual system parameter to the estimated parameter used to design the controller • The wider the stable range of g, the better the system adapts • The system is proved to be stable over a wide range • Uniform workload: 0 < g ≤ 8.83 • Different workloads: 0 < g1 ≤ 15.7, 0 < g2 ≤ 17.6 • The controller works as long as the real parameter of a system is at most 8.83 times the value used to design the system.
Online Model Estimation • Recursive Least Squares (RLS) estimator updates the model periodically • RLS estimator records … and … • The estimator calculates … and … • The estimator updates … with … in the system model
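The symbols on this slide were lost, so the following is only a minimal sketch of a textbook scalar RLS update with a forgetting factor, not the authors' estimator. It assumes a one-parameter model `y = a * x`, where `x` stands for a frequency change and `y` for the resulting power change; the class name, initial values, and sample data are all hypothetical.

```python
class ScalarRLS:
    """Recursive least squares for y = a * x with a forgetting factor."""

    def __init__(self, a_init, p_init=1000.0, forgetting=0.98):
        self.a = a_init          # current parameter estimate
        self.p = p_init          # estimate covariance (large = low confidence)
        self.lam = forgetting    # 0 < lam <= 1; smaller forgets old data faster

    def update(self, x, y):
        """Fold in one sample (x, y) and return the updated estimate."""
        gain = self.p * x / (self.lam + x * self.p * x)
        error = y - self.a * x   # prediction error before the update
        self.a += gain * error
        self.p = (self.p - gain * x * self.p) / self.lam
        return self.a


# Hypothetical data: the true gain is 2.5 (power change per unit frequency change).
estimator = ScalarRLS(a_init=1.0)
for delta_f in [1.0, -0.5, 2.0, 1.5, -1.0]:
    delta_p = 2.5 * delta_f      # noise-free "measurement" for illustration
    estimate = estimator.update(delta_f, delta_p)
```

With noise-free samples the estimate moves from the initial guess of 1.0 to very close to the true value within a few updates. The forgetting factor is what lets an estimator of this kind track a parameter that drifts when the workload changes.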
System Implementation • [Figure labels: power lines (current signal); USB interface; current probe (1 mV/A)]
Experimentation • Baselines • Empirical results • Control accuracy • Application performance • Temperature constraints • Online model estimator • Simulation results • Control accuracy • Application performance
Baselines • Priority • Per-core DVFS • Heuristic based • Power > budget: DVFS level decreases by 1 • Power < budget: DVFS level increases by 1 • Improved Priority • Priority with a safety margin • MaxBIPS • Per-core DVFS • Prediction based: uses a typical workload to build a static table offline • Exhaustive search over combinations of DVFS levels for all cores • Workload sensitive
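The Priority rule above (one DVFS step down when over budget, one step up when under) can be sketched as follows. The slide does not say which core is adjusted, so this sketch assumes the cores are listed from highest to lowest priority, that the lowest-priority core is throttled first, and that the highest-priority core is sped up first. The function name and the level range 0 to 4 are hypothetical.

```python
def priority_step(levels, power, budget, min_level=0, max_level=4):
    """Return new per-core DVFS levels after one control period.

    levels: current DVFS level per core, index 0 = highest priority (assumed).
    """
    new_levels = list(levels)
    if power > budget:
        # Over budget: lower one core by one level, lowest priority first.
        for core in reversed(range(len(new_levels))):
            if new_levels[core] > min_level:
                new_levels[core] -= 1
                break
    elif power < budget:
        # Under budget: raise one core by one level, highest priority first.
        for core in range(len(new_levels)):
            if new_levels[core] < max_level:
                new_levels[core] += 1
                break
    return new_levels


over = priority_step([3, 3, 3], power=100.0, budget=90.0)    # [3, 3, 2]
under = priority_step([3, 3, 3], power=80.0, budget=90.0)    # [4, 3, 3]
```

Because the rule always moves by a full level whenever power differs from the budget, a controller like this tends to step back and forth around the set point rather than settle on it.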
Baselines: MaxBIPS • Define two N×M matrices: Power and BIPS • N: number of cores • M: number of power modes • Fill in the matrices with actual and predicted values • Power: cubic scaling • BIPS: linear scaling • Find the combination of per-core power modes that achieves the best BIPS under the power budget • [Figure: Power matrix and BIPS matrix with the actual values marked; if the power budget is 32, the last combination is selected]
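The matrix construction and exhaustive search described above can be sketched as below. This is an illustration under stated assumptions, not the original MaxBIPS code: the measured power and BIPS are assumed to be taken at the fastest mode (scale 1.0), each mode is represented by a hypothetical relative scaling factor, and all numbers are made up.

```python
from itertools import product


def maxbips(power_now, bips_now, scale_factors, budget):
    """Pick per-core modes maximizing total BIPS under a power budget.

    power_now, bips_now: per-core values measured at scale 1.0 (assumed).
    scale_factors: relative scaling of each mode, e.g. (1.0, 0.5).
    Returns (total_bips, total_power, modes) or None if nothing fits.
    """
    # N x M matrices: measured values scaled cubically (power), linearly (BIPS).
    power = [[p * s ** 3 for s in scale_factors] for p in power_now]
    bips = [[b * s for s in scale_factors] for b in bips_now]

    best = None
    n_modes = len(scale_factors)
    # Exhaustive search over all M**N mode combinations.
    for modes in product(range(n_modes), repeat=len(power_now)):
        total_power = sum(power[core][m] for core, m in enumerate(modes))
        if total_power > budget:
            continue
        total_bips = sum(bips[core][m] for core, m in enumerate(modes))
        if best is None or total_bips > best[0]:
            best = (total_bips, total_power, modes)
    return best


best = maxbips(power_now=[10.0, 20.0], bips_now=[1.0, 3.0],
               scale_factors=(1.0, 0.5), budget=25.0)
```

In this made-up two-core case the search keeps the high-throughput core at full speed and slows the other one. The loop visits M**N combinations, which is why the table-based approach becomes expensive as the core count grows.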
Empirical Results: Control Accuracy (1) • Comparison of steady-state errors • Steady-state error: violation of the power budget at different power levels. • MPC follows the set point well.
Empirical Results: Control Accuracy (2) • MPC vs. MaxBIPS / Priority / Improved Priority • [Figure annotations: oscillates around the set point; much lower than the set point; fits well; exceeds the budget at times]
Empirical Results: Application Performance • Comparison of SPEC performance among MPC, MaxBIPS, and Improved Priority under different power budgets. • MPC achieves better performance because it can precisely achieve the set-point power. • The average improvement of MPC is 9.69% over MaxBIPS and 8.95% over Improved Priority.
Empirical Results: Temperature Constraints • A thermal emergency is emulated by lowering the temperature constraint. • Figure (a) shows that the core temperatures are quickly constrained to the lowered bound. • Figure (b) shows that the temperature constraint works effectively to reduce power consumption.
Empirical Results: Online Model Estimator • MPC vs. MPC with estimator • Workload may change significantly at run time. • The estimator can correct system parameters dynamically. • MPC without the estimator suffers large oscillations.
Simulation Results: Control Accuracy • Simulation with more cores (4, 8, 16) • Average power and standard deviation of the different control methods. • MPC precisely converges to the budget. • MaxBIPS is absent for 16 cores because its static prediction table grows exponentially with the number of cores.
Simulation Results: Application Performance • SPEC benchmark performance comparison under different numbers of cores (set point = 95%, 85%)
Conclusion • A temperature-constrained chip-level power controller • Designed based on MPC control theory • Accurately controls power consumption • Temperatures of the cores are limited to stay below the constraint. • An online model estimator periodically updates the system model • Compared with state-of-the-art work • More accurate power control • Better application performance
Multi-Optimization Power Management for Chip Multiprocessors Ke Meng, Russ Joseph, Robert P. Dick Northwestern University Li Shang University of Colorado
Introduction • Power is still a first-class design constraint in the CMP era. • Higher transistor density • Higher leakage power • Power is still a precious computing resource • When power is limited, maximizing the chip-wide performance requires global and local coordination. • [Diagram labels: high power density; thermal issues]
System Framework • Soft-limit budget • Select power optimizations and allowable power modes • Analyze, search, and tune • Collect data from sensors and counters; calculate power/performance
Optimization Pool (1) • DVFS • Simple models • Frequency: linear with voltage • Power: changes cubically with voltage • Performance: roughly linear with frequency • High efficiency • Cubic relationship between frequency and power
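A short worked example shows why the cubic relationship makes DVFS efficient. It takes the three proportionalities listed on the slide at face value; the 10% slowdown is a hypothetical figure chosen only for illustration, and primed symbols denote values after scaling.

```latex
f \propto V, \qquad P \propto V^{3}, \qquad \text{Perf} \propto f
\;\Longrightarrow\;
\frac{P'}{P} = \left(\frac{f'}{f}\right)^{3},
\qquad
\frac{f'}{f} = 0.9 \;\Rightarrow\; \frac{P'}{P} = 0.9^{3} = 0.729
```

Under these simple models, giving up roughly 10% of performance saves about 27% of power.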
Optimization Pool (2) • Cache resizing • Large leakage: big savings • Workload variety: unused private capacity
Models and Experimentations • Models • Dynamic voltage / frequency scaling (DVFS) • Cache resizing • Unified analytic models • Risk evaluation • Search algorithms • Experimentation • Configuration • Model validation • Model evaluation • Power violation
Analytic Models: DVFS • DVFS modeling • CPI stack counters: count computing stalls and L2 miss stalls • Computing stalls: change with frequency • L2 miss stalls: constant regardless of frequency • Performance model (equation not preserved) • Power: cubic with frequency
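The performance model itself is missing from the slide, so the sketch below shows one common way to turn a two-part CPI stack into a frequency prediction. It assumes that the time spent in L2 miss stalls (in seconds) stays constant when the core frequency changes, so the stall cycle count scales with the new frequency, while the computing part keeps the same cycle count. Function and parameter names are hypothetical.

```python
def predict_cpi(cpi_compute, cpi_l2_stall, f_old, f_new):
    """Predict CPI at f_new from a CPI stack measured at f_old.

    Memory stall time is assumed constant in seconds, so the number of
    stall cycles scales with frequency; compute cycles are unchanged.
    """
    return cpi_compute + cpi_l2_stall * (f_new / f_old)


def predict_speedup(cpi_compute, cpi_l2_stall, f_old, f_new):
    """Relative performance at f_new (1.0 means unchanged)."""
    time_old = (cpi_compute + cpi_l2_stall) / f_old
    time_new = predict_cpi(cpi_compute, cpi_l2_stall, f_old, f_new) / f_new
    return time_old / time_new


# Hypothetical core: half of its cycles at 2 GHz are L2 miss stalls.
memory_bound = predict_speedup(1.0, 1.0, f_old=2.0, f_new=1.0)
compute_bound = predict_speedup(2.0, 0.0, f_old=2.0, f_new=1.0)
```

Halving the frequency halves the performance of the compute-bound case but costs the memory-bound case only about a third, which is the kind of difference a per-core power manager can exploit.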
Analytic Models: Cache Resizing • Cache resizing modeling • Non-stall cycles • Stall cycles due to cache misses • Power: average leakage power of a cache way times the number of active ways
Analytic Models: Unification • Unified analytic models with DVFS and cache resizing • Performance • Weak interaction among multiple optimizations allows independent speed-ups • Power • DVFS has a strong influence • Additive contribution of cache resizing
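One way to read the unification above is that the speed-ups of the two optimizations multiply, while the cache contributes an additive leakage term on top of DVFS-scaled core power. The sketch below encodes that reading only; the exact equations are not on the slide, and the function name, parameters, and numbers are hypothetical.

```python
def unified_estimate(base_perf, core_power, v_scale,
                     dvfs_speedup, cache_speedup,
                     leak_per_way, active_ways):
    """Estimate (performance, power) for one core under both optimizations.

    Performance: independent speed-ups are multiplied (assumed).
    Power: core power scales cubically with voltage; cache leakage is the
    average leakage per way times the number of active ways, added on top.
    """
    perf = base_perf * dvfs_speedup * cache_speedup
    power = core_power * v_scale ** 3 + leak_per_way * active_ways
    return perf, power


perf, power = unified_estimate(base_perf=2.0, core_power=20.0, v_scale=0.5,
                               dvfs_speedup=0.6, cache_speedup=0.95,
                               leak_per_way=0.5, active_ways=4)
```

Keeping the two effects separable is what lets a search procedure score any combination of DVFS level and cache size from a small number of per-optimization predictions.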
Analytic Models: Risk Evaluation • Why evaluate risk? • Some optimizations are more sensitive to program phase changes. • A wrong decision can cause severe performance loss and power violation. • How to evaluate risk? • DVFS: assume zero risk. • Cache resizing: threshold on the variation in cache activity.
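The slide names a cache-activity variation threshold without defining it. The following is a hypothetical sketch of such a check: it measures the spread of recent cache-activity samples relative to their mean and treats resizing as risky when the spread exceeds a threshold. The metric, the threshold of 0.2, and the function name are all assumptions.

```python
def cache_resizing_is_risky(activity_samples, threshold=0.2):
    """Return True if recent cache activity varies too much to trust a resize.

    activity_samples: recent per-interval counts (for example, cache misses).
    The variation metric (range divided by mean) is an assumption.
    """
    mean = sum(activity_samples) / len(activity_samples)
    if mean == 0:
        return False  # no activity at all, nothing to mispredict
    variation = (max(activity_samples) - min(activity_samples)) / mean
    return variation > threshold


stable_phase = cache_resizing_is_risky([100, 102, 98])     # small spread
changing_phase = cache_resizing_is_risky([100, 200, 50])   # large spread
```

A manager using a check like this would fall back to DVFS alone while the check reports risk, since the slide treats DVFS as zero risk.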
Analytic Models: Search Algorithms (1) • Brute-force search • Traverses all possible power modes • Always finds the best combination • Slow when the search space is large • Greedy search • Takes the best step currently available • Current best step: the power mode with the maximal delta power/performance ratio. • Fast • Can get stuck in local minima; results show this happens rarely
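The greedy search described above can be sketched as follows. This is an illustration, not the authors' implementation: it assumes each core has an ordered list of power modes (index 0 fastest), starts with every core at its fastest mode, and repeatedly takes the single downgrade that saves the most power per unit of performance lost until the budget is met. The tables are made-up numbers.

```python
def greedy_search(power, perf, budget):
    """Greedily pick per-core modes that fit under the power budget.

    power[core][mode], perf[core][mode]: predicted values, mode 0 fastest.
    Returns (modes, total_power), or None if the budget cannot be met.
    """
    modes = [0] * len(power)
    total_power = sum(core_power[0] for core_power in power)
    while total_power > budget:
        best = None
        for core, mode in enumerate(modes):
            if mode + 1 >= len(power[core]):
                continue  # this core is already at its slowest mode
            d_power = power[core][mode] - power[core][mode + 1]
            d_perf = perf[core][mode] - perf[core][mode + 1]
            ratio = d_power / d_perf if d_perf > 0 else float("inf")
            if best is None or ratio > best[0]:
                best = (ratio, core, d_power)
        if best is None:
            return None  # every core is at its slowest mode
        _, core, d_power = best
        modes[core] += 1
        total_power -= d_power
    return modes, total_power


power_table = [[10, 6, 3], [20, 12, 6]]
perf_table = [[1.0, 0.9, 0.7], [3.0, 2.7, 2.1]]
modes, total_power = greedy_search(power_table, perf_table, budget=25)
```

In this made-up example the greedy search ends at modes [1, 1] with power 18 and performance 3.6, whereas an exhaustive search would find modes [0, 1] with power 22 and performance 3.7. That small gap is an instance of the local-minimum behavior the slide mentions.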
Experiment: Configuration
Experiment: Model Validation • Cache CPI model validation
Experiment: Model Evaluation (1) • Modeling-greedy vs. modeling-global / trial-and-error • Trial-and-error (DVFS + cache resizing): • Starts a trial stage when entering a stable phase • Only works with workloads that have stable phases (Group C). • Analytical modeling (DVFS + cache resizing): • 8% performance loss vs. 35% power saving • Greedy search works extremely well
Experiment: Model Evaluation (2) • Modeling with risk management vs. MaxBIPS • Simple (DVFS + cache resizing): analytical modeling without risk evaluation. • With risk evaluation: results are either better or almost unchanged. • MaxBIPS (DVFS only): not always the worst. • Managing multiple optimizations is difficult. • Even with risk evaluation, errors can be made before the risk is identified.