Coordinated Performance and Power Management

Presentation Transcript


  1. Coordinated Performance and Power Management Yefu Wang

  2. Power/Performance Problems in Datacenters • Power-related problems • Power/thermal control (capping) • Power optimization • Performance-related problems • Performance control • Performance optimization • Problem scale • Datacenter level • Cluster level • Server level • Application level

  3. Co-Con: Coordinated Control of Power and Application Performance for Virtualized Server Clusters Xiaorui Wang and Yefu Wang, Department of EECS, University of Tennessee, Knoxville

  4. Power and Performance Control • Most prior work on power/performance control: control one and optimize the other • Power control: power capping to avoid power overload or thermal failures due to increasingly high server density • Performance control: provide guarantees for Service-Level Agreements • Performance-oriented controller: takes the performance measurement and performance target; control decision minimizes power; may violate the power constraint • Power-oriented controller: takes the power measurement and power target; control decision maximizes performance; performance is not guaranteed • Power-oriented: [Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc. • Performance-oriented: [Chase’01], [Chen’05], [Elnozahy’02], [Sharma’03], [Wang’08], etc.

  5. Coordinated Control of Power and Performance [Figure labels: Cluster-level CPU Resource Coordinator; VM1, VM2, VM3, VM4; CPU allocation; Performance Monitors; Performance Controllers; Performance Requirements; Power Controller [HPCA’08]; Power Budget]

  6. Response Time Controller • Disturbances: workload variation, frequency variation • Example from the figure: response time set point 700ms, measured response time of the VM 750ms, error 50ms, CPU allocation increased by 2.4% • PID (Proportional-Integral-Derivative) controller • System modeling • Controller design • Controller analysis

  7. Response Time Model • PID controller • System modeling • Controller design • Controller analysis • Response time model • System identification • Model orders • Parameters • [Table: Model Orders and Error]

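  8. System Identification in Practice • Operational point • Linearize the system model locally • White noise • Generating a white noise • Least squares method: find the model parameters that make the model best fit the measured data • Script shown on the slide (Perl-style pseudocode):

       # Replay a pre-generated white-noise log as CPU allocations
       # and record the resulting response times.
       # allocate, get_response_time and log are testbed helpers not shown on the slide.
       open T, "white_noise.log";
       while (<T>) {
           chomp;
           $rand = int(40 + 10 * $p);        # $p: noise sample (presumably the value just read from the log)
           $cpu  = 180 - 40 - 40 - $rand;    # CPU allocation for this step
           allocate $cpu;
           $t = get_response_time;
           log $cpu, $t;
           sleep $step;
       }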
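The least squares step on this slide can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes a first-order model rt[k] = a*rt[k-1] + b*cpu[k-1] in deviations from the operating point (the slides do not state the model order), and the names `fit_first_order`, `true_a`, and `true_b`, as well as the synthetic data, are hypothetical.

```python
import random

def fit_first_order(cpu, rt):
    """Least-squares fit of rt[k] = a*rt[k-1] + b*cpu[k-1].

    Both sequences are deviations from the operating point.
    Solves the 2x2 normal equations directly.
    """
    s_tt = s_tc = s_cc = s_ty = s_cy = 0.0
    for k in range(1, len(rt)):
        t_prev, c_prev, y = rt[k - 1], cpu[k - 1], rt[k]
        s_tt += t_prev * t_prev
        s_tc += t_prev * c_prev
        s_cc += c_prev * c_prev
        s_ty += t_prev * y
        s_cy += c_prev * y
    det = s_tt * s_cc - s_tc * s_tc
    a = (s_ty * s_cc - s_cy * s_tc) / det
    b = (s_cy * s_tt - s_ty * s_tc) / det
    return a, b

# Synthetic experiment standing in for the logged (cpu, response time) pairs.
rng = random.Random(0)
true_a, true_b = 0.5, -8.0  # hypothetical plant parameters, not from the paper
cpu_dev = [rng.uniform(-5.0, 5.0) for _ in range(200)]  # white-noise input
rt_dev = [0.0]
for k in range(1, 200):
    rt_dev.append(true_a * rt_dev[k - 1] + true_b * cpu_dev[k - 1])

a_hat, b_hat = fit_first_order(cpu_dev, rt_dev)
print(round(a_hat, 3), round(b_hat, 3))
```

With real measurements the fit is not exact, and comparing the residual error across model orders is one way to choose the order.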

  9. Controller Design • PID controller • System modeling • Controller design • Controller analysis • PID controller • Proportional • Integral • Derivative • Design: Pole placement • [Figure labels: Response time set point; Error; Response time; CPU allocation; VM]
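A PID response time controller of the kind named on this slide can be sketched as below. This is an illustrative sketch under stated assumptions: the gains, the first-order plant, the 50% nominal CPU allocation, and the workload disturbance are all hypothetical values chosen so that the loop is stable, and the gains are not derived by the pole placement procedure used in the paper.

```python
class PID:
    """Discrete PID controller in position form."""

    def __init__(self, kp, ki, kd, set_point):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.set_point = set_point
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, measurement):
        # Positive error means the response time is above the set point,
        # so the output (extra CPU allocation) is positive.
        error = measurement - self.set_point
        self.integral += error
        derivative = error - self.prev_error
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


# Hypothetical plant: more CPU lowers the response time; a workload
# increase (+25 per step) would push it to 750 ms without control.
SET_POINT_MS = 700.0
NOMINAL_CPU = 50.0  # assumed operating point, in percent

pid = PID(kp=0.03, ki=0.02, kd=0.005, set_point=SET_POINT_MS)
rt = SET_POINT_MS
cpu = NOMINAL_CPU
for _ in range(100):
    cpu = NOMINAL_CPU + pid.update(rt)
    rt = SET_POINT_MS + 0.5 * (rt - SET_POINT_MS) - 8.0 * (cpu - NOMINAL_CPU) + 25.0

print(round(rt, 2), round(cpu, 3))
```

The integral term is what removes the steady-state error after the workload change; a real controller would also clamp the allocation to the range the scheduler accepts.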

  10. Coordination • PID controller • System modeling • Controller design • Controller analysis • Coordination of the two control loops: when the power control loop acts, the CPU frequency changes, so the response time model changes. Does the response time control loop still work? • Stability range (figure labels: 1GHz, 3GHz) • Settling time < 24s • The control period of the power control loop is selected to be longer than the settling time of the response time control loop.
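The period selection rule on this slide can be illustrated with a short sketch. It assumes a 5% settling band and a 4 s response time control period, neither of which is given in the slides; `settling_time` and `power_control_period` are hypothetical helper names.

```python
import math

def settling_time(trace, set_point, period_s, band=0.05):
    """Time after which every sample stays within +/-band of the set point."""
    tolerance = band * set_point
    last_outside = -1
    for k, value in enumerate(trace):
        if abs(value - set_point) > tolerance:
            last_outside = k
    return (last_outside + 1) * period_s

def power_control_period(settling_s, rt_period_s, margin=1.0):
    """Smallest multiple of the response time control period that is
    strictly longer than margin * settling_s."""
    steps = math.floor(margin * settling_s / rt_period_s) + 1
    return steps * rt_period_s

# Hypothetical response time trace (ms) sampled every 4 s after a disturbance.
trace_ms = [900, 800, 740, 720, 705, 701, 700]
measured = settling_time(trace_ms, set_point=700, period_s=4)

# Using the 24 s bound from the slide as the worst-case settling time.
period = power_control_period(settling_s=24, rt_period_s=4)
print(measured, period)
```

Choosing the outer period from the worst-case settling time over the whole frequency range, rather than from one measured trace, keeps the inner loop settled before each power control action.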

  11. System Implementation • Servers • 2 Intel servers • 2 AMD servers • Storage server (NFS) • VMs • 512MB RAM, 10GB storage via NFS, 2 VCPUs • Xen 3.1 with Credit scheduler • CPU allocation: cap in credit scheduler • Workload: • PHP + Apache benchmark • [Figure: the four servers connected to the NFS storage server]

  12. Response Time Control • Set point: 700ms • Workload increase on VM2 • Response time of VM2 is controlled to 700ms by increasing its CPU resource allocation.

  13. Response Time Control • Change CPU frequency: set point 700ms, standard deviation 51 • Change workload: set point 700ms, standard deviation 57

  14. Coordination: Power Budget Reduction • Compare with baseline: power control only • Power control only [Minerick'02], [Lefurgy'08], [Wang'08], [Juang'05], etc.: violation of performance requirements • Performance control only: • Power budget violation • Undesired server shutdown • Co-Con: power and response time guarantee

  15. Conclusion • Co-Con: Coordinated control of power and application performance • Simultaneous control of power and performance • Cluster-level power budget guarantee for server racks • Application-level performance guarantee • Effective control despite workload/CPU frequency variations

  16. No “Power” Struggles: Coordinated Multi-level Power Management for the Data Center Ramya Raghavendra*, Parthasarathy Ranganathan†, Vanish Talwar†, Zhikui Wang†, Xiaoyun Zhu† *University of California, Santa Barbara †HP Labs, Palo Alto

  17. The Problem • CHAOS!! (“Power” Struggle) • [Figure labels: Rack heterogeneity; VM; Enclosure; Server; CPU; VM-res.all; OS-wlm; OS-gwlm; Vmotion; LSF; SIM; Peak thermal power; Peak electrical power; Average power; performance; Local optima; global optima]

  18. Research Questions • Co-ordination Design • How to ensure correctness, stability, efficiency? • How to make local decisions with incomplete global info? • How to build in support for dynamism? • Implications of Co-ordination • Can we simplify or consolidate controllers? • Do we revisit policies and mechanisms of the controllers? • How sensitive is the design to apps and systems considered?

  19. A “Representative” Subset of Problems • Overlap in objective functions • Overlap in actuators • Different time constants • Different problem formulations

  20. Solution in This Paper • First unified architecture for data center power management • Interfaces and information exchange between loops • Leverages feedback control theory • Evaluation on real-world traces: significant savings • Insights on design trade-offs • Architectural alternatives for various objective functions • Implementation alternatives (time constants and hw/sw) • Mechanisms (p-states, VMs) & policies (pre-emptive, fair-share, …)

  21. System Models Power model: Performance model:

  22. Unified and Extensible Architecture

  23. Coordination • VMC: Use "real utilization"; use power budgets as constraints • EM: Expose API to GM to change power budget • SM: Expose API to EM and GM to change power budget • EC: Expose API to SM to change r_ref
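The interfaces listed on this slide can be sketched as a small class hierarchy. This is only an illustration of the information flow: the method names (`set_budget`, `set_r_ref`, `step`), the equal split of the enclosure budget, the gain value, and the assumption that a higher utilization reference makes the EC run the CPU slower are all hypothetical and are not taken from the paper.

```python
class EfficiencyController:
    """EC: tracks a utilization reference r_ref, which the SM may change."""

    def __init__(self, r_ref=0.8):
        self.r_ref = r_ref

    def set_r_ref(self, r_ref):  # API exposed to the SM
        self.r_ref = min(max(r_ref, 0.0), 1.0)


class ServerManager:
    """SM: enforces a per-server power budget by adjusting the EC's r_ref."""

    def __init__(self, ec, budget_w):
        self.ec = ec
        self.budget_w = budget_w

    def set_budget(self, budget_w):  # API exposed to the EM and GM
        self.budget_w = budget_w

    def step(self, measured_w, gain=0.001):
        # Assumed policy: when over budget, raise r_ref so that the EC
        # lowers the CPU frequency; when under budget, relax it.
        error_w = measured_w - self.budget_w
        self.ec.set_r_ref(self.ec.r_ref + gain * error_w)


class EnclosureManager:
    """EM: enforces an enclosure budget by re-budgeting its servers."""

    def __init__(self, servers, budget_w):
        self.servers = servers
        self.budget_w = budget_w

    def set_budget(self, budget_w):  # API exposed to the GM
        self.budget_w = budget_w

    def step(self):
        # Assumed policy: equal split across the servers in the enclosure.
        share = self.budget_w / len(self.servers)
        for sm in self.servers:
            sm.set_budget(share)


sm1 = ServerManager(EfficiencyController(), budget_w=300.0)
sm2 = ServerManager(EfficiencyController(), budget_w=300.0)
em = EnclosureManager([sm1, sm2], budget_w=600.0)

em.set_budget(500.0)        # a group manager (GM) lowers the enclosure budget
em.step()                   # EM pushes new budgets down to the SMs
sm1.step(measured_w=300.0)  # SM reacts by changing the EC's r_ref
print(sm1.budget_w, round(sm1.ec.r_ref, 3))
```

Each level only calls the API of the level below it, which is what lets the loops run at different time constants without sharing actuators.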

  24. Implementation • Not implemented in hardware testbed • Requires many servers • Requires DVFS support • Each controller must be individually configured • Requires real world applications • Simulation • Trace-driven simulation • Power / performance models from real hardware

  25. Results: Benefits from coordination, compared with a baseline without control

  26. VM Migration vs. Local Power Control Coordinated solution provides the most power savings

  27. Guaranteeing Stability (1) • This paper provides stability guarantee for EC and SM • Server-level performance and power control • Stability of EC • Assumptions • CPU frequency is continuous • Frequency demand of workloads is a constant • CPU utilization is defined as • Control law • Stability proof • Since , this paper proves

  28. Guaranteeing Stability (2) • Stability of SM • Assumptions • The settling time of EC is shorter than the control period of SM • Power consumption can be modeled as • Controller • Closed-loop system • System is stable

  29. Conclusions • Coordination architecture for five individual solutions • Simulations based on close to 200 server traces from real-world enterprise deployments • Compared with non-coordinated solution • Fewer constraint violations • More power efficient

  30. Critiques of Co-Con • Average response time is not an ideal performance metric • Can be extended to 90th-percentile response time • The response time monitor is not perfectly implemented • Only CPU resource is considered • Extension to IO, network, etc. • Evaluation is based on simple workloads • A simple PHP script • Single tier • No IO/database operations
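The first critique can be made concrete with a small example comparing the average with the 90th percentile. The sample values are made up for illustration, and `percentile` is a hypothetical helper using the nearest-rank definition (other percentile definitions interpolate and give slightly different values).

```python
import math
from statistics import mean

def percentile(samples, p):
    """Nearest-rank p-th percentile, for 0 < p <= 100."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]

# Made-up response times (ms): most requests are fast, a few are very slow.
response_ms = [100] * 8 + [2000] * 2

avg = mean(response_ms)
p90 = percentile(response_ms, 90)
print(avg, p90)
```

Here the average looks acceptable against a 700 ms target while the 90th percentile shows that the slowest requests are far above it, which is why a percentile is often the preferred controlled variable for service-level agreements.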

  31. Critiques of No “Power” Struggles • Controllers are highly coupled • Performance model is oversimplified • Coordination between VMC and EC is oversimplified • How can CPU be allocated to VMs? • How will DVFS affect the performance of multiple VMs? • How about heterogeneous servers? • Lack of implementation in real hardware

  32. Comparison of Two Papers

  33. Q&A Acknowledgments: Some slides are adapted from the slides of Vanish Talwar

  34. Backup Slides

  35. Cluster-level CPU Resource Coordinator

  36. Response Times and CPU Allocation of the VMs Under Different CPU Frequencies

  37. Response Times and CPU Allocation of the VMs Under Different Workloads

  38. VMC in No “Power” Struggles

  39. Controllers in No “Power” Struggles
