
Ensemble-level Power Management for Dense Blade Servers


Presentation Transcript


  1. Ensemble-level Power Management for Dense Blade Servers. Partha Ranganathan, Phil Leech (Hewlett Packard); David Irwin, Jeff Chase (Duke University)

  2. The problem • Power density is a key challenge in enterprise environments • Blades increase power density; data centers push back on cooling • Increased thermal-related failures if not addressed • Problems exacerbated by data center consolidation

  3. Challenges with Traditional Solutions • Pure infrastructure solutions are reaching their limits • Forced-air cooling to liquid cooling? 60+ amps per rack? • Large costs for power and cooling • Capital costs: e.g., for a 10MW data center, $2-$4 million for cooling equipment • Recurring costs: at the data center, 1W of cooling for every 1W of power; for a 10MW data center, $4-$8 million per year for cooling power • Can we address this problem at the system design level?
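For context, the recurring-cost figure above is consistent with a quick back-of-the-envelope check, assuming an electricity price of roughly $0.05 to $0.09 per kWh (my assumption; the slides do not state a rate):

$$10\,\text{MW} \times 8760\,\tfrac{\text{h}}{\text{yr}} = 87.6 \times 10^{6}\,\tfrac{\text{kWh}}{\text{yr}}, \qquad 87.6 \times 10^{6}\,\text{kWh} \times (\$0.05\text{ to }\$0.09)/\text{kWh} \approx \$4\text{ to }\$8\,\text{million/yr}.$$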

  4. This Talk: Contributions • Address power density at the system level • Ensemble-level architecture for power management • Manage the power budget across collections of systems • Recognize trends across multiple systems • Address compounded overprovisioning inefficiencies • Power trends from 130+ servers in real deployments • Extract power efficiencies at larger scale • Architecture and implementation • Simple hardware/software support; preemptive and reactive policies • Prototype and simulation at the blade-enclosure level • Significant power savings; no performance loss

  5. Workload Behavior Trends • Nominal power differs from peak (and from nameplate) power • [Figure: data from hp.com]

  6. Workload Behavior Trends • Sum of peaks (~300) >> peak of sums (~150) across a system of systems • Non-synchronized burstiness across systems • [Figure: data from hp.com]
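The metric behind this slide is easy to compute from per-server traces. A minimal sketch, assuming the traces are aligned arrays of power (or utilization) samples; the synthetic data below is only to illustrate the effect:

```python
import numpy as np

def sum_of_peaks(traces):
    """Budget needed if every server is provisioned for its own peak."""
    return sum(trace.max() for trace in traces)

def peak_of_sums(traces):
    """Budget needed if the ensemble is provisioned for its aggregate peak."""
    return np.vstack(traces).sum(axis=0).max()

# Synthetic example: 16 servers with non-synchronized bursts (20 W idle, 30 W bursting).
rng = np.random.default_rng(0)
traces = [20 + 10 * (rng.random(1000) > 0.9) for _ in range(16)]
print(sum_of_peaks(traces), peak_of_sums(traces))  # the aggregate peak is far below the sum of peaks
```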

  7. Workload Behavior Trends • Similar trends on 132 servers in 9 different sites • What does this mean? Compounded inefficiencies: • Managing the power budget for individual peaks: 20W blades, 500W enclosures, 10kW racks, … • Managing the power budget for the ensemble typical case: 20W blades, 250W enclosures, 4kW racks, …

  8. Functional Architecture • Hardware-software coordination for power control • Provision the system for a lower power budget • Intelligent software agent: • Monitors the power of individual blades • Ensures that the total power of the enclosure does not exceed the threshold • Uses power-throttling hooks in the system in the rare case of violations (see the sketch below) • [Architecture diagram; application requirements shown as an input]
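A minimal sketch of the enforcement step this agent performs, assuming hypothetical hooks read_blade_power() and throttle_one_step() that wrap the per-blade power sensors and P-state controls (these names are illustrative, not the paper's API):

```python
def enforce_budget(blades, budget_w, read_blade_power, throttle_one_step):
    """Keep total enclosure power under budget_w by throttling blades,
    highest-power first, in the rare case of a violation."""
    readings = {blade: read_blade_power(blade) for blade in blades}
    total = sum(readings.values())
    while total > budget_w and readings:
        victim = max(readings, key=readings.get)   # highest-power blade still throttleable
        saved_w = throttle_one_step(victim)        # move it one P-state deeper, returns watts saved
        if saved_w <= 0:                           # already at its deepest P-state
            readings.pop(victim)
            continue
        readings[victim] -= saved_w
        total -= saved_w
```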

  9. Enclosure-level Implementation • Initialization and setup • Data gathering / heartbeat checking • Event response
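These three phases map naturally onto a controller skeleton such as the following. This is only a sketch of one plausible structure, not the actual enclosure firmware; the blade-side hooks (reset_to_default_pstate, read_power, throttle_one_step) are assumed for illustration:

```python
import time

class EnclosureController:
    """Skeleton mirroring slide 9: initialization/setup, data gathering with
    heartbeat checking, and event response."""

    def __init__(self, blades, budget_w, poll_interval_s=1.0):
        self.blades = blades
        self.budget_w = budget_w
        self.poll_interval_s = poll_interval_s

    def initialize(self):
        # Initialization and setup: put every blade into a known P-state.
        for blade in self.blades:
            blade.reset_to_default_pstate()

    def gather(self):
        # Data gathering / heartbeat checking: poll sensors, flag unresponsive blades.
        readings = {}
        for blade in self.blades:
            power = blade.read_power()           # None models a missed heartbeat
            if power is None:
                self.handle_event("missed_heartbeat", blade)
            else:
                readings[blade] = power
        return readings

    def handle_event(self, event, blade):
        # Event response: budget violations, missed heartbeats, interrupts.
        if event == "budget_violation":
            blade.throttle_one_step()
        # A missed heartbeat might mark the blade suspect or raise an operator alert.

    def run(self):
        self.initialize()
        while True:
            readings = self.gather()
            if readings and sum(readings.values()) > self.budget_w:
                victim = max(readings, key=readings.get)
                self.handle_event("budget_violation", victim)
            time.sleep(self.poll_interval_s)
```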

  10. Implementation Choices • Selection of the system power budget • What value? • How strictly is it enforced? • Thermal provisioning: relaxed • Power provisioning: strict • Power monitoring and control • Power or temperature? Polling or interrupts? Which components? • P-states?

  11. Implementation Choices (2) • Policies for power throttling • Assigning power budgets • Preemptive: “ask before you can use more power” • Reactive: “use as much as you want until told you can't” • Choice of servers to (un)throttle • Round-robin, lowest-performance, highest-power, fair-share, … (see the sketch below) • Power level to (un)throttle • Incremental, deep, … • Resource estimation and polling heuristics
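The server-selection heuristics listed above amount to different orderings over the candidate blades. A minimal sketch, where power_w, perf_score, and fair_share_w are assumed per-blade estimates rather than anything defined in the paper:

```python
def throttle_order(blades, policy="highest-power"):
    """Return blades in the order they should be considered for throttling."""
    if policy == "round-robin":
        return list(blades)                        # caller rotates the starting point each period
    if policy == "highest-power":
        return sorted(blades, key=lambda b: b.power_w, reverse=True)
    if policy == "lowest-performance":
        return sorted(blades, key=lambda b: b.perf_score)
    if policy == "fair-share":
        # Consider blades that exceed their share of the budget first.
        return sorted(blades, key=lambda b: b.power_w - b.fair_share_w, reverse=True)
    raise ValueError(f"unknown policy: {policy}")
```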

  12. Outline • Introduction • Characterizing real-world power trends • Architecture & Implementation • Evaluation • Conclusions

  13. Prototype Experiments • Experimental test bed with 8 prototype blades • 1 GHz TM8000, 256MB RAM, 40GB disk, Windows • P-states: 533MHz/0.8V, 600MHz/0.925V, 700MHz/1V, 833MHz/1.1V, 1000MHz/1.25V • Prior blade design + power-monitoring support • Firmware changes to the BIOS and blade/enclosure controllers • Benchmarks: VNCplay and batch simulations • Measured power and performance • Tradeoffs: (+) validates the implementation; (+) actual performance and power results; (-) hard to model real enterprise traces; (-) hard to do detailed design-space exploration
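As rough context for these P-states, the usual dynamic-power scaling $P \propto f V^{2}$ (my approximation, not a figure from the slides) suggests the deepest P-state draws only about a fifth of the peak dynamic power:

$$\frac{P_{533\,\text{MHz},\,0.8\,\text{V}}}{P_{1000\,\text{MHz},\,1.25\,\text{V}}} \approx \frac{533}{1000} \times \left(\frac{0.8}{1.25}\right)^{2} \approx 0.22.$$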

  14. Simulator Experiments • High-level model of a blade enclosure • Input: resource-utilization traces • Power/performance models • Configurable architecture parameters • Results validated on the prototype • Benchmarks • 9 real enterprise site traces covering 132 servers • Synthetic utilization traces of varying concurrency, load, … • Metrics • Total workload performance, per-server performance • Changes in utilization, frequency, MIPS for peak/idle • Usage of different P-states, impact of delays
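A minimal sketch of what such a trace-driven enclosure simulator can look like. The linear power model, the default wattages, and the policy hook are my own illustrative assumptions, not the paper's implementation:

```python
def simulate_enclosure(traces, budget_w, policy, idle_w=150.0, peak_w=250.0):
    """Replay aligned per-server utilization traces (values in [0, 1]) through a
    simple linear power model and a budget-enforcement policy, step by step."""
    total_power_history = []
    for t in range(len(traces[0])):
        utils = [trace[t] for trace in traces]
        powers = [idle_w + (peak_w - idle_w) * u for u in utils]   # illustrative model
        powers = policy(powers, budget_w)       # the policy may scale some servers down
        total_power_history.append(sum(powers))
    return total_power_history

def proportional_policy(powers, budget_w):
    """Toy policy: scale every server down proportionally when over budget."""
    total = sum(powers)
    return powers if total <= budget_w else [p * budget_w / total for p in powers]
```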

  15. Results • Significant enclosure power-budget reductions • 10-20% at the enclosure level, 25-50% at the processor level • Higher savings possible with other P-state controls • Marginal impact on performance (less than 5%) • Preemptive policy competitive with reactive

  16. Interactive Applications • Minimal impact on latency • VNCplay interactive-latency CDFs are within measurement error

  17. Sensitivity Experiments • Other policy choices • No impact on real workload traces • Throttling a few servers at high P-states is preferable to throttling many servers at low P-states • Sensitivity to workload characteristics

  18. Other Benefits • Beyond the enclosure • Cascading benefits at the rack, data center, etc. • “Soft” component power budgets for lower cost • e.g., high-volume high-power CPUs vs. high-cost low-power CPUs • Adaptive power-budget control • Heterogeneous power supplies for low-cost redundancy • Average power reduction • e.g., one 90th-percentile budget at the enclosure vs. multiple 90th-percentile budgets at the blades

  19. Summary • Critical power density problem in enterprises • Ensemble-level architecture for power management • Manage the power budget across collections of systems • Recognize trends across multiple systems • Address compounded overprovisioning inefficiencies • Real-world power analysis (130+ servers in 9 sites) • Dramatic differences between the sum of peaks and the peak of sums • Architecture and implementation • Simple hardware/software support; preemptive and reactive policies • Prototype and simulation at the blade-enclosure level • Significant power savings; no performance loss • Other benefits in component flexibility, resiliency, …

  20. Questions? Speaker contact: Partha.Ranganathan@hp.com

  21. Backup Slides

  22. Workload traces: hp.com, desktop1, sap1, ecomm2, desktop2, pharma, worldcup, ecomm1, sap2

  23.

  24. Backup on Simulation

  25. Pre-emptive and Reactive Policies

Reactive policy:
• Start with all servers unthrottled
• At each control period or on an interrupt:
  • Compute total power consumption
  • Check whether power is above the threshold
  • If yes:
    • Prioritize which servers to throttle
    • Throttle each server to the decided level
    • Stop when the power budget is below the threshold
  • If no:
    • Prioritize which servers to unthrottle
    • Unthrottle each server to the decided level
    • Stop if the power budget would likely be exceeded

Preemptive policy:
• Start with all servers throttled
• At each control period or on an interrupt:
  • Compute total power consumption
  • Identify servers with “low” utilization
  • Prioritize which servers to throttle
  • Throttle each server to the decided level
  • Check whether there is room in the power budget
  • If yes:
    • Identify servers with “high” utilization
    • Prioritize which servers to unthrottle
    • Unthrottle each server to the decided level
    • Stop if the power budget would likely be exceeded
  • If no:
    • Stop
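A minimal sketch of one reactive control period in code form. The interfaces read_power(s), throttle(s), unthrottle(s), and estimate_unthrottle_gain(s) are assumed wrappers around the per-server sensors and P-state hooks, not the paper's API:

```python
def reactive_step(servers, read_power, throttle, unthrottle,
                  estimate_unthrottle_gain, budget_w):
    """One control period of the reactive policy: throttle on a violation,
    otherwise hand headroom back while staying under the budget.
    throttle(s) returns watts saved; estimate_unthrottle_gain(s) returns the
    expected extra watts if s is unthrottled one step."""
    total = sum(read_power(s) for s in servers)
    if total > budget_w:
        # Over budget: throttle highest-power servers until back under budget.
        for s in sorted(servers, key=read_power, reverse=True):
            total -= throttle(s)
            if total <= budget_w:
                break
    else:
        # Under budget: unthrottle, stopping before the budget would likely be exceeded.
        for s in sorted(servers, key=read_power):
            gain = estimate_unthrottle_gain(s)
            if total + gain > budget_w:
                break
            unthrottle(s)
            total += gain
```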

  26. Related Work • Single-server power capping • Brooks et al.: capping at the processor level • Felter et al.: power shifting • Cluster-level power budgets • Femal et al.: throughput per budget, local control • IBM, Duke, Rutgers work on average power • Resource provisioning • Urgaonkar et al.: overbooking resources • Yuan et al.: OS-level CPU scheduling for batteries • Cooling work • Moore et al.: temperature-aware workload placement • Patel et al.: Smart Cooling • Uptime recommendations, …

  27. Future Work • More exploration • E.g., geographically distributed servers • More policies • High-performance workloads • Adaptive power budget variation • Interface with other local and global loops

  28. The problem

  29. A growing problem • Server power densities are up 10x in the last 10 years • Source: Datacom Equipment Power Trends and Cooling Applications, ASHRAE, 2005, http://www.ashrae.org

  30. 90th Percentile Utilization

  31. Enterprise power challenges: compute equipment consumes power… • Electricity costs • For a large data center, recurring costs of $4-$8 million/yr • “… energy costs for [data center] building $1.7 million last year...”, Cincinnati Bell, 2003 • “… electricity costs large fraction of data center operations…,” Google, 2003 • Environmental friendliness • Compute equipment energy use: 22M GJ + 3.9M tons of CO2 • EnergyStar (US), TopRunner (Japan), FOE (Switzerland), … • “…goal to increase computer energy efficiency by 85% by 2005.” Japan's “TopRunner” energy program, 2002

  32. Scratch slides

  33. The problem • Power density is a key challenge in enterprise environments • Blades increase power density; data centers push back on cooling • Increased thermal-related failures if not addressed • 50% server reliability degradation for 10°C over 20°C • 50% decrease in hard-disk lifetime for a 15°C increase • Problems exacerbated by data center consolidation

  34. Costs of Addressing Power Density • [Figure: heat generated (from roughly 0.05 kW up to 1000+ kW) and the energy required to remove it (from 0.005 kW up to 1000 kW) at increasing scales] • Cooling costs are a large fraction of TCO • Capital costs: for a 10MW data center, $2-$4 million for cooling equipment • Recurring costs: at the data center, 1W of cooling for every 1W of power; for a 10MW data center, $4-$8 million for cooling power • Similar issues with power delivery • Challenges with routing more than 60 amps per rack • Problems exacerbated by consolidation and blade growth • Need to go beyond traditional facilities-level solutions

  35. Our Approach • “Ensemble-level” architecture for power management • Insight: systems are designed for the peak usage of an individual box, but end users care about the long-term usage of the entire solution • Solution: manage the power budget across collections of systems • Recognize trends across multiple systems • Extract power efficiencies at larger scale • Significant power budget savings

  36. Significant Power Savings • Original power budget @ 100W • Processor power budget down from 100W to 15W (about 6x) • System power budget down from 350W to 280W (20%) • Additional benefits if there are corresponding hooks for memory, etc. • What about performance? • [Figure annotations: new power budgets @ 22.5W and @ 15W]

  37. Simulator Demo of Operation • Rich simulation infrastructure • Facilitates more extensive design space exploration

  38. Questions?

  39. The problem • Power density is a key challenge in enterprise environments • Blades increase power density; data centers push back on cooling • Increased thermal-related failures if not addressed • Problems exacerbated by data center consolidation

