
Energy Management for Servers and Clusters


Presentation Transcript


  1. Energy Management for Servers and Clusters Robert Deaver, Sept. 24th 2009

  2. Why Manage Energy • Reduce cost • Rack energy usage could account for 23%-50% of colocation revenue [Elnozahy] • Rate tariffs or up-front deposits required by utility companies [J. Mitchell-Jackson] • Reduce heat • Also reduces cost • Allows higher server density • Reducing heat reduces failures

  3. Two Approaches • Single servers • Energy Conservation Policies for Web Servers: Elnozahy, Kistler, and Rajamony, proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, March 2003 • Server clusters • Multi-mode Energy Management for Multi-tier Server Clusters: Horvath and Skadron, PACT 2008

  4. Energy Conservation Policies for Web Servers • Three policies for energy reduction • Dynamic Voltage Scaling (DVS) • Request Batching • DVS + Request Batching • All 3 policies trade system responsiveness to conserve energy • Results evaluated using simulator and hardware testbed

  5. The Policies • Focus on reducing CPU energy • CPU is the dominant consumer [Bohrer] • CPU exhibits the most variation in energy consumption • Feedback-driven control framework • Administrator specifies a percentile-based response-time goal • Most experiments use a 50 ms, 90th-percentile response-time goal

  6. The Policies: DVS • Varies CPU frequency and voltage to conserve energy while meeting response-time requirements • Most beneficial for moderate workloads • Not task based! • A task-based approach works well for desktop environments but not for servers • Ad-hoc controller: if the response-time goal is being met, decrease the CPU frequency; otherwise, increase it (see the sketch below)
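A minimal sketch of the ad-hoc feedback rule on this slide; the frequency-step list, measurement window, and helper names are illustrative assumptions, not the paper's implementation.

```python
# Illustrative sketch of the slide's ad-hoc DVS controller (not the paper's code).
# Assumptions: the CPU exposes a sorted (ascending) list of frequency steps, and
# the server reports the response times (ms) observed during the last control period.

GOAL_MS = 50        # 90th-percentile response-time goal used in the slides
PERCENTILE = 0.90

def percentile(samples, p):
    """Return the p-th percentile of a list of response times."""
    ordered = sorted(samples)
    return ordered[min(int(p * len(ordered)), len(ordered) - 1)]

def dvs_control_step(response_times_ms, freq_steps, current_idx):
    """One control period: step frequency down if the goal is met, up otherwise."""
    if not response_times_ms:
        return current_idx                                  # no traffic: hold frequency
    if percentile(response_times_ms, PERCENTILE) <= GOAL_MS:
        return max(current_idx - 1, 0)                      # goal met -> decrease frequency
    return min(current_idx + 1, len(freq_steps) - 1)        # goal missed -> increase frequency
```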

  7. The Policies: Request Batching • Delay servicing of incoming requests • Keep the CPU in a low-power state while packets accumulate in a buffer • When a packet has been pending longer than the specified batching timeout, wake up and process the accumulated requests • If the CPU's low-power state saves 2.5 W and server utilization is 25%, it is possible to save 162 kJ/day (see the arithmetic below) • Most beneficial for very light workloads • Controller: if the response-time goal is being met, increase the batching timeout; otherwise, decrease it
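A sketch of the batching-timeout feedback rule and the slide's back-of-the-envelope savings; the step size and helper names are assumptions, not the paper's code.

```python
# Illustrative batching-timeout controller (adjustment granularity is assumed).

GOAL_MS = 50
TIMEOUT_STEP_MS = 1     # assumed adjustment granularity

def p90(samples_ms):
    ordered = sorted(samples_ms)
    return ordered[int(0.9 * (len(ordered) - 1))]

def batching_control_step(response_times_ms, timeout_ms):
    """Lengthen the batching timeout while the goal is met; shorten it otherwise."""
    if response_times_ms and p90(response_times_ms) <= GOAL_MS:
        return timeout_ms + TIMEOUT_STEP_MS
    return max(timeout_ms - TIMEOUT_STEP_MS, 0)

# Savings arithmetic from the slide: 2.5 W saved while idle, and a 25%-utilized
# server is idle roughly 75% of the day:
savings_joules_per_day = 2.5 * 0.75 * 24 * 3600   # = 162,000 J = 162 kJ/day
```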

  8. The Policies: Combined • Uses request batching when the workload is very light • Uses DVS when the workload is moderate (a mode-selection sketch follows)
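A small mode-selection sketch for the combined policy; the utilization cutoff is an assumed value, not taken from the paper.

```python
# Illustrative mode selection for the combined policy.

LIGHT_LOAD_UTILIZATION = 0.10   # assumed threshold for a "very light" workload

def choose_policy(cpu_utilization):
    """Batch requests when the server is nearly idle, otherwise use DVS."""
    return "request_batching" if cpu_utilization < LIGHT_LOAD_UTILIZATION else "dvs"
```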

  9. Workloads • Constructed from web server logs • Extended by modifying the inter-arrival time of connections by a scale factor

  10. Salsa – a simulator • Estimates energy consumption and response time of a web server • Based on a queuing model built on the CSIM execution engine • Models process scheduling and file cache hits and misses • Validated against real hardware

  11. Prototype • Used to validate Salsa • Specs: • 600 MHz CPU • Linux kernel 2.4.3 • Apache web server • Does not place the CPU into a low-power state • Does not use response-time feedback control • Salsa is run in “open-loop” mode for validation

  12. Validation: Energy • Batched requests for 11,953s. Salsa predicted 12,373s • 3.5% Error

  13. Validation: Response Time • 4.7% Error

  14. Evaluation: DVS (Response Time) [chart; annotation: heavier workload]

  15. Evaluation: DVS (Workload)

  16. Evaluation: Request Batching (Response Time) [chart; annotation: heavier workload]

  17. Evaluation: Request Batching (Workload)

  18. Evaluation: DVS vs. Request Batching • Energy savings are dependent on workload • Both policies are effective for energy conservation

  19. Evaluation: Combined Policy [chart: Finance-12x workload, 50 ms 90th-percentile response-time goal]

  20. Evaluation: Combined Policy vs. DVS vs. Request Batching

  21. Evaluation: Combined Policy

  22. Faster Processors • Current CPU clock rates >> 600 MHz • DVS savings (as a % of energy consumed) remain the same • Request Batching savings increase • Results have not been validated against real hardware

  23. Faster Processors: Simulation Results

  24. Related Work • DVS • CPU utilization over intervals used to predict future utilization [Govil][Weiser] • CPU frequency/voltage set on a per-task basis [Flautner] • These perform well for desktop systems but not in a server environment • Simulation • Wattch, a microprocessor power analysis tool [Brooks et al.] • PowerScope, a tool for profiling application energy use [Flinn et al.] • Salsa is substantially faster because it is targeted at web workloads

  25. Conclusions • DVS • Vary CPU frequency and voltage to save energy • Most energy savings with medium workloads • Request Batching • Group requests and process them in batches when the server is under-utilized, keeping the CPU in a sleep mode as much as possible • Most energy savings with light workloads • DVS + Request Batching • Best of both policies! • Saves 17%-42% of CPU energy across a broad range of workloads

  26. Critique • Request batching is never compared to a policy that uses deep sleep but does not batch requests • The DVFS and request-batching controllers are ad-hoc solutions with no control-theoretic analysis • Only tested on static content

  27. Two Approaches • Single servers • Energy Conservation Policies for Web Servers: Elnozahy, Kistler, and Rajamony, proceedings of the 4th USENIX Symposium on Internet Technologies and Systems, March 2003 • Server clusters • Multi-mode Energy Management for Multi-tier Server Clusters: Horvath and Skadron, PACT 2008

  28. Multi-mode Energy Management for Multi-tier Server Clusters • Use DVS and multiple sleep states to manage energy consumption for a server cluster • Theoretical analysis of power optimization • Validate policies using a multi-tier server cluster • Cluster-wide energy savings of up to 25% with no performance degradation!

  29. Current Solutions • Focus on active portion of cluster • Dynamic Voltage Scaling (DVS) • Used on a per server basis • Increases power efficiency by slowing down CPU • Dynamic cluster reconfiguration • Load consolidated on subset of servers • Unused servers (after consolidation) are shut down

  30. Related Work • Distributing demand to a cluster subset [Pinheiro et al.] • PID controller used to compensate for transient demand variations • 45% energy savings • Static web workload was interface-bound, with peak CPU utilization of 25% • Machines are assumed to have lower-than-actual capacity to compensate for wakeup latency • Cluster reconfiguration combined with DVS [Elnozahy et al.] • Assumes a cubic relation between CPU frequency and power (see the derivation below) • Very different results due to the different power model
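The cubic frequency-power relation follows from the standard dynamic-power model when voltage is scaled along with frequency:

```latex
% Dynamic CMOS power with DVS: voltage scales roughly linearly with frequency.
P_{dyn} \approx C_{eff}\, V^{2} f, \qquad V \propto f
\;\;\Longrightarrow\;\; P_{dyn} \propto f^{3}
```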

  31. Outline • Models • Energy Management and optimization • Policies • Experiments and Analysis

  32. System Model • Multi-tier server cluster • All machines in one tier run the same application • Requests go through all tiers • End-to-end performance is subject to a Service Level Agreement (SLA) • Assumptions • All machines in a single tier have identical power and performance characteristics • Load balancing within a tier is perfect (required for analytical tractability); observations show moderate imbalances are insignificant

  33. Power Model • Power model obtained through power measurements from a “large pool of characterization experiments,” varying Ui and fi • Power usage is approximately linear

  34. Power Model • Pi: Power • Ui: Utilization • fi: Frequency • Parameters aij are found through curve fitting • Test system had average error of 1%
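The fitted equation itself does not survive in this transcript; the form below is an assumed illustration of a model that is roughly linear in Ui and fi with fitted coefficients aij (the exact terms are in the paper).

```latex
% Assumed illustrative form of the fitted per-machine power model:
P_i(U_i, f_i) \;\approx\; a_{i0} + a_{i1} f_i + a_{i2} U_i + a_{i3}\, U_i f_i
```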

  35. Service Latency Model (SLM) • Service latency of short requests is mostly a function of CPU utilization • Offered load λi can be estimated from measurements • Prediction • Estimate the current λi from measurements • Predict Ui based on λi • The SLM is obtained via regression analysis using a heuristically chosen functional form (see the sketch below)
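A sketch of the measure/predict loop implied by this slide; `predict_utilization` and `latency_model` stand in for the fitted regression models from the paper and are hypothetical names.

```python
# Illustrative prediction step (hypothetical helper names; the actual utilization
# and latency models are regression fits described in the paper).

def estimate_offered_load(requests_completed, period_seconds):
    """Estimate the current offered load (lambda_i) from per-period counters."""
    return requests_completed / period_seconds

def sla_check(requests_completed, period_seconds,
              predict_utilization, latency_model, sla_latency_ms):
    lam = estimate_offered_load(requests_completed, period_seconds)  # estimate lambda_i
    u_pred = predict_utilization(lam)                                # predict U_i from lambda_i
    return latency_model(u_pred) <= sla_latency_ms                   # predicted latency vs. SLA
```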

  36. Outline • Models • Energy Management and Optimization • Policies • Experiments and Analysis

  37. Multi-mode Energy Management • Must consider both active and idle (sleeping) nodes • Minimizing Etransition is less important because Internet-server workloads fluctuate on a much larger time scale than mode transitions
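A decomposition consistent with this slide, using assumed notation: cluster energy splits into the energy of active nodes, sleeping nodes, and mode transitions, with the last term minor at Internet-server time scales.

```latex
E_{total} \;=\; E_{active} + E_{sleep} + E_{transition}
```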

  38. Active Energy Optimization • Assigns machines to tiers • Determines their operating frequencies • Energy management strategy is optimal iff: • Total power consumption is minimal • SLA is met

  39. Sleep Energy Optimization • Servers may support up to n sleep states (S-states), S0 … Sn, each with a power level pi and a wake-up latency ωi • Assumptions: • Workload spikes are unpredictable, and arbitrarily large spikes are not supported • A Maximum Accommodated Load Increase Rate (MALIR, σ) is defined to ensure the system can meet the target SLA • Esleep is minimized by placing each unallocated server in the deepest possible state subject to the MALIR constraint (see the formulation below)
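One way to state the optimization on this slide, assuming ni spare servers are parked in state Si with power level pi; the notation is assumed, not copied from the paper.

```latex
\min_{n_0,\dots,n_n}\; E_{sleep} \;\propto\; \sum_{i} n_i \, p_i
\quad \text{s.t. a feasible wakeup schedule exists (MALIR constraint)}
```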

  40. Feasible Wakeup Schedule • What is the minimum number of servers for each sleep state? • If load increases at rate σ, the cluster must wake up machines in time to respond • A feasible wakeup schedule exists iff cluster capacity stays ahead of demand (see the condition below) • c: cluster capacity, d: demand; c(t0) and d(t0) are assumed known
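The condition itself is lost in this transcript; the inequality below is a plausible reconstruction from the definitions on the slide (capacity must stay ahead of the worst-case demand ramp of rate σ), not necessarily the paper's exact statement.

```latex
c(t) \;\geq\; d(t_0) + \sigma \,(t - t_0) \qquad \forall\, t \geq t_0
```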

  41. Spare Servers • Optimal number of spare servers for each sleep state, in continuous and discretized forms (derivation included in the paper; an illustrative form follows)
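The formulas are also lost in the transcript; the expressions below are an illustrative form consistent with the MALIR constraint (servers no deeper than state Si must cover the load growth during the wake-up latency ωi+1), and are an assumption rather than the paper's exact result.

```latex
n_i^{*} \;\approx\; \sigma \,(\omega_{i+1} - \omega_i),
\qquad
n_i^{*}\big|_{\text{discretized}} \;=\; \lceil \sigma\,\omega_{i+1} \rceil - \lceil \sigma\,\omega_i \rceil
```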

  42. Outline • Models • Energy Management and Optimization • Policies • Experiments and Analysis

  43. Active Capacity Policy • Brute force: • Exhaustive search of all possible cluster configurations • Does not scale to large clusters! • Heuristic approach: • Assumes that powering on an additional machine and lowering the cluster CPU frequency never saves power [Pinheiro et al.] • Takes two rounds of calculations (see the sketch below) • Similar to the queuing-theory-based approach by Chen et al.
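One plausible reading of the two-round heuristic, under the assumption stated above (adding a machine to run slower never saves power): first find the smallest machine count that can meet the SLA at the top frequency, then lower the frequency as far as the SLA allows. Helper names are hypothetical; the actual algorithm is in the paper.

```python
# Illustrative two-round heuristic (assumptions: meets_sla(n, f) wraps the latency
# model; freq_steps is sorted ascending; not the paper's exact algorithm).

def allocate_tier(max_machines, freq_steps, meets_sla):
    # Round 1: smallest machine count that meets the SLA at the highest frequency.
    for n in range(1, max_machines + 1):
        if meets_sla(n, freq_steps[-1]):
            break
    else:
        return max_machines, freq_steps[-1]      # SLA unreachable: use everything
    # Round 2: lowest frequency that still meets the SLA with that machine count.
    for f in freq_steps:
        if meets_sla(n, f):
            return n, f
    return n, freq_steps[-1]
```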

  44. Spare Server Policy - Optimal • [Flowchart] If the number of idle nodes in S0 exceeds the optimal count S0*, place the excess idle nodes into a deeper sleep state (S1, S2, …, Sn−1, Sn); otherwise, done

  45. Spare Server Policy - Demotion • Maintain a list (idle_since) recording the count of idle machines and the time each smaller count was first seen • During each control period: • The list is used to determine the optimal number of machines for each sleep state • Nodes are demoted to states that have a deficit of machines, starting with the deepest state (a bookkeeping sketch follows)
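A sketch of the idle_since bookkeeping implied by this slide and the animation frames that follow; the data structure and update rule are a paraphrase of the slides, not the paper's code.

```python
# Illustrative idle_since bookkeeping (paraphrase of the slides, not the paper's code).
# idle_since[k] holds the time at which at least k+1 machines were first seen idle.

def update_idle_since(idle_since, idle_count, now):
    """Call once per control period with the current number of idle machines."""
    if idle_count < len(idle_since):
        del idle_since[idle_count:]                  # fewer idlers: drop stale entries
    else:
        # more idlers: stamp the newly idle slots with the current time
        idle_since.extend([now] * (idle_count - len(idle_since)))
    return idle_since
```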

  46. Spare Server Policy - Demotion • [Animation] The idle_since timestamp list starts empty at t0

  47. Spare Server Policy - Demotion • [Animation] At t10, 6 machines are idling; idle_since holds t1, t2, t3, t3, t10, t10

  48. Spare Server Policy - Demotion • [Animation] The idle count drops to 2; the list shows t1, t2, t3, t3 (compared against the number of machines idling)

  49. Spare Server Policy - Demotion • [Animation] At t50, idle_since holds t10, t20, t30, t30, t50, t50

  50. Spare Server Policy - Demotion • [Animation] Nodes whose entries satisfy t > t* + ωi are demoted
