
When the Autonomic Cloud Meets the Smart Grid IBM Smarter Planet, 11/20/09


Presentation Transcript


  1. When the Autonomic Cloud Meets the Smart Grid IBM Smarter Planet, 11/20/09 Jeff Chase Duke University

  2. Server Energy • Many servers in large aggregations/farms. • Data centers (DC) • “Warehouse Computers” (WHC) • Modular shipping containers • These facilities burn a lot of energy. • 1% to 4%, depending on the study • …of electricity/carbon • …in the US/world, now/soon • That energy costs a lot of money. EPA 2007 report: data centers at 1.5% of US electricity: 60 TWh for $4.5B. Expected to double by 2011.
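A quick arithmetic check of the EPA figure quoted on this slide; the $/kWh rate below is implied by the slide's numbers, not stated in the talk.

```python
# Sanity check: 60 TWh at roughly $0.075/kWh comes to about $4.5B.
twh = 60                          # TWh of data-center electricity (EPA 2007)
price_per_kwh = 0.075             # implied average rate in $/kWh (assumed)
cost = twh * 1e9 * price_per_kwh  # 60 TWh = 60e9 kWh
print(f"${cost / 1e9:.1f}B")      # $4.5B
```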

  3. How much money? • TCO: energy cost exceeds server cost. [Uptime Institute, for 2010] • Worldwide server market: $60B • Worldwide server power/cooling: $35B [IDC 2008]

  4. Running to stand still • Their energy demand will grow. • Performance/watt doubles every 2 years. • But capacity demand grows faster. • Their share of electricity/carbon will grow. • Many “low hanging fruit” efficiencies elsewhere • The cost will grow too. • Peak of “easy” oil → more substitution of electricity for transport needs • Even the 450 ppm scenario requires massive reductions in climate-disrupting emissions.

  5. IEA: no reason to fear “peak oil” Something will turn up. It always has.

  6. How to reduce IT energy/cost? • Efficiency first • Reduces OpEx at peak demand level • Reduces CapEx for plant/power/cooling • Static optimization: “simply a matter of engineering” • DC Metric: Power Use Efficiency (PUE) • PUE = total power / power to servers • 1/PUE = Data Center Efficiency (DCE) • High-end: 75% of watts make it to the servers • The rest is cooling, power distribution etc. • Most data centers today are much worse!
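The PUE/DCE relationship on this slide is a simple ratio; here is an illustrative worked example in which the wattages are made up to match the slide's "75% of watts make it to the servers" case.

```python
# Illustrative PUE/DCE arithmetic (wattages are hypothetical).
def pue(total_facility_watts, it_watts):
    """Power Use Efficiency: total facility power / power delivered to the servers."""
    return total_facility_watts / it_watts

def dce(total_facility_watts, it_watts):
    """Data Center Efficiency: 1/PUE, the fraction of power reaching the servers."""
    return it_watts / total_facility_watts

print(pue(1000.0, 750.0))  # ~1.33 for a high-end facility
print(dce(1000.0, 750.0))  # 0.75
```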

  7. Key Distinctions • Energy efficiency • Buy lights that generate more lumens per watt when they are on. • Energy proportionality • Turn lights off when you leave the room. • Burn power only when you need lumens. • Conservation, aka reduced service • Shiver in the dark • Short/cold showers and warm beer

  8. Listen to this man • Goal: “uncompromised” design • Design for radical efficiency • But he says: make the software more efficient! Sure, but… • No scripting languages? • No XML? • Are high-productivity software environments “bad design”? • C: efficiency or conservation? Dr. Amory B. Lovins, rmi.org

  9. Focus: Energy Proportionality Servers are rarely fully utilized. Internet services have periodic and variable load. Source: Akamai [Qureshi09] (Bruce Maggs) Source: Google [Barroso/Holzle08]

  10. Focus: Energy Proportionality • Dynamic range • 1 – (idle/peak) • Higher is better • Room to improve! • CPU: 70% + • Server: 50% • Cooling: LOW • Some progress… Source: Google [Barroso/Holzle08]
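The dynamic-range figures on this slide follow from one formula; the wattages below are hypothetical, chosen only to reproduce the slide's rough numbers.

```python
# Dynamic range = 1 - (idle power / peak power); higher is better.
def dynamic_range(idle_watts, peak_watts):
    return 1.0 - idle_watts / peak_watts

print(dynamic_range(30.0, 100.0))   # 0.7  -- "CPU: 70%+"
print(dynamic_range(150.0, 300.0))  # 0.5  -- "Server: 50%"
```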

  11. Focus: Energy Proportionality • Surplus capacity creates an opportunity for dynamic optimization. • Shift load to underutilized resources… • …in some other place • …at some other time. • Key idea: dynamic optimization at the scale of an aggregate can improve proportionality and reduce energy cost. • Reduces OpEx at non-peak demand level • But: does not reduce CapEx for plant/power/cooling

  12. Managing Energy and Server Resources in Hosting Centers Jeff Chase, Darrell Anderson, Ron Doyle, Prachi Thakar, Amin Vahdat Duke University

  13. Managing Energy and Server Resources • Key idea: a hosting center OS maintains the balance of requests and responses, energy inputs, and thermal outputs. • US in 2003: 22 TWh ($1B - $2B+) • Adaptively provision server resources to match request load. • Provision server resources for energy efficiency. • Degrade service on power/cooling failures. [Diagram: requests in, responses out; energy in, waste heat out; power/cooling “browndown”; dynamic thermal management [Brooks].]

  14. Adaptive Provisioning • Efficient resource usage • Load multiplexing • Surge protection • Online capacity planning • Dynamic resource recruitment • Balance service quality with cost • Service Level Agreements (SLAs)

  15. Energy vs. Service Quality • Active set = {A, B, C, D}: λi < λtarget, low latency. • Active set = {A, B}: λi = λtarget, meets quality goals, saves energy. [Diagram: request load spread across servers A–D vs. concentrated on A and B.]

  16. Energy-Conscious Provisioning • Light load: concentrate traffic on a minimal set of servers. • Step down surplus servers to a low-power state. • APM and ACPI • Activate surplus servers on demand. • Wake-On-LAN • Browndown: can provision for a specified energy target.
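A minimal sketch of the provisioning loop this slide describes, not from the talk: it assumes a fixed per-server capacity within SLA, and the two stub helpers stand in for the ACPI low-power and Wake-On-LAN mechanisms named above.

```python
# Energy-conscious provisioning sketch: keep just enough servers active.
import math

PER_SERVER_CAPACITY = 400.0      # requests/sec per server within SLA (assumed)

def acpi_sleep(server):          # step a surplus server down to a low-power state
    print(f"sleeping {server}")

def wake_on_lan(server):         # activate a surplus server on demand
    print(f"waking {server}")

def provision(active, standby, offered_load):
    """Shrink or grow the active set to the minimum that covers the offered load."""
    needed = max(1, math.ceil(offered_load / PER_SERVER_CAPACITY))
    while len(active) < needed and standby:
        server = standby.pop()
        wake_on_lan(server)
        active.append(server)
    while len(active) > needed:
        server = active.pop()
        acpi_sleep(server)
        standby.append(server)

active, standby = ["s1", "s2", "s3", "s4"], []
provision(active, standby, 900.0)   # light load: concentrate traffic on ceil(900/400) = 3 servers
```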

  17. Example 2: “Balance of Power” • Continuous thermal sensors in a data center • Infer “thermal topology” • Place workload to optimize cooling • Dynamic thermal management [Thermal map: CRAC units, racks, temperature scale (°C), hot spots.]

  18. The Importance of Being Idle • At 100% utilization: no choices. • At 0% utilization: no choices. • Only the midrange has a useful spread between good choices and bad choices.

  19. Temperature-Aware Workload Placement • Less heat recirculation → lower cooling power cost • “Hot spots” can be OK and beneficial, provided heat exits • Avoid servers whose exhaust recirculates [Moore05, USENIX]
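A toy version of the placement idea above, not from the talk: it assumes a per-server "recirculation" score has already been inferred from the thermal topology of slide 17, and simply fills the servers whose exhaust recirculates least.

```python
# Temperature-aware placement sketch: prefer servers with low heat recirculation.
def place_workload(jobs, servers, recirculation):
    """Greedily assign jobs to the servers whose exhaust recirculates least."""
    order = sorted(servers, key=lambda s: recirculation[s])
    return {job: server for job, server in zip(jobs, order)}

servers = ["rack1-top", "rack1-mid", "rack2-top"]                      # hypothetical
recirculation = {"rack1-top": 0.30, "rack1-mid": 0.05, "rack2-top": 0.12}
print(place_workload(["web", "batch"], servers, recirculation))
# {'web': 'rack1-mid', 'batch': 'rack2-top'} -- the high-recirculation server stays idle
```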

  20. Demand Side Management for the Smart Grid • Electricity supply and demand are also highly variable. • “Smart grid” matches supply and demand. • If we have: • variable electricity pricing • surplus server capacity • energy proportionality • …can we place workload to minimize cost? • …without violating SLA?

  21. Electricity prices vary across markets: ripe for arbitrage! • Peak: $250 per MWh • Trough: $25 per MWh • Demand is predictable. Source: http://www.ferc.gov/market-oversight/mkt-electric/pjm/2008/12-2008-elec-pjm-dly.pdf
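A back-of-the-envelope arbitrage calculation using the prices on this slide; the size of the deferrable workload is a hypothetical assumption.

```python
# Every deferrable MWh moved from the peak to the trough saves $225.
peak_price, trough_price = 250.0, 25.0   # $/MWh, from the slide
shiftable_mwh = 1.0 * 8                  # assumed: a 1 MW deferrable load run for 8 off-peak hours
print(shiftable_mwh * (peak_price - trough_price))   # $1800 saved per day
```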

  22. Demand peaks → cheap-and-dirty power. [Chart: daily electricity demand curve.]

  23. Shape the Demand Curve? • Statistical multiplexing is not enough • Not for networks • Not for smart clouds or smart (electrical) grids • Wide variance in aggregate demand • Congestion → higher price, higher carbon footprint • Demand-side management offers a smoother ride. Source: James Hamilton, http://perspectives.mvdirona.com/CommentView,guid,e7848cf7-5430-49bf-a3d0-d699bec2a055.aspx

  24. cutting the electric bill for internet-scale systems Asfandyar Qureshi (MIT), Rick Weber (Akamai), Hari Balakrishnan (MIT), John Guttag (MIT), Bruce Maggs (Duke/CMU/Akamai). Image: Éole @ flickr

  25. Context: massive systems • Google (estimated map): tens of locations in the US, >0.5M servers. • Others: thousands of servers across multiple locations, e.g. Amazon, Yahoo!, Microsoft, Akamai, Bank of America (≈50 locations), Reuters. [Map legend: major data center vs. others.]

  26. Request routing framework • Inputs: requests, hourly electricity prices, capacity constraints, latency goals, network topology, bandwidth price model. • Policies: performance-aware routing vs. best-price performance-aware routing. • Output: a map from requests to locations.
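A minimal sketch in the spirit of this slide, not the paper's actual algorithm: among locations that meet the latency goal and have spare capacity, send each request batch to the cheapest electricity. All of the data structures and numbers here are hypothetical.

```python
# Best-price, performance-aware routing sketch.
def route(batches, locations):
    """Map (batch_id, latency_goal) pairs to the cheapest feasible location."""
    assignment = {}
    for batch_id, latency_goal in batches:
        feasible = [loc for loc in locations
                    if loc["latency_ms"] <= latency_goal and loc["capacity"] > 0]
        if not feasible:
            continue                               # fall back to default routing (not shown)
        best = min(feasible, key=lambda loc: loc["price"])
        best["capacity"] -= 1
        assignment[batch_id] = best["name"]
    return assignment

locations = [
    {"name": "east", "price": 250.0, "latency_ms": 20, "capacity": 100},
    {"name": "west", "price": 25.0,  "latency_ms": 80, "capacity": 50},
]
print(route([("b1", 100), ("b2", 30)], locations))  # {'b1': 'west', 'b2': 'east'}
```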

  27. Importance of elasticity [Chart: projected electricity-cost savings vs. energy-model parameters. Savings grow with increasing energy proportionality (idle power from 65% down to 0%) and lower PUE (2.0 down to 1.1): roughly $1M+ (3%) to $3M+ (8%) across the scenarios shown (off-the-rack servers, Google circa 2008, 2011 PUE & active server scaling).]

  28. The Elasticity of Power • Clouds: “boundless infrastructure on demand” • Elasticity: grow/shrink resource slices as required 1. Can I have more resources? 2. Here you go: N more servers

  29. Demand Side Management • Reflection in elastic cloud applications: • Adapt behavior based on resource availability • Opportunistically exploit surplus resources • Defer/avoid work during congestion 2. What useful work should I do? 1. Energy is cheap right now. 3. I will use N more servers.

  30. Reflective Control • Reflection in elastic cloud applications: • Adapt behavior based on resource availability • Opportunistically exploit surplus resources • Defer/avoid work during congestion • Requires deeper integrated control 2. What useful work should I do? 1. Energy is getting more expensive now. 3. I will use fewer servers
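A sketch of the reflective-control loop on slides 29 and 30, not from the talk: it assumes the application exposes a server-count knob and can read a price signal; the thresholds and the doubling/halving rule are purely illustrative.

```python
# Reflective control sketch: adapt the active server count to the energy price.
def reflect(current_servers, price_per_mwh,
            min_servers=2, max_servers=100, cheap=50.0, expensive=150.0):
    """Grow opportunistically when energy is cheap; shed deferrable work when it is not."""
    if price_per_mwh <= cheap:
        return min(max_servers, current_servers * 2)    # exploit surplus/cheap energy
    if price_per_mwh >= expensive:
        return max(min_servers, current_servers // 2)   # defer work during congestion
    return current_servers

print(reflect(8, 25.0))    # 16: "energy is cheap right now" -> do speculative work
print(reflect(8, 200.0))   # 4:  "energy is getting more expensive" -> use fewer servers
```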

  31. DSM/Reflection: Challenges • Multiple objectives: deadline, budget, accuracy • How much parallelism for opportunistic/speculative use? • Does it generalize? To what extent can we “factor out” reflective policies from applications? • What does it require from the “cool cloud”? [Diagram: better / faster / cheaper trade-off.]

  32. Workbench-assisted benchmarking • Goal: a response surface map: peak rate as a function of parallelism and data dependency at each point in the surface • Partition the surface arbitrarily: embarrassingly parallel • What experiments to run? • Need a notion of experiment utility: u(e) • Highly selective sampling
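A sketch of utility-driven experiment selection for the response-surface mapping above; the utility-per-cost rule and the candidate list are assumed stand-ins for the talk's u(e) and its "highly selective sampling".

```python
# Selective sampling sketch: spend a benchmarking budget on high-utility experiments.
def select_experiments(candidates, budget):
    """Greedily pick the highest utility-per-cost experiments that fit the budget."""
    chosen, spent = [], 0.0
    ranked = sorted(candidates, key=lambda e: e["utility"] / e["cost"], reverse=True)
    for exp in ranked:
        if spent + exp["cost"] <= budget:
            chosen.append(exp["name"])
            spent += exp["cost"]
    return chosen

candidates = [   # hypothetical experiments: expected utility vs. cost in node-hours
    {"name": "low-parallelism",  "utility": 0.9, "cost": 2.0},
    {"name": "mid-parallelism",  "utility": 0.6, "cost": 1.0},
    {"name": "high-parallelism", "utility": 0.8, "cost": 4.0},
]
print(select_experiments(candidates, budget=3.0))  # ['mid-parallelism', 'low-parallelism']
```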

  33. Better? Cheaper? Faster? • How long should I run each experiment? • Which search techniques can I use? • How to quantify cost? • Do I need more samples? • How many times to repeat each experiment? • Do I have enough resources? Get more or return some? • Wait for more or run lower rank experiments with what I have now? Can I meet my deadline? Will I have sufficient confidence in the result?

  34. Gang Computing Faculty-owned clusters in closets

  35. Gang Computing [Diagram: “shareholders” aggregate their clusters through a substrate “socket” operated by the provider (University OIT).]

  36. Gang Computing: Value Flow • Ease of use; protect/enhance CapEx; zero OpEx; sharing the surplus (shareholders). • Economies of scale; control over surplus; enhancement; efficiency (provider, University OIT). [Diagram: value flows between shareholders and provider across the substrate “socket”.]

  37. Gang Computing: Value Flow • Ease of use; protect/enhance CapEx; zero OpEx; surplus access (shareholders). • Economies of scale; control over surplus; enhancement; efficiency (provider, OIT). [Diagram: the same value flow, now with payments ($, $$$) crossing the substrate “socket”.]
