Blink: Managing Server Clusters on Intermittent Power

Presentation Transcript


  1. Blink: Managing Server Clusters on Intermittent Power Navin Sharma, Sean Barker, David Irwin, and Prashant Shenoy

  2. Energy’s Impact • Datacenters are growing in size • 100k servers + millions of cores possible • Energy demands also growing • Cost of energy is increasing • Estimates of >30% of TCO and rising • Datacenter emissions ≈ ½ those of the airline industry

  3. Reducing Energy’s Impact • Financial: optimize energy budget • Regulate usage for variable prices • Environmental: use green energy • Leverage more renewables • Must regulate energy footprint [Images: solar panels and wind turbines]

  4. Challenge • How do we design server clusters that run on intermittent power? • Power fluctuates independently of the workload • Maximize performance subject to available energy

  5. Outline • Motivation • Overview: Blink Abstraction • Application Example: Blinking Memcached • Implementation: Blink Prototype • Evaluation: Power/Workload Traces • Related Work • Conclusion

  6. Running Clusters on Intermittent Power • Short-term fluctuations (~minutes) • Smooth power using UPSes • Long-term fluctuations • Increase/decrease power consumption • One approach: activate/deactivate servers • But servers maintain memory/disk state… • …that will be unavailable if not transferred/replicated

  7. Server Blinking • Blinking == a duty cycle for servers • Continuous active-to-inactive transitions • Extends PowerNap (ASPLOS ’09) • Goal: cap energy use over short intervals • Feasible today: ACPI S3 (suspend-to-RAM) • Fast transitions (~seconds) • Large power reductions (>90% of peak) [Diagram: blink intervals at 100% power (0% duty cycle) vs. 50% power (50% duty cycle)]
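
As a minimal sketch of how a node could enforce a duty cycle with ACPI S3 (my illustration, not the authors' code): the standard Linux `rtcwake` utility suspends to RAM and sets an RTC alarm to wake the machine. `BLINK_INTERVAL` and `blink_forever` are hypothetical names, and the ~seconds of transition overhead is ignored.

```python
import subprocess
import time

BLINK_INTERVAL = 60  # seconds per blink interval (hypothetical value)

def blink_forever(duty_cycle: float) -> None:
    """Run this node at `duty_cycle` (fraction of each interval spent active)."""
    while True:
        active = duty_cycle * BLINK_INTERVAL
        inactive = BLINK_INTERVAL - active
        time.sleep(active)  # stay active (serve requests) for our share
        if inactive > 0:
            # Suspend to RAM (ACPI S3) and set an RTC alarm to wake us;
            # rtcwake is a standard Linux utility (requires root).
            subprocess.run(
                ["rtcwake", "-m", "mem", "-s", str(int(inactive))],
                check=True,
            )

# Example: a 50% power cap becomes a 50% duty cycle.
# blink_forever(0.5)
```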

  8. Blinking Abstraction • Blinking policies: coordinate blinking across servers • Activation: vary active servers • Synchronous: blink servers in tandem • Asymmetric: blink servers at different rates [Diagram: Activation — Node 1 (100%), Node 2 (100%), Node 3 (0%), Node 4 (0%); Synchronous — Nodes 1–4 (50% each); Asymmetric — Node 1 (100%), Node 2 (50%), Node 3 (35%), Node 4 (15%)]
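
A sketch of how the three policies could translate a cluster-wide power budget (a fraction of full power) into per-node duty cycles. The function names and the proportional rule for the asymmetric policy are my assumptions, chosen so the example reproduces the percentages in the diagram above.

```python
def activation(n: int, budget: float) -> list[float]:
    """Vary the number of active servers: fill nodes to 100% until
    the budget (fraction of full cluster power) runs out."""
    remaining = budget * n  # budget in "node-equivalents"
    cycles = []
    for _ in range(n):
        d = min(1.0, max(0.0, remaining))
        cycles.append(d)
        remaining -= d
    return cycles

def synchronous(n: int, budget: float) -> list[float]:
    """Blink all servers in tandem at the same duty cycle."""
    return [budget] * n

def asymmetric(weights: list[float], budget: float) -> list[float]:
    """Blink servers at different rates, here proportional to weights."""
    scale = budget * len(weights) / sum(weights)
    return [min(1.0, w * scale) for w in weights]

print(activation(4, 0.5))                 # [1.0, 1.0, 0.0, 0.0]
print(synchronous(4, 0.5))                # [0.5, 0.5, 0.5, 0.5]
print(asymmetric([4, 2, 1.4, 0.6], 0.5))  # ≈ [1.0, 0.5, 0.35, 0.15]
```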

  9. Proof-of-Concept Example: Memcached • Distributed in-memory cache • Popular optimization for web applications • E.g., Facebook, LiveJournal, Flickr • Smart clients / dumb servers • Simple client hash function maps keys → servers (example: hash(key) = 2) [Diagram: the web app calls the memcached client, which issues get(key) to one of Memcached Nodes #1–#N and falls back to get_from_db(key) against the DB on a miss]
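
A minimal sketch of the smart-client / dumb-server mapping: the client alone hashes a key to a server, with no server-side coordination. The server addresses are hypothetical.

```python
import hashlib

SERVERS = ["node1:11211", "node2:11211", "node3:11211"]  # hypothetical addresses

def server_for(key: str) -> str:
    """The client alone picks the server: hash the key, take it modulo N.
    Servers stay "dumb" -- no coordination among them is needed."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return SERVERS[h % len(SERVERS)]

print(server_for("user:42"))  # always the same node for a given key
```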

  10. Activation for Memcached • Initial approach: no memcached modifications • Keys randomly distributed across servers • Problem: which node to deactivate? • Arbitrarily favors keys on active servers [Diagram: the MCD client gets key1 from active Node #1 (returns obj1) but misses on key2, which maps to deactivated Node #2]

  11. Reducing Activation’s Performance Penalty • Popularity-based key migration • Group similarly popular objects on same server • Deactivate “least popular” servers • Invalidates least popular objects • Benefit: higher hit rates • Problem: unfair if many keys have same popularity • No server is “least popular” • Still arbitrarily favors keys on active servers
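
A sketch of popularity-based key migration as described above: rank keys by observed hits and pack similarly popular keys onto the same server, so the last servers hold the least popular objects and can be deactivated first. `plan_migration` is a hypothetical helper, not the paper's algorithm.

```python
from collections import Counter

def plan_migration(hits: Counter, servers: list[str]) -> dict[str, str]:
    """Group similarly popular keys: rank keys by observed hit count and
    fill servers from most to least popular, so deactivating the last
    server(s) invalidates only the least popular objects."""
    ranked = [key for key, _ in hits.most_common()]
    per_server = -(-len(ranked) // len(servers))  # ceiling division
    return {key: servers[i // per_server] for i, key in enumerate(ranked)}

hits = Counter({"a": 90, "b": 80, "c": 5, "d": 3})
print(plan_migration(hits, ["node1", "node2"]))
# {'a': 'node1', 'b': 'node1', 'c': 'node2', 'd': 'node2'}
```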

  12. Synchronous for Memcached • Benefits for uniform popularity • Fairness: all keys equally available • Performance: same hit rate as activation • Problem: poor hit rate if keys are not equally popular [Diagram: with skewed popularity (Node #1 serving 80% of requests, Node #2 serving 20%), activation hit rate = 80% vs. synchronous = 50%; with uniform popularity (50%/50%), both policies achieve a 50% hit rate]
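
To make the hit-rate arithmetic on this slide concrete, here is a small worked example (my illustration, not from the talk): expected hit rate is the popularity-weighted availability of each server, assuming a request for a key on an inactive server always misses.

```python
def expected_hit_rate(popularity: list[float], duty: list[float]) -> float:
    """Hit rate = sum over servers of (share of requests it gets) x
    (fraction of time it is active), assuming misses while inactive."""
    return sum(p * d for p, d in zip(popularity, duty))

# Two servers under a 50% power budget.
skewed, uniform = [0.8, 0.2], [0.5, 0.5]
print(expected_hit_rate(skewed, [1.0, 0.0]))   # activation, skewed:   0.8
print(expected_hit_rate(skewed, [0.5, 0.5]))   # synchronous, skewed:  0.5
print(expected_hit_rate(uniform, [1.0, 0.0]))  # activation, uniform:  0.5
print(expected_hit_rate(uniform, [0.5, 0.5]))  # synchronous, uniform: 0.5
```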

  13. Best of Both Worlds: Load-proportional • Blink servers in proportion to load • Balance performance and fairness • Works well for Zipf popularity distributions • Few popular keys, many (equally) unpopular keys • High hit rate for (mostly) active popular servers • Fair for (mostly) synchronously blinking less popular servers
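
A sketch of a load-proportional assignment: each server's duty cycle is proportional to the share of requests it serves, scaled to the power budget and capped at 100%. The exact normalization is my assumption (a real policy might redistribute the budget lost to the cap).

```python
def load_proportional(load: list[float], budget: float) -> list[float]:
    """Duty cycle per server proportional to its share of requests,
    scaled to the power budget and capped at 100%. Under a Zipf
    workload, popular servers stay (mostly) on while the long tail of
    equally unpopular servers ends up blinking (mostly) in sync."""
    scale = budget * len(load) / sum(load)
    return [min(1.0, l * scale) for l in load]

# One popular server, four equally unpopular ones, 60% power budget.
print(load_proportional([0.5, 0.125, 0.125, 0.125, 0.125], 0.6))
# ≈ [1.0, 0.375, 0.375, 0.375, 0.375]
```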

  14. Blink Prototype • Experimental deployment: a programmable power supply replays solar/wind power traces (fed via serial port) into a battery, and a power manager allocates the stored energy across the low-power nodes • Field deployment: the same cluster powered by actual solar and wind sources [Diagram: power supply, battery, power manager, and low-power nodes]
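
A sketch of the power manager's control loop under these assumptions: it periodically reads the available wattage (replayed from a solar/wind trace) and converts it into a duty cycle for the nodes. `Node`, `set_duty_cycle`, and `PEAK_WATTS` are hypothetical stand-ins, not the actual implementation.

```python
import time

PEAK_WATTS = 100.0  # hypothetical cluster peak draw (~100 W per the slides)

class Node:
    """Stub for a low-power node; a real deployment would RPC to it."""
    def __init__(self, name: str):
        self.name = name
    def set_duty_cycle(self, d: float) -> None:
        print(f"{self.name}: duty cycle {d:.0%}")

def power_manager(trace, nodes, blink_interval=1.0):
    """Each interval, read the available power (replayed from a
    solar/wind trace) and turn it into a cluster-wide energy budget --
    here applied as a simple synchronous policy."""
    for watts in trace:
        budget = min(1.0, watts / PEAK_WATTS)
        for node in nodes:
            node.set_duty_cycle(budget)
        time.sleep(blink_interval)

power_manager([80, 40, 10], [Node("n1"), Node("n2")])
```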

  15. BlinkCache Implementation • Place a proxy between clients and servers • Proxy maintains a key → server table • Tracks key popularity and migrates keys • Modest implementation complexity • No client/server memcached modifications • Added ~300 LOC to an existing proxy [Diagram: application servers (PHP server + MCD client) talk to the MCD proxy, which fronts backend MCD servers; a power manager and UPS-attached power clients control the nodes]
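
A sketch of the proxy's bookkeeping: a key → server table plus per-key hit counters that a periodic migration pass would consult. `BlinkProxy` and `route` are hypothetical names, not the actual implementation.

```python
from collections import Counter

class BlinkProxy:
    """Bookkeeping only: a key -> server table plus per-key hit
    counters that a periodic migration pass would consult."""

    def __init__(self, servers: list[str]):
        self.servers = servers
        self.table: dict[str, str] = {}  # key -> server
        self.hits: Counter = Counter()   # key -> request count

    def route(self, key: str) -> str:
        """Record popularity and return the key's current home;
        unseen keys get a default (hash-based) placement."""
        self.hits[key] += 1
        default = self.servers[hash(key) % len(self.servers)]
        return self.table.setdefault(key, default)

proxy = BlinkProxy(["node1", "node2"])
print(proxy.route("user:42"))
```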

  16. Experimental Setup • Test Deployment • Cluster of 10 low-power nodes • AMD Geode LX (433 MHz) CPU, 256 MB RAM • Power consumption well-matched to power production • ~100 watts peak production/consumption • Workloads and Metrics • Solar + wind traces from the deployment • Zipf popularity distributions (α = 0.6) • Hit Rate (Performance), Standard Deviation (Fairness)
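
A sketch of the workload and metrics under the stated setup: generate a normalized Zipf(α = 0.6) popularity distribution and compute the two metrics, hit rate (performance) and the standard deviation of per-key availability (fairness). The uniform 50% availability is just a stand-in for a policy's output.

```python
import statistics

def zipf_popularity(num_keys: int, alpha: float = 0.6) -> list[float]:
    """Normalized Zipf popularity: p(rank) proportional to 1 / rank^alpha."""
    weights = [1.0 / (rank ** alpha) for rank in range(1, num_keys + 1)]
    total = sum(weights)
    return [w / total for w in weights]

pop = zipf_popularity(1000)
# Per-key availability under some policy; uniform 50% is just a stand-in.
avail = [0.5] * len(pop)
hit_rate = sum(p * a for p, a in zip(pop, avail))  # performance metric
fairness = statistics.pstdev(avail)                # 0.0 = perfectly fair
print(f"hit rate {hit_rate:.2f}, availability stddev {fairness:.2f}")
```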

  17. S3 Transition Overhead [Graph: S3 suspend/resume transition times measured on the Blink prototype]

  18. Balancing Performance and Fairness • Activation with key migration (green) • Best hit rate • Load-proportional (red) • Slightly lower hit rate, but more fair

  19. Case Study: Tag Clouds in GlassFish • GlassFish: a Java application server hosting a tag-cloud application • Cache dynamically generated HTML pages in memcached • Each HTML page requires 20 requests to a MySQL database

  20. Related Work • Sensor and Mobile Research • Use duty-cycling to reduce energy footprint • Lexicographic (SenSys ’09), AdaptiveCycle (SECON ’07) • Blink: servers share power delivery infrastructure • Energy-efficient Computing • Minimize energy to satisfy a workload • FAWN (SOSP ’09), PowerNap (ASPLOS ’09) • Blink: optimize workload to satisfy an energy budget • Dealing with Variability • Similar to churn in DHTs • Chord (SIGCOMM ’01), Bamboo (USENIX ’04) • Blink: introduces regulated/controllable churn

  21. Conclusions • Blink: new abstraction for regulating energy footprint • Blinking is feasible in modern servers • Highlight differences in blinking policies • Modify example application (Memcached) for blinking • Modest implementation overhead • Ongoing work • Explore more applications • Distributed storage is more important/challenging • Explore more power profiles • Variable pricing, battery capacities

  22. Questions?

  23. Activation for Memcached • Next approach: alter the client hash function • Add/remove servers from the hash • Problem: penalizes power fluctuations • Invalidates keys on every membership change • Consistent hashing: still remaps 1/n of the keys on a single addition/removal [Diagram: after a server is added/removed, the client remaps key2 to a different node, invalidating its cached object]
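
A bare consistent-hash ring (no virtual nodes; my illustration) showing why a single addition/removal remaps only about 1/n of the keys, instead of nearly all of them under modulo hashing.

```python
import bisect
import hashlib

def h(s: str) -> int:
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class Ring:
    """Consistent-hash ring without virtual nodes: each key belongs to
    the first server point clockwise from the key's hash."""
    def __init__(self, servers: list[str]):
        self.points = sorted((h(s), s) for s in servers)
    def lookup(self, key: str) -> str:
        hashes = [p for p, _ in self.points]
        i = bisect.bisect(hashes, h(key)) % len(self.points)
        return self.points[i][1]

servers = [f"node{i}" for i in range(4)]
before, after = Ring(servers), Ring(servers[:-1])  # deactivate one server
keys = [f"key{i}" for i in range(10000)]
moved = sum(before.lookup(k) != after.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys remapped")  # ~1/n in expectation
```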

  24. Uniform Popularity • Synchronous • Same hit rate as activation • More fair than activation

  25. Proxy Overhead • Memcached Proxy • Imposes modest overhead
