1 / 29

Dynamic Placement of Virtual Machines for Managing SLA Violations

Dynamic Placement of Virtual Machines for Managing SLA Violations. Norman Bobroff , Andrzej Kochut , Kirk Beaty Some slide content adapted from Alexander Nus Presented by Jon Logan. Virtual machines are becoming more and more popular throughout our datacenters Servers use electricity

gerodi
Download Presentation

Dynamic Placement of Virtual Machines for Managing SLA Violations

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Dynamic Placement of Virtual Machines for Managing SLA Violations Norman Bobroff, AndrzejKochut, Kirk Beaty Some slide content adapted from Alexander Nus Presented by Jon Logan

  2. Virtual machines are becoming more and more popular throughout our datacenters • Servers use electricity • Electricity can be expensive! • How do we minimize the number of utilized machines, while meeting our SLA obligations? • Usage patterns of machines are NOT static, and generally change dynamically Motivation

  3. Maximize utilization of active machines • Minimize Service Level Agreement (SLA) violations • Minimize number of active machines • Power off unused machines to conserve cost (electricity) • Essentially, minimize cost while meeting SLA guarantees Goals

  4. All machines are taken offline, and historical usage is used to determine ideal placement Happens very infrequently (~weeks or months) Must interrupt service to relocate Utilization is not consistent in many cases! Demand may vary significantly within the period between allocations Static Allocation

  5. VMs are seamlessly migrated between machines based on predicted demand • Is done rather frequently (~minutes, hours) • Live migration • Minimal (~ms) service disruptions during migration • Allows for allocations to more closely follow demand Dynamic Allocation

  6. Moves a VM image between machines without service interruption • The paper cites a ~45 second transition time • VM must be serialized and transferred over the network • Artificially limits our reallocation period • Can’t reallocate faster than we can migrate! Live Migration

  7. Essentially is a contract between the provider and the customer that states that resources R will be available X% of the time • Violations cost money! • X is usually high (ex. 95%) • VMs do not necessarily use this entire resource allocation at all times, but it must be available should they choose to use it • Ex. VM may be doing batch processing, and only do substantial work between 12:00AM and 1:00AM Service Level Agreement

  8. Workloads are not static! • Try to predict the usage of the VM in a time T • Reallocate machines to be able to meet that predicted usage • Need to be within a certain percentile to meet SLA requirements • Capacity savings is simply • Static Allocation - (Predicted Usage + Error Factor) • Repeat this process every time T Static vs Dynamic Usages

  9. Not all Workloads are created equal • Some tend to be better than others • Constant workloads = bad! • A workload is an ideal candidate for dynamic allocation if • It has strong variability AND • It has strong autocorrelation combined with periodic behavior • Essentially, you need to have a decent degree of variability, and be able to reasonably predict its usage What Workloads Are Best For Dynamic Allocation?

  10. Strongly variable – good • Autocorrelation ~0.8 – good • Weak periodic behavior – bad • Verdict – Good • Large variability offers significant potential for optimization • Strong autocorrelation makes it possible to obtain a low-error predication Workload 3a

  11. Weakly variable - bad • Decaying autocorrelation - bad • Weak periodic behavior – bad • Verdict – Bad • Low variability makes potential gain low • Weak autocorrelation and no periodic component make it difficult to predict demand Workload 3b

  12. Strongly variable – good • Strong Autocorrelation– good • Strong periodic behavior – good • Verdict – Very Good • An ideal case for dynamic allocation Workload 3c

  13. Potential Gain

  14. Determine the periods in demand using ‘common sense’ aided by periodogram(e.g.time-of-day,day of week,…) Decompose the process into deterministic periodic and residual components Di+ ri Estimate the deterministic part using averaging of multiple smoothed historical periods Fit Auto Regressive Moving Average (ARMA) model to the residual process Use the combined components for demand prediction Demand forecast algorithm Ui = Di + ri

  15. Goal is to minimize time averaged number of active servers without violating the SLA agreement • Machines that are not utilized to handle VMs are powered off or put in a low power state • Will be reactivated if/when required (minimally, the next period) • The time to power on & migrate must be less than the period T • Responsible for actual migrations of machines • Placing of VMs is essentially a version of the bin packing problem • NP hard! • We use an approximation, using first-fit Management Algorithm

  16. Measure – Measure usage Forecast – Predict usage for the next window Remap – Relocate machines if necessary Preform this (MFR) at regular intervals Designed to try to predict the “best we can do” Management Algorithm

  17. Management Algorithm Overview

  18. N – virtual machines M – physical machines Cm – Maximum capacity of physical machine fni, k – forcast value for resource demand of VM n at interval i+k R – migration interval Cp(u, o2) – (1-p)-percentile of Gaussian distribution with mean u and variance o2 Key Terms

  19. Management Algorithm

  20. Management Algorithm (2)

  21. Management Algorithm (3)

  22. Management Algorithm (4)

  23. Simulated using traces gathered from hundreds of production servers using various applications • Traces contain CPU, memory, storage, and network • We are only focusing on CPU usage • Samples were collected every 15 minutes • The simulated study • Verifies that the MFR meets SLA targets • Quantifies the reduction of SLA violations • Quantifies the number of saved machines • Explores the relationship between the remapping interval and the gain from dynamic management • Performs measurements to determine properties of a practical infrastructure with respect to migration of VMs Simulations

  24. Overflows vs Number of PMs

  25. Number of Machines vs Overflow Desired Significantly reduces number of machines active

  26. Performance degrades as the migration interval increases Essentially, the prediction is the max usage predicted within the range

  27. The paper only looks at one resource utilization • In this case, CPU utilization • In the real world, you have numerous resources to handle allocations for • Memory, CPU, IO, Network, etc. • Assumes bandwidth between machines is free & unrestricted • Relocating some VMs in some cases may not be worth the cost of relocating the image • Their study size is small • Only 6 physical machines • What if different VMs have different SLA requirements? • What if your PMs had differing hardware? Limitations

  28. Based on the simulated data, it significantly reduces cost to execute virtual machines • Relies on an ideal case of VMs • Predictable and volatile usage • Algorithm could be optimized to reduce the number of VM relocations, or to more optimally schedule • Simulation is too small • The paper claims a 44% average savings in the number of active PMs Conclusion

More Related