
Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud

Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud. Can Hankendi, Ayse K. Coskun. Boston University, Electrical and Computer Engineering Department. IGCC’13, Arlington, VA. This work has been partially funded by VMware, Inc. and MGHPCC seed funds.


Presentation Transcript


  1. Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud. Can Hankendi, Ayse K. Coskun. Boston University, Electrical and Computer Engineering Department. IGCC’13, Arlington, VA. This work has been partially funded by VMware, Inc. and MGHPCC seed funds.

  2. Energy-Efficiency in Computing Clusters • Energy-related costs are among the biggest contributors to the total cost of ownership. • Consolidating multiple workloads on the same physical node improves energy efficiency.

  3. Multi-threaded Applications in the Cloud • HPC applications are expected to shift towards cloud resources. • Resource allocation decisions significantly affect the energy efficiency of server nodes. • Non-HPC: • Low utilization • High VM density • HPC: • High utilization • Scalability matters [Figure labels: Virtualization Layer, VM, vCPU]

  4. Outline • Background • Analyses on Performance Isolation • Autonomous Resource Allocation Technique • Results

  5. Background • Cluster-level Management: • VM migration techniques [Bobroff et al. INM’07, Beloglazov et al. MGC’10, VMware DRS] • Infrastructure scale-up/down [Bonvin et al. CCGrid’11] • Node-level Management: • Finding best thread mixes [Frachtenberg et al. TPDS’05, McGregor et al. IPDPS’05] • Finding best application pairs (e.g., pairing high-IPC applications with low-IPC ones) [Dhiman et al. ISLPED’09, Bhadauria et al. ICS’10] [Figure labels: Virtualization Layer, App-0, App-1, OS]

  6. Virtualized System Setup • 12-core AMD Magny-Cours server • Two 6-core dies attached side by side in the same package • Private L1 and L2 caches for each core • 6 MB shared L3 cache for each 6-core die • Virtualized through the VMware vSphere 5.1 ESXi hypervisor • 2 virtual machines with Ubuntu Server guest OS

  7. Power and Performance Monitoring • Performance monitoring through virtualized performance counters • Performance counter multiplexing • 100 ms sampling period • Guest-OS-level monitoring • Selected performance counters: • CPU cycles, retired instructions, L2-cache misses, L3-cache misses, bus utilization, stall cycles, branch mispredictions, floating-point instructions • System-level power measurement using a Wattsup power meter and logger
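The counters listed on this slide are typically combined into derived metrics before being used for allocation decisions. The following is a minimal sketch of two common ones, instructions per cycle (IPC) and L3 misses per kilo-instruction (MPKI), computed from one sampling interval. The `CounterSample` type, the function names, and the sample values are hypothetical and are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter deltas collected over one sampling interval (hypothetical layout)."""
    cycles: int
    instructions: int
    l3_misses: int


def ipc(sample: CounterSample) -> float:
    """Retired instructions per CPU cycle; 0.0 if no cycles were counted."""
    return sample.instructions / sample.cycles if sample.cycles else 0.0


def l3_mpki(sample: CounterSample) -> float:
    """L3-cache misses per 1000 retired instructions; 0.0 if nothing retired."""
    if not sample.instructions:
        return 0.0
    return 1000.0 * sample.l3_misses / sample.instructions


# Illustrative values only, not measurements from the presentation.
sample = CounterSample(cycles=200_000_000, instructions=150_000_000, l3_misses=300_000)
print(ipc(sample), l3_mpki(sample))
```

Guarding against zero denominators matters here because a multiplexed counter may report no events for a short interval.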

  8. Performance Isolation on Virtual Systems • Consolidating multiple workloads can degrade performance due to resource contention. • CPU binding and NUMA balancing can mitigate the performance variation. [Figure panels: w/o binding, w/ binding; labels: Thread-0, Thread-1, Native, VM w/ NUMA Bal., VM w/o NUMA Bal., CPU, Mem]

  9. CPU Resources vs. Performance • Performance benefits from increasing CPU resources vary significantly across PARSEC benchmarks

  10. User-defined Constraints • Users can define constraints on the allocation decisions (e.g., minimum throughput, fairness). • The resource allocation routine is designed as a closed-loop controller that satisfies the constraints. [Figure labels: ESXi 5.1, User-defined Constraints, Compute weights, Check constraint, Monitor Application, Adjust CPU Limits]
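One iteration of such a closed-loop controller could look roughly like the sketch below. It is not the authors' implementation: the slide does not say how weights are computed, so the sketch assumes weights proportional to a per-VM score, and it assumes a violated minimum-throughput constraint is handled by moving a fixed amount of CPU limit from the largest other VM. All names and numbers are hypothetical.

```python
def compute_weights(scores):
    """Normalize per-VM scores into weights that sum to 1 (assumed policy)."""
    total = sum(scores.values())
    return {vm: s / total for vm, s in scores.items()}


def control_step(scores, throughput, min_throughput, total_mhz, step_mhz):
    """Compute weights, derive CPU limits, then check constraints and adjust.

    scores         -- per-VM score used for weighting (hypothetical input)
    throughput     -- per-VM throughput observed by the monitor
    min_throughput -- user-defined floor per VM (VMs without a floor are omitted)
    """
    weights = compute_weights(scores)
    limits = {vm: total_mhz * w for vm, w in weights.items()}
    for vm, floor in min_throughput.items():
        if throughput[vm] >= floor:
            continue  # constraint satisfied, nothing to adjust
        others = [d for d in limits if d != vm]
        if not others:
            continue  # no other VM to take resources from
        donor = max(others, key=lambda d: limits[d])
        moved = min(step_mhz, limits[donor])
        limits[donor] -= moved
        limits[vm] += moved
    return limits


# Illustrative values only.
new_limits = control_step(
    scores={"vm_a": 3.0, "vm_b": 1.0},
    throughput={"vm_a": 10.0, "vm_b": 2.0},
    min_throughput={"vm_b": 4.0},
    total_mhz=24000,
    step_mhz=2000,
)
print(new_limits)
```

In a running system this step would be repeated every monitoring interval, with the freshly observed throughput fed back in, which is what makes the loop closed.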

  11. Runtime Behavior w/ Constraints • Minimum throughput constraint
      Panel  Benchmark     Weight (w)  Resource (MHz)
      1      blackscholes  0.50        11970
      1      dedup         0.50        11970
      2      blackscholes  0.63        14963
      2      dedup         0.37        8977
      3      blackscholes  0.62        15648
      3      dedup         0.38        8292
      4      blackscholes  0.66        15648-1995
      4      dedup         0.34        8292+1995
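The arithmetic behind these figures can be sketched as follows, assuming each CPU limit is the VM's weight times a total capacity of 23940 MHz (the sum of the two per-VM values on the slide) and that the last adjustment moves 1995 MHz from one VM to the other. The function names are hypothetical. The weights printed on the slide appear to be rounded, so only the equal-weight case is reproduced here.

```python
TOTAL_MHZ = 23940  # assumption: 11970 + 11970 from the equal-weight case
STEP_MHZ = 1995    # adjustment amount shown on the slide


def limits_from_weights(weights, total_mhz=TOTAL_MHZ):
    """Split total_mhz among VMs in proportion to their weights."""
    total_w = sum(weights.values())
    return {vm: round(total_mhz * w / total_w) for vm, w in weights.items()}


def shift(limits, src, dst, mhz=STEP_MHZ):
    """Move mhz of CPU limit from src to dst, keeping the total unchanged."""
    out = dict(limits)
    out[src] -= mhz
    out[dst] += mhz
    return out


equal = limits_from_weights({"blackscholes": 0.50, "dedup": 0.50})
adjusted = shift({"blackscholes": 15648, "dedup": 8292}, "blackscholes", "dedup")
print(equal)     # {'blackscholes': 11970, 'dedup': 11970}
print(adjusted)  # {'blackscholes': 13653, 'dedup': 10287}
```

Note that the shift preserves the total: 13653 + 10287 is still 23940 MHz.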

  12. Overall Results • For 50 randomly generated workload sets, the proposed technique, together with the MPC*Utilization application selection policy, improves energy efficiency by 17% on average.

  13. Increasing Number of VMs • Energy efficiency improvements are 13% lower for a higher number of VMs (i.e., 6 VMs) running multi-threaded applications. • Application set (12 apps): 2x blackscholes, 2x dedup, 2x vips, bodytrack, canneal, facesim, swaptions, streamcluster, x264

  14. Conclusions • Performance isolation in virtual environments limits the benefits of application-selection-based consolidation strategies. • We propose a runtime resource management technique that takes performance scalability into account. • Our proposed technique improves energy efficiency by 17% over state-of-the-art techniques.
