Energy-Efficient Server Consolidation for Multi-threaded Applications in the Cloud

Can Hankendi, Ayse K. Coskun • Boston University, Electrical and Computer Engineering Department • IGCC’13, Arlington, VA • This work has been partially funded by VMware, Inc. and MGHPCC seed funds.

Energy-Efficiency in Computing Clusters • Energy-related costs are among the biggest contributors to the total cost of ownership. • Consolidating multiple workloads on the same physical node improves energy efficiency.

Multi-threaded Applications in the Cloud • HPC applications are expected to shift towards cloud resources. • Resource allocation decisions significantly affect the energy efficiency of server nodes. • Non-HPC workloads: low utilization, high VM density • HPC workloads: high utilization, scalability matters • [Figure: VMs with multiple vCPUs on top of a virtualization layer]

Outline • Background • Analyses on Performance Isolation • Autonomous Resource Allocation Technique • Results

Background • Cluster-level management: VM migration techniques [Bobroff et al. INM’07, Beloglazov et al. MGC’10, VMware DRS]; infrastructure scale-up/down [Bonvin et al. CCGrid’11] • Node-level management: finding best thread mixes [Frachtenberg et al. TPDS’05, McGregor et al. IPDPS’05]; finding best application pairs (e.g., pairing high-IPC applications with low-IPC ones) [Dhiman et al. ISLPED’09, Bhadauria et al. ICS’10]

Virtualized System Setup • 12-core AMD Magny-Cours server • 2x 6-core dies attached side by side in the same package • Private L1 and L2 caches for each core • 6 MB shared L3 cache for each 6-core die • Virtualized through the VMware vSphere 5.1 ESXi hypervisor • 2 virtual machines with Ubuntu Server guest OS

Power and Performance Monitoring • Performance monitoring through virtualized performance counters: performance counter multiplexing, 100 ms sampling period, guest-OS-level monitoring • Selected performance counters: CPU cycles, retired instructions, L2-cache misses, L3-cache misses, bus utilization, stall cycles, branch mispredictions, floating-point instructions • System-level power measurement using a Wattsup power meter logger
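As an illustration of how the sampled counters above can be turned into per-interval metrics, here is a minimal Python sketch. The `CounterSample` layout, the function names, and the example numbers are hypothetical and do not come from the presentation; only the counter names and the 100 ms sampling period do.

```python
from dataclasses import dataclass


@dataclass
class CounterSample:
    """Counter deltas over one sampling period (hypothetical layout)."""
    cycles: int
    instructions: int
    l3_misses: int


def ipc(sample: CounterSample) -> float:
    """Instructions retired per CPU cycle during the sample."""
    return sample.instructions / sample.cycles if sample.cycles else 0.0


def l3_mpki(sample: CounterSample) -> float:
    """L3-cache misses per thousand retired instructions."""
    if not sample.instructions:
        return 0.0
    return 1000.0 * sample.l3_misses / sample.instructions


def throughput_ips(sample: CounterSample, period_s: float = 0.1) -> float:
    """Retired instructions per second, assuming a 100 ms sampling period."""
    return sample.instructions / period_s


# Made-up numbers for a single 100 ms sample.
sample = CounterSample(cycles=200_000_000,
                       instructions=150_000_000,
                       l3_misses=300_000)
print(ipc(sample), l3_mpki(sample), throughput_ips(sample))
```

Metrics of this kind are one plausible input to a resource allocator; the presentation does not state which derived metrics it actually uses.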

Performance Isolation on Virtual Systems • Consolidating multiple workloads can degrade performance due to resource contention • CPU binding and NUMA balancing can mitigate the performance variation • [Figure: thread placement w/o binding vs. w/ binding; configurations compared: Native, VM w/ NUMA Bal., VM w/o NUMA Bal.]

CPU Resources vs. Performance • Performance benefits from increasing CPU resources vary significantly across PARSEC benchmarks

User-defined Constraints • Users can define and put constraints on the allocation decisions (e.g., minimum throughput, fairness) • The resource allocation routine is designed as a closed-loop controller to satisfy the constraints • [Figure: control loop on ESXi 5.1 with blocks: User-defined Constraints, Compute weights, Check constraint, Monitor Application, Adjust CPU Limits]
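The closed-loop idea can be sketched in Python as follows. This is an assumption-laden illustration, not the authors' controller: the proportional split of capacity by weight, the per-VM `scores`, and all function names are hypothetical. The total of 23940 MHz and the 1995 MHz adjustment step are taken from the runtime-behavior slide; reading 1995 MHz as one core's share (23940 / 12) is an inference.

```python
TOTAL_MHZ = 23940  # sum of the two CPU limits on the runtime-behavior slide
STEP_MHZ = 1995    # adjustment step on that slide (assumed: one core's share)


def compute_weights(scores):
    """Normalize per-VM scores (hypothetical metric) into weights summing to 1."""
    total = sum(scores.values())
    return {vm: s / total for vm, s in scores.items()}


def limits_from_weights(weights, total_mhz=TOTAL_MHZ):
    """Assumed policy: split the total CPU capacity in proportion to the weights."""
    return {vm: round(w * total_mhz) for vm, w in weights.items()}


def enforce_min_throughput(limits, throughput, minimum, step_mhz=STEP_MHZ):
    """Move one step of CPU to each VM that is below its throughput floor.

    The donor is the other VM that currently holds the largest limit.
    """
    adjusted = dict(limits)
    for vm, floor in minimum.items():
        if throughput[vm] < floor:
            donor = max((d for d in adjusted if d != vm),
                        key=lambda d: adjusted[d])
            if adjusted[donor] > step_mhz:
                adjusted[donor] -= step_mhz
                adjusted[vm] += step_mhz
    return adjusted


def control_step(scores, throughput, minimum):
    """One loop iteration: compute weights, set limits, check the constraint."""
    weights = compute_weights(scores)
    limits = limits_from_weights(weights)
    return enforce_min_throughput(limits, throughput, minimum)


# Equal scores give an even split of the total capacity.
print(control_step({"blackscholes": 1.0, "dedup": 1.0},
                   throughput={"blackscholes": 1.0, "dedup": 1.0},
                   minimum={"dedup": 0.8}))
```

In a real deployment, the monitored throughput would come from the guest-level performance counters and the adjusted limits would be applied through the hypervisor's CPU limit setting; both steps are left out of this sketch.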

Runtime Behavior w/ Constraints • Constraint shown: minimum throughput • Panel 1: blackscholes w = 0.50, 11970 MHz; dedup w = 0.50, 11970 MHz • Panel 2: blackscholes w = 0.63, 14963 MHz; dedup w = 0.37, 8977 MHz • Panel 3: blackscholes w = 0.62, 15648 MHz; dedup w = 0.38, 8292 MHz • Panel 4: blackscholes w = 0.66, 15648-1995 MHz; dedup w = 0.34, 8292+1995 MHz

Overall Results • For 50 randomly generated workload sets, the proposed technique combined with the MPC*Utilization application selection policy improves energy efficiency by 17% on average.

Increasing Number of VMs • Energy efficiency improvements are 13% lower for a higher number of VMs (i.e., 6 VMs) running multi-threaded applications • Application set (12 apps): 2x blackscholes, 2x dedup, 2x vips, bodytrack, canneal, facesim, swaptions, streamcluster, x264

Conclusions • Performance isolation in virtual environments limits the benefits of application-selection-based consolidation strategies • We propose a runtime resource management technique that takes performance scalability into account • Our proposed technique improves energy efficiency by 17% over state-of-the-art techniques
