An overview of data placement, workload interference, the impacts of virtualization, resource utilization, and provisioning in cloud and virtualized environments, with measurement-based insights into storage I/O issues in modern virtualized settings.
Storage Management in Virtualized Cloud Environments Sankaran Sivathanu, Ling Liu, Mei Yiduo and Xing Pu Student Workshop on Frontiers of Cloud Computing, IBM 2010
Talk Outline • Introduction • Measurement results & Observations • Data Placement & Provisioning • Workload Interference • Impacts of Virtualization • Summary
Cloud & Virtualization • Cloud Environment – Goals • Flexibility in resource configuration • Maximum resource utilization • Pay-per-use model • Virtualization – Benefits • Resource consolidation • Re-structuring flexibility • Separate protection domains • Virtualization serves as one of the basic foundations of cloud infrastructures
Fundamental Issues • Cloud Service Providers (CSPs) vs. Customers • Customers purchase computing resources • CSPs provide virtual resources (VMs) • Customers perceive their resources as physical machines! • Multiple VMs reside in a single physical host • Resource interference • End-user performance depends on other users • End-users are unaware of where their data physically resides
Goals of our Measurement • For cloud service providers • How to place data such that end-user performance is maximized? • How to co-locate workloads for the least interference? • For end-users • How to purchase resources in tune with requirements? • How to tune applications for maximum performance? • General insights on storage I/O in virtualized environments
Benchmarks Used • Postmark • Mail Server Workload • Create/Delete, Read/Append files • Parameters • File Size • # of files • Read/Write ratio • Synthetic Workload • Sequential vs. random accesses • Zipf Distribution
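For illustration, the sketch below shows the kind of synthetic read workload described above (sequential, uniform-random, and Zipf-skewed accesses); the file path, request count, block size, and Zipf exponent are placeholder values, not the settings used in the study, and NumPy is only needed for the Zipf sampler.

    # Sketch: issue sequential, uniform-random, or Zipf-skewed 4 KB reads
    # against a pre-created test file and report throughput.
    # Parameters are illustrative; page-cache effects are ignored here.
    import os, time, random
    import numpy as np

    PATH = "/tmp/testfile"            # placeholder test file
    BLOCK = 4096                      # request size in bytes
    NUM_REQUESTS = 50_000
    ZIPF_A = 1.2                      # skew of the Zipf distribution

    def offsets(mode, nblocks):
        if mode == "sequential":
            return [(i % nblocks) * BLOCK for i in range(NUM_REQUESTS)]
        if mode == "random":
            return [random.randrange(nblocks) * BLOCK for _ in range(NUM_REQUESTS)]
        if mode == "zipf":
            ranks = np.random.zipf(ZIPF_A, NUM_REQUESTS)
            return [int(r - 1) % nblocks * BLOCK for r in ranks]
        raise ValueError(mode)

    def run(mode):
        nblocks = os.path.getsize(PATH) // BLOCK
        fd = os.open(PATH, os.O_RDONLY)
        start = time.time()
        for off in offsets(mode, nblocks):
            os.pread(fd, BLOCK, off)
        elapsed = time.time() - start
        os.close(fd)
        print(f"{mode:10s}: {NUM_REQUESTS * BLOCK / elapsed / 1e6:.1f} MB/s")

    if __name__ == "__main__":
        for mode in ("sequential", "random", "zipf"):
            run(mode)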
Disk Provisioning • Consider a 100 GB disk; workload data footprint ~150 MB • Case I: 40 GB partition, throughput 1.4 MB/s • Case II: 4 GB partition, throughput 2.1 MB/s • Performance difference: 33%
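One rough way to see this effect is to issue the same random-read workload over spans of different sizes on a raw device, so the seek span mimics a small versus a large partition. This is only a sketch under assumptions: /dev/sdb is a placeholder scratch device holding no live data, the device must be at least as large as the largest span, and the run needs root.

    # Sketch: random 4 KB reads confined to spans of different sizes on a raw
    # device, to compare seek distances. O_DIRECT bypasses the page cache.
    import os, time, random, mmap

    DEV = "/dev/sdb"                  # placeholder scratch device (needs root)
    BLOCK = 4096
    NUM_REQUESTS = 5000

    def random_reads(span_bytes):
        fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
        buf = mmap.mmap(-1, BLOCK)    # page-aligned buffer, as O_DIRECT requires
        nblocks = span_bytes // BLOCK
        start = time.time()
        for _ in range(NUM_REQUESTS):
            os.lseek(fd, random.randrange(nblocks) * BLOCK, os.SEEK_SET)
            os.readv(fd, [buf])
        elapsed = time.time() - start
        os.close(fd)
        return NUM_REQUESTS * BLOCK / elapsed / 1e6

    if __name__ == "__main__":
        for gb in (4, 40):            # mimic a 4 GB vs. a 40 GB partition
            print(f"{gb:3d} GB span: {random_reads(gb * 2**30):.2f} MB/s")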
Where to place VM disk? • Postmark benchmark • Read operation • Cases: • Read from physical partitions in different zones • Based on LBNs • LBNs start from the inner zone and proceed towards outer zones • Read from disk file (.vmdk)
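The zone effect itself can be probed directly by timing sequential reads at low, middle, and high LBN ranges of a raw disk and comparing bandwidth. The sketch below assumes a placeholder scratch device (/dev/sdb), root privileges, and a disk large enough that the last sample fits.

    # Sketch: sequential read bandwidth at the start, middle, and end of the
    # LBN range of a raw disk, to compare recording zones.
    import os, time, mmap

    DEV = "/dev/sdb"                  # placeholder scratch device (needs root)
    CHUNK = 1 << 20                   # 1 MB reads
    SPAN = 256 * CHUNK                # sample 256 MB per zone

    def zone_bandwidth(start_byte):
        fd = os.open(DEV, os.O_RDONLY | os.O_DIRECT)
        buf = mmap.mmap(-1, CHUNK)    # page-aligned buffer for O_DIRECT
        os.lseek(fd, start_byte, os.SEEK_SET)
        t0 = time.time()
        for _ in range(SPAN // CHUNK):
            os.readv(fd, [buf])
        bw = SPAN / (time.time() - t0) / 1e6
        os.close(fd)
        return bw

    if __name__ == "__main__":
        fd = os.open(DEV, os.O_RDONLY)
        disk_bytes = os.lseek(fd, 0, os.SEEK_END)     # device size in bytes
        os.close(fd)
        for label, frac in (("low LBNs", 0.0), ("mid LBNs", 0.5), ("high LBNs", 0.95)):
            start = int(disk_bytes * frac) // CHUNK * CHUNK
            print(f"{label:9s}: {zone_bandwidth(start):.1f} MB/s")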
Where to place multiple VM disks ? • Postmark benchmark • 2 instances (1 for each VM) • Random reads • Compare physical partitions placed in different zones • O -> Outer • I -> Inner • M -> Mid
Observations • Customers should purchase storage based on workload requirement, not price • Thin provisioning may be practiced • Throughput intensive VMs can be placed in outer disk zones • Multiple VMs that may be accessed simultaneously should be co-located on disk • CSPs can monitor access patterns and move virtual disks accordingly
CPU-Disk Interference • [Figure: two VMs on one physical host, sharing the host's CPU and disk] • Throughput: 23.4 MB/s vs. 27.6 MB/s • Performance difference: 15.3%
CPU-Disk Interference • CPU allocation ratios have no effect on disk throughput across VMs • A disk-intensive job performs better alongside a CPU-intensive job
CPU-Disk Interference • Reason? Dynamic Frequency Scaling (DFS)
CPU-Disk Interference • CPU DFS is enabled in Linux by default • Three 'governors' control the DFS policy • On-demand (default) • Performance • Power-save • When one core is idle, the entire CPU is down-scaled because overall CPU utilization falls
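On Linux the active governor is exposed through the standard cpufreq sysfs interface; the minimal sketch below inspects it per core and, as root, switches every core to a given governor (for example "performance", which avoids the down-scaling described above).

    # Sketch: read the current frequency-scaling governor of each core and,
    # optionally (as root), set all cores to the governor named on the
    # command line. Uses the standard Linux cpufreq sysfs files.
    import glob, sys

    GOV_FILES = sorted(glob.glob(
        "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"))

    def show():
        for path in GOV_FILES:
            cpu = path.split("/")[5]
            with open(path) as f:
                print(f"{cpu}: {f.read().strip()}")

    def set_all(governor):
        for path in GOV_FILES:
            with open(path, "w") as f:    # writing requires root
                f.write(governor)

    if __name__ == "__main__":
        show()
        if len(sys.argv) > 1:             # e.g. python3 governor.py performance
            set_all(sys.argv[1])
            show()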
Disk-Disk Interference • 1 instance of Postmark in each VM • 65.3% more time taken when compared to running Postmark in a single VM • Overhead mainly attributed to disk seeks: no more sequential accesses • [Figure: two VMs whose virtual disks share one physical disk in the same physical host]
Disk-Disk Interference • VMs using separate physical disks • 17.52% more time taken when compared to running Postmark in a single VM • Overhead attributed to contention in Dom-0's queue structures • [Figure: two VMs whose virtual disks reside on separate physical disks in the same physical host]
Disk-Disk Interference • Postmark Benchmark (Reads) • Cases : • Running in a single VM • 1 instance in each of two VMs • 2 VMs reading from virtual disks in same physical disk • 2 VMs reading from virtual disks in different physical disks
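A rough sketch of how such a comparison can be driven from outside the guests, assuming two VMs reachable over ssh with Postmark and a command file already installed in each; the hostnames and the Postmark invocation are placeholders, not the setup used in the study.

    # Sketch: time one Postmark run in a single VM, then two concurrent runs
    # in two VMs, and report the slowdown. Hostnames and the postmark
    # invocation are placeholders for the actual guest configuration.
    import subprocess, time
    from concurrent.futures import ThreadPoolExecutor

    VMS = ["vm1.example.com", "vm2.example.com"]       # placeholder guests
    POSTMARK_CMD = "postmark /root/pmark.conf"         # placeholder command file

    def run_on(host):
        t0 = time.time()
        subprocess.run(["ssh", host, POSTMARK_CMD], check=True,
                       stdout=subprocess.DEVNULL)
        return time.time() - t0

    if __name__ == "__main__":
        solo = run_on(VMS[0])
        with ThreadPoolExecutor(max_workers=len(VMS)) as pool:
            concurrent = max(pool.map(run_on, VMS))
        print(f"single VM : {solo:.1f} s")
        print(f"two VMs   : {concurrent:.1f} s "
              f"(+{(concurrent - solo) / solo * 100:.1f}%)")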
Disk-Disk Interference • I/O scheduling policy in Dom-0 has little effect • 'Ideal' case is the time taken when running Postmark in a single VM • Other cases run 1 instance of Postmark in each of 2 VMs (separate physical disks)
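For reference, the Dom-0 I/O scheduler can be inspected and switched between runs through sysfs; a minimal sketch follows, with a placeholder device name (writing requires root).

    # Sketch: show and change the I/O scheduler of a block device through the
    # standard /sys/block/<dev>/queue/scheduler interface.
    import sys

    DEVICE = "sdb"                                     # placeholder device name
    SCHED_FILE = f"/sys/block/{DEVICE}/queue/scheduler"

    def show():
        with open(SCHED_FILE) as f:
            print(f.read().strip())    # the active scheduler appears in [brackets]

    def set_scheduler(name):
        with open(SCHED_FILE, "w") as f:               # writing requires root
            f.write(name)

    if __name__ == "__main__":
        show()
        if len(sys.argv) > 1:          # e.g. python3 iosched.py deadline
            set_scheduler(sys.argv[1])
            show()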
Disk-Disk Interference • Interference with respect to workload type • Synthetic read workload • VMs use separate physical disks • Cases: • Mix of sequential versus random reads • Sequential requests from both VMs flood the Dom-0 queue, causing contention
Observations • CPU-intensive and disk-intensive workloads can be co-located for optimal performance and power • Virtual disks that may be accessed simultaneously must be placed on separate physical disks • I/O scheduling in Dom-0 has little effect on disk workload interference • Two sequential workloads, when co-located, suffer in performance due to queue contention • With separate disks, workload contention is generally minimal, other than in the case of two sequential workloads
Sequentiality • Postmark benchmark (reads) • Not much overhead is seen for random disk accesses • VM overhead is mitigated by the larger disk overhead • Overhead is felt more for sequential disk accesses
Block Size • Postmark sequential reads • Fixed overhead with every request • As block sizes increase, the number of requests is reduced, hence overhead is reduced • Efficient to read in larger blocks
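A minimal sketch of the block-size effect: read the same file sequentially with increasing request sizes and compare throughput. The file path is a placeholder, and caches should be dropped between runs (or O_DIRECT used) for a fair comparison, since this sketch uses buffered reads.

    # Sketch: sequential reads of the same file with growing request sizes;
    # the fixed per-request overhead is amortized as the block size grows.
    # Drop caches between runs (echo 3 > /proc/sys/vm/drop_caches) to avoid
    # measuring the page cache.
    import os, time

    PATH = "/tmp/testfile"            # placeholder test file

    def sequential_read(block_size):
        fd = os.open(PATH, os.O_RDONLY)
        total, t0 = 0, time.time()
        while True:
            data = os.read(fd, block_size)
            if not data:
                break
            total += len(data)
        elapsed = time.time() - t0
        os.close(fd)
        return total / elapsed / 1e6

    if __name__ == "__main__":
        for kb in (4, 16, 64, 256, 1024):
            print(f"{kb:5d} KB blocks: {sequential_read(kb * 1024):.1f} MB/s")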
Observations • VM overhead is not felt in random workloads – amortized by disk seeks • Extra layers of indirection are the reason for VM overhead – when block size is large, the overhead is amortized • Block size may be increased only if there is sufficient locality in access
Summary • Storage purchased must depend on requirements, not price! • It is better to place sequentially accessed streams in the outer disk zone • Co-locate virtual disks that may be accessed simultaneously • Co-locate CPU-intensive tasks with disk-intensive tasks for better power and performance • Avoid co-locating two sequential workloads on a single physical machine – even when they go to separate physical disks! • Read in large blocks only when there is locality in the workload
Questions • Contact: sankaran@gatech.edu