Exploiting Spatial Locality to Improve Disk Efficiency in Virtualized Environments Xiao Ling1, Shadi Ibrahim2, Hai Jin1, Song Wu1, Songqiao Tao1 1Cluster and Grid Computing Lab, Services Computing Technology and System Lab, School of Computer Science and Technology, Huazhong University of Science and Technology 2INRIA Rennes - Bretagne Atlantique, Rennes, France
Disk efficiency in virtualized environments • VMs with multiple OSs and applications run on a physical server • Disk I/O utilization impacts the I/O performance of applications running on VMs • Disk efficiency depends on how well spatial locality is exploited • Disk scheduling exploits spatial locality to reduce disk seek and rotational overheads • But achieving high spatial locality is a challenging task in a virtualized environment
Why is it difficult? • Complicated I/O behavior of VMs • More than one process runs on a VM (e.g. virtual desktop, data-intensive applications) -- mixed applications • Transparency of virtualization • lacks a global view of the I/O access patterns of processes in the virtualized environment [Figure: processes (streaming app, file editing; Process A-D) run above the block layer inside guest OSs; the hypervisor software multiplexes them onto a shared disk]
Shoulders of Giants Studies on improving the I/O performance of applications precede ours • Invasive-mode scheduling • Selects the disk scheduler pair within both the hypervisor and the VMs according to the access patterns of applications [ICPP'11, SIGOPS Oper. Syst. Rev. '10] • Introduces additional hypervisor-to-VM interference • Non-invasive-mode scheduling • Stream scheduling [FAST'11], Antfarm [USENIX ATC'06] • Assumes all VMs run similar read applications • VMs grab bandwidth from one another • Analysis of the data accesses of VMs • Assumes only a specific (single) application is running within a VM
What do we solve? • We consider mixed applications and the transparency feature of virtualization • We explore the benefit of the spatial locality and regularity of data accesses • Key question: how can disk scheduling exploit spatial locality to maximize disk efficiency while preserving the transparency of virtualization?
Outline • Problem Description • Related Work • Observe Disk Access Patterns of VMs • Prediction Model • Design of Pregather • Performance Evaluation • Conclusions and Future Work
Difference of Data Access: Virtualized vs. Traditional Environment • In a virtualized environment, VMs simultaneously access different parts of data blocks within the range of their VM image spaces
Experiment settings • Physical server • four quad-core 2.40GHz Xeon processors • 22GB of memory and one dedicated 1TB SATA disk • Xen 4.0.1 with kernel 2.6.18, Ext3 file system • Configuration of VMs • RHEL5 with kernel 2.6.18, Ext3 file system, 1GB memory, 2 VCPUs, 12GB virtual disk • Default Noop scheduler • Workloads • Sysbench file I/O: sequential read/write, random read/write
Access Patterns of VMs Our observations: • Regions across VMs • formed by requests from the same VM • Sub-regions within a VM • exhibit different ranges and frequencies of access
Access Patterns of VMs [Figure: block-access traces showing regional spatial locality, sub-regional spatial locality, and sub-regions without spatial locality]
Observations • Special spatial locality • Regional spatial locality -> bounded by the VM image • Sub-regional spatial locality -> shaped by the access patterns of applications • Ignoring this spatial locality • causes disk head seeks among VMs • increases disk head seeks among sub-regions (e.g. CFQ, AS) • Our goal • take advantage of this special spatial locality to improve physical disk efficiency in the virtualized environment
How to exploit this spatial locality • Batch-process requests with special spatial locality using an adaptive non-work-conserving mode • The regularity of regional spatial locality is easy to capture • The regularity of sub-regional spatial locality is hard to perceive due to the transparency of virtualization • What is the distribution of sub-regions with spatial locality? What is the access interval of these sub-regions? -> Prediction Model
Outline • Problem Description • Related Work • Zoom Disk Access Patterns of VMs • Prediction Model • Design of Pregather • Performance Evaluation • Conclusions and Future Work
Prediction Model • Challenges • the distribution of sub-regions with spatial locality changes over time and with the access patterns of applications • interference from background processes running on a VM • different sub-regions may have different access regularity • Approach: analyze historical data accesses within a VM image to predict sub-regional spatial locality
Prediction Model - vNavigator • Quantization of Access Frequency • weights the contributions of historical requests for prediction • Temporal access-density of a zone (see the sketch below)
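A minimal Python sketch of how the temporal access-density of a zone could be maintained: the VM image space is split into fixed-size zones and the contribution of older requests is aged so that recent accesses dominate the prediction. The zone size is taken from the evaluation settings later in the talk; the decay factor and the window-ageing scheme are illustrative assumptions, not the model's exact formulas.

```python
from collections import defaultdict

ZONE_SIZE = 2000   # blocks per zone (value from the evaluation settings)
DECAY = 0.5        # ageing factor for older prediction windows (assumed)

class ZoneStats:
    def __init__(self):
        # zone id -> temporal access-density
        self.density = defaultdict(float)

    def record(self, lba):
        """Account a completed request's LBA to its zone."""
        self.density[lba // ZONE_SIZE] += 1.0

    def end_window(self):
        """Age every zone at the end of a prediction window so that
        historical requests contribute less than recent ones."""
        for zone in self.density:
            self.density[zone] *= DECAY
```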
Prediction Model - vNavigator • Exploring Sub-regional Spatial Locality • temporal access-density threshold of a VM • clustering zones whose density exceeds the threshold
Prediction Model - vNavigator • Access Regularity of Sub-regional Spatial Locality • the range of a sub-region unit • the future access interval of the sub-region unit is the average of its past access intervals (see the sketch below)
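A sketch, continuing the code above, of how per-zone densities could be turned into sub-region units and an expected access interval: zones whose density exceeds a VM-level threshold are clustered into contiguous units, and a unit's future access interval is predicted as the mean gap between its past accesses. The threshold rule (λ times the mean density) and the plain averaging are assumptions made for illustration.

```python
def find_subregions(density, lam=2.0):
    """Cluster contiguous zones whose density exceeds the VM threshold
    into sub-region units, returned as (first_zone, last_zone) pairs."""
    if not density:
        return []
    threshold = lam * sum(density.values()) / len(density)
    hot = sorted(z for z, d in density.items() if d >= threshold)
    units, start, prev = [], None, None
    for z in hot:
        if start is None:
            start = prev = z
        elif z == prev + 1:
            prev = z
        else:
            units.append((start, prev))
            start = prev = z
    if start is not None:
        units.append((start, prev))
    return units

def expected_interval(timestamps):
    """Predict the future access interval of a unit as the average gap
    between its past accesses (None if there is no history yet)."""
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return sum(gaps) / len(gaps) if gaps else None
```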
Design of Pregather • An adaptive non-work-conserving disk scheduler in the hypervisor • decides whether to dispatch a pending request without starving other requests • decides how long to wait for a future request with spatial locality • A spatial-locality-aware heuristic algorithm • combines the regional spatial locality across VMs with the prediction of sub-regional spatial locality from the vNavigator model • guides Pregather's decision: wait only when the waiting time is less than the seek time
The SPLA Algorithm • Set the timer according to the position of the disk head • Coarse waiting time for regional spatial locality: if AvgD(VMx) < D(neighbor VM, LBA of the completed request) and there is no pending request from the currently served VMx, set CoarseTimer = AvgT(VMx) • Fine waiting time for sub-regional spatial locality: if there is a pending request from the currently served VMx and an SR(Ui) exists that includes the LBA of the completed request, set FineTimer = ST(Ui)
The SPLA Algorithm • Dispatch a request or continue to wait, based on Seektime(closest pending request, completed request) • Within the coarse waiting time: dispatch the request and turn off the timer if the request is from VMx or Seektime < AvgT(VMx) • Within the fine waiting time: dispatch the request and turn off the timer if the LBA of the request is in SR(Ui) or Seektime < ST(Ui) • Otherwise keep waiting until the timer expires, the deadline of a pending request is reached, or a suitable new request arrives (see the sketch below)
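A sketch of the dispatch-or-wait decision above, written in Python for readability (the actual scheduler lives in the hypervisor's block layer). The data structures and the linear seek-cost estimate are illustrative assumptions; the conditions mirror the slide: a coarse wait is broken by a request from VMx or one whose seek time is below AvgT(VMx), a fine wait is broken by a request inside SR(Ui) or one whose seek time is below ST(Ui), and no pending request is ever starved past its deadline.

```python
from dataclasses import dataclass

@dataclass
class Request:
    vm: int
    lba: int
    deadline: float

@dataclass
class Timer:
    kind: str            # "coarse" (regional) or "fine" (sub-regional)
    vm: int              # currently served VMx
    wait: float          # AvgT(VMx) for coarse, ST(Ui) for fine
    unit: tuple = None   # (first_lba, last_lba) of SR(Ui), fine timers only

def seek_cost(lba_a, lba_b, cost_per_block=1e-6):
    # crude linear seek-time estimate (assumption for the sketch)
    return abs(lba_a - lba_b) * cost_per_block

def should_dispatch(req, completed_lba, timer, now):
    """Return True if the request should be dispatched immediately,
    cancelling the timer; False if the scheduler keeps waiting."""
    if req.deadline <= now:                    # never starve a request
        return True
    seek = seek_cost(req.lba, completed_lba)
    if timer.kind == "coarse":                 # waiting on regional locality
        return req.vm == timer.vm or seek < timer.wait
    lo, hi = timer.unit                        # waiting on sub-regional locality
    return lo <= req.lba <= hi or seek < timer.wait
```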
Implementation of Pregather • Implemented in a Xen-hosted platform • Pregather allocates each VM an equal serving time slice and serves VMs in a round-robin fashion (see the sketch below)
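A sketch of the per-VM round-robin service loop with equal time slices. The 200ms slice is the value reported in the evaluation settings; the loop structure and the dispatch placeholder are assumptions for illustration.

```python
import time
from collections import deque

TIME_SLICE = 0.2   # seconds per VM (200ms, from the evaluation settings)

def dispatch(request):
    """Placeholder: hand the request to the underlying physical disk."""
    pass

def serve(vm_queues):
    """Serve VMs round-robin, each for at most one time slice."""
    order = deque(vm_queues.keys())
    while any(vm_queues.values()):
        vm = order[0]
        order.rotate(-1)
        start = time.monotonic()
        queue = vm_queues[vm]
        while queue and time.monotonic() - start < TIME_SLICE:
            dispatch(queue.popleft())
```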
Outline • Problem Description • Related Work • Zoom Disk Access Patterns of VMs • Prediction Model • Design of Pregather • Performance Evaluation • Conclusions and Future Work
Performance Evaluation • Goals of the experiments • verify the vNavigator model • measure the overall performance of Pregather for multiple VMs • evaluate the memory overhead • Parameter settings • zone size: 2000; prediction window: 20ms; λ: 2 • time slice: 200ms • Benchmarks • Sysbench file I/O, Hadoop, TPC-H
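For reference, the evaluation parameters above gathered into one (hypothetical) configuration structure:

```python
PREGATHER_CONFIG = {
    "zone_size_blocks": 2000,     # size of a zone
    "prediction_window_ms": 20,   # window for updating access-density
    "lambda": 2,                  # threshold scaling factor
    "time_slice_ms": 200,         # per-VM round-robin serving slice
}
```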
Verification of the vNavigator Model • The ratio of successful waiting • A VM with sequential applications has clear sub-regional locality (e.g. a success ratio of 90.3%) • A VM with only random applications has weak sub-regional locality (e.g. a success ratio of 80.4%) [Figure annotations: 10%, 33%, 31%, 38%, 22%]
Pregather for Multiple VMs • VMs with Different Access Patterns [Figure: results for VMs with different access patterns, annotated with 1.6x and 2.6x improvements]
Pregather for Multiple VMs • Disk I/O efficiency for data-intensive applications [Figure annotations: ↑26% vs. CFQ, ↑28% vs. AS, ↑38% vs. Deadline; at zero: Pregather 65%, CFQ 53%, AS 36%; ↓18%, ↓20%]
Pregather for Multiple VMs • Disk I/O efficiency for data-intensive applications mixed with other applications • Compared with CFQ: Q2 ↓10%, Q19 ↓8%, Sort ↓12%; Pregather: 63%
Pregather for Multiple VMs • Memory Overhead: 916KB
Conclusion and Future Work • Contributions • Observed regional spatial locality and sub-regional spatial locality in virtualized environments • Proposed an intelligent prediction model (vNavigator) to predict the regularity of sub-regional spatial locality • Designed Pregather, a disk scheduler with a spatial-locality-aware heuristic algorithm in the hypervisor, which improves disk I/O efficiency without any prior knowledge of applications • Future work • Extend Pregather to enable intelligent allocation of physical blocks • Provide QoS guarantees for VMs