VMTorrent : Scalable P2P Virtual Machine Streaming - PowerPoint PPT Presentation

Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein
Presentation Transcript

  1. VMTorrent: Scalable P2P Virtual Machine Streaming Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein

  2. VM Basics
  • VM: software implementation of a computer
  • Implementation stored in a VM image
  • VM runs on a VMM
    • Virtualizes HW
    • Accesses the image
  [Diagram: VM, VM image, VMM]

  3. Where is the Image Stored?
  [Diagram: VM, VM image, VMM]

  4. Traditionally: Local Storage
  [Diagram: VM and VMM with the image on local storage]

  5. IaaS Cloud: on Network Storage
  [Diagram: VM and VMM accessing a VM image on network storage]

  6. Can Be Primary
  • VMM accesses the VM image over NFS/iSCSI
  • e.g., OpenStack Glance
  • Amazon EC2/S3
  • vSphere network storage
  [Diagram: VM image on network storage]

  7. Or Secondary
  • e.g., Amazon EC2/EBS
  • vSphere local storage
  [Diagram: VM image on local storage]

  8. Either Way, No Problem Here
  [Diagram: a single VM/VMM fetching its image from network storage]

  9. Here?
  [Diagram: many hosts fetching VM images from network storage: bottleneck!]

  10. Lots of Unique VM Images
  • On EC2 alone: 54,784 unique images*
  * http://thecloudmarket.com/stats#/totals , 06 Dec 2012

  11. Unpredictable Demand
  • Lots of customers
  • Spot-pricing
  • Cloud-bursting

  12. Don’t Just Take My Word
  • “The challenge for IT teams will be finding way to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
  • “scale limits are due to simultaneous loading rather than total number of nodes” 2
  • Developer proposals to replace or supplement VM launch architecture for greater scalability 3
  1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539
  2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129
  3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images

  13. Challenge: VM Launch in IaaS
  • Minimize delay in VM execution
    • Starting from the time the launch request arrives
  • For lots of instances (scale!)

  14. Naive Scaling Approaches
  • Multicast
    • Setup, configuration, maintenance, etc. 1
    • ACK implosion
    • “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable” 2
  1. [El-Sayed et al., 2003; Hosseini et al., 2007]
  2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication

  15. Naive Scaling Approaches
  • P2P bulk data download (e.g., BitTorrent)
    • Files are big (wastes bandwidth)
    • Must wait until the whole file is available (wastes time)
    • Network primary? Must store a multi-GB image in RAM!

  16. Both Miss a Big Opportunity
  • VM image access is
    • Sparse
    • Gradual
  • Most of the image doesn’t need to be transferred
  • Can start w/ just a couple of blocks
  → VM image streaming

  17. VMTorrent Contributions
  • Architecture
    • Make (scalable) streaming possible: decouple data delivery from presentation
    • Make scalable streaming effective: profile-based image streaming techniques
  • Understanding / Validation
    • Modeling for VM image streaming
    • Prototype & evaluation (not highly optimized)

  18. Talk
  • Make (scalable) streaming possible: decouple data delivery from presentation
  • Make scalable streaming effective: profile-based image streaming techniques
  • VMTorrent prototype & evaluation (modeling along the way)

  19. Decoupling Data Delivery from Presentation (Making Streaming Possible)

  20. Generic Virtualization Architecture
  • Virtual Machine Monitor virtualizes hardware
  • Conducts I/O to the image through the file system
  [Diagram: VM, VMM, FS, and VM image on host hardware]

  21. Cloud Virtualization Architecture
  • Network backend used
    • Either to download the image
    • Or to access it via a remote FS
  [Diagram: network backend between the FS and the VM image]

  22. VMTorrent Virtualization Architecture
  • Introduce a custom file system
    • Divides the image into pieces
    • But provides the appearance of a complete image to the VMM
  [Diagram: custom FS and network backend beneath the VMM]

  23. Decoupling Delivery from Presentation
  • VMM attempts to read piece 1
  • Piece 1 is present, read completes
  [Diagram: custom FS presenting pieces 0-8; piece 1 local]

  24. Decoupling Delivery from Presentation
  • VMM attempts to read piece 0
  • Piece 0 isn’t local, read stalls
  • VMM waits for I/O to complete
  • VM stalls

  25. Decoupling Delivery from Presentation
  • FS requests the piece from the backend
  • Backend requests it from the network

  26. Decoupling Delivery from Presentation
  • Later, the network delivers piece 0
  • Custom FS receives it, updates the piece
  • Read completes
  • VMM resumes the VM’s execution
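The stall-and-resume behavior of slides 23-26 can be sketched as follows. This is a minimal illustration, not the actual VMTorrent FUSE code; all names (`CustomFS`, `deliver`, `read_piece`) are hypothetical. Reads of locally present pieces return immediately, while a read of a missing piece blocks until the network backend delivers it.

```python
# Sketch: decoupling data delivery (deliver) from presentation (read_piece).
import threading

class CustomFS:
    def __init__(self, num_pieces):
        self.pieces = [None] * num_pieces                      # piece data; None = not yet local
        self.present = [threading.Event() for _ in range(num_pieces)]

    def deliver(self, idx, data):
        """Called by the network backend when piece `idx` arrives."""
        self.pieces[idx] = data
        self.present[idx].set()                                # wake any stalled reads

    def read_piece(self, idx):
        """Called on behalf of the VMM; stalls until the piece is local."""
        self.present[idx].wait()                               # VM stalls here if piece missing
        return self.pieces[idx]

fs = CustomFS(num_pieces=9)
fs.deliver(1, b"piece-1")
print(fs.read_piece(1))                # piece 1 present: read completes immediately

# Simulate the backend delivering piece 0 a little later.
threading.Timer(0.1, fs.deliver, args=(0, b"piece-0")).start()
print(fs.read_piece(0))                # stalls until delivery, then completes
```

The key point mirrors the slides: the VMM only ever sees a complete-looking image; whether a read completes instantly or stalls depends solely on whether the piece has been delivered yet.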

  27. Decoupling Improves Performance (Primary Storage)
  • No waiting for the image download to complete

  28. Decoupling Improves Performance (Secondary Storage)
  • No more writes or re-reads over the network w/ a remote FS

  29. But Doesn’t Scale
  • Assuming a single server, the time to download a single piece is
    t = W + S / (r_net / n)
  • W: wait time for first bit
  • S: piece size
  • r_net: network speed
  • n: # of clients
  • Transfer time: each client gets r_net / n of the server BW

  30. Read Time Grows Linearly w/ n
  • Assuming a single server, the time to download a single piece is
    t = W + n * S / r_net
  • Transfer time is linear in n

  31. This Scenario (csd)
  [Diagram: custom FS with a client-server network backend]

  32. Decoupling Enables a P2P Backend
  • Alleviates the network storage bottleneck
  • Exchange pieces w/ the swarm
  • P2P copy must remain pristine
  [Diagram: P2P manager between the custom FS and the swarm]

  33. Space Efficient
  • FS uses pointers into the P2P image
  • FS does copy-on-write
  [Diagram: custom FS pieces pointing into the P2P manager’s pristine copy]
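The space-efficient design of slide 33 can be sketched as below. This is a simplified illustration with hypothetical names (`CowImage`, `read`, `write`): the FS view references the pristine P2P pieces directly, and a write copies the piece into private storage first, so the shared copy stays unmodified and remains exchangeable with the swarm.

```python
# Sketch of copy-on-write over a pristine, swarm-shared piece array.
class CowImage:
    def __init__(self, p2p_pieces):
        self.p2p = p2p_pieces        # pristine pieces, shared with the swarm
        self.private = {}            # idx -> locally modified copy

    def read(self, idx):
        # Prefer the private copy if the VM has written this piece.
        return self.private.get(idx, self.p2p[idx])

    def write(self, idx, data):
        # Copy-on-write: never touch the pristine P2P piece.
        self.private[idx] = data

p2p = [b"a", b"b", b"c"]
img = CowImage(p2p)
img.write(1, b"B")
print(img.read(1), p2p[1])   # b'B' b'b' -- the P2P copy is unchanged
```

Unmodified pieces cost no extra space beyond the pointer, and the swarm can keep serving the pristine pieces even after the VM writes to its view.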

  34. Minimizing Stall Time
  • Non-local piece accesses trigger high-priority requests
  [Diagram: demand request for piece 4 forwarded to the swarm]

  35. P2P Helps
  • Now, the time to download a single piece is
    t = W(d) + S / r_net
  • W(d): wait time for first bit as a function of
    • d: piece diversity
  • S: piece size
  • r_net: network speed
  • n: # of peers
  • Transfer time independent of n
  • Wait is a function of diversity

  36. High Diversity → Swarm Efficiency

  37. Low Diversity → Little Benefit
  • Nothing to share

  38. P2P Helps, But Not Enough
  • All peers request the same pieces at the same time: t = W(d) + S / r_net
  • Low piece diversity
  • Long wait (gets worse as n grows)
  • Long download times

  39. This Scenario (p2pd)
  [Diagram: custom FS with a P2P backend and swarm]

  40. Profile-based Image Streaming Techniques (Making Streaming Effective)

  41. How to Increase Diversity?
  • Need to fetch pieces that are
    • Rare: not yet demanded by many peers
    • Useful: likely to be used by some peer

  42. Profiling
  • Need useful pieces
  • But only a small % of the VM image is accessed
  • We need to know which pieces are accessed
  • Also, when (needed later for piece selection)

  43. Build Profile
  • One profile for each VM/workload
    • Run one or more times (even online)
  • Use the FS to track
    • Which pieces are accessed
    • When pieces are accessed
  • Entries w/ average appearance time, piece index, and frequency
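Profile construction as described on slide 43 can be sketched like this. The trace and entry formats are assumptions for illustration: each recorded run is a list of (first-access time, piece index) pairs, and the profile aggregates them into per-piece entries with average appearance time and access frequency, ordered by time.

```python
# Sketch: aggregate FS access traces into a profile of
# (piece index, average appearance time, frequency) entries.
from collections import defaultdict

def build_profile(runs):
    times = defaultdict(list)
    for run in runs:
        for t, idx in run:
            times[idx].append(t)
    profile = [
        {"piece": idx,
         "avg_time": sum(ts) / len(ts),          # average appearance time
         "freq": len(ts) / len(runs)}            # fraction of runs that touched it
        for idx, ts in times.items()
    ]
    return sorted(profile, key=lambda e: e["avg_time"])

run1 = [(0.0, 0), (0.5, 3), (2.0, 7)]   # hypothetical traces
run2 = [(0.1, 0), (0.7, 3)]
for entry in build_profile([run1, run2]):
    print(entry)
```

Sorting by average appearance time is what lets the piece-selection step treat the profile as a predicted playback order.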

  44. Piece Selection
  • Want pieces not yet demanded by many
  • Don’t know the piece distribution in the swarm
    • Guess others are like self
  • Profile gives an estimate of when pieces are likely needed

  45. Piece Selection Heuristic
  • Randomly (rarest first) pick one of the first k pieces in the predicted playback window
  • Fetch w/ medium priority (demand wins)
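The heuristic of slide 45 can be sketched as follows; the function and parameter names are assumptions, and the returned index is what would be requested at medium priority (a demand fetch for a stalled read still wins). From the profile entries whose predicted access time falls at or after the current playback position, take the first k pieces not yet held and pick one uniformly at random.

```python
# Sketch: window-randomized prefetch selection over a time-sorted profile.
import random

def select_piece(profile, have, now, k=4):
    """Pick a prefetch candidate from the predicted playback window.

    profile: entries sorted by avg_time, as built from access traces
    have:    set of piece indices already local
    now:     current predicted playback position (seconds)
    k:       window width to randomize over
    """
    window = [e["piece"] for e in profile
              if e["avg_time"] >= now and e["piece"] not in have]
    candidates = window[:k]
    return random.choice(candidates) if candidates else None

profile = [{"piece": i, "avg_time": float(i)} for i in range(9)]
have = {2, 3}
print(select_piece(profile, have, now=1.0))   # one of pieces 1, 4, 5, or 6
```

Randomizing within the window is what keeps peers that launched at the same time from all requesting the same next piece, which is the diversity the swarm needs.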

  46. Profile-based Prefetching
  • Increases diversity
  • Helps even w/ no peers (when the ideal access rate exceeds the network rate)

  47. Obtain Full P2P Benefit
  • Profile-based window-randomized prefetch: t = W(d) + S / r_net
  • High piece diversity
  • Short wait (shouldn’t grow much w/ n)
  • Quick piece download

  48. Full VMTorrent Architecture (p2pp)
  [Diagram: custom FS, profile-driven P2P manager, and swarm]

  49. Prototype

  50. VMTorrent Prototype
  • P2P manager: custom C++ & libtorrent
  • Custom FS: custom C using FUSE
  [Diagram: BT swarm, custom FS, P2P manager, profile]