
Virtuoso: Distributed Computing Using Virtual Machines

Peter A. Dinda

Prescience Lab

Department of Computer Science

Northwestern University

http://plab.cs.northwestern.edu

People and Acknowledgements
  • Students
    • Ashish Gupta, Ananth Sundararaj, Dong Lu, Bin Lin, Jason Skicewicz, Billy Davidson, Andrew Weinrich, Jack Lange, Alex Shoykhet
  • Collaborators
    • In-Vigo project at University of Florida
      • Renato Figueiredo, Jose Fortes
      • http://invigo.acis.ufl.edu
  • Funder
    • NSF through several awards
Outline
  • Motivation
  • Virtuoso Model
  • Virtual networking and remote devices
  • Information services
  • Resource measurement and prediction
  • Resource control
  • Related work
  • Conclusions

R. Figueiredo, P. Dinda, J. Fortes, A Case For Grid Computing on Virtual Machines, ICDCS 2003

How do we deliver arbitrary amounts of computational power to ordinary people?

Distributed and Parallel Computing

Interactive Applications

Northwestern (diagram labels)
  • IBM xSeries virtual cluster (64 CPUs), 1 TB RAID
  • Interactivity Environment: cluster and CAVE (~90 CPUs), 8 TB RAID
  • 2 Distributed Optical Testbed (DOT) clusters: IBM xSeries (14-28 CPUs), 1 TB RAID
  • DOT clusters with optical connectivity at Argonne, U. Chicago, IIT, NCSA, and others: IBM xSeries (14-28 CPUs), 1 TB RAID
  • Nortel Optera Metro Edge optical router
  • Connectivity: the Internet and the DOT private optical network

Grid Computing
  • “Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources”
      • I. Foster, C. Kesselman, S. Tuecke, The Anatomy of the Grid: Enabling Scalable Virtual Organizations, International J. Supercomputer Applications, 15(3), 2001
  • Globus, Condor/G, Avaki, EU DataGrid SW, …
Complexity from User’s Perspective
  • Process or job model
    • Lots of complex state: connections, special shared libraries, licenses, file descriptors
  • Operating system specificity
    • Perhaps even version-specific
    • Symbolic supercomputer example
  • Need to buy into some “Grid API”
  • Install and learn complex Grid software
Complexity from Resource Owner’s Perspective
  • Install and learn complex Grid software
  • Deal with local accounts and privileges
    • Associated with global accounts or certificates
  • Protection
  • Support users with different OS, library, license, etc., needs
Virtual Machines
  • Language-oriented VMs
    • Abstract interpreted machine, JIT Compiler, large library
    • Examples: UCSD p-system, Java VM, .NET VM
  • Application-oriented VMs
    • Redirect library calls to appropriate place
    • Examples: Entropia VM
  • Virtual servers
    • Kernel makes it appear that a group of processes are running on a separate instance of the kernel
    • Examples: Ensim, Virtuozzo, SODA, …
  • Virtual machine monitors (VMMs)
    • Raw machine is the abstraction
    • VM represented by a single image
    • Examples: IBM’s VM, VMware, Virtual PC/Server, Plex/86, SIMICS, Hypervisor, DesQView/TaskView, VM/386
Isn’t It Going to Be Too Slow?

Small relative virtualization overhead; compute-intensive

Experimental setup: physical: dual Pentium III 933MHz, 512MB memory, RedHat 7.1, 30GB disk; virtual: VMware Workstation 3.0a, 128MB memory, 2GB virtual disk, RedHat 2.0

NFS-based grid virtual file system between UFL (client) and NWU (server)

Isn’t It Going To Be Too Slow?

Synthetic benchmark: exponential arrivals of compute-bound tasks, with background load provided by playback of traces from PSC

Relative overheads < 10%

Isn’t It Going To Be Too Slow?
  • Virtualized NICs have very similar bandwidth, slightly higher latencies
    • J. Sugerman, G. Venkitachalam, B-H Lim, “Virtualizing I/O Devices on VMware Workstation’s Hosted Virtual Machine Monitor”, USENIX 2001
  • Disk-intensive workloads (kernel build, web service): 30% slowdown
    • S. King, G. Dunlap, P. Chen, “OS support for Virtual Machines”, USENIX 2003
Virtuoso
  • Approach: Lower level of abstraction
    • Raw machines, not processes
  • Mechanism: Virtual machine monitors
  • Our Focus: Middleware support to hide complexity
    • Ordering, instantiation, migration of machines
    • Virtual networking and remote devices
    • Connectivity to remote files, machines
    • Information services
    • Monitoring and prediction
    • Resource control
The Virtuoso Model
  • User orders raw machine(s)
    • Specifies hardware and performance
    • Basic software installation available
      • OS, libraries, licenses, etc.
  • Virtuoso creates raw image and returns reference
    • Image contains disk, memory, configuration, etc.
  • User “powers up” machine
  • Virtuoso chooses provider
    • Information service
  • Virtuoso migrates image to provider
    • Efficient network transfer
      • rsync, demand paging, versioned filesystems
The Virtuoso Model
  • Provider instantiates machine
    • Virtual networking ties machine back to user’s home network
    • Remote device support makes user’s desktop’s devices available on remote VM
    • Remote display support gives user the console of the machine (VNC)
    • Resource control to give user expected performance
  • User goes to his network admin to get address, routing for his new machine
  • User customizes machine
    • Feeds in CDs, floppies, ftp, up2date, etc.
The Virtuoso Model
  • User uses machine
    • Shutdown, hibernate, power-off, throw away
  • Virtuoso continuously monitors and adapts
    • Various mechanisms, all invisible to user
      • Migrating the machine
      • Routing traffic between machines
      • Virtual network topology
      • Predictive scheduling versus reservations
    • Various goals
      • Price
      • Interactivity
    • Information service
    • Resource monitoring and prediction
Why Virtual Networking?
  • A machine is suddenly plugged into your network. What happens?
    • Does it get an IP address?
    • Is it a routable address?
    • Does the firewall let its traffic through?
    • To any port?

How do we make virtual machine hostile environments as friendly as the user’s LAN?

A Layer 2 Virtual Network (VLAN) for the User’s Virtual Machines
  • Why Layer 2?
    • Protocol agnostic
    • Mobility
    • Simple to understand
    • Ubiquity of Ethernet on end-systems
  • What about scaling?
    • Number of VMs limited
    • Hierarchical routing possible because MAC addresses can be assigned hierarchically
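
The point about hierarchical MAC assignment can be illustrated with a minimal sketch. The address layout below (a locally administered prefix followed by site, host, and VM fields) and the names `make_mac`, `site_of`, and `next_hop` are assumptions made for illustration; the slides do not specify how Virtuoso assigns addresses.

```python
# Hypothetical layout: 02:SS:SS:HH:HH:VV (site, host, VM).
# The field widths are illustrative and do not come from the slides.

def make_mac(site, host, vm):
    """Pack (site, host, vm) into a locally administered unicast MAC address."""
    if not (0 <= site < 2**16 and 0 <= host < 2**16 and 0 <= vm < 2**8):
        raise ValueError("field out of range")
    # 0x02 sets the locally-administered bit and leaves the multicast bit clear.
    octets = [0x02, site >> 8, site & 0xFF, host >> 8, host & 0xFF, vm]
    return ":".join(f"{o:02x}" for o in octets)


def site_of(mac):
    """Recover the site field, which is all a remote forwarder needs."""
    o = [int(part, 16) for part in mac.split(":")]
    return (o[1] << 8) | o[2]


def next_hop(mac, local_site, site_links):
    """Route on the site prefix only; per-VM state is kept within a site."""
    site = site_of(mac)
    return "local" if site == local_site else site_links[site]
```

With such a layout, a forwarder needs one table entry per remote site rather than one per remote VM, which is the scaling benefit the slide alludes to.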
A Simple Layer 2 Virtual Network

[Diagram: Client with a physical NIC on the Friendly Local Network; Server with a physical NIC on the Hostile Remote Network, running a VM monitor and the Remote VM with its virtual NIC; SSH between Client and Server]

A Simple Layer 2 Virtual Network

[Diagram: Client with a physical NIC on the Friendly Local Network; Server with a physical NIC on the Hostile Remote Network, running a VM monitor and the Remote VM with its virtual NIC; a Bridged process on each side, joined by an SSH tunnel or an SSL TCP connection]

An Overlay Network
  • Bridged instances and the connections between them form an overlay network for routing traffic among virtual machines and the user’s home network
  • Links can trivially be added or removed
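
As a rough illustration of how frames might move across such an overlay, the sketch below uses a standard learning-bridge table. This is a stand-in technique chosen for the example, not a description of how Bridged is implemented; the class name `LearningBridge` and the link names are hypothetical.

```python
class LearningBridge:
    """Toy learning bridge that forwards Ethernet frames between named links."""

    def __init__(self, links):
        self.links = set(links)  # e.g. tunnels to other bridges plus a local NIC
        self.table = {}          # learned mapping: source MAC -> link it arrived on

    def add_link(self, link):
        self.links.add(link)

    def remove_link(self, link):
        self.links.discard(link)
        # Forget every address that was learned over the removed link.
        self.table = {m: l for m, l in self.table.items() if l != link}

    def forward(self, src_mac, dst_mac, in_link):
        """Return the sorted list of links the frame should be sent out on."""
        self.table[src_mac] = in_link
        out = self.table.get(dst_mac)
        if out is None:
            # Unknown destination: flood to every link except the incoming one.
            return sorted(self.links - {in_link})
        return [] if out == in_link else [out]
```

Because the table is learned from traffic, adding or removing a link only requires updating the link set, which matches the slide's remark that links can be changed trivially.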
Bootstrapping the Virtual Network
  • Star topology always possible
      • TCP session from client must have been possible
  • Better topology may be possible
      • Depends on security at each site
  • Topology may change
      • Virtual machines can migrate
  • Bootstrap to higher layers
      • Virtual filesystems
Remote Devices

[Diagram: Client with a physical CDROM running nbd-server; Server running nbd-client, the VM monitor, and the Remote VM with its virtual CDROM; the two sides joined by an SSH tunnel or an SSL TCP connection]

Linux Network Block Device Driver

/dev/cdrom <-> /dev/nb0 <-> VMware CD Image

Extending a Grid Information Service (GIS) to Support Virtual Machines
  • A GIS contains information about the available resources in a grid
    • Hosts, routers, switches, software, etc.
  • URGIS project at Northwestern
    • GIS based on the relational data model
    • Compositional queries (joins) to find collections of resources.
      • “Find physical machines which can instantiate a virtual machine with 1 GB of memory”
      • “Find sets of four different virtual machines on the same network with a total memory between 512 MB and 1 GB”
    • Nondeterministic query extension for scalability
  • http://www.cs.northwestern.edu/~urgis
Motivation for Non-deterministic Queries

  • Queries for compositions of resources easily expressed in SQL:
  • But such queries can be very expensive to execute
  • However, we typically don’t need the entire result set, just some rows, and not always the same ones
  • And we need them in a bounded amount of time
  • Approach: return random sample of result set

“Find 2 hosts with Linux that together have 3 GB of RAM”

    select h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
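
To make the compositional query concrete, here is a small sketch that runs it against an in-memory SQLite database. The `hosts` columns follow the slide; the sample rows, the function name `find_host_pairs`, and the extra `h1.insertid < h2.insertid` condition (which drops self-pairs and mirrored duplicates) are additions for illustration and are not part of the original query.

```python
import sqlite3


def find_host_pairs(rows, min_total_mb=3072):
    """Return (id1, id2) pairs of Linux hosts whose combined memory suffices."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table hosts (insertid integer primary key, os text, mem_mb integer)"
    )
    db.executemany("insert into hosts values (?, ?, ?)", rows)
    query = """
        select h1.insertid, h2.insertid
        from hosts h1, hosts h2
        where h1.os = 'LINUX' and h2.os = 'LINUX'
          and h1.mem_mb + h2.mem_mb >= ?
          and h1.insertid < h2.insertid  -- added: skip self-pairs and mirrors
        order by h1.insertid, h2.insertid
    """
    return db.execute(query, (min_total_mb,)).fetchall()


# Made-up sample data, not taken from any real grid.
HOSTS = [
    (1, "LINUX", 2048),
    (2, "LINUX", 1024),
    (3, "WINDOWS", 4096),
    (4, "LINUX", 512),
]
```

The self-join is what makes such queries expensive: the number of candidate pairs grows with the square of the table size, and with higher powers for larger compositions.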

Implementing non-deterministic queries

Query as submitted:

    select nondeterministically h1.insertid, h2.insertid
    from hosts h1, hosts h2
    where h1.os='LINUX' and h2.os='LINUX' and h1.mem_mb+h2.mem_mb>=3072
    within 2 seconds

Query Manager and Rewriter: random sample of input tables; probability of inclusion determined by time constraint and server load

Query as rewritten:

    SELECT H1.INSERTID, H2.INSERTID
    FROM HOSTS H1, HOSTS H2, INSERTIDS TEMP_H1, INSERTIDS TEMP_H2
    WHERE (H1.OS='LINUX' AND H2.OS='LINUX' AND H1.MEM_MB+H2.MEM_MB>=3072)
      AND (H1.INSERTID=TEMP_H1.INSERTID AND TEMP_H1.rand > 982663452.975047 AND TEMP_H1.rand <= 1025613125.93505)
      AND (H2.INSERTID=TEMP_H2.INSERTID AND TEMP_H2.rand > 1877769069.94039 AND TEMP_H2.rand <= 1920718742.90039)
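
The rewriting step can be sketched as follows, again using SQLite for convenience. The `insertids` table with a per-row `rand` key mirrors the rewritten query on the slide; everything else is an assumption made for the example, including `RAND_MAX`, the helper names, and the fact that the sampling fraction `p` is passed in directly rather than derived from a time constraint and server load.

```python
import random
import sqlite3

RAND_MAX = 2**31  # assumed range of the per-row random key


def build_db(hosts, seed=0):
    """Create hosts plus an insertids table that gives each row a random key."""
    rng = random.Random(seed)
    db = sqlite3.connect(":memory:")
    db.execute(
        "create table hosts (insertid integer primary key, os text, mem_mb integer)"
    )
    db.execute("create table insertids (insertid integer primary key, rand real)")
    db.executemany("insert into hosts values (?, ?, ?)", hosts)
    db.executemany(
        "insert into insertids values (?, ?)",
        [(h[0], rng.uniform(0, RAND_MAX)) for h in hosts],
    )
    return db


def sample_window(rng, p):
    """Pick a random interval covering a fraction p of the key range."""
    width = p * RAND_MAX
    lo = rng.uniform(0, RAND_MAX - width)
    return lo, lo + width


def nondeterministic_pairs(db, p, rng):
    """Join over a random sample of each input table, not the full tables."""
    lo1, hi1 = sample_window(rng, p)
    lo2, hi2 = sample_window(rng, p)
    query = """
        select h1.insertid, h2.insertid
        from hosts h1, hosts h2, insertids t1, insertids t2
        where h1.os = 'LINUX' and h2.os = 'LINUX'
          and h1.mem_mb + h2.mem_mb >= 3072
          and h1.insertid = t1.insertid and t1.rand > ? and t1.rand <= ?
          and h2.insertid = t2.insertid and t2.rand > ? and t2.rand <= ?
    """
    return db.execute(query, (lo1, hi1, lo2, hi2)).fetchall()
```

Each call draws fresh windows, so repeated queries return different subsets of the full result, and a smaller `p` shrinks the join inputs and therefore the work done.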

P. Dinda, D. Lu, Nondeterministic Queries in a Relational Grid Information Service, SC 2003

D. Lu, P. Dinda, Synthesizing Realistic Computational Grids, SC 2003

D. Lu, P. Dinda, J. Skicewicz, Scoped and Approximate Queries in a Relational Grid Information Service, Grid 2003

Extending a Grid Information Service (GIS) to Support Virtual Machines
  • Virtual indirection
    • Each RGIS object has a unique id
    • Virtualization table associates unique id of virtual resources with unique ids of their constituent physical resources
    • Virtual nature of resource is hidden unless query explicitly requests it
  • Futures
    • An RGIS object that does not exist yet
    • Futures table of unique ids
    • Future nature of resource hidden unless query explicitly requests it
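
One possible reading of virtual indirection and futures is sketched below as a relational schema. The table and column names, the sample rows, and the helper functions are all hypothetical; the slides describe the idea but not the actual RGIS schema. The sketch assumes that "hidden" means virtual and future objects look like ordinary ones in a plain query.

```python
import sqlite3


def build_rgis_sketch():
    """Build a toy schema with a virtualization table and a futures table."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        create table hosts (id integer primary key, name text, mem_mb integer);
        -- maps a virtual resource to the physical resources it is built from
        create table virtualization (virtual_id integer, physical_id integer);
        -- ids of objects that are described but do not exist yet
        create table futures (id integer primary key);
    """)
    db.executemany(
        "insert into hosts values (?, ?, ?)",
        [(1, "phys-a", 4096), (2, "vm-1", 1024), (3, "vm-planned", 2048)],
    )
    db.execute("insert into virtualization values (2, 1)")  # vm-1 runs on phys-a
    db.execute("insert into futures values (3)")            # vm-planned is a future
    return db


def hosts(db):
    """Ordinary query: virtual and future hosts look like any other host."""
    return [r[0] for r in db.execute("select name from hosts order by id")]


def physical_hosts_of(db, name):
    """Explicit query: reveal the physical resources behind a virtual host."""
    rows = db.execute(
        """
        select p.name from hosts v
        join virtualization x on x.virtual_id = v.id
        join hosts p on p.id = x.physical_id
        where v.name = ? order by p.id
        """,
        (name,),
    )
    return [r[0] for r in rows]


def future_hosts(db):
    """Explicit query: list the hosts that are recorded as futures."""
    rows = db.execute(
        "select h.name from hosts h join futures f on f.id = h.id order by h.id"
    )
    return [r[0] for r in rows]
```

Keeping the indirection in separate tables means existing queries over `hosts` keep working unchanged, while queries that care about the distinction join against the extra tables.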
Extending a Resource Monitoring and Prediction System to Support Virtual Machines
  • Measuring and predicting dynamic resource availability to support adaptation
    • Virtual machine migration
    • Routing on the virtual network
    • Application-level adaptation
  • RPS System at Northwestern
    • Host and network measurements for Unix and Windows
    • Emphasis on prediction (wide range of linear and nonlinear models) and communication (wide range of transports)
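
As a minimal example of the kind of linear model such a system might apply to a resource signal, the sketch below fits a first-order autoregressive model by least squares and makes a one-step-ahead prediction. It is written in plain Python for illustration and does not use or reproduce the RPS API (RPS itself is a C++ toolkit); the function names are hypothetical.

```python
def fit_ar1(series):
    """Least-squares fit of x[t] - mean = phi * (x[t-1] - mean)."""
    mean = sum(series) / len(series)
    centered = [x - mean for x in series]
    num = sum(a * b for a, b in zip(centered[1:], centered[:-1]))
    den = sum(a * a for a in centered[:-1])
    phi = num / den if den else 0.0  # constant signal: fall back to the mean
    return mean, phi


def predict_next(series, mean, phi):
    """One-step-ahead prediction from the most recent measurement."""
    return mean + phi * (series[-1] - mean)
```

A practical predictor would use a higher model order and report an error estimate alongside each prediction; the first-order case is shown only because it fits in a few lines.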

P. Dinda, Online Prediction of the Running Time of Tasks, Journal of Cluster Computing, 2002

J. Skicewicz, P. Dinda, J. Schopf, Multiresolution Resource Behavior Queries using Wavelets, HPDC 2001

P. Dinda, A Prediction-based Real-time Scheduling Advisor, IPDPS 2002

RPS Toolkit
  • Extensible toolkit for implementing resource signal prediction systems [CMU-CS-99-138]
      • Growing: RTA, RTSA, Wavelets, GUI, etc
  • Easy “buy-in” for users
      • C++ and sockets (no threads)
      • Prebuilt prediction components
      • Libraries (sensors, time series, communication)
  • http://www.cs.northwestern.edu/~RPS
Example: Multiscale Network Prediction
  • Large, recent study of predictability
  • Hundreds of NLANR and other traces
    • Mostly WANs
  • Different resolutions
    • Binning and low-pass via wavelets
  • Sweet Spot
    • Predictability often maximized at particular resolution
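
The binning approach to changing resolution can be sketched as below. The error measure (mean squared error of a last-value predictor divided by the signal variance) is a simple stand-in chosen for this example, not the set of predictors used in the study, and the function names are hypothetical.

```python
def bin_trace(samples, width):
    """Average consecutive groups of `width` samples to get a coarser signal."""
    n = len(samples) // width  # a trailing partial bin is dropped
    return [sum(samples[i * width:(i + 1) * width]) / width for i in range(n)]


def error_ratio(signal):
    """MSE of a last-value predictor over signal variance (lower is better)."""
    mean = sum(signal) / len(signal)
    var = sum((x - mean) ** 2 for x in signal) / len(signal)
    mse = sum((b - a) ** 2 for a, b in zip(signal, signal[1:])) / (len(signal) - 1)
    return mse / var if var else 0.0
```

Computing `error_ratio(bin_trace(trace, w))` for a range of widths `w` gives one curve per trace; a "sweet spot" in the slide's sense would show up as a minimum of that curve at an intermediate width.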

Y. Qiao, J. Skicewicz, P. Dinda, Multiscale Predictability of Network Traffic, NWU-CS-02-13

Extending a Resource Prediction System to Support Virtual Machines
  • Goal: monitor physical machine and infer behavior inside of virtual machine
  • Current approach: relate /proc measurements on the physical machine to the slowdown of resource rates in the virtual machine
    • ARX models
    • Causality problem
Resource Control
  • Owner has an interest in controlling how much and when compute time is given to a virtual machine
  • Our approach: A language for expressing these constraints, and compilation to real-time schedules, proportional share, etc.
  • Very early stages. Trying to avoid kernel modifications.
How to Control: User Irritation Project
  • Measure interactive user tolerance to resource stealing
  • Conversely, what service must be provided to interactive users?
  • “Irritation@Home”
Related Work
  • Collective / Capsule Computing (Stanford)
    • VMM, Migration/caching, Hierarchical image files
  • Denali (U. Washington)
    • Highly scalable VMMs (1000s of VMs per node)
  • CoVirt (U. Michigan)
  • Xenoserver (Cambridge)
  • SODA (Purdue)
    • Virtual Server, fast deployment of services
  • Internet Suspend/Resume (Intel Labs Pittsburgh)
  • Ensim
    • Virtual Server, widely used for web site hosting
    • WFQ-based resource control released into open-source Linux kernel
  • Virtuozzo (SWsoft)
    • Ensim competitor
  • Available VMMs: IBM’s VM, VMware, Virtual PC/Server, Plex/86, SIMICS, Hypervisor, DesQView/TaskView, VM/386
Current Status (At Northwestern)
  • Bridged components done
    • Mechanism for virtual networking
    • No policy yet
  • Very preliminary system for acquiring and instantiating VMs done
  • RGIS schema extensions done
  • Work In Progress
    • Remote devices (management)
    • Virtual networking (policy + adaptation)
    • VM Monitoring using RPS
    • User Irritation
For More Information
  • Prescience Lab (Northwestern University)
    • http://plab.cs.northwestern.edu
  • ACIS (University of Florida)
    • http://acis.ufl.edu

Nondeterministic query performance

Meaningful tradeoff between query processing time and result set size is possible

Select two hosts that together have >3GB of RAM

500,000 host grid generated by GridG

Memory distribution according to Smith study of MDS contents

Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2, Oracle 9i Enterprise

Average of five trials

Nondeterministic query performance

Can use tradeoff to control query time independent of query complexity

Select n hosts that together have >3GB of RAM

500,000 host grid generated by GridG

Memory distribution according to Smith study of MDS contents

Dual Xeon 1 GHz, 2 GB, 240 GB RAID, RGIS2, Oracle 9i Enterprise

Average of five trials