
The VM deployment process has 3 major steps:

Quality of Life in the Grids: VMs meet Bioinformatics Applications
Daniel Galron [1], Tim Freeman [2], Kate Keahey [3], Stephanie Gato [4], Natalia Maltsev [5], Alex Rodriguez [6], Mike Wilde [7]





Presentation Transcript


  1. Quality of Life in the Grids: VMs meet Bioinformatics Applications

Daniel Galron [1], Tim Freeman [2], Kate Keahey [3], Stephanie Gato [4], Natalia Maltsev [5], Alex Rodriguez [6], Mike Wilde [7]

[1] The Ohio State University. galron@cis.ohio-state.edu
[2] Argonne National Laboratory. tfreeman@mcs.anl.gov
[3] Argonne National Laboratory. keahey@mcs.anl.gov
[4] Indiana University. sgato@cs.indiana.edu
[5] Argonne National Laboratory. maltsev@mcs.anl.gov
[6] Argonne National Laboratory. arodri7@mcs.anl.gov
[7] Argonne National Laboratory. wilde@mcs.anl.gov

A Glossary of Terms:
• VMM (Virtual Machine Monitor): a third-party tool providing the interface between a virtual machine and the host machine. Examples of VMMs are VMware and Xen.
• VMManager: Grid service interface that allows a remote client to interact with the VMM.
• VMRepository: Grid service that catalogues the VM images of a VO and stores them for retrieval and deployment.
• Authorization Service: Grid service that the VMManager and VMRepository services call to check whether a user is authorized to perform the requested operation.

The promise of VMs
Using VMs has many benefits for scientists running complex applications:
• Broader resource base: a virtual machine can be pre-configured with a required OS, library signature and application installation, and then deployed on many different nodes independently of each node's configuration.
• Simplified deployment/distribution: VMs can be used as distribution packages; to duplicate an installation, just copy a VM image.
• Easy migration capability: an executing VM image can be "frozen," transferred to another resource and restarted within milliseconds.
• Fine-grained resource management: one can confine resource usage within most VM implementations.
• Enhanced security: VMs provide outstanding isolation, protecting the resource from the user and isolating users from each other.

Current limitations to using Grids:
• Complex applications require customized software configurations; such environments may not be widely available on Grid nodes.
• Installing scientific applications by hand can be arduous, lengthy and error-prone; the ability to amortize this process over many installations would help.
• Providing good isolation of Grid computations is a key security requirement; the currently used mechanism of Unix accounts is not sufficient.
• Providing a vehicle for fine-grained resource usage enforcement is critical for more efficient use of Grid resources, yet such technology is not widely available.
• The ability to migrate or restart applications would be of enormous value in a Grid environment, yet the current Grid frameworks do not support it.

Virtual Machines meet the Grids
In a nutshell
Instead of running Grid software within VMs, we integrated VM deployment into the Grid infrastructure: mapping a client credential to a Unix account was replaced by deploying a VM and starting the client's environment within it. We implemented the architecture using Globus Toolkit 3.2, an open-source Grid middleware toolkit that provides a framework for resource, data, and security management.

Performance Implications
The performance of applications running on a VM depends on the third-party VMM and on the applications themselves. A purely CPU-bound program will have almost no performance degradation, as all instructions will be executed directly on hardware. Typically, virtual machines intercept privileged instructions (such as I/O), resulting in a performance hit for those instructions, although new methods, such as those implemented by Xen, reduce this cost. In our implementation, we experimented with VMware Workstation and Xen; in our experience the slowdown was never more than 30% and was often less than 5%. (The Xen slowdown was much less than 30%.)

Describing VM Properties
A VM constitutes a virtual workspace configured to meet the requirements of Grid computations. We use an XML Schema to describe various aspects of such a workspace, including virtual hardware (RAM size, disk size, virtual CD-ROM drives, serial ports, parallel ports), installed software including the operating system (e.g. kernel version, distribution type) and library signature, and other properties such as image name and VM owner. Based on those descriptions, VMs can be selected, duplicated, or further configured.

VM Deployment
The VM deployment process has 3 major steps:
• The client queries the VMRepository, sending a list of criteria describing a workspace. The repository returns a list of VM descriptors that match them.
• The client contacts the VMManager, sending it the descriptor of the VM they want to deploy, along with an identifier and a lifetime for the VM. The VMManager authorizes the request using an access control list.
• The VM instance is registered with the VMManager and the VM is copied from the VMRepository. The VMManager then interfaces with the VMM on the resource to power on the VM.

The low-level features of our architecture are detailed in the diagram to the right. The diagram shows four nodes, each running a (potentially different) host OS. Each node runs a VMM and a VMManager Grid service. The actual VMs run on top of that layer; they have Grid software installed, which allows them to be run as Grid nodes. The VMs could also be used as independent execution environments without Grid middleware installed on them, in which case they would run applications directly.

The graph to the right shows the proportion of time taken by the constituents of the deployment process, measured in seconds. The authorization time is not included, but it is comparable to registration time. Overall deployment time is dominated by the image copy, which depends on network latency and bandwidth.

Migration
Integrating virtual machines with Grid technology allows easy migration of applications from one node to another. The steps are as follows:
• Using Grid software, the client freezes execution of the VM.
• The client then sends the "migrate" command to the VMManager, specifying the new host node as a parameter.
• After checking for the proper authorization, the VM is registered with the new host and a GridFTP call transfers the image.
In terms of performance this is on a par with deployment: it is mainly bound by the duration of the image transfer. In our tests, we migrated a 2 GB VM image between two identical nodes over a Fast Ethernet connection.

The graph to the right shows the proportion of time taken by the constituents of the deployment process, measured in seconds. Note that the graph does not include time for authorization, but those times are comparable to registration time. Also, the actual migration time depends on the network latency and bandwidth. The pause and resume times depend on the third-party VMM.

How does using VMs help the Bioinformatics community?
After a scientist has deployed a VM onto the resource, they may run an application in it. For this purpose, each of our VMs was configured with the Globus Toolkit. This picture represents a scientist running the TOPO program, creating an image of a transmembrane protein.

Do VMs fulfill their promise?
• Broader base of resources: Our tests show that this first promise is met. Consider the following situation: a scientist can use a testbed on the DOE Science Grid that spans several clusters, with access to 20 Solaris nodes at LBNL, 20 Linux nodes in ANL's Jazz cluster, and 20 Linux nodes on NERSC's pdsf cluster. If only the Jazz nodes have the configuration needed to run EMBOSS, it would take a lot more work to get EMBOSS to run on the LBNL and pdsf clusters. If we install EMBOSS on a VM and then run an instance of that VM on each node, we can use all 60 nodes instead of just 20.
• Easier deployment/distribution: Using VMs makes deployment easier and faster. In our tests we experimented with a 2 GB minimal VM image, with the following results:
    • EMBOSS installation: 45 minutes
    • VM deployment on our testbed: 6 minutes 23 seconds
    • Peace of mind (not having to debug the installation): priceless!
• Fine-grained resource management: Depending on its implementation, a VM can provide the fine-grained resource usage enforcement that is critical in many Grid scenarios.
• Enhanced security: VMs offer enhanced isolation and are therefore a more secure solution for representing user environments.

Issues or Problems Encountered
When developing the architecture we encountered several important and interesting issues that we had to resolve.
• Clocks: While a VM image is frozen or powered off, the VM's clock does not update. We need a way to update a VM's clock as soon as it is powered on or unpaused.
• IP addresses: We need a way to assign a unique IP address to each new VM instance (i.e. each time a VM is deployed) so that multiple copies of the same VM can be deployed on the same subnet.
• Starting a Grid container: We also need a way to automatically start a Grid container when a VM starts up if we want it to be a full-fledged Grid node, or at least to launch a User Hosting Environment (UHE).
We solved these issues by installing a daemon on the VM: upon deployment, it sets the IP address of the VM, launches a UHE and, if needed, updates the clock.
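
The deployment and migration steps described in the transcript above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions, not the authors' implementation: the class and method names (`VMDescriptor`, `VMRepository.query`, `VMManager.deploy`, `VMManager.migrate`), the descriptor fields, and the rule that numeric criteria act as minimums are all hypothetical, and the real system exposes Grid services on Globus Toolkit 3.2 with XML descriptions rather than Python objects. Image transfer and the VMM call are reduced to state changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMDescriptor:
    """Hypothetical stand-in for the XML workspace description."""
    image_name: str
    owner: str
    os_distribution: str
    ram_mb: int
    disk_gb: int
    libraries: frozenset = frozenset()

    def matches(self, criteria: dict) -> bool:
        # Assumed matching rule: numeric fields are minimums, libraries
        # must all be present, every other field must be equal.
        for key, wanted in criteria.items():
            have = getattr(self, key)
            if key in ("ram_mb", "disk_gb"):
                if have < wanted:
                    return False
            elif key == "libraries":
                if not set(wanted) <= have:
                    return False
            elif have != wanted:
                return False
        return True


class VMRepository:
    """Catalogues descriptors; step 1 of deployment queries it."""

    def __init__(self, descriptors):
        self._descriptors = list(descriptors)

    def query(self, criteria: dict) -> list:
        return [d for d in self._descriptors if d.matches(criteria)]


class VMManager:
    """One manager per host node; the ACL maps a user to allowed operations."""

    def __init__(self, host: str, acl: dict):
        self.host = host
        self.acl = acl
        self.instances = {}  # identifier -> instance record

    def _authorize(self, user: str, operation: str) -> None:
        if operation not in self.acl.get(user, set()):
            raise PermissionError(f"{user} may not {operation} on {self.host}")

    def deploy(self, user, descriptor, identifier, lifetime_s):
        # Step 2: authorize the request against the access control list.
        self._authorize(user, "deploy")
        # Step 3: register the instance, copy the image, power it on.
        record = {"descriptor": descriptor, "lifetime_s": lifetime_s,
                  "state": "registered"}
        self.instances[identifier] = record
        record["state"] = "running"  # stands in for image copy + VMM power-on
        return record

    def pause(self, identifier):
        self.instances[identifier]["state"] = "paused"

    def migrate(self, user, identifier, target: "VMManager"):
        # The client must already have frozen the VM.
        self._authorize(user, "migrate")
        record = self.instances[identifier]
        if record["state"] != "paused":
            raise RuntimeError("VM must be frozen before migration")
        target.instances[identifier] = record  # register with the new host
        del self.instances[identifier]         # image transfer would go here
        return record
```

A client would call `query` with its criteria, pass one of the returned descriptors to `deploy` together with an identifier and a lifetime, and later call `pause` followed by `migrate` to move the instance to another manager. Keeping authorization inside the manager mirrors the description of the VMManager checking each request before acting on it.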
