Virtualization

Presentation Transcript


  1. Virtualization Part 1 – Concepts & XEN

  2. Concepts
     References and Sources
     • James Smith, Ravi Nair, “The Architectures of Virtual Machines,” IEEE Computer, May 2005, pp. 32-38.
     • Mendel Rosenblum, Tal Garfinkel, “Virtual Machine Monitors: Current Technology and Future Trends,” IEEE Computer, May 2005, pp. 39-47.
     • L.H. Seawright, R.A. MacKinnon, “VM/370 – a study of multiplicity and usefulness,” IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.
     • S.T. King, G.W. Dunlap, P.M. Chen, “Operating System Support for Virtual Machines,” Proceedings of the 2003 USENIX Technical Conference, June 9-14, 2003, San Antonio TX, pp. 71-84.
     • A. Whitaker, R.S. Cox, M. Shaw, S.D. Gribble, “Rethinking the Design of Virtual Machine Monitors,” IEEE Computer, May 2005, pp. 57-62.
     • G.J. Popek, R.P. Goldberg, “Formal requirements for virtualizable third generation architectures,” CACM, vol. 17, no. 7, 1974, pp. 412-421.
     CS 5204 – Fall, 2008

  3. Definitions
     • Virtualization
       • A layer that maps its visible interface and resources onto the interface and resources of the underlying layer or system on which it is implemented
     • Purposes
       • Abstraction – to simplify the use of the underlying resource (e.g., by removing details of the resource’s structure)
       • Replication – to create multiple instances of the resource (e.g., to simplify management or allocation)
       • Isolation – to separate the uses which clients make of the underlying resources (e.g., to improve security)
     • Virtual Machine Monitor (VMM)
       • A virtualization system that partitions a single physical “machine” into multiple virtual machines
     • Terminology
       • Host – the machine and/or software on which the VMM is implemented
       • Guest – the OS which executes under the control of the VMM

  4. Origins - Principles
     • Efficiency
       • Innocuous instructions should execute directly on the hardware
     • Resource control
       • Executing programs may not affect system resources directly; the VMM keeps control of them
     • Equivalence
       • The behavior of a program executing under the VMM should be the same as if the program were executed directly on the hardware (except possibly for timing and resource availability)
     A VMM in this sense provides “an efficient, isolated duplicate of the real machine” (Communications of the ACM, vol. 17, no. 7, 1974, pp. 412-421).

  5. Origins - Principles
     Instruction types
     • Privileged – an instruction that traps in unprivileged (user) mode but not in privileged (supervisor) mode
     • Sensitive
       • Control sensitive – attempts to change the memory allocation or privilege mode
       • Behavior sensitive
         • Location sensitive – execution behavior depends on location in memory
         • Mode sensitive – execution behavior depends on the privilege mode
     • Innocuous – an instruction that is not sensitive
     Theorem
     For any conventional third generation computer, a virtual machine monitor may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions.
     Significance
     The IA-32/x86 architecture is not virtualizable in this sense, because some of its sensitive instructions are not privileged.
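The theorem's condition is a plain subset test, which can be sketched in a few lines of Python. The two toy instruction sets below are invented for illustration and do not describe any real architecture; `is_virtualizable` and `problem_instructions` are hypothetical helper names.

```python
# Minimal sketch of the Popek-Goldberg condition.
# The instruction names below are made up for illustration only.

def is_virtualizable(sensitive, privileged):
    """True when every sensitive instruction is also privileged (i.e., traps)."""
    return sensitive <= privileged


def problem_instructions(sensitive, privileged):
    """Sensitive instructions that would run in user mode without trapping."""
    return sensitive - privileged


# Toy ISA A: every sensitive instruction traps in user mode.
toy_a_privileged = {"LOAD_PSW", "SET_TIMER", "IO_START"}
toy_a_sensitive = {"LOAD_PSW", "SET_TIMER"}

# Toy ISA B: READ_MODE is mode sensitive but does not trap.
toy_b_privileged = {"LOAD_PSW", "SET_TIMER"}
toy_b_sensitive = {"LOAD_PSW", "READ_MODE"}

if __name__ == "__main__":
    print(is_virtualizable(toy_a_sensitive, toy_a_privileged))
    print(sorted(problem_instructions(toy_b_sensitive, toy_b_privileged)))
```

Under this reading, the instructions returned by `problem_instructions` are exactly the ones a trap-and-emulate VMM never gets a chance to intercept.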

  6. Origins - Technology
     • Concurrent execution of multiple production operating systems
     • Testing and development of experimental systems
     • Adoption of new systems with continued use of legacy systems
     • Ability to accommodate applications requiring special-purpose OS
     • Introduced notions of “handshake” and “virtual-equals-real mode” to allow sharing of resource control information with CP (the VM/370 Control Program)
     • Leveraged ability to co-design hardware, VMM, and GuestOS
     IBM Systems Journal, vol. 18, no. 1, 1979, pp. 4-17.

  7. VMMs Rediscovered
     [Figure: three virtual machines, each running an application on a guest OS, on top of a VMM on the real machine]
     • Server/workload consolidation (reduces “server sprawl”)
     • Compatible with evolving multi-core architectures
     • Simplifies software distributions for complex environments
     • “Whole system” (workload) migration
     • Improved data-center management and efficiency
     • Additional services (workload isolation) added “underneath” the OS
       • security (intrusion detection, sandboxing, …)
       • fault-tolerance (checkpointing, roll-back/recovery)

  8. Architecture & Interfaces
     [Figure: layers Applications, Libraries, Operating System, Hardware; interfaces labeled API, ABI, ISA, System Calls, System ISA, User ISA]
     • Architecture: formal specification of a system’s interface and the logical behavior of its visible resources
     • API – application programming interface
     • ABI – application binary interface
     • ISA – instruction set architecture

  9. VMM Types
     • System
       • Provides ABI interface
       • Efficient execution
       • Can add OS-independent services (e.g., migration, intrusion detection)
     • Process
       • Provides API interface
       • Easier installation
       • Leverages OS services (e.g., device drivers)
       • Execution overhead (possibly mitigated by just-in-time compilation)

  10. System-level Design Approaches
     • Full virtualization (direct execution)
       • Exact hardware exposed to OS
       • Efficient execution
       • OS runs unchanged
       • Requires a “virtualizable” architecture
       • Example: VMware
     • Paravirtualization
       • OS modified to execute under VMM
       • Requires porting OS code
       • Execution overhead
       • Necessary for some (popular) architectures (e.g., x86)
       • Examples: Xen, Denali

  11. Design Space (level vs. ISA)
     • Variety of techniques and approaches available
     • Critical technology space highlighted
     [Figure: design-space chart with regions labeled “API interface” and “ABI interface”]

  12. System VMMs
     • Structure
       • Type 1: runs directly on host hardware
       • Type 2: runs on HostOS
     • Primary goals
       • Type 1: High performance
       • Type 2: Ease of construction/installation/acceptability
     • Examples
       • Type 1: VMware ESX Server, Xen, VM/370
       • Type 2: User-mode Linux
     [Figure: Type 1 and Type 2 structures]

  13. Hosted VMMs
     • Structure
       • Hybrid between Type 1 and Type 2
       • Core VMM executes directly on hardware
       • I/O services provided by code running on HostOS
     • Goals
       • Improve performance overall
       • Leverage I/O device support on the HostOS
     • Disadvantages
       • Incurs overhead on I/O operations
       • Lacks performance isolation and performance guarantees
     • Example: VMware Workstation

  14. Whole-system VMMs
     • Challenge: GuestOS ISA differs from HostOS ISA
     • Requires full emulation of GuestOS and its applications
     • Example: VirtualPC

  15. Strategies
     [Figure: a privileged instruction in the GuestOS traps to the VMM, which emulates the resource change]
     • De-privileging
       • VMM emulates the effect on system/hardware resources of privileged instructions whose execution traps into the VMM
       • aka trap-and-emulate
       • Typically achieved by running GuestOS at a lower hardware priority level than the VMM
       • Problematic on architectures where some sensitive instructions do not trap when executed at deprivileged priority
     • Primary/shadow structures
       • VMM maintains “shadow” copies of critical structures whose “primary” versions are manipulated by the GuestOS
       • e.g., page tables
       • Primary copies needed to ensure that the correct environment is visible to the GuestOS
     • Memory traces
       • Controlling access to memory so that the shadow and primary structures remain coherent
       • Common strategy: write-protect primary copies so that update operations cause page faults which can be caught, interpreted, and emulated
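As a rough illustration of de-privileging, the following Python sketch models trap-and-emulate with an exception standing in for the hardware trap. The toy instruction names (`ADD`, `DISABLE_INT`, `ENABLE_INT`), the `VirtualCPU` fields, and all function names are assumptions made for this sketch, not part of any real VMM.

```python
# Toy model of trap-and-emulate; every name here is illustrative.

class PrivilegedInstructionTrap(Exception):
    """Stands in for the hardware trap raised in deprivileged mode."""

    def __init__(self, op, operand):
        super().__init__(op)
        self.op = op
        self.operand = operand


class VirtualCPU:
    """Per-guest virtual state that the VMM maintains."""

    def __init__(self):
        self.acc = 0
        self.interrupts_enabled = True


PRIVILEGED = {"DISABLE_INT", "ENABLE_INT"}


def execute_deprivileged(op, operand, vcpu):
    """Innocuous instructions run directly; privileged ones trap."""
    if op in PRIVILEGED:
        raise PrivilegedInstructionTrap(op, operand)
    if op == "ADD":
        vcpu.acc += operand
    else:
        raise ValueError(f"unknown instruction {op}")


def emulate(trap, vcpu):
    """VMM applies the instruction's effect to the virtual resource only."""
    if trap.op == "DISABLE_INT":
        vcpu.interrupts_enabled = False
    elif trap.op == "ENABLE_INT":
        vcpu.interrupts_enabled = True


def run_guest(program, vcpu):
    """Run a guest program and return how many instructions trapped."""
    traps = 0
    for op, operand in program:
        try:
            execute_deprivileged(op, operand, vcpu)
        except PrivilegedInstructionTrap as trap:
            emulate(trap, vcpu)
            traps += 1
    return traps
```

The sketch only works because every sensitive instruction in the toy set raises the trap; an instruction that silently did something different in deprivileged mode would bypass `emulate` entirely, which is the problem noted in the de-privileging bullets.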

  16. Virtualizing the IA-32 (x86) architecture
     • Architecture has protection rings 0..3 with OS normally in ring 0 and applications in ring 3…
     • …and VMM must run in ring 0 to maintain its integrity and control
     • …but GuestOS not running in ring 0 is problematic:
       • Some sensitive instructions execute correctly only in ring 0 but do not fault when executed outside ring 0 (remember privileged vs. sensitive?)
       • Instructions for low-latency system calls (SYSENTER/SYSEXIT) always transition to ring 0, forcing the VMM into unwanted emulation or overhead
       • For the Itanium architecture, interrupt registers are only accessible in ring 0; forcing the VMM to intercept each device driver access to these registers has severe performance consequences
       • Masking interrupts can only be done in ring 0
       • Ring compression: paging does not distinguish privilege levels 0-2, so GuestOS must run in ring 3 but is then not protected from its applications also running in ring 3
         • Cannot be used for 64-bit guests on IA-32
       • The fact that it is not running in ring 0 can be detected (is this important?)

  17. Memory Management
     [Figure: entity and address space: VMM – machine, OS – physical, process – virtual; the GuestOS keeps the page tables and the VMM keeps the “shadow” page tables]
     • Isolation/protection of Guest OS address spaces
     • Efficient MM address translation
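One way to picture the three address spaces is to compose the guest's virtual-to-physical table with the VMM's physical-to-machine map to obtain the shadow table. The Python sketch below does that with dictionaries; the page size, the table contents, and the function names are assumptions chosen for illustration.

```python
# Sketch: deriving a shadow page table by composing two mappings.
# PAGE_SIZE and all table contents are illustrative assumptions.

PAGE_SIZE = 4096


def build_shadow_table(guest_table, phys_to_machine):
    """Map virtual pages straight to machine frames.

    guest_table:      virtual page -> guest "physical" page (kept by the GuestOS)
    phys_to_machine:  guest "physical" page -> machine frame (kept by the VMM)
    Pages the VMM has not backed with a machine frame are left unmapped.
    """
    shadow = {}
    for vpage, ppage in guest_table.items():
        mframe = phys_to_machine.get(ppage)
        if mframe is not None:
            shadow[vpage] = mframe
    return shadow


def translate(shadow, vaddr):
    """Translate a virtual address the way the hardware MMU would use the shadow table."""
    vpage, offset = divmod(vaddr, PAGE_SIZE)
    if vpage not in shadow:
        raise LookupError(f"page fault at virtual address {vaddr:#x}")
    return shadow[vpage] * PAGE_SIZE + offset


guest_table = {0: 5, 1: 7, 2: 9}
phys_to_machine = {5: 100, 7: 42}  # guest page 9 has no machine frame yet
shadow = build_shadow_table(guest_table, phys_to_machine)
```

Because the hardware only ever walks `shadow`, a guest cannot name a machine frame the VMM did not place in `phys_to_machine`, which is where the isolation between guests comes from in this model.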

  18. Computer Laboratory XEN: paravirtualization
     References and Sources
     • Paul Barham, et al., “Xen and the Art of Virtualization,” Symposium on Operating Systems Principles 2003 (SOSP’03), October 19-22, 2003, Bolton Landing, New York.
     • Presentation by Ian Pratt available at http://www.cl.cam.ac.uk/netos/papers/2005-xen-may.ppt

  19. Xen - Structure
     • Employs paravirtualization strategy
       • Deals with machine architectures that cannot be virtualized
       • Requires modifications to guest OS
       • Allows optimizations
     • “Domain 0”
       • Has special access to control interface for platform management
       • Has back-end device drivers
     • Xen VMM
       • Entirely event driven
       • No internal threads
     [Figure: Xen 3.0 Architecture]

  20. MMU Virtualization: Shadow-Mode
     [Figure labels: guest reads; guest writes; Virtual → physical (Guest OS); Updates; Accessed & dirty bits; Virtual → Machine (VMM); Hardware MMU]

  21. MMU Virtualization: Direct-Mode
     [Figure labels: guest reads; guest writes; Virtual → Machine; Guest OS; Xen VMM; Hardware MMU]

  22. Writeable Page Tables: 1 – Write Fault
     [Figure labels: guest reads; first guest write; page fault; Virtual → Machine; Guest OS; Xen VMM; Hardware MMU]

  23. Writeable Page Tables: 2 – Unhook
     [Figure labels: guest reads; guest writes; X (link removed); Virtual → Machine; Guest OS; Xen VMM; Hardware MMU]

  24. Writeable Page Tables: 3 – First Use
     [Figure labels: guest reads; guest writes; X (link removed); page fault; Virtual → Machine; Guest OS; Xen VMM; Hardware MMU]

  25. Writeable Page Tables: 4 – Re-hook
     [Figure labels: guest reads; guest writes; validate; Virtual → Machine; Guest OS; Xen VMM; Hardware MMU]
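The four writeable-page-table steps named in the slide titles (write fault, unhook, first use, re-hook) can be read as a small state machine. The Python sketch below is one interpretation of that sequence, not Xen's actual code: the class names, the event log, and the validation rule (every entry must point to a frame the guest owns) are simplifying assumptions.

```python
# Toy state machine for the four writeable-page-table steps.
# ToyXen, PageTablePage, and the ownership check are illustrative assumptions.

class PageTablePage:
    def __init__(self, entries):
        self.entries = dict(entries)   # slot -> machine frame
        self.hooked = True             # reachable by the hardware MMU
        self.guest_writable = False    # write-protected while hooked


class ToyXen:
    def __init__(self, frames_owned_by_guest):
        self.owned = set(frames_owned_by_guest)
        self.log = []

    def guest_write(self, pt, slot, frame):
        if not pt.guest_writable:
            # Step 1: the first write to a protected page-table page faults.
            self.log.append("write fault")
            # Step 2: unhook the page so the MMU cannot use unvalidated entries,
            # then let the guest write to it directly.
            pt.hooked = False
            pt.guest_writable = True
            self.log.append("unhook")
        pt.entries[slot] = frame       # later writes proceed without faulting

    def guest_use(self, pt, slot):
        if not pt.hooked:
            # Step 3: the first use of an address covered by the unhooked page faults.
            self.log.append("first use fault")
            # Step 4: validate the batched updates, write-protect, and re-hook.
            self._validate(pt)
            pt.guest_writable = False
            pt.hooked = True
            self.log.append("re-hook")
        return pt.entries[slot]

    def _validate(self, pt):
        bad = [s for s, f in pt.entries.items() if f not in self.owned]
        if bad:
            raise PermissionError(f"entries {bad} map frames the guest does not own")
        self.log.append("validate")
```

The point of the sequence, on this reading, is batching: the guest pays for one fault on the first write and one on the first use, however many entries it updates in between.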

  26. I/O
     • Safe hardware interfaces
       • I/O Spaces
         • Restricts access to I/O registers
       • Isolated Device Driver
         • Driver isolated from VMM in its own “domain” (i.e., VM)
         • Communication between domains via device channels
     • Unified interfaces
       • Common interface for group of similar devices
       • Exposes raw device interface (e.g., for specialized devices like sound/video)
       • Separate request/response from event notification
     • I/O descriptor rings
       • Used to communicate I/O requests and responses
       • For bulk data transfer devices (DMA, network), buffer space allocated out of band by GuestOS
       • Descriptor contains unique identifier to allow out-of-order processing
       • Multiple requests can be added before hypercall made to begin processing
       • Event notification can be masked by GuestOS for its convenience
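The descriptor-ring bullets can be made concrete with a small circular buffer that carries requests one way and responses the other. The Python sketch below is a simplified model under stated assumptions: the four-counter layout, the class and method names, and the string payloads are illustrative, and the hypercall and event notification are not modeled.

```python
# Simplified I/O descriptor ring: one circular buffer shared by both directions.
# Class, method, and counter names are illustrative assumptions.

class DescriptorRing:
    def __init__(self, size=8):
        self.size = size
        self.slots = [None] * size
        self.req_prod = 0   # advanced by the guest when it adds a request
        self.req_cons = 0   # advanced by the back end when it takes a request
        self.rsp_prod = 0   # advanced by the back end when it adds a response
        self.rsp_cons = 0   # advanced by the guest when it takes a response

    def put_request(self, req_id, payload):
        """Guest side: queue a request; several may be queued before notifying."""
        if self.req_prod - self.rsp_cons >= self.size:
            raise BufferError("ring full")
        self.slots[self.req_prod % self.size] = ("req", req_id, payload)
        self.req_prod += 1

    def take_requests(self):
        """Back-end side: drain every pending request."""
        pending = []
        while self.req_cons < self.req_prod:
            _, req_id, payload = self.slots[self.req_cons % self.size]
            pending.append((req_id, payload))
            self.req_cons += 1
        return pending

    def put_response(self, req_id, status):
        """Back-end side: responses reuse slots whose requests were consumed."""
        if self.rsp_prod >= self.req_cons:
            raise BufferError("no consumed request slot available")
        self.slots[self.rsp_prod % self.size] = ("rsp", req_id, status)
        self.rsp_prod += 1

    def take_responses(self):
        """Guest side: collect responses, matched to requests by identifier."""
        done = {}
        while self.rsp_cons < self.rsp_prod:
            _, req_id, status = self.slots[self.rsp_cons % self.size]
            done[req_id] = status
            self.rsp_cons += 1
        return done
```

A guest could queue three requests, let the back end complete them in reverse order, and still pair each response with its request, because matching uses the identifier in the descriptor rather than the slot position.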

  27. Device Channels
     • Connects “front end” device drivers in GuestOS with “native” device driver
     • Is an I/O descriptor ring
     • Buffer page(s) allocated by GuestOS and “granted” to Xen
     • Buffer page(s) is/are pinned to prevent page-out during I/O operation
     • Pinning allows zero-copy data transfer

  28. System Performance
     [Figure: bar chart, vertical axis 0.0 to 1.1, with panels SPEC INT2000 (score), Linux build time (s), OSDB-OLTP (tup/s), and SPEC WEB99 (score). Caption: Benchmark suite running on Linux (L), Xen (X), VMware Workstation (V), and UML (U)]
     • Benchmark suites
       • SPEC INT2000: compute-intensive workload
       • Linux build time: extensive file I/O, scheduling, memory management
       • OSDB-OLTP: transaction processing workload, extensive synchronous disk I/O
       • SPEC WEB99: web-like workload (file and network traffic)
     • Fair comparison?

  29. I/O Performance
     • Systems
       • L: Linux
       • IO-S: Xen using IO-Space access
       • IDD: Xen using isolated device driver
     • Benchmarks
       • Linux build time: file I/O, scheduling, memory management
       • PM: file system benchmark
       • OSDB-OLTP: transaction processing workload, extensive synchronous disk I/O
       • httperf: static document retrieval
       • SpecWeb99: web-like workload (file and network traffic)
