COS497 - Cloud Computing 5. Virtualization “Virtualization is an abstraction layer that decouples the physical hardware from the operating system to deliver greater IT resource utilization and flexibility.” – www.vmware.com.
Within computing, the concept of virtualization is not new. For years, there have been virtual memory, virtual hard disks, virtual networks, etc. - With virtual memory, for example, computer software gains access to more memory than is physically installed, via the background swapping of data from memory to disk storage, and back again, by the OS. - Similarly, virtualization techniques can be applied to other IT infrastructure layers - including networks, storage, laptop or server hardware, operating systems and applications. Basically, for the cloud, virtualization is using one physical machine to support multiple virtual machines that run in parallel and independently.
What is it? Virtualization is a framework or methodology of dividing the resources of a computer into multiple execution environments. • Done by applying one or more concepts or techniques such as: hardware and software partitioning, time-sharing use of resources, partial or complete machine simulation, emulation, quality of service, and many others.
Not a new concept: VM projects in the 1960s at IBM. IBM and MIT headed research through the years and eventually developed the idea of a Virtual Machine Monitor (VMM) – an operating system-like software package that creates and supports virtual machines.
Problem Assessment – For customer organizations Organizations (businesses, universities, etc.) have to invest in hardware and software resources, as well as IT support (i.e. personnel) – this is costly, $$$$$. There are other problems as well: too many servers for too little work (organizations over-invest to cope with peak demands); aging hardware reaching end of usable life (replacement costs); high infrastructure requirements; limited flexibility in shared environments.
Low utilization metrics of servers across the organization …
High costs and infrastructure needs: maintenance, leases, networking, floor space, cooling, power, disaster recovery.
Heterogeneous Environments - Different Operating Systems Different applications may require different operating systems
Apple Mac Intel AMD
Problem Assessment – For Internet service providers The big Internet service providers such as Google, Amazon, Yahoo, etc. have invested enormous sums of money in 100,000s of servers, and in the storage and infrastructure that they require in order to deliver their services without delay. The daily upkeep of these resources in terms of money is also staggeringly high. However, all the resources are very under-utilized. So, why not sell these under-utilized resources to organizations? Moreover, a single server may support a number of “time-shared” applications.
Cloud providers adopted an approach known as multi-tenancy for supporting their SaaS offerings. Multi-tenancy refers to a principle in software where a single instance of the software runs on a server, but serves multiple clients (the tenants). E.g. one instance of a CRM package is used by many clients. This is more efficient for the Cloud providers.
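The multi-tenancy principle can be illustrated with a minimal Python sketch: one instance of a CRM-style application serves several tenants and keeps each tenant's data in its own partition. The class and method names (`CrmInstance`, `add_contact`, `contacts`) and the tenant names are hypothetical and do not come from any real CRM package.

```python
from collections import defaultdict


class CrmInstance:
    """One running instance of a hypothetical CRM application serving many tenants."""

    def __init__(self):
        # Each tenant's records live in their own partition, keyed by tenant id.
        self._contacts = defaultdict(list)

    def add_contact(self, tenant_id, name):
        self._contacts[tenant_id].append(name)

    def contacts(self, tenant_id):
        # A tenant only ever sees its own partition; a copy is returned.
        return list(self._contacts[tenant_id])


crm = CrmInstance()                 # a single software instance on the server
crm.add_contact("acme", "Alice")
crm.add_contact("globex", "Bob")
crm.add_contact("acme", "Carol")
print(crm.contacts("acme"))
print(crm.contacts("globex"))
```

The point of the design is that the provider runs and maintains one copy of the software, while the tenant id decides which slice of the data a request may touch.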
A key benefit of virtualization is the ability to run multiple, different operating systems on a single physical system and share the underlying hardware resources. Virtualization is a critical aspect of cloud computing, and utility computing in general. - Pooling resources (e.g. sharing processors) for higher utilization is a requirement of utility computing. In a cloud, this allows higher elasticity and system security.
Benefits of Virtualization Some of the benefits that are typically provided by a virtualized system …
Background - Virtualization You cannot get physical access to the cloud server machines - The hardware can, and does, frequently change! To avoid dependency on specific hardware, you write your cloud (server) program to run on virtual machines (VMs).
Virtualization is based on the software concept of a Virtual Machine Monitor (VMM). Virtualization inserts a software layer (i.e. the VMM) at different points in the computer architecture. - Said in a different way, there are different types of virtualization. The VMM is designed for virtual machines: • Its API is hardware-like, to ease guest OS ports. • It acts like a micro-kernel OS, performing such low-level tasks as virtual memory mapping, scheduling, I/O management, etc.
What is a Virtual Machine Monitor? A virtual machine monitor (VMM), also called a hypervisor, is a program that allows multiple, possibly different, guest operating systems to share a single hardware host – another form of multi-tenancy. Each guest operating system appears to have the host's processor, memory, and other resources all to itself. However, the VMM is actually controlling the host processor and resources, allocating what is needed to each guest operating system, in turn, and making sure that the guest operating systems (executing in virtual machines) cannot disrupt each other.
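The idea that the VMM gives the processor to each guest "in turn" while keeping guests apart can be sketched as a toy round-robin loop in Python. This is only an illustration under simplifying assumptions: `GuestVM`, `run_round_robin` and the work items are hypothetical, and a real hypervisor schedules virtual CPUs with timer interrupts rather than cooperative steps.

```python
from collections import deque


class GuestVM:
    """A toy guest: a name, private memory, and a queue of pending work items."""

    def __init__(self, name, work):
        self.name = name
        self.memory = {}            # private to this guest; never shared
        self.work = deque(work)

    def run_one_step(self):
        item = self.work.popleft()
        self.memory[item] = "done"  # a guest can only write its own memory
        return item


def run_round_robin(guests):
    """Give each guest one step in turn until all work is finished."""
    ready = deque(guests)
    trace = []
    while ready:
        vm = ready.popleft()
        if vm.work:
            trace.append((vm.name, vm.run_one_step()))
        if vm.work:                 # still has work: go to the back of the queue
            ready.append(vm)
    return trace


linux = GuestVM("linux", ["a", "b", "c"])
windows = GuestVM("windows", ["x", "y"])
trace = run_round_robin([linux, windows])
print(trace)
```

Each guest makes progress as if it owned the machine, yet the loop (standing in for the VMM) decides who runs next, and no guest holds a reference to another guest's memory.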
Virtual Machine (VM) “A VM is an efficient, isolated duplicate of a real machine” Duplicate: A VM should behave identically to the real machine. Programs cannot distinguish between execution on real or virtual hardware, except for - Fewer resources are available (and, potentially, are different between executions!) - Some timing differences (when dealing with devices) Isolated: Several VMs execute without interfering with each other. Efficient: A VM should execute at speeds close to that of real hardware. Requires that most instructions are executed directly by real hardware. Hypervisor (aka virtual-machine monitor): Software implementing the VM.
A virtual machine is just a software package that simulates a real machine. A real machine may support (i.e. execute) many virtual machines – time-shared execution. The real machine shares its execution time between the virtual machines. Each virtual machine is loaded with a guest operating system, and (guest) applications that execute on top of the guest operating system – a virtual machine container.
[Figure: conventional system versus virtualized system. Labels: Virtual Container, App. A, App. B, Operating System, Virtualization Layer, Hardware.] Conventional system: a single OS controls all hardware platform resources. Virtualized system: it makes it possible to run multiple virtual “containers” on a single physical platform.
Virtual machine must model a real machine exactly and efficiently - Why? Want minimal slowdown in performance. - A VM needs to be run on the physical machine it virtualizes. (Obviously!) We will only concern ourselves with virtualizing at the ISA level Note: ISA = instruction-set architecture (The hardware-software interface – assembly language-level)
Aside: Instruction set architecture The instruction set architecture (ISA) is the part of the computer architecture related to programming, including the native data types, instructions, registers, addressing modes, memory architecture, interrupt and exception handling, and external I/O. An ISA includes a specification of the set of op-codes (machine language), and the native commands implemented by a particular processor. The ISA instructions can execute in one of two modes: - Unprivileged – most instructions - Privileged – a few, special instructions, executed by the operating system
When an application (executing in unprivileged mode) tries to execute a privileged instruction (e.g. perform some I/O operation – a system call), a trap is made into the operating system, and the operating system (executing in privileged mode) performs the operation on behalf of the application. The switch between application and operating system, and vice versa, is known as a context switch.
The ISA Interface • Interface between hardware and software • Important for the OS developer
Why Virtual Machines? • Historically used for easier sharing of expensive mainframes: – Run several (even different) OSs on same machine, called guest operating systems – Each using a subset of the physical resources. – Ran as a single-user, single-tasked OS in time-sharing mode. • Went out of fashion in 1980s – Time-sharing OSs common-place – Hardware too cheap to worry ... • Now back in fashion with the Cloud
Hypervisor (aka VMM) • Program that runs on real hardware to implement a virtual machine. • Controls resources – Partitions hardware – Schedules guest operating systems – Mediates access to shared resources, e.g. console • Implications – Hypervisor executes in privileged mode. – Guest OS software executes in unprivileged mode. – Privileged instructions in guest OS cause a trap into the hypervisor. – Hypervisor interprets/emulates them.
Main Concern of the VMM Biggest problem faced by the VMM is to present the hardware to the VM in a “safe, transparent and efficient way”. Safe? Whatever a VM does, it should not be able to affect other VMs or the VMM Maintain illusion by tricking the software into thinking it has the hardware to itself, and by hiding the true state of the hardware.
VMM Main Components Dispatcher – after a trap, decides what to do. Allocator – provides resources to the VMs. Interpreter – simulates the instruction that trapped.
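How the three components could fit together is sketched below in Python. This is a toy model, not the structure of any particular hypervisor: the instruction names (`REQUEST_PAGES`, `DISABLE_INTERRUPTS`, `ENABLE_INTERRUPTS`), the page-pool size and the class layout are all assumptions made for illustration.

```python
class Allocator:
    """Hands out a fixed pool of memory pages to VMs."""

    def __init__(self, total_pages):
        self.free_pages = total_pages
        self.owned = {}                      # vm name -> pages granted

    def allocate(self, vm, pages):
        if pages > self.free_pages:
            return False                     # refuse: the pool cannot cover it
        self.free_pages -= pages
        self.owned[vm] = self.owned.get(vm, 0) + pages
        return True


class Interpreter:
    """Simulates the effect of a trapped instruction on the VM's virtual state."""

    def __init__(self):
        self.virtual_state = {}              # vm name -> virtual interrupt state

    def simulate(self, vm, instruction):
        if instruction == "DISABLE_INTERRUPTS":
            self.virtual_state[vm] = "interrupts_off"   # only the virtual flag changes
        elif instruction == "ENABLE_INTERRUPTS":
            self.virtual_state[vm] = "interrupts_on"
        return self.virtual_state.get(vm)


class Dispatcher:
    """Entry point after a trap: decides which component handles it."""

    def __init__(self, allocator, interpreter):
        self.allocator = allocator
        self.interpreter = interpreter

    def on_trap(self, vm, instruction, operand=None):
        if instruction == "REQUEST_PAGES":
            return self.allocator.allocate(vm, operand)
        return self.interpreter.simulate(vm, instruction)


vmm = Dispatcher(Allocator(total_pages=8), Interpreter())
print(vmm.on_trap("vm1", "REQUEST_PAGES", 6))
print(vmm.on_trap("vm2", "REQUEST_PAGES", 4))
print(vmm.on_trap("vm2", "DISABLE_INTERRUPTS"))
```

In this sketch the second page request is refused because only two pages remain, and disabling interrupts in `vm2` changes only that VM's virtual state, never the real machine or `vm1`.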
Virtualization comes in three (traditional) variants: - High-level language virtual machines - Operating system-level virtualization - Hardware-level virtualization
1. High Level-Language Virtual Machines The virtualization layer sits as an application program on top of the native operating system. Can run any programs written for the virtual machine abstraction regardless of the operating system hosting that virtual machine. An example of this is the Java Virtual Machine which executes Java Byte Code produced by Java compilers. Java Byte Code JVM Applications OS Real Machine
2. Operating System-Level Virtualization The virtualization layer sits between the host operating system and the application programs that run on the guest operating system. The virtual machine runs applications written for a guest operating system, but in a controlled environment. The VM uses the host OS API. (Figure labels: Application, Guest OS, Virtual Machine, VMM, Applications, Host OS, Real Machine.)
VMM runs as a guest application with a guest OS on top of host OS. Relies on host OS for memory management, processor scheduling, resource allocation, hardware drivers.
3. Hardware-Level Virtualization Virtualization layer sits right on top of the real hardware. The VMM is a thin software layer that exports a virtual machine abstraction to the guest operating systems. Since the VMM presents a version of the real machine, all software written for that hardware will run on that virtual machine. Original design from IBM in the 1960s.
Hardware Virtual Machine Monitors Virtual machines are exported by a thin layer of software, the VMM. The hardware-level VMM runs directly on the hardware, and can “export” multiple VMs that look exactly like, or similar to, the real hardware.
The Traditional Approach A traditional VMM (Virtual Machine Monitor) presents its (virtual) “hardware” as being functionally identical to the physical hardware. - But this approach can be difficult to implement (Especially with x86 systems!) - There are also situations where it is useful to provide real and virtual resources (for example, virtual and real timers) Under this model, the VM would not have access to this information.
Traditional Virtual Machine • Original meaning of the term virtual machine – All guest and host software use the same ISA – VMM runs on bare hardware – privileged mode – VMM intercepts and implements all the privileged operations for the guest OS.
Attributes of All Virtual Machines Software Compatibility • VM provides compatible abstraction so all software written for the machine that VM is virtualizing will run on it. • Java example: “Write once, run anywhere”. Isolation • All software running on the virtual machine is contained within it, and cannot affect other VMs or processes.
Attributes of All Virtual Machines Encapsulation • Virtual machines provide a level of “indirection”. Any software running within them can be controlled and manipulated. • Can act like putting a filter on a print service to monitor content or perform additional book keeping. The Java VM, for example, can perform run-time error checking and garbage collection that C++ compiled code (say) cannot do running directly on the hardware. Performance • Any new software layer adds overhead to system
Modern Virtualization Types – Modern Terminology Two “architectures” for implementing virtualization in the cloud are the hosted and native types. The hosted type provides a virtualization layer on top of the standard operating system, and supports the broadest range of hardware configurations. The guest operating systems and their applications run on top of the virtualization layer. In contrast, the native type is the first layer of software installed on the hardware (hence, it is often referred to as a “bare metal” approach). - Since it has direct access to the hardware resources, the native type is more efficient than hosted types, enabling greater scalability, robustness and performance.
Native Virtualization Hosted Virtualization
Native versus Hosted VMM • When a hosted VMM tries to execute a privileged instruction, it is trapped and passed to the host OS for execution. • A hosted VMM is less efficient than a native VMM - Twice the number of mode switches - Twice the number of context switches • A hosted VMM can run, besides native applications, - a “sandbox” for untrusted applications - and is convenient for running an alternative OS on a desktop
Virtualization and the Challenges: speed and performance, security, resource isolation, functionality.
For the cloud, the two major virtualization technology providers are VMWare and Xen. But there are others such as Oracle’s VirtualBox and KVM.
Xen was developed as a research project at the University of Cambridge Computer Laboratory in association with Microsoft and Intel Research in Cambridge, UK, and became an open-source project in 2003. The Xen community develops and maintains Xen as free software. VMWare was founded in 1998 by graduates from UC, Berkeley. It is an American software company that provides cloud and virtualization software and services.
Breaking it Down Virtualization (today) is implemented in two main ways: Full Virtualization via Binary Translation. This is the approach that VMWare traditionally used, but nowadays VMWare also has VMMs that use para-virtualization. Para-virtualization – OS-assisted virtualization. This is the approach that Xen traditionally used, but nowadays Xen also has VMMs that use full virtualization.
Virtualization Approaches The x86 architecture (and its extensions) is the most popular computer architecture in enterprise datacenters today, hence a virtual infrastructure for the x86 architecture has tremendous benefits. The two leading software virtualization approaches for implementing hypervisors to date have been full (or transparent) virtualization (as traditionally used by VMWare’s VMM) and para-virtualization (as traditionally used by the Xen hypervisor). AMD and Intel have also introduced new processor instructions to assist virtualization software – hardware-assisted virtualization.
The Challenges of x86 Hardware Virtualization x86 operating systems are designed to run directly on the bare-metal hardware, so they naturally assume they fully “own” the computer hardware. The x86 architecture offers four levels of privilege, known as Ring 0, 1, 2 and 3, to operating systems and applications to manage access to the computer hardware. While user-level applications typically run in Ring 3, the operating system needs to have direct access to the memory and hardware and must execute its privileged instructions in Ring 0.
Virtualizing the x86 architecture requires placing a virtualization layer under the guest operating system (which expects to be in the most privileged Ring 0) to create and manage the virtual machines that deliver shared resources. Further complicating the situation, some “sensitive” instructions cannot effectively be virtualized (aka non-virtualizable) as they have different semantics when they are not executed in Ring 0. The difficulty in trapping and translating these sensitive and privileged instruction requests at runtime was the challenge that originally made x86 architecture virtualization look impossible.
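Why a "sensitive" instruction that does not trap is a problem can be shown with a toy CPU model in Python. The instruction names `HALT` and `RESTORE_FLAGS` are hypothetical stand-ins; `RESTORE_FLAGS` is loosely modelled on the well-known x86 `POPF` behaviour, where a change to the interrupt flag is silently dropped outside Ring 0 instead of raising a trap. The ring number used for the de-privileged guest is likewise an assumption of this sketch.

```python
class Trap(Exception):
    """Raised by the toy CPU when a privileged instruction runs outside Ring 0."""


class ToyCpu:
    def __init__(self):
        self.interrupts_enabled = True

    def execute(self, instruction, ring, operand=None):
        if instruction == "HALT":
            # Privileged: traps when de-privileged, so a hypervisor can emulate it.
            if ring != 0:
                raise Trap(instruction)
            return "halted"
        if instruction == "RESTORE_FLAGS":
            # Sensitive but NOT privileged: outside Ring 0 the interrupt-flag
            # change is silently ignored, and no trap is raised.
            if ring == 0:
                self.interrupts_enabled = operand
            return "ok"
        return "ok"                          # ordinary instruction


def hypervisor_sees(cpu, instruction, guest_ring=1, operand=None):
    """Return True if running the guest instruction de-privileged causes a trap."""
    try:
        cpu.execute(instruction, guest_ring, operand)
    except Trap:
        return True
    return False


cpu = ToyCpu()
print(hypervisor_sees(cpu, "HALT"))                          # hypervisor is notified
print(hypervisor_sees(cpu, "RESTORE_FLAGS", operand=False))  # hypervisor never finds out
print(cpu.interrupts_enabled)                                # guest's request was lost
```

The guest kernel believes it has disabled interrupts, the hardware ignored the request, and the hypervisor was never given the chance to emulate it. That gap is what binary translation, para-virtualization and hardware assistance each close in a different way.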
Virtualization Mechanics: Instruction Emulation • Traditional “trap and emulate” approach: - Guest OS attempts to access physical resource, e.g. I/O device. - Hardware raises exception (trap), invoking hypervisor's exception handler. - Hypervisor emulates result, based on access to virtual resource. • Most instructions do not trap - Makes efficient virtualization possible. - Requires that VM ISA is (almost) same as physical processor ISA.
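A rough back-of-the-envelope model shows why it matters that most instructions do not trap. Assume a directly executed instruction costs 1 unit and a trap-and-emulate round trip costs `trap_cost` units; both the cost model and the figure of 1000 used below are hypothetical and only meant to show the trend.

```python
def slowdown(trap_fraction, trap_cost):
    """Average cost per guest instruction, relative to native execution (= 1.0).

    trap_fraction: share of instructions that trap to the hypervisor (0..1)
    trap_cost:     cost of one trap-and-emulate round trip, in units of one
                   directly executed instruction (hypothetical figure)
    """
    direct = (1.0 - trap_fraction) * 1.0
    trapped = trap_fraction * trap_cost
    return direct + trapped


for fraction in (0.0, 0.001, 0.01, 0.1):
    print(f"{fraction:>6.1%} trapping -> {slowdown(fraction, trap_cost=1000):.2f}x native time")
```

Under these assumptions, even a small trapping fraction dominates the running time, which is why trap-and-emulate is only practical when the guest and host share (almost) the same ISA and the bulk of the code runs directly on the processor.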
“Impure” Virtualization Used for two reasons: • If the ISA is not trap-and-emulate virtualizable (as explained in previous slide) • To reduce virtualization overheads Change the guest OS, replacing “sensitive” instructions • By trapping code (hypervisor calls), and/or • By in-line emulation code Two standard approaches: • Binary translation - on-the-fly • Para-virtualization - beforehand
Full Virtualization via Binary Translation This approach translates kernel code to replace non-virtualizable instructions with new sequences of instructions that have the intended effect on the virtual hardware. Meanwhile, user-level code is directly executed on the processor for high-performance virtualization. This combination of binary translation and direct execution provides Full Virtualization, as the guest OS is completely decoupled from the underlying hardware by the virtualization layer. The hypervisor translates all “sensitive” operating system instructions (binary code) on-the-fly and caches the results for future use, while user-level instructions run unmodified at native speed.
Binary Translation • Locate “sensitive” instructions in guest OS binary and replace them on-the-fly by emulation code, or hypercalls to the hypervisor. • Pioneered by VMware • Can also detect combinations of sensitive instructions and replace by single emulation. • Does not require source, uses unmodified native binary - In this respect appears like pure virtualization! • Very tricky to get right (especially on x86!) • Needs to make some assumptions on “sane” behaviour of guest OS!
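The "replace on-the-fly and cache the result" idea can be sketched in a few lines of Python. Real binary translators work on machine code and handle branches, self-modifying code and much more; here a block is just a list of instruction names, and `SENSITIVE`, `BinaryTranslator` and the `EMULATE(...)` marker are hypothetical.

```python
SENSITIVE = {"RESTORE_FLAGS", "READ_CONTROL_REG"}   # hypothetical instruction names


class BinaryTranslator:
    def __init__(self):
        self.cache = {}            # block (tuple of instructions) -> translated block
        self.translations = 0      # how many blocks were actually translated

    def translate(self, block):
        key = tuple(block)
        if key not in self.cache:
            self.translations += 1
            # Sensitive instructions are replaced by emulation code;
            # everything else is copied through unchanged.
            self.cache[key] = [
                f"EMULATE({ins})" if ins in SENSITIVE else ins
                for ins in block
            ]
        return self.cache[key]


translator = BinaryTranslator()
kernel_block = ["MOV", "RESTORE_FLAGS", "ADD"]
first = translator.translate(kernel_block)
second = translator.translate(kernel_block)    # served from the cache
print(first, translator.translations)
```

The guest binary itself is never modified on disk and no source code is needed; the cost of translation is paid once per block and amortized over every later execution of that block.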
Para-Virtualization Para-virtualization involves modifying the OS kernel to replace non-virtualizable instructions with calls that communicate directly with the virtualization layer hypervisor. The hypervisor also provides hypercall interfaces for other critical kernel operations such as memory management, interrupt handling and time keeping. Para-virtualization is different from full virtualization, where the unmodified guest OS does not know it is virtualized and sensitive OS calls are trapped using binary translation.
Para-Virtualization • New name, old technique – First development in 1990, popularized for Cloud by Xen. • Idea: Manually port the guest OS to modified ISA • Augment by explicit hypervisor calls (hypercalls) - Use more high-level API to reduce the number of traps - Remove un-virtualizable instructions - Remove “messy” ISA features which complicate virtualization • Generally out-performs pure virtualization and binary-rewriting • Drawbacks: - Significant programming effort! - Needs to be repeated for each guest-ISA/hypervisor combination - Requires guest OS source!
Hardware-Assisted Virtualization Hardware vendors such as Intel and AMD have introduced new ISA-level instructions. - Privileged instructions with a new CPU execution mode feature that allows the VMM to run in a new root mode below Ring 0. Privileged and sensitive calls are set to automatically trap to the hypervisor, removing the need for either binary translation or para-virtualization.
VMWare’s Approach Full virtualization is designed to provide total abstraction of the underlying physical system and creates a complete virtual system in which the guest operating systems can execute. No modification is required in the guest OS or application - The guest OS or application is not aware of the virtualized environment, so they have the capability to execute on the VM just as they would on a physical system. This approach can be advantageous because it enables complete decoupling of the software from the hardware. As a result, full virtualization can streamline the migration of applications and workloads between different physical systems. Full virtualization also helps provide complete isolation of different applications, which helps make this approach highly secure.
VMWare Approach – Full Virtualization VMware uses a combination of direct execution and binary translation techniques to achieve full virtualization of an x86 system.
VMWare has two types of products: Hosted VMMs - VMWare Workstation, VMWare GSX, VMWare Player, VMWare Fusion, etc. Bare-Metal VMMs - VMWare ESX, ESXi
Xen’s Approach – Para-Virtualization Instead of making the virtual machine 100% functionally identical to the bare hardware, Xen makes use of para-virtualization. Para-virtualization is where the guest operating system is modified, and is designed to execute on a virtual machine that has a “similar” architecture to the underlying machine. Pros: Allows for improved performance. Cons: The guest operating system must undergo modification before it can be hosted by the Xen Hypervisor. This can be a bit of a challenge! Need OS source code.
Para-virtualization presents each VM with an abstraction of the hardware that is similar, but not identical, to the underlying physical hardware. Para-virtualization techniques require modifications to the guest operating systems that are running on the VMs, via source code modification. As a result, the guest operating systems are “aware” that they are executing on a VM - allowing for near-native performance. Para-virtualization eliminates the need for binary translation. But, while it is possible to modify open-source operating systems, such as Linux, it is not possible to modify “closed”-source operating systems such as Microsoft Windows. Hah! For such unmodified guest operating systems, a virtualization hypervisor must either adopt the full virtualization approach or rely on hardware extensions for virtualization in the processor architecture.
Xen Approach In a nutshell, para-virtualization is a technique in which the hypervisor provides an API, and the guest OS executing in the virtual machine calls that API, requiring guest OS modifications.
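A minimal Python sketch of that relationship: the hypervisor exposes an API, and the modified guest kernel calls it explicitly instead of issuing privileged instructions. The names (`Hypervisor`, `hypercall_update_page_table`, `ParavirtualizedGuestKernel`) and the page numbers are hypothetical and are not Xen's actual hypercall interface.

```python
class Hypervisor:
    """Toy hypervisor exposing a small hypercall API to modified guests."""

    def __init__(self):
        self.page_tables = {}      # guest name -> {virtual page: physical frame}
        self.hypercalls = 0

    def hypercall_update_page_table(self, guest, mappings):
        # One explicit call applies a whole batch of mappings.
        self.hypercalls += 1
        self.page_tables.setdefault(guest, {}).update(mappings)


class ParavirtualizedGuestKernel:
    """Guest kernel whose source was changed to call the hypervisor directly."""

    def __init__(self, name, hypervisor):
        self.name = name
        self.hypervisor = hypervisor

    def map_pages(self, mappings):
        # An unmodified kernel would perform one privileged page-table write
        # per mapping; the modified kernel issues a single batched hypercall.
        self.hypervisor.hypercall_update_page_table(self.name, mappings)


hv = Hypervisor()
guest = ParavirtualizedGuestKernel("linux-guest", hv)
guest.map_pages({0: 17, 1: 42, 2: 8})
print(hv.hypercalls, hv.page_tables["linux-guest"])
```

Three mappings cost one crossing into the hypervisor here, which illustrates how a higher-level API can reduce the number of transitions. The price is the `map_pages` change itself: it only exists because the guest kernel source was edited.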
As of version 3.0, Xen also supports Hardware-Assisted Virtualization (HVM) - Virtualization support was added to the x86 ISA in newer processors – virtualization support is part of the machine code, i.e. ISA. - This enables full virtualization without the need for modifying the guest OS. => Xen version 3.0 has the capability to run Microsoft Windows as a guest operating system unmodified if the host machine's processor supports hardware virtualization extensions.
More jargon … Hosted Virtual Machines - The virtualizing software is installed on top of a traditional operating system. - Dual-Mode Virtual Machines >> Portion of the virtualization software runs at a privilege level within the Host operating system Examples: - VMWare GSX, Player, Workstation
System Virtual Machines In a system virtual machine, the virtualizing software (hypervisor) is installed in the place of a traditional operating system. - Also known as a “bare-metal” hypervisor - Guest OSs are installed on top of the hypervisor E.g. Xen, VMWare ESX