

Distributed Systems CS 15-440

Virtualization – Part II

Lecture 24, Dec 7, 2011

Majd F. Sakr, Mohammad Hammoud and Vinay Kolar



Today…

  • Last session

    • Virtualization – Part I

  • Today’s session

    • Virtualization – Part II

  • Announcements:

    • PS4 is due today by 11:59 PM

    • Project 4 is due on Dec 13 by 11:59 PM (no deadline extension)

    • On Monday, Dec 12, each team will present its work on Project 4


Objectives

Discussion on Virtualization

Resource virtualization

Partitioning and Multiprocessor virtualization

Virtual machine types

Virtualization, para-virtualization, virtual machines and hypervisors

Why virtualization, and virtualization properties


Computer System Hardware

Figure: typical computer system hardware – a CPU (with MMU) and memory (with its controller) on a local bus, an interface to a high-speed I/O bus hosting the frame buffer and a NIC connected to the LAN, and a bridge to a low-speed I/O bus hosting devices such as CD-ROM and USB.

Resource Virtualization

CPU Virtualization

Memory Virtualization

I/O Virtualization



CPU Virtualization

  • Interpretation and Binary Translation

  • Virtualizable ISAs




Instruction Set Architecture

  • Typically, the architecture of a processor defines:

    • A set of storage resources (e.g., registers and memory)

    • A set of instructions that manipulate data held in storage resources

  • The definition of the storage resources and the instructions that manipulate data is documented in what is referred to as the Instruction Set Architecture (ISA)

  • Two parts in the ISA are important in the definition of VMs:

    • User ISA: visible to user programs

    • System ISA: visible to supervisor software (e.g., OS)



Ways to Virtualize CPUs

  • The key to virtualizing a CPU lies in the execution of guest instructions, both system-level and user-level

  • Virtualizing a CPU can be achieved in one of two ways:

    • Emulation: the only processor virtualization mechanism available when the ISA of the guest is different from the ISA of the host

    • Direct native execution: possible only if the ISA of the host is identical to the ISA of the guest



Emulation

  • Emulation is the process of implementing the interface and functionality of one system (or subsystem) on a system (or subsystem) that has a different interface and functionality

  • In other words, emulation allows a machine implementing one ISA (the target) to reproduce the behavior of software compiled for another ISA (the source)

  • Emulation can be carried out using:

    • Interpretation

    • Binary translation

Figure: the guest (source ISA) is emulated by the host (target ISA).



Basic Interpretation


  • Interpretation involves a 4-step cycle (all in software):

  • Fetching a source instruction

  • Analyzing it

  • Performing the required operation

  • Then fetching the next source instruction

Figure: interpreter memory layout – the source context block (program counter, condition codes, registers 0 through n-1) and the source memory state (code, data, stack), managed by the interpreter code.



Decode-And-Dispatch

  • A simple interpreter, referred to as decode-and-dispatch, operates by stepping through the source program (instruction by instruction), reading and modifying the source state

  • Decode-and-dispatch is structured around a central loop that decodes an instruction and then dispatches it to an interpretation routine

  • It uses a switch statement to call a number of routines that emulate individual instructions

Figure: decode-and-dispatch interpretation versus native execution – a central dispatch loop steps through the source code and calls an interpreter routine for each instruction.
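To make the dispatch structure concrete, below is a minimal decode-and-dispatch sketch in C. It is an illustrative assumption, not the lecture's implementation: the toy opcode values, field layout, and routine names are made up, and the emulation routines are left as stubs.

#include <stdint.h>

/* Emulated source state: program counter, register file, and memory image. */
typedef struct {
    uint32_t pc;
    uint32_t regs[32];
    uint32_t mem[1024];
} SourceContext;

/* One interpreter routine per instruction type (bodies elided in this sketch). */
static void emulate_load (SourceContext *s, uint32_t inst) { (void)s; (void)inst; }
static void emulate_add  (SourceContext *s, uint32_t inst) { (void)s; (void)inst; }
static void emulate_store(SourceContext *s, uint32_t inst) { (void)s; (void)inst; }

void interpret(SourceContext *s)
{
    for (;;) {                                 /* central dispatch loop           */
        uint32_t inst   = s->mem[s->pc++];     /* 1. fetch a source instruction   */
        uint32_t opcode = inst >> 26;          /* 2. analyze (decode) it          */
        switch (opcode) {                      /* 3. dispatch and perform it      */
        case 1:  emulate_load (s, inst); break;
        case 2:  emulate_add  (s, inst); break;
        case 3:  emulate_store(s, inst); break;
        default: return;                       /* unknown opcode: stop            */
        }                                      /* 4. loop back to fetch the next  */
    }
}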


Decode-And-Dispatch – Drawbacks


  • The central dispatch loop of a decode-and-dispatch interpreter contains a number of branch instructions

    • Indirect branch for the switch statement

    • A branch to the interpreter routine

    • A second register indirect branch to return from the interpreter routine

    • And a branch that terminates the loop

  • These branches tend to degrade performance




Indirect Threaded Interpretation

  • To avoid some of the branches, a portion of the dispatch code can be appended (threaded) to the end of each of the interpreter routines

  • To locate interpreter routines, a dispatch table and a jump instruction can be used when stepping through the source program

  • This scheme is referred to as indirect threaded interpretation

Figure: decode-and-dispatch interpretation versus indirect threaded interpretation – the dispatch code is replicated at the end of each interpreter routine, so control passes from routine to routine through the dispatch table instead of returning to a central loop.
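A hedged sketch of the same toy interpreter restructured as indirect threaded code: the dispatch step is appended to every routine via the DISPATCH_NEXT macro, which indexes a dispatch table by the next opcode. Names and encodings are assumptions; a production interpreter would rely on tail calls or computed gotos so that routine-to-routine chaining does not grow the call stack.

#include <stdint.h>

typedef struct {
    uint32_t pc;
    uint32_t regs[32];
    uint32_t mem[1024];
} SourceContext;

typedef void (*Routine)(SourceContext *);

static Routine dispatch_table[64];        /* one entry per opcode */

/* The dispatch code, threaded onto the end of every interpreter routine:
   look up the next instruction's routine in the table and call it. */
#define DISPATCH_NEXT(s) dispatch_table[(s)->mem[(s)->pc] >> 26]((s))

static void op_add(SourceContext *s)
{
    uint32_t inst = s->mem[s->pc++];
    uint32_t rd = (inst >> 21) & 31, ra = (inst >> 16) & 31, rb = (inst >> 11) & 31;
    s->regs[rd] = s->regs[ra] + s->regs[rb];
    DISPATCH_NEXT(s);                     /* no return to a central loop */
}

static void op_halt(SourceContext *s) { (void)s; /* fall out of the chain */ }

static void setup(void)
{
    dispatch_table[2] = op_add;           /* toy opcode assignments */
    dispatch_table[0] = op_halt;
}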


Indirect Threaded Interpretation – Drawbacks


  • Each lookup of the dispatch table introduces overhead:

    • It requires a memory access and a register indirect branch

  • An interpreter routine is invoked every time the same instruction is encountered

    • Thus, the process of examining the instruction and extracting its various fields is always repeated




Predecoding (1)

  • It would be more efficient to perform a repeated operation only once

  • We can save away the extracted information of an instruction in an intermediate form

  • The intermediate form can then be simply reused whenever an instruction is re-encountered for emulation

  • However, a Target Program Counter (TPC) will be needed to step through the intermediate code

PowerPC source code:

  lwz r1, 8(r2)    // load word and zero
  add r3, r3, r1   // r3 = r3 + r1
  stw r3, 0(r4)    // store word

PowerPC program in predecoded intermediate form (operation code followed by the extracted operand fields):

  07  1208  (load word and zero)
  08  3103  (add)
  37  3400  (store word)



Predecoding (2)

  • To avoid a memory lookup whenever the dispatch table is accessed, the opcode in the intermediate form can be replaced with the address of the interpreter routine

  • This leads to a scheme referred to as direct threaded interpretation

Predecoded intermediate form (operation code first):

  07        1208  (load word and zero)
  08        3103  (add)
  37        3400  (store word)

Direct threaded form (the interpreter routine's address replaces the operation code):

  001048d0  1208  (load word and zero)
  00104800  3103  (add)
  00104910  3400  (store word)
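As a sketch (reusing the toy SourceContext from the earlier interpreter examples, with an assumed record layout), the predecoded program can be held as an array of fixed-size records; in the direct threaded variant the opcode field is replaced by the address of the interpreter routine, so neither decoding nor a dispatch-table lookup is repeated. For brevity the loop below dispatches centrally; a fully threaded version would append the dispatch to each routine, as in the indirect threaded sketch, but read the routine address straight out of the intermediate code.

#include <stdint.h>

typedef struct SourceContext SourceContext;    /* defined as in the earlier sketches */
typedef void (*Routine)(SourceContext *, uint32_t operands);

/* One predecoded record per source instruction (direct threaded form). */
typedef struct {
    Routine  routine;     /* e.g., the address of the "load word and zero" routine */
    uint32_t operands;    /* operand fields extracted once at predecode time       */
} Predecoded;

/* Step through the intermediate code with a Target Program Counter (TPC). */
void run(const Predecoded *code, SourceContext *s)
{
    for (const Predecoded *tpc = code; tpc->routine != 0; tpc++)
        tpc->routine(s, tpc->operands);
}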



Direct Threaded Interpretation

Figure: indirect threaded interpretation versus direct threaded interpretation – a predecoder converts the source code into intermediate code whose entries refer directly to the interpreter routines, so the dispatch table is no longer consulted.


Direct Threaded Interpretation – Drawbacks

  • Direct threaded interpretation still suffers from major drawbacks:

  • It limits portability because the intermediate form is dependent on the exact locations of the interpreter routines

  • The size of the predecoded memory image is proportional to that of the original source memory image

  • All source instructions of the same type are emulated with the same interpretation routine




Binary Translation

  • Performance can be significantly enhanced by mapping each individual source binary instruction to its own customized target code

  • This process of converting the source binary program into a target binary program is referred to as binary translation

  • Binary translation attempts to amortize the fetch and analysis costs by:

    • Translating a block of source instructions to a block of target instructions

    • Caching the translated code for repeated use



Binary Translation

Figure: direct threaded interpretation versus binary translation – instead of a predecoder producing intermediate code that drives interpreter routines, a binary translator converts the source code directly into binary translated target code.



Static Binary Translation

  • It is possible to binary translate a program in its entirety before executing the program

  • This approach is referred to as static binary translation

  • However, in real code using conventional ISAs, especially CISC ISAs, such a static approach can cause problems due to:

    • Variable-length instructions

    • Register indirect jumps

    • Data interspersed with instructions

    • Pads to align instructions

Figure: a code fragment that is hard to translate statically – data interspersed in the instruction stream, a register indirect jump whose target is unknown ("jump indirect to ???"), an unconditional branch, and pad bytes inserted for instruction alignment.



Dynamic Binary Translation

  • A general solution is to translate the binary while the program is operating on actual input data (i.e., dynamically) and interpret new sections of code incrementally as the program reaches them

  • This scheme is referred to as dynamic binary translation

Figure: dynamic binary translation – the emulation manager looks up the Source Program Counter (SPC) in a map table of SPC-to-TPC (Target Program Counter) entries; on a hit it jumps into the code cache, and on a miss the interpreter/translator produces a new translation.



Dynamic Binary Translation

Figure: the emulation manager's control flow –
Start with an SPC.
Look up the SPC-to-TPC mapping in the map table.
On a miss, use the SPC to read instructions from the source memory image; interpret, translate, and place the result into the code cache; then write the new SPC-to-TPC mapping into the map table.
Branch to the TPC and execute the translated block.
Get the SPC for the next block and repeat.
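Read as code, the flow above is a loop in the emulation manager; the function names (lookup_tpc, translate_block, and so on), the SPC/TPC types, and the code-cache details are assumptions made for this sketch.

#include <stdint.h>

typedef uint32_t SPC;                        /* source program counter            */
typedef void    *TPC;                        /* entry point inside the code cache */

TPC  lookup_tpc(SPC spc);                    /* map table: SPC -> TPC, NULL on miss               */
TPC  translate_block(SPC spc);               /* read source, interpret/translate, fill code cache */
void insert_mapping(SPC spc, TPC tpc);       /* write the new SPC -> TPC pair                     */
SPC  execute_translated_block(TPC tpc);      /* branch to TPC, run, return the next block's SPC   */

void emulation_manager(SPC spc)
{
    for (;;) {
        TPC tpc = lookup_tpc(spc);           /* look up SPC -> TPC in the map table      */
        if (tpc == 0) {                      /* miss: this block is not yet translated   */
            tpc = translate_block(spc);
            insert_mapping(spc, tpc);
        }
        spc = execute_translated_block(tpc); /* hit (or fresh translation): execute it   */
    }
}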



CPU Virtualization

  • Interpretation and Binary Translation

  • Virtualizable ISAs



Privilege Rings in a System

  • In an ISA, special privileges to system resources are granted by defining modes of operation

  • Usually an ISA specifies at least two modes of operation:

    • System (also called supervisor, kernel, or privileged) mode: all resources are accessible to software

    • User mode: only certain resources are accessible to software

Figure: privilege rings – applications (user level) run in user mode while the kernel runs in system mode; simple systems have 2 rings, whereas Intel’s IA-32 allows 4 rings (levels 0 through 3).



Conditions for ISA Virtualizability

  • In a native system VM, the VMM runs in system mode, and all other software runs in user mode

  • A privileged instruction is defined as one that traps if the machine is in user mode and does not trap if the machine is in system mode

  • Examples of Privileged Instructions are:

    • Load PSW: If it can be accessed in user mode, a malicious user program can put itself in system mode and get control of the system

    • Set CPU Timer: If it can be accessed in user mode, a malicious user program can change the amount of time allocated to it before getting context switched



Types of Instructions

  • Instructions that interact with hardware can be classified into three categories:

  • Control-sensitive: Instructions that attempt to change the configuration of resources in the system (e.g., memory assigned to a program)

  • Behavior-sensitive: Instructions whose behaviors or results depend on the configuration of resources

  • Innocuous: Instructions that are neither control-sensitive nor behavior-sensitive


Virtualization Theorem

  • Virtualization Theorem: For any conventional third-generation computer, a VMM may be constructed if the set of sensitive instructions for that computer is a subset of the set of privileged instructions

Figure: two possible instruction-set layouts – if every sensitive instruction is also privileged, the ISA satisfies the theorem; if some sensitive instructions are nonprivileged (critical instructions), it does not.
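Operationally, checking the theorem's condition for a given ISA is a subset test: every sensitive instruction must also be privileged. The instruction record and flags below are purely illustrative.

#include <stdbool.h>
#include <stddef.h>

typedef struct {
    const char *name;
    bool sensitive;      /* control-sensitive or behavior-sensitive */
    bool privileged;     /* traps when executed in user mode        */
} Instruction;

/* The theorem's condition holds iff no instruction is sensitive yet
   nonprivileged, i.e., the ISA has no critical instructions. */
bool satisfies_theorem(const Instruction *isa, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (isa[i].sensitive && !isa[i].privileged)
            return false;    /* a critical instruction (e.g., IA-32 POPF) */
    return true;
}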



Efficient VM Implementation

  • An OS running on a guest VM should not be allowed to change hardware resources (e.g., by loading the PSW or setting the CPU timer)

  • Therefore, guest OSs are all forced to run in user mode

An efficient VM implementation can be constructed if instructions that could interfere with the correct or efficient functioning of the VMM always trap in the user mode



Trapping To VMM

Figure: trapping to the VMM – when a privileged instruction traps, the VMM dispatcher routes it either to the allocator, for instructions that want to change machine resources (e.g., load the relocation-bounds register), or to one of the interpreter routines 1 through n, for instructions that do not change machine resources but access privileged ones (e.g., IN, OUT, write TLB).



Handling Privileged Instructions

Figure: handling a privileged instruction – guest OS code in the VM (user mode) executes a privileged instruction (LPSW), which traps into VMM code (privileged mode); the dispatcher invokes the LPSW routine, which changes the mode to privileged, checks the privilege level inside the VM, emulates the instruction, computes the target, restores the mode to user, and jumps to the target (the next instruction, i.e., the target of LPSW).
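A hedged sketch of that LPSW path in C: the structure fields, the PSW layout, and the helper functions are assumptions for illustration, not the interface of any real VMM.

#include <stdint.h>
#include <stdbool.h>

typedef struct {
    uint64_t psw;          /* the guest's virtual program status word              */
    bool     guest_priv;   /* privilege level the guest believes it is running at  */
    uint64_t next_pc;      /* where guest execution should resume                  */
} Vcpu;

uint64_t read_lpsw_operand(Vcpu *v);     /* fetch the PSW image named by LPSW (assumed helper)  */
void     inject_guest_fault(Vcpu *v);    /* reflect a privilege fault into the guest (assumed)  */

/* Invoked by the VMM dispatcher after a guest, running in user mode,
   executes the privileged LPSW instruction and traps; the VMM itself
   is already in privileged (system) mode at this point. */
void lpsw_routine(Vcpu *v)
{
    if (!v->guest_priv) {                /* check the privilege level inside the VM */
        inject_guest_fault(v);           /* guest user code may not issue LPSW      */
        return;
    }
    v->psw = read_lpsw_operand(v);       /* emulate the instruction                 */
    v->next_pc = v->psw & 0x00FFFFFFu;   /* compute the target (assumed PSW layout) */
    /* The dispatcher then restores user mode and jumps to v->next_pc. */
}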



Critical Instructions

  • Critical instructions are sensitive but not privileged; they do not generate traps in user mode

  • Intel IA-32 has several critical instructions

  • An example is POPF in IA-32 (Pop Stack into Flags Register) which pops the flag registers from a stack held in memory

  • One of the flags is the interrupt-enable flag, which can be modified only in the privileged mode

  • In user mode, POPF overwrites all flags except the interrupt-enable flag (for that flag it acts as a no-op)

Can an efficient VMM be constructed with the presence of critical instructions?



Handling Critical Instructions

  • Critical instructions are problematic: they inhibit the creation of an efficient VMM

  • However, if an ISA is not efficiently virtualizable, this does not mean we cannot create a VMM

  • The VMM can scan the guest code before execution, discover all critical instructions, and replace them with traps (system calls) to the VMM, as sketched in the code after this list

  • This replacement process is known as patching

  • Even if an ISA contains only ONE critical instruction, patching will be required
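A minimal sketch of that scanning-and-patching pass, assuming fixed-length 32-bit instructions, an is_critical() classifier, and an arbitrary trap encoding; a real patcher must also handle variable-length instructions and remember the original instruction so the VMM can emulate it when the trap fires.

#include <stdint.h>
#include <stddef.h>
#include <stdbool.h>

#define TRAP_TO_VMM 0x000000CCu    /* assumed encoding of an instruction that traps */

bool is_critical(uint32_t inst);                        /* sensitive but not privileged? */
void remember_original(uint32_t *where, uint32_t inst); /* saved for later emulation     */

/* Scan a guest code block before it runs and replace every critical
   instruction with a trap (system call) into the VMM. */
void patch_block(uint32_t *code, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (is_critical(code[i])) {
            remember_original(&code[i], code[i]);
            code[i] = TRAP_TO_VMM;     /* patched: control now enters the VMM */
        }
    }
}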



Patching of Critical Instructions

Figure: patching of critical instructions – a scanner and patcher transforms the original code into patched code in which each discovered critical instruction is replaced by a code patch that traps to the VMM.



Code Caching

  • Some of the critical instructions that trap to the VMM might require interpretation

  • Interpretation overhead might slow down the VMM, especially if the frequency of critical instructions requiring interpretation increases

  • To reduce overhead, interpreted instructions can be cached, using a strategy known as code caching

  • Code caching is done on a block of instructions surrounding the critical instruction (larger blocks lend themselves better to optimization)



Caching Interpreted Code

Figure: caching interpreted code – code sections of the patched program that surround critical instructions (blocks 1 through 3) are emulated in a code cache of specialized emulation routines; a control transfer (e.g., a trap) goes through a translation table maintained by the VMM, and two nearby critical instructions can be combined into a single block.


Resource Virtualization

CPU Virtualization

Memory Virtualization

I/O Virtualization



Memory Virtualization


  • Virtual memory makes a distinction between the logical view of memory as seen by a program and the actual hardware memory as managed by the OS

  • The virtual memory support in traditional OSs is sufficient for providing guest OSs with the view of having (and managing) their own real memories

    • Such an illusion is created by the underlying VMM

Figure: address mapping – in a real machine, a virtual memory address (seen by a program running on the OS) maps to a physical memory address; in a virtual machine, a virtual memory address (seen by a program running on the guest OS) maps to a real memory address, which the VMM in turn maps to a physical memory address.



An Example

Figure: an example of the two mapping levels – the virtual memories of Program 1 and Program 2 map, through their page tables, into the real memory of VM1, and the virtual memory of Program 3 maps into the real memory of VM2; the real memories of VM1 and VM2 are then mapped into the physical memory of the system through the VMM's real map tables, with some real pages not mapped.
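In code, the VMM composes the two mappings shown above: the guest page table (virtual page to real page, maintained by the guest OS) and the real map table (real page to physical page, maintained by the VMM). The page size, table lookups, and function names are simplifying assumptions; production VMMs usually fold this composition into shadow or hardware-assisted nested page tables.

#include <stdint.h>
#include <stdbool.h>

#define PAGE_SIZE 4096u

/* Guest OS mapping: virtual page -> real page (what the guest believes is RAM). */
bool guest_page_table_lookup(uint64_t vpage, uint64_t *rpage);
/* VMM mapping: real page -> physical page of the host machine. */
bool real_map_table_lookup(uint64_t rpage, uint64_t *ppage);

/* Translate a guest virtual address all the way to a host physical address. */
bool translate(uint64_t vaddr, uint64_t *paddr)
{
    uint64_t rpage, ppage;
    if (!guest_page_table_lookup(vaddr / PAGE_SIZE, &rpage))
        return false;                    /* page fault inside the guest       */
    if (!real_map_table_lookup(rpage, &ppage))
        return false;                    /* real page "not mapped" by the VMM */
    *paddr = ppage * PAGE_SIZE + vaddr % PAGE_SIZE;
    return true;
}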


Resource Virtualization

CPU Virtualization

Memory Virtualization

I/O Virtualization



I/O Virtualization

  • The virtualization strategy for a given I/O device type consists of:

    • Constructing a virtual version of the device

    • Virtualizing the I/O activities directed to the device

  • A virtual device given to a guest VM is typically (but not necessarily) supported by a similar, underlying physical device

  • When a guest VM makes a request to use the virtual device, the request is intercepted by the VMM

  • The VMM converts the request to the equivalent request understood by the underlying physical device and sends it out



Virtualizing Devices

  • The technique that is used to virtualize an I/O device depends on whether the device is shared and, if so, the ways in which it can be shared

  • The common categories of devices are:

    • Dedicated devices

    • Partitioned devices

    • Shared devices

    • Spooled devices



Dedicated Devices

  • Some I/O devices must be dedicated to a particular guest VM or at least switched from one guest to another on a very long time scale

  • Examples of dedicated devices are: the display, mouse, and speakers of a VM user

  • A dedicated device does not necessarily have to be virtualized

  • Requests to and from a dedicated device in a VM can theoretically bypass the VMM

  • However, in practice these requests go through the VMM because the guest OS runs in a non-privileged user mode



Partitioned Devices

  • For some devices it is convenient to partition the available resources among VMs

  • For example, a disk can be partitioned into several smaller virtual disks that are then made available to VMs as dedicated devices

  • A location on a magnetic disk is defined in terms of cylinders, heads, and sectors (CHS)

  • The physical properties of the disk are virtualized by the disk firmware

  • The disk firmware transforms the CHS addresses into consecutively numbered logical blocks for use by host and guest OSs



Disk Virtualization

  • To emulate an I/O request for a virtual disk:

    • The VMM uses a map to translate the virtual parameters into real parameters

    • The VMM then reissues the request to the disk controller

Figure: disk virtualization – the guest OSs of VM1 and VM2 address their virtual disks with their own block addresses; the VMM maps these to real block addresses, i.e., logical block addresses (LBAs) handled by the host OS, while the disk firmware maps between LBAs and the physical disk drive's CHS geometry.
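That mapping step can be sketched as a per-VM table from virtual block numbers to real logical block addresses; the table layout and the controller call are illustrative assumptions.

#include <stdint.h>
#include <stddef.h>

typedef struct {
    const uint64_t *virt_to_real;   /* indexed by the VM's virtual block number */
    size_t          nblocks;        /* size of this VM's virtual disk           */
} VirtualDisk;

int issue_to_disk_controller(uint64_t real_lba, void *buf, int is_write);   /* assumed */

/* VMM handler for an intercepted request to a virtual disk. */
int vmm_disk_request(const VirtualDisk *vd, uint64_t virt_block, void *buf, int is_write)
{
    if (virt_block >= vd->nblocks)
        return -1;                                        /* outside this virtual disk */
    uint64_t real_lba = vd->virt_to_real[virt_block];     /* translate virtual -> real */
    return issue_to_disk_controller(real_lba, buf, is_write);   /* reissue the request */
}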



Shared Devices

  • Some devices, such as a network adapter, can be shared among a number of guest VMs at a fine time granularity

  • For example, every VM can have its own virtual network address maintained by the VMM

  • A request by a VM to use the network is translated by the VMM to a request on a physical network port

    • To make this happen, the VMM uses its own physical network address and a virtual device driver

  • Similarly, incoming requests through various ports are translated into requests for virtual network addresses associated with different VMs


Network Virtualization – Scenario I

  • In this example, we assume that the virtual network interface card (NIC) is of the same type as the physical NIC in the host system

Figure: sending a packet to an external machine –
The user on VM1 sends a message to an external machine (e.g., using send()).
The OS on VM1 converts it into I/O instructions for the virtual NIC (e.g., OUTS 0xf0, ...).
The VMM sends the packet on a virtual bridge to the device driver of the physical NIC (e.g., OUTS 0x280, ...).
The NIC device driver launches the packet on the network using wire signals.


Network Virtualization – Scenario II

  • In this scenario, we assume that the desired communication is between two virtual machines on the same platform

Figure: sending a packet between two VMs on the same platform –
The user on VM1 sends a message to the local virtual machine (e.g., using send()).
The OS on VM1 converts it into I/O instructions (e.g., OUTS 0xf0, ...).
The VMM sends the packet on a virtual bridge to the device driver of the physical NIC (e.g., OUTS 0x280, ...).
The NIC device driver converts the send message into a receive message for the receiving VM.
The VMM raises an interrupt in the receiver’s OS.
The interrupt handler in the OS on VM2 generates I/O instructions to receive the packet, and the user on VM2 gets the packet.
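Both scenarios reduce to one routing decision inside the VMM's virtual bridge: deliver the frame locally if its destination address belongs to a co-resident VM, otherwise forward it through the physical NIC's device driver. The lookup and driver entry points below are assumptions of this sketch.

#include <stdint.h>
#include <stddef.h>
#include <string.h>

typedef struct VM VM;

VM  *find_vm_by_mac(const uint8_t mac[6]);               /* the VMM's table of virtual NIC addresses */
void raise_receive_interrupt(VM *vm, const void *frame, size_t len);
void physical_nic_send(const void *frame, size_t len);   /* device driver of the physical NIC */

/* Called when the VMM intercepts a guest's I/O to its virtual NIC. */
void virtual_bridge_send(const void *frame, size_t len)
{
    uint8_t dst[6];
    memcpy(dst, frame, sizeof dst);       /* destination MAC comes first in an Ethernet frame */

    VM *receiver = find_vm_by_mac(dst);
    if (receiver != NULL)
        raise_receive_interrupt(receiver, frame, len);   /* Scenario II: deliver to the local VM */
    else
        physical_nic_send(frame, len);                   /* Scenario I: send out on the network  */
}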



Spooled Devices

  • A spooled device, such as a printer, is shared, but at a much coarser granularity (whole jobs) than a device such as a network adapter

  • Virtualization of spooled devices can be performed by using a two-level spool table approach:

    • Level 1 is within the guest OS, with one table for each active process

    • Level 2 is within the VMM, with one table for each guest OS

  • A request from a guest OS to print a spool buffer is intercepted by the VMM, which copies the buffer into one of its own spool buffers

  • This allows the VMM to schedule requests from different guest OSs on the same printer



Thank You!

