A Comparison of Software and Hardware Techniques for x86 Virtualization

Presentation Transcript

    1. A Comparison of Software and Hardware Techniques for x86 Virtualization
       - Paper by Keith Adams & Ole Agesen (VMware)
       - Presentation by Jason Agron

    2. Presentation Overview
       - What is virtualization?
       - Traditional virtualization techniques
       - Overview of the software VMM
       - Overview of the hardware VMM
       - Evaluation of VMMs
       - Conclusions
       - Questions

    3. Virtualization
       - Defined by Popek & Goldberg in 1974, who established three essential characteristics of a VMM:
         - Fidelity: software running on the VMM behaves the same as software running directly on hardware.
         - Performance: performance on the VMM is close to performance on hardware.
         - Safety: the VMM manages all hardware resources (correctly?).

    4. Is This Definition Correct?
       - Yes, but its scope should be taken into account.
       - It assumes the traditional trap-and-emulate style of full virtualization, which was dominant circa 1974 and is completely transparent.
       - It does not account for paravirtualization, which is not transparent: guest software is modified.

    5. Full Virtualization
       - Full == transparent: the VMM must be able to detect when it needs to intervene.
       - Definitions:
         - Sensitive instruction: accesses and/or modifies privileged state.
         - Privileged instruction: traps when run in an unprivileged mode.

    6. Traditional Techniques: De-privileging
       - Run guest code at a reduced privilege level so that privileged instructions trap.
       - The VMM intercepts the trap and emulates the instruction's effect.
       - Very similar to the way programs transfer control to the OS kernel during a system call.
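
The trap-and-emulate flow described in this slide can be illustrated with a small Python simulation. The instruction names and state fields below are invented for illustration, not real x86 semantics:

```python
# Toy trap-and-emulate simulation; instruction names and state fields are
# invented for illustration and are not real x86 semantics.
class Trap(Exception):
    def __init__(self, insn):
        self.insn = insn

PRIVILEGED = {"cli", "sti", "mov_cr3"}

def deprivileged_execute(insn, state):
    """Guest runs at reduced privilege: privileged instructions trap."""
    if insn[0] in PRIVILEGED:
        raise Trap(insn)
    state["acc"] = insn[1]          # an ordinary instruction runs directly

def vmm_emulate(insn, vstate):
    """The VMM intercepts the trap and emulates against virtual state."""
    if insn[0] == "cli":
        vstate["if"] = 0            # clear the *virtual* interrupt flag
    elif insn[0] == "sti":
        vstate["if"] = 1
    elif insn[0] == "mov_cr3":
        vstate["cr3"] = insn[1]     # repoint the *virtual* page-table base

def run_guest(program):
    state, vstate = {"acc": 0}, {"if": 1, "cr3": 0}
    for insn in program:
        try:
            deprivileged_execute(insn, state)
        except Trap as t:
            vmm_emulate(t.insn, vstate)
    return state, vstate
```

Note that ordinary instructions never enter the VMM at all; only the privileged ones pay the trap cost.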

    7. Traditional Techniques: Primary & Shadow Structures
       - Each virtual system's privileged state differs from that of the underlying hardware.
       - The VMM must therefore provide the environment the guest expects.
       - Guest-level primary structures reflect the state the guest sees.
       - VMM-level shadow structures are copies of the primary structures, kept coherent via memory traces.
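
A minimal sketch of keeping a shadow structure coherent through a write trace, assuming a toy page table keyed by virtual page number; all names are hypothetical, and a real VMM would use hardware write protection rather than an explicit method call:

```python
# Toy primary/shadow page table kept coherent by a write trace; names are
# hypothetical, and real VMMs rely on hardware write protection.
class TracedPageTable:
    def __init__(self):
        self.primary = {}       # guest's view: virtual page -> frame
        self.shadow = {}        # VMM's copy, the one hardware actually walks
        self.trace_hits = 0     # how many guest writes the trace intercepted

    def guest_write(self, vpn, pfn):
        # The primary is write-protected, so this write faults into the VMM,
        # which completes it and refreshes the shadow in the same handler.
        self.trace_hits += 1
        self.primary[vpn] = pfn
        self.shadow[vpn] = pfn
```

The guest only ever sees the primary; the shadow always mirrors it because every write is forced through the trace handler.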

    8. Traditional Techniques: Memory Traces
       - Traps occur when on-chip privileged state is accessed or modified.
       - What about off-chip privileged state, such as page tables?
         - It can be accessed by ordinary LOADs/STOREs, from the CPU or from DMA-capable devices.
       - Hardware page-protection schemes are employed to detect these accesses.

    9. Refinements to Classical Virtualization
       - Traps are expensive!
       - Improve the guest/VMM interface (a.k.a. paravirtualization):
         - Allows higher-level information to be passed to the VMM.
         - Can provide features beyond the baseline of classical virtualization.
       - Improve the VMM/hardware interface:
         - IBM's System/370 interpretive execution mode: guests are allowed safe, direct access to certain pieces of privileged state without trapping.

    10. Software VMM
        - x86 is not classically virtualizable:
          - Privileged state is visible: a guest can observe its privilege level via the unprotected %cs register.
          - Not all sensitive instructions trap: e.g. privileged execution of popf (pop flags) modifies on-chip privileged state, so its unprivileged execution should trap for the VMM to emulate it. Unfortunately, no trap occurs; the privileged update is silently dropped (a no-op).

    11. Software VMM
        - How can x86's shortcomings be overcome? What if guests execute on an interpreter?
        - The interpreter can:
          - Prevent leakage of privileged state.
          - Ensure that all sensitive instructions are correctly detected.
        - It can therefore provide fidelity and safety. Performance??
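
A toy fetch-decode-execute interpreter makes the performance concern concrete: every guest instruction costs at least one host-side dispatch, on top of the work of the instruction itself. The ISA here is invented for illustration:

```python
# Toy fetch-decode-execute interpreter over an invented two-field ISA.
# `steps` counts host-side dispatches: at least one per guest instruction.
def interpret(program):
    regs, pc, steps = {"a": 0}, 0, 0
    while pc < len(program):
        op, arg = program[pc]       # fetch
        steps += 1                  # every guest instruction pays dispatch cost
        if op == "set":             # decode + execute
            regs["a"] = arg
        elif op == "add":
            regs["a"] += arg
        elif op == "jmp":
            pc = arg                # control flow handled by the interpreter
            continue
        pc += 1
    return regs, steps
```

Even this best case pays one dispatch per instruction; a realistic interpreter decoding real x86 encodings pays many host instructions per guest instruction, which motivates binary translation in the following slides.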

    12. Interpreter-Based Software VMM
        - Authors' statement: an interpreter-based VMM will not provide adequate performance, since a single native x86 instruction takes N instructions to interpret.
        - Question: is this necessarily true?
        - Authors' solution: binary translation (BT).

    13. Properties of This BT
        - Dynamic and on-demand: run-time translation is interleaved with code execution, and code is translated only when it is about to execute. This laziness avoids the problem of distinguishing code from data.
        - System-level: all translation rules are set by the x86 ISA.
        - Subsetting: input is the full x86 ISA; output is a safe subset of it, mostly user-mode instructions.
        - Adaptive: the generated code can be optimized over time.

    14. BT Process
        - Input a TU (translation unit), stopping at either:
          - 12 instructions, or
          - a terminating instruction (usually control flow).
        - Translate the TU into a CCF (compiled code fragment).
        - Place the generated CCF into the TC (translation cache).
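
The TU-carving rule above (stop at 12 instructions or just after a terminating instruction) can be sketched as follows; the mnemonic set is a simplified stand-in:

```python
# Carve the next translation unit (TU) from a guest instruction stream:
# stop after 12 instructions or just after a terminating instruction.
CONTROL_FLOW = {"jmp", "call", "ret", "jcc"}   # simplified stand-ins

def next_tu(insns, start):
    """Return the slice of `insns` forming one TU beginning at `start`."""
    end = start
    while end < len(insns) and end - start < 12:
        end += 1
        if insns[end - 1] in CONTROL_FLOW:     # terminating instruction
            break
    return insns[start:end]
```

The terminating instruction is included in the TU, since its translation is what produces the continuation linking to the next fragment.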

    15. BT Process
        - CCFs must be chained together to form a complete program.
        - Each CCF ends in a continuation that acts as a link.
        - Continuations are evaluated at run time:
          - They can be translated into jumps.
          - They can be removed entirely (execution merely falls through to the next CCF).
        - If a continuation is never hit, it is never transformed; the BT thus acts like a just-in-time compiler.
        - The software VMM can switch between BT mode and direct execution as a performance optimization.
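
A rough simulation of on-demand translation with lazily patched continuations, assuming a toy program representation where each fragment carries a body label and the address of its successor (all names hypothetical):

```python
# Toy translation cache with lazy continuation patching. `program` maps a
# guest address to (body, successor address); both are hypothetical stand-ins.
class TranslationCache:
    def __init__(self, program):
        self.program = program
        self.cache = {}             # guest address -> compiled fragment
        self.translations = 0       # how many fragments we actually built

    def fragment(self, addr):
        if addr not in self.cache:  # translate only when about to execute
            self.translations += 1
            body, nxt = self.program[addr]
            self.cache[addr] = {"body": body, "next": nxt, "link": None}
        return self.cache[addr]

    def run(self, addr, trace):
        while addr is not None:
            frag = self.fragment(addr)
            trace.append(frag["body"])          # "execute" the fragment
            if frag["next"] is not None and frag["link"] is None:
                # First time through: patch the continuation into a direct
                # link so later executions skip the lookup.
                frag["link"] = self.fragment(frag["next"])
            addr = frag["next"]
```

Running the same code twice translates each fragment only once; untaken continuations are simply never patched, which is the just-in-time behavior the slide describes.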

    16. Adaptive BT
        - Traps are expensive; BT can avoid some of them.
          - e.g. for the rdtsc instruction: in-TC emulation << call-out-and-emulate << trap-and-emulate.
        - Sensitive non-privileged instructions are harder to avoid, e.g. LOADs/STOREs to privileged data.
        - Use adaptive BT to rework such code.

    17. Adaptive BT
        - Detect instructions that trap frequently.
        - Adapt the translation of those instructions:
          - Re-translate to avoid trapping.
          - Jump directly to the translation, or call out to the interpreter.
        - Adaptive BT tries to eliminate more and more traps over time.
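
The adaptation idea can be sketched as a per-site counter that switches an instruction from the expensive trap path to a direct call-out once it has trapped often enough; the threshold here is an arbitrary illustrative value:

```python
# Per-site adaptation: an instruction site starts on the expensive trap path
# and is "retranslated" to a direct call-out once it traps often enough.
THRESHOLD = 3   # arbitrary illustrative value

def make_site():
    return {"mode": "trap", "count": 0}

def execute(site, emulate):
    if site["mode"] == "trap":
        site["count"] += 1              # expensive trap-and-emulate path
        if site["count"] >= THRESHOLD:
            site["mode"] = "callout"    # adapt: avoid the trap next time
    # Either way the instruction's effect comes from the emulation routine;
    # only the dispatch cost differs between the two modes.
    return emulate()
```

The result is identical in both modes; what adaptation changes is how control reaches the emulation routine, which is where the trap cost lives.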

    18. Hardware VMM
        - An experimental VMM based on the new x86 virtualization extensions: AMD's SVM and Intel's VT.
        - New hardware features:
          - Virtual machine control blocks (VMCBs).
          - A guest-mode privilege level.
          - The ability to transfer control to and from guest mode:
            - vmrun: host to guest.
            - exit: guest to host.

    19. Hardware VMM
        - The VMM executes vmrun to start a guest:
          - Guest state is loaded into hardware from the in-memory VMCB.
          - Guest mode is entered and the guest continues execution.
        - A guest executes until it triggers an event that the VMCB's control bits designate as exiting, at which point an exit occurs:
          - Guest state is saved to the VMCB.
          - VMM state is loaded into hardware, switching to host mode.
          - The VMM begins executing.
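
The vmrun/exit cycle can be simulated with a dictionary standing in for the VMCB. Field and exit-reason names here are invented, not the real SVM/VT encodings, and `guest_step` plays the role of the guest by returning the reason for the next exit:

```python
# Toy vmrun/exit cycle. The dict stands in for a VMCB; field and exit-reason
# names are invented, not real SVM/VT encodings. `guest_step` plays the
# guest, returning the reason for the next exit.
def vmrun(vmcb, guest_step):
    exits = 0
    while not vmcb["halted"]:
        reason = guest_step(vmcb)   # guest runs until an exiting event
        exits += 1                  # exit: hardware saves guest state to VMCB
        if reason == "hlt":
            vmcb["halted"] = True   # VMM emulates the exiting operation...
        elif reason == "io":
            vmcb["io_count"] += 1
        vmcb["rip"] += 1            # ...advances past it, and re-enters guest
    return exits
```

Each iteration of the loop is one round trip: vmrun into the guest, an exit back to the VMM, emulation, and re-entry. The cost of that round trip is exactly what the later benchmark slides measure.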

    20. x86 Architecture Extensions

    21. Qualitative Comparison: Where Software Wins
        - Trap elimination via adaptive BT; the hardware VMM merely replaces traps with exits.
        - Emulation speed: translations and call-outs essentially jump to pre-decoded emulation routines, while the hardware VMM must fetch the VMCB and decode the trapping instruction before emulating.

    22. Qualitative Comparison: Where Hardware Wins
        - Code density: no translation means no replicated code segments.
        - Precise exceptions: the BT approach must perform extra work to recover guest state for faults and interrupts; the hardware approach can simply examine the VMCB.
        - System calls: they [can] run without VMM intervention.

    23. Qualitative Comparison (Summary)
        - Hardware VMMs:
          - Native performance for anything that avoids exits.
          - However, exits are still costly (currently).
          - Strongly targeted toward the trap-and-emulate style.
        - Software VMMs:
          - Carefully engineered to be efficient.
          - Flexible (because they aren't hardware).

    24. Experiments
        - 3.8 GHz Intel Pentium 4, with hyper-threading disabled (because most virtualization products can't handle it).
        - The contenders:
          - A mature commercial software VMM.
          - A recently developed hardware VMM.
        - A fair battle?

    25. SPECint & SPECjbb
        - Primarily user-level computation, largely unaffected by VMMs, so performance should be near native.
        - Experimental results confirm this:
          - 4% average slowdown for the software VMM.
          - 5% average slowdown for the hardware VMM.
        - The cause is host background activity: the Windows jiffy rate is much lower than the Linux jiffy rate, so the Windows test runs closer to native than the Linux test.

    26. Apache ab Benchmark
        - Tests I/O efficiency: the software VMM (and the hardware VMM?) uses the host as its I/O controller, so roughly 2x the overhead of native I/O is expected.
        - Experimental results confirm this: roughly a 2x slowdown; both VMMs perform poorly here.
        - The Windows and Linux tests differ widely:
          - Windows: a single process (less paging); the hardware VMM is better.
          - Linux: multiple processes (more paging); the software VMM is better.
        - Why? (Hint: VMCB.)

    27. PassMark Benchmarks
        - A synthetic suite of microbenchmarks used to pinpoint various aspects of workstation performance.
        - Large RAM test (exhausts memory, intended to test paging): the software VMM wins.
        - 2D Graphics test (heavy on system calls): the hardware VMM wins.

    28. Compile Jobs Test
        - A less synthetic test: compilation time of the Linux kernel, Apache, etc.
        - The software VMM beats the hardware VMM again: a big compilation job with lots of files means lots of page faults, which the software VMM handles better.
        - Compared to native speed:
          - Software VMM: ~60% as fast.
          - Hardware VMM: ~55% as fast.

    29. ForkWait Test
        - Stresses process creation and destruction: system calls, context switching, page-table modifications, page faults, etc.
        - Native: 6.0 seconds.
        - Software VMM: 36.9 seconds.
        - Hardware VMM: 106.4 seconds.

    30. Nanobenchmarks
        - Tests that exercise single virtualization-sensitive operations.
        - All tests are conducted using a specially developed guest OS, FrobOS.

    31. Nanobenchmarks
        - syscall (native == HW << SW): the hardware VMM doesn't intervene; the software VMM traps.
        - in (SW << native << HW): native goes off-chip; the software VMM interacts with its virtual CPU model; the hardware VMM takes an exit.
        - ptemod (native << SW << HW): both take a hit (both use shadowing); the software VMM can adapt but is still far from ideal; the hardware VMM can't adapt, so it must always pay an exit/vmrun round trip.

    32. Analysis of Results
        - Software and hardware VMMs are roughly even, except where BT adaptation helps, e.g. page-table faults vs. exit/vmrun round trips.
        - The authors claim that "we have found few workloads that benefit from current hardware extensions."
        - But hardware extensions are getting faster all the time.
          - Even so, the stateless hardware-VMM approach still has a memory bottleneck in VMCB access.
        - The real trouble with the hardware VMM is MMU virtualization: a hardware-assisted MMU could relieve the VMM of a lot of work, and is being proposed by both AMD and Intel.

    33. Future/Related Work
        - CISC vs. RISC: should the hardware be more complex to support virtualization, or should a complex software VMM be used?
        - Open source: open-source OS code allows paravirtualization.
        - What should the OS/VMM interface be? It should be investigated, standardized, documented, and, most importantly, supported!
        - What should the OS/hardware interface be? This should be examined as well.

    34. Conclusions
        - Hardware extensions now allow x86 to execute guests directly (trap-and-emulate style).
        - Comparison of software and hardware VMMs:
          - Both execute computation-bound workloads at near-native speed.
          - When I/O and process management are involved, software prevails.
          - When there are many system calls, hardware prevails.

    35. Conclusions
        - Software VMM techniques are very mature, and also very flexible.
        - The new x86 extensions are relatively immature and present a fixed (inflexible) interface.
        - Future work on hardware extensions promises to improve performance.
        - Hybrid software/hardware VMMs promise the benefits of both worlds.
        - There is no clear winner at this time.

    36. Questions????
        - Reference: K. Adams and O. Agesen. A comparison of software and hardware techniques for x86 virtualization. In Proceedings of the 12th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS XII). ACM Press, New York, NY, 2006, pp. 2-13.