Download
keith adams ole agesen 1st october 2009 presented by chwa hoon sung kang joon young n.
Skip this Video
Loading SlideShow in 5 Seconds..
A Comparison of Software and Hardware Techniques for x86 Virtualization PowerPoint Presentation
Download Presentation
A Comparison of Software and Hardware Techniques for x86 Virtualization

A Comparison of Software and Hardware Techniques for x86 Virtualization

224 Views Download Presentation
Download Presentation

A Comparison of Software and Hardware Techniques for x86 Virtualization

- - - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

  1. Keith Adams, Ole Agesen 1st October 2009 Presented by ChwaHoon Sung, Kang Joon Young A Comparison of Software and Hardware Techniques for x86 Virtualization

  2. Virtualization virtualization

  3. Outline • Classic Virtualization • Software Virtualization • Hardware Virtualization • Comparison and Results • Discussion

  4. Classic Virtualization(Trap-and-Emulate) • De-Privilege OS • Executes guest operating systems directly but at lesser privilege level, user-level user mode apps OS kernel mode

  5. Classic Virtualization(Trap-and-Emulate) • De-Privilege OS • Executes guest operating systems directly but at lesser privilege level, user-level apps apps user mode OS OS virtual machine monitor kernel mode

  6. Trap-and-Emulate • Runs guest operating system deprivileged. • All privileged instructions trap into VMM. • VMM emulates instructions against virtual state. • Resumes direct execution from next guest instruction.

  7. Classic Virtualization (Cont’d) • Architectural Obstacles • Traps are expensive. (~3000 cycles) • Many traps unavoidable. (e.g., page faults) • Not all architectures support the trap-and-emulate. (x86)

  8. System Virtualization Classic Virtualization (Popek & Goldberg) Trap-and-emulate Enhancement Hardware VMM Software VMM Full-virtualization (VMware) Para-virtualization (Xen) Hardware Support for Virtualization (Intel VT & AMD SVM) 7

  9. Outline • Classic Virtualization • Software Virtualization • Hardware Virtualization • Comparison and Results • Discussion

  10. Software Virtualization • Until recently, the x86 architecture has not permitted classical trap-and-emulate virtualization. • Some privileged state is visible in user mode • Guest OS can observe that current privilege level (CPL) in code segment selection (%cs). • Not all privileged operations trap when run in user mode • Dual-purpose instructions don’t trap (popf). • Software VMMs for x86 have instead used binary translation of the guest code.

  11. Binary Translation • Translates the kernel code to replace privileged instructions with new sequences of instructions that have the intended effect on the virtual hardware. • The software VMM uses a translator with these properties. • Binary – input is machine-level code. • Dynamic – occurs at runtime. • On demand – code translated when needed for execution. • System level – makes no assumption about guest code. • Subsetting– translates from full instruction set to safe subset. • Adaptive – adjust code based on guest behavior to achieve efficiency.

  12. Binary Translation (Cont’d) • The translators input is full x86 instruction set, including all the privileged instructions; output is a safe subset of user-mode instructions

  13. Binary Translation Guest Code Translator Translation Cache CPU Emulation Routines TC Index Callouts

  14. Guest Code vPC movebx, eax cli Straight-line code and ebx, ~0xfff Basic Block movebx, cr3 sti ret Control flow

  15. Guest Code Translation Cache vPC movebx, eax movebx, eax start cli call HANDLE_CLI and ebx, ~0xfff and ebx, ~0xfff movebx, cr3 mov [CO_ARG], ebx sti call HANDLE_CR3 ret call HANDLE_STI jmp HANDLE_RET

  16. Guest Code Translation Cache vPC movebx, eax movebx, eax start cli mov [CPU_IE], 0 and ebx, ~0xfff and ebx, ~0xfff movebx, cr3 mov [CO_ARG], ebx sti call HANDLE_CR3 ret mov [CPU_IE], 1 test [CPU_IRQ], 1 jne call HANDLE_INTS jmp HANDLE_RET

  17. Performance Advantages of BT • Avoid privilege instruction traps • Example: rdtsc (read time-stamp counter) <- privileged instruction • Trap-and-emulate: 2030 cycles • Callout-and-emulate: 1254 cycles (not TC) • In TC emulation: 216 cycles

  18. Outline • Classic Virtualization • Software Virtualization • Hardware Virtualization • Comparison and Results • Discussion

  19. Hardware Virtualization • Recent x86 extension • 1998 – 2005: Software-only VMMs using binary translation • 2005: Intel and AMD start extending x86 to support virtualization. • First-generation hardware • Allows classical trap-and-emulate VMMs. • Intel VT (Virtualization Technology) • AMD SVM (Security Virtual Machine) • Performance • VT/SVM help avoid BT, but not MMU ops. (actually slower!) • Main problem is efficient virtualization of MMU and I/O, Not executing the virtual instruction stream.

  20. New Hardware Features • VMCB(Virtual Machine Control Block) • in-memory data structure • Contains the state of guest virtual CPU. • Modes • Non-root mode: guest OS runs at its intended privilege level(ring 0) (Not fully privileged) • Root mode: VMM is running at a new ring with an even higher privilege level(Fully privileged) • Instructions • vmrun: transfers from root to non-root mode. • exit: transfers from non-root to root mode.

  21. VM 1 VM 2 VM n Ring 3 Ring 3 Ring 3 Ring 0 Ring 0 Ring 0 VMCB n VMCB 2 VMCB 1 Intel VT-x Operations VMX Non-root Mode . . . VM Exit VMX Root Mode Ring 0 VM Run VMLAUNCH

  22. Benefits of Hardware Virtualization • Hardware VMM reduces guest OS dependency • Eliminates need for binary translation • Facilitates support for Legacy OS • Hardware VMM improves robustness • Eliminates need for complex SW techniques • Simpler and smaller VMMs • Hardware VMM improves performance • Fewer unwanted (Guest  VMM) transitions

  23. Outline • Classic Virtualization • Software Virtualization • Hardware Virtualization • Comparison and Results • Discussion

  24. Software VMM vs. Hardware VMM • BT tends to win in these areas: • Trap elimination – BT can replace most traps with fastercallouts. • Emulation Speed – callouts jump to predecoded emulation routine. • Callout avoidance – for frequent cases, BT may use in-TC emulation routines, avoiding even the callout cost. • The hardware VMM wins in these area: • Code Density – since there is no translation. • Precise exceptions – BT performs extra work to recover guest state for faults. • System calls – runs without VMM intervention.

  25. Experiments • Software VMM – VMware Player 1.0.1 • Hardware VMM – VMware implemented experimental hardware assisted VMM. • Host – HP workstation, VT-enabled • 3.8 GHz Intel Pentium • All experiments are run natively, on software VMM and on Hardware-assisted VMM.

  26. Forkwait Test • Test to stress process creation and destruction • system calls, context switching, page table modifications, page faults, etc. • Results – to create and destroy 40,000 processes • Host – 0.6 seconds • Software VMM – 36.9 seconds • Hardware VMM – 106.4 seconds

  27. Nanobenchmarks • Benchmark • Custom guest OS – FrobOS • Tests performance of single virtualization sensitive operation • Observations • Syscall(Native == HW << SW) • Hardware – No VMM intervention in so near native • Software – traps • in (SW << Native << HW) • Native – access a off-CPU register • Software VMM – translates “in” into a short sequence of instructions that access virtual model of the same. • Hardware – VMM intervention

  28. Nanobenchmarks (Cont’d) • Observations (Cont’d) • ptemod (Native << SW << HW) • Both use shadowing technique to implement guest paging using traces for coherency • PTE writes causes significant overhead compared to native

  29. Outline • Classic Virtualization • Software Virtualization • Hardware Virtualization • Comparison and Results • Discussion

  30. Opportunities • Microarchitecture • Hardware overheads will shrink over time as implementations mature. • Measurements on desktop system using a pre-production version Intel’s Coremicroarchitecture. • Hardware VMM algorithmic changes • Drop trace faults upon guest PTE modification, allowing temporary incoherency with shadow page tables to reduce costs. • Hybrid VMM • Dynamically selects the execution technique • Hardware VMM’s superior system call performance • Software VMM’s superior MMU performance • Hardware MMU support • Trace faults, context switches and hidden page faults can be handled effectively with hardware assistance in MMU virtualization.

  31. Conclusion • Hardware extensions allow classical virtualization on x86 architecture. • Extensions remove the need for Binary Translation and simplifies VMM design. • Software VMM fares better than Hardware VMM in many cases like context switches, page faults, trace faults, I/O. • New MMU algorithms might narrow the gap in performance.

  32. Server Workload • Benchmarks • Apache ab benchmarking tool – on Linux installation of Apache http server and on Windows installation • Tests I/O efficiency • Observations • Both VMMs perform poorly • Performance on Windows and Linux differ widely • Reason: Apache Configuration • Windows – single address space (less paging) • Hardware VMM is better • Linux – multiple address spaces (more paging) • Software VMM is better

  33. Desktop-Oriented Workload • Benchmark • PassMark on Windows XP Professional • The suite of microbenchmarks test various aspects of workstation performance. • Observations • Large RAM test • Exhausts memory. (paging capabilities) • Intended to test paging capability. • Software VMM is better. • 2D Graphics test • Involves system calls. • Hardware VMM is better.

  34. Less Synthetic Workload • Compilation times Linux kernel and Apache (on Cygwin) • Observation • Big compilation jobs – lots of page faults. • Software VMM is better in handling page faults.