
Virtualisation Front Side Buses SMP systems


Presentation Transcript


  1. Virtualisation Front Side Buses SMP systems COMP311 2007 Jamie Curtis

  2. Virtualisation • Allows a “Host” OS to execute a “Guest” OS as a task • No need for the “Guest” OS’s cooperation • Host is often called a Hypervisor or VMM • VMM controls devices • Often presents virtual devices to guest OSes

  3. Virtualisation cont. • PC architecture makes this tricky • Protected mode introduced privilege rings • 4 levels – 0, 1, 2, 3 with decreasing privilege • Operating systems assume full control (ring 0), with userspace at ring 3 • Some privileged CPU state can be read from any ring without causing a fault • Some instructions silently fail if not run at the correct privilege level instead of causing a fault
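
A concrete illustration of the “no control on reading this state” problem (my own example, not from the slides): the C program below executes SGDT from ring 3. On x86 CPUs without the later UMIP feature enabled this does not fault; it quietly reveals the kernel’s GDT base, which is exactly the sensitive-but-unprivileged behaviour that breaks classic trap-and-emulate virtualisation. Assumes x86-64 and GCC/Clang inline assembly.

/* sgdt_demo.c - runs SGDT from user space (ring 3).
 * On CPUs/kernels without UMIP enabled this does NOT fault; it quietly
 * reveals the kernel's GDT base instead of trapping, which is the kind
 * of behaviour that defeats classic trap-and-emulate virtualisation. */
#include <stdio.h>
#include <stdint.h>

struct __attribute__((packed)) descriptor_table {
    uint16_t limit;
    uint64_t base;
};

int main(void)
{
    struct descriptor_table gdtr = {0};
    __asm__ volatile("sgdt %0" : "=m"(gdtr));   /* privileged state, read without a fault */
    printf("GDT base = %#llx, limit = %u\n",
           (unsigned long long)gdtr.base, gdtr.limit);
    return 0;
}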

  4. Virtualisation cont. • 4 approaches to fix this • Emulation • Paravirtualisation • Binary Translation • Hardware Assisted • Emulation • Fake the entire machine • Very slow but doesn’t require the same host architecture as the guest
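
As a sketch of what “fake the entire machine” means in practice, here is a fetch/decode/execute loop over a tiny made-up instruction set (the opcodes and registers are invented for illustration). Every guest instruction costs many host instructions, which is why pure emulation is slow but works across host architectures.

/* toy_emulator.c - interpret a made-up two-register ISA to show the
 * shape of an emulator's fetch/decode/execute loop. The opcodes are
 * hypothetical; real emulators (QEMU, Bochs) are the same idea with
 * far more state and decoding work. */
#include <stdio.h>
#include <stdint.h>

enum { OP_HALT = 0, OP_LOADI = 1, OP_ADD = 2, OP_PRINT = 3 };

int main(void)
{
    /* "Guest" program: r0 = 2; r1 = 40; r0 += r1; print r0; halt */
    uint8_t mem[] = { OP_LOADI, 0, 2, OP_LOADI, 1, 40,
                      OP_ADD, 0, 1, OP_PRINT, 0, OP_HALT };
    uint32_t reg[2] = {0};
    size_t pc = 0;

    for (;;) {                              /* fetch */
        uint8_t op = mem[pc++];
        switch (op) {                       /* decode + execute */
        case OP_LOADI: { uint8_t r = mem[pc++]; reg[r] = mem[pc++]; break; }
        case OP_ADD:   { uint8_t d = mem[pc++]; uint8_t s = mem[pc++]; reg[d] += reg[s]; break; }
        case OP_PRINT: { uint8_t r = mem[pc++]; printf("r%u = %u\n", r, reg[r]); break; }
        case OP_HALT:  return 0;
        default:       return 1;            /* unknown opcode */
        }
    }
}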

  5. Paravirtualisation • Alter the guest so that it doesn’t use the “bad” instructions • Guest instead calls out to the VMM • Very fast with minimal overhead • Requires support from all guest OSes • Effectively limited to open source OSes • Championed by Xen
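
A rough sketch of the paravirtual idea: instead of executing a privileged operation directly (which the VMM would otherwise have to trap or rewrite), the modified guest kernel asks the hypervisor to do it. The hypercall() function and HCALL_SET_PTE number below are hypothetical placeholders, not a real ABI; real Xen guests issue hypercalls such as HYPERVISOR_mmu_update through a hypercall page.

/* pv_sketch.c - illustrative only: the *shape* of a paravirtualised
 * guest replacing a privileged MMU write with a hypercall.
 * hypercall() and HCALL_SET_PTE are hypothetical, not a real ABI. */
#include <stdint.h>

#define HCALL_SET_PTE 7   /* hypothetical hypercall number */

/* In a real PV guest this would transfer control to the VMM; here it
 * is just a stub so the sketch compiles. */
static long hypercall(long nr, uint64_t a1, uint64_t a2)
{
    (void)nr; (void)a1; (void)a2;
    return 0;
}

/* Unmodified kernel: writes the page-table entry directly (privileged).
 * Paravirtualised kernel: asks the VMM to do it on its behalf. */
static void set_pte(uint64_t *pte, uint64_t val)
{
#ifdef PARAVIRT
    hypercall(HCALL_SET_PTE, (uint64_t)pte, val);
#else
    *pte = val;            /* would have to be trapped or rewritten under a VMM */
#endif
}

int main(void)
{
    uint64_t pte = 0;
    set_pte(&pte, 0x1003);
    return 0;
}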

  6. Binary Translation • The VMM dynamically translates the guest’s instruction stream before it’s executed, replacing “bad” instructions as it goes • Lots of optimisations make this not nearly as bad as it sounds • Allows running unmodified OSes • Performance hit can be anything from 5–60% depending on workload • Championed by VMware
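
A very rough sketch of dynamic binary translation over a made-up one-byte “ISA”: the translator copies safe bytes through unchanged and rewrites each “bad” opcode into a call-out to the VMM before the block runs. Real translators such as VMware’s work on x86 basic blocks and cache and chain the translated code, which is where most of the optimisation comes from.

/* bt_sketch.c - toy binary translator over a hypothetical one-byte
 * ISA: 0xF4 stands in for a "sensitive" instruction that must not run
 * natively; everything else is copied through unchanged. */
#include <stdio.h>
#include <stdint.h>

#define OP_SENSITIVE 0xF4   /* hypothetical "bad" opcode */
#define OP_VMM_CALL  0xCC   /* hypothetical "call into the VMM" opcode */

static size_t translate(const uint8_t *in, size_t n, uint8_t *out)
{
    size_t o = 0;
    for (size_t i = 0; i < n; i++) {
        if (in[i] == OP_SENSITIVE)
            out[o++] = OP_VMM_CALL;   /* rewrite: emulate in the VMM */
        else
            out[o++] = in[i];         /* safe: execute natively */
    }
    return o;   /* real translators also cache and chain translated blocks */
}

int main(void)
{
    uint8_t guest[] = { 0x90, 0xF4, 0x90 };
    uint8_t cache[sizeof guest * 2];
    size_t n = translate(guest, sizeof guest, cache);
    for (size_t i = 0; i < n; i++) printf("%02X ", (unsigned)cache[i]);
    printf("\n");
    return 0;
}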

  7. Hardware Virtualisation • Adds another privilege level for the VMM • Hardware maintains state for each guest • VMM gets very flexible control of what causes faults • Intel and AMD again have similar specs (but incompatible!) • AMD – “Pacifica” (AMD-V), Intel – “Vanderpool” (VT-x)

  8. Hardware Virtualisation cont. • Initial solutions don’t deal with enough in hardware, often making them slower than binary translation • Transition into and out of guests is very slow • VMware and Xen have both added support • Hardware support is getting more complete in newer versions

  9. Virtual I/O Devices • Typically VMMs provide virtual hardware drivers to the guests • These may then map onto real hardware inside the VMM • Allowing secure and separated direct access to hardware is difficult

  10. AMD 10h Family • AMD’s latest architecture adds a number of important virtualisation enhancements • Primarily focused on making fewer calls into the VMM • Three main enhancements • Nested Page Tables • Tagged TLB • Device Exclusion Vector (DEV)

  11. Nested Page Tables • Current designs use shadow page tables • The MMU is set up for the current guest / process • Changes to the MMU are trapped by the VMM • Nested page tables instead give each VM its own second-level page table (guest-physical to host-physical) • Looks similar to the virtual address space provided by the OS to processes, just one more level up • Makes lookups through the page tables slower
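
The “slower lookups” point can be made concrete with a little arithmetic: with 4-level guest page tables and 4-level nested tables, each of the guest’s own page-table references is a guest-physical address that must itself be walked through the nested tables. The commonly quoted worst case is n*m + n + m = 24 memory references per TLB miss instead of 4; the sketch below just evaluates that formula (the 4-and-4 level assumption is mine, matching x86-64).

/* npt_cost.c - back-of-envelope cost of a 2D (nested) page walk,
 * assuming 4-level guest tables and 4-level nested tables as on
 * x86-64; n*m + n + m is the usual worst-case count. */
#include <stdio.h>

int main(void)
{
    int guest_levels = 4;   /* guest-virtual -> guest-physical */
    int host_levels  = 4;   /* guest-physical -> host-physical */

    int native = guest_levels;                      /* bare metal          */
    int nested = guest_levels * host_levels         /* translate each      */
               + guest_levels + host_levels;        /* guest PTE + the data address */

    printf("native walk : %d memory references\n", native);   /* 4  */
    printf("nested walk : %d memory references\n", nested);   /* 24 */
    return 0;
}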

  12. Tagged TLB • TLBs cache lookups through the page tables • Typically they are flushed each time you switch process or VM • Tagged TLBs add a tag to each entry recording which VM it belongs to • No longer have to flush TLBs when switching VMs • Makes VM – VMM – VM transitions cheaper • Reduces the performance hit of nested page tables
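
A sketch of what the tag buys (the structure and field names are mine, not AMD’s): each TLB entry carries a VM tag, a lookup hits only if both the tag and the virtual page number match, and switching VMs just changes the current tag instead of flushing every entry.

/* tagged_tlb.c - toy tagged TLB: entries are keyed by (vm_tag, vpn),
 * so a world switch changes current_vm instead of flushing. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define TLB_ENTRIES 8

struct tlb_entry { bool valid; uint16_t vm_tag; uint64_t vpn, pfn; };

static struct tlb_entry tlb[TLB_ENTRIES];
static uint16_t current_vm = 1;

static bool tlb_lookup(uint64_t vpn, uint64_t *pfn)
{
    for (int i = 0; i < TLB_ENTRIES; i++)
        if (tlb[i].valid && tlb[i].vm_tag == current_vm && tlb[i].vpn == vpn) {
            *pfn = tlb[i].pfn;
            return true;            /* hit: tag and VPN both match */
        }
    return false;                   /* miss: walk the page tables */
}

int main(void)
{
    tlb[0] = (struct tlb_entry){ true, 1, 0x1234, 0xABCD };
    uint64_t pfn;
    printf("VM1 lookup: %s\n", tlb_lookup(0x1234, &pfn) ? "hit" : "miss");
    current_vm = 2;                 /* "switch VM": no flush needed */
    printf("VM2 lookup: %s\n", tlb_lookup(0x1234, &pfn) ? "hit" : "miss");
    return 0;
}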

  13. Device Exclusion Vector • Contains a mapping of what pages a device can access • Allows the VMM to give DMA access to specific pages • Allows a device to do DMA directly into a specific VM’s memory space
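
A sketch of the bookkeeping the DEV implies: one bit per 4KB physical page, consulted before a device’s DMA is let through. The code follows the slide’s framing (bit set = access allowed); in AMD’s actual DEV the bit sense is exclusion, i.e. a set bit blocks the access. All names here are mine.

/* dev_sketch.c - toy Device Exclusion Vector: a bitmap with one bit
 * per 4KB physical page, checked before a device DMA is allowed.
 * Bit set = allowed here (the slide's framing); the hardware DEV uses
 * the opposite (exclusion) sense. */
#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

#define PAGE_SHIFT 12
#define NUM_PAGES  1024                 /* 4MB of toy physical memory */

static uint8_t dev_bitmap[NUM_PAGES / 8];

static void allow_page(uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;
    dev_bitmap[pfn / 8] |= 1u << (pfn % 8);
}

static bool dma_permitted(uint64_t paddr)
{
    uint64_t pfn = paddr >> PAGE_SHIFT;
    return dev_bitmap[pfn / 8] & (1u << (pfn % 8));
}

int main(void)
{
    allow_page(0x5000);                 /* VMM grants one guest page to the device */
    printf("DMA to 0x5000: %s\n", dma_permitted(0x5000) ? "ok" : "blocked");
    printf("DMA to 0x9000: %s\n", dma_permitted(0x9000) ? "ok" : "blocked");
    return 0;
}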

  14. Virtualisation Future • Linux now has 4 full virtualisation packages • VMware, Xen, KVM, Lguest • Intel and AMD working on next-gen hardware support • Dense, multi-core solutions making virtualisation very attractive • Power and space efficient • Keeps getting more and more important
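
Since KVM is mentioned above, here is a minimal view of hardware-assisted virtualisation from Linux userspace: open /dev/kvm, check the API version, and create an empty VM. It assumes a kernel with KVM enabled and access to /dev/kvm, and stops well short of a usable guest (no memory, no vCPUs, no run loop).

/* kvm_probe.c - minimal probe of Linux's KVM interface: open /dev/kvm,
 * check the API version, and create an (empty) VM. Requires a kernel
 * with KVM enabled and read/write access to /dev/kvm. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/kvm.h>

int main(void)
{
    int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
    if (kvm < 0) { perror("open /dev/kvm"); return 1; }

    int version = ioctl(kvm, KVM_GET_API_VERSION, 0);
    printf("KVM API version: %d\n", version);        /* 12 on current kernels */

    int vmfd = ioctl(kvm, KVM_CREATE_VM, 0);          /* VT-x / AMD-V sits behind this */
    if (vmfd < 0) { perror("KVM_CREATE_VM"); close(kvm); return 1; }
    printf("created VM fd: %d\n", vmfd);

    close(vmfd);
    close(kvm);
    return 0;
}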

  15. Traditional Intel Architecture • (Diagram: CPU connected over the Front Side Bus to the Northbridge, with the Southbridge attached to the Northbridge)

  16. Clock vs. Data rates • Clock rates no longer equal data rates • Watch specifications as both can look identical • e.g. Intel’s FSB is normally referred to as a 1333MHz bus • It is actually a 333MHz clock with a quad-pumped data rate • Should really be quoted as 1333MT/s

  17. Frontside Bus & Dual Core • 64 bit, quad-pumped 333MHz • 8 * 4 * 333 = 10.6GB/s • Half duplex • Traditional P4 FSB has been point-to-point • Xeon SMP requires chipsets that support a proper multi-drop bus for the FSB • How did a dual core Pentium D work?
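
The MT/s-vs-MHz distinction and the 10.6GB/s figure both fall out of the same arithmetic, sketched below with the numbers from the two slides above (nominal 333⅓ MHz clock, quad-pumped, 64-bit wide).

/* fsb_bw.c - clock rate vs transfer rate vs bandwidth for the
 * quad-pumped Intel FSB described above. */
#include <stdio.h>

int main(void)
{
    double clock_mhz   = 333.33;  /* nominal 333 1/3 MHz bus clock */
    int    pump        = 4;       /* quad-pumped: 4 transfers per clock */
    int    width_bytes = 8;       /* 64-bit data bus */

    double mts = clock_mhz * pump;                 /* ~1333 MT/s */
    double gbs = mts * width_bytes / 1000.0;       /* ~10.66 GB/s, usually quoted as 10.6 */

    printf("transfer rate : %.0f MT/s (marketed as a \"1333MHz\" FSB)\n", mts);
    printf("peak bandwidth: %.2f GB/s (half duplex)\n", gbs);
    return 0;
}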

  18. Intel Dual Core • Slap two cores on the same die! • How do they communicate? • Over the FSB like a Xeon! • Requires a new chipset to support it.

  19. Caches • There are two sets of L1 and L2 cache, one for each core. • What happens if both cores are caching the same memory location? • Need a protocol to make sure this doesn’t cause problems • Cache Coherency Protocol

  20. Cache Coherency • Intel use the MESI protocol • Modified • 1 cache contains a modified copy of the location • Exclusive • 1 cache contains an un-modified copy of the location • Shared • 2 or more caches contain un-modified copies of the location • Invalid • This cache’s copy must not be used (e.g. because another cache holds a modified copy)

  21. MESI • What happens if a core has an Invalid entry but tries to access it? • The cache with the Modified entry needs to write the entry back to memory, and both copies become Shared • This means an Intel Pentium D has to involve the FSB in all of its cache coherency updates even though the two cores are on the same die
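
A toy version of the interaction just described: core 1 reads a line it holds as Invalid while core 0 holds it Modified, so the Modified copy is written back and both copies end up Shared. The states follow the slides; the two-cache model and code structure are mine.

/* mesi_sketch.c - two private caches, one line, MESI states only.
 * Shows the read-miss-on-Invalid case from the slide: the Modified
 * copy is written back (over the FSB on a Pentium D) and both caches
 * end up Shared. */
#include <stdio.h>

enum mesi { INVALID, SHARED, EXCLUSIVE, MODIFIED };
static const char *name[] = { "Invalid", "Shared", "Exclusive", "Modified" };

static enum mesi cache[2];          /* state of one line in core 0 and core 1 */

static void read_line(int core)
{
    int other = 1 - core;
    if (cache[core] != INVALID)
        return;                              /* hit: no bus traffic */
    if (cache[other] == MODIFIED) {
        /* snoop hit on a dirty line: write it back to memory, then share it */
        cache[other] = SHARED;
        cache[core]  = SHARED;
    } else {
        cache[core] = (cache[other] == INVALID) ? EXCLUSIVE : SHARED;
    }
}

int main(void)
{
    cache[0] = MODIFIED;             /* core 0 has written the line   */
    cache[1] = INVALID;              /* core 1's copy was invalidated */
    read_line(1);                    /* core 1 now reads it           */
    printf("core 0: %s, core 1: %s\n", name[cache[0]], name[cache[1]]);
    return 0;
}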

  22. Intel Core • (Diagram: Core 1 and Core 2 sharing an L2 cache, with bus control / L2 control logic connecting to the FSB) • Core architecture is designed for dual-core • Cache is now shared between cores • Cache coherency is now between each core’s L1 and the shared L2

  23. Intel Core 2 Quad • Current quad core design combines two dual core dies in one package • Cache coherency between the dies again happens over the FSB

  24. AMD K8 Architecture • Integrated Memory Controller • Very low latency • CPU determines memory technology • Requires both CPU and Motherboard to be changed for a new type of memory • Athlon 64, Socket 754 • Single channel, 64bit DDR at 200MHz • 8 * 2 * 200 = 3.2GB/s

  25. AMD K8 Memory Interfaces • Athlon 64, Socket 939 • Dual channel, 64bit DDR at 200MHz • 2 * 8 * 2 * 200 = 6.4 GB/s • Athlon 64, Socket AM2 • Dual channel, 64bit DDR2 at 400MHz • 2 * 8 * 2 * 400 = 12.8 GB/s • Opteron, Socket 940 • Dual channel, 64bit DDR at 200MHz • 2 * 8 * 2 * 200 = 6.4 GB/s
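
All of the figures on the last two slides come from the same channels × width × data-rate product; a quick sketch using the numbers from the slides:

/* k8_membw.c - peak memory bandwidth for the K8 socket variants
 * listed above: channels * 8 bytes * 2 (DDR) * clock_MHz. */
#include <stdio.h>

static void membw(const char *socket, int channels, int clock_mhz)
{
    double gbs = channels * 8 * 2 * clock_mhz / 1000.0;   /* 64-bit DDR per channel */
    printf("%-12s %d x DDR @ %dMHz = %4.1f GB/s\n",
           socket, channels, clock_mhz, gbs);
}

int main(void)
{
    membw("Socket 754", 1, 200);   /*  3.2 GB/s */
    membw("Socket 939", 2, 200);   /*  6.4 GB/s */
    membw("Socket AM2", 2, 400);   /* 12.8 GB/s (DDR2 at 400MHz, i.e. DDR2-800) */
    membw("Socket 940", 2, 200);   /*  6.4 GB/s */
    return 0;
}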

  26. AMD K8 FSB • AMD use the HyperTransport technology • Don’t confuse it with the very different Intel Hyper-Threading Technology • Packet-based, point-to-point link that provides the lowest possible latency • Full duplex, bi-directional link • Available in 2, 4, 8, 16 or 32 bits wide • 50MHz – 1.4GHz clock rates • Clock rates and bit widths can be asymmetric • Double-pumped data rate • 1.4GHz * 2 * 4 = 11.2GB/s per direction

  27. Athlon 64 FSB • The Athlon 64 has a single HyperTransport link to connect to the I/O subsystem • 16bit, bi-directional, DDR at 800MHz (754) or 1GHz (939) • 2 * 2 * 2 * 1000 = 8GB/s • Can have either a single-chip solution or stay with the two-chip northbridge / southbridge combination
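
The HyperTransport numbers follow the same pattern, but are counted per direction because the link is full duplex. A sketch with the two link configurations quoted on the last two slides (my arithmetic, the slides’ numbers):

/* ht_bw.c - HyperTransport link bandwidth: width_bytes * 2 (DDR) * clock,
 * counted per direction because the link is full duplex. */
#include <stdio.h>

static void htbw(const char *what, int width_bits, double clock_ghz)
{
    double per_dir = (width_bits / 8.0) * 2 * clock_ghz;   /* GB/s per direction */
    printf("%-26s %4.1f GB/s per direction, %4.1f GB/s aggregate\n",
           what, per_dir, 2 * per_dir);
}

int main(void)
{
    htbw("32-bit link @ 1.4GHz", 32, 1.4);     /* 11.2 GB/s per direction          */
    htbw("Athlon 64 16-bit @ 1GHz", 16, 1.0);  /*  4.0 GB/s per direction, 8 total */
    return 0;
}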

  28. Opteron • Uses HT for both the I/O interconnect and the CPU interconnect • The CPU interconnect requires an additional cache coherency extension to HT • Comes in three variants • 1xx – Memory controller and 3 HT links • 2xx – Memory controller and 3 HT links • 8xx – Memory controller and 3 HT links

  29. Opteron • What makes them different? • The number of HT links that support the CC protocol • 1xx – Zero • 2xx – One • 8xx – Three • This allows you to scale to different numbers of processors • 1xx – 1-way, 2xx – 2-way, 8xx – 4- or 8-way

  30. AMD MOESI • AMD’s cache coherency protocol is slightly different to Intel’s: it adds an Owned state • Owned – this cache owns the memory location and it (not memory) services all requests for it from other caches • The request goes across the high-speed dedicated HT link
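
A small sketch of what the Owned state changes relative to the earlier MESI example: when another core reads a dirty line, the owning cache supplies the data directly (over the coherent HT link between sockets) and keeps it, with no write-back to memory. The states are MOESI’s; the two-cache model is mine.

/* moesi_sketch.c - the same read-miss scenario as the MESI sketch,
 * but with MOESI: the dirty line is served cache-to-cache and memory
 * is NOT updated. */
#include <stdio.h>

enum moesi { INVALID, SHARED, EXCLUSIVE, OWNED, MODIFIED };
static const char *name[] = { "Invalid", "Shared", "Exclusive", "Owned", "Modified" };

static enum moesi cache[2];

static void read_line(int core)
{
    int other = 1 - core;
    if (cache[core] != INVALID)
        return;
    if (cache[other] == MODIFIED || cache[other] == OWNED) {
        cache[other] = OWNED;        /* keeps the dirty data and services requests */
        cache[core]  = SHARED;       /* supplied cache-to-cache, no memory write   */
    } else {
        cache[core] = (cache[other] == INVALID) ? EXCLUSIVE : SHARED;
    }
}

int main(void)
{
    cache[0] = MODIFIED;
    cache[1] = INVALID;
    read_line(1);
    printf("core 0: %s, core 1: %s\n", name[cache[0]], name[cache[1]]);
    return 0;
}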

  31. AMD Dual Core • In the dual core situation it becomes even better, as the next slide shows

  32. AMD Dual Core • Requests now go across the System Request Interface • This runs at the CPU core frequency • The System Request Interface also controls HT–memory access as well as HT–HT communication in 2xx and 8xx Opterons

  33. AMD Quad Core • AMD again targeted a native quad core design instead of a dual-die, dual-core package • Introduced a shared L3 cache
