Disco: Running Commodity Operating Systems on Scalable Multiprocessors
Kit Cischke, 09/09/08, CS 5090

Presentation Transcript

Overview
  • Background
    • What are we doing here?
  • A Return to Virtual Machine Monitors
    • What does Disco do?
  • Disco: A Return to VMMs
    • How does Disco do it?
  • Experimental Results
    • How well does Disco dance?
The Basic Problem
  • With the explosion of multiprocessor machines, especially of the NUMA variety, the problem of using these machines effectively becomes more pressing.
    • NUMA = Non-Uniform Memory Access – shows up a lot in clusters.
    • The authors point out that the problem applies to any major hardware innovation, not just multiprocessors.
Potential Solution
  • Solution: Rewrite the operating system to address fault-tolerance and scalability.
  • Flaws:
    • Rewriting will introduce bugs.
    • Bugs can disrupt the system or the applications.
    • Instabilities are usually less tolerated on these kinds of systems because of their application space.
    • You may not have access to the OS.
Not So Good
  • Okay. So that wasn’t so good. What else do we have?
  • How about Virtual Machine Monitors?
  • A new twist on an old idea, which may work better now that we have faster processors.
Enter Disco

Disco is a system-level VMM that presents the same fundamental machine abstraction to each of the various OS's that might be running on it.

These can be commodity OS's (uniprocessor or multiprocessor) or specialty systems.

Disco VMM
  • Fundamentally, the hardware is organized like a cluster of nodes, but Disco introduces global policies to manage all of the resources, which makes for better use of the hardware.
  • We’ll use commodity operating systems and write the VMM. Rather than millions of lines of code, we’ll write a few thousand.
  • What if an application's resource needs exceed what a single commodity OS can handle?
Scalability
  • Very simple changes to the commodity OS (perhaps at the driver level or via a kernel extension) can allow virtual machines to share resources.
    • E.g., a parallel database could have a cache in shared memory and multiple virtual processors running on virtual machines.
  • Support for specialized OS’s that need the power of multiple processors but not all of the features offered by a commodity OS.
Further Benefits
  • Running multiple copies of an OS naturally addresses scalability and fault containment.
    • Need greater scaling? Add a VM.
    • Only the monitor and the system protocols (NFS, etc.) need to scale.
    • OS or application crashes? No problem. The rest of the system is isolated.
  • NUMA memory management issues are addressed.
  • Multiple versions of different OS’s provide legacy support and convenient upgrade paths.
Not All Sunshine & Roses
  • VMM Overhead
    • Additional exception processing, instruction execution and memory to virtualize hardware.
    • Privileged instructions aren't executed directly on the hardware, so the monitor has to emulate them. I/O requests need to be intercepted and remapped.
    • Memory overhead is rough too.
      • Consider having 6 copies of Vista in memory simultaneously.
  • Resource Management
    • The VMM can't make intelligent decisions about the code it is running without information from the OS.
One Last Disadvantage
  • Communication
    • Sometimes resources simply can’t be shared the way we want.
  • Most of these can be mitigated though.
    • For example, most operating systems have good NFS support. So use it.
      • But… We can make it even better! (Details forthcoming.)
Introducing Disco
  • VMM designed for the FLASH multiprocessor machine
    • FLASH is an academic machine designed at Stanford University
    • FLASH is a collection of nodes, each containing a processor, memory, and I/O. Directory-based cache coherence makes it look like a CC-NUMA machine.
    • Disco has also been ported to a number of other machines.
Disco’s Interface
  • The virtual CPU of Disco is an abstraction of a MIPS R10000.
    • It not only emulates the processor but extends it (e.g., it reduces some kernel operations to simple load/store instructions).
  • Disco also presents an abstraction of contiguous physical memory starting at address 0.
  • I/O Devices
    • Disks, network interfaces, interrupts, clocks, etc.
    • Special interfaces for network and disks.
Disco’s Implementation
  • Implemented as a multi-threaded shared-memory program.
    • Careful attention paid to memory placement, cache-aware data structures and processor communication patterns.
  • Disco is only 13,000 lines of code.
    • Windows Server 2003: ~50,000,000
    • Red Hat 7.1: ~30,000,000
    • Mac OS X 10.4: ~86,000,000
Disco’s Implementation
  • The execution of a virtual processor is mapped one-for-one onto a real processor.
    • At each context switch, the state of the physical processor is set to that of the scheduled VP.
  • On MIPS, Disco runs in kernel mode and puts the processor in the appropriate mode for whatever is being run:
    • Supervisor mode for the guest OS, user mode for applications.
  • A simple scheduler allows VP's to be time-shared across the physical processors (a rough sketch of the switch follows).
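To make the switching idea concrete, here is a minimal C sketch. All structure and function names are hypothetical, not Disco's actual code; the point is just that the physical CPU's state is made to be that of whichever VP is scheduled.

```c
/* Hypothetical sketch of time-sharing virtual processors (VPs) on a
 * physical CPU; names and fields are illustrative, not from Disco. */
#include <stddef.h>

#define NUM_REGS 32

struct vcpu {
    unsigned long regs[NUM_REGS];  /* saved general-purpose registers */
    unsigned long pc;              /* saved program counter */
    unsigned long priv_state;      /* saved privileged state (status, TLB context) */
    int mode;                      /* 0 = user (apps), 1 = supervisor (guest OS) */
};

struct pcpu {
    struct vcpu *current;          /* VP currently running on this physical CPU */
};

/* Context switch: make the physical CPU's state that of the next VP.
 * The actual register save/restore is hardware-specific, so it is
 * passed in here as two callbacks. */
void vcpu_switch(struct pcpu *cpu, struct vcpu *next,
                 void (*save_hw_state)(struct vcpu *),
                 void (*load_hw_state)(const struct vcpu *))
{
    if (cpu->current)
        save_hw_state(cpu->current);  /* spill registers, PC, privileged state */
    load_hw_state(next);              /* install the next VP's saved state */
    cpu->current = next;
    /* The monitor itself stays in kernel mode; the guest OS runs in
     * supervisor mode and its applications in user mode. */
}
```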
Disco’s Implementation
  • Virtual Physical Memory
    • This discussion goes on for 1.5 pages. To sum up:
    • The guest OS uses physical addresses, and Disco translates them to machine addresses (sketched below).
    • Disco uses the hardware TLB for this.
    • Moving a VP onto a different processor requires a TLB flush, so Disco maintains a second-level software TLB to offset the performance hit.
    • There's a technical issue with TLBs and the MIPS kernel address space (the kernel normally runs in an unmapped segment that bypasses the TLB) that threw them for a loop.
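A rough sketch of the translation step, in C with made-up names (the real data structures and the second-level software TLB are more involved): when the guest OS installs a virtual-to-"physical" TLB entry, the monitor looks the physical page up in a per-VM map and installs a virtual-to-machine entry in the hardware TLB instead.

```c
/* Illustrative sketch of physical-to-machine translation; the structures
 * and the hw_tlb_write callback are hypothetical, not Disco's code. */
#include <stdint.h>

#define INVALID_MPN ((uint64_t)-1)

struct vm {
    uint64_t *pmap;     /* guest "physical" page -> machine page, per VM */
    uint64_t  npages;   /* size of the guest's physical address space in pages */
};

struct tlb_entry {
    uint64_t vpn;       /* virtual page number */
    uint64_t pfn;       /* guest-physical page on entry, machine page on exit */
    int      writable;
};

/* Called when the guest OS tries to write a TLB entry. The guest thinks in
 * physical pages; the hardware TLB must hold machine pages. */
int install_guest_tlb_entry(struct vm *vm, struct tlb_entry e,
                            void (*hw_tlb_write)(struct tlb_entry))
{
    if (e.pfn >= vm->npages || vm->pmap[e.pfn] == INVALID_MPN)
        return -1;                /* no machine page backs this yet: allocate/fault */
    e.pfn = vm->pmap[e.pfn];      /* rewrite physical page -> machine page */
    hw_tlb_write(e);              /* install the translated entry in hardware */
    return 0;
    /* A software second-level TLB caching recent translations is what lets
     * Disco refill quickly after the flush a VP migration requires. */
}
```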
NUMA Memory Management

To mitigate the non-uniform memory access costs of a NUMA machine, Disco does several things:

It allocates memory so that, as much as possible, pages have affinity to the processor that uses them.

It migrates or replicates hot pages to reduce long-latency remote memory accesses (a policy sketch follows).
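Below is a loose C sketch of what such a decision might look like. The per-node miss counters, the thresholds, and the read-only test are assumptions for illustration, not the paper's exact heuristics.

```c
/* Hypothetical page migration/replication policy for a CC-NUMA machine,
 * driven by per-page, per-node access counts. Thresholds are made up. */
#include <stdint.h>

#define MAX_NODES 16

struct page_stats {
    uint32_t misses[MAX_NODES];  /* sampled cache misses to this page, per node */
    int home_node;               /* node whose local memory holds the page */
    int read_only;               /* nonzero if the page is only read-shared */
};

enum numa_action { NUMA_KEEP, NUMA_MIGRATE, NUMA_REPLICATE };

enum numa_action numa_policy(const struct page_stats *p, int nnodes,
                             uint32_t hot_threshold)
{
    int hot_nodes = 0, hottest = p->home_node;

    for (int n = 0; n < nnodes; n++) {
        if (p->misses[n] >= hot_threshold) {
            hot_nodes++;
            if (p->misses[n] > p->misses[hottest])
                hottest = n;
        }
    }
    if (hot_nodes == 0 || hottest == p->home_node)
        return NUMA_KEEP;        /* accesses are already mostly local */
    if (hot_nodes == 1)
        return NUMA_MIGRATE;     /* one remote hot user: move the page to it */
    /* Several nodes are hot: replicate only if nobody is writing the page. */
    return p->read_only ? NUMA_REPLICATE : NUMA_KEEP;
}
```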

Virtual I/O Devices
  • Obviously Disco needs to intercept I/O requests and direct them to the actual device.
  • Primarily handled by installing drivers for Disco I/O in the guest OS.
  • DMA provides an interesting challenge, in that DMA addresses need the same physical-to-machine translation as regular accesses (see the sketch after this list).
  • However, we can do some especially cool things with DMA requests to disk.
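As a sketch (hypothetical types and callback again), intercepting a DMA request mostly means rewriting each guest "physical" page in the request to the machine page that actually backs it before the device sees the descriptor:

```c
/* Illustrative DMA interception: translate guest-physical pages in a
 * scatter/gather list to machine pages. Types and callback are made up. */
#include <stdint.h>
#include <stddef.h>

struct dma_seg {
    uint64_t page;     /* guest-physical page on entry, machine page on exit */
    uint32_t offset;   /* offset of the transfer within the page */
    uint32_t len;      /* length of this segment in bytes */
};

int remap_dma_request(struct dma_seg *segs, size_t nsegs,
                      uint64_t (*phys_to_machine)(uint64_t ppn))
{
    for (size_t i = 0; i < nsegs; i++) {
        uint64_t mpn = phys_to_machine(segs[i].page);
        if (mpn == (uint64_t)-1)
            return -1;            /* unmapped: the monitor must allocate first */
        segs[i].page = mpn;       /* the device now targets machine memory */
    }
    return 0;
}
```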
Copy-on-Write Disks
  • All disk DMA requests are caught and analyzed. If the data is already in memory, we don’t have to go to disk for it.
  • If the request is for a full page, we just remap the requesting virtual machine's physical page to point at the data (sketched below).
  • So what?
    • Multiple VM’s can share data without being aware of it. Only modifying the data causes a copy to be made.
    • Awesome for scaling up apps by using multiple copies of an OS. Only really need one copy of the OS kernel, libraries, etc.
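A small sketch of the read path under stated assumptions (a per-disk table from block number to machine page; all names hypothetical): if the block is already resident, the requesting VM's physical page is simply pointed at the existing machine page, mapped read-only, and a later write fault makes a private copy.

```c
/* Hypothetical copy-on-write disk sketch: share already-resident blocks
 * across VMs and copy only when someone writes. */
#include <stdint.h>
#include <string.h>

struct cow_disk {
    int64_t *block_to_mpn;   /* disk block -> machine page, or -1 if absent */
    uint64_t nblocks;
};

/* Full-page read request: return the machine page to map (read-only) into
 * the requesting VM, or -1 if real disk I/O is needed (the result would
 * then be recorded in block_to_mpn for the next requester). */
int64_t cow_disk_read(const struct cow_disk *d, uint64_t block)
{
    if (block < d->nblocks && d->block_to_mpn[block] >= 0)
        return d->block_to_mpn[block];   /* already in memory: share it */
    return -1;
}

/* Write fault on a shared page: break the sharing with a private copy,
 * which the monitor then maps writable into the faulting VM. */
void cow_break_share(void *private_page, const void *shared_page, size_t pagesz)
{
    memcpy(private_page, shared_page, pagesz);
}
```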
My Favorite – Networking
  • The copy-on-write disk stuff is great for non-persistent disks. But what about persistent ones? Let’s just use NFS.
  • But here’s a dumb thing: A VM has a copy of information it wants to send to another VM on the same physical machine. In a naïve approach, we’d let that data be duplicated, taking up extra memory pointlessly.
  • So, let’s use copy-on-write for our network interface too!
Virtual Network Interface
  • Disco provides a virtual subnet for VM’s to talk to each other.
  • This virtual device is Ethernet-like, but with no maximum transfer size.
  • Transfers are accomplished by updating pointers rather than actually copying data (until absolutely necessary); see the sketch after this list.
  • The guest OS still sends the data out as ordinary NFS requests.
  • “Ah,” but you say. “What about the data locality as a VM starts accessing those files and memory?”
    • Page replication and migration!
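A sketch of the zero-copy delivery, using the same per-VM physical-to-machine map idea as earlier (all names hypothetical): delivering a page-sized NFS payload between two VMs on the same machine just retargets the receiver's physical page at the sender's machine page, read-only, with copy-on-write if either side later modifies it.

```c
/* Hypothetical zero-copy delivery on the virtual subnet: remap rather
 * than copy page-aligned payloads between co-located VMs. */
#include <stdint.h>
#include <stddef.h>

struct guest_pmap {
    uint64_t *phys_to_machine;   /* per-VM: guest "physical" page -> machine page */
};

struct vnet_msg {
    uint64_t payload_mpn;        /* machine page holding the page-aligned payload */
    size_t   len;                /* the virtual device has no Ethernet-style MTU */
};

/* Deliver a message by pointing the receiver's chosen physical page at the
 * sender's machine page instead of copying len bytes of data. */
void vnet_deliver(struct guest_pmap *receiver, uint64_t recv_ppn,
                  const struct vnet_msg *msg)
{
    receiver->phys_to_machine[recv_ppn] = msg->payload_mpn;
    /* The shared page is mapped read-only in both VMs; a later write by
     * either side triggers a copy, as with the copy-on-write disks. */
}
```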
About those Commodity OS’s
  • So what do we really need to do to get these commodity operating systems running on Disco?
  • Surprisingly, both a lot and a little.
    • Minor changes were needed to IRIX’s HAL, amounting to 2 header files and 15 lines of assembly code. This did lead to a full kernel recompile though.
    • Disco needs device drivers. Let’s just steal them from IRIX!
    • Don’t trap on every privileged register access. Instead, convert those accesses into normal loads/stores to a special address space that is linked to the privileged registers (sketched below).
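The sketch below shows the flavor of that change in C. The shadow-register layout and names are assumptions, not IRIX's or Disco's actual interface; the point is that frequently accessed privileged registers are backed by a per-VP page mapped into the guest, so the patched HAL uses plain loads and stores instead of trapping into the monitor on every access.

```c
/* Hypothetical "trap avoidance" sketch: privileged registers are shadowed in
 * a per-virtual-CPU page that the guest can read/write with normal memory
 * instructions instead of privileged ones. */
#include <stdint.h>

struct vcpu_shadow_regs {
    volatile uint64_t status;    /* shadow of the CP0 status register */
    volatile uint64_t cause;     /* shadow of the CP0 cause register */
    volatile uint64_t entryhi;   /* shadow of the TLB context register */
};

/* What the patched HAL does instead of a privileged "read status"
 * instruction: a plain load, with no trap into the monitor. */
static inline uint64_t hal_read_status(const struct vcpu_shadow_regs *sr)
{
    return sr->status;
}

/* Disabling interrupts becomes a plain store; the monitor consults the
 * shadow copy before it delivers virtual interrupts to this VP. */
static inline void hal_disable_interrupts(struct vcpu_shadow_regs *sr)
{
    sr->status &= ~(uint64_t)1;
}
```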
More Patching
  • “Hinting” added to HAL to help the VMM not do dumb things (or at least do fewer dumb things).
  • When the OS goes idle, the MIPS processor (usually) drops into a low-power mode; Disco instead just stops scheduling that VM until something interesting happens (a small sketch follows).
  • Other minor changes were made as well, but those required patching the kernel.
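A tiny sketch of the idle hint; the callback name and mechanism here are assumptions, not the actual HAL interface. When the patched idle loop finds nothing to run, it tells the monitor, which stops scheduling that VP until a virtual interrupt arrives instead of letting it spin.

```c
/* Hypothetical sketch of the idle "hint" from the patched HAL to the monitor. */

/* Guest side: the patched idle loop, instead of spinning or dropping the CPU
 * into a low-power mode, notifies the monitor that this VP is idle. */
void hal_idle_loop(void (*hint_vmm_idle)(void))
{
    for (;;) {
        hint_vmm_idle();   /* the monitor deschedules this VP here... */
        /* ...and only resumes it when a virtual interrupt (timer, I/O,
         * virtual network) makes the guest runnable again. */
    }
}
```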
SPLASHOS
  • Some high-performance apps might need most or all of the machine. The authors wrote a “thin” operating system to run SPLASH-2 applications.
  • Mostly proof-of-concept.
Experimental Results
  • Bad Idea: Target your software for a machine that doesn’t physically exist.
    • Like, I don’t know, FLASH?
  • Disco was validated using two alternatives:
    • SimOS
    • SGI Origin2000 Board that will form the basis of FLASH
Experimental Design
  • Use 4 representative workloads for parallel applications:
    • Software Development (Pmake of a large app)
    • Hardware Development (Verilog simulator)
    • Scientific Computing (Raytracing and a sorting algorithm)
    • Commercial Database (Sybase)
  • Not only are they representative, but they each have characteristics that are interesting to study.
    • For example, Pmake is multiprogrammed, with lots of short-lived processes, and is OS- and I/O-intensive.
Simplest Results Graph

The overhead of Disco is pretty modest compared to running IRIX directly on the uniprocessor.

Raytrace is the lowest, at only 3%. Pmake is the highest, at 16%.

The main hits come from additional traps and TLB misses (from all the flushing Disco does).

Interestingly, less time is spent in the kernel in Raytrace, Engineering and Database.

Running a 64-bit system mitigates the impact of TLB misses.

Memory Utilization

The key thing here is that 8 VM’s don’t require 8x the memory of 1 VM.

Interestingly, we have 8 copies of IRIX running in less than 256 MB of physical RAM!

Scalability

Page migration and replication were disabled for these runs.

All use 8 processors and 256 MB of memory.

IRIX has a terrible bottleneck in synchronizing the system’s memory management code.

It also has a “lazy” evaluation policy in the virtual memory system that drags “normal” RADIX down.

Overall though, check out those performance gains!

Page Migration Benefits

The 100% UMA results give a lower bound on performance gains from page migration and replication.

But in short, the policies work great.

Real Hardware
  • Experiences on the real SGI hardware pretty much confirm the simulations, at least at the uniprocessor level.
  • Overheads tend to be in the range of 3-8% on Pmake and the Engineering simulation.
Summing Up
  • Disco works pretty well.
  • Memory usage scales well, processor utilization scales well.
  • Performance overheads are relatively small for most loads.
  • Lots of engineering challenges, but most seem to have been overcome.
Final Thoughts
  • Everything in this paper seems, in retrospect, to be totally obvious. However, the combination of all of these factors seems like it would have taken just a ton of work.
  • Plus, I don’t think I could have done it half as well, to be honest.
  • Targeting a non-existent machine seems a little silly.
  • Overall, interesting paper.