
Disco: Running Commodity Operating Systems on Scalable Multiprocessors


Kit Cischke 09/09/08 CS 5090. Disco: Running Commodity Operating Systems on Scalable Multiprocessors. Overview. Background What are we doing here? A Return to Virtual Machine Monitors What does Disco do? Disco: A Return to VMMs How does Disco do it? Experimental Results


Kit Cischke

09/09/08

CS 5090

Disco: Running Commodity Operating Systems on Scalable Multiprocessors


  • Background

    • What are we doing here?

  • A Return to Virtual Machine Monitors

    • What does Disco do?

  • Disco: A Return to VMMs

    • How does Disco do it?

  • Experimental Results

    • How well does Disco dance?

The Basic Problem

  • With the explosion of multiprocessor machines, especially of the NUMA variety, the problem of using them effectively becomes more immediate.

    • NUMA = Non-Uniform Memory Access – shows up a lot in clusters.

    • The authors point out that the problem applies to any major hardware innovation, not just multiprocessors.

Potential Solution

  • Solution: Rewrite the operating system to address fault-tolerance and scalability.

  • Flaws:

    • Rewriting will introduce bugs.

    • Bugs can disrupt the system or the applications.

    • Instabilities are usually less tolerated on these kinds of systems because of the applications they typically run.

    • You may not have access to the OS.

Not So Good

  • Okay. So that wasn’t so good. What else do we have?

  • How about Virtual Machine Monitors?

  • A new twist on an old idea, which may work better now that we have faster processors.

Enter Disco

Disco is a system VM that presents a similar fundamental machine to all of the various OS’s that might be running on the machine.

These can be commodity OS’s, uniprocessor, multiprocessor or specialty systems.

Disco VMM

  • Fundamentally, the hardware is a cluster, but Disco introduces some global policies to manage all of the resources, which makes for better usage of the hardware.

  • We’ll use commodity operating systems and write the VMM. Rather than millions of lines of code, we’ll write a few thousand.

  • What if an application’s resource needs exceed what a single commodity OS can handle?


  • Very simple changes to the commodity OS (maybe on the driver level or kernel extension) can allow virtual machines to share resources.

    • E.g., a parallel database could have a cache in shared memory and multiple virtual processors running on virtual machines.

  • Support for specialized OS’s that need the power of multiple processors but not all of the features offered by a commodity OS.

Further Benefits

  • Running multiple copies of an OS naturally addresses scalability and fault containment.

    • Need greater scaling? Add a VM.

    • Only the monitor and the system protocols (NFS, etc.) need to scale.

    • OS or application crashes? No problem. The rest of the system is isolated.

  • NUMA memory management issues are addressed.

  • Multiple versions of different OS’s provide legacy support and convenient upgrade paths.

Not All Sunshine & Roses

  • VMM Overhead

    • Additional exception processing, instruction execution and memory to virtualize hardware.

    • Privileged instructions aren’t directly executed on the hardware, so we need to fake it. I/O requests need to be intercepted and remapped.

    • Memory overhead is rough too.

      • Consider having 6 copies of Vista in memory simultaneously.

  • Resource Management

    • VMM can’t make intelligent decisions about code streams without info from OS.

One Last Disadvantage

  • Communication

    • Sometimes resources simply can’t be shared the way we want.

  • Most of these can be mitigated though.

    • For example, most operating systems have good NFS support. So use it.

      • But… We can make it even better! (Details forthcoming.)

Introducing Disco

  • VMM designed for the FLASH multiprocessor machine

    • FLASH is an academic machine designed at Stanford University

    • It is a collection of nodes, each containing a processor, memory, and I/O. It uses directory-based cache coherence, which makes it look like a CC-NUMA machine.

    • Has also been ported to a number of other machines.

Disco’s Interface

  • The virtual CPU of Disco is an abstraction of a MIPS R10000.

    • Not only emulates but extends it (e.g., reduces some kernel operations to simple load/store instructions).

  • Physical memory is presented as a contiguous address space starting at address 0 (zero).

  • I/O Devices

    • Disks, network interfaces, interrupts, clocks, etc.

    • Special interfaces for network and disks.

Disco’s Implementation

  • Implemented as a multi-threaded shared-memory program.

    • Careful attention paid to memory placement, cache-aware data structures and processor communication patterns.

  • Disco is only 13,000 lines of code.

    • Windows Server 2003 - ~50,000,000

    • Red Hat 7.1 - ~ 30,000,000

    • Mac OS X 10.4 - ~86,000,000

Disco’s Implementation

  • A virtual processor (VP) runs by direct execution on a real processor.

    • At each context switch, the real processor’s state is set to that of the VP being scheduled.

  • On MIPS, Disco runs in kernel mode and puts the processor in the appropriate mode for what’s being run.

    • Supervisor mode for OS, user mode for apps

  • Simple scheduler allows VP’s to be time-shared across the physical processors.
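
To make the time-sharing idea concrete, here is a minimal round-robin sketch in Python. It is only an illustration: the function name `time_share` and the fixed time slices are assumptions made for this example, and the slides do not describe the actual scheduling policy beyond calling it simple.

```python
from collections import deque


def time_share(virtual_cpus, num_physical, num_slices):
    """Return, for each time slice, the virtual CPUs that ran in it.

    Hypothetical round-robin policy: each slice, the first `num_physical`
    ready VPs run, then go to the back of the queue.
    """
    ready = deque(virtual_cpus)
    timeline = []
    for _ in range(num_slices):
        count = min(num_physical, len(ready))
        running = [ready.popleft() for _ in range(count)]
        timeline.append(running)
        ready.extend(running)  # preempted VPs rejoin the ready queue
    return timeline


# Three virtual processors sharing two physical processors.
schedule = time_share(["vp0", "vp1", "vp2"], num_physical=2, num_slices=3)
```

With three VPs and two physical processors, every VP gets to run in two of the three slices.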

Disco’s Implementation

  • Virtual Physical Memory

    • This discussion goes on for 1.5 pages. To sum up:

    • The OS makes requests to physical addresses, and Disco translates them to machine addresses.

    • Disco uses the hardware TLB for this.

    • Scheduling a different VP onto a physical processor requires a TLB flush, so Disco maintains a second-level software TLB to offset the performance hit.

    • One technical wrinkle: MIPS kernel code normally runs in an unmapped address segment (KSEG0) that bypasses the TLB, so IRIX had to be relinked to run in a mapped region that Disco can virtualize.
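
The translation path summarized above can be sketched roughly as follows. This is a simplified Python model, not Disco's code: the class names, the `guest_mappings` and `pmap` dictionaries, the 4 KB page size, and the LRU eviction in the second-level TLB are all assumptions made for illustration.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size for this sketch


class VirtualMachine:
    def __init__(self, pmap):
        # guest "physical" page -> real machine page (maintained by the monitor)
        self.pmap = dict(pmap)


class VirtualCPU:
    def __init__(self, vm, l2_capacity=64):
        self.vm = vm
        self.guest_mappings = {}   # virtual page -> guest physical page (set by guest OS)
        self.l2_tlb = OrderedDict()  # software second-level TLB: virtual -> machine page
        self.l2_capacity = l2_capacity


class PhysicalCPU:
    def __init__(self):
        self.tlb = {}          # stands in for the hardware TLB: virtual -> machine page
        self.current = None
        self.slow_walks = 0    # how often the full two-step translation was needed

    def schedule(self, vcpu):
        if vcpu is not self.current:
            self.tlb.clear()   # switching virtual CPUs flushes the hardware TLB
            self.current = vcpu

    def translate(self, vaddr):
        vpage, offset = divmod(vaddr, PAGE_SIZE)
        if vpage not in self.tlb:
            self.tlb[vpage] = self._refill(vpage)
        return self.tlb[vpage] * PAGE_SIZE + offset

    def _refill(self, vpage):
        vcpu = self.current
        if vpage in vcpu.l2_tlb:             # cheap refill after a flush
            vcpu.l2_tlb.move_to_end(vpage)
            return vcpu.l2_tlb[vpage]
        self.slow_walks += 1                  # physical -> machine translation
        ppage = vcpu.guest_mappings[vpage]
        mpage = vcpu.vm.pmap[ppage]
        vcpu.l2_tlb[vpage] = mpage
        if len(vcpu.l2_tlb) > vcpu.l2_capacity:
            vcpu.l2_tlb.popitem(last=False)   # evict least recently used entry
        return mpage
```

The point of the second-level TLB shows up when a VP is rescheduled: the hardware TLB is empty again, but the refill comes from `l2_tlb` instead of repeating the slow path.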

NUMA Memory Management

In an effort to mitigate the non-uniform memory access times of a NUMA machine, Disco does two things:

  • Allocates memory with “affinity” to the processor that uses it wherever possible.

  • Migrates or replicates pages across nodes to reduce remote memory accesses.
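
As a rough illustration of how a migrate-or-replicate decision might be made, here is a small Python sketch. The function `choose_action`, the per-node miss counters, and the threshold of 32 are hypothetical; the sketch assumes a policy in which pages used by a single node are migrated, read-shared pages are replicated, and write-shared pages are left alone.

```python
def choose_action(home_node, read_misses, write_misses, hot_threshold=32):
    """Pick an action for one page from per-node miss counts.

    read_misses / write_misses: dict mapping node id -> miss count.
    Returns ("migrate", node), ("replicate", [nodes]) or ("leave", None).
    """
    totals = {}
    for counts in (read_misses, write_misses):
        for node, count in counts.items():
            totals[node] = totals.get(node, 0) + count

    # Only act when some remote node is missing on this page a lot.
    remote_hot = sorted(n for n, c in totals.items()
                        if n != home_node and c >= hot_threshold)
    if not remote_hot:
        return ("leave", None)

    accessors = [n for n, c in totals.items() if c > 0]
    writers = [n for n, c in write_misses.items() if c > 0]

    if len(accessors) == 1:
        return ("migrate", accessors[0])   # one user: move the page to it
    if not writers:
        return ("replicate", remote_hot)   # read-shared: give hot nodes a copy
    return ("leave", None)                 # write-shared: moving would not help
```

For example, a page homed on node 0 but read heavily only by node 1 would be migrated to node 1, while a page read heavily by nodes 1 and 2 and never written would be replicated to both.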

Virtual I/O Devices

  • Obviously Disco needs to intercept I/O requests and direct them to the actual device.

  • Primarily handled by installing drivers for Disco I/O in the guest OS.

  • DMA provides an interesting challenge, in that the DMA addresses need the same translation as regular accesses.

  • However, we can do some especially cool things with DMA requests to disk.

Copy-on-Write Disks

  • All disk DMA requests are caught and analyzed. If the data is already in memory, we don’t have to go to disk for it.

  • If the request is for a full page, we just update a pointer in the requesting virtual machine.

  • So what?

    • Multiple VM’s can share data without being aware of it. Only modifying the data causes a copy to be made.

    • Awesome for scaling up apps by using multiple copies of an OS. Only really need one copy of the OS kernel, libraries, etc.
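
The sharing behavior described on this slide can be sketched as a toy Python model. Everything here is an assumption made for illustration (the class names, whole-page writes, the `mappings` table); it only shows the idea that a repeated disk read is satisfied by pointing a second VM at the page already in memory, and that a write triggers a private copy.

```python
class MachineMemory:
    def __init__(self):
        self.pages = {}      # machine page number -> page contents
        self.next_free = 0

    def alloc(self, data):
        mpn = self.next_free
        self.next_free += 1
        self.pages[mpn] = data
        return mpn


class CowDiskMonitor:
    def __init__(self, disk, memory):
        self.disk = disk         # block number -> contents
        self.memory = memory
        self.cached = {}         # block number -> machine page already holding it
        self.mappings = {}       # (vm, guest page) -> (machine page, writable)
        self.disk_reads = 0

    def dma_read(self, vm, block, guest_page):
        if block not in self.cached:
            self.disk_reads += 1   # only the first reader goes to disk
            self.cached[block] = self.memory.alloc(self.disk[block])
        # Later readers just get a read-only mapping to the shared page.
        self.mappings[(vm, guest_page)] = (self.cached[block], False)

    def write(self, vm, guest_page, data):
        mpn, writable = self.mappings[(vm, guest_page)]
        if not writable:
            # Copy-on-write: the writer gets a private page, others keep the original.
            mpn = self.memory.alloc(data)
            self.mappings[(vm, guest_page)] = (mpn, True)
        else:
            self.memory.pages[mpn] = data

    def read(self, vm, guest_page):
        mpn, _ = self.mappings[(vm, guest_page)]
        return self.memory.pages[mpn]
```

Two VMs reading the same block share one machine page and cause one disk read; memory use only grows when one of them writes.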

My Favorite – Networking

  • The Copy-on-write disk stuff is great for non-persistent disks. But what about persistent ones? Let’s just use NFS.

  • But here’s a dumb thing: A VM has a copy of information it wants to send to another VM on the same physical machine. In a naïve approach, we’d let that data be duplicated, taking up extra memory pointlessly.

  • So, let’s use copy-on-write for our network interface too!

Virtual Network Interface

  • Disco provides a virtual subnet for VM’s to talk to each other.

  • This virtual device is Ethernet-like, but with no maximum transfer size.

  • Transfers are accomplished by updating pointers rather than actually copying data (until absolutely necessary).

  • The OS sends out the requests as NFS requests.

  • “Ah,” you say, “but what about data locality as a VM starts accessing those files and memory?”

    • Page replication and migration!

About those Commodity OS’s

  • So what do we really need to do to get these commodity operating systems running on Disco?

  • Surprisingly a lot and a little.

    • Minor changes were needed to IRIX’s HAL, amounting to 2 header files and 15 lines of assembly code. This did lead to a full kernel recompile though.

    • Disco needs device drivers. Let’s just steal them from IRIX!

    • Don’t trap on every privileged register access. Convert these accesses into normal loads/stores to a special address range that is linked to the privileged registers.
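
The benefit of the last change can be illustrated with a small Python sketch that counts traps. The class, the method names, and the choice of `status` and `epc` as example registers are assumptions for this illustration; the real change lives in the IRIX HAL and the monitor.

```python
class VirtualCPU:
    def __init__(self):
        # Register values kept in a page that the guest can read directly.
        self.register_page = {"status": 0, "epc": 0}
        self.traps = 0

    def privileged_read(self, name):
        """Unpatched guest: the privileged instruction traps and the monitor emulates it."""
        self.traps += 1
        return self.register_page[name]

    def patched_read(self, name):
        """Patched guest: an ordinary load from the special address range, no trap."""
        return self.register_page[name]


unpatched = VirtualCPU()
patched = VirtualCPU()
for _ in range(1000):
    unpatched.privileged_read("status")
    patched.patched_read("status")
```

Both guests read the same values, but only the unpatched one pays for a trap into the monitor on every access.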

More Patching

  • “Hinting” added to HAL to help the VMM not do dumb things (or at least do fewer dumb things).

  • When the OS goes idle, the MIPS processor (usually) drops into a low-power mode. Disco just stops scheduling the VM until something interesting happens.

  • Other minor changes were made, but those required patching the kernel.


  • Some high-performance apps might need most or all of the machine. The authors wrote a “thin” operating system to run SPLASH-2 applications.

  • Mostly proof-of-concept.

Experimental Results

  • Bad Idea: Target your software for a machine that doesn’t physically exist.

    • Like, I don’t know, FLASH?

  • Disco was validated using two alternatives:

    • SimOS

    • An SGI Origin200 board that will form the basis of FLASH

Experimental Design

  • Use 4 representative workloads for parallel applications:

    • Software Development (Pmake of a large app)

    • Hardware Development (Verilog simulator)

    • Scientific Computing (Raytracing and a sorting algorithm)

    • Commercial Database (Sybase)

  • Not only are they representative, but they each have characteristics that are interesting to study

    • For example, Pmake is multiprogrammed, lots of short-lived processes, OS & I/O intensive.

Simplest Results Graph

Overhead of Disco is pretty modest compared to the uniprocessor results.

Raytrace is the lowest, at only 3%. Pmake is the highest, at 16%.

The main hits come from additional traps and TLB misses (from all the flushing Disco does).

Interestingly, less time is spent in the kernel in Raytrace, Engineering and Database.

Running a 64-bit system mitigates the impact of TLB misses.

Memory Utilization

The key thing here is that 8 VM’s don’t require 8x the memory of 1 VM.

Interestingly, we have 8 copies of IRIX running in less than 256 MB of physical RAM!


Page migration and replication were disabled for these runs.

All use 8 processors and 256 MB of memory.

IRIX has a terrible bottleneck in synchronizing the system’s memory management code.

It also has a “lazy” evaluation policy in the virtual memory system that drags “normal” RADIX down.

Overall though, check out those performance gains!

Page Migration Benefits

The 100% UMA results give a lower bound on execution time, and therefore an upper bound on the performance gains available from page migration and replication.

But in short, the policies work great.

Real Hardware

  • Experience on the real SGI hardware pretty much confirms the simulations, at least at the uniprocessor level.

  • Overheads tend to be in the range of 3-8% on Pmake and the Engineering simulation.

Summing Up

  • Disco works pretty well.

  • Memory usage and processor utilization both scale well.

  • Performance overheads are relatively small for most loads.

  • Lots of engineering challenges, but most seem to have been overcome.

Final Thoughts

  • Everything in this paper seems, in retrospect, to be totally obvious. However, the combination of all of these factors seems like it would have taken just a ton of work.

  • Plus, I don’t think I could have done it half as well, to be honest.

  • Targeting a non-existent machine seems a little silly.

  • Overall, interesting paper.