
A Case for a Fault-Tolerant Virtual Machine

Andrey Ermolinskiy, Mohit Chawla. Objectives: build a fault-tolerant component that eliminates the problem of liveness ambiguity in a distributed system and prevents loss of state.


Presentation Transcript


  1. A Case for a Fault-Tolerant Virtual Machine. Andrey Ermolinskiy, Mohit Chawla

  2. Objectives
     • Build a fault-tolerant component that:
       – eliminates the problem of liveness ambiguity in a distributed system
       – prevents loss of state
     • For example, a fault-tolerant lock manager node would make distributed locking nearly trivial.

  3. State of the Art
     • Chubby: a building block that simplifies the development of fault-tolerant distributed applications and services.
     • GFS and Bigtable use Chubby to maintain critical application state.
     • We extend this idea one step further: the VMM layer handles liveness detection, failure recovery, and state replication, so the user is given a highly fault-tolerant component.

  4. Novelty
     • We move the complexity of fault tolerance out of the application and into the virtual machine.
     • The FTVM appears to external users as a single routable IP address and can be used to deploy a critical component of a distributed system.

  5. Risks
     • We may have to restrict ourselves to the simple case of only two virtual machines in total.
     • Obtaining more machines is one of the problems.
     • Modifying Xen to optimize the snapshot size may take a lot of time.

  6. Plan
     • Week 10
       – Finalize the design for selecting the primary when a VM fails or slows down.
       – In particular, decide which approach to use to determine the liveness of the primary (heartbeats vs. the NIC approach).
     • Week 12
       – Come up with a workload for the system.
       – Either instrument Xen or run some daemons.
     • End of the semester
       – Have a clear idea of a good strategy for selecting a primary.
       – Analyze timeout selection in the simulated datacenter environment.
       – Determine whether the approach really works and what the overhead of transferring the snapshot is.
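The heartbeat option mentioned in the plan can be illustrated with a small sketch. This is not the authors' design: the class `HeartbeatDetector`, the function `choose_primary`, the node names, and the 3-second timeout are all hypothetical, and timestamps are passed in explicitly so the example stays deterministic. It shows only the basic idea of suspecting a VM after a period of silence and promoting the first candidate that still looks alive.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class HeartbeatDetector:
    """Timeout-based liveness detector (hypothetical helper, not from the slides)."""

    timeout: float  # seconds of silence before a node is suspected (assumed value)
    last_seen: Dict[str, float] = field(default_factory=dict)

    def record_heartbeat(self, node: str, now: float) -> None:
        # Remember the most recent time a heartbeat arrived from this node.
        self.last_seen[node] = now

    def is_suspected(self, node: str, now: float) -> bool:
        # A node that was never heard from, or that has been silent for
        # longer than the timeout, is suspected of having failed or slowed down.
        last = self.last_seen.get(node)
        return last is None or (now - last) > self.timeout


def choose_primary(
    detector: HeartbeatDetector, candidates: List[str], now: float
) -> Optional[str]:
    """Return the first candidate that is not suspected, or None if all are."""
    for node in candidates:
        if not detector.is_suspected(node, now):
            return node
    return None


# Usage: vm-a stops sending heartbeats after t=0, vm-b keeps sending them.
detector = HeartbeatDetector(timeout=3.0)
detector.record_heartbeat("vm-a", now=0.0)
detector.record_heartbeat("vm-b", now=0.0)
detector.record_heartbeat("vm-b", now=4.0)
print(choose_primary(detector, ["vm-a", "vm-b"], now=5.0))  # vm-b
```

The timeout is the parameter the plan proposes to analyze: a short timeout reacts quickly but may wrongly suspect a VM that is merely slow, while a long timeout delays failover. A suspected node is not necessarily dead, which is the liveness ambiguity the objectives refer to.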
