A Case for a Fault-Tolerant Virtual Machine
Andrey Ermolinskiy, Mohit Chawla

Objectives
Building a fault-tolerant component to:
- Eliminate the problem of liveness ambiguity in a distributed system
- Prevent loss of state
A building block that simplifies the development of fault-tolerant distributed applications and services
GFS and Bigtable use Chubby to maintain critical application state
We extend this idea one step further: the VMM layer handles liveness detection, failure recovery, and state replication, so the user is provided with a highly fault-tolerant component
- Finalize the design for selecting a new primary when a VM fails or slows down.
- Decide which approach to use to determine the liveness of the primary (heartbeats vs. the NIC approach)
- Come up with a workload for the system
- Either instrument Xen or run some daemons
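
As a rough illustration of the heartbeat option mentioned above, the following Python sketch shows a timeout-based liveness monitor that could run as a daemon. The class name `HeartbeatMonitor`, its methods, the VM identifiers, and the rule for picking a primary (first unsuspected candidate in priority order) are all assumptions made for illustration; the slides do not specify a design, and the NIC approach is not covered here.

```python
import time


class HeartbeatMonitor:
    """Hypothetical timeout-based liveness monitor (not the authors' design)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s   # silence longer than this means "suspected"
        self.clock = clock           # injectable clock, useful for simulation
        self.last_seen = {}          # vm_id -> time of the last heartbeat

    def record_heartbeat(self, vm_id):
        """Note that a heartbeat from vm_id arrived now."""
        self.last_seen[vm_id] = self.clock()

    def is_suspected(self, vm_id):
        """Return True if vm_id was never heard from or has been silent too long."""
        last = self.last_seen.get(vm_id)
        if last is None:
            return True
        return self.clock() - last > self.timeout_s

    def select_primary(self, candidates):
        """Return the first candidate not currently suspected, or None."""
        for vm_id in candidates:
            if not self.is_suspected(vm_id):
                return vm_id
        return None
```

A monitor like this cannot tell a failed VM from a slow one, which is the liveness ambiguity named in the objectives; the timeout value only trades detection delay against false suspicions.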
End of the Semester
- Have a clear strategy for selecting a primary.
- Analysis of the timeout selection in the simulated datacenter environment
- Evaluate whether the approach really works and measure the overhead of transferring the snapshot
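
One way the timeout analysis could be set up in simulation is sketched below in Python. Everything here is an assumption for illustration: the function names, the model of heartbeat gaps as a fixed sending period plus exponentially distributed jitter, and the choice of false-suspicion rate as the metric. No measured numbers are implied.

```python
import random


def simulate_gaps(n, period_s, mean_jitter_s, seed=0):
    """Generate n gaps between consecutive heartbeats from a healthy primary.

    Assumed model: each gap is the sending period plus exponential jitter.
    """
    rng = random.Random(seed)
    return [period_s + rng.expovariate(1.0 / mean_jitter_s) for _ in range(n)]


def false_suspicion_rate(gaps, timeout_s):
    """Fraction of gaps that exceed the timeout, i.e. a live primary is suspected."""
    if not gaps:
        return 0.0
    return sum(1 for g in gaps if g > timeout_s) / len(gaps)


def sweep_timeouts(gaps, timeouts):
    """Evaluate the false-suspicion rate for each candidate timeout."""
    return [(t, false_suspicion_rate(gaps, t)) for t in timeouts]


if __name__ == "__main__":
    gaps = simulate_gaps(n=10_000, period_s=1.0, mean_jitter_s=0.2, seed=1)
    for timeout, rate in sweep_timeouts(gaps, [1.1, 1.5, 2.0, 3.0]):
        print(f"timeout={timeout:.1f}s false_suspicion_rate={rate:.4f}")
```

A longer timeout lowers the false-suspicion rate but delays detection of a real failure by roughly the same amount, so the sweep is only half of the trade-off the analysis would need to cover.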