A Case for a Fault-Tolerant Virtual Machine

Andrey Ermolinskiy, Mohit Chawla

Objectives
• Build a fault-tolerant component that:
  • eliminates the problem of liveness ambiguity in a distributed system
  • prevents loss of state
• Example: a fault-tolerant lock manager node would make distributed locking nearly trivial.
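To illustrate why a fault-tolerant lock manager node makes distributed locking nearly trivial, here is a minimal sketch of such a manager. It assumes the manager runs inside the fault-tolerant VM, so it never has to handle its own failure or loss of state. The class and method names (`LockManager`, `acquire`, `release`) are hypothetical and not taken from the presentation, and the sketch does not address client failures.

```python
class LockManager:
    """Toy lock manager: a plain in-memory table of lock owners.

    Assumption (hypothetical): the hosting node is fault tolerant, so this
    table is never lost and the manager is always reachable.
    """

    def __init__(self):
        self._owners = {}  # lock name -> id of the client holding it

    def acquire(self, name, client):
        """Grant the lock if it is free or already held by this client."""
        holder = self._owners.get(name)
        if holder is not None and holder != client:
            return False
        self._owners[name] = client
        return True

    def release(self, name, client):
        """Release the lock only if this client currently holds it."""
        if self._owners.get(name) == client:
            del self._owners[name]
            return True
        return False
```

Without the fault-tolerance assumption, the same service would need its own replication and recovery logic for the owner table.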

State of the Art
• Chubby: a building block that simplifies the development of fault-tolerant distributed applications and services.
• GFS and Bigtable use Chubby to maintain critical application state.
• We extend this idea one step further: the VMM layer handles liveness detection, failure recovery, and state replication, and the user is given a highly fault-tolerant component.

Novelty
• We move the complexity of fault tolerance from the application into the virtual machine.
• The FTVM appears to external users as a single routable IP address and can be used to deploy a critical component of a distributed system.

Risks
• We may have to restrict ourselves to a simple case with only two virtual machines in total.
• Obtaining more machines is one of the problems.
• Modifying Xen to optimize the snapshot size may take a lot of time.

Plan
Week 10
- Finalize the design for selecting a new primary when a VM fails or slows down.
- Decide which approach to use to determine the liveness of the primary (heartbeats vs. the NIC approach).
Week 12
- Come up with a workload for the system.
- Either instrument Xen or run some daemons.
End of the Semester
- A clear idea of a good strategy for selecting a primary.
- Analysis of timeout selection in the simulated datacenter environment.
- Determine whether the approach really works and what the overhead of transferring the snapshot is.
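The heartbeat option mentioned in the plan could look roughly like the sketch below: the backup declares the primary dead if no heartbeat has arrived within a timeout. This is only an illustration under assumptions; the class name `HeartbeatDetector`, its methods, and the injectable `clock` parameter are hypothetical, and the presentation does not specify how heartbeats are transported or how the timeout is chosen.

```python
import time


class HeartbeatDetector:
    """Timeout-based liveness check for a primary (illustrative only)."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout         # seconds of silence tolerated
        self.clock = clock             # injectable so tests can fake time
        self.last_seen = self.clock()  # treat startup as the first heartbeat

    def heartbeat(self):
        """Record that a heartbeat from the primary has just arrived."""
        self.last_seen = self.clock()

    def primary_alive(self):
        """Return True while the last heartbeat is within the timeout."""
        return (self.clock() - self.last_seen) <= self.timeout
```

A short timeout detects failures quickly but risks mistaking a slow primary for a dead one, which is the trade-off the planned timeout analysis would need to examine.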
