Experiences with Formal Specifications of Fault-Tolerant File Systems

Roxana Geambasu (University of Washington), Andrew Birrell (Microsoft Research), John MacCormick (Dickinson College)

Fault-Tolerant File Systems (FTFSs) • FTFSs are crucial components in today’s datacenters • They underlie most of what we do on the Web • Dependability & correctness of FTFSs are paramount [Figure: Web services (Google Earth, Google Analytics, Amazon services) shown with the systems Google File System (GFS), Niobe, and Dynamo]

FTFSs Are Extremely Complex • Contain sophisticated protocols for: • replica consistency, • recovery (replica addition to compensate for failures), • reconfiguration (replica removal due to failure), • load balancing, etc. • Hence, FTFS protocols and implementations are hard to get right

Formal Methods (FM) • Formal methods have been used extensively to increase trust in complex systems • Formal specification languages are unambiguous • Model checking and formal proofs are reliable • However, FTFS designers still rely solely on prose and intuitive reasoning • Prose may be ambiguous, inaccurate • Intuitive reasoning may be faulty

FTFS Design and Analysis Challenges Without formal methods, it is hard to: • Understand FTFS behavior and semantics • Intuitive reasoning is hard and error-prone • Explore alternative designs • Alternative designs may affect semantics in complex ways • Compare various FTFSs • Prose is ambiguous and code bases are huge (tens of thousands of lines of code)

Goal: Convince FTFS Builders to Use FM • Previous studies showed how and for what purposes to use FM for many classes of systems, e.g.: • Local/distributed FSs, processor caches, TCP congestion • Our work: • Shows how and for what purposes to use FM for another specific class of important systems: fault-tolerant file systems • Identifies convenient ways in which FM help in understanding, designing & comparing FTFSs

Our Experience • We wrote TLA+ specifications for three protocols: • Chain replication (Cornell University) • Niobe (Microsoft) • GFS (Google) • Our experience shows that FM help solve FTFS challenges: • Comparing system mechanisms & tradeoffs • Understanding and proving semantics • Exploring alternative designs

Outline • Specification effort • Experiences with formal specifications for FTFS: • Comparing system mechanisms • Understanding and proving semantics • Exploring alternative designs • Conclusions

Specification Effort Question: How hard is it to build specifications? Answer: Moderately precise specifications are reasonably easy to produce

1. Comparing System Mechanisms • Case study: GFS vs. Niobe • From prose, they seemed very different systems • GFS: trades some consistency for throughput • Niobe: designed for strong consistency • Our TLA+ specifications highlight significant mechanism overlap and also key differences

Capturing Similarities & Differences • More than half of the TLA+ code-base is common • Specifications are small due to TLA+ expressiveness • Compare their total sizes to the tens of thousands of LOC of the systems’ implementations [Figure: specification sizes. Common part (single-master, primary-secondary replication): 291 lines; Niobe: 189 lines; GFS: 287 lines]
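The "more than half" claim can be checked with a few lines of arithmetic. This is a minimal sketch, assuming the 189 and 287 figures are the system-specific parts that sit on top of the 291 common lines, and assuming 189 belongs to Niobe and 287 to GFS (the flattened figure does not make the pairing explicit). The names `COMMON`, `NIOBE_ONLY`, `GFS_ONLY`, and `shared_fraction` are illustrative only.

```python
# Line counts as read from the slide. The Niobe/GFS pairing is an assumption.
COMMON = 291       # shared single-master, primary-secondary replication part
NIOBE_ONLY = 189   # assumed: Niobe-specific lines
GFS_ONLY = 287     # assumed: GFS-specific lines


def shared_fraction(common: int, specific: int) -> float:
    """Fraction of one system's full specification made up by the common part."""
    return common / (common + specific)


niobe_share = shared_fraction(COMMON, NIOBE_ONLY)
gfs_share = shared_fraction(COMMON, GFS_ONLY)
print(f"Niobe: {niobe_share:.1%} common, GFS: {gfs_share:.1%} common")
```

Under either pairing of the two numbers, the common part is more than half of each system's specification.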

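Differences Stand Out Clearly in TLA+ • Example: Write completion in GFS and Niobe [Figure: write completion diagrams for the two systems, each showing a write w sent to nodes numbered 1 to 4 with ACK replies; one diagram includes a group reconfiguration step]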

Understanding Tradeoffs • Example: Write completion in GFS and Niobe • GFS: smaller write latency, but a write may leave the replica group inconsistent • Niobe: a write never leaves the replica group in an inconsistent state
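The tradeoff can be illustrated with a toy model. This is a heavily simplified sketch and not either system's actual protocol: `Replica`, `write_report_error`, and `write_reconfigure` are hypothetical names, the first function stands in for a GFS-like completion that reports an error and leaves the group untouched, and the second stands in for a Niobe-like completion that reconfigures the group before completing the write.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    value: str = "v0"
    alive: bool = True

    def apply(self, value: str) -> bool:
        """Apply a write and return True as the ACK, or False if the replica is down."""
        if not self.alive:
            return False
        self.value = value
        return True


def write_report_error(group, value):
    """GFS-like completion (assumed, simplified): report the failure, keep the group."""
    acks = [r.apply(value) for r in group]
    return ("ok" if all(acks) else "error"), group


def write_reconfigure(group, value):
    """Niobe-like completion (assumed, simplified): drop replicas that did not ACK."""
    survivors = [r for r in group if r.apply(value)]
    if not survivors:
        return "error", group
    return "ok", survivors


def consistent(group) -> bool:
    """True when every replica in the group holds the same value."""
    return len({r.value for r in group}) <= 1


def make_group():
    return [Replica("r1"), Replica("r2"), Replica("r3", alive=False), Replica("r4")]


status_a, group_a = write_report_error(make_group(), "v1")
status_b, group_b = write_reconfigure(make_group(), "v1")
print(status_a, consistent(group_a))  # error False
print(status_b, consistent(group_b))  # ok True
```

The first variant returns as soon as the ACKs are collected but leaves one replica holding the old value. The second pays for an extra reconfiguration step and ends with a smaller but consistent group.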

Lesson: Formalism Helps in Comparison • Formal specifications distill key differences and similarities between systems • Understanding the key differences enables us to understand tradeoffs

2. Understanding FTFS Consistency • Hard to prove consistency models for FTFSs • For weakly consistent systems, it can be even harder • Solution: use refinement mapping • Step 1: Reduce the system to a really simple model (the SimpleStore) • Step 2: Prove the correctness of the reduction • Step 3: Reason about the SimpleStore • For convenience, we use model checking instead of full manual proofs at Step 2 [Figure: the System reduces to a SimpleStore; the consistency model shown for the SimpleStore carries over to the System]
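The reduction step can be sketched as an exhaustive check over a tiny model. The sketch below is Python rather than TLA+, and everything in it is an assumption made for illustration: a two-replica register (primary, backup) stands in for the system, a single register stands in for the SimpleStore, and `check_refinement` is a hypothetical helper. It verifies that every concrete step maps either to the matching SimpleStore step or to a stuttering step.

```python
from collections import deque

VALUES = (0, 1)


def concrete_steps(state):
    """Yield (label, next_state) for each enabled action of the toy system."""
    primary, backup = state
    for v in VALUES:
        yield ("write", v), (v, backup)   # client-visible write lands on the primary
    yield None, (primary, primary)        # internal step: the backup catches up


def abstract_step(value, label):
    """SimpleStore transition: write(v) replaces the single stored value."""
    _kind, v = label
    return v


def check_refinement(refinement_map):
    """Explore all reachable states; return a violating step or None."""
    init = (0, 0)
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        for label, t in concrete_steps(s):
            a, b = refinement_map(s), refinement_map(t)
            if label is None:
                ok = a == b                         # internal steps must stutter
            else:
                ok = abstract_step(a, label) == b   # visible steps must match
            if not ok:
                return (s, label, t)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None


def by_primary(state):
    return state[0]


def by_backup(state):
    return state[1]


print(check_refinement(by_primary))  # None: the mapping is a valid reduction
print(check_refinement(by_backup))   # a counterexample step
```

Mapping the SimpleStore value to the primary's value passes the check. Mapping it to the backup's value fails on the first write that changes the primary, which is the kind of mistake a model checker catches mechanically.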

SimpleStores for the Three FTFSs • SimpleStores capture only client-visible behaviors and abstract out all protocol mechanisms • SimpleStores are easy to reason about [Figure: Chain, Niobe, and GFS each reduce to their own SimpleStore: Chain_SS, Niobe_SS, and GFS_SS]

Chain’s Consistency Semantics • Chain_SS is linearizable; the proof is straightforward (half a page) • Chain reduces to Chain_SS, so Chain is linearizable as well • Using convenient methods, we gained reliable insight into Chain’s consistency model

Niobe’s Consistency Semantics • Similar experience as with Chain • Thus, formal methods help in verifying standard consistency models for strongly-consistent FTFSs [Figure: Chain_SS and Niobe_SS are linearizable and, by reduction, so are Chain and Niobe; GFS_SS and GFS are still marked "??"]

GFS’ Consistency Semantics • Formal methods proved helpful in several ways • An interesting conclusion (details in the paper): • Using refinement mappings, we were able to show that, under a small set of assumptions, GFS has regular-register semantics [Figure: GFS under these assumptions reduces to GFS_SS; both have regular register semantics, a well-defined intermediate-level consistency model]

Lesson: Formalism Helps Understand Semantics • Refinement mappings help in understanding & reliably verifying consistency models of FTFSs • They are useful for both strongly consistent and weakly consistent FTFSs

3. Exploring Alternative Designs • Exploring alternative designs is much easier using our framework (TLA+ specs, SimpleStores, reductions) [Figure: the framework, showing a System model, its SimpleStore, and the reduction between them]

Case-Study: Changing Niobe’s Design • Currently, Niobe’s clients read from primary only • Reading from any replica may improve throughput • Design question: What happens to Niobe if it adopts read-any policy? [Figure: Niobe with read-any shown alongside Niobe_SS (linearizable) and GFS_SS (regular register); GFS under its assumptions is marked regular register]

Conclusions • FTFSs are extremely important in today’s Web • We showed how formal methods can help improve our understanding and trust in FTFSs • Lessons from our experience with three FTFSs: • Writing formal specifications is relatively easy • Formal methods enable: • Insightful comparison of mechanisms & tradeoffs • Reliable verification of consistency properties • Convenient investigation of alternative designs

Appendix

Related Work • FM are extensively used to reason about software [Bickford et al., 96] and hardware [Shimizu et al., 02] • However, FTFS builders have not adopted them yet • By sharing our experience, we hope to convince FTFS builders of the utility of specifying their systems formally • Using FM to improve understanding and trust in systems: • Previous works apply FM to various classes of systems: [Chkliaev et al., 00], [Crow et al., 98], [Joshi et al., 03], [Houston et al., 91] • The closest works are those looking at distributed FS (AFS, Coda) [Sivathanu et al., 05], [Wing et al., 97], [Yang et al., 04] • We show how to apply them in the specific context of FTFS • Reducing complex systems to simple ones in order to reason about semantics has been used before [Joshi et al., 03] • We apply this method to FTFSs

GFS Assumptions • If: • A write never crosses chunk boundaries • GFS client library offers chunk-level operations • A write never goes to a stale replica • Implement this assumption using a lease mechanism • Then: GFS under these assumptions reduces to GFS_SS, and both have regular register semantics

Standard Consistency Models • Linearizability (atomic register semantics) • Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S • The sequential interleaving S preserves the real-time ordering of operations from H • Serializability • Any client-visible history H generated by the system is equivalent to a legal sequential interleaving S • Regular register semantics • A read not concurrent with any write returns the most recently written value • A read concurrent with some writes returns either the value of one of the in-progress writes or the most recently written value • Safe register semantics • A read not concurrent with any write returns the most recently written value • A read concurrent with some writes can return anything
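The regular register definition above can be turned into a small history checker. This is a minimal sketch under stated assumptions: operations carry start and end timestamps, writes do not overlap one another (so "most recently written" is well defined), and the names `Op`, `allowed_values`, and `is_regular` are illustrative rather than taken from the original work.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Op:
    kind: str     # "write" or "read"
    value: int    # value written, or value returned by the read
    start: float
    end: float


def allowed_values(read: Op, writes: List[Op], initial: int) -> set:
    """Values a regular register may return for this read."""
    completed = [w for w in writes if w.end < read.start]
    concurrent = [w for w in writes
                  if w.start <= read.end and w.end >= read.start]
    latest = max(completed, key=lambda w: w.end).value if completed else initial
    return {latest} | {w.value for w in concurrent}


def is_regular(history: List[Op], initial: int = 0) -> bool:
    """Check every read in the history against the regular register rule."""
    writes = [op for op in history if op.kind == "write"]
    reads = [op for op in history if op.kind == "read"]
    return all(r.value in allowed_values(r, writes, initial) for r in reads)


writes = [Op("write", 1, 0, 1), Op("write", 2, 2, 5)]
overlapping_old = writes + [Op("read", 1, 3, 4)]   # overlaps write(2), returns old value
overlapping_new = writes + [Op("read", 2, 3, 4)]   # overlaps write(2), returns new value
stale_after = writes + [Op("read", 1, 6, 7)]       # after write(2) completed, returns 1

print(is_regular(overlapping_old), is_regular(overlapping_new), is_regular(stale_after))
```

A safe register would accept any value for the two overlapping reads, while linearizability would additionally constrain how the results of several overlapping reads relate to each other.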

Summary of Contributions • Identified a new important class of extremely complex systems: FTFSs • Showed three aspects of FTFS design & analysis for which FM prove especially valuable • Mechanism comparison, semantics understanding, and design space exploration • Showed how to apply specific FMs to FTFSs • Showed how to construct SimpleStores and what can be learned from them • SimpleStores are reusable between systems • We believe that our study, tailored toward FTFSs, can be more relevant to FTFS designers than more general studies

Lessons from Our Experience • Building high-level specifications for FTFS is relatively easy • It is also remarkably useful for understanding the system • The exercise of writing specifications exposes similarities in seemingly dissimilar systems (GFS, Niobe) • Formal specifications also distill the key design differences • Specifications enable convenient verification of consistency for both strongly and weakly consistent systems • Niobe and Chain are both linearizable • GFS can be upgraded to regular register via a clear set of assumptions • GFS’ design to read from any replica heavily influences its consistency • Intuition can often fail • Niobe seemed to be reducible to Chain_SS, but actually was not

Chain SimpleStore [Figure: Chain_SS. Read requests (r1, r2, r3) arrive on a read channel and write requests (w5, w6, w7) on a write channel; the labeled actions are read(), commit(w5), and drop(w7), acting on a SerialDB that produces the responses]
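One way to read the Chain_SS figure is as a very small state machine. The sketch below is an interpretation, not the authors' specification: it assumes SerialDB holds a single value, that any pending write may be committed or dropped, and that reads are answered from SerialDB. The class and method names (`SimpleStore`, `submit_write`, `commit`, `drop`, `read`) are hypothetical.

```python
class SimpleStore:
    """Toy Chain_SS: a serial store plus a channel of pending writes."""

    def __init__(self, initial=None):
        self.serial_db = initial     # assumed: one committed value
        self.write_channel = []      # writes received but not yet decided

    def submit_write(self, value):
        """A client write request enters the write channel."""
        self.write_channel.append(value)

    def commit(self, value):
        """Apply a pending write to the serial store."""
        self.write_channel.remove(value)   # raises ValueError if not pending
        self.serial_db = value

    def drop(self, value):
        """Discard a pending write without applying it."""
        self.write_channel.remove(value)

    def read(self):
        """Answer a read from the serial store."""
        return self.serial_db


store = SimpleStore(initial="w0")
for w in ("w5", "w6", "w7"):
    store.submit_write(w)
store.commit("w5")
store.drop("w7")
print(store.read(), store.write_channel)
```

Because a model like this has no replicas, chains, or failure handling, client-visible properties such as linearizability are much easier to argue about than on the full protocol.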

The Temporal Logic of Actions (TLA+) • Formalism that combines a temporal logic with a logic of actions • Especially designed for specification of distributed asynchronous systems • TLA+ specifications model the system as a state machine: • Define system variables (state) • Model actions that the system can take as state transitions
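The state-machine view can be mimicked in a few lines of ordinary code. The sketch below is Python, not TLA+, and does not use any TLA+ tool: the state is a tuple of variables, each action is a function from a state to its possible successors, and a breadth-first search plays the role of a model checker for one invariant. The toy system (a primary and a backup version number) and all names are assumptions chosen for illustration.

```python
from collections import deque

MAX_VERSION = 3   # bound that keeps the state space finite


def init_states():
    yield (0, 0)  # (primary_version, backup_version)


def write(state):
    """Action: the primary accepts a new write."""
    primary, backup = state
    if primary < MAX_VERSION:
        yield (primary + 1, backup)


def replicate(state):
    """Action: the backup catches up with the primary."""
    primary, backup = state
    if backup < primary:
        yield (primary, primary)


ACTIONS = (write, replicate)


def invariant(state):
    """The backup never gets ahead of the primary."""
    primary, backup = state
    return backup <= primary


def model_check():
    """Return (violating_state_or_None, set_of_reachable_states)."""
    seen = set(init_states())
    frontier = deque(seen)
    while frontier:
        s = frontier.popleft()
        if not invariant(s):
            return s, seen
        for action in ACTIONS:
            for t in action(s):
                if t not in seen:
                    seen.add(t)
                    frontier.append(t)
    return None, seen


violation, reachable = model_check()
print(violation, len(reachable))
```

In TLA+ the variables, the initial predicate, and the actions would be written as logical formulas, and a model checker would explore the reachable states in a similar way.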

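Understanding Tradeoffs • GFS: smaller write latency, but writes may leave the replica group inconsistent • Niobe: a write never leaves the replica group in an inconsistent state [Figure: reads issued against nodes 1 to 4 in the two designs, with outcomes labeled "Old value" and "Error"]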
