Experiences with Formal Specifications of Fault-Tolerant File Systems

Roxana Geambasu (University of Washington), Andrew Birrell (Microsoft Research), John MacCormick (Dickinson College). Fault-tolerant file systems (FTFSs) are crucial components in today’s datacenters.

Presentation Transcript


  1. Experiences with Formal Specifications of Fault-Tolerant File Systems. Roxana Geambasu (University of Washington), Andrew Birrell (Microsoft Research), John MacCormick (Dickinson College)

  2. Fault-Tolerant File Systems (FTFSs) • FTFSs are crucial components in today’s datacenters • They underlie most of what we do on the Web • Dependability & correctness of FTFSs are paramount [Figure: web services (Google Earth, Google Analytics, Amazon services) shown with the storage systems Google File System (GFS), Niobe, and Dynamo]

  3. FTFSs Are Extremely Complex • Contain sophisticated protocols for: • replica consistency, • recovery (replica addition to compensate for failures), • reconfiguration (replica removal due to failure), • load balancing, etc. • Hence, FTFS protocols and implementation are hard to get right

  4. Formal Methods (FM) • Formal methods have been used extensively to increase trust in complex systems • Formal specification languages are unambiguous • Model checking and formal proofs are reliable • However, FTFS designers still rely solely on prose and intuitive reasoning • Prose may be ambiguous, inaccurate • Intuitive reasoning may be faulty

  5. FTFS Design and Analysis Challenges Without formal methods, it is hard to: • Understand FTFS behavior and semantics • Intuitive reasoning is hard and error-prone • Explore alternative designs • Alternative designs may affect semantics in complex ways • Compare various FTFSs • Prose is ambiguous and code bases are huge (tens of thousands of lines of code)

  6. Goal: Convince FTFS Builders to Use FM • Previous studies showed how and for what purposes to use FM for many classes of systems, e.g.: • Local/distributed FSs, processor caches, TCP congestion • Our work: • Shows how and for what purposes to use FM for another specific class of important systems: fault-tolerant file systems • Identifies convenient ways in which FM help in understanding, designing & comparing FTFSs

  7. Our Experience • We wrote TLA+ specifications for three protocols: • Chain replication (Cornell University) • Niobe (Microsoft) • GFS (Google) • Our experience shows that FM help solve FTFS challenges: • Comparing system mechanisms & tradeoffs • Understanding and proving semantics • Exploring alternative designs

  8. Outline • Specification effort • Experiences with formal specifications for FTFS: • Comparing system mechanisms • Understanding and proving semantics • Exploring alternative designs • Conclusions

  9. Specification Effort Question: How hard is it to build specifications? Answer: Moderately precise specifications are reasonably easy to produce

  10. Outline • Specification effort • Experiences with formal specifications for FTFS: • Comparing system mechanisms • Understanding and proving semantics • Exploring alternative designs • Conclusions

  11. 1. Comparing System Mechanisms • Case study: GFS vs. Niobe • From the prose descriptions, they seem to be very different systems • GFS: trades some consistency for throughput • Niobe: designed for strong consistency • Our TLA+ specifications highlight significant mechanism overlap and also key differences

  12. Capturing Similarities & Differences • More than half of the TLA+ code-base is common • Specifications are small due to TLA+ expressiveness • Compare their total sizes to the tens of thousands of LOC of the systems’ implementations [Figure: specification sizes. Common part (single-master, primary-secondary replication): 291 lines; Niobe: 189 lines; GFS: 287 lines]

  13. Differences Stand Out Clearly in TLA+ • Example: Write completion in GFS and Niobe [Figure: write (w) and ACK messages among nodes 1 to 4]

  14. Differences Stand Out Clearly in TLA+ • Example: Write completion in GFS and Niobe [Figure: write (w) and ACK messages among nodes 1 to 4, drawn once per system; one of the two diagrams is labeled Group reconfiguration]

  15. Understanding Tradeoffs • Example: Write completion in GFS and Niobe • GFS: smaller latency, but writes may leave the group inconsistent • Niobe: a write never leaves the replica group in an inconsistent state

  16. Lesson: Formalism Helps in Comparison • Formal specifications distill key differences and similarities between systems • Understanding the key differences enables us to understand tradeoffs

  17. Outline • Specification effort • Experiences with formal specifications for FTFS: • Comparing system mechanisms • Understanding and proving semantics • Exploring alternative designs • Conclusions

  18. 2. Understanding FTFS Consistency • Hard to prove consistency models for FTFSs • For weakly consistent systems, it can be even harder • Solution: use refinement mapping • Step 1: Reduce the system to a really simple model (the SimpleStore) • Step 2: Prove the correctness of the reduction • Step 3: Reason about the SimpleStore • For convenience, we use model checking instead of full manual proofs at Step 2 [Figure: the System is reduced to a SimpleStore; each is labeled with its consistency model]
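
The reduction step can be illustrated with a small, purely hypothetical sketch in Python. It is not the authors' TLA+ specification: the two-replica system, the one-register SimpleStore, and the names `system_steps`, `simple_store_steps`, `to_simple_store`, and `check_refinement` are all assumptions made for illustration. The check explores every reachable system state and confirms that each system step maps either to a SimpleStore step or to a stuttering step (the abstract state does not change).

```python
VALUES = (0, 1)  # assumed tiny value domain, to keep the state space finite


def system_steps(state):
    """Hypothetical two-replica system; state is (head, tail)."""
    head, tail = state
    for v in VALUES:
        yield (v, tail)      # a client write reaches the head
    yield (head, head)       # the head forwards its value to the tail


def simple_store_steps(value):
    """Hypothetical SimpleStore: one register, writes take effect atomically."""
    for v in VALUES:
        yield v


def to_simple_store(state):
    """Mapping (assumed): clients read the tail, so the tail is the abstract value."""
    return state[1]


def check_refinement(initial, steps, abstract_steps, mapping):
    """Return None if every reachable step is allowed, else the offending step."""
    seen, frontier = {initial}, [initial]
    while frontier:
        s = frontier.pop()
        for t in steps(s):
            a, b = mapping(s), mapping(t)
            # A step is fine if it stutters (a == b) or matches an abstract step.
            if a != b and b not in set(abstract_steps(a)):
                return (s, t)
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return None


result = check_refinement((0, 0), system_steps, simple_store_steps, to_simple_store)
```

Here `result` is `None`, meaning no system step escapes the SimpleStore's behaviors. Exhaustive exploration of a small instance plays the role that model checking plays at Step 2; it does not replace a proof for unbounded instances.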

  19. SimpleStores for the Three FTFSs • SimpleStores capture only client-visible behaviors and abstract out all protocol mechanisms • SimpleStores are easy to reason about [Figure: Chain reduces to Chain_SS, Niobe to Niobe_SS, and GFS to GFS_SS]

  20. Chain’s Consistency Semantics • Chain_SS is linearizable; the proof is straightforward (half a page) • Chain reduces to Chain_SS, so Chain is linearizable as well • Using convenient methods, we gained reliable insight into Chain’s consistency model

  21. Niobe’s Consistency Semantics • Similar experience as with Chain • Thus, formal methods help in verifying standard consistency models for strongly consistent FTFSs [Figure: Chain reduces to Chain_SS and Niobe to Niobe_SS, all four labeled linearizable; GFS and GFS_SS are still marked ??]

  22. GFS’ Consistency Semantics • Formal methods proved helpful in several ways • An interesting conclusion (details in the paper): • Using refinement mappings, we were able to show that, under a small set of assumptions, GFS has regular-register semantics, a well-defined intermediate-level consistency model [Figure: GFS with assumptions reduces to GFS_SS; both are labeled regular register]

  23. Lesson: Formalism Helps Understand Semantics • Refinement mappings help in understanding & reliably verifying the consistency models of FTFSs • They are useful for both strongly consistent and weakly consistent FTFSs

  24. Outline • Specification effort • Experiences with formal specifications for FTFS: • Comparing system mechanisms • Understanding and proving semantics • Exploring alternative designs • Conclusions

  25. 3. Exploring Alternative Designs • Exploring alternative designs is much easier using our framework (TLA+ specs, SimpleStores, reductions) [Figure: a System and its model, reduced to a SimpleStore]

  26. Case Study: Changing Niobe’s Design • Currently, Niobe’s clients read from the primary only • Reading from any replica may improve throughput • Design question: What happens to Niobe if it adopts a read-any policy? [Figure labels: Niobe_SS (linearizable), GFS_SS (regular register), Niobe with read-any, GFS with assumptions (regular register), Chain, ?]

  27. Conclusions • FTFSs are extremely important in today’s Web • We showed how formal methods can help improve our understanding of and trust in FTFSs • Lessons from our experience with three FTFSs: • Writing formal specifications is relatively easy • Formal methods enable: • Insightful comparison of mechanisms & tradeoffs • Reliable verification of consistency properties • Convenient investigation of alternative designs

  28. Appendix

  29. Related Work • FM are extensively used to reason about software [Bickford et al., 96] and hardware [Shimizu et al., 02] • However, FTFS builders have not adopted them yet • By sharing our experience, we hope to convince FTFS builders of the utility of specifying their systems formally • Using FM to improve understanding of and trust in systems: • Previous works apply FM to various classes of systems: [Chkliaev et al., 00], [Crow et al., 98], [Joshi et al., 03], [Houston et al., 91] • The closest works are those looking at distributed FSs (AFS, Coda): [Sivathanu et al., 05], [Wing et al., 97], [Yang et al., 04] • We show how to apply them in the specific context of FTFSs • Reducing complex systems to simple ones in order to reason about semantics has been used before [Joshi et al., 03] • We apply this method to FTFSs

  30. GFS Assumptions • If: • A write never crosses chunk boundaries (the GFS client library offers chunk-level operations) • A write never goes to a stale replica (this assumption can be implemented using a lease mechanism) • Then: GFS under these assumptions reduces to GFS_SS, and both have regular-register semantics

  31. Standard Consistency Models • Linearizability (atomic register semantics): any client-visible history H generated by the system is equivalent to a legal sequential interleaving S, and S preserves the real-time ordering of operations from H • Serializability: any client-visible history H generated by the system is equivalent to a legal sequential interleaving S • Regular register semantics: a read not concurrent with any write returns the most recently written value; a read concurrent with some writes returns either the value of one of the in-progress writes or the most recently written value • Safe register semantics: a read not concurrent with any write returns the most recently written value; a read concurrent with some writes can return anything
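
As a rough illustration of the regular-register definition, the following Python sketch checks a recorded history of timed operations. It is an assumption-laden toy, not part of the original study: the `Op` record, the `allowed_values` and `is_regular` helpers, and the use of start/end timestamps are all hypothetical, and writes are assumed not to overlap one another (a single writer), so "most recently written value" is unambiguous.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Op:
    kind: str      # "read" or "write"
    start: float   # invocation time
    end: float     # response time
    value: object  # value written, or value returned by the read


def allowed_values(read, writes, initial=None):
    """Values a regular register may return for this read."""
    # Writes that completed strictly before the read was invoked.
    done = [w for w in writes if w.end < read.start]
    latest = max(done, key=lambda w: w.end).value if done else initial
    # Writes whose interval overlaps the read's interval.
    concurrent = {w.value for w in writes
                  if w.start <= read.end and w.end >= read.start}
    return {latest} | concurrent


def is_regular(history, initial=None):
    """True if every read returns the latest completed or a concurrent write."""
    writes = [op for op in history if op.kind == "write"]
    return all(op.value in allowed_values(op, writes, initial)
               for op in history if op.kind == "read")


w1 = Op("write", 0, 1, "a")
w2 = Op("write", 2, 5, "b")
overlapping_read_ok = is_regular([w1, w2, Op("read", 3, 4, "a")])
stale_read_ok = is_regular([w1, w2, Op("read", 6, 7, "a")])
```

In this example `overlapping_read_ok` is `True` because the read overlaps the write of `"b"` and may still return the older `"a"`, while `stale_read_ok` is `False` because the read starts after `"b"` has completed. A safe register would accept any value for the overlapping read; linearizability would additionally constrain the order in which different reads observe the two values.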

  32. Summary of Contributions • Identified a new important class of extremely complex systems: FTFSs • Showed three aspects of FTFS design & analysis for which FM prove especially valuable • Mechanism comparison, semantics understanding, and design space exploration • Showed how to apply specific FMs to FTFSs • Showed how to construct SimpleStores and what can be learned from them • SimpleStores are reusable between systems • We believe that our study, tailored toward FTFSs, can be more relevant to FTFS designers than more general studies

  33. Lessons from Our Experience • Building high-level specifications for FTFSs is relatively easy • It is also remarkably useful for understanding the system • The exercise of writing specifications exposes similarities in seemingly dissimilar systems (GFS, Niobe) • Formal specifications also distill the key design differences • Specifications enable convenient verification of consistency for both strongly and weakly consistent systems • Niobe and Chain are both linearizable • GFS can be upgraded to regular register via a clear set of assumptions • GFS’ design choice to read from any replica heavily influences its consistency • Intuition can often fail • Niobe seemed to be reducible to Chain_SS, but actually was not

  34. Chain SimpleStore [Figure: Chain_SS, with request and response flows for reads and writes, a read channel (r1, r2, r3), a write channel (w5, w6, w7), a SerialDB, and the operations read(), commit(w5), and drop(w7)]

  35. The Temporal Logic of Actions (TLA+) • Formalism that combines a temporal logic with a logic of actions • Especially designed for specification of distributed asynchronous systems • TLA+ specifications model the system as a state machine: • Define system variables (state) • Model actions that the system can take as state transitions
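
The state-machine view can be mimicked in Python as a rough analogy; this is not TLA+ and not one of the three specifications. The sketch below assumes a toy primary/secondary register with hypothetical variables (`primary`, `secondary`, `acked`) and hypothetical actions (Write, Replicate, Ack). It enumerates every reachable state and checks an invariant in each one, in the spirit of what a model checker does on a small instance.

```python
from collections import deque
from typing import NamedTuple


class State(NamedTuple):
    """System variables of the toy specification (all hypothetical)."""
    primary: int
    secondary: int
    acked: bool


VALUES = (0, 1)  # assumed tiny value domain
INIT = State(primary=0, secondary=0, acked=True)


def next_states(s):
    """Actions as state transitions: each yield is one enabled action."""
    for v in VALUES:
        yield s._replace(primary=v, acked=False)   # Write(v)
    yield s._replace(secondary=s.primary)          # Replicate
    if s.primary == s.secondary:                   # Ack is enabled only when
        yield s._replace(acked=True)               # the replicas agree


def invariant(s):
    """An acknowledged write must be present on both replicas."""
    return (not s.acked) or s.primary == s.secondary


def explore(init, next_fn, inv):
    """Breadth-first search; returns (violating state or None, visited set)."""
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        if not inv(s):
            return s, seen
        for t in next_fn(s):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return None, seen


violation, reachable = explore(INIT, next_states, invariant)
```

With these definitions `violation` is `None` and six states are reachable. Removing the guard on the Ack action makes the search return a state in which a write is acknowledged while the replicas disagree, which is the kind of counterexample that makes such specifications useful during design.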

  36. Understanding Tradeoffs • GFS: smaller write latency, but writes may leave the group inconsistent • Niobe: a write never leaves the replica group in an inconsistent state [Figure: reads issued against replicas 1 to 4 in each design; labels include Old value and Error]
