Byzantine Fault Isolation in the Farsite Distributed File System

John R. Douceur and Jon Howell

Presentation Transcript


  1. Byzantine Fault Isolation in the Farsite Distributed File System John R. Douceur and Jon Howell

  2. Definitions • Byzantine fault \'biz-ən-tēn folt\ n (1982): a failure of a system component that produces arbitrary behavior • Byzantine fault isolation \'biz-ən-tēn folt ī-sə-'lā-shən\ n (2006): methodology for designing a distributed system that can, under Byzantine failure, operate with application-defined partial correctness • BFI \bē-ef-'ī\ n (2006): Byzantine fault isolation • Farsite \'fär-sīt\ n (2000): serverless distributed file system developed at Microsoft Research, designed to be scalable, strongly consistent, and secure despite running on an untrusted infrastructure of desktop PCs

  3. Talk Outline • Context – Farsite system • Why BFT doesn’t scale • Farsite’s use of multiple BFT groups • The need for isolating Byzantine faults • Formal system specification • BFI in Farsite

  4. Farsite System [Diagram: clients and servers]

  5. Farsite System – Metadata [Diagram: users, clients, and a BFT group managing metadata]

  6. Farsite System – Metadata • T = number of tolerable faults; R = number of replicas; R > 3T • Using a Byzantine agreement protocol, assign sequence numbers to messages • Prepare-commit among 2T + 1 servers • Deterministically update metadata • Reply to client [Diagram: users, clients, BFT group]
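A minimal Python sketch of the sizing rules on this slide: with T tolerable faults a group needs R > 3T replicas, and prepare-commit runs among 2T + 1 servers. The function names are illustrative. The quorum-overlap check is the standard PBFT-style argument rather than something the slide states: any two sets of 2T + 1 replicas in a group of 3T + 1 share at least T + 1 members, so at least one correct replica participates in both.

```python
"""Replica-count arithmetic for one BFT group, using the slide's
definitions: T = tolerable faults, R = replicas, R > 3T, and
prepare-commit among 2T + 1 servers. Names are illustrative."""


def min_replicas(t: int) -> int:
    """Smallest R satisfying R > 3T."""
    return 3 * t + 1


def max_tolerable_faults(r: int) -> int:
    """Largest T satisfying R > 3T for a group of R replicas."""
    return (r - 1) // 3


def quorum_size(t: int) -> int:
    """Number of servers that must take part in prepare-commit."""
    return 2 * t + 1


def min_quorum_overlap(r: int, t: int) -> int:
    """Fewest replicas that any two quorums of size 2T + 1 must share among R."""
    return max(0, 2 * quorum_size(t) - r)


if __name__ == "__main__":
    for r in (4, 7, 10):
        t = max_tolerable_faults(r)
        # With R = 3T + 1 the overlap is T + 1, so it holds at least one
        # correct replica even if all T faulty replicas sit in it.
        print(f"R={r}: T={t}, quorum={quorum_size(t)}, overlap>={min_quorum_overlap(r, t)}")
```

For the group sizes that appear later in the talk (4, 7, and 10 members), this gives T = 1, 2, and 3 respectively.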

  7. The Cost of BFT Groups • Single server vs. 4-member BFT group: computation 1 vs. 4; message delays 2 vs. 5; messages 2 vs. 32
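The slide's figures can be reproduced with a simple accounting model, sketched below. It assumes a PBFT-style normal case in which every replica broadcasts in both the prepare and commit phases. That simplification is an assumption, not a description of Farsite's implementation (exact PBFT counts differ slightly), but it yields exactly 2R² messages per operation: 2 for a single server, 32 for a 4-member group, and 200 for a 10-member group.

```python
"""Per-operation cost of a BFT group under a simplified PBFT-style
normal case (every replica broadcasts in prepare and commit).
This accounting is an assumption chosen to match the slide's numbers."""

from dataclasses import dataclass


@dataclass(frozen=True)
class OpCost:
    computation: int     # replicas that execute the operation
    message_delays: int  # sequential hops on the critical path
    messages: int        # total messages sent for one operation


def bft_cost(r: int) -> OpCost:
    if r == 1:
        # A lone server: client request plus server reply.
        return OpCost(computation=1, message_delays=2, messages=2)
    request = 1             # client -> primary
    pre_prepare = r - 1     # primary -> backups
    prepare = r * (r - 1)   # all-to-all (simplified)
    commit = r * (r - 1)    # all-to-all
    reply = r               # every replica -> client
    return OpCost(
        computation=r,
        message_delays=5,   # request, pre-prepare, prepare, commit, reply
        messages=request + pre_prepare + prepare + commit + reply,  # = 2 * r * r
    )


for r in (1, 4, 10):
    print(r, bft_cost(r))
```

Because every replica executes every operation while message count grows quadratically with R, enlarging a single group adds cost without adding throughput, which is consistent with the talk's conclusion that BFT groups have negative throughput scaling.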

  8. Throughput vs. Scale [Plot: throughput multiple (0–7) vs. machine count (1–7); curves: ideal, typical, flat BFT]

  9. Workload Sharing [Diagram: workload, clients, servers]

  10. BFT at Scale

  11. Multiple BFT Groups

  12. Tree of BFT Groups

  13. Tree of BFT Groups [Diagram: namespace tree rooted at / with entries including public, users, Alice, Bob, emacs, cruft, Outlook, vi, code, docs, C++, C#, Proj X, foo, bar, src, bin]

  14. Delegation to New Group [Diagram: the namespace tree rooted at / (public, users, Alice, Bob, emacs, cruft, Outlook, vi, code, docs, C++, C#, Proj X, foo, bar, src, bin), with a subtree handed to a new BFT group]

  15. Pathname Resolution [Diagram: resolving /users/Alice/code/C#/bar through the namespace tree rooted at /]
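A sketch of pathname resolution across a tree of BFT groups, assuming each group owns a subtree and that handing a subtree to a new group records a delegation at the subtree's root directory. The delegation table and the group names (root-group, alice-group, csharp-group) are hypothetical; only the example path comes from the slide.

```python
"""Pathname resolution over a tree of BFT groups (sketch).
Assumption: a delegation entry maps a subtree root to its owning group.
All group names and delegation points below are hypothetical."""

from typing import Dict, List

# Hypothetical delegation table: subtree root -> owning BFT group.
DELEGATIONS: Dict[str, str] = {
    "/": "root-group",
    "/users/Alice": "alice-group",
    "/users/Alice/code/C#": "csharp-group",
}


def resolve(path: str, delegations: Dict[str, str]) -> List[str]:
    """Return the BFT groups consulted, in order, while walking `path`."""
    groups = [delegations["/"]]
    prefix = ""
    for component in path.strip("/").split("/"):
        if not component:
            continue
        prefix = f"{prefix}/{component}"
        owner = delegations.get(prefix)
        if owner is not None and owner != groups[-1]:
            # Crossing a delegation point: further lookups go to the
            # group that owns this subtree.
            groups.append(owner)
    return groups


print(resolve("/users/Alice/code/C#/bar", DELEGATIONS))
```

With this hypothetical table, resolving the slide's path visits root-group, then alice-group, then csharp-group; a path such as /users/Bob stays entirely within root-group.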

  16. Machine Failures at Scale

  17. Group Failures at Scale

  18. System Failure at Scale

  19. Quantitative Fault Analysis • Example system: file system distributed among interacting BFT groups • Simplifying assumptions – Files are partitioned evenly among BFT groups – Machine failures are independent – Machine fault probability = 0.001 • Evaluate operational fault rate: the probability that an operation on a randomly selected file exhibits a fault

  20. Operational Faults vs. System Scale [Plot: operational fault rate (log scale, 10^-7 to 1) vs. system scale (count of BFT groups, 1 to 100,000); curves: BFT 4, no BFI; BFT 7, no BFI; BFT 10, no BFI; BFT 4, ideal BFI; BFT 4, tree (4) BFI; BFT 4, tree (16) BFI; one point is labeled 0.45]
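The no-BFI and ideal-BFI curves follow from the simplifying assumptions of the quantitative fault analysis, sketched below. A group of R replicas tolerates T = ⌊(R − 1)/3⌋ faults, so it fails when more than T of its independently failing machines are faulty. Two further assumptions are labeled in the code: without BFI a single faulty group can corrupt any operation (the talk's conclusions make this point), and with ideal BFI only the group holding the file matters. Under this model, 4-member groups without BFI reach about 0.45 at 100,000 groups, matching the 0.45 label on the plot.

```python
"""Operational fault rate vs. system scale, under the stated assumptions:
files spread evenly over BFT groups, independent machine failures,
machine fault probability p. The two rate models are assumptions."""

from math import comb


def group_fault_prob(r: int, p: float) -> float:
    """Probability that more than T = (r - 1) // 3 of r replicas are faulty."""
    t = (r - 1) // 3
    return sum(comb(r, k) * p**k * (1 - p) ** (r - k) for k in range(t + 1, r + 1))


def rate_without_bfi(r: int, p: float, groups: int) -> float:
    """Assumption: any faulty group anywhere can corrupt any operation."""
    return 1 - (1 - group_fault_prob(r, p)) ** groups


def rate_ideal_bfi(r: int, p: float) -> float:
    """Assumption: only the group owning the file matters, at any scale."""
    return group_fault_prob(r, p)


P = 0.001
for g in (1, 100, 100_000):
    print(g, rate_without_bfi(4, P, g), rate_without_bfi(10, P, g), rate_ideal_bfi(4, P))
```

At 100,000 groups the model also shows 4-member groups with ideal BFI (about 6 × 10^-6) outperforming 10-member groups without BFI (about 2 × 10^-5). The tree (4) and tree (16) BFI curves are not modeled here, since the slides do not give their formula.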

  21. BFI versus no BFI

  22. BFI versus no BFI • 4-member BFT groups with BFI vs. 10-member BFT groups without BFI: computation 4 vs. 10; messages 32 vs. 200; throughput reduction 60% vs. 84%

  23. BFI via Formal Specification [Diagram: distributed-system spec and semantic spec, each consisting of state and actions extended with faults, related by refinement]

  24. Farsite Semantic Spec [Diagram: namespace tree rooted at / (tools, code, C++, emacs, src, bin, cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations (read, open, move)]

  25. Farsite Distributed-System Spec

  26. Farsite Refinement [Diagram: namespace tree rooted at / (tools, code, C++, emacs, src, bin, cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations (read, move, del)]

  27. Actions are State Transitions [Diagram: state consisting of the namespace (/, a.cpp), open handles, and pending operations]

  28. Proving Refinement Inductively [Diagram: state consisting of the namespace (/, a.cpp), open handles, and pending operations]
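Treating actions as state transitions turns refinement into a per-step obligation: every step of the distributed-system spec must, under the refinement mapping, either be a step of the semantic spec or leave the mapped state unchanged (a stutter). The sketch below checks that obligation on a hypothetical toy, not Farsite's spec: the semantic spec is a directory of names, and the distributed spec adds a client-side queue of pending creates that a server applies later. It enumerates reachable states by breadth-first search, as a model checker would; an inductive proof instead shows the obligation holds for every state satisfying an invariant. Farsite's specs are written in TLA+, and Python is used here only for illustration.

```python
"""Step-by-step refinement check on a hypothetical toy system."""

from collections import deque
from typing import FrozenSet, Iterator, Tuple

NAMES = ("a.h", "a.cpp")  # bounded universe so the search terminates

HighState = FrozenSet[str]                           # semantic spec: directory
LowState = Tuple[Tuple[str, ...], FrozenSet[str]]    # (pending, server_dir)


def high_next(s: HighState) -> Iterator[HighState]:
    """Semantic action: create a name that does not yet exist."""
    for n in NAMES:
        if n not in s:
            yield s | {n}


def low_next(s: LowState) -> Iterator[LowState]:
    pending, served = s
    for n in NAMES:  # client queues a create request
        if n not in served and n not in pending:
            yield pending + (n,), served
    if pending:      # server applies the oldest pending create
        yield pending[1:], served | {pending[0]}


def abstraction(s: LowState) -> HighState:
    """Refinement mapping: only applied creates are visible to users."""
    return s[1]


def check_refinement(init: LowState) -> bool:
    """Every reachable low step must map to a semantic step or a stutter."""
    seen, frontier = {init}, deque([init])
    while frontier:
        s = frontier.popleft()
        for t in low_next(s):
            a, b = abstraction(s), abstraction(t)
            if a != b and b not in set(high_next(a)):
                return False
            if t not in seen:
                seen.add(t)
                frontier.append(t)
    return True


print(check_refinement(((), frozenset())))
```

Queuing a request maps to a stutter, and applying it maps to the semantic create action, so the check succeeds for this toy.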

  29. Refinement with Byzantine Faults [Diagram: namespace tree rooted at / (tools, code, C++, emacs, src, bin, cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations (read, del, move)]

  30. Refinement with Byzantine Faults [Diagram: namespace tree rooted at / (tools, code, C++, emacs, src, bin, cl.exe, a.h, a.cpp, a.obj, a.exe), open handles, and pending operations (read, del, move)]

  31. Semantic Fault Specification • Safety – A tainted file may have arbitrary contents and attributes – A tainted file may appear not to be linked into the namespace – A tainted file may pretend not to have children it actually has – A tainted file may pretend to have children that do not exist – A tainted file may pretend that another tainted file is its child or parent • Liveness – Operations involving a tainted file may not complete [Diagram: namespace tree (tools, code, C++, emacs, src, bin, foo, bar, cl.exe, a.h, a.cpp, a.obj, a.exe) in which a tainted file shows garbled contents]
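The safety rules above weaken the semantic spec only for tainted files. A minimal sketch of that idea, assuming a hypothetical data model in which each file has contents and a set of child names: an observation of an untainted file must match the true state exactly, while any observation of a tainted file is allowed. The class and function names are illustrative, liveness is not modeled, and Farsite's actual fault spec is written in TLA+.

```python
"""Weakened semantic safety check for tainted files (hypothetical model)."""

from dataclasses import dataclass
from typing import Dict, FrozenSet, Set


@dataclass(frozen=True)
class FileView:
    """Contents and child names of one file, as stored or as observed."""
    contents: bytes
    children: FrozenSet[str] = frozenset()


def observation_allowed(true_state: Dict[str, FileView],
                        tainted: Set[str],
                        path: str,
                        observed: FileView) -> bool:
    """Only untainted files must be reported faithfully."""
    if path in tainted:
        # A tainted file may have arbitrary contents and may pretend to
        # have, or to lack, any children, so every observation is allowed.
        return True
    return observed == true_state[path]


state = {
    "/": FileView(b"", frozenset({"a.cpp"})),
    "/a.cpp": FileView(b"int main() {}"),
}
print(observation_allowed(state, {"/a.cpp"}, "/a.cpp", FileView(b"garbage")))
print(observation_allowed(state, set(), "/a.cpp", FileView(b"garbage")))
```

The same garbled observation of /a.cpp is permitted when the file is tainted and rejected when it is not.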

  32. Distributed-System Improvements • Maintain redundant info across BFT group boundaries • Augment messages with info that justifies correctness • Ensure unambiguous chains of authority over data • Carefully order messages and state updates for operations involving multiple BFT groups
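One ingredient of an unambiguous chain of authority is that every claim of ownership traces back to the root through nested grants. The sketch below assumes hypothetical delegation records of the form (issuer, delegate, subtree) and checks that each link is issued by the current holder and narrows the scope it received. The record format and group names are illustrative, and a real system would also authenticate each record, which is omitted here.

```python
"""Verifying a chain of delegated authority over a subtree (sketch).
Record format and group names are hypothetical; signatures are omitted."""

from typing import List, NamedTuple


class Delegation(NamedTuple):
    issuer: str    # group granting authority
    delegate: str  # group receiving authority
    subtree: str   # path prefix being delegated


def within(path: str, prefix: str) -> bool:
    """True if `path` lies inside the subtree rooted at `prefix`."""
    return prefix == "/" or path == prefix or path.startswith(prefix.rstrip("/") + "/")


def chain_authorizes(chain: List[Delegation], root: str, group: str, path: str) -> bool:
    """True if `chain` links `root` to `group`, each step narrowing the scope."""
    holder, scope = root, "/"
    for d in chain:
        if d.issuer != holder or not within(d.subtree, scope):
            return False  # broken link, or a grant of authority never held
        holder, scope = d.delegate, d.subtree
    return holder == group and within(path, scope)


chain = [
    Delegation("root-group", "alice-group", "/users/Alice"),
    Delegation("alice-group", "csharp-group", "/users/Alice/code/C#"),
]
print(chain_authorizes(chain, "root-group", "csharp-group", "/users/Alice/code/C#/bar"))
print(chain_authorizes(chain, "root-group", "csharp-group", "/users/Bob"))
```

A message from csharp-group could carry such a chain to justify its authority over /users/Alice/code/C#/bar, while the same chain gives it no claim over /users/Bob.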

  33. Summary of BFI Methodology • Formally specify your system – Semantic spec: user’s view of system – Distributed-system spec: designer’s view of system – Refinement interprets distributed-system spec in semantic terms • Modify distributed-system spec to express Byzantine faults • Simultaneously – Strategically weaken semantic spec to describe faults – Improve distributed-system spec to quarantine faults • Refinement lets you know when you are done

  34. Conclusions • BFT groups have negative throughput scaling • Scalable systems can be built from multiple BFT groups • System scale increases the probability of non-maskable Byzantine faults • If faults are not isolated, a single faulty group can corrupt the entire system. • BFI is a methodology for isolating Byzantine faults • BFI uses formal system specification • Improves fault tolerance without hurting throughput, unlike increasing BFT group size

  35. Contact Information JohnDo@microsoft.com Howell@microsoft.com http://research.microsoft.com/farsite

  36. Backup Slides

  37. Farsite Spec Stats • Semantic specification: 1,800 lines of TLA+, 114 definitions • Distributed-system specification: 11,500 lines of TLA+, 775 definitions • Why so big? – Windows file-system semantics are complex – Scalability and strong consistency – Byzantine fault isolation
