1 / 34

Tolerating File-System Mistakes with EnvyFS

Tolerating File-System Mistakes with EnvyFS. Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison. Lakshmi N. Bairavasundaram NetApp, Inc. File Systems in Today’s World. Modern file systems are complex

roland
Download Presentation

Tolerating File-System Mistakes with EnvyFS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Tolerating File-System Mistakes with EnvyFS Swaminathan Sundararaman Andrea C. Arpaci-Dusseau Remzi H. Arpaci-Dusseau University of Wisconsin Madison Lakshmi N. Bairavasundaram NetApp, Inc.

  2. File Systems in Today’s World • Modern file systems are complex • Tens of thousands of lines of code (e.g., XFS 45K LOC) • Storage stack is also getting deeper • Hypervisor, network, logical volume manager • Need to handle a gamut of failures • Memory allocation, disk faults, bit flips, system crashes • Preserve integrity of its meta-data and user data Tolerating File-System Mistakes with EnvyFS

  3. File System Bugs • Bug reports for Linux 2.6 series from Bugzilla • ext3: 64, JFS: 17, ReiserFS: 38 • Some are FS corruption causing permanent data loss • FS bugs broadly classified into two categories • “fail-stop”: System immediately crashes • Solutions: Nooks [Swift 04], CuriOS [David08] • “fail-silent”: Accidentally corrupt on-disk state • Many such bugs uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b] Tolerating File-System Mistakes with EnvyFS

  4. Bugs are inevitable in file systems Challenge: how to cope with them? Tolerating File-System Mistakes with EnvyFS

  5. N-Version File Systems • Based on N-version programming [Avizienis77] • NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06] Application • EnvyFS: Simple software layer • Store data in Nchild file systems • Operations performed on all children • Rely on a simple software layer • Challenge: reducing overheads while retaining reliability • SubSIST: Novel Single Instance Store EnvyFS layer … Child 1 Child N Child 2 SIS layer Disk driver Disk Tolerating File-System Mistakes with EnvyFS

  6. Results • Robustness • Traditional file systems handle few corruptions (< 4%) • EnvyFS3 tolerates 98.9% of single file system mistakes • Performance • Desktop workloads: EnvyFS3 has comparable performance • I/O intensive workloads: • Normal mode: EnvyFS3 + SubSIST acceptable performance • Under memory pressure: EnvyFS3 + SubSIST large overheads • Potential as a debugging tool for FS developers • Pinpoint the source of “fail-silent” bug in ext3 Tolerating File-System Mistakes with EnvyFS

  7. Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS

  8. N-Version Systems Development process: • Producing the specification of software • Implementing N versions of the software • Creating N-version layer • Executes different versions • Determines the consensus result Tolerating File-System Mistakes with EnvyFS

  9. 1. Producing Specification • Our own specification ? • Impractical: Requires wide scale changes to file systems • Specifications take years to get accepted • Can we leverage existing specification ? • Yes, can leverage VFS, but there are some issues • VFSnot precise for N-versioning purpose • Needs to handle cases where specification is not precise • e.g., Ordering directory entries, inode number allocation Tolerating File-System Mistakes with EnvyFS

  10. Imprecise VFS Specification File 1 File 2 File 3 Ordering directory entries • Issue: • No specified return order • Can’t blindly compare entries • Solution: • Read all entries from a directory (dir: test in our case) from all FSes • Match entries from FSes • Return majority results Dir: test EnvyFS layer File 1 File 2 File 3 No Entries Readdir: test File 2 File 3 File 1 File 3 File 1 File 2 File 1 File 2 File 3 FS X FS Z FS Y … File 1 File 2 File 3 Dir: test Dir: test Dir: test Tolerating File-System Mistakes with EnvyFS

  11. Imprecise VFS Specification (cont) • Inode number allocation • Inode numbers returned through system calls • Each child file system issues different inode numbers • Possible solution: Force file systems to use same algorithm? • Our solution: Issue inode numbers at EnvyFS layer EnvyFS layer File 1 10 File 2 15 File 3 16 File 2 04 File 3 44 File 1 36 File 3 99 File 1 65 File 2 43 Stat: File 1 File 1 | ?? 15 Virt # FS 1 FS 2 FS 3 15 10 36 65 FS X FS Z File 1 | 10 File 1 | 36 File 1 |65 FS Y Inode Mapping Table Inode Mapping Table not persistently stored Dir: test Dir: test Dir: test Inode Numbers Tolerating File-System Mistakes with EnvyFS

  12. 2. Implementing N versions of FS • Painful process • High cost of development, long time delays • Lucky! Hard work already done for us • 30 different disk based file systems in Linux 2.6 • Which file systems to use? • ext3, JFS, ReiserFS in a three-version FS • Others should work without modifications Tolerating File-System Mistakes with EnvyFS

  13. 3. Creating N-Version Layer Application • N-Version layer (EnvyFS) • Inserted beneath VFS • Simple design to avoid bugs • Example: Reading a file • Allocate N data buffers • Read data block from the disk • Compare: data, return code, file position • Return: data, return code • Issues: • Allocate memory for each read operation • Extra copy from allocated buffer to application • Comparison overheads Read (file, 1 block) err , VFS layer Read (file, 1 block) err , EnvyFS Layer Inode Mapping Table Wrappers Comparators F err = Read (…) err = Read (…) err = Read (…) F F … pos: x JFS ext3 D D pos: x D D D D D D ReiserFS pos: x Disk Tolerating File-System Mistakes with EnvyFS

  14. Reading a File in EnvyFS Application • Solution: • Same application buffer for all FS • TCP-like checksums for data comparison • Compare: checksums, return code, file position • Read data until majority Read (file, 1 block) err , VFS layer Read (file, 1 block) err , EnvyFS Layer Inode Mapping Table Wrappers Comparators F err = Read (…) err = Read (…) err = Read (…) F F … pos: x ext3 JFS D D pos: x D D D D D D ReiserFS pos: x Disk 435 435 436 FS 1 # FS 2 # … FS N # … Checksums Tolerating File-System Mistakes with EnvyFS

  15. Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS

  16. Case for Single Instance Storage (SIS) Application • Ideal: One disk per FS • Practical: One disk for all FS • Overheads • Effective storage space: 1/N • N times more I/O (Read/write) • Challenge:Maintain diversity while minimizing overheads VFS layer EnvyFS layer 1 1 1 2 2 N N … FS 1 FS N FS 2 … Part 1 Part 2 Part N … Disk Disk 1 Disk 2 Disk N Disk Req. Queue Tolerating File-System Mistakes with EnvyFS

  17. SubSIST: Single Instance Store Application • Variant of an Single Instance Store • Selectively merges data blocks • Block addressable SIS • Exports virtual disks to FSes • Manages mapping, free space info. • Not persistently stored on disk • EnvyFS writes through N file systems • N data blocks merged to 1 data block • Content hashes not stored persistently • Meta-data blocksnot merged • Inter FS blocks and notintra FS VFS layer EnvyFS layer … FS 1 FS N FS 2 Vdisk 1 Vdisk 2 Vdisk N SubSIST CHash Layer Read Cache D D D D D D D Free Space Management M M M Disk Tolerating File-System Mistakes with EnvyFS

  18. Handling Data Block Corruptions? Application • Corruption to data in a single FS • Due to bugs, bit flips, storage stack • Corrupt data blocks not merged • All other N-1 data blocks merged • Corrupt data block fixed at next read • Corruption to data block inside disk • Single copy of data • Different code paths • Different on-disk structures VFS layer EnvyFS layer … FS 1 FS N FS 2 D D D Vdisk 1 Vdisk 2 Vdisk N SubSIST CHash Layer Read Cache D D D D D D D D D Free Space Management Disk Tolerating File-System Mistakes with EnvyFS

  19. Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Reliability • Performance • Conclusion Tolerating File-System Mistakes with EnvyFS

  20. Reliability Evaluation: Fault Injection VFS Robustness of EnvyFS in recovering from a child file system’s mistake? • Corruption: bugs in FS / storage stack • Types of disk blocks • superblock, inode, block bitmap, file data, … • Perform different file ops • mount, stat, creat, unlink, read, … • Report user visible results • All results are applicable with SubSIST except corruption to data blocks EnvyFS layer ext 3 ReiserFS JFS Pseudo Device Driver Type-aware fault injection [Prabhakaran05] B Block Driver B B B B Disk B B B Tolerating File-System Mistakes with EnvyFS

  21. path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal Data loss Cannot mount e Ops fail Data corrupt Crash Read-only Depends N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Result Matrix Tolerating File-System Mistakes with EnvyFS

  22. path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Data loss Cannot mount N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Ext3 stores many superblock copies; but, does not handle superblock corruption Tolerating File-System Mistakes with EnvyFS

  23. path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Data loss Cannot mount Ops fail Crash N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC • In addition to operations failing, inode corruption leads to data loss • Unlink: system crash during unmount Tolerating File-System Mistakes with EnvyFS

  24. path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal Data loss Cannot mount e Ops fail Data corrupt Crash Read-only Depends N/A ext3 E INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC Tolerating File-System Mistakes with EnvyFS

  25. path traversal SET-1 (stat, …) SET-2 (chmod) read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 (fsync) umount Normal N/A EnvyFS3 EnvyFS Kernel panic in ext3 R E J INODE DIR BMAP IMAP INDIRECT DATA SUPER JSUPER GDESC EnvyFS3 works in every scenario Tolerating File-System Mistakes with EnvyFS

  26. Potential for Bug Isolation ext3 EnvyFS3 Unlink on corrupt inode: - ext3_lookup (bug) - ext3_unlink • Unlink on corrupt inode: • - ext3_lookup (bug) • ext3 inode does not match others • Further ops not issued Time Time Unmount (panic) In EnvyFS3, a problem is noticed the first time child file system returns wrong results In typical use, a problem is noticed only on panic Tolerating File-System Mistakes with EnvyFS

  27. path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount Normal Data loss Cannot mount a Ops fail Data corrupt Crash Read-only Depends N/A JFS J INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Tolerating File-System Mistakes with EnvyFS

  28. path traversal SET-1 SET-2 read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET-3 umount Normal Crash N/A EnvyFS3 EnvyFS R E J INODE DIR BMAP IMAP INTERNAL DATA SUPER JSUPER JDATA AGGR-INODE IMAPDESC IMAPCNTL Kernel panic in EnvyFS3 Tolerating File-System Mistakes with EnvyFS

  29. OpenSSH Benchmark Performance Evaluation 3 % overhead • Experimental setup • AMD Opteron 2.2 GHz Processor • 2GB RAM • 80 GB Hitachi Deskstar 7200-rpm SATA disk • Linux 2.6.12 • 4GB disk partition for each file system • CPU Intensive • OpenSSH 4.5 • -- Copy, untar and make Elapsed Time (in Seconds) Performance of EnvyFS3 is comparable to a single file system File Systems Tolerating File-System Mistakes with EnvyFS

  30. Postmark Benchmark EnvyFS3: 8x + SubSIST: 4x • I/O Intensive • Mimics busy mail server workload • Transaction: creates, deletes, reads, appends, … • Postmark Configuration • 2500 files • File size: 4Kb – 40Kb • No. of transactions: 10K and 100K 851 EnvyFS3: 1.7x + SubSIST: 11.5% EnvyFS3: 3.3x + SubSIST: -32% Elapsed Time (in Seconds) 430 406 271 243 128 129.0 107 78 39.0 34 26.4 29 14.7 9.6 Tolerating File-System Mistakes with EnvyFS

  31. Summary of Results • Robustness • Traditional file systems vulnerable to corruptions • EnvyFS3 tolerates almost all mistakes in one FS • Performance • Desktop workloads: EnvyFS3 has comparable performance • I/O intensive workloads: • Regular Operations: EnvyFS3 + SubSIST acceptable performance • Memory pressure:EnvyFS3 + SubSIST has large overhead Tolerating File-System Mistakes with EnvyFS

  32. Outline • Introduction • Building reliable file systems • Reducing overheads with SubSIST • Evaluation • Conclusion Tolerating File-System Mistakes with EnvyFS

  33. Conclusion • Bugs/mistakes are inevitable in any software • Must cope, not just hope to avoid • EnvyFS: N-version approach to tolerating FS bugs • Built using existing specification and file systems • SubSIST: single instance store • Decreases overheads while retaining reliability Tolerating File-System Mistakes with EnvyFS

  34. Thank You! Advanced Systems Lab (ADSL) University of Wisconsin-Madison http://www.cs.wisc.edu/adsl Tolerating File-System Mistakes with EnvyFS

More Related