
CSC 660: Advanced OS


  1. CSC 660: Advanced OS Filesystem Case Studies CSC 660: Advanced Operating Systems

  2. Topics
     • Early Filesystems (FS, FFS)
     • Journaling Filesystems
     • B-Tree Filesystems
     • Network Filesystems
     • GoogleFS
     • Common Problems

  3. Filesystem History
     • FS (1974)
     • Fast Filesystem (FFS) / UFS (1984)
     • Log-structured Filesystem (1991)
     • ext2 (1993)
     • ext3 (2001)
     • WAFL (1994)
     • XFS (1994)
     • Reiserfs (1998)
     • ZFS (2004)

  4. FS
     • First UNIX filesystem (1974)
     • Simple
     • Layout: superblock, inodes, then data blocks.
     • Unused blocks are kept in a free linked list, not a bitmap.
     • 512-byte blocks, no fragments.
     • Short filenames.
     • Slow: 2% of raw disk bandwidth.
     • Disk seeks consume most file access time because of the small block size and high fragmentation.
     • Performance later doubled by moving to 1 KB blocks.
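
The free linked list mentioned above can be sketched as follows. This is a simplified illustration, not the historical on-disk format: the `FreeList` class and its method names are hypothetical, and each free block is assumed to record only the number of the next free block.

```python
class FreeList:
    """Free blocks chained as a singly linked list (hypothetical sketch)."""

    def __init__(self, block_numbers):
        self.next_free = {}   # free block number -> next free block number
        self.head = None
        for b in reversed(block_numbers):
            self.next_free[b] = self.head
            self.head = b

    def allocate(self):
        """Pop the block at the head of the list."""
        if self.head is None:
            raise RuntimeError("disk full")
        b = self.head
        self.head = self.next_free.pop(b)
        return b

    def free(self, b):
        """Push a released block onto the head of the list."""
        self.next_free[b] = self.head
        self.head = b


fl = FreeList([10, 11, 12, 13])
first_file = [fl.allocate() for _ in range(4)]   # takes 10, 11, 12, 13 in order
fl.free(12)
fl.free(10)
second_file = [fl.allocate(), fl.allocate()]     # gets 10 and 12, not adjacent
```

Because blocks come back in whatever order they were freed, a new file ends up with scattered blocks after some churn, which is one way to see why seeks dominated access time.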

  5. FFS
     • BSD (1984), basis for SYSV UFS
     • More complex
     • Cylinder groups: inodes, bitmaps, data blocks.
     • Larger blocks (4 KB) with 1 KB fragments.
     • Block layout based on physical disk parameters.
     • Long filenames, symlinks, file locks, quotas.
     • 10% of space reserved by default.
     • Faster: 14-47% of raw disk bandwidth.
     • Creating a new file requires 5 seeks:
       2 inode seeks, 1 file data, 1 directory data, 1 directory inode.
     • User/kernel memory copies take 40% of disk operation time.
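
To see why fragments matter once blocks grow to 4 KB, here is a small worked calculation using the block and fragment sizes from the slide. The function names are hypothetical, and the model is simplified: it assumes only the tail of a file is stored in fragments and ignores indirect blocks and metadata.

```python
BLOCK = 4096   # block size from the slide
FRAG = 1024    # fragment size from the slide


def ceil_div(a, b):
    return -(-a // b)


def space_blocks_only(size):
    """Bytes consumed if every file must occupy whole 4 KB blocks."""
    return ceil_div(size, BLOCK) * BLOCK


def space_with_fragments(size):
    """Bytes consumed if the tail of the file may use 1 KB fragments."""
    full_blocks, tail = divmod(size, BLOCK)
    return full_blocks * BLOCK + ceil_div(tail, FRAG) * FRAG


size = 5000
print(space_blocks_only(size))      # 8192
print(space_with_fragments(size))   # 5120
```

A 5000-byte file wastes 3192 bytes with whole blocks but only 120 bytes when its tail fits in a single fragment.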

  6. Log-structured Filesystem (LFS)
     • All data stored as sequential log entries.
     • Log divided into large segments.
     • Cleaner defragments the log and produces new segments.
     • Fast recovery: checkpoint + roll forward.
     • Performance: 70% of raw disk bandwidth.
     • Large sequential writes instead of many small writes and seeks.
     • Inode map tracks the changing locations of inodes.
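
A minimal sketch of the log plus inode map idea, assuming an in-memory list stands in for the on-disk log. `TinyLFS` and its methods are hypothetical names; segments, checkpoints, and roll-forward are left out.

```python
class TinyLFS:
    """Toy log-structured store: append-only log plus an inode map."""

    def __init__(self):
        self.log = []         # append-only list of (inode_number, data)
        self.inode_map = {}   # inode number -> index of its newest log entry

    def write(self, ino, data):
        # Every write is an append; the old version stays in the log as garbage.
        self.log.append((ino, data))
        self.inode_map[ino] = len(self.log) - 1

    def read(self, ino):
        return self.log[self.inode_map[ino]][1]

    def clean(self):
        # The cleaner copies live entries into a fresh log and drops the rest.
        live = [self.log[i] for i in sorted(self.inode_map.values())]
        self.log = live
        self.inode_map = {ino: i for i, (ino, _) in enumerate(live)}


fs = TinyLFS()
fs.write(1, "a")
fs.write(2, "b")
fs.write(1, "c")   # supersedes the first entry for inode 1
fs.clean()         # log shrinks from 3 entries to 2
```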

  7. ext2 and ext3
     • FFS + performance features.
     • Variable block size (1K-4K), no fragments.
     • Partitions disk into block groups.
     • Data block preallocation + read-ahead.
     • Fast symlinks (stored in the inode).
     • 5% of space reserved by default.
     • Very fast.
     • ext3 adds journaling capabilities.

  8. WAFL
     • Network Appliance (1994)
     • Metadata stored in files
     • Root inode points to the inode file.
     • Filesystem is a tree of blocks rooted at the inode file.
     • Metadata can be written anywhere on disk, which makes writes faster on RAID.
     • Allows the filesystem to be expanded on the fly.

  9. WAFL Copy-on-Write Snapshots
     • Hourly (4 per day, kept 2 days) and daily (kept 7 days).
     • Users can recover deleted files from .snapshot directories.
     • Snapshots are created by just copying the root inode.
     • Creates a consistency-point snapshot every few seconds.
     • Writes go only to unused blocks between consistency points.
     • Recovery = last consistency point + replay of the NVRAM log.
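
The claim that a snapshot is just a copy of the root can be illustrated with a small copy-on-write tree. This is a sketch under simplifying assumptions: `Node`, `cow_write`, and `read` are hypothetical names, and in-memory objects stand in for disk blocks and inodes.

```python
class Node:
    """A tree node: directories have children, files have data."""

    def __init__(self, children=None, data=None):
        self.children = dict(children or {})
        self.data = data


def cow_write(root, path, data):
    """Return a new root with `path` set to `data`; existing nodes are never modified."""
    if not path:
        return Node(data=data)
    # Copy only the nodes on the path; all other subtrees are shared.
    new_root = Node(children=root.children if root else None)
    child = root.children.get(path[0]) if root else None
    new_root.children[path[0]] = cow_write(child, path[1:], data)
    return new_root


def read(root, path):
    node = root
    for name in path:
        node = node.children[name]
    return node.data


active = cow_write(None, ["etc", "conf"], "settings")
active = cow_write(active, ["home", "a.txt"], "v1")
snapshot = active                                      # snapshot = keep the old root
active = cow_write(active, ["home", "a.txt"], "v2")    # writes go to new nodes only
```

After the last write, `read(snapshot, ["home", "a.txt"])` still returns the old contents, while the untouched `etc` subtree is shared between the snapshot and the active tree.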

  10. XFS
     • SGI (1994)
     • Complex
     • Uses B+ trees to track free space, index directories, and locate file blocks and inodes.
     • Dynamic inode allocation, metadata journaling, volume manager, multithreaded, allocate on flush.
     • 64-bit filesystem (filesystems up to 2^63 bytes).
     • Fast: 90-95% of raw disk bandwidth.

  11. Reiserfs
     • Multiple versions (v1-4)
     • Complex
     • Uses B+ trees (v3) or dancing trees (v4).
     • Journaling, allocate on flush, COW, tail-packing.
     • High performance with small files and large directories.
     • Second only to ext2 in performance (v3).

  12. ZFS
     • Sun (2004)
     • Complex
     • Variable block size + compression.
     • Built-in volume manager (striping, pooling).
     • Self-healing with 64-bit checksums + mirroring.
     • COW transactional model (live data never overwritten).
     • Fast snapshots (just don’t release old blocks).
     • 128-bit filesystem.
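
Self-healing with checksums and mirroring can be sketched as below. The function names are hypothetical, Python lists stand in for mirrored disks, and SHA-256 from `hashlib` is used only as a convenient stand-in checksum; it is not meant to match the checksum width quoted on the slide.

```python
import hashlib


def checksum(data: bytes) -> str:
    # Stand-in checksum (assumption): any strong hash works for the sketch.
    return hashlib.sha256(data).hexdigest()


def self_healing_read(mirrors, index, expected):
    """Read block `index`, verifying it against `expected`.

    `mirrors` is a list of block lists, one per mirror side. The first copy
    whose checksum matches is returned, and any bad copies are rewritten
    from it.
    """
    for side in mirrors:
        if checksum(side[index]) == expected:
            good = side[index]
            for other in mirrors:
                if checksum(other[index]) != expected:
                    other[index] = good   # repair the corrupt copy
            return good
    raise IOError("all copies of the block are corrupt")


mirrors = [[b"hello"], [b"hello"]]
expected = checksum(b"hello")     # kept with the parent pointer, not with the data
mirrors[0][0] = b"hellp"          # simulate silent corruption on one side
data = self_healing_read(mirrors, 0, expected)
```

The key design point is that the checksum is stored apart from the block it covers, so a corrupted block cannot vouch for itself.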

  13. Network Filesystems
     • Idea: use the filesystem to transparently share files between computers.
     • Solution:
       • Client mounts the network filesystem as normal.
       • Client filesystem code sends packets to server(s).
       • Server responds with data stored on a regular on-disk filesystem.

  14. NFS
     • Sun
       • v2 (1984)
       • v3 (1992): TCP + 64-bit.
     • Implementation
       • System calls are translated into Sun RPC calls.
       • Stateless: client obtains a filesystem ID on mount, then uses that ID (like a filehandle) in subsequent requests.
       • UNIX-centric (UIDs, GIDs, permissions).
       • Server authenticates clients by IP address.
       • Client UIDs are mapped to the server, with root squashing.
       • Danger: the client root user can su to any desired UID.
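
The UID mapping and its weakness can be shown in a few lines. This is a hedged sketch, not server code: `squash_uid` is a hypothetical helper, and 65534 is assumed as the "nobody" UID because that value is common on many systems.

```python
NOBODY_UID = 65534   # assumed "nobody" UID; the real value is site-specific


def squash_uid(client_uid, root_squash=True):
    """Map the UID claimed by a client to the UID the server will act as."""
    if root_squash and client_uid == 0:
        return NOBODY_UID     # root on the client is demoted on the server
    return client_uid         # every other claimed UID is trusted as-is


print(squash_uid(0))      # 65534
print(squash_uid(1000))   # 1000
```

Root squashing only covers UID 0. A client root user who switches to UID 1000 before making requests is treated as user 1000, which is the danger the slide points out.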

  15. CIFS
     • Microsoft (1998)
     • Derived from the 1980s IBM SMB network filesystem.
     • Implementation
       • Originally ran over NetBIOS, not TCP/IP.
       • \\svr\share\path Universal Naming Convention.
       • Auth: NTLM (insecure), NTLMv2, Kerberos.
       • MS Windows-centric (filenames, ACLs, EOLs).
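
As a small illustration of the \\svr\share\path naming form, the sketch below splits a UNC string into its three parts. `parse_unc` is a hypothetical helper written for this example and handles only the basic form shown on the slide.

```python
def parse_unc(unc):
    """Split a UNC name into (server, share, path)."""
    if not unc.startswith("\\\\"):
        raise ValueError("not a UNC path")
    parts = unc[2:].split("\\", 2)
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError("UNC path needs a server and a share")
    server, share = parts[0], parts[1]
    path = parts[2] if len(parts) == 3 else ""
    return server, share, path


server, share, path = parse_unc(r"\\svr\share\dir\file.txt")
# server is "svr", share is "share", path is "dir\file.txt"
```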

  16. AFS
     • CMU (1988)
     • Implementation
       • Distributed filesystem: merges the filesystems of multiple servers.
       • Cells are administrative domains within AFS.
       • Cells contain multiple servers.
       • Each server provides multiple volumes.
       • Global namespace: /afs/abc.com
       • Security: Kerberos + ACLs.
       • Better caching with callbacks from the server.
       • Volume replication with read-only copies on other servers.

  17. NFSv4
     • IETF (2000)
     • Based on 1998 Sun draft.
     • New Features
       • Only one protocol.
       • Global namespace.
       • Security (ACLs, Kerberos, encryption).
       • Cross-platform + internationalized.
       • Better caching via delegation of files to clients.

  18. GoogleFS Assumptions
     • High rate of commodity hardware failures.
     • Small number of huge files (multi-GB and larger).
     • Reads: large streaming + small random.
     • Most modifications are appends.
     • High sustained bandwidth matters more than low latency.
     • Applications and filesystem are co-designed.

  19. GoogleFS Architecture [figure]

  20. GoogleFS Architecture
     • Master server
       • Metadata: namespace, ACL, chunk mapping.
       • Chunk lease management, garbage collection, chunk migration.
     • Chunk servers
       • Serve chunks (64 MB + checksum) of files.
       • Chunks replicated on multiple (3) servers.
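
With fixed 64 MB chunks, a client can work out which chunks a byte range touches before asking the master where they live. The arithmetic is sketched below; the function names are hypothetical.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks, as on the slide


def chunk_index(offset):
    """Index of the chunk that holds the byte at `offset`."""
    return offset // CHUNK_SIZE


def chunks_for_range(offset, length):
    """Indices of every chunk touched by reading `length` bytes at `offset`."""
    if length <= 0:
        return []
    first = chunk_index(offset)
    last = chunk_index(offset + length - 1)
    return list(range(first, last + 1))


print(chunk_index(CHUNK_SIZE))                # 1
print(chunks_for_range(CHUNK_SIZE - 1, 2))    # [0, 1]
```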

  21. GoogleFS Writing
     • Client asks the master which chunk server holds the lease.
     • Master responds with the leaseholder (primary) and the other replicas.
     • Client pushes data to all replicas.
     • Client sends the write request to the primary replica.
     • Primary forwards the request to the secondary replicas.
     • Secondaries reply to the primary on completion.
     • Primary replies to the client.
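
The write steps above can be walked through with a toy in-process model. Every class and method name here is hypothetical, network calls are replaced by direct method calls, and failure handling, leases, and serial ordering of concurrent writes are omitted.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.buffer = {}            # data_id -> bytes pushed but not yet applied
        self.chunk = bytearray()    # the chunk contents

    def push(self, data_id, data):
        # Data flow: the client pushes the bytes to every replica first.
        self.buffer[data_id] = data

    def apply(self, data_id):
        # Control flow: apply previously pushed data to the chunk.
        self.chunk += self.buffer.pop(data_id)
        return True


class Primary(ChunkServer):
    def write(self, data_id, secondaries):
        self.apply(data_id)
        # Forward the request and wait for every secondary to acknowledge.
        acks = [s.apply(data_id) for s in secondaries]
        return all(acks)


class Master:
    def __init__(self, primary, secondaries):
        self.primary, self.secondaries = primary, secondaries

    def lookup(self):
        # Returns the leaseholder and the other replica locations.
        return self.primary, self.secondaries


def client_write(master, data, data_id="d1"):
    primary, secondaries = master.lookup()
    for server in [primary, *secondaries]:
        server.push(data_id, data)
    return primary.write(data_id, secondaries)
```

Separating the data push from the write request lets the bulk bytes travel independently of the short control messages that fix the order of the write.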

  22. Common Problems
     • Consistency after crash.
     • Large contiguous allocations.
     • Metadata allocation.

  23. Consistency
     • Detect + Repair
       • Use fsck to repair.
       • Journal replay.
     • Always Consistent
       • Copy on write.
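
Journal replay can be sketched as follows, assuming a journal of simple tuples: `("write", txid, block, data)` and `("commit", txid)`. The record format and the `replay` function are hypothetical and chosen only to keep the example short.

```python
def replay(journal, disk):
    """Apply journaled writes to `disk`, but only for committed transactions."""
    committed = {txid for kind, txid, *_ in journal if kind == "commit"}
    for kind, txid, *rest in journal:
        if kind == "write" and txid in committed:
            block, data = rest
            disk[block] = data
    return disk


journal = [
    ("write", 1, 7, "A"),
    ("commit", 1),
    ("write", 2, 8, "B"),   # crash happened before transaction 2 committed
]
disk = replay(journal, {})
# disk now holds block 7 only; the uncommitted write to block 8 is discarded
```

Skipping uncommitted transactions is what keeps a half-finished update from reaching the filesystem after a crash.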

  24. Large Contiguous Allocations
     • Pre-allocation.
     • Block groups.
     • Multiple block sizes.

  25. Metadata Allocation
     • Fixed number in one location.
     • Fixed number spread across disk.
     • Dynamically allocated in files.

  26. References
     • Florian Buchholz, “The structure of the Reiser file system,” http://homes.cerias.purdue.edu/~florian/reiser/reiserfs.php, 2006.
     • Remy Card, Theodore Ts’o, Stephen Tweedie, “Design and Implementation of the Second Extended Filesystem,” http://web.mit.edu/tytso/www/linux/ext2intro.html, 1994.
     • Sanjay Ghemawat et al., “The Google File System,” SOSP, 2003.
     • Christopher Hertel, Implementing CIFS, Prentice Hall, 2003.
     • Val Henson, “A Brief History of UNIX Filesystems,” http://infohost.nmt.edu/~val/fs_slides.pdf
     • Dave Hitz, James Lau, Michael Malcolm, “File System Design for an NFS File Server Appliance,” Proceedings of the USENIX Winter 1994 Technical Conference, http://www.netapp.com/library/tr/3002.pdf
     • John Howard et al., “Scale and Performance in a Distributed File System,” ACM Transactions on Computer Systems 6(1), 1988.
     • Marshall K. McKusick et al., “A Fast File System for UNIX,” ACM Transactions on Computer Systems 2(3), 1984.
     • Brian Pawlowski et al., “The NFS Version 4 Protocol,” SANE 2000.
     • Daniel Robbins, “Advanced File System Implementor’s Guide,” IBM developerWorks, http://www-128.ibm.com/developerworks/linux/library/l-fs9.html, 2002.
     • Claudia Rodriguez et al., The Linux Kernel Primer, Prentice Hall, 2005.
     • Mendel Rosenblum and John K. Ousterhout, “The Design and Implementation of a Log-Structured File System,” 13th ACM SOSP, 1991.
     • R. Sandberg, “Design and Implementation of the Sun Network Filesystem,” Proceedings of the USENIX 1985 Summer Conference, 1985.
     • Adam Sweeney et al., “Scalability in the XFS File System,” Proceedings of the USENIX 1996 Annual Technical Conference, 1996.
     • Wikipedia, http://en.wikipedia.org/wiki/Comparison_of_file_systems
