CSC 660: Advanced OS

Filesystem Case Studies

CSC 660: Advanced Operating Systems


Topics

  • Early Filesystems (FS, FFS)

  • Journaling Filesystems

  • B Tree Filesystems

  • Network Filesystems

  • GoogleFS

  • Common Problems


Filesystem History

  • FS (1974)

  • Fast Filesystem (FFS) / UFS (1984)

  • Log-structured Filesystem (1991)

  • ext2 (1993)

  • ext3 (2001)

  • WAFL (1994)

  • XFS (1994)

  • Reiserfs (1998)

  • ZFS (2004)


FS

  • First UNIX filesystem (1974)

  • Simple

    • Layout: superblock, inodes, then data blocks.

    • Free blocks tracked in a linked list, not a bitmap.

    • 512 byte blocks, no fragments.

    • Short filenames.

  • Slow: 2% of raw disk bandwidth.

    • Disk seeks consume most file access time due to small block size and high fragmentation.

    • Later doubled perf by using 1KB blocks.
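
Why block size matters so much when seeks dominate can be shown with a rough model. This is only a sketch: `RAW_BANDWIDTH` and `POSITIONING_TIME` are hypothetical disk parameters chosen for illustration, not measurements from the original system, and the model assumes one seek per block read.

```python
# Hypothetical disk parameters, chosen for illustration only.
RAW_BANDWIDTH = 1_000_000   # bytes per second the disk can stream
POSITIONING_TIME = 0.030    # seconds of seek + rotational delay per block

def effective_bandwidth(block_size):
    """Bytes per second when every block access pays one positioning delay."""
    transfer_time = block_size / RAW_BANDWIDTH
    return block_size / (POSITIONING_TIME + transfer_time)

def fraction_of_raw(block_size):
    """Effective bandwidth as a fraction of the raw streaming rate."""
    return effective_bandwidth(block_size) / RAW_BANDWIDTH

if __name__ == "__main__":
    for size in (512, 1024, 4096):
        print(size, f"{fraction_of_raw(size):.1%}")
```

Under these assumptions the positioning delay swamps the transfer time, so moving from 512-byte to 1 KB blocks comes close to doubling throughput.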


FFS

  • BSD (1984), basis for SYSV UFS

  • More complex

    • Cylinder groups: inodes, bitmaps, data blocks.

    • Larger blocks (4K) with 1K fragments.

    • Block layout based on physical disk parameters.

    • Long filenames, symlinks, file locks, quotas.

    • 10% space reserved by default.

  • Faster: 14-47% of raw disk bandwidth.

    • Creating a new file requires 5 seeks:

      • 2 for the file's inode, 1 for file data, 1 for directory data, 1 for the directory inode.

    • User/kernel memory copies take 40% of disk op time.
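
The space saving from fragments can be worked through with a small calculation. This sketch assumes the 4K block and 1K fragment sizes from the slide; the function names are hypothetical, and it ignores indirect blocks and any FFS rules about when fragments may be used.

```python
BLOCK = 4096   # FFS block size from the slide
FRAG = 1024    # FFS fragment size from the slide

def ffs_allocation(size):
    """Return (full_blocks, tail_fragments, wasted_bytes) for a file of `size` bytes."""
    full_blocks, tail = divmod(size, BLOCK)
    tail_frags = -(-tail // FRAG)              # ceiling division
    allocated = full_blocks * BLOCK + tail_frags * FRAG
    return full_blocks, tail_frags, allocated - size

def whole_block_waste(size, block=BLOCK):
    """Wasted bytes if the file could only be stored in whole blocks."""
    blocks = -(-size // block)
    return blocks * block - size

if __name__ == "__main__":
    print(ffs_allocation(5000))      # one full block plus one fragment
    print(whole_block_waste(5000))   # waste without fragments
```

For a 5000-byte file the fragment scheme wastes 120 bytes, where whole 4K blocks would waste 3192.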


Log-structured Filesystem (LFS)

  • All data stored as sequential log entries.

    • Divided into large log segments.

    • Cleaner defragments, produces new segments.

  • Fast recovery: checkpoint + roll forward.

  • Performance: 70% of raw disk bandwidth.

    • Large sequential writes vs multiple writes/seeks.

    • Inode map tracks dynamic locations of inodes.
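
The role of the inode map can be sketched with a toy in-memory log. This is a minimal illustration, not the LFS on-disk format: the class `TinyLFS` and its record layout are hypothetical, one inode holds a single data address, and segments, checkpoints, and the cleaner itself are left out.

```python
class TinyLFS:
    """Toy log-structured store: every write appends; nothing is overwritten."""

    def __init__(self):
        self.log = []    # append-only list of (kind, payload) records
        self.imap = {}   # inode number -> log address of the newest inode

    def write(self, ino, data):
        # Append the data, then a fresh inode pointing at it.
        self.log.append(("data", data))
        data_addr = len(self.log) - 1
        self.log.append(("inode", {"ino": ino, "data_addr": data_addr}))
        # The inode moved, so the inode map must be updated.
        self.imap[ino] = len(self.log) - 1

    def read(self, ino):
        _, inode = self.log[self.imap[ino]]
        _, data = self.log[inode["data_addr"]]
        return data

    def live_addresses(self):
        """Log addresses still reachable from the inode map (what a cleaner would keep)."""
        live = set()
        for addr in self.imap.values():
            live.add(addr)
            live.add(self.log[addr][1]["data_addr"])
        return live
```

Rewriting a file leaves its old data and inode records in the log as dead space, which is why a cleaner is needed to reclaim segments.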


ext2 and ext3

  • FFS + performance features.

    • Variable block size (1K-4K), no fragments.

    • Partitions disk into block groups.

    • Data block preallocation + read ahead.

    • Fast symlinks (target stored in the inode).

    • 5% space reserved by default.

    • Very fast.

  • ext3 adds journaling capabilities.


WAFL

  • Network Appliance (1994)

  • Metadata in files

    • Root inode points to inode file.

    • Filesystem is a tree of blocks: root inode, then inode file, then all other files.

    • Metadata can be written anywhere on disk, which is faster with RAID.

    • Allows filesystem to be expanded on fly.


WAFL: Copy-on-Write Snapshots

  • Hourly (4/day, keep 2d), Daily (keep 7d)

  • Users can get deleted files from .snapshot dirs.

  • Snapshots created by just copying root inode.

  • Creates consistency point snapshot every few seconds.

  • Writes only to unused blocks between consistency snaps.

  • Recovery = last consistency point + replay NVRAM log.
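
Why a snapshot costs only a root copy can be seen in a toy copy-on-write store. This is a sketch under heavy simplification: `CowFS` is a hypothetical class, the "tree" is a single flat level (so each write copies the whole root, where a real tree would copy only the path to the changed block), and the NVRAM log is not modelled.

```python
class CowFS:
    """Toy copy-on-write filesystem with a one-level tree."""

    def __init__(self):
        self.root = {}        # active root: file name -> immutable contents
        self.snapshots = {}   # snapshot label -> saved root

    def write(self, name, data):
        # Never modify the current root in place; build a new one.
        new_root = dict(self.root)
        new_root[name] = data
        self.root = new_root

    def delete(self, name):
        new_root = dict(self.root)
        del new_root[name]
        self.root = new_root

    def snapshot(self, label):
        # Taking a snapshot just keeps a reference to the current root.
        self.snapshots[label] = self.root

if __name__ == "__main__":
    fs = CowFS()
    fs.write("notes.txt", b"v1")
    fs.snapshot("hourly.0")
    fs.delete("notes.txt")
    print(fs.snapshots["hourly.0"]["notes.txt"])   # deleted file is still in the snapshot
```

Because later writes never touch blocks reachable from the saved root, the snapshot stays intact without copying any data.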


XFS

  • SGI (1994)

  • Complex

    • Uses B+ trees to track free space, index dirs, locate file blocks and inodes.

    • Dynamic inode allocation, metadata journaling, volume manager, multithreaded, allocate on flush.

    • 64-bit filesystem (filesystems up to 2^63 bytes).

    • Fast: 90-95% of raw disk bandwidth.


Reiserfs

  • Multiple different versions (v1-4)

  • Complex

    • Uses B+ trees (v3) or dancing trees (v4).

    • Journaling, allocate on flush, COW, tail-packing

    • High perf with small files, large directories.

    • Second to ext2 in perf (v3.)


ZFS

  • Sun (2004)

  • Complex

    • Variable block size + compression.

    • Built-in volume manager (striping, pooling.)

    • Self-healing with 256-bit checksums + mirroring.

    • COW transactional model (live data never overwritten)

    • Fast snapshots (just don’t release old blocks.)

    • 128-bit filesystem.
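
The self-healing idea (checksum plus mirror) can be sketched as follows. This is an illustration only: `read_self_healing` is a hypothetical helper, each mirror side is modelled as a plain list of blocks, and SHA-256 from `hashlib` stands in as a convenient checksum rather than a claim about the on-disk algorithm.

```python
import hashlib

def checksum(data: bytes) -> bytes:
    """Stand-in checksum; stored with the parent pointer, not with the block itself."""
    return hashlib.sha256(data).digest()

def read_self_healing(mirrors, index, expected):
    """Read block `index`, verifying against `expected` and repairing bad copies.

    mirrors: list of mirror sides, each a list of bytes blocks.
    """
    good = None
    for side in mirrors:
        if checksum(side[index]) == expected:
            good = side[index]
            break
    if good is None:
        raise IOError("all copies of the block fail their checksum")
    # Overwrite any copy that does not match the verified data.
    for side in mirrors:
        if side[index] != good:
            side[index] = good
    return good

if __name__ == "__main__":
    side_a = [b"hello"]
    side_b = [b"hellx"]              # silently corrupted copy
    expected = checksum(b"hello")
    print(read_self_healing([side_b, side_a], 0, expected))
    print(side_b[0])                 # repaired from the good mirror
```

Keeping the checksum apart from the block it covers is what lets the reader tell which mirror side is the bad one.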


Network Filesystems

  • Idea: Use filesystem to transparently share files between computers.

  • Solution:

    • Client mounts network fs as normal.

    • Client filesystem code sends packets to server(s).

    • Server responds with data stored on a regular on-disk filesystem.


NFS

  • Sun

    • v2 (1984)

    • v3 (1992) adds TCP transport + 64-bit file sizes and offsets.

  • Implementation

    • System calls via Sun RPC calls.

    • Stateless: client obtains the root file handle on mount, then sends a file handle with each subsequent request.

    • UNIX-centric (UIDs, GIDs, permissions)

    • Server authenticates by client IP address.

      • Client UIDs mapped to server w/ root quashing.

      • Danger: Client root user can su to any desired UID.
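
The UID mapping and its weakness can be sketched in a few lines. This is a minimal illustration, not server code: `map_uid` is a hypothetical function, and 65534 is assumed as the anonymous UID because it is a common convention, not something the slides specify.

```python
ANON_UID = 65534   # assumed anonymous ("nobody") UID; configurable on real servers

def map_uid(client_uid, root_squash=True):
    """Return the UID the server acts as for a request claiming `client_uid`."""
    if root_squash and client_uid == 0:
        return ANON_UID      # client root is demoted to the anonymous user
    return client_uid        # every other claimed UID is trusted as-is

if __name__ == "__main__":
    print(map_uid(0))        # root is squashed
    print(map_uid(1000))     # any other UID passes straight through
```

Squashing covers only UID 0. A client-side root user who switches to UID 1000 before issuing requests is treated as the server's UID 1000, which is the danger noted above.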


CIFS

  • Microsoft (1998)

    • Derived from 1980s IBM SMB net filesystem.

  • Implementation

    • Originally ran over NetBIOS, not TCP/IP.

    • \\svr\share\path Universal Naming Convention

    • Auth: NTLM (insecure), NTLMv2, Kerberos

    • MS Windows-centric (filenames, ACLs, EOLs)


AFS

  • CMU (1988)

  • Implementation

    • Distributed filesystem: merges fs of multiple svrs.

      • Cells are administrative domains within AFS.

      • Cells contain multiple servers.

      • Each server provides multiple volumes.

      • Global namespace: /afs/abc.com

    • Security: Kerberos + ACLs.

    • Better caching with callbacks from server.

    • Volume replication with RO copies on other svrs.
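
Callback-based caching can be sketched with two toy classes. This is a simplified model, not the AFS protocol: `Server` and `Client` are hypothetical, whole files are cached in memory, and callbacks never expire.

```python
class Server:
    def __init__(self):
        self.files = {}       # path -> contents
        self.callbacks = {}   # path -> set of clients promised a callback

    def fetch(self, client, path):
        # Hand out the file along with a promise to notify on change.
        self.callbacks.setdefault(path, set()).add(client)
        return self.files[path]

    def store(self, writer, path, data):
        self.files[path] = data
        # Break callbacks: tell every other caching client its copy is stale.
        for client in self.callbacks.get(path, set()):
            if client is not writer:
                client.invalidate(path)
        self.callbacks[path] = {writer}


class Client:
    def __init__(self, server):
        self.server = server
        self.cache = {}

    def read(self, path):
        # A cached copy is trusted until the server breaks the callback.
        if path not in self.cache:
            self.cache[path] = self.server.fetch(self, path)
        return self.cache[path]

    def write(self, path, data):
        self.cache[path] = data
        self.server.store(self, path, data)

    def invalidate(self, path):
        self.cache.pop(path, None)
```

The client does not need to revalidate on every read. It contacts the server only on a cache miss, and the server takes on the job of announcing changes.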


NFSv4

  • IETF (2000)

    • Based on 1998 Sun draft.

  • New Features

    • Only one protocol.

    • Global namespace.

    • Security (ACLs, Kerberos, encryption)

    • Cross platform + internationalized.

    • Better caching via delegation of files to clients.


GoogleFS Assumptions

  • High rate of commodity hardware failures.

  • Small number of huge files (multi-GB +).

  • Reads: large streaming + small random.

  • Most modifications are appends.

  • Sustained high bandwidth matters more than low latency.

  • Applications / filesystem co-designed.


GoogleFS Architecture

  • Master server

    • Metadata: namespace, ACL, chunk mapping.

    • Chunk lease management, garbage collection, chunk migration.

  • Chunk servers

    • Serve chunks (64MB + checksum) of files.

    • Chunks replicated on multiple (3) servers.
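
How a byte range maps onto fixed-size chunks can be sketched as below. It assumes only the 64MB chunk size from the slide; `chunk_range` is a hypothetical helper, and the lookup from chunk index to chunk servers (the master's job) is not shown.

```python
CHUNK_SIZE = 64 * 1024 * 1024   # 64MB chunks, as on the slide

def chunk_range(offset, length):
    """Split a byte range into (chunk_index, offset_in_chunk, nbytes) pieces."""
    pieces = []
    end = offset + length
    while offset < end:
        index, within = divmod(offset, CHUNK_SIZE)
        nbytes = min(CHUNK_SIZE - within, end - offset)
        pieces.append((index, within, nbytes))
        offset += nbytes
    return pieces

if __name__ == "__main__":
    # A 10-byte read that straddles the first chunk boundary.
    print(chunk_range(CHUNK_SIZE - 4, 10))
```

The client can compute chunk indexes on its own, so it needs the master only to translate an index into replica locations.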


GoogleFS Writing

  • Client asks master which chunksvr has lease.

  • Master responds: leaseholder + replicas.

  • Client pushes data to all replicas.

  • Client sends write to primary replica.

  • Primary forwards the write request to the secondary replicas.

  • Secondaries reply to primary on completion.

  • Primary replies to client.
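
The write steps above can be traced in a small in-memory simulation. It is a sketch, not the real protocol: the class names, the `master` callable, and the `data_id` label are hypothetical, and leases, serial numbers, and failure handling are omitted.

```python
class ChunkServer:
    def __init__(self, name):
        self.name = name
        self.buffer = {}           # data_id -> bytes pushed but not yet applied
        self.chunk = bytearray()   # contents of the single chunk modelled here

    def push(self, data_id, data):
        # Data flow: the bytes arrive first and wait in a buffer.
        self.buffer[data_id] = data

    def apply(self, data_id):
        # Control flow: a write request applies the buffered bytes.
        self.chunk += self.buffer.pop(data_id)
        return True


class Primary(ChunkServer):
    def write(self, data_id, secondaries):
        self.apply(data_id)                                # primary applies first
        acks = [s.apply(data_id) for s in secondaries]     # forward, collect replies
        return all(acks)                                   # reply to the client


def client_write(master, data, data_id):
    primary, secondaries = master()            # ask master: leaseholder + replicas
    for server in [primary, *secondaries]:     # push data to all replicas
        server.push(data_id, data)
    return primary.write(data_id, secondaries) # send write request to the primary
```

Splitting the data push from the write request lets the bulk bytes travel separately from the small control message that fixes the order of writes.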


Common Problems

  • Consistency after crash.

  • Large contiguous allocations.

  • Metadata allocation.


Consistency

  • Detect + Repair

    • Use fsck to repair.

    • Journal replay.

  • Always Consistent

    • Copy on write.
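
Journal replay can be sketched as applying only the transactions that reached a commit record. This is a minimal model, not any particular filesystem's journal format: the record tuples and the `replay` function are hypothetical, and the disk is a plain dictionary of block number to contents.

```python
def replay(journal, disk):
    """Apply committed journal transactions to `disk` and return it.

    journal: list of ("write", txid, block, data) and ("commit", txid) records.
    disk: dict mapping block number -> bytes.
    """
    committed = {rec[1] for rec in journal if rec[0] == "commit"}
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, block, data = rec
            disk[block] = data
    return disk

if __name__ == "__main__":
    journal = [
        ("write", 1, 10, b"A"),
        ("write", 1, 11, b"B"),
        ("commit", 1),
        ("write", 2, 10, b"C"),   # crash before commit: must be ignored
    ]
    print(replay(journal, {}))
```

Transaction 2 never committed, so its write is skipped and block 10 keeps the value from transaction 1.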


Large Contiguous Allocations

  • Pre-allocation.

  • Block groups.

  • Multiple block sizes.


Metadata Allocation

  • Fixed number in one location.

  • Fixed number spread across disk.

  • Dynamically allocated in files.

