File Systems

File Systems CSE451 Andrew Whitaker

Outline • File System Interface • The programmer/user’s perspective • File System Implementation

File System Goal #1 • Allow a single disk (or partition) to be treated as many smaller storage containers • Files can have arbitrary size • Files can grow and shrink • Size is not stated up front

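File System Goal #2 • Provide a hierarchical name-space for referring to files • Key idea: directories as containers for files [Figure: directory tree rooted at /, containing home/, var/, tmp/, and usr/, with lower-level entries chris, andrew, and kris; one route through the tree is labeled “path”]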

File System Goal #3 • Protected sharing of information • Allow users / programs to share data • Provide access control mechanisms to limit sharing
drwxr-xr-x 4 gaetano  www     4096 Mar 15 2005 sewpc
drwxrwx--x 4 zahorjan www     4096 Mar 15 2005 software
drwxrwxr-x 9 levy     www     4096 Mar 16 2005 sosp16
-rw------- 1 lazowska www     2006 Oct  9 1998 staff
drwxrwxr-x 3 beame    ctheory 4096 Jun  1 2002 stoc96

Workload Characteristics • Most files are small • Median size ~= 4 KB • A few files are very large • A “heavy-tailed” distribution • Most files are read sequentially • Many files are quickly deleted • Windows NT: 80% of newly created files are deleted within 4 seconds

File System Implementation • Let’s start simple: • No directories • All files are at the “root” • Files are identified by a unique number

Blocks and Sectors • Disk exposes sectors (512 bytes) • Files are built from blocks of 1+ sectors • File system maps from “virtual” blocks (within a file) to physical disk blocks [Figure: blocks of file 1 and file 2 mapped onto blocks of the disk]
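The sector and block arithmetic can be sketched as follows. This is only an illustration: the 512-byte sector size comes from the slide, while `SECTORS_PER_BLOCK = 8` and the function names `locate` and `sectors_of_block` are assumptions made for the example.

```python
SECTOR_SIZE = 512        # bytes per sector, as stated on the slide
SECTORS_PER_BLOCK = 8    # assumption: 8 sectors per block (4 KB blocks)
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK


def locate(byte_offset):
    """Split a byte offset within a file into (virtual block #, offset in block)."""
    if byte_offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(byte_offset, BLOCK_SIZE)


def sectors_of_block(physical_block):
    """Return the range of disk sectors that make up one physical block."""
    first = physical_block * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)
```

The virtual block number returned by `locate` still has to be mapped to a physical block; that mapping is the file system's job.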

déjà vu: File Systems versus Paging • Similarity: chunk-based allocation • Address spaces are built from pages • Files built from blocks • These are often the same size! • OS maintains the mapping between virtual and physical resources • Page tables map from virtual page to physical frame • File system maps from “virtual” block to physical disk block

Differences Between Paging and File Systems • Persistence • File system state must survive restarts • Translation performance • Virtual address translation must be very fast (done at processor speed) • Block mapping can be much slower • Layout issues • Disk performance is highly influenced by layout • Paging performance is (largely) unaffected • Any page frame is as good as any other • Files rarely have holes

Basic Disk Layout • Data region contains actual file data • Metadata region contains information about files and the file system • Block size • Block mappings (virtual block to physical block) • Protection information Metadata Data

Approach #1: Pre-allocated Disk Partitions • On file creation, carve out a contiguous disk allocation • Record the partition info in the meta-data region Note: this is exactly like base/limit registers for memory
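A minimal sketch of the base/limit analogy, assuming each file's metadata records only a starting block and a length. The class name `Extent` and its fields are hypothetical.

```python
class Extent:
    """A contiguous, pre-allocated region of the disk belonging to one file."""

    def __init__(self, base, limit):
        self.base = base      # first physical block of the partition
        self.limit = limit    # number of blocks reserved at creation time

    def translate(self, virtual_block):
        """Map a block number within the file to a physical disk block."""
        if not 0 <= virtual_block < self.limit:
            raise IndexError("block outside the pre-allocated partition")
        return self.base + virtual_block
```

Translation is a bounds check plus one addition, which is why this scheme is simple but cannot grow a file past its limit.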

Problems With Static Partitions • Must know (or guess) file size in advance • Penalty for getting this wrong is high • Tends to create external fragmentation • Space between partitions • Major advantage: perfect data layout • Contiguous layout is optimal for sequential reads and writes [Figure: disk divided into contiguous partitions for file 0 through file 4]
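External fragmentation can be illustrated with a small sketch: the total free space may be large enough for a new file while no single gap is. The helper names `free_gaps` and `can_allocate_contiguous` and the example numbers are made up for illustration.

```python
def free_gaps(disk_size, extents):
    """Return the lengths of the free gaps between (base, length) extents."""
    gaps = []
    cursor = 0
    for base, length in sorted(extents):
        if base > cursor:
            gaps.append(base - cursor)
        cursor = max(cursor, base + length)
    if cursor < disk_size:
        gaps.append(disk_size - cursor)
    return gaps


def can_allocate_contiguous(disk_size, extents, wanted):
    """True if some single gap can hold `wanted` contiguous blocks."""
    return any(gap >= wanted for gap in free_gaps(disk_size, extents))


# Hypothetical 100-block disk holding three files.
extents = [(0, 20), (30, 20), (60, 30)]
gaps = free_gaps(100, extents)
```

Here 30 blocks are free in total, but they are split into three gaps of 10, so a 25-block file cannot be placed contiguously.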

Alternative to Static Partitions • Allocate disk space lazily • Allow for block allocations that are not contiguous • Eliminates external fragmentation • But, results in sub-optimal data layout • Challenge: must keep track of virtual-to-physical block mappings [Figure: a file's blocks scattered across non-contiguous locations on the disk]

Approach #2: Block Tables (Silberschatz: Index Blocks) • In the meta-data region, maintain an array of block tables • Block table maintains the mappings from virtual file blocks to physical disk blocks [Figure: array of block tables, one each for file 0, file 1, file 2, file 3, …]

Possible Block Table Implementation [Figure: a block address is split into a virtual block # and an offset; the block table maps the virtual block # to a physical block #, which is combined with the offset to form a physical address into the disk data region (Block 0, Block 1, Block 2, …)] • What does this remind you of?
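The lookup the figure describes can be sketched as a flat, per-file table. The 4 KB block size and the `BlockTable` class are assumptions for the example, not a description of any particular file system.

```python
BLOCK_SIZE = 4096  # assumption: 4 KB blocks


class BlockTable:
    """Per-file table: index = virtual block #, value = physical block #."""

    def __init__(self):
        self.mappings = []

    def append_block(self, physical_block):
        """Grow the file by one block stored at the given physical block."""
        self.mappings.append(physical_block)

    def translate(self, byte_offset):
        """Map a byte offset within the file to a byte address on disk."""
        if byte_offset < 0:
            raise ValueError("offset must be non-negative")
        virtual_block, offset = divmod(byte_offset, BLOCK_SIZE)
        physical_block = self.mappings[virtual_block]  # IndexError past end of file
        return physical_block * BLOCK_SIZE + offset


table = BlockTable()
for physical in (7, 2, 9):   # the file's blocks need not be contiguous
    table.append_block(physical)
```

The offset passes through unchanged and only the block number is translated, which is the same split a page table performs on a virtual address.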

Analyzing Block Tables • This is very close to what UNIX does! • “Block table” is called an inode • One remaining problem: choosing the block table size • Small size prohibits large files • Large size wastes space for small files • Solution: multi-level block-tables • Allocate a small number of mappings in the inode • Allow for indirection to supply mappings for larger files

UNIX i-nodes (Unix Version 7) • Each i-node contains 13 pointers • The first 10 are “direct” • Pointers to real data blocks • The 11th pointer is a “single indirect block” • A pointer to a block full of pointers to real data blocks • The 12th pointer is a “doubly indirect block” • A pointer to a block full of pointers to blocks full of pointers to real data blocks • The 13th pointer is a “triply indirect block” • You get the idea…
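To see how the 13 pointers cover both small and large files, the following sketch works out which pointer serves a given block of a file. The 10 direct pointers and the three indirect levels come from the slide; the 512-byte block size and 4-byte pointer size (128 pointers per indirect block) are assumptions chosen for the arithmetic, and `classify` is a hypothetical helper.

```python
NUM_DIRECT = 10          # from the slide: the first 10 pointers are direct
BLOCK_SIZE = 512         # assumption: one block is 512 bytes
POINTER_SIZE = 4         # assumption: each block pointer is 4 bytes
P = BLOCK_SIZE // POINTER_SIZE   # pointers per indirect block

# Largest file, in blocks, reachable through all 13 pointers.
MAX_BLOCKS = NUM_DIRECT + P + P ** 2 + P ** 3


def classify(vblock):
    """Return (level, indices) describing how to reach a file's block."""
    if vblock < 0:
        raise ValueError("block number must be non-negative")
    if vblock < NUM_DIRECT:
        return ("direct", [vblock])
    vblock -= NUM_DIRECT
    if vblock < P:
        return ("single", [vblock])
    vblock -= P
    if vblock < P * P:
        return ("double", list(divmod(vblock, P)))
    vblock -= P * P
    if vblock < P ** 3:
        i, rest = divmod(vblock, P * P)
        j, k = divmod(rest, P)
        return ("triple", [i, j, k])
    raise ValueError("block number exceeds the maximum file size")
```

Under these assumptions a file of up to 10 blocks needs no indirect block at all, while the triply indirect pointer alone accounts for `P ** 3` blocks.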

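i-nodes, Visualized [Figure: an i-node with pointer slots numbered 0 through 12; the early slots point directly at data blocks and the later slots fan out through indirect blocks] • Q: How is this different from multi-level page tables?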

Checkpoint • What we have • Arbitrary size files that can grow and shrink dynamically • What we don’t have • File names • Directories

Completing the File System • Let’s create special files that contain the mappings from file names to numbers • Let’s call these files “directories”

UNIX Directory Implementation • Directories are implemented as files • A directory contains mappings from file names to i-nodes • Directories can contain other directories • This gives us the file system hierarchy • The root directory has a well-known i-node

Path name translation • Let’s say you want to open “/one/two/three.txt”: fd = open("/one/two/three.txt", O_RDWR); • What goes on inside the file system? • Read the i-node for “/” • Read the directory contents for this i-node • Read the i-node for “one” • Read the directory contents for this i-node • Read the i-node for “two” • Read the directory contents for this i-node • Find the i-node for “three.txt” • Create an open-file entry for this i-node
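The walk above can be sketched against a toy in-memory i-node table. Everything here is hypothetical: the i-node numbers, the dictionary layout, and the `lookup` function are stand-ins for on-disk structures, and the sketch stops at finding the i-node rather than creating an open-file entry.

```python
ROOT_INODE = 2  # hypothetical "well-known" i-node number for "/"

# Toy i-node table: directories map names to i-node numbers.
inodes = {
    2: {"type": "dir", "entries": {"one": 5}},
    5: {"type": "dir", "entries": {"two": 8}},
    8: {"type": "dir", "entries": {"three.txt": 11}},
    11: {"type": "file", "data": b"hello"},
}


def lookup(path):
    """Translate an absolute path to an i-node number, one component at a time."""
    if not path.startswith("/"):
        raise ValueError("only absolute paths are handled in this sketch")
    current = ROOT_INODE
    for name in path.split("/"):
        if not name:
            continue                      # skip empty pieces from "/" separators
        inode = inodes[current]           # "read the i-node"
        if inode["type"] != "dir":
            raise NotADirectoryError(name)
        entries = inode["entries"]        # "read the directory contents"
        if name not in entries:
            raise FileNotFoundError(name)
        current = entries[name]
    return current
```

Each path component costs an i-node read and a directory read, which is one reason real systems cache name lookups.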

File Links • The same file can have multiple names • Because every file is uniquely identified by a number

Hard Link • A hard link is a mapping from a file name (path) to an i-node • Stored in a directory file • Each link refers to the same file • open("foo.txt") is equivalent to open("bar.txt") • What happens on deletion? • Each i-node contains a reference count • On link deletion, decrement the ref count • When the count reaches zero, the OS releases the file
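The reference-counting rule can be sketched with a single flat directory. `TinyFS` and its methods are hypothetical names for this illustration; a real file system would also wait for open file descriptors to close before releasing the data.

```python
class TinyFS:
    """Toy single-directory file system with hard links."""

    def __init__(self):
        self.directory = {}   # file name -> i-node number
        self.inodes = {}      # i-node number -> {"links": count, "data": bytes}
        self.next_inode = 1

    def create(self, name, data=b""):
        number = self.next_inode
        self.next_inode += 1
        self.inodes[number] = {"links": 1, "data": data}
        self.directory[name] = number
        return number

    def link(self, existing, new):
        """Add a second name for the same i-node and bump its ref count."""
        if new in self.directory:
            raise FileExistsError(new)
        number = self.directory[existing]
        self.directory[new] = number
        self.inodes[number]["links"] += 1

    def unlink(self, name):
        """Remove one name; release the i-node when no names remain."""
        number = self.directory.pop(name)
        inode = self.inodes[number]
        inode["links"] -= 1
        if inode["links"] == 0:
            del self.inodes[number]

    def read(self, name):
        return self.inodes[self.directory[name]]["data"]


fs = TinyFS()
fs.create("foo.txt", b"shared contents")
fs.link("foo.txt", "bar.txt")
```

After `fs.unlink("foo.txt")` the data is still reachable through `bar.txt`; only removing the last name frees the i-node.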

Soft Links • Problems with hard links: • They can’t span file systems (why?) • They can’t refer to directories (why?) • Soft links address these issues • A soft link is a file containing a complete path • When the OS encounters a soft link, it rewrites the path to include the linked location • Note: soft links do not modify the i-node ref count • This makes it possible to have “broken” soft links
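A rough sketch of the path rewriting, using a flat dictionary from full paths to entries. The table contents, the `resolve` function, and the `MAX_HOPS` cap on chained links are all assumptions for illustration; the sketch only handles a soft link as the final path component.

```python
MAX_HOPS = 8  # assumption: give up after this many chained soft links

# Toy namespace: each path is either a regular file or a soft link to a path.
namespace = {
    "/data/real.txt": ("file", b"contents"),
    "/home/link": ("symlink", "/data/real.txt"),
    "/home/dangling": ("symlink", "/data/deleted.txt"),
}


def resolve(path):
    """Follow soft links until a regular file is reached; return its path."""
    for _ in range(MAX_HOPS):
        if path not in namespace:
            raise FileNotFoundError(path)   # a "broken" link ends up here
        kind, value = namespace[path]
        if kind != "symlink":
            return path
        path = value                        # rewrite the path to the link target
    raise OSError("too many levels of soft links")
```

Because the link stores only a path and holds no reference on the target, nothing prevents the target from being deleted, as `/home/dangling` shows.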

Summary • Files serve as a virtualized storage abstraction • Arbitrary size • Grow and shrink dynamically • The process of mapping from virtual to physical blocks resembles page tables • With some key differences • In UNIX, files are identified by number • Directories are files that map from names to numbers
