
File Systems



  1. File Systems CSE451 Andrew Whitaker

  2. Outline • File System Interface • The programmer/user’s perspective • File System Implementation

  3. File System Goal #1 • Allow a single disk (or partition) to be treated as many smaller storage containers • Files can have arbitrary size • Files can grow and shrink • Size is not stated up front

  4. File System Goal #2 • Provide a hierarchical name-space for referring to files • Key idea: directories as containers for files [Figure: directory tree with labels /, home/, var/, tmp/, usr/, chris, andrew, kris; one route through the tree is marked “path”]

  5. File System Goal #3 • Protected sharing of information • Allow users / programs to share data • Provide access control mechanisms to limit sharing
     drwxr-xr-x 4 gaetano  www     4096 Mar 15 2005 sewpc
     drwxrwx--x 4 zahorjan www     4096 Mar 15 2005 software
     drwxrwxr-x 9 levy     www     4096 Mar 16 2005 sosp16
     -rw------- 1 lazowska www     2006 Oct  9 1998 staff
     drwxrwxr-x 3 beame    ctheory 4096 Jun  1 2002 stoc96
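The permission strings in the listing above encode a file type plus read/write/execute bits for owner, group, and others. As a minimal sketch (the two mode values are illustrative and chosen to match the first and fourth rows of the listing), Python's standard `stat` module can render and test such bits:

```python
import stat

dir_mode = stat.S_IFDIR | 0o755    # directory: rwx for owner, r-x for group and others
file_mode = stat.S_IFREG | 0o600   # regular file: rw for owner only

print(stat.filemode(dir_mode))     # drwxr-xr-x
print(stat.filemode(file_mode))    # -rw-------

def others_can_read(mode):
    """True when the 'other' class has the read bit set."""
    return bool(mode & stat.S_IROTH)

print(others_can_read(dir_mode), others_can_read(file_mode))  # True False
```

This only decodes the bits; the actual access check is performed by the operating system on each open.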

  6. Workload Characteristics • Most files are small • Median size ≈ 4 KB • A few files are very large • A “heavy-tailed” distribution • Most files are read sequentially • Many files are quickly deleted • Windows NT: 80% of newly created files are deleted within 4 seconds

  7. File System Implementation • Let’s start simple: • No directories • All files are at the “root” • Files are identified by a unique number

  8. Blocks and Sectors • Disk exposes sectors (512 bytes) • Files are built from blocks of 1+ sectors • File system maps from “virtual” blocks (within a file) to physical disk blocks [Figure: blocks of file 1 and file 2 mapped onto the disk]
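The sector and block arithmetic can be made concrete with a short sketch. The 512-byte sector size comes from the slide; the choice of 8 sectors per block (4096-byte blocks) and the function names are assumptions made for illustration.

```python
SECTOR_SIZE = 512          # bytes per sector, as stated on the slide
SECTORS_PER_BLOCK = 8      # assumption: 8 sectors per block, i.e. 4096-byte blocks
BLOCK_SIZE = SECTOR_SIZE * SECTORS_PER_BLOCK

def locate(byte_offset):
    """Split a byte offset within a file into (virtual block number, offset in block)."""
    return divmod(byte_offset, BLOCK_SIZE)

def sectors_of(physical_block):
    """Return the range of disk sectors that make up one physical block."""
    first = physical_block * SECTORS_PER_BLOCK
    return range(first, first + SECTORS_PER_BLOCK)

print(locate(10000))        # (2, 1808)
print(list(sectors_of(3)))  # [24, 25, 26, 27, 28, 29, 30, 31]
```

The first function works purely inside the file's own "virtual" numbering; only the second touches physical disk positions, which is the split the file system has to bridge.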

  9. déjà vu: File Systems versus Paging • Similarity: chunk-based allocation • Address spaces are built from pages • Files built from blocks • These are often the same size! • OS maintains the mapping between virtual and physical resources • Page tables map from virtual page to physical frame • File system maps from “virtual” block to physical disk block

  10. Differences Between Paging and File Systems • Persistence • File system state must survive restarts • Translation performance • Virtual address translation must be very fast (done at processor speed) • Block mapping can be much slower • Layout issues • Disk performance is highly influenced by layout • Paging performance is (largely) unaffected • Any page frame is as good as any other • Files rarely have holes

  11. Basic Disk Layout • Data region contains actual file data • Metadata region contains information about files and the file system • Block size • Block mappings (virtual block to physical block) • Protection information [Figure: disk divided into a Metadata region and a Data region]

  12. Approach #1: Pre-allocated Disk Partitions • On file creation, carve out a contiguous disk allocation • Record the partition info in the meta-data region Note: this is exactly like base/limit registers for memory

  13. Problems With Static Partitions • Must know (or guess) file size in advance • Penalty for getting this wrong is high • Tends to create external fragmentation • Space between partitions • Major advantage: perfect data layout • Contiguous layout is optimal for sequential reads and writes [Figure: disk holding file 0 through file 4 as contiguous partitions]
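External fragmentation is easiest to see with numbers. The sketch below assumes a hypothetical 20-block disk whose free space is split into three gaps, and uses a simple first-fit search; both the layout and the `first_fit` helper are illustrative, not taken from the slides.

```python
def first_fit(free_extents, size):
    """Return (start, updated free list) for a contiguous run of `size` blocks,
    or (None, unchanged list) when no single gap is large enough."""
    for i, (start, length) in enumerate(free_extents):
        if length >= size:
            rest = free_extents[:i] + free_extents[i + 1:]
            if length > size:
                rest.insert(i, (start + size, length - size))
            return start, rest
    return None, free_extents

# Hypothetical free gaps left between existing files: (start block, length).
free = [(2, 3), (8, 4), (15, 5)]
total_free = sum(length for _, length in free)

start, free_after = first_fit(free, 6)
print(total_free)   # 12
print(start)        # None
```

Twelve blocks are free in total, yet a 6-block file cannot be created because no single gap is that large. That is the space "between partitions" the slide refers to.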

  14. Alternative to Static Partitions • Allocate disk space lazily • Allow for block allocations that are not contiguous • Eliminates external fragmentation • But, results in sub-optimal data layout • Challenge: must keep track of virtual-to-physical block mappings [Figure: blocks of one file scattered across the disk]

  15. Approach #2: Block Tables (Silberschatz: Index Blocks) • In the meta-data region, maintain an array of block tables • Block table maintains the mappings from virtual file blocks to physical disk blocks [Figure: array of block tables, one each for file 0, file 1, file 2, file 3, …]

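  16. Possible Block Table Implementation [Figure: a block address is split into a virtual block # and an offset; the virtual block # indexes the block table to yield a physical block #, which is combined with the offset to form the physical address of a block (Block 0, Block 1, Block 2, …) in the disk data region] What does this remind you of?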
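The translation step described above can be sketched in a few lines. This assumes a 4096-byte block and represents a block table as a plain list indexed by virtual block number; the table contents are made up for illustration.

```python
BLOCK_SIZE = 4096  # assumption: block size in bytes

def translate(block_table, byte_offset):
    """Map a byte offset within a file to (physical block number, offset in block)."""
    virtual_block, offset = divmod(byte_offset, BLOCK_SIZE)
    if virtual_block >= len(block_table):
        raise IndexError("offset is past the last mapped block")
    return block_table[virtual_block], offset

# Hypothetical block table for one file: virtual blocks 0..2 live in
# non-contiguous physical blocks 7, 2, and 9.
table = [7, 2, 9]
print(translate(table, 5000))   # (2, 904)
```

The offset passes through unchanged while only the block number is looked up, which mirrors how a page table translates a virtual page number and leaves the page offset alone.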

  17. Analyzing Block Tables • This is very close to what UNIX does! • “Block table” is called an inode • One remaining problem: choosing the block table size • Small size prohibits large files • Large size wastes space for small files • Solution: multi-level block-tables • Allocate a small number of mappings in the inode • Allow for indirection to supply mappings for larger files

  18. UNIX i-nodes (Unix Version 7) • Each i-node contains 13 pointers • The first 10 are “direct” • Pointers to real data blocks • The 11th pointer is a “single indirect block” • A pointer to a block full of pointers to real data blocks • The 12th pointer is a “doubly indirect block” • A pointer to a block full of pointers to blocks full of pointers to real data blocks • The 13th pointer is a “triply indirect block” • You get the idea…
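To see how much the indirect pointers buy, the sketch below computes which level of indirection serves a given virtual block and the largest file the scheme can address. The 10 direct pointers and three indirect levels come from the slide; the 512-byte block size and 4-byte pointer size are assumptions chosen for illustration, so the totals are only indicative.

```python
BLOCK_SIZE = 512       # assumption: bytes per block
POINTER_SIZE = 4       # assumption: bytes per block pointer
NUM_DIRECT = 10        # from the slide: the first 10 pointers are direct
PTRS_PER_BLOCK = BLOCK_SIZE // POINTER_SIZE   # 128 pointers fit in one block

def indirection_level(virtual_block):
    """Return 0 for a direct block, or 1/2/3 for single/double/triple indirect."""
    if virtual_block < NUM_DIRECT:
        return 0
    remaining = virtual_block - NUM_DIRECT
    for level in (1, 2, 3):
        capacity = PTRS_PER_BLOCK ** level
        if remaining < capacity:
            return level
        remaining -= capacity
    raise ValueError("block number exceeds what the i-node can address")

max_blocks = NUM_DIRECT + sum(PTRS_PER_BLOCK ** k for k in (1, 2, 3))
print(max_blocks)               # 2113674
print(max_blocks * BLOCK_SIZE)  # 1082201088
print(indirection_level(9), indirection_level(10), indirection_level(138))  # 0 1 2
```

Under these assumptions a file of up to 5120 bytes needs no indirect blocks at all, which suits a workload where most files are small, while the triple indirect pointer extends the limit to slightly over 1 GiB.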

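  19. i-nodes, Visualized [Figure: an i-node with pointer slots 0 through 12; the low-numbered slots point directly at data blocks, and the last slots point at indirect blocks that fan out to further blocks] Q: How is this different from multiple-level page tables?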

  20. Checkpoint • What we have • Arbitrary size files that can grow and shrink dynamically • What we don’t have • File names • Directories

  21. Completing the File System • Let’s create special files that contain the mappings from file names to numbers • Let’s call these files “directories”

  22. UNIX Directory Implementation • Directories are implemented as files • They contain mappings from file names to i-nodes • Directories can contain other directories • This gives us the file system hierarchy • The root directory has a well-known i-node

  23. Path name translation • Let’s say you want to open “/one/two/three.txt”: fd = open("/one/two/three.txt", O_RDWR); • What goes on inside the file system? • Read the i-node for “/” • Read the directory contents for this i-node • Read the i-node for “one” • Read the directory contents for this i-node • Read the i-node for “two” • Read the directory contents for this i-node • Find the i-node for “three.txt” • Create an open-file entry for this i-node
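The lookup steps above amount to a loop over path components. The following is a toy model, not real file system code: the i-node table, the i-node numbers, and the `resolve` function are all hypothetical, and directories are modeled as dictionaries from names to i-node numbers.

```python
ROOT_INODE = 2  # assumption: a well-known root i-node number, chosen for illustration

# Hypothetical i-node table: each entry is either a directory
# (a dict from name to i-node number) or a regular file (bytes).
inodes = {
    2: {"one": 5},
    5: {"two": 9},
    9: {"three.txt": 12},
    12: b"hello",
}

def resolve(path):
    """Resolve an absolute path to an i-node number, one component at a time."""
    current = ROOT_INODE
    for name in path.strip("/").split("/"):
        if name == "":
            continue                             # handles "/" and repeated slashes
        directory = inodes[current]              # read the i-node and its contents
        if not isinstance(directory, dict):
            raise NotADirectoryError(name)
        if name not in directory:
            raise FileNotFoundError(path)
        current = directory[name]                # follow the name-to-number mapping
    return current

print(resolve("/one/two/three.txt"))   # 12
```

Each loop iteration corresponds to one "read the i-node, read the directory contents" pair, so a path with three components costs three directory lookups before the file itself is reached.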

  24. File Links • The same file can have multiple names • Because every file is uniquely identified by a number

  25. Hard Link • A hard link is a mapping from a file name (path) to an i-node • Stored in a directory file • Each link refers to the same file • open (“foo.txt”) is equivalent to open (“bar.txt”) • What happens on deletion? • Each i-node contains a reference count • On link deletion, decrement the ref count • When the count reaches zero, the OS releases the file
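The reference-counting rule can be illustrated with a toy model. `TinyFS` below is a hypothetical class with a single flat directory; it is a sketch of the bookkeeping only and does not reflect how any real kernel stores i-nodes.

```python
class TinyFS:
    """Toy model: one flat directory mapping names to i-node numbers."""

    def __init__(self):
        self.directory = {}   # file name -> i-node number
        self.inodes = {}      # i-node number -> {"refs": int, "data": bytes}
        self.next_inode = 1

    def create(self, name, data):
        number = self.next_inode
        self.next_inode += 1
        self.inodes[number] = {"refs": 1, "data": data}
        self.directory[name] = number
        return number

    def link(self, existing, new_name):
        """Add a second name for the same i-node and bump its reference count."""
        number = self.directory[existing]
        self.directory[new_name] = number
        self.inodes[number]["refs"] += 1

    def unlink(self, name):
        """Remove one name; release the i-node when no names remain."""
        number = self.directory.pop(name)
        self.inodes[number]["refs"] -= 1
        if self.inodes[number]["refs"] == 0:
            del self.inodes[number]

    def read(self, name):
        return self.inodes[self.directory[name]]["data"]

fs = TinyFS()
fs.create("foo.txt", b"shared contents")
fs.link("foo.txt", "bar.txt")
print(fs.read("foo.txt") == fs.read("bar.txt"))  # True
fs.unlink("foo.txt")
print(fs.read("bar.txt"))                        # b'shared contents'
fs.unlink("bar.txt")
print(fs.inodes)                                 # {}
```

Deleting "foo.txt" leaves the data reachable through "bar.txt"; the i-node is released only when the second name is removed and the count reaches zero.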

  26. Soft Links • Problems with hard links: • They can’t span file systems (why?) • They can’t refer to directories (why?) • Soft links address these issues • A soft link is a file containing a complete path • When the OS encounters a soft link, it re-writes the path to include the linked location • Note: soft links do not modify the i-node ref count • This makes it possible to have “broken” soft links
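A broken soft link can be reproduced with Python's standard `os` module. This sketch assumes a POSIX-like system where unprivileged processes may create symbolic links; the file names are arbitrary and everything happens inside a temporary directory.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "foo.txt")
    link = os.path.join(tmp, "alias.txt")

    with open(target, "w") as f:
        f.write("data")

    os.symlink(target, link)                  # the link is a small file holding a path
    stored_path = os.readlink(link)           # read back the path stored in the link
    links_before = os.stat(target).st_nlink   # hard-link count of the target

    os.remove(target)                         # delete the target; the symlink remains
    still_a_link = os.path.islink(link)       # the link itself is still present
    resolves = os.path.exists(link)           # follows the link to the missing target

print(stored_path == target, links_before, still_a_link, resolves)
```

On a typical POSIX system this prints `True 1 True False`: creating the soft link left the target's link count at 1, and once the target is removed the link still exists but no longer resolves.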

  27. Summary • Files serve as a virtualized storage abstraction • Arbitrary size • Grow and shrink dynamically • The process of mapping from virtual to physical blocks resembles page tables • With some key differences • In UNIX, files are identified by number • Directories are files that map from names to numbers
