File Systems: Design and Implementation

amber-bowen
Presentation Transcript


  1. File Systems: Design and Implementation Operating Systems Spring 2004 OS Spring’04

  2. What is it all about? • A file system is a service that supports an abstract representation of secondary storage • Supported by the OS • Why is a file system needed? • What is so special about secondary storage (as opposed to main memory)?

  3. Memory Hierarchy

  4. Main memory vs. Secondary storage • Main memory: • Small (MB/GB) • Expensive • Fast (10^-6/10^-7 sec) • Volatile • Directly accessible by CPU • Interface: (virtual) memory address • Secondary storage: • Large (GB/TB) • Cheap • Slow (10^-2/10^-3 sec) • Persistent • Cannot be directly accessed by CPU • Data should first be brought into the main memory

  5. Some numbers… • 1GB = 2^30 ~ 10^9 bytes (gigabyte) • 1TB = 2^40 ~ 10^12 (terabyte) • 1PB = 2^50 ~ 10^15 (petabyte) • 1EB = 2^60 ~ 10^18 (exabyte) • 2^32 ~ 4 x 10^9: Genome base pairs • 2^64 ~ 16 x 10^18: Brain electrons • 2^256 ~ 65,536 x 10^72: Particles in Universe

  6. Secondary storage structure • A number of disks directly attached to the computer • Network attached disks accessible through a fast network • Storage Area Network (SAN) • Simple disks • Smart disks

  7. Internal disk structure

  8. Data Access • The sector is the minimum read/write unit of data (usually 1KB) • Access: (#surface, #track, #sector) • Smart disk drives hide the internal disk layout • Access: (#sector) • Moving the arm assembly (seek) is expensive • Sequential access is about 100 times faster than random access
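
How a smart disk might turn a (#surface, #track, #sector) triple into a single (#sector) number can be sketched as follows. This is only an illustration under simplifying assumptions that are not in the slides: every track holds the same number of sectors, all indices are 0-based, and the names `Geometry`, `to_linear` and `from_linear` are made up for the example.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    """Hypothetical disk geometry (assumption: uniform tracks)."""
    surfaces: int            # number of recording surfaces
    sectors_per_track: int   # assumed identical for every track


def to_linear(geo, track, surface, sector):
    """Map a 0-based (track, surface, sector) triple to a linear sector number.

    Sectors of one track are numbered consecutively, then the same track on
    the next surface, then the next track, so that consecutive linear
    numbers rarely require moving the arm.
    """
    return (track * geo.surfaces + surface) * geo.sectors_per_track + sector


def from_linear(geo, n):
    """Inverse mapping: linear sector number back to (track, surface, sector)."""
    rest, sector = divmod(n, geo.sectors_per_track)
    track, surface = divmod(rest, geo.surfaces)
    return track, surface, sector
```

Numbering whole tracks before moving to the next one is one plausible reason why sequential access avoids most seeks; real drives use more elaborate layouts.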

  9. Overview • File system services • What user applications see • File system implementation • What the data on disk looks like, bit by bit • The runtime support of FS operations • The FS service and its implementation are deeply intertwined • Performance is the paramount issue for the file system implementation

  10. File System services • The file system is a layer between the secondary storage and the application • Presents the secondary storage as a collection of persistent objects with unique names, called files • Provides mechanisms for mapping the data between the secondary storage and the main memory

  11. What is a file? • A file is a named persistent collection of data • Unstructured, sequential (UNIX) • Data is accessed by specifying the offset • Collection of records (database systems) • Supports associative access • give me all records with “Name=Yossi” • Attributes: owner, permissions, modification time, size, etc…

  12. File system interface • File data access • READ: Bring a specified chunk of data from the file into the process virtual address space • WRITE: Write a specified chunk of data from the process virtual address space to the file • CREATE, DELETE, SEEK, TRUNCATE • open, close, set_attributes • Many semantic issues: • Automatic size-extension • Holes • Persistence of open files • More …
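
The operations listed above map fairly directly onto POSIX-style calls. The sketch below uses Python's `os` and `tempfile` modules on a temporary file to show READ, WRITE, SEEK and TRUNCATE, plus two of the semantic issues (automatic size extension and holes). The offsets and sizes are arbitrary choices for the example.

```python
import os
import tempfile

fd, path = tempfile.mkstemp()              # CREATE a file and open it
try:
    os.write(fd, b"hello")                 # WRITE 5 bytes at offset 0
    os.lseek(fd, 4096, os.SEEK_SET)        # SEEK past the current end of file
    os.write(fd, b"world")                 # WRITE: the file is extended automatically
    size_after_write = os.fstat(fd).st_size    # 4096 + 5 bytes

    os.lseek(fd, 5, os.SEEK_SET)           # SEEK into the unwritten gap (a hole)
    hole = os.read(fd, 8)                  # READ from the hole returns zero bytes

    os.ftruncate(fd, 5)                    # TRUNCATE back to the first 5 bytes
    size_after_truncate = os.fstat(fd).st_size
finally:
    os.close(fd)                           # close
    os.remove(path)                        # DELETE
```

Whether the hole actually occupies disk blocks depends on the underlying file system; the interface only promises that reading it yields zeros.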

  13. Accessing File Data: File Control Block • A control structure, File Control Block (FCB), is associated with each file in the file system • Each FCB has a unique identifier (FCB ID) • UNIX: i-node, identified by i-node number • FCB structure: • File attributes • A data structure for accessing the file’s data

  14. Accessing File Data • Given the file name • Get to the file’s FCB using the file system catalog • Use the FCB to get to the desired offset within the file data

  15. Accessing File Data: Catalog • The catalog maps a file name to the FCB • Checks permissions • This could be done on every file data access, but that would be inefficient • Instead, do it once, when the file is first referenced • file_handle=open(file_name): • search the catalog and bring the FCB into memory • UNIX: in-memory FCB: in-core i-node • close(file_handle): release the FCB from memory

  16. The Catalog Organization • FCBs are stored in predefined locations on the disk • UNIX: i-node list • Hierarchical structure: • Some FCBs are just a list of pointers to other FCBs • Directories • UNIX: directory is a file whose data is an array of (file_name, i-node#) pairs • Recursive mapping

  17. Directories • Provide name to file mapping • May provide additional attributes per file • Different from regular files • Support operations like create, delete, list • Prevent duplicate names • May be organized as a hash table for efficient searching • Most common structure: hierarchy • Supports hierarchical pathnames

  18. Searching the UNIX catalog • /a/b/c => i-node of /a/b/c • Get the root i-node: • The i-node number of ‘/’ is pre-defined (2) • Use the root i-node to get to the ‘/’ data • Search for (a, i-node#) in the root’s data • Get a’s i-node • Go to a’s data and search for (b, i-node#) • Get b’s i-node • Etc… • Permissions are checked all along the way • Each dir in the path must be (at least) executable
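
The lookup steps above can be sketched on a toy in-memory i-node table. Everything about the table layout is an assumption made for illustration (a directory's data is a dict of name -> i-node number, and a single `exec` flag stands in for the permission bits); only the root i-node number 2 comes from the slide.

```python
ROOT_INO = 2  # pre-defined i-node number of '/', as stated on the slide

# Hypothetical i-node table: a directory's data maps names to i-node numbers,
# a regular file's data is just bytes.
inodes = {
    2: {"type": "dir", "exec": True, "data": {"a": 5}},
    5: {"type": "dir", "exec": True, "data": {"b": 9}},
    9: {"type": "dir", "exec": True, "data": {"c": 12}},
    12: {"type": "file", "exec": False, "data": b"contents of c"},
}


def lookup(path):
    """Resolve an absolute path to an i-node number, one component at a time."""
    ino = ROOT_INO
    for name in [part for part in path.split("/") if part]:
        node = inodes[ino]
        if node["type"] != "dir":
            raise NotADirectoryError(path)
        if not node["exec"]:              # each directory on the path must be executable
            raise PermissionError(path)
        if name not in node["data"]:
            raise FileNotFoundError(path)
        ino = node["data"][name]          # follow the (name, i-node#) pair
    return ino
```

Each loop iteration corresponds to one "get the i-node, search its data" step, which on a real disk may cost several block reads.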

  19. Extending the directory hierarchy • Multiple volumes • Unix: Mount/un-mount volume on directory • Transparent pathname traversal: in-core mount table, in-core i-node of mount point and/or mounted root • Remote volumes • Distributed file systems: Sun NFS, AFS/Coda, etc.

  20. NFS • Collection of remote file service protocols • VFS: Virtual file system layer • Client: system call -> VFS -> local FS/NFS client • Server: system call/remote invocation -> VFS -> local FS • Compatible with most local FS implementations

  21. VFS model • Unix-like file system services: files, directories, links, .. • Fhandle provides working-file capability, as well as file attributes • Remote mount provides a seamless name space • Lookup(path) instead of open • Lookup does not cross mount points (version 3)

  22. RPC communication • Support for heterogeneous clients • Stateless server • No client caching, write-through policy • No authenticated sessions • No persistence • fhandle must be unique • File locking handled separately by a lock manager • No server-failure recovery needed

  23. NFS: Advanced issues • File sharing by multiple clients • Caching • Locking and fault tolerance • Security and access control

  24. Sharing • Unix single machine: writes take immediate effect • File persistence on open • NFS version 3: • Write-through in principle • Session semantics in practice • File locking • Read/write lock, per file range of bytes • Wait queue with no callbacks • Share reservation • Supported to facilitate NFS on Windows clients

  25. Fault Tolerance • RPC • Retransmit on timeouts • Suppress duplicates via duplicate-cache • Return cached-response on duplicate request • File locking • Version 4 issues leases with expiration and renewal • Introduces problems of clock synchronization and renewal reliability
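
The duplicate-cache idea can be illustrated with a toy request handler. This is a sketch and not the NFS implementation: the `Server` class, the `(client_id, xid)` key and the "increment" operation are assumptions chosen to show why re-executing a retransmitted, non-idempotent request would be wrong.

```python
class Server:
    """Toy RPC server with a duplicate-request cache (illustrative only)."""

    def __init__(self):
        self.counter = 0
        self.dup_cache = {}    # (client_id, xid) -> response already sent

    def handle(self, client_id, xid, op):
        key = (client_id, xid)
        if key in self.dup_cache:
            # Retransmission of a request we already executed:
            # return the cached response instead of executing it again.
            return self.dup_cache[key]
        response = self._execute(op)
        self.dup_cache[key] = response
        return response

    def _execute(self, op):
        # A deliberately non-idempotent operation.
        if op == "increment":
            self.counter += 1
        return self.counter
```

A client that times out and resends request `xid=1` gets the same answer as the first time, and the counter is incremented only once. A real cache would also need to bound its size and expire old entries.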

  26. Allocating disk blocks to file data • Assume unstructured files • Array of bytes • Efficient offset -> disk block mapping • Efficient disk access for both sequential and random patterns • Minimizing number of long seeks • Efficient space utilization • Minimizing external/internal fragmentation

  27. Static Contiguous Allocation • Allocate each file a fixed number of blocks at creation time • #blocks is pre-defined or supplied as an argument • Efficient offset lookup • Only the block # of offset 0 is needed • Efficient disk access • Inefficient space utilization • Internal, external fragmentation • No support for dynamic extension

  28. Static Contiguous Allocation Catalog

  29. Extent-based allocation • File gets blocks in contiguous chunks called extents • Multiple contiguous allocations • For large files, a B-tree is used for efficient offset lookup

  30. Extent-based allocation

  31. Extent-based allocation • Efficient offset lookup and disk access • Support for dynamic growth/shrink • Dynamic memory allocation techniques are used (e.g., first-fit) • External/internal fragmentation may be a problem • Depending on the implementation, requirements, etc…
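
Offset lookup over extents can be sketched with a sorted list and binary search, which stands in here for the B-tree mentioned earlier. The 1KB block size, the extent tuple layout and the sample extents are all assumptions for the example. Static contiguous allocation is the special case of a single extent, where only the first disk block is needed.

```python
from bisect import bisect_right

BLOCK_SIZE = 1024  # assumed block size in bytes

# Hypothetical extent list for one file, sorted by first file block:
# (first file block covered, first disk block, length in blocks)
extents = [(0, 100, 4), (4, 500, 2), (6, 220, 10)]
starts = [e[0] for e in extents]


def disk_block(offset):
    """Map a byte offset within the file to a disk block number."""
    if offset < 0:
        raise ValueError("negative offset")
    file_block = offset // BLOCK_SIZE
    # Find the last extent whose first file block is <= file_block.
    i = bisect_right(starts, file_block) - 1
    first_file_block, first_disk_block, length = extents[i]
    if file_block >= first_file_block + length:
        raise ValueError("offset beyond end of file")
    return first_disk_block + (file_block - first_file_block)
```

Within an extent the mapping is a single addition, so sequential reads inside one extent touch consecutive disk blocks.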

  32. Single-block allocation • Extent-based allocation with a fixed extent size of one disk block • File blocks are scattered anywhere on the disk • Inefficient sequential access • UNIX block allocation • Linked allocation • MS-DOS File Allocation Table (FAT)

  33. Block Allocation in UNIX • 10 direct pointers • 1 single indirect pointer: points to a block of N pointers to data blocks • 1 double indirect pointer: points to a block of N pointers each of which points to a block of N pointers to data blocks • 1 triple indirect pointer… • Overall addresses 10 + N + N^2 + N^3 disk blocks
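
A small worked example of the 10 + N + N^2 + N^3 formula, together with a function that tells which pointer level covers a given file block. The value N = 256 used in the usage note is an assumption (1KB blocks holding 4-byte pointers); the function names are made up for the sketch.

```python
NDIRECT = 10  # number of direct pointers, as on the slide


def max_blocks(n):
    """Total data blocks addressable with n pointers per indirect block."""
    return NDIRECT + n + n**2 + n**3


def locate(block_index, n):
    """Return (level, index within that level) for a 0-based file block.

    Level 0 is a direct pointer; levels 1, 2 and 3 are the single, double
    and triple indirect trees, costing that many extra block reads.
    """
    if block_index < 0:
        raise ValueError("negative block index")
    if block_index < NDIRECT:
        return 0, block_index
    block_index -= NDIRECT
    for level in (1, 2, 3):
        if block_index < n**level:
            return level, block_index
        block_index -= n**level
    raise ValueError("block index beyond maximum file size")
```

With N = 256, `max_blocks(256)` gives 16,843,018 blocks, roughly 16GB of file data at 1KB per block, while the first 10KB of any file is reachable without any indirection.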

  34. Block Allocation in UNIX

  35. Block Allocation in UNIX • Optimized for small files • Outdated empirical studies indicate that 98% of all files are under 80 KB • Poor performance for random access of large files (redirections) • No external fragmentation • Wasted space in pointer blocks for large sparse files • Modern UNIX implementations use extent-based allocation

  36. Linked Allocation • Each file is a linked list of disk blocks • Offset lookup: • Efficient for sequential access • Inefficient for random access • Access to large files may be inefficient as the blocks are scattered • Solution: block clustering • No external fragmentation, but space is wasted on a pointer in each block

  37. Linked Allocation Catalog

  38. File Allocation Table (FAT) • A section at the beginning of the disk is set aside to contain the table • Indexed by the block numbers on disk • An entry for each disk block (or for a cluster thereof) • FAT Entries corresponding to blocks belonging to the same file are chained • The last file block, unused blocks and bad blocks have special markings

  39. FAT Catalog entry

  40. FAT Pros and Cons • Improved random access • just search a small table instead of the whole disk • Inefficient sequential access • Seek back to the table and forth to the block for each file block! • Block allocation is easy • just find the first 0-marked block
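
A toy table makes the chaining and the "first 0-marked block" allocation concrete. The marker values (`FREE = 0`, `EOF = -1`, `BAD = -2`) and the table contents are assumptions for illustration and do not reproduce the real MS-DOS encoding.

```python
FREE, EOF, BAD = 0, -1, -2    # hypothetical special markings

# fat[b] holds the number of the block that follows block b in its file.
# One file occupies blocks 1 -> 4 -> 3; another is the single block 7.
fat = [BAD, 4, FREE, EOF, 3, FREE, FREE, EOF]


def chain(fat, first_block):
    """List the blocks of a file by following its chain in the table."""
    blocks = []
    block = first_block
    while block != EOF:
        blocks.append(block)
        block = fat[block]
    return blocks


def append_block(fat, last_block):
    """Extend a file by one block: take the first FREE entry and link it in."""
    new_block = fat.index(FREE)   # raises ValueError when the disk is full
    fat[new_block] = EOF
    fat[last_block] = new_block
    return new_block
```

Reaching the k-th block of a file still takes k table lookups, but they all happen inside the table rather than across the data blocks, which is the random-access improvement noted above.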

  41. Free space management • Disk bitmap: represent the disk block allocation as an array of bits • Bit for each disk block: 1 - non-allocated block, 0 - allocated block • Simple and efficient in finding free blocks • Wastes space on disk • Linked list of free blocks (UNIX) • Efficient for finding a single free block
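
The disk bitmap can be sketched with a `bytearray`, following the slide's convention that 1 marks a non-allocated block and 0 an allocated one. The bit order within a byte (least significant bit first) and the function names are assumptions of this example.

```python
def find_free(bitmap, nblocks):
    """Return the number of the first free block (bit = 1), or -1 if none."""
    for byte_index, byte in enumerate(bitmap):
        if byte == 0:
            continue                      # skip 8 allocated blocks at once
        for bit in range(8):
            block = byte_index * 8 + bit
            if block < nblocks and byte & (1 << bit):
                return block
    return -1


def allocate(bitmap, nblocks):
    """Find a free block, mark it allocated (bit = 0) and return its number."""
    block = find_free(bitmap, nblocks)
    if block >= 0:
        bitmap[block // 8] &= ~(1 << (block % 8)) & 0xFF
    return block


def release(bitmap, block):
    """Mark a block as free again (bit = 1)."""
    bitmap[block // 8] |= 1 << (block % 8)
```

Skipping all-zero bytes is what makes the search cheap in practice: a full region of the disk is passed over eight blocks per comparison.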

  42. File I/O • CPU cannot access the file data directly • It must first be brought into the main memory • Problem: • Scenario 1: user process reads a block, meanwhile the process gets swapped out of memory • Scenario 2: user process reads/writes 1 byte in block • Scenario 3: user process continuously reads/writes a file • Scenario 4: two processes access the same block • Solution: Read/Write mapping using buffer cache • Memory mapped files

  43. Read/Write Mapping • File data is made available to applications via a pre-allocated main memory region • Buffer cache • The file system transfers data between the buffer cache and disk in units of disk blocks • The data is explicitly copied from/to buffer cache to/from the application address space

  44. Read/Write Mapping

  45. Reading data (Disk block=1K)

  46. Writing data (Disk block=1K)

  47. Buffer Cache management • All disk I/O goes through the buffer cache • Both user data and control data (e.g., i-node) are cached • LRU replacement • Dirty (modified) marker to indicate whether write-back is needed
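
LRU replacement with a dirty marker can be sketched using `collections.OrderedDict`. The `BufferCache` class, its method names and the use of a plain dict as a stand-in for the disk are assumptions of this example and not a description of any particular kernel.

```python
from collections import OrderedDict


class BufferCache:
    """Toy block cache: LRU replacement, write-back of dirty buffers on eviction."""

    def __init__(self, disk, capacity):
        self.disk = disk                  # dict: block number -> bytes (stand-in for the disk)
        self.capacity = capacity          # number of buffers in the cache
        self.buffers = OrderedDict()      # block number -> [data, dirty], least recent first

    def _load(self, block_no):
        if block_no in self.buffers:
            self.buffers.move_to_end(block_no)         # mark as most recently used
        else:
            if len(self.buffers) >= self.capacity:
                victim, (data, dirty) = self.buffers.popitem(last=False)
                if dirty:
                    self.disk[victim] = data           # write back only if modified
            self.buffers[block_no] = [self.disk[block_no], False]
        return self.buffers[block_no]

    def read(self, block_no):
        return self._load(block_no)[0]

    def write(self, block_no, data):
        buf = self._load(block_no)
        buf[0], buf[1] = data, True                    # modify in cache, set dirty marker
```

A write changes only the cached copy; the disk sees it when the buffer is evicted, which is also why a crash in between can lose the update.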

  48. Advantages • Strict separation of concerns • Hiding disk access peculiarities from the user • Block size, memory alignment, memory allocation in multiples of the block size, etc… • Disk blocks are cached • Aggregation for small transfers (locality) • Block re-use across processes • Transient data might never be written to disk

  49. Disadvantages • Extra copying • Disk->buffer cache->user space • Vulnerability to failures • User data blocks are not the main concern • Control data blocks (metadata) are the real problem • E.g., i-nodes and pointer blocks can be in the cache when a failure occurs • As a result, the file system’s internal state might be corrupted

  50. Memory mapped files • A file (or a portion thereof) is mapped into a contiguous region of the process virtual memory • UNIX: mmap system call • Mapping operation is very efficient: • just marking • Access to the file is governed by the virtual memory subsystem
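
Python's `mmap` module wraps the same facility, so a short sketch can show file data being read and modified through ordinary memory accesses rather than explicit read/write calls. The temporary file and its contents are arbitrary choices for the example.

```python
import mmap
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"hello world\n")
    with mmap.mmap(fd, 0) as m:          # length 0 maps the whole file
        first = bytes(m[:5])             # read file data with a slice, no read() call
        m[:5] = b"HELLO"                 # modify file data with a plain assignment
        m.flush()                        # ask for the change to be written back
    os.lseek(fd, 0, os.SEEK_SET)
    on_disk = os.read(fd, 5)             # the file itself now holds the new bytes
finally:
    os.close(fd)
    os.remove(path)
```

There is no explicit copy between a buffer cache and user space here; pages are brought in by the virtual memory subsystem when they are touched.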
