Log-structured File Systems



  1. Log-structured File Systems Myeongcheol Kim (mckim@dcslab.snu.ac.kr) School of Computer Science and Engineering Seoul National University

  2. Contents Motivation Log-structured File System Effective Sequential Write The Inode Map Garbage Collection Crash Recovery Summary

  3. Motivation • Growing memory size • Most reads would be serviced from the cache. • File system performance would therefore largely be determined by write performance. • Growing performance gap between random and sequential I/O • Rapid increase in transfer bandwidth • Slow decrease in seek and rotational delay costs • Workloads • FFS spreads a file's information around the disk. • Creating a file incurs many small writes. • Existing file systems were not RAID-aware.

  4. Log-structured File System (LFS) • Question: How can all writes be made sequential? • A file system developed at Berkeley in the early 1990s • By a group led by Professor John Ousterhout and graduate student Mendel Rosenblum • Performance goals • High performance for small writes • Matching or exceeding existing file systems for reads and large writes • Idea • Buffer all updates in an in-memory segment. • Write the segment to disk in one long, sequential transfer.

  5. Writing To Disk Sequentially (Figure: a data block D written at address A0, followed by its inode I whose blk[0] pointer refers to A0) • LFS never overwrites existing data; it always writes to free locations. • Data block (D) and metadata (I: inode) are written together, sequentially.

  6. Writing Sequentially And Effectively (Figure: timeline of write_at(A) issued at time T and write_at(A+1) issued at T+δ; the second write must wait almost a full rotation, Trotation − δ, before its sector comes around again) • What if there is a delay between two sequential writes? • Simply writing in sequential order is not enough to achieve peak performance. • Write buffering (see the sketch below) • LFS buffers updates in an in-memory segment. • The segment is written to disk all at once. • As long as the segment is large enough, writes will be efficient.
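
  A minimal sketch of this write-buffering idea in C. It assumes a hypothetical disk_append() helper that issues one large sequential write; the block and segment sizes are illustrative, not what LFS actually uses:

    /* Minimal write-buffering sketch (not real LFS code): blocks are appended
     * to an in-memory segment and flushed to disk in one sequential transfer
     * when the segment fills. disk_append() is an assumed helper. */
    #include <string.h>
    #include <stddef.h>

    #define BLOCK_SIZE 4096
    #define SEG_BLOCKS 256                          /* 1 MB segment (illustrative) */

    struct segment {
        char   buf[SEG_BLOCKS * BLOCK_SIZE];        /* in-memory segment */
        size_t used;                                /* bytes buffered so far */
    };

    extern void disk_append(const void *data, size_t len);  /* one sequential write */

    static void segment_flush(struct segment *s)
    {
        if (s->used > 0) {
            disk_append(s->buf, s->used);           /* single long transfer */
            s->used = 0;
        }
    }

    static void segment_write(struct segment *s, const void *block)
    {
        if (s->used + BLOCK_SIZE > sizeof(s->buf))  /* segment full: flush first */
            segment_flush(s);
        memcpy(s->buf + s->used, block, BLOCK_SIZE);
        s->used += BLOCK_SIZE;
    }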

  7. Writing Sequentially And Effectively (cont.) (Figure: an in-memory segment holding data blocks D[j,0]–D[j,2] at A0–A2 followed by Inode[j], then D[k,0] at A4 followed by Inode[k], all written out to disk contiguously) • Example • LFS buffers two sets of updates into a small segment before writing them to disk.

  8. How Much To Buffer? • Positioning overhead vs. transfer rate • A fixed positioning overhead Tposition is paid for each write; data then transfers at the peak rate Rpeak (MB/s). • Writing D MB takes Twrite = Tposition + D / Rpeak, so the effective rate of writing is Reffective = D / (Tposition + D / Rpeak). • To achieve a fraction F of the peak rate (0 < F < 1), i.e. Reffective = F × Rpeak, LFS must buffer D = (F / (1 − F)) × Rpeak × Tposition. • The more you write, the better you amortize the positioning cost. (A worked example follows below.)
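
  A tiny worked example in C, plugging illustrative numbers into the formula above. The 10 ms positioning overhead and 100 MB/s peak rate are assumptions made for the sake of the calculation, not measured values:

    /* Compute how much LFS should buffer to reach a fraction F of peak
     * bandwidth: D = (F / (1 - F)) * Rpeak * Tposition. */
    #include <stdio.h>

    int main(void)
    {
        double t_position = 0.010;   /* positioning overhead: 10 ms (assumed) */
        double r_peak     = 100.0;   /* peak transfer rate: 100 MB/s (assumed) */
        double f          = 0.9;     /* target: 90% of peak bandwidth */

        double d = (f / (1.0 - f)) * r_peak * t_position;
        printf("Buffer %.1f MB to reach %.0f%% of peak bandwidth\n", d, f * 100.0);
        return 0;                    /* prints: Buffer 9.0 MB to reach 90% ... */
    }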

  9. Problem: Finding Inodes • In a typical file system, finding an inode is easy. • The location of the inode table is fixed on the disk. • Array-based indexing with the given inode number • In LFS, it is more difficult. • Inodes are scattered throughout the disk. • The latest version of an inode keeps moving because of out-of-place writes.

  10. Solution Through Indirection (Figure: data block D at A0, inode I[k] at A1 with blk[0]: A0, and an imap chunk recording map[k]: A1, all written next to one another) • The inode map (imap) • A level of indirection between inode numbers and inodes • Given an inode number, it produces the disk address of the most recent version of that inode. • Location of the imap • Fixed location: performance would suffer due to extra disk seeks between the data and the imap. • Moving imap: LFS writes chunks of the inode map right next to all the other new information. (A lookup sketch follows below.)
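
  A minimal sketch of the imap indirection, assuming the whole map fits in a simple in-memory array indexed by inode number; the array size and names are illustrative, not the real on-disk format:

    /* The imap maps an inode number to the disk address of the most recent
     * version of that inode; it is updated whenever an inode is rewritten
     * out of place. */
    #include <stdint.h>

    #define MAX_INODES 65536

    static uint64_t imap[MAX_INODES];        /* inode number -> inode disk address */

    /* Called after an inode is written to a new location in the log. */
    void imap_update(uint32_t inum, uint64_t new_inode_addr)
    {
        imap[inum] = new_inode_addr;
    }

    /* Called whenever an inode must be located (there is no fixed inode table). */
    uint64_t imap_lookup(uint32_t inum)
    {
        return imap[inum];                   /* latest address recorded for inum */
    }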

  11. The Checkpoint Region (Figure: the checkpoint region at address 0 pointing to an imap chunk at A2 covering inodes [k…k+N], which points to inode I[k] at A1, which points to data block D at A0) • Checkpoint region (CR) • Resides at a fixed place (address 0) on disk • Contains pointers to the latest pieces of the inode map • Reading a file from disk (see the sketch below) 1) Read the CR, read in the entire inode map, and cache it in memory. 2) Look up the inode-number to inode-disk-address mapping in the imap. 3) Proceed exactly as in a typical UNIX file system.
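
  A sketch of the resulting read path (checkpoint region → cached imap → inode → data block). The disk_read() helper, the simplified inode with only direct pointers, and the cached imap array are assumptions of this sketch:

    /* Read logical block `off` of the file with inode number `inum`. */
    #include <stdint.h>
    #include <stddef.h>

    #define BLOCK_SIZE 4096

    struct inode { uint64_t blk[12]; };              /* direct pointers only */

    extern void     disk_read(uint64_t addr, void *buf, size_t len);
    extern uint64_t imap[];                          /* cached after reading the CR */

    void lfs_read_block(uint32_t inum, uint32_t off, void *buf)
    {
        struct inode ino;
        uint64_t inode_addr = imap[inum];            /* step 2: imap lookup */
        disk_read(inode_addr, &ino, sizeof(ino));    /* fetch the latest inode */
        disk_read(ino.blk[off], buf, BLOCK_SIZE);    /* fetch the data block */
    }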

  12. What About Directories? (Figure: file data D[k] at A0 and its inode I[k] at A1; directory data D[dir] containing the entry (foo, k) at A2 and directory inode I[dir] at A3; an imap chunk records both map[k]: A1 and map[dir]: A3) • The directory structure of LFS is identical to that of classic UNIX file systems. • A directory is a collection of (name, inode number) mappings. • Recursive update problem • If a directory had to record an inode's disk address, every move of the inode would force an update to the directory, and so on all the way up the file system tree. • LFS avoids this problem with the inode map (see the sketch below). • Only the imap is updated, while the directory keeps the same name-to-inumber mapping.
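
  A small sketch of why the indirection breaks the recursive update chain: a directory entry stores only the inode number, so it stays valid no matter where the inode's latest copy lives. The dir_entry layout and lookup helper are illustrative:

    /* A directory entry maps a name to an inode number; the inode's current
     * disk address comes from the imap, not from the directory. */
    #include <stdint.h>
    #include <string.h>
    #include <stddef.h>

    struct dir_entry { char name[28]; uint32_t inum; };

    extern uint64_t imap[];                 /* inode number -> latest inode address */

    /* Resolve a name within one directory block to an inode disk address. */
    uint64_t dir_lookup(const struct dir_entry *entries, size_t n, const char *name)
    {
        for (size_t i = 0; i < n; i++) {
            if (strcmp(entries[i].name, name) == 0)
                return imap[entries[i].inum];   /* entry is unchanged when the
                                                   inode moves; only imap changes */
        }
        return (uint64_t)-1;                    /* name not found */
    }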

  13. A New Problem: Garbage Collection (Figure: after file k's block 0 is updated, the new data block and new inode I[k] live at A4, leaving the older copies at A0 behind as garbage) • Garbage collection • Freeing dead blocks for use in subsequent writes • LFS cleaner (see the sketch below) • Works on a segment-by-segment basis, so that free space remains in large contiguous chunks for sequential writing • Work flow 1) Periodically read in old (partially-used) segments. 2) Determine the liveness of the blocks within those segments. 3) Write out a new set of segments containing just the live blocks. 4) Free up the old segments.
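
  A sketch of the shape of one cleaner pass; segment iteration, the liveness test (next slides), and the segment writer are left as assumed helpers rather than real LFS interfaces:

    /* One cleaning pass: read an old segment, keep only its live blocks by
     * re-buffering them into the current write segment, then free the old one. */
    #include <stdbool.h>
    #include <stddef.h>

    struct block;    /* one data/inode block plus its segment-summary entry */
    struct segment;  /* an on-disk segment */

    extern struct segment *pick_segment_to_clean(void);
    extern size_t          segment_nblocks(struct segment *s);
    extern struct block   *segment_block(struct segment *s, size_t i);
    extern bool            block_is_live(const struct block *b);   /* see slide 15 */
    extern void            append_to_write_segment(const struct block *b);
    extern void            mark_segment_free(struct segment *s);

    void clean_one_segment(void)
    {
        struct segment *s = pick_segment_to_clean();   /* 1) read an old segment  */
        size_t n = segment_nblocks(s);

        for (size_t i = 0; i < n; i++) {               /* 2) check block liveness */
            struct block *b = segment_block(s, i);
            if (block_is_live(b))
                append_to_write_segment(b);            /* 3) keep live blocks     */
        }
        mark_segment_free(s);                          /* 4) free the old segment */
    }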

  14. A New Problem: Garbage Collection (cont.) • Mechanism • How can LFS tell which blocks within a segment are live? • Policy • How often should the cleaner run? • Which segments should it pick to clean?

  15. Determining Block Liveness (Figure: a segment summary block SS recording A0:(k,0), i.e. the block at A0 belongs to file k at offset 0; data block D at A0; inode I[k] at A1 with blk[0]: A0; and an imap chunk with map[k]: A1) • For each data block, LFS records the following in the segment summary block: • Inode number of the file it belongs to • Block offset within the file • Liveness checking procedure for the data block D at address A:
    (N, T) = SegmentSummary[A];   // inode number N, block offset T
    inode  = Read(imap[N]);       // latest version of inode N
    if (inode[T] == A)
        // block D is alive
    else
        // block D is garbage

  16. Determining Block Liveness (cont.) • Optimization: version numbers (see the sketch below) • A version number is stored in the inode map for each inode (V1). • A version number is stored in each entry of the segment summary block (V2). • The version number is incremented when a file is truncated or deleted. • If V1 doesn't match V2, the block can be discarded immediately without examining the file's inode.
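
  A sketch of the version-number shortcut layered on the liveness check from the previous slide; the structure layouts and the inode_points_to() helper are assumptions of this sketch, not the real on-disk format:

    /* Fast-path liveness test: if the imap's version for this inode no longer
     * matches the version in the segment summary entry, the file was truncated
     * or deleted, so the block is garbage without even reading the inode. */
    #include <stdbool.h>
    #include <stdint.h>

    struct summary_entry { uint32_t inum; uint32_t offset; uint32_t version; };
    struct imap_entry    { uint64_t inode_addr; uint32_t version; };

    extern struct imap_entry imap_v[];   /* imap extended with version numbers */
    extern bool inode_points_to(uint32_t inum, uint32_t offset, uint64_t addr);

    bool block_is_live(uint64_t addr, const struct summary_entry *e)
    {
        if (e->version != imap_v[e->inum].version)
            return false;                          /* V1 != V2: discard at once */

        /* Slow path: does the latest inode still point at this address? */
        return inode_points_to(e->inum, e->offset, addr);
    }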

  17. Policy: Which Blocks To Clean, And When? • When to clean? • Periodically • During idle time • When the disk is full • Which blocks to clean? • A challenging question and the subject of many research papers • Example: segregating hot and cold segments • Hot segment (its contents are frequently overwritten): wait a long time before cleaning it, since more of its blocks will soon become garbage anyway. • Cold segment (its contents are relatively stable): clean it sooner.

  18. Crash Recovery And The Log • Checkpoint • A position in the log at which all of the file system structures are consistent and complete • The CR contains • Addresses of all the imap blocks • A timestamp • A pointer to the last segment written • Segments are written to the log continuously, and the CR is updated periodically (every 30 seconds or so). • A crash could therefore happen during • a write to a segment • an update of the CR

  19. Crash Recovery And The Log (cont.) (Figure: two checkpoint regions, CR0 at address 0 and CR1; when CR1's update completes with timestamp 20 it is the one chosen, but when a crash leaves CR1 with an incomplete timestamp it is discarded and CR0, with timestamp 15, is used) • Crash during a CR update • The timestamp is written at the end of the CR update, so an interrupted update can be detected. • Two CRs are maintained and updated alternately. • The most recent consistent CR is always chosen (see the sketch below). • Crash while writing to a segment • Only updates recorded in the CR are recovered directly; the last many seconds of updates could be lost. • Roll forward: starting from the last checkpoint, scan forward through the log and recover as many valid updates as possible.
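
  A sketch of how the consistent, most recent CR might be selected after a crash. It assumes each CR carries a timestamp at both its head and its tail (the slide mentions only the trailing timestamp), so a torn update shows up as a mismatch; field names are illustrative:

    /* Pick the newer of the two checkpoint regions, but only if its update
     * completed (head and tail timestamps match). */
    #include <stdint.h>
    #include <stddef.h>

    struct checkpoint_region {
        uint64_t ts_head;           /* assumed: written before the CR body    */
        uint64_t imap_addrs[32];    /* addresses of all the imap blocks       */
        uint64_t last_segment;      /* pointer to the last segment written    */
        uint64_t ts_tail;           /* written last; mismatch => torn update  */
    };

    static int cr_consistent(const struct checkpoint_region *cr)
    {
        return cr->ts_head == cr->ts_tail;
    }

    const struct checkpoint_region *
    choose_cr(const struct checkpoint_region *cr0, const struct checkpoint_region *cr1)
    {
        int ok0 = cr_consistent(cr0), ok1 = cr_consistent(cr1);
        if (ok0 && ok1)
            return cr0->ts_head >= cr1->ts_head ? cr0 : cr1;  /* newer one wins */
        if (ok0) return cr0;
        if (ok1) return cr1;
        return NULL;   /* both torn: should not happen with alternating updates */
    }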

  20. Summary • LFS enables highly efficient writing by exploiting sequential bandwidth. • Gathering all updates into an in-memory segment • Writing them out together sequentially • Garbage collection • Mechanism (liveness checking) and policy (which segments to clean, and when) • Concern over cleaning costs became the focus of much controversy around LFS. • Fast and efficient crash recovery via checkpoints and roll forward
