Outline for Today

Outline for Today • Objective • Metadata complications • More on naming • Attribute-based file naming: “Why can’t I find my files?” • Administrative • Not yet.

Metadata • File size • File type • Protection - access control information • History: creation time, last modification, last access • Location of file - which device • Location of individual blocks of the file on disk • Owner of file • Group(s) of users associated with file
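As a rough illustration, most of these fields are visible from user space through `stat`. This sketch assumes a POSIX system and uses only Python's standard `os`, `stat`, `tempfile`, and `time` modules; the helper name `describe` is hypothetical. Two fields do not line up exactly with the list: on UNIX `st_ctime` is the time of the last i-node change, not creation, and the disk addresses of individual blocks are not exposed (`st_blocks` is only a count of allocated 512-byte units).

```python
import os
import stat
import tempfile
import time


def describe(path):
    """Return the metadata fields that stat() exposes for `path` (POSIX assumed)."""
    st = os.stat(path)
    return {
        "size": st.st_size,                        # file size in bytes
        "is_regular": stat.S_ISREG(st.st_mode),    # file type
        "permissions": stat.filemode(st.st_mode),  # protection, e.g. '-rw-------'
        "owner_uid": st.st_uid,                    # owner of file
        "group_gid": st.st_gid,                    # group associated with file
        "device": st.st_dev,                       # which device holds the file
        "inode": st.st_ino,                        # i-node number on that device
        "blocks": st.st_blocks,                    # count of 512-byte units, not their addresses
        "modified": time.ctime(st.st_mtime),       # last modification
        "accessed": time.ctime(st.st_atime),       # last access
        "changed": time.ctime(st.st_ctime),        # last i-node change on UNIX (not creation)
    }


with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"hello")
    name = f.name

info = describe(name)
print(info["size"], info["is_regular"], info["permissions"])
os.unlink(name)
```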

Operations on Directories (UNIX) • link(oldpathname, newpathname) - make a new directory entry pointing to an existing file • unlink(filename) - remove a directory entry pointing to a file • mknod(dirname, type, device) - used (e.g., by the mkdir utility in early UNIX) to create a directory (or named pipe, or special file) • getdents(fd, buf, structsize) - reads directory entries
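A minimal sketch, assuming a POSIX system, of the point that `link` and `unlink` manipulate directory entries rather than the file itself: the i-node's link count (`st_nlink`) tracks how many entries name it, and the data survives as long as one entry remains. It uses Python's `os.link`, `os.unlink`, and `os.stat`; `os.scandir` stands in for reading directory entries, and whether it is built on `getdents` is platform-dependent.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "a.txt")
    alias = os.path.join(d, "b.txt")
    with open(original, "w") as f:
        f.write("data")

    os.link(original, alias)  # link(): second directory entry, same i-node
    same_inode = os.stat(original).st_ino == os.stat(alias).st_ino
    nlink_after_link = os.stat(original).st_nlink  # two entries now point at the i-node

    os.unlink(original)  # unlink(): removes one entry; the file itself survives
    nlink_after_unlink = os.stat(alias).st_nlink
    with open(alias) as f:
        contents = f.read()  # still reachable through the remaining name

    names = sorted(entry.name for entry in os.scandir(d))  # read directory entries

print(same_inode, nlink_after_link, nlink_after_unlink, contents, names)
```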

Metadata & Performance • There are two popular approaches for improving the performance of metadata operations and recovery: • Journaling • Soft Updates • Journaling systems record metadata operations in an auxiliary log • Soft Updates uses ordered writes (Ganger & Patt, OSDI 94)

Metadata Operations • Metadata operations modify the structure of the file system • Creating, deleting, or renaming files, directories, or special files • Data must be written to disk in such a way that the file system can be recovered to a consistent state after a system crash

General Rules of Ordering • Never point to a structure before it has been initialized (write the i-node before the directory entry that points to it) • Never reuse a resource before nullifying all previous pointers to it (clear the directory entry before freeing the i-node) • Never reset the old pointer to a live resource before the new pointer has been set (renaming)

Metadata Integrity • FFS uses synchronous writes to guarantee the integrity of metadata • Any operation modifying multiple pieces of metadata will write its data to disk in a specific order • These writes will be blocking • Guarantees integrity and durability of metadata updates

Deleting a file • [Figure: directory entries abc → i-node-1, def → i-node-2, ghi → i-node-3] • Assume we want to delete file “def”

Deleting a file • [Figure: if i-node-2 is deleted first, the entries become abc → i-node-1, def → ?, ghi → i-node-3] • Cannot delete the i-node before the directory entry “def”

Deleting a file • Correct sequence is • Write to disk directory block containing deleted directory entry “def” • Write to disk i-node block containing deleted i-node • Leaves the file system in a consistent state

Creating a file • [Figure: directory entries abc → i-node-1, ghi → i-node-3] • Assume we want to create a new file “tuv”

Creating a file • [Figure: if the directory entry is written first, the entries become abc → i-node-1, ghi → i-node-3, tuv → ?] • Cannot write the directory entry “tuv” before its i-node

Creating a file • Correct sequence is • Write to disk i-node block containing new i-node • Write to disk directory block containing new directory entry • Leaves the file system in a consistent state
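The two correct sequences can be checked mechanically. This is a sketch over a hypothetical in-memory model, not real file system code: it applies a write sequence one block write at a time, "crashes" after every prefix, and reports the crash points that leave a directory entry pointing at an unallocated i-node. The i-node number 4 for “tuv” is arbitrary. A crash between the two correct-order writes can still leave an allocated i-node with no entry; that leak does not break consistency (a checker such as fsck can reclaim it), so the model does not flag it.

```python
from copy import deepcopy


def apply(disk, write):
    """Apply one block write (kind, name, inode) to the on-disk state."""
    kind, name, inode = write
    if kind == "init_inode":
        disk["inodes"].add(inode)
    elif kind == "free_inode":
        disk["inodes"].discard(inode)
    elif kind == "add_entry":
        disk["dir"][name] = inode
    elif kind == "remove_entry":
        disk["dir"].pop(name, None)


def dangling_after_crash(disk, writes):
    """Return the crash points (number of completed writes) that leave an
    entry pointing at an i-node that is not allocated."""
    bad = []
    for k in range(len(writes) + 1):
        state = deepcopy(disk)
        for w in writes[:k]:
            apply(state, w)
        if any(i not in state["inodes"] for i in state["dir"].values()):
            bad.append(k)
    return bad


disk = {"dir": {"abc": 1, "def": 2, "ghi": 3}, "inodes": {1, 2, 3}}

create_ok = [("init_inode", None, 4), ("add_entry", "tuv", 4)]
create_bad = [("add_entry", "tuv", 4), ("init_inode", None, 4)]
delete_ok = [("remove_entry", "def", 2), ("free_inode", None, 2)]
delete_bad = [("free_inode", None, 2), ("remove_entry", "def", 2)]

for label, seq in [("create_ok", create_ok), ("create_bad", create_bad),
                   ("delete_ok", delete_ok), ("delete_bad", delete_bad)]:
    print(label, dangling_after_crash(disk, seq))
```

Only the reversed orders fail, and only when the crash falls between their two writes (crash point 1).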

Synchronous Updates • Used by FFS to guarantee consistency of metadata: • All metadata updates are done through blocking writes • Increases the cost of metadata updates • Can significantly impact the performance of the whole file system

SOFT UPDATES • Use delayed writes (write back) • Maintain dependency information about cached pieces of metadata: this i-node must be updated before/after this directory entry • Guarantee that metadata blocks are written to disk in the required order
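One way to picture the dependency information: record, for each dirty cached block, which blocks must reach disk before it, and flush in a topological order. This is a simplified sketch with hypothetical block names, using Python's standard `graphlib.TopologicalSorter` (3.9+). It tracks dependencies between whole blocks; Soft Updates itself records them per metadata update, which is what later allows a single block to be partially rolled back.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Dirty blocks in the cache, each mapped to the blocks that must reach disk first.
# Block names are hypothetical: D1/D2 are directory blocks, I1/I2 are i-node blocks.
dependencies = {
    "D1": {"I1"},  # D1 holds a new entry; the i-node it names (in I1) goes first
    "I2": {"D2"},  # I2 frees an i-node; D2 (entry already removed) goes first
    "I1": set(),
    "D2": set(),
}


def flush_order(deps):
    """Return one write order in which every block follows its prerequisites."""
    return list(TopologicalSorter(deps).static_order())


def respects(order, deps):
    """Check that each prerequisite block appears before the block that needs it."""
    pos = {block: i for i, block in enumerate(order)}
    return all(pos[p] < pos[b] for b, prereqs in deps.items() for p in prereqs)


order = flush_order(dependencies)
print(order, respects(order, dependencies))
```

When the same two blocks depend on each other in both directions, `static_order()` raises `graphlib.CycleError`; that is the situation the next slides address.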

First Problem • Synchronous writes guaranteed that metadata operations were durable once the system call returned • Soft Updates guarantee that the file system will recover into a consistent state, but not necessarily the most recent one • Some updates could be lost

Second Problem • Cyclical dependencies: • Same directory block contains entries to be created and entries to be deleted • These entries point to i-nodes in the same block

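Example • [Figure: Block A (directory block) holds entry “def” → i-node-2, to be removed, and NEW entry “xyz” → i-node-3; Block B (i-node block) holds i-node-2, to be freed, and NEW i-node-3] • We want to delete file “def” and create new file “xyz”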

Example • Cannot write block A before block B: • Block A contains a new directory entry (“xyz”) pointing to an i-node in block B that is not yet initialized on disk • Cannot write block B before block A: • Block B frees i-node-2, which the deleted directory entry “def” in block A still points to on disk

The Solution • Roll back metadata in one of the blocks to an earlier, safe state (the safe state does not contain the new directory entry) • [Figure: Block A’, a rolled-back copy of block A without the new entry “xyz”]

The Solution • Write the first block with its metadata rolled back (block A’ in the example) • Write the blocks that can be written after the first block has been written (block B in the example) • Roll forward the block that was rolled back • Write that block • Breaks the cyclical dependency, but block A must now be written twice
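The rollback procedure can be simulated at the level of individual updates. In this sketch (a hypothetical encoding of the example, not Soft Updates' actual data structures) each write of a block carries only the updates whose prerequisites are already on disk; the rest are rolled back in the copy sent to disk but stay in the cache, so the next write of that block rolls them forward. As an assumption of the model, A’ already carries the removal of “def”, since removing an entry is always safe to write first.

```python
# Update-level dependencies for the example (hypothetical encoding). Each update
# lives in one block and may require another update to be on disk first.
UPDATES = {
    "remove def":    {"block": "A", "needs": None},
    "add xyz":       {"block": "A", "needs": "init i-node-3"},
    "init i-node-3": {"block": "B", "needs": None},
    "free i-node-2": {"block": "B", "needs": "remove def"},
}


def write_block(block, pending, on_disk):
    """Write `block` carrying only updates whose prerequisites are on disk.
    The others are rolled back in the copy sent to disk but stay pending in
    the cache, so the next write of the block rolls them forward."""
    ready = [u for u in pending
             if UPDATES[u]["block"] == block
             and (UPDATES[u]["needs"] is None or UPDATES[u]["needs"] in on_disk)]
    for u in ready:
        pending.remove(u)
        on_disk.add(u)
    return ready


def flush(first_block):
    pending, on_disk, writes = list(UPDATES), set(), []
    block, stalled = first_block, 0
    while pending:
        written = write_block(block, pending, on_disk)
        writes.append((block, written))
        stalled = 0 if written else stalled + 1
        if stalled == 2:
            raise RuntimeError("no block can make progress")
        block = "B" if block == "A" else "A"  # alternate between the two blocks
    return writes


for block, written in flush("A"):
    print(block, written)
```

Starting from block A, the writes are A’ (only the removal of “def”), then B (initialize i-node-3 and free i-node-2), then A again with “xyz”: block A reaches the disk twice.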

Journaling • Journaling systems maintain an auxiliary log that records all meta-data operations • Write-ahead logging ensures that the log is written to disk before any blocks containing data modified by the corresponding operations • After a crash, the log can be replayed to bring the file system to a consistent state
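A toy model of the write-ahead rule and replay, with all "disk" state held in memory (the class and block names are hypothetical). Each update appends its log record before touching the cached block, and blocks are written back lazily. If a crash hits after only one of two related blocks was written back, replaying the log restores both. Forcing the log on every update, as here, gives FFS-like durability; buffering log records would give the laxer Soft Updates semantics. A real log would also be trimmed at checkpoints rather than replayed from the start.

```python
class WALFileSystem:
    """Toy write-ahead logging model; all 'disk' state is in memory (hypothetical)."""

    def __init__(self):
        self.log_on_disk = []     # durable log: (block, new value) records
        self.blocks_on_disk = {}  # durable metadata blocks
        self.cache = {}           # dirty in-memory blocks, written back lazily

    def update(self, block, value):
        self.log_on_disk.append((block, value))  # WAL rule: log record reaches disk first
        self.cache[block] = value                # the block itself can be written later

    def write_back(self, block):
        self.blocks_on_disk[block] = self.cache[block]

    def crash(self):
        self.cache = {}  # volatile state is lost

    def recover(self):
        # Redo every logged update; records carry final values, so replay is idempotent.
        for block, value in self.log_on_disk:
            self.blocks_on_disk[block] = value


fs = WALFileSystem()
fs.update("inode-block", "i-node 4 initialized")
fs.update("dir-block", "tuv -> i-node 4")
fs.write_back("inode-block")      # only one of the two blocks made it to disk
fs.crash()
before = dict(fs.blocks_on_disk)  # directory block missing
fs.recover()
print(before, fs.blocks_on_disk)
```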

Journaling • Log writes are performed in addition to the regular writes • Journaling systems incur log write overhead but • Log writes can be performed efficiently because they are sequential • Metadata blocks do not need to be written back after each update

Journaling • Journaling systems can provide • same durability semantics as FFS if log is forced to disk after each meta-data operation • the laxer semantics of Soft Updates if log writes are buffered until entire buffers are full • Will discuss two implementations • Log to file • Write Ahead File System

Log-to-File • Maintains a circular log in a pre-allocated file in the FFS (about 1% of file system size) • Buffer manager uses a write-ahead logging protocol to ensure proper synchronization between regular file data and the log

Log-to-File • Buffer header of each modified block in cache identifies the first and last log entries describing an update to the block • System uses • First item to decide which log entries can be purged from log • Second item to ensure that all relevant log entries are written to disk before the block is flushed from the cache
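The two buffer-header fields lead to two simple checks. This sketch uses hypothetical block names and LSN values: a dirty block may be flushed only once the log is on disk up to its last LSN, and log records older than the smallest first LSN among dirty blocks are no longer needed by any cached block. It ignores other reasons a real system might retain log records.

```python
from dataclasses import dataclass


@dataclass
class BufferHeader:
    first_lsn: int  # oldest log record describing an un-flushed update to this block
    last_lsn: int   # newest such record


def can_flush(hdr, log_flushed_upto):
    """WAL rule: a block may go to disk only once its last log record is on disk."""
    return hdr.last_lsn <= log_flushed_upto


def purge_point(dirty_headers, log_end):
    """Log records below the smallest first_lsn of any dirty block can be purged."""
    return min((h.first_lsn for h in dirty_headers.values()), default=log_end)


dirty = {
    "inode-block-7": BufferHeader(first_lsn=100, last_lsn=140),
    "dir-block-3": BufferHeader(first_lsn=120, last_lsn=125),
}
log_flushed_upto = 130
print(can_flush(dirty["dir-block-3"], log_flushed_upto))    # True
print(can_flush(dirty["inode-block-7"], log_flushed_upto))  # False
print(purge_point(dirty, log_end=150))                      # 100
```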

WAFS • Implements its log in an auxiliary file system: Write Ahead File System (WAFS) • Can be mounted and unmounted • Can append data • Can return data by sequential or keyed reads • Keys for keyed reads are log-sequence-numbers (LSNs) that correspond to logical offsets in the log

WAFS • Log is implemented as a circular buffer within the physical space allocated to the file system. • Buffer header of each modified block in cache contains LSNs of first and last log entries describing an update to the block
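A sketch of how LSNs as ever-growing logical offsets fit a circular buffer: the physical position of any byte is its LSN modulo the buffer size, and a keyed read succeeds only for LSNs between the retained tail and the head. The class, its methods, and the byte-at-a-time copying are assumptions for illustration, not the WAFS interface.

```python
class CircularLog:
    """Circular log in a fixed region; LSNs are logical byte offsets that only grow."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.space = bytearray(capacity)
        self.head = 0  # LSN where the next record will be appended
        self.tail = 0  # oldest LSN still retained

    def append(self, record: bytes) -> int:
        if self.head + len(record) - self.tail > self.capacity:
            raise RuntimeError("log full: checkpoint and advance the tail first")
        lsn = self.head
        for i, b in enumerate(record):
            self.space[(lsn + i) % self.capacity] = b  # physical offset = LSN mod capacity
        self.head += len(record)
        return lsn

    def read(self, lsn: int, length: int) -> bytes:
        """Keyed read by LSN; fails if that part of the log was reclaimed."""
        if lsn < self.tail or lsn + length > self.head:
            raise KeyError(lsn)
        return bytes(self.space[(lsn + i) % self.capacity] for i in range(length))

    def truncate(self, new_tail: int):
        """Reclaim space before new_tail (e.g. after a checkpoint)."""
        self.tail = max(self.tail, min(new_tail, self.head))


log = CircularLog(capacity=16)
a = log.append(b"create tuv")  # LSN 0
log.truncate(a + 10)           # pretend a checkpoint made this record unnecessary
b = log.append(b"delete def")  # LSN 10; physically wraps past the end of the buffer
print(b, log.read(b, 10))
```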

WAFS • Major advantage of WAFS is additional flexibility: • Can put WAFS on separate disk drive to avoid I/O contention • Can even put it in NVRAM • Normally uses synchronous writes • Metadata operations are persistent upon return from the system call • Same durability semantics as FFS

Recovery • Superblock has address of last checkpoint • LFFS-file has frequent checkpoints • LFFS-wafs has much less frequent checkpoints • First recover the log • Then read the log from its logical end (backward pass) and undo all aborted operations • Do a forward pass and reapply all updates that have not yet been written to disk
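The two passes can be sketched over log records that carry before and after images of a block. This is an illustration under stated assumptions, not the LFFS recovery code: the record layout `(lsn, op_id, block, old, new)`, the set of completed operations, and all values are hypothetical, and whole-block images sidestep the interleaving subtleties a real recovery manager must handle. The backward pass restores old values for operations that never completed; the forward pass reapplies completed ones, which is harmless if they were already on disk.

```python
def recover(disk, log, completed, checkpoint_lsn):
    """Two-pass recovery sketch over records (lsn, op_id, block, old, new)."""
    tail = [r for r in log if r[0] >= checkpoint_lsn]  # only records after the checkpoint
    # Backward pass from the logical end: undo updates of incomplete operations.
    for lsn, op, block, old, new in reversed(tail):
        if op not in completed:
            disk[block] = old
    # Forward pass: redo completed operations (idempotent, sets final values).
    for lsn, op, block, old, new in tail:
        if op in completed:
            disk[block] = new
    return disk


log = [
    (10, "op1", "inode-4", "free", "allocated"),    # create tuv: i-node first
    (11, "op1", "dir", "abc,ghi", "abc,ghi,tuv"),   # then the directory entry
    (12, "op2", "dir", "abc,ghi,tuv", "abc,tuv"),   # delete ghi: never completed
]
# State found after the crash: op2's partial update reached the disk.
disk = {"inode-4": "allocated", "dir": "abc,tuv"}
result = recover(disk, log, completed={"op1"}, checkpoint_lsn=10)
print(result)
```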

Other Approaches • Using non-volatile cache (Network Appliance) • Ultimate solution: can keep data in cache forever • Additional cost of NVRAM • Simulating NVRAM with • Uninterruptible power supplies • Hardware-protected RAM (Rio): cache is marked read-only most of the time

Other Approaches • Log-structured file systems • Not always possible to write all related meta-data in a single disk transfer • Sprite-LFS adds small log entries to the beginning of segments • BSD-LFS makes segments temporary until all metadata necessary to ensure the recoverability of the file system is on disk

Feature Comparison

Summary of Journaling vs. Soft Updates • Journaling alone is not sufficient to “solve” the meta-data update problem • Cannot realize its full potential when synchronous semantics are required • When that condition is relaxed, journaling and Soft Updates perform comparably in most cases

Extending Metadata • File size • File type • Protection - access control information • History: creation time, last modification, last access • Location of file - which device • Location of individual blocks of the file on disk • Owner of file • Group(s) of users associated with file • <attribute, value> pairs

A Naming Problem • Find the lecture where metadata was discussed • [Figure: directory tree rooted at usr, with subdirectories project and coursearchive and the current working directory (cwd) marked; coursearchive holds one directory per semester from spring99 to fall03, each containing cps110 and/or cps210]

A Naming Problem • Find the lecture where metadata was discussed • [Figure: the same tree (usr, project, coursearchive, cwd), with semester directories spring99 through fall03 and course directories cps210 … cps110]

A Naming Problem • Find the lecture where metadata was discussed • With symbolic links • [Figure: the same tree, plus a cps210 directory whose entries are symbolic links to the spring02, spring01, spring00, and spring99 offerings]

A Naming Problem • It gets worse: /home/home5/carla/talks, plus 2 laptops (one lives at work, one at home) and a desktop machine at home • A forest, not a tree! • Growing more like kudzu

Attributes in File Systems • Metadata: <category, value> • How to assign? • User provided – too much work • Content analysis – restricted by formats • Transducers, as provided by the Semantic File System • Context analysis • Access-based or inter-file relationships • Once you have them • Virtual directories – “views” • Indexing

Virtual Directories • Find the lecture where metadata was discussed • Query: <class, cps210> • Automated symbolic links • [Figure: the course-archive tree, with a virtual directory populated automatically by symbolic links to the semester directories containing cps210]

Virtual Directories • Find the lecture where metadata was discussed • Query: <type, ppt> AND <topic, files> • [Figure: the same tree; the matching files are raid.ppt, metadata.ppt, and two different files both named Lecture10.ppt] • Versions?
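One way a system could answer such queries is an inverted index from `<attribute, value>` pairs to files, with a conjunctive query computed as a set intersection. This is a sketch: the class, paths, and attribute values are hypothetical and only loosely follow the figure. The two matches named Lecture10.ppt show why a view cannot simply list files by base name, which is one reading of the “Versions?” question.

```python
from collections import defaultdict


class AttributeIndex:
    """Inverted index from <attribute, value> pairs to file paths."""

    def __init__(self):
        self.index = defaultdict(set)

    def tag(self, path, **attrs):
        for attr, value in attrs.items():
            self.index[(attr, value)].add(path)

    def query(self, *pairs):
        """Conjunctive query: paths carrying every listed <attribute, value> pair."""
        sets = [self.index.get(p, set()) for p in pairs]
        return set.intersection(*sets) if sets else set()


idx = AttributeIndex()
# Hypothetical files and attributes.
idx.tag("archive/a/cps210/metadata.ppt", type="ppt", topic="files")
idx.tag("archive/a/cps210/raid.ppt", type="ppt", topic="files")
idx.tag("archive/b/cps110/Lecture10.ppt", type="ppt", topic="files")
idx.tag("archive/c/cps110/Lecture10.ppt", type="ppt", topic="files")
idx.tag("archive/a/cps210/sched.ppt", type="ppt", topic="scheduling")
idx.tag("project/notes.txt", type="txt", topic="files")

view = sorted(idx.query(("type", "ppt"), ("topic", "files")))
for path in view:
    print(path)
```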

Issues with Virtual Directories • What if I want to create a file under a virtual directory that doesn’t have a path location already? • How does the system maintain consistency? We should make sure that when a file changes, its contents are still consistent with the query. • What if somewhere a new file is created that should match the query and be included? • What if currently matching file is changed to not match? • How do I construct a query that captures exactly the set of files I wish to group together?

Example: HAC File System (Gopal & Manber, OSDI 99) • Semantic directories are created within the hierarchy (given a pathname in the tree) by issuing a query over the scope inherited from the parent • Physically exist as directory files containing symlinks • Creates symbolic links to all files that satisfy the query • User can also explicitly add symbolic links to this semantic directory, as well as remove ones returned by the query as posed • Query is a starting point for organization • Reevaluate queries whenever something in scope changes
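The idea of a query as a starting point can be sketched as a set computation: the links are the query's matches over the inherited scope, plus links the user added, minus results the user removed, recomputed whenever the scope changes. This is a hypothetical model, not HAC's implementation: it only computes the set rather than materializing symlinks in a directory file, and keeping user removals suppressed across re-evaluation is an assumption.

```python
class SemanticDirectory:
    """HAC-style semantic directory sketch: query results plus user edits."""

    def __init__(self, query, scope):
        self.query = query    # predicate over a file's attributes
        self.scope = scope    # dict path -> attributes, inherited from the parent
        self.added = set()    # links the user added by hand
        self.removed = set()  # query results the user deleted (assumed to persist)

    def add_link(self, path):
        self.added.add(path)
        self.removed.discard(path)

    def remove_link(self, path):
        self.removed.add(path)
        self.added.discard(path)

    def links(self):
        """Re-evaluate the query over the current scope, then apply user edits."""
        matches = {p for p, attrs in self.scope.items() if self.query(attrs)}
        return (matches | self.added) - self.removed


scope = {
    "fall02/cps210/metadata.ppt": {"topic": "files"},
    "fall02/cps210/raid.ppt": {"topic": "files"},
    "fall02/cps210/sched.ppt": {"topic": "scheduling"},
}
d = SemanticDirectory(lambda attrs: attrs.get("topic") == "files", scope)
d.remove_link("fall02/cps210/raid.ppt")           # user prunes a query result
d.add_link("notes/fs-ideas.txt")                  # user adds a file the query missed
scope["fall03/cps210/journaling.ppt"] = {"topic": "files"}  # new file in scope
print(sorted(d.links()))
```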

Context-based Relationships • Premise: Context is what user might remember best. • Previous work • Hoarding for disconnected access (inter-file relationships) • Google: textual context for link and feedback from search behavior (assumption of popularity over many users)

Access-based • Use context of user’s session at access time • Application knowledge – modify apps to provide hints • Example: subject of email associated with attached file • Feedback from “find” type queries • Searches are for rarely accessed files and usually only one user – limits statistical info

Inter-file • Attributes can be shared/propagated among related files • Determining relationships • User access patterns – temporal locality • Inter-file content analysis • Similarity – duplication -- hashing • Versions
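A minimal sketch of one plausible temporal-locality heuristic: count pairs of distinct files opened within a time window of each other, then copy attributes across pairs seen together often enough. The trace format, window, threshold, and attribute strings are all hypothetical. A fixed window is also an easy way to create the false-positive relationships listed among the challenges.

```python
from collections import Counter


def co_access_pairs(trace, window):
    """Count pairs of distinct files opened within `window` seconds of each other.
    trace: list of (timestamp_seconds, path), assumed sorted by time."""
    counts = Counter()
    for i, (t_i, f_i) in enumerate(trace):
        for t_j, f_j in trace[i + 1:]:
            if t_j - t_i > window:
                break
            if f_i != f_j:
                counts[tuple(sorted((f_i, f_j)))] += 1
    return counts


def propagate(attrs, counts, min_count):
    """One round: copy attributes across pairs seen together at least min_count times."""
    out = {f: set(a) for f, a in attrs.items()}
    for (a, b), n in counts.items():
        if n >= min_count:
            out.setdefault(a, set()).update(attrs.get(b, set()))
            out.setdefault(b, set()).update(attrs.get(a, set()))
    return out


trace = [(0, "paper.tex"), (5, "fig1.eps"), (300, "mail.txt"),
         (600, "paper.tex"), (604, "fig1.eps")]
attrs = {"paper.tex": {"project:journaling"}, "fig1.eps": set(),
         "mail.txt": {"from:advisor"}}

counts = co_access_pairs(trace, window=30)
result = propagate(attrs, counts, min_count=2)
print(counts, result["fig1.eps"])
```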

Challenges • Mechanisms • Storage of large numbers of attributes that get automatically generated • User interface • Context switches • Creating false positive relationships

Background: Inter-file Relationships

Hoarding - Prefetching for Disconnected Information Access • Caching for availability (not just latency) • Cache misses, when operating disconnected, have no redeeming value. (Unlike in connected mode, they can’t be used as the triggering mechanism for filling the cache.) • How to preload the cache for subsequent disconnection? Planned or unplanned. • What does it mean for replacement?
