Describes changes from 512-byte UNIX file system to 4.2 Berkeley Release

A Fast File System for UNIXMcKusick, Joy, Leffler, and FabryACM Transactions on Computer Systems, 2:3, August 1984, pp 181-197. Describes changes from 512-byte UNIX file system to 4.2 Berkeley Release

A Fast File System for UNIXMcKusick, Joy, Leffler and Fabry Original Unix File System : simple but slow • can achieve 20KB/sec throughput (2% of disk maximum throughput) I-nodes Data Blocks (512 bytes) Superblock • Superblock: contains basic information about the file system • I-nodes contain information about the • ownership • time-stamp • last modification etc. • direct/indirect pointers to data blocks • etc.

More on Old Unix File System • Each Disk divided into one or more partitions • Each partition may contain one file system • file system never spans multiple partitions. • File system described by its superblock – contains basic parameters of the file system • Number of data blocks in file system, count of maximum number of files, pointer to free list. • Within file system are files (!) • Some distinguished as directories and have pointers to files that may themselves be directories. • Every file has a descriptor associated with it called an inode. • Inode: contains info describing ownership of file, time stamps marking last modification and access times for the file, and an array of indices that point to the data blocks for the file. May also contain references to indirect blocks.

Old File System • 150-megabyte traditional UNIX file system consists of 4 megabytes of inodes and 146 megabytes of data. • The size of the blocks is too small - just 512 bytes • file index becomes too large • transfer rate is low • Consecutive blocks (of a file) not close together (suboptimal data block allocation) • Poor access timings for sequential searches! • I-nodes far from data blocks (segregation of I-nodes/data blocks) • Long seeks required to access a file • I-nodes of a directory not necessarily clustered • Poor performance for the “ls” command.

First Effort to Improve • Make the size of the data block bigger.. • Use 1024 bytes (instead of 512) • Speedup was somewhat > 2. • Each disk access accesses twice the amount of data • Most files were accessed without the help of “indirect” blocks (now direct blocks contained twice as much data as in the 512 page size case) • Throughout doubled but still only 4% of the disk throughput used! • Another (serious) problem affecting performance was the management of the list of Free Blocks. • Initially, was ordered (for optimal access) • Quickly it became scrambled.. • The latter forced long seeks for reading blocks • 175 kbytes/sec to 30 kbytes/sec. • Only solution: dump, rebuild, and restore file system

Free list Management List List Free Block Allocated Block

New System • Overview • Optimizing Storage Utilization • File System Parameterization • Layout Policies

New File System: Overview • Each drive contains one or more partitions • A file system “lives” in a disk partition • Superblock (info that does not change) gets replicated to protect against loss • The size of the data block is set to 4096 (achieves files of 2^32 size with only two levels of indirection) • The size of data block is kept at the superblock • Possible for file systems with different block sizes to be simultaneously accessible on the same system. • Block size has to be decided at file system creation

New File System • Disk partitions are divided into cylinder groups: • One or more consecutive cylinders on disk. • Contain • superblock • I-nodes • bitmaps of free blocks • usage summary information • Switch from Free-Block List management to bit-map • Bitmap per cylinder: 00011100101011100111111 • Easier to find contiguous free blocks • For each cylinder group one I-node is allocated per 2048 bytes of disk space; should be more than enough

New File System Cylinder Group 2 Cylinder Group 1 • Placement of cylinder bookkeeping information • At the beginning: all redundant info would be at the top platter (bad for hardware failure – all bookkeeping info vanishes). • Bookkeeping information could be placed on a “spiral-down” fashion. • Any single track, cylinder, or platter can be lost without losing all copies of the superblock. • Data blocks can be placed between the start of the cylinder and the cylinder group information (except for the first cylinder group). Disk Head Assembly

Optimizing Storage Utilization • Large block sizes (4096 or 8192 bytes) could help in transferring volumes of data together, thus increasing disk throughput • The problem is that most Unix files are of small size • Out of a 920 Megabytes FS… Block size Large Blocks do not really solve the problem as they create a LOT of waste

How to solve the problem of Storing Small Files • Large blocks can be chopped into small segments • These segments are called FRAGMENTS • Fragments are used for small files • They are individually addressable • Every file certainly ends with a fragment(s). • Limit number of fragments to 2, 4, 8 per data block • Lower bound is 512 bytes per fragment • Size of the map increases Bits in Map Example of blocks/fragments in a 4096/1024 FS; each bit records Status of a fragment • Space allocated to a file when program does a write system call. System checks if size has increased; if so …

Space Allocation to a File If a file needs to be extended to hold new data: • There is enough space left in already allocated block or fragment to hold the new data. The new data are written into available space • File contains no fragmented blocks; last block does not have enough room • Allocate as many full blocks are needed. • For the last block, allocate as many fragments as needed • File contains one or more frag’s (but not enough to hold new data) • Unite the fragments and new data and do as in step (2) • Problem with expanding a file one fragment at a time is that data may be copied many times as fragm’ted block expands to full block • Fragment reallocation can be minimized if writes work with FULL blocks at a time (except partial blocks at the end of the file). • Since file systems with different block sizes may reside on the same system, file system interface extended to provide application programs the optimal size for a read or write. • Optimal size for FS writes • Block size of the FS from which the file is accessed. • For pipes/sockets, the size of the underlying buffer.

Cylinder Groups • Keep I-nodes close to their data blocks • I-nodes pertinent to directories should be kept together • Think of cylinder group as “small(er)” Unix FS • Locality is IMPORTANT in achieving better performance • Do not let disk partitions fill up • Free_Space_Reserve should be always less than 90(or so) • Otherwise, FS throughput falls into less than half • Spread dissimilar things far apart • This creates space for related files to be clustered • Minimize seek latency

File System Parameterization • Parameterize processor capabilities and mass storage characteristics so FFS can take advantage. • Rotationally optimal blocks • If need to do an interrupt need to allow time for rotation. • Typically not needed if have an I/O channel. • Cylinder group summary info includes a count of the available blocks in a cylinder at different rotational positions (at some resolution)

Locality in the Berkeley System • Maintain files within a directory in the same cylinder group • Keep locality of inodes in a directory • Keep locality of files in a directory • Spread directories out among cylinder groups • Allocate runs of blocks within a cylinder group

Layout Policies: Global & Local • Global Layout Policies try to cluster related information && spread unrelated data: • Layout policy tries to place files of a directory to a cylinder group • A new directory is placed in a cylinder group that has the greater than average number of free I-nodes AND the smallest number of directories in it. • Data blocks of a file are accessed together • The placement routines try to put all pages in the same cylinder group (preferably a rotationally optimal positions) • Avoid “over-localization” as local cylinder groups may run out of space (forcing data to scatter over to other cylinder groups) • Over-localization (taken to extreme) may yield a huge single cluster (similar to the OLD FS).

Local Layout Policies Local policies: • Handle requests for specific blocks • If available. Simply use them • Otherwise, check a sequence of alternatives The four level allocation strategy used is: • Get next block that is the next Rotationally Optimal Block. • If no such block exists in the cylinder, use the next block rotationally close on the same cylinder group. • If the cylinder group is full, rehash on cylinder group to choose another group(to look for a new block). • If the above fails, use an exhaustive search for all cylinder groups.

Performance Evaluation • Run ls for deep filesystems: factor of 2 improvement in disk accesses. • Only files: factor of 8 • Transfer rates do not change over time. Much more tightly tied to free space. When full goes down by factor 2 • Reads and writes faster: • Biggest factor is block size • Overhead greater, but fewer blocks • For large files, it is shown that 20-40% of disk bandwidth can be achieved. • Compared to original Unix FS, 10-20 times improvement • Small files display better performance

Enhancements • Long File Names • almost arbitrary length • File Locking (with flock) • Old file system had no provisions for locking files. • Had to use a separate “lock file.” • Kludgy • Hard Locks v. Advisory Locks. • Implemented advisory (since sysadmin has to override) • Exclusive v. Shared • No deadlock detection attempted.

Enhancements • Symbolic Links • Previous: Multiple directory entries in the same file system to reference a single file. • Each directory entry “links” a file’s name to an inode and its contents. • Inodes do not reside in directories, but exist separately and are referenced by links. • When all links to an inode removed, inode is deallocated. • Does not allow references across different file systems or intermachine linkage. • Solution: symbolic links. • Symbolic link implemented as a file that contains a pathname. • When system encounters, prepends it and name interpreted.

Describes changes from 512-byte UNIX file system to 4.2 Berkeley Release

Describes changes from 512-byte UNIX file system to 4.2 Berkeley Release

Presentation Transcript

The UNIX File System

The UNIX File System (1)

Unix File System API

File System of Unix

File System – Unix baed

UNIX File System (UFS)

The UNIX File System

A FAST FILE SYSTEM FOR UNIX

Unix File System

Unix File System Internal Structures

The UNIX File System

UNIX File System

File System in UNIX

The Unix File System

UNX122 – Unix File System

The UNIX File System

UNIX File System

The UNIX File System

UNIX FILE SYSTEM

UNIX File System Layout

Unix File System API

UNIX FILE SYSTEM