File Systems 6.1 Files 6.2 Directories 6.3 File system implementation 6.4 Example file systems Chapter 6
Files • Requirements for long term information storage: • Must store large amounts of data • Information stored must survive the termination of the process using it • Multiple processes must be able to access the information concurrently • Solution: Store information on disk or other external media in units called files. • The file system is a part of operating system that manages files.
Files • File naming - filename.file extention • File structure • unstructured sequence of bytes. MS-DOS/UNIX • record sequence. CP/M • B-tree.
File Naming Typical file extensions.
File Structure • Three kinds of files • byte sequence • record sequence • tree
Files • File types • Regular files contain user information either ASCII or binary. • Directories are system files for maintaining the structure of the file system. • Character special files are used to model serial I/O devices such as terminals, printers, and networks. • Block special files are used to model disks. • A UNIX executable file stats with a magicnumber, identifying the file as an executable file.
File Types (a) An executable file (b) An archive
File Access • Sequential access • read all bytes/records from the beginning • cannot jump around, could rewind or back up • convenient when medium was magnetic tape • Random access • bytes/records read in any order • essential for database systems • Two methods are used for specifying where to start reading. • read and then move file marker • move file marker (seek), then read
File Attributes • Operating systems associate extra information with each file, called file attributes. Possible file attributes
Create Delete Open Close Read Write Append Seek Get attributes Set Attributes Rename File Operations
Memory-Mapped Files (Ex11.c) • To facilitate access to files, systems provides system calls to map files into the address space of a running process and remove (unmap) the files from the address space. • File services as a backing store for the process and when the process finishes all mapped, modified pages are written back to their files. • Advantage: eliminate need for I/O. • Disadvantage: • difficult to know size of output file. In case of all zeroes, 10 0's ?? or 100 0's ?? • Mapped file modified by one process is read differently by another. Two processes need to see consistent views of the file. • file may be too large to fit.
Memory-Mapped Files (a) Segmented process before mapping files into its address space (b) Process after mapping existing file abc into one segment creating new segment for xyz
Directories • File systems have directories or folders to keep track of files. • A single-level directory has one directory (root) containing all the files. • A two-level directory has a root directory and user directories. • A hierarchical directory has a root directory and arbitrary number of subdirectories. • Two different methods are used to specify file names in a directory tree: • Absolute path name consists of the path from the root directory to the file. • Relative path name consists of the path from the current directory (working directory).
Directories - A single level directory system • A single level directory system • contains 4 files • owned by 3 different people, A, B, and C
Two-level Directory Systems Letters indicate owners of the directories and files
Hierarchical Directory Systems A hierarchical directory system
Directories • The path name would be written: • Winodws \usr\ast\mailbox • UNIX /usr/ast/mailbox • MULTICS >usr>ast>mailbox • Dot and dot dot are two special entries in the file system. • Dot (.) refers to the current directory. • Dot dot (..) refers to its parent.
Path Names A UNIX directory tree
Create Delete Opendir Closedir Readdir Rename Link Unlink Directory Operations
File System Implementation • File system layout: • MRB (Master Boot Record) is used to boot the computer. • The partition table gives the starting and ending addresses of each partition. • Partitions: • The first block, boot block, of the active partition is read in by the MRB program when the system is booted. • The superblock contains all the key parameters about the file system. • Free blocks information • i-nodes tells all about the file. • Root directory • Directories and files
File System Implementation A possible file system layout
File System Implementation • Implementing file storage is keeping track of which disk blocks go with which files. • Contiguous Allocation - store each file as contiguous block of data. • Advantage: • Simple to implement • Read performance is excellent • Disadvantage: • Disk fragmentation • The maximum file size must be known when file is created. • Example: CD-ROMs, DVDs, and write-once optical media • Linked List Allocation - keep linked list of disk blocks • Disadvantage: • random access slow • amount of data in a block not a power of 2
Implementing Files (a) Contiguous allocation of disk space for 7 files (b) State of the disk after files D and E have been removed
Implementing Files Storing a file as a linked list of disk blocks
File System Implementation • Linked List Allocation using an index - take table pointer word from each block and put them in an index table, FAT (File Allocation Table) in memory. • Disadvantage - entire table must be in memory all the time • I-node (index-node) lists the attributes and disk addresses of the file's blocks.
Implementing Files Linked list allocation using a file allocation table in RAM
Implementing Files An example i-node
Implementation directories • When a file is opened, the file system uses the path name to locate the directory entry. • The directory provides information needed to find the disk blocks. • disk address of the entire file (contiguous blocks) • the number of first block (linked list) • the number of i-node (i-node) • Where to store attributes? In directory or i-node?
Implementing Directories (a) A simple directory – MS-DOS/Windows fixed size entries disk addresses and attributes in directory entry (b) Directory in which each entry just refers to an i-node - UNIX
Implementation directories • Handling long file names in a directory: • Fixed-length names (Waste space) • In-line (When a file is removed, a variable-sized gap is introduced.) • Heap (The heap management needs extra effort.) • How to search files in each directory? • Linearly • Hash table • Cache the results of searches
Implementing Directories • Two ways of handling long file names in directory • (a) In-line • (b) In a heap
Shared files • A shared file is used to allow a file to appear in several directories. • The connection between a directory and the shared file is called a link. The file system is a Directed Acyclic Graph (DAG). • Problem: If directories contain disk address, a copy of the disk address will have to be made in directory B. What if A or B append the file, the new blocks will only appear in one directory.
Shared files • Solution: • Do not list disk block addresses in directories but in a little data structure. (use i-nodes) (hard link) • Create a new file of type link which contains the path name of the file to which it is linked symbolic linking
Shared Files File system containing a shared file
Shared files • ln file1 file2 • Problem of hard link - when should i-node be removed? Suppose A: rm file2 could set count = 1 and leave i-node intact. when count = 0, delete file and i-node. • Problem above does not occur in symbolic link because only the owner directory has a pointer to i-node. The problem is extra overhead in the traversing path. • Other problem is having multiple copies of a file may set copied when dumping an files onto a disk (tar). • do not descend path involving symbolic links.
Shared Files (a) Situation prior to linking (b) After the link is created (c)After the original owner removes the file
Disk space management • Strategies for storing an n byte file: • Allocate n consecutive bytes of disk space - segment • Allocate a number [n/k] blocks of size k bytes each - paging • Problem – if the file grows it will have to be moved on the disk, it is an expensive operation and causes external fragmentation. • Solution – All file systems chop files up into fixed-size blocks that need not to be adjacent.
Block size • When block size increase, disk space utilization decrease (space efficiency decrease and internal fragmentation). • When block size decrease, data transfer rate decrease (time efficiency decrease) • usual size k = 512bytes, 1k (UNIX), or 2k
Disk Space Management • Dark line (left hand scale) gives data rate of a disk • Dotted line (right hand scale) gives disk space efficiency • All files 2KB Block size
Block size • Example: disk with 131072 bytes per track. rotation time = 8.33 msec average seek time = 10 msec. time to read a block of k bytes = 10 + 8.33/2 + (k/131072) * 8.33 = 10 + 4.165 + k/131072 * 8.33 If k = 1 KB = 1024 bytes = 14.165 + 1024/131072 * 8.33 = 14.165 + 0.065 = 14.23 msec • Disk space efficiency = % of block used by data. • Observation: Assume that all files are 1 kbytes, on average 1/2 of last block is empty.
Keeping Track OF Free Blocks • Use linked list of disk blocks: each block holds as many free disk block numbers as will fit. • With 1 KB block and 32-bit disk block number 1024 * 8/32 = 256 disk block numbers 255 free blocks (and) 1 next block pointer. • Use bit-map: A disk with (n) blocks requires a bit map with (n) bits • Free blocks are represented by 1's • Allocated blocks represented by 0's • 16GB disk has 224 1-KB and requires 224 bits 2048 blocks • Using a linked list = 224/255 = 65793 blocks. However, these blocks can be freed up as the disk is filled up. • Bit map generally better if it can be kept completely in memory.
Disk Space Management (a) Storing the free list on a linked list (b) A bit map
Disk Space Management (a) Almost-full block of pointers to free disk blocks in RAM - three blocks of pointers on disk (b) Result of freeing a 3-block file (c) Alternative strategy (split the full block of pointers) for handling 3 free blocks - shaded entries are pointers to free disk blocks
Disk Space Management Quotas for keeping track of each user’s disk use
File System Reliability • The loss of a file system can be catastrophic. • Methods to safeguard a file system: • Bad Block Management • Backups • Bad Block Management • Hardware solution - dedicate a sector to a "bad block list“ when disk controller is initiated, the bad block list is read and a spare block is picked to replace each bad block. The mapping is recorded in the bad block list. • Software solution - user or file system carefully construct a file containing all the bad blocks
File System Reliability • Backups are made to handle: recover from disaster or stupidity. • Considerations of backups • Entire or part of the file system • Incremental dumps: dump only files that have changed • Compression • Backup an active file system • Security
File System Reliability • Two strategies can be used for dumping a disk to tape: • Physical dump: starts at block 0 to the last one. • Advantages: simple and fast • Disadvantages: backup everything • Logical dump: starts at one or more specified directories and recursively dumps all files and directories found that have changed since some given base date.
File System Reliability • A file system to be dumped • squares are directories, circles are files • shaded items, modified since last dump • each directory & file labeled by i-node number File that has not changed