CE01000-3 Operating Systems Lecture 19 Linux/Unix – File System
Overview of lecture In this lecture we will look at: Files in Linux/Unix – Ordinary files, Directory files, Special files Virtual File System System V Free data blocks inode free lists inode disk location mapping Linux Second Extended File System (Ext2FS) Disk block groups
Overview of Unix/Linux approach Philosophy - attempt to treat all resources to which you can output and from which you can input like ordinary data files Unix/Linux has 3 types of files ordinary files - for ordinary user data/programs, etc directory files - to structure file system special files - for access to I/O and system devices these are all arranged in a single tree structured hierarchy
Ordinary files Ordinary files contain a linear array of bytes - no structure is imposed on bytes by Unix/Linux reads and writes start at a file pointer file pointer can be moved anywhere multiple programs can read/write concurrently to one file (but order of access unpredictable)
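The file pointer behaviour described above can be tried from user space. The following is a minimal sketch using Python's os module, which wraps the underlying read, write and lseek system calls; the function name seek_demo and the sample bytes are illustrative only.

```python
import os
import tempfile

def seek_demo():
    """Write bytes, move the file pointer, and read from the new positions."""
    fd, path = tempfile.mkstemp()          # opens a new file for reading and writing
    try:
        os.write(fd, b"0123456789")        # file pointer is now at offset 10
        os.lseek(fd, 3, os.SEEK_SET)       # move pointer to absolute offset 3
        first = os.read(fd, 4)             # read 4 bytes starting at offset 3
        os.lseek(fd, -2, os.SEEK_END)      # move pointer to 2 bytes before the end
        last = os.read(fd, 10)             # read stops at end of file
        return first, last
    finally:
        os.close(fd)
        os.unlink(path)

print(seek_demo())
```

The kernel attaches no meaning to the bytes: the same file could equally be read one byte at a time or in large blocks.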
Ordinary files (Cont.) although the user gives names to ordinary files, they are identified within the system by inode numbers, which index an array of data structures called inodes held on the disk. inodes contain administrative information about the file and, where appropriate, location information for the file's data. Specifically, an inode contains:
Ordinary files (Cont.) file's device and inode numbers; file type (ordinary, directory, special, etc.); link count; file owner’s user and group ids; file access permissions; major and minor device numbers (for special files); time of last access/modification/status change; pointers to disk blocks of file contents if data file
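Most of the inode fields listed above are visible to programs through the stat system call. The sketch below uses Python's os.stat to collect them; the function name inode_summary and the dictionary keys are illustrative choices, not part of any standard interface. The block pointers themselves are not exposed by stat.

```python
import os
import stat
import tempfile

def inode_summary(path):
    """Collect the inode information that stat() exposes for a path."""
    st = os.stat(path)
    if stat.S_ISREG(st.st_mode):
        kind = "ordinary"
    elif stat.S_ISDIR(st.st_mode):
        kind = "directory"
    else:
        kind = "special/other"
    return {
        "device": st.st_dev,                          # device holding the file
        "inode": st.st_ino,                           # inode number on that device
        "type": kind,
        "links": st.st_nlink,                         # link count
        "uid": st.st_uid,                             # owner's user id
        "gid": st.st_gid,                             # owner's group id
        "permissions": oct(stat.S_IMODE(st.st_mode)), # access permission bits
        "atime": st.st_atime,                         # last access
        "mtime": st.st_mtime,                         # last modification
        "ctime": st.st_ctime,                         # last status change
    }

with tempfile.NamedTemporaryFile() as tmp:
    print(inode_summary(tmp.name))
```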
Directory files allow file access by name from directory table table provides logical grouping together of files as defined by user table provides translation between name and inode number inode number and file name constitute a link
Directory files (Cont.) directories are stored like ordinary files but have the directory type in the inode. This allows a link in one directory to refer to another directory, allowing tree structures to be built. inode numbers are only unique within a given partition of a disk: each partition has its own inode array. Symbolic links permit references to files that cross partition boundaries. A symbolic link is a special file type: it consists of a simple text file that holds the pathname (absolute or relative) of the linked file, so access to the file is indirect (via the pathname in the link) rather than direct to the inode
Directory files (Cont.) each file has 2 types of path name - absolute (from root) and relative (from current directory) directory files can only be written to with special system calls multiple links to a file are allowed - number of links to file held in link count in inode when removing a link from a directory other links may still exist, link count just decremented. Only when link count falls to 0 is inode entry and file data blocks released back to system to reuse.
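The difference between links and symbolic links, and the role of the link count, can be observed directly. This is a minimal sketch assuming a Unix-like system whose temporary directory allows hard links; the function name link_demo and the file names are illustrative.

```python
import os
import tempfile

def link_demo():
    """Create a hard link and a symbolic link, then remove the original name."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "original.txt")
        hard = os.path.join(d, "hard.txt")
        soft = os.path.join(d, "soft.txt")
        with open(original, "w") as f:
            f.write("data")

        os.link(original, hard)      # second directory entry for the same inode
        os.symlink(original, soft)   # separate file whose content is a pathname

        result = {
            "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
            "count_before": os.stat(original).st_nlink,
            # lstat examines the symbolic link itself rather than its target
            "symlink_has_own_inode": os.lstat(soft).st_ino != os.stat(original).st_ino,
            "symlink_target": os.readlink(soft) == original,
        }

        os.unlink(original)          # removes one link; the inode survives
        result["count_after"] = os.stat(hard).st_nlink
        with open(hard) as f:
            result["data_after"] = f.read()
        # the symbolic link still exists but its pathname no longer resolves
        result["symlink_dangling"] = os.path.islink(soft) and not os.path.exists(soft)
        return result

print(link_demo())
```

Removing the original name only decrements the link count, so the data stays reachable through the remaining link, whereas the symbolic link is left pointing at a name that no longer exists.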
Special files there are 3 main types: character special files, block special files and FIFO special files (these are named pipes). The first 2 are primarily used to access input/output devices (disks, terminals, printers, etc.); some devices (disks for example) have both character and block special files
Special files (Cont.) device files have inodes but these contain no reference to data blocks; instead of data blocks, major and minor device numbers are stored to access a device driver. Adding new device drivers is relatively easy, and it is a popular way of adding new facilities into the Unix/Linux kernel. Modern systems also support symbolic links to special files
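The major and minor device numbers stored in a device file's inode can be read back with stat. The sketch below assumes a Unix-like system (os.major and os.minor are not available elsewhere) and uses /dev/null only as a convenient character special file that normally exists; the function name device_numbers is illustrative.

```python
import os
import stat

def device_numbers(path):
    """Return (kind, major, minor) for a device special file, else None."""
    st = os.stat(path)
    if stat.S_ISCHR(st.st_mode):
        kind = "character"
    elif stat.S_ISBLK(st.st_mode):
        kind = "block"
    else:
        return None                  # not a device special file
    # st_rdev holds the device numbers that select the driver and the unit
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)

if os.path.exists("/dev/null"):
    print(device_numbers("/dev/null"))
```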
File systems disks - in general are split into convenient sized partitions - each physical disk having one or more partitions each partition will have a Unix file system imprinted on it with a partition root directory and its own directory tree underneath Unix/Linux makes these individual file systems appear to be one directory hierarchy by mounting the root of one file system over a leaf directory in another and making the join appear seamless to user
File System Internals All the disk I/O we have looked at so far has been done via the kernel’s file system using abstractions like file, inode and directory. When a disk file system is mounted into the directory structure, the disk has this abstract model built into it. It is also possible to access the physical disks directly via a device special file in the /dev directory
File system internals (Cont.) This bypasses all the file abstractions and allows access to the underlying device itself. In practice most users are denied this low level access by the correct setting of the device special file permission bits. This is necessary to prevent unauthorised access to programs and data on the disk partitions.
Virtual File System Linux can support a number of different file systems – Second/Third Extended Filesystem (Linux default filesystems), plus e.g. Network File System (NFS), MS-DOS, Unix System V, HPFS (OS/2/NT), Minix (original Linux filesystem), etc. It does this by using a virtual file system (VFS) layer: file system calls are passed on to the VFS, which translates each call into the call appropriate to the underlying file system. The VFS specifies a set of functions for which every file system it supports has to provide implementations
Virtual File System (Cont.) VFS uses a table defined during kernel configuration. Each table entry defines a file system type - file system type name and pointer to function called to mount that type of file system When file system is mounted, table is looked up and appropriate function called to mount the file system this function returns a mounted file system descriptor to VFS
Virtual File System (Cont.) a mounted file system descriptor includes pointers to functions provided by physical file system kernel code the VFS then uses descriptor to access the file system internal routines The VFS also maintains 2 other types of descriptors - inode descriptors and open file descriptors
Virtual File System (Cont.) each descriptor contains info about files in use and the set of operations provided by physical file system code inode descriptor contains pointers to functions that act on any file (e.g. creat(), unlink()) the file descriptor contains pointers to functions that can only act on an open file (e.g. read(), write())
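The table lookup and function pointer dispatch described on the VFS slides can be modelled in a few lines. This is a toy sketch, not kernel code: every name in it (FS_TYPES, register_filesystem, MountedFS, mount_toyfs, vfs_mount, "toyfs") is hypothetical, and Python dictionaries of functions stand in for the C structures of function pointers.

```python
# Toy model of VFS dispatch. All names here are hypothetical.

FS_TYPES = {}   # file system type name -> function that mounts that type

def register_filesystem(name, mount_fn):
    """Add an entry to the table of known file system types."""
    FS_TYPES[name] = mount_fn

class MountedFS:
    """Mounted file system descriptor holding the operations the file system supplies."""
    def __init__(self, name, inode_ops, file_ops):
        self.name = name
        self.inode_ops = inode_ops   # operations on any file (create, unlink)
        self.file_ops = file_ops     # operations on an open file (read, write)

def mount_toyfs(device):
    """Mount function for an in-memory toy file system."""
    store = {}                       # file name -> contents

    def create(name):
        store[name] = bytearray()

    def unlink(name):
        del store[name]

    def write(name, data):
        store[name] += data

    def read(name):
        return bytes(store[name])

    return MountedFS("toyfs",
                     {"create": create, "unlink": unlink},
                     {"read": read, "write": write})

def vfs_mount(fs_type, device):
    """Look up the type in the table and call its mount function."""
    return FS_TYPES[fs_type](device)

register_filesystem("toyfs", mount_toyfs)
fs = vfs_mount("toyfs", "/dev/hypothetical0")
fs.inode_ops["create"]("notes")
fs.file_ops["write"]("notes", b"hello")
print(fs.file_ops["read"]("notes"))
```

The caller only ever goes through the descriptor, so a second file system type could be registered with different functions without changing any calling code.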
Unix System V System V filesystem is the classic early Unix filesystem on a System V filesystem disk partition, the disk blocks would have the following layout:
Unix System V (Cont.) boot block (block 0) - holds boot up code if bootable partition, if not then left unused super block (block 1) - contains information about the file system as a whole, especially: the free block list (a list of all the data blocks that are free for use) a free inode list - for rapid allocation of inodes
Unix System V (Cont.) inodes (block 2 to n) - disk blocks that hold inodes for filesystem - number of disk blocks allocated to inodes is fixed when filesystem created (hence max number of inodes fixed for given partition) data blocks (blocks n+1 to end) - disk blocks for actual file/directory data
Free data block list initially the list of free data block numbers fills the space available in the super block; the overflow is held in the free data blocks themselves. One entry in the super block list, and in each overflow block, holds the number of the next block containing more of the list, giving a linked list
Free data block list (Cont.) allocation of free blocks proceeds from the first free block entry; this is repeated until only one free block entry is left in the super block. Prior to allocation of that data block, its contents (the next part of the free list) are copied into the super block. When data blocks become free they are added to the list in the super block, but if the super block list is full then the super block entries are copied to the newly freed data block and replaced by a single entry in the super block pointing to that block
Free inode list each inode holds a flag within its struct indicating whether or not it is free, so a simple linear search through the inodes could find free ones, but this is inefficient. The super block contains a list of free inodes; these are allocated from first to last until the list is empty. The inodes on disk are then searched, starting from the number of the last free inode in the list, to refill the inode list in the super block. When an inode is freed whose number is lower than those in the free list, it replaces the last inode number in the list, so the next search through the inodes will start from the new location
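The free data block list scheme can be simulated to see how the list spills into free blocks and is pulled back. This is a simplified sketch under stated assumptions: the class name FreeBlockList, the tiny capacity of 3 entries and the dictionary standing in for the disk are all illustrative, and entries are taken from the end of the super block list for simplicity.

```python
class FreeBlockList:
    """Toy model of a super block free list with overflow held in free blocks."""

    def __init__(self, capacity, free_blocks):
        self.capacity = capacity     # entries that fit in the super block
        self.super_list = []         # free block numbers held in the super block
        self.disk = {}               # block number -> list stored inside that block
        for block in free_blocks:
            self.free(block)

    def free(self, block):
        if len(self.super_list) == self.capacity:
            # Super block list is full: copy its entries into the newly freed
            # block and leave a single entry pointing at that block.
            self.disk[block] = self.super_list
            self.super_list = [block]
        else:
            self.super_list.append(block)

    def allocate(self):
        if not self.super_list:
            raise RuntimeError("no free blocks")
        if len(self.super_list) == 1:
            # Last entry: this block may hold the next part of the list, so
            # copy its contents into the super block before handing it out.
            block = self.super_list.pop()
            self.super_list = self.disk.pop(block, [])
            return block
        return self.super_list.pop()

free_list = FreeBlockList(capacity=3, free_blocks=range(1, 8))
print([free_list.allocate() for _ in range(7)])
```

Only the super block list is touched on most allocations; a disk block has to be read only when the list runs down to its final entry.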
inode data block pointers Mechanism for locating a file's data blocks from its inode. A System V inode contains 13 pointers: the first 10 point directly at file data blocks; the 11th pointer (single indirect) points at a data block that contains pointers to file data blocks (typically 256 pointers); the 12th pointer (double indirect) points at a data block that contains pointers to data blocks that contain pointers to file data blocks
inode data block pointers (Cont.) the 13th pointer (triple indirect) points at a data block that contains pointers to data blocks that contain pointers to data blocks that contain pointers to file data blocks this arrangement means that small files (the majority of files) can be accessed very rapidly, whereas larger files may take a larger number of disk accesses to locate their blocks.
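The cost of reaching a block at each level can be worked out from the pointer counts given above. The sketch below assumes 10 direct pointers and 256 pointers per indirect block, as on the slides; the 1 KiB block size used for the size estimate is an assumption (it corresponds to 256 four-byte pointers per block), and the names locate and MAX_BLOCKS are illustrative.

```python
DIRECT = 10              # direct pointers in the inode
PTRS_PER_BLOCK = 256     # pointers held in one indirect block
BLOCK_SIZE = 1024        # assumed block size in bytes (256 pointers x 4 bytes)

def locate(logical_block):
    """Return (level, pointer blocks read) for a logical block number in a file."""
    n = logical_block
    if n < DIRECT:
        return ("direct", 0)
    n -= DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single indirect", 1)
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        return ("double indirect", 2)
    n -= PTRS_PER_BLOCK ** 2
    if n < PTRS_PER_BLOCK ** 3:
        return ("triple indirect", 3)
    raise ValueError("block number beyond maximum file size")

# Total number of data blocks addressable from one inode
MAX_BLOCKS = DIRECT + PTRS_PER_BLOCK + PTRS_PER_BLOCK ** 2 + PTRS_PER_BLOCK ** 3

print(locate(5), locate(100), locate(1000), locate(100000))
print(MAX_BLOCKS, "blocks,", MAX_BLOCKS * BLOCK_SIZE // 2 ** 30, "GiB approx.")
```

Under these assumptions a file of up to 10 KiB needs no extra disk reads to find its blocks, which is why small files are fast to access.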
Second extended filesystem Ext2fs has 15 pointers in the inode - 12 direct, 1 indirect, 1 double indirect and 1 triple indirect Ext2fs does not have fixed size records for file names, but variable length records permitting long file names (max 255 chars) without wasting space
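Variable length directory records can be illustrated by packing and unpacking entries by hand. This is a simplified sketch modelled on the Ext2fs directory entry layout (4-byte inode number, 2-byte record length, 1-byte name length, 1-byte file type, then the name padded to a multiple of 4 bytes); the function names are illustrative, and details such as the final entry being stretched to the end of the block are left out.

```python
import struct

# inode number, record length, name length, file type (little-endian, 8 bytes)
HEADER = struct.Struct("<IHBB")

def pack_entry(inode, name, file_type=1):
    """Build one directory record; file_type 1 is assumed to mean a regular file."""
    raw = name.encode()
    if len(raw) > 255:
        raise ValueError("file name longer than 255 bytes")
    rec_len = (HEADER.size + len(raw) + 3) & ~3      # round up to multiple of 4
    padded = raw.ljust(rec_len - HEADER.size, b"\0")
    return HEADER.pack(inode, rec_len, len(raw), file_type) + padded

def unpack_entries(data):
    """Walk the records, using each record length to find the next record."""
    entries, offset = [], 0
    while offset < len(data):
        inode, rec_len, name_len, _ftype = HEADER.unpack_from(data, offset)
        start = offset + HEADER.size
        entries.append((inode, data[start:start + name_len].decode()))
        offset += rec_len
    return entries

block = pack_entry(12, "a") + pack_entry(13, "a_much_longer_file_name.txt")
print(len(block), unpack_entries(block))
```

A short name costs only 12 bytes here, whereas a fixed size record would have to reserve room for the longest permitted name in every entry.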
Second extended filesystem (Cont.) disk partition structure changed to consist of a boot block followed by several block groups
Disk block groups each block group contains duplicates of critical info especially super block and the file system descriptors - improves reliability
Disk block groups (Cont.) each block group holds some of the inodes and file data blocks (attempts to keep inodes and their data blocks in close proximity) Also free inode list and free data block list replaced by an inode bitmap and block bitmap
Disk block groups (Cont.) each bitmap has a single bit for each inode or block in the block group. If the bit is set to 1 then the inode or block is in use; if set to 0 then it is free. Allocation/deallocation consists of searching through the bitmap and setting bits appropriately. Bitmaps are small and can thus be held in memory while the filesystem is mounted, making them very fast to search.
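Bitmap allocation can be sketched in a few lines. The example assumes the Ext2fs convention in which a set bit (1) marks an inode or block as in use and a clear bit (0) marks it as free; the class name Bitmap and the simple first-fit search are illustrative and ignore the locality heuristics a real allocator would apply.

```python
class Bitmap:
    """One bit per inode or block: 1 means in use, 0 means free."""

    def __init__(self, nbits):
        self.nbits = nbits
        self.bits = bytearray((nbits + 7) // 8)   # all zero: everything free

    def is_used(self, index):
        return bool(self.bits[index // 8] & (1 << (index % 8)))

    def allocate(self):
        """Find the first clear bit, set it, and return its index."""
        for index in range(self.nbits):
            if not self.is_used(index):
                self.bits[index // 8] |= 1 << (index % 8)
                return index
        raise RuntimeError("no free entries in this block group")

    def release(self, index):
        """Clear the bit so the entry can be allocated again."""
        self.bits[index // 8] &= ~(1 << (index % 8)) & 0xFF

bitmap = Bitmap(16)
print([bitmap.allocate() for _ in range(3)])
bitmap.release(1)
print(bitmap.allocate())
```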
Third Extended File System • Ext3fs replaces Ext2fs – major improvements over Ext2fs • Provides ability to log (journal) intended changes to filesystem – makes recovery from crashes/failures easier • Allows Htree indexing of large directories • Htree is a form of B-tree keyed on a hash of the file name, which allows very quick searching
References Operating System Concepts. Chapter 22 & Appendix C.