UNIX

UNIX – The Kernel

UNIX Internals: Motivations • Knowledge of UNIX Internals helps in: • understanding similar systems (for example, NT, LINUX) • designing high performance UNIX applications

WHAT IS THE KERNEL? • Part of UNIX OS that contains code for: • controlling execution of processes (creation, termination, suspension, communication) • scheduling processes fairly for execution on the CPU. • allocating main memory for the execution of processes. • allocating secondary memory for efficient storage and retrieval of user data. • handling peripherals such as terminals, tape drives, disk drives and network devices.

Kernel Characteristics: • Kernel loaded into memory and runs until the system is turned off or crashes. • Mostly written in C with some assembly language written for efficiency reasons. • User programs make use of kernel services via the system call interface. • Provides its services transparently.

Kernel Subsystems • File system • Directory hierarchy, regular files, peripherals • Multiple file systems • Process management • How processes share CPU, memory and signals • Input/Output • How processes access files, terminal I/O • Interprocess Communication • Memory management • System V and BSD implement several of these subsystems differently.

TALKING TO THE KERNEL • Processes access kernel facilities via system calls. • Peripherals communicate with the kernel via hardware interrupts.

EXECUTION IN USER MODE AND KERNEL MODE • Kernel contains several data structures needed for implementing kernel services. These structures include: • Process table: contains an entry for every process in the system • Open-file table: contains at least one entry for every open file in the system.

Execution in kernel mode and user mode • When a process executes a system call, the execution mode of the process changes from user mode to kernel mode. • Processes in user mode can access their own instructions and data but not kernel instructions and data structures. • In kernel mode, a process can access system data structures, such as the process table.

Flow of Control during a System call • User process invokes a system call (for example open( )) • Every system call is allocated a code number at system initialization. • The C runtime library version of the system call places the system call parameters and the system call code number into machine registers and then executes a trap machine instruction, switching to kernel code and kernel mode.

Flow of control of a system call • The trap instruction uses the system call number as an index into a system call vector table (located in kernel memory), which is an array of pointers to the kernel code for each system call. • Code corresponding to the system call executes in kernel mode, modifying kernel data structures if necessary. • The kernel then performs a special "return" instruction that flips the machine back into user mode and returns to the user process's code.
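The dispatch step can be sketched as a toy model in Python. This is only an illustration under stated assumptions: the call numbers, the two handlers and the `kernel` dictionary are invented here, and a real kernel does this in C and assembly with a hardware trap.

```python
# Toy model of system-call dispatch. Call numbers and handlers are hypothetical.

def sys_getpid(kernel, args):
    return kernel["current_pid"]

def sys_open(kernel, args):
    # Record the path in a pretend open-file table and return its slot index.
    kernel["open_files"].append(args[0])
    return len(kernel["open_files"]) - 1

# System call vector table: the list index is the system call number.
SYSCALL_VECTOR = [sys_getpid, sys_open]
SYS_GETPID, SYS_OPEN = 0, 1

def trap(kernel, number, *args):
    """Stand-in for the trap instruction: enter kernel mode, dispatch, return."""
    if not 0 <= number < len(SYSCALL_VECTOR):
        raise ValueError("bad system call number")
    kernel["mode"] = "kernel"
    try:
        return SYSCALL_VECTOR[number](kernel, args)
    finally:
        kernel["mode"] = "user"  # models the special "return" instruction

def make_kernel():
    return {"mode": "user", "current_pid": 42, "open_files": []}
```

A library wrapper such as open( ) would correspond to a call like `trap(kernel, SYS_OPEN, "/tmp/a")` in this model.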

SYNCHRONOUS VS ASYNCHRONOUS PROCESSING • Usually, processes performing system calls cannot be preempted. • Processes must voluntarily relinquish the CPU, for example while waiting for I/O to complete. • The kernel sends such a process to sleep and wakes it up when the I/O is completed. • The scheduler does not allocate a sleeping process any CPU time and allocates the CPU to other processes while the hardware device is servicing the I/O request.

INTERRUPTS AND EXCEPTIONS • UNIX system allows devices such as I/O peripherals and the clock to interrupt the CPU asynchronously. • On receipt of the interrupt, the kernel saves its current context (a frozen image of what the process was doing), determines the cause of the interrupt and services the interrupt. • Devices are allocated an interrupt priority based on their relative importance. • When the kernel services an interrupt, it blocks out lower priority interrupts but services higher priority interrupts.

PROCESSOR EXECUTION LEVELS • Kernel must sometimes prevent the occurrence of interrupts during critical activity to avoid corruption of data. • Typical interrupt levels, from higher priority to lower priority: • Machine Errors • Clock • Disk • Network Devices • Terminals • Software Interrupts

Interrupts • Interrupts are serviced by kernel interrupt handlers, which must be very fast to avoid losing any interrupts. • If an interrupt of higher priority occurs while a lower priority interrupt is being serviced, nesting occurs and the higher priority interrupt is serviced first.
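The blocking and nesting rules can be illustrated with a small simulation. This is a sketch, not kernel code: the level names follow the list of typical interrupt levels, while the class, its method names, and the choice to block interrupts of equal priority are assumptions made for the example.

```python
# Levels from lowest to highest priority, following the typical ordering.
LEVELS = ["software", "terminal", "network", "disk", "clock", "machine_error"]
PRIORITY = {name: i for i, name in enumerate(LEVELS)}

class InterruptController:
    def __init__(self):
        self.stack = []    # interrupts being serviced, innermost last
        self.pending = []  # interrupts blocked by a higher-priority handler
        self.log = []

    def _current(self):
        return PRIORITY[self.stack[-1]] if self.stack else -1

    def raise_interrupt(self, name):
        if PRIORITY[name] > self._current():
            self.stack.append(name)      # nest: preempt the running handler
            self.log.append("enter " + name)
        else:
            self.pending.append(name)    # blocked until the level drops

    def finish(self):
        self.log.append("exit " + self.stack.pop())
        ready = [p for p in self.pending if PRIORITY[p] > self._current()]
        if ready:
            nxt = max(ready, key=PRIORITY.get)
            self.pending.remove(nxt)
            self.stack.append(nxt)
            self.log.append("enter " + nxt)
```

Raising a terminal interrupt, then a disk interrupt, then a network interrupt nests the disk handler inside the terminal handler and holds the network interrupt until the disk handler finishes.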

DISK ARCHITECTURE: • Disk is split in two ways: sliced like a pizza into wedges called sectors • And subdivided into concentric rings called tracks. • Blocks are individual areas bounded by the intersection of sectors and tracks; they are the basic units of disk storage. • Typical blocks can hold 4K bytes.

Disk architecture (cont’) • Several variations of disk architecture: many disks contain several platters, stacked one upon the other. In these systems, a collection of tracks with the same index number is called a cylinder. • Big issue: sequential reads are much faster than random ones (factor of 10 to 15) • When a sequence of contiguous blocks is read, there is a delay between each block due to the latency of the communication between the disk controller and the device driver.

Disk architecture (cont’) • Want consecutive data to be on the same track, though not in adjacent sectors of the track. • Interleaving techniques achieve this by placing consecutive blocks three sectors apart. • Extent file systems support large consecutive chunks at once (needed for data intensive applications). • I/O is always done in terms of blocks.
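A possible way to picture interleaving is to compute which logical block lands in each physical sector of one track. The sketch below assumes a step of three sectors as on the slide; the function name, the track size, and the rule of sliding to the next free sector on a collision are choices made for this illustration only.

```python
def interleave_layout(sectors, step=3):
    """Return a list where index = physical sector, value = logical block."""
    layout = [None] * sectors
    pos = 0
    for logical in range(sectors):
        # If the target sector is already used, slide to the next free one.
        while layout[pos] is not None:
            pos = (pos + 1) % sectors
        layout[pos] = logical
        pos = (pos + step) % sectors
    return layout
```

For an 8-sector track this places logical blocks 0, 1 and 2 in physical sectors 0, 3 and 6, which gives the controller time to finish one transfer before the next block passes under the head.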

THE FILE SUBSYSTEM Supports: • Regular files • Directories • Special files, which correspond to peripherals such as tapes, terminals or disks and to inter-process communication mechanisms such as pipes and sockets.

INODES • An inode contains permissions, owner, groups and last modification times. • Type of file: regular, directory or special file • If it is a symbolic link, the value of the symbolic link. • If it is a regular file or directory, the location of its disk blocks:

Inode (cont’) • Direct pointers to blocks 0 to 9 • Indirect pointer to an entire block of pointers, which locates blocks 10 .. 1033. • Double indirect pointer (in primary inode) to a block that is just pointers to other blocks, each of which holds 1024 pointers to data blocks.
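The pointer scheme amounts to simple arithmetic on the logical block number. The sketch below uses the figures from the slide (10 direct pointers, 1024 pointers per indirect block); the function name and the tuple it returns are invented for illustration.

```python
DIRECT = 10            # direct pointers cover logical blocks 0..9
PTRS_PER_BLOCK = 1024  # pointers held by one indirect block

def locate_block(n):
    """Say which inode pointer chain reaches logical block n of a file."""
    if n < 0:
        raise ValueError("negative block number")
    if n < DIRECT:
        return ("direct", n)
    n -= DIRECT
    if n < PTRS_PER_BLOCK:
        return ("single", n)              # slot in the indirect block
    n -= PTRS_PER_BLOCK
    if n < PTRS_PER_BLOCK ** 2:
        # (slot in the double indirect block, slot in the second-level block)
        return ("double", n // PTRS_PER_BLOCK, n % PTRS_PER_BLOCK)
    raise ValueError("block lies beyond the double indirect range")
```

With these numbers the single indirect block covers logical blocks 10 to 1033, and the double indirect pointer starts at block 1034.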

LAYOUT OF THE FILE SYSTEM • File system has the following structure: • First logical block: boot block for starting the OS. • Second logical block: superblock that contains information about free blocks and the inode list. • Following is the inode list, which is a list of inodes. Administrators specify the size of the inode list when configuring the file system. The kernel references inodes by index into the inode list. • The data blocks start at the end of the inode list and contain file data and administrative data.
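Referencing an inode by index reduces to computing a block number and a byte offset. The following sketch assumes the inode list starts at logical block 2 (right after the boot block and superblock), a 4096-byte block and a 128-byte inode, counting positions in the list from zero; the sizes and the function name are illustrative assumptions, not values from any particular UNIX.

```python
def inode_location(index, block_size=4096, inode_size=128, first_inode_block=2):
    """Return (logical block, byte offset) of the inode at a list position."""
    if index < 0:
        raise ValueError("negative inode index")
    per_block = block_size // inode_size  # inodes stored in one block
    block = first_inode_block + index // per_block
    offset = (index % per_block) * inode_size
    return block, offset
```

With these assumed sizes each block holds 32 inodes, so position 32 is the first inode of logical block 3.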

CONVERSION OF PATHNAME TO AN INODE • Initial access to a file is through its pathname. The kernel needs to translate a pathname to inodes to access files. • The algorithm namei parses the pathname one component at a time, converting each component into an inode based on its name and the directory being searched, and eventually returns the inode of the input pathname.

Namei ALGORITHM • if pathname is absolute, then search starts from the root inode • if pathname is relative, search is started from the inode corresponding to the current working directory of the process. (kept in the process u area) • the components of the pathname are then processed from left to right. Every component, except the last one, should either be a directory or a symbolic link. Let's call the intermediate inodes the working inodes.

Namei algorithm (cont’) • If the working inode is a directory, the current pathname component is looked for in the directory corresponding to the working inode. If it is not found, namei returns an error; otherwise, the working inode number becomes the inode number associated with the located pathname component.

Namei (cont’) • If the working inode corresponds to a symbolic link, the pathname up to and including the current path component is replaced by the contents of the symbolic link, and the pathname is reprocessed. • The inode corresponding to the final pathname component is the inode of the file referenced by the entire pathname
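The namei steps can be sketched over an in-memory table of inodes. This is an illustrative model only: the dictionary layout, the root inode number 2, the limit on symbolic link hops, and resolving a relative link target from the directory that holds the link are assumptions of the example, and "." and ".." are not handled.

```python
ROOT = 2  # hypothetical root inode number

def namei(inodes, path, cwd=ROOT, max_links=8):
    """Translate a pathname into an inode number, following symbolic links."""
    working = ROOT if path.startswith("/") else cwd
    parts = [p for p in path.split("/") if p]
    links = 0
    while parts:
        name = parts.pop(0)
        node = inodes[working]
        if node["type"] != "dir":
            raise NotADirectoryError(name)
        if name not in node["entries"]:
            raise FileNotFoundError(name)
        found = node["entries"][name]
        if inodes[found]["type"] == "link":
            links += 1
            if links > max_links:
                raise OSError("too many symbolic links")
            target = inodes[found]["target"]
            if target.startswith("/"):
                working = ROOT  # absolute target: restart from the root
            # Replace the component by the link contents and reprocess.
            parts = [p for p in target.split("/") if p] + parts
        else:
            working = found
    return working
```

In a table where /bin is a symbolic link to /usr/bin, looking up /bin/ls and /usr/bin/ls returns the same inode number.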

MOUNTING FILE SYSTEMS • When UNIX is started, the directory hierarchy corresponds to the file system located on a single disk called the root device. • The mount utility allows a super-user to splice the root directory of a file system into the existing directory hierarchy. • File systems created on other devices can be attached to the original directory hierarchy using the mount mechanism.

MOUNT(CONT') • When mount is established, users are unaware of crossing mount points. • File system may be detached from the main hierarchy using the umount utility. • Links do not work across mounts (System V) • Example: • $ mount /dev/floppy /mtn • $ umount /mtn

Mount (cont’) • Kernel maintains a system-wide data structure called the mount table that allows multiple file systems to be accessed via a single directory hierarchy. • mount( ) and umount( ) system calls modify table, in the following manner:

MOUNT (CONT') • with mount( ), an entry is added with: • device number containing file system • a pointer to the root inode of the newly mounted file system • a pointer to the inode of the mount point • a pointer to the file-system-specific mount data structure of the newly mounted file system.

Umount () • With umount() several checks are made in the kernel: • checks that there are no open files in the file system to be un-mounted • flushes the superblock and buffered inodes back to the file system • removes mount table entry and removes "mount point" mark from the mount point directory
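The mount table bookkeeping can be sketched as a small class. It is a simplified model under stated assumptions: the class and method names are hypothetical, inodes are represented by plain values, the open-file check is passed in as a count, and the flush of the superblock and buffered inodes is only marked by a comment.

```python
class MountTable:
    def __init__(self):
        self.entries = {}  # mount-point inode -> mount table entry

    def mount(self, device, root_inode, mount_point_inode, fs_data=None):
        if mount_point_inode in self.entries:
            raise OSError("mount point already in use")
        self.entries[mount_point_inode] = {
            "device": device,                  # device holding the file system
            "root_inode": root_inode,          # root of the mounted file system
            "mount_point": mount_point_inode,  # where it is spliced in
            "fs_data": fs_data,                # file-system-specific data
        }

    def cross(self, inode):
        """If inode is a mount point, continue at the mounted root inode."""
        entry = self.entries.get(inode)
        return entry["root_inode"] if entry else inode

    def umount(self, mount_point_inode, open_file_count=0):
        if mount_point_inode not in self.entries:
            raise OSError("not a mount point")
        if open_file_count > 0:
            raise OSError("file system busy")
        # A real kernel would flush the superblock and buffered inodes here.
        del self.entries[mount_point_inode]
```

A pathname lookup would call something like `cross` each time it reaches a directory inode, which is why users do not notice mount points.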

THE PROCESS SUBSYSTEM: process states • Every process on the system can be in one of 6 states: • running: process is currently using the CPU • runnable: ready to run, will run depending on priority • sleeping: waiting for an event • suspended: (e.g., as a result of ctrl Z) • idle: being created by fork( ), not yet runnable • zombie: terminated but parent has not yet accepted its return value

Example of process state • For example, when a process issues an I/O command, it goes to sleep, then becomes runnable again when the I/O completes and will run depending on priority.
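The I/O example can be traced through a small state machine, treating a wait for I/O as the sleeping state from the list of six states. The set of allowed transitions below is an assumed, simplified one chosen for the sketch; the slides name the states but do not list every transition.

```python
from enum import Enum

class State(Enum):
    IDLE = "idle"
    RUNNABLE = "runnable"
    RUNNING = "running"
    SLEEPING = "sleeping"
    SUSPENDED = "suspended"
    ZOMBIE = "zombie"

# Allowed transitions (an assumed, simplified set).
TRANSITIONS = {
    State.IDLE: {State.RUNNABLE},
    State.RUNNABLE: {State.RUNNING, State.SUSPENDED},
    State.RUNNING: {State.RUNNABLE, State.SLEEPING,
                    State.SUSPENDED, State.ZOMBIE},
    State.SLEEPING: {State.RUNNABLE, State.SUSPENDED},
    State.SUSPENDED: {State.RUNNABLE},
    State.ZOMBIE: set(),
}

def move(current, new):
    """Return the new state, or raise if the transition is not allowed."""
    if new not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {new.value}")
    return new
```

A process that is created, runs, waits for I/O and later exits would pass through idle, runnable, running, sleeping, runnable, running and zombie in this model.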

PROCESS COMPOSITION • code area: executable (text) portion of the process • data area: used by the process to contain static data • stack area: used by the process to store temporary data • user area: holds housekeeping info • page tables: used for memory management

USER AREA • Every process has a private user area for housekeeping information that is used by the kernel for process management. • It contains control and status information. • The contents of the user area are only accessible when the process is executing in kernel space. • The kernel can only access the user area of the currently running process, and not the user area of other processes.

PROCESS USER AREA (CONT') • The important fields in the user area include: • a pointer to the process table slot of the currently executing process • file descriptors for all open files • internal I/O parameters • current directory and current root • process and file size limits • real and effective user IDs • an array indicating how a process reacts to signals • how much CPU time the process has recently used

PROCESS TABLE • The process table is a kernel data structure that contains one entry for every process in the system. • The process table contains fields that must always be accessible to the kernel.

Process entry info • state: (running, runnable, sleeping, suspended, idle or zombified) • process ID and Parent PID • its real and effective user ID and group ID (GID) • location of its code, data, stack and user areas • a list of all pending signals • various timers that record process execution time and kernel resource utilization

THE SCHEDULER • The scheduler is responsible for sharing CPU time between competing processes. • The scheduler maintains a multilevel priority queue that allows it to schedule processes efficiently and follows a specific algorithm for selecting which process should be running.

Scheduling Rules • The kernel allocates the CPU to a process for a time quantum, preempts a process that exceeds its time quantum and feeds it back into one of the several priority queues. • During every second, processes in the highest priority non-empty queue are allocated the CPU in a round-robin fashion.
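These rules can be sketched as a short simulation of a multilevel priority queue. The sketch makes assumptions the slides do not state: priority 0 is the highest, a preempted process is fed back one level lower, and the quantum is counted in abstract ticks; the function name and the input format are invented.

```python
from collections import deque

def schedule(processes, levels=3, quantum=2):
    """processes: list of (name, priority, ticks_needed); 0 = highest priority.

    Returns the sequence of (name, ticks_run) slices in execution order.
    """
    queues = [deque() for _ in range(levels)]
    for name, prio, need in processes:
        queues[prio].append((name, need))
    trace = []
    while any(queues):
        # Pick the highest-priority queue that is not empty.
        level = next(i for i, q in enumerate(queues) if q)
        name, need = queues[level].popleft()  # round-robin within the queue
        ran = min(quantum, need)
        trace.append((name, ran))
        if need > ran:
            # Quantum exceeded: feed back into a lower queue (an assumption).
            lower = min(level + 1, levels - 1)
            queues[lower].append((name, need - ran))
    return trace
```

Processes of equal priority take turns, and a process that uses up its quantum waits behind the processes already queued at the level it is fed back into.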

Scheduler (cont’) • To support real-time processes, the scheduler needs to be changed so that scheduling is based on priority inheritance rather than time quanta. Also, more preemption points in the kernel are needed.

Context Switch • To switch from one process to another, the kernel saves the process's program counter, stack pointer and other important info in the process's user area. • When the process is ready to run, the kernel will get this info from the process's user area.

Loading an executable • A user compiles the source code of a program to create an executable file, which consists of several parts: • Set of "headers" that describe the attributes of a file • Program text • Machine language representation • Other sections, such as symbol table information

Loading an executable • Kernel loads an executable file into memory during an exec( ) system call. • Loaded process contains at least 3 parts, called regions • Text corresponds to text sections of the executable file • Data corresponds to data section of the executable file • Stack is automatically created and its size is dynamically adjusted by the kernel at run time.

Loading an executable • Compiler generates addresses for a virtual address space with a given address range. • Memory Management Unit translates virtual addresses generated by the compiler into addresses of physical memory.
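The translation performed by the Memory Management Unit can be illustrated with a single-level page table. This is a sketch under assumptions not given in the slides: a 4096-byte page, a dictionary standing in for the page table, and a `LookupError` standing in for a page fault.

```python
PAGE_SIZE = 4096  # assumed page size in bytes

def translate(page_table, vaddr):
    """Map a virtual address to a physical one via {virtual page: frame}."""
    vpn, offset = divmod(vaddr, PAGE_SIZE)
    if vpn not in page_table:
        raise LookupError("page fault: virtual page %d not mapped" % vpn)
    return page_table[vpn] * PAGE_SIZE + offset
```

The offset within the page is unchanged by translation; only the page number is replaced by a physical frame number.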

THE BOOT and INIT PROCESS • Administrator initializes the system through a bootstrap sequence • On a UNIX system, the bootstrap sequence eventually reads the boot block (block 0) of a disk and loads it into memory • The program contained in the boot block loads the kernel from the file system (for example, /unix) • After the kernel is loaded into memory, the boot program transfers control to the start address of the kernel and the kernel starts running.

Boot process (cont’) • After initialization, the kernel mounts the root file system and handcrafts the environment for process 0. • Process 0 forks from within the kernel, creating process 1. • Process 1, running in kernel mode, creates its user-level context by allocating a data region and attaching it to its address space.

Boot process (cont’) • Process 1 copies code from kernel space to new regions which forms new user-context of process 1. • Process 1 sets up saved user registers contexts, "returns" from kernel mode and executes code just copied from kernel. • Process 1 is now a user-level process and the text code consists of a call to exec the /etc/init program.

The End
