
LINUX Kernel



  1. LINUX Kernel Chapter 3 Introduction to the Kernel 黃仁竑

  2. Processes and Tasks [Figure: Process 1, Process 2, and Process 3 as seen from outside; Task 1, Task 2, and Task 3 inside the system kernel with co-routines] • Processes • seen from outside: individual processes exist independently • Tasks • seen from inside: only one operating system is running ©黃仁竑/中正資工

  3. Process States [Figure: state diagram with the states Running, Interrupt routine, System call, Waiting, Return from system call, and Ready; transitions labelled Interrupt and Scheduler; the diagram is divided into User mode and System mode]

  4. Process States • Running • The task is active and running in non-privileged user mode. • If an interrupt or system call occurs, the processor switches to privileged system mode. • Interrupt routine • the hardware signals an exception condition • e.g., the clock generates a signal every 10 ms • System call • entered through a software interrupt

  5. Process States • Waiting • the process waits for an external event (e.g., I/O completion) • Return from system call • entered when a system call or interrupt completes • the scheduler switches the process to the ready state • Ready • the process is competing for the processor

  6. Important Data Structures • Task structure • task_struct in include/linux/sched.h • Also accessed by assembly code, so the order of the leading fields cannot be changed and no declarations can be added in front of them • states • TASK_RUNNING (0): ready or running • TASK_INTERRUPTIBLE (1), TASK_UNINTERRUPTIBLE (2): waiting for certain events. A task in TASK_UNINTERRUPTIBLE is not woken by signals; it waits until the event itself occurs. • TASK_ZOMBIE (3): the process has terminated but still has its task structure • TASK_STOPPED (4): the process has been halted • TASK_SWAPPING (5): not used.

  7. Task Structure struct task_struct { /* these are hardcoded - don't touch */ volatile long state; • volatile indicates that this value can be altered by interrupt routines long counter; long priority; • counter holds the time in ticks that the process can still run before a mandatory scheduling action is carried out. The scheduler uses counter as the dynamic priority. • priority holds the static priority of a process

  8. Task Structure unsigned long signal; unsigned long blocked; • signal contains a bit mask of the signals received for the process. It is evaluated in the routine ret_from_sys_call(), which is called after every system call and after slow interrupts. • blocked contains a bit mask of the signals to be blocked unsigned long flags; • flags contains the combination of the system status flags
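
The relationship between the signal and blocked bit masks can be sketched in user-space C. This is only an illustration, not kernel code: the helper names sig_mask() and deliverable() are hypothetical, and the bit layout (signal n stored in bit n-1) is an assumption taken from the expression sa=p->sig->action[sig-1] used later in this deck.

```c
#include <assert.h>

/* Hypothetical helper: signal number n (1..32) is assumed to live in bit n-1. */
static unsigned long sig_mask(int sig)
{
    assert(sig >= 1 && sig <= 32);
    return 1UL << (sig - 1);
}

/* Signals that have been received and are not blocked. */
static unsigned long deliverable(unsigned long signal, unsigned long blocked)
{
    return signal & ~blocked;
}
```

With signals 2 and 15 pending and signal 15 blocked, deliverable() leaves only the bit for signal 2.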

  9. Task Structure • Process flags:
     #define PF_ALIGNWARN  0x00000001 /* Print alignment warning msgs */ /* Not implemented yet, only for 486 */
     #define PF_PTRACED    0x00000010 /* set if ptrace (0) has been called. */
     #define PF_TRACESYS   0x00000020 /* tracing system calls */
     #define PF_FORKNOEXEC 0x00000040 /* forked but didn't exec */
     #define PF_SUPERPRIV  0x00000100 /* used super-user privileges */
     #define PF_DUMPCORE   0x00000200 /* dumped core */
     #define PF_SIGNALED   0x00000400 /* killed by a signal */
     #define PF_STARTING   0x00000002 /* being created */
     #define PF_EXITING    0x00000004 /* getting shut down */
     #define PF_USEDFPU    0x00100000 /* Process used the FPU this quantum (SMP only) */
     #define PF_DTRACE     0x00200000 /* delayed trace (used on m68k) */
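
Because each process flag occupies its own bit, flags are combined, tested, and cleared with bitwise operators. The following is a minimal user-space sketch; struct task_model and the three helper functions are hypothetical names introduced for illustration, and only the flag values come from the slide.

```c
#include <assert.h>
#include <stddef.h>

#define PF_STARTING  0x00000002UL  /* being created */
#define PF_PTRACED   0x00000010UL  /* ptrace has been called */
#define PF_SUPERPRIV 0x00000100UL  /* used super-user privileges */

/* Hypothetical stand-in for the flags field of task_struct. */
struct task_model { unsigned long flags; };

static int task_has(const struct task_model *t, unsigned long flag)
{
    assert(t != NULL);
    return (t->flags & flag) != 0;
}

static void task_set(struct task_model *t, unsigned long flag)
{
    t->flags |= flag;
}

static void task_clear(struct task_model *t, unsigned long flag)
{
    t->flags &= ~flag;
}
```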

  10. Task Structure int errno; int debugreg[8]; • errno holds the error code of the last failed system call. • debugreg contains the 80x86's debugging registers. struct exec_domain *exec_domain; • describes which UNIX is emulated for the process struct task_struct *next_task, *prev_task; • all processes are linked through these two pointers • init_task marks the start and end of this list struct task_struct *next_run, *prev_run; • list of processes that are competing for the processor

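  11. Task Structure [Figure: process family tree. The parent's p_cptr points to its youngest child; each child's p_pptr points to the parent; p_osptr and p_ysptr link the children to their older and younger siblings, from the youngest child to the oldest child] struct task_struct *p_opptr, *p_pptr, *p_cptr, *p_ysptr, *p_osptr; • pointers to the original parent process, the parent process, the youngest child, the younger sibling, and the older sibling, respectively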
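
The family pointers can be exercised with a small user-space model. This is a sketch under assumptions: struct task_model, add_child(), and count_children() are hypothetical names, and the linking order (a new child becomes the youngest child) is inferred from the field descriptions rather than copied from the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model holding only the family pointers of task_struct. */
struct task_model {
    struct task_model *p_pptr;   /* parent */
    struct task_model *p_cptr;   /* youngest child */
    struct task_model *p_ysptr;  /* younger sibling */
    struct task_model *p_osptr;  /* older sibling */
};

/* Link a newly created child; it becomes the parent's youngest child. */
static void add_child(struct task_model *parent, struct task_model *child)
{
    assert(parent != NULL && child != NULL);
    child->p_pptr = parent;
    child->p_ysptr = NULL;
    child->p_osptr = parent->p_cptr;
    if (parent->p_cptr != NULL)
        parent->p_cptr->p_ysptr = child;
    parent->p_cptr = child;
}

/* Walk from the youngest child towards the oldest one. */
static int count_children(const struct task_model *parent)
{
    const struct task_model *c;
    int n = 0;

    for (c = parent->p_cptr; c != NULL; c = c->p_osptr)
        n++;
    return n;
}
```

Starting at p_cptr and following p_osptr visits every child exactly once, which is why a single child pointer in the parent is sufficient.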

  12. Task Structure struct mm_struct *mm; • memory management information struct mm_struct { int count; pgd_t * pgd; unsigned long context; unsigned long start_code, end_code, start_data, end_data; unsigned long start_brk, brk, start_stack, start_mmap; unsigned long arg_start, arg_end, env_start, env_end; unsigned long rss, total_vm, locked_vm; unsigned long def_flags; struct vm_area_struct * mmap; struct vm_area_struct * mmap_avl; struct semaphore mmap_sem; };

  13. Virtual Memory

  14. Task Structure unsigned long kernel_stack_page; • stack used when the process is running in system mode unsigned long saved_kernel_stack; • saves the old stack pointer when running the MS-DOS emulator (vm86) int pid, pgrp, session, leader; • process ID, process group ID, the session the process belongs to, and whether it is the session leader unsigned short uid,euid,suid,fsuid; unsigned short gid,egid,sgid,fsgid; • user ID, effective user ID, saved user ID, file system user ID • group ID, effective group ID, saved group ID, file system group ID

  15. Task Structure • uid, euid, suid, gid, egid, sgid • Each process has a real user ID and group ID and an effective user ID and group ID. • The real ID identifies the person using the system. • The effective ID determines the process's access privileges. • execve() changes the effective user or group ID to the owner or group of the executed file if the file has the set-user-ID (suid) or set-group-ID (sgid) mode bit set. The real UID and GID are not affected. The effective user ID and effective group ID of the new process image are saved as the saved set-user-ID and saved set-group-ID, respectively, for use by setuid(3V). • Turn on suid: chmod u+s filename (chmod a+s turns on both suid and sgid)
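
The effect of execve() on the user IDs described above can be modelled in a few lines of user-space C. This is a simplified sketch, not the kernel's implementation: struct cred_model, struct file_model, and exec_model() are hypothetical, and group IDs are omitted because they follow the same pattern.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the user-ID fields of a process. */
struct cred_model { unsigned short uid, euid, suid; };

/* Hypothetical model of an executable file: its owner and set-user-ID bit. */
struct file_model { unsigned short owner; int setuid_bit; };

/* Apply the rules from the slide when the process executes the file. */
static void exec_model(struct cred_model *c, const struct file_model *f)
{
    assert(c != NULL && f != NULL);
    if (f->setuid_bit)
        c->euid = f->owner;  /* effective ID becomes the file owner */
    c->suid = c->euid;       /* effective ID is kept as the saved ID */
    /* the real uid is deliberately left unchanged */
}
```

A user with UID 1000 who runs a set-user-ID program owned by UID 0 keeps uid 1000 but gets euid 0 and suid 0 in this model.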

  16. Task Structure • uid and gid are inherited from the parent process • euid, egid, fsuid, and fsgid can be set at run time (for example, to the owner of the executable file) int groups[NGROUPS]; • A process may be assigned to many groups struct fs_struct *fs; • file system information struct fs_struct { int count; /* for future expansions */ unsigned short umask; /* access mode */ struct inode * root, * pwd; /* root dir and current dir */ };

  17. Task Structure struct files_struct *files; • open file information (file descriptors) struct files_struct { /* open file table structure */ int count; fd_set close_on_exec; /* files to be closed when exec is issued */ fd_set open_fds; /* open files (bitmask) */ struct file * fd[NR_OPEN]; };

  18. Task Structure long utime, stime, cutime, cstime, start_time; • time spent in user mode, time spent in system mode, total time the child processes spent in user mode and in system mode, and the time at which the process was created, respectively. unsigned long it_real_value, it_prof_value, it_virt_value; unsigned long it_real_incr, it_prof_incr, it_virt_incr; struct timer_list real_timer; • timers, including the timer for the alarm system call (SIGALRM) • the it_*_value fields hold the time in ticks until the timer is triggered, the it_*_incr fields hold the values for re-initialization, and real_timer is the real-time interval timer.

  19. Task Structure struct sem_undo *semundo; • semaphores that must be released when the process terminates struct sem_queue *semsleeping; • semaphore waiting queue struct wait_queue *wait_chldexit; • When a process calls wait4(), it sleeps on this queue until a child process terminates. struct rlimit rlim[RLIM_NLIMITS]; • limits on the use of resources (setrlimit(), getrlimit())

  20. Task Structure struct signal_struct *sig; struct signal_struct { int count; struct sigaction action[32]; }; • Signal handlers int exit_code, exit_signal; • return code and the signal that caused the program to abort char comm[16]; • name of the program executed by the process

  21. Task Structure unsigned long personality; • description of the characteristics of this version of UNIX (see also exec_domain) int dumpable:1; • whether a memory dump is to be written int did_exec:1; • whether the process is still running the old program (no execve, …) struct desc_struct *ldt; • used by WINE, the Windows emulator

  22. Task Structure struct linux_binfmt *binfmt; • functions responsible for loading the program struct thread_struct tss; • holds all the data on the processor status at the time of the last transition from user mode to system mode; all registers are saved here. • struct thread_struct can be found in asm-i386/processor.h and, among other definitions, includes 8086-related information: struct vm86_struct * vm86_info; unsigned long screen_bitmap; unsigned long v86flags, v86mask, v86mode;

  23. Task Structure unsigned long policy, rt_priority; • Scheduling policies: classic (SCHED_OTHER), real-time (SCHED_RR, SCHED_FIFO) • rt_priority: real-time priority #ifdef __SMP__ int processor; int last_processor; int lock_depth; #endif • On a multi-processor machine the kernel needs to know, among other things, on which processor the task is running.

  24. Process Table struct task_struct init_task; • marks the start of the doubly linked task list struct task_struct *task[NR_TASKS]; • task table #define current (0+current_set[smp_processor_id()]) struct task_struct *current_set[NR_CPUS]; • current process (for multi-processor architectures) #define for_each_task(p) \ for (p = &init_task ; (p = p->next_task) != &init_task ; ) • macro for iterating over all processes • the first task (init_task) is skipped
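
The behaviour of for_each_task, including the skipped first entry, can be checked with a user-space model of the circular list. The sketch below is illustrative only: struct task_model, for_each_task_model, init_task_list(), link_task(), and count_tasks() are hypothetical names, and the macro takes the list head as a parameter instead of referring to a global init_task.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model holding only the list pointers of task_struct. */
struct task_model {
    int pid;
    struct task_model *next_task, *prev_task;
};

/* Same loop shape as for_each_task: advance first, then compare with the head. */
#define for_each_task_model(p, head) \
    for ((p) = (head); ((p) = (p)->next_task) != (head); )

/* An empty circular list: the head points to itself. */
static void init_task_list(struct task_model *head)
{
    head->next_task = head->prev_task = head;
}

/* Insert t at the tail, i.e. just before the head. */
static void link_task(struct task_model *head, struct task_model *t)
{
    assert(head != NULL && t != NULL);
    t->next_task = head;
    t->prev_task = head->prev_task;
    head->prev_task->next_task = t;
    head->prev_task = t;
}

/* Count the tasks visited by the macro; the head itself is never visited. */
static int count_tasks(struct task_model *head)
{
    struct task_model *p;
    int n = 0;

    for_each_task_model(p, head)
        n++;
    return n;
}
```

Because the loop advances before it tests, a list that contains only the head yields zero iterations.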

  25. Files and inodes • Two important structures: file, inode (linux/fs.h) • The file structure (process's view) struct file { mode_t f_mode; • access mode when opened (RO, RW, WO) loff_t f_pos; • position of the read/write pointer (64-bit) unsigned short f_flags; • additional flags for controlling access rights (fcntl)

  26. Files and inodes

  27. Files and inodes unsigned short f_count; • reference count (dup, dup2, fork) struct file *f_next, *f_prev; • doubly linked list • global variable: struct file *first_file; struct inode * f_inode; • actual description of the file struct file_operations * f_op; • refers to a structure of function pointers for the file operations, i.e., the functions are not called directly. • Since LINUX supports many file systems, a Virtual File System (VFS) is implemented.
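
The indirection through f_op can be illustrated with a small user-space model of a function-pointer table. Everything here is hypothetical (struct file_model, struct file_ops_model, zero_read(), read_model()); the real struct file_operations has many more members and different signatures.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct file_model;

/* Hypothetical operations table with a single entry. */
struct file_ops_model {
    long (*read)(struct file_model *f, char *buf, unsigned long n);
};

/* Hypothetical model of an open file: position plus operations table. */
struct file_model {
    long f_pos;
    const struct file_ops_model *f_op;
};

/* One possible implementation: a file that reads as all zero bytes. */
static long zero_read(struct file_model *f, char *buf, unsigned long n)
{
    memset(buf, 0, n);
    f->f_pos += (long)n;
    return (long)n;
}

static const struct file_ops_model zero_ops = { zero_read };

/* Generic layer: calls through the table and never names zero_read. */
static long read_model(struct file_model *f, char *buf, unsigned long n)
{
    assert(f != NULL && buf != NULL);
    if (f->f_op == NULL || f->f_op->read == NULL)
        return -1;  /* operation not supported by this file */
    return f->f_op->read(f, buf, n);
}
```

The generic layer stays the same for every file system; only the table attached to the file changes.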

  28. Files and inodes struct inode { kdev_t i_dev; /* which device the file is on */ unsigned long i_ino; /* position on the device */ umode_t i_mode; nlink_t i_nlink; uid_t i_uid; /* owner user id */ gid_t i_gid; /* owner group id */ off_t i_size; /* size in bytes */ time_t i_atime; /* time of last access */ time_t i_mtime; /* time of last modification */ time_t i_ctime; /* time of last modification to inode*/

  29. Memory Management • Macros #define __get_free_page(priority) __get_free_pages((priority),0,0) #define __get_dma_pages(priority, order) __get_free_pages((priority),(order),1) extern unsigned long __get_free_pages(int priority, unsigned long gfporder, int dma); • defined in linux/mm.h, page size is 4 KB • priority: GFP_BUFFER, GFP_ATOMIC, GFP_KERNEL, GFP_NOBUFFER, GFP_NFS (determines what to do if not enough pages are free) • order: the number of pages to be reserved, given as a power of 2 (2^order pages) • dma: whether the pages must be addressable by the DMA component
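
Since the order argument is an exponent rather than a page count, a short calculation helps. The helpers below are a user-space sketch with hypothetical names (MODEL_PAGE_SIZE, pages_for_order(), order_for_bytes()); they only do the arithmetic and allocate nothing. The 4 KB page size is the value stated on the slide.

```c
#include <assert.h>

#define MODEL_PAGE_SIZE 4096UL  /* 4 KB pages, as stated on the slide */
#define MODEL_MAX_ORDER 10UL    /* arbitrary bound for this sketch only */

/* Number of pages reserved for a given order: 2^order. */
static unsigned long pages_for_order(unsigned long order)
{
    assert(order <= MODEL_MAX_ORDER);
    return 1UL << order;
}

/* Smallest order whose block is large enough to hold the given byte count. */
static unsigned long order_for_bytes(unsigned long bytes)
{
    unsigned long order = 0;

    assert(bytes <= (MODEL_PAGE_SIZE << MODEL_MAX_ORDER));
    while ((MODEL_PAGE_SIZE << order) < bytes)
        order++;
    return order;
}
```

For example, a 20000-byte buffer needs order 3 in this model, because 4 pages (16384 bytes) are too small and 8 pages (32768 bytes) are enough.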

  30. Memory Management • Functions extern inline unsigned long get_free_page(int priority) { unsigned long page; page = __get_free_page(priority); if (page) memset((void *) page, 0, PAGE_SIZE); return page; } • Unlike __get_free_page(), this function clears (zero-fills) the page before returning it

  31. Memory Management • Functions void *kmalloc(size_t size, int priority) void kfree(void *__ptr) • the kernel's counterparts of malloc() and free()

  32. Waiting Queues • Structures for waiting queues struct wait_queue { struct task_struct * task; struct wait_queue * next; }; • include/linux/wait.h • a process waits in the queue until a condition is met • Functions (sched.h) • extern inline void add_wait_queue(struct wait_queue ** p, struct wait_queue * wait) • extern inline void remove_wait_queue(struct wait_queue ** p, struct wait_queue * wait)

  33. Waiting Queues • Functions void sleep_on(struct wait_queue ** p); void interruptible_sleep_on(struct wait_queue ** p); void wake_up(struct wait_queue ** p); void wake_up_interruptible(struct wait_queue ** p); • kernel/sched.c • sleep_on sets the process state to TASK_UNINTERRUPTIBLE; interruptible_sleep_on sets it to TASK_INTERRUPTIBLE • wake_up sets the process state to TASK_RUNNING
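
The state changes performed by the sleep and wake-up functions can be followed in a user-space model. This sketch makes several simplifying assumptions: the queue is a plain NULL-terminated list, no scheduling takes place, and the names struct task_model, struct wait_queue_model, sleep_on_model(), and wake_up_model() are hypothetical. It also assumes that the interruptible wake-up variant only affects tasks in TASK_INTERRUPTIBLE.

```c
#include <assert.h>
#include <stddef.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1, TASK_UNINTERRUPTIBLE = 2 };

struct task_model { int state; };

struct wait_queue_model {
    struct task_model *task;
    struct wait_queue_model *next;
};

/* Push an entry onto the front of the queue. */
static void add_wait_queue_model(struct wait_queue_model **p,
                                 struct wait_queue_model *wait)
{
    wait->next = *p;
    *p = wait;
}

/* Bookkeeping half of sleeping: mark the task and queue it (no scheduling here). */
static void sleep_on_model(struct wait_queue_model **p,
                           struct wait_queue_model *wait, int state)
{
    assert(state == TASK_INTERRUPTIBLE || state == TASK_UNINTERRUPTIBLE);
    wait->task->state = state;
    add_wait_queue_model(p, wait);
}

/* Make queued sleepers runnable; optionally only the interruptible ones. */
static void wake_up_model(struct wait_queue_model **p, int interruptible_only)
{
    struct wait_queue_model *w;

    for (w = *p; w != NULL; w = w->next) {
        if (w->task->state == TASK_INTERRUPTIBLE ||
            (!interruptible_only && w->task->state == TASK_UNINTERRUPTIBLE))
            w->task->state = TASK_RUNNING;
    }
}
```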

  34. Semaphores • Structure for semaphores struct semaphore { int count; int waiting; struct wait_queue * wait; }; • asm-i386/semaphore.h • Functions extern inline void down(struct semaphore * sem) extern inline void up(struct semaphore * sem)

  35. System Time and Timers • Time is measured in ticks (10 ms) • The global variable jiffies holds the time in ticks since the system was booted • Structure for timers (old) struct timer_struct { unsigned long expires; void (*fn)(void); }; extern struct timer_struct timer_table[32]; extern unsigned long timer_active; /* which entry is valid? */

  36. System Time and Timers • Structure for timers (new) struct timer_list { struct timer_list *next; struct timer_list *prev; unsigned long expires; unsigned long data; /* arguments */ void (*function)(unsigned long); }; extern void add_timer(struct timer_list * timer); extern int del_timer(struct timer_list * timer);
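
How a timer_list entry is used can be sketched with a user-space model: a timer carries an expiry time, an argument, and a callback that receives the argument. The code below is an illustration under assumptions, not the kernel's implementation: it keeps the list sorted by expires, and the names struct timer_model, timer_head, add_timer_model(), run_timers_model(), record_fired(), and fired_total are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model with the same fields as struct timer_list. */
struct timer_model {
    struct timer_model *next, *prev;
    unsigned long expires;
    unsigned long data;                 /* argument passed to function */
    void (*function)(unsigned long);
};

/* Sentinel head of a circular list; it never fires. */
static struct timer_model timer_head = { &timer_head, &timer_head, ~0UL, 0, NULL };

static unsigned long fired_total;       /* updated by the sample callback */

/* Sample callback: accumulates the data argument of each fired timer. */
static void record_fired(unsigned long data)
{
    fired_total += data;
}

/* Insert the timer so that the list stays sorted by expiry time. */
static void add_timer_model(struct timer_model *t)
{
    struct timer_model *p = timer_head.next;

    assert(t != NULL && t->function != NULL);
    while (p != &timer_head && p->expires <= t->expires)
        p = p->next;
    t->next = p;
    t->prev = p->prev;
    p->prev->next = t;
    p->prev = t;
}

/* Unlink and run every timer whose expiry time has been reached. */
static void run_timers_model(unsigned long now)
{
    while (timer_head.next != &timer_head && timer_head.next->expires <= now) {
        struct timer_model *t = timer_head.next;

        timer_head.next = t->next;
        t->next->prev = &timer_head;
        t->next = t->prev = NULL;
        t->function(t->data);
    }
}
```

In this model the now argument plays the role of jiffies: a caller sets expires to the current tick count plus the desired delay.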

  37. Process Management • Signal • Interrupt • Booting • Timer • Scheduler

  38. Signal • Signals ()
     SIGHUP     1   hangup
     SIGINT     2   interrupt
     SIGQUIT    3   quit
     SIGILL     4   illegal instruction
     SIGTRAP    5   trace trap
     SIGABRT    6   abort (generated by abort(3) routine)
     SIGIOT     6   Input/Output Trap (obsolete)
     SIGBUS     7   bus error
     SIGFPE     8   arithmetic exception
     SIGKILL    9   kill (cannot be caught, blocked, or ignored)
     SIGUSR1    10  user-defined signal 1

  39. Signal
     SIGSEGV    11  segmentation violation
     SIGUSR2    12  user-defined signal 2
     SIGPIPE    13  write on a pipe or other socket with no one to read it
     SIGALRM    14  alarm clock
     SIGTERM    15  software termination signal
     SIGSTKFLT  16
     SIGCHLD    17  child status has changed
     SIGCONT    18  continue after stop
     SIGSTOP    19  stop (cannot be caught, blocked, or ignored)
     SIGTSTP    20  stop signal generated from keyboard
     SIGTTIN    21  background read attempted from control terminal

  40. Signal
     SIGTTOU    22  background write attempted to control terminal
     SIGURG     23  urgent condition present on socket
     SIGXCPU    24  cpu time limit exceeded (see getrlimit(2))
     SIGXFSZ    25  file size limit exceeded (see getrlimit(2))
     SIGVTALRM  26  virtual time alarm (see getitimer(2))
     SIGPROF    27  profiling timer alarm (see getitimer(2))
     SIGWINCH   28  window changed (see termio(4) and win(4S))
     SIGIO      29  I/O is possible on a descriptor (see fcntl(2V))
     SIGPOLL    29  SIGIO
     SIGPWR     30  Power Failure (for UPS)
     SIGUNUSED  31

  41. Signal System Calls • Important system calls • kill(int pid, int sig) • sends the signal sig to a process or a group of processes • If pid is greater than zero, the signal is sent to the process with the PID pid. • If pid is zero, the signal is sent to the process group of the current process. • If pid is -1, the signal is sent to all processes except the system processes and the current process. • If pid is less than -1, the signal is sent to all processes of the process group -pid.
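
The four cases for the pid argument can be written as a small decision function. This is a user-space sketch; enum kill_target and classify_kill_pid() are hypothetical names used only to mirror the rules listed above.

```c
#include <assert.h>

/* Hypothetical labels for the four cases of the pid argument. */
enum kill_target {
    KILL_ONE_PROCESS,     /* pid > 0:  the process with that PID */
    KILL_OWN_GROUP,       /* pid == 0: the caller's process group */
    KILL_ALL_PROCESSES,   /* pid == -1: all processes, with the stated exceptions */
    KILL_GIVEN_GROUP      /* pid < -1: the process group -pid */
};

static enum kill_target classify_kill_pid(int pid)
{
    if (pid > 0)
        return KILL_ONE_PROCESS;
    if (pid == 0)
        return KILL_OWN_GROUP;
    if (pid == -1)
        return KILL_ALL_PROCESSES;
    return KILL_GIVEN_GROUP;
}
```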

  42. Signal System Calls • Important system calls • kill(int pid, int sig) • The real or effective user ID of the sending process must match the real or saved set-user ID of the receiving process, unless the effective user ID of the sending process is that of the super-user. • The single exception is the signal SIGCONT, which only requires that the sending and receiving processes belong to the same session. • Errors: • EINVAL: invalid sig • ESRCH: process or process group does not exist • EPERM: no privileges
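
The permission rule can be expressed as a predicate. The sketch below is a user-space model under assumptions: struct sender_model and may_signal() are hypothetical names, the super-user is assumed to have effective user ID 0, and SIGCONT is given the number 18 from the signal table in this deck.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SIGCONT 18  /* number taken from the signal table */

/* Hypothetical model of the fields consulted by the permission check. */
struct sender_model {
    unsigned short uid, euid, suid;
    int session;
};

/* Returns 1 if 'from' may send signal sig to 'to', otherwise 0. */
static int may_signal(const struct sender_model *from,
                      const struct sender_model *to, int sig)
{
    assert(from != NULL && to != NULL);
    if (from->euid == 0)
        return 1;  /* super-user (assumed to be euid 0) */
    if (sig == MODEL_SIGCONT && from->session == to->session)
        return 1;  /* SIGCONT within the same session */
    return from->uid == to->uid || from->uid == to->suid ||
           from->euid == to->uid || from->euid == to->suid;
}
```

A caller that gets 0 from this predicate corresponds to the EPERM case.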

  43. Signal System Calls • Important system calls • kill(int pid, int sig) • Implementation • linux/kernel/exit.c • sys_kill() -> send_sig(), kill_pg(), kill_proc() -> generate() • see also force_sig(), kill_sl() • also called from ret_from_sys_call() -> do_signal() -> send_sig() -> handle_signal() (signal.c, 223) -> setup_frame() (160) -> regs->eip = sa->sa_handler (213)

  44. sys_kill • Linux/kernel/exit.c, line 318-339 • 322-323: If pid is zero, the signal is sent to the process group of the current process. • 324-334: If pid is -1, the signal is sent to all processes except the system processes (PID=0 or 1) and the current process. The "for_each_task" macro is defined in include/linux/sched.h, line 491. If count is zero, the error code ESRCH is returned. • 335-336: If pid is less than -1, the signal is sent to all processes of the process group -pid. • 338: If pid is greater than zero, the signal is sent to the process with the PID pid.

  45. kill_pg • Linux/kernel/exit.c, line 258-275. • 264-265: sig must be in [0..32] (sig 0 is accepted and later ignored by send_sig), and pgrp (the process group ID) must be greater than zero • 266-273: for each process whose process group ID is pgrp, send the signal sig to it (send_sig). On success, send_sig returns zero. • 274: if found=0, no process has been found and the error ESRCH is returned; otherwise zero is returned.

  46. kill_proc • Linux/kernel/exit.c, line 301-312 • 305-306: sig must be in [0..32] (sig 0 is accepted and later ignored by send_sig). • 307-310: if a process with the PID pid is found, send the signal sig to it (send_sig) • 311: if no process has been found, return the error ESRCH

  47. send_sig • Linux/kernel/exit.c, line 73-101 • 75-76: p must not be NULL and sig must be less than or equal to 32 • 77: priv is the privilege (0 for a normal process, 1 for the super-user); SIGCONT can only be sent to a process that belongs to the same session • 78-79: The real or effective user ID of the sending process must match the real or saved set-user ID of the receiving process, unless the effective user ID of the sending process is that of the super-user. • 80: is the sender the super-user? • 81: If none of the above conditions is true, return an error

  48. send_sig • 82-83: if sig=0, do nothing • 84-88: if the sig pointer in the task structure is NULL (the process is in the zombie state), do nothing • 89-95: if sig is SIGKILL or SIGCONT and the process is in the state TASK_STOPPED, wake up the process; any pending SIGSTOP, SIGTSTP, SIGTTIN, and SIGTTOU signals are cleared. • 96-97: if sig is SIGSTOP, SIGTSTP, SIGTTIN, or SIGTTOU, clear any pending SIGCONT. • 99: actually generate the signal
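
The mutual cancellation of stop signals and SIGCONT described for lines 89-97 can be modelled on a pending-signal bit mask. This is a user-space sketch under assumptions: the signal numbers come from the table in this deck, signal n is assumed to occupy bit n-1, waking a stopped task is modelled by setting its state to TASK_RUNNING, and struct task_model, sig_mask(), and adjust_pending() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

enum { TASK_RUNNING = 0, TASK_STOPPED = 4 };
enum { M_SIGKILL = 9, M_SIGCONT = 18, M_SIGSTOP = 19,
       M_SIGTSTP = 20, M_SIGTTIN = 21, M_SIGTTOU = 22 };

struct task_model {
    int state;
    unsigned long signal;  /* pending signals, one bit each */
};

static unsigned long sig_mask(int sig)
{
    assert(sig >= 1 && sig <= 32);
    return 1UL << (sig - 1);
}

/* Apply the stop/continue bookkeeping before the signal itself is generated. */
static void adjust_pending(struct task_model *p, int sig)
{
    assert(p != NULL);
    if (sig == M_SIGKILL || sig == M_SIGCONT) {
        if (p->state == TASK_STOPPED)
            p->state = TASK_RUNNING;  /* models waking the process */
        p->signal &= ~(sig_mask(M_SIGSTOP) | sig_mask(M_SIGTSTP) |
                       sig_mask(M_SIGTTIN) | sig_mask(M_SIGTTOU));
    }
    if (sig == M_SIGSTOP || sig == M_SIGTSTP ||
        sig == M_SIGTTIN || sig == M_SIGTTOU)
        p->signal &= ~sig_mask(M_SIGCONT);
}
```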

  49. generate • Linux/kernel/exit.c, line 29-51 • 31: set up the signal mask • 32: action of the signal, sa=p->sig->action[sig-1] • 39: if the signal is not blocked and the process is not traced • 41: and if the handler of the signal is SIG_IGN (to be ignored) and the signal is not SIGCHLD (state change of a child process) • 42: then return immediately. • 44-46: if the handler is SIG_DFL (default action) and the signal is SIGCONT, SIGCHLD, SIGWINCH, or SIGURG, then return immediately. (The wake-up has already been done for SIGCONT.)

  50. generate • 48: finally, set the signal • 49-50: if the receiving process is in the interruptible state and the signal is not blocked, wake up the process.
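
The decision logic walked through for generate can be collected into one user-space function. This is a sketch that follows the slide's description, not the kernel source: handlers are reduced to a three-valued enum, tracing to a single field, waking up to a state change, and the names struct task_model, handler_kind, and generate_model() are hypothetical. Signal numbers are taken from the table in this deck.

```c
#include <assert.h>
#include <stddef.h>

enum { TASK_RUNNING = 0, TASK_INTERRUPTIBLE = 1 };
enum { M_SIGCHLD = 17, M_SIGCONT = 18, M_SIGURG = 23, M_SIGWINCH = 28 };

typedef enum { H_DEFAULT = 0, H_IGNORE, H_CUSTOM } handler_kind;

struct task_model {
    int state;
    unsigned long signal, blocked;
    int traced;
    handler_kind action[32];  /* action[sig-1], as on the slide */
};

/* Returns 1 if the signal was recorded as pending, 0 if it was dropped. */
static int generate_model(struct task_model *p, int sig)
{
    unsigned long mask;
    handler_kind h;

    assert(p != NULL && sig >= 1 && sig <= 32);
    mask = 1UL << (sig - 1);
    h = p->action[sig - 1];

    if (!(mask & p->blocked) && !p->traced) {
        if (h == H_IGNORE && sig != M_SIGCHLD)
            return 0;  /* explicitly ignored */
        if (h == H_DEFAULT && (sig == M_SIGCONT || sig == M_SIGCHLD ||
                               sig == M_SIGWINCH || sig == M_SIGURG))
            return 0;  /* ignored by default */
    }
    p->signal |= mask;
    if (p->state == TASK_INTERRUPTIBLE && (p->signal & ~p->blocked))
        p->state = TASK_RUNNING;  /* models waking the process */
    return 1;
}
```

Note that a blocked signal is never dropped early in this model: it stays pending so that it can be handled once it is unblocked.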
