
CPS 310 Unix Broadly Defined



  1. CPS 310: Unix Broadly Defined. Jeff Chase, Duke University. http://www.cs.duke.edu/~chase/cps310

  2. The story so far: process and kernel
     • A (classical) OS lets us run programs as processes. A process is a running program instance (with a thread).
       • Program code runs with the core in untrusted user mode.
     • Processes are protected/isolated.
       • Virtual address space is a “fenced pasture”.
       • Sandbox: can’t get out. Lockbox: nobody else can get in.
     • The OS kernel controls everything.
       • Kernel code runs with the core in trusted kernel mode.
     • The OS offers system call APIs to:
       • Create processes
       • Control processes
       • Monitor process execution

  3. Processes and the kernel
     Programs run as independent processes. Each process has a private virtual address space and one thread.
     [Figure: processes, each with its own data, sit above the kernel; protected system calls go down into the kernel, and upcalls (e.g., signals) come back up.]
     The protected OS kernel mediates access to shared resources. Threads enter the kernel for OS services. The kernel is a separate component/context with enforced modularity. The kernel syscall interface supports processes, files, pipes, and signals.

  4. Unix fork/exec/exit/wait syscalls
     • int pid = fork();
       • Create a new process that is a clone of its parent.
     • exec*("program" [, argvp, envp]);
       • Overlay the calling process with a new program, and transfer control to it, passing arguments and environment.
     • exit(status);
       • Exit with status, destroying the process.
     • int pid = wait*(&status);
       • Wait for exit (or other status change) of a child, and “reap” its exit status.
       • Recommended: use waitpid().
     [Figure: timeline in which the parent forks a child, the parent program initializes the child context, the child execs and later exits, and the parent waits.]

  5. Unix fork/exit syscalls
     • int pid = fork();
       • Create a new process that is a clone of its parent, return the new process ID (pid) to the parent, and return 0 to the child.
     • exit(status);
       • Exit with status, destroying the process. Note: this is not the only way for a process to exit!
     [Figure: timeline in which parent pid 5587 forks child pid 5588; each has its own copy of the data, and each eventually exits.]

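  6. fork
     The fork syscall returns twice: it returns zero in the context of the new child process, and it returns the new child process ID (pid) in the context of the parent. If fork fails, it returns -1 and no child is created.
      int pid;
      int status = 0;
      if ((pid = fork()) < 0) {
          perror("fork");   /* no child was created */
          exit(1);
      } else if (pid > 0) {
          /* parent */
          …..
      } else {
          /* child */
          …..
          exit(status);
      }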

  7. exit syscall

  8. fork (original concept)

  9. fork in action today
     Fork is conceptually difficult but syntactically clean and simple. I don’t have to say anything about what the new child process “looks like”: it is an exact clone of the parent! The child has a new thread executing at the same point in the same program. The child is a new instance of the running program: it has a “copy” of the entire address space. The “only” change is the process ID and return code rc! The parent thread continues on its way. The child thread continues on its way.
      void dofork() {
          int rc = fork();
          if (rc < 0) {
              perror("fork failed");
              exit(1);
          } else if (rc == 0) {
              child();
          } else {
              parent(rc);
          }
      }

  10. A simple program: forkdeep
      int count = 0;
      int level = 0;

      void child() {
          level++;
          /* output pids */
          if (level < count) dofork();
          if (level == count) sleep(3);
      }

      void parent(int childpid) {
          /* output pids */
          /* wait for child to finish */
      }

      int main(int argc, char *argv[]) {
          count = atoi(argv[1]);
          dofork();
          /* output pid */
      }
     How?

  11. chase$ ./forkdeep 4
      30866-> 30867
      30867
      30867-> 30868
      30868
      30868-> 30869
      30869
      30869-> 30870
      30870
      30870
      30869
      30868
      30867
      30866
      chase$

  12. wait
     Note: in modern systems the wait syscall has many variants and options.

  13. wait
     The parent uses wait to sleep until the child exits; wait returns the child’s pid and status. Wait variants allow waiting on a specific child, or notification of stops and other “signals”.
      int pid;
      int status = 0;
      if ((pid = fork()) < 0) {
          perror("fork");
          exit(1);
      } else if (pid > 0) {
          /* parent */
          …..
          pid = wait(&status);
      } else {
          /* child */
          …..
          exit(status);
      }

  14. Thread states and transitions
     We will presume that these transitions occur only in kernel mode. This is true in classical Unix and in systems with pure kernel-based threads. Before a thread can sleep, it must first enter the kernel via trap (syscall) or fault. Before a thread can yield, it must enter the kernel, or the core must take an interrupt to return control to the kernel.
     [Figure: thread states running, ready, and blocked (plus STOP/wait). Transitions: dispatch (ready to running), yield and preempt (running to ready), sleep (running to blocked), wakeup (blocked to ready).]
     On entry to the running state, kernel code decides if/when/how to enter user mode, and sets up a suitable context.

  15. Kernel Stacks and Trap/Fault Handling
     Threads execute user code on a user stack in the user virtual memory in the process virtual address space. Each thread has a second kernel stack in kernel space (VM accessible only in kernel mode). System calls and faults run in kernel mode on the kernel stack. Kernel code running in P’s process context has access to P’s virtual memory.
     The syscall handler makes an indirect call through the system call dispatch table to the handler registered for the specific system call.
     [Figure: per-thread user stacks in each process’s virtual memory, per-thread kernel stacks in kernel space, and the syscall dispatch table.]

  16. Running a program
     [Figure: a program file’s sections (code/“text”, constants, initialized data) become segments in the process’s virtual memory; the process has a thread; in Unix the process is set up with fork/exec.]
     When a program launches, the OS creates a process to run it, with a main thread to execute the code, and a virtual memory to store the running program’s code and data.

  17. But how do I run a new program?
     • The child, or any process really, can replace its program in midstream.
     • exec* system call: “forget everything in my address space and reinitialize my entire address space with stuff from a named program file.”
     • A successful exec system call never returns: the new program executes in the calling process until it dies (exits). Exec returns only if it fails.
     • The code from the parent program runs in the child process and controls its future. The parent program selects the program that the child process will run (via exec), and sets up its connections to the outside world. The child program doesn’t even know!

  18. exec (original concept)

  19. A simple program: forkexec
      …
      int main(int argc, char *argv[]) {
          int status;
          int rc = fork();
          if (rc < 0) {
              perror("fork failed");
              exit(1);
          } else if (rc == 0) {
              argv++;
              execve(argv[0], argv, 0);
              perror("exec failed");   /* reached only if execve fails */
              exit(1);
          } else {
              waitpid(rc, &status, 0);
              printf("child %d exited with status %d\n", rc, WEXITSTATUS(status));
          }
      }

  20. A simple program: prog0
      int main() { }

      chase$ cc -o forkexec forkexec.c
      chase$ cc -o prog0 prog0.c
      chase$ ./forkexec prog0
      child 19175 exited with status 0
      chase$

  21. Unix fork/exec/exit/wait syscalls
     • int pid = fork();
       • Create a new process that is a clone of its parent.
     • exec*("program" [, argvp, envp]);
       • Overlay the calling process with a new program, and transfer control to it, passing arguments and environment.
     • exit(status);
       • Exit with status, destroying the process.
     • int pid = wait*(&status);
       • Wait for exit (or other status change) of a child, and “reap” its exit status.
       • Recommended: use waitpid().
     [Figure: timeline in which the parent forks a child, the parent program initializes the child process context, the child execs and later exits, and the parent waits.]

  22. Mode Changes for Fork/Exit
     • Transition from user to kernel mode: callsys. Transition from kernel to user mode: retsys.
     • Syscall traps and “returns” are not always paired.
       • fork “returns” (to the child) from a trap that “never happened”.
       • The exit system call trap never returns.
       • The system may switch processes between trap and return.
     • Exec enters the child by doctoring up a saved user context to “return” through.
     [Figure: parent and child timelines showing the fork call and return, the child’s fork entry to user space, the child’s exit call, and the parent’s wait call and return.]

  23. But how is the first process made?

  24. Init and Descendants
     The kernel “handcrafts” the initial process to run the “init” program. Other processes descend from init, including one instance of the login program for each terminal. Login runs the user shell in a child process after the user authenticates. The user shell runs user commands as child processes.

  25. Processes: A Closer Look
     [Figure: a process descriptor (PCB) holding the user ID, process ID, parent PID, sibling links, children, and resources, plus a thread whose stack lives in the virtual address space.]
     Each process has a thread bound to the VAS. The thread has a stack addressable through the VAS. The kernel can suspend/restart the thread wherever and whenever it wants.
     The OS maintains some state for each process in the kernel’s internal data structures: a file descriptor table, links to maintain the process tree, and a place to store the exit status.
     The address space is a private name space for a set of memory segments used by the process. The kernel must initialize the process memory for the program to run.

  26. The Shell
     • Users may select from a range of command interpreter (“shell”) programs available. (Or even write their own!)
       • csh, sh, ksh, tcsh, bash: choose your flavor…
     • Shells execute commands composed of program filenames, args, and I/O redirection symbols.
       • Fork, exec, wait, etc., etc.
       • Can coordinate multiple child processes that run together as a process group or job.
       • Shells can run files of commands (scripts) for more complex tasks, e.g., by redirecting I/O channels (descriptors).
       • Shell behavior is guided by environment variables, e.g., $PATH.
     • Parent may control/monitor all aspects of child execution.

  27. Unix process: parents rule
     [Figure: a process with its thread, virtual address space (text, data, heap), program, environment, and kernel state, annotated as follows.]
     • Process: created with fork by the parent program running in the parent process.
     • Virtual address space (virtual memory, VM): a clone of the parent’s VM.
     • Program: the parent program running in the child process, or an exec’d program chosen by the parent program.
     • Environment (argv[] and envp[]): configured by the parent program on exec.
     • Kernel state: inherited from the parent process, or modified by the parent program in the child.

  28. Exec setup (ABI)

  29. Unix: upcalls
     • The kernel directs control flow into the user process at a fixed entry point: e.g., the entry for exec() is _crt0 or “main”.
     • A process may also register signal handlers for events relating to the process, (generally) signalled by the kernel.
     • To deliver a signal, the kernel munges/redirects the thread context to execute the signal handler in user mode.
     • A process lives until it exits voluntarily or fails: it “receives an unhandled signal that is fatal by default”.
     [Figure: processes above the kernel; protected system calls go down, and upcalls (e.g., signals) come up.]

  30. Unix: signals
     • A signal is a typed upcall event delivered to a process by the kernel.
       • Process P may use the kill* system call to request signal delivery to Q.
     • A process may register signal handlers for signal types. A signal handler is a procedure that is invoked when a signal of a given type is delivered. It runs in user mode, then returns to normal control flow.
     • A signal delivered to a registered handler is said to be caught. If there is no registered handler, then the signal triggers a default action (e.g., “die”).
     • A process lives until it exits voluntarily or receives a fatal signal.
     [Figure: protected system calls go down into the kernel; upcalls (signals) come up. The kernel sends signals, e.g., to notify processes of faults.]

  31. Unix process view: data
     [Figure: a process (thread plus program) with I/O channels (“file descriptors”: stdin, stdout, and stderr to a tty, a pipe, a socket, etc.) and VM mappings (segments: text, data, anon VM, etc.); files back the channels and segments. Labels: uid.]

  32. Unix I/O and IPC
     • I/O objects / kernel abstractions / channel types:
       • Terminal: user interaction via “teletypewriter”: tty
       • Pipe: Inter Process Communication (IPC)
       • Socket: networking
       • File: storage
     • Signals for IPC, process control, faults
     • Some shellish grunge needed for Lab 2

  33. Unix process view: data
     A process has multiple channels for data movement in and out of the process (I/O). The parent process and parent program set up and control the channels for a child (until exec).
     [Figure: a process (thread plus program) with I/O channels (“file descriptors”): stdin, stdout, and stderr to the tty, a pipe, a socket, and files.]

  34. Standard I/O descriptors
     Open files or other I/O channels are named within the process by an integer file descriptor value.
      count = read(0, buf, count);
      if (count == -1) {
          perror("read failed");    /* writes to stderr */
          exit(1);
      }
      count = write(1, buf, count);
      if (count == -1) {
          perror("write failed");   /* writes to stderr */
          exit(1);
      }
     Standard descriptors: primary input (stdin=0), primary output (stdout=1), and error/status (stderr=2). These are inherited from the parent process and/or set by the parent program. By default they are bound to the controlling terminal.

  35. Files
      fd = open(name, <options>);
      write(fd, "abcdefg", 7);
      read(fd, buf, 7);
      lseek(fd, offset, SEEK_SET);
      close(fd);

      creat(name, mode);
      mkdir(name, mode);
      rmdir(name);
      unlink(name);
     A file is a named, variable-length sequence of data bytes that is persistent: it exists across system restarts, and lives until it is removed. An offset is a byte index in a file. By default, a process reads and writes files sequentially. Or it can seek to a particular offset.

  36. Files: hierarchical name space
     [Figure: a tree rooted at the root directory, with subtrees such as applications, etc., and a user home directory; a mount point attaches an external media volume or network storage.]

  37. Unix “file descriptors” illustrated
     [Figure (disclaimer: this drawing is oversimplified): in user space, an int fd indexes the per-process descriptor table in kernel space; each entry points into the system-wide open file table, whose entries refer to a file, pipe, socket, or tty.]
     Processes often reference OS kernel objects with integers that index into a table of pointers in the kernel. (Why?) Windows calls them handles. In Unix, processes may share I/O objects (i.e., “files”: in Unix “everything is a file”). But the descriptor name space is per-process: fork clones the parent’s descriptor table for the child, but then they may diverge.

  38. Processes reference objects
     [Figure: a process referencing kernel objects: text and data segments, anon VM, and files.]

  39. Fork clones all references
     [Figure: after fork, parent and child both reference the same text, data, anon VM, and file objects.]
     Cloned file descriptors share a read/write offset. Cloned references to VM segments are likely to be copy-on-write, creating a lazy, virtual copy of the shared object. The kernel objects referenced by a process have reference counts. They may be destroyed after the last ref is released, but not before. What operations release refs?

  40. Shell and child
     [Figure: the shell (dsh) forks a child, hands it the terminal with tcsetpgrp, the child execs, and the shell waits; both processes have stdin, stdout, and stderr bound to the tty.]
     The child process inherits standard I/O channels to the terminal (tty). If the child is to run in the foreground, it receives/takes control of the terminal (tty) input (tcsetpgrp). The foreground process receives all tty input until it stops or exits. The parent waits for a foreground child to stop or exit.

  41. Pipes
      int pfd[2] = {0, 0};
      pipe(pfd);   /* pfd[0] is the read end, pfd[1] is the write end */
      b = write(pfd[1], "12345\n", 6);
      b = read(pfd[0], buf, b);
      b = write(1, buf, b);
     • The pipe() system call creates a pipe object.
     • A pipe has one read end and one write end: unidirectional.
     • Bytes placed in the pipe with write are returned by read in order.
     • The read syscall blocks if the pipe is empty (if no process has the write end open, read instead returns 0: end-of-file).
     • The write syscall blocks if the pipe is full.
     • Write raises SIGPIPE (or fails with EPIPE if SIGPIPE is ignored or caught) if no process has the read end open.
     A pipe is a bounded kernel buffer for passing bytes. [Figure: the bytes 12345 flowing through a pipe.]

  42. A key idea: Unix pipes [http://www.bell-labs.com/history/unix/philosophy.html]

  43. Unix programming environment
     Standard Unix programs read a byte stream from standard input (fd==0). They write their output to standard output (fd==1). Stdin or stdout might be bound to a file, pipe, device, or network socket. The processes may run concurrently and are automatically synchronized. That style makes it easy to combine simple programs using pipes or files. If the parent sets it up, the program doesn’t even have to know.

  44. Shell pipeline example
      chase$ who | grep chase
      chase console Jan 13 21:08
      chase ttys000 Jan 16 11:37
      chase ttys001 Jan 16 15:00
      chase$
     [Figure: the shell (dsh) runs a job of two processes, who and grep; who’s stdout feeds grep’s stdin through a pipe, and their remaining standard descriptors are bound to the tty.]

  45. But how to rewire the pipe?
     • P creates a pipe.
     • P forks C1 and C2. Both children inherit both ends of the pipe, and stdin/stdout/stderr.
     • The parent closes both ends of the pipe after fork.
     • C1 closes the read end of the pipe, closes its stdout, “dups” the write end onto stdout, and execs.
     • C2 closes the write end of the pipe, closes its stdin, “dups” the read end onto stdin, and execs.
     [Figure: P, C1, and C2 with the pipe and the tty; steps labeled 1, 2, 3A, and 3B.]

  46. Unix dup* syscall
     [Figure: hypothetical initial state before the dup2(fd1, fd2) syscall: a per-process descriptor table with entries for tty in, tty out, pipe out, and pipe in; int fd2 = 0 and int fd1 = 5.]
     dup2(fd1, fd2). What does it mean? Yes, fd1 and fd2 are integer variables. But let’s use “fd” as shorthand for “the file descriptor whose number is the value of the variable fd”. Then fd1 and fd2 denote entries in the file descriptor table of the calling process. The dup2 syscall is asking the kernel to operate on those entries in a kernel data structure. It doesn’t affect the values in the variables fd1 and fd2 at all!

  47. Unix dup* syscall
     [Figure: final state after the dup2(fd1, fd2) syscall: entry fd2 (0) no longer refers to tty in; it now refers to the same object as entry fd1 (5).]
     Then dup2(fd1, fd2) means: “close(fd2), then set fd2 to refer to the same underlying I/O object as fd1.” It results in two file descriptors referencing the same underlying I/O object. You can use either of the descriptors to read/write. But you should probably just close(fd1).

  48. Unix dup* syscall
     [Figure: final state after the dup2(fd1, fd2) syscall, with entry fd1 (5) crossed out.]
     Then dup2(fd1, fd2); close(fd1) means: “remap the object referenced by file descriptor fd1 to fd2 instead”. It is convenient for remapping descriptors onto stdin, stdout, and stderr, so that some program will use them “by default” after exec*. Note that we still have not changed the values in fd1 or fd2. Also, changing the values in fd1 and fd2 can never affect the state of the entries in the file descriptor table. Only the kernel can do that.

  49. Simpler example: Feeding a child through a pipe
      Parent:
      int ifd[2] = {0, 0};
      pipe(ifd);
      cid = fork();

      Parent:
      close(ifd[0]);
      count = read(0, buf, 5);
      count = write(ifd[1], buf, count);
      close(ifd[1]);   /* child’s stdin sees end-of-file */
      waitpid(cid, &status, 0);
      printf("child %d exited…");

      Child:
      close(0);
      close(ifd[1]);
      dup2(ifd[0], 0);
      close(ifd[0]);
      execve(…);
     [Figure: the parent writes into the pipe; the pipe is the child’s stdin; both keep stdout and stderr.]
      chase$ man dup2
      chase$ cc -o childin childin.c
      chase$ ./childin cat5
      12345
      12345
      5 bytes moved
      child 23185 exited with status 0
      chase$
