Parsing a String

Parsing a String • The strtok function can be used to parse a string into tokens defined by separator characters. • However, strtok keeps its position in hidden static state between calls, so it is not reentrant and thus not thread safe. • A thread-safe alternative is the following function (a BSD extension also provided by glibc, not part of ISO C): char *strsep(char **string_ptr, const char *delimiter)

Parsing a String
#include <string.h>
#include <stddef.h>
...
const char string[] = "words separated by spaces -- and, punctuation!";
const char delimiters[] = " .,;:!-";
char *running;
char *token;
...
running = strdup (string);
token = strsep (&running, delimiters);    /* token => "words" */
token = strsep (&running, delimiters);    /* token => "separated" */
token = strsep (&running, delimiters);    /* token => "by" */
token = strsep (&running, delimiters);    /* token => "spaces" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "and" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => "punctuation" */
token = strsep (&running, delimiters);    /* token => "" */
token = strsep (&running, delimiters);    /* token => NULL */

Parsing a String
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
int main(void)
{
    char *path, *copy, *token;
    path = getenv("PATH");
    if (path == NULL) {
        printf("variable not found\n");
        exit(EXIT_FAILURE);
    }
    /* strsep overwrites delimiters in place, so parse a copy
       rather than the environment string itself */
    copy = strdup(path);
    if (copy == NULL) {
        perror("strdup");
        exit(EXIT_FAILURE);
    }
    path = copy;
    token = strsep(&path, ":");
    printf("\n");
    while (token != NULL) {
        printf("%s\n", token);
        token = strsep(&path, ":");
    }
    printf("\n");
    free(copy);
    exit(EXIT_SUCCESS);
}
Output:
/usr/kerberos/bin
/usr/local/bin
/bin
/usr/bin
/usr/local/smlnj/bin
/usr/local/bin/mpich2/bin
/usr/local/bin/hadoop/hadoop-0.17.2.1/bin
.

Unix System Programming Part I: Process Environment

main Function • The main function is the entry point for C program execution: int main(int argc, char *argv[]); • The kernel starts a C program when a process calls one of the exec functions • The linker (called by the C compiler) actually specifies, in the executable program file, a special start-up routine to be called before main • The start-up routine • takes command-line arguments and environment values from the kernel • sets things up so that main is then called • The start-up routine also arranges that the exit function is called when main returns (see next slide)

Process Termination • Eight ways for a process to terminate • Normal termination: • Return from main • Calling exit • Calling _exit or _Exit • Return of the last thread from its start routine (later!) • Calling pthread_exit from the last thread (later) • Abnormal termination: • Calling abort • Receipt of a signal (later) • Response of the last thread to a cancellation request (later) • The start-up routine is written so that when main returns, the exit function is called.

Exit Functions
#include <stdlib.h>    /* ISO C */
void exit(int status);
void _Exit(int status);
#include <unistd.h>    /* POSIX.1 */
void _exit(int status);
To print the exit status of a program at the prompt, use the shell command echo $?

exit • The exit function performs a clean shutdown of the standard I/O library: fclose is called on all open streams • Recall that this causes all buffered output to be flushed • The exit status is undefined if • any of the “exit functions” is called without an exit status • main does a return without a return value • main is not declared to return an integer and main “falls off the end” • Before the ISO C99 standard, if the return type of main is int and it “falls off the end” without a return statement, the status was undefined • Under the ISO C99 standard, the return value will be 0 in this case • To compile against ISO C99 with gcc, use the -std=c99 compiler switch

atexit Function • The atexit function allows you to register functions to be executed when the program exits. #include <stdlib.h> int atexit( void (*func) (void) ); returns 0 if OK, nonzero on error • Example on next slide

atexit Function
/* err_sys is an error-reporting helper assumed to be defined elsewhere;
   it is not a standard library function */
#include <stdio.h>
#include <stdlib.h>
static void my_exit1(void);
static void my_exit2(void);
int main(void)
{
    if (atexit(my_exit2) != 0)
        err_sys("can't register my_exit2");
    if (atexit(my_exit1) != 0)
        err_sys("can't register my_exit1");
    if (atexit(my_exit1) != 0)
        err_sys("can't register my_exit1");
    printf("main is done\n");
    return(0);
}
static void my_exit1(void)
{
    printf("first exit handler\n");
}
static void my_exit2(void)
{
    printf("second exit handler\n");
}
$ ./a.out
main is done
first exit handler
first exit handler
second exit handler
Note that 1. the registered exit handlers are executed in reverse order of registration 2. a function is executed once for each time it is registered

How a C Program is Started and How it Terminates

Environment • When a UNIX program is executed, it receives two pieces of data from the process that invoked it • The arguments • The environment • To C programs both are arrays of character pointers • All but the last of the character pointers point to a NUL-terminated string • The last pointer is a NULL pointer • A count of the number of arguments (not including the NULL pointer) is also passed

Environment • The global variable environ points to the array of environment strings (whose last entry is NULL) extern char **environ; /* environment array (not in any header) */

Environment • One way to access the environment strings is to use the environ variable directly. • Each environment string is of the form name=value
#include <stdio.h>
#include <stdlib.h>
extern char **environ;
int main(int argc, char *argv[])
{
    int i;
    for (i = 0; environ[i] != NULL; i++)
        printf("%s\n", environ[i]);
    exit(EXIT_SUCCESS);
}

Environment • Partial output of the previous program on my system:
HOSTNAME=c4labpc15.csee.usf.edu
TERM=xterm
HOST=c4labpc15.csee.usf.edu
SHELL=bash
HISTSIZE=1000
XTERM_SHELL=bash
GROUP=users
USER=rtindell
HOSTTYPE=i386-linux
MAIL=/var/spool/mail/rtindell
PATH=/usr/kerberos/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/smlnj/bin:/usr/local/bin/mpich2/bin:/usr/local/bin/hadoop/hadoop-0.17.2.1/bin:.

Environment • Listing an entire environment is an unusual requirement • Typically, the value of one “environment variable” is needed #include <stdlib.h> char *getenv(const char *var /* variable to find */); returns value or NULL if not found

Environment • Example:
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char *s;
    s = getenv("LOGNAME");
    if (s == NULL)
        printf("variable not found\n");
    else
        printf("value is \"%s\"\n", s);
    exit(EXIT_SUCCESS);
}
Output:
value is "rtindell"

Memory Layout of a C Program

Shared Libraries • Shared libraries remove the common library routines from the executable file • Instead, a single copy is maintained somewhere in memory that all processes reference • Reduces size of executable file • May add some runtime overhead either • When the program is first executed; or • The first time each shared library function is executed • Another advantage: library functions can be replaced with new versions without re-linking (provided no change to interface)

Shared Libraries • Shared procedures must be reentrant: • Code cannot modify itself • Local data for each user stored separately • Thus, a reentrant procedure must have two parts: • A permanent part, the instructions that make up the procedure • A temporary part, a pointer back to the calling program as well as memory for the calling program’s local variables • Each execution instance, or activation, of the procedure will execute the code in the permanent part but must have its own copy of local variables and parameters • The temporary part associated with a particular activation is called an activation record • Activation records normally kept on a stack

Shared Libraries • Different systems provide different methods to specify the use or non-use of shared libraries • Example:
$ cc -static hello1.c
$ ls -l a.out
-rwxrwxr-x 1 sar 475570 Feb 18 23:17 a.out
$ size a.out
   text    data     bss     dec     hex filename
 375657    3780    3220  382657   5d6c1 a.out
$ cc hello1.c
$ ls -l a.out
-rwxrwxr-x 1 sar 11410 Feb 18 23:19 a.out
$ size a.out
   text    data     bss     dec     hex filename
    872     256       4    1132     46c a.out

ISO C Memory Allocation Functions #include <stdlib.h> • void *malloc(size_t size); • Allocates a specific number of bytes • Initial value of memory is indeterminate • void *calloc(size_t nobj, size_t size); • Allocates space for a specified number of objects of a specified size • The space is initialized to all 0 bits • void *realloc(void *ptr, size_t newsize); • increases or decreases size of a previously allocated area • on increase, may involve moving the previously allocated area • on increase, initial value of any new space (beyond old contents) is indeterminate

ISO C Memory Allocation Functions #include <stdlib.h> • void free(void *ptr); • causes the memory area associated with ptr to be deallocated • freed space normally put into a pool of available memory for later allocation • It is not necessary to cast pointers returned by the three memory allocation functions (generic void *)

Unix System Programming Part II: Process Control

Process Control • Topics to be covered: • Creation of a new process • Program execution • Process termination • Various IDs that are properties of the process • user IDs • group IDs • real • effective • saved • How the IDs are affected by process control primitives • Interpreter files

Process IDs • As discussed before, each process has an integer process ID • The process with ID 0 is usually the swapper • No program on disk corresponds to the swapper process as it is a part of the kernel • It is a system process

Process IDs • The process with ID 1 is the init process and is invoked by the kernel at the end of the bootstrap procedure • The program file for the init process used to be /etc/init, but in modern systems it is /sbin/init • The init process brings up a UNIX system after the kernel has been bootstrapped • Usually reads system-dependent initialization files: the /etc/rc* files or /etc/inittab and the files in /etc/init.d • The init process then brings the system to a certain state, such as multiuser • The init process never dies • It is a normal user process, not a system process like the swapper, although it runs with superuser privileges

Process IDs • UNIX systems also have a set of kernel processes that provide OS services • Example: on some virtual memory implementations of UNIX systems, process ID 2 is the pagedaemon • This process is responsible for supporting paging of VM

Other IDs • Processes have some associated IDs other than the process ID • The following functions return these identifiers

The fork function • An existing process can create a new one by calling fork #include <unistd.h> pid_t fork(void); • Returns • 0 in child (newly created process) • Process ID of the child in the parent (creating process) • -1 on error • A process can have more than one child • Keeps track of the child processes by their process IDs • The child can always call getppid() to obtain its parent’s ID

The fork function • The child is a copy of the parent • The child gets a copy of the parent’s data space, heap and stack • But they share the same text segment (code) • Thus, both the parent and child continue execution at the instruction after the fork()

Example • write is not buffered, so its output is sent immediately • stdout is line buffered if connected to a terminal • Otherwise stdout is fully buffered, so “before fork” is still in the buffer at the time of the fork and is copied into the child along with the rest of the data space

Sharing of Open Files between Parent and Child

Shared Files • If both parent and child write to the same file descriptor, without any synchronization, output is intermixed • Not the normal mode of operation • Two normal cases: 1. Parent waits for child to complete. When child terminates all file offsets will be updated automatically 2. Parent and child go their own ways with each closing file descriptors it doesn’t need (with no overlap!) • Case 2 is often the case with network servers

Properties Shared by Parent and Child Processes • Real and effective user and group IDs • Supplementary group IDs • Process group ID • Controlling Terminal • Set-user-ID and set-group-ID flags • Current working directory • Root directory • File mode creation mask • Signal mask and dispositions • Close-on-exec flag for any open file descriptors • Environment • Attached shared memory segments • Memory mappings • Resource limits A process group is a collection of one or more processes, usually associated with some job that can receive signals from the same terminal

Differences between Parent and Child • The return value from fork • Different parent process IDs • Child’s time values set to 0: • tms_utime • tms_stime • tms_cutime • tms_cstime • File locks set by parent are not inherited by child • Pending alarms are cleared for child • Set of pending signals for child set to empty set

fork Failure • Two ways that the fork call can fail: • Too many processes in the system (usually means something else is wrong) • Total number of processes for this real user ID exceeds the system limit (given by CHILD_MAX)

Main uses of fork 1. Process wants to duplicate itself so it and the child can execute different sections of code at the same time Common for network servers: > Parent waits for a service request from a client. > When the request arrives, parent calls fork and lets the child handle the request. > Parent goes back to waiting for next request. 2. Process wants to execute a different program Common for shells In this case, the child does an exec right after the return from fork.

Process Termination (Revisited) • Normal Termination by return from main (calls exit) • Calling exit function • Defined by ISO C • calls all exit handlers registered by atexit() • closes all standard I/O streams • Because ISO C does not deal with file descriptors, multiple processes (parents and children) and job control, the ISO definition is not complete for UNIX systems • Calling the _exit or _Exit functions. • ISO C defines _Exit to provide a way for a process to terminate without running exit handlers or signal handlers. Whether standard I/O streams are flushed depends on the implementation • On UNIX systems, _exit and _Exit are synonymous and do not flush standard I/O streams • _exit is called by the exit function and handles the UNIX specific details • In most UNIX systems, exit is a function from the standard C library, whereas _exit is a system call

Process Termination (Revisited) • Two other normal ways and one abnormal way to terminate a process are thread-specific and will be covered later • Abnormal termination happens by calling abort (which is a special case of termination by a signal, SIGABRT in this case) • Termination by a signal generated by the process, another process or the kernel • We will be studying threads and signals soon

Process Termination (Revisited) • Regardless of how a process terminates, the same code in the kernel is executed • This kernel code closes all the process’s open descriptors, releases the memory it was using, etc.

Process Termination (Revisited) • The terminating process needs to be able to notify its parent how it terminated • The three exit functions use the exit status argument • For abnormal termination, the kernel generates a termination status to describe the reason for the termination • When _exit is finally called, the kernel converts the exit status to a termination status • The parent process can obtain the termination status from either the wait or waitpid function (to be covered soon)

Process Termination (Revisited) • What happens if the parent process terminates before a child process terminates? • Answer: the init process becomes the child’s parent • Normally, when a process terminates, the kernel goes through all active processes to check whether any of them is a child of the terminating process • The parent process ID of each such orphan is changed to 1 (init) • Thus the init process adopts all orphan processes

Process Termination (Revisited) • What happens if a child process terminates before its parent process (and becomes a zombie)? • If the child completely disappeared, the parent process would not be able to find its termination status when and if it checks it. • Solution: the kernel keeps a small amount of information for every terminating process • This information is available when the parent process calls wait or waitpid • This information includes the process ID, termination status and amount of CPU time used by the process • The kernel can release the child process’s memory and close its files • The ps command prints the state of a zombie process as Z

Process Termination (Revisited) • What happens when a process with parent init terminates? • Does it become a zombie? • No. init is written so that when a child of init terminates, it calls one of the wait functions. • This prevents the system from being overrun by zombies

The wait and waitpid Functions • Termination of a child process is an asynchronous event • That is, it can happen at any time the parent is running • When it happens, the kernel sends the SIGCHLD signal to the parent • The parent can either ignore the signal (the default) or provide a signal handler • The parent collects the child’s termination status by calling wait or waitpid • A process that calls either of these functions can • Block, if all its children are still running • Return immediately with a terminated child’s status • Return immediately with an error if it doesn’t have any child processes

The wait and waitpid Functions #include <sys/wait.h> pid_t wait(int *statloc); pid_t waitpid(pid_t pid, int *statloc, int options); Both return the process ID if OK, 0 (waitpid with the nonblocking option only), or -1 on error • Differences between the two functions: • wait can block the caller until a child process terminates • waitpid has an option that prevents it from blocking • waitpid doesn’t have to wait for the child that terminates first • Instead, it has a number of options that control which process it waits for

The wait and waitpid Functions #include <sys/wait.h> pid_t wait(int *statloc); pid_t waitpid(pid_t pid, int *statloc, int options); Both return the process ID if OK, 0 (waitpid with the nonblocking option only), or -1 on error • If the calling process has a zombie child process, wait returns immediately with that child’s status • Otherwise, it blocks until a child terminates • If the calling process has multiple children, wait returns when one of the child processes terminates • We can always tell which child terminated, since wait returns its process ID

The wait and waitpid Functions #include <sys/wait.h> pid_t wait(int *statloc); pid_t waitpid(pid_t pid, int *statloc, int options); Both return the process ID if OK, 0 (waitpid with the nonblocking option only), or -1 on error • For both functions, statloc is a pointer to an integer • If statloc is not NULL, the termination status of the process is stored in the location it points to • POSIX.1 specifies that the way a process terminated be checked via four mutually exclusive macros • Depending on which macro returns true, other macros are used to obtain exit status, signal number, etc.
