
Pthread and POSIX.1c Threads






  1. Pthread and POSIX.1c Threads

  2. What’s A Thread? (Review) • Recall that a process is a complete computational entity, including • credentials, • resources and resource quotas (e.g. memory, disk, CPU time), and • an execution environment with a single set of registers (including program counter). • In the thread model • a process is the entity with which credentials, resources and resource quotas are associated, and • a thread is a separately schedulable entity (with its own set of registers, stack, and limited private memory) that shares the same execution environment and other resources with all other threads belonging to the same process.

  3. Thread Implementation Models • There are two major thread implementation models: • kernel threads – the kernel is aware of threads and makes independent scheduling decisions for each thread, regardless of the process of which it is a member. • user-level threads – faster to create, with faster context switching than kernel threads. A potential problem is that when a single thread in a process blocks, since the kernel isn’t aware of the threads, the entire process – and every other thread in that process – blocks. • Most modern implementations of user-level threads attempt to provide a solution to the blocking problem by providing “wrappers” for the system calls that could block, detecting them before they block, and diverting control to another thread in the same process.

  4. Thread Implementation Models

  5. When To Use Threads • Increased need for throughput – threads are ideal for server applications, as individual threads can be created (perhaps in advance, as with many web servers) to handle client requests. • Need for performance – especially on SMP machines, each parallelized component of an application can be executed by a separate thread, potentially in parallel. • I/O and CPU can be overlapped – this can be achieved, to some extent, using things like the POSIX “aio” functions, but having one thread do I/O and another do computation is a more familiar, and possibly more manageable, paradigm, permitting the use of simpler algorithms. • Processes are created frequently – in client-server applications, servers often create processes (an expensive operation) for every client that connects; creating a thread instead is much cheaper.

  6. Pitfalls of Thread Programming • Added complexity – more complex algorithm design and data synchronization. • Difficult to debug and test – thread-level debuggers are newer and more primitive than process-level debuggers; a multithreaded application may work fine on a single processor, but fail on an SMP system. • Data synchronization and race conditions – since memory and other resources are shared, explicit synchronization must be used (hence the need for a good understanding of the classic process synchronization problems!). • Potential for deadlocks – since resource locking is required, careful attention must be paid to the order in which this locking is performed (again, a good understanding of deadlocks is necessary). • Non-thread-safe environments – many standard system libraries and third-party libraries are not reentrant (that is, they cannot safely be executed by several threads at the same time). Although thread-safe libraries are now relatively common, you must be certain to use them!

  7. Models of Thread Programming • Master/slave model – one thread (the master) receives each request and creates slave threads to handle the request. • Worker model – a number of worker threads are created to service clients; client requests are placed on a queue, and removed by worker threads as they finish earlier requests. • Pipelining model – tasks are broken into smaller components, each component providing the input to the next component. For example, a multithreaded compiler might have threads to preprocess, compile, assemble, and optimize code.

  8. Thread Implementations • Provided by many operating systems: • Mach (Carnegie-Mellon) • WIN32 (Microsoft) • OS/2 (IBM) • UNIX (e.g. SUN Solaris, OSF/1) • Unfortunately, each of these has a different API for thread-level operations (e.g. thread creation). • POSIX.1c provides a standard API for threads, and can be implemented as kernel or user-level threads.

  9. POSIX Thread Functions

  10. pthread_create
  #include <pthread.h>
  int pthread_create( pthread_t *thread, const pthread_attr_t *attr, void *(*start_routine)(void *), void *arg );
  • This function is used to create a new thread, which will begin execution with the function named by the start_routine argument and with the argument pointed to by arg. The “thread” argument points to a pthread_t object which can effectively be used as the thread identification. (“attr” will be discussed later.)

  11. pthread_join
  #include <pthread.h>
  int pthread_join( pthread_t thread, void **value_ptr );
  • This function is used to block the calling thread until the thread specified by “thread” terminates (or has already terminated). The “value_ptr” argument is used to retrieve any exit value provided by the terminated thread (by the pthread_exit function).

  12. Mutexes (or Mutices?)
  • A mutex is similar (identical?) to a binary semaphore. That is, it is used to control a resource that can be used by at most one thread at a time.
  • To create a mutex, use
  pthread_mutex_t amutex = PTHREAD_MUTEX_INITIALIZER;
  or
  pthread_mutex_t amutex = PTHREAD_RECURSIVE_MUTEX_INITIALIZER;

  13. Mutex Types • Determined by the behavior when the locking function is called on a mutex the calling thread already owns: • Fast – the thread simply suspends (deadlocking on itself) • Error checking – the call returns a deadlock indicator • Recursive – the call returns immediately, and the mutex’s lock count is increased

  14. pthread_mutex_destroy
  #include <pthread.h>
  int pthread_mutex_destroy(pthread_mutex_t *mutex);
  • Destroys a mutex.

  15. pthread_mutex_init
  #include <pthread.h>
  int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
  • Initializes a mutex with the attributes in the specified mutex attribute object. If attr is NULL, the default attributes are used.

  16. pthread_mutex_lock
  #include <pthread.h>
  int pthread_mutex_lock(pthread_mutex_t *mutex);
  • Used to lock the specified “mutex.” If it’s already locked, then the calling thread blocks until the mutex is unlocked.

  17. pthread_mutex_unlock
  #include <pthread.h>
  int pthread_mutex_unlock(pthread_mutex_t *mutex);
  • Used to unlock the specified “mutex.” For a recursive mutex that has been locked multiple times, only the last unlock (the one that reduces the lock count to zero) will release the mutex for use by other threads. If other threads are blocked waiting on the mutex, the highest-priority waiting thread is unblocked and becomes the owner of the mutex.

  18. pthread_mutex_trylock
  #include <pthread.h>
  int pthread_mutex_trylock(pthread_mutex_t *mutex);
  • Tries to lock a mutex. If the mutex is already locked, the calling thread returns immediately (with the error EBUSY) rather than waiting for the mutex to be freed.

  19. Recursive Mutex Locking • A thread that attempts to lock a non-recursive mutex it already owns (has locked) will receive a deadlock indication and the attempt to lock the mutex will fail. • Using a recursive mutex avoids this problem, but the thread must ensure that it unlocks the mutex the appropriate number of times. Otherwise no other threads will be able to lock the mutex.

  20. Dynamic Mutex Initialization
  • #include <pthread.h>
  int pthread_mutex_init(pthread_mutex_t *mutex, const pthread_mutexattr_t *attr);
  • Initializes a Pthread mutex with the specified attribute object
  • If attr is NULL, the default attributes are used

  21. Pthread Mutex Attribute • Supports only one attribute – the mutex type • #include <pthread.h> • int pthread_mutexattr_init(pthread_mutexattr_t *attr); • int pthread_mutexattr_destroy(pthread_mutexattr_t *attr); • int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int kind); • int pthread_mutexattr_gettype(const pthread_mutexattr_t *attr, int *kind); • kind is one of PTHREAD_MUTEX_FAST_NP, PTHREAD_MUTEX_RECURSIVE_NP, or PTHREAD_MUTEX_ERRORCHECK_NP

  22. pthread_once
  #include <pthread.h>
  int pthread_once(pthread_once_t *once_block, void (*init_routine)(void));
  • Ensures that init_routine will run just once regardless of how many threads in a process call it. All threads issue calls to the routine by making identical pthread_once calls (with the same once_block and init_routine). The thread that first makes the pthread_once call succeeds in running the routine; subsequent pthread_once calls from other threads do not run the routine.

  23. pthread_sigmask
  • #include <pthread.h>
  int pthread_sigmask(int how, const sigset_t *set, sigset_t *oset);
  • Examines or changes the calling thread’s signal mask.

  24. pthread_self
  • #include <pthread.h>
  pthread_t pthread_self( void );
  • This function returns the thread ID of the calling thread.

  25. pthread_equal
  • #include <pthread.h>
  int pthread_equal( pthread_t t1, pthread_t t2 );
  • Returns zero if the two thread IDs t1 and t2 are not equal, and non-zero if they are equal.

  26. pthread_exit
  • #include <pthread.h>
  void pthread_exit(void *value);
  • Terminates the calling thread, returning the specified value to any thread that may have issued a pthread_join on the thread.

  27. pthread_kill
  • #include <pthread.h>
  int pthread_kill(pthread_t thread, int sig);
  • Delivers a signal (sig) to the specified thread.

  28. POSIX Thread Synchronization Tools • We have already seen the mutex, similar to a binary semaphore, and the pthread_join and pthread_once functions. • Other thread synchronization facilities in pthreads include: • counting semaphores (covered next) • condition variables (as in monitors) • barriers

  29. Condition Variables • Recall that a condition variable (in a monitor) is a synchronization object on which a process may wait (stepping outside the monitor) until another process signals it. • With POSIX threads, a condition variable is used in conjunction with a mutex. When executing inside the monitor, the mutex is locked. If necessary, a thread waits on the condition variable, which unlocks the mutex, allowing other threads to enter the critical section. • Later, when the conditions are right, a thread signals the condition variable, unblocking the highest-priority thread waiting on the condition.

  30. Initializing a Condition Variable • A condition variable is created and initialized using code similar to this:
  #include <pthread.h>
  pthread_cond_t a_c_v = PTHREAD_COND_INITIALIZER;
  • where a_c_v is an arbitrary condition variable name.

  31. Waiting on a Condition Variable • To wait on a condition variable, a thread must have already locked a mutex. Then it executes:
  #include <pthread.h>
  pthread_cond_wait(&a_c_v, &a_mutex);
  • The mutex is unlocked and the calling thread blocks.
  • On return from the function, the mutex will again have been locked, and will be owned by the calling thread.
  • Do not use a recursive mutex with this function.

  32. Signaling a Condition Variable • A condition variable can be signaled in two ways: • pthread_cond_signal(pthread_cond_t *cond) will unblock the highest-priority thread that has been waiting the longest. • pthread_cond_broadcast(pthread_cond_t *cond) will unblock all threads in priority order, using FIFO order for threads with the same priority. • A thread may also use pthread_cond_timedwait to wait on a condition variable; an absolute time parameter is provided to unblock the thread if the absolute time passes.

  33. After the Signal’s Over… • Recall the rules regarding a process (thread) that performs a signal operation on a condition variable. Either • the process must immediately exit the monitor (e.g. unlock the mutex) [Brinch Hansen’s approach], or • the process must wait while the awakened process uses the controlled resource [Hoare’s approach]. • The first approach is recommended.

  34. Barriers • A barrier is essentially a gate at which threads must wait until a specified number of threads arrive; each is then allowed to continue. • After the blocked threads continue, the barrier is effectively reinitialized to its original state, ready to block threads until a group of the appropriate size has again reassembled.

  35. Creating and Initializing a Barrier • To (dynamically) initialize a barrier, use code similar to this (which sets the number of threads to 3):
  pthread_barrier_t b;
  pthread_barrier_init(&b, NULL, 3);
  • The second argument specifies an object attribute; using NULL yields the default attributes.
  • This barrier could have been statically initialized (in QNX) by assigning an initial value created using the macro PTHREAD_BARRIER_INITIALIZER(3).

  36. Waiting at a Barrier • To wait at a barrier, a thread executes:
  pthread_barrier_wait(&b);
  • One of the threads continuing from the barrier will be returned the value PTHREAD_BARRIER_SERIAL_THREAD; the others will receive 0. This property can be used to allow one of the threads to execute unique code. Consider, for example, the operation of the pthread_once function. If a thread waiting at a barrier is signaled, then it resumes waiting at the barrier after the signal handler returns.

  37. Pthread Attributes • Default attributes, frequently acceptable, are provided when a NULL parameter is supplied for the attribute parameter in many pthread functions. In some cases, however, explicit attribute settings may be required. In this case, an attribute object must be created so the attributes can be modified. Creating and initializing an attribute object is easy:
  pthread_attr_t my_attributes;
  pthread_attr_init(&my_attributes);

  38. Setting Attribute Values • Once an initialized attribute object exists, changes can be made. For example: • To change the stack size for a thread to 8192 (before calling pthread_create), do this:
  pthread_attr_setstacksize(&my_attributes, (size_t)8192);
  • To get the stack size, do this:
  size_t my_stack_size;
  pthread_attr_getstacksize(&my_attributes, &my_stack_size);

  39. Other Attributes • Detached state – set if no other thread will use pthread_join to wait for this thread (improves efficiency) • Guard size – used to protect against stack overflow • Inherit scheduling attributes (from creating thread) – or not • Scheduling parameter(s) – in particular, thread priority • Scheduling policy – FIFO or Round Robin • Contention scope – with what other threads does this thread compete for a CPU • Stack address – explicitly dictate where the stack is located • Lazy stack allocation – allocate on demand (lazy) or all at once, “up front”

  40. Understanding Pthreads Implementation • Implementations fall into three categories: • pure user space implementations, • pure kernel thread implementations, or • somewhere between the two (referred to as two-level schedulers, lightweight processes, or activations). • There are advantages and disadvantages to implementations based on which of these is used. • Pure user space implementations don’t provide global scheduling scope, and don’t allow multiple threads from the same process to execute in parallel on multiple CPUs. • Pure kernel thread implementations don’t scale well when a process has many threads.

  41. User Threads • User threads are programming abstractions that exist to be accessed by calls from within a user program. • They might not rely on kernel threads, even if they are provided. • A kernel thread is an abstraction for a system execution point within a process. • Implementation of POSIX pthreads doesn’t require use of kernel threads, even if they exist.

  42. Older User-level Thread Packages • Some operating systems may provide a non-POSIX user-level thread package that has similarity to the POSIX thread standard. • POSIX threads may be built on top of these (sometimes easily), or an implementation may be entirely separate. It’s all up to the implementer. • Of course, these older packages may be significantly different, and in any case, probably don’t have the same syntax and precise semantics from one system to another.

  43. User Space Implementations • User space implementations include: • a “library scheduler” that runs in user mode to schedule the threads in a single process; there’s one of these in each process using threads • the operating system scheduler, which schedules each process independently. • A simple-minded way to provide a user space implementation is to switch between threads using the user mode context switching functions like setjmp, longjmp, and signals.

  44. Pure User Space Advantages • It doesn’t require any change to the operating system, meaning a new OS can quickly provide support for Pthreads. • Context switching between threads is usually faster than if the kernel was involved, because no user-kernel and kernel-user address space switches are required. • New threads can be created quickly. Each thread is just another time slice of the set of resources originally assigned to the process.

  45. Pure User Space Disadvantages • The many-to-one mapping of user threads to a single kernel-schedulable entity means that threads within the same process compete with each other for CPU cycles. Priority changes are only relative to threads within the same process, not to threads in other processes. This has significant negative implications for real-time programs. • Multiple threads in the same process cannot be running in parallel on multiple CPUs.

  46. Kernel Space Threads • This is basically a one-to-one mapping of threads in the user program to schedulable entities in the operating system. • Much information that must be maintained by the kernel for each thread is of the same size and scope as the information that was previously used for a single process. • Some information, such as the open file table, is still associated only with the process.

  47. Pure Kernel Space Advantages • Threads compete against all other threads in the system for CPU cycles. Thread priorities are global. • Multiple threads in a single process can run in parallel on multiple CPUs.

  48. Pure Kernel Space Disadvantages • Creating a new thread does require kernel overhead (although less than creating a new process). If the application will never run on a multiprocessor, user space threads are probably more efficient. • Applications using a lot of threads (which could mean 10 or 100 depending on the system) will consume significant system resources and degrade its overall performance, hurting other processes.

  49. Two-level Schedulers • In a two-level scheduler system, the library scheduler and the kernel scheduler cooperate to schedule user threads. • Called a many-to-many (M:N) mapping, many user threads are mapped onto a pool of kernel threads. • A user thread may not have a unique mapping to a kernel thread; the mapping may change over time. • The Pthreads library assigns user threads to run in a process’ available kernel threads, and the kernel schedules kernel threads from the collection of all processes’ runnable kernel threads.

  50. Example Cases • Suppose a program’s user threads frequently sleep on timers, events, or I/O completion. • It’s not logical to tie each of these to a kernel thread, since they’d see little CPU activity. • It’s better in this case to allow the library scheduler to associate these with a single kernel thread, yielding less kernel overhead and better performance. • Or suppose user threads are frequently CPU-bound. In this case, the library scheduler can simply associate each thread with a separate kernel thread. • This gives the kernel scheduler considerable flexibility, allowing it to select any of the runnable kernel threads for execution.
