Thread

A thread is a separate part of a process: it provides its own flow of execution and shares the process data and resources with the other threads.
• Thread Attributes:
  - Thread ID
  - Set of registers (stack pointer, program counter, etc.)
  - Stack (local variables, return addresses)
  - errno
  - Signal mask
  - Priority
[Figure: Multi-Processing vs. Multi-Threading. After fork, Process 1 (P1) and Process 2 (P2) each have their own Stack, Heap, Uninitialized Data, Initialized Read-Write Data, Initialized Read-Only Data and Text. Thread 1 (T1) and Thread 2 (T2) have separate stacks (Stack 1, Stack 2) inside one process.]
Introduction to Network Programming in UNIX & LINUX

The Particularity of Multi-Threaded Programming
• All threads within a process share the same global memory. This makes sharing information between threads easy, but along with this simplicity comes the problem of synchronization.
• Reentrant Functionality
  - A functionality (procedure, object) is reentrant if all its task-unique information (such as local variables) is kept in a separate area of memory that is distinct for each thread (or process) executing this functionality simultaneously.
• Thread-Safe Functionality
  - A functionality is thread-safe if:
    - It is reentrant.
    - All its parts that must be executed by a single thread at a time are protected from simultaneous execution. (Some form of mutual exclusion is used for this purpose.)
• Safety from Deadlocks, Livelocks and Starvation
  - The following synchronization problems can occur in a multi-threaded (multi-process) environment as a result of race conditions:
    - A Deadlock is a situation in which two or more threads (or processes) sharing the same resources effectively prevent each other from accessing them.
    - A Livelock is a situation in which two or more threads (or processes) continually change their state, each in response to a state change in the other. As a result, none of the running threads can make further progress.
    - Starvation is a situation in which a thread (or process) is unable to gain regular access to a shared resource, because the resource is made unavailable for long periods by "greedy" threads (processes) that keep it locked.

Thread Deadlock and Livelock Examples
The classic Deadlock case is where two Threads both require two shared Resources and try to lock them in opposite order.
[Figure: Thread A locks Resource X and Thread B locks Resource Y; then each tries to lock the other Resource and blocks. Thread A is waiting for Thread B, Thread B is waiting for Thread A.]
In more sophisticated cases the Deadlock scenario can contain a chain of multiple threads waiting for each other:
[Figure: Thread A, Thread B, …, Thread N waiting for each other.]
A Livelock may arise from attempts to avoid blocking by using a try-lock. After the try-lock fails, both threads release their locks and no work is done. Then the same locking pattern is repeated.
[Figure: Thread A locks Resource X and Thread B locks Resource Y; each try-lock on the other Resource fails; both unlock and the pattern repeats.]

Deadlock Resolution Example
To avoid the Deadlock in the previous example, both Threads have to lock the Resources in the same order. In other words, both Threads have to follow a common Resource locking protocol.
[Figure: Thread A locks Resource X, then Resource Y; Thread B's lock on Resource X blocks until Thread A unlocks, then Thread B proceeds with Resource X and Resource Y.]
In more sophisticated systems with a large number of lockable Resources, establishing a common Resource locking protocol can be problematic or impossible. In this case the system must provide functionality that refuses the specific lock operation which would close the Deadlock chain.
[Figure: chain Thread A, Thread B, …, Thread N in which the lock that would close the chain is refused.]
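The common locking protocol can be sketched in C with two POSIX mutexes standing in for Resource X and Resource Y. This is only an illustration under assumptions: the names resource_x, resource_y, worker and run_ordered_locking are hypothetical, and it relies on pthread_create(), pthread_join() and the mutex calls that are described on later slides.

```c
#include <pthread.h>

static pthread_mutex_t resource_x = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t resource_y = PTHREAD_MUTEX_INITIALIZER;
static int shared_counter = 0;   /* touched only while both locks are held */

/* Thread A and Thread B run the same code, so both lock X first, then Y. */
static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++) {
        pthread_mutex_lock(&resource_x);
        pthread_mutex_lock(&resource_y);
        shared_counter++;
        pthread_mutex_unlock(&resource_y);   /* release in reverse order */
        pthread_mutex_unlock(&resource_x);
    }
    return NULL;
}

/* Starts two threads and waits for both; returns the counter, or -1 on error. */
int run_ordered_locking(void)
{
    pthread_t a, b;

    shared_counter = 0;
    if (pthread_create(&a, NULL, worker, NULL) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, NULL) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return shared_counter;
}
```

If one of the threads locked resource_y before resource_x instead, the two threads could block each other exactly as in the Deadlock example.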

Thread Life Cycle
• Thread Creation
  - Each process has at least one thread, the Main Thread, in which function main() of the process is executed.
  - Any other thread is created with a system call issued by the Main Thread or by any other running thread.
• Thread Execution
  - Each thread executes its own “thread-main” function, using its own thread stack.
  - The “thread-main” function of the Main Thread is function main(). The “thread-main” function of any other thread is specified during thread creation.
• Thread Termination
  - Execution of a thread terminates in the following cases:
    - When its “thread-main” function returns.
    - When the Kernel terminates the thread at the request of another thread or as a result of signal handling.
    - When the process terminates, ending the execution of all its threads. This occurs when function main() returns, when exit() is called by any thread, or when the process is terminated by a signal.
• Thread “Post-mortem” Termination Status
  - The return value of the “thread-main” function is interpreted as the thread termination status and is preserved by the kernel in the scope of the process after the thread terminates. The termination status can then be retrieved by any other running thread. This is the default behaviour; a thread can change it if it wants to run in daemon mode.
Each Operating System that supports multi-threading provides its own set of system calls implementing the thread life cycle. The following slides discuss the set of system calls defined by the POSIX standard, which is supported by most modern UNIX and Linux operating systems.

POSIX Thread API. PThread Basic System Calls
POSIX declares a portable API providing a set of thread-related system calls named pthread_XXX(). All these system calls require the following synopsis:
    #include <pthread.h>
    gcc … -D_REENTRANT [-D_THREAD_SAFE] … -lpthread
• The header file <pthread.h>, containing the declarations of all pthread_XXX() system calls, must be #include-d.
• The macro _REENTRANT (or _THREAD_SAFE, or _POSIX_THREAD_SAFE_FUNCTIONS) must be #define-d in your code or provided via the -D compiler option at compilation time, to use the thread-safe versions of errno and of the standard functions.
• The standard library /usr/lib/libpthread.a is linked via the -l compiler option at link time.
int pthread_create(pthread_t *tid, const pthread_attr_t *attr, void *(*func) (void *), void *arg);
• Creates a new Thread in the running process.
• Argument attr specifies the thread attributes. Default attributes are used if NULL is specified.
• The new Thread executes function func with argument arg.
• On success, the tid argument is filled with the ID of the newly created Thread.
• Returns 0 on success, or a positive error code on failure.
• Attention: Parameter arg should NOT point to an automatic variable of the Calling Thread. Otherwise the Newly-created Thread may dereference arg after the variable has been de-allocated on the stack of the Calling Thread.
void pthread_exit (void *status);
• Terminates the execution of the calling thread. Does not return to the caller.
• The status must not point to a thread-local object, since that object disappears when the thread terminates.
• Note: If the function running in the thread returns, its return value is the exit status of the thread.
• If the main() function of the process returns, or if any thread calls exit(), the whole process terminates, including all its threads.

PThread Basic System Calls
pthread_t pthread_self (void);
• Returns the thread ID of the calling thread.
int pthread_equal (pthread_t tid1, pthread_t tid2);
• Compares two thread IDs.
• Returns a nonzero value (true) if the IDs are equal, 0 otherwise.
#include <sched.h>
int sched_yield(void);
• Yields the current thread's execution in favor of another thread with the same or greater priority (if one exists).
• Returns 0 (no errors).
int pthread_join (pthread_t tid, void ** status);
• Suspends execution of the calling thread until the target thread tid terminates (like waitpid() for a process).
• If a non-NULL status argument is specified, it is filled with the termination status of the target thread.
• Returns 0 on success, error code on failure.
int pthread_detach (pthread_t tid);
• Changes the specified thread to be detached. A Detached Thread is like a daemon process: when it terminates, all its resources are released, and it cannot be pthread_join-ed by another thread.
• Commonly called by a thread that wants to detach itself: pthread_detach(pthread_self());
• Returns 0 on success, error code on failure.
Note: The POSIX standard does not provide methods like “Join Any” or “Join All”, which are supported by some thread APIs. POSIX assumes that a thread calling “join” knows explicitly which thread it is joining. A “Join All” method, which suspends the calling thread until all other non-detached threads in the current process have terminated, can be implemented at application level in the following way:
- The application maintains the ID list of all active non-detached threads.
- The calling thread walks through the ID list and calls pthread_join() for each ID in this list.
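The application-level “Join All” idea can be sketched as follows. The names NUM_THREADS, double_it and join_all_demo are hypothetical; the sketch passes a small integer inside the arg pointer itself, which is one common way to respect the warning about automatic variables, and it collects each termination status with pthread_join().

```c
#include <pthread.h>
#include <stdint.h>

#define NUM_THREADS 4

/* Thread-main: returns its argument doubled as the termination status. */
static void *double_it(void *arg)
{
    intptr_t n = (intptr_t)arg;   /* the value travels inside the pointer */
    return (void *)(n * 2);
}

/* Creates NUM_THREADS joinable threads, then joins every ID in the list.
 * Returns the sum of all termination statuses, or -1 if creation failed. */
long join_all_demo(void)
{
    pthread_t ids[NUM_THREADS];   /* ID list of active non-detached threads */
    int created = 0;
    long sum = 0;

    for (int i = 0; i < NUM_THREADS; i++) {
        if (pthread_create(&ids[i], NULL, double_it,
                           (void *)(intptr_t)(i + 1)) != 0)
            break;
        created++;
    }
    for (int i = 0; i < created; i++) {
        void *status = NULL;
        if (pthread_join(ids[i], &status) == 0)
            sum += (long)(intptr_t)status;
    }
    return created == NUM_THREADS ? sum : -1;
}
```

With arguments 1 to 4 the statuses are 2, 4, 6 and 8, so a successful run returns 20.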

Thread Attributes
• The PThread API provides type pthread_attr_t, a container of PThread Attributes used during Thread Creation with system call pthread_create(). The following Thread Attributes are defined:
• detachstate
  - The thread ID and exit status of a Joinable Thread are retained for a later pthread_join() by some other thread.
  - A Detached Thread is like a daemon process: when it terminates, all its resources are released.
  - Possible values: PTHREAD_CREATE_JOINABLE (default), PTHREAD_CREATE_DETACHED.
• schedpolicy, schedparam, inheritsched, scope
  - The Scheduler is the part of the Kernel that decides which runnable entity (process, thread) will be executed by the CPU next. The 4 attributes above describe the Scheduling Policy and Scheduling Priority of the thread.
  - The default scheduling policy of a thread is non-realtime. Setting specific values for these attributes is meaningful only for real-time programming (for more information see the corresponding man sections).

/* attribute container initialization / deinitialization */
int pthread_attr_init(pthread_attr_t *attr);
int pthread_attr_destroy(pthread_attr_t *attr);
/* setter and getter for Detach State attribute */
int pthread_attr_setdetachstate(pthread_attr_t *attr, int detachstate);
int pthread_attr_getdetachstate(const pthread_attr_t *attr, int *detachstate);
/* realtime-related setters and getters for Scheduling Policy attributes */
int pthread_attr_setschedpolicy(pthread_attr_t *attr, int policy);
int pthread_attr_getschedpolicy(const pthread_attr_t *attr, int *policy);
int pthread_attr_setschedparam(pthread_attr_t *attr, const struct sched_param *param);
int pthread_attr_getschedparam(const pthread_attr_t *attr, struct sched_param *param);
int pthread_attr_setinheritsched(pthread_attr_t *attr, int inherit);
int pthread_attr_getinheritsched(const pthread_attr_t *attr, int *inherit);
int pthread_attr_setscope(pthread_attr_t *attr, int scope);
int pthread_attr_getscope(const pthread_attr_t *attr, int *scope);

• All functions return 0 on success and a non-zero error code on error.
• On success, the getter functions also store the current value of the requested attribute in the location pointed to by their second argument.
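As an illustration of the attribute container, the following sketch creates a thread that is detached from the start. The function names noop and create_detached are hypothetical; only the detachstate attribute is touched, and all other attributes keep their defaults.

```c
#include <pthread.h>

/* Trivial thread-main used only for the demonstration. */
static void *noop(void *arg)
{
    return arg;
}

/* Creates one thread in detached state.
 * Returns 0 on success, otherwise the first error code encountered. */
int create_detached(void)
{
    pthread_attr_t attr;
    pthread_t tid;
    int state = -1;
    int rc = pthread_attr_init(&attr);

    if (rc != 0)
        return rc;
    rc = pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
    if (rc == 0)
        rc = pthread_attr_getdetachstate(&attr, &state);  /* read it back */
    if (rc == 0 && state != PTHREAD_CREATE_DETACHED)
        rc = -1;
    if (rc == 0)
        rc = pthread_create(&tid, &attr, noop, NULL);
    pthread_attr_destroy(&attr);  /* the container is no longer needed */
    return rc;
}
```

The created thread must not be pthread_join-ed; its resources are released automatically when it terminates.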

Thread Cancellation
• Thread Cancellation is the mechanism by which one (source) thread can send a request to another (target) thread to terminate the target thread's execution.
• Depending on its settings, the target thread can then either:
  - ignore the request,
  - honor it immediately,
  - or defer it until it reaches a cancellation point.
• Cancellation State:
  - PTHREAD_CANCEL_ENABLE: (default) the thread is ready to accept cancellation requests
  - PTHREAD_CANCEL_DISABLE: the thread ignores cancellation requests
• Cancellation Type:
  - PTHREAD_CANCEL_DEFERRED: (default) the cancellation request stays pending until the next cancellation point (the test for a pending cancellation request is initiated by the target thread).
  - PTHREAD_CANCEL_ASYNCHRONOUS: the cancellation request is executed immediately by the kernel.
[Figure: Deferred Cancellation. The source thread sends a cancel request through the kernel to the target thread; after the deferral time the target reaches a cancellation point, tests for cancellation and stops.]
int pthread_cancel (pthread_t thread); - sends a cancellation request to the target thread
int pthread_setcancelstate (int state, int *oldstate); - sets the cancellation state of the calling thread
int pthread_setcanceltype (int type, int *oldtype); - sets the cancellation type of the calling thread
void pthread_testcancel (void); - explicitly tests for and executes a pending cancellation
• Cancellation Points in thread execution are the places where a test for pending cancellation requests is performed and the cancellation is executed if one is pending. The following POSIX threads functions are cancellation points:
  - pthread_join, pthread_cond_wait, pthread_cond_timedwait, pthread_testcancel, sem_wait, sigwait.
• The other POSIX threads functions described in these slides are not cancellation points.
Note: In high-level programming languages (C++, Java) it is recommended to terminate threads at application level, using a user-defined cancellation flag variable that the thread tests at convenient points. This allows the thread to safely release all system resources it has allocated before it terminates.
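Deferred cancellation can be tried out with a small sketch like the one below. The names spin_until_cancelled and cancel_demo are hypothetical. The target thread keeps the default state and type (enabled, deferred) and offers pthread_testcancel() as its only cancellation point; the source thread then checks that the termination status equals PTHREAD_CANCELED.

```c
#include <pthread.h>
#include <sched.h>

/* Target thread: loops until a pending cancellation request is honored
 * at the explicit cancellation point. */
static void *spin_until_cancelled(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_testcancel();   /* cancellation point */
        sched_yield();          /* let other threads run */
    }
    return NULL;                /* not reached */
}

/* Source thread side: returns 1 if the target ended by cancellation,
 * 0 if it ended in another way, -1 on error. */
int cancel_demo(void)
{
    pthread_t tid;
    void *status = NULL;

    if (pthread_create(&tid, NULL, spin_until_cancelled, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0)          /* send the request */
        return -1;
    if (pthread_join(tid, &status) != 0)   /* wait until it is honored */
        return -1;
    return status == PTHREAD_CANCELED ? 1 : 0;
}
```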

Thread Cleanup
Cleanup Handlers are functions that get called when a thread terminates (because pthread_exit() was called or because of cancellation). They free the resources (unlock synchronization devices, close open descriptors, etc.) that a thread may hold at the time of its termination. POSIX provides system calls for the installation and removal of Cleanup Handlers in stack-like (LIFO) order. On thread termination all Cleanup Handlers are executed in reverse order, beginning with the one most recently installed on the stack.
void pthread_cleanup_push(void (*handler) (void *), void *arg); - installs the handler with argument arg on the stack
void pthread_cleanup_pop (int execute); - removes the handler most recently installed on the stack and, if execute is nonzero, executes it
• Thread Cleanup in C++
  - In C++ the GUARD pattern is used for safe de-allocation of resources during thread termination.
  - The Guard class:
    - Allocates the specific resource in the class Constructor
    - De-allocates the already allocated resource in the class Destructor
  - The Guard is instantiated as an automatic variable in any function (or scope) where resource allocation is required.
  - When program execution leaves this scope, the destructors of all automatic variables are always executed.
  - As a result, the resource is de-allocated automatically when the corresponding Guard instance is destructed.
• Thread Cleanup in Java
  - In Java the try{…} catch{…} finally{…} construct can be used for guaranteed de-allocation of resources.
  - The finally block always executes when the try block exits.
  - This ensures that the finally block is executed even if an unexpected exception occurs.
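A Cleanup Handler that unlocks a mutex can be sketched as follows. The names lock, unlock_handler, locked_worker and cleanup_demo are hypothetical. The worker is cancelled while it owns the mutex; the handler releases it, which the caller verifies with pthread_mutex_trylock(). Note that pthread_cleanup_push() and pthread_cleanup_pop() must appear as a pair in the same lexical scope.

```c
#include <pthread.h>
#include <sched.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static int cleanup_calls = 0;   /* written by the worker, read after join */

/* Cleanup Handler: releases the mutex held by the terminating thread. */
static void unlock_handler(void *arg)
{
    cleanup_calls++;
    pthread_mutex_unlock((pthread_mutex_t *)arg);
}

static void *locked_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    pthread_cleanup_push(unlock_handler, &lock);
    for (;;) {
        pthread_testcancel();   /* cancellation is honored here */
        sched_yield();
    }
    pthread_cleanup_pop(1);     /* not reached; pairs with the push above */
    return NULL;
}

/* Returns how many times the handler ran (expected: 1), or -1 on error. */
int cleanup_demo(void)
{
    pthread_t tid;

    cleanup_calls = 0;
    if (pthread_create(&tid, NULL, locked_worker, NULL) != 0)
        return -1;
    if (pthread_cancel(tid) != 0 || pthread_join(tid, NULL) != 0)
        return -1;
    if (pthread_mutex_trylock(&lock) != 0)   /* would fail if still locked */
        return -1;
    pthread_mutex_unlock(&lock);
    return cleanup_calls;
}
```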

Multi-Threading & Fork. POSIX Fork-One Model
• The POSIX standard provides the "fork-one" model:
  - fork() duplicates the whole memory space, including lock objects, but only the one calling thread;
  - the other threads are not running in the child process.
[Figure: fork() of a Parent with Thread A, Thread B and Thread C (each with its own Stack, plus shared Heap, Data and Text) produces a Child that contains only Thread B with its Stack, Heap, Data and Text.]
Potential Deadlock Problem in the “Fork-One” Model.
If at the time of the fork() call another thread in the Parent owns a lock, this lock will never be unlocked in the Child process, because the lock owner thread is not duplicated in the Child. If any thread in the Child process then needs to acquire this lock, a Deadlock occurs.
int pthread_atfork(void (*prepare)(void), void (*parent)(void), void (*child)(void));
• The pthread_atfork() function declares fork handlers to be called prior to and following fork(), within the thread that called fork().
• Any of the prepare, parent or child handlers can be specified as NULL.
• Returns 0 on success or an error number on failure.
• The prepare handlers are called in LIFO order in the Parent process just before fork() processing begins.
• The parent handlers are called in FIFO order in the Parent process just after fork() processing finishes.
• The child handlers are called in FIFO order in the Child process just after fork() processing finishes.
Solution of the Potential Deadlock Problem in the “Fork-One” Model.
To avoid a potential Deadlock after fork(), the following “atfork” handlers can be installed before the fork():
- The prepare handler acquires the lock (blocking until another thread releases it).
- The parent and child handlers release the lock acquired by the prepare handler.
As a result, the lock object is released in the Child process and can be acquired later by any thread.
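The handler scheme described above can be sketched like this. All names (lock, prepare, parent_after, child_after, atfork_demo) are hypothetical, and the example uses a single mutex only; a real program would need one set of handlers covering every lock that the Child may touch.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

/* prepare: runs in the Parent before fork(); waits for and takes the lock. */
static void prepare(void)      { pthread_mutex_lock(&lock); }
/* parent: runs in the Parent after fork(); releases the lock again. */
static void parent_after(void) { pthread_mutex_unlock(&lock); }
/* child: runs in the Child after fork(); releases the Child's copy. */
static void child_after(void)  { pthread_mutex_unlock(&lock); }

/* Returns 0 if the Child could acquire the lock after fork(),
 * 1 if it could not, -1 on error. */
int atfork_demo(void)
{
    int status = 0;
    pid_t pid;

    if (pthread_atfork(prepare, parent_after, child_after) != 0)
        return -1;
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: the lock must be free here thanks to child_after(). */
        _exit(pthread_mutex_trylock(&lock) == 0 ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : 1;
}
```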

Signal Handling In Threads
• The POSIX standard provides the following system calls for handling signals in threads:
#include <pthread.h>
#include <signal.h>
int pthread_sigmask(int how, const sigset_t *newmask, sigset_t *oldmask);
• Changes the signal mask of the calling thread as described by the arguments:
  - how (SIG_SETMASK, SIG_BLOCK, SIG_UNBLOCK)
  - newmask (initialized and modified by sigXXXset() macros)
• If oldmask is not NULL, the previous signal mask is stored there.
• Any new thread inherits the signal mask of the thread that created it.
• Returns 0 on success, error code on failure.
• Note: signal masks are set on a per-thread basis, but signal actions (sigaction()) are shared between all threads.
int pthread_kill(pthread_t thread, int signo);
• Sends signal number signo to the specified thread.
• If signo is 0, no signal is actually sent. This is used to check whether the thread exists.
• Returns 0 on success, error code (ESRCH: thread doesn’t exist, EINVAL: bad parameter) on failure.
int sigwait(const sigset_t *set, int *sig);
• Suspends the calling thread until one of the signals in set becomes pending for the calling thread.
• Accepting the signal clears it from the pending signals mask and stores the signal number in sig.
• The signals in set must be blocked and not ignored on entry to sigwait().
• If the delivered signal has a signal handler function attached, that function is not called.
• sigwait() is a cancellation point.
• Returns 0 on success, error code on failure.
• On platforms that also support a non-POSIX version of the sigwait() call, the -D_POSIX_PTHREAD_SEMANTICS compilation flag is required.
Note: On systems that use SIGALRM to implement system call sleep(), it is recommended to use system call nanosleep() instead in a multi-threaded environment.
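A common way to combine these calls is a dedicated thread that accepts signals synchronously with sigwait(). The sketch below assumes SIGUSR1 as the signal of interest; the names signal_waiter and sigwait_demo are hypothetical. The signal is blocked before the thread is created so that the new thread inherits the mask.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdint.h>

/* Waits for one signal from the set and returns its number as status. */
static void *signal_waiter(void *arg)
{
    const sigset_t *set = arg;
    int sig = 0;

    if (sigwait(set, &sig) != 0)
        return (void *)(intptr_t)-1;
    return (void *)(intptr_t)sig;
}

/* Returns the signal number accepted by the waiter thread, or -1 on error.
 * SIGUSR1 stays blocked in the calling thread afterwards. */
int sigwait_demo(void)
{
    static sigset_t set;   /* static: not an automatic variable of the caller */
    pthread_t tid;
    void *status = NULL;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);
    /* Block first: the signal must be blocked on entry to sigwait(). */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&tid, NULL, signal_waiter, &set) != 0)
        return -1;
    if (pthread_kill(tid, SIGUSR1) != 0)   /* stays pending until accepted */
        return -1;
    if (pthread_join(tid, &status) != 0)
        return -1;
    return (int)(intptr_t)status;
}
```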

Thread Synchronization. Mutex
• A Mutex is a MUTual EXclusion device, used to serialize access to a section of code that must not be executed concurrently by more than one thread. A Mutex is useful for protecting shared data from concurrent modifications.
• A Mutex has two possible states:
  - locked (owned by one thread)
  - unlocked (not owned by any thread)
• A Mutex can be unlocked only by the same Thread that locked it.
• A Mutex can never be owned by two different threads simultaneously.
• A Thread attempting to lock a Mutex that is already locked by another Thread is suspended until the owning Thread unlocks the Mutex.
[Figure: Thread 1 and Thread 2 alternate Mutex.Lock / Mutex.Unlock on the same Mutex; at any moment at most one of them is the owner.]
• POSIX defines 3 types of Mutex:
  - Normal (Fast) Mutex (default type)
    - Can be locked only once by the same Thread.
    - An attempt to lock a Normal Mutex that is already locked by the same Thread leads to Deadlock.
  - Recursive Mutex
    - Can be locked repeatedly by the same Thread.
    - To be unlocked, the number of unlock operations must equal the number of performed locks.
  - Error Checking Mutex
    - Can be locked only once by the same Thread.
    - An attempt to lock an Error Checking Mutex that is already locked by the same Thread results in an error.

Mutex Initialization System Calls
int pthread_mutex_init (pthread_mutex_t *mutex, const pthread_mutexattr_t *mutexattr);
• Initializes the Mutex referenced by mutex with the attributes specified by mutexattr.
• If mutexattr is NULL, the default attributes are used.
• Returns 0 on success, error (EBUSY: already initialized, EINVAL: bad parameter, ENOMEM: no memory) on failure.
pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
• Static Mutex initializer. Can be used to initialize a mutex with default attributes.
int pthread_mutex_destroy (pthread_mutex_t *mutex);
• Destroys the Mutex referenced by mutex.
• Returns 0 on success, error (EBUSY: currently locked by another thread, EINVAL: bad parameter) on failure.
The PThread API provides type pthread_mutexattr_t, representing the Mutex Attributes container. The attribute discussed here is the Mutex Type, which has the following predefined values:
  PTHREAD_MUTEX_NORMAL - Normal (Default, Fast) Mutex can be locked only once; a repeated attempt leads to deadlock
  PTHREAD_MUTEX_RECURSIVE - Recursive Mutex permits repeated locks by the same thread; multiple locks must be followed by an equal number of unlock operations
  PTHREAD_MUTEX_ERRORCHECK - Error Check Mutex denies repeated lock attempts, returning an error
The following utilities are provided for Mutex Attributes container maintenance:
int pthread_mutexattr_init (pthread_mutexattr_t *attr); - container initializer
int pthread_mutexattr_settype(pthread_mutexattr_t *attr, int kind); - Mutex Type setter
int pthread_mutexattr_gettype(const pthread_mutexattr_t *attr, int *kind); - Mutex Type getter
int pthread_mutexattr_destroy(pthread_mutexattr_t *attr); - container deinitializer
• All these utilities return 0 on success, error EINVAL if a bad parameter is specified.

Mutex Locking System Calls
int pthread_mutex_lock (pthread_mutex_t *mutex);
• Locks the referenced mutex on behalf of the calling thread, which becomes the Mutex Owner.
• If the mutex is already locked by another thread, the calling thread blocks until the mutex becomes available.
• Note: If a signal is delivered to a thread waiting for a mutex, upon return from the signal handler the thread resumes waiting for the mutex as if it had not been interrupted.
• If the mutex is already locked by the same thread, the result depends on the mutex type, as follows:
  - PTHREAD_MUTEX_NORMAL: the repeated lock attempt leads to Recursive Deadlock
  - PTHREAD_MUTEX_RECURSIVE: returns with immediate success; the internal lock count is increased
  - PTHREAD_MUTEX_ERRORCHECK: returns with EDEADLK error
• Returns 0 on success or error on failure.
int pthread_mutex_trylock (pthread_mutex_t *mutex);
• Tries to lock the mutex like pthread_mutex_lock() does, but in non-blocking mode.
• If the mutex cannot be locked immediately, returns EBUSY error.
• Returns 0 on success, error on failure.
int pthread_mutex_unlock (pthread_mutex_t *mutex);
• Releases the referenced mutex. As a result, the mutex becomes available to any other waiting threads.
• If the mutex is of type PTHREAD_MUTEX_RECURSIVE, it becomes available to other threads only when the lock count reaches 0.
• Returns 0 on success or error EPERM if the current thread is not the owner of the mutex.
Note: The Normal (Default, Fast) Mutex is the basic mutex type, supported by most platforms and APIs. On platforms and APIs that don’t support the Recursive and Error Check mutex types, these types can be implemented at application level on top of the Normal Mutex. For this purpose the application maintains additional “owner id” and “lock count” attributes for each Normal Mutex.
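The error codes listed above can be observed with an Error Check mutex in a single thread. This is a sketch under the assumption that the platform implements PTHREAD_MUTEX_ERRORCHECK as POSIX describes it; the function name errorcheck_demo is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>

/* Returns 1 if the Error Check mutex reported EDEADLK, EBUSY and EPERM
 * in the situations described above, 0 if not, -1 on setup error. */
int errorcheck_demo(void)
{
    pthread_mutexattr_t attr;
    pthread_mutex_t mutex;
    int relock_rc, try_rc, unlock_rc;

    if (pthread_mutexattr_init(&attr) != 0)
        return -1;
    if (pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK) != 0 ||
        pthread_mutex_init(&mutex, &attr) != 0) {
        pthread_mutexattr_destroy(&attr);
        return -1;
    }
    pthread_mutexattr_destroy(&attr);   /* the mutex keeps its own copy */

    pthread_mutex_lock(&mutex);                 /* first lock succeeds */
    relock_rc = pthread_mutex_lock(&mutex);     /* same thread again */
    try_rc = pthread_mutex_trylock(&mutex);     /* cannot lock immediately */
    pthread_mutex_unlock(&mutex);               /* owner releases */
    unlock_rc = pthread_mutex_unlock(&mutex);   /* caller is not the owner */

    pthread_mutex_destroy(&mutex);
    return relock_rc == EDEADLK && try_rc == EBUSY && unlock_rc == EPERM;
}
```

With a Normal mutex the second pthread_mutex_lock() call would block forever instead of returning EDEADLK.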

Condition Variable
• A Condition Variable is a synchronization device that allows threads to suspend execution and release the processors until shared data changes to a desired state.
• The basic operations on Condition Variables are:
  - Wait for a specific state of the shared data, suspending the thread until another thread changes the shared data and notifies (signals) the Condition Variable that the state has changed.
  - Notify one (signal) or all (broadcast) threads waiting for a specific condition that the shared data state has changed.
• A Condition Variable is a stateless signaling device. A notification (signal) does not change the state of the device. It affects only the thread(s) waiting on this Condition Variable at the moment of the notification (signal).
• A Condition Variable must always be associated with a Mutex, to avoid the race condition where a thread prepares to wait on a Condition Variable and another thread notifies (signals) the condition just before the first thread actually waits on it.
[Figure: Thread 1 and Thread 2 share a Mutex and a CondVar. A thread that owns the Mutex and finds "State NOT Fitting" calls CondVar.Wait(Mutex); the other thread locks the Mutex, performs "Change State", calls CondVar.Notify and unlocks; the waiting thread wakes up as owner and proceeds once "State is Fitting".]

Condition Variable Initialization System Calls
int pthread_cond_init (pthread_cond_t *cond, const pthread_condattr_t *cond_attr);
• Initializes the Condition Variable referenced by cond with the attributes specified by cond_attr.
• If cond_attr is NULL, the default attributes are used.
• Returns 0 on success, error (EBUSY: already initialized, EINVAL: bad parameter, ENOMEM: no memory) on failure.
pthread_cond_t cond = PTHREAD_COND_INITIALIZER;
• Static Condition Variable initializer. Can be used to initialize a Condition Variable with default attributes.
int pthread_cond_destroy (pthread_cond_t *cond);
• Destroys the Condition Variable referenced by cond.
• Returns 0 on success, error (EBUSY: currently used by another thread, EINVAL: bad parameter) on failure.
The PThread API provides type pthread_condattr_t, representing the Condition Variable Attributes container, and the following utilities for its maintenance:
int pthread_condattr_init(pthread_condattr_t *attr); - container initializer
int pthread_condattr_destroy(pthread_condattr_t *attr); - container deinitializer
Currently the PThread API defines only one default type of Condition Variable. So the type pthread_condattr_t and the corresponding utilities are provided only for compliance with the POSIX standard.

Condition Variable Utilization System Calls
int pthread_cond_wait (pthread_cond_t *cond, pthread_mutex_t *mutex);
int pthread_cond_timedwait (pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime);
• These condition wait calls block the calling thread on a Condition Variable until another thread notifies (signals) the Condition Variable.
• They must be called while the Mutex referenced by mutex is locked by the calling thread.
• Both condition wait system calls provide the following functionality:
  - suspend the current thread
  - release the locked mutex
  - wait for a notification sent by another thread to the condition variable cond (or for expiration of the specified timeout, for pthread_cond_timedwait() only)
  - re-acquire the lock on mutex
  - unblock the current thread
• A condition wait call may wake up spuriously, for example after a UNIX signal has been delivered to the waiting thread. So, if a thread blocks on a condition variable while waiting for some logical condition to become true, the logical condition must be re-evaluated when the wait call returns.
• These system calls are cancellation points. If a cancellation request is acted upon during the wait, the mutex is re-acquired before the first cancellation cleanup handler is called.
• Return values: 0 on success, EINVAL: bad parameter or invalid concurrent usage of the condition variable with different mutex objects, ETIMEDOUT: timeout expiration. POSIX specifies that these calls never return EINTR.
int pthread_cond_signal (pthread_cond_t *cond);
int pthread_cond_broadcast (pthread_cond_t *cond);
• These condition signaling (condition notification) calls unblock (wake up) threads that are currently waiting on the condition variable cond.
• pthread_cond_signal() wakes up one of the threads waiting on cond; pthread_cond_broadcast() wakes up all threads waiting on cond.
• The condition notification calls have no effect if no threads are currently blocked on cond.
• To avoid a race condition, these calls should be made while the mutex is locked.
• The unblocking order of the waiting threads depends on the Scheduler Policy.
• Return 0 on success, error (EINVAL) on failure.

Condition Variable Example
Consider two shared variables x and y, protected by the mutex mut, and a condition variable cond that is to be signaled whenever x becomes greater than y.

Waiting until x is greater than y is performed as follows:

    pthread_mutex_lock(&mut);
    while (x <= y) {
        pthread_cond_wait(&cond, &mut);
    }
    /* operate on x and y */
    pthread_mutex_unlock(&mut);

Modifications of x and y that may cause x to become greater than y should signal the condition:

    pthread_mutex_lock(&mut);
    /* modify x and y */
    if (x > y) {
        pthread_cond_broadcast(&cond);
    }
    pthread_mutex_unlock(&mut);

To wait for x to become greater than y with a timeout of 5 seconds, do:

    struct timespec timeout;
    int retcode = 0;

    pthread_mutex_lock(&mut);
    /* absolute deadline: current time plus 5 seconds */
    clock_gettime(CLOCK_REALTIME, &timeout);
    timeout.tv_sec += 5;
    while (x <= y && retcode != ETIMEDOUT) {
        retcode = pthread_cond_timedwait(&cond, &mut, &timeout);
    }
    if (retcode == ETIMEDOUT) {
        /* timeout occurred */
    } else {
        /* operate on x and y */
    }
    pthread_mutex_unlock(&mut);

Note: To avoid leaving the mutex locked forever (in case of unexpected thread termination), a cleanup handler can be used.

Synchronization Devices in C++
In typical C++ APIs the Mutex and Condition Variable devices are represented by classes like the following:

    class Mutex
    {
        friend class CondVar;
    public:
        Mutex();
        virtual ~Mutex();
        // Locks the mutex, works in blocking mode.
        bool lock();
        // Unlocks the mutex
        bool unlock();
        // Tries to lock the mutex, works in non-blocking mode
        bool tryLock();
        …
    };

    class CondVar
    {
    public:
        CondVar();
        virtual ~CondVar();
        // Unlocks the locked Mutex, waits for notification from
        // another thread, then restores the Mutex lock.
        bool wait(Mutex& mutex, long sec = 0, long nsec = 0);
        // Unblocks the first thread wait()-ing on this Condition Variable
        bool notify();
        // Unblocks all threads wait()-ing on this Condition Variable
        bool notifyAll();
        …
    };

Together with class Thread, these classes encapsulate the specifics of the Operating System calls and provide an Object Oriented API for the development of multi-threaded applications.

The following pattern provides safe Mutex locking control:

    class MutexGuard
    {
    public:
        // Constructor: locks the specified Mutex
        MutexGuard(Mutex& mutex) : m_mutex(mutex) { m_mutex.lock(); }
        // Destructor: unlocks the Mutex
        ~MutexGuard() { m_mutex.unlock(); }
    private:
        // Reference to the Mutex to be locked/unlocked
        Mutex& m_mutex;
    };

Sometimes the pair (Mutex + Condition Variable) is called a Monitor. The Monitor plays the role of one global synchronization device, providing Facade functionality for the encapsulated mutex and condition variable.

Thread Synchronization in Java
• Unlike C++, classic Java standardized the multi-threading interface as part of the language syntax.
• Each instance of the generic base class java.lang.Object (and, therefore, any instance of any class) is declared to have its own Monitor (which can be considered an inseparable pair: Mutex + Condition Variable).
• In terms of Java, a thread owns the Monitor of a specific object in the following cases:
  - while executing a synchronized instance method of this object
  - while executing a synchronized static method of that class
  - inside a synchronized(obj){…} block
• In terms of C++, the synchronized block can be considered as:
    {
        MutexGuard dummy (obj.monitor.mutex); // owns object monitor
        …
    }
• Suspending a thread until a desired state of a specific object is achieved, and waking up the suspended threads, is provided in Java by the methods Object.wait(), Object.notify(), Object.notifyAll().
• In terms of Java, all these methods can be called successfully only by a thread that is the owner of this object's monitor. Otherwise the IllegalMonitorStateException is thrown.
• In terms of C++, the call to the method Object.wait() can be considered as:
    {
        MutexGuard dummy (obj.monitor.mutex); // owns object monitor
        …
        obj.monitor.cond_var.wait (obj.monitor.mutex); // waits for object monitor to be notified
    }
• The Object Monitor as an inseparable pair (Mutex + Condition Variable) is not always suitable for an optimal synchronization design. If different groups of functionalities (threads) need to wait for different states of the same synchronized object, it is useful to have multiple Condition Variables (one per waiting group) associated with the same single Mutex responsible for object synchronization.
• Such a design was impossible in Java until release SE 1.5. Beginning with release 1.5, Java introduced the new package java.util.concurrent.locks. This package contains class ReentrantLock, which is the analog of a recursive Mutex, and interface Condition, which is the analog of a Condition Variable. The method ReentrantLock.newCondition() is provided to create multiple Condition instances associated with the same instance of ReentrantLock.

Thread Synchronization by means of POSIX Semaphore
POSIX defines 2 forms of Semaphores: Named and Unnamed. The Unnamed semaphore is memory-based. Unlike other types of semaphores used for inter-process synchronization, the Unnamed semaphore can be declared as non-shared. In this case it is visible only within a single process and is used for inter-thread synchronization only.

    #include <semaphore.h>

    int sem_init(sem_t *sem, int pshared /* 0 = non-shared */, unsigned int value);
        - Initializes an unnamed non-shared Semaphore at address sem with initial value equal to value.
    int sem_destroy(sem_t *sem);
        - Destroys the Semaphore sem.
    int sem_wait(sem_t *sem);
        - Waits until the value of Semaphore sem becomes positive, then decrements (locks) the semaphore.
    int sem_timedwait(sem_t *sem, const struct timespec *abs_timeout);
        - The same as sem_wait(), except that abs_timeout limits how long the call may block if the decrement cannot be performed immediately.
    int sem_post(sem_t *sem);
        - Increments (unlocks) the Semaphore.

In most cases the Semaphore is used in one of the roles shown in the following examples: as a Resource Counter, or as a Blocking and Signaling device.
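The abs_timeout argument of sem_timedwait() is an absolute time measured against CLOCK_REALTIME, not a duration, which is easy to get wrong. The following is a minimal sketch, assuming a Linux-like platform where unnamed semaphores and sem_timedwait() are available; the helper name wait_with_timeout() is hypothetical.

```cpp
#include <semaphore.h>
#include <time.h>
#include <cassert>
#include <cerrno>

// Waits on `sem` for at most `millis` milliseconds.
// Returns 0 if the semaphore was decremented, ETIMEDOUT if the time limit
// expired, or another errno value on failure.
int wait_with_timeout(sem_t* sem, long millis) {
    struct timespec deadline;
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0) return errno;

    // Convert the relative delay into an absolute deadline.
    deadline.tv_sec  += millis / 1000;
    deadline.tv_nsec += (millis % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    int rc;
    do {
        rc = sem_timedwait(sem, &deadline);
    } while (rc == -1 && errno == EINTR);   // restart if interrupted by a signal

    return rc == 0 ? 0 : errno;
}
```

Because the deadline is absolute, restarting the call after EINTR does not extend the total waiting time.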

Semaphore Example. Resource Counter.
Suppose we want to keep no more than 5 connections to some database open simultaneously. The semaphore will be used as a counter of available (not acquired) connections.

    …
    #include <semaphore.h>
    #define MAX_CONNECTIONS 5
    sem_t sem_counter;

    int main()
    {
        /* init semaphore */
        sem_init(&sem_counter, 0, MAX_CONNECTIONS);
        …
    }

The following procedure will be used by a thread to acquire a DB connection:

    open_db_connection(…)
    {
        /* wait for at least one connection to be available */
        sem_wait(&sem_counter);
        /* now the actual DB connection can be opened */
        …
    }

The following procedure will be used by a thread to release a DB connection:

    close_db_connection(…)
    {
        /* close the actual DB connection */
        …
        /* increment the available connections counter;
           if "waiting" threads exist, one of them will be awakened */
        sem_post(&sem_counter);
    }
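The counter behaviour can be checked with a small experiment. This is a sketch only: the "connection" is simulated by a short sleep, and the names client(), in_use, peak and run_connection_demo() are hypothetical. It records the highest number of threads that held a slot at the same time.

```cpp
#include <pthread.h>
#include <semaphore.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>
#include <vector>

static const int MAX_CONNECTIONS = 5;
static sem_t sem_counter;
static pthread_mutex_t stats_mutex = PTHREAD_MUTEX_INITIALIZER;
static int in_use = 0;   // slots currently held
static int peak   = 0;   // highest value in_use has reached

static void* client(void*) {
    sem_wait(&sem_counter);                 // acquire one of the 5 slots

    pthread_mutex_lock(&stats_mutex);
    if (++in_use > peak) peak = in_use;
    pthread_mutex_unlock(&stats_mutex);

    usleep(1000);                           // stands in for real DB work

    pthread_mutex_lock(&stats_mutex);
    --in_use;
    pthread_mutex_unlock(&stats_mutex);

    sem_post(&sem_counter);                 // release the slot
    return NULL;
}

// Starts `clients` threads competing for the slots and returns the peak
// number of simultaneously held slots.
int run_connection_demo(int clients) {
    in_use = 0;
    peak = 0;
    sem_init(&sem_counter, 0, MAX_CONNECTIONS);
    std::vector<pthread_t> ids(clients);
    for (int i = 0; i < clients; ++i) pthread_create(&ids[i], NULL, client, NULL);
    for (int i = 0; i < clients; ++i) pthread_join(ids[i], NULL);
    sem_destroy(&sem_counter);
    return peak;
}
```

However many clients are started, the returned peak never exceeds MAX_CONNECTIONS, because each thread must decrement the semaphore before it is counted.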

Semaphore Example. Blocking and Signaling.
Consider two shared variables x and y, protected by a blocking semaphore sem_blocker, and a signaling semaphore sem_signaler that is signaled whenever x becomes greater than y. The initialization of these semaphores would be performed as follows:

    sem_t sem_blocker, sem_signaler;
    sem_init(&sem_blocker, 0, 1);    /* initially posted (unlocked) */
    sem_init(&sem_signaler, 0, 0);   /* initially not posted (not signaled) */

Waiting until x is greater than y is performed as follows:

    sem_wait(&sem_blocker);          /* lock */
    while (x <= y) {
        sem_post(&sem_blocker);      /* temporarily unlock to allow x, y to be modified */
        sem_wait(&sem_signaler);     /* "signaler" is stateful: no race condition here */
        sem_wait(&sem_blocker);      /* restore the lock to continue the work */
    }
    /* operate on x and y */
    sem_post(&sem_blocker);          /* unlock */

Modifications of x and y that may cause x to become greater than y should post (signal) the "signaler":

    sem_wait(&sem_blocker);          /* lock */
    /* modify x and y */
    if (x > y) {
        sem_post(&sem_signaler);
    }
    sem_post(&sem_blocker);          /* unlock */

Note: To prevent sem_blocker from remaining "locked" forever (in case of unexpected thread termination), a cleanup handler can be used.

Mutex or Semaphore?
• The Persistent Semaphore is still the main tool for Inter-Process synchronization.
• For Inter-Thread synchronization the memory-based non-shared Semaphore is optimal in the role of Resource Counter.
• In the role of Blocking and Signaling device the Semaphore is less universal than the Mutex + Condition Variable couple:
  - as a Blocking device it is non-recursive and can be mistakenly "unlocked" without a preceding "lock"
  - as a Signaling device it has no "broadcast" (notify all) capability
• The Semaphore is more complicated. The Mutex and Condition Variable are more primitive and, as a result, can be used as universal "bricks" for building more complicated synchronization devices with various synchronization scenarios.

Example. Building a "Semaphore" from Mutex and Condition Variable.
(Error codes are not checked, to keep the example simple.)

    typedef struct _app_sem_t {
        unsigned int    value;
        pthread_mutex_t mutex;
        pthread_cond_t  cond;
    } app_sem_t;                      /* constructed semaphore type */

    /* analog of sem_init() */
    void app_sem_init(app_sem_t* sem, unsigned int value)
    {
        sem->value = value;
        pthread_mutex_init(&sem->mutex, NULL);
        pthread_cond_init(&sem->cond, NULL);
    }

    /* analog of sem_destroy() */
    void app_sem_destroy(app_sem_t* sem)
    {
        pthread_mutex_destroy(&sem->mutex);
        pthread_cond_destroy(&sem->cond);
    }

    /* analog of sem_post() */
    void app_sem_post(app_sem_t* sem)
    {
        pthread_mutex_lock(&sem->mutex);
        sem->value++;
        pthread_cond_signal(&sem->cond);
        pthread_mutex_unlock(&sem->mutex);
    }

    /* analog of sem_wait() */
    void app_sem_wait(app_sem_t* sem)
    {
        pthread_mutex_lock(&sem->mutex);
        while (sem->value == 0) {
            pthread_cond_wait(&sem->cond, &sem->mutex);
        }
        sem->value--;
        pthread_mutex_unlock(&sem->mutex);
    }

Mutex versus Semaphore

| Mutex | Semaphore (Sys V, POSIX) |
|---|---|
| In most cases a memory-based object, allocated in the scope of a single process and existing only until the process finishes. | Can be a memory-based or a persistent object. Can be visible from different processes. Can exist independently of the process lifetime. |
| Used mostly for inter-thread synchronization. | Used for inter-process or inter-thread synchronization. |
| Has only 2 states: locked / unlocked. | Is a counter (or an array of counters in Sys V). |
| Supported operations: lock (trylock); unlock. | Supported operations: increment (Sys V) / post (POSIX); wait & decrement (Sys V) / wait (POSIX); wait zero (Sys V). The System V Semaphore also supports transactions. |
| Has a single Owner Thread. When locked, can be unlocked by the Owner only. | Does not have any "owner". Can be modified by multiple processes / threads simultaneously. |
| Used as a blocking device (recursive or non-recursive). In conditional scenarios often coupled with a Condition Variable, which is a stateless signaling device. | Can be used as: resource counter; non-recursive blocking device; stateful signaling device. |
| Physically locks the section of code, preventing access of non-owner threads to the code section. (Example: gate or barrier on the road.) | Needs logical "binding" to the shared resource. Used for synchronization by means of an application protocol agreement. (Example: traffic light.) |

Condition Variable versus Signaling Semaphore

| Condition Variable | Signaling Semaphore |
|---|---|
| Stateless device. When signaled, only the currently "waiting" threads can receive the signal. | Stateful device. "Signaling" and "waiting" can be performed asynchronously. |
| Supports "broadcast" signaling, awakening all "waiting" threads. | Each "post" (increment) operation awakens no more than one thread. |

Multi-threading & Once-Only Initialization
Suppose we have some initialization function which must be called at most once in our program:

    void init_something() {…}

In a single-threaded environment, to ensure a once-only call to this function, we can do the following:

"C" example:

    int flag = 0;
    void f()
    {
        …
        if (!flag) {
            init_something();
            flag = 1;
        }
        …
    }

"C++" example:

    struct Dummy {
        Dummy() { init_something(); }
    };
    void f()
    {
        …
        static Dummy dummy;
        …
    }

Most compilers emit object code which tests a hidden compiler-generated flag to see whether the static variable is already initialized. So, actually, both examples perform the same algorithm.

In a multi-threaded environment both examples above are thread-unsafe (for the C++ example this holds for pre-C++11 compilers; since C++11 the initialization of a function-local static is required to be thread-safe). If two or more threads simultaneously enter the critical section in function f(), the function init_something() may be executed more than once. To ensure thread safety, the critical section should be protected from simultaneous execution:

    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    int flag = 0;
    void f()
    {
        …
        pthread_mutex_lock(&mutex);
        if (!flag) {
            init_something();
            flag = 1;
        }
        pthread_mutex_unlock(&mutex);
        …
    }

Note 1: If init_something() is a cancellation point, the function f() also has to install cleanup handlers to avoid leaving the mutex locked forever.
Note 2: In this thread-safe example each thread calling the function f() performs 2 mutex operations (lock and unlock, each potentially a system call with a context switch) even when the call to init_something() is not performed.

Once-Only Initialization in POSIX

    pthread_once_t once_control = PTHREAD_ONCE_INIT;
    int pthread_once(pthread_once_t *once_control, void (*init_routine)(void));

• The function pthread_once() ensures that a piece of initialization code is executed at most once.
• The once_control argument points to a global variable statically initialized to PTHREAD_ONCE_INIT.
• The first time pthread_once() is called with a given once_control argument, it calls init_routine and changes the value of the once_control variable to "initialization performed".
• A subsequent call to pthread_once() with the same once_control argument:
  - blocks the calling thread, if any other thread is currently executing the same call with the same once_control argument
  - does not run init_routine(), if the once_control variable already has the value "initialization performed".
• pthread_once() is not a cancellation point. However, if the function init_routine is a cancellation point and is canceled, the effect on once_control is the same as if pthread_once() had never been called.
• If once_control is an automatic variable on the stack or is not initialized, the behavior is undefined.
• Returns 0 on success, an error code (EINVAL: bad argument) on failure.

Using pthread_once(), the previous example with at most one call to the initialization function init_something() can be implemented in the following way:

    pthread_once_t once_flag = PTHREAD_ONCE_INIT;
    void f()
    {
        …
        pthread_once(&once_flag, init_something);
        …
    }
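To see the "at most once" guarantee in action, the pattern above can be exercised from several threads. This is a small sketch; the counter init_calls and the driver run_once_demo() are hypothetical additions used only to observe the behaviour.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

static pthread_once_t once_flag = PTHREAD_ONCE_INIT;
static int init_calls = 0;            // modified only inside init_something()

static void init_something() { ++init_calls; }

// Thread body: every thread requests the initialization.
static void* f(void*) {
    pthread_once(&once_flag, init_something);
    return NULL;
}

// Starts `threads` threads that all call pthread_once() and returns how
// many times init_something() has run in total.
int run_once_demo(int threads) {
    std::vector<pthread_t> ids(threads);
    for (int i = 0; i < threads; ++i) pthread_create(&ids[i], NULL, f, NULL);
    for (int i = 0; i < threads; ++i) pthread_join(ids[i], NULL);
    return init_calls;                // safe to read: all threads are joined
}
```

A second invocation of run_once_demo() still reports one initialization, because once_flag stays in the "initialization performed" state for the lifetime of the process.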

Thread Local Storage (TLS). POSIX Thread Specific Data (TSD).
Static and global variables are normally shared by threads, because these variables are located in one memory space shared by all threads of the same process. Variables on the stack, however, are local to threads, because each thread has its own stack residing in a different memory location. So, with regular variables it is impossible to have global or static variables that have different values in different threads.
Thread-local storage (TLS) is a programming method which makes it possible to use static or global memory local to a thread. Thread-specific data (TSD) is the POSIX implementation of the TLS method.
• Each thread possesses a private memory block, the Thread-Specific Data Area (TSD Area).
• The TSD Area is indexed by TSD Keys.
• Each TSD Key in the TSD Area has an associated value of type void *, which can be NULL or a pointer to any thread-specific data.
• TSD Keys are common to all threads, but the value associated with a given TSD Key can be different in each thread.
• When a new thread is created, its TSD Area initially associates NULL with all keys already allocated in the scope of the current process.
• When a new TSD Key is allocated by request from some running thread, this Key becomes known and is associated with the NULL value in all currently executing threads.

[Figure: global TSD Keys (key1, key2, …, up to PTHREAD_KEYS_MAX; unused entries "not allocated") index the TSD Areas of Thread 1 and Thread 2, whose entries are NULL or point to thread-specific Data A, Data B, Data C.]
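For comparison with key-based TSD: since C++11 the language itself offers the thread_local storage class, which gives each thread its own copy of a global or static variable without explicit keys. This is a language feature newer than the POSIX interface described above, shown here only as a sketch; the names tls_counter, bump() and run_tls_demo() are hypothetical.

```cpp
#include <cassert>
#include <thread>

thread_local int tls_counter = 0;    // every thread gets an independent copy

// Increments the calling thread's own copy of tls_counter.
void bump(int times) {
    for (int i = 0; i < times; ++i) ++tls_counter;
}

// Bumps the counter 3 times in the calling thread and 5 times in a new
// worker thread; returns the value the worker observed in its own copy.
int run_tls_demo() {
    bump(3);                          // affects only the caller's copy
    int worker_value = -1;
    std::thread worker([&worker_value] {
        bump(5);                      // the worker's copy starts from 0
        worker_value = tls_counter;
    });
    worker.join();
    return worker_value;
}
```

The worker sees only its own 5 increments, and the caller's copy is unaffected by the worker.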

POSIX TSD Functions

    int pthread_key_create(pthread_key_t *key, void (*destr_function)(void *));

• This function allocates a new TSD Key. The allocated TSD Key is stored in the location pointed to by key.
• The value initially associated with the returned key is NULL in all currently executing threads.
• Note: There is a limit of PTHREAD_KEYS_MAX on the number of keys allocated at a given time.
• The destr_function argument, if not NULL, specifies a destructor function associated with the key.
• When a thread terminates via pthread_exit() or by cancellation, destr_function is called and receives as its argument the value associated with the Key in the specific thread. The destr_function is not called if that value is NULL.
• Note: The order in which destructor functions are called at thread termination time is unspecified.
• Before the destructor function is called, the NULL value is associated with the key in the current thread.
• Returns 0 on success, an error code (EAGAIN: key limit reached, ENOMEM: no memory) on failure.

    int pthread_key_delete(pthread_key_t key);

• This function de-allocates a TSD Key.
• Note: This call neither checks whether non-NULL values are associated with that Key in the currently executing threads, nor calls the destructor function associated with the key.

    int pthread_setspecific(pthread_key_t key, const void *pointer);

• This function changes the value associated with key in the calling thread, storing the given pointer instead.
• Note: The pointer argument should not refer to a stack (automatic) variable.

    void *pthread_getspecific(pthread_key_t key);

• This function returns the value currently associated with key in the calling thread.

Note: In Java (since SE 1.2) the Thread Local Storage pattern is represented by the class java.lang.ThreadLocal.
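The calls described above can be tied together in one short experiment. This is a sketch under the assumption of a POSIX platform; slot_key, free_slot(), worker() and run_tsd_demo() are hypothetical names. Each thread checks that its value starts as NULL, stores its own heap block under the shared key, and relies on the destructor to free it at thread exit.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <vector>

static pthread_key_t slot_key;
static pthread_mutex_t count_mutex = PTHREAD_MUTEX_INITIALIZER;
static int destructor_calls = 0;

// Destructor: receives the terminating thread's non-NULL value.
static void free_slot(void* value) {
    free(value);
    pthread_mutex_lock(&count_mutex);
    ++destructor_calls;
    pthread_mutex_unlock(&count_mutex);
}

// Returns its argument on success, NULL if any expectation failed.
static void* worker(void* arg) {
    bool ok = (pthread_getspecific(slot_key) == NULL);   // new thread: NULL
    intptr_t* slot = static_cast<intptr_t*>(malloc(sizeof(intptr_t)));
    if (slot == NULL) return NULL;
    *slot = reinterpret_cast<intptr_t>(arg);
    if (pthread_setspecific(slot_key, slot) != 0) { free(slot); return NULL; }
    intptr_t* seen = static_cast<intptr_t*>(pthread_getspecific(slot_key));
    ok = ok && seen == slot && *seen == reinterpret_cast<intptr_t>(arg);
    return ok ? arg : NULL;
}

// True if every thread saw its own value and every value was destroyed.
bool run_tsd_demo(int threads) {
    int before = destructor_calls;
    if (pthread_key_create(&slot_key, free_slot) != 0) return false;
    std::vector<pthread_t> ids(threads);
    for (int i = 0; i < threads; ++i) {
        void* id = reinterpret_cast<void*>(static_cast<intptr_t>(i + 1));
        pthread_create(&ids[i], NULL, worker, id);
    }
    int passed = 0;
    for (int i = 0; i < threads; ++i) {
        void* ret = NULL;
        pthread_join(ids[i], &ret);
        if (ret != NULL) ++passed;
    }
    pthread_key_delete(slot_key);     // all values already freed by destructors
    return passed == threads && destructor_calls - before == threads;
}
```

The key is deleted only after all threads are joined, since pthread_key_delete() itself does not run destructors for values still stored under the key.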

TSD Usage Example: User-Defined Thread Name Utilities
This example allocates a thread-specific array of 100 characters, with automatic reclamation at thread exit. The array is used to store the user-defined thread name, which can be retrieved by any function called later during the thread's execution.

    #include <pthread.h>
    #include <stdlib.h>
    #include <string.h>

    #define MAX_NAME_LENGTH 100

    /*---- Supporting "private" variables and functions ----*/
    static pthread_key_t  name_buffer_key;                 /* TSD Key for name buffer storage */
    static pthread_once_t once_flag = PTHREAD_ONCE_INIT;   /* Once-only initializer for TSD Key allocation */

    /* Frees the thread-specific name buffer */
    static void destroy_name_buffer(void *name_buffer_ptr)
    {
        free(name_buffer_ptr);
    }

    /* Allocates the TSD Key for name buffer storage */
    static void allocate_name_buffer_key(void)
    {
        pthread_key_create(&name_buffer_key, destroy_name_buffer);
    }

    /*---- User-defined thread name "public" utilities ----*/

    /* Allocates the thread-specific buffer (once only) and stores the thread-specific name there */
    void set_thread_name(const char *name_string)
    {
        char *name_buffer_ptr;
        pthread_once(&once_flag, allocate_name_buffer_key);               /* Allocate TSD Key: once only per process */
        name_buffer_ptr = (char *) pthread_getspecific(name_buffer_key);  /* Get name buffer pointer */
        if (NULL == name_buffer_ptr) {                                    /* Allocate name buffer: once only per thread */
            name_buffer_ptr = (char *) malloc(MAX_NAME_LENGTH);
            if (NULL == name_buffer_ptr) return;                          /* Out of memory: the name is not stored */
            memset(name_buffer_ptr, '\0', MAX_NAME_LENGTH);
            pthread_setspecific(name_buffer_key, name_buffer_ptr);
        }
        strncpy(name_buffer_ptr, name_string, MAX_NAME_LENGTH - 1);       /* Store the name into the name buffer */
    }

    /* Gets the thread-specific name (NULL if no name was set in this thread) */
    char *get_thread_name(void)
    {
        pthread_once(&once_flag, allocate_name_buffer_key);   /* The key must exist before it is read */
        return (char *) pthread_getspecific(name_buffer_key); /* Get name buffer pointer */
    }

Singleton Pattern & Multi-threading
In software engineering, the Singleton design pattern is used to restrict instantiation of a class to a single object. Commonly the Singleton is implemented as a class with prohibited copying (in C++ this means a private copy constructor and a private operator=()) and with a private constructor, which is used to create at most one instance of the class. The public interface of a Singleton commonly has a static method getInstance(), providing access to the single instance. The Singleton can be implemented as a Statically or a Dynamically (Lazily) constructed instance.

• Singleton with Static Instantiation:
  The instance is created during application start-up.
  - Advantage: When the application has started, the Singleton is "ready to use".
  - Disadvantages:
    We spend resources on instance initialization even if the method getInstance() is never called.
    The C++ standard does not specify the order of initialization of static data members across translation units during application start-up. If the constructor of the Singleton needs access to any static data in other classes, its static construction may crash in C++.

    class Singleton {
    public:
        static Singleton* getInstance() { return &m_instance; }
    private:
        …                             // prohibited copy constructor & operator=()
        Singleton(…) {…}              // private constructor
        static Singleton m_instance;
        …                             // private data members
    };
    // initialization of static data member in .cpp file
    Singleton Singleton::m_instance(…);

• Singleton with Dynamic (Lazy) Instantiation:
  The instance is created during the first call to the method getInstance().
  - Advantages:
    Initialization is performed only when actually required.
    Does not depend on the static data initialization order in C++.
  - Disadvantage:
    In a multi-threaded environment this implementation is thread-unsafe. The method getInstance() has a critical region between the check of the m_pInstance pointer value (1) and its assignment (2). If two threads simultaneously call the method getInstance() for the first time, the Singleton may be instantiated twice.

    class Singleton {
    public:
        static Singleton* getInstance() {
            if (m_pInstance == NULL) {              // 1)
                m_pInstance = new Singleton(…);     // 2)
            }
            return m_pInstance;
        }
    private:
        …                             // prohibited copy constructor & operator=()
        Singleton(…) {…}              // private constructor
        static Singleton* m_pInstance;
        …                             // private data members
    };
    // initialization of static data member in .cpp file
    Singleton* Singleton::m_pInstance = NULL;

Thread-Safe Singleton & Double-Checked Locking Pattern
• Thread-Safe Singleton with Dynamic (Lazy) Instantiation:
  The thread-safe implementation of a Lazy Singleton can be achieved by synchronizing the getInstance() method using a Mutex or a Once-Only Initializer.
  - Advantage: This is a reliable thread-safe implementation of the Singleton.
  - Disadvantage: Each access to the Singleton acquires a lock. Actually the lock is necessary only during first-time initialization. As a result, n calls to the Singleton perform n-1 superfluous lock operations.

    class Singleton {
    public:
        static Singleton* getInstance() {
            MutexGuard guard(m_mutex);              // thread-safe
            if (m_pInstance == NULL) {
                m_pInstance = new Singleton(…);
            }
            return m_pInstance;
        }
    private:
        …                             // prohibited copy constructor & operator=()
        Singleton(…) {…}              // private constructor
        static Singleton* m_pInstance;
        static Mutex m_mutex;
        …                             // private data members
    };
    // initialization of static data members in .cpp file
    Singleton* Singleton::m_pInstance = NULL;
    Mutex Singleton::m_mutex;

• Double-Checked Locking Pattern (DCLP):
  - Access to an already initialized Singleton is lock-free.
  - The lock is acquired only if m_pInstance is NULL.
  - The 2nd check after lock acquisition ensures that another thread did not perform the initialization while the calling thread was acquiring the lock.
  - Advantage: DCLP avoids superfluous lock operations.
  - Disadvantage: DCLP as written here DOES NOT WORK reliably with modern Optimizing Compilers and Optimizing Processors.
    (See the short explanation on the following slides.
    See also: "C++ and the Perils of Double-Checked Locking" http://www.aristeia.com/Papers/DDJ_Jul_Aug_2004_revised.pdf )

    static Singleton* getInstance() {
        if (m_pInstance == NULL) {                  // 1st check
            MutexGuard guard(m_mutex);
            if (m_pInstance == NULL) {              // 2nd check
                m_pInstance = new Singleton(…);
            }
        }
        return m_pInstance;
    }
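The slide mentions the Once-Only Initializer as an alternative to the Mutex but shows only the Mutex variant. A minimal sketch of the once-only variant, assuming POSIX pthread_once(), could look like this. The construction counter and the driver run_singleton_demo() are hypothetical and exist only to make the behaviour observable.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <vector>

class Singleton {
public:
    static Singleton* getInstance() {
        pthread_once(&m_once, &Singleton::create);   // runs create() at most once
        return m_pInstance;
    }
    static int constructions() { return m_constructions; }
private:
    Singleton() { ++m_constructions; }               // private constructor
    Singleton(const Singleton&);                     // copying prohibited
    Singleton& operator=(const Singleton&);
    static void create() { m_pInstance = new Singleton(); }

    static Singleton*     m_pInstance;
    static pthread_once_t m_once;
    static int            m_constructions;           // demo-only counter
};

Singleton*     Singleton::m_pInstance     = NULL;
pthread_once_t Singleton::m_once          = PTHREAD_ONCE_INIT;
int            Singleton::m_constructions = 0;

static void* use_singleton(void*) { return Singleton::getInstance(); }

// True if all threads received the same instance and it was built once.
bool run_singleton_demo(int threads) {
    std::vector<pthread_t> ids(threads);
    for (int i = 0; i < threads; ++i) pthread_create(&ids[i], NULL, use_singleton, NULL);
    bool same = true;
    for (int i = 0; i < threads; ++i) {
        void* seen = NULL;
        pthread_join(ids[i], &seen);
        if (seen != Singleton::getInstance()) same = false;
    }
    return same && Singleton::constructions() == 1;
}
```

Whether pthread_once() is cheaper than an explicit lock on the already-initialized path depends on the library implementation, so this variant addresses correctness first and cost only possibly.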

DCLP and Instruction Ordering by Optimizing Compilers
In response to the code:

    m_pInstance = new Singleton(…);

the compiler actually generates the following sequence of operations:

    Singleton* tmp = (Singleton*) operator new(sizeof(Singleton));   // Step 1: memory allocation
    new (tmp) Singleton;                                             // Step 2: memory initialization
    m_pInstance = tmp;                                               // Step 3: assignment

The Problem: Although the life cycle of a class instance in C++ begins only after its constructor has finished successfully, some Optimizing Compilers can reorder Steps 2 and 3. As a result, the Singleton instance becomes visible to other threads while it is NOT yet fully initialized.
The Explanation: As languages, neither C nor C++ (before the C11 and C++11 standards) have constraints to express the ordering. To define safe synchronization primitives, system-specific libraries (like POSIX PThread) commonly use assembler instructions in their implementation. As a result, a sequence of operations that accesses the data without going through the constraints of the system-specific library can be reordered by the compiler.

• The volatile keyword in C and C++.
  The C and C++ standards define the keyword volatile. It tells the compiler that the object can change at any time (by hardware, asynchronous kernel activity, etc.), and that the compiler must restrict its optimizations when working with such an object. Every reference to a volatile object must be a genuine (actual) reference, as follows:
  - The system must always re-read the current value of a volatile object at the point it is requested, even if a previous instruction already read a value from the same object.
  - The system must write the value of the object immediately on assignment.
• The Problem: Even after full "volatilization" of the DCLP pattern (defining m_pInstance, tmp and the Singleton instance itself as volatile) the pattern does not become reliable.
• The Explanation: The following two issues cannot be solved by using the volatile keyword:
  - A "volatilized" call to the constructor, new volatile Singleton(…), first initializes the instance, and only then actually declares it as volatile. As a result, reordering can still occur during instance initialization.
  - The Standard prevents compilers from reordering "read" and "write" operations on volatile data within a single thread, but it imposes no restrictions on such reordering across multiple threads.

DCLP & Optimizing Processors

Cache Coherency Problem
On a machine with multiple processors, each processor has its own memory cache. To modify a shared resource, each processor caches data from "main" memory, updates the data in the cache, and then flushes the updated data to "main" memory. The data updated by one processor can have inconsistent copies in the caches of other processors. The Cache Coherency Problem is inter-cache inconsistency in the value of a shared resource.

Memory Barriers
Memory Barriers are instructions for the compiler or processor that limit the reordering of reads and writes of shared memory in multiprocessor systems. Memory Barriers are used to solve the Cache Coherency Problem.
Note: In C and C++ before the 2011 standards, reliable solutions with Memory Barriers cannot be implemented in the language itself. Such an implementation requires platform-specific code written in assembler.

For all that, why does the Mutex still work?
As already stated, neither C nor C++ (before C11 and C++11) have language constraints to restrict the reordering of "read" and "write" operations across multiple threads. System-specific libraries (like POSIX PThread) implement safe synchronization primitives by calling system-specific assembler code that issues memory barrier instructions. Actually, Mutex lock acquisition leads not only to exclusive locking of the code section by the mutex owner, but also to synchronization of the processor memory cache with "main" memory.

The Conclusion
Without language-level atomic operations (introduced only in C11 and C++11), the only way to provide safe multi-threaded access to a shared data resource in C and C++ is to use the synchronization primitives (mutex, once-only initializer, etc.) provided by system-specific libraries (like POSIX PThread). Access to a shared resource without synchronization leads to data inconsistency because of the implicit operation reordering and data caching performed by Optimizing Compilers and Optimizing Processors.
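The statements above describe C and C++ as they were before the 2011 standards. C++11 added std::atomic with explicit memory ordering, which gives a portable way to express the required barriers. The following is a sketch of DCLP under that assumption (a C++11 compiler); it is not part of the original material, and the payload member and run_dclp_demo() are hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

class Singleton {
public:
    static Singleton* getInstance() {
        // 1st check, lock-free. Acquire pairs with the release store below,
        // so a non-null pointer implies a fully constructed object.
        Singleton* p = m_pInstance.load(std::memory_order_acquire);
        if (p == nullptr) {
            std::lock_guard<std::mutex> guard(m_mutex);
            // 2nd check, under the lock (the mutex already orders this read).
            p = m_pInstance.load(std::memory_order_relaxed);
            if (p == nullptr) {
                p = new Singleton();
                // Publish only after construction has completed.
                m_pInstance.store(p, std::memory_order_release);
            }
        }
        return p;
    }
    int payload() const { return m_payload; }
private:
    Singleton() : m_payload(42) {}
    Singleton(const Singleton&) = delete;
    Singleton& operator=(const Singleton&) = delete;

    int m_payload;
    static std::atomic<Singleton*> m_pInstance;
    static std::mutex m_mutex;
};

std::atomic<Singleton*> Singleton::m_pInstance(nullptr);
std::mutex Singleton::m_mutex;

// True if every thread got the same, fully initialized instance.
bool run_dclp_demo(int threads) {
    std::vector<Singleton*> seen(threads, nullptr);
    std::vector<std::thread> pool;
    for (int i = 0; i < threads; ++i)
        pool.emplace_back([&seen, i] { seen[i] = Singleton::getInstance(); });
    for (std::thread& t : pool) t.join();
    for (Singleton* p : seen)
        if (p != Singleton::getInstance() || p->payload() != 42) return false;
    return true;
}
```

The release store keeps Step 2 (initialization) from being reordered after Step 3 (assignment) as far as any thread performing the acquire load is concerned, which is exactly the guarantee that the plain pointer and volatile cannot give.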

For all that, Is It Possible to Avoid Superfluous Locking?
• Possibilities to avoid superfluous locking still exist:
  - If the Singleton constructor does not access any static data in other classes, we can use Static Instantiation.
  - We can remove any locking from the method getInstance(), but call it once from the Main Thread before any other Thread is created.
  - If we still want to use Lazy Instantiation, optimization is also possible.
• In DCLP we unsuccessfully tried to limit the number of lock acquisitions to a single lock. The following solution demonstrates the possibility of limiting the number of locks to a single lock per accessing thread:

Thread-Safe Lazy Singleton optimization by means of Thread Local Storage:
• The synchronized (via Mutex or Once-Only Initializer) code is executed by a specific Thread only once, during the first access from this Thread to the Singleton.
• The obtained reference to the Singleton is saved in Thread Local Storage (TLS).
• All subsequent calls to the Singleton are served via the reference saved in Thread Local Storage, without superfluous locking.

    class Singleton {
    public:
        static Singleton* getInstance() {
            Singleton* tmp = (Singleton*) m_tlsKey.getValue();   // get from TLS
            if (tmp == NULL) {
                MutexGuard guard(m_mutex);                       // thread-safe
                if (m_pInstance == NULL) {
                    m_pInstance = new Singleton(…);
                }
                tmp = m_pInstance;
                m_tlsKey.setValue(tmp);                          // store in TLS
            }
            return tmp;
        }
    private:
        …                             // prohibited copy constructor & operator=()
        Singleton(…) {…}              // private constructor
        static Singleton* m_pInstance;
        static Mutex m_mutex;
        static ThreadLocalKey m_tlsKey;
        …                             // private data members
    };
    // initialization of static data members in .cpp file
    Singleton* Singleton::m_pInstance = NULL;
    Mutex Singleton::m_mutex;
    ThreadLocalKey Singleton::m_tlsKey;

Singleton in Java
• Java took volatile a step further than C++ by guaranteeing reordering restrictions across multiple threads.
• Since Java 1.5 the Memory Model has been standardized. The volatile keyword has more restrictive, but simpler semantics:
  - Any read of a volatile is guaranteed to occur prior to any memory reference in the subsequent statements.
  - Any write to a volatile is guaranteed to occur after all memory references in the preceding statements.

The DCLP is reliable only since Java 1.5, and only with the instance field declared volatile
(See "The 'Double-Checked Locking is Broken' Declaration": http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html )

    class Singleton {
        private static volatile Singleton instance = null;
        private Singleton() { }                       // Private constructor
        public static Singleton getInstance() {
            if (instance == null) {
                synchronized (Singleton.class) {
                    if (instance == null)
                        instance = new Singleton();
                }
            }
            return instance;
        }
    }

"Initialization On Demand Holder" Pattern
A correct thread-safe lazy-loaded Java solution working in any Java version was suggested by Bill Pugh. It is known as the "Initialization On Demand Holder" pattern.

    public class Singleton {
        private Singleton() { }                       // Private constructor
        // Inner Singleton Holder class
        private static class SingletonHolder {
            private static final Singleton instance = new Singleton();
        }
        public static Singleton getInstance() {
            return SingletonHolder.instance;
        }
    }

Note: Java (since SE 1.5) provides the package java.util.concurrent.atomic, containing "volatilized" data types and lock-free atomic operations.

Common Oversights in Multi-Threaded Programs
• These are the most frequent oversights that cause bugs in multi-threaded programs:
  - An argument passed to a new thread points to the stack (to an automatic variable) of the caller thread. As a result, the newly created thread can dereference the argument after its de-allocation on the stack of the caller thread.
  - A shared global memory location (global variable) with changeable state is accessed without exclusive lock protection by two or more threads, and at least one of the threads tries to write to the location. As a result, the order of accesses is non-deterministic and leads to Data Race bugs.
  - Two threads try to acquire rights to the same pair of global resources in opposite order. As a result, a Deadlock occurs.
  - Trying to reacquire a non-recursive lock that is already held. As a result, a Recursive Deadlock occurs.
  - A protected code segment contains a call to a function that releases and reacquires the synchronization before returning to the caller. As a result, the data has actually not been protected, and the caller is not aware of a hidden gap in synchronization protection.
  - Mixing UNIX signals with threads without using the sigwait() model for handling asynchronous signals.
  - Long-jumping away (calling setjmp() and longjmp() in C, or throwing an exception in C++) without releasing the mutex locks.
  - Failing to re-evaluate the conditions after returning from a call to pthread_cond_wait() or pthread_cond_timedwait().
  - Forgetting that by default threads are created as PTHREAD_CREATE_JOINABLE and must be reclaimed with pthread_join(). Note that pthread_exit() does not free the thread's storage space.
  - Making deeply nested recursive calls and using large automatic arrays can cause problems, because multi-threaded programs have a more limited stack size than single-threaded programs.
  - Specifying an inadequate stack size, or using non-default stacks.

In general, multi-threading bugs are statistical rather than deterministic. Multi-threaded programs with such bugs often behave differently in two successive runs, even with identical inputs. This behavior is caused by differences in the order in which threads are scheduled. In such cases tracing is a more effective method of bug finding than breakpoint-based debugging.
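Two of the oversights listed above (a thread argument pointing to the caller's stack, and joinable threads that are never reclaimed) can be avoided together with one simple ownership rule. This is one possible approach, sketched with hypothetical names (Task, square(), start_square(), finish_square()): the argument lives on the heap, the thread hands it back as its return value, and the joiner frees it.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>

struct Task {
    int input;
    int result;
};

// Thread body: works on the heap-allocated Task it was given.
static void* square(void* arg) {
    Task* task = static_cast<Task*>(arg);
    task->result = task->input * task->input;
    return task;                      // ownership goes back to the joiner
}

// Starts the thread and returns immediately. The argument is allocated on
// the heap, so it stays valid after this function's stack frame is gone.
bool start_square(int input, pthread_t* id) {
    Task* task = new Task();
    task->input = input;
    task->result = 0;
    if (pthread_create(id, NULL, square, task) != 0) {
        delete task;                  // thread was not created: free here
        return false;
    }
    return true;
}

// Reclaims the joinable thread and releases the Task it returned.
int finish_square(pthread_t id) {
    void* returned = NULL;
    pthread_join(id, &returned);      // also frees the thread's own resources
    Task* task = static_cast<Task*>(returned);
    int result = task->result;
    delete task;
    return result;
}
```

Had start_square() passed the address of a local Task instead, the new thread could run after the function returned and dereference memory that no longer holds the argument.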

Design of Asynchronous Functionalities.

Asynchronous Processing. Producer-Consumer Pattern
Any server usually has two different functional parts: Request Listening and Request Processing. In most cases it is very useful to separate these functionalities and to provide a mediating mechanism for data exchange between them. Such a design not only allows easy implementation of concurrent request processing, but also has advantages even for an iterative server, making it possible:
- To avoid loss of Requests (because of socket buffer overflow) when processing of a single request takes a long time.
- To provide even CPU utilization in case of uneven Request traffic.
Such a solution is also useful for servers that simultaneously provide different types of Request Listeners (for example, sockets and shared memory), to provide uniform processing of Requests accepted from different sources.

[Figure: the Request Listener pushes data into the Mediation Data Buffer; the Request Processor pops data from it.]

In multi-processing and multi-threading programming this solution is known as the Producer-Consumer Pattern:
- The Producer repeatedly generates a piece of data and puts it into the Mediation Buffer.
- The Consumer repeatedly extracts a piece of data from the Mediation Buffer and processes it.
- The main idea of the pattern is to make sure that the Producer won't try to add data into the Buffer if it's full, and that the Consumer won't try to remove data from the Buffer if it's empty.

Repeatable Thread Utilization. Thread Pool Pattern
In multi-threaded systems the creation of each new thread requires additional system resources. So, it is useful to provide repeatable utilization of already created threads. On the other hand, holding a large number of "unemployed" threads is also unjustified. For these purposes the well-known Thread Pool Pattern is usually used. A Thread Pool minimizes the unjustified usage of system resources and commonly implements the following functionalities:
- Dynamic creation of new threads
- Repeatable utilization of already created threads
- Dynamic termination of hibernated "unemployed" threads

Note: The Java Platform (since SE 1.5) provides its own implementation of different patterns for asynchronous processing in the package java.util.concurrent.

Blocking Queue: Functionality and Design
• The Blocking Queue is one of the possible implementations of the Mediation Buffer for the Producer-Consumer Pattern.
• It is a thread-safe FIFO (first-in, first-out) data container that provides the following functionalities:
  - Synchronized methods for storage and extraction of data items,
  - Blocking of Consumers that try to extract data while the Queue is empty,
  - Blocking of Producers that try to store new data while the Queue is full.
• An implementation of such a container would provide the following methods:
• Constructor
  Builds the Queue instance. It could accept an optional High Water Mark parameter that specifies the maximal allowed Queue capacity, to protect the application from memory overload.
• push(), tryPush()
  These methods push a new object to the back of the Queue. They are the blocking and non-blocking versions of the operation used by Producer(s).
  The push() method could block if the High Water Mark is specified and the maximal allowed Queue capacity is reached. In this case the Producer waits until a Consumer decreases the used capacity of the Queue.
• pop(), tryPop()
  These methods pop the next object (if one exists) from the front of the Queue. They are the blocking and non-blocking versions of the operation used by Consumer(s).
  The pop() method could block if the Queue is empty. In this case the Consumer waits until a Producer pushes a new object to the Queue.
• dispose()
  Because the Queue has blocking methods, waiting threads must be unblocked before the Queue is destroyed.
  This method could accept a parameter that specifies the disposal mode. Deferred disposal means that objects already contained in the Queue remain there until they are popped by Consumer(s). Immediate disposal means that such objects are discarded. In both modes, pushing new objects to the Queue is prohibited after disposal.
• Destructor
  Disposes of the Queue, if that has not been done yet.
• Attention: To avoid an application crash, disposing of and emptying the Queue is not enough. All Producer and Consumer threads must be prevented from accessing the Queue instance (or must be joined) before the Queue is deleted.
Note: In Java (SE, since 1.5) the Blocking Queue pattern is represented by the interface java.util.concurrent.BlockingQueue.
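The method list above can be sketched in standard C++ as follows. This is a minimal illustration under stated assumptions, not the course's own implementation: the class layout, the use of `bool` return values to report disposal, and the convention that a High Water Mark of 0 means "unbounded" are all choices made for this sketch.

```cpp
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <utility>

// Sketch of a Blocking Queue; highWaterMark == 0 means "no capacity limit"
// (an assumption of this sketch).
template <typename T>
class BlockingQueue {
public:
    explicit BlockingQueue(std::size_t highWaterMark = 0)
        : m_highWaterMark(highWaterMark) {}

    // Blocks while the Queue is full; returns false if the Queue is disposed.
    bool push(T item) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notFull.wait(lock, [this] { return m_disposed || !isFull(); });
        if (m_disposed) return false;
        m_items.push_back(std::move(item));
        m_notEmpty.notify_one();
        return true;
    }

    // Non-blocking: fails at once if the Queue is full or disposed.
    bool tryPush(T item) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_disposed || isFull()) return false;
        m_items.push_back(std::move(item));
        m_notEmpty.notify_one();
        return true;
    }

    // Blocks while the Queue is empty; returns false once the Queue is
    // disposed and drained.
    bool pop(T& out) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_notEmpty.wait(lock, [this] { return m_disposed || !m_items.empty(); });
        return takeFront(out);
    }

    // Non-blocking: fails at once if the Queue is empty.
    bool tryPop(T& out) {
        std::lock_guard<std::mutex> lock(m_mutex);
        return takeFront(out);
    }

    // Deferred disposal keeps queued objects; immediate disposal discards them.
    // Either way, all blocked Producers and Consumers are released.
    void dispose(bool immediate = false) {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_disposed = true;
        if (immediate) m_items.clear();
        m_notEmpty.notify_all();
        m_notFull.notify_all();
    }

private:
    bool isFull() const {
        return m_highWaterMark != 0 && m_items.size() >= m_highWaterMark;
    }

    // Caller must hold m_mutex.
    bool takeFront(T& out) {
        if (m_items.empty()) return false;
        out = std::move(m_items.front());
        m_items.pop_front();
        m_notFull.notify_one();
        return true;
    }

    const std::size_t m_highWaterMark;
    bool m_disposed = false;
    std::deque<T> m_items;
    std::mutex m_mutex;
    std::condition_variable m_notEmpty;
    std::condition_variable m_notFull;
};
```

The sketch deliberately leaves the destructor trivial: as the Attention note says, the caller is still responsible for joining Producer and Consumer threads before the Queue object goes away.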

Synchronized Request Queue Examples
[Diagram: three server layouts. Iterative Server: Socket -> Listener -> Request Queue -> Processor. Concurrent Server: Socket -> Listener -> Request Queue -> Processors. Concurrent Server: Socket -> Listener -> Request Queue -> Dispatcher -> Processors. Legend: listener thread, processor thread, dispatcher thread, request, request processing, data flow]
• Scenario 1. Iterative Server
  • One Listener and one Processor
  • The Listener accepts a Request from the Socket and puts it into the Request Queue.
  • The Processor extracts Requests from the Queue one by one and processes them.
• Scenario 2. Concurrent Server
  • One Listener and a static number of Processors
  • The Listener accepts a Request from the Socket and puts it into the Request Queue.
  • The static number of Processors compete to extract Requests from the Queue and process them one by one.
• Scenario 3. Concurrent Server
  • One Listener, one Dispatcher and a dynamic number of Processors
  • The Listener accepts a Request from the Socket and puts it into the Request Queue.
  • The Dispatcher extracts Requests from the Queue one by one and starts a separate Processor for each Request.
  • Each Processor processes a single Request and terminates.
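Scenario 2 can be sketched in a few lines of standard C++. The sketch below is only an illustration: the `Request` type, the function name `serveConcurrently`, and the "processing" step (summing payloads) are hypothetical, and the Listener reads from an in-memory vector instead of a real socket.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical request type used only for this illustration.
struct Request { int payload; };

// One Listener thread feeds the Request Queue; a static number of Processor
// threads compete for the queued Requests. Returns the sum of all payloads.
long serveConcurrently(const std::vector<Request>& incoming, unsigned processorCount) {
    std::queue<Request> requestQueue;
    std::mutex mutex;
    std::condition_variable notEmpty;
    bool listenerDone = false;
    std::atomic<long> total{0};

    // Listener: stands in for "accept Request from Socket, put it into Queue".
    std::thread listener([&] {
        for (const Request& r : incoming) {
            {
                std::lock_guard<std::mutex> lock(mutex);
                requestQueue.push(r);
            }
            notEmpty.notify_one();
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            listenerDone = true;
        }
        notEmpty.notify_all();
    });

    // Processors: each one extracts and processes Requests one by one.
    std::vector<std::thread> processors;
    for (unsigned i = 0; i < processorCount; ++i) {
        processors.emplace_back([&] {
            for (;;) {
                std::unique_lock<std::mutex> lock(mutex);
                notEmpty.wait(lock, [&] { return listenerDone || !requestQueue.empty(); });
                if (requestQueue.empty()) return;  // Listener finished, Queue drained
                Request r = requestQueue.front();
                requestQueue.pop();
                lock.unlock();
                total += r.payload;  // "process" the Request outside the lock
            }
        });
    }

    listener.join();
    for (std::thread& p : processors) p.join();
    return total.load();
}
```

Scenario 1 is the same code with `processorCount` equal to 1; Scenario 3 would replace the fixed Processor threads with a Dispatcher that starts one thread per Request.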

Repeatable Utilization of Resources. Object Pool
• The Object Pool is a standard pattern for the storage of reusable objects.
  • When any functionality requires an object instance, the instance is retrieved from the Object Pool.
  • After use, the instance is not destroyed but is returned to the Object Pool for future reuse.
  • The main service provided by the Object Pool is object storage.
  • An additional, optional service is dynamic object allocation and de-allocation.
• Traditional Object Pool interface:
  • Get Object - gets an object from storage. If no object is available, the Pool could allocate one.
  • Release Object - returns the object to the Pool, where it is stored in a "passive" state until future reuse.
• Traditional Object Pool functionalities:
  • Dynamic Object Allocation (optional).
    For this purpose the Pool would contain an Object Factory.
    The maximal number of object instances could be restricted.
  • Object Storage (mandatory)
  • Dynamic Object De-allocation (optional).
    In this case each object has an expiration time. The Pool would have a "Garbage Collector" Thread that periodically checks the state of the objects in the Pool. If an Object stays in the "passive" (unused) state longer than the specified expiration time, the Garbage Collector de-allocates it.
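The Get Object / Release Object interface can be sketched as below. This is an illustration only: the class name, the `std::function` factory, and the choice to return a null pointer when the instance limit is reached are assumptions of the sketch, and the optional expiration-based de-allocation is left out to keep it short.

```cpp
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Sketch of an Object Pool with an Object Factory and an instance limit.
template <typename T>
class ObjectPool {
public:
    using Factory = std::function<std::unique_ptr<T>()>;

    ObjectPool(Factory factory, std::size_t maxObjects)
        : m_factory(std::move(factory)), m_maxObjects(maxObjects) {}

    // "Get Object": reuse a passive object; otherwise allocate a new one if
    // the limit allows. Returns nullptr when the limit is reached.
    std::unique_ptr<T> get() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (!m_passive.empty()) {
            std::unique_ptr<T> obj = std::move(m_passive.back());
            m_passive.pop_back();
            return obj;
        }
        if (m_allocated >= m_maxObjects) return nullptr;
        ++m_allocated;
        return m_factory();
    }

    // "Release Object": store the object in the "passive" state for reuse.
    void release(std::unique_ptr<T> obj) {
        if (!obj) return;
        std::lock_guard<std::mutex> lock(m_mutex);
        m_passive.push_back(std::move(obj));
    }

    std::size_t passiveCount() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_passive.size();
    }

private:
    Factory m_factory;
    const std::size_t m_maxObjects;
    std::size_t m_allocated = 0;
    mutable std::mutex m_mutex;
    std::vector<std::unique_ptr<T>> m_passive;
};
```

Ownership moves out of the Pool on `get()` and back in on `release()`, so an object that is never released simply counts against the limit for the lifetime of the Pool.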

Thread Pool: Functionality and Design
[Diagram: Thread Pool with public methods AddTask(), Dispose() and Join(), and queued Runnables (R). Pool state: Task Queue; Max thread count; Total thread count; Passive thread count; Expiration time; Mutex and CondVar; Dispose flag. Thread loop:
    Thread::Run {
        while(Pool.GetTask()) { Task.Run(); }
        Pool.Deregister(this);
    }
]
• The Thread Pool is a pool of threads. It has the following specifics:
  • The service provided by the Thread Pool is not only the allocation and storage of Threads, but also the asynchronous, simultaneous processing of scheduled tasks by a dynamically changing number of reusable threads.
  • Instead of the methods "Get Object" / "Release Object", the Thread Pool traditionally provides the method "Add Task", which schedules runnable tasks for asynchronous execution by the Pool's threads.
  • The Thread Pool does not require a separate "Garbage Collector" Thread. Each of the Pool's threads is able to terminate itself if it has not been used during the specified expiration timeout.
• Public Interface of the Thread Pool:
  • Constructor - specifies the Maximal Thread Count and the Thread Expiration Time (maximal hibernation timeout).
  • addTask() - adds a Runnable Task to the Pool; if needed, creates a new Thread or activates a "passive" Thread if one exists.
  • dispose() - unblocks all waiting Pool Threads.
  • join() - waits for the termination of all Pool Threads.
• Private Interface of the Thread Pool, used by its Threads:
  • getTask() - marks the calling Thread as "passive" and blocks until the next Runnable Task is added, then marks the Thread as "active" and returns the Task for processing by the calling Thread.
  • deregisterThread() - decreases the total thread count.
• Private Interface of the Pool's Threads:
  • Constructor - visible from the Thread Pool only; creates a Daemon Thread and stores a reference to the Pool, which is notified when the Thread finishes processing.
  • run() - loops, getting and processing Tasks, until an empty Task is returned (when the hibernation timeout has expired or the Pool is disposed).
Note: In Java (SE, since 1.5) the Thread Pool pattern is represented by the class java.util.concurrent.ThreadPoolExecutor.

Thread Pool Usage Examples
[Diagram: Scenarios 1, 2 and 3, each showing Sockets, Request Queues and a Thread Pool holding Runnables (R). Legend: listener thread, dispatcher thread, processor thread, data flow, request, request processing. Runnables: request processing, dispatching, combined]
• Scenario 1.
  • From each accepted Request the Listener builds one Runnable and schedules it to the Thread Pool.
  • The Runnable contains the Request Data and the Request Processing algorithm.
• Scenario 2.
  • The Listener puts accepted Requests into the Request Queue and schedules one Runnable per accepted Request to the Thread Pool.
  • The Runnable contains the algorithm for extracting Request Data from the Request Queue and for the subsequent Request Processing.
• Scenario 3.
  • The Listener only puts accepted Requests into the Request Queue.
  • The Dispatcher Thread extracts Requests from the Request Queue, builds one Runnable per Request and schedules it to the Thread Pool.
  • The Runnable contains the Request Data and the Request Processing algorithm.

Thread Pool Usage Examples (continuation)
[Diagram: Scenarios 4 and 5, each showing a Socket, a Request Queue and a Thread Pool started with an Initial Task and holding Runnables (R)]
• Scenario 4.
  • The Listener only puts accepted Requests into the Request Queue.
  • The Thread Pool is initialized with one Dispatcher Runnable, which extracts Request Data from the Request Queue, builds one Request Processing Runnable per extracted Request and schedules it to the Thread Pool.
  • As a result, one thread of the Thread Pool always performs the work of the Dispatcher Thread from Scenario 3.
  • The Request Processing Runnable scheduled by the Dispatcher contains the Request Data and the Request Processing algorithm.
• Scenario 5.
  • The Listener only puts accepted Requests into the Request Queue.
  • The Thread Pool is initialized with one Combined Runnable, which implements part of the Dispatcher functionality together with the Request Processing algorithm. It performs the following steps:
    • Extract the data of a single Request from the Request Queue
    • Schedule a new Combined Runnable to the Thread Pool
    • Run the Data Processing algorithm on the extracted Request Data.
  • As a result:
    • All Threads in the Pool execute the same Combined Runnable algorithm.
    • The Dispatcher functionality is distributed in parts among all running Threads.
    • No more than one Runnable waits for processing in the Thread Pool's internal queue of Runnables.

RWLock - Multi Read / Single Write Lock Pattern
Some systems that provide multiple simultaneous access to shared data need a more complicated kind of locking than the binary logic (Locked/Unlocked) provided by a Mutex. Such systems (for example, airline ticketing stations) should be able to read the data concurrently (find an available "seat"), but only one of them should be able to change the state of the data (reserve a "seat") at a given time.
The Read-Write Lock (RWLock) provides non-exclusive read-only access and exclusive write access to the shared data.
Some Unix (FreeBSD, Solaris) and Linux (Debian, Ubuntu) operating systems support the RWLock pattern and implement the set of RWLock-related functions defined by POSIX:
RWLock Attributes related functions:
• pthread_rwlockattr_init(), pthread_rwlockattr_destroy()
  Constructor and Destructor of the RWLock attributes container of type pthread_rwlockattr_t
• pthread_rwlockattr_getpshared(), pthread_rwlockattr_setpshared()
  Getter and Setter for the RWLock Sharing Mode attribute, which accepts the following values:
  - "Process-Private" (default) mode - the lock is used for synchronization between threads of a single process only
  - "Process-Shared" mode - the lock could be used for inter-process synchronization
RWLock related functions:
• pthread_rwlock_init(), pthread_rwlock_destroy()
  Constructor and Destructor of the RWLock object of type pthread_rwlock_t
• pthread_rwlock_rdlock(), pthread_rwlock_tryrdlock()
  Acquire a non-exclusive Read Lock on the RWLock object on behalf of the calling thread, in blocking or non-blocking mode
• pthread_rwlock_wrlock(), pthread_rwlock_trywrlock()
  Acquire an exclusive Write Lock on the RWLock object on behalf of the calling thread, in blocking or non-blocking mode
• pthread_rwlock_unlock()
  Releases a lock (Read or Write) held on the RWLock object by the calling thread
Note: In Java (SE, since 1.5) the RWLock pattern is represented by the class java.util.concurrent.locks.ReentrantReadWriteLock.
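A short usage sketch of these POSIX functions, written in C++ and following the "seat reservation" example from the text. The `SeatTable` structure and the `seatTable...` helper names are hypothetical and exist only for this illustration; error handling is reduced to the minimum.

```cpp
#include <pthread.h>

// Hypothetical shared state: the number of free seats, guarded by an RWLock.
struct SeatTable {
    pthread_rwlock_t lock;
    int freeSeats;
};

bool seatTableInit(SeatTable& t, int seats) {
    t.freeSeats = seats;
    // A null attributes pointer selects the defaults ("Process-Private").
    return pthread_rwlock_init(&t.lock, nullptr) == 0;
}

// Reader: many threads may hold the Read Lock at the same time.
int seatTableFree(SeatTable& t) {
    pthread_rwlock_rdlock(&t.lock);
    int n = t.freeSeats;
    pthread_rwlock_unlock(&t.lock);
    return n;
}

// Writer: the Write Lock is exclusive, so only one reservation runs at a time.
bool seatTableReserve(SeatTable& t) {
    pthread_rwlock_wrlock(&t.lock);
    bool ok = t.freeSeats > 0;
    if (ok) --t.freeSeats;
    pthread_rwlock_unlock(&t.lock);
    return ok;
}

// Non-blocking writer: gives up instead of waiting when the lock is busy.
bool seatTableTryReserve(SeatTable& t) {
    if (pthread_rwlock_trywrlock(&t.lock) != 0) return false;
    bool ok = t.freeSeats > 0;
    if (ok) --t.freeSeats;
    pthread_rwlock_unlock(&t.lock);
    return ok;
}

void seatTableDestroy(SeatTable& t) {
    pthread_rwlock_destroy(&t.lock);
}
```

The same `pthread_rwlock_unlock()` call releases both kinds of lock, which is why the reader and writer helpers end identically.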

Trivial RWLock Implementation Example
This example demonstrates a simple C++ implementation of RWLock functionality, similar to most POSIX implementations in Unix and Linux:

class RWLock
{
private:
    // state: 0 - unlocked, -1 - write-locked,
    // >0 - number of read locks.
    int m_lockState;
    // number of pending writers
    int m_waitingWriters;
    // synchronization devices
    Mutex m_mutex;
    CondVar m_cond;

public:
    // constructor
    RWLock() : m_lockState(0), m_waitingWriters(0) {}
    ...
    // acquires recursive non-exclusive Read Lock
    void readLock()
    {
        MutexGuard guard(m_mutex);
        while ((m_lockState == -1) || (m_waitingWriters != 0))
        {
            m_cond.wait(m_mutex);
        }
        m_lockState++;
    }

    // acquires non-recursive exclusive Write Lock
    void writeLock()
    {
        MutexGuard guard(m_mutex);
        while (m_lockState != 0)
        {
            m_waitingWriters++;
            m_cond.wait(m_mutex);
            m_waitingWriters--;
        }
        m_lockState = -1;
    }

    // releases last acquired lock
    void unlock()
    {
        MutexGuard guard(m_mutex);
        if (0 == m_lockState)
            return;                 // already unlocked
        if (m_lockState == -1)
            m_lockState = 0;        // write unlock
        else
            m_lockState--;          // read unlock
        m_cond.notifyAll();
    }
}; // end class RWLock

RWLock Acquisition Policies: Preference Order
[Diagram: "Write-First" preference order for Read Locks (Ra) and a Write Lock (Wa) on object a. Legend: acquired lock; pending lock; released lock; R - read lock; W - write lock; a, b - object name; wait]
• Common Policies
  All RWLock implementations provide the following two obvious lock acquisition rules:
  • A Read Lock can be acquired immediately if no other thread holds a Write Lock on the RWLock object. If another thread already holds the Write Lock, the Read Lock operation is blocked.
  • A Write Lock can be acquired immediately if no other thread holds a Read Lock or a Write Lock on the RWLock object. Otherwise the Write Lock operation is blocked until all other threads release their locks on this object.
• Acquisition Preference Order
  Two different threads try to acquire a Read or Write Lock on the same RWLock object. Which one goes first? Different platforms provide different preference ordering for lock access:
  • Random-order access (Java, non-fair mode, the default) - the order is not specified.
  • Arrival-order access (Java, fair mode) - the longest-waiting single writer or the longest-waiting group of readers acquires the lock.
  • "Writer-first" order (used by a number of POSIX implementations; POSIX itself leaves the preference implementation-defined for non-realtime threads) - to avoid starvation of writers, the Write Lock has higher priority than the Read Lock. This means:
    - If the Write Lock cannot be acquired immediately (a Read Lock is held by one or more other threads), it is marked as a Pending Write Lock and is blocked until all other threads release their previously acquired Read Locks.
    - All subsequent Read Lock operations are blocked while a Pending Write Lock exists.
    - The acquisition order of two Write Locks requested by different threads depends on the Scheduler Policy (for non-realtime threads in Linux the order is unspecified).

RWLock Acquisition Policies: Recursivity and Reentrancy
[Diagram: three cases on object a. Recursive Read Lock (Ra, Ra). Recursive Write Lock (Wa, Wa): acquired in Java, leads to Deadlock in POSIX. Recursive Read Lock with a Pending Writer in between (Ra, Wa, Ra): the Recursive Read Lock is not Reentrant in POSIX, and the existence of a Pending Writer can lead to Deadlock]
Lock Recursivity
Recursivity means that the same thread can acquire multiple concurrent Locks of the same type (Read or Write), followed by a matching number of unlock operations.
A Recursive Read Lock is supported by most RWLock implementations (POSIX, Java).
A Recursive Write Lock is supported by Java, but is not supported by POSIX implementations. In most POSIX implementations an attempt to take a recursive Write Lock leads to unspecified behavior of the RWLock object or to a Recursive Deadlock.
Note: To support a Recursive Write Lock, the implementation would explicitly maintain the ID of the thread holding the Write Lock, so that repeated Write Lock operations are permitted to this thread only.
Lock Reentrancy
Reentrancy means guaranteed success of a repeated acquisition of a Recursive Lock (Read or Write) that is already held (at least once) by the calling thread.
The Java implementation guarantees Reentrancy of Read and Write Lock operations.
In POSIX implementations the Write Lock is not Recursive, and the Read Lock is Recursive but not Reentrant. If a Pending Write Lock appears between two sequential acquisitions of the Read Lock by the same thread, the second acquisition waits for the writer while the writer waits for the first Read Lock to be released, which is a Deadlock.
Note: To support Reentrancy of Read and Write Locks, the implementation would explicitly maintain the IDs of the threads holding the Write Lock and the Read Locks.
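The first Note (keeping the ID of the Write Lock holder) can be illustrated with a small sketch. It covers only the write side of an RWLock and is an assumption-laden illustration: the class name `RecursiveWriteLock`, the depth counter and the `tryWriteLock()` helper are not part of the original material, and the read side is omitted.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Sketch: a Write Lock that its holder may acquire repeatedly.
class RecursiveWriteLock {
public:
    void writeLock() {
        std::unique_lock<std::mutex> lock(m_mutex);
        const std::thread::id self = std::this_thread::get_id();
        if (m_depth > 0 && m_owner == self) {  // repeated lock by the holder
            ++m_depth;
            return;
        }
        m_cond.wait(lock, [this] { return m_depth == 0; });
        m_owner = self;
        m_depth = 1;
    }

    // Non-blocking variant: fails if another thread holds the lock.
    bool tryWriteLock() {
        std::lock_guard<std::mutex> lock(m_mutex);
        const std::thread::id self = std::this_thread::get_id();
        if (m_depth > 0 && m_owner != self) return false;
        m_owner = self;
        ++m_depth;
        return true;
    }

    // Each writeLock()/tryWriteLock() success needs one matching unlock.
    void writeUnlock() {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (m_depth == 0 || m_owner != std::this_thread::get_id()) return;
        if (--m_depth == 0) {
            m_owner = std::thread::id();
            m_cond.notify_all();
        }
    }

    int depth() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_depth;
    }

private:
    mutable std::mutex m_mutex;
    std::condition_variable m_cond;
    std::thread::id m_owner;  // ID of the Write Lock holding thread
    int m_depth = 0;          // number of unmatched lock operations
};
```

The owner check is what turns a would-be Recursive Deadlock into a simple counter increment; without it, the second `writeLock()` call would wait for a release that only the waiting thread itself could perform.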

RWLock Acquisition Policies: Downgrading and Upgrading
[Diagram: three cases on object a. Acquisition of a Read Lock by the Write Holder Thread (Wa, Ra) leads to Deadlock in POSIX. Lock Downgrading in Java (Wa, Ra, release Wa). Lock Upgrading (Ra, Wa): without Deadlock Prevention functionality the scenario leads to inevitable Deadlocks]
• Lock Downgrading Ability
  • The ability to Downgrade the RWLock means that a Write Holder Thread can become the Read Holder of the specific RWLock object.
  • A possible downgrade scenario looks as follows:
    • The Thread holds the Write Lock on a specific RWLock object.
    • The Thread acquires a Read Lock on the same object.
    • The Thread releases the Write Lock, retaining the acquired Read Lock.
  • Lock Downgrading is supported in Java.
  • Acquisition of a Read Lock by the Write Holder Thread is not supported by POSIX and can lead to a Recursive Deadlock.
  • Note: To support Lock Downgrading, the RWLock implementation would explicitly maintain the ID of the thread holding the Write Lock, so that Read Lock acquisition is permitted to this thread only.
• Lock Upgrading Ability
  • The ability to Upgrade the RWLock means that a Read Holder Thread can become the Write Holder of the specific RWLock object.
  • A possible upgrade scenario looks as follows:
    • The Thread holds a Read Lock on a specific RWLock object.
    • The Thread acquires the Write Lock on the same object.
    • The Thread optionally releases the Read Lock, retaining the acquired Write Lock.
  • Lock Upgrading is NOT supported by Java and POSIX, because it leads to inevitable Deadlocks: when two Read Holder Threads try to upgrade at the same time, each waits for the other to release its Read Lock.
  • Note: Unlike other RWLock related scenarios, Lock Upgrading leads to Deadlocks not because of a design deficiency, but because of the nature of the RWLock pattern itself.
    To support Lock Upgrading, a system implementing RWLock must have Deadlock Prevention functionality that automatically interrupts potentially dangerous lock operations, to avoid real deadlocking of the threads using the RWLock.
