1 / 50

Parallel Programming with PThreads

Parallel Programming with PThreads. Threads. Sometimes called a lightweight process smaller execution unit than a process Consists of: program counter register set stack space Threads share: memory space code section OS resources(open files, signals, etc.). Threads.

perdy
Download Presentation

Parallel Programming with PThreads

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Programming with PThreads

  2. Threads • Sometimes called a lightweight process • smaller execution unit than a process • Consists of: • program counter • register set • stack space • Threads share: • memory space • code section • OS resources(open files, signals, etc.)

  3. Threads • A process is defined to have at least one thread of execution • A process may launch other threads which execute concurrently with the process • Switching between threads is faster • No memory management issues, etc. • Mutual exclusion problems:

  4. Threads

  5. Why Threads? • Software Portability • Latency Hiding • Scheduling and Load Balancing • Ease of Programming and Widespread use

  6. POSIX Threads • Thread API available on many OS’s • #include <pthread.h> • cc myprog.c –o myprog -lpthread • Thread creation • int pthread_create(pthread_t * thread, pthread_attr_t * attr, void * (*start_routine)(void *), void * arg); • Thread termination • void pthread_exit(void *retval); • Waiting for Threads • int pthread_join(pthread_t th, void **thread_return);

  7. #include <pthread.h> #include <stdio.h> int print_message_function( void *ptr ); int x = 1; main() { pthread_t thread1, thread2; int thread1result, thread2result; char *message1 = "Hello"; char *message2 = "World"; pthread_attr_t *pthread_attr_default = NULL; printf("Begin\n"); pthread_create( &thread1, pthread_attr_default, (void*)&print_message_function, (void*) message1); pthread_create(&thread2, pthread_attr_default, (void*)&print_message_function, (void*) message2); pthread_join(thread1, (void *)&thread1result); printf("End thread1 with %d\n", thread1result); pthread_join(thread2, (void *)&thread2result); printf("End thread2 with %d\n", thread2result); exit(0); } int print_message_function( void *ptr ) { char *message; message = (char *) ptr; printf("%s ", message); fflush(stdout); return x++; }

  8. Thread Issues • Thread function only gets • One void * argument and void * return • Reentrance • False Sharing • Not same global variable, but within cache line

  9. Effects of False Sharing

  10. Synchronization Primitives • int pthread_mutex_init( pthread_mutex_t *mutex_lock, const pthread_mutexattr_t *lock_attr); • int pthread_mutex_lock( pthread_mutex_t *mutex_lock); • int pthread_mutex_unlock( pthread_mutex_t *mutex_lock); • int pthread_mutex_trylock( pthread_mutex_t *mutex_lock);

  11. #include <pthread.h> void *find_min(void *list_ptr) pthread_mutex_t minimum_value_lock; int minimum_value, partial_list_size; main(){ minimum_value = MIN_INT; pthread_init(); pthread_mutex_init(&minimum_value_lock, NULL); /*inititalize lists etc, create and join threads*/ } void *find_min(void *list_ptr){ int *partial_list_ptr, my_min = MIN_INT, i; partial_list_ptr = (int *)list_ptr; for (i = 0; i < partial_list_size; i++) if (partial_list_ptr[i] < my_min) my_min = partial_list_ptr[i]; pthread_mutex_lock(minimum_value_lock); if (my_min < minimum_value) minimum_value = my_min; pthread_mutex_unlock(minimum_value_lock); pthread_exit(0); }

  12. Locking Overhead • Serialization points • Minimize the size of critical sections • Be careful • Rather than wait, check if lock is available • Pthread_mutex_trylock • If already locked, will return EBUSY • Will require restructuring of code

  13. /* Finding k matches in a list */ void *find_entries(void *start_pointer) { /* This is the thread function */ struct database_record *next_record; int count; current_pointer = start_pointer; do { next_record = find_next_entry(current_pointer); count = output_record(next_record); } while (count < requested_number_of_records); } int output_record(struct database_record *record_ptr) { int count; pthread_mutex_lock(&output_count_lock); output_count ++; count = output_count; pthread_mutex_unlock(&output_count_lock); if (count <= requested_number_of_records) print_record(record_ptr); return (count); }

  14. /* rewritten output_record function */ int output_record(struct database_record *record_ptr) { int count; int lock_status; lock_status=pthread_mutex_trylock(&output_count_lock); if (lock_status == EBUSY) { insert_into_local_list(record_ptr); return(0); } else { count = output_count; output_count += number_on_local_list + 1; pthread_mutex_unlock(&output_count_lock); print_records(record_ptr, local_list, requested_number_of_records - count); return(count + number_on_local_list + 1); } }

  15. Controlling Thread and Synchronization Attributes • The Pthreads API allows a programmer to change the default attributes of entities using attributes objects. • An attributes object is a data-structure that describes entity (thread, mutex, condition variable) properties. • Once these properties are set, the attributes object can be passed to the method initializing the entity. • Enhances modularity, readability, and ease of modification.

  16. Attributes Objects for Threads • Use pthread_attr_init to create an attributes object. • Individual properties associated with the attributes object can be changed using the following functions: pthread_attr_setdetachstate, pthread_attr_setguardsize_np, pthread_attr_setstacksize, pthread_attr_setinheritsched, pthread_attr_setschedpolicy, and pthread_attr_setschedparam

  17. Attributes Objects for Mutexes • Initialize the attrributes object using function: pthread_mutexattr_init. • The function pthread_mutexattr_settype_np can be used for setting the type of mutex specified by the mutex attributes object. pthread_mutexattr_settype_np ( pthread_mutexattr_t *attr, int type); • Here, type specifies the type of the mutex and can take one of: • PTHREAD_MUTEX_NORMAL_NP • PTHREAD_MUTEX_RECURSIVE_NP • PTHREAD_MUTEX_ERRORCHECK_NP

  18. Condition Variables for Synchronization • A condition variable allows a thread to block itself until specified data reaches a predefined state. • A condition variable is associated with a predicate. • When the predicate becomes true, the condition variable is used to signal one or more threads waiting on the condition. • A single condition variable may be associated with more than one predicate. • A condition variable always has a mutex associated with it. • A thread locks this mutex and tests the predicate defined on the shared variable. • If the predicate is not true, the thread waits on the condition variable associated with the predicate using the function pthread_cond_wait.

  19. Using Condition Variables Main Thread * Declare and initialize global data/variables which require synchronization (such as "count") * Declare and initialize a condition variable object * Declare and initialize an associated mutex * Create threads A and B to do work Thread A * Do work up to the point where a certain condition must occur (such as "count" must reach a specified value) * Lock associated mutex and check value of a global variable * Call pthread_cond_wait() to perform a blocking wait for signal from Thread-B. Note that a call to pthread_cond_wait() automatically and atomically unlocks the associated mutex variable so that it can be used by Thread-B. * When signalled, wake up. Mutex is automatically and atomically locked. * Explicitly unlock mutex * Continue Thread B * Do work * Lock associated mutex * Change the value of the global variable that Thread-A is waiting upon. * Check value of the global Thread-A wait variable. If it fulfills the desired condition, signal Thread-A. * Unlock mutex. * Continue

  20. Condition Variables for Synchronization • Pthreads provides the following functions for condition variables: int pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex); int pthread_cond_signal(pthread_cond_t *cond); int pthread_cond_broadcast(pthread_cond_t *cond); int pthread_cond_init(pthread_cond_t *cond, const pthread_condattr_t *attr); int pthread_cond_destroy(pthread_cond_t *cond);

  21. Producer-Consumer Using Locks pthread_mutex_t task_queue_lock; int task_available; ... main() { .... task_available = 0; pthread_mutex_init(&task_queue_lock, NULL); .... } void *producer(void *producer_thread_data) { .... while (!done()) { inserted = 0; create_task(&my_task); while (inserted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 0) { insert_into_queue(my_task); task_available = 1; inserted = 1; } pthread_mutex_unlock(&task_queue_lock); } } }

  22. Producer-Consumer Using Locks void *consumer(void *consumer_thread_data) { int extracted; struct task my_task; /* local data structure declarations */ while (!done()) { extracted = 0; while (extracted == 0) { pthread_mutex_lock(&task_queue_lock); if (task_available == 1) { extract_from_queue(&my_task); task_available = 0; extracted = 1; } pthread_mutex_unlock(&task_queue_lock); } process_task(my_task); } }

  23. Producer-Consumer Using Condition Variables pthread_cond_t cond_queue_empty, cond_queue_full; pthread_mutex_t task_queue_cond_lock; int task_available; /* other data structures here */ main() { /* declarations and initializations */ task_available = 0; pthread_init(); pthread_cond_init(&cond_queue_empty, NULL); pthread_cond_init(&cond_queue_full, NULL); pthread_mutex_init(&task_queue_cond_lock, NULL); /* create and join producer and consumer threads */ }

  24. Producer-Consumer Using Condition Variables void *producer(void *producer_thread_data) { int inserted; while (!done()) { create_task(); pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 1) pthread_cond_wait(&cond_queue_empty, &task_queue_cond_lock); insert_into_queue(); task_available = 1; pthread_cond_signal(&cond_queue_full); pthread_mutex_unlock(&task_queue_cond_lock); } }

  25. Producer-Consumer Using Condition Variables void *consumer(void *consumer_thread_data) { while (!done()) { pthread_mutex_lock(&task_queue_cond_lock); while (task_available == 0) pthread_cond_wait(&cond_queue_full, &task_queue_cond_lock); my_task = extract_from_queue(); task_available = 0; pthread_cond_signal(&cond_queue_empty); pthread_mutex_unlock(&task_queue_cond_lock); process_task(my_task); } }

  26. Condition Variables • Rather than just signaling one blocked thread, we can signal all • int pthread_cond_broadcast(pthread_cond_t *cond) • Can also have a timeout • int pthread_cond_timedwait( pthread_cond_t *cond, pthread_mutex_t *mutex, const struct timespec *abstime)

  27. #include <pthread.h> #include <stdio.h> #include <stdlib.h> #define NUM_THREADS 3 #define TCOUNT 10 #define COUNT_LIMIT 12 int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv; int main(int argc, char *argv[]) { int i, rc; pthread_t threads[5]; pthread_attr_t attr; /* Initialize mutex and condition variable objects */ pthread_mutex_init(&count_mutex, NULL); pthread_cond_init (&count_threshold_cv, NULL); /* For portability, explicitly create threads in a joinable state */ pthread_attr_init(&attr); pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_JOINABLE); pthread_create(&threads[4], &attr, watch_count, (void *)&thread_ids[4]); pthread_create(&threads[3], &attr, watch_count, (void *)&thread_ids[3]); pthread_create(&threads[2], &attr, watch_count, (void *)&thread_ids[2]); pthread_create(&threads[1], &attr, inc_count, (void *)&thread_ids[1]); pthread_create(&threads[0], &attr, inc_count, (void *)&thread_ids[0]); /* Wait for all threads to complete */ for (i = 0; i < NUM_THREADS; i++) { pthread_join(threads[i], NULL); } printf ("Main(): Waited on %d threads. Done.\n", NUM_THREADS); /* Clean up and exit */ pthread_attr_destroy(&attr); pthread_mutex_destroy(&count_mutex); pthread_cond_destroy(&count_threshold_cv); pthread_exit (NULL); }

  28. int count = 0; int thread_ids[5] = {0,1,2,3,4}; pthread_mutex_t count_mutex; pthread_cond_t count_threshold_cv; void *inc_count(void *idp) { int j,i; double result=0.0; int *my_id = idp; for (i=0; i < TCOUNT; i++) { pthread_mutex_lock(&count_mutex); count++; /* Check the value of count and signal waiting thread when condition is reached. Note that this occurs while mutex is locked. */ if (count == COUNT_LIMIT) { pthread_cond_broadcast(&count_threshold_cv); printf("inc_count(): thread %d, count = %d Threshold reached.\n", *my_id, count); } printf("inc_count(): thread %d, count = %d, unlocking mutex\n", *my_id, count); pthread_mutex_unlock(&count_mutex); /* Do some work so threads can alternate on mutex lock */ for (j=0; j < 1000; j++) result = result + (double)random(); } pthread_exit(NULL); } void *watch_count(void *idp) { int *my_id = idp; printf("Starting watch_count(): thread %d\n", *my_id); /* Lock mutex and wait for signal. Note that the pthread_cond_wait routine will automatically and atomically unlock mutex while it waits. Also, note that if COUNT_LIMIT is reached before this routine is run by the waiting thread, the loop will be skipped to prevent pthread_cond_wait from never returning. */ pthread_mutex_lock(&count_mutex); if (count < COUNT_LIMIT) { pthread_cond_wait(&count_threshold_cv, &count_mutex); printf("watch_count(): thread %d Condition signal received.\n", *my_id); } pthread_mutex_unlock(&count_mutex); pthread_exit(NULL); }

  29. Composite Synchronization Constructs • By design, Pthreads provide support for a basic set of operations. • Higher level constructs can be built using basic synchronization constructs. • Consider Read-Write Locks and Barriers

  30. Read-Write Locks • In many applications, a data structure is read frequently but written infrequently. • For such applications, we should use read-write locks. • A read lock is granted when there are other threads that may already have read locks. • If there is a write lock on the data (or if there are queued write locks), the thread performs a condition wait. • If there are multiple threads requesting a write lock, they must perform a condition wait.

  31. Read-Write Locks • The lock data type mylib_rwlock_t holds the following: • a count of the number of readers, • the writer (a 0/1 integer specifying whether a writer is present), • a condition variable readers_proceed that is signaled when readers can proceed, • a condition variable writer_proceed that is signaled when one of the writers can proceed, • a count pending_writers of pending writers, and • a mutex read_write_lock associated with the shared data structure

  32. Read-Write Locks typedef struct { int readers; int writer; pthread_cond_t readers_proceed; pthread_cond_t writer_proceed; int pending_writers; pthread_mutex_t read_write_lock; } mylib_rwlock_t; void mylib_rwlock_init (mylib_rwlock_t *l) { l -> readers = l -> writer = l -> pending_writers = 0; pthread_mutex_init(&(l -> read_write_lock), NULL); pthread_cond_init(&(l -> readers_proceed), NULL); pthread_cond_init(&(l -> writer_proceed), NULL); }

  33. Read-Write Locks void mylib_rwlock_rlock(mylib_rwlock_t *l) { /* if there is a write lock or pending writers, perform condition wait.. else increment count of readers and grant read lock */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> pending_writers > 0) || (l -> writer > 0)) pthread_cond_wait(&(l -> readers_proceed), &(l -> read_write_lock)); l -> readers ++; pthread_mutex_unlock(&(l -> read_write_lock)); }

  34. Read-Write Locks void mylib_rwlock_wlock(mylib_rwlock_t *l) { /* if there are readers or writers, increment pending writers count and wait. On being woken, decrement pending writers count and increment writer count */ pthread_mutex_lock(&(l -> read_write_lock)); while ((l -> writer > 0) || (l -> readers > 0)) { l -> pending_writers ++; pthread_cond_wait(&(l -> writer_proceed), &(l -> read_write_lock)); } l -> pending_writers --; l -> writer ++; pthread_mutex_unlock(&(l -> read_write_lock)); }

  35. Read-Write Locks void mylib_rwlock_unlock(mylib_rwlock_t *l) { /* if there is a write lock then unlock, else if there are read locks, decrement count of read locks. If the count is 0 and there is a pending writer, let it through, else if there are pending readers, let them all go through */ pthread_mutex_lock(&(l -> read_write_lock)); if (l -> writer > 0) l -> writer = 0; else if (l -> readers > 0) l -> readers --; pthread_mutex_unlock(&(l -> read_write_lock)); if ((l -> readers == 0) && (l -> pending_writers > 0)) pthread_cond_signal(&(l -> writer_proceed)); else if (l -> readers > 0) pthread_cond_broadcast(&(l -> readers_proceed)); }

  36. Barriers • A barrier holds a thread until all threads participating in the barrier have reached it. • Barriers can be implemented using a counter, a mutex and a condition variable. • A single integer is used to keep track of the number of threads that have reached the barrier. • If the count is less than the total number of threads, the threads execute a condition wait. • The last thread entering (and setting the count to the number of threads) wakes up all the threads using a condition broadcast.

  37. Barriers typedef struct { pthread_mutex_t count_lock; pthread_cond_t ok_to_proceed; int count; } mylib_barrier_t; void mylib_init_barrier(mylib_barrier_t *b) { b -> count = 0; pthread_mutex_init(&(b -> count_lock), NULL); pthread_cond_init(&(b -> ok_to_proceed), NULL); }

  38. Barriers void mylib_barrier (mylib_barrier_t *b, int num_threads) { pthread_mutex_lock(&(b -> count_lock)); b -> count ++; if (b -> count == num_threads) { b -> count = 0; pthread_cond_broadcast(&(b -> ok_to_proceed)); } else while (pthread_cond_wait(&(b -> ok_to_proceed), &(b -> count_lock)) != 0); pthread_mutex_unlock(&(b -> count_lock)); }

  39. Barriers • Linear barrier. • The trivial lower bound on execution time of this function is O(n) for n threads. • This implementation of a barrier can be speeded up using multiple barrier variables organized in a tree. • Use n/2 condition variable-mutex pairs for implementing a barrier for n threads. • At the lowest level, threads are paired up and each pair of threads shares a single condition variable-mutex pair. • Once both threads arrive, one of the two moves on, the other one waits. • This process repeats up the tree. • This is called a log barrier and its runtime grows as O(log p).

  40. Barrier • Execution time of 1000 sequential and logarithmic barriers as a function of number of threads on a 32 processor SGI Origin 2000.

  41. Semaphores • Synchronization tool provided by the OS • Integer variable and 2 operations • Wait(s) while (s <= 0) do noop; /*sleep*/ s = s - 1; • Signal(s) s = s + 1; • All modifications to s are atomic

  42. The critical section problem Shared semaphore: mutex = 1; repeat wait(mutex) critical section signal(mutex) remainder section until false

  43. P1: S1; signal(synch); P2: wait(synch); S2; Using semaphores • Two processes P1 and P2 • Statements S1 and S2 • S2 must execute only after S1

  44. Bounded Buffer Solution Shared semaphore: empty = n, full = 0, mutex = 1; repeat produce an item in nextp wait(empty); wait(mutex); add nextp to the buffer signal(mutex); signal(full); until false repeat wait(full); wait(mutex); remove an item from buffer place it in nextc signal(mutex); signal(empty); consume the item in nextc until false

  45. Readers - Writers (priority?) Shared Semaphore mutex=1, wrt = 1; Shared integer readcount = 0; wait(mutex); readcount = readcount + 1; if (readcount == 1) wait(wrt); signal(mutex); read the data wait(mutex); readcount = readcount - 1; if (readcount == 0) signal(wrt); signal(mutex); wait(wrt); write to the data object signal(wrt);

  46. Readers – Writers (priority?) outerQ, rsem, rmutex, wmutex, wsem: = 1 wait (outerQ) wait (rsem) wait (rmutex) readcnt++ if (readcnt == 1) wait (wsem) signal(rmutex) signal (rsem) signal (outerQ) READ wait (rmutex) readcnt--; if (readcnt == 0) signal (wsem) signal (rmutex) wait (wsem) writecnt++; if (writecnt == 1) wait (rsem) signal (wsem) wait (wmutex) WRITE signal (wmutex) wait (wsem) writecnt--; if (writecnt == 0) signal (rsem) signal (wsem)

  47. Unix Semaphores • Are a generalization of the counting semaphores (more operations are permitted). • A semaphore includes: • the current value S of the semaphore • number of processes waiting for S to increase • number of processes waiting for S to be 0 • System calls • semget creates an array of semaphores • semctl allows for the initialization of semaphores • semop performs a list of operations: one on each semaphore (atomically)

  48. Unix Semaphores • Each operation to be done is specified by a value sop. • Let S be the semaphore value • if sop > 0: (signal operation) • S is incremented and process awaiting for S to increase are awaken • if sop = 0: • If S=0: do nothing • if S!=0, block the current process on the event that S=0 • if sop < 0: (wait operation) • if S >= | sop | then S = S - | sop | then if S <=0 wait

More Related