
Shared-memory Parallel Programming



Presentation Transcript


  1. Shared-memory Parallel Programming. Yuuki Horita (Taura Lab, M1). Parallel and Distributed Programming.

  2. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  3. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  4. Parallel Programming Model
• Message Passing Model: covered in the preceding talk by Imatake-kun
• Shared Memory Model: memory is shared by all process elements
  • Multiprocessor (SMP, SunFire, …)
  • DSM (Distributed Shared Memory)
  • Process elements can communicate with each other through the shared memory

  5. Shared Memory Model
[diagram: several PEs connected to a single shared Memory]

  6. Shared Memory Model
• Simplicity: no need to think about where the computation's data is located
• Fast communication (on a multiprocessor): no need to go through the network for inter-process communication
• Dynamic load sharing: easy, for the same reason as simplicity

  7. Shared Memory Parallel Programming
• Multi-thread programming: Pthreads
• OpenMP: a parallel programming model for shared-memory multiprocessors

  8. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  9. Sample Sequential Program
FDM (Finite Difference Method):

    … loop {
        for (i = 1; i < N-1; i++) {
            for (j = 1; j < N-1; j++) {
                /* update interior points only, so the four
                   neighbor accesses stay in range */
                a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                                 + a[i-1][j] + a[i+1][j] + a[i][j]);
            }
        }
    } …

  10. Parallelization Procedure
Sequential Computation → (Decomposition) → Tasks → (Assignment) → Process Elements → (Orchestration) → (Mapping) → Processors

  11. Parallelize the Sequential Program
• Decomposition: split the computation into tasks

    … loop {
        for (i = 1; i < N-1; i++) {
            for (j = 1; j < N-1; j++) {
                a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                                 + a[i-1][j] + a[i+1][j] + a[i][j]);
            }
        }
    } …

  12. Parallelize the Sequential Program
• Assignment: divide the tasks equally among the process elements
[diagram: tasks distributed across four PEs]

  13. Parallelize the Sequential Program
• Orchestration: the process elements need to communicate and synchronize
[diagram: four PEs communicating with one another]

  14. Parallelize the Sequential Program
• Mapping: place the process elements on the processors of the multiprocessor
[diagram: four PEs mapped onto a multiprocessor]

  15. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  16. Multi-thread Programming
• A process element is a thread (cf. a process)
• Memory is shared among all threads created by the same process
• Threads can therefore communicate with each other through shared memory

  17. Fork-Join Model
• Program starts with the Main Thread (serialized section)
• Fork: the Main Thread creates new threads (parallelized section)
• Join: the other threads join the Main Thread
• The Main Thread continues processing (serialized section)

  18. Libraries for Thread Programming
• Pthreads (C/C++): pthread_create(), pthread_join()
• Java Thread: Thread class / Runnable interface

  19. Pthreads API (fork/join)

    pthread_t                      /* thread variable type */

    pthread_create(
        pthread_t *thread,         /* thread variable */
        pthread_attr_t *attr,      /* thread attributes */
        void *(*func)(void *),     /* start function */
        void *arg                  /* argument passed to the start function */
    )

    pthread_join(
        pthread_t thread,          /* thread variable */
        void **thread_return       /* the thread's return value */
    )

  20. Pthreads Parallel Programming

    #include …

    void do_sequentially(void) {
        /* sequential execution */
    }

    main() {
        …
        do_sequentially();   /* we want to parallelize this call */
        …
    }

  21. Pthreads Parallel Programming

    #include …
    #include <pthread.h>

    void *do_in_parallel(void *arg) {
        /* parallel execution (half of the work) */
        return NULL;
    }

    main() {
        pthread_t tid;
        …
        pthread_create(&tid, NULL, do_in_parallel, NULL);
        do_in_parallel(NULL);    /* the main thread does its share too */
        pthread_join(tid, NULL);
        …
    }
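
A minimal, self-contained sketch of the fork/join pattern from slides 19-21; the worker function, its integer argument, and the printed message are illustrative additions, not from the original slides:

    #include <stdio.h>
    #include <pthread.h>

    /* start function: receives its argument as void*, returns void* */
    void *worker(void *arg) {
        int id = *(int *)arg;               /* hypothetical integer argument */
        printf("thread %d running\n", id);
        return NULL;
    }

    int main(void) {
        pthread_t tid;
        int id = 1;
        /* fork: create a thread running worker(&id) */
        pthread_create(&tid, NULL, worker, &id);
        /* join: wait for the thread to finish; ignore its return value */
        pthread_join(tid, NULL);
        return 0;
    }

Compile with cc -pthread.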

  22. Exclusive Access Control

    int sum = 0;
    thread_A() { sum++; }
    thread_B() { sum++; }

An interleaving that loses an update (sum starts at 0):
• Thread A reads sum into a register (0)
• Thread B reads sum into a register (0)
• Thread A adds 1 and writes back: sum = 1
• Thread B adds 1 and writes back: sum = 1 (one increment is lost)

  23. Pthreads API (Exclusive Access Control)
• Variable: pthread_mutex_t
• Initialization: pthread_mutex_init(pthread_mutex_t *mutex, pthread_mutexattr_t *mutexattr)
• Lock / unlock: pthread_mutex_lock(pthread_mutex_t *mutex), pthread_mutex_unlock(pthread_mutex_t *mutex)

  24. Exclusive Access Control

    int sum = 0;
    pthread_mutex_t mutex;
    pthread_mutex_init(&mutex, 0);

    thread_A() {
        pthread_mutex_lock(&mutex);    /* acquire lock */
        sum++;
        pthread_mutex_unlock(&mutex);  /* release lock */
    }

    thread_B() {
        pthread_mutex_lock(&mutex);    /* acquire lock */
        sum++;
        pthread_mutex_unlock(&mutex);  /* release lock */
    }

Only one thread can hold the lock at a time, so the two increments can no longer interleave.
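
To make the slide concrete, here is a runnable sketch in which two threads increment the shared counter under the mutex; the loop count and output are illustrative:

    #include <stdio.h>
    #include <pthread.h>

    int sum = 0;
    pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;

    void *increment(void *arg) {
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&mutex);    /* acquire lock */
            sum++;                         /* critical section */
            pthread_mutex_unlock(&mutex);  /* release lock */
        }
        return NULL;
    }

    int main(void) {
        pthread_t a, b;
        pthread_create(&a, NULL, increment, NULL);
        pthread_create(&b, NULL, increment, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("sum = %d\n", sum);  /* 200000 with the lock; often less without it */
        return 0;
    }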

  25. Pthreads API (Condition Variable)
• Variable: pthread_cond_t
• Initialization: pthread_cond_init(pthread_cond_t *cond, pthread_condattr_t *condattr)
• Condition functions: pthread_cond_wait(pthread_cond_t *cond, pthread_mutex_t *mutex), pthread_cond_broadcast(pthread_cond_t *cond), pthread_cond_signal(pthread_cond_t *cond)

  26. Condition Wait

Thread A (sleeps until the condition is satisfied):

    pthread_mutex_lock(&mutex);
    while ( /* condition is not satisfied */ ) {
        pthread_cond_wait(&cond, &mutex);   /* releases the lock and sleeps */
    }
    pthread_mutex_unlock(&mutex);

Thread B (updates the condition, then wakes the waiters with pthread_cond_broadcast or pthread_cond_signal):

    pthread_mutex_lock(&mutex);
    update_condition();
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);

  27. Synchronization
Synchronization (a barrier) in the sample program (see the runnable sketch below):

    n = 0;
    …
    pthread_mutex_lock(&mutex);
    n++;
    while (n < nthreads) {
        pthread_cond_wait(&cond, &mutex);
    }
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);
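
Wrapped up as a function, the slide's pattern becomes a simple single-use barrier; the function name and the static initializers are assumptions, not from the slide:

    #include <pthread.h>

    static int n = 0;                 /* threads that have arrived so far */
    static int nthreads = 4;          /* assumed team size */
    static pthread_mutex_t mutex = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond  = PTHREAD_COND_INITIALIZER;

    /* single-use barrier: every thread calls this exactly once */
    void barrier(void) {
        pthread_mutex_lock(&mutex);
        n++;
        while (n < nthreads) {
            /* atomically releases the mutex and sleeps; reacquires it on wakeup */
            pthread_cond_wait(&cond, &mutex);
        }
        /* the last arriving thread falls through and wakes the sleepers */
        pthread_cond_broadcast(&cond);
        pthread_mutex_unlock(&mutex);
    }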

  28. Characteristics of Pthreads
• Exclusive access control and synchronization are tedious to write
• Deadlocks are easy to introduce
• Parallelizing a given sequential program is still hard

  29. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  30. What’s OpenMP?
• A specification for a set of compiler directives, library routines, and environment variables that can be used to specify shared-memory parallelism in Fortran and C/C++ programs
• Fortran ver. 1.0 API: Oct. 1997
• C/C++ ver. 1.0 API: Oct. 1998

  31. Background of OpenMP
• Spread of shared-memory multiprocessors
• Need for common directives for shared-memory multiprocessors: each vendor had been providing its own set of directives
• Need for a simpler, more flexible interface for developing parallel applications: Pthreads makes parallel applications hard for developers to write

  32. OpenMP API
• Directives
• Libraries
• Environment Variables

  33. Directives
• C/C++: #pragma omp directive_name …
• Fortran: !$OMP directive_name …
If the user's compiler doesn't support OpenMP, the directives are ignored, so the program can still be built and run as a sequential program.
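
The directives degrade gracefully, but calls to OpenMP library routines (slide 35) do not; a common idiom, not shown on the slides, is to guard them with the _OPENMP macro that OpenMP compilers define:

    #ifdef _OPENMP
    #include <omp.h>
    #endif

    /* returns the thread ID under OpenMP, 0 in a sequential build */
    int get_thread_id(void) {
    #ifdef _OPENMP
        return omp_get_thread_num();
    #else
        return 0;
    #endif
    }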

  34. Parallel Region
The part executed in parallel by a team of threads:

    #pragma omp parallel
    {
        /* parallel region */
    }

Threads are created at the beginning of the parallel region and join at its end.

  35. Parallel Region (threads)
A minimal example follows this list.
• Number of threads:
  • omp_get_num_threads(): get the current number of threads
  • omp_set_num_threads(int nthreads): set the number of threads to nthreads
  • $OMP_NUM_THREADS environment variable
• Thread ID (0 to number of threads - 1):
  • omp_get_thread_num(): get the thread ID
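
A minimal complete example combining slides 34 and 35; the printed message is illustrative:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        omp_set_num_threads(4);   /* request a team of 4 threads */
        #pragma omp parallel
        {
            /* every thread in the team executes this region */
            printf("thread %d of %d\n",
                   omp_get_thread_num(), omp_get_num_threads());
        }                         /* implicit join at the end of the region */
        return 0;
    }

With gcc, compile with gcc -fopenmp; without the flag the pragma is ignored and the program runs on one thread.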

  36. Work Sharing Constructs
Specify how the work inside a parallel region is divided among threads (for is shown on the next slide; a sketch of sections and single follows this list):
• for: share loop iterations among threads
• sections: share independent code sections among threads
• single: execute a block by only one thread
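
A small sketch of sections and single; the printed messages are illustrative:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        #pragma omp parallel
        {
            #pragma omp sections
            {
                #pragma omp section
                printf("section A on thread %d\n", omp_get_thread_num());
                #pragma omp section
                printf("section B on thread %d\n", omp_get_thread_num());
            }   /* implicit barrier at the end of sections */
            #pragma omp single
            printf("single: executed by exactly one thread\n");
        }
        return 0;
    }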

  37. Example of Work Sharing

The sequential loop:

    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                             + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }

can be parallelized with a work-sharing for inside a parallel region:

    omp_set_num_threads(4);
    #pragma omp parallel
    #pragma omp for
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                             + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }

or, equivalently, with the combined directive:

    omp_set_num_threads(4);
    #pragma omp parallel for
    for (i = 1; i < N-1; i++) {
        …
    }

However, if i and j are shared among the threads, the conflicting accesses to them slow down (and can corrupt) the computation; the next slide fixes this with private(i, j).

  38. Data Scoping Attributes
Specify data scoping on a parallel or work-sharing construct (a short reduction sketch follows this list):
• shared(var_list): the listed variables are shared among threads
• private(var_list): each thread gets its own copy of the listed variables
• reduction(operator : var_list): private within the construct; the private copies are combined with the operator when the construct ends
  e.g. #pragma omp for reduction(+: sum)
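
A minimal reduction sketch, assuming a simple integer sum; each thread accumulates into a private copy of sum, and the copies are combined with + when the loop ends:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int sum = 0;
        #pragma omp parallel for reduction(+: sum)
        for (int i = 1; i <= 100; i++) {
            sum += i;   /* each thread updates its private copy */
        }
        printf("sum = %d\n", sum);   /* 5050, regardless of thread count */
        return 0;
    }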

  39. Example of Data Scoping Attributes

    omp_set_num_threads(4);
    #pragma omp parallel for private(i, j)
    for (i = 1; i < N-1; i++) {
        for (j = 1; j < N-1; j++) {
            a[i][j] = 0.2 * (a[i][j-1] + a[i][j+1]
                             + a[i-1][j] + a[i+1][j] + a[i][j]);
        }
    }

With private(i, j), each thread gets its own loop counters, so the iterations no longer conflict.

  40. Synchronization
A sketch contrasting the three directives follows this list.
• barrier: wait until all threads reach this line
  #pragma omp barrier
• critical: execute a block exclusively
  #pragma omp critical [(name)] { … }
• atomic: update a scalar variable atomically
  #pragma omp atomic
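
A short sketch of barrier, critical, and atomic together; the counter names are illustrative:

    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        int hits = 0, ticks = 0;
        #pragma omp parallel
        {
            #pragma omp critical
            {
                hits++;    /* arbitrary block, executed exclusively */
            }
            #pragma omp atomic
            ticks++;       /* single scalar update; cheaper than critical */

            #pragma omp barrier
            /* past this line, every thread has updated both counters */
        }
        printf("hits = %d, ticks = %d\n", hits, ticks);
        return 0;
    }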

  41. Synchronization (Pthreads/OpenMP)
The barrier in the sample program, in each model:

Pthreads:

    pthread_mutex_lock(&mutex);
    n++;
    while (n < nthreads) {
        pthread_cond_wait(&cond, &mutex);
    }
    pthread_cond_broadcast(&cond);
    pthread_mutex_unlock(&mutex);

OpenMP:

    #pragma omp barrier

  42. Summary of OpenMP
• Incremental parallelization of sequential programs
• Portability
• Easier to implement parallel applications than with Pthreads or MPI

  43. Agenda
• Introduction
• Sample Sequential Program
• Multi-thread programming
• OpenMP
• Summary

  44. Message Passing Model / Shared Memory Model

  45. Thank you!
