
OpenMP Tutorial



Presentation Transcript


  1. OpenMP Tutorial
Seung-Jai Min (smin@purdue.edu)
School of Electrical and Computer Engineering, Purdue University, West Lafayette, IN
High-Performance Parallel Scientific Computing 2008, Purdue University

  2. Shared Memory Parallel Programming in the Multi-Core Era
• Desktop and laptop
  • 2, 4, 8 cores and … ?
• A single node in distributed-memory clusters
  • Steele cluster node: 2 × 8 (16) cores
  • /proc/cpuinfo
• Shared-memory hardware accelerators
  • Cell processors: 1 PPE and 8 SPEs
  • Nvidia Quadro GPUs: 128 processing units

  3. OpenMP: some syntax details to get us started
• Most of the constructs in OpenMP are compiler directives or pragmas.
• For C and C++, the pragmas take the form:

    #pragma omp construct [clause [clause]…]

• For Fortran, the directives take one of the forms:

    C$OMP construct [clause [clause]…]
    !$OMP construct [clause [clause]…]
    *$OMP construct [clause [clause]…]

• Include file:

    #include "omp.h"

  4. How is OpenMP typically used?
• OpenMP is usually used to parallelize loops:
  • Find your most time-consuming loops.
  • Split them up between threads.

Sequential program:

    void main() {
      int i, k, N = 1000;
      double A[N], B[N], C[N];
      for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
      }
    }

Parallel program:

    #include "omp.h"
    void main() {
      int i, k, N = 1000;
      double A[N], B[N], C[N];
      #pragma omp parallel for
      for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
      }
    }

  5. How is OpenMP typically used? (cont.)
• Single Program Multiple Data (SPMD)

The parallel program

    #include "omp.h"
    void main() {
      int i, k, N = 1000;
      double A[N], B[N], C[N];
      #pragma omp parallel for
      for (i = 0; i < N; i++) {
        A[i] = B[i] + k*C[i];
      }
    }

runs as if each of (say) four threads executed its own copy of the same code over a quarter of the iteration space:

    Thread 0: lb = 0;   ub = 250;
    Thread 1: lb = 250; ub = 500;
    Thread 2: lb = 500; ub = 750;
    Thread 3: lb = 750; ub = 1000;

    for (i = lb; i < ub; i++) {
      A[i] = B[i] + k*C[i];
    }

  6. OpenMP fork-and-join model

    printf("program begin\n");      // Serial
    N = 1000;
    #pragma omp parallel for        // Parallel
    for (i = 0; i < N; i++)
      A[i] = B[i] + C[i];
    M = 500;                        // Serial
    #pragma omp parallel for        // Parallel
    for (j = 0; j < M; j++)
      p[j] = q[j] - r[j];
    printf("program done\n");       // Serial

  7. OpenMP constructs
• Parallel regions
• Worksharing (for/DO, sections, …)
• Data environment (shared, private, …)
• Synchronization (barrier, flush, …)
• Runtime functions/environment variables (omp_get_num_threads(), …)

  8. OpenMP: structured blocks (C/C++)
• Most OpenMP constructs apply to structured blocks.
• Structured block: a block with one point of entry at the top and one point of exit at the bottom.
• The only "branches" allowed are STOP statements in Fortran and exit() in C/C++.

A structured block (the goto stays inside the block):

    #pragma omp parallel
    {
    more:
      do_big_job(id);
      if (++count > 1) goto more;
    }
    printf(" All done \n");

Not a structured block (branches jump into and out of the block):

    if (count == 1) goto more;
    #pragma omp parallel
    {
    more:
      do_big_job(id);
      if (++count > 1) goto done;
    }
    done:
      if (!really_done()) goto more;

  9. Structured block boundaries
• In C/C++: a block is a single statement or a group of statements between brackets {}.

    #pragma omp parallel
    {
      id = omp_get_thread_num();
      A[id] = big_compute(id);
    }

    #pragma omp for
    for (I = 0; I < N; I++) {
      res[I] = big_calc(I);
      A[I] = B[I] + res[I];
    }

• In Fortran: a block is a single statement or a group of statements between directive/end-directive pairs.

    C$OMP PARALLEL
    10    W(id) = garbage(id)
          res(id) = W(id)**2
          if (res(id)) goto 10
    C$OMP END PARALLEL

    C$OMP PARALLEL DO
          do I = 1, N
            res(I) = bigComp(I)
          end do
    C$OMP END PARALLEL DO

  10. OpenMP parallel regions
• Each thread executes the same code redundantly.

    double A[1000];
    omp_set_num_threads(4);
    #pragma omp parallel
    {
      int ID = omp_get_thread_num();
      pooh(ID, A);
    }
    printf("all done\n");

Execution: a single thread runs the code up to the parallel region; four threads then call pooh(0,A), pooh(1,A), pooh(2,A), and pooh(3,A). A single copy of A is shared between all threads. Threads wait at the end of the region for all threads to finish before proceeding (i.e., a barrier), and only then does printf("all done\n") run.

  11. The OpenMP API: combined parallel work-share
• Shortcut: put the "parallel" and the work-share on the same line. These are equivalent:

    int i;
    double res[MAX];
    #pragma omp parallel
    {
      #pragma omp for
      for (i = 0; i < MAX; i++) {
        res[i] = huge();
      }
    }

    int i;
    double res[MAX];
    #pragma omp parallel for
    for (i = 0; i < MAX; i++) {
      res[i] = huge();
    }

  12. Shared memory model
• Data can be shared or private.
• Shared data is accessible by all threads.
• Private data can be accessed only by the thread that owns it.
• Data transfer is transparent to the programmer.

[Diagram: five threads, each with its own private memory, all attached to a common shared memory.]

  13. Data environment: default storage attributes
• Shared-memory programming model: most variables are shared by default.
• Global variables are SHARED among threads:
  • Fortran: COMMON blocks, SAVE variables, MODULE variables
  • C: file-scope variables, static variables
• But not everything is shared:
  • Stack variables in sub-programs called from parallel regions are PRIVATE.
  • Automatic variables within a statement block are PRIVATE.

  14. Critical construct

    sum = 0;
    #pragma omp parallel private(lsum)
    {
      lsum = 0;
      #pragma omp for
      for (i = 0; i < N; i++) {
        lsum = lsum + A[i];
      }
      #pragma omp critical
      {
        sum += lsum;
      }
    }

Threads wait their turn; only one thread at a time executes the critical section.

  15. Reduction clause

    sum = 0;                /* shared variable */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
      sum = sum + A[i];
    }

  16. OpenMP example: PI

    !$OMP PARALLEL PRIVATE(X, i)
          write (*,1004) omp_get_thread_num()
    !$OMP DO REDUCTION(+:sum)
          DO i = 1, 1000000000, 1
            x = step*((-0.5)+i)
            sum = sum + 4.0/(1.0+x**2)
          ENDDO
    !$OMP END DO NOWAIT
    !$OMP END PARALLEL

  17. OpenMP example: running OpenMP applications on Steele
• qsub -I -l nodes=1:ppn=8
• module avail
• module load intel
• ifort omp_pi.f -o omp_pi -openmp   (icc for C sources)
• setenv OMP_NUM_THREADS 4
• time ./omp_pi

  18. Summary
• OpenMP makes shared-memory parallel programming approachable.
• It allows parallel programs to be written incrementally: a sequential program can be enhanced with OpenMP directives, leaving the original program essentially intact.
• Compared to MPI: you don't need to partition data and insert message-passing calls in OpenMP programs.

  19. Resources
• http://www.openmp.org
• http://openmp.org/wp/resources
