1 / 18

Parallel Processing (CS 730) Lecture 5: Shared Memory Parallel Programming with OpenMP *

Parallel Processing (CS 730) Lecture 5: Shared Memory Parallel Programming with OpenMP *. Jeremy R. Johnson Wed. Jan. 31, 2001 *Parts of this lecture was derived from chapters 1-2 in Chandra et al. Introduction.

hung
Download Presentation

Parallel Processing (CS 730) Lecture 5: Shared Memory Parallel Programming with OpenMP *

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Parallel Processing (CS 730)Lecture 5: Shared Memory Parallel Programming with OpenMP* Jeremy R. Johnson Wed. Jan. 31, 2001 *Parts of this lecture was derived from chapters 1-2 in Chandra et al. Parallel Processing

  2. Introduction • Objective: To further study the shared memory model of parallel programming. Introduction to the OpenMP standard for shared memory parallel programming • Topics • Introduction to OpenMP • hello.c • hello.f • Loop level parallelism • Shared vs. private variables • Synchronization (implicit and explicit) • Parallel regions Parallel Processing

  3. OpenMP • Extension to FORTRAN, C/C++ • Uses directives (comments in FORTRAN, pragma in C/C++) • ignored without compiler support • Some library support required • Shared memory model • parallel regions • loop level parallelism • implicit thread model • communication via shared address space • private vs. shared variables (declaration) • explicit synchronization via directives (e.g. critical) • library routines for returning thread information (e.g. omp_get_num_threads(), omp_get_thread_num() ) • Environment variables used to provide system info (e.g. OMP_NUM_THREADS) Parallel Processing

  4. Benefits • Provides incremental parallelism • Small increase in code size • Simpler model than message passing • Easier to use than thread library • With hardware and compiler support smaller granularity than message passing. Parallel Processing

  5. Further Information • Adopted as a standard in 1997 • Initiated by SGI • www.openmp.org • Chandra, Dagum, Kohr, Maydan, McDonald, Menon, “Parallel Programming in OpenMP”, Morgan Kaufman Publishers, 2001. Parallel Processing

  6. ... P0 P1 Pn Memory Shared vs. Distributed Memory P0 P1 Pn ... M0 M1 Mn Interconnection Network Shared memory Distributed memory Parallel Processing

  7. Shared Memory Programming Model • Shared memory programming does not require physically shared memory so long as there is support for logically shared memory (in either hardware or software) • If logical shared memory then there may be different costs for accessing memory depending on the physical location. • UMA - uniform memory access • SMP - symmetric multi-processor • typically memory connected to processors via a bus • NUMA - non-uniform memory access • typically physically distributed memory connected via an interconnection network Parallel Processing

  8. IBM S80 • An SMP with upto 24 processors (RS64 III processors) • http://www.rs6000.ibm.com/hardware/enterprise/s80.html Name: Goopi.coe.drexel.edu Machine type: S80 12-Way with 8Gb RAM Specifications: 2 x 6 way 450 MHz RS64 III Processor Card, 8Mb L2 Cache 2 x 4096 Mb Memory 9 x 18.2 Gb Ultra SCSI Hot Swappable Hard Disk Drives. Name: bagha.coe.drexel.edu Machine Type: 44P Model 270 4 way with 2 Gb RAM Specifications: 2 x 2 way 375 MHz POWER3-II Processor, 4 Mb L2 Cache 4 x 512 Mb SDRAM DIMMs 2 x 9.1 Gb Ultra SCSI HDD Parallel Processing

  9. hello.c #include <stdio.h> int main(int argc, char **argv) { int n; n = atoi(argv[1]); omp_set_num_threads(n); printf("Number of threads = %d\n",omp_get_num_threads()); #pragma omp parallel { int id = omp_get_thread_num(); if (id == 0) printf("Number of threads = %d\n",omp_get_num_threads()); printf("Hello World from %d\n",id); } exit(0); } Parallel Processing

  10. hello.f program hello integer omp_get_thread_num, omp_get_num_threads print *, "Hello parallel world from threads" print *, "Num threads = ", omp_get_num_threads() !$omp parallel print *, "Num threads = ", omp_get_num_threads() print *, omp_get_thread_num() !$omp end parallel print *, "Back to the sequential world" end Parallel Processing

  11. Compiling and Executing OpenMP Programs on the IBM S80 • To compile a C program with OpenMP directives • cc_r -qsmp=omp hello.c -o hello • To compile a Fortran program with OpenMP directives • xlf_r -qsmp=omp hello.f -o hello • The environment variable OMP_NUM_THREADS controls the number of threads used in OpenMP parallel regions. It can be set from the C shell • setenv OMP_NUM_THREAD <count> • where <count> is a positive integer Parallel Processing

  12. Parallel Loop subroutine saxpy(z, a, x, y, n) integer I, n read z(n), a, x(n), y do i = 1, n z(i) = a * x(i) + y end do return end subroutine saxpy(z, a, x, y, n) integer I, n read z(n), a, x(n), y !$omp parallel do do i = 1, n z(i) = a * x(i) + y end do return end Parallel Processing

  13. Execution Model Master thread Implicit thread creation Parallel Region Master and slave threads Implicit barrier synchronization Master thread Parallel Processing

  14. More Complicated Exampe Real*8 x, y integer i, j, m, n, maxiter integer depth(*,*) integer mandel_val … maxiter = 200 do i = 1, m do j = 1, m x = i/real(m) y = j/real(n) depth(j,i) = mandel_val(x, y, maxiter) end do end do Parallel Processing

  15. Parallel Loop !$omp parallel do private(j,x, y) maxiter = 200 do i = 1, m do j = 1, m x = i/real(m) y = j/real(n) depth(j,i) = mandel_val(x, y, maxiter) end do end do !$omp end parallel do Parallel Processing

  16. Parallel Loop maxiter = 200 !$omp parallel do private(j,x, y) do i = 1, m do j = 1, m x = i/real(m) y = j/real(n) depth(j,i) = mandel_val(x, y, maxiter) end do end do !$omp end parallel do Parallel Processing

  17. Explicit Synchronization maxiter = 200 total_iters = 0 !$omp parallel do private(j,x, y) do i = 1, m do j = 1, m x = i/real(m) y = j/real(n) depth(j,i) = mandel_val(x, y, maxiter) !$omp critical total_iters = total_iters + depth(j,I) !$omp end critical end do end do !$omp end parallel do Parallel Processing

  18. Reduction Variables maxiter = 200 total_iters = 0 !$omp parallel do private(j,x, y) !$omp+ reduction(+:total_iters) do i = 1, m do j = 1, m x = i/real(m) y = j/real(n) depth(j,i) = mandel_val(x, y, maxiter) total_iters = total_iters + depth(j,I) end do end do !$omp end parallel do Parallel Processing

More Related