  1. /home/jemmyhu/CES706/openmp/Fortran/synch/

     ! serial code to demo data-sharing later
     program sharing_seq
       implicit none
       integer, parameter :: N = 50000000
       integer(selected_int_kind(17)) :: x(N)
       integer(selected_int_kind(17)) :: total
       integer :: i

       do i = 1, N
         x(i) = i
       end do

       total = 0
       do i = 1, N
         total = total + x(i)
       end do

       write(*,*) "total = ", total
     end program

     [jemmyhu@saw-login1:~]$ ./sharing-atomic-seq
      total =  1250000025000000

  2. program sharing_par1
       implicit none
       integer, parameter :: N = 50000000
       integer(selected_int_kind(17)) :: x(N)
       integer(selected_int_kind(17)) :: total
       integer :: i

       !$omp parallel
       !$omp do
       do i = 1, N
         x(i) = i
       end do
       !$omp end do

       total = 0
       !$omp do
       do i = 1, N
         total = total + x(i)
       end do
       !$omp end do
       !$omp end parallel

       write(*,*) "total = ", total
     end program

     ! Parallel code with OpenMP do directives.
     ! The result varies from run to run because of the race condition
     ! on the shared variable 'total'.

     [jemmyhu@saw-login1:~] ./sharing-atomic-par1
      total =  312500012500000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par1
      total =  937500012500000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par1
      total =  312500012500000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par1
      total =  312500012500000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par1
      total =  937500012500000

  3. Synchronization categories

     • Mutual Exclusion Synchronization
         critical
         atomic
     • Event Synchronization
         barrier
         ordered
         master
     • Custom Synchronization
         flush
         (lock – runtime library)

  4. Named Critical Sections

     A named critical section must synchronize with other critical sections of
     the same name, but can execute concurrently with critical sections of a
     different name.

     cur_max = min_infinity
     cur_min = plus_infinity
     !$omp parallel do
     do i = 1, n
       if (a(i) .gt. cur_max) then
         !$omp critical (MAXLOCK)
         if (a(i) .gt. cur_max) then
           cur_max = a(i)
         endif
         !$omp end critical (MAXLOCK)
       endif
       if (a(i) .lt. cur_min) then
         !$omp critical (MINLOCK)
         if (a(i) .lt. cur_min) then
           cur_min = a(i)
         endif
         !$omp end critical (MINLOCK)
       endif
     enddo

  5. program sharing_par2
       use omp_lib
       implicit none
       integer, parameter :: N = 50000000
       integer(selected_int_kind(17)) :: x(N)
       integer(selected_int_kind(17)) :: total
       integer :: i

       !$omp parallel
       !$omp do
       do i = 1, N
         x(i) = i
       end do
       !$omp end do

       total = 0
       !$omp do
       do i = 1, N
         !$omp atomic
         total = total + x(i)
       end do
       !$omp end do
       !$omp end parallel

       write(*,*) "total = ", total
     end program

     ! Parallel code with OpenMP do directives, synchronized with the atomic
     ! directive: it gives the correct answer, but at a higher cost.

     [jemmyhu@saw-login1:~] ./sharing-atomic-par2
      total =  1250000025000000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par2
      total =  1250000025000000
     [jemmyhu@saw-login1:~] ./sharing-atomic-par2
      total =  1250000025000000
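     The atomic version above serializes every single addition. A cheaper way to
     get the same answer is a reduction clause, which keeps a private partial sum
     per thread and combines the partial sums only once. The sketch below is not
     from the slides; the program name sharing_par3 is made up for illustration.

     ! Hypothetical variant (not in the slides): the atomic update is replaced
     ! by reduction(+:total), so each thread accumulates its own partial sum
     ! and OpenMP adds the partial sums into 'total' at the end of the loop.
     program sharing_par3
       implicit none
       integer, parameter :: N = 50000000
       integer(selected_int_kind(17)) :: x(N)
       integer(selected_int_kind(17)) :: total
       integer :: i

       !$omp parallel do
       do i = 1, N
         x(i) = i
       end do
       !$omp end parallel do

       total = 0
       !$omp parallel do reduction(+:total)
       do i = 1, N
         total = total + x(i)
       end do
       !$omp end parallel do

       write(*,*) "total = ", total
     end program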

  6. Barriers are used to synchronize the execution of multiple threads within a
     parallel region, not within a work-sharing construct. They ensure that a
     piece of work has been completed before moving on to the next phase.

     !$omp parallel private(index)
     index = generate_next_index()
     do while (index .ne. 0)
       call add_index(index)
       index = generate_next_index()
     enddo
     ! Wait for all the indices to be generated
     !$omp barrier
     index = get_next_index()
     do while (index .ne. 0)
       call process_index(index)
       index = get_next_index()
     enddo
     !$omp end parallel

  7. Ordered Sections

     • Impose an order across the iterations of a parallel loop.
     • Identify a portion of code within each loop iteration that must be
       executed in the original, sequential order of the loop iterations.
     • Restrictions:
       - If a parallel loop contains an ordered directive, then the parallel
         loop directive itself must contain the ordered clause.
       - An iteration of a parallel loop is allowed to encounter at most one
         ordered section.

     !$omp parallel do ordered
     do i = 1, n
       a(i) = ... complex calculation here ...
       ! Wait until the previous iteration has finished its ordered section
       !$omp ordered
       print *, a(i)
       ! Signal the completion of the ordered section from this iteration
       !$omp end ordered
     enddo

  8. The problem with this example is that operations on variables a and b are not ordered with respect to each other. For instance, nothing prevents the compiler from moving the flush of b on thread 1 or the flush of a on thread 2 to a position completely after the critical section (assuming that the critical section on thread 1 does not reference b and the critical section on thread 2 does not reference a). If either re-ordering happens, the critical section can be active on both threads simultaneously.
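     The slide's code itself is not reproduced in this transcript. The sketch
     below is an assumed reconstruction of the kind of two-flag (Dekker-style)
     example being described, with the remedy already applied: a single flush
     naming both variables, so neither flush can drift past the point where the
     other thread's flag is read.

     ! Assumed reconstruction (not the slide's exact code): two threads try to
     ! enter a hand-rolled critical section guarded by the flags a and b.
     ! Flushing BOTH flags in one directive orders the write of a thread's own
     ! flag with its read of the other flag, which separate flush(a) and
     ! flush(b) directives do not guarantee.
     program flush_both
       implicit none
       integer :: a, b
       a = 0
       b = 0
       !$omp parallel sections num_threads(2)
       !$omp section
         a = 1
         !$omp flush(a, b)        ! one flush covering both flags
         if (b == 0) then
           print *, "thread 1 entered its critical work"
         end if
       !$omp section
         b = 1
         !$omp flush(a, b)        ! one flush covering both flags
         if (a == 0) then
           print *, "thread 2 entered its critical work"
         end if
       !$omp end parallel sections
     end program flush_both

     With the combined flush, at most one of the two threads can still see the
     other's flag at zero; with separate per-variable flushes, the reordering
     described above can let both threads enter at once.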

  9. Lock: low-level synchronization functions

     Why use a lock?
     1) The synchronization protocol required by a problem cannot be expressed
        with OpenMP's high-level synchronization constructs.
     2) The parallel overhead incurred by OpenMP's high-level synchronization
        constructs is too large.

     The simple lock routines are as follows:
     • omp_init_lock    initializes a simple lock.
     • omp_destroy_lock uninitializes a simple lock.
     • omp_set_lock     waits until a simple lock is available, and then sets it.
     • omp_unset_lock   unsets a simple lock.
     • omp_test_lock    tests a simple lock, and sets it if it is available.

     Formats:
                     C/C++ (omp.h)                           Fortran (omp_lib)
     lock variable   omp_lock_t lock;                        integer(kind=omp_lock_kind) svar
     initialize      void omp_init_lock(omp_lock_t *lock);   subroutine omp_init_lock(svar)

     (A simple lock variable in Fortran must be an integer of kind=omp_lock_kind;
     nestable locks use kind=omp_nest_lock_kind.)
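     As a usage illustration (not from the slides), the sketch below protects a
     shared sum with the simple lock routines; the program and variable names
     are made up.

     ! Hedged sketch (assumed, not slide code): guarding a shared sum with the
     ! simple lock routines.  Functionally this matches the critical/atomic
     ! versions shown earlier, just expressed with the runtime library.
     program lock_demo
       use omp_lib
       implicit none
       integer(kind=omp_lock_kind) :: lck
       integer :: i
       integer(selected_int_kind(17)) :: total

       call omp_init_lock(lck)        ! create the lock
       total = 0
       !$omp parallel do
       do i = 1, 1000
         call omp_set_lock(lck)       ! wait until the lock is available
         total = total + i            ! only one thread updates total at a time
         call omp_unset_lock(lck)     ! release the lock
       end do
       !$omp end parallel do
       call omp_destroy_lock(lck)     ! free the lock's resources
       write(*,*) "total = ", total
     end program lock_demo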

  10. OpenMP execution model (nested parallel)
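      The slide shows a figure of the nested execution model. A minimal code
      sketch (assumed, not slide content) of what nesting looks like:

      ! Minimal sketch (not slide content): an outer parallel region whose
      ! threads each open an inner parallel region.  The inner regions only
      ! fork extra threads when nested parallelism is enabled.
      program nested_demo
        use omp_lib
        implicit none
        call omp_set_nested(.TRUE.)
        !$omp parallel num_threads(2)
          !$omp parallel num_threads(2)
            write(*,*) "level ", omp_get_level(), " thread ", omp_get_thread_num()
          !$omp end parallel
        !$omp end parallel
      end program

      With nesting disabled (the default in the runs that follow), each inner
      region executes with a single thread per outer thread.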

  11. program LIB_ENV
        use omp_lib
        implicit none
        integer :: nthreads
        logical :: dynamics, nnested
        integer :: myid

        write(*,*) "start"
        nthreads = omp_get_num_threads()
        dynamics = omp_get_dynamic()
        nnested  = omp_get_nested()
        write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested

        write(*,*) "before"
        !$omp parallel private(myid)
        !$omp master
        nthreads = omp_get_num_threads()
        dynamics = omp_get_dynamic()
        nnested  = omp_get_nested()
        write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested
        !$omp end master
        myid = omp_get_thread_num()
        write(*,*) "myid : ", myid
        !$omp end parallel
        write(*,*) "after"
      end program

  12. /home/jemmyhu/CES706/openmp/Fortran/data-scope

      [jemmyhu@saw-login1:~] f90 -openmp -o openmp_lib_env-f90 openmp_lib_env.f90
      [jemmyhu@saw-login1:~] ./openmp_lib_env
       start
       nthreads, dynamics, nnested :  1 F F
       before
       nthreads, dynamics, nnested :  8 F F
       myid :  0
       myid :  3
       myid :  2
       myid :  1
       myid :  4
       myid :  7
       myid :  6
       myid :  5
       after

  13. /home/jemmyhu/CES706/openmp/Fortran/data-scope/openmp_lib_env-2.f90

      ...
      write(*,*) "changes before"
      call omp_set_dynamic(.TRUE.)
      call omp_set_nested(.TRUE.)
      !$omp parallel private(myid)
      !$omp master
      nthreads = omp_get_num_threads()
      dynamics = omp_get_dynamic()
      nnested  = omp_get_nested()
      write(*,*) "nthreads, dynamics, nnested : ", nthreads, dynamics, nnested
      !$omp end master
      myid = omp_get_thread_num()
      write(*,*) "myid : ", myid
      !$omp end parallel
      write(*,*) "after"
      ...

  14. [jemmyhu@saw-login1:~] ./openmp_lib_env-2
       start
       nthreads, dynamics, nnested :  1 F F
       before
       nthreads, dynamics, nnested :  8 F F
       myid :  0
       myid :  2
       myid :  4
       myid :  1
       myid :  5
       myid :  6
       myid :  7
       myid :  3
       after
       changes before
       nthreads, dynamics, nnested :  8 T T
       myid :  2
       myid :  0
       myid :  4
       myid :  1
       myid :  3
       myid :  6
       myid :  7
       myid :  5
       after

  15. Intel compiler on silky

      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NUM_THREADS=4
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-2-ifort
       start
       nthreads, dynamics, nnested :  1 F F
       before
       nthreads, dynamics, nnested :  4 F F
       myid :  0
       myid :  2
       myid :  1
       myid :  3
       after
       changes before
       myid :  2
       myid :  1
       myid :  3
       nthreads, dynamics, nnested :  4 T T
       myid :  0
       after
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope]

  16. Intel compiler on silky

      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NUM_THREADS=4
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-ifort
       start
       nthreads, dynamics, nnested :  1 F F
       before
       nthreads, dynamics, nnested :  4 F F
       myid :  0
       myid :  1
       myid :  2
       myid :  3
       after
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope]

  17. Intel compiler on silky

      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_DYNAMIC="TRUE"
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] export OMP_NESTED="TRUE"
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope] ./openmp_lib_env-ifort
       start
       nthreads, dynamics, nnested :  1 T T
       before
       nthreads, dynamics, nnested :  4 T T
       myid :  1
       myid :  2
       myid :  0
       myid :  3
       after
      [jemmyhu@silky:~/CES706/openmp/Fortran/data-scope]

  18. PathScale on Opteron

      [jemmyhu@wha781 data-scope]$ ./openmp_lib_env-f90
       start
       nthreads, dynamics, nnested :  1 F F
       before
       nthreads, dynamics, nnested :  2 F F
       myid :  0 myid :  0 1
       after
      [jemmyhu@wha781 data-scope]$ export OMP_DYNAMIC="TRUE"
      [jemmyhu@wha781 data-scope]$ export OMP_NESTED="TRUE"
      [jemmyhu@wha781 data-scope]$ ./openmp_lib_env-f90
      ** OpenMP warning: dynamic thread adjustment not available (ignored OMP_DYNAMIC)
       start
       nthreads, dynamics, nnested :  1 F T
       before
       nthreads, dynamics, nnested :  2 F T
       myid :  1
       myid :  0
       after
      [jemmyhu@wha781 data-scope]$

  19. Example - pi

  20. #include <stdio.h>   /* printf */
      #include <stdlib.h>  /* EXIT_SUCCESS */
      #include <omp.h>     /* OpenMP header file */

      #define NUM_STEPS 100000000

      int main(int argc, char *argv[])
      {
          int i;
          double x, pi;
          double sum = 0.0;
          double step = 1.0/(double) NUM_STEPS;
          int nthreads;

          /* do computation -- using all available threads */
          #pragma omp parallel
          {
              #pragma omp master
              { nthreads = omp_get_num_threads(); }

              #pragma omp for private(x) reduction(+:sum) schedule(runtime)
              for (i = 0; i < NUM_STEPS; ++i) {
                  x = (i + 0.5) * step;
                  sum = sum + 4.0/(1.0 + x*x);
              }

              #pragma omp master
              { pi = step * sum; }
          }

          /* print results */
          printf("parallel program results with %d threads:\n", nthreads);
          printf("pi = %g (%17.15f)\n", pi, pi);

          return EXIT_SUCCESS;
      }
