Computer architecture II

havard

  1. Computer architecture II. Programming: POSIX Threads, OpenMP

  2. OpenMP overview
     • Open specifications for Multi-Processing
     • A set of APIs for writing multithreaded applications in C/C++ and Fortran
     • Thread-based parallelism using the fork/join model
     • OpenMP components: compiler directives, library calls, environment variables

  3. OpenMP Release History
     • 1997: OpenMP Fortran 1.0
     • 1998: OpenMP C/C++ 1.0
     • 1999: OpenMP Fortran 1.1
     • 2000: OpenMP Fortran 2.0
     • 2002: OpenMP C/C++ 2.0

  4. Goals
     • Standard for shared-memory machines, backed by major computer hardware and software vendors
     • Limited number of directives
     • Ease of use:
       • Incrementally parallelize a serial program
       • Support both coarse-grain and fine-grain parallelism
     • Portability: Fortran (77, 90, and 95), C, and C++

  5. OpenMP Constructs
     • Directives
       • Parallel region
       • Work-sharing
       • Synchronization
     • Runtime library routines
     • Environment variables

  6. OpenMP C Directive format

         #pragma omp directive-name [clause ...]
         { code }

  7. 1.a. Parallel Regions Directive
     • Marks a block of code that will be executed by multiple threads.
     • Follows the fork-join model.

         #include <omp.h>
         int main(void) {
             int x;
             sequential_code();
             #pragma omp parallel
             {
                 parallel_code();   /* executed by every thread of the team */
             }
             sequential_code();
             return 0;
         }

     [Figure: the master thread M creates (forks) threads T0, T1, T2; the threads join at the end of the region.]
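
The fork-join behaviour described on this slide can be observed with a minimal sketch like the one below. The function name `count_team_threads` is made up for illustration, and the `critical` directive it uses is only introduced later in the deck. It assumes the compiler has OpenMP enabled (for example `-fopenmp` with GCC); if the pragmas are ignored, the function simply returns 1.

```c
#include <assert.h>

/* Returns how many threads executed the parallel region. */
int count_team_threads(void)
{
    int count = 0;            /* declared before the region, so it is shared */

    /* Fork: the master thread creates a team of threads here. */
    #pragma omp parallel
    {
        /* Every thread of the team runs this block once. */
        #pragma omp critical
        count++;
    }
    /* Join: past this point only the master thread continues. */

    assert(count >= 1);       /* the master thread always takes part */
    return count;
}
```

Calling `count_team_threads()` from sequential code returns the team size chosen by the runtime, which typically follows `OMP_NUM_THREADS` or the number of available processors.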

  8. 1.b. Work sharing Directive
     • Types: for, sections, single
     • For construct
       • Divides the iterations of a loop among the threads of the team.
       • How iterations are assigned depends on the SCHEDULE clause.
       • An implicit barrier is assumed at the end.
       • Thread-visible (shared) variables are flushed at the end.

  9. Work Sharing example
     Sequential code:

         for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }

     OpenMP parallel region (iterations partitioned by hand):

         #pragma omp parallel
         {
             int id, i, Nthrds, istart, iend;
             id = omp_get_thread_num();
             Nthrds = omp_get_num_threads();
             istart = id * N / Nthrds;
             iend = (id + 1) * N / Nthrds;
             for (i = istart; i < iend; i++) { a[i] = a[i] + b[i]; }
         }

     OpenMP parallel region and a work-sharing for-construct:

         #pragma omp parallel
         #pragma omp for schedule(static)
         for (i = 0; i < N; i++) { a[i] = a[i] + b[i]; }
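
The hand-written partition in the second version relies on the `istart`/`iend` formula covering every iteration exactly once. The sketch below isolates that arithmetic so it can be checked without any threads. The helper names `block_range` and `ranges_tile` are hypothetical, and the code assumes `id * n` does not overflow an `int`.

```c
#include <assert.h>

/* Half-open iteration range [start, end) owned by thread `id` out of
   `nthreads`, using the same formula as the slide. */
void block_range(int id, int nthreads, int n, int *start, int *end)
{
    assert(nthreads > 0 && id >= 0 && id < nthreads && n >= 0);
    *start = id * n / nthreads;
    *end   = (id + 1) * n / nthreads;
}

/* Returns 1 if the ranges of all threads tile [0, n) with no gap and
   no overlap, 0 otherwise. */
int ranges_tile(int nthreads, int n)
{
    int expected_start = 0;
    for (int id = 0; id < nthreads; id++) {
        int s, e;
        block_range(id, nthreads, n, &s, &e);
        if (s != expected_start || e < s)
            return 0;
        expected_start = e;   /* next range must begin where this one ends */
    }
    return expected_start == n;
}
```

For example, with 3 threads and `n = 10` the ranges are [0, 3), [3, 6) and [6, 10), so the last thread absorbs the remainder.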

  10. Schedule Clause
     The schedule clause affects how loop iterations are mapped onto threads.
     • schedule(static[,chunk]): iterations are divided into blocks of "chunk" iterations, which are assigned to the threads round-robin. Without chunk, each thread gets one block of roughly equal size.
     • schedule(dynamic[,chunk]): when free, each thread picks the next "chunk" iterations from a queue until all iterations have been executed.
     • schedule(guided[,chunk]): a special dynamic schedule. Each thread first grabs a large block of iterations; the block size then decreases gradually, down to "chunk" iterations.
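
As a rough illustration of these clauses, the sketch below does two things. `static_owner` models the round-robin mapping of schedule(static, chunk) for a loop that starts at 0 with step 1 (an assumption of this sketch). `triangular_work` uses schedule(dynamic, 4) on a loop whose iterations get more expensive as `i` grows, which is the kind of imbalance dynamic scheduling is meant for. Both function names are invented for this example.

```c
#include <assert.h>

/* Thread that owns iteration i under schedule(static, chunk): iterations
   are grouped into chunks of `chunk` consecutive iterations, and chunks
   are dealt to the threads round-robin in thread-number order. */
int static_owner(int i, int chunk, int nthreads)
{
    assert(i >= 0 && chunk > 0 && nthreads > 0);
    return (i / chunk) % nthreads;
}

/* Iteration i performs i + 1 units of work, so late iterations cost more.
   Threads fetch 4 iterations at a time as they become free. */
long triangular_work(int n)
{
    long total = 0;

    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (int i = 0; i < n; i++) {
        for (int j = 0; j <= i; j++)
            total += 1;
    }
    return total;   /* equals n * (n + 1) / 2 */
}
```

With chunk 2 and 3 threads, iterations 0-1 go to thread 0, 2-3 to thread 1, 4-5 to thread 2, and 6-7 wrap around to thread 0 again.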

  11. Sections Directive
     • Non-iterative work-sharing construct
     • Each section is executed once, by one thread of the team

         #pragma omp parallel
         {
             #pragma omp sections
             {
                 #pragma omp section
                 code_executed_by_one();
                 #pragma omp section
                 code_executed_byanother_one();
             }
         }
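
A concrete use of sections might look like the sketch below, where two independent computations over the same read-only array run as separate sections. The function name `sum_and_max` is hypothetical. Each shared result variable is written by exactly one section, so no extra synchronization is needed.

```c
#include <assert.h>

/* Computes the sum and the maximum of a[0..n-1] in two independent sections. */
void sum_and_max(const int *a, int n, long *sum, int *max)
{
    assert(n > 0);
    long s = 0;        /* written only by the first section  */
    int  m = a[0];     /* written only by the second section */

    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            {
                for (int i = 0; i < n; i++)
                    s += a[i];
            }
            #pragma omp section
            {
                for (int i = 0; i < n; i++)
                    if (a[i] > m)
                        m = a[i];
            }
        }
        /* implicit barrier at the end of the sections construct */
    }
    *sum = s;
    *max = m;
}
```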

  12. Single Directive
     • Only one thread executes the single block; the other threads skip it and wait at the implicit barrier at its end (unless nowait is specified).

         #pragma omp single
         code_executed_by_only_one();
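
One common use of single is to let one thread set up a shared value that the whole team then reads. The sketch below assumes that pattern; `scale_all` and `factor_in` are names chosen for this example only.

```c
#include <assert.h>

/* Multiplies every element of a[0..n-1] by factor_in. */
void scale_all(double *a, int n, double factor_in)
{
    assert(n >= 0);
    double factor = 0.0;          /* shared by the team */

    #pragma omp parallel
    {
        /* Exactly one thread performs the assignment. */
        #pragma omp single
        factor = factor_in;
        /* Implicit barrier of single: every thread now sees `factor`. */

        #pragma omp for
        for (int i = 0; i < n; i++)
            a[i] *= factor;
    }
}
```

If the single block carried a nowait clause, other threads could reach the loop before `factor` is set, so the implicit barrier matters here.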

  13. Parallel Regions and work sharing Directives
     A parallel region directive can be combined with a work-sharing construct:

         #pragma omp parallel for [schedule clause]
         #pragma omp parallel sections
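
The combined form saves one level of nesting when a parallel region contains nothing but a single loop. A minimal sketch, using a hypothetical `axpy` routine (y = alpha * x + y) as the loop body:

```c
#include <assert.h>

/* y[i] = alpha * x[i] + y[i] for i in [0, n). */
void axpy(double alpha, const double *x, double *y, int n)
{
    assert(n >= 0);

    /* Fork the team and share the loop iterations in one directive. */
    #pragma omp parallel for schedule(static)
    for (int i = 0; i < n; i++)
        y[i] = alpha * x[i] + y[i];
}
```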

  14. Data Scoping Clauses
     • Scoping: in which blocks of the program a declared variable is visible
     • By default, most variables are shared
     • Exceptions (private by default):
       • The loop index of a parallel for
       • Local variables of subroutines called within a parallel region
       • Local variables declared within the lexical scope of a parallel region
     • It is recommended to declare the scope of variables explicitly, using the clauses:
       • SHARED: the variable is shared among the threads
       • PRIVATE: the variable is private to each thread
       • FIRSTPRIVATE: the variable is private, and every private copy is initialized with the value of the original object before entering the parallel region
       • LASTPRIVATE: the variable is private, and the value from the sequentially last iteration is copied back to the original object
       • REDUCTION: each thread works on a private copy, and the copies are combined with the given operator at the end of the parallel construct
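
The difference between FIRSTPRIVATE and LASTPRIVATE is easiest to see in a small loop. In the sketch below (the function name `last_value` is made up), `offset` enters each thread with its original value, and `last` leaves the loop holding whatever the sequentially last iteration assigned.

```c
#include <assert.h>

/* Returns offset + (n - 1), i.e. the value computed by iteration n - 1. */
int last_value(int n, int offset)
{
    assert(n > 0);
    int last = -1;

    /* offset: private copies start with the caller's value.
       last:   private during the loop, copied out from iteration n - 1. */
    #pragma omp parallel for firstprivate(offset) lastprivate(last)
    for (int i = 0; i < n; i++)
        last = offset + i;

    return last;
}
```

With a plain private(last) clause instead, the value of `last` after the loop would be unspecified.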

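  15. Reduction example

         #include <omp.h>
         int main(void) {
             int i, n = 100;
             float a[100], b[100], result;
             for (i = 0; i < n; i++) { a[i] = 1.0f; b[i] = 2.0f; }   /* sample data */
             result = 0.0;
             #pragma omp parallel for private(i) reduction(+:result)
             for (i = 0; i < n; i++)
                 result = result + (a[i] * b[i]);
             return 0;
         }

     • The a and b arrays are shared (by default).
     • result is private inside the loop, and the per-thread copies are summed at the end.
     • i is private by default (one of the three exceptions); private(i) only makes this explicit.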

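  16. 1.c. Synchronization directives
     Shown with the Fortran sentinel !$omp; in C/C++ the prefix is #pragma omp.
     • !$omp barrier
     • nowait (a clause on work-sharing directives rather than a directive of its own)
     • !$omp critical
     • !$omp master
     • !$omp flush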

  17. Synchronization Directives
     • When a thread reaches a BARRIER directive, it waits at that point until all other threads of the team have reached that barrier.
     • Implicit barriers are applied at:
       • The end of parallel regions
       • The end of work-sharing constructs (for, sections, single)
     • A critical section has no implicit barrier at its end; threads only wait to enter it.
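
A typical reason for an explicit barrier is a two-phase computation in which each thread reads data that another thread wrote in the previous phase. The sketch below assumes a team of at most 4 threads (requested with the num_threads clause); the function name `neighbour_exchange` is hypothetical, and the `#else` branch supplies serial stand-ins so the code also builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins used only when OpenMP is not enabled. */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

/* Phase 1: thread t writes slot[t]. Phase 2: thread t reads the slot of its
   right-hand neighbour. Returns 1 if every read saw the neighbour's value. */
int neighbour_exchange(void)
{
    int slot[4] = {0, 0, 0, 0};    /* shared, one entry per thread */
    int ok = 1;

    #pragma omp parallel num_threads(4)
    {
        int t = omp_get_thread_num();
        int team = omp_get_num_threads();
        assert(team >= 1 && team <= 4);

        slot[t] = 10 * t + 1;      /* phase 1: publish a value */

        /* No thread starts phase 2 before all writes are done. */
        #pragma omp barrier

        int right = (t + 1) % team;
        if (slot[right] != 10 * right + 1) {
            #pragma omp critical
            ok = 0;
        }
    }
    return ok;
}
```

Without the barrier, a fast thread could read its neighbour's slot before it has been written.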

  18. Synchronization Directives
     • nowait is a clause that removes the implicit barrier at the end of a construct.
     • It is used with work-sharing directives (for, sections, single).
     • The barrier at the end of a parallel region cannot be removed.
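
nowait is safe when the code after the construct does not depend on its results. The sketch below, with the invented name `fill_two`, runs two loops that touch different arrays, so a thread that finishes its share of the first loop can move straight on to the second.

```c
#include <assert.h>

/* Fills a[i] = i and b[i] = 2 * i for i in [0, n). */
void fill_two(int *a, int *b, int n)
{
    assert(n >= 0);

    #pragma omp parallel
    {
        /* No barrier after this loop: the second loop never reads a[]. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            a[i] = i;

        /* The implicit barrier at the end of this loop is kept. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            b[i] = 2 * i;
    }
}
```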

  19. Synchronization Directives
     • The CRITICAL directive specifies a region of code that must be executed by only one thread at a time.
     • A thread arriving at the region blocks until the thread currently inside exits that CRITICAL region.
     • Syntax: #pragma omp critical [(name)]
     • The optional name allows several distinct CRITICAL regions to exist.
     • CRITICAL regions with the same name are treated as the same region; all unnamed CRITICAL regions also count as one region.
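
A named critical region is often used to merge per-thread results into a shared variable. The sketch below sums an array that way; it is effectively a hand-written version of what a reduction clause does. The function name `sum_with_critical` and the region name `sum_update` are chosen for this example only.

```c
#include <assert.h>

/* Sums a[0..n-1] using private partial sums and a named critical region. */
long sum_with_critical(const int *a, int n)
{
    assert(n >= 0);
    long total = 0;                /* shared */

    #pragma omp parallel
    {
        long partial = 0;          /* private: declared inside the region */

        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            partial += a[i];

        /* One thread at a time adds its partial sum to the total. */
        #pragma omp critical (sum_update)
        total += partial;
    }
    return total;
}
```

Keeping the critical region outside the loop means each thread enters it once, not once per iteration.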

  20. Synchronization Directives
     • The FLUSH directive identifies a synchronization point at which the implementation must provide a consistent view of memory. Thread-visible variables are written back to memory at this point.
     • FLUSH is implied by these directives:
       • critical: entry and exit
       • barrier
       • parallel: exit
       • for: exit
       • sections: exit
       • single: exit
     • The flush at the exit of for, sections, and single is not implied when a nowait clause is present.

  21. 2. Runtime Library Routines
     • The OpenMP standard defines an API of library calls that:
       • Query the number of threads/processors, and set the number of threads to use
       • Provide general-purpose locking routines (locks for mutual exclusion)
       • Set execution-environment options: nested parallelism, dynamic adjustment of threads

  22. Runtime Library Routines
     • void omp_set_num_threads(int num_threads)
       Sets the number of threads that will be used in subsequent parallel regions.
     • int omp_get_num_threads(void)
       Returns the number of threads currently in the team executing the parallel region from which it is called.
     • int omp_get_thread_num(void)
       Returns the thread number of the calling thread within the team. This number is between 0 and omp_get_num_threads() - 1. The master thread of the team is thread 0.
     • int omp_get_num_procs(void)
       Returns the number of processors that are available to the program.
     • int omp_in_parallel(void)
       Returns nonzero if the calling code is executing inside an active parallel region, and 0 otherwise.
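
The routines above can be combined as in the following sketch, which requests a team size and records what the master thread observes. The `team_info` struct and the `query_team` function are hypothetical names, and the `#else` branch provides serial stand-ins so the code still builds when OpenMP is not enabled.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins used only when OpenMP is not enabled. */
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
static int  omp_get_thread_num(void)   { return 0; }
static int  omp_in_parallel(void)      { return 0; }
#endif

typedef struct {
    int team_size;            /* omp_get_num_threads() inside the region */
    int master_id;            /* omp_get_thread_num() of the master      */
    int in_parallel_outside;  /* omp_in_parallel() in sequential code    */
} team_info;

team_info query_team(int requested)
{
    assert(requested > 0);
    team_info info = {0, -1, 0};

    info.in_parallel_outside = omp_in_parallel();
    omp_set_num_threads(requested);   /* also affects later regions */

    #pragma omp parallel
    {
        /* Only the master thread writes the shared struct. */
        #pragma omp master
        {
            info.team_size = omp_get_num_threads();
            info.master_id = omp_get_thread_num();
        }
    }
    return info;
}
```

The team may be smaller than requested if the runtime limits the thread count, so the sketch reports the size it actually got.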

  23. Runtime Library Routines
     • By default, a program with multiple parallel regions uses the same number of threads to execute each region.
     • This behavior can be changed to allow the run-time system to adjust dynamically the number of threads created for a given parallel region.
     • void omp_set_dynamic(int dynamic_threads)
       Enables (nonzero argument) or disables (zero) dynamic adjustment, by the run-time system, of the number of threads available for executing parallel regions.
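
When a program needs a predictable team size, it can switch dynamic adjustment off before requesting threads. A minimal sketch, assuming the hypothetical helper name `fixed_team_size` and using `omp_get_dynamic()` to read the setting back; the `#else` branch again provides serial stand-ins for builds without OpenMP.

```c
#include <assert.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial stand-ins used only when OpenMP is not enabled. */
static void omp_set_dynamic(int flag)  { (void)flag; }
static int  omp_get_dynamic(void)      { return 0; }
static void omp_set_num_threads(int n) { (void)n; }
static int  omp_get_num_threads(void)  { return 1; }
#endif

/* Disables dynamic adjustment, requests `requested` threads, and returns
   the size of the team that the next parallel region actually gets. */
int fixed_team_size(int requested)
{
    assert(requested > 0);

    omp_set_dynamic(0);                /* runtime may no longer shrink teams */
    assert(omp_get_dynamic() == 0);
    omp_set_num_threads(requested);

    int size = 0;
    #pragma omp parallel
    {
        #pragma omp master
        size = omp_get_num_threads();
    }
    return size;
}
```

Even with dynamic adjustment disabled, an implementation may deliver fewer threads if it runs into a resource limit, so callers should check the returned size.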

  24. 3. Environment Variables
     • Some of them are counterparts of run-time library calls.
     • OMP_NUM_THREADS: sets the maximum number of threads to use during execution. For example (csh syntax): setenv OMP_NUM_THREADS 8
     • OMP_DYNAMIC: enables or disables dynamic adjustment of the number of threads available for execution of parallel regions. Valid values are TRUE or FALSE. For example: setenv OMP_DYNAMIC TRUE
