
OMPi: A portable C compiler for OpenMP V2.0




  1. OMPi: A portable C compiler for OpenMP V2.0. Elias Leontiadis, George Tzoumas, Vassilios V. Dimakopoulos. University of Ioannina

  2. Presentation • Introduction • OMPi • OMPi Performance • Conclusions EWOMP 2003

  3. The OpenMP specification
     • High-level API for parallel programming in a shared-memory environment
     • Fortran
       • Version 1.0, October 1997
       • Version 1.1, November 1999
       • Version 2.0, November 2000
     • C/C++
       • Version 1.0, October 1998
       • Version 2.0, March 2002
     • New features such as
       • timing routines
       • copyprivate and num_threads clauses
       • variable reprivatization
       • static threadprivate
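As an illustration of three of the features listed on this slide (timing routines, num_threads, and copyprivate), here is a minimal sketch. The function name copyprivate_demo and the value 42 are made up for the example. The `_OPENMP` guards are an assumption about the build: they let the file compile serially when OpenMP is not enabled, in which case the pragmas are ignored and only one thread runs.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>                     /* only present when OpenMP is enabled */
#endif

/* Returns the number of threads that took part in the region. */
int copyprivate_demo(void)
{
    int participants = 0;
#ifdef _OPENMP
    double start = omp_get_wtime();  /* timing routine */
#endif

    /* num_threads requests a team size for this region only. */
    #pragma omp parallel num_threads(3) reduction(+:participants)
    {
        int token = 0;               /* private: declared inside the region */

        /* One thread assigns token; copyprivate broadcasts it to the rest. */
        #pragma omp single copyprivate(token)
        token = 42;

        assert(token == 42);         /* every thread now sees the same value */
        participants++;
    }

#ifdef _OPENMP
    printf("region took %f s\n", omp_get_wtime() - start);
#endif
    return participants;
}
```

Calling copyprivate_demo() from main returns 1 in a serial build and at most 3 in an OpenMP build, since the runtime may provide fewer threads than requested.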

  4. OpenMP compilers
     • Commercial compilers for specific machines
       • SUN, SGI, Intel, Fujitsu, etc.
     • OpenMP compiler projects (usually portable)
       • Nanos
       • OdinMP/CCp
       • Intone project
       • Omni

  5. Presentation • Introduction • OMPi • OMPi Performance • Conclusions

  6. OMPi • Portable C compiler for OpenMP • Adheres to V.2.0 • Produces ANSI C code with POSIX threads library calls • Written entirely in C

  7. Compilation process: C source file → OMPi → generated C file → system C compiler (cc) → object file; object file + OMPi library object files → system linker → a.out

  8. Code transformations
     • parallel construct
       • code is moved into a (thread) function
       • a struct is declared containing pointers to non-global shared variables
       • private variables are redeclared locally in the function body
       • original code is replaced by code that creates a team of threads executing the function
       • master thread executes the function, too

  9. Example

     Original OpenMP source:

         int a;    /* global */

         int main()
         {
           int b, c;

           #pragma omp parallel num_threads(3) \
                   private(c)
           {
             c = b + a;
             . . .
           }
         }

     Code generated by OMPi:

         int a;

         typedef struct {   /* shared vars structure */
           int (*b);        /* b is shared, non-global */
         } par0_t;

         int main()
         {
           int b, c;

           _omp_initialize();
           {
             /* declare par0_vars, the shared var struct */
             _OMP_PARALLEL_DECL_VARSTRUCT(par0);
             /* par0_vars->b will point to real b */
             _OMP_PARALLEL_INIT_VAR(par0, b);
             /* Run the threads */
             _omp_create_team(3, _OMP_THREAD, par0_thread, (void *) &par0_vars);
             _omp_destroy_team(_OMP_THREAD->parent);
           }
         }

         void *par0_thread(void *_omp_thread_data)
         {
           int _dummy = _omp_assign_key(_omp_thread_data);
           int (*b) = &_OMP_VARREF(par0, b);
           int c;

           c = (*(b)) + a;
           . . .
         }
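The generated code on this slide depends on OMPi's internal runtime macros, so it cannot be compiled on its own. The following is a minimal sketch of the same outlining idea written against plain POSIX threads. The names region_shared_t, thread_arg_t, region_thread, run_region, and the out result array are hypothetical and are not OMPi's actual runtime interface; the initial value of a is also made up.

```c
#include <assert.h>
#include <pthread.h>

#define TEAM_SIZE 3              /* mirrors num_threads(3) on the slide */

int a = 10;                      /* global: every thread sees it directly */

/* Shared, non-global variables are reached through pointers. */
typedef struct {
    int *b;                      /* points at the caller's b */
    int *out;                    /* hypothetical: one result slot per thread */
} region_shared_t;

typedef struct {
    region_shared_t *shared;
    int id;                      /* position of this thread in the team */
} thread_arg_t;

/* Outlined body of the parallel region. */
static void *region_thread(void *p)
{
    thread_arg_t *arg = p;
    int *b = arg->shared->b;     /* shared: same storage for all threads */
    int c;                       /* private(c): redeclared locally */

    c = *b + a;
    arg->shared->out[arg->id] = c;
    return NULL;
}

/* Stands in for the code that replaces the original region. */
int run_region(int b_value, int out[TEAM_SIZE])
{
    int b = b_value;
    region_shared_t shared = { &b, out };
    thread_arg_t args[TEAM_SIZE];
    pthread_t workers[TEAM_SIZE];

    for (int i = 0; i < TEAM_SIZE; i++) {
        args[i].shared = &shared;
        args[i].id = i;
    }
    for (int i = 1; i < TEAM_SIZE; i++) {
        int rc = pthread_create(&workers[i], NULL, region_thread, &args[i]);
        assert(rc == 0);
    }
    region_thread(&args[0]);     /* the master executes the function, too */

    for (int i = 1; i < TEAM_SIZE; i++)
        pthread_join(workers[i], NULL);
    return 0;
}
```

With b set to 5, each of the three slots in out ends up holding 15. Unlike this sketch, which creates and joins threads for every region, the slides describe OMPi as drawing its team from a pool of threads created at program start.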

  10. Work sharing constructs
      • sections construct
        • a switch-case block is created
        • the code of each section is moved into a case of the switch block
        • any thread may execute any section
      • for construct
        • each thread computes the bounds of the next chunk to execute
        • then, if a chunk is available, executes the for-loop within the computed bounds
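Both transformations on this slide can be sketched with one small chunk dispenser. This is an illustration under assumptions, not OMPi's generated code: loop_state_t, loop_init, next_chunk, run_loop_share, and run_sections are hypothetical names, a mutex is assumed for handing out work, and the loop body (summing the iteration index) is invented for the example. Sections are treated here as a loop over section numbers, so each section body becomes one case of a switch.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared state for one work-sharing construct. */
typedef struct {
    pthread_mutex_t lock;
    long next;                   /* first iteration not yet handed out */
    long end;                    /* one past the last iteration */
    long chunk;                  /* iterations per chunk */
} loop_state_t;

void loop_init(loop_state_t *s, long begin, long end, long chunk)
{
    assert(chunk > 0);
    pthread_mutex_init(&s->lock, NULL);
    s->next = begin;
    s->end = end;
    s->chunk = chunk;
}

/* Computes the bounds [*lo, *hi) of the next chunk; returns 0 if none is left. */
int next_chunk(loop_state_t *s, long *lo, long *hi)
{
    int available = 0;

    pthread_mutex_lock(&s->lock);
    if (s->next < s->end) {
        *lo = s->next;
        *hi = (s->end - s->next < s->chunk) ? s->end : s->next + s->chunk;
        s->next = *hi;
        available = 1;
    }
    pthread_mutex_unlock(&s->lock);
    return available;
}

/* for construct: every thread in the team would run this function. */
long run_loop_share(loop_state_t *s)
{
    long lo, hi, partial = 0;

    while (next_chunk(s, &lo, &hi))
        for (long i = lo; i < hi; i++)
            partial += i;        /* stand-in for the original loop body */
    return partial;
}

/* sections construct: whichever thread asks next gets the next section. */
void run_sections(loop_state_t *s, int done[3])
{
    long lo, hi;

    while (next_chunk(s, &lo, &hi))
        for (long k = lo; k < hi; k++)
            switch (k) {
            case 0: done[0] = 1; break;   /* body of the first section */
            case 1: done[1] = 1; break;   /* body of the second section */
            case 2: done[2] = 1; break;   /* body of the third section */
            }
}
```

For a loop over 0..9 with a chunk size of 4, successive calls to next_chunk yield the bounds [0, 4), [4, 8), and [8, 10), after which it reports that no chunk is available.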

  11. Threads • a pool of threads is created when the program starts; all threads are sleeping • initial pool size is the number of CPUs or $OMP_NUM_THREADS • user can request a specific number of threads by using the num_threads clause or omp_set_num_threads()
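A pool of sleeping threads like the one described on this slide can be sketched with a mutex and two condition variables. Everything named here (pool_t, pool_init, pool_run_team, pool_destroy, mark_slot, pool_demo) is hypothetical, and the fixed POOL_SIZE is an assumption made to keep the sketch short; the slide sizes the pool from the CPU count or $OMP_NUM_THREADS instead.

```c
#include <assert.h>
#include <pthread.h>

#define POOL_SIZE 4              /* assumed fixed size for this sketch */

typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t wake;         /* idle workers sleep here */
    pthread_cond_t done;         /* master waits here for its team */
    pthread_t threads[POOL_SIZE];
    void (*work)(int id, void *arg);
    void *arg;
    unsigned generation;         /* bumped once per team */
    int wanted;                  /* team slots not yet claimed by a worker */
    int running;                 /* team members that have not finished */
    int next_id, shutdown;
} pool_t;

static void *worker(void *p)
{
    pool_t *pool = p;
    unsigned seen = 0;           /* last team this worker joined */

    pthread_mutex_lock(&pool->lock);
    for (;;) {
        while (!pool->shutdown &&
               (pool->generation == seen || pool->wanted == 0))
            pthread_cond_wait(&pool->wake, &pool->lock);   /* sleeping */
        if (pool->shutdown)
            break;
        seen = pool->generation; /* join each team at most once */
        pool->wanted--;
        int id = pool->next_id++;
        pthread_mutex_unlock(&pool->lock);
        pool->work(id, pool->arg);
        pthread_mutex_lock(&pool->lock);
        if (--pool->running == 0)
            pthread_cond_signal(&pool->done);
    }
    pthread_mutex_unlock(&pool->lock);
    return NULL;
}

void pool_init(pool_t *pool)
{
    pthread_mutex_init(&pool->lock, NULL);
    pthread_cond_init(&pool->wake, NULL);
    pthread_cond_init(&pool->done, NULL);
    pool->generation = 0;
    pool->wanted = pool->running = pool->shutdown = 0;
    pool->next_id = 1;
    for (int i = 0; i < POOL_SIZE; i++) {
        int rc = pthread_create(&pool->threads[i], NULL, worker, pool);
        assert(rc == 0);
    }
}

/* Runs work(id, arg) on n threads: the caller is id 0, n-1 workers are woken. */
void pool_run_team(pool_t *pool, int n, void (*work)(int, void *), void *arg)
{
    assert(n >= 1 && n - 1 <= POOL_SIZE);
    pthread_mutex_lock(&pool->lock);
    pool->work = work;
    pool->arg = arg;
    pool->wanted = pool->running = n - 1;
    pool->next_id = 1;
    pool->generation++;
    pthread_cond_broadcast(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
    work(0, arg);                /* master thread takes part as well */
    pthread_mutex_lock(&pool->lock);
    while (pool->running > 0)
        pthread_cond_wait(&pool->done, &pool->lock);
    pthread_mutex_unlock(&pool->lock);
}

void pool_destroy(pool_t *pool)
{
    pthread_mutex_lock(&pool->lock);
    pool->shutdown = 1;
    pthread_cond_broadcast(&pool->wake);
    pthread_mutex_unlock(&pool->lock);
    for (int i = 0; i < POOL_SIZE; i++)
        pthread_join(pool->threads[i], NULL);
}

static void mark_slot(int id, void *arg) { ((int *)arg)[id] = id + 1; }

int pool_demo(void)
{
    pool_t pool;
    int slots[3] = {0, 0, 0};
    pool_init(&pool);
    pool_run_team(&pool, 3, mark_slot, slots);   /* like num_threads(3) */
    pool_destroy(&pool);
    return slots[0] + slots[1] + slots[2];
}
```

In pool_demo a team of three (the caller plus two woken workers) fills three slots with 1, 2, and 3, so the function returns 6. The generation counter keeps a fast worker from claiming two slots in the same team, which matters once team members have to meet at a barrier.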

  12. Presentation • Introduction • OMPi • OMPi Performance • Conclusions

  13. Benchmarks
      • NAS parallel benchmarks
        • OpenMP C version ported by the Omni group (v2.3)
        • Results for Class W
      • Edinburgh University microbenchmarks (EPCC)
        • Measure synchronization overheads

  14. Platforms
      • SGI Origin 2000 system
        • 48 MIPS R10000 CPUs
        • IRIX 6.5
      • Compaq ProLiant ML570
        • 2 Intel Xeon CPUs
        • Red Hat Linux 9.0
      • SUN E-1000 Server
        • 4 Sparc CPUs
        • Solaris 5.7

  15. Compilers • OdinMP/CCp v1.02 • Omni v1.4a • Intel C/C++ compiler (ICC) v7.1 • MIPSpro v7.3

  16. Compilation time, NAS parallel benchmarks (bt, lu, sp) [Figure: two bar charts of compilation time in seconds. 2-CPU Linux system: odin, omni, ompi, icc. SGI Origin 2000 system: odin, omni, ompi, mipspro.]

  17. NAS parallel benchmarks, SGI Origin 2000 (execution time): bt.W [Figure: execution time in seconds versus number of threads (1 to 8) for ompi, omni, and mipspro.]

  18. NAS parallel benchmarks, SGI Origin 2000: cg.W [Figure: time in seconds versus number of threads (1 to 8) for ompi, omni, and mipspro.]

  19. NAS parallel benchmarks, SGI Origin 2000: ft.W [Figure: time in seconds versus number of threads (1 to 8) for ompi, omni, and mipspro.]

  20. NAS parallel benchmarks, SGI Origin 2000: lu.W [Figure: time in seconds versus number of threads (1 to 8) for ompi, omni, and mipspro.]

  21. NAS parallel benchmarks, Sun E-1000 [Figure: four panels (bt.W, cg.W, ft.W, lu.W) showing time in seconds versus number of threads (1 to 4) for ompi and omni.]

  22. EPCC microbenchmarks, SGI (overheads) [Figure: two panels (odin, ompi) showing overhead in microseconds versus number of threads (1 to 8) for parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic, and reduction.]

  23. EPCC microbenchmarks, SUN [Figure: two panels (omni, ompi) showing overhead in microseconds versus number of threads (1 to 4) for parallel, for, parallel for, barrier, single, critical, lock/unlock, ordered, atomic, and reduction.]

  24. Presentation • Introduction • OMPi • OMPi Performance • Conclusions

  25. Conclusions • C compiler for OpenMP V.2.0 • Written in C; generated code uses pthreads • Tested on Linux, Solaris, IRIX • Performance satisfactory, comparable with native compilers

  26. Current status • Target Solaris threads, sproc • Improve overheads (e.g. ordered) • Improve produced code (optimizations) • Profiling code

  27. Thank you http://www.cs.uoi.gr/~ompi
