
OpenMP in a Heterogeneous World




Presentation Transcript


  1. OpenMP in a Heterogeneous World Ayodunni Aribuki Advisor: Dr. Barbara Chapman HPCTools Group University of Houston

  2. Top 10 Supercomputers (June 2011)

  3. Why OpenMP • Shared memory parallel programming model • Extends C, C++, Fortran • Directives-based • Single code for sequential and parallel versions • Incremental parallelism • Little code modification • High-level • Leaves multithreading details to the compiler and runtime • Widely supported by major compilers • Open64, Intel, GNU, IBM, Microsoft, … • Portable www.openmp.org

  4. OpenMP Example
  #pragma omp parallel
  {
    int i;
    #pragma omp for
    for (i = 0; i < 100; i++) {
      // do stuff
    }
    // do more stuff
  }
  [Fork/join diagram: the master thread forks a team; with four threads the loop iterations are split into chunks 0-24, 25-49, 50-74 and 75-99; an implicit barrier follows the loop; every thread then does the "more stuff" part before the threads join.]
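  For reference, a hedged, self-contained version of the example above that compiles with any OpenMP-capable C compiler (e.g. gcc -fopenmp); the array and the work done inside the loop are illustrative, not from the slides:

  /* Minimal fork-join sketch matching the slide above. */
  #include <stdio.h>
  #include <omp.h>

  int main(void)
  {
      int data[100];

      #pragma omp parallel
      {
          int i;
          /* The 100 iterations are divided among the threads, e.g. with
           * four threads: 0-24, 25-49, 50-74, 75-99. An implicit barrier
           * follows the loop. */
          #pragma omp for
          for (i = 0; i < 100; i++) {
              data[i] = i * i;                      /* "do stuff" */
          }
          /* Every thread executes this after the barrier ("do more stuff"). */
          printf("thread %d past the barrier, data[50]=%d\n",
                 omp_get_thread_num(), data[50]);
      }   /* join: execution continues on a single thread */

      return 0;
  }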

  5. Present/Future Architectures & Challenges They Pose
  [Diagram: a cluster of nodes, each node with its own memory; some nodes also carry an accelerator.]
  Challenges: • Heterogeneity (accelerators) • Location (data lives in many separate memories) • Scalability (many more CPUs)

  6. Heterogeneous Embedded Platform

  7. Heterogeneous High-Performance Systems Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs. www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf

  8. Programming Heterogeneous Multicore: Issues Always hardware-specific! • Must map data/computations to specific devices • Usually involves substantial rewrite of code • Verbose code • Move data to/from device x • Launch kernel on device • Wait until y is ready/done • Portability becomes an issue • Multiple versions of same code • Hard to maintain
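  To illustrate the verbosity, here is a hedged CUDA sketch of the usual hand-written pattern; the kernel, its name, and the scaling computation are made up for the example, not taken from the slides:

  /* The boilerplate the slide refers to: move data to the device,
   * launch a kernel, wait for it, move the result back. */
  #include <cuda_runtime.h>

  __global__ void scale(float *x, int n)
  {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) x[i] *= 2.0f;
  }

  void scale_on_device(float *x, int n)
  {
      float *d_x;
      size_t bytes = n * sizeof(float);

      cudaMalloc(&d_x, bytes);                            /* allocate device memory */
      cudaMemcpy(d_x, x, bytes, cudaMemcpyHostToDevice);  /* move data to device */

      scale<<<(n + 255) / 256, 256>>>(d_x, n);            /* launch kernel */
      cudaDeviceSynchronize();                            /* wait until done */

      cudaMemcpy(x, d_x, bytes, cudaMemcpyDeviceToHost);  /* move result back */
      cudaFree(d_x);
  }

  Every target (GPU, DSP, ...) needs its own version of this glue code, which is exactly the portability and maintenance problem listed above.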

  9. Programming Models? Today's Scenario
  // Run one OpenMP thread per device per MPI node
  #pragma omp parallel num_threads(devCount)
  if (initDevice()) {
    // Block and grid dimensions
    dim3 dimBlock(12,12);
    kernel<<<1,dimBlock>>>();
    cudaThreadExit();
  } else {
    printf("Device error on %s\n", processor_name);
  }
  MPI_Finalize();
  return 0;
  }
  www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf
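  The fragment above mixes three models: MPI across nodes, OpenMP across the devices within a node, CUDA on each device. A hedged, self-contained sketch of the same pattern follows; the empty kernel and the cudaSetDevice-based device selection are placeholders, not taken from the cited slide:

  /* One MPI rank per node, one OpenMP thread per GPU, one CUDA kernel
   * launch per thread. Build with nvcc plus an MPI compiler wrapper. */
  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>
  #include <cuda_runtime.h>

  __global__ void kernel(void) { /* device work would go here */ }

  int main(int argc, char *argv[])
  {
      MPI_Init(&argc, &argv);

      char processor_name[MPI_MAX_PROCESSOR_NAME];
      int name_len;
      MPI_Get_processor_name(processor_name, &name_len);

      int devCount = 0;
      cudaGetDeviceCount(&devCount);

      if (devCount > 0) {
          /* One OpenMP thread per device on this node. */
          #pragma omp parallel num_threads(devCount)
          {
              if (cudaSetDevice(omp_get_thread_num()) == cudaSuccess) {
                  dim3 dimBlock(12, 12);
                  kernel<<<1, dimBlock>>>();
                  cudaDeviceSynchronize();
              } else {
                  printf("Device error on %s\n", processor_name);
              }
          }
      }

      MPI_Finalize();
      return 0;
  }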

  10. OpenMP in the Heterogeneous World • All threads are equal • No vocabulary for heterogeneity or a separate device • All threads must have access to the memory • Distributed memories are common in embedded systems • Memories may not be coherent • Implementations rely on the OS and threading libraries (e.g. Linux, Pthreads) for memory allocation and synchronization

  11. Extending OpenMP Example
  #pragma omp parallel for target(dsp)
  for (j=0; j<m; j++)
    for (i=0; i<n; i++)
      c(i,j) = a(i,j) + b(i,j)
  [Diagram: application data is uploaded from main memory to the hardware accelerator (HWA), the general-purpose processor cores make a remote procedure call to the device cores, and the remote data is downloaded back when the loop finishes.]
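  A hedged sketch of how the proposed (non-standard) target clause might appear in a complete routine; the function name, the array layout, and the PROPOSED_TARGET_EXT guard are illustrative, and no existing compiler is assumed to accept the extension:

  /* Matrix addition offloaded to a DSP with the proposed extension.
   * Without PROPOSED_TARGET_EXT defined this is an ordinary host loop. */
  void matrix_add(int n, int m, float c[n][m],
                  const float a[n][m], const float b[n][m])
  {
      /* Conceptually: the runtime uploads a and b to the accelerator's
       * memory, runs the loop nest on the device cores, and downloads c. */
  #ifdef PROPOSED_TARGET_EXT
      #pragma omp parallel for target(dsp)
  #else
      #pragma omp parallel for
  #endif
      for (int j = 0; j < m; j++)
          for (int i = 0; i < n; i++)
              c[i][j] = a[i][j] + b[i][j];
  }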

  12. Heterogeneous OpenMP Solution Stack
  [Diagram: the OpenMP Parallel Computing Solution Stack.]
  • User layer: OpenMP application
  • Prog. layer (OpenMP API): directives and compiler, OpenMP library, environment variables (language extensions, efficient code generation)
  • System layer: runtime library (Target Portable Runtime Interface), OS/system support for shared memory (MCAPI, MRAPI, MTAPI, …)
  • Hardware: Core 1, Core 2, …, Core n

  13. Summarizing My Research • OpenMP on heterogeneous architectures • Expressing heterogeneity • Generating efficient code for GPUs/DSPs • Managing memories • Distributed • Explicitly managed • Enabling portable implementations

  14. Backup

  15. MCA: Generic Multicore Programming (www.multicore-association.org) • Solve portability issue in embedded multicore programming • Defining and promoting open specifications for • Communication - MCAPI • Resource Management - MRAPI • Task Management - MTAPI

  16. Heterogeneous Platform: CPU + Nvidia GPU
