
MPI – An introduction by Jeroen van Hunen


Presentation Transcript


  1. MPI – An introduction by Jeroen van Hunen
     • What is MPI and why should we use it?
     • Simple example + some basic MPI functions
     • Other frequently used MPI functions
     • Compiling and running code with MPI
     • Domain decomposition
     • Stokes solver
     • Tracers/markers
     • Performance
     • Documentation

  2. What is MPI?
     • Mainly a data communication tool: the "Message-Passing Interface"
     • Allows parallel calculation on distributed-memory machines
     • Usually the Single-Program-Multiple-Data (SPMD) principle is used:
       all processors perform similar tasks (e.g. in domain decomposition)
     • Alternative: OpenMP, for shared-memory machines
     Why should we use MPI?
     • If sequential calculations take too long
     • If sequential calculations use too much memory

  3. Simple MPI example
     The example program has the basic structure of every MPI code (a minimal sketch follows below):
     • the MPI header file contains definitions, macros and function prototypes
     • initialize MPI
     • ask for the processor 'rank'
     • ask for the number of processors p
     • stop MPI
     Output for 4 processors: one line printed per rank.
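     A minimal sketch of such a program in C (the variable names and the printed message are illustrative, not necessarily the slide's exact code):

         #include <stdio.h>
         #include <mpi.h>   /* definitions, macros, function prototypes */

         int main(int argc, char *argv[])
         {
             int rank, p;

             MPI_Init(&argc, &argv);                 /* initialize MPI */
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* ask for the processor 'rank' */
             MPI_Comm_size(MPI_COMM_WORLD, &p);      /* ask for the # of processors p */

             printf("Hello from processor %d of %d\n", rank, p);

             MPI_Finalize();                         /* stop MPI */
             return 0;
         }

     Started with "mpirun -np 4 binary", each of the 4 ranks prints one line, in no guaranteed order.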

  4. MPI calls for sending/receiving data

  5. MPI_SEND and MPI_RECV syntax
     in C:
         int MPI_Send(void *buf, int count, MPI_Datatype datatype,
                      int dest, int tag, MPI_Comm comm);
         int MPI_Recv(void *buf, int count, MPI_Datatype datatype,
                      int source, int tag, MPI_Comm comm, MPI_Status *status);
     in Fortran:
         MPI_SEND(BUF, COUNT, DATATYPE, DEST, TAG, COMM, IERROR)
         MPI_RECV(BUF, COUNT, DATATYPE, SOURCE, TAG, COMM, STATUS, IERROR)
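     A minimal usage sketch in C: rank 0 sends one integer to rank 1 (the value and variable names are illustrative):

         #include <stdio.h>
         #include <mpi.h>

         int main(int argc, char *argv[])
         {
             int rank, value;
             MPI_Status status;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);

             if (rank == 0) {
                 value = 42;
                 /* dest = 1, tag = 0 */
                 MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
             } else if (rank == 1) {
                 /* source = 0, tag = 0 */
                 MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
                 printf("Rank 1 received %d\n", value);
             }

             MPI_Finalize();
             return 0;
         }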

  6. MPI data types
     The most common predefined MPI data types and the language types they correspond to:
     in C:
         MPI_CHAR     (char)
         MPI_INT      (int)
         MPI_LONG     (long)
         MPI_FLOAT    (float)
         MPI_DOUBLE   (double)
     in Fortran:
         MPI_CHARACTER          (CHARACTER)
         MPI_INTEGER            (INTEGER)
         MPI_REAL               (REAL)
         MPI_DOUBLE_PRECISION   (DOUBLE PRECISION)
         MPI_LOGICAL            (LOGICAL)

  7. Other frequently used MPI calls
     Sending and receiving at the same time, with no risk of deadlocks:
         MPI_SENDRECV
     … or overwriting the send buffer with the received data:
         MPI_SENDRECV_REPLACE
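     A minimal halo-swap sketch in C, assuming exactly two ranks that exchange one double with MPI_Sendrecv (names and values are illustrative):

         #include <stdio.h>
         #include <mpi.h>

         int main(int argc, char *argv[])
         {
             int rank, partner;
             double mine, theirs;
             MPI_Status status;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);

             partner = 1 - rank;        /* assume exactly 2 ranks */
             mine = 100.0 + rank;

             /* send 'mine' and receive 'theirs' in one call: no deadlock risk */
             MPI_Sendrecv(&mine,   1, MPI_DOUBLE, partner, 0,
                          &theirs, 1, MPI_DOUBLE, partner, 0,
                          MPI_COMM_WORLD, &status);

             printf("Rank %d received %f\n", rank, theirs);
             MPI_Finalize();
             return 0;
         }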

  8. Other frequently used MPI calls
     Synchronizing the processors, i.e. letting them wait for each other at the barrier:
         MPI_BARRIER
     Broadcasting a message from one processor to all the others:
         MPI_BCAST
     Note that both the sending and the receiving processors use the same call to MPI_BCAST.
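     A short sketch of both calls in C (the value of nsteps is illustrative):

         #include <stdio.h>
         #include <mpi.h>

         int main(int argc, char *argv[])
         {
             int rank, nsteps = 0;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);

             if (rank == 0) nsteps = 1000;    /* only the root knows the value */
             /* sender and receivers all make the same call (root = 0): */
             MPI_Bcast(&nsteps, 1, MPI_INT, 0, MPI_COMM_WORLD);

             MPI_Barrier(MPI_COMM_WORLD);     /* wait for each other here */
             printf("Rank %d: nsteps = %d\n", rank, nsteps);

             MPI_Finalize();
             return 0;
         }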

  9. Other frequently used MPI calls
     "Reducing" (combining) data from all processors: add, find the maximum/minimum, etc.:
         MPI_REDUCE
     OP can be one of the following:
         MPI_SUM, MPI_PROD, MPI_MAX, MPI_MIN,
         MPI_MAXLOC, MPI_MINLOC,
         MPI_LAND, MPI_LOR, MPI_LXOR,
         MPI_BAND, MPI_BOR, MPI_BXOR
     For the results to be available at all processors, use MPI_ALLREDUCE.
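     A sketch in C of a global sum with MPI_Allreduce (each rank contributes an illustrative local value):

         #include <stdio.h>
         #include <mpi.h>

         int main(int argc, char *argv[])
         {
             int rank;
             double local, global;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);

             local = (double)rank;    /* each rank's own contribution */

             /* global sum over all ranks; result available on every rank */
             MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

             printf("Rank %d: global sum = %f\n", rank, global);
             MPI_Finalize();
             return 0;
         }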

  10. Additional comments:
     • 'wildcards' are allowed in MPI calls for:
       • the source: MPI_ANY_SOURCE
       • the tag: MPI_ANY_TAG
     • MPI_SEND and MPI_RECV are 'blocking': they wait until the job is done
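     A sketch in C of a wildcard receive: rank 0 collects one message from every other rank in whatever order they arrive (the payload is illustrative):

         #include <stdio.h>
         #include <mpi.h>

         int main(int argc, char *argv[])
         {
             int rank, size, value, i;
             MPI_Status status;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);

             if (rank != 0) {
                 value = rank * rank;
                 MPI_Send(&value, 1, MPI_INT, 0, rank, MPI_COMM_WORLD);
             } else {
                 for (i = 1; i < size; i++) {
                     /* accept a message from any source, with any tag */
                     MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, MPI_ANY_TAG,
                              MPI_COMM_WORLD, &status);
                     /* the actual source and tag are recorded in the status */
                     printf("Got %d from rank %d (tag %d)\n",
                            value, status.MPI_SOURCE, status.MPI_TAG);
                 }
             }

             MPI_Finalize();
             return 0;
         }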

  11. Deadlocks:
     Non-matching send/receive calls may block the code. Three scenarios:
     • Deadlock: both processors wait forever for the other's matching call
     • Depending on buffer: both processors send first; this works only as long as the messages fit in MPI's internal buffers
     • Safe: the send/receive order on the two processors is properly matched (see the sketch below)
     Don't let a processor send a message to itself; in this case use MPI_SENDRECV.
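     A sketch in C of the safe pattern for two ranks exchanging large buffers, with the unsafe variant left as a comment (buffer size and names are illustrative):

         #include <stdio.h>
         #include <mpi.h>

         #define N 100000

         int main(int argc, char *argv[])
         {
             int rank, other;
             static double sbuf[N], rbuf[N];
             MPI_Status status;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             other = 1 - rank;   /* assume exactly 2 ranks */

             /* UNSAFE: both ranks send first. For large messages neither
                MPI_Send can complete before the matching MPI_Recv is posted,
                so the code deadlocks (or works only "depending on buffer"):

                MPI_Send(sbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
                MPI_Recv(rbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
             */

             /* SAFE: mirror the call order on the two ranks */
             if (rank == 0) {
                 MPI_Send(sbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
                 MPI_Recv(rbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
             } else {
                 MPI_Recv(rbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD, &status);
                 MPI_Send(sbuf, N, MPI_DOUBLE, other, 0, MPI_COMM_WORLD);
             }

             printf("Rank %d: exchange done\n", rank);
             MPI_Finalize();
             return 0;
         }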

  12. Compiling and running code with MPI
     Compiling:
     • Fortran:
         mpif77 -o binary code.f
         mpif90 -o binary code.f
     • C:
         mpicc -o binary code.c
     Running in general, without a queueing system:
         mpirun -np 4 binary
         mpirun -np 4 -nolocal -machinefile mach binary
     Running on Gonzales, with a queueing system:
         bsub -n 4 -W 8:00 prun binary

  13. Domain decomposition
     (Figure: the computational domain cut into blocks along the x, y and z directions.)
     • The total computational domain is divided into blocks of 'equal size'
     • Each processor deals only with its own block
     • At block boundaries some information exchange is necessary
     • How the domain is divided into blocks matters:
       • surface/volume ratio of the blocks
       • number of processor boundaries

  14. Stokes equation: Jacobi iterative solver
     Jacobi update for the five-point stencil, averaging the neighbours North, South, East and West:
         M = 0.25 * (N + S + E + W)
     In the block interior: no MPI needed. At a block boundary: MPI needed, because the boundary point is stored on both processors (as M1 and M2) and each processor holds only part of the stencil:
         M1 = 0.25 * (N1 + S1 + W)   (partial sum on the left processor)
         M2 = 0.25 * (E)             (remaining contribution on the right processor)
         M  = M1 + M2                (combined using MPI_SENDRECV)
     after which both copies are set to the full value: M1 = M2 = M.
     A Gauss-Seidel solver performs better, but is also slightly more difficult to implement.
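     A sketch in C of the same idea for a simpler 1-D Laplace problem on a 1-D block decomposition: ghost points at the block boundaries are refreshed with MPI_Sendrecv, after which the Jacobi update in the block interior needs no MPI (sizes, initial values and iteration count are illustrative):

         #include <stdio.h>
         #include <mpi.h>

         #define NLOC 100   /* interior points per processor */

         int main(int argc, char *argv[])
         {
             int rank, size, left, right, i, it;
             double u[NLOC + 2], unew[NLOC + 2];   /* +2 ghost points */
             MPI_Status status;

             MPI_Init(&argc, &argv);
             MPI_Comm_rank(MPI_COMM_WORLD, &rank);
             MPI_Comm_size(MPI_COMM_WORLD, &size);
             left  = (rank == 0)        ? MPI_PROC_NULL : rank - 1;
             right = (rank == size - 1) ? MPI_PROC_NULL : rank + 1;

             for (i = 0; i < NLOC + 2; i++) u[i] = (double)rank;

             for (it = 0; it < 100; it++) {
                 /* refresh ghost points: MPI only at the block boundaries */
                 MPI_Sendrecv(&u[NLOC], 1, MPI_DOUBLE, right, 0,
                              &u[0],    1, MPI_DOUBLE, left,  0,
                              MPI_COMM_WORLD, &status);
                 MPI_Sendrecv(&u[1],        1, MPI_DOUBLE, left,  1,
                              &u[NLOC + 1], 1, MPI_DOUBLE, right, 1,
                              MPI_COMM_WORLD, &status);

                 /* Jacobi update in the block interior: no MPI needed */
                 for (i = 1; i <= NLOC; i++)
                     unew[i] = 0.5 * (u[i - 1] + u[i + 1]);
                 for (i = 1; i <= NLOC; i++)
                     u[i] = unew[i];
             }

             if (rank == 0) printf("Jacobi sweeps done\n");
             MPI_Finalize();
             return 0;
         }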

  15. Tracers/Markers
     2nd-order Runge-Kutta scheme (sketched in code below):
         k1 = dt v(t, x(t))
         k2 = dt v(t + dt/2, x(t) + k1/2)
         x(t + dt) = x(t) + k2
     Procedure:
     • Calculate the midpoint position x(t) + k1/2
     • If it lies in proc n+1's block:
       • proc n sends the tracer coordinates to proc n+1
       • proc n+1 reports the tracer velocity back to proc n
     • Calculate x(t + dt)
     • If it lies in proc n+1's block:
       • proc n sends the tracer coordinates + function values permanently to proc n+1
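     A sketch in C of the RK2 update for a single tracer, with a placeholder velocity field v(t, x); the cross-processor handoff described above is omitted, and all values are illustrative:

         #include <stdio.h>

         /* placeholder velocity field, purely illustrative */
         static double v(double t, double x) { return 1.0 + 0.1 * x; }

         int main(void)
         {
             double t = 0.0, dt = 0.01, x = 0.5;
             int step;

             for (step = 0; step < 100; step++) {
                 double k1 = dt * v(t, x);                       /* k1 = dt v(t, x(t)) */
                 double k2 = dt * v(t + 0.5 * dt, x + 0.5 * k1); /* k2 = dt v(t+dt/2, x+k1/2) */
                 x += k2;                                        /* x(t+dt) = x(t) + k2 */
                 t += dt;
             }
             printf("tracer position after 100 steps: %f\n", x);
             return 0;
         }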

  16. Performance
     • For jobs that are too small, communication quickly becomes the bottleneck.
     • This test problem:
       • Rayleigh-Bénard convection (Ra = 10^6)
       • 2-D: 64x64 finite elements, 10^4 steps
       • 3-D: 64x64x64 finite elements, 100 steps
       • Calculations run on Gonzales

  17. Documentation
     Books:
     PDF: www.hpc.unimelb.edu.au/software/mpi-docs/mpi-book.pdf
