What is MPI?

  1. What is MPI?
     • A standard defined for a “Message Passing Interface” between parallel processors (CPUs)
     • Communications interface to Fortran, C or C++ (maybe others)
     • Definitions apply across different platforms (can mix Unix, Mac, etc.)
     • Parallelization of code is explicit: recognized and defined by users
     • Memory can be
        • Shared between CPUs
        • Distributed among CPUs, OR
        • A hybrid of these
     • The number of CPUs allowed is not pre-defined, but is fixed in any one application
     • The required number of CPUs is defined by the user at job startup and does not undergo runtime optimization.
     B. Meadows, Dalitz Mixing, Oct 24th, 2007

  2. How Efficient is MPI?
     • The best you can do is to speed up a job by a factor equal to the number of CPUs involved.
     • Factors limiting this:
        • Poor synchronization between CPUs due to unbalanced loads
        • Sections of code that cannot be vectorized (i.e. distributed among the CPUs)
        • Signalling (communication) delays

  3. Ways to Implement MPI in ML Fitting
     Two main alternatives:
     A. Vectorize FCN, which evaluates $f(x) = -2\sum_{i} \ln W_i(x)$
     B. Vectorize MINUIT
     • Alternative A has been used in previous BaBar analyses
        • E.g. the mixing analysis of D0 → K+π−
     • Alternative B is reported here (done by DYAEB and tested by BTM)
     • An advantage of B over A is that the vectorization is implemented outside the user’s code.
     • Vectorizing FCN may not be efficient if an integral is computed on each call, unless the integral evaluation is also vectorized.

  4. Vectorize FCN
     • The log-likelihood always includes a sum: $f(x) = -2\sum_{i=1}^{n} \ln W_i(x)$, where n = number of events or bins.
     • Vectorize the computation of the sum in 2 steps (“Scatter-Gather”):
        • Scatter: divide up the events (or bins) among the N CPUs. Each CPU k computes the partial sum over its own subset $S_k$: $f_k = -2\sum_{i \in S_k} \ln W_i(x)$
        • Gather: re-combine the results of the N CPUs: $f(x) = \sum_{k=1}^{N} f_k$

  5. Vectorize FCN
     • Computation of the integral (e.g. the normalization of the PDF over the Dalitz plot) also needs to be vectorized.
     • Since it is also usually a sum (over bins), this can be done in the same way.

  6. Vectorize MINUIT
     • Several algorithms in MINUIT:
        • MIGRAD (variable metric algorithm): finds the local minimum and the error matrix at that point
        • SIMPLEX (Nelder-Mead method): a direct-search method that uses no derivatives
        • SEEK (MC method): random search, virtually obsolete
     • MIGRAD is the most often used, so we focus on that
     • It is easily vectorized, but the gain may fall well short of the ideal factor

  7. Vectorize MIGRAD
     • WARNING: this is not very efficient when the number of parameters is comparable to the number of CPUs:
       $\text{Gain} \approx \dfrac{2\,\mathrm{NPAR} + 4}{2\,\mathrm{NPAR}/\mathrm{NCPU} + 4}$

  8. One iteration in MIGRAD
     • Compute the function f and its gradient g at the current position x
     • Use the current curvature metric V (the estimate of the inverse second-derivative matrix) to compute the step: $\Delta x = -V g$
     • Take the (large) step: $x \to x + \Delta x$
     • Compute the function and gradient there, then (cubic) interpolate back to the local minimum along the step (may need to iterate)
     • If satisfactory, improve the curvature metric

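  9. One iteration in MIGRAD
     • Most of the time is spent computing the gradient
     • Numerical evaluation of the gradient requires 2 FCN calls per parameter: $g_i = \partial f/\partial x_i \approx \dfrac{f(x + h_i e_i) - f(x - h_i e_i)}{2h_i}$, where $e_i$ is the unit vector along parameter $x_i$ and $h_i$ is its step size
     • Vectorize this computation in two steps (“Scatter-Gather”):
        • Scatter: divide up the parameters ($x_i$) among the CPUs. Each CPU computes $g_i$ for its own subset of parameters.
        • Gather: re-combine the results of the N CPUs into the full gradient.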

  10. Running MPI
     [Diagram: CPU 0, CPU 1, CPU 2, … shown across the stages “Start”, “Scatter” and “Gather”, with a “Wait” state for the other CPUs]

  11. Running MPI
     • Run the program with: mpirun -np N <job>
       This submits identical jobs to N CPUs (you can also specify the hosts, e.g. by IP address)
     • Each CPU must have all the data it needs to compute f(x)
     • So you need to structure each job so that it can run the parts you want it to do:
        • Any set-up (read in events, etc.)
        • The parts that are vectorized
        • BUT only the parts you want:
           • Make it wait for a signal when the vectorized part is done.

  12. Initialization of MPI

```fortran
      Program FIT_Kpipi
C
C-    Maximum likelihood fit of D -> Kpipi Dalitz plot.
C
      Implicit none
      include 'mpif.h'
      Integer MPIerr, MPIrank, MPIprocs, MPIflag
      Save
      external fcn
C
      MPIerr   = 0
      MPIrank  = 0
      MPIprocs = 1
      MPIflag  = 1
C     Initialize MPI
      call MPI_INIT(MPIerr)
C     Which one am I?
      call MPI_COMM_RANK(MPI_COMM_WORLD, MPIrank, MPIerr)
C     Get number of CPUs
      call MPI_COMM_SIZE(MPI_COMM_WORLD, MPIprocs, MPIerr)
C     … call MINUIT, etc.
      call MPI_FINALIZE(MPIerr)
      End
```

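  13. Use of Scatter-Gather Mechanism in MNDERI (Fortran)

```fortran
C     Distribute the parameters from proc 0 to everyone
   33 call MPI_BCAST(X, NPAR+1, MPI_DOUBLE_PRECISION, 0,
     A               MPI_COMM_WORLD, MPIerr)
C     …
C     Use scatter-gather mechanism to compute subset of derivatives
C     in each process:
      nperproc = (NPAR-1)/MPIprocs + 1
      iproc1   = 1 + nperproc*MPIrank
      iproc2   = MIN(NPAR, iproc1+nperproc-1)
C     GRD must be dimensioned to hold at least nperproc*MPIprocs values.
C     Proc 0 passes MPI_IN_PLACE because MPI does not allow its send
C     and receive buffers to overlap.
      If (MPIrank.eq.0) then
         call MPI_SCATTER(GRD, nperproc, MPI_DOUBLE_PRECISION,
     A        MPI_IN_PLACE, nperproc, MPI_DOUBLE_PRECISION,
     A        0, MPI_COMM_WORLD, MPIerr)
      Else
         call MPI_SCATTER(GRD, nperproc, MPI_DOUBLE_PRECISION,
     A        GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A        0, MPI_COMM_WORLD, MPIerr)
      End If
C
C     Loop over variable parameters
      DO 60 i = iproc1, iproc2
C        … compute G(I)
   60 CONTINUE
C
C     Wait until everyone is done:
      If (MPIrank.eq.0) then
         call MPI_GATHER(MPI_IN_PLACE, nperproc, MPI_DOUBLE_PRECISION,
     A        GRD, nperproc, MPI_DOUBLE_PRECISION,
     A        0, MPI_COMM_WORLD, MPIerr)
      Else
         call MPI_GATHER(GRD(iproc1), nperproc, MPI_DOUBLE_PRECISION,
     A        GRD, nperproc, MPI_DOUBLE_PRECISION,
     A        0, MPI_COMM_WORLD, MPIerr)
      End If
C     Everyone but proc 0 goes back to await the next set of parameters
      If (MPIrank.ne.0) GO TO 33
C
C     … Continue computation (CPU 0 only)
```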
