
Introduction to Parallel Computing and MPI

Learn about parallel computing and how it can be useful, including different parallel paradigms, how to parallelize problems, and an overview of the Message Passing Interface (MPI) standard.

Presentation Transcript


  1. FLASH Tutorial, May 13, 2004: Parallel Computing and MPI

  2. What is Parallel Computing? And why is it useful
  • Parallel computing is more than one CPU working together on one problem
  • It is useful when
    • The problem is large and would take very long on one processor
    • The data are too big to fit in the memory of one processor
  • When to parallelize
    • When the problem can be subdivided into relatively independent tasks
  • How much to parallelize
    • As long as the speedup relative to a single processor stays of the order of the number of processors (see the note below)
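To make that last rule of thumb concrete (standard definitions, not from the original slide): if T(1) is the run time on one processor and T(p) the run time on p processors, the speedup is S(p) = T(1)/T(p) and the parallel efficiency is E(p) = S(p)/p. Adding more processors is worthwhile roughly as long as E(p) stays close to 1.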

  3. Parallel paradigms
  • SIMD – Single Instruction Multiple Data
    • Processors work in lock-step
  • MIMD – Multiple Instruction Multiple Data
    • Processors do their own thing, with occasional synchronization
  • Shared memory
    • One-sided communication
  • Distributed memory
    • Message passing
  • Loosely coupled
    • The process on each CPU is fairly self-contained and relatively independent of processes on other CPUs
  • Tightly coupled
    • CPUs need to communicate with each other frequently

  4. How to Parallelize
  • Divide the problem into a set of mostly independent tasks
    • Partitioning the problem
  • Tasks get their own data
    • Localize each task
    • Tasks operate on their own data for the most part
    • Try to make each task self-contained
  • Occasionally
    • Data may be needed from other tasks
      • Inter-process communication
    • Synchronization may be required between tasks
      • Global operations
  • Map tasks to different processors
    • One processor may get more than one task
    • Task distribution should be well balanced (see the partitioning sketch after this list)
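As an illustration of the partitioning step, here is a minimal sketch (not from the slides) that splits the n cells of a 1D domain as evenly as possible across nproc processes, so no process owns more than one cell more than any other; all names are chosen for the example.

```c
#include <stdio.h>

/* Assign cells [*start, *start + *count) of a 1D domain of n cells
   to process `rank` out of `nproc`, spreading the remainder so that
   no process owns more than one extra cell (balanced distribution). */
static void decompose_1d(int n, int nproc, int rank, int *start, int *count)
{
    int base = n / nproc;        /* cells every process gets             */
    int rem  = n % nproc;        /* first `rem` ranks get one extra cell */
    *count = base + (rank < rem ? 1 : 0);
    *start = rank * base + (rank < rem ? rank : rem);
}

int main(void)
{
    int n = 1000;                /* hypothetical global problem size */
    for (int rank = 0; rank < 4; rank++) {
        int start, count;
        decompose_1d(n, 4, rank, &start, &count);
        printf("rank %d owns cells [%d, %d)\n", rank, start, start + count);
    }
    return 0;
}
```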

  5. New Code Components
  • Initialization
  • Query parallel state
    • Identify this process
    • Identify the number of processes
  • Exchange data between processes
    • Local, global
  • Synchronization
    • Barriers, blocking communication, locks
  • Finalization (a minimal MPI skeleton covering these steps follows this list)
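A minimal sketch of these components using MPI's C bindings (FLASH itself is Fortran, so this is illustrative only):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);                   /* initialization           */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);     /* identify this process    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);     /* identify number of procs */

    printf("Hello from process %d of %d\n", rank, size);

    MPI_Barrier(MPI_COMM_WORLD);              /* synchronization point    */
    MPI_Finalize();                           /* finalization             */
    return 0;
}
```

Such a program is typically compiled with an MPI compiler wrapper (e.g. mpicc) and launched with mpirun or mpiexec.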

  6. MPI
  • Message Passing Interface, the standard for the distributed-memory model of parallelism
  • MPI-2 adds support for one-sided communication, commonly associated with shared-memory operations
  • Works with communicators: a communicator is a collection of processes
    • MPI_COMM_WORLD is the default communicator containing all processes
  • Has support for the lowest-level communication operations as well as composite operations
  • Has blocking and non-blocking operations

  7. Communicators (figure: two communicators, COMM1 and COMM2, each grouping a subset of the processes)
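A hedged sketch of how sub-communicators like COMM1 and COMM2 might be created with MPI_Comm_split; the even/odd split is invented for the example.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int world_rank, world_size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    /* Split MPI_COMM_WORLD into two sub-communicators by rank parity.
       Processes with the same `color` end up in the same communicator. */
    int color = world_rank % 2;              /* 0 -> first group, 1 -> second */
    MPI_Comm subcomm;
    MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &subcomm);

    int sub_rank, sub_size;
    MPI_Comm_rank(subcomm, &sub_rank);
    MPI_Comm_size(subcomm, &sub_size);
    printf("world rank %d -> group %d, rank %d of %d\n",
           world_rank, color, sub_rank, sub_size);

    MPI_Comm_free(&subcomm);
    MPI_Finalize();
    return 0;
}
```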

  8. Low-level Operations in MPI
  • MPI_Init
  • MPI_Comm_size
    • Find the number of processes
  • MPI_Comm_rank
    • Find my process number (rank)
  • MPI_Send / MPI_Recv
    • Communicate with one other process at a time (see the sketch after this list)
  • MPI_Bcast
    • Global data transmission
  • MPI_Barrier
    • Synchronization
  • MPI_Finalize
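A short sketch putting these calls together: rank 0 sends a value to rank 1, then broadcasts a (hypothetical) time step to everyone. The point-to-point part assumes at least two processes.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Point to point: rank 0 sends one integer to rank 1 (needs >= 2 ranks). */
    if (size >= 2) {
        int msg;
        if (rank == 0) {
            msg = 42;
            MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("rank 1 received %d from rank 0\n", msg);
        }
    }

    /* One to all: rank 0 broadcasts a value to every process. */
    double dt = (rank == 0) ? 0.001 : 0.0;    /* hypothetical time step */
    MPI_Bcast(&dt, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    MPI_Barrier(MPI_COMM_WORLD);              /* everyone waits here */
    MPI_Finalize();
    return 0;
}
```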

  9. Advanced Constructs in MPI
  • Composite operations (see the sketch after this list)
    • Gather / Scatter
    • Allreduce
    • Alltoall
  • Cartesian grid operations
    • Shift
  • Communicators
    • Creating subgroups of processes to operate on
  • User-defined datatypes
  • I/O
    • Parallel file operations
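A minimal sketch of two of the composite operations, MPI_Scatter and MPI_Allreduce; the values are invented for the example.

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Scatter: rank 0 hands one value to every process. */
    double *all = NULL;
    if (rank == 0) {
        all = malloc(size * sizeof(double));
        for (int i = 0; i < size; i++) all[i] = (double)(i + 1);
    }
    double mine;
    MPI_Scatter(all, 1, MPI_DOUBLE, &mine, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Allreduce: every process gets the global sum of the local values. */
    double global_sum;
    MPI_Allreduce(&mine, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: local %.1f, global sum %.1f\n", rank, mine, global_sum);

    free(all);                                /* free(NULL) is a no-op */
    MPI_Finalize();
    return 0;
}
```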

  10. Communication Patterns (figure: point-to-point, one-to-all broadcast, all-to-all, shift, and other collective patterns among processes 0–3)

  11. Communication Overheads
  • Latency vs. bandwidth
  • Blocking vs. non-blocking
    • Overlap of communication with computation (see the non-blocking sketch after this list)
    • Buffering and copying
  • Scale of communication
    • Nearest neighbor
    • Short range
    • Long range
  • Volume of data
  • Resource contention for links
  • Efficiency
    • Hardware, software, communication method
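A brief sketch of non-blocking communication that allows overlap: each process exchanges a value with its ring neighbors using MPI_Isend/MPI_Irecv and only waits when it actually needs the data. The ring pattern and values are invented for the example.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Exchange one value with the neighboring ranks in a ring, without blocking. */
    int right = (rank + 1) % size;
    int left  = (rank - 1 + size) % size;
    double sendbuf = (double)rank, recvbuf = -1.0;
    MPI_Request reqs[2];

    MPI_Irecv(&recvbuf, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&sendbuf, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... do local computation here while the messages are in flight ... */

    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE); /* complete both transfers */
    printf("rank %d received %.1f from rank %d\n", rank, recvbuf, left);

    MPI_Finalize();
    return 0;
}
```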

  12. Parallelism in FLASH
  • Short-range communications
    • Nearest neighbor
  • Long-range communications
    • Regridding
  • Other global operations
    • All-reduce operations on physical quantities
  • Specific to solvers
    • Multipole method
    • FFT-based solvers

  13. Domain Decomposition (figure: the computational domain divided among processors P0–P3)

  14. Border Cells / Ghost Points
  • When solnData is split up, each processor needs some data that lives on other processors
  • Each processor keeps a layer of border (ghost) cells copied from its neighbors
  • These ghost cells need to be updated every time step

  15. Border/Ghost Cells (figure: short-range communication filling each processor's ghost cells from its neighbors)

  16. Two MPI Methods for Doing It
  • Using a Cartesian topology (sketched after this list)
    • MPI_Cart_create – create the topology
    • MPE_Decomp1d – domain decomposition on the topology
    • MPI_Cart_shift – who's on the left/right?
    • MPI_Sendrecv – fill ghost cells from the left
    • MPI_Sendrecv – fill ghost cells from the right
  • By hand
    • MPI_Comm_rank, MPI_Comm_size
    • Manually decompose the grid over processors
    • Calculate left/right neighbors
    • MPI_Send / MPI_Recv, ordered carefully to avoid deadlocks
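A sketch of the Cartesian-topology approach in C. MPE_Decomp1d (a helper from the MPE library) is replaced here by a fixed local size; everything else is standard MPI, with the array and sizes invented for the example.

```c
#include <mpi.h>
#include <stdio.h>

#define NLOCAL 8                 /* interior cells per process (illustrative) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Build a 1D periodic Cartesian topology over all processes. */
    int dims[1] = { size }, periods[1] = { 1 };
    MPI_Comm cart;
    MPI_Cart_create(MPI_COMM_WORLD, 1, dims, periods, 1, &cart);

    int left, right;
    MPI_Cart_shift(cart, 0, 1, &left, &right);   /* who's on the left/right? */

    /* Local array: u[0] and u[NLOCAL+1] are the ghost cells. */
    double u[NLOCAL + 2];
    for (int i = 1; i <= NLOCAL; i++) u[i] = (double)rank;

    /* Send rightmost interior cell right, receive left ghost from the left. */
    MPI_Sendrecv(&u[NLOCAL], 1, MPI_DOUBLE, right, 0,
                 &u[0],      1, MPI_DOUBLE, left,  0, cart, MPI_STATUS_IGNORE);
    /* Send leftmost interior cell left, receive right ghost from the right. */
    MPI_Sendrecv(&u[1],          1, MPI_DOUBLE, left,  1,
                 &u[NLOCAL + 1], 1, MPI_DOUBLE, right, 1, cart, MPI_STATUS_IGNORE);

    printf("rank %d: left ghost %.0f, right ghost %.0f\n",
           rank, u[0], u[NLOCAL + 1]);

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}
```

Using MPI_Sendrecv avoids the deadlock ordering that the by-hand MPI_Send/MPI_Recv version has to manage explicitly.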

  17. Adaptive Grid Issues
  • Discretization is not uniform
  • Simple left-right guard-cell fills are inadequate
  • Adjacent grid points may not be mapped to the nearest neighbors in the processor topology
  • Redistribution of work is necessary

  18. Regridding
  • The number of cells/blocks changes
  • Some processors get more work than others
    • Load imbalance
  • Redistribute data to even out the work across processors
    • Long-range communications
    • Large quantities of data moved

  19. Regridding

  20. Other Parallel Operations in FLASH
  • Global max/sum, etc. (Allreduce)
    • On physical quantities
    • In solvers
    • For performance monitoring
  • Alltoall
    • FFT-based solver on the uniform grid (UG)
  • User-defined datatypes and file operations
    • Parallel I/O (see the sketch after this list)
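FLASH's real I/O goes through dedicated I/O libraries; the sketch below is only meant to illustrate the idea of parallel file operations with raw MPI-IO, where each process writes its own block of a shared file at a rank-dependent offset. The file name and sizes are invented.

```c
#include <mpi.h>

#define NLOCAL 100               /* doubles owned by each process (illustrative) */

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double data[NLOCAL];
    for (int i = 0; i < NLOCAL; i++) data[i] = (double)rank;

    /* Every process opens the same file and writes its block at an
       offset determined by its rank, so the output is contiguous. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    MPI_Offset offset = (MPI_Offset)rank * NLOCAL * sizeof(double);
    MPI_File_write_at_all(fh, offset, data, NLOCAL, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```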
