
Distributed Parallel Processing – MPICH-VMI



Presentation Transcript


  1. Distributed Parallel Processing – MPICH-VMI Avneesh Pant

  2. VMI • What is VMI? • Virtual Machine Interface • High-performance communication middleware • Abstracts the underlying communication network • What is MPICH-VMI? • An MPI library based on MPICH 1.2 from Argonne that uses VMI for the underlying communication

  3. Features • Communication over heterogeneous networks • Infiniband, Myrinet, TCP, Shmem supported • Underlying networks selected at runtime • Enables cross-site jobs over compute grids • Optimized point-to-point communication • Higher-level MPICH protocols (eager and rendezvous) implemented over RDMA Put and Get primitives • RDMA emulated on networks without native RDMA support (TCP) • Extensive support for profiling • Profiling counters collect information about the communication pattern of the application • Profiling information logged to a database during MPI_Finalize • Profile Guided Optimization (PGO) framework uses the profile database to optimize subsequent executions of the application

  4. Features • Hiding point-to-point communication latency • RDMA get protocol very useful in overlapping communication and computation (see the sketch after this slide) • PGO infrastructure maps MPI processes to nodes to take advantage of the heterogeneity of the underlying network, effectively hiding latencies • Optimized collectives • RDMA-based collectives (e.g., MPI_Barrier) • Multicast-based collectives (e.g., MPI_Bcast has an experimental implementation using multicast) • Topology-aware collectives (currently MPI_Bcast, MPI_Reduce, MPI_Allreduce supported)
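The overlap that the RDMA get protocol enables still has to be expressed by the application through nonblocking MPI calls. The sketch below shows that generic pattern in standard MPI; it is illustrative application code, not MPICH-VMI internals, and the buffer size, message tag, and do_local_work() helper are hypothetical placeholders.

    /* Sketch: overlap a nonblocking ring exchange with local computation. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    #define N 1048576

    static void do_local_work(double *v, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            v[i] *= 2.0;                     /* stand-in for real computation */
    }

    int main(int argc, char **argv)
    {
        int rank, size, peer, i;
        double *sendbuf, *recvbuf, *local;
        MPI_Request reqs[2];
        MPI_Status  stats[2];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        sendbuf = malloc(N * sizeof(double));
        recvbuf = malloc(N * sizeof(double));
        local   = malloc(N * sizeof(double));
        for (i = 0; i < N; i++) { sendbuf[i] = rank; local[i] = i; }

        peer = (rank + 1) % size;            /* simple ring: send to the next rank */

        /* Post the transfers first ... */
        MPI_Irecv(recvbuf, N, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &reqs[0]);
        MPI_Isend(sendbuf, N, MPI_DOUBLE, peer, 0, MPI_COMM_WORLD, &reqs[1]);

        /* ... compute while the data is in flight ... */
        do_local_work(local, N);

        /* ... and only then wait for completion. */
        MPI_Waitall(2, reqs, stats);

        if (rank == 0)
            printf("exchange of %d doubles complete on %d ranks\n", N, size);

        free(sendbuf); free(recvbuf); free(local);
        MPI_Finalize();
        return 0;
    }

The later the wait is placed relative to the computation, the more of the transfer the RDMA get protocol can hide.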

  5. MPI on Teragrid • MPI flavors available on Teragrid • MPICH-GM • MPICH-G2 • MPICH-VMI 1 • Deprecated! Was part of CTSS v1 • MPICH-VMI 2 • Available as part of CTSS v2 and v3 • All are part of CTSS • Which one to use? • We are biased!

  6. MPI on Teragrid • What each MPI is designed for • MPICH-GM -> Single-site runs using Myrinet • MPICH-G2 -> Running across Globus grids • MPICH-VMI2 -> Scale out seamlessly from a single site to across the grid • Currently you need to keep two separate executables • One for single-site runs using MPICH-GM and one for grid jobs using MPICH-G2 • MPICH-VMI2 allows you to use the same executable with comparable or better performance

  7. MPI on Teragrid

  8. Using MPICH-VMI • Two flavors of MPICH-VMI2 on Teragrid • GCC compiled library • Intel compiled library • Recommended not to mix them together • CTSS defines keys for each compiled library • GCC: mpich-vmi-2.1.0-1-gcc-3-2 • Intel: mpich-vmi-2.1.0-1-intel-8.0

  9. Setting the Environment • To use MPICH VMI 2.1 • $ soft add +mpich-vmi-2.1.0-1-{gcc-3-2 | intel-8.0} • To preserve the VMI 2.1 environment across sessions, add • “+mpich-vmi-2.1.0-1-{gcc-3-2 | intel-8.0}” to the .soft file in your home directory • Intel 8.1 is also available at NCSA. Other sites do not have Intel 8.1 completely installed yet. • SoftEnv brings the compiler wrapper scripts into your environment • mpicc and mpiCC for C and C++ codes • mpif77 and mpif90 for F77 and F90 codes • Some underlying compilers, such as the GNU compiler suite, do not support F90. Use “mpif90 -show” to determine the underlying compiler being used.
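For sessions that should always have the GCC-built library, the entry in ~/.soft might look like the following. This is a minimal sketch assuming the usual SoftEnv layout, where ~/.soft lists keys one per line alongside an @default entry; check your site's guidance for the exact placement of the key.

    # ~/.soft (sketch)
    +mpich-vmi-2.1.0-1-gcc-3-2
    @default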

  10. Compiling with MPICH-VMI • The compiler scripts are wrappers that include all MPICH-VMI specific libraries and paths • All underlying compiler switches are supported and passed to the compiler • e.g., mpicc hello.c -o hello (a minimal example program follows) • The MPICH-VMI library is compiled with debug symbols by default.
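For reference, a minimal hello.c of the kind the wrapper builds with mpicc hello.c -o hello; this is a generic MPI hello-world shown only as an illustration.

    /* hello.c -- minimal MPI program; build with: mpicc hello.c -o hello */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
        printf("Hello from rank %d of %d\n", rank, size);
        MPI_Finalize();
        return 0;
    }

It can then be launched as shown on the following slides, e.g. mpirun -np 4 /path/to/hello.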

  11. Running with MPICH-VMI • The mpirun script is available for launching jobs • Supports all standard arguments in addition to MPICH-VMI specific arguments • mpirun uses ssh, rsh and MPD for launching jobs. The default is MPD. • Provides automatic selection/failover • If an MPD ring is not available, it falls back to ssh/rsh • Supports the standard way to run jobs • mpirun -np <# of procs> -machinefile <nodelist file> <executable> <arguments> (see the example machine file below) • The -machinefile argument is not needed when running within a PBS or LSF environment • The network to use can be selected at runtime by specifying • -specfile <network> • Supported networks are myrinet, tcp and xsite-myrinet-tcp • The default network on Teragrid is Myrinet • It is recommended to always specify the network explicitly using the -specfile switch
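The <nodelist file> passed to -machinefile is a plain-text list of host names, one per line; the host names below are hypothetical examples.

    tg-node001
    tg-node002
    tg-node003
    tg-node004

Within PBS, $PBS_NODEFILE (next slide) already points at such a file for the nodes allocated to the job.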

  12. Running with MPICH-VMI • MPICH-VMI 2.1 specific arguments fall into three broad categories • Parameters for runtime tuning • Parameters for launching GRID jobs • Parameters for controlling profiling of the job • The mpirun -help option lists all tunable parameters • All MPICH-VMI 2.1 specific parameters are optional for single-site runs; GRID jobs require some parameters to be set. • To run a simple job within a Teragrid cluster • mpirun -np 4 /path/to/hello • mpirun -np 4 -specfile myrinet /path/to/hello • Within PBS, $PBS_NODEFILE contains the path to the file listing the nodes allocated at runtime • mpirun -np <# procs> -machinefile $PBS_NODEFILE /path/to/hello • For cross-site jobs, additional arguments are required (discussed later)

  13. For Detecting/Reporting Errors • Verbosity switches • -v Verbose Level 1. Output VMI startup messages and make mpirun verbose. • -vv Verbose Level 2. Additionally output any warning messages. • -vvv Verbose Level 3. Additionally output any error messages. • -vvvv Verbose Level 10. Excessive debug output. Useful only for developers of MPICH-VMI and for submitting crash dumps.

  14. Running Inter-Site Jobs • An MPICH-VMI GRID job consists of one or more subjobs • A subjob is launched on each site using an individual mpirun command. The specfile selected should be one of the xsite network transports (xsite-mst-tcp or xsite-myrinet-tcp). • The higher-performance SAN (Infiniband or Myrinet) is used for intra-site communication. Cross-site communication uses TCP automatically. • In addition to the intra-site parameters, all inter-site runs must specify the same grid-specific parameters • A grid CRM must be available on the network to synchronize subjobs • The grid CRM on Teragrid is available at tg-master2.ncsa.uiuc.edu • There is no reason why any other site can’t host its own • In fact, you can run one on your own desktop! • Grid-specific parameters • -grid-procs Specifies the total number of processes in the job. The -np parameter to mpirun still specifies the number of processes in the subjob. • -grid-crm Specifies the host running the grid CRM to be used for subjob synchronization. • -key Alphanumeric string that uniquely identifies the grid job. This should be the same for all subjobs!

  15. Running Inter-Site Jobs • Running xsite across SDSC (2 procs) and NCSA (6 procs) • @SDSC: mpirun -np 2 -grid-procs 8 -key myxsitejob -specfile xsite-myrinet-tcp -grid-crm tg-master2.ncsa.teragrid.org cpi • @NCSA: mpirun -np 6 -grid-procs 8 -key myxsitejob -specfile xsite-myrinet-tcp -grid-crm tg-master2.ncsa.teragrid.org cpi

  16. MPICH-VMI2 Support • Support • help@teragrid.org • Mailing lists: http://vmi.ncsa.uiuc.edu/mailingLists.php • Announcements: vmi-announce@yams.ncsa.uiuc.edu • Users: vmi-user@yams.ncsa.uiuc.edu • Developers: vmi-devel@yams.ncsa.uiuc.edu
