Distributed Parallel Processing – MPICH-VMI
Avneesh Pant

VMI


  • What is VMI?

    • Virtual Machine Interface

    • High performance communication middleware

    • Abstracts underlying communication network

  • What is MPICH-VMI?

    • MPI library based on MPICH 1.2 from Argonne that uses VMI for underlying communication


  • Communication over heterogeneous networks

    • InfiniBand, Myrinet, TCP, and shared memory (shmem) supported

    • Underlying networks selected at runtime

    • Enables cross-site jobs over compute grids

  • Optimized point-to-point communication

    • Higher-level MPICH protocols (eager and rendezvous) are implemented over RDMA Put and Get primitives

    • RDMA emulated on networks without native RDMA support (TCP)

  • Extensive support for profiling

    • Profiling counters collect information about the communication pattern of the application

    • Profiling information is logged to a database during MPI_Finalize

    • The Profile Guided Optimization (PGO) framework uses the profile databases to optimize subsequent executions of the application


  • Hiding point-to-point communication latency

    • The RDMA Get protocol is very useful for overlapping communication and computation

    • The PGO infrastructure maps MPI processes to nodes to take advantage of the heterogeneity of the underlying network, effectively hiding latencies

  • Optimized Collectives

    • RDMA based collectives (e.g., MPI_Barrier)

    • Multicast based collectives (e.g., MPI_Bcast has an experimental implementation using multicast)

    • Topology aware collectives (currently MPI_Bcast, MPI_Reduce, MPI_Allreduce supported)
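
These collective optimizations are transparent to application code: the calls keep their standard MPI signatures and the library chooses the optimized path internally. A minimal sketch in generic MPI (the comments about which path is taken are assumptions based on the bullets above):

    /* collectives.c - standard MPI collectives; no MPICH-VMI-specific code.
       Any optimized (RDMA / multicast / topology-aware) path is selected
       inside the library at runtime. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double data = 0.0, sum = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0)
            data = 3.14;                      /* value to distribute */

        /* Broadcast from rank 0 to all ranks */
        MPI_Bcast(&data, 1, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Reduce across all ranks, result available on every rank */
        MPI_Allreduce(&data, &sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum = %f\n", sum);

        MPI_Finalize();
        return 0;
    }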

MPI on Teragrid

  • MPI flavors available on Teragrid

    • MPICH-GM

    • MPICH-G2

    • MPICH-VMI 1

      • Deprecated! Was part of CTSS v1

    • MPICH-VMI 2

      • Available as part of CTSS v2 and v3

  • All are part of CTSS

    • Which one to use?

    • We are biased!

MPI on Teragrid

  • What each MPI flavor is designed for

    • MPICH-GM -> single-site runs using Myrinet

    • MPICH-G2 -> running across Globus grids

    • MPICH-VMI2 -> scaling out seamlessly from a single site to cross-site grid runs

  • Currently you need to keep two separate executables

    • One for single-site runs using MPICH-GM and one for grid jobs using MPICH-G2

    • MPICH-VMI2 allows you to use the same executable with comparable or better performance

Using MPICH-VMI

  • Two flavors of MPICH-VMI2 on Teragrid

    • GCC compiled library

    • Intel compiled library

    • It is recommended not to mix them

  • CTSS defines keys for each compiled library

    • GCC: mpich-vmi-2.1.0-1-gcc-3-2

    • Intel: mpich-vmi-2.1.0-1-intel-8.0
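
To check which MPICH-VMI keys are actually installed at a given site, listing the softenv database and filtering should work (the grep filter is just a convenience, not a documented interface):

    $ softenv | grep mpich-vmi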

Setting the Environment

  • To use MPICH VMI 2.1

    • $ soft add +mpich-vmi-2.1.0-1-{gcc-3-2 | intel-8.0}

  • To preserve VMI 2.1 environment across sessions, add

    • “+mpich-vmi-2.1.0-1-{gcc-3-2 | intel-8.0}” to the .soft file in your home directory

    • Intel 8.1 is also available at NCSA. Other sites do not have Intel 8.1 completely installed yet.
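
For example, a ~/.soft file that makes the GCC build persistent might look like the following sketch (the @default line pulls in the site's standard environment, per the usual softenv convention):

    # ~/.soft
    +mpich-vmi-2.1.0-1-gcc-3-2
    @default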

  • Softenv brings in the compiler wrapper scripts into your environment

    • mpicc and mpiCC for C and C++ codes

    • mpif77 and mpif90 for F77 and F90 codes

    • Some underlying compilers, such as the GNU compiler suite, do not support F90. Use “mpif90 -show” to determine the underlying compiler being used.

Compiling with MPICH-VMI

  • The compiler scripts are wrappers that include all MPICH-VMI specific libraries and paths

  • All underlying compiler switches are supported and passed to the compiler

    • e.g., mpicc hello.c -o hello (a minimal hello.c is sketched below)

  • The MPICH-VMI library by default is compiled with debug symbols.
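
For reference, a minimal hello.c matching the compile example above might look like this (a generic MPI sketch; nothing in it is specific to MPICH-VMI):

    /* hello.c - minimal MPI program for the compile and run examples */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                 /* initialize the MPI runtime */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank */
        MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

        printf("Hello from rank %d of %d\n", rank, size);

        MPI_Finalize();                         /* shut down MPI cleanly */
        return 0;
    }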

Running with MPICH-VMI

  • mpirun script is available for launching jobs

  • Supports all standard arguments in addition to MPICH-VMI specific arguments

  • mpirun can use ssh, rsh, or MPD for launching jobs. The default is MPD.

  • Provides automatic selection/failover

    • If MPD ring not available, falls back to ssh/rsh
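
If a site expects you to manage the MPD ring yourself, the standard MPD tools that ship with MPICH would be used roughly as follows (assumed usage; mpd.hosts is a hypothetical file listing one host per line):

    $ mpdboot -n 4 -f mpd.hosts   # start an MPD ring on 4 of the listed hosts
    $ mpdtrace                    # list the hosts participating in the ring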

  • Supports standard way to run jobs

    • mpirun -np <# of procs> -machinefile <nodelist file> <executable> <arguments>

    • -machinefile argument not needed when running within PBS or LSF environment
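
As an illustration, a machine file is just a list of hostnames, one per line; the node names below are made up:

    $ cat machines.txt
    tg-c001
    tg-c002
    $ mpirun -np 4 -machinefile machines.txt ./hello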

  • Can select network to use at runtime by specifying

    • -specfile <network>

    • Supported networks are myrinet, tcp, and xsite-myrinet-tcp

  • Default network on Teragrid is Myrinet

    • It is recommended to always specify the network explicitly using the -specfile switch

Running with MPICH-VMI

  • MPICH-VMI 2.1 specific arguments fall into three broad categories

    • Parameters for runtime tuning

    • Parameters for launching GRID jobs

    • Parameters for controlling profiling of job

  • Use the mpirun -help option to list all tunable parameters

    • All MPICH-VMI 2.1 specific parameters are optional for single-site runs; grid jobs require some parameters to be set

  • To run a simple job within a Teragrid cluster

    • mpirun -np 4 /path/to/hello

    • mpirun -np 4 -specfile myrinet /path/to/hello

  • Within PBS, $PBS_NODEFILE contains the path to the list of nodes allocated at runtime

    • mpirun -np <# procs> -machinefile $PBS_NODEFILE /path/to/hello
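
Putting this together, a minimal PBS batch script might look like the following sketch (resource limits are placeholders; queue names and node attributes vary by site):

    #!/bin/sh
    #PBS -l nodes=2:ppn=2          # request 2 nodes, 2 processors each (placeholder)
    #PBS -l walltime=00:10:00      # 10-minute limit (placeholder)

    cd $PBS_O_WORKDIR              # run from the submission directory
    mpirun -np 4 -machinefile $PBS_NODEFILE /path/to/hello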

  • For cross-site jobs, additional arguments required (discussed later)

For Detecting/Reporting Errors

  • Verbosity switches

    • -v Verbose Level 1. Output VMI startup messages and make MPIRUN verbose.

    • -vv Verbose Level 2. Additionally output any warning messages.

    • -vvv Verbose Level 3. Additionally output any error messages.

    • -vvvv Verbose Level 10. Excessive debug output. Useful only for developers of MPICH-VMI and for submitting crash dumps.
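
For example, to rerun a failing job with warnings enabled (flags as listed above; the rest of the command is unchanged):

    $ mpirun -vv -np 4 -specfile myrinet /path/to/hello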

Running Inter Site Jobs

  • An MPICH-VMI grid job consists of one or more subjobs

  • A subjob is launched on each site using individual mpirun commands. The specfile selected should be one of the cross-site network transports (xsite-mst-tcp or xsite-myrinet-tcp).

  • The higher-performance SAN (InfiniBand or Myrinet) is used for intra-site communication. Cross-site communication automatically uses TCP.

  • In addition to the intra-site parameters, all inter-site runs must specify the same grid-specific parameters

  • A grid CRM must be available on the network to synchronize subjobs

    • Grid CRM on Teragrid is available at tg-master2.ncsa.uiuc.edu

    • No reason why any other site can’t host their own

    • In fact, you can run one on your own desktop!

  • Grid Specific Parameters

    • -grid-procs specifies the total number of processes in the job. The -np parameter to mpirun still specifies the number of processes in the subjob.

    • -grid-crm specifies the host running the grid CRM to be used for subjob synchronization.

    • -key is an alphanumeric string that uniquely identifies the grid job. This should be the same for all subjobs!

Running Inter Site Jobs

  • Running xsite across SDSC (2 procs) and NCSA (6 procs)

    • @SDSC: mpirun -np 2 -grid-procs 8 -key myxsitejob -specfile xsite-myrinet-tcp -grid-crm tg-master2.ncsa.teragrid.org cpi

    • @NCSA: mpirun -np 6 -grid-procs 8 -key myxsitejob -specfile xsite-myrinet-tcp -grid-crm tg-master2.ncsa.teragrid.org cpi

MPICH-VMI2 Support