
Process Manager Update – May 6



Presentation Transcript


  1. Process Manager Update – May 6
     • The Process Manager component (PM)
     • The Process Manager implementation (MPD2)
     • Issues generated for other components by process management

  2. The Process Manager Component
     • Added limits to the interface definition
        • Example on next slide
        • Parsing the limits and passing them on to MPD is not yet implemented
     • Dynamic jobs (MPI_Comm_spawn)
        • The current interface lets the process manager be given a list of nodes and a number of processes to start, independently of each other.
        • The process manager implementation can then use the unused nodes to start spawned processes (or not).
        • MPI_UNIVERSE_SIZE gives an MPI job a hint about how many processes can usefully be spawned.
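The slides do not say how the MPI_UNIVERSE_SIZE hint is computed or how spawned processes are placed. A minimal sketch, assuming (hypothetically) one process per node, a universe size equal to the number of nodes handed to the process manager, and spawned processes going to nodes not yet in use. The `Allocation` class and its methods are illustrative, not MPD's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    """Hypothetical bookkeeping for one job: the nodes it was given and
    which of them already run a process (one process per node assumed)."""
    nodes: list[str]
    busy: list[str] = field(default_factory=list)

    def universe_size(self) -> int:
        # Hint for MPI_UNIVERSE_SIZE: total processes the job could usefully
        # have under the one-process-per-node assumption.
        return len(self.nodes)

    def start(self, nprocs: int) -> list[str]:
        """Place nprocs processes on unused nodes and return those nodes."""
        free = [n for n in self.nodes if n not in self.busy]
        if nprocs > len(free):
            raise RuntimeError(f"only {len(free)} unused nodes for {nprocs} processes")
        chosen = free[:nprocs]
        self.busy.extend(chosen)
        return chosen


# Initial job: 8 nodes handed to the process manager, only 4 processes started.
alloc = Allocation(nodes=[f"node{i}" for i in range(8)])
alloc.start(4)
print(alloc.universe_size())  # 8
# A later MPI_Comm_spawn of 2 processes lands on nodes that are still unused.
print(alloc.start(2))  # ['node4', 'node5']
```

Keeping the node list and the process count independent is what leaves room for spawning; a real process manager could equally decline to use the spare nodes, as the slide notes.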

  3. Limits Specification Example

     <create-process-group totalprocs='2'>
       <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
         <arg idx='1' value="hello"> </arg>
         <arg idx='2' value="from 0"> </arg>
         <limit type='cpu' value="2"/>
       </process-spec>
       <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
         <arg idx='1' value="hello"> </arg>
         <arg idx='2' value="from 1"> </arg>
         <env name='foo' value="bar"> </env>
         <limit type='cpu' value="3"/>
       </process-spec>
       <host-spec>
         magpie
       </host-spec>
     </create-process-group>
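Since parsing the limits is listed as not yet implemented, here is a sketch of reading the request above into plain Python data with the standard-library ElementTree parser. It assumes, as guesses from the example only, that argument order comes from the `idx` attribute, that limit values are integers, and that `host-spec` holds whitespace-separated host names. `parse_group` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

SPEC = """<create-process-group totalprocs='2'>
  <process-spec range='0' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 0"> </arg>
    <limit type='cpu' value="2"/>
  </process-spec>
  <process-spec range='1' cwd='/home/rbutler/mpd2' exec='infloop'>
    <arg idx='1' value="hello"> </arg>
    <arg idx='2' value="from 1"> </arg>
    <env name='foo' value="bar"> </env>
    <limit type='cpu' value="3"/>
  </process-spec>
  <host-spec>
    magpie
  </host-spec>
</create-process-group>"""


def parse_group(xml_text: str) -> dict:
    """Turn a create-process-group request into dicts and lists."""
    root = ET.fromstring(xml_text)
    procs = []
    for spec in root.findall("process-spec"):
        # Order arguments by their idx attribute rather than document order.
        args = sorted(spec.findall("arg"), key=lambda a: int(a.get("idx")))
        procs.append({
            "range": spec.get("range"),
            "cwd": spec.get("cwd"),
            "exec": spec.get("exec"),
            "args": [a.get("value") for a in args],
            "env": {e.get("name"): e.get("value") for e in spec.findall("env")},
            "limits": {lim.get("type"): int(lim.get("value"))
                       for lim in spec.findall("limit")},
        })
    # Assumed: host names are separated by whitespace inside host-spec.
    hosts = (root.findtext("host-spec") or "").split()
    return {"totalprocs": int(root.get("totalprocs")), "procs": procs, "hosts": hosts}


group = parse_group(SPEC)
print(group["procs"][1]["limits"])  # {'cpu': 3}
print(group["hosts"])  # ['magpie']
```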

  4. The Process Manager Implementation
     • Improvements to MPD resulting from production use on Chiba
        • Mostly in recovering from application errors and crashes
     • Support for limits (those supported by setrlimit)
     • Improvements in configuring and building together with MPICH2
     • Support for MPI_Comm_spawn through the PMI interface to MPICH2 applications
     • Interactive debugging via mpigdb
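Limits "supported in setrlimit" are applied per process, in the child after fork and before exec, so the launcher's own limits stay untouched. A minimal POSIX sketch using Python's `resource` and `subprocess` modules. The mapping from XML `type` strings to resources is hypothetical (only `cpu` appears in the example), and setting soft and hard limits to the same value is a design choice made here, not something the slides specify.

```python
import resource
import subprocess
import sys

# Hypothetical mapping from XML limit "type" strings to setrlimit resources.
LIMIT_TYPES = {
    "cpu": resource.RLIMIT_CPU,        # seconds of CPU time
    "core": resource.RLIMIT_CORE,      # maximum core file size in bytes
    "fsize": resource.RLIMIT_FSIZE,    # maximum size of a written file in bytes
    "nofile": resource.RLIMIT_NOFILE,  # number of open file descriptors
}


def launch_with_limits(argv, limits, **popen_kwargs):
    """Start argv with each limit applied in the child, before exec."""
    def apply_limits():
        # Runs only in the forked child. Soft and hard limits are set equal;
        # an unprivileged process can lower, but not raise, its hard limit.
        for name, value in limits.items():
            resource.setrlimit(LIMIT_TYPES[name], (value, value))
    return subprocess.Popen(argv, preexec_fn=apply_limits, **popen_kwargs)


# Start a child Python that reports its own CPU limit.
probe = "import resource; print(resource.getrlimit(resource.RLIMIT_CPU))"
child = launch_with_limits([sys.executable, "-c", probe], {"cpu": 2},
                           stdout=subprocess.PIPE, text=True)
out, _ = child.communicate()
print(out.strip())  # (2, 2)
```

With RLIMIT_CPU, the kernel sends SIGXCPU when the soft limit is reached, which is how a runaway process such as the `infloop` example would be stopped.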

  5. Coercing gdb Into Functioning as a Primitive Parallel Debugger
     • The key is MPD's control of stdin, stdout, and stderr, exercised through mpigdb
     • mpigdb replaces mpiexec or mpirun on the interactive command line
     • Usable through the SSS process manager component
     • Stdout and stderr are collected in a tree, labeled by rank, and merged for scalability:

          (0-9) (gdb) p x
          (0-2): $1 = 3.4
          (3): $1 = 3.8
          (4-9): $1 = 4.1

     • Stdin can be broadcast to all processes or to a subset of them
        • z 3 (to send input to process 3 only)
     • Interrupts can be directed the same way
     • Can run a job under debugger control, interrupt and query hung processes, and attach in parallel to a running parallel job
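The merged output above groups ranks that printed the same line and labels each group with compressed rank ranges. A sketch of that merge, assuming one line of output per rank collected in one place. In MPD the merging happens incrementally as output flows up the tree; the helper names here are hypothetical.

```python
def compress_ranks(ranks):
    """Render ranks as a compact label, e.g. [0, 1, 2, 5] -> '0-2,5'."""
    ranks = sorted(ranks)
    parts = []
    start = prev = ranks[0]
    for r in ranks[1:]:
        if r == prev + 1:  # still inside a consecutive run
            prev = r
            continue
        parts.append(f"{start}-{prev}" if start != prev else f"{start}")
        start = prev = r
    parts.append(f"{start}-{prev}" if start != prev else f"{start}")
    return ",".join(parts)


def merge_output(lines_by_rank):
    """Collapse identical lines from different ranks into one labeled line.

    lines_by_rank maps rank -> the line that rank printed. Groups appear in
    the order of their lowest rank (dicts keep insertion order).
    """
    groups = {}
    for rank in sorted(lines_by_rank):
        groups.setdefault(lines_by_rank[rank], []).append(rank)
    return [f"({compress_ranks(rs)}): {text}" for text, rs in groups.items()]


# Reproduce the "p x" example: ranks 0-2, 3, and 4-9 print different values.
sample = {r: "$1 = 3.4" for r in range(3)}
sample[3] = "$1 = 3.8"
sample.update({r: "$1 = 4.1" for r in range(4, 10)})
for line in merge_output(sample):
    print(line)
# (0-2): $1 = 3.4
# (3): $1 = 3.8
# (4-9): $1 = 4.1
```

Output from ten processes collapses to three lines here. With more processes that mostly agree, the saving grows, which is the scalability point the slide makes.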

  6. Issues Generated For Other Components
     • Job steps
        • Option 1: the QM handles job steps (preferred)
           • The process manager starts process groups directly
           • Needs a public definition of the user interface to the QM
        • Option 2: the PM implementation handles PBS-like scripts from the QM
           • A bit weird: mpirun in a PBS script is trapped by an extra layer (MPISH), because the "real" mpirun is itself a call to MPD
           • In use on Chiba
     • QM interface for requesting an allocation of some number of nodes but starting up on a different number of nodes, particularly for option 1
     • QM interface for requesting dynamic rebuilds
     • Limits in the QM interface?

  7. Tale of Two Queue Manager Implementations
     [Diagram: both queue managers talk to the PM through the QM interface (XML), with the same XML syntax but different content; the PM uses a different XML syntax to drive the underlying process manager (MPD).
      • qsub1 → QM1 → PM: totalprocs=1, exec=myscript, where myscript contains "mpirun -np 64 cpi" (mpirun intercepted)
      • qsub2 → QM2 → PM: totalprocs=64, exec=cpi
      • mpirun is the interactive interface to the underlying process manager]
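In the QM1 path, the intercepted `mpirun -np 64 cpi` has to become a PM request equivalent to what QM2 submits directly (totalprocs=64, exec=cpi). A sketch of that translation, assuming only the `-np` form of mpirun and reusing the element names from the limits example. The function name, the `cwd` value, and the `lo-hi` range syntax are hypothetical; this is not MPISH's actual code.

```python
import argparse
import xml.etree.ElementTree as ET


def mpirun_to_request(argv, cwd):
    """Translate an intercepted 'mpirun -np N prog [args...]' command line
    into a create-process-group request (hypothetical mapping)."""
    parser = argparse.ArgumentParser(prog="mpirun", add_help=False)
    parser.add_argument("-np", type=int, required=True)
    parser.add_argument("prog")
    parser.add_argument("args", nargs=argparse.REMAINDER)
    ns = parser.parse_args(argv)

    group = ET.Element("create-process-group", totalprocs=str(ns.np))
    # One process-spec covering every rank; 'lo-hi' range syntax is assumed.
    spec = ET.SubElement(group, "process-spec",
                         range=f"0-{ns.np - 1}", cwd=cwd, exec=ns.prog)
    for i, value in enumerate(ns.args, start=1):
        ET.SubElement(spec, "arg", idx=str(i), value=value)
    return ET.tostring(group, encoding="unicode")


# The command from myscript, as the interception layer might see it.
print(mpirun_to_request(["-np", "64", "cpi"], cwd="/home/user"))
```

Doing this translation inside the PM layer is what lets an unmodified PBS-style script (QM1) and a queue manager that speaks the XML interface directly (QM2) reach the same process-manager request.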
