
CS 591 x





  1. CS 591 x I/O in MPI

  2. I/O in MPI • MPI exists as many different implementations • MPI implementations are based on MPI standards • MPI standards are developed and maintained by the MPI Forum

  3. I/O in MPI • MPI implementations conform well to MPI standards • MPI 1 standards avoid the issue of I/O • This is a problem since it is rare that a useful program does no I/O • How to handle I/O is left to the individual implementations

  4. I/O in MPI • When using the C I/O functions, which processes have access to stdin, stdout, and stderr? • This is undefined in MPI • In some implementations all processes have access to stdout • In others only one process has access to stdout

  5. I/O in MPI • Sometimes stdout is only available to rank 0 in MPI_COMM_WORLD • Same is true of stdin • Some implementations provide no access to stdin

  6. I/O in MPI • So how do you create portable programs? • Make some assumptions • Do some checking

  7. I/O in MPI • Recall in our MPI implementation – • MPI running under PBS puts stdout in a file (*.oxxxxx) • No direct access to stdin

  8. stdin in PBS/Torque • -I – means interactive • can be given on the qsub command line or in the script • the job still starts under the control of the scheduler • when the job starts, PBS will provide you with an interactive shell • not terribly obvious

  9. I/O in MPI • Two ways to deal with I/O in MPI • define a specific approach in your program • use a specialized parallel I/O system • I/O in parallel systems is a hot topic in high performance computing research

  10. I/O in MPI • Identify or designate a single process that can do input (stdin) and output (stdout) • Usually this will be rank 0 in MPI_COMM_WORLD • Write the program so the IO process manages all user IO (input, reports, prompts, etc.)

  11. I/O in MPI • Attribute caching • recall that topologies are attributes attached to communicators • There are other attributes attached to communicators… • … and you can assign your own • for example, to designate a process to handle IO

  12. Attribute Caching • Duplicate the communicator • MPI_Comm_dup(old_comm, &new_comm); • Define a key value (index) for the new attribute • MPI_Keyval_create(MPI_DUP_FN, MPI_NULL_DELETE_FN, &IO_KEY, extra_arg);

  13. Attribute caching • Define a value for the attribute – the rank of the designated IO process • *io_rank = 0; • Assign the attribute to the communicator • MPI_Attr_put(io_comm, IO_KEY, io_rank); • To retrieve an attribute • MPI_Attr_get(io_comm, IO_KEY, &io_rank_att, &flag);

  14. Attribute Caching • Attribute caching functions are local • you may need to share attribute values with other processes in the comm.

  15. I/O Process • Even though no IO mechanism is defined in MPI… • MPI implementations should have several predefined attributes for MPI_COMM_WORLD • One of these is MPI_IO • It defines which process in the comm is supposed to be able to do IO

  16. I/O process • If no process can do IO • MPI_IO = MPI_PROC_NULL • If every process in the comm can do IO • MPI_IO = MPI_ANY_SOURCE • If some can and some cannot • on a process that can: MPI_IO = its own rank • on a process that cannot: MPI_IO = the rank of a process that can

  17. I/O Process • MPI_IO really means which process can do output • still may not have access to stdin

  18. MPI-IO – stdin, stdout, stderr • for stdout – • create an IO communicator • identify an IO process in the communicator • or – create an IO process in the communicator • the IO process gathers results from compute processes • the IO process outputs the results

  19. MPI-IO – stdin • Recall that • all processes may have access to stdin, • only one process may have access to stdin, or • no process may have access to stdin • How will we know?

  20. Testing stdin in MPI
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv) {
  int size, rank, numb;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  printf("enter an integer ");
  scanf(" %d", &numb);
  printf("Hello world! I'm %d of %d - numb = %d\n", rank, size, numb);
  MPI_Finalize();
  return 0;
}

  21. Testing stdin in MPI
#include <stdio.h>
#include "mpi.h"
int main(int argc, char** argv) {
  int size, rank, numb;
  MPI_Init(&argc, &argv);
  MPI_Comm_size(MPI_COMM_WORLD, &size);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  if (rank == 0) {
    printf("enter an integer ");
    scanf(" %d", &numb);
  }
  MPI_Bcast(&numb, 1, MPI_INT, 0, MPI_COMM_WORLD);
  printf("Hello world! I'm %d of %d - numb = %d\n", rank, size, numb);
  MPI_Finalize();
  return 0;
}

  22. stdin – what to do? • If all processes have access to stdin – • designate one process as the IO process • have that process read from stdin • distribute input to other processes • If only one process has access to stdin- • identify which process has access to stdin • have IO process read from stdin • distribute data to other processes

  23. stdin – What to do? • If no process has access to stdin – • pass data as command line arguments • read input data from files • create include files with data values • nuisance

  24. File IO in MPI • File IO can be a major bottleneck in the performance of a parallel application • Parallel applications can have large (enormous) data sets • We often think of file IO as a side-effect – at least in terms of performance – but this is not true in parallel applications • “One half hour of IO for every 2 hours of computation”

  25. MPI File IO types of applications • Large grids and meshes • storing grid point results for post-processing • distributing data for input • Checkpointing • periodically saving the state of a job • how much work can you afford to lose?

  26. MPI File IO types of applications • Disk caching • data too large for local memories • Data mining • small compute load but a lot of file IO • combing through large datasets • ex. CFD

  27. File IO in MPI • Recall that the use of stdin, stdout, and stderr generally assumes a single channel for each • This is not true with respect to file IO – sort of • Gathering to an IO node may not be the most efficient strategy

  28. File IO in MPI • In parallel systems you have multiple processors running concurrently • each may have the ability to do file IO – concurrently • Know your architecture • Network shared disk storage • diskless compute nodes • directories shared across nodes

  29. Directories on Energy • /home/user - is shared and same on all nodes (r/w) • /usr/local/packages/ - is shared and same on all nodes (ro) • all other directories on any node are local to each node • Implications?

  30. IO example • staging data for input • dividing data before input to job • distribute data pieces to local compute node disk drives • each compute node reads local files to get its piece of the data • as opposed to “read and scatter” • uses standard file IO calls

  31. IO Example • Dump and collect • In some cases large results datasets do not need to be gathered to an IO node • each compute node writes data to a file on its local disk drive • a postprocessing program “visits” the compute nodes and collects the locally stored data • the postprocessor stores the integrated data set

  32. File IO strategy • IO Process/Scatter-Gather vs. Local IO/distribute-collect • Depends on – • use of input/output • size of dataset • file IO capacity of compute nodes • available disk space • disk IO performance
