
Parallel I/O



Presentation Transcript


  1. Parallel I/O Bronson Messer Deputy Director of Science, OLCF ORNL Theoretical Astrophysics Group Dept. of Physics & Astronomy, University of Tennessee

  2. I’ve done all this computation in parallel, but what’s the result? • How do we get output from a parallel program? • All the messages from individual processes to STDOUT sure are annoying, huh? • How do we get input into a parallel program? • Read and broadcast works for command line arguments just fine • What if the input is TBs in size?
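As a concrete illustration of the "read and broadcast" approach for small inputs, here is a minimal sketch (the file name "input.params" and the parameter count are placeholders, not from the slides): rank 0 reads the values and MPI_Bcast distributes them to every process.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        const int nparams = 16;          /* illustrative number of input values */
        double params[16];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {                 /* only rank 0 touches the input file */
            FILE *fp = fopen("input.params", "r");
            for (int i = 0; i < nparams; i++)
                fscanf(fp, "%lf", &params[i]);
            fclose(fp);
        }

        /* every rank receives the same small input -- fine for KBs, hopeless for TBs */
        MPI_Bcast(params, nparams, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        MPI_Finalize();
        return 0;
    }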

  3. A few observations • Most large-scale applications where I/O is important are heavy on the “O” and light on the “I” • I/O really matters for performance: it is often the slowest ingredient, sometimes by orders of magnitude

  4. Common Ways of Doing I/O in Parallel Programs • Sequential I/O: • All processes send data to rank 0, and 0 writes it to the file
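A minimal sketch of this sequential pattern (CHUNK, the payload, and the file name "result.dat" are illustrative): every rank's chunk is gathered to rank 0, which is the only process that writes.

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #define CHUNK 1024                   /* illustrative per-process element count */

    int main(int argc, char **argv)
    {
        int rank, nprocs;
        double local[CHUNK], *global = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
        for (int i = 0; i < CHUNK; i++) local[i] = rank;   /* placeholder payload */

        if (rank == 0)
            global = malloc((size_t)nprocs * CHUNK * sizeof(double));

        /* funnel every rank's chunk to rank 0 ... */
        MPI_Gather(local, CHUNK, MPI_DOUBLE,
                   global, CHUNK, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* ... which is the only process that touches the file */
        if (rank == 0) {
            FILE *fp = fopen("result.dat", "wb");
            fwrite(global, sizeof(double), (size_t)nprocs * CHUNK, fp);
            fclose(fp);
            free(global);
        }

        MPI_Finalize();
        return 0;
    }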

  5. Pros and Cons of Sequential I/O • Pros: • parallel machine may support I/O from only one process (e.g., no common file system) • Some I/O libraries (e.g. HDF-4, NetCDF) not parallel • resulting single file is handy for ftp, mv • big blocks improve performance • short distance from original, serial code • Cons: • lack of parallelism limits scalability, performance (single node bottleneck)

  6. Another Way • Each process writes to a separate file • Pros: • parallelism, high performance • Cons: • lots of small files to manage • difficult to read back data from different number of processes
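A minimal sketch of the file-per-process alternative (the out.%04d naming pattern and payload are illustrative): each rank writes its own ordinary POSIX file, so no coordination is needed, but there is also no single output file.

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        double data[1000];
        char fname[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < 1000; i++) data[i] = rank;   /* placeholder payload */

        /* one ordinary file per rank: out.0000, out.0001, ... */
        snprintf(fname, sizeof(fname), "out.%04d", rank);
        FILE *fp = fopen(fname, "wb");
        fwrite(data, sizeof(double), 1000, fp);
        fclose(fp);

        MPI_Finalize();
        return 0;
    }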

  7. What is Parallel I/O? • Multiple processes of a parallel program accessing data (reading or writing) from a common file • [Figure: processes P0, P1, P2, …, P(n-1) all accessing a single shared FILE]

  8. Why Parallel I/O? • Non-parallel I/O is simple but • Poor performance (single process writes to one file) or • Awkward and not interoperable with other tools (each process writes a separate file) • Parallel I/O • Provides high performance • Can provide a single file that can be used with other tools (such as visualization programs) • This may or may not be true with ‘modern’ viz tools, but it sure is easier to wrangle one file than 144,000.

  9. Why is MPI a Good Setting for Parallel I/O? • Writing is like sending a message and reading is like receiving. • Any parallel I/O system will need a mechanism to • define collective operations (MPI communicators) • define noncontiguous data layout in memory and file (MPI datatypes) • test completion of nonblocking operations (MPI request objects) • i.e., lots of MPI-like machinery
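To see the request-object machinery at work, here is a hedged sketch of a nonblocking MPI-IO write (file name, buffer, and offsets are illustrative; modern MPI returns an ordinary MPI_Request from the nonblocking file routines): the write is started much like an MPI_Isend and completed with MPI_Wait.

    #include <mpi.h>
    #define N 1000

    int main(int argc, char **argv)
    {
        int rank, buf[N];
        MPI_File fh;
        MPI_Request req;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < N; i++) buf[i] = rank;

        MPI_File_open(MPI_COMM_WORLD, "async.out",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* start the write at this rank's byte offset, just like posting an MPI_Isend */
        MPI_File_iwrite_at(fh, (MPI_Offset)rank * N * sizeof(int),
                           buf, N, MPI_INT, &req);

        /* ... overlap other work here, then complete via the same request object */
        MPI_Wait(&req, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }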

  10. Using MPI for Simple I/O • Each process needs to write a chunk of data to a common file • [Figure: processes P0, P1, P2, …, P(n-1) each writing its own chunk of a single shared FILE]

  11. Writing to a File • Use MPI_File_write or MPI_File_write_at • Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open • If the file doesn’t exist already, the flag MPI_MODE_CREATE must also be passed to MPI_File_open • We can pass multiple flags by using bitwise-or ‘|’ in C, or addition ‘+’ in Fortran
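A minimal sketch of this explicit-offset style (file name, BUFSIZE, and buffer contents are illustrative, not from the slides): each rank opens the shared file with the flags above and writes its block at a byte offset computed from its rank. The next few slides show the same job done with a file view instead.

    #include <mpi.h>
    #define BUFSIZE 100

    int main(int argc, char **argv)
    {
        int rank, buf[BUFSIZE];
        MPI_File fh;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        for (int i = 0; i < BUFSIZE; i++) buf[i] = rank * BUFSIZE + i;

        /* flags combined with bitwise-or in C */
        MPI_File_open(MPI_COMM_WORLD, "datafile",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* each rank writes its block at an explicit byte offset */
        MPI_File_write_at(fh, (MPI_Offset)rank * BUFSIZE * sizeof(int),
                          buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }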

  12. Using File Views • Processes write to shared file • MPI_File_set_view assigns regions of the file to separate processes

  13. File Views • Specified by a triplet (displacement, etype, and filetype) passed to MPI_File_set_view • displacement = number of bytes to be skipped from the start of the file • etype = basic unit of data access (can be any basic or derived datatype) • filetype = specifies which portion of the file is visible to the process

  14. File View Example

    MPI_File thefile;
    int i, buf[BUFSIZE];   /* myrank comes from MPI_Comm_rank; BUFSIZE ints per process */
    for (i = 0; i < BUFSIZE; i++)
        buf[i] = myrank * BUFSIZE + i;
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &thefile);
    /* the displacement is in bytes, hence the sizeof(int) */
    MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                      MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
    MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&thefile);

  15. A Simple File View Example • etype = MPI_INT • filetype = two MPI_INTs followed by a gap of four MPI_INTs • [Figure: starting at the head of the file, the displacement is skipped, then the filetype pattern repeats: filetype, filetype, and so on...]
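One way such a filetype could be constructed (a sketch, not from the slides; fh and disp are assumed to be an already-opened MPI_File and its byte displacement): build a contiguous type of two ints, then resize its extent to six ints so that each repetition leaves a four-int gap.

    /* assumes: MPI_File fh is open, MPI_Offset disp holds the displacement in bytes */
    MPI_Datatype contig, filetype;

    MPI_Type_contiguous(2, MPI_INT, &contig);               /* two MPI_INTs ...           */
    MPI_Type_create_resized(contig, 0,                      /* ... with an extent of six  */
                            6 * sizeof(int), &filetype);    /* ints, i.e. a four-int gap  */
    MPI_Type_commit(&filetype);

    MPI_File_set_view(fh, disp, MPI_INT, filetype, "native", MPI_INFO_NULL);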

  16. Ways to Write to a Shared File • MPI_File_seek: like Unix seek (position an individual file pointer, then read or write) • MPI_File_read_at / MPI_File_write_at: combine seek and I/O in a single call, for thread safety • MPI_File_read_shared / MPI_File_write_shared: use a shared file pointer; good when the order of data in the file doesn’t matter • Collective versions of these operations also exist (e.g., MPI_File_write_at_all) • A short sketch contrasting these styles follows
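A brief sketch contrasting the individual-pointer, explicit-offset, and shared-pointer styles (fh, offset, buf, and count are assumed to be set up already; this fragment is illustrative, not from the slides):

    /* individual file pointer: position, then write -- two calls, racy across threads */
    MPI_File_seek(fh, offset, MPI_SEEK_SET);
    MPI_File_write(fh, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    /* explicit offset: seek and write combined in a single, thread-safe call */
    MPI_File_write_at(fh, offset, buf, count, MPI_INT, MPI_STATUS_IGNORE);

    /* shared file pointer: all ranks append through one pointer;
       fine when the ordering of blocks in the file doesn't matter */
    MPI_File_write_shared(fh, buf, count, MPI_INT, MPI_STATUS_IGNORE);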

  17. Whew, this is getting complicated… • The bad news: it’s an inherently complicated process, and in “real life” it gets even more complicated

  18. E.g., to use Lustre on Jaguar • This is why I/O is complicated: many factors interact • Total output size • File size per processor • Number of processors • Number of files per output dump • Strided or contiguous access • Transfer size • Blocksize • Collective vs. independent I/O • Weak vs. strong scaling • File system hints • Striping • Chunking • I/O library • A sketch of passing such hints follows
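Several of these knobs are passed to MPI-IO as hints on an MPI_Info object. A hedged sketch (the hint keys below follow common ROMIO/Lustre conventions and may differ or simply be ignored on other MPI implementations; the file name and values are illustrative):

    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    /* implementation-specific hints; an MPI library may ignore any it doesn't recognize */
    MPI_Info_set(info, "striping_factor", "64");     /* number of Lustre OSTs to stripe over */
    MPI_Info_set(info, "striping_unit", "1048576");  /* 1 MB stripe size                     */
    MPI_Info_set(info, "romio_cb_write", "enable");  /* collective buffering on writes       */

    MPI_File_open(MPI_COMM_WORLD, "checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    /* ... write ... */
    MPI_File_close(&fh);
    MPI_Info_free(&info);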

  19. … but, if we have to go through all this… • …we might as well get something out of it! • What if we could have output files that • Were easily portable from machine to machine, • Could be read by a variety of visualization packages, • Could be ‘looked at’ with a simple command, • And could be written fast. • I’d buy that, right? • What’s the cost? • A LOT of typing…

  20. Parallel HDF5 • Can store data structures, arrays, vectors, grids, complex data types, text • Can use basic HDF5 types (integers, floats, reals) or user-defined types such as multi-dimensional arrays, objects, and strings • Stores the metadata necessary for portability: endianness, size, architecture • Not the only choice for platform-independent parallel I/O, but it is the most common

  21. Data Model • Groups • Arranged in a directory hierarchy • root group is always ‘/’ • Datasets • Dataspace • Datatype • Attributes • Bind to Group & Dataset • References • Similar to softlinks • Can also be subsets of data • [Figure: an example hierarchy: the root group "/" carries attributes "author"=Jane Doe and "date"=10/24/2006 and contains "Dataset0" and "Dataset1" (each with a type and dataspace, plus attributes such as "time"=0.2345 and "validity"=None) and a subgroup "subgrp" holding "Dataset0.1" and "Dataset0.2"] • This is why all the typing is necessary…

  22. And it performs too...

  23. The pain begins • First, create the parallel file

    comm = MPI_COMM_WORLD
    info = MPI_INFO_NULL

    CALL MPI_INIT(mpierror)
    !
    ! Initialize FORTRAN predefined datatypes
    CALL h5open_f(error)
    !
    ! Setup file access property list for MPI-IO access.
    CALL h5pcreate_f(H5P_FILE_ACCESS_F, plist_id, error)
    CALL h5pset_fapl_mpio_f(plist_id, comm, info, error)
    !
    ! Create the file collectively.
    CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)
    !
    ! Close the file.
    CALL h5fclose_f(file_id, error)
    !
    ! Close FORTRAN interface
    CALL h5close_f(error)

    CALL MPI_FINALIZE(mpierror)

  24. Pain continued… • Then create a dataset

    CALL h5fcreate_f(filename, H5F_ACC_TRUNC_F, file_id, error, access_prp = plist_id)

    CALL h5screate_simple_f(rank, dimsf, filespace, error)
    !
    ! Create the dataset with default properties.
    !
    CALL h5dcreate_f(file_id, "dataset1", H5T_NATIVE_INTEGER, filespace, dset_id, error)
    !
    ! Close the dataset.
    CALL h5dclose_f(dset_id, error)
    !
    ! Close the file.
    CALL h5fclose_f(file_id, error)

  25. More typing… • Then, write to the dataset…

    ! Create property list for collective dataset write
    !
    CALL h5pcreate_f(H5P_DATASET_XFER_F, plist_id, error)
    CALL h5pset_dxpl_mpio_f(plist_id, H5FD_MPIO_COLLECTIVE_F, error)
    !
    ! Write the dataset collectively (dimsf holds the dataset dimensions).
    !
    CALL h5dwrite_f(dset_id, H5T_NATIVE_INTEGER, data, dimsf, error, &
                    file_space_id = filespace, &
                    mem_space_id = memspace, &
                    xfer_prp = plist_id)

  26. h5dump is a simple way to take a look

    HDF5 "SDS_row.h5" {
    GROUP "/" {
       DATASET "dataset1" {
          DATATYPE  H5T_STD_I32BE
          DATASPACE  SIMPLE { ( 8, 5 ) / ( 8, 5 ) }
          DATA {
             10, 10, 10, 10, 10,
             10, 10, 10, 10, 10,
             11, 11, 11, 11, 11,
             11, 11, 11, 11, 11,
             12, 12, 12, 12, 12,
             12, 12, 12, 12, 12,
             13, 13, 13, 13, 13,
             13, 13, 13, 13, 13
          }
       }
    }
    }

  27. But, in the end, you have a file you can use • 11,552 processors, 1 file

  28. Stuff to remember • Big, multiphysics, parallel applications spew tens to hundreds of TB over the course of a run • Without parallel I/O, millions(!!!) of files would have to be produced • Parallel I/O has to perform well if you don’t want to spend most of your time writing to disk • Figuring out how to write and optimizing performance takes considerable thought (i.e., probably as much as the original program design) • Platform-independent formats like HDF5 might not make that process easier, but you get a lot of added value
