CSCI-4320/6360: Parallel Programming & Computing
Tues./Fri. 12-1:30 p.m.
MPI File I/O

Prof. Chris Carothers

Computer Science Department

MRC 309a

[email protected]

www.cs.rpi.edu/~chrisc/COURSES/PARALLEL/SPRING-2013

Adapted from: people.cs.uchicago.edu/~asiegel/courses/cspp51085/.../mpi-io.ppt

Common Ways of Doing I/O in Parallel Programs
  • Sequential I/O:
    • All processes send data to rank 0, and 0 writes it to the file
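A hedged sketch of this pattern (the 100-int payload per rank and the filename out.dat are illustrative assumptions; error checking omitted):

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

/* Sketch: every rank produces 100 ints; rank 0 gathers them all and is
   the only writer. */
int main(int argc, char **argv)
{
    int rank, nprocs, local[100];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    for (int i = 0; i < 100; i++)
        local[i] = rank * 100 + i;              /* stand-in for real data */

    int *all = NULL;
    if (rank == 0)
        all = malloc((size_t)nprocs * 100 * sizeof(int));
    MPI_Gather(local, 100, MPI_INT, all, 100, MPI_INT, 0, MPI_COMM_WORLD);

    if (rank == 0) {                            /* single-writer bottleneck */
        FILE *f = fopen("out.dat", "wb");
        fwrite(all, sizeof(int), (size_t)nprocs * 100, f);
        fclose(f);
        free(all);
    }

    MPI_Finalize();
    return 0;
}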
Pros and Cons of Sequential I/O
  • Pros:
    • the parallel machine may support I/O from only one process (e.g., no common file system)
    • some I/O libraries (e.g., HDF-4, netCDF, PMPIO) are not parallel
    • the resulting single file is handy for ftp, mv
    • big blocks improve performance
    • short distance from the original, serial code
  • Cons:
    • lack of parallelism limits scalability and performance (single-node bottleneck)
Another Way
  • Each process writes to a separate file
  • Pros:
    • parallelism, high performance
  • Cons:
    • lots of small files to manage
    • lots of metadata, which stresses the parallel filesystem
    • difficult to read the data back from a different number of processes
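A minimal sketch of this one-file-per-process pattern; the out.%d naming scheme is an illustrative assumption:

#include <stdio.h>
#include "mpi.h"

/* Sketch: each rank writes its own file, named by its rank. */
int main(int argc, char **argv)
{
    int rank;
    char fname[64];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    snprintf(fname, sizeof(fname), "out.%d", rank);  /* one file per rank */
    FILE *f = fopen(fname, "wb");
    fwrite(&rank, sizeof(int), 1, f);                /* stand-in payload */
    fclose(f);

    MPI_Finalize();
    return 0;
}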
What is Parallel I/O?
  • Multiple processes of a parallel program accessing data (reading or writing) from a common file

[Figure: processes P0, P1, P2, ..., P(n-1) all accessing a common FILE]

Why Parallel I/O?
  • Non-parallel I/O is simple but
    • Poor performance (single process writes to one file) or
    • Awkward and not interoperable with other tools (each process writes a separate file)
  • Parallel I/O
    • Provides high performance
    • Can provide a single file that can be used with other tools (such as visualization programs)
Why is MPI a Good Setting for Parallel I/O?
  • Writing is like sending a message and reading is like receiving.
  • Any parallel I/O system will need a mechanism to
    • define collective operations (MPI communicators)
    • define noncontiguous data layout in memory and file (MPI datatypes)
    • test completion of nonblocking operations (MPI request objects)
  • i.e., lots of MPI-like machinery
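As a hedged illustration of the request-object point, a minimal nonblocking MPI-IO read completed with MPI_Wait (assumes a file named datafile exists; error checking omitted):

#include "mpi.h"

/* Sketch: nonblocking read issued through MPI-IO, completed via the
   same MPI_Request machinery used for messages. */
int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Request req;
    MPI_Status status;
    int buf[100];

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_iread(fh, buf, 100, MPI_INT, &req);  /* returns immediately */
    /* ... overlap computation here ... */
    MPI_Wait(&req, &status);                      /* complete via request */
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}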
MPI-IO Background
  • Marc Snir et al. (IBM Watson) paper exploring MPI as a context for parallel I/O (1994)
  • MPI-IO email discussion group led by J.-P. Prost (IBM) and Bill Nitzberg (NASA), 1994
  • MPI-IO group joins MPI Forum in June 1996
  • MPI-2 standard released in July 1997
  • MPI-IO is Chapter 9 of MPI-2
Using MPI for Simple I/O

[Figure: processes P0, P1, P2, ..., P(n-1) each reading a chunk of a common FILE]

Each process needs to read a chunk of data from a common file

Using Individual File Pointers

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;       /* bytes per process */
    nints   = bufsize/sizeof(int);   /* ints per process  */

    MPI_File_open(MPI_COMM_WORLD, "datafile",
                  MPI_MODE_RDONLY, MPI_INFO_NULL, &fh);
    /* default view: etype is byte, so the seek offset is in bytes */
    MPI_File_seek(fh, rank * bufsize, MPI_SEEK_SET);
    MPI_File_read(fh, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Using Explicit Offsets

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define FILESIZE 1000

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_File fh;
    MPI_Status status;
    int bufsize, nints;
    int buf[FILESIZE];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    nints = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "datafile", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* no seek: the offset (in bytes, under the default view) is explicit */
    MPI_File_read_at(fh, rank * bufsize, buf, nints, MPI_INT, &status);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}

Function Details

MPI_File_open(MPI_Comm comm, char *filename, int amode,
              MPI_Info info, MPI_File *fh)
  (note: amode = MPI_MODE_RDONLY, MPI_MODE_RDWR, MPI_MODE_WRONLY,
   MPI_MODE_CREATE, MPI_MODE_EXCL, MPI_MODE_DELETE_ON_CLOSE,
   MPI_MODE_UNIQUE_OPEN, MPI_MODE_SEQUENTIAL, MPI_MODE_APPEND)

MPI_File_close(MPI_File *fh)

MPI_File_read(MPI_File fh, void *buf, int count,
              MPI_Datatype datatype, MPI_Status *status)

MPI_File_read_at(MPI_File fh, MPI_Offset offset, void *buf, int count,
                 MPI_Datatype datatype, MPI_Status *status)

MPI_File_seek(MPI_File fh, MPI_Offset offset, int whence)
  (note: whence = MPI_SEEK_SET, MPI_SEEK_CUR, or MPI_SEEK_END)

MPI_File_write(MPI_File fh, void *buf, int count,
               MPI_Datatype datatype, MPI_Status *status)

MPI_File_write_at( ...same as read_at... )

(Note: many other functions exist to get/set properties; see Gropp et al.)

Writing to a File
  • Use MPI_File_write or MPI_File_write_at
  • Use MPI_MODE_WRONLY or MPI_MODE_RDWR as the flags to MPI_File_open
  • If the file doesn’t exist previously, the flag MPI_MODE_CREATE must also be passed to MPI_File_open
  • We can pass multiple flags by using bitwise-or '|' in C, or addition '+' in Fortran (see the sketch below)
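A hedged sketch combining these flags (the filename outfile and the 100-int payload are illustrative assumptions; error checking omitted):

#include "mpi.h"

/* Sketch: each rank writes 100 ints at its own byte offset. */
int main(int argc, char **argv)
{
    int rank, buf[100];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (int i = 0; i < 100; i++) buf[i] = rank;

    MPI_File_open(MPI_COMM_WORLD, "outfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY,  /* '|' combines flags */
                  MPI_INFO_NULL, &fh);
    MPI_File_write_at(fh, (MPI_Offset)rank * 100 * sizeof(int),
                      buf, 100, MPI_INT, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}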
MPI Datatype Interlude
  • Datatypes in MPI
    • Elementary: MPI_INT, MPI_DOUBLE, etc.
      • everything we’ve used to this point
  • Contiguous
    • Next easiest: sequences of elementary types
  • Vector
    • Sequences separated by a constant “stride”
MPI Datatypes, cont.
  • Indexed: more general
    • does not assume a constant stride
  • Struct
    • General mixed types (like C structs)
Creating Simple Datatypes
  • Let’s just look at the simplest types: contiguous and vector datatypes.
  • Contiguous example
    • Let’s create a new datatype which is two ints side by side. The calling sequence is

MPI_Type_contiguous(int count, MPI_Datatype oldtype, MPI_Datatype *newtype);

MPI_Datatype newtype;
MPI_Type_contiguous(2, MPI_INT, &newtype);
MPI_Type_commit(&newtype);   /* required before use; note the & */
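The vector type mentioned earlier gets no example on these slides; as a hedged sketch with illustrative counts, here is a type describing 3 blocks of 2 ints whose block starts are 4 ints apart:

/* Sketch: MPI_Type_vector(count, blocklength, stride, oldtype, newtype)
   builds 3 blocks of 2 ints each, with consecutive block starts 4 ints
   apart. */
MPI_Datatype vtype;
MPI_Type_vector(3, 2, 4, MPI_INT, &vtype);
MPI_Type_commit(&vtype);    /* required before use */
/* ... usable in sends/receives, or as an MPI-IO filetype ... */
MPI_Type_free(&vtype);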

Using File Views
  • Processes write to shared file
  • MPI_File_set_view assigns regions of the file to separate processes
File Views
  • Specified by a triplet (displacement, etype, and filetype) passed to MPI_File_set_view
  • displacement = number of bytes to be skipped from the start of the file
  • etype = basic unit of data access (can be any basic or derived datatype)
  • filetype = specifies which portion of the file is visible to the process
  • MPI_File_set_view is collective: all ranks in the group that opened the file must call it with the same data representation and etype extents (displacements and filetypes may differ per rank)
File Interoperability
  • Users can optionally create files with a portable binary data representation
  • “datarep” parameter to MPI_File_set_view
  • native: default; same as in memory, not portable
  • internal: implementation-defined representation providing an implementation-defined level of portability
  • external32: a specific representation defined in MPI (basically 32-bit big-endian IEEE format), portable across machines and MPI implementations
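As a one-line hedged illustration, choosing the portable representation only changes the datarep string passed to MPI_File_set_view (fh is assumed to be already open):

/* Sketch: identical to a "native" view except for the datarep string */
MPI_File_set_view(fh, 0, MPI_INT, MPI_INT, "external32", MPI_INFO_NULL);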
File View Example

MPI_File thefile;
int i, myrank, buf[BUFSIZE];

MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
for (i = 0; i < BUFSIZE; i++)
    buf[i] = myrank * BUFSIZE + i;

MPI_File_open(MPI_COMM_WORLD, "testfile",
              MPI_MODE_CREATE | MPI_MODE_WRONLY,
              MPI_INFO_NULL, &thefile);
/* the displacement is in bytes, hence the sizeof(int) factor */
MPI_File_set_view(thefile, myrank * BUFSIZE * sizeof(int),
                  MPI_INT, MPI_INT, "native", MPI_INFO_NULL);
MPI_File_write(thefile, buf, BUFSIZE, MPI_INT, MPI_STATUS_IGNORE);
MPI_File_close(&thefile);

Ways to Write to a Shared File

  • MPI_File_seek (then read/write): like Unix seek
  • MPI_File_read_at / MPI_File_write_at: combine seek and I/O in one call, good for thread safety
  • MPI_File_read_shared / MPI_File_write_shared: use the shared file pointer; good when the write order doesn't matter
  • Collective operations (covered on the next slides)
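A hedged fragment of the shared-file-pointer variant (assumes <stdio.h> is included, fh is already open on MPI_COMM_WORLD, and rank holds the caller's rank; the relative ordering of ranks in the file is unspecified):

/* Sketch: each rank appends its record wherever the shared file
   pointer currently stands; rank order in the file is not guaranteed. */
char line[64];
int len = snprintf(line, sizeof(line), "hello from rank %d\n", rank);
MPI_File_write_shared(fh, line, len, MPI_CHAR, MPI_STATUS_IGNORE);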

Collective I/O in MPI
  • A critical optimization in parallel I/O
  • Allows communication of “big picture” to file system
  • Framework for 2-phase I/O, in which communication precedes I/O (can use MPI machinery)
  • Basic idea: build large blocks, so that reads/writes in I/O system will be large

[Figure: many small individual requests combined into one large collective access]

Collective I/O
  • MPI_File_read_all, MPI_File_read_at_all, etc
  • _all indicates that all processes in the group specified by the communicator passed to MPI_File_open will call this function
  • Each process specifies only its own access information -- the argument list is the same as for the non-collective functions
Collective I/O, cont.
  • By calling the collective I/O functions, the user allows an implementation to optimize the request based on the combined request of all processes
  • The implementation can merge the requests of different processes and service the merged request efficiently
  • Particularly effective when the accesses of different processes are noncontiguous and interleaved
Collective Non-contiguous MPI-IO Example

#include <stdio.h>
#include <stdlib.h>
#include "mpi.h"

#define FILESIZE     1048576
#define INTS_PER_BLK 16

int main(int argc, char **argv)
{
    int *buf, rank, nprocs, nints, bufsize;
    MPI_File fh;
    MPI_Datatype filetype;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    bufsize = FILESIZE/nprocs;
    buf = (int *) malloc(bufsize);
    nints = bufsize/sizeof(int);

    MPI_File_open(MPI_COMM_WORLD, "filename", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* filetype: blocks of INTS_PER_BLK ints, strided so the blocks of
       all nprocs ranks interleave in the file */
    MPI_Type_vector(nints/INTS_PER_BLK, INTS_PER_BLK,
                    INTS_PER_BLK*nprocs, MPI_INT, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_set_view(fh, INTS_PER_BLK*sizeof(int)*rank, MPI_INT,
                      filetype, "native", MPI_INFO_NULL);
    MPI_File_read_all(fh, buf, nints, MPI_INT, MPI_STATUS_IGNORE);

    MPI_Type_free(&filetype);
    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}

More on MPI_File_read_all
  • Note that the _all version has the same argument list
  • Difference: all processes that opened the file (in MPI_File_open) must call the read
  • Contrast with the non-collective version, where any subset may or may not call it
  • Allows for many optimizations
Split Collective I/O
  • A restricted form of nonblocking collective I/O
  • Only one active nonblocking collective operation allowed at a time on a file handle
  • Therefore, no request object necessary

MPI_File_write_all_begin(fh, buf, count, datatype);
/* available on Blue Gene/L, but may not improve performance */

for (i = 0; i < 1000; i++) {
    /* perform computation, overlapped with the write */
}

MPI_File_write_all_end(fh, buf, &status);

Passing Hints to the Implementation

MPI_File fh;
MPI_Info info;

MPI_Info_create(&info);

/* number of I/O devices to be used for file striping */
MPI_Info_set(info, "striping_factor", "4");

/* the striping unit in bytes */
MPI_Info_set(info, "striping_unit", "65536");

MPI_File_open(MPI_COMM_WORLD, "/pfs/datafile",
              MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
MPI_Info_free(&info);

Examples of Hints (used in ROMIO)
  • MPI-2 predefined hints:
    • striping_unit
    • striping_factor
    • cb_buffer_size
    • cb_nodes
  • New algorithm parameters:
    • ind_rd_buffer_size
    • ind_wr_buffer_size
  • Platform-specific hints:
    • start_iodevice
    • pfs_svr_buf
    • direct_read
    • direct_write

I/O Consistency Semantics
  • The consistency semantics specify the results when multiple processes access a common file and one or more processes write to the file
  • MPI guarantees stronger consistency semantics if the communicator used to open the file accurately specifies all the processes that are accessing the file, and weaker semantics if not
  • The user can take steps to ensure consistency when MPI does not automatically do so
Example 1

Process 0:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=0, cnt=100)
  MPI_File_read_at(off=0, cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=100, cnt=100)
  MPI_File_read_at(off=100, cnt=100)
  • File opened with MPI_COMM_WORLD. Each process writes to a separate region of the file and reads back only what it wrote.
  • MPI guarantees that the data will be read correctly
Example 2
  • Same as example 1, except that each process wants to read what the other process wrote (overlapping accesses)
  • In this case, MPI does not guarantee that the data will automatically be read correctly

/* incorrect program */

Process 0:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=0, cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=100, cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=100, cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=0, cnt=100)

  • In the above program, the read on each process is not guaranteed to get the data written by the other process!
Example 2, contd.
  • The user must take extra steps to ensure correctness
  • There are three choices:
    • set atomicity to true
    • close the file and reopen it
    • ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process/MPI rank
      • Can hurt performance….
Example 2, Option 1: Set Atomicity to True

Process 0:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_set_atomicity(fh1, 1)
  MPI_File_write_at(off=0, cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=100, cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_set_atomicity(fh2, 1)
  MPI_File_write_at(off=100, cnt=100)
  MPI_Barrier
  MPI_File_read_at(off=0, cnt=100)
Example 2, Option 2: Close and Reopen the File

Process 0:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=0, cnt=100)
  MPI_File_close
  MPI_Barrier
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_read_at(off=100, cnt=100)

Process 1:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=100, cnt=100)
  MPI_File_close
  MPI_Barrier
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_read_at(off=0, cnt=100)

Example 2, Option 3
  • Ensure that no write sequence on any process is concurrent with any sequence (read or write) on another process
  • a sequence is a set of operations between any pair of open, close, or file_sync functions
  • a write sequence is a sequence in which any of the functions is a write operation
Example 2, Option 3 (cont.)

Process 0:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_write_at(off=0, cnt=100)
  MPI_File_sync
  MPI_Barrier
  MPI_File_sync   /* collective */
  MPI_File_sync   /* collective */
  MPI_Barrier
  MPI_File_sync
  MPI_File_read_at(off=100, cnt=100)
  MPI_File_close

Process 1:
  MPI_File_open(MPI_COMM_WORLD, ...)
  MPI_File_sync   /* collective */
  MPI_Barrier
  MPI_File_sync
  MPI_File_write_at(off=100, cnt=100)
  MPI_File_sync
  MPI_Barrier
  MPI_File_sync   /* collective */
  MPI_File_read_at(off=0, cnt=100)
  MPI_File_close

Each sync-barrier-sync construction closes one rank's write sequence before the other rank's sequence begins, so no write sequence is concurrent with any other sequence; the extra syncs appear because MPI_File_sync is collective and must be called by both ranks.
General Guidelines for Achieving High I/O Performance
  • Buy sufficient I/O hardware for the machine
  • Use fast file systems, not NFS-mounted home directories
  • Do not perform I/O from one process only
  • Make large requests wherever possible
  • For noncontiguous requests, use derived datatypes and a single collective I/O call
Optimizations
  • Given complete access information, an implementation can perform optimizations such as:
    • Data Sieving: Read large chunks and extract what is really needed
    • Collective I/O: Merge requests of different processes into larger requests
    • Improved prefetching and caching
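As a hedged, filesystem-level sketch of the data-sieving idea (the real optimization lives inside the MPI-IO implementation, e.g. ROMIO; the block and stride sizes here are illustrative assumptions):

#include <stdio.h>
#include <string.h>

/* Sketch of data sieving: instead of many small reads, read the whole
   covering extent once and extract the strided pieces from memory. */
int main(void)
{
    FILE *f = fopen("datafile", "rb");
    if (!f) return 1;

    char big[4096];                  /* one large read covers all pieces */
    size_t got = fread(big, 1, sizeof(big), f);

    char wanted[8][64];              /* 8 pieces of 64 bytes, stride 512 */
    for (int i = 0; i < 8; i++)
        if ((size_t)(i*512 + 64) <= got)
            memcpy(wanted[i], big + i*512, 64);   /* sieve out the data */

    fclose(f);
    return 0;
}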
Summary
  • MPI-IO has many features that can help users achieve high performance
  • The most important of these features are the ability to specify noncontiguous accesses, the collective I/O functions, and the ability to pass hints to the implementation
  • Users must use the above features!
  • In particular, when accesses are noncontiguous, users must create derived datatypes, define file views, and use the collective I/O functions