
Parallel and Grid I/O Infrastructure



Presentation Transcript


  1. Parallel and Grid I/O Infrastructure W. Gropp, R. Ross, R. Thakur Argonne National Lab A. Choudhary, W. Liao Northwestern University G. Abdulla, T. Eliassi-Rad Lawrence Livermore National Lab

  2. Outline • Introduction • PVFS and ROMIO • Parallel NetCDF • Query Pattern Analysis Please interrupt at any point for questions! Parallel and Grid I/O Infrastructure

  3. What is this project doing? • Extending existing infrastructure work • PVFS parallel file system • ROMIO MPI-IO implementation • Helping match application I/O needs to underlying capabilities • Parallel NetCDF • Query Pattern Analysis • Linking with Grid I/O resources • PVFS backend for GridFTP striped server • ROMIO on top of Grid I/O API Parallel and Grid I/O Infrastructure

  4. What Are All These Names? • MPI - Message Passing Interface Standard • Also known as MPI-1 • MPI-2 - Extensions to MPI standard • I/O, RDMA, dynamic processes • MPI-IO - I/O part of MPI-2 extensions • ROMIO - Implementation of MPI-IO • Handles mapping MPI-IO calls into communication (MPI) and file I/O • PVFS - Parallel Virtual File System • An implementation of a file system for Linux clusters Parallel and Grid I/O Infrastructure
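To make the names concrete, here is a minimal sketch (not from the slides) of an MPI-IO program in C; the file name and buffer size are illustrative. On a Linux cluster these calls would be serviced by ROMIO, which in turn can store the data in PVFS.

```c
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, i, buf[1000];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    for (i = 0; i < 1000; i++)
        buf[i] = rank;

    /* Every process opens the same file collectively... */
    MPI_File_open(MPI_COMM_WORLD, "testfile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* ...and writes its block at a rank-dependent offset. */
    MPI_File_write_at(fh, (MPI_Offset)rank * 1000 * sizeof(int),
                      buf, 1000, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}
```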

  5. Fitting the Pieces Together • Query Pattern Analysis (QPA) and Parallel NetCDF both written in terms of MPI-IO calls • QPA tools pass information down through MPI-IO hints • Parallel NetCDF written using MPI-IO for data read/write • ROMIO implementation uses PVFS as storage medium on Linux clusters or could hook to Grid I/O resources Parallel and Grid I/O Infrastructure

  6. PVFS and ROMIO • Provide a little background on the two • What they are, an example to set context, status • Motivate the work • Discuss current research and development • I/O interfaces • MPI-IO Hints • PVFS2 Our work on these two is closely tied together. Parallel and Grid I/O Infrastructure

  7. Parallel Virtual File System • Parallel file system for Linux clusters • Global name space • Distributed file data • Builds on TCP, local file systems • Tuned for high performance concurrent access • Mountable like NFS file systems • User-level interface library (used by ROMIO) • 200+ users on mailing list, 100+ downloads/month • Up from 160+ users in March • Installations at OSC, Univ. of Utah, Phillips Petroleum, ANL, Clemson Univ., etc. Parallel and Grid I/O Infrastructure

  8. PVFS Architecture • Client - Server architecture • Two server types • Metadata server (mgr) - keeps track of file metadata (permissions, owner) and directory structure • I/O servers (iod) - orchestrate movement of data between clients and local I/O devices • Clients access PVFS one of two ways • MPI-IO (using ROMIO implementation) • Mount through Linux kernel (loadable module) Parallel and Grid I/O Infrastructure

  9. PVFS Performance • Ohio Supercomputer Center cluster • 16 I/O servers (IA32), 70+ clients (IA64), IDE disks • Block partitioned data, accessed through ROMIO PVFS and ROMIO

  10. ROMIO • Implementation of MPI-2 I/O specification • Operates on wide variety of platforms • Abstract Device Interface for I/O (ADIO) aids in porting to new file systems • Fortran and C bindings • Successes • Adopted by industry (e.g. Compaq, HP, SGI) • Used at ASCI sites (e.g. LANL Blue Mountain) Parallel and Grid I/O Infrastructure

  11. Example of Software Layers • FLASH Astrophysics application stores checkpoints and visualization data using HDF5 • HDF5 in turn uses MPI-IO (ROMIO) to write out its data files • PVFS client library is used by ROMIO to write data to the PVFS file system • PVFS client library interacts with PVFS servers over the network Parallel and Grid I/O Infrastructure
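As a sketch of how an application reaches MPI-IO through HDF5 (the file name is illustrative), the only layer-specific step is attaching an MPI communicator to the HDF5 file access property list; everything below that point is handled by ROMIO and, on a PVFS-mounted cluster, by the PVFS client library.

```c
#include <mpi.h>
#include <hdf5.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Tell HDF5 to use its MPI-IO driver (ROMIO underneath). */
    hid_t fapl = H5Pcreate(H5P_FILE_ACCESS);
    H5Pset_fapl_mpio(fapl, MPI_COMM_WORLD, MPI_INFO_NULL);

    /* Collective file create; dataset writes would follow here. */
    hid_t file = H5Fcreate("checkpoint.h5", H5F_ACC_TRUNC,
                           H5P_DEFAULT, fapl);

    H5Fclose(file);
    H5Pclose(fapl);
    MPI_Finalize();
    return 0;
}
```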

  12. Example of Software Layers (2) • FLASH Astrophysics application stores checkpoints and visualization data using HDF5 • HDF5 in turn uses MPI-IO (IBM) to write out its data files • GPFS File System stores data to disks Parallel and Grid I/O Infrastructure

  13. Status of PVFS and ROMIO • Both are freely available, widely distributed, documented, and supported products • Current work focuses on: • Higher performance through richer file system interfaces • Hint mechanisms for optimizing behavior of both ROMIO and PVFS • Scalability • Fault tolerance Parallel and Grid I/O Infrastructure

  14. Why Does This Work Matter? • Much of I/O on big machines goes through MPI-IO • Direct use of MPI-IO (visualization) • Indirect use through HDF5 or NetCDF (fusion, climate, astrophysics) • Hopefully soon through Parallel NetCDF! • On clusters, PVFS is currently the most deployed parallel file system • Optimizations in these layers are of direct benefit to those users • Providing guidance to vendors for possible future improvements Parallel and Grid I/O Infrastructure

  15. I/O Interfaces • Scientific applications keep structured data sets in memory and in files • For highest performance, the description of the structure must be maintained through software layers • Allow the scientist to describe the data layout in memory and file • Avoid packing into buffers in intermediate layers • Minimize the number of file system operations needed to perform I/O Parallel and Grid I/O Infrastructure
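A minimal sketch of what "describing the layout" looks like at the MPI-IO level: the helper below is hypothetical (its name and parameters, describing a block decomposition of a global 2D array, are assumptions), but it shows a subarray datatype handed to MPI_File_set_view so that no intermediate layer has to pack a contiguous buffer.

```c
#include <mpi.h>

/* Write one process's block of a row-major global 2D array of doubles. */
int write_block(MPI_Comm comm, const char *path, double *local,
                int gsizes[2], int lsizes[2], int starts[2])
{
    MPI_Datatype filetype;
    MPI_File fh;

    /* Describe where this process's block lives in the global array/file. */
    MPI_Type_create_subarray(2, gsizes, lsizes, starts,
                             MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(comm, (char *)path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    /* The structured description travels down with the file view. */
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, local, lsizes[0] * lsizes[1], MPI_DOUBLE,
                       MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    return 0;
}
```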

  16. File System Interfaces • MPI-IO is a great starting point • Most underlying file systems only provide POSIX-like contiguous access • List I/O work was first step in the right direction • Proposed FS interface • Allows movement of lists of data regions in memory and file with one call [Figure: memory regions mapped to file regions] Parallel and Grid I/O Infrastructure
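As a rough illustration only (the function name and signature below are hypothetical, not the PVFS API), a list-style call moves a list of file regions into a list of memory regions in one operation; this sketch emulates that shape on top of POSIX pread().

```c
#define _XOPEN_SOURCE 500   /* for pread() */
#include <unistd.h>
#include <sys/types.h>
#include <stddef.h>

/* Read file_count [offset, length] regions from fd into mem_count memory
   regions, consuming both lists in order. Returns bytes moved, -1 on error. */
ssize_t read_list(int fd,
                  int mem_count, char *mem_addrs[], size_t mem_lens[],
                  int file_count, off_t file_offs[], size_t file_lens[])
{
    ssize_t total = 0;
    size_t mem_used = 0;
    int m = 0;

    for (int f = 0; f < file_count; f++) {
        size_t left = file_lens[f];
        off_t off = file_offs[f];
        while (left > 0 && m < mem_count) {
            size_t space = mem_lens[m] - mem_used;
            size_t n = left < space ? left : space;
            ssize_t got = pread(fd, mem_addrs[m] + mem_used, n, off);
            if (got < 0)
                return -1;
            total += got;
            off += got;
            left -= (size_t)got;
            mem_used += (size_t)got;
            if (mem_used == mem_lens[m]) { m++; mem_used = 0; }
            if ((size_t)got < n)
                return total;   /* short read (end of file) */
        }
    }
    return total;
}
```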

  17. List I/O • Implemented in PVFS • Transparent to user through ROMIO • Distributed in latest releases Parallel and Grid I/O Infrastructure

  18. List I/O Example • Simple datatype repeated over file • Desire to read first 9 bytes • This is converted into four [offset, length] pairs: file offsets {0, 2, 6, 10} with file lengths {1, 3, 3, 2} • One can see how this process could result in a very large list of offsets and lengths [Figure: flattening a file datatype built from one-byte elements, tiled three times over 12 file bytes] Parallel and Grid I/O Infrastructure
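A small standalone sketch of the flattening step described above: the block displacements and lengths come from the index datatype on the next slide, the 9-byte request and the coalescing of contiguous pieces are as stated here, and the program reproduces the four [offset, length] pairs.

```c
#include <stdio.h>

int main(void)
{
    /* One 1-byte block at displacement 0 and one 2-byte block at
       displacement 2, in a datatype with a 4-byte extent. */
    const long block_disp[2] = {0, 2};
    const long block_len[2]  = {1, 2};
    const long extent = 4;

    long offs[16], lens[16];
    long wanted = 9;           /* read the first 9 bytes of data */
    int nregions = 0;

    /* Tile the datatype over the file, clip at 9 bytes of data,
       and coalesce adjacent file regions. */
    for (long inst = 0; wanted > 0; inst++) {
        for (int b = 0; b < 2 && wanted > 0; b++) {
            long off = inst * extent + block_disp[b];
            long len = block_len[b] < wanted ? block_len[b] : wanted;
            if (nregions > 0 && offs[nregions - 1] + lens[nregions - 1] == off)
                lens[nregions - 1] += len;
            else {
                offs[nregions] = off;
                lens[nregions] = len;
                nregions++;
            }
            wanted -= len;
        }
    }
    /* Prints the slide's result: [0,1] [2,3] [6,3] [10,2]. */
    for (int i = 0; i < nregions; i++)
        printf("[offset %ld, length %ld]\n", offs[i], lens[i]);
    return 0;
}
```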

  19. Describing Regular Patterns • List I/O can’t describe regular patterns (e.g. a column of a 2D matrix) in an efficient manner • MPI datatypes can do this easily • Datatype I/O is our solution to this problem • Concise set of datatype constructors used to describe types • API for passing these descriptions to a file system Parallel and Grid I/O Infrastructure
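As a sketch of the contrast (the matrix size n is an illustrative parameter): a single MPI vector type describes one column of an n x n row-major matrix, whereas list I/O would need n separate [offset, length] pairs for the same access.

```c
#include <mpi.h>

/* Datatype selecting one column of an n x n row-major matrix of doubles. */
MPI_Datatype make_column_type(int n)
{
    MPI_Datatype column;

    /* n blocks of 1 double each, separated by a stride of n doubles. */
    MPI_Type_vector(n, 1, n, MPI_DOUBLE, &column);
    MPI_Type_commit(&column);
    return column;
}
```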

  20. Datatype I/O • Built using a generic datatype processing component (also used in MPICH2) • Optimizing for performance • Prototype for PVFS in progress • API and server support • Prototype of support in ROMIO in progress • Maps MPI datatypes to PVFS datatypes • Passes through new API • This same generic datatype component could be used in other projects as well Parallel and Grid I/O Infrastructure

  21. Datatype I/O Example • Same datatype as in previous example • Describe datatype with one construct: • index {(0,1), (2,2)} describes pattern of one short block and one longer one • automatically tiled (as with MPI types for files) • Linear relationship between # of contiguous pieces and size of request is removed [Figure: the same one-byte-element datatype tiled over 12 file bytes] Parallel and Grid I/O Infrastructure
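For reference, the same index {(0,1), (2,2)} pattern written as an MPI indexed type in C; when used as a file view (or handed to datatype I/O), it is tiled over the file exactly as the slide describes.

```c
#include <mpi.h>

/* One 1-byte block at displacement 0 and one 2-byte block at
   displacement 2; tiled repeatedly when set as a file view. */
MPI_Datatype make_pattern_type(void)
{
    int blocklens[2] = {1, 2};
    int displs[2]    = {0, 2};
    MPI_Datatype pattern;

    MPI_Type_indexed(2, blocklens, displs, MPI_BYTE, &pattern);
    MPI_Type_commit(&pattern);
    return pattern;
}
```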

  22. MPI Hints for Performance • ROMIO has a number of performance optimizations built in • The optimizations are somewhat general, but there are tuning parameters that are very specific • buffer sizes • number and location of processes to perform I/O • data sieving and two-phase techniques • Hints may be used to tune ROMIO to match the system Parallel and Grid I/O Infrastructure
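A sketch of how such tuning hints are passed at file open (the helper name and the hint values are illustrative and system-dependent; cb_buffer_size and cb_nodes are standard MPI-IO hints, ind_rd_buffer_size is a ROMIO data-sieving hint):

```c
#include <mpi.h>

/* Open a file with an MPI_Info object carrying tuning hints. */
MPI_File open_with_hints(MPI_Comm comm, const char *path)
{
    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "8388608");     /* two-phase buffer size */
    MPI_Info_set(info, "cb_nodes", "16");                /* number of I/O aggregators */
    MPI_Info_set(info, "ind_rd_buffer_size", "4194304"); /* data sieving read buffer */

    MPI_File_open(comm, (char *)path,
                  MPI_MODE_CREATE | MPI_MODE_RDWR, info, &fh);
    MPI_Info_free(&info);
    return fh;
}
```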

  23. ROMIO Hints • Currently all of ROMIO’s optimizations may be controlled with hints • data sieving • two-phase I/O • list I/O • datatype I/O • Additional hints are being considered to allow ROMIO to adapt to access patterns • collective-only I/O • sequential vs. random access • inter-file dependencies Parallel and Grid I/O Infrastructure
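The optimization switches themselves are also plain info keys; a short sketch follows (the hint names are ROMIO's; the particular values chosen here are just examples of the enable/disable/automatic settings it accepts, and the helper assumes the caller has already created the info object):

```c
#include <mpi.h>

/* Add ROMIO optimization-control hints to an existing MPI_Info object. */
void set_romio_opt_hints(MPI_Info info)
{
    MPI_Info_set(info, "romio_cb_read",  "enable");    /* collective buffering on reads  */
    MPI_Info_set(info, "romio_cb_write", "enable");    /* collective buffering on writes */
    MPI_Info_set(info, "romio_ds_read",  "disable");   /* data sieving on reads          */
    MPI_Info_set(info, "romio_ds_write", "automatic"); /* let ROMIO decide for writes    */
}
```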

  24. PVFS2 • PVFS (version 1.x.x) plays an important role as a fast scratch file system for use today • PVFS2 will supersede this version, adding • More comprehensive system management • Fault tolerance through lazy redundancy • Distributed metadata • Component-based approach for supporting new storage and network resources • Distributed metadata and fault tolerance will extend scalability into thousands and tens of thousands of clients and hundreds of servers • PVFS2 implementation is underway Parallel and Grid I/O Infrastructure

  25. Summary • ROMIO and PVFS are a mature foundation on which to make additional improvements • New, rich I/O descriptions allow for higher performance access • Addition of new hints to ROMIO allows for fine-tuning its operation • PVFS2 focuses on the next generation of clusters Parallel and Grid I/O Infrastructure
