
Yijian Wang David Kaeli Electrical and Computer Engineering Department Northeastern University

Profile-Guided I/O Partitioning. Yijian Wang David Kaeli Electrical and Computer Engineering Department Northeastern University {yiwang, kaeli}@ece.neu.edu. Outline. Introduction Related work Profile-guided I/O partitioning Benchmarks Experimental results Conclusions and future work.


Presentation Transcript


  1. Profile-Guided I/O Partitioning Yijian Wang David Kaeli Electrical and Computer Engineering Department Northeastern University {yiwang, kaeli}@ece.neu.edu

  2. Outline • Introduction • Related work • Profile-guided I/O partitioning • Benchmarks • Experimental results • Conclusions and future work

  3. Introduction • The I/O bottleneck • The growing gap between the speed of processors and I/O devices • Some applications access disks very frequently • I/O intensive applications • Multimedia applications • Database applications • Parallel scientific applications

  4. Related work • Fast disks • FC-connected SCSI disks • Smart caching I/O controller (EMC, IO Integrity) • Parallel I/O • Parallel disks (e.g., RAID) • Parallel file systems (NFS, PIOF, HPS, etc.) • Runtime parallel systems (MPI-IO, ROMIO, ADIO) • Compiler technology (loop tiling, compiler-directed collective I/O) • To achieve high performance, I/O should be parallelized at multiple levels (application, file system, disks)

  5. I/O Partitioning • Our target applications are parallel scientific codes running on Beowulf clusters • I/O is parallelized at both the application level (using MPI and MPI-IO) and the disk level (using file partitioning) • Ideally, every process accesses only files on its local disk (though this is typically not possible due to data sharing) • How do we recognize the access patterns? • dynamically (profiling) • statically (compiler)

  6. Profile generation • Run the application • Capture I/O traces • Apply our partitioning algorithm • Rerun the tuned application

  7. I/O traces and partitioning • For every process, for every contiguous file access, we capture the following I/O profile information: • Process ID • File ID • Address • Chunk size • I/O operation (read/write) • Timestamp • Generate a partition for every process • Partitioning is NP-complete
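The profile fields listed on this slide can be pictured as one record per contiguous file access. The sketch below is illustrative only: the class name `IOTraceRecord`, the field types, the reading of "Address" as a byte offset, and the helper `bytes_per_process` are assumptions, not the authors' trace format or tooling.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IOTraceRecord:
    """One contiguous file access (hypothetical layout of the slide's fields)."""
    process_id: int
    file_id: int
    address: int      # assumed to be a byte offset into the file
    chunk_size: int   # bytes transferred by this access
    operation: str    # "read" or "write"
    timestamp: float  # assumed to be seconds since the start of the run


def bytes_per_process(records):
    """Sum the bytes read and written by each process in the trace."""
    totals = defaultdict(lambda: {"read": 0, "write": 0})
    for rec in records:
        totals[rec.process_id][rec.operation] += rec.chunk_size
    return {pid: dict(t) for pid, t in totals.items()}


# Made-up trace: two processes writing 2040-byte chunks, one reading back.
trace = [
    IOTraceRecord(0, 1, 0, 2040, "write", 0.10),
    IOTraceRecord(1, 1, 2040, 2040, "write", 0.11),
    IOTraceRecord(0, 1, 0, 2040, "read", 0.50),
]
summary = bytes_per_process(trace)
```

A summary like this is only a sanity check on the captured trace; the partitioning step needs the per-chunk records themselves.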

  8. Our Greedy Algorithm
     For each MPI-IO process:
         create a file partition;
     For each contiguous data chunk:
         identify the process that most frequently accesses this chunk;
         assign the chunk to the associated partition;
     For each partition:
         reorder data in the partition based on first access to each chunk;
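The three steps of the greedy algorithm can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the trace is reduced to hypothetical `(process_id, chunk_id, timestamp)` tuples, ties in access frequency go to the lowest process ID, and "first access" is taken to mean the owning process's first access to the chunk. The slide specifies none of these details.

```python
from collections import Counter, defaultdict


def greedy_partition(trace, num_processes):
    """Assign each chunk to the process that accesses it most often.

    trace: iterable of (process_id, chunk_id, timestamp) tuples (assumed format).
    Returns {process_id: [chunk_id, ...]} ordered by the owner's first access.
    """
    # Step 1: create one (initially empty) partition per process.
    partitions = {pid: [] for pid in range(num_processes)}

    counts = defaultdict(Counter)  # chunk -> process -> number of accesses
    first_access = {}              # (chunk, process) -> earliest timestamp
    for pid, chunk, ts in trace:
        counts[chunk][pid] += 1
        key = (chunk, pid)
        first_access[key] = min(ts, first_access.get(key, ts))

    # Step 2: give each chunk to its most frequent accessor.
    # Tie-break on lowest process ID (an assumption of this sketch).
    for chunk, per_proc in counts.items():
        owner = min(per_proc, key=lambda p: (-per_proc[p], p))
        partitions[owner].append(chunk)

    # Step 3: reorder each partition by the owner's first access to each chunk.
    for pid, chunks in partitions.items():
        chunks.sort(key=lambda c: first_access[(c, pid)])
    return partitions


# Made-up trace with two processes and four chunks.
example_trace = [
    (0, "C", 0.2), (1, "D", 0.5), (0, "A", 1.0), (1, "B", 1.5),
    (1, "A", 2.5), (0, "A", 3.0), (0, "B", 3.5), (1, "B", 4.0),
]
result = greedy_partition(example_trace, num_processes=2)
```

In this example, chunks A and B are shared, so whichever process loses the assignment would still reach its data on a remote disk. That is the data-sharing case the I/O Partitioning slide notes cannot always be avoided.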

  9. Benchmarks • NAS Parallel Benchmarks (NPB 2.4)/BT • Computational fluid dynamics • Generates a file (~1.6 GB) dynamically and then reads it • Writes/reads sequentially in chunks of 2040 bytes • SPEChpc96/seismic • Seismic processing • Generates a file (~1.5 GB) dynamically and then reads it back • Writes sequential chunks of 96 KB and reads sequential chunks of 2 KB • mpi-tile-io • Parallel I/O Benchmarking Consortium • Tile access to a two-dimensional matrix (~1 GB) with overlap • Writes/reads sequential chunks of 32 KB, with 2 KB of overlap • All applications use MPI and MPI-IO for computation, communication and I/O

  10. Conclusions and future work • We obtain scalable speedup due to: • creating parallel I/O channels • reducing disk seek time • reducing communication overhead • I/O access patterns are generally independent of data values, for the applications studied • Investigating static (compile time) approaches to I/O partitioning

  11. Northeastern University Computer Architecture Research Group http://www.ece.neu.edu/groups/nucar This project is supported by the NSF-funded Center for Subsurface Sensing and Imaging Systems (CenSSIS)
