Profile-Guided I/O Partitioning. Yijian Wang David Kaeli Electrical and Computer Engineering Department Northeastern University {yiwang, kaeli}@ece.neu.edu. Outline. Introduction Related work Profile-guided I/O partitioning Benchmarks Experimental results Conclusions and future work.

slide1

Profile-Guided I/O Partitioning

Yijian Wang

David Kaeli

Electrical and Computer Engineering Department

Northeastern University

{yiwang, kaeli}@ece.neu.edu

slide2

Outline

  • Introduction
  • Related work
  • Profile-guided I/O partitioning
  • Benchmarks
  • Experimental results
  • Conclusions and future work
slide3

Introduction

  • The I/O bottleneck
    • The growing gap between the speed of processors and I/O devices
    • Some applications access disks very frequently
  • I/O intensive applications
    • Multimedia applications
    • Database applications
    • Parallel scientific applications
slide4

Related work

  • Fast disks
    • FC-connected SCSI disks
    • Smart caching I/O controller (EMC, IO Integrity)
  • Parallel I/O
    • Parallel disks (e.g., RAID)
    • Parallel file systems (NFS, PIOF, HPS, etc.)
    • Runtime parallel systems (MPI-IO, ROMIO, ADIO)
    • Compiler technology
      • (Loop tiling, compiler-directed collective I/O)
    • To achieve high performance, I/O should be parallelized at multiple levels (application, file system, disks)
slide5

I/O Partitioning

  • Our target applications are parallel scientific codes running on Beowulf clusters
  • I/O is parallelized at both the application level (using MPI and MPI-IO) and the disk level (using file partitioning)
  • Ideally, every process will only access files on local disk (though this is typically not possible due to data sharing)
  • How do we recognize the access patterns?
    • dynamically (profiling)
    • statically (compiler)
slide6

Profile generation

Run the application

Capture I/O traces

Apply our partitioning algorithm

Rerun the tuned application

slide7

I/O traces and partitioning

  • For every process, for every contiguous file access, we capture the following I/O profile information:
    • Process ID
    • File ID
    • Address
    • Chunk size
    • I/O operation (read/write)
    • Timestamp
  • Generate a partition for every process
  • Finding an optimal partitioning is NP-complete, so we use a greedy heuristic
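
The profile fields listed above could be held in a small record type. The following is a minimal sketch, not the authors' trace format: the names `TraceRecord`, `Op`, and `parse_record`, and the whitespace-separated text layout, are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Op(Enum):
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class TraceRecord:
    """One contiguous file access made by one process (hypothetical layout)."""
    process_id: int
    file_id: int
    address: int      # byte offset of the access within the file
    chunk_size: int   # number of bytes accessed
    op: Op
    timestamp: float  # time of the access, in seconds


def parse_record(line: str) -> TraceRecord:
    """Parse 'pid fid address size op timestamp' (an assumed text format)."""
    pid, fid, addr, size, op, ts = line.split()
    return TraceRecord(int(pid), int(fid), int(addr), int(size), Op(op), float(ts))
```

For example, `parse_record("3 0 4096 2040 read 12.5")` yields a record for process 3 reading 2040 bytes at offset 4096 of file 0.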
slide8

Our Greedy Algorithm

For each MPI-IO process
    create a file partition;

For each contiguous data chunk
    identify the process that most frequently accesses this chunk;
    assign the chunk to the associated partition;

For each partition
    reorder data in the partition based on first access to each chunk;
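
The three loops above can be sketched in Python as follows. This is an illustrative reading of the slide, not the authors' implementation. It assumes the trace has already been reduced to `(process_id, chunk_id, timestamp)` tuples, that ties in access frequency go to the lowest process ID, and that "first access" means the first access by the process that owns the partition. The function name `greedy_partition` is hypothetical.

```python
from collections import Counter, defaultdict


def greedy_partition(trace, num_procs):
    """Assign each chunk to the process that accesses it most often.

    trace: iterable of (process_id, chunk_id, timestamp) tuples (assumed shape).
    Returns {process_id: [chunk_id, ...]}, each list ordered by first access.
    """
    counts = defaultdict(Counter)  # chunk -> {process: number of accesses}
    first_access = {}              # (process, chunk) -> earliest timestamp
    for pid, chunk, ts in trace:
        counts[chunk][pid] += 1
        key = (pid, chunk)
        if key not in first_access or ts < first_access[key]:
            first_access[key] = ts

    # Step 1: create one (initially empty) partition per process.
    partitions = {pid: [] for pid in range(num_procs)}

    # Step 2: give each chunk to its most frequent accessor.
    # Tie-breaking by lowest process ID is an assumption of this sketch.
    for chunk, per_proc in counts.items():
        owner = min(per_proc, key=lambda p: (-per_proc[p], p))
        partitions[owner].append(chunk)

    # Step 3: order each partition by the owner's first access to each chunk.
    for pid, chunks in partitions.items():
        chunks.sort(key=lambda c: first_access[(pid, c)])
    return partitions
```

Ordering each partition by first access is what lets a process read its local file mostly sequentially, which is one plausible source of the reduced seek time reported in the conclusions.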

Benchmarks

  • NAS Parallel Benchmarks (NPB 2.4)/BT
    • Computational fluid dynamics
    • Generates a file (~1.6 GB) dynamically and then reads it
    • Writes/reads sequentially in chunks of 2040 bytes
  • SPEChpc96/seismic
    • Seismic processing
    • Generates a file (~1.5 GB) dynamically and then reads it back
    • Writes sequential chunks of 96 KB and reads sequential chunks of 2 KB
  • mpi-tile-io
    • Parallel Benchmarking Consortium
    • Tile access to a two-dimensional matrix (~1 GB) with overlap
    • Writes/reads sequential chunks of 32 KB, with 2 KB of overlap
  • All applications use MPI and MPI-IO for computation, communication and I/O
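
To make the overlap in the tiled benchmark concrete, here is a simplified one-dimensional sketch. It is not taken from mpi-tile-io itself: the helper names `tile_ranges` and `shared_bytes` are hypothetical, and the model assumes a single row of tiles where each 32 KB tile shares its last 2 KB with the next tile.

```python
KB = 1024


def tile_ranges(num_tiles, tile_size=32 * KB, overlap=2 * KB):
    """Return (start, end) byte ranges, end exclusive, for a 1-D row of tiles."""
    stride = tile_size - overlap  # distance between consecutive tile starts
    return [(i * stride, i * stride + tile_size) for i in range(num_tiles)]


def shared_bytes(ranges):
    """Bytes that each tile shares with its right-hand neighbour."""
    return [
        max(0, left_end - right_start)
        for (_, left_end), (right_start, _) in zip(ranges, ranges[1:])
    ]
```

In this model the shared regions are the bytes that two processes both access, so they cannot be local to both; the partitioner has to place each of them in exactly one partition.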
slide16

Conclusions and future work

  • We obtain scalable speedup due to:
    • creating parallel I/O channels
    • reducing disk seek time
    • reducing communication overhead
  • I/O access patterns are generally independent of data values, for the applications studied
  • Future work: investigate static (compile-time) approaches to I/O partitioning
Northeastern University Computer Architecture Research Group

http://www.ece.neu.edu/groups/nucar

This project is supported by the NSF-funded Center for Subsurface Sensing and Imaging Systems (CenSSIS)