1. Improving Parallel I/O Performance with Data Layout Awareness
Yong Chen, Oak Ridge National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Huaiming Song, Illinois Institute of Technology
Hui Jin, Illinois Institute of Technology
Presented by: Robert Latham, Argonne National Laboratory
Cluster-2010

2. I/O Bottleneck in High-Performance Computing
• Significant gap between computing and I/O performance
• Long I/O latency leads to performance degradation
• Applications tend to be data intensive
• Limited I/O is often cited as the cause of low sustained performance

  3. I/O for Computational Science

4. Limitations of Current Parallel I/O Systems
• Historically, parallel file systems and parallel I/O middleware were designed and developed separately
• There is an information gap between the parallel I/O subsystems
• The parallel file system decides the data layout on storage
• The parallel I/O middleware optimizes, groups, and rearranges accesses
• This separation and information gap forfeit optimization opportunities that could benefit overall parallel I/O performance
• For instance, collective I/O relies on the logical layout of file accesses, whereas the physical layout determines access latency and concurrency
• Current parallel I/O systems do not exploit layout awareness well

5. Data Layout and Data Accesses
• The data layout mechanism decides how data are distributed among multiple file servers
• A crucial factor determining data access latency and I/O subsystem performance
• Significant performance improvements have been demonstrated by arranging data properly
• Log-like reordering
• Parallel Log-structured File System (PLFS)
• Optimizing I/O accesses with awareness of the data layout is a necessity
• A challenging and tedious task for users
• Manual rearrangement is limited
• Not scalable for petascale/exascale systems

6. Layout-aware Parallel I/O Strategy
• A parallel I/O strategy with data layout awareness
• Considers physical data layout and data locality in the parallel I/O strategy
• Fosters better integration of the parallel file system and middleware
• Achieves a better-matched I/O
• Contributions of this research
• Demonstrate that data layout has a clear impact on I/O performance
• Propose layout-aware independent I/O and collective I/O strategies that consider physical data layout and data locality
• Verify with a prototype system that the layout-aware parallel I/O strategy achieves performance improvements over existing systems

7. Independent I/O

8. Layout-aware Independent I/O
Exploit data layout to reduce contention and improve data locality

9. Layout-aware Independent I/O
• Considers the layout and improves access locality
• The data layout is revealed via file system calls and cached at the middleware
• Avoids the interruption caused by contention among processes
• Reduces the performance loss due to contention
• Decouples network communication from I/O operations
• Avoids I/O serialization at the file servers
• Reduces imbalanced response times across processes
• The total execution time can be improved even though the response times of individual processes are not perfectly balanced
• Note that the total response time, or time-to-solution, is what users care about for a parallel application

10. Collective I/O and Two-phase Implementation

11. Layout-aware Collective I/O

12. Layout-aware Collective I/O

13. Layout-aware Collective I/O
• Conventional collective I/O: combine noncontiguous accesses and split them in a logically contiguous way
• Layout-aware collective I/O: combine noncontiguous accesses and split them in a logically noncontiguous way, but with better physical locality and reduced data access contention
• Layout-aware collective I/O can be beneficial
• It still performs collective I/O: overlapping and redundant requests are removed
• The number of requests to the parallel file system is controlled by taking advantage of noncontiguous parallel file system calls
• Rearranging and reordering accesses exploits better locality and reduces access contention

14. Experimental Setup
• Experimental environment
• 65-node Sun Fire Linux-based cluster
• Sun Fire X4240 head node: 12x 500 GB 7.2K-RPM SATA-II drives configured as RAID-5
• Sun Fire X2200 compute nodes: 250 GB 7.2K-RPM SATA hard drive
• MPICH2-1.0.5p3 release
• PVFS2 2.8.1
• Benchmarks
• Synthetic user-level checkpointing application
• IOR benchmark

15. Layout-aware Independent I/O on PVFS2
Sustained bandwidth decreased (execution time increased) as the number of processes increased, even though the total image size remained the same, due to contention
Bandwidth improved by 8.36% and 45.7% on average, respectively
Achieved stable performance across various cases

16. Layout-aware Collective I/O Performance
Left: IOR random reads testing
Up to 74% speedup; 40% speedup on average
Right: IOR random writes testing
Up to 38% speedup; 23% speedup on average

17. Layout-aware Collective I/O Performance
Left: IOR interleaved reads testing
Up to 112% speedup; 28% speedup on average
Right: IOR interleaved writes testing
Up to 45% speedup; 16% speedup on average

18. Conclusion
Poor I/O performance has been a bottleneck in HPC
Parallel I/O middleware and parallel file systems are critical
Little has been done to exploit layout-aware optimization and to foster better integration of these two subsystems
We propose a new layout-aware parallel I/O strategy and apply it to both independent I/O and collective I/O
Preliminary results have demonstrated its potential
More research is needed on next-generation I/O architectures to support layout awareness, access awareness, and intelligence

19. Ongoing and Future Work
Application-specific customized data layout strategy
Adapt to a proper data layout depending on the specific access pattern
Continue investigating layout- and access-awareness optimizations

20. Any Questions? Thank you.
Welcome to visit:
http://ft.ornl.gov
http://www.cs.iit.edu/~scs
This research was sponsored in part by

21. Improving Parallel I/O Performance with Data Layout Awareness
Yong Chen, Oak Ridge National Laboratory
Xian-He Sun, Illinois Institute of Technology
Rajeev Thakur, Argonne National Laboratory
Huaiming Song, Illinois Institute of Technology
Hui Jin, Illinois Institute of Technology
Presented by: Robert Latham, Argonne National Laboratory

22. Backup Slides

23. Performance Gap Between Computing and I/O

24. Data Layout Matters

25. Independent I/O: An Ideal Case

26. Dynamic Application-specific I/O Optimization Architecture