
Multi-threading in HDF5: Paths Forward




  1. Multi-threading in HDF5: Paths Forward
  Current implementation - Future directions
  HDF5 Workshop at PSI

  2. Outline
  • Introduction
  • Current implementation
  • Paths forward:
    • Improve concurrency
    • Reduce latency
  • Conclusions and recommendations

  3. Introduction

  4. Introduction
  • HDF5 design principles:
    • Flexibility
    • Adaptability to new computational environments
  • Current challenges:
    • Multi-threaded applications run on multi-core systems
    • The thread-safe HDF5 library cannot support the concurrency built into such applications

  5. Current implementation

  6. Current Implementation
  • HDF5 uses a single global semaphore
  • It controls modification of memory and file data structures:
    • One thread at a time enters the library
    • An application thread enters an HDF5 API routine and acquires the semaphore
    • Other threads are blocked until that thread completes the API call and releases the semaphore
    • No simultaneous modifications of data structures that could cause file corruption
    • No race conditions when several threads try to modify a memory data structure

  7. Current Implementation
  • Pros:
    • The current implementation provides the thread-safety needed to avoid corruption of data structures
  • Cons:
    • No concurrent use of the HDF5 library by multi-threaded applications

  8. Paths forward

  9. Improving Concurrency
  • Replace the single global semaphore with semaphores that guard individual data structures
  • Pros:
    • Greater level of concurrency
    • No corruption of internal data structures
    • Each thread waits only when it needs to modify a data structure locked by another thread
    • Reduces waiting time for a resource to become available

  10. Improving Concurrency
  • Cons:
    • Replacing the global semaphore with individual semaphores, locks, etc. requires careful analysis of HDF5 data structures and their interactions
    • The 300K lines of C code in the library would require 4-6 FTE-years of knowledgeable staff
    • Significant future maintenance effort
    • Testing challenges

  11. Reducing Latency
  • Reduce the time each thread waits to acquire the global semaphore
  • Reduce that time by removing known HDF5 bottlenecks:
    • I/O performance
    • "Compute-bound" (CB) operations:
      • Datatype conversions
      • Compression and other filters
    • General overhead
      • E.g., structures for storing and accessing chunked datasets and metadata

  12. Reducing Latency: I/O Performance
  • Support asynchronous I/O (AIO) access to data in an HDF5 file
  • AIO is initiated within the library in response to an API call
  • It completes in the background after the API call has returned
  • The global semaphore is released when the API call returns - less waiting time

  13. Reducing Latency: CB Operations
  • Use multiple threads within the HDF5 library to:
    • Perform datatype conversion
    • Perform compression on one chunk
      • Multiple threads work on one chunk
    • Perform compression on many chunks
      • Each thread works on a chunk

  14. Reducing Latency: General Optimizations
  • Traditional optimizations; some examples:
    • Algorithm improvements for handling:
      • Chunk cache
      • Hyperslab selections
      • Memory usage
    • Data structure improvements:
      • Chunk indices with O(1) lookup speed
      • Advanced B-tree implementations

  15. Reducing Latency
  • Pros:
    • Smaller development effort, ~1.5 FTE-years
    • Localized changes to the library
    • Easier to maintain
    • Incremental improvements
  • Cons:
    • Still uses the global semaphore

  16. Recommendations

  17. Decision
  • Reduce latency
  • Decision factors:
    • Available expertise
    • Cost
    • Already-funded features:
      • AIO
      • Using multiple threads to compress a chunk
    • Future maintainability

  18. Other Considerations
  • The approaches are not mutually exclusive
  • Both can be implemented in the future if funding is available

  19. Thank You! Questions?
