This study benchmarks netCDF4 performance against netCDF3 across storage layouts (contiguous and chunked), system cache settings, and access patterns. It examines how the HDF5 chunk cache size, hyperslab selections, compression, and chunk size affect performance, showing where netCDF4 can do better and which pitfalls to avoid.
Part I • Is the performance of netCDF4 comparable with that of netCDF3?
Configurations • Dataset • 40 MB: 6 files • 1 MB: 6 files • Storage Layout • Contiguous • Chunked (HDF5 default cache size: 1 MB) • Chunked (HDF5 cache size: 64 MB) • System Cache
System Cache • On • Use all caches and buffers provided by the kernel • Drop • “drop_caches” so that data is read from disk • “fsync” so that data is written to disk
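The two "Drop" steps can be sketched in Python as follows. This is a minimal sketch, assuming a Linux system; the helper names `write_and_sync` and `drop_page_cache` are hypothetical and are not taken from the benchmark itself, and writing to `/proc/sys/vm/drop_caches` requires root privileges.

```python
import os

def write_and_sync(path, data):
    """Write bytes and force them to disk with fsync before returning."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()              # push Python's buffer to the OS
        os.fsync(f.fileno())   # ask the kernel to push the file to disk
    return os.path.getsize(path)

def drop_page_cache():
    """Linux only, needs root: flush dirty pages, then drop clean caches.

    Hypothetical helper; call it before a timed read so the data
    has to come from disk rather than from the kernel page cache.
    """
    os.sync()
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")  # 3 = page cache plus dentries and inodes
```

A timed write would call `write_and_sync` inside the timer, and a timed read would call `drop_page_cache` just before the timer starts.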
Default Hyperslab • One big hyperslab is selected
H5Pset_alloc_time(EARLY)
Part II • Can I get better performance with netCDF4? If yes, under what circumstances can I get better performance?
Non-contiguous Access • Logical layout for 2-dimensional arrays (figure dimension labels: 256, 256, 16, 16384, 240, 1)
Non-contiguous Access • Physical layout • Chunk size [4096][1] • Chunk size [8192][1] • Chunk size [16384][1]
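To see why chunk shape matters for a non-contiguous (column-wise) read, the sketch below counts how many chunks a hyperslab touches. The 16384 x 256 array shape and the single-column selection are assumptions chosen for illustration, and `chunks_touched` is a hypothetical helper, not an HDF5 or netCDF call.

```python
def chunks_touched(start, count, chunk_shape):
    """Number of chunks intersected by a contiguous (stride 1) hyperslab."""
    total = 1
    for s, c, k in zip(start, count, chunk_shape):
        first = s // k               # index of first chunk along this axis
        last = (s + c - 1) // k      # index of last chunk along this axis
        total *= last - first + 1
    return total

# Assumed example: read one full column of a 16384 x 256 array.
start, count = (0, 0), (16384, 1)
for chunk in [(4096, 1), (8192, 1), (16384, 1), (1, 256)]:
    print(chunk, chunks_touched(start, count, chunk))
```

Under these assumptions the column-shaped chunks need 4, 2, and 1 chunk reads respectively, while a row-shaped chunk of [1][256] would touch 16384 chunks for the same selection.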
13. Compression • Compression ratio
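Compression ratio is taken here to mean uncompressed size divided by compressed size. The sketch below measures it with Python's `zlib` module, which implements the same deflate algorithm that netCDF4/HDF5 compression is based on; the sample data and the helper name `compression_ratio` are assumptions for illustration, not the benchmark's data.

```python
import zlib

def compression_ratio(raw, level=6):
    """Return uncompressed size divided by deflate-compressed size."""
    compressed = zlib.compress(raw, level)
    return len(raw) / len(compressed)

# Assumed sample: 100 000 zero bytes, which deflate compresses very well.
constant = bytes(100_000)
print(round(compression_ratio(constant), 1))
```

Real scientific data will usually compress far less than this constant buffer, so the ratio should be measured on representative files.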
Part III • Can netCDF4 performance be bad? How can I avoid the bad performance?
14. Chunk size • A chunk size that is too small is bad • A chunk size slightly smaller than ⌈(number of elements) / N⌉ is bad
14. Chunk size • Dataset: 3162 × 3162 • Chunk (figure labels): 791, 790
14. Chunk size (more) • ⌈3162/n⌉ + 1 • ⌈3162/n⌉ • ⌈3162/n⌉ - 1
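The arithmetic behind this pitfall can be worked through for n = 4, where ⌈3162/4⌉ = 791. The sketch below assumes square chunks and assumes that every chunk, including partial edge chunks, is stored at full size; `chunk_overhead` is a hypothetical helper.

```python
import math

def chunk_overhead(dim, chunk):
    """For a square dim x dim dataset with square chunk x chunk chunks,
    return (chunks per dimension, total chunks, allocated / actual elements)."""
    per_dim = math.ceil(dim / chunk)
    allocated = (per_dim * chunk) ** 2   # full-size chunks, edges included
    return per_dim, per_dim ** 2, allocated / dim ** 2

for chunk in (792, 791, 790):   # ceil(3162/4) + 1, ceil(3162/4), ceil(3162/4) - 1
    per_dim, total, ratio = chunk_overhead(3162, chunk)
    print(chunk, per_dim, total, ratio)
```

With a chunk side of 791 or 792, four chunks cover each dimension (16 chunks in total) with almost no padding. With 790, 4 × 790 = 3160 falls two elements short of 3162, so a fifth, nearly empty chunk is needed per dimension: 25 chunks in total and about 56% more allocated elements than the dataset holds.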
15. Many Hyperslab selections • H5Pcreate() • H5Dopen()
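The slides do not say how the selections were issued, but one possible way to reduce the number of hyperslab selections is to merge adjacent ranges before reading. The sketch below does this for 1-D (start, count) pairs; `coalesce` is a hypothetical helper and is not part of the netCDF or HDF5 API.

```python
def coalesce(selections):
    """Merge adjacent or overlapping 1-D (start, count) selections."""
    merged = []
    for start, count in sorted(selections):
        if merged and start <= merged[-1][0] + merged[-1][1]:
            # Touches or overlaps the previous range: extend it.
            prev_start, prev_count = merged[-1]
            end = max(prev_start + prev_count, start + count)
            merged[-1] = (prev_start, end - prev_start)
        else:
            merged.append((start, count))
    return merged

many = [(i, 1) for i in range(1000)]   # 1000 one-element selections
print(coalesce(many))                  # prints [(0, 1000)]
```

Each merged pair can then be issued as a single read, so 1000 one-element selections become one selection of 1000 elements.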
Conclusion • The performance of netCDF4 is comparable with that of netCDF3 • Improvements • Non-contiguous access pattern • Adjusted cache size • Compression • Pitfalls • Small chunk size • Many small hyperslab selections