
HDF4 and HDF5 Performance Preliminary Results

Elena Pourmal, IV HDF-EOS Workshop, September 19 - 21, 2000.


Presentation Transcript


  1. HDF4 and HDF5 Performance: Preliminary Results • Elena Pourmal • IV HDF-EOS Workshop • September 19 - 21, 2000

  2. Why compare? • HDF5 emerges as a new standard • proved to be robust • most of the planned features have been implemented in HDF5-1.2.2 • has a lot of new features compared to HDF4 • time for performance study and tuning • Users move their data and applications to HDF5 • HDF4 is not “bad,” but has limited capabilities

  3. HDF5 | HDF4
     Files over 2GB | Files less than 2GB
     Unlimited number of objects | Maximum of 20000 objects
     One data model (multidimensional array of structures) | Different data models for SD, GR, RI, Vdatas
     || (parallel) support | N/A
     Thread safe | N/A
     Mounting files | N/A
     Diversity of datatypes (compound, VL, opaque) and operations (create, write, read, delete, shared) | Only predefined datatypes such as float32, int16, char8
     “Native” file is portable | “Native” file is not portable
     Modifiable I/O pipeline (registration of compression methods) | N/A
     Selections (unions and regular blocks) | Selections (simple regular subsampling)

  4. What to compare? (short list of common features) • File I/O operations • plain read and write • hyperslab selections • regular subsampling • access to large number of objects • storage overhead • Data organization in the file and access to it • Vdata vs compound datasets • Chunking, unlimited dimensions, compression

  5. Benchmark Environment • Solaris platform: 440 MHz UltraSPARC-IIi, 1 GB memory, SunOS 5.7, timed with gettimeofday() • Linux platform: 2 x 550 MHz Pentium III Xeon, 1 GB memory, RedHat 6.2, timed with clock() • Each measurement was taken 10 times; average and best times were collected
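The measurement procedure on this slide (10 repetitions, average and best time) can be sketched as a small timing harness. This is only an illustration, not the benchmark code used for the talk: `time_runs`, `wall_seconds`, and `count_call` are hypothetical names, and the sketch uses gettimeofday() as on the Solaris platform. Note that clock(), used on Linux, reports processor time rather than wall-clock time, so the two platforms do not measure exactly the same quantity.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Wall-clock time in seconds, taken from gettimeofday(). */
double wall_seconds(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Hypothetical helper: run op(arg) `runs` times and report the
 * average and the best (smallest) elapsed time in seconds. */
void time_runs(void (*op)(void *), void *arg, int runs,
               double *avg, double *best)
{
    double total = 0.0;
    int i;

    assert(op != NULL && runs > 0 && avg != NULL && best != NULL);
    *best = 0.0;
    for (i = 0; i < runs; i++) {
        double t0 = wall_seconds();
        double dt;

        op(arg);                      /* the operation being measured */
        dt = wall_seconds() - t0;
        total += dt;
        if (i == 0 || dt < *best)
            *best = dt;
    }
    *avg = total / runs;
}

/* Placeholder workload: counts how many times it was called.
 * A real benchmark would write or read a dataset here. */
void count_call(void *arg)
{
    ++*(int *)arg;
}
```

Calling `time_runs(count_call, &calls, 10, &avg, &best)` mirrors the "10 times, average and best" procedure; the best time is the one least disturbed by other system activity.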

  6. Benchmarks • Writing 1Dim and 2Dim datasets of integers • Reading 2Dim contiguous hyperslabs of integers • Reading 2Dim contiguous hyperslabs of integers with subsampling • Reading fixed size hyperslabs of integers from different locations in the dataset • Writing and reading Vdatas and Compound Datasets • CERES data

  7. Writing 1Dim and 2Dim Datasets

  8. Writing 1Dim Datasets • In this test we created one-dimensional arrays of integers with sizes varying from 8 Kbytes to 8000 Kbytes in steps of 8 Kbytes. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. • The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.

  9. Writing 1Dim Datasets • HDF5 performs about 8 times better than HDF4. • System activity affects timing results.

  10. Writing 2Dim Datasets • In this test we created two-dimensional arrays with sizes varying from 40 x 40 bytes to 4000 x 4000 bytes in steps of 40 bytes in each dimension. We measured the average and best times for writing these arrays into HDF4 and HDF5 files. The graphs were plotted by averaging the values obtained for the same total array size, regardless of the shape of the array. • The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
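The averaging step described on this slide (one plotted value per total size, whatever the shape) can be illustrated as follows. This is a sketch under assumptions: `struct write_sample` and `mean_time_for_size` are hypothetical names, and the slides do not show how the timings were actually stored or aggregated.

```c
#include <assert.h>
#include <stddef.h>

/* One timing sample for a rows x cols array of bytes (hypothetical layout). */
struct write_sample {
    int rows;
    int cols;
    double seconds;
};

/* Average the times of all samples whose total size rows * cols equals
 * size_bytes, regardless of shape. Returns the number of matching
 * samples; *mean is written only when at least one sample matches. */
size_t mean_time_for_size(const struct write_sample *s, size_t n,
                          long size_bytes, double *mean)
{
    double total = 0.0;
    size_t matches = 0;
    size_t i;

    assert(s != NULL && mean != NULL);
    for (i = 0; i < n; i++) {
        if ((long)s[i].rows * s[i].cols == size_bytes) {
            total += s[i].seconds;
            matches++;
        }
    }
    if (matches > 0)
        *mean = total / (double)matches;
    return matches;
}
```

With this grouping, a 40 x 80 array and an 80 x 40 array both contribute to the single point plotted for 3200 bytes.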

  11. Writing 2Dim Datasets • HDF4 shows nonlinear growth. • HDF5 performs about 10 times better than HDF4.

  12. Reading 2Dim Contiguous Hyperslabs

  13. Reading Contiguous Hyperslabs • In this test we created a file with a 1000 x 1000 array of integers. We then read hyperslabs of different sizes starting from a fixed position in the array; read times were averaged over 10 runs. The HDF5-1.2.2, HDF5-1.2.2-patched, and HDF5 development libraries were tested. • The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
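To make the term concrete: a contiguous hyperslab is a rectangular block of the array, given by a start position and a count per dimension. The sketch below copies such a block out of a row-major array held in memory. It is not how HDF4 or HDF5 implements the read; `copy_block` is a hypothetical name used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy a count_rows x count_cols block that starts at
 * (start_row, start_col) out of a row-major nrows x ncols array
 * into a dense destination buffer. */
void copy_block(const int *src, size_t nrows, size_t ncols,
                size_t start_row, size_t start_col,
                size_t count_rows, size_t count_cols, int *dst)
{
    size_t r;

    assert(src != NULL && dst != NULL);
    assert(start_row + count_rows <= nrows);
    assert(start_col + count_cols <= ncols);
    for (r = 0; r < count_rows; r++) {
        /* Each row of the block is one contiguous run in the source. */
        memcpy(dst + r * count_cols,
               src + (start_row + r) * ncols + start_col,
               count_cols * sizeof *src);
    }
}
```

Unless the block spans full rows, it consists of one separate run per row, so the number of transfers grows with the number of rows selected as well as with the total size.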

  14. Reading Hyperslabs • For hyperslabs > 1MB, HDF5 becomes more than 3 times slower than HDF4. • It also shows nonlinear growth.

  15. Reading Hyperslabs (latest version of the HDF5 development branch) • For hyperslabs > 2MB, HDF5 becomes about 1.5 times slower than HDF4. • It still shows nonlinear growth.

  16. Reading contiguous hyperslabs (fixed size) • In this test, the size of the hyperslab was fixed at 100 x 100 elements. The hyperslab was moved first along the X axis, then along the Y axis, and finally along the diagonal, and the read performance was measured at each position. • The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.

  17. Reading 100x100 Hyperslabs from Different Locations • For small hyperslabs HDF5 performs about 3 times better than HDF4.

  18. Reading Hyperslabs with Subsampling

  19. Subsampling Hyperslabs • In this test we created a file with a 1000 x 1000 array of integers. We then read every second element of hyperslabs of different sizes starting from a fixed position in the array; read times were averaged over 10 runs. The HDF5-1.2.2 and HDF5 development libraries were tested. • The test was performed on the Solaris platform. Neither HDF4 nor HDF5 performed data conversion.
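Reading every second element corresponds to a selection with a stride of 2. The sketch below shows the indexing for an in-memory row-major array, assuming the same stride applies in both dimensions (the slides do not say). `copy_strided` is a hypothetical name, and this is not the library implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Copy count_rows x count_cols elements from a row-major nrows x ncols
 * array, starting at (start_row, start_col) and taking every
 * stride-th element in each dimension. */
void copy_strided(const int *src, size_t nrows, size_t ncols,
                  size_t start_row, size_t start_col, size_t stride,
                  size_t count_rows, size_t count_cols, int *dst)
{
    size_t r, c;

    assert(src != NULL && dst != NULL);
    assert(stride > 0 && count_rows > 0 && count_cols > 0);
    assert(start_row + (count_rows - 1) * stride < nrows);
    assert(start_col + (count_cols - 1) * stride < ncols);
    for (r = 0; r < count_rows; r++)
        for (c = 0; c < count_cols; c++)
            dst[r * count_cols + c] =
                src[(start_row + r * stride) * ncols
                    + start_col + c * stride];
}
```

With a stride greater than 1 the selected elements are no longer adjacent in the source, so the copy cannot be done one whole row at a time; a naive implementation handles each element individually.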

  20. Reading Every Second Element of the Hyperslabs • HDF5 shows nonlinear growth. • HDF4 performs about 3 times better for hyperslabs larger than 0.5MB.

  21. First Attempt to Improve the Performance • HDF4 still performs 2 times better for hyperslabs > 2MB. • HDF5 shows nonlinear growth.

  22. Current Behavior (HDF5 development branch) • HDF5 shows linear growth and performs about 10 times better than HDF4.

  23. Vdatas vs Compound Datasets

  24. Vdatas and Compound Datasets • In this test we created HDF4 files with a Vdata and HDF5 files with a compound dataset, with sizes from 1000 to 1000000 records. Each record had the fields: float a; short b; float c[3]; char d; • Write, write with data packing, and partial read operations were tested. • The test was performed on Linux platforms. We also looked into data conversion issues.
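The record used in this test is a useful example of why packing matters: as a C struct it usually contains padding, while a packed record holds only the field bytes. The sketch below assumes a 4-byte float and a 2-byte short; `struct record`, `RECORD_PACKED_SIZE`, `pack_record`, and `unpack_record` are hypothetical names for illustration and are not part of either library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The record from the test: float a; short b; float c[3]; char d; */
struct record {
    float a;
    short b;
    float c[3];
    char  d;
};

/* Bytes per record with no padding between fields
 * (19 when float is 4 bytes and short is 2 bytes). */
#define RECORD_PACKED_SIZE \
    (sizeof(float) + sizeof(short) + 3 * sizeof(float) + sizeof(char))

/* Copy the fields of *r back to back into out, which must hold
 * RECORD_PACKED_SIZE bytes. */
void pack_record(const struct record *r, unsigned char *out)
{
    size_t off = 0;

    assert(r != NULL && out != NULL);
    memcpy(out + off, &r->a, sizeof r->a); off += sizeof r->a;
    memcpy(out + off, &r->b, sizeof r->b); off += sizeof r->b;
    memcpy(out + off, r->c, sizeof r->c);  off += sizeof r->c;
    memcpy(out + off, &r->d, sizeof r->d);
}

/* Inverse of pack_record: rebuild the struct from packed bytes. */
void unpack_record(const unsigned char *in, struct record *r)
{
    size_t off = 0;

    assert(in != NULL && r != NULL);
    memcpy(&r->a, in + off, sizeof r->a); off += sizeof r->a;
    memcpy(&r->b, in + off, sizeof r->b); off += sizeof r->b;
    memcpy(r->c, in + off, sizeof r->c);  off += sizeof r->c;
    memcpy(&r->d, in + off, sizeof r->d);
}
```

On many platforms `sizeof(struct record)` is larger than the packed size because the compiler aligns the float fields, which is the storage overhead that packing removes at the cost of a per-field copy.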

  25. Writing Data (VSwrite and H5Dwrite) • Conversion does not affect HDF4 performance. • It does affect HDF5 (by a factor of more than 15).

  26. Writing Data (timing includes packing: VSpack and H5Tpack) • Data packing was added to the previous test. • For HDF5 the effect is very small.

  27. Reading Two Fields • Unpacking slows down HDF4 significantly (about 8 times). • HDF5 was reading packed data in this test.

  28. CERES Data File

  29. Structure of CERES file • [Diagram: Vgroup CERES_ES8 with Vgroup Data Fields and Vgroup Geolocation Fields, each holding SDS and Vdata objects; object counts shown in the figure: 2, 1, 18, 19]

  30. CERES File • Used the H4toH5 converter to create an HDF5 version of the file • 81MB (HDF4), 80MB (HDF5) • 1 min 55 sec on Linux • 3 min 56 sec on Solaris • Benchmarks • read up to 14 datasets (2148x660 floats) • subsampling: read two columns from the same datasets • Benchmarks were run on the Solaris and Linux platforms

  31. Reading CERES data on big- and little-endian machines • On the Solaris platform, HDF5 was twice as fast as HDF4. • On Linux (data conversion is on), HDF4 was about 1.3-1.5 times faster.
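The data conversion mentioned for Linux arises because the stored byte order differs from the machine's native order, so every value has to be rearranged after it is read. A minimal sketch of that step for 4-byte values follows, assuming the conversion is a plain byte swap; `swap32` and `swap32_array` are hypothetical names, and neither library necessarily converts data this way.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 32-bit value. */
uint32_t swap32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

/* Convert a whole buffer of 32-bit values in place. */
void swap32_array(uint32_t *buf, size_t n)
{
    size_t i;

    assert(buf != NULL || n == 0);
    for (i = 0; i < n; i++)
        buf[i] = swap32(buf[i]);
}
```

For one 2148x660 dataset of 4-byte floats this loop touches 1,417,680 values, which gives a sense of the extra work on the platform where conversion is needed.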

  32. Subsetting CERES Data • Current version of HDF5 shows about 3 times better performance.

  33. Conclusion • Goal: tune HDF5 and give our users recommendations on its efficient usage • Continue to study HDF4 and HDF5 performance • try more platforms: O2K, NT/Windows • try other features (e.g. chunking, compression) • specific HDF5 features (e.g. writing/reading big files, VL datatypes, compound datatypes, selections) • User input is necessary: send us the access patterns you use! • Results will be available at http://hdf.ncsa.uiuc.edu
