
HDF5 library and tools

HDF5 library and tools. Kent Yang The HDF Group ESDSWG SPG Oct. 21, 2010. Why HDF5?. HDF4 shortcomings Limits on object and file size (<2GB) Limited number of objects (<20K) I/O performance Code complexity . HDF5 . Recognized by communities HDF-EOS5 and netCDF-4 built on HDF5


Presentation Transcript


  1. HDF5 library and tools Kent Yang The HDF Group ESDSWG SPG Oct. 21, 2010 9th ESDSWG meeting

  2. Why HDF5?
     • HDF4 shortcomings
       • Limits on object and file size (<2 GB)
       • Limited number of objects (<20K)
       • I/O performance
       • Code complexity

  3. HDF5
     • Recognized by communities
     • HDF-EOS5 and netCDF-4 built on HDF5
     • Widely used by many organizations
     • 2002 R&D 100 Award
     • Not compatible with HDF4

  4. Some HDF5 Features
     • Contiguous (default): data elements stored physically adjacent to each other
     • Chunked: better access time for subsets; extensible
     • Chunked & Compressed: improves storage efficiency, transmission speed

  5. Accessing data in contiguous dataset
     [Figure: dataset with M rows]
     M seeks are needed to find the starting location of the element. Data is read/written using M disk accesses. Performance may be very bad.

  6. Motivation for chunking storage
     [Figure: chunked dataset with M rows]
     Two seeks are needed to find two chunks. Data is read/written using two disk accesses. For this access pattern, chunking helps I/O performance.
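The seek counts on slides 5 and 6 can be reproduced with a small model. This is only a sketch under assumptions the slides do not state explicitly: the selection is one column of a row-major array, every non-adjacent element costs one seek, and each chunk is read in a single access. The function names and the 8x8 array with 4x4 chunks are illustrative, not part of the HDF5 API.

```python
def contiguous_accesses(n_rows, n_cols, col):
    """Count the separate runs of adjacent elements touched when reading one
    column of a row-major (contiguous) n_rows x n_cols array."""
    offsets = [row * n_cols + col for row in range(n_rows)]
    if not offsets:
        return 0
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:  # not adjacent in storage, so a new seek is needed
            runs += 1
    return runs


def chunked_accesses(n_rows, col, chunk_rows, chunk_cols):
    """Count the distinct chunks that hold elements of the same column."""
    return len({(row // chunk_rows, col // chunk_cols) for row in range(n_rows)})


if __name__ == "__main__":
    # Hypothetical 8x8 dataset, reading column 3.
    print("contiguous:", contiguous_accesses(8, 8, 3))
    print("chunked 4x4:", chunked_accesses(8, 3, 4, 4))
```

The model ignores the chunk index lookup and the chunk cache, so it only shows why the access count drops from M to the number of chunks touched.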

  7. Chunking storage
     • A chunk cache can be used to speed up performance
     • Chunk size cannot be changed after the dataset is created
     • Do not make chunk sizes too small (e.g. 1x1)!
       • Metadata overhead for each chunk
       • Each chunk is read individually
       • Many small reads are inefficient
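To get a feel for why very small chunks are costly, it may help to count how many chunks (and therefore how many per-chunk metadata entries and individual reads) a dataset needs. The sketch below assumes a hypothetical 1000x1000 dataset; `chunk_count` is an illustrative helper, not an HDF5 function.

```python
import math


def chunk_count(dataset_shape, chunk_shape):
    """Number of chunks needed to cover a dataset, counting partial edge chunks."""
    return math.prod(
        math.ceil(dim / chunk) for dim, chunk in zip(dataset_shape, chunk_shape)
    )


if __name__ == "__main__":
    shape = (1000, 1000)  # assumed dataset size, for illustration only
    for chunk in [(1, 1), (100, 100), (300, 300)]:
        print(chunk, chunk_count(shape, chunk))
```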

  8. HDF5 compression filters
     • GZIP (deflate)
     • SZIP – Rice algorithm developed at JPL
       • Good for floating-point numbers
       • Quick decoding time
     • Shuffle
       • Use with GZIP or SZIP to gain a better compression ratio
     • Scale + offset
       • Performs a scale and/or offset operation on each data value and truncates the resulting value to a minimum number of bits
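The scale + offset idea can be sketched for integer data as follows. This is a simplified illustration, assuming the offset is the minimum value and no scaling is applied; it is not the HDF5 filter's actual on-disk encoding, and the function names are hypothetical.

```python
def scale_offset_encode(values):
    """Subtract the minimum from every value and report how many bits
    are enough to store the largest shifted value."""
    offset = min(values)
    shifted = [v - offset for v in values]
    bits = max(shifted).bit_length()  # minimum number of bits per value
    return offset, bits, shifted


def scale_offset_decode(offset, shifted):
    """Restore the original values by adding the offset back."""
    return [s + offset for s in shifted]


if __name__ == "__main__":
    data = [1000, 1003, 1007, 1001]  # made-up sample values
    offset, bits, shifted = scale_offset_encode(data)
    print(offset, bits, shifted)
    print(scale_offset_decode(offset, shifted) == data)
```

In this sample the shifted values fit in a few bits each instead of a full 32-bit integer, which is where the storage saving comes from.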

  9. HDF5 compression filters
     • Shuffle
       • Use with GZIP or SZIP to gain a better compression ratio
     • How does shuffling work?
       • Four 32-bit integers: 1, 23, 43, 56
       • In hexadecimal form: 0x01, 0x17, 0x2B, 0x38
       • Shuffling groups the bytes by position within each integer, so the zero bytes end up in one long run that is easy to compress
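The byte reordering on this slide can be tried out with the Python standard library. This is a minimal sketch of the shuffle idea, assuming little-endian 32-bit integers; `shuffle`, `unshuffle`, and `compressed_sizes` are illustrative helpers, and `zlib` stands in for the GZIP (deflate) filter.

```python
import struct
import zlib


def shuffle(data: bytes, element_size: int) -> bytes:
    """Gather byte 0 of every element, then byte 1, and so on."""
    return b"".join(data[i::element_size] for i in range(element_size))


def unshuffle(data: bytes, element_size: int) -> bytes:
    """Invert shuffle(): put the bytes of each element back together."""
    count = len(data) // element_size
    return bytes(
        data[(i % element_size) * count + i // element_size]
        for i in range(len(data))
    )


def compressed_sizes(values):
    """Return (plain, shuffled) deflate sizes for a list of 32-bit integers."""
    raw = struct.pack(f"<{len(values)}i", *values)
    return len(zlib.compress(raw)), len(zlib.compress(shuffle(raw, 4)))


if __name__ == "__main__":
    raw = struct.pack("<4i", 1, 23, 43, 56)  # the four integers from the slide
    print(raw.hex(" "))
    print(shuffle(raw, 4).hex(" "))
    print(compressed_sizes(list(range(1000))))
```

For the four integers above, the shuffled buffer starts with the bytes 01 17 2b 38 followed by twelve zero bytes. On larger arrays of small integers the shuffled form typically compresses better, although the exact sizes depend on the data.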

  10. High Level APIs
      • Included along with the HDF5 library
      • Simplify steps for creating, writing, and reading objects
      • Do not entirely ‘wrap’ the HDF5 library

  11. HDF5 Platforms Supported
      • Systems
        • AIX
        • Various Linux
        • Solaris
        • Windows
        • Mac OS
        • FreeBSD
        • Cray XT3
        • OpenVMS
      • Compilers
        • IBM C and Fortran
        • GNU C, gfortran, g95
        • Intel C and Fortran
        • PGI C and Fortran
        • Sun C and Fortran
        • Windows Visual Studio and Intel Fortran

  12. HDFView
      • A Java tool that can view and edit HDF5 file contents
      URL: http://www.hdfgroup.org/hdf-java-html/hdfview/

  13. HDF5 Command-line tools: what these tools can do for you
      • h5ls
      • h5dump
      • h5repack
      • h5diff

  14. h5dump
      Examine file contents and dump file contents in an ASCII or binary file
      • Structure
      • Dataset
      • Binary
      • XML

  15. h5dump: Object Headers
      > h5dump -H SDS.h5
      HDF5 "SDS.h5" {
      GROUP "/" {
         GROUP "Floats" {
            DATASET "FloatArray" {
               DATATYPE H5T_IEEE_F32LE
               DATASPACE SIMPLE { ( 4, 3 ) / ( 4, 3 ) }
            }
         }
         DATASET "IntArray" {
            DATATYPE H5T_STD_I32LE
            DATASPACE SIMPLE { ( 5, 6 ) / ( 5, 6 ) }
         }
      }
      }

  16. h5ls
      • Lists file contents in the style of the Unix ls command
      > h5ls -r SDS2.h5
      /Floats                  Group
      /Floats/DoubleArray      Dataset {10, 5}
      /Floats/FloatArray       Dataset {4, 3}
      /Floats/subs             Group
      /IntArray                Dataset {5, 6}

  17. h5repack
      Copies a file to a new file with different storage layouts and compression filters
      • Remove inaccessible objects / junk spaces
      • Change storage layout
      • Apply compression filter

  18. h5repack
      • Remove inaccessible objects
        • h5repack tools_junk.h5 tmp.h5
      • Change layout
        • h5repack tools_bad_layout.h5 tmp.h5
        • h5repack -l CHUNK=16x16 tools_bad_layout.h5 tmp.h5
      • Change compression
        • h5repack -f GZIP=6 tmp.h5 tmp2.h5

  19. h5diff
      Show differences between two files or two objects
      • Like Unix diff
      • Can apply to
        • Individual dataset
        • Whole file

  20. Others
      • h5copy - Copies an object within a file or across files
      • h5import - Imports binary/ASCII data into an HDF5 file
      • h5check - Verifies whether an HDF5 file is compliant with the HDF5 file format specification
      • …

  21. HDF5 Compile Scripts
      • h5cc – HDF5 C compiler command
      • h5fc – HDF5 F90 compiler command
      • h5c++ – HDF5 C++ compiler command
      To compile:
      % h5cc h5prog.c
      % h5fc h5prog.f90

  22. Compile option: -show
      -show: displays the compiler commands and options without executing them
      % h5cc -show Sample_c.c
      gcc -I/home/packages/hdf5_1.6.6/Linux_2.6/include -UH5_DEBUG_API -DNDEBUG -I/home/packages/szip/static/encoder/Linux2.6-gcc/include -D_LARGEFILE_SOURCE -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_POSIX_SOURCE -D_BSD_SOURCE -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -c Sample_c.c
      gcc -std=c99 -Wno-long-long -O -fomit-frame-pointer -finline-functions -L/home/packages/szip/static/encoder/Linux2.6-gcc/lib Sample_c.o -L/home/packages/hdf5_1.6.6/Linux_2.6/lib /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5_hl.a /home/packages/hdf5_1.6.6/Linux_2.6/lib/libhdf5.a -lsz -lz -lm -Wl,-rpath -Wl,/home/packages/hdf5_1.6.6/Linux_2.6/lib

  23. Tools under development
      • h5watch
        • Allows the user to monitor the growth of a dataset
        • Prints out the new elements appended to a dataset
      • h5edit
        • New tool for editing an HDF5 file
        • Initial implementation will include creating and deleting attributes

  24. HelpDesk
      Send emails to help@hdfgroup.org
      Response to requests from NASA: within 2 days

  25. Update HDF-EOS website
      • Software
        • Evaluating many packages
      • Examples
        • Adding examples for many NASA products
      • Forums
        • Moderating the forum
      http://hdfeos.org

  26. NCL/IDL/MATLAB examples
      • Many examples from different NASA data centers
      • Example codes and plots
      • URL: http://hdfeos.org/zoo

  27. An example to access AIRS Swath (NCL)
      …
      data=eos_file->radiances_L2_Standard_cloud_cleared_radiance_product(:,:,0) ; read specific subset of data field
      ; In order to read the radiances data field from the HDF-EOS2 file, the group
      ; under which the data field is placed must be appended to the data field in NCL. For more information,
      ; visit section 4.3.2 of http://hdfeos.org/software/ncl.php.
      data@lat2d=eos_file->Latitude_L2_Standard_cloud_cleared_radiance_product ; associate longitude and latitude
      data@lon2d=eos_file->Longitude_L2_Standard_cloud_cleared_radiance_product
      data@_FillValue=-9999
      ; …
      res@gsnCenterString="radiances at Channel=567"
      plot(2)=gsn_csm_contour_map_polar(xwks,data_2,res)
      res@gsnCenterString="radiances at Channel=1339"
      plot(3)=gsn_csm_contour_map_polar(xwks,data_3,res)
      delete(plot) ; cleaning up resources used
      delete(data)

  28. HDF5 and CF
      • There are no restrictions on creating or adding CF attributes in an HDF5 file
      • We will provide example codes showing how to add CF attributes to an HDF5 file

  29. More information
      • About HDF5: http://hdfgroup.org/HDF5
      • More HDF5 tutorials: http://hdfeos.org/workshops/ws14/agenda.php

  30. Thank you!
