Migrating from HDF5 1.6 to HDF5 1.8

Presentation Transcript

  1. Migrating from HDF5 1.6 to HDF5 1.8 HDF and HDF-EOS Workshop XII

  2. Outline
     • Status of the HDF5 1.6 and 1.8 releases
     • Overview of the HDF5 1.8 features
     • How to move applications to HDF5 1.8?

  3. Status of HDF5 releases

  4. Current HDF5 Releases
     • HDF5 1.8.0 was released in February 2008
       • Major update of the HDF5 1.6.* series (stable set of features and APIs since 1998)
       • New features
       • 200 new APIs
       • Changes to file format
       • Changes to APIs
       • Backward compatible
     • HDF5 1.8.1 was released in June 2008
       • Minor bug fixes
       • Included Fortran90 APIs for new C functions

  5. Current HDF5 Releases
     • HDF5 1.6.7 was released in February 2008
       • Addressed backward compatibility bug for reading files with corrupted object header information
     • New maintenance releases will be in November 2008
       • HDF5 1.6.8 and 1.8.2
       • Minor bug fixes
       • Tools improvements
     • Current plans are to support HDF5 1.6 and 1.8 until November 2009

  6. Information About Current Releases
     http://www.hdfgroup.org/HDF5

  7. Goal of the Tutorial
     • Help with the transition to the 1.8 releases
     • Discuss new features beneficial to applications written for the 1.6 releases
     • Raise awareness about forward/backward compatibility issues with the 1.8 releases
     • Get feedback from users who have already moved to the 1.8 releases

  8. Why New Features?
     • Need to address some deficiencies in the initial design
     • Examples:
       • Big overhead in file sizes
       • Non-tunable metadata cache implementation
       • Handling of free space in a file

  9. Why New Features?
     • Need to address new requirements
     • Add support for
       • New types of indexing (object creation order)
       • Big volumes of variable-length data (DNA sequences)
       • Simultaneous real-time streams (fast append to one-dimensional datasets)
       • UTF-8 encoding for objects’ path names
       • Accessing objects stored in other HDF5 files (external or user-defined links)

  10. What Did We Do in HDF5 1.8?
      • Extended File Format Specification
      • Reviewed group implementations
      • Introduced new link object
      • Revamped metadata cache implementation
      • Improved handling of datasets and datatypes
      • Introduced shared object header messages
      • Extended error handling
      • Enhanced backward/forward API and file format compatibility

  11. What Did We Do in HDF5 1.8?
      And much more good stuff to make HDF5 better and faster

  12. HDF5 File Format Extension

  13. HDF5 File Format Extension
      • Why:
        • Address deficiencies of the original file format
        • Address space overhead in an HDF5 file
        • Enable new features
      • What:
        • New routine that instructs the HDF5 library to create all objects using the latest version of the HDF5 file format (by default, the library uses the earliest format version that supports the object, for example, the version in which the array datatype became available)

  14. HDF5 File Format Extension Example
      /* Use the latest version of the file format for each object created in a file */
      fapl_id = H5Pcreate(H5P_FILE_ACCESS);
      H5Pset_libver_bounds(fapl_id, H5F_LIBVER_LATEST, H5F_LIBVER_LATEST);
      fid = H5Fcreate(…,…,…,fapl_id);
      /* or */
      fid = H5Fopen(…,…,fapl_id);

  15. Group Revisions

  16. Better Large Group Storage
      • Why:
        • Faster, more scalable storage and access for large groups
      • What:
        • New format and method for storing groups with many links

  17. Informal Benchmark
      • Create a file and a group in the file
      • Create up to 10^6 groups with one dataset in each group
      • Compare file sizes and performance of HDF5 1.8.1 using the latest group format with HDF5 1.8.1 (default, old format) and HDF5 1.6.7
      • Note: default 1.8.1 and 1.6.7 became very slow after 700,000 groups

  18. Time to Open and Read a Dataset

  19. Time to Close the File

  20. File Size

  21. Access Links by Creation Order
      • Why:
        • Allow iteration and lookup of a group’s links (children) by creation order as well as by name order
        • Support the NetCDF access model for NetCDF-4
      • What:
        • Option to access objects in a group according to relative creation time

  22. Access Links by Creation Order Example
      /* Track and index creation order of the links */
      H5Pset_link_creation_order(gcpl_id, (H5P_CRT_ORDER_TRACKED | H5P_CRT_ORDER_INDEXED));
      /* Create a group */
      gid = H5Gcreate(fid, GNAME, H5P_DEFAULT, gcpl_id, H5P_DEFAULT);

  23. Example: h5dump --group=1 tordergr.h5
      HDF5 "tordergr.h5" {
      GROUP "1" {
         GROUP "a" {
            GROUP "a1" {
            }
            GROUP "a2" {
               GROUP "a21" {
               }
               GROUP "a22" {
               }
            }
         }
         GROUP "b" {
         }
         GROUP "c" {
         …

  24. Example: h5dump --sort_by=creation_order
      HDF5 "tordergr.h5" {
      GROUP "1" {
         GROUP "c" {
         }
         GROUP "b" {
         }
         GROUP "a" {
            GROUP "a1" {
            }
            GROUP "a2" {
               GROUP "a22" {
               }
               GROUP "a21" {
               }
            }
         }

  25. Compact Groups
      • Why:
        • Save space and access time for small groups
        • If groups are small, they don’t need the B-tree overhead
      • What:
        • Alternate storage for groups with few links
        • Default storage when “latest format” is specified
        • Library converts to “original” storage (B-tree based) using a default or user-specified threshold

  26. Compact Groups
      • Example
        • File with 11,600 groups
        • With original group structure, file size ~ 20 MB
        • With compact groups, file size ~ 12 MB
        • Total savings: 8 MB (40%)
        • Average savings/group: ~700 bytes
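As a quick check of the figures on this slide, the arithmetic can be sketched in C. The function names are hypothetical, and the sizes are taken as decimal megabytes (20 MB = 20,000,000 bytes), which is an assumption; the slide does not say which unit convention it uses.

```c
#include <assert.h>

/* Average number of bytes saved per group (hypothetical helper). */
double savings_per_group(double original_bytes, double compact_bytes, long n_groups)
{
    assert(n_groups > 0);
    return (original_bytes - compact_bytes) / (double)n_groups;
}

/* Fraction of the original file size that was saved (hypothetical helper). */
double savings_fraction(double original_bytes, double compact_bytes)
{
    assert(original_bytes > 0.0);
    return (original_bytes - compact_bytes) / original_bytes;
}
```

With the slide's numbers, `savings_per_group(20e6, 12e6, 11600)` comes to roughly 690 bytes, consistent with the rounded ~700 bytes above, and `savings_fraction(20e6, 12e6)` is 0.4.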

  27. Compact Groups Example
      /* Change storage to "dense" if the number of group members is bigger than 16,
         and go back to compact storage if the number of group members is smaller than 12 */
      H5Pset_link_phase_change(gcpl_id, 16, 12);
      /* Create a group */
      g_id = H5Gcreate(…,…,…,gcpl_id,…);

  28. Intermediate Group Creation
      • Why:
        • Simplify creation of a series of connected groups
        • Avoid having to create each intermediate group separately, one by one
      • What:
        • Intermediate groups can be created when creating an object in a file, with one function call

  29. Intermediate Group Creation
      • Want to create “/A/B/C/dset1”
      • “A” exists, but “B/C/dset1” does not
      • One call creates groups “B” and “C”, then creates “dset1”
      [Diagram: before, the root group “/” contains only “A”; after, the path runs from “/” through “A”, “B”, and “C” to “dset1”]

  30. Intermediate Group Creation Example
      /* Create link creation property list */
      lcrp_id = H5Pcreate(H5P_LINK_CREATE);
      /* Set flag for intermediate group creation;
         groups B and C will be created automatically */
      H5Pset_create_intermediate_group(lcrp_id, TRUE);
      ds_id = H5Dcreate(file_id, "/A/B/C/dset1", …,…, lcrp_id, …,…);

  31. Link Revisions

  32. What are Links?
      • Links connect groups to their members
      • “Hard” links point to a target by address
      • “Soft” links store the path to a target
      [Diagram: a root group with a hard link (holding <address>) and a soft link (holding the path “/target dataset”), both leading to a dataset]

  33. Links: Before and After
      • New data model for handling links
      • Links may have properties (UTF-8 name encoding, creation order indexing, storage property, etc.)
      [Diagram: before, a group connects to an object through a name only; after, the link from group to object carries a name and other properties]

  34. Anonymous Object
      • Object can be created without being immediately linked into the graph structure
        • Group, dataset and datatype
      • See the new H5*create_anon APIs
      • Use the H5O* APIs to manipulate the objects
      [Diagram: a group and an object, not linked]

  35. New: External Links
      • Why:
        • Access objects stored in other HDF5 files in a transparent way
      • What:
        • Store location of file and path within that file
        • Can link across files

  36. New: External Links
      • External link object “External_link” in file1.h5 points to the group /A/B/C/D/E in file2.h5
      [Diagram: the root group of file1.h5 holds the link “External_link”, which stores the file name “file2.h5” and the path “/A/B/C/D/E”; in file2.h5 that path leads from the root group to the target object]

  37. External Links Example
      /* Create an external link */
      H5Lcreate_external(TARGET_FILE, "/A/B/C/D/E", source_file_id, "External_link", …,…);
      /* We will use the external link to create a group in the target file */
      gr_id = H5Gcreate(source_file_id, "External_link/F", …,…,…);
      /* We can access group "External_link/F" in the source file and
         group "/A/B/C/D/E/F" in the target file */

  38. New: User-defined Links
      • Why:
        • Allow applications to create their own kinds of links and link operations, such as
          • Create a “hard” external link that finds an object by address
          • Create a link that accesses a URL
          • Keep track of how often a link is accessed, or other behavior
      • What:
        • Applications can create new kinds of links by supplying custom callback functions
        • Can do anything HDF5 hard, soft, or external links do

  39. Traversing an HDF5 File

  40. Traversing HDF5 File
      • Why:
        • Allow applications to iterate through the objects in a group or recursively visit all objects under a group
      • What:
        • New APIs to traverse a group hierarchy
        • New APIs to iterate through a group using different types of indices (name or creation order)
        • H5Giterate is deprecated in favor of the new functions

  41. Traversing HDF5 File: Example of some new APIs
      /* Check if object "A/B" exists in the root group */
      H5Lexists(file_id, "A/B", …);
      /* Iterate through the members of the root group using name as an index;
         this function doesn't recursively follow links into subgroups */
      H5Literate(file_id, H5_INDEX_NAME, H5_ITER_INC, &idx, iter_link_cb, &info);
      /* Visit all objects under the root group; this function recursively
         follows links into subgroups */
      H5Lvisit(file_id, H5_INDEX_NAME, H5_ITER_INC, visit_link_cb, &info);

  42. Traversing HDF5 File
      • Things to remember
        • Never use H5Ldelete in any HDF5 iterate or visit callback function
        • Always close the parent object before deleting a child object

  43. Shared Object Header Messages

  44. Shared Object Header Messages
      • Why: metadata is duplicated many times, wasting space
      • Example:
        • You create a file with 10,000 datasets
        • All use the same datatype and dataspace
        • HDF5 needs to write this information 10,000 times!
      [Diagram: Dataset 1, Dataset 2, and Dataset 3 each store their own datatype and dataspace alongside data 1, data 2, and data 3]

  45. Shared Object Header Messages
      • What:
        • Enable messages to be shared automatically
        • HDF5 shares duplicated messages on its own!
      [Diagram: Dataset 1 and Dataset 2 share one datatype and one dataspace, each keeping its own data (data 1, data 2)]

  46. Shared Messages
      • Happens automatically
      • Works with datatypes, dataspaces, attributes, fill values, and filter pipelines
      • Saves space if these objects are relatively large
      • May be faster if HDF5 can cache shared messages
      • Drawbacks
        • Usually slower than non-shared messages
        • Adds overhead to the file
          • Index for storing shared datatypes
          • 25 bytes per instance
        • Older library versions can’t read files with shared messages

  47. Two Informal Tests
      • File with 24 datasets, all with the same big datatype
        • 26,000 bytes normally
        • 17,000 bytes with shared messages enabled
        • Saves 375 bytes per dataset
      • But, make a bad decision: invoke shared messages but only create one dataset…
        • 9,000 bytes normally
        • 12,000 bytes with shared messages enabled
        • Probably slower when reading and writing, too.
      • Moral: shared messages can be a big help, but only in the right situation!
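The per-dataset figure in the first test, and the penalty in the second, follow from simple arithmetic on the file sizes. A small sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Net bytes saved per dataset when shared messages are enabled;
 * a negative result means sharing made the file larger (hypothetical helper). */
long net_savings_per_dataset(long bytes_plain, long bytes_shared, long n_datasets)
{
    assert(n_datasets > 0);
    return (bytes_plain - bytes_shared) / n_datasets;
}
```

For the 24-dataset file, `net_savings_per_dataset(26000, 17000, 24)` gives 375, matching the slide. For the single-dataset file, `net_savings_per_dataset(9000, 12000, 1)` gives -3000, that is, 3,000 bytes of added overhead.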

  48. Error Handling

  49. Extendible Error-handling APIs
      • Why: enable an application to integrate its error reporting with the HDF5 library error stack
      • What: new error-handling API
        • H5Epush: push major and minor error IDs onto the specified error stack
        • H5Eprint: print the specified stack
        • H5Ewalk: walk through the specified stack
        • H5Eclear: clear the specified stack
        • H5Eset_auto: turn error printing on/off for the specified stack
        • H5Eget_auto: return settings for the specified stack traversal

  50. Error-handling Programming Model
      • Create a new error class and major and minor error messages
      • Register messages with the HDF5 library
      • Manage errors
        • Use the default error stack or create a new one
        • Push error
        • Print error stack
        • Close stack