
Introduction to HDF5

Introduction to HDF5. Barbara Jones, The HDF Group. The 13th HDF & HDF-EOS Workshop, November 3-5, 2009.


Presentation Transcript


  1. Introduction to HDF5 Barbara Jones The HDF Group The 13th HDF & HDF-EOS Workshop November 3-5, 2009 HDF/HDF-EOS Workshop XIII

  2. Before We Begin …
     HDF-EOS Home Page: http://hdfeos.org/
     Workshop Info: http://hdfeos.org/workshops/ws13/workshop_thirteen.php
     The HDF Group Page: http://hdfgroup.org/
     HDF5 Home Page: http://hdfgroup.org/HDF5/
     HDF Helpdesk: help@hdfgroup.org
     HDF Mailing Lists: http://hdfgroup.org/services/support.html

  3. HDF = Hierarchical Data Format
     HDF5 is the second HDF format
     • Development started in 1996
     • First release was in 1998
     HDF4 is the first HDF format
     • Originally called HDF
     • Development started in 1987
     • Still supported by The HDF Group

  4. HDF5 is like…

  5. HDF5 is designed …
     • for high volume and/or complex data
     • for every size and type of system (portable)
     • for flexible, efficient storage and I/O
     • to enable applications to evolve in their use of HDF5 and to accommodate new models
     • to support long-term data preservation

  6. HDF5 Technology
     HDF5 is a data model, library and file format for managing data.

  7. HDF5 Technology
     • HDF5 (Abstract) Data Model
       • Defines the “building blocks” for data organization and specification
       • Files, Groups, Datasets, Attributes, Datatypes, Dataspaces, …
     • HDF5 Library (C, Fortran 90, C++ APIs)
       • Also Java Language Interface and High Level Libraries
     • HDF5 Binary File Format
       • Bit-level organization of HDF5 file
       • Defined by HDF5 File Format Specification
     • Tools For Accessing Data in HDF5 Format
       • h5dump, h5repack, HDFView, …

  8. HDF5 Abstract Data Model, a.k.a. HDF5 Logical Data Model, a.k.a. HDF5 Data Model

  9. HDF5 File
     An HDF5 file is a container that holds data objects.
     Data objects shown on the slide:

     lat | lon | temp
     ----|-----|-----
     12  | 23  | 3.1
     15  | 24  | 4.2
     17  | 21  | 3.6

     Experiment Notes:
       Serial Number: 99378920
       Date: 3/13/09
       Configuration: Standard 3

  10. HDF5 Groups and Links
      HDF5 groups and links organize data objects.
      Diagram: the root group “/” holds the groups SimOut and Viz, which organize the data objects from the previous slide:

      Experiment Notes:
        Serial Number: 99378920
        Date: 3/13/09
        Configuration: Standard 3

      lat | lon | temp
      ----|-----|-----
      12  | 23  | 3.1
      15  | 24  | 4.2
      17  | 21  | 3.6

  11. HDF5 Objects
      The two primary HDF5 objects are:
      • HDF5 Group: A grouping structure containing zero or more HDF5 objects
      • HDF5 Dataset: Raw data elements, together with information that describes them
      (There are other HDF5 objects that help support Groups and Datasets.)

  12. HDF5 Groups
      • Used to organize collections
      • Every file starts with a root group
      • Similar to UNIX directories
      • Path to object defines it
      • Objects can be shared:
        • /A/k and /B/l are the same
      Diagram: root group “/” with groups A, B and C; datasets named temp; links k (under A) and l (under B) point to the same object. Legend: groups and datasets are drawn with different symbols.

  13. HDF5 Datasets
      HDF5 Datasets organize and contain your “raw data values”. They consist of:
      • Your raw data
      • Metadata describing the data:
        - The information to interpret the data (Datatype)
        - The information to describe the logical layout of the data elements (Dataspace)
        - Characteristics of the data (Properties)
        - Additional optional information that describes the data (Attributes)

  14. HDF5 Dataset
      A dataset consists of Data plus Metadata:
      • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
      • Datatype: Integer
      • Properties: Chunked, Compressed
      • Attributes (optional): Time = 32.4, Pressure = 987, Temp = 56

  15. HDF5 Dataspaces
      An HDF5 Dataspace describes the logical layout for the data elements:
      • Array
        • multiple elements in dataset organized in a multi-dimensional (rectangular) array
        • maximum number of elements in each dimension may be fixed or unlimited
      • NULL
        • no elements in dataset
      • Scalar
        • single element in dataset

  16. HDF5 Dataspaces
      Two roles:
      • Dataspace contains spatial information (logical layout) about a dataset stored in a file
        • Rank and dimensions
        • Permanent part of dataset definition
      • Partial I/O: Dataspace describes application’s data buffer and data elements participating in I/O
      Examples on the slide: Rank = 2, Dimensions = 4x6; Rank = 1, Dimension = 10

  17. HDF5 Datatypes
      The HDF5 datatype describes how to interpret individual data elements.
      HDF5 datatypes include:
      • integer, float, unsigned, bitfield, …
      • user-definable (e.g., 13-bit integer)
      • variable length types (e.g., strings)
      • references to objects/dataset regions
      • enumerations - names mapped to integers
      • opaque
      • compound (similar to C structs)

  18. HDF5 Dataset
      Datatype: 16-byte integer
      Dataspace: Rank = 2, Dimensions = 5 x 3
      (Diagram: a 5 x 3 array of elements)

  19. HDF5 Properties
      • Properties (also known as Property Lists) are characteristics of HDF5 objects that can be modified
      • Default properties handle most needs
      • By changing properties one can take advantage of the more powerful features in HDF5

  20. Storage Properties
      • Contiguous (default): data elements stored physically adjacent to each other
      • Chunked: better access time for subsets; extensible
      • Chunked & Compressed: improves storage efficiency, transmission speed

  21. HDF5 Attributes (optional)
      • An HDF5 attribute has a name and a value
      • Attributes typically contain user metadata
      • Attributes may be associated with
        - HDF5 groups
        - HDF5 datasets
        - HDF5 named datatypes
      • An attribute’s value is described by a datatype and a dataspace
      • Attributes are analogous to datasets except…
        - they are NOT extensible
        - they do NOT support compression or partial I/O

  22. HDF5 Abstract Data Model Summary
      • The Objects in the Data Model are the “building blocks” for data organization and specification
      • Files, Groups, Links, Datasets, Datatypes, Dataspaces, Attributes, …
      • Projects using HDF5 “map” their data concepts to these HDF5 Objects

  23. HDF5 Software

  24. HDF5 Software Layers & Storage
      API:
      • Tools: h5dump tool, h5repack tool, HDFview tool, …
      • High Level APIs
      • Java Interface
      HDF5 Library:
      • Language Interfaces: C, Fortran, C++
      • HDF5 Data Model Objects: Groups, Datasets, Attributes, …
      • Tunable Properties: Chunk Size, I/O Driver, …
      • Internals: Memory Mgmt, Datatype Conversion, Filters, Chunked Storage, Version Compatibility, and so on…
      • Virtual File Layer
      • I/O Drivers: Posix I/O, Split Files, MPI I/O, Custom
      Storage (HDF5 File Format):
      • File, Split Files, File on Parallel Filesystem, Other, ?

  25. HDF5 API and Applications
      Layers shown on the slide, top to bottom:
      • Applications: a Climate Model, MATLAB, EOS library, …
      • Domain Data Objects
      • HDF5 Library
      • Storage

  26. HDF5 Home Page
      HDF5 home page: http://hdfgroup.org/HDF5/
      • Two releases: HDF5 1.8 and HDF5 1.6
      HDF5 source code:
      • Written in C, and includes optional C++, Fortran 90 APIs, and High Level APIs
      • Contains command-line utilities (h5dump, h5repack, h5diff, …) and compile scripts
      HDF pre-built binaries:
      • When possible, include C, C++, F90, and High Level libraries. Check ./lib/libhdf5.settings file.
      • Built with and require the SZIP and ZLIB external libraries

  27. Useful Tools For New Users
      h5dump: Tool to “dump” or display contents of HDF5 files
      h5cc, h5c++, h5fc: Scripts to compile applications
      HDFView: Java browser to view HDF4 and HDF5 files
      http://www.hdfgroup.org/hdf-java-html/hdfview/

  28. h5dump Utility
      h5dump [options] [file]
        -H, --header   Display header only (no data)
        -d <names>     Display the specified dataset(s).
        -g <names>     Display the specified group(s) and all members.
        -p             Display properties.
      <names> is one or more appropriate object names.

  29. Example of h5dump Output
      HDF5 "dset.h5" {
      GROUP "/" {
         DATASET "dset" {
            DATATYPE { H5T_STD_I32BE }
            DATASPACE { SIMPLE ( 4, 6 ) / ( 4, 6 ) }
            DATA {
               1, 2, 3, 4, 5, 6,
               7, 8, 9, 10, 11, 12,
               13, 14, 15, 16, 17, 18,
               19, 20, 21, 22, 23, 24
            }
         }
      }
      }
      (Diagram: root group “/” containing the dataset ‘dset’)

  30. HDF5 Compile Scripts
      • h5cc – HDF5 C compiler command
      • h5fc – HDF5 F90 compiler command
      • h5c++ – HDF5 C++ compiler command
      To compile:
        % h5cc h5prog.c
        % h5fc h5prog.f90

  31. Compile option: -show
      -show: displays the compiler commands and options without executing them
        % h5cc -show Sample_c.c
      • Shows the correct paths and libraries used by the installed HDF5 library.
      • Shows the correct flags to specify when building an application with that HDF5 library.

  32. Browsing HDF5 Files with HDFView

  33. HDFView: Structure of File, Contents of Dataset

  34. HDFView File Menu

  35. HDF-EOS5 File in HDFView

  36. Introduction to HDF5 Programming Model and APIs

  37. Operations Supported by the API
      • Create objects (groups, datasets, attributes, complex data types, …)
      • Assign storage and I/O properties to objects
      • Perform complex subsetting during read/write
      • Use variety of I/O “devices” (parallel, remote, etc.)
      • Transform data during I/O
      • Make inquiries on file and object structure, content, properties

  38. General Programming Paradigm
      • Properties of object are optionally defined
        • Creation properties
        • Access properties
      • Object is opened or created
      • Object is accessed, possibly many times
      • Object is closed

  39. Order of Operations
      • An order is imposed on operations by argument dependencies
        For example: a file must be opened before a dataset, because the dataset open call requires a file handle as an argument.
      • Objects can be closed in any order.

  40. The General HDF5 API
      • Currently C, Fortran 90, Java, and C++ bindings.
      • C routines begin with prefix H5?, where ? is a character corresponding to the type of object the function acts on
      Example Functions:
        H5D : Dataset interface, e.g., H5Dread
        H5F : File interface, e.g., H5Fopen
        H5S : dataSpace interface, e.g., H5Sclose

  41. HDF5 Defined Types
      For portability, the HDF5 library has its own defined types:
        hid_t: object identifiers (native integer)
        hsize_t: size used for dimensions (unsigned long or unsigned long long)
        herr_t: function return value
        hvl_t: variable length datatype
      For C, include hdf5.h in your HDF5 application.

  42. The HDF5 API
      • For flexibility, the API is extensive
        • 300+ functions
      • This can be daunting… but there is hope
        • A few functions can do a lot
        • Start simple
        • Build up knowledge as more features are needed
      (Image: Victorinox Swiss Army CyberTool 34)

  43. Basic Functions
        H5Fcreate (H5Fopen)             create (open) File
        H5Screate_simple/H5Screate      create dataSpace
        H5Dcreate (H5Dopen)             create (open) Dataset
        H5Dread, H5Dwrite               access Dataset
        H5Dclose                        close Dataset
        H5Sclose                        close dataSpace
        H5Fclose                        close File
      NOTE: The order specified above is not required.

  44. Other Common Functions
      DataSpaces: H5Sselect_hyperslab (Partial I/O), H5Sselect_elements (Partial I/O), H5Dget_space
      Groups: H5Gcreate, H5Gopen, H5Gclose
      Attributes: H5Acreate, H5Aopen_name, H5Aclose, H5Aread, H5Awrite
      Property lists: H5Pcreate, H5Pclose, H5Pset_chunk, H5Pset_deflate

  45. High Level APIs
      • Included along with the HDF5 library
      • Simplify steps for creating, writing, and reading objects.
      • Do not entirely ‘wrap’ HDF5 library

  46. Example HDF5 Code

  47. Steps to Create a File
      1. Decide on properties the file should have and create them if necessary:
         • Creation properties, like size of user block
         • Access properties (improve performance)
         • Use default properties (H5P_DEFAULT)
      2. Create the file
      3. Close the file and the property lists, as needed

  48. Code: Create a File
        hid_t file_id;
        herr_t status;
        file_id = H5Fcreate("file.h5", H5F_ACC_TRUNC, H5P_DEFAULT, H5P_DEFAULT);
        status = H5Fclose (file_id);
      Resulting file contents: “/” (root)
      Note: Return codes not checked for errors in code samples.

  49. Dataset Components
      A dataset consists of Data plus Metadata:
      • Dataspace: Rank = 3; Dimensions: Dim_1 = 4, Dim_2 = 5, Dim_3 = 7
      • Datatype: Integer
      • Properties: Chunked, Compressed

  50. Steps to Create a Dataset
      1. Define dataset characteristics
         a) Datatype – integer
         b) Dataspace - 4x6
         c) Properties if needed, or use H5P_DEFAULT
      2. Decide where to put it
         • Obtain location ID:
           • Group ID puts it in a Group
           • File ID puts it in Root Group
      3. Create dataset in file
      4. Close everything
      (Diagram: “/” (root))
