
HDF

Update on HDF5 1.8

The HDF Group

HDF and HDF-EOS Workshop X

November 28, 2006


Why HDF5 1.8?


… as we know, there are known knowns; there are things we know we know.

We also know there are known unknowns; that is to say we know there are some things we do not know.

But there are also unknown unknowns -- the ones we don't know we don't know.

Donald Rumsfeld

HDF and HDF-EOS Workshop X, Landover MD


Some things we knew we knew

  • Need high level APIs – image, etc.

  • Need more datatypes - packed n-bit, etc.

  • Need external and other links

  • Tools needed – h5pack, etc.

  • Caching embellishments

  • Eventually, multithreading


Things we knew we did not know

  • New requirements from EOS and ASCI

  • New applications that would use HDF5

  • How HDF5 would really perform in parallel

  • What new tools, features and options needed

  • New APIs, API features


Things we didn’t know we didn’t know

  • Completely unanticipated applications

    • New data types and structures

      • E.g. DNA sequences

    • New operations

      • E.g. write many real-time streams simultaneously


HDF5 1.8 topics

  • Dataset and datatype improvements

  • Group improvements

  • Link Revisions

  • Shared object header messages

  • Metadata cache improvements

  • Other improvements

  • Platform-specific changes

  • High level APIs

  • Parallel HDF5

  • Tool improvements


Dataset and Datatype Improvements


Text-based data type descriptions

  • Why:

    • Simplify datatype creation

    • Make datatype creation code more readable

    • Facilitate debugging by printing the text description of a data type

  • What:

    • New routines to create a datatype from its text description (H5LTtext_to_dtype) and to convert a datatype back to text (H5LTdtype_to_text)


Text data type description – Example

  • Create a datatype of compound type.

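    /* Create the datatype from its text description (HDF5 DDL) */

    dtype = H5LTtext_to_dtype(

    "H5T_COMPOUND { H5T_NATIVE_INT \"a\"; H5T_NATIVE_FLOAT \"b\"; }", H5LT_DDL);

    /* Convert the datatype back to text; with a NULL buffer, tsize receives the required length */

    H5LTdtype_to_text(dtype, NULL, H5LT_DDL, &tsize);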


Serialized datatypes and dataspaces

  • Why:

    • Allow datatype and dataspace info to be transmitted between processes

    • Allow datatype/dataspace to be stored in non-HDF5 files

  • What:

    • A new set of routines to serialize/deserialize HDF5 datatypes and dataspaces.


Int to float convert during I/O

  • Why: Convert ints to floats during I/O

  • What: Int to float conversion supported during I/O


Revised conversion exception handling

  • Why: Give apps greater control over exceptions (range errors, etc.) during datatype conversion.

  • What: Revised conversion exception handling


Revised conversion exception handling

  • To handle exceptions during conversions, register handling function through H5Pset_type_conv_cb().

  • Cases of exception:

    • H5T_CONV_EXCEPT_RANGE_HI

    • H5T_CONV_EXCEPT_RANGE_LOW

    • H5T_CONV_EXCEPT_TRUNCATE

    • H5T_CONV_EXCEPT_PRECISION

    • H5T_CONV_EXCEPT_PINF

    • H5T_CONV_EXCEPT_NINF

    • H5T_CONV_EXCEPT_NAN

  • Return values: H5T_CONV_ABORT, H5T_CONV_UNHANDLED, H5T_CONV_HANDLED


Compression filter for n-bit data

  • Why:

    • Compact storage for user-defined datatypes

  • What:

    • When data stored on disk, padding bits chopped off and only significant bits stored

    • Supports most datatypes

    • Works with compound datatypes


N-bit compression example

  • In memory, one value of N-Bit datatype is stored like this:

    | byte 3 | byte 2 | byte 1 | byte 0 |

    |????????|????SPPP|PPPPPPPP|PPPP????|

    S-sign bit P-significant bit ?-padding bit

  • After passing through the N-Bit filter, all padding bits are chopped off, and the bits are stored on disk like this:

    | 1st value | 2nd value |

    |SPPPPPPP PPPPPPPP|SPPPPPPP PPPPPPPP|...

  • Opposite (decompress) when going from disk to memory
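
The packing step in the diagram can be sketched without the HDF5 library. This is a minimal illustration and not the filter's actual implementation. It assumes a 32-bit in-memory value whose 16 significant bits start at bit offset 4, as drawn above; the helper names `nbit_extract` and `nbit_restore` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Keep only the `precision` significant bits that start at bit `offset`
   of a 32-bit in-memory value (hypothetical helper, not an HDF5 call). */
static uint32_t nbit_extract(uint32_t value, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1;
    return (value >> offset) & mask;
}

/* Inverse step: put the stored bits back at `offset`.
   The padding bits come back as zeros. */
static uint32_t nbit_restore(uint32_t stored, unsigned offset, unsigned precision)
{
    assert(precision >= 1 && precision < 32 && offset + precision <= 32);
    uint32_t mask = (UINT32_C(1) << precision) - 1;
    return (stored & mask) << offset;
}
```

For the layout in the diagram, `nbit_extract(v, 4, 16)` yields the 16 bits that go to disk, and `nbit_restore(s, 4, 16)` rebuilds the in-memory value with its padding bits cleared.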


Offset+size storage filter

  • Why: Use less storage when less precision needed

  • What:

    • Performs scale/offset operation on each value

    • Truncates result to fewer bits before storing

    • Currently supports integers and floats

  • Example

    H5Pset_scaleoffset(dcr, H5Z_SO_INT, H5Z_SO_INT_MINBITS_DEFAULT);

    H5Dcreate(……, dcr);

    H5Dwrite(…);


Example with floating-point type

  • Data: {104.561, 99.459, 100.545, 105.644}

  • Choose scaling factor: decimal precision to keep. E.g. scale factor D = 2

    1. Find minimum value (offset): 99.459

    2. Subtract minimum value from each element

    Result: {5.102, 0, 1.086, 6.185}

    3. Scale data by multiplying by 10^D = 100

    Result: {510.2, 0, 108.6, 618.5}

    4. Round the data to integer

    Result: {510, 0, 109, 619}

    5. Pack and store using min number of bits
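
The five steps can be traced in a few lines of plain C. This is a sketch of the arithmetic only, not the filter's code; the helper names `scale_offset_encode` and `min_bits` are hypothetical, and rounding is done by adding 0.5 and truncating, which is valid here because every value is non-negative once the minimum has been subtracted.

```c
#include <assert.h>
#include <stddef.h>

/* Steps 1-4: subtract the minimum, scale by 10^d, round to nearest integer.
   Returns the minimum (the offset). Hypothetical helper, not an HDF5 call. */
static double scale_offset_encode(const double *in, size_t n, int d,
                                  unsigned long *out)
{
    assert(n > 0 && d >= 0);

    double min = in[0];
    for (size_t i = 1; i < n; i++)
        if (in[i] < min)
            min = in[i];

    double scale = 1.0;
    for (int i = 0; i < d; i++)
        scale *= 10.0;

    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned long)((in[i] - min) * scale + 0.5);

    return min;
}

/* Step 5: number of bits needed to hold the largest encoded value. */
static unsigned min_bits(const unsigned long *v, size_t n)
{
    unsigned long max = 0;
    for (size_t i = 0; i < n; i++)
        if (v[i] > max)
            max = v[i];

    unsigned bits = 1;
    while (max >>= 1)
        bits++;
    return bits;
}
```

For the four values on this slide with D = 2, the encoded integers are {510, 0, 109, 619}, and the largest of them fits in 10 bits. Decoding computes offset + value / 10^D, so digits beyond the chosen decimal precision are not recovered.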


“NULL” Dataspace

  • Why:

    • Allow datasets with no elements to be described

    • NetCDF 4 needed a “place holder” for attributes

  • What:

    • A dataset with no dimensions, no data


Group improvements


Access links by creation-time order

  • Why:

    • Allow iteration & lookup of group’s links (children) by creation order as well as by name order

    • Support netCDF access model for netCDF 4

  • What: Option to access objects in group according to relative creation time
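
The difference between the two orders can be pictured with a small, library-independent sketch. The `link_entry` structure and `sort_links` helper are hypothetical and only model the idea of a per-link creation index; they are not HDF5 data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory picture of a group's links; not an HDF5 structure. */
typedef struct {
    const char *name;
    unsigned    creation_index; /* 0 for the first link created, then 1, ... */
} link_entry;

static int by_name(const void *a, const void *b)
{
    return strcmp(((const link_entry *)a)->name,
                  ((const link_entry *)b)->name);
}

static int by_creation(const void *a, const void *b)
{
    unsigned x = ((const link_entry *)a)->creation_index;
    unsigned y = ((const link_entry *)b)->creation_index;
    return (x > y) - (x < y);
}

/* Order the links either by name or by relative creation time. */
static void sort_links(link_entry *links, size_t n, int creation_order)
{
    assert(links != NULL);
    qsort(links, n, sizeof *links, creation_order ? by_creation : by_name);
}
```

With links created in the order temperature, pressure, humidity, name order visits humidity first, while creation order visits temperature first.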


“Compact groups”

  • Why:

    • Save space and access time for small groups

    • If groups small, don’t need B-tree overhead

  • What:

    • Alternate storage for groups with few links

  • Example

    • File with 11,600 groups

    • With original group structure, file size ~ 20 MB

    • With compact groups, file size ~ 12 MB

    • Total savings: 8 MB (40%)

    • Average savings/group: ~700 bytes


Better large group storage

  • Why: Faster, more scalable storage and access for large groups

  • What: New format and method for storing groups with many links


Intermediate group creation

  • Why:

    • Simplify creation of a series of connected groups

    • Avoid having to create each intermediate group separately, one by one

  • What:

    • Intermediate groups can be created when creating an object in a file, with one function call


Example: add intermediate groups

[Figure: group tree before the call (/, A) and after it (/, A, B, C, dset1)]

  • Want to create “/A/B/C/dset1”

  • “A” exists, but “B/C/dset1” do not

H5Dcreate(file_id, "/A/B/C/dset1",..)

One call creates groups “B” & “C”, then creates “dset1”


Link Revisions


What are links?

Links connect groups to their members

“Hard” links point to a target by address

“Soft” links store the path to a target

[Figure: a root group with a hard link (<address>) and a soft link (“/target dataset”), both leading to a dataset]


New: External links

  • Why: Access objects by file & path within file

  • What:

    • Store location of file and path within that file

    • Can link across files

[Figure: file1.h5 and file2.h5, each with a root group. The external link “dataset EL” stores “file2.h5” and “target dataset”; in file2.h5, “target dataset” points by <address> to the dataset.]


New: User-defined Links

  • Why:

    • Allow applications to create their own kinds of links and link operations, such as

      • Create “hard” external link that finds an object by address

      • Create link that accesses a URL

      • Keep track of how often a link accessed, or other behavior

  • What:

    • App can create new kinds of links by supplying custom callback functions

    • Can do anything HDF5 hard, soft, or external links do


Shared Object Header Messages


Shared object header messages

[Figure: Datasets 1, 2 and 3 each store their own datatype and dataspace messages alongside their data]

  • Why: metadata duplicated many times, wasting space

  • Example:

    • You create a file with 10,000 datasets

    • All use the same datatype and dataspace

    • HDF5 needs to write this information 10,000 times!


Shared object header messages

What:

  • Enable messages to be shared automatically

  • HDF5 shares duplicated messages on its own!

[Figure: Datasets 1 and 2 share a single datatype message and a single dataspace message; each keeps its own data]


Shared Messages

  • Happens automatically

  • Works with datatypes, dataspaces, attributes, fill values, and filter pipelines

  • Saves space if these objects are relatively large

  • May be faster if HDF5 can cache shared messages

  • Drawbacks

    • Usually slower than non-shared messages

    • Adds overhead to the file

      • Index for storing shared datatypes

      • 25 bytes per instance

    • Older library versions can’t read files with shared messages


Two informal tests

  • File with 24 datasets, all with same big datatype

    • 26,000 bytes normally

    • 17,000 bytes with shared messages enabled

    • Saves 375 bytes per dataset

  • But, make a bad decision: invoke shared messages but only create one dataset…

    • 9,000 bytes normally

    • 12,000 bytes with shared messages enabled

    • Probably slower when reading and writing, too.

  • Moral: shared messages can be a big help, but only in the right situation!
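
The per-dataset figures in these two tests follow from simple arithmetic, sketched here with a hypothetical helper (`saved_per_dataset` is not an HDF5 function). A negative result means the file grew.

```c
#include <assert.h>

/* Bytes saved per dataset by enabling shared messages.
   A negative value means the file got larger. */
static long saved_per_dataset(long normal_bytes, long shared_bytes,
                              long n_datasets)
{
    assert(n_datasets > 0);
    return (normal_bytes - shared_bytes) / n_datasets;
}
```

With the numbers above, `saved_per_dataset(26000, 17000, 24)` gives 375, and `saved_per_dataset(9000, 12000, 1)` gives -3000, the cost of the sharing overhead when there is nothing to share.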


Metadata cache improvements


Metadata Cache improvements

  • Why:

    • Improve I/O performance and memory usage when accessing many objects

  • What:

    • New metadata cache APIs

      • control cache size

      • monitor actual cache size and current hit rate

    • Under the hood: adaptive cache resizing

      • Automatically detects the current working set size

      • Sets max cache size to the working set size


Metadata cache improvements

  • Note: most applications do not need to worry about the cache

  • See “Advanced topics” for details

  • And if you do see unusual memory growth or poor performance, please contact us. We want to help you.


Other improvements


New extendible error-handling API

  • Why: Enable app to integrate error reporting with HDF5 library error stack

  • What: New error handling API

    • H5Epush – push major and minor error ID on specified error stack

    • H5Eprint – print specified stack

    • H5Ewalk – walk through specified stack

    • H5Eclear – clear specified stack

    • H5Eset_auto – turn error printing on/off for specified stack

    • H5Eget_auto – return settings for specified stack traversal


Attribute improvements

  • Why:

    • Use less storage when large numbers of attributes attached to a single object

    • Iterate over or look up attributes by creation order

  • What:

    • Property to create index on the order in which the attributes are created

    • Improved attribute storage


Support for Unicode Character Set

  • Why:

    • So apps can create names using Unicode

    • netCDF 4 needed this

  • What

    • UTF-8 Unicode encoding now supported

    • For string datatypes, names of links and attributes

  • Example:

    H5Pset_char_encoding(lcpl_id, H5T_CSET_UTF8);

    H5Llink(file_id, "UTF-8 name", …, lcpl_id, …);


Efficient copying of HDF5 objects

  • Why:

    • Enable apps to copy objects efficiently

  • What

    • New routines to copy an object in an HDF5 file within the current file or to another file

    • Done at a low-level in the HDF5 file, allowing

      • Entire group hierarchies to be copied quickly

      • Compressed datasets to be copied without going through a decompression/compression cycle


Performance of object copy routines


Data transformation filter

  • Why:

    • Apply arithmetic operations to data during I/O

  • What:

    • Data transformation filter

    • Transform expressed by algebraic formula

    • Only +, -, *, and / supported

  • Example:

    • Expression parameter set, such as x*(x-5)

    • When dataset read/written, x*(x-5) applied per element

    • When reading, values in file are unchanged

    • When writing, transformed data written to file
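
The read and write behavior described above can be modeled in plain C. This is a sketch of the semantics only: the "file" is just an array, and the helper names are hypothetical. In the library itself the expression is set on a dataset transfer property list with H5Pset_data_transform.

```c
#include <assert.h>
#include <stddef.h>

/* The example expression from this slide, x*(x-5), applied to one element. */
static double transform(double x)
{
    return x * (x - 5.0);
}

/* Sketch of a write: the transformed values are what land in the "file". */
static void write_with_transform(const double *mem, double *file, size_t n)
{
    assert(mem != NULL && file != NULL);
    for (size_t i = 0; i < n; i++)
        file[i] = transform(mem[i]);
}

/* Sketch of a read: the caller's buffer receives transformed values,
   while the stored values in "file" are left untouched. */
static void read_with_transform(const double *file, double *mem, size_t n)
{
    assert(mem != NULL && file != NULL);
    for (size_t i = 0; i < n; i++)
        mem[i] = transform(file[i]);
}
```

Writing 7.0 stores 14.0. Reading that element back through the same transform returns 126.0 to the application, and the stored 14.0 does not change.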


Stackable Virtual File Drivers

  • What is a Virtual File Driver (VFD)?


Structure of HDF5 Library

  • Object API (C, Fortran 90, Java, C++)

    • Specify objects and transformation properties

    • Invoke data movement operations and data transformations

  • Library internals

    • Performs data transformations and other prep for I/O

    • Configurable transformations (compression, etc.)

  • Virtual file I/O (C only)

    • Perform byte-stream I/O operations (open/close, read/write, seek)

    • User-implementable I/O (stdio, network, memory, etc.)


Stackable VFD

  • HDF5 VFD allows

    • Storing data using different physical file layout. E.g., Family VFD (writes file as “family of files”)

    • Doing different types of I/O. E.g., stdio (standard I/O); MPI-I/O (for parallel I/O)


Stackable VFD

  • Why “stackable”:

    • Before now, only one VFD could be used at a time

    • VFDs could not interoperate

  • What is “stackable”:

    • A non-terminal VFD may stack on top of compatible non-terminal VFDs and, eventually, a terminal VFD

  • Two kinds of VFD

    • Non-terminal (e.g. Family)

    • Terminal (e.g. stdio; MPI-I/O)
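
The stacking idea can be sketched with function pointers. Everything below is hypothetical and much smaller than HDF5's real driver interface: a terminal driver does the actual I/O (here into a memory buffer), and a family-like non-terminal driver only maps addresses onto fixed-size members and forwards to the drivers beneath it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical driver interface; HDF5's real VFD layer differs in detail. */
typedef struct driver {
    void (*write)(struct driver *self, size_t addr, const void *buf, size_t len);
    void (*read)(struct driver *self, size_t addr, void *buf, size_t len);
} driver;

/* Terminal driver: performs the I/O itself, here into a memory buffer. */
typedef struct {
    driver base;
    unsigned char bytes[64];
} mem_driver;

static void mem_write(driver *self, size_t addr, const void *buf, size_t len)
{
    mem_driver *m = (mem_driver *)self;
    assert(addr + len <= sizeof m->bytes);
    memcpy(m->bytes + addr, buf, len);
}

static void mem_read(driver *self, size_t addr, void *buf, size_t len)
{
    mem_driver *m = (mem_driver *)self;
    assert(addr + len <= sizeof m->bytes);
    memcpy(buf, m->bytes + addr, len);
}

static void mem_init(mem_driver *m)
{
    m->base.write = mem_write;
    m->base.read = mem_read;
    memset(m->bytes, 0, sizeof m->bytes);
}

/* Non-terminal driver: maps one address space onto two fixed-size members
   and forwards each piece to the driver stacked below it. */
typedef struct {
    driver base;
    size_t member_size;
    driver *members[2];
} family_driver;

static void family_write(driver *self, size_t addr, const void *buf, size_t len)
{
    family_driver *f = (family_driver *)self;
    const unsigned char *p = buf;
    while (len > 0) {
        size_t idx = addr / f->member_size;
        size_t off = addr % f->member_size;
        size_t chunk = f->member_size - off;
        if (chunk > len)
            chunk = len;
        assert(idx < 2);
        f->members[idx]->write(f->members[idx], off, p, chunk);
        addr += chunk; p += chunk; len -= chunk;
    }
}

static void family_read(driver *self, size_t addr, void *buf, size_t len)
{
    family_driver *f = (family_driver *)self;
    unsigned char *p = buf;
    while (len > 0) {
        size_t idx = addr / f->member_size;
        size_t off = addr % f->member_size;
        size_t chunk = f->member_size - off;
        if (chunk > len)
            chunk = len;
        assert(idx < 2);
        f->members[idx]->read(f->members[idx], off, p, chunk);
        addr += chunk; p += chunk; len -= chunk;
    }
}

static void family_init(family_driver *f, size_t member_size,
                        driver *m0, driver *m1)
{
    assert(member_size > 0);
    f->base.write = family_write;
    f->base.read = family_read;
    f->member_size = member_size;
    f->members[0] = m0;
    f->members[1] = m1;
}
```

With a member size of 8, a 4-byte write at address 6 is split by the non-terminal driver: two bytes go to the end of the first member and two to the start of the second. Because the lower drivers are reached only through the `driver` interface, either member could be swapped for a different terminal driver without changing the family code.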


Stackable VFD

[Figure: layered I/O stack. Application; HDF5 API; non-terminal VFDs (split into metadata and rawdata, family file); terminal VFDs (sec2, stdio, mpiio); HDF5 files. A default I/O path is also marked.]


Platform-specific changes


Platform-specific changes

  • Why: Better UNIX/Linux Portability

  • What:

    • 1.8 uses latest GNU “auto” tools (autoconf, automake, libtool)

      • improves portability between many machine and OS configurations

    • Build can now be done in parallel

      • with gmake “-j” flag

      • speeds up build, test and install processes

    • Build infrastructure includes many other improvements as well


Platforms to be dropped

  • Operating systems

    • HPUX 11.00

    • MAC OS 10.3

    • AIX 5.1 and 5.2

    • SGI IRIX64-6.5

    • Linux 2.4

    • Solaris 2.8 and 2.9

  • Compilers

    • GNU C compilers older than 3.4 (Linux)

    • Intel 8.*

    • PGI V. 5.*, 6.0

    • MPICH 1.2.5

http://www.hdfgroup.org/HDF5/release/alpha/obtain518.html


Platforms to be added

  • Systems

    • Alpha Open VMS

    • MAC OSX 10.4 (Intel)

    • Solaris 2.* on Intel (?)

    • Cray XT3

    • Windows 64-bit (32-bit binaries)

    • Linux 2.6

    • BG/L

  • Compilers

    • g95

    • PGI V. 6.1

    • Intel 9.*

    • MPICH 1.2.7

    • MPICH2


High level APIs


High-Level Fortran APIs

  • Fortran APIs have been added for H5Lite, H5Image and H5Table.


Dimension scales

  • Similar to

    • Dimension scales in HDF4

    • Coordinate variables in netCDF

  • What is a dimension scale?

    • An HDF5 dataset with additional metadata that identifies the dataset as a “Dimension Scale”

    • Associated with dimensions of HDF5 datasets

    • Meaning of the association is left to applications

  • A dimension scale can be shared by two or more dataset dimensions


Dimension scales example

HDF Explorer image


Sample dimension scale functions

  • H5DSset_scale: convert dataset to a dimension scale

  • H5DSattach_scale: attach scale to a dimension

  • H5DSdetach_scale: detach scale from a dimension

  • H5DSis_attached: verify if scale attached to dataset

  • H5DSget_scale_name: read name of scale


HDF5Packet

  • Why:

    • High performance table writing

    • For data acquisition, when there are many sources of data

    • E.g. flight test

  • What:

    • Each row is a “packet”: a collection of fields, fixed or variable length

    • Append only

    • Indexed retrieval


Packets in HDF5

[Figure: packet tables with fixed-length data records and with variable-length records; entries are labeled Time and Data]


Parallel HDF5


Collective I/O improvements

  • Why

    • Previously, collective I/O was not available for chunked data

    • Previously, collective I/O was not available for complex selections

    • Collective I/O is key to improving performance for parallel HDF5

  • What

    • Collective I/O works for chunked storage

    • Works for irregular selections for both chunked and contiguous storage


Parallel h5diff (ph5diff)

  • Compares two files in an MPI parallel environment.

  • Compares multiple datasets simultaneously


Windows MPICH support

  • Windows MPICH support: prototype


Tool improvements


New features for old tools

  • h5dump

    • Dump data in binary format

    • Faster for files with large numbers of objects

  • h5diff

    • Can now compare dataset regions

    • Parallel ph5diff now available

  • h5repack

    • Efficient data copy using H5Gcopy()

    • Able to handle big datasets


New HDF5 Tools

  • h5copy

    • Copies a group, dataset or named datatype from one location to another

    • Copies within a file or across files

  • h5repart

    • Partition file into a family of files

  • h5import

    • Import binary/ascii data into an HDF5 file

  • h5check

    • Verifies an HDF5 file against the defined HDF5 File Format Specification

  • h5stat

    • Reports statistics about a file and objects in a file


Thank You


Questions/comments?


For more information

  • Go to http://www.hdfgroup.org/HDF5/

  • Click on “Obtain HDF5 1.8.0 Alpha”

  • Look at table “Information”


Acknowledgement

This report is based upon work supported in part by a Cooperative Agreement with NASA under NASA NNG05GC60A. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Aeronautics and Space Administration.

