
HDF Update

HDF. HDF Update. Mike Folk National Center for Supercomputing Applications HDF and HDF-EOS Workshop IX December 1, 2005. Outline. Organizational info HDF Software Update Other Activities of Interest. Organizational info. The HDF Team. Frank Baker Christian Chilan Peter Cao


Presentation Transcript


  1. HDF Update • Mike Folk, National Center for Supercomputing Applications • HDF and HDF-EOS Workshop IX, December 1, 2005

  2. Outline • Organizational info • HDF Software Update • Other Activities of Interest

  3. Organizational info

  4. The HDF Team • Frank Baker, Christian Chilan, Peter Cao, Vailin Choi, Mike Folk, Fang Guo, Anne Jennings, Barbara Jones, Quincey Koziol, James Laird, Raymond Lu, John Mainzer, Pedro Nunes, Elena Pourmal, Binh-minh Ribler, Eric Shapiro, Rishi Sinha, Arash Termehchy, Kent Yang • And all those wonderful folks out there who contribute ideas, requests, bug reports, code, and support.

  5. The HDF Group is Moving

  6. “The HDF Group” = “THG”

  7. THG • Why spin off from U of Illinois? • Creating a sustainable organization • We do more than R&D • THG already exists

  8. How will THG be different from the NCSA HDF Group? • Business model • Location • Staff • THG – NCSA – UIUC relations • Effect on NASA and other affiliations • Intellectual property

  9. HDF Software Update

  10. Major software milestones since Oct. 2004 (timeline, Nov 2004 to Nov 2005) • HDF Java 2.1 • HDF Web browser plug-in • HDF 4.2r1 • HDF5 1.6.4 • HDF4-to-HDF5 conversion tools 1.2 • HDF Java 2.2 • HDF5 1.6.5

  11. Release highlights

  12. HDF 4.2r1 – February 2005 • Szip compression fixed • Windows • hdiff and hrepack added • Config, build, testing procedures improved • h4fc utility fixed

  13. HDF 4.2r1 – new compilers and platforms • Mac OS X Fortran: IBM xlf v. 8.1, Absoft f95 v. 8.2 • AMD Opteron • Cray TS IEEE • Linux 2.4: Absoft Fortran f95 v. 9.0, PGI C and Fortran, Intel C and Fortran

  14. HDF5-1.6.4 – March 2005 • High-Level (HL) library • Some new C APIs added • Fortran APIs added • HL library now built and installed by default • Library built and tested with SZIP 2.0. • Many changes to improve library performance • Especially for variable length types and metadata cache • H5jam – a new utility • Allows a text file to be added to the "user block" at the beginning of an HDF5 file
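The user block mentioned above is a reserved region at the start of an HDF5 file. HDF5 requires its size to be either 0 or a power of two of at least 512 bytes. The following is a minimal sketch of how a tool might pick that size for a given text payload. It is not H5jam's actual code: the function names are hypothetical, and padding with zero bytes is an assumption.

```python
def user_block_size(payload_len: int) -> int:
    """Smallest valid user block size (0, or a power of two >= 512)
    that can hold payload_len bytes."""
    if payload_len == 0:
        return 0
    size = 512
    while size < payload_len:
        size *= 2
    return size


def pad_to_user_block(text: bytes) -> bytes:
    """Pad text with zero bytes (an assumption) up to the chosen block size."""
    size = user_block_size(len(text))
    return text + b"\x00" * (size - len(text))
```

A 600-byte text file would therefore occupy a 1024-byte user block, with the HDF5 data proper starting after it.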

  15. Platforms to be dropped in future releases • Operating systems: Solaris 2.8, HPUX B.11.00, Crays T3E and T90, Linux RH 7.* and 8.*, Windows 2000 • Compilers: We use the latest versions of vendors' compilers as they become available and drop the previous ones

  16. Platforms to be added • Systems: Solaris 2.10, Cray X1, Cray XT3, NEC SX6, HP 64-bit (HPUX 11.23), Mac OS 10.4 • Compilers: gcc 4.* • HDF5 Fortran: Lahey, NAG, G95 • MPI-2

  17. Coming next: Major release HDF5 1.8 • Windows MPICH support: prototype • Integer to float conversions • Will support integer to float conversions during I/O • http://hdf.ncsa.uiuc.edu/RFC/dtype_conv_overflow/Overflow.html • New error-handling API • Dimension scales • Similar to dimension scales in HDF4 • http://hdf.ncsa.uiuc.edu/RFC/H5DimScales/H5dimscale_Specification_1_0-5.pdf

  18. N-bit compression filter • Compact storage for user-defined datatypes. • http://hdf.ncsa.uiuc.edu/RFC/NBitPacking/NBitPacking.html • Offset+size storage filter • Performs a scale and/or offset operation on each data value, truncating the resulting value to a lesser number of bits before storing it. • http://hdf.ncsa.uiuc.edu/RFC/ScaleOffsetCompress/ScaleOffsetCompress.html
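The idea behind storing offset values in fewer bits can be sketched for the integer case as follows. This is an illustration only, not the HDF5 filter: the function names are hypothetical, a Python integer stands in for the packed byte buffer, and the scaling step used for floating-point data is not shown.

```python
def encode(values):
    """Subtract the minimum, then pack each residual into `width` bits.
    Requires a non-empty list of integers."""
    offset = min(values)
    # Bits needed for the largest residual (at least 1).
    width = max(1, (max(values) - offset).bit_length())
    packed = 0
    for v in values:
        packed = (packed << width) | (v - offset)
    return offset, width, len(values), packed


def decode(offset, width, count, packed):
    """Unpack `count` residuals of `width` bits and add the offset back."""
    mask = (1 << width) - 1
    out = []
    for i in range(count):
        shift = width * (count - 1 - i)
        out.append(((packed >> shift) & mask) + offset)
    return out
```

For the values 1000, 1003, 1001, 1007 the residuals are 0, 3, 1, 7, so each value needs only 3 bits instead of a full machine integer.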

  19. Group revisions • Option to access objects according to creation order • Improved performance for groups containing a large number of objects. • http://hdf.ncsa.uiuc.edu/RFC/ReviseGroups/ • Improved metadata cache • New metadata cache improves performance and memory usage in the library. • Apps that access files with a large number of objects should see significant performance improvement and should use less memory.

  20. Data transformation filter • Performs data transformation during I/O operations. • Transform expressed by algebraic formula (e.g. a*x + b) • http://hdf.ncsa.uiuc.edu/HDF5/doc_dev_snapshot/H5_dev/html/RM_H5P.html#Property-SetDataTransform • Ph5diff – parallel h5diff • Compares two files in an MPI parallel environment. • Compares multiple datasets simultaneously. • http://hdf.ncsa.uiuc.edu/RFC/PH5DIFF/
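A linear transform of the form a*x + b, as in the example above, can be sketched in a few lines. In HDF5 the formula is supplied as a text expression through a dataset transfer property; the closure below is only a stand-in for that mechanism, and its names are hypothetical.

```python
def make_linear_transform(a: float, b: float):
    """Return (forward, inverse) functions for y = a*x + b, with a != 0."""
    def forward(x):
        return a * x + b

    def inverse(y):
        return (y - b) / a

    return forward, inverse


def transform_all(func, data):
    """Apply func to every element, as a transform would during I/O."""
    return [func(x) for x in data]
```

Applying `forward` on write and `inverse` on read returns the original values, up to floating-point rounding.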

  21. HDFpacket API  • Data collected in “packets” • “Horizontal” view, per time step • Efficient access to fixed- and variable-length records • http://hdf.ncsa.uiuc.edu/RFC/HDF5Packet/Tech_reprt_HDF5Packet.pdf • Possible: HDFtime_history API • Archival, viewing, analysis • “Vertical” view, per parameter
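The difference between the "horizontal" (per time step) and "vertical" (per parameter) views can be sketched as a simple pivot. The packet layout and function name below are hypothetical and are not taken from the HDFpacket or HDFtime_history designs.

```python
from collections import defaultdict


def to_time_history(packets):
    """Pivot per-time-step packets into one (time, value) series per parameter.

    Each packet is assumed to look like
    {"time": t, "values": {parameter_name: value, ...}}.
    Packets may carry different parameters (variable-length records).
    """
    history = defaultdict(list)
    for packet in packets:
        t = packet["time"]
        for name, value in packet["values"].items():
            history[name].append((t, value))
    return dict(history)
```

Ingest naturally produces the packet form, while archival analysis usually wants one parameter across all time steps, which is the pivoted form.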

  22. SZIP integration with HDF4 and HDF5 • Development and integration completed • Includes libraries and tools • SZIP documentation web page • http://hdf.ncsa.uiuc.edu/doc_resource/SZIP/ • Examples and performance studies for HDF5

  23. Parallel I/O and chunking • Collective I/O – key to improving performance for parallel HDF5 • Current versions only allow collective I/O for regular selections in contiguous storage • Expanding use of collective I/O in HDF5 • For regular selections in chunked storage • For irregular selections in both chunked and contiguous storage
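One reason chunked storage needs extra work is that a single regular selection can span several chunks, each stored separately in the file. The sketch below lists the chunks touched by a contiguous block selection. It illustrates the mapping only and is not the library's algorithm; the function name is hypothetical.

```python
from itertools import product


def chunks_touched(start, count, chunk_shape):
    """Return the chunk indices intersected by a block selection.

    start, count and chunk_shape are per-dimension tuples;
    every count must be at least 1.
    """
    ranges = []
    for s, c, ch in zip(start, count, chunk_shape):
        first = s // ch            # chunk holding the first selected element
        last = (s + c - 1) // ch   # chunk holding the last selected element
        ranges.append(range(first, last + 1))
    return list(product(*ranges))
```

A selection starting at (5, 0) with count (10, 4) on 8 x 8 chunks touches chunks (0, 0) and (1, 0), so the I/O for that one selection has to be split across two chunks.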

  24. Java and other tools

  25. Tools development • HDF4 • hrepack and hdiff performance improved • H4 to H5 Conversion Tools • Updated to HDF4.2r1, HDF5-1.6.4 • H5jam • New tools to add/remove user block in front of file • H5dump • Faster for files with large numbers of objects • Can dump contents of the boot block • Can dump dataset filters, storage layout, fill value • Parallel h5diff • Enables h5diff to run in parallel

  26. HDF Java Products

  27. HDFView changes • Support for Storage Resource Broker (SRB) • HDF5 object level access to remote files • Display HDF5 compound datatypes with arrays • Create/display HDF5 named datatypes • Create links in HDF5 • Improve ability to manipulate palette • Select row/column for xy plot in the table view

  28. New Functions in Java API • Request an individual object without loading entire structure of file • Send client request to SRB server and receive result from server • Create HDF5 indexing table • Query for HDF5 datasets

  29. HDF Web-browser Plug-in • Extends browser to display HDF4/5 files • A “lite” version of HDFView • Analogous to PDF reader • Fewer browsing features • No editing features • Windows only

  30. HDF Web-browser Plug-in • Not an applet • It is downloaded and installed once • An applet is downloaded with each invocation • http://hdf.ncsa.uiuc.edu/plugins/

  31. HDF-EOS module for HDFView • Developed by HDF-EOS team • Optional module for HDF-EOS files • Reads, displays HDF-EOS grid, swath, etc. • (Generic modules show native HDF5 objects) • Tested with HDFView 2.3 • To do -- get permission to release with HDFView

  32. Future work for Java • Add OPeNDAP client support to HDFView • Seamlessly retrieve data from any OPeNDAP server • Support HDF5 Dimension Scales • Recognize geospatial coordinates • Support for HDF5 Indexing • Create indexing table and query HDF5 datasets • H5Gen • Generate HDF5 file from XML file

  33. Other Activities of Interest

  34. DOE/ASC* “ASC provides the integrating simulation and modeling capabilities and technologies needed …for future design assessment and certification of nuclear weapons and their components” • Massively parallel computing and I/O • Complex data models and big data • HDF5 a standard format for ASC apps * “Advanced Simulation and Computing Program”

  35. Boeing

  36. Boeing: HDF5 for flight test data • Commercial (Boeing 787) and military planes • 787 active archive • HDFtime_history • 10 TB per flight-test day • Also post-testing data • Must handle raw, real-time data • Variable-length datatypes/records • High speed ingest • HDFpacket API

  37. Boeing High Level APIs • HDFpacket (see above) • HDFtime_history • Structured records for archive, analysis, viewing • “Vertical” view, per parameter

  38. Object encryption to support access control • For Boeing • Investigated the role of encryption in developing access control • Developed a prototype, now being tested

  39. Indexing

  40. Projection Indexes in HDF5 • Standardize indexing in HDF5 • Make indexes portable • Just a prototype • See Rishi Sinha’s talk

  41. Product model data

  42. Product data exchange – STEP • STEP is an ISO data transfer standard. • Defines characteristics of product throughout its life cycle. • Widely used in design and manufacturing. • Uses EXPRESS data modeling language to describe data.

  43. STEP Limitations • Currently text-based format • Requires all the objects to be in memory • Apps starting to produce very large data volumes • EU looking for a binary equivalent for STEP

  44. HDF5 as binary format for STEP • EU identified HDF5 as best candidate • Prototype in the works • EXPRESS-to-HDF5 mappings • Convert sample data collections • Workshop at U of Illinois next week. • National Archives also funding HDF study.

  45. Bioinformatics

  46. DNA sequencing workflows • Diverse formats, some proprietary • Highly redundant data • Repeated file processing • Disconnected programs • In-core processing models • Lack of persistence

  47. Multiple Levels of Information SNP Score Contig Summaries Discrepancies Contig Qualities Coverage Depth Trace Reads Aligned bases Read quality Contig Percent match

  48. HDF5 as binary format for bioinformatics
