HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999

HDF HDF/HDF-EOS Workshop III Sept. 14-16, 1999. Mike Folk, HDF Group http://hdf.ncsa.uiuc.edu/ National Center for Supercomputing Applications University of Illinois at Urbana-Champaign. Topics. I. Overview II. NCSA HDF Activities III. HDF5 IV. HDF4 vs. HDF5. I. HDF Overview.


Presentation Transcript


  1. HDF: HDF/HDF-EOS Workshop III, Sept. 14-16, 1999 • Mike Folk, HDF Group • http://hdf.ncsa.uiuc.edu/ • National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign

  2. Topics I. Overview II. NCSA HDF Activities III. HDF5 IV. HDF4 vs. HDF5

  3. I. HDF Overview

  4. HDF Mission To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.

  5. What is HDF? • Scientific data file format & supporting software • For images, arrays, tables, other structures • Features • Portability across architectures • I/O library • Files • Efficient I/O • Efficient storage

  6. Why use HDF? • Manage data • Share data • Use software that understands HDF • Improve I/O performance • Improve storage efficiency • Use an open standard

  7. An HDF File: A Collection of Scientific Data Objects [Figure: HDF file containing four 3-D arrays.]

  8. Mixing HDF Objects in One File [Figure: one HDF file holding a group, a 3-D array, a raster image with its palette, and a table with columns lat, lon, temp and rows (12, 23, 3.1), (15, 24, 4.2), (17, 21, 3.6), (16, 35, 5.7).]

  9. HDF Software • Utilities and applications for manipulating, viewing, and analyzing data. • HDF I/O library • High-level, object-specific APIs. • Low-level API for I/O to files, etc. • File or other data source. [Diagram of layers, top to bottom: General Applications; Application Programming Interfaces; Low-level Interface; HDF file.]

  10. HDF Applications Software • Free software • NCSA HDF library and utilities • Other software • Commercial/other software that “understands” • all of HDF (Noesys, IDL, HDF Explorer) • certain HDF objects (MATLAB, WebWinds) • certain HDF applications (SHARP, WIM) • http://hdf.ncsa.uiuc.edu/tools.html

  11. What platforms does HDF run on? • Sun: Solaris • SGI: Indy, Power Challenge, Origin, Cray C90, YMP, T3E • HP9000, HP-Convex Exemplar • IBM: RS6000, SP2 • DEC: Alpha/Digital UNIX, OpenVMS • VAX: OpenVMS • Intel: Solaris x86, Linux, FreeBSD, Windows NT/98 • PowerPC: Mac-OS

  12. A Sampling of HDF Users • NCSA-affiliated science teams: visualization, data exchange, fast I/O, ... • Mathworks, Fortner Software, Research Systems Inc., etc.: format supported by vendors of vis and data analysis software • Boeing: space-time change detection in images • Distributed Oceanographic Data System (DODS): remote access to earth science data • Army Research Lab: network distributed global memory • Center for Analysis & Prediction of Storms: fast parallel I/O, portability, multi-resolution grids • TRAPPIST (Euro consortium): exchange, analysis & visualization of non-destructive testing data

  13. Major User #1: EOSDIS • ESDIS Project • open standard exchange format and I/O library for EOSDIS • EOS applications • HDF requirements • Earth science data types (HDF-EOS, etc.) • User support for scientists, data producers, etc. • Library and file structure improvements • HDF tools, utilities, access software • Software maintenance and QA

  14. Major User #2: ASCI • ASCI Data Models and Formats (DMF) Group • open standard exchange format and I/O library for ASCI • DOE tri-lab ASCI applications • HDF requirements • large datasets (> a terabyte) • ASCI data types, especially meshes • good performance in massively parallel environments • primarily HDF5

  15. II. NCSA HDF Activities

  16. Java applications • HDF APIs • Basis for tools that access HDF • HDF Viewers • HDF browser/visualizer • HDF4 Data Server Prototype • Lessons learned about remote access to

  17. Remote Data Access • The SDB: Web-based Server-side Data Browser • Java for remote access • WP-ESIP: DODS project • Computational Grids (Globus/GASS)

  18. HDF Standardization • To share files, users must organize them similarly. • HDF user groups create standard profiles • Ways to organize data in HDF files. • Metadata • API • Examples: HDF-EOS, ASCI DMF

  19. HDF-EOS software layers [Diagram of software layers with the labels: General Applications; HDF-EOS Applications; HDF-EOS API; HDF-EOS profiles; Application Programming Interfaces; Low-level Interface; HDF file.]

  20. “HDF Configuration Record” (HCR) • To simplify the tasks of defining, comparing, and producing HDF-EOS files • Formal (ODL) descriptions of HDF-EOS objects

  21. HCR of Swath
      /* Project XYZ */
      /* First version defined on June 10th, 1998 */
      OBJECT = SWATH
        NAME = SCAN1
        OBJECT = Dimension
          NAME = GeoTrack
          Size = 1200
        END_OBJECT = Dimension
        OBJECT = Dimension
          NAME = GeoCrossTrack
          Size = 205
        END_OBJECT = Dimension
        OBJECT = Dimension
          NAME = DataX
          Size = 2410
        END_OBJECT = Dimension
      END_OBJECT = SWATH
      END
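The nesting in the swath description on slide 21 can be read mechanically. What follows is a minimal sketch in Python, assuming only the `OBJECT` / `END_OBJECT` / `key = value` / `END` shapes visible on that slide. `parse_hcr` is a hypothetical helper written for illustration, not one of the HCR utilities, and real ODL has more syntax than this handles.

```python
import re

def parse_hcr(text):
    """Parse the small ODL subset shown on the slide into nested dicts."""
    # Drop /* ... */ comments before looking at individual lines.
    text = re.sub(r"/\*.*?\*/", "", text, flags=re.S)
    root = {"type": "ROOT", "attrs": {}, "children": []}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line == "END":
            break
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if key == "OBJECT":
            node = {"type": value, "attrs": {}, "children": []}
            stack[-1]["children"].append(node)
            stack.append(node)
        elif key == "END_OBJECT":
            closed = stack.pop()
            if closed["type"] != value:
                raise ValueError(f"mismatched END_OBJECT = {value}")
        else:
            # Plain attribute; keep integers as integers.
            stack[-1]["attrs"][key] = int(value) if value.isdigit() else value
    if len(stack) != 1:
        raise ValueError("unclosed OBJECT")
    return root["children"]

HCR_TEXT = """
/* Project XYZ */
OBJECT = SWATH
  NAME = SCAN1
  OBJECT = Dimension
    NAME = GeoTrack
    Size = 1200
  END_OBJECT = Dimension
  OBJECT = Dimension
    NAME = GeoCrossTrack
    Size = 205
  END_OBJECT = Dimension
END_OBJECT = SWATH
END
"""

swath = parse_hcr(HCR_TEXT)[0]
dims = {d["attrs"]["NAME"]: d["attrs"]["Size"] for d in swath["children"]}
print(swath["attrs"]["NAME"], dims)
```

A structure like `dims` is the kind of thing a comparison tool could check against the dimensions found in an actual file, which is the "compare HCR with HDF-EOS file" task the next slide lists.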

  22. HCR • HCR Utilities: • Converters between HCR and HDF-EOS • Edit HCR and HDF-EOS • Compare HCR with HDF-EOS file • Current projects: • Extend HCR converters to all of HDF4 • Similar work with HDF5 • XML too

  23. III. HDF5

  24. Why HDF5? • HDF shortcomings exposed by EOSDIS, ASCI and others... • Limits on object & file size (<2GB) • Limited number of objects (<20K) • Rigid data models • I/O performance • Aging software infrastructure (code entropy)

  25. …new Demands... • Bigger, faster machines and storage systems • massive parallelism, parallel file systems • teraflop speeds, terabyte storage • Greater complexity • complex data structures • complex subsetting • More emphasis on remote & distributed access

  26. … and ASCI Requirements • Compatibility with vector bundle model • Compatibility with MPI-IO • Ability to transform data between memory & storage • Parallel file systems: PIOFS, HPSS, etc.

  27. New HDF5 Features • More scalable • Larger arrays and files • More objects • Improved data model • New datatypes • Single comprehensive dataset object • Improved software • More flexible, robust library • More flexible API • More I/O options

  28. HDF5 data model • Two primary objects • Dataset • multidimensional array of elements • rich variety of datatypes • group • directory-like structure • contains datasets, groups, other objects

  29. Dataset components • multidimensional array • header with metadata • datatype • dataspace • attributes • storage properties

  30. Simple datatypes • The usual scalars: integer & float • user-defined scalars (e.g. 13-bit integers) • variable length (e.g. strings) • pointers to objects or regions of datasets • enumeration • opaque

  31. Compound datatypes • User-defined • Comparable to C structs • Members can be simple or compound types • Members can be multidimensional
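Since the slide compares compound datatypes to C structs, a struct-like sketch may help make the three member kinds concrete. The example below uses Python's standard `ctypes` module rather than the HDF5 library; the type and field names (`Position`, `Sample`, `pos`, `station`, `flags`) are made up for illustration.

```python
import ctypes

class Position(ctypes.Structure):
    # A compound type built from two simple (scalar) members.
    _fields_ = [("lat", ctypes.c_float), ("lon", ctypes.c_float)]

class Sample(ctypes.Structure):
    _fields_ = [
        ("pos", Position),                    # member that is itself compound
        ("station", ctypes.c_int16),          # simple member
        ("flags", (ctypes.c_int8 * 3) * 2),   # multidimensional (2 x 3) member
    ]

s = Sample()
s.pos.lat = 12.0
s.pos.lon = 23.0
s.station = 7
s.flags[1][2] = 1

# Size in bytes of one record: 8 (pos) + 2 (station) + 6 (flags).
record_size = ctypes.sizeof(Sample)
print(record_size, s.pos.lat, s.station, s.flags[1][2])
```

A dataset of such records is then an array whose element type is `Sample`, which is the "array of records" picture on slide 33.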

  32. Data Spaces • How data are organized to form a dataset • rank • dimensions • Subsetting during I/O operations • What subset of data is to be moved • In-memory organization of data • In-file organization of data

  33. HDF5 dataset: array of records • Datatype: Record • Dimensionality: 5 x 3 [Figure: a 5 x 3 array in which each element is a record with fields of type int8, int4, int16, float32.]

  34. Dataspaces: Reading Dataset into Memory from File [Figure: a 2D array of integers in the file is read into a 3D array of floats in memory.]

  35. Selection: Examples of mappings between file selections and memory selections. (a) A hyperslab from a 2D array to the corner of a smaller 2D array (b) A regular series of blocks from a 2D array to a contiguous sequence at a certain offset in a 1D array (c) A sequence of points from a 2D array to a sequence of points in a 3D array. (d) Union of slabs in file to union of slabs in memory. No. of elements must be equal.
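Mapping (b) on slide 35 can be sketched in a few lines of plain Python. The `start`, `stride`, `count`, `block` parameters follow the vocabulary HDF5 uses for regular hyperslab selections, but `hyperslab_coords` and `copy_selection` are hypothetical helpers that operate on nested lists, not library calls. The sketch assumes blocks do not overlap (stride is at least as large as block in every dimension).

```python
from itertools import product

def hyperslab_coords(start, stride, count, block):
    """Return the coordinates of a regular selection in row-major order."""
    per_dim = []
    for s, st, c, b in zip(start, stride, count, block):
        # 'count' blocks of 'block' consecutive indices, 'stride' apart.
        per_dim.append([s + i * st + j for i in range(c) for j in range(b)])
    return list(product(*per_dim))

def copy_selection(src, file_coords, dest, dest_offset):
    """Copy selected 2D elements into a contiguous run of a 1D list."""
    if dest_offset + len(file_coords) > len(dest):
        raise ValueError("memory selection is smaller than file selection")
    for k, (r, c) in enumerate(file_coords):
        dest[dest_offset + k] = src[r][c]
    return dest

# A 4 x 6 "file" array whose values encode their own position (10*row + col).
file_array = [[10 * r + c for c in range(6)] for r in range(4)]

# Rows 0 and 2; in each, two blocks of two columns starting at column 1.
coords = hyperslab_coords(start=(0, 1), stride=(2, 3), count=(2, 2), block=(1, 2))

memory = copy_selection(file_array, coords, [None] * 10, dest_offset=1)
print(memory)
```

The length check in `copy_selection` reflects the slide's closing rule: the file selection and the memory selection must contain the same number of elements.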

  36. Attributes • Named pieces of data • Stored in a dataset or group header • Operations are scaled-down versions of the dataset operations • Not extendible • No compression • No partial I/O

  37. Property list • Properties of objects or operations • Describe how to create, store, access and transfer data

  38. Some Properties • chunked: better subsetting access time; extendable • compressed: improves storage efficiency, transmission speed • extendable: datasets can be extended in any direction • split file: metadata in one file, raw data in another [Figure: dataset “Fred”, with its metadata and its data stored in two separate files, File A and File B.]
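The claim that chunking gives better subsetting access time can be illustrated with a little arithmetic. This is a rough sketch under a simplifying assumption: every chunk that overlaps the requested subset is read in full. The function names are hypothetical and nothing here calls the HDF5 library.

```python
from math import prod

def chunks_touched(start, count, chunk_shape):
    """Number of chunks that a rectangular subset overlaps."""
    per_dim = []
    for s, n, c in zip(start, count, chunk_shape):
        first = s // c              # chunk index holding the first element
        last = (s + n - 1) // c     # chunk index holding the last element
        per_dim.append(last - first + 1)
    return prod(per_dim)

def elements_read(start, count, chunk_shape):
    """Elements transferred if every touched chunk is read whole."""
    return chunks_touched(start, count, chunk_shape) * prod(chunk_shape)

# Request a 100 x 5 column strip (500 elements) from a 100 x 100 dataset.
strip = dict(start=(0, 0), count=(100, 5))

square_chunks = elements_read(chunk_shape=(10, 10), **strip)   # 10 x 10 tiles
row_chunks = elements_read(chunk_shape=(1, 100), **strip)      # one row per chunk
print(square_chunks, row_chunks)
```

With square tiles the strip costs 1,000 elements of I/O; with row-shaped chunks, which behave like a row-major contiguous layout, it costs all 10,000. The right chunk shape depends on how the data will be subset.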

  39. Dataset components [Figure: a dataset consists of metadata and data.] • Dataspace: Rank=2; Dim_1=5, Dim_2=4, Dim_3=2 • Datatype: int16 • Storage properties: chunked; compressed • Attributes: time = 32.4, pressure = 987, temp = 56

  40. Groups • Structures for organizing the file • Like Vgroups in HDF4 • Like directories in hierarchical file system • Every file starts with a root group • Groups have attributes

  41. Groups • A mechanism for collections of related objects • Every file starts with a root group • Can have attributes • Like directories in Unix, but a graph, rather than a tree [Figure: a graph of groups starting at “root”.]

  42. Groups • Groups and members of groups can be shared [Figure: a graph of groups starting at “root”, with members reachable through more than one group.]
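The difference between a tree and a graph of groups is easy to see in a toy model. The sketch below uses plain Python objects; the `Group` class, `resolve` function, and the names `sim`, `obs`, `temp` are all invented for illustration and are not part of any HDF API.

```python
class Group:
    """A named collection of members; each member is a Group or a dataset."""
    def __init__(self):
        self.members = {}   # member name -> Group or dataset object
        self.attrs = {}     # groups can carry attributes too

def resolve(root, path):
    """Follow a slash-separated path starting at the root group."""
    node = root
    for name in path.strip("/").split("/"):
        if name:
            node = node.members[name]
    return node

root = Group()
root.members["sim"] = Group()
root.members["obs"] = Group()

# One dataset object, linked from two different groups (a shared member).
temperature = [3.1, 4.2, 3.6]
root.members["sim"].members["temp"] = temperature
root.members["obs"].members["temp"] = temperature

same_object = resolve(root, "/sim/temp") is resolve(root, "/obs/temp")
print(same_object)
```

In a strict directory tree each object has exactly one parent; here both paths lead to the same object, so a change made through one path is visible through the other.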

  43. Mounting [Figure: two files, File A and File B, each with its own root group; one file is mounted into the group structure of the other.]

  44. Reading & writing with HDF5 • Set properties • Describe the data • datatypes • rank and dimensions • mapping between file and memory • Read/write

  45. Files needn’t be files: Virtual File Layer • VFL: a public API for writing I/O drivers [Diagram: the hid_t “File” handle sits above the VFL (Virtual File I/O Layer), which dispatches to I/O drivers (stdio, mpio, memory, network) over the underlying “storage” (files, memory, network).]
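The idea behind a virtual file layer is that library code talks to a small driver interface and never to the storage directly. The following is a minimal sketch of that pattern in Python; the `MemoryDriver` and `StdioDriver` classes and their `read`/`write` methods are assumptions made for illustration and do not reproduce the actual VFL API.

```python
import os
import tempfile

class MemoryDriver:
    """Keeps the whole 'file' in a bytearray."""
    def __init__(self):
        self.buf = bytearray()

    def write(self, offset, data):
        end = offset + len(data)
        if end > len(self.buf):
            # Grow the buffer with zero bytes, as a sparse file would.
            self.buf.extend(b"\0" * (end - len(self.buf)))
        self.buf[offset:end] = data

    def read(self, offset, size):
        return bytes(self.buf[offset:offset + size])

class StdioDriver:
    """Stores the bytes in an ordinary file on disk."""
    def __init__(self, path):
        self.f = open(path, "w+b")

    def write(self, offset, data):
        self.f.seek(offset)
        self.f.write(data)

    def read(self, offset, size):
        self.f.seek(offset)
        return self.f.read(size)

    def close(self):
        self.f.close()

def store_and_fetch(driver):
    """'Library-side' code: identical whichever driver is plugged in."""
    driver.write(8, b"HDF5 data")
    return driver.read(8, 9)

from_memory = store_and_fetch(MemoryDriver())
with tempfile.TemporaryDirectory() as tmp:
    disk = StdioDriver(os.path.join(tmp, "demo.bin"))
    from_disk = store_and_fetch(disk)
    disk.close()
print(from_memory == from_disk)
```

An MPI-IO or network driver would slot in the same way: it only has to provide the same two operations.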

  46. HDF5 tools • Current • hdf5ls - lists contents of HDF5 file • h5dumper - higher level view • hdf5-to-hdf4 converter • Future • Convert between HDF5 and ascii, binary, GIFF, etc. • Convert HDF4 to HDF5 • Java tools - VisAD, etc. • File/code generation from DDL description • Talking to vendors

  47. Other HDF5 activities • Performance tuning • Object model • Fortran and C++ API • Thread-safe HDF5

  48. IV. HDF4 vs. HDF5

  49. HDF4 vs. HDF5 • HDF4 • Original format and library • Compatible with all earlier versions • 6 primary objects: multidim array of scalars, raster image, palette, table, annotation, group • Biggest current user: Earth Observing System Data and Info System (EOSDIS) • HDF5 - successor to HDF4 • New format and library • Not compatible with earlier versions • 2 primary objects: multidim. array of records, group • Biggest current user: Accelerated Strategic Computing Initiative (ASCI)

  50. HDF4 object types can be derived from HDF5 datasets and groups [Figure: an HDF5 group corresponds to an HDF4 Vgroup. An HDF5 dataset covers the HDF4 SDS (n-dim array of scalars), the HDF4 Vdata (1-dim array of records, shown as a lat/lon/temp table), the HDF4 24-bit and 8-bit rasters (labelled "2-dim array of multi-component scalars"), and a text annotation ("March 15, 1990. Simulation with k=10.0, beta=1.22e3. Calculate the magnitude ...").]
