UML Representation of NPOESS Data Products in HDF5

Denise Dulaigh, Chad Johnson, William Johnsen, Ph.D.
NPOESS Program, Raytheon Company, Aurora, Colorado

The National Polar-orbiting Operational Environmental Satellite System (NPOESS)
• Collects and distributes global satellite sensor data and creates NPOESS Data Products.
• NPOESS Data Products are packaged into HDF5 files and delivered.
• Delivered data can be read on any platform supported by HDF5.
• There are several different types of data, discussed in later slides.
• To meet operational latency requirements, the data is originally produced in granules (data blocks) that allow the system to operate efficiently.
• A change in the size of the granules critically affects production efficiency.
• Native metadata is associated with granules and aggregated granules.
• Geolocation data in separate HDF5 files is referenced from NPOESS Data Product HDF5 files.

NPOESS data – user view
• How does our granule size affect users?
• The production granule size may not be the most desirable unit in which to receive, analyze, store, or use the contained data.
• The characteristics (metadata) of individual granules need to be retained for analysis and archival purposes.
• To increase usability:
  • NPOESS provides a means to package (aggregate) multiple granules into a single deliverable HDF5 file.
  • Metadata names are FGDC compliant.
  • For consistent access to data in the produced HDF5 files, data is accessed through hyperslab references.
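The idea of reaching data through hyperslab references can be illustrated with a small conceptual sketch. It does not use the HDF5 library: the `Hyperslab` class, the `read_hyperslab` helper, and the array sizes are all assumptions made for illustration, and the selection is simplified to a start offset and a count per dimension (HDF5 hyperslabs also allow stride and block).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Hyperslab:
    """Simplified hyperslab: a start offset and an element count per dimension."""
    start: Tuple[int, int]
    count: Tuple[int, int]


def read_hyperslab(data: List[List[int]], slab: Hyperslab) -> List[List[int]]:
    """Return the rectangular block of `data` selected by `slab`."""
    (r0, c0), (nr, nc) = slab.start, slab.count
    if r0 < 0 or c0 < 0 or r0 + nr > len(data) or (data and c0 + nc > len(data[0])):
        raise IndexError("hyperslab lies outside the data array")
    return [row[c0:c0 + nc] for row in data[r0:r0 + nr]]


# Hypothetical aggregated array: 3 granules of 2 rows each, 4 columns wide.
aggregated = [[10 * r + c for c in range(4)] for r in range(6)]

# One reference per granule; each selects only that granule's rows.
granule_refs = [Hyperslab(start=(2 * g, 0), count=(2, 4)) for g in range(3)]

second_granule = read_hyperslab(aggregated, granule_refs[1])
print(second_granule)
```

Because every granule is reached through the same kind of reference, a reader can use one access path whether a file holds a single granule or an aggregation of many.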

NPOESS data types
• Raw Data Records (RDRs) – Full resolution, unprocessed, time-referenced digital sensor data and other data required to compute SDRs.
• Temperature Data Records (TDRs) – Geolocated antenna temperatures with all relevant calibration data counts and ephemeris data, from passive microwave sensors only.
• Sensor Data Records (SDRs) – Computed from RDRs; full resolution sensor data that are time referenced, earth located, and calibrated.
• Environmental Data Records (EDRs) – Data records that contain the environmental parameters or imagery required to be generated as user products, produced by applying an appropriate set of algorithms to SDRs.
• Deliverable Intermediate Products (IPs) – Data records that are produced by applying an appropriate set of algorithms to SDRs but are not on the official list of deliverable EDRs. Currently, Cloud Mask is the only deliverable IP.
• Selected data, depending on the user: auxiliary data, ancillary data.
• The UML diagrams, data structures, and data types of all NPOESS HDF5 data are documented in the Common Data Format Control Book – External (CDFCB-X).
** Formal definitions (not operational) from the NPOESS Glossary

Schematic of an HDF5 Dataset
• A multidimensional array of data elements
• Header with metadata
  • Dataspace (intrinsic)
  • Datatype (intrinsic)
  • Storage layout
  • Attributes
[Figure: a Dataset drawn as a Header plus a Data array. Header contents shown: Datatype = int16; Dataspace with rank and dimensions Dim_1=5, Dim_2=4, Dim_3=2; Storage layout = chunked; compressed; Attributes: start time = 32.4, data type = 'SDR', algorithm = '1.1'.]
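The pieces of the dataset header can be written down as a small data structure. This is a conceptual model only, not the HDF5 API: the `DatasetHeader` class and the `ITEM_SIZE` table are assumptions for illustration, and the values are the ones shown on the slide. The rank here is simply derived from the three listed dimensions.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Dict, Tuple, Union


@dataclass
class DatasetHeader:
    """Conceptual stand-in for the header that accompanies a dataset's array."""
    datatype: str                     # element type, e.g. "int16"
    dims: Tuple[int, ...]             # dataspace dimensions
    layout: str                       # storage layout description
    attributes: Dict[str, Union[str, float]] = field(default_factory=dict)

    @property
    def rank(self) -> int:
        # In this sketch the rank is the number of dimensions.
        return len(self.dims)

    @property
    def element_count(self) -> int:
        return prod(self.dims)


# Bytes per element for the one datatype used in this sketch.
ITEM_SIZE = {"int16": 2}

header = DatasetHeader(
    datatype="int16",
    dims=(5, 4, 2),
    layout="chunked; compressed",
    attributes={"start time": 32.4, "data type": "SDR", "algorithm": "1.1"},
)

uncompressed_bytes = header.element_count * ITEM_SIZE[header.datatype]
print(header.rank, header.element_count, uncompressed_bytes)
```

The dataspace and datatype are intrinsic because the array cannot be interpreted without them, whereas the attributes carry user-defined metadata.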

Schematic of an HDF5 File: Aggregated Granule
[Figure: File with a Root (Root Attributes). Under the Root: a Data Product group (Data Product Attributes) holding Granule 1 through Granule N and Aggregated Granules, each with an Array Datatype; Secondary Data (Secondary Data Attributes); and a Data group. The Array Datatypes point into the Data group through Data Ref 1, Data Ref 2, and Data Ref 3.]
* Data arrays grow in 1 direction only

Schematic of an HDF5 File: Granule 1
[Figure: File with a Root (Root Attributes). Under the Root: a Data Product group (Data Product Attributes) holding Granule 1 through Granule N and Aggregated Granules, each with an Array Datatype; Secondary Data (Secondary Data Attributes); and a Data group. The Array Datatypes point into the Data group through Data Ref 1, Data Ref 2, and Data Ref 3.]
* Data arrays grow in 1 direction only
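The note that data arrays grow in one direction only can be sketched as an append-only aggregate. This is a plain-Python illustration under assumptions, not NPOESS or HDF5 code: the `AggregatedProduct` class, the `granule_id` attribute, and the sizes are hypothetical. Each appended granule extends the array along the row axis, keeps its own metadata, and is recorded as a (first row, row count) reference.

```python
from typing import Dict, List, Tuple


class AggregatedProduct:
    """Conceptual aggregate whose data array grows along the row axis only."""

    def __init__(self, row_width: int) -> None:
        self.row_width = row_width                       # fixed dimension
        self.data: List[List[float]] = []                # shared data array
        self.granule_refs: List[Tuple[int, int]] = []    # (first_row, row_count)
        self.granule_attrs: List[Dict[str, str]] = []    # per-granule metadata

    def append_granule(self, rows: List[List[float]], attrs: Dict[str, str]) -> None:
        # Rows must match the fixed width, so growth happens in one direction.
        if any(len(row) != self.row_width for row in rows):
            raise ValueError("granule width differs from the aggregate's fixed width")
        self.granule_refs.append((len(self.data), len(rows)))
        self.granule_attrs.append(dict(attrs))
        self.data.extend(list(row) for row in rows)

    def granule(self, index: int) -> List[List[float]]:
        """Recover one granule's rows from the shared array via its reference."""
        first_row, row_count = self.granule_refs[index]
        return self.data[first_row:first_row + row_count]


product = AggregatedProduct(row_width=3)
product.append_granule([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], {"granule_id": "G1"})
product.append_granule([[7.0, 8.0, 9.0]], {"granule_id": "G2"})

print(product.granule_refs)
print(product.granule(1), product.granule_attrs[1])
```

Keeping the other dimensions fixed means earlier references stay valid as more granules are appended.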

UML CASE tools to describe HDF5 file layout
• To more efficiently communicate and better control our HDF5 structures, we use Computer-Aided Software Engineering (CASE) tools to design the structure.
• Since HDF5 data structures have object-oriented characteristics, we use Unified Modeling Language (UML) to describe the nature of our HDF file structure.
• HDF5 groups, datasets, and array datatypes are UML classes; HDF5 attributes and references are UML attributes. Stereotypes are used to differentiate HDF5 objects.
• UML class derivation (HDF5 group inclusion) and multiplicity (HDF5 file structural relationships) give the user an idea about what to expect before viewing the HDF5 file.
• Data is stored in a separate HDF5 group (Data group).
• Each array datatype contains one or more hyperslab references to a subset of data in the Data group.
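The mapping from HDF5 objects to UML elements can be sketched as a small renderer. Everything named here is an assumption for illustration: the `Hdf5Object` class, the `HyperslabRef` type label, the example object names, and the output style (class text with `<<stereotype>>` markers, in the manner of text-based UML tools). It is not the CASE tooling described above.

```python
from dataclasses import dataclass, field
from typing import List

# HDF5 object kinds that become UML classes; the kind doubles as the stereotype.
UML_CLASS_KINDS = {"Group", "Dataset", "ArrayDatatype"}


@dataclass
class Hdf5Object:
    name: str
    kind: str                                             # one of UML_CLASS_KINDS
    attributes: List[str] = field(default_factory=list)   # HDF5 attributes
    references: List[str] = field(default_factory=list)   # hyperslab references


def to_uml(obj: Hdf5Object) -> str:
    """Render one HDF5 object as a stereotyped UML class in text form."""
    if obj.kind not in UML_CLASS_KINDS:
        raise ValueError(f"{obj.kind!r} is not modeled as a UML class")
    lines = [f"class {obj.name} <<{obj.kind}>> {{"]
    lines += [f"  {attr}" for attr in obj.attributes]             # UML attributes
    lines += [f"  {ref} : HyperslabRef" for ref in obj.references]
    lines.append("}")
    return "\n".join(lines)


# Hypothetical objects; the names are not taken from the CDFCB-X.
granule = Hdf5Object("Granule1", "Dataset", attributes=["StartTime", "Algorithm"])
array = Hdf5Object("GranuleArray", "ArrayDatatype", references=["DataRef1"])

uml_text = to_uml(granule) + "\n" + to_uml(array)
print(uml_text)
```

The stereotype is what lets a reader tell a group from a dataset or an array datatype when all three appear as classes in the same diagram.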

Schematic to UML mapping
[Figure: the file schematic (File, Root, Root Attributes, Data Product, Data Product Attributes, Dataset, Attributes, Datatype, Data Ref) shown beside its UML elements, with the following legend.]
• Group – HDF structural element
• Dataset – a Group that also contains a header and data
• Datatype – internal to HDF, used by datasets to reference read functions, included in Dataset
• Attributes – where the NPOESS metadata is stored
• Data Ref – hyperslab reference to Data group
* Data types are explained on chart 15-16

Example UML HDF File Structure (EDR)
• There is a single primary Product Group and zero or more secondary Groups of supporting data.
• Each Product Group is accessed from the <<Root>> HDF element.
• Multiple Granule datasets may be provided per Product Group, depending on the data requested from NPOESS. An Aggregation dataset always exists, regardless of the number of Granules. The Aggregation dataset contains aggregated attributes that characterize the set of all Granules.
• Each Granule dataset, as well as the Aggregation dataset, contains a single Array Datatype.
• The Array Datatype contains hyperslab references to the Data group.
* Data types are explained on chart 15-16
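The multiplicity rules in this slide can be expressed as a small structural model. The class names, reference strings, and the `check_structure` helper below are hypothetical and only mirror the bullets: one primary Product Group, zero or more secondary Groups, an Aggregation dataset that is always present, and one Array Datatype per dataset holding at least one hyperslab reference.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ArrayDatatype:
    data_refs: List[str]                 # hyperslab references into the Data group


@dataclass
class Dataset:
    name: str
    array: ArrayDatatype                 # exactly one Array Datatype per dataset


@dataclass
class ProductGroup:
    name: str
    aggregation: Dataset                 # required: always present
    granules: List[Dataset] = field(default_factory=list)


@dataclass
class Root:
    product: ProductGroup                # single primary Product Group
    secondary: List[str] = field(default_factory=list)   # zero or more groups


def check_structure(root: Root) -> List[str]:
    """Return a list of rule violations; an empty list means the layout is valid."""
    problems = []
    for ds in [root.product.aggregation, *root.product.granules]:
        if not ds.array.data_refs:
            problems.append(f"{ds.name}: Array Datatype has no hyperslab reference")
    return problems


# Hypothetical EDR layout with two granules and one secondary group.
valid = Root(
    product=ProductGroup(
        name="ExampleEDR",
        aggregation=Dataset("Aggregation", ArrayDatatype(["ref_all"])),
        granules=[
            Dataset("Granule_0", ArrayDatatype(["ref_0"])),
            Dataset("Granule_1", ArrayDatatype(["ref_1"])),
        ],
    ),
    secondary=["ExampleSecondaryGroup"],
)
print(check_structure(valid))
```

Making `aggregation` a required field while `granules` and `secondary` default to empty lists encodes the "always exists" and "zero or more" multiplicities directly in the types.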

Example Complete HDF5 File Structure (EDR)
[Figure: UML diagram of a complete EDR file, with callouts for the Root Group, the Data Product Group, the Aggregate Dataset, the Granule Dataset, the Secondary Group (as applicable), and the Data Group (hidden from view).]
* Data types are explained on chart 15-16

Example Complete HDF5 File Structure (EDR) Granule Dataset * Data types are explained on chart 15-16

HDF5 file structure – Example CDFCB-X Table for RDR Product Granule Group (primary data) * Data types are explained on chart 15-16

HDF5 file structure – Example CDFCB-X Table for RDR Auxiliary Data Calibration Coefficients Group (secondary data) * Data types are explained on chart 15-16

HDF5 file structure – HDF5 data type to CDFCB-X cross-reference

HDF5 file structure – HDF5 attribute to CDFCB-X cross-reference

Outstanding considerations • Chunking for SDRs and EDRs • Compression functions and analysis of NPOESS data records • Storage of bit-meaningful data in bit-field data types
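To make the last consideration concrete, here is a minimal sketch of what bit-meaningful data looks like when several fields share one byte. The flag names, bit positions, and the two-bit confidence field are hypothetical and are not taken from the CDFCB-X; the sketch uses plain Python integers rather than an HDF5 bit-field datatype.

```python
from enum import IntFlag


class QualityFlags(IntFlag):
    """Hypothetical one-bit flags stored in the low bits of a quality byte."""
    CLOUDY = 0b001
    OVER_LAND = 0b010
    NIGHT = 0b100


def extract_field(value: int, offset: int, width: int) -> int:
    """Return the `width`-bit field that starts `offset` bits above bit 0."""
    return (value >> offset) & ((1 << width) - 1)


# Pack two flags plus a hypothetical 2-bit confidence level (bits 3-4) in one byte.
confidence = 2
packed = int(QualityFlags.CLOUDY | QualityFlags.NIGHT) | (confidence << 3)

flags = QualityFlags(extract_field(packed, offset=0, width=3))
level = extract_field(packed, offset=3, width=2)

print(packed, QualityFlags.NIGHT in flags, QualityFlags.OVER_LAND in flags, level)
```

A reader who sees only the integer 21 cannot tell that it holds three flags and a level, which is why the choice of datatype and its documentation matter for such fields.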

Questions/Comments
