Unidata Infrastructure for Data Services

Russ Rew, GO-ESSP Workshop, LLNL, 2006-06-19.

Presentation Transcript


  1. Unidata Infrastructure for Data Services • Russ Rew • GO-ESSP Workshop, LLNL • 2006-06-19

  2. Some Current Unidata Infrastructure Projects • LDM for distributing and processing near real-time data • Integrated Data Viewer (IDV) for testing infrastructure in platform-independent data visualization and analysis • NetCDF C-based interfaces for data access • CFIOlib for a CF conventions API (tomorrow) • NetCDF Java for advanced data access infrastructure • Common Data Model for improving interoperability • NcML for metadata annotation and data aggregation • THREDDS Data Server (TDS) for remote access to archives • GALEON for serving netCDF data through OGC Web Coverage Services (WCS)

  3. LDM-6 for Internet Data Distribution • Implements a peer-to-peer system for reliable, event-driven data distribution • Supports subscriptions to many near real-time data feeds; no data center needed • Data product abstraction is general: model output, observations, text products, satellite data, radar, … • Protocols use persistent connections to achieve low latency • Highly configurable: inject, distribute, capture, filter, and process arbitrary data products • In continuous use by over 160 universities, NOAA, USGS, NASA, internationally, THORPEX global ensembles (TIGGE), … • Candidate for use in new WMO weather information system
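The subscription and filtering idea on this slide can be illustrated with a small in-process sketch. This is not LDM code and does not model its network protocol: `Product`, `Feed`, `subscribe`, and `inject` are hypothetical names, and the product identifiers are made up. The only ideas carried over from the slide are that a data product is an opaque payload with an identifier, and that subscribers receive matching products as soon as they are injected rather than by polling.

```python
import re
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Product:
    """An opaque data product: an identifier plus arbitrary bytes."""
    ident: str
    data: bytes


@dataclass
class Feed:
    """Hypothetical in-memory stand-in for one data feed."""
    subscribers: List[Tuple[re.Pattern, Callable[[Product], None]]] = field(
        default_factory=list
    )

    def subscribe(self, pattern: str, handler: Callable[[Product], None]) -> None:
        # A subscriber asks only for products whose identifier matches.
        self.subscribers.append((re.compile(pattern), handler))

    def inject(self, product: Product) -> int:
        # Event-driven: handlers run when a product arrives, with no polling.
        delivered = 0
        for pattern, handler in self.subscribers:
            if pattern.search(product.ident):
                handler(product)
                delivered += 1
        return delivered


feed = Feed()
radar_seen: List[str] = []
everything_seen: List[str] = []
feed.subscribe(r"^radar/", lambda p: radar_seen.append(p.ident))
feed.subscribe(r".*", lambda p: everything_seen.append(p.ident))

feed.inject(Product("radar/KFTG/0001", b"\x00\x01"))
feed.inject(Product("model/gfs/f006", b"\x02"))
```

After the two injections, `radar_seen` holds only the radar product while `everything_seen` holds both, which mirrors the "capture, filter, and process" wording of the slide in miniature.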

  4. IDV (Integrated Data Viewer) • Freely available 100% Java reference application and framework for visualization and analysis of geoscience data • Provides integrated and time-synchronized 2-D and 3-D visualizations of model output, observations, and remotely sensed data, using the University of Wisconsin's VisAD • Handles diverse formats and protocols for local and remote access: GRIB, netCDF, OPeNDAP, ADDE, HTTP, GIS, … • Serves as an end-to-end test for many Unidata technologies: THREDDS services, Java netCDF, XML bundles, plug-in architecture, interactive collaboration, …

  5. NetCDF’s Niche • Simple data model for scientific datasets • Portable, self-describing data • Appendable, sharable, archivable • Direct access for efficient subsetting • Metadata via attribute conventions such as CF • Flexible remote access via OPeNDAP, HTTP, WCS • Lots of applications: NCO, ncbrowse, ncview, IDV, IDL, MATLAB, ArcGIS, ... • Language interfaces include C, Java, Fortran, C++, Perl, Python, Ruby, ...

  6. NetCDF-3 Data Model • File (location: Filename; create( ), open( ), …) • DataType (char, byte, short, int, float, double) • Attribute (name: String; type: DataType; values: 1D array) • Dimension (name: String; length: int; isUnlimited( )) • Variable (name: String; shape: Dimension[ ]; type: DataType; array: read( ), …) • Variables and attributes have one of six primitive data types. • A file has named variables, dimensions, and attributes. • Variables also have attributes. • Variables may share dimensions, indicating a common grid. • One dimension may be of unlimited length.
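As a rough illustration of the classes named on this slide, the data model can be written down as a few Python dataclasses. This is a sketch of the model only, not the netCDF library API: the class layout, the `add_dimension` helper, the use of `None` for an unlimited length, and the example names and sizes are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# The six primitive types listed on the slide.
PRIMITIVE_TYPES = ("char", "byte", "short", "int", "float", "double")


@dataclass
class Dimension:
    name: str
    length: Optional[int]  # None marks the unlimited dimension (assumption)

    def is_unlimited(self) -> bool:
        return self.length is None


@dataclass
class Attribute:
    name: str
    type: str
    values: list  # always a 1-D array of values


@dataclass
class Variable:
    name: str
    type: str
    shape: List[Dimension]
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def __post_init__(self) -> None:
        if self.type not in PRIMITIVE_TYPES:
            raise ValueError(f"unsupported type: {self.type}")


@dataclass
class File:
    location: str
    dimensions: Dict[str, Dimension] = field(default_factory=dict)
    variables: Dict[str, Variable] = field(default_factory=dict)
    attributes: Dict[str, Attribute] = field(default_factory=dict)

    def add_dimension(self, dim: Dimension) -> Dimension:
        # At most one dimension may be unlimited in netCDF-3.
        if dim.is_unlimited() and any(
            d.is_unlimited() for d in self.dimensions.values()
        ):
            raise ValueError("netCDF-3 allows only one unlimited dimension")
        self.dimensions[dim.name] = dim
        return dim


f = File("example.nc")
time = f.add_dimension(Dimension("time", None))
lat = f.add_dimension(Dimension("lat", 73))
lon = f.add_dimension(Dimension("lon", 144))

# Two variables that share the same dimensions lie on a common grid.
for name in ("temperature", "pressure"):
    f.variables[name] = Variable(name, "float", [time, lat, lon])
f.variables["temperature"].attributes["units"] = Attribute("units", "char", list("K"))
```

Trying to add a second unlimited dimension, or a variable of type `string`, raises `ValueError` in this sketch, which corresponds to two of the netCDF-3 limitations listed on the next slide.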

  7. Some NetCDF-3 Limitations • Only one shared unlimited dimension • No structures, just scalars and multidimensional arrays • No strings, just arrays of characters • Limited numeric types • No ragged arrays or nested structures • Only ASCII characters in names • Changes to file schema can be expensive • Efficient access requires reads in same order as writes • No built-in compression • Only serial I/O • Flat name space limits scalability
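The point about read order can be made concrete with a little arithmetic. The sketch below assumes a single variable stored contiguously in row-major order and ignores details such as how record variables are interleaved in a real file; the function names and the array shape are hypothetical. It counts how many separate contiguous runs are needed to read a given subset.

```python
from itertools import product
from typing import Sequence


def flat_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Element offset of `index` in a row-major array of `shape`."""
    offset = 0
    for i, n in zip(index, shape):
        offset = offset * n + i
    return offset


def contiguous_runs(
    start: Sequence[int], count: Sequence[int], shape: Sequence[int]
) -> int:
    """Number of separate contiguous runs needed to read a hyperslab."""
    ranges = [range(s, s + c) for s, c in zip(start, count)]
    offsets = sorted(flat_offset(idx, shape) for idx in product(*ranges))
    runs = 1
    for prev, cur in zip(offsets, offsets[1:]):
        if cur != prev + 1:
            runs += 1
    return runs


shape = (100, 20, 30)  # hypothetical (time, lat, lon) variable

# One full time step matches the write order: a single contiguous run.
one_step = contiguous_runs((0, 0, 0), (1, 20, 30), shape)

# A time series at one grid point cuts across the write order:
# one separate run per time step.
one_point = contiguous_runs((0, 5, 7), (100, 1, 1), shape)
```

Under these assumptions a whole time step is one run, while a 100-step time series at a single grid point needs 100 runs of one element each.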

  8. NetCDF-4 Features to Address Limitations • Multiple unlimited dimensions • Portable structured types • String type • Additional numeric types • Variable-length types for ragged arrays • Unicode names • Efficient dynamic schema changes • Multidimensional tiling (chunking) • Per variable compression • Parallel I/O • Nested scopes using Groups

  9. NetCDF-4 Data Model (Common Data Access Model) • File (location: Filename; create( ), open( ), …) • Group (name: String) • Dimension (name: String; length: int; isUnlimited( )) • Variable (name: String; shape: Dimension[ ]; type: DataType; array: read( ), …) • Attribute (name: String; type: DataType; values: 1D array) • DataType is either a PrimitiveType (char, byte, short, int, int64, float, double, unsigned byte, unsigned short, unsigned int, unsigned int64, string) or a UserDefinedType (typename: String; one of Enum, Opaque, Compound, VariableLength) • Variables and attributes have one of twelve primitive data types or one of four user-defined types. • A file has a top-level unnamed group. • Each group may contain one or more named subgroups, variables, dimensions, and attributes. • Variables also have attributes. • Variables may share dimensions, indicating a common grid. • One or more dimensions may be of unlimited length.
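The nested scopes introduced by groups can be sketched as a small tree. This is an illustration, not the netCDF-4 API: the `Group` class, its methods, and the group and dimension names are hypothetical, and the lookup rule shown (a dimension is visible in the group that defines it and in that group's descendants) is stated here as an assumption about how nested scopes behave.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # compare groups by identity; the tree has parent links
class Group:
    name: str
    parent: Optional["Group"] = None
    # Dimension name -> length; None marks an unlimited dimension (assumption).
    dimensions: Dict[str, Optional[int]] = field(default_factory=dict)
    groups: Dict[str, "Group"] = field(default_factory=dict)

    def add_group(self, name: str) -> "Group":
        child = Group(name, parent=self)
        self.groups[name] = child
        return child

    def find_dimension(self, name: str) -> Optional["Group"]:
        """Return the nearest enclosing group that defines `name`, if any."""
        group: Optional[Group] = self
        while group is not None:
            if name in group.dimensions:
                return group
            group = group.parent
        return None

    def path(self) -> str:
        if self.parent is None:
            return "/"
        return self.parent.path().rstrip("/") + "/" + self.name


root = Group("")                 # the top-level group is unnamed
root.dimensions["time"] = None   # unlimited
model = root.add_group("model")
run = model.add_group("run1")
run.dimensions["level"] = None   # more than one unlimited dimension is allowed
```

In this sketch `run.find_dimension("time")` resolves to the root group, while `root.find_dimension("level")` finds nothing, since `level` is scoped to `/model/run1`.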

  10. NetCDF-4 Architecture • NetCDF-4 uses HDF5 for storage, high performance • Parallel I/O • Chunking for efficient access in different orders, efficient use of compression • Conversion using “reader makes right” approach • Provides simple netCDF interface to subset of HDF5 • Also supports netCDF classic and 64-bit formats • [Diagram labels: NetCDF Java, NetCDF-3, NetCDF-4, and HDF5 applications; netCDF Java, netCDF-4, netCDF-3, and HDF5 libraries; Java VM, POSIX I/O, MPI I/O, …]
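Why chunking helps with access in different orders can be shown by counting how many chunks a read has to touch. The sketch below is plain arithmetic rather than HDF5 or netCDF-4 code; the `chunks_touched` helper, the array shape, and the chunk shapes are all hypothetical.

```python
from math import prod
from typing import Sequence


def chunks_touched(
    start: Sequence[int], count: Sequence[int], chunk: Sequence[int]
) -> int:
    """Number of chunks that intersect the hyperslab [start, start + count)."""
    per_axis = []
    for s, c, k in zip(start, count, chunk):
        first = s // k            # index of the first chunk on this axis
        last = (s + c - 1) // k   # index of the last chunk on this axis
        per_axis.append(last - first + 1)
    return prod(per_axis)


# Hypothetical (time, lat, lon) variable of shape (100, 20, 30).
one_step = ((0, 0, 0), (1, 20, 30))    # a full time step
one_point = ((0, 5, 7), (100, 1, 1))   # a time series at one grid point

# Layout A: each time step is one chunk, similar to a record-ordered layout.
by_record = (1, 20, 30)
# Layout B: cubic tiles that favor neither access order.
tiled = (10, 10, 10)

step_a = chunks_touched(*one_step, by_record)
point_a = chunks_touched(*one_point, by_record)
step_b = chunks_touched(*one_step, tiled)
point_b = chunks_touched(*one_point, tiled)
```

With these numbers the record-ordered layout touches 1 chunk for the time step but 100 for the time series, while the tiled layout touches 6 and 10. Neither read is optimal in the tiled case, but neither is pathological, which is the trade-off chunking is meant to offer.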

  11. Status of NetCDF-4 • NetCDF-4.0-alpha14 currently available for testing • Files created with alpha release use unsupported artifacts • We’re seeking feedback on performance and functionality • NetCDF-4.0-beta waiting for HDF5 1.8-beta • Will finalize file format, eliminate necessity for artifacts • Expected within a few weeks of HDF5 1.8-beta release, maybe by August 2006 • HDF5 1.8 currently expected by November 2006 • Has enhancements specifically for netCDF-4: variable creation order, Unicode names, dimension scales, on-the-fly numeric conversions • Plans for netCDF-4.1 and beyond on netCDF-4 web site

  12. Summary • Unidata’s LDM-6 implements an event-driven architecture for low-latency data distribution • Unidata’s IDV provides a platform-independent visualization and analysis framework and reference application for integrating data from diverse sources • Unidata’s netCDF-4 software preserves backward compatibility and eliminates many limitations of netCDF-3 with only a modest increase in complexity
