CDM / TDS Point Observation Data
John Caron
UCAR/Unidata
July 02, 2009
Unidata’s Common Data Model
• Abstract data model for scientific data
• And an implementation
    • NetCDF-Java library
    • Selected capabilities are being pushed into netCDF4 C library
• Translate file’s “native data model” into higher-level semantic model
    • “bottom-up” vs “top-down” design
CDM/TDS Software Stack
[Figure: layered software stack. THREDDS Data Server: WCS, WMS, OPeNDAP, NetcdfSubset, HTTPServer. Netcdf-Java CDM Library: Feature types, Coordinate Systems, NcML, Data Access. File Format Readers: NetCDF, HDF, OPeNDAP, USPLN, NEXRAD, GEMPAK, GINI, MCIDAS, GRIB, BUFR, …]
CDM Clients and Servers
[Figure: a CDM-based server (e.g. TDS) and a CDM-based client (e.g. IDV), each with its own file readers. Labels: Web Services, Remote access, Local Datasets.]
Feature Types
• Abstract data model for scientific data
• And an implementation in the CDM
• Specialized APIs
    • Subset by space/time coordinates
    • vs subset by index ranges
• Intended to scale to large, multifile collections
Feature Types: Gridded Data
    float data(t,z,y,x);
• Grid: multidimensional grid, separable coordinates
• Radial: a connected set of radials using polar coordinates, collected into sweeps
• Swath: a two-dimensional grid, track and cross-track coordinates
Feature Types: Point Data
    float data(sample);
• Point: measured at one point in time and space
• Station: time-series of points at the same location
• Profile: points along a vertical line
• Station Profile: a time-series of profiles at the same location
• Trajectory: points along a 1D curve in time/space
• Section: a collection of profile features which originate along a trajectory
Point Feature Dataset API
    Iterator<PointData> getData(LatLonRect boundingBox, Date start, Date end);
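The signature above subsets by coordinates (a bounding box and a time range) rather than by array indices. A minimal sketch of that idea follows. It is not the netCDF-Java implementation: `PointObs`, `BoundingBox`, and the in-memory list are hypothetical stand-ins, `Instant` replaces `Date` for convenience, and the box is assumed not to cross the dateline.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-ins for illustration; these are NOT the netCDF-Java classes. */
class PointSubsetSketch {

    /** One observation: a value measured at a single point in space and time. */
    static final class PointObs {
        final double lat, lon, value;
        final Instant time;

        PointObs(double lat, double lon, Instant time, double value) {
            this.lat = lat;
            this.lon = lon;
            this.time = time;
            this.value = value;
        }
    }

    /** Simple lat/lon box; assumed not to cross the dateline. */
    static final class BoundingBox {
        final double latMin, latMax, lonMin, lonMax;

        BoundingBox(double latMin, double latMax, double lonMin, double lonMax) {
            this.latMin = latMin;
            this.latMax = latMax;
            this.lonMin = lonMin;
            this.lonMax = lonMax;
        }

        boolean contains(double lat, double lon) {
            return lat >= latMin && lat <= latMax && lon >= lonMin && lon <= lonMax;
        }
    }

    /** Iterates over the observations inside the box and within [start, end], both inclusive. */
    static Iterator<PointObs> getData(List<PointObs> all, BoundingBox box,
                                      Instant start, Instant end) {
        List<PointObs> hits = new ArrayList<>();
        for (PointObs obs : all) {
            boolean inTime = !obs.time.isBefore(start) && !obs.time.isAfter(end);
            if (inTime && box.contains(obs.lat, obs.lon)) {
                hits.add(obs);
            }
        }
        return hits.iterator();
    }

    public static void main(String[] args) {
        List<PointObs> all = List.of(
            new PointObs(40.0, -105.0, Instant.parse("2009-07-02T12:00:00Z"), 21.5),
            new PointObs(10.0, 20.0, Instant.parse("2009-07-02T12:00:00Z"), 30.0));
        BoundingBox box = new BoundingBox(39.0, 41.0, -106.0, -104.0);
        Iterator<PointObs> it = getData(all, box,
            Instant.parse("2009-07-02T00:00:00Z"), Instant.parse("2009-07-03T00:00:00Z"));
        while (it.hasNext()) {
            System.out.println(it.next().value);
        }
    }
}
```

The caller never sees how the samples are laid out in the file, which is what lets the same call work across formats and multifile collections.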
New Stuff – THREDDS Data Server (TDS) 4.1
• Collections of Point Datasets
    • Like Aggregation of Gridded Data now
• Access via web services
    • Subset by bounding box, time range and parameter
    • Deliver NetCDF-CF files
• Sept 2009 release (beta)
Configuring the TDS
    <tdsConfig>
      <datasetCollection name="Metar Station Data" featureType="Station" path="station/metar">
        <collection spec="/data/metar/**/METAR_#yyyyMMdd_HHmm#.nc" recheckEvery="15 min" />
      </datasetCollection>
    </tdsConfig>
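In the collection spec, the text between the `#` marks reads as a date template that locates each file's timestamp inside its name. The sketch below shows one way such a template could be applied. It is an illustration of the idea, not the TDS code: `extractDate` is a hypothetical helper, and it assumes the literal text before the first `#` has the same length in every file name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper for illustration; not part of the TDS. */
class CollectionSpecSketch {

    /**
     * Extracts the timestamp from fileName using the date template found
     * between the two '#' marks in namePattern.
     */
    static LocalDateTime extractDate(String namePattern, String fileName) {
        int open = namePattern.indexOf('#');
        int close = namePattern.indexOf('#', open + 1);
        if (open < 0 || close < 0) {
            throw new IllegalArgumentException("no #...# date template in " + namePattern);
        }
        String template = namePattern.substring(open + 1, close); // e.g. "yyyyMMdd_HHmm"
        // Assumption: the prefix before '#' is literal, so the timestamp starts at
        // the same offset in the file name and is as long as the template.
        String stamp = fileName.substring(open, open + template.length());
        return LocalDateTime.parse(stamp, DateTimeFormatter.ofPattern(template));
    }

    public static void main(String[] args) {
        System.out.println(
            extractDate("METAR_#yyyyMMdd_HHmm#.nc", "METAR_20090702_1200.nc"));
    }
}
```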
Collection Datasets
• Creates both a collection dataset and access to the individual files
• Allows subsetting by bounding box, time range, parameter
• Data management (NRDBMS)
    • Data stays in original form (BUFR, netCDF, ...)
    • Possibly external indexes
    • Integrate with realtime feeds / scouring
What Web Service Protocol?
[Figure: the CDM/TDS software stack again, with an open slot ("???") added to the THREDDS Data Server services alongside WCS, WMS, OPeNDAP, NetcdfSubset, and HTTPServer.]
Which Web Service Protocol?
• NetcdfSubsetService / “CDM Remote”
    • Binary: NetCDF-CF
    • ASCII: CSV, ad-hoc XML
    • Experimental REST interface
• OPeNDAP?
    • Need subsetting in coordinate space
    • Dapper Sequences: needs expanding, good fit
• Sensor Observation Service (OGC)?
    • Good fit
    • Where does SOS metadata come from? Typically not in the data files. Must be added “by hand” to TDS.
• Interoperability in the real world is hard.
What output do your clients want?
• NetCDF-CF
    • Standard in modelling community for gridded data
• CSV / Excel
    • Most popular ad-hoc format
• GML
    • Rich metadata, GIS community is adopting
    • Complex, centralized, “committee design”
    • Promised interoperability yet to be seen
    • Can it handle structures such as profiles?
Proposed CF Conventions for Point Observation Data
(http://cf-pcmdi.llnl.gov/trac/wiki/PointObservationConventions)
• Many existing files already store point data in netCDF-3, but not standardized
• CF Convention has 2 simple examples, no guidance for more complex situations
• Proposed Conventions in Oct 2008
• Sporadic ongoing discussions
• Validate by implementing in CDM
• NetCDF-4 format probably much easier
Storing Ragged Arrays in netCDF-3
Rectangularize the Array: use maximum size of the ragged array, use missing values
• Works well if avg ~ max
• Or if you will store/transmit compressed
Linearize the Array: put all elements of the ragged array into a 1D array
• Connect using index ranges
• Connect using linked lists
• Connect by matching field values (relational)
• Index join
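The two layouts can be compared on a small in-memory example. This is a sketch only, using plain Java arrays rather than netCDF variables: the `MISSING` fill value and all method names are assumptions for illustration, and the linearized form is connected with index ranges (a start and a count per row), the first of the options listed above.

```java
import java.util.Arrays;

/** Illustration with plain arrays; names and fill value are hypothetical. */
class RaggedArraySketch {

    static final double MISSING = -9999.0; // assumed fill value

    /** Rectangularize: pad every row to the longest row's length with MISSING. */
    static double[][] rectangularize(double[][] ragged) {
        int max = 0;
        for (double[] row : ragged) {
            max = Math.max(max, row.length);
        }
        double[][] out = new double[ragged.length][max];
        for (int i = 0; i < ragged.length; i++) {
            Arrays.fill(out[i], MISSING);
            System.arraycopy(ragged[i], 0, out[i], 0, ragged[i].length);
        }
        return out;
    }

    /** Linearize: concatenate all rows into a single 1D array. */
    static double[] linearize(double[][] ragged) {
        int total = 0;
        for (double[] row : ragged) {
            total += row.length;
        }
        double[] flat = new double[total];
        int pos = 0;
        for (double[] row : ragged) {
            System.arraycopy(row, 0, flat, pos, row.length);
            pos += row.length;
        }
        return flat;
    }

    /** Index ranges, part 1: offset of each row's first element in the 1D array. */
    static int[] rowStarts(double[][] ragged) {
        int[] start = new int[ragged.length];
        int pos = 0;
        for (int i = 0; i < ragged.length; i++) {
            start[i] = pos;
            pos += ragged[i].length;
        }
        return start;
    }

    /** Index ranges, part 2: number of elements in each row. */
    static int[] rowCounts(double[][] ragged) {
        int[] count = new int[ragged.length];
        for (int i = 0; i < ragged.length; i++) {
            count[i] = ragged[i].length;
        }
        return count;
    }

    /** Recovers row i from the 1D array using its index range. */
    static double[] row(double[] flat, int[] start, int[] count, int i) {
        return Arrays.copyOfRange(flat, start[i], start[i] + count[i]);
    }

    public static void main(String[] args) {
        double[][] profiles = { {1, 2, 3}, {4}, {5, 6} };
        double[] flat = linearize(profiles);
        System.out.println(Arrays.toString(
            row(flat, rowStarts(profiles), rowCounts(profiles), 2)));
        System.out.println(Arrays.deepToString(rectangularize(profiles)));
    }
}
```

With rows of length 3, 1, and 2, the rectangular form stores 9 values (3 of them fill) while the linear form stores 6 values plus two small index arrays, which is why padding pays off mainly when the average length is close to the maximum.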
Conclusions
• CDM Point Feature API initial implementation
    • Framework for many different datasets, formats
    • Testing and evolving implementation
• Nested Table notation provides a flexible way to characterize 1D point datasets
• Proposed CF Conventions for Point Data
• TDS 4.1 will provide point data subsetting services
    • Probably support multiple protocols, driven by customer needs