THREDDS, CDM, OPeNDAP, netCDF and Related Conventions John Caron Unidata/UCAR Sep 2007
Contents • Overview • THREDDS Data Server • Unidata’s Common Data Model
[Diagram: 0) Client, 1) Request, 2) Server, 3) Response]
0) The Client What functionality is needed? • Scientific User • Raw data • Drill down to arbitrary detail • Decision Support • “best effort” Visualization • operational
1) The Request What functionality is possible? • Analogous to SQL language for RDBMS • Implies a Data Model • OGC vs File access APIs • NetCDF/OPeNDAP/HDF5 : index space • WXS : coordinate space • Higher semantic level trumps if no significant extra cost. • File APIs become implementation, not interface
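The difference between index-space and coordinate-space requests can be sketched as follows. This is a minimal illustration, assuming an evenly spaced, ascending axis; the function name and the example axis are hypothetical and are not taken from any of the libraries named on the slide.

```python
import math


def coord_range_to_index_range(start, step, count, lo, hi):
    """Map a coordinate interval [lo, hi] to inclusive index bounds.

    Assumes an evenly spaced axis where coord(i) = start + i * step
    for i in 0..count-1, with step > 0 (ascending axis).
    Returns (first, last), or None if the interval misses the axis.
    """
    if step <= 0:
        raise ValueError("this sketch assumes an ascending axis")
    first = max(0, math.ceil((lo - start) / step))
    last = min(count - 1, math.floor((hi - start) / step))
    if first > last:
        return None
    return first, last


# Hypothetical latitude axis: 20.0, 20.5, ..., 50.0 (61 points).
lat_indices = coord_range_to_index_range(20.0, 0.5, 61, 30.25, 35.0)
```

A coordinate-space service performs this translation on the server, so the client never needs to know how the axis is laid out in the file.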
1) WCS Request • Functionality • Subsetting (bounding box, time range, variable) • Optional reprojection/resample • Variants: KML/XML/SOAP+XML/REST • Optional Functionality : 42 flavors • Bad news for interoperability • Is there an elephant to dictate a standard? • E.g. IBM chose SQL/Relational model (1984)
2) Server How do I serve my data? • Do I need specialized personnel? • $$, resource consumption, core competency • What are the common requests? • (that I should optimize for)?
3) Response What comes back? • Has to be a representation of the “answer” in the Data Model • WCS allows anything • Can't write a generic client • Communities will form around a small number of variants • No elephants in sight
3) Response : XML vs. binary • Extensibility vs. Efficiency • Binary: netCDF/GeoTIFF/HDF/etc • reflect favorite formats of committee members • Different data models : ideally need a formal mapping (but there aren't any yet) • Domain experts can make use • GML closely follows the OGC/ISO data model (WFS requires GML)
3) Response : XML vs. binary • GML is waaaay too complex • Ambitious • OGC/ISO models are complex • Reality is complex • XML Schema is a disaster • Google KML • “visualization format” not “data storage”
THREDDS Data Server [Architecture diagram: an application exchanges catalog.xml over HTTP with a Tomcat server (motherlode.ucar.edu) hosting the THREDDS Server] • Services: WCS, OPeNDAP, HTTPServer, NetcdfSubset • NetCDF-Java library • configCatalog.xml • Datasets (IDD data)
THREDDS Catalogs • XML over HTTP • Hierarchical listing of online resources (datasets) • Container for arbitrary search metadata • Standard set maps to DC, GCMD, ADN • Unidata/NCAR-CDP • Metadata can be inherited • Design goal: Make it easy for data providers • TDS uses extended version for configuration • Data Access URLs • “Crossing the protocol boundary”
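How a client turns a catalog into data access URLs, including inherited metadata, can be sketched roughly as below. The inline catalog, the dataset names, and the `access_paths` helper are made up for illustration. The sketch handles only a service name supplied through inherited metadata and ignores the other ways a catalog can name a service, so it is a simplification rather than a full catalog reader.

```python
import xml.etree.ElementTree as ET

# THREDDS InvCatalog 1.0 namespace.
NS = {"t": "http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"}

# Hypothetical catalog, invented for this example.
CATALOG_XML = """<catalog
    xmlns="http://www.unidata.ucar.edu/namespaces/thredds/InvCatalog/v1.0"
    name="Example">
  <service name="dap" serviceType="OPENDAP" base="/thredds/dodsC/"/>
  <dataset name="Models">
    <metadata inherited="true">
      <serviceName>dap</serviceName>
    </metadata>
    <dataset name="Run A" urlPath="models/runA.nc"/>
    <dataset name="Run B" urlPath="models/runB.nc"/>
  </dataset>
</catalog>"""


def access_paths(catalog_xml):
    """Return (dataset name, service base + urlPath) for each leaf dataset."""
    root = ET.fromstring(catalog_xml)
    bases = {s.get("name"): s.get("base") for s in root.findall("t:service", NS)}
    found = []

    def walk(dataset, inherited_service):
        service = inherited_service
        # Inherited metadata applies to this dataset and its descendants.
        for md in dataset.findall("t:metadata", NS):
            if md.get("inherited") == "true":
                name = md.find("t:serviceName", NS)
                if name is not None and name.text:
                    service = name.text.strip()
        url_path = dataset.get("urlPath")
        if url_path is not None and service in bases:
            found.append((dataset.get("name"), bases[service] + url_path))
        for child in dataset.findall("t:dataset", NS):
            walk(child, service)

    for top in root.findall("t:dataset", NS):
        walk(top, None)
    return found
```

The resulting paths are relative to the server root; joining one with the server's host name gives the URL that is handed to the OPeNDAP (or other) client, which is the point where the catalog protocol ends and the data protocol begins.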
THREDDS OPeNDAP Server • OPeNDAP is a protocol for remote access to CDM • Current version 2.0; NASA ESE standard • Working on new 4.0 protocol spec • Based on Java-OPeNDAP library • shared development by Unidata/opendap.org • Any CDM dataset can be served • Server4 (Hyrax): • latest version of opendap.org C++ library • THREDDS Catalogs replace dods_dir
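An OPeNDAP 2.0 request addresses data in index space: the client appends a constraint expression with `[start:stride:stop]` ranges to the dataset URL. A minimal sketch of building such a URL follows; the host, dataset path, and variable name are placeholders, and the helper names are hypothetical.

```python
def dap2_hyperslab(variable, ranges):
    """Build a DAP 2.0 projection such as Temperature[0:1:0][21:1:30].

    `ranges` holds one (start, stride, stop) tuple per dimension;
    stop is inclusive, as in the DAP 2.0 constraint syntax.
    """
    parts = [f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges]
    return variable + "".join(parts)


def dap2_data_url(dataset_url, projections):
    """Append the binary-data suffix and a comma-separated projection list."""
    return dataset_url + ".dods?" + ",".join(projections)


# Placeholder dataset URL and variable name, for illustration only.
url = dap2_data_url(
    "http://hostname.edu/thredds/dodsC/models/runA.nc",
    [dap2_hyperslab("Temperature", [(0, 1, 0), (21, 1, 30), (0, 1, 49)])],
)
```

The client has to know the array layout (dimension order and sizes) before it can write the ranges, which is what makes this an index-space rather than a coordinate-space request.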
THREDDS WCS service • CDM files that have Grid coordinate system • evenly spaced x,y • Allows subsetting the dataset by: • Lat/lon or projection bounding box • time and vertical coordinate range • list of Variables • Return formats • GeoTIFF floating point, grayscale • NetCDF/CF-1.0 • No reprojection or resampling • Uses WCS 1.0, work on WCS 1.1 in progress
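A WCS 1.0 subset request of the kind listed above is a key-value GetCoverage URL. The sketch below assembles one from standard WCS 1.0.0 parameter names; the endpoint, coverage name, CRS value, and format name are assumptions chosen for the example and should be checked against the server's capabilities document. A strict WCS 1.0 server may also expect grid-size or resolution parameters, which are left out here because the service described does no resampling.

```python
from urllib.parse import urlencode


def wcs_get_coverage_url(endpoint, coverage, bbox, crs, time=None, fmt="NetCDF3"):
    """Assemble a WCS 1.0.0 GetCoverage request in key-value form.

    bbox is (min_x, min_y, max_x, max_y) in the given CRS.
    time, if given, is an ISO 8601 instant or range string.
    """
    params = [
        ("service", "WCS"),
        ("version", "1.0.0"),
        ("request", "GetCoverage"),
        ("coverage", coverage),
        ("crs", crs),
        ("bbox", ",".join(str(v) for v in bbox)),
        ("format", fmt),
    ]
    if time is not None:
        params.append(("time", time))
    # Keep commas, colons, and slashes readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=",:/")


# Placeholder endpoint and coverage name, for illustration only.
wcs_url = wcs_get_coverage_url(
    "http://hostname.edu/thredds/wcs/models/runA.nc",
    "Temperature",
    (-125.0, 30.0, -100.0, 45.0),
    crs="OGC:CRS84",
    time="2007-09-01T00:00:00Z",
)
```

Unlike the OPeNDAP request, every value here is a coordinate, so the same URL works regardless of how the grid is indexed in the underlying file.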
NetCDF Subset Service • Experiment with REST-style web service • Allows subsetting the dataset by: • Lat/lon bounding box • time and vertical coordinate range • list of Variables • NetCDF/CF, XML, CSV (spreadsheet) • Gridded Data • Output is a CF-1.0 netCDF file • Variation of WCS (simplified request protocol) • Grid as Point Datasets (experimental) • Extract vertical profile, time series from one point in model data • Station Data: METARs (7 day rolling archive)
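The "grid as point" idea, extracting a time series at one location from gridded model data, can be illustrated with a small sketch. It uses nearest-grid-point selection on plain Python lists, which is an assumption made for this example; the slide does not say how the service picks or interpolates the point.

```python
def nearest_index(coords, value):
    """Index of the coordinate closest to `value`."""
    return min(range(len(coords)), key=lambda i: abs(coords[i] - value))


def time_series_at_point(data, lats, lons, lat, lon):
    """Extract one value per time step at the grid point nearest (lat, lon).

    `data` is indexed as data[t][y][x], with y following `lats`
    and x following `lons`.
    """
    j = nearest_index(lats, lat)
    i = nearest_index(lons, lon)
    return [time_slice[j][i] for time_slice in data]


# Tiny made-up grid: 2 time steps, 3 latitudes, 2 longitudes.
lats = [30.0, 31.0, 32.0]
lons = [-110.0, -109.0]
data = [
    [[1, 2], [3, 4], [5, 6]],
    [[7, 8], [9, 10], [11, 12]],
]
series = time_series_at_point(data, lats, lons, lat=31.2, lon=-109.4)
```

A vertical profile works the same way, with the loop running over the vertical dimension instead of time.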
Common Data Model [Same architecture diagram as for the THREDDS Data Server, with host hostname.edu, annotated “Then a miracle happens”] • Services: WCS, OPeNDAP, HTTPServer, NetcdfSubset • NetCDF-Java library • Datasets (IDD data)
NetCDF-Java architecture [Layer diagram; components shown:] • Application • THREDDS Catalog.xml • Scientific Datatypes • Datatype Adapter • NetcdfDataset • CoordSystem Builder • NetcdfFile • I/O service provider: OPeNDAP, NetCDF-3, NetCDF-4, HDF5, NcML, GRIB, GINI, NIDS, Nexrad, DMSP, …
Common Data Model File Formats • General: NetCDF, HDF5, OPeNDAP • Gridded: GRIB-1, GRIB-2 • Radar: NEXRAD level II and level III, DORADE, Chinese NEXRAD • Point: BUFR • Satellite: DMSP, GINI • In Progress: NetCDF4, McIDAS AREA, NPOESS, NOAA CLASS legacy files, Barrodale DataBlade, others
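One way a single library can read many file formats through one data model is a registry of I/O service providers, each of which inspects the start of a file and claims it or not. The Python sketch below is only an analogy to that design, not the NetCDF-Java API; the class and function names are hypothetical. The magic numbers are the published signatures for NetCDF-3, HDF5, and GRIB, checked only at offset 0, which is a simplification.

```python
class NetCDF3Provider:
    name = "NetCDF-3"

    def is_valid_file(self, header):
        # Classic netCDF starts with "CDF" plus a version byte (1 or 2).
        return header[:3] == b"CDF" and header[3:4] in (b"\x01", b"\x02")


class HDF5Provider:
    name = "HDF5"

    def is_valid_file(self, header):
        # HDF5 signature; real files may place it at later offsets too.
        return header.startswith(b"\x89HDF\r\n\x1a\n")


class GribProvider:
    name = "GRIB"

    def is_valid_file(self, header):
        # GRIB messages begin with "GRIB"; leading padding is ignored here.
        return header.startswith(b"GRIB")


PROVIDERS = [NetCDF3Provider(), HDF5Provider(), GribProvider()]


def select_provider(header, providers=PROVIDERS):
    """Return the first provider that recognizes the header bytes, else None."""
    for provider in providers:
        if provider.is_valid_file(header):
            return provider
    return None
```

Supporting a new format then means adding one more provider to the list, while code above that layer keeps using the same data model.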
Common Data Model Layers [Layer diagram] • Scientific Datatypes: Point, Trajectory, Station, Profile, Radial, Grid, Swath • Coordinate Systems • Data Access
Common Data Model (Data Access Layer)
NetCDF-4 file format • NetCDF-4 C library • 4.0 Beta implements CDM access layer • Persistence format for complete CDM • 4.1: adding Coordinate Systems • Optional layer, focus on CF-1 (libcf) • 4.?: merge OPeNDAP access • NetCDF-Java library will read, maybe write
TDS / NcML aggregation
<dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
  <netcdf>
    <aggregation dimName="time" type="joinNew">
      <scan location="/data/ldm/pub/satellite/3.9/WEST-CONUS_4km/" suffix=".gini" />
    </aggregation>
  </netcdf>
</dataset>
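Conceptually, a `joinNew` aggregation stacks same-shaped variables from many files along a new outer dimension (here `time`). The sketch below shows that idea on plain Python lists. The `join_new` function and its inputs are hypothetical, and the new coordinate values are passed in by the caller, whereas a real aggregation derives them from its configuration or from the files.

```python
def join_new(grids, coord_values):
    """Stack 2-D grids along a new outer dimension.

    grids: list of 2-D grids (lists of rows), one per file, all the same shape.
    coord_values: one coordinate value per grid for the new dimension.
    Returns a dict holding the new coordinate and the 3-D data [t][y][x].
    """
    if len(grids) != len(coord_values):
        raise ValueError("need exactly one coordinate value per grid")
    shapes = {(len(g), len(g[0])) for g in grids}
    if len(shapes) > 1:
        raise ValueError("joinNew requires every grid to have the same shape")
    return {"time": list(coord_values), "data": list(grids)}


# Two made-up 2x2 images, as if read from two files in the scanned directory.
aggregated = join_new(
    [[[1, 2], [3, 4]], [[5, 6], [7, 8]]],
    ["2007-09-01T00:00Z", "2007-09-01T01:00Z"],
)
```

The client sees one dataset with an extra `time` dimension instead of a directory of separate files.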
Scientific DataTypes • Based on datasets Unidata is familiar with • APIs are evolving • How are data points connected? • Intended to scale to large, multifile collections • Intended to support “specialized queries” • Space, Time • Intend to create “standard” NetCDF file encoding conventions
Scientific DataTypes • Grids • Structured • Swath • Unstructured • Point Observation • Unconnected • Station / Time Series • Trajectory • Profile • Radial
Climate and Forecast (CF) Conventions • Conventions for encoding coordinate systems, other semantics in netCDF • Working for 10 years • Version 1.0 in 2003 • Good for gridded data • Current working groups • Point/Station/Trajectory/Profile observations • CRS (map to OGC) • Governance in place • Volunteer: motivated, practical, real
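As an example of how CF encodes coordinate semantics in plain netCDF attributes, the sketch below classifies a coordinate variable from its attributes, using the CF rules for latitude and longitude units, time units of the form "<unit> since <date>", and the `positive` attribute for vertical axes. The function name and the returned labels are hypothetical, and the checks are a simplified subset of what CF specifies.

```python
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def cf_axis_type(attributes):
    """Guess the axis type of a coordinate variable from its CF attributes.

    `attributes` is a dict of the variable's netCDF attributes.
    Returns "Lat", "Lon", "Time", "Vertical", or None if unrecognized.
    """
    units = attributes.get("units", "")
    if units in LAT_UNITS:
        return "Lat"
    if units in LON_UNITS:
        return "Lon"
    if " since " in units:
        return "Time"
    if attributes.get("positive") in ("up", "down"):
        return "Vertical"
    return None
```

For instance, a variable with `units = "hours since 2007-09-01 00:00:00"` is recognized as a time axis without any knowledge of its name, which is what lets generic software build coordinate systems from CF files.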
Summary: Unidata’s directions • Client: both Scientific User and Decision Support • Request in coordinate space • WCS is fine, not a big architectural decision • Server: TDS • Files in native format, augmented by indexing/DB • Response: netCDF/CF and GeoTIFF/KML or WMS/JPEG