
THREDDS Data Server Unidata’s Common Data Model Background / Summary

John Caron, Unidata/UCAR, March 2007.


Presentation Transcript


  1. THREDDS Data Server / Unidata’s Common Data Model: Background / Summary. John Caron, Unidata/UCAR, Mar 2007

  2. THREDDS Data Server (diagram labels): HTTP Tomcat Server; THREDDS Server application with catalog.xml; services: WCS, OPeNDAP, HTTPServer, NetcdfSubset; NetCDF-Java library; Datasets; IDD Data; host: motherlode.ucar.edu

  3. THREDDS Catalogs • XML over HTTP • Hierarchical listing of online resources (datasets) • Container for arbitrary search metadata • Standard set maps to DC, GCMD, ADN • Unidata/CDP • Metadata can be inherited • Design goal: Make it easy for data providers • TDS uses for configuration • Client view vs. server view • Data Access URLs • “Crossing the protocol boundary”
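
The "metadata can be inherited" bullet can be illustrated with a small lookup that walks up the dataset hierarchy. This is only a sketch: `CatalogNodeSketch` and its methods are hypothetical names, not the THREDDS catalog API, and it assumes every key is inheritable, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical catalog node; illustrates inheritance only, not the THREDDS API. */
final class CatalogNodeSketch {
    private final String name;
    private final CatalogNodeSketch parent; // null for the catalog root
    private final Map<String, String> metadata = new HashMap<>();

    CatalogNodeSketch(String name, CatalogNodeSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    CatalogNodeSketch with(String key, String value) {
        metadata.put(key, value);
        return this;
    }

    /** Looks for the key locally first, then in each ancestor up to the root. */
    String resolve(String key) {
        for (CatalogNodeSketch node = this; node != null; node = node.parent) {
            String value = node.metadata.get(key);
            if (value != null) {
                return value;
            }
        }
        return null; // not set anywhere in the hierarchy
    }

    public static void main(String[] args) {
        CatalogNodeSketch root = new CatalogNodeSketch("Model data", null)
                .with("dataFormat", "GRIB-2")
                .with("creator", "Example Center");
        CatalogNodeSketch child = new CatalogNodeSketch("Regional run", root)
                .with("dataFormat", "NetCDF");
        System.out.println(child.name + " dataFormat: " + child.resolve("dataFormat"));
        System.out.println(child.name + " creator: " + child.resolve("creator"));
    }
}
```

A data provider then states shared metadata once on a parent dataset and overrides it only where a child differs.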

  4. catalog.xml

  5. Motherlode catalog example

  6. THREDDS WCS 1.0 Server • Each (gridded) Dataset is WCS • Each Grid is a Coverage • Return formats • GeoTIFF: floating point, greyscale • NetCDF / CF-1.0 (same as NetcdfSubset Service) • No reprojections, resampling • GALEON 2 • upgrade to WCS 1.1 • Try returning point datasets

  7. THREDDS OPeNDAP Server • Current version 2.0; NASA ESE standard • Working on new 4.0 protocol spec • Based on Java-OPeNDAP library • shared development by Unidata/opendap.org • Any CDM dataset can be served • Server4 (Hyrax): • latest version of opendap.org C++ library • uses THREDDS catalog generation code • THREDDS Catalogs replace dods_dir

  8. Common Data Model (diagram labels): HTTP Tomcat Server; THREDDS Server application with catalog.xml; services: WCS, OPeNDAP, HTTPServer, NetcdfSubset; annotation: "Then a miracle happens"; NetCDF-Java library; Datasets; IDD Data; host: hostname.edu

  9. NetCDF-Java version 2.2 architecture (diagram labels): THREDDS, Catalog.xml, Application, Scientific Datatypes, Datatype Adapter, NetcdfDataset, CoordSystem Builder, ADDE, NetcdfFile, I/O service provider, OPeNDAP, NetCDF-3, NIDS, NcML, NetCDF-4, GRIB, HDF5, GINI, Nexrad, DMSP, …

  10. I/O Service Provider Implementations • General: NetCDF, HDF5, OPeNDAP • Gridded: GRIB-1, GRIB-2 • Radar: NEXRAD level 2 and 3, DORADE, Chinese NEXRAD • Point: BUFR, ASCII • Satellite: DMSP, GINI, McIDAS AREA • In development / tentative • NOAA CLASS legacy files • Barrowdale DataBlade

  11. Scientific Datatypes Point Trajectory Station Profile Radial Grid Swath Common Data Model Layers Coordinate Systems Data Access

  12. NetCDF-4 and Common Data Model (Data Access Layer)

  13. NetCDF-4 C library • 4.0 Beta implements CDM access layer • complete, but waiting for HDF5 release 1.8 to finalize file format (Maybe this month, 1.5 years late!) • Persistence format for complete CDM • 4.1: adding Coordinate Systems • Optional layer, focus on CF-1 (libcf) • 4.?: merge OPeNDAP access (pending funding)

  14. Coordinate Systems UML

  15. NcML: NetCDF Markup Language XML representation of netCDF metadata • Core: netCDF data access model • Coordinate System: general and georeferencing coordinate system • Dataset: redefine, aggregate, subset Luca Cinquini (NCAR/SCD/ESG), John Caron, Ethan Davis, Bob Drach (LLNL), Stefano Nativi (Florence), Russ Rew

  16. NcML • NcML Coordinate Systems further developed into NcML-G by Stefano et al. • NcML Core and Dataset combined into single schema to allow dataset modification • Aggregation: • Union • Syntactic join on (existing or new) outer dimension • Semantic aggregation of (runtime, forecast time) = Forecast Model Run Collection

  17. NcML example

      <?xml version="1.0" encoding="UTF-8"?>
      <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2" location="/data/nids/N0R_20041119_2147">
        <attribute name="cdm_datatype" value="Radial" />
        <remove type="attribute" name="password" />
        <variable name="Reflectivity" orgName="R34768">
          <attribute name="units" value="dBZ" />
        </variable>
      </netcdf>
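
The NcML example above makes three kinds of edit: add an attribute, remove an attribute, and rename a variable via `orgName`. The sketch below applies the same edits to an in-memory stand-in. `NcmlEditSketch` is a hypothetical class, not the NetCDF-Java API; it only shows that the edits change the dataset's metadata view and leave the underlying file alone.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory metadata view; not the NetCDF-Java API. */
final class NcmlEditSketch {
    final Map<String, String> globalAttributes = new LinkedHashMap<>();
    // variable name -> (attribute name -> value)
    final Map<String, Map<String, String>> variables = new LinkedHashMap<>();

    void setGlobalAttribute(String name, String value) {
        globalAttributes.put(name, value);
    }

    void removeGlobalAttribute(String name) {
        globalAttributes.remove(name);
    }

    /** Renames a variable (the role of orgName) and keeps its attributes. */
    void renameVariable(String orgName, String newName) {
        Map<String, String> attrs = variables.remove(orgName);
        if (attrs == null) {
            throw new IllegalArgumentException("no variable named " + orgName);
        }
        variables.put(newName, attrs);
    }

    void setVariableAttribute(String variable, String name, String value) {
        Map<String, String> attrs = variables.get(variable);
        if (attrs == null) {
            throw new IllegalArgumentException("no variable named " + variable);
        }
        attrs.put(name, value);
    }

    public static void main(String[] args) {
        NcmlEditSketch ds = new NcmlEditSketch();
        ds.globalAttributes.put("password", "secret"); // made-up value for the demo
        ds.variables.put("R34768", new LinkedHashMap<String, String>());

        ds.setGlobalAttribute("cdm_datatype", "Radial");
        ds.removeGlobalAttribute("password");
        ds.renameVariable("R34768", "Reflectivity");
        ds.setVariableAttribute("Reflectivity", "units", "dBZ");

        System.out.println(ds.globalAttributes);
        System.out.println(ds.variables);
    }
}
```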

  18. TDS / NcML example

      <datasetScan name="Ocean Satellite Data" path="ocean/sat" dirLocation="R:/tds/netcdf/">
        <netcdf>
          <attribute name="Conventions" value="CF-1.0"/>
        </netcdf>
      </datasetScan>
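
In the `datasetScan` element above, `path` is the public URL prefix and `dirLocation` is the directory it stands for. A minimal sketch of that translation follows. `DatasetScanSketch` is a hypothetical name and this is not the TDS implementation; the file name `sst.nc` is invented for the demo.

```java
/** Hypothetical URL-path-to-file mapping; not the TDS implementation. */
final class DatasetScanSketch {
    private final String path;        // public URL prefix, e.g. "ocean/sat/"
    private final String dirLocation; // server-side directory, hidden from clients

    DatasetScanSketch(String path, String dirLocation) {
        this.path = path.endsWith("/") ? path : path + "/";
        this.dirLocation = dirLocation.endsWith("/") ? dirLocation : dirLocation + "/";
    }

    /** Returns the file location for a request path, or null if it is not served by this scan. */
    String resolve(String requestPath) {
        if (!requestPath.startsWith(path)) {
            return null;
        }
        String relative = requestPath.substring(path.length());
        if (relative.isEmpty() || relative.contains("..")) {
            return null; // refuse empty names and attempts to leave the directory
        }
        return dirLocation + relative;
    }

    public static void main(String[] args) {
        DatasetScanSketch scan = new DatasetScanSketch("ocean/sat", "R:/tds/netcdf/");
        System.out.println(scan.resolve("ocean/sat/sst.nc"));
        System.out.println(scan.resolve("other/sst.nc"));
    }
}
```

Clients only ever see the `ocean/sat/...` form, so the files can move to another directory without changing any published URL.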

  19. TDS / NcML aggregation

      <dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
        <netcdf>
          <aggregation dimName="time" type="joinNew">
            <scan location="/data/ldm/pub/satellite/3.9/WEST-CONUS_4km/" suffix=".gini" />
          </aggregation>
        </netcdf>
      </dataset>
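
The two syntactic joins differ only in where the outer dimension comes from. The sketch below uses plain Java arrays in place of files; `AggregationSketch` is a hypothetical name and nothing here is the NetCDF-Java aggregation code. With `joinNew`, each input contributes one slice along a dimension that did not exist before (as with the per-file `time` in the GINI example). With a join on an existing dimension, the inputs are concatenated along the outer dimension they already have.

```java
/** Hypothetical array-level view of NcML joins; not the NetCDF-Java implementation. */
final class AggregationSketch {

    /** joinNew: each 2D input becomes one slice of a new outer dimension. */
    static float[][][] joinNew(float[][]... files) {
        float[][][] out = new float[files.length][][];
        for (int i = 0; i < files.length; i++) {
            out[i] = files[i]; // slices are shared, not copied
        }
        return out;
    }

    /** Join on an existing outer dimension: rows of all inputs are concatenated. */
    static float[][] joinExisting(float[][]... files) {
        int total = 0;
        for (float[][] file : files) {
            total += file.length;
        }
        float[][] out = new float[total][];
        int next = 0;
        for (float[][] file : files) {
            for (float[] row : file) {
                out[next++] = row;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[][] a = {{1f, 2f}, {3f, 4f}};
        float[][] b = {{5f, 6f}, {7f, 8f}};
        System.out.println("joinNew outer length: " + joinNew(a, b).length);
        System.out.println("joinExisting outer length: " + joinExisting(a, b).length);
    }
}
```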

  20. Datasets vs. Files • Must hide actual location of data files on your server • Would like to hide actual file format • Must encapsulate collections of files into logical datasets • Homogeneous metadata • Hide arbitrary storage decisions • Minimize number of datasets

  21. Forecast Model Run Collection (FMRC)

  22. Data Model: Sampled Functions. Our phenomena are continuous functions $F: \text{Domain} \to \text{Range}$, where Domain = subset of space-time (3 spatial, time) ($E^4$) and Range = $\mathbb{R}^n$ (product set of real numbers). Our measurements are sampled functions: Domain is a point subset $\{p : p \in E^4\}$, and $M: E^4 \to \mathbb{R}^n$

  23. Variables: a Variable is a container for an Array of values

      dimensions:
        lat = 64;
        lon = 128;
      variables:
        float temperature(lat, lon);

      Domain is a set of points in index space: Temperature : $\{[0..63] \times [0..127]\} \to \mathbb{R}$, i.e. Temperature : $I^2 \to \mathbb{R}$. In general, Variable : $I^m \to \mathbb{R}^n$
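
The view of a variable as a function on index space can be made concrete with a flat array and an index-to-offset rule. This is a minimal sketch, assuming row-major storage (last index varies fastest); `VariableSketch` is a hypothetical class, not the NetCDF-Java `Variable`.

```java
/** Hypothetical variable over index space I^m with scalar float values. */
final class VariableSketch {
    private final int[] shape;
    private final float[] values; // row-major storage

    VariableSketch(int... shape) {
        this.shape = shape.clone();
        int size = 1;
        for (int n : shape) {
            size *= n;
        }
        this.values = new float[size];
    }

    /** Maps an index tuple to its offset in the flat array. */
    int offset(int... index) {
        if (index.length != shape.length) {
            throw new IllegalArgumentException("rank mismatch");
        }
        int off = 0;
        for (int d = 0; d < shape.length; d++) {
            if (index[d] < 0 || index[d] >= shape[d]) {
                throw new IndexOutOfBoundsException("index out of range in dimension " + d);
            }
            off = off * shape[d] + index[d];
        }
        return off;
    }

    float get(int... index) {
        return values[offset(index)];
    }

    void set(float value, int... index) {
        values[offset(index)] = value;
    }

    public static void main(String[] args) {
        VariableSketch temperature = new VariableSketch(64, 128); // (lat, lon)
        temperature.set(273.15f, 10, 20);
        System.out.println(temperature.get(10, 20));
    }
}
```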

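  24. Coordinate Systems. Coordinate Axis : $I^m \to \mathbb{R}$. $\{\text{Axis}\}$ = Coordinate System : $I^m \to E^4$. With $V: I^m \to \mathbb{R}^n$ and $CS: I^m \to E^4$, the composition is $V \circ CS^{-1} : E^4 \to \mathbb{R}^n$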
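
The composition of a variable with the inverse of its coordinate system can be sketched for the simplest case: two 1D axes (lat, lon) over a 2D grid. The inverse is approximated here by a nearest-neighbour search, which is an assumption of this sketch and not a statement about how NetCDF-Java does it. `CoordinateLookupSketch` is a hypothetical name.

```java
/** Hypothetical nearest-neighbour lookup from coordinates to a data value. */
final class CoordinateLookupSketch {

    /** Inverse of one 1D axis: the index whose coordinate is nearest to the position. */
    static int nearestIndex(double[] axis, double position) {
        int best = 0;
        for (int i = 1; i < axis.length; i++) {
            if (Math.abs(axis[i] - position) < Math.abs(axis[best] - position)) {
                best = i;
            }
        }
        return best;
    }

    /** Coordinates (lat, lon) -> indices (j, i) -> data value. */
    static float valueAt(float[][] data, double[] lat, double[] lon,
                         double latPos, double lonPos) {
        return data[nearestIndex(lat, latPos)][nearestIndex(lon, lonPos)];
    }

    public static void main(String[] args) {
        double[] lat = {10.0, 20.0, 30.0};
        double[] lon = {100.0, 110.0};
        float[][] data = {{1f, 2f}, {3f, 4f}, {5f, 6f}};
        System.out.println(valueAt(data, lat, lon, 21.0, 108.0));
    }
}
```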

  25. Scientific Data Types • Trying to go beyond index-space subsetting • Trying to satisfy $V \circ CS^{-1} : E^4 \to \mathbb{R}^n$ • I.e. support subsetting using Space, Time “queries” • Based on datasets Unidata is familiar with • APIs are evolving • Intended to scale to large, multifile collections • Corresponding “standard” NetCDF file format conventions

  26. Datatype Grid PointObs RadialSweep Swath Dataset GridDataset FMRCDataset CollectionOfPointObs StationCollectionOfPointObs StationCollectionOfRadialSweep Implementations

  27. Conclusions • CDM is our implementation data model • Map to data access models such as OGC • Current work is to serve collections instead of individual files. • Dataset is desired level of granularity • Scientific data types are implementations with specialized access

  28. Datatype Collection • GridDataset collection of GridDatatype

  29. NetCDF-Java version 2.2 architecture (diagram labels): THREDDS, Catalog.xml, Application, Scientific Datatypes, Datatype Adapter, NetcdfDataset, CoordSystem Builder, ADDE, NetcdfFile, I/O service provider, OPeNDAP, NetCDF-3, NIDS, NcML, NetCDF-4, GRIB, HDF5, GINI, Nexrad, DMSP, …

  30. Gridded Datatype • Cartesian coordinates • All dimensions are connected • horizontal: lat,lon or projection x,y • time(time) orthogonal 1D • separable: (x, y) × time × z

      float gridData(t,z,y,x);
      float time(t);
      float y(y);
      float x(x);
      float lat(y,x);
      float lon(y,x);
      float z(z);
      float height(t,z,y,x);

  31. GridDatatype methods

      CoordinateAxis getTaxis();
      CoordinateAxis getXaxis();
      CoordinateAxis getYaxis();
      CoordinateAxis getZaxis();
      Projection getProjection();
      int[] findXYindexFromCoord(double x_coord, double y_coord);
      LatLonRect getLatLonBoundingBox();
      Array getDataSlice(Range[] …)
      GridDatatype makeSubset(Range[] …)
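
Methods such as `makeSubset(Range[] …)` work in index space, so a coordinate-space request first has to be turned into index ranges. The sketch below shows one way that step could look for 1D monotonic axes. `GridSubsetSketch`, `indexRange`, and `subset` are hypothetical helpers, not part of `GridDatatype`.

```java
import java.util.Arrays;

/** Hypothetical bounding-box-to-index-range helper; not the GridDatatype API. */
final class GridSubsetSketch {

    /**
     * Inclusive index range {first, last} of a monotonic axis whose values lie
     * within [min, max], or null if no axis value falls inside.
     */
    static int[] indexRange(double[] axis, double min, double max) {
        int first = -1;
        int last = -1;
        for (int i = 0; i < axis.length; i++) {
            if (axis[i] >= min && axis[i] <= max) {
                if (first < 0) {
                    first = i;
                }
                last = i;
            }
        }
        return first < 0 ? null : new int[] {first, last};
    }

    /** Copies the rows and columns selected by the two inclusive ranges. */
    static float[][] subset(float[][] data, int[] yRange, int[] xRange) {
        float[][] out = new float[yRange[1] - yRange[0] + 1][];
        for (int j = 0; j < out.length; j++) {
            out[j] = Arrays.copyOfRange(data[yRange[0] + j], xRange[0], xRange[1] + 1);
        }
        return out;
    }

    public static void main(String[] args) {
        double[] y = {0.0, 10.0, 20.0, 30.0, 40.0};
        System.out.println(Arrays.toString(indexRange(y, 5.0, 30.0)));
    }
}
```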

  32. Radial Data • Polar coordinates • All dimensions are connected • Not separate time dimension

      radialData(radial, gate):
        distance(gate)
        azimuth(radial)
        elevation(radial)
        time(radial)
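
Georeferencing a radial sample means converting its (distance, azimuth, elevation) triple into an offset from the radar. A minimal sketch follows, assuming azimuth is measured clockwise from north, a flat earth, and a straight beam (no refraction). These are assumptions of the sketch, not statements about the library, and `RadialSketch` is a hypothetical name.

```java
/** Hypothetical polar-to-Cartesian conversion for one radar gate. */
final class RadialSketch {

    /**
     * Returns {east, north, up} offsets from the radar, in the units of distance.
     * Assumes azimuth clockwise from north, flat earth, straight beam.
     */
    static double[] toCartesian(double distance, double azimuthDeg, double elevationDeg) {
        double az = Math.toRadians(azimuthDeg);
        double el = Math.toRadians(elevationDeg);
        double ground = distance * Math.cos(el); // projection onto the horizontal plane
        return new double[] {
            ground * Math.sin(az),
            ground * Math.cos(az),
            distance * Math.sin(el)
        };
    }

    public static void main(String[] args) {
        double[] p = toCartesian(1000.0, 90.0, 0.5);
        System.out.printf("east=%.1f north=%.1f up=%.1f%n", p[0], p[1], p[2]);
    }
}
```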

  33. Swath • lat/lon coordinates • not separate time dimension • all dimensions are connected

      swathData(line,cell)
      lat(line,cell)
      lon(line,cell)
      time(line)
      z(line,cell) ??

  34. Unstructured Grid • Pt dimension not connected • Looks the same as point data • Need to specify the connectivity explicitly

      float unstructGrid(t,z,pt);
      float lat(pt);
      float lon(pt);
      float time(t);
      float height(z);

  35. Point Observation Data • Set of measurements at the same point in space and time • Point dimension not connected

      float obs1(pt);
      float obs2(pt);
      float lat(pt);
      float lon(pt);
      float z(pt);
      float time(pt);

      Structure {
        lat, lon, z, time;
        v1, v2, ...
      } obs(pt);

  36. PointObsDataset Methods

      // Iterator<StructureData>
      Iterator getData(LatLonRect boundingBox, Date start, Date end);
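
The `getData` signature above is a space/time query over unconnected points. A sketch of what such a query does is given below, with a plain list standing in for the dataset. `PointObsSketch` and `Obs` are hypothetical stand-ins for the dataset and `StructureData`. The bounding-box test ignores boxes that cross the date line.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical point-observation collection; not the NetCDF-Java PointObsDataset. */
final class PointObsSketch {

    static final class Obs {
        final double lat;
        final double lon;
        final long timeMillis;
        final double value;

        Obs(double lat, double lon, long timeMillis, double value) {
            this.lat = lat;
            this.lon = lon;
            this.timeMillis = timeMillis;
            this.value = value;
        }
    }

    private final List<Obs> all = new ArrayList<>();

    void add(Obs obs) {
        all.add(obs);
    }

    /** Returns the observations inside the box and the inclusive time window. */
    Iterator<Obs> getData(double latMin, double latMax, double lonMin, double lonMax,
                          long start, long end) {
        List<Obs> hits = new ArrayList<>();
        for (Obs o : all) {
            boolean inBox = o.lat >= latMin && o.lat <= latMax
                    && o.lon >= lonMin && o.lon <= lonMax;
            boolean inTime = o.timeMillis >= start && o.timeMillis <= end;
            if (inBox && inTime) {
                hits.add(o);
            }
        }
        return hits.iterator();
    }

    public static void main(String[] args) {
        PointObsSketch ds = new PointObsSketch();
        ds.add(new Obs(40.0, -105.0, 1000L, 1.5));
        ds.add(new Obs(10.0, -105.0, 1000L, 2.5));
        Iterator<Obs> it = ds.getData(30.0, 50.0, -110.0, -100.0, 0L, 5000L);
        while (it.hasNext()) {
            System.out.println(it.next().value);
        }
    }
}
```

A linear scan is enough to show the semantics; a real collection of any size would need a spatial or temporal index.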

  37. Time series Station Data

      Structure {
        name;
        lat, lon, z;
        Structure {
          time;
          v1, v2, ...
        } obs(*);   // connected
      } stn(stn);   // not connected

  38. StationObs Methods

      // List<Station>
      List getStations(LatLonRect boundingBox);

      // Iterator<StructureData>
      Iterator getData(Station s, Date start, Date end);
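
Station data nests a connected time series inside each unconnected station, so the two methods above query different levels: stations by space, then observations by time. The sketch below mirrors that split. `StationObsSketch` and `Station` are hypothetical names, and a sorted map stands in for the time-ordered inner sequence.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical station collection; not the NetCDF-Java station API. */
final class StationObsSketch {

    static final class Station {
        final String name;
        final double lat;
        final double lon;
        // time (ms) -> value; sorted by time, which is what makes the series "connected"
        final TreeMap<Long, Double> obs = new TreeMap<>();

        Station(String name, double lat, double lon) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
        }
    }

    private final List<Station> stations = new ArrayList<>();

    Station addStation(String name, double lat, double lon) {
        Station s = new Station(name, lat, lon);
        stations.add(s);
        return s;
    }

    /** Outer level: select stations by bounding box. */
    List<Station> getStations(double latMin, double latMax, double lonMin, double lonMax) {
        List<Station> hits = new ArrayList<>();
        for (Station s : stations) {
            if (s.lat >= latMin && s.lat <= latMax && s.lon >= lonMin && s.lon <= lonMax) {
                hits.add(s);
            }
        }
        return hits;
    }

    /** Inner level: select one station's observations in an inclusive time window. */
    static Collection<Double> getData(Station s, long start, long end) {
        return s.obs.subMap(start, true, end, true).values();
    }

    public static void main(String[] args) {
        StationObsSketch ds = new StationObsSketch();
        Station a = ds.addStation("A", 40.0, -105.0);
        a.obs.put(100L, 1.0);
        a.obs.put(200L, 2.0);
        for (Station s : ds.getStations(30.0, 50.0, -110.0, -100.0)) {
            System.out.println(s.name + ": " + getData(s, 150L, 300L));
        }
    }
}
```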

  39. Trajectory Data • pt dimension is connected • Collection dimension not connected

      Structure {
        lat, lon, z, time;
        v1, v2, ...
      } obs(pt);     // connected

      Structure {
        name;
        Structure {
          lat, lon, z, time;
          v1, v2, ...
        } obs(*);    // connected
      } traj(traj)   // not connected

  40. Profiler/Sounding Station Data

      Structure {
        name;
        lat, lon, time;
        Structure {
          z;
          v1, v2, ...
        } obs(*);    // connected
      } loc(nloc);   // not connected

      Structure {
        name;
        lat, lon;
        Structure {
          time;
          Structure {
            z;
            v1, v2, ...
          } obs(*);  // connected
        } time(*);   // connected
      } stn(stn);    // not connected

  41. Data Types Summary • Data access through a standard API • Convenient georeferencing • Specialized subsetting methods • Efficiency for large datasets

  42. CDM Payoff: N + M instead of N * M things on your TODO List! (diagram labels: File Format #1, File Format #2, File Format #N; Visualization & Analysis, NetCDF file, OPeNDAP Server, WCS Service, Web Service)
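
The arithmetic behind the payoff is easy to check. Without a common model, every file format needs its own adapter for every consumer. With one, each format and each consumer is connected to the model once. The counts used below (6 formats, 4 consumers) are invented for illustration, and `AdapterCountSketch` is a hypothetical name.

```java
/** Hypothetical adapter count with and without a common data model. */
final class AdapterCountSketch {

    /** Every format wired directly to every consumer. */
    static int pairwise(int formats, int consumers) {
        return formats * consumers;
    }

    /** Each format and each consumer wired once to the common model. */
    static int viaCommonModel(int formats, int consumers) {
        return formats + consumers;
    }

    public static void main(String[] args) {
        int formats = 6;   // illustrative N
        int consumers = 4; // illustrative M
        System.out.println("pairwise: " + pairwise(formats, consumers));
        System.out.println("via common model: " + viaCommonModel(formats, consumers));
    }
}
```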

  43. Next: DataType Aggregation • Work at the CDM DataType level, know (some) data semantics • Forecast Model Collection • Combine multiple model forecasts into single dataset with two time dimensions • With NOAA/IOOS (Steve Hankin) • Point/Station/Trajectory/Profile Data • Allow space/time queries, return nested sequences • Start from / standardize “Dapper conventions”
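
A forecast model collection has two time dimensions, run time and forecast time, and the valid time of a forecast is their sum. One common way to collapse the two into a single series is to keep, for each valid time, the forecast from the most recent run. That choice is an assumption of this sketch and not something the slide specifies. `FmrcSketch` is a hypothetical name, and times are plain integer hours.

```java
import java.util.TreeMap;

/** Hypothetical forecast-collection helper; not the NetCDF-Java FMRC code. */
final class FmrcSketch {

    /**
     * For every valid time (run time + forecast offset), returns the run time
     * of the most recent run that forecasts it.
     */
    static TreeMap<Integer, Integer> bestRuns(int[] runTimes, int[] offsets) {
        TreeMap<Integer, Integer> best = new TreeMap<>();
        for (int run : runTimes) {
            for (int offset : offsets) {
                int valid = run + offset;
                Integer current = best.get(valid);
                if (current == null || run > current) {
                    best.put(valid, run);
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        int[] runs = {0, 6};        // two model runs, hours since some reference
        int[] offsets = {0, 6, 12}; // forecast hours in each run
        System.out.println(bestRuns(runs, offsets)); // valid time -> chosen run
    }
}
```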

  44. Forecast Model Collections

  45. Coordinate Systems: implicit/explicit • NetCDF, OPeNDAP, HDF data models do not have explicit coordinate systems • so georeferencing not part of API • Need conventions to specify (eg CF-1, COARDS, etc) • GRIB, HDF-EOS (eg) are explicit • But no uniform API
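
When the data model has no explicit coordinate systems, a reader has to infer them from conventions. One rule in the CF conventions identifies latitude and longitude axes by their `units` attribute. The sketch below applies only that rule. `AxisTypeSketch` is a hypothetical name, and a real coordinate system builder would check much more (standard names, axis attributes, projections).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical units-based axis classifier following the CF units spellings. */
final class AxisTypeSketch {
    private static final Set<String> LAT_UNITS = new HashSet<>(Arrays.asList(
            "degrees_north", "degree_north", "degrees_N", "degree_N", "degreesN", "degreeN"));
    private static final Set<String> LON_UNITS = new HashSet<>(Arrays.asList(
            "degrees_east", "degree_east", "degrees_E", "degree_E", "degreesE", "degreeE"));

    /** Returns "lat", "lon", or "unknown" based only on the units string. */
    static String axisType(String units) {
        if (units == null) {
            return "unknown";
        }
        if (LAT_UNITS.contains(units)) {
            return "lat";
        }
        if (LON_UNITS.contains(units)) {
            return "lon";
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(axisType("degrees_north"));
        System.out.println(axisType("degrees_east"));
        System.out.println(axisType("K"));
    }
}
```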

  46. NetCDF-4 C Library (diagram labels): netCDF-3 Interface, netCDF-4 Library, HDF5 Library

  47. Conclusion • Standardized Data Access in good shape • HDF5, NetCDF, OPeNDAP • Write an IOSP for proprietary formats (Java) • But that’s not good enough! • To do: • Standard representations of coordinate systems • Classifications of data types, standard services for them
