
Enabling technologies for facilitating access and use of data

Enabling technologies for facilitating access and use of data. Russ Rew and John Caron, Unidata Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society, NCDC, Asheville, 2010-03-09. CDM. Goal: N + M instead of N * M things on your TODO List.


Presentation Transcript


  1. Enabling technologies for facilitating access and use of data Russ Rew and John Caron, Unidata Workshop on Ensuring Access and Trustworthiness of Climate Observations and Models for Society NCDC, Asheville, 2010-03-09

  2. CDM Goal: N + M instead of N * M things on your TODO List. [Diagram labels: File Format #1, File Format #2, File Format #N; NetCDF file; Visualization & Analysis, Data Server, Web Service]
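The N + M goal on slide 2 can be illustrated with a minimal Python sketch. Every name here (the reader functions, the consumer functions, the dict used as a "common model") is a hypothetical stand-in and not part of Netcdf-Java; the sketch only shows why a shared model needs one reader per format plus one function per consumer, rather than one converter per format/consumer pair.

```python
# Hypothetical sketch: N format readers + M consumers sharing one common model.
# None of these names come from Netcdf-Java; they only illustrate the counting.

def read_grib(raw):
    # Pretend decoder: returns the common in-memory model (a plain dict here).
    return {"variables": {"temperature": raw}, "source_format": "GRIB"}

def read_hdf5(raw):
    return {"variables": {"temperature": raw}, "source_format": "HDF5"}

def read_netcdf3(raw):
    return {"variables": {"temperature": raw}, "source_format": "NetCDF-3"}

# N readers: one per file format.
READERS = {"GRIB": read_grib, "HDF5": read_hdf5, "NetCDF-3": read_netcdf3}

def visualize(model):
    return f"plot of {sorted(model['variables'])}"

def serve(model):
    return f"serving data decoded from {model['source_format']}"

def web_service(model):
    return {"layers": sorted(model["variables"])}

# M consumers: each one only understands the common model.
CONSUMERS = {"visualize": visualize, "serve": serve, "web_service": web_service}

def adapters_with_common_model(n_formats, m_consumers):
    # One reader per format plus one function per consumer.
    return n_formats + m_consumers

def adapters_pairwise(n_formats, m_consumers):
    # One dedicated converter for every (format, consumer) pair.
    return n_formats * m_consumers

def run(fmt, consumer, raw):
    # Any format can reach any consumer through the common model.
    model = READERS[fmt](raw)
    return CONSUMERS[consumer](model)
```

With three formats and three consumers the difference is only 6 versus 9 pieces of code, but the gap widens quickly as either list grows.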

  3. Common Data Model • What is it? • Capabilities for observational data • Current status

  4. What is it? • Abstract Data Model for scientific data • Implemented by Netcdf-Java library • Core of the THREDDS Data Server • Co-evolving with the CF Conventions

  5. Abstract Data Model, aka Object Model • Data Access Layer • NetCDF / HDF5 / OPeNDAP • Subset in index space • Coordinate System Layer • CF, VisAD, HDF-EOS, GRIB • Georeferencing • Feature Type Layer • OGC WxS, ISO, CSML • Subset in coordinate space
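The difference between subsetting in index space (Data Access Layer) and subsetting in coordinate space (the layers above it) can be sketched as follows. This is an illustrative Python sketch, not the Netcdf-Java API: the function names and the sample `lat`/`temp` arrays are assumptions, and it handles only a single, ascending 1D coordinate.

```python
from bisect import bisect_left, bisect_right

def subset_by_index(values, start, stop):
    # Index-space subset: the caller must already know array positions.
    return values[start:stop]

def coord_range_to_index_range(coords, lo, hi):
    # Translate a closed coordinate range [lo, hi] into a half-open
    # index range [start, stop). Assumes coords is sorted ascending.
    start = bisect_left(coords, lo)
    stop = bisect_right(coords, hi)
    return start, stop

def subset_by_coordinate(values, coords, lo, hi):
    # Coordinate-space subset: the caller asks in physical units and the
    # coordinate system does the translation to indices.
    start, stop = coord_range_to_index_range(coords, lo, hi)
    return subset_by_index(values, start, stop)

# Hypothetical sample data: one temperature value per latitude.
lat = [10.0, 20.0, 30.0, 40.0, 50.0]
temp = [300.1, 295.4, 288.0, 280.2, 271.9]
```

For example, `subset_by_coordinate(temp, lat, 20.0, 40.0)` selects the same elements as `subset_by_index(temp, 1, 4)`, but the caller never has to know the indices.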

  6. Abstract Data Model • Turns a collection of bytes into a collection of objects called features • E.g., grids, swaths, profiles, radial sweeps • These objects play the same role as a schema does in a database • Defines the things (nouns) and what operations (verbs) are possible

  7. Netcdf-Java library implementation • 100% pure Java, open source, developed and maintained by Unidata • Object oriented, strongly typed, garbage collected, huge open-source libraries, runtime configurable == highly productive • Many different file formats • Many different coordinate system conventions • Library is used by many other software packages

  8. Netcdf-Java File Formats • General: NetCDF-3, NetCDF-4, HDF5, HDF4, OPeNDAP • Gridded: GRIB-1, GRIB-2, GEMPAK, McIDAS, UAMIV CAMx • Point: BUFR, GEMPAK • Radar: NEXRAD 2&3, DORADE, CINRAD, UF • Satellite: DMSP, GINI, McIDAS, FYSAT, HDF-EOS • Misc: GTOPO, NLDN, USPLN, etc • Write your own IOServiceProvider Java class

  9. Transforms (CF) • Projections: albers_conical_equal_area, lambert_azimuthal_equal_area, lambert_conformal_conic, mcidas_area, mercator, orthographic, rotated_pole, stereographic (including polar), transverse_mercator, UTM (ellipsoidal), vertical_perspective • Vertical Transforms: atmosphere_sigma, atmosphere_hybrid_sigma_pressure, ocean_s, ocean_sigma, existing3DField • Write your own CoordTransBuilderIF Java class

  10. Used by other applications • Integrated Data Viewer, ToolsUI (Unidata) • Panoply (NASA) • ncBrowse (EPIC/NOAA) • Java NEXRAD Viewer (NCDC/NOAA) • MyWorld GIS (Northwestern) • EDC for ArcGIS, ERDDAP (SWFSC/NOAA) • Live Access Server (PMEL/NOAA) • ncWMS (Reading) • Matlab plug-in (USGS)

  11. Core of the THREDDS Data Server. [Diagram labels: Remote Access Client; catalog.xml; Servlet Container; THREDDS Server (WCS, OPeNDAP, HTTPServer, WMS); NetCDF-Java library; configCatalog.xml; Datasets; IDD Data; motherlode.ucar.edu]

  12. THREDDS Data Server (TDS) • Web server for scientific data • 100% Java servlet • Provides remote data access: OPeNDAP; Open Geospatial Consortium (OGC) WMS and WCS; HTTP file transfer; experimental data access protocols • Infrastructure, not a portal

  13. TDS and NcML • Embed NcML into the TDS configuration catalog • Server serves a virtual dataset defined by NcML • NcML hidden from the client • Can “fix” metadata problems • Can augment metadata • General Aggregations • joinNew, joinExisting, Union • Specialized Aggregations • Forecast Model Run Collection (FMRC) • Point Feature Collections (version 4.2)
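The three general aggregation types named on slide 13 differ in how they combine the nested datasets. The following Python sketch models datasets as plain dicts of nested lists to show the intended semantics; it is an illustration under that assumption, not NcML or Netcdf-Java code, and the function names are hypothetical.

```python
def join_existing(datasets, var):
    # joinExisting: concatenate `var` along its existing outer dimension
    # (for example, time), so the dimension gets longer.
    out = []
    for ds in datasets:
        out.extend(ds[var])
    return out

def join_new(datasets, var):
    # joinNew: stack `var` from each dataset along a NEW outer dimension,
    # so the result gains one dimension with one entry per dataset.
    return [ds[var] for ds in datasets]

def union(datasets):
    # Union: merge the variables of all datasets into one dataset.
    # Assumption in this sketch: when a name repeats, the first one wins.
    merged = {}
    for ds in datasets:
        for name, data in ds.items():
            merged.setdefault(name, data)
    return merged

# Hypothetical inputs: two files holding the same variable T.
file_a = {"T": [[1, 2], [3, 4]]}   # shape (time=2, x=2)
file_b = {"T": [[5, 6]]}           # shape (time=1, x=2)
```

In this sketch `join_existing([file_a, file_b], "T")` yields three time steps of two values each, whereas `join_new` on two files would add an outer dimension of length two.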

  14. TDS / NcML: Modify all files in datasetScan

         <datasetScan name="Ocean Satellite Data" path="/data/ocean/sat/"
                      location="/data/ncdc/impacts/scenario4b/run1234">
           <netcdf>
             <attribute name="NCS:Provenence" value="NCDC assimilation prog4gd from GOES-10"/>
           </netcdf>
         </datasetScan>

  15. TDS / NcML aggregation

         <dataset name="WEST-CONUS_4km Aggregation" urlPath="satellite/3.9/WEST-CONUS_4km">
           <netcdf xmlns="http://www.unidata.ucar.edu/schemas/netcdf/ncml-2.2">
             <aggregation dimName="time" type="joinNew">
               <scan location="/data/satellite/WEST-CONUS_4km/" suffix=".gini" />
             </aggregation>
           </netcdf>
         </dataset>

  16. Co-evolving with the CF Conventions • Implementation of the CF Conventions • Strong feedback (in both directions) between CF and CDM • CF is the recommended way to write datasets • CDM also deals with legacy datasets and other file formats besides netCDF

  17. CF • CF has mostly focused on model gridded data • Driven by IPCC work • Has a general coordinate system model • :coordinates = "lat lon alt time"; • Sufficient for swath, some in-situ data • Current efforts • Radial data (NCAR/EOL) • Discrete Sample data (aka point, in-situ data)
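The `coordinates` attribute on slide 17 is a space-separated list of variable names that supply the coordinates of a data variable. A minimal Python sketch of how a reader might resolve it is shown below; the dict-based dataset layout and the function name are assumptions for illustration, not the CDM implementation.

```python
# Hypothetical in-memory dataset; the coordinate names mirror the slide's example.
DATASET = {
    "lat": {"attributes": {"units": "degrees_north"}},
    "lon": {"attributes": {"units": "degrees_east"}},
    "alt": {"attributes": {"units": "m"}},
    "time": {"attributes": {"units": "hours since 2010-03-09 00:00:00"}},
    "temperature": {
        "attributes": {"coordinates": "lat lon alt time", "units": "K"},
    },
}

def coordinate_variables(variables, var_name):
    # Split the space-separated `coordinates` attribute and keep only the
    # names that actually exist as variables in the dataset.
    attrs = variables[var_name].get("attributes", {})
    names = attrs.get("coordinates", "").split()
    return [name for name in names if name in variables]
```

A variable without the attribute, such as `lat` here, simply resolves to an empty list in this sketch.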

  18. Discrete Sample Data Categorization • Point: measured at one point in time and space • Station: time-series of points at the same location • Profile: points along a vertical line • TimeSeries of Profiles: a time-series of profiles at the same location • Trajectory: points along a 1D curve in time/space • Trajectory of Profiles: a collection of profile features which originate along a trajectory

  19. Proposed Encoding Variations • Rectangular Array • Multidimensional • Single : one feature in the file • Ragged Array – different length features • Contiguous • Non-Contiguous • Flattened
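To make the ragged-array variations on slide 19 concrete, here is a minimal Python sketch of two ways to store features of different lengths in one flat observation array. The interpretation is an assumption: "contiguous" is modeled with a per-feature row count, and "non-contiguous" with a per-observation parent index. The function and parameter names are hypothetical and are not taken from the proposal.

```python
def split_contiguous(obs, row_sizes):
    # Contiguous ragged array: all observations of a feature are stored
    # together, and row_sizes[i] says how many belong to feature i.
    features, start = [], 0
    for n in row_sizes:
        features.append(obs[start:start + n])
        start += n
    return features

def split_indexed(obs, parent_index, n_features):
    # Non-contiguous ragged array: observations may be interleaved, and
    # parent_index[k] names the feature that observation k belongs to.
    features = [[] for _ in range(n_features)]
    for value, parent in zip(obs, parent_index):
        features[parent].append(value)
    return features
```

Both layouts recover the same three profiles of lengths 2, 1, and 3: `split_contiguous([1, 2, 3, 4, 5, 6], [2, 1, 3])` and `split_indexed([1, 4, 2, 5, 3, 6], [0, 2, 0, 2, 1, 2], 3)`. The contiguous form is more compact, while the indexed form lets observations be appended in arrival order.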

  20. Current CDM Status • Discrete Sample Data proposal • Almost finalized (Caron/Gregory/Hankin) • CDM implementation now in 4.1 • Collections of files to be in 4.2 • Forecast Model Run Collection refactor • Also using Collection • Caching on the server • Scale to much larger collections (NCDC/Nomads) • Scheduled for 4.2

  21. CDM funding status • CDM/THREDDS work competes with many other priorities at Unidata • THREDDS is most used by large data centers (NOAA/NASA/USGS/EPS, EU) • Important (but indirect) benefits to NSF ATM constituency (US academic meteorology) • Unidata is fully committed but not much chance of expanded base funding from NSF

  22. CDM funding status (cont) • Have a proposal in to NSF Cyber-Infrastructure solicitation • Integration of TDS and IDD/LDM data streams • Explore use of Hadoop (Map/Reduce) for very large collections • Need commitment of resources from you • ($$) Custom work when compatible • In-kind contribution == time and attention for CF/CDM from domain experts and engineers

  23. Thank You!
