Best Practices to Promote Data Interoperability

Chris Lynnes and Joe Glassy, Technology Infusion Working Group

Presentation Transcript


  1. Best Practices to Promote Data Interoperability
     Chris Lynnes, Joe Glassy
     Technology Infusion Working Group

  2. Outline
     • Data interoperability: what and why?
     • Factors affecting data interoperability
     • Implementations that support interoperability

  3. What is Data Interoperability?
     Data interoperability exists when a data user is able to work with (view, analyze, process, etc.) a data provider's science data or model output “transparently,” without having to reformat the data, write special tools to read or extract the data, or rely on specific proprietary software.
     Quicker data usability, easier portability, more transparency – S. Volz

  4. Illustration: Panoply
     Dataset comparison
     • North American Regional Reanalysis (NARR) from NCDC
     • Atmospheric Infrared Sounder (AIRS) from GES DISC
     Procedure
     • Cut and paste the NARR OPeNDAP URL
     • Double-click a variable to display it
     • Repeat for AIRS

  5. What good is data interoperability?
     • Makes it easier to write tools that work with many datasets...
     • ...Which increases the ability to work with multiple datasets together...
     • ...And promotes user satisfaction and early experiences with {your|my|our} data...
     • ...Which enhances a dataset’s life-cycle economics.

  6. There is no single path to interoperability…
     Factors affecting data interoperability

  7. File Formats
     • Standard formats
       • More economical to develop general tools
       • Format is well documented
       • APIs* exist
       • Many datasets enabled by one set of code modules
     • “Self-describing” formats
       • Contain embedded metadata to interpret the content, context, and/or structure of the file
     *Application Programming Interfaces

  8. File Structures
     • Coordinates: where and named how?
       • Latitude, longitude
       • Vertical dimension: altitude, pressure, sigma level, depth, ...
       • Time
     • Flat vs. hierarchical
     • Simple vs. complex

  9. Usage Metadata
     • Inside file vs. separate file
       • Easy for users to lose a separate file
       • A key benefit of self-describing formats
     • Variable-level metadata
       • Units
       • Fill Value
       • Scale / offset
     • File-level metadata
     • Standards (e.g., CF-1, HDF-EOS, ISO 19115)
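
To illustrate why variable-level metadata matters, here is a minimal sketch of how a reader might turn stored integers into physical values using a fill value and a scale/offset pair. It assumes the CF-style rule (physical = stored × scale_factor + add_offset); other conventions define the offset differently, so the file's own documentation takes precedence. The attribute values and the `unpack` helper are hypothetical.

```python
def unpack(raw_values, scale_factor=1.0, add_offset=0.0, fill_value=None):
    """Convert stored (packed) numbers to physical values.

    Assumes the CF-style rule: physical = stored * scale_factor + add_offset.
    Cells equal to fill_value are returned as None (missing).
    """
    return [
        None if v == fill_value else v * scale_factor + add_offset
        for v in raw_values
    ]


# Hypothetical variable-level metadata, as it might be embedded in a file.
attrs = {
    "units": "K",
    "_FillValue": -9999,
    "scale_factor": 0.01,
    "add_offset": 273.15,
}

raw = [0, 150, -9999, -250]  # made-up stored values
temps = unpack(raw, attrs["scale_factor"], attrs["add_offset"], attrs["_FillValue"])
print(temps, attrs["units"])
```

Without these three attributes, the stored value -9999 would be indistinguishable from a (very wrong) measurement, which is the practical argument for keeping usage metadata inside the file.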

  10. Grids
      • Common grids enable dataset comparison, merging, etc.
      • Reprojection from one grid to another usually loses information
      • Tradeoff
        • Most appropriate grid for a dataset vs....
        • ...most commonly used grid in the “community”
      • Keep in mind that the potential community may be much broader than you think
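
The information loss from regridding can be seen in a toy one-dimensional case. The sketch below is only an illustration under simple assumptions (block averaging to a coarser grid, then nearest-neighbour replication back to the original resolution); real reprojection between map grids is more involved, and the function names are hypothetical.

```python
def coarsen(values, factor):
    """Average consecutive blocks of `factor` cells onto a coarser grid."""
    if len(values) % factor:
        raise ValueError("grid length must be a multiple of factor")
    return [
        sum(values[i:i + factor]) / factor
        for i in range(0, len(values), factor)
    ]


def refine(values, factor):
    """Replicate each coarse cell `factor` times (nearest neighbour)."""
    return [v for v in values for _ in range(factor)]


fine = [1.0, 3.0, 2.0, 6.0]           # made-up data on the native grid
coarse = coarsen(fine, 2)             # values on a coarser "common" grid
round_trip = refine(coarse, 2)        # back on the native grid

print(coarse)
print(round_trip)
print(round_trip == fine)
```

The mean is preserved, but the cell-to-cell variability of the original grid cannot be recovered from the coarse version.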

  11. Names and Units
      • Variable names
        • Standard names (CF-1)
        • Unique names within file
          • Some tools have difficulty with hierarchies having variables with the same name in different branches
      • Dimension / coordinate names
        • Latitude, longitude, time, altitude/pressure
      • Unit names
        • Standard units
        • Unit conversion
          • Note that altitude <-> pressure requires additional information
      • Filenames
        • Descriptive filenames: dataset, version, data date/time…
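
As a sketch of what a descriptive filename buys a tool writer, the example below pulls the data date and version out of the AIRS filename used in the OPeNDAP URL examples of this presentation. The field layout and the group names (`dataset`, `sequence`, `stamp`, and so on) are assumptions inferred from that single filename, not a documented naming specification.

```python
import re
from datetime import date

# Assumed layout, inferred from one example filename only:
# <dataset>.<YYYY>.<MM>.<DD>.<sequence>.<level>.<product>.v<version>.<stamp>.<ext>
FILENAME_RE = re.compile(
    r"^(?P<dataset>[A-Za-z]+)\."
    r"(?P<year>\d{4})\.(?P<month>\d{2})\.(?P<day>\d{2})\."
    r"(?P<sequence>\d{3})\."
    r"(?P<level>L\d)\."
    r"(?P<product>\w+)\."
    r"v(?P<version>[\d.]+)\."
    r"(?P<stamp>G\d+)\."
    r"(?P<ext>\w+)$"
)


def parse_filename(name):
    """Split a filename of the assumed layout into named fields."""
    match = FILENAME_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized filename: {name!r}")
    fields = match.groupdict()
    fields["data_date"] = date(
        int(fields.pop("year")), int(fields.pop("month")), int(fields.pop("day"))
    )
    return fields


info = parse_filename("AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf")
print(info["dataset"], info["data_date"], info["version"])
```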

  12. Sidebar: Data Identifiers
      • Filenames, even descriptive ones, may not be completely reliable as unique identifiers
      • Identifiers are ideally embedded within the data file
      • Uniquely identifying datasets and data files helps:
        • Catalog interoperability
        • Transparency / provenance
        • Citation metrics
      • See Ruth Duerr’s talk on recommendations for unique identifiers for datasets and granules
      • Future tools may make use of these embedded identifiers: look up references, get related data...

  13. Implementations of Data Interoperability

  14. CF-1
      • Climate and Forecast (CF) convention
        • Popular in modeling community
        • Extending to point and satellite data
      • Coordinate system: Key for tool usage
        • Latitude + longitude
          • Specifications for both regular L3 grids and L2 swaths
        • Time, vertical
        • Recognizable via units (e.g. “degrees_north”)
      • Standard variable names: Key for model incorporation
      • Most often associated with netCDF
        • Also applicable in OPeNDAP
        • Work is underway to apply to HDF5
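
The "recognizable via units" point can be sketched as follows: a tool inspects each variable's `units` attribute (and, for vertical axes, the `positive` attribute) to decide which axis it represents. This is a simplified illustration of the idea, not a complete CF implementation; the variable names and attribute dictionaries are hypothetical stand-ins for what a file reader would return.

```python
# Unit strings that CF accepts for latitude and longitude coordinates.
LAT_UNITS = {"degrees_north", "degree_north", "degree_N",
             "degrees_N", "degreeN", "degreesN"}
LON_UNITS = {"degrees_east", "degree_east", "degree_E",
             "degrees_E", "degreeE", "degreesE"}


def classify_axis(attrs):
    """Guess the axis type of a variable from its attributes (simplified)."""
    units = attrs.get("units", "")
    if units in LAT_UNITS:
        return "latitude"
    if units in LON_UNITS:
        return "longitude"
    if " since " in units:            # e.g. "hours since 2010-10-12 00:00:00"
        return "time"
    if attrs.get("positive") in ("up", "down"):
        return "vertical"
    return None                       # not recognized as a coordinate


# Hypothetical variables and attributes.
variables = {
    "lat": {"units": "degrees_north"},
    "lon": {"units": "degrees_east"},
    "time": {"units": "hours since 2010-10-12 00:00:00"},
    "plev": {"units": "hPa", "positive": "down"},
    "H2OMMRStd": {"units": "g/kg"},
}

axes = {name: classify_axis(attrs) for name, attrs in variables.items()}
print(axes)
```

Because the decision rests on attributes rather than on variable names, a file can call its latitude variable anything and still be understood by generic tools.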

  15. OPeNDAP
      • Open-Source Project for a Network Data Access Protocol
      • Client-Server framework
        • Standard web (GET) request syntax
        • Remote fine-grained access to data files
      • Presents a standard data model and “format” to clients
      • Supports multiple formats on the back end
        • HDF, netCDF, ASCII, GRIB, binary
      • Multiple server implementations
        • Hyrax, THREDDS, ERDDAP, GDS, Dapper, PyDAP, TSDS...
      • Client support in many tools
        • IDV, McIDAS-V, GrADS, Matlab, IDL, Ferret, Panoply

  16. Web Coverage Service
      • Client-Server framework
        • Open Geospatial Consortium protocol
        • Standard web (GET) request syntax
      • Multiple response formats, including GeoTIFF, netCDF/CF-1 and HDF-EOS
      • Includes spatial subsetting
      • BUT:
        • Client support is still nascent outside GIS community
        • Some datatypes are difficult or impossible to fit into WCS (e.g., limb-scanning profiles)
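
A spatially subsetted WCS request is an ordinary GET URL with key-value parameters. The sketch below assumes the WCS 1.0.0 parameter names (`COVERAGE`, `BBOX`, `WIDTH`, `HEIGHT`, `FORMAT`); later versions of the standard use different keys. The endpoint `http://example.org/wcs` and the coverage name are placeholders, and the set of format names a given server accepts has to be read from its capabilities document.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def wcs_getcoverage_url(endpoint, coverage, bbox, width, height,
                        fmt="GeoTIFF", crs="EPSG:4326"):
    """Build a WCS 1.0.0 GetCoverage URL (key-value-pair encoding).

    bbox is (min_x, min_y, max_x, max_y) in the units of `crs`.
    """
    params = {
        "SERVICE": "WCS",
        "VERSION": "1.0.0",
        "REQUEST": "GetCoverage",
        "COVERAGE": coverage,
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    # Keep commas and colons readable instead of percent-encoding them.
    return endpoint + "?" + urlencode(params, safe=",:")


# Placeholder endpoint and coverage name, for illustration only.
url = wcs_getcoverage_url(
    "http://example.org/wcs", "surface_temperature",
    bbox=(-130, 20, -60, 55), width=700, height=350,
)
query = parse_qs(urlsplit(url).query)
print(url)
```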

  17. Semantic Web
      • Enables machine recognition of:
        • names
        • relationships
      • Effective for:
        • Metadata
        • Small ASCII data
      • Use of semantic web to make Earth Science data interoperable is still in its experimental phase

  18. Data Tools for Use with Interoperable Data
      • Panoply: http://www.giss.nasa.gov/tools/panoply/
      • IDV: http://www.unidata.ucar.edu/software/idv/
      • McIDAS-V: http://www.ssec.wisc.edu/mcidas/software/v/
      • GrADS: http://www.iges.org/grads/
      • Ferret: http://ferret.wrc.noaa.gov/Ferret/

  19. Summary
      • Data users benefit from data interoperability
        • More tools available to handle more datasets
      • Consider format, structure, grids, metadata and naming
      • If interoperability cannot be built in at data production, some tools (OPeNDAP, WCS, semantic web) can compensate...
      • ...IF the metadata and information content of the data are sufficient

  20. Backup Slides

  21. References
      • Practical Data Interoperability for Earth Scientists: http://www.esdswg.org/techinfusion/downloads/pdies/view
      • Recommendations for Data Level Interoperability: http://tiwg.wik.is/Interoperability/Interoperability_Recommendations
      • HDF: http://www.hdfgroup.org/
      • HDF-EOS: http://hdfeos.org/
      • netCDF: http://www.unidata.ucar.edu/software/netcdf/
      • OPeNDAP: http://www.opendap.org
      • CF-1: http://cf-pcmdi.llnl.gov/
      • Web Coverage Service: http://en.wikipedia.org/wiki/Web_Coverage_Service

  22. OPeNDAP URL examples
      • Get metadata in XML:
        http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ddx
      • Get data slice in ASCII:
        http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf.ascii?H2OMMRStd[0:1:44][0:1:29][4:1:5]
      • Data access URL for clients (IDV, Panoply):
        http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf
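
The three URL forms above follow one pattern: a base data URL, an optional response suffix (`.ddx`, `.ascii`), and an optional constraint made of a variable name plus `[start:stride:stop]` index ranges. A minimal sketch of assembling them is shown below; the helper names `hyperslab` and `opendap_url` are hypothetical, the code only builds strings, and it makes no claim that the server in the example is still reachable.

```python
def hyperslab(ranges):
    """Format (start, stride, stop) triples as index ranges, e.g. [0:1:44]."""
    return "".join(f"[{start}:{stride}:{stop}]" for start, stride, stop in ranges)


def opendap_url(base, suffix="", variable=None, ranges=()):
    """Compose an OPeNDAP request URL from its parts.

    base     -- data access URL (what clients such as IDV or Panoply open)
    suffix   -- response type, e.g. "ddx" (XML metadata) or "ascii"
    variable -- optional variable name to subset
    ranges   -- optional (start, stride, stop) triples, one per dimension
    """
    url = base + (f".{suffix}" if suffix else "")
    if variable:
        url += "?" + variable + hyperslab(ranges)
    return url


BASE = ("http://airspar1u.ecs.nasa.gov/opendap/Aqua_AIRS_Level2/AIRX2RET.005/"
        "2010/285/AIRS.2010.10.12.090.L2.RetStd.v5.2.2.0.G10286064818.hdf")

ddx = opendap_url(BASE, "ddx")
ascii_slice = opendap_url(
    BASE, "ascii", "H2OMMRStd", [(0, 1, 44), (0, 1, 29), (4, 1, 5)]
)
print(ddx)
print(ascii_slice)
```

Some HTTP clients expect the square brackets in the constraint to be percent-encoded, so a production tool would likely encode the query part before sending the request.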
