
Describing data through metadata and XML: the CSML experience.

Bryan Lawrence 1, Andrew Woolf 2, Roy Lowry 3, Kerstin Kleese van Dam 2, Ray Cramer 3, Marta Gutierrez 1, Siva Kondapalli 3, Susan Latham 1, Dominic Lowe 1, Kevin O’Neill 2, Ag Stephens 1 … and Michael Burek 4


Presentation Transcript


  1. Describing data through metadata and XML: the CSML experience. Bryan Lawrence (1), Andrew Woolf (2), Roy Lowry (3), Kerstin Kleese van Dam (2), Ray Cramer (3), Marta Gutierrez (1), Siva Kondapalli (3), Susan Latham (1), Dominic Lowe (1), Kevin O’Neill (2), Ag Stephens (1) … and Michael Burek (4). Affiliations: (1) British Atmospheric Data Centre; (2) CCLRC e-Science Centre; (3) British Oceanographic Data Centre; (4) National Center for Atmospheric Research. Partners: BADC, BODC, CCLRC, PML and SOC

  2. Outline: Motivation; The NERC Datagrid; Metadata Taxonomy; Standards; OGC Vision; Feature Types; CSML Description; Prototyping in MarineXML; Round-Tripping; The importance of CF; Feature Type Issues; Next Steps; NDG Discovery Demo

  3. NCAR Complexity + Volume + Remote Access = Grid Challenge British Atmospheric Data Centre http://ndg.nerc.ac.uk British Oceanographic Data Centre

  4. CSML NCML+CF MOLES THREDDS CLADDIER DIF -> ISO19115 NDG Metadata Taxonomy … not one schema, not one solution!

  5. Things I’m not going to talk about: NDG PKI based secure access to datasets Supporting authentication/authorisation across differing virtual organisations without centralised user or role databases. Built on bilateral trust relationships NDG Discovery Portal already provides access to hundreds of datasets with international scope Open Archives Initiative Harvesting Web Service Interfaces NDG Browse New method of “walking” around datasets. NDG

  6. OGC Web Feature Service. WFS Server, WFS Client, HTML, GML. Data source organised for custodian’s requirements. Community-specific GML application language: TigerGML, LandGML, O&M, XMML, CGI-GML, ADX, GPML, CSML, MarineXML etc. Public/private boundary. Slide adapted from Simon Cox (AUKEGGS, 2005)

  7. WFS Server B WFS Server WFS Client WFS Server C Standard transfer format allows multiple data sources Slide adapted from Simon Cox (AUKEGGS, 2005)

  8. e.g.: ERA40 re-analysis surface air temperature, 2001-04-27 deegree open-source WMS modified with netCDF connector Overlaid with rainfall from globe.digitalearth.gov WMS server NetCDF + WMS NB: Now using Mapserver for Interoperability experiments

  9. ISO 19101: Geographic information – Reference model …in a defined logical structure… …delivered through services… …and described by metadata. A geospatial dataset… …consists of features and related objects… Standards

  10. Standards • Geographic ‘features’ • “abstraction of real world phenomena” [ISO 19101] • Type or instance • Encapsulate important semantics in universe of discourse • Application schema • Defines semantic content and logical structure of datasets • ISO standards provide toolkit: • spatial/temporal referencing • geometry (1-, 2-, 3-D) • topology • dictionaries (phenomena, units, etc.) • GML – canonical encoding [from ISO 19109 “Geographic information – Rules for Application Schema”]

  11. Standards • Emerging ISO standards • TC211 – around 40 standards for geographic information • Cover activity spectrum: discovery → access → use • Provide a framework for data integration. Diagram labels: Feature types, Application schema, ISO 19109, ISO 19103, ISO 19110, Universe of discourse

  12. Standards: the importance of governance. Information community defined by shared semantics. Need community process to manage those semantics (definitions, models, vocabularies, taxonomies, etc.), e.g. CF conventions for netCDF files. Role of Feature Type Catalogues [ISO 19110] and registers [ISO 19135]. Governance as driver for granularity: remit / interest determines appropriate granularity (ref. IOC, IHO, WMO). Feature types span a spectrum from abstract generic to highly specialised, e.g. <measurement type="Radiosonde" measurand="temperature"/> <temperatureProfile/> <Sonde parameter="temperature"/>

  13. Aims: provide semantic integration mechanism for NDG data explore new standards-based interoperability framework emphasise content, not container Design principles: offload semantics onto parameter type (‘phenomenon’, observable, measurand) e.g. wind-profiler, balloon temperature sounding offload semantics onto CRS e.g. scanning radar, sounding radar ‘sensible plotting’ as discriminant ‘in-principle’ unsupervised portrayal explicitly aim for small number of weakly-typed features (in accordance with governance principle and NDG remit) NDG-A: Climate Science Modelling Language

  14. CSML feature types defined on basis of geometric and topologic structure Climate Science Modelling Language

  15. CSML feature types examples... ProfileSeriesFeature ProfileFeature GridFeature Climate Science Modelling Language

  16. Climate Science Modelling Language • Application schema • logical structure and semantic content of NDG ‘Dataset’ • Based on GML 3.1

  17. Numerical array descriptors provides ‘wrapper’ architecture for legacy data files ‘Connected’ to data model numerical content through ‘xlink:href’ Three subtypes: InlineArray ArrayGenerator FileExtract (NASAAmes, NetCDF, GRIB) Composite design pattern for aggregation Climate Science Modelling Language
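The "composite design pattern for aggregation" named on slide 17 can be illustrated with a minimal Python sketch. This is not the CSML implementation: the class names `ArrayDescriptor` and `AggregatedArray` are hypothetical, and treating aggregation as simple concatenation of the parts is an assumption made only for illustration.

```python
from abc import ABC, abstractmethod
from typing import List, Sequence


class ArrayDescriptor(ABC):
    """Common interface for every array descriptor (hypothetical name)."""

    @abstractmethod
    def values(self) -> List[float]:
        """Return the numerical content described by this descriptor."""


class InlineArray(ArrayDescriptor):
    """Leaf: the values are carried directly in the descriptor."""

    def __init__(self, data: Sequence[float]) -> None:
        self._data = list(data)

    def values(self) -> List[float]:
        return list(self._data)


class AggregatedArray(ArrayDescriptor):
    """Composite: holds other descriptors, which may themselves be composites."""

    def __init__(self, parts: Sequence[ArrayDescriptor]) -> None:
        self._parts = list(parts)

    def values(self) -> List[float]:
        # Assumption: aggregation means concatenating the parts in order.
        combined: List[float] = []
        for part in self._parts:
            combined.extend(part.values())
        return combined


# A composite can be nested inside another composite and is used
# through the same interface as a leaf.
inner = AggregatedArray([InlineArray([1.0, 2.0]), InlineArray([3.0])])
outer = AggregatedArray([inner, InlineArray([4.0, 5.0])])
print(outer.values())
```

The point of the pattern is that client code only ever calls `values()`, whether it holds a single inline array or an aggregation of several file extracts.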

  18. Inline array Array generator Climate Science Modelling Language <NDGInlineArray> <arraySize>5 2</arraySize> <uom>udunits.xml#degreeC</uom> <numericType>float</numericType> <regExpTransform>s/10/9/ge</regExpTransform> <numericTransform>+5</numericTransform> <values>1 2 3 4 5 6 7 8 9 10</values> </NDGInlineArray> <NDGArrayGenerator> <arraySize>10001</arraySize> <uom>udunits.xml#minute</uom> <numericType>float</numericType> <expression>0:5:50000</expression> </NDGArrayGenerator>
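As a rough illustration of the array generator on slide 18, the sketch below expands the expression `0:5:50000` in Python. It assumes the expression means `start:step:stop` with an inclusive stop; the slide does not state this, but the reading is consistent with the declared `arraySize` of 10001. The function name `expand_generator` is hypothetical.

```python
from typing import List


def expand_generator(expression: str) -> List[float]:
    """Expand 'start:step:stop' into a list of values, stop included.

    The start:step:stop reading is an assumption, not taken from a CSML spec.
    """
    start, step, stop = (float(part) for part in expression.split(":"))
    if step <= 0:
        raise ValueError("this sketch only handles a positive step")
    count = int((stop - start) // step) + 1
    return [start + i * step for i in range(count)]


values = expand_generator("0:5:50000")
# The length can be checked against the <arraySize> declared in the descriptor.
print(len(values), values[0], values[-1])
```

Generating the axis on demand in this way keeps a 10001-element time axis out of the XML document itself.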

  19. File extract Climate Science Modelling Language <NDGNASAAmesExtract> <arraySize>526</arraySize> <numericType>double</numericType> <fileName>/data/BADC/macehead/mh960606.cf1</fileName> <variableName>CFC-12</variableName> </NDGNASAAmesExtract> <NDGNetCDFExtract gml:id="feat04azimuth"> <arraySize>10000</arraySize> <fileName>radar_data.nc</fileName> <variableName>az</variableName> </NDGNetCDFExtract> <NDGGRIBExtract> <arraySize>320 160</arraySize> <numericType>double</numericType> <fileName>/e40/ggas1992010100rsn.grb</fileName> <parameterCode>203</parameterCode> <recordNumber>5</recordNumber> <fileOffset>289412</fileOffset> </NDGGRIBExtract>
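To show how a client might consume a file-extract descriptor, here is a minimal Python sketch that reads the GRIB example from slide 19 into a typed record using only the standard library. The `GribExtract` dataclass and `parse_grib_extract` function are hypothetical names, and the sketch only interprets the descriptor; it does not open the GRIB file.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Tuple

# The GRIB descriptor from the slide, with the closing tag written correctly.
GRIB_EXTRACT = """
<NDGGRIBExtract>
  <arraySize>320 160</arraySize>
  <numericType>double</numericType>
  <fileName>/e40/ggas1992010100rsn.grb</fileName>
  <parameterCode>203</parameterCode>
  <recordNumber>5</recordNumber>
  <fileOffset>289412</fileOffset>
</NDGGRIBExtract>
"""


@dataclass
class GribExtract:
    """Typed view of an NDGGRIBExtract element (hypothetical class)."""
    shape: Tuple[int, ...]
    numeric_type: str
    file_name: str
    parameter_code: int
    record_number: int
    file_offset: int


def parse_grib_extract(xml_text: str) -> GribExtract:
    root = ET.fromstring(xml_text.strip())

    def text(tag: str) -> str:
        # Fail loudly if a child element is missing or empty.
        value = root.findtext(tag)
        if value is None or not value.strip():
            raise ValueError(f"missing <{tag}> in descriptor")
        return value.strip()

    return GribExtract(
        shape=tuple(int(n) for n in text("arraySize").split()),
        numeric_type=text("numericType"),
        file_name=text("fileName"),
        parameter_code=int(text("parameterCode")),
        record_number=int(text("recordNumber")),
        file_offset=int(text("fileOffset")),
    )


extract = parse_grib_extract(GRIB_EXTRACT)
print(extract.file_name, extract.shape, extract.file_offset)
```

A real reader would then seek to `file_offset` in the named file and decode the record, which is the "wrapper" role the descriptors play for legacy data.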

  20. <gml:dictionaryEntry> <om:Phenomenon gml:id="atmosphere_cloud_condensed_water_content"> <gml:description>"condensed_water" means liquid and ice. "Content" indicates a quantity per unit area. The "atmosphere content" of a quantity refers to the vertical integral from the surface to the top of the atmosphere. For the content between specified levels in the atmosphere, standard names including content_of_atmosphere_layer are used.</gml:description> <gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cf-metadata/">atmosphere_cloud_condensed_water_content</gml:name> <gml:name codeSpace="GRIB">76</gml:name> <gml:name codeSpace="PCMDI">clwvi</gml:name> </om:Phenomenon> </gml:dictionaryEntry> CSML Observations
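The dictionary entry on slide 20 gives one phenomenon several names, each in its own `codeSpace`. The Python sketch below shows one way such an entry could be used to translate between vocabularies. The slide fragment carries no namespace declarations, so the ones added here are assumptions: `http://www.opengis.net/gml` is the GML 3.1 namespace, while the `om` namespace URI is a guess. The function name `names_by_codespace` is hypothetical, and the description element is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from typing import Dict

GML = "http://www.opengis.net/gml"
OM = "http://www.opengis.net/om"  # assumed namespace URI for the om prefix

ENTRY = f"""
<gml:dictionaryEntry xmlns:gml="{GML}" xmlns:om="{OM}">
  <om:Phenomenon gml:id="atmosphere_cloud_condensed_water_content">
    <gml:name codeSpace="http://www.cgd.ucar.edu/cms/eaton/cf-metadata/">atmosphere_cloud_condensed_water_content</gml:name>
    <gml:name codeSpace="GRIB">76</gml:name>
    <gml:name codeSpace="PCMDI">clwvi</gml:name>
  </om:Phenomenon>
</gml:dictionaryEntry>
"""


def names_by_codespace(entry_xml: str) -> Dict[str, str]:
    """Map each codeSpace in a dictionary entry to the name it assigns."""
    root = ET.fromstring(entry_xml.strip())
    names: Dict[str, str] = {}
    for node in root.iter(f"{{{GML}}}name"):
        code_space = node.get("codeSpace")
        if code_space is not None and node.text:
            names[code_space] = node.text.strip()
    return names


names = names_by_codespace(ENTRY)
# Translate the GRIB parameter code into the PCMDI variable name.
print(names["GRIB"], "->", names["PCMDI"])
```

Because every name hangs off the same phenomenon, a GRIB code and a CF standard name can be recognised as the same quantity without a hard-coded lookup table in the client.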

  21. MarineXML Testbed. Data from different parts of the marine community conform to a variety of schemas (XSD), e.g. Biological Species, Chl-a from Satellite, Measured Hydrodynamics, Modelled Hydrodynamics. For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FTs) defined by CSML. The FTs and XSLT are maintained in a ‘MarineXML registry’. Features in the source XSD must be present in the data dictionary, and phenomena in the XSD must have an associated portrayal (S52 Portrayal Library). The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) Features: MarineGML (NDG) Feature Types. The FTs can then be translated to equivalent FTs for display in the ECDIS system (SENC, SeeMyDENC); ECDIS acts as an example client for the data. Features described using the S-57 v3.1 Application Schema can be imported (S-57v3 GML) and are equivalent to the same features in CSML. Slide adapted from Kieran Millard (AUKEGGS, 2005)

  22. MarineXML Testbed Biological sampling station with attributes for the species sampled at each Grid of Chl-a from the MERIS instrument on ENVISAT Predicted and measured wave climate timeseries (height, direction and period) Vectors of currents from instruments Slide adapted from Kieran Millard (AUKEGGS, 2005)

  23. The Concept of re-using Features. All this requires agreement on standards. Here structured XML is converted to plain ASCII text in the form required for a numerical model. HTML warning service pages are generated ‘on the fly’. Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts. XML can also be converted to SVG to display data graphically. Slide adapted from Kieran Millard (AUKEGGS, 2005)

  24. Status: Initial feature types defined First draft application schema complete Trial software tooling being coded (parser, netCDF instantiation) Initial deployment trial across BODC, BADC datasets Future: Separate out wrapper implementation (array descriptors) Disallow ‘internal’ dictionaries More strongly-typed features? Complex feature (vector measurements) Implicit Ensemble Support Swathes Follow (and pursue!) GML evolution, enhance compliance Expand tooling Climate Science Modelling Language

  25. conceptual model New Dataset Conforms to 101010 UGAS produces <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML Under Development GML app schema GML dataset Application instance parser CSML Round Tripping - 1 Managing semantics
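The "application instance parser" step in the round-tripping diagram can be hinted at with a small Python sketch that reads the position out of the point feature shown on the slide. The fragment on the slide is truncated, so the sketch uses only the domain part, closes the elements itself, and adds a `gml` namespace declaration; those completions and the function name `read_position` are assumptions for illustration, not the project's parser.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Domain part of the slide's NDGPointFeature, closed so that it parses.
# The rangeSet, which is truncated on the slide, is deliberately left out.
FRAGMENT = """
<NDGPointFeature xmlns:gml="http://www.opengis.net/gml" gml:id="ICES_100">
  <NDGPointDomain>
    <domainReference>
      <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                   axisLabels="Lat Long" uomLabels="degree degree">
        <location>55.25 6.5</location>
      </NDGPosition>
    </domainReference>
  </NDGPointDomain>
</NDGPointFeature>
"""


def read_position(feature_xml: str) -> Dict[str, float]:
    """Pair each axis label with the matching coordinate value."""
    root = ET.fromstring(feature_xml.strip())
    position = root.find("./NDGPointDomain/domainReference/NDGPosition")
    if position is None:
        raise ValueError("no NDGPosition found in feature")
    labels = position.get("axisLabels", "").split()
    coords = [float(v) for v in (position.findtext("location") or "").split()]
    if len(labels) != len(coords):
        raise ValueError("axisLabels and location disagree in length")
    return dict(zip(labels, coords))


position = read_position(FRAGMENT)
print(position)
```

Reading the axis order from `axisLabels` rather than assuming it avoids the common latitude/longitude ordering mistake when the instance is turned back into application objects.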

  26. Under Development CF Dataset scanner 101010 CF produces <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML Under Development GML app schema GML dataset Application instance parser CSML Round Tripping - 2 Managing data - 1

  27. Climate and Forecast (CF) Conventions. For using NetCDF to store CF data: Data should be self-describing. No external tables are needed to interpret the file. For instance, CF encodings do not depend upon numeric codes (by contrast with GRIB). The convention should be easy to use, both for data-writers and users of data. The metadata, and the semantic meaning encoded through the metadata, should be readable by humans as well as easily utilized by programs. Redundancy should be minimised as much as possible (because it reduces the chance of errors of inconsistency when writing data)

  28. CF consists of: Vocabulary management Semantic concepts (axes, cells etc), and format specific conventions (NetCDF now) CF is at the heart of IPCC data comparison Academic earth system science data exploitation (and archival). CF

  29. CF Exploits 100’s of man years of effort on NetCDF evolution and tools Is one of the means by which we can take NetCDF data and make meaningful feature types. Application schema will depend on CF (a la the registry relationships). CF Needs WMO blessing (to validate the effort of those who work on it as a standards activity). CF

  30. CF Dataset CF Dataset 101010 101010 Define Dataset DECISION PROCESSES <gml:featureMember> <NDGPointFeature gml:id="ICES_100"> <NDGPointDomain> <domainReference> <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree"> <location>55.25 6.5</location> </NDGPosition> </domainReference> </NDGPointDomain> <gml:rangeSet> <gml:DataBlock> <gml:rangeParameters> <gml:CompositeValue> <gml:valueComponents> <gml:measure uom="#tn"/> <gml:measure uom="#amount"/> <gml:measure uom="#gsm"/> </gml:valueComponents> </gml:CompositeValue> </gml:rangeParameters> <gml:tupleList> XML Add Information GML dataset Managing Data 2 scanner XSLT PUBLISH ISO19115

  31. Metadata extensions and profiles ISO19115 Concept Should apply to all “metadata”

  32. Most important decision to make is: “What is a dataset” Governance Issue Granularity Issue Not to be confused with application schema Can and should exploit external vocabularies Should be extended! D – ISO19115

  33. NumSim.xsd – extension to D

  34. GML Registry Relationships • Feature Types, need • relationships • operations Managed Registries Crucial to Interoperability

  35. Catalogues or Registries? The terms catalogue and registry are often used interchangeably, but in OGC speak: “a registry is a catalogue that exemplifies a formal registration process (e.g., such as those described in ISO 19135 or ISO 11179-6).” A registry is typically maintained by an authorized registration authority who assumes responsibility for complying with a set of policies and procedures for accessing and managing metadata items within some application domain.

  36. Exploit Theory of Measurement Work with Simon Cox on relationship between features, coverages, complex (multiple) measurements and CSML. Get “metocean” concepts in future version of GML!

  37. Feature Types (ISO 19110) are the primary language component for a spatial community to define real world phenomena. Although ISO19136 provides a route for the encoding of Feature Types in XML, it does not provide a route-map for defining the features themselves. The level of granularity for Feature Types has to be based on the governance model. If no-one is prepared to put in place a process to define semantics, interoperability has reached its limits. … Marine: MOTIIVE, WMO: ???? Where next with feature types?

  38. Need clear distinction in mind between inter- and intra-operability When to use GML etc, and when not to! Need clear governance principles at a number of levels for Dataset definitions (data provider, e.g. BADC) Feature-type definitions (community, e.g. WMO) Interoperability (frameworks, e.g. OGC) CSML Provides some geometric features based on portrayal Can and will be extended Provides generalisation of specific examples, e.g. METARS WMO should: “stand-up” a reusable feature-type catalogue (but how? Work underway in MOTIIVE etc) Support CF development Work on dictionaries and thesauri and publish them. Exploit work in the academic community Summary Points

  39. Can order responses by Title, Data Centre or Temporal coverage (default random) Choose to return either data or “B-”Metadata Look at DIFs in either HTML or XML
