The NERC Metadata Gateway: a product of the NERC DataGrid

Bryan Lawrence

(on behalf of a big team)


BADC, BODC, CCLRC, PML and SOC


Outline

Introduction to NERC, the NERC Data Centres, and NCAS

The NERC DataGrid Project

Key Components:

Data Tools, Data Discovery, {Access Control}

NDG Information Environment

Key Standards Structures: the ISO Family

From CSML, {MOLES}, DIF to ISO19139 (NumSim)

Distributed Content Search

Why we did it this way

Our Discovery Architecture

NDG Discovery

Now … and

The Future – The “New NERC Metadata Gateway”

ISO19139 Best Practice

Summary



Some Introductions

NERC: The Natural Environment Research Council

The major player in UK environmental research

Is both a funding agency and a conglomeration of “centres”: internal “research” institutes,

The British Oceanographic Data Centre (BODC) is part of one of the internal institutes.

And external “collaborative” centres, which include:

The Plymouth Marine Laboratory

The National Oceanography Centre, Southampton

The National Centre for Atmospheric Science (NCAS), mostly embedded in universities, but part of which is

the British Atmospheric Data Centre (BADC), which is embedded in the

CCLRC: the Council for the Central Laboratory of the Research Councils

Is about to be replaced by a new entity, which might be called the “Large Facilities Research Council”

NERC has seven discipline-based designated data centres (including the BODC and BADC), and requires as much integration of data access as possible.

From discovery to utilisation, from genomics to ecology, from oceanography to atmospheric science, from Antarctic science to British geology …



NCAR

Complexity + Volume + Remote Access = Grid Challenge

British Atmospheric Data Centre

http://ndg.nerc.ac.uk

British Oceanographic Data Centre


If it’s not obvious

Lots of organisations

Varying membership; trust, both internally and between organisations, is not consistent.

Lots of priorities

Not all organisations are “about” data

Different internal storage structures

Data stored in a variety of databases and filesystems.

Some things well documented, but not automated

Some things automated, but information content is sparse …

Integrating data access is non-trivial

And none of that includes the important relationships with customers and collaborators!



Key Components

Discovery Tools

Discovery Portal

Metadata Search

Direct Links to Data and Services

Data Tools

Slice and Dice

Visualisation

Manipulation

Access Control

Systems are resource limited

Data access may be restricted by licence

Metadata Structures to support all the above



Standards Landscape

Or two:

ISO TC 211 standards, e.g.:

ISO 19101: Geographic information – Reference model

ISO 19103: Geographic information – Conceptual schema language

ISO 19107: Geographic information – Spatial schema

ISO 19108: Geographic information – Temporal schema

ISO 19109: Geographic information – Rules for application schema

ISO 19111: Geographic information – Spatial referencing by coordinates

ISO 19115: Geographic information – Metadata

Open Geospatial Consortium Specs

Geography Markup Language (GML), a toolkit for building data descriptions

WMS, WCS, WFS, WPS: the Web (Map, Coverage, Feature, and Processing) services.



Standards

ISO 19101: Geographic information – Reference model

A geospatial dataset…

…consists of features and related objects…

…in a defined logical structure…

…delivered through services…

…and described by metadata.



Data Description Standards

  • Geographic ‘features’

    • “abstraction of real world phenomena” [ISO 19101]

    • Type or instance

    • Encapsulate important semantics in universe of discourse

    • “Something you can name”

  • Application schema

    • Defines semantic content and logical structure

    • ISO standards provide toolkit:

      • spatial/temporal referencing

      • geometry (1-, 2-, 3-D)

      • topology

      • dictionaries (phenomena, units, etc.)

    • GML – canonical encoding

[from ISO 19109 “Geographic information – Rules for Application Schema”]


CSML: Climate Science Modelling Language

A fully featured GML application schema, with extensions for:

External binary data (GRIB, netCDF, etc.)

Irregular grids and “proper” vertical coordinate systems (both activities are now on OGC and ISO standards tracks)

V1.0 included seven feature types and provided only “data” modelling.

V1.0 CSML tooling includes a scanner (which creates CSML from netCDF files) and a parser (which instantiates Python objects, based on the XML CSML documents, that can be manipulated scientifically).
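The parser idea can be sketched with Python's standard `xml.etree.ElementTree`. This is not the CSML parser's API. The `PointFeature` class and the `parse_point_feature` function are hypothetical. The input is a closed-off version of the `NDGPointFeature` fragment shown on the round-tripping slides, and its single tuple of range values is a placeholder.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

GML_NS = "http://www.opengis.net/gml"

# NDGPointFeature fragment from the slides, closed off so it parses;
# the tupleList values are placeholders, not real ICES data.
SAMPLE = f"""<gml:featureMember xmlns:gml="{GML_NS}">
  <NDGPointFeature gml:id="ICES_100">
    <NDGPointDomain><domainReference>
      <NDGPosition srsName="urn:EPSG:geographicCRS:4979"
                   axisLabels="Lat Long" uomLabels="degree degree">
        <location>55.25 6.5</location>
      </NDGPosition>
    </domainReference></NDGPointDomain>
    <gml:rangeSet><gml:DataBlock>
      <gml:rangeParameters><gml:CompositeValue><gml:valueComponents>
        <gml:measure uom="#tn"/>
        <gml:measure uom="#amount"/>
        <gml:measure uom="#gsm"/>
      </gml:valueComponents></gml:CompositeValue></gml:rangeParameters>
      <gml:tupleList>1.0,2.0,3.0</gml:tupleList>
    </gml:DataBlock></gml:rangeSet>
  </NDGPointFeature>
</gml:featureMember>"""


@dataclass
class PointFeature:
    """Hypothetical in-memory object for one point feature."""
    feature_id: str
    srs_name: str
    lat: float
    lon: float
    uoms: list    # one unit-of-measure reference per range component
    values: list  # one list of floats per tuple in gml:tupleList


def parse_point_feature(xml_text: str) -> PointFeature:
    ns = {"gml": GML_NS}
    root = ET.fromstring(xml_text)
    feat = root.find("NDGPointFeature")
    pos = feat.find("NDGPointDomain/domainReference/NDGPosition")
    # Use axisLabels to decide which coordinate is latitude.
    coords = dict(zip(pos.get("axisLabels").split(),
                      (float(v) for v in pos.findtext("location").split())))
    uoms = [m.get("uom") for m in feat.iterfind(".//gml:measure", ns)]
    # GML tupleList: tuples separated by whitespace, components by commas.
    tuples = feat.findtext("gml:rangeSet/gml:DataBlock/gml:tupleList",
                           namespaces=ns)
    values = [[float(c) for c in t.split(",")] for t in tuples.split()]
    return PointFeature(
        feature_id=feat.get(f"{{{GML_NS}}}id"),
        srs_name=pos.get("srsName"),
        lat=coords["Lat"],
        lon=coords["Long"],
        uoms=uoms,
        values=values,
    )
```

Once parsed, the feature is an ordinary Python object. For example, `parse_point_feature(SAMPLE).lat` is `55.25`.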



MarineXML Testbed

Data from different parts of the marine community conform to a variety of schemas (XSD).

For each XSD (for the source data) there is an XSLT to translate the data to the Feature Types (FTs) defined by CSML. The FTs and XSLTs are maintained in a ‘MarineXML registry’.

Features in the source XSD must be present in the data dictionary.

Phenomena in the XSD must have an associated portrayal.

The result of the translation is an encoding that contains the marine data in weakly typed (i.e. generic) features.

Features described using the S-57 v3.1 application schema can be imported and are equivalent to the same features in CSML.

The FTs can then be translated to equivalent FTs for display in the ECDIS system.

ECDIS acts as an example client for the data.

[Diagram components: source schemas and instances (XSD/XML) for Biological Species, Chl-a from Satellite, MeasuredHydrodynamics, ModelledHydrodynamics and S-57v3 GML; XSLT transforms; an XML parser; the Data Dictionary; the S52 Portrayal Library; MarineGML (NDG) Feature Types; SENC output displayed in SeeMyDENC.]

Slide adapted from Kieran Millard (AUKEGGS, 2005)


The Concept of Re-using Features

All this requires agreement on standards


Here structured XML is converted to plain ASCII text in the form required for a numerical model

HTML warning service pages are generated ‘on the fly’

Here the same XML is converted to the SENC format used in a proprietary tool for viewing electronic navigation charts.

XML can also be converted to SVG to display data graphically

Slide adapted from Kieran Millard (AUKEGGS, 2005)
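A minimal sketch of feature re-use takes one structured record and renders it three ways, using only the Python standard library. The `<Warning>` record, its element names and its values are invented for illustration. The proprietary SENC output is not attempted.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical warning record; element names and values are invented.
RECORD = """<Warning>
  <area>North Sea</area>
  <lat>55.25</lat>
  <lon>6.5</lon>
  <waveHeight uom="m">4.2</waveHeight>
</Warning>"""


def load(xml_text):
    """Parse the record once into plain Python values."""
    root = ET.fromstring(xml_text)
    wave = root.find("waveHeight")
    return {
        "area": root.findtext("area"),
        "lat": float(root.findtext("lat")),
        "lon": float(root.findtext("lon")),
        "wave_height": float(wave.text),
        "uom": wave.get("uom"),
    }


def to_model_ascii(rec):
    """Fixed-width plain text, the kind of input a numerical model reads."""
    return f"{rec['lat']:8.3f}{rec['lon']:9.3f}{rec['wave_height']:7.2f}"


def to_html(rec):
    """A warning page fragment generated on the fly."""
    return (f"<p>Warning for {escape(rec['area'])}: waves of "
            f"{rec['wave_height']} {escape(rec['uom'])}</p>")


def to_svg(rec, scale=20):
    """A bar whose height is proportional to the wave height."""
    height = rec["wave_height"] * scale
    return ('<svg xmlns="http://www.w3.org/2000/svg" width="40" '
            f'height="{height:.0f}"><rect x="10" y="0" width="20" '
            f'height="{height:.0f}"/></svg>')
```

The record is parsed once. Each output format is then a small, separate function over the same values, so adding a new target does not touch the others.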


CSML Round Tripping - 1

Managing semantics

[Diagram: conceptual model → UGAS → GML application schema; new dataset (binary, “101010”) → GML dataset (XML) that conforms to the application schema; parser (V1.0, Python, complete) → application instance.]

Example GML dataset fragment:

    <gml:featureMember>
      <NDGPointFeature gml:id="ICES_100">
        <NDGPointDomain>
          <domainReference>
            <NDGPosition srsName="urn:EPSG:geographicCRS:4979" axisLabels="Lat Long" uomLabels="degree degree">
              <location>55.25 6.5</location>
            </NDGPosition>
          </domainReference>
        </NDGPointDomain>
        <gml:rangeSet>
          <gml:DataBlock>
            <gml:rangeParameters>
              <gml:CompositeValue>
                <gml:valueComponents>
                  <gml:measure uom="#tn"/>
                  <gml:measure uom="#amount"/>
                  <gml:measure uom="#gsm"/>
                </gml:valueComponents>
              </gml:CompositeValue>
            </gml:rangeParameters>
            <gml:tupleList>…</gml:tupleList>
          </gml:DataBlock>
        </gml:rangeSet>
      </NDGPointFeature>
    </gml:featureMember>


CSML Round Tripping - 2

Managing data - 1

[Diagram: CF dataset (netCDF, “101010”) → scanner (V1.0; V2 in development) → GML dataset (XML, the same NDGPointFeature fragment as on the CSML Round Tripping - 1 slide) conforming to the GML application schema → parser (V1.0; V2 in development) → application instance.]
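A scanner of this kind can be sketched as below. It assumes the netCDF header has already been read into a dictionary; a real scanner would use a netCDF library for this step. Several parts are hypothetical simplifications rather than the CSML schema itself:

- the output element names (`Dataset`, `parameter`, `uom`, `fileReference`);
- the feature-type names, which follow the style of CSML feature types;
- the rule for choosing a feature type.

As in CSML, the features point at the data in the file instead of copying values into XML.

```python
import xml.etree.ElementTree as ET

# Hypothetical CF-style header: variable name -> dimensions and attributes.
HEADER = {
    "tas": {"dimensions": ("time", "lat", "lon"),
            "standard_name": "air_temperature", "units": "K"},
    "orog": {"dimensions": ("lat", "lon"),
             "standard_name": "surface_altitude", "units": "m"},
    "time_bnds": {"dimensions": ("time", "nv"),
                  "standard_name": None, "units": None},
}


def feature_type(dims):
    """Simplified rule: pick a feature type from the variable's dimensions."""
    d = set(dims)
    if {"time", "lat", "lon"} <= d:
        return "GridSeriesFeature"
    if {"lat", "lon"} <= d:
        return "GridFeature"
    return None  # other shapes are not handled in this sketch


def scan(header, source="example.nc"):
    """Emit one feature per recognised variable, referencing the file."""
    root = ET.Element("Dataset")
    for name, var in header.items():
        ftype = feature_type(var["dimensions"])
        if ftype is None:
            continue
        feat = ET.SubElement(root, ftype, id=name)
        ET.SubElement(feat, "parameter").text = var["standard_name"]
        ET.SubElement(feat, "uom").text = var["units"]
        # Point at the binary data rather than copying it into XML.
        ET.SubElement(feat, "fileReference").text = f"{source}#{name}"
    return ET.tostring(root, encoding="unicode")
```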


CSML2: Structure “Affords” Behaviour

Moving beyond GML, but staying in the ISO Frame!

[Diagram: the ISO 19123 coverage class, with ‘affordance’ modelled with the UML <<type>> stereotype.]
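The idea that a coverage's structure “affords” behaviour can be sketched in Python with an abstract interface and a concrete type. The class names are hypothetical. `evaluate` is loosely modelled on the evaluate operation of the ISO 19123 coverage class. Nearest-neighbour lookup is just one possible choice of interpolation.

```python
from abc import ABC, abstractmethod
from bisect import bisect_left, bisect_right


class Coverage(ABC):
    """Minimal coverage interface; `evaluate` is loosely modelled on the
    evaluate operation of the ISO 19123 coverage class."""

    @abstractmethod
    def evaluate(self, position):
        """Return the range value at a position in the domain."""


class ProfileCoverage(Coverage):
    """A vertical profile. Its structure (values ordered by height)
    affords both point evaluation and contiguous subsetting."""

    def __init__(self, heights, values):
        if len(heights) != len(values):
            raise ValueError("heights and values must have the same length")
        pairs = sorted(zip(heights, values))
        self.heights = [h for h, _ in pairs]
        self.values = [v for _, v in pairs]

    def evaluate(self, position):
        # Nearest neighbour along the ordered height axis.
        i = bisect_left(self.heights, position)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self.heights)]
        best = min(candidates, key=lambda j: abs(self.heights[j] - position))
        return self.values[best]

    def subset(self, low, high):
        # The ordering makes a height range a simple slice.
        i = bisect_left(self.heights, low)
        j = bisect_right(self.heights, high)
        return ProfileCoverage(self.heights[i:j], self.values[i:j])
```

The interface fixes what every coverage must offer. The concrete type adds the operations its structure makes cheap, such as `subset` here.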


CSML2: Related to new OGC Observations and Measurements Spec

An Observation is an Event whose result is an estimate of the value of some Property of the Feature-of-interest, obtained using a specified Procedure
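The definition above maps naturally onto a small record type. This is a minimal Python sketch. Its field names paraphrase the definition (feature of interest, observed property, procedure, result) rather than reproduce the O&M schema. The helper `time_series` is hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Observation:
    """An event whose result estimates the value of some property of the
    feature of interest, obtained using a specified procedure."""
    feature_of_interest: str  # what was observed, e.g. a station identifier
    observed_property: str    # which property, e.g. a phenomenon name
    procedure: str            # how, e.g. an instrument or model identifier
    time: datetime            # when the event took place
    result: float             # the estimate of the property's value
    uom: str                  # unit of measure of the result


def time_series(observations, feature, prop):
    """Results for one feature/property pair, ordered by time."""
    picked = [o for o in observations
              if o.feature_of_interest == feature
              and o.observed_property == prop]
    return [(o.time, o.result) for o in sorted(picked, key=lambda o: o.time)]
```

Collecting observations of one property at one feature in this way is one route from individual observation events to a coverage-like time series.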


Managing Data 2

[Diagram: CF datasets (netCDF, “101010”); DECISION: define dataset; PROCESSES: scanner → GML dataset (XML, the same NDGPointFeature fragment as on the CSML Round Tripping - 1 slide) → add information → XSLT → ISO19115; PUBLISH.]
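In the workflow above, XSLT produces the ISO19115 record. The sketch below uses Python's ElementTree instead, to show the shape of that output. It writes only a title and a geographic bounding box in the ISO19139 encoding (the `gmd`/`gco` namespaces). It omits mandatory elements such as the contact, date stamp and abstract, so the result is not a valid ISO19139 document. The function name and its arguments are hypothetical.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def _child(parent, ns, tag):
    """Append a namespaced child element and return it."""
    return ET.SubElement(parent, f"{{{ns}}}{tag}")


def _path(parent, *steps):
    """Append a chain of gmd elements, returning the innermost one."""
    for tag in steps:
        parent = _child(parent, GMD, tag)
    return parent


def discovery_record(title, west, east, south, north):
    """Skeleton ISO19139 record: title plus bounding box only."""
    md = ET.Element(f"{{{GMD}}}MD_Metadata")
    ident = _path(md, "identificationInfo", "MD_DataIdentification")
    title_el = _path(ident, "citation", "CI_Citation", "title")
    _child(title_el, GCO, "CharacterString").text = title
    bbox = _path(ident, "extent", "EX_Extent", "geographicElement",
                 "EX_GeographicBoundingBox")
    for tag, value in (("westBoundLongitude", west),
                       ("eastBoundLongitude", east),
                       ("southBoundLatitude", south),
                       ("northBoundLatitude", north)):
        _child(_child(bbox, GMD, tag), GCO, "Decimal").text = str(value)
    return ET.tostring(md, encoding="unicode")
```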


The Most Important Decision

What is a dataset?


Granularity too coarse: can’t find what you want – not enough information exposed.

Granularity too fine: can’t find what you want – buried in unordered results.


Distributed Query

Options:

Harvest or Crawl

Distribute the query to known targets, versus harvest from known targets and query locally

Timeliness versus Responsiveness

Decision:

NDG Discovery based on the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH)

Additional Partners include NCAR, MPI-WDCC, TPAC, UK-MDIP
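An OAI-PMH harvester repeatedly issues `ListRecords` requests and follows the `resumptionToken` until the list is complete. The harvested records are then queried locally. Below is a minimal sketch using the standard library, with a few assumptions:

- The `fetch` argument is injected so the code can be tried offline. In practice it might be `lambda url: urllib.request.urlopen(url).read()`.
- The base URL and the `dif` metadata prefix in the example are assumptions. Each repository advertises its own prefixes via `ListMetadataFormats`.
- OAI-PMH error responses (e.g. `noRecordsMatch`) are not handled.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, token=None, from_date=None):
    """Build a ListRecords request; resumptionToken is an exclusive argument."""
    if token is not None:
        params = {"verb": "ListRecords", "resumptionToken": token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if from_date is not None:
            params["from"] = from_date  # incremental harvest since this date
    return f"{base_url}?{urlencode(params)}"


def harvest(base_url, metadata_prefix, fetch, from_date=None):
    """Yield (identifier, metadata element) pairs across all result pages.

    `fetch` takes a URL and returns the response body (str or bytes).
    Deleted records carry no metadata, so their element is None.
    """
    url = list_records_url(base_url, metadata_prefix, from_date=from_date)
    while True:
        root = ET.fromstring(fetch(url))
        for rec in root.iterfind("oai:ListRecords/oai:record", NS):
            ident = rec.findtext("oai:header/oai:identifier", namespaces=NS)
            yield ident, rec.find("oai:metadata", NS)
        token = root.findtext("oai:ListRecords/oai:resumptionToken",
                              namespaces=NS)
        if not token:  # missing or empty token: the list is complete
            return
        url = list_records_url(base_url, token=token)
```

Harvesting trades timeliness for responsiveness. The local copy is only as fresh as the last harvest (the `from` argument makes repeat harvests incremental), but queries no longer wait on remote targets.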



Discovery Metadata Usage

XML metadata store: can support a limited variety of XML schemas, provided the WS interface understands them (a unique XQuery is needed for each method/schema pair)
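The “one query per (method, schema) pair” constraint can be illustrated with a small dispatch table. In this sketch, ElementTree path expressions stand in for the XQueries run against the XML database. Only a single `title` method is shown. The schema detection rule and the simplified sample records are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"


def schema_of(root):
    """Guess a record's schema from its root element (simplified)."""
    if root.tag == f"{{{GMD}}}MD_Metadata":
        return "iso19139"
    if root.tag == "DIF" or root.tag.endswith("}DIF"):
        return "dif"
    if any(child.tag.startswith(f"{{{DC}}}") for child in root):
        return "dc"
    raise ValueError(f"unrecognised schema: {root.tag}")


# One query per (method, schema) pair: adding a schema means adding a row
# for every method, and adding a method means a row for every schema.
QUERIES = {
    # '{*}' matches any (or no) namespace (Python 3.8+).
    ("title", "dif"): lambda r: r.findtext("{*}Entry_Title"),
    ("title", "dc"): lambda r: r.findtext(f"{{{DC}}}title"),
    ("title", "iso19139"): lambda r: r.findtext(
        f".//{{{GMD}}}CI_Citation/{{{GMD}}}title/{{{GCO}}}CharacterString"),
}


def run(method, xml_text):
    """Detect the record's schema, then apply the matching query."""
    root = ET.fromstring(xml_text)
    key = (method, schema_of(root))
    if key not in QUERIES:
        raise KeyError(f"no query for {key}")
    return QUERIES[key](root)
```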


Metadata Formats

Currently Supporting

NASA Global Change Master Directory: Directory Interchange Format (DIF)

Experimenting with:

Vanilla ISO19139

Dublin Core

UK Gemini V1 format

Will support following ISO profiles for harvest:

(eventually) UK Gemini profile

WMO profile

IOC profile

(whenever) US FGDC profile

ALL SIMULTANEOUSLY: XML database plus appropriate XQueries




NumSim Example






Scrolling Down

Within Record


New Interfaces

Simple

Advanced

  • Issues:

    • Times (forecast, paleo etc)

    • BBOX (near poles and dateline)

    • Semantic Vocabulary matching (exploiting a new NDG web-service providing thesaurus content, and ontology mapping)

(No CSS as yet)
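The BBOX issue arises because a naive overlap test assumes west < east and treats longitude as a straight line. One way to handle the dateline and the poles is sketched below; it is not the NDG implementation. Boxes are assumed to be `(west, south, east, north)` in degrees. A box with `west > east` is taken to cross the 180° meridian.

```python
def lon_intervals(west, east):
    """Split a longitude range into non-wrapping intervals."""
    if west <= east:
        return [(west, east)]
    return [(west, 180.0), (-180.0, east)]  # crosses the antimeridian


def bbox_intersects(a, b):
    """True if boxes a and b, each (west, south, east, north), overlap."""
    aw, as_, ae, an = a
    bw, bs, be, bn = b
    if an < bs or bn < as_:
        return False  # disjoint in latitude
    if (an >= 90 and bn >= 90) or (as_ <= -90 and bs <= -90):
        return True  # both contain the same pole, where all meridians meet
    return any(x0 <= y1 and y0 <= x1
               for x0, x1 in lon_intervals(aw, ae)
               for y0, y1 in lon_intervals(bw, be))
```

For a box from 170° to -170° and a query from 175° to 178°, the naive test `aw <= be and bw <= ae` reports no overlap. Splitting the box at the dateline gives the correct answer.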



ISO19139

Background:

Designed to exploit as much as possible of the XML Schema machinery

Not designed for humans!

Advice:

Use in conjunction with a clear concept of why it’s being used:

Decide on dataset granularity, and use other metadata schemas to describe how to use the content (“A” metadata; e.g. a GML application schema).

Devise a useful profile, then: restrict, restrict, restrict. Document. Register.



On Restriction

  • ISO19139 is also about INTEROPERABILITY!

  • Don’t follow the ISO19139 advice and produce a new schema!

  • Ensure that your profile instances are valid vanilla ISO19139

  • Restrict content out-of-band, e.g. with Schematron.

  • Agree on how you’re going to deploy ISO19139
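Restricting “out-of-band” means the record stays valid vanilla ISO19139, and the profile's extra rules are checked separately. Schematron assertions are typically used for that kind of check. As a stand-in, the sketch below expresses a few profile rules as path checks in Python. The rules themselves are hypothetical examples of what a community profile might require. Validation against the ISO19139 schema is assumed to happen elsewhere.

```python
import xml.etree.ElementTree as ET

NS = {"gmd": "http://www.isotc211.org/2005/gmd",
      "gco": "http://www.isotc211.org/2005/gco"}

# Hypothetical profile: (rule description, path that must be present).
PROFILE_RULES = [
    ("title is present",
     ".//gmd:citation/gmd:CI_Citation/gmd:title/gco:CharacterString"),
    ("abstract is present", ".//gmd:abstract/gco:CharacterString"),
    ("geographic bounding box is present", ".//gmd:EX_GeographicBoundingBox"),
]


def check_profile(xml_text, rules=PROFILE_RULES):
    """Return descriptions of the rules this record fails.

    A matched element with neither text nor children counts as missing.
    """
    root = ET.fromstring(xml_text)
    failures = []
    for description, path in rules:
        node = root.find(path, NS)
        empty = (node is not None
                 and not (node.text or "").strip()
                 and len(node) == 0)
        if node is None or empty:
            failures.append(description)
    return failures
```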


On Extension

  • ISO19139 is also about INTEROPERABILITY!

  • Do follow the ISO19139 advice and produce a new schema!

  • Do what you need for your community, but:

  • Design so that code expecting ISO19139 instances can parse yours!

  • Make it easy for third party code to ignore your content!
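The consumer side of this advice can be sketched in Python. Code that knows only ISO19139 element names still finds what it needs when extension content is added in its own namespace. It fails when the extension renames an ISO element. The `ext` namespace and its elements are hypothetical. The sketch does not model ISO19139's formal extension mechanism or schema validation.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
NS = {"gmd": GMD, "gco": GCO}
TITLE = (".//gmd:MD_DataIdentification/gmd:citation/gmd:CI_Citation"
         "/gmd:title/gco:CharacterString")


def iso_title(xml_text):
    """A consumer that knows only ISO19139 element names."""
    return ET.fromstring(xml_text).findtext(TITLE, namespaces=NS)


# (a) Extension content sits in its own namespace next to ISO elements.
ADDITIVE = f"""<gmd:MD_Metadata xmlns:gmd="{GMD}" xmlns:gco="{GCO}"
    xmlns:ext="urn:example:extension">
  <gmd:identificationInfo><gmd:MD_DataIdentification>
    <gmd:citation><gmd:CI_Citation><gmd:title>
      <gco:CharacterString>Example dataset</gco:CharacterString>
    </gmd:title></gmd:CI_Citation></gmd:citation>
    <ext:modelRun>illustrative extension content</ext:modelRun>
  </gmd:MD_DataIdentification></gmd:identificationInfo>
</gmd:MD_Metadata>"""

# (b) The extension replaces an ISO element with a community-specific one.
REPLACING = ADDITIVE.replace("gmd:MD_DataIdentification",
                             "ext:SimulationIdentification")
```

Under design (a), `iso_title` simply never looks at `ext:modelRun`. Under design (b), the same consumer finds no title at all.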


Summary

  • NDG dealing with heterogeneous environment

  • Successful deployment of OAI with discovery metadata

    • (There are some issues differentiating between model simulations and ordering response sets)

  • Directly linking to and exploiting GML application schema

  • Web Service backends make deployment easier.

  • Communities need to be very careful how they deploy ISO19139

