
Metadata Concepts / Use in Climate Research





Metadata Concepts / Use

in Climate Research

Stephan Kindermann, Martina Stockhause

German Climate Computing Center (DKRZ)

Hamburg, Germany


Overview

Metadata descriptions: sources, usage

 data level, preservation level, model level, domain knowledge level

Metadata standards, IT-principles


Metadata descriptions: sources, usage

(I) Data Description Level:

source: model run output

format: GRIB, NetCDF-3/4 container formats (including basic metadata)

metadata homogenization: "Climate and Forecast (CF) Convention" conformance, CMOR2 compliance, controlled vocabularies

usage: analysis tools, data access scripts, data search ("linked data" principle)

(II) Data Preservation Level:

target: legacy data centers (e.g. WDCC)

format: internal DB, various external formats, e.g. ISO 19139, DIF, ..

usage: long term data storage and access, citation e.g. using DOIs
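The controlled-vocabulary checks mentioned under the data description level can be sketched roughly as follows. This is a minimal illustration, not the CMOR2 or CF checker itself: the attribute names and the vocabulary subsets are assumptions chosen for the example, and a real check would read the global attributes from the NetCDF file.

```python
# Illustrative subset of controlled vocabularies (assumption: the real
# project vocabularies are longer and maintained centrally).
CONTROLLED_VOCAB = {
    "frequency": {"yr", "mon", "day", "6hr", "3hr", "fx"},
    "realm": {"atmos", "ocean", "land", "seaIce"},
}

# Hypothetical list of attributes every file is expected to carry.
REQUIRED_ATTRS = ("institute", "model", "experiment", "frequency", "realm")


def check_metadata(attrs):
    """Return a list of human-readable problems found in a file's attributes."""
    problems = []
    for name in REQUIRED_ATTRS:
        if not attrs.get(name):
            problems.append(f"missing attribute: {name}")
    for name, allowed in CONTROLLED_VOCAB.items():
        value = attrs.get(name)
        if value and value not in allowed:
            problems.append(f"{name}={value!r} not in controlled vocabulary")
    return problems


good = {"institute": "MPI-M", "model": "MPI-ESM-LR", "experiment": "historical",
        "frequency": "mon", "realm": "atmos"}
bad = {"institute": "MPI-M", "experiment": "historical",
       "frequency": "monthly", "realm": "atmos"}

print(check_metadata(good))
print(check_metadata(bad))
```

Homogenized attributes of this kind are what make uniform search and scripted access possible across modelling centres.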


Metadata descriptions: sources, usage

(III) Model Description Level:

source: researcher interviews, online questionnaire

format: CIM (Common Information Model, from the Metafor FP7 project: Common Metadata for Climate Modelling Digital Repositories); CONCIM: UML, APPCIM: XSD + vocabularies

usage: model intercomparison, scientific portals, information space browsing / search

(IV) Semantic Annotation Level:

source: data metadata, model metadata, domain knowledge metadata

format: OWL (RDF)

usage: user navigation in portals, „faceted search“ etc.

deployments: Earth System Grid CMIP5 portal, IS-ENES portal


.. Short Background Info ..

The Fifth Coupled Model Intercomparison Project (CMIP5)

– Sponsored by the WMO WGCM

– Quality Controlled Data to (eventually) appear in the IPCC Data Distribution Centre…

• Worldwide effort to build a data management infrastructure, with a consistent workflow from producers to consumers...

In Numbers:

~2 petabytes of CMIP5 requested output

~1 petabyte of CMIP5 "replicated" output

– Which will be replicated at BADC & DKRZ, to arrive in 2010/2011!

~10 TB of land-biochemistry (from the long term experiments alone).

Simulations:

~90,000 years

~60 experiments

~20 modelling centres using ~30 major(*) model configurations

~2 million output datasets

~10s of petabytes of output


B) Metadata standards, IT principles

(I) Data Description Level:

Metadata: file naming convention based on CVs, building uniform URIs (DRS, Data Reference Syntax):

Activity/Product/Institute/Model/Exp/frequ/realm/Variable/ensemble

Data: GRIB and NetCDF data containers, 10s of PBytes

Data servers, MD catalogue servers

Enabling "linked data":

wget http://server.org/Activity/Product/../ensemble
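The DRS idea above, a fixed sequence of controlled-vocabulary facets mapped onto a URI, can be sketched as follows. This is a minimal illustration under assumptions: the facet order is taken from the slide, while the function names, the base URL, and the example facet values are chosen only for the example.

```python
from urllib.parse import quote, unquote

# Facet order as listed on the slide.
DRS_FACETS = ("activity", "product", "institute", "model", "experiment",
              "frequency", "realm", "variable", "ensemble")


def drs_path(facets):
    """Join the facet values in DRS order into a relative path."""
    missing = [name for name in DRS_FACETS if name not in facets]
    if missing:
        raise ValueError("missing DRS facets: " + ", ".join(missing))
    return "/".join(quote(str(facets[name]), safe="") for name in DRS_FACETS)


def drs_url(base_url, facets):
    """Prefix the DRS path with a data server base URL."""
    return base_url.rstrip("/") + "/" + drs_path(facets)


def parse_drs_path(path):
    """Split a DRS path back into a facet dictionary."""
    parts = path.strip("/").split("/")
    if len(parts) != len(DRS_FACETS):
        raise ValueError(f"expected {len(DRS_FACETS)} components, got {len(parts)}")
    return dict(zip(DRS_FACETS, (unquote(p) for p in parts)))


# Illustrative facet values (assumptions, not taken from the presentation).
example = {
    "activity": "cmip5", "product": "output1", "institute": "MPI-M",
    "model": "MPI-ESM-LR", "experiment": "historical", "frequency": "mon",
    "realm": "atmos", "variable": "tas", "ensemble": "r1i1p1",
}

print(drs_url("http://server.org", example))
```

Because the path is derived purely from the facets, the same dictionary can drive both a search query and a direct download such as the `wget` call above.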


B) Metadata standards, IT principles

(II) Data Preservation Level:

WDCC Metadata Concept

OAI-PMH

ISO 19139

CERA GUI

IS-ENES Portal

  • Scalability

  • Sustainability

search API

Common

CV

CERA2 DB schema

OWL conceptual model

  • Flexibility

  • User friendly GUIs

QC, DOI assignment, ..

Tape Archive
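Harvesting catalogue metadata over OAI-PMH, as named on this slide, might look roughly like the sketch below. The `ListRecords` verb, the `metadataPrefix` and `resumptionToken` arguments, and the XML namespace are part of the OAI-PMH 2.0 protocol; the endpoint URL, the `iso19139` prefix value, and the sample response are assumptions for illustration, and no network request is made.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    In OAI-PMH the resumptionToken is an exclusive argument, so it
    replaces metadataPrefix on follow-up requests.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text.strip())
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token = root.find(".//oai:resumptionToken", OAI_NS)
    next_token = token.text if token is not None and token.text else None
    return ids, next_token


# Hand-written sample response (assumption), trimmed to the parts used here.
SAMPLE_RESPONSE = """
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:rec-1</identifier></header></record>
    <record><header><identifier>oai:example.org:rec-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>
"""

first_url = list_records_url("http://example.org/oai", metadata_prefix="iso19139")
ids, token = parse_list_records(SAMPLE_RESPONSE)
next_url = list_records_url("http://example.org/oai", resumption_token=token)
print(first_url, ids, next_url)
```

A harvester would repeat the request with each returned token until the repository sends none.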


B) Metadata standards, IT principles

(III) Model Description Level:

Metafor FP7 project: Common Information Model (CIM)

  • Formal metadata model of the climate modelling process

  • It includes descriptions of the experiments being undertaken, the simulations being run in support of these experiments, the software models and tools being used to implement the simulations, and the data generated by the software.

  • CMIP5 use case: CV collection, CMIP5 questionnaire


Metafor CIM overview

CONCIM (UML)

Automatic translation

ISO, Geography Markup Language (GML) series

APPCIM (XSD)

CMIP5 portal(s)

IS-ENES portal

Metafor catalogue

CIM instances (interlinked XML files)

Automatic XML → RDF translation

ESG OWL instances

IS-ENES portal (1)

CMIP5 gateway(s)

(1) Infrastructure for the European Network for Earth System Modelling
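The automatic translation of XML instances into RDF mentioned on this slide could be sketched as below. This is only an illustration under assumptions: the input is a hypothetical, heavily simplified component description (real CIM documents follow the APPCIM XSD), the mapping of elements to predicates is invented for the example, and the output is plain N-Triples rather than whatever the project's translator produced.

```python
import xml.etree.ElementTree as ET

ISENES = "http://www.enes.org/isenes#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DC_TITLE = "http://purl.org/dc/elements/1.1/title"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Hypothetical input document, not a real CIM instance.
SAMPLE_XML = """
<component id="echam">
  <title>ECHAM</title>
  <description>Global circulation model</description>
</component>
"""


def uri(value):
    """Format an IRI as an N-Triples term."""
    return f"<{value}>"


def literal(text):
    """Format a plain string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def component_to_ntriples(xml_text):
    """Translate one component element into a list of N-Triples lines."""
    elem = ET.fromstring(xml_text.strip())
    subject = uri(ISENES + elem.attrib["id"])
    lines = [f"{subject} {uri(RDF_TYPE)} {uri(ISENES + 'ComponentModel')} ."]
    # Assumed element-to-predicate mapping.
    for tag, predicate in (("title", DC_TITLE), ("description", RDFS_COMMENT)):
        text = elem.findtext(tag)
        if text:
            lines.append(f"{subject} {uri(predicate)} {literal(text.strip())} .")
    return lines


for line in component_to_ntriples(SAMPLE_XML):
    print(line)
```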


(CON)CIM Overview

Quality

Shared

ISO

Grids

Software (hierarchical model components, coupled together)

Data

Activity: simulations in support of experiments


B) Metadata standards, IT principles

(IV) Semantic Annotation Level:

[Diagram labels: Portal(s); ESG Gateways; RDF; CIM XML; OWL ontologies: http://ontologies.ucar.edu/owl; Data object XML; Triple Store; IS-ENES Portal; Content Management System; Community content; RDF Triple Store; Rel. DB; Evolving OWL model]


CMIP5 Quality Control

[Diagram labels: THREDDS Data Server; Metafor / CIM Questionnaire; MD on model+simulation; MD on data; MD Quality Checks L2; Data Quality Checks L2; QC DB; Metadata Repository; Files; Data Metadata; CIM Metadata; Data in prescribed DRS Syntax; Information MD; Quality MD; Data MD]


CMIP5 STD-DOI Publication

[Diagram labels: THREDDS Data Server; Metafor / CIM; MD on model+simulation+data+quality; MD on data; QC DB; Data Quality Checks L3 (double check, cross checks); TIB: DOI Registration Agency; Data; Data Node; Metadata; DOI Target Page (access to data and metadata); Filesystem; STD-DOI Catalogue; Quality MD; Data MD; Information MD; Longterm Archive; STD-DOI MD; Information MD; WDCC: DOI Publication Agent]


B) Metadata standards, IT principles

(IV) Semantic Annotation Level: example of filling the triple store




2010-07-07 16:49:13 INFO triplestorefill.utility Adding item

<ComponentModel at /test7/echam> with ID echam at

http://localhost:8080/test7/echam

2010-07-07 16:49:13 INFO triplestorefill.sesameconnector Storing RDF...

(1118 byte)

2010-07-07 16:49:13 INFO triplestorefill.sesameconnector RDF data:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

@prefix owl: <http://www.w3.org/2002/07/owl#> .

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

@prefix dc: <http://purl.org/dc/elements/1.1/> .

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

@prefix isenes: <http://www.enes.org/isenes#> .

isenes:echam rdf:type isenes:ComponentModel .

isenes:echam foaf:page <http://plone.dkrz.de/test7/echam> .

<http://plone.dkrz.de/test7/echam> foaf:topic isenes:echam .

isenes:echam dc:title "ECHAM" .

isenes:echam rdfs:label "ECHAM" .

isenes:echam rdfs:comment "Global circulation model" .

isenes:dkrz isenes:isResponsibleFor isenes:echam .

isenes:echam isenes:hasResponsible isenes:dkrz .

isenes:joachim-biercamp rdfs:label "Joachim Biercamp" .

isenes:joachim-biercamp rdf:type foaf:Person .

isenes:dkrz rdfs:label "DKRZ" .

isenes:dkrz rdf:type foaf:Organization .

isenes:joachim-biercamp isenes:isMemberOf isenes:dkrz .

isenes:dkrz isenes:hasMember isenes:joachim-biercamp .

isenes:dkrz dc:title "DKRZ" .

isenes:joachim-biercamp foaf:mbox "[email protected]"

„save“

Triple Store
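Once such statements are saved, the portal can follow the links between resources. The sketch below illustrates the idea with a tiny in-memory pattern matcher over a few of the triples from the log above. It is an assumption-laden stand-in: a real deployment would presumably send SPARQL queries to the Sesame store, and the function names here are invented for the example.

```python
# A few statements copied from the log above, kept as prefixed names.
TRIPLES = [
    ("isenes:echam", "rdf:type", "isenes:ComponentModel"),
    ("isenes:echam", "dc:title", "ECHAM"),
    ("isenes:dkrz", "isenes:isResponsibleFor", "isenes:echam"),
    ("isenes:dkrz", "rdf:type", "foaf:Organization"),
    ("isenes:joachim-biercamp", "isenes:isMemberOf", "isenes:dkrz"),
    ("isenes:joachim-biercamp", "rdf:type", "foaf:Person"),
]


def match(triples, s=None, p=None, o=None):
    """Return all triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def members_of_responsible_orgs(triples, model):
    """Two-hop query: people who belong to an organization responsible for `model`."""
    orgs = [s for s, _, _ in match(triples, p="isenes:isResponsibleFor", o=model)]
    return sorted(
        s for org in orgs
        for s, _, _ in match(triples, p="isenes:isMemberOf", o=org)
    )


print(match(TRIPLES, s="isenes:echam", p="dc:title"))
print(members_of_responsible_orgs(TRIPLES, "isenes:echam"))
```

Links of this kind are what a "related info" view can be built from.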


(B) From a user's perspective

Figure: Plone page with a "related info" portlet


(B) From a user's perspective

Figure: Plone page after clicking the "related" link: faceted search
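Faceted search of the kind shown in the figure boils down to two operations: counting how many results fall under each facet value, and narrowing the result list once the user picks values. A minimal sketch follows; the records and facet names are made up for illustration and do not come from the portal.

```python
from collections import Counter

# Made-up records for illustration only.
RECORDS = [
    {"id": "model-a", "type": "ComponentModel", "realm": "atmos"},
    {"id": "model-b", "type": "ComponentModel", "realm": "ocean"},
    {"id": "dataset-1", "type": "Dataset", "realm": "atmos"},
]


def facet_counts(records, facet):
    """Count how many records carry each value of the given facet."""
    return Counter(r[facet] for r in records if facet in r)


def select(records, **facets):
    """Keep only the records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in facets.items())]


print(facet_counts(RECORDS, "realm"))
print([r["id"] for r in select(RECORDS, realm="atmos", type="Dataset")])
```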


Summary

  • The international CMIP5 / IPCC effort is the key driver for the collection and standardization of CVs, metadata, and conceptual models (ontologies)

  • Metadata is mainly used for model intercomparison, uniform data search / access, and data processing

  • Prepare for climate impact community use cases!


Workshop reminder

..therefore we would like the presenters to focus on a few points allowing all of us to draw conclusions at the end:

- Usage and quality of descriptive keyword type of metadata used in your domain to manage data.

- Types of usages of this metadata (management, retrieval, research statistics, machine processing, etc).

- The standards used for your metadata descriptions (structure, elements, vocabularies).

- Adherence to common IT principles (explicit syntax, registered semantics, use of PIDs, etc).

- Compliance with the recommendations to be found in the report of the e-IRG task force on Data Management http://www.e-irg.eu/publications/e-irg-task-force-reports.html



Motivation

  • Producers: providers of models, tools, model results, HPC ecosystem, Grid .., community

  • Consumers: ENES community, impact community

[Diagram labels: Portal; E-infrastructure components; Governance (agreements, commitments, sociology, ..); Virtual Earth System Modeling Resource Centre; CMIP5/AR5/+ data services; Ticketing; AAI; Collaboration; Metadata (CIM, ..); Protocols; APIs]


IS-ENES vERC Portal

Requirement

E-Infra component

Technology used

(A) Community info presentation (models, tools, descriptions,..)

Content Management System (CMS, Collab. Tool)

Plone + IS-ENES „content-types“

Project Management / Ticketing Tool

Redmine

(B) Community development support

Zope/Plone plugin(s)

(C) Data portal to AR5 archives

Web Framework

(external) Metafor service(s)

(external) ESG-gateway

(D) CIM metadata

Web service (proxies)

Python-based info collector using Atom, OAI-PMH, .. protocols

(E) External content / metadata collection

Info (XML) harvester

„Cross-selling“

Semantic interlinking

(F) Additional value provisioning

RDF triple store

(Sesame)

