Data integration and Linked Data

Tatiana Tarasova

University of Amsterdam

03/09/12

Outline

1. Task 4.2 objectives

2. Use Case of the ENVRI data integration

3. Linked Data

4. Linked Data for ENVRI

5. Benefits of Linked Data

6. UvA needs

Task 4.2 objectives

Harmonise, integrate and publish data from the ENVRI Research Infrastructures (RIs) to facilitate multidisciplinary scientific research.

Use Case

Study the correlation between the concentration of CO2 in the air and the ocean temperature during the Iceland Volcano eruption in 2010.


[Figure: use-case diagram; surviving labels: good quality, level 2, Authorized IP access, FTP catalogues]


ENVRI data integration: requirements

Find a solution that addresses both structural and semantic data heterogeneity and that:

  • is universal for all the RIs;

  • re-uses the existing technological solutions of the RIs;

  • re-uses existing data resources such as code lists, thesauri and ontologies;

  • ensures traceability of data provenance.

Linked Data: URIs, HTTP, RDF [15]




Mace Head is an observatory.

Subject – predicate – object.

prefix icos: <http:/>

prefix rdf: <>

prefix rdfs: <>

prefix vsto: <>
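A minimal Python sketch of the triple model on this slide: each statement is a (subject, predicate, object) tuple of full IRIs built from prefixed names. The rdf: IRI is the standard W3C namespace; the icos: and vsto: IRIs, and the names icos:MaceHead and vsto:Observatory, are hypothetical placeholders, since the slide's namespace IRIs are not recoverable.

```python
# Minimal sketch: RDF statements as (subject, predicate, object) tuples.
# rdf: is the standard W3C namespace; the icos: and vsto: IRIs are
# hypothetical placeholders, as are icos:MaceHead and vsto:Observatory.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "icos": "http://example.org/icos#",  # hypothetical
    "vsto": "http://example.org/vsto#",  # hypothetical
}


def expand(qname: str) -> str:
    """Expand a prefixed name such as 'rdf:type' into a full IRI."""
    prefix, local = qname.split(":", 1)
    return PREFIXES[prefix] + local


def triple(s: str, p: str, o: str) -> tuple:
    """Build one statement with all three terms expanded to IRIs."""
    return (expand(s), expand(p), expand(o))


def to_ntriples(t: tuple) -> str:
    """Serialise a statement whose three terms are IRIs as one N-Triples line."""
    return " ".join(f"<{term}>" for term in t) + " ."


# "Mace Head is an observatory."
mace_head = triple("icos:MaceHead", "rdf:type", "vsto:Observatory")
print(to_ntriples(mace_head))
```

The printed line is in N-Triples form, the simplest line-based RDF serialisation, which makes the three parts of the statement explicit.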

Linked Data publishing workflow





Analyze ENVRI data: structure


(parameter, unit of measure, instrument, provider, ...)

(time, lat/long, elevation)

Analyze ENVRI data: data

DATASET “CO2 concentration measured by Mace Head”


Parameter: CO2

Unit of measure: ppm

Observatory: Mace Head

Provider: ICOS

Observed value: 392.049

Time: 2010.01.01

Lat/Long: 53.3261/-9.9036

Elevation: 25 m
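The structure analysis separates fields that describe the whole dataset from fields that change with every measurement. A minimal sketch of that split for the Mace Head record, assuming the record arrives as a flat Python dict; the field names are illustrative and are not an ENVRI exchange format.

```python
# Minimal sketch: split one flat record into dataset-level metadata and a
# per-observation part (dimensions plus the observed value).
# Field names are illustrative, not an ENVRI exchange format.
DATASET_FIELDS = {"parameter", "unit", "observatory", "provider"}
DIMENSION_FIELDS = {"time", "lat", "long", "elevation"}
MEASURE_FIELD = "value"


def split_record(record: dict) -> tuple:
    """Return (dataset_metadata, observation) for one flat record."""
    metadata = {k: v for k, v in record.items() if k in DATASET_FIELDS}
    observation = {k: v for k, v in record.items() if k in DIMENSION_FIELDS}
    observation[MEASURE_FIELD] = record[MEASURE_FIELD]
    return metadata, observation


# The Mace Head record (elevation in metres).
record = {
    "parameter": "CO2", "unit": "ppm", "observatory": "Mace Head",
    "provider": "ICOS", "value": 392.049, "time": "2010-01-01",
    "lat": 53.3261, "long": -9.9036, "elevation": 25,
}
metadata, observation = split_record(record)
print(metadata)
print(observation)
```

The metadata part is stated once per dataset and only the observation part repeats per measurement, which matches how the dataset and observation slides distribute the properties.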

Model ENVRI data: the Data Cube vocabulary

The Data Cube vocabulary [1] provides a generic framework for encoding collections of observations. Its core classes are qb:DataSet, qb:Observation and qb:DataStructureDefinition. A structure definition lists the components of a dataset; each component refers to a property typed as qb:DimensionProperty, qb:AttributeProperty or qb:MeasureProperty.

The vocabulary was developed for the statistical domain and is based on the SDMX standard [2].

Examples: UK public spending data from COINS [3,16] and the UK Environment Agency's water sampling and monitoring data [4].
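In Data Cube terms each observation is one cell of a multi-dimensional table: its dimension values identify the cell and its measure is the cell's content, so no two observations in one dataset should share all their dimension values. A minimal sketch of that view in plain Python, assuming observations are dicts; the dimension names mirror the ENVRI example and the values come from the Mace Head observation.

```python
# Minimal sketch of the "cube" view: observations indexed by their
# dimension values. Plain dicts stand in for qb:Observation resources.
DIMENSIONS = ("time", "lat", "long", "elevation")


def build_cube(observations, dimensions, measure):
    """Map each tuple of dimension values to the observation's measure value.

    Raises ValueError if two observations share the same dimension values,
    because one cell of the cube can only hold one observed value.
    """
    cube = {}
    for obs in observations:
        key = tuple(obs[d] for d in dimensions)
        if key in cube:
            raise ValueError(f"duplicate cell: {key}")
        cube[key] = obs[measure]
    return cube


obs0 = {"time": "2010-01-01T00:00:00Z", "lat": 53.3261, "long": -9.9036,
        "elevation": 25, "value": 391.318}
cube = build_cube([obs0], DIMENSIONS, "value")
print(cube[("2010-01-01T00:00:00Z", 53.3261, -9.9036, 25)])  # 391.318
```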

Publish ENVRI data: structure

@prefix qb: <> .
@prefix icos: <> .
@prefix vsto: <> .
@prefix time: <> .
@prefix geo: <> .

_:structure rdf:type qb:DataStructureDefinition ;
    qb:component _:cs1 , _:cs2 , _:cs3 , _:cs4 , _:cs5 , _:cs6 , _:cs7 .

_:cs1 qb:measure vsto:hasContainedParameter .
_:cs2 qb:attribute muo:measuredIn .
_:cs3 qb:attribute vsto:isFromInstrument .
_:cs4 qb:dimension geo:lat .
_:cs5 qb:dimension geo:long .
_:cs6 qb:dimension time:inXSDDateTime .
_:cs7 qb:dimension sweet_spaceExtent:hasHeight .
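The structure definition is what lets a consumer check observations before publishing them. A minimal sketch of one such check, assuming the components are held in a plain Python dict keyed by the slide's blank-node labels: every observation must supply a value for every dimension. Only dimension and attribute components are modelled; the measure component _:cs1 is left out.

```python
# Minimal sketch: check an observation against the dimensions declared in
# the structure definition. Only dimension and attribute components are
# modelled here; the measure component _:cs1 is left out.
COMPONENTS = {
    "_:cs2": ("attribute", "muo:measuredIn"),
    "_:cs3": ("attribute", "vsto:isFromInstrument"),
    "_:cs4": ("dimension", "geo:lat"),
    "_:cs5": ("dimension", "geo:long"),
    "_:cs6": ("dimension", "time:inXSDDateTime"),
    "_:cs7": ("dimension", "sweet_spaceExtent:hasHeight"),
}


def missing_dimensions(observation: dict) -> list:
    """Return the dimension properties for which the observation has no value.

    Attributes are not required per observation because the dataset slide
    states muo:measuredIn and vsto:isFromInstrument once on the qb:DataSet.
    """
    return [prop for kind, prop in COMPONENTS.values()
            if kind == "dimension" and prop not in observation]


observation0 = {
    "geo:lat": "53.3261",
    "geo:long": "-9.9036",
    "time:inXSDDateTime": "2010-01-01T00:00:00Z",
    "sweet_spaceExtent:hasHeight": "25",
    "muo:numericalValue": "391.318",
}
print(missing_dimensions(observation0))  # []
incomplete = {k: v for k, v in observation0.items() if k != "geo:long"}
print(missing_dimensions(incomplete))  # ['geo:long']
```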

Publish ENVRI data: dataset

icos:co2-mhd rdf:type qb:DataSet ;
    rdfs:comment "Dataset with CO2 concentration measured by the Mace Head observatory" ;
    qb:structure _:structure ;
    vsto:hasContainedParameter sweet_chemCompound:CO2 ;
    muo:measuredIn muo:ppm ;
    vsto:isFromInstrument icos:flask ;
    dcterms:publisher <> ;
    dcterms:source … ;
    dcterms:contributor ...

Publish ENVRI data: observations

icos:observation0 rdf:type qb:Observation ;
    rdfs:label "observation" ;
    rdfs:comment "Observation of the dataset 'CO2 concentration measured by Mace Head'" ;
    qb:dataSet icos:co2-mhd ;
    geo:lat "53.3261" ;
    geo:long "-9.9036" ;
    time:inXSDDateTime "2010-01-01T00:00:00Z"^^xsd:dateTime ;
    sweet_spaceExtent:hasHeight "25" ;
    muo:numericalValue "391.318" .
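In practice observations are generated rather than written by hand. A minimal sketch that renders one observation in the same Turtle shape as the slide, assuming the input is a dict with hypothetical keys lat, long, time, height and value; the time is emitted as a typed xsd:dateTime literal so that date comparisons in SPARQL can apply to it.

```python
# Minimal sketch: render one observation in the Turtle shape of the slide.
# The input keys (lat, long, time, height, value) are hypothetical, and the
# values are assumed not to contain double quotes.
def observation_turtle(obs_id: str, dataset: str, rec: dict) -> str:
    """Return a qb:Observation in Turtle; time becomes a typed xsd:dateTime literal."""
    lines = [
        f"{obs_id} rdf:type qb:Observation ;",
        f"    qb:dataSet {dataset} ;",
        f'    geo:lat "{rec["lat"]}" ;',
        f'    geo:long "{rec["long"]}" ;',
        f'    time:inXSDDateTime "{rec["time"]}"^^xsd:dateTime ;',
        f'    sweet_spaceExtent:hasHeight "{rec["height"]}" ;',
        f'    muo:numericalValue "{rec["value"]}" .',
    ]
    return "\n".join(lines)


rec = {"lat": "53.3261", "long": "-9.9036", "time": "2010-01-01T00:00:00Z",
       "height": "25", "value": "391.318"}
ttl = observation_turtle("icos:observation0", "icos:co2-mhd", rec)
print(ttl)
```

Prefix declarations are omitted, as on the slide; a real export would prepend them once per file.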

Use ENVRI data: get all parameters

## Cross-dataset query based on the common ontology.

SELECT ?dataset ?parameter ?parameterName WHERE {
    ?dataset vsto:hasContainedParameter ?parameter .
    ?parameter rdfs:label ?parameterName .
}

## The query returns all the parameters, independently of the dataset in which they were defined.
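The query works across datasets because both triple patterns are matched against one graph and joined on the shared variable ?parameter. A minimal sketch of that basic-graph-pattern evaluation in plain Python (a naive nested-loop join, not how a triple store such as Virtuoso executes it); the ex: dataset and the rdfs:label values are hypothetical.

```python
from typing import Optional

# Minimal sketch of basic graph pattern evaluation: each triple pattern is
# matched against the graph and joined on shared variables ("?name").
# Terms are kept as prefixed names for readability.
Triple = tuple


def match(pattern: Triple, triple: Triple, binding: dict) -> Optional[dict]:
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if extended.setdefault(p, t) != t:  # variable bound to another value
                return None
        elif p != t:  # constant term does not match
            return None
    return extended


def select(patterns: list, graph: list) -> list:
    """Return all variable bindings that satisfy every pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for t in graph
                    if (b2 := match(pattern, t, b)) is not None]
    return bindings


# Toy graph: the ICOS dataset plus a hypothetical dataset from another RI.
graph = [
    ("icos:co2-mhd", "vsto:hasContainedParameter", "sweet_chemCompound:CO2"),
    ("sweet_chemCompound:CO2", "rdfs:label", "CO2"),
    ("ex:ocean-temp", "vsto:hasContainedParameter", "ex:SeaWaterTemperature"),
    ("ex:SeaWaterTemperature", "rdfs:label", "sea water temperature"),
]
rows = select([("?dataset", "vsto:hasContainedParameter", "?parameter"),
               ("?parameter", "rdfs:label", "?parameterName")], graph)
for row in rows:
    print(row["?dataset"], row["?parameterName"])
```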

Use ENVRI data: answering the use case I

    ## The query returns the observed values, the measured parameters and the
    ## datasets containing the observations, restricted to measurements
    ## taken in April 2010.

    SELECT ?parameterName ?value ?time ?dataset WHERE {
      ?obs muo:numericalValue ?value ;
           qb:dataSet ?dataset ;
           time:inXSDDateTime ?time .
      ?dataset vsto:hasContainedParameter ?parameter .
      ?parameter rdfs:label ?parameterName .
      FILTER (?time >= "2010-04-01T00:00:00Z"^^xsd:dateTime
              && ?time < "2010-05-01T00:00:00Z"^^xsd:dateTime)
    }
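The date restriction in this query can be reproduced outside the triple store. A minimal sketch in Python, assuming timestamps are xsd:dateTime strings in UTC ending in "Z" without fractional seconds; the observation identifiers and timestamps other than icos:observation0 are hypothetical.

```python
from datetime import datetime, timezone

# Minimal sketch of the April 2010 time filter, using a half-open interval
# [1 April 2010, 1 May 2010) so the first instant of May is excluded.
START = datetime(2010, 4, 1, tzinfo=timezone.utc)
END = datetime(2010, 5, 1, tzinfo=timezone.utc)  # exclusive bound


def parse_xsd_datetime(literal: str) -> datetime:
    """Parse an xsd:dateTime lexical form ending in 'Z' (UTC), without fractions."""
    return datetime.strptime(literal, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def in_april_2010(literal: str) -> bool:
    """True if the timestamp falls within April 2010 (UTC)."""
    return START <= parse_xsd_datetime(literal) < END


# Hypothetical observation identifiers and timestamps, for illustration only.
times = {
    "icos:observation0": "2010-01-01T00:00:00Z",
    "ex:obsA": "2010-04-15T12:00:00Z",
    "ex:obsB": "2010-04-30T23:59:59Z",
    "ex:obsC": "2010-05-01T00:00:00Z",
}
selected = sorted(obs for obs, t in times.items() if in_april_2010(t))
print(selected)  # ['ex:obsA', 'ex:obsB']
```

The half-open interval avoids counting a measurement taken exactly at midnight on 1 May as an April value.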

Use ENVRI data: answering the use case II

## The query returns all the measurements with their parameters, independently of the dataset in which they were defined.

Linked Data benefits

  • Data Cube provides structural data interoperability.

  • Semantic interoperability can be addressed by extending Data Cube with existing ontologies:

      • The Time Ontology [5]

      • The Geo WGS84 based Vocabulary [6]

      • The Measurement Units Ontology (MUO) [7]

      • The Open Provenance Model (OPM) [8]

      • The Semantic Web for Earth and Environmental Terminology (SWEET) [9]

      • The Virtual Solar-Terrestrial Observatory Ontology (VSTO) [10], which extends SWEET

      • ...

Linked Data benefits (cont'd)

Linked Data is universal: data in different formats, e.g. CSV, TSV, relational databases and XML, can be transformed into Linked Data.
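As an illustration of such a transformation, here is a minimal sketch that turns CSV rows into observation triples using Python's standard csv module. The column names and the column-to-property mapping are hypothetical; the properties follow the observation slide, and the sample row reuses the Mace Head observation. Objects are kept as plain strings, so the sketch does not distinguish IRIs from literals.

```python
import csv
import io

# Minimal sketch: CSV rows -> observation triples. The column names and the
# mapping are hypothetical; properties follow the observation slide.
# Objects stay plain strings (no IRI/literal distinction in this sketch).
COLUMN_TO_PROPERTY = {
    "lat": "geo:lat",
    "long": "geo:long",
    "time": "time:inXSDDateTime",
    "height": "sweet_spaceExtent:hasHeight",
    "value": "muo:numericalValue",
}


def csv_to_triples(text: str, dataset: str, prefix: str) -> list:
    """Emit one qb:Observation per CSV row, plus one triple per mapped column."""
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = f"{prefix}observation{i}"
        triples.append((subject, "rdf:type", "qb:Observation"))
        triples.append((subject, "qb:dataSet", dataset))
        for column, prop in COLUMN_TO_PROPERTY.items():
            triples.append((subject, prop, row[column]))
    return triples


sample = (
    "time,lat,long,height,value\n"
    "2010-01-01T00:00:00Z,53.3261,-9.9036,25,391.318\n"
)
for t in csv_to_triples(sample, "icos:co2-mhd", "icos:"):
    print(t)
```

Keeping the mapping in one table means a different source format only needs a different reader, while the triple-building step stays the same.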

Linked Data complements the existing technological solutions.

Linked Data makes it possible to describe data provenance, e.g. with the Open Provenance Model [8].
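Once provenance is stated as triples, tracing where a dataset came from is a graph walk. A minimal sketch, assuming the opmv:wasDerivedFrom and opmv:wasGeneratedBy properties of the Open Provenance Model Vocabulary [8]; the resource names (level-1 and raw versions of the Mace Head dataset, a calibration run) are hypothetical.

```python
# Minimal sketch: trace the lineage of a dataset through provenance links.
# opmv:wasDerivedFrom and opmv:wasGeneratedBy are assumed from the Open
# Provenance Model Vocabulary [8]; the resource names are hypothetical.
PROVENANCE = [
    ("icos:co2-mhd", "opmv:wasDerivedFrom", "icos:co2-mhd-level1"),
    ("icos:co2-mhd-level1", "opmv:wasDerivedFrom", "icos:co2-mhd-raw"),
    ("icos:co2-mhd", "opmv:wasGeneratedBy", "icos:calibration-run"),
]


def lineage(resource: str, graph: list) -> list:
    """Follow opmv:wasDerivedFrom links from resource back to its origin.

    Only the first source is followed when several exist, and the walk stops
    on a cycle so that malformed provenance cannot loop forever.
    """
    chain = [resource]
    current = resource
    while True:
        sources = [o for s, p, o in graph
                   if s == current and p == "opmv:wasDerivedFrom"]
        if not sources or sources[0] in chain:
            return chain
        current = sources[0]
        chain.append(current)


print(lineage("icos:co2-mhd", PROVENANCE))
# ['icos:co2-mhd', 'icos:co2-mhd-level1', 'icos:co2-mhd-raw']
```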

What do we need?

Description of the Iceland Volcano Use Case, including the workflow and the datasets involved.

All data and metadata!

Domain ontologies to encode ENVRI-specific terms, e.g. units of measurement, geospatial dimensions, and the different realms: atmosphere, volcanoes, plate tectonics, etc.

Thank you!


References I

[1] The RDF Data Cube Vocabulary

[2] Statistical Data and Metadata Exchange

[3] The Combined Online Information System

[4] The Environment Agency's water sampling and monitoring site

[5] The OWL Time Ontology

[6] The Basic GEO Vocabulary

[7] The Measurement Units Ontology

References II

[8] The Open Provenance Model Vocabulary

[9] The SWEET ontologies

[10] The Virtual Solar-Terrestrial Observatory Ontology

[11] The Argo Data Management site

[12] The Google Refine plug-in for Data Cube

[13] The Digital Enterprise Research Institute's site

References III

[14] The Virtuoso Open Source Edition's site

[15] The Linked Data principles

[16] The mashup application of the UK Linked Data COINS dataset