
Data integration and Linked Data

Tatiana Tarasova

University of Amsterdam


03/09/12



Outline

1. Task 4.2 objectives

2. Use Case of the ENVRI data integration

3. Linked Data

4. Linked Data for ENVRI

5. Benefits of Linked Data

6. UvA needs


Task 4.2 objectives

Harmonise, integrate and publish data from the ENVRI Research Infrastructures to facilitate multidisciplinary scientific research.


Use Case

Study the correlation between the concentration of CO2 in the air and the ocean temperature during the Iceland Volcano eruption in 2010.


Challenges

[Figure: CO2 concentration and ocean temperature data sources. Labels: platform, observatory, good quality, level 2, year, month, day, date, flask, TSV, CSV, authorized IP access, FTP catalogues.]


ENVRI data integration: requirements

Find a solution that addresses both structural and semantic data heterogeneity and that:

  • is universal for all the RIs

  • re-uses the existing RIs' technological solutions

  • re-uses the existing data resources like code lists, thesauri and ontologies

  • ensures data provenance traceability


Linked Data: URIs, HTTP, RDF [15]

Mace Head is an observatory.

Subject – predicate – object:

icos:MHD rdf:type vsto:Observatory .

prefix icos: <http://example.envri.org/icos/>

prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>

prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>

prefix vsto: <http://escience.rpi.edu/ontology/vsto/2/0/vsto.owl#>
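The way prefixed names resolve to full URIs can be sketched in a few lines of Python. This is only an illustration using the standard library: the helper names `expand` and `to_ntriple` are hypothetical, and the icos namespace is the example URI used in this presentation, not a published one.

```python
# Prefix table copied from the slide; the icos namespace is the slide's example URI.
PREFIXES = {
    "icos": "http://example.envri.org/icos/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "vsto": "http://escience.rpi.edu/ontology/vsto/2/0/vsto.owl#",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'icos:MHD' into a full URI."""
    prefix, _, local = curie.partition(":")
    return PREFIXES[prefix] + local


def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize one triple of prefixed names as an N-Triples line."""
    return "<{}> <{}> <{}> .".format(expand(subject), expand(predicate), expand(obj))


# "Mace Head is an observatory" as subject, predicate, object.
triple = to_ntriple("icos:MHD", "rdf:type", "vsto:Observatory")
print(triple)
```

Each part of the statement becomes an HTTP URI, which is what allows the same observatory or class to be referenced from other datasets.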


Linked Data publishing workflow

Analyze → Model → Publish → Use


Analyze ENVRI data: structure

DATASET

  • METADATA (parameter, unit of measure, instrument, provider, ...)

  • OBSERVATIONS

  • DIMENSIONS (time, lat/long, elevation)


Analyze ENVRI data: data

DATASET “CO2 concentration measured by Mace Head”

OBSERVATION “392.049”

Parameter: CO2

Unit of measure: ppm

Observatory: Mace Head

Provider: ICOS

Observed value: 392.049

Time: 2010.01.01

Lat/Long: 53.3261/-9.9036

Elevation: 25 m


Model ENVRI data: the Data Cube vocabulary

The Data Cube vocabulary [1] provides a generic framework to encode collections of observations. The core classes of Data Cube include DataSet, Observation and DataStructureDefinition. Component properties are typed as DimensionProperty, AttributeProperty or MeasureProperty.

This vocabulary was developed for the statistical domain and is based on the SDMX standard [2].

Examples: the UK local government payments [3,16], the UK Environment Agency's water sampling and monitoring data [4].


Publish ENVRI data: structure

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix qb: <http://purl.org/linked-data/cube#> .
@prefix icos: <http://example.envri.org/icos/> .
@prefix vsto: <http://escience.rpi.edu/ontology/vsto/2/0/vsto.owl#> .
@prefix time: <http://www.w3.org/2006/time#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

_:structure rdf:type qb:DataStructureDefinition ;
    qb:component _:cs1 , _:cs2 , _:cs3 , _:cs4 , _:cs5 , _:cs6 , _:cs7 .

_:cs1 qb:measure vsto:hasContainedParameter .
_:cs2 qb:attribute muo:measuredIn .
_:cs3 qb:attribute vsto:isFromInstrument .
_:cs4 qb:dimension geo:lat .
_:cs5 qb:dimension geo:long .
_:cs6 qb:dimension time:inXSDDateTime .
_:cs7 qb:dimension sweet_spaceExtent:hasHeight .


Publish ENVRI data: dataset

icos:co2-mhd rdf:type qb:DataSet ;
    rdfs:comment "Dataset with CO2 concentration measured by the Mace Head observatory" ;
    qb:structure _:structure ;
    vsto:hasContainedParameter sweet_chemCompound:CO2 ;
    muo:measuredIn muo:ppm ;
    vsto:isFromInstrument icos:flask ;
    dcterms:publisher <http://www.icos-infrastructure.eu/> ;
    dcterms:source …
    dcterms:contributor ...


Publish ENVRI data: observations

icos:observation0 rdf:type qb:Observation ;
    rdfs:label "observation" ;
    rdfs:comment "Observation of the dataset 'CO2 concentration measured by Mace Head'" ;
    qb:dataSet icos:co2-mhd ;
    geo:lat "53.3261" ;
    geo:long "-9.9036" ;
    time:inXSDDateTime "2010-01-01T00:00:00Z"^^xsd:dateTime ;
    sweet_spaceExtent:hasHeight "25" ;
    muo:numericalValue "391.318" .


Use ENVRI data: get all parameters

## Cross-dataset query based on the common ontology.

SELECT ?dataset ?parameter ?parameterName

WHERE {

?dataset vsto:hasContainedParameter ?parameter .

?parameter rdfs:label ?parameterName .}

## The query returns all the parameters independently of the dataset for which they were defined.


Use ENVRI data: answering the use case I

## The query returns the observations' values, the measured parameters and
## the datasets that contain the observations, restricted to measurements
## taken in April 2010.

SELECT ?parameterName ?value ?time ?dataset WHERE {
    ?obs muo:numericalValue ?value ;
         qb:dataSet ?dataset ;
         time:inXSDDateTime ?time .
    ?dataset vsto:hasContainedParameter ?parameter .
    ?parameter rdfs:label ?parameterName .
    FILTER (?time >= "2010-04-01T00:00:00Z"^^xsd:dateTime
         && ?time < "2010-05-01T00:00:00Z"^^xsd:dateTime)
}
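The April 2010 time window used by the query can be checked outside a triple store with a small Python sketch. It assumes timestamps are UTC strings in the same form as on the observation slide, and it treats the upper bound as exclusive so that midnight on 1 May is not counted as April. The function names and the sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def parse_xsd_datetime(text: str) -> datetime:
    """Parse a UTC xsd:dateTime string of the form 2010-04-01T00:00:00Z."""
    return datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


START = parse_xsd_datetime("2010-04-01T00:00:00Z")
END = parse_xsd_datetime("2010-05-01T00:00:00Z")


def in_april_2010(timestamp: str) -> bool:
    """True when the timestamp falls in the half-open interval [START, END)."""
    return START <= parse_xsd_datetime(timestamp) < END


# Hypothetical timestamps, not taken from any ENVRI dataset.
samples = ["2010-03-31T23:59:59Z", "2010-04-14T12:00:00Z", "2010-05-01T00:00:00Z"]
selected = [t for t in samples if in_april_2010(t)]
print(selected)
```

Comparing parsed date-time values rather than raw strings mirrors what the typed xsd:dateTime comparison does in the FILTER clause.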


Use ENVRI data: answering the use case II

## The query returns all the measurements with their parameters independently of the dataset for which they were defined.


Linked Data benefits

  • Data Cube provides structural data interoperability.

  • Semantic interoperability can be addressed by extending Data Cube with the existing ontologies:

      • The Time Ontology [5]

      • The Geo WGS84 based Vocabulary [6]

      • The Measurement Units Ontology (MUO) [7]

      • The Open Provenance Model (OPM) [8]

      • The Semantic Web for Earth and Environmental Terminology (SWEET) [9]

      • The Virtual Solar-Terrestrial Observatory Ontology (VSTO) [10], which extends SWEET

      • ...


Linked Data benefits (contd.)

Linked Data is universal, i.e. different data formats can be transformed into Linked Data, e.g., CSV, TSV, relational data, XML.

Linked Data complements the existing technological solutions.

Linked Data allows data provenance to be described.
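As a rough illustration of transforming a tabular format into Linked Data, the sketch below turns one CSV row into a Data Cube observation in Turtle using only the Python standard library. The CSV column names, the `row_to_turtle` helper and the typed xsd:dateTime literal are assumptions made for this example; the property names and values are those shown on the observation slide, and prefix declarations are omitted.

```python
import csv
import io

# Hypothetical CSV layout; the column names are assumptions for illustration.
CSV_TEXT = """time,lat,long,height,value
2010-01-01T00:00:00Z,53.3261,-9.9036,25,391.318
"""

# Maps each assumed column name to the property used on the observation slide.
COLUMN_TO_PROPERTY = {
    "lat": "geo:lat",
    "long": "geo:long",
    "height": "sweet_spaceExtent:hasHeight",
    "value": "muo:numericalValue",
}


def row_to_turtle(index: int, row: dict, dataset: str = "icos:co2-mhd") -> str:
    """Render one CSV row as a qb:Observation in Turtle (prefixes omitted)."""
    lines = [
        "icos:observation{} rdf:type qb:Observation ;".format(index),
        "    qb:dataSet {} ;".format(dataset),
        '    time:inXSDDateTime "{}"^^xsd:dateTime ;'.format(row["time"]),
    ]
    for column, prop in COLUMN_TO_PROPERTY.items():
        lines.append('    {} "{}" ;'.format(prop, row[column]))
    # The last predicate-object pair ends the statement with a full stop.
    lines[-1] = lines[-1][:-1] + "."
    return "\n".join(lines)


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
turtle = "\n\n".join(row_to_turtle(i, row) for i, row in enumerate(rows))
print(turtle)
```

The same column-to-property mapping would apply to TSV input by passing `delimiter="\t"` to `csv.DictReader`.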


What do we need?

Description of the Iceland Volcano Use Case, including the workflow and the datasets involved.

All data and metadata!

Domain ontologies to encode ENVRI-specific terms, e.g., units of measurement, geospatial dimensions, and different realms: atmosphere, volcanoes, plate tectonics, etc.


Thank you!

Questions?


References I

[1] The RDF Data Cube Vocabulary http://www.w3.org/TR/vocab-data-cube/

[2] Statistical Data and Metadata Exchange http://sdmx.org/

[3] The Combined Online Information System http://data.gov.uk/resources/coins

[4] The Environment Agency's water sampling and monitoring site http://environment.data.gov.uk/lab/bwq-web.html

[5] The OWL Time Ontology http://www.w3.org/TR/owl-time/

[6] The Basic GEO Vocabulary http://www.w3.org/2003/01/geo/

[7] The Measurement Units Ontology http://idi.fundacionctic.org/muo/muo-vocab.html


References II

[8] The Open Provenance Model Vocabulary http://openprovenance.org/

[9] The SWEET ontologies http://sweet.jpl.nasa.gov/ontology/

[10] The Virtual Solar-Terrestrial Observatory Ontology http://www.vsto.org/

[11] The Argo Data Management site http://www.argodatamgt.org/

[12] The Google Refine plug-in for Data Cube http://refine.deri.ie/qbExport

[13] The Digital Enterprise Research Institute's site http://www.deri.ie/


References III

[14] The Virtuoso Open Source Edition's site http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/

[15] The Linked Data principles http://www.w3.org/DesignIssues/LinkedData.html

[16] The mashup application of the UK Linked Data COINS dataset http://wheredoesmymoneygo.org/

