Metadata Common Vocabulary

Sérgio Bacelar ( [email protected] ) Statistics Portugal



Metadata Common Vocabulary a journey from a glossary to an ontology of statistical metadata, and back. Sérgio Bacelar ( [email protected] ) Statistics Portugal. Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS) Lisbon, 11 – 13 March, 2009. Definitions.


Presentation Transcript



Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back

Sérgio Bacelar ([email protected])

Statistics Portugal

Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS)

Lisbon, 11 – 13 March, 2009



Definitions

SDMX and SDMX Content-Oriented Guidelines (COG)

Metadata Common Vocabulary (MCV)

Concepts and related definitions used in structural and reference metadata of international organizations and national data producing agencies.

Content-Oriented Guidelines = MCV + Cross-Domain Concepts (a subset of MCV) + Statistical Subject-Matter Domains

Latest version (2009): 397 terms.

Goal: uniform understanding of standard metadata concepts.



ESSnet on SDMX

  • Objective

    • Further development of SDMX

      • Further development and improvement of the SDMX Content-oriented Guidelines

      • Metadata Task Force on SDMX (Statistics Portugal)

      • WP Proposal: MCV Ontology

  • Metadata Common Vocabulary (MCV)

    • Semantic univocity → design of a conceptual model of the domain

    • Detecting possible inconsistencies, redundancies or gaps in the glossary

    • Lack of structure: a flat list with no hierarchical relations between terms

    • No semantic relations between terms



Conceptual system

Building a glossary usually presupposes the prior design of a conceptual model of the domain.

  • Proposal for a revision of MCV

    • Starting with the existing terms and definitions

    • Creating semantic relations between terms, based on the definitions of the MCV terms

      • (bottom-up or middle-out strategy):

        • Goal: reveal the latent conceptual system, detecting possible structural incongruities or redundancies.



Conceptual system and Concept Map

  • Main goals

    • Find redundancies, inconsistencies, omissions and terms that belong to domains other than statistical metadata (their presence is explained by the complex and interdisciplinary nature of metadata).

    • To find important and relevant terms that were omitted, it is necessary to analyze the definitions of the concepts.

      • Bearing this in mind, we built a “Concept Map” representing about 20% of the terms in MCV (draft version).

      • A concept map is a diagram showing the relationships among terms/concepts. Concepts are connected with labeled arrows, in a downward-branching hierarchical structure.

      • Graphical visualization is difficult because of the large number of terms and relations.
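The two checks named above (omitted terms and redundancies) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used for the MCV: the `GLOSSARY` layout, the "datum" entry, the wording of the "information" definition, and both helper functions are hypothetical. Only the definition of "data" comes from the slides.

```python
from collections import defaultdict

# Hypothetical mini-glossary: term -> definition and the terms its definition relies on.
GLOSSARY = {
    "data": {
        "definition": "Characteristics or information, usually numerical, "
                      "that are collected through observation",
        "related": ["information", "characteristic"],
    },
    "information": {
        "definition": "Knowledge concerning objects, facts or events",  # hypothetical wording
        "related": [],
    },
    "datum": {  # hypothetical redundant entry
        "definition": "characteristics or information, usually numerical, "
                      "that are collected through observation.",
        "related": ["information"],
    },
}


def missing_terms(glossary):
    """Terms that definitions refer to but that have no glossary entry (omissions)."""
    referenced = {t for entry in glossary.values() for t in entry["related"]}
    return sorted(referenced - set(glossary))


def duplicate_definitions(glossary):
    """Groups of terms whose definitions are identical after light normalization."""
    groups = defaultdict(list)
    for term, entry in glossary.items():
        key = " ".join(entry["definition"].lower().rstrip(".").split())
        groups[key].append(term)
    return [sorted(terms) for terms in groups.values() if len(terms) > 1]


print(missing_terms(GLOSSARY))
print(duplicate_definitions(GLOSSARY))
```

Exact text matching only catches the most obvious redundancies; finding near-synonyms or inconsistent definitions still requires reading the definitions, which is what the concept map supports.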



Concept Map (partial view)



Concept Map (partial view)



Terms and relations between MCV terms/concepts



Using Resource Description Framework (RDF)

RDF is a framework for representing information on the Web.

RDF is particularly concerned with meaning.

RDF is a collection of triples, each one consisting of a subject, a predicate and an object, e.g. “MetadataExchange is-a DataAndMetadataExchange”
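A minimal Python sketch of the triple idea, using plain tuples rather than an RDF library. The first triple is the example from the slide. The second triple, the `Triple` type and the `match` helper are assumptions added for illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


GRAPH = [
    Triple("MetadataExchange", "is-a", "DataAndMetadataExchange"),  # from the slide
    Triple("DataExchange", "is-a", "DataAndMetadataExchange"),      # hypothetical
]


def match(graph, subject=None, predicate=None, obj=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


# Everything that "is-a" DataAndMetadataExchange:
for triple in match(GRAPH, predicate="is-a", obj="DataAndMetadataExchange"):
    print(triple.subject)
```

In real RDF the three parts are usually URIs (or a literal in the object position); short names are used here only to keep the sketch readable.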



Middle range solution

Using SKOS (Simple Knowledge Organization System)

- currently developed within the W3C framework

A bridging technology between “chaos” and the more rigorous logical formalism of ontology languages (like OWL).

It is an application of the Resource Description Framework (RDF) providing a model for expressing the basic structure and content of concept schemes such as thesauri.



SKOS example: concept “data”

<rdf:RDF
...........
  <skos:Concept rdf:about="http://www.my.com/#data">
    <skos:definition>Characteristics or information, usually numerical, that are collected through observation</skos:definition>
    <skos:prefLabel>data</skos:prefLabel>
    <skos:altLabel></skos:altLabel>
    <skos:broader rdf:resource="http://www.my.com/#information"/>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
    <skos:scopeNote>Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means (Economic Commission for Europe of the United Nations (UNECE), "Terminology on Statistical Metadata", Conference of European Statisticians Statistical Standards and Studies, No. 53, Geneva, 2000).</skos:scopeNote>
  </skos:Concept>
</rdf:RDF>
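To show how such a record could be consumed, here is a hedged sketch that reads a shortened version of the concept with Python's standard `xml.etree.ElementTree`. The slide elides the namespace declarations; the two namespace URIs below are the standard W3C ones for RDF and SKOS and are an assumption about what the original file declared. The `read_concepts` helper is hypothetical.

```python
import xml.etree.ElementTree as ET

# Standard W3C namespace URIs (assumed; the slide does not show the declarations).
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"

# Shortened copy of the slide's concept, with the namespaces filled in.
SKOS_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:skos="{SKOS_NS}">
  <skos:Concept rdf:about="http://www.my.com/#data">
    <skos:prefLabel>data</skos:prefLabel>
    <skos:broader rdf:resource="http://www.my.com/#information"/>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
  </skos:Concept>
</rdf:RDF>"""


def read_concepts(xml_text):
    """Map each concept URI to its preferred label and its broader/related URIs."""
    root = ET.fromstring(xml_text)
    resource = f"{{{RDF_NS}}}resource"
    concepts = {}
    for node in root.findall(f"{{{SKOS_NS}}}Concept"):
        uri = node.get(f"{{{RDF_NS}}}about")
        concepts[uri] = {
            "prefLabel": node.findtext(f"{{{SKOS_NS}}}prefLabel"),
            "broader": [e.get(resource) for e in node.findall(f"{{{SKOS_NS}}}broader")],
            "related": [e.get(resource) for e in node.findall(f"{{{SKOS_NS}}}related")],
        }
    return concepts


CONCEPTS = read_concepts(SKOS_XML)
print(CONCEPTS["http://www.my.com/#data"]["broader"])
```

Treating RDF/XML as ordinary XML works for a flat file like this one; a dedicated RDF parser is the safer choice once the same graph can be serialized in several equivalent ways.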



Ontologies

Ontology = an explicit formal specification of the terms in a domain (here, statistical metadata) and of the relations among them. It is a model of part of the real world, built through iterative design.

Ontologies can be built with an ontology editing and modeling tool such as Protégé (open-source software available at http://protege.stanford.edu)



Ontologies - reasoning

It is essential to provide tools and services (reasoners) that help users answer queries over ontology classes and instances, e.g.:

find more general/specific classes;

retrieve individuals matching a given query

e.g. Is there any survey with quarterly frequency that uses a classification system and has an on-line database as a dissemination format?
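The two kinds of query can be imitated with plain Python lookups. This is a toy sketch, not a reasoner: a real OWL reasoner infers far more than this. Only the `ComputerAssistedInterviewing` / `DataCollection` pair comes from the MCV ontology excerpt; the parent of `DataCollection`, the survey records and all function names are hypothetical.

```python
# Child class -> parent class.
SUBCLASS_OF = {
    "ComputerAssistedInterviewing": "DataCollection",  # from the OWL excerpt
    "DataCollection": "StatisticalMethodology",        # hypothetical
}


def more_general(cls):
    """All superclasses of cls, nearest first."""
    found = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.append(cls)
    return found


def more_specific(cls):
    """All direct and indirect subclasses of cls."""
    return sorted(c for c in SUBCLASS_OF if cls in more_general(c))


# Hypothetical individuals of a Survey class.
SURVEYS = [
    {"name": "SurveyA", "frequency": "quarterly",
     "classifications": ["ClassificationX"], "dissemination": ["online database"]},
    {"name": "SurveyB", "frequency": "annual",
     "classifications": ["ClassificationX"], "dissemination": ["online database"]},
    {"name": "SurveyC", "frequency": "quarterly",
     "classifications": [], "dissemination": ["online database"]},
]


def quarterly_classified_online(surveys):
    """Surveys that are quarterly, use some classification and are disseminated on-line."""
    return [
        s["name"] for s in surveys
        if s["frequency"] == "quarterly"
        and s["classifications"]
        and "online database" in s["dissemination"]
    ]


print(more_general("ComputerAssistedInterviewing"))
print(quarterly_classified_online(SURVEYS))
```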



Ontologies - methodology

Developing an ontology:

1. Defining classes

2. Arranging classes in a taxonomic hierarchy (classes and subclasses)

3. Defining slots (same as roles or properties)

4. Describing allowed values for these slots (facets, role restrictions)

5. Filling in the values for slots for instances (individuals)
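The five steps can be mirrored in a small Python sketch. It only illustrates the vocabulary (class, subclass, slot, facet, instance) and is not how Protégé stores an ontology. The `frequency` slot, its allowed values, the `ExampleSurvey` individual and all type names are hypothetical; the two class names are taken from the MCV ontology excerpt.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Slot:
    name: str
    allowed_values: frozenset  # step 4: facet restricting the slot's values


@dataclass
class OntologyClass:
    name: str                                  # step 1: the class itself
    parent: Optional["OntologyClass"] = None   # step 2: place in the hierarchy
    slots: dict = field(default_factory=dict)  # step 3: slots defined on this class

    def all_slots(self):
        """Own slots plus those inherited from superclasses."""
        inherited = self.parent.all_slots() if self.parent else {}
        return {**inherited, **self.slots}


@dataclass
class Individual:
    name: str
    cls: OntologyClass
    values: dict = field(default_factory=dict)

    def set_value(self, slot_name, value):
        """Step 5: fill in a slot value, enforcing the facet from step 4."""
        slots = self.cls.all_slots()
        if slot_name not in slots:
            raise KeyError(f"{self.cls.name} has no slot {slot_name!r}")
        if value not in slots[slot_name].allowed_values:
            raise ValueError(f"{value!r} is not allowed for {slot_name!r}")
        self.values[slot_name] = value


frequency = Slot("frequency", frozenset({"monthly", "quarterly", "annual"}))
data_collection = OntologyClass("DataCollection", slots={"frequency": frequency})
cai = OntologyClass("ComputerAssistedInterviewing", parent=data_collection)

survey = Individual("ExampleSurvey", cai)
survey.set_value("frequency", "quarterly")  # accepted: the slot is inherited from DataCollection
```

Setting a value outside the facet, for instance `survey.set_value("frequency", "weekly")`, raises `ValueError` in this sketch.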



Ontology - Classes

A first attempt at building an ontology of statistical metadata: main classes created from the MCV

(According to SDMX Content-Oriented Guidelines: Framework, Draft March 2006, p.6)

1. General metadata (derived from ISO, UNECE and UN documents);

2. Metadata describing Statistical methodologies;

3. Metadata describing Quality assessment;

4. Terms referring to Data and metadata exchange (SDMX information model and data structure definitions, etc.).



Classes and subclasses (Protégé)



Classes and subclasses



Classes and subclasses

Quality



Properties

(e.g. “Quality, according to Eurostat, has a dimension called relevance”)

[Diagram labels: Class, relevance, Property]



Codification - Web Ontology Language (OWL)

…………………..
<owl:Ontology rdf:about="">
  <rdfs:comment>Metadata Common Vocabulary (MCV) ontology.</rdfs:comment>
</owl:Ontology>
………………………
<!-- Object Properties -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#uses -->
<owl:ObjectProperty rdf:about="#uses">
  <owl:inverseOf rdf:resource="#isUsedBy"/>
</owl:ObjectProperty>
………………………..
<!-- Classes -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#ComputerAssistedInterviewing -->
<owl:Class rdf:about="#ComputerAssistedInterviewing">
  <rdfs:subClassOf rdf:resource="#DataCollection"/>
</owl:Class>
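What the `owl:inverseOf` declaration buys can be shown with a short Python sketch: every stated `uses` fact implies an `isUsedBy` fact in the opposite direction, without anyone having to assert it. The property pair comes from the excerpt above; the `SurveyA` and `ClassificationX` individuals and the `with_inverses` helper are hypothetical, and a real reasoner would derive this as one of many entailments.

```python
INVERSE_OF = {"uses": "isUsedBy"}  # from the OWL excerpt


def with_inverses(triples, inverse_of):
    """Return the stated triples plus those implied by inverse properties."""
    # Inverse relations hold in both directions, so index both ways.
    pairs = dict(inverse_of)
    pairs.update({inverse: prop for prop, inverse in inverse_of.items()})
    result = set(triples)
    for subject, predicate, obj in triples:
        if predicate in pairs:
            result.add((obj, pairs[predicate], subject))
    return result


STATED = {("SurveyA", "uses", "ClassificationX")}  # hypothetical assertion
for triple in sorted(with_inverses(STATED, INVERSE_OF)):
    print(triple)
```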



Conclusion

Because an ontology is a strict, rigorous and formal way of representing knowledge, mapping a glossary like the Metadata Common Vocabulary into a Statistical Metadata Ontology can help reduce possible inconsistencies, incompleteness and lack of structure.

This may make it easier for SDMX users to harmonize the concepts that describe data (semantic univocity).

