Metadata Common Vocabulary: a journey from a glossary to an ontology of statistical metadata, and back
Sérgio Bacelar, Statistics Portugal
Joint UNECE/Eurostat/OECD Work Session on Statistical Metadata (METIS), Lisbon, 11–13 March 2009
Definitions
SDMX and SDMX Content-Oriented Guidelines (COG)
Metadata Common Vocabulary (MCV)
Concepts and related definitions used in the structural and reference metadata of international organizations and national data-producing agencies.
Content-Oriented Guidelines = MCV + Cross-Domain Concepts (a subset of MCV) + Statistical Subject-Matter Domains
Latest version (2009): 397 terms.
Goal: uniform understanding of standard metadata concepts.
Building a glossary usually presupposes a prior conceptual model of the domain it covers.
RDF is a framework for representing information in the Web.
RDF is particularly concerned with meaning.
RDF describes information as a collection of triples, each consisting of a subject, a predicate and an object, e.g. “MetadataExchange is-a DataAndMetadataExchange”.
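The following is a minimal sketch in plain Python, with no RDF library assumed. Each triple is stored as a (subject, predicate, object) tuple in a set and retrieved by pattern matching, where `None` acts as a wildcard. The second triple and the helper name `match` are hypothetical and included for illustration only.

```python
from typing import Optional

# A triple is (subject, predicate, object); a graph is a set of triples.
Triple = tuple[str, str, str]

graph: set[Triple] = {
    ("MetadataExchange", "is-a", "DataAndMetadataExchange"),
    # Hypothetical second triple, for illustration only.
    ("DataExchange", "is-a", "DataAndMetadataExchange"),
}

def match(g: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching the pattern, sorted; None matches any value."""
    return sorted(
        t for t in g
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )

# Which terms are kinds of DataAndMetadataExchange?
print([t[0] for t in match(graph, p="is-a", o="DataAndMetadataExchange")])
# ['DataExchange', 'MetadataExchange']
```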
Using SKOS (Simple Knowledge Organization System)
- currently developed within the W3C framework
A bridging technology between “chaos” and the more rigorous logical formalism of ontology languages such as OWL.
It is an application of the Resource Description Framework (RDF) providing a model for expressing the basic structure and content of concept schemes such as thesauri.
SKOS description of the MCV concept “Data” (excerpt):
<skos:definition>Characteristics or information, usually numerical, that are collected through observation</skos:definition>
<skos:related rdf:resource="http://www.my.com/#Characteristic"/>
<skos:scopeNote>Data is the physical representation of information in a manner suitable for communication, interpretation, or processing by human beings or by automatic means (Economic Commission for Europe of the United Nations (UNECE), "Terminology on Statistical Metadata", Conference of European Statisticians Statistical Standards and Studies, No. 53, Geneva, 2000).</skos:scopeNote>
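The SKOS lines above are a fragment and are not well-formed RDF/XML on their own. The sketch below wraps them in an `rdf:RDF` root and a `skos:Concept` whose URI, `http://www.my.com/#Data`, is hypothetical and follows the placeholder domain used in the fragment. It then reads the properties back with Python's standard `xml.etree.ElementTree`. The `prefLabel` is also an assumed addition, and the scope note is omitted for brevity.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Hypothetical wrapper: rdf:RDF root and a skos:Concept with an assumed URI.
doc = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://www.my.com/#Data">
    <skos:prefLabel xml:lang="en">Data</skos:prefLabel>
    <skos:definition>Characteristics or information, usually numerical, that are collected through observation</skos:definition>
    <skos:related rdf:resource="http://www.my.com/#Characteristic"/>
  </skos:Concept>
</rdf:RDF>"""

root = ET.fromstring(doc)
# ElementTree addresses namespaced names as "{namespace-uri}local-name".
concept = root.find(f"{{{SKOS}}}Concept")
uri = concept.get(f"{{{RDF}}}about")
label = concept.findtext(f"{{{SKOS}}}prefLabel")
definition = concept.findtext(f"{{{SKOS}}}definition")
related = [e.get(f"{{{RDF}}}resource") for e in concept.findall(f"{{{SKOS}}}related")]
print(uri, label, related)
```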
Ontology = an explicit formal specification of the terms in a domain (here, statistical metadata) and of the relations among them. It is a model of reality, built through iterative design.
Using an ontology editing and modeling system such as Protégé (open-source software, available at http://protege.stanford.edu)
Tools and services (reasoners) are essential to help users answer queries over ontologies, their classes and their instances, e.g.:
find more general/specific classes;
retrieve the individuals matching a given query;
e.g. Is there any survey with quarterly frequency that uses a classification system and whose dissemination format is an online database?
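Real OWL reasoners, such as those that can be used from Protégé, work on full OWL semantics. What follows is only a minimal sketch of the two query types listed above, under strong simplifying assumptions. The class hierarchy is a plain dictionary of direct superclasses, and property values point to classes rather than to individuals. All class, property and individual names (`Survey`, `hasFrequency`, `LabourForceSurvey`, etc.) are hypothetical.

```python
# Direct superclass links (child -> parents); all names are hypothetical.
SUBCLASS_OF = {
    "Survey": {"StatisticalActivity"},
    "SampleSurvey": {"Survey"},
    "Census": {"StatisticalActivity"},
    "NACE": {"ClassificationSystem"},
    "OnlineDatabase": {"DisseminationFormat"},
}

def superclasses(cls: str) -> set[str]:
    """All more general (transitive) superclasses of cls."""
    result, stack = set(), [cls]
    while stack:
        for parent in SUBCLASS_OF.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result

def subclasses(cls: str) -> set[str]:
    """All more specific (transitive) subclasses of cls."""
    return {c for c in SUBCLASS_OF if cls in superclasses(c)}

def is_a(cls: str, target: str) -> bool:
    return cls == target or target in superclasses(cls)

# Individuals: name -> (asserted class, property values). Simplification:
# property values name classes, not individuals.
INDIVIDUALS = {
    "LabourForceSurvey": ("SampleSurvey", {
        "hasFrequency": "Quarterly",
        "uses": "NACE",
        "hasDisseminationFormat": "OnlineDatabase",
    }),
    "PopulationCensus": ("Census", {
        "hasFrequency": "Decennial",
        "hasDisseminationFormat": "OnlineDatabase",
    }),
}

def query() -> list[str]:
    """Surveys with quarterly frequency that use some classification
    system and are disseminated as an online database."""
    return sorted(
        name for name, (cls, props) in INDIVIDUALS.items()
        if is_a(cls, "Survey")
        and props.get("hasFrequency") == "Quarterly"
        and is_a(props.get("uses", ""), "ClassificationSystem")
        and is_a(props.get("hasDisseminationFormat", ""), "OnlineDatabase")
    )

print(sorted(superclasses("SampleSurvey")))  # ['StatisticalActivity', 'Survey']
print(query())                               # ['LabourForceSurvey']
```

The individual only matches because of inference. `LabourForceSurvey` is asserted as a `SampleSurvey`, and it is the subclass closure that makes it a `Survey`.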
Developing an ontology:
1. Defining classes
2. Arranging classes in a taxonomic hierarchy (classes and subclasses)
3. Defining slots (also called roles or properties)
4. Describing allowed values for these slots (facets, role restrictions)
5. Filling in the slot values for instances (individuals)
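The five steps can be sketched in plain Python under one assumption: a simple frame-style model of classes, slots, facets and instances, rather than the API of Protégé or any other tool. The class names, the slot name and the allowed values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Steps 1-2: classes arranged in a taxonomy (class -> direct superclass).
TAXONOMY: dict[str, Optional[str]] = {
    "StatisticalActivity": None,
    "Survey": "StatisticalActivity",
    "DisseminationFormat": None,
}

def is_subclass(cls: Optional[str], ancestor: str) -> bool:
    """True if cls equals ancestor or lies below it in the taxonomy."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = TAXONOMY[cls]
    return False

# Steps 3-4: a slot with its domain class and a facet of allowed values.
@dataclass(frozen=True)
class Slot:
    name: str
    domain: str
    allowed: frozenset[str]

SLOTS = {
    "hasFrequency": Slot("hasFrequency", "Survey",
                         frozenset({"Monthly", "Quarterly", "Annual"})),
}

# Step 5: an instance whose slot values are checked against the facets.
@dataclass
class Instance:
    name: str
    cls: str
    values: dict[str, str] = field(default_factory=dict)

    def set_value(self, slot_name: str, value: str) -> None:
        slot = SLOTS[slot_name]
        if not is_subclass(self.cls, slot.domain):
            raise ValueError(f"{slot_name} does not apply to {self.cls}")
        if value not in slot.allowed:
            raise ValueError(f"{value!r} is not an allowed value of {slot_name}")
        self.values[slot_name] = value

lfs = Instance("LabourForceSurvey", "Survey")
lfs.set_value("hasFrequency", "Quarterly")
print(lfs.values)  # {'hasFrequency': 'Quarterly'}
```

Keeping the facet check in step 5 means that a value outside the allowed set, such as `"Weekly"`, is rejected when it is entered. It does not have to be found later.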
A first attempt at building an ontology of statistical metadata: main classes derived from the MCV
(According to SDMX Content-Oriented Guidelines: Framework, Draft March 2006, p.6)
1. General metadata (derived from ISO, UNECE and UN documents);
2. Metadata describing Statistical methodologies;
3. Metadata describing Quality assessment;
4. Terms referring to Data and metadata exchange (SDMX information model and data structure definitions, etc.).
(e.g. “Quality, according to Eurostat, has a dimension called relevance”)
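As a sketch only, the four main classes and the Eurostat example can be written as triples and queried in both directions. The root class `StatisticalMetadata`, the class identifiers and the property name `hasDimension` are hypothetical choices. They are not names taken from the MCV ontology itself.

```python
# Hypothetical identifiers for the four main classes derived from the MCV.
MAIN_CLASSES = [
    "GeneralMetadata",
    "StatisticalMethodology",
    "QualityAssessment",
    "DataAndMetadataExchange",
]

# Each main class is placed under a hypothetical common root class.
triples = {(cls, "rdfs:subClassOf", "StatisticalMetadata") for cls in MAIN_CLASSES}
# "Quality, according to Eurostat, has a dimension called relevance".
triples.add(("Quality", "hasDimension", "Relevance"))

def objects(subject: str, predicate: str) -> set[str]:
    """Values linked from subject through predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}

def subjects(predicate: str, obj: str) -> set[str]:
    """Subjects linked to obj through predicate."""
    return {s for s, p, o in triples if p == predicate and o == obj}

print(sorted(subjects("rdfs:subClassOf", "StatisticalMetadata")))
# ['DataAndMetadataExchange', 'GeneralMetadata', 'QualityAssessment', 'StatisticalMethodology']
print(objects("Quality", "hasDimension"))  # {'Relevance'}
```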
<rdfs:comment>Metadata Common Vocabulary (MCV) ontology.</rdfs:comment>
<!-- Object Properties -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#uses -->
<!-- http://www.semanticweb.org/ontologies/2008/8/MCV.owl#ComputerAssistedInterviewing -->
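The lines above are excerpts from an OWL file in RDF/XML. In files saved by Protégé, a comment carrying an entity's IRI typically precedes that entity's declaration. The minimal sketch below uses Python's standard `xml.etree.ElementTree` to write and read back such declarations. Because the source shows only the IRIs, it makes three assumptions: `uses` is declared as an `owl:ObjectProperty`, `ComputerAssistedInterviewing` is declared as an `owl:Class`, and the `rdfs:comment` sits on the ontology header.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
BASE = "http://www.semanticweb.org/ontologies/2008/8/MCV.owl"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)

def q(ns: str, local: str) -> str:
    """Qualified name in the "{namespace}local" form used by ElementTree."""
    return f"{{{ns}}}{local}"

root = ET.Element(q(RDF, "RDF"))
# Assumption: the comment belongs to the ontology header.
onto = ET.SubElement(root, q(OWL, "Ontology"), {q(RDF, "about"): BASE})
ET.SubElement(onto, q(RDFS, "comment")).text = "Metadata Common Vocabulary (MCV) ontology."

# Assumed declaration types; each is preceded by an IRI comment.
for kind, name in (("ObjectProperty", "uses"), ("Class", "ComputerAssistedInterviewing")):
    iri = f"{BASE}#{name}"
    root.append(ET.Comment(f" {iri} "))
    ET.SubElement(root, q(OWL, kind), {q(RDF, "about"): iri})

xml_text = ET.tostring(root, encoding="unicode")

# Read the declarations back from the serialized text.
parsed = ET.fromstring(xml_text)
properties = [e.get(q(RDF, "about")) for e in parsed.findall(q(OWL, "ObjectProperty"))]
classes = [e.get(q(RDF, "about")) for e in parsed.findall(q(OWL, "Class"))]
print(properties)
print(classes)
```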
Since an ontology is a strict, rigorous and formal representation of knowledge, mapping a glossary such as the Metadata Common Vocabulary into a statistical metadata ontology can help reduce possible inconsistencies, incompleteness and lack of structure;
This may facilitate, for SDMX users, the harmonization of the concepts that describe data (semantic univocity).
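The following minimal sketch illustrates the kind of defects that such a mapping can surface. It uses plain Python, and the glossary entries are hypothetical. It flags two simple problems: related-term references that point to terms not defined in the glossary (incompleteness), and broader-term chains that loop back on themselves (inconsistency).

```python
from typing import Optional

# Hypothetical glossary entries: term -> (broader term or None, related terms).
GLOSSARY: dict[str, tuple[Optional[str], list[str]]] = {
    "Metadata": (None, ["Data"]),
    "Structural metadata": ("Metadata", ["Data structure definition"]),
    "Reference metadata": ("Metadata", []),
    # Deliberately faulty entries, for illustration only.
    "Term A": ("Term B", []),
    "Term B": ("Term A", []),
}

def dangling_references(glossary) -> list[str]:
    """Related terms that have no entry of their own (incompleteness)."""
    return sorted({r for _, related in glossary.values() for r in related
                   if r not in glossary})

def broader_cycles(glossary) -> list[str]:
    """Terms whose broader-term chain runs into a loop (inconsistency)."""
    cyclic = set()
    for term in glossary:
        seen, current = set(), term
        while current is not None and current in glossary:
            if current in seen:
                cyclic.add(term)
                break
            seen.add(current)
            current = glossary[current][0]
    return sorted(cyclic)

print(dangling_references(GLOSSARY))  # ['Data', 'Data structure definition']
print(broader_cycles(GLOSSARY))       # ['Term A', 'Term B']
```

In a flat glossary, both defects are easy to miss. Once the terms become classes and links in an explicit structure, they can be checked mechanically.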