1 / 15

Vocabulary management: a foundation for semantic interoperability through ontology development

GO-ESSP, Paris, May 2007. Vocabulary management: a foundation for semantic interoperability through ontology development. Roy Lowry British Oceanographic Data Centre. Presentation Outline. SeaDataNet Project SeaDataNet Metadata Evolution Controlled Vocabularies and SeaDataNet

Download Presentation

Vocabulary management: a foundation for semantic interoperability through ontology development

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. GO-ESSP, Paris, May 2007 Vocabulary management: a foundation for semantic interoperability through ontology development Roy Lowry British Oceanographic Data Centre

  2. Presentation Outline • SeaDataNet Project • SeaDataNet Metadata Evolution • Controlled Vocabularies and SeaDataNet • Mappings and Ontologies

  3. SeaDataNet Project • SeaDataNet in a Nutshell • Combine over 40 oceanographic data centres across Europe into a single interoperable data system • Approach is to adopt established standards and technologies wherever possible • Two phases: • One brings 12 centres together with centralised metadata and distributed data as files. Due in autumn 2008 • Two introduces data virtualisation, aggregation, cutting and 30 more centres. Due in 2010 • The major problem facing the project is heterogeneous legacy content

  4. SeaDataNet Metadata Evolution • In the beginning there was plaintext • In the 1980s the enlightened few saw the value and potential of metadata • Most data suppliers saw metadata as an unnecessary waste of their valuable time • The enlightened in the MAST Data Committee and SeaSearch aimed to change this by making metadata creation as easy as possible • Plaintext is much easier to create than structured metadata so metadata formats based on plaintext fields such as EDMED were promoted and populated

  5. SeaDataNet Metadata Evolution • Problem is that whilst humans have the intelligence to read and understand plaintext it is of very limited use to a computer • Consider some example EDMED plaintext parameter descriptions from a knowledge management viewpoint: • A wide variety of chemical and biological parameters • CTD data • Amplitude de l'echo retrodiffuse • Cu, Zn, Fe, Pb, Cd, Cr, Ni in biota • MACR0-MEIOFAUNA,SED BIOCHEMISTRY,ZOOPLANKTON, CILIATES,BACT CELLS,BACT BIOMASS,LEUCINE UPT,PRIM. PROD,METABOL, COCCOLITH • Consequently, the chances of these data sets being discovered by conventional parameter searching is virtually zero

  6. SeaDataNet Metadata Evolution • Plaintext kills interoperability stone dead • In 2005 Taco de Bruin asked me to build Antarctic Portal DIFs from Dutch EDMED entries • DIF is a structured format representing each parameter thus: <Parameters> <Category>EARTH SCIENCE</Category> <Topic>Oceans</Topic> <Term>Salinity/Density</Term> <Variable>Salinity</Variable> </Parameters> • How does one derive this automatically from the plaintext ‘CTD data’, ‘T/S’, ‘temp+salin’, ’temperature and salinity’ and all the other variants in EDMED? • Needless to say, Taco is still waiting…..

  7. SeaDataNet Metadata Evolution • SeaDataNet has inherited thousands of EDMED dataset descriptions from SeaSearch • SeaDataNet wants to establish interoperability with other metadata repositories • The key to EDMED’s evolution to interoperability is to replace plaintext descriptions by keywords from controlled vocabularies

  8. Vocabularies • Controlled vocabularies featured in the legacy metadata inherited by SeaDataNet • However • Content governance was total anarchy • Decisions made by individuals – even students • Terms were set up and used with inadequate thought about their meaning and formal definitions were conspicuous by their absence • Technical governance wasn’t much better • No formal maintenance or versioning • Vocabularies delivered on an ad-hoc basis as CSV files on FTP servers or web sites • Data models differed from one vocabulary to the next

  9. Vocabularies • All this has now changed for SeaDataNet • Content governance • SeaDataNet internal vocabularies are governed by the Technical Task Team • Vocabularies with wider implications are governed by SeaDataNet and MarineXML Vocabulary Content Governance Group (SeaVoX) e-mail list • Technical governance • Vocabulary Server technology developed by NERC DataGrid adopted • Oracle back end with automated versioning and audit trail maintenance • Web Service API front end • Clients using the API are available • http://vocab.ndg.nerc.ac.uk/client/vocabServer.jsp (BODC) • http://seadatanet.maris2.nl/v_bodc_vocab/welcome.asp (Maris)

  10. Vocabularies • The importance of controlled vocabularies to SeaDataNet will increase as we stitch together disparate data sources for real • BODC client currently exposes 69 lists • Maris client exposes 27 of these that are particularly relevant to SeaDataNet • These numbers will grow as vocabulary harmonisation within SeaDataNet and between SeaDataNet and the wider community progresses

  11. Mappings and Ontologies • The availability of heavily populated, well-managed controlled vocabularies provide the basis for interoperability within SeaDataNet • However, this leaves important unaddressed issues: • Creating standard content from partner legacy systems • Applying new standards to SeaSearch legacy content • External interoperability with metadata formats such as DIF and ISO19115 profiles • The solution is ontology development

  12. Mappings and Ontologies • Basic bottom-up ontologies are simply sets of lists with the relationships between their entries • These are being built between legacy, internal and external lists • Draft maps produced between parameter vocabularies from BODC, GCMD and CF using SKOS relationships (5-6000 relationships) • These provide the basis for automatic creation of SeaDataNet and GCMD metadata from BODC and CF file aggregations

  13. Mappings and Ontologies • Ontology building won’t solve all SeaDataNet’s interoperability problems • The only cure for legacy plaintext is manual standardisation • Fortunately SeaDataNet has many partners to share this burden

  14. Some Conclusions • Plaintext is a destroyer of interoperability • Vocabularies need rigorous management if they are to effectively support interoperability • Multiple controlled vocabularies covering a common domain topic whilst undesirable is repairable by mapping/ontology building • Establishing interoperability in a scenario with legacy content is 10% inspiration and 90% perspiration

  15. That’s All Folks Thank you for your attention Any questions?

More Related