
Controlled Vocabularies and SMC 4 LRT Semantic Mapping in CMDI

2013-05-17 - Utrecht. Matej Ďurčo, ICLTT, Vienna. Controlled Vocabularies and SMC 4 LRT Semantic Mapping in CMDI. Activities: CLARIN taskforce – within SCCTC, building on CLAVAS - Vocabulary Alignment Service for CLARIN


Presentation Transcript


  1. 2013-05-17 - Utrecht. Matej Ďurčo, ICLTT, Vienna. Controlled Vocabularies and SMC4LRT Semantic Mapping in CMDI

  2. Context
     Activities:
     • CLARIN taskforce – within SCCTC, building on CLAVAS - Vocabulary Alignment Service for CLARIN
     • DARIAH joint taskforce VCC1/Task 5: Data federation and interoperability and VCC3/Task 3: Reference Data Registries (and external partners). Goal: establish a service providing controlled vocabularies and reference data for the DARIAH (and CLARIN) community.
     • SMC – Semantic Mapping Component, a module in the CMD-Infrastructure. Goal: "semantic search" = enhance the search in the heterogeneous data collection (of CMDI)
       a) by exploiting the shared data categories (SMC on schema level)
       b) by expressing the data in RDF (SMC on instance level)

  3. Context II - CLARIN-AT
     • CCV – CLARIN Center Vienna
       CenterProfile CMD record: http://clarin.aac.ac.at/ccv/index.html
       expected ready by: 2013-06
     Infrastructure services:
     • CLARIN Metadata Repository
     • SMC – Semantic Mapping Component
     • SMC-Browser
     • Controlled Vocabularies: engagement in CLARIN + DARIAH task forces

  4. Old vision: conceptualization sketch from 2009

  5. Potential usages for CV
     • Metadata Generation, Curation
     • Data-Enrichment / Annotation
     • Data Analysis
     • Search (Query Expansion, autocomplete, facets etc.)
     • needed for CMD2RDF
       - provide identifiers for entities
       - provide equivalencies between concepts/entities from different vocabularies (concept schemes)? E.g. like the equivalencies in Wikipedia (page for Johann Wolfgang Goethe): GND: 118540238 | LCCN: n79003362 | NDL: 00441109 | VIAF: 24602065
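The equivalence idea on this slide can be sketched in a few lines of Python. This is only an illustration, not part of CLAVAS or any CLARIN service: the `EQUIVALENCE_SETS` table and the `equivalents` helper are hypothetical names, and the only data used are the four Goethe identifiers quoted on the slide.

```python
# Hypothetical in-memory equivalence table (an assumption for illustration).
# Each set groups (scheme, identifier) pairs that denote the same entity.
# The identifiers are the ones quoted on the slide for Johann Wolfgang Goethe.
EQUIVALENCE_SETS = [
    {
        ("GND", "118540238"),
        ("LCCN", "n79003362"),
        ("NDL", "00441109"),
        ("VIAF", "24602065"),
    },
]


def equivalents(scheme, identifier):
    """Return {scheme: identifier} for all other identifiers of the same entity."""
    key = (scheme, identifier)
    for cluster in EQUIVALENCE_SETS:
        if key in cluster:
            # Drop the identifier we started from, keep the rest.
            return dict(sorted(cluster - {key}))
    return {}
```

A call such as `equivalents("GND", "118540238")` returns the LCCN, NDL and VIAF identifiers from the same set; an unknown identifier yields an empty dictionary.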

  6. Related Activities
     • DARIAH Schema Registry + Crosswalk Registry
     • LT-World @DFKI: full-blown ontology with People, Projects, Organisations, Events, LR; integration would have to happen at another level (RDF/LOD).
     • CoNE – Control of Named Entities @MPDL/eSciDoc: http://colab.mpdl.mpg.de/mediawiki/Control_of_Named_Entities
     • EATS - Entity Authority Tool Set @New Zealand Electronic Text Centre (NZETC): http://eats.readthedocs.org/en/latest/
     • TextGrid
     • http://www.dnb.de/DE/Standardisierung/LinksAFS/linksafs_node.html
     • FRBR - Functional Requirements for Bibliographic Records
       RDA - Resource Description and Access
       http://metadaten-twr.org/ - Technology Watch Report: Standards in Metadata and Interoperability (last entry from 2011)

  7. Candidate Vocabularies
     • Data Categories / Concepts - ISOcat
     • Languages - ISO-639
     • Countries - country codes
     • Persons - GND, VIAF, dbpedia?
     • Organizations - GND, VIAF, dbpedia?
     • Subjects (Schlagwörter) - GND, LCSH
     • ResourceTypology -
     • Tagsets!? (with mappings between tags)
     AAT - Art & Architecture Thesaurus
     GND - Gemeinsame Normdatei (DNB)
     GTAA - Gemeenschappelijke Thesaurus Audiovisuele Archieven (Common Thesaurus [for] Audiovisual Archives)
     VIAF - Virtual International Authority File

  8. ISOcat and CLAVAS
     • export closed + simple DCs (perhaps even better to manually select)
     • Third party applications use
       - ISOcat for explain() function
       - CLAVAS for value(/entity)-lists

  9. Informed query input
     Information about available data categories and the values for those categories can be used as the base for a complex query-input widget with context-sensitive autocomplete. However, this serves rather only as a fallback to autocomplete based on actual data.
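The fallback order described on this slide (actual data first, vocabulary second) could look roughly like the following Python sketch. The function name `suggest`, the two dictionaries and their sample values are assumptions made up for illustration; they do not come from the presentation or from any CLARIN component.

```python
def suggest(category, prefix, data_values, vocabulary_values, limit=5):
    """Suggest completions for `prefix` within one data category.

    Values seen in the actual data are preferred; the controlled
    vocabulary is consulted only when the data yields no match.
    """
    needle = prefix.casefold()

    def matching(values):
        # Case-insensitive prefix match, de-duplicated and sorted.
        return sorted(v for v in set(values) if v.casefold().startswith(needle))

    hits = matching(data_values.get(category, ()))
    if not hits:
        hits = matching(vocabulary_values.get(category, ()))
    return hits[:limit]


# Hypothetical sample values, keyed by data category name.
DATA_VALUES = {"languageName": ["German", "Georgian", "Dutch"]}
VOCABULARY_VALUES = {"languageName": ["German", "Greek", "Dutch", "Danish"]}
```

With these sample values, the prefix "ge" is answered from the data alone, while "da" has no match in the data and is answered from the vocabulary.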

  10. CMD → RDF
      • Semantic Mapping on instance level: express MD records in RDF (for LOD) => bind also values in MD fields to concepts
      • Modelling aspects
      • CMD Specification
      • Data Categories
      • CMD instances:
        - Identifier, Provenance, Hierarchy,
        - Components, Elements,
        - Values, Literal Values, Mapping to entities – Vocabularies => CLAVAS
      • Ontological Relations
      used namespaces

  11. Approach – Individuals/Instance Level
      One step when (pre)processing incoming new MD-sets:
      • Express MD-Records as RDF-triples: <#mdrecord #property “string-value”>
      • Identify potential target Domain Ontologies/Vocabularies
      • Create inverted Index: label → entity
      • Define lookup function: lookup(category, string-value) → <external-entity, measure>
      • Enrich dataset with new facts: <#mdrecord #property #external-entity>
      • Property-values of Metadata-Records are linked to individuals of domain ontologies
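The preprocessing step above can be illustrated with a small, self-contained Python sketch. It is an assumption-laden illustration, not the SMC implementation: the vocabulary content, the `ex:` identifiers, the helper names and the choice of `difflib.SequenceMatcher` as the similarity measure are all hypothetical. The slide only prescribes the shape `lookup(category, string-value) → <external-entity, measure>`.

```python
from difflib import SequenceMatcher

# Hypothetical vocabulary: category -> {entity identifier: [labels]}.
VOCABULARY = {
    "organisation": {
        "ex:org/example-language-institute": ["Example Language Institute", "ELI"],
        "ex:org/sample-text-archive": ["Sample Text Archive"],
    },
}


def build_inverted_index(vocabulary):
    """Create the inverted index: (category, normalised label) -> entity."""
    index = {}
    for category, entities in vocabulary.items():
        for entity, labels in entities.items():
            for label in labels:
                index[(category, label.strip().lower())] = entity
    return index


def lookup(index, category, value, threshold=0.8):
    """Return (external_entity, measure) for the best label match, or None."""
    needle = value.strip().lower()
    best = None
    for (cat, label), entity in index.items():
        if cat != category:
            continue
        # Similarity in [0, 1]; 1.0 means the strings are identical.
        measure = SequenceMatcher(None, needle, label).ratio()
        if best is None or measure > best[1]:
            best = (entity, measure)
    return best if best and best[1] >= threshold else None


def enrich(triples, index, property_to_category):
    """Add <record, property, entity> facts next to the literal triples."""
    new_facts = []
    for record, prop, value in triples:
        category = property_to_category.get(prop)
        if category is None:
            continue  # no target vocabulary identified for this property
        match = lookup(index, category, value)
        if match:
            new_facts.append((record, prop, match[0]))
    return triples + new_facts
```

The original literal triples are kept and the entity links are appended as new facts, so the enrichment stays reversible. The threshold of 0.8 is an arbitrary choice for the sketch; a real service would have to tune it per category.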

  12. Candidate Categories/Properties
      • ResourceType, Format, AnnotationLevelType → map to: isocat-DataCategories (Profiles: Metadata, Morphosyntax, ...)
      • Genre, Topic, Subject → map to: Taxonomies, Library Classification systems (LCSH, DDC, Dornseiff, ...)
      • Project, Institution, Person, Publisher: open controlled vocabularies (real entities) → map to: CLAVAS-organisations, LT-World (perhaps others: LCCN, DBPedia?)

  13. Next Steps
      • Install current OpenSKOS at CCV – CLARIN Center Vienna
      • synchronize 3 current datasets via OAI-PMH with sister instance at Meertens, also to test the synchronization process (and implications)
      • CMD2RDF
      • "special groups vocabularies" in CLARIN-AT
        • Plant names
        • Instruments

  14. Appendix: Explanations to SMC and CMDI

  15. Semantic Mapping (schema level) - concept
      • metadata fields in (completely) different profiles, but bound to (the same) data categories (ConceptLinks)
      • use this linkage when searching in the data, i.e. allow the user to search
        • "in the data category"
        • in a MD field but also all related fields from other profiles
      • Multiple mapping levels:
        • just mapping based on the ConceptLink resolvable via Component Registry: different elements pointing to the same DatCat
        • use equivalence relations between DatCats from Relation Registry
        • use equivalence relations also between Container DatCats
        • use also other relations in Relation Registry (subClassOf, almostSameAs, …)
        • apply selected (user defined) relation sets from Relation Registry
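The mapping levels listed above amount to widening a set of data categories by following selected relation types. A minimal Python sketch of that idea follows. The relation triples are invented for illustration (the presentation does not state which relations the Relation Registry actually holds between these categories), and `related_categories` is a hypothetical helper. All relations are treated as symmetric here, which is a simplification that would not be appropriate for directed relations such as subClassOf.

```python
from collections import deque

# Hypothetical relation set: (data category, relation type, data category).
# The category identifiers appear in the slides; the relations are made up.
RELATIONS = [
    ("isocat:DC-2484", "sameAs", "dct:language"),
    ("isocat:DC-2482", "almostSameAs", "isocat:DC-2484"),
]


def related_categories(start, relations, allowed=("sameAs",)):
    """Collect all categories reachable from `start` via allowed relation types."""
    neighbours = {}
    for source, relation, target in relations:
        if relation in allowed:
            neighbours.setdefault(source, set()).add(target)
            neighbours.setdefault(target, set()).add(source)

    # Breadth-first traversal, so chains of relations are followed too.
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbours.get(current, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen
```

Passing a stricter or looser `allowed` tuple corresponds to moving between the mapping levels: with the default only equivalences are followed, while `allowed=("sameAs", "almostSameAs")` also pulls in loosely related categories.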

  16. CMDI linking
      • components and elements in CMD profiles are bound to data categories
      • the CMD records reference their profiles
      • in Relation Registry data categories are related to each other in separate (possibly overlapping/contradicting) relation sets

  17. Semantic Mapping Component
      • separate CMDI module
      • relies on information from Component Registry, DCR, Relation Registry
      • is used by Metadata Repository / Service / Browser
      • Task: resolution: dcrIndex ↔ cmdIndex
        dcrIndex :: (abstract) data category defined in DCR
        cmdIndex :: path to a field in a MDRecord
      • different from:
        - query expansion: CQL(datcat) → CQL(cmdIndex[])
        - query translation: e.g. CQL → XPath
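To make the distinction between resolution and query expansion concrete, here is a hedged Python sketch. The mapping table is a small hand-copied subset of the paths listed in the examples of this presentation; the function names and the exact shape of the generated query string are assumptions, not the SMC interface. In particular, the output is only a CQL-like string and has not been checked against a real CQL parser.

```python
# Hypothetical excerpt of the dcrIndex -> cmdIndex mapping, using a few of
# the paths listed in the presentation's examples.
CMD_INDEX = {
    "isocat:DC-2544": [
        "collection.GeneralInfo.Name",
        "Session.Name",
        "imdi-corpus.Corpus.Name",
    ],
    "isocat:DC-2482": [
        "Session.Resources.WrittenResource.LanguageId",
    ],
}


def resolve(dcr_index):
    """Resolution dcrIndex -> cmdIndex[]: all field paths bound to a data category."""
    return list(CMD_INDEX.get(dcr_index, []))


def reverse_resolve(cmd_index):
    """Resolution cmdIndex -> dcrIndex[]: data categories a field path is bound to."""
    return sorted(dcr for dcr, paths in CMD_INDEX.items() if cmd_index in paths)


def expand_query(dcr_index, relation, term):
    """Query expansion built on top of resolution: one clause per field path."""
    paths = resolve(dcr_index)
    if not paths:
        raise KeyError(f"no CMD fields known for {dcr_index}")
    # Escape backslashes and double quotes inside the quoted search term.
    quoted = term.replace("\\", "\\\\").replace('"', '\\"')
    return " OR ".join(f'{path} {relation} "{quoted}"' for path in paths)
```

Resolution only answers which fields belong to a data category and vice versa; expansion is a separate step that uses the answer to rewrite a query. Translating the expanded query into XPath would be a third, independent step and is not sketched here.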

  18. Examples of DCR use in CMD metadata
      resourceName isocat:DC-2544
      • CorpusProfile.Corpus.Metadata.Name
      • CorpusProfile.Corpus.SourceList.Source.Name
      • collection.GeneralInfo.Name
      • Session.Name
      • imdi-corpus.Corpus.Name
      • ToolService.GeneralInfo.Name
      • GTRP.Collection.GeneralInfo.Name
      • DIDDD.Collection.GeneralInfo.Name
      • Soundbites.Collection.GeneralInfo.Name
      • DynaSAND.Collection.GeneralInfo.Name
      BUT: CMD Element: "Name"
      • http://www.isocat.org/datcat/DC-2544
      • http://www.isocat.org/datcat/DC-2536
      • http://www.isocat.org/datcat/DC-4160
      • http://www.isocat.org/datcat/DC-4176
      • http://www.isocat.org/datcat/DC-4180
      • http://purl.org/dc/elements/1.1/rights
      • http://purl.org/dc/elements/1.1/contributor
      • http://www.isocat.org/datcat/DC-2454
      • http://www.isocat.org/datcat/DC-2557
      • …

  19. Examples of DCR use in CMD metadata II
      languageID isocat:DC-2482
      • LrtInventoryResource.LrtCommon.Languages.ISO639.iso-639-3-code
      • Session.MDGroup.Content.Content_Languages.Content_Language.Id
      • Session.MDGroup.Actors.Actor.Actor_Languages.Actor_Language.Id
      • Session.Resources.WrittenResource.LanguageId
      • ToolService.Documentation.DocumentationLanguages.Language.ISO639.iso-639-3-code
      • ToolService.Tool.Documentation.DocumentationLanguages.Language.ISO639.iso-639-3-code
      • GTRP.Collection.DocumentationLanguages.Language.ISO639.iso-639-3-code
      • DIDDD.Collection.DocumentationLanguages.Language.ISO639.iso-639-3-code
      • DynaSAND.Collection.DocumentationLanguages.Language.ISO639.iso-639-3-code
      languageName isocat:DC-2484
      • ToolService.Documentation.DocumentationLanguages.Language.LanguageName
      • ToolService.Tool.Documentation.DocumentationLanguages.Language.LanguageName
      • GTRP.Collection.DocumentationLanguages.Language.LanguageName
      • DIDDD.Collection.DocumentationLanguages.Language.LanguageName
      • DynaSAND.Collection.DocumentationLanguages.Language.LanguageName
      dct:language
      • OLAC-DcmiTerms.language
      metadataLanguage isocat:DC-2543
      • CorpusProfile.Corpus.Metadata
      dominantLanguage isocat:DC-2468
      • Session.MDGroup.Content.Content_Languages.Content_Language.Dominant
      sourceLanguage isocat:DC-2494
      • Session.MDGroup.Content.Content_Languages.Content_Language.SourceLanguage
      targetLanguage isocat:DC-2499
      • Session.MDGroup.Content.Content_Languages.Content_Language.TargetLanguage
      implementationLanguage isocat:DC-3798
      • ToolService.Tool.Implementation.implementationLanguage

  20. As of 2012-05: DCR usage in Component Registry; Components structure

  21. SMC Browser: Explore the Component Metadata Framework
      Profile specifications from Component Registry visualized as interactive graphs
      statistics (about reuse of Components)
      http://clarin.aac.ac.at/smc-browser/
      TODO
      • feed with statistics of the instance data
      • add relations from RELcat
      • add operations on graphs (intersection, difference, …)

  22. SMC Browser: Explore the Component Metadata Framework
