Publishing research information as Linked Data: Proposal of Recommendations. Miguel-Ángel Sicilia. EuroCRIS meeting. February 2012. ROADMAP. Introduction & Motivation Stakeholders Example Architecture Basic Principles of the LD Exposure CERIF Ontology
Publishing research information as Linked Data: Proposal of Recommendations. Miguel-Ángel Sicilia. EuroCRIS meeting. February 2012
ROADMAP • Introduction & Motivation • Stakeholders • Example Architecture • Basic Principles of the LD Exposure • CERIF Ontology • Recipes for the CERIF LD Exposure • CERIF Model Extension • Key Use Case • Demo • Bootstrapping • Issues and challenges • Conclusions
A point of departure? • “CERIF and Linked Data are similar, complementary approaches. However, there are significant differences in the way they encode relationships. EXRI-UK reviewed these approaches against higher education needs and recommended that CERIF should be the basis for the exchange of research information in the UK. CERIF is currently better able to encode the rich information required to communicate research information, and has the organisational backing of EuroCRIS, ensuring it is well-managed and sustainable.” • EXRI-UK final report, http://www.jiscinfonet.ac.uk/infokits/research/exri-uk
XML data interchange [Diagram: one RIS Database (CERIF) generates and sends XML; another RIS Database (CERIF) receives and parses it]
Limitations (from a linked data perspective) [Diagram, adapted from Christian Bizer: The Web of Linked Data (26/07/2009): four data sources A, B, C and D, each behind its own Web API, feeding an aggregator (harvester or query client)] Shortcomings • APIs provide proprietary interfaces (even though CERIF XML standardizes the interchange format). • Aggregators are based on a fixed set of data sources (not necessarily, but they require some registry of providers). • You cannot set hyperlinks between the descriptions of RIS entities (projects, people, organizations, publications), nor from them to other data or terminologies.
The linked data approach [Diagram, adapted from Christian Bizer: The Web of Linked Data (26/07/2009): RDF datasets A, B, C and D connected to each other, to DBpedia and to a terminology server by RDF links labelled rel and cls] • Use RDF to provide CERIF metadata based on the XML mapping. • Add links using different kinds of relations, rel (a mapping of CERIF link entities?). • Connect to terminologies using some classification, cls (an extension of keywords in CERIF?). • Link to other LOD datasets instead of repeating information.
Browsing & integrating [Diagram, adapted from Christian Bizer: The Web of Linked Data (26/07/2009): datasets A, B, C, D and E holding research information (RI) and terminologies (Term), connected by typed links and consumed by a browser and two data integrators] • Data integrator (combines information for a given cfPers, cfProj or cfOrgUnit) • Data integrator (combines information of several cfPers, cfProj or cfOrgUnit, e.g. for analyzing country or call outcomes)
A proposal • Higher Education Institutions (HEI) or R&D institutions • Funding bodies (FB) • Research Authorities (RA) • Researchers • Research Information Enterprises (RIE) • General public • Enterprises • …what are their critical use cases and their “killer apps”?
Strategies for publishing linked data • ALTERNATIVES FOR THE EXPOSURE OF LINKED DATA • Providing an endpoint for enquiries • Serving Static RDF Files • Serving RDF Embedded in HTML Files • Serving LD from RDF Triple Stores • Serving LD by wrapping Web APIs • Serving LD from Relational Databases • FACTORS AFFECTING THE DECISION • How much data do you want to serve? • How is your data currently stored? • How often does your data change?
RIS architecture [Diagram: a web browser at http://cris.myorganization.org showing a project's papers (for example “Linked Data - The Story So Far”), served by the RIS Application Server on top of the RIS Database (CERIF)]
RIS-LD architecture [Diagram: a web browser served by the RIS Application Server on top of the RIS Database (CERIF), with a D2R Server on the same database exposing RDF descriptions such as the one that follows] <http://cris.myOrganization.org:2020/resource/projects/Organic.Edunet> a cerif:Project ; rdfs:label "Multilingual Federation of Learning Repositories"@en-uk ; cerif:acronym "Organic.Edunet" ; cerif:endDate "2010-09-30"^^xsd:date ; cerif:internalIdentifier "ff808181300cf99e01300d1a355f0003" ; cerif:isLinkedByOrganisationUnit …
URI scheme published by D2R • http://cris.myorg.org/resource/RESOURCE_ID • LD Identifier for a given resource • http://cris.myorg.org/data/RESOURCE_ID • Resource description of a given resource in RDF (N3) • http://cris.myorg.org/page/RESOURCE_ID • Resource description of a given resource in HTML
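The three URI patterns above can be derived from one another mechanically. The following is a minimal Python sketch of that mapping, not D2R's actual implementation. The function names `d2r_views` and `choose_view` are hypothetical, and the assumption that the server picks the RDF or HTML document from the client's `Accept` header (common Linked Data practice) is not stated on the slide.

```python
from urllib.parse import urlsplit, urlunsplit

RESOURCE_PREFIX = "/resource/"
# Media types treated as "the client wants RDF" (an illustrative, incomplete list).
RDF_MEDIA_TYPES = ("text/rdf+n3", "text/turtle", "application/rdf+xml")


def d2r_views(resource_uri: str) -> dict:
    """Return the identifier, RDF document and HTML document URIs for a resource."""
    parts = urlsplit(resource_uri)
    if not parts.path.startswith(RESOURCE_PREFIX):
        raise ValueError("not a /resource/ URI: " + resource_uri)
    resource_id = parts.path[len(RESOURCE_PREFIX):]

    def with_segment(segment: str) -> str:
        # Swap the first path segment, keep scheme and host unchanged.
        return urlunsplit((parts.scheme, parts.netloc, f"/{segment}/{resource_id}", "", ""))

    return {
        "resource": resource_uri,
        "data": with_segment("data"),
        "page": with_segment("page"),
    }


def choose_view(resource_uri: str, accept_header: str) -> str:
    """Pick the document URI a client would be sent to (assumed Accept-based rule)."""
    views = d2r_views(resource_uri)
    wants_rdf = any(media_type in accept_header for media_type in RDF_MEDIA_TYPES)
    return views["data"] if wants_rdf else views["page"]
```

For example, `choose_view("http://cris.myorg.org/resource/projects/VOA3R", "text/html")` yields the `/page/` URI, while an `Accept` header naming an RDF media type yields the `/data/` URI.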
RIS-LD mashup [Diagram: a web browser at http://mashup.org, “Opening our CERIF datasets”, consuming RDF descriptions such as the one that follows] <http://cris.myOrganization.org:2020/resource/projects/Organic.Edunet> a cerif:Project ; rdfs:label "Multilingual Federation of Learning Repositories"@en-uk ; cerif:acronym "Organic.Edunet" ; cerif:endDate "2010-09-30"^^xsd:date ; cerif:internalIdentifier "ff808181300cf99e01300d1a355f0003" ; cerif:isLinkedByOrganisationUnit …
Benefits of our architecture • Exposure of Linked Data without altering the current research information system (non-intrusive) • Linked Data interface: RDF descriptions of individual resources stored in the DB, served over HTTP • SPARQL endpoint (the SQL of Linked Data) • Traditional HTML interface: web pages describing resources • Simple way of interchanging data on the Web • Third parties can create new applications using open linked data from RIS systems
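To make the SPARQL endpoint benefit concrete, here is a hedged Python sketch of how a third-party application might build a query request. The endpoint URL is hypothetical, the helper name `sparql_get_url` is invented for illustration, and the query only assumes the `cerif:Project` class and `cerif:acronym` property that appear in this deck's own examples. Passing the query text in a `query` parameter of a GET request follows the SPARQL protocol.

```python
from urllib.parse import urlencode

# Hypothetical endpoint location; the slides do not give the real one.
SPARQL_ENDPOINT = "http://cris.myorganization.org/sparql"

# List up to ten projects with their acronyms.
QUERY = """\
PREFIX cerif: <http://eurocris.org/cerif#>
SELECT ?project ?acronym WHERE {
  ?project a cerif:Project ;
           cerif:acronym ?acronym .
}
LIMIT 10"""


def sparql_get_url(endpoint: str, query: str) -> str:
    """Build the GET URL for a SPARQL query (query text goes in the 'query' parameter)."""
    return endpoint + "?" + urlencode({"query": query})


if __name__ == "__main__":
    # Prints the URL only; no network request is made in this sketch.
    print(sparql_get_url(SPARQL_ENDPOINT, QUERY))
```

A client would then fetch that URL with an `Accept` header naming the result format it wants, which depends on what the endpoint supports.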
General Principles of the LOD approach • Use URIs as names for things. • Use HTTP URIs so that people can look up those names. • When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL). • Include links to other URIs, so that they can discover more things.
Reusing well-known terms • We need an ontology for the CERIF model elements • "Do not reinvent the wheel" • Data can be consumed by applications that may be tuned to well-known vocabularies • Reuse fosters interoperability between different datasets
Self-described and consistent terms • Logical entities are translated into RDF classes and their attributes into RDF properties • CF prefixes are not necessary for ontology terms • Instead, use URI namespaces • Properties and classes are self-described • rdfs:label (title-case version of the property/class name) • rdfs:comment (a plain text description of the
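The rdfs:label rule above (a title-case version of the property or class name) can be sketched in a few lines of Python. This is one possible reading of the rule, not the authors' tooling, and the function name `label_for` is hypothetical.

```python
import re


def label_for(local_name: str) -> str:
    """Turn a camelCase or PascalCase term name into a title-case label."""
    # Insert a space wherever a lowercase letter or digit is followed by an uppercase letter.
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", local_name)
    # Capitalise the first letter of each word, leaving the rest untouched.
    return " ".join(word[:1].upper() + word[1:] for word in spaced.split())


if __name__ == "__main__":
    for name in ("internalIdentifier", "startDate", "OrganisationUnit"):
        print(name, "->", label_for(name))
```

Under this reading, `internalIdentifier` becomes `Internal Identifier` and `OrganisationUnit` becomes `Organisation Unit`.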
URI Design • Essential to enable interoperability and understanding • Create human-readable and memorable URIs • Avoid using artificial primary keys • Discover URIs using similarity heuristics • Follow a similar schema/pattern for URIs • http://cris.myorg.org/resource/ENTITYNAME/ENTITYID • Example of an identifier for the EU project “Virtual Open Access ...” hosted at University of Athens • http://cris.aua.gr/resource/projects/VOA3R
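The pattern `http://cris.myorg.org/resource/ENTITYNAME/ENTITYID` can be applied with a small helper. The Python sketch below is illustrative only: `mint_uri` is a hypothetical name, and percent-encoding the two path segments is an added assumption so that identifiers containing spaces or slashes still produce a valid URI.

```python
from urllib.parse import quote


def mint_uri(base: str, entity_name: str, entity_id: str) -> str:
    """Build a resource URI following the /resource/ENTITYNAME/ENTITYID pattern."""
    # safe="" makes quote() also encode "/" so each value stays a single path segment.
    name = quote(entity_name, safe="")
    ident = quote(entity_id, safe="")
    return f"{base.rstrip('/')}/resource/{name}/{ident}"


if __name__ == "__main__":
    print(mint_uri("http://cris.aua.gr", "projects", "VOA3R"))
```

Using a readable key such as the project acronym, rather than a database primary key, is what keeps the resulting URI memorable.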
Where do RI datasets live? • Higher education or R&D institutions maintain repositories centred on cfPers and cfOrgUnit (internal), sometimes cfProj, with an emphasis on cfResPubl and cfResPat. • Funding bodies are centred on cfProj, cfOrgUnit (mostly legal bodies, not internal) and cfFundProg and related entities. • Bibliographic and citation databases focus on cfResPubl and cfResPat, and in general provide poor support for cfPers and cfOrgUnit.
Distributed datasets • Research Information is distributed • Frequently, there is duplicated information in different RIS systems. • ID for the VOA3R Project in the University of Athens dataset* • http://cris.aua.gr/resource/projects/VOA3R • ID for the VOA3R Project in the University of Alcalá dataset* • http://cris.uah.es/resource/projects/VOA3R • No problem: the same concept can be identified by different URIs in Linked Data • Using the owl:sameAs predicate * Assuming that there is a corporate RIS available at http://cris.....
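As an illustration of the owl:sameAs approach, the Python sketch below writes the link between the two VOA3R identifiers as an N-Triples statement and shows one way a consumer could group co-referent URIs. The helper names are hypothetical, and the grouping step (a small union-find) is an assumption about how an integrator might use the links rather than something the slides prescribe.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triple(uri_a: str, uri_b: str) -> str:
    """Serialise an owl:sameAs link between two URIs as one N-Triples line."""
    return f"<{uri_a}> <{OWL_SAME_AS}> <{uri_b}> ."


def merge_aliases(pairs):
    """Group URIs connected (directly or transitively) by sameAs pairs."""
    parent = {}

    def find(uri):
        # Find the representative of uri's group, compressing the path as we go.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]
            uri = parent[uri]
        return uri

    for a, b in pairs:
        parent[find(a)] = find(b)

    groups = {}
    for uri in parent:
        groups.setdefault(find(uri), set()).add(uri)
    return list(groups.values())


if __name__ == "__main__":
    athens = "http://cris.aua.gr/resource/projects/VOA3R"
    alcala = "http://cris.uah.es/resource/projects/VOA3R"
    print(same_as_triple(athens, alcala))
    print(merge_aliases([(athens, alcala)]))
```

Each institution keeps minting its own URIs; only the link between them needs to be published.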
CERIF Ontology [Diagram: the CERIF Ontology (http://eurocris.org/cerif) alongside the CERIF Semantic Vocabulary (http://eurocris.org/semcerif) and other vocabularies]
EuroCRIS website for publishing ontologies. Current version at http://spi-fm.uca.es/neologism/
Visual representation of the ontologies. Current version at http://spi-fm.uca.es/neologism/cerif
Recipes for the CERIF LD Exposure MULTIPLE LANGUAGE FEATURES
Recipes for the CERIF LD Exposure SEMANTIC FEATURES
CERIF Semantics document: from a PDF document with the CERIF semantics…
CERIF Semantic vocabulary: …to an RDF vocabulary with the roles and classification terms. Current version at http://spi-fm.uca.es/neologism/semcerif
CERIF using external vocabularies • The predicates cerif:classification and cerif:role make it possible to use external vocabularies to enrich our data [Diagram: the CERIF Ontology pointing, through cerif:role and cerif:classification, to the CERIF Semantic Vocabulary, Other vocabulary 1, ..., Other vocabulary N]
Recipes for the CERIF LD Exposure ADDITIONAL FEATURES
CERIF additional features • The current CERIF model contains Dublin Core and Formalised Dublin Core entities and attributes. • We will use external vocabularies through the cerif:role and cerif:classification properties • This avoids the need to store and publish entities related to any terminology.
Recipes for the CERIF LD Exposure BASE ENTITIES
CERIF base entity PROJECT • Project Acronym (cfProj.cfAcro) will be part of the resource identifier (ID) • http://cris.myorganization.org/resource/projects/ID
PREFIXES used in examples • # Built-in prefixes • @prefix xsd: <http://www.w3.org/2001/XMLSchema#> . • @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> . • @prefix owl: <http://www.w3.org/2002/07/owl#> . • # External vocabularies • @prefix foaf: <http://xmlns.com/foaf/0.1/> . • @prefix dc: <http://purl.org/dc/elements/1.1/> . • @prefix dcterms: <http://purl.org/dc/terms/> . • @prefix bibo: <http://purl.org/ontology/bibo/> . • # CERIF • @prefix cerif: <http://eurocris.org/cerif#> . • @prefix semcerif: <http://eurocris.org/semcerif#> .
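A prefixed name such as `cerif:Project` is shorthand for the namespace IRI followed by the local name. The Python sketch below shows that expansion for a subset of the prefixes listed above; the function name `expand` is hypothetical and the table is copied from the slide rather than from any published registry.

```python
# Prefix table taken from the slide (subset).
PREFIXES = {
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "bibo": "http://purl.org/ontology/bibo/",
    "cerif": "http://eurocris.org/cerif#",
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name like 'cerif:Project' into its full IRI."""
    prefix, separator, local_name = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError("unknown or missing prefix in " + repr(prefixed_name))
    return PREFIXES[prefix] + local_name


if __name__ == "__main__":
    print(expand("cerif:Project"))
    print(expand("foaf:homepage"))
```

This is why two datasets that both use `foaf:homepage` are talking about the same property: after expansion the IRIs are identical.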
Description of a cerif PROJECT (I) <http://cris.myOrganization.org/resource/projects/VOA3R> a cerif:Project ; rdfs:label "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ; dc:title "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ; cerif:title "Repositorio de Agricultura y Acuicultura de acceso abierto virtual"@es-es , "Virtual Open Access Agriculture & Aquaculture Repository"@en-uk ; cerif:internalIdentifier "ff8080812ddb916a012ddb9170b60001" ; cerif:acronym "VOA3R" ;
Description of a cerif PROJECT (II) dcterms:abstract "The general objective of the VOA3R project is to improve the spread of European agriculture and aquaculture research results by using an innovative approach to sharing open access research products."@en-uk ; cerif:abstract "The general objective of the VOA3R project is to improve the spread of European agriculture and aquaculture research results by using an innovative approach to sharing open access research products."@en-uk ; foaf:homepage <http://voa3r.eu/> ; cerif:uri <http://voa3r.eu/> ; cerif:startDate "2010-06-01"^^xsd:date ; cerif:endDate "2013-05-31"^^xsd:date ;
CERIF base entity ORGANISATION UNIT • Organisation Acronym (cfOrgUnit.cfAcro) will be part of the resource identifier (ID) • http://cris.myorganization.org/resource/organisationUnits/ID