
Scientific RDF Databases

Michael Mertens

K.U.Leuven

Outline
  • Introduction to RDF
  • RDF Databases
  • Advantages for scientific R&D
  • In practice
  • Criticism

Introduction

RDF: Resource Description Framework

  • Originally: metadata data model
  • Now: general method for the conceptual description of web resources (Semantic Web)
>Semantic Web

  • Traditional Web in 2009:
  • Sharing documents
  • URL as retrieval mechanism
  • HTML standard format
  • Hypertext links

Image taken from “The Emerging Web of Linked Data”, Chris Bizer

  • Data on the web
    • HTML describes documents and links between them
    • Semantic web:
      • Publish data in RDF, OWL, XML, ..
      • Describe arbitrary things: people, books, events, ..
      • Link between these concepts
      • Machine-readable, web-accessible databases
> Semantic Web > Linked Data

  • Tim Berners-Lee: Linked Data
  • Connected structured data
  • 3 simple principles:
    • Use URLs to name conceptual things
    • Looking up a URL returns useful data about that thing
    • The returned data contains relationships that link to other URLs
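
The three principles can be sketched with a tiny in-memory stand-in for the web. This is only an illustration: the `example.org` URLs, the `WEB` table and both function names are assumptions made for this sketch, not part of any real dataset or library.

```python
# Hypothetical in-memory "web": each URL dereferences to a description
# whose property values may themselves be URLs (all names are made up).
WEB = {
    "http://example.org/id/london": {
        "label": "London",
        "country": "http://example.org/id/uk",
    },
    "http://example.org/id/uk": {
        "label": "United Kingdom",
        "capital": "http://example.org/id/london",
    },
}


def dereference(url):
    """Principle 2: looking up a URL returns data about that thing."""
    return WEB.get(url, {})


def follow(url, prop):
    """Principle 3: a property value that is a URL can be looked up in turn."""
    target = dereference(url).get(prop)
    return dereference(target) if target in WEB else None


country = follow("http://example.org/id/london", "country")
print(country["label"])  # United Kingdom
```

A plain literal such as the `label` value is not a link, so `follow` returns `None` for it.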
> Semantic Web > Linked Data > Example

  • Before: Scientific data usually not shared
  • Pharmaceutical Drug Discovery
    • A lot of spread out data
      • DrugBank, ClinicalTrials.gov, Health Care and Life Science
    • Genomics data, Protein data, ..
  • A question nobody examined before:

“What Proteins are involved in signal transduction AND are related to pyramidal neurons?”

Example taken from “Tim Berners-Lee on the next Web”


  • The web: 223,000 hits, 0 results

  • Linked Data: 32 hits, 32 results

DRD1, 1812 adenylate cyclase activation
ADRB2, 154 adenylate cyclase activation
ADRB2, 154 arrestin mediated desensitization of G-protein coupled …
DRD1IP, 50632 dopamine receptor signaling pathway
DRD1, 1812 dopamine receptor, adenylate cyclase activating pathway
DRD2, 1813 dopamine receptor, adenylate cyclase inhibiting pathway
GRM7, 2917 G-protein coupled receptor protein signaling pathway
GNG3, 2785 G-protein coupled receptor protein signaling pathway
GNG12, 55970 G-protein coupled receptor protein signaling pathway
DRD2, 1813 G-protein coupled receptor protein signaling pathway
ADRB2, 154 G-protein coupled receptor protein signaling pathway
CALM3, 808 G-protein coupled receptor protein signaling pathway
HTR2A, 3356 G-protein coupled receptor protein signaling pathway
DRD1, 1812 G-protein signaling, coupled to cyclic nucleotide second…
SSTR5, 6755 G-protein signaling, coupled to cyclic nucleotide second…
MTNR1A, 4543 G-protein signaling, coupled to cyclic nucleotide …
HTR6, 3362 G-protein signaling, coupled to cyclic nucleotide second …
GRIK2, 2898 glutamate signaling pathway
GRIN1, 2902 glutamate signaling pathway
GRIN2A, 2903 glutamate signaling pathway
GRIN2B, 2904 glutamate signaling pathway
ADAM10, 102 integrin-mediated signaling pathway
GRM7, 2917 negative regulation of adenylate cyclase activity
LRP1, 4035 negative regulation of Wnt receptor signaling pathway
ADAM10, 102 Notch receptor processing
ASCL1, 429 Notch signaling pathway
HTR2A, 3356 serotonin receptor signaling pathway
ADRB2, 154 transmembrane receptor protein tyrosine kinase …
PTPRG, 5793 transmembrane receptor protein tyrosine kinase …
EPHA4, 2043 transmembrane receptor protein tyrosine kinase …
NRTN, 4902 transmembrane receptor protein tyrosine kinase …
CTNND1, 1500 Wnt receptor signaling pathway


PREFIX go: <http://purl.org/obo/owl/GO#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX mesh: <http://purl.org/commons/record/mesh/>
# The sc: and ro: prefixes used below are not declared on the slide.

SELECT ?genename ?processname
WHERE {
  # Papers annotated with the MeSH term, and the genes they mention
  GRAPH <http://purl.org/commons/hcls/pubmesh> {
    ?paper ?p mesh:D017966 .
    ?article sc:identified_by_pmid ?paper .
    ?gene sc:describes_gene_or_gene_product_mentioned_by ?article .
  }
  # Proteins whose function is realized as some process
  GRAPH <http://purl.org/commons/hcls/goa> {
    ?protein rdfs:subClassOf ?res .
    ?res owl:onProperty ro:has_function .
    ?res owl:someValuesFrom ?res2 .
    ?res2 owl:onProperty ro:realized_as .
    ?res2 owl:someValuesFrom ?process .
    # The process is a part of, or a subclass of, GO_0007166
    GRAPH <http://purl.org/commons/hcls/20070416/classrelations> {
      { ?process <http://purl.org/obo/owl/obo#part_of> go:GO_0007166 }
      UNION
      { ?process rdfs:subClassOf go:GO_0007166 }
    }
    ?protein rdfs:subClassOf ?parent .
    ?parent owl:equivalentClass ?res3 .
    ?res3 owl:hasValue ?gene .
  }
  # Human-readable labels for the result columns
  GRAPH <http://purl.org/commons/hcls/gene> {
    ?gene rdfs:label ?genename
  }
  GRAPH <http://purl.org/commons/hcls/20070416> {
    ?process rdfs:label ?processname
  }
}

Related to Pyramidal Neurons

Part of Signal Transduction

Used 4 sources

> Semantic Web > Linked Data

  • What do we need?
    • Identifiers: URIs
    • Linking mechanism: HTTP
    • Vocabulary: Web Ontology Language (OWL)
    • Serialization: RDF/XML

  • Identifiers: URIs
    • Use of HTTP URL
    • Link to “Resources”
    • Possibly many documents per resource
    • Shift to non-information resources:

Resource: http://dbpedia.org/resource/London

HTML: http://dbpedia.org/page/London

RDF/XML: http://dbpedia.org/data/London.rdf

N-Triples: http://dbpedia.org/data/London.ntriples
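
The "many documents per resource" idea can be sketched as a lookup from a requested media type to a document URL. The URL patterns follow the DBpedia examples above; the media-type table and the function name `document_for` are assumptions made for this sketch and do not describe how DBpedia is actually implemented.

```python
# Document URLs follow the DBpedia pattern shown on the slide; the media-type
# table and the function name are assumptions made for this sketch.
DOCUMENTS = {
    "text/html": "http://dbpedia.org/page/{name}",
    "application/rdf+xml": "http://dbpedia.org/data/{name}.rdf",
    "application/n-triples": "http://dbpedia.org/data/{name}.ntriples",
}


def document_for(resource_uri, accept):
    """Pick the document that describes a resource for a requested media type."""
    name = resource_uri.rsplit("/", 1)[-1]
    # Unknown media types fall back to the human-readable HTML page.
    template = DOCUMENTS.get(accept, DOCUMENTS["text/html"])
    return template.format(name=name)


print(document_for("http://dbpedia.org/resource/London", "application/rdf+xml"))
```

The resource URI itself names the city (a non-information resource); only the documents about it have a concrete format.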


  • Linking mechanism: HTTP
    • Accessible through generic data browsers
    • Can be crawled by search engines
    • Connects different sources
    • In contrast, each Web API exposes its own interface

  • Vocabulary: Web Ontology Language (OWL)
    • Knowledge representation language
    • Designed to be interpreted by computers
    • Describes data in terms of classes, individuals (instances of classes) and property assertions (relationships)

<owl:Class rdf:ID="Money">
  <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
</owl:Class>

<owl:DatatypeProperty rdf:ID="currency">
  <rdfs:domain rdf:resource="#Money"/>
  <rdfs:range rdf:resource="http://www.w3.org/2001/XMLSchema#string"/>
</owl:DatatypeProperty>

    • URIs that identify the same thing are linked with ‘owl:sameAs’
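
One way a consumer might use ‘owl:sameAs’ links is to group equivalent URIs and pick one representative per group. The sketch below does this with a small union-find; the `example.org` URIs and the function name `canonical_map` are invented for illustration, and real reasoners handle ‘owl:sameAs’ in more sophisticated ways.

```python
# Hypothetical owl:sameAs statements between URIs from different sources.
SAME_AS = [
    ("http://example.org/a/London", "http://example.org/b/london"),
    ("http://example.org/b/london", "http://example.org/c/Q84"),
]


def canonical_map(pairs):
    """Map every URI to one representative of its owl:sameAs group."""
    parent = {}

    def find(uri):
        # Walk up until we reach a URI that is its own parent.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            uri = parent[uri]
        return uri

    for left, right in pairs:
        root_left, root_right = find(left), find(right)
        if root_left != root_right:
            # Keep the lexicographically smaller URI as the representative.
            parent[max(root_left, root_right)] = min(root_left, root_right)
    return {uri: find(uri) for uri in parent}


mapping = canonical_map(SAME_AS)
print(mapping["http://example.org/c/Q84"])  # http://example.org/a/London
```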

RDF: Resource Description Framework

  • Based on triples
    • Subject, predicate, object
  • Resources identified by URI
  • URIs can be dereferenced to look up RDF information
  • RDF information links to other URIs

< http://dbpedia.org/resource/London,

http://dbpedia.org/ontology/country,

http://dbpedia.org/resource/United_Kingdom >
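
A triple maps naturally onto a three-field record. The sketch below stores the London triple from the slide, plus one extra triple added purely for illustration, and matches them against a pattern in which `None` is a wildcard. The names `Triple`, `match`, `DBR` and `DBO` are choices made for this sketch.

```python
from collections import namedtuple

Triple = namedtuple("Triple", "subject predicate object")

DBR = "http://dbpedia.org/resource/"
DBO = "http://dbpedia.org/ontology/"

graph = {
    Triple(DBR + "London", DBO + "country", DBR + "United_Kingdom"),
    # Second triple is added for illustration only; it is not on the slide.
    Triple(DBR + "Leuven", DBO + "country", DBR + "Belgium"),
}


def match(graph, subject=None, predicate=None, object=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    pattern = (subject, predicate, object)
    return sorted(t for t in graph
                  if all(p is None or p == v for p, v in zip(pattern, t)))


for t in match(graph, predicate=DBO + "country"):
    print(t.subject, "->", t.object)
```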

This looks a lot like XML.

Why don’t we just use XML?


RDF vs XML

RDF: <Page, author, Name>

XML (the same statement has many possible encodings):

<document href="Page">
  <author>Name</author>
</document>

<document>
  <details>
    <uri>Page</uri>
    <author>Name</author>
  </details>
</document>

<author>
  <uri>Page</uri>
  <name>Name</name>
</author>

...


RDF: Serialization

  • RDF/XML: the XML syntax standardized by the W3C
  • N3 or Turtle: designed for human readability

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
  </rdf:Description>
</rdf:RDF>

@prefix dc: <http://purl.org/dc/elements/1.1/>.

<http://en.wikipedia.org/wiki/Tony_Benn>
  dc:title "Tony Benn";
  dc:publisher "Wikipedia".
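
Both serializations encode the same two triples. As a rough check, the sketch below pulls the triples out of the RDF/XML document using only Python's standard XML parser. It handles just the simple shape used in this example (`rdf:Description` with literal-valued children) and is not a full RDF/XML parser; the function name `triples_from_rdfxml` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

RDF_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://en.wikipedia.org/wiki/Tony_Benn">
    <dc:title>Tony Benn</dc:title>
    <dc:publisher>Wikipedia</dc:publisher>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Extract (subject, predicate, literal) triples from simple RDF/XML."""
    root = ET.fromstring(text)
    triples = []
    for desc in root.findall("{%s}Description" % RDF_NS):
        subject = desc.get("{%s}about" % RDF_NS)
        for prop in desc:
            # ElementTree spells tags as "{namespace}local"; joining the two
            # parts gives the predicate URI.
            namespace, local = prop.tag[1:].split("}")
            triples.append((subject, namespace + local, prop.text))
    return triples


for triple in triples_from_rdfxml(RDF_XML):
    print(triple)
```

The two printed predicates, `http://purl.org/dc/elements/1.1/title` and `http://purl.org/dc/elements/1.1/publisher`, are exactly what `dc:title` and `dc:publisher` expand to in the Turtle version.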


RDF Databases

  • Also called “Triple Store”
  • Data in the form of triples:

Subject – predicate – object

  • Dominant query language: SPARQL

PREFIX abc: <nul://sparql/exampleOntology#>

SELECT ?capital ?country
WHERE {
  ?x abc:cityname ?capital ;
     abc:isCapitalOf ?y .
  ?y abc:countryname ?country ;
     abc:isInContinent abc:Africa .
}
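
To show what a triple store does with such a query, here is a minimal sketch of matching the four triple patterns against a handful of triples. The data set, the node names (`city1`, `country1`, ...) and the `solve` function are invented for this sketch; real SPARQL engines use indexes and query planning rather than this brute-force scan.

```python
# Toy data using the slide's abc: vocabulary; the facts below are a small
# hand-made sample, and the node identifiers are hypothetical.
ABC = "nul://sparql/exampleOntology#"
TRIPLES = [
    ("city1", ABC + "cityname", "Nairobi"),
    ("city1", ABC + "isCapitalOf", "country1"),
    ("country1", ABC + "countryname", "Kenya"),
    ("country1", ABC + "isInContinent", ABC + "Africa"),
    ("city2", ABC + "cityname", "Brussels"),
    ("city2", ABC + "isCapitalOf", "country2"),
    ("country2", ABC + "countryname", "Belgium"),
    ("country2", ABC + "isInContinent", ABC + "Europe"),
]


def is_var(term):
    return term.startswith("?")


def solve(patterns, binding=None):
    """Yield every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in TRIPLES:
        candidate = dict(binding)
        for term, value in zip(first, triple):
            if is_var(term):
                # Bind the variable, or reject if it is already bound differently.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, candidate)


QUERY = [
    ("?x", ABC + "cityname", "?capital"),
    ("?x", ABC + "isCapitalOf", "?y"),
    ("?y", ABC + "countryname", "?country"),
    ("?y", ABC + "isInContinent", ABC + "Africa"),
]

for row in solve(QUERY):
    print(row["?capital"], row["?country"])  # Nairobi Kenya
```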


  • Built on W3C’s “Linked Data”
  • A subset of “graph databases”
  • Nodes (entities), edges (relationships), properties
  • Directed, labeled graph structure (predicate URI as the edge label)


Graph View

Image taken from w3.org


  • The only standardized NoSQL database model
  • In contrast to a conventional RDBMS:
    • Very flexible data model
      • Does not require a fixed table schema
    • Information stored as its most basic building blocks (triples)
      • Enables improvements in data-intensive operations
  • Examples: eBay, Facebook, Digg, ..

  • Scalable: Distributed design
  • Self-Documenting Data
    • Vocabulary defined in OWL or RDFS
    • Allows multiple schemata
  • Open
    • Discover new data sources at run-time
  • Often weak consistency guarantees
    • Solved with additional middleware

Limitations of Relational Databases:

  • Not directly visible to web-agents
  • Primary-foreign key relationships
    • Meaning is implicit, unspecified semantics
  • No relationships across separate databases
  • Parent-child relationships are not natural
    • “Self-joins” for each level in hierarchy
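
For comparison, the hierarchy case can be sketched in a triple model, where one loop over the same predicate walks a hierarchy of any depth. The class names below are a made-up example and `ancestors` is a name chosen for this sketch.

```python
# Hypothetical class hierarchy stored as (child, "subClassOf", parent) triples.
TRIPLES = {
    ("Kinase", "subClassOf", "Enzyme"),
    ("Enzyme", "subClassOf", "Protein"),
    ("Protein", "subClassOf", "Molecule"),
}


def ancestors(node, triples, predicate="subClassOf"):
    """Follow the predicate upward until no unvisited parent is left."""
    found = []
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == predicate and o not in found:
                found.append(o)
                frontier.append(o)
    return found


print(ancestors("Kinase", TRIPLES))  # ['Enzyme', 'Protein', 'Molecule']
```

The traversal does not need to know the depth of the hierarchy in advance, which is the point the slide makes against writing one self-join per level.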
Advantages for Scientific R&D
  • Studies continue to show that research in all fields is increasingly collaborative
  • Example: genomic research
    • Complex data distributed over many datasets
      • Entrez Gene (EG), Gene Ontology (GO), Swiss-Prot, GenBank, ..
  • Problem: lack of well-defined standards
    • Integration nightmare:
      • data scattered, different formats, missing information
      • synonyms, ambiguity
    • Changing models:
      • maintenance not feasible
    • Understanding and reasoning:
      • need for connecting ontologies
  • Challenge: syntactic and semantic heterogeneity

Integration of Databases

> Challenges

  • Localization of resources
    • Identify relevant web resources
  • Data formats
    • Resources are represented in HTML, TXT, images, ..
  • Synonyms
    • Researchers may give the same data different names
  • Ambiguity
    • E.g. “insulin” can represent a drug, protein, gene, ..
  • Relations
    • One-to-one / One-to-many between identifiers
  • Granularity
    • Sources that differ in level of detail can cause missing data, ..
> Approaches

  • Data Warehouse Approach
    • Translate data into one local database
    • Eliminates source unavailability and slow response times
    • Allows data processing and optimization
    • Maintenance problem
      • evolution of content and structure
    • Examples: BioWarehouse, Biozon, DataFoundry
  • Federated Database Approach
    • Translate queries for individual sources
    • Easier to maintain (e.g. adding a new source)
    • Poor performance
    • Examples: BioKleisli, DiscoveryLink, QIS
  • Semantic Web Approach
    • No need to map data models
    • Rely on standardized ontologies
    • Less work, better performance
    • But only if sources comply
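
The "no need to map data models" point can be sketched as follows: when two sources already use the same URIs and vocabulary, integrating them is a set union of their triples. The `example.org` URIs and the `describe` helper are hypothetical; the gene and pathway names are taken from the result list earlier in the deck.

```python
# Two hypothetical sources that happen to use the same URI for one gene.
GENE = "http://example.org/gene/DRD1"
LABEL = "http://example.org/vocab/label"
INVOLVED_IN = "http://example.org/vocab/involvedIn"

source_a = {(GENE, LABEL, "DRD1")}
source_b = {(GENE, INVOLVED_IN, "dopamine receptor signaling pathway")}

# Integration is a plain set union; no schema mapping step is needed
# because both sources already agree on identifiers and vocabulary.
merged = source_a | source_b


def describe(uri, graph):
    """Collect every property of a resource, whichever source it came from."""
    return {p: o for s, p, o in graph if s == uri}


print(describe(GENE, merged))
```

If the sources used different URIs for the same gene, the union would still succeed but the facts would not line up, which is the "only if sources comply" caveat.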
In Practice
  • Scientists need:
    • Access to data
    • Ability to utilize data
    • Ability to handle uncertainty
  • Linked Open Data:
    • “We all need the same databases, for different decisions or applications”
    • Complements data in internal/licensed sources
    • Stimulates sharing across scientific disciplines

Examples

  • Biological data: Human Genome Project
    • Increase in web-accessible databases
      • GenBank, Gene Ontology, UniProt, PhenoDB, ..
    • Integration is key problem
    • Increase in RDF availability

  • YeastHub
    • Registration of web-accessible database
      • Metadata follows the Dublin Core standard, using RSS 1.0 to describe an ontology
    • Data Conversion
      • XML or RDB to RDF conversion
        • (e.g. the unique ID becomes the RDF ID, the remaining columns become properties)
    • Data Integration
      • Ad hoc RDF queries
      • Form-based queries (supervised)
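
The conversion rule above (unique ID becomes the RDF ID, remaining columns become properties) can be sketched in a few lines. The namespace, the column names and the sample row are placeholders invented for this sketch; they are not YeastHub's actual schema or code.

```python
# Hypothetical namespace and table; only the mapping rule comes from the slide.
BASE = "http://example.org/yeast/"


def row_to_triples(row, id_column):
    """Unique ID column becomes the subject; other columns become properties."""
    subject = BASE + str(row[id_column])
    return [(subject, BASE + column, value)
            for column, value in row.items()
            if column != id_column and value is not None]


# Placeholder row standing in for one record of a relational table.
row = {"orf": "ORF0001", "gene": "GENE1", "chromosome": 1, "alias": None}
for triple in row_to_triples(row, "orf"):
    print(triple)
```

Empty cells are skipped rather than turned into triples, so missing values simply produce no statement.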

Criticism

  • Feasibility
    • Human behavior and personal preferences
  • ‘Database hugging’
    • Organizations tend to keep data for themselves
  • Censorship and Privacy

  • Published data reusable in research?
    • Requires:
      • Provenance information
      • Quality
      • Attribution
      • Consistency
      • ...
    • Out-of-context data fails to respect scientific research methodology


Discussion

  • Has anyone ever worked with linked (RDF) data before? What are your experiences?
  • Will the semantic web grow to become the Giant Global Graph?
  • Why haven’t RDF databases taken off like Relational Databases?