
Life Science Knowledge Collider

Vassil Momtchev

(Ontotext)

Presentation Outline
  • Life Sciences Domain Integration Problems
  • Pathway and Interaction Knowledge Base
  • Linked Life Data
  • LifeSKIM Application to Showcase the Platform

ESTC

Andy Law’s First Law

“The first step in developing a new genetic analysis algorithm is to decide how to make the input data file format different from all pre-existing analysis data file formats.”

The problem!
  • The data is maintained by different organizations
  • The information is highly distributed and redundant
  • There are many flat file formats, each with its own special semantics
  • The knowledge is locked in vast data silos
  • There are many isolated communities that cannot reach cross-domain understanding

Andy Law’s Second Law

“The second step in developing a new genetic analysis algorithm is to decide how to make the output data file format incompatible with all pre-existing analysis data file input formats.”

PIKB Overview
  • PIKB stands for Pathway and Interaction Knowledge Base
  • Interactions in the cell unveil the molecular mechanisms
    • Which molecular function or biological process is affected after the administration of a given drug?
    • How are chemical compounds involved in a specific biological process or disease?
  • The work is developed in the context of LarKC and refined with AstraZeneca researchers
  • The use case of “Semantic Integration for Early Clinical and Drug Development” will be assessed with clinical data from AstraZeneca

LarKC Project

  • “Web Scale and Style Reasoning”
  • Giving up 100% correctness:
    • trading quality for size
    • often completeness is not needed
    • sometimes even soundness is not needed

[Figure: diagram relating logic, Semantic Web, and IR along two axes: precision (soundness) and recall (completeness)]
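The trade-off on this slide can be made concrete with the standard IR definitions of precision and recall. What follows is a minimal sketch, not LarKC code: the function name and the toy answer sets are assumptions made only for illustration.

```python
def precision_recall(returned, correct):
    """Return (precision, recall) for a set of returned answers.

    precision: share of returned answers that are correct (soundness)
    recall:    share of correct answers that were returned (completeness)
    """
    returned, correct = set(returned), set(correct)
    hits = returned & correct
    precision = len(hits) / len(returned) if returned else 1.0
    recall = len(hits) / len(correct) if correct else 1.0
    return precision, recall


# Hypothetical answer sets, invented for this sketch.
correct = {"a", "b", "c", "d"}
sound_incomplete = {"a", "b"}                  # only true answers, some missing
complete_unsound = {"a", "b", "c", "d", "x"}   # nothing missing, one wrong answer
```

A reasoner that gives up completeness keeps precision at 1.0 while recall drops; one that gives up soundness keeps recall at 1.0 while precision drops.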

PIKB Objectives
  • Easily integrate pathway and interaction data from different sources
  • Allow straightforward updates of the information
  • Provide scientists with computational support to conceptualize the breadth and depth of relationships between data
  • Scale up to billions of statements

PIKB Data Sources

Type of data sources and database names:

  • Gene and gene annotations: Entrez-Gene
  • Protein sequences: Uniprot
  • Protein cross references: iProClass
  • Gene and gene product annotations: GeneOntology
  • Organisms: NCBI Taxonomy
  • Molecular interaction and pathways: BioGRID, NCI, Reactome, BioCarta, KEGG, BioCyc

Example questions:

  • Give me all human genes that are located on the X chromosome.
  • List all protein identifiers encoded by the gene IL2.
  • List all cross references to the protein Interleukin-2.
  • Give me all human proteins associated with the endoplasmic reticulum.
  • Give all terms more specific than “cell signaling” (e.g., synaptic transmission, transmission of nerve impulse).
  • List all subcategories of primates.
  • Give me all interactions of cell division protein kinase.
  • List all articles where the protein Interleukin-2 is mentioned.

Sometimes we need to ask far more complex questions efficiently:

Give me all proteins that interact in the nucleus, are annotated as repressors, and have at least one participant encoded by a gene that is annotated with a specific term and located on chromosome X. Filter the results for Mammalia organisms.
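A question such as "give all terms more specific than cell signaling" amounts to a transitive walk down an is-a hierarchy. The sketch below illustrates the idea on a toy hierarchy; the edges, the placeholder term, and the function name are assumptions for illustration and do not reproduce the real GeneOntology structure.

```python
from collections import defaultdict

# Toy (child, parent) edges. Hypothetical: NOT the real GeneOntology hierarchy.
IS_A = [
    ("synaptic transmission", "cell signaling"),
    ("transmission of nerve impulse", "cell signaling"),
    ("example child term", "synaptic transmission"),  # invented placeholder
    ("apoptosis", "cell death"),
]


def more_specific_terms(term, edges):
    """Collect every direct or indirect child of `term`."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)

    found, stack = set(), [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:  # also guards against cycles
                found.add(child)
                stack.append(child)
    return found
```

Calling `more_specific_terms("cell signaling", IS_A)` returns the two terms named on the slide plus the placeholder child, because the walk follows indirect children as well.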

Possible Solutions
  • Classical data-integration with:
    • data warehouses
    • federation middleware frameworks
    • database middleware technology
  • Not really...
    • Mapping works efficiently only on a small scale
    • Different design paradigms can be a real challenge
    • Direct mapping usually does not work
    • There is no standard way to integrate textual information

Our Approach
  • Convert all data sources to an RDF representation (if not already distributed as RDF)
  • Collide the data in a scalable semantic repository
  • Apply light-weight reasoning to specify formal interpretations of the data (e.g., remove redundancy)
  • Derive new implicit knowledge
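The first two steps can be sketched in a few lines. This is an illustration under stated assumptions, not the actual conversion pipeline: the record layout, the `urn:<source>:<id>` naming, and both function names are hypothetical, and a Python set stands in for the semantic repository.

```python
def record_to_triples(source, record):
    """Turn one flat-file record (a dict) into (subject, predicate, object) triples.

    Assumes every record has an "id" field; subjects are named urn:<source>:<id>.
    """
    subject = f"urn:{source}:{record['id']}"
    return {(subject, key, value) for key, value in record.items() if key != "id"}


def load(repository, source, rows):
    """'Collide' the rows of one source into the shared triple set."""
    for row in rows:
        repository |= record_to_triples(source, row)
    return repository


# Hypothetical records from two sources.
biogrid_rows = [{"id": "15904", "rdf:type": "urn:biogrid:Interaction"}]
intact_rows = [{"id": "1007", "rdf:type": "urn:intact:Interaction"}]

repository = set()
load(repository, "biogrid", biogrid_rows)
load(repository, "intact", intact_rows)
load(repository, "biogrid", biogrid_rows)  # reloading adds no duplicate statements
```

Because the repository is a set of statements, loading the same source twice leaves it unchanged, which is the simplest form of redundancy removal.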

Try to Visualise it

[Figure: RDF graph of example resources. Nodes: urn:biogrid:Interaction, urn:intact:Interaction, urn:uniprot:Protein, urn:biogrid:15904, urn:intact:1007, urn:pubmed:15904, urn:biogrid:FBgn0068575, urn:uniprot:FBgn0068575, urn:biogrid:FBgn00134235, urn:uniprot:FBgn00134235, urn:uniprot:Q709356, urn:uniprot:P104172. Edge labels: rdf:type, sameAs, rdf:seeAlso, hasParticipant, interactsWith.]

  • Resolve the syntactic differences in the identifiers
  • Use relationships to derive new implicit knowledge

These are only example resource names.
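The two ideas in the figure, resolving identifiers through sameAs and deriving interactsWith from shared participation, can be sketched as follows. The triples are assumptions built from the figure's resource names, since the exact edges of the original drawing are not recoverable; the function names and the choice of the lexicographically smallest identifier as representative are also illustrative.

```python
from itertools import permutations

# Hypothetical triples reusing resource names from the figure.
TRIPLES = {
    ("urn:biogrid:15904", "hasParticipant", "urn:biogrid:FBgn0068575"),
    ("urn:biogrid:15904", "hasParticipant", "urn:biogrid:FBgn00134235"),
    ("urn:intact:1007", "hasParticipant", "urn:uniprot:FBgn0068575"),
    ("urn:intact:1007", "hasParticipant", "urn:uniprot:Q709356"),
    ("urn:biogrid:FBgn0068575", "sameAs", "urn:uniprot:FBgn0068575"),
    ("urn:biogrid:FBgn00134235", "sameAs", "urn:uniprot:FBgn00134235"),
}


def canonical_map(triples):
    """Map each identifier in a sameAs group to one representative (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for s, p, o in triples:
        if p == "sameAs":
            a, b = find(s), find(o)
            if a != b:
                keep, drop = sorted((a, b))  # smallest name becomes representative
                parent[drop] = keep
    return {x: find(x) for x in parent}


def derive_interactions(triples):
    """Derive (a, interactsWith, b) for participants of the same interaction."""
    canon = canonical_map(triples)
    participants = {}
    for s, p, o in triples:
        if p == "hasParticipant":
            participants.setdefault(s, set()).add(canon.get(o, o))
    derived = set()
    for members in participants.values():
        for a, b in permutations(sorted(members), 2):
            derived.add((a, "interactsWith", b))
    return derived
```

With these triples, the participant that IntAct names with a Uniprot identifier and BioGRID names with its own identifier collapses to one node, so interactions from both sources attach to the same resource.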

Linked Life Data Overview
  • Platform to automate the process:
    • Infrastructure to store the data and perform inference
    • Transform the structured data sources to RDF
    • Provide web interface to access the data
  • Currently operates on top of the OWLIM semantic repository
  • LinkedLifeData - PIKB statistics:
    • Number of statements: 1,159,857,602
    • Number of explicit statements: 403,361,589
    • Number of entities: 128,948,564
  • Publicly available at: http://www.linkedlifedata.com
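As a rough reading of these statistics, the share of inferred statements can be worked out directly. The sketch assumes that "number of statements" counts explicit plus inferred statements, which the slide does not state; the variable names are illustrative.

```python
# Figures from the slide.
total_statements = 1_159_857_602
explicit_statements = 403_361_589

# Assumption: total = explicit + inferred.
inferred_statements = total_statements - explicit_statements

# Statements available to queries per explicitly loaded statement.
expansion_ratio = total_statements / explicit_statements
```

Under that assumption, inference contributes 756,496,013 statements, so the repository holds roughly 2.9 statements for every explicit one.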

LifeSKIM Application
  • A platform offering software infrastructure for:
    • automatic semantic annotation of text
    • ontology population
      • Store the extracted facts and reason on top of them
  • Semantic indexing and retrieval of content
  • Query and navigation involving structured knowledge
  • Based on Information Extraction (i.e. text-mining) technology

How LifeSKIM Searches Better
  • LifeSKIM can match a query

Documents about interleukin 6 (interferon, beta 2) where it is connected to apoptosis of neutrophils.

  • With a document containing

… the same effect was not observed for IFNB2, IL-8 and TNF-alpha … is induced neutrophil programmed cell death by apoptosis …

How LifeSKIM Searches Better

Classical IR cannot match:

  • interleukin 6 with HGF; HSF; BSF2; IL-6; IFNB2

Interleukin 6 is an entity in Entrez-Gene with GeneID: 3569, and HGF; HSF; BSF2; IL-6; IFNB2 are aliases for the same gene entity.

  • apoptosis of neutrophils with neutrophil apoptosis; programmed cell death of neutrophils by apoptosis; programmed cell death, neutrophils; neutrophil programmed cell death by apoptosis

The GeneOntology thesaurus lists the above terms as synonyms of the term apoptosis of neutrophils.
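The matching described above can be approximated by annotating both the query and the document with concepts and comparing concepts instead of strings. This is a simplified sketch, not how LifeSKIM is implemented: the synonym lists come from the slide, while the concept keys, function names, and the whole-word regular-expression matching are assumptions.

```python
import re

# Synonyms from the slide; the dictionary keys are labels chosen for this sketch.
SYNONYMS = {
    "GeneID:3569": ["interleukin 6", "HGF", "HSF", "BSF2", "IL-6", "IFNB2"],
    "apoptosis of neutrophils": [
        "apoptosis of neutrophils",
        "neutrophil apoptosis",
        "programmed cell death of neutrophils by apoptosis",
        "programmed cell death, neutrophils",
        "neutrophil programmed cell death by apoptosis",
    ],
}


def annotate(text, synonyms=SYNONYMS):
    """Return the concepts that have at least one synonym occurring in `text`."""
    found = set()
    for concept, names in synonyms.items():
        for name in names:
            # Case-insensitive match that must not start or end inside a word.
            pattern = r"(?<!\w)" + re.escape(name) + r"(?!\w)"
            if re.search(pattern, text, flags=re.IGNORECASE):
                found.add(concept)
                break
    return found


def matches(query, document):
    """True if every concept recognised in the query also occurs in the document."""
    wanted = annotate(query)
    return bool(wanted) and wanted <= annotate(document)
```

Applied to the query and the document excerpt from the previous slide, `matches` succeeds even though the string "interleukin 6" never appears in the document, because IFNB2 resolves to the same gene concept.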

Thanks

AstraZeneca

  • Bosse Andersson
  • Elisabet Söderhielm
  • Kaushal Desai

Ontotext

  • Deyan Peychev
  • Georgi Georgiev
  • OWLIM team
  • KIM team

The development of PIKB and Linked Life Data is partially funded by FP7 215535 LarKC
