
RDF based on Integration of Pathway Database and Gene Ontology

SNU OOPSLA LAB.

2005

DongHyuk Im


Contents

  • Introduction

    • Pathway Database

    • Enzyme Database

    • Gene Ontology

  • Related Works

  • Our Approach

    • Supported Functions

    • Data Transformation

    • Integration of KEGG, Enzyme, Gene Ontology

    • Querying using SeRQL


Pathway?

  • In most biochemical reactions, an enzyme converts one compound (the substrate) into another compound (the product); a pathway is a chain of such reactions

  • Importance

    • Comparing and analyzing pathways helps explain how compounds are produced and how organisms are related evolutionarily

    • Drug discovery


Pathway

Map : Glycolysis / Gluconeogenesis

Map : Aquifex aeolicus


Enzyme Database

  • EC number

  • Recommended name

  • Alternative names (if any)

  • Catalytic activity

  • Cofactors (if any)

  • Pointers to the SWISS-PROT entry or entries that correspond to the enzyme (if any)

  • Pointers to disease(s) associated with a deficiency of the enzyme (if any)


Enzyme Hierarchy

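  • Four levels

    • Each enzyme is identified by a four-part EC number

    • Ex) 1.1.1.1 is a member of the top-level group [1]

    • The leftmost number identifies the highest level

    • Siblings such as [2.4.2.3] and [2.4.2.4] catalyze similar reactions in a pathway

[Figure: hierarchy tree. Root [*] has children [1], [2], [3]; [2] has children [2.1], [2.2], [2.3]; [2.2] has children [2.2.1], [2.2.2], [2.2.3]; [2.2.2] has children [2.2.2.1], [2.2.2.2], [2.2.2.3].]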
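
The hierarchy rules above can be made concrete with a small sketch. It is an illustration in Python, not code from the presented system; the function names `ec_levels`, `ec_ancestors`, and `are_siblings` are hypothetical.

```python
def ec_levels(ec: str) -> list[str]:
    """Split an EC number such as '2.4.2.3' into its four levels."""
    parts = ec.split(".")
    if len(parts) != 4:
        raise ValueError(f"expected four levels, got {ec!r}")
    return parts


def ec_ancestors(ec: str) -> list[str]:
    """Return the enclosing groups, from the top level down to the parent."""
    parts = ec_levels(ec)
    return [".".join(parts[:i]) for i in range(1, len(parts))]


def are_siblings(a: str, b: str) -> bool:
    """Two distinct EC numbers are siblings when they share the same parent."""
    return a != b and ec_levels(a)[:-1] == ec_levels(b)[:-1]


print(ec_ancestors("1.1.1.1"))              # ['1', '1.1', '1.1.1']
print(are_siblings("2.4.2.3", "2.4.2.4"))   # True
```

Because the leftmost number is the highest level, the parent of any group is obtained by dropping the last component, which is what makes prefix-based queries over the hierarchy possible.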


Gene Ontology


KEGG

  • To computerize all aspects of cellular functions in terms of the pathway of interacting molecules or genes

  • To maintain gene catalogs for all organisms and link each gene product to a pathway component

  • To organize a database of all chemical compounds in the cell and link each compound to a pathway component

  • To develop computational technologies for pathway comparison, reconstruction, and analysis


Why RDF Integration?

  • Pathway data model: a directed graph

    • RDF is a good model for representing pathways

      • RDF data model: also a directed, labeled graph

  • Integrating the multiple knowledge sources available on the Internet is one of the major problems biologists face

    • RDF gives those sources a common standard format

  • Enzyme, GO: hierarchical structure

    • RDF is a good model for representing hierarchical structure

  • GO annotation is important

    • Enzymes (proteins) in a given pathway need GO annotation


Related Works

  • KEGG: Kyoto Encyclopedia of Genes and Genomes, 1999, Nucleic Acids Res.

  • YeastHub: a semantic web use case for integrating data in the life sciences domain, 2005, Bioinformatics

  • LIGAND: database of chemical compounds and reactions in biological pathways, 2002, Nucleic Acids Res.

  • Gene Ontology: tool for the unification of biology, the Gene Ontology Consortium, 2000, Nature Genetics.


Supported Functions

  • Functions already provided by KEGG

    • Search compound

    • Path prediction

    • Search enzyme

  • Functions our system adds

    • Integrated query (pathway + enzyme + GO)

      • Query relaxation using the GO hierarchy

      • Searching pathways using enzyme information


Search Compounds

target

Compound : C00668


Pathway Prediction Tool

compound

Relaxation query using enzyme hierarchy


Search Enzyme

Enzyme : 5.3.1.9


From Pathway to Gene Ontology

Select enzyme


Data Translation for Integration

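[Figure: data flow. KGML Data is transformed by XSLT into KEGG RDF Data. KEGG RDF Data, Enzyme RDF Data, and GO RDF Data are loaded into GENOS Storage, with a step labeled "Adding GO ID".]

XSLT : http://www.w3.org/2005/02/13-KEGG/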
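
To illustrate the KGML-to-RDF step, the sketch below does a comparable mapping in Python rather than XSLT, so it is a stand-in for the stylesheet and not a copy of it. The sample KGML is a simplified, hand-written fragment modeled on the gene and compound entries shown on the following slides; the helper names `to_uri` and `kgml_to_triples` are hypothetical, and the sketch assumes each entry's `name` holds a single `db:id` identifier.

```python
import xml.etree.ElementTree as ET

KEGG = "http://www.w3.org/2005/02/13-KEGG/"

# Simplified sample input (assumption: one identifier per entry name).
SAMPLE_KGML = """
<pathway name="path:aae00010" org="aae" number="00010"
         title="Glycolysis / Gluconeogenesis">
  <entry id="1" name="aae:aq_186" type="gene" reaction="rn:R00710"
         link="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
  <entry id="4" name="cpd:C00033" type="compound"
         link="http://www.genome.jp/dbget-bin/www_bget?compound+C00033"/>
</pathway>
"""


def to_uri(kegg_id):
    """Turn a KEGG identifier 'db:id' into a URI of the form <KEGG>db#id."""
    db, _, local = kegg_id.partition(":")
    return f"{KEGG}{db}#{local}"


def kgml_to_triples(kgml_text):
    """Emit (subject, predicate, object) tuples, one blank node per entry."""
    root = ET.fromstring(kgml_text.strip())
    triples = []
    for entry in root.iter("entry"):
        node = "_" + entry.get("id")  # document-local node ID, e.g. "_1"
        triples.append((node, "rdf:type", entry.get("type").capitalize()))
        triples.append((node, "k:name", to_uri(entry.get("name"))))
        if entry.get("reaction"):
            triples.append((node, "k:reaction", to_uri(entry.get("reaction"))))
        if entry.get("link"):
            triples.append((node, "k:link", entry.get("link")))
    return triples


for triple in kgml_to_triples(SAMPLE_KGML):
    print(triple)
```

Graphics attributes are left out to keep the sketch short; a fuller conversion would carry them over as well.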


KEGG RDF Data(1/2)

Gene entry

<k:entry>
  <Gene rdf:nodeID="_1">
    <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
    <k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
    <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
    <k:graphics>
      <Rectangle k:name="aldH1" k:fgcolor="#000000"
                 k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
    </k:graphics>
  </Gene>
</k:entry>

Enzyme entry

<k:entry>
  <Enzyme rdf:nodeID="_3">
    <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/ec#1.2.1.5"/>
    <k:graphics>
      <Rectangle k:name="1.2.1.5" k:fgcolor="#000000"
                 k:bgcolor="#FFFFFF" k:x="170" k:y="1039" k:width="45" k:height="17"/>
    </k:graphics>
  </Enzyme>
</k:entry>

Note: the enzyme entry carries no information beyond its EC number and graphics.

Compound entry

<k:entry>
  <Compound rdf:nodeID="_4">
    <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
    <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?compound+C00033"/>
    <k:graphics>
      <Circle k:name="C00033" k:fgcolor="#000000"
              k:bgcolor="#FFFFFF" k:x="102" k:y="971" k:width="8" k:height="8"/>
    </k:graphics>
  </Compound>
</k:entry>


KEGG RDF Data(2/2)

Relation

<k:relation>
  <ECrel>
    <k:entry1 rdf:resource="_42"/>
    <k:entry2 rdf:resource="_48"/>
    <compound rdf:resource="_88"/>
  </ECrel>
</k:relation>

Reaction

<k:reaction reversible="" rdf:about="http://www.w3.org/2005/02/13-KEGG/rn#R00710">
  <k:substrate rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00084"/>
  <k:product rdf:resource="http://www.w3.org/2005/02/13-KEGG/cpd#C00033"/>
</k:reaction>
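
Reaction records like the one above give substrate-to-product edges, which is what a path search between compounds needs. The following Python sketch shows one simple way to do that with breadth-first search. It is an assumption about how such a search could work, not the system's algorithm; only reaction R00710 comes from the slide, the second row is a made-up placeholder, and reversibility is ignored.

```python
from collections import defaultdict, deque

# (reaction, substrate, product)
REACTIONS = [
    ("rn#R00710", "cpd#C00084", "cpd#C00033"),        # from the slide
    ("rn#R_EXAMPLE", "cpd#C00033", "cpd#C_EXAMPLE"),  # hypothetical placeholder
]


def build_graph(reactions):
    """Map each substrate to the (reaction, product) pairs it can reach."""
    graph = defaultdict(list)
    for reaction, substrate, product in reactions:
        graph[substrate].append((reaction, product))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns the reactions on a shortest path, or None."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        compound, path = queue.popleft()
        if compound == goal:
            return path
        for reaction, product in graph.get(compound, []):
            if product not in seen:
                seen.add(product)
                queue.append((product, path + [reaction]))
    return None


graph = build_graph(REACTIONS)
print(find_path(graph, "cpd#C00084", "cpd#C_EXAMPLE"))
```

Treating a reversible reaction as two edges, one in each direction, would be a natural extension.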


How to Process KEGG Pathway

  • Problem

    • GENOS (Sesame) does not support multiple graphs

    • KEGG data consists of multiple documents

      • Ex) map00010.rdf, aae00010.rdf …

  • Solution

    • Using namespaces, we can distinguish maps

    • When storing pathway data, the pathway's map name is added as a namespace in the resources table of GENOS
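
The idea behind the solution can be sketched as follows: when several documents are loaded into one store, document-local node IDs such as `_1` would collide, so each is qualified with its map name. This Python sketch uses a plain set as the store and hypothetical helpers `qualify` and `load_map`; it does not reproduce the GENOS table layout, and the sample triples are illustrative.

```python
def qualify(map_name, node_id):
    """Prefix a document-local node ID with its map name, e.g. 'aae00010#_1'."""
    return f"{map_name}#{node_id}"


def load_map(store, map_name, triples):
    """Add one map's triples to the shared store, qualifying local node IDs."""
    for s, p, o in triples:
        s = qualify(map_name, s) if s.startswith("_") else s
        o = qualify(map_name, o) if o.startswith("_") else o
        store.add((s, p, o))


store = set()
# Both documents use the local ID "_1" for different entries (illustrative data).
load_map(store, "map00010", [("_1", "k:name", "ec#1.2.1.5")])
load_map(store, "aae00010", [("_1", "k:name", "aae#aq_186")])

for triple in sorted(store):
    print(triple)
```

Without the qualification step, both entries would be stored under the same subject `_1` and become indistinguishable.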


Processing Pathway Data

<k:Pathway k:org="aae" k:number="00010" k:title="Glycolysis / Gluconeogenesis">

….

….

  <k:entry>
    <Gene rdf:nodeID="_1">
      <k:name rdf:resource="http://www.w3.org/2005/02/13-KEGG/aae#aq_186"/>
      <k:reaction rdf:resource="http://www.w3.org/2005/02/13-KEGG/rn#R00710"/>
      <k:link rdf:resource="http://www.genome.jp/dbget-bin/www_bget?aae+aq_186"/>
      <k:graphics>
        <Rectangle k:name="aldH1" k:fgcolor="#000000"
                   k:bgcolor="#BFFFBF" k:x="170" k:y="1018" k:width="45" k:height="17"/>
      </k:graphics>
    </Gene>
  </k:entry>

[Figure labels: conflict; triples table of GENOS; resources table of GENOS]


Integrating Databases

Enzyme number

GO ID


Relaxation Querying using SeRQL

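[Figure: compound C1 is linked to compound C2 through enzyme E1; the enzymes E1.* below E1 are related to it by subclassof.]

SeRQL

SELECT C1, C2
FROM Path_EXP
WHERE E1 LIKE "1.*"

Dewey order: a prefix identifies a whole subtree

Ex. 1.1 and 1.2 are children of 1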
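
The prefix test behind the relaxed query can be sketched in Python. This is an illustration of the Dewey-order idea, not the SeRQL engine's behavior; the rows in `PATHS` are illustrative (C1, E1, C2) tuples rather than results from the system, and the function names are hypothetical. Note that the prefix must end at a level boundary, otherwise group `1.1` would wrongly match `1.11.1.6`.

```python
# (C1, E1, C2): compound C1 is converted to C2 by enzyme E1 (illustrative rows).
PATHS = [
    ("C00084", "1.2.1.5", "C00033"),
    ("C_A", "1.11.1.6", "C_B"),
    ("C_C", "5.3.1.9", "C_D"),
]


def in_group(ec, group):
    """True if `ec` equals `group` or lies beneath it in Dewey order."""
    return ec == group or ec.startswith(group + ".")


def relax(ec):
    """Widen by one level: '1.2.1.5' -> '1.2.1'; a top-level group -> ''."""
    return ec.rpartition(".")[0]


def select_compounds(paths, group):
    """Counterpart of: SELECT C1, C2 ... WHERE E1 LIKE "<group>.*"."""
    return [(c1, c2) for c1, e1, c2 in paths if in_group(e1, group)]


def relaxed_select(paths, ec):
    """Try the exact EC number first, then widen level by level until rows match."""
    group = ec
    while group:
        rows = select_compounds(paths, group)
        if rows:
            return group, rows
        group = relax(group)
    return "", []


print(select_compounds(PATHS, "1"))
print(relaxed_select(PATHS, "1.2.1.4"))
```

Here a query for 1.2.1.4 finds nothing at the exact level and is relaxed to its parent 1.2.1, which matches the sibling 1.2.1.5.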


Considering Performance

KEGG : Pathway List

aae:aq_018    path:aae03010

aae:aq_020    path:aae03010

aae:aq_021    path:aae00400

….

….

….

….

eco:b1236    path:eco00052

eco:b1236    path:eco00500

eco:b1236    path:eco00520

….

using genes_index

Genes

Map
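
One reading of "using genes_index" is a lookup table from each gene to the maps it appears in, so that a query touches only the relevant maps instead of scanning all of them. The Python sketch below builds such an index from the pathway list rows shown above. The whitespace-separated layout and the name `build_genes_index` are assumptions; the slides do not describe the actual index structure.

```python
from collections import defaultdict

# Rows copied from the slide; the separator between columns is assumed.
PATHWAY_LIST = """\
aae:aq_018\tpath:aae03010
aae:aq_020\tpath:aae03010
aae:aq_021\tpath:aae00400
eco:b1236\tpath:eco00052
eco:b1236\tpath:eco00500
eco:b1236\tpath:eco00520
"""


def build_genes_index(text):
    """Map each gene ID to the list of pathway maps that contain it."""
    index = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        gene, path = line.split()
        index[gene].append(path.removeprefix("path:"))
    return index


genes_index = build_genes_index(PATHWAY_LIST)
print(genes_index["eco:b1236"])  # ['eco00052', 'eco00500', 'eco00520']
```

A gene that occurs in several maps, such as eco:b1236, resolves to all of them with a single dictionary lookup.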


Schedule

  • Implementation (~11/30)

    • Integrated Databases

    • Query Processor for pathway

    • Simple UI (Web: JSP)

  • Complete Paper (~12/10)

