
Fast Alignments of Metabolic Networks



Qiong Cheng

Georgia State University

Joint work with

Robert Harrison (GSU)

Alexander Zelikovsky (GSU)

Fast Alignments of Metabolic Networks


Outline

  • Metabolic pathway & pathways model

  • Background in metabolic network alignments

    • Enzyme similarity

    • Topology similarity

  • Optimal alignment problem formulation: graph homo-homeomorphism

  • Computation model & solution for multi-source tree pattern

  • Experiments and analysis

  • Our software

  • Future work


Metabolic pathway & pathways model

[Figure: a portion of the pentose phosphate pathway, shown in two panels, "Metabolic pathway" and "Metabolic pathways model". Both panels are labeled with the enzymes 2.7.1.13, 1.1.1.34, 1.1.1.49, 3.1.1.31, and 1.1.1.44; the pathway panel also labels substrate, enzyme, and product.]


Alignments of metabolic pathways

[Figure: a pattern pathway aligned to a text pathway; three enzyme pairs are labeled "match" and one is labeled "Mismatch/Substitute".]

  • Pattern P : query pathway

    Text T : pathway in database

  • Enzyme similarity and pathway topology similarity together represent the similarity of pathway functionality.


Enzyme similarity

  • EC (Enzyme Commission) notation: Enzyme D = d1.d2.d3.d4

  • Calculation of the enzyme-to-enzyme dissimilarity score Δ[X, Y]

1) By the lowest common upper class distribution

Δ[X, Y] = log2 c(X, Y)

e.g. X = 1.1.1.39, Y = 1.1.1.44

the lowest common upper class of X and Y is 1.1.1

c(X, Y) = #({1.1.1.*})

2) By tight reaction property

Enzyme X = x1.x2.x3.x4

Enzyme Y = y1.y2.y3.y4

Δ[X, Y] = 1, 10, or +∞ (otherwise)

[The equality conditions on the digits x_i, y_i that select the cases 1 and 10 are garbled in the transcript.]
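The first scoring rule can be sketched in a few lines of Python. This is a minimal illustration, assuming that c(X, Y) counts the enzymes of a reference list that fall under the lowest common upper class; the function names and the tiny `known` list are hypothetical and stand in for a real enzyme database.

```python
import math

def lowest_common_class(x, y):
    """Longest shared prefix of EC digits, e.g. ('1', '1', '1')."""
    prefix = []
    for dx, dy in zip(x.split("."), y.split(".")):
        if dx != dy:
            break
        prefix.append(dx)
    return tuple(prefix)

def dissimilarity(x, y, known_enzymes):
    """Delta[X, Y] = log2 c(X, Y), with c counted over known_enzymes."""
    prefix = lowest_common_class(x, y)
    # Assumption: X and Y themselves always count as members of the class.
    pool = set(known_enzymes) | {x, y}
    c = sum(1 for e in pool if tuple(e.split(".")[:len(prefix)]) == prefix)
    return math.log2(c)

# Toy reference list (hypothetical); a real run would use a full EC database.
known = ["1.1.1.39", "1.1.1.44", "1.1.1.49", "1.1.1.34", "2.7.1.13", "3.1.1.31"]
print(dissimilarity("1.1.1.39", "1.1.1.44", known))  # class 1.1.1.* has 4 members -> 2.0
```

Identical enzymes share all four digits, so the class has one member and the score is 0; enzymes that share no digits are scored against the whole list.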


Topology similarity

[Figure: a mapping f from the pattern to the text.]

  1. Embedding = subgraph isomorphism

  2. Gene duplication and function sharing

    • = vertex collapsing

    • 1+2 = graph homomorphism

  3. Enzyme insertions

    • = edge subdividing

    • λ-fine per insertion

    • 1+3 = approximate graph homeomorphism (Pinter et al. 2005)

  4. Enzyme deletion

    • = bypass deletion: send vertex to b (Kelly et al. 2005)

    • 1+3+4 = graph homeomorphism

  5. Subpath deletion

    • = strong deletion: send vertex to d (Yang et al. 2007) (1+5)

1+2+3+4+5 = graph homo-homeomorphism

Cost terms: ∑_{v in V_T} Δ(v, f_v(v)) for vertex mismatches and λ ∑_{e in E_T} (|f_e(e)| - 1) for insertions.


Types of topology in alignments

[Figure: example alignments between pathways on vertices A, B, C, D and X, illustrating the topology types.]

  • Linear topology (Forst & Schulten [1999]; Chen & Hofestaedt [2004])

  • Tree topology (Pinter [2005]: O(|V_G|^2 |V_T| / log|V_G| + |V_G| |V_T| log|V_T|))

  • Arbitrary topology

    • Mapping linear pattern → graph (Kelly et al. 2004): O(|V_T|^(i+2) |V_G|^2)

    • Exhaustive search (Sharan et al. 2005: O(i!) O(|V_T|^(i+2) |V_G|^2); Yang et al. 2007: O(2^|V_G| |V_G|^2))


Optimal alignment problem formulation: graph homo-homeomorphism

  • Given:

    • a metabolic pathway P = <V_P, E_P> (Pattern) and

    • a metabolic network T = <V_T, E_T> (Text)

  • Find a minimum cost alignment f : P → T so that

    • f_v : every vertex in V_P is mapped to a vertex in V_T ∪ {b, d};

    • f_l : every path l_P across vertices in f_v^-1(V_T) is mapped to a path l_T

  • Minimize cost(f) = ∑_{u in V_P} Δ(u, f_v(u)) + λ ∑_l (|f_l(l)| - 1)

Efficient solution for optimal network alignment of multisource tree to arbitrary graph
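To make the objective concrete, the sketch below evaluates cost(f) for a given alignment. It is only an illustration: the names `alignment_cost` and `toy_delta` are hypothetical, image paths are passed as vertex lists, and |f_l(l)| is assumed to count the edges of the image path, so a pattern edge mapped onto a single text edge adds no penalty.

```python
def alignment_cost(vertex_map, path_images, delta, lam):
    """cost(f) = sum of delta(u, f_v(u)) + lam * sum of (|f_l(l)| - 1)."""
    vertex_cost = sum(delta(u, image) for u, image in vertex_map.items())
    # A path given as k vertices has k - 1 edges, so (|f_l(l)| - 1) = k - 2.
    insertion_cost = lam * sum(len(path) - 2 for path in path_images.values())
    return vertex_cost + insertion_cost

def toy_delta(u, x):
    """Hypothetical dissimilarity: 0 for a match, 1 for any mismatch."""
    return 0 if u.upper() == x else 1

# Pattern a -> b -> c aligned to text A -> Y -> X -> C.
vertex_map = {"a": "A", "b": "Y", "c": "C"}          # b is mismatched with Y
path_images = {("a", "b"): ["A", "Y"],               # direct edge, no insertion
               ("b", "c"): ["Y", "X", "C"]}          # X is one inserted enzyme
print(alignment_cost(vertex_map, path_images, toy_delta, lam=0.5))  # 1 + 0.5 = 1.5
```

Deleted pattern vertices can be handled the same way by letting `delta` return the deletion penalty when the image is b or d.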


Alignment operations and cost

  • Matches of enzymes between pattern and text: Cost(match of u → f_v(u)) = 0

  • Mismatches of enzymes: Cost(mismatch of u → f_v(u)) = Δ(u, f_v(u))

  • Insertions of text enzymes into the pattern: Cost(insertion of v under f_l) = λ

  • Deletions of pattern enzymes: Cost(deletion of u under f) = Δ(u, b) or Δ(u, d)

  • Deletion types: 1) bypass deletion 2) strong deletion 3) weak deletion


Notation of computation model

  • A multi-source tree is a directed graph whose underlying undirected graph (ignoring direction) is a tree.

  • Insertions of pattern vertices = deletion of text vertices between v and v_j

    h(v, v_j) = #(hops between v and v_j)

  • Cost of deleting the text vertices between v and v_j = λ × h(v, v_j)

  • Assume that each child's contribution to its parent's mapping is independent of the other children's contributions

    Assume that the pattern root vertex cannot be deleted


Computation model for multi-source tree pattern

[Figure: pattern P with vertices u, u_i, u_ik and text T with vertices v, v_j.]

Three possibilities for the contribution of a child u_i to the parent u's mapping (u → v):

1. u_i is mapped to v_j (v_j is a descendant of v)

2. u_i is strongly deleted: strongD(u_i)

3. u_i is bypass deleted: weakD(u, u_i, u_ik)

cost(u, v) = Δ(u, v) + ∑_{u_i} min {
    min_{v_j} (cost(u_i, v_j) + λ h(v, v_j)),
    strongD(u_i),
    min_{u_ik} (weakD(u, u_i, u_ik) + cost(u_ik, v_j) + λ h(v, v_j))
}

The slide labels cost(u, v) as A(u, v) and the minimum taken for child u_i as B(u_i, v).


Recurrence relation for the network alignment

Recurrence equation:

A(u, v) = Δ(u, v) + ∑_{u_i} B(u_i, v)

B(u_i, v) = min {
    min_{v_j} (A(u_i, v_j) + λ h(v, v_j)),
    strongD(u_i),
    min_{u_ik} (weakD(u, u_i, u_ik) + λ h(v, v_j) + A(u_ik, v_j))
}

Base cases:

A(u, v) = min(Δ(u, v), strongD(u)) when vertex u is a leaf

B(u, v) = ∞


Solution

  • Preprocessing:

    • Transitive closure of text T

    • Pattern graph ordering

    • Calculate the penalties of pattern vertex strong deletion

    • Calculate the penalties of pattern vertex weak deletion

  • Dynamic programming + adaptation of Dijkstra

  • Runtime for the DP solution with Fibonacci heaps: O(|V_P| (|E_T| + |V_T| log|V_T|))


Preprocessing of text graph

[Figure: text T on vertices A, B, C, D, E, F and its transitive closure T*, with edge labels 1, 2, 3.]

The transitive closure of T is the graph T* = (V, E*), where E* = {(i, j): there is an i-j path in T}
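A small sketch of this preprocessing step, assuming the text is given as an adjacency dictionary. Storing the hop count with each closure edge is an assumption made here so that h(v, v_j) can be looked up later; the function name `hop_closure` is hypothetical.

```python
from collections import deque

def hop_closure(adj):
    """Return closure[i][j] = fewest hops from i to j, for every j reachable from i."""
    closure = {}
    for source in adj:
        dist = {}
        seen = {source}
        queue = deque([(source, 0)])
        while queue:
            v, d = queue.popleft()
            for w in adj.get(v, ()):
                if w not in seen:  # breadth-first search: first visit is the shortest
                    seen.add(w)
                    dist[w] = d + 1
                    queue.append((w, d + 1))
        # Note: a vertex on a cycle is not recorded as reachable from itself.
        closure[source] = dist
    return closure

text = {"A": ["B"], "B": ["C", "D"], "C": [], "D": ["E"], "E": []}
print(hop_closure(text)["A"])  # B at 1 hop, C and D at 2 hops, E at 3 hops
```

One breadth-first search per vertex costs O(|V_T| (|V_T| + |E_T|)) in total, which is adequate for a sketch but is not the method the slides use to reach their stated bound.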


Pattern graph ordering

[Figure: pattern P on vertices a, b, c, d and the ordered pattern P′ obtained from it.]

  • Construct the ordered pattern P′

    • DFS traversal

    • Process vertices in the reverse of the traversal order

  • In the ordered pattern P′, each edge e_i is the unique edge connecting v_i with the previous vertices in the order
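The ordering can be sketched as a depth-first traversal of the underlying undirected tree. The function name and the choice of root below are illustrative assumptions; the point is that every non-root vertex ends up with exactly one earlier neighbour (its DFS parent), and the dynamic program can then visit vertices in the reverse of this order.

```python
def dfs_order(edges, root):
    """Order the vertices of a multi-source tree by DFS, ignoring edge direction."""
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)
    order, parent = [], {root: None}
    stack = [root]
    while stack:
        v = stack.pop()
        order.append(v)
        for w in neighbours.get(v, []):
            if w not in parent:      # first time w is reached: v is its parent
                parent[w] = v
                stack.append(w)
    return order, parent

# Multi-source tree: a -> b, c -> b, b -> d (two sources, a and c).
order, parent = dfs_order([("a", "b"), ("c", "b"), ("b", "d")], root="a")
print(order)                  # ['a', 'b', 'd', 'c']
print(list(reversed(order)))  # processing order for the bottom-up pass
```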


Penalties for pattern vertex deletions

strongD(u) = Δ(u, d) + ∑_{u_i*} Δ(u_i*, d)

weakD(u, u_i, v) = ∑_{u_i} ( ∑_{u_i,j} strongD(u_i,j) + Δ(u_i, b) )


Dynamic programming

[Figure: pattern P (vertices a, b, c, d), text T (vertices C, D, E, F, H), and the filled tables A and B, whose vertices may be listed in arbitrary order.]

  • Create two dynamic programming tables, A and B

  • Fill A and B from the bottom up

  • Trace back

B′(v, f(u)) = min_{y ∈ des(f(u))} (A(v, y) + λ h(f(u), y))

A(u, f(u)) = Δ(u, f(u)) + ∑_{child v of u} B′(v, f(u))
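The two table formulas can be tried out with a deliberately reduced sketch that handles only matches, mismatches, and insertions; strong and bypass deletions are left out. The names `align_tree`, `children`, and `hops` are hypothetical: `children` is a rooted pattern tree, and `hops[x][y]` supplies h(x, y) for every descendant y of x in the text.

```python
import math

def align_tree(children, root, hops, delta, lam):
    """Fill A[u][x] = delta(u, x) + sum over children v of B'(v, x), bottom-up."""
    text_vertices = list(hops)
    A = {}

    def fill(u):
        for v in children.get(u, []):
            fill(v)                      # children first: bottom-up order
        A[u] = {}
        for x in text_vertices:
            total = delta(u, x)
            for v in children.get(u, []):
                # B'(v, x): best descendant y of x for child v, plus insertion fines.
                total += min((A[v][y] + lam * h for y, h in hops[x].items()),
                             default=math.inf)
            A[u][x] = total

    fill(root)
    return A

# Pattern a -> b; text X -> Y -> Z with hop counts between descendants.
children = {"a": ["b"]}
hops = {"X": {"Y": 1, "Z": 2}, "Y": {"Z": 1}, "Z": {}}
matches = {("a", "X"), ("b", "Z")}
delta = lambda u, x: 0 if (u, x) in matches else 10
A = align_tree(children, "a", hops, delta, lam=1)
print(A["a"]["X"])  # 0 + min(10 + 1, 0 + 2) = 2
```

The optimal cost is the smallest entry in the root's row, and the mapping is recovered by tracing back which descendant achieved each minimum.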


Adaptation of Dijkstra

[Figure: pattern P (vertices a, b, c, d) and text T (vertices C, D, E, F, H), with (vertex, key) pairs such as (H, 0), (C, 9), (F, 1), (E, 2), (D, ∞) shown at successive steps.]

for each x ∈ V_T
    insert (x, A(v,x) - λ) into Q
    B(v,x) ← ∞
while Q is not empty
    delete from Q item (y,k) with the minimum key k
    for each (x,y) ∈ E_T
        if B(v,x) > k + λ
            B(v,x) ← k + λ
        if key of x > k + λ
            decrease key of x in Q with k + λ

  • Runtime for priority queue Q with Fibonacci heaps: O(|E_T| + |V_T| log|V_T|).

  • Total runtime: O(|V_P| (|E_T| + |V_T| log|V_T|))
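The queue-based step can be written in runnable form as follows. This is a sketch under assumptions: Python's `heapq` has no decrease-key, so stale heap entries are skipped instead (lazy deletion), which gives O(|E_T| log |V_T|) rather than the Fibonacci-heap bound. The function name and argument layout are hypothetical.

```python
import heapq
import math

def best_descendant_costs(vertices, edges, a_v, lam):
    """Compute B(v, x) for every text vertex x, given a_v[x] = A(v, x)."""
    preds = {x: [] for x in vertices}
    for x, y in edges:                   # text edge x -> y
        preds[y].append(x)
    key = {x: a_v[x] - lam for x in vertices}
    b = {x: math.inf for x in vertices}
    heap = [(key[x], x) for x in vertices]
    heapq.heapify(heap)
    done = set()
    while heap:
        k, y = heapq.heappop(heap)
        if y in done or k > key[y]:      # stale entry left by an earlier key decrease
            continue
        done.add(y)
        for x in preds[y]:
            if b[x] > k + lam:
                b[x] = k + lam
            if x not in done and key[x] > k + lam:
                key[x] = k + lam         # emulate decrease-key by pushing a new entry
                heapq.heappush(heap, (key[x], x))
    return b

vertices = ["C", "E", "F", "H"]
edges = [("H", "E"), ("E", "F"), ("H", "C")]
a_v = {"H": 10, "E": 10, "F": 0, "C": 3}
print(best_descendant_costs(vertices, edges, a_v, lam=1))
# H reaches F through E: 0 + 1 insertion fine = 1; E reaches F directly: 0
```

Each B(v, x) is the cheapest A(v, y) over vertices y reachable from x by at least one edge, plus λ for every intermediate text vertex on the way.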


Handling cycles

[Figure: a pattern on vertices a, b, c, d, e that contains a cycle, and the acyclic pattern obtained from it.]

  • DP does not work when the pattern has cycles

  • "Fix" the images of some pattern vertices and reduce to the acyclic case

  • Find a minimum feedback vertex set F(P):

    • V_P - F(P) is acyclic

    • NP-complete, but easy to approximate

  • Runtime is increased by a factor of O(|V_T|^|F(P)|)

  • Total runtime: O(|V_T|^|F(P)| |V_P| (|E_T| + |V_T| log|V_T|))
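The reduction can be illustrated with two small helpers: one checks that removing a candidate set F leaves the pattern acyclic, the other enumerates the |V_T|^|F| ways of fixing images for the removed vertices. Both names are hypothetical, and "acyclic" is read here as "the underlying undirected graph is a forest", which is an assumption based on the multi-source tree definition.

```python
from itertools import product

def is_forest(vertices, edges):
    """True if the undirected version of the graph has no cycle (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        ra, rb = find(a), find(b)
        if ra == rb:                        # a and b already connected: cycle
            return False
        parent[ra] = rb
    return True

def remove_vertices(vertices, edges, removed):
    kept = [v for v in vertices if v not in removed]
    kept_edges = [(a, b) for a, b in edges if a not in removed and b not in removed]
    return kept, kept_edges

def fixed_images(feedback_set, text_vertices):
    """Yield every assignment of text vertices to the feedback vertices."""
    for images in product(text_vertices, repeat=len(feedback_set)):
        yield dict(zip(feedback_set, images))

pattern_v = ["a", "b", "c", "d"]
pattern_e = [("a", "b"), ("b", "c"), ("c", "a"), ("c", "d")]
print(is_forest(pattern_v, pattern_e))                             # False
print(is_forest(*remove_vertices(pattern_v, pattern_e, {"a"})))    # True
print(len(list(fixed_images(["a"], ["C", "D", "E"]))))             # 3
```

For each fixed assignment the acyclic dynamic program is run once, which is where the extra |V_T|^|F(P)| factor comes from.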


Statistical significance

[Figure: a graph on vertices a, b, c, d before and after an edge reshuffle.]

  • Randomized P-value computation

  • Random degree-conserved graph generation:

    • Reshuffle nodes

    • Reshuffle edges

Experiments & applications

  • All-against-all mappings among S. cerevisiae, B. subtilis, T. thermophilus, and E. coli

  • Identifying conserved pathways

    • 24 pathways that are conserved across all 4 species

    • 18 more pathways that are conserved across at least three of these species

  • Significant deletions

  • Resolving ambiguity

  • Discovering pathway holes


Significant deletion

Pattern: Aspartate superpathway in E. coli

Text: Lysine biosynthesis in T. thermophilus

Mapping result: unmatched vertices are deleted.

In the pattern, the conserved subpath is drawn solid and the deleted subpath is drawn dotted.

The dotted subpath produces biotin, which is not required by the text organism.


Resolving ambiguity

  • Mapping of glutamate degradation VII pathways from B. subtilis to T. thermophilus (p<0.01). The shaded node reflects enzyme homology.

  • Enzymes 1.2.4.2 and 2.3.1.61, whose functions are similar to those of 1.2.4.- and 2.3.1.-, can be found in T. thermophilus

  • 1.2.4.2 and 2.3.1.61 have been reported in B. subtilis


Pathway holes: find and fill

  • Hole = missing enzyme in pathway description (in database)

  • Finding holes is a difficult task: comparison can help

Alignment of the formaldehyde oxidation V pathway in B. subtilis to the formylTHF biosynthesis pathway in E. coli

  • 3.5.1.10 is missing from B. subtilis but exists in B. clausii, which is close to B. subtilis


Our software

  • http://alla.cs.gsu.edu:8080/MinePW/pages/gmapping/GMMain.html


References

Q. Cheng and A. Zelikovsky, "Network Mapping of Metabolic Pathways", in Analysis of Complex Networks: From Biology to Linguistics, Wiley-VCH, 2009.

Q. Cheng, P. Berman, R. Harrison, and A. Zelikovsky, "Fast Alignments of Metabolic Networks", Proc. of IEEE International Conference on Bioinformatics and Biomedicine (BIBM 2008), pp. 147-152.

Q. Cheng, D. Kaur, R. Harrison, and A. Zelikovsky, "Mapping and Filling Metabolic Pathways", RECOMB Satellite Conference on Systems Biology, 2007.

Q. Cheng, R. Harrison, and A. Zelikovsky, "Homomorphisms of Multisource Trees into Networks with Applications to Metabolic Pathways", Proc. of IEEE 7th International Symposium on BioInformatics and BioEngineering (BIBE'07).

Q. Cheng, R. Harrison, and A. Zelikovsky, "MetNetAligner: a web service tool for metabolic network alignments", Bioinformatics, 2009 (to appear).


Future work

Refine the method of filling pathway holes and improve the performance

Discover critical metabolic elements/modules/motifs

Describe evolution of metabolic pathways

Integrate with genome database


Acknowledgments

GSU Molecular Basis of Disease (MBD) fellowship

Peter Karp

Oleg Rokhlenko

Florian Rasche

Amit Sabnis, Dipendra Kaur

Kelly Westbrooks, Irina Astrovskaya, Stefan Gremalschi, Jingwu He, Dumitru Brinza, Weidong Mao, Nisar Hudewale



Bio-Map

GENOME

protein-gene interactions

PROTEOME

protein-protein interactions

METABOLISM

Biochemical reactions

Citrate Cycle


Comparison of different methods

Alignment of tree pathways from different species with optimal homomorphism (HM) and optimal network alignment (NA). The average numbers of mismatches and gaps are reported on common statistically significant matched pathways.

