
NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs

Mike Yuan


Outline of this presentation

  • Introduction to PPI

  • Introduction to Graph Mining

  • Related work

  • Problem statement

  • Details of the NeMoFinder algorithm

  • Summary

  • References


Protein Interactions

A protein may interact with:

  • Other proteins

  • Nucleic Acids

  • Small molecules


Finding Protein Partners


Motivation

  • Important for biological functions

  • To understand the function of a protein, we need to find its interacting partners


Graph Theory

[Figure: example graph with labels for vertex (node), edge, directed edge (arc), weighted edge, and cycle]

Molecular interaction networks are mapped as graphs


The protein-protein interaction network…


Graph mining

  • Methods for Mining Frequent Subgraphs

  • Mining Variant and Constrained Substructure Patterns

  • Applications:

    • Graph Indexing

    • Similarity Search

    • Classification and Clustering


Why Graph Mining?

  • Graphs are ubiquitous

    • Chemical compounds (Cheminformatics)

    • Protein structures, biological pathways/networks (Bioinformatics)

    • Program control flow, traffic flow, and workflow analysis

    • XML databases, Web, and social network analysis

  • Graph is a general model

    • Trees, lattices, sequences, and items are degenerate graphs

  • Complexity of algorithms: many problems are of high complexity


Graph, Graph, Everywhere

[Figure: four example graphs: aspirin, yeast protein interaction network (from H. Jeong et al., Nature 411, 41 (2001)), co-author network, Internet]


Graph Pattern Mining

  • Frequent subgraphs

    • A (sub)graph is frequent if its support (occurrence frequency) in a given dataset is no less than a minimum support threshold

  • Applications of graph pattern mining

    • Mining biochemical structures

    • Program control flow analysis

    • Mining XML structures or Web communities

    • Building blocks for graph classification, clustering, compression, comparison, and correlation analysis


Example: Frequent Subgraphs

[Figure: a graph dataset of three graphs (A), (B), (C), and the two frequent patterns (1), (2) found with a minimum support of 2]
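To make the notion of support concrete, here is a minimal sketch in Python. The three toy graphs and the two patterns are made up for illustration (they are not the graphs from the slide), and containment is checked by brute force over vertex mappings, which is only practical for very small graphs.

```python
from itertools import permutations

def contains(graph_edges, pattern_edges):
    """Brute-force test: is there an injective vertex mapping that sends
    every pattern edge onto an edge of the graph?"""
    g_edges = {frozenset(e) for e in graph_edges}
    g_nodes = sorted({v for e in graph_edges for v in e})
    p_nodes = sorted({v for e in pattern_edges for v in e})
    for image in permutations(g_nodes, len(p_nodes)):
        mapping = dict(zip(p_nodes, image))
        if all(frozenset((mapping[a], mapping[b])) in g_edges
               for a, b in pattern_edges):
            return True
    return False

def support(dataset, pattern_edges):
    """Number of graphs in the dataset that contain the pattern."""
    return sum(contains(g, pattern_edges) for g in dataset)

# Hypothetical toy dataset, not the graphs shown on the slide.
dataset = [
    [(1, 2), (2, 3), (1, 3), (3, 4)],  # triangle with a tail
    [(1, 2), (2, 3), (3, 4), (4, 1)],  # 4-cycle, contains no triangle
    [(1, 2), (2, 3), (1, 3)],          # triangle
]
triangle = [("a", "b"), ("b", "c"), ("a", "c")]
path3 = [("a", "b"), ("b", "c")]

MIN_SUPPORT = 2
triangle_support = support(dataset, triangle)
path_support = support(dataset, path3)
frequent = [name for name, s in [("triangle", triangle_support),
                                 ("path3", path_support)]
            if s >= MIN_SUPPORT]
```

Both patterns reach the minimum support here: the triangle occurs in two of the three graphs and the two-edge path in all three.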


Frequent Subgraph Mining Approaches

  • Apriori-based approach: if a graph is frequent, all of its subgraphs are frequent (the Apriori property)

    • AGM/AcGM: Inokuchi, et al. (PKDD’00)

    • FSG: Kuramochi and Karypis (ICDM’01)

    • PATH#: Vanetik and Gudes (ICDM’02, ICDM’04)

    • FFSM: Huan, et al. (ICDM’03)

  • Pattern growth approach

    • MoFa, Borgelt and Berthold (ICDM’02)

    • gSpan: Yan and Han (ICDM’02)

    • Gaston: Nijssen and Kok (KDD’04)


Problem Statement

  • PPI network G=(V,E)

    - each vertex represents a unique protein

    - each edge between v_A and v_B indicates an interaction between proteins A and B

  • Network motif

    - a frequently occurring subgraph pattern in a network

  • f_g is the number of occurrences of a subgraph g; g is repeated if f_g > F.

  • f_g_rand_i is the frequency of g in a randomized network G_rand_i, for 1 ≤ i ≤ N, where N is the number of randomized networks. s_g is the number of times f_g ≥ f_g_rand_i; g is unique if s_g > S.

  • Network motif discovery algorithm
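The repeated and unique conditions can be sketched in a few lines of Python. The frequencies and thresholds below are made-up illustration values, and the function names are hypothetical. The strict comparisons follow the definitions on this slide.

```python
def uniqueness(f_g, rand_freqs):
    """s_g: number of randomized networks in which g occurs no more often
    than in the real network (f_g >= f_g_rand_i)."""
    return sum(1 for f_rand in rand_freqs if f_g >= f_rand)

def is_network_motif(f_g, rand_freqs, F, S):
    """g is a network motif if it is repeated (f_g > F) and unique (s_g > S)."""
    return f_g > F and uniqueness(f_g, rand_freqs) > S

# Made-up numbers: frequency in the real network and in N = 10 randomized ones.
f_g = 12
rand_freqs = [3, 15, 12, 7, 9, 20, 4, 11, 2, 6]

s_g = uniqueness(f_g, rand_freqs)
motif_loose = is_network_motif(f_g, rand_freqs, F=2, S=7)
motif_strict = is_network_motif(f_g, rand_freqs, F=2, S=8)
```

With these numbers g is at least as frequent as in 8 of the 10 randomized networks, so it passes a uniqueness threshold of S = 7 but not S = 8.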


Problem Statement (cont)

  • Motivation of NeMoFinder: existing research has the following limitations:

    - The number of network motif candidates increases exponentially with motif size

    - Interesting network motifs must be both repeated and unique, so Apriori-based algorithms are not directly applicable

    - Graph isomorphism testing is computationally hard (subgraph isomorphism is NP-complete)

  • NeMoFinder

    - a network motif discovery algorithm to discover repeated and unique meso-scale network motifs in a large PPI network


Key procedures

  • Example graph G

  • Find repeated trees

  • Use repeated trees to partition a network into a set of graphs

  • Introduce graph cousins to facilitate the candidate generation and frequency counting processes.


Step 1. Discover Repeated Subgraphs

  • Step 1.1: find repeated size-k trees

  • E.g., size-2 to size-5 trees

    [Figure: trees t2, t3, t4_1, t4_2, t5_1, t5_2, t5_3]
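The figure lists one tree each of size 2 and 3, two of size 4, and three of size 5. As a sanity check on those counts, the sketch below enumerates all labeled trees on n vertices via Prüfer sequences and groups them by shape. This is an illustration chosen here, not the tree enumeration used by NeMoFinder, and all function names are hypothetical.

```python
from itertools import product

def prufer_to_edges(seq, n):
    """Decode a Prüfer sequence into the edge list of a labeled tree."""
    degree = [1] * n
    for v in seq:
        degree[v] += 1
    edges = []
    for v in seq:
        leaf = min(u for u in range(n) if degree[u] == 1)
        edges.append((leaf, v))
        degree[leaf] -= 1
        degree[v] -= 1
    u, w = [x for x in range(n) if degree[x] == 1]
    edges.append((u, w))
    return edges

def rooted_code(adj, v, parent):
    """Parenthesis encoding of the subtree rooted at v (children sorted)."""
    children = sorted(rooted_code(adj, c, v) for c in adj[v] if c != parent)
    return "(" + "".join(children) + ")"

def canonical_form(edges, n):
    """Shape signature: smallest rooted encoding over all possible roots."""
    adj = {v: [] for v in range(n)}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return min(rooted_code(adj, r, None) for r in range(n))

def tree_shapes(n):
    """Set of distinct (non-isomorphic) tree shapes on n vertices."""
    return {canonical_form(prufer_to_edges(seq, n), n)
            for seq in product(range(n), repeat=n - 2)}

shape_counts = {n: len(tree_shapes(n)) for n in range(2, 6)}
```

The resulting counts are 1, 1, 2, and 3 for sizes 2 to 5, which agrees with the trees named in the figure. Brute-force enumeration like this grows as n^(n-2) and is only meant for tiny sizes.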


Step 1. Discover Repeated Subgraphs (cont)

  • f_t2 = 7, f_t3 = 13, f_t4_1 = 6, f_t4_2 = 17, f_t5_1 = 1, f_t5_2 = 5, f_t5_3 = 7.

  • T2 = {t2}, T3 = {t3}, T4 = {t4_1, t4_2} and T5 = {t5_2, t5_3}.
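The sets of repeated trees follow from the frequencies by a simple threshold filter. The sketch below assumes a frequency threshold of F = 2 with the strict test f > F. The slide does not state F, so that value is an assumption chosen to be consistent with the result shown.

```python
from collections import defaultdict

# Frequencies of the candidate trees, taken from the slide.
tree_freq = {"t2": 7, "t3": 13, "t4_1": 6, "t4_2": 17,
             "t5_1": 1, "t5_2": 5, "t5_3": 7}
F = 2  # assumed threshold, not stated on the slide

repeated_trees = defaultdict(list)
for name, f in tree_freq.items():
    if f > F:
        size = int(name[1])  # the digit after 't' is the tree size
        repeated_trees[size].append(name)
```

Only t5_1, with a single occurrence, is filtered out, which leaves T5 = {t5_2, t5_3}.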


Step 1.2: Use repeated size-k trees to partition graph

  • Occurrences of t4_1 in G.


Step 1.2: Use repeated size-k trees to partition graph (cont)

  • Occurrences of t4_2 in G.


Step 1.2: Use repeated size-k trees to partition graph (cont)

  • Set of graphs GD4

    [Figure: graphs G4_1, G4_2, G4_3, G4_4, G4_5]


Step 1.3: perform graph join operation to find repeated size-k graphs

  • Generate 3-edge subgraphs from size-4 trees

    [Figure: t4_1 with generated subgraphs h1, h2; t4_2 with generated subgraphs h3, h4, h5]


Step 1.3: perform graph join operation to find repeated size-k graphs (cont)

  • Examples for graph join operations for subgraphs

    join(t4_1, h2) = g1_2

    join(t4_2, h3) = g1_1

  • f_g1_1 = 2 and f_g1_2 = 5


Step 1.3: perform graph join operation to find repeated size-k graphs (cont)

  • Use the repeated subgraphs obtained so far to generate further subgraphs

    [Figure: g1_2 with generated subgraphs h6, h7]

  • Graph join operations for subgraphs

    join(g1_2, h6) = g2

  • f_g2 < 2, so the algorithm stops


Algorithm 1 NeMoFinder

Input: G - PPI network; N - number of randomized networks; K - maximal network motif size; F - frequency threshold; S - uniqueness threshold

Output: U - repeated and unique network motif set

    // Step 1: Discover repeated subgraphs
    D ← ∅
    for motif size k from 3 to K do
        // Step 1.1: find repeated size-k trees
        T ← FindRepeatedTrees(k)
        // Step 1.2: use repeated size-k trees to partition graph
        GDk ← GraphPartition(G, T)
        D ← D ∪ T
        D' ← T
        i ← k
        // Step 1.3: perform graph join operation to find repeated size-k graphs
        while D' ≠ ∅ and i ≤ k × (k − 1)/2 do
            D' ← FindRepeatedGraphs(k, i, D')
            D ← D ∪ D'
            i ← i + 1
        end while
    end for
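The bounds of the inner loop follow from edge counts: a tree on k vertices has k − 1 edges, so the first joined graphs have i = k edges, and a simple graph on k vertices has at most k(k − 1)/2 edges. A small sketch of the edge levels the loop can visit (the function name is hypothetical, and the real loop also stops early once D' is empty):

```python
def edge_levels(k):
    """Edge counts i that the while loop may visit for size-k subgraphs:
    from k (one edge more than a tree) up to k*(k-1)/2 (complete graph)."""
    return list(range(k, k * (k - 1) // 2 + 1))

# Levels for motif sizes 3 to 5.
schedule = {k: edge_levels(k) for k in range(3, 6)}
```

For k = 3 the only level is the triangle (3 edges), for k = 4 the levels are 4 to 6 edges, and for k = 5 they are 5 to 10 edges.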


Algorithm 1 NeMoFinder (cont)

    // Step 2: Determine subgraph frequency in randomized networks
    for counter i from 1 to N do
        Grand ← RandomizedNetworkGeneration()
        for each g ∈ D do
            GetRandFrequency(g, Grand)
        end for
    end for

    // Step 3: Compute uniqueness of subgraphs
    U ← ∅
    for each g ∈ D do
        s ← GetUniqunessValue(g)
        if s ≥ S then
            U ← U ∪ {g}
        end if
    end for
    return U


Algorithm Steps (cont)

  • Step 2: Determine subgraph frequency in randomized networks

    - Generate randomized networks G_rand_i (1 ≤ i ≤ N)

    - Check the frequency of the subgraphs in each of the randomized networks G_rand_i

  • Step 3: Compute uniqueness of subgraphs

    - Based on frequencies in the input PPI network and the randomized networks

    - f_g_rand_i is the frequency of g in a randomized network G_rand_i, for 1 ≤ i ≤ N, where N is the number of randomized networks. s_g is the number of times f_g ≥ f_g_rand_i; g is unique if s_g > S.
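The slides do not say how the randomized networks are generated. One common choice, assumed here purely for illustration, is degree-preserving randomization by repeated edge swaps: pick two edges (a, b) and (c, d) and rewire them to (a, d) and (c, b), rejecting swaps that would create self-loops or duplicate edges. The function names below are hypothetical.

```python
import random
from collections import Counter

def degrees(edges):
    """Degree of every vertex in an undirected edge list."""
    return Counter(v for e in edges for v in e)

def randomize(edges, n_swaps, seed=0):
    """Degree-preserving randomization of a simple undirected graph
    (an assumed stand-in for RandomizedNetworkGeneration)."""
    rng = random.Random(seed)
    edge_list = [tuple(sorted(e)) for e in edges]
    edge_set = set(edge_list)
    done = attempts = 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # try both orientations of the second edge
        new1, new2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        # Reject swaps that create a self-loop or a duplicate edge.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {edge_list[i], edge_list[j]}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
        done += 1
    return edge_list

# Toy network: a 6-cycle with two chords (made-up example).
original = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
randomized = randomize(original, n_swaps=10, seed=1)
```

Every accepted swap leaves each vertex's degree unchanged, so a motif that is frequent in the real network but rare in such randomized networks cannot be explained by the degree sequence alone.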


Find repeated subgraphs

Algorithm 2 FindRepeatedGraphs(k, i, D')

Input: D' - set of repeated subgraphs with k vertices and i − 1 edges

Output: D'' - set of repeated subgraphs with k vertices and i edges

    C ← CandidateGeneration(k, i, D')
    D'' ← FrequencyCounting(k, i, C)
    return D''


Candidate generation using graph cousins

  • Represent subgraphs by adjacency matrices

  • Code(M): a sequence formed by linking the lower triangular entries of M in the following order: m_{1,1} m_{2,1} m_{2,2} … m_{n,1} m_{n,2} … m_{n,n}

  • Transform the adjacency matrix into the canonical adjacency matrix (CAM), which has the maximal code

  • Definition of subCAM of a graph

    - A matrix obtained by setting the last edge entry in CAM(g) to 0.
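A brute-force sketch of these definitions, assuming unlabeled graphs with a zero diagonal and reading "last edge entry" as the last nonzero off-diagonal entry in code order. Trying every vertex ordering is only feasible for the small subgraph sizes considered here. The function names are hypothetical.

```python
from itertools import permutations

def adjacency(n, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    M = [[0] * n for _ in range(n)]
    for a, b in edges:
        M[a][b] = M[b][a] = 1
    return M

def code(M):
    """Lower-triangular entries read row by row: m11 m21 m22 ... mnn."""
    return tuple(M[i][j] for i in range(len(M)) for j in range(i + 1))

def cam(M):
    """Canonical adjacency matrix: the vertex ordering with the maximal code."""
    n = len(M)
    best = None
    for p in permutations(range(n)):
        P = [[M[p[i]][p[j]] for j in range(n)] for i in range(n)]
        if best is None or code(P) > code(best):
            best = P
    return best

def sub_cam(C):
    """Copy of a CAM with its last edge entry (in code order) set to 0."""
    S = [row[:] for row in C]
    last = None
    for i in range(len(S)):
        for j in range(i):  # strictly lower triangle only
            if S[i][j]:
                last = (i, j)
    if last is not None:
        i, j = last
        S[i][j] = S[j][i] = 0
    return S

path_a = adjacency(4, [(0, 1), (1, 2), (2, 3)])
path_b = adjacency(4, [(2, 0), (0, 3), (3, 1)])  # same path, relabeled
star = adjacency(4, [(0, 1), (0, 2), (0, 3)])

same_shape = cam(path_a) == cam(path_b)
different_shape = cam(path_a) != cam(star)
```

Because the CAM is a maximum over all vertex orderings, two isomorphic graphs always yield the same CAM, which is what lets matrix comparison stand in for an isomorphism test.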


Candidate generation using graph cousins (cont)

  • Definition of cousin

    - Given two subgraphs g and h, if subCAM(g) = subCAM(h), then h is a cousin of g.

  • Three types of cousin relationship between g and h:

    - Type I, direct cousin: h is isomorphic to a subgraph g' that has the same number of vertices and edges as g, with g' ≠ g

    - Type II, twin cousin: h is isomorphic to subgraph g

    - Type III, distant cousin: h is a disconnected subgraph


Candidate generation using graph cousins (cont)

  • Adjacency matrices for the graphs in figure 6

    [Figure: adjacency matrices of t4_1, h1, and h2]


Candidate generation using graph cousins (cont)

  • Adjacency matrices for the graphs in figure 6

    [Figure: adjacency matrices of t4_2, h3, h4, and h5]


Candidate generation using graph cousins (cont)

  • Observations from the above example

    - h1 is a Type I direct cousin of t4_1

    - h2 is a Type III distant cousin of t4_1

    - h3 is a Type II twin cousin of t4_2

    - h4 is a Type I direct cousin of t4_2

    - h5 is a Type III distant cousin of t4_2


Candidate generation using graph cousins (cont)

Algorithm 3 CandidateGeneration(k, i, D')

Input: D' - set of repeated subgraphs with k vertices and i − 1 edges

Output: C - set of candidates with k vertices and i edges

    C ← ∅
    for each g ∈ D' do
        // Step 1: find the set of cousins
        H ← GetCousin(g)
        for each h ∈ H do
            // Step 2: join g with a cousin to form a new subgraph
            g' ← join(g, h)
            C ← C ∪ {g'}
        end for
    end for
    return C


Frequency counting

  • Leveraging properties of the different types of cousins

    - L_x: set of graphs in GDk embedding x

    - If h is a Type I direct cousin of g and g' is the subgraph obtained by joining g and h, then L_g' = L_g ∩ L_h and f_g' = |L_g ∩ L_h|

    - If h is a Type III distant cousin, then f_g' = |L_g ∩ L_h|

    - If h is a Type II twin cousin, then f_g' = CheckAllOccurances(g)

    - Example: L_t4_1 = {G4_1, G4_2, G4_3, G4_5} and L_h2 = {G4_1, G4_2, G4_3, G4_4, G4_5}, so L_g1_2 = L_t4_1 ∩ L_h2 = {G4_1, G4_2, G4_3, G4_5} and f_g1_2 = 4 > 2


Frequency counting (cont)

Algorithm 4 FrequencyCounting(k, i, C)

Input: GDk - set of graphs generated by partitioning G with size-k repeated trees; C - set of subgraph candidates with k vertices and i edges; F - frequency threshold

Output: D'' - set of repeated subgraphs with k vertices and i edges

    D'' ← ∅
    for each g' ∈ C do
        Get the join parameters of g': g and h
        L_g ← set of graphs in GDk embedding g
        L_h ← set of graphs in GDk embedding h
        if f_g < F or f_h < F then
            f_g' ← 0
        else if type of h = Type I direct cousin then
            // case: h is a direct cousin
            f_g' ← |L_g ∩ L_h|
        else if type of h = Type III distant cousin then
            // case: h is a distant cousin
            f_g' ← |L_g ∩ L_h|
        else if type of h = Type II twin cousin then
            // case: h is a twin cousin
            f_g' ← CheckAllOccurances(g)
        end if
        if f_g' > F then
            D'' ← D'' ∪ {g'}
        end if
    end for
    return D''
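The case analysis of Algorithm 4 can be sketched in Python as below. Two simplifications are assumptions of this sketch: the frequencies of g and h are approximated by the sizes of their embedding sets, and the twin-cousin check is passed in as a callback because the slides do not describe it. The candidate "g_x" with cousin "h_x" is made up to show the pruning branch, while the g1_2 data comes from the frequency-counting example.

```python
def frequency_counting(candidates, embedding, F, check_all_occurrences):
    """candidates: tuples (g_prime, g, h, cousin_type) with cousin_type in
    {"I", "II", "III"}; embedding maps a subgraph name to the set of
    graphs in GDk that embed it. Returns the repeated candidates."""
    repeated = []
    for g_prime, g, h, cousin_type in candidates:
        L_g, L_h = embedding[g], embedding[h]
        if len(L_g) < F or len(L_h) < F:
            f = 0                         # a join parameter is not repeated
        elif cousin_type in ("I", "III"):
            f = len(L_g & L_h)            # direct or distant cousin
        elif cousin_type == "II":
            f = check_all_occurrences(g)  # twin cousin: full recount
        else:
            raise ValueError(f"unknown cousin type: {cousin_type}")
        if f > F:
            repeated.append(g_prime)
    return repeated

def not_described(g):
    # Placeholder: the slides do not describe CheckAllOccurances.
    raise NotImplementedError

embedding = {
    "t4_1": {"G4_1", "G4_2", "G4_3", "G4_5"},
    "h2": {"G4_1", "G4_2", "G4_3", "G4_4", "G4_5"},
    "h_x": {"G4_4"},  # hypothetical cousin embedded in a single graph
}
candidates = [
    ("g1_2", "t4_1", "h2", "III"),
    ("g_x", "t4_1", "h_x", "I"),  # hypothetical candidate, pruned below
]
result = frequency_counting(candidates, embedding, F=2,
                            check_all_occurrences=not_described)
```

For direct and distant cousins the frequency comes from a set intersection instead of a fresh search over the network, which is where the cousin relationship saves work. Here g1_2 gets frequency 4 and is kept, while g_x is pruned because h_x is embedded in only one graph.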


Summary

  • NeMoFinder: an efficient network motif discovery algorithm for discovering larger (meso-scale) repeated and unique network motifs in PPI networks.

  • Use repeated trees to partition network into graphs

  • Graph cousins for candidate generation and frequency counting


References (1)

  • T. Asai, et al. “Efficient substructure discovery from large semi-structured data”, SDM'02

  • C. Borgelt and M. R. Berthold, “Mining molecular fragments: Finding relevant substructures of molecules”, ICDM'02

  • D. Cai, Z. Shao, X. He, X. Yan, and J. Han, “Community Mining from Multi-Relational Networks”, PKDD'05.

  • J. Chen, W. Hsu, M. L. Lee, and S.-K. Ng, “NeMoFinder: Dissecting genome-wide protein-protein interactions with meso-scale network motifs”, KDD'06

  • M. Deshpande, M. Kuramochi, and G. Karypis, “Frequent Sub-structure Based Approaches for Classifying Chemical Compounds”, ICDM 2003

  • M. Deshpande, M. Kuramochi, and G. Karypis. “Automated approaches for classifying structures”, BIOKDD'02

  • C. Faloutsos, K. McCurley, and A. Tomkins, “Fast Discovery of Connection Subgraphs”, KDD'04

  • H. Fröhlich, J. Wegner, F. Sieker, and A. Zell, “Optimal Assignment Kernels For Attributed Molecular Graphs”, ICML’05


References (2)

  • L. Holder, D. Cook, and S. Djoko. “Substructure discovery in the subdue system”, KDD'94

  • J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, and A. Tropsha. “Mining spatial motifs from protein structure graphs”, RECOMB’04

  • J. Huan, W. Wang, and J. Prins. “Efficient mining of frequent subgraph in the presence of isomorphism”, ICDM'03

  • H. Hu, X. Yan, Y. Huang, J. Han, and X. J. Zhou, “Mining Coherent Dense Subgraphs across Massive Biological Networks for Functional Discovery”, ISMB'05

  • A. Inokuchi, T. Washio, and H. Motoda. “An apriori-based algorithm for mining frequent substructures from graph data”, PKDD'00

  • C. James, D. Weininger, and J. Delany. “Daylight Theory Manual Daylight Version 4.82”. Daylight Chemical Information Systems, Inc., 2003.

  • G. Jeh, and J. Widom, “Mining the Space of Graph Properties”, KDD'04

  • H. Kashima, K. Tsuda, and A. Inokuchi, “Marginalized Kernels Between Labeled Graphs”, ICML’03


References (3)

  • M. Koyuturk, A. Grama, and W. Szpankowski. “An efficient algorithm for detecting frequent subgraphs in biological networks”, Bioinformatics, 20:I200--I207, 2004.

  • T. Kudo, E. Maeda, and Y. Matsumoto, “An Application of Boosting to Graph Classification”, NIPS’04

  • M. Kuramochi and G. Karypis. “Frequent subgraph discovery”, ICDM'01

  • M. Kuramochi and G. Karypis, “GREW: A Scalable Frequent Subgraph Discovery Algorithm”, ICDM’04

  • C. Liu, X. Yan, H. Yu, J. Han, and P. S. Yu, “Mining Behavior Graphs for 'Backtrace' of Noncrashing Bugs”, SDM'05

  • P. Mahé, N. Ueda, T. Akutsu, J. Perret, and J. Vert, “Extensions of Marginalized Graph Kernels”, ICML’04

  • S. Nijssen and J. Kok, “A quickstart in frequent structure mining can make a difference”, KDD'04

  • J. Prins, J. Yang, J. Huan, and W. Wang. “Spin: Mining maximal frequent subgraphs from graph databases”. KDD'04


References (4)

  • D. Shasha, J. T.-L. Wang, and R. Giugno. “Algorithmics and applications of tree and graph searching”, PODS'02

  • J. R. Ullmann. “An algorithm for subgraph isomorphism”, J. ACM, 23:31--42, 1976.

  • N. Vanetik, E. Gudes, and S. E. Shimony. “Computing frequent graph patterns from semistructured data”, ICDM'02

  • C. Wang, W. Wang, J. Pei, Y. Zhu, and B. Shi. “Scalable mining of large disk-based graph databases”, KDD'04

  • T. Washio and H. Motoda, “State of the art of graph-based data mining”, SIGKDD Explorations, 5:59-68, 2003

  • X. Yan and J. Han, “gSpan: Graph-Based Substructure Pattern Mining”, ICDM'02

  • X. Yan and J. Han, “CloseGraph: Mining Closed Frequent Graph Patterns”, KDD'03

  • X. Yan, P. S. Yu, and J. Han, “Graph Indexing: A Frequent Structure-based Approach”, SIGMOD'04

  • X. Yan, X. J. Zhou, and J. Han, “Mining Closed Relational Graphs with Connectivity Constraints”, KDD'05

  • X. Yan, P. S. Yu, and J. Han, “Substructure Similarity Search in Graph Databases”, SIGMOD'05

  • X. Yan, F. Zhu, J. Han, and P. S. Yu, “Searching Substructures with Superimposed Distance”, ICDE'06

  • M. J. Zaki. “Efficiently mining frequent trees in a forest”, KDD'02

