
Kavosh: a new algorithm for finding network motifs

Jin Chen, Fall 2012, Michigan State University. Motivation of this paper: it presents a new algorithm for finding size-k network motifs in a directed network with less memory and CPU time than other algorithms.


Presentation Transcript


  1. Kavosh: a new algorithm for finding network motifs • Jin Chen • Fall 2012 • Michigan State University

  2. Motivation of this paper • It presents a new algorithm for finding size-k network motifs in a directed network with less memory and CPU time than other algorithms • Input: a large directed or undirected network • Output: network motifs that occur in the input network

  3. Basic Terminologies (slides adapted from Shalev Itzkovitz’s talk given at IPAM, UCLA, in July 2005)

  4. Basic Terminologies Transcription regulation of gene by Protein

  5. Basic Terminologies

  6. Basic Terminologies

  7. Basic Terminologies Or Motifs

  8. Basic Terminologies

  9. Definition of Network Motifs • Patterns that occur in a real network significantly more often than in randomized networks are called NETWORK MOTIFS. • Randomized networks: networks with the same characteristics as the real network, but where the connections between nodes are made at random.

  10. Definition of Network Motifs R. Milo et al. Science 2002; vol 298:824-827

  11. Existing algorithms • mFinder: size 3-4, directed and undirected • PAJEK: size 3-4, directed and undirected, visible • FANMOD: size 8, directed and undirected, sampling, visible • NeMoFinder: size 13, undirected

  12. Kavosh consists of 4 steps • Enumeration: finding all subgraphs of a given size that occur in the input graph • Classification: classifying each found subgraph into isomorphic groups • Random graph generation: generating random networks with respect to the input network (enumeration and classification are also performed on the random networks) • Motif identification: distinguishing motifs among all found subgraphs on the basis of statistical parameters

  13. Enumeration • All subgraphs that include a particular vertex are discovered • Subsequently, this vertex is removed from the network, and the process is repeated
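The two bullets above can be sketched as a brute-force loop. This is a minimal illustration only, not Kavosh's own enumeration procedure: it assumes an undirected view of the network stored as a dict of neighbour sets, and the names `is_connected` and `enumerate_subgraphs` are hypothetical.

```python
from itertools import combinations

def is_connected(nodes, adj):
    """Check that the subgraph induced by `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for w in adj[u] & nodes:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return seen == nodes

def enumerate_subgraphs(adj, k):
    """Return the vertex sets of all connected induced subgraphs of size k.

    adj: dict mapping each vertex to the set of its neighbours.
    """
    remaining = set(adj)
    found = []
    for v in sorted(adj):
        # Discover every size-k subgraph that includes v ...
        others = sorted(remaining - {v})
        for combo in combinations(others, k - 1):
            nodes = (v,) + combo
            if is_connected(nodes, adj):
                found.append(frozenset(nodes))
        # ... then remove v so it is never visited again.
        remaining.discard(v)
    return found

# Triangle 1-2-3 with a pendant vertex 4 attached to 3.
example_graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
size3 = enumerate_subgraphs(example_graph, 3)
```

Because each vertex is removed after it has been processed, every subgraph is reported exactly once, at the first of its vertices to be visited.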

  14. Enumeration • Example of enumeration on graph G (figure panels: (1,1) and (2)) • To find all size-3 induced subgraphs in G, the compositions are (1,1) and (2) • To find all size-4 induced subgraphs in G, the compositions are (1,1,1), (1,2), (2,1) and (3)
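The compositions listed on this slide are the ordered ways of writing k-1 as a sum of positive integers, which is consistent with one vertex serving as the root and the remaining k-1 vertices being spread over successive levels. A short sketch that reproduces the slide's lists (the function names are hypothetical):

```python
def compositions(n):
    """Yield every ordered way to write n as a sum of positive integers."""
    if n == 0:
        yield ()
        return
    for first in range(1, n + 1):
        for rest in compositions(n - first):
            yield (first,) + rest

def patterns_for_size(k):
    """Compositions of k - 1, i.e. the patterns for size-k subgraphs."""
    return list(compositions(k - 1))

size3_patterns = patterns_for_size(3)  # [(1, 1), (2,)]
size4_patterns = patterns_for_size(4)  # [(1, 1, 1), (1, 2), (2, 1), (3,)]
```

The number of compositions of k-1 is 2^(k-2), so the pattern list itself stays small for the motif sizes discussed here.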

  15. Enumeration (figure panels: compositions (1,1) and (2), shown after removing node 1 and after removing nodes 1 and 2)

  16. Time Complexity of Enumeration Typically, graph partition problems fall under the category of NP-hard problems. Solutions to these problems are generally derived using heuristics and approximation algorithms. However, uniform graph partitioning or a balanced graph partition problem can be shown to be NP-complete to approximate within any finite factor. Even for special graph classes such as trees and grids, no reasonable approximation algorithms exist, unless P=NP. … When not only the number of edges between the components is approximated, but also the sizes of the components, it can be shown that no reasonable fully polynomial algorithms exist for these graphs. from Wikipedia

  17. Classification • NAUTY: an algorithm for finding isomorphic subgraphs • NAUTY uses the canonical matrix as the unique identifier of a subgraph • Two subgraphs are isomorphic if and only if their canonical matrices are the same

  18. Canonical Matrix and Labeling • Subgraph → adjacency matrix → string, e.g. 0011100100010000 • Switch the node labels to obtain a new adjacency matrix • Turn the matrix into a string, which represents the graph • Canonical labeling: the labeling that gives the maximal (or minimal) string • Relabeling example: 0101001100010000 with node order (2,1,3,4)
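The matrix-to-string idea on this slide can be illustrated with a brute-force sketch that tries every node order and keeps the maximal string. This is not how NAUTY works internally (it avoids exhaustive search); the function names are hypothetical, and the adjacency matrix is read off the slide's first string row by row.

```python
from itertools import permutations

def matrix_string(adj, order):
    """Adjacency matrix written row by row under the given node order."""
    return "".join(str(adj[i][j]) for i in order for j in order)

def canonical_string(adj):
    """Maximal string over all node orders (exponential, small graphs only)."""
    return max(matrix_string(adj, p) for p in permutations(range(len(adj))))

# 0011 / 1001 / 0001 / 0000 from the slide, with nodes numbered from 0.
adj = [
    [0, 0, 1, 1],
    [1, 0, 0, 1],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

original = matrix_string(adj, (0, 1, 2, 3))   # "0011100100010000"
relabeled = matrix_string(adj, (1, 0, 2, 3))  # slide's node order (2,1,3,4)

# The same graph under a different labeling has the same canonical string.
perm = (2, 0, 3, 1)
adj2 = [[adj[perm[a]][perm[b]] for b in range(4)] for a in range(4)]
same_class = canonical_string(adj) == canonical_string(adj2)
```

Two subgraphs can then be grouped into one isomorphism class by using the canonical string as a dictionary key.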

  19. NAUTY • The world's fastest isomorphism testing program is Nauty, by Brendan D. McKay, Professor in the Research School of Computer Science, Australian National University. • Nauty (No AUTomorphisms, Yes?) is a set of efficient C language procedures to produce a canonically-labeled isomorph of the graph, for isomorphism testing. • It can test most graphs of less than 100 vertices in well under a second. • Nauty has been successfully ported to a variety of operating systems and C compilers. http://www.cs.sunysb.edu/~algorith/implement/nauty/implement.shtml McKay, B.D. Practical Graph Isomorphism, Congressus Numerantium, 30 (1981) 45-87

  20. Random graph generation • Switching operations are applied to the edges of the input network repeatedly, until the network is well randomized. This process does not change the vertex degrees.
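A minimal sketch of degree-preserving switching on a directed edge list, assuming the input has no duplicate edges or self-loops and at least two edges. The function names, the fixed number of attempted switches, and the rule of skipping switches that would create a self-loop or duplicate edge are assumptions for illustration, not details taken from the Kavosh paper.

```python
import random
from collections import Counter

def switch_edges(edges, n_switches, seed=None):
    """Randomize a directed edge list while keeping every in/out degree."""
    rng = random.Random(seed)
    edge_list = list(edges)
    edge_set = set(edge_list)
    for _ in range(n_switches):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed switch: a->b, c->d becomes a->d, c->b.
        if a == d or c == b:
            continue  # would create a self-loop
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a duplicate edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list

def degrees(edges):
    """Return (out-degree, in-degree) counters for an edge list."""
    return Counter(u for u, _ in edges), Counter(v for _, v in edges)

real_edges = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (2, 4)]
random_edges = switch_edges(real_edges, 100, seed=0)
```

Each accepted switch keeps the source of both edges and only exchanges their targets, so every vertex keeps its out-degree and in-degree.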

  21. Motif determination • Two statistical measures • Z-score: $Z(G_p) = (N_p - \bar{N}_{rand}) / \sigma$, where $N_p$ is the number of times motif $G_p$ occurs in the input network, $\bar{N}_{rand}$ is the mean number of times $G_p$ occurs in the random networks, and $\sigma$ is the standard deviation of that count over the random networks. The larger the Z-score, the more significant the network motif. • P-value: the number of random networks in which motif $G_p$ occurs more often than in the input (biological) network, divided by the total number of random networks. The P-value ranges from 0 to 1. The smaller the P-value, the more significant the network motif.
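Both measures can be computed directly from the motif's count in the input network and its counts in the random networks. The sketch below assumes the population standard deviation (the slide does not say which estimator is used) and follows the slide's strict "more often" wording for the P-value; the function names and sample counts are hypothetical.

```python
from statistics import mean, pstdev

def z_score(n_real, random_counts):
    """(count in input network - mean random count) / standard deviation."""
    sigma = pstdev(random_counts)  # assumption: population std deviation
    if sigma == 0:
        raise ValueError("Z-score is undefined when all random counts agree")
    return (n_real - mean(random_counts)) / sigma

def p_value(n_real, random_counts):
    """Fraction of random networks where the motif occurs more often."""
    more_often = sum(1 for c in random_counts if c > n_real)
    return more_often / len(random_counts)

# Hypothetical counts of one subgraph class in 8 random networks.
counts = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_score(6, counts)   # mean 5, standard deviation 2
p = p_value(6, counts)   # 2 of the 8 counts exceed 6
```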

  22. Parameters in Kavosh The following parameters are used to identify a network motif in the Kavosh paper • The frequency (in the real graph) is larger than 4 • Using 1000 randomized networks, P-value < 0.01 • Using 1000 randomized networks, Z-score > 1.0
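Taken together, the three thresholds amount to a simple filter. In this sketch the default values come from the slide, while the function name `is_motif` and its parameter names are hypothetical.

```python
def is_motif(frequency, p_value, z_score,
             min_frequency=4, max_p=0.01, min_z=1.0):
    """Apply the slide's three thresholds to one candidate subgraph class."""
    return (frequency > min_frequency
            and p_value < max_p
            and z_score > min_z)

accepted = is_motif(frequency=12, p_value=0.0, z_score=3.2)
rejected = is_motif(frequency=4, p_value=0.0, z_score=3.2)  # frequency not > 4
```

All three comparisons are strict, matching the "larger than", "<" and ">" wording above.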

  23. Performance

  24. Performance • E. coli gene regulatory network • Node number: 672 • Edge number: 1276

  25. Performance

  26. Performance

  27. Contribution • Designed a new algorithm to find network motifs in both directed and undirected networks, size > 8 • A new method to enumerate all the subgraphs

  28. Discussions • Are there any better algorithms for isomorphism testing? (LEDA Technical report)

  29. Discussions • What is the bottleneck of this algorithm? • Enumeration of subgraphs: computing the combinations is exponential • Calculation of the canonical labeling for all the subgraphs

  30. Discussions • Is it unbelievable?
