
V9 - visualize cellular interaction data






Presentation Transcript


  1. Bioinformatics III "Systems biology", "Integrative cell biology". Summary Part 2: Lectures 9-16

  2. V9 - visualize cellular interaction data, e.g.:
     • protein interaction data (undirected): nodes – proteins, edges – interactions
     • metabolic pathways (directed): nodes – substances, edges – reactions
     • regulatory networks (directed): nodes – transcription factors + regulated proteins, edges – regulatory interactions
     • co-localization (undirected): nodes – proteins, edges – co-localization information
     • homology (undirected/directed): nodes – proteins, edges – sequence similarity (BLAST score)

  3. Force-directed algorithm for graph layout. Various graph layout algorithms have been developed to solve this visualisation task. In 1984, Peter Eades proposed a graph layout heuristic [A heuristic for graph drawing. Congressus Numerantium, 42:149-160, 1984] called the "Spring Embedder" algorithm: edges are replaced by springs, and vertices are replaced by rings that connect the springs. A layout is found by simulating the dynamics of this physical system. This method and other methods that compute the layout by similar simulations are called "force-directed" algorithms. http://www.hpc.unm.edu/~sunls/research/treelayout/node1.html

  4. Force-directed algorithm. The edges can be modeled as gravitational (or electrostatic) attraction, while all nodes repel each other electrically. The system may also simulate unnatural forces acting on the bodies that have no direct physical analogy, for example using a logarithmic distance measure rather than a Euclidean one. http://www.it.usyd.edu.au/~aquigley/3dfade/
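As an illustration of the spring model, here is a minimal 2D sketch in the spirit of Eades' Spring Embedder. It assumes a logarithmic spring force c1·log(d/c2) between adjacent nodes (the "logarithmic distance measure" above) and an inverse-square repulsion c3/d² between non-adjacent nodes. The function name `spring_embedder` and the constants (c1 = 2, c2 = 1, c3 = 1, c4 = 0.1, 100 iterations) are assumptions chosen for illustration, not values given on the slides.

```python
import math
import random


def spring_embedder(nodes, edges, iterations=100, c1=2.0, c2=1.0, c3=1.0, c4=0.1, seed=0):
    """Eades-style spring embedder in 2D (sketch; constants are assumptions).

    nodes: list of node ids, edges: iterable of (u, v) pairs.
    Returns {node: [x, y]}.
    """
    rng = random.Random(seed)
    pos = {v: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for v in nodes}
    adjacent = {frozenset(e) for e in edges}
    for _ in range(iterations):
        force = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx = pos[v][0] - pos[u][0]
                dy = pos[v][1] - pos[u][1]
                d = max(math.hypot(dx, dy), 1e-9)  # avoid division by zero
                if frozenset((u, v)) in adjacent:
                    f = c1 * math.log(d / c2)  # spring: attracts if d > c2, pushes apart if d < c2
                else:
                    f = -c3 / d ** 2  # inverse-square repulsion between non-adjacent nodes
                # positive f pulls u towards v and v towards u
                fx, fy = f * dx / d, f * dy / d
                force[u][0] += fx
                force[u][1] += fy
                force[v][0] -= fx
                force[v][1] -= fy
        # move every node a small step (c4) along its net force
        for v in nodes:
            pos[v][0] += c4 * force[v][0]
            pos[v][1] += c4 * force[v][1]
    return pos
```

For a single edge, the two nodes settle at the natural spring length c2, where the logarithmic force vanishes. Every step visits all node pairs, which is exactly the quadratic cost discussed on the "Scaling" slide.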

  5. Force-directed algorithm. Because of the underlying analogy to a physical system, force-directed graph layout methods tend to meet various aesthetic standards, such as efficient space filling, uniform edge lengths (when equal weights and repulsions are used), symmetry, and the capability of rendering the layout process as a smooth animation (visual continuity). With these nice features, force-directed graph layout has become the "workhorse" of layout algorithms. It has been successfully adapted to many domains, with variations in implementation. http://www.hpc.unm.edu/~sunls/research/treelayout/node1.html

  6. Scaling. Force-directed layout methods commonly have computational scaling problems. When the graph has more than a few thousand vertices, the running time of the layout computation can become unacceptable. This is because in each step of the simulation the repulsive force between each pair of unconnected vertices needs to be computed, costing a running time of $O(0.5\,V^2 - E)$, where V is the number of vertices and E the number of edges in the graph. This complexity is hard to escape for general graphs without hierarchical structure. http://www.hpc.unm.edu/~sunls/research/treelayout/node1.html

  7. Protein interaction graphs. Most protein interaction data have the following characteristics: (1) when visualized as a graph, the data yield a disconnected graph with many connected components; (2) the data yield a nonplanar graph with a large number of edge crossings that cannot be removed in a 2D drawing; (3) the number of interactions per protein varies widely within the same data set (degree distribution P(k)); (4) the data often contain protein interactions corresponding to self-loops. → This demands a robust layout algorithm. Ju et al. Bioinformatics 19, 317 (2003)

  8. InterViewer: example of a force-directed layout algorithm. InterViewer does not place the initial nodes randomly but on the surface of a sphere, and it uses a fixed number of iterations. The original algorithm has complexity $O(N^2)$ per time step, with N the number of nodes. Using multipole methods, this can be reduced to $O(N \log N)$. Time may also be saved by introducing a cut-off, e.g. only computing interactions with nodes in neighboring cells, and by updating the neighbor list only infrequently. Ju et al. Bioinformatics 19, 317 (2003)
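The cut-off idea can be sketched with a uniform grid (a "cell list"): nodes are binned into square cells whose side equals the cut-off radius, so any partner closer than the cut-off must lie in the same cell or one of the 8 adjacent cells. This is a hypothetical 2D helper (`pairs_within_cutoff`) written for illustration, not InterViewer's code; node ids are assumed to be comparable (e.g. integers).

```python
import math
from collections import defaultdict
from itertools import product


def pairs_within_cutoff(pos, cutoff):
    """Return all node pairs (u, v), u < v, closer than `cutoff`, using a cell list.

    pos: {node: (x, y)}. Only pairs in the same or adjacent grid cells are tested,
    instead of all O(N^2) pairs.
    """
    cells = defaultdict(list)
    for v, (x, y) in pos.items():
        cells[(math.floor(x / cutoff), math.floor(y / cutoff))].append(v)
    pairs = set()
    for (cx, cy), members in cells.items():
        # a partner closer than the cut-off differs by at most 1 in each cell index
        for dx, dy in product((-1, 0, 1), repeat=2):
            for u in members:
                for v in cells.get((cx + dx, cy + dy), ()):
                    if u < v and math.dist(pos[u], pos[v]) < cutoff:
                        pairs.add((u, v))
    return pairs
```

In a layout loop, such a neighbor list would be rebuilt only every few iterations and reused in between, matching the "update neighbor list infrequently" advice.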

  9. Aim: analyze and visualize homologies across the protein universe :-) 50 genomes → 145,579 proteins → $21 \times 10^9$ BLASTP pairwise sequence comparisons. Expectation: fusion proteins ("Rosetta Stone" proteins) will link proteins of related function. Need to visualize an extremely large network! Develop a stepwise scheme.

  10. LGL. (1) Separate the original network into connected sets. (2) Generate coordinates for each node in each connected set (using a force-directed layout algorithm and a recipe for the sequential layout of nodes guided by a minimum spanning tree of the network). (3) Integrate the connected sets into one coordinate system via a funnel process: the connected sets are sorted by number of vertices in descending order. The first connected set is placed at the bottom of a potential funnel; the other sets are placed one at a time on the rim of the funnel and allowed to fall towards the bottom, where they are frozen in space upon collision with the previously placed sets. In the following, we concentrate on step (2). Adai et al. J. Mol. Biol. 340, 179 (2004)
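Step (1) and the ordering used in step (3) can be sketched with a breadth-first search. The function name `connected_sets` and the adjacency-dict input format (node → set of neighbors) are assumptions for illustration; the funnel simulation itself is not shown.

```python
from collections import deque


def connected_sets(adj):
    """Split an undirected graph {node: set of neighbours} into connected sets,
    sorted by number of vertices in descending order (largest first)."""
    seen, components = set(), []
    for start in adj:
        if start in seen:
            continue
        # breadth-first search collects everything reachable from `start`
        seen.add(start)
        comp, queue = [], deque([start])
        while queue:
            u = queue.popleft()
            comp.append(u)
            for w in adj[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(comp)
    components.sort(key=len, reverse=True)  # order in which sets enter the funnel
    return components
```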

  11. Minimum Spanning Tree. Given: an undirected graph G = (V,E) where for each edge (u,v) ∈ E there is a weight w(u,v) specifying the cost to connect u and v. Find an acyclic subset T ⊆ E that connects all of the nodes and whose total weight is minimized. Popular algorithms are those by Kruskal and Prim. Both are greedy algorithms that make the choice that looks best at the moment. Greedy strategies do not in general guarantee a globally optimal solution, but for the MST problem both Kruskal's and Prim's algorithms provably yield a minimum spanning tree [Cormen]

  12. Kruskal's algorithm. Edges are considered in order of increasing weight. The arrow points to the edge under consideration at each step. [Cormen]

  13. Kruskal's algorithm (II). Running time: O(E log V) [Cormen]
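A minimal sketch of Kruskal's algorithm with a union-find structure, assuming edges are given as (weight, u, v) tuples. Sorting the edges dominates the running time, giving O(E log E) = O(E log V).

```python
def kruskal(nodes, weighted_edges):
    """Minimum spanning forest: scan edges by increasing weight and keep an edge
    whenever it joins two different trees (tracked with union-find).

    weighted_edges: iterable of (w, u, v). Returns a list of (u, v, w).
    """
    parent = {v: v for v in nodes}

    def find(v):
        # follow parent pointers to the tree's representative (with path halving)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    tree = []
    for w, u, v in sorted(weighted_edges):
        ru, rv = find(u), find(v)
        if ru != rv:  # u and v lie in different trees: the edge creates no cycle
            parent[ru] = rv
            tree.append((u, v, w))
    return tree
```

Usage: for edges a-b (1), b-c (2), a-c (4), c-d (5), b-d (7), the edge a-c is skipped because a and c are already connected, and the resulting tree has total weight 8.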

  14. Intuitive description of LGL. Successive iterations of the layout. The MST determines the order of placement of the nodes. The root node could be chosen randomly or based on its centrality in the network (e.g. minimizing the sum of distances to all other nodes). All other nodes are assigned a level according to their edge-based distance from the root node in the MST. Level-one vertices (red circles) are placed randomly on a sphere around the root node (black circle). The system is allowed to iterate through time, subject to attractive and repulsive forces, until at rest. Level-two nodes (blue circles) are placed randomly on spheres directed away from the current layout. Again, the system is allowed to evolve through time until at rest. This process is iterated for the entire graph. Adai et al. J. Mol. Biol. 340, 179 (2004)
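The placement order can be sketched as follows, assuming the centrality criterion given above (minimum sum of edge distances to all other nodes, computed in the network) and levels measured in the MST. The names `bfs_distances` and `lgl_levels` are hypothetical; the force simulation between levels is not shown.

```python
from collections import deque


def bfs_distances(adj, source):
    """Edge-count distances from `source` in a graph {node: set of neighbours}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def lgl_levels(graph_adj, tree_adj):
    """Choose the most central node of the network as root, then give every
    node a level = its edge distance from the root in the MST."""
    root = min(graph_adj, key=lambda v: sum(bfs_distances(graph_adj, v).values()))
    return root, bfs_distances(tree_adj, root)
```

Nodes with level 1 would be placed first around the root, then level 2, and so on.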

  15. What is the role of fusion proteins? A protein homology map summarizes the results of billions of sequence comparisons by modeling the proteins as vertices in a network, and the statistically significant sequence similarities as edges connecting the relevant proteins. In this manner, proteins within a sequence family (such as A, A′, A″, and AB; or B, B′ and AB) are all or mostly connected to each other, forming a cluster in the map. Fusion proteins (such as AB) serve to connect their component proteins' families. The structure of the resulting map reflects historic genetic events, such as gene fusions, fissions, and duplications, which are responsible for producing the modern-day genes. The map simultaneously represents homology relationships (edges), remote homologies (proteins not directly connected but in the same cluster), and non-homologous functional relationships (adjacent clusters and clusters linked by fusion proteins). Adai et al. J. Mol. Biol. 340, 179 (2004)

  16. LGL algorithm for very large biological networks. The complete protein homology map: a layout of the entire protein homology map, a total of 11,516 connected sets containing 111,604 proteins (vertices) with 1,912,684 edges. The largest connected set is shown more clearly in the inset and is enlarged further on the right side. Adai et al. J. Mol. Biol. 340, 179 (2004)

  17. Functionally related gene families form adjacent clusters. Three examples illustrate the spatial localization of protein function in the map: (A) the linkage of the tryptophan synthase α family to the functionally coupled but non-homologous β family by the yeast tryptophan synthase αβ fusion protein; (B) protein subunits of the pyruvate synthase and alpha-ketoglutarate ferredoxin oxidoreductase complexes; (C) metabolic enzymes, particularly those of acetyl CoA and amino acid metabolism. Adai et al. J. Mol. Biol. 340, 179 (2004)

  18. Colocalization. Neighboring proteins tend to be in the same cellular system. The tendency of proteins to operate in the same cellular system, measured as the percentage of matching assignments to the 18 COG database pathways, is plotted against their spatial separation in multiples of a typical cluster size. The functional similarity decays exponentially with distance, approximately as $e^{-0.26d}$, where d is the distance in units of a typical cluster diameter. Adai et al. J. Mol. Biol. 340, 179 (2004)

  19. Modularity in molecular networks? A functional module is, by definition, a discrete entity whose function is separable from those of other modules. This separation depends on chemical isolation, which can originate from spatial localization or from chemical specificity. E.g. a ribosome concentrates the reactions involved in making a polypeptide into a single particle, thus spatially isolating its function. A signal transduction system is an extended module that achieves its isolation through the specificity of the initial binding of the chemical signal to receptor proteins, and of the interactions between signalling proteins within the cell. Hartwell et al. Nature 402, C47 (1999)

  20. Modularity in molecular networks Modules can be insulated from or connected to each other. Insulation allows the cell to carry out many diverse reactions without cross-talk that would harm the cell. Connectivity allows one function to influence another. The higher-level properties of cells, such as their ability to integrate information from multiple sources, will be described by the pattern of connections among their functional modules. Hartwell et al. Nature 402, C47 (1999)

  21. Organization of large-scale molecular networks • Organization of molecular networks revealed by large-scale experiments: • power-law distribution of the node degree k (i.e. the number of edges of a node), $P(k) \propto k^{-\gamma}$ • small-world property (i.e. a high clustering coefficient and a short path between every pair of nodes) • anticorrelation in the node degree of connected nodes (i.e. highly interacting nodes tend to be connected to low-interacting ones) • These properties become evident when hundreds or thousands of molecules and their interactions are studied together. • At the other end of the spectrum: recently discovered motifs that consist of 3-4 nodes.

  22. Mesoscale properties of networks. Most relevant processes in biological networks correspond to the mesoscale (5-25 genes or proteins), not to the entire network. However, it is computationally enormously expensive to study mesoscale properties of biological networks: e.g. a network of 1000 nodes contains on the order of $10^{23}$ possible 10-node sets. Spirin & Mirny analyzed a combined network of protein interactions with data from CELLZOME, MIPS and BIND: 6500 interactions.
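The size of this search space is a binomial coefficient and can be checked directly with Python's standard library:

```python
import math

# number of distinct 10-node subsets of a 1000-node network: C(1000, 10)
n_sets = math.comb(1000, 10)
print(f"{n_sets:.2e}")  # prints 2.63e+23
```

Exhaustive enumeration of mesoscale node sets is therefore out of reach, which motivates the heuristic methods below.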

  23. Identify connected subgraphs. The network of protein interactions is typically presented as an undirected graph with proteins as nodes and protein interactions as undirected edges. Aim: identify highly connected subgraphs (clusters) that have more interactions within themselves and fewer with the rest of the graph. A fully connected subgraph, or clique, that is not part of any other clique is an example of such a cluster. In general, clusters need not be fully connected. Measure the density of connections by $Q = \frac{2m}{n(n-1)}$, where n is the number of proteins in the cluster and m is the number of interactions between them. Spirin, Mirny, PNAS 100, 12123 (2003)
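The density Q compares the number of edges inside a cluster with the maximum possible number n(n-1)/2, so a clique has Q = 1. A minimal sketch, assuming an adjacency dict without self-loops (`cluster_density` is a hypothetical name):

```python
def cluster_density(nodes, adj):
    """Q = 2m / (n(n-1)): m = interactions inside the cluster, n = cluster size.

    adj: {node: set of neighbours}, assumed to contain no self-loops.
    """
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # every internal edge is seen from both endpoints, hence the // 2
    m = sum(1 for u in nodes for w in adj[u] if w in nodes) // 2
    return 2 * m / (n * (n - 1))
```

Usage: for a triangle 1-2-3 with a pendant node 4 attached to 3, the triangle has Q = 1, while all four nodes together have m = 4 and Q = 8/12 ≈ 0.67.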

  24. (method I) Identify all fully connected subgraphs (cliques). Generally, finding all cliques of a graph is an NP-hard problem. Because the protein interaction graph is so far very sparse (the number of interactions (edges) is similar to the number of proteins (nodes)), this can be done quickly. To find cliques of size n, one only needs to enumerate the cliques of size n-1. The search for cliques starts with n = 4: pick all pairs of known edges (6500 × 6500 pairs of protein interactions) successively. For every pair A-B and C-D, check whether there are edges between A and C, A and D, B and C, and B and D. If these edges are present, ABCD is a clique. For every clique ABCD identified, pick all known proteins successively. For every picked protein E, if all of the interactions E-A, E-B, E-C and E-D are known, then ABCDE is a clique of size 5. Continue for n = 6, 7, ... The largest clique found in the protein-interaction network has size 14. Spirin, Mirny, PNAS 100, 12123 (2003)
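The level-by-level growth can be sketched as below. As a simplification (an assumption, not the slide's exact procedure), the search is seeded with single edges (cliques of size 2) rather than 4-cliques built from pairs of edges, and each clique of size n-1 is extended by every common neighbor of its members. `cliques_by_size` is a hypothetical name.

```python
def cliques_by_size(adj, max_size=None):
    """Enumerate all cliques level by level.

    adj: {node: set of neighbours}. Returns {size: set of frozensets}.
    Every clique of size n is a clique of size n-1 plus a vertex adjacent to
    all of its members.
    """
    level = {frozenset((u, w)) for u in adj for w in adj[u] if u != w}
    result = {}
    size = 2
    while level and (max_size is None or size <= max_size):
        result[size] = level
        nxt = set()
        for clique in level:
            # candidates: vertices adjacent to every member of the clique
            common = set.intersection(*(set(adj[u]) for u in clique)) - clique
            for v in common:
                nxt.add(clique | {v})  # frozensets remove duplicate orderings
        level = nxt
        size += 1
    return result
```

In a sparse graph the common-neighbor sets are small, which is why this kind of enumeration stays fast in practice.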

  25. (I) Identify all fully connected subgraphs (cliques). These results, however, include many redundant cliques. For example, the clique of size 14 contains 14 cliques of size 13. To find all nonredundant subgraphs, mark all proteins of the clique of size 14 and, out of all subgraphs of size 13, keep those that contain at least one unmarked protein. After all redundant cliques of size 13 are removed, proceed to remove redundant cliques of size 12, etc. In total, only 41 nonredundant cliques with sizes 4-14 were found. Spirin, Mirny, PNAS 100, 12123 (2003)
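Removing redundant cliques amounts to keeping only cliques that are not contained in a larger one. A minimal sketch that, like the procedure above, works from the largest size downwards (`nonredundant` is a hypothetical name):

```python
def nonredundant(cliques):
    """Keep only cliques not contained in a larger kept clique.

    Cliques are processed from largest to smallest, so a smaller clique is
    dropped as soon as all of its proteins are already covered by one kept clique.
    """
    kept = []
    for c in sorted(map(frozenset, cliques), key=len, reverse=True):
        if not any(c <= k for k in kept):
            kept.append(c)
    return kept
```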

  26. (method II) Superparamagnetic Clustering (SPC). SPC uses an analogy to the physical properties of an inhomogeneous ferromagnetic model to find tightly connected clusters in a large graph. Every node of the graph is assigned a Potts spin variable $S_i = 1, 2, \ldots, q$. The value of this spin variable $S_i$ undergoes thermal fluctuations, which are determined by the temperature T and the spin values of the neighboring nodes. Energetically, two nodes connected by an edge are favored to have the same spin value. Therefore, the spin at each node tends to align with the majority of its neighbors. When such a Potts spin system reaches equilibrium at a given temperature T, a high correlation between the fluctuating $S_i$ and $S_j$ at nodes i and j indicates that nodes i and j belong to the same cluster. Spirin, Mirny, PNAS 100, 12123 (2003)

  27. (II) Superparamagnetic Clustering (SPC). The protein-interaction network is represented by a graph in which every pair of interacting proteins is connected by an edge of length 1. The simulations are run for temperatures ranging from 0 to 1 in units of the coupling strength. The network splits into monomers at temperatures between 0.7 and 0.8, whereas larger clusters exist only at temperatures between 0.1 and 0.7. Clusters are recorded at all temperature values. The overlapping clusters are then merged and redundant ones are removed. Spirin, Mirny, PNAS 100, 12123 (2003)
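The Potts picture can be sketched with a small Monte Carlo simulation that estimates the spin-spin correlation $G_{ij} = \frac{q\,P(S_i = S_j) - 1}{q - 1}$, which is close to 1 for nodes in the same cluster and close to 0 otherwise. This is a simplified sketch under stated assumptions: it uses single-spin Metropolis updates instead of the Swendsen-Wang cluster updates of the original SPC method, coupling J = 1, and q = 5 as an arbitrary default; `potts_correlations` is a hypothetical name.

```python
import math
import random


def potts_correlations(adj, q=5, temperature=0.5, sweeps=4000, burn_in=500, seed=0):
    """Metropolis sampling of a q-state Potts model on a graph.

    Energy H = -sum over edges of delta(S_i, S_j) (J = 1, T in units of J).
    adj: {node: set of neighbours}. Returns {(i, j): G_ij} for all pairs i < j.
    """
    if q < 2 or sweeps <= burn_in:
        raise ValueError("need q >= 2 and sweeps > burn_in")
    rng = random.Random(seed)
    nodes = sorted(adj)
    spin = {v: rng.randrange(q) for v in nodes}
    same = {(a, b): 0 for i, a in enumerate(nodes) for b in nodes[i + 1:]}
    samples = 0
    for sweep in range(sweeps):
        for v in nodes:
            new = rng.randrange(q - 1)
            if new >= spin[v]:
                new += 1  # uniform proposal among the q - 1 other spin values
            # energy change = aligned neighbours before - aligned neighbours after
            delta = sum(spin[w] == spin[v] for w in adj[v]) - sum(spin[w] == new for w in adj[v])
            if delta <= 0 or rng.random() < math.exp(-delta / temperature):
                spin[v] = new
        if sweep >= burn_in:
            # accumulate how often each pair of spins agrees
            samples += 1
            for a, b in same:
                same[(a, b)] += spin[a] == spin[b]
    return {pair: (q * count / samples - 1) / (q - 1) for pair, count in same.items()}
```

On two disjoint triangles, pairs inside a triangle end up strongly correlated while pairs across triangles fluctuate independently; scanning the temperature, as on the slide, shows at which T such clusters break up.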

  28. (method III) Monte Carlo Simulation. Use MC to find a tight subgraph of a predetermined number of nodes M. At time t = 0, a random set of M nodes is selected. For each pair of nodes i, j from this set, the shortest path $L_{ij}$ between i and j on the graph is calculated. Denote the sum of all shortest paths $L_{ij}$ within this set as $L_0$. At every time step, one of the M nodes is picked at random, and one node is picked at random from all its neighbors. The new sum of all shortest paths, $L_1$, is calculated as if the original node were replaced by this neighbor. If $L_1 < L_0$, accept the replacement with probability 1. If $L_1 > L_0$, accept the replacement with probability $\exp\left(-\frac{L_1 - L_0}{T}\right)$, where T is the effective temperature. Spirin, Mirny, PNAS 100, 12123 (2003)

  29. (III) Monte Carlo Simulation. Every tenth time step, an attempt is made to replace one of the nodes of the current set with a node that has no edges to the current set, to avoid getting caught in an isolated disconnected subgraph. This process is repeated (i) until the set converges to a complete subgraph, or (ii) for a predetermined number of steps, after which the tightest subgraph (the subgraph with the smallest $L_0$) is recorded. The recorded clusters are merged and redundant clusters are removed. Spirin, Mirny, PNAS 100, 12123 (2003)
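The two slides above can be sketched as a Metropolis search over node sets. Assumptions not stated in the source: the jump move of every tenth step goes through the same acceptance rule, disconnected node pairs get a finite penalty distance equal to the number of nodes, and only stopping criterion (ii) (a fixed number of steps) is implemented. The function names are hypothetical.

```python
import math
import random
from collections import deque


def all_pairs_hops(adj):
    """Shortest-path lengths (number of edges) between all node pairs, via BFS."""
    dist = {}
    for source in adj:
        d = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[source] = d
    return dist


def mc_tight_subgraph(adj, m, temperature, steps=5000, seed=0):
    """Metropolis search for m nodes with minimal summed pairwise shortest paths.

    adj: {node: set of neighbours}. Returns (best_set, best_total_length).
    """
    rng = random.Random(seed)
    nodes = sorted(adj)
    dist = all_pairs_hops(adj)
    far = len(nodes)  # assumed penalty for pairs in different components

    def total_length(subset):
        s = sorted(subset)
        return sum(dist[a].get(b, far) for i, a in enumerate(s) for b in s[i + 1:])

    current = set(rng.sample(nodes, m))
    l0 = total_length(current)
    best, best_l = set(current), l0
    for t in range(1, steps + 1):
        old = rng.choice(sorted(current))
        if t % 10 == 0:
            # every tenth step: try a node with no edges to the current set
            pool = [v for v in nodes if v not in current and not (adj[v] & current)]
        else:
            # otherwise: try a neighbour of the picked node
            pool = [w for w in sorted(adj[old]) if w not in current]
        if not pool:
            continue
        candidate = (current - {old}) | {rng.choice(pool)}
        l1 = total_length(candidate)
        # Metropolis rule: accept if not longer, else with probability exp(-(L1 - L0) / T)
        if l1 <= l0 or rng.random() < math.exp(-(l1 - l0) / temperature):
            current, l0 = candidate, l1
            if l0 < best_l:
                best, best_l = set(current), l0
    return best, best_l
```

Usage: on a 4-clique with a tail path attached, a search with M = 4 and T = 4 (T ≈ M, as suggested on the next slide) records the clique, whose six pairs all have distance 1 (total length 6).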

  30. Optimal temperature in MC simulation. For every cluster size there is an optimal temperature that gives the fastest convergence to the tightest subgraph. Time to find a clique of size 7, in MC steps per site, as a function of temperature T. The region of optimal temperature is shown in the inset. The required time increases sharply as the temperature goes to 0, but has a relatively wide plateau in the region 3 < T < 7. Simulations suggest that the choice of temperature T ≈ M is safe for any cluster size M. Spirin, Mirny, PNAS 100, 12123 (2003)

  31. Comparison of SPC and Monte Carlo methods. Comparison of clusters found with SPC (blue) and MC simulation (red). Reasonable overlap (about one third of all clusters are found by both methods), but the two methods seem complementary. Spirin, Mirny, PNAS 100, 12123 (2003)

  32. Comparison of SPC and Monte Carlo methods. The SPC method is best at detecting high-Q clusters with relatively few links to the outside world. An example is the TRAPP complex, a fully connected clique of size 10 with just 7 links to outside proteins. This cluster was perfectly detected by SPC, whereas the MC simulation found only smaller pieces of this cluster separately rather than the whole cluster. By contrast, MC simulations are better suited for finding very "outgoing" cliques. The Lsm complex, a clique of size 11, includes 3 proteins with more interactions outside the complex than inside. This complex was easily found by MC, but was not detected as a stand-alone cluster by SPC. Q: Why does the SPC method work particularly well for finding clusters with high Q values and few outside links, whereas the Monte Carlo method mainly finds "outgoing" cliques? Spirin, Mirny, PNAS 100, 12123 (2003)

  33. Merging overlapping clusters. A simple statistical test shows that nodes which have only one link to a cluster are statistically insignificant. Such statistically insignificant members are removed first. Then overlapping clusters are merged: for every cluster $A_i$, find all clusters $A_k$ that overlap with this cluster by at least one protein. For every such cluster, calculate the Q value of the possible merged cluster $A_i \cup A_k$. Record the cluster $A_{best(i)}$ that gives the highest Q value when merged with $A_i$. After the best match has been found for every cluster, every cluster $A_i$ is replaced by the merged cluster $A_i \cup A_{best(i)}$, unless the Q value of $A_i \cup A_{best(i)}$ is below a threshold value $Q_C$. This process continues until there are no more overlapping clusters or until merging any of the remaining clusters would yield a cluster with a Q value lower than $Q_C$. Spirin, Mirny, PNAS 100, 12123 (2003)
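The cleaning and merging procedure can be sketched as follows. Assumptions: clusters are sets of node ids, pruning is applied once before merging, a round cap (`max_rounds`) guards termination, and clusters that become contained in a larger merged cluster are dropped as redundant. All function names are hypothetical.

```python
def density(nodes, adj):
    """Q = 2m / (n(n-1)) for the subgraph induced by `nodes` (no self-loops assumed)."""
    n = len(nodes)
    if n < 2:
        return 0.0
    m = sum(1 for u in nodes for w in adj[u] if w in nodes) // 2
    return 2 * m / (n * (n - 1))


def prune(cluster, adj):
    """Drop members with only one link into the cluster (statistically insignificant)."""
    return frozenset(v for v in cluster if len(adj[v] & cluster) > 1)


def merge_overlapping(clusters, adj, q_c, max_rounds=100):
    """Replace each cluster A_i by A_i | A_best(i) while that union has Q >= q_c.

    adj: {node: set of neighbours}. Returns a set of frozensets.
    """
    clusters = {c for c in (prune(frozenset(c), adj) for c in clusters) if c}
    for _ in range(max_rounds):
        merged = set()
        for a in clusters:
            # partner A_best(i): the overlapping cluster giving the highest merged Q
            best, best_q = None, -1.0
            for b in clusters:
                if b != a and a & b:
                    q = density(a | b, adj)
                    if q > best_q:
                        best, best_q = b, q
            merged.add((a | best) if best is not None and best_q >= q_c else a)
        # clusters swallowed by a larger merged cluster are redundant
        merged = {c for c in merged if not any(c < d for d in merged)}
        if merged == clusters:  # nothing changed: no admissible merge left
            break
        clusters = merged
    return clusters
```

Usage: two 4-cliques inside a 5-clique merge into the full clique (Q = 1), while two triangles sharing a single node (a "bowtie", union Q = 0.6) stay separate for $Q_C = 0.9$ and merge for $Q_C = 0.5$.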

  34. Statistical significance of complexes and modules. Number of complete cliques (Q = 1) as a function of clique size, enumerated in the network of protein interactions (red) and in randomly rewired graphs (blue, averaged over >1,000 graphs in which the number of interactions of each protein is preserved). The inset shows the same plot on a log-normal scale. Note the dramatic enrichment in the number of cliques in the protein-interaction graph compared with the random graphs. Most of these cliques are parts of bigger complexes and modules. Spirin, Mirny, PNAS 100, 12123 (2003)
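Random graphs that preserve the number of interactions of each protein are commonly generated by repeated double-edge swaps. The sketch below assumes this swap scheme (the slide does not state how the rewiring was done); `rewire` is a hypothetical name.

```python
import random


def rewire(edges, n_swaps, seed=0, max_tries=None):
    """Degree-preserving randomization by double-edge swaps.

    Pick edges (a, b) and (c, d) and replace them by (a, d) and (c, b), unless the
    swap would create a self-loop or a duplicate edge. Every node keeps its degree.
    """
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    max_tries = max_tries or 100 * n_swaps
    done = tries = 0
    while done < n_swaps and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:
            c, d = d, c  # randomize the orientation of the second edge
        if len({a, b, c, d}) < 4:
            continue  # shared endpoint: swap could create a self-loop or multi-edge
        e1, e2 = frozenset((a, d)), frozenset((c, b))
        if e1 in edge_set or e2 in edge_set:
            continue  # swap would duplicate an existing edge
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {e1, e2}
        edge_list[i], edge_list[j] = (a, d), (c, b)
        done += 1
    return edge_list
```

Counting cliques (or Q values of clusters) in many such rewired graphs gives the random baseline shown in blue.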

  35. Statistical significance of complexes and modules. Distribution of Q of clusters found by the MC search method. Red bars: original network of protein interactions. Blue curves: randomly rewired graphs. Clusters in the protein network have many more interactions than their counterparts in the random graphs. Spirin, Mirny, PNAS 100, 12123 (2003)

  36. Discovered functional modules Examples of discovered functional modules. (A) A module involved in cell-cycle regulation. This module consists of cyclins (CLB1-4 and CLN2) and cyclin-dependent kinases (CKS1 and CDC28) and a nuclear import protein (NIP29). Although they have many interactions, these proteins are not present in the cell at the same time. (B) Pheromone signal transduction pathway in the network of protein–protein interactions. This module includes several MAPK (mitogen-activated protein kinase) and MAPKK (mitogen-activated protein kinase kinase) kinases, as well as other proteins involved in signal transduction. These proteins do not form a single complex; rather, they interact in a specific order. Spirin, Mirny, PNAS 100, 12123 (2003)

  37. Robustness of clusters found. Model the effect of false positives in experimental data: randomly rewire, remove or add 10-50% of the interactions in the network. Cluster recovery probability as a function of the fraction of altered links. Black curves correspond to the case where a fraction of links is rewired; red, removed; green, added. Circles represent the probability to recover 75% of the original cluster; triangles represent the probability to recover 50%. Noise in the form of removal or addition of links has a less deteriorating effect than random rewiring. About 75% of clusters can still be found when 10% of links are rewired. Spirin, Mirny, PNAS 100, 12123 (2003)

  38. Summary. Here, analysis of mesoscale properties demonstrated the presence of highly connected clusters of proteins in a network of protein interactions. This gives strong support to the suggested modular architecture of biological networks. Two types of clusters are distinguished: protein complexes and dynamic functional modules. Both complexes and modules have more interactions among their members than with the rest of the network. Dynamic modules are elusive to experimental purification because they are not assembled as a complex at any single point in time. Computational analysis allows detection of such modules by integrating pairwise molecular interactions that occur at different times and places. However, computational analysis alone does not allow one to distinguish between complexes and modules or between transient and simultaneous interactions.

  39. V10 Protein complexes and their shared components • Most cellular processes result from a cascade of events mediated by proteins that act in a cooperative manner. • Protein complexes can share components: proteins can be reused and participate in several complexes. • Methods for analyzing high-throughput protein interaction data have mainly used clustering techniques. • They have been applied to assign protein function by inference from the biological context given by a protein's interactors, and to identify complexes as dense regions of the network (see V9). • The logical organization into shared and specific components, and its representation, remain elusive. Gagneur et al. Genome Biology 5, R57 (2004)

  40. Shared components. Shared components (proteins or groups of proteins occurring in different complexes) are fairly common. A shared component may be a small part of many complexes, acting as a unit that is constantly reused for its function. It may also be the main part of the complex, e.g. in a family of variant complexes that differ from each other by distinct proteins that provide functional specificity. Aim: identify and properly represent the modularity of protein-protein interaction networks by identifying the shared components and the way they are arranged to generate complexes. Gagneur et al. Genome Biology 5, R57 (2004) Georg Casari, Cellzome (Heidelberg)

  41. Modules. A graph and its modules. Nodes connected by a link are called neighbors. In graph theory, a module is a set of nodes that have the same neighbors outside the module. In addition to the trivial modules {a}, {b}, ..., {g} and {a,b,c,...,g}, this graph contains the modules {a,b,c}, {a,b}, {a,c}, {b,c} and {e,f}. Gagneur et al. Genome Biology 5, R57 (2004)
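The module definition translates directly into a test: every vertex outside the set must be adjacent either to all of its members or to none of them. A minimal sketch (the name `is_module` and the adjacency-dict format are assumptions; the example graph from the figure is not reproduced here):

```python
def is_module(adj, module):
    """True if every vertex outside `module` is adjacent to all or to none of it.

    adj: {vertex: set of neighbours}.
    """
    module = set(module)
    for x in adj:
        if x in module:
            continue
        hits = len(adj[x] & module)
        if 0 < hits < len(module):  # x sees only part of the set: it distinguishes members
            return False
    return True
```

Usage: in the graph with edges a-c, b-c and c-d, the set {a, b} is a module (c sees both, d sees neither), while {a, c} is not (b is adjacent to c but not to a).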

  42. Quotient • Elements of a module have exactly the same neighbors outside the module → one can replace all of them by a single representative node. In a quotient, all elements of the module are replaced by the representative node, and their edges to the outside neighbors are replaced by edges to the representative. Quotients can be iterated until the entire graph is merged into a final representative node. Iterated quotients can be captured in a tree, where each node represents a module (the set of its descendant leaves), which is a subset of its parent. Gagneur et al. Genome Biology 5, R57 (2004)
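A single quotient step can be sketched as follows. It assumes the given set really is a module (the caller's responsibility) and that the representative's name `rep` does not collide with an existing vertex; `quotient` is a hypothetical name.

```python
def quotient(adj, module, rep):
    """Replace all vertices of `module` by one representative vertex `rep`.

    Because the module's members share the same outside neighbours, `rep`
    inherits exactly those neighbours; edges inside the module disappear.
    adj: {vertex: set of neighbours}. Returns a new adjacency dict.
    """
    module = set(module)
    outside = {w for u in module for w in adj[u]} - module
    new_adj = {}
    for v, nbrs in adj.items():
        if v in module:
            continue
        new_nbrs = nbrs - module
        if nbrs & module:  # v was adjacent to the module, so it is adjacent to rep
            new_nbrs = new_nbrs | {rep}
        new_adj[v] = set(new_nbrs)
    new_adj[rep] = outside
    return new_adj
```

Usage: in the graph with edges a-c, b-c and c-d, collapsing the module {a, b} into "ab" leaves the edges ab-c and c-d. Repeating such steps on ever larger modules yields the iterated quotients stored in the tree.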

  43. Modular decomposition. Modular decomposition of the example graph shown before. Modular decomposition gives a labeled tree that represents iterations of particular quotients, here the successive quotients on the modules {a,b,c} and {e,f}. The modular decomposition is a unique, canonical tree of iterated quotients (a formal proof exists: Möhring 1985). Gagneur et al. Genome Biology 5, R57 (2004)

  44. Modular decomposition. The nodes of the modular decomposition are labeled in 3 ways: as series when the direct descendants are all neighbors of each other, as parallel when the direct descendants are all non-neighbors of each other, and by the structure of the module otherwise (prime module case). Series nodes are labeled by an asterisk within a circle, parallel nodes by two parallel lines within a circle, and prime nodes by a P within a circle. A prime node is advantageously labeled by its structure. The graph can be retrieved from the tree on the right by recursively expanding the modules using the information in the labels. Therefore, the labeled tree can be seen as an exact alternative representation of the graph. Gagneur et al. Genome Biology 5, R57 (2004)
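The three labels can be sketched as a check on the children of a decomposition node. Assumption: the children are passed as vertex sets that are modules, so between two children either all or no edges are present; `label_module` is a hypothetical name.

```python
def label_module(adj, children):
    """Label a node of the modular decomposition from its child modules.

    'series' if every two children are completely joined, 'parallel' if no edge
    runs between any two children, otherwise 'prime'.
    adj: {vertex: set of neighbours}; children: list of vertex sets.
    """
    def joined(x, y):
        return all(w in adj[u] for u in x for w in y)

    def separated(x, y):
        return not any(w in adj[u] for u in x for w in y)

    pairs = [(x, y) for i, x in enumerate(children) for y in children[i + 1:]]
    if all(joined(x, y) for x, y in pairs):
        return "series"
    if all(separated(x, y) for x, y in pairs):
        return "parallel"
    return "prime"
```

Usage: the three vertices of a triangle give "series", isolated vertices give "parallel", and the path a-b-c-d (which has only trivial modules) gives "prime".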

  45. Results from protein complex purifications (PCP), e.g. TAP • Different types of data: • Y2H: detects direct physical interactions between proteins • PCP by tandem affinity purification with mass-spectrometric identification of the protein components identifies multi-protein complexes • Modular decomposition will have a different meaning due to the different semantics of such graphs. Here, the analysis focuses on PCP content. PCP experiment: a TAP tag is attached to a selected bait protein → the bait co-purifies with those proteins that co-occur with it in at least one complex. In the future, an integrated view combining both types of data would be preferable. Gagneur et al. Genome Biology 5, R57 (2004)

  46. Clique and maximal clique A clique is a fully connected sub-graph, that is, a set of nodes that are all neighbors of each other. In this example, the whole graph is a clique and consequently any subset of it is also a clique, for example {a,c,d,e} or {b,e}. A maximal clique is a clique that is not contained in any larger clique. Here only {a,b,c,d,e} is a maximal clique. Assuming complete datasets and ideal results, a permanent complex will appear as a clique. The opposite is not true: not every clique in the network necessarily derives from an existing complex. E.g. 3 connected proteins can be the outcome of a single trimer, 3 heterodimers or combinations thereof. Gagneur et al. Genome Biology 5, R57 (2004)

  47. Results from protein complex purifications (PCP), e.g. TAP. Interpretation of graph and module labels for systematic PCP experiments. (a) Two neighbors in the network are proteins occurring in the same complex. (b) Several potential sets of complexes can be the origin of the same observed network. Restricting interpretation to the simplest model (top right), the series module reads as a logical AND between its members. (c) A module labeled 'parallel' corresponds to proteins or modules working as strict alternatives with respect to their common neighbors. (d) The 'prime' case is a structure where neither of the two previous cases occurs. Gagneur et al. Genome Biology 5, R57 (2004)

  48. Obtain maximal cliques. Modular decomposition provides an instruction set to deliver all maximal cliques of a graph. In particular, when the decomposition has only series and parallel nodes, the maximal cliques are straightforwardly retrieved by traversing the tree recursively from top to bottom. A series module acts as a product: the maximal cliques are all the combinations made up of one maximal clique from each "child" node. A parallel module acts as a sum: the set of maximal cliques is the union of all maximal cliques of the "child" nodes. Gagneur et al. Genome Biology 5, R57 (2004)
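The product/sum rule can be sketched as a short recursion. The tree encoding is an assumption for illustration: a leaf is a vertex name (any non-tuple value), and an inner node is a tuple `(label, children)` with label "series" or "parallel". Prime nodes would need their quotient graph and are not handled.

```python
from itertools import product


def maximal_cliques(tree):
    """Maximal cliques from a modular decomposition tree with only series and
    parallel nodes. Returns a list of frozensets."""
    if not isinstance(tree, tuple):
        return [frozenset([tree])]  # a single vertex is its own maximal clique
    label, children = tree
    child_cliques = [maximal_cliques(c) for c in children]
    if label == "parallel":  # sum: union of the children's maximal cliques
        return [c for cliques in child_cliques for c in cliques]
    if label == "series":  # product: one maximal clique from every child
        return [frozenset().union(*combo) for combo in product(*child_cliques)]
    raise ValueError("prime nodes need their quotient graph; not handled here")
```

Usage: for a vertex a joined to two non-adjacent vertices b and c, i.e. `("series", ["a", ("parallel", ["b", "c"])])`, the maximal cliques are {a, b} and {a, c}.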

  49. This part has been shortened considerably; only the basic aspects of the algorithm are important. Consider an undirected graph G = (V,E) with n = |V| vertices and m = |E| edges. The complement of G is denoted by $\bar{G}$. If X is a subset of vertices, then G[X] is the subgraph of G induced by X. For an arbitrary vertex x, N(x) and $\bar{N}(x)$ denote its neighborhood and its non-neighborhood, respectively. A vertex x distinguishes two vertices u and v if $(x,u) \in E$ and $(x,v) \notin E$. A module M of a graph G is a set of vertices that is not distinguished by any vertex outside M.

  50. A simple linear algorithm for modular decomposition. The modules of a graph form a potentially exponentially sized family. However, the subfamily of strong modules, the modules that overlap no other module, has size O(n). A overlaps B if $A \cap B \neq \emptyset$, $A \setminus B \neq \emptyset$ and $B \setminus A \neq \emptyset$. The inclusion order of this family defines the modular decomposition tree explained previously, which is enough to store the module family of a graph. The root of this tree is the trivial module V and its n leaves are the trivial modules {x}, $x \in V$. Habib, de Montgolfier, Paul (2004)
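To make the definitions concrete, the following brute-force sketch enumerates all modules of a tiny graph and keeps the strong ones. It is exponential in the number of vertices and is not the linear-time algorithm of Habib et al.; it only illustrates the overlap definition. All names are hypothetical.

```python
from itertools import combinations


def is_module(adj, module):
    """No vertex outside `module` is adjacent to only part of it."""
    return all(len(adj[x] & module) in (0, len(module)) for x in adj if x not in module)


def overlaps(a, b):
    """A overlaps B if they intersect and neither contains the other."""
    return bool(a & b) and bool(a - b) and bool(b - a)


def strong_modules(adj):
    """Brute force over all non-empty vertex subsets (exponential!): keep the
    modules that overlap no other module. adj: {vertex: set of neighbours}."""
    vertices = list(adj)
    modules = [frozenset(s)
               for k in range(1, len(vertices) + 1)
               for s in combinations(vertices, k)
               if is_module(adj, frozenset(s))]
    return [m for m in modules if not any(overlaps(m, other) for other in modules)]
```

Usage: for three isolated vertices every subset is a module, yet only the 3 singletons and V are strong (e.g. {1, 2} overlaps {2, 3}); for the path a-b-c-d only the 4 singletons and V are modules at all, so 5 strong modules remain.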
