Graph, Search Algorithms

Graph, Search Algorithms. Ka-Lok Ng, Department of Bioinformatics, Asia University. Content. How to characterize a biological network? – Graph theory, topological parameters (node degree, average path length, clustering coefficient and node degree correlation).

Presentation Transcript


  1. Graph, Search Algorithms. Ka-Lok Ng, Department of Bioinformatics, Asia University

  2. Content. How to characterize a biological network? – Graph theory, topological parameters (node degree, average path length, clustering coefficient and node degree correlation) – Random graph, scale-free network, hierarchical network. Search algorithms – Breadth-first search, depth-first search

  3. Biological Networks - metabolic networks Metabolism is the most basic network of biochemical reactions, which generate energy for driving various cell processes, and degrade and synthesize many different bio-molecules.

  4. Biological Networks - Protein-protein interaction network (PIN). Proteins perform distinct and well-defined functions, but little is known about how interactions among them are structured at the cellular level. Protein-protein interactions account for binding interactions and the formation of protein complexes. - Experiment: yeast two-hybrid method, or co-immunoprecipitation. Limitation: no subcellular location or temporal information. Cliques: protein complexes? www.utoronto.ca/boonelab/proteomics.htm

  5. Biological Networks - PIN. Yeast protein-protein interaction network: protein-protein interactions are not random, and highly connected proteins are unlikely to interact with each other. Not a random network. • Data from the high-throughput two-hybrid experiment (T. Ito, et al. PNAS (2001)) • The full set contains 4549 interactions among 3278 yeast proteins • 87% of nodes are in the largest component • kmax ~ 285! • Figure shows nuclear proteins only

  6. Biological Networks – Gene regulation networks In a gene regulatory network, the protein encoded by a gene can regulate the expression of other genes, for instance, by activating or inhibiting DNA transcription. These genes in turn produce new regulatory proteins that control other genes. Example of a genetic regulatory network of two genes (a and b), each coding for a regulatory protein (A and B).

  7. Biological Networks – Gene regulation networks. Transcription regulatory network in H. sapiens: data courtesy of Ariadne Genomics, obtained from a literature search; 1449 regulations among 689 proteins. Transcription regulatory network in E. coli: data (courtesy of Uri Alon) curated from the Regulon database; 606 interactions between 424 operons (by 116 TFs). Transcription regulatory network in yeast: - from the YPD database: 1276 regulations among 682 proteins by 125 transcription factors (~10 regulated genes per TF) - part of a bigger genetic regulatory network of 1772 regulations among 908 proteins

  8. Graph Theory – Basic concepts. Graph: G = (N, E), N = {n1, n2, ..., nN}, E = {e1, e2, ..., eM}, ek = {ni, nj}. Nodes: proteins. Edges: protein interactions. Multigraph: ek = {ni, nj} plus duplicate edges, i.e. em = {ni, nj}. Nodes: proteins. Edges: interactions of different sorts (binding and similarity). Hypergraph: hyperedge ex = {ni, nj, nk, ...}. Nodes: proteins. Edges: protein complexes. Directed hypergraph: hyperedge ex = {ni, nj, ... | nk, nl, ...}. Nodes: substances. Edges: chemical reactions, e.g. A + B → C + D gives ex = {A, B, ... | C, D, ...}. Directed graph: ek = (ni, nj), an ordered pair. Nodes: genes and their products. Edges from A to B: gene regulation (gene A regulates expression of gene B). Different systems → different graphs

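  9. Graph Theory – Basic concepts. Node degree. Components. Complete graph (clique). Shortest path length. Clustering coefficient $C_i$: if A-B and B-C, then it is highly probable that A-C. Two ways to compute $C_i$: (1) $E_i$ actual connections among the $k_i$ neighbors of node $i$ out of $C^{k_i}_2 = k_i(k_i-1)/2$ possible connections, so $C_i = 2E_i / [k_i(k_i-1)]$; (2) the number of triangles that include $i$, divided by $k_i(k_i-1)/2$. Average clustering coefficient: $\langle C \rangle = \frac{1}{N}\sum_i C_i$.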
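
The first way of computing the clustering coefficient can be sketched in a few lines of Python. This is a minimal illustration, not the lecturer's code: the toy graph, the function names, and the convention that $C_i = 0$ for nodes with fewer than two neighbors are all assumptions made for the example.

```python
from itertools import combinations


def clustering_coefficient(adj, i):
    """C_i = 2 * E_i / (k_i * (k_i - 1)), E_i = edges among neighbours of i."""
    neighbours = adj[i]
    k = len(neighbours)
    if k < 2:
        # Assumed convention: a node with fewer than two neighbours gets C_i = 0.
        return 0.0
    # Count neighbour pairs that are themselves connected (triangles through i).
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def average_clustering(adj):
    """Mean of C_i over all nodes."""
    return sum(clustering_coefficient(adj, i) for i in adj) / len(adj)


# Hypothetical toy graph: triangle A-B-C plus a pendant node D attached to C.
toy = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}

for node in sorted(toy):
    print(node, clustering_coefficient(toy, node))
print("average:", average_clustering(toy))
```

In this toy graph node C has three neighbors but only one of the three possible neighbor pairs (A-B) is connected, so its coefficient is 1/3.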

  10. Graph Theory – Vertex adjacency matrix. Undirected graph (figure: four nodes labeled 1, 2, 3, 4) with degree column $k_i$ = 1, 3, 1, 1. - The matrix is symmetric - ∞ means not directly connected - node $i$ connectivity, $k_i = \mathrm{count}_j(m_{ij} = 1)$. Bipartite graph.
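
As a rough sketch of the matrix convention on this slide, the Python below builds a vertex adjacency matrix and counts the degrees. The edge list (1-2, 2-3, 2-4) is an assumption: the figure is not available, but this choice reproduces the degree column 1, 3, 1, 1. Putting 0 on the diagonal is also an assumption of the example.

```python
import math

INF = math.inf  # the slide's "not directly connected" marker


def adjacency_matrix(n, edges):
    """Symmetric matrix with 1 for an edge, INF otherwise, 0 on the diagonal."""
    m = [[0 if i == j else INF for j in range(n)] for i in range(n)]
    for a, b in edges:
        # Nodes are labelled from 1, list indices start at 0.
        m[a - 1][b - 1] = 1
        m[b - 1][a - 1] = 1
    return m


def degrees(m):
    """k_i = count_j(m_ij == 1)."""
    return [sum(1 for x in row if x == 1) for row in m]


# Assumed edges for the four-node example graph.
edges = [(1, 2), (2, 3), (2, 4)]
M = adjacency_matrix(4, edges)
k = degrees(M)

for row in M:
    print(row)
print("degrees:", k)
```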

  11. Graph Theory – Edge adjacency matrix. (Figure: a graph G with vertices 1 to 4 and edges a, b, c, d; its line graph L(G) with vertices a, b, c, d; and the symmetric matrix A(L(G)) = E(G).) The edge adjacency matrix (E) of a graph G is identical to the vertex adjacency matrix (A) of the line graph of G, L(G). That is, the edges in G are replaced by vertices in L(G). Two vertices in L(G) are connected whenever the corresponding edges in G are adjacent. Different labelings of the same graph G are related by a similarity transformation, $P^{-1} A(G_1) P = A(G_2)$, where $P$ is a permutation matrix.
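
The relation A(L(G)) = E(G) can be illustrated with a short sketch: two edges of G are adjacent when they share a vertex, and that is exactly when the corresponding vertices of L(G) are connected. The edge-to-vertex assignment below is hypothetical, since the slide's figure is not recoverable.

```python
def edge_adjacency_matrix(edges):
    """Return (labels, E) where E[p][q] = 1 if edges p and q share a vertex."""
    labels = sorted(edges)
    n = len(labels)
    E = [[0] * n for _ in range(n)]
    for p in range(n):
        for q in range(p + 1, n):
            if set(edges[labels[p]]) & set(edges[labels[q]]):
                E[p][q] = E[q][p] = 1
    return labels, E


# Hypothetical graph G: edge label -> (endpoint, endpoint).
G_edges = {"a": (1, 2), "b": (1, 3), "c": (2, 3), "d": (3, 4)}

labels, E = edge_adjacency_matrix(G_edges)
print("  " + " ".join(labels))
for label, row in zip(labels, E):
    print(label, " ".join(str(x) for x in row))
```

Read as a vertex adjacency matrix, E describes a graph on the vertices a, b, c, d, which is the line graph of this G.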

  12. Graph Theory – average network distance. Interaction path length or average network distance, d • the average of the distances between all pairs of nodes • frequency of the shortest interaction path length, f(L) • determined by using Floyd's algorithm. The average network diameter d is given by $d = \sum_L L\, f(L) / \sum_L f(L)$, where L is the shortest path length between two nodes. Network diameter (global) vs. average network distance (local).
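
A small sketch of how f(L) and d follow from a matrix of shortest path lengths. The distance matrix is written out by hand for an assumed four-node graph with edges 1-2, 2-3 and 2-4; in practice it would come from a shortest-path algorithm such as Floyd's. Function names are illustrative only.

```python
import math
from collections import Counter

INF = math.inf


def path_length_frequency(dist):
    """f(L): number of node pairs whose shortest path length is L."""
    n = len(dist)
    return Counter(
        dist[i][j]
        for i in range(n)
        for j in range(i + 1, n)  # each unordered pair once
        if dist[i][j] != INF      # skip disconnected pairs
    )


def average_distance(freq):
    """d = sum_L L * f(L) / sum_L f(L)."""
    return sum(L * f for L, f in freq.items()) / sum(freq.values())


# Shortest path lengths for the assumed graph (edges 1-2, 2-3, 2-4).
dist = [
    [0, 1, 2, 2],
    [1, 0, 1, 1],
    [2, 1, 0, 2],
    [2, 1, 2, 0],
]

freq = path_length_frequency(dist)
d = average_distance(freq)
diameter = max(freq)  # longest shortest path

print("f(L):", dict(freq))
print("average distance d:", d)
print("diameter:", diameter)
```

Here three pairs are at distance 1 and three at distance 2, so d = (3·1 + 3·2) / 6 = 1.5, while the diameter is 2.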

  13. Graph Theory – the shortest path. (Figure: nodes 1, 2, 3, 4 and a generic triple i, j, k.) The shortest path • Floyd algorithm, an $O(N^3)$ algorithm • For iteration n, given three nodes i, j and k, check whether it is shorter to reach j from i by passing through k: $M^{n}_{ij} = \min\{M^{n-1}_{ij},\, M^{n-1}_{ik} + M^{n-1}_{kj}\}$ - search for all possible paths, e.g. 1-2, 1-2-3, 1-2-4, 2-3, 2-4
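
The update rule can be sketched as a triple loop. This is a minimal version of the Floyd (Floyd-Warshall) algorithm that updates one matrix in place rather than keeping a separate matrix per iteration. The input graph (edges 1-2, 2-3, 2-4, matching the example paths listed above) is an assumption about the figure.

```python
import math

INF = math.inf


def floyd(m):
    """All-pairs shortest path lengths; m[i][j] is 0, an edge weight, or INF."""
    n = len(m)
    d = [row[:] for row in m]  # work on a copy, leave the input untouched
    for k in range(n):          # intermediate node
        for i in range(n):
            for j in range(n):
                # Is it shorter to reach j from i by passing through k?
                if d[i][k] + d[k][j] < d[i][j]:
                    d[i][j] = d[i][k] + d[k][j]
    return d


# Assumed four-node graph with edges 1-2, 2-3, 2-4 (unit weights).
M0 = [
    [0, 1, INF, INF],
    [1, 0, 1, 1],
    [INF, 1, 0, INF],
    [INF, 1, INF, 0],
]

D = floyd(M0)
for row in D:
    print(row)
```

The three nested loops over N nodes give the $O(N^3)$ running time quoted on the slide.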

  14. Random Graph Theory = Graph Theory + Probability

  15. Random Graph Theory = Graph Theory + Probability

  16. Random Graph Theory = Graph Theory + Probability. Random graph (Erdős and Rényi, 1960): N labeled nodes connected by n edges • $C^N_2 = N(N-1)/2$ possible edges • $C^{N(N-1)/2}_n$ possible graphs with N nodes and n edges • N = 4 → $C^6_n$ possible graphs. (Figure: example graphs with N = 4 and n = 3, 3, 4, 4, 5, 6.)
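
The counting argument is easy to check numerically. The sketch below uses Python's `math.comb`; the function names are made up for the illustration.

```python
from math import comb


def possible_edges(N):
    """Number of node pairs among N labeled nodes: N(N-1)/2."""
    return comb(N, 2)


def possible_graphs(N, n):
    """Number of labeled graphs with N nodes and exactly n edges."""
    return comb(possible_edges(N), n)


N = 4
print("possible edges:", possible_edges(N))
for n in range(possible_edges(N) + 1):
    print(f"n = {n}: {possible_graphs(N, n)} graphs")
```

For N = 4 there are 6 possible edges, hence 20 labeled graphs with n = 3 edges, 15 with n = 4, 6 with n = 5 and a single complete graph with n = 6.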

  17. Search Algorithms • Find the shortest route, in terms of distance, between nodes S and G. • A matrix representation of the graph in Figure 3.1

  18. Search Algorithms – Breadth-first search (BFS) • Nodes are expanded in the order in which they are generated. S is expanded into A, B and C, which are generated in the order 1, 2 and 3. • A is expanded first, into B, C and D, which have generation orders 4, 5 and 6. • BFS then goes back to node B and expands it into A, C and E (generation orders 7, 8 and 9), and then goes back to node 3 (C) and expands it into A, B, D, E and F (generation orders 10, 11, 12, 13 and 14).
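
A minimal BFS sketch in Python follows. It differs from the expansion described above in one respect: it records which nodes have already been generated, so no node is generated twice. The neighbor lists for S, A, B and C follow the slide; the connections of D, E, F and G are hypothetical, because Figure 3.1 is not available. On an unweighted graph like this one BFS minimizes the number of edges, not a weighted distance.

```python
from collections import deque


def bfs_path(graph, start, goal):
    """Return a path from start to goal with the fewest edges, or None."""
    parent = {start: None}  # also serves as the set of generated nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()  # expand nodes in the order they were generated
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph[node]:
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Neighbours of S, A, B, C follow the slide; the rest is assumed.
graph = {
    "S": ["A", "B", "C"],
    "A": ["S", "B", "C", "D"],
    "B": ["S", "A", "C", "E"],
    "C": ["S", "A", "B", "D", "E", "F"],
    "D": ["A", "C"],
    "E": ["B", "C", "G"],
    "F": ["C", "G"],
    "G": ["E", "F"],
}

print(bfs_path(graph, "S", "G"))
```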

  19. Search Algorithms – Depth-first search (DFS) • Begin from the root node of the tree • Visit the first unvisited node and mark it as visited • Then find the next unvisited node and mark it as visited • When all neighboring nodes have already been visited, go back to the parent node
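
The steps above can be sketched recursively, where returning from a recursive call plays the role of going back to the parent node. The tree and its node names are hypothetical, chosen only to show the visiting order.

```python
def dfs_order(graph, root):
    """Return the nodes in the order a depth-first search visits them."""
    visited = set()
    order = []

    def visit(node):
        visited.add(node)  # mark the node
        order.append(node)
        for nxt in graph[node]:
            if nxt not in visited:
                visit(nxt)  # go deeper before trying the next neighbour
        # All neighbours visited: returning here goes back to the parent.

    visit(root)
    return order


# Hypothetical tree rooted at R.
tree = {
    "R": ["A", "B"],
    "A": ["R", "C", "D"],
    "B": ["R", "E"],
    "C": ["A"],
    "D": ["A"],
    "E": ["B"],
}

print(dfs_order(tree, "R"))
```

The search exhausts the subtree under A (visiting C and D) before it returns to R and moves on to B and E. For very deep graphs an explicit stack would avoid Python's recursion limit.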

  20. Search Algorithms – Depth-first search (DFS)
