Speaker: Chuang Chieh Lin Advisor: Professor R. C. T. Lee National Chi-Nan University

How to Reconstruct a Large Genetic Network from n Gene Perturbations in fewer than n^2 Easy Steps. Andreas Wagner, Bioinformatics, vol. 17, no. 12, 2001, pp. 1183-1187.

Presentation Transcript


  1. How to Reconstruct a Large Genetic Network from n Gene Perturbations in fewer than n^2 Easy Steps. Andreas Wagner, Bioinformatics, vol. 17, no. 12, 2001, pp. 1183-1187. Speaker: Chuang Chieh Lin. Advisor: Professor R. C. T. Lee. CSIE in National Chi-Nan University

  2. Outline • Introduction and basic definitions • Graph theoretical framework • Parsimonious network • Algorithm and complexity • Cycles in genetic networks • Conclusions • References

  4. Introduction and basic definitions • Gene activity: whether a gene is expressed or not, and in what form, e.g. as mRNA or as protein. • Gene network: in this paper, a genetic network is a group of genes in which individual genes can influence the activity of other genes. • The core task of reconstructing genetic networks is to identify the causal structure of a gene network.

  5. To reconstruct a genetic network is to identify, for each gene in the network, which other genes’ activity that gene influences directly. • Now, let’s see an illustration of a genetic network.

  6. [Figure: a hypothetical biochemical pathway involving two transcription factors, a protein kinase and a protein phosphatase, as well as the genes encoding them (Gene 1 to Gene 5 on the DNA). Proteins are shown in inactive and active (phosphorylated, P) states.]

  7. Genetic perturbation: an experimental manipulation of gene activity by manipulating either a gene itself or its product. Examples include point mutations, gene deletions, and other interference with the activity of the product.

  8. [Figure: the hypothetical pathway of slide 6, with Gene 1 to Gene 5.] Two lists of perturbation effects for the same network. In both, the genetic perturbation is a gene deletion; they differ in the aspect of gene activity that is measured.
     Aspect of gene activity: mRNA expression
       G1: G2, G5
       G2: G5
       G3: G5
       G4: G5
       G5:
     Aspect of gene activity: phosphorylation state
       G1: G3, G4
       G2: G3, G4
       G3: G4
       G4:
       G5:

  10. Graph theoretical framework • As the previous example indicated, we are concerned with qualitative information on gene interactions. • We use a “digraph”, a graph representation of genetic networks, to capture this qualitative information. • A digraph is a directed graph consisting of nodes and directed edges. • Let’s see an example.

  11. We use a → b to mean that gene a influences the activity of gene b directly. For brevity, genes will be labeled by numbers from now on. [Figure: example digraph G on nodes 0 to 20.]

  12. Adjacency list: for each gene i, it shows which genes’ activity state gene i influences directly. • We write Adj(G) for the adjacency list of graph G and Adj(i) for the set of nodes (genes) adjacent to (directly influenced by) node i.

  13. Adjacency list of G: [Figure: digraph G on nodes 0 to 20.]
     0: 16
     1:
     2:
     3: 2 5 8
     4:
     5: 12
     6: 5 12
     7: 2 17
     8:
     9: 10 15
     10: 1 20
     11: 20
     12: 14
     13: 8 17
     14: 0
     15: 0
     16: 2
     17: 8
     18:
     19: 8
     20: 6 18

  14. Accessibility list: the list of perturbation effects, or the list of regulatory effects. It shows all nodes (genes) that can be accessed (influenced in their activity state) from a given gene by paths of direct interactions. • We write Acc(G) for the accessibility list of the graph G and Acc(i) for the set of nodes that can be reached (influenced) from node (gene) i.

  15. Accessibility list of G: [Figure: digraph G on nodes 0 to 20.]
     0: 2 16
     1:
     2:
     3: 0 2 5 8 12 14 16
     4:
     5: 0 2 12 14 16
     6: 0 2 5 12 14 16
     7: 2 8 17
     8:
     9: 0 1 2 5 6 10 12 14 15 16 18 20
     10: 0 1 2 5 6 12 14 16 18 20
     11: 0 2 5 6 12 14 16 18 20
     12: 0 2 14 16
     13: 8 17
     14: 0 2 16
     15: 0 2 16
     16: 2
     17: 8
     18:
     19: 8
     20: 0 2 5 6 12 14 16 18
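The relation between the two lists can be checked mechanically. The following is a minimal sketch, assuming the adjacency list from slide 13 is stored as a Python dict; the names `ADJ` and `accessibility_list` are illustrative and do not come from the paper. It derives Acc(G) from Adj(G) by a depth-first search from every node.

```python
# Adjacency list of the example graph G, copied from slide 13.
ADJ = {
    0: [16], 1: [], 2: [], 3: [2, 5, 8], 4: [], 5: [12], 6: [5, 12],
    7: [2, 17], 8: [], 9: [10, 15], 10: [1, 20], 11: [20], 12: [14],
    13: [8, 17], 14: [0], 15: [0], 16: [2], 17: [8], 18: [], 19: [8],
    20: [6, 18],
}


def accessibility_list(adj):
    """Return {node: sorted list of nodes reachable by a path of length >= 1}."""
    acc = {}
    for start in adj:
        seen = set()
        stack = list(adj[start])          # begin with the direct successors
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])   # follow direct interactions further
        acc[start] = sorted(seen)
    return acc


ACC = accessibility_list(ADJ)
for node in sorted(ACC):
    print(f"{node}: {' '.join(map(str, ACC[node]))}")
```

The printed rows can be compared entry by entry with the accessibility list on slide 15.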

  17. Before proceeding with the algorithm, we first need some concepts and theorems.

  18. The most parsimonious network • An acyclic digraph determines its accessibility list, but an accessibility list may correspond to more than one acyclic digraph. • Let’s see an example first.

  19. (d) is the most parsimonious network of Acc, i.e., of (a). [Figure: panels (b), (c) and (d) show three digraphs on nodes 0 to 5 that are compatible with the accessibility list (a).]
     (a) Acc:
     0: 1 2 3 4 5
     1: 2 3 4 5
     2: 3 4 5
     3:
     4: 5
     5:

  20. An accessibility list Acc and a digraph G are compatible if G has Acc as its accessibility list. Acc is then the accessibility list induced by G. • A compatible digraph with the fewest edges, written Gpars, is called the most parsimonious network compatible with Acc.

  21. Why do we prefer the most parsimonious network? • Among all compatible gene networks, we prefer the simplest, i.e. the most parsimonious, one. • For any accessibility list Acc of a digraph G, there exists a most parsimonious network Gpars. (This is a consequence of a theorem.) Therefore Gpars is the core of all the corresponding digraphs. • More complicated digraphs are harder to interpret.

  22. Theorem 1 • Let Acc be the accessibility list of an acyclic digraph. Then there exists exactly one graph Gpars that has Acc as its accessibility list and that has fewer edges than any other graph G with Acc as its accessibility list. • Before starting the proof, we need to introduce some terminology.

  23. Range and shortcut • Consider two nodes i and j of a digraph that are connected by an edge e. The range r of the edge e is the length of the shortest path between i and j in the absence of e. If there is no other path connecting i and j, then r := ∞. • An edge e with range r ≥ 2 but r ≠ ∞ is called a shortcut. • Let’s see an example.

  24. [Figure: an edge e from i to j, together with a path from i to j through z1, z2, …, zk.] e is a shortcut. When e is eliminated, i and j are still connected by a path of length k + 1, so r(e) = k + 1.
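The definition of range translates directly into a breadth-first search that ignores the edge e. The sketch below is illustrative only: the function names `edge_range` and `is_shortcut` are hypothetical, and it assumes that "path between i and j" means a directed path from i to j.

```python
from collections import deque
import math


def edge_range(adj, i, j):
    """Length of the shortest directed path from i to j that avoids the edge i -> j.

    Returns math.inf when no such path exists.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in adj.get(u, []):
            if u == i and v == j:
                continue                  # pretend the edge e is absent
            if v not in dist:
                dist[v] = dist[u] + 1
                if v == j:
                    return dist[v]        # BFS reaches j by a shortest path first
                queue.append(v)
    return math.inf


def is_shortcut(adj, i, j):
    """An edge is a shortcut when its range is at least 2 and finite."""
    r = edge_range(adj, i, j)
    return 2 <= r < math.inf


# The situation of the figure with k = 2: edge i -> j plus path i -> z1 -> z2 -> j.
example = {"i": ["z1", "j"], "z1": ["z2"], "z2": ["j"], "j": []}
print(edge_range(example, "i", "j"))      # 3, i.e. k + 1
print(is_shortcut(example, "i", "j"))     # True
print(edge_range(example, "z1", "z2"))    # inf, so z1 -> z2 is not a shortcut
```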

  25. Lemma 1 • For any accessibility list Acc of a digraph, there exists a compatible graph Gpars that is free of shortcuts.

  26. Proof of Lemma 1 • Assume that there is no such graph Gpars. • If there exists a shortcut ei between xi and yi, delete ei. By the definition of a shortcut, xi and yi are still connected via a path Pi whose length is greater than 1, so the accessibility list does not change. [Figure: xi and yi joined by the edge ei and the path Pi, before and after deleting ei.]

  27. Suppose that there are n such pairs (xi, yi), i.e., (x1, y1), …, (xn, yn). After repeating the deletion for every pair (xi, yi), i = 1, …, n, we obtain a shortcut-free graph compatible with the accessibility list. This contradicts the assumption made at the beginning of the proof.

  28. Lemma 2 • Assume that Acc is the accessibility list of a digraph G. For each node x, the adjacency list Adj(x) of a shortcut-free graph Gpars compatible with Acc is a subset of the adjacency list Adj(x) of any graph compatible with Acc.

  29. Proof of Lemma 2 • Assume that Lemma 2 is false. • Without loss of generality, suppose that a shortcut-free graph Gpars and some other graph G both induce Acc. • By assumption, Gpars contains at least one node x such that Adj(x) of Gpars contains at least one node y that isn’t in Adj(x) of G.

  30. Because G and Gpars have the same accessibility list Acc, there must exist some path x → z1 → z2 → … → zk → y from x to y in G. For the same reason, z1 is accessible from x in Gpars, z2 from z1 in Gpars, …, and y from zk in Gpars. • Therefore we can find two paths from x to y in Gpars: (1) the edge e between x and y, and (2) a path that passes through z1, z2, …, zk. • This contradicts the assumption that Gpars is shortcut-free, because e is a shortcut. Let’s see an example!

  31. Example. [Figure: graph G on nodes x, z1, z2, y, and graph Gpars on the same nodes, in which the edge from x to y is marked “A shortcut!”.]
     Acc:        x: z1 z2 y   z1: z2 y   z2: y
     Adj(G):     x: z1 z2     z1: z2     z2: y
     Adj(Gpars): x: z1 y      z1: z2     z2: y

  32. Corollary 1 • The shortcut-free graph Gpars compatible with Acc is the unique graph with the fewest edges among all graphs G compatible with Acc. • This corollary follows immediately from Lemma 2.

  33. Now, we can proceed to the algorithm.

  35. A recursive pruning algorithm to reconstruct the most parsimonious graph from an accessibility list.
      1: for all nodes i of G
      2:     Adj(i) = Acc(i)
      3: for all nodes i of G
      4:     if node i hasn’t been visited
      5:         call PRUNE_ACC(i)
      6:     end if
      7: PRUNE_ACC(i)
      8:     for all nodes j ∈ Acc(i)
      9:         if Acc(j) = ∅
     10:             declare j as visited
     11:         else
     12:             call PRUNE_ACC(j)
     13:         end if
     14:     for all nodes j ∈ Acc(i)
     15:         for all nodes k ∈ Adj(j)
     16:             if k ∈ Acc(i)
     17:                 delete k from Adj(i)
     18:             end if
     19:     declare node i as visited
     20: end PRUNE_ACC(i)
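The pseudocode can be turned into a short runnable sketch. This is one possible Python rendering, not the author's implementation: the function name `prune_accessibility`, the use of sets, and the extra "skip nodes already visited" test in the recursion are assumptions made here. It is only meant for acyclic accessibility lists; with cycles the recursion would not terminate.

```python
def prune_accessibility(acc):
    """Return the adjacency list of the most parsimonious graph for an acyclic Acc."""
    acc = {i: set(targets) for i, targets in acc.items()}
    adj = {i: set(targets) for i, targets in acc.items()}   # lines 1-2: Adj(i) = Acc(i)
    visited = set()

    def prune(i):
        # Lines 8-13: first prune every node that is accessible from i.
        for j in acc[i]:
            if j not in visited:          # assumption: do not re-prune finished nodes
                prune(j)
        # Lines 14-18: whatever an accessible node j points to is reached via j,
        # so it must not stay in Adj(i).
        for j in acc[i]:
            adj[i] -= adj[j]
        visited.add(i)                    # line 19

    for i in acc:                         # lines 3-6
        if i not in visited:
            prune(i)
    return {i: sorted(adj[i]) for i in adj}


# Accessibility list (a) from slide 19.
acc_example = {
    0: [1, 2, 3, 4, 5],
    1: [2, 3, 4, 5],
    2: [3, 4, 5],
    3: [],
    4: [5],
    5: [],
}
pruned = prune_accessibility(acc_example)
print(pruned)   # {0: [1], 1: [2], 2: [3, 4], 3: [], 4: [5], 5: []}
```

Because the recursion depth can approach the number of nodes, a large network may need an explicit stack or a raised recursion limit in Python.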

  36. This algorithm is based on the following theorem, so we look at the theorem first.

  37. Theorem 2 • Let Acc(G) be the accessibility list of an acyclic digraph, Gpars its most parsimonious graph, and V(Gpars) the set of all nodes of Gpars. Then the following identity holds: • Instead of proving the theorem, we give an example later.
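Under the definitions above, the identity can be written in the following form. This is a reconstruction that agrees with Corollary 2 and with the pruning loop of the algorithm; the notation (set difference, union over Adj(i)) is an assumption and not a quotation of the slide.

```latex
\forall\, i \in V(G_{\mathrm{pars}}): \quad
\mathrm{Adj}(i) \;=\; \mathrm{Acc}(i) \setminus \bigcup_{j \in \mathrm{Adj}(i)} \mathrm{Acc}(j)
```

In words: the nodes adjacent to i in Gpars are exactly the nodes accessible from i that are not accessible from any node adjacent to i.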

  38. Example. [Figure: three digraphs on nodes 0 to 5; the first is a possible G corresponding to the original Acc(G).]
     Original Acc(G):            0: 1 2 3 4 5   1: 2 3 4 5   2: 3 4 5   3:   4: 5   5:
     After “0 via 1, 2, 3, 4, 5”: 0: 1           1: 2 3 4 5   2: 3 4 5   3:   4: 5   5:
     After “1 via 2, 3, 4, 5”:    0: 1           1: 2         2: 3 4 5   3:   4: 5   5:

  39. Example, continued. [Figure: three digraphs on nodes 0 to 5; the last is the most parsimonious network.]
     Start:                   0: 1   1: 2   2: 3 4 5   3:   4: 5   5:
     After “2 via 3, 4, 5”:   0: 1   1: 2   2: 3 4     3:   4: 5   5:
     After “4 via 5”:         0: 1   1: 2   2: 3 4     3:   4: 5   5:
     The last list is the adjacency list of the most parsimonious network.

  40. In fact, the example above illustrates our algorithm. • From this theorem, we can derive Corollary 2.

  41. Corollary 2 • Let i, j and k be any three pairwise different nodes of an acyclic directed shortcut-free graph G. If j is accessible from i, then no node k accessible from j is adjacent to i. [Figure: nodes i, j and k; an edge from i to k would be a shortcut.]

  42. Computational complexity • Assume that there are n genes, so that the accessibility list has n rows. • Let k < n − 1 be the average number of entries in a node’s accessibility list.

  43. During execution, each node accessible from a node j induces one recursive call of PRUNE_ACC, after which the node accessed from j is declared as visited. Thus each entry of the accessibility list of a node is explored no more than once. • Line 15 of the algorithm loops over all nodes adjacent to a node j. Let a denote the average number of entries in Adj(j). • The overall computational complexity is O(nka).

  44. In practice, large-scale experimental gene perturbations in the yeast Saccharomyces cerevisiae (n ≈ 6300) suggest that k < 50 ([HMJRS2000]) and a ≤ 1 ([W2001a]), and thus nka ≪ n^2.
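To make the gap concrete, the bounds quoted on this slide can be plugged in directly. The snippet below is plain arithmetic on those numbers; the variable names are illustrative, and the values are the upper bounds from the slide rather than measured step counts.

```python
n = 6300   # approximate number of yeast genes (from the slide)
k = 50     # upper bound on the average accessibility-list length (from the slide)
a = 1      # upper bound on the average adjacency-list length (from the slide)

pruning_steps = n * k * a
quadratic_steps = n ** 2

print(pruning_steps)                     # 315000
print(quadratic_steps)                   # 39690000
print(quadratic_steps / pruning_steps)   # 126.0
```

With these bounds, nka is smaller than n^2 by a factor of about 126.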

  45. Storage complexity • The algorithm stores two copies of the accessibility list, as well as a list of the nodes that have been visited. • Because the graph is acyclic, the recursion depth can be no greater than n − 1. • Note that k < n − 1 is the average number of entries in a node’s accessibility list. • The overall storage requirements are O(nk).

  47. Dealing with cycles • Everything so far has been restricted to acyclic graphs. • Now let us look at the problems caused by cyclic graphs.

  48. Problems that single gene perturbations can’t solve • [Figure: two different cyclic digraphs on nodes 0 to 4.] • They have the same accessibility list. Therefore, we cannot reconstruct the gene network uniquely.
     0: 1 2 3 4
     1: 0 2 3 4
     2: 0 1 3 4
     3: 0 1 2 4
     4: 0 1 2 3

  49. [Figure: the same two cyclic digraphs on nodes 0 to 4.] Note that the order of direct regulatory interactions in these two networks is different, as reflected in the adjacency lists.
     One network:       0: 3   1: 4   2: 1   3: 2   4: 0
     The other network: 0: 1   1: 2   2: 3   3: 4   4: 0
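That the two cycles cannot be told apart from perturbation data can be verified with a few lines of code. This is a small sketch using the two adjacency lists from this slide; the helper `accessibility_list` is a hypothetical name, and it assumes, as on slide 48, that a node is not listed in its own accessibility list.

```python
def accessibility_list(adj):
    """Return {node: sorted list of other nodes reachable along directed edges}."""
    acc = {}
    for start in adj:
        seen = set()
        stack = list(adj[start])
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        seen.discard(start)               # assumption: a node is not listed for itself
        acc[start] = sorted(seen)
    return acc


one_network = {0: [3], 1: [4], 2: [1], 3: [2], 4: [0]}
other_network = {0: [1], 1: [2], 2: [3], 3: [4], 4: [0]}

acc_one = accessibility_list(one_network)
acc_other = accessibility_list(other_network)
print(acc_one == acc_other)   # True
print(acc_one[0])             # [1, 2, 3, 4]
```

Both adjacency lists induce the accessibility list shown on slide 48, although their edges differ.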

  50. Instead of solving this problem, we collapse the nodes that form a cycle into a single group of nodes whose order of regulatory interactions is indistinguishable. • Such a group is also called a strongly connected component, or strong component, of a directed graph G. Every two nodes in a strong component are mutually accessible. • Let us see an example.
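Since every two nodes of a strong component are mutually accessible, the groups can be read off an accessibility list directly. The following is a minimal sketch of that idea; the function name `strong_components` and the small example list are hypothetical, and the input is assumed to be a complete (transitively closed) accessibility list.

```python
def strong_components(acc):
    """Group nodes that are mutually accessible according to an accessibility list."""
    reach = {i: set(targets) for i, targets in acc.items()}
    assigned = set()
    components = []
    for i in sorted(reach):
        if i in assigned:
            continue
        # j belongs to the group of i when j is accessible from i and i from j.
        group = {i} | {j for j in reach[i] if i in reach.get(j, set())}
        assigned |= group
        components.append(sorted(group))
    return components


# Hypothetical example: genes 0, 1, 2 form a cycle that influences 3, which influences 4.
acc_cyclic = {
    0: [1, 2, 3, 4],
    1: [0, 2, 3, 4],
    2: [0, 1, 3, 4],
    3: [4],
    4: [],
}
print(strong_components(acc_cyclic))   # [[0, 1, 2], [3], [4]]
```

Each returned group can then be treated as a single collapsed node.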
