
CSCE555 Bioinformatics

CSCE555 Bioinformatics. Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555. University of South Carolina Department of Computer Science and Engineering


Presentation Transcript


  1. CSCE555 Bioinformatics • Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu Course page: http://www.scigen.org/csce555 University of South Carolina Department of Computer Science and Engineering 2008 www.cse.sc.edu.

  2. In the beginning there was DNA… Liolios K, Tavernarakis N, Hugenholtz P, Kyrpides, NC. The Genomes On Line Database (GOLD) v.2: a monitor of genome projects worldwide. NAR 34, D332-334

  3. …then came protein interactions • Arabidopsis PPI network • E. coli PPI network • Yeast PPI network

  4. Comparative Genomics to Comparative Interactomics • Evolutionary conservation implies functional relevance • Sequence conservation implies functional conservation • Network conservation implies functional conservation too! • What new insights might we gain from network comparisons? (Why should we care?)

  5. Network comparisons allow us to: • Identify conserved functional modules • Query for a module, à la BLAST • Predict functions of a module • Predict protein functions • Validate protein interactions • Predict protein interactions (Slide legend: some of these tasks are only possible with network comparisons; the others are possible with existing techniques, but improved with network comparisons.)

  6. What is a Protein Interaction Network? • Proteins are nodes • Interactions are edges • Edges may have weights Yeast PPI network H. Jeong et al. Lethality and centrality in protein networks. Nature 411, 41 (2001)
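
The node/edge/weight description above maps directly onto a weighted undirected graph. The following is a minimal sketch of such a structure in Python; the class name `PPINetwork`, its methods, and the protein identifiers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class PPINetwork:
    """Undirected, weighted protein-protein interaction network."""

    def __init__(self):
        # adjacency map: protein -> {neighbor: edge weight}
        self.adj = defaultdict(dict)

    def add_interaction(self, u, v, weight=1.0):
        # Store the edge in both directions because interactions are undirected.
        self.adj[u][v] = weight
        self.adj[v][u] = weight

    def degree(self, u):
        # Number of interaction partners; 0 for an unknown protein.
        return len(self.adj.get(u, {}))

    def weight(self, u, v):
        # Edge weight, or 0.0 when no interaction is recorded.
        return self.adj.get(u, {}).get(v, 0.0)


# Hypothetical protein names and weights, for illustration only.
net = PPINetwork()
net.add_interaction("P1", "P2", 0.9)
net.add_interaction("P2", "P3", 0.4)
```

Here the weight could stand for the reliability of an interaction, which is how weights are used on the later slides.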

  7. The Network Alignment Problem • Given k different protein interaction networks belonging to different species, we wish to find conserved sub-networks within these networks • Conserved in terms of protein sequence similarity (node similarity) and interaction similarity (network topology similarity)

  8. Example Network Alignment Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

  9. General Framework For Network Alignment Algorithms • Network construction • Alignment algorithm • Scoring function Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

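  10. Building Co-expression Networks • Microarray data (genes × arrays) give an expression profile per gene • Pearson correlation between each pair of profiles gives a gene-by-gene correlation matrix • Example matrix from the slide: r(A, A) = r(B, B) = r(C, C) = 1, r(A, B) = 0.8, r(A, C) = -0.7, r(B, C) = -0.6 Balaji S. Srinivasan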
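
As a rough illustration of how a co-expression network can be built from expression profiles, the sketch below computes pairwise Pearson correlations and keeps pairs whose absolute correlation passes a cutoff. The threshold of 0.7, the use of the absolute value, and the toy profiles are assumptions for the example, not values from the lecture.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (non-constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def coexpression_edges(profiles, threshold=0.7):
    """Return {(gene1, gene2): r} for pairs with |r| >= threshold (assumed cutoff)."""
    edges = {}
    for g1, g2 in combinations(sorted(profiles), 2):
        r = pearson(profiles[g1], profiles[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges


# Toy expression profiles (made-up numbers, one value per array).
profiles = {
    "GeneA": [1.0, 2.0, 3.0, 4.0],
    "GeneB": [2.0, 4.1, 5.9, 8.0],
    "GeneC": [4.0, 3.0, 2.0, 1.0],
}
edges = coexpression_edges(profiles)
```

Whether anti-correlated pairs should count as edges depends on the application; keeping the signed value of r on the edge leaves that choice open.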

  11. Two Algorithms • NetworkBLAST (covered today) Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005. • Græmlin Flannick et al. Græmlin: General and robust alignment of multiple large interaction networks. Genome Res 16: 1169-1181, 2006.

  12. Overview of Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS, 102(6):1974-1979, 2005.

  13. Estimation of Interaction Probabilities • In the preprocessing step, edges in the network are given a reliability score using a logistic regression model based on three features: • Number of times an interaction was observed • Pearson correlation coefficient between expression profiles • Proteins’ small world clustering coefficient
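
To make the preprocessing step concrete, here is a small sketch of a logistic model that turns the three features into a reliability score. The coefficient values are placeholders invented for the example; in practice they would be fitted to training data, and the lecture does not give them.

```python
import math


def interaction_probability(n_observations, expression_corr, clustering_coeff,
                            coef=(-2.0, 1.0, 1.5, 1.0)):
    """Logistic model: P(interaction is real) from three edge features.

    coef = (intercept, w_observations, w_correlation, w_clustering).
    These default weights are hypothetical, not fitted values.
    """
    b0, b1, b2, b3 = coef
    z = b0 + b1 * n_observations + b2 * expression_corr + b3 * clustering_coeff
    return 1.0 / (1.0 + math.exp(-z))


# An interaction seen three times with well-correlated expression...
p_strong = interaction_probability(3, 0.8, 0.5)
# ...versus one seen once with uncorrelated expression.
p_weak = interaction_probability(1, 0.0, 0.0)
```

With positive weights, more supporting evidence pushes the score toward 1, which is the behavior the reliability score is meant to capture.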

  14. Network Alignment Graphs • Construct a Network Alignment Graph to represent the alignment • Nodes contain groups of sequence-similar proteins, one from each of the k organisms • Edges represent conserved interactions • An edge between two nodes is present if any one of the following holds: • One pair of proteins directly interacts and all other pairs are at distance at most 2 • All protein pairs are at distance exactly 2 • At least max(2, k – 1) protein pairs directly interact • Allowing distance 2 tries to account for interaction deletions
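
The edge rule above can be sketched as a small predicate. It takes, for each of the k species, the shortest-path distance between the two proteins that the pair of alignment-graph nodes contains for that species. The function names and the adjacency-dictionary input format are assumptions made for this example.

```python
import math
from collections import deque


def distance(adj, source, target):
    """Shortest-path length between two proteins via BFS (math.inf if unreachable)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbor in adj.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    return math.inf


def conserved_edge(distances):
    """Apply the three alternative conditions to per-species distances."""
    k = len(distances)
    direct = sum(1 for d in distances if d == 1)
    one_direct_rest_close = direct >= 1 and all(d <= 2 for d in distances)
    all_exactly_two = all(d == 2 for d in distances)
    enough_direct = direct >= max(2, k - 1)
    return one_direct_rest_close or all_exactly_two or enough_direct


# Toy single-species network: a - b - c
toy_adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
```

For three species, `conserved_edge([1, 2, 2])` satisfies the first condition, while `conserved_edge([1, math.inf, math.inf])` satisfies none of them.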

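  15. Example Network Alignment Graph [Figure: the individual species' PPI networks for Species X (proteins a, b, c), Species Y (a’, b’, c’) and Species Z (a’’, b’’, c’’), and the resulting network alignment graph whose nodes are the protein groups (a, a’, a’’), (b, b’, b’’) and (c, c’, c’’)]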

  16. Scoring Function • Sharan et al. devise a scoring scheme based on a likelihood model for the fit of a single sub-network to the given structure • High scoring subgraphs correspond to structured sub-networks (cliques or pathways) • Only network topology is scored; node similarity is not

  17. Log Likelihood Ratio Model • Score: $\log \dfrac{\Pr(\text{Subgraph occurs} \mid \text{Conserved Network})}{\Pr(\text{Subgraph occurs} \mid \text{Random Network})}$ • Measures the likelihood that a subgraph occurs if it is a conserved network versus the likelihood that it occurs in a randomly constructed network • The randomly constructed network preserves the degree distribution of the nodes

  18. Log Likelihood Ratio Model • (i) in a real subnetwork, each interaction should be present independently with high probability, and (ii) in a random subnetwork, the probability of an interaction between any two proteins depends on their total number of connections in the network.
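
One common way to obtain a random model in which edge probability depends on node degree is to shuffle the network with degree-preserving double-edge swaps and count how often a given pair ends up connected. The sketch below illustrates that idea only; the function names, the number of swaps and the toy network are assumptions, and it is not presented as the exact procedure used in the paper.

```python
import random
from collections import Counter


def degree_preserving_shuffle(edges, n_swaps, rng):
    """Randomize a simple undirected graph while keeping every node's degree."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4:
            continue  # the swap would create a self-loop
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:
            continue  # the swap would create a duplicate edge
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def estimate_p_uv(edges, u, v, n_graphs=200, n_swaps=100, seed=0):
    """Fraction of shuffled graphs in which u and v are connected."""
    rng = random.Random(seed)
    target = frozenset((u, v))
    hits = sum(
        target in {frozenset(e) for e in degree_preserving_shuffle(edges, n_swaps, rng)}
        for _ in range(n_graphs)
    )
    return hits / n_graphs


def degrees(edges):
    return Counter(node for edge in edges for node in edge)


# Toy network with made-up proteins.
toy_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("e", "f")]
shuffled = degree_preserving_shuffle(toy_edges, 50, random.Random(1))
```

Because each swap replaces edges a-b and c-d with a-d and c-b, every node keeps its degree, so highly connected proteins remain more likely to be linked by chance.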

  19. Likelihood Ratio Scoring of a Protein Complex in a Single Species • [Equation: probability of the complex being observed in a conserved network model] • [Equation: probability of the subgraph being observed in a random network model] • U: a subset of vertices (proteins) in the PPI graph • O_U: collection of all observations on vertex pairs in U • O_uv: interaction between proteins u, v observed • M_s: conserved network model • M_n: random network (null) model • T_uv: proteins u, v interact • F_uv: proteins u, v do not interact • β: probability that proteins u, v interact in the conserved model • p_uv: probability that edge u, v exists in the random model

  20. Likelihood Ratio Scoring of a Protein Complex in a Single Species • Hence, log likelihood for a complex occurring in a single species is given by • For multiple complexes across different species, it is the sum of the log likelihoods L(A, B, C) = L(A) + L(B) + L(C)
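
The scoring idea can be illustrated with a simplified sketch. It assumes that observations are perfectly reliable, so an observed edge contributes log(β / p_uv) and a missing edge contributes log((1 − β) / (1 − p_uv)); the full model on the slides also weighs each observation by its reliability, which is omitted here. The value β = 0.8, the function names and the toy inputs are assumptions for the example.

```python
import math
from itertools import combinations


def complex_log_likelihood_ratio(nodes, observed_edges, p_random, beta=0.8):
    """Simplified L(C): sum over all protein pairs in the candidate complex.

    observed_edges: set of frozenset({u, v}) for observed interactions.
    p_random: dict frozenset({u, v}) -> p_uv in (0, 1) under the null model.
    beta: assumed interaction probability within a conserved complex.
    """
    score = 0.0
    for u, v in combinations(sorted(nodes), 2):
        pair = frozenset((u, v))
        p_uv = p_random[pair]
        if pair in observed_edges:
            score += math.log(beta / p_uv)
        else:
            score += math.log((1.0 - beta) / (1.0 - p_uv))
    return score


def conserved_complex_score(per_species_scores):
    """Score of an aligned complex: sum of the per-species log likelihood ratios."""
    return sum(per_species_scores)


# Toy example: three proteins, every pair has p_uv = 0.1 under the null model.
nodes = ["a", "b", "c"]
p_random = {frozenset(pair): 0.1 for pair in combinations(nodes, 2)}
clique = {frozenset(pair) for pair in combinations(nodes, 2)}
dense_score = complex_log_likelihood_ratio(nodes, clique, p_random)
empty_score = complex_log_likelihood_ratio(nodes, set(), p_random)
```

A fully connected triple scores positively and an unconnected triple scores negatively, so dense subgraphs that are unlikely under the null model are favored.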

  21. Example of Complex Scoring [Figure: Complex X1 (a, b, c) in Species X, Complex Y1 (a’, b’, c’) in Species Y and Complex Z1 (a’’, b’’, c’’) in Species Z, taken from the individual species' PPI networks, together form conserved complex A in the network alignment graph, with nodes (a, a’, a’’), (b, b’, b’’) and (c, c’, c’’)] • L(A) = L(X1) + L(Y1) + L(Z1)

  22. Alignment algorithm • Problem of identifying conserved sub-networks reduces to finding high scoring subgraphs • NP-complete problem • Heuristic solution: • Greedy extension of high scoring seeds • (Does this sound familiar? BLAST?) • Common to both papers discussed

  23. Alignment algorithm • Find seeds for each node v in the alignment graph • Find high scoring paths of 4 nodes by exhaustive search • Greedily add 3 other nodes one by one, that maximally increase the score of the seed

  24. Alignment algorithm • Iteratively add or remove nodes to increase the overall score of the subgraph • Original seeds are preserved • Limit size of discovered subgraphs to 15 nodes • Record up to 4 highest scoring subgraphs discovered around each node
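
The local search described on these slides can be sketched as a greedy loop that adds a neighboring node or removes a non-seed node whenever that raises the score, up to the size limit. This is an illustrative sketch: the function signature, the caller-supplied `score` and `neighbors` callables, and the toy path graph are assumptions, and seed finding is not shown.

```python
def greedy_extend(seed, neighbors, score, max_size=15):
    """Grow a seed by single-node additions/removals that increase the score.

    seed: iterable of nodes that must stay in the result.
    neighbors: callable node -> iterable of adjacent nodes.
    score: callable set_of_nodes -> float.
    """
    seed = frozenset(seed)
    current = set(seed)
    best = score(current)
    improved = True
    while improved:
        improved = False
        candidates = []
        if len(current) < max_size:
            frontier = set().union(*(neighbors(n) for n in current)) - current
            candidates += [current | {n} for n in frontier]
        # Seed nodes are never removed.
        candidates += [current - {n} for n in current if n not in seed]
        best_set = None
        for candidate in candidates:
            s = score(candidate)
            if s > best:
                best, best_set, improved = s, candidate, True
        if improved:
            current = best_set
    return current, best


# Toy path graph 1-2-3-4-5 with made-up node weights as the score.
path = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
weights = {1: 1.0, 2: 1.0, 3: 1.0, 4: -1.0, 5: 2.0}
result, result_score = greedy_extend(
    [1], lambda n: path[n], lambda nodes: sum(weights[n] for n in nodes)
)
```

In the toy run the search stops at {1, 2, 3} because adding node 4 lowers the score, even though node 5 beyond it would pay off; this is the usual price of a greedy heuristic.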

  25. Alignment algorithm • Filter subgraphs with a high degree of overlap • Iteratively pick the highest scoring remaining subgraph and remove all remaining subgraphs that overlap it heavily
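
The filtering step can be sketched as follows: visit subgraphs from highest to lowest score and keep one only if it does not overlap too much with anything already kept. The overlap measure (shared nodes divided by the size of the smaller subgraph) and the threshold parameter are assumptions chosen for the example.

```python
def filter_overlapping(subgraphs, max_overlap=0.8):
    """Keep high-scoring subgraphs, dropping ones that heavily overlap a kept one.

    subgraphs: list of (non-empty node collection, score) pairs.
    max_overlap: assumed cutoff on |A & B| / min(|A|, |B|).
    """
    kept = []
    for nodes, score in sorted(subgraphs, key=lambda item: item[1], reverse=True):
        nodes = set(nodes)
        if all(
            len(nodes & other) / min(len(nodes), len(other)) <= max_overlap
            for other, _ in kept
        ):
            kept.append((nodes, score))
    return kept


# Made-up candidate subgraphs with scores.
candidates = [({1, 2, 3, 5}, 8.0), ({1, 2, 3, 4}, 10.0), ({6, 7}, 5.0)]
kept = filter_overlapping(candidates, max_overlap=0.5)
```

With a cutoff of 0.5, the second-best candidate is dropped because it shares three of its four nodes with the best one.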

  26. Results Conserved network regions within yeast (orange ovals), fly (green rectangles) and worm (blue hexagons) PPI networks.

  27. Results • Prediction of protein function • ‘Guilt by association’ • If a conserved cluster or path is significantly enriched in a functional annotation, that annotation is predicted for its other member proteins • Prediction of protein interactions • Predictions based on 2 strategies: • Evidence that proteins with similar sequences interact • Co-occurrence of proteins in the same conserved cluster or path • Experimental verification of predicted yeast interactions using yeast two-hybrid (Y2H) yielded a 40-62% success rate
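
A standard way to decide whether a cluster is "significantly enriched" in an annotation is a hypergeometric tail test. The sketch below shows that test as an illustration; the slides do not say which test was used, so the choice of test and the example numbers are assumptions.

```python
from math import comb


def enrichment_p_value(population, annotated, cluster_size, annotated_in_cluster):
    """P(X >= annotated_in_cluster) under the hypergeometric distribution.

    population: number of proteins in the network.
    annotated: number of proteins carrying the annotation.
    cluster_size: number of proteins in the conserved cluster.
    """
    upper = min(annotated, cluster_size)
    favorable = sum(
        comb(annotated, i) * comb(population - annotated, cluster_size - i)
        for i in range(annotated_in_cluster, upper + 1)
    )
    return favorable / comb(population, cluster_size)


# Made-up example: 8 of 10 cluster members share an annotation
# carried by 50 of 1000 proteins overall.
p_value = enrichment_p_value(1000, 50, 10, 8)
```

A small p-value suggests the annotation is over-represented in the cluster, which is the basis for transferring it to the unannotated members.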

  28. Overview of Græmlin • Fast, scalable network alignment • Scales linearly in number of networks compared • NetworkBLAST scales exponentially • Supports efficient querying of modules • Speed-sensitivity control via user-defined parameter • Not supported in NetworkBLAST

  29. Input to the Algorithm • Weighted protein interaction graphs • Weights represent probability that proteins interact • Constructed via network integration algorithm • A phylogenetic tree relating the species in the desired alignment • Used for progressive alignment

  30. Key Ideas of Græmlin • Generating an initial alignment from the seed • Greedy seed extension phase • Progressive alignment technique using the phylogenetic tree
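
Progressive alignment along a phylogenetic tree can be pictured as a post-order traversal that aligns the two children of each internal node, so k networks need only k − 1 pairwise alignments. The sketch below shows just that traversal; the nested-tuple tree format, the `align_pair` callback and the toy tree are hypothetical, and the pairwise alignment itself is left to the caller.

```python
def progressive_align(tree, align_pair):
    """Align networks bottom-up along a binary guide tree.

    tree: a leaf (network name as a string) or a (left, right) tuple.
    align_pair: caller-supplied function that aligns two (aligned) networks.
    """
    if isinstance(tree, str):
        return tree  # leaf: a single species' network
    left, right = tree
    return align_pair(
        progressive_align(left, align_pair),
        progressive_align(right, align_pair),
    )


# Toy stand-in for a real pairwise aligner: record the call and label the result.
calls = []


def toy_align(x, y):
    calls.append((x, y))
    return f"({x}+{y})"


guide_tree = (("worm", "fly"), "yeast")  # hypothetical tree topology
alignment = progressive_align(guide_tree, toy_align)
```

For three networks the toy aligner is called twice, which matches the linear scaling claimed on the overview slide.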

  31. Results Functional module identification using network alignment

  32. Results Multiple alignment of 10 networks showing possible cell division module Functional annotation using network alignment

  33. The Future of Network Comparison [Figure with labels "Græmlin?" and "Græmlin"] Sharan and Ideker. Modeling cellular machinery through biological network comparison. Nature Biotechnology 24, pp. 427-433, 2006

  34. Summary • The problem: Network comparison/comparative interactomics • NetworkBLAST algorithm • Brief introduction to Græmlin • The analogy between sequence comparison and network comparison

  35. Reference & Acknowledgements • Chuan Sheng Foo • Sharan et al. Conserved patterns of protein interaction in multiple species. PNAS. February 8, 2005 | vol. 102 | no. 6 | 1974-1979
