
Protein Interaction Networks


Presentation Transcript


  1. Protein Interaction Networks Thanks to Mehmet Koyuturk

  2. Protein-Protein Interactions • Physical association between proteins • Signal transduction, phosphorylation • Docking, complex formation • Permanent vs. transient interactions • Co-location of proteins • Proteins that work in the same cellular component • Soluble location: lysosome, mitochondrial stroma • Membrane location: receptors in plasma membrane, transporters in mitochondrial membrane • Functional association of proteins • Proteins involved in the same biomolecular activity • Enzymes in the same pathway, co-regulated proteins

  3. Permanent vs Transient Interactions • Permanent interactions • Some proteins form a stable protein complex that carries out a structural or functional biomolecular role • These proteins are protein subunits of the complex and they work together • ATPase subunits, subunits of nuclear pore • Transient interactions • Proteins that come together in certain cellular states to undertake a biomolecular function • DNA replicative complex, signal transduction

  4. Signal Transduction • Phosphorylation • Protein-kinase interaction • Enzyme activation • Signaling cascade

  5. Why Study Protein Interactions? • Identification of functional modules and interconnections between these modules • Functional annotation based on binding partners and interaction patterns • Identification of evolutionarily conserved pathways • Identification of drug target proteins to minimize side effects

  6. Identification of Protein Interactions • Traditionally, protein interactions are identified by wetlab experiments based on hypotheses on candidate proteins • Small scale assays • Coimmunoprecipitation: Immunoprecipitate one protein, see if other is also precipitated • Reliable, but can only verify interactions between suspected partners • High throughput screening • Throw in thousands of ORFs and see which ones bind to each other • Yeast two hybrid, tandem affinity purification • Large scale, but a lot of noise

  7. Yeast Two Hybrid • Split the yeast GAL4 gene, which encodes a transcription factor required for activation of the GAL genes, into two parts • Activation domain, DNA-binding domain • The split protein does not work unless the two parts are brought into physical contact

  8. Protein Interaction Networks • Organize all identified interactions in a network, where proteins are represented by nodes and interactions are represented by edges • TAP (tandem affinity purification) identifies a group of proteins that are pulled down together with a target protein • Spoke model (star network) vs. matrix model (clique)
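The two ways of turning a TAP result into edges can be sketched as follows. This is a minimal illustration; the function names and the example purification (bait A with preys B, C, D) are hypothetical and not taken from the slides.

```python
from itertools import combinations

def spoke_edges(bait, preys):
    """Spoke model: connect the bait to every co-purified protein (star)."""
    return {frozenset((bait, p)) for p in preys if p != bait}

def matrix_edges(bait, preys):
    """Matrix model: connect every pair in the purified group (clique)."""
    members = sorted({bait, *preys})
    return {frozenset(pair) for pair in combinations(members, 2)}

# Hypothetical purification: bait A pulls down B, C and D.
spoke = spoke_edges("A", ["B", "C", "D"])
matrix = matrix_edges("A", ["B", "C", "D"])
```

The spoke edges are always a subset of the matrix edges; the matrix model adds the prey-prey edges, which raises coverage but also the number of potential false positives.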

  9. Functional Modularity in PPI Networks • A protein complex • Dense subgraph • A signal transduction pathway • Simple path, parallel paths • A protein with a common, key, fundamental role (e.g., a kinase) • Hub node
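Two of these network signatures are easy to quantify. The sketch below computes the edge density of a node set and lists high-degree nodes; the toy network, the helper names, and the degree cutoff of 4 are assumptions chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

def adjacency(edges):
    """Build an undirected adjacency map from (u, v) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj

def density(adj, nodes):
    """Fraction of possible edges among `nodes` that are present."""
    nodes = list(nodes)
    possible = len(nodes) * (len(nodes) - 1) // 2
    present = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return present / possible if possible else 0.0

def hubs(adj, min_degree):
    """Nodes with at least `min_degree` neighbours."""
    return {n for n, nbrs in adj.items() if len(nbrs) >= min_degree}

# Toy network: A, B, C form a complex; K touches many proteins.
edges = [("A", "B"), ("A", "C"), ("B", "C"),
         ("K", "A"), ("K", "B"), ("K", "C"), ("K", "D"), ("K", "E")]
adj = adjacency(edges)
complex_density = density(adj, ["A", "B", "C"])
hub_nodes = hubs(adj, min_degree=4)
```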

  10. Computational Prediction of PPIs • Functional association is a higher level conceptualization of interaction • Proteins that act as enzymes catalyzing reactions in the same metabolic pathway • Functionally associated proteins are likely to show up in similar contexts • Co-regulation, co-expression, co-evolution, co-citation… • Functional association between proteins can be computationally identified by looking at different sources of data such as sequences, gene expression, literature • Can also be extended to capture physical associations, for example, by taking into account evolution at structural level

  11. Conservation of Gene Neighborhood • In bacteria, the genome is organized in such a way that functionally related proteins are coded by neighboring regions • Operons • When more than one bacterial species is considered, this neighborhood relationship becomes even more relevant • Figure: distribution of neighboring genes in H. influenzae and E. coli into functional classes

  12. Comparison of Nine Bacterial Genomes • trpB-trpA is the only gene pair whose proximity is conserved across nine prokaryotic genomes • These genes encode the two subunits of tryptophan synthase that interact and catalyze a single reaction

  13. Close Orthologs • Run of genes • A set of genes on one strand, such that the gap between adjacent genes is less than a threshold, g (in practice, g = 300 bp) • Any pair of genes on the same run are said to be close • Bidirectional best hits • Genes X1 and X2 from genomes G1 and G2 are BBH if their sequence similarity is significant and there is no Y1 in G1 that is more similar to X2 than X1 is, and no Y2 in G2 that is more similar to X1 than X2 is • Pair of close bidirectional best hits: Xa, Ya close in G1; Xb, Yb close in G2; Xa & Xb BBH; Ya & Yb BBH
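The definitions on this slide can be sketched in a few functions. This is a simplified illustration, not the original pipeline: gene coordinates and similarity scores are made up, similarity scores are assumed to be pre-filtered for significance, and genes on the opposite strand are not treated as interrupting a run.

```python
from itertools import combinations

def find_runs(genes, g=300):
    """Group genes into runs: same strand, adjacent gap < g bp.
    genes: list of (name, start, end, strand) tuples."""
    runs = []
    for strand in ("+", "-"):
        on_strand = sorted((x for x in genes if x[3] == strand),
                           key=lambda x: x[1])
        current = []
        for gene in on_strand:
            # Close the current run when the gap to the previous gene is too big.
            if current and gene[1] - current[-1][2] >= g:
                runs.append([x[0] for x in current])
                current = []
            current.append(gene)
        if current:
            runs.append([x[0] for x in current])
    return runs

def close_pairs(runs):
    """All unordered gene pairs that share a run."""
    return {frozenset(p) for run in runs for p in combinations(run, 2)}

def bidirectional_best_hits(sim):
    """sim maps (gene_in_G1, gene_in_G2) -> similarity score."""
    best1, best2 = {}, {}
    for (a, b), s in sim.items():
        if a not in best1 or s > sim[(a, best1[a])]:
            best1[a] = b
        if b not in best2 or s > sim[(best2[b], b)]:
            best2[b] = a
    return {(a, b) for a, b in best1.items() if best2[b] == a}

def close_bbh_pairs(close1, close2, bbh):
    """Pairs close in G1 whose BBH partners are close in G2."""
    ortholog = dict(bbh)  # G1 gene -> G2 gene
    found = set()
    for pair in close1:
        xa, ya = tuple(pair)
        if xa in ortholog and ya in ortholog:
            if frozenset((ortholog[xa], ortholog[ya])) in close2:
                found.add(frozenset((xa, ya)))
    return found

# Hypothetical genomes and similarity scores.
g1 = [("a1", 0, 900, "+"), ("a2", 1000, 1900, "+"), ("a3", 5000, 6000, "+")]
g2 = [("b1", 0, 800, "-"), ("b2", 900, 1500, "-"), ("b3", 1600, 2000, "+")]
sim = {("a1", "b1"): 90, ("a2", "b2"): 80, ("a1", "b2"): 30, ("a3", "b3"): 70}

bbh = bidirectional_best_hits(sim)
result = close_bbh_pairs(close_pairs(find_runs(g1)),
                         close_pairs(find_runs(g2)), bbh)
```

In this toy input, a1 and a2 share a run in G1, their best hits b1 and b2 share a run in G2, so (a1, a2) is the only pair of close bidirectional best hits.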

  14. Predicting Interactions • For each pair of close orthologs (occurring in at least one pair of genomes), calculate a score • The score should increase with the phylogenetic distance between the two genomes, since closely related organisms are more likely to have similar genes nearby due to chance alone • The existence of a triplet (P1, P2, P3) should be stronger evidence than the existence of two pairs (P1, P2 and P1, P3) • Triplet distance can be estimated as the minimum distance between any pair of organisms (in addition to pair score)

  15. Reconstructing Pathways • Can identify the association between unknown proteins and known pathways! • Example: purine metabolism

  16. Projection of Gene Neighborhood • The composition of operons is evolutionarily variable • A particular set of functionally related genes does not always comprise an operon • The application of gene neighborhood based interaction prediction is limited for a single organism • With multiple organisms, it is possible to statistically strengthen conclusions and project findings onto other organisms • If an operon with functionally related genes exists in several genomes, a functional association can be predicted for other organisms, even if the corresponding genes are scattered • Variability turns out to be an advantage for prediction

  17. Gene Neighborhood - Limitations • It is only directly applicable to bacteria (and archaea), because the relevance of gene order does not necessarily extend to eukaryotes • For closely related species, conserved gene order might just be due to lack of time for genome rearrangements • We are interested in selective constraints that preserve gene order • Compared species should be distant enough • But not too distant, because we need a sufficient number of orthologs to be able to derive statistically meaningful results

  18. Gene Fusion • Domain fusion events • Two protein domains that act as independent proteins (components) in one organism may form (part of) a single polypeptide chain (composite) in another organism • Most proteins that are involved in domain fusion events are known to be subunits of multiprotein complexes (76% in E. coli metabolic network)

  19. Gene Fusion Based PPI Prediction • A pair of proteins in the query genome is a candidate interacting pair if • They show (local) sequence similarity to the same protein (rosetta stone) in the reference genome • They do not show sequence similarity with each other • Complete genomes!
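The two conditions translate directly into a filter over precomputed similarity results. The sketch below assumes the similarity search has already been run; the input structures, protein names, and function name are hypothetical. A fuller version would also check that the two query proteins align to different regions of the composite protein.

```python
from itertools import combinations

def rosetta_stone_candidates(hits, similar_in_query):
    """hits: query protein -> set of reference proteins it is locally similar to.
    similar_in_query: set of frozensets of query proteins similar to each other.
    Returns candidate pair -> shared reference (rosetta stone) proteins."""
    candidates = {}
    for p, q in combinations(sorted(hits), 2):
        if frozenset((p, q)) in similar_in_query:
            continue  # condition 2: the pair must not be similar to each other
        shared = hits[p] & hits[q]
        if shared:    # condition 1: both hit the same reference protein
            candidates[(p, q)] = shared
    return candidates

# Hypothetical similarity results.
hits = {"P1": {"R7"}, "P2": {"R7", "R9"}, "P3": {"R9"}, "P4": {"R9"}}
similar_in_query = {frozenset(("P3", "P4"))}  # e.g. paralogs
candidates = rosetta_stone_candidates(hits, similar_in_query)
```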

  20. Predicted Interactions • Known physical interactions • Proteins in the same pathway

  21. Gene Fusion Based Prediction - Results • Interactions predicted based on gene fusion events • Distance on circle shows distance on genome

  22. Co-evolution of Interacting Proteins • Selective pressure is likely to act on common function • Proteins that are interacting are expected to either be conserved together along with their interactions, or not conserved at all • Hypothesis I: Orthologs of interacting proteins also interact in other species (supported by evidence, but there are subtleties, which we will discuss later) • Hypothesis II: If two proteins are interacting, then they will show similar conservation patterns • Phylogenetic profiles

  23. Phylogenetic Profiles

  24. Correlation of Phylogenetic Profiles • Assume we have N genomes, protein X has homologs in x of them, Y has y, and they co-occur in z genomes • Hamming distance: • Pearson correlation: • Mutual information: • Statistical significance:
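The formulas for these four measures did not survive in the transcript. The sketch below uses standard textbook definitions expressed in the slide's symbols (N, x, y, z), so it may differ in detail from what the slide showed: Hamming distance x + y - 2z, Pearson correlation (Nz - xy) / sqrt(x(N - x)y(N - y)), mutual information over the 2x2 presence/absence table, and a hypergeometric tail probability for significance.

```python
from math import comb, log2, sqrt

def hamming(N, x, y, z):
    """Number of genomes in which exactly one of X, Y is present."""
    return x + y - 2 * z

def pearson(N, x, y, z):
    """Correlation of two binary profiles; needs 0 < x < N and 0 < y < N."""
    return (N * z - x * y) / sqrt(x * (N - x) * y * (N - y))

def mutual_information(N, x, y, z):
    """Mutual information (bits) between presence of X and presence of Y."""
    joint = {(1, 1): z, (1, 0): x - z, (0, 1): y - z, (0, 0): N - x - y + z}
    px = {1: x / N, 0: 1 - x / N}
    py = {1: y / N, 0: 1 - y / N}
    mi = 0.0
    for (a, b), count in joint.items():
        if count > 0:
            p = count / N
            mi += p * log2(p / (px[a] * py[b]))
    return mi

def hypergeom_pvalue(N, x, y, z):
    """P(co-occurrence >= z) if Y's y genomes were chosen at random."""
    tail = sum(comb(x, k) * comb(N - x, y - k)
               for k in range(z, min(x, y) + 1))
    return tail / comb(N, y)

# Two proteins with identical profiles in 5 of 10 genomes.
N, x, y, z = 10, 5, 5, 5
```

For identical profiles the Hamming distance is 0 and the correlation is 1; the p-value shows how unlikely such perfect co-occurrence is under random placement.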

  25. Phylogenetic Profiles - Limitations • Many processes may be common across lineages • Too many false positives • Database of genomes may be biased • All organisms are treated equally • Improvement: Use trees instead of profiles • Proteins are assumed to be conserved as a whole • It is domains that interact • Improvement: Use domain profiles • Figure: yeast nucleoli and ribosomal proteins across organisms

  26. Phylogenetic Tree Based Prediction • Phylogenetic trees of Ntr-family two-component sensor histidine kinases and their corresponding regulators

  27. Mirror Tree Method • Need to have sufficient number of genomes that contain homologs of both proteins
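The transcript does not describe the method itself. In its commonly described form, mirror tree compares the pairwise evolutionary distance matrices of two protein families over the same set of species and uses their linear correlation as an interaction score. The sketch below follows that description; the distance values and names are made up for illustration.

```python
from itertools import combinations
from math import sqrt

def mirror_tree_score(dist_a, dist_b, species):
    """Pearson correlation between two families' inter-species distances.
    dist_a, dist_b: frozenset({s1, s2}) -> evolutionary distance."""
    pairs = [frozenset(p) for p in combinations(species, 2)]
    a = [dist_a[p] for p in pairs]
    b = [dist_b[p] for p in pairs]
    n = len(pairs)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical distances for three species; family B evolves twice as fast
# as family A but with the same tree shape.
species = ["s1", "s2", "s3"]
dist_a = {frozenset(("s1", "s2")): 1.0,
          frozenset(("s1", "s3")): 2.0,
          frozenset(("s2", "s3")): 4.0}
dist_b = {pair: 2 * d for pair, d in dist_a.items()}
score = mirror_tree_score(dist_a, dist_b, species)
```

With k shared species there are only k(k-1)/2 distance pairs to correlate, which is why the method needs enough genomes containing homologs of both proteins.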

  28. Matrix Method • Start with families of proteins that are suspected to interact • Identify specific pairs of proteins that interact by aligning the phylogenetic trees that underlie the two families • Assumption: Identical number of proteins in each family

  29. Correlated Mutations • Co-evolution of interacting proteins can be followed more closely by quantifying the degree of co-variation between pairs of residues from these proteins • Correlated mutations may correspond to compensatory mutations that stabilize the mutations in one protein with changes in the other • Figure: distribution of distances between amino acid positions on a folded protein

  30. In silico Two-Hybrid • The correlation of mutations between two positions (may be on different proteins) can be estimated from pairwise assessment of aligned multiple sequences • Position pairs with high correlation are potential contact points • Interaction index • For a protein pair, compute the aggregate correlation (of mutations) across all positions

  31. In silico Two-Hybrid

  32. Performance of I2H • I2H predicts physical, rather than functional association • It requires complete genomes & sufficient number of homologs

  33. Co-citation Based PPI Prediction • Functionally associated proteins are likely to be cited in the same research article • We can assess the statistical significance of co-citation based on hypergeometric model • Algorithmic problem: How to recognize & match protein names? • Train algorithm using annotated abstracts via conditional random fields (CRF)

  34. Performance of Co-citation • The method is robust to choice of parameters for name recognition • Statistical significance is quite relevant until it saturates

  35. Integrating PPI Networks • Interaction data coming from multiple sources • Different sources refer to different levels of interaction • Can integration handle noise, making interaction data more reliable? • Superpose interactions based on their reliability

  36. Bayesian Integration • For each prediction method, compute log-likelihood score • Let P(L|E) be the number of interactions predicted by method E, such that functional association between corresponding proteins is known • Let ~P(L|E) be the number of false positives • Let P(L) and ~P(L) be the corresponding priors • Assign weights to methods based on their log-likelihood scores
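One common way to write the score from these four quantities is LLS = ln[(P(L|E) / ~P(L|E)) / (P(L) / ~P(L))]; the slide's own formula is not in the transcript, so this form is an assumption. The sketch below computes it and then combines methods by summing the scores of all methods that support an edge, which is a deliberately naive weighting chosen for illustration. All counts and method names are hypothetical.

```python
from math import log

def log_likelihood_score(linked, not_linked, prior_linked, prior_not_linked):
    """ln of (odds of functional linkage given the evidence) / (prior odds)."""
    return log((linked / not_linked) / (prior_linked / prior_not_linked))

def integrate(predictions, scores):
    """predictions: method -> set of edges; scores: method -> LLS.
    Returns edge -> summed score over the methods that predict it."""
    combined = {}
    for method, edges in predictions.items():
        for edge in edges:
            combined[edge] = combined.get(edge, 0.0) + scores[method]
    return combined

# Hypothetical benchmark counts for two methods.
scores = {
    "gene_fusion": log_likelihood_score(80, 20, 1000, 9000),
    "co_citation": log_likelihood_score(30, 30, 1000, 9000),
}
predictions = {
    "gene_fusion": {("P1", "P2"), ("P2", "P3")},
    "co_citation": {("P1", "P2")},
}
network = integrate(predictions, scores)
```

A score above 0 means the method enriches for known functional associations relative to the prior; edges supported by several methods accumulate a higher combined score.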

  37. Comparison of Prediction Methods • Integrated network captures functional association better • Note that the integrated network is “trained” using available data on functional association

  38. Classification Based Integration • Points: Proteins, Space: Expression, Conservation, Labels: Function • Points: Protein Pairs, Space: Co-expression, Co-evolution, etc., Labels: Existence of Interaction

  39. Performance of Domain Co-evolution

  40. Co-Evolutionary Matrix

  41. Domain Identification

  42. Difference between Predicted PPIs
