
FunCoup: reconstructing protein networks in the worm and other animals

Andrey Alexeyenko, Erik Sonnhammer. Stockholm Bioinformatics Center.


Presentation Transcript


  1. FunCoup: reconstructing protein networks in the worm and other animals. Andrey Alexeyenko, Erik Sonnhammer, Stockholm Bioinformatics Center

  2. C. elegans computed interactomes

  3. FunCoup is a data integration framework to discover functional coupling in eukaryotic proteomes with data from model organisms. [Diagram labels: worm proteins A and B, linked by "?"; find orthologs*; high-throughput evidence from mouse, fly, human, yeast.]

  4. FunCoup • Each piece of data is evaluated • Data FROM many eukaryotes (7) • Practical maximum of data sources (>60) • Predicted networks FOR a number of eukaryotes (8) • Organism-specific efficient and robust Bayesian frameworks • Orthology-based information transfer and phylogenetic profiling • Networks predicted for different types of functional coupling (metabolic, signaling etc.)

  5. C. elegans’ benefit from the model species data integration. [Figure labels: Li & Vidal’s set, 5535 pairs; IntAct (Oct. 2007), 4517 pairs; 6841; other C. elegans data; 6 eukaryotes' data; 36000 predicted C. elegans pairs.]

  6. Data sources in FunCoup: • Species: • H. sapiens • M. musculus • R. norvegicus • D. melanogaster • C. elegans • S. cerevisiae • A. thaliana • Types: • Protein-protein interactions • Protein domain associations • Protein-DNA interactions • mRNA expression • Protein expression • miRNA targeting • Sub-cellular co-localization • Phylogenetic profiling

  7. Multilateral data transfer. [Diagram labels: Human, Mouse, Rat, FunCoup, Ciona, Fly, Worm, Yeast, Arabidopsis.] Data from the same species is an important but not indispensable component of the framework. Hence, a network can be constructed for an organism with no experimental datasets at all.

  8. InParanoid. [Diagram labels: Proteome A; Proteome B; reciprocally best hits ~ seed orthologs; inparalogs.] Reference: Maido Remm, Christian E. V. Storm and Erik L. L. Sonnhammer. Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. Journal of Molecular Biology 314(5), 14 December 2001, pages 1041-1052.

  9. How does orthology work? Log overlap between KEGG pathways and complexes (Gavin et al., 2006)

  10. Comparing networks: rat, human, mouse

  11. Conclusions FunCoup: • is a flexible, exhaustive, and robust framework to infer confident functional links • enables practical web access to candidate interactions in both small and global-scale network context • is open towards better data quality and coverage http://FunCoup.sbc.su.se

  12. Acknowledgements: • Carsten Daub • Kristoffer Forslund • Anna Henricson • Olof Karlberg • Martin Klammer • Mats Lindskog • Kevin O’Brien • Tomas Ohlson • Sanjit Rupra • Gabriel Östlund • Sean Hooper • All previous interaction network developers

  13. Talk outline • Other network resources • Why FunCoup • Orthology and InParanoid • Implementation • Applications and future development

  14. FunCoup is a naïve Bayesian network (NBN). Bayesian inference: P(C|E) = (P(C) * P(E|C)) / P(E), where E is the evidence (genes A and B are co-expressed) and C is the conclusion (A<->B: genes A and B are functionally coupled).
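
The Bayes formula on this slide can be illustrated with a minimal sketch. The function name and all numbers below are hypothetical and are not taken from FunCoup; P(E) is expanded with the law of total probability, which the slide does not spell out.

```python
def posterior(prior_c: float, p_e_given_c: float, p_e_given_not_c: float) -> float:
    """Bayes' rule: P(C|E) = P(C) * P(E|C) / P(E)."""
    # P(E) by the law of total probability over "coupled" and "not coupled".
    p_e = prior_c * p_e_given_c + (1.0 - prior_c) * p_e_given_not_c
    return prior_c * p_e_given_c / p_e


# Hypothetical inputs: 1% of gene pairs are coupled a priori; co-expression is
# seen in 60% of coupled pairs and in 5% of uncoupled pairs.
p_coupled = posterior(prior_c=0.01, p_e_given_c=0.6, p_e_given_not_c=0.05)
print(f"P(C|E) = {p_coupled:.3f}")
```

With these made-up inputs, observing co-expression raises the belief in coupling well above the prior, although the posterior stays far from certainty because the prior is small.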

  15. Problem: In situations with multiple inparalogs, how to deal with alternative evidence? Solution: Treat ALL inparalogs equally, and choose the BEST value.
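
One way to read "treat all inparalogs equally and choose the best value" is sketched below. It assumes that "best" means the numerically highest evidence value, which the slide does not define; the function name, the data layout, and the protein identifiers are hypothetical.

```python
from itertools import product


def best_inparalog_value(inparalogs_a, inparalogs_b, evidence):
    """Return the highest evidence value over all inparalog pairs, or None.

    `evidence` maps an unordered protein pair (a frozenset) to a number.
    Every combination of inparalogs is considered with equal weight.
    """
    candidates = []
    for a, b in product(inparalogs_a, inparalogs_b):
        value = evidence.get(frozenset((a, b)))
        if value is not None:
            candidates.append(value)
    return max(candidates) if candidates else None


# Hypothetical example: protein A has two inparalogs, protein B has one.
evidence = {
    frozenset(("a1", "b1")): 0.2,
    frozenset(("a2", "b1")): 0.7,
}
best = best_inparalog_value(["a1", "a2"], ["b1"], evidence)
print(best)
```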

  16. Problem: Absolute probabilities of FC are intractable. The full Bayesian network is impossible. Solution: Naïve Bayesian network. Calculate a belief change instead (likelihood ratios, LR). Assume NO data dependency. [Diagram: the full network would need every pairwise conditional among data sets A, B, C, D, such as P(A|C), P(C|A), P(B|A), P(A|B); the naïve network attaches one independent ratio P(E|+) / P(E|-) per data set to the link A<->B.]
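
Under the no-dependency assumption, the per-data-set likelihood ratios simply multiply, or equivalently their logarithms add. The sketch below shows that arithmetic only; the function names and the example ratios are hypothetical, and it is not a description of FunCoup's actual scoring code.

```python
import math


def combined_log_lr(likelihood_ratios):
    """Sum of log likelihood ratios, valid if the data sets are independent."""
    return sum(math.log(lr) for lr in likelihood_ratios)


def posterior_odds(prior_odds, likelihood_ratios):
    """Belief change: posterior odds = prior odds * product of all LRs."""
    return prior_odds * math.exp(combined_log_lr(likelihood_ratios))


# Hypothetical LRs, P(E|+) / P(E|-), from three independent data sets.
lrs = [2.0, 0.5, 4.0]
score = combined_log_lr(lrs)
odds = posterior_odds(prior_odds=0.01, likelihood_ratios=lrs)
print(score, odds)
```

A ratio above 1 supports the link, a ratio below 1 argues against it, and a ratio of exactly 1 leaves the belief unchanged.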

  17. Problem: How to establish optimal bridges between species? Solution: Via groups of orthologs that emerged from speciation. [Diagram labels: gene evolution; functional link.]

  18. Homologs: proteins with similar sequence and, thus, common origin. [Diagram labels: Proteome A; Proteome B.]

  19. An InParanoid cluster of orthologs Inparalogs

  20. Problem: Some LR are weak and arise due to non-representative sampling. Solution: Enforce confidence check (χ2-test) and remove insignificant nodes. [Diagram: nodes P(E|+) / P(E|-) attached to the link A<->B.]
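
The slide does not say how the χ2-test is set up. As a sketch under stated assumptions, the code below tests a 2x2 table (evidence present or absent, in the positive or the negative training set) with Pearson's chi-square statistic and one degree of freedom, and keeps a node only if the p-value is below a hypothetical threshold of 0.05. All margins of the table are assumed to be non-zero.

```python
import math


def chi2_2x2(pos_with, pos_without, neg_with, neg_without):
    """Pearson chi-square statistic and p-value (1 d.f.) for a 2x2 table."""
    observed = [[pos_with, pos_without], [neg_with, neg_without]]
    total = sum(sum(row) for row in observed)
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For one degree of freedom, the survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


def keep_node(table, alpha=0.05):
    """Keep a node only if its evidence differs significantly between sets."""
    _, p_value = chi2_2x2(*table)
    return p_value < alpha


# Hypothetical counts: (pos_with, pos_without, neg_with, neg_without).
strong_node = (30, 10, 10, 30)
weak_node = (6, 4, 5, 5)
print(keep_node(strong_node), keep_node(weak_node))
```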

  21. Reciprocally best hits. [Diagram labels: Proteome A; Proteome B.]
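
The idea of reciprocally best hits can be sketched as follows. This covers only that one step; InParanoid itself does more (for example, adding inparalogs around the seed pair), and the score tables, identifiers, and function names here are hypothetical.

```python
def best_hits(scores):
    """Map each query protein to its highest-scoring target.

    `scores` maps (query, target) to a similarity score.
    """
    best = {}
    for (query, target), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical similarity scores between proteome A and proteome B.
a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 160, ("b2", "a2"): 80}
rbh = reciprocal_best_hits(a_vs_b, b_vs_a)
print(rbh)
```

In this toy input a2's best hit is b2, but b2 prefers a1, so only the pair (a1, b1) is reciprocal.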

  22. Multinet. Problem: Definitions and notions of FC vary. Solution: Decide which types of FC are needed (provide as positive training sets) and perform the previous steps customized. [Diagram: nodes P(E|+) / P(E|-) feeding the link types A|B, A<>B, and A||B.]

  23. Multinet presents several link types in parallel. Example: proteins of the Parkinson’s disease pathway (KEGG #05020). Legend: physical protein-protein interaction; “signaling” link; metabolic “non-signaling” link.

  24. The limits of data integration

  25. FunCoup’s web interface: http://FunCoup.sbc.su.se. Hooper S., Bork P. Medusa: a simple tool for interaction graph analysis. Bioinformatics. 2005 Dec 15;21(24):4432-3. Epub 2005 Sep 27.

  26. Reconstructing the “regulatory blueprint”* in C. intestinalis. Legend: proteins of the “Regulatory Blueprint for a Chordate Embryo” [*]; 18 links mentioned in [*] AND found by FunCoup; links found by FunCoup (about 140). The remaining 202 links from [*] that FunCoup did not find are not shown. *Imai KS, Levine M, Satoh N, Satou Y (2006) Regulatory blueprint for a chordate embryo. Science 312:1183-7.

  27. Overview and comparison of ortholog databases. Alexeyenko A, Lindberg J, Pérez-Bercoff Å, Sonnhammer ELL. Drug Discovery Today: Technologies (2006) 3(2), 137-143. [Diagram labels: orthologs; functional link; inparalogs; C. elegans, D. melanogaster, human, S. cerevisiae.]

  28. Problem: Distribution areas informative of FC may vary. Solution: Find them individually for each data set and FC class, accounting for the joint “feature – class” distribution. [Plot: positive (+) and negative (-) examples along a Pearson r axis from -1 to 1.]
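
One possible reading of "find the informative areas individually" is to bin the feature (here Pearson r) and estimate a likelihood ratio per bin from the positive and negative sets. The sketch below assumes fixed bin edges and a pseudocount to avoid empty bins; both are illustrative choices that the slide does not specify, as are the function name and the data.

```python
def binned_likelihood_ratios(pos_values, neg_values, edges, pseudocount=1.0):
    """Estimate P(bin|+) / P(bin|-) for each bin defined by `edges`."""

    def histogram(values):
        counts = [0] * (len(edges) - 1)
        for v in values:
            for i in range(len(counts)):
                is_last = i == len(counts) - 1
                # Bins are half-open [lo, hi); the last bin also includes hi.
                if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                    counts[i] += 1
                    break
            # Values outside the edges are silently ignored.
        return counts

    pos_counts = histogram(pos_values)
    neg_counts = histogram(neg_values)
    n_bins = len(pos_counts)
    pos_total = sum(pos_counts) + pseudocount * n_bins
    neg_total = sum(neg_counts) + pseudocount * n_bins
    return [
        ((p + pseudocount) / pos_total) / ((n + pseudocount) / neg_total)
        for p, n in zip(pos_counts, neg_counts)
    ]


# Hypothetical Pearson r values for coupled (+) and uncoupled (-) pairs.
pos_r = [0.8, 0.6, 0.9, -0.2]
neg_r = [-0.5, -0.1, 0.1, -0.7]
ratios = binned_likelihood_ratios(pos_r, neg_r, edges=[-1.0, 0.0, 1.0])
print(ratios)
```

A bin with a ratio well above or below 1 is informative for that data set and FC class; a bin near 1 carries little information.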

  29. Validation. Jack-knife procedure: • Take “positive” and “negative” sets • Split each randomly as 50:50 • Use the first parts to train the algorithm, the second to test the performance • Repeat a number of times. Analysis of variance (ANOVA): • Introduce features A, B, C in the workflow of FunCoup (e.g., using PCA, selecting nodes of BN by relevance, ways of using ortholog data etc.) • Run FunCoup with all possible combinations of absence/presence of A, B, C to produce a balanced and orthogonal ANOVA design with replicates • Study effects of A, B, C or their combinations AxB, BxC, ..., AxBxC to see if they influence the performance significantly (whereas all other effects did not exist)
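
The two procedures on this slide can be sketched in a few lines: repeated random 50:50 splits of the positive and negative sets, and the full set of presence/absence combinations for the ANOVA design. Function names, the fixed random seed, and the toy data are hypothetical; training, testing, and the ANOVA itself are left out.

```python
import random
from itertools import product


def jackknife_splits(positives, negatives, repeats, seed=0):
    """Yield (train, test) pairs; each set is split randomly 50:50."""
    rng = random.Random(seed)
    for _ in range(repeats):
        pos, neg = list(positives), list(negatives)
        rng.shuffle(pos)
        rng.shuffle(neg)
        half_pos, half_neg = len(pos) // 2, len(neg) // 2
        train = (pos[:half_pos], neg[:half_neg])
        test = (pos[half_pos:], neg[half_neg:])
        yield train, test


def factorial_design(features):
    """All presence/absence combinations of the features (2**n runs)."""
    return [
        dict(zip(features, flags))
        for flags in product([False, True], repeat=len(features))
    ]


# Hypothetical toy data: 10 positive and 6 negative examples, 3 repeats.
splits = list(jackknife_splits(range(10), range(6), repeats=3))
design = factorial_design(["A", "B", "C"])
print(len(splits), len(design))
```

Each run in `design` would be repeated over the jack-knife splits to provide the replicates that the ANOVA needs.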
