Evaluation and integration of multiple datasets using Bayes theorem

Presentation Transcript


  1. Evaluation and integration of multiple datasets using Bayes theorem. John van Dam

  2. How can we integrate multiple datasets? • Proteomics data • Published data • Genetic data • Expression data • Evolutionary data

  3. How can we integrate multiple datasets? • Proteomics data • Published data • Genetic data • Expression data • Evolutionary data

  4. Thomas Bayes (1701 – 1761) • Presbyterian minister • Fellow of the Royal Society • Published two works: • A religious essay • An essay defending the work of Sir Isaac Newton • His work on what became “Bayes’ theorem” was published posthumously by Richard Price in 1763 • Mathematics of probabilities • A hot topic in science in the early 18th century • Many people at the time were interested in mathematics, statistics and probabilities because of gambling!

  5. Bayes’ theorem • P(A|B) = P(B|A) × P(A) / P(B) • P(A|B) = probability of A given observation B • P(B|A) = probability of observing B given A • P(A) = the a priori probability of A • P(B) = the probability that B is observed • Bayes’ theorem deals with “inverse probabilities”

  6. Example: • A friend tells you he had a nice conversation with someone on the train to Nijmegen • What is the chance that this other person is a woman? • Your friend only tells you that this person has long hair. • Does this change the previous probability? • Say: • 75% of women have long hair • 15% of men have long hair
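
The example above can be worked through numerically. A minimal sketch, assuming a 50/50 prior for women and men on the train (the transcript does not state the prior); the function name `posterior` is illustrative only.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """Bayes' theorem for a binary hypothesis A and an observation B."""
    # P(B) via the law of total probability over A and not-A
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    return p_b_given_a * prior_a / p_b

# Assumed prior P(W) = 0.5 (not given in the slides);
# P(L|W) = 0.75 and P(L|M) = 0.15 are the values from the example.
p_w_given_l = posterior(prior_a=0.5, p_b_given_a=0.75, p_b_given_not_a=0.15)
print(f"P(W|L) = {p_w_given_l:.3f}")  # prints "P(W|L) = 0.833"
```

Under that assumed prior, hearing about the long hair raises the probability that the person is a woman from 0.50 to about 0.83.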

  7. Bayes’ theorem • What if your friend told you that this person was also wearing high heels? • We can use P(W|L) as the new prior! • This is called Bayesian updating • You adjust your ‘belief’ with each new piece of information! • Bayesian updating assumes no relationship between L and H other than via W (that is, L and H are conditionally independent given W)!
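
Bayesian updating as described on this slide can be sketched by feeding each posterior back in as the next prior. The long-hair values come from the example; the 50/50 starting prior and the high-heel likelihoods (30% of women, 1% of men) are hypothetical numbers chosen only for illustration.

```python
def update(prior, p_obs_given_w, p_obs_given_m):
    """One Bayesian update of P(W) given a new observation."""
    numerator = p_obs_given_w * prior
    return numerator / (numerator + p_obs_given_m * (1.0 - prior))

p_w = 0.5                       # assumed starting prior, not from the slides
p_w = update(p_w, 0.75, 0.15)   # long hair: values from the example
p_w = update(p_w, 0.30, 0.01)   # high heels: hypothetical values
print(f"P(W|L,H) = {p_w:.4f}")
```

Because the two observations are treated as conditionally independent given W, applying them in the opposite order gives the same result.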

  8. Bayesian odds For convenience we can rewrite Bayes’ equation in terms of odds: posterior odds = prior odds × likelihood ratio (the Bayes factor)
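
The odds form follows from writing Bayes’ theorem once for A and once for its complement, then dividing the two so that P(B) cancels. A sketch of that step, using \neg A for "not A" (this notation is an assumption, since the slide's own equation did not survive in the transcript):

```latex
\begin{align*}
P(A \mid B) &= \frac{P(B \mid A)\,P(A)}{P(B)}, \qquad
P(\neg A \mid B) = \frac{P(B \mid \neg A)\,P(\neg A)}{P(B)} \\[4pt]
\underbrace{\frac{P(A \mid B)}{P(\neg A \mid B)}}_{\text{posterior odds}}
  &= \underbrace{\frac{P(B \mid A)}{P(B \mid \neg A)}}_{\text{Bayes factor}}
     \times
     \underbrace{\frac{P(A)}{P(\neg A)}}_{\text{prior odds}}
\end{align*}
```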

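  9. Bayesian odds If we now perform Bayesian updating, each new observation simply multiplies the current odds by its own Bayes factor: O(W|L,H) = O(W) × P(L|W)/P(L|M) × P(H|W)/P(H|M), where O(·) denotes the odds of woman versus man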

  10. Beware of ‘extreme’ cases (or priors) • “A Bayesian is one who, vaguely expecting a horse, and catching a glimpse of a donkey, strongly believes he has seen a mule.” • http://www2.isye.gatech.edu/~brani/isyebayes/jokes.html • What did we just “probabilistically” describe if the person was actually a man?

  11. How can we integrate multiple datasets? • Proteomics data • Published data • Genetic data • Expression data • Evolutionary data

  12. Ciliary biology; a relatively young field

  13. Ciliated tissues (some examples) • Inner ear: cilia function in hearing and balance • Sperm cells • Cerebral cavities, bronchi & Fallopian tubes • Retina: cones and rods

  14. Bayesian integration on SysCilia data • Tandem Affinity Purifications & SILAC • Yeast two-hybrid screens • Ciliary evolutionary co-occurrence • Gene presence/absence profiles matching ciliary presence/absence • System co-expression • Genes with XBOX transcription factor binding sites • What is the probability that gene X is ciliary given that it is reported by experiments 1, 2, 3, …, and n?

  15. Bayesian integration of multiple observations • n is the number of datasets considered • f_i = dataset i • P(f_i|T) = probability that a gene is reported by dataset i given that it is a known ciliary gene • We take log odds so that the product of ratios becomes a sum; deviations caused by rounding and measurement errors are then not amplified with each multiplication
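
The integration described on this slide can be sketched as the prior log odds plus a sum of log likelihood ratios. A minimal sketch, assuming the datasets are conditionally independent given the class and writing F for "not ciliary" (the slide's own equation did not survive in the transcript). The prior and the likelihood ratios below are made-up numbers, not SysCilia results, and both function names are illustrative.

```python
import math

def integrate_log_odds(prior_prob, likelihood_ratios):
    """Posterior log odds = prior log odds + sum of log likelihood ratios.

    Assumes the datasets are conditionally independent given the class.
    """
    prior_log_odds = math.log(prior_prob / (1.0 - prior_prob))
    return prior_log_odds + sum(math.log(lr) for lr in likelihood_ratios)

def log_odds_to_prob(log_odds):
    """Convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

# Hypothetical inputs: a 5% prior that a gene is ciliary, and three
# datasets with likelihood ratios P(f_i|T) / P(f_i|F) of 8, 3 and 0.5.
log_odds = integrate_log_odds(0.05, [8.0, 3.0, 0.5])
print(f"posterior probability = {log_odds_to_prob(log_odds):.3f}")
```

A likelihood ratio below 1 (here 0.5) lowers the score, which is how a dataset that fails to report a gene can still contribute evidence.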

  16. Can we say something about genes that were not reported? In the case of yes/no experiments, “No” can also carry information. If an experiment yields a numerical value, we can divide the values into categories. Each gene falls into exactly one category for each experiment.

  17. Evaluating True and False per experiment • We need a list of known ciliary genes (a Gold Standard, GS) • We need a list of known non-ciliary genes (a Negative Set, NS) • The likelihood ratio for experiment i then simply becomes: (fraction of GS reported by experiment i) / (fraction of NS reported by experiment i)
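
The ratio on this slide can be estimated directly from set overlaps. A minimal sketch with toy gene identifiers; the function name, the returned dictionary keys and the pseudocount smoothing are assumptions added here to keep the ratio finite for small sets, and are not part of the slides.

```python
def likelihood_ratios(reported, gold_standard, negative_set, pseudocount=1.0):
    """Estimate likelihood ratios for 'reported' and 'not reported'.

    All three arguments are sets of gene identifiers. The pseudocount
    (an assumption, not from the slides) avoids division by zero.
    """
    gs_hits = len(reported & gold_standard)
    ns_hits = len(reported & negative_set)
    # Smoothed fractions of each reference set that the experiment reports
    frac_gs = (gs_hits + pseudocount) / (len(gold_standard) + 2 * pseudocount)
    frac_ns = (ns_hits + pseudocount) / (len(negative_set) + 2 * pseudocount)
    return {
        "reported": frac_gs / frac_ns,
        "not_reported": (1.0 - frac_gs) / (1.0 - frac_ns),
    }

# Toy gene identifiers, purely illustrative
gold = {"g1", "g2", "g3", "g4"}
negative = {"n1", "n2", "n3", "n4", "n5", "n6"}
experiment = {"g1", "g2", "g3", "n1", "x9"}
print(likelihood_ratios(experiment, gold, negative))
```

The "not_reported" ratio covers the case where a "No" result also carries information: a gene absent from an experiment that recovers most of the gold standard becomes less likely to be ciliary.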

  18. Gold Standard & Negative set

  19. System co-expression

  20. Distinguishing between ciliary vs. non-ciliary genes

  21. Ranking based on Bayesian integration (figure legend: Ciliary, Predicted, Non-ciliary) The Bayesian integration enriches for known ciliary genes more strongly than the individual datasets do. We can control the false discovery rate.

  22. ROC-curve and performance of individual datasets AUC: 0.86
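
The AUC reported on this slide can be read as the probability that a randomly chosen gold-standard gene scores higher than a randomly chosen negative-set gene. A minimal sketch of that pairwise definition; the scores are toy values and the function name is illustrative, so this is not a reconstruction of how the 0.86 was computed.

```python
def roc_auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half. This pairwise form is O(n*m) and is meant for
    illustration on small sets only.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Toy integrated log-odds scores, purely illustrative
gs_scores = [3.1, 2.4, 0.7, 1.9]
ns_scores = [-1.2, 0.9, -0.3, 0.1, -2.0]
print(f"AUC = {roc_auc(gs_scores, ns_scores):.2f}")  # prints "AUC = 0.95"
```

An AUC of 0.5 corresponds to a random ranking and 1.0 to a perfect separation of the two reference sets.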

  23. Application of the Bayesian integration • Predicting causative genes in ciliopathy disease loci or exome data • Predict which genes are likely involved in ciliary function, and which are not • Example BBS5 locus (182 genes):

  24. Conclusion • Bayesian integration is a powerful way to predict novel ciliary genes by objective evaluation and integration of experimental datasets • New datasets can easily be incorporated • You can use such a Bayesian integration to • Predict novel ciliary genes • Rank target genes from new experiments • Predict causative genes in patient exome data

  25. Acknowledgements • Ueffing lab, Tübingen • Huynen lab, Radboud UMC • Roepman lab, Radboud UMC • Oliver Blacque, UCD Dublin
