
Lewin A 1 , Richardson S 1 , Marshall C 1 , Glazier A 2 and Aitman T 2 (2006),

Bayesian Modelling of Differential Gene Expression. Lewin A (1), Richardson S (1), Marshall C (1), Glazier A (2) and Aitman T (2) (2006), Biometrics 62, 1-9. (1) Imperial College Dept. Epidemiology; (2) Imperial College Microarray Centre.


Presentation Transcript


  1. Bayesian Modelling of Differential Gene Expression Lewin A1, Richardson S1, Marshall C1, Glazier A2 and Aitman T2 (2006), Biometrics 62, 1-9. 1: Imperial College Dept. Epidemiology 2: Imperial College Microarray Centre

  2. Outline • Introduction to microarrays and differential expression • Bayesian hierarchical model for differential expression • Decision rules • Predictive model checks • Gene Ontology analysis for differentially expressed genes • Further work

  3. Microarrays measure gene expression (mRNA) (1) Array contains thousands of spots, with millions of strands of DNA of known sequence fixed to each spot (2) Sample (unknown sequences of cDNA) labelled with fluorescent dye (3) Matching sequences of DNA and cDNA hybridize together (4) Array washed → only matching samples left (see which from fluorescent spots) [Figure: DNA strand TGCT pairing with cDNA strand ACGA] Pictures courtesy of Affymetrix

  4. Microarray experiment to find genes associated with Cd36 Cd36: gene known to be important in insulin resistance (Aitman et al 1999, Nature Genet 21:76-83) Microarray Data • 3 SHR compared with 3 transgenic rats (with Cd36) • 3 wildtype (normal) mice compared with 3 mice with Cd36 knocked out • ~12000 genes on each array Biological Question Find genes which are expressed differently between animals with and without Cd36.

  5. Outline • Introduction to microarrays and differential expression • Bayesian hierarchical model for differential expression • Decision rules • Predictive model checks • Gene Ontology analysis for differentially expressed genes • Further work

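  6. Bayesian hierarchical model for differential expression • 1st level $y_{g1r} \mid \alpha_g, \delta_g, \sigma_{g1} \sim N(\alpha_g - \tfrac{1}{2}\delta_g + \beta_{r1}(\alpha_g),\ \sigma_{g1}^2)$, $y_{g2r} \mid \alpha_g, \delta_g, \sigma_{g2} \sim N(\alpha_g + \tfrac{1}{2}\delta_g + \beta_{r2}(\alpha_g),\ \sigma_{g2}^2)$, where $y_{gsr}$ is the log gene expression; $\alpha_g$ is the overall gene expression (fixed effect); $\delta_g$ is the differential effect for gene g between the 2 conditions (fixed effect or mixture prior); $\sigma_{gs}^2$ is the variance for each gene; $\beta_{rs}(\alpha_g)$ is the array effect or normalisation (function of $\alpha_g$)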
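A minimal sketch of how replicates could be simulated from the first-level model on this slide, for a single gene. The names `alpha_g`, `delta_g`, `sigma_g1`, `sigma_g2` and the `array_effect` callback are illustrative choices, not taken from the authors' WinBUGS code, and the default of a zero array effect is an assumption made only to keep the example short.

```python
import random

def simulate_gene(alpha_g, delta_g, sigma_g1, sigma_g2,
                  n_reps=3, array_effect=None, seed=0):
    """Draw log-expression replicates for one gene in two conditions.

    Condition 1 is centred on alpha_g - delta_g / 2 and condition 2 on
    alpha_g + delta_g / 2, each shifted by an array effect.
    """
    rng = random.Random(seed)
    if array_effect is None:
        # Assumption for illustration only: no array effect.
        array_effect = lambda alpha, s, r: 0.0
    y1 = [rng.gauss(alpha_g - 0.5 * delta_g + array_effect(alpha_g, 1, r), sigma_g1)
          for r in range(n_reps)]
    y2 = [rng.gauss(alpha_g + 0.5 * delta_g + array_effect(alpha_g, 2, r), sigma_g2)
          for r in range(n_reps)]
    return y1, y2

# Three replicates per condition, as in the experiments described earlier.
y1, y2 = simulate_gene(alpha_g=7.0, delta_g=1.0, sigma_g1=0.2, sigma_g2=0.3)
```

Splitting the differential effect as minus and plus one half keeps the overall level centred between the two conditions.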

  7. Prior for gene variances • 2nd level $\sigma_{gs}^2 \mid \mu_s, \tau_s \sim \text{logNorm}(\mu_s, \tau_s)$ Hyper-parameters $\mu_s$ and $\tau_s$ can be influential, so these are estimated in the model. • 3rd level $\mu_s \sim N(c, d)$, $\tau_s \sim \text{Gamma}(e, f)$ Variances estimated using information from all measurements (~12000 x 3) rather than just 3 [Figure: 3 wildtype mice]
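The two prior levels on this slide can be read as a generative recipe: draw the hyper-parameters once per condition, then draw every gene variance from the shared log-normal. The sketch below follows that recipe under several assumptions that the slide does not state: the second log-normal parameter is treated as a precision (the WinBUGS convention), `d` as a variance, and `Gamma(e, f)` as shape and rate. The function name is hypothetical.

```python
import math
import random

def draw_gene_variances(n_genes, c, d, e, f, seed=0):
    """Sample hyper-parameters, then one variance per gene, for one condition."""
    rng = random.Random(seed)
    mu_s = rng.gauss(c, math.sqrt(d))        # assumption: d is a variance
    tau_s = rng.gammavariate(e, 1.0 / f)     # assumption: e = shape, f = rate
    sd_log = 1.0 / math.sqrt(tau_s)          # assumption: tau_s is a precision
    variances = [rng.lognormvariate(mu_s, sd_log) for _ in range(n_genes)]
    return mu_s, tau_s, variances

mu_s, tau_s, variances = draw_gene_variances(n_genes=5, c=0.0, d=1.0, e=2.0, f=2.0)
```

Because all genes share the same hyper-parameters, each gene's variance estimate is informed by every other gene, which is the sharing of information the slide refers to.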

  8. Prior for array effects (Normalization) Spline Curve $\beta_{rs}(\alpha_g)$ = quadratic in $\alpha_g$ for $a_{rs(k-1)} \le \alpha_g \le a_{rs(k)}$ with coeff $(b_{rsk}^{(1)}, b_{rsk}^{(2)})$, k = 1, …, #breakpoints Locations of break points not fixed Must do sensitivity checks on # break points [Figure: spline curve with break points a0, a1, a2, a3]
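To make the spline idea concrete, here is a sketch of evaluating a piecewise-quadratic curve with one linear and one quadratic coefficient per segment. It assumes the curve is continuous, so that each segment starts from the value reached at the end of the previous one; whether the authors' spline is built exactly this way is not stated on the slide. The function and argument names are hypothetical.

```python
def piecewise_quadratic(x, breakpoints, coeffs, start_value=0.0):
    """Evaluate a continuous piecewise-quadratic curve at x.

    breakpoints: increasing list [a_0, ..., a_K]
    coeffs: K pairs (b1, b2), the linear and quadratic coefficient per segment
    start_value: value of the curve at a_0
    """
    if not breakpoints[0] <= x <= breakpoints[-1]:
        raise ValueError("x lies outside the break points")
    value = start_value
    for k, (b1, b2) in enumerate(coeffs):
        left, right = breakpoints[k], breakpoints[k + 1]
        # Walk to x if it lies in this segment, otherwise to the segment end.
        step = min(x, right) - left
        value += b1 * step + b2 * step ** 2
        if x <= right:
            break
    return value

# Two segments: linear on [0, 1], then purely quadratic on [1, 2].
curve = [piecewise_quadratic(x, [0.0, 1.0, 2.0], [(1.0, 0.0), (0.0, 1.0)])
         for x in (0.5, 1.0, 1.5, 2.0)]
```

Changing the number of entries in `breakpoints` is the knob that the sensitivity checks mentioned on the slide would vary.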

  9. Array effect as function of gene effect [Figure legend: Bayesian posterior mean; loess]

  10. Decision Rules for Inference: Fixed Effects Model Inference on δ (1) dg = E(δg | data), the posterior mean. Like a point estimate of the log fold change. Decision Rule: gene g is DE if |dg| > δcut (δcut: biological interest) (2) pg = P(|δg| > δcut | data), the posterior probability (incorporates uncertainty). Decision Rule: gene g is DE if pg > pcut (δcut: biological interest; pcut: statistical confidence) This allows the biologist to specify what size of effect is interesting (not just statistical significance)
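Both decision rules can be computed directly from posterior draws of the differential effect for a gene, for example MCMC output. The sketch below assumes such draws are available as a plain list; the function names and the toy numbers are illustrative only.

```python
def decision_summaries(delta_samples, delta_cut):
    """Return (d_g, p_g): posterior mean and P(|delta_g| > delta_cut | data)."""
    n = len(delta_samples)
    d_g = sum(delta_samples) / n
    p_g = sum(1 for d in delta_samples if abs(d) > delta_cut) / n
    return d_g, p_g

def is_de_by_mean(d_g, delta_cut):
    """Rule (1): threshold the posterior mean."""
    return abs(d_g) > delta_cut

def is_de_by_probability(p_g, p_cut):
    """Rule (2): threshold the posterior probability."""
    return p_g > p_cut

# Toy draws with a wide posterior that straddles zero.
d_g, p_g = decision_summaries([0.5, 1.5, -2.0, 0.1], delta_cut=1.0)
rule1 = is_de_by_mean(d_g, delta_cut=1.0)
rule2 = is_de_by_probability(p_g, p_cut=0.4)
```

In this toy case the two rules disagree: the posterior mean is near zero, while half of the posterior mass lies beyond the cut-off, which shows how rule (2) reacts to posterior uncertainty that rule (1) ignores.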

  11. Illustration of decision rule 3 wildtype v. 3 knock-out mice pg = P(|δg| > log(2) and αg > 4 | data) [Figure legend: × pg > 0.8; Δ t-statistic > 2.78 (95% CI)]

  12. Outline • Introduction to microarrays and differential expression • Bayesian hierarchical model for differential expression • Decision rules • Predictive model checks • Gene Ontology analysis for differentially expressed genes • Further work

  13. Predictive Model Checks Key Points • Predict new data from the model (using the posterior distribution) • Get Bayesian p-value for each gene • Use all genes together (1000’s) to assess model fit (p-value distribution close to Uniform if model is good)
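The key points above can be sketched in two steps: a per-gene Bayesian p-value computed from predictive draws of a test statistic, and a summary of how far the collection of p-values is from Uniform(0, 1). The Kolmogorov-Smirnov distance used here is one possible summary chosen for illustration; the slides do not say which comparison the authors use, and the function names are hypothetical.

```python
def bayesian_p_value(observed_stat, predicted_stats):
    """Fraction of predictive draws whose statistic exceeds the observed one."""
    exceed = sum(1 for s in predicted_stats if s > observed_stat)
    return exceed / len(predicted_stats)

def ks_distance_from_uniform(p_values):
    """Kolmogorov-Smirnov distance between the p-values and Uniform(0, 1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    return max(max((i + 1) / n - p, p - i / n) for i, p in enumerate(ordered))

# One gene: observed statistic 2.0 against four predictive draws.
p_one_gene = bayesian_p_value(2.0, [1.0, 3.0, 4.0, 0.0])

# Evenly spread p-values sit close to the uniform distribution.
distance = ks_distance_from_uniform([0.125, 0.375, 0.625, 0.875])
```

With thousands of genes, a small distance suggests the model reproduces the data well, while p-values piled up near 0 or 1 point to misfit.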

  14. Mixed Predictive Checks Mixed prediction is less conservative than posterior prediction [Figure: graph of the model with nodes μ,τ; σg; σgpred; ybarg; Sg, marking the posterior predictive Sg and the mixed predictive Sg]

  15. Bayesian predictive p-values

  16. Outline • Introduction to microarrays and differential expression • Bayesian hierarchical model for differential expression • Decision rules • Predictive model checks • Gene Ontology analysis for differentially expressed genes • Further work

  17. Gene Ontology: network of terms Links connect more general to more specific terms Directed Acyclic Graph ~16,000 terms Picture from Gene Ontology website

  18. Annotations of genes to a node Each term may have 1000s of genes annotated (or none) Gene may be annotated to several GO terms Gene annotated to term A ⇒ annotated to all ancestors of A Picture from Gene Ontology website
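The rule that an annotation to a term implies annotation to all of its ancestors can be sketched as a traversal of the directed acyclic graph. The term and gene names below are made up for illustration and are not real GO identifiers.

```python
def all_ancestors(term, parents):
    """Collect every ancestor of a term in a DAG given child -> parents links."""
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def propagate_annotations(direct, parents):
    """Expand each gene's direct annotations to include all ancestor terms."""
    full = {}
    for gene, terms in direct.items():
        expanded = set(terms)
        for term in terms:
            expanded |= all_ancestors(term, parents)
        full[gene] = expanded
    return full

# Hypothetical toy ontology: term C has two parents, A and B.
parents = {"C": ["A", "B"], "A": ["root"], "B": ["root"]}
direct = {"gene1": {"C"}, "gene2": {"A"}}
full = propagate_annotations(direct, parents)
```

The `seen` set matters because a term in a DAG can be reached through several paths, and each ancestor should be visited only once.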

  19. GO annotations of genes associated with the insulin-resistance gene Cd36 Compare GO annotations of genes most and least differentially expressed Most differentially expressed ↔ pg > 0.5 (280 genes) Least differentially expressed ↔ pg < 0.2 (11171 genes)

  20. GO annotations of genes associated with the insulin-resistance gene Cd36 Use Fisher’s test to compare GO annotations of genes most and least differentially expressed (one test for each GO term) None significant with simple multiple testing adjustment, but there are many dependencies [Figure annotation: inflammatory response recently found to be important in insulin resistance]
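For one GO term, the comparison amounts to a 2x2 table: genes in the most or least differentially expressed group, crossed with annotated or not annotated to the term. The sketch below computes a one-sided Fisher exact p-value for over-representation and applies a Bonferroni correction across terms. Both the one-sided form and the use of Bonferroni as the "simple" adjustment are assumptions; the slide specifies neither.

```python
from math import comb

def fisher_over_representation(a, b, c, d):
    """One-sided Fisher exact p-value for the table [[a, b], [c, d]].

    a: most-DE genes annotated to the term    b: most-DE genes not annotated
    c: least-DE genes annotated to the term   d: least-DE genes not annotated
    """
    n_de = a + b
    n_annotated = a + c
    total = a + b + c + d
    # Hypergeometric tail: probability of a or more annotated genes among n_de.
    tail = sum(comb(n_annotated, x) * comb(total - n_annotated, n_de - x)
               for x in range(a, min(n_de, n_annotated) + 1))
    return tail / comb(total, n_de)

def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Tiny illustrative table, not data from the study.
p_term = fisher_over_representation(2, 0, 0, 2)
adjusted = bonferroni([0.5, 0.25])
```

Bonferroni treats the tests as unrelated, whereas GO terms share genes with their ancestors, which is the dependency problem the slide points out.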

  21. Summary of work in Biometrics paper • Bayesian hierarchical model flexible, estimates variances robustly • Predictive model checks show exchangeable prior good for gene variances • Useful to find GO terms over-represented in the most differentially-expressed genes

  22. Outline • Introduction to microarrays and differential expression • Bayesian hierarchical model for differential expression • Decision rules • Predictive model checks • Gene Ontology analysis for differentially expressed genes • Further work

  23. BGmix: mixture model for differential expression • Group genes into 3 classes: • non-DE • over-expressed • under-expressed • Estimation and classification is simultaneous Change the prior on the differential expression parameters δg

  24. BGmix: mixture model for differential expression Choice of Null Distribution • True log fold changes = 0 • ‘Nugget’ null: true log fold changes = small but not necessarily zero Choice of DE genes distributions • Gammas • Uniforms • Normal

  25. BGmix: mixture model for differential expression Outputs • Point estimates (and s.d.) of log fold changes (stabilised and smoothed) • Posterior probability for gene to be in each group • Estimate of proportion of differentially expressed genes based on grouping (parameter of model)

  26. BGmix: mixture model for differential expression Obtaining gene lists • Threshold on posterior probabilities (Posterior probability of classification in the null < threshold → gene is DE) • Estimate of False Discovery Rate for any gene list (estimate = average of posterior probabilities) • Very simple estimate! • Choice of decision rule: • Bayes Rule • Fix False Discovery Rate • More complex rules for mixture of 3 components
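The gene-list rules on this slide can be sketched from a vector of posterior probabilities of belonging to the null component. The list is formed by thresholding, its False Discovery Rate is estimated as the average null probability of the listed genes, and a list for a fixed target rate is grown in order of increasing null probability. Function names and the toy probabilities are illustrative.

```python
def gene_list_below_threshold(p_null, threshold):
    """Genes whose posterior probability of being null is below the threshold."""
    return [g for g, p in enumerate(p_null) if p < threshold]

def estimated_fdr(p_null, gene_list):
    """Average posterior null probability over the list (0.0 for an empty list)."""
    if not gene_list:
        return 0.0
    return sum(p_null[g] for g in gene_list) / len(gene_list)

def largest_list_with_fdr(p_null, target_fdr):
    """Largest list, taken in order of increasing null probability,
    whose estimated FDR stays at or below the target."""
    order = sorted(range(len(p_null)), key=lambda g: p_null[g])
    chosen, running_sum = [], 0.0
    for g in order:
        if (running_sum + p_null[g]) / (len(chosen) + 1) > target_fdr:
            break
        running_sum += p_null[g]
        chosen.append(g)
    return chosen

p_null = [0.01, 0.2, 0.9, 0.05]           # toy posterior null probabilities
listed = gene_list_below_threshold(p_null, threshold=0.1)
fdr = estimated_fdr(p_null, listed)
fixed_fdr_list = largest_list_with_fdr(p_null, target_fdr=0.1)
```

Stopping at the first gene that pushes the average over the target is safe because, with genes sorted by increasing null probability, the running average can only go up.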

  27. Predictive Checks for Mixture Model • Model checks for differential expression parameters δg • More complex for mixture model • Important point: we check each mixture component separately [Figure: graph of the model with nodes w, η, zg, μ,τ, σg, σgpred, ybarg, Sg, marking the mixed predictive ybarg and the mixed predictive Sg]

  28. Bayesian p-values for Mixture Model [Figure panels: simulated data from incorrect model; simulated data from correct model]

  29. Acknowledgements Co-authors Sylvia Richardson, Clare Marshall (IC Epidemiology) Tim Aitman, Anne-Marie Glazier (IC Microarray Centre) Collaborators on BGX Grant Anne-Mette Hein, Natalia Bochkina (IC Epidemiology) Helen Causton (IC Microarray Centre) Peter Green (Bristol) BBSRC Exploiting Genomics Grant

  30. Papers and Software Software: WinBUGS code for model in Biometrics paper; BGmix (R package) includes mixture model Papers: BGmix paper, submitted; paper on predictive checks for mixture prior, in preparation http://www.bgx.org.uk/
