
Bayes Factors



  1. Bayes Factors. Greg Francis. PSY 626: Bayesian Statistics for Psychological Science, Fall 2018, Purdue University

  2. Hypothesis testing • Suppose the null is true and check to see if a rare event has occurred • e.g., does our random sample produce a t value that is in the tails of the null sampling distribution? • If a rare event occurred, reject the null hypothesis

  3. Hypothesis testing • But what is the alternative? • Typically: “anything goes” • But that seems kind of unreasonable • Maybe the “rare event” would be even less common if the null were not true!

  4. Bayes Theorem • Conditional probabilities: P(H | D) = P(D | H) P(H) / P(D)

  5. Ratio • The ratio of posteriors conveniently cancels out P(D): P(H1 | D) / P(H2 | D) = [P(D | H1) / P(D | H2)] × [P(H1) / P(H2)] • Posterior odds = Bayes Factor × Prior odds

  6. Bayesian Model Selection • It’s not really about hypotheses, but hypotheses suggest models • The Bayes Factor is often presented as BF12 • You could also compute BF21 = 1 / BF12 • As before: Posterior odds = Bayes Factor × Prior odds
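The odds identity on this slide is easy to check numerically. A minimal sketch in Python; the Bayes factor and prior odds values are made up for illustration:

```python
def posterior_odds(bayes_factor, prior_odds):
    # Posterior odds = Bayes factor * prior odds (P(D) cancels in the ratio)
    return bayes_factor * prior_odds

def odds_to_probability(odds):
    # Convert odds in favor of a model to that model's posterior probability
    return odds / (1.0 + odds)

# Illustrative values: data are 4x more likely under model 1,
# and both models are equally plausible a priori (1:1 odds)
post = posterior_odds(4.0, 1.0)
print(post, odds_to_probability(post))  # 4.0 0.8
```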

  7. Bayes Factor • Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF) • BF10 > 1 indicates that the data are evidence for the alternative, compared to the null • BF10 < 1 indicates that the data are evidence for the null, compared to the alternative

  8. Bayes Factor • When BF10 = 2, the data are twice as likely under H1 as under H0 • When BF01 = 2, the data are twice as likely under H0 as under H1 • These interpretations do not require you to believe that one model is better than the other • You can still have priors that favor one model, regardless of the Bayes Factor • You would want to make important decisions based on the posterior • Still, if you consider both models to be plausible, then the priors should not be so different from each other

  9. Rules of thumb Evidence for the alternative hypothesis (or the null) is computed with the Bayes Factor (BF)

  10. Similar to AIC • For a two-sample t-test, the null hypothesis (reduced model) is that a score from group s (1 or 2) is defined as X_is = μ + ε_is • With the same mean μ for each group s

  11. AIC • For a two-sample t-test, the alternative hypothesis (full model) is that a score from group s (1 or 2) is defined as X_is = μ_s + ε_is • With a different mean μ_s for each group s

  12. AIC • AIC and its variants are a way of comparing model structures • One mean or two means? • Always uses maximum likelihood estimates of the parameters • Bayesian approaches identify a posterior distribution of parameter values • We should use that information!

  13. Models of what? • We have been building models of trial-level scores

  # Model without intercept (more natural)
  model2 = brm(Leniency ~ 0 + SmileType, data = SLdata,
               iter = 2000, warmup = 200, chains = 3)
  print(summary(model2))

  GrandSE = 10
  stanvars <- stanvar(GrandMean, name = 'GrandMean') +
              stanvar(GrandSE, name = 'GrandSE')
  prs <- c(prior(normal(GrandMean, GrandSE), class = "b"))
  model6 = brm(CorrectResponses ~ 0 + Dosage + (1 | SubjectID),
               data = ATdata, iter = 2000, warmup = 200, chains = 3,
               thin = 2, prior = prs, stanvars = stanvars)
  print(summary(model6))

  14. Models of what? • We have been building models of trial-level scores • That is not the only option • In traditional hypothesis testing, we care more about effect sizes than about individual scores • Signal-to-noise ratio • Of course, the effect size is derived from the individual scores • In many cases, it is enough to just model the effect size itself rather than the individual scores • Cohen’s d • t-statistic • p-value • Correlation r • “Sufficient” statistic

  15. Models of means • It’s not really going to be practical, but let’s consider a case where we assume that the population variance is known (and equals 1) and we want to compare null and alternative hypotheses of fixed values

  16. Models of means • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample)

  17. Models of means • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample) • Suppose we observe a sample mean for which the data are more likely under the null than under the alternative

  18. Models of means • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample) • Suppose we observe a sample mean for which the data are more likely under the alternative than under the null

  19. Models of means • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample) • Suppose we observe a sample mean for which the data are more likely under the alternative than under the null

  20. Bayes Factor • The ratio of the likelihood for the data under the null compared to the alternative • Or the other way around • Suppose we observe a sample mean for which the data are more likely under the alternative than under the null
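For the known-variance setup on these slides (sigma = 1, n = 100, so the sampling distribution of the mean has standard deviation 0.1), a point-null versus point-alternative Bayes factor is just a ratio of two normal densities. A sketch in Python; the observed mean 0.12 and alternative mean 0.2 are illustrative choices, not values from the slides:

```python
from scipy.stats import norm

n, sigma = 100, 1.0
se = sigma / n ** 0.5        # sd of the sampling distribution of the mean: 0.1

xbar = 0.12                  # illustrative observed sample mean
mu0, mu1 = 0.0, 0.2          # point null and point alternative (illustrative)

# Likelihood of the observed mean under each hypothesis
bf01 = norm.pdf(xbar, mu0, se) / norm.pdf(xbar, mu1, se)
print(bf01)                  # < 1, so these data favor the alternative
```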

  21. Decision depends on alternative • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample) • Suppose we observe a sample mean for which the data are more likely under the null than under the alternative

  22. Decision depends on alternative • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample)

  23. Decision depends on alternative • The likelihood of any given observed mean value is derived from the sampling distribution • Suppose n=100 (one sample)

  24. Decision depends on alternative • For a fixed sample mean, evidence for the alternative only happens for alternative population mean values in a given range • For big alternative values, the observed sample mean is less likely than for a null population value • The sample mean may be unlikely for both models • Rouder et al. (2009) • [Figure: evidence for null vs. evidence for alternative, plotted against the mean of the alternative]

  25. Models of means • Typically, we do not hypothesize a specific value for the alternative, but a range of plausible values

  26. Likelihoods • For the null, we compute likelihood in the same way • Suppose n=100 (one sample)

  27. Likelihoods • For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values • Suppose n=100 (one sample)

  28. Likelihoods • For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values • Suppose n=100 (one sample)

  29. Likelihoods • For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values • Suppose n=100 (one sample)

  30. Likelihoods • For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values • Suppose n=100 (one sample)

  31. Average Likelihood • For the alternative, we have to consider each possible value of mu, compute the likelihood of the sample mean for that value, and then average across all possible values • Suppose n=100 (one sample) • Average likelihood: P(D | H1) = ∫ P(D | μ) π(μ) dμ, where π(μ) is the prior for the value of mu and P(D | μ) is the likelihood for a given value of mu (from the sampling distribution)
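The averaging step can be sketched numerically. Below, a hypothetical prior mu ~ Normal(0, tau) with tau = 0.3 and an illustrative observed mean of 0.15 (both my choices); for a normal prior the average likelihood also has a closed form, Normal(0, sqrt(tau^2 + se^2)), which checks the quadrature:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

n, sigma = 100, 1.0
se = sigma / np.sqrt(n)
xbar = 0.15                  # illustrative observed sample mean
tau = 0.3                    # prior sd for mu under H1 (illustrative)

# Average likelihood: integrate (likelihood of xbar given mu) * (prior of mu)
avg_like, _ = quad(lambda mu: norm.pdf(xbar, mu, se) * norm.pdf(mu, 0.0, tau),
                   -np.inf, np.inf)

# Closed-form check: normal prior + normal likelihood marginalizes to a normal
closed = norm.pdf(xbar, 0.0, np.sqrt(tau ** 2 + se ** 2))

bf01 = norm.pdf(xbar, 0.0, se) / avg_like   # spike-at-zero null vs. average
print(avg_like, bf01)
```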

  32. Bayes Factor • Ratio of the likelihood for the null compared to the (average) likelihood for the alternative: BF01 = P(D | H0) / P(D | H1)

  33. Uncertainty • The prior standard deviation for mu establishes a range of plausible values for mu • A broad prior is more flexible; a narrow prior is less flexible

  34. Uncertainty • With a very narrow prior, you may not fit the data

  35. Uncertainty • With a very broad prior, you will fit well for some values of mu and poorly for other values of mu

  36. Uncertainty • Uncertainty in the prior functions similarly to the penalty for parameters in AIC

  37. Penalty • Averaging acts like a penalty for extra parameters • Rouder et al. (2009) • [Figure: evidence for null vs. evidence for alternative, plotted against the width of the alternative prior]
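The penalty can be seen directly in the known-variance example: hold the data fixed and widen the prior. A sketch assuming sigma = 1, n = 100, and an illustrative observed mean of 0.2; the normal prior lets us use the closed-form average likelihood:

```python
import numpy as np
from scipy.stats import norm

n, sigma, xbar = 100, 1.0, 0.2   # illustrative data
se = sigma / np.sqrt(n)

def bf01(tau):
    # Null: spike at 0. Alternative: mu ~ Normal(0, tau); its average
    # likelihood is Normal(0, sqrt(tau^2 + se^2)) evaluated at xbar.
    return norm.pdf(xbar, 0.0, se) / norm.pdf(xbar, 0.0,
                                              np.sqrt(tau ** 2 + se ** 2))

print(bf01(0.1))   # < 1: a narrow prior near the truth -> evidence for H1
print(bf01(5.0))   # > 1: a very wide prior is penalized -> evidence for H0
```

Widening the prior spreads its mass over many effect sizes the data rule out, so the average likelihood drops and the evidence shifts toward the null.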

  38. Models of effect size • Consider the case of a two-sample t-test • We often care about the standardized effect size δ = (μ1 − μ2) / σ • Which we can estimate from data as Cohen’s d = (x̄1 − x̄2) / s_pooled

  39. Models of effect size • If we were doing traditional hypothesis testing, we would compare a null model H0: μ1 = μ2 • Against an alternative H1: μ1 ≠ μ2 • Equivalent statements can be made using the standardized effect size: H0: δ = 0 versus H1: δ ≠ 0 • As long as the standard deviation is not zero

  40. Priors on effect size • For the null, the prior is (again) a spike at zero

  41. JZS Priors on effect size • For the alternative, a good choice is a Cauchy distribution (a t-distribution with df=1) • JZS = Jeffreys, Zellner, Siow (Rouder et al., 2009)

  42. JZS Priors on effect size • It is a good choice because the integration for the alternative hypothesis can be done numerically: BF10 = [ ∫0→∞ (1 + N g)^(−1/2) (1 + t² / ((1 + N g) v))^(−(v+1)/2) (2π)^(−1/2) g^(−3/2) e^(−1/(2g)) dg ] / (1 + t²/v)^(−(v+1)/2) • t is the t-value you use in a hypothesis test (from the data) • v is the “degrees of freedom” (from the data) • This might not look easy, but it is simple to calculate with a computer

  43. Variations of JZS Priors • Scale parameter “r” • Bigger values make for a broader prior • More flexibility! • More penalty!

  44. Variations of JZS Priors • Medium r = sqrt(2)/2 • Wide r = 1 • Ultrawide r = sqrt(2)

  45. How do we use it? • Super easy • Rouder’s web site: • http://pcl.missouri.edu/bayesfactor • In R • library(BayesFactor)

  46. How do we use it?

  47. How do we use it? • library(BayesFactor) • ttest.tstat(t=2.2, n1=15, n2=15, simple=TRUE) • Output: B10 = 1.993006

  48. What does it mean? • Guidelines

  BF        Evidence
  1 – 3     Anecdotal
  3 – 10    Substantial
  10 – 30   Strong
  30 – 100  Very strong
  > 100     Decisive
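The guideline table can be wrapped in a small helper; this mapping is just the rule of thumb above, not part of any package:

```python
def bf_evidence_label(bf):
    """Map a Bayes factor to the rule-of-thumb evidence labels.
    Values below 1 are inverted first (evidence for the other model)."""
    if bf < 1:
        bf = 1.0 / bf
    for cutoff, label in [(3, "Anecdotal"), (10, "Substantial"),
                          (30, "Strong"), (100, "Very strong")]:
        if bf < cutoff:
            return label
    return "Decisive"

print(bf_evidence_label(1.993))   # Anecdotal
print(bf_evidence_label(42.0))    # Very strong
```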

  49. Conclusions • JZS Bayes Factors • Easy to calculate • Pretty easy to understand results • A bit arbitrary for setting up • Why not other priors? • How to pick scale factor? • Criteria for interpretation are arbitrary • Fairly painless introduction to Bayesian methods
