Binomial Distribution & Bayes’ Theorem. Questions. What is a probability? What is the probability of obtaining 2 heads in 4 coin tosses? What is the probability of obtaining 2 or more heads in 4 coin tosses? Give a concrete illustration of p(D|H) and p(H|D). Why might these be different?
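The two coin-toss questions can be answered directly from the binomial formula, P(k) = C(n, k) p^k (1 - p)^(n - k). A minimal sketch in Python (the function name `p_heads` is just illustrative):

```python
from math import comb

def p_heads(k, n, p=0.5):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Probability of exactly 2 heads in 4 fair-coin tosses:
# C(4, 2) * 0.5**4 = 6/16
p_exactly_2 = p_heads(2, 4)                             # 0.375

# Probability of 2 or more heads: sum the cases k = 2, 3, 4
# (6 + 4 + 1) / 16 = 11/16
p_2_or_more = sum(p_heads(k, 4) for k in range(2, 5))   # 0.6875

print(p_exactly_2, p_2_or_more)
```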
Bayesian statistics are about the revision of belief. Bayesian statisticians look into statistically optimal ways of combining new information with old beliefs.
Prior probability – personal belief or data. Input.
Likelihood – likelihood of data given hypothesis.
Posterior probability – probability of hypothesis given data.
Scientists are interested in substantive hypotheses, e.g., does Nicorette help people stop smoking? The p level that comes from the study is the probability of the sample data given the hypothesis, not the probability of the hypothesis given the data. That is, the study gives p(D|H), not p(H|D).
Bayes’ theorem is old and mathematically correct, but its use is controversial. Suppose you have a hunch about the null (H0) and the alternative (H1) that specifies the probability of each before you do a study. The probabilities p(H0) and p(H1) are the priors. The likelihoods are p(y|H0) and p(y|H1); these are what standard p values estimate, the probability of the data given each hypothesis. The posterior is given by:

p(H0|y) = p(y|H0) p(H0) / [ p(y|H0) p(H0) + p(y|H1) p(H1) ]
Suppose before a study is done that the two hypotheses are H0: p = .80 and H1: p = .40 for the proportion of male grad students. Before the study, we figure that the probability is .75 that H0 is true and .25 that H1 is true. We grab 10 grad students at random and find that 6 of 10 are male. The binomial applies.
Bayes’ theorem says we should revise our belief in the probability that H0 is true from .75 to about .70 based on the new data. The change is small here, but it can be quite large depending on the data and the prior.
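The .75-to-.70 revision can be checked by computing both binomial likelihoods and plugging them into Bayes' theorem; a minimal sketch (function and variable names are illustrative):

```python
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Priors from the hunch
p_h0, p_h1 = 0.75, 0.25

# Likelihood of observing 6 males out of 10 under each hypothesis
lik_h0 = binom_pmf(6, 10, 0.80)   # about .088
lik_h1 = binom_pmf(6, 10, 0.40)   # about .111

# Posterior probability of H0 via Bayes' theorem
post_h0 = (lik_h0 * p_h0) / (lik_h0 * p_h0 + lik_h1 * p_h1)
print(round(post_h0, 2))          # about .70
```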
Problems with the choice of prior. Handled by empirical data or by “flat” priors. There are Bayesian applications to more complicated situations (e.g., means and correlations). Not used much in psychology yet except in meta-analysis (empirical Bayes estimates) and judgment studies (the taxicab problem, etc.). Rules for exchangeability (admissible data) need to be worked out.
Give a concrete illustration of p(D|H) and p(H|D). Why might these be different?