Lecture 2. Bayesian Statistics and Inference

Lecture Contents
- What is Bayesian inference
- Prior distributions
- Examples of conjugate Bayesian analysis
- Credible intervals
- Bayes factors
- Bayesian linear regression

Bayes Theorem

Bayesian statistics is named after Rev. Thomas Bayes (1702-1761).
Let A be the event that the patient is truly positive, and A' the event that they are truly negative.
Let B be the event that they test positive.
So from Bayes Theorem,
$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B \mid A)\,P(A) + P(B \mid A')\,P(A')}.$$
Thus over 95% of those testing positive will, in fact, not have HIV.
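The calculation for a diagnostic test can be sketched in a few lines of Python. The numbers below (prevalence 1 in 1000, sensitivity 95%, specificity 98%) are illustrative assumptions, chosen only because they are consistent with the conclusion above; the function name `posterior_positive` is likewise hypothetical.

```python
def posterior_positive(prevalence, sensitivity, specificity):
    """P(A | B): probability of being truly positive given a positive test."""
    true_pos = sensitivity * prevalence                  # P(B | A) P(A)
    false_pos = (1 - specificity) * (1 - prevalence)     # P(B | A') P(A')
    return true_pos / (true_pos + false_pos)

# Illustrative assumptions, not values taken from the lecture
p = posterior_positive(prevalence=0.001, sensitivity=0.95, specificity=0.98)
print(f"P(truly positive | test positive) = {p:.3f}")
print(f"P(truly negative | test positive) = {1 - p:.3f}")
```

With these assumed inputs, fewer than 5% of those testing positive are truly positive, because the low prevalence means false positives outnumber true positives.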
In Bayesian inference there is a fundamental distinction between observable quantities x (the data) and unknown quantities θ.
θ can be statistical parameters, missing data, latent variables…
In the Bayesian framework we make probability statements about model parameters
In the frequentist framework, parameters are fixed non-random quantities and the probability statements concern the data.
As with all statistical analyses we start by positing a model which specifies p(x| θ)
This is the likelihood which relates all variables into a ‘full probability model’
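However, from a Bayesian point of view, θ is unknown and so should have a probability distribution reflecting our uncertainty about it before seeing the data. We therefore specify a prior distribution p(θ).
Note this is like the prevalence in the example.
Also, x is known and so should be conditioned on. Here we use Bayes theorem to obtain the conditional distribution of the unobserved quantities given the data, which is known as the posterior distribution:
$$p(\theta \mid x) = \frac{p(\theta)\,p(x \mid \theta)}{\int p(\theta)\,p(x \mid \theta)\,d\theta} \propto p(\theta)\,p(x \mid \theta).$$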
The prior distribution expresses our uncertainty about θ before seeing the data.
The posterior distribution expresses our uncertainty about θ after seeing the data.
Known variance, unknown mean
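It is easier to consider first a model with 1 unknown parameter. Suppose we have a sample of Normal data:
$$x_i \sim N(\mu, \sigma^2), \quad i = 1, \dots, n.$$
Let us assume we know the variance, $\sigma^2$, and we assume a prior distribution for the mean, $\mu$, based on our prior beliefs:
$$\mu \sim N(\mu_0, \sigma_0^2).$$
Now we wish to construct the posterior distribution $p(\mu \mid x)$.
So we have
$$p(\mu \mid x) \propto p(\mu)\,p(x \mid \mu) \propto \exp\left(-\frac{(\mu - \mu_0)^2}{2\sigma_0^2}\right) \prod_{i=1}^{n} \exp\left(-\frac{(x_i - \mu)^2}{2\sigma^2}\right) \propto \exp\left(-\frac{1}{2}\mu^2\left(\frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}\right) + \mu\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_i x_i}{\sigma^2}\right)\right).$$
For a Normal distribution with response y with mean m and variance v we have
$$f(y) \propto \exp\left(-\frac{1}{2}\,\frac{y^2}{v} + y\,\frac{m}{v}\right).$$
We can equate this to our posterior as follows:
$$\frac{1}{v} = \frac{1}{\sigma_0^2} + \frac{n}{\sigma^2}, \qquad m = v\left(\frac{\mu_0}{\sigma_0^2} + \frac{\sum_i x_i}{\sigma^2}\right).$$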
In other words the posterior precision = sum of prior precision and data precision, and the posterior mean is a (precision weighted) average of the prior mean and data mean.
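So, as n grows, the posterior variance $(1/\sigma_0^2 + n/\sigma^2)^{-1}$ (where $\sigma_0^2$ is the prior variance) tends to $\sigma^2/n$, and the posterior mean tends to the sample mean $\bar{x}$.
And so the posterior distribution is approximately $p(\mu \mid x) \approx N(\bar{x}, \sigma^2/n)$,
compared to $p(\bar{x} \mid \mu) = N(\mu, \sigma^2/n)$ in the frequentist setting.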
We will assume the variance is known to be 50.
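Two individuals gave the following prior distributions for the mean height: individual 1 gave μ ~ N(165, 4) and individual 2 gave μ ~ N(170, 9), where the second parameter is the variance.
Note that prior 2 is not as close to the data as prior 1, and hence its posterior lies somewhere between the prior and the likelihood.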
‘A Bayesian is one who, vaguely expecting a horse and catching a glimpse of a donkey, strongly concludes he has seen a mule’ (Senn)
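The precision-weighted update can be checked numerically. The sketch below is a minimal illustration rather than the lecture's own code: the priors N(165, 4) and N(170, 9) and the known variance 50 come from the text, while the sample size n = 10 and sample mean 165.52 are assumptions back-derived from the posteriors N(165.23, 2.222) and N(167.12, 3.214) quoted in the text. The function name `normal_posterior` is hypothetical.

```python
def normal_posterior(prior_mean, prior_var, xbar, n, data_var):
    """Posterior mean and variance of a Normal mean when the variance is known."""
    prior_prec = 1.0 / prior_var          # prior precision
    data_prec = n / data_var              # data precision
    post_var = 1.0 / (prior_prec + data_prec)
    # Precision-weighted average of the prior mean and the data mean
    post_mean = post_var * (prior_prec * prior_mean + data_prec * xbar)
    return post_mean, post_var

SIGMA2 = 50.0            # known variance, from the text
N, XBAR = 10, 165.52     # assumptions back-derived from the quoted posteriors

priors = {"prior 1": (165.0, 4.0), "prior 2": (170.0, 9.0)}
posteriors = {
    label: normal_posterior(m0, v0, XBAR, N, SIGMA2)
    for label, (m0, v0) in priors.items()
}
for label, (mean, var) in posteriors.items():
    print(f"{label}: posterior N({mean:.2f}, {var:.3f})")
```

Prior 2 has a smaller precision (1/9) than prior 1 (1/4), so the data pull its posterior mean further from the prior mean.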
Under prior 1 the posterior is N(165.23, 2.222), and a 95% credible interval for μ is 165.23 ± 1.96 × sqrt(2.222) = (162.31, 168.15).
Similarly, prior 2 results in a 95% credible interval for μ of (163.61, 170.63).
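These intervals are the central 95% quantiles of the Normal posteriors. A minimal sketch using Python's standard library follows; the helper name `credible_interval` is hypothetical, and the posterior means and variances are the rounded values quoted in the text.

```python
from statistics import NormalDist

def credible_interval(post_mean, post_var, level=0.95):
    """Central credible interval of a Normal posterior with the given variance."""
    post = NormalDist(mu=post_mean, sigma=post_var ** 0.5)
    tail = (1 - level) / 2
    return post.inv_cdf(tail), post.inv_cdf(1 - tail)

ci1 = credible_interval(165.23, 2.222)   # posterior under prior 1
ci2 = credible_interval(167.12, 3.214)   # posterior under prior 2
print(f"prior 1: ({ci1[0]:.2f}, {ci1[1]:.2f})")
print(f"prior 2: ({ci2[0]:.2f}, {ci2[1]:.2f})")
```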
Note that credible intervals can be interpreted in the more natural way that there is a probability of 0.95 that the interval contains μ rather than the frequentist conclusion that 95% of such intervals contain μ.
Another big issue in statistical modelling is the ability to test hypotheses and, more generally, to compare models.
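The Bayesian approach is in some ways more straightforward. For an unknown parameter θ and hypotheses H0 and H1
we simply calculate the posterior probabilities $p_0 = P(H_0 \mid x)$ and $p_1 = P(H_1 \mid x)$
and decide between H0 and H1 accordingly.
We also require the prior probabilities $\pi_0 = P(H_0)$ and $\pi_1 = P(H_1)$ to achieve this.
The Bayes factor B in favour of H0 against H1 is the ratio of the posterior odds to the prior odds:
$$B = \frac{p_0 / p_1}{\pi_0 / \pi_1} = \frac{p_0\,\pi_1}{p_1\,\pi_0}.$$
Note that when the hypotheses are simple, B is the likelihood ratio of H0 against H1, i.e. the odds in favour of H0 against H1 that are given by the data. For complex hypotheses, however, B also involves the prior distributions.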
Let us assume that H0 is μ > 165 and hence H1 is μ ≤ 165. Under the N(165, 4) prior we have π0 = π1 = 0.5.
The posterior is N(165.23, 2.222), which gives p0 = 0.561 and p1 = 0.439 and hence a Bayes factor of 0.561/0.439 = 1.278. Here the Bayes factor is close to 1, so the data have not much altered our beliefs about the hypothesis under discussion.
Now under the N(170, 9) prior we have π0 = 0.952 and π1 = 0.048, so there is strong a priori evidence for H0 against H1.
The posterior is N(167.12, 3.214), which gives p0 = 0.881 and p1 = 0.119 and hence a Bayes factor of (0.881 × 0.048)/(0.952 × 0.119) = 0.373. In this case the Bayes factor is smaller than 1, as the data give less evidence for H0 against H1 than the prior distribution does.
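The two Bayes factors can be reproduced from the Normal priors and posteriors quoted in the text. This is a minimal sketch: the function name `bayes_factor` is hypothetical, and the Bayes factor is computed as posterior odds divided by prior odds, which matches the arithmetic shown for the second prior.

```python
from statistics import NormalDist

def bayes_factor(prior, posterior, threshold=165.0):
    """Bayes factor for H0: mu > threshold against H1: mu <= threshold."""
    pi0 = 1 - prior.cdf(threshold)        # prior probability of H0
    p0 = 1 - posterior.cdf(threshold)     # posterior probability of H0
    pi1, p1 = 1 - pi0, 1 - p0
    return (p0 / p1) / (pi0 / pi1), pi0, p0

# NormalDist takes a standard deviation, so variances are square-rooted
b1, pi0_1, p0_1 = bayes_factor(NormalDist(165, 4 ** 0.5),
                               NormalDist(165.23, 2.222 ** 0.5))
b2, pi0_2, p0_2 = bayes_factor(NormalDist(170, 9 ** 0.5),
                               NormalDist(167.12, 3.214 ** 0.5))
print(f"prior 1: pi0={pi0_1:.3f}, p0={p0_1:.3f}, B={b1:.3f}")
print(f"prior 2: pi0={pi0_2:.3f}, p0={p0_2:.3f}, B={b2:.3f}")
```

Because the code works with unrounded probabilities, the resulting Bayes factors may differ from the hand calculations in the third decimal place.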
It should be noted that care needs to be taken when using Bayes factors and non-informative priors.
We have so far restricted ourselves to an example with only 1 unknown parameter which is generally unrealistic.
For example it would be more common to consider a Normal distribution with both mean and variance unknown.
In such a situation interest may focus on the marginal posterior distribution of the mean treating the variance as a nuisance parameter.
The marginal distribution is obtained by integrating the joint posterior distribution over the nuisance parameters. For the Normal example this is
$$p(\mu \mid x) = \int p(\mu, \sigma^2 \mid x)\, d\sigma^2.$$
The need for this integration is one of the reasons why Bayesian statistics has been of less practical use in the past: even for reasonably simple models, Bayesian inference becomes involved.
However the revolution in computer speed and memory size has meant that integrations can be easily approximated by simulation methods as we will describe in the next session.
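As a small illustration of simulation standing in for integration, the sketch below estimates the posterior probability P(μ > 165 | x) under the N(165.23, 2.222) posterior from the earlier example by drawing samples and counting, then compares it with the exact Normal integral. The number of draws and the seed are arbitrary choices, and this is plain Monte Carlo rather than the MCMC methods of the next session.

```python
import random
from statistics import NormalDist

rng = random.Random(42)                      # arbitrary seed, for reproducibility
post_mean, post_sd = 165.23, 2.222 ** 0.5    # posterior under prior 1

# Draw from the posterior and take the proportion of draws above 165
draws = [rng.gauss(post_mean, post_sd) for _ in range(100_000)]
mc_estimate = sum(d > 165 for d in draws) / len(draws)

# Exact value from the Normal distribution function, for comparison
exact = 1 - NormalDist(post_mean, post_sd).cdf(165)
print(f"simulation: {mc_estimate:.3f}, exact: {exact:.3f}")
```

The same counting approach works when no closed-form integral is available, provided we can generate draws from the posterior.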
We will now briefly describe a Bayesian linear regression model before going on to a lab that allows you to try simulation approaches to solve the simple models in these lectures.
In our whistle-stop tour of Bayesian statistics we have here skipped over many standard multiple parameter models. We will focus on linear regression here for comparison with the frequentist methods.
I will give brief details as it is less important to know how to calculate posterior distributions analytically when we will generally use simulation-based methods later.
Although the intention is not to scare you, the derivations are rather complex.
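Writing the regression as $y \sim N(X\beta, \sigma^2 I)$ with $\beta = (\beta_0, \beta_1)$, we now need priors for the 3 unknown parameters (the intercept, the slope and the variance), which we will consider in more detail in the practical.
For now we will use a convenient non-informative prior based on a uniform distribution on $(\beta, \log \sigma)$.
This results in
$$p(\beta, \sigma^2) \propto \sigma^{-2}.$$
The posterior can be expressed as follows:
$$p(\beta, \sigma^2 \mid y) = p(\beta \mid \sigma^2, y)\, p(\sigma^2 \mid y).$$
We then get
$$\beta \mid \sigma^2, y \sim N(\hat{\beta}, V_\beta \sigma^2), \qquad \sigma^2 \mid y \sim \text{Inv-}\chi^2(n - k, s^2),$$
where $\hat{\beta} = (X^T X)^{-1} X^T y$, $V_\beta = (X^T X)^{-1}$, $s^2 = (y - X\hat{\beta})^T (y - X\hat{\beta})/(n - k)$, X is the n × k design matrix and k = 2 here.
To sample from the posterior distribution given, we firstly calculate the values of $\hat{\beta}$, $V_\beta$ and $s^2$, then draw $\sigma^2$ from its marginal posterior, and finally draw $\beta$ from its conditional posterior given that $\sigma^2$.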
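A minimal sketch of such a sampling scheme for simple linear regression follows. It assumes the standard result for a prior that is uniform on (β, log σ): σ² given y follows a scaled inverse chi-squared distribution with n − 2 degrees of freedom and scale s², and β given σ² and y is Normal with mean equal to the least-squares estimate and covariance (X'X)⁻¹σ². The data are hypothetical toy values, not the heights and weights from the practical, and the function names are illustrative.

```python
import random

def fit_summaries(x, y):
    """Least-squares quantities for y = b0 + b1*x: beta_hat, V_beta, s^2."""
    n = len(x)
    sx, sxx = sum(x), sum(v * v for v in x)
    sy, sxy = sum(y), sum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx                      # determinant of X'X
    v_beta = [[sxx / det, -sx / det],            # (X'X)^{-1}
              [-sx / det, n / det]]
    b0 = v_beta[0][0] * sy + v_beta[0][1] * sxy
    b1 = v_beta[1][0] * sy + v_beta[1][1] * sxy
    rss = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    return (b0, b1), v_beta, rss / (n - 2)

def draw_posterior(summaries, n, rng):
    """One joint draw (b0, b1, sigma2) from the assumed posterior."""
    (b0, b1), v, s2 = summaries
    nu = n - 2
    # sigma^2 | y: scaled Inv-chi^2(nu, s2), i.e. nu * s2 / chi2_nu
    sigma2 = nu * s2 / rng.gammavariate(nu / 2, 2.0)
    # beta | sigma^2, y: N(beta_hat, V_beta * sigma^2) via a 2x2 Cholesky factor
    l11 = (v[0][0] * sigma2) ** 0.5
    l21 = v[1][0] * sigma2 / l11
    l22 = (v[1][1] * sigma2 - l21 ** 2) ** 0.5
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    return b0 + l11 * z1, b1 + l21 * z1 + l22 * z2, sigma2

rng = random.Random(1)                           # arbitrary seed
# Hypothetical toy data with true intercept 2 and slope 0.5
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi + rng.gauss(0, 1) for xi in x]

summaries = fit_summaries(x, y)
draws = [draw_posterior(summaries, len(x), rng) for _ in range(5000)]
slope_mean = sum(d[1] for d in draws) / len(draws)
print(f"least-squares slope: {summaries[0][1]:.3f}")
print(f"posterior mean slope: {slope_mean:.3f}")
```

Keeping only the slope from each joint draw gives a sample from its marginal posterior, so the integration over σ² is done implicitly by the simulation.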
Note that in the practical we will return to the heights example and regress the girls' heights on their weights while trying various informative priors.
In this first practical you will use an MCMC estimation package called WinBUGS to fit the models discussed in the lecture.
This practical is meant to confirm the answers from the lecture notes and also to familiarize you a little with WinBUGS.
We will give more details on WinBUGS in later lectures.