
Priors, Normal Models, Computing Posteriors


Presentation Transcript


  1. Priors, Normal Models, Computing Posteriors. st5219: Bayesian hierarchical modelling, lecture 2.3

  2. Why posteriors need to be computed
  • If you can find a conjugate prior for your data-parameter model, it's very sensible to use that and to coerce your prior into that form
  • E.g. if you were going to use a normal prior but then noticed that the gamma is conjugate in your model, convert your normal prior into a gamma and use that
  • If not, you need to do some computation
  • In the past: this was a big problem that precluded Bayesian analysis outside of the ivory tower
  • Now: personal computers make this no problem lah

  3. What's the challenge?
  Frequentist:
  • For ML estimation you need to find the maximum of a surface or curve
  • Approaches: calculus; deterministic search algorithms using (i) gradients or (ii) heights only; stochastic algorithms, e.g. simulated annealing, cross entropy
  • Optimisation is fairly simple
  Bayesian:
  • For Bayesian estimation you need to find the whole posterior
  • Quadrature is quite difficult

  4. The curse of dimensionality
  Toy problems:
  • Illustrate concepts
  • Simple data, few complications
  • Often just use calculus
  • Usually few parameters
  • :o)
  Real problems:
  • Solving real problems
  • Complex data, always loads of complications
  • Cannot do differentiation
  • Usually lots of parameters
  • :o(
  Lots of parameters prohibit grid searches (see the quick calculation below)
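
  To make the grid-search problem concrete, here is a minimal back-of-the-envelope calculation in R; the resolution of 100 points per parameter and the parameter counts are illustrative choices, not numbers from the slides:

  # hypothetical illustration: size of a grid search with 100 points per parameter
  points_per_dim <- 100
  n_params <- c(1, 2, 5, 10, 20)
  grid_points <- points_per_dim^n_params
  data.frame(n_params, grid_points)
  # with 20 parameters you would need 1e40 grid points, which is why
  # simulation-based methods are used instead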

  5. Simulations • Assume you’re interested in the lengths of elephant tusks • There will be some distribution of these • How do you try to estimate that distribution?

  6. Why not?

  7. Simulations
  • Easy to sample statistical distributions (easier than sampling elephants)
  • Can build up large samples quickly
  • Most popular method is MCMC; others are Monte Carlo and importance sampling
  • If your sample is of size 10 000 or 100 000 then it effectively IS the distribution you're sampling from
  • Sample the posterior, then calculate the stuff you're interested in, like the posterior mean (the mean of the sample), quantiles, etc (a small sketch follows)
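
  As a minimal, self-contained sketch of the idea (the Gamma distribution below is an arbitrary stand-in for a posterior, not one from the slides): draw a large sample and read the posterior mean and quantiles off it.

  set.seed(1)
  # hypothetical "posterior": a Gamma(2, 0.5) distribution, chosen purely for illustration
  draws <- rgamma(100000, shape = 2, rate = 0.5)
  mean(draws)                        # approximates the posterior mean
  quantile(draws, c(0.025, 0.975))   # approximates a central 95% interval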

  8. Monte Carlo
  • Capital of Monaco
  • Casino
  • Roulette wheel gave inspiration to US scientists working on the Manhattan project
  • Read Metropolis (1987): http://library.lanl.gov/cgi-bin/getfile?00326866.pdf

  9. When to use Monte Carlo
  • If your posterior distribution is known, and you can simulate from it, but it's awkward to do anything else
  • Example: you have data (between-sex difference in prey items per hour within songlark nests) that you assume are a random draw from a N(μ, σ²) distribution
  • x̄ = 3.18, s = 4.42, n = 22
  • Prior: (μ, σ²) ~ NSIG(0, 1/100, 1/100, 1/100)

  10. Example Monte Carlo code
  library(geoR); set.seed(666)
  # prior hyperparameters for the normal / scaled-inverse-chi-squared model
  mu_0=0; kappa_0=1; nu_0=1; sigma2_0=100
  ndraws=100000
  # draws from the prior (rnorm takes a standard deviation, hence the sqrt)
  prior_sigma2=rinvchisq(ndraws,nu_0,sigma2_0)
  prior_mu=rnorm(ndraws,mu_0,sqrt(prior_sigma2/kappa_0))
  # data summaries
  xbar=3.18; s=4.42; n=22
  # posterior hyperparameters
  mu_n=(kappa_0/(kappa_0+n))*mu_0 + (n/(kappa_0+n))*xbar
  kappa_n=kappa_0+n
  nu_n=nu_0+n
  sigma2_n=(nu_0*sigma2_0 + (n-1)*s*s + kappa_0*n*((xbar-mu_0)^2)/(kappa_0+n))/nu_n
  # draws from the posterior
  posterior_sigma2=rinvchisq(ndraws,nu_n,sigma2_n)
  posterior_mu=rnorm(ndraws,mu_n,sqrt(posterior_sigma2/kappa_n))
  # q indexes the draws to plot (not defined on the original slide; all draws used here)
  q=1:ndraws
  plot(posterior_mu[q],sqrt(posterior_sigma2[q]),pch=16,col=rgb(0,0,0,0.1),
       xlab=expression(mu),ylab=expression(sigma))

  11. Plot of posterior

  12. Marginal posteriors
  summariser=function(x){print(paste(
    "Posterior mean is ",signif(mean(x),3),
    ", 95%CI is (",signif(quantile(x,0.025),3),
    ",",signif(quantile(x,0.975),3),")",sep=""),quote=FALSE)}
  summariser(posterior_mu)
  [1] Posterior mean is 3.04, 95%CI is (0.776,5.3)
  summariser(posterior_sigma2)
  [1] Posterior mean is 24.7, 95%CI is (13.6,44.5)
  summariser(sqrt(posterior_sigma2))
  [1] Posterior mean is 4.91, 95%CI is (3.69,6.67)
  summariser(sqrt(posterior_sigma2)/posterior_mu)
  [1] Posterior mean is 1.37, 95%CI is (0.913,5.79)

  13. Wondrousness of Bayesianism
  • Once you have a sample from the posterior, the world is your oyster!
  • Want a confidence interval for the variance?
  • Want a confidence interval for the standard deviation?
  • Want a confidence interval for the coefficient of variation?
  • How to do this in the frequentist paradigm? See Koopmans et al (1964) Biometrika 51:25--32. It's not nice

  14. Problem with Monte Carlo • Usually an exact form for the posterior cannot be got • Two alternative simulation-based approaches present themselves: • Importance sampling • Markov chain Monte Carlo

  15. Importance Sampling
  • The idea of importance sampling is that instead of sampling from the actual posterior, you simulate from some other distribution and “correct” the sample by weighting it
  • If
  • you want f(θ|data)
  • you can simulate from and evaluate the density of q(θ)
  • you can evaluate the density of f(θ|data) up to a constant of proportionality
  • then you can use importance sampling.

  16. Importance sampling recipe
  • Generate a sample θ_1, …, θ_m from q(θ)
  • For each sample i, associate a weight w_i ∝ f(θ_i|data)/q(θ_i)
  • Scale the w_i such that Σ w_i = 1
  • The posterior mean of any function g(θ) of interest is E[g(θ)] ≈ Σ w_i g(θ_i)
  • You can get things like confidence intervals by working with the CDF of the marginals
  (A minimal generic sketch of these steps follows.)
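
  As a minimal, self-contained sketch of the recipe on a toy one-dimensional problem (the Gamma target and Exponential proposal below are illustrative choices, not from the slides):

  set.seed(666)
  m <- 100000
  theta <- rexp(m, rate = 0.5)                    # 1. sample from the proposal q
  log_w <- dgamma(theta, 3, 1, log = TRUE) -      # 2. weights proportional to
           dexp(theta, 0.5, log = TRUE)           #    target density / proposal density
  w <- exp(log_w - max(log_w))
  w <- w / sum(w)                                 # 3. scale the weights to sum to 1
  sum(w * theta)      # 4. importance-sampling estimate of E[theta]; true value is 3
  sum(w * theta^2)    #    the same weights work for any function g(theta)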

  17. Example: H1N1
  • Naïve version: simulate (p, σ) from U(0,1)², i.e. q(p, σ) = 1 over [0,1]²
  • Weight by the posterior, scale to sum to one

  18. Code
  set.seed(666)
  # log posterior: Binomial(727, p*sigma) likelihood for 98 successes,
  # Beta(1,1) prior on p and Beta(630,136) prior on sigma
  logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
    dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
  NDRAWS=10000
  # naive proposal: q(p,sigma) uniform on the unit square
  sampled=list(p=runif(NDRAWS),sigma=runif(NDRAWS))
  logposteriors=logpost(sampled$p,sampled$sigma)
  logposteriors=logposteriors-max(logposteriors)   # stabilise before exponentiating
  weights=exp(logposteriors)
  weights=weights/sum(weights)                     # scale weights to sum to one
  plot(sampled$p,sampled$sigma,cex=20*weights,pch=16,
       xlab='p',ylab=expression(sigma),xlim=0:1,ylim=0:1)

  19. Naïve sample

  20. Naïve sample

  21. Better code
  set.seed(666)
  logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
    dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
  NDRAWS=10000
  # better proposal: draw sigma from its Beta(630,136) prior instead of uniformly
  sampled=list(p=runif(NDRAWS),sigma=rbeta(NDRAWS,630,136))
  logposteriors=logpost(sampled$p,sampled$sigma)
  logposteriors=logposteriors-max(logposteriors)
  # weights proportional to posterior density / proposal density
  weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE))
  weights=weights/sum(weights)

  22. Better sample

  23. Better sample

  24. Even better code
  set.seed(666)
  logpost=function(p,sigma){dbinom(98,727,p*sigma,log=TRUE)+
    dbeta(p,1,1,log=TRUE)+dbeta(sigma,630,136,log=TRUE)}
  NDRAWS=10000
  # even better proposal: draw p from a Beta(20,100) roughly matched to the posterior
  # and sigma from its Beta(630,136) prior
  sampled=list(p=rbeta(NDRAWS,20,100),sigma=rbeta(NDRAWS,630,136))
  logposteriors=logpost(sampled$p,sampled$sigma)
  logposteriors=logposteriors-max(logposteriors)
  # weights proportional to posterior density / proposal density
  weights=exp(logposteriors-dbeta(sampled$sigma,630,136,log=TRUE)-
    dbeta(sampled$p,20,100,log=TRUE))
  weights=weights/sum(weights)

  25. Better sample

  26. Better sample

  27. Problems with importance sampling
  • Choice of sampling distribution q: an obvious choice, the prior, leads to a simple weight function (the likelihood), but if the prior is overdispersed relative to the posterior, lots of samples are wasted (see the diagnostic sketch below)
  • An alternative is to do it iteratively: try the prior first, then tune q to be nearer to the area that looks like the posterior, and iterate until a good distribution is found
  • Ideally, sample directly from the posterior: then the weights are all equal to 1/m
  • Dealing with a weighted sample is a bit of a nuisance
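
  A standard way to quantify the "wasted samples" problem (a common diagnostic, not something from the slides) is the effective sample size of the normalised weights; applied, for instance, to the weights produced by the code on slides 18, 21 and 24:

  # effective sample size of normalised importance weights:
  # equals the number of draws when all weights are 1/m, and is much
  # smaller when a few draws dominate
  ess <- function(w) 1 / sum(w^2)
  ess(weights)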

  28. An alternative • Finding the posterior is simpler if the computer will automatically move in the right direction even given a rubbish guess at q • Markov chain Monte Carlo is an ideal way to achieve this • Next section!!!
