
§❷ An Introduction to Bayesian inference

Presentation Transcript


  1. §❷ An Introduction to Bayesian inference Robert J. Tempelman

  2. Bayes Theorem • Recall the basic axiom of probability: • f(θ, y) = f(y|θ) f(θ) • Also • f(θ, y) = f(θ|y) f(y) • Combine both expressions to get: f(θ|y) = f(y|θ) f(θ) / f(y), i.e., f(θ|y) ∝ f(y|θ) f(θ), or Posterior ∝ Likelihood × Prior
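
A minimal numerical illustration of Posterior ∝ Likelihood × Prior, evaluated on a grid in SAS: the binomial data (y = 10 successes in n = 15 trials) anticipate the example on slide 8, but the Beta(2,2) prior and the dataset names are assumptions made here purely for illustration.

data bayes_grid;
   do theta = 0.005 to 0.995 by 0.005;           /* grid of candidate theta values */
      prior = pdf('BETA', theta, 2, 2);          /* assumed Beta(2,2) prior */
      like  = pdf('BINOMIAL', 10, theta, 15);    /* likelihood of y=10 out of n=15 */
      post_unnorm = like * prior;                /* posterior is proportional to likelihood * prior */
      output;
   end;
run;

proc sql;   /* rescale so the gridded posterior sums to 1 */
   create table bayes_post as
   select theta, post_unnorm / sum(post_unnorm) as posterior
   from bayes_grid;
quit;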

  3. Prior densities/distributions • What can we specify for p(θ)? • Anything that reflects our prior beliefs. • Common choice: “conjugate” prior. • p(θ) is chosen such that the posterior p(θ|y) is recognizable and of the same form as the prior. • “Flat” prior: p(θ) ∝ constant. Then p(θ|y) ∝ f(y|θ). • Flat priors can be dangerous…they can lead to an improper posterior; i.e., ∫ p(θ|y) dθ diverges, so p(θ|y) is not a valid density.

  4. Prior information / Objective? • Introducing prior information may somewhat "bias" the sample information; nevertheless, ignoring existing prior information is inconsistent with • 1) rational human behavior • 2) the nature of the scientific method. • Memory property: a past inference (posterior) can be used as an updated prior in a future inference. • Even so, many applied Bayesian data analysts try to be as “objective” as possible by using diffuse (e.g., flat) priors.

  5. Example of conjugate prior • Recall the binomial distribution: f(y|p) = [n!/(y!(n−y)!)] p^y (1−p)^(n−y), y = 0, 1, …, n • Suppose we express prior belief on p using a beta distribution: p(p) ∝ p^(a−1) (1−p)^(b−1), 0 < p < 1 • Denoted as Beta(a,b)

  6. Examples of different beta densities • [Figure: beta densities for several choices of (a, b)] • Note the diffuse (flat) bounded prior, Beta(1,1): although flat, it is proper since its support (0,1) is bounded!
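
A small sketch of how such curves could be generated and plotted in SAS, using the three priors that appear on the next two slides (Beta(1,1), Beta(9,1), Beta(2,18)); the dataset and variable names are illustrative.

data beta_densities;
   do p = 0.001 to 0.999 by 0.001;
      flat = pdf('BETA', p, 1, 1);      /* Beta(1,1): flat but proper (bounded support) */
      high = pdf('BETA', p, 9, 1);      /* Beta(9,1): prior mass concentrated near 1 */
      low  = pdf('BETA', p, 2, 18);     /* Beta(2,18): prior mass concentrated near 0 */
      output;
   end;
run;

proc sgplot data=beta_densities;
   series x=p y=flat;
   series x=p y=high;
   series x=p y=low;
   yaxis label="density";
run;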

  7. Posterior density of p • Posterior ∝ Likelihood × Prior • p(p|y) ∝ p^y (1−p)^(n−y) × p^(a−1) (1−p)^(b−1) = p^(y+a−1) (1−p)^(n−y+b−1) • i.e. Beta(y+a, n−y+b) • Beta is conjugate to the Binomial

  8. Suppose we observe data: y = 10, n = 15 • Consider three alternative priors: • Beta(1,1) • Beta(9,1) • Beta(2,18) • Each yields a posterior of the form Beta(y+a, n−y+b) • [Figure: posterior densities under the three priors]

  9. Suppose we observed a larger dataset: y = 100, n = 150 • Consider the same alternative priors: • Beta(1,1) • Beta(9,1) • Beta(2,18) • [Figure: posterior densities under the three priors]
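
As a quick worked check of the Beta(y+a, n−y+b) result (these numbers are computed from the slide data; the original figures are not reproduced here), the posterior mean (y+a)/(n+a+b) under each prior is:

• y = 10, n = 15: Beta(1,1) → 11/17 ≈ 0.65; Beta(9,1) → 19/25 = 0.76; Beta(2,18) → 12/35 ≈ 0.34
• y = 100, n = 150: Beta(1,1) → 101/152 ≈ 0.66; Beta(9,1) → 109/160 ≈ 0.68; Beta(2,18) → 102/170 = 0.60

With the larger dataset, all three posterior means move toward the sample proportion y/n ≈ 0.67: the prior matters less as the data accumulate.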

  10. Posterior information • Given log p(θ|y) = log f(y|θ) + log p(θ) + constant, the second-derivative (information) terms add: • Posterior information = likelihood information + prior information. • One option for a point estimate: the joint posterior mode of θ, found using Newton-Raphson. • Also called the MAP (maximum a posteriori) estimate of θ.
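
For a scalar θ, the Newton-Raphson update for the posterior mode (exactly what the SAS program on slide 12 implements, writing L(θ) for the log posterior) is

θ(t+1) = θ(t) − L′(θ(t)) / L″(θ(t)),

and at convergence −1/L″(θ̂) approximates the asymptotic posterior variance, whose square root is the posterior standard error reported on slide 13.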

  11. Recall the plant genetic linkage example • Recall the likelihood: f(y|θ) ∝ (2+θ)^y1 (1−θ)^(y2+y3) θ^y4 • Suppose θ ~ Beta(a,b), i.e., p(θ) ∝ θ^(a−1) (1−θ)^(b−1) • Then p(θ|y) ∝ (2+θ)^y1 (1−θ)^(y2+y3+b−1) θ^(y4+a−1) • Almost as if you increased the number of plants in genotypes 2 and 3 by b−1…and in genotype 4 by a−1.
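
Spelling out the log posterior and its derivatives (these follow directly from the expression above and are exactly the quantities computed in the SAS program on the next slide, where a and b are coded as alpha and beta):

log p(θ|y) = y1·log(2+θ) + (y2+y3+b−1)·log(1−θ) + (y4+a−1)·log(θ) + constant
first derivative:  y1/(2+θ) − (y2+y3+b−1)/(1−θ) + (y4+a−1)/θ
second derivative: −y1/(2+θ)² − (y2+y3+b−1)/(1−θ)² − (y4+a−1)/θ²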

  12. Plant linkage example cont’d. Suppose a = 50, b = 500 (coded as alpha and beta below):

data newton;
   y1 = 1997; y2 = 906; y3 = 904; y4 = 32;
   alpha = 50; beta = 500;
   theta = 0.01;   /* try starting value of 0.50 too */
   do iterate = 1 to 10;
      logpost  = y1*log(2+theta) + (y2+y3+beta-1)*log(1-theta) + (y4+alpha-1)*log(theta);
      firstder = y1/(2+theta) - (y2+y3+beta-1)/(1-theta) + (y4+alpha-1)/theta;
      secndder = (-y1/(2+theta)**2 - (y2+y3+beta-1)/(1-theta)**2 - (y4+alpha-1)/theta**2);
      theta = theta + firstder/(-secndder);
      output;
   end;
   asyvar = 1/(-secndder);   /* asymptotic variance of theta_hat at convergence */
   poststd = sqrt(asyvar);   /* posterior standard error */
   call symputx("poststd", poststd);
   output;
run;

title "Posterior Standard Error = &poststd";
proc print;
   var iterate theta logpost;
run;

  13. Output: Posterior Standard Error = 0.0057929339

  14. Additional elements of Bayesian inference • Suppose that θ can be partitioned into two components, a p×1 vector θ1 and a q×1 vector θ2. • If we want to make probability statements about θ, we use probability calculus: e.g., P(θ ∈ A | y) = ∫A p(θ|y) dθ. • There is NO repeated-sampling concept. • We condition on the one observed dataset. • However, Bayes estimators typically do have very good frequentist properties!

  15. Marginal vs. conditional inference • Suppose you’re primarily interested in θ1. Base inference on the marginal posterior p(θ1|y) = ∫ p(θ1, θ2|y) dθ2 • i.e., average over the uncertainty on θ2 (the nuisance parameters) • Of course, if θ2 were known, you would condition your inference on it accordingly, using p(θ1|θ2, y)

  16. Two-stage model example • Given data y1, y2, …, yn with yi ~ NIID(μ, σ²), where σ² is known. We wish to infer μ. • From Bayes theorem: p(μ|y) ∝ f(y|μ) p(μ) • Suppose the prior is μ ~ N(μ0, σ0²), i.e., p(μ) ∝ exp{−(μ − μ0)²/(2σ0²)}

  17. Simplify likelihood
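
The derivation on this slide was in the original figures; a sketch of the standard simplification, viewing the likelihood as a function of μ with σ² known, is

f(y|μ) = ∏i (2πσ²)^(−1/2) exp{−(yi − μ)²/(2σ²)}
       ∝ exp{−Σi (yi − μ)²/(2σ²)}
       = exp{−[Σi (yi − ȳ)² + n(ȳ − μ)²]/(2σ²)}
       ∝ exp{−n(ȳ − μ)²/(2σ²)},

so, as a function of μ, the likelihood carries the kernel of a normal density centered at the sample mean ȳ with variance σ²/n.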

  18. Posterior density • p(μ|y) ∝ exp{−n(ȳ − μ)²/(2σ²)} × exp{−(μ − μ0)²/(2σ0²)} • Consider the following limit: σ0² → ∞ • Consistent with p(μ) ∝ constant, or a “flat” prior on μ

  19. Interpretation of Posterior Density with Flat Prior • So p(μ|y) ∝ exp{−n(ȳ − μ)²/(2σ²)} • Then μ|y ~ N(ȳ, σ²/n) • i.e., under a flat prior the posterior mean is the sample mean ȳ and the posterior variance is σ²/n

  20. Posterior density with informative prior • Now retain the informative prior μ ~ N(μ0, σ0²). After algebraic simplification, the posterior is again normal; its precision and mean are described on the next slide.

  21. Note that • Posterior precision = prior precision + sample (likelihood) precision • The posterior mean is a weighted average of the data mean and the prior mean, weighted by their respective precisions
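
Written out for the normal-normal model above (a standard result; the symbols μ̂ and v for the posterior mean and variance are mine):

posterior precision: 1/v = 1/σ0² + n/σ²
posterior mean:      μ̂ = [ (n/σ²)·ȳ + (1/σ0²)·μ0 ] / ( n/σ² + 1/σ0² )

As n grows, μ̂ moves toward the data mean ȳ; as σ0² shrinks, μ̂ moves toward the prior mean μ0.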

  22. Hierarchical models • Given data y • Two-stage: likelihood + prior • Three-stage: likelihood + structural prior on the parameters + priors on the hyperparameters • What’s the difference? When do you consider one over the other? (A sketch of the two factorizations follows below.)
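
In posterior terms, a sketch of the distinction (consistent with the three-stage setup spelled out on slide 25; the symbol φ for the hyperparameters is mine): a two-stage model leads to p(θ|y) ∝ f(y|θ) p(θ), whereas a three-stage model leads to p(θ, φ|y) ∝ f(y|θ) p(θ|φ) p(φ), so the prior on θ is itself indexed by unknown hyperparameters φ that receive their own prior.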

  23. Simple hierarchical model • Random effects model: Yij = μ + ai + eij • μ: overall mean; ai ~ NIID(0, τ²); eij ~ NIID(0, σ²) • Suppose we knew μ, σ², and τ²: the conditional mean of each ai involves a shrinkage factor (a standard form is sketched below).
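
One standard form of the conditional mean referenced above (the notation ni for the number of observations on level i, with cell mean ȳi., is my addition; the original formula was in the slide image):

E(ai | y, μ, σ², τ²) = [ τ² / (τ² + σ²/ni) ] × (ȳi. − μ)

The bracketed term is the shrinkage factor: it lies between 0 and 1, approaches 1 as ni (or τ²) grows, and shrinks the raw deviation ȳi. − μ toward zero when level i has little data relative to σ².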

  24. What if we don’t know μ, σ², or τ²? • Option 1: Estimate them (e.g., by method of moments) and then “plug them in.” • Not truly Bayesian. • This is Empirical Bayes (EB) (next section). • Most of us using PROC MIXED/GLIMMIX are doing EB!
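
For context, a minimal PROC MIXED sketch of the random effects model Yij = μ + ai + eij; the dataset and variable names (mydata, y, id) are hypothetical. Its default REML estimates of σ² and τ², plugged into the EBLUPs of the ai, correspond to the “estimate, then plug in” strategy above.

proc mixed data=mydata;        /* hypothetical dataset */
   class id;                   /* grouping factor indexing the random effects a_i */
   model y = / solution;       /* fixed part: overall mean mu (intercept only) */
   random id / solution;       /* a_i ~ NIID(0, tau^2); residuals e_ij ~ NIID(0, sigma^2) */
run;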

  25. A truly Bayesian approach • 1) Yij | θi ~ N(θi, σ²) for all i, j • 2) θ1, θ2, …, θk are iid N(μ, τ²) • Structural prior (exchangeable entities) • 3) μ ~ p(μ); τ² ~ p(τ²); σ² ~ p(σ²) • Subjective priors • Fully Bayesian inference (the section after next!)
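
As a preview of fully Bayesian fitting of this three-stage model in SAS, here is a hedged PROC MCMC sketch. The dataset/variable names (mydata, y, id) and the particular hyperpriors (a vague normal for μ, inverse-gamma priors for the variances) are illustrative assumptions, not choices made in the slides.

proc mcmc data=mydata nmc=50000 seed=1234 outpost=post;
   parms mu 0 tau2 1 sigma2 1;                      /* initial values */
   /* stage 3: subjective priors (illustrative choices) */
   prior mu     ~ normal(0, var=1e6);
   prior tau2   ~ igamma(0.01, scale=0.01);
   prior sigma2 ~ igamma(0.01, scale=0.01);
   /* stage 2: structural prior -- exchangeable theta_i */
   random theta ~ normal(mu, var=tau2) subject=id;
   /* stage 1: likelihood */
   model y ~ normal(theta, var=sigma2);
run;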
