Computing the Marginal Likelihood


Presentation Transcript


  1. Computing the Marginal Likelihood David Madigan

  2. Bayesian Criterion • The marginal likelihood p(D|M) is typically impossible to compute analytically • All sorts of Monte Carlo approximations are used in practice
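For reference, a standard statement of the quantity involved (a reconstruction in the usual notation, with θ the model parameters, not necessarily the slide's own display):

```latex
% Marginal (integrated) likelihood of model M, and the Bayes factor
\[
  p(D \mid M) = \int p(D \mid \theta, M)\, p(\theta \mid M)\, d\theta,
  \qquad
  B_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)}
\]
```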

  3. Laplace Method for p(D|M) Write the integrand as exp{n h(θ)}, where h(θ) is the log of the integrand divided by n. Laplace's Method: expand h about its mode and integrate the resulting Gaussian.
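A sketch of the approximation in standard notation (a reconstruction, with d = dim(θ), θ̃ the posterior mode, and Σ̃ the inverse of the negative Hessian of the log-posterior at θ̃):

```latex
% Write the integrand of the marginal likelihood as exp{n h(theta)}
\[
  p(D \mid M) = \int e^{\,n h(\theta)}\, d\theta,
  \qquad
  h(\theta) = \tfrac{1}{n}\log\{\, p(D \mid \theta, M)\, p(\theta \mid M) \,\}
\]
% Expanding h about its mode and integrating the Gaussian gives
\[
  p(D \mid M) \approx (2\pi)^{d/2}\, |\tilde{\Sigma}|^{1/2}\,
  p(D \mid \tilde{\theta}, M)\, p(\tilde{\theta} \mid M)
\]
```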

  4. Laplace cont. • Tierney & Kadane (1986, JASA) show the approximation is O(n^−1) • Using the MLE instead of the posterior mode is also O(n^−1) • Using the expected information matrix in Σ̃ is O(n^−1/2) but convenient since it is often computed by standard software • Raftery (1993) suggested approximating by taking a single Newton step starting at the MLE • Note the prior is explicit in these approximations

  5. Monte Carlo Estimates of p(D|M) Draw iid θ1, …, θm from the prior p(θ): p̂(D|M) = (1/m) Σi p(D|θi). In practice this has large variance.
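A minimal runnable sketch of this estimator, assuming a user-supplied log-likelihood and prior draws (the function names are illustrative, not from the slides); working on the log scale with log-sum-exp avoids underflow:

```python
import numpy as np

def naive_mc_log_marglik(log_lik, prior_draws):
    """Naive Monte Carlo estimate of log p(D|M).

    log_lik:     function theta -> log p(D | theta)
    prior_draws: iid draws theta_1, ..., theta_m from the prior p(theta)

    Estimator: p-hat(D|M) = (1/m) * sum_i p(D | theta_i).
    """
    ll = np.array([log_lik(t) for t in prior_draws])
    # log of the average likelihood, via log-sum-exp for stability
    return np.logaddexp.reduce(ll) - np.log(len(ll))
```

The large variance arises because prior draws rarely land where the likelihood is appreciable, so a few draws dominate the average.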

  6. Monte Carlo Estimates of p(D|M) (cont.) Draw iid θ1, …, θm from p(θ|D): "Importance Sampling"
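A sketch of the general importance-sampling estimator this slide alludes to, assuming draws from an importance density g (illustrative names); the choice g = p(θ|D) leads to the harmonic mean estimator of the next slide:

```python
import numpy as np

def is_log_marglik(log_lik, log_prior, log_g, g_draws):
    """Importance-sampling estimate of log p(D|M).

    g_draws: iid draws theta_1, ..., theta_m from an importance density g
    Weights: w_i = p(theta_i) / g(theta_i)
    Estimator: p-hat(D|M) = sum_i w_i p(D|theta_i) / sum_i w_i
    """
    ll = np.array([log_lik(t) for t in g_draws])
    lw = np.array([log_prior(t) - log_g(t) for t in g_draws])
    return np.logaddexp.reduce(ll + lw) - np.logaddexp.reduce(lw)
```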

  7. Monte Carlo Estimates of p(D|M) (cont.) Newton and Raftery's "Harmonic Mean Estimator": p̂(D|M) = [(1/m) Σi 1/p(D|θi)]^−1, with θ1, …, θm drawn from p(θ|D). Unstable in practice and needs modification.
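A sketch (illustrative names); the reciprocal likelihood can have infinite posterior variance, which is the instability noted above:

```python
import numpy as np

def harmonic_mean_log_marglik(log_lik, posterior_draws):
    """Newton-Raftery harmonic mean estimate of log p(D|M).

    posterior_draws: draws theta_1, ..., theta_m from p(theta|D),
                     e.g. MCMC output.
    Estimator: p-hat(D|M) = [ (1/m) * sum_i 1/p(D|theta_i) ]^{-1}
    """
    ll = np.array([log_lik(t) for t in posterior_draws])
    # log of the mean of exp(-ll), then invert (negate on the log scale)
    return -(np.logaddexp.reduce(-ll) - np.log(len(ll)))
```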

  8. p(D|M) from Gibbs Sampler Output First note the following identity (for any θ*): p(D) = p(D|θ*) p(θ*) / p(θ*|D). p(D|θ*) and p(θ*) are usually easy to evaluate. What about p(θ*|D)? Suppose we decompose θ into (θ1, θ2) such that p(θ1|D, θ2) and p(θ2|D, θ1) are available in closed form… Chib (1995)
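The identity and the two-block factorization, written out (standard forms from Chib 1995; a reconstruction, since the slide's displays were images):

```latex
% Basic marginal likelihood identity: holds for ANY theta*
\[
  \log p(D) = \log p(D \mid \theta^*) + \log p(\theta^*)
              - \log p(\theta^* \mid D)
\]
% With theta = (theta_1, theta_2), the awkward factor decomposes as
\[
  p(\theta^* \mid D) = p(\theta_1^* \mid D)\,
                       p(\theta_2^* \mid D, \theta_1^*)
\]
```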

  9. p(D|M) from Gibbs Sampler Output The Gibbs sampler gives (dependent) draws from p(θ1, θ2|D) and hence marginally from p(θ2|D)… "Rao-Blackwellization"
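The Rao-Blackwellized estimate the slide refers to, in standard notation (θ2^(g) denotes the g-th Gibbs draw; a reconstruction):

```latex
\[
  p(\theta_1^* \mid D)
  = \int p(\theta_1^* \mid D, \theta_2)\, p(\theta_2 \mid D)\, d\theta_2
  \;\approx\; \frac{1}{G} \sum_{g=1}^{G} p(\theta_1^* \mid D, \theta_2^{(g)})
\]
```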

  10. What about 3 parameter blocks? p(θ1*|D) is OK as before, and p(θ3*|D, θ1*, θ2*) is OK (available in closed form); what about p(θ2*|D, θ1*)? To get the draws needed to estimate it, continue the Gibbs sampler, sampling in turn from p(θ2|D, θ1*, θ3) and p(θ3|D, θ1*, θ2).
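The three-block decomposition behind this slide (the standard extension in Chib 1995; a reconstruction):

```latex
\[
  p(\theta^* \mid D)
  = p(\theta_1^* \mid D)\;
    p(\theta_2^* \mid D, \theta_1^*)\;
    p(\theta_3^* \mid D, \theta_1^*, \theta_2^*)
\]
% First factor:  Rao-Blackwellize over the main Gibbs run.
% Third factor:  available in closed form.
% Middle factor: estimated from the reduced run described above,
%                with theta_1 held fixed at theta_1*.
```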

  11. p(D|M) from Metropolis Output Chib and Jeliazkov, JASA, 2001

  12. p(D|M) from Metropolis Output E1 with respect to [θ|y]; E2 with respect to q(θ*, θ)
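The identity those two expectations belong to (the standard statement from Chib and Jeliazkov 2001, with q(θ, θ′) the proposal density and α(θ, θ′) the Metropolis-Hastings acceptance probability; a reconstruction):

```latex
\[
  p(\theta^* \mid y)
  = \frac{E_1\left[ \alpha(\theta, \theta^*)\, q(\theta, \theta^*) \right]}
         {E_2\left[ \alpha(\theta^*, \theta) \right]}
\]
% E_1: expectation over the posterior [theta | y]
% E_2: expectation over the proposal q(theta*, theta)
% Plugging p(theta*|y) into the basic identity of slide 8 gives p(D|M).
```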

  13. A Critique of the Bayesian Information Criterion for Model Selection. Weakliem, David L., Sociological Methods & Research, Feb. 1999, Vol. 27, Issue 3, pp. 359–397. Bayesian Information Criterion (SL is the negative log-likelihood) • BIC is an O(1) approximation to log p(D|M) • Circumvents an explicit prior • The approximation is O(n^−1/2) for a class of priors called "unit information priors" • No free lunch (Weakliem's example)
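The criterion written out in the slide's notation (SL the negative log-likelihood at the MLE θ̂, d the number of free parameters; standard form, reconstructed):

```latex
\[
  \mathrm{BIC} = 2\, SL(\hat{\theta}) + d \log n,
  \qquad
  \log p(D \mid M) = -\tfrac{1}{2}\,\mathrm{BIC} + O(1)
\]
```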

  14. Deviance Information Criterion • Deviance is a standard measure of model fit • Can summarize it in two ways (written out below): at the posterior mean or mode (1), or by averaging over the posterior (2) • (2) will be bigger (i.e., worse) than (1)
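The definition and the two summaries, written out (standard DIC notation; a reconstruction):

```latex
% Deviance (up to a standardizing constant that cancels in comparisons)
\[
  D(\theta) = -2 \log p(D \mid \theta)
\]
% (1) plug-in summary at the posterior mean (or mode)
\[
  D(\bar{\theta}), \qquad \bar{\theta} = E[\theta \mid D]
\]
% (2) posterior mean deviance
\[
  \bar{D} = E[\, D(\theta) \mid D \,]
\]
```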

  15. Deviance Information Criterion • pD = D̄ − D(θ̄) is a measure of model complexity • In the normal linear model pD equals the number of parameters • More generally pD equals the number of unconstrained parameters • DIC = D(θ̄) + 2pD = D̄ + pD • Approximately equal to AIC when pD approaches the actual number of parameters
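A compact sketch of computing pD and DIC from MCMC output, assuming a user-supplied log-likelihood and an array of posterior draws (illustrative names):

```python
import numpy as np

def dic(log_lik, posterior_draws):
    """Deviance Information Criterion from posterior draws.

    log_lik:         function theta -> log p(D | theta)
    posterior_draws: array of shape (m, d), draws from p(theta | D)

    Returns (DIC, pD) where
      Dbar = posterior mean deviance           (summary (2))
      Dhat = deviance at the posterior mean    (summary (1))
      pD   = Dbar - Dhat
      DIC  = Dhat + 2 * pD  =  Dbar + pD
    """
    dev = np.array([-2.0 * log_lik(t) for t in posterior_draws])
    dbar = dev.mean()
    dhat = -2.0 * log_lik(posterior_draws.mean(axis=0))
    p_d = dbar - dhat
    return dhat + 2.0 * p_d, p_d
```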

  16. Score Functions on Hold-Out Data • Instead of penalizing complexity, look at performance on hold-out data • Note: even using hold-out data, performance results can be optimistically biased • Pseudo-hold-out score (written out below) • Recall: the sequential decomposition of the marginal likelihood (also below)
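The two scores written out; the "Recall" line is assumed to be the standard sequential decomposition of the log marginal likelihood, which makes the parallel explicit:

```latex
% Pseudo-hold-out (leave-one-out) log score
\[
  \sum_{i=1}^{n} \log p(y_i \mid y_{(-i)}, M)
\]
% Recall: the log marginal likelihood is a sum of sequential
% out-of-sample predictive scores
\[
  \log p(D \mid M) = \sum_{i=1}^{n} \log p(y_i \mid y_1, \ldots, y_{i-1}, M)
\]
```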

  17. Checks Based on Individual Observations

  18. Test Data? Score the ith test observation against the training data: p(yi | D_train). Repeating this over splits of the data is "cross-validation". See the BUGS Manual.
