
Model Inference and Averaging



  1. Model Inference and Averaging Shanghai Jiao Tong University

  2. Contents • The Bootstrap and Maximum Likelihood Methods • Bayesian Methods • Relationship Between the Bootstrap and Bayesian Inference • The EM Algorithm • MCMC for Sampling from the Posterior • Bagging • Model Averaging and Stacking

  3. Bootstrap by Basis Expansions • Consider a linear expansion μ(x) = Σⱼ βⱼ hⱼ(x), fit to data (xᵢ, yᵢ), i = 1, …, N • The least-squares solution: β̂ = (Hᵀ H)⁻¹ Hᵀ y, where H is the N × p matrix with elements hⱼ(xᵢ) • The covariance of β̂: V̂ar(β̂) = (Hᵀ H)⁻¹ σ̂², with σ̂² estimated from the residuals
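To make the connection to the bootstrap concrete, here is a minimal sketch (my own illustration, not from the slides): it fits a basis expansion by least squares, computes the analytic covariance above, and compares the resulting standard errors with nonparametric bootstrap standard errors obtained by resampling the (xᵢ, yᵢ) pairs. The polynomial basis and the simulated data are assumptions made purely for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 50, 200
x = np.sort(rng.uniform(0.0, 3.0, n))
y = np.sin(2.0 * x) + rng.normal(scale=0.3, size=n)

def basis(x):
    # a simple cubic polynomial basis h_j(x); the slides' h_j could equally be B-splines
    return np.column_stack([np.ones_like(x), x, x**2, x**3])

H = basis(x)
beta_hat, *_ = np.linalg.lstsq(H, y, rcond=None)           # least-squares solution
resid = y - H @ beta_hat
sigma2_hat = resid @ resid / (n - H.shape[1])
cov_beta = sigma2_hat * np.linalg.inv(H.T @ H)              # Var(beta_hat) from the slide

# Nonparametric bootstrap: resample (x_i, y_i) pairs with replacement and refit
boot_betas = np.empty((B, H.shape[1]))
for b in range(B):
    idx = rng.integers(0, n, size=n)
    boot_betas[b], *_ = np.linalg.lstsq(H[idx], y[idx], rcond=None)

print("analytic SE :", np.sqrt(np.diag(cov_beta)))
print("bootstrap SE:", boot_betas.std(axis=0, ddof=1))
```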


  5. Parametric Model • Assume a parameterized probability density (parametric model) for the observations: zᵢ ~ g_θ(z), i = 1, …, N

  6. Maximum Likelihood Inference • Suppose we are trying to measure the true value of some quantity (x_T). • We make repeated measurements of this quantity: {x₁, x₂, …, xₙ}. • The standard way to estimate x_T from our measurements is to calculate the mean value μₓ = (1/n) Σᵢ xᵢ and set x̂_T = μₓ.

  7. Maximum Likelihood Inference • Suppose we are trying to measure the true value of some quantity (x_T). • We make repeated measurements of this quantity: {x₁, x₂, …, xₙ}. • The standard way to estimate x_T from our measurements is to calculate the mean value μₓ = (1/n) Σᵢ xᵢ and set x̂_T = μₓ. DOES THIS PROCEDURE MAKE SENSE? The maximum likelihood method (MLM) answers this question and provides a general method for estimating parameters of interest from data.

  8. The Maximum Likelihood Method • Statement of the Maximum Likelihood Method • Assume we have made n measurements of x: {x₁, x₂, …, xₙ}. • Assume we know the probability distribution function that describes x: f(x, a). • Assume we want to determine the parameter a. • MLM: pick a to maximize the probability of getting the measurements (the xᵢ's) we did!

  9. The MLM Implementation • The probability of measuring x₁ is f(x₁, a) dx • The probability of measuring x₂ is f(x₂, a) dx • … • The probability of measuring xₙ is f(xₙ, a) dx • If the measurements are independent, the probability of getting the measurements we did is the likelihood function: L(a) = f(x₁, a) f(x₂, a) ⋯ f(xₙ, a) = Πᵢ f(xᵢ, a)

  10. Log Maximum Likelihood Method • MLM maximizes the likelihood L(a). • It is often easier to maximize ln L(a) instead: • L and ln L are both maximal at the same location, and • the logarithm converts the product into a summation: ln L(a) = Σᵢ ln f(xᵢ, a).

  11. Log Maximum Likelihood Method • The new maximization condition is ∂ ln L(a)/∂a = 0 at a = a*. • a could be an array of parameters (e.g. slope and intercept) or just a single variable. • The resultant equations may be simple linear equations or coupled non-linear equations.
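When the maximization condition has no closed form, the log-likelihood can be maximized numerically. A minimal sketch (my own, not from the slides), assuming an exponential model f(x, a) = (1/a)·exp(−x/a) purely for illustration; for this model the MLE is the sample mean, which gives an easy check:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
x = rng.exponential(scale=2.0, size=100)      # simulated measurements

def neg_log_likelihood(a):
    # -ln L(a) = -sum_i ln f(x_i, a) for f(x, a) = (1/a) exp(-x/a)
    a = a[0]
    if a <= 0:
        return np.inf
    return -np.sum(-np.log(a) - x / a)

res = minimize(neg_log_likelihood, x0=[1.0], method="Nelder-Mead")
print("numerical MLE of a:", res.x[0])
print("sample mean       :", x.mean())        # analytic MLE for this model
```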

  12. An Example: Gaussian • Let f(x, a) be given by a Gaussian distribution function. • Let a = μ be the mean of the Gaussian. Use the observed data + MLM to find the mean. • We want the best estimate of μ from our set of n measurements {x₁, x₂, …, xₙ}. • Let's assume that σ is the same for each measurement.

  13. An Example: Gaussian • Gaussian PDF: f(x, μ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²)) • The likelihood function for this problem is: L(μ) = Πᵢ (1/(σ√(2π))) exp(−(xᵢ − μ)²/(2σ²))

  14. An Example: Gaussian • We want to find the μ that maximizes the log-likelihood function ln L(μ): setting ∂ ln L/∂μ = Σᵢ (xᵢ − μ)/σ² = 0 gives μ̂ = (1/n) Σᵢ xᵢ, the sample mean.

  15. An Example: Gaussian • If the σᵢ are different for each data point, then μ̂ is just the weighted average: μ̂ = (Σᵢ xᵢ/σᵢ²) / (Σᵢ 1/σᵢ²)
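The full derivation behind slides 13–15 is compact enough to write out; this is the standard calculation (my reconstruction, not copied from the slide images):

```latex
\ln L(\mu) = \sum_{i=1}^{n}\Bigl[-\ln\bigl(\sigma_i\sqrt{2\pi}\bigr)
             - \frac{(x_i-\mu)^2}{2\sigma_i^2}\Bigr],
\qquad
\frac{\partial \ln L}{\partial \mu}
  = \sum_{i=1}^{n}\frac{x_i-\mu}{\sigma_i^2} = 0
\;\Longrightarrow\;
\hat\mu = \frac{\sum_i x_i/\sigma_i^2}{\sum_i 1/\sigma_i^2}.
```

When all σᵢ are equal this reduces to the unweighted sample mean μ̂ = (1/n) Σᵢ xᵢ of slide 14.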

  16. An Example: Poisson • Let f(x, a) be given by a Poisson distribution. • Let a = μ be the mean of the Poisson. • We want the best estimate of μ from our set of n measurements {x₁, x₂, …, xₙ}. • Poisson PDF: f(x, μ) = e^(−μ) μ^x / x!

  17. An Example: Poisson • The likelihood function for this problem is: L(μ) = Πᵢ e^(−μ) μ^(xᵢ) / xᵢ!

  18. An Example: Poisson • Find the μ that maximizes the log-likelihood function: the result is μ̂ = (1/n) Σᵢ xᵢ, the sample average.
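Again, the derivation is short; this reconstruction follows the standard steps rather than the slide images:

```latex
\ln L(\mu) = \sum_{i=1}^{n}\bigl[x_i\ln\mu - \mu - \ln x_i!\bigr],
\qquad
\frac{\partial \ln L}{\partial \mu}
  = \sum_{i=1}^{n}\frac{x_i}{\mu} - n = 0
\;\Longrightarrow\;
\hat\mu = \frac{1}{n}\sum_{i=1}^{n} x_i .
```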

  19. General properties of MLM • For large data samples (large n) the likelihood function, L, approaches a Gaussian distribution. • Maximum likelihood estimates are usually consistent: for large n the estimates converge to the true value of the parameters we wish to determine. • Maximum likelihood estimates are asymptotically unbiased: they may be biased for small samples (the MLE of a Gaussian variance is the classic example), but the bias vanishes as n grows.

  20. General properties of MLM • The maximum likelihood estimate is asymptotically efficient: for large n it attains the smallest possible variance (the Cramér–Rao bound). • The maximum likelihood estimate is sufficient: it uses all the information in the observations (the xᵢ's). • In regular problems the MLM solution is unique, although in general the likelihood can have several local maxima. • Bad news: we must know the correct probability distribution for the problem at hand!

  21. Maximum Likelihood • We maximize the likelihood function L(θ; Z) = Πᵢ g_θ(zᵢ) • Log-likelihood function: ℓ(θ; Z) = Σᵢ ln g_θ(zᵢ)

  22. Score Function • Assess the precision of θ̂ using the likelihood function • Score function: ℓ̇(θ; Z) = Σᵢ ∂ℓ(θ; zᵢ)/∂θ • Assume that L takes its maximum in the interior of the parameter space. Then ℓ̇(θ̂; Z) = 0

  23. Likelihood Function • We maximize the likelihood function L(θ; Z) = Πᵢ g_θ(zᵢ) • We omit the normalization, since it only adds a constant factor • Log-likelihood function: ℓ(θ; Z) = Σᵢ ln g_θ(zᵢ)

  24. Fisher Information • The negative sum of second derivatives is the information matrix: I(θ) = −Σᵢ ∂²ℓ(θ; zᵢ)/∂θ ∂θᵀ • Evaluated at θ = θ̂, I(θ̂) is called the observed information; it should be positive (positive definite). • The Fisher information (expected information) is i(θ) = E[I(θ)]

  25. Sampling Theory • Basic result of sampling theory: the sampling distribution of the maximum-likelihood estimator approaches the normal distribution θ̂ → N(θ₀, i(θ₀)⁻¹) as N → ∞, when we sample independently from g_{θ₀}(z) • This suggests approximating the sampling distribution of θ̂ with N(θ̂, i(θ̂)⁻¹) or N(θ̂, I(θ̂)⁻¹)

  26. Error Bound • The corresponding error estimates (standard errors) are obtained from the diagonal of i(θ̂)⁻¹ or I(θ̂)⁻¹, e.g. se(θ̂ⱼ) = √[I(θ̂)⁻¹]ⱼⱼ • The confidence points have the form θ̂ⱼ ± z^(1−α) · se(θ̂ⱼ), where z^(1−α) is the 1 − α percentile of the normal distribution
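A small numeric sketch of these error bounds (my own example, reusing the Poisson model of slides 16–18): the observed information for the Poisson mean works out to n/μ̂, and the confidence points follow directly.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
x = rng.poisson(lam=3.5, size=200)         # simulated counts

mu_hat = x.mean()                          # MLE of the Poisson mean
observed_info = x.sum() / mu_hat**2        # I(mu_hat) = sum_i x_i / mu_hat^2 = n / mu_hat
se = 1.0 / np.sqrt(observed_info)

alpha = 0.025                              # theta_hat +/- z^(1-alpha)*se covers 1 - 2*alpha = 95%
z = norm.ppf(1 - alpha)                    # the z^(1-alpha) percentile from the slide
print(f"mu_hat = {mu_hat:.3f}, 95% interval: ({mu_hat - z*se:.3f}, {mu_hat + z*se:.3f})")
```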

  27. Simplified form of the Fisher information Suppose, in addition, that the operations of integration and differentiation can be swapped for the second derivative of f(x; θ) as well, i.e., (∂²/∂θ²) ∫ f(x; θ) dx = ∫ (∂²/∂θ²) f(x; θ) dx. In this case, it can be shown that the Fisher information equals I(θ) = −E[∂² ln f(X; θ)/∂θ²]. The Cramér–Rao bound can then be written Var(θ̂) ≥ 1/I(θ).

  28. An Example • Consider a linear expansion μ(x) = Σⱼ βⱼ hⱼ(x) with additive Gaussian errors, y = μ(x) + ε, ε ~ N(0, σ²) • The log-likelihood: ℓ(β, σ²) = −(N/2) ln(2πσ²) − (1/(2σ²)) Σᵢ (yᵢ − h(xᵢ)ᵀ β)² • Estimating equations: ∂ℓ/∂β = 0, i.e. Hᵀ(y − Hβ) = 0

  29. An Example • Consider a linear expansion μ(x) = Σⱼ βⱼ hⱼ(x) • The least-squares solution: β̂ = (Hᵀ H)⁻¹ Hᵀ y • The covariance of β̂: V̂ar(β̂) = (Hᵀ H)⁻¹ σ̂²

  30. An Example • The confidence region for μ(x) follows from V̂ar(β̂), as sketched below
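The usual pointwise band (ESL §8.2.1) that the slide's missing formula presumably shows; the notation follows the least-squares setup above:

```latex
\widehat{\mu}(x) = h(x)^{\top}\hat\beta,
\qquad
\widehat{\operatorname{se}}\bigl[\widehat{\mu}(x)\bigr]
  = \bigl[h(x)^{\top}(\mathbf{H}^{\top}\mathbf{H})^{-1}h(x)\bigr]^{1/2}\,\hat\sigma,
\qquad
\widehat{\mu}(x) \pm z^{(1-\alpha)}\,\widehat{\operatorname{se}}\bigl[\widehat{\mu}(x)\bigr].
```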

  31. Bayesian Methods • Given a sampling model Pr(Z | θ) and a prior Pr(θ) for the parameters, estimate the posterior probability Pr(θ | Z) = Pr(Z | θ) Pr(θ) / ∫ Pr(Z | θ) Pr(θ) dθ • by drawing samples from it or by estimating its mean or parameters • Differences from the frequentist (counting-based) approach: • Prior: allows for uncertainties present before seeing the data • Posterior: allows for uncertainties present after seeing the data

  32. Bayesian Methods • The posterior distribution also affords a predictive distribution for future values (compare the two expressions below) • In contrast, the maximum-likelihood approach would predict future data on the basis of the plug-in density Pr(z_new | θ̂), not accounting for the uncertainty in the parameters
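Written out (standard formulas; the symbols z_new, Z, θ follow the slides' usage):

```latex
\Pr\bigl(z^{\mathrm{new}}\mid \mathbf{Z}\bigr)
  = \int \Pr\bigl(z^{\mathrm{new}}\mid\theta\bigr)\,\Pr\bigl(\theta\mid \mathbf{Z}\bigr)\,d\theta
\qquad\text{versus}\qquad
\Pr\bigl(z^{\mathrm{new}}\mid \hat\theta_{\mathrm{ML}}\bigr).
```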

  33. An Example • Consider again the linear expansion μ(x) = Σⱼ βⱼ hⱼ(x) • The least-squares solution β̂ = (Hᵀ H)⁻¹ Hᵀ y • Now place a Gaussian prior on the coefficients, β ~ N(0, τΣ)

  34. The posterior distribution for β is then also Gaussian, with mean E(β | Z) = (Hᵀ H + (σ²/τ) Σ⁻¹)⁻¹ Hᵀ y and covariance cov(β | Z) = (Hᵀ H + (σ²/τ) Σ⁻¹)⁻¹ σ² • The corresponding posterior values for μ(x): E(μ(x) | Z) = h(x)ᵀ E(β | Z)
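A minimal sketch of this posterior computation (my own illustration; it assumes the simplest case Σ = I and a known noise variance σ², neither of which is stated on the slide):

```python
import numpy as np

def posterior_beta(H, y, sigma2, tau):
    """Posterior mean and covariance of beta under the prior beta ~ N(0, tau * I)."""
    p = H.shape[1]
    A_inv = np.linalg.inv(H.T @ H + (sigma2 / tau) * np.eye(p))
    mean = A_inv @ H.T @ y        # E[beta | Z]  (a ridge-style shrunken estimate)
    cov = sigma2 * A_inv          # cov[beta | Z]
    return mean, cov
```

As τ → ∞ the prior becomes noninformative and the posterior mean tends to the least-squares solution β̂, which is the limit the next slides rely on.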


  36. Bootstrap vs Bayesian • The bootstrap mean is an approximate posterior average • Simple example: • A single observation z drawn from a normal distribution, z ~ N(θ, 1) • Assume a normal prior for θ: θ ~ N(0, τ) • Resulting posterior distribution: θ | z ~ N(z/(1 + 1/τ), 1/(1 + 1/τ))
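The limiting case that connects this to the bootstrap (a standard computation; the τ → ∞ step is what slide 37 refers to as the noninformative prior):

```latex
\theta\mid z \;\sim\; N\!\Bigl(\tfrac{z}{1+1/\tau},\; \tfrac{1}{1+1/\tau}\Bigr)
\;\xrightarrow{\;\tau\to\infty\;}\; N(z,\,1),
```

which coincides with the parametric-bootstrap distribution z* ~ N(z, 1) generated from the maximum-likelihood fit θ̂ = z.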

  37. Bootstrap vs Bayesian • Three ingredients make this work: • The choice of a noninformative prior for θ • The dependence of the log-likelihood ℓ(θ; Z) on the data Z only through the maximum-likelihood estimate θ̂; thus we can write the log-likelihood as ℓ(θ; θ̂) • The symmetry of the log-likelihood in θ and θ̂, i.e. ℓ(θ; θ̂) = ℓ(θ̂; θ) + constant

  38. Bootstrap vs Bayesian • The bootstrap distribution represents an (approximate) nonparametric, noninformative posterior distribution for our parameter. • But this bootstrap distribution is obtained painlessly without having to formally specify a prior and without having to sample from the posterior distribution. • Hence we might think of the bootstrap distribution as a “poor man's” Bayes posterior. By perturbing the data, the bootstrap approximates the Bayesian effect of perturbing the parameters, and is typically much simpler to carry out.

  39. Contents • The Bootstrap and Maximum Likelihood Methods • Bayesian Methods • Relationship Between the Bootstrap and Bayesian Inference • The EM Algorithm • MCMC for Sampling from the Posterior • Bagging • Model Averaging and Stacking

  40. The EM Algorithm • Gaussian Mixture Model: Y = (1 − Δ)·Y₁ + Δ·Y₂, with Y₁ ~ N(μ₁, σ₁²), Y₂ ~ N(μ₂, σ₂²), and a latent binary indicator Δ ∈ {0, 1}, Pr(Δ = 1) = π • EM algorithm for the 2-component Gaussian mixture • Given the log-likelihood ℓ(θ; Z) = Σᵢ ln[(1 − π) φ_{θ₁}(yᵢ) + π φ_{θ₂}(yᵢ)], direct maximization is difficult • Suppose instead we observed the latent binary Δᵢ: the maximization would then split into two easy Gaussian fits

  41. The EM Algorithm • The EM algorithm for two-component Gaussian mixtures: • 1. Take initial guesses for the parameters μ̂₁, σ̂₁², μ̂₂, σ̂₂², π̂ • 2. Expectation Step: compute the responsibilities γ̂ᵢ = π̂ φ_{θ̂₂}(yᵢ) / [(1 − π̂) φ_{θ̂₁}(yᵢ) + π̂ φ_{θ̂₂}(yᵢ)]

  42. The EM Algorithm • 3. Maximization Step: compute the weighted means and variances, μ̂₁ = Σᵢ (1 − γ̂ᵢ) yᵢ / Σᵢ (1 − γ̂ᵢ), σ̂₁² = Σᵢ (1 − γ̂ᵢ)(yᵢ − μ̂₁)² / Σᵢ (1 − γ̂ᵢ), and similarly for μ̂₂, σ̂₂² with weights γ̂ᵢ, and the mixing proportion π̂ = Σᵢ γ̂ᵢ / N • 4. Iterate steps 2 and 3 until convergence
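A compact sketch of this E/M loop (my own code on simulated data; the quartile-based initialization and the fixed iteration count are arbitrary choices, not part of the slides):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
# simulated data from a two-component mixture (component values chosen only for illustration)
y = np.concatenate([rng.normal(1.0, 0.5, 150), rng.normal(4.0, 1.0, 100)])

# 1. initial guesses (quartiles for the means, overall variance for both components)
mu1, mu2 = np.percentile(y, [25, 75])
var1 = var2 = y.var()
pi = 0.5

for _ in range(200):
    # 2. E-step: responsibilities gamma_i = Pr(Delta_i = 1 | y_i, current parameters)
    p1 = (1 - pi) * norm.pdf(y, mu1, np.sqrt(var1))
    p2 = pi * norm.pdf(y, mu2, np.sqrt(var2))
    gamma = p2 / (p1 + p2)

    # 3. M-step: weighted means, variances, and mixing proportion
    mu1 = np.sum((1 - gamma) * y) / np.sum(1 - gamma)
    mu2 = np.sum(gamma * y) / np.sum(gamma)
    var1 = np.sum((1 - gamma) * (y - mu1) ** 2) / np.sum(1 - gamma)
    var2 = np.sum(gamma * (y - mu2) ** 2) / np.sum(gamma)
    pi = gamma.mean()

print(f"mu1={mu1:.2f} var1={var1:.2f}  mu2={mu2:.2f} var2={var2:.2f}  pi={pi:.2f}")
```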

  43. The EM Algorithm in General • Observed (input) data Z, with log-likelihood ℓ(θ; Z) • Latent (missing) data Zᵐ (in our example the Δᵢ) • Complete data T = (Z, Zᵐ), with log-likelihood ℓ₀(θ; T)

  44. The EM Algorithm in General Model Inference and Averaging

  45. The EM Algorithm in General • Taking conditional expectations with respect to the distribution of T | Z governed by parameter θ gives ℓ(θ′; Z) = E[ℓ₀(θ′; T) | Z, θ] − E[ℓ₁(θ′; Zᵐ | Z) | Z, θ] ≡ Q(θ′, θ) − R(θ′, θ)

  46. The EM Algorithm in General In the M step, the EM algorithm maximizes Q(θ′, θ) over θ′, rather than the actual objective function ℓ(θ′; Z). Note that R(θ∗, θ) is the expectation of a log-likelihood of a density (indexed by θ∗), with respect to the same density indexed by θ, and hence (by Jensen's inequality) is maximized as a function of θ∗ when θ∗ = θ. So maximizing Q(θ′, θ) over θ′ never decreases ℓ(θ′; Z): each EM step improves, or at least does not worsen, the observed-data log-likelihood.

  47. The EM Algorithm in General • 1. Start with initial parameters θ̂^(0) • 2. Expectation Step: at the j-th step, compute Q(θ′, θ̂^(j)) = E[ℓ₀(θ′; T) | Z, θ̂^(j)] as a function of the dummy argument θ′ • 3. Maximization Step: determine the new parameters θ̂^(j+1) by maximizing Q(θ′, θ̂^(j)) over θ′ • 4. Iterate steps 2 and 3 until convergence

  48. Contents • The Bootstrap and Maximum Likelihood Methods • Bayesian Methods • Relationship Between the Bootstrap and Bayesian Inference • The EM Algorithm • MCMC for Sampling from the Posterior • Bagging • Model Averaging and Stacking


  50. Local semantics of a Bayesian network: each node is conditionally independent of its nondescendants given its parents. E.g., JohnCalls is independent of Burglary and Earthquake, given the value of Alarm.
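Stated as an equation (standard Bayesian-network notation for the burglary/alarm example; not copied from the slide):

```latex
\Pr(\mathit{JohnCalls}\mid \mathit{Alarm},\,\mathit{Burglary},\,\mathit{Earthquake})
  \;=\; \Pr(\mathit{JohnCalls}\mid \mathit{Alarm}).
```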
