
End of Chapter 8

Neil Weisenfeld

March 28, 2005


Outline

  • 8.6 MCMC for Sampling from the Posterior

  • 8.7 Bagging

    • 8.7.1 Examples: Trees with Simulated Data

  • 8.8 Model Averaging and Stacking

  • 8.9 Stochastic Search: Bumping


MCMC for Sampling from the Posterior

  • Markov chain Monte Carlo method

  • Given a Bayesian model, we estimate its parameters by sampling from the posterior distribution

  • Gibbs sampling, a form of MCMC, is closely related to EM, except that we sample from the conditional distributions rather than maximizing over them


Gibbs Sampling

  • Wish to draw a sample from the joint distribution of random variables $U_1, U_2, \ldots, U_K$

  • This may be difficult, but suppose it is easy to simulate from the conditional distributions $\Pr(U_j \mid U_1, \ldots, U_{j-1}, U_{j+1}, \ldots, U_K)$

  • The Gibbs sampler simulates from each of these conditionals in turn

  • The process produces a Markov chain whose stationary distribution is the desired joint distribution


Algorithm 8.3: Gibbs Sampler

  • Take some initial values $U_k^{(0)}$, $k = 1, 2, \ldots, K$

  • For $t = 1, 2, \ldots$:

    • For $k = 1, 2, \ldots, K$ generate $U_k^{(t)}$ from $\Pr\big(U_k^{(t)} \mid U_1^{(t)}, \ldots, U_{k-1}^{(t)}, U_{k+1}^{(t-1)}, \ldots, U_K^{(t-1)}\big)$

  • Continue step 2 until the joint distribution of $(U_1^{(t)}, U_2^{(t)}, \ldots, U_K^{(t)})$ does not change (a code sketch follows)
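
To make Algorithm 8.3 concrete, here is a minimal Python sketch for the case K = 2, sampling a bivariate normal with correlation rho; the target distribution and variable names are illustrative assumptions (not from the slides), chosen because both conditionals are simple univariate normals.

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_iter=5000, burn_in=500, seed=0):
    """Algorithm 8.3 with K = 2: sample (U1, U2) ~ N(0, [[1, rho], [rho, 1]]).

    Each conditional U_k | U_other is N(rho * u_other, 1 - rho**2), so
    step 2 just alternates two univariate normal draws.
    """
    rng = np.random.default_rng(seed)
    u1, u2 = 0.0, 0.0                 # step 1: initial values U_k^(0)
    samples = []
    for t in range(n_iter):           # step 2: cycle through the conditionals
        u1 = rng.normal(rho * u2, np.sqrt(1 - rho ** 2))  # U_1^(t) | U_2^(t-1)
        u2 = rng.normal(rho * u1, np.sqrt(1 - rho ** 2))  # U_2^(t) | U_1^(t)
        if t >= burn_in:              # keep draws once the chain has settled
            samples.append((u1, u2))
    return np.array(samples)

draws = gibbs_bivariate_normal(rho=0.8)
print(np.corrcoef(draws.T))           # empirical correlation should be near 0.8
```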


Gibbs Sampling

  • We only need to be able to sample from the conditional distributions, but if the conditional density $\Pr(u \mid U_k, k \neq j)$ is known in closed form, then averaging it over the Gibbs draws (after a burn-in of $m$ iterations),

    $\widehat{\Pr}_{U_j}(u) = \frac{1}{M-m}\sum_{t=m+1}^{M} \Pr\big(u \mid U_k^{(t)},\, k \neq j\big),$

    is a better estimate of the marginal density than a histogram of the sampled values


Gibbs Sampling for Mixtures

  • Consider the latent indicators $\Delta_i$ from the EM procedure to be additional parameters

  • See Algorithm 8.4 (next slide); it is the same as EM except that we sample from the conditional distributions rather than maximize over them

  • Additional steps can be added to sample the remaining parameters as well when informative priors are specified


Algorithm 8.4: Gibbs sampling for mixtures

  • Take some initial values $\theta^{(0)} = (\mu_1^{(0)}, \mu_2^{(0)})$

  • Repeat for $t = 1, 2, \ldots$:

    • For $i = 1, 2, \ldots, N$ generate $\Delta_i^{(t)} \in \{0, 1\}$ with $\Pr(\Delta_i^{(t)} = 1) = \hat\gamma_i(\theta^{(t)})$, the current responsibility

    • Set $\hat\mu_1$ and $\hat\mu_2$ to the sample means of the observations with $\Delta_i^{(t)} = 0$ and $\Delta_i^{(t)} = 1$ respectively, and generate $\mu_1^{(t)}$ and $\mu_2^{(t)}$ from the corresponding (Gaussian) conditional posteriors

  • Continue step 2 until the joint distribution of $(\Delta^{(t)}, \mu_1^{(t)}, \mu_2^{(t)})$ doesn’t change (a code sketch follows)
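
A minimal Python sketch of Algorithm 8.4 in the simplified setting shown in Figure 8.8 (next slide), with the variances and mixing proportion held fixed so that only the two means and the latent indicators are sampled. The flat prior on the means, the fixed values, and the illustrative data below are assumptions, not taken from the slides.

```python
import numpy as np

def gibbs_mixture(y, pi=0.5, sigma1=1.0, sigma2=1.0, n_iter=2000, seed=0):
    """Gibbs sampling for a two-component Gaussian mixture in which the
    variances and mixing proportion are held fixed; only the means
    mu1, mu2 and the latent indicators Delta_i are sampled."""
    rng = np.random.default_rng(seed)
    mu1, mu2 = np.min(y), np.max(y)            # step 1: crude initial values
    trace = []
    for t in range(n_iter):
        # step 2(a): responsibilities gamma_i, then Delta_i ~ Bernoulli(gamma_i)
        p1 = (1 - pi) * np.exp(-0.5 * ((y - mu1) / sigma1) ** 2) / sigma1
        p2 = pi * np.exp(-0.5 * ((y - mu2) / sigma2) ** 2) / sigma2
        gamma = p2 / (p1 + p2)
        delta = rng.random(y.size) < gamma
        # step 2(b): sample each mean from its conditional posterior
        # (a flat prior on the means is assumed here)
        if (~delta).any():
            n1 = (~delta).sum()
            mu1 = rng.normal(y[~delta].mean(), sigma1 / np.sqrt(n1))
        if delta.any():
            n2 = delta.sum()
            mu2 = rng.normal(y[delta].mean(), sigma2 / np.sqrt(n2))
        trace.append((mu1, mu2))
    return np.array(trace)

# illustrative data: a mixture of N(0, 1) and N(4, 1)
rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0.0, 1.0, 100), rng.normal(4.0, 1.0, 100)])
trace = gibbs_mixture(y)
print(trace[500:].mean(axis=0))    # approximate posterior means of (mu1, mu2)
```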


Figure 8.8: Gibbs Sampling from Mixtures

Simplified case with fixed variances and mixing proportion


Outline

  • 8.6 MCMC for Sampling from the Posterior

  • 8.7 Bagging

    • 8.7.1 Examples: Trees with Simulated Data

  • 8.8 Model Averaging and Stacking

  • 8.9 Stochastic Search: Bumping


8.7 Bagging

  • Use the bootstrap to improve the estimate or prediction itself

  • The bootstrap mean is approximately a posterior average

  • Consider the regression problem: fit a model to training data $Z = \{(x_1, y_1), \ldots, (x_N, y_N)\}$, obtaining the prediction $\hat f(x)$ at input $x$

  • Bagging averages this estimate over bootstrap samples $Z^{*b}$ to produce $\hat f_{\mathrm{bag}}(x) = \frac{1}{B}\sum_{b=1}^{B} \hat f^{*b}(x)$


Bagging, cont’d

  • The point is to reduce the variance of the estimate while leaving its bias unchanged

  • The computed average is a Monte Carlo estimate of the “true” bagging estimate $\mathrm{E}_{\hat{\mathcal{P}}}\hat f^*(x)$, approaching it as $B \to \infty$

  • The bagged estimate differs from the original estimate $\hat f(x)$ only when the latter is an adaptive or nonlinear function of the data (a sketch of the procedure follows)
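
A minimal Python sketch of the generic bagging recipe (an illustration, not code from the slides): fit_fn and predict_fn are hypothetical placeholders for whatever base learner f-hat is being bagged.

```python
import numpy as np

def bagged_predict(X, y, x_new, fit_fn, predict_fn, B=200, seed=0):
    """Monte Carlo bagging: fit the base learner to B bootstrap samples
    and average the B predictions at the new input(s) x_new."""
    rng = np.random.default_rng(seed)
    N = len(y)
    preds = []
    for b in range(B):
        idx = rng.integers(0, N, N)        # bootstrap sample Z*b: N draws with replacement
        model = fit_fn(X[idx], y[idx])     # f-hat*b fit to Z*b
        preds.append(predict_fn(model, x_new))
    return np.mean(preds, axis=0)          # f-hat_bag(x) = (1/B) sum_b f-hat*b(x)

# illustrative base learner: a cubic polynomial fit (any fit/predict pair works)
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 3.0, 50))
y = np.sin(2 * x) + rng.normal(0.0, 0.3, 50)
x_grid = np.linspace(0.0, 3.0, 100)
y_bag = bagged_predict(x, y, x_grid,
                       fit_fn=lambda xb, yb: np.polyfit(xb, yb, deg=3),
                       predict_fn=np.polyval)
```

For a linear smoother such as a B-spline or polynomial fit, the bagged curve stays close to the original; the gain appears for adaptive, nonlinear learners such as trees.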


Bagging B-Spline Example

  • Bagging would average the curves in the lower left-hand corner at each x value.


Quick Tree Intro

  • A partition of the feature space that cannot be obtained by recursive binary splitting.

  • Recursive binary subdivision of the feature space.

  • The corresponding tree.

  • The resulting prediction surface $\hat f(x)$.



Bagging Trees

  • Each bootstrap run produces a different tree

  • Each tree may have different terminal nodes (a different structure)

  • The bagged estimate is the average prediction at $x$ from the $B$ trees. The prediction can be a 0/1 indicator (class) vector, in which case bagging gives the proportion $p_k$ of trees predicting class $k$ at $x$ (see the sketch below)
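
A minimal sketch of bagged classification trees for binary 0/1 labels, assuming scikit-learn's DecisionTreeClassifier as the base tree grower (an assumption; the slides do not name a library). Averaging the 0/1 predictions gives the proportion of trees voting for class 1 at each new x; the bagged classification is then a majority vote over that proportion.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bag_trees(X, y, B=200, seed=0):
    """Grow B trees, each on its own bootstrap sample of (X, y)."""
    rng = np.random.default_rng(seed)
    trees = []
    for b in range(B):
        idx = rng.integers(0, len(y), len(y))          # bootstrap sample Z*b
        trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return trees

def bagged_class_proportions(trees, X_new):
    """Average the 0/1 predictions: the bagged estimate at x is the
    proportion of the B trees predicting class 1 at x."""
    votes = np.stack([t.predict(X_new) for t in trees])   # shape (B, n_new)
    return votes.mean(axis=0)
```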


8.7.1: Example Trees with Simulated Data

  • The original tree and five trees grown on bootstrap samples

  • Two classes, five features, each with a standard Gaussian distribution

  • $Y$ generated from $\Pr(Y = 1 \mid x_1 \le 0.5) = 0.2$ and $\Pr(Y = 1 \mid x_1 > 0.5) = 0.8$

  • Bayes error is 0.2

  • Trees were fit to 200 bootstrap samples


Example Performance

  • There is high variance among the trees because the features have pairwise correlation 0.95.

  • Bagging successfully smooths out this variance and reduces the test error.


Where Bagging Doesn’t Help

  • The classifier is a single axis-oriented split.

  • The split is chosen along either $x_1$ or $x_2$ so as to minimize training error.

  • Boosting is shown on the right.


Outline

  • 8.6 MCMC for Sampling from the Posterior

  • 8.7 Bagging

    • 8.7.1 Examples: Trees with Simulated Data

  • 8.8 Model Averaging and Stacking

  • 8.9 Stochastic Search: Bumping


Model Averaging and Stacking

  • A more general approach: Bayesian model averaging

  • Given candidate models $\mathcal{M}_m$, $m = 1, \ldots, M$, a training set $Z$, and a quantity of interest $\zeta$, the posterior distribution of $\zeta$ is $\Pr(\zeta \mid Z) = \sum_{m=1}^{M} \Pr(\zeta \mid \mathcal{M}_m, Z)\, \Pr(\mathcal{M}_m \mid Z)$

  • The Bayesian prediction (posterior mean) is a weighted average of the individual predictions, with weights proportional to the posterior probability of each model


Other Averaging Strategies

  • Simple unweighted average of the predictions (each model considered equally likely)

  • BIC can be used to estimate the posterior model probabilities: each model is weighted according to how well it fits and how many parameters it uses (sketched below)

  • Full Bayesian strategy: $\Pr(\mathcal{M}_m \mid Z) \propto \Pr(\mathcal{M}_m) \cdot \Pr(Z \mid \mathcal{M}_m) = \Pr(\mathcal{M}_m) \int \Pr(Z \mid \theta_m, \mathcal{M}_m)\, \Pr(\theta_m \mid \mathcal{M}_m)\, d\theta_m$
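
A small sketch, assuming each candidate model reports its maximized log-likelihood and parameter count, of turning BIC values into approximate posterior model probabilities via $w_m \propto e^{-\mathrm{BIC}_m / 2}$ (equal prior model probabilities assumed; the numbers below are hypothetical).

```python
import numpy as np

def bic_weights(log_liks, n_params, n_obs):
    """Approximate posterior model probabilities from BIC.

    BIC_m = -2 * loglik_m + d_m * log(N); the weight of model m is
    proportional to exp(-BIC_m / 2), assuming equal prior probabilities.
    """
    log_liks = np.asarray(log_liks, dtype=float)
    n_params = np.asarray(n_params, dtype=float)
    bic = -2.0 * log_liks + n_params * np.log(n_obs)
    rel = -0.5 * (bic - bic.min())       # subtract the minimum for numerical stability
    w = np.exp(rel)
    return w / w.sum()

# e.g. three fitted models with hypothetical log-likelihoods and sizes
print(bic_weights(log_liks=[-120.3, -118.9, -119.5], n_params=[3, 6, 4], n_obs=100))
```

The model-averaged prediction is then the weighted sum of the individual predictions with these weights.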


Frequentist Viewpoint of Averaging

  • Given predictions $\hat f_1(x), \ldots, \hat f_M(x)$ from $M$ models, we seek the optimal weights $w$: $\hat w = \operatorname{argmin}_w \mathrm{E}_{\mathcal{P}}\Big[Y - \sum_{m=1}^{M} w_m \hat f_m(x)\Big]^2$

  • The input $x$ is fixed and the $N$ observations in $Z$ (and the target $Y$) are distributed according to $\mathcal{P}$. The solution is the population linear regression of $Y$ on the vector of model predictions $\hat F(x)^T \equiv [\hat f_1(x), \ldots, \hat f_M(x)]$: $\hat w = \mathrm{E}_{\mathcal{P}}[\hat F(x)\hat F(x)^T]^{-1}\, \mathrm{E}_{\mathcal{P}}[\hat F(x)\, Y]$ (a least-squares sketch follows)
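
As a sample analogue of the population regression above, here is a tiny sketch (names are illustrative, not from the slides) that estimates the combining weights by ordinary least squares of y on the matrix of model predictions; as the next slide notes, doing this on the training set tends to over-reward complex models.

```python
import numpy as np

def averaging_weights(F, y):
    """Least-squares weights for combining model predictions.

    F is an (N, M) matrix whose m-th column holds model m's predictions
    at the N training inputs; the weights solve min_w ||y - F w||^2,
    the sample analogue of the population regression of Y on F-hat(x).
    """
    w, *_ = np.linalg.lstsq(F, y, rcond=None)
    return w

# the combined prediction at new inputs is then F_new @ w
```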


Notes on the Frequentist Viewpoint

  • At the population level, adding models to the combination with optimally chosen weights can only help.

  • But the population is, of course, not available

  • The regression can be carried out over the training set, but this may not be ideal: model complexity is not taken into account…


Stacked Generalization (Stacking)

  • Using cross-validated predictions $\hat f_m^{-i}(x_i)$ avoids giving unfairly high weight to models with high complexity; the stacking weights solve $\hat w^{\mathrm{st}} = \operatorname{argmin}_w \sum_{i=1}^{N} \Big[y_i - \sum_{m=1}^{M} w_m \hat f_m^{-i}(x_i)\Big]^2$ (see the sketch below)

  • If $w$ is restricted to vectors with one unit weight and the rest zero, stacking amounts to model selection: it picks the model with the smallest leave-one-out cross-validation error

  • In practice we use the combined model with the optimal weights: better prediction, but less interpretability
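
A minimal stacking sketch, assuming scikit-learn-style estimators with fit/predict and using K-fold splitting as a practical stand-in for the leave-one-out predictions $\hat f_m^{-i}(x_i)$ (the estimator list, the number of folds, and all names are assumptions).

```python
import numpy as np
from sklearn.model_selection import KFold

def stacking_weights(models, X, y, n_splits=5, seed=0):
    """Estimate stacking weights from out-of-fold (cross-validated) predictions."""
    N, M = len(y), len(models)
    F_cv = np.zeros((N, M))                        # holds f-hat_m^{-i}(x_i)
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(X):
        for m, model in enumerate(models):
            fitted = model.fit(X[train_idx], y[train_idx])
            F_cv[test_idx, m] = fitted.predict(X[test_idx])
    # least-squares fit of y on the cross-validated predictions
    w, *_ = np.linalg.lstsq(F_cv, y, rcond=None)
    return w
```

The stacked prediction at a new x is the weighted sum of the individual model predictions, with each model refit on the full training set.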


Outline

  • 8.6 MCMC for Sampling from the Posterior

  • 8.7 Bagging

    • 8.7.1 Examples: Trees with Simulated Data

  • 8.8 Model Averaging and Stacking

  • 8.9 Stochastic Search: Bumping


Stochastic Search: Bumping

  • Rather than averaging models, bumping tries to find a single better model.

  • It is useful for avoiding local minima in the fitting procedure.

  • Like bagging, we draw bootstrap samples and fit the model to each, but we then choose the model that best fits the original training data


Stochastic Search: Bumping

  • Given $B$ bootstrap samples $Z^{*1}, \ldots, Z^{*B}$, fitting the model to each yields predictions $\hat f^{*b}(x)$, $b = 1, \ldots, B$

  • For squared error, we choose the model fit on the bootstrap sample $\hat b = \operatorname{argmin}_b \sum_{i=1}^{N} \big[y_i - \hat f^{*b}(x_i)\big]^2$; the original training sample is included among the bootstrap samples, so the original model can be retained (a code sketch follows)

  • Bumping tries to move around the model space by perturbing the data.
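
A minimal Python sketch of bumping for a regression fit with squared-error training loss, assuming a scikit-learn-style estimator produced by a hypothetical model_factory (e.g. lambda: DecisionTreeRegressor(max_depth=2)); for classification one would swap in misclassification error on the original training set.

```python
import numpy as np

def bump(model_factory, X, y, B=20, seed=0):
    """Bumping: fit the model on B bootstrap samples (plus the original
    sample) and keep the single fit with the smallest squared error on
    the ORIGINAL training data (X, y)."""
    rng = np.random.default_rng(seed)
    N = len(y)
    best_fit, best_err = None, np.inf
    for b in range(B + 1):
        # b = 0 uses the original sample, so the original model can be kept
        idx = np.arange(N) if b == 0 else rng.integers(0, N, N)
        fit = model_factory().fit(X[idx], y[idx])
        err = np.mean((y - fit.predict(X)) ** 2)   # training error on the original data
        if err < best_err:
            best_fit, best_err = fit, err
    return best_fit
```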


A Contrived Case Where Bumping Helps

  • Greedy tree-based algorithm tries to split on each dimension separately, first one, then the other.

  • Bumping stumbles upon the right answer.