Bagging

LING 572, Fei Xia, 1/25/07

Presentation Transcript


  1. Bagging • LING 572 • Fei Xia • 1/25/07

  2. Classifiers we have learned so far • Naïve Bayes • kNN • Rocchio • Decision tree • Decision list → Similarities and differences?

  3. How to improve performance? • Bagging: bootstrap aggregating • Boosting • System combination • …

  4. Outline • An introduction to the bootstrap • Bagging: basic concepts (Breiman, 1996) • Case study: bagging a treebank parser (Henderson and Brill, ANLP 2000)

  5. Introduction to bootstrap

  6. Motivation • What is the average house price? • Get a sample {x1, x2, …, xn}, and calculate the average u. • Question: how reliable is u? What is the standard error of u? What is the confidence interval?

  7. Solutions • One possibility: get several samples. • Problem: it is impossible (or too expensive) to get multiple samples. • One solution: bootstrap

  8. Bootstrap

  9. The general bootstrap algorithm Let the original sample be L = {x1, x2, …, xn}. • Repeat B times: • Generate a sample Lk of size n from L by sampling with replacement. • Compute the statistic of interest on Lk. → Now we end up with B bootstrap values. • Use these values for calculating all the quantities of interest (e.g., standard deviation, confidence intervals).
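
The algorithm on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the lecture's own code: the function names `bootstrap` and `percentile_interval`, the choice of B = 1000, the fixed seed, and the simple percentile interval are all assumptions made for the example. The sample values are the house-price-style numbers used in the example on the next slide.

```python
import random
import statistics

def bootstrap(sample, statistic, B=1000, seed=0):
    """Return B bootstrap values of `statistic` (hypothetical helper)."""
    rng = random.Random(seed)          # fixed seed only for reproducibility
    n = len(sample)
    values = []
    for _ in range(B):
        # Draw a sample of size n from the original sample, with replacement.
        resample = rng.choices(sample, k=n)
        values.append(statistic(resample))
    return values

def percentile_interval(values, alpha=0.05):
    """Simple percentile interval from the bootstrap values (one of several options)."""
    ordered = sorted(values)
    low = ordered[int((alpha / 2) * len(ordered))]
    high = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return low, high

sample = [3.12, 0, 1.57, 19.67, 0.22, 2.2]
boot_means = bootstrap(sample, statistics.mean, B=1000)

# Quantities of interest computed from the bootstrap values.
std_error = statistics.stdev(boot_means)
ci_low, ci_high = percentile_interval(boot_means)
print(f"bootstrap SE of the mean: {std_error:.2f}")
print(f"95% percentile interval: ({ci_low:.2f}, {ci_high:.2f})")
```

Any statistic that takes a list can be passed in place of `statistics.mean`, for example `statistics.median`.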

  10. An example • Original sample: X = (3.12, 0, 1.57, 19.67, 0.22, 2.2), mean = 4.46 • Bootstrap sample X1 = (1.57, 0.22, 19.67, 0, 0.22, 3.12), mean = 4.13 • Bootstrap sample X2 = (0, 2.2, 2.2, 2.2, 19.67, 1.57), mean = 4.64 • Bootstrap sample X3 = (0.22, 3.12, 1.57, 3.12, 2.2, 0.22), mean = 1.74
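
The arithmetic in this example can be checked directly. The short sketch below recomputes the four means from the samples on the slide and confirms that every bootstrap value is drawn from the original sample; the variable names are chosen only for illustration.

```python
import statistics

original = (3.12, 0, 1.57, 19.67, 0.22, 2.2)
bootstrap_samples = {
    "X1": (1.57, 0.22, 19.67, 0, 0.22, 3.12),
    "X2": (0, 2.2, 2.2, 2.2, 19.67, 1.57),
    "X3": (0.22, 3.12, 1.57, 3.12, 2.2, 0.22),
}

# Mean of the original sample and of each bootstrap sample, to two decimals.
original_mean = round(statistics.mean(original), 2)
means = {name: round(statistics.mean(xs), 2) for name, xs in bootstrap_samples.items()}

# Sampling with replacement can repeat or omit values but never invents new ones.
all_from_original = all(x in original for xs in bootstrap_samples.values() for x in xs)

print(original_mean, means, all_from_original)
```

Note how X3 happens to miss the outlier 19.67, which pulls its mean well below the others; this spread across bootstrap samples is exactly what the bootstrap uses to estimate variability.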

  11. A quick view of bootstrapping • Introduced by Bradley Efron in 1979. • Named after the phrase “to pull oneself up by one’s bootstraps”, which is widely believed to come from “The Adventures of Baron Munchausen”. • Popularized in the 1980s due to the introduction of computers in statistical practice. • It has a strong mathematical background. • It is well known as a method for estimating standard errors and bias, and for constructing confidence intervals for parameters.

  12. Bootstrap distribution • The bootstrap does not replace or add to the original data. • We use the bootstrap distribution as a way to estimate the variation in a statistic based on the original data.

  13. Sampling distribution vs. bootstrap distribution • The population: certain unknown quantities of interest (e.g., mean) • Multiple samples → sampling distribution • Bootstrapping: • One original sample → B bootstrap samples • B bootstrap samples → bootstrap distribution

  14. Bootstrap distributions usually approximate the shape, spread, and bias of the actual sampling distribution. • Bootstrap distributions are centered at the value of the statistic from the original sample plus any bias. • The sampling distribution is centered at the value of the parameter in the population, plus any bias.

  15. Cases where bootstrap does not apply • Small data sets: the original sample is not a good approximation of the population • Noisy data: outliers add variability in our estimates. • Dependence structures (e.g., time series, spatial problems): Bootstrap is based on the assumption of independence. • …

  16. How many bootstrap samples are needed? Choice of B depends on • Computer availability • Type of the problem: standard errors, confidence intervals, … • Complexity of the problem

  17. Resampling methods • Bootstrap • Permutation tests • Jackknife: we leave out one observation at a time • …
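
For comparison with the bootstrap, the jackknife can be sketched as follows. The slide only names the method; the standard-error formula used here is the usual textbook jackknife formula and is not taken from the slides, and the function names are hypothetical.

```python
import math
import statistics

def jackknife_values(sample, statistic):
    """Recompute the statistic n times, leaving out one observation each time."""
    return [statistic(sample[:i] + sample[i + 1:]) for i in range(len(sample))]

def jackknife_std_error(sample, statistic):
    """Standard jackknife estimate of the standard error (assumed formula)."""
    n = len(sample)
    values = jackknife_values(sample, statistic)
    center = statistics.mean(values)
    return math.sqrt((n - 1) / n * sum((v - center) ** 2 for v in values))

sample = [3.12, 0, 1.57, 19.67, 0.22, 2.2]
se_jack = jackknife_std_error(sample, statistics.mean)

# For the mean, the jackknife estimate coincides with the classical s / sqrt(n).
se_classic = statistics.stdev(sample) / math.sqrt(len(sample))
print(se_jack, se_classic)
```

Unlike the bootstrap, the jackknife is deterministic: it always produces exactly n leave-one-out values and needs no random number generator.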

  18. Bagging: basic concepts

  19. Bagging • Introduced by Breiman (1996) • “Bagging” stands for “bootstrap aggregating”. • It is an ensemble method: a method of combining multiple predictors.

  20. Predictors • Let L be a training set {(xi, yi) | xi in X, yi in Y}, drawn from the set Λ of possible training sets. • A predictor Φ: X → Y is a function that produces y = Φ(x) for any given x. • A learning algorithm (a.k.a. learner) Ψ maps Λ to the set of predictors: given any L in Λ, it produces a predictor Φ = Ψ(L). • Types of predictors: • Classifiers: DT, DL, kNN, … • Estimators: regression trees • Others: parsers

  21. Bagging algorithm Let the original training data be L • Repeat B times: • Get a bootstrap sample Lk from L. • Train a predictor using Lk. • Combine B predictors by • Voting (for classification problem) • Averaging (for estimation problem) • …
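
The bagging loop above can be sketched in Python as follows. This is an illustration under stated assumptions, not the procedure from Breiman's experiments: the base learner `train_stump` (a one-threshold decision stump on 1-D inputs), the toy training set, B = 25, and the fixed seed are all invented for the example. Any function that maps a training set to a predictor could be passed in as `learner`.

```python
import random
from collections import Counter

def train_stump(data):
    """Hypothetical base learner: the best single-threshold classifier on 1-D inputs."""
    best = None
    for threshold in sorted({x for x, _ in data}):
        for left, right in ((0, 1), (1, 0)):
            # Count training errors of "predict `left` if x <= threshold, else `right`".
            errors = sum((left if x <= threshold else right) != y for x, y in data)
            if best is None or errors < best[0]:
                best = (errors, threshold, left, right)
    _, threshold, left, right = best
    return lambda x: left if x <= threshold else right

def bagging(data, learner, B=25, seed=0):
    """Train B predictors, each on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    predictors = []
    for _ in range(B):
        boot = rng.choices(data, k=len(data))   # sample with replacement
        predictors.append(learner(boot))
    return predictors

def vote(predictors, x):
    """Combine by majority vote (classification)."""
    return Counter(p(x) for p in predictors).most_common(1)[0][0]

def average(predictors, x):
    """Combine by averaging (estimation)."""
    return sum(p(x) for p in predictors) / len(predictors)

# Toy training set of (x, label) pairs, invented for this sketch.
train = [(0.5, 0), (1.0, 0), (1.5, 0), (2.0, 0),
         (3.0, 1), (3.5, 1), (4.0, 1), (4.5, 1)]
ensemble = bagging(train, train_stump, B=25)
print([vote(ensemble, x) for x in (0.5, 2.5, 4.5)])
print(average(ensemble, 2.5))
```

Because each stump sees a different bootstrap sample, the learned thresholds differ from one predictor to the next; voting or averaging smooths out that variation.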

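  22. Bagging = bootstrap + system combination [Figure: each bootstrap sample is fed to the learner ML, producing predictors f1, f2, …, fB, which are combined into a single predictor f]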

  23. Bagging decision trees 1. Split the data set into a training set T1 and a test set T2. 2. Run bagging on T1 using 50 bootstrap samples. 3. Repeat Steps 1-2 100 times and calculate the average test-set misclassification rate.

  24. How many bootstrap samples are needed? • Bagging decision trees for the waveform task: • Unbagged rate is 29.0%. • We get most of the improvement using only 10 bootstrap samples.

  25. Bagging regression trees Bagging with 25 bootstrap samples. Repeat 100 times.

  26. Bagging k-nearest neighbor classifiers 100 bootstrap samples. 100 iterations. Bagging does not help.

  27. Experiment results • Bagging works well for “unstable” learning algorithms. • Bagging can slightly degrade the performance of “stable” learning algorithms.

  28. Learning algorithms • Unstable learning algorithms: small changes in the training set result in large changes in predictions. • Neural network • Decision tree • Regression tree • Subset selection in linear regression • Stable learning algorithms: • kNN

  29. Case study

  30. Experiment settings • Henderson and Brill ANLP-2000 paper • Parser: Collins’s Model 2 (1997) • Training data: sections 01-21 • Test data: Section 23 • Bagging: • Different ways of combining parsing results

  31. Techniques for combining parsers (Henderson and Brill, EMNLP-1999) • Parse hybridization: combining the substructures of the input parses • Constituent voting • Naïve Bayes • Parser switching: selecting one of the input parses • Similarity switching • Naïve Bayes

  32. Experiment results • Baseline (no bagging): 88.63 • Initial (one bag): 88.38 • Final (15 bags): 89.17

  33. Training corpus size effects

  34. Summary • Bootstrap is a resampling method. • Bagging is directly related to bootstrap. • It uses bootstrap samples to train multiple predictors. • The outputs of the predictors are combined by voting or other methods. • Experiment results: • It is effective for unstable learning methods. • It does not help stable learning methods.

  35. Uncovered issues • How to determine whether a learning method is stable or unstable? • Why does bagging work for unstable algorithms?
