- 78 Views
- Uploaded on
- Presentation posted in: General

Boosting of classifiers

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Boosting of classifiers

Ata Kaban

- Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner
- E.g. if an email contains the work “money” then classify it as spam, otherwise as non-spam

- Is it possible to use this weak learning algorithm to create a strong classifier with error rate close to 0?
- Ensemble learning – the wisdom of crowds
- More heads are better than one

- The answer is YES
- Rob Shapire and Yoav Freund developed the Adaboost algorithm
- Given:
- Examples where
- A weak learning algorithm A, that produces weak classifiers

- Goal: Produce a new classifier with error Note, is not required to be in

- Use the weak learning algorithm to produce a collection of weak classifiers
- Modify the input each time when asking for a new weak classifier
- Weight the training points differently

- Modify the input each time when asking for a new weak classifier
- Find a good way to combine them

- Iterative algorithm
- Maintains a distribution of weights over the training examples
- Initially weights are equal
- At successive iterations the weight of misclassified examples is increased
- This forces the algorithm to focus on the examples that have not been classified correctly in previous rounds
- Take a linear combination of the predictions of the weak learners, with coefficients proportional to the performance of the weak learner.

- For t=1,…,T
- Construct a discrete probability distribution over indices of training points {1,2,…N}, denote it as
- Run algorithm A on to produce weak classifier
- Calculate where by the weak learning assumption this is slightly smaller than ½ (random guessing)

- Output where

- How to construct
- How to determine
Adaboost does these in the following way:

- Initially all weights are equal.
- Weights of examples go up or down depending on how easy the example was to classify: If an example is easy it will get small weight , hard ones get large weights

- Weighted vote, where the coefficient for weak-learner is related to how well the weak classifier performed on the weighted training set:

- One can show that the training error of Adaboost drops exponentially fast as the rounds progress
- The more rounds the more complex the final classifier is, so overfitting can happen
- In practice overfitting is rarely observed and Adaboost tends to have excellent generalisation performance

- Can construct arbitrarily complex decision regions
- Generic: Can use any classifier as weak learner, we only need it to be slightly better than random guessing
- Simple to implement
- Fast to run
- Adaboost is one of the ‘top 10’ algorithms in data mining

- Adaboost can fail if there is noise in the class labels (wrong labels)
- It can fail if the weak-learners are too complex
- It can fail of the weak-learners are no better than random guessing

- Other combination schemes for classifiers
- E.g. Bagging

- Combinations for unsupervised learning

- Robert E. Schapire. The boosting approach to machine learning. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.
- RobiPolikar. Ensemble Based Systems in Decision Making, IEEE Circuits and Systems Magazine, 6(3), pp. 21-45, 2006. http://users.rowan.edu/~polikar/RESEARCH/PUBLICATIONS/csm06.pdf
- Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2): 139-158, 2000.
- Collection of papers: http://www.cs.gmu.edu/~carlotta/teaching/CS-795-s09/info.html