1 / 18

# Boosting of classifiers - PowerPoint PPT Presentation

Boosting of classifiers. Ata Kaban. Motivation & beginnings. Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

## PowerPoint Slideshow about 'Boosting of classifiers' - rafael

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

### Boosting of classifiers

Ata Kaban

• Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner

• E.g. if an email contains the work “money” then classify it as spam, otherwise as non-spam

• Is it possible to use this weak learning algorithm to create a strong classifier with error rate close to 0?

• Ensemble learning – the wisdom of crowds

• More heads are better than one

• Rob Shapire and Yoav Freund developed the Adaboost algorithm

• Given:

• Examples where

• A weak learning algorithm A, that produces weak classifiers

• Goal: Produce a new classifier with error Note, is not required to be in

• Use the weak learning algorithm to produce a collection of weak classifiers

• Modify the input each time when asking for a new weak classifier

• Weight the training points differently

• Find a good way to combine them

• Iterative algorithm

• Maintains a distribution of weights over the training examples

• Initially weights are equal

• At successive iterations the weight of misclassified examples is increased

• This forces the algorithm to focus on the examples that have not been classified correctly in previous rounds

• Take a linear combination of the predictions of the weak learners, with coefficients proportional to the performance of the weak learner.

• For t=1,…,T

• Construct a discrete probability distribution over indices of training points {1,2,…N}, denote it as

• Run algorithm A on to produce weak classifier

• Calculate where by the weak learning assumption this is slightly smaller than ½ (random guessing)

• Output where

• How to construct

• How to determine

Adaboost does these in the following way:

• Initially all weights are equal.

• Weights of examples go up or down depending on how easy the example was to classify: If an example is easy it will get small weight , hard ones get large weights

• Weighted vote, where the coefficient for weak-learner is related to how well the weak classifier performed on the weighted training set:

• One can show that the training error of Adaboost drops exponentially fast as the rounds progress

• The more rounds the more complex the final classifier is, so overfitting can happen

• In practice overfitting is rarely observed and Adaboost tends to have excellent generalisation performance

• Can construct arbitrarily complex decision regions

• Generic: Can use any classifier as weak learner, we only need it to be slightly better than random guessing

• Simple to implement

• Fast to run

• Adaboost is one of the ‘top 10’ algorithms in data mining

• Adaboost can fail if there is noise in the class labels (wrong labels)

• It can fail if the weak-learners are too complex

• It can fail of the weak-learners are no better than random guessing

• Other combination schemes for classifiers

• E.g. Bagging

• Combinations for unsupervised learning

• Robert E. Schapire. The boosting approach to machine learning. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.

• RobiPolikar. Ensemble Based Systems in Decision Making, IEEE Circuits and Systems Magazine, 6(3), pp. 21-45, 2006. http://users.rowan.edu/~polikar/RESEARCH/PUBLICATIONS/csm06.pdf

• Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2): 139-158, 2000.

• Collection of papers: http://www.cs.gmu.edu/~carlotta/teaching/CS-795-s09/info.html