
# Boosting of classifiers


Ata Kaban

### Motivation & beginnings

• Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner

• E.g. if an email contains the word “money” then classify it as spam, otherwise as non-spam

• Is it possible to use this weak learning algorithm to create a strong classifier with error rate close to 0?

• Ensemble learning – the wisdom of crowds

• More heads are better than one
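A minimal sketch of the keyword rule above written as a weak classifier. The toy emails and the +1 (spam) / -1 (non-spam) label encoding are assumptions made for illustration; they are not data from the slides.

```python
def keyword_rule(email: str, keyword: str = "money") -> int:
    """Weak classifier: +1 (spam) if the keyword occurs as a word, else -1 (non-spam)."""
    return 1 if keyword in email.lower().split() else -1


# Hypothetical toy data: (email text, label) with +1 = spam, -1 = non-spam.
emails = [
    ("send money now", 1),
    ("win a free prize", 1),
    ("money back guarantee", 1),
    ("lunch at noon", -1),
    ("money for the trip", -1),
    ("meeting notes attached", -1),
]

# Count how often the rule disagrees with the label.
errors = sum(keyword_rule(text) != label for text, label in emails)
error_rate = errors / len(emails)
```

On this toy set the rule is wrong on 2 of 6 emails: better than random guessing, but far from a strong classifier.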

### Motivation & beginnings

• Robert Schapire and Yoav Freund developed the Adaboost algorithm

• Given:

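• Examples $(x_1, y_1), \dots, (x_N, y_N)$ where $x_i \in X$ and $y_i \in \{-1, +1\}$

• A weak learning algorithm A, that produces weak classifiers $h \in \mathcal{H}$, with $h: X \to \{-1, +1\}$

• Goal: Produce a new classifier $H: X \to \{-1, +1\}$ with error $\le \epsilon$. Note, $H$ is not required to be in $\mathcal{H}$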

### Idea

• Use the weak learning algorithm to produce a collection of weak classifiers

• Modify the input each time when asking for a new weak classifier

• Weight the training points differently

• Find a good way to combine them

• Iterative algorithm

• Maintains a distribution of weights over the training examples

• Initially weights are equal

• At successive iterations the weight of misclassified examples is increased

• This forces the algorithm to focus on the examples that have not been classified correctly in previous rounds

• Take a linear combination of the predictions of the weak learners, with coefficients that increase with the performance of each weak learner.

### Pseudo-code

• For t=1,…,T

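• Construct a discrete probability distribution over indices of training points {1,2,…,N}, denote it as $D_t$

• Run algorithm A on $D_t$ to produce weak classifier $h_t: X \to \{-1, +1\}$

• Calculate $\epsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \ne y_i]$ where by the weak learning assumption this is slightly smaller than ½ (random guessing)

• Output $H(x) = \mathrm{sign}\left(\sum_{t=1}^{T} \alpha_t h_t(x)\right)$ where $\alpha_t$ is the combination coefficient of weak classifier $h_t$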
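The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's implementation: the weak learner A is replaced by a hypothetical `train_stump` helper that searches threshold rules on 1-D inputs, the toy data is made up, and the weight and coefficient rules are the standard Adaboost choices.

```python
import math


def train_stump(xs, ys, weights):
    """Stand-in for weak learner A: pick the threshold rule with the
    lowest weighted error on 1-D inputs. Returns (error, classifier)."""
    best_err, best_h = None, None
    for threshold in sorted(set(xs)):
        for sign in (1, -1):
            def h(x, th=threshold, s=sign):
                return s if x >= th else -s
            err = sum(w for x, y, w in zip(xs, ys, weights) if h(x) != y)
            if best_err is None or err < best_err:
                best_err, best_h = err, h
    return best_err, best_h


def adaboost(xs, ys, rounds):
    n = len(xs)
    weights = [1.0 / n] * n            # D_1: equal weights
    ensemble = []                      # pairs (alpha_t, h_t)
    for _ in range(rounds):
        eps, h = train_stump(xs, ys, weights)
        if eps >= 0.5:                 # weak learning assumption violated
            break
        eps = max(eps, 1e-12)          # avoid dividing by zero when eps == 0
        alpha = 0.5 * math.log((1 - eps) / eps)
        ensemble.append((alpha, h))
        # Raise the weight of misclassified points, lower the rest, renormalise.
        weights = [w * math.exp(-alpha * y * h(x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]

    def H(x):
        # Weighted vote of the weak classifiers.
        score = sum(alpha * h(x) for alpha, h in ensemble)
        return 1 if score >= 0 else -1
    return H


# Hypothetical toy data: no single threshold rule separates these labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
errors_1 = sum(one_round(x) != y for x, y in zip(xs, ys))
errors_3 = sum(three_rounds(x) != y for x, y in zip(xs, ys))
```

On this toy set a single stump misclassifies 3 of the 10 points, while the weighted vote of three stumps classifies all 10 correctly.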

### Details for pseudo-code

• How to construct the distribution $D_t$ over the training points

• How to determine the combination coefficients $\alpha_t$

Adaboost does these in the following way:

### The weights of training points

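• Initially all weights are equal: $D_1(i) = 1/N$

• Weights of examples go up or down depending on how easy the example was to classify: $D_{t+1}(i) = D_t(i)\exp(-\alpha_t y_i h_t(x_i)) / Z_t$, where $Z_t$ normalises $D_{t+1}$ to sum to 1. If an example is easy it will get a small weight; hard ones get large weights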
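As a rough illustration of how the weights move, the sketch below applies one round of the standard Adaboost reweighting, $D_{t+1}(i) \propto D_t(i)\exp(-\alpha_t y_i h_t(x_i))$, to a made-up round in which the weak classifier gets 3 of 10 equally weighted points wrong. The function name `update_weights` and the data are hypothetical.

```python
import math


def update_weights(weights, labels, predictions, alpha):
    """One round of reweighting:
    D_{t+1}(i) = D_t(i) * exp(-alpha * y_i * h_t(x_i)) / Z_t."""
    unnormalised = [w * math.exp(-alpha * y * p)
                    for w, y, p in zip(weights, labels, predictions)]
    z = sum(unnormalised)              # normaliser Z_t
    return [w / z for w in unnormalised]


# Hypothetical round: 10 equally weighted points, the weak classifier gets 3 wrong.
labels      = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
predictions = [1, 1, 1, -1, -1, -1, -1, -1, -1, -1]
weights = [0.1] * 10

# Weighted error of this round and the matching coefficient.
eps = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
alpha = 0.5 * math.log((1 - eps) / eps)

new_weights = update_weights(weights, labels, predictions, alpha)
```

After the update each of the three misclassified points carries weight 1/6 and each of the seven correct ones 1/14, so the misclassified points together hold half of the total weight. The next weak classifier cannot do well by repeating the same mistakes.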

### The combination coefficients

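• Weighted vote, where the coefficient $\alpha_t$ for weak learner $h_t$ is related to how well the weak classifier performed on the weighted training set: $\alpha_t = \frac{1}{2}\ln\left(\frac{1-\epsilon_t}{\epsilon_t}\right)$, with $\epsilon_t$ the weighted error of $h_t$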

• One can show that the training error of Adaboost drops exponentially fast as the rounds progress

• The more rounds the more complex the final classifier is, so overfitting can happen

• In practice overfitting is rarely observed and Adaboost tends to have excellent generalisation performance
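The two quantitative claims above can be checked numerically. The sketch below assumes the standard Adaboost coefficient $\alpha_t = \frac{1}{2}\ln\frac{1-\epsilon_t}{\epsilon_t}$ and the standard bound on the training error, $\prod_t 2\sqrt{\epsilon_t(1-\epsilon_t)}$. The function names are illustrative.

```python
import math


def alpha(eps):
    """Vote weight of a weak classifier with weighted error eps (0 < eps < 1)."""
    return 0.5 * math.log((1 - eps) / eps)


def training_error_bound(epsilons):
    """Upper bound on the training error of the combined classifier:
    the product over rounds of 2 * sqrt(eps_t * (1 - eps_t))."""
    bound = 1.0
    for eps in epsilons:
        bound *= 2 * math.sqrt(eps * (1 - eps))
    return bound


# A weak classifier at chance level gets no say; better ones get a larger vote.
vote_chance = alpha(0.5)
vote_weak = alpha(0.4)
vote_good = alpha(0.1)

# With every round at weighted error 0.4, the bound shrinks geometrically.
bound_10 = training_error_bound([0.4] * 10)
bound_100 = training_error_bound([0.4] * 100)
```

Each round with $\epsilon_t < \frac{1}{2}$ multiplies the bound by a factor below 1, which is where the exponential drop comes from. A round with $\epsilon_t = \frac{1}{2}$ contributes a factor of exactly 1 and a vote of 0.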

### Typical behaviour

• Can construct arbitrarily complex decision regions

• Generic: Can use any classifier as weak learner; we only need it to be slightly better than random guessing

• Simple to implement

• Fast to run

• Adaboost is one of the ‘top 10’ algorithms in data mining

### Caveats

• Adaboost can fail if there is noise in the class labels (wrong labels)

• It can fail if the weak-learners are too complex

• It can fail if the weak-learners are no better than random guessing

### Topics not covered

• Other combination schemes for classifiers

• E.g. Bagging

• Combinations for unsupervised learning