Boosting of classifiers

Ata Kaban

Motivation & beginnings

  • Suppose we have a learning algorithm that is guaranteed with high probability to be slightly better than random guessing – we call this a weak learner

    • E.g. if an email contains the word “money” then classify it as spam, otherwise as non-spam (a minimal sketch of such a rule follows this list)

  • Is it possible to use this weak learning algorithm to create a strong classifier with error rate close to 0?

  • Ensemble learning – the wisdom of crowds

    • More heads are better than one
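
As an illustration of just how simple a weak learner can be, here is a minimal sketch of the “money” rule from the example above. The function name and the ±1 label encoding are assumptions made for illustration:

```python
def weak_spam_rule(email_text: str) -> int:
    """Crude weak classifier: predict spam (+1) if the word 'money' occurs,
    otherwise non-spam (-1). On its own this is only slightly better than
    random guessing, which is all a weak learner needs to be."""
    return 1 if "money" in email_text.lower() else -1
```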

Motivation & beginnings

  • The answer is YES

  • Rob Schapire and Yoav Freund developed the Adaboost algorithm

  • Given:

    • Examples $(x_1, y_1), \dots, (x_N, y_N)$, where $y_i \in \{-1, +1\}$

    • A weak learning algorithm A that produces weak classifiers $h \in \mathcal{H}$

  • Goal: Produce a new classifier $H$ with error close to 0. Note, $H$ is not required to be in $\mathcal{H}$

Boosting of classifiers


  • Use the weak learning algorithm to produce a collection of weak classifiers

    • Modify the input each time we ask for a new weak classifier

      • Weight the training points differently

  • Find a good way to combine them

Main idea behind Adaboost

  • Iterative algorithm

  • Maintains a distribution of weights over the training examples

  • Initially weights are equal

  • At successive iterations the weight of misclassified examples is increased

  • This forces the algorithm to focus on the examples that have not been classified correctly in previous rounds

  • Take a linear combination of the predictions of the weak learners, with coefficients determined by the performance of each weak learner.

Demo – first two rounds

Demo (cont’d) – third round …

Demo (cont’d) – the final classifier

Pseudo-code


  • For t = 1, …, T

    • Construct a discrete probability distribution over the indices of the training points {1, 2, …, N}; denote it as $D_t$

    • Run algorithm A on $D_t$ to produce a weak classifier $h_t$

    • Calculate $\varepsilon_t = \Pr_{i \sim D_t}[h_t(x_i) \neq y_i]$, where by the weak learning assumption this is slightly smaller than ½ (random guessing)

  • Output $H(x) = \mathrm{sign}\big(\sum_{t=1}^{T} \alpha_t h_t(x)\big)$, where the coefficients $\alpha_t$ are defined below
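
The pseudo-code above maps almost line by line onto a small implementation. Below is a minimal sketch in Python, assuming numeric features, labels in {-1, +1}, and scikit-learn depth-1 decision trees (decision stumps) playing the role of the weak learning algorithm A; the function names are illustrative, not taken from the slides:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier  # depth-1 trees used as weak learners


def adaboost_fit(X, y, T=20):
    """Train Adaboost for T rounds; labels y must be in {-1, +1}."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    n = len(y)
    D = np.full(n, 1.0 / n)                        # D_1: uniform weights over examples
    ensemble = []                                  # list of (alpha_t, h_t) pairs
    for t in range(T):
        h = DecisionTreeClassifier(max_depth=1)    # weak learner A: a decision stump
        h.fit(X, y, sample_weight=D)               # "run A on D_t"
        pred = h.predict(X)
        eps = float(np.sum(D[pred != y]))          # weighted error epsilon_t
        if eps >= 0.5:                             # no better than random guessing: stop
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-12))  # coefficient alpha_t
        D = D * np.exp(-alpha * y * pred)          # up-weight misclassified points
        D = D / D.sum()                            # renormalise (divide by Z_t)
        ensemble.append((alpha, h))
    return ensemble


def adaboost_predict(ensemble, X):
    """Weighted-majority vote H(x) = sign(sum_t alpha_t h_t(x))."""
    X = np.asarray(X, dtype=float)
    scores = sum(alpha * h.predict(X) for alpha, h in ensemble)
    return np.sign(scores)
```

Passing the weights via `sample_weight` is one common way to realise “run A on $D_t$”; resampling the training set according to $D_t$ is an equally valid alternative.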

Details for pseudo-code

  • How to construct $D_t$?

  • How to determine $\alpha_t$?

    Adaboost does these in the following way:

The weights of training points

  • Initially all weights are equal: $D_1(i) = 1/N$ for each of the $N$ training points.

  • Weights of examples go up or down depending on how easy the example was to classify: easy examples get small weights, hard ones get large weights (the update rule is given below)
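
Concretely, in standard Adaboost (using the notation of the pseudo-code above) the weights are updated as

$$D_{t+1}(i) = \frac{D_t(i)\,\exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t},$$

where $Z_t$ is a normalising constant so that $D_{t+1}$ sums to one: correctly classified points ($y_i h_t(x_i) = +1$) are down-weighted, and misclassified points ($y_i h_t(x_i) = -1$) are up-weighted.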

The combination coefficients

  • Weighted vote, where the coefficient $\alpha_t$ for weak learner $h_t$ is related to how well that weak classifier performed on the weighted training set, as given below:
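
In standard Adaboost this coefficient is

$$\alpha_t = \frac{1}{2} \ln \frac{1 - \varepsilon_t}{\varepsilon_t},$$

which is positive whenever $\varepsilon_t < \tfrac{1}{2}$ and grows as the weighted error $\varepsilon_t$ shrinks, so more accurate weak learners get a larger say in the final vote.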



  • One can show that the training error of Adaboost drops exponentially fast as the rounds progress (a standard form of this bound is given after this list)

  • The more rounds the more complex the final classifier is, so overfitting can happen

  • In practice overfitting is rarely observed and Adaboost tends to have excellent generalisation performance
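
A standard form of the training-error bound referred to above: writing $\gamma_t = \tfrac{1}{2} - \varepsilon_t$ for the edge of the $t$-th weak learner over random guessing, the training error of the combined classifier $H$ satisfies

$$\mathrm{err}_{\mathrm{train}}(H) \;\le\; \prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)} \;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^2} \;\le\; \exp\!\Big(-2\sum_{t=1}^{T} \gamma_t^2\Big),$$

so if every weak learner has edge at least $\gamma > 0$, the training error decays at least as fast as $e^{-2T\gamma^2}$.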

Typical behaviour

Practical advantages of Adaboost

  • Can construct arbitrarily complex decision regions

  • Generic: can use any classifier as the weak learner; we only need it to be slightly better than random guessing

  • Simple to implement

  • Fast to run

  • Adaboost is one of the ‘top 10’ algorithms in data mining



  • Adaboost can fail if there is noise in the class labels (wrong labels)

  • It can fail if the weak-learners are too complex

  • It can fail if the weak-learners are no better than random guessing

Topics not covered

  • Other combination schemes for classifiers

    • E.g. Bagging

  • Combinations for unsupervised learning

Further readings

  • Robert E. Schapire. The boosting approach to machine learning. In D. D. Denison, M. H. Hansen, C. Holmes, B. Mallick, B. Yu, editors, Nonlinear Estimation and Classification. Springer, 2003.

  • Robi Polikar. Ensemble Based Systems in Decision Making. IEEE Circuits and Systems Magazine, 6(3), pp. 21-45, 2006.

  • Thomas G. Dietterich. An experimental comparison of three methods for constructing ensembles of decision trees: bagging, boosting, and randomization. Machine Learning, 40(2): 139-158, 2000.

  • Collection of papers:
