
A speech about Boosting



Presentation Transcript


  1. A speech about Boosting Presenter: Roberto Valenti

  2. The Paper* *R. Schapire. The boosting approach to machine learning: An overview, 2001

  3. I want YOU… …TO UNDERSTAND

  4. Overview • Introduction • Adaboost • How Does it work? • Why does it work? • Demo • Extensions • Performance & Applications • Summary & Conclusions • Questions

  5. Introduction to Boosting Let’s start

  6. Introduction • An example of machine learning: a spam classifier • A highly accurate rule is difficult to find • An inaccurate rule is easy: e.g., flag messages containing "BUY NOW" • Introducing boosting: "an effective method of producing an accurate prediction rule from inaccurate rules"

  7. Introduction • History of boosting: • 1989: Schapire • First provably polynomial-time boosting algorithm • 1990: Freund • Much more efficient, but with practical drawbacks • 1995: Freund & Schapire • Adaboost: the focus of this presentation • …

  8. Introduction • The boosting approach: • Lots of weak classifiers • One strong classifier • Boosting key points: • Give more importance to misclassified data • Find a way to combine the weak classifiers into a general rule.

  9. Adaboost How does it work?

  10. Adaboost – How does it work?

  11. Adaboost – How does it work? The base learner's job: • Find a base hypothesis h_t : X -> {-1, +1} • Minimize the weighted error ε_t = Pr_i~D_t[h_t(x_i) ≠ y_i] • Choose the hypothesis weight α_t = 1/2 ln((1 - ε_t)/ε_t) (a code sketch follows slide 12)

  12. Adaboost – How does it work?
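
Slides 10–12 presented the algorithm and a toy run as figures. As a stand-in, here is a minimal Python sketch of binary Adaboost; the decision-stump base learner, the helper names and the toy data are illustrative assumptions of mine, not material from the slides.

# Minimal sketch of (discrete, binary) Adaboost with decision stumps.
import numpy as np

def train_stump(X, y, D):
    """Return the stump (feature, threshold, sign) with the lowest weighted error under D."""
    best, best_err = None, np.inf
    for j in range(X.shape[1]):
        for thr in np.unique(X[:, j]):
            for sign in (+1, -1):
                pred = np.where(X[:, j] <= thr, sign, -sign)
                err = D[pred != y].sum()
                if err < best_err:
                    best, best_err = (j, thr, sign), err
    return best, best_err

def stump_predict(stump, X):
    j, thr, sign = stump
    return np.where(X[:, j] <= thr, sign, -sign)

def adaboost(X, y, T=10):
    n = len(y)
    D = np.full(n, 1.0 / n)                      # D_1: uniform over the training examples
    hypotheses, alphas = [], []
    for t in range(T):
        h, eps = train_stump(X, y, D)            # base learner minimizes the weighted error eps_t
        eps = min(max(eps, 1e-10), 1 - 1e-10)    # keep the log below well defined
        alpha = 0.5 * np.log((1 - eps) / eps)    # alpha_t = 1/2 ln((1 - eps_t) / eps_t)
        pred = stump_predict(h, X)
        D *= np.exp(-alpha * y * pred)           # up-weight the misclassified examples
        D /= D.sum()                             # renormalize to a distribution
        hypotheses.append(h)
        alphas.append(alpha)
    return hypotheses, alphas

def predict(hypotheses, alphas, X):
    # Final hypothesis: H(x) = sign(sum_t alpha_t h_t(x))
    f = sum(a * stump_predict(h, X) for h, a in zip(hypotheses, alphas))
    return np.sign(f)

# Toy usage on made-up one-dimensional data:
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([+1, +1, -1, -1, +1, -1])
H, A = adaboost(X, y, T=5)
print(predict(H, A, X))

The exhaustive stump search is only there to make the "minimize the weighted error" step explicit; any weak learner that accepts a distribution over the training examples could be plugged in instead.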

  13. Adaboost Why does it work?

  14. Adaboost – Why does it work? • Basic property: it reduces the training error • With binary labels, write the weighted error of round t as ε_t = 1/2 - γ_t • The training error of the final hypothesis H is bounded by Π_t 2√(ε_t(1 - ε_t)) • If every γ_t ≥ γ, this is at most e^(-2Tγ²) -> it drops exponentially!
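
For readers who want the intermediate step (the argument follows the paper), write ε_t = 1/2 - γ_t and use 1 - x ≤ e^(-x):

\[
\prod_{t=1}^{T} 2\sqrt{\varepsilon_t(1-\varepsilon_t)}
  = \prod_{t=1}^{T} \sqrt{1-4\gamma_t^{2}}
  \le \prod_{t=1}^{T} e^{-2\gamma_t^{2}}
  = \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big)
  \le e^{-2T\gamma^{2}} \quad \text{whenever every } \gamma_t \ge \gamma .
\]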

  15. Adaboost – Why does it work? • Generalization error bounded by: Pr[H(x) ≠ y] ≤ P̂r[H(x) ≠ y] + Õ(√(Td/m)) • T = number of boosting rounds • m = sample size • d = Vapnik-Chervonenkis dimension of the base classifier space • P̂r[.] = empirical probability (on the training sample) • Õ = hides logarithmic and constant factors • The bound grows with T, so it predicts overfitting!

  16. Adaboost – Why does it work? • Margins of the training examples: margin(x, y) = y · Σ_t α_t h_t(x) / Σ_t α_t ∈ [-1, +1] • Positive only if the example is correctly classified by H • Its magnitude measures the confidence of the prediction • This gives a qualitative explanation of Adaboost's effectiveness • But not a quantitative one.
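
For reference, the margin-based bound summarized in the paper (due to Schapire, Freund, Bartlett and Lee, 1998) holds for every θ > 0 and, notably, does not depend on the number of rounds T:

\[
\Pr\big[H(x) \ne y\big]
  \;\le\; \hat{\Pr}\big[\operatorname{margin}(x, y) \le \theta\big]
  \;+\; \tilde{O}\!\left(\sqrt{\frac{d}{m\,\theta^{2}}}\right).
\]

Because Adaboost keeps pushing the margins up even after the training error reaches zero, the first term keeps shrinking with more rounds, which is the qualitative explanation of effectiveness mentioned on this slide.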

  17. Adaboost – Other View • Adaboost as a zero-sum game • Game matrix M • Row player: Adaboost • Column player: the base learner • The row player plays rows with distribution P • The column player plays columns with distribution Q • Expected loss: P^T M Q • Adaboost corresponds to playing this matrix game repeatedly

  18. Adaboost – Other View • Von Neumann's minimax theorem implies: • If for every distribution there exists a base classifier with error ε ≤ 1/2 - γ • Then there exists a weighted combination of base classifiers with margin ≥ 2γ on every example • So Adaboost has the potential to succeed • Related to linear programming and online learning
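
In the notation of slide 17, von Neumann's minimax theorem states that the value of the game is the same whichever player commits to its mixed strategy first:

\[
\min_{P} \max_{Q} P^{\top} M Q \;=\; \max_{Q} \min_{P} P^{\top} M Q .
\]

Applied to the mistake matrix with M(i, j) = 1 when base classifier j is correct on example i (and 0 otherwise), the weak-learning assumption gives the left-hand side a value of at least 1/2 + γ, so the right-hand side yields a combination of base classifiers with margin ≥ 2γ, which is the implication stated on this slide.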

  19. Adaboost Demo

  20. Demo

  21. Adaboost Extensions

  22. Adaboost - Extensions • History of Boosting: • … • 1997: Freund & Schapire • Adaboost.M1 • First multiclass generalization • Fails if the weak learner achieves less than 50% accuracy • Adaboost.M2 • Creates a set of binary problems • For example x, is label l1 better than label l2? • 1999: Schapire & Singer • Adaboost.MH • For example x, is label l1 better than one of the others? (a sketch of the reduction follows)
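
As a rough illustration of the Adaboost.MH reduction (Schapire & Singer, 1999): every multiclass example is expanded into one binary example per candidate label, asking "is this the correct label or not?". The function below is an illustrative sketch of that expansion, not code from the paper.

# Sketch of the Adaboost.MH reduction from multiclass to binary.
def expand_for_mh(examples, labels):
    """examples: list of (x, y) pairs with y in `labels`.
    Returns binary examples of the form ((x, l), +1/-1), +1 iff l is the correct label."""
    binary = []
    for x, y in examples:
        for l in labels:
            binary.append(((x, l), +1 if l == y else -1))
    return binary

# Binary Adaboost (e.g. the sketch after slide 12) is then run on the expanded set,
# with base hypotheses that look at both the instance x and the candidate label l.
data = [("doc1", "sports"), ("doc2", "politics")]
print(expand_for_mh(data, ["sports", "politics", "weather"]))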

  23. Adaboost - Extensions • 2001: Rochery, Schapire et al. • Incorporating human knowledge • Adaboost is data-driven • Human knowledge can compensate for a lack of data • The human expert: • Chooses a rule p mapping each x to p(x) ∈ [0, 1] • Difficult! • Simple rules should work…

  24. Adaboost - Extensions • To incorporate human knowledge, the boosting loss is extended with a relative-entropy term that penalizes predictions far from the expert's rule p • Where RE(p || q) = p ln(p/q) + (1 - p) ln((1 - p)/(1 - q)) is the binary relative entropy
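
A minimal helper for the binary relative entropy on this slide (the function name and the clipping away from 0 and 1 are my own additions):

import math

def binary_relative_entropy(p, q, eps=1e-12):
    """RE(p || q) = p ln(p/q) + (1 - p) ln((1 - p)/(1 - q)) for p, q in (0, 1)."""
    p = min(max(p, eps), 1.0 - eps)   # clip to avoid log(0) and division by zero
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))

# RE is 0 when the two probabilities agree and grows as they diverge:
print(binary_relative_entropy(0.9, 0.9))   # ~0.0
print(binary_relative_entropy(0.9, 0.5))   # > 0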

  25. Adaboost Performance and Applications

  26. Adaboost - Performance & Applications • Error rates on text categorization (Reuters newswire articles and AP newswire headlines)

  27. Adaboost - Performance & Applications • Six-class text classification (TREC): training error and test error curves

  28. Adaboost - Performance & Applications • Spoken language classification: the "How may I help you" and "Help desk" tasks

  29. Adaboost - Performance & Applications • OCR: outlier examples shown at rounds 4, 12 and 25, each annotated with class, label1/weight1, label2/weight2

  30. Adaboost - Applications • Text filtering • Schapire, Singer, Singhal. Boosting and Rocchio applied to text filtering, 1998 • Routing • Iyer, Lewis, Schapire, Singer, Singhal. Boosting for document routing, 2000 • "Ranking" problems • Freund, Iyer, Schapire, Singer. An efficient boosting algorithm for combining preferences, 1998 • Image retrieval • Tieu, Viola. Boosting image retrieval, 2000 • Medical diagnosis • Merler, Furlanello, Larcher, Sboner. Tuning cost-sensitive boosting and its application to melanoma diagnosis, 2001

  31. Adaboost - Applications • Learning problems in natural language processing • Abney, Schapire, Singer. Boosting applied to tagging and PP attachment, 1999 • Collins. Discriminative reranking for natural language parsing, 2000 • Escudero, Marquez, Rigau. Boosting applied to word sense disambiguation, 2000 • Haruno, Shirai, Ooyama. Using decision trees to construct a practical parser, 1999 • Moreno, Logan, Raj. A boosting approach for confidence scoring, 2001 • Walker, Rambow, Rogati. SPoT: A trainable sentence planner, 2001

  32. Summary and Conclusions At last…

  33. Summary • Boosting takes a weak learner and converts it into a strong one • It works by asymptotically minimizing the training error • It effectively maximizes the margins of the combined hypothesis • Adaboost is related to many other topics • It works!

  34. Conclusions • Adaboost advantages: • Fast, simple and easy to program • No parameters required • Performance depends on: • Sample size: (Skurichina, 2001) boosting is only useful for large sample sizes • The choice of the weak classifier • How the classifier weights are incorporated • The data distribution

  35. Questions ? (don’t be mean)
