
Advanced Machine Learning & Perception

Presentation Transcript


  1. Advanced Machine Learning & Perception • Instructor: Tony Jebara, Columbia University

  2. Boosting • Combining Multiple Classifiers • Voting • Boosting • AdaBoost • Based on material by Y. Freund, P. Long & R. Schapire

  3. Combining Multiple Learners • Have many simple learners, also called base learners or weak learners, which have a classification error of <0.5 • Combine or vote them to get higher accuracy • No free lunch: there is no guaranteed best approach here • Different approaches: • Voting: combine learners with fixed weights • Mixture of Experts: adjust the learners and a variable weight/gating function • Boosting: actively search for the next base learner and vote • Cascading, Stacking, Bagging, etc.

  4. Voting • Have T classifiers • Average their predictions with weights • Like a mixture of experts, but each weight is constant with respect to the input x [Figure: experts in parallel, each output scaled by a fixed weight α and summed into the prediction y]
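
  A minimal sketch of fixed-weight voting in NumPy; the function name and array shapes are illustrative assumptions, not the course's code:

      import numpy as np

      def vote(predictions, weights):
          # predictions: (T, N) array of {-1, +1} outputs from T base classifiers
          # weights: T fixed weights that do not depend on the input x
          weights = np.asarray(weights, dtype=float)
          weights = weights / weights.sum()          # normalize to a convex combination
          return np.sign(weights @ predictions)      # weighted vote for each example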

  5. Mixture of Experts • Have T classifiers or experts and a gating function • Average their predictions with variable weights • But adapt the parameters of the gating function and of the experts (fixed total number T of experts) [Figure: a gating network weights the outputs of the experts as a function of the input]
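
  A minimal sketch of the input-dependent weighting; the linear-softmax gate below is an illustrative assumption, not the slide's specific architecture:

      import numpy as np

      def softmax(z):
          z = z - z.max(axis=-1, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=-1, keepdims=True)

      def mixture_of_experts(X, expert_fns, gate_W):
          # gate_W: (d, T) parameters of a simple linear-softmax gating network
          gates = softmax(X @ gate_W)                              # (N, T): weights vary with x
          outputs = np.stack([f(X) for f in expert_fns], axis=1)   # (N, T): expert predictions
          return np.sum(gates * outputs, axis=1)                   # gated average of the experts

  Unlike voting, the gate weights change with each input; training would adapt both gate_W and the experts' own parameters.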

  6. Boosting • Actively find complementary or synergistic weak learners • Train the next learner based on the mistakes of the previous ones • Average predictions with fixed weights • Find the next learner by training on weighted versions of the data [Figure: the boosting loop: training data and weights feed the weak learner, which outputs a weak rule that is added to the ensemble of learners used for prediction]

  7. AdaBoost • Most popular weighting scheme • Define the margin for point i as y_i f(x_i), where f(x) = Σ_t α_t h_t(x) • Find an h_t and a weight α_t to minimize the cost function, the sum of exp-margins Σ_i exp(-y_i f(x_i)) [Figure: the same boosting loop as slide 6: training data, weights, weak learner, weak rule, ensemble of learners, prediction]
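
  In code, the two quantities just defined (per-point margins and the sum of exp-margins) look like this; NumPy, with hypothetical names:

      import numpy as np

      def exp_cost(alphas, H, y):
          # H: (T, N) array of weak-learner outputs h_t(x_i) in {-1, +1}
          # y: (N,) labels in {-1, +1}; alphas: (T,) weights on the weak learners
          f = np.asarray(alphas) @ H        # ensemble scores f(x_i) = sum_t alpha_t h_t(x_i)
          margins = y * f                   # margin of point i is y_i * f(x_i)
          return np.sum(np.exp(-margins))   # sum of exp-margins, the cost being minimized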

  8. AdaBoost • Choose the base learner and α_t to minimize Z_t (equivalently, the exponential cost) • Recall that the error ε_t of the base classifier h_t must be < 1/2 • For binary h, AdaBoost puts this weight on weak learners: α_t = ½ ln((1 - ε_t)/ε_t) (instead of the more general rule of choosing α_t to minimize Z_t directly) • AdaBoost picks the following weights on the data for the next round (here Z_t is the normalizer so the weights sum to 1): D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t
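
  Putting slides 7 and 8 together, a minimal self-contained AdaBoost sketch in NumPy, using decision stumps as the binary weak learners; the stump search and all names are illustrative assumptions, not the course's own code:

      import numpy as np

      def fit_stump(X, y, w):
          # exhaustively search for the threshold stump with minimum weighted error
          best = None
          for j in range(X.shape[1]):
              for t in np.unique(X[:, j]):
                  for s in (+1, -1):
                      pred = np.where(X[:, j] > t, s, -s)
                      err = w[pred != y].sum()
                      if best is None or err < best[3]:
                          best = (j, t, s, err)
          return best

      def adaboost(X, y, T=50):
          # AdaBoost with decision stumps; labels y must be in {-1, +1}
          n = len(y)
          w = np.full(n, 1.0 / n)                    # start with uniform data weights
          ensemble = []
          for _ in range(T):
              j, t, s, err = fit_stump(X, y, w)
              err = np.clip(err, 1e-12, 1 - 1e-12)   # guard against division by zero
              alpha = 0.5 * np.log((1 - err) / err)  # the binary-h weight rule from the slide
              pred = np.where(X[:, j] > t, s, -s)
              w = w * np.exp(-alpha * y * pred)      # up-weight the points h_t got wrong
              w = w / w.sum()                        # Z_t: renormalize the weights to sum to 1
              ensemble.append((alpha, (j, t, s)))
          return ensemble

      def predict(ensemble, X):
          # sign of the weighted vote f(x) = sum_t alpha_t h_t(x)
          f = np.zeros(len(X))
          for alpha, (j, t, s) in ensemble:
              f += alpha * np.where(X[:, j] > t, s, -s)
          return np.sign(f)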

  9. Decision Trees • [Figure: a small decision tree that splits on X>3 and then Y>5, with leaves labeled -1/+1, shown next to the corresponding partition of the (X, Y) plane at X=3 and Y=5]

  10. Decision tree as a sum • [Figure: the same tree rewritten as a sum of real-valued contributions, with each decision node (X>3, Y>5) adding a value such as +0.2 or -0.3 on its yes/no branch; the classification is the sign of the total score]

  11. An alternating decision tree • [Figure: an alternating decision tree in which a new decision node (Y<1, contributing +0.7 on one branch) is attached beneath an existing prediction node; the classification is the sign of the sum of the prediction values along all paths consistent with the input]
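
  A toy sketch of how such an alternating decision tree scores an input; the conditions (X>3, Y>5, Y<1) follow the figure, but the numeric contributions are illustrative placeholders rather than the slide's exact values:

      def adt_score(x, y):
          # root prediction plus a contribution from every decision node on a
          # path consistent with the input; the label is the sign of the total
          score = 0.5                              # root prediction (placeholder value)
          score += 0.2 if x > 3 else -0.1          # decision node: X > 3
          if x > 3:                                # the Y < 1 node lives under the X > 3 branch
              score += 0.7 if y < 1 else 0.0
          score += -0.3 if y > 5 else 0.1          # decision node: Y > 5
          return score                             # classify as +1 if score > 0, else -1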

  12. Example: Medical Diagnostics • Cleve dataset from the UC Irvine database • Heart disease diagnostics (+1 = healthy, -1 = sick) • 13 features from tests (real-valued and discrete) • 303 instances

  13. Ad-Tree Example

  14. Cross-validated accuracy

  15. AdaBoost Convergence • [Figure: loss as a function of the margin, showing the 0-1 loss together with the exponential, LogitBoost and BrownBoost losses; negative margins are mistakes, positive margins are correct] • Rationale? Consider a bound on the training error: the 0-1 error is upper-bounded by the average exp-margin, which telescopes into the product of the normalizers Z_t (using the exponential bound on the step function, the definition of f(x), and the recursive use of Z); see the derivation below • AdaBoost is essentially doing gradient descent on this bound • Convergence?
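
  Written out as a sketch, using the definitions from slides 7-8, the chain of steps the slide annotates is:

      \[
      \underbrace{\frac{1}{N}\sum_{i=1}^{N}\mathbf{1}\!\left[y_i \neq \operatorname{sign} f(x_i)\right]}_{\text{training error}}
      \;\le\;
      \underbrace{\frac{1}{N}\sum_{i=1}^{N} e^{-y_i f(x_i)}}_{\text{exp bound on step}}
      \;=\;
      \underbrace{\frac{1}{N}\sum_{i=1}^{N}\prod_{t=1}^{T} e^{-\alpha_t y_i h_t(x_i)}}_{\text{definition of } f(x)}
      \;=\;
      \underbrace{\prod_{t=1}^{T} Z_t}_{\text{recursive use of } Z}
      \]

  The last equality follows by unrolling the data-weight update D_{t+1}(i) = D_t(i) exp(-α_t y_i h_t(x_i)) / Z_t from the uniform start D_1(i) = 1/N and using that each D_{t+1} sums to 1.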

  16. AdaBoost Convergence • Convergence? Consider the binary h_t case, where Z_t = 2 sqrt(ε_t (1 - ε_t)) • If every weak learner beats random guessing by γ, i.e. ε_t ≤ 1/2 - γ, then Z_t ≤ sqrt(1 - 4γ²) ≤ exp(-2γ²), so the training error is at most Π_t Z_t ≤ exp(-2γ²T). So the final learner converges exponentially fast in T if each weak learner is at least better than gamma!
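
  A quick numerical illustration of that rate, with an arbitrary example edge γ = 0.1:

      import math

      gamma = 0.1                                   # each weak learner has error <= 0.5 - gamma
      for T in (50, 100, 200, 400):
          print(T, math.exp(-2 * gamma**2 * T))     # bound on training error after T rounds

  The bound falls from about 0.37 at T = 50 to below 0.001 at T = 400.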

  17. Curious phenomenon • Boosting decision trees: using <10,000 training examples we fit >2,000,000 parameters

  18. Explanation using margins • [Figure: the 0-1 loss plotted as a function of the margin]

  19. Explanation using margins • [Figure: margin distribution after boosting; no examples with small margins!!]

  20. Experimental Evidence

  21. AdaBoost Generalization • Also, a VC analysis gives a generalization bound: test error ≤ training error + Õ(sqrt(T d / N)), where d is the VC dimension of the base classifier and N the number of training examples • But by this bound, more iterations → overfitting! • A margin analysis is possible; redefine the margin of point i as y_i Σ_t α_t h_t(x_i) / Σ_t |α_t| • Then have, for any θ > 0, test error ≤ (fraction of training points with margin ≤ θ) + Õ(sqrt(d / (N θ²))), a bound that does not grow with T
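
  A small NumPy sketch (hypothetical names) of the redefined margin and the empirical margin distribution that appears in the bound:

      import numpy as np

      def normalized_margins(alphas, H, y):
          # H: (T, N) weak-learner outputs in {-1, +1}; y: (N,) labels in {-1, +1}
          alphas = np.asarray(alphas, dtype=float)
          f = alphas @ H                             # unnormalized ensemble scores
          return y * f / np.sum(np.abs(alphas))      # margins now lie in [-1, +1]

      def margin_distribution(margins, theta):
          # fraction of training points with margin <= theta (the term in the bound)
          return np.mean(margins <= theta)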

  22. AdaBoost Generalization • Suggests this optimization problem: directly maximize the (normalized) margin [Figure: margin illustration; equation not transcribed]

  23. AdaBoost Generalization • Proof Sketch

  24. UCI Results • [Figure: % test error rates on UCI benchmark datasets]
