
Decision Trees



Presentation Transcript


  1. Decision Trees

  2. Ensemble methods • Use multiple models to obtain better predictive performance • Ensembles combine multiple hypotheses to form a (hopefully) better hypothesis • Combine multiple weak learners to produce a strong learner • Typically requires much more computation, since multiple learners must be trained
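
A minimal sketch of the idea, assuming scikit-learn and a synthetic dataset (both are my choices; the slides name no library or data): three different base models are combined by an equal-weight majority vote.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Three different base models supply the diversity the ensemble relies on
ensemble = VotingClassifier(
    estimators=[
        ("tree", DecisionTreeClassifier(max_depth=3, random_state=0)),
        ("nb", GaussianNB()),
        ("lr", LogisticRegression(max_iter=1000)),
    ],
    voting="hard",  # each model casts one equally weighted vote
)

print(cross_val_score(ensemble, X, y, cv=5).mean())
```

Each base model on its own may be mediocre; the combined vote tends to do better because the models make different mistakes.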

  3. Ensemble learners • Typically combine multiple fast learners (like decision trees) • The individual fast learners tend to overfit • The ensemble tends to get better results because significant diversity is deliberately introduced among the models • Diversity does not mean reduced performance • Empirical studies have shown that random forests do better than a plain ensemble of entropy-minimizing decision trees • A random forest is an ensemble of decision trees in which each split is chosen from a random subset of the features rather than by minimizing entropy over all of them
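
As a hedged illustration of the last two bullets (scikit-learn is my assumed stand-in; the slides do not name a library), the comparison below pits bagged entropy-minimizing trees against a random forest, whose splits only consider a random subset of features via max_features.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagged entropy-minimizing trees: every split considers all features
bagged = BaggingClassifier(DecisionTreeClassifier(criterion="entropy"),
                           n_estimators=100, random_state=0)

# Random forest: each split considers only a random subset of the features
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt",
                                random_state=0)

for name, model in [("bagged trees", bagged), ("random forest", forest)]:
    print(name, cross_val_score(model, X, y, cv=5).mean())
```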

  4. Bayes optimal classifier is an ensemble learner
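
Only the slide title survives in the transcript; the standard statement behind it is that the Bayes optimal classifier is a posterior-weighted vote over every hypothesis in the space H, given training data D:

$$
h_{BO}(x) \;=\; \arg\max_{c \in C} \; \sum_{h_i \in H} P(c \mid x, h_i)\, P(h_i \mid D)
$$

No single hypothesis does better on average, which is why this (usually intractable) vote is the ideal that practical ensembles approximate.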

  5. Bagging: Bootstrap aggregating • Each model in the ensemble votes with equal weight • Train each model on a random (bootstrap) sample of the training set • Random forests do better than bagged entropy-reducing decision trees

  6. Bootstrap estimation • Repeatedly draw n samples from D (with replacement) • For each set of samples, estimate a statistic • The bootstrap estimate is the mean of the individual estimates • Used to estimate a statistic (parameter) and its variance
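
A minimal numpy sketch of the procedure (the dataset and the choice of statistic here are illustrative): repeatedly resample with replacement, compute the statistic on each resample, and report the mean and variance of the resulting estimates.

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.normal(loc=5.0, scale=2.0, size=200)   # observed sample (illustrative)

def bootstrap(data, statistic, n_boot=1000):
    n = len(data)
    estimates = np.array([
        statistic(rng.choice(data, size=n, replace=True))  # n samples, with replacement
        for _ in range(n_boot)
    ])
    # Bootstrap estimate of the statistic and of its variance
    return estimates.mean(), estimates.var(ddof=1)

est, var = bootstrap(D, np.median)
print(f"bootstrap median estimate: {est:.3f}, variance: {var:.4f}")
```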

  7. Bagging • For i = 1 .. M • Draw n* < n samples from D with replacement • Learn classifier Ci • Final classifier is a vote of C1 .. CM • Increases classifier stability / reduces variance
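
A from-scratch sketch of this loop, assuming scikit-learn decision trees as the base learners and non-negative integer class labels; helper names like bagging_fit are mine, not from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_fit(X, y, M=25, n_star=None, seed=0):
    """Train M classifiers C_1..C_M, each on n* < n samples drawn with replacement."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_star = n_star or int(0.8 * n)            # n* < n (illustrative default)
    models = []
    for _ in range(M):
        idx = rng.choice(n, size=n_star, replace=True)
        models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X):
    """Final classifier: equal-weight (majority) vote of C_1..C_M."""
    votes = np.stack([m.predict(X) for m in models])   # shape (M, n_samples)
    # Majority vote per column; assumes non-negative integer class labels
    return np.apply_along_axis(lambda col: np.bincount(col).argmax(), 0, votes)
```

Averaging many high-variance trees trained on different resamples is what buys the stability / variance reduction the slide mentions.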

  8. Boosting • Incremental • Build new models that try to do better on the previous models' misclassifications • Can get better accuracy • Tends to overfit • AdaBoost is the canonical boosting algorithm

  9. Boosting (Schapire 1989) • Randomly select n1 < n samples from D without replacement to obtain D1 • Train weak learner C1 • Select n2 < n samples from D, with half of the samples misclassified by C1, to obtain D2 • Train weak learner C2 • Select all samples from D that C1 and C2 disagree on • Train weak learner C3 • Final classifier is a vote of the weak learners
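
A rough translation of this construction into code (my own sketch, assuming binary labels in {-1, +1}, scikit-learn stumps as the weak learners, and glossing over edge cases such as a learner that makes no mistakes).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def schapire_boost(X, y, n1, seed=0):
    rng = np.random.default_rng(seed)
    weak = lambda: DecisionTreeClassifier(max_depth=1)
    n = len(y)

    # D1: n1 < n samples drawn without replacement; train C1
    i1 = rng.choice(n, size=n1, replace=False)
    c1 = weak().fit(X[i1], y[i1])

    # D2: half misclassified by C1, half classified correctly; train C2
    p1 = c1.predict(X)
    wrong, right = np.where(p1 != y)[0], np.where(p1 == y)[0]
    k = min(len(wrong), len(right))
    i2 = np.concatenate([rng.choice(wrong, k, replace=False),
                         rng.choice(right, k, replace=False)])
    c2 = weak().fit(X[i2], y[i2])

    # D3: all samples on which C1 and C2 disagree; train C3
    i3 = np.where(c1.predict(X) != c2.predict(X))[0]
    c3 = weak().fit(X[i3], y[i3]) if len(i3) > 1 else c1

    # Final classifier: vote of the three weak learners
    def predict(Xq):
        return np.sign(c1.predict(Xq) + c2.predict(Xq) + c3.predict(Xq))
    return predict
```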

  10. AdaBoost • Learner = Hypothesis = Classifier • Weak Learner: < 50% error over any distribution • Strong Classifier: thresholded linear combination of weak learner outputs

  11. Discrete AdaBoost
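
The slide itself carries only the title; below is a compact sketch of the usual discrete AdaBoost loop (weighted weak learners, exponential weight updates, sign of the weighted sum as the strong classifier), assuming labels in {-1, +1} and scikit-learn stumps.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def discrete_adaboost(X, y, n_rounds=50):
    """Discrete AdaBoost sketch; assumes y takes values in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w[pred != y])             # weighted error (w sums to 1)
        if err >= 0.5:                         # weak-learner assumption violated; stop
            break
        alpha = 0.5 * np.log((1.0 - err) / max(err, 1e-12))
        w *= np.exp(-alpha * y * pred)         # up-weight misclassified samples
        w /= w.sum()
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas

def adaboost_predict(learners, alphas, X):
    # Strong classifier: thresholded (sign of) linear combination of weak outputs
    agg = sum(a * h.predict(X) for h, a in zip(learners, alphas))
    return np.sign(agg)
```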

  12. Real AdaBoost
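
Again only the title survives in the transcript; for context, the Real AdaBoost variant (Friedman, Hastie & Tibshirani, 2000) lets each weak learner output a real-valued confidence rather than a hard ±1 label, roughly:

$$
f_m(x) = \tfrac{1}{2}\log\frac{p_m(x)}{1 - p_m(x)}, \qquad
w_i \leftarrow w_i\, e^{-y_i f_m(x_i)}, \qquad
F(x) = \operatorname{sign}\!\Big(\sum_m f_m(x)\Big),
$$

where $p_m(x)$ is the weak learner's weighted estimate of $P(y = 1 \mid x)$.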

  13. Comparison
