
Boosting to Correct Inductive Bias in Text Classification


Presentation Transcript


  1. CIKM’02
Boosting to Correct Inductive Bias in Text Classification
Yan Liu, Yiming Yang and Jaime Carbonell
School of Computer Science, Carnegie Mellon University
Nov 1, 2002

  2. Introduction to Boosting
• Boosting
  • Runs a weak learning algorithm on sampled examples
  • Combines the classifiers produced by the weak learners into a single composite classifier
• Characteristics
  • Error-driven sampling
  • Combination strategies
• Variants
  • AdaBoost vs. adaptive resampling

  3. Boosting Algorithm: AdaBoost
• AdaBoost algorithm (by Freund and Schapire)
• Sampling strategy: reweight the training examples each round so that misclassified examples count more
• Combination strategy: a weighted vote over the weak classifiers (see the sketch below)
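The slide's formulas are not preserved in this transcript. As a reference point, here is a minimal sketch of binary AdaBoost's error-driven reweighting and weighted-vote combination; the `weak_learner_factory` interface and constants are illustrative, not from the paper.

```python
import numpy as np

def adaboost(X, y, weak_learner_factory, rounds=50):
    """Minimal binary AdaBoost sketch (labels in {-1, +1}).

    weak_learner_factory() must return an object with
    fit(X, y, sample_weight) and predict(X) methods.
    """
    n = len(y)
    w = np.full(n, 1.0 / n)            # start with uniform example weights
    learners, alphas = [], []
    for _ in range(rounds):
        h = weak_learner_factory()
        h.fit(X, y, sample_weight=w)
        pred = h.predict(X)
        eps = np.sum(w[pred != y])     # weighted training error
        if eps >= 0.5:                 # no better than chance: stop early
            break
        alpha = 0.5 * np.log((1 - eps) / max(eps, 1e-10))
        # Error-driven update: up-weight misclassified examples
        w *= np.exp(-alpha * y * pred)
        w /= w.sum()
        learners.append(h)
        alphas.append(alpha)

    def classify(X_new):
        # Weighted-vote combination of the weak classifiers
        votes = sum(a * h.predict(X_new) for a, h in zip(alphas, learners))
        return np.sign(votes)
    return classify
```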

  4. Boosting Algorithm: AdaBoost
• Theoretical analysis
  • Bound on error
    • Training error drops exponentially fast [Schapire]
    • A qualitative bound on the generalization error
  • Connections
    • Logistic regression [Friedman]
    • Game theory and linear programming [Schapire]
    • Exponential models [Lebanon & Lafferty]
• Applications
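The transcript drops the formula behind the first bullet. For reference, Schapire's bound states that if the weak learner in round t has weighted training error \(\epsilon_t = 1/2 - \gamma_t\), then the training error of the combined classifier H after T rounds satisfies

\[
\frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\bigl[H(x_i)\neq y_i\bigr]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T}\sqrt{1-4\gamma_t^{2}}
\;\le\; \exp\Bigl(-2\sum_{t=1}^{T}\gamma_t^{2}\Bigr),
\]

so any consistent edge \(\gamma_t \ge \gamma > 0\) over random guessing drives the training error down exponentially in T.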

  5. Boosting Algorithm: Adaptive Resampling
• Adaptive resampling (by Weiss et al.)
• Sampling strategy: resample the training data so that misclassified examples are more likely to be drawn (see the sketch below)
• Combination strategy
  • Linear combination: unweighted voting
• Theoretical basis
  • Resampling with any technique that increases the selection likelihood of the misclassified examples will achieve an improvement
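A minimal sketch of this scheme, assuming {-1, +1} labels and numpy arrays; the doubling factor for misclassified examples is an illustrative choice, not the specific update used by Weiss et al.

```python
import numpy as np

def adaptive_resampling(X, y, learner_factory, rounds=10, rng=None):
    """Sketch of adaptive resampling with unweighted voting.

    Each round draws a bootstrap sample in which previously
    misclassified examples are more likely to be selected.
    """
    rng = rng or np.random.default_rng(0)
    n = len(y)
    p = np.full(n, 1.0 / n)               # per-example sampling probabilities
    learners = []
    for _ in range(rounds):
        idx = rng.choice(n, size=n, p=p)  # error-driven bootstrap sample
        h = learner_factory()
        h.fit(X[idx], y[idx])
        learners.append(h)
        miss = (h.predict(X) != y)
        p[miss] *= 2.0                    # illustrative up-weighting factor
        p /= p.sum()

    def classify(X_new):
        # Unweighted (majority) vote over all sampled-round classifiers
        votes = sum(h.predict(X_new) for h in learners)
        return np.sign(votes)
    return classify
```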

  6. Task Identification
• Perspective
  • How does boosting react to the inductive bias of different classifiers?
• Main focus
  • How well does boosting work for "non-weak" learning algorithms?
  • Decision trees, naïve Bayes, support vector machines and the Rocchio-based classifier

  7. Inductive Bias
• Inductive learning
  • Inducing classification functions from a set of training examples
• Inductive bias
  • The underlying assumptions made in the inductive inference
  • Restriction bias vs. preference bias
    • Restricting the search space (e.g., naïve Bayes assumes conditionally independent features) vs. preferring some hypotheses in the search strategy (e.g., decision trees favor shorter trees)

  8. Boosting Decision Tree

  9. Boosting Naïve Bayes

  10. Boosting SVMs
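The figures on this slide are not preserved in the transcript. Purely as an illustration of the setup, a linear SVM can serve as the base learner in the adaboost() sketch from slide 3, since scikit-learn's LinearSVC accepts a sample_weight argument in fit; X_train, y_train and X_test are placeholders, and labels are assumed to be in {-1, +1}.

```python
from sklearn.svm import LinearSVC

# LinearSVC.fit accepts sample_weight, so it matches the
# weak_learner_factory interface of the adaboost() sketch above.
# X_train, y_train, X_test are placeholder document vectors and
# labels, not data from the paper.
classify = adaboost(X_train, y_train,
                    weak_learner_factory=lambda: LinearSVC(C=1.0),
                    rounds=20)
y_pred = classify(X_test)
```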

  11. Boosting Rocchio

  12. Experiments
• Data collection
  • Reuters-21578 corpus
  • 90 categories, 7,769 training examples and 3,019 test examples
• Pre-processing
  • Stopword removal and stemming
• Measurement
  • Micro-averaged F1 vs. macro-averaged F1
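On the measurement bullet: micro-averaged F1 pools true/false positives and negatives across all 90 categories, so common categories dominate the score, while macro-averaged F1 averages the per-category F1 values, giving rare categories equal weight. A minimal sketch with scikit-learn, assuming y_true and y_pred are multilabel indicator arrays (documents × categories):

```python
from sklearn.metrics import f1_score

# y_true, y_pred: binary indicator arrays of shape (n_docs, n_categories).
# Micro-averaging pools decisions over all categories (common classes
# dominate); macro-averaging weights each category equally (sensitive
# to rare classes).
micro_f1 = f1_score(y_true, y_pred, average="micro")
macro_f1 = f1_score(y_true, y_pred, average="macro")
```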

  13. Experiment Results: Boosting SVMs
• Micro-averaged F1
  • Highest score: 0.875 (with adaptive resampling)
  • Overfitting problems
• Macro-averaged F1
  • A 17% improvement over the results of Yang & Liu
  • More effective for rare classes than for common classes

  14. Experiment Results: Boosting Decision Trees
• Micro-averaged F1
  • Only a slight improvement
• Macro-averaged F1
  • A 13% improvement over the baseline
  • More effective for rare classes than for common classes

  15. Experiment Results: Boosting Naïve Bayes
• Micro-averaged F1
  • Only a marginal improvement
• Macro-averaged F1
  • Performance decreases
  • Not effective for the rare-class problems of this dataset
  • An open question

  16. Discussion
• Boosting is effective at correcting the inductive bias of SVMs and decision trees on rare categories

  17. Conclusion
• Rare categories: the effectiveness of correcting the inductive bias varies across classifiers
  • Good for SVMs and decision trees: boosting them yields a 13-17% improvement in macro-averaged F1
• Common categories: boosting is not significantly effective at correcting the inductive bias
  • However, the best micro-averaged F1 is achieved by boosting SVMs
