CIKM’02
Boosting to Correct Inductive Bias in Text Classification
Yan Liu, Yiming Yang and Jaime Carbonell
School of Computer Science, Carnegie Mellon University
Nov 1, 2002
Introduction to Boosting
• Boosting
  • Running a weak learning algorithm on sampled examples
  • Combining the classifiers produced by the weak learner into a single composite classifier
• Characteristics
  • Error-driven sampling
  • Combination strategies
• Variants
  • AdaBoost vs. Adaptive Resampling
Boosting Algorithm – AdaBoost
• AdaBoost algorithm (Freund and Schapire)
  • Sampling strategies
  • Combination strategies
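The slide names AdaBoost's sampling and combination strategies without spelling them out. The following is a minimal sketch of the standard algorithm, not the setup used in the talk. It assumes binary labels in {-1, +1}, a single numeric feature, and threshold stumps as the weak learner; all function names are hypothetical. Instead of drawing a sample, it passes the example weights directly to the weak learner. Each round increases the weight of misclassified examples, and the final classifier is a vote weighted by each round's alpha.

```python
import math

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best threshold stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            preds = [polarity if x >= threshold else -polarity for x in xs]
            err = sum(w for w, p, y in zip(weights, preds, ys) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best

def stump_predict(x, threshold, polarity):
    return polarity if x >= threshold else -polarity

def adaboost(xs, ys, rounds=10):
    n = len(xs)
    weights = [1.0 / n] * n          # start from a uniform distribution
    ensemble = []                    # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)   # avoid log(0) and division by zero
        if err >= 0.5:               # weak learner is no better than chance
            break
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        # Error-driven reweighting: misclassified examples gain weight.
        weights = [
            w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
            for w, x, y in zip(weights, xs, ys)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Combination strategy: sign of the alpha-weighted vote."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data that no single stump can separate (negatives sit in the middle).
xs = [0, 1, 2, 3, 4, 5]
ys = [1, 1, -1, -1, 1, 1]
model = adaboost(xs, ys, rounds=3)
print([predict(model, x) for x in xs])
```

On this toy data the best single stump misclassifies two of the six points, while the three-round ensemble fits all of them.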
Boosting Algorithm – AdaBoost
• Theoretical analysis
  • Bound on error
    • Training error drops exponentially fast [Schapire]
    • A qualitative bound on the generalization error
  • Connections
    • Logistic regression [Friedman]
    • Game theory and linear programming [Schapire]
    • Exponential model [Lebanon & Lafferty]
• Applications
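The statement that training error drops exponentially fast can be made concrete with the standard bound for binary AdaBoost. The notation is assumed here and does not come from the slide: m training examples, T rounds, final classifier H, weighted error \epsilon_t of the weak classifier in round t, and edge \gamma_t = 1/2 - \epsilon_t.

```latex
\[
\frac{1}{m}\sum_{i=1}^{m} \mathbf{1}\left[H(x_i) \neq y_i\right]
\;\le\; \prod_{t=1}^{T} 2\sqrt{\epsilon_t\,(1-\epsilon_t)}
\;=\; \prod_{t=1}^{T} \sqrt{1 - 4\gamma_t^{2}}
\;\le\; \exp\!\Big(-2\sum_{t=1}^{T}\gamma_t^{2}\Big)
\]
```

If every weak classifier beats chance by at least some fixed \gamma > 0, the right-hand side is at most \exp(-2\gamma^2 T), which decays exponentially in the number of rounds.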
Boosting Algorithm – Adaptive Resampling
• Adaptive Resampling (Weiss et al.)
  • Sampling strategies
  • Combination strategies
    • Linear combination: unweighted voting
• Theoretical basis
  • Resampling with any technique that increases the likelihood of selecting misclassified examples will yield an improvement
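The two ingredients named on this slide can be sketched as follows: a sampling distribution that favours examples misclassified so far, and an unweighted vote over the resulting classifiers. The slide does not preserve the exact weighting, so the cubic form `1 + e**3` over cumulative error counts is an assumption chosen for illustration, and all function names are hypothetical.

```python
import random
from collections import Counter

def sampling_probabilities(error_counts):
    """Map cumulative error counts to sampling probabilities.

    Assumed weighting: each example gets raw weight 1 + e**3, where e is
    the number of times it has been misclassified so far. Any weighting
    that grows with e would serve the same purpose.
    """
    raw = [1 + e ** 3 for e in error_counts]
    total = sum(raw)
    return [r / total for r in raw]

def resample(examples, error_counts, rng):
    """Draw a same-sized training sample, with replacement, from the skewed distribution."""
    probs = sampling_probabilities(error_counts)
    return rng.choices(examples, weights=probs, k=len(examples))

def unweighted_vote(predictions):
    """Combine classifier outputs by simple majority (ties go to the label seen first)."""
    return Counter(predictions).most_common(1)[0][0]

# An example misclassified twice is drawn far more often than the others.
print(sampling_probabilities([0, 0, 2]))
print(resample(["doc_a", "doc_b", "doc_c"], [0, 0, 2], random.Random(0)))
print(unweighted_vote([1, -1, 1]))
```

Unlike AdaBoost, every classifier gets the same say in the final vote, so the only error-driven component is the sampling step.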
Task Identification
• Perspective
  • How does boosting react to the inductive bias of different classifiers?
• Main focus
  • How well does boosting work for “non-weak” learning algorithms?
  • Decision Tree, Naïve Bayes, Support Vector Machines and a Rocchio-based classifier
Inductive Bias
• Inductive learning
  • Inducing classification functions from a set of training examples
• Inductive bias
  • The underlying assumptions in the inductive inferences
  • Restriction bias vs. preference bias
  • Search space vs. search strategy
Boosting Decision Tree
Boosting Naïve Bayes
Boosting SVMs
Boosting Rocchio
Experiments
• Data collection
  • Reuters-21578 corpus
  • 90 categories, 7769 training examples and 3019 test examples
• Pre-processing
  • Stopword removal and stemming
• Measurement
  • Micro-averaged F1 vs. macro-averaged F1
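The difference between the two measures matters for the results that follow, so here is a small sketch of how each is computed from per-category counts of true positives, false positives and false negatives. The category names and counts are invented toy numbers, not figures from the experiments, and the function names are hypothetical.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(counts):
    """counts maps category -> (tp, fp, fn). Returns (micro_f1, macro_f1)."""
    # Macro: average the per-category F1 scores, so every category counts equally.
    macro = sum(f1(*c) for c in counts.values()) / len(counts)
    # Micro: pool the counts first, so frequent categories dominate.
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1(tp, fp, fn), macro

# Toy counts: one common category handled well, one rare category handled poorly.
toy_counts = {"common": (90, 10, 10), "rare": (1, 1, 3)}
micro, macro = micro_macro_f1(toy_counts)
print(f"micro-F1 = {micro:.3f}, macro-F1 = {macro:.3f}")
# prints "micro-F1 = 0.883, macro-F1 = 0.617"
```

Because the rare category contributes few decisions, it barely moves the micro average but pulls the macro average down sharply. This is why macro-averaged F1 is the measure that reflects performance on rare categories.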
Experiment Results: Boosting SVM
• Micro-averaged F1
  • Highest score: 0.875 (with Adaptive Resampling)
  • Overfitting problems
• Macro-averaged F1
  • Achieves a 17% improvement over the results of Yang & Liu
  • More effective for rare classes than for common classes
Experiment Results: Boosting Decision Tree
• Micro-averaged F1
  • Only slight improvement
• Macro-averaged F1
  • Achieves a 13% improvement over the baseline
  • More effective for rare classes than for common classes
Experiment Results: Boosting Naïve Bayes
• Micro-averaged F1
  • Only marginal improvement
• Macro-averaged F1
  • Decreases performance
  • Not effective for the rare-class problems of this dataset
• Open question
Discussion
• Boosting is effective at correcting the inductive bias of SVMs and decision trees on rare categories
Conclusion
• Rare categories: the effectiveness of boosting at correcting the inductive bias varies across classifiers
  • Good for SVMs and decision trees
  • We achieve a 13-17% improvement in macro-averaged F1 by boosting SVMs and decision trees
• Common categories: boosting is not significantly effective at correcting the inductive bias
  • However, we achieve the best micro-averaged F1 by boosting SVMs