An Empirical Study of Learning from Imbalanced Data Using Random Forest

An Empirical Study of Learning from Imbalanced Data Using Random Forest • Presenter: Ai-Chen Liao • Authors: Taghi M. Khoshgoftaar, Moiz Golawala, and Jason Van Hulse • ICTAI 2007, pp. 310-317

Outline • Motivation • Objective • Method • Experiment • Experimental Result • Conclusion • Comments

Motivation (figure: a tree and a forest)

Motivation • RF is a relatively new learner, and previous work reports only preliminary experimentation on constructing random forest classifiers in the context of imbalanced data. • What should be the recommended default number of trees in the ensemble? • What should the recommended value be for the number of attributes? • How does the RF learner perform on imbalanced data when compared with other commonly used learners (NB, SVM, KNN, C4.5, etc.)?

Objective • This work is the first to conduct comprehensive experimentation with the RF learner in Weka and to recommend empirically supported default values for the numTrees and numFeatures parameters.

Method ─ RF Dataset : … sampling with replacement (figure: example draws 1 2 1 4 2 5 3 6)
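The sampling-with-replacement step on this slide can be sketched in a few lines. This is a minimal illustration rather than the authors' or Weka's code; the function name `bootstrap_sample`, the six-item toy dataset, and the fixed seed are assumptions made for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) items with replacement; also return the rows never drawn."""
    n = len(rows)
    picked = [rng.randrange(n) for _ in range(n)]  # indices may repeat
    in_bag = [rows[i] for i in picked]
    chosen = set(picked)
    out_of_bag = [rows[i] for i in range(n) if i not in chosen]
    return in_bag, out_of_bag


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
dataset = [1, 2, 3, 4, 5, 6]  # hypothetical toy dataset
in_bag, out_of_bag = bootstrap_sample(dataset, rng)
print("in-bag:", in_bag)
print("out-of-bag:", out_of_bag)
```

Each tree in the forest would be trained on its own in-bag sample, so some examples appear several times and others not at all for a given tree.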

Metrics： The area under the ROC curve (AUC) The Kolmogorov-Smirnov (KS) Method ─ Experimental Datasets

numTrees numFeatures Experimental ResultsPhase 1: Selecting an Appropriate RF Learner

Good ! Good ! Experimental ResultsPhase 2: Comparison of RF-100 to Other Learners

Conclusion • The contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. • The parameters for the RF learners were chosen to ensure good performance in many different circumstances and to be reasonable for the imbalanced datasets.

Comments • Advantage • Building many learners in these experiments leads me to believe in the reliability of the experimental results. • Drawback • Due to space restrictions, many experimental results are not included here. • Application • Handling imbalanced data
