
An Empirical Study of Learning from Imbalanced Data Using Random Forest

Presenter: Ai-Chen Liao. Authors: Taghi M. Khoshgoftaar, Moiz Golawala, and Jason Van Hulse. ICTAI 2007, pages 310-317.


Presentation Transcript


  1. An Empirical Study of Learning from Imbalanced Data Using Random Forest • Presenter: Ai-Chen Liao • Authors: Taghi M. Khoshgoftaar, Moiz Golawala, and Jason Van Hulse • ICTAI 2007, pages 310-317

  2. Outline • Motivation • Objective • Method • Experiment • Experimental Result • Conclusion • Comments

  3. Motivation [Figure: a tree and a forest]

  4. Motivation • RF is a relatively new learner, and previous work has reported only preliminary experiments on constructing random forest classifiers from imbalanced data. • What should be the recommended default number of trees in the ensemble? • What should the recommended value be for the number of attributes? • How does the RF learner perform on imbalanced data when compared with other commonly used learners (NB, SVM, KNN, C4.5, etc.)?

  5. Objective • This work is the first to conduct comprehensive experimentation with the RF learner in Weka and to recommend empirically proven default values for the numTrees and numFeatures parameters.

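  6. Method ─ RF [Figure: samples are drawn from the dataset with replacement (each instance is put back after it is drawn); instance numbers shown: 1 2 1 4 2 5 3 6]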
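The sampling step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' Weka code: the function names, the toy six-instance dataset, the fixed seed, and the attribute counts are all assumptions made for the example.

```python
import random


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (each row is put back after it is drawn)."""
    return rng.choices(rows, k=len(rows))


def random_feature_subset(num_attributes, num_features, rng):
    """Pick num_features distinct attribute indices to consider at one split."""
    return sorted(rng.sample(range(num_attributes), num_features))


rng = random.Random(0)       # fixed seed (assumption) so the run is repeatable
rows = [1, 2, 3, 4, 5, 6]    # toy dataset of six instance numbers (assumption)

# One bootstrap sample per tree; duplicates are expected, and some rows are left out.
samples = [bootstrap_sample(rows, rng) for _ in range(2)]

# Hypothetical setting: 10 attributes in total, 4 considered at a split.
subset = random_feature_subset(num_attributes=10, num_features=4, rng=rng)

print(samples)
print(subset)
```

Each tree in the forest would be trained on its own bootstrap sample, with a fresh attribute subset drawn at every split; numTrees controls how many samples are drawn and numFeatures the size of each subset.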

  7. Method ─ Experimental Datasets • Metrics: the area under the ROC curve (AUC) and the Kolmogorov-Smirnov (KS) statistic
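Both metrics can be computed directly from a classifier's scores. The sketch below is an illustration under stated assumptions, not the paper's evaluation code: AUC is computed as the fraction of positive/negative pairs ranked correctly (ties count half), and KS as the largest gap between the cumulative score distributions of the two classes. The labels and scores are made-up toy values with the minority class labelled 1.

```python
def split_scores(labels, scores):
    """Separate scores into positive-class (label 1) and negative-class (label 0) lists."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    return pos, neg


def auc(labels, scores):
    """Probability that a random positive scores higher than a random negative."""
    pos, neg = split_scores(labels, scores)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def ks_statistic(labels, scores):
    """Maximum distance between the empirical score CDFs of the two classes."""
    pos, neg = split_scores(labels, scores)
    best = 0.0
    for t in sorted(set(scores)):
        cdf_pos = sum(s <= t for s in pos) / len(pos)
        cdf_neg = sum(s <= t for s in neg) / len(neg)
        best = max(best, abs(cdf_pos - cdf_neg))
    return best


# Toy imbalanced example: six negatives, two positives (values are assumptions).
labels = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4, 0.35, 0.8, 0.7, 0.9]

auc_value = auc(labels, scores)
ks_value = ks_statistic(labels, scores)
print(auc_value, ks_value)
```

Neither metric depends on a fixed decision threshold, which is why they suit imbalanced data better than plain accuracy.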

  8. Experimental Results, Phase 1: Selecting an Appropriate RF Learner [Figure: results by numTrees and numFeatures]
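Selecting the two parameters amounts to a search over a grid of candidate values. The following is a hedged sketch of such a loop; `select_rf_parameters`, the candidate grids, and `toy_evaluate` are hypothetical names and values for illustration. In a real study the evaluation function would train an RF with the given settings and return a score such as mean AUC; the toy scorer here is only a stand-in and does not reflect the paper's results.

```python
from itertools import product


def select_rf_parameters(evaluate, num_trees_grid, num_features_grid):
    """Return (best_score, numTrees, numFeatures) over all grid combinations.

    `evaluate` is a caller-supplied function (hypothetical here) that maps a
    (numTrees, numFeatures) pair to a score where higher is better.
    """
    best = None
    for num_trees, num_features in product(num_trees_grid, num_features_grid):
        score = evaluate(num_trees, num_features)
        if best is None or score > best[0]:
            best = (score, num_trees, num_features)
    return best


def toy_evaluate(num_trees, num_features):
    # Stand-in scorer with an arbitrary peak; NOT real experimental results.
    return -abs(num_trees - 50) - abs(num_features - 3)


best = select_rf_parameters(toy_evaluate, [10, 50, 100], [1, 3, 5])
print(best)
```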

  9. Experimental Results, Phase 2: Comparison of RF-100 to Other Learners [Figure: comparison results, annotated "Good!" twice]

  10. Conclusion • The contribution of this study is to provide an extensive empirical evaluation of RF learners built from imbalanced data. • The parameters for the RF learners were chosen to ensure good performance in many different circumstances and to be reasonable for the imbalanced datasets.

  11. Comments • Advantage: the large number of learners built in these experiments gives me confidence in the reliability of the results. • Drawback: due to space restrictions, many experimental results are not included. • Application: handling imbalanced data.
