
Extreme Re-balancing for SVMs: a case study



Presentation Transcript


  1. Extreme Re-balancing for SVMs: a case study Advisor: Dr. Hsu Reporter: Wen-Hsiang Hu Author: Bhavani Raskutti and Adam Kowalczyk SIGKDD Explorations

  2. Outline • Motivation • Objective • Related Research • Support Vector Machines • Re-balancing of the Data • Sample Balancing • Weight Balancing • Experimental • Discussion • Conclusion • Personal Opinion

  3. Motivation • A standard recipe for two-class discrimination is to take examples from both classes, then generate a model for discriminating them. However, there are many applications where obtaining examples of a second class is difficult • e.g. classifying sites of “interest” to a web surfer • There are also situations where the data has a heavily unbalanced representation of the two classes of interest • e.g. fraud detection and information filtering

  4. Objective • Achieve better performance with one-class learners on extremely unbalanced data

  5. Related Research (1/2) • Many solutions have been proposed to address the imbalance problem, including sampling and weighting examples • Typically, these methods focus on cases where the imbalance ratio of minority to majority class is around 10:90 • In this paper, we focus on extreme imbalance in very high dimensional input spaces, where at the learning stage the minority class constitutes around 1–3% of the data

  6. Related Research (2/2) • In both cases (image retrieval and document classification), one-class models are much worse than the two-class models • In this paper, we show that for certain problems, such as the gene knock-out experiments for understanding the AHR (aryl hydrocarbon receptor) signalling pathway, minority one-class SVMs significantly outperform models learnt using examples from both classes

  7. Support Vector Machines (1/4) • Given a training sequence (x_i, y_i), i = 1, …, m, of binary n-vectors x_i and bipolar labels y_i ∈ {−1, +1} • Our aim is to find a “good” discriminating function f • kernel machine: f(x) = Σ_i α_i y_i k(x_i, x) + b, with coefficients α_i determined from the training data

  8. Support Vector Machines (2/4)

  9. Support Vector Machines (3/4)

  10. Support Vector Machines (4/4) • If the kernel k satisfies the Mercer theorem assumptions [7;24;25], then the minimiser of (2) has the expansion f(x) = Σ_i α_i y_i k(x_i, x), where the coefficients α_i ≥ 0 are determined by the optimisation • We shall be using the popular polynomial kernel k(x, x′) = (1 + x · x′)^d
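A minimal sketch of such a polynomial-kernel SVM, assuming scikit-learn; the toy data, degree and regularization constant C are illustrative, not taken from the paper:

```python
# Minimal polynomial-kernel SVM sketch, assuming scikit-learn.
# The toy data, degree and C are illustrative only.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))                 # 200 examples, 10 features
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)     # bipolar labels

# k(x, x') = (gamma * <x, x'> + coef0)^degree
clf = SVC(kernel="poly", degree=3, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)

# decision_function evaluates f(x), the kernel expansion over support vectors
print(clf.decision_function(X[:5]))
```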

  11. Re-balancing of the Data - Sample Balancing • Keep all of the examples of one class and sample the other class at different mixture ratios • The ratios range from the original unbalanced mix, through a balanced mix, down to the extreme of 0:1, i.e. training on examples of a single class only
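A minimal sketch of this kind of sample balancing, assuming NumPy; the helper name sample_balance and its majority_per_minority parameter are hypothetical (the slides only describe sampling one class at different mixture ratios):

```python
# Hypothetical helper: keep all minority examples and undersample the
# majority class to a chosen number of majority examples per minority
# example (1.0 = balanced mix, 0.0 = minority class only).
import numpy as np

def sample_balance(X, y, majority_per_minority, minority_label=1, seed=0):
    rng = np.random.default_rng(seed)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    n_major = min(len(majority), int(majority_per_minority * len(minority)))
    keep = np.concatenate([minority,
                           rng.choice(majority, n_major, replace=False)])
    rng.shuffle(keep)
    return X[keep], y[keep]
```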

  12. Re-balancing of the Data - Weight Balancing • Instead of discarding examples, the error contributions of the two classes are re-weighted according to B ∈ [−1, +1], a parameter called the balance factor • The case of “balanced proportions” is achieved for B = 0, with B = +1 representing the case of learning from positive examples only. Similarly, learning from the negative class only is achieved for B = −1
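A sketch of weight balancing under the assumption that B maps to per-class error weights proportional to (1 + B) and (1 − B), normalized by class size so that B = 0 equalizes the two classes' total contribution; the exact form of Equation (4) is not reproduced in the transcript, so this is only an interpretation consistent with the stated endpoints:

```python
# Sketch only: interprets the balance factor B as per-class error
# weights; the exact Equation (4) is not shown in the transcript.
import numpy as np
from sklearn.svm import SVC

def balanced_svm(y_train, B, degree=3, C=1.0):
    """B = 0 equalizes the total error contribution of the two classes;
    B -> +1 discounts the negative class (positive one-class limit),
    B -> -1 the reverse. Labels are assumed to be in {-1, +1}."""
    n_pos = int(np.sum(y_train == +1))
    n_neg = int(np.sum(y_train == -1))
    w_pos = (1.0 + B) / (2.0 * n_pos)   # per-example weight, positive class
    w_neg = (1.0 - B) / (2.0 * n_neg)   # per-example weight, negative class
    # At B = +/-1 one weight vanishes; the paper handles those endpoints
    # with dedicated one-class learners rather than a zero weight.
    return SVC(kernel="poly", degree=degree, C=C,
               class_weight={+1: w_pos, -1: w_neg})
```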

  13. Experiments - Real World Data Collections • AHR data set used for task 2 of KDD Cup 2002 • aryl hydrocarbon receptor data set, for cancer research • three classes: change, control, nc • Reuters data • 12902 documents

  14. Performance Measures • We have used AROC, the Area under the Receiver Operating Characteristic (ROC) curve, as our main performance measure • AROC is the probability that the model scores a randomly drawn positive example x_j higher than a randomly drawn negative example x_i, with x_i from the negative class and x_j from the positive class • The trivial uniform random predictor has an AROC of 0.5, while a perfect predictor has an AROC of 1
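Since AROC is exactly the probability of ranking a random positive above a random negative, it can be computed directly from the pairwise definition; a small sketch (the helper name aroc is illustrative, and library routines such as sklearn.metrics.roc_auc_score compute the same quantity):

```python
# Pairwise computation of AROC: the fraction of (negative, positive)
# pairs the model ranks correctly, with ties counted as half.
import numpy as np

def aroc(scores_pos, scores_neg):
    sp = np.asarray(scores_pos, dtype=float)[:, None]  # x_j, positive class
    sn = np.asarray(scores_neg, dtype=float)[None, :]  # x_i, negative class
    return float(np.mean(sp > sn) + 0.5 * np.mean(sp == sn))

rng = np.random.default_rng(0)
print(aroc(rng.normal(1.0, 1.0, 200), rng.normal(0.0, 1.0, 200)))  # ~0.76
print(aroc(rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 200)))  # ~0.5 (random)
```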

  15. Experiments with Real World Data • The training:test split sizes were • 50%:50% for the Reuters data • 70%:30% for the AHR data

  16. Impact of Regularization Constant • [Figure: performance as a function of the regularization constant for four models: positive 1-class, balanced 2-class, un-balanced 2-class, negative 1-class]

  17. Experiments with Sample Balancing

  18. Impact of feature selection (1/2) • feature selection methods: • DocFreq (document frequency thresholding) • ChiSqua (χ²): measures the lack of independence between a feature and a class of interest • MutInfo (mutual information) • InfGain (information gain): a term goodness measure • We have used all of the minority cases and sampled the majority cases at different mixture ratios (MajorityOnly sample balancing)
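A minimal sketch of one of these selectors (χ²) on a bag-of-words matrix, assuming scikit-learn; the tiny corpus and k = 3 are illustrative:

```python
# Chi-square feature selection on a bag-of-words matrix, assuming
# scikit-learn; the tiny corpus and k=3 are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

docs = ["gene pathway signalling", "stock market report",
        "receptor gene knockout", "market earnings stock"]
labels = [1, 0, 1, 0]                      # 1 = minority class of interest

vec = CountVectorizer()
X = vec.fit_transform(docs)                # documents x terms
selector = SelectKBest(chi2, k=3).fit(X, labels)
kept = selector.get_support(indices=True)  # indices of the k best terms
print([vec.get_feature_names_out()[i] for i in kept])
```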

  19. Impact of feature selection (2/2)

  20. Experiments with Weight Balancing • To understand whether the impact of negative examples may be reduced using the balance factor B in Equation (4), weight balancing is tested on: • the AHR data • the Reuters data

  21. Tests on AHR data • B = 0: balanced 2-class • B = +1: positive 1-class • B = −1: negative 1-class

  22. Tests on Reuters • [Figure: balanced 2-class vs. positive 1-class performance on the Reuters data]

  23. Experiments with Synthetic Data • Three synthetic data sets with n_inf informative features and n_noise noise features: • S1: n_inf = 1; n_noise = 999 • S2: n_inf = 10; n_noise = 990 • S3: n_inf = 1; n_noise = 19 • Both linear and non-linear polynomial kernels were tested
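A sketch of how such data might be generated, with n_inf informative features that carry the label and n_noise pure-noise features; the generating distribution and imbalance fraction here are assumptions, not the paper's exact recipe:

```python
# Hypothetical generator: labels depend only on the n_inf informative
# features; the remaining n_noise features are pure noise.
import numpy as np

def make_synthetic(n_samples, n_inf, n_noise, pos_frac=0.02, seed=0):
    rng = np.random.default_rng(seed)
    y = np.where(rng.random(n_samples) < pos_frac, 1, -1)  # unbalanced labels
    informative = rng.normal(size=(n_samples, n_inf)) + y[:, None]  # class-shifted
    noise = rng.normal(size=(n_samples, n_noise))
    return np.hstack([informative, noise]), y

X1, y1 = make_synthetic(1000, n_inf=1,  n_noise=999)  # S1
X2, y2 = make_synthetic(1000, n_inf=10, n_noise=990)  # S2
X3, y3 = make_synthetic(1000, n_inf=1,  n_noise=19)   # S3
```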

  24. Discussion

  25. Conclusion • The Reuters dataset • One-class learners provide quite good results, but using both classes always produces better results • The AHR data set • The positive one-class learners perform significantly better than two-class learners • One-class learning from positive class examples can be a very robust classification technique when dealing with very unbalanced data and a high dimensional noisy feature space

  26. Personal Opinion • Strength • many experiments • Weakness • equations are not clear • Application • SVM • document classification • image retrieval
