
A Comparative Study of Kernel Methods for Classification Applications



Presentation Transcript


  1. A Comparative Study of Kernel Methods for Classification Applications Yan Liu Oct 21, 2003

  2. Introduction • Support Vector Machines • Text classification • Protein classification • Various kernels • Standard kernels • Linear kernels, polynomial kernels, RBF kernels • Other application-oriented kernels • Latent semantic kernels • Fisher kernels, string kernels, etc. • Problem Definition • Rare-class problem (unbalanced data) • Noisy data problem • Multi-label problem
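The standard kernels listed above each define a different inner product between feature vectors. A minimal NumPy sketch (the function names and the parameter defaults `degree`, `c`, and `gamma` are illustrative choices, not from the talk):

```python
import numpy as np

def linear_kernel(x, y):
    """K(x, y) = <x, y>"""
    return np.dot(x, y)

def polynomial_kernel(x, y, degree=2, c=1.0):
    """K(x, y) = (<x, y> + c)^degree"""
    return (np.dot(x, y) + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)"""
    diff = np.asarray(x) - np.asarray(y)
    return np.exp(-gamma * np.dot(diff, diff))

x = np.array([1.0, 0.0])
y = np.array([1.0, 1.0])
print(linear_kernel(x, y))      # 1.0
print(polynomial_kernel(x, y))  # (1 + 1)^2 = 4.0
print(rbf_kernel(x, x))         # 1.0 (identical points)
```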

  3. Text Classification • Kernels • Linear kernels • Latent semantic kernels • Problem Focus • Rare-class problem • Multi-label problem • Noisy data problem • Dataset • Reuters-21578 dataset

  4. Data Analysis: Reuters-21578 • The corpus consists of 7,769 training documents and 3,019 test documents mapped to 90 categories • Rare-class problem (Unbalanced Data)

  5. Data Analysis: Reuters-21578 • Multi-label problem • Definition: one document belongs to more than one category • The averaged doc-to-category ratio is 1.271 for the training set
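The averaged doc-to-category ratio above is just the total number of (document, category) assignments divided by the number of documents. A toy illustration (the document IDs and category labels below are made up, not taken from Reuters-21578):

```python
# Hypothetical multi-label assignments for four documents.
labels = {
    "doc1": ["wheat"],
    "doc2": ["wheat", "grain"],
    "doc3": ["corn"],
    "doc4": ["wheat", "grain", "corn"],
}

# Averaged doc-to-category ratio: total label assignments / number of docs.
ratio = sum(len(cats) for cats in labels.values()) / len(labels)
print(ratio)  # 7 / 4 = 1.75
```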

  6. Methodology and Schedule • Analyze the properties of the application data and propose conjectures on the possible behaviors • Projection from high-dimensional data to low-dimensional space • Singular Value Decomposition (SVD) • Reduced-Rank Linear Discriminant Analysis (LDA) • Propose hypotheses • Work on synthetic datasets to test the hypotheses • Generate low-dimensional synthetic data with properties similar to the real data • Test the hypotheses • Map from the synthetic data to the real application data
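The SVD projection step above can be sketched as follows; the random data matrix and the choice of k = 2 are illustrative assumptions:

```python
import numpy as np

# Toy high-dimensional data: 100 points in 50 dimensions.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 50))

# Truncated SVD: keep the top-k right singular vectors and project onto them.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
k = 2
X_low = X @ Vt[:k].T   # each row is now a k-dimensional representation

print(X_low.shape)     # (100, 2)
```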

  7. Case 1: Multi-label Problem: Reuters-21578 • Conceptually two cases: (1) whole vs. part • Wheat vs. Grain

  8. Case 1: Multi-label Problem: Synthetic Data • Data generation • Gaussian mixture models • 200 data points in total • Class 1: red • Class 2: green • Class 1 & 2: blue • Hypothesis • Linear kernel: predict everything as class 1 • LSI kernel: hard to say, maybe similar to the linear kernel? • RBF: fit the data better than the linear kernel?
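The data generation described above can be sketched as below; the means, scales, and class proportions are assumptions for illustration, since the slide does not give the exact mixture parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Three Gaussian components: class 1 only, class 2 only, and the overlap
# (class 1 & 2) component, 200 points in total.
class1 = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(90, 2))
class2 = rng.normal(loc=[3.0, 0.0], scale=0.5, size=(90, 2))
both   = rng.normal(loc=[1.5, 0.0], scale=0.5, size=(20, 2))

X = np.vstack([class1, class2, both])
# Multi-label targets: one indicator column per class.
Y = np.vstack([
    np.tile([1, 0], (90, 1)),
    np.tile([0, 1], (90, 1)),
    np.tile([1, 1], (20, 1)),
])
print(X.shape, Y.shape)  # (200, 2) (200, 2)
```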

  9. Case 1: Multi-label Problem: Results on Synthetic Data • Linear kernel results: • Class 1: Prec: 0.985000 Rec: 1.000000 F1: 0.992443 • Class 2: Prec: 0 Rec: 0 F1: 0 • Class 1 & 2: Rec: 0, Prec: 0 • Discussion • The results on Class 1 & 2 depend on the proportion mp • mp = # of multi-label examples / # of training examples • If mp > 0.5, then Rec = 1.00, Prec = mp • If mp < 0.5, then Rec = 0, Prec = 0
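The mp argument above amounts to evaluating a classifier that collapses to a constant (majority-vote) prediction on the multi-label examples. A small sketch of that case analysis (the helper name is hypothetical):

```python
# If the learner predicts the majority label for every example, its metrics
# on the Class 1 & 2 subset depend only on mp, the fraction of multi-label
# training examples, matching the two cases on the slide.

def constant_classifier_metrics(mp):
    if mp > 0.5:
        # Majority vote is "positive": everything predicted as multi-label.
        return {"recall": 1.0, "precision": mp}
    # Majority vote is "negative": no multi-label example is recovered.
    return {"recall": 0.0, "precision": 0.0}

print(constant_classifier_metrics(0.1))  # {'recall': 0.0, 'precision': 0.0}
print(constant_classifier_metrics(0.6))  # {'recall': 1.0, 'precision': 0.6}
```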

  10. Case 1: Multi-label Problem: Results on Synthetic Data • LSI kernel results: • Exactly the same as the linear kernel • Class 1: Prec: 0.985000 Rec: 1.000000 F1: 0.992443 • Class 2: Prec: 0 Rec: 0 F1: 0 • Class 1 & 2: Rec: 0, Prec: 0 • Discussion • It seems that LSI performs similarly to the linear kernel • In the real application, LSI might behave differently
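For reference, a latent semantic (LSI) kernel can be sketched as an inner product taken after projecting documents onto the top singular directions of a term-document matrix; the toy matrix and the choice of k = 3 below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.random((30, 8))       # toy doc-term matrix: 30 docs x 8 terms

# Latent directions from the SVD of the doc-term matrix.
_, _, Vt = np.linalg.svd(D, full_matrices=False)
P = Vt[:3]                    # top-3 latent semantic directions

def lsi_kernel(x, y):
    """Inner product of the two documents in the latent space."""
    return float((P @ x) @ (P @ y))

x, y = D[0], D[1]
print(lsi_kernel(x, y))
```

Because the projection P is built without class labels, the transformation is the same regardless of the labeling, which is why its behavior can track the linear kernel closely on data like this.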

  11. Case 1: Multi-label Problem: Results on Synthetic Data • RBF kernel results • Class 1: Prec: 0.985000 Rec: 1.000000 F1: 0.992443 • Class 2: Prec: 0.854167 Rec: 0.512500 F1: 0.640625 • Class 1 & 2: Prec: 0.791667 Rec: 0.493506 • Discussion • RBF kernel fits the data very well

  12. Case 1: Multi-label Problem: Reuters-21578 • Conceptually two cases: (2) shared concepts • Wheat vs. Soybean

  13. Case 1: Multi-label Problem: Synthetic Data • Data generation • Gaussian mixture models • 200 data points in total • Class 1: red • Class 2: green • Class 1 & 2: blue • Hypothesis • Linear kernel: might work well for this case? • LSI kernel: also might work for this case? • RBF: might overfit?

  14. Case 1: Multi-label Problem: Results on Synthetic Data • Linear kernel results: • Class 1: Prec: 0.918699 Rec: 0.869231 F1: 0.893281 • Class 2: Prec: 0.938462 Rec: 0.938462 F1: 0.938462 • Class 1 & 2: Rec: 0.300000, Prec: 0.391304

  15. Case 1: Multi-label Problem: Results on Synthetic Data • LSI kernel results: • Class 1: Prec: 0.928000 Rec: 0.892308 F1: 0.909804 • Class 2: Prec: 0.938462 Rec: 0.938462 F1: 0.938462 • Class 1 & 2: Rec: 0.366667, Prec: 0.440000

  16. Case 1: Multi-label Problem: Results on Synthetic Data • RBF kernel results: • Class 1: Prec: 0.934426 Rec: 0.876923 F1: 0.904762 • Class 2: Prec: 0.938462 Rec: 0.938462 F1: 0.938462 • Class 1 & 2: Rec: 0.333333, Prec: 0.454545

  17. Case 1: Multi-label Problem: Results on Synthetic Data • Discussion on results • Linear kernel performs reasonably well • LSI kernel gains more than the linear kernel by separating the data in the right direction • RBF kernel tends to fit the data

  18. Case 2: Rare-class Problem: Reuters-21578 • CPU vs. Wheat

  19. Case 2: Rare-class Problem: Synthetic Data • Data generation • Gaussian mixture models • 103 data points in total • Class 1: red • Class 2: green • Hypothesis • Both the linear kernel and the LSI kernel seem to perform reasonably well • RBF might overfit?

  20. Case 2: Rare-class Problem: Synthetic Data Results • Results (plots for the Linear, LSI, and RBF kernels) • Question: where is the problem?

  21. Case 2: Rare-class Problem: Synthetic Data Results • Discussion • The problem lies in the SVM classifier instead of the kernel • SVM tries to maximize the margin • Solutions • Set the cost function in the SVM classifier • Tune the threshold instead of using the default of 0 • Up-sampling, down-sampling, and ensemble approaches • The analysis for different kernels will be difficult
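Of the solutions above, threshold tuning is the easiest to sketch: instead of the default decision threshold of 0, pick the threshold that maximizes F1 on held-out decision values. The scores and labels below are toy stand-ins for SVM decision values, and the helper name is hypothetical:

```python
import numpy as np

def best_f1_threshold(scores, labels, candidates):
    """Return the candidate threshold with the highest F1 on (scores, labels)."""
    best_t, best_f1 = None, -1.0
    for t in candidates:
        pred = scores >= t
        tp = np.sum(pred & (labels == 1))
        fp = np.sum(pred & (labels == 0))
        fn = np.sum(~pred & (labels == 1))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1

# With a rare positive class, all decision values may sit below 0,
# so the default threshold predicts nothing positive.
scores = np.array([-2.0, -1.5, -1.2, -0.9, -0.4, -0.3])
labels = np.array([0, 0, 0, 0, 1, 1])
t, f1 = best_f1_threshold(scores, labels, np.linspace(-2, 0, 21))
print(t, f1)  # a negative threshold recovers both positives
```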

  22. Case 3: Noisy Data Problem: Synthetic Data • Data generation • Gaussian mixture models • 200 data points in total • Class 1: red • Class 2: green • Noise data: blue • Hypothesis • Linear kernel tends to be robust to noise • Little change for the LSI kernel since the transformation is independent of the class labels • RBF might overfit?

  23. Case 3: Noisy Data Problem: Synthetic Data Results • Results (plots for the Linear, LSI, and RBF kernels) • Linear kernel and LSI kernel are robust to the noise • RBF kernel tends to overfit

  24. Summary • Multi-label problem • Case 1: whole vs. part • Linear and LSI depend on the data distribution, but can work a lot better if we know the category hierarchy • RBF seems to work better • Case 2: shared concepts • LSI works a little better than the linear kernel • Rare-class problem • The problem lies in the SVM classifier, most seriously in the thresholding problem • Noisy data • Linear kernel and LSI are robust to noise • RBF might overfit

  25. Next step • Work on the real application datasets and test the hypotheses • Reuters-21578 • A subset of RCV-1 • Focus more on the multi-label problem

  26. Protein Family Classification • Kernel selection • Fisher kernels • String kernels • Problem Focus • Rare-class problem • Noisy data problem • Dataset • GPCR family classification dataset

  27. Data Analysis: GPCR Family Classification • The dataset consists of 1,356 sequences in 13 classes; each sequence has one and only one label • Rare-class problem (Unbalanced Data)

  28. Kernel Methods Revisited • Fisher kernel • Build an HMM model for each family • Compute the Fisher scores for each parameter in the HMM • Use the scores as features and predict with an SVM with an RBF kernel • String kernel • k-spectrum kernel: • all possible contiguous subsequences of length k (k = 3, 4) • Similar to using N-grams • Mismatch string kernel • An extension of the spectrum kernel that allows mismatches • k = 5, 6
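The k-spectrum kernel described above maps each sequence to the counts of its contiguous length-k subsequences and takes the inner product of those count vectors, much like an N-gram representation. A minimal sketch (function names are illustrative):

```python
from collections import Counter

def spectrum(seq, k):
    """Counts of all contiguous length-k subsequences (k-mers) of seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))

def spectrum_kernel(s1, s2, k=3):
    """Inner product of the two k-mer count vectors."""
    c1, c2 = spectrum(s1, k), spectrum(s2, k)
    return sum(c1[kmer] * c2[kmer] for kmer in c1)

print(spectrum_kernel("GATTACA", "GATTACA"))  # 5: five distinct 3-mers, each once
print(spectrum_kernel("GATTACA", "CCCCCCC"))  # 0: no shared 3-mers
```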

  29. Proposed Kernel: PSA Kernel • Intuition • The kernel defines the similarity between two sequences in the Hilbert feature space • The similarity between two sequences is one of the basic, well-studied problems in bioinformatics • Proposed kernel • K(x, y) is the pairwise sequence alignment score
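A rough sketch of the PSA kernel idea: take K(x, y) to be a pairwise alignment score. Below, a minimal Needleman-Wunsch global alignment with illustrative match/mismatch/gap scores stands in for the ClustalW alignment actually used in the experiments:

```python
def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score of sequences a and b (Needleman-Wunsch)."""
    n, m = len(a), len(b)
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap            # align a[:i] against all gaps
    for j in range(1, m + 1):
        dp[0][j] = j * gap            # align b[:j] against all gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            dp[i][j] = max(dp[i - 1][j - 1] + sub,   # substitution
                           dp[i - 1][j] + gap,       # gap in b
                           dp[i][j - 1] + gap)       # gap in a
    return dp[n][m]

print(alignment_score("GATTACA", "GATTACA"))  # 7: perfect match
print(alignment_score("GATTACA", "GCATGCU"))
```

Note that a score matrix built this way is symmetric but not automatically positive semi-definite, which is exactly why the proof of semi-definiteness appears in the ongoing work below.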

  30. Experimental Results and Ongoing Work • Experimental results • Two-way cross-validation • Pairwise sequence alignment using ClustalW • An accuracy of 0.9550 for GPCR family classification over 13 classes and 0.9834 over Classes A-E • SVM converges very fast • Ongoing work • Proof of positive semi-definiteness • Connection between the string kernel and the Fisher kernel • Experiments on other datasets
