
RSVM: Reduced Support Vector Machines



Presentation Transcript


  1. RSVM: Reduced Support Vector Machines. Y.-J. Lee & O. L. Mangasarian, University of Wisconsin-Madison. First SIAM International Conference on Data Mining, Chicago, April 6, 2001.

  2. Outline of Talk
• What is a support vector machine (SVM) classifier?
• The smooth support vector machine (SSVM)
  • A new SVM solvable without an optimization package
• Difficulties with nonlinear SVMs:
  • Computational: handling the massive kernel matrix $K(A, A') \in \mathbb{R}^{m \times m}$
  • Storage: separating surface depends on almost the entire dataset
• Reduced Support Vector Machines (RSVMs)
  • Reduced kernel $K(A, \bar{A}') \in \mathbb{R}^{m \times \bar{m}}$: a much smaller rectangular matrix, with $\bar{m}$ typically 1% to 10% of $m$
  • Speeds computation & reduces storage
  • e.g. a 32,562-point dataset classified in 17 minutes, compared to 2.15 hours by a standard algorithm (SMO)
• Numerical Results

  3. What is a Support Vector Machine? • An optimally defined surface • Typically nonlinear in the input space • Linear in a higher dimensional space • Implicitly defined by a kernel function

  4. What are Support Vector Machines Used For? • Classification • Regression & Data Fitting • Supervised & Unsupervised Learning (Will concentrate on classification)

  5. Geometry of the Classification Problem: 2-Category Linearly Separable Case. [Figure: the two linearly separable point sets A+ and A-]

  6. Support Vector Machines: Maximizing the Margin between Bounding Planes. [Figure: bounding planes separating A+ and A-, with the margin between them]

  7. Support Vector Machines Formulation
• Solve the quadratic program for some $\nu > 0$:
$$\min_{w, \gamma, y} \ \tfrac{\nu}{2}\|y\|_2^2 + \tfrac12 (w'w + \gamma^2) \quad \text{(QP)}$$
$$\text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0,$$
where the diagonal matrix $D$ with entries $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership.
• The margin between the bounding planes $x'w = \gamma \pm 1$ is maximized by minimizing $\tfrac12(w'w + \gamma^2)$.
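A minimal sketch (not the authors' code) of posing this quadratic program for the cvxopt QP solver, with variable ordering $z = (w, \gamma, y)$; the data matrix A, label vector d, and weight nu below are illustrative assumptions:

```python
# Linear SVM quadratic program solved with cvxopt; z = (w, gamma, y).
import numpy as np
from cvxopt import matrix, solvers

def svm_qp(A, d, nu=1.0):
    m, n = A.shape
    # Objective: nu/2 * y'y + 1/2 * (w'w + gamma^2)
    P = np.diag(np.concatenate([np.ones(n + 1), nu * np.ones(m)]))
    q = np.zeros(n + 1 + m)
    # Constraints D(Aw - e*gamma) + y >= e and y >= 0, written as G z <= h
    DA = d[:, None] * A                                  # D A
    G = np.vstack([
        np.hstack([-DA, d[:, None], -np.eye(m)]),        # -D(Aw - e*gamma) - y <= -e
        np.hstack([np.zeros((m, n + 1)), -np.eye(m)]),   # -y <= 0
    ])
    h = np.concatenate([-np.ones(m), np.zeros(m)])
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    z = np.array(sol['x']).ravel()
    return z[:n], z[n]                                   # plane x'w = gamma
```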

  8. SVM as an Unconstrained Minimization Problem
$$\min_{w, \gamma, y} \ \tfrac{\nu}{2}\|y\|_2^2 + \tfrac12 (w'w + \gamma^2) \quad \text{(QP)} \qquad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0$$
• At the solution of (QP): $y = (e - D(Aw - e\gamma))_+$, where $(\cdot)_+$ replaces negative components by zeros.
• Hence (QP) is equivalent to the nonsmooth SVM:
$$\min_{w, \gamma} \ \tfrac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|_2^2 + \tfrac12 (w'w + \gamma^2)$$
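Because $y$ is eliminated via the plus function, the objective can be evaluated with no constraints at all. A small illustrative sketch (toy values, hypothetical helper name):

```python
# Unconstrained nonsmooth SVM objective: slack comes from the plus function.
import numpy as np

def unconstrained_svm_objective(w, gamma, A, d, nu=1.0):
    slack = np.maximum(1.0 - d * (A @ w - gamma), 0.0)  # (e - D(Aw - e*gamma))_+
    return 0.5 * nu * slack @ slack + 0.5 * (w @ w + gamma**2)

A = np.array([[2.0, 0.0], [-1.5, 0.5]])
d = np.array([1.0, -1.0])
print(unconstrained_svm_objective(np.array([1.0, 0.0]), 0.0, A, d))  # 0.5
```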

  9. SSVM: The Smooth Support Vector Machine
• Replacing the plus function $(\cdot)_+$ in the nonsmooth SVM by the smooth $p(\cdot, \alpha)$ gives our SSVM:
$$\min_{w, \gamma} \ \tfrac{\nu}{2}\|p(e - D(Aw - e\gamma), \alpha)\|_2^2 + \tfrac12 (w'w + \gamma^2)$$
• Here, $p(x, \alpha) = x + \tfrac{1}{\alpha}\log(1 + e^{-\alpha x})$ is an accurate smooth approximation of $x_+$, obtained by integrating the sigmoid function $\tfrac{1}{1 + e^{-\alpha x}}$ of neural networks. (sigmoid = smoothed step)
• The solution of SSVM converges to the solution of the nonsmooth SVM as $\alpha$ goes to infinity. (Typically, $\alpha = 5$.)
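A short sketch of this smoothing (illustrative, not the talk's code): $p(x, \alpha)$ approaches the plus function as alpha grows while staying differentiable everywhere:

```python
# Smooth plus function p(x, alpha) = x + (1/alpha)*log(1 + exp(-alpha*x)).
import numpy as np

def plus(x):
    """Nonsmooth plus function: componentwise max(x, 0)."""
    return np.maximum(x, 0.0)

def p(x, alpha=5.0):
    """Smooth approximation of plus(x); approaches plus(x) as alpha -> inf."""
    # log(1 + exp(-alpha*x)) computed stably via logaddexp(0, -alpha*x)
    return x + np.logaddexp(0.0, -alpha * x) / alpha

x = np.linspace(-2, 2, 9)
print(plus(x))
print(p(x, alpha=5.0))   # close to plus(x), but differentiable everywhere
```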

  10. Nonlinear Smooth Support Vector Machine
• Nonlinear separating surface: $K(x', A')Du = \gamma$
• Use a nonlinear kernel $K(A, A')$ in SSVM:
$$\min_{u, \gamma} \ \tfrac{\nu}{2}\|p(e - D(K(A, A')Du - e\gamma), \alpha)\|_2^2 + \tfrac12 (u'u + \gamma^2)$$
• The kernel matrix $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
• Use the Newton algorithm to solve the problem
• Each iteration solves m+1 linear equations in m+1 variables
• The nonlinear separating surface depends on the entire dataset $A$
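A minimal sketch (not the authors' implementation) of this Newton iteration on the SSVM objective. It is written so the precomputed kernel matrix may be square (this slide) or rectangular (the RSVM slides below); the variable u here absorbs the diagonal label matrix D, and nu, alpha, and the line-search constants are illustrative defaults:

```python
# Damped Newton method for the (reduced) SSVM objective.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -50, 50)))

def p(x, alpha):
    # smooth plus function: x + (1/alpha)*log(1 + exp(-alpha*x))
    return x + np.logaddexp(0.0, -alpha * x) / alpha

def ssvm_newton(K, d, nu=1.0, alpha=5.0, iters=50, tol=1e-6):
    """Minimize nu/2*||p(e - D(K u - e*gamma), alpha)||^2 + 1/2*(u'u + gamma^2).

    K: m x k kernel matrix (k = m for full SSVM, k = mbar << m for RSVM);
    d: m-vector of +/-1 labels. Each step solves k+1 linear equations.
    """
    m, k = K.shape
    z = np.zeros(k + 1)                                # z = (u, gamma)
    J = np.hstack([-d[:, None] * K, d[:, None]])       # Jacobian of r wrt z

    def obj(zz):
        rr = 1.0 - d * (K @ zz[:k] - zz[k])
        pp = p(rr, alpha)
        return 0.5 * nu * pp @ pp + 0.5 * zz @ zz

    for _ in range(iters):
        r = 1.0 - d * (K @ z[:k] - z[k])               # r = e - D(K u - e*gamma)
        pr, sr = p(r, alpha), sigmoid(alpha * r)       # p(r), p'(r)
        grad = nu * (J.T @ (pr * sr)) + z
        if np.linalg.norm(grad) < tol:
            break
        # Hessian: nu * J' diag(p'^2 + p * p'') J + I, with p'' = alpha*s*(1-s)
        w = sr**2 + pr * alpha * sr * (1.0 - sr)
        H = nu * (J.T @ (w[:, None] * J)) + np.eye(k + 1)
        step = np.linalg.solve(H, -grad)
        t, f0 = 1.0, obj(z)                            # Armijo backtracking
        while obj(z + t * step) > f0 + 1e-4 * t * (grad @ step) and t > 1e-10:
            t *= 0.5
        z = z + t * step
    return z[:k], z[k]                                 # u, gamma
```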

  11. Examples of Kernels
• Polynomial kernel: $(AA' + b\,ee')_{\bullet}^{d}$, where $d$ is a positive integer and $\bullet$ denotes componentwise power
• Linear kernel $AA'$: the polynomial kernel with $b = 0$, $d = 1$
• Gaussian (radial basis) kernel: $K(A, A')_{ij} = e^{-\mu\|A_i - A_j\|_2^2}$, where $A_i$ denotes the $i$-th row of $A$
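The kernels above in a few lines of NumPy (an illustrative sketch; the parameters b, d, mu are free choices, and the second argument B stands for the rows of the reduced set in the rectangular case):

```python
# Kernel matrices for data matrices A (m x n) and B (k x n).
import numpy as np

def polynomial_kernel(A, B, b=1.0, d=2):
    # Componentwise power of (A B' + b); the linear kernel is b=0, d=1.
    return (A @ B.T + b) ** d

def gaussian_kernel(A, B, mu=1.0):
    # K_ij = exp(-mu * ||A_i - B_j||^2)
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)
```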

  12. Difficulties with Nonlinear SVM for Large Problems
• The nonlinear kernel $K(A, A') \in \mathbb{R}^{m \times m}$ is fully dense
• Runs out of memory while storing the $m \times m$ kernel matrix ($m^2$ numbers)
• Long CPU time to compute the $m^2$ kernel entries
• Computational complexity of nonlinear SSVM depends on $m$: each Newton iteration solves $m+1$ linear equations in $m+1$ variables
• Separating surface depends on almost the entire dataset
• Need to store the entire dataset after solving the problem

  13. Overcoming Computational & Storage Difficulties: Use a Rectangular Kernel
• Choose a small random sample $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of $A$
• The small random sample $\bar{A}$ is a representative sample of the entire dataset
• Typically $\bar{A}$ is 1% to 10% of the rows of $A$
• Replace $K(A, A')$ with $K(A, \bar{A}')$, with corresponding $\bar{u}$, in the nonlinear SSVM
• Only need to compute and store $m \times \bar{m}$ numbers for the rectangular kernel
• Computational complexity reduces accordingly: each Newton iteration now solves only $\bar{m}+1$ linear equations
• The nonlinear separator depends only on $\bar{A}$
• Using the small square kernel $K(\bar{A}, \bar{A}')$ instead gives lousy results!
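Quick back-of-the-envelope arithmetic (toy sizes chosen to match the 32,562-point example from slide 2, with mbar at roughly 1% of m) showing the storage saving:

```python
# Dense float64 storage: m x m entries for the full kernel vs m x mbar reduced.
m, mbar = 32_562, 326
full = m * m * 8 / 1e9          # gigabytes for the full m x m kernel
reduced = m * mbar * 8 / 1e9    # gigabytes for the m x mbar rectangular kernel
print(f"full kernel:    {full:.1f} GB")     # ~8.5 GB
print(f"reduced kernel: {reduced:.3f} GB")  # ~0.085 GB
```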

  14. Reduced Support Vector Machine Algorithm
Nonlinear separating surface: $K(x', \bar{A}')\bar{u} = \gamma$
(i) Choose a random subset matrix $\bar{A} \in \mathbb{R}^{\bar{m} \times n}$ of the entire data matrix $A \in \mathbb{R}^{m \times n}$
(ii) Solve the following problem by the Newton method with corresponding $\bar{u} \in \mathbb{R}^{\bar{m}}$:
$$\min_{\bar{u}, \gamma} \ \tfrac{\nu}{2}\|p(e - D(K(A, \bar{A}')\bar{u} - e\gamma), \alpha)\|_2^2 + \tfrac12 (\bar{u}'\bar{u} + \gamma^2)$$
(iii) The separating surface is defined by the optimal solution $(\bar{u}, \gamma)$ of step (ii): $K(x', \bar{A}')\bar{u} = \gamma$
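A sketch of steps (i)-(iii) end to end, on toy data of my own invention (not the talk's experiments); it assumes `ssvm_newton` from the sketch after slide 10 and `gaussian_kernel` from the sketch after slide 11 are in scope:

```python
# RSVM in three steps on a toy 2-D problem.
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((1000, 2))                 # entire data matrix
d = np.where(A[:, 0] * A[:, 1] > 0, 1.0, -1.0)     # toy +/-1 labels

# (i) choose a small random subset Abar of the rows of A (here 5%)
idx = rng.choice(len(A), size=50, replace=False)
Abar = A[idx]

# (ii) solve the SSVM with the rectangular kernel K(A, Abar') by Newton
K = gaussian_kernel(A, Abar, mu=1.0)               # 1000 x 50, not 1000 x 1000
u, gamma = ssvm_newton(K, d, nu=100.0)

# (iii) the separating surface is K(x', Abar') u = gamma
def classify(X):
    return np.sign(gaussian_kernel(X, Abar, mu=1.0) @ u - gamma)

print((classify(A) == d).mean())                   # training set correctness
```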

  15. How to Choose $\bar{A}$ in RSVM?
• $\bar{A}$ is a representative sample of the entire dataset
• $\bar{A}$ need not be a subset of $A$
• A good selection of $\bar{A}$ may generate a classifier using a very small $\bar{m}$
• Possible ways to choose $\bar{A}$:
  • Choose $\bar{m}$ random rows from the entire dataset
  • Choose $\bar{A}$ such that the distance between its rows exceeds a certain tolerance
  • Use k cluster centers of $A+$ and $A-$ as $\bar{A}$
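Illustrative sketches (not the authors' code) of the three selection strategies; A is an m x n array, d the +/-1 labels, and the sizes and tolerance are assumptions. The cluster-center option shows why $\bar{A}$ need not be a subset of $A$:

```python
# Three ways to build Abar for RSVM.
import numpy as np
from scipy.cluster.vq import kmeans   # k-means for the cluster-center option

rng = np.random.default_rng(0)

def random_rows(A, mbar):
    return A[rng.choice(len(A), size=mbar, replace=False)]

def spread_rows(A, mbar, tol=1.0):
    """Greedily keep rows whose distance to every kept row exceeds tol
    (may return fewer than mbar rows if A is exhausted first)."""
    kept = [A[0]]
    for a in A[1:]:
        if min(np.linalg.norm(a - b) for b in kept) > tol:
            kept.append(a)
        if len(kept) == mbar:
            break
    return np.array(kept)

def cluster_centers(A, d, k):
    """k cluster centers of A+ and of A-; these need not be rows of A."""
    plus, _ = kmeans(A[d > 0], k)
    minus, _ = kmeans(A[d < 0], k)
    return np.vstack([plus, minus])
```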

  16. A Nonlinear Kernel Application. Checkerboard training set: 1000 points in $\mathbb{R}^2$; separate 486 asterisks from 514 dots.

  17. Conventional SVM Result on Checkerboard Using 50 Randomly Selected Points Out of 1000

  18. RSVM Result on Checkerboard Using SAME 50 Random Points Out of 1000

  19. RSVM on Moderate Sized Problems (Best Test Set Correctness %, CPU seconds)

  20. RSVM on Large UCI Adult Dataset (Standard Deviation over 50 Runs = 0.001)

  21. CPU Times on UCI Adult Dataset: RSVM, SMO and PCGC with a Gaussian Kernel

  22. CPU Time Comparison on UCI Dataset: RSVM, SMO and PCGC with a Gaussian Kernel. [Plot: CPU time (sec.) vs. training set size]

  23. Conclusion
• RSVM: an effective classifier for large datasets
  • Classifier uses 10% or less of the dataset
  • Can handle massive datasets
  • Much faster than other algorithms
• Test set correctness:
  • Same or better than using the full dataset
  • Much better than a randomly chosen subset of the same size
• Rectangular kernel:
  • Novel practical idea
  • Applicable to all nonlinear kernel problems
