
Nonlinear Data Discrimination via Generalized Support Vector Machines


Presentation Transcript


  1. Nonlinear Data Discrimination via Generalized Support Vector Machines David R. Musicant and Olvi L. Mangasarian University of Wisconsin - Madison www.cs.wisc.edu/~musicant

  2. Outline • The linear support vector machine (SVM) • Linear kernel • Generalized support vector machine (GSVM) • Nonlinear indefinite kernel • Linear Programming Formulation of GSVM • MINOS • Quadratic Programming Formulation of GSVM • Successive Overrelaxation (SOR) • Numerical comparisons • Conclusions

  3. The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case • [Figure: point sets A+ and A- on either side of a separating surface $x'w = \gamma$]

  4. The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case • Given m points in the n-dimensional space $R^n$ • Represented by an $m \times n$ matrix A • Membership of each point $A_i$ in the classes +1 or -1 is specified by an $m \times m$ diagonal matrix D with +1 or -1 along its diagonal • Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$, such that: $A_i w \ge \gamma + 1$ when $D_{ii} = +1$ and $A_i w \le \gamma - 1$ when $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where e is a vector of ones.
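A minimal numpy sketch of the setup on this slide (the random data and the trial plane $(w, \gamma)$ are illustrative; only A, D, and e follow the slide's notation):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 100, 2                       # m points in R^n

# A: m x n matrix of points; labels in {+1, -1} define the diagonal of D
A = rng.standard_normal((m, n))
d = np.where(A @ np.ones(n) > 0, 1, -1)
D = np.diag(d)                      # m x m diagonal matrix with +-1 on its diagonal
e = np.ones(m)                      # vector of ones

# The bounding-plane constraints D(Aw - e*gamma) >= e hold componentwise
# exactly when every point lies on the correct side of its bounding plane.
w, gamma = np.ones(n), 0.0
print(np.all(D @ (A @ w - e * gamma) >= e))   # False if any point falls inside the margin
```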

  5. Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming • Solve the following mathematical program: $\min_{w,\gamma,y} \nu e'y$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where y = nonnegative error (slack) vector • Note: y = 0 if the convex hulls of A+ and A- do not intersect.
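A sketch of this robust LP with scipy.optimize.linprog (the two-cloud data and the slack weight ν are illustrative; linprog takes ≤ constraints, so the inequality is negated):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, nu = 80, 2, 1.0

# Illustrative data: two shifted Gaussian clouds with labels in {+1, -1}
A = np.vstack([rng.normal(+1, 1, (m // 2, n)), rng.normal(-1, 1, (m // 2, n))])
d = np.hstack([np.ones(m // 2), -np.ones(m // 2)])
D = np.diag(d)
e = np.ones(m)

# Variables stacked as x = [w (n), gamma (1), y (m)]; minimize nu * e'y
c = np.hstack([np.zeros(n + 1), nu * e])
# D(Aw - e*gamma) + y >= e  rewritten as  -D A w + D e gamma - y <= -e
A_ub = np.hstack([-D @ A, (D @ e)[:, None], -np.eye(m)])
b_ub = -e
bounds = [(None, None)] * (n + 1) + [(0, None)] * m   # w, gamma free; y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
w, gamma, y = res.x[:n], res.x[n], res.x[n + 1:]
print("convex hulls disjoint (y = 0):", np.allclose(y, 0, atol=1e-6))
```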

  6. The (Linear) Support Vector Machine: Maximize the Margin Between Separating Planes • [Figure: A+ and A- separated by bounding planes $x'w = \gamma \pm 1$ with margin $2/\|w\|$ between them]

  7. The (Linear) Support Vector Machine Formulation • Solve the following mathematical program: $\min_{w,\gamma,y} \nu e'y + \frac{1}{2}w'w$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where y = nonnegative error (slack) vector • The added quadratic term maximizes the margin $2/\|w\|$ between the bounding planes • Note: y = 0 if the convex hulls of A+ and A- do not intersect.
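A sketch of the same program with the margin term added, solved here with scipy.optimize.minimize (SLSQP); the data and ν are illustrative, and a dedicated QP solver would be the natural choice in practice:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
m, n, nu = 40, 2, 1.0
A = np.vstack([rng.normal(+1, 1, (m // 2, n)), rng.normal(-1, 1, (m // 2, n))])
d = np.hstack([np.ones(m // 2), -np.ones(m // 2)])
e = np.ones(m)

# Variables stacked as x = [w (n), gamma (1), y (m)]
def objective(x):
    w, y = x[:n], x[n + 1:]
    return nu * y.sum() + 0.5 * w @ w            # slack cost plus margin term

def constraint(x):                               # D(Aw - e*gamma) + y - e >= 0
    w, gamma, y = x[:n], x[n], x[n + 1:]
    return d * (A @ w - gamma) + y - e

res = minimize(objective, np.zeros(n + 1 + m),
               bounds=[(None, None)] * (n + 1) + [(0, None)] * m,
               constraints=[{"type": "ineq", "fun": constraint}],
               method="SLSQP")
w = res.x[:n]
print("margin 2/||w|| =", 2 / np.linalg.norm(w))
```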

  8. GSVM: Generalized Support Vector Machine, Linear Programming Formulation • Linear Support Vector Machine (linear separating surface $x'w = \gamma$) • By “duality”, set $w = A'Du$ (linear separating surface $x'A'Du = \gamma$) • Nonlinear Support Vector Machine: replace $AA'$ by a nonlinear kernel $K(A, A')$. Nonlinear separating surface: $K(x', A')Du = \gamma$ • Linear programming formulation: $\min_{u,\gamma,y,s} \nu e'y + e's$ subject to $D(K(A, A')Du - e\gamma) + y \ge e$, $-s \le u \le s$, $y \ge 0$
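A sketch of this linear program with a Gaussian kernel standing in for $K(A, A')$ (the XOR-style data, ν, μ, and the variable stacking are illustrative):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n, nu, mu = 60, 2, 1.0, 2.0
A = rng.uniform(-1, 1, (m, n))
d = np.where(A[:, 0] * A[:, 1] > 0, 1.0, -1.0)   # XOR-like labels: not linearly separable
D = np.diag(d)
e = np.ones(m)

K = np.exp(-mu * ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1))  # Gaussian kernel K(A, A')

# Variables stacked as x = [u (m), gamma (1), y (m), s (m)]
c = np.hstack([np.zeros(m + 1), nu * e, e])                       # minimize nu*e'y + e's
Zm = np.zeros((m, m))
A_ub = np.vstack([
    np.hstack([-D @ K @ D, (D @ e)[:, None], -np.eye(m), Zm]),    # D(K D u - e*gamma) + y >= e
    np.hstack([ np.eye(m), np.zeros((m, 1)), Zm, -np.eye(m)]),    #  u - s <= 0
    np.hstack([-np.eye(m), np.zeros((m, 1)), Zm, -np.eye(m)]),    # -u - s <= 0
])
b_ub = np.hstack([-e, np.zeros(2 * m)])
bounds = [(None, None)] * (m + 1) + [(0, None)] * (2 * m)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
u, gamma = res.x[:m], res.x[m]
# Nonlinear separating surface: K(x', A') D u = gamma
train_pred = np.sign(K @ D @ u - gamma)
print("training accuracy:", (train_pred == d).mean())
```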

  9. Examples of Kernels • $K(A, A')$ maps $R^{m \times n} \times R^{n \times m}$ into $R^{m \times m}$ • Polynomial Kernel: $(AA' + \mu ee')^d_\bullet$, where $\bullet$ denotes componentwise exponentiation as in MATLAB • Radial Basis Kernel: $K(A, A')_{ij} = e^{-\mu\|A_i - A_j\|^2}$, $i, j = 1, \dots, m$ • Neural Network Kernel: $(AA' + \mu ee')_*$, where $*$ denotes the step function applied componentwise.
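Numpy sketches of these three kernels (the parameter names mu and d are illustrative; each function maps row-matrices A and B to the corresponding kernel matrix):

```python
import numpy as np

def polynomial_kernel(A, B, mu=1.0, d=3):
    """(AB' + mu*ee')^d with componentwise exponentiation."""
    return (A @ B.T + mu) ** d

def radial_basis_kernel(A, B, mu=1.0):
    """exp(-mu * ||A_i - B_j||^2) for each pair of rows."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * sq)

def neural_network_kernel(A, B, mu=1.0):
    """Step function applied componentwise to AB' + mu*ee'."""
    return (A @ B.T + mu >= 0).astype(float)

A = np.random.default_rng(0).standard_normal((5, 2))
for k in (polynomial_kernel, radial_basis_kernel, neural_network_kernel):
    print(k.__name__, k(A, A).shape)   # each yields a 5 x 5 kernel matrix
```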

  10. A Nonlinear Kernel Application • Checkerboard Training Set: 1000 Points in $R^2$ • Separate 486 Asterisks from 514 Dots

  11. Previous Work • [Figure: checkerboard separation obtained by previously published methods]

  12. Polynomial Kernel: [Figure: checkerboard separation obtained with the polynomial kernel]

  13. Large Margin Classifier: (SOR) Reformulation in $(w, \gamma)$ Space • [Figure: A+ and A- separated by bounding planes with margin $2/\sqrt{w'w + \gamma^2}$ measured in $R^{n+1}$]

  14. (SOR) Linear Support Vector Machine: Quadratic Programming Formulation • Solve the following mathematical program: $\min_{w,\gamma,y} \nu e'y + \frac{1}{2}(w'w + \gamma^2)$ subject to $D(Aw - e\gamma) + y \ge e$, $y \ge 0$ • The quadratic term here maximizes the distance $2/\sqrt{w'w + \gamma^2}$ between the bounding planes in the space $R^{n+1}$ of $(w, \gamma)$.

  15. Introducing a Nonlinear Kernel • The Wolfe dual of the SOR linear SVM is: $\min_u \frac{1}{2} u'D(AA' + ee')Du - e'u$ subject to $0 \le u \le \nu e$ • Linear separating surface: $x'A'Du + e'Du = 0$ (from $w = A'Du$, $\gamma = -e'Du$) • Substitute a kernel for the $AA'$ term: $\min_u \frac{1}{2} u'D(K(A, A') + ee')Du - e'u$ subject to $0 \le u \le \nu e$ • Nonlinear separating surface: $K(x', A')Du + e'Du = 0$
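A sketch of the kernelized dual pieces (Gaussian kernel; the data, ν, and μ are illustrative): the dual Hessian $D(K(A,A') + ee')D$, the box-constrained objective, and the resulting decision surface:

```python
import numpy as np

rng = np.random.default_rng(0)
m, nu, mu = 60, 10.0, 2.0
A = rng.uniform(-1, 1, (m, 2))
d = np.where(A[:, 0] * A[:, 1] > 0, 1.0, -1.0)
D = np.diag(d)
e = np.ones(m)

K = np.exp(-mu * ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1))  # Gaussian kernel K(A, A')
M = D @ (K + np.outer(e, e)) @ D       # dual Hessian after the kernel substitution

def dual_objective(u):
    """0.5*u'Mu - e'u, to be minimized over the box 0 <= u <= nu*e."""
    return 0.5 * u @ M @ u - e @ u

def separating_surface(x, u):
    """Sign of K(x', A')Du + e'Du: the nonlinear decision surface."""
    kx = np.exp(-mu * ((A - x) ** 2).sum(-1))
    return np.sign(kx @ (D @ u) + e @ (D @ u))

print(M.shape, dual_objective(np.zeros(m)))   # (60, 60) 0.0
```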

  16. SVM Optimality Conditions • Define $M = D(K(A, A') + ee')D$ • Then the dual SVM becomes much simpler: $\min_u \frac{1}{2}u'Mu - e'u$ subject to $0 \le u \le \nu e$ • Gradient projection necessary & sufficient optimality condition: $u = (u - \omega E^{-1}(Mu - e))_\#$ for any $\omega > 0$, where $E$ is the diagonal of $M$ • $(\cdot)_\#$ denotes projecting $u$ onto the region $\{u \mid 0 \le u \le \nu e\}$.
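The projection and the optimality test are a few lines of numpy (ω and the toy M below are illustrative):

```python
import numpy as np

def project_box(u, nu):
    """Project u componentwise onto the region {u : 0 <= u <= nu*e}."""
    return np.clip(u, 0.0, nu)

def is_optimal(u, M, e, nu, omega=1.0, tol=1e-6):
    """Gradient projection test: u solves the dual iff u = (u - omega*E^-1*(Mu - e))_# ."""
    Einv = 1.0 / np.diag(M)
    return np.linalg.norm(u - project_box(u - omega * Einv * (M @ u - e), nu)) < tol

# Tiny check: the unconstrained minimizer of 0.5*u'Mu - e'u, if it lies in the box,
# satisfies the projected optimality condition.
M = np.array([[2.0, 0.5], [0.5, 1.0]])
e = np.ones(2)
u = project_box(np.linalg.solve(M, e), nu=10.0)
print(is_optimal(u, M, e, nu=10.0))   # True
```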

  17. SOR Algorithm & Convergence • The above optimality conditions lead to the SOR algorithm: split $M = L + E + L'$, with $L$ strictly lower triangular and $E$ the diagonal of $M$, and iterate $u^{i+1} = \left(u^i - \omega E^{-1}(Mu^i - e + L(u^{i+1} - u^i))\right)_\#$ for $\omega \in (0, 2)$ • Remember, the optimality conditions are expressed as: $u = (u - \omega E^{-1}(Mu - e))_\#$ • SOR Linear Convergence [Luo-Tseng 1993]: the iterates of the SOR algorithm converge R-linearly to a solution of the dual problem, and the objective function values converge Q-linearly to the minimum value of the dual.
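A sketch of the projected SOR sweep (ω = 1, i.e., projected Gauss-Seidel; the XOR-style data and parameters are illustrative). Each component update uses the freshest values of the other components, which is exactly the $L(u^{i+1} - u^i)$ term in the iteration above:

```python
import numpy as np

def sor_svm(M, e, nu, omega=1.0, iters=1000, tol=1e-6):
    """Projected SOR for min 0.5*u'Mu - e'u subject to 0 <= u <= nu*e."""
    m = len(e)
    u = np.zeros(m)
    diag = np.diag(M)
    for _ in range(iters):
        u_old = u.copy()
        for j in range(m):
            grad_j = M[j] @ u - e[j]           # j-th component of Mu - e, with fresh values
            u[j] = min(max(u[j] - omega * grad_j / diag[j], 0.0), nu)
        if np.linalg.norm(u - u_old) < tol:
            break
    return u

# Illustrative run on a small kernelized dual (XOR-like data, Gaussian kernel)
rng = np.random.default_rng(0)
m, nu, mu = 60, 10.0, 2.0
A = rng.uniform(-1, 1, (m, 2))
d = np.where(A[:, 0] * A[:, 1] > 0, 1.0, -1.0)
K = np.exp(-mu * ((A[:, None, :] - A[None, :, :]) ** 2).sum(-1))
M = np.outer(d, d) * (K + 1.0)                 # D(K(A,A') + ee')D written via the labels d
u = sor_svm(M, np.ones(m), nu)
pred = np.sign((K + 1.0) @ (d * u))            # sign(K(x',A')Du + e'Du) on the training points
print("training accuracy:", (pred == d).mean())
```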

  18. Numerical Testing • Comparison of linear & nonlinear kernels using: • Linear programming • Quadratic programming (SOR formulation) • Data sets: • UCI Liver Disorders: 345 points in $R^6$ • Bell Labs Checkerboard: 1000 points in $R^2$ • Gaussian Synthetic: 1000 points in $R^{32}$ • SCDS Synthetic: 1 million points in $R^{32}$ • Massive Synthetic: 10 million points in $R^{32}$ • Machines: • Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM • Total: 64 processors, 8 GB RAM

  19. Comparison of Linear & Nonlinear SVMs (Linear Programming Generated) • Nonlinear kernels yield better training and testing set correctness

  20. SOR Results • Comparison of linear and nonlinear kernels • Examples of training on massive data: • 1 million point dataset generated by SCDS generator: • Trained completely in 9.7 hours • Tuning set reached 99.7% of final accuracy in 0.3 hours • 10 million point randomly generated dataset: • Tuning set reached 95% of final accuracy in 14.3 hours • Under 10,000 iterations

  21. Conclusions • Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs • Nonlinear separating surfaces improve generalization over linear ones • SOR can handle very large problems not (easily) solvable by other methods • SOR scales up with virtually no changes • Future directions: • Parallel SOR for very large problems not resident in memory • Massive multicategory discrimination via SOR • Support vector regression

  22. Questions?
