Nonlinear Data Discrimination via Generalized Support Vector Machines

### Nonlinear Data Discrimination via Generalized Support Vector Machines

David R. Musicant and Olvi L. Mangasarian

University of Wisconsin - Madison

www.cs.wisc.edu/~musicant

Outline

- The linear support vector machine (SVM)
  - Linear kernel
- Generalized support vector machine (GSVM)
  - Nonlinear indefinite kernel
- Linear Programming Formulation of GSVM
  - MINOS
- Quadratic Programming Formulation of GSVM
  - Successive Overrelaxation (SOR)
- Numerical comparisons
- Conclusions

- Given m points in the n-dimensional space R^n
- Represented by an m × n matrix A
- Membership of each point A_i in the classes 1 or -1 is specified by:
  - An m × m diagonal matrix D with +1 or -1 along its diagonal
- Separate the two classes by two bounding planes, one bounding each class.
- More succinctly, the two conditions combine into a single inequality in which e is a vector of ones.
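Written out, the bounding-plane conditions for this setup usually take the following form. The symbols $w$ (normal vector of the planes) and $\gamma$ (their offset) are assumptions introduced here for illustration; $A$, $D$ and $e$ are as defined above.

```latex
% Bounding planes: x'w = \gamma + 1 and x'w = \gamma - 1 (w, \gamma are assumed names)
\[
A_i w \ge \gamma + 1 \ \text{ for } D_{ii} = +1,
\qquad
A_i w \le \gamma - 1 \ \text{ for } D_{ii} = -1 .
\]
% Multiplying each row by D_{ii} merges both cases into one inequality:
\[
D(Aw - e\gamma) \ge e .
\]
```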

Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming

- Solve the following mathematical program:

where y = nonnegative error (slack) vector

- Note: y = 0 if convex hulls of A+ and A- do not intersect.
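A minimal form of such a linear program, sketched under the assumption that the plane is parameterized by a normal $w$ and an offset $\gamma$ (names chosen here) and that all slack components are weighted equally, is:

```latex
% Minimize the total slack needed to satisfy the bounding-plane inequality.
% Equal weighting of the slacks is an assumption; class-size weights are also common.
\[
\min_{w,\gamma,y} \; e'y
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .
\]
```

With this form, $y = 0$ is feasible exactly when the two classes can be strictly separated, which matches the note about non-intersecting convex hulls.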

The (Linear) Support Vector Machine Formulation

- Solve the following mathematical program:

where y = nonnegative error (slack) vector

- Note: y = 0 if convex hulls of A+ and A- do not intersect.
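A standard soft-margin form of this program adds a term that penalizes the size of the normal vector. The sketch below assumes a trade-off weight $\nu > 0$ and the 2-norm of $w$; both the symbol names and the choice of norm are assumptions, and a 1-norm variant leads to a linear program instead.

```latex
% Soft-margin SVM: trade slack (errors) against margin width.
% The distance between the planes x'w = \gamma \pm 1 is 2 / \|w\|_2,
% so minimizing w'w widens the margin.
\[
\min_{w,\gamma,y} \; \nu\, e'y + \tfrac{1}{2}\, w'w
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .
\]
```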

Linear Support Vector Machine (linear separating surface)

- By “duality”, express the normal of the separating plane in terms of the dual variable u (the separating surface is still linear)

- Nonlinear Support Vector Machine: replace AA’ by a nonlinear kernel, which gives a nonlinear separating surface
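The substitution described in this slide can be sketched as follows. The symbols $w$, $\gamma$ and the kernel name $K$ are assumptions introduced for illustration; $u$ is the dual variable.

```latex
% Linear surface in primal variables:
\[ x'w = \gamma \]
% Set w = A'Du (dual representation of the normal); the surface is still linear in x:
\[ w = A'Du \;\;\Longrightarrow\;\; x'A'Du = \gamma \]
% Replace the linear kernel x'A' by a nonlinear kernel K(x', A'):
\[ K(x', A')\,Du = \gamma \]
```

In the training problem the same replacement turns the matrix $AA'$ into $K(A, A')$.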

Examples of Kernels

- Polynomial Kernel
  - Exponentiation is componentwise, as in MATLAB
- Radial Basis Kernel
- Neural Network Kernel
  - The step function is applied componentwise
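As an illustration of the three kernel families listed above, here is a small pure-Python sketch. The exact parameterizations (`mu`, `d`, `theta`) are assumptions chosen for the example, not the forms used on the original slide; each function evaluates one kernel entry, so applying it to every pair of rows gives the componentwise behavior.

```python
import math


def _dot(x, z):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, mu=1.0, d=2):
    """Assumed form: (x'z + mu)^d, one entry of a componentwise power."""
    return (_dot(x, z) + mu) ** d


def rbf_kernel(x, z, mu=1.0):
    """Assumed form: exp(-mu * ||x - z||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-mu * sq_dist)


def neural_network_kernel(x, z, mu=1.0, theta=0.0):
    """Assumed form: step(mu * x'z - theta), where step(t) is 1 if t > 0 else 0."""
    return 1.0 if mu * _dot(x, z) - theta > 0 else 0.0


def kernel_matrix(A, B, kernel, **params):
    """Matrix with entry (i, j) = kernel(A[i], B[j]); rows of A and B are points."""
    return [[kernel(a, b, **params) for b in B] for a in A]


if __name__ == "__main__":
    A = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
    for row in kernel_matrix(A, A, rbf_kernel, mu=0.5):
        print(row)
```

Note that the step-function kernel need not be positive semidefinite, which is one reason a formulation that tolerates indefinite kernels is useful.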

A Nonlinear Kernel Application

Checkerboard Training Set: 1000 Points in R^2. Separate 486 Asterisks from 514 Dots

(SOR) Linear Support Vector Machine: Quadratic Programming Formulation

- Solve the following mathematical program:

- The quadratic term here maximizes the distance between the bounding planes in the space R^(n+1)

- Linear separating surface:

- The Wolfe Dual for the SOR Linear SVM is:

- Linear separating surface:

- Substitute in a kernel for the AA’ term to obtain a nonlinear separating surface.
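For reference, a quadratic program of the kind this slide describes, together with its Wolfe dual, can be written as below. The symbols $w$, $\gamma$, $\nu$ and $K$ are assumptions introduced here; the dual follows from setting the gradient of the Lagrangian with respect to $w$, $\gamma$ and $y$ to zero.

```latex
% Primal: the offset \gamma is included in the quadratic term,
% so the margin is measured in the (w, \gamma) space.
\[
\min_{w,\gamma,y} \; \nu\, e'y + \tfrac{1}{2}\,(w'w + \gamma^2)
\quad \text{s.t.} \quad
D(Aw - e\gamma) + y \ge e, \qquad y \ge 0 .
\]
% Stationarity gives w = A'Du and \gamma = -e'Du, with 0 <= u <= \nu e.
\[
\min_{u} \; \tfrac{1}{2}\, u'D(AA' + ee')Du - e'u
\quad \text{s.t.} \quad
0 \le u \le \nu e .
\]
% Separating surfaces recovered from the dual solution u:
\[
x'A'Du = -e'Du \quad \text{(linear)},
\qquad
K(x', A')\,Du = -e'Du \quad \text{(kernel substituted for } AA'\text{)} .
\]
```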

SVM Optimality Conditions

- Define
- Then dual SVM becomes much simpler!

- Gradient Projection necessary & sufficient optimality condition:

- The projection operator in this condition denotes projecting u onto the feasible region of the dual
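One way to write these steps out is sketched below. The names $H$, $E$, $\omega$, $\nu$ and the subscript $\#$ for the projection are assumptions introduced for illustration.

```latex
% Define H so that the dual Hessian factors as HH':
\[
H = D\,[\,A \;\; -e\,]
\quad\Longrightarrow\quad
HH' = D(AA' + ee')D .
\]
% The dual then reads:
\[
\min_{u} \; \tfrac{1}{2}\,\|H'u\|^2 - e'u
\quad \text{s.t.} \quad
0 \le u \le \nu e .
\]
% Gradient projection condition, for any \omega > 0 and any positive diagonal matrix E
% (for example the diagonal of HH'):
\[
u = \bigl(u - \omega E^{-1}(HH'u - e)\bigr)_{\#},
\qquad
\bigl((z)_{\#}\bigr)_i = \min\{\max\{z_i, 0\}, \nu\} .
\]
```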

SOR Algorithm & Convergence

- Above optimality conditions lead to the SOR algorithm:

- Remember, optimality conditions are expressed as:

- SOR Linear Convergence [Luo-Tseng 1993]:
- The iterates of the SOR algorithm converge R-linearly to a solution of the dual problem
- The objective function values converge Q-linearly to the objective value at that solution
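The projected SOR iteration can be sketched in a few lines of pure Python for the linear kernel. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the parameters `nu`, `omega`, `tol`, `max_iter`, and the stopping rule are all chosen here. It sweeps through the dual variables one at a time, always using the most recently updated values, and clips each update to the interval [0, nu].

```python
def sor_linear_svm(A, d, nu=1.0, omega=1.0, tol=1e-6, max_iter=1000):
    """Projected SOR sketch for: min 0.5*||H'u||^2 - e'u, 0 <= u <= nu,
    where H = D[A  -e]. A is a list of points, d a list of +1/-1 labels.
    Assumes 0 < omega < 2. Returns (w, gamma, u)."""
    m, n = len(A), len(A[0])
    # Row i of H is d_i * [A_i, -1].
    H = [[d[i] * a for a in A[i]] + [-d[i]] for i in range(m)]
    diag = [sum(h * h for h in row) for row in H]  # diagonal of HH', always >= 1
    u = [0.0] * m
    v = [0.0] * (n + 1)  # running value of H'u = [w; gamma]
    for _ in range(max_iter):
        max_change = 0.0
        for j in range(m):
            # j-th component of the gradient HH'u - e, using the latest u.
            grad = sum(h * x for h, x in zip(H[j], v)) - 1.0
            new_uj = min(max(u[j] - omega * grad / diag[j], 0.0), nu)
            delta = new_uj - u[j]
            if delta != 0.0:
                for k in range(n + 1):
                    v[k] += delta * H[j][k]
                u[j] = new_uj
                max_change = max(max_change, abs(delta))
        if max_change < tol:  # assumed stopping rule: a sweep barely changes u
            break
    return v[:n], v[n], u  # w = A'Du, gamma = -e'Du


def predict(x, w, gamma):
    """Classify x by the side of the plane x'w = gamma it falls on."""
    return 1 if sum(a * b for a, b in zip(x, w)) - gamma > 0 else -1


if __name__ == "__main__":
    A = [[2.0, 2.0], [3.0, 3.0], [-2.0, -2.0], [-3.0, -3.0]]
    d = [1, 1, -1, -1]
    w, gamma, u = sor_linear_svm(A, d, nu=10.0)
    print(w, gamma, [predict(x, w, gamma) for x in A])
```

Because each update touches only one row of H and the short vector `v`, the data can be processed one point at a time, which is what makes this style of iteration attractive for very large problems.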

Numerical Testing

- Comparison of Linear & Nonlinear Kernels using
  - Linear Programming
  - Quadratic Programming - SOR Formulations
- Data Sets:
  - UCI Liver Disorders: 345 points in R^6
  - Bell Labs Checkerboard: 1000 points in R^2
  - Gaussian Synthetic: 1000 points in R^32
  - SCDS Synthetic: 1 million points in R^32
  - Massive Synthetic: 10 million points in R^32
- Machines:
  - Cluster of 4 Sun Enterprise E6000 machines, each with 16 UltraSPARC II 250 MHz processors and 2 GB RAM
  - Total: 64 processors, 8 GB RAM

Comparison of Linear & Nonlinear SVMs: Linear Programming Generated

- Nonlinear kernels yield better training and testing set correctness

SOR Results

- Comparison of linear and nonlinear kernels

- Examples of training on massive data:
  - 1 million point dataset generated by SCDS generator:
    - Trained completely in 9.7 hours
    - Tuning set reached 99.7% of final accuracy in 0.3 hours
  - 10 million point randomly generated dataset:
    - Tuning set reached 95% of final accuracy in 14.3 hours
    - Under 10,000 iterations

Conclusions

- Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
- Nonlinear separating surfaces improve generalization over linear ones
- SOR can handle very large problems not (easily) solvable by other methods
- SOR scales up with virtually no changes
- Future directions
  - Parallel SOR for very large problems not resident in memory
  - Massive multicategory discrimination via SOR
  - Support vector regression
