Nonlinear data discrimination via generalized support vector machines



Nonlinear Data Discrimination via Generalized Support Vector Machines

David R. Musicant and Olvi L. Mangasarian

University of Wisconsin - Madison

www.cs.wisc.edu/~musicant


Outline

  • The linear support vector machine (SVM)

    • Linear kernel

  • Generalized support vector machine (GSVM)

    • Nonlinear indefinite kernel

  • Linear Programming Formulation of GSVM

    • MINOS

  • Quadratic Programming Formulation of GSVM

    • Successive Overrelaxation (SOR)

  • Numerical comparisons

  • Conclusions


The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case

(Figure: point sets A+ and A- on opposite sides of the separating surface x'w = γ)



The Discrimination Problem: The Fundamental 2-Category Linearly Separable Case (continued)

  • Given m points in the n-dimensional space R^n

  • Represented by an m x n matrix A

  • Membership of each point A_i in class +1 or -1 is specified by:

    • An m x m diagonal matrix D with +1 or -1 along its diagonal


Preliminary Attempt at the (Linear) Support Vector Machine: Robust Linear Programming

  • Solve the following mathematical program:

    min_{w,γ,y} e'y  subject to  D(Aw - eγ) + y ≥ e,  y ≥ 0

    where y = nonnegative error (slack) vector

  • Note: y = 0 if the convex hulls of A+ and A- do not intersect.

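As a sketch, this robust LP can be solved with an off-the-shelf LP solver. The version below assumes the standard form min e'y subject to D(Aw - eγ) + y ≥ e, y ≥ 0, and uses SciPy's `linprog` purely for illustration; it is not the authors' MINOS-based implementation.

```python
import numpy as np
from scipy.optimize import linprog

def robust_lp(A, d):
    """Robust LP:  min e'y  s.t.  D(Aw - e*gamma) + y >= e,  y >= 0.
    d is the vector of +/-1 class labels (the diagonal of D)."""
    m, n = A.shape
    # variable order: [w (n entries), gamma (1 entry), y (m entries)]
    c = np.concatenate([np.zeros(n + 1), np.ones(m)])
    # D(Aw - e*gamma) + y >= e   <=>   -(d_i A_i) w + d_i gamma - y_i <= -1
    A_ub = np.hstack([-d[:, None] * A, d[:, None], -np.eye(m)])
    b_ub = -np.ones(m)
    bounds = [(None, None)] * (n + 1) + [(0, None)] * m  # only y is sign-constrained
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    w, gamma, y = res.x[:n], res.x[n], res.x[n + 1:]
    return w, gamma, y
```

On a linearly separable toy set the slack y comes out zero, matching the note above.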

The (Linear) Support Vector Machine: Maximize Margin Between Separating Planes

(Figure: point sets A+ and A- with the margin between the bounding planes maximized)


The (Linear) Support Vector Machine Formulation

  • Solve the following mathematical program:

    min_{w,γ,y} νe'y + (1/2)w'w  subject to  D(Aw - eγ) + y ≥ e,  y ≥ 0

    where y = nonnegative error (slack) vector

  • Note: y = 0 if the convex hulls of A+ and A- do not intersect.


GSVM: Generalized Support Vector Machine: Linear Programming Formulation

  • By "duality", set w = A'Du (linear separating surface: x'AA'Du = γ)

  • Nonlinear Support Vector Machine: replace AA' by a nonlinear kernel K(A, A'). Nonlinear separating surface: K(x', A')Du = γ

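Given dual multipliers u and a threshold γ, a new point is classified by which side of the surface K(x', A')Du = γ it falls on. A minimal sketch of that decision rule (function and variable names are illustrative, and any kernel can be plugged in):

```python
import numpy as np

def gsvm_classify(x, A, D, u, gamma, kernel):
    # sign of K(x', A') D u - gamma decides the class (+1 or -1) of x
    k = kernel(x.reshape(1, -1), A)          # 1 x m row of kernel values
    return float(np.sign((k @ (D @ u)).item() - gamma))

# with the linear kernel K(X, Y) = XY', this recovers the linear surface x'AA'Du = gamma
def linear_kernel(X, Y):
    return X @ Y.T
```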

Examples of Kernels

  • Examples

    • Polynomial Kernel: (AA' + μ)^d

      • ^d denotes componentwise exponentiation, as in MATLAB

    • Radial Basis Kernel: exp(-μ‖A_i' - A_j'‖²), i, j = 1, ..., m

    • Neural Network Kernel: (AA' + μ)_*

      • (·)_* denotes the step function applied componentwise

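As a sketch, all three kernels fit in a few lines of NumPy (the parameter names mu, degree, and c are illustrative defaults, not values from the talk):

```python
import numpy as np

def polynomial_kernel(A, B, degree=2, c=1.0):
    # (AB' + c)^degree, the power taken componentwise (MATLAB-style .^)
    return (A @ B.T + c) ** degree

def rbf_kernel(A, B, mu=1.0):
    # exp(-mu * ||A_i - B_j||^2) for every pair of rows of A and B
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * sq)

def neural_net_kernel(A, B, c=0.0):
    # step function of AB' + c, applied componentwise
    return (A @ B.T + c >= 0).astype(float)
```

Note that none of these matrices need be positive semidefinite, which is exactly the point of the GSVM's indefinite-kernel formulation.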

A Nonlinear Kernel Application: Checkerboard Training Set: 1000 Points in R^2; Separate 486 Asterisks from 514 Dots




Large Margin Classifier: (SOR) Reformulation in (w, γ) Space

(Figure: point sets A+ and A- with bounding planes; the margin is measured with respect to both w and γ)


(SOR) Linear Support Vector Machine: Quadratic Programming Formulation

  • Solve the following mathematical program:

    min_{w,γ,y} νe'y + (1/2)(w'w + γ²)  subject to  D(Aw - eγ) + y ≥ e,  y ≥ 0

  • The quadratic term here maximizes the distance between the bounding planes in (w, γ) space


Introducing a Nonlinear Kernel

  • The Wolfe Dual for the SOR Linear SVM is:

    min_u (1/2)u'D(AA' + ee')Du - e'u  subject to  0 ≤ u ≤ νe

  • Linear separating surface: x'A'Du = γ, with w = A'Du and γ = -e'Du

  • Replacing AA' by a nonlinear kernel K(A, A') yields the nonlinear separating surface K(x', A')Du = γ


SVM Optimality Conditions

  • Define M = D(AA' + ee')D, so the dual objective is f(u) = (1/2)u'Mu - e'u

  • Then the dual SVM becomes much simpler!

  • Gradient Projection necessary & sufficient optimality condition:

    u = (u - ω(Mu - e))_#  for any ω > 0

  • (·)_# denotes projecting u onto the region 0 ≤ u ≤ νe

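The projection condition is easy to test numerically. A sketch, treating the region 0 ≤ u ≤ νe as the box [0, C] (the names C, omega, and tol are illustrative, not from the talk):

```python
import numpy as np

def project_box(u, C):
    # projection of u onto the box 0 <= u <= C, componentwise
    return np.clip(u, 0.0, C)

def is_dual_optimal(u, M, C, omega=1.0, tol=1e-8):
    # checks the fixed-point condition  u = (u - omega * grad f(u))_#
    # for the dual objective f(u) = 0.5 u'Mu - e'u
    grad = M @ u - np.ones_like(u)
    return bool(np.linalg.norm(u - project_box(u - omega * grad, C)) < tol)
```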

SOR Algorithm & Convergence

  • The above optimality conditions lead to the SOR algorithm: sweep through the components, updating each with the latest values of the others:

    u_i ← (u_i - (ω/M_ii)(M_i u - 1))_# ,  i = 1, ..., m

  • Remember, the optimality conditions are expressed as: u = (u - ω(Mu - e))_#

  • SOR Linear Convergence [Luo-Tseng 1993]:

    • The iterates of the SOR algorithm converge R-linearly to a solution ū of the dual problem

    • The objective function values converge Q-linearly to f(ū)

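The componentwise sweep can be sketched as a projected Gauss-Seidel loop. This toy version holds M in memory with a fixed sweep count (omega and the box bound C are illustrative), whereas the point of SOR in the talk is that the same update also works one row at a time on massive data:

```python
import numpy as np

def sor_dual_svm(M, C, omega=1.3, sweeps=200):
    """Projected SOR for  min 0.5 u'Mu - e'u  over the box 0 <= u <= C.
    Each component is updated using the freshest values of the others,
    then clipped back into the box."""
    u = np.zeros(M.shape[0])
    for _ in range(sweeps):
        for i in range(M.shape[0]):
            step = omega * (M[i] @ u - 1.0) / M[i, i]
            u[i] = min(C, max(0.0, u[i] - step))
    return u
```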

Numerical Testing

  • Comparison of Linear & Nonlinear Kernels using

    • Linear Programming

    • Quadratic Programming - SOR Formulations

  • Data Sets:

    • UCI Liver Disorders: 345 points in R^6

    • Bell Labs Checkerboard: 1000 points in R^2

    • Gaussian Synthetic: 1000 points in R^32

    • SCDS Synthetic: 1 million points in R^32

    • Massive Synthetic: 10 million points in R^32

  • Machines:

    • Cluster of 4 Sun Enterprise E6000 machines each consisting of 16 UltraSPARC II 250 MHz Processors with 2 Gig RAM

      • Total: 64 Processors, 8 Gig RAM


Comparison of Linear & Nonlinear SVMs: Linear Programming Generated

  • Nonlinear kernels yield better training and testing set correctness


SOR Results

  • Comparison of linear and nonlinear kernels

  • Examples of training on massive data:

    • 1 million point dataset generated by SCDS generator:

      • Trained completely in 9.7 hours

      • Tuning set reached 99.7% of final accuracy in 0.3 hours

    • 10 million point randomly generated dataset:

      • Tuning set reached 95% of final accuracy in 14.3 hours

      • Under 10,000 iterations


Conclusions

  • Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs

  • Nonlinear separating surfaces improve generalization over linear ones

  • SOR can handle very large problems not (easily) solvable by other methods

  • SOR scales up with virtually no changes

  • Future directions

    • Parallel SOR for very large problems not resident in memory

    • Massive multicategory discrimination via SOR

    • Support vector regression