
Nonlinear Data Discriminationvia Generalized Support Vector Machines

David R. Musicant and Olvi L. Mangasarian

University of Wisconsin - Madison

www.cs.wisc.edu/~musicant

Outline
  • The linear support vector machine (SVM)
    • Linear kernel
  • Generalized support vector machine (GSVM)
    • Nonlinear indefinite kernel
  • Linear Programming Formulation of GSVM
    • MINOS
  • Quadratic Programming Formulation of GSVM
    • Successive Overrelaxation (SOR)
  • Numerical comparisons
  • Conclusions
The Discrimination Problem
The Fundamental 2-Category Linearly Separable Case

[Figure: two point clouds, A+ and A-, separated by a plane]

Separating surface: the plane x'w = γ
The Discrimination Problem
The Fundamental 2-Category Linearly Separable Case (continued)

  • Given m points in the n-dimensional space R^n
  • Represented by an m x n matrix A
  • Membership of each point A_i in the class +1 or -1 is specified by an m x m diagonal matrix D with +1's and -1's along its diagonal
  • Separate by two bounding planes, x'w = γ + 1 and x'w = γ - 1, such that:
    A_i w ≥ γ + 1 for D_ii = +1,   A_i w ≤ γ - 1 for D_ii = -1
  • More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
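The setup above is a few lines of NumPy. This is a minimal sketch with hypothetical data; the point values and variable names are illustrative, not from the slides:

```python
import numpy as np

# Hypothetical data: m = 4 points in R^2, one per row of the m x n matrix A
A = np.array([[1.0, 2.0],
              [2.0, 3.0],
              [-1.0, -2.0],
              [-2.0, -1.0]])

# Class membership of each point A_i, as +1 or -1
labels = np.array([1, 1, -1, -1])

# D is the m x m diagonal matrix with the +1/-1 labels on its diagonal
D = np.diag(labels.astype(float))

# Left-multiplying by D flips the rows of A that belong to class -1, which is
# what lets the separation constraints be written compactly as D(Aw - e*gamma) >= e
print(D @ A)
```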
Preliminary Attempt at the (Linear) Support Vector Machine:
Robust Linear Programming

  • Solve the following mathematical program:

    min (over w, γ, y)  e'y
    s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0

    where y = nonnegative error (slack) vector

  • Note: y = 0 if the convex hulls of A+ and A- do not intersect.
The (Linear) Support Vector Machine Formulation

  • Solve the following mathematical program:

    min (over w, γ, y)  νe'y + ||w||_1
    s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0

    where y = nonnegative error (slack) vector and ν > 0 trades off the separation error against the norm of w

  • Note: y = 0 if the convex hulls of A+ and A- do not intersect.
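The program can be prototyped with an off-the-shelf LP solver. The sketch below assumes the 1-norm of w as the regularization term (linearized by splitting w = p - q with p, q ≥ 0) and uses SciPy's `linprog`; the function and variable names are illustrative, not from the original slides:

```python
import numpy as np
from scipy.optimize import linprog

def linear_svm_lp(A, labels, nu=1.0):
    """1-norm linear SVM as an LP:
        min  nu*e'y + ||w||_1
        s.t. D(Aw - e*gamma) + y >= e,  y >= 0.
    The 1-norm is linearized by writing w = p - q with p, q >= 0."""
    m, n = A.shape
    D = np.diag(labels.astype(float))
    DA = D @ A
    De = D @ np.ones(m)
    # Variable order: p (n), q (n), gamma (1), y (m)
    c = np.concatenate([np.ones(2 * n), [0.0], nu * np.ones(m)])
    # D(A(p - q) - e*gamma) + y >= e, rewritten as A_ub x <= b_ub
    A_ub = np.hstack([-DA, DA, De[:, None], -np.eye(m)])
    b_ub = -np.ones(m)
    bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:n] - res.x[n:2 * n]
    gamma = res.x[2 * n]
    return w, gamma
```

On a linearly separable toy set the resulting plane classifies every training point correctly, i.e. sign(Aw - γ) reproduces the ±1 labels with y = 0.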
GSVM: Generalized Support Vector Machine
Linear Programming Formulation

  • Linear support vector machine (linear separating surface x'w = γ)
  • By "duality", set w = A'Du (linear separating surface x'A'Du = γ)
  • Nonlinear support vector machine: replace AA' by a nonlinear kernel K(A, A'). Nonlinear separating surface: K(x', A')Du = γ
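Once the dual multipliers u and the threshold γ are known, classifying a new point only requires kernel evaluations against the training rows. A minimal sketch of evaluating the surface K(x', A')Du = γ, where the RBF kernel choice and all names are illustrative assumptions:

```python
import numpy as np

def rbf_kernel(A, B, mu=0.5):
    # K(A, B)_ij = exp(-mu * ||A_i - B_j||^2), an assumed radial basis kernel
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * sq)

def gsvm_classify(X, A, labels, u, gamma, kernel=rbf_kernel):
    """Evaluate the nonlinear separating surface K(x', A') D u = gamma
    for each row x of X and return the predicted class +1 or -1."""
    D = np.diag(labels.astype(float))
    return np.sign(kernel(X, A) @ D @ u - gamma)
```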
Examples of Kernels

  • Polynomial Kernel: (AA' + μee')^d, where the exponent d is applied componentwise, as in MATLAB's .^ operator
  • Radial Basis Kernel: K(A, A')_ij = exp(-μ ||A_i - A_j||^2)
  • Neural Network Kernel: (AA' + μee')_*, where (·)_* denotes the step function applied componentwise
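All three kernels take a couple of lines of NumPy on row matrices A and B. The exact forms below (including where the μ shift enters) are a sketch of the standard versions, not copied from the slides:

```python
import numpy as np

def polynomial_kernel(A, B, mu=1.0, d=3):
    # (AB' + mu)^d with the power applied componentwise, like MATLAB's .^
    return (A @ B.T + mu) ** d

def radial_basis_kernel(A, B, mu=0.5):
    # K_ij = exp(-mu * ||A_i - B_j||^2) for every pair of rows
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-mu * sq)

def neural_network_kernel(A, B, mu=0.0):
    # Componentwise step function of AB' + mu
    return (A @ B.T + mu >= 0).astype(float)
```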
A Nonlinear Kernel Application
Checkerboard Training Set: 1000 Points in R^2
Separate 486 Asterisks from 514 Dots
(SOR) Linear Support Vector Machine
Quadratic Programming Formulation

  • Solve the following mathematical program:

    min (over w, γ, y)  νe'y + (1/2)(w'w + γ²)
    s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0

  • The quadratic term here maximizes the distance between the bounding planes in the (w, γ) space R^(n+1)
Introducing a Nonlinear Kernel

  • The Wolfe dual for the SOR linear SVM is:

    min (over u)  (1/2)u'D(AA' + ee')Du - e'u
    s.t.  0 ≤ u ≤ νe

  • Linear separating surface: x'w = γ, with w = A'Du and γ = -e'Du
  • Substitute a kernel K(A, A') for the AA' term:

    min (over u)  (1/2)u'D(K(A, A') + ee')Du - e'u
    s.t.  0 ≤ u ≤ νe

  • Nonlinear separating surface: K(x', A')Du = γ
SVM Optimality Conditions

  • Define Q = D(K(A, A') + ee')D
  • Then the dual SVM becomes much simpler:

    min (over u)  (1/2)u'Qu - e'u,  0 ≤ u ≤ νe

  • Gradient projection necessary & sufficient optimality condition:

    u = (u - ω(Qu - e))_#  for any ω > 0

  • (·)_# denotes projecting u onto the region 0 ≤ u ≤ νe
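The projection and the optimality test are each one line of NumPy. This sketch assumes the feasible region is the box 0 ≤ u ≤ νe; names are illustrative:

```python
import numpy as np

def project(u, nu):
    # (u)_# : componentwise projection onto the box [0, nu]
    return np.clip(u, 0.0, nu)

def is_dual_optimal(u, Q, nu, omega=1.0, tol=1e-8):
    """Gradient projection optimality test: u solves the dual iff
    u = (u - omega*(Qu - e))_# for any fixed omega > 0."""
    e = np.ones(len(u))
    return np.linalg.norm(u - project(u - omega * (Q @ u - e), nu)) <= tol
```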
SOR Algorithm & Convergence

  • The optimality conditions above lead to the SOR algorithm: sweep through the components i = 1, ..., m and update

    u_i ← (u_i - (ω/Q_ii)((Qu - e)_i))_#,  0 < ω < 2,

    using the components u_1, ..., u_(i-1) already updated in the current sweep (Gauss-Seidel with overrelaxation)
  • Remember, the optimality conditions are expressed as: u = (u - ω(Qu - e))_#
  • SOR Linear Convergence [Luo-Tseng 1993]:
    • The iterates u^k of the SOR algorithm converge R-linearly to a solution ū of the dual problem
    • The objective function values f(u^k) converge Q-linearly to f(ū)
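The component-wise sweep can be sketched in plain NumPy as follows. This is an in-memory illustration of the update with an assumed relaxation factor ω, not the out-of-core implementation used on the massive data sets:

```python
import numpy as np

def sor_dual_svm(Q, nu, omega=1.3, sweeps=200):
    """Solve  min 0.5*u'Qu - e'u  s.t. 0 <= u <= nu  by SOR:
    sweep through the components, take an overrelaxed coordinate
    gradient step using the freshest values of u (Gauss-Seidel style),
    and project each component back onto the box [0, nu]."""
    m = Q.shape[0]
    u = np.zeros(m)
    for _ in range(sweeps):
        for i in range(m):
            grad_i = Q[i] @ u - 1.0          # i-th component of Qu - e
            u[i] = min(max(u[i] - omega * grad_i / Q[i, i], 0.0), nu)
    return u
```

Only one component of u changes at a time, so each step needs a single row of Q; this is what makes the method attractive for data sets too large to process as a whole.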
Numerical Testing
  • Comparison of Linear & Nonlinear Kernels using
    • Linear Programming
    • Quadratic Programming - SOR Formulations
  • Data Sets:
    • UCI Liver Disorders: 345 points in R^6
    • Bell Labs Checkerboard: 1000 points in R^2
    • Gaussian Synthetic: 1000 points in R^32
    • SCDS Synthetic: 1 million points in R^32
    • Massive Synthetic: 10 million points in R^32
  • Machines:
    • Cluster of 4 Sun Enterprise E6000 machines, each consisting of 16 UltraSPARC II 250 MHz processors with 2 GB RAM
      • Total: 64 processors, 8 GB RAM
Comparison of Linear & Nonlinear SVMsLinear Programming Generated
  • Nonlinear kernels yield better training and testing set correctness
SOR Results
  • Comparison of linear and nonlinear kernels
  • Examples of training on massive data:
    • 1 million point dataset generated by SCDS generator:
      • Trained completely in 9.7 hours
      • Tuning set reached 99.7% of final accuracy in 0.3 hours
    • 10 million point randomly generated dataset:
      • Tuning set reached 95% of final accuracy in 14.3 hours
      • Under 10,000 iterations
Conclusions
  • Linear programming and successive overrelaxation can generate complex nonlinear separating surfaces via GSVMs
  • Nonlinear separating surfaces improve generalization over linear ones
  • SOR can handle very large problems not (easily) solvable by other methods
  • SOR scales up with virtually no changes
  • Future directions
    • Parallel SOR for very large problems not resident in memory
    • Massive multicategory discrimination via SOR
    • Support vector regression