
Proximal Support Vector Machine Classifiers KDD 2001 San Francisco August 26-29, 2001


Presentation Transcript


  1. Proximal Support Vector Machine Classifiers. KDD 2001, San Francisco, August 26-29, 2001. Glenn Fung & Olvi Mangasarian, Data Mining Institute, University of Wisconsin - Madison

  2. Key Contributions • Fast new support vector machine classifier • An order of magnitude faster than standard classifiers • Extremely simple to implement • 4 lines of MATLAB code • NO optimization packages (LP,QP) needed

  3. Outline of Talk • (Standard) Support vector machines (SVM) • Classify by halfspaces • Proximal support vector machines (PSVM) • Classify by proximity to planes • Linear PSVM classifier • Nonlinear PSVM classifier • Full and reduced kernels • Numerical results • Correctness comparable to standard SVM • Much faster classification! • 2-million points in 10-space in 21 seconds • Compared to over 10 minutes for standard SVM

  4. Support Vector Machines: Maximizing the Margin between Bounding Planes [Figure: the two bounding planes and the margin between classes A+ and A-]

  5. Proximal Support Vector Machines: Fitting the Data Using Two Parallel Bounding Planes [Figure: parallel proximal planes fitted to classes A+ and A-]

  6. Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case • Given $m$ points in $n$-dimensional space • Represented by an $m \times n$ matrix $A$ • Membership of each point in class $+1$ or $-1$ specified by: • An $m \times m$ diagonal matrix $D$ with $+1$ & $-1$ entries • Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$: $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$ • More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
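
  To make the notation concrete, here is a minimal MATLAB sketch (not from the slides) that builds $A$, $D$, and $e$ for a toy two-class problem and checks the separation condition above; the data and the candidate plane (w0, gamma0) are made up for illustration:

  % Toy data: m = 4 points in n = 2 dimensions (illustrative only)
  A = [2 3; 3 4; -1 -2; -2 -1];    % m-by-n matrix of points
  labels = [1; 1; -1; -1];         % class membership in {+1, -1}
  D = diag(labels);                % m-by-m diagonal matrix D
  e = ones(4, 1);                  % vector of ones
  w0 = [1; 1]; gamma0 = 0;         % hypothetical plane x'*w0 = gamma0
  all(D*(A*w0 - e*gamma0) >= e)    % true iff D(Aw - e*gamma) >= e holds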

  7. Standard Support Vector Machine Formulation • Solve the quadratic program for some $\nu > 0$: (QP) $\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$, where $D_{ii} = \pm 1$ denotes $A+$ or $A-$ membership • The margin $\tfrac{2}{\|w\|}$ is maximized by minimizing $\tfrac{1}{2}\|w\|^2$

  8. PSVM Formulation. We have from the QP SVM formulation: (QP) $\min_{w,\gamma,y} \; \nu e'y + \tfrac{1}{2} w'w$ s.t. $D(Aw - e\gamma) + y \ge e$, $y \ge 0$. Make the inequality an equality, penalize the squared 2-norm of $y$, and regularize $\gamma$ as well: $\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$ s.t. $D(Aw - e\gamma) + y = e$. Solving for $y$ in terms of $w$ and $\gamma$ gives $y = e - D(Aw - e\gamma)$, and substituting yields the unconstrained problem: $\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$. This simple but critical modification changes the nature of the optimization problem tremendously!

  9. Advantages of New Formulation • Objective function remains strongly convex • An explicit exact solution can be written in terms of the problem data • PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space • Exact leave-one-out correctness can be obtained in terms of the problem data

  10. Linear PSVM. We want to solve: $\min_{w,\gamma} \; \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$ • Setting the gradient equal to zero gives a nonsingular system of linear equations • Solution of the system gives the desired PSVM classifier
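
  Spelling out the gradient step (a reconstruction consistent with the solution on the next slide, using $H = [A \;\; {-e}]$ and $r = [w; \gamma]$): the unconstrained objective is $f(r) = \tfrac{\nu}{2}\|e - DHr\|^2 + \tfrac{1}{2}\|r\|^2$, and since $D'D = I$, $\nabla f(r) = -\nu H'D(e - DHr) + r = (I + \nu H'H)r - \nu H'De = 0$, i.e. $\left(\tfrac{I}{\nu} + H'H\right) r = H'De$, whose matrix is symmetric positive definite and hence nonsingular.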

  11. Linear PSVM Solution: $\begin{bmatrix} w \\ \gamma \end{bmatrix} = \left(\tfrac{I}{\nu} + H'H\right)^{-1} H'De$ • Here, $H = [A \;\; {-e}]$ • The linear system to solve depends on $H'H$, which is of size $(n+1) \times (n+1)$ • $n+1$ is usually much smaller than $m$

  12. Linear Proximal SVM Algorithm • Input: $A$, $D$, $\nu$ • Define $H = [A \;\; {-e}]$ • Calculate $v = H'De$ • Solve $\left(\tfrac{I}{\nu} + H'H\right) r = v$ for $r = [w; \gamma]$ • Classifier: $\operatorname{sign}(x'w - \gamma)$

  13. Nonlinear PSVM Formulation. Linear PSVM (linear separating surface $x'w = \gamma$): $\min_{w,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$ s.t. $D(Aw - e\gamma) + y = e$. By QP "duality", $w = A'Du$; maximizing the margin in the "dual space" gives: $\min_{u,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(u'u + \gamma^2)$ s.t. $D(AA'Du - e\gamma) + y = e$ • Replace $AA'$ by a nonlinear kernel $K(A, A')$: $\min_{u,\gamma,y} \; \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(u'u + \gamma^2)$ s.t. $D(K(A,A')Du - e\gamma) + y = e$

  14. The Nonlinear Classifier • The nonlinear classifier: $\operatorname{sign}(K(x', A')Du - \gamma)$ • Where $K$ is a nonlinear kernel, e.g. the Gaussian (radial basis) kernel: $(K(A, A'))_{ij} = \exp(-\mu\|A_i - A_j\|^2)$, $i, j = 1, \ldots, m$ • The $ij$-entry of $K(A, A')$ represents the "similarity" of data points $A_i$ and $A_j$
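
  A minimal MATLAB sketch (not from the slides) of this Gaussian kernel; the function name and the width parameter mu are illustrative:

  function K = gaussian_kernel(A, B, mu)
  % GAUSSIAN_KERNEL: K(i,j) = exp(-mu*norm(A(i,:) - B(j,:))^2)
  m = size(A, 1); k = size(B, 1);
  K = zeros(m, k);
  for i = 1:m
      for j = 1:k
          K(i, j) = exp(-mu*norm(A(i, :) - B(j, :))^2);
      end
  end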

  15. Nonlinear PSVM. Similar to the linear case, setting the gradient equal to zero we obtain: $\begin{bmatrix} u \\ \gamma \end{bmatrix} = \left(\tfrac{I}{\nu} + H'H\right)^{-1} H'De$, defining $H$ slightly differently: $H = [K(A,A')D \;\; {-e}]$ • Here, the linear system to solve is of size $(m+1) \times (m+1)$ • However, reduced kernel techniques (RSVM) can be used to reduce the dimensionality.
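
  A sketch of the reduced-kernel idea from RSVM (the subset size is illustrative, echoing the 8124 x 215 rectangular kernel on slide 21; gaussian_kernel is the sketch from slide 14):

  % Rectangular reduced kernel: random subset Abar of mbar << m rows of A
  mbar = 215;                            % illustrative reduced size
  Abar = A(randperm(size(A, 1), mbar), :);
  Kbar = gaussian_kernel(A, Abar, mu);   % m-by-mbar instead of m-by-m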

  16. Nonlinear Proximal SVM Algorithm • Input: $A$, $D$, $\nu$ • Define $K = K(A, A')$ and $H = [KD \;\; {-e}]$ • Calculate $v = H'De$ • Solve $\left(\tfrac{I}{\nu} + H'H\right) r = v$ for $r = [u; \gamma]$ • Classifier: $\operatorname{sign}(K(x', A')Du - \gamma)$
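
  A direct transcription of these steps into MATLAB (a sketch mirroring the linear code on the next slide, under the reconstruction H = [K*D -e]; gaussian_kernel is the sketch from slide 14):

  function [u, gamma] = psvm_nonlinear(A, d, nu, mu)
  % Nonlinear PSVM sketch: solve (I/nu + H'*H)r = H'*D*e with H = [K*D -e]
  m = size(A, 1); e = ones(m, 1); D = diag(d);
  K = gaussian_kernel(A, A, mu);   % full m-by-m Gaussian kernel
  H = [K*D -e];
  v = (d'*H)';                     % v = H'*D*e
  r = (speye(m+1)/nu + H'*H)\v;    % one (m+1)-by-(m+1) linear solve
  u = r(1:m); gamma = r(m+1);
  % Classify a row vector x: sign(gaussian_kernel(x, A, mu)*(D*u) - gamma)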

  17. Linear & Nonlinear PSVM MATLAB Code

  function [w, gamma] = psvm(A, d, nu)
  % PSVM: linear and nonlinear classification
  % INPUT: A, d = diag(D), nu. OUTPUT: w, gamma
  % [w, gamma] = psvm(A, d, nu);
  [m, n] = size(A); e = ones(m, 1); H = [A -e];
  v = (d'*H)';                    % v = H'*D*e
  r = (speye(n+1)/nu + H'*H)\v;   % solve (I/nu + H'*H)r = v
  w = r(1:n); gamma = r(n+1);     % extract w, gamma from r
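
  A hypothetical usage example for the code above (data and parameter value made up for illustration):

  % Train on a toy 2-class problem and classify a new point
  A = [2 3; 3 4; -1 -2; -2 -1];   % 4 points in 2-space
  d = [1; 1; -1; -1];             % labels, d = diag(D)
  nu = 1;                         % regularization parameter (illustrative)
  [w, gamma] = psvm(A, d, nu);
  x = [2 2];                      % new point, as a row vector
  sign(x*w - gamma)               % predicted class: +1 or -1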

  18. Linear PSVM Comparisons with Other SVMs: Much Faster, Comparable Correctness

  19. Linear PSVM vs. LSVM on a 2-Million-Point Dataset: Over 30 Times Faster

  20. Nonlinear PSVM: Spiral Dataset, 94 Red Dots & 94 White Dots

  21. Nonlinear PSVM Comparisons. * A rectangular kernel of size 8124 x 215 was used

  22. Conclusion • PSVM is an extremely simple procedure for generating linear and nonlinear classifiers • The PSVM classifier is obtained by solving a single system of linear equations, in the usually small dimensional input space for a linear classifier • Comparable test set correctness to standard SVM • Much faster than standard SVMs: typically an order of magnitude faster

  23. Future Work • Extension of PSVM to multicategory classification • Massive data classification using an incremental PSVM • Parallel formulation and implementation of PSVM
