Support Vector Machines: Hype or Hallelujah?

### Support Vector Machines: Hype or Hallelujah?

Kristin Bennett

Math Sciences Dept

Rensselaer Polytechnic Inst.

http://www.rpi.edu/~bennek

M2000

Outline
• Support Vector Machines for Classification
• Linear Discrimination
• Nonlinear Discrimination
• Extensions
• Application in Drug Design
• Hallelujah
• Hype

Support Vector Machines (SVM)

Key Ideas:

• “Maximize Margins”
• “Do the Dual”
• “Construct Kernels”

A methodology for inference based on Vapnik’s Statistical Learning Theory.

Find the closest points of the two convex hulls using a quadratic program.

Many existing and new solvers.
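
The quadratic program on this slide did not survive in the transcript. A standard way to write the closest-points problem, offered here as an assumed reconstruction, is shown below. The index sets A and B for the two classes are labels introduced only for this sketch.

```latex
\min_{\alpha}\ \tfrac{1}{2}\,\Bigl\| \sum_{i \in A} \alpha_i x_i - \sum_{j \in B} \alpha_j x_j \Bigr\|_2^2
\quad \text{s.t.} \quad
\sum_{i \in A} \alpha_i = 1, \quad \sum_{j \in B} \alpha_j = 1, \quad \alpha \ge 0
```

Each sum is a point in the convex hull of one class, so the objective is half the squared distance between the two hulls.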

Best Linear Separator: Supporting Plane Method

Maximize the distance between two parallel supporting planes.

Distance = “Margin”
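
The slide's own expression for the margin is missing from the transcript. Under the common normalization in which the two supporting planes are written as w·x = b + 1 and w·x = b − 1 (an assumption here, including the sign convention for b), the margin and the resulting quadratic program can be sketched as:

```latex
\begin{aligned}
&\text{planes: } w \cdot x = b + 1 \ \text{ and } \ w \cdot x = b - 1
 \quad\Longrightarrow\quad \text{margin} = \frac{2}{\|w\|_2} \\
&\min_{w,\,b}\ \tfrac{1}{2}\,\|w\|_2^2
 \quad \text{s.t.} \quad y_i\,(w \cdot x_i - b) \ge 1, \quad i = 1, \dots, m
\end{aligned}
```

Here y_i is +1 or −1 according to the class of x_i, so maximizing 2/‖w‖ is the same as minimizing ‖w‖².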

Dual of Closest Points Method is Support Plane Method

Solution only depends on support vectors:
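
The formula that followed on the slide is not in the transcript. In the standard Wolfe dual of the supporting-plane problem (assumed here, with labels y_i = ±1), the dependence on support vectors reads:

```latex
\begin{aligned}
&\max_{\alpha}\ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i} \sum_{j} y_i y_j \alpha_i \alpha_j \, x_i \cdot x_j
 \quad \text{s.t.} \quad \sum_{i} y_i \alpha_i = 0, \quad \alpha_i \ge 0 \\
&w = \sum_{i} y_i \alpha_i x_i, \qquad \alpha_i > 0 \ \text{only for points lying on a supporting plane}
\end{aligned}
```

The constraint Σ y_i α_i = 0 forces the multipliers of the two classes to have the same total, so after rescaling each class to sum to 1, w points from the closest point of one hull to the closest point of the other.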

Statistical Learning Theory
• Misclassification error and function complexity together bound the generalization error.
• Maximizing margins minimizes complexity.
• “Eliminates” overfitting.
• Solution depends only on the support vectors, not on the number of attributes.

Margins and Complexity

A skinny margin is more flexible, thus more complex.

Margins and Complexity

A fat margin is less complex.

Linearly Inseparable Case

Convex hulls intersect! The same argument won’t work.

Reduced Convex Hulls Don’t Intersect

Reduce each convex hull by placing an upper bound D on its multipliers.
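
One common definition of the reduced convex hull of a class, given here as an assumed sketch because the slide's formula is missing (the index set A is a label introduced for illustration):

```latex
\Bigl\{\, \sum_{i \in A} \alpha_i x_i \ :\ \sum_{i \in A} \alpha_i = 1, \quad 0 \le \alpha_i \le D \,\Bigr\},
\qquad D < 1
```

With D = 1 this is the ordinary convex hull. With D < 1 no single point can carry all the weight, so the hull shrinks toward the interior of the class.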

Find Closest Points Then Bisect

No change except for D.

D determines the number of support vectors.

Linearly Inseparable Case: Supporting Plane Method

Just add a non-negative error vector z.
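
A sketch of the resulting soft-margin problem in its usual form. The trade-off parameter C and the sign convention w·x − b are assumptions here, since the slide's formula is not in the transcript.

```latex
\min_{w,\,b,\,z}\ \tfrac{1}{2}\,\|w\|_2^2 + C \sum_{i=1}^{m} z_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i - b) + z_i \ge 1, \quad z_i \ge 0, \quad i = 1, \dots, m
```

Each z_i measures how far point i falls on the wrong side of its supporting plane, and C sets the price of those violations relative to the margin term.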

Dual of Closest Points Method is Support Plane Method

Solution only depends on support vectors:
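
In the standard soft-margin dual (assumed here, with C the usual penalty parameter on the error vector), the only change from the separable case is an upper bound on each multiplier:

```latex
0 \le \alpha_i \le C, \qquad \sum_{i} y_i \alpha_i = 0, \qquad w = \sum_{i} y_i \alpha_i x_i
```

Points with α_i = 0 drop out of the sum for w, which is why only the support vectors matter.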

Nonlinear Classification: Map to higher dimensional space

IDEA: Map each point to a higher dimensional feature space and construct a linear discriminant in that space.

Dual SVM becomes:
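
The formula is missing from the transcript. Because the data enter the standard dual only through inner products, the mapped problem can be sketched by replacing x with Φ(x). The symbol Φ for the feature map and the bound C are assumptions of this sketch.

```latex
\max_{\alpha}\ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i} \sum_{j} y_i y_j \alpha_i \alpha_j \, \Phi(x_i) \cdot \Phi(x_j)
\quad \text{s.t.} \quad \sum_{i} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C
```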

Generalized Inner Product

By Hilbert-Schmidt kernels (Courant and Hilbert 1953),

Φ(u) · Φ(v) = K(u, v)

for certain Φ and K, e.g.
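
The slide's list of example kernels did not survive. As a hedged illustration, the sketch below defines two widely used kernels (polynomial and Gaussian RBF, which may or may not be the ones shown on the slide) and checks numerically that the degree-2 polynomial kernel equals an inner product in an explicit feature space. The function names and the 2-D feature map `phi_quadratic` are hypothetical helpers written for this example.

```python
import math


def dot(u, v):
    """Ordinary inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(u, v, degree=2):
    """Polynomial kernel K(u, v) = (u . v + 1)^degree."""
    return (dot(u, v) + 1.0) ** degree


def rbf_kernel(u, v, sigma=1.0):
    """Gaussian RBF kernel K(u, v) = exp(-||u - v||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def phi_quadratic(x):
    """Explicit feature map whose inner product is the degree-2 polynomial kernel (2-D inputs only)."""
    x1, x2 = x
    r2 = math.sqrt(2.0)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]


# Illustrative points (made up for this example).
u = (1.0, 2.0)
v = (3.0, -1.0)
kernel_value = poly_kernel(u, v)                        # computed in 2 dimensions
mapped_value = dot(phi_quadratic(u), phi_quadratic(v))  # computed in 6 dimensions
```

The two values agree up to rounding, which is the point of the kernel: the 6-dimensional inner product is obtained without ever forming the mapped vectors.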

Final Classification via Kernels

The Dual SVM becomes:
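
The slide's formula is not in the transcript. Substituting a kernel K(x_i, x_j) for the feature-space inner product in the standard dual gives the sketch below. The bound C and the sign convention for b are assumptions.

```latex
\begin{aligned}
&\max_{\alpha}\ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i} \sum_{j} y_i y_j \alpha_i \alpha_j \, K(x_i, x_j)
 \quad \text{s.t.} \quad \sum_{i} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C \\
&f(x) = \operatorname{sign}\Bigl( \sum_{i} y_i \alpha_i \, K(x_i, x) - b \Bigr)
\end{aligned}
```

Neither the training problem nor the classifier needs the feature map itself, only kernel evaluations.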

Final SVM Algorithm
• Solve Dual SVM QP
• Recover primal variable b
• Classify new x

Solution only depends on support vectors:
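
The three steps above can be sketched in a few lines of plain Python. This is a minimal illustration, not the solver used in the talk: it solves the dual by cycling over pairs of multipliers with the analytic two-variable update familiar from SMO-style methods, assumes the classifier form sign(Σ y_i α_i K(x_i, x) − b), and runs on a made-up four-point dataset. All function names and parameter values are hypothetical.

```python
def linear_kernel(u, v):
    """Plain inner product; any other kernel function can be passed instead."""
    return sum(a * b for a, b in zip(u, v))


def train_dual_svm(X, y, kernel, C=10.0, tol=1e-9, max_passes=1000):
    """Steps 1 and 2: solve the dual QP pair by pair, then recover b."""
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n

    def g(i):
        # Kernel expansion at x_i without the threshold b.
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n))

    for _ in range(max_passes):
        changed = False
        for i in range(n):
            for j in range(i + 1, n):
                eta = K[i][i] + K[j][j] - 2.0 * K[i][j]
                if eta <= 1e-12:
                    continue  # no curvature along this pair, skip it
                # Interval that keeps 0 <= alpha <= C and sum_i y_i alpha_i = 0.
                if y[i] != y[j]:
                    low = max(0.0, alpha[j] - alpha[i])
                    high = min(C, C + alpha[j] - alpha[i])
                else:
                    low = max(0.0, alpha[i] + alpha[j] - C)
                    high = min(C, alpha[i] + alpha[j])
                # Difference of prediction errors; b cancels out of it.
                diff = (g(i) - y[i]) - (g(j) - y[j])
                new_j = min(high, max(low, alpha[j] + y[j] * diff / eta))
                if abs(new_j - alpha[j]) > tol:
                    alpha[i] += y[i] * y[j] * (alpha[j] - new_j)
                    alpha[j] = new_j
                    changed = True
        if not changed:
            break

    # Step 2: support vectors strictly inside (0, C) satisfy g(i) - b = y[i].
    free = [i for i in range(n) if tol < alpha[i] < C - tol]
    if not free:
        raise ValueError("no unbounded support vector; b is not determined this way")
    b = sum(g(i) - y[i] for i in free) / len(free)
    return alpha, b


def classify(x, X, y, alpha, b, kernel):
    """Step 3: only points with alpha > 0 (the support vectors) contribute."""
    score = sum(a * yi * kernel(xi, x) for a, yi, xi in zip(alpha, y, X) if a > 0.0) - b
    return 1 if score >= 0 else -1


# Toy data, made up for illustration: two points per class in the plane.
X = [(2.0, 2.0), (3.0, 3.0), (0.0, 0.0), (-1.0, 0.0)]
y = [1, 1, -1, -1]
alpha, b = train_dual_svm(X, y, linear_kernel)
```

On this toy set only (2, 2) and (0, 0) end up with nonzero multipliers (0.25 each) and b comes out as 1, so the other two points could be deleted without changing the classifier. For real problems a dedicated QP or SVM solver is the better choice; exhaustive pair cycling is only practical for a handful of points.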

Support Vector Machines (SVM)
• Key Formulation Ideas:
  • “Maximize Margins”
  • “Do the Dual”
  • “Construct Kernels”
• Generalization Error Bounds
• Practical Algorithms

SVM Extensions
• Regression
• Variable Selection
• Boosting
• Density Estimation
• Unsupervised Learning
• Novelty/Outlier Detection
• Feature Detection
• Clustering

Example in Drug Design
• Goal: predict the bio-reactivity of molecules to decrease drug development time.
• Target is to predict the logarithm of inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule.
• Construct a quantitative structure-activity relationship (QSAR) model.

LCCKA Problem
• Training data – 66 molecules
• 323 original attributes are wavelet coefficients of TAE Descriptors.
• A subset of 39 attributes was selected by a linear 1-norm SVM (with no kernels).
• For details see DDASSL project link off of http://www.rpi.edu/~bennek.
• Testing set results reported.

LCCK Prediction

Q² = 0.25
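
The slide reports Q² without defining it. The sketch below assumes one convention used in QSAR work: the sum of squared test-set prediction errors divided by the sum of squared deviations of the test targets from their mean, so that smaller is better and 0 is a perfect fit. Other sources report one minus this ratio, and the transcript does not say which was meant. The function name and the sample numbers are hypothetical.

```python
def q_squared(y_true, y_pred):
    """Squared prediction error relative to the spread of the true values (assumed convention)."""
    mean_y = sum(y_true) / len(y_true)
    press = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    total = sum((t - mean_y) ** 2 for t in y_true)
    return press / total


# Made-up targets, only to show the two reference points of the scale.
y_true = [1.0, 2.0, 3.0, 4.0]
perfect = q_squared(y_true, y_true)                    # exact predictions
baseline = q_squared(y_true, [2.5, 2.5, 2.5, 2.5])     # always predicting the mean
```

Under this convention a perfect model scores 0 and a model that always predicts the mean scores 1.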

Many Other Applications
• Speech Recognition
• Database Marketing
• Quark Flavors in High Energy Physics
• Dynamic Object Recognition
• Knock Detection in Engines
• Protein Sequence Problem
• Text Categorization
• Breast Cancer Diagnosis
• See: http://www.clopinet.com/isabelle/Projects/SVM/applist.html

Hallelujah!
• Generalization theory and practice meet
• General methodology for many types of problems
• Same Program + New Kernel = New method
• No problems with local minima
• Few model parameters. Selects capacity.
• Robust optimization methods.
• Successful Applications

BUT…

HYPE?
• Will SVMs beat my best hand-tuned method Z for X?
• Do SVMs scale to massive datasets?
• How to choose C and the kernel?
• What is the effect of attribute scaling?
• How to handle categorical variables?
• How to incorporate domain knowledge?
• How to interpret results?

Support Vector Machine Resources
• http://www.support-vector.net/
• http://www.kernel-machines.org/
• Links off my web page: http://www.rpi.edu/~bennek
