
Support Vector Machines: A Survey

Qiang Yang, for ICDM 2006 Panel

http://www.cse.ust.hk/~qyang

Partially based on slides from Prof Andrew Moore at CMU: http://www.cs.cmu.edu/~awm/tutorials

For ICDM Panel on 10 Best Algorithms

Problem:

Given a set of objects described by known feature variables and labeled with known classes:

{(Xi, yi), i = 1, 2, …, t}

Find a discriminative function f(x) such that f(Xi) = yi.
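In code, the setup looks like this: a minimal sketch with an invented toy dataset and a hand-picked w and b (numpy assumed), checking that the linear discriminant f(x) = sign(w·x − b) reproduces the labels.

```python
import numpy as np

# Toy training set {(Xi, yi)}: four 2-D points with labels +1 / -1 (invented)
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -3.0]])
y = np.array([1, 1, -1, -1])

# A hand-picked linear discriminant f(x) = sign(w.x - b)
w = np.array([1.0, 1.0])
b = 0.0

f = np.sign(X @ w - b)
print(f)                 # [ 1.  1. -1. -1.]
print(np.all(f == y))    # True: f(Xi) = yi on every training point
```

The SVM's job is to find such a w and b automatically, and to pick the best among all that fit.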

SVM today:

- A must-try for most applications
- Mathematically well founded
- Robust to noise (non-support vectors are ignored)
- Works even with only dozens of training examples
- Among the most accurate algorithms
- Has many extensions
- Can be scaled up (ongoing work…)

- Hard-Margin Linear Classifier
- Maximize Margin
- Support Vectors
- Quadratic Programming
- Soft-Margin Linear Classifier
- Non-Linearly Separable Problems and Kernels (e.g., XOR)
- Extensions to:
  - Regression for a numerical class
  - Ranking rather than classification
- SMO and Core Vector Machines


[Figure: input x passes through an estimator f with parameters (w, b) to produce yest, where f(x, w, b) = sign(w·x − b); one marker denotes class +1, the other class −1.]

How would you classify this data?


[Figure: the same labeled data (+1 and −1), now with several candidate separating lines, each a linear classifier f(x, w, b) = sign(w·x − b).]

Any of these would be fine…

…but which is best?


[Figure: the maximum margin linear classifier f(x, w, b) = sign(w·x − b), with the margin drawn and the support vectors lying on its boundary.]

The maximum margin linear classifier is the linear classifier whose separating hyperplane lies as far as possible from the closest training points of either class.

Support vectors are the data points that the margin pushes up against.
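This can be inspected directly; a sketch with scikit-learn assumed and four invented points, where a very large C approximates the hard margin:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Four invented points; the inner pair (1,1) and (3,3) will be the
# support vectors that the margin pushes up against.
X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0], [4.0, 4.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)  # large C ~ hard margin
w = clf.coef_[0]
print(clf.support_vectors_)     # the points on the margin boundary
print(2.0 / np.linalg.norm(w))  # margin width, here 2*sqrt(2) ~ 2.83
```

The outer points (0,0) and (4,4) get zero weight: moving or deleting them leaves the classifier unchanged.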


- Can we understand the meaning of the SVM through a solid theoretical foundation?
- Can we extend the SVM formulation to handle cases where errors are allowed, i.e., when even the best hyperplane must admit some errors on the training data?
- Can we extend the SVM formulation so that it works when the training data are not linearly separable?
- Can we extend the SVM formulation so that the task is to rank instances by their likelihood of being positive class members, rather than to classify them?
- Can we scale the algorithm for finding maximum margin hyperplanes up to thousands or millions of instances?


- The problem of finding the maximum margin can be transformed, via a Lagrangian, into a dual optimization problem
- The dual can be solved using quadratic programming (QP)

- The approach has solid theoretical foundations:
- Future error ≤ training error + C·h^(1/2), a bound that grows with the model's capacity
- h = VC dimension, the maximum number of examples that can be shattered by the function class f(x, a)
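Spelled out, the Lagrangian route the bullets summarize looks like this in the standard hard-margin formulation (a sketch, not the deck's own notation):

```latex
% Primal: maximize the margin 2/\|w\| subject to correct classification
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w \cdot x_i - b) \ge 1,\qquad i = 1,\dots,t
% Introducing Lagrange multipliers \alpha_i \ge 0 and eliminating w and b
% yields the dual quadratic program:
\max_{\alpha}\ \sum_i \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j)
\quad \text{s.t.} \quad \alpha_i \ge 0,\quad \sum_i \alpha_i y_i = 0
% Recover w = \sum_i \alpha_i y_i x_i; the points with \alpha_i > 0
% are exactly the support vectors.
```

Because the data enter the dual only through inner products x_i · x_j, this form is also what makes the kernel substitution of Q3 possible.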


[Figure: classes +1 and −1 that overlap, so no hyperplane classifies all training points correctly.]

- When noise exists, minimize w·w + C·(total distance of the error points from their correct side of the margin), trading margin width against training error.
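The role of C can be sketched as follows (scikit-learn assumed; the overlapping data and the two C values are invented). A small C tolerates many margin violations, so many points participate as support vectors; a large C penalizes errors hard:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Two deliberately overlapping classes: some training errors are unavoidable.
rng = np.random.RandomState(1)
X = np.vstack([rng.randn(30, 2) + [1, 1], rng.randn(30, 2) - [1, 1]])
y = np.array([1] * 30 + [-1] * 30)

n_sv = {}
for C in (0.01, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    n_sv[C] = len(clf.support_)
print(n_sv)  # the small-C model keeps at least as many support vectors
```

Choosing C is a model-selection problem; in practice it is tuned by cross-validation rather than fixed in advance.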


Q3: Non-linear transformation to feature spaces

- General idea: map the inputs into a feature space and introduce kernels

Φ: x → φ(x)
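The XOR example from the outline makes the point concrete (scikit-learn assumed; the gamma value is invented): no line separates XOR in input space, but an RBF kernel, which computes inner products in the implicitly mapped space φ(x), handles it:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# The classic XOR problem: not linearly separable in input space.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([-1, 1, 1, -1])

linear = SVC(kernel="linear").fit(X, y)
rbf = SVC(kernel="rbf", gamma=1.0).fit(X, y)
print(linear.score(X, y))  # cannot classify all four points
print(rbf.score(X, y))     # 1.0 after the implicit mapping
```

The kernel trick never computes φ(x) explicitly; only kernel values K(xi, xj) = φ(xi)·φ(xj) enter the dual QP.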


SVM for regression analysis

SV regression
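A minimal SV regression sketch with scikit-learn's SVR (assumed available; the linear data and parameter values are invented). The epsilon-insensitive tube ignores residuals smaller than epsilon, so the fit depends only on points outside the tube:

```python
import numpy as np
from sklearn.svm import SVR  # assumes scikit-learn is installed

# Invented noisy linear data: y ~ 2x + noise
rng = np.random.RandomState(0)
X = np.linspace(0, 1, 40).reshape(-1, 1)
y = 2.0 * X.ravel() + 0.05 * rng.randn(40)

reg = SVR(kernel="linear", epsilon=0.1, C=10.0).fit(X, y)
print(reg.predict([[0.5]]))  # close to the true value 1.0
```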

SVM for ranking

Ranking SVM

Idea:

- For each ordered pair of instances (x1, x2) where x1 precedes x2 in the ranking, generate a new instance ⟨x1, x2, +1⟩
- Train an SVM on the new data set, learning the scoring function f(x, w, b) = w·x − b
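One common way to realize the pairwise construction (as in Joachims' Ranking SVM) is to encode each ordered pair as a difference vector and train an ordinary linear SVM on those; a sketch with invented items, scikit-learn assumed:

```python
import numpy as np
from sklearn.svm import SVC  # assumes scikit-learn is installed

# Invented items: 2-D feature vectors with a known relevance ranking
X = np.array([[3.0, 1.0], [2.0, 1.0], [1.0, 1.0], [0.0, 1.0]])
rank = np.array([4, 3, 2, 1])   # higher value = ranked above

# Each ordered pair (xi above xj) becomes xi - xj with label +1
# (and xj - xi with label -1, to balance the classes).
pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if rank[i] > rank[j]:
            pairs.append(X[i] - X[j]); labels.append(1)
            pairs.append(X[j] - X[i]); labels.append(-1)

clf = SVC(kernel="linear").fit(np.array(pairs), np.array(labels))
w = clf.coef_[0]
scores = X @ w                  # w.x induces the learned ranking
print(np.argsort(-scores))      # recovers the original order [0 1 2 3]
```

Sorting new items by w·x then reproduces (an approximation of) the training ranking, which is the goal of the reduction.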


- One initial drawback of the SVM was its computational inefficiency.
- However, this problem is being solved with great success.
- SMO (Sequential Minimal Optimization):
  - Breaks the large optimization problem into a series of smaller problems, each involving only a pair of carefully chosen variables
  - The process iterates until convergence

- Core Vector Machines (CVM):
  - Find an approximate minimum enclosing ball of a set of instances
  - These instances, mapped to an N-dimensional space, represent a core set
  - Solving the SVM learning problem on these core sets yields a good approximate solution very quickly
  - Can train a high-quality SVM on millions of data points in seconds
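The two-variable subproblem at SMO's heart can be sketched as follows. This is a simplified version in the spirit of the published algorithm, not a faithful implementation: the second multiplier is chosen at random instead of by the real heuristics, only a linear kernel is used, and the sign convention here is f(x) = sign(w·x + b). Numpy assumed; the tiny dataset is invented.

```python
import numpy as np

def smo_sketch(X, y, C=1.0, tol=1e-3, max_passes=20, seed=0):
    """Simplified SMO: repeatedly solve analytic 2-variable subproblems."""
    rng = np.random.RandomState(seed)
    n = len(y)
    alpha = np.zeros(n)
    b = 0.0
    K = X @ X.T                       # linear kernel matrix

    def f(i):                         # current decision value on point i
        return np.sum(alpha * y * K[:, i]) + b

    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(n):
            E_i = f(i) - y[i]
            # Only touch alpha_i if it violates the KKT conditions
            if (y[i] * E_i < -tol and alpha[i] < C) or \
               (y[i] * E_i > tol and alpha[i] > 0):
                j = rng.choice([k for k in range(n) if k != i])
                E_j = f(j) - y[j]
                a_i, a_j = alpha[i], alpha[j]
                # Box bounds keeping 0 <= alpha <= C and sum(alpha*y) fixed
                if y[i] != y[j]:
                    L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Analytic solution of the 2-variable QP, then clip
                alpha[j] = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - a_j) < 1e-5:
                    continue
                alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
                # Update the threshold b from whichever point stays interior
                b1 = b - E_i - y[i] * (alpha[i] - a_i) * K[i, i] \
                     - y[j] * (alpha[j] - a_j) * K[i, j]
                b2 = b - E_j - y[i] * (alpha[i] - a_i) * K[i, j] \
                     - y[j] * (alpha[j] - a_j) * K[j, j]
                b = b1 if 0 < alpha[i] < C else \
                    (b2 if 0 < alpha[j] < C else (b1 + b2) / 2)
                changed += 1
        passes = passes + 1 if changed == 0 else 0

    w = (alpha * y) @ X               # recover w for the linear kernel
    return w, b

# Tiny invented separable problem
X = np.array([[2.0, 2.0], [3.0, 2.0], [-2.0, -2.0], [-3.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
w, b = smo_sketch(X, y)
print(np.sign(X @ w + b))  # matches y
```

Each iteration touches only two alphas, so no large QP solver or kernel matrix factorization is ever needed, which is what lets SMO scale.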


- An excellent tutorial on VC-dimension and Support Vector Machines: http://www.cs.cmu.edu/~awm/tutorials
- C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html
- The VC/SRM/SVM bible (not for beginners, myself included): Vladimir Vapnik, Statistical Learning Theory. Wiley-Interscience, 1998.
- Software: SVM-light, http://svmlight.joachims.org/; LibSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/; SMO in Weka
