- 82 Views
- Uploaded on
- Presentation posted in: General

Sparse Kernel Machines

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Sparse Kernel Machines

- Kernel function k(xn,xm) must be evaluated for all pairs of training points
- Advantageous (computationally) to look at kernel algorithms that have sparse solutions
- new predictions depend only on kernel function evaluated at a few training points
- possible solution: support vector machines (SVM)

- Setup:
- y(x) = w’φ(x) + b, φ(x) = feature-space transformation
Training data comprises

- N input data vectors x1,...,xN
- N target vectors t1,...,tNtn∈{-1,1}

- new samples classified according to sign of y(x)
- assume linearly separable data for the moment
- y(x)>0 for points of class +1 (i.e. tn = +1)
- y(x)<0 for points having tn = -1
- therefore,tny(xn) > 0 for all training points

- There may be many solutions to separate the data
SVM tries to minimize the generalization error by introducing the concept of a margin i.e. smallest distance between the decision boundary and any of the samples

- Decision boundary is chosen to be that for which the margin is maximized
- decision boundary thus only dependent on points closest to it.

- If we consider only points that are correctly classified, then we have
tny(xn) > 0 for all n

- The distance of xn to the decision surface is then
- tny(xn)
- w = tn(wTφ(xn) + b)
- w
- ● Optimize w and b to maximize this distance => solve
- (7.2)
- (7.3)
- ● Quite hard to solve => need to get around it

- An alternative, equivalent formulation of the support vector machine, known as the ν-SVM

the parameter ν, which replaces C, can beinterpreted as both an upper bound on the fraction of margin errors (points for whichξn > 0 and hence which lie on the wrong side of the margin boundary and which mayor may not be misclassified) and a lower bound on the fraction of support vectors.