Sparse Kernel Machines. Introduction. Kernel function k(xn,xm) must be evaluated for all pairs of training points Advantageous (computationally) to look at kernel algorithms that have sparse solutions new predictions depend only on kernel function evaluated at a few training points
Sparse Kernel Machines
Training data comprises
SVM tries to minimize the generalization error by introducing the concept of a margin i.e. smallest distance between the decision boundary and any of the samples
tny(xn) > 0 for all n
the parameter ν, which replaces C, can beinterpreted as both an upper bound on the fraction of margin errors (points for whichξn > 0 and hence which lie on the wrong side of the margin boundary and which mayor may not be misclassified) and a lower bound on the fraction of support vectors.