
Support Vector Machines: A Survey


Presentation Transcript


  1. Support Vector Machines: A Survey. Qiang Yang, for the ICDM 2006 Panel on the 10 Best Algorithms. http://www.cse.ust.hk/~qyang Partially based on slides from Prof. Andrew Moore at CMU: http://www.cs.cmu.edu/~awm/tutorials

  2. Problem: Given a set of objects with known descriptive variables and known classes, {(X_i, y_i), i = 1, 2, …, t}, find a discriminative function f(x) such that f(X_i) = y_i. SVM today: a must-try for most applications.
  • Mathematically well founded
  • Robust to noise (non-support vectors are ignored)
  • Works even with only dozens of training examples
  • Among the most accurate algorithms
  • Has many extensions
  • Can be scaled up (ongoing work)
  Topics in this survey: hard-margin linear classifiers, maximizing the margin, support vectors, quadratic programming, soft-margin linear classifiers, non-linearly separable problems and kernels (XOR), extension to regression for a numerical class, ranking rather than classification, SMO and core vector machines. (A minimal end-to-end sketch follows.)
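As a concrete starting point, here is a minimal end-to-end sketch of the stated problem: learn f such that f(X_i) = y_i, then check it on held-out data. It assumes scikit-learn; the slides themselves name SVM-light, LibSVM, and Weka's SMO as implementations, so treat this as illustrative rather than the panel's own code.

```python
# Minimal SVM classification sketch (assumption: scikit-learn is available).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small labeled dataset {(X_i, y_i)} with known classes.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

f = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)  # the learned f(x)
print("held-out accuracy:", f.score(X_test, y_test))
```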

  3. Linear Classifiers. A linear classifier maps an input x to an estimated label y_est via f(x, w, b) = sign(w·x − b). (Figure: a two-class point set, one class denoted +1, the other −1.) How would you classify this data?

  4. Linear Classifiers, continued. f(x, w, b) = sign(w·x − b). (Figure: several candidate separating lines through the same data.) Any of these would be fine... but which is best?

  5. Maximum Margin. f(x, w, b) = sign(w·x − b). The maximum margin linear classifier is the linear classifier whose margin, the distance from the separating hyperplane to the nearest training point of either class, is as large as possible. Support vectors are those data points that the margin pushes up against. (A code sketch follows.)
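To make the definition tangible, here is a sketch that fits a (near) hard-margin linear SVM on a toy separable set and reads off the support vectors. It assumes scikit-learn's SVC, which is not named on the slides; note also that scikit-learn parameterizes the classifier as sign(w·x + b), so its b is the negative of the slides' b.

```python
# Support vectors are the only points the solution depends on (assumption:
# scikit-learn's SVC; a very large C approximates the hard-margin case).
import numpy as np
from sklearn.svm import SVC

X = np.array([[-2, -1], [-1, -2], [-3, -2],   # class -1
              [ 2,  1], [ 1,  2], [ 3,  2]])  # class +1
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)                         # defines the hyperplane
print("support vectors:\n", clf.support_vectors_)  # the margin-touching points
```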

  6. SVM: Questions
  • Can we understand the meaning of the SVM through a solid theoretical foundation?
  • Can we extend the SVM formulation to handle cases where we allow errors to exist, i.e., when even the best hyperplane must admit some errors on the training data?
  • Can we extend the SVM formulation so that it works in situations where the training data are not linearly separable?
  • Can we extend the SVM formulation so that the task is to rank instances by their likelihood of being positive class members, rather than to classify them?
  • Can we scale up the algorithm for finding the maximum margin hyperplane to thousands and millions of instances?

  7. Q1. Support Vector Machines: Foundations
  • The problem of finding the maximum margin can be transformed, via a Lagrangian, into a dual optimization problem
  • which can be solved using quadratic programming (QP)
  • The approach has solid theoretical foundations: roughly, future error ≤ training error + C·h^(1/2)
  • where h is the VC dimension, the maximum number of examples that can be shattered by the function class f(x, α)
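For concreteness, here is the standard textbook statement of the hard-margin problem and its QP dual, written to match the slides' sign(w·x − b) convention but not copied from them:

```latex
% Hard-margin primal: maximizing the margin 2/\|w\| means minimizing \|w\|^2.
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w \cdot x_i - b) \ge 1, \qquad i = 1,\dots,t.

% Lagrangian dual: a QP in the multipliers \alpha_i; the support vectors are
% exactly the training points with \alpha_i > 0.
\max_{\alpha}\ \sum_{i=1}^{t} \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{t} \alpha_i y_i = 0,\ \ \alpha_i \ge 0.
```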

  8. Q2: extension to allow errors
  • When noise exists, even the best hyperplane misclassifies some training points (one class denoted +1, the other −1 in the figure)
  • Minimize w·w + C·(distance of error points to their correct place)
  • Formally: minimize ½‖w‖² + C Σ_i ξ_i subject to y_i(w·x_i − b) ≥ 1 − ξ_i and ξ_i ≥ 0, where the slack ξ_i measures how far point i sits on the wrong side of its margin
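A small sketch of what the trade-off parameter C does in practice (assumptions: scikit-learn's SVC, which wraps LibSVM, and synthetic overlapping data):

```python
# Soft-margin trade-off: small C tolerates errors for a wider margin,
# large C penalizes them heavily (assumption: scikit-learn's SVC).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```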

  9. Q3: Non-linear transformation to feature spaces
  • General idea: introduce kernels via a mapping Φ: x → φ(x)
  • Because the dual QP touches the data only through inner products, a kernel function K(x_i, x_j) = φ(x_i)·φ(x_j) lets the SVM work in the feature space without ever computing φ explicitly
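The XOR problem listed on slide 2 is the classic illustration: no line separates it, but a kernel SVM handles it. A sketch, again assuming scikit-learn's SVC:

```python
# XOR: linearly inseparable, but separable after an RBF kernel mapping.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, +1, +1, -1])  # XOR labels

linear = SVC(kernel="linear").fit(X, y)   # a line gets at most 3 of 4 right
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:   ", rbf.score(X, y))  # 1.0
```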

  10. Q4: extension to other tasks?
  • SVM for regression (SV regression): fit f(x, w, b) = w·x − b to a numerical class value rather than a discrete label
  • SVM for ranking (Ranking SVM). Idea: for each ordered pair of instances (x1, x2) where x1 precedes x2 in the ranking, generate a new instance ⟨x1, x2, +1⟩, then train an SVM on the new data set (see the sketch below)
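A hedged sketch of the pairwise idea: implementations such as Joachims' Ranking SVM represent the pair ⟨x1, x2, +1⟩ as the difference vector x1 − x2, so an ordinary linear SVM can be reused. The data and the LinearSVC class below are illustrative assumptions, not the slide's setup:

```python
# Ranking SVM via pairwise differences (assumption: scikit-learn's LinearSVC).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
scores = X @ np.array([2.0, -1.0, 0.5])  # hidden "true" ranking scores

# Each ordered pair (i, j) becomes one training instance x_i - x_j.
pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if i != j:
            pairs.append(X[i] - X[j])
            labels.append(+1 if scores[i] > scores[j] else -1)

rank_svm = LinearSVC(fit_intercept=False).fit(np.array(pairs), labels)
# Projecting items onto rank_svm.coef_ recovers the ranking order.
print("pairwise accuracy:", rank_svm.score(np.array(pairs), labels))
```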

  11. Q5: Scale up?
  • One of the initial drawbacks of the SVM was its computational inefficiency; however, this problem is being solved with great success
  • SMO: breaks a large optimization problem into a series of smaller problems, each involving only a couple of carefully chosen variables; the process iterates until convergence
  • Core Vector Machines: find an approximate minimum enclosing ball of a set of instances; these instances, when mapped to an N-dimensional space, represent a core set
  • Solving the SVM learning problem on these core sets produces a good approximate solution very quickly, training a high-quality SVM on millions of data points in seconds
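Not the Core Vector Machine itself, but a quick way to feel the scaling question: a linear solver handles sample sizes at which a kernel QP becomes slow. The sketch below assumes scikit-learn's LinearSVC (liblinear) and synthetic data:

```python
# Rough scaling sketch (assumption: scikit-learn; not the CVM of the slide).
import time
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
t0 = time.perf_counter()
clf = LinearSVC().fit(X, y)   # a linear SVM on 100k instances
print(f"trained in {time.perf_counter() - t0:.1f}s, "
      f"train accuracy {clf.score(X, y):.2f}")
```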

  12. References
  • An excellent tutorial on VC dimension and support vector machines: http://www.cs.cmu.edu/~awm/tutorials
  • C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html
  • The VC/SRM/SVM bible (not for beginners, myself included): Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998
  • Software: SVM-light, http://svmlight.joachims.org/; LibSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/; SMO in Weka
