
Support Vector Machines: A Survey


Presentation Transcript


  1. Support Vector Machines: A Survey. Qiang Yang, for the ICDM 2006 Panel on the 10 Best Algorithms. http://www.cse.ust.hk/~qyang Partially based on slides from Prof. Andrew Moore at CMU: http://www.cs.cmu.edu/~awm/tutorials

  2. Problem: Given a set of objects with known descriptive variables and known classes, {(X_i, y_i), i = 1, 2, …, t}, find a discriminative function f(x) such that f(X_i) = y_i. SVM today: a must-try for most applications.
  • Mathematically well founded
  • Robust to noise (non-support vectors are ignored)
  • Works even with only dozens of training examples
  • Among the most accurate algorithms
  • Has many extensions
  • Can be scaled up (ongoing work)
  Topics in this survey: hard-margin linear classifiers, maximizing the margin, support vectors, quadratic programming, soft-margin linear classifiers, non-linearly separable problems and kernels (XOR), extension to regression for a numerical class, ranking rather than classification, SMO and core vector machines. (A minimal end-to-end sketch follows.)
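As a concrete starting point, here is a minimal end-to-end sketch of the stated problem: learn f such that f(X_i) = y_i, then check it on held-out data. It assumes scikit-learn; the slides themselves name SVM-light, LibSVM, and Weka's SMO as implementations, so treat this as illustrative rather than the panel's own code.

```python
# Minimal SVM classification sketch (assumption: scikit-learn is available).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small labeled dataset {(X_i, y_i)} with known classes.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

f = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)  # the learned f(x)
print("held-out accuracy:", f.score(X_test, y_test))
```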

  3. Linear Classifiers. A linear classifier maps an input x to an estimated label y_est via f(x, w, b) = sign(w·x − b). (Figure: a two-class point set, one class denoted +1, the other −1.) How would you classify this data?

  4. Linear Classifiers, continued. f(x, w, b) = sign(w·x − b). (Figure: several candidate separating lines through the same data.) Any of these would be fine... but which is best?

  5. Maximum Margin. f(x, w, b) = sign(w·x − b). The maximum margin linear classifier is the linear classifier whose margin, the distance from the separating hyperplane to the nearest training point of either class, is as large as possible. Support vectors are those data points that the margin pushes up against. (A code sketch follows.)
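To make the definition tangible, here is a sketch that fits a (near) hard-margin linear SVM on a toy separable set and reads off the support vectors. It assumes scikit-learn's SVC, which is not named on the slides; note also that scikit-learn parameterizes the classifier as sign(w·x + b), so its b is the negative of the slides' b.

```python
# Support vectors are the only points the solution depends on (assumption:
# scikit-learn's SVC; a very large C approximates the hard-margin case).
import numpy as np
from sklearn.svm import SVC

X = np.array([[-2, -1], [-1, -2], [-3, -2],   # class -1
              [ 2,  1], [ 1,  2], [ 3,  2]])  # class +1
y = np.array([-1, -1, -1, +1, +1, +1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
w, b = clf.coef_[0], clf.intercept_[0]
print("w =", w, " b =", b)                         # defines the hyperplane
print("support vectors:\n", clf.support_vectors_)  # the margin-touching points
```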

  6. SVM: Questions
  • Can we understand the meaning of the SVM through a solid theoretical foundation?
  • Can we extend the SVM formulation to handle cases where we allow errors to exist, i.e., when even the best hyperplane must admit some errors on the training data?
  • Can we extend the SVM formulation so that it works in situations where the training data are not linearly separable?
  • Can we extend the SVM formulation so that the task is to rank instances by their likelihood of being positive class members, rather than to classify them?
  • Can we scale up the algorithm for finding the maximum margin hyperplane to thousands and millions of instances?

  7. Q1. Support Vector Machines: Foundations
  • The problem of finding the maximum margin can be transformed, via a Lagrangian, into a dual optimization problem
  • which can be solved using quadratic programming (QP)
  • The approach has solid theoretical foundations: roughly, future error ≤ training error + C·h^(1/2)
  • where h is the VC dimension, the maximum number of examples that can be shattered by the function class f(x, α)
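For concreteness, here is the standard textbook statement of the hard-margin problem and its QP dual, written to match the slides' sign(w·x − b) convention but not copied from them:

```latex
% Hard-margin primal: maximizing the margin 2/\|w\| means minimizing \|w\|^2.
\min_{w,\,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w \cdot x_i - b) \ge 1, \qquad i = 1,\dots,t.

% Lagrangian dual: a QP in the multipliers \alpha_i; the support vectors are
% exactly the training points with \alpha_i > 0.
\max_{\alpha}\ \sum_{i=1}^{t} \alpha_i
  - \tfrac{1}{2}\sum_{i,j} \alpha_i \alpha_j\, y_i y_j\,(x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{t} \alpha_i y_i = 0,\ \ \alpha_i \ge 0.
```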

  8. Q2: extension to allow errors
  • When noise exists, even the best hyperplane misclassifies some training points (one class denoted +1, the other −1 in the figure)
  • Minimize w·w + C·(distance of error points to their correct place)
  • Formally: minimize ½‖w‖² + C Σ_i ξ_i subject to y_i(w·x_i − b) ≥ 1 − ξ_i and ξ_i ≥ 0, where the slack ξ_i measures how far point i sits on the wrong side of its margin
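A small sketch of what the trade-off parameter C does in practice (assumptions: scikit-learn's SVC, which wraps LibSVM, and synthetic overlapping data):

```python
# Soft-margin trade-off: small C tolerates errors for a wider margin,
# large C penalizes them heavily (assumption: scikit-learn's SVC).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian blobs: not linearly separable.
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([-1] * 50 + [+1] * 50)

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.n_support_.sum()} support vectors, "
          f"train accuracy {clf.score(X, y):.2f}")
```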

  9. Q3: Non-linear transformation to feature spaces
  • General idea: introduce kernels via a mapping Φ: x → φ(x)
  • Because the dual QP touches the data only through inner products, a kernel function K(x_i, x_j) = φ(x_i)·φ(x_j) lets the SVM work in the feature space without ever computing φ explicitly
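The XOR problem listed on slide 2 is the classic illustration: no line separates it, but a kernel SVM handles it. A sketch, again assuming scikit-learn's SVC:

```python
# XOR: linearly inseparable, but separable after an RBF kernel mapping.
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([-1, +1, +1, -1])  # XOR labels

linear = SVC(kernel="linear").fit(X, y)   # a line gets at most 3 of 4 right
rbf = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("linear kernel accuracy:", linear.score(X, y))
print("RBF kernel accuracy:   ", rbf.score(X, y))  # 1.0
```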

  10. Q4: extension to other tasks?
  • SVM for regression (SV regression): fit f(x, w, b) = w·x − b to a numerical class value rather than a discrete label
  • SVM for ranking (Ranking SVM). Idea: for each ordered pair of instances (x1, x2) where x1 precedes x2 in the ranking, generate a new instance ⟨x1, x2, +1⟩, then train an SVM on the new data set (see the sketch below)
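A hedged sketch of the pairwise idea: implementations such as Joachims' Ranking SVM represent the pair ⟨x1, x2, +1⟩ as the difference vector x1 − x2, so an ordinary linear SVM can be reused. The data and the LinearSVC class below are illustrative assumptions, not the slide's setup:

```python
# Ranking SVM via pairwise differences (assumption: scikit-learn's LinearSVC).
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 3))
scores = X @ np.array([2.0, -1.0, 0.5])  # hidden "true" ranking scores

# Each ordered pair (i, j) becomes one training instance x_i - x_j.
pairs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if i != j:
            pairs.append(X[i] - X[j])
            labels.append(+1 if scores[i] > scores[j] else -1)

rank_svm = LinearSVC(fit_intercept=False).fit(np.array(pairs), labels)
# Projecting items onto rank_svm.coef_ recovers the ranking order.
print("pairwise accuracy:", rank_svm.score(np.array(pairs), labels))
```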

  11. Q5: Scale up?
  • One of the initial drawbacks of the SVM was its computational inefficiency; however, this problem is being solved with great success
  • SMO: breaks a large optimization problem into a series of smaller problems, each involving only a couple of carefully chosen variables; the process iterates until convergence
  • Core Vector Machines: find an approximate minimum enclosing ball of a set of instances; these instances, when mapped to an N-dimensional space, represent a core set
  • Solving the SVM learning problem on these core sets produces a good approximate solution very quickly, training a high-quality SVM on millions of data points in seconds
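Not the Core Vector Machine itself, but a quick way to feel the scaling question: a linear solver handles sample sizes at which a kernel QP becomes slow. The sketch below assumes scikit-learn's LinearSVC (liblinear) and synthetic data:

```python
# Rough scaling sketch (assumption: scikit-learn; not the CVM of the slide).
import time
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100_000, n_features=50, random_state=0)
t0 = time.perf_counter()
clf = LinearSVC().fit(X, y)   # a linear SVM on 100k instances
print(f"trained in {time.perf_counter() - t0:.1f}s, "
      f"train accuracy {clf.score(X, y):.2f}")
```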

  12. References
  • An excellent tutorial on VC dimension and support vector machines: http://www.cs.cmu.edu/~awm/tutorials
  • C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2):121-167, 1998. http://citeseer.nj.nec.com/burges98tutorial.html
  • The VC/SRM/SVM bible (not for beginners, myself included): Vladimir N. Vapnik. Statistical Learning Theory. Wiley-Interscience, 1998
  • Software: SVM-light, http://svmlight.joachims.org/; LibSVM, http://www.csie.ntu.edu.tw/~cjlin/libsvm/; SMO in Weka
