
Ch. Eick: Support Vector Machines: The Main Ideas



  1. Ch. Eick: Support Vector Machines: The Main Ideas. Reading material: Support Vector Machines (textbook); first 3 columns of the Smola/Schölkopf article on SV Regression; http://en.wikipedia.org/wiki/Kernel_trick

  2. Likelihood- vs. Discriminant-based Classification • Likelihood-based: assume a model for p(x|Ci) and use Bayes' rule to calculate P(Ci|x); gi(x) = log P(Ci|x) • Discriminant-based: assume a model for gi(x|Φi); no density estimation • Prototype-based: make classification decisions based on the nearest prototypes, without constructing decision boundaries (kNN, k-means approach) • Estimating the boundaries is enough; there is no need to accurately estimate the densities/probabilities inside the boundaries. We are just interested in learning decision boundaries (lines along which the densities of the two classes are the same), and many popular classification techniques learn decision boundaries without explicitly constructing density functions.
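
A minimal sketch of the distinction, assuming scikit-learn is available (the synthetic dataset and the specific model choices are illustrative, not from the slides): Gaussian naive Bayes is likelihood-based, since it models p(x|Ci) and applies Bayes' rule, while a linear SVM is discriminant-based, since it only learns the decision boundary.

```python
# Illustrative contrast (assumes scikit-learn); models and data are examples only.
from sklearn.datasets import make_blobs
from sklearn.naive_bayes import GaussianNB   # likelihood-based: models p(x|Ci), uses Bayes' rule
from sklearn.svm import LinearSVC            # discriminant-based: learns only the boundary

X, y = make_blobs(n_samples=200, centers=2, random_state=0)

likelihood_clf = GaussianNB().fit(X, y)              # estimates class-conditional densities
discriminant_clf = LinearSVC().fit(X, y)             # estimates only the separating hyperplane

print(likelihood_clf.predict_proba(X[:3]))           # posteriors P(Ci|x)
print(discriminant_clf.decision_function(X[:3]))     # signed distance to the boundary, no densities
```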

  3. Support Vector Machines: One Possible Solution. SVMs use a single hyperplane. http://en.wikipedia.org/wiki/Hyperplane

  4. Support Vector Machines: Another Possible Solution

  5. Support Vector Machines: Other Possible Solutions

  6. Support Vector Machines: Which one is better, B1 or B2? How do you define better?

  7. Support Vector Machines: Find a hyperplane maximizing the margin => B1 is better than B2

  8. Key Properties of Support Vector Machines • Use a single hyperplane which subdivides the space into two half-spaces, one occupied by Class1 and the other by Class2 • They maximize the margin of the decision boundary using quadratic optimization techniques which find the optimal hyperplane • When used in practice, SVM approaches frequently map the examples (using a non-linear function Φ) to a higher-dimensional space and find margin-maximal hyperplanes in the mapped space, obtaining decision boundaries which are not hyperplanes in the original space • Moreover, versions of SVMs exist that can be used when linear separability cannot be accomplished
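
A minimal sketch of these properties, assuming scikit-learn (the slides do not prescribe a library, and the circular dataset is only an illustration): SVC solves the quadratic optimization internally, and swapping the kernel corresponds to mapping the examples to a higher-dimensional space.

```python
# Sketch only: compares a linear SVM with a kernel SVM on data that is not
# linearly separable in the original space (assumes scikit-learn).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_svm = SVC(kernel='linear', C=1.0).fit(X, y)  # a single hyperplane in the original space
rbf_svm = SVC(kernel='rbf', C=1.0).fit(X, y)        # a hyperplane in an implicitly mapped space

print("linear:", linear_svm.score(X, y))            # mediocre: the classes form concentric rings
print("rbf:   ", rbf_svm.score(X, y))               # high: the boundary is non-linear in the original space
```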

  9. Support Vector Machines: Examples are (x1,..,xn,y) with y ∈ {-1,1}. L2 Norm: http://en.wikipedia.org/wiki/L2_norm#Euclidean_norm Dot-Product: http://en.wikipedia.org/wiki/Dot_product
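
To make the notation concrete, here is a small numpy sketch (w and b are made-up values, not from the slide) of the quantities involved: the dot product w·x, classification by the sign of w·x + b, and the margin width 2/||w|| measured with the L2 norm.

```python
# Numeric illustration of dot product, L2 norm, and the {-1,+1} labels
# (w and b are hypothetical values chosen for the example).
import numpy as np

w = np.array([2.0, 1.0])            # normal vector of the hyperplane w·x + b = 0
b = -3.0                            # offset

examples = np.array([[3.0, 1.0],    # two example points (x1, x2)
                     [0.5, 0.5]])

scores = examples @ w + b           # dot products w·x plus offset
labels = np.sign(scores)            # predicted classes in {-1, +1}
margin = 2.0 / np.linalg.norm(w)    # distance between the hyperplanes w·x + b = ±1

print(labels)                       # [ 1. -1.]
print(margin)                       # ≈ 0.894
```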

  10. Support Vector Machines • We want to maximize the margin, i.e., the distance 2/||w|| between the two margin hyperplanes • Which is equivalent to minimizing ||w||²/2 • But subject to the following N constraints: yi (w·xi + b) ≥ 1 for i = 1,..,N • This is a constrained convex quadratic optimization problem that can be solved in polynomial time • Numerical approaches to solve it (e.g., quadratic programming) exist • The function to be optimized has only a single minimum, so there is no local-minimum problem. Dot-Product: http://en.wikipedia.org/wiki/Dot_product
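
As a sketch of the optimization problem itself, the following solves the hard-margin formulation (minimize ||w||²/2 subject to yi (w·xi + b) ≥ 1) on four made-up points with scipy's general-purpose SLSQP solver; real SVM implementations use specialized quadratic-programming methods instead, as the slide notes.

```python
# Hard-margin SVM as a constrained optimization problem, solved with a generic
# solver for illustration (assumes scipy; the four training points are made up).
import numpy as np
from scipy.optimize import minimize

X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, 0.0]])
y = np.array([1, 1, -1, -1])

def objective(p):                   # minimize ||w||^2 / 2, with p = (w1, w2, b)
    w = p[:2]
    return 0.5 * np.dot(w, w)

constraints = [                     # y_i (w·x_i + b) - 1 >= 0 for every training example
    {'type': 'ineq', 'fun': lambda p, i=i: y[i] * (np.dot(p[:2], X[i]) + p[2]) - 1.0}
    for i in range(len(X))
]

result = minimize(objective, x0=np.zeros(3), method='SLSQP', constraints=constraints)
w, b = result.x[:2], result.x[2]
print("w =", w, "b =", b, "margin =", 2.0 / np.linalg.norm(w))
```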

  11. Support Vector Machines • What if the problem is not linearly separable?

  12. Linear SVM for Non-linearly Separable Problems (no kernel) • What if the problem is not linearly separable? • Introduce slack variables ξi ≥ 0; a slack variable allows constraint violation to a certain degree • Need to minimize ||w||²/2 + C·Σi ξi, where the first term is the inverse size of the margin between the hyperplanes and the second term measures the prediction error • Subject to (i=1,..,N): yi (w·xi + b) ≥ 1 − ξi and ξi ≥ 0 • The parameter C is chosen using a validation set, trying to keep the margins wide while keeping the training error low
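
A minimal sketch of the last bullet, assuming scikit-learn (which handles the slack variables internally through its C parameter; the dataset is synthetic): several values of C are tried and the one with the best validation accuracy is kept.

```python
# Choosing C on a held-out validation set (assumes scikit-learn; data is synthetic).
from sklearn.datasets import make_blobs
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=2, cluster_std=2.5, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=1)

for C in [0.01, 0.1, 1.0, 10.0, 100.0]:                    # small C: wide margin, more slack;
    clf = SVC(kernel='linear', C=C).fit(X_train, y_train)  # large C: narrow margin, less slack
    print(C, clf.score(X_val, y_val))                      # keep the C with the best validation score
```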

  13. Nonlinear Support Vector Machines • What if the decision boundary is not linear? • Alternative 1: Use a technique that employs non-linear decision boundaries (a non-linear function describes the boundary)

  14. Nonlinear Support Vector Machines • Transform the data into a higher-dimensional space • Find the best hyperplane using the methods introduced earlier. Alternative 2: Transform into a higher-dimensional attribute space and find linear decision boundaries in this space

  15. Nonlinear Support Vector Machines • Choose a non-linear function Φ to transform into a different, usually higher-dimensional, attribute space • Minimize ||w||²/2 • But subject to the following N constraints: yi (w·Φ(xi) + b) ≥ 1 for i = 1,..,N. This finds a good hyperplane in the transformed space. Remark: the Soft Margin SVM can be generalized similarly.
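
A minimal sketch of this "transform, then fit a linear SVM" idea, assuming scikit-learn; the quadratic feature map Φ below is one illustrative choice, not the one used on the slides.

```python
# Alternative 2 made explicit: map the data with a non-linear Φ, then find a
# linear hyperplane in the mapped space (assumes scikit-learn; Φ is illustrative).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def phi(X):                                   # Φ(x1, x2) = (x1, x2, x1^2, x2^2, x1*x2)
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x2**2, x1 * x2])

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

clf = SVC(kernel='linear').fit(phi(X), y)     # linear hyperplane in the Φ-space
print(clf.score(phi(X), y))                   # the induced boundary is non-linear in the original space
```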

  16. Example: Polynomial Kernel Function. Feature map: Φ(x1,x2) = (x1², x2², √2·x1·x2, √2·x1, √2·x2, 1); kernel: K(u,v) = Φ(u)·Φ(v) = (u·v + 1)². A Support Vector Machine with a polynomial kernel function classifies a new example z as follows: sign(Σi λi yi Φ(xi)·Φ(z) + b) = sign(Σi λi yi (xi·z + 1)² + b). Remark: the λi and b are determined using the methods for linear SVMs that were discussed earlier. Kernel function trick: perform the computations in the original space, although we solve an optimization problem in the transformed space => more efficient; more details in Topic 14.
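
A quick numeric check of the identity K(u,v) = Φ(u)·Φ(v) = (u·v + 1)², using the feature map above (u and v are arbitrary test points, chosen only for illustration):

```python
# Verifying that the kernel computed in the original 2-d space equals the dot
# product in the 6-dimensional transformed space.
import numpy as np

def phi(x):                          # Φ(x1,x2) = (x1², x2², √2·x1·x2, √2·x1, √2·x2, 1)
    x1, x2 = x
    return np.array([x1**2, x2**2,
                     np.sqrt(2) * x1 * x2,
                     np.sqrt(2) * x1,
                     np.sqrt(2) * x2,
                     1.0])

u = np.array([1.5, -0.5])
v = np.array([2.0, 1.0])

lhs = np.dot(phi(u), phi(v))         # dot product after the explicit mapping
rhs = (np.dot(u, v) + 1.0) ** 2      # kernel evaluated directly in the original space
print(lhs, rhs)                      # both are 12.25
```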

  17. Other Material on SVMs • Support Vector Machines in Rapid Miner: http://www.youtube.com/watch?v=27RQRUR7Ubc • http://stackoverflow.com/questions/1072097/pointers-to-some-good-svm-tutorial • http://www.csie.ntu.edu.tw/~cjlin/libsvm/ • http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html • Adaboost/SVM Relationship Lecture: http://videolectures.net/mlss05us_rudin_da/

  18. Summary Support Vector Machines • Support vector machines learn hyperplanes that separate two classes while maximizing the margin between them (the empty space between the instances of the two classes). • Support vector machines introduce slack variables, in case the classes are not linearly separable, trying to maximize margins while keeping the training error low. • The most popular versions of SVMs use non-linear kernel functions and map the attribute space into a higher-dimensional space to facilitate finding “good” linear decision boundaries in the modified space. • Support vector machines find “margin optimal” hyperplanes by solving a convex quadratic optimization problem. However, this optimization process is quite slow, and support vector machines tend to fail if the number of examples goes beyond 500/5000/50000… • In general, support vector machines achieve quite high accuracies compared to other techniques. • In the last 10 years, support vector machines have been generalized to other tasks such as regression, PCA, outlier detection,…

  19. Kernels: What can they do for you? • Some machine learning/statistical problems depend only on the dot products of the objects in the dataset O = {x1,..,xn} and not on other characteristics of the objects; in other words, those techniques depend only on the Gram matrix of O, which stores x1·x1, x1·x2, …, xn·xn (http://en.wikipedia.org/wiki/Gramian_matrix). • These techniques can be generalized by mapping the dataset into a higher-dimensional space, as long as the non-linear mapping Φ can be kernelized; that is, a kernel function K can be found such that K(u,v) = Φ(u)·Φ(v). In this case the results are computed in the mapped space based on K(x1,x1), K(x1,x2), …, K(xn,xn), which is called the kernel trick: http://en.wikipedia.org/wiki/Kernel_trick • Kernels have been successfully used to generalize PCA, K-means, support vector machines, and many other techniques, allowing them to use non-linear coordinate systems, more complex decision boundaries, or more complex cluster boundaries. • We will revisit kernels later, discussing transparencies 13-25 and 30-35 of the Vasconcelos lecture.
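
A minimal sketch of working only with the Gram matrix, assuming scikit-learn: the kernel values K(xi, xj) are computed once and passed to an SVM as a precomputed kernel, so the learner never touches the original coordinates (the polynomial kernel here is just an illustrative choice).

```python
# The kernel trick in practice: the learner sees only the Gram matrix of kernel
# values, never the (implicitly) mapped feature vectors (assumes scikit-learn).
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

def poly_kernel(A, B):                       # K(u, v) = (u·v + 1)^2 for all pairs of rows
    return (A @ B.T + 1.0) ** 2

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)
gram = poly_kernel(X, X)                     # the Gram matrix: entries K(xi, xj)

clf = SVC(kernel='precomputed').fit(gram, y) # trained from kernel values only
print(clf.score(poly_kernel(X, X), y))       # prediction also needs only K(xi, z)
```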
