Support Vector Machines



  1. Support Vector Machines CMPUT 466/551 Nilanjan Ray

  2. Agenda
  • Linear support vector classifier
    • Separable case
    • Non-separable case
  • Non-linear support vector classifier
  • Kernels for classification
  • SVM as a penalized method
  • Support vector regression

  3. Linear Support Vector Classifier: Separable Case
Primal problem: $\min_{\beta,\beta_0}\ \tfrac{1}{2}\|\beta\|^2$ subject to $y_i(x_i^T\beta + \beta_0) \ge 1$, $i = 1,\dots,N$.
Dual problem (a simpler optimization): $\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j x_i^T x_j$ subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{N}\alpha_i y_i = 0$. Compare the implementation simple_svm.m.
Dual problem in matrix-vector form: $\max_{\alpha}\ \mathbf{1}^T\alpha - \tfrac{1}{2}\alpha^T\big[(yy^T)\circ(XX^T)\big]\alpha$ subject to $\alpha \ge 0$, $y^T\alpha = 0$.
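A minimal sketch of how this dual QP can be solved with quadprog from MATLAB's Optimization Toolbox (simple_svm.m itself is not reproduced here, so the function name and the numerical tolerance below are assumptions):

```matlab
% Hard-margin SVM via the dual QP.  X is N-by-p, y is N-by-1 with entries +/-1.
% quadprog minimizes (1/2)*a'*H*a + f'*a, so maximizing the dual objective
% corresponds to H = (y*y').*(X*X') and f = -1.
function [beta, beta0, alpha] = svm_dual_separable(X, y)
    N = size(X, 1);
    H = (y * y') .* (X * X');               % label-weighted Gram matrix
    f = -ones(N, 1);
    alpha = quadprog(H, f, [], [], y', 0, zeros(N, 1), []);  % y'*alpha = 0, alpha >= 0
    beta = X' * (alpha .* y);               % beta = sum_i alpha_i y_i x_i
    sv = alpha > 1e-6;                      % support vectors: alpha_i > 0 (numerically)
    beta0 = mean(y(sv) - X(sv, :) * beta);  % from y_i (x_i'*beta + beta0) = 1
end
```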

  4. Linear SVC (AKA Optimal Hyperplane)…
After solving the dual problem we obtain the $\alpha_i$'s; how do we construct the hyperplane from here? To obtain $\beta$, use the equation $\beta = \sum_{i=1}^{N}\alpha_i y_i x_i$. How do we obtain $\beta_0$? We need the complementary slackness criteria, which result from the Karush-Kuhn-Tucker (KKT) conditions for the primal optimization problem. Complementary slackness means: $\alpha_i\big[y_i(x_i^T\beta + \beta_0) - 1\big] = 0$ for all $i$. Training points corresponding to non-zero $\alpha_i$'s are support vectors. $\beta_0$ is computed from $y_i(x_i^T\beta + \beta_0) = 1$ using the points for which $\alpha_i > 0$.

  5. Optimal Hyperplane/Support Vector Classifier
An interesting interpretation follows from the equality constraint $\sum_{i}\alpha_i y_i = 0$ in the dual problem: the $\alpha_i$'s act as forces exerted by the support vectors on the two sides of the hyperplane, and the net force on the hyperplane is zero.

  6. Linear Support Vector Classifier: Non-separable Case

  7. From Separable to Non-separable
In the separable case the margin width is $2/\|\beta\|$; if in addition $\|\beta\| = 2$, then the margin width is 1. This is the reason that in the primal problem we have the following inequality constraints:
(1) $y_i(x_i^T\beta + \beta_0) \ge 1,\quad i = 1,\dots,N$
These inequality constraints ensure that there is no point in the margin area. For the non-separable case, such constraints must be violated, and each is modified with a slack variable $\xi_i \ge 0$:
$y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i$
So, the primal optimization problem becomes:
$\min_{\beta,\beta_0,\xi}\ \tfrac{1}{2}\|\beta\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{subject to} \quad y_i(x_i^T\beta + \beta_0) \ge 1 - \xi_i,\ \xi_i \ge 0$
The positive parameter $C$ controls the extent to which points are allowed to violate (1).

  8. Non-separable Case: Finding Dual Function
• Lagrangian function to be minimized over $\beta$, $\beta_0$ and the $\xi_i$:
$L = \tfrac{1}{2}\|\beta\|^2 + C\sum_{i}\xi_i - \sum_{i}\alpha_i\big[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\big] - \sum_{i}\mu_i\xi_i$
• Solve by setting the derivatives to zero:
(1) $\beta = \sum_{i}\alpha_i y_i x_i$
(2) $0 = \sum_{i}\alpha_i y_i$
(3) $\alpha_i = C - \mu_i$, for all $i$
• Substitute (1), (2) and (3) in $L$ to form the dual function:
$L_D = \sum_{i}\alpha_i - \tfrac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j y_i y_j x_i^T x_j$, maximized subject to $0 \le \alpha_i \le C$ and $\sum_{i}\alpha_i y_i = 0$.

  9. Dual optimization: dual variables to primal variables
After solving the dual problem we obtain the $\alpha_i$'s; how do we construct the hyperplane from here? To obtain $\beta$, use the equation $\beta = \sum_{i}\alpha_i y_i x_i$. How do we obtain $\beta_0$? Use the complementary slackness conditions for the primal optimization problem:
$\alpha_i\big[y_i(x_i^T\beta + \beta_0) - (1 - \xi_i)\big] = 0$ and $\mu_i\xi_i = 0$, for all $i$.
Training points corresponding to non-zero $\alpha_i$'s are support vectors. $\beta_0$ is computed from $y_i(x_i^T\beta + \beta_0) = 1$ using the points for which $0 < \alpha_i < C$ (these have $\xi_i = 0$); the average is taken over such points. $C$ is chosen by cross-validation; it should typically be greater than $1/N$.
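The non-separable case needs only one change to the earlier quadprog sketch: the box constraint $0 \le \alpha_i \le C$ (again a hedged sketch, not the course's code):

```matlab
% Soft-margin SVM dual: same QP as the separable case, but alpha is boxed by C.
function [beta, beta0, alpha] = svm_dual_soft(X, y, C)
    N = size(X, 1);
    H = (y * y') .* (X * X');
    alpha = quadprog(H, -ones(N, 1), [], [], y', 0, zeros(N, 1), C * ones(N, 1));
    beta = X' * (alpha .* y);
    % Margin points (0 < alpha_i < C, hence xi_i = 0) determine beta0;
    % average over all of them, as the slide prescribes.
    m = alpha > 1e-6 & alpha < C - 1e-6;
    beta0 = mean(y(m) - X(m, :) * beta);
end
```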

  10. Example: Non-separable Case

  11. Non-linear support vector classifier
Let's take a look at the dual cost function for the optimal separating hyperplane:
$L_D = \sum_{i}\alpha_i - \tfrac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j y_i y_j \langle x_i, x_j\rangle$
And at the solution of the optimal separating hyperplane in terms of the dual variables:
$f(x) = x^T\beta + \beta_0 = \sum_{i}\alpha_i y_i \langle x, x_i\rangle + \beta_0$
An invaluable observation: all these equations involve the "feature points" only through "inner products".

  12. Non-linear support vector classifier…
An invaluable observation: all these equations involve the "feature points" only through "inner products". This is particularly convenient when the input feature space has a large dimension. For example, suppose we want a classifier that is additive in non-linear functions $h_m(x)$ of the feature components, rather than linear in the features themselves; such a classifier is expected to perform better on problems with a non-linear classification boundary. Ex. input space: $x = (x_1, x_2)$, and the $h$'s are second-order polynomials:
$h(x) = \big(1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2\big)$
so that the classifier is now non-linear in $x$:
$f(x) = h(x)^T\beta + \beta_0$
Because only inner products are involved, this non-linear classifier can still be computed by the methods for finding the linear optimal hyperplane.

  13. Non-linear support vector classifier…
Denote: $K(x, x') = \langle h(x), h(x')\rangle$
The non-linear classifier: $f(x) = h(x)^T\beta + \beta_0$
The dual cost function: $L_D = \sum_{i}\alpha_i - \tfrac{1}{2}\sum_{i}\sum_{j}\alpha_i\alpha_j y_i y_j K(x_i, x_j)$
The non-linear classifier in dual variables: $f(x) = \sum_{i}\alpha_i y_i K(x, x_i) + \beta_0$
Thus, in the dual variable space the non-linear classifier is expressed just with inner products!

  14. Non-linear support vector classifier…
With the previous non-linear feature vector, the inner product takes a particularly interesting form:
$\langle h(x), h(x')\rangle = (1 + \langle x, x'\rangle)^2$
Computational savings: instead of 6 products (one per feature component), we compute 3 (two for $\langle x, x'\rangle$, one for the squaring). A function $K(x, x') = (1 + \langle x, x'\rangle)^2$ used this way is called a kernel function.
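A quick numerical check of this identity (the test vectors are arbitrary):

```matlab
% Inner product in the 6-D feature space vs. the kernel (1 + <x,z>)^2.
h = @(x) [1; sqrt(2)*x(1); sqrt(2)*x(2); x(1)^2; x(2)^2; sqrt(2)*x(1)*x(2)];
x = [0.3; -1.2];  z = [2.0; 0.8];
disp([h(x)' * h(z), (1 + x' * z)^2])   % both print 0.4096: the two sides agree
```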

  15. Kernel Functions
So, if the inner product can be expressed in terms of a symmetric function $K$:
$\langle h(x), h(x')\rangle = K(x, x')$
then we can apply the SV tool. Well, not quite! We need another property of $K$, called positive (semi-)definiteness. Why? The dual function has an answer to this question: the maximization of the dual is a convex problem only when the Gram matrix $\mathbf{K} = [K(x_i, x_j)]$ is positive semi-definite. Thus the kernel function $K$ must satisfy two properties: symmetry and positive semi-definiteness.

  16. Kernel Functions…
Thus we need $h(x)$'s that define a kernel function. In practice we don't even need to define $h(x)$; all we need is the kernel function itself! Example kernel functions:
• $d$th-degree polynomial: $K(x, x') = (1 + \langle x, x'\rangle)^d$
• Radial kernel: $K(x, x') = \exp(-\|x - x'\|^2 / c)$
• Neural network: $K(x, x') = \tanh(\kappa_1\langle x, x'\rangle + \kappa_2)$
The real question now is designing a kernel function.
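A kernelized version of the earlier soft-margin sketch: the Gram matrix replaces X*X' and the classifier is evaluated through K (the function handle kfun and the tolerances are assumptions):

```matlab
% Kernel soft-margin SVM; kfun(u, v) returns K(u, v) for row vectors u, v.
function f = svm_dual_kernel(X, y, C, kfun)
    N = size(X, 1);
    K = zeros(N);
    for i = 1:N
        for j = 1:N
            K(i, j) = kfun(X(i, :), X(j, :));   % Gram matrix
        end
    end
    alpha = quadprog((y * y') .* K, -ones(N, 1), [], [], y', 0, ...
                     zeros(N, 1), C * ones(N, 1));
    g = @(x) sum(alpha .* y .* arrayfun(@(i) kfun(X(i, :), x), (1:N)'));
    m = find(alpha > 1e-6 & alpha < C - 1e-6);  % margin points give beta0
    beta0 = mean(y(m) - arrayfun(@(i) g(X(i, :)), m));
    f = @(x) g(x) + beta0;   % f(x) = sum_i alpha_i y_i K(x, x_i) + beta0
end
```

For example, the radial kernel on this slide would be kfun = @(u, v) exp(-norm(u - v)^2 / c) for some scale c.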

  17. Example

  18. SVM as a Penalty Method
With $f(x) = h(x)^T\beta + \beta_0$, the following optimization
$\min_{\beta,\beta_0}\ \sum_{i=1}^{N}\big[1 - y_i f(x_i)\big]_+ + \tfrac{\lambda}{2}\|\beta\|^2$
is equivalent to the SVM primal problem (with $\lambda = 1/C$). SVM is thus a penalized optimization method for binary classification: hinge loss plus a quadratic penalty.
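To illustrate the penalized view, here is a hedged subgradient-descent sketch for the hinge-loss criterion (the loss is averaged over N and the step-size schedule is an assumption; the QP route above is the method the slides develop):

```matlab
% Minimize (1/N) * sum_i max(0, 1 - y_i*f(x_i)) + (lambda/2)*||beta||^2
% by subgradient descent, with f(x) = x'*beta + beta0.
function [beta, beta0] = svm_hinge_subgrad(X, y, lambda, niter)
    [N, p] = size(X);
    beta = zeros(p, 1);  beta0 = 0;
    for t = 1:niter
        eta = 1 / (lambda * t);                  % decaying step size
        viol = y .* (X * beta + beta0) < 1;      % points with positive hinge loss
        gbeta = lambda * beta - X(viol, :)' * y(viol) / N;
        gbeta0 = -sum(y(viol)) / N;
        beta = beta - eta * gbeta;
        beta0 = beta0 - eta * gbeta0;
    end
end
```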

  19. Negative Binomial Log-likelihood (LR Loss Function) Example
Replacing the hinge loss above with the negative binomial log-likelihood, $\log\big(1 + e^{-y f(x)}\big)$, gives essentially non-linear logistic regression.

  20. SVM for Regression
The penalty view of SVM leads to regression. With $f(x) = h(x)^T\beta + \beta_0$, consider the following optimization:
$\min_{\beta,\beta_0}\ \sum_{i=1}^{N} V\big(y_i - f(x_i)\big) + \tfrac{\lambda}{2}\|\beta\|^2$
where $V(\cdot)$ is a regression loss function.
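One common choice is the ε-insensitive loss, which ignores residuals inside a tube of half-width ε (a one-line sketch; the variable names are ours):

```matlab
% Epsilon-insensitive loss: zero inside the tube, linear outside.
V = @(r, ep) max(abs(r) - ep, 0);   % r = y - f(x)
disp(V([-2 -0.5 0 0.5 2], 1))       % prints  1  0  0  0  1
```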

  21. SV Regression: Loss Functions
