
SVM (II): Non-separable Cases

Learn about non-separable pattern classification in Support Vector Machines (SVM), including the additional slack term in the cost function and the primal and dual problem formulations. Explore the KKT conditions and the implications of minimizing the norm ||w|| of the separating hyperplane.


Presentation Transcript


  1. Lecture 18. SVM (II): Non-separable Cases

  2. Outline: Non-separable pattern classification; Additional term in the cost function; Primal problem formulation; Dual problem formulation; KKT conditions. (C) 2001 by Yu Hen Hu

  3. Implication of Minimizing ||w||: Let D denote the diameter of the smallest hyper-ball that encloses all the input training vectors {x1, x2, …, xN}. The set of optimal hyper-planes described by the equation woᵀx + bo = 0 has a VC-dimension h bounded from above as h ≤ min{D²/ρ², m0} + 1, where m0 is the dimension of the input vectors and ρ = 2/||wo|| is the margin of separation of the hyper-planes. The VC-dimension determines the complexity of the classifier structure, and usually the smaller, the better. (C) 2001 by Yu Hen Hu
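Written out in standard notation, the bound quoted above is the following (this is only a restatement of what the slide says, with no additional claims):

```latex
% VC-dimension bound for the optimal separating hyper-planes
% D   : diameter of the smallest hyper-ball enclosing the training vectors
% m0  : dimension of the input vectors
% rho : margin of separation, rho = 2 / ||w_o||
h \;\le\; \min\!\left\{ \frac{D^{2}}{\rho^{2}},\; m_{0} \right\} + 1,
\qquad \rho = \frac{2}{\lVert w_{o} \rVert}.
```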

  4. Non-separable Cases: Recall that in the linearly separable case, each training sample pair (xi, di) represents a linear inequality constraint di(wᵀxi + b) ≥ 1, i = 1, 2, …, N (*). If the training samples are not linearly separable, the constraint can be modified to yield a soft constraint: di(wᵀxi + b) ≥ 1 − ξi, i = 1, 2, …, N (**). {ξi; 1 ≤ i ≤ N} are known as slack variables. Note that (*) is a normalized version of di·g(xi)/||w|| ≥ 1/||w||; with the slack variable ξi, that inequality becomes di·g(xi)/||w|| ≥ (1 − ξi)/||w||. Hence, with the slack variables, we allow some samples xi to fall within the gap. Moreover, if ξi > 1, then the corresponding (xi, di) is mis-classified, because the sample will fall on the wrong side of the hyper-plane H. (C) 2001 by Yu Hen Hu
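As a concrete illustration of the slack variables, the sketch below computes ξi = max(0, 1 − di(wᵀxi + b)) for a fixed hyper-plane; the data points, weights, and function name are made up purely for this example.

```python
import numpy as np

def slack_variables(w, b, X, d):
    """Slack xi_i = max(0, 1 - d_i (w^T x_i + b)) for each training pair (x_i, d_i)."""
    margins = d * (X @ w + b)
    return np.maximum(0.0, 1.0 - margins)

# Made-up points and a fixed hyper-plane, chosen only to show the three regimes
X = np.array([[2.0, 2.0],      # well outside the gap, correct side  -> xi = 0
              [0.25, 0.25],    # inside the gap, correct side        -> 0 < xi < 1
              [-0.5, -0.5],    # wrong side of the hyper-plane       -> xi > 1
              [-2.0, -2.0]])   # well outside the gap, correct side  -> xi = 0
d = np.array([1.0, 1.0, 1.0, -1.0])
w = np.array([1.0, 1.0])
b = 0.0

print(slack_variables(w, b, X, d))   # -> [0.0, 0.5, 2.0, 0.0]
```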

  5. Since i > 1 implies mis-classification, the cost function must include a term to minimize the number of samples that are mis-classified: where  is a Lagrange multiplier. But this formulationis non-convex and a solution is difficult to find using existing nonlinear optimization algorithms. Hence, we may instead use an approximated cost function With this approximated cost function, the goal is to maximize  (minimize ||W||) while minimize i ( 0). i: not counted if xi outside gap and on the correct side. 0 < i < 1: xi inside gap, but on the correct side. i > 1: xi on the wrong side (inside or outside gap). Non-Separable Case  0 1 (C) 2001 by Yu Hen Hu

  6. Primal Problem Formulation: Primal Optimization Problem: Given {(xi, di); 1 ≤ i ≤ N}, find w and b such that the cost function Φ(w, ξ) is minimized subject to the constraints • ξi ≥ 0, and • di(wᵀxi + b) ≥ 1 − ξi for i = 1, 2, …, N. Using αi and μi as Lagrange multipliers, the unconstrained cost function becomes the Lagrangian sketched below. (C) 2001 by Yu Hen Hu
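The unconstrained cost function is likewise not reproduced in the transcript; with αi ≥ 0 and μi ≥ 0 as the multipliers named on the slide, the standard Lagrangian for this primal is:

```latex
% Lagrangian of the soft-margin primal problem
% alpha_i >= 0 enforce d_i (w^T x_i + b) >= 1 - xi_i ;  mu_i >= 0 enforce xi_i >= 0
L(w, b, \xi, \alpha, \mu)
  \;=\; \tfrac{1}{2}\, w^{T} w + C \sum_{i=1}^{N} \xi_{i}
  \;-\; \sum_{i=1}^{N} \alpha_{i}\!\left[ d_{i}\,(w^{T} x_{i} + b) - 1 + \xi_{i} \right]
  \;-\; \sum_{i=1}^{N} \mu_{i}\, \xi_{i}
```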

  7. Dual Problem Formulation: Note that setting to zero the derivatives of the unconstrained cost function with respect to w, b, and ξi leads to the following dual. Dual Optimization Problem: Given {(xi, di); 1 ≤ i ≤ N}, find the Lagrange multipliers {αi; 1 ≤ i ≤ N} such that the dual objective Q(α) is maximized subject to the constraints • 0 ≤ αi ≤ C (a user-specified positive number), and • Σi αi di = 0. (C) 2001 by Yu Hen Hu
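The dual objective maximized here is the standard Q(α) = Σi αi − ½ Σi Σj αi αj di dj xiᵀxj. As a small-scale sketch (not how production SVM solvers work), the box- and equality-constrained maximization can be handed to a general-purpose solver; the toy data and function names below are assumptions of this example.

```python
import numpy as np
from scipy.optimize import minimize

def solve_dual(X, d, C):
    """Maximize Q(alpha) subject to 0 <= alpha_i <= C and sum_i alpha_i d_i = 0,
    by minimizing -Q(alpha) with a general-purpose SLSQP solver (small problems only)."""
    N = X.shape[0]
    Q = np.outer(d, d) * (X @ X.T)          # Q_ij = d_i d_j x_i^T x_j

    def neg_dual(a):                         # -Q(alpha) = 0.5 a^T Q a - sum(a)
        return 0.5 * a @ Q @ a - a.sum()

    def neg_dual_grad(a):
        return Q @ a - np.ones(N)

    constraints = {"type": "eq", "fun": lambda a: a @ d}   # sum_i alpha_i d_i = 0
    bounds = [(0.0, C)] * N                                 # 0 <= alpha_i <= C
    result = minimize(neg_dual, np.zeros(N), jac=neg_dual_grad,
                      method="SLSQP", bounds=bounds, constraints=constraints)
    return result.x

# Toy, not linearly separable data (values made up purely for illustration)
X = np.array([[2.0, 2.0], [1.0, 1.5], [0.5, 0.2], [-1.0, -1.0]])
d = np.array([1.0, 1.0, -1.0, -1.0])
alpha = solve_dual(X, d, C=1.0)
print(alpha)
```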

  8. Solution to the Dual Problem: By the Karush-Kuhn-Tucker conditions, for i = 1, 2, …, N: (i) αi [di(wᵀxi + b) − 1 + ξi] = 0 (*), and (ii) μi ξi = 0. At the optimal point, αi + μi = C. Thus, one may deduce that: if 0 < αi < C, then ξi = 0 and di(wᵀxi + b) = 1; if αi = C, then ξi ≥ 0 and di(wᵀxi + b) = 1 − ξi ≤ 1; if αi = 0, then di(wᵀxi + b) ≥ 1 and xi is not a support vector. Finally, the optimal solutions are wo = Σi αi di xi, with bo obtained from the samples in Io = {i; 0 < αi < C} (a sketch follows below). (C) 2001 by Yu Hen Hu
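The optimal-solution formulas are not reproduced in the transcript; the sketch below forms wo = Σi αi di xi and averages bo = di − woᵀxi over the set Io defined above, which is the usual way to read the hyper-plane off the dual variables (the function name and tolerance are assumptions of this example).

```python
import numpy as np

def recover_hyperplane(alpha, X, d, C, tol=1e-6):
    """Read the optimal (w_o, b_o) off the dual solution using the KKT conditions.
    tol is an assumed numerical threshold for deciding 0 < alpha_i < C."""
    w = (alpha * d) @ X                        # w_o = sum_i alpha_i d_i x_i
    Io = (alpha > tol) & (alpha < C - tol)     # unbounded support vectors: xi_i = 0
    b = np.mean(d[Io] - X[Io] @ w)             # from d_i (w^T x_i + b) = 1 on Io
    return w, b

# Example usage with the dual solution from the previous sketch:
# w_o, b_o = recover_hyperplane(alpha, X, d, C=1.0)
```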
