Support Vector Machine
Le Do Hoang Nam – CNTN08
Linear Programming • General form, with x in R^n • Linear objective, linear constraints, …
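The slide's formula image is not reproduced here; the textbook statement of this general form is:

\[
\min_{x \in \mathbb{R}^n} \; c^{\top} x
\quad \text{subject to} \quad
A x \le b, \;\; x \ge 0
\]

Both the objective c^T x and every row of Ax ≤ b are linear in x.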
Linear Programming • An example: The Diet Problem • How to come up with the cheapest meal that meets all nutrition standards?
Linear Programming • Let x1, x2, and x3 be the amounts, in kilos, of carrot, cabbage, and cucumber in the dish. • Mathematically:
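The slide's formula is missing; in the usual formulation, with prices p_j per kilo, nutrient contents N_kj, and minimum requirements r_k (all of these are problem data not given in the source), the diet problem reads:

\[
\min_{x_1, x_2, x_3 \ge 0} \; p_1 x_1 + p_2 x_2 + p_3 x_3
\quad \text{subject to} \quad
\sum_{j=1}^{3} N_{kj}\, x_j \ge r_k \;\; \text{for every nutrient } k
\]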
Linear Programming • In canonical form: • How to solve? • The simplex method. • Newton's method. • Gradient descent.
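As a concrete sketch, the diet LP can be handed to an off-the-shelf solver such as scipy.optimize.linprog; the prices and nutrient values below are made up for illustration only, not taken from the lecture:

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical data (illustrative only, not from the lecture):
# price per kilo of carrot, cabbage, cucumber
price = np.array([2.0, 1.5, 3.0])
# rows = nutrients, columns = vegetables: content per kilo
nutrition = np.array([
    [35.0,   0.5,  0.5],   # vitamin A
    [60.0, 300.0, 10.0],   # vitamin C
    [30.0,  20.0, 10.0],   # fibre
])
required = np.array([50.0, 400.0, 60.0])  # minimum daily amounts

# linprog minimizes price @ x subject to A_ub @ x <= b_ub, so the
# "at least" nutrition constraints are flipped: -nutrition @ x <= -required.
res = linprog(price, A_ub=-nutrition, b_ub=-required, bounds=[(0, None)] * 3)
print("kilos:", res.x, "total cost:", res.fun)
```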
LP and Classification • Given a set of N samples (mi, li) • mi is the feature vector. • li = -1 or 1 is the label. • If a sample is correctly classified by a hyper-plane wᵀx + c = 0 (with w and c suitably scaled), then: li (wᵀmi + c) ≥ 1, which is a linear constraint in (w, c).
LP and Classification • (w, c) is a good classifier if it satisfies: li (wᵀmi + c) ≥ 1, i = 1..N, which are linear constraints. • LP form:
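Written out (a reconstruction of the slide's omitted formula), the LP has no objective at all, only feasibility constraints:

\[
\text{find } w, c
\quad \text{subject to} \quad
l_i \,(w^{\top} m_i + c) \ge 1, \quad i = 1, \dots, N
\]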
LP and Classification • Without any objective function, we have ALL possible solutions. [Figure: two panels showing different hyper-planes, each separating Class 1 from Class 2.]
LP and Classification • If the data is not linearly separable: • Minimize the number of errors. [Figure: Class 1 and Class 2 points overlapping, so no hyper-plane separates them perfectly.]
LP and Classification • Our objective becomes the count of violated constraints (written out below). • But this cardinality function is non-linear, so the problem is not an LP.
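A reconstruction of the intended objective, counting how many samples violate the margin constraint:

\[
\min_{w, c} \; \sum_{i=1}^{N} \mathbf{1}\big[\, l_i (w^{\top} m_i + c) < 1 \,\big]
\]

The indicator (step) function makes this objective piecewise constant, hence neither linear nor convex.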
LP and Classification • The cardinality (step) function f(x). • Solution: approximate it with the hinge-loss function. [Figure: plot of the step function f(x), with ticks at 1 on both axes.]
LP and Classification • Hinge-loss function: f(x) = max(0, 1 − x) • Or, written with linear constraints: f(x) ≥ 1 − x and f(x) ≥ 0. [Figure: plot of the hinge loss f(x), with ticks at 1 on both axes.]
LP and Classification • The classification problem now becomes the program below, which can be solved as an LP.
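Reconstructed from the hinge-loss substitution, with one slack variable εi per sample:

\[
\min_{w, c, \varepsilon} \; \sum_{i=1}^{N} \varepsilon_i
\quad \text{subject to} \quad
l_i (w^{\top} m_i + c) \ge 1 - \varepsilon_i, \;\; \varepsilon_i \ge 0, \quad i = 1, \dots, N
\]

At the optimum, εi = max(0, 1 − li(wᵀmi + c)), exactly the hinge loss of sample i.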
LP and Classification • Geometric view. [Figure: the three parallel hyper-planes wᵀx + c = 1, wᵀx + c = 0 and wᵀx + c = -1; a Class 2 sample mi and a Class 1 sample mj lie on the wrong side of their margins, with slacks εi and εj measuring the violation.]
LP and Classification • Another problem: some samples are uncertain. [Figure: Class 1 and Class 2 points lying close to the decision boundary.]
LP and Classification • Solution: maximize the margin d. [Figure: a separating hyper-plane with a margin band of width d between the two classes.]
LP and Classification • All samples are outside the margin: • the distance from every sample to the boundary is at least d/2. That means:
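In symbols (reconstructed from the standard distance-to-hyper-plane formula):

\[
\frac{l_i \,(w^{\top} m_i + c)}{\lVert w \rVert} \ge \frac{d}{2}, \quad i = 1, \dots, N
\]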
LP and Classification • Because the hyper-plane is homogeneous (rescaling w and c leaves it unchanged), we can fix the scale of w as shown below. • The objective function then becomes minimizing ‖w‖.
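A reconstruction of the standard normalization step:

\[
\min_i \; l_i \,(w^{\top} m_i + c) = 1
\;\; \Longrightarrow \;\;
\frac{d}{2} = \frac{1}{\lVert w \rVert},
\]

so maximizing the margin d is the same as minimizing ‖w‖.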
LP and Classification • The problem now becomes the maximum-margin program below.
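The resulting hard-margin formulation (the slide's formula, reconstructed):

\[
\min_{w, c} \; \lVert w \rVert
\quad \text{subject to} \quad
l_i \,(w^{\top} m_i + c) \ge 1, \quad i = 1, \dots, N
\]

With the 1-norm ‖w‖₁ this remains an LP (after linearizing the absolute values); with the more common Euclidean norm it becomes a quadratic program.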
Support Vector Machine • Together with the error minimization, we have the SVM (written out below): • λ sets the trade-off between training error and robustness (margin width).
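Combining the margin objective with the slack variables gives the soft-margin SVM (a reconstruction; the choice of norm decides whether it is an LP or a QP):

\[
\min_{w, c, \varepsilon} \; \lVert w \rVert + \lambda \sum_{i=1}^{N} \varepsilon_i
\quad \text{subject to} \quad
l_i \,(w^{\top} m_i + c) \ge 1 - \varepsilon_i, \;\; \varepsilon_i \ge 0, \quad i = 1, \dots, N
\]

As a sketch of the LP route this lecture emphasizes, here is the 1-norm soft-margin SVM posed directly as a linear program and solved with scipy.optimize.linprog; the variable layout and the helper name svm_1norm_lp are my own, not from the slides:

```python
import numpy as np
from scipy.optimize import linprog

def svm_1norm_lp(M, labels, lam=1.0):
    """1-norm soft-margin SVM as an LP (illustrative sketch).

    Variables, stacked into one vector z = [w, c, u, eps]:
      w (d)   - weights,  c (1) - bias,
      u (d)   - bounds |w_j| <= u_j (linearizes the 1-norm),
      eps (N) - hinge slacks.
    Minimize sum(u) + lam * sum(eps)
    s.t.     l_i (w^T m_i + c) >= 1 - eps_i  for every sample i.
    """
    N, d = M.shape
    n = d + 1 + d + N
    cost = np.concatenate([np.zeros(d + 1), np.ones(d), lam * np.ones(N)])

    rows, rhs = [], []
    for j in range(d):                    # w_j <= u_j and -w_j <= u_j
        for sign in (1.0, -1.0):
            r = np.zeros(n)
            r[j], r[d + 1 + j] = sign, -1.0
            rows.append(r); rhs.append(0.0)
    for i in range(N):                    # -l_i (w^T m_i + c) - eps_i <= -1
        r = np.zeros(n)
        r[:d] = -labels[i] * M[i]
        r[d] = -labels[i]
        r[d + 1 + d + i] = -1.0
        rows.append(r); rhs.append(-1.0)

    bounds = [(None, None)] * (d + 1) + [(0, None)] * (d + N)
    res = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return res.x[:d], res.x[d]            # (w, c)

# Tiny usage example with made-up points:
M = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.5]])
labels = np.array([1.0, 1.0, -1.0, -1.0])
w, c = svm_1norm_lp(M, labels, lam=10.0)
print("w =", w, "c =", c)
```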