
Discriminative Machine Learning Topic 2: Linear Classification (SVM)

Presentation Transcript


  1. Discriminative Machine Learning Topic 2: Linear Classification (SVM) M. Pawan Kumar (based on Prof. A. Zisserman's course material). Slides available online: http://mpawankumar.info

  2. Outline • Classification • Binary Classification • Multiclass Classification

  3. Binary Classification Input: x (an image, a sentence, a DNA sequence, …) Output: y ∈ {-1,+1}

  4. Example "I applied for Oxford engineering." Will she get a 1st class degree, if admitted?

  5. Features GCSE grades, AS-level grades, A-level grades, interview scores, PAT scores, … collected into a feature vector Φ(x)

  6. Example Is this an energy-efficient house?

  7. Features Number of people, number of electrical items, annual income, political inclination, country/state/city, … collected into a feature vector Φ(x)

  8. Example Is this a spam email?

  9. Features Spelling mistakes, word count, URLs, sender, recipients, … collected into a feature vector Φ(x)

  10. Multiclass Classification Input: x (an image, a sentence, a DNA sequence, …) Output: y ∈ {1, 2, …, C}

  11. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  12. Features GCSE grades, AS-level grades, A-level grades, interview scores, PAT scores, … collected into a feature vector Φ(x)

  13. Example Which digit does the image depict?

  14. Features Scale the image to a canonical size, say 28x28; binarize the intensity values; concatenate the binary values into a feature vector Φ(x), as sketched below.
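
A minimal sketch of this feature pipeline, assuming the input is an 8-bit grey-scale image as a NumPy array; the function name digit_features and the binarization threshold of 128 are illustrative choices, not part of the slides:

```python
import numpy as np
from PIL import Image

def digit_features(image, size=28, threshold=128):
    """Scale to a canonical size, binarize intensities, concatenate."""
    img = Image.fromarray(image).convert("L").resize((size, size))
    pixels = np.asarray(img)                      # size x size intensities in [0, 255]
    binary = (pixels >= threshold).astype(float)  # binarize the intensity values
    return binary.ravel()                         # Phi(x): vector of length size*size
```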

  15. Outline • Classification • Binary Classification • Multiclass Classification

  16. Binary Classification Dataset D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. Classification via regression: loss function ∑_i (w^T Φ(x_i) - y_i)^2. Pro: builds on a known method. Con: not suitable for classification.

  17. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  18. Example Say, an interview score of 8 or more implies a 1st class degree. Consider a positive sample: Loss = (w * interview_score - 1)^2. If w = 1/8, there is non-zero loss for 8.5, 9, 9.5, …; if w = 1/9, there is non-zero loss for 8, 8.5, 9.5, … The squared loss penalizes even confidently correct predictions.
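
A small numeric check of the slide's point, with hypothetical interview scores, w = 1/8 as in the slide, and the raw score as the only feature:

```python
# Squared loss on positive samples (target y = 1) with w = 1/8:
# even scores well above the 8.0 threshold incur a penalty.
w = 1 / 8
for score in [8.0, 8.5, 9.0, 9.5]:
    loss = (w * score - 1) ** 2
    print(f"score={score}: squared loss={loss:.4f}")
# score=8.0 gives 0, but 8.5, 9.0, 9.5 are all penalized despite
# being (correctly) well inside the 1st-class region.
```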

  19. Outline • Classification • Binary Classification • Formulation • Support Vector Machine (SVM) • Max-Margin Interpretation • Optimization • Application • Multiclass Classification

  20. Formulation We first consider prediction: given a classifier, how do we use it to classify? Then we'll move on to learning: given training data, how do we learn a classifier?

  21. Prediction Given x, compute a score for each y ∈ {-1,+1} via a function f: (x,y) → Real. For example, f(x,y) = (w_N)^T Φ(x) if y = -1, and f(x,y) = (w_P)^T Φ(x) if y = +1. The prediction is y(f) = argmax_y f(x,y). Let us make this a bit more abstract.

  22. Prediction – Joint Feature Vector Given input x and output y, define the joint feature vector Ψ(x,y). For example, Ψ(x,-1) = [Φ(x); 0], where 0 is a vector of zeros the same size as Φ(x).

  23. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). For example, Ψ(x,+1) = [0; Φ(x)].

  24. Prediction – Score Function Function f: Ψ(x,y) → Real. For example, w^T Ψ(x,y), a linear classifier. With w = [w_N; w_P], Ψ(x,-1) = [Φ(x); 0] and Ψ(x,+1) = [0; Φ(x)], the negative score is w^T Ψ(x,-1) = (w_N)^T Φ(x) and the positive score is w^T Ψ(x,+1) = (w_P)^T Φ(x).

  25. Prediction – Score Function Function f: Ψ(x,y) → Real. For example, w^T Ψ(x,y), a linear classifier. Predicted class: y(w) = argmax_y w^T Ψ(x,y).

  26. Prediction – Summary Given an input x, for each output y: define the joint feature vector Ψ(x,y); compute the score w^T Ψ(x,y); predict y(w) = argmax_y w^T Ψ(x,y).
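
A minimal sketch of this prediction rule, assuming the binary joint feature vectors from slides 22-23; the names psi and predict are ours:

```python
import numpy as np

def psi(phi_x, y):
    """Joint feature vector: [Phi(x); 0] for y = -1, [0; Phi(x)] for y = +1."""
    zeros = np.zeros_like(phi_x)
    return np.concatenate((phi_x, zeros) if y == -1 else (zeros, phi_x))

def predict(w, phi_x):
    """y(w) = argmax_y w^T Psi(x, y) over y in {-1, +1}."""
    return max((-1, +1), key=lambda y: w @ psi(phi_x, y))
```

Here w has twice the length of Φ(x), stacking w_N on top of w_P as on slide 24.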

  27. Learning D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. min_w ∑_i Loss(w; x_i, y_i) + λ Reg(w) (a loss function plus regularization). What is a suitable loss function for classification?

  28. Learning – Loss Function Consider one sample (x_i, y_i). Let y_i(w) denote the prediction using parameters w.

  29. Learning – Loss Function Consider one sample (x_i, y_i), with y_i(w) = argmax_y w^T Ψ(x_i, y). The 0-1 loss function: Δ(y_i, y_i(w)) = 0 if y_i = y_i(w), and 1 if y_i ≠ y_i(w).

  30. Learning – Objective min_w ∑_i Δ(y_i, y_i(w)) + λ||w||^2. Is this a sensible learning objective? No: the loss is highly non-convex in w, and the regularization plays no role. Instead, minimize a convex upper bound.

  31. Learning – Objective Δ(y_i, y_i(w)) = w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i). Why? Because y_i(w) maximizes w^T Ψ(x_i, y), we have w^T Ψ(x_i, y_i(w)) ≥ w^T Ψ(x_i, y_i), so subtracting the smaller term w^T Ψ(x_i, y_i) can only increase the value.

  32. Learning – Objective w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i) ≤ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i)

  33. Learning – Objective max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i). Convex? Yes: it is a maximum of functions that are linear in w. Regularization sensitive? Yes. Replace the loss function with this upper bound and minimize the objective to obtain a linear classifier.

  34. Learning – Summary D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. min_w ∑_i [ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i) ] + λ||w||^2. Let us look at a specific example.
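
As a sketch, this objective can be written down directly under the same joint-feature convention as before; objective is our name, Δ is the 0-1 loss, and minimizing it belongs to the Optimization part of the outline:

```python
import numpy as np

def psi(phi_x, y):
    """Joint feature vector: [Phi(x); 0] for y = -1, [0; Phi(x)] for y = +1."""
    zeros = np.zeros_like(phi_x)
    return np.concatenate((phi_x, zeros) if y == -1 else (zeros, phi_x))

def objective(w, data, lam):
    """sum_i [ max_y { w^T Psi(x_i,y) + Delta(y_i,y) } - w^T Psi(x_i,y_i) ] + lam ||w||^2"""
    total = 0.0
    for phi_x, y_true in data:
        total += max(w @ psi(phi_x, y) + (0.0 if y == y_true else 1.0)
                     for y in (-1, +1))   # convex upper bound of the 0-1 loss
        total -= w @ psi(phi_x, y_true)   # minus the score of the true label
    return total + lam * (w @ w)          # L2 regularization
```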

  35. Outline • Classification • Binary Classification • Formulation • Support Vector Machine (SVM) • Max-Margin Interpretation • Optimization • Application • Multiclass Classification

  36. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). Now Ψ(x,+1) = [Φ(x); 1]; the extra constant 1 accounts for the fact that the classifier doesn't always pass through the origin.

  37. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). Ψ(x,-1) = [0; 0], where the first 0 is a vector of zeros the same size as Φ(x).

  38. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Here w is the weight vector.

  39. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Here b is the bias.

  40. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Negative score = 0; positive score = w^T Φ(x) + b. Make the prediction by maximizing the score over {-1,+1}.

  41. Prediction – Summary Weight vector: w. Bias: b. Prediction: y(w) = +1 if w^T Φ(x) + b ≥ 0, and -1 otherwise; that is, y(w) = sign(w^T Φ(x) + b).
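
In code, this rule is a single comparison (a sketch; predict_with_bias is our name):

```python
import numpy as np

def predict_with_bias(w, b, phi_x):
    """y(w) = sign(w^T Phi(x) + b), with a score of exactly 0 mapped to +1."""
    return +1 if w @ phi_x + b >= 0 else -1
```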

  42. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - (w_S)^T Ψ(x_i, +1) if y = -1.

  43. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - (w^T Φ(x_i) + b) if y = -1.

  44. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - y_i (w^T Φ(x_i) + b) if y = -1, since y_i = +1.

  45. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 + (w_S)^T Ψ(x_i, +1) if y = +1.

  46. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 + (w^T Φ(x_i) + b) if y = +1.

  47. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 - y_i (w^T Φ(x_i) + b) if y = +1, since y_i = -1.

  48. Learning Convex upper bound of the 0-1 loss function, the hinge loss: max{0, 1 - y_i (w^T Φ(x_i) + b)}. (Plot: the loss as a function of the margin y_i (w^T Φ(x_i) + b).)
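
A direct transcription of the hinge loss (a sketch; hinge_loss is our name):

```python
import numpy as np

def hinge_loss(w, b, phi_x, y):
    """max{0, 1 - y (w^T Phi(x) + b)}: zero once the margin reaches 1."""
    margin = y * (w @ phi_x + b)
    return max(0.0, 1.0 - margin)
```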

  49. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  50. Example Say, an interview score of 8 or more implies a 1st class degree. Consider a positive sample: Loss = max{0, 1 - 1 * (w * interview_score + b)}. If w = 1/8 and b = 0, the loss is 0 for 8, 8.5, 9, 9.5, … This is more suitable for classification than the squared loss.
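
A numeric check of this example, with the same hypothetical scores as before and w = 1/8, b = 0:

```python
w, b = 1 / 8, 0.0
for score in [8.0, 8.5, 9.0, 9.5]:
    loss = max(0.0, 1 - 1 * (w * score + b))   # positive sample, y_i = +1
    print(f"score={score}: hinge loss={loss}")
# all zero: unlike the squared loss, confidently correct
# predictions are not penalized.
```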
