
Discriminative Machine Learning Topic 2: Linear Classification (SVM)

Presentation Transcript


  1. Discriminative Machine Learning Topic 2: Linear Classification (SVM) M. Pawan Kumar (based on Prof. A. Zisserman's course material). Slides available online: http://mpawankumar.info

  2. Outline • Classification • Binary Classification • Multiclass Classification

  3. Binary Classification Input: x (an image, a sentence, a DNA sequence, …) Output: y ∈ {-1,+1}

  4. Example "I applied for Oxford engineering." Will she get a 1st class degree, if admitted?

  5. Features GCSE grades, AS-level grades, A-level grades, interview scores, PAT scores, … collected into a feature vector Φ(x)

  6. Example Is this an energy-efficient house?

  7. Features Number of people, number of electrical items, annual income, political inclination, country/state/city, … collected into a feature vector Φ(x)

  8. Example Is this a spam email?

  9. Features Spelling mistakes, word count, URLs, sender, recipients, … collected into a feature vector Φ(x)

  10. Multiclass Classification Input: x (an image, a sentence, a DNA sequence, …) Output: y ∈ {1, 2, …, C}

  11. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  12. Features GCSE grades, AS-level grades, A-level grades, interview scores, PAT scores, … collected into a feature vector Φ(x)

  13. Example Which digit does the image depict?

  14. Features Scale the image to a canonical size, say 28x28; binarize the intensity values; concatenate the binary values into a feature vector Φ(x), as sketched below.
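
A minimal sketch of this feature pipeline, assuming the input is an 8-bit grey-scale image as a NumPy array; the function name digit_features and the binarization threshold of 128 are illustrative choices, not part of the slides:

```python
import numpy as np
from PIL import Image

def digit_features(image, size=28, threshold=128):
    """Scale to a canonical size, binarize intensities, concatenate."""
    img = Image.fromarray(image).convert("L").resize((size, size))
    pixels = np.asarray(img)                      # size x size intensities in [0, 255]
    binary = (pixels >= threshold).astype(float)  # binarize the intensity values
    return binary.ravel()                         # Phi(x): vector of length size*size
```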

  15. Outline • Classification • Binary Classification • Multiclass Classification

  16. Binary Classification Dataset D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. Classification via regression: loss function ∑_i (w^T Φ(x_i) - y_i)^2. Pro: builds on a known method. Con: not suitable for classification.

  17. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  18. Example Say, an interview score of 8 or more implies a 1st class degree. Consider a positive sample: Loss = (w * interview_score - 1)^2. If w = 1/8, there is non-zero loss for 8.5, 9, 9.5, …; if w = 1/9, there is non-zero loss for 8, 8.5, 9.5, … The squared loss penalizes even confidently correct predictions.
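
A small numeric check of the slide's point, with hypothetical interview scores, w = 1/8 as in the slide, and the raw score as the only feature:

```python
# Squared loss on positive samples (target y = 1) with w = 1/8:
# even scores well above the 8.0 threshold incur a penalty.
w = 1 / 8
for score in [8.0, 8.5, 9.0, 9.5]:
    loss = (w * score - 1) ** 2
    print(f"score={score}: squared loss={loss:.4f}")
# score=8.0 gives 0, but 8.5, 9.0, 9.5 are all penalized despite
# being (correctly) well inside the 1st-class region.
```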

  19. Outline • Classification • Binary Classification • Formulation • Support Vector Machine (SVM) • Max-Margin Interpretation • Optimization • Application • Multiclass Classification

  20. Formulation We first consider prediction: given a classifier, how do we use it to classify? Then we'll move on to learning: given training data, how do we learn a classifier?

  21. Prediction Given x, compute a score for each y ∈ {-1,+1} via a function f: (x,y) → Real. For example, f(x,y) = (w_N)^T Φ(x) if y = -1, and f(x,y) = (w_P)^T Φ(x) if y = +1. The prediction is y(f) = argmax_y f(x,y). Let us make this a bit more abstract.

  22. Prediction – Joint Feature Vector Given input x and output y, define the joint feature vector Ψ(x,y). For example, Ψ(x,-1) = [Φ(x); 0], where 0 is a vector of zeros the same size as Φ(x).

  23. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). For example, Ψ(x,+1) = [0; Φ(x)].

  24. Prediction – Score Function Function f: Ψ(x,y) → Real. For example, w^T Ψ(x,y), a linear classifier. With w = [w_N; w_P], Ψ(x,-1) = [Φ(x); 0] and Ψ(x,+1) = [0; Φ(x)], the negative score is w^T Ψ(x,-1) = (w_N)^T Φ(x) and the positive score is w^T Ψ(x,+1) = (w_P)^T Φ(x).

  25. Prediction – Score Function Function f: Ψ(x,y) → Real. For example, w^T Ψ(x,y), a linear classifier. Predicted class: y(w) = argmax_y w^T Ψ(x,y).

  26. Prediction – Summary Given an input x, for each output y: define the joint feature vector Ψ(x,y); compute the score w^T Ψ(x,y); predict y(w) = argmax_y w^T Ψ(x,y).
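
A minimal sketch of this prediction rule, assuming the binary joint feature vectors from slides 22-23; the names psi and predict are ours:

```python
import numpy as np

def psi(phi_x, y):
    """Joint feature vector: [Phi(x); 0] for y = -1, [0; Phi(x)] for y = +1."""
    zeros = np.zeros_like(phi_x)
    return np.concatenate((phi_x, zeros) if y == -1 else (zeros, phi_x))

def predict(w, phi_x):
    """y(w) = argmax_y w^T Psi(x, y) over y in {-1, +1}."""
    return max((-1, +1), key=lambda y: w @ psi(phi_x, y))
```

Here w has twice the length of Φ(x), stacking w_N on top of w_P as on slide 24.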

  27. Learning D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. min_w ∑_i Loss(w; x_i, y_i) + λ Reg(w) (a loss function plus regularization). What is a suitable loss function for classification?

  28. Learning – Loss Function Consider one sample (x_i, y_i). Let y_i(w) denote the prediction using parameters w.

  29. Learning – Loss Function Consider one sample (x_i, y_i), with y_i(w) = argmax_y w^T Ψ(x_i, y). The 0-1 loss function: Δ(y_i, y_i(w)) = 0 if y_i = y_i(w), and 1 if y_i ≠ y_i(w).

  30. Learning – Objective min_w ∑_i Δ(y_i, y_i(w)) + λ||w||^2. Is this a sensible learning objective? No: the loss is highly non-convex in w, and the regularization plays no role. Instead, minimize a convex upper bound.

  31. Learning – Objective Δ(y_i, y_i(w)) = w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i). Why? Because y_i(w) maximizes w^T Ψ(x_i, y), we have w^T Ψ(x_i, y_i(w)) ≥ w^T Ψ(x_i, y_i), so subtracting the smaller term w^T Ψ(x_i, y_i) can only increase the value.

  32. Learning – Objective w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i) ≤ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i)

  33. Learning – Objective max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i). Convex? Yes: it is a maximum of functions that are linear in w. Regularization sensitive? Yes. Replace the loss function with this upper bound and minimize the objective to obtain a linear classifier.

  34. Learning – Summary D = {(x_i, y_i), i = 1, …, n}, y_i ∈ {-1,+1}. min_w ∑_i [ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i) ] + λ||w||^2. Let us look at a specific example.
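
As a sketch, this objective can be written down directly under the same joint-feature convention as before; objective is our name, Δ is the 0-1 loss, and minimizing it belongs to the Optimization part of the outline:

```python
import numpy as np

def psi(phi_x, y):
    """Joint feature vector: [Phi(x); 0] for y = -1, [0; Phi(x)] for y = +1."""
    zeros = np.zeros_like(phi_x)
    return np.concatenate((phi_x, zeros) if y == -1 else (zeros, phi_x))

def objective(w, data, lam):
    """sum_i [ max_y { w^T Psi(x_i,y) + Delta(y_i,y) } - w^T Psi(x_i,y_i) ] + lam ||w||^2"""
    total = 0.0
    for phi_x, y_true in data:
        total += max(w @ psi(phi_x, y) + (0.0 if y == y_true else 1.0)
                     for y in (-1, +1))   # convex upper bound of the 0-1 loss
        total -= w @ psi(phi_x, y_true)   # minus the score of the true label
    return total + lam * (w @ w)          # L2 regularization
```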

  35. Outline • Classification • Binary Classification • Formulation • Support Vector Machine (SVM) • Max-Margin Interpretation • Optimization • Application • Multiclass Classification

  36. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). Now Ψ(x,+1) = [Φ(x); 1]; the extra constant 1 accounts for the fact that the classifier doesn't always pass through the origin.

  37. Prediction – Joint Feature Vector Given input x and output y, joint feature vector Ψ(x,y). Ψ(x,-1) = [0; 0], where the first 0 is a vector of zeros the same size as Φ(x).

  38. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Here w is the weight vector.

  39. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Here b is the bias.

  40. Prediction – Score Function Score: (w_S)^T Ψ(x,y), a linear classifier, with w_S = [w; b], Ψ(x,+1) = [Φ(x); 1], Ψ(x,-1) = [0; 0]. Negative score = 0; positive score = w^T Φ(x) + b. Make the prediction by maximizing the score over {-1,+1}.

  41. Prediction – Summary Weight vector: w. Bias: b. Prediction: y(w) = +1 if w^T Φ(x) + b ≥ 0, and -1 otherwise; that is, y(w) = sign(w^T Φ(x) + b).
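
In code, this rule is a single comparison (a sketch; predict_with_bias is our name):

```python
import numpy as np

def predict_with_bias(w, b, phi_x):
    """y(w) = sign(w^T Phi(x) + b), with a score of exactly 0 mapped to +1."""
    return +1 if w @ phi_x + b >= 0 else -1
```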

  42. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - (w_S)^T Ψ(x_i, +1) if y = -1.

  43. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - (w^T Φ(x_i) + b) if y = -1.

  44. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a positive sample, y_i = +1. Evaluating for each y: 0 if y = +1; 1 - y_i (w^T Φ(x_i) + b) if y = -1, since y_i = +1.

  45. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 + (w_S)^T Ψ(x_i, +1) if y = +1.

  46. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 + (w^T Φ(x_i) + b) if y = +1.

  47. Learning Convex upper bound of the 0-1 loss function: max_y { (w_S)^T Ψ(x_i, y) + Δ(y_i, y) } - (w_S)^T Ψ(x_i, y_i). Consider a negative sample, y_i = -1. Evaluating for each y: 0 if y = -1; 1 - y_i (w^T Φ(x_i) + b) if y = +1, since y_i = -1.

  48. Learning Convex upper bound of the 0-1 loss function, the hinge loss: max{0, 1 - y_i (w^T Φ(x_i) + b)}. (Plot: the loss as a function of the margin y_i (w^T Φ(x_i) + b).)
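
A direct transcription of the hinge loss (a sketch; hinge_loss is our name):

```python
import numpy as np

def hinge_loss(w, b, phi_x, y):
    """max{0, 1 - y (w^T Phi(x) + b)}: zero once the margin reaches 1."""
    margin = y * (w @ phi_x + b)
    return max(0.0, 1.0 - margin)
```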

  49. Example "I applied for Oxford engineering." What class degree will she get, if admitted?

  50. Example Say, an interview score of 8 or more implies a 1st class degree. Consider a positive sample: Loss = max{0, 1 - 1 * (w * interview_score + b)}. If w = 1/8 and b = 0, the loss is 0 for 8, 8.5, 9, 9.5, … This is more suitable for classification than the squared loss.
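
A numeric check of this example, with the same hypothetical scores as before and w = 1/8, b = 0:

```python
w, b = 1 / 8, 0.0
for score in [8.0, 8.5, 9.0, 9.5]:
    loss = max(0.0, 1 - 1 * (w * score + b))   # positive sample, y_i = +1
    print(f"score={score}: hinge loss={loss}")
# all zero: unlike the squared loss, confidently correct
# predictions are not penalized.
```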
