# Classification IV


## Classification IV

Lecturer: Dr. Bo Yuan

E-mail: yuanb@sz.tsinghua.edu.cn

### Overview

• Support Vector Machines

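### Linear Classifier

[Figure: separating hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$ with normal vector $\mathbf{w}$; points with $\mathbf{w}\cdot\mathbf{x} + b > 0$ lie on one side and points with $\mathbf{w}\cdot\mathbf{x} + b < 0$ on the other.]

### Distance to Hyperplane

[Figure: a point $\mathbf{x}$ and a point $\mathbf{x}'$ on the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$.]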
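The linear classifier and the distance to the hyperplane come down to a sign test and the formula (w·x + b)/‖w‖. A minimal Python sketch, assuming x' denotes the point on the hyperplane closest to x (the foot of the perpendicular); the function names and the numbers in the example are illustrative only:

```python
import math

def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def classify(w, b, x):
    """Return +1 if w.x + b > 0, -1 if w.x + b < 0, and 0 on the hyperplane."""
    s = dot(w, x) + b
    return (s > 0) - (s < 0)

def distance_to_hyperplane(w, b, x):
    """Return the signed distance (w.x + b) / ||w|| and the closest point x' on the hyperplane."""
    s = dot(w, x) + b
    norm_sq = dot(w, w)
    # Step from x against the normal w by s / ||w||^2 to land exactly on w.x + b = 0
    x_prime = [xi - (s / norm_sq) * wi for xi, wi in zip(x, w)]
    return s / math.sqrt(norm_sq), x_prime

# Illustrative hyperplane 3*x1 + 4*x2 - 5 = 0 and query point (3, 4)
w, b = (3.0, 4.0), -5.0
d, x_prime = distance_to_hyperplane(w, b, (3.0, 4.0))
print(classify(w, b, (3.0, 4.0)), d, x_prime)
```

Here the point is classified as +1, lies at distance 4 from the hyperplane, and x' is approximately (0.6, 0.8), which satisfies 3·0.6 + 4·0.8 − 5 = 0.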

### Selection of Classifiers


Which classifier is the best?

All have the same training error.

### Unknown Samples

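[Figure: two classifiers, A and B, applied to unknown samples.]

Classifier B divides the space more consistently (unbiased).

[Figure: the support vectors, i.e., the data points closest to the decision boundary.]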

### Margins

• The margin of a linear classifier is defined as the width by which the decision boundary could be increased before it hits a data point.

• Intuitively, it is safer to choose a classifier with a larger margin, since it leaves a wider buffer zone for mistakes.

• The hyperplane is determined by only a few data points: the support vectors!

• Linear Support Vector Machines (LSVM) select the classifier with the maximum margin; this works very well in practice.

• How can the margin be specified formally?

### Margins

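[Figure: the “Predict Class = +1” zone and the “Predict Class = -1” zone, separated by the hyperplane $\mathbf{w}\cdot\mathbf{x} + b = 0$ and bounded by the planes $\mathbf{w}\cdot\mathbf{x} + b = 1$ and $\mathbf{w}\cdot\mathbf{x} + b = -1$; $\mathbf{x}^+$ and $\mathbf{x}^-$ are points on these two planes, and $M$ is the margin width.]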
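The margin width M can be written in terms of w. A short derivation, assuming $\mathbf{x}^-$ is a point on the minus plane and $\mathbf{x}^+$ is the closest point to it on the plus plane, so that $\mathbf{x}^+ - \mathbf{x}^-$ is parallel to $\mathbf{w}$:

```latex
\begin{aligned}
&\mathbf{w}\cdot\mathbf{x}^+ + b = 1, \qquad \mathbf{w}\cdot\mathbf{x}^- + b = -1, \qquad \mathbf{x}^+ = \mathbf{x}^- + \lambda\mathbf{w} \\
&\Rightarrow\; \mathbf{w}\cdot(\mathbf{x}^- + \lambda\mathbf{w}) + b = 1
 \;\Rightarrow\; -1 + \lambda\,\mathbf{w}\cdot\mathbf{w} = 1
 \;\Rightarrow\; \lambda = \frac{2}{\mathbf{w}\cdot\mathbf{w}} \\
&M = \lVert\mathbf{x}^+ - \mathbf{x}^-\rVert = \lambda\lVert\mathbf{w}\rVert = \frac{2}{\lVert\mathbf{w}\rVert}
\end{aligned}
```

Maximizing M is therefore equivalent to minimizing ‖w‖, or equivalently ½ w·w.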

### Objective Function

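• Correctly classify all data points: $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$ for every training point $(\mathbf{x}_i, y_i)$ with $y_i \in \{+1, -1\}$

• Maximize the margin $M = 2/\lVert\mathbf{w}\rVert$, which is equivalent to:

• Minimize $\frac{1}{2}\mathbf{w}\cdot\mathbf{w}$

• Subject to $y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1$ for all $i$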
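To make the objective concrete (minimize ½‖w‖² subject to y_i(w·x_i + b) ≥ 1 for every point), here is a brute-force sketch. It is not how SVMs are solved in practice (real software uses quadratic-programming solvers); it simply scans a coarse grid of (w, b) values for a hypothetical two-point data set, (1, 1) labeled +1 and (0, 0) labeled -1, keeps the candidates that satisfy every constraint, and picks the one with the smallest ½‖w‖². The grid range and step are arbitrary choices:

```python
import itertools

# Hypothetical toy data: two points in the (x1, x2) plane with labels +1 / -1
X = [(1.0, 1.0), (0.0, 0.0)]
y = [1, -1]

def feasible(w, b):
    """True if every point satisfies y_i * (w . x_i + b) >= 1."""
    return all(yi * (w[0] * xi[0] + w[1] * xi[1] + b) >= 1 for xi, yi in zip(X, y))

# Coarse grid over w1, w2, b in [-2, 2] with step 0.25 (exact in binary floating point)
grid = [k * 0.25 for k in range(-8, 9)]
candidates = [((w1, w2), b) for w1, w2, b in itertools.product(grid, repeat=3)
              if feasible((w1, w2), b)]

# Objective: minimize 1/2 * ||w||^2 over the feasible candidates
w, b = min(candidates, key=lambda c: 0.5 * (c[0][0] ** 2 + c[0][1] ** 2))
print("w =", w, "b =", b)
```

The search returns w = (1, 1) and b = -1: both points sit exactly on the planes w·x + b = ±1, so both constraints are tight.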

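### Dual Problem

• Maximize $\sum_i \alpha_i - \frac{1}{2}\sum_i\sum_j \alpha_i \alpha_j y_i y_j \,(\mathbf{x}_i\cdot\mathbf{x}_j)$

• Subject to $\alpha_i \ge 0$ and $\sum_i \alpha_i y_i = 0$

• The data appear only through the inner product $\mathbf{x}_i\cdot\mathbf{x}_j$; the solution is $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$.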

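[Figure: worked example in the $(x_1, x_2)$ plane with two training points, $(1, 1)$ labeled $+1$ and $(0, 0)$ labeled $-1$, and the planes $\mathbf{w}\cdot\mathbf{x} + b = 1$, $\mathbf{w}\cdot\mathbf{x} + b = 0$ and $\mathbf{w}\cdot\mathbf{x} + b = -1$.]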
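A sketch that solves the dual for the two points in this example, (1, 1) labeled +1 and (0, 0) labeled -1. Because the constraint Σ α_i y_i = 0 forces α₁ = α₂ here, a one-dimensional grid search over their common value is enough; this shortcut only works for such a tiny example, and the grid range and step are arbitrary choices. It then recovers w = Σ α_i y_i x_i and b from a support vector:

```python
import math

X = [(1.0, 1.0), (0.0, 0.0)]   # training points from the example
y = [1, -1]                    # their labels

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def dual_objective(alpha):
    """sum_i alpha_i - 1/2 * sum_i sum_j alpha_i alpha_j y_i y_j (x_i . x_j)"""
    n = len(X)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad

# sum_i alpha_i y_i = 0 with y = (+1, -1) means alpha_1 = alpha_2 = a, with a >= 0
best_a = max((k / 1000 for k in range(0, 3001)),
             key=lambda a: dual_objective([a, a]))
alpha = [best_a, best_a]

# Recover w = sum_i alpha_i y_i x_i
w = tuple(sum(alpha[i] * y[i] * X[i][d] for i in range(len(X))) for d in range(2))
# For a support vector s: y_s (w . x_s + b) = 1, hence b = y_s - w . x_s
b = y[0] - dot(w, X[0])
margin = 2 / math.sqrt(dot(w, w))

print("alpha =", alpha, "w =", w, "b =", b, "margin =", margin)
```

The dual objective reduces to 2a - a², maximized at a = 1, giving w = (1, 1), b = -1 and margin √2, which equals the distance between the two points, as expected when there are only two training points.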

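[Figure: mapping $\Phi: \mathbf{x} \to \varphi(\mathbf{x})$ from the input space to a feature space, e.g., one-dimensional data $x$ mapped to $(x, x^2)$, and two-dimensional data $(x_1, x_2)$ mapped to $(x_1^2, x_2^2)$.]

[Figure: a quadratic basis expansion $\varphi(\mathbf{x})$, with its constant terms, linear terms, and total number of terms labeled.]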
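The quadratic basis expansion can be written out explicitly. A sketch, assuming the common convention of √2 scaling on the linear and cross terms (other scalings span the same feature space); the function names are illustrative. It also counts the terms for m input dimensions:

```python
import math
from itertools import combinations

def quadratic_features(x):
    """Quadratic basis expansion of x. The sqrt(2) scaling is one common convention,
    chosen so that phi(a) . phi(b) == (1 + a . b) ** 2."""
    m = len(x)
    feats = [1.0]                                     # constant term
    feats += [math.sqrt(2) * xi for xi in x]          # linear terms
    feats += [xi * xi for xi in x]                    # pure quadratic terms
    feats += [math.sqrt(2) * x[i] * x[j]              # quadratic cross terms
              for i, j in combinations(range(m), 2)]
    return feats

def num_quadratic_terms(m):
    """1 constant + m linear + m pure quadratic + m(m-1)/2 cross terms = (m+1)(m+2)/2."""
    return 1 + m + m + m * (m - 1) // 2

print(len(quadratic_features([0.5, -1.0])), num_quadratic_terms(2))   # 2 inputs
print(num_quadratic_terms(100))                                       # 100 inputs
```

The number of terms grows roughly as m²/2 (5151 terms for m = 100), which is why computing φ(x) explicitly becomes expensive in high dimensions.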

### Kernel Trick

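• The linear classifier relies only on dot products between vectors, $\mathbf{x}_i\cdot\mathbf{x}_j$.

• If every data point is mapped into a high-dimensional space via some transformation $\Phi: \mathbf{x} \to \varphi(\mathbf{x})$, the dot product becomes $\varphi(\mathbf{x}_i)\cdot\varphi(\mathbf{x}_j)$.

• A kernel function is a function that corresponds to an inner product in some expanded feature space: $K(\mathbf{x}_i, \mathbf{x}_j) = \varphi(\mathbf{x}_i)\cdot\varphi(\mathbf{x}_j)$.

• Example: for $\mathbf{x} = [x_1, x_2]$, $K(\mathbf{x}_i, \mathbf{x}_j) = (1 + \mathbf{x}_i\cdot\mathbf{x}_j)^2$.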
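The example kernel can be checked numerically. A sketch, assuming the explicit six-dimensional map φ(x) = (1, √2 x₁, √2 x₂, x₁², x₂², √2 x₁x₂), which expands (1 + a·b)² term by term; the input vectors are arbitrary illustrative values:

```python
import math

def phi(x):
    """Explicit feature map for x = [x1, x2]; its inner product equals (1 + a . b) ** 2."""
    x1, x2 = x
    r2 = math.sqrt(2)
    return [1.0, r2 * x1, r2 * x2, x1 * x1, x2 * x2, r2 * x1 * x2]

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def poly_kernel(a, b):
    """K(a, b) = (1 + a . b) ** 2, computed entirely in the original 2-D space."""
    return (1.0 + dot(a, b)) ** 2

a, b = [1.0, 2.0], [3.0, -1.0]
explicit = dot(phi(a), phi(b))   # map to 6-D, then take the inner product
implicit = poly_kernel(a, b)     # 2-D inner product, then square
print(explicit, implicit)
```

Both values agree (up to floating-point rounding), but the kernel never builds φ(x): this is the point of the kernel trick.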

### String Kernel

Similarity between text strings: Car vs. Custard
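The slide does not specify which string kernel it uses. As a simpler stand-in, here is a sketch of the p-spectrum kernel, which counts shared contiguous substrings of length p (names and the choice p = 2 are illustrative). Note that it does not capture gapped matches such as the letters c, a, r appearing in order inside "custard"; kernels over non-contiguous subsequences are needed for that:

```python
import math
from collections import Counter

def spectrum_kernel(s, t, p=2):
    """p-spectrum kernel: inner product of the count vectors of all length-p substrings."""
    s, t = s.lower(), t.lower()
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[sub] * ct[sub] for sub in cs)   # Counter returns 0 for missing keys

def normalized_spectrum_kernel(s, t, p=2):
    """Cosine-normalized version, so that every string has similarity 1 with itself."""
    return spectrum_kernel(s, t, p) / math.sqrt(spectrum_kernel(s, s, p) * spectrum_kernel(t, t, p))

print(spectrum_kernel("Car", "Custard"), normalized_spectrum_kernel("Car", "Custard"))
```

With p = 2, "car" and "custard" share only the substring "ar", so the raw kernel value is 1.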

### More Maths …

• Lagrange Duality

• Karush–Kuhn–Tucker (KKT) Conditions
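For the hard-margin SVM (minimize ½ w·w subject to y_i(w·x_i + b) ≥ 1), the standard KKT conditions at the optimum, with Lagrange multipliers α_i as in the dual, are:

```latex
\begin{aligned}
&\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i, \qquad \sum_i \alpha_i y_i = 0 \\
&\alpha_i \ge 0, \qquad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \ge 0 \\
&\alpha_i \left[\, y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \,\right] = 0 \quad \text{for all } i
\end{aligned}
```

The last line (complementary slackness) is why most α_i are zero: only points lying exactly on w·x + b = ±1, the support vectors, can have α_i > 0.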

• Textbook

• Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.

• Online Resources

• http://www.kernel-machines.org/

• http://www.support-vector-machines.org/

• http://www.tristanfletcher.co.uk/SVM%20Explained.pdf

• http://www.csie.ntu.edu.tw/~cjlin/libsvm/

• A list of papers uploaded to the web learning portal

### Review

• What is the definition of margin in a linear classifier?

• Why do we want to maximize the margin?

• What is the mathematical expression of margin?

• How is the SVM objective function solved?

• What are support vectors?

• What is soft margin?

• How does SVM solve nonlinear problems?

• What is the so-called “kernel trick”?

• What are the commonly used kernels?

### Next Week’s Class Talk

• Volunteers are required for next week’s class talk.

• Topic: SVM in Practice

• Hints:

• Applications

• Demos

• Multi-Class Problems

• Software

• A very popular toolbox: LIBSVM

• Any other interesting topics beyond this lecture

• Length: 20 minutes plus question time