## Classification IV - efrem

Presentation Transcript

Overview

- Support Vector Machines

Selection of Classifiers


Which classifier is the best?

All have the same training error.

How about generalization?

Margins

- The margin of a linear classifier is defined as the width that the boundary could be increased by before hitting a data point.
- Intuitively, it is safer to choose a classifier with a larger margin.
- Wider buffer zone for mistakes
- The hyperplane is decided by only a few data points.
- Support Vectors!
- Others can be discarded!

- Select the classifier with the maximum margin.
- Linear Support Vector Machines (LSVM)

- Works very well in practice.
- How to specify the margin formally?
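One way to make the definition concrete: the geometric margin of a fixed linear classifier is the smallest distance from any training point to the boundary w·x + b = 0. A minimal sketch in plain Python, with a hypothetical classifier (w, b) and toy points chosen only for illustration:

```python
import math

# Hypothetical toy setup: a fixed linear classifier (w, b) and a few
# labelled 2-D points. The geometric margin is the smallest signed
# distance y * (w.x + b) / ||w|| over all data points.
w = [1.0, 1.0]
b = -1.0
points = [([2.0, 2.0], +1), ([3.0, 1.0], +1), ([0.0, 0.0], -1), ([-1.0, 1.0], -1)]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

norm_w = math.sqrt(dot(w, w))

# The margin is the minimum signed distance; here the closest points
# are the two negatives, at distance 1/sqrt(2) from the boundary.
margin = min(y * (dot(x, w) + b) / norm_w for x, y in points)
print(margin)
```

A classifier with a larger value of this minimum leaves a wider buffer zone for mistakes, which is exactly what LSVM maximizes.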

Margins

[Figure: the decision boundary wx+b = 0 separates the “Predict Class = +1” zone from the “Predict Class = -1” zone. The two parallel planes wx+b = 1 and wx+b = -1 pass through the closest positive and negative points x+ and x-; the margin width M is the distance between these two planes, M = 2/||w||.]

Objective Function

- Correctly classify all data points: yi(w·xi + b) ≥ 1 for all i
- Maximize the margin M = 2/||w||, i.e. minimize ||w||
- Quadratic Optimization Problem:
- Minimize (1/2)||w||²
- Subject to yi(w·xi + b) ≥ 1, i = 1, …, N
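The canonical form can be checked on a tiny problem whose optimum is known by hand; the data and (w, b) below are assumptions chosen so that both constraints hold with equality, making both points support vectors:

```python
import math

# A minimal sketch of the SVM primal on an assumed two-point problem.
# Data: x1 = (1, 1) with y1 = +1 and x2 = (-1, -1) with y2 = -1.
# The max-margin separator is w = (0.5, 0.5), b = 0.
w = [0.5, 0.5]
b = 0.0
data = [([1.0, 1.0], +1), ([-1.0, -1.0], -1)]

def dot(u, v):
    return sum(a * c for a, c in zip(u, v))

# Check the constraints yi(w.xi + b) >= 1: here both equal exactly 1,
# so both points sit on the margin planes (they are support vectors).
slacks = [y * (dot(x, w) + b) for x, y in data]

# Margin width M = 2 / ||w||, which equals the distance between the
# two points in this symmetric example.
norm_w = math.sqrt(dot(w, w))
M = 2.0 / norm_w
print(slacks, M)
```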

Solutions of w & b

- Solving the dual problem gives w = Σi αi yi xi, where αi > 0 only for the support vectors.
- b = yk − w·xk for any support vector xk.
- Both the dual problem and the resulting classifier depend on the data only through inner products xi·xj.
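As a sketch of how w and b fall out of the dual solution, here is a two-point toy problem with assumed dual coefficients (α1 = α2 = 0.25 is the hand-computed dual optimum for this particular data):

```python
# Hedged sketch: recovering w and b from assumed dual coefficients.
# Data: x1 = (1, 1), y1 = +1 and x2 = (-1, -1), y2 = -1; for this
# problem the hard-margin dual optimum is alpha1 = alpha2 = 0.25.
data = [([1.0, 1.0], +1), ([-1.0, -1.0], -1)]
alphas = [0.25, 0.25]

# w = sum_i alpha_i * y_i * x_i  (a weighted sum of support vectors)
w = [sum(a * y * x[k] for a, (x, y) in zip(alphas, data)) for k in range(2)]

# b from any support vector xk: b = yk - w.xk
xk, yk = data[0]
b = yk - sum(wi * xi for wi, xi in zip(w, xk))
print(w, b)
```

Note that only the support vectors enter the sum; points with αi = 0 can be discarded without changing the classifier.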

Quadratic Basis Functions

For input x = [x1, …, xd], the quadratic basis expansion is

Φ(x) = [ 1,  (constant term)
√2·x1, …, √2·xd,  (linear terms)
x1², …, xd²,  (pure quadratic terms)
√2·x1x2, …, √2·x(d-1)xd ]  (quadratic cross-terms)

- Number of terms: 1 + d + d + d(d−1)/2 = (d+1)(d+2)/2

Calculation of Φ(xi)·Φ(xj)

- Φ(xi)·Φ(xj) = 1 + 2(xi·xj) + (xi·xj)² = (1 + xi·xj)²
- The √2 factors are chosen exactly so that the inner product collapses to this closed form, computable in O(d) time instead of O(d²).
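The expansion and the closed form can be compared numerically; a sketch in plain Python (the sample vectors are arbitrary):

```python
import math
from itertools import combinations

# Hedged sketch: explicit quadratic basis expansion phi(x) whose inner
# product reproduces the kernel (1 + x.y)^2; the sqrt(2) factors make
# the linear and cross terms come out right.
def phi(x):
    out = [1.0]                                   # constant term
    out += [math.sqrt(2) * xi for xi in x]        # linear terms
    out += [xi * xi for xi in x]                  # pure quadratic terms
    out += [math.sqrt(2) * x[i] * x[j]            # quadratic cross-terms
            for i, j in combinations(range(len(x)), 2)]
    return out

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

a, b = [1.0, 2.0, 3.0], [0.5, -1.0, 2.0]
lhs = dot(phi(a), phi(b))         # explicit expansion: O(d^2) terms
rhs = (1.0 + dot(a, b)) ** 2      # kernel shortcut: O(d) work
print(len(phi(a)), lhs, rhs)
```

For d = 3 the expansion has (d+1)(d+2)/2 = 10 terms, yet the kernel never materializes them.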

Kernel Trick

- The linear classifier relies on dot products between vectors xi·xj
- If every data point is mapped into a high-dimensional space via some transformation Φ: x→ φ(x), the dot product becomes: φ(xi)·φ(xj)
- A kernel function is some function that corresponds to an inner product in some expanded feature space: K(xi, xj) = φ(xi)·φ(xj)
- Example: for x = [x1, x2], K(xi, xj) = (1 + xi·xj)² corresponds to the quadratic basis expansion Φ(x) = [1, √2x1, √2x2, x1², x2², √2x1x2].

String Kernel

Similarity between text strings: Car vs. Custard
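The slide does not fix the construction; one common choice is the p-spectrum kernel, which scores two strings by their shared length-p substrings. A minimal sketch for p = 2, applied to "car" vs. "custard":

```python
from collections import Counter

# Hedged sketch: the p-spectrum string kernel (one common string kernel).
# K(s, t) = sum over all length-p substrings u of count_s(u) * count_t(u).
def spectrum_kernel(s, t, p=2):
    cs = Counter(s[i:i + p] for i in range(len(s) - p + 1))
    ct = Counter(t[i:i + p] for i in range(len(t) - p + 1))
    return sum(cs[u] * ct[u] for u in cs)

# "car" contributes bigrams {ca, ar}; "custard" contributes
# {cu, us, st, ta, ar, rd}; the only shared bigram is "ar".
print(spectrum_kernel("car", "custard"))
```

Because this is a valid inner product in the space of substring counts, it can be plugged into the SVM exactly like any other kernel.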

Reading Materials

- Text Book
- Nello Cristianini and John Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. Cambridge University Press, 2000.

- Online Resources
- http://www.kernel-machines.org/
- http://www.support-vector-machines.org/
- http://www.tristanfletcher.co.uk/SVM%20Explained.pdf
- http://www.csie.ntu.edu.tw/~cjlin/libsvm/
- A list of papers uploaded to the web learning portal

- Wikipedia & Google

Review

- What is the definition of margin in a linear classifier?
- Why do we want to maximize the margin?
- What is the mathematical expression of margin?
- How to solve the objective function in SVM?
- What are support vectors?
- What is soft margin?
- How does SVM solve nonlinear problems?
- What is the so-called “kernel trick”?
- What are the commonly used kernels?

Next Week’s Class Talk

- Volunteers are required for next week’s class talk.
- Topic : SVM in Practice
- Hints:
- Applications
- Demos
- Multi-Class Problems
- Software
- A very popular toolbox: LIBSVM

- Any other interesting topics beyond this lecture

- Length: 20 minutes plus question time
