# Support Vector Machine (SVM)

### Support Vector Machine (SVM)

Based on Nello Cristianini's presentation

http://www.support-vector.net/tutorial.html

Basic Idea
• Use a Linear Learning Machine (LLM).
• Overcome the linearity constraint:
• map the data non-linearly to a higher dimension.
• Select between hyperplanes:
• use the margin as the selection criterion.
• Generalization depends on the margin.
General idea

[Figure: the original problem and the transformed (embedded) problem]

Kernel-Based Algorithms
• Two separate components:
• Learning algorithm:
• works in an embedded space.
• Kernel function:
• performs the embedding.
Basic Example: Kernel Perceptron
• Hyperplane classification:
• f(x) = <w,x> + b = <w',x'>, where w' = (w, b) and x' = (x, 1)
• h(x) = sign(f(x))
• Perceptron algorithm (a runnable sketch follows this list):
• Sample: (xi, ti), ti ∈ {-1, +1}
• IF ti <wk, xi> < 0 THEN /* error */
• wk+1 = wk + ti xi
• k = k + 1
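
A minimal sketch of the algorithm in Python. The function name and the epoch cap are illustrative, and the slide's strict `< 0` test is relaxed to `<= 0` so that the all-zero start receives updates:

```python
import numpy as np

def perceptron(X, t, epochs=100):
    """Primal perceptron for the update rule above.

    X: (m, d) inputs, t: labels in {-1, +1}. The bias b is folded into w
    by appending a constant 1 to every x, so <w', x'> = <w, x> + b.
    """
    t = np.asarray(t, dtype=float)
    Xp = np.hstack([X, np.ones((len(X), 1))])   # x' = (x, 1)
    w = np.zeros(Xp.shape[1])                   # w' = (w, b), initialized to 0
    for _ in range(epochs):
        mistakes = 0
        for xi, ti in zip(Xp, t):
            if ti * np.dot(w, xi) <= 0:         # error
                w += ti * xi                    # w_{k+1} = w_k + t_i x_i
                mistakes += 1
        if mistakes == 0:                       # converged (separable data)
            break
    return w
```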
Recall
• Margin of hyperplane w
• Mistake bound: at most (R/γ)² mistakes, where R = maxi ||xi|| and γ is the margin (Novikoff).
Observations
• The solution is a linear combination of the inputs:
• w = Σi ai ti xi
• where ai ≥ 0
• Mistake driven:
• only points on which we make a mistake influence w!
• Support vectors:
• the xi with non-zero ai.
Dual representation
• Rewrite the basic function:
• f(x) = <w,x> + b = Σi ai ti <xi, x> + b
• w = Σi ai ti xi
• Change the update rule (a runnable sketch follows this list):
• IF tj (Σi ai ti <xi, xj> + b) < 0
• THEN aj = aj + 1
• Observation:
• the data appear only inside inner products!
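
The same algorithm in dual form, as a hedged sketch. The Gram-matrix precomputation and the bias update `b += t[j]` are assumptions the slide leaves implicit:

```python
import numpy as np

def kernel_perceptron(X, t, K, epochs=100):
    """Dual-form perceptron: f(x) = sum_i a_i t_i K(x_i, x) + b.

    X: (m, d) training inputs, t: labels in {-1, +1}, K: a kernel function.
    """
    t = np.asarray(t, dtype=float)
    m = len(X)
    a = np.zeros(m)
    b = 0.0
    # Precompute the Gram matrix: the data appear only inside inner products.
    G = np.array([[K(X[i], X[j]) for j in range(m)] for i in range(m)])
    for _ in range(epochs):
        mistakes = 0
        for j in range(m):
            f_j = np.sum(a * t * G[:, j]) + b   # f(x_j) = sum_i a_i t_i K(x_i, x_j) + b
            if t[j] * f_j <= 0:                 # mistake on example j
                a[j] += 1.0                     # a_j = a_j + 1
                b += t[j]
                mistakes += 1
        if mistakes == 0:
            break
    return a, b
```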
Limitation of Perceptron
• Only linear separations
• Only converges for linearly separable data
• Only defined on vectorial data

The idea of a Kernel
• Embed the data in a different space:
• possibly of higher dimension,
• linearly separable in the new space.
Kernel Mapping
• Need only compute inner products.
• Mapping: M(x)
• Kernel: K(x,z) = <M(x), M(z)>
• Dimensionality of M(x): unimportant!
• Need only compute K(x,z).
• Using it in the embedded space:
• replace <x,z> by K(x,z).
Example

x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)²
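
Expanding the square makes the implicit embedding M explicit; the ordering of M's coordinates below is one conventional choice:

```latex
K(x,z) = \langle x, z\rangle^{2} = (x_1 z_1 + x_2 z_2)^2
       = x_1^2 z_1^2 + 2\,x_1 x_2\, z_1 z_2 + x_2^2 z_2^2
       = \langle M(x), M(z)\rangle,
\qquad M(x) = \bigl(x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2\bigr).
```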

Polynomial Kernel

[Figure: the original problem and its image under the polynomial-kernel embedding]

Examples of Basic Kernels (sketched in code below)
• Polynomial:
• K(x,z) = (<x,z>)^d
• Gaussian:
• K(x,z) = exp{-||x-z||² / 2σ²}
Kernel: Closure Properties
• K(x,z) = K1(x,z) + c, for c ≥ 0
• K(x,z) = c·K1(x,z), for c ≥ 0
• K(x,z) = K1(x,z) · K2(x,z)
• K(x,z) = K1(x,z) + K2(x,z)
• Create new kernels from basic ones (a sketch follows)!
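
A sketch of the closure rules as kernel combinators; the function names are illustrative, and c must be non-negative for the result to remain a valid (positive semidefinite) kernel:

```python
import numpy as np

def add_constant(K1, c):
    """K(x, z) = K1(x, z) + c, a kernel for c >= 0."""
    return lambda x, z: K1(x, z) + c

def scale(K1, c):
    """K(x, z) = c * K1(x, z), a kernel for c >= 0."""
    return lambda x, z: c * K1(x, z)

def product(K1, K2):
    """K(x, z) = K1(x, z) * K2(x, z)."""
    return lambda x, z: K1(x, z) * K2(x, z)

def add(K1, K2):
    """K(x, z) = K1(x, z) + K2(x, z)."""
    return lambda x, z: K1(x, z) + K2(x, z)

# Example: build the inhomogeneous quadratic kernel (<x,z> + 1)^2
# out of the linear kernel using only the closure rules above.
linear = lambda x, z: float(np.dot(x, z))
quadratic = product(add_constant(linear, 1.0), add_constant(linear, 1.0))
```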
Support Vector Machines
• Linear Learning Machines (LLM)
• Use the dual representation
• Work in the kernel-induced feature space:
• f(x) = Σi ai ti K(xi, x) + b
• Which hyperplane to select?
Generalization of SVM
• PAC theory:
• error = O(VC-dim / m)
• Problem: VC-dim >> m
• No preference between consistent hyperplanes.
Margin-based bounds
• H: basic hypothesis class
• conv(H): finite convex combinations of H
• D: distribution over X × {+1, -1}
• S: sample of size m drawn from D
Margin-based bounds
• THEOREM: for every f in conv(H)
Maximal Margin Classifier
• Maximizes the margin.
• Minimizes the overfitting due to margin selection.
• Increases the margin
• rather than reducing dimensionality.
Margins
• Geometric margin: mini ti f(xi) / ||w||
• Functional margin: mini ti f(xi)
Main trick in SVM
• Insist on a functional margin of at least 1.
• Support vectors have functional margin exactly 1.
• Geometric margin = 1 / ||w||
• Proof: the rescaling argument below.
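
The slide's "Proof" is the standard rescaling argument, sketched here:

```latex
% Rescaling (w, b) by a positive constant does not change the hyperplane,
% so choose the scale at which the functional margin is exactly 1:
\min_i t_i\,\bigl(\langle w, x_i\rangle + b\bigr) = 1
\;\Longrightarrow\;
\text{geometric margin}
  \;=\; \min_i \frac{t_i f(x_i)}{\|w\|}
  \;=\; \frac{1}{\|w\|}.
```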
SVM criteria
• Find a hyperplane (w, b)
• that minimizes: ||w||² = <w,w>
• (minimizing ||w|| maximizes the geometric margin 1/||w||; the program is written out below)
• subject to, for all i:
• ti (<w,xi> + b) ≥ 1
• Linear constraints.
• Unique minimum.
• Polynomial-time algorithms.
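
The criteria stated as a quadratic program; the ½ factor is a common convention (it simplifies the gradient) that the slide omits:

```latex
\min_{w,\,b}\ \tfrac{1}{2}\,\langle w, w\rangle
\qquad\text{subject to}\qquad
t_i\,\bigl(\langle w, x_i\rangle + b\bigr) \;\ge\; 1,
\qquad i = 1,\dots,m.
```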
Dual Problem
• Maximize:
• W(a) = Σi ai − ½ Σi,j ai aj ti tj K(xi, xj)
• Subject to:
• Σi ai ti = 0
• ai ≥ 0
• (a solver example follows)
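
A hedged end-to-end check using scikit-learn's SVC, which solves this dual internally; the circular toy data and the large-C hard-margin approximation are illustrative choices:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
t = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)  # circle: not linearly separable
t[t == 0] = 1.0                                  # guard against points exactly on the circle

clf = SVC(kernel="poly", degree=2, C=1e3)        # large C approximates the hard margin
clf.fit(X, t)
print(len(clf.support_), "support vectors")      # the examples with non-zero a_i
print(clf.dual_coef_[0][:5])                     # first few a_i * t_i values
```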
Applications: Text
• Classify a text into given categories:
• sports, news, business, science, ...
• Feature space:
• bag of words
• a huge sparse vector!
Applications: Text
• Practicalities:
• Mw(x) = tfw · log(idfw) / K
• tfw = term frequency of w
• idfw = inverse document frequency of w:
• idfw = # documents / # documents containing w
• K: a normalization constant
• Inner product <M(x), M(z)> is cheap on sparse vectors.
• SVM finds a hyperplane in "document space" (a pipeline sketch follows).
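
A sketch of this pipeline with scikit-learn; the toy corpus and labels are placeholders, and TfidfVectorizer's smoothed idf differs slightly from the formula above:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy corpus: each document labeled with its category.
docs = ["the match ended two nil", "stocks fell on profit news"]
labels = ["sports", "business"]

# TfidfVectorizer builds the sparse tf-idf vectors M(x);
# LinearSVC finds a separating hyperplane in "document space".
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(docs, labels)
print(model.predict(["shares rallied after the earnings report"]))
```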