Support Vector Machine (SVM)

Based on Nello Cristianini presentation

http://www.support-vector.net/tutorial.html

Basic Idea
  • Use Linear Learning Machine (LLM).
  • Overcome the linearity constraints:
    • Map non-linearly to a higher dimension.
  • Select between hyperplanes
    • Use the margin as the selection criterion
  • Generalization depends on the margin.
General idea
  • (Figure: the Original Problem is mapped into a Transformed Problem in a new feature space)
Kernel Based Algorithms
  • Two separate components:
  • Learning Algorithm:
    • works in an embedded space
  • Kernel function
    • performs the embedding
Basic Example: Kernel Perceptron
  • Hyperplane classification
    • f(x) = <w,x> + b = <w',x'>   (w' = (w,b), x' = (x,1))
    • h(x) = sign(f(x))
  • Perceptron Algorithm (see the sketch below):
    • Sample: (xi, ti), ti ∈ {-1,+1}
    • IF ti <wk, xi> < 0 THEN   /* error */
    • wk+1 = wk + ti xi
    • k = k + 1
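A minimal sketch of this primal perceptron update in Python/NumPy, assuming the bias is folded into the weight vector via the augmented vectors w' and x' from the slide; all names are illustrative, not part of the original deck.

```python
import numpy as np

def perceptron(X, t, epochs=100):
    """Primal perceptron. X: (m, d) inputs, t: (m,) labels in {-1, +1}."""
    Xa = np.hstack([X, np.ones((X.shape[0], 1))])  # augment: x' = (x, 1), so <w', x'> = <w, x> + b
    w = np.zeros(Xa.shape[1])
    for _ in range(epochs):
        mistakes = 0
        for xi, ti in zip(Xa, t):
            if ti * np.dot(w, xi) <= 0:   # mistake (the slide tests < 0; <= 0 also updates on the boundary)
                w += ti * xi              # w_{k+1} = w_k + t_i x_i
                mistakes += 1
        if mistakes == 0:                 # converged on linearly separable data
            break
    return w                              # last entry of w is the bias b
```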
Recall
  • Margin of hyperplane w
  • Mistake bound (Novikoff): at most (R/γ)² mistakes, where R bounds ||xi|| and γ is the margin
Observations
  • Solution is a linear combination of inputs
    • w = Σi ai ti xi
    • where ai >0
  • Mistake driven
    • Only points on which we make a mistake influence the solution!
  • Support vectors
    • The non-zero ai
Dual representation
  • Rewrite basic function:
    • f(x) = <w,x> + b = Σi ai ti <xi, x> + b
    • w = Σi ai ti xi
  • Change update rule:
    • IF tj (Σi ai ti <xi, xj> + b) < 0
    • THEN aj = aj + 1
  • Observation:
    • Data appears only inside inner products! (see the sketch below)
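A corresponding sketch of the dual (kernel) perceptron, assuming an arbitrary kernel function k(x, z) is supplied; the data enters only through k, matching the observation above. The bias update is a common companion rule, not stated on the slide.

```python
import numpy as np

def kernel_perceptron(X, t, k, epochs=100):
    """Dual perceptron. k(x, z) is any kernel; a_i counts mistakes on point i."""
    t = np.asarray(t)
    m = len(X)
    a = np.zeros(m)                                   # dual variables a_i
    b = 0.0
    K = np.array([[k(X[i], X[j]) for j in range(m)] for i in range(m)])  # Gram matrix
    for _ in range(epochs):
        for j in range(m):
            f_j = np.sum(a * t * K[:, j]) + b         # f(x_j) = sum_i a_i t_i K(x_i, x_j) + b
            if t[j] * f_j <= 0:                       # mistake
                a[j] += 1                             # a_j = a_j + 1
                b += t[j]                             # companion bias update (an assumption, not on the slide)
    return a, b
```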
Limitation of Perceptron
  • Only linear separations
  • Only converges for linearly separable data
  • Only defined on vectorial data
The idea of a Kernel
  • Embed data to a different space
  • Possibly higher dimension
  • Linearly separable in the new space.
Kernel Mapping
  • Need only to compute inner-products.
  • Mapping: M(x)
  • Kernel: K(x,y) = < M(x) , M(y)>
  • Dimensionality of M(x): unimportant!
  • Need only to compute K(x,y)
  • Using it in the embedded space:
    • Replace <x,y> by K(x,y)
Example

x = (x1, x2); z = (z1, z2); K(x,z) = (<x,z>)^2
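Expanding this kernel shows it is an inner product in a 3-dimensional feature space; the explicit mapping M below is one standard choice (written here to connect with the M(x) notation of the previous slide):

\[
K(x,z) = (x_1 z_1 + x_2 z_2)^2
       = x_1^2 z_1^2 + 2\,x_1 x_2\, z_1 z_2 + x_2^2 z_2^2
       = \big\langle (x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2),\ (z_1^2,\; z_2^2,\; \sqrt{2}\,z_1 z_2) \big\rangle
       = \langle M(x),\, M(z) \rangle .
\]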

Polynomial Kernel
  • (Figure: the Original Problem and the Transformed Problem under the polynomial kernel map)
Example of Basic Kernels
  • Polynomial
    • K(x,z) = (<x,z>)^d
  • Gaussian
    • K(x,z) = exp{ -||x-z||^2 / 2σ^2 }
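The two kernels as minimal NumPy functions; `d` and `sigma` are free parameters (the deck leaves the Gaussian width implicit), and the names are illustrative.

```python
import numpy as np

def polynomial_kernel(x, z, d=2):
    """K(x, z) = (<x, z>)^d."""
    return np.dot(x, z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    """K(x, z) = exp(-||x - z||^2 / (2 * sigma^2))."""
    return np.exp(-np.linalg.norm(x - z) ** 2 / (2 * sigma ** 2))
```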
Kernel: Closure Properties
  • K(x,z) = K1(x,z) + c,   c ≥ 0
  • K(x,z) = c·K1(x,z),   c ≥ 0
  • K(x,z) = K1(x,z) * K2(x,z)
  • K(x,z) = K1(x,z) + K2(x,z)
  • Create new kernels using basic ones!
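For example, a new kernel can be assembled from the two defined in the sketch above; the specific weights here are arbitrary and only illustrate the closure properties.

```python
def combined_kernel(x, z):
    # Valid by closure: non-negative scaling, sums, and products of kernels are kernels.
    # Reuses polynomial_kernel and gaussian_kernel from the sketch above.
    return 0.5 * polynomial_kernel(x, z, d=3) + gaussian_kernel(x, z) * polynomial_kernel(x, z, d=2)
```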
Support Vector Machines
  • Linear Learning Machines (LLM)
  • Use dual representation
  • Work in the kernel induced feature space
    • f(x) =  ai ti K(xi , x) +b
  • Which hyperplane to select
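A sketch of evaluating this decision function, assuming the dual variables `alphas`, the bias `b`, and the kernel `k` come from training (for instance from the dual problem later in the deck); the function names are illustrative.

```python
import numpy as np

def svm_decision(x, X_train, t_train, alphas, b, k):
    """f(x) = sum_i a_i t_i K(x_i, x) + b; classify with sign(f(x))."""
    f = sum(a_i * t_i * k(x_i, x) for a_i, t_i, x_i in zip(alphas, t_train, X_train)) + b
    return np.sign(f)
```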
Generalization of SVM
  • PAC theory:
    • error = O(VC-dim / m)
    • Problem: VC-dim >> m
    • No preference between consistent hyperplanes
Margin based bounds
  • H: Basic Hypothesis class
  • conv(H): finite convex combinations of H
  • D: Distribution over X × {-1,+1}
  • S: Sample of size m drawn from D
Margin based bounds
  • THEOREM: for every f in conv(H)
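One standard form of such a margin bound (following Schapire, Freund, Bartlett and Lee; stated here as an assumption, since the slide's exact statement and constants may differ): for every θ > 0, with probability at least 1 − δ over the sample S of size m,

\[
\Pr_{D}\big[\,t\,f(x) \le 0\,\big] \;\le\; \Pr_{S}\big[\,t\,f(x) \le \theta\,\big]
  \;+\; O\!\left(\sqrt{\frac{1}{m}\left(\frac{\log m\,\log|H|}{\theta^{2}} + \log\frac{1}{\delta}\right)}\right).
\]

The key point is that the bound depends on the margin θ and not on the VC-dimension of the induced feature space.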
Maximal Margin Classifier
  • Maximizes the margin
  • Minimizes the overfitting due to margin selection.
  • Increases the margin
    • rather than reducing the dimensionality
Margins
  • Functional margin: mini ti f(xi)
  • Geometric margin: mini ti f(xi) / ||w||
Main trick in SVM
  • Insist on a functional margin of at least 1.
    • Support vectors have functional margin exactly 1.
  • Geometric margin = 1 / ||w||
  • Proof: for a support vector xi, ti f(xi) = 1, so its geometric margin is ti f(xi) / ||w|| = 1 / ||w||.
SVM criteria
  • Find a hyperplane (w,b)
  • That minimizes: ||w||^2 = <w,w>   (equivalently, maximizes the geometric margin 1/||w||)
  • Subject to:
    • for all i
    • ti (<w,xi> + b) ≥ 1
Quadratic Programming
  • Quadratic goal function.
  • Linear constraints.
  • Unique optimum (the problem is convex).
  • Polynomial time algorithms.
Dual Problem
  • Maximize
    • W(a) = Σi ai - 1/2 Σi,j ai aj ti tj K(xi, xj)
  • Subject to
    • Σi ai ti = 0
    • ai ≥ 0
Applications: Text
  • Classify a text into given categories
    • Sports, news, business, science, …
  • Feature space
    • Bag of words
    • Huge sparse vector!
Applications: Text
  • Practicalities:
    • Mw(x) = tfw · log(idfw) / K   (K a normalization constant)
    • tfw = term frequency of w
    • idfw= inverse document frequency
    • idfw = # documents / # documents with w
  • Inner product <M(x),M(z)>
    • sparse vectors
  • SVM: finds a hyperplane in "document space"
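A compact end-to-end sketch of this pipeline with scikit-learn: TfidfVectorizer builds the sparse tf·idf bag-of-words vectors described above, and LinearSVC finds a separating hyperplane in document space. The tiny corpus and category names are made up for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy corpus; real applications use thousands of labelled documents.
docs = ["the team won the match", "stocks fell on weak earnings",
        "the striker scored twice", "the market rallied after the report"]
labels = ["sports", "business", "sports", "business"]

model = make_pipeline(TfidfVectorizer(), LinearSVC())   # sparse bag-of-words -> linear SVM
model.fit(docs, labels)
print(model.predict(["the market fell on weak earnings"]))  # likely ['business'] on this toy corpus
```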