

SVM: Support Vector Machines

Based on the Statistical Learning Theory of Vapnik, Chervonenkis, Burges, Scholkopf, Smola, Bartlett, Mendelson, and Cristianini

Presented By: Tamer Salman


The Addressed Problems

  • SVM can deal with three kinds of problems:

    • Pattern Recognition / Classification.

    • Regression Estimation.

    • Density Estimation.


Pattern Recognition

  • Given:

    • A set of M labeled patterns:

    • The patterns are drawn i.i.d from an unknown P(X,Y).

    • A set of functions F.

  • Goal: choose a function f in F such that an unseen pattern x will be correctly classified with high probability.

  • Binary classification: Two classes, +1 and -1.
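
The omitted formulas can be written in standard SVM notation (a reconstruction; the symbols below are the usual ones, not copied from the slide):

```latex
% A set of M labeled patterns, drawn i.i.d. from an unknown distribution P(X, Y):
S = \{(x_1, y_1), \dots, (x_M, y_M)\}, \qquad x_i \in X, \;\; y_i \in \{-1, +1\}
% Goal: choose f \in F so that an unseen pattern x is classified correctly with high probability.
```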


The Actual Risk

  • What is the probability of error of a function f?

    where c is some cost function on errors.

  • The risk is not computable, because the distribution P(x,y) is unknown.

  • A proper estimation must be found.
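
The risk functional that the "where c ..." clause above refers to is usually written as follows (a standard form, reconstructed since the slide's equation is missing from the transcript):

```latex
% Actual (expected) risk of a function f:
R[f] = \int c\bigl(f(x), y\bigr)\, dP(x, y)
% With the 0/1 cost c(f(x), y) = \mathbf{1}[f(x) \neq y], R[f] is exactly the probability of error.
```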


Linear Neural Network

Linear SVM

Linear SVM: Linearly Separable Case

  • Linear SVM produces the maximal-margin hyperplane, which is as far as possible from the closest training points.


Linearly Separable Case. Cont.

  • Given the training set, we seek w and b satisfying the constraints written out after this list.

  • In addition, we seek the maximal-margin hyperplane.

    • What is the margin?

    • How do we maximize it?
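
The constraints on w and b referred to above are, in the standard linearly separable formulation:

```latex
% Every training point is classified correctly with a functional margin of at least 1:
y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1, \qquad i = 1, \dots, M
```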


Margin Maximization

  • The margin is the sum of the distances from the two closest points, one on each side, to the hyperplane.

  • The distance of the hyperplane (w,b) from the origin is |b| / ||w||.

  • The margin is 2/||w||.

  • Maximizing the margin is equivalent to minimizing ½||w||².
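
Combining the constraints with this objective gives the usual hard-margin primal problem (reconstructed):

```latex
\min_{w,\, b} \;\; \tfrac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1, \quad i = 1, \dots, M
```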


Linear SVM. Cont.

  • The Lagrangian is:
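
The Lagrangian itself is not in the transcript; its standard form, with multipliers α_i ≥ 0 for the margin constraints, is:

```latex
L(w, b, \alpha) = \tfrac{1}{2}\|w\|^2
  - \sum_{i=1}^{M} \alpha_i \Bigl[ y_i \bigl( \langle w, x_i \rangle + b \bigr) - 1 \Bigr],
\qquad \alpha_i \ge 0
```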


Linear SVM. Cont.

  • Requiring the derivatives with respect to w,b to vanish yields:

  • KKT conditions yield:

  • Where:
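
The three omitted equations are, in the usual derivation: the stationarity conditions, the resulting dual problem, and the KKT complementarity conditions (reconstructed here):

```latex
% Stationarity (derivatives with respect to w and b vanish):
w = \sum_{i=1}^{M} \alpha_i y_i x_i, \qquad \sum_{i=1}^{M} \alpha_i y_i = 0
% Substituting back gives the dual problem:
\max_{\alpha} \; \sum_{i=1}^{M} \alpha_i
  - \tfrac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad \alpha_i \ge 0, \;\; \sum_{i=1}^{M} \alpha_i y_i = 0
% KKT complementarity conditions:
\alpha_i \Bigl[ y_i \bigl( \langle w, x_i \rangle + b \bigr) - 1 \Bigr] = 0, \qquad i = 1, \dots, M
```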


Linear SVM. Cont.

  • The resulting separating function is written out after the notes below.

  • Notes:

    • The points with α=0 do not affect the solution.

    • The points with α≠0 are called support vectors.

    • The equality conditions hold true only for the SVs.
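
The separating function referred to at the top of this slide has the standard form:

```latex
f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{M} \alpha_i y_i \langle x_i, x \rangle + b \Bigr)
% Only the support vectors (points with \alpha_i \neq 0) contribute to the sum.
```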


Linear SVM. Non-separable case.

  • We introduce slack variables ξi and allow mistakes.

  • We demand:

  • And minimize:
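
The demanded constraints and the modified objective are, in the usual soft-margin formulation (C > 0 is a trade-off parameter; reconstructed in standard SVM notation):

```latex
% Relaxed constraints with slack variables \xi_i:
y_i \bigl( \langle w, x_i \rangle + b \bigr) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \quad i = 1, \dots, M
% Objective penalizing both a small margin and the total slack:
\min_{w,\, b,\, \xi} \;\; \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{M} \xi_i
```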


Non-separable case. Cont.

  • The modifications yield the following problem:
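
The resulting problem (omitted from the transcript) is the same dual as in the separable case, with the multipliers now box-constrained by C:

```latex
\max_{\alpha} \; \sum_{i=1}^{M} \alpha_i
  - \tfrac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{M} \alpha_i y_i = 0
```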


Non Linear SVM

  • Note that the training data appears in the solution only in inner products.

  • If we pre-map the data into a higher-dimensional and sparser space, we can gain more separability and a richer family of separating functions.

  • However, the pre-mapping itself might make the problem computationally infeasible.

  • We want to avoid the pre-mapping and still have the same separation ability.

  • If we have a simple function that operates on two training points and implements the inner product of their pre-mappings, we achieve the better separation at no added cost.
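
In other words, writing Φ for the pre-mapping into a feature space H, the "simple function" is a kernel k that evaluates the inner product there directly:

```latex
k(x_i, x_j) = \langle \Phi(x_i), \Phi(x_j) \rangle
% Every inner product in the dual problem and in the separating function can be
% replaced by k(x_i, x_j) without ever computing \Phi explicitly.
```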


Mercer Kernels

  • A Mercer kernel is a function:

    for which there exists a function:

    such that:

  • A function k(.,.) is a Mercer kernel if

    for any function g(.), such that:

    the following holds true:
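
With the missing formulas filled in using standard notation, the definitions on this slide read:

```latex
% A Mercer kernel is a function
k : X \times X \to \mathbb{R}
% for which there exists a mapping \Phi : X \to H (a feature space) such that
k(x, y) = \langle \Phi(x), \Phi(y) \rangle
% Mercer's condition: k(.,.) is such a kernel if, for every g with
\int g(x)^2 \, dx < \infty,
% the following holds true:
\int\!\!\int k(x, y)\, g(x)\, g(y)\, dx\, dy \;\ge\; 0
```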


Some Mercer Kernels

  • Homogeneous Polynomial Kernels:

  • Non-homogeneous Polynomial Kernels:

  • Radial Basis Function (RBF) Kernels:
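
The standard forms of these kernels (d, c, and σ are kernel parameters; the slide's own formulas are not in the transcript):

```latex
% Homogeneous polynomial kernel of degree d:
k(x, y) = \langle x, y \rangle^{d}
% Non-homogeneous polynomial kernel (c > 0):
k(x, y) = \bigl( \langle x, y \rangle + c \bigr)^{d}
% Radial basis function (RBF) kernel with width \sigma:
k(x, y) = \exp\!\Bigl( -\frac{\|x - y\|^2}{2\sigma^2} \Bigr)
```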


Solution of non-linear SVM

  • The problem:

  • The separating function:
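
Both are the linear expressions with every inner product replaced by the kernel (a reconstruction of the omitted formulas):

```latex
% The dual problem:
\max_{\alpha} \; \sum_{i=1}^{M} \alpha_i
  - \tfrac{1}{2} \sum_{i,j=1}^{M} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j)
\quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_{i=1}^{M} \alpha_i y_i = 0
% The separating function:
f(x) = \operatorname{sign}\Bigl( \sum_{i=1}^{M} \alpha_i y_i \, k(x_i, x) + b \Bigr)
```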


Notes

  • The solution of non-linear SVM is linear in H (the feature space).

  • In non-linear SVM, w exists in H.

  • Computing the kernel values is no more expensive than computing the solution, and the values can be computed a priori and stored in a kernel matrix.

  • SVM is suitable for large-scale problems due to its chunking ability.
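
To make the kernel-matrix remark concrete, here is a minimal sketch (my own illustration, not from the slides) that precomputes an RBF Gram matrix with NumPy and trains scikit-learn's SVC on it through its precomputed-kernel interface; the toy data and the rbf_kernel_matrix helper are made up for the example:

```python
import numpy as np
from sklearn.svm import SVC

def rbf_kernel_matrix(X1, X2, sigma=1.0):
    """RBF (Gaussian) kernel matrix between the rows of X1 and X2."""
    sq_dists = (
        np.sum(X1 ** 2, axis=1)[:, None]
        + np.sum(X2 ** 2, axis=1)[None, :]
        - 2.0 * X1 @ X2.T
    )
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

# Toy, made-up data: two Gaussian clusters labeled -1 and +1.
rng = np.random.default_rng(0)
X_train = np.vstack([rng.normal(0.0, 1.0, (20, 2)), rng.normal(3.0, 1.0, (20, 2))])
y_train = np.array([-1] * 20 + [+1] * 20)

# Compute the kernel (Gram) matrix once, a priori, and train on it.
K_train = rbf_kernel_matrix(X_train, X_train)
clf = SVC(kernel="precomputed", C=1.0).fit(K_train, y_train)

# At prediction time the kernel is evaluated between test and training points.
X_test = rng.normal(1.5, 1.0, (5, 2))
K_test = rbf_kernel_matrix(X_test, X_train)
print(clf.predict(K_test))
```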


Error Estimates

  • Because the actual risk is not computable, we seek to estimate the error rate of a machine given a finite set of m patterns.

  • Empirical Risk.

  • Training and Testing.

  • k-fold cross validation.

  • Leave-one-out (LOO).
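
A minimal sketch of the last two estimates using scikit-learn (an illustration with made-up toy data, not the presenter's code):

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Toy, made-up data: two Gaussian clusters labeled -1 and +1.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 2)), rng.normal(2.5, 1.0, (30, 2))])
y = np.array([-1] * 30 + [+1] * 30)

clf = SVC(kernel="rbf", C=1.0, gamma=0.5)

# k-fold cross validation: train on k-1 folds, test on the held-out fold.
kfold_acc = cross_val_score(clf, X, y, cv=5)
print("5-fold error estimate:", 1.0 - kfold_acc.mean())

# Leave-one-out: m rounds, each holding out a single pattern.
loo_acc = cross_val_score(clf, X, y, cv=LeaveOneOut())
print("LOO error estimate:", 1.0 - loo_acc.mean())
```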


Error Bounds

  • We seek error estimates that are faster to compute.

  • The bound should be tight and informative.

  • Theoretical VC bound:

    Risk < Empirical Risk + Complexity (VC-dimension / m)

    Loose and not always informative.

  • Margin Radius bound:

    Risk < R² / margin²

    Where R is the radius of the smallest enclosing sphere of the data in feature space.

    Tight and informative.
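
For reference, commonly cited written forms of the two bounds, under the usual assumptions (h is the VC-dimension, m the sample size, ρ the margin, R the radius of the smallest enclosing sphere; these exact expressions are my reconstruction rather than the slide's):

```latex
% VC bound (holds with probability 1 - \eta):
\text{Risk} \;\le\; \text{Empirical Risk}
  + \sqrt{ \frac{ h \bigl( \ln(2m/h) + 1 \bigr) + \ln(4/\eta) }{ m } }
% Radius--margin bound on the expected (leave-one-out) error:
\mathbb{E}[\text{error}] \;\le\; \frac{1}{m}\, \mathbb{E}\!\left[ \frac{R^2}{\rho^2} \right]
```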


Error Bounds. Cont.

[Figure omitted: the error bound and the leave-one-out (LOO) error plotted against a model parameter]


Rademacher Complexity

  • One of the tightest sample-based bounds depends on the Rademacher complexity term, defined as follows (written out after this list):

    where:

    F is the class of functions mapping the domain of the input into R.

    E_P(x): expectation with respect to the probability distribution of the input data.

    E_σ: expectation with respect to the σ_i, independent random variables distributed uniformly on {±1}.

  • Rademacher complexity measures the ability of the function class to fit a random labeling of the input samples.
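
Written out with the two expectations just listed (in the notation of Bartlett & Mendelson; a reconstruction of the omitted formula):

```latex
R_m(F) \;=\; \mathbb{E}_{P(x)}\, \mathbb{E}_{\sigma}
  \left[ \sup_{f \in F} \left| \frac{2}{m} \sum_{i=1}^{m} \sigma_i f(x_i) \right| \right],
\qquad \sigma_i \ \text{i.i.d. uniform on } \{\pm 1\}
```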


Rademacher Risk Bound

  • The following bound (written out at the end of this slide) holds true with probability (1-δ):

    Where:

    Ê_m is the error on the input data, measured through a loss function h(.) with Lipschitz constant L. That is:

    And the loss function can be one of:

    Vapnik's, or Bartlett & Mendelson's (both written out below).
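
A standard statement of the bound and of the two loss functions, reconstructed rather than copied from the slide (the hinge form for Vapnik's loss and the clipped "ramp" form for Bartlett & Mendelson's are my best guesses at the intended definitions):

```latex
% With probability at least 1 - \delta:
R[f] \;\le\; \hat{E}_m[f] \;+\; 2 L\, R_m(F) \;+\; \sqrt{ \frac{\ln(2/\delta)}{2m} }
% Empirical term: the loss h(.) averaged over the m input patterns,
\hat{E}_m[f] = \frac{1}{m} \sum_{i=1}^{m} h\bigl( y_i f(x_i) \bigr)
% Vapnik's (hinge-type) loss:            h(t) = \max(0,\, 1 - t)
% Bartlett & Mendelson's (ramp) loss:    h(t) = \min\bigl( 1,\, \max(0,\, 1 - t) \bigr)
```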

