Sparse kernel machines

Presentation Transcript



Sparse Kernel Machines



Introduction

  • Kernel function k(xn,xm) must be evaluated for all pairs of training points

  • Advantageous (computationally) to look at kernel algorithms that have sparse solutions

  • New predictions then depend only on the kernel function evaluated at a few training points

  • One possible solution: the support vector machine (SVM)
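The prediction form this sparsity buys is worth seeing concretely: a trained SVM evaluates y(x) = Σn an k(x, xn) + b, where an is nonzero only for the support vectors. A minimal sketch with made-up coefficients (the RBF kernel choice and the toy values are illustrative assumptions, not a trained model):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    # Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def sparse_predict(x, support_vectors, coeffs, b, gamma=1.0):
    # Prediction touches only the support vectors, not the full training set:
    # y(x) = sum_n a_n * k(x, x_n) + b, with a_n nonzero only on that subset.
    return sum(a * rbf_kernel(x, sv, gamma)
               for a, sv in zip(coeffs, support_vectors)) + b

# Toy illustration with hand-picked dual coefficients:
support_vectors = [np.array([1.0, 1.0]), np.array([-1.0, -1.0])]
coeffs = [0.8, -0.8]   # a_n = alpha_n * t_n for the two "support vectors"
b = 0.0
print(sparse_predict(np.array([0.9, 1.1]), support_vectors, coeffs, b))  # positive -> class +1
```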



Maximum Margin Classifiers (1)

  • Setup:

  • y(x) = wTφ(x) + b, where φ(x) is a feature-space transformation

    Training data comprises

    • N input data vectors x1,...,xN

    • N target values t1,...,tN with tn ∈ {-1, 1}

  • New samples are classified according to the sign of y(x)

  • Assume, for the moment, that the data are linearly separable

  • y(x) > 0 for points of class +1 (i.e. tn = +1)

  • y(x) < 0 for points having tn = -1

  • Therefore, tn y(xn) > 0 for all training points
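The sign-based decision rule and the separability condition tn y(xn) > 0 can be checked directly on toy data; the weights below are hand-picked for illustration, not learned:

```python
import numpy as np

# Toy linearly separable data; phi is the identity map here for simplicity.
X = np.array([[2.0, 1.0], [1.5, 2.0], [-1.0, -0.5], [-2.0, -1.5]])
t = np.array([1, 1, -1, -1])

w = np.array([1.0, 1.0])   # hypothetical separating weights (not learned)
b = 0.0

y = X @ w + b              # y(x_n) = w^T x_n + b for every training point
print(np.sign(y))          # predicted classes
print(np.all(t * y > 0))   # True: every point lies on its correct side
```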



Maximum Margin Classifiers (2)

  • Many solutions may exist that separate the data exactly

  • The SVM addresses this by introducing the concept of a margin, i.e. the smallest distance between the decision boundary and any of the samples; maximizing the margin tends to reduce the generalization error

  • The decision boundary is chosen to be the one for which the margin is maximized

  • The decision boundary thus depends only on the points closest to it
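The margin of a given boundary is simply the smallest of the per-point distances tn y(xn)/||w||. A small sketch (identity feature map, hand-chosen w and b):

```python
import numpy as np

def margin(w, b, X, t):
    # Smallest signed distance from any sample to the decision boundary:
    # min_n t_n * (w^T x_n + b) / ||w||
    return np.min(t * (X @ w + b)) / np.linalg.norm(w)

X = np.array([[2.0, 0.0], [3.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
t = np.array([1, 1, -1, -1])
w = np.array([1.0, 0.0]); b = 0.0   # boundary is the vertical axis

print(margin(w, b, X, t))  # 1.0 -- set by the closest point, (-1, 0)
```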



Maximum Margin Classifiers (3)



  • If we consider only points that are correctly classified, then we have

    tny(xn) > 0 for all n

  • The distance of xn to the decision surface is then

    tn y(xn) / ||w|| = tn (wTφ(xn) + b) / ||w||    (7.2)

  • Optimize w and b to maximize this distance, i.e. solve

    arg max w,b { (1/||w||) min n [tn (wTφ(xn) + b)] }    (7.3)

  • Quite hard to solve directly => need to get around it
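The standard way around the direct problem is the canonical reformulation: rescale w and b so that tn(wTφ(xn) + b) = 1 for the closest point, which turns (7.3) into minimizing (1/2)||w||^2 subject to tn(wTφ(xn) + b) ≥ 1 for all n. On a tiny problem this quadratic program can be handed to a generic constrained solver; a sketch using SciPy's SLSQP (assumed available) with the identity feature map:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data, identity feature map phi(x) = x.
X = np.array([[1.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [-2.0, -1.0]])
t = np.array([1.0, 1.0, -1.0, -1.0])

# Canonical max-margin problem:
#   minimize 0.5 * ||w||^2  subject to  t_n (w^T x_n + b) >= 1 for all n.
def objective(z):
    w = z[:2]
    return 0.5 * w @ w

constraints = [{"type": "ineq",
                "fun": lambda z, n=n: t[n] * (z[:2] @ X[n] + z[2]) - 1.0}
               for n in range(len(t))]

res = minimize(objective, x0=np.zeros(3), constraints=constraints, method="SLSQP")
w, b = res.x[:2], res.x[2]
print(w, b)  # should be close to w = [1, 0], b = 0 for this data
```

Here the closest points (1, 0) and (-1, 0) pin the solution: the constraints force w1 ≥ 1, so the minimum-norm separator is w = (1, 0), b = 0 with margin 1.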



Maximum Margin Classifiers (4)



7.1.1 Overlapping class distributions



  • An alternative, equivalent formulation of the support vector machine is known as the ν-SVM



For overlapping classes, slack variables ξn ≥ 0 are introduced that allow points to lie on the wrong side of the margin, penalized through a parameter C. In the ν-SVM formulation, the parameter ν, which replaces C, can be interpreted as both an upper bound on the fraction of margin errors (points for which ξn > 0 and hence which lie on the wrong side of the margin boundary, and which may or may not be misclassified) and a lower bound on the fraction of support vectors.
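A quick way to see the ν-property in action is scikit-learn's NuSVC (assumed available here): on overlapping data the fraction of support vectors comes out at or above ν.

```python
import numpy as np
from sklearn.svm import NuSVC

rng = np.random.default_rng(0)
n = 100
# Two overlapping Gaussian blobs, one per class
X = np.vstack([rng.normal(-1.0, 1.0, size=(n // 2, 2)),
               rng.normal(+1.0, 1.0, size=(n // 2, 2))])
t = np.array([-1] * (n // 2) + [1] * (n // 2))

nu = 0.2
clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, t)

frac_sv = clf.support_.size / n
print(frac_sv)  # at or above nu: nu lower-bounds the fraction of support vectors
```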



7.1.2 Relation to logistic regression
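The connection drawn in this section is between error functions of z = t·y(x): the SVM's hinge error ESV(z) = max(0, 1 - z) against the logistic-regression error ELR(z) = ln(1 + exp(-z)). A small sketch evaluating the two:

```python
import numpy as np

def hinge_loss(z):
    # SVM error: E_SV(z) = max(0, 1 - z), with z = t * y(x)
    return np.maximum(0.0, 1.0 - z)

def logistic_loss(z):
    # Logistic-regression error: E_LR(z) = ln(1 + exp(-z))
    return np.log1p(np.exp(-z))

z = np.array([-2.0, 0.0, 1.0, 3.0])
print(hinge_loss(z))     # [3. 1. 0. 0.] -- exactly zero once the margin is met
print(logistic_loss(z))  # decays smoothly but never reaches zero
```

The hinge error is exactly zero for points beyond the margin, which is what makes the SVM solution sparse; the logistic error never vanishes, so every point contributes to a logistic-regression fit.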



7.1.3 Multi-class SVM
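The SVM is fundamentally a two-class model; the common multi-class strategies are one-versus-rest (one binary SVM per class) and one-versus-one (one per pair of classes). A one-versus-rest sketch, assuming scikit-learn is available:

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Three well-separated 2-D blobs, one per class
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([rng.normal(c, 0.5, size=(30, 2)) for c in centers])
t = np.repeat([0, 1, 2], 30)

# One-versus-rest: one binary SVM per class, each separating
# that class from all the remaining ones.
clf = OneVsRestClassifier(SVC(kernel="linear")).fit(X, t)
print((clf.predict(X) == t).mean())
```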



7.1.4 SVMs for regression
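SVM regression replaces the hinge error with an ε-insensitive error: deviations smaller than ε cost nothing, so only points on or outside the ε-tube become support vectors and the solution stays sparse. A sketch assuming scikit-learn's SVR is available:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(2)
x = np.linspace(0.0, 2.0 * np.pi, 60)[:, None]
t = np.sin(x).ravel() + rng.normal(0.0, 0.05, size=60)

# epsilon-insensitive loss: errors smaller than epsilon cost nothing,
# so points strictly inside the epsilon-tube are not support vectors.
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1).fit(x, t)

print(reg.support_.size, "of", len(x), "points are support vectors")
```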




7.1.4 SVMs for regression & 7.1.5 Computational learning theory


Relevance Vector Machines

  • The relevance vector machine (RVM) is a Bayesian sparse kernel technique for regression and classification that typically yields even sparser models than the SVM


7.2.1 RVM for regression
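A sketch of the RVM regression fit using the standard evidence-approximation re-estimation updates (αi ← γi/mi², β ← (N - Σγi)/||t - Φm||², with γi = 1 - αi Σii), pruning basis functions whose αi diverges. The Gaussian basis functions, the sinc target, and the pruning threshold are illustrative assumptions:

```python
import numpy as np

def rvm_fit(Phi, t, n_iter=100, prune_at=1e6):
    # Iterative hyperparameter re-estimation for the RVM:
    #   gamma_i = 1 - alpha_i * Sigma_ii   (how well-determined w_i is)
    #   alpha_i <- gamma_i / m_i^2
    #   beta    <- (N - sum_i gamma_i) / ||t - Phi m||^2
    # Basis functions whose alpha_i diverges are pruned away.
    N = Phi.shape[0]
    alpha = np.ones(Phi.shape[1])
    beta = 1.0
    keep = np.arange(Phi.shape[1])   # indices of surviving basis functions
    for _ in range(n_iter):
        Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
        m = beta * Sigma @ Phi.T @ t
        gamma = np.clip(1.0 - alpha * np.diag(Sigma), 1e-12, None)
        alpha = gamma / (m ** 2 + 1e-12)
        beta = (N - gamma.sum()) / (np.sum((t - Phi @ m) ** 2) + 1e-12)
        mask = alpha < prune_at
        alpha, Phi, keep = alpha[mask], Phi[:, mask], keep[mask]
    Sigma = np.linalg.inv(np.diag(alpha) + beta * Phi.T @ Phi)
    m = beta * Sigma @ Phi.T @ t
    return m, keep

rng = np.random.default_rng(3)
x = np.linspace(-10.0, 10.0, 80)
t = np.sinc(x / np.pi) + rng.normal(0.0, 0.05, size=x.size)  # noisy sin(x)/x
Phi = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)          # Gaussian basis per point
m, keep = rvm_fit(Phi, t)
print(len(keep), "relevance vectors out of", x.size)
```

The surviving columns of Φ are the relevance vectors; typically only a handful remain, far fewer than the support vectors an SVM would retain on the same data.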


7.2.2 Analysis of sparsity


7.2.3 RVM for classification

