
Tutorial:Interior Point Optimization Methodsin Support Vector Machines Training

Part 1: Fundamentals of SVMs

Theodore Trafalis

email: [email protected]

ANNIE’99, St. Louis, Missouri, U.S.A., Nov. 7, 1999

Outline
  • Statistical Learning Theory
    • Empirical Risk Minimization
    • Structural Risk Minimization
  • Linear SVM and Linear Separable Case
    • Primal Optimization Problem
    • Dual Optimization Problem
  • Non-Linear Case
  • Support Vector Regression
  • Dual Problem for Regression
  • Kernel Functions in SVMs
  • Open Problem
Statistical Learning Theory (Vapnik 1995, 1998)

Empirical Risk Minimization

  • Given a set of decision functions $\{f_\lambda(x) : \lambda \in \Lambda\}$, $f_\lambda : \mathbb{R}^n \to [-1, 1]$, where $\Lambda$ is a set of abstract parameters.
  • Suppose $(x_1, y_1), (x_2, y_2), \ldots, (x_l, y_l)$, with $x_i \in \mathbb{R}^n$ and $y_i \in \{1, -1\}$, are taken from an unknown distribution P(x, y).
  • We want to find an $f^*$ which minimizes the expected risk functional (a standard form is given below), where $f_\lambda(x)$ and $\{f_\lambda(x) : \lambda \in \Lambda\}$ are called the hypothesis and the hypothesis space, respectively.

Empirical Risk Minimization
  • The problem is that the distribution function P(x,y) is unknown, so we cannot compute the expected risk. Instead, we compute the empirical risk (a standard form is given below).
  • The idea behind minimizing the empirical risk is that if $R_{emp}$ converges to the expected risk, then the minimum of $R_{emp}$ may converge to the minimum of the expected risk.
  • A typical uniform VC bound, which holds with probability $1 - \eta$, has the following form:
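A standard form of the empirical risk and of the uniform VC bound referenced above (with $h$ the VC dimension and $l$ the sample size; the symbol $\eta$ for the confidence level is an assumption here) is

$$R_{emp}(\lambda) = \frac{1}{l}\sum_{i=1}^{l}\frac{1}{2}\,\big|\,y_i - f_\lambda(x_i)\,\big|,
\qquad
R(\lambda) \le R_{emp}(\lambda) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}\, .$$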
Structural Risk Minimization
  • A small value of the empirical risk does not necessarily imply a small value of expected risk.
  • Structural Risk Minimization Principle (SRM) (Vapnik 1982, 1995): the VC dimension and the empirical risk should be minimized at the same time.
  • Need a nested structure of hypothesis spaces
    • $H_1 \subset H_2 \subset H_3 \subset \cdots \subset H_n \subset \cdots$
    • with the property that $h(n) \le h(n+1)$, where $h(n)$ is the VC dimension of $H_n$.
  • Need to solve the following problem:
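A sketch of the standard SRM statement, assuming the usual notation: choose the element of the structure, and the hypothesis within it, that minimize the bound above, i.e.

$$\min_{n,\;\lambda}\;\left[\, R_{emp}(\lambda) + \sqrt{\frac{h(n)\left(\ln\frac{2l}{h(n)} + 1\right) - \ln\frac{\eta}{4}}{l}}\,\right],$$

with $\lambda$ ranging over the parameters of $H_n$.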
Linear SVM and Linear Separable Case
  • Assume that we are given a set S of points $x_i \in \mathbb{R}^n$, where each $x_i$ belongs to either of two classes defined by $y_i \in \{1, -1\}$. The objective is to find a hyperplane that divides S, leaving all the points of the same class on the same side, while maximizing the minimum distance between either of the two classes and the hyperplane [Vapnik 1995].
  • Definition 1. The set S is linearly separable if there exist $w \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that the conditions given below hold.
  • In order to make each decision surface correspond to one unique pair (w, b), the following constraint is imposed (also given below).
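Assuming the standard setup, the separability conditions and the normalization (canonical-form) constraint referenced above read

$$w \cdot x_i + b \ge +1 \;\;\text{if } y_i = +1,
\qquad
w \cdot x_i + b \le -1 \;\;\text{if } y_i = -1,$$

i.e. $y_i\,(w \cdot x_i + b) \ge 1$ for all $i$, together with

$$\min_{i}\; \big|\, w \cdot x_i + b \,\big| = 1 .$$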
Relationship between VC dimension and the canonical hyperplane.
  • Suppose all the points $x_1, x_2, \ldots, x_l$ lie in the n-dimensional unit sphere. The set (a standard definition is given below) has a VC dimension h that satisfies the following bound:

$$h \le \min\{A^2, L\} + 1$$

  • Maximizing the margin minimizes the function complexity.
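A standard way to state the set in question, assuming Vapnik's canonical-hyperplane formulation, is

$$\Big\{\, x \mapsto \mathrm{sign}(w \cdot x + b) \;:\; \|w\| \le A \,\Big\},$$

i.e. the decision functions induced by canonical hyperplanes whose weight vectors are bounded in norm by A.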
Continued
  • The distance from a point x to the hyperplane associated with the pair (w, b) is given below.
  • The distance between the canonical hyperplane and the closest point is also given below.
  • The goal of the SVM is to find, among all the hyperplanes that correctly classify the data, the one with minimum norm, i.e. minimum $\|w\|^2$. Minimizing $\|w\|^2$ is equivalent to finding the separating hyperplane for which the distance between the two classes is maximized. This distance is called the margin.
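In the standard derivation, the two distances referenced above are

$$d(x; w, b) = \frac{|w \cdot x + b|}{\|w\|},
\qquad
\min_i\, d(x_i; w, b) = \frac{1}{\|w\|},$$

so the margin between the two classes is $2/\|w\|$, and the resulting primal optimization problem is

$$\min_{w,\,b}\;\; \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1, \;\; i = 1, \dots, l .$$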

Computing Saddle Points
  • The Lagrangian is
  • Optimality conditions
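Writing $\lambda_i \ge 0$ for the Lagrange multipliers (the multiplier symbol is an assumption here), the standard Lagrangian of the primal problem and its first-order optimality conditions are

$$L(w, b, \lambda) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{l} \lambda_i\,\big[\, y_i\,(w \cdot x_i + b) - 1 \,\big],$$

$$\frac{\partial L}{\partial w} = 0 \;\Rightarrow\; w = \sum_{i=1}^{l} \lambda_i\, y_i\, x_i,
\qquad
\frac{\partial L}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{l} \lambda_i\, y_i = 0 .$$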
Optimal point
  • Support vector: a training vector for which
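Under the standard formulation, the defining condition is a strictly positive multiplier, $\lambda_i^* > 0$; by complementary slackness this implies that the margin constraint is active at the optimal point,

$$y_i\,(w^* \cdot x_i + b^*) = 1 .$$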
The Idea of SVM

(Figure: mapping from the input space to the feature space)

Non-Linear Case
  • If the data are not linearly separable, we map the input vector x into a higher-dimensional feature space.
  • If we map the input space to the feature space, then we will obtain a hyperplane that separates the data into two groups in the feature space.
  • Kernel function
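In the usual notation, with $\Phi$ the feature map, the kernel function referenced above is

$$K(x, z) = \Phi(x) \cdot \Phi(z),$$

so inner products in the feature space can be evaluated directly in the input space.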
Dual problem in nonlinear case
  • Replace the dot product of the inputs with the kernel function in the linearly non-separable case.
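With this substitution, and writing $\lambda_i$ for the multipliers (standard notation, assumed here), the dual problem takes the usual form

$$\max_{\lambda}\;\; \sum_{i=1}^{l} \lambda_i - \frac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l} \lambda_i \lambda_j\, y_i y_j\, K(x_i, x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{l} \lambda_i y_i = 0,$$

with $\lambda_i \ge 0$ (or $0 \le \lambda_i \le C$ if a soft margin with penalty C is used) and decision function $f(x) = \mathrm{sign}\big(\sum_i \lambda_i y_i K(x_i, x) + b\big)$.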
Support Vector Regression
  • The ε-insensitive support vector regression: find a function f(x) that has at most ε deviation from the actually obtained targets $y_i$ for all the training data and, at the same time, is as flat as possible. If f is linear, $f(x) = w \cdot x + b$, flatness means a small $\|w\|$.
  • Primal Regression Problem
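A standard statement of the primal regression problem, for a linear $f(x) = w \cdot x + b$, is

$$\min_{w,\,b}\;\; \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad
\begin{cases}
\; y_i - w \cdot x_i - b \le \varepsilon \\
\; w \cdot x_i + b - y_i \le \varepsilon
\end{cases}
\qquad i = 1, \dots, l .$$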
Soft Margin Formulation
  • Soft Margin Formulation (a standard statement is given below)
  • C determines the trade-off between the flatness of f(x) and the amount up to which deviations larger than ε are tolerated.
  • The ε-insensitive loss function $|\xi|_\varepsilon$ (Vapnik 1995) is defined as
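In the standard soft-margin form, writing $\xi_i, \xi_i^*$ for the slack variables (notation assumed here),

$$\min_{w,\,b,\,\xi,\,\xi^*}\;\; \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{l}\big(\xi_i + \xi_i^*\big)
\quad \text{s.t.} \quad
\begin{cases}
\; y_i - w \cdot x_i - b \le \varepsilon + \xi_i \\
\; w \cdot x_i + b - y_i \le \varepsilon + \xi_i^* \\
\; \xi_i,\; \xi_i^* \ge 0,
\end{cases}$$

and the ε-insensitive loss is

$$|\xi|_\varepsilon = \begin{cases} 0 & \text{if } |\xi| \le \varepsilon \\ |\xi| - \varepsilon & \text{otherwise.} \end{cases}$$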
Saddle Point Optimality Conditions
  • The Lagrangian function will help us to formulate the dual problem.
  • Optimality Conditions
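Writing $\lambda_i, \lambda_i^*$ for the multipliers of the two ε-tube constraints and $\eta_i, \eta_i^*$ for those of the non-negativity constraints (notation assumed here), the standard Lagrangian is

$$L = \frac{1}{2}\|w\|^2 + C\sum_i(\xi_i + \xi_i^*)
- \sum_i \lambda_i\,(\varepsilon + \xi_i - y_i + w \cdot x_i + b)
- \sum_i \lambda_i^*\,(\varepsilon + \xi_i^* + y_i - w \cdot x_i - b)
- \sum_i (\eta_i \xi_i + \eta_i^* \xi_i^*),$$

and the optimality conditions read

$$\partial_b L = \sum_i (\lambda_i^* - \lambda_i) = 0,
\quad
\partial_w L = w - \sum_i (\lambda_i - \lambda_i^*)\,x_i = 0,
\quad
\partial_{\xi_i} L = C - \lambda_i - \eta_i = 0,
\quad
\partial_{\xi_i^*} L = C - \lambda_i^* - \eta_i^* = 0 .$$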
Dual Problem for Regression
  • Dual Problem
  • Solving
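Eliminating $w, b, \xi, \xi^*$ gives the standard dual problem

$$\max_{\lambda,\,\lambda^*}\;\;
-\frac{1}{2}\sum_{i,j}(\lambda_i - \lambda_i^*)(\lambda_j - \lambda_j^*)\,(x_i \cdot x_j)
- \varepsilon\sum_i(\lambda_i + \lambda_i^*)
+ \sum_i y_i(\lambda_i - \lambda_i^*)$$

$$\text{s.t.} \quad \sum_i(\lambda_i - \lambda_i^*) = 0,
\qquad \lambda_i,\, \lambda_i^* \in [0, C],$$

and solving for w yields the expansion

$$w = \sum_i(\lambda_i - \lambda_i^*)\,x_i,
\qquad
f(x) = \sum_i(\lambda_i - \lambda_i^*)\,(x_i \cdot x) + b .$$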
KKT Optimality Conditions and b*
  • KKT Optimality Conditions
  • Only samples $(x_i, y_i)$ with corresponding $\lambda_i = C$ lie outside the ε-insensitive tube around f. If $\lambda_i$ is nonzero, then $\lambda_i^*$ is zero, and vice versa. Finally, if $\lambda_i \in (0, C)$, then the corresponding slack variable $\xi_i$ is zero.
  • b can be computed as follows
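In the standard formulation, the KKT complementarity conditions are

$$\lambda_i\,(\varepsilon + \xi_i - y_i + w \cdot x_i + b) = 0,
\qquad
\lambda_i^*\,(\varepsilon + \xi_i^* + y_i - w \cdot x_i - b) = 0,$$

$$(C - \lambda_i)\,\xi_i = 0, \qquad (C - \lambda_i^*)\,\xi_i^* = 0,$$

and $b^*$ follows from any sample whose multiplier lies strictly inside $(0, C)$:

$$b^* = y_i - w \cdot x_i - \varepsilon \;\;\text{for } \lambda_i \in (0, C),
\qquad
b^* = y_i - w \cdot x_i + \varepsilon \;\;\text{for } \lambda_i^* \in (0, C) .$$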
QP SV Regression Problem in Feature Space
  • Mapping into the feature space, we obtain the following quadratic SV regression problem:
  • At the optimal solution, we obtain
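With the feature map $\Phi$ and kernel $K(x_i, x_j) = \Phi(x_i) \cdot \Phi(x_j)$, the dual above keeps the same form with $x_i \cdot x_j$ replaced by $K(x_i, x_j)$, and at the optimal solution the regression function is

$$f(x) = \sum_{i=1}^{l} (\lambda_i - \lambda_i^*)\, K(x_i, x) + b .$$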
Kernel Functions in SVMs
  • An inner product in feature space has an equivalent kernel in input space
  • Any symmetric positive semi-definite function (Smola 1998) that satisfies Mercer's conditions can be used as a kernel function in the SVM context. Mercer's conditions can be written as
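In the usual statement, Mercer's conditions require, for every function g with $\int g(x)^2\,dx < \infty$,

$$\iint K(x, z)\, g(x)\, g(z)\; dx\, dz \;\ge\; 0,$$

with K symmetric, $K(x, z) = K(z, x)$.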
Some kernel functions
  • Polynomial type:
  • Gaussian Radial Basis Function (GRBF):
  • Exponential Radial Basis Function:
  • Multi-Layer Perceptron:
  • Fourier Series:
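The specific parameterizations on the slides are not reproduced here; commonly used forms of these kernels, in the order listed above, are

$$K(x, z) = (x \cdot z + 1)^d,
\qquad
K(x, z) = \exp\!\left(-\frac{\|x - z\|^2}{2\sigma^2}\right),
\qquad
K(x, z) = \exp\!\left(-\frac{\|x - z\|}{2\sigma^2}\right),$$

$$K(x, z) = \tanh\big(\rho\,(x \cdot z) + \theta\big) \;\;\text{(valid only for suitable $\rho$, $\theta$)},
\qquad
K(x, z) = \frac{\sin\big((N + \tfrac{1}{2})(x - z)\big)}{\sin\big(\tfrac{1}{2}(x - z)\big)} \;\;\text{(scalar inputs)} .$$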
Open Problem
  • We have more than one kernel to map the input space into the feature space.
  • Question: which kernel functions provide good generalization for a particular problem?
  • Some validation techniques, such as bootstrapping and cross-validation, can be used to determine a good kernel (a sketch follows this list).
  • Even when we decide on a kernel function, we have to choose the parameters of the kernel (e.g., the RBF kernel has a parameter σ, and one has to decide its value before the experiment).
  • No theory yet for the selection of optimal kernels (Smola 1998, Amari 1999)
  • For a more extensive literature and software in SVMs check the web page http://svm.first.gmd.de/
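As an illustration of the cross-validation idea mentioned above (not part of the original tutorial), a minimal sketch using the scikit-learn SVC interface might look as follows; the synthetic data set, parameter grid, and candidate kernels are arbitrary placeholders.

    # Hypothetical sketch: selecting a kernel and its parameters by 5-fold cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Placeholder data standing in for a real training set.
    X, y = make_classification(n_samples=200, n_features=5, random_state=0)

    # Candidate kernels and kernel/regularization parameters (illustrative values only).
    param_grid = {
        "kernel": ["poly", "rbf", "sigmoid"],
        "C": [0.1, 1.0, 10.0],
        "gamma": [0.01, 0.1, 1.0],
    }

    # Grid search scores each (kernel, C, gamma) combination by cross-validated accuracy.
    search = GridSearchCV(SVC(), param_grid, cv=5)
    search.fit(X, y)
    print(search.best_params_, search.best_score_)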