
### Tutorial: Interior Point Optimization Methods in Support Vector Machines Training

Part 1: Fundamentals of SVMs

Theodore Trafalis

email: [email protected]

ANNIE'99, St. Louis, Missouri, U.S.A., Nov. 7, 1999

Outline

- Statistical Learning Theory
- Empirical Risk Minimization
- Structural Risk Minimization
- Linear SVM and the Linearly Separable Case
- Primal Optimization Problem
- Dual Optimization Problem
- Non-Linear Case
- Support Vector Regression
- Dual Problem for Regression
- Kernel Functions in SVMs
- Open Problem

Statistical Learning Theory (Vapnik 1995, 1998)

Empirical Risk Minimization

- Given a set of decision functions $\{f(\mathbf{x}, \alpha) : \alpha \in \Lambda\}$, $f : \mathbb{R}^n \to \{-1, 1\}$, where $\Lambda$ is a set of abstract parameters.
- Suppose $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_l, y_l)$ with $\mathbf{x}_i \in \mathbb{R}^n$, $y_i \in \{-1, 1\}$ are drawn from an unknown distribution $P(\mathbf{x}, y)$.
- We want to find the $f^*$ that minimizes the expected risk functional
$$R(\alpha) = \int \tfrac{1}{2}\,\lvert f(\mathbf{x}, \alpha) - y \rvert \, dP(\mathbf{x}, y),$$
where $f(\mathbf{x}, \alpha)$ and $\{f(\mathbf{x}, \alpha) : \alpha \in \Lambda\}$ are called the hypothesis and the hypothesis space, respectively.

Empirical Risk Minimization

- The problem is that the distribution function $P(\mathbf{x}, y)$ is unknown, so we cannot compute the expected risk. Instead we compute the empirical risk
$$R_{emp}(\alpha) = \frac{1}{l} \sum_{i=1}^{l} \tfrac{1}{2}\,\lvert f(\mathbf{x}_i, \alpha) - y_i \rvert.$$
- The idea behind minimizing the empirical risk is that if $R_{emp}$ converges to the expected risk, then the minimum of $R_{emp}$ may converge to the minimum of the expected risk.
- A typical uniform VC bound, which holds with probability $1 - \eta$, has the following form:
$$R(\alpha) \le R_{emp}(\alpha) + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}},$$
where $h$ is the VC dimension of the hypothesis space.
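
As a concrete illustration, the empirical risk above can be computed directly from a finite sample; a minimal Python sketch (the data and decision rule are hypothetical, not from the tutorial):

```python
def empirical_risk(f, sample):
    """R_emp = (1/l) * sum_i (1/2)|f(x_i) - y_i|, for labels y_i in {-1, +1}."""
    return sum(abs(f(x) - y) / 2 for x, y in sample) / len(sample)

# Hypothetical 1-D sample and the decision function f(x) = sign(x):
sample = [(-2.0, -1), (-0.5, -1), (0.3, 1), (1.7, 1), (-0.1, 1)]
f = lambda x: 1 if x >= 0 else -1

print(empirical_risk(f, sample))  # one of five points is misclassified -> 0.2
```

Each correctly classified point contributes 0 and each misclassified point contributes 1, so the empirical risk is just the misclassification rate on the sample.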

Structural Risk Minimization

- A small value of the empirical risk does not necessarily imply a small value of the expected risk.
- Structural Risk Minimization (SRM) principle (Vapnik 1982, 1995): the VC dimension and the empirical risk should be minimized at the same time.
- We need a nested structure of hypothesis spaces
$$H_1 \subset H_2 \subset H_3 \subset \cdots \subset H_n \subset \cdots$$
with the property that $h(n) \le h(n+1)$, where $h(n)$ is the VC dimension of $H_n$.
- We then need to solve the problem of minimizing the VC bound over this structure,
$$\min_{n}\; \left[ R_{emp} + \sqrt{\frac{h(n)\left(\ln\frac{2l}{h(n)} + 1\right) - \ln\frac{\eta}{4}}{l}} \right].$$

Linear SVM and the Linearly Separable Case

- Assume we are given a set $S$ of points $\mathbf{x}_i \in \mathbb{R}^n$, where each $\mathbf{x}_i$ belongs to one of two classes, labeled $y_i \in \{-1, 1\}$. The objective is to find a hyperplane that divides $S$, leaving all points of the same class on the same side, while maximizing the minimum distance between either of the two classes and the hyperplane [Vapnik 1995].
- Definition 1. The set $S$ is linearly separable if there exist $\mathbf{w} \in \mathbb{R}^n$ and $b \in \mathbb{R}$ such that
$$y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1, \quad i = 1, \ldots, l.$$
- In order to make each decision surface correspond to one unique pair $(\mathbf{w}, b)$, the canonical constraint
$$\min_{i} \lvert \mathbf{w} \cdot \mathbf{x}_i + b \rvert = 1$$
is imposed.

Relationship between the VC dimension and the canonical hyperplane

- Suppose all points $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_l$ lie in the unit $n$-dimensional sphere. The set of canonical hyperplane decision functions
$$\{f(\mathbf{x}) = \operatorname{sign}(\mathbf{w} \cdot \mathbf{x} + b) : \lVert \mathbf{w} \rVert \le A\}$$
has a VC dimension $h$ that satisfies the bound
$$h \le \min\{A^2, n\} + 1.$$
- Maximizing the margin (i.e., bounding $\lVert \mathbf{w} \rVert$) therefore minimizes the complexity of the function class.

continued

- The distance from a point $\mathbf{x}$ to the hyperplane associated with the pair $(\mathbf{w}, b)$ is
$$d(\mathbf{x}; \mathbf{w}, b) = \frac{\lvert \mathbf{w} \cdot \mathbf{x} + b \rvert}{\lVert \mathbf{w} \rVert}.$$
- For a canonical hyperplane, the distance to the closest point is $1 / \lVert \mathbf{w} \rVert$.
- The goal of the SVM is to find, among all the hyperplanes that correctly classify the data, the one with minimum norm $\lVert \mathbf{w} \rVert^2$. Minimizing $\lVert \mathbf{w} \rVert^2$ is equivalent to finding the separating hyperplane for which the distance between the two classes is maximized. This distance, $2 / \lVert \mathbf{w} \rVert$, is called the margin.
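
The distance and margin formulas can be checked numerically; a small sketch with made-up numbers ($\mathbf{w}$, $b$, and the point are illustrative, not from the tutorial):

```python
import math

def distance_to_hyperplane(x, w, b):
    """d(x; w, b) = |w·x + b| / ||w||."""
    dot = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(dot) / math.sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -1.0   # ||w|| = 5
x_closest = [0.0, 0.5]    # w·x + b = 1, so this point sits on the canonical margin

d = distance_to_hyperplane(x_closest, w, b)          # 1/||w|| = 0.2
margin = 2.0 / math.sqrt(sum(wi * wi for wi in w))   # 2/||w|| = 0.4
print(d, margin)
```

Note that the margin depends only on $\lVert \mathbf{w} \rVert$, which is why minimizing the norm maximizes the margin.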

Primal Optimization Problem

- Primal problem:
$$\min_{\mathbf{w}, b}\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{s.t.}\quad y_i(\mathbf{w} \cdot \mathbf{x}_i + b) \ge 1,\; i = 1, \ldots, l.$$

Computing Saddle Points

- The Lagrangian is
$$L(\mathbf{w}, b, \boldsymbol{\lambda}) = \frac{1}{2}\lVert \mathbf{w} \rVert^2 - \sum_{i=1}^{l} \lambda_i \left[ y_i(\mathbf{w} \cdot \mathbf{x}_i + b) - 1 \right], \quad \lambda_i \ge 0.$$
- Optimality conditions ($\partial L / \partial \mathbf{w} = 0$, $\partial L / \partial b = 0$):
$$\mathbf{w} = \sum_{i=1}^{l} \lambda_i y_i \mathbf{x}_i, \qquad \sum_{i=1}^{l} \lambda_i y_i = 0.$$

Optimal point

- Substituting the optimality conditions into the Lagrangian gives the dual problem
$$\max_{\boldsymbol{\lambda}}\; \sum_{i=1}^{l} \lambda_i - \frac{1}{2} \sum_{i,j=1}^{l} \lambda_i \lambda_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j \quad \text{s.t.}\quad \sum_{i=1}^{l} \lambda_i y_i = 0,\; \lambda_i \ge 0.$$
- Support vector: a training vector $\mathbf{x}_i$ for which $y_i(\mathbf{w}^* \cdot \mathbf{x}_i + b^*) = 1$, equivalently one with $\lambda_i^* > 0$.
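
Numerically, the saddle-point conditions reduce training to a quadratic program in the multipliers: maximize $\sum_i \lambda_i - \frac{1}{2}\sum_{i,j}\lambda_i\lambda_j y_i y_j\,\mathbf{x}_i\cdot\mathbf{x}_j$ subject to $\lambda_i \ge 0$ and $\sum_i \lambda_i y_i = 0$. A minimal sketch using SciPy's general-purpose SLSQP solver on hypothetical toy data (a dedicated QP or interior point solver would be used in practice):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical linearly separable 2-D data (not from the tutorial).
X = np.array([[2.0, 2.0], [3.0, 3.0], [0.0, 0.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
Q = (y[:, None] * y[None, :]) * (X @ X.T)   # Q_ij = y_i y_j x_i·x_j

def neg_dual(lam):
    """Negated dual objective (SLSQP minimizes)."""
    return 0.5 * lam @ Q @ lam - lam.sum()

res = minimize(neg_dual, np.zeros(len(y)), method="SLSQP",
               bounds=[(0.0, None)] * len(y),
               constraints=[{"type": "eq", "fun": lambda lam: lam @ y}])

lam = res.x
w = (lam * y) @ X                      # w* = sum_i lam_i y_i x_i
sv = lam > 1e-4                        # support vectors have lam_i > 0
b = float(np.mean(y[sv] - X[sv] @ w))  # from y_i (w·x_i + b) = 1 on the SVs
print(w, b, np.flatnonzero(sv))
```

On this toy set the support vectors are $(2,2)$ and $(1,0)$, and the maximizer gives $\mathbf{w}^* \approx (0.4, 0.8)$, $b^* \approx -1.4$; the other two points carry zero multipliers, as the text above describes.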

Non-Linear Case

- If the data are not linearly separable, we map the input vector $\mathbf{x}$ into a higher-dimensional feature space via $\Phi : \mathbb{R}^n \to F$.
- A hyperplane that separates the data into two groups is then sought in the feature space.
- Kernel function: $K(\mathbf{x}, \mathbf{z}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$, the feature-space inner product computed directly in input space.

Dual problem in nonlinear case

- In the linearly non-separable case, replace the dot product of the inputs with the kernel function:
$$\max_{\boldsymbol{\lambda}}\; \sum_{i=1}^{l} \lambda_i - \frac{1}{2} \sum_{i,j=1}^{l} \lambda_i \lambda_j y_i y_j\, K(\mathbf{x}_i, \mathbf{x}_j) \quad \text{s.t.}\quad \sum_{i=1}^{l} \lambda_i y_i = 0,\; \lambda_i \ge 0.$$
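
This substitution works because the kernel computes a feature-space inner product without ever forming $\Phi$ explicitly. A quick check for the homogeneous degree-2 polynomial kernel $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x}\cdot\mathbf{z})^2$ on $\mathbb{R}^2$, whose feature map is $\Phi(\mathbf{x}) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ (the numeric vectors are arbitrary examples):

```python
import numpy as np

def phi(x):
    """Explicit feature map of the degree-2 homogeneous polynomial kernel on R^2."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def K(x, z):
    return float(x @ z) ** 2   # the same quantity, computed in input space

x, z = np.array([1.0, 2.0]), np.array([3.0, -1.0])
print(K(x, z), phi(x) @ phi(z))   # both equal 1.0 for these vectors
```

The kernel evaluation costs one dot product in $\mathbb{R}^2$, while the explicit map works in $\mathbb{R}^3$; for higher degrees and dimensions this gap is what makes the kernel trick practical.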

Support Vector Regression

- The $\varepsilon$-insensitive support vector regression: find a function $f(\mathbf{x})$ that has at most $\varepsilon$ deviation from the actually obtained targets $y_i$ for all training data and, at the same time, is as flat as possible.
- For a linear function $f(\mathbf{x}) = \mathbf{w} \cdot \mathbf{x} + b$, flatness means a small $\lVert \mathbf{w} \rVert$, which leads to the primal regression problem
$$\min_{\mathbf{w}, b}\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 \quad \text{s.t.}\quad \lvert y_i - \mathbf{w} \cdot \mathbf{x}_i - b \rvert \le \varepsilon,\; i = 1, \ldots, l.$$

Soft Margin Formulation

- Soft-margin formulation:
$$\min_{\mathbf{w}, b, \boldsymbol{\xi}, \boldsymbol{\xi}^*}\; \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C \sum_{i=1}^{l} (\xi_i + \xi_i^*) \quad \text{s.t.}\quad
\begin{cases} y_i - \mathbf{w} \cdot \mathbf{x}_i - b \le \varepsilon + \xi_i, \\ \mathbf{w} \cdot \mathbf{x}_i + b - y_i \le \varepsilon + \xi_i^*, \\ \xi_i, \xi_i^* \ge 0. \end{cases}$$
- $C$ determines the trade-off between the flatness of $f(\mathbf{x})$ and the amount up to which deviations larger than $\varepsilon$ are tolerated.
- The $\varepsilon$-insensitive loss function $\lvert \cdot \rvert_\varepsilon$ (Vapnik 1995) is defined as
$$\lvert \xi \rvert_\varepsilon = \begin{cases} 0 & \text{if } \lvert \xi \rvert \le \varepsilon, \\ \lvert \xi \rvert - \varepsilon & \text{otherwise.} \end{cases}$$
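
The $\varepsilon$-insensitive loss is equivalently $\max(0, \lvert \xi \rvert - \varepsilon)$; a one-line sketch (the example values are arbitrary):

```python
def eps_insensitive_loss(residual, eps):
    """|r|_eps = max(0, |r| - eps): zero inside the eps-tube, linear outside it."""
    return max(0.0, abs(residual) - eps)

print(eps_insensitive_loss(0.25, 0.5))   # inside the tube  -> 0.0
print(eps_insensitive_loss(-1.5, 0.5))   # outside the tube -> 1.0
```

Deviations smaller than $\varepsilon$ cost nothing, which is what lets many training points end up with zero multipliers (non-support vectors) in the dual.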

Saddle Point Optimality Conditions

- The Lagrangian function helps us formulate the dual problem:
$$L = \frac{1}{2}\lVert \mathbf{w} \rVert^2 + C\sum_{i=1}^{l}(\xi_i + \xi_i^*) - \sum_{i=1}^{l} \lambda_i(\varepsilon + \xi_i - y_i + \mathbf{w}\cdot\mathbf{x}_i + b) - \sum_{i=1}^{l} \lambda_i^*(\varepsilon + \xi_i^* + y_i - \mathbf{w}\cdot\mathbf{x}_i - b) - \sum_{i=1}^{l}(\eta_i \xi_i + \eta_i^* \xi_i^*).$$
- Optimality conditions:
$$\mathbf{w} = \sum_{i=1}^{l} (\lambda_i - \lambda_i^*)\,\mathbf{x}_i, \qquad \sum_{i=1}^{l} (\lambda_i - \lambda_i^*) = 0, \qquad \eta_i = C - \lambda_i, \qquad \eta_i^* = C - \lambda_i^*.$$

Dual Problem for Regression

- Dual problem:
$$\max_{\boldsymbol{\lambda}, \boldsymbol{\lambda}^*}\; -\frac{1}{2}\sum_{i,j=1}^{l}(\lambda_i - \lambda_i^*)(\lambda_j - \lambda_j^*)\,\mathbf{x}_i \cdot \mathbf{x}_j - \varepsilon \sum_{i=1}^{l} (\lambda_i + \lambda_i^*) + \sum_{i=1}^{l} y_i (\lambda_i - \lambda_i^*)$$
$$\text{s.t.}\quad \sum_{i=1}^{l} (\lambda_i - \lambda_i^*) = 0, \qquad \lambda_i, \lambda_i^* \in [0, C].$$
- Solving for $\mathbf{w}$ gives $\mathbf{w}^* = \sum_{i=1}^{l} (\lambda_i - \lambda_i^*)\,\mathbf{x}_i$, so that $f(\mathbf{x}) = \sum_{i=1}^{l} (\lambda_i - \lambda_i^*)\,\mathbf{x}_i \cdot \mathbf{x} + b$.

KKT Optimality Conditions and b*

- KKT optimality conditions:
$$\lambda_i(\varepsilon + \xi_i - y_i + \mathbf{w}\cdot\mathbf{x}_i + b) = 0, \qquad \lambda_i^*(\varepsilon + \xi_i^* + y_i - \mathbf{w}\cdot\mathbf{x}_i - b) = 0,$$
$$(C - \lambda_i)\,\xi_i = 0, \qquad (C - \lambda_i^*)\,\xi_i^* = 0.$$
- Only samples $(\mathbf{x}_i, y_i)$ with corresponding $\lambda_i = C$ lie outside the $\varepsilon$-insensitive tube around $f$. If $\lambda_i$ is nonzero, then $\lambda_i^*$ is zero, and vice versa. Finally, if $\lambda_i \in (0, C)$, then the corresponding $\xi_i$ is zero.
- $b$ can therefore be computed from any sample with a multiplier strictly between the bounds:
$$b^* = y_i - \mathbf{w}^* \cdot \mathbf{x}_i - \varepsilon \;\text{ for } \lambda_i \in (0, C), \qquad b^* = y_i - \mathbf{w}^* \cdot \mathbf{x}_i + \varepsilon \;\text{ for } \lambda_i^* \in (0, C).$$

QP SV Regression Problem in Feature Space

- Mapping into the feature space, we obtain the following quadratic SV regression problem:
$$\max_{\boldsymbol{\lambda}, \boldsymbol{\lambda}^*}\; -\frac{1}{2}\sum_{i,j=1}^{l}(\lambda_i - \lambda_i^*)(\lambda_j - \lambda_j^*)\,K(\mathbf{x}_i, \mathbf{x}_j) - \varepsilon \sum_{i=1}^{l}(\lambda_i + \lambda_i^*) + \sum_{i=1}^{l} y_i(\lambda_i - \lambda_i^*)$$
$$\text{s.t.}\quad \sum_{i=1}^{l}(\lambda_i - \lambda_i^*) = 0, \qquad \lambda_i, \lambda_i^* \in [0, C].$$
- At the optimal solution, we obtain
$$f(\mathbf{x}) = \sum_{i=1}^{l} (\lambda_i - \lambda_i^*)\,K(\mathbf{x}_i, \mathbf{x}) + b^*.$$
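
The regression QP can likewise be handed to a general-purpose solver; a sketch with a linear kernel on hypothetical 1-D data (SciPy's SLSQP stands in for the interior point methods of Part 2, and $b$ is set here by centering the residuals in the tube rather than via the KKT formula):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical 1-D training data lying exactly on y = 2x (not from the tutorial).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 2.0, 4.0, 6.0])
l, eps, C = len(x), 0.1, 10.0
K = np.outer(x, x)                      # linear kernel: K_ij = x_i * x_j

def neg_dual(u):
    """Negated SVR dual objective over u = (lam, lam_star)."""
    lam, lam_s = u[:l], u[l:]
    beta = lam - lam_s
    return 0.5 * beta @ K @ beta + eps * (lam + lam_s).sum() - y @ beta

res = minimize(neg_dual, np.zeros(2 * l), method="SLSQP",
               bounds=[(0.0, C)] * (2 * l),
               constraints=[{"type": "eq",
                             "fun": lambda u: (u[:l] - u[l:]).sum()}],
               options={"maxiter": 500})

beta = res.x[:l] - res.x[l:]            # beta_i = lam_i - lam_i*
w = beta @ x                            # w* = sum_i beta_i x_i (linear kernel)
r = y - w * x
b = (r.max() + r.min()) / 2.0           # center the residuals inside the eps-tube
print(w, b, np.abs(w * x + b - y).max())
```

For this data the flattest function inside the $\varepsilon$-tube has slope $w^* = 29/15 \approx 1.93$, slightly less than 2, because the tube absorbs deviations up to $\varepsilon$; all training residuals end up no larger than $\varepsilon$.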

Kernel Functions in SVMs

- An inner product in feature space has an equivalent kernel in input space: $K(\mathbf{x}, \mathbf{z}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{z})$.
- Any symmetric positive semi-definite function (Smola 1998) that satisfies Mercer's conditions can be used as a kernel function in the SVM context. Mercer's conditions can be written as
$$\iint K(\mathbf{x}, \mathbf{z})\, g(\mathbf{x})\, g(\mathbf{z})\, d\mathbf{x}\, d\mathbf{z} \ge 0 \quad \text{for all } g \in L_2.$$

Some kernel functions

- Polynomial type: $K(\mathbf{x}, \mathbf{z}) = (\mathbf{x} \cdot \mathbf{z} + 1)^d$
- Gaussian Radial Basis Function (GRBF): $K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\dfrac{\lVert \mathbf{x} - \mathbf{z} \rVert^2}{2\sigma^2}\right)$
- Exponential Radial Basis Function: $K(\mathbf{x}, \mathbf{z}) = \exp\!\left(-\dfrac{\lVert \mathbf{x} - \mathbf{z} \rVert}{2\sigma^2}\right)$
- Multi-Layer Perceptron: $K(\mathbf{x}, \mathbf{z}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{z} + \theta)$
- Fourier Series: $K(x, z) = \dfrac{\sin\!\left(\left(N + \tfrac{1}{2}\right)(x - z)\right)}{\sin\!\left(\tfrac{1}{2}(x - z)\right)}$
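
These kernels are straightforward to implement; a NumPy sketch that also checks the Mercer requirement that the Gram matrix be symmetric positive semi-definite (the parameter values and sample points are arbitrary):

```python
import numpy as np

def poly_kernel(x, z, d=2):
    return (x @ z + 1.0) ** d                                  # polynomial of degree d

def gaussian_rbf(x, z, sigma=1.0):
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))  # GRBF

def exp_rbf(x, z, sigma=1.0):
    return np.exp(-np.linalg.norm(x - z) / (2.0 * sigma ** 2)) # exponential RBF

X = np.array([[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.5, 0.5]])
G = np.array([[gaussian_rbf(a, c) for c in X] for a in X])     # Gram matrix

assert np.allclose(G, G.T)                       # symmetric
assert np.all(np.linalg.eigvalsh(G) > -1e-10)    # positive semi-definite
print(np.round(G, 3))
```

The check matters in practice: the MLP kernel $\tanh(\kappa\,\mathbf{x}\cdot\mathbf{z} + \theta)$, for instance, satisfies Mercer's conditions only for certain values of $\kappa$ and $\theta$.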

Open Problem

- We have more than one kernel available to map the input space into a feature space.
- Question: which kernel functions provide good generalization for a particular problem?
- Validation techniques, such as bootstrapping and cross-validation, can be used to determine a good kernel.
- Even after deciding on a kernel function, we have to choose its parameters (e.g., the RBF kernel has a parameter $\sigma$ whose value must be set before the experiment).
- There is no theory yet for the selection of optimal kernels (Smola 1998, Amari 1999).
- For a more extensive literature and software on SVMs, check the web page http://svm.first.gmd.de/
