
SVM – Support Vector Machines


Presentation Transcript


  1. SVM – Support Vector Machines Presented By: Bella Specktor

  2. Lecture Topics: • Motivation to SVM • SVM – Algorithm Description • SVM Applications

  3. Motivation – Learning Our task is to detect and exploit complex patterns in data. For this we should use learning algorithms. We would like an algorithm that is able to generalize, but not over-generalize. • Neural Networks can be used for this task.

  4. Linear Classification • Suppose we have linearly separable data and we want to classify it into 2 classes. We label the training data (x_1, y_1), …, (x_n, y_n) with y_i ∈ {−1, +1}. • Our purpose is to find w and b. • Linear separation of the input space is done by the function f(x) = sign(w·x + b). • Any of these hyperplanes would be fine for the separation. Which one to choose?

  5. Perceptron Algorithm • If y_i(w·x_i + b) ≤ 0 (sample i is misclassified), update w ← w + η y_i x_i and b ← b + η y_i; repeat until no training sample is misclassified. • What about the non-linear case?
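
In code, the update rule might look like the following minimal NumPy sketch (the helper name `perceptron` and the learning-rate parameter are illustrative, not from the slides):

```python
import numpy as np

def perceptron(X, y, epochs=100, lr=1.0):
    """Perceptron training sketch: X is (n, d), labels y are in {-1, +1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            # A sample is misclassified when it lies on the wrong side of w.x + b
            if yi * (np.dot(w, xi) + b) <= 0:
                w += lr * yi * xi      # rotate the hyperplane toward the sample
                b += lr * yi
                mistakes += 1
        if mistakes == 0:              # convergence is guaranteed only if the data is separable
            break
    return w, b
```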

  6. Neural Networks We can use advanced network architectures with multiple layers. • But… • Some of them have many local minima. • We need to find how many neurons are needed. • Sometimes we get many different solutions.

  7. SVM – History • Said to start in 1979 with Vladimir Vapnik's paper. • Major developments throughout the 1990s: introduced in its current form in 1992 by Boser, Guyon & Vapnik. • Centralized web site: www.kernel-machines.org • Has been applied to diverse problems very successfully in the last 10–15 years.

  8. The SVM Algorithm • The margin of a linear classifier is the width by which the boundary could be increased before hitting a data point. • SVM chooses the maximal margin, where the distance to the closest negative example equals the distance to the closest positive example.

  9. Why Maximum Margin? • Better empirical performance. • Even with a small error in the location of the boundary, we have the least chance of misclassification. • Avoids local minima.

  10. VC (Vapnik–Chervonenkis) Dimension and Structural Risk Minimization • A set of points P is said to be shattered by a function class F if for every subset S ⊆ P there exists f ∈ F that labels the points of S positive and the rest negative. • The VC dimension of a model class F is the cardinality of the largest point set that can be shattered by F. For example: lines in the plane can shatter 3 points but not 4, so their VC dimension is 3.

  11. • The bound on the test error of the classification model (Vapnik, 1995, "Structural Risk Minimization Principle"): with probability 1 − η, R(f) ≤ R_emp(f) + √( (h(ln(2n/h) + 1) − ln(η/4)) / n ), where h is the VC dimension and n the number of training samples. • Intuitively, functions with high VC dimension represent many dichotomies of a given data set.

  12. VC Dimension and Structural Risk Minimization • A function that minimizes the empirical risk and has low VC dimension will generalize well regardless of the dimensionality of the input space (structural risk minimization). • Vapnik has shown that maximizing the margin of separation between the classes is equivalent to minimizing the VC dimension.

  13. • Support vectors are the points closest to the separating hyperplane. These are critical points whose removal would change the solution found. • The optimal hyperplane is completely defined by the support vectors.

  14. • Let x_s be an example closest to the boundary. Fix the scale so that |w·x_s + b| = 1. • Thus, the support vectors lie on the hyperplanes w·x + b = +1 and w·x + b = −1. • Notice that y_i(w·x_i + b) ≥ 1 for every training sample i.


  16. The margin is 2/‖w‖. Thus, we get the widest margin by minimizing ‖w‖. But how to do it? • For this purpose we switch to the dual representation and use quadratic programming. • Convert the problem to: minimize ‖w‖²/2 subject to the constraints y_i(w·x_i + b) ≥ 1 for all i.

  17. • In a convex problem, the quadratic-term matrix R of the QP is positive semidefinite. In this case the QP has a global minimizer.

  18. • Our problem is: minimize ‖w‖²/2 subject to the constraints y_i(w·x_i + b) ≥ 1. • Introduce Lagrange multipliers α_i ≥ 0 associated with the constraints. The solution to the primal problem is equivalent to determining the saddle point of the function: L(w, b, α) = ‖w‖²/2 − Σ_i α_i [y_i(w·x_i + b) − 1].

  19. • Setting the derivatives of L with respect to w and b to zero gives w = Σ_i α_i y_i x_i and Σ_i α_i y_i = 0. Substituting back yields the dual W(α) = Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j (x_i·x_j), which can be optimized by quadratic programming. W is formulated in terms of α alone, and the solution determines w and b.
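
As an illustration of how this dual is handed to a QP package, here is a hard-margin sketch using the generic `cvxopt` solver. It assumes linearly separable data; the function name `hard_margin_svm` and the 1e-6 support-vector threshold are illustrative choices, not from the lecture:

```python
import numpy as np
from cvxopt import matrix, solvers

def hard_margin_svm(X, y):
    """Solve the hard-margin SVM dual: minimize (1/2) a'Pa - 1'a
    with P_ij = y_i y_j (x_i . x_j), subject to a_i >= 0 and sum_i a_i y_i = 0."""
    n = X.shape[0]
    K = X @ X.T                                    # Gram matrix of dot products
    P = matrix((np.outer(y, y) * K).astype(np.double))
    q = matrix(-np.ones(n))
    G = matrix(-np.eye(n))                         # encodes -a_i <= 0
    h = matrix(np.zeros(n))
    A = matrix(y.reshape(1, -1).astype(np.double)) # equality: sum_i a_i y_i = 0
    b = matrix(0.0)
    solvers.options['show_progress'] = False
    alpha = np.ravel(solvers.qp(P, q, G, h, A, b)['x'])
    w = (alpha * y) @ X                            # w = sum_i a_i y_i x_i
    sv = alpha > 1e-6                              # a_i > 0  <=>  support vector
    b0 = np.mean(y[sv] - X[sv] @ w)                # from y_i (w.x_i + b) = 1 on SVs
    return w, b0, alpha
```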

  20. • b can be determined from the optimal α and the Karush–Kuhn–Tucker condition α_i [y_i(w·x_i + b) − 1] = 0, which implies that for any i with α_i > 0: b = y_i − w·x_i. • For every sample i, one of the following must hold: α_i = 0, or y_i(w·x_i + b) = 1. The solution is therefore sparse: samples with α_i > 0 are the support vectors.

  21. Test phase: determine on which side of the decision boundary a given test pattern x lies and assign the corresponding label: y = sign(w·x + b) = sign(Σ_i α_i y_i (x_i·x) + b).
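
Continuing the sketch above, the test phase is a single sign computation (reusing the hypothetical `w` and `b0` returned by `hard_margin_svm`):

```python
import numpy as np

def predict(X_test, w, b0):
    # The side of the decision boundary determines the label (+1 or -1)
    return np.sign(X_test @ w + b0)
```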

  22. Soft Margin Classifier • In real-world problems it is unlikely that an exactly separating line divides the data; we might need a curved decision boundary. Exact linear separation may also be undesirable when the data contains noise, so we smooth the boundary. • We introduce slack variables ξ_i ≥ 0 and relax the constraints to y_i(w·x_i + b) ≥ 1 − ξ_i.

  23. • Σ_i ξ_i is an upper bound on the number of training errors. Thus, to control the error rate we also minimize C Σ_i ξ_i, where a larger C corresponds to assigning a higher penalty to errors. • The new QP: minimize ‖w‖²/2 + C Σ_i ξ_i subject to y_i(w·x_i + b) ≥ 1 − ξ_i and ξ_i ≥ 0. • In the dual, the only change is the box constraint 0 ≤ α_i ≤ C.
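
To see the effect of C in practice, a small example with scikit-learn's `SVC` (the synthetic blob data and the specific C values are illustrative):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.0, random_state=0)
# Larger C penalizes slack more heavily: fewer training errors, narrower margin.
# Smaller C tolerates some errors in exchange for a smoother boundary.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    print(C, clf.n_support_.sum())   # support-vector count tends to shrink as C grows
```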

  24. Non-Linear SVM • Limitations of the linear SVM: it doesn't work on non-linearly separable data, and it has a noise problem. But its advantage is that it deals with vectorial data. • We saw earlier that we could use Neural Networks, but they have many limitations. What should we do?

  25. • Let's look at the following example: one-dimensional samples that are not linearly separable. We would like to map the samples so that they become linearly separable. If we lift them to a two-dimensional space with a non-linear mapping φ, we get linearly separable data.
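
The slide's exact mapping is not preserved in the transcript; a minimal sketch of the classic lift φ(x) = (x, x²) on hypothetical 1-D data illustrates the idea:

```python
import numpy as np

x = np.array([-2.0, -1.0, 1.0, 2.0])   # hypothetical 1-D training points
y = np.array([+1, -1, -1, +1])         # outer points vs. inner points
# No single threshold on the line separates the classes, but after the lift
# phi(x) = (x, x^2) the horizontal line x2 = 2.5 separates them perfectly.
phi = np.column_stack([x, x ** 2])
print(phi)
```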

  26. • So a possible solution is: map the data into a richer feature space (usually a Hilbert space) including non-linear features, and then use a linear classifier there. • But… • There is a computational problem (the feature space can be very high-dimensional). • There is a generalization problem.

  27. Solution: Using Kernels • Remember that we used the dual representation, and hence the data appears only in the form of dot products. • A kernel is a function that returns the value of the dot product between the images of its two arguments: K(x, z) = φ(x)·φ(z). Thus, we can replace dot products with kernel evaluations.

  28. • Now, rather than taking inner products of the new, larger vectors, we compute the dot product of the data after the non-linear mapping directly. The kernel matrix: K_ij = K(x_i, x_j). • For a kernel K we only need K in the training algorithm, and we never need to know explicitly what φ is.
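
A minimal sketch of building the kernel matrix (the degree-2 polynomial kernel here is just one possible choice):

```python
import numpy as np

def kernel_matrix(X, kernel):
    """Gram matrix K_ij = kernel(x_i, x_j) -- all the training algorithm needs."""
    n = X.shape[0]
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(X[i], X[j])
    return K

# Degree-2 polynomial kernel: equals the dot product in the implicit
# quadratic feature space, without ever computing phi explicitly.
poly2 = lambda x, z: (np.dot(x, z) + 1.0) ** 2
K = kernel_matrix(np.random.randn(5, 3), poly2)
```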


  30. Mercer's Condition • Which functions can serve as kernels? Every (semi)positive definite symmetric function is a kernel, i.e. there exists a mapping φ such that it is possible to write K(x, z) = φ(x)·φ(z).
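
Mercer's condition can be checked empirically on any sample: the Gram matrix of a valid kernel is (semi)positive definite. A small NumPy sketch, assuming a Gaussian RBF kernel:

```python
import numpy as np

X = np.random.randn(30, 4)
# Gaussian RBF Gram matrix on random data
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
K = np.exp(-sq_dists / 2.0)
# Mercer in practice: all eigenvalues are non-negative (up to round-off)
print(np.linalg.eigvalsh(K).min() >= -1e-10)   # True
```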

  31. Different Kernel Functions • 1. Polynomial: K(x, z) = (x·z + 1)^p, where p is the degree of the polynomial. • 2. Gaussian radial basis function: K(x, z) = exp(−‖x − z‖²/2σ²). • 3. Two-layer sigmoidal NN: K(x, z) = tanh(κ(x·z) − δ).
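
All three are available as built-in options in scikit-learn's `SVC` (`poly`, `rbf`, `sigmoid`); a short comparison on concentric-circles data, where the data set and parameter values are illustrative:

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.4, noise=0.08, random_state=0)
for kernel, params in [('poly', dict(degree=3)),
                       ('rbf', dict(gamma=1.0)),
                       ('sigmoid', dict(gamma=0.5, coef0=-1.0))]:
    clf = SVC(kernel=kernel, C=1.0, **params).fit(X, y)
    print(kernel, clf.score(X, y))   # training accuracy for each kernel
```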

  32. Multi-Class Classification • Two basic strategies: • One versus all: Q SVMs are trained, and each SVM separates a single class from all the others. Classification is done by a "winner takes all" strategy, in which the classifier with the highest output function assigns the class.

  33. Multi-Class Classification • Pairwise: Q(Q−1)/2 machines are trained; each SVM separates a pair of classes. • Classification is done by a "max wins" voting strategy, in which the class with the most votes determines the instance's classification. • The first strategy is preferable in terms of training complexity; a comparison in code follows below. • Experiments haven't shown big performance differences between the two.
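
Both strategies are available off the shelf; a short scikit-learn sketch on a 10-class problem shows the difference in the number of trained machines:

```python
from sklearn.datasets import load_digits
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)                  # Q = 10 classes
# One-versus-all trains Q machines; pairwise trains Q(Q-1)/2
ova = OneVsRestClassifier(SVC(kernel='linear')).fit(X, y)
ovo = OneVsOneClassifier(SVC(kernel='linear')).fit(X, y)
print(len(ova.estimators_), len(ovo.estimators_))    # 10 45
```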

  34. Summary – SVM Algorithm for Pattern Classification • Start with data x_1, …, x_n, which live in an input space of dimension d. • Implicitly define the feature space by choosing a kernel K. • Find the largest-margin linear discriminant in the higher-dimensional space by using a quadratic programming package to solve: maximize Σ_i α_i − ½ Σ_i Σ_j α_i α_j y_i y_j K(x_i, x_j) subject to 0 ≤ α_i ≤ C and Σ_i α_i y_i = 0.
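
Putting the whole recipe together with scikit-learn (the moons data, RBF kernel and C value are illustrative choices, not from the slides):

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)
# The kernel fixes the (implicit) feature space; the QP inside SVC then
# finds the maximum-margin discriminant in that space.
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X, y)
print(clf.support_vectors_.shape[0], 'support vectors define the boundary')
```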

  35. Strengths and Weaknesses of SVM • Strengths: training is relatively easy; there are no local minima (unlike in NN); it scales relatively well to high-dimensional data; the trade-off between complexity and error can be controlled explicitly. • Major weakness: the need for a good kernel function.

  36. What Is SVM Useful For? • Pattern recognition: object recognition, handwriting recognition, speaker identification, text categorization, face recognition. • Regression estimation.

  37. Face Recognition with SVM – Global Versus Component Approach (Bernd Heisele, Purdy Ho & Tomaso Poggio) • Global approach – basic algorithm: • A one-versus-all strategy was used. • A linear SVM for each person in the dataset. • Each SVM was trained to distinguish between all images of a single person and all other images.

  38. Face Recognition with SVM – Global Versus Component Approach

  39. Face Recognition with SVM – Global Versus Component Approach • The gray values of the face picture were converted to a feature vector. • Given a set of q people (a set of q SVMs), the class label y of a face pattern x is computed as follows: let d_j(x) = (w_j·x + b_j)/‖w_j‖ be the signed distance from x to the hyperplane of SVM j; then y = argmax_j d_j(x).
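
A hedged sketch of this winner-takes-all rule, assuming q trained classifiers that expose an sklearn-style `decision_function` (the helper name is hypothetical):

```python
import numpy as np

def classify_face(x, svms):
    """Winner-takes-all over q one-versus-all SVMs (illustrative sketch).

    Each classifier reports a signed distance to its hyperplane; the face
    receives the label of the most confident classifier.
    """
    d = [svm.decision_function(x.reshape(1, -1))[0] for svm in svms]
    return int(np.argmax(d))
```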

  40. Face Recognition with SVM – Global Versus Component Approach • Global approach – improved algorithm: • A variation of this algorithm used a second-degree polynomial SVM (an SVM with a second-degree polynomial kernel).

  41. Face Recognition with SVM – Global Versus Component Approach • Component-based algorithm: • In the detection phase, facial components were detected.

  42. Face Recognition with SVM – Global Versus Component Approach • Then, the final detection was made by combining the results of the component classifiers. Each of the components was normalized in size, and their gray values were combined into a single feature vector. • Again, a one-versus-all linear SVM was used.

  43. Face Recognition with SVM – Global Versus Component Approach – Results • The component-based algorithm showed much better results than the global approach.

  44. References • B. Heisele, P. Ho & T. Poggio. Face Recognition with Support Vector Machines. Computer Vision and Image Understanding, 91(1–2):6–21, 2003. • C. J. C. Burges. A Tutorial on Support Vector Machines for Pattern Recognition. Data Mining and Knowledge Discovery, 2(2):121–167, 1998. • J. P. Lewis. A Short SVM (Support Vector Machines) Tutorial.

  45. References • Prof. Bebis. Support Vector Machines. Pattern Recognition course, Spring 2006, lecture slides. • Prof. A. W. Moore. Support Vector Machines. 2003 lecture slides. • R. Osadchy. Support Vector Machines. 2008 lecture slides. • YouTube – Facial Expressions Recognition: http://www.youtube.com/watch?v=iPFg52yOZzY
