Navneet Goyal, BITS-Pilani, Rajasthan, INDIA
Kernel Methods
• Figure source: http://wwwold.ini.ruhr-uni-bochum.de/thbio/group/neuralnet/index_p.html
Kernel Methods • In computer science, kernel methods (KMs) are a class of algorithms for pattern analysis, whose best known element is the support vector machine (SVM) (Wikipedia) • Transformations • Feature Spaces • Kernel Functions • Kernel Tricks • Inner Products
Kernel Methods Algorithms capable of operating with kernels include: • Support vector machine (SVM) • Gaussian processes • Fisher's linear discriminant analysis (LDA) • Principal components analysis (PCA) (Kernel PCA) • Canonical correlation analysis • Ridge regression • Spectral clustering • Linear adaptive filters • …
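As a minimal sketch of how the algorithms above are kernelized in practice (scikit-learn is an assumed library choice, not from the slides; the gamma value is illustrative), several of them accept the same kernel specification:

```python
# Sketch using scikit-learn (assumed library; parameters are illustrative).
# Several of the algorithms listed above ship in kernelized form and accept
# the same kernel specification, here a Gaussian (RBF) kernel with gamma=0.5.
from sklearn.svm import SVC
from sklearn.kernel_ridge import KernelRidge
from sklearn.decomposition import KernelPCA

svm = SVC(kernel="rbf", gamma=0.5)                          # support vector machine
ridge = KernelRidge(kernel="rbf", gamma=0.5)                # kernel ridge regression
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)   # kernel PCA
```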
Kernel Methods • Kernels are non-linear generalizations of inner products
Kernel Methods
• Any kernel-based method comprises two modules:
• A mapping into an embedding (feature) space
• A learning algorithm designed to discover linear patterns in that space
Kernel Methods
• Why does this approach work?
• Detecting linear patterns has been the focus of much research in statistics and machine learning
• The resulting algorithms are well understood and efficient
• A computational shortcut makes it possible to represent linear patterns efficiently in a high-dimensional space, ensuring adequate representational power
• That shortcut is nothing but the KERNEL FUNCTION
Kernel Methods
• Kernel methods allow us to extend algorithms such as SVMs to define non-linear decision boundaries
• Other algorithms that depend only on inner products between data points can be extended similarly
• Kernel functions that are symmetric and positive definite allow us to implicitly define inner products in high-dimensional spaces
• Replacing inner products in the input space with positive definite kernels immediately extends algorithms like SVM to
• Linear separation in a high-dimensional space
• Or, equivalently, non-linear separation in the input space
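A concrete sketch of that extension (scikit-learn assumed; the dataset and gamma value are illustrative): swapping the linear kernel for an RBF kernel lets an SVM separate data that is not linearly separable in the input space.

```python
# Sketch (assumed: scikit-learn; dataset and gamma are illustrative).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the 2-D input space.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0
```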
Types of Kernels
• Positive definite symmetric kernels (PDS)
• Negative definite symmetric kernels (NDS)
• Role of NDS kernels in the construction of PDS kernels!
Kernel Methods
• Input space, χ
• High-dimensional feature space, ℍ
• ℍ can be really large!!
• Document classification with trigrams over a vocabulary of 100,000 words: (10⁵)³ = 10¹⁵ possible trigrams, so the dimension of the feature space reaches 10¹⁵
• The generalization ability of large-margin classifiers like SVM does not depend on the dimension of the feature space, but on the margin and the number of training examples
Kernel Functions
• A function K: χ × χ → ℝ is called a kernel over χ
• For any two points x, x′ ∈ χ, K(x, x′) = ⟨φ(x), φ(x′)⟩ for some mapping φ: χ → ℍ to a Hilbert space ℍ, called a feature space
• K is efficient!
• Computing K(x, x′) is O(N), where N is the input dimension
• Computing ⟨φ(x), φ(x′)⟩ explicitly is O(dim ℍ), with dim ℍ ≫ N
• K is flexible!
• No need to explicitly define or compute φ
• The kernel K can be chosen arbitrarily as long as the existence of φ is guaranteed, i.e. K satisfies Mercer's condition
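A small sketch of the efficiency claim (numpy assumed; the dimensions are illustrative): evaluating the degree-d polynomial kernel is a single N-dimensional dot product, while the explicit feature space has C(N+d, d) coordinates.

```python
# Sketch of the efficiency claim (numpy assumed; sizes are illustrative).
import math
import numpy as np

N, d = 1000, 3                      # input dimension, polynomial degree
x = np.random.randn(N)
z = np.random.randn(N)

# Kernel evaluation: one dot product, O(N).
k = (x @ z + 1) ** d

# The explicit feature space for the degree-d polynomial kernel has
# C(N + d, d) coordinates, far too many to materialize.
dim_H = math.comb(N + d, d)
print(f"K(x, z) = {k:.3f} computed in O({N}); dim(H) = {dim_H:,}")
```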
Kernel Functions
• Mercer's Condition
• A kernel function K can be expressed as K(x, x′) = ⟨φ(x), φ(x′)⟩ iff, for any function g(x) such that ∫ g(x)² dx is finite, ∫∫ K(x, x′) g(x) g(x′) dx dx′ ≥ 0
• Kernels satisfying Mercer's condition are called positive definite kernel functions!
• The transformed space of SVM kernels is called a Reproducing Kernel Hilbert Space (RKHS)
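On a finite sample, Mercer's condition amounts to every Gram matrix being positive semi-definite; a quick numerical check of this (numpy and scikit-learn assumed):

```python
# Finite-sample check of positive definiteness (numpy/scikit-learn assumed).
# Mercer's condition implies every Gram matrix G_ij = K(x_i, x_j) is
# positive semi-definite, i.e. has no negative eigenvalues.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.randn(50, 5)          # 50 sample points in R^5
G = rbf_kernel(X, gamma=0.5)        # Gram matrix of the Gaussian kernel

eigvals = np.linalg.eigvalsh(G)     # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigvals.min())  # >= 0 up to round-off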
Kernel Functions
• Example
• Show that the polynomial kernel function satisfies Mercer's condition (a worked sketch follows below)
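A worked sketch for the degree-2 case in two dimensions (the degree and dimension are chosen for illustration): exhibiting an explicit feature map φ with K(x, z) = ⟨φ(x), φ(z)⟩ is enough, because the Mercer integral then becomes a squared norm.

```latex
% Degree-2 polynomial kernel on R^2 (illustrative case)
K(x, z) = (x \cdot z)^2
        = (x_1 z_1 + x_2 z_2)^2
        = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2
        = \langle \phi(x), \phi(z) \rangle,
\quad \text{with } \phi(x) = \bigl(x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2\bigr).
% Hence, for any square-integrable g,
\int\!\!\int K(x, z)\, g(x)\, g(z)\, dx\, dz
  = \Bigl\| \int g(x)\, \phi(x)\, dx \Bigr\|^2 \;\ge\; 0 .
```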
Feature Spaces
• Example: the quadratic map φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²) from the previous slide sends the 2-D input space to a 3-D feature space in which ⟨φ(x), φ(z)⟩ = (x·z)² (a numeric check follows below)
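A numerical check of that identity (plain numpy; the test points are arbitrary):

```python
# Numeric check that <phi(x), phi(z)> = (x . z)^2 for the quadratic map
# above (numpy assumed; the test points are arbitrary).
import numpy as np

def phi(v):
    """Explicit quadratic feature map for 2-D input."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = phi(x) @ phi(z)   # inner product in the 3-D feature space
rhs = (x @ z) ** 2      # kernel evaluated directly in the input space
print(lhs, rhs)         # both 1.0: (1*3 + 2*(-1))^2 = 1
```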
Modularity
Kernel methods consist of two modules:
1) The choice of kernel (this is non-trivial)
2) The algorithm which takes kernels as input
Modularity: any kernel can be used with any kernel algorithm.
Some kernel algorithms:
- support vector machine
- Fisher discriminant analysis
- kernel regression
- kernel PCA
Some kernels (standard choices; see the sketch below):
- linear: K(x, x′) = ⟨x, x′⟩
- polynomial: K(x, x′) = (⟨x, x′⟩ + c)^d
- Gaussian (RBF): K(x, x′) = exp(−‖x − x′‖² / 2σ²)
- sigmoid: K(x, x′) = tanh(κ⟨x, x′⟩ + θ)
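A sketch of this plug-and-play property (scikit-learn assumed; the data and gamma are illustrative): compute one Gram matrix and feed the same matrix to two different kernel algorithms.

```python
# Modularity sketch (scikit-learn assumed): one Gram matrix, two algorithms.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC
from sklearn.decomposition import KernelPCA

X = np.random.randn(100, 4)
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # toy non-linear labels

G = rbf_kernel(X, gamma=0.5)              # the "kernel module": Gram matrix

# The same Gram matrix drives two different kernel algorithms.
svm = SVC(kernel="precomputed").fit(G, y)
embedding = KernelPCA(n_components=2, kernel="precomputed").fit_transform(G)
print(svm.score(G, y), embedding.shape)
```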
Goodies and Baddies
• Goodies:
• Kernel algorithms are typically constrained convex optimization problems, solved with either spectral methods or convex optimization tools
• Efficient algorithms do exist in most cases
• The similarity to linear methods facilitates analysis; there are strong generalization bounds on test error
• Baddies:
• You need to choose the appropriate kernel
• Kernel learning is prone to over-fitting
• All information must go through the kernel bottleneck