Navneet Goyal, BITS-Pilani, Rajasthan, INDIA
Kernel Methods
• Figure source: http://wwwold.ini.ruhr-uni-bochum.de/thbio/group/neuralnet/index_p.html
Kernel Methods • In computer science, kernel methods (KMs) are a class of algorithms for pattern analysis, whose best known element is the support vector machine (SVM) (Wikipedia) • Transformations • Feature Spaces • Kernel Functions • Kernel Tricks • Inner Products
Kernel Methods Algorithms capable of operating with kernels include: • Support vector machine (SVM) • Gaussian processes • Fisher's linear discriminant analysis (LDA) • Principal components analysis (PCA) (Kernel PCA) • Canonical correlation analysis • Ridge regression • Spectral clustering • Linear adaptive filters • …
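As a minimal sketch of how the algorithms above are kernelized in practice (scikit-learn is an assumed library choice, not from the slides; the gamma value is illustrative), several of them accept the same kernel specification:

```python
# Sketch using scikit-learn (assumed library; parameters are illustrative).
# Several of the algorithms listed above ship in kernelized form and accept
# the same kernel specification, here a Gaussian (RBF) kernel with gamma=0.5.
from sklearn.svm import SVC
from sklearn.kernel_ridge import KernelRidge
from sklearn.decomposition import KernelPCA

svm = SVC(kernel="rbf", gamma=0.5)                          # support vector machine
ridge = KernelRidge(kernel="rbf", gamma=0.5)                # kernel ridge regression
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=0.5)   # kernel PCA
```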
Kernel Methods • Kernels are non-linear generalizations of inner products
Kernel Methods
• Any kernel-based method comprises two modules:
• A mapping into an embedding (feature) space
• A learning algorithm designed to discover linear patterns in that space
Kernel Methods
• Why does this approach work?
• Detecting linear patterns has been the focus of much research in statistics and machine learning
• The resulting algorithms are well understood and efficient
• A computational shortcut makes it possible to represent linear patterns efficiently in a high-dimensional space, ensuring adequate representational power
• That shortcut is nothing but the KERNEL FUNCTION
Kernel Methods
• Kernel methods allow us to extend algorithms such as SVMs to define non-linear decision boundaries
• Other algorithms that depend only on inner products between data points can be extended similarly
• Kernel functions that are symmetric and positive definite allow us to implicitly define inner products in high-dimensional spaces
• Replacing inner products in the input space with positive definite kernels immediately extends algorithms like SVM to
• Linear separation in a high-dimensional space
• Or, equivalently, non-linear separation in the input space
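A concrete sketch of that extension (scikit-learn assumed; the dataset and gamma value are illustrative): swapping the linear kernel for an RBF kernel lets an SVM separate data that is not linearly separable in the input space.

```python
# Sketch (assumed: scikit-learn; dataset and gamma are illustrative).
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the 2-D input space.
X, y = make_circles(n_samples=200, noise=0.1, factor=0.3, random_state=0)

linear_svm = SVC(kernel="linear").fit(X, y)
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)

print("linear kernel accuracy:", linear_svm.score(X, y))  # roughly chance level
print("RBF kernel accuracy:", rbf_svm.score(X, y))        # close to 1.0
```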
Types of Kernels
• Positive definite symmetric kernels (PDS)
• Negative definite symmetric kernels (NDS)
• Role of NDS kernels in the construction of PDS kernels!
Kernel Methods
• Input space, χ
• High-dimensional feature space, ℍ
• ℍ can be really large!!
• Document classification with trigrams over a vocabulary of 100,000 words: (10⁵)³ = 10¹⁵ possible trigrams, so the dimension of the feature space reaches 10¹⁵
• The generalization ability of large-margin classifiers like SVM does not depend on the dimension of the feature space, but on the margin and the number of training examples
Kernel Functions
• A function K: χ × χ → ℝ is called a kernel over χ
• For any two points x, x′ ∈ χ, K(x, x′) = ⟨φ(x), φ(x′)⟩ for some mapping φ: χ → ℍ to a Hilbert space ℍ, called a feature space
• K is efficient!
• Computing K(x, x′) is O(N), where N is the input dimension
• Computing ⟨φ(x), φ(x′)⟩ explicitly is O(dim ℍ), with dim ℍ ≫ N
• K is flexible!
• No need to explicitly define or compute φ
• The kernel K can be chosen arbitrarily as long as the existence of φ is guaranteed, i.e. K satisfies Mercer's condition
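A small sketch of the efficiency claim (numpy assumed; the dimensions are illustrative): evaluating the degree-d polynomial kernel is a single N-dimensional dot product, while the explicit feature space has C(N+d, d) coordinates.

```python
# Sketch of the efficiency claim (numpy assumed; sizes are illustrative).
import math
import numpy as np

N, d = 1000, 3                      # input dimension, polynomial degree
x = np.random.randn(N)
z = np.random.randn(N)

# Kernel evaluation: one dot product, O(N).
k = (x @ z + 1) ** d

# The explicit feature space for the degree-d polynomial kernel has
# C(N + d, d) coordinates, far too many to materialize.
dim_H = math.comb(N + d, d)
print(f"K(x, z) = {k:.3f} computed in O({N}); dim(H) = {dim_H:,}")
```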
Kernel Functions
• Mercer's Condition
• A kernel function K can be expressed as K(x, x′) = ⟨φ(x), φ(x′)⟩ iff, for any function g(x) such that ∫ g(x)² dx is finite, ∫∫ K(x, x′) g(x) g(x′) dx dx′ ≥ 0
• Kernels satisfying Mercer's condition are called positive definite kernel functions!
• The transformed space of SVM kernels is called a Reproducing Kernel Hilbert Space (RKHS)
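On a finite sample, Mercer's condition amounts to every Gram matrix being positive semi-definite; a quick numerical check of this (numpy and scikit-learn assumed):

```python
# Finite-sample check of positive definiteness (numpy/scikit-learn assumed).
# Mercer's condition implies every Gram matrix G_ij = K(x_i, x_j) is
# positive semi-definite, i.e. has no negative eigenvalues.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

X = np.random.randn(50, 5)          # 50 sample points in R^5
G = rbf_kernel(X, gamma=0.5)        # Gram matrix of the Gaussian kernel

eigvals = np.linalg.eigvalsh(G)     # symmetric matrix -> real eigenvalues
print("smallest eigenvalue:", eigvals.min())  # >= 0 up to round-off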
Kernel Functions
• Example
• Show that the polynomial kernel function satisfies Mercer's condition (a worked sketch follows below)
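A worked sketch for the degree-2 case in two dimensions (the degree and dimension are chosen for illustration): exhibiting an explicit feature map φ with K(x, z) = ⟨φ(x), φ(z)⟩ is enough, because the Mercer integral then becomes a squared norm.

```latex
% Degree-2 polynomial kernel on R^2 (illustrative case)
K(x, z) = (x \cdot z)^2
        = (x_1 z_1 + x_2 z_2)^2
        = x_1^2 z_1^2 + 2 x_1 x_2 z_1 z_2 + x_2^2 z_2^2
        = \langle \phi(x), \phi(z) \rangle,
\quad \text{with } \phi(x) = \bigl(x_1^2,\; \sqrt{2}\, x_1 x_2,\; x_2^2\bigr).
% Hence, for any square-integrable g,
\int\!\!\int K(x, z)\, g(x)\, g(z)\, dx\, dz
  = \Bigl\| \int g(x)\, \phi(x)\, dx \Bigr\|^2 \;\ge\; 0 .
```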
Feature Spaces
• Example: the quadratic map φ(x₁, x₂) = (x₁², √2·x₁x₂, x₂²) from the previous slide sends the 2-D input space to a 3-D feature space in which ⟨φ(x), φ(z)⟩ = (x·z)² (a numeric check follows below)
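A numerical check of that identity (plain numpy; the test points are arbitrary):

```python
# Numeric check that <phi(x), phi(z)> = (x . z)^2 for the quadratic map
# above (numpy assumed; the test points are arbitrary).
import numpy as np

def phi(v):
    """Explicit quadratic feature map for 2-D input."""
    return np.array([v[0] ** 2, np.sqrt(2) * v[0] * v[1], v[1] ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

lhs = phi(x) @ phi(z)   # inner product in the 3-D feature space
rhs = (x @ z) ** 2      # kernel evaluated directly in the input space
print(lhs, rhs)         # both 1.0: (1*3 + 2*(-1))^2 = 1
```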
Modularity
Kernel methods consist of two modules:
1) The choice of kernel (this is non-trivial)
2) The algorithm which takes kernels as input
Modularity: any kernel can be used with any kernel algorithm.
Some kernel algorithms:
- support vector machine
- Fisher discriminant analysis
- kernel regression
- kernel PCA
Some kernels (standard choices; see the sketch below):
- linear: K(x, x′) = ⟨x, x′⟩
- polynomial: K(x, x′) = (⟨x, x′⟩ + c)^d
- Gaussian (RBF): K(x, x′) = exp(−‖x − x′‖² / 2σ²)
- sigmoid: K(x, x′) = tanh(κ⟨x, x′⟩ + θ)
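A sketch of this plug-and-play property (scikit-learn assumed; the data and gamma are illustrative): compute one Gram matrix and feed the same matrix to two different kernel algorithms.

```python
# Modularity sketch (scikit-learn assumed): one Gram matrix, two algorithms.
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVC
from sklearn.decomposition import KernelPCA

X = np.random.randn(100, 4)
y = (X[:, 0] * X[:, 1] > 0).astype(int)   # toy non-linear labels

G = rbf_kernel(X, gamma=0.5)              # the "kernel module": Gram matrix

# The same Gram matrix drives two different kernel algorithms.
svm = SVC(kernel="precomputed").fit(G, y)
embedding = KernelPCA(n_components=2, kernel="precomputed").fit_transform(G)
print(svm.score(G, y), embedding.shape)
```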
Goodies and Baddies
• Goodies:
• Kernel algorithms are typically constrained convex optimization problems, solved with either spectral methods or convex optimization tools
• Efficient algorithms do exist in most cases
• The similarity to linear methods facilitates analysis; there are strong generalization bounds on test error
• Baddies:
• You need to choose the appropriate kernel
• Kernel learning is prone to over-fitting
• All information must go through the kernel bottleneck