
An Overview of Kernel-Based Learning Methods



  1. An Overview of Kernel-Based Learning Methods. Yan Liu, Nov 18, 2003

  2. Outline • Introduction • Theory basis: • Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization • Kernel-based learning algorithms • Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD) • Unsupervised learning: one-class SVM, kernel PCA • Kernel design • Standard kernels • Making kernels from kernels • Application-oriented kernels: Fisher kernel

  3. Introduction: Example • Idea: map the problem into a higher-dimensional space. • Let F be a (potentially much higher-dimensional) feature space, and let φ : X → F, x ↦ φ(x). • The learning problem now works with the samples (φ(x_1), y_1), ..., (φ(x_N), y_N) in F × Y. • Key question: can the mapped problem be classified in a "simple" (e.g., linear) way?
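
  A minimal sketch (Python/NumPy; the names and the degree-2 polynomial kernel are illustrative choices) of the idea: an explicit quadratic feature map versus the kernel k(x, z) = (x·z)^2, which computes the same inner product in F without ever forming φ(x) explicitly.

```python
import numpy as np

def phi(x):
    """Explicit quadratic feature map for a 2-D input x = (x1, x2):
    phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

def poly_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 0.5])

# The kernel evaluates the feature-space inner product implicitly.
print(np.dot(phi(x), phi(z)))   # 16.0
print(poly_kernel(x, z))        # 16.0
```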

  4. Exploring Theory: Roadmap

  5. Reproducing Kernel Hilbert Space -1 • Inner product space: a vector space equipped with an inner product ⟨·, ·⟩ obeying the axioms below • Hilbert space: a complete inner product space, i.e., one in which every Cauchy sequence converges in the norm induced by the inner product
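
  For reference, the standard inner product axioms (real case) are:

  \[ \langle f, g \rangle = \langle g, f \rangle, \qquad \langle \alpha f + \beta g, h \rangle = \alpha \langle f, h \rangle + \beta \langle g, h \rangle, \qquad \langle f, f \rangle \ge 0 \ \text{with } \langle f, f \rangle = 0 \iff f = 0. \]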

  6. Reproducing Kernel Hilbert Space -2 • Reproducing Kernel Hilbert Space (RKHS) • Gram matrix: given a kernel k(x, y) and points x_1, ..., x_N, define the Gram matrix by K_ij = k(x_i, x_j) • We say the kernel is positive definite when the corresponding Gram matrix is positive semidefinite for every choice of points • Definition of RKHS: a Hilbert space of functions in which every evaluation functional f ↦ f(x) is bounded; equivalently, one possessing a reproducing kernel
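
  A small sketch (Python/NumPy; the RBF bandwidth gamma and the sample points are arbitrary choices) that builds a Gram matrix for the Gaussian (RBF) kernel and checks positive semidefiniteness via its eigenvalues.

```python
import numpy as np

def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel: k(x, z) = exp(-gamma * ||x - z||^2)."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

# A few sample points in R^2.
X = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5], [0.5, 2.0]])
N = len(X)

# Gram matrix K_ij = k(x_i, x_j).
K = np.array([[rbf_kernel(X[i], X[j]) for j in range(N)] for i in range(N)])

# A valid kernel must give a symmetric, positive semidefinite Gram matrix.
eigvals = np.linalg.eigvalsh(K)
print("symmetric:", np.allclose(K, K.T))
print("smallest eigenvalue (should be >= 0):", eigvals.min())
```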

  7. Reproducing Kernel Hilbert Space -3 • Reproducing properties (see below) • Comment • An RKHS is a "bounded" Hilbert space: its evaluation functionals are bounded (continuous) • An RKHS is a "smoothed" Hilbert space: convergence in the RKHS norm implies pointwise convergence
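
  The reproducing properties, in their standard form: for every x and every f in the RKHS H with kernel k,

  \[ f(x) = \langle f, k(\cdot, x) \rangle_{\mathcal{H}}, \qquad k(x, y) = \langle k(\cdot, x), k(\cdot, y) \rangle_{\mathcal{H}}. \]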

  8. Mercer's Theorem-1 • Mercer's Theorem • In the discrete case, let A be the Gram matrix. If A is positive semidefinite, then it admits the expansion shown below, which recovers an explicit feature map.
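
  The standard discrete form: if A is positive semidefinite with eigenvalues λ_i ≥ 0 and orthonormal eigenvectors v_i, then

  \[ A = \sum_i \lambda_i v_i v_i^{\top}, \qquad A_{jk} = k(x_j, x_k) = \sum_i \lambda_i \, v_i(j) \, v_i(k) = \langle \phi(x_j), \phi(x_k) \rangle, \]

  with the feature map \( \phi(x_j) = \big( \sqrt{\lambda_1}\, v_1(j), \sqrt{\lambda_2}\, v_2(j), \ldots \big) \).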

  9. Mercer's Theorem-2 • Comment • Mercer's theorem provides a concrete way to construct a basis for an RKHS • Mercer's condition is the only constraint on a kernel: a function is a valid kernel exactly when every Gram matrix it induces is positive semidefinite

  10. Representer Theorem-1
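
  A standard statement of the theorem: let L be an arbitrary loss function and Ω a strictly monotonically increasing function; then any minimizer over the RKHS H of the regularized risk

  \[ \min_{f \in \mathcal{H}} \; L\big( (x_1, y_1, f(x_1)), \ldots, (x_N, y_N, f(x_N)) \big) + \Omega\big( \lVert f \rVert_{\mathcal{H}} \big) \]

  admits a representation of the form

  \[ f^{*}(x) = \sum_{i=1}^{N} \alpha_i \, k(x_i, x). \]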

  11. Representer Theorem-2 • Comment • The representer theorem is a powerful result: although we search for the optimal solution in a possibly infinite-dimensional feature space, adding the regularization term reduces the problem to a finite-dimensional one, with one coefficient per training example • In this sense, regularization and the RKHS framework are two views of the same construction

  12. Exploring Theory: Roadmap

  13. Outline • Introduction • Theory basis: • Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization • Kernel-based learning algorithms • Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD) • Unsupervised learning: one-class SVM, kernel PCA • Kernel design • Standard kernels • Making kernels from kernels • Application-oriented kernels: Fisher kernel

  14. Support Vector Machines-1: Quick Overview

  15. Support Vector Machines-2: Quick Overview

  16. Support Vector Machines-3 • Parameter sparsity: most of the dual coefficients α_i are zero; only the support vectors have α_i ≠ 0 • C: regularization constant trading off margin width against training errors • ξ_i: slack variables measuring margin violations (see the formulation below)
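
  For reference, the standard soft-margin formulation these quantities come from:

  \[ \min_{w, b, \xi} \; \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{N} \xi_i \quad \text{s.t.} \quad y_i \big( \langle w, \phi(x_i) \rangle + b \big) \ge 1 - \xi_i, \;\; \xi_i \ge 0, \]

  with the dual in terms of the α_i:

  \[ \max_{\alpha} \; \sum_i \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \, k(x_i, x_j) \quad \text{s.t.} \quad 0 \le \alpha_i \le C, \;\; \sum_i \alpha_i y_i = 0. \]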

  17. Support Vector Machines-4: Optimization Techniques • Chunking: each step solves the subproblem containing all non-zero α_i plus some of the α_i that violate the KKT conditions • Decomposition methods (e.g., SVMlight): the size of the working set is fixed; samples are added to and removed from it in each iteration • Sequential minimal optimization (SMO): each iteration solves a quadratic subproblem of size two, which can be done analytically
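
  A minimal usage sketch (assuming scikit-learn is available; its SVC class is backed by LIBSVM, which uses an SMO-type decomposition solver, and the toy data set is an arbitrary choice):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# A small nonlinearly separable toy problem.
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF-kernel SVM; C is the regularization constant from the previous slide.
clf = SVC(kernel="rbf", C=1.0, gamma=1.0)
clf.fit(X_train, y_train)

print("test accuracy:", clf.score(X_test, y_test))
print("number of support vectors:", clf.n_support_.sum())
```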

  18. Kernel Fisher Discriminant-1: Overview of LDA • Fisher's discriminant (LDA): find the linear projection with the most discriminative direction • Maximize the Rayleigh coefficient (below), where S_W is the within-class scatter and S_B is the between-class scatter • Comparison with PCA: PCA finds directions of maximal variance, while LDA finds directions that best separate the classes
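
  The Rayleigh coefficient, in its standard two-class form:

  \[ J(w) = \frac{w^{\top} S_B \, w}{w^{\top} S_W \, w}, \qquad S_B = (m_1 - m_2)(m_1 - m_2)^{\top}, \qquad S_W = \sum_{c \in \{1,2\}} \sum_{i \in c} (x_i - m_c)(x_i - m_c)^{\top}, \]

  where m_1 and m_2 are the class means.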

  19. Kernel Fisher Discriminant-2 • KFD solves Fisher's discriminant problem in the feature space F, which yields a nonlinear discriminant in the input space • One can express w in terms of the mapped training patterns (see below) • The optimization problem for KFD can then be written entirely in terms of kernel evaluations (see below)
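
  The standard kernelized form: expand w over the mapped training patterns,

  \[ w = \sum_{i=1}^{N} \alpha_i \, \phi(x_i), \]

  which turns the Rayleigh coefficient into a problem over the coefficient vector α,

  \[ J(\alpha) = \frac{\alpha^{\top} M \, \alpha}{\alpha^{\top} N \, \alpha}, \]

  where M and N are N×N matrices built only from kernel evaluations k(x_i, x_j) (M plays the role of S_B and N the role of S_W in feature space).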

  20. Kernel PCA -1 • The basic idea of PCA: find a set of orthogonal directions that capture most of the variance in the data • However, the data may contain more structure (e.g., more clusters) than linear directions in the N-dimensional input space can capture (N is the number of dimensions) • Kernel PCA maps the data into a higher-dimensional feature space and performs standard PCA there; using the kernel trick, all calculations can be carried out on the Gram matrix, without ever computing feature-space coordinates explicitly

  21. Kernel PCA -2 • Start from the covariance matrix in feature space • By definition, each principal direction lies in the span of the mapped training points • Define the Gram matrix K_ij = k(x_i, x_j) • The eigenvector equation for the covariance matrix then reduces to an eigenvalue problem on K • Therefore we simply have to solve an eigenvalue problem on the Gram matrix (see the sketch below)
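
  A minimal sketch (Python/NumPy; the RBF kernel, bandwidth gamma, and random data are illustrative choices) of kernel PCA: build the Gram matrix, center it in feature space, and project onto the leading eigenvectors.

```python
import numpy as np

def rbf_gram(X, gamma=0.5):
    """Gram matrix K_ij = exp(-gamma * ||x_i - x_j||^2)."""
    sq = np.sum(X**2, axis=1)
    sq_dists = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.exp(-gamma * sq_dists)

def kernel_pca(X, n_components=2, gamma=0.5):
    N = len(X)
    K = rbf_gram(X, gamma)
    # Center in feature space: K <- K - 1_N K - K 1_N + 1_N K 1_N.
    one_n = np.ones((N, N)) / N
    Kc = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    # Eigenvalue problem on the centered Gram matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    idx = np.argsort(eigvals)[::-1][:n_components]
    alphas, lambdas = eigvecs[:, idx], eigvals[idx]
    # Projections of the training points onto the principal components.
    return Kc @ alphas / np.sqrt(lambdas)

X = np.random.RandomState(0).randn(50, 3)
print(kernel_pca(X).shape)  # (50, 2)
```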

  22. Outline • Introduction • Theory basis: • Reproducing Kernel Hilbert Space (RKHS), Mercer's theorem, Representer theorem, regularization • Kernel-based learning algorithms • Supervised learning: support vector machines (SVMs), kernel Fisher discriminant (KFD) • Unsupervised learning: one-class SVM, kernel PCA • Kernel design • Standard kernels • Making kernels from kernels • Application-oriented kernels: Fisher kernel

  23. Standard Kernels
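
  For reference, the standard kernels usually listed in this context (for inputs x, z ∈ R^n):

  \[ k(x, z) = \langle x, z \rangle \ \text{(linear)}, \qquad k(x, z) = (\langle x, z \rangle + c)^d \ \text{(polynomial)}, \qquad k(x, z) = \exp\!\left( -\frac{\lVert x - z \rVert^2}{2\sigma^2} \right) \ \text{(Gaussian RBF)}, \qquad k(x, z) = \tanh(\kappa \langle x, z \rangle + \theta) \ \text{(sigmoid; not positive definite for all parameter choices)}. \]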

  24. Making Kernels out of Kernels • Theorem: if K1 and K2 are kernels, then so are • K(x, z) = K1(x, z) + K2(x, z) • K(x, z) = a K1(x, z), for a > 0 • K(x, z) = K1(x, z) · K2(x, z) • K(x, z) = f(x) f(z), for any real-valued function f • K(x, z) = K3(Φ(x), Φ(z)), for a kernel K3 and mapping Φ • Kernel selection (see the sketch below)
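
  A small sketch (Python/NumPy; the particular base kernels and data are arbitrary choices) illustrating the closure properties: combining kernels by sum, positive scaling, and product still yields a positive semidefinite Gram matrix.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 4)

def gram(kernel, X):
    """Gram matrix for an arbitrary kernel function."""
    return np.array([[kernel(x, z) for z in X] for x in X])

k_lin = lambda x, z: np.dot(x, z)                          # linear kernel
k_rbf = lambda x, z: np.exp(-0.5 * np.sum((x - z) ** 2))   # Gaussian RBF kernel

# Sum, positive scaling, and product of kernels are all still kernels.
k_combined = lambda x, z: k_lin(x, z) + 2.0 * k_rbf(x, z) + k_lin(x, z) * k_rbf(x, z)

K = gram(k_combined, X)
print("min eigenvalue (>= 0 up to rounding):", np.linalg.eigvalsh(K).min())
```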

  25. Fisher Kernel • Jaakkola and Haussler proposed using a generative model to define a kernel for a discriminative (non-probabilistic) classifier • Build an HMM for each protein family • Compute the Fisher score with respect to each parameter of the HMM (see below) • Use these scores as features and classify with an SVM with an RBF kernel • Good performance on protein family classification
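
  The standard definitions behind these steps: for a generative model P(x | θ), the Fisher score of an example x is the gradient of its log-likelihood, and the Fisher kernel is an inner product of scores weighted by the inverse Fisher information matrix I,

  \[ U_x = \nabla_{\theta} \log P(x \mid \theta), \qquad K(x, y) = U_x^{\top} \, I^{-1} \, U_y, \qquad I = \mathbb{E}_x \big[ U_x U_x^{\top} \big]. \]

  In practice I is often approximated by the identity matrix.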
