welcome to the kernel class l.
Skip this Video
Loading SlideShow in 5 Seconds..
Welcome to the Kernel-Class PowerPoint Presentation
Download Presentation
Welcome to the Kernel-Class

Loading in 2 Seconds...

play fullscreen
1 / 11

Welcome to the Kernel-Class - PowerPoint PPT Presentation

  • Uploaded on

Welcome to the Kernel-Class. My name : Max (Welling) Book: There will be class-notes/slides. Homework : reading material, some exercises, some MATLAB implementations. I like : an active attitude in class. ask questions! respond to my questions. Enjoy.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
Download Presentation

PowerPoint Slideshow about 'Welcome to the Kernel-Class' - mardi

An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
welcome to the kernel class
Welcome to the Kernel-Class
  • My name: Max (Welling)
  • Book:
  • There will be class-notes/slides.
  • Homework: reading material, some exercises,

some MATLAB implementations.

  • I like: an active attitude in class.

ask questions! respond to my questions.

  • Enjoy.
primary goal
Primary Goal
  • What is the primary goal of:
  • - Machine Learning
  • - Data Mining
  • - Pattern Recognition
  • - Data Analysis
  • - Statistics
  • Automatic detection of non-coincidental structure in data.
  • Robust algorithms are insensitive to outliers and wrong
  • model assumptions.
  • Stable algorithms: generalize well to unseen data.
  • Computationally efficient algorithms are necessary to handle
  • large datasets.
supervised unsupervised learning
Supervised & Unsupervised Learning
  • supervised: classification, regression
  • unsupervised: clustering, dimensionality reduction, ranking,
  • outlier detection.
  • primal vs. dual problems: generalized linear models vs.
  • kernel classification.

this is like nearest neighbor


feature spaces
Feature Spaces

non-linear mapping to F

1. high-D space

2. infinite-D countable space :

3. function space (Hilbert space)


kernel trick
Kernel Trick

Note: In the dual representation we used the Gram matrix

to express the solution.

Kernel Trick:

Replace :


If we use algorithms that only depend on the Gram-matrix, G,

then we never have to know (compute) the actual features

This is the crucial point of kernel methods

properties of a kernel
Properties of a Kernel

Definition:A finitely positive semi-definite function

is a symmetric function of its arguments for which matrices formed

by restriction on any finite subset of points is positive semi-definite.

Theorem:A function can be written

as where is a feature map

iff k(x,y) satisfies the semi-definiteness property.

Relevance: We can now check if k(x,y) is a proper kernel using

only properties of k(x,y) itself,

i.e. without the need to know the feature map!


Kernel methods consist of two modules:

1) The choice of kernel (this is non-trivial)

2) The algorithm which takes kernels as input

Modularity: Any kernel can be used with any kernel-algorithm.

some kernel algorithms:

- support vector machine

- Fisher discriminant analysis

- kernel regression

- kernel PCA

- kernel CCA

some kernels:

goodies and baddies
Goodies and Baddies
  • Goodies:
  • Kernel algorithms are typically constrained convex optimization
  • problems  solved with either spectral methods or convex optimization tools.
  • Efficient algorithms do exist in most cases.
  • The similarity to linear methods facilitates analysis. There are strong
  • generalization bounds on test error.
  • Baddies:
  • You need to choose the appropriate kernel
  • Kernel learning is prone to over-fitting
  • All information must go through the kernel-bottleneck.
  • regularization is very important!
  • regularization parameters determined by out of sample.
  • measures (cross-validation, leave-one-out).

Demo Trevor Hastie.

learning kernels
Learning Kernels
  • All information is tunneled through the Gram-matrix information
  • bottleneck.
  • The real art is to pick an appropriate kernel.
  • e.g. take the RBF kernel:

if c is very small: G=I (all data are dissimilar): over-fitting

if c is very large: G=1 (all data are very similar): under-fitting

We need to learn the kernel. Here is some ways to combine kernels to improve them:




any positive