Kernel methods - overview

- Kernel smoothers
- Local regression
- Kernel density estimation
- Radial basis functions

Data Mining and Statistical Learning - 2008

Kernel methods are regression techniques used to estimate a response function $f(X)$ from noisy data

Properties:

- Different models are fitted at each query point, and only those observations close to that point are used to fit the model
- The resulting function is smooth
- The models require only a minimum of training

where

- OLS: A single model is fitted to all data
- Splines: Different models are fitted to different subintervals (cuboids) of the input domain
- Kernel methods: Different models are fitted at each query point

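The Nadaraya-Watson kernel-weighted average

$$\hat f(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$

where $\lambda$ indicates the window size and the function $D$ shows how the weights change with distance within this window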

The estimated function is smooth!
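The kernel-weighted average can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the choice of the Epanechnikov kernel as the default `D`, and the simulated data are all assumptions made for the example.

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam, D=epanechnikov):
    """Kernel-weighted average of y evaluated at the query points x0."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # weights[q, i] = K_lambda(x0[q], x[i]) = D(|x[i] - x0[q]| / lam)
    weights = D(np.abs(x[None, :] - x0[:, None]) / lam)
    denom = weights.sum(axis=1)
    # The estimate is undefined (NaN) where the window contains no data.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(denom > 0, (weights @ y) / denom, np.nan)

# Simulated noisy data (illustrative only).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(4.0 * x) + rng.normal(0.0, 1.0 / 3.0, 100)
grid = np.linspace(0.0, 1.0, 50)
fit = nadaraya_watson(grid, x, y, lam=0.2)
```

Because the weights fall continuously to zero at the edge of the window, the fitted curve changes continuously as the query point moves.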

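K-nearest neighbours

$$\hat f(x) = \mathrm{Ave}\,(y_i \mid x_i \in N_k(x))$$

where $N_k(x)$ is the set of the $k$ observations nearest to $x$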

The estimated function is piecewise constant!
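For comparison, a k-nearest-neighbour average can be sketched as follows. The function name and the toy data are assumptions made for illustration; ties in distance are broken arbitrarily by `np.argpartition`.

```python
import numpy as np

def knn_average(x0, x, y, k):
    """Average of y over the k observations nearest to each query point."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dist = np.abs(x[None, :] - x0[:, None])
    # Column indices of the k smallest distances in each row.
    nearest = np.argpartition(dist, k - 1, axis=1)[:, :k]
    return y[nearest].mean(axis=1)

# Toy data: the fit takes only a handful of distinct values on a fine grid.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 10.0, 20.0, 30.0])
grid = np.linspace(0.0, 3.0, 301)
fit = knn_average(grid, x, y, k=2)
distinct_values = np.unique(fit)
```

The neighbourhood changes only when a query point crosses the midpoint between two observations, which is why the fit jumps between a few constant levels.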

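Epanechnikov kernel

$$D(t) = \begin{cases} \tfrac{3}{4}\,(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Tri-cube kernel

$$D(t) = \begin{cases} (1 - |t|^3)^3 & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$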
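Both kernels have compact support on $|t| \le 1$. The sketch below, with assumed function names, implements their standard definitions and numerically integrates each one. The tri-cube kernel in this form does not integrate to one, which is harmless for kernel-weighted averages because the normalising constant cancels.

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, and 0 otherwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= 1.0, (1.0 - t**3) ** 3, 0.0)

# Riemann-sum approximation of the area under each kernel.
t = np.linspace(-1.0, 1.0, 200001)
dt = t[1] - t[0]
area_epanechnikov = epanechnikov(t).sum() * dt   # close to 1
area_tricube = tricube(t).sum() * dt             # close to 81/70
```

The tri-cube kernel is flatter at the centre and has a continuous derivative at the edge of its support, whereas the Epanechnikov kernel has a kink there.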

- The smoothing parameter λ has to be chosen
- Ties at $x_i$: compute the average y value at the tied points and give the averaged point a weight equal to the number of observations it replaces
- Boundary issues
- Varying density of observations (with a fixed window width λ):
  - the bias is roughly constant
  - the variance is inversely proportional to the local density

Locally-weighted averages can be badly biased on the boundaries if the response function has a significant slope. In that case, apply local linear regression.

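Find the intercept and slope parameters solving

$$\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,\bigl[y_i - \alpha(x_0) - \beta(x_0)\,x_i\bigr]^2$$

The solution is a linear combination of $y_i$:

$$\hat f(x_0) = \hat\alpha(x_0) + \hat\beta(x_0)\,x_0 = \sum_{i=1}^{N} l_i(x_0)\, y_i$$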
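The weights of that linear combination can be computed explicitly from the weighted least-squares solution. The sketch below assumes a tri-cube kernel and hypothetical function names. It returns the vector $l(x_0)$ with $\hat f(x_0) = l(x_0)^T y$, and needs at least two distinct observations with positive weight in the window.

```python
import numpy as np

def tricube(t):
    """D(t) = (1 - |t|^3)^3 for |t| <= 1, and 0 otherwise."""
    t = np.abs(np.asarray(t, dtype=float))
    return np.where(t <= 1.0, (1.0 - t**3) ** 3, 0.0)

def local_linear_weights(x0, x, lam, D=tricube):
    """Weights l_i(x0) of the local linear fit at the query point x0."""
    x = np.asarray(x, dtype=float)
    w = D(np.abs(x - x0) / lam)                 # kernel weights K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])   # N x 2 design matrix
    BtW = B.T * w                               # B^T W with W = diag(w)
    b0 = np.array([1.0, x0])
    # l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W
    return b0 @ np.linalg.solve(BtW @ B, BtW)

# At the left boundary a straight line is reproduced without bias.
x = np.linspace(0.0, 1.0, 21)
y = 2.0 + 3.0 * x
l = local_linear_weights(0.0, x, lam=0.3)
fit_at_boundary = l @ y
```

The weights sum to one and satisfy $\sum_i l_i(x_0)(x_i - x_0) = 0$, which is what removes the first-order bias at the boundary.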

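Kernel smoothing

Solve the minimization problem

$$\min_{\theta} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,(y_i - \theta)^2, \qquad \hat f(x_0) = \hat\theta$$

Local linear regression

Solve the minimization problem

$$\min_{\alpha,\,\beta} \sum_{i=1}^{N} K_\lambda(x_0, x_i)\,(y_i - \alpha - \beta x_i)^2, \qquad \hat f(x_0) = \hat\alpha + \hat\beta\, x_0$$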

- Automatically modifies the kernel weights to correct for bias
- Bias depends only on the terms of order higher than one in the expansion of f.

- Fitting polynomials instead of straight lines
Behavior of estimated response function:

Advantages:

- Reduces the "trimming of hills and filling of valleys"

Disadvantages:

- Higher variance (tails are more wiggly)
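The bias reduction can be seen on noise-free data with curvature. The sketch below fits a local polynomial of a chosen degree with tri-cube weights; the function name and the toy response are assumptions made for illustration.

```python
import numpy as np

def local_poly(x0, x, y, lam, degree):
    """Local polynomial fit of the given degree, evaluated at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1.0, (1.0 - t**3) ** 3, 0.0)   # tri-cube weights
    # Columns 1, (x - x0), (x - x0)^2, ...; centring at x0 makes the
    # fitted intercept equal to the estimate at x0.
    B = np.vander(x - x0, degree + 1, increasing=True)
    BtW = B.T * w
    coef = np.linalg.solve(BtW @ B, BtW @ y)
    return coef[0]

# A "hill" with its peak (height 0) at x = 0.5.
x = np.linspace(0.0, 1.0, 41)
y = -(x - 0.5) ** 2
linear_at_peak = local_poly(0.5, x, y, lam=0.3, degree=1)     # below 0: hill trimmed
quadratic_at_peak = local_poly(0.5, x, y, lam=0.3, degree=2)  # recovers the peak
```

The price for the extra flexibility is the higher variance noted above, since more coefficients are estimated from the same local data.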

Bias-Variance tradeoff:

Selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.

- Automatic selection (cross-validation)
- Fixing the degrees of freedom
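Both strategies can be sketched for a kernel-weighted average. The example assumes a Gaussian kernel (so that every window contains data), hypothetical function names, and simulated data. The effective degrees of freedom are taken as the trace of the smoother matrix.

```python
import numpy as np

def gaussian_kernel_matrix(x, lam):
    """K[i, j] = exp(-(x_i - x_j)^2 / (2 lam^2))."""
    return np.exp(-0.5 * ((x[:, None] - x[None, :]) / lam) ** 2)

def smoother_matrix(x, lam):
    """Matrix S with fitted values S @ y for the kernel-weighted average."""
    K = gaussian_kernel_matrix(x, lam)
    return K / K.sum(axis=1, keepdims=True)

def effective_df(x, lam):
    """Effective degrees of freedom, trace(S)."""
    return np.trace(smoother_matrix(x, lam))

def loocv_error(x, y, lam):
    """Leave-one-out mean squared prediction error."""
    K = gaussian_kernel_matrix(x, lam)
    np.fill_diagonal(K, 0.0)          # exclude each point from its own fit
    pred = (K @ y) / K.sum(axis=1)
    return np.mean((y - pred) ** 2)

# Simulated data and a small grid of candidate window sizes (illustrative).
rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(4.0 * x) + rng.normal(0.0, 1.0 / 3.0, 100)
lams = np.array([0.02, 0.05, 0.1, 0.2, 0.5])
errors = np.array([loocv_error(x, y, lam) for lam in lams])
best_lam = lams[np.argmin(errors)]
dfs = np.array([effective_df(x, lam) for lam in lams])
```

Fixing the degrees of freedom amounts to picking the window size whose `effective_df` matches a target value; the trace shrinks as the window widens.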

The one-dimensional approach is easily extended to p dimensions by

- Using the Euclidean norm as a measure of distance in the kernel.
- Modifying the polynomial so that it contains terms in all p input variables.

"The curse of dimensionality"

- The fraction of points close to the boundary of the input domain increases with its dimension
- Observed data do not cover the whole input domain
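The first point can be made concrete under the assumption that inputs are uniformly distributed on the unit cube $[0,1]^p$: a point lies within $\varepsilon$ of the boundary unless every coordinate falls in $[\varepsilon, 1-\varepsilon]$, so the boundary fraction is $1 - (1 - 2\varepsilon)^p$. The sketch below, with assumed names, evaluates this and checks it by simulation.

```python
import numpy as np

def boundary_fraction(p, eps):
    """Fraction of the unit cube [0, 1]^p within eps of its boundary."""
    return 1.0 - (1.0 - 2.0 * eps) ** p

def simulated_boundary_fraction(p, eps, n=20000, seed=0):
    """Monte Carlo estimate of the same fraction for uniform points."""
    rng = np.random.default_rng(seed)
    pts = rng.uniform(0.0, 1.0, size=(n, p))
    near = np.any((pts < eps) | (pts > 1.0 - eps), axis=1)
    return near.mean()

dims = [1, 2, 5, 10, 20]
exact = [boundary_fraction(p, 0.05) for p in dims]
simulated = [simulated_boundary_fraction(p, 0.05) for p in dims]
```

With $\varepsilon = 0.05$ the fraction is 10% in one dimension and grows to roughly 65% in ten, so boundary bias affects most query points in high dimensions.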

Structured kernels (standardize each variable)

$$K_{\lambda,A}(x_0, x) = D\!\left(\frac{(x - x_0)^T A\,(x - x_0)}{\lambda}\right)$$

Note: A is positive semidefinite
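A structured kernel can be sketched by passing the quadratic form $(x - x_0)^T A (x - x_0)/\lambda$ to the weight function $D$. That form, the function names, and the Epanechnikov default are assumptions made for this example. Choosing $A$ as a diagonal matrix of inverse variances corresponds to standardizing each variable.

```python
import numpy as np

def epanechnikov(t):
    """D(t) = 3/4 (1 - t^2) for |t| <= 1, and 0 otherwise."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) <= 1.0, 0.75 * (1.0 - t**2), 0.0)

def structured_kernel(x0, X, A, lam, D=epanechnikov):
    """Weights D((x_i - x0)^T A (x_i - x0) / lam) for every row x_i of X."""
    diff = np.asarray(X, dtype=float) - np.asarray(x0, dtype=float)
    quad = np.einsum("ni,ij,nj->n", diff, A, diff)   # one quadratic form per row
    return D(quad / lam)

# Weights based on inverse variances do not depend on the units of a variable.
rng = np.random.default_rng(2)
X = rng.normal(size=(50, 2))
x0 = X[0]
A = np.diag(1.0 / X.var(axis=0))
w = structured_kernel(x0, X, A, lam=1.0)

scale = np.array([1.0, 100.0])            # change the units of the second variable
X_rescaled = X * scale
A_rescaled = np.diag(1.0 / X_rescaled.var(axis=0))
w_rescaled = structured_kernel(x0 * scale, X_rescaled, A_rescaled, lam=1.0)
```

Setting a diagonal entry of $A$ to zero removes the corresponding variable from the distance altogether, which is allowed because $A$ only needs to be positive semidefinite.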

Structured regression functions

- ANOVA decompositions (e.g., additive models); backfitting algorithms can be used
- Varying coefficient models (partition X)
- INSERT FORMULA 6.17

Varying coefficient models (example)

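- Assumption: the model is locally linear, so maximize the log-likelihood locally at $x_0$:

$$l(\beta(x_0)) = \sum_{i=1}^{N} K_\lambda(x_0, x_i)\, l\bigl(y_i,\, x_i^T \beta(x_0)\bigr)$$

- Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + e_t$, that is, $y_t = z_t^T \beta + e_t$ with $z_t = (1, y_{t-1}, \dots, y_{t-k})^T$. Fit by local least squares with kernel $K(z_0, z_t)$.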

- Straightforward estimates of the density are bumpy
- Instead, Parzen's smooth estimate is preferred:

$$\hat f_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$

Normally, Gaussian kernels are used
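With a Gaussian kernel, the Parzen estimate is the average of $N$ normal densities with standard deviation $\lambda$, one centred at each observation. The sketch below uses an assumed function name and simulated data.

```python
import numpy as np

def parzen_density(x0, x, lam):
    """Gaussian kernel density estimate at the query points x0."""
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    x = np.asarray(x, dtype=float)
    u = (x0[:, None] - x[None, :]) / lam
    # Average of N(x_i, lam^2) densities evaluated at each query point.
    return np.exp(-0.5 * u**2).sum(axis=1) / (len(x) * lam * np.sqrt(2.0 * np.pi))

# Simulated sample (illustrative only).
rng = np.random.default_rng(3)
sample = rng.normal(0.0, 1.0, 200)
grid = np.linspace(-10.0, 10.0, 2001)
density = parzen_density(grid, sample, lam=0.3)
total_mass = density.sum() * (grid[1] - grid[0])   # Riemann sum, close to 1
```

The window size $\lambda$ plays the same role as in regression: a small value gives a spiky estimate, a large value oversmooths.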

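Using the idea of basis expansion, we treat kernel functions as basis functions:

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\,\beta_j = \sum_{j=1}^{M} D\!\left(\frac{\lVert x - \xi_j \rVert}{\lambda_j}\right)\beta_j$$

where $\xi_j$ is a prototype parameter and $\lambda_j$ is a scale parameter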

Choosing the parameters:

- Estimate $\{\lambda_j, \xi_j\}$ separately from $\beta_j$ (often by using the distribution of $X$ alone), then solve a least-squares problem for $\beta_j$.
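This two-step approach can be sketched as follows. Several choices here are assumptions made for illustration: Gaussian basis functions, prototypes placed at quantiles of the inputs, one common scale for all basis functions, and the function names.

```python
import numpy as np

def rbf_design(x, centers, lam):
    """Design matrix with entries exp(-(x_i - xi_j)^2 / (2 lam^2))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-0.5 * ((x[:, None] - centers[None, :]) / lam) ** 2)

def fit_rbf(x, y, n_centers, lam):
    """Place prototypes using x alone, then solve least squares for beta."""
    # Step 1: prototypes at evenly spaced quantiles of the inputs.
    centers = np.quantile(x, np.linspace(0.0, 1.0, n_centers))
    # Step 2: with prototypes and scale fixed, the model is linear in beta.
    Phi = rbf_design(x, centers, lam)
    beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return centers, beta

# Simulated data (illustrative only).
rng = np.random.default_rng(4)
x = rng.uniform(0.0, 1.0, 200)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, 200)
centers, beta = fit_rbf(x, y, n_centers=10, lam=0.15)
fitted = rbf_design(x, centers, 0.15) @ beta
train_mse = np.mean((y - fitted) ** 2)
```

Fixing the prototypes and scales first keeps the fitting step a linear least-squares problem, at the cost of not adapting the basis to the response.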
