
# Kernel methods - overview

Kernel methods - overview

• Kernel smoothers

• Local regression

• Kernel density estimation

• Radial basis functions

Data Mining and Statistical Learning - 2008

Kernel methods are regression techniques used to estimate a response function from noisy data.

Properties:

• Different models are fitted at each query point, and only those observations close to that point are used to fit the model

• The resulting function is smooth

• The models require only a minimum of training


The underlying model is

$$y_i = f(x_i) + \varepsilon_i$$

where $f$ is the unknown response function and the $\varepsilon_i$ are zero-mean noise terms.


Kernel methods, splines and ordinary least squares regression (OLS)

• OLS: A single model is fitted to all data

• Splines: Different models are fitted to different subintervals (cuboids) of the input domain

• Kernel methods: Different models are fitted at each query point


Kernel-weighted averages and moving averages

The Nadaraya-Watson kernel-weighted average

$$\hat{f}(x_0) = \frac{\sum_{i=1}^{N} K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^{N} K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\!\left(\frac{|x - x_0|}{\lambda}\right)$$

where $\lambda$ indicates the window size and the function $D$ shows how the weights change with distance within this window.

The estimated function is smooth!

K-nearest neighbours

$$\hat{f}(x_0) = \operatorname{Ave}\left( y_i \mid x_i \in N_k(x_0) \right)$$

The estimated function is piecewise constant!
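Both estimators can be sketched in a few lines of NumPy. This is an illustrative sketch, not code from the slides; the data, bandwidth, and function names are assumptions:

```python
import numpy as np

def epanechnikov(t):
    # D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    # Kernel-weighted average: varies smoothly with the query point x0
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)

def knn_average(x0, x, y, k):
    # Average of the k nearest neighbours: piecewise constant in x0
    idx = np.argsort(np.abs(x - x0))[:k]
    return y[idx].mean()

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 1, 100))
y = np.sin(4 * x) + rng.normal(0, 0.3, 100)

grid = np.linspace(0.1, 0.9, 5)
smooth = [nadaraya_watson(x0, x, y, lam=0.2) for x0 in grid]
pc = [knn_average(x0, x, y, k=15) for x0 in grid]
```

The kernel weights fade out continuously with distance, which is exactly why the Nadaraya-Watson curve is smooth while the kNN curve jumps whenever the neighbour set changes.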


Epanechnikov kernel

$$D(t) = \begin{cases} \tfrac{3}{4}\,(1 - t^2) & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Tri-cube kernel

$$D(t) = \begin{cases} \left(1 - |t|^3\right)^3 & \text{if } |t| \le 1 \\ 0 & \text{otherwise} \end{cases}$$

Examples of one-dimensional kernel smoothers
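As a minimal sketch, the two kernel profiles translate directly into NumPy (both have compact support $[-1, 1]$):

```python
import numpy as np

def epanechnikov(t):
    # Parabolic profile, peak value 3/4 at t = 0
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    # Flatter near 0, smoother at the support boundary, peak value 1
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

t = np.linspace(-1.5, 1.5, 7)
ep, tc = epanechnikov(t), tricube(t)
```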


Issues in kernel smoothing

• The smoothing parameter λ has to be defined

• When there are ties at $x_i$: compute an average $y$ value and introduce weights representing the number of points

• Boundary issues

• Varying density of observations:

  - the bias is constant

  - the variance is inversely proportional to the density


Boundary effects of one-dimensional kernel smoothers

Locally-weighted averages can be badly biased at the boundaries if the response function has a significant slope → apply local linear regression


Local linear regression

Find the intercept $\alpha(x_0)$ and slope $\beta(x_0)$ by solving

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \right]^2$$

The solution is a linear combination of the $y_i$:

$$\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0 = \sum_{i=1}^{N} l_i(x_0)\, y_i$$
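A minimal NumPy sketch of this weighted least-squares fit at a single query point (the tri-cube weight function and the simulated data are illustrative assumptions):

```python
import numpy as np

def tricube(t):
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def local_linear(x0, x, y, lam):
    # Solve the kernel-weighted least-squares problem for (alpha, beta) at x0
    w = tricube(np.abs(x - x0) / lam)
    B = np.column_stack([np.ones_like(x), x])   # design matrix (1, x_i)
    W = np.diag(w)
    theta = np.linalg.solve(B.T @ W @ B, B.T @ W @ y)
    return theta[0] + theta[1] * x0             # alpha + beta * x0

rng = np.random.default_rng(1)
x = np.sort(rng.uniform(0, 1, 200))
y = 2 + 3 * x + rng.normal(0, 0.1, 200)

# For a linear truth the fit is recovered even at the boundaries,
# where a plain kernel-weighted average would be badly biased
fit0 = local_linear(0.0, x, y, lam=0.3)
fit1 = local_linear(1.0, x, y, lam=0.3)
```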


Kernel smoothing vs local linear regression

Kernel smoothing

Solve the (locally constant) minimization problem

$$\min_{\theta} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \theta \right]^2$$

Local linear regression

Solve the minimization problem

$$\min_{\alpha(x_0),\, \beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0)\, x_i \right]^2$$


Properties of local linear regression

• Automatically modifies the kernel weights to correct for bias

• Bias depends only on the terms of order higher than one in the expansion of f.


Local polynomial regression

• Fitting polynomials of degree $d$ instead of straight lines:

$$\min_{\alpha(x_0),\, \beta_j(x_0)} \sum_{i=1}^{N} K_\lambda(x_0, x_i) \Big[ y_i - \alpha(x_0) - \sum_{j=1}^{d} \beta_j(x_0)\, x_i^j \Big]^2$$

(Figure: behavior of the estimated response function.)
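A quick sketch of a local polynomial fit using `np.polyfit`, which accepts per-point weights (the weights multiply the residuals, so the square roots of the kernel weights are passed; data and bandwidth are illustrative):

```python
import numpy as np

def tricube(t):
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def local_poly(x0, x, y, lam, degree=2):
    # np.polyfit minimizes sum((w_i * (y_i - p(x_i)))^2),
    # so pass sqrt of the kernel weights to get kernel-weighted least squares
    w = np.sqrt(tricube(np.abs(x - x0) / lam))
    coefs = np.polyfit(x, y, degree, w=w)
    return np.polyval(coefs, x0)

rng = np.random.default_rng(2)
x = np.sort(rng.uniform(0, 1, 200))
y = np.sin(4 * x) + rng.normal(0, 0.1, 200)

# Near the peak of sin(4x) (around x = 0.39) a local quadratic
# trims the hill far less than a locally constant smoother would
hill = local_poly(0.39, x, y, lam=0.3, degree=2)
```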


Local polynomial vs local linear regression

Advantages:

• Reduces the "trimming of hills and filling of valleys"

Disadvantages:

• Higher variance (tails are more wiggly)


Selecting the width of the kernel

Bias-Variance tradeoff:

Selecting a narrow window leads to high variance and low bias, whilst selecting a wide window leads to high bias and low variance.


Selecting the width of the kernel

• Automatic selection (e.g., cross-validation)

• Fixing the degrees of freedom
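As a sketch of the cross-validation route, leave-one-out CV for the Nadaraya-Watson smoother can be written directly (the candidate bandwidths and test data are illustrative assumptions):

```python
import numpy as np

def epanechnikov(t):
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def loocv_score(lam, x, y):
    # Leave-one-out CV error of the Nadaraya-Watson smoother for bandwidth lam
    err = 0.0
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        w = epanechnikov(np.abs(x[mask] - x[i]) / lam)
        if w.sum() == 0:          # empty window: skip this fold
            continue
        err += (y[i] - np.sum(w * y[mask]) / w.sum()) ** 2
    return err / len(x)

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0, 1, 150))
y = np.sin(4 * x) + rng.normal(0, 0.3, 150)

lams = [0.05, 0.1, 0.2, 0.4, 0.8]
scores = [loocv_score(l, x, y) for l in lams]
best = lams[int(np.argmin(scores))]
```

Small bandwidths chase the noise (high variance), large ones flatten the sine (high bias); the CV curve trades the two off.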


Local regression in $\mathbb{R}^p$

The one-dimensional approach is easily extended to p dimensions by

• Using the Euclidean norm as a measure of distance in the kernel.

• Modifying the polynomial


Local regression in $\mathbb{R}^p$

”The curse of dimensionality”

• The fraction of points close to the boundary of the input domain increases with its dimension

• Observed data do not cover the whole input domain
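The first point has a simple closed form: in the unit hypercube $[0,1]^p$, the fraction of volume within distance $0.05$ of the boundary is $1 - 0.9^p$. A two-line check:

```python
# Fraction of the unit hypercube [0,1]^p lying within 0.05 of its boundary.
# In one dimension this is 10%, but it approaches 100% as p grows,
# which is why local neighbourhoods become boundary-dominated in high dimensions.
fractions = {p: 1 - 0.9 ** p for p in [1, 2, 10, 50]}
```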


Structured local regression models

Structured kernels (e.g., standardize each variable): replace the Euclidean distance by a weighted distance,

$$K_{\lambda, A}(x_0, x) = D\!\left( \frac{\sqrt{(x - x_0)^T A\, (x - x_0)}}{\lambda} \right)$$

Note: $A$ is positive semidefinite. Coordinates can be downweighted or removed entirely by shrinking the corresponding entries of $A$.
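A small sketch of such a structured kernel with an Epanechnikov profile, where a diagonal $A$ downweights the second coordinate (the matrix and query points are illustrative):

```python
import numpy as np

def structured_kernel(x0, x, A, lam):
    # D applied to the weighted distance ((x - x0)^T A (x - x0))^(1/2) / lam
    d = x - x0
    t = np.sqrt(d @ A @ d) / lam
    return max(0.0, 0.75 * (1 - t**2))   # Epanechnikov profile

# A = diag(1, 0.01): distance along the second coordinate barely counts
A = np.diag([1.0, 0.01])
near_in_x1 = structured_kernel(np.zeros(2), np.array([0.1, 2.0]), A, lam=0.5)
far_in_x1 = structured_kernel(np.zeros(2), np.array([0.6, 0.0]), A, lam=0.5)
```

A point far away in the downweighted coordinate still gets positive weight, while a point moderately far in the retained coordinate falls outside the window.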


Structured local regression models

Structured regression functions

• ANOVA decompositions (e.g., additive models)

Backfitting algorithms can be used

• Varying coefficient models: partition $X$ into a set $(X_1, \ldots, X_q)$ whose coefficients vary and a set $Z$ of conditioning variables,

$$f(X) = \alpha(Z) + \beta_1(Z)\, X_1 + \cdots + \beta_q(Z)\, X_q$$


Structured local regression models

Varying coefficient models (example)


Local methods

• Assumption: the model is locally linear → maximize the log-likelihood locally at $x_0$

• Autoregressive time series: $y_t = \beta_0 + \beta_1 y_{t-1} + \cdots + \beta_k y_{t-k} + e_t$, i.e. $y_t = z_t^T \beta + e_t$ with $z_t = (1, y_{t-1}, \ldots, y_{t-k})$. Fit by local least squares with kernel $K(z_0, z_t)$
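The autoregressive case can be sketched as a locally weighted least-squares fit in lag space (the AR(1) simulation, tri-cube kernel, and ridge jitter are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
# Simulate an AR(1) series y_t = 0.7 y_{t-1} + e_t
n = 500
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.7 * y[t - 1] + rng.normal(0, 0.5)

# Lag vectors z_t = (1, y_{t-1}) and responses y_t
Z = np.column_stack([np.ones(n - 1), y[:-1]])
target = y[1:]

def local_ar_fit(z0, Z, target, lam):
    # Local least squares: weight each (z_t, y_t) pair by a kernel in z-space
    t = np.linalg.norm(Z - z0, axis=1) / lam
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)          # tri-cube kernel
    W = np.diag(w)
    # Small ridge term guards against a singular weighted Gram matrix
    return np.linalg.solve(Z.T @ W @ Z + 1e-8 * np.eye(2), Z.T @ W @ target)

beta = local_ar_fit(np.array([1.0, 0.0]), Z, target, lam=2.0)
```

With a wide window this reduces to ordinary least squares; shrinking `lam` lets the AR coefficients vary with the recent level of the series.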


Kernel density estimation

• Straightforward (histogram-like) estimates of the density are bumpy

• Instead, Parzen's smooth estimate is preferred:

$$\hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^{N} K_\lambda(x_0, x_i)$$

Normally, Gaussian kernels are used.
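A minimal sketch of the Parzen estimate with a Gaussian kernel (sample size and bandwidth are illustrative choices):

```python
import numpy as np

def gaussian_kde(x0, x, lam):
    # Parzen estimate: average of Gaussian bumps of width lam centred at each x_i
    return np.mean(np.exp(-0.5 * ((x0 - x) / lam) ** 2)
                   / (lam * np.sqrt(2 * np.pi)))

rng = np.random.default_rng(5)
x = rng.normal(0, 1, 2000)

# For standard normal data the estimate at 0 should be
# close to the true density value 1/sqrt(2*pi) ~ 0.3989
dens0 = gaussian_kde(0.0, x, lam=0.2)
```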


Radial basis functions and kernels

Using the idea of basis expansion, we treat kernel functions as basis functions:

$$f(x) = \sum_{j=1}^{M} K_{\lambda_j}(\xi_j, x)\, \beta_j = \sum_{j=1}^{M} D\!\left(\frac{\|x - \xi_j\|}{\lambda_j}\right) \beta_j$$

where $\xi_j$ is a prototype (location) parameter and $\lambda_j$ a scale parameter.


Radial basis functions and kernels

Choosing the parameters:

• Estimate $\{\lambda_j, \xi_j\}$ separately from the $\beta_j$ (often by using the distribution of $X$ alone), then solve a least-squares problem for the $\beta_j$.
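A sketch of this two-step recipe with Gaussian bases, where the prototypes are taken as quantiles of $X$ and the scales are fixed (both are illustrative choices, not the slides' prescription):

```python
import numpy as np

rng = np.random.default_rng(6)
x = np.sort(rng.uniform(0, 1, 300))
y = np.sin(4 * x) + rng.normal(0, 0.1, 300)

# Step 1: choose prototypes xi_j and scales lambda_j from X alone
xi = np.quantile(x, np.linspace(0.05, 0.95, 10))   # spread over the data
lam = np.full(10, 0.2)

# Step 2: build the basis matrix and solve least squares for beta
Phi = np.exp(-0.5 * ((x[:, None] - xi[None, :]) / lam[None, :]) ** 2)
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
pred = Phi @ beta
rmse = np.sqrt(np.mean((pred - y) ** 2))
```

Once $\{\xi_j, \lambda_j\}$ are frozen, the fit is linear in $\beta$, which is what makes the second step a plain least-squares problem.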
