
Kernel methods - overview

  • Kernel smoothers

  • Local regression

  • Kernel density estimation

  • Radial basis functions

Data Mining and Statistical Learning - 2008


Introduction

Kernel methods are regression techniques used to estimate a response function from noisy data.

Properties:

  • Different models are fitted at each query point, and only those observations close to that point are used to fit the model

  • The resulting function is smooth

  • The models require little or no training; most of the work is done at evaluation time



A simple one-dimensional kernel smoother

At each target point x0, the fitted value is a weighted average of the responses yi, where the weight given to each observation depends, through a kernel function, on its distance |xi − x0| from the target point.


Kernel methods, splines and ordinary least squares regression (OLS)

  • OLS: A single model is fitted to all data

  • Splines: Different models are fitted to different subintervals (cuboids) of the input domain

  • Kernel methods: Different models are fitted at each query point



Kernel-weighted averages and moving averages

The Nadaraya-Watson kernel-weighted average

f̂(x0) = Σi Kλ(x0, xi) yi / Σi Kλ(x0, xi),   with   Kλ(x0, xi) = D(|xi − x0| / λ)

where λ indicates the window size and the function D determines how the weights decrease with distance within this window

The estimated function is smooth!

K-nearest neighbours

The estimated function is piecewise constant!
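As an illustration (not part of the original slides), a minimal Python/NumPy sketch of the Nadaraya-Watson average with an Epanechnikov kernel; the data and window size are only for demonstration:

import numpy as np

def epanechnikov(t):
    """Epanechnikov kernel: 3/4 (1 - t^2) for |t| <= 1, else 0."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average at a single query point x0 with window size lam."""
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)   # assumes at least one xi falls inside the window

# Example: smooth noisy observations of a sine curve
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0, 2 * np.pi, 100))
y = np.sin(x) + rng.normal(scale=0.3, size=x.size)
fitted = np.array([nadaraya_watson(x0, x, y, lam=0.5) for x0 in x])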



Examples of one-dimensional kernel smoothers

Epanechnikov kernel: D(t) = (3/4)(1 − t²) if |t| ≤ 1, and 0 otherwise

Tri-cube kernel: D(t) = (1 − |t|³)³ if |t| ≤ 1, and 0 otherwise
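As a side note (not from the slides), both weight functions are one-liners in NumPy and can be plugged into the Nadaraya-Watson sketch above:

import numpy as np

def epanechnikov(t):
    """Quadratic weight on |t| <= 1, zero outside the window."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def tricube(t):
    """Tri-cube weight on |t| <= 1; flatter on top and differentiable at the window boundary."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)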



Issues in kernel smoothing

  • The smoothing parameter λ has to be chosen

  • When there are ties at xi: compute an average y value and introduce weights equal to the number of tied points

  • Boundary issues

  • Varying density of observations:

    • the bias is constant

    • the variance is inversely proportional to the local density



Boundary effects of one-dimensional kernel smoothers

Locally weighted averages can be badly biased at the boundaries if the response function has a significant slope there; to correct for this, apply local linear regression



Local linear regression

Find the intercept and slope parameters α(x0) and β(x0) by solving

min over α(x0), β(x0):  Σi Kλ(x0, xi) [yi − α(x0) − β(x0) xi]²

The solution, evaluated at x0, is a linear combination of the yi:

f̂(x0) = Σi li(x0) yi
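A minimal NumPy sketch of this local linear fit at one query point (the tri-cube weight function and the function name are illustrative, not from the slides):

import numpy as np

def local_linear(x0, x, y, lam):
    """Weighted least-squares fit of a straight line around x0, evaluated at x0."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)     # tri-cube kernel weights
    B = np.column_stack([np.ones_like(x), x])    # design matrix [1, x]
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return coef[0] + coef[1] * x0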



Kernel smoothing vs local linear regression

Kernel smoothing

Solve the minimization problem

min over θ:  Σi Kλ(x0, xi) [yi − θ]²    (the solution is the kernel-weighted average)

Local linear regression

Solve the minimization problem

min over α(x0), β(x0):  Σi Kλ(x0, xi) [yi − α(x0) − β(x0) xi]²



Properties of local linear regression

  • Automatically modifies the kernel weights so that the bias is corrected to first order

  • The remaining bias depends only on terms of order higher than one in the expansion of f



Local polynomial regression

  • Fit local polynomials of degree d instead of straight lines, solving

    min over αj(x0):  Σi Kλ(x0, xi) [yi − α0(x0) − Σj=1..d αj(x0) xi^j]²

    Behavior of the estimated response function: higher-degree fits follow curvature in the true function more closely (see the comparison on the next slide and the sketch below)
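A sketch (not from the slides) generalizing the local linear fit to an arbitrary polynomial degree; the names and the tri-cube weights are illustrative:

import numpy as np

def local_poly(x0, x, y, lam, degree=2):
    """Kernel-weighted polynomial fit of the given degree around x0, evaluated at x0."""
    t = np.abs(x - x0) / lam
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)            # tri-cube weights
    B = np.vander(x, N=degree + 1, increasing=True)     # columns [1, x, x^2, ...]
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return np.polyval(coef[::-1], x0)                   # evaluate the local fit at x0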



Local polynomial vs local linear regression

Advantages:

  • Reduces the “trimming of hills and filling of valleys”

Disadvantages:

  • Higher variance (the tails are more wiggly)



Selecting the width of the kernel

Bias-Variance tradeoff:

Selecting a narrow window leads to high variance and low bias, whereas a wide window leads to high bias and low variance.



Selecting the width of the kernel

  • Automatic selection (cross-validation; see the sketch below)

  • Fixing the degrees of freedom
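A sketch of bandwidth selection by leave-one-out cross-validation (not from the slides; it assumes the nadaraya_watson function defined in the earlier example, and the grid of candidate widths is illustrative):

import numpy as np

def loocv_score(x, y, lam, smoother):
    """Leave-one-out cross-validation error of a kernel smoother for window size lam."""
    errors = []
    for i in range(len(x)):
        mask = np.arange(len(x)) != i
        pred = smoother(x[i], x[mask], y[mask], lam)   # fit without observation i
        errors.append((y[i] - pred) ** 2)
    return np.mean(errors)

# Pick the window size with the smallest LOOCV error
# (each candidate window must be wide enough to contain at least one neighbour):
# lams = np.linspace(0.3, 2.0, 18)
# best = min(lams, key=lambda lam: loocv_score(x, y, lam, nadaraya_watson))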



Local regression in R^p

The one-dimensional approach is easily extended to p dimensions by

  • Using the Euclidean norm as the measure of distance in the kernel: Kλ(x0, x) = D(‖x − x0‖ / λ)

  • Modifying the polynomial
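A sketch (not from the slides) of how the local linear fit extends to p dimensions, using the Euclidean norm inside the kernel; the tri-cube weights and names are illustrative:

import numpy as np

def local_linear_p(x0, X, y, lam):
    """Local linear fit in p dimensions; X has shape (n, p) and x0 has shape (p,)."""
    t = np.linalg.norm(X - x0, axis=1) / lam        # Euclidean distance to the query point
    w = np.where(t <= 1, (1 - t**3)**3, 0.0)        # tri-cube kernel weights
    B = np.column_stack([np.ones(len(X)), X])       # design matrix [1, x1, ..., xp]
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(B * sw[:, None], y * sw, rcond=None)
    return coef[0] + coef[1:] @ x0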



Local regression in R^p

”The curse of dimensionality”

  • The fraction of points close to the boundary of the input domain increases with its dimension

  • Observed data do not cover the whole input domain



Structured local regression models

Structured kernels: standardize each variable or, more generally, weight the coordinates with a matrix A, using the quadratic distance (x − x0)ᵀ A (x − x0) inside the kernel

Note: A is positive semidefinite
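A sketch (not from the slides) of kernel weights based on such a structured distance; the Gaussian weight function is an illustrative choice:

import numpy as np

def structured_kernel_weights(x0, X, A, lam):
    """Weights from the quadratic distance (x - x0)^T A (x - x0), with A positive semidefinite
    (e.g., a diagonal matrix that standardizes or downweights individual coordinates)."""
    d = X - x0
    dist = np.einsum('ij,jk,ik->i', d, A, d)        # quadratic form for each row of X
    return np.exp(-0.5 * dist / lam)                # Gaussian weight function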



Structured local regression models

Structured regression functions

  • ANOVA decompositions (e.g., additive models)

    Backfitting algorithms can be used

  • Varying coefficient models (partition X into predictors X1, …, Xq whose coefficients vary and a vector Z that drives the variation):

    f(X) = α(Z) + β1(Z)X1 + … + βq(Z)Xq    (formula 6.17)



Structured local regression models

Varying coefficient models (example)



Local methods

  • Assumption: the model is locally linear. Maximize the log-likelihood locally at x0:

    l(β(x0)) = Σi Kλ(x0, xi) l(yi, xiᵀβ(x0))

  • Autoregressive time series: yt = β0 + β1 yt−1 + … + βk yt−k + et can be written as yt = ztᵀβ + et with zt = (1, yt−1, …, yt−k). Fitting by local least squares with a kernel K(z0, zt) lets the model vary with the recent history of the series



Kernel density estimation

  • Straightforward estimates of the density are bumpy

  • Instead, Parzen’s smooth estimate is preferred:

    f̂(x0) = (1 / (Nλ)) Σi Kλ(x0, xi)

    Normally, Gaussian kernels are used
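A minimal sketch (not from the slides) of the Parzen estimate with a Gaussian kernel; the sample and bandwidth are only for demonstration:

import numpy as np

def parzen_density(x0, x, lam):
    """Smooth Parzen estimate of the density at x0 with a Gaussian kernel of width lam."""
    u = (x0 - x) / lam
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # Gaussian kernel values
    return k.sum() / (len(x) * lam)

# Example: density estimate on a grid for a standard normal sample
rng = np.random.default_rng(0)
x = rng.normal(size=200)
grid = np.linspace(-3, 3, 61)
dens = np.array([parzen_density(g, x, lam=0.3) for g in grid])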



Radial basis functions and kernels

Using the idea of basis expansions, we treat kernel functions as basis functions:

f(x) = Σj=1..M Kλj(ξj, x) βj

where ξj is a prototype (location) parameter and λj is a scale parameter



Radial basis functions and kernels

Choosing the parameters:

  • Estimate {λj, ξj} separately from the βj (often using the distribution of the X values alone), and then solve a least-squares problem for the βj (see the sketch below)
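A minimal sketch (not from the slides) of this two-step recipe with Gaussian radial basis functions: prototypes taken from the x-distribution alone (here, quantiles), a common scale λ, and ordinary least squares for the βj; all names are illustrative:

import numpy as np

def rbf_design(x, centers, lam):
    """Basis matrix with an intercept column and one Gaussian bump per prototype."""
    Phi = np.exp(-0.5 * ((x[:, None] - centers[None, :]) / lam) ** 2)
    return np.column_stack([np.ones(len(x)), Phi])

def fit_rbf(x, y, centers, lam):
    """Least-squares estimate of the coefficients beta for fixed centers and scale."""
    beta, *_ = np.linalg.lstsq(rbf_design(x, centers, lam), y, rcond=None)
    return beta

# Example usage with prototypes chosen from the distribution of x alone:
# centers = np.quantile(x, np.linspace(0.1, 0.9, 5))
# beta = fit_rbf(x, y, centers, lam=0.5)
# predictions = rbf_design(x_new, centers, lam) @ beta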


