
Kernel Methods


Presentation Transcript


  1. Kernel Methods. Jong Cheol Jeong

  2. Outline • 6.1 One-Dimensional Kernel Smoothers • 6.1.1 Local Linear Regression • 6.1.2 Local Polynomial Regression • 6.2 Selecting the Width of the Kernel • 6.3 Local Regression in R^p • 6.4 Structured Local Regression Models in R^p • 6.5 Local Likelihood and Other Models • 6.6 Kernel Density Estimation and Classification • 6.7 Radial Basis Functions and Kernels • 6.8 Mixture Models for Density Estimation and Classification

  3. Kernel Function: a kernel is a weighting function used when forming an estimate at a target point; it assigns weights to the nearby data points, with points closer to the target receiving larger weights.

  4. One-Dimensional Kernel Smoothers. K-nearest-neighbor average (Eq. 6.1): f̂(x) = Ave(y_i | x_i ∈ N_k(x)), where N_k(x) is the set of the k points closest to x. Because points enter and leave the neighborhood abruptly, this running mean is discontinuous in x. A sketch follows.
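
A minimal NumPy sketch of the k-nearest-neighbor running mean of Eq. 6.1; the function name, the toy data, and the choice k = 15 are illustrative, not from the slides.

    import numpy as np

    def knn_average(x_train, y_train, x0, k=10):
        """Eq. 6.1: average the responses of the k training points nearest x0."""
        idx = np.argsort(np.abs(x_train - x0))[:k]   # indices of the k closest x_i
        return y_train[idx].mean()

    # toy one-dimensional data (illustrative)
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 100))
    y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
    fit = np.array([knn_average(x, y, x0, k=15) for x0 in x])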

  5. One-Dimensional Kernel Smoothers. Nadaraya-Watson kernel-weighted average (Eq. 6.2): f̂(x0) = Σ_i K_λ(x0, x_i) y_i / Σ_i K_λ(x0, x_i), with the Epanechnikov quadratic kernel (Eqs. 6.3-6.4): K_λ(x0, x) = D(|x - x0| / λ), where D(t) = (3/4)(1 - t²) if |t| ≤ 1 and 0 otherwise. The weights fall off smoothly with distance, so the fit is continuous, unlike the nearest-neighbor average.
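
A sketch of the Nadaraya-Watson estimate with the Epanechnikov kernel, following Eqs. 6.2-6.4; the function names, the bandwidth λ = 0.2, and the toy data are illustrative assumptions.

    import numpy as np

    def epanechnikov(t):
        """Eq. 6.4: D(t) = 3/4 (1 - t^2) for |t| <= 1, else 0."""
        return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

    def nadaraya_watson(x_train, y_train, x0, lam=0.2):
        """Eq. 6.2: kernel-weighted average of y_i around x0, with K_lambda from Eq. 6.3.
        Assumes at least one training point falls inside the window."""
        w = epanechnikov((x_train - x0) / lam)   # K_lambda(x0, x_i)
        return np.sum(w * y_train) / np.sum(w)

    # toy one-dimensional data (illustrative)
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 100))
    y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
    fit = np.array([nadaraya_watson(x, y, x0, lam=0.2) for x0 in x])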

  6. One-Dimensional Kernel Smoothers. Adaptive neighborhoods with kernels (Eq. 6.5): K_λ(x0, x) = D(|x - x0| / h_λ(x0)), where the window width h_λ(x0) adapts to the data; for the k-nearest-neighbor version, h_λ(x0) = |x0 - x_[k]|, and x_[k] is the k-th closest x_i to x0. Tri-cube kernel (Eq. 6.6): D(t) = (1 - |t|³)³ if |t| ≤ 1 and 0 otherwise.
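
A sketch of an adaptive-width smoother in the spirit of Eqs. 6.5-6.6: the bandwidth at x0 is the distance to the k-th nearest training point, and the tri-cube function supplies the weights. The names and the default k are illustrative.

    import numpy as np

    def tricube(t):
        """Eq. 6.6: D(t) = (1 - |t|^3)^3 for |t| <= 1, else 0."""
        return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

    def adaptive_kernel_average(x_train, y_train, x0, k=20):
        """Eq. 6.5 with h(x0) = |x0 - x_[k]|, the distance to the k-th closest x_i."""
        d = np.abs(x_train - x0)
        h = np.sort(d)[k - 1]                # adaptive window half-width
        w = tricube(d / h)                   # K_lambda(x0, x_i)
        return np.sum(w * y_train) / np.sum(w)

With the toy data from the earlier sketches, np.array([adaptive_kernel_average(x, y, x0) for x0 in x]) traces out the fitted curve.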

  7. One-Dimensional Kernel Smoothers. Nearest-neighbor kernel vs. Epanechnikov kernel: the nearest-neighbor average produces a bumpy, discontinuous fit, while the smoothly decaying Epanechnikov weights give a continuous one.

  8. Local Linear Regression

  9. Local Linear Regression. Locally weighted linear regression solves a separate weighted least-squares problem at each target point x0 (Eq. 6.7): minimize over α(x0), β(x0) the sum Σ_i K_λ(x0, x_i) [y_i - α(x0) - β(x0) x_i]². The estimate is f̂(x0) = α̂(x0) + β̂(x0) x0, which is linear in the responses, f̂(x0) = Σ_i l_i(x0) y_i (Eq. 6.8); the weights l_i(x0) form the equivalent kernel, and expanding E f̂(x0) in these weights (Eq. 6.9) shows that local linear regression removes the first-order term of the bias, correcting the boundary bias of the plain kernel average. A sketch follows.
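
A sketch of locally weighted linear regression per Eqs. 6.7-6.8, assuming the Epanechnikov kernel and an illustrative bandwidth; the names are not from the slides, and the window is assumed to contain enough points for the weighted least-squares system to be solvable.

    import numpy as np

    def epanechnikov(t):
        return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

    def local_linear(x_train, y_train, x0, lam=0.2):
        """Eq. 6.7: minimize sum_i K_lambda(x0, x_i) [y_i - alpha - beta x_i]^2."""
        w = epanechnikov((x_train - x0) / lam)                  # K_lambda(x0, x_i)
        B = np.column_stack([np.ones_like(x_train), x_train])   # regression matrix, rows (1, x_i)
        W = np.diag(w)                                          # W(x0)
        # Eq. 6.8: f_hat(x0) = b(x0)^T (B^T W B)^{-1} B^T W y = sum_i l_i(x0) y_i
        coef = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)
        return coef[0] + coef[1] * x0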

  10. Local Polynomial Regression. Local quadratic regression (Eq. 6.11): minimize over α(x0), β_1(x0), β_2(x0) the sum Σ_i K_λ(x0, x_i) [y_i - α(x0) - β_1(x0) x_i - β_2(x0) x_i²]². Local linear fits tend to "trim the hills and fill the valleys", flattening peaks and troughs in the interior of the domain; the local quadratic term corrects this curvature bias.
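
The same weighted least-squares machinery handles any polynomial degree; a sketch follows, where degree 2 reproduces the local quadratic fit above. The tri-cube weights, bandwidth, and names are illustrative assumptions.

    import numpy as np

    def tricube(t):
        return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

    def local_poly(x_train, y_train, x0, lam=0.2, degree=2):
        """Locally weighted polynomial fit of the given degree, evaluated at x0."""
        w = tricube((x_train - x0) / lam)
        B = np.vander(x_train, N=degree + 1, increasing=True)   # columns [1, x, x^2, ...]
        W = np.diag(w)
        coef = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)
        return np.vander([x0], N=degree + 1, increasing=True)[0] @ coef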

  11. Local Polynomial Regression. Bias-variance tradeoff in selecting the polynomial degree: higher-degree local fits reduce bias in curved regions but increase the variance of the estimate, so the degree must balance the two.

  12. Selecting the Width of the Kernel. Choosing the width λ is a bias-variance tradeoff: • If the window is narrow, f̂(x0) averages only a few observations, so its variance is relatively large, while the bias tends to be small. • If the window is wide, the variance is relatively small, but the bias tends to be higher because observations far from x0 are allowed to influence the fit. One practical selection rule is sketched below.
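
The slide does not prescribe a specific selector; a common choice is leave-one-out cross-validation over a grid of candidate widths, sketched here with a Nadaraya-Watson smoother for brevity. The grid, the guard against empty windows, and the toy data are illustrative assumptions.

    import numpy as np

    def epanechnikov(t):
        return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

    def nw_predict(x_train, y_train, x0, lam):
        w = epanechnikov((x_train - x0) / lam)
        return np.sum(w * y_train) / max(np.sum(w), 1e-12)   # guard against empty windows

    def loocv_bandwidth(x, y, lambdas):
        """Pick the width minimizing leave-one-out squared prediction error."""
        scores = []
        for lam in lambdas:
            errs = [(y[i] - nw_predict(np.delete(x, i), np.delete(y, i), x[i], lam)) ** 2
                    for i in range(len(x))]
            scores.append(np.mean(errs))
        return lambdas[int(np.argmin(scores))]

    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0, 1, 100))
    y = np.sin(4 * x) + rng.normal(0, 0.3, 100)
    best_lam = loocv_bandwidth(x, y, lambdas=np.linspace(0.05, 0.5, 10))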

  13. Local Regression in R^p. Local regression generalizes to p dimensions by fitting, at each target point x0, a weighted polynomial model such as a local linear fit (Eq. 6.12): minimize over β(x0) the sum Σ_i K_λ(x0, x_i) (y_i - b(x_i)ᵀ β(x0))², where b(x) is a vector of polynomial terms in the components of x. The kernel is a radial function (Eq. 6.13): K_λ(x0, x) = D(‖x - x0‖ / λ), where D can be the radial Epanechnikov or tri-cube function; because Euclidean distance depends on the units of each coordinate, the predictors should be standardized first.
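
A sketch of local linear regression in R^p with a radial Epanechnikov kernel (Eqs. 6.12-6.13), standardizing the coordinates as suggested above; the bandwidth and names are illustrative.

    import numpy as np

    def epanechnikov(t):
        return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

    def local_linear_rp(X_train, y_train, x0, lam=1.0):
        """Eq. 6.12 with b(x) = (1, x): weighted least squares around x0 in R^p."""
        mu, sd = X_train.mean(0), X_train.std(0)
        Xs, x0s = (X_train - mu) / sd, (x0 - mu) / sd            # standardize each coordinate
        w = epanechnikov(np.linalg.norm(Xs - x0s, axis=1) / lam)  # Eq. 6.13: D(||x - x0|| / lam)
        B = np.column_stack([np.ones(len(Xs)), Xs])
        W = np.diag(w)
        coef = np.linalg.solve(B.T @ W @ B, B.T @ W @ y_train)
        return np.concatenate([[1.0], x0s]) @ coef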

  14. Structured Local Regression Models in R^p. When the ratio of dimension to sample size is unfavorable, local regression does not help us much unless we are willing to make structural assumptions about the model; downgrading or omitting coordinates can reduce the error. Equation 6.13 gives equal weight to each coordinate, so we can modify the kernel in order to control the weight placed on each coordinate. Structured kernels (Eq. 6.14) replace the Euclidean distance with a quadratic form (x - x0)ᵀ A (x - x0); choosing the positive semidefinite matrix A (often diagonal) downweights or entirely omits individual predictors.
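
A sketch of a structured kernel in the spirit of Eq. 6.14: the matrix A reweights coordinates inside the distance, so a small diagonal entry downweights that predictor and a zero entry removes it. The exact normalization and the example weights are assumptions, not taken from the slides.

    import numpy as np

    def tricube(t):
        return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

    def structured_kernel(x0, X, A, lam=1.0):
        """Weight each row x_i of X by D of the A-weighted distance from x0."""
        diff = X - x0
        q = np.einsum('ij,jk,ik->i', diff, A, diff)   # (x_i - x0)^T A (x_i - x0)
        return tricube(np.sqrt(q) / lam)

    # e.g. keep the first coordinate, downweight the second, ignore the third entirely
    A = np.diag([1.0, 0.25, 0.0])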

  15. Structured Regression Functions. Fitting an unrestricted regression function in R^p means, in principle, considering interactions of every order. ANOVA decompositions provide a statistical way to organize this: the function is written as a sum of main effects, second-order interactions, and so on (Eq. 6.15), f(X_1, ..., X_p) = α + Σ_j g_j(X_j) + Σ_{k<l} g_kl(X_k, X_l) + ..., and structure is imposed by eliminating some of the higher-order terms.

  16. Structured Regression Functions. Varying coefficient models are a special case of structured models. Divide the p predictors in X into a set (X_1, ..., X_q), with q < p, and collect the remainder in a vector Z; the model is linear in X_1, ..., X_q, but the coefficients are allowed to change with Z (Eq. 6.16): f(X) = α(Z) + β_1(Z) X_1 + ... + β_q(Z) X_q. For a given z0 the coefficients are fit by locally weighted least squares (Eq. 6.17): minimize over α(z0), β(z0) the sum Σ_i K_λ(z0, z_i) (y_i - α(z0) - β_1(z0) x_1i - ... - β_q(z0) x_qi)². A sketch follows.
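
A sketch of the varying-coefficient fit of Eq. 6.17: the kernel weights depend on Z only, so repeating the fit over a grid of z0 values traces out coefficient functions α(z), β_j(z). The names and the bandwidth are illustrative.

    import numpy as np

    def epanechnikov(t):
        return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

    def varying_coefficients(X, z, y, z0, lam=0.5):
        """Return (alpha(z0), beta_1(z0), ..., beta_q(z0)) per Eq. 6.17.
        X: (N, q) predictors entering linearly; z: (N,) conditioning variable."""
        w = epanechnikov((z - z0) / lam)              # kernel weights in Z only
        B = np.column_stack([np.ones(len(z)), X])     # linear model in X_1, ..., X_q
        W = np.diag(w)
        return np.linalg.solve(B.T @ W @ B, B.T @ W @ y)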

  17. Questions • Section 6.2 details how we may select the optimal lambda parameter for a kernel. How do we select the optimal kernel function? Are there kernels that tend to outperform others in most cases? If not, are there ways to identify a kernel that is likely to perform well without running an experiment?

  18. Questions • One benefit of using kernels with SVMs is that we can expand the dimensionality of the dataset and make it more likely that we find a separating hyperplane with a hard margin. But Section 6.3 notes that for local regression the proportion of points on the boundary increases to 1 as the dimensionality increases, so our predictions carry even more bias. Is there a compromise solution that works, or is the kernel trick best suited to classification problems?

  19. Questions?
