
Kernel Regression


  1. Kernel Regression Prof. Bennett Math Model of Learning and Discovery 2/24/03 Based on Chapter 2 of Shawe-Taylor and Cristianini

  2. Outline • Review Ridge Regression • LS-SVM=KRR • Dual Derivation • Bias Issue • Summary

  3. Ridge Regression Review • Use the least-norm solution for fixed λ > 0 • Regularized problem: min_w ||y − Xw||² + λ||w||² • Optimality condition: (X'X + λI_n)w = X'y, which requires O(n³) operations
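As a quick sketch (with made-up data; λ = 1 is an arbitrary choice), the optimality condition above can be solved directly in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))           # l = 50 points, n = 5 features
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(50)
lam = 1.0                                  # arbitrary example value of lambda

# Optimality condition: (X'X + lam * I_n) w = X'y, an n x n solve, O(n^3)
w = np.linalg.solve(X.T @ X + lam * np.eye(5), X.T @ y)
```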

  4. Dual Representation • The inverse (XX' + λI_ℓ)⁻¹ always exists for any λ > 0 • Alternative representation: w = X'α with α = (XX' + λI_ℓ)⁻¹y • Solving this ℓ×ℓ system is O(ℓ³)

  5. Dual Ridge Regression • α = (G + λI_ℓ)⁻¹y, where G = XX' with G_ij = ⟨x_i, x_j⟩ • To predict a new point x: f(x) = ⟨w, x⟩ = Σ_i α_i⟨x_i, x⟩ • Note: we need only compute G, the Gram matrix. Ridge regression requires only inner products between data points

  6. Linear Regression in Feature Space Key idea: map the data to a higher-dimensional space (feature space) and perform linear regression in the embedded space. Embedding map: φ : x ↦ φ(x)

  7. Kernel Function • A kernel is a function K such that K(x, z) = ⟨φ(x), φ(z)⟩ for some feature map φ • There are many possible kernels. The simplest is the linear kernel K(x, z) = ⟨x, z⟩
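Some standard example kernels beyond the linear one (an illustrative sketch; the slide does not list specific kernels, so the polynomial and Gaussian choices here are assumptions):

```python
import numpy as np

def linear_kernel(x, z):
    # the simplest kernel: the ordinary inner product
    return x @ z

def polynomial_kernel(x, z, d=2):
    # corresponds to a feature map of monomials up to degree d
    return (1.0 + x @ z) ** d

def gaussian_kernel(x, z, sigma=1.0):
    # corresponds to an infinite-dimensional feature map
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))
```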

  8. Ridge Regression in Feature Space • α = (G + λI_ℓ)⁻¹y with G_ij = K(x_i, x_j) • To predict a new point x: f(x) = Σ_i α_i K(x_i, x) • To compute the Gram matrix, use the kernel to evaluate the inner products in feature space
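A minimal sketch of kernel ridge regression, assuming a Gaussian kernel and synthetic sine data (both illustrative choices, not from the slides):

```python
import numpy as np

def gaussian_gram(A, B, sigma=1.0):
    # pairwise Gaussian kernel values between rows of A and rows of B
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq / (2.0 * sigma ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(40, 1))       # 1-D inputs
y = np.sin(X[:, 0])                        # noiseless targets for illustration
lam = 0.1

G = gaussian_gram(X, X)                    # Gram matrix via the kernel
alpha = np.linalg.solve(G + lam * np.eye(40), y)

# f(x) = sum_i alpha_i K(x_i, x): nonlinear in x, linear in feature space
X_test = np.array([[0.5]])
f_test = (gaussian_gram(X_test, X) @ alpha)[0]
```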

  9. Alternative Dual Derivation • Original math model: min_w λ||w||² + ||y − Xw||² • Equivalent math model: min_{w,z} λ||w||² + ||z||² s.t. z = y − Xw • Construct the dual using Wolfe duality

  10. Lagrangian Function • Consider the problem min_w f(w) s.t. g(w) = 0 • The Lagrangian function is L(w, α) = f(w) + α'g(w), with multiplier vector α

  11. Wolfe Dual Problem • Primal: min_w f(w) s.t. g(w) = 0 • Dual: max_{w,α} L(w, α) s.t. ∇_w L(w, α) = 0

  12. Lagrangian Function • Primal: min_{w,z} λ||w||² + ||z||² s.t. z = y − Xw • Lagrangian: L(w, z, α) = λ||w||² + ||z||² + 2λα'(y − Xw − z)

  13. Wolfe Dual Problem Construct the Wolfe dual: maximize L(w, z, α) subject to ∇_w L = 0 and ∇_z L = 0. From ∇_z L = 2z − 2λα = 0, simplify by eliminating z = λα

  14. Simplified Problem Get rid of z by substituting z = λα. From ∇_w L = 2λw − 2λX'α = 0, simplify by eliminating w = X'α

  15. Simplified Problem Get rid of w by substituting w = X'α, leaving a maximization over α alone

  16. Optimal Solution • Problem in matrix notation with G = XX': max_α 2α'y − α'Gα − λα'α (dropping a constant factor λ) • Solution satisfies (G + λI_ℓ)α = y, i.e., α = (G + λI_ℓ)⁻¹y

  17. What about Bias? • Limiting the regression function to f(x) = w'x means the solution must pass through the origin • Many models require a bias or constant term: f(x) = w'x + b

  18. Eliminate Bias • One way to eliminate the bias is to “center” the data, i.e., make the data have mean 0

  19. Center y • ŷ = y − (e'y/ℓ)e, so ŷ has sample mean 0 • It is frequently good to also give y standard length: y ← ŷ/||ŷ||

  20. Center X • Mean of X: μ' = e'X/ℓ (the row vector of column means) • Center X: X̂ = X − eμ' = (I − ee'/ℓ)X

  21. You Try • Consider a data matrix with 3 points in 4 dimensions • Compute the centered X by hand and with the formula X̂ = (I − ee'/ℓ)X
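A sketch of the exercise with illustrative numbers (the slide's actual 3×4 matrix is not reproduced here, so these values are made up):

```python
import numpy as np

# Illustrative 3 x 4 data matrix: 3 points (rows) in 4 dimensions
X = np.array([[1., 2., 3., 4.],
              [2., 0., 1., 3.],
              [0., 4., 2., 2.]])
l = X.shape[0]
e = np.ones((l, 1))

X_by_hand = X - X.mean(axis=0)                 # subtract each column's mean
X_formula = (np.eye(l) - e @ e.T / l) @ X      # centering via (I - ee'/l) X
```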

  22. Center (X) in Feature Space • We cannot center (X) directly in feature space. • Center G = XX’ • Works in feature space too for G in kernel space

  23. Centering Kernel Practical computation: Ĝ = G − (1/ℓ)ee'G − (1/ℓ)Gee' + (e'Ge/ℓ²)ee'
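The centering computation can be checked numerically for the linear kernel, where centering G must agree with the Gram matrix of explicitly centered data (synthetic data):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((6, 3))
l = X.shape[0]
G = X @ X.T                                # linear-kernel Gram matrix
C = np.eye(l) - np.ones((l, l)) / l        # centering matrix I - ee'/l

G_centered = C @ G @ C                     # uses only kernel values, never phi(X)

# For the linear kernel this matches the Gram matrix of explicitly centered data
X_centered = X - X.mean(axis=0)
```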

  24. Ridge Regression in Feature Space • Original way: α = (G + λI_ℓ)⁻¹y • Predicted normalized y: solve instead with the centered kernel Ĝ and centered targets ŷ • Predicted original y: invert the centering and normalization of y to return predictions to the original scale

  25. Worksheet • Normalized Y • Invert to get unnormalized y

  26. Centering Test Data Center the test kernel values just like the training data: with k_i = K(x_i, x), the centered values are k̂_i = k_i − (e'k)/ℓ − (Ge)_i/ℓ + e'Ge/ℓ². Prediction of test data becomes f(x) = Σ_i α_i k̂_i
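A numerical check of the test-point centering for the linear kernel, where the centered kernel vector must equal the inner products of the explicitly centered data (illustrative data):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((6, 3))            # training points
x = rng.standard_normal(3)                 # one test point
l = X.shape[0]
e = np.ones(l)

G = X @ X.T                                # training Gram matrix
k = X @ x                                  # test kernel vector, k_i = <x_i, x>

# Centered test kernel vector, using only kernel evaluations
k_centered = k - (e @ k / l) * e - G @ e / l + (e @ G @ e / l**2) * e

# For the linear kernel this equals inner products of the centered data
mu = X.mean(axis=0)
k_check = (X - mu) @ (x - mu)
```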

  27. Alternate Approach • Directly add a bias to the model: f(x) = w'x + b • The optimization problem becomes min_{w,b,z} λ||w||² + ||z||² s.t. z = y − Xw − be

  28. Lagrangian Function • Consider the problem min_{w,b,z} λ||w||² + ||z||² s.t. z = y − Xw − be • The Lagrangian function is L(w, b, z, α) = λ||w||² + ||z||² + 2λα'(y − Xw − be − z)

  29. Lagrangian Function • Primal: min_{w,b,z} λ||w||² + ||z||² s.t. z = y − Xw − be

  30. Wolfe Dual Problem Simplify by eliminating z = λα and using e'α = 0 (from ∂L/∂b = 0)

  31. Simplified Problem Simplify by eliminating w = X'α

  32. Simplified Problem Get rid of w: the dual becomes max_α 2α'y − α'Gα − λα'α s.t. e'α = 0

  33. New Problem to be Solved • Problem in matrix notation with G = XX': max_α 2α'y − α'Gα − λα'α s.t. e'α = 0 • This is a constrained optimization problem. The solution is still a system of linear equations (from the KKT conditions), but not as simple
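A sketch of solving that system: the KKT conditions of the constrained dual reduce to one linear system in (α, b). The data are made up, and the exact system shown is an inference consistent with the derivation above:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((30, 4))
y = X @ rng.standard_normal(4) + 2.0       # data with a nonzero offset
lam = 1.0
l = X.shape[0]

G = X @ X.T
e = np.ones(l)

# KKT conditions give one linear system:
#   [ G + lam*I   e ] [alpha]   [y]
#   [ e'          0 ] [  b  ] = [0]
A = np.block([[G + lam * np.eye(l), e[:, None]],
              [e[None, :],          np.zeros((1, 1))]])
sol = np.linalg.solve(A, np.append(y, 0.0))
alpha, b = sol[:l], sol[l]

def predict(x_new):
    # f(x) = sum_i alpha_i <x_i, x> + b
    return (X @ x_new) @ alpha + b
```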

  34. Kernel Ridge Regression • The centered algorithm just requires centering the kernel and solving one linear system • Can also add the bias directly • + Lots of fast equation solvers • + Theory supports generalization • - Requires the full training kernel to compute α • - Requires the full training kernel to predict future points
