CS 59000 Statistical Machine Learning, Lecture 13

Presentation Transcript

  1. CS 59000 Statistical Machine Learning, Lecture 13. Yuan (Alan) Qi, Purdue CS, Oct. 8, 2008

  2. Outline. Review of the kernel trick, kernel ridge regression, and kernel Principal Component Analysis. Gaussian processes (GPs): from linear regression to GP; GP for regression.

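  3. Kernel Trick. (1) Reformulate an algorithm such that the input vector $\mathbf{x}$ enters only in the form of inner products $\mathbf{x}^T\mathbf{x}'$. (2) Replace the input $\mathbf{x}$ by its feature mapping: $\mathbf{x} \to \boldsymbol{\phi}(\mathbf{x})$. (3) Replace the inner product by a kernel function: $k(\mathbf{x}, \mathbf{x}') = \boldsymbol{\phi}(\mathbf{x})^T\boldsymbol{\phi}(\mathbf{x}')$. Examples: kernel PCA, kernel Fisher discriminant, support vector machines.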

  4. Dual Representation for Ridge Regression. Dual variables:
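
The equations for this slide are not in the transcript. What follows is the standard textbook derivation of the dual variables for ridge regression, offered as a sketch rather than as the slide's exact content. The symbols $\lambda$ (regularization weight), $\boldsymbol{\Phi}$ (design matrix with rows $\boldsymbol{\phi}(\mathbf{x}_n)^T$), and $\mathbf{a}$ (dual variables) are assumed notation.

```latex
\begin{align*}
J(\mathbf{w}) &= \frac{1}{2}\sum_{n=1}^{N}\bigl(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)^2
                 + \frac{\lambda}{2}\,\mathbf{w}^T\mathbf{w} \\
\nabla_{\mathbf{w}} J = 0 \;\Rightarrow\;
\mathbf{w} &= -\frac{1}{\lambda}\sum_{n=1}^{N}\bigl(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)\boldsymbol{\phi}(\mathbf{x}_n)
            = \sum_{n=1}^{N} a_n\,\boldsymbol{\phi}(\mathbf{x}_n)
            = \boldsymbol{\Phi}^T\mathbf{a} \\
a_n &= -\frac{1}{\lambda}\bigl(\mathbf{w}^T\boldsymbol{\phi}(\mathbf{x}_n) - t_n\bigr)
\end{align*}
```

The weight vector is therefore a linear combination of the mapped training inputs, which is what lets the problem be rewritten in terms of inner products only.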

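  5. Kernel Ridge Regression. Using the kernel trick: $J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T\mathbf{K}\mathbf{K}\mathbf{a} - \mathbf{a}^T\mathbf{K}\mathbf{t} + \frac{1}{2}\mathbf{t}^T\mathbf{t} + \frac{\lambda}{2}\mathbf{a}^T\mathbf{K}\mathbf{a}$, where $\mathbf{K} = \boldsymbol{\Phi}\boldsymbol{\Phi}^T$ is the kernel matrix, $\mathbf{t}$ the target vector, and $\lambda$ the regularization weight. Minimize over the dual variables: $\mathbf{a} = (\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}$, which gives the prediction $y(\mathbf{x}) = \mathbf{k}(\mathbf{x})^T(\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}$ with $k_n(\mathbf{x}) = k(\mathbf{x}_n, \mathbf{x})$.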
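
As a small numerical check of the dual representation, the sketch below solves ridge regression twice with a linear kernel: once in the primal, $\mathbf{w} = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{t}$, and once in the dual, $\mathbf{a} = (\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}$ with $\mathbf{K} = \mathbf{X}\mathbf{X}^T$. The function names, the synthetic data, and the value of `lam` are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

def ridge_primal(X, t, lam):
    """Primal ridge solution w = (X^T X + lam I)^{-1} X^T t."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ t)

def ridge_dual(K, t, lam):
    """Dual ridge solution a = (K + lam I_N)^{-1} t for a kernel matrix K."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * np.eye(n), t)

# Synthetic data (hypothetical): 20 points in 3 dimensions with noisy targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
t = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)
lam = 0.3

w = ridge_primal(X, t, lam)
a = ridge_dual(X @ X.T, t, lam)      # linear kernel: K = X X^T

# Predictions at new inputs: primal uses w, dual uses k(x)^T a.
X_new = rng.normal(size=(5, 3))
pred_primal = X_new @ w
pred_dual = (X_new @ X.T) @ a
```

With a linear kernel the two predictions agree up to floating-point error. Swapping `X @ X.T` for any other valid kernel matrix leaves `ridge_dual` unchanged, which is the point of the kernel trick.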

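  6. Generate Kernel Matrix. The kernel matrix must be positive semidefinite. Consider the Gaussian kernel: $k(\mathbf{x}, \mathbf{x}') = \exp\bigl(-\|\mathbf{x} - \mathbf{x}'\|^2 / 2\sigma^2\bigr)$.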
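
A minimal sketch of building a Gaussian kernel matrix and checking positive semidefiniteness numerically. The helper name `gaussian_gram`, the width `sigma`, and the random inputs are assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    """Kernel matrix K[n, m] = exp(-||x_n - x_m||^2 / (2 sigma^2))."""
    sq = np.sum(X**2, axis=1)
    # Squared distances via ||x||^2 + ||x'||^2 - 2 x^T x'.
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)         # clip tiny negatives from round-off
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))         # hypothetical inputs
K = gaussian_gram(X, sigma=0.8)

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0.
eigvals = np.linalg.eigvalsh(K)
```

In floating point the smallest eigenvalue can come out very slightly negative, so a practical check compares it against a small negative tolerance rather than exactly zero.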

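  7. Principal Component Analysis (PCA). Assume the data have zero mean, $\sum_n \mathbf{x}_n = \mathbf{0}$. We have $\mathbf{S}\mathbf{u}_i = \lambda_i\mathbf{u}_i$ with $\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n\mathbf{x}_n^T$, where $\mathbf{u}_i$ is a normalized eigenvector: $\mathbf{u}_i^T\mathbf{u}_i = 1$.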

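  8. Feature Mapping. Eigen-problem in feature space: $\mathbf{C}\mathbf{v}_i = \lambda_i\mathbf{v}_i$ with $\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^T$, assuming the mapped data have zero mean, $\sum_n \boldsymbol{\phi}(\mathbf{x}_n) = \mathbf{0}$.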

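  9. Dual Variables. Suppose $\lambda_i > 0$; then each eigenvector $\mathbf{v}_i$ of the feature-space covariance is a linear combination of the mapped inputs, and we have $\mathbf{v}_i = \sum_{n=1}^{N} a_{in}\,\boldsymbol{\phi}(\mathbf{x}_n)$.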

  10. Eigen-problem in Feature Space (1)
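
Only the slide title is in the transcript. The standard kernel PCA substitution is sketched below in assumed notation: $\mathbf{C}$ is the feature-space covariance, $\mathbf{v}_i$ its eigenvectors, $a_{in}$ the dual coefficients, and $\mathbf{K}$ the kernel matrix with entries $k(\mathbf{x}_n, \mathbf{x}_m)$. It may differ in detail from what the slide displayed.

```latex
\begin{align*}
\mathbf{C}\mathbf{v}_i = \lambda_i\mathbf{v}_i,\quad
\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^T,\quad
\mathbf{v}_i &= \sum_{n=1}^{N} a_{in}\,\boldsymbol{\phi}(\mathbf{x}_n) \\
\frac{1}{N}\sum_{n=1}^{N}\boldsymbol{\phi}(\mathbf{x}_n)\boldsymbol{\phi}(\mathbf{x}_n)^T
   \sum_{m=1}^{N} a_{im}\,\boldsymbol{\phi}(\mathbf{x}_m)
   &= \lambda_i\sum_{n=1}^{N} a_{in}\,\boldsymbol{\phi}(\mathbf{x}_n) \\
\text{multiply on the left by } \boldsymbol{\phi}(\mathbf{x}_l)^T:\quad
\frac{1}{N}\sum_{n=1}^{N} k(\mathbf{x}_l,\mathbf{x}_n)\sum_{m=1}^{N} a_{im}\,k(\mathbf{x}_n,\mathbf{x}_m)
   &= \lambda_i\sum_{n=1}^{N} a_{in}\,k(\mathbf{x}_l,\mathbf{x}_n) \\
\text{in matrix form:}\quad
\mathbf{K}^2\mathbf{a}_i &= \lambda_i N\,\mathbf{K}\mathbf{a}_i \\
\text{which is solved by solutions of}\quad
\mathbf{K}\mathbf{a}_i &= \lambda_i N\,\mathbf{a}_i
\end{align*}
```

The feature vectors $\boldsymbol{\phi}(\mathbf{x}_n)$ appear only through the kernel, so the eigenproblem can be solved without ever computing the feature mapping explicitly.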

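  11. Eigen-problem in Feature Space (2). Normalization condition: $1 = \mathbf{v}_i^T\mathbf{v}_i = \mathbf{a}_i^T\mathbf{K}\mathbf{a}_i = \lambda_i N\,\mathbf{a}_i^T\mathbf{a}_i$, where $\mathbf{a}_i$ holds the dual coefficients of the eigenvector $\mathbf{v}_i$ and $\mathbf{K}$ is the kernel matrix. Projection coefficient: $y_i(\mathbf{x}) = \boldsymbol{\phi}(\mathbf{x})^T\mathbf{v}_i = \sum_{n=1}^{N} a_{in}\,k(\mathbf{x}, \mathbf{x}_n)$.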

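  12. General Case for Non-zero Mean. Kernel matrix: $\tilde{\mathbf{K}} = \mathbf{K} - \mathbf{1}_N\mathbf{K} - \mathbf{K}\mathbf{1}_N + \mathbf{1}_N\mathbf{K}\mathbf{1}_N$, where $\mathbf{1}_N$ is the $N \times N$ matrix with every entry equal to $1/N$.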
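
The kernel PCA steps reviewed on the preceding slides can be sketched as follows: center the kernel matrix, solve the eigenproblem, rescale the dual coefficients so that $\mathbf{a}_i^T\tilde{\mathbf{K}}\mathbf{a}_i = 1$, and project the training points. The function name `kernel_pca` and the test data are hypothetical. A linear kernel is used so the result can be compared with ordinary PCA.

```python
import numpy as np

def kernel_pca(K, n_components):
    """Return the centered kernel matrix, dual coefficients, and projections."""
    n = K.shape[0]
    one_n = np.full((n, n), 1.0 / n)           # N x N matrix with entries 1/N
    K_c = K - one_n @ K - K @ one_n + one_n @ K @ one_n
    eigvals, eigvecs = np.linalg.eigh(K_c)     # eigenvalues in ascending order
    idx = np.argsort(eigvals)[::-1][:n_components]
    mu = eigvals[idx]                          # mu_i = lambda_i * N
    A = eigvecs[:, idx] / np.sqrt(mu)          # enforces a_i^T K_c a_i = 1
    Z = K_c @ A                                # projections of training points
    return K_c, A, Z

rng = np.random.default_rng(3)
X = rng.normal(size=(25, 3)) + 5.0             # hypothetical data, non-zero mean
K = X @ X.T                                    # linear kernel
K_c, A, Z = kernel_pca(K, n_components=2)
```

With the linear kernel, the columns of `Z` match the ordinary PCA scores of the mean-centered data up to sign. This sketch assumes the selected eigenvalues are strictly positive.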

  13. Gaussian Processes. How do kernels arise naturally in a Bayesian setting? Instead of assigning a prior on the parameters $\mathbf{w}$, we assign a prior on the function values $\mathbf{y}$. The function space is infinite-dimensional in theory, but finite in practice (we only need the function values at a finite number of training and test points).

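  14. Linear Regression Revisited. Let $y(\mathbf{x}) = \mathbf{w}^T\boldsymbol{\phi}(\mathbf{x})$ with prior $p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$. We have $\mathbf{y} = \boldsymbol{\Phi}\mathbf{w}$, where $\mathbf{y} = (y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N))^T$ and $\boldsymbol{\Phi}$ is the design matrix with rows $\boldsymbol{\phi}(\mathbf{x}_n)^T$.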

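  15. From Prior on Parameter to Prior on Function. The prior on function values: since $\mathbf{y} = \boldsymbol{\Phi}\mathbf{w}$ is linear in the Gaussian-distributed $\mathbf{w}$, it is itself Gaussian, with $\mathbb{E}[\mathbf{y}] = \boldsymbol{\Phi}\,\mathbb{E}[\mathbf{w}] = \mathbf{0}$ and $\mathrm{cov}[\mathbf{y}] = \boldsymbol{\Phi}\,\mathbb{E}[\mathbf{w}\mathbf{w}^T]\,\boldsymbol{\Phi}^T = \alpha^{-1}\boldsymbol{\Phi}\boldsymbol{\Phi}^T = \mathbf{K}$. Here $\boldsymbol{\Phi}$ is the design matrix with rows $\boldsymbol{\phi}(\mathbf{x}_n)^T$, $\alpha$ is the precision of the zero-mean Gaussian prior on $\mathbf{w}$, and $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) = \alpha^{-1}\boldsymbol{\phi}(\mathbf{x}_n)^T\boldsymbol{\phi}(\mathbf{x}_m)$.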

  16. Stochastic Process. A stochastic process is specified by giving the joint distribution for any finite set of values $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$ in a consistent manner. (Loosely speaking, consistency means that marginalizing a joint distribution over some of its variables gives the same distribution as the one defined directly on the remaining variables.)

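  17. Gaussian Processes. The joint distribution of any finite set of variables $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$ is a multivariate Gaussian distribution. Without any prior knowledge, we often set the mean to 0. Then the GP is specified by the covariance: $\mathbb{E}[y(\mathbf{x}_n)\,y(\mathbf{x}_m)] = k(\mathbf{x}_n, \mathbf{x}_m)$.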

  18. Impact of Kernel Function. The covariance matrix is given by the kernel function. Applications: economics and finance.

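  19. Gaussian Process for Regression. Likelihood: $p(\mathbf{t} \mid \mathbf{y}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \beta^{-1}\mathbf{I}_N)$, where $\beta$ is the noise precision. Prior: $p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \mathbf{K})$. Marginal distribution: $p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{y})\,p(\mathbf{y})\,d\mathbf{y} = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, \mathbf{C})$ with $\mathbf{C} = \mathbf{K} + \beta^{-1}\mathbf{I}_N$.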

  20. Samples of GP Prior over Functions
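
The figure for this slide is not in the transcript. As a stand-in, here is a minimal sketch of drawing sample functions from a zero-mean GP prior on a grid: build the covariance matrix from a kernel, take its Cholesky factor $\mathbf{L}$, and map standard normal draws through $\mathbf{L}$. The Gaussian kernel, its width `sigma`, the grid, and the small `jitter` added for numerical stability are all assumptions.

```python
import numpy as np

def gaussian_kernel(xa, xb, sigma=1.0):
    """k(x, x') = exp(-(x - x')^2 / (2 sigma^2)) for 1-D inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

rng = np.random.default_rng(2)
x = np.linspace(-1.0, 1.0, 50)                 # evaluation grid
K = gaussian_kernel(x, x, sigma=0.3)

# Jitter keeps the Cholesky factorization stable; it is not part of the model.
jitter = 1e-6 * np.eye(len(x))
L = np.linalg.cholesky(K + jitter)

# If z ~ N(0, I) then L z ~ N(0, K + jitter); each column is one sample function.
samples = L @ rng.normal(size=(len(x), 5))
```

Plotting each column of `samples` against `x` gives curves whose smoothness is controlled by the kernel width. Changing the kernel changes the character of the sampled functions.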

  21. Samples of Data Points

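  22. Predictive Distribution. $p(t_{N+1} \mid \mathbf{t})$ is a Gaussian distribution with mean and variance: $m(\mathbf{x}_{N+1}) = \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{t}$ and $\sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{k}$, where $\mathbf{C}_N = \mathbf{K} + \beta^{-1}\mathbf{I}_N$ is the covariance of the training targets, $k_n = k(\mathbf{x}_n, \mathbf{x}_{N+1})$, and $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$.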

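  23. Predictive Mean. $m(\mathbf{x}_{N+1}) = \sum_{n=1}^{N} a_n\,k(\mathbf{x}_n, \mathbf{x}_{N+1})$, where $a_n$ is the $n$th component of $\mathbf{C}_N^{-1}\mathbf{t}$ and $\mathbf{C}_N = \mathbf{K} + \beta^{-1}\mathbf{I}_N$ is the covariance of the training targets. We see the same form as kernel ridge regression and kernel PCA.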
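
A minimal sketch of GP regression under stated assumptions: a Gaussian kernel, noise precision `beta`, and synthetic sinusoidal data, none of which come from the lecture. It computes the predictive mean $\mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{t}$ and variance $c - \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{k}$, then compares the mean with a kernel ridge regression prediction using $\lambda = \beta^{-1}$.

```python
import numpy as np

def gaussian_kernel(xa, xb, sigma=0.3):
    """k(x, x') = exp(-(x - x')^2 / (2 sigma^2)) for 1-D inputs."""
    d2 = (xa[:, None] - xb[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def gp_predict(x_train, t, x_test, beta, sigma=0.3):
    """Predictive mean and variance of the noisy target at each test input."""
    n = len(x_train)
    C = gaussian_kernel(x_train, x_train, sigma) + np.eye(n) / beta
    k = gaussian_kernel(x_train, x_test, sigma)      # shape (N, M)
    c = 1.0 + 1.0 / beta                             # k(x, x) = 1 for this kernel
    mean = k.T @ np.linalg.solve(C, t)
    var = c - np.sum(k * np.linalg.solve(C, k), axis=0)
    return mean, var

rng = np.random.default_rng(4)
x_train = rng.uniform(-1.0, 1.0, size=15)
t = np.sin(np.pi * x_train) + rng.normal(scale=0.1, size=15)
x_test = np.linspace(-1.0, 1.0, 7)
beta = 100.0                                         # noise precision (assumed)

mean, var = gp_predict(x_train, t, x_test, beta)

# Kernel ridge regression with lam = 1 / beta gives the same point prediction.
K = gaussian_kernel(x_train, x_train)
a = np.linalg.solve(K + (1.0 / beta) * np.eye(len(x_train)), t)
krr_pred = gaussian_kernel(x_train, x_test).T @ a
```

The point predictions coincide. What the GP adds is the predictive variance, which grows for test inputs far from the training data.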

  24. GP Regression. Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?

  25. Marginal Distribution of Target Values