CS 59000 Statistical Machine Learning, Lecture 13



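**CS 59000 Statistical Machine Learning, Lecture 13**

Yuan (Alan) Qi, Purdue CS, Oct. 8, 2008

**Outline**

- Review of the kernel trick, kernel ridge regression, and kernel Principal Component Analysis
- Gaussian processes (GPs)
- From linear regression to GP
- GP for regression

**Kernel Trick**

1. Reformulate an algorithm such that the input vector enters only in the form of an inner product $\mathbf{x}^T\mathbf{x}'$.
2. Replace the input $\mathbf{x}$ by its feature mapping: $\mathbf{x} \to \phi(\mathbf{x})$.
3. Replace the inner product by a kernel function: $k(\mathbf{x}, \mathbf{x}') = \phi(\mathbf{x})^T\phi(\mathbf{x}')$.

Examples: kernel PCA, kernel Fisher discriminant, support vector machines.

**Dual Representation for Ridge Regression**

Regularized sum-of-squares error:

$$J(\mathbf{w}) = \frac{1}{2}\sum_{n=1}^{N}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)^2 + \frac{\lambda}{2}\mathbf{w}^T\mathbf{w}$$

Setting the gradient with respect to $\mathbf{w}$ to zero gives

$$\mathbf{w} = -\frac{1}{\lambda}\sum_{n=1}^{N}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)\phi(\mathbf{x}_n) = \sum_{n=1}^{N} a_n\phi(\mathbf{x}_n) = \boldsymbol{\Phi}^T\mathbf{a}$$

Dual variables:

$$a_n = -\frac{1}{\lambda}\left(\mathbf{w}^T\phi(\mathbf{x}_n) - t_n\right)$$

**Kernel Ridge Regression**

Using the kernel trick, with $\mathbf{K} = \boldsymbol{\Phi}\boldsymbol{\Phi}^T$ and $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)$:

$$J(\mathbf{a}) = \frac{1}{2}\mathbf{a}^T\mathbf{K}\mathbf{K}\mathbf{a} - \mathbf{a}^T\mathbf{K}\mathbf{t} + \frac{1}{2}\mathbf{t}^T\mathbf{t} + \frac{\lambda}{2}\mathbf{a}^T\mathbf{K}\mathbf{a}$$

Minimize over the dual variables:

$$\mathbf{a} = (\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}, \qquad y(\mathbf{x}) = \mathbf{k}(\mathbf{x})^T(\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}$$

where $\mathbf{k}(\mathbf{x})$ has elements $k_n(\mathbf{x}) = k(\mathbf{x}_n, \mathbf{x})$.

**Generate Kernel Matrix**

The kernel matrix must be positive semidefinite.

Consider the Gaussian kernel:

$$k(\mathbf{x}, \mathbf{x}') = \exp\left(-\frac{\|\mathbf{x} - \mathbf{x}'\|^2}{2\sigma^2}\right)$$

**Principal Component Analysis (PCA)**

Assume $\sum_n \mathbf{x}_n = \mathbf{0}$, so that the sample covariance is $\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N}\mathbf{x}_n\mathbf{x}_n^T$. We have

$$\mathbf{S}\mathbf{u}_i = \lambda_i\mathbf{u}_i$$

where $\mathbf{u}_i$ is a normalized eigenvector: $\mathbf{u}_i^T\mathbf{u}_i = 1$.

**Feature Mapping**

Eigen-problem in feature space, assuming $\sum_n \phi(\mathbf{x}_n) = \mathbf{0}$:

$$\mathbf{C} = \frac{1}{N}\sum_{n=1}^{N}\phi(\mathbf{x}_n)\phi(\mathbf{x}_n)^T, \qquad \mathbf{C}\mathbf{v}_i = \lambda_i\mathbf{v}_i$$

**Dual Variables**

Suppose $\lambda_i > 0$; we have

$$\mathbf{v}_i = \sum_{n=1}^{N} a_{in}\phi(\mathbf{x}_n)$$

**Eigen-problem in Feature Space (2)**

Substituting this expansion into the eigenvector equation and multiplying both sides by $\phi(\mathbf{x}_l)^T$ gives an eigen-problem in terms of the kernel matrix:

$$\mathbf{K}\mathbf{a}_i = \lambda_i N\mathbf{a}_i$$

Normalization condition:

$$1 = \mathbf{v}_i^T\mathbf{v}_i = \mathbf{a}_i^T\mathbf{K}\mathbf{a}_i = \lambda_i N\mathbf{a}_i^T\mathbf{a}_i$$

Projection coefficient:

$$y_i(\mathbf{x}) = \phi(\mathbf{x})^T\mathbf{v}_i = \sum_{n=1}^{N} a_{in}k(\mathbf{x}, \mathbf{x}_n)$$

**General Case for Non-zero Mean Case**

Kernel matrix:

$$\tilde{\mathbf{K}} = \mathbf{K} - \mathbf{1}_N\mathbf{K} - \mathbf{K}\mathbf{1}_N + \mathbf{1}_N\mathbf{K}\mathbf{1}_N$$

where $\mathbf{1}_N$ is the $N \times N$ matrix in which every element equals $1/N$.

**Gaussian Processes**

How do kernels arise naturally in a Bayesian setting?

Instead of assigning a prior on the parameters $\mathbf{w}$, we assign a prior directly on the function values $\mathbf{y}$.

In theory this is an infinite-dimensional space. In practice it is finite, because the function is only evaluated at a finite number of training and test points.

**Linear Regression Revisited**

Let

$$y(\mathbf{x}) = \mathbf{w}^T\phi(\mathbf{x}), \qquad p(\mathbf{w}) = \mathcal{N}(\mathbf{w} \mid \mathbf{0}, \alpha^{-1}\mathbf{I})$$

We have

$$\mathbf{y} = \boldsymbol{\Phi}\mathbf{w}$$

where $\mathbf{y} = (y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N))^T$.

**From Prior on Parameter to Prior on Function**

The prior on the function values is Gaussian, because $\mathbf{y}$ is a linear function of the Gaussian variable $\mathbf{w}$:

$$\mathbb{E}[\mathbf{y}] = \boldsymbol{\Phi}\,\mathbb{E}[\mathbf{w}] = \mathbf{0}, \qquad \operatorname{cov}[\mathbf{y}] = \frac{1}{\alpha}\boldsymbol{\Phi}\boldsymbol{\Phi}^T = \mathbf{K}$$

with $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m) = \frac{1}{\alpha}\phi(\mathbf{x}_n)^T\phi(\mathbf{x}_m)$.

**Stochastic Process**

A stochastic process is specified by giving the joint distribution for any finite set of values $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$ in a consistent manner. (Loosely speaking, consistency means that marginalizing a joint distribution over some of its variables gives the same distribution as the one defined directly on the remaining subset.)

**Gaussian Processes**

The joint distribution of any finite set of variables $y(\mathbf{x}_1), \ldots, y(\mathbf{x}_N)$ is a multivariate Gaussian distribution.

Without any prior knowledge, we often set the mean to 0.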
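Then the GP is specified by the covariance:

$$\mathbb{E}[y(\mathbf{x}_n)\,y(\mathbf{x}_m)] = k(\mathbf{x}_n, \mathbf{x}_m)$$

**Impact of Kernel Function**

The covariance matrix is determined by the kernel function: $K_{nm} = k(\mathbf{x}_n, \mathbf{x}_m)$.

Application: economics and finance.

**Gaussian Process for Regression**

Likelihood:

$$p(\mathbf{t} \mid \mathbf{y}) = \mathcal{N}(\mathbf{t} \mid \mathbf{y}, \beta^{-1}\mathbf{I}_N)$$

Prior:

$$p(\mathbf{y}) = \mathcal{N}(\mathbf{y} \mid \mathbf{0}, \mathbf{K})$$

Marginal distribution:

$$p(\mathbf{t}) = \int p(\mathbf{t} \mid \mathbf{y})\,p(\mathbf{y})\,d\mathbf{y} = \mathcal{N}(\mathbf{t} \mid \mathbf{0}, \mathbf{C}_N), \qquad \mathbf{C}_N = \mathbf{K} + \beta^{-1}\mathbf{I}_N$$

**Predictive Distribution**

$p(t_{N+1} \mid \mathbf{t})$ is a Gaussian distribution with mean and variance:

$$m(\mathbf{x}_{N+1}) = \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{t}, \qquad \sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{k}$$

where $\mathbf{k}$ has elements $k(\mathbf{x}_n, \mathbf{x}_{N+1})$ and $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$.

**Predictive Mean**

$$m(\mathbf{x}_{N+1}) = \sum_{n=1}^{N} a_n k(\mathbf{x}_n, \mathbf{x}_{N+1})$$

where $a_n$ is the $n$-th component of $\mathbf{C}_N^{-1}\mathbf{t}$.

We see the same form as in kernel ridge regression and kernel PCA: a linear combination of kernel functions centered on the training points.

**GP Regression**

Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?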
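
As a rough numerical illustration of the predictive distribution, the sketch below computes the GP predictive mean $\mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{t}$ and variance $c - \mathbf{k}^T\mathbf{C}_N^{-1}\mathbf{k}$ with $\mathbf{C}_N = \mathbf{K} + \beta^{-1}\mathbf{I}_N$, and compares the mean with the kernel ridge regression prediction $\mathbf{k}^T(\mathbf{K} + \lambda\mathbf{I}_N)^{-1}\mathbf{t}$ for $\lambda = \beta^{-1}$. It is not taken from the lecture: the function names (`gaussian_kernel`, `gp_predict`, `kernel_ridge_predict`), the toy sine data, and the values of `beta` and `sigma` are assumptions chosen only for the example.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix between rows of A (n, d) and rows of B (m, d)."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma**2))


def gp_predict(X, t, X_new, beta, sigma=1.0):
    """Predictive mean and variance of a zero-mean GP with Gaussian kernel."""
    C = gaussian_kernel(X, X, sigma) + np.eye(len(X)) / beta  # C_N = K + I / beta
    k = gaussian_kernel(X, X_new, sigma)                      # shape (N, M)
    c = 1.0 + 1.0 / beta            # k(x, x) = 1 for the Gaussian kernel
    mean = k.T @ np.linalg.solve(C, t)
    var = c - np.sum(k * np.linalg.solve(C, k), axis=0)
    return mean, var


def kernel_ridge_predict(X, t, X_new, lam, sigma=1.0):
    """Kernel ridge regression prediction using dual variables a."""
    K = gaussian_kernel(X, X, sigma)
    a = np.linalg.solve(K + lam * np.eye(len(X)), t)          # a = (K + lam I)^{-1} t
    return gaussian_kernel(X_new, X, sigma) @ a


# Hypothetical toy data: noisy samples of sin(x), noise std 0.2 (beta = 25).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(20, 1))
t = np.sin(X[:, 0]) + rng.normal(scale=0.2, size=20)
X_new = np.linspace(-3.0, 3.0, 5)[:, None]

beta = 25.0
mean, var = gp_predict(X, t, X_new, beta=beta)
krr = kernel_ridge_predict(X, t, X_new, lam=1.0 / beta)

print("GP mean:      ", mean)
print("GP variance:  ", var)
print("Kernel ridge: ", krr)
print("Means agree:  ", np.allclose(mean, krr))
```

With $\lambda = \beta^{-1}$ the two predictors solve the same linear system, so the means should coincide up to rounding. What the GP adds is the predictive variance, which kernel ridge regression does not provide.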