

  1. CS 59000 Statistical Machine Learning, Lecture 15 Yuan (Alan) Qi, Purdue CS, Oct. 21, 2008

  2. Outline Review of Gaussian Processes (GPs) From linear regression to GP GP for regression Learning hyperparameters Automatic Relevance Determination GP for classification

  3. Gaussian Processes How do kernels arise naturally in a Bayesian setting? Instead of assigning a prior on the parameters w, we assign a prior directly on the function values y. The function space is infinite-dimensional in theory but finite in practice (we only ever evaluate the function at a finite number of training and test points).

  4. Linear Regression Revisited Let y(x) = w^T φ(x), with prior p(w) = N(w | 0, α^{-1} I). We have y = Φw, where y = (y(x_1), ..., y(x_N))^T and Φ is the design matrix with elements Φ_nk = φ_k(x_n).

  5. From Prior on Parameters to Prior on Function The prior on the function values: since y = Φw is a linear function of the Gaussian w, we get p(y) = N(y | 0, K), where K = α^{-1} Φ Φ^T, i.e. K_nm = k(x_n, x_m) = α^{-1} φ(x_n)^T φ(x_m).
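
As a concrete illustration of this construction, here is a minimal sketch (not from the slides) that builds K = α^{-1} Φ Φ^T from Gaussian basis functions and draws sample functions from the induced GP prior; the basis, the grid, and the value of α are all illustrative assumptions.

```python
import numpy as np

alpha = 2.0                                # prior precision: p(w) = N(w | 0, alpha^-1 I)
centers = np.linspace(0.0, 1.0, 9)         # Gaussian basis-function centers (assumed)
s = 0.1                                    # basis-function width (assumed)

def phi(x):
    """Design matrix of Gaussian basis functions, shape (len(x), M)."""
    return np.exp(-0.5 * (x[:, None] - centers[None, :])**2 / s**2)

x = np.linspace(0.0, 1.0, 100)             # grid where we inspect function values
Phi = phi(x)
K = Phi @ Phi.T / alpha                    # K_nm = alpha^-1 phi(x_n)^T phi(x_m)

# y = Phi w is a linear map of the Gaussian w, so y ~ N(0, K):
# a GP prior on function values. Draw three sample functions from it.
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(np.zeros(len(x)), K + 1e-10 * np.eye(len(x)), size=3)
```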

  6. Stochastic Process A stochastic process is specified by giving the joint distribution for any finite set of function values, in a consistent manner. (Loosely speaking, consistency means that marginalizing the joint distribution over some variables yields the same distribution as the one defined directly on the remaining subset.)

  7. Gaussian Processes The joint distribution of any finite set of function values is a multivariate Gaussian. Without any prior knowledge, we often set the mean to be 0. Then the GP is fully specified by the covariance: E[y(x_n) y(x_m)] = k(x_n, x_m).

  8. Impact of Kernel Function The covariance matrix is built from the kernel function, so the choice of kernel determines the character (e.g. smoothness) of the sampled functions. Applications: economics & finance.

  9. Gaussian Process for Regression Likelihood: p(t | y) = N(t | y, β^{-1} I), i.e. each target is t_n = y_n plus Gaussian noise. Prior: p(y) = N(y | 0, K). Marginal distribution: p(t) = ∫ p(t | y) p(y) dy = N(t | 0, C), where C = K + β^{-1} I.

  10. Samples of Data Points

  11. Predictive Distribution p(t_{N+1} | t_N) is a Gaussian distribution with mean and variance: m(x_{N+1}) = k^T C_N^{-1} t, σ²(x_{N+1}) = c − k^T C_N^{-1} k, where k has elements k_n = k(x_n, x_{N+1}) and c = k(x_{N+1}, x_{N+1}) + β^{-1}.

  12. Predictive Mean The predictive mean can be written m(x_{N+1}) = Σ_n a_n k(x_n, x_{N+1}), where a_n is the nth component of C_N^{-1} t. We see the same form as in kernel ridge regression and kernel PCA.
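
A minimal sketch of these prediction formulas, assuming a squared-exponential kernel and made-up 1-D data (the kernel choice and all parameter values are assumptions, not from the slides):

```python
import numpy as np

def rbf(a, b, theta=1.0, ell=0.3):
    """Squared-exponential kernel k(a, b) = theta * exp(-(a-b)^2 / (2 ell^2))."""
    return theta * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ell**2)

beta = 25.0                                    # noise precision (assumed)
rng = np.random.default_rng(1)
X = np.linspace(0.0, 1.0, 10)                  # toy training inputs
t = np.sin(2 * np.pi * X) + 0.2 * rng.standard_normal(10)

C = rbf(X, X) + np.eye(len(X)) / beta          # C_N = K + beta^-1 I

def predict(x_new):
    """Predictive mean k^T C_N^-1 t and variance c - k^T C_N^-1 k."""
    x_new = np.atleast_1d(float(x_new))
    k = rbf(X, x_new)[:, 0]                    # k_n = k(x_n, x_new)
    c = rbf(x_new, x_new)[0, 0] + 1.0 / beta   # c = k(x_new, x_new) + beta^-1
    v = np.linalg.solve(C, k)                  # C_N^-1 k via a solve, not an explicit inverse
    return k @ np.linalg.solve(C, t), c - k @ v

mean, var = predict(0.5)
```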

  13. GP Regression Discussion: what is the difference between GP regression and Bayesian regression with Gaussian basis functions?

  14. Computational Complexity GP prediction for a new data point: GP: O(N^3), where N is the number of data points. Basis function model: O(M^3), where M is the dimension of the feature expansion. When N is large: computationally expensive. Sparsification: make predictions based on only a few data points (essentially make N small).

  15. Learning Hyperparameters Empirical Bayes methods: maximize the log marginal likelihood with respect to the kernel hyperparameters θ, ln p(t | θ) = −(1/2) ln |C_N| − (1/2) t^T C_N^{-1} t − (N/2) ln(2π).
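
A sketch of empirical Bayes hyperparameter learning, assuming a squared-exponential kernel and SciPy's Nelder-Mead optimizer; the data, the log parametrization, and the optimizer choice are all illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, 20)                      # toy inputs
t = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(20)

def neg_log_marginal(log_params):
    """-ln p(t | theta) for theta = (amplitude, length scale, noise precision)."""
    theta, ell, beta = np.exp(log_params)          # log parametrization keeps them positive
    K = theta * np.exp(-0.5 * (X[:, None] - X[None, :])**2 / ell**2)
    C = K + np.eye(len(X)) / beta                  # C_N = K + beta^-1 I
    L = np.linalg.cholesky(C)                      # ln|C_N| = 2 * sum(log(diag(L)))
    Cinv_t = np.linalg.solve(L.T, np.linalg.solve(L, t))
    return (np.log(np.diag(L)).sum()               # (1/2) ln |C_N|
            + 0.5 * t @ Cinv_t                     # (1/2) t^T C_N^-1 t
            + 0.5 * len(X) * np.log(2 * np.pi))

res = minimize(neg_log_marginal, x0=np.zeros(3), method="Nelder-Mead")
theta_hat, ell_hat, beta_hat = np.exp(res.x)       # empirical-Bayes estimates
```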

  16. Automatic Relevance Determination Consider two-dimensional problems with a kernel of the form k(x, x') = θ_0 exp(−(1/2) Σ_i η_i (x_i − x_i')²). Maximizing the marginal likelihood will make certain η_i small, reducing the relevance of input dimension i to the prediction.
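
A sketch of an ARD kernel of the form above; the function name and the example η values are illustrative assumptions:

```python
import numpy as np

def ard_kernel(XA, XB, theta0, eta):
    """ARD kernel k(x, x') = theta0 * exp(-1/2 * sum_i eta_i (x_i - x'_i)^2).

    XA: (N, D), XB: (M, D), eta: (D,) per-dimension inverse length scales.
    A small eta_i flattens the kernel along dimension i, marking that
    input as irrelevant to the prediction.
    """
    diff2 = (XA[:, None, :] - XB[None, :, :])**2               # (N, M, D)
    return theta0 * np.exp(-0.5 * np.tensordot(diff2, eta, axes=([2], [0])))

# e.g. eta = [5.0, 0.01]: the second input barely affects the covariance
rng = np.random.default_rng(0)
K = ard_kernel(rng.random((4, 2)), rng.random((4, 2)),
               theta0=1.0, eta=np.array([5.0, 0.01]))
```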

  17. Example t = sin(2π x_1); x_2 = x_1 + n (a noisy copy of x_1); x_3 = ε (independent noise, irrelevant to the target).

  18. Gaussian Processes for Classification Likelihood: p(t_n | a_n) = σ(a_n)^{t_n} (1 − σ(a_n))^{1 − t_n}, where t_n ∈ {0, 1} and σ is the logistic sigmoid. GP prior: p(a_N) = N(a_N | 0, C_N). Covariance function: C(x_n, x_m) = k(x_n, x_m) + ν δ_nm (the ν term ensures the covariance is positive definite).

  19. Sample from GP Prior

  20. Predictive Distribution p(t_{N+1} = 1 | t_N) = ∫ σ(a_{N+1}) p(a_{N+1} | t_N) da_{N+1}. No analytical solution. Approximate this integration with: Laplace's method, variational Bayes, or expectation propagation.

  21. Laplace's method for GP Classification (1) We need p(a_{N+1} | t_N) = ∫ p(a_{N+1} | a_N) p(a_N | t_N) da_N. The posterior p(a_N | t_N) ∝ p(a_N) p(t_N | a_N) is intractable; Laplace's method approximates it by a Gaussian centered at its mode.

  22. Laplace's method for GP Classification (2) Taylor expansion: let Ψ(a_N) = ln p(a_N) + ln p(t_N | a_N). Then ∇Ψ(a_N) = t_N − σ_N − C_N^{-1} a_N and ∇∇Ψ(a_N) = −W_N − C_N^{-1}, where σ_N = σ(a_N) elementwise and W_N = diag(σ_n (1 − σ_n)).

  23. Laplace's method for GP Classification (3) Newton-Raphson update: a_N^new = C_N (I + W_N C_N)^{-1} (t_N − σ_N + W_N a_N); iterating to convergence gives the mode a*_N.

  24. Laplace's method for GP Classification (4) Gaussian approximation: q(a_N) = N(a_N | a*_N, H^{-1}), where H = W_N + C_N^{-1} is the negative Hessian of Ψ at the mode.
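
Putting slides 21-24 together, a sketch of the mode-finding loop under the Laplace approximation; the toy data, kernel, and jitter value ν are assumptions, not from the slides:

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, n_iter=20):
    """Find the mode a* of p(a_N | t_N) by Newton-Raphson.

    Update: a_new = C (I + W C)^-1 (t - sigma + W a),
    where W = diag(sigma_n (1 - sigma_n)).
    """
    N = len(t)
    a = np.zeros(N)
    for _ in range(n_iter):
        sig = sigmoid(a)
        W = np.diag(sig * (1 - sig))
        a = C @ np.linalg.solve(np.eye(N) + W @ C, t - sig + W @ a)
    sig = sigmoid(a)
    return a, np.diag(sig * (1 - sig))          # mode a* and W_N at the mode

# Toy binary data: labels t_n in {0, 1}, covariance C = K + nu*I
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, 15)
t = (X > 0).astype(float)
K = np.exp(-0.5 * (X[:, None] - X[None, :])**2 / 0.3**2)
C = K + 1e-4 * np.eye(15)                       # C_nm = k(x_n, x_m) + nu*delta_nm
a_star, W = laplace_mode(C, t)
# Gaussian approximation: q(a_N) = N(a_N | a*, (W_N + C_N^-1)^-1)
```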

  25. Laplace's method for GP Classification (5) With q(a_N) in hand, p(a_{N+1} | t_N) is approximately Gaussian with mean k^T (t_N − σ_N) and variance c − k^T (W_N^{-1} + C_N)^{-1} k. Question: how do we get the mean and the variance above?

  26. Predictive Distribution p(t_{N+1} = 1 | t_N) ≈ ∫ σ(a_{N+1}) q(a_{N+1}) da_{N+1}. The convolution of the logistic sigmoid with a Gaussian N(a | μ, σ²) can be approximated by σ(κ(σ²) μ), where κ(σ²) = (1 + πσ²/8)^{−1/2}.
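
Continuing the sketch after slide 24, a hypothetical helper that evaluates this approximate predictive probability; it assumes the same squared-exponential kernel and reuses a_star, W, C, X, t from that sketch:

```python
import numpy as np

def predict_class(x_new, X, t, C, W, a_star, ell=0.3, nu=1e-4):
    """Approximate p(t_new = 1 | t_N) under the Laplace approximation."""
    k = np.exp(-0.5 * (X - x_new)**2 / ell**2)         # k_n = k(x_n, x_new)
    c = 1.0 + nu                                       # k(x_new, x_new) + nu for this kernel
    sig = 1.0 / (1.0 + np.exp(-a_star))
    mu = k @ (t - sig)                                 # E[a_new | t_N] = k^T (t_N - sigma_N)
    # var[a_new | t_N] = c - k^T (W_N^-1 + C_N)^-1 k
    s2 = c - k @ np.linalg.solve(np.linalg.inv(W) + C, k)
    kappa = 1.0 / np.sqrt(1.0 + np.pi * s2 / 8.0)      # sigmoid-Gaussian approximation
    return 1.0 / (1.0 + np.exp(-kappa * mu))

p = predict_class(0.3, X, t, C, W, a_star)             # reuses the previous sketch's variables
```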

  27. Example
