
CS 59000 Statistical Machine Learning, Lecture 16

This lecture covers the topics of Gaussian process regression, learning hyperparameters, automatic relevance determination, Gaussian processes for classification, and support vector machines. The lecture also discusses the course projects and paper presentations.


Presentation Transcript


  1. CS 59000 Statistical Machine Learning, Lecture 16. Yuan (Alan) Qi, Purdue CS, Oct. 23, 2008

  2. Outline Information about paper presentations and course projects. Review of Gaussian process regression. Learning hyperparameters. Automatic Relevance Determination. GP for classification. Support Vector Machines.

  3. Course Projects Second-to-last week: paper presentations. Last week: project presentations. 21 registered students, 6 groups (3-4 people per group); 20 minutes per group plus 5 minutes for questions.

  4. Paper Presentation Each group presents one recent paper from a top conference or journal on machine learning, bioinformatics, or computer vision, e.g., NIPS, ICML, UAI, RECOMB, ISMB, JMLR; the choice of paper is up to you. Format: define the problem to solve and describe the challenges; present the algorithm/model in a nutshell, highlighting its essence; show results; discuss the strengths and weaknesses of the paper.

  5. Project Topics Anything related to the course material: new methods, theoretical proofs, or applications. Novelty is appreciated: new algorithms or applications, or proofs of open questions.

  6. Review: Gaussian Process for Regression Likelihood: $p(\mathbf{t}|\mathbf{y}) = \mathcal{N}(\mathbf{t}|\mathbf{y}, \beta^{-1}\mathbf{I}_N)$. Prior: $p(\mathbf{y}) = \mathcal{N}(\mathbf{y}|\mathbf{0}, \mathbf{K})$. Marginal distribution: $p(\mathbf{t}) = \int p(\mathbf{t}|\mathbf{y})\,p(\mathbf{y})\,d\mathbf{y} = \mathcal{N}(\mathbf{t}|\mathbf{0}, \mathbf{C})$, where $\mathbf{C} = \mathbf{K} + \beta^{-1}\mathbf{I}_N$.

  7. Predictive Distribution $p(t_{N+1}|\mathbf{t}_N)$ is a Gaussian distribution with mean and variance: $m(\mathbf{x}_{N+1}) = \mathbf{k}^{\mathrm{T}}\mathbf{C}_N^{-1}\mathbf{t}$ and $\sigma^2(\mathbf{x}_{N+1}) = c - \mathbf{k}^{\mathrm{T}}\mathbf{C}_N^{-1}\mathbf{k}$, where $\mathbf{k}$ has elements $k(\mathbf{x}_n, \mathbf{x}_{N+1})$ and $c = k(\mathbf{x}_{N+1}, \mathbf{x}_{N+1}) + \beta^{-1}$.
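
A minimal NumPy sketch of these two formulas. The squared-exponential kernel with unit signal variance and the fixed noise precision `beta` are illustrative choices, not taken from the slides:

```python
import numpy as np

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential kernel: k(x, x') = exp(-||x - x'||^2 / (2 l^2))."""
    sq_dists = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq_dists / length_scale ** 2)

def gp_predict(X_train, t_train, X_test, beta=25.0, length_scale=1.0):
    """Predictive mean m(x*) = k^T C_N^{-1} t and variance c - k^T C_N^{-1} k."""
    K = rbf_kernel(X_train, X_train, length_scale)
    C_N = K + np.eye(len(X_train)) / beta           # C_N = K + beta^{-1} I
    k = rbf_kernel(X_train, X_test, length_scale)   # N x M cross-covariances
    c = 1.0 + 1.0 / beta                            # k(x*, x*) + beta^{-1}; k(x, x) = 1 here
    mean = k.T @ np.linalg.solve(C_N, t_train)
    var = c - np.einsum('nm,nm->m', k, np.linalg.solve(C_N, k))
    return mean, var
```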

  8. Computational Complexity GP prediction for a new data point: GP: O(N^3), where N is the number of data points; basis-function model: O(M^3), where M is the dimension of the feature expansion. When N is large, GP prediction is computationally expensive. Sparsification: make predictions based on only a few data points (essentially making N small).

  9. Learning Hyperparameters Empirical Bayes methods: maximize the marginal likelihood $p(\mathbf{t}|\boldsymbol{\theta})$ with respect to the kernel hyperparameters $\boldsymbol{\theta}$, e.g., by gradient-based optimization of $\ln p(\mathbf{t}|\boldsymbol{\theta})$.
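
A hedged sketch of empirical Bayes: minimize the negative log marginal likelihood $\frac{1}{2}\ln|\mathbf{C}_N| + \frac{1}{2}\mathbf{t}^{\mathrm{T}}\mathbf{C}_N^{-1}\mathbf{t} + \frac{N}{2}\ln 2\pi$ over a single RBF length-scale. The kernel form and the fixed noise precision `beta` are assumptions for the example:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_marginal(log_l, X, t, beta=25.0):
    """Negative log marginal likelihood as a function of the log length-scale."""
    l = np.exp(log_l)                                    # log-space keeps l positive
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    C = np.exp(-0.5 * sq / l ** 2) + np.eye(len(X)) / beta   # C_N = K + beta^{-1} I
    sign, logdet = np.linalg.slogdet(C)
    return 0.5 * (logdet + t @ np.linalg.solve(C, t) + len(X) * np.log(2 * np.pi))

# Example usage (X_train: N x D inputs, t_train: length-N targets):
# res = minimize_scalar(neg_log_marginal, args=(X_train, t_train))
# learned_length_scale = np.exp(res.x)
```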

  10. Automatic Relevance Determination Consider a two-dimensional problem with one scale parameter $\eta_i$ per input dimension. Maximizing the marginal likelihood will make certain $\eta_i$ small, reducing the relevance of the corresponding input dimension to prediction.
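
An illustrative ARD kernel of the standard form $k(\mathbf{x}, \mathbf{x}') = \theta_0 \exp\{-\frac{1}{2}\sum_i \eta_i (x_i - x_i')^2\}$; the parameterization is the usual ARD choice and an assumption about what the slide showed:

```python
import numpy as np

def ard_kernel(X1, X2, eta, theta0=1.0):
    """ARD kernel with one inverse squared length-scale eta_i per dimension.

    Dimensions whose eta_i is driven toward zero by marginal-likelihood
    maximization contribute little to the covariance, i.e. are switched off.
    """
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2   # pairwise squared differences
    return theta0 * np.exp(-0.5 * (diff2 * eta).sum(-1))
```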

  11. Gaussian Processes for Classification Likelihood: $p(t|a) = \sigma(a)^t (1-\sigma(a))^{1-t}$. GP prior: $p(\mathbf{a}_{N+1}) = \mathcal{N}(\mathbf{a}_{N+1}|\mathbf{0}, \mathbf{C}_{N+1})$. Covariance function: $C(\mathbf{x}_n, \mathbf{x}_m) = k(\mathbf{x}_n, \mathbf{x}_m) + \nu\,\delta_{nm}$.

  12. Sample from GP Prior

  13. Predictive Distribution $p(t_{N+1}=1|\mathbf{t}_N) = \int \sigma(a_{N+1})\,p(a_{N+1}|\mathbf{t}_N)\,da_{N+1}$. There is no analytical solution. Approximate this integration with: Laplace’s method, variational Bayes, or expectation propagation.

  14. Laplace’s method for GP Classification (1) Goal: approximate $p(a_{N+1}|\mathbf{t}_N) = \int p(a_{N+1}|\mathbf{a}_N)\,p(\mathbf{a}_N|\mathbf{t}_N)\,d\mathbf{a}_N$ by replacing the intractable posterior $p(\mathbf{a}_N|\mathbf{t}_N) \propto p(\mathbf{t}_N|\mathbf{a}_N)\,p(\mathbf{a}_N)$ with a Gaussian.

  15. Laplace’s method for GP Classification (2) Taylor expansion: expand $\Psi(\mathbf{a}_N) = \ln p(\mathbf{a}_N) + \ln p(\mathbf{t}_N|\mathbf{a}_N)$ to second order around its mode; the gradient and Hessian are $\nabla\Psi = \mathbf{t}_N - \boldsymbol{\sigma}_N - \mathbf{C}_N^{-1}\mathbf{a}_N$ and $\nabla\nabla\Psi = -\mathbf{W}_N - \mathbf{C}_N^{-1}$, where $\mathbf{W}_N = \mathrm{diag}\{\sigma(a_n)(1-\sigma(a_n))\}$.

  16. Laplace’s method for GP Classification (3) Newton-Raphson update: $\mathbf{a}_N^{\mathrm{new}} = \mathbf{C}_N(\mathbf{I} + \mathbf{W}_N\mathbf{C}_N)^{-1}\{\mathbf{t}_N - \boldsymbol{\sigma}_N + \mathbf{W}_N\mathbf{a}_N\}$, iterated until convergence to the mode $\mathbf{a}^\star$.
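
A minimal sketch of this Newton-Raphson mode search, assuming 0/1 targets and a logistic sigmoid likelihood; the fixed iteration count is an illustrative choice (a convergence check would be used in practice):

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def laplace_mode(C, t, n_iter=20):
    """Find the mode of p(a_N | t_N) for GP classification via Newton-Raphson.

    C: N x N prior covariance over the latent values a_n; t: 0/1 targets.
    """
    a = np.zeros(len(t))
    I = np.eye(len(t))
    for _ in range(n_iter):
        sigma = sigmoid(a)
        W = np.diag(sigma * (1.0 - sigma))   # negative Hessian of the log likelihood
        # a_new = C (I + W C)^{-1} (t - sigma + W a)
        a = C @ np.linalg.solve(I + W @ C, t - sigma + W @ a)
    return a
```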

  17. Laplace’s method for GP Classification (4) Gaussian approximation: $q(\mathbf{a}_N) = \mathcal{N}(\mathbf{a}_N\,|\,\mathbf{a}^\star, \mathbf{H}^{-1})$, with $\mathbf{H} = \mathbf{W}_N + \mathbf{C}_N^{-1}$ evaluated at the mode.

  18. Laplace’s method for GP Classification (5) Question: how do we get the mean and the variance above?

  19. Predictive Distribution Combining $q(a_{N+1})$ with the sigmoid likelihood gives $p(t_{N+1}=1|\mathbf{t}_N) \simeq \int \sigma(a_{N+1})\,q(a_{N+1})\,da_{N+1}$, which can be evaluated using the probit approximation to the logistic sigmoid.
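
A small sketch of evaluating this integral, assuming the standard probit-style approximation $\sigma(\kappa(\sigma_a^2)\,\mu_a)$ with $\kappa(\sigma^2) = (1 + \pi\sigma^2/8)^{-1/2}$:

```python
import numpy as np

def predictive_prob(mu_a, var_a):
    """Approximate integral of sigmoid(a) * N(a | mu_a, var_a) da."""
    kappa = 1.0 / np.sqrt(1.0 + np.pi * var_a / 8.0)
    return 1.0 / (1.0 + np.exp(-kappa * mu_a))
```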

  20. Example

  21. Support Vector Machines Support vector machines are motivated by statistical learning theory; they are maximum-margin classifiers. Margin: the smallest distance between the decision boundary and any of the samples.

  22. Distance of a Data Point to the Hyperplane Consider data points that are correctly classified, so that $t_n y(\mathbf{x}_n) > 0$. The distance of a data point to the hyperplane is $t_n y(\mathbf{x}_n)/\|\mathbf{w}\| = t_n(\mathbf{w}^{\mathrm{T}}\phi(\mathbf{x}_n) + b)/\|\mathbf{w}\|$.

  23. Maximizing the Margin Since rescaling $\mathbf{w}$ and $b$ together does not change the above ratio, we set $t_n(\mathbf{w}^{\mathrm{T}}\phi(\mathbf{x}_n) + b) = 1$ for the point closest to the decision surface, so that all points satisfy $t_n(\mathbf{w}^{\mathrm{T}}\phi(\mathbf{x}_n) + b) \geq 1$. Data points for which the equality holds are said to be active constraints; for the remainder the constraints are inactive.

  24. Reformulating the Optimization Problem Quadratic programming: minimize $\frac{1}{2}\|\mathbf{w}\|^2$ subject to $t_n(\mathbf{w}^{\mathrm{T}}\phi(\mathbf{x}_n) + b) \geq 1$, $n = 1, \ldots, N$.

  25. Using Lagrange Multipliers $L(\mathbf{w}, b, \mathbf{a}) = \frac{1}{2}\|\mathbf{w}\|^2 - \sum_{n=1}^N a_n\{t_n(\mathbf{w}^{\mathrm{T}}\phi(\mathbf{x}_n) + b) - 1\}$, with multipliers $a_n \geq 0$.

  26. Dual Variables Setting the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ to zero: $\mathbf{w} = \sum_{n=1}^N a_n t_n \phi(\mathbf{x}_n)$ and $\sum_{n=1}^N a_n t_n = 0$.

  27. Dual Problem Maximize $\tilde{L}(\mathbf{a}) = \sum_{n=1}^N a_n - \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N a_n a_m t_n t_m k(\mathbf{x}_n, \mathbf{x}_m)$ subject to $a_n \geq 0$ and $\sum_{n=1}^N a_n t_n = 0$, where $k(\mathbf{x}_n, \mathbf{x}_m) = \phi(\mathbf{x}_n)^{\mathrm{T}}\phi(\mathbf{x}_m)$.
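
A hedged sketch that solves this dual with a generic SciPy solver rather than a dedicated QP package; SLSQP is an illustrative choice (production SVM libraries use specialized solvers such as SMO):

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(K, t):
    """Solve the hard-margin SVM dual given kernel matrix K and +/-1 labels t."""
    N = len(t)
    Q = (t[:, None] * t[None, :]) * K                 # Q_nm = t_n t_m k(x_n, x_m)

    def neg_dual(a):                                  # negate to minimize
        return 0.5 * a @ Q @ a - a.sum()

    cons = ({'type': 'eq', 'fun': lambda a: a @ t},)  # sum_n a_n t_n = 0
    bounds = [(0.0, None)] * N                        # a_n >= 0
    res = minimize(neg_dual, np.zeros(N), method='SLSQP',
                   bounds=bounds, constraints=cons)
    return res.x
```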

  28. Computational Complexity Quadratic programming: when the feature dimension is smaller than the number of data points, solving the dual problem (one variable per data point) is more costly than solving the primal. The dual representation, however, allows the use of kernels.

  29. Prediction $y(\mathbf{x}) = \sum_{n=1}^N a_n t_n k(\mathbf{x}, \mathbf{x}_n) + b$; only the support vectors (those with $a_n > 0$) contribute to the sum.
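
A small sketch of this prediction rule, assuming the multipliers come from the dual solver above; the bias $b$ is recovered by averaging over support vectors, identified here with an illustrative tolerance:

```python
import numpy as np

def svm_predict(a, t, K_train, k_test, tol=1e-6):
    """y(x) = sum_n a_n t_n k(x, x_n) + b.

    K_train: N x N training kernel matrix; k_test: N x M test kernel values.
    """
    sv = a > tol                               # support-vector mask
    y_sv = (a * t) @ K_train[:, sv]            # y at the support vectors, without bias
    b = np.mean(t[sv] - y_sv)                  # average t_n - sum_m a_m t_m k(x_n, x_m)
    return (a * t) @ k_test + b
```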
