
CS 59000 Statistical Machine Learning, Lecture 3


Presentation Transcript


  1. CS 59000 Statistical Machine Learning, Lecture 3 Yuan (Alan) Qi (alanqi@cs.purdue.edu) Sept. 2008

  2. Review: Bayes’ Theorem posterior ∝ likelihood × prior
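For concreteness, the proportionality on this slide is the usual statement of Bayes’ theorem for parameters w and data D (a standard identity, not specific to these slides):

```latex
p(\mathbf{w} \mid \mathcal{D})
= \frac{p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})}{p(\mathcal{D})}
\;\propto\; p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w}),
\qquad
p(\mathcal{D}) = \int p(\mathcal{D} \mid \mathbf{w})\, p(\mathbf{w})\, d\mathbf{w}.
```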

  3. The Multivariate Gaussian
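The density referred to here (shown only as an image in the original slide) is the standard multivariate Gaussian in D dimensions:

```latex
\mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma})
= \frac{1}{(2\pi)^{D/2}\, |\boldsymbol{\Sigma}|^{1/2}}
\exp\!\left\{ -\tfrac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\mathsf{T}} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right\}.
```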

  4. Maximum (Log) Likelihood

  5. Maximum Likelihood for Regression Determine w_ML by minimizing the sum-of-squares error, E(w).
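A minimal sketch of this step in Python, assuming the usual sinusoidal toy data from the curve-fitting example (the data, the polynomial order M, and names such as Phi and w_ml are illustrative):

```python
import numpy as np

# Hypothetical toy data: noisy samples of sin(2*pi*x).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=10)
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)

M = 3                                        # polynomial order
Phi = np.vander(x, M + 1, increasing=True)   # design matrix, columns x^0 ... x^M

# w_ML minimizes the sum-of-squares error E(w) = 1/2 * sum_n (w^T phi(x_n) - t_n)^2.
w_ml, *_ = np.linalg.lstsq(Phi, t, rcond=None)
print("ML weights:", w_ml)
```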

  6. Predictive Distribution by ML

  7. MAP: A Step towards Bayes Determine w_MAP by minimizing the regularized sum-of-squares error, Ẽ(w) = E(w) + (λ/2)‖w‖².
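A sketch of the corresponding computation, reusing the design matrix Phi from the sketch above; lam plays the role of the regularization coefficient λ:

```python
import numpy as np

def map_weights(Phi, t, lam):
    """MAP / regularized least squares: minimizes E(w) + (lam/2) * ||w||^2,
    i.e. w_MAP = (Phi^T Phi + lam * I)^(-1) Phi^T t."""
    D = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ t)
```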

  8. Bayesian Curve Fitting

  9. Bayesian Predictive (Posterior) Distribution

  10. Decision Theory Inference step: determine either p(t|x) or p(x, t). Decision step: for given x, determine the optimal t.

  11. Minimum Misclassification Rate

  12. Minimum Expected Loss Decision regions R_j are chosen to minimize the expected loss E[L] = Σ_k Σ_j ∫_{R_j} L_{kj} p(x, C_k) dx.

  13. The Squared Loss Function Minimize the expected squared loss E[L] = ∫∫ {y(x) − t}² p(x, t) dx dt.
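Completing the argument: minimizing the expected squared loss over all functions y(x) gives the regression function, i.e. the conditional mean of t (the standard PRML result):

```latex
\mathbb{E}[L] = \iint \{y(\mathbf{x}) - t\}^2\, p(\mathbf{x}, t)\, d\mathbf{x}\, dt
\quad\Longrightarrow\quad
y^{*}(\mathbf{x}) = \mathbb{E}_t[t \mid \mathbf{x}] = \int t\, p(t \mid \mathbf{x})\, dt .
```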

  14. (Differential) Entropy For discrete random variables, H[x] = −Σ_x p(x) ln p(x); for continuous random variables (differential entropy), H[x] = −∫ p(x) ln p(x) dx. An important quantity in coding theory, statistical physics, and machine learning; it measures randomness.
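A small sketch of the discrete case in Python (the function name and the example distributions are illustrative):

```python
import numpy as np

def entropy(p, base=2.0):
    """H[x] = -sum_i p_i log p_i for a discrete distribution, with 0 log 0 taken as 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)) / np.log(base))

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin is maximally random
print(entropy([1.0, 0.0]))   # 0.0 bits: a deterministic outcome has no randomness
print(entropy([0.25] * 4))   # 2.0 bits: uniform over 4 outcomes
```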

  15. The Kullback-Leibler Divergence
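A sketch of the discrete KL divergence, following the same conventions as the entropy sketch above:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) = sum_i p_i ln(p_i / q_i); requires q_i > 0 wherever p_i > 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

print(kl_divergence([0.5, 0.5], [0.9, 0.1]))  # positive: q is a poor model of p
print(kl_divergence([0.5, 0.5], [0.5, 0.5]))  # 0.0: KL vanishes iff p == q
```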

  16. Conditional Entropy & Mutual Information

  17. Parametric Distributions Basic building blocks: parametric densities p(x|θ). Need to determine θ given observations {x_1, …, x_N}. Representation: a point estimate of θ, or a posterior distribution over θ? Recall curve fitting (ML vs. Bayesian treatment).

  18. Binary Variables (1) Coin flipping: heads = 1, tails = 0, with p(x = 1|μ) = μ. Bernoulli distribution: Bern(x|μ) = μ^x (1 − μ)^(1−x).

  19. Binary Variables (2) N coin flips: the number of heads m follows the binomial distribution Bin(m|N, μ) = (N choose m) μ^m (1 − μ)^(N−m).
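As a quick numerical illustration (the values of N and mu are arbitrary), the binomial pmf can be evaluated with SciPy:

```python
from scipy.stats import binom

# Bin(m | N, mu): probability of m heads in N independent Bernoulli(mu) flips.
N, mu = 10, 0.25
for m in range(N + 1):
    print(f"P(m={m}) = {binom.pmf(m, N, mu):.4f}")
```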

  20. Binomial Distribution

  21. ML Parameter Estimation for Bernoulli (1) Given D = {x_1, …, x_N} with m heads (and N − m tails), maximizing the log likelihood gives μ_ML = m/N.

  22. ML Parameter Estimation for Bernoulli (2) Example: if every observed toss lands heads, μ_ML = 1. Prediction: all future tosses will land heads up, i.e. overfitting to D.

  23. Beta Distribution Distribution over μ ∈ [0, 1].

  24. Beta Distribution

  25. Bayesian Bernoulli The Beta distribution provides the conjugate prior for the Bernoulli distribution.

  26. Prior ∙ Likelihood = Posterior

  27. Properties of the Posterior As the size of the data set, N, increases, the posterior becomes more sharply peaked: its mean approaches the ML estimate and its variance shrinks toward zero.

  28. Prediction under the Posterior What is the probability that the next coin toss will land heads up? Evaluate the predictive posterior distribution p(x = 1|D) = ∫ μ p(μ|D) dμ = E[μ|D].
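A minimal sketch of the conjugate update and predictive probability; the prior hyperparameters and the data are illustrative, chosen to echo the overfitting example on slide 22:

```python
# Beta(a, b) prior over mu, Bernoulli likelihood.
a, b = 2.0, 2.0
data = [1, 1, 1]                 # three heads in a row
m = sum(data)                    # number of heads
l = len(data) - m                # number of tails

# Conjugacy: the posterior is Beta(a + m, b + l); the predictive P(x=1 | D) is its mean.
a_post, b_post = a + m, b + l
p_heads = a_post / (a_post + b_post)
print(p_heads)                   # 5/7 ~ 0.714, not the overfitted ML answer of 1.0
```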

  29. Multinomial Variables 1-of-K coding scheme: x is a K-dimensional vector with exactly one element equal to 1 and the rest 0, e.g. x = (0, 0, 1, 0, 0, 0)^T.

  30. ML Parameter Estimation Given D with counts m_k = Σ_n x_{nk}: to ensure Σ_k μ_k = 1, use a Lagrange multiplier, λ.
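Spelling out the constrained maximization referred to here: maximizing the log likelihood subject to Σ_k μ_k = 1 via the Lagrangian gives

```latex
\sum_{k=1}^{K} m_k \ln \mu_k + \lambda \Big( \sum_{k=1}^{K} \mu_k - 1 \Big)
\;\;\Longrightarrow\;\;
\mu_k = -\frac{m_k}{\lambda}, \qquad
\lambda = -N, \qquad
\mu_k^{\mathrm{ML}} = \frac{m_k}{N}.
```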

  31. The Multinomial Distribution

  32. The Dirichlet Distribution Conjugate prior for the multinomial distribution.

  33. Bayesian Multinomial (1)

  34. The Gaussian Distribution

  35. Central Limit Theorem The distribution of the sum of N i.i.d. random variables becomes increasingly Gaussian as N grows. Example: N uniform [0,1] random variables.
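A quick simulation of the uniform example (the sample sizes and seed are arbitrary): the mean of N uniform variables concentrates around 0.5 and its histogram becomes increasingly bell-shaped as N grows:

```python
import numpy as np

rng = np.random.default_rng(0)
for N in (1, 2, 10):
    means = rng.uniform(size=(100_000, N)).mean(axis=1)
    # The mean stays near 0.5 while the variance shrinks like 1/(12 N), as the CLT suggests.
    print(N, round(means.mean(), 3), round(means.var(), 4))
```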

  36. Geometry of the Multivariate Gaussian

  37. Moments of the Multivariate Gaussian (1) First moment: E[x] = μ; after the change of variables z = x − μ, the term linear in z vanishes thanks to the anti-symmetry of z.

  38. Moments of the Multivariate Gaussian (2)

  39. Partitioned Gaussian Distributions

  40. Partitioned Conditionals and Marginals

  41. Partitioned Conditionals and Marginals

  42. Bayes’ Theorem for Gaussian Variables Given a Gaussian marginal p(x) and a linear-Gaussian conditional p(y|x), the marginal p(y) and the posterior p(x|y) are also Gaussian, with the forms given below.
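The formulas that appeared as images on this slide are the standard linear-Gaussian results (PRML §2.3.3):

```latex
p(\mathbf{x}) = \mathcal{N}(\mathbf{x} \mid \boldsymbol{\mu}, \boldsymbol{\Lambda}^{-1}),
\qquad
p(\mathbf{y} \mid \mathbf{x}) = \mathcal{N}(\mathbf{y} \mid \mathbf{A}\mathbf{x} + \mathbf{b}, \mathbf{L}^{-1})
\\[4pt]
p(\mathbf{y}) = \mathcal{N}\!\big(\mathbf{y} \mid \mathbf{A}\boldsymbol{\mu} + \mathbf{b},\;
\mathbf{L}^{-1} + \mathbf{A}\boldsymbol{\Lambda}^{-1}\mathbf{A}^{\mathsf{T}}\big)
\\[4pt]
p(\mathbf{x} \mid \mathbf{y}) = \mathcal{N}\!\big(\mathbf{x} \mid
\boldsymbol{\Sigma}\{\mathbf{A}^{\mathsf{T}}\mathbf{L}(\mathbf{y} - \mathbf{b}) + \boldsymbol{\Lambda}\boldsymbol{\mu}\},\;
\boldsymbol{\Sigma}\big),
\qquad
\boldsymbol{\Sigma} = (\boldsymbol{\Lambda} + \mathbf{A}^{\mathsf{T}}\mathbf{L}\mathbf{A})^{-1}.
```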

  43. Maximum Likelihood for the Gaussian (1) Given i.i.d. data X = {x_1, …, x_N}, the log likelihood function is ln p(X|μ, Σ) = −(ND/2) ln 2π − (N/2) ln|Σ| − (1/2) Σ_n (x_n − μ)ᵀ Σ⁻¹ (x_n − μ). The sufficient statistics are Σ_n x_n and Σ_n x_n x_nᵀ.

  44. Maximum Likelihood for the Gaussian (2) Set the derivative of the log likelihood function to zero and solve to obtain μ_ML = (1/N) Σ_n x_n; similarly, Σ_ML = (1/N) Σ_n (x_n − μ_ML)(x_n − μ_ML)ᵀ.
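A compact numerical version of these estimators (the function name is illustrative):

```python
import numpy as np

def gaussian_ml(X):
    """ML estimates for a multivariate Gaussian from an (N, D) data matrix X:
    mu_ML is the sample mean; Sigma_ML averages the outer products of the
    centred data and divides by N, so it is the (biased) ML estimator."""
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = (Xc.T @ Xc) / X.shape[0]
    return mu, Sigma
```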

  45. Maximum Likelihood for the Gaussian (3) Under the true distribution, E[μ_ML] = μ but E[Σ_ML] = ((N − 1)/N) Σ, so Σ_ML is biased. Hence define the unbiased estimator Σ̃ = (1/(N − 1)) Σ_n (x_n − μ_ML)(x_n − μ_ML)ᵀ.

  46. End

  47. Sequential Estimation Contribution of the Nth data point, x_N: μ_ML^(N) = μ_ML^(N−1) + (1/N)(x_N − μ_ML^(N−1)), i.e. the old estimate plus a correction given x_N, weighted by 1/N.
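A sketch of the sequential update; after processing all N points it reproduces the batch ML mean exactly:

```python
import numpy as np

def update_mean(mu_old, x_new, n):
    """mu^(n) = mu^(n-1) + (1/n) * (x_n - mu^(n-1)): old estimate plus a
    correction given x_n, with correction weight 1/n."""
    return mu_old + (x_new - mu_old) / n

rng = np.random.default_rng(0)
data = rng.normal(size=100)
mu = 0.0
for n, x in enumerate(data, start=1):
    mu = update_mean(mu, x, n)
print(mu, data.mean())   # identical up to floating-point error
```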

  48. The Robbins-Monro Algorithm (1) Consider θ and z governed by the joint distribution p(z, θ), and define the regression function f(θ) = E[z|θ] = ∫ z p(z|θ) dz. Seek the root θ* such that f(θ*) = 0.

  49. The Robbins-Monro Algorithm (2) Assume we are given samples from p(z|θ), one at a time. Apply the sequential update θ^(N) = θ^(N−1) − a_{N−1} z(θ^(N−1)), which converges to the root θ* provided the step sizes satisfy a_N → 0, Σ_N a_N = ∞, and Σ_N a_N² < ∞.
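A minimal Robbins-Monro sketch under assumptions of my own (the target function and step sizes are illustrative, not from the slides): we seek the root of f(θ) = E[z|θ] using only noisy observations z, with step sizes a_N satisfying the conditions above:

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_observation(theta):
    # Hypothetical example: f(theta) = E[z | theta] = theta - 3, observed with Gaussian noise.
    return (theta - 3.0) + rng.normal(scale=1.0)

theta = 0.0
for n in range(1, 10_001):
    a_n = 1.0 / n                      # satisfies the Robbins-Monro step-size conditions
    theta = theta - a_n * noisy_observation(theta)
print(theta)                           # converges toward the root theta* = 3
```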
