
Machine Learning Unit 5, Introduction to Artificial Intelligence, Stanford online course

Presentation Transcript


  1. Machine Learning, Unit 5, Introduction to Artificial Intelligence, Stanford online course. Made by: Maor Levy, Temple University, 2012

  2. The unsupervised learning problem: many data points, no labels

  3. Unsupervised Learning? • Google Street View

  4. K-Means: many data points, no labels

  5. K-Means • Choose a fixed number of clusters • Choose cluster centers and point-cluster allocations to minimize error • Can't do this by exhaustive search, because there are too many possible allocations • Algorithm: alternate two steps: fix the cluster centers and allocate each point to its closest cluster; then fix the allocation and recompute the best cluster centers • The points x can be any set of features for which we can compute a distance (be careful about scaling) * From Marc Pollefeys COMP 256 2003
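As a companion to the alternation described on this slide, here is a minimal K-Means sketch in Python; the function name, the use of NumPy, and the convergence test are my own choices rather than anything from the slides.

```python
import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Lloyd's algorithm: alternate point allocation and center updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # initial centers
    for _ in range(n_iters):
        # Allocation step: assign each point to the closest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=-1)
        labels = dists.argmin(axis=1)
        # Update step: recompute each center as the mean of its allocated points
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels
```

Each of the two steps can only reduce the within-cluster squared error, which is why the alternation settles down, though only at a local optimum that depends on the initial centers.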

  6. K-Means

  7. K-Means * From Marc Pollefeys COMP 256 2003

  8. Results of K-Means Clustering: original image, clusters on intensity, clusters on color (K-means clustering using intensity alone and color alone) * From Marc Pollefeys COMP 256 2003

  9. K-Means • K-Means is an approximation to EM • Model (hypothesis space): mixture of N Gaussians • Latent variables: correspondence between data points and Gaussians • We notice: given the mixture model, it is easy to calculate the correspondence; given the correspondence, it is easy to estimate the mixture model

  10. Expectation Maximization: Idea • Data generated from a mixture of Gaussians • Latent variables: correspondence between data items and Gaussians

  11. Generalized K-Means (EM)

  12. Gaussians

  13. Maximum-Likelihood (ML) Fitting of Gaussians

  14. Learning a Gaussian Mixture (with known covariance): E-Step and M-Step
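The E-step and M-step equations themselves are images on the slide and do not survive the transcript. For reference, the standard form for a mixture with known, shared covariance σ² and equal mixing weights (so only the means μ_j are estimated) is, under that assumption:

```latex
% E-step: expected responsibility of Gaussian j for data point x_i
E[z_{ij}] = \frac{\exp\!\big(-\lVert x_i-\mu_j\rVert^2 / 2\sigma^2\big)}
                 {\sum_{k} \exp\!\big(-\lVert x_i-\mu_k\rVert^2 / 2\sigma^2\big)}

% M-step: re-estimate each mean as a responsibility-weighted average of the data
\mu_j \leftarrow \frac{\sum_i E[z_{ij}]\, x_i}{\sum_i E[z_{ij}]}
```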

  15. Expectation Maximization • Converges! • Proof [Neal/Hinton, McLachlan/Krishnan]: • E/M step does not decrease data likelihood • Converges at local minimum or saddle point • But subject to local minima

  16. EM Clustering: Results http://www.ece.neu.edu/groups/rpl/kmeans/

  17. Practical EM • The number of clusters is unknown • Suffers (badly) from local minima • Algorithm: start a new cluster center if many points are "unexplained"; kill a cluster center that doesn't contribute • (Use an AIC/BIC criterion for all of this, if you want to be formal; see the sketch below)
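One practical way to pick the number of clusters with a BIC criterion, as the slide suggests, is a sketch like the following. It assumes scikit-learn's GaussianMixture, which exposes a bic() score; the function and variable names are mine.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def fit_gmm_with_bic(X, max_k=10, seed=0):
    """Fit GMMs with 1..max_k components and keep the one with the lowest BIC."""
    best_model, best_bic = None, np.inf
    for k in range(1, max_k + 1):
        gmm = GaussianMixture(n_components=k, n_init=5, random_state=seed).fit(X)
        bic = gmm.bic(X)  # lower is better; penalizes extra components
        if bic < best_bic:
            best_model, best_bic = gmm, bic
    return best_model

# Example use on a data matrix X:  labels = fit_gmm_with_bic(X).predict(X)
```

Restarting each fit several times (n_init) is the usual hedge against the local minima the slide warns about.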

  18. Spectral Clustering

  19. The Two Spiral Problem

  20. Spectral Clustering: Overview • Pipeline: data → similarities → block detection * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University

  21. Eigenvectors and Blocks • Block matrices have block eigenvectors: running an eigensolver on the example block matrix gives eigenvalues λ1 = 2, λ2 = 2, λ3 = 0, λ4 = 0 • Near-block matrices have near-block eigenvectors: the perturbed matrix gives λ1 = 2.02, λ2 = 2.02, λ3 = -0.02, λ4 = -0.02 [Ng et al., NIPS 02] * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University

  22. Spectral Space • Can put items into blocks by their coordinates in the leading eigenvectors e1 and e2 • Resulting clusters are independent of row ordering * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group, Stanford University

  23. The Spectral Advantage • The key advantage of spectral clustering is the spectral space representation: * Slides from Dan Klein, Sep Kamvar, Chris Manning, Natural Language Group Stanford University
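The plots on these slides are images; a minimal sketch of the spectral-space idea follows, assuming NumPy and scikit-learn are available. The function name, the Gaussian affinity, and the choice to run K-Means on the eigenvector coordinates are mine, and the normalization steps of [Ng et al., NIPS 02] are omitted.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clusters(X, k, sigma=1.0):
    """Cluster points by their coordinates in the leading eigenvectors of an affinity matrix."""
    # Pairwise squared distances and Gaussian affinities
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    # eigh returns eigenvalues in ascending order; the last k columns are the leading eigenvectors
    eigvals, eigvecs = np.linalg.eigh(A)
    spectral_space = eigvecs[:, -k:]          # one k-dimensional coordinate per point
    return KMeans(n_clusters=k, n_init=10).fit_predict(spectral_space)
```

With a suitable sigma, clustering these eigenvector coordinates can separate shapes like the two spirals of slide 19, which plain K-Means in the original space cannot.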

  24. Measuring Affinity • Affinity by intensity, by distance, and by texture (formulas shown on the slide) * From Marc Pollefeys COMP 256 2003
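The affinity formulas on this slide are part of the image. A standard choice in this material is a Gaussian kernel on the relevant feature difference; treat the exact form below as an assumption rather than a transcription of the slide:

```latex
% Gaussian affinities on intensity I, pixel position x, and texture descriptor c
\mathrm{aff}_{\text{intensity}}(x,y) = \exp\!\big(-(I(x)-I(y))^2 / 2\sigma_i^2\big)
\mathrm{aff}_{\text{distance}}(x,y)  = \exp\!\big(-\lVert x-y\rVert^2 / 2\sigma_d^2\big)
\mathrm{aff}_{\text{texture}}(x,y)   = \exp\!\big(-\lVert c(x)-c(y)\rVert^2 / 2\sigma_t^2\big)
```

The σ parameters are the scales that slide 25 refers to: a small σ makes only very similar pixels strongly affine, while a large σ blurs the distinction.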

  25. Scale affects affinity * From Marc Pollefeys COMP 256 2003

  26. * From Marc Pollefeys COMP 256 2003

  27. The Space of Digits (in 2D) M. Brand, MERL

  28. Dimensionality Reduction with PCA

  29. Linear: Principal Components • Fit multivariate Gaussian • Compute eigenvectors of Covariance • Project onto eigenvectors with largest eigenvalues
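A minimal NumPy sketch of these three steps (fit the Gaussian via the sample mean and covariance, take the top eigenvectors, project); the function name and the n_components argument are my own choices.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project data onto the eigenvectors of the covariance with the largest eigenvalues."""
    mean = X.mean(axis=0)
    Xc = X - mean                              # center the data (the Gaussian mean)
    cov = np.cov(Xc, rowvar=False)             # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]   # columns with the largest eigenvalues
    return Xc @ top                            # low-dimensional coordinates
```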

  30. Other examples of unsupervised learning: mean face (after alignment). Slide credit: Santiago Serrano

  31. Eigenfaces Slide credit: Santiago Serrano
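Eigenfaces are the PCA construction above applied to a matrix of aligned face images, one flattened image per row. A brief sketch under that assumption, using an SVD to get the eigenvectors; the array name faces and the component count are hypothetical.

```python
import numpy as np

def eigenfaces(faces, n_components=16):
    """faces: (n_images, n_pixels) array of aligned, flattened face images."""
    mean_face = faces.mean(axis=0)                 # the "mean face" from slide 30
    centered = faces - mean_face
    # Right singular vectors of the centered data are the eigenvectors of its covariance
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    components = Vt[:n_components]                 # each row is one eigenface
    coords = centered @ components.T               # low-dimensional face codes
    return mean_face, components, coords
```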

  32. Non-Linear Techniques • Isomap • Locally Linear Embedding (LLE) • Isomap, Science, M. Balasubramanian and E. Schwartz
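Both techniques are available in scikit-learn; a minimal usage sketch follows, with a placeholder data matrix and neighbor/dimension counts that are arbitrary choices of mine.

```python
import numpy as np
from sklearn.manifold import Isomap, LocallyLinearEmbedding

X = np.random.default_rng(0).normal(size=(500, 64))   # placeholder high-dimensional data

# Isomap: preserve geodesic (neighborhood-graph shortest-path) distances in the embedding
X_iso = Isomap(n_neighbors=10, n_components=2).fit_transform(X)

# LLE: preserve each point's local linear reconstruction from its neighbors
X_lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2).fit_transform(X)
```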

  33. SCAPE (Drago Anguelov et al.)

  34. References: • Sebastian Thrun and Peter Norvig, Artificial Intelligence, Stanford University http://www.stanford.edu/class/cs221/notes/cs221-lecture6-fall11.pdf
