
Unsupervised Learning


Presentation Transcript


  1. Unsupervised Learning

  2. Supervised learning vs. unsupervised learning

  3.–7. [Figure slides adapted from Andrew Moore, http://www.autonlab.org/tutorials/gmm]

  8. K-means clustering algorithm
  • Input: k, D
  • Choose k points as initial centroids (cluster centers).
  • Repeat until the stopping criterion is met:
    • For each data point x ∈ D: compute the distance from x to each centroid; assign x to the closest centroid.
    • Re-compute each centroid as the mean of its current cluster membership.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
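
A minimal NumPy sketch of this loop (the function name, the convergence test, and the no-empty-cluster assumption are my own choices, not from the slides):

```python
import numpy as np

def kmeans(D, k, max_iters=100, seed=0):
    """Plain k-means on an (n, d) data array D with k clusters."""
    rng = np.random.default_rng(seed)
    # Choose k distinct data points as the initial centroids.
    centroids = D[rng.choice(len(D), size=k, replace=False)]
    for _ in range(max_iters):
        # Distance from every point to every centroid: shape (n, k).
        dists = np.linalg.norm(D[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its closest centroid.
        labels = dists.argmin(axis=1)
        # Re-compute centroids as means of the current cluster memberships
        # (sketch assumes no cluster ever goes empty).
        new_centroids = np.array([D[labels == j].mean(axis=0) for j in range(k)])
        # Stop when the centroids no longer move.
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels
```

Usage: `centroids, labels = kmeans(X, k=3)` for an (n, d) array X.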

  9. Demo http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

  10. Stopping/convergence criterion
  • no (or minimal) re-assignment of data points to different clusters,
  • no (or minimal) change of centroids, or
  • minimal decrease in the sum of squared error (SSE):
  SSE = Σ_{j=1..k} Σ_{x ∈ C_j} dist(x, m_j)²    (1)
  where C_j is the jth cluster, m_j is the centroid of cluster C_j (the mean vector of all the data points in C_j), and dist(x, m_j) is the distance between data point x and centroid m_j.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
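
A small helper matching equation (1) with squared Euclidean distance (a sketch; the `centroids[labels]` indexing assumes the arrays returned by the kmeans() sketch above):

```python
import numpy as np

def sse(D, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    diffs = D - centroids[labels]   # offset of each point from its centroid
    return float((diffs ** 2).sum())
```

A convergence test is then, e.g., `prev_sse - sse(D, centroids, labels) < tol` for some small tolerance `tol`.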

  11. Example distance functions
  • Let x_i = (a_i1, ..., a_in) and x_j = (a_j1, ..., a_jn).
  • Euclidean distance: dist(x_i, x_j) = √( Σ_{h=1..n} (a_ih − a_jh)² )
  • Manhattan (city block) distance: dist(x_i, x_j) = Σ_{h=1..n} |a_ih − a_jh|
  CS583, Bing Liu, UIC
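
In code, for two points given as NumPy vectors (a minimal sketch):

```python
import numpy as np

def euclidean(xi, xj):
    # Square root of the summed squared coordinate differences.
    return float(np.sqrt(((xi - xj) ** 2).sum()))

def manhattan(xi, xj):
    # Sum of absolute coordinate differences.
    return float(np.abs(xi - xj).sum())
```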

  12. Distance function for text documents
  • A text document consists of a sequence of sentences, and each sentence consists of a sequence of words.
  • To simplify, a document is usually treated as a "bag" of words in document clustering: the sequence and position of words are ignored.
  • A document is then represented as a vector, just like any other data point.
  • Similarity between two documents is the cosine of the angle between their feature vectors (cosine similarity); a distance can be derived as 1 minus this similarity.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
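
A sketch of cosine similarity on bag-of-words count vectors (the vectorization step, i.e. a fixed word-to-index vocabulary, is assumed):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two nonzero term-count vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def cosine_distance(u, v):
    # Common convention: 1 - similarity, so identical directions give 0
    # and orthogonal (completely dissimilar) vectors give 1.
    return 1.0 - cosine_similarity(u, v)
```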

  13. Example: Clustering Map of Biomedical Articles, from http://arbesman.net/blog/2011/03/24/clustering-map-of-biomedical-articles

  14. Example: image segmentation by k-means clustering on color, K=5 in RGB space. From http://vitroz.com/Documents/Image%20Segmentation.pdf (a code sketch follows the result slides below)

  15.–19. [Further segmentation result images, alternating K=10 and K=5 in RGB space]
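
A sketch of color-based segmentation like the figures above, assuming scikit-learn is available and an RGB image is already loaded as an (H, W, 3) uint8 array named `img`:

```python
import numpy as np
from sklearn.cluster import KMeans

# Treat every pixel as a 3-D point in RGB space.
pixels = img.reshape(-1, 3).astype(float)

km = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)

# Replace each pixel with its cluster's mean color and restore the shape.
segmented = km.cluster_centers_[km.labels_].reshape(img.shape).astype(np.uint8)
```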

  20.–24. Weaknesses of k-means
  • The algorithm is only applicable when the mean is defined. For categorical data, the k-modes variant represents each centroid by the most frequent value of each attribute (a sketch of this centroid follows the list below).
  • The user needs to specify k.
  • The algorithm is sensitive to outliers: data points that are very far away from all other data points. Outliers can be errors in the data recording, or special data points with genuinely very different values.
  • k-means is sensitive to the initial random centroids.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
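
For the categorical case, the k-modes centroid mentioned above is just the per-attribute mode; a minimal sketch (assumes each cluster is a list of equal-length tuples of category labels):

```python
from collections import Counter

def mode_centroid(cluster):
    """Per-attribute most frequent value over a cluster of categorical rows."""
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*cluster))

# Example: the centroid of these three rows is ('red', 'small').
print(mode_centroid([('red', 'small'), ('red', 'large'), ('blue', 'small')]))
```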

  25. Weaknesses of k-means: problems with outliers. [Figure adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt]

  26. How to deal with outliers/noise in clustering?

  27. Dealing with outliers
  • One method is to remove data points that are much farther away from the centroids than the other data points during clustering. To be safe, monitor these possible outliers over a few iterations before deciding to remove them (a distance-threshold sketch follows this slide).
  • Another method is random sampling: since sampling chooses only a small subset of the data points, the chance of selecting an outlier is very small. The remaining data points are then assigned to the clusters by distance or similarity comparison, or by classification.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
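
A sketch of the first method (drop points much farther from their centroid than is typical; the 3-standard-deviation cutoff is my own choice, not from the slides):

```python
import numpy as np

def drop_outliers(D, centroids, labels, z=3.0):
    """Keep only points within z standard deviations of the mean
    point-to-centroid distance."""
    dists = np.linalg.norm(D - centroids[labels], axis=1)
    keep = dists < dists.mean() + z * dists.std()
    return D[keep], labels[keep]
```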

  28. Weaknesses of k-means (cont.): the algorithm is sensitive to initial seeds. [Figure: a seed placement, marked "+", that leads to a poor clustering. Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt]

  29. Weaknesses of k-means (cont.): with different seeds we get good results. There are methods to help choose good seeds (e.g., k-means++-style seeding); a simple remedy, restarting from several seeds, is sketched below. [Figure: the same data clustered well from better seed positions. Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt]
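
The restart remedy in code, reusing the kmeans() and sse() sketches from earlier (a sketch, not from the slides):

```python
def best_of_restarts(D, k, n_restarts=10):
    """Run k-means from several random seeds and keep the lowest-SSE run."""
    best = None
    for seed in range(n_restarts):
        centroids, labels = kmeans(D, k, seed=seed)
        err = sse(D, centroids, labels)
        if best is None or err < best[0]:
            best = (err, centroids, labels)
    return best  # (lowest SSE, its centroids, its labels)
```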

  30. Weaknesses of k-means (cont.): the algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres). [Figure adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt]

  31. k-means summary
  • Despite its weaknesses, k-means is still the most popular clustering algorithm due to its simplicity and efficiency.
  • Other clustering algorithms have their own lists of weaknesses.
  • There is no clear evidence that any other clustering algorithm performs better in general, although some may be more suitable for specific types of data or applications.
  Adapted from Bing Liu, UIC http://www.cs.uic.edu/~liub/teach/cs583-fall-05/CS583-unsupervised-learning.ppt
