
ITEC4310 Applied Artificial Intelligence


Presentation Transcript


  1. ITEC4310 Applied Artificial Intelligence Lecture 2: Clustering

  2. Machine Learning • Unsupervised Learning • Supervised Learning • Everything is data • Everything is optimization • UPEI and U Havana slides • Slideshare - Tilani Gunawardena

  3. Supervised vs. Unsupervised • [Figure: two scatter plots over features x1 and x2] • Supervised: data has labels • Unsupervised: just the data

  4. Supervised vs. Unsupervised • Unsupervised: we only have a set of data, without any further information • Goal: discover “interesting structures” in the data • Supervised: the data has labels • We have training examples that allow us to train an algorithm • Goal: correctly predict the class/value of a new sample

  5. Unsupervised Learning • Unsupervised learning is arguably more typical of human and animal learning. • It is also more widely applicable than supervised learning, since it does not require a human expert to manually label the data. • Labeled data is not only expensive to acquire, but it also contains relatively little information, certainly not enough to reliably estimate the parameters of complex models.

  6. “When we’re learning to see, nobody’s telling us what the right answers are — we just look. Every so often, your mother says “that’s a dog”, but that’s very little information. You’d be lucky if you got a few bits of information — even one bit per second — that way. The brain’s visual system has 10^14 neural connections. And you only live for 10^9 seconds. So it’s no use learning one bit per second. You need more like 10^5 bits per second. And there’s only one place you can get that much information: from the input itself.” — Geoffrey Hinton

  7. What is a good clustering? • A good clustering will yield clusters with • High intra-cluster similarity • Low inter-cluster similarity • The quality of the result depends on the clustering method and the similarity measure used. • There are different ways to measure the quality of a clustering; the goal is to discover the “hidden patterns”…
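A minimal sketch (not from the slides) of one way to quantify “high intra-cluster / low inter-cluster similarity” using Euclidean distance; the NumPy usage and the function name are assumptions for illustration.

import numpy as np

def intra_inter_distances(points, labels):
    """Mean within-cluster distance vs. mean between-cluster distance."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # All pairwise Euclidean distances.
    diffs = points[:, None, :] - points[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(points), dtype=bool)
    intra = dists[same & off_diag].mean()  # pairs within the same cluster
    inter = dists[~same].mean()            # pairs across different clusters
    return intra, inter

A good clustering should give an intra value much smaller than the inter value.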

  8. Aspects of clustering • There might be more than one correct answer. • You may have one or more (useful) similarity measures, e.g. Euclidean distance, Manhattan distance, Mahalanobis distance, Pearson correlation… • Clustering is not always done on real-valued vectors. • It is almost never done in a two-dimensional space.
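For illustration, plain-NumPy versions of the distance/similarity measures named on this slide (a sketch; the slides name no library or implementation):

import numpy as np

def euclidean(u, v):
    return np.sqrt(np.sum((u - v) ** 2))

def manhattan(u, v):
    return np.sum(np.abs(u - v))

def mahalanobis(u, v, cov_inv):
    # cov_inv is the inverse of the data's covariance matrix.
    d = u - v
    return np.sqrt(d @ cov_inv @ d)

def pearson(u, v):
    # Pearson correlation is a similarity; 1 - r is sometimes used as a distance.
    return np.corrcoef(u, v)[0, 1]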

  9. K-Means An iterative clustering algorithm • Pick K random points as cluster centers (means) • Alternate: • Assign each data instance to its closest mean • Move each mean to the average of its assigned points • Stop when no point’s assignment changes
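A minimal NumPy sketch of the K-Means loop described above; the function name, the seeded random initialization, and the iteration cap are assumptions, not from the slides.

import numpy as np

def k_means(points, k, max_iters=100, seed=0):
    points = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    # Pick K random points as the initial cluster centers (means).
    means = points[rng.choice(len(points), size=k, replace=False)]
    assignments = np.full(len(points), -1)
    for _ in range(max_iters):
        # Phase I: assign each point to its closest mean.
        dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=-1)
        new_assignments = dists.argmin(axis=1)
        # Stop when no point's assignment changes.
        if np.array_equal(new_assignments, assignments):
            break
        assignments = new_assignments
        # Phase II: move each mean to the average of its assigned points.
        for j in range(k):
            members = points[assignments == j]
            if len(members) > 0:
                means[j] = members.mean(axis=0)
    return means, assignments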

  10. K-Means as Optimization • Consider the total distance of the points to their assigned means: phi(a, c) = sum_i dist(x_i, c_{a_i}), where the x_i are the points, the c_k are the means, and a_i is the assignment of point i • Each iteration reduces phi • Two stages each iteration: • Update assignments: fix means c, change assignments a • Update means: fix assignments a, change means c
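A small sketch of computing the objective phi(a, c) from this slide; using the squared Euclidean distance is an assumption (the slide only says “total distance”).

import numpy as np

def total_distance(points, means, assignments):
    points = np.asarray(points, dtype=float)
    means = np.asarray(means, dtype=float)
    # Sum over all points of the (squared) distance to the assigned mean c_{a_i}.
    return np.sum(np.linalg.norm(points - means[assignments], axis=1) ** 2)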

  11. Phase I: Update Assignments • For each point, re-assign it to the closest mean: a_i = argmin_k dist(x_i, c_k) • Can only decrease the total distance phi

  12. Phase II: Update Means • Move each mean to the average of its assigned points: c_k = average of { x_i : a_i = k } • Can only decrease the total distance phi

  13. K-Means Getting Stuck • Local optimum: why doesn’t this work out like the earlier example? • K-means requires initial means… and it does matter which ones you pick!
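One common workaround, sketched here as an assumption (the slide does not prescribe it): run K-Means from several random initializations and keep the run with the lowest phi. This reuses k_means() and total_distance() from the earlier sketches.

def k_means_restarts(points, k, n_restarts=10):
    best = None
    for seed in range(n_restarts):
        means, assignments = k_means(points, k, seed=seed)
        phi = total_distance(points, means, assignments)
        if best is None or phi < best[0]:
            best = (phi, means, assignments)
    return best[1], best[2]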

  14. Hierarchical Clustering • Build a tree-based hierarchical taxonomy (dendrogram)

  15. Agglomerative Clustering • Starts with each instance in its own cluster and then repeatedly joins the two clusters that are most similar, until there is only one cluster.

  16. Dendrogram: Hierarchical Clustering • A clustering is obtained by cutting the dendrogram at a desired level: each connected component forms a cluster
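A hedged example of cutting a dendrogram using SciPy’s hierarchical-clustering utilities (the library choice, linkage method, and cut height are assumptions; the slides name none of them):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster, dendrogram

X = np.random.rand(20, 2)           # toy data
Z = linkage(X, method='average')    # agglomerative clustering, average linkage
labels = fcluster(Z, t=0.5, criterion='distance')  # cut the tree at height 0.5
# dendrogram(Z) would draw the tree (requires matplotlib).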

  17. Summary • For a dataset consisting of n points • O(n^2) space: requires storing the distance matrix • O(n^3) time complexity • Advantages • Dendrograms are great for visualization • Provides hierarchical relations between clusters

  18. Summary • Disadvantages • Not easy to decide at which level to cut the dendrogram into clusters • Can never undo a merge that was done previously • Sensitive to the cluster distance measure and to noise/outliers • Experiments have shown that other clustering techniques can outperform hierarchical clustering

  19. Non-convex Clusters Need another technique…
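The slide leaves the technique unnamed; one widely used option for non-convex clusters is a density-based method such as DBSCAN, sketched here with scikit-learn as an assumption:

import numpy as np
from sklearn.cluster import DBSCAN

X = np.random.rand(100, 2)                       # toy data
labels = DBSCAN(eps=0.1, min_samples=5).fit_predict(X)
# Points labeled -1 are treated as noise rather than forced into a cluster.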

  20. Samples • Animations 1 • https://towardsdatascience.com/the-5-clustering-algorithms-data-scientists-need-to-know-a36d136ef68 • Animations 2 • https://www.youtube.com/watch?v=BVFG7fd1H30

  21. Warning • Garbage in → Garbage out • Clustering creates confirmation bias • You will find what you thought you were looking for!
