Machine Learning Clustering • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
References • Nilsson, N. J. (1996). Introduction to Machine Learning. An early draft of a proposed textbook. (Chapter 9) • Marsland, S. (2014). Machine Learning: An Algorithmic Perspective. CRC Press. (Chapter 9) • Jang, J.-S. R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall. (Chapter 15, Fuzzy C-Means) • …
Supervised learning Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ => Classification: estimating the separating hyperplane
Unsupervised learning Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ => Clustering
Applications of Clustering
• Social network analysis (e.g., giant component analysis in networks)
• Market segmentation
• Astronomical data analysis
• Organizing computing clusters
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
K-means Algorithm K: number of clusters First step: randomly initializing the cluster centers
K-means Algorithm Second step: assigning a cluster index to each sample
K-means Algorithm Third step: moving each cluster centroid to the average of the samples in its cluster
K-means Algorithm Reassigning samples
K-means Algorithm Moving the centroids to the averages
K-means Algorithm Reassigning samples
K-means Algorithm Moving the centroids to the averages
K-means Algorithm Reassigning samples: no change, so the algorithm has converged
K-means algorithm • Input: • $K$ (number of clusters) • Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$
K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
  Cluster assignment step: for $i = 1$ to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
  Move centroid step: for $k = 1$ to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
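A minimal NumPy sketch of this loop, assuming the data are given as an (m, n) matrix X; the function name and the empty-cluster guard are my own choices, not part of the slides:

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        # X: (m, n) data matrix; K: the number of clusters.
        rng = np.random.default_rng(seed)
        # Random initialization: pick K distinct training examples as centroids.
        mu = X[rng.choice(len(X), size=K, replace=False)].copy()
        for _ in range(n_iters):
            # Cluster assignment step: c[i] := index of the centroid closest to x(i).
            d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
            c = d.argmin(axis=1)
            # Move centroid step: mu_k := mean of the points assigned to cluster k.
            for k in range(K):
                pts = X[c == k]
                if len(pts) > 0:  # leave a centroid in place if its cluster is empty
                    mu[k] = pts.mean(axis=0)
        return c, mu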
Distance Metrics • Euclidean distance (L2 norm): $d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$ • L1 norm: $d(x, y) = \sum_{j=1}^{n} |x_j - y_j|$ • Cosine similarity (correlation): $\frac{x \cdot y}{\|x\| \, \|y\|}$ (transform it into a distance by subtracting from 1)
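The three metrics translate directly into code; a sketch (function names are mine, and the inputs are assumed to be 1-D NumPy arrays):

    import numpy as np

    def l2_distance(x, y):
        # Euclidean distance: square root of the sum of squared differences.
        return np.sqrt(np.sum((x - y) ** 2))

    def l1_distance(x, y):
        # L1 (Manhattan) distance: sum of absolute differences.
        return np.sum(np.abs(x - y))

    def cosine_distance(x, y):
        # Cosine similarity of the two vectors, turned into a dissimilarity
        # by subtracting it from 1, as described above.
        sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return 1.0 - sim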
K-means for non-separated clusters: T-shirt sizing (figure: customer weight vs. height)
Local optima ($K = 3$, with $K < m$): depending on the random initialization, K-means can converge to different local optima of the distortion.
Random initialization to escape local optima
For $i = 1$ to 100 {
  Randomly initialize K-means.
  Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
  Compute the cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J$
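A sketch of this restart loop, reusing the kmeans function from the earlier sketch; distortion implements the cost J defined on the optimization-objective slide, and the data X and K = 3 are assumed given:

    import numpy as np

    def distortion(X, c, mu):
        # J = (1/m) * sum over i of ||x(i) - mu_{c(i)}||^2.
        return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

    best_J, best = np.inf, None
    for i in range(100):
        c, mu = kmeans(X, K=3, seed=i)  # fresh random initialization on each run
        J = distortion(X, c, mu)
        if J < best_J:                  # keep the clustering with the lowest cost
            best_J, best = J, (c, mu)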
Optimality of clusters • Optimal clusters should • minimize distance within clusters • maximize distance between clusters • Fisher criterion: the ratio of between-cluster scatter to within-cluster scatter
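One way to make this trade-off concrete is a scatter ratio in the Fisher spirit; this sketch (my construction, not from the slides) scores a clustering by between-cluster scatter over within-cluster scatter, so larger is better:

    import numpy as np

    def fisher_ratio(X, c, mu):
        # Between-cluster scatter: squared distances of the centroids from
        # the overall mean, weighted by cluster size.
        overall = X.mean(axis=0)
        sizes = np.bincount(c, minlength=len(mu))
        between = np.sum(sizes * np.sum((mu - overall) ** 2, axis=1))
        # Within-cluster scatter: squared distances of points from their centroid.
        within = np.sum((X - mu[c]) ** 2)
        return between / within  # larger = compact, well-separated clusters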
Content • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
Choosing the value of K Sometimes, you're running K-means to get clusters to use for some later purpose. Evaluate K-means based on a metric for how well it performs for that later purpose. E.g., for T-shirt sizing: compare $K = 3$ (S, M, L) against $K = 5$ (XS, S, M, L, XL) by how well the resulting sizes serve the market; a sketch follows.
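A sketch of that idea: sweep K, run K-means for each value, and score each clustering with whatever the later purpose defines. Here score_for_purpose is a hypothetical placeholder for that downstream metric, and kmeans is the earlier sketch:

    scores = {}
    for K in range(2, 11):
        c, mu = kmeans(X, K)                  # X assumed given, as before
        scores[K] = score_for_purpose(c, mu)  # hypothetical downstream metric
    best_K = max(scores, key=scores.get)      # the K that best serves the purpose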
K-means optimization objective • $c^{(i)}$ = index of the cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned • $\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$) • $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned Optimization objective: minimize $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$ over the assignments $c^{(i)}$ and the centroids $\mu_k$
K-means optimization objective
Randomly initialize $K$ cluster centroids $\mu_1, \ldots, \mu_K$
Repeat {
  for $i = 1$ to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$ (minimizes $J$ with respect to $c^{(1)}, \ldots, c^{(m)}$, holding $\mu_1, \ldots, \mu_K$ fixed)
  for $k = 1$ to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$ (minimizes $J$ with respect to $\mu_1, \ldots, \mu_K$, holding $c^{(1)}, \ldots, c^{(m)}$ fixed)
}
Content • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
Hierarchical clustering: example Clustering important cities in Iran for a business purpose
Hierarchical clustering: forming clusters • Forming clusters from dendrograms: cutting the dendrogram at a chosen height yields a flat clustering (see the sketch below)
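A sketch using SciPy's hierarchical-clustering helpers: build the dendrogram with linkage, then form flat clusters by cutting it at a height. The toy data and the cut height 0.3 are my own choices:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.default_rng(0).random((20, 2))       # toy stand-in for the input set S
    Z = linkage(X, method='single')                    # build the dendrogram
    labels = fcluster(Z, t=0.3, criterion='distance')  # cut the dendrogram at height 0.3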
Hierarchical Clustering • Given the input set S, the goal is to produce a hierarchy (dendrogram) in which the nodes represent subsets of S. • Properties of the resulting tree: • The root is the whole input set S. • The leaves are the individual elements of S. • The internal nodes are defined as the union of their children. • Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
Hierarchical clustering • Input: a pairwise distance matrix involving all instances in S • Algorithm
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = S1, S2, S3, ..., Sn-1, Sn.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {Si, Sj}, which will be the cheapest pair to merge.
3. Remove Si and Sj from L.
4. Merge Si and Sj to create a new internal node Sij in T, which will be the parent of Si and Sj in the resulting tree.
5. Go to Step 2 until only one set remains.
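A from-scratch sketch of these five steps with a single-linkage merging cost (the smallest distance between any two members of the pair); the function name is mine, and in practice scipy.cluster.hierarchy.linkage does the same job far more efficiently:

    def agglomerate(D):
        # D: (n, n) pairwise distance matrix over the instances of S.
        clusters = [[i] for i in range(len(D))]  # step 1: every instance is a singleton
        merges = []
        while len(clusters) > 1:                 # step 5: repeat until one set remains
            # Step 2: find the cheapest pair to merge under single linkage.
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    cost = min(D[a][b] for a in clusters[i] for b in clusters[j])
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
            _, i, j = best
            merges.append((clusters[i], clusters[j]))
            # Steps 3-4: remove the pair from L and add their union as a new node.
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
        return merges  # the merge history encodes the tree T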
Soft Clustering: Fuzzy C-Means • An extension of k-means • Hierarchical clustering and k-means generate hard partitions: each data point can be assigned to only one cluster • Soft clustering instead gives probabilities that an instance belongs to each of a set of clusters • Fuzzy c-means allows data points to be assigned to more than one cluster: each data point has a degree of membership (or probability) of belonging to each cluster • Fuzzy C-Means (the fcm command in MATLAB)
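A minimal NumPy sketch of the standard fuzzy c-means iteration (membership update, then center update) with fuzzifier m = 2; the names and the numerical-safety epsilon are mine. MATLAB's fcm command, mentioned above, implements the same scheme:

    import numpy as np

    def fuzzy_c_means(X, C, m=2.0, n_iters=100, seed=0):
        # X: (n_samples, n_features) data; C: number of clusters; m > 1: fuzzifier.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=C, replace=False)].copy()
        for _ in range(n_iters):
            # Distances from every sample to every center, shape (n_samples, C).
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            d = np.fmax(d, 1e-12)                # avoid division by zero
            # Membership update: u[i, k] is the degree to which sample i belongs
            # to cluster k; each row sums to 1.
            inv = d ** (-2.0 / (m - 1.0))
            u = inv / inv.sum(axis=1, keepdims=True)
            # Center update: membership-weighted mean of the samples.
            w = u ** m
            centers = (w.T @ X) / w.sum(axis=0)[:, None]
        return u, centers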