Machine Learning Clustering • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
References • Nilsson, N. J. (1996). Introduction to Machine Learning. An early draft of a proposed textbook. (Chapter 9) • Marsland, S. (2014). Machine Learning: An Algorithmic Perspective. CRC Press. (Chapter 9) • Jang, J.-S. R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. Prentice Hall. (Chapter 15, Fuzzy C-Means) • …
Supervised learning Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$ => Classification: estimating the separating hyperplane
Unsupervised learning Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ => Clustering
Applications of Clustering
• Social network analysis (e.g., giant component analysis in networks)
• Market segmentation
• Astronomical data analysis
• Organizing computing clusters
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
K-means Algorithm K: number of clusters First step: randomly initializing the cluster centers
K-means Algorithm Second step: assigning a cluster index to each sample
K-means Algorithm Third step: moving each cluster centroid to the average of the samples in its cluster
K-means Algorithm Reassigning samples
K-means Algorithm Moving the centroids to the averages
K-means Algorithm Reassigning samples
K-means Algorithm Moving the centroids to the averages
K-means Algorithm Reassigning samples: no change, so the algorithm has converged
K-means algorithm • Input: • $K$ (number of clusters) • Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$
K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
  Cluster assignment step: for $i = 1$ to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
  Move centroid step: for $k = 1$ to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
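A minimal NumPy sketch of this loop, assuming the data are given as an (m, n) matrix X; the function name and the empty-cluster guard are my own choices, not part of the slides:

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        # X: (m, n) data matrix; K: the number of clusters.
        rng = np.random.default_rng(seed)
        # Random initialization: pick K distinct training examples as centroids.
        mu = X[rng.choice(len(X), size=K, replace=False)].copy()
        for _ in range(n_iters):
            # Cluster assignment step: c[i] := index of the centroid closest to x(i).
            d = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (m, K)
            c = d.argmin(axis=1)
            # Move centroid step: mu_k := mean of the points assigned to cluster k.
            for k in range(K):
                pts = X[c == k]
                if len(pts) > 0:  # leave a centroid in place if its cluster is empty
                    mu[k] = pts.mean(axis=0)
        return c, mu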
Distance Metrics • Euclidean distance (L2 norm): $d(x, y) = \sqrt{\sum_{j=1}^{n} (x_j - y_j)^2}$ • L1 norm: $d(x, y) = \sum_{j=1}^{n} |x_j - y_j|$ • Cosine similarity (correlation): $\frac{x \cdot y}{\|x\| \, \|y\|}$ (transform it into a distance by subtracting from 1)
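The three metrics translate directly into code; a sketch (function names are mine, and the inputs are assumed to be 1-D NumPy arrays):

    import numpy as np

    def l2_distance(x, y):
        # Euclidean distance: square root of the sum of squared differences.
        return np.sqrt(np.sum((x - y) ** 2))

    def l1_distance(x, y):
        # L1 (Manhattan) distance: sum of absolute differences.
        return np.sum(np.abs(x - y))

    def cosine_distance(x, y):
        # Cosine similarity of the two vectors, turned into a dissimilarity
        # by subtracting it from 1, as described above.
        sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return 1.0 - sim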
K-means for non-separated clusters: T-shirt sizing (figure: customer weight vs. height)
Local optima ($K = 3$, with $K < m$): depending on the random initialization, K-means can converge to different local optima of the distortion.
Random initialization to escape local optima
For $i = 1$ to 100 {
  Randomly initialize K-means.
  Run K-means; get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
  Compute the cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
}
Pick the clustering that gave the lowest cost $J$
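A sketch of this restart loop, reusing the kmeans function from the earlier sketch; distortion implements the cost J defined on the optimization-objective slide, and the data X and K = 3 are assumed given:

    import numpy as np

    def distortion(X, c, mu):
        # J = (1/m) * sum over i of ||x(i) - mu_{c(i)}||^2.
        return np.mean(np.sum((X - mu[c]) ** 2, axis=1))

    best_J, best = np.inf, None
    for i in range(100):
        c, mu = kmeans(X, K=3, seed=i)  # fresh random initialization on each run
        J = distortion(X, c, mu)
        if J < best_J:                  # keep the clustering with the lowest cost
            best_J, best = J, (c, mu)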
Optimality of clusters • Optimal clusters should • minimize distance within clusters • maximize distance between clusters • Fisher criterion: the ratio of between-cluster scatter to within-cluster scatter
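One way to make this trade-off concrete is a scatter ratio in the Fisher spirit; this sketch (my construction, not from the slides) scores a clustering by between-cluster scatter over within-cluster scatter, so larger is better:

    import numpy as np

    def fisher_ratio(X, c, mu):
        # Between-cluster scatter: squared distances of the centroids from
        # the overall mean, weighted by cluster size.
        overall = X.mean(axis=0)
        sizes = np.bincount(c, minlength=len(mu))
        between = np.sum(sizes * np.sum((mu - overall) ** 2, axis=1))
        # Within-cluster scatter: squared distances of points from their centroid.
        within = np.sum((X - mu[c]) ** 2)
        return between / within  # larger = compact, well-separated clusters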
Content • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
Choosing the value of K Sometimes, you're running K-means to get clusters to use for some later purpose. Evaluate K-means based on a metric for how well it performs for that later purpose. E.g., for T-shirt sizing: compare $K = 3$ (S, M, L) against $K = 5$ (XS, S, M, L, XL) by how well the resulting sizes serve the market; a sketch follows.
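A sketch of that idea: sweep K, run K-means for each value, and score each clustering with whatever the later purpose defines. Here score_for_purpose is a hypothetical placeholder for that downstream metric, and kmeans is the earlier sketch:

    scores = {}
    for K in range(2, 11):
        c, mu = kmeans(X, K)                  # X assumed given, as before
        scores[K] = score_for_purpose(c, mu)  # hypothetical downstream metric
    best_K = max(scores, key=scores.get)      # the K that best serves the purpose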
K-means optimization objective • $c^{(i)}$ = index of the cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned • $\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$) • $\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned Optimization objective: minimize $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \left\| x^{(i)} - \mu_{c^{(i)}} \right\|^2$ over the assignments $c^{(i)}$ and the centroids $\mu_k$
K-means optimization objective
Randomly initialize $K$ cluster centroids $\mu_1, \ldots, \mu_K$
Repeat {
  for $i = 1$ to $m$: $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$ (minimizes $J$ with respect to $c^{(1)}, \ldots, c^{(m)}$, holding $\mu_1, \ldots, \mu_K$ fixed)
  for $k = 1$ to $K$: $\mu_k$ := average (mean) of the points assigned to cluster $k$ (minimizes $J$ with respect to $\mu_1, \ldots, \mu_K$, holding $c^{(1)}, \ldots, c^{(m)}$ fixed)
}
Content • Unsupervised Learning • K-means • Optimization objective • Random initialization • Determining Number of Clusters • Hierarchical Clustering • Soft Clustering (Fuzzy C-Means)
Hierarchical clustering: example Clustering important cities in Iran for a business purpose
Hierarchical clustering: forming clusters • Forming clusters from dendrograms: cutting the dendrogram at a chosen height yields a flat clustering (see the sketch below)
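A sketch using SciPy's hierarchical-clustering helpers: build the dendrogram with linkage, then form flat clusters by cutting it at a height. The toy data and the cut height 0.3 are my own choices:

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.random.default_rng(0).random((20, 2))       # toy stand-in for the input set S
    Z = linkage(X, method='single')                    # build the dendrogram
    labels = fcluster(Z, t=0.3, criterion='distance')  # cut the dendrogram at height 0.3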
Hierarchical Clustering • Given the input set S, the goal is to produce a hierarchy (dendrogram) in which the nodes represent subsets of S. • Properties of the resulting tree: • The root is the whole input set S. • The leaves are the individual elements of S. • The internal nodes are defined as the union of their children. • Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
Hierarchical clustering • Input: a pairwise distance matrix involving all instances in S • Algorithm
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = S1, S2, S3, ..., Sn-1, Sn.
2. Compute a merging cost function between every pair of elements in L to find the two closest clusters {Si, Sj}, which will be the cheapest pair to merge.
3. Remove Si and Sj from L.
4. Merge Si and Sj to create a new internal node Sij in T, which will be the parent of Si and Sj in the resulting tree.
5. Go to Step 2 until only one set remains.
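A from-scratch sketch of these five steps with a single-linkage merging cost (the smallest distance between any two members of the pair); the function name is mine, and in practice scipy.cluster.hierarchy.linkage does the same job far more efficiently:

    def agglomerate(D):
        # D: (n, n) pairwise distance matrix over the instances of S.
        clusters = [[i] for i in range(len(D))]  # step 1: every instance is a singleton
        merges = []
        while len(clusters) > 1:                 # step 5: repeat until one set remains
            # Step 2: find the cheapest pair to merge under single linkage.
            best = None
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    cost = min(D[a][b] for a in clusters[i] for b in clusters[j])
                    if best is None or cost < best[0]:
                        best = (cost, i, j)
            _, i, j = best
            merges.append((clusters[i], clusters[j]))
            # Steps 3-4: remove the pair from L and add their union as a new node.
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
            clusters.append(merged)
        return merges  # the merge history encodes the tree T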
Soft Clustering: Fuzzy C-Means • An extension of k-means • Hierarchical clustering and k-means generate hard partitions: each data point can be assigned to only one cluster • Soft clustering instead gives probabilities that an instance belongs to each of a set of clusters • Fuzzy c-means allows data points to be assigned to more than one cluster: each data point has a degree of membership (or probability) of belonging to each cluster • Fuzzy C-Means (the fcm command in MATLAB)
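A minimal NumPy sketch of the standard fuzzy c-means iteration (membership update, then center update) with fuzzifier m = 2; the names and the numerical-safety epsilon are mine. MATLAB's fcm command, mentioned above, implements the same scheme:

    import numpy as np

    def fuzzy_c_means(X, C, m=2.0, n_iters=100, seed=0):
        # X: (n_samples, n_features) data; C: number of clusters; m > 1: fuzzifier.
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=C, replace=False)].copy()
        for _ in range(n_iters):
            # Distances from every sample to every center, shape (n_samples, C).
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            d = np.fmax(d, 1e-12)                # avoid division by zero
            # Membership update: u[i, k] is the degree to which sample i belongs
            # to cluster k; each row sums to 1.
            inv = d ** (-2.0 / (m - 1.0))
            u = inv / inv.sum(axis=1, keepdims=True)
            # Center update: membership-weighted mean of the samples.
            w = u ** m
            centers = (w.T @ X) / w.sum(axis=0)[:, None]
        return u, centers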