
K-means*: Clustering by Gradual Data Transformation

K-means*: Clustering by Gradual Data Transformation. Mikko Malinen and Pasi Fränti, Speech and Image Processing Unit, School of Computing, University of Eastern Finland. K-means* clusters by gradual transformation of the data: instead of fitting the model to the data, the data is fitted to a model (model → intermediate → final data).




Presentation Transcript


  1. K-means*: Clustering by Gradual Data Transformation. Mikko Malinen and Pasi Fränti, Speech and Image Processing Unit, School of Computing, University of Eastern Finland.

  2. K-means* clustering. Gradual transformation of data: instead of fitting the model to the data, fit the data to a model. [Diagram: model → intermediate → final data]

  3. K-means clustering. Iterate between two steps: 1. Assignment step: assign each point to its nearest centroid. 2. Update step: update each centroid to the mean of the points assigned to it.
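The two alternating steps above (standard Lloyd's algorithm) can be sketched in NumPy; the function and parameter names here are illustrative, not taken from the slides:

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm): alternate assignment and update."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct data points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # 1. Assignment step: assign each point to its nearest centroid.
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # 2. Update step: move each centroid to the mean of its assigned points.
        # An empty cluster keeps its old centroid (a simple guard, see slide 17).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: the assignment no longer changes
        centroids = new_centroids
    return centroids, labels
```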

  4. K-means* clustering
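The gradual-transformation idea can be sketched as follows. This is a hedged illustration, not the paper's exact procedure: the initial artificial model (here, all points collapsed to the data mean) and the step schedule are assumptions, and the paper's actual model structure may differ.

```python
import numpy as np

def lloyd(points, k, centroids, n_iter=20):
    """One k-means run (assignment + update) from given initial centroids."""
    for _ in range(n_iter):
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
    return centroids, labels

def kmeans_star(points, k, n_steps=10, seed=0):
    """Sketch of k-means*: move the data from a model toward its true
    locations in small steps, running k-means after each step and
    reusing the previous centroids as the starting point."""
    # Assumed artificial model: every point starts at the data mean.
    model = np.tile(points.mean(axis=0), (len(points), 1))
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = None
    for step in range(1, n_steps + 1):
        t = step / n_steps                       # transformation progress: 0% -> 100%
        current = (1 - t) * model + t * points   # intermediate dataset
        centroids, labels = lloyd(current, k, centroids)
    return centroids, labels
```

Reusing the centroids between steps is what makes the gradual schedule useful: each k-means run starts from a solution that was already good for a slightly easier dataset.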

  5. Example of clustering (s2 dataset)

  6.–16. Gradual transformation on the s2 dataset, shown at 0%, 10%, 20%, …, 100% done (one plot per slide).

  17. Empty clusters problem
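Because the points move during the transformation, a centroid can end up with no points assigned to it. One common remedy, sketched below, is to relocate each empty cluster's centroid to the point farthest from its current centroid; this is a standard fix, not necessarily the solution used in the paper:

```python
import numpy as np

def fix_empty_clusters(points, centroids, labels):
    """Relocate each empty cluster's centroid to the point farthest
    from its assigned centroid, and reassign that point."""
    k = len(centroids)
    for j in range(k):
        if not np.any(labels == j):
            # Distance from every point to the centroid it is assigned to.
            dists = np.linalg.norm(points - centroids[labels], axis=1)
            far = dists.argmax()
            centroids[j] = points[far]
            labels[far] = j
    return centroids, labels
```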

  18. Time Complexity

  19. Time Complexity: fixed k-means

  20. Datasets

  Dataset   d    n      k
  s1        2    5000   15
  s2        2    5000   15
  s3        2    5000   15
  s4        2    5000   15
  bridge    16   4096   256
  missa     16   6480   256
  house     3    34000  256
  thyroid   5    215    2
  iris      4    150    2
  wine      13   178    3

  21. Mean square error
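The slides do not spell the error measure out, but the mean square error of a clustering is conventionally defined (a standard definition, assumed here) as

```latex
\mathrm{MSE} = \frac{1}{n\,d} \sum_{i=1}^{n} \lVert x_i - c_{p(i)} \rVert^2
```

where the $x_i$ are the $n$ data points of dimension $d$, the $c_j$ are the centroids, and $p(i)$ is the index of the centroid assigned to point $i$.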

  22.–28. Mean square error vs. number of steps (a series of result plots, one per slide).

  29.–32. Number of incorrect clusters

  Incorrect clusters   Proposed   K-means
  0 (all correct)      36%        14%
  1                    64%        38%
  2                     0%        34%
  3                     0%        10%

  33. Summary. We have presented a clustering method based on gradual transformation of the data and k-means. Instead of fitting the model to the data, we fit the data to a model. The proposed method achieves a lower mean square error than k-means.
