800 likes | 974 Views
Swap-based algorithms. Clustering Methods: Part 2d. Pasi Fränti 31.3.2014 Speech & Image Processing Unit School of Computing University of Eastern Finland Joensuu, FINLAND. Part I: Random Swap algorithm.
E N D
Swap-based algorithms Clustering Methods: Part 2d • Pasi Fränti • 31.3.2014 • Speech & Image Processing Unit • School of Computing • University of Eastern Finland • Joensuu, FINLAND
Part I:Random Swap algorithm P. Fränti and J. KivijärviRandomised local search algorithm for the clustering problem Pattern Analysis and Applications, 3 (4), 358-369, 2000.
Implementation of the swap 1. Random swap: 2. Re-partition vectors from old cluster: 3. Create new cluster:
Random swap as local search Study neighbor solutions
Random swap as local search Select one and move
Role of K-means Fine-tune solution by hill-climbing technique!
Role of K-means Consider only local optima!
Role of swap: reduce search space Effective search space
Independency of initialization Results for T = 5000 iterations Worst Initial Best Initial Initial
Probability of good swap • Select a proper centroid for removal: • There are M clusters in total: premoval=1/M. • Select a proper new location: • There are N choices: padd=1/N • Only M are significantly different: padd=1/M • In total: • M2significantly different swaps. • Probability of each different swap is pswap=1/M2 • Open question: how many of these are good?
Number of neighbors Open question: what is the size of neighborhood ()? Voronoi neighbors Neighbors by distance
Expected number of iterations • Probability of not finding good swap: • Estimated number of iterations:
Estimated number of iterationsdepending on T Observed = Number of iterations needed in practice. Estimated = Estimate of the number of iterations needed for given q S1 S2 S3 S4
Bounds for the number of iterations Upper limit: Lower limit similarly; resulting in:
Multiple swaps (w) Probability for performing less than w swaps: Expected number of iterations:
Efficiency of the random swap Total time to find correct clustering: • Time per iteration Number of iterations Time complexity of a single step: • Swap: O(1) • Remove cluster: 2MN/M = O(N) • Add cluster: 2N = O(N) • Centroids: 2(2N/M) + 2 + 2 = O(N/M) • (Fast) K-means iteration: 4N = O(N)* *See Fast K-means for analysis.
Total time complexity Time complexity of a single step (t): t = O(αN) Number of iterations needed (T): Total time:
Time complexity: conclusions • Logarithmic dependency on q • Linear dependency on N • Quadratic dependency on M(With large number of clusters, can be too slow) • Inverse dependency on (worst case = 2) (Higher the dimensionality and higher the cluster overlap, faster the method)
References Random swap algorithm: • P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000. • P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139‑1148, August 1998. Pseudo code: • http://cs.joensuu.fi/sipu/soft/ Efficiency of Random swap algorithm: • P. Fränti, O. Virmajoki and V. Hautamäki, “Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR’08), Tampa, FL, Dec 2008.
1st swap MSE = 4.2 * 109 MSE = 3.4 * 109