Swap-based algorithms

Clustering Methods: Part 2d • Pasi Fränti • 31.3.2014 • Speech & Image Processing Unit • School of Computing • University of Eastern Finland • Joensuu, FINLAND

Part I: Random Swap algorithm
P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.

Pseudo code of Random Swap
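The pseudo code itself did not survive on this slide. As a rough stand-in, here is a minimal Python sketch of the general idea described in the following slides: swap one centroid to a random data vector, fine-tune with a few K-means iterations, and keep the result only if the error improves. The function names, the default parameter values, and the use of a full (not local) repartition are assumptions made for illustration, not the authors' reference implementation.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def assign(data, centroids):
    """Label each vector with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(x, centroids[k]))
            for x in data]

def update(data, labels, centroids):
    """Move each centroid to the mean of its cluster (empty clusters stay put)."""
    dim = len(data[0])
    new = []
    for k, old in enumerate(centroids):
        members = [x for x, lab in zip(data, labels) if lab == k]
        if members:
            new.append(tuple(sum(v[d] for v in members) / len(members)
                             for d in range(dim)))
        else:
            new.append(old)
    return new

def mse(data, labels, centroids):
    """Mean squared distance from each vector to its assigned centroid."""
    return sum(sq_dist(x, centroids[lab]) for x, lab in zip(data, labels)) / len(data)

def random_swap(data, m, iterations=200, kmeans_iters=2, seed=0):
    """Random swap sketch: trial swap + K-means fine-tuning + accept if better."""
    rng = random.Random(seed)
    centroids = rng.sample(data, m)           # random initial solution
    labels = assign(data, centroids)
    best = mse(data, labels, centroids)
    for _ in range(iterations):
        trial = list(centroids)
        trial[rng.randrange(m)] = rng.choice(data)   # swap a random centroid
        trial_labels = assign(data, trial)
        for _ in range(kmeans_iters):                # fine-tune by K-means
            trial = update(data, trial_labels, trial)
            trial_labels = assign(data, trial)
        cost = mse(data, trial_labels, trial)
        if cost < best:                              # keep only improvements
            centroids, labels, best = trial, trial_labels, cost
    return centroids, labels, best
```

For example, `random_swap(points, 3)` on a list of 2-D tuples returns the final centroids, the partition labels, and the MSE of the accepted solution.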

Demonstration of the algorithm

Centroid swap

Local repartition

Fine-tuning by K-means: 1st iteration

Fine-tuning by K-means: 2nd iteration

Fine-tuning by K-means: 3rd iteration

Fine-tuning by K-means: 16th iteration

Fine-tuning by K-means: final result after 25 iterations

Implementation of the swap
1. Random swap
2. Re-partition vectors from old cluster
3. Create new cluster
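The formulas for the three steps were lost from this slide, so the following is only a sketch of one plausible reading: the swapped centroid is replaced by a random data vector, vectors of the removed cluster are reassigned to their nearest centroid, and every other vector moves to the new cluster only if the new centroid is closer than its current one. The function name `swap_with_local_repartition` and its signature are hypothetical.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def swap_with_local_repartition(data, centroids, labels, rng):
    """Perform one random swap and repair the partition locally."""
    centroids = list(centroids)
    labels = list(labels)
    m = len(centroids)

    # 1. Random swap: replace a random centroid j by a random data vector.
    j = rng.randrange(m)
    centroids[j] = rng.choice(data)

    # 2. Re-partition vectors from the old cluster j: each one goes to
    #    its nearest centroid in the updated set.
    for i, x in enumerate(data):
        if labels[i] == j:
            labels[i] = min(range(m), key=lambda k: sq_dist(x, centroids[k]))

    # 3. Create the new cluster: any vector closer to the new centroid
    #    than to its current centroid joins cluster j.
    for i, x in enumerate(data):
        if sq_dist(x, centroids[j]) < sq_dist(x, centroids[labels[i]]):
            labels[i] = j

    return centroids, labels, j
```

If the partition was optimal before the swap, each vector only needs to be compared against its current centroid and the new one, which is why steps 2 and 3 avoid a full nearest-centroid search for most vectors.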

Random swap as local search: study neighbor solutions

Random swap as local search: select one and move

Role of K-means: fine-tune the solution by a hill-climbing technique!

Role of K-means: consider only local optima!

Role of swap: reduce search space (effective search space)

Chain reaction by K-means after swap

Independence of initialization
Results for T = 5000 iterations
[Figure labels: Worst, Best, Initial]

Part II: Efficiency of Random Swap

Probability of good swap
• Select a proper centroid for removal:
  • There are M clusters in total: p_removal = 1/M
• Select a proper new location:
  • There are N choices: p_add = 1/N
  • Only M are significantly different: p_add = 1/M
• In total:
  • M^2 significantly different swaps
  • Probability of each different swap is p_swap = 1/M^2
• Open question: how many of these are good?

Number of neighbors
Open question: what is the size of the neighborhood (α)?
• Voronoi neighbors
• Neighbors by distance

Observed number of neighbors: data set S2

Average number of neighbors

Expected number of iterations
• Probability of not finding a good swap (q)
• Estimated number of iterations (T)
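The formulas on this slide were not preserved. A small sketch of how such an estimate can be computed follows, under the assumption (taken from the model in the cited efficiency paper, not from this transcript) that a single swap is good with probability p = (α/M)^2, so that q = (1 - p)^T. The function names are illustrative only.

```python
import math

def good_swap_probability(alpha, m):
    """Assumed model: alpha good choices out of M for both removal and addition."""
    return (alpha / m) ** 2

def failure_probability(alpha, m, t):
    """Probability q of finding no good swap in t independent iterations."""
    return (1.0 - good_swap_probability(alpha, m)) ** t

def iterations_needed(q, alpha, m):
    """Smallest T with failure probability at most q (requires alpha < m)."""
    p = good_swap_probability(alpha, m)
    return math.ceil(math.log(q) / math.log(1.0 - p))
```

Calling `iterations_needed(0.01, 2, 15)` gives the number of iterations after which the chance of still having found no good swap is at most 1% for α = 2 and M = 15 under this model.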

Estimated number of iterations depending on T
Observed = number of iterations needed in practice.
Estimated = estimate of the number of iterations needed for a given q.
[Figure panels: S1, S2, S3, S4]

Probability of success (p) depending on T

Probability of failure (q) depending on T

Observed probabilities depending on dimensionality

Bounds for the number of iterations
• Upper limit for T
• Lower limit derived similarly
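The bound formulas are missing from the transcript. One way such limits can be derived is sketched below, assuming the model q = (1 - α^2/M^2)^T from the cited efficiency paper and standard inequalities for the logarithm. The slide's own derivation may differ in detail.

```latex
% Assumption: q = (1 - x)^T with x = \alpha^2 / M^2, 0 < x < 1, 0 < q < 1.
\begin{gather*}
T = \frac{\ln q}{\ln(1 - x)}, \qquad x = \frac{\alpha^2}{M^2} \\[4pt]
% Upper limit: -\ln(1 - x) \ge x
T \le \frac{-\ln q}{x} = -\ln q \cdot \frac{M^2}{\alpha^2} \\[4pt]
% Lower limit: -\ln(1 - x) \le x / (1 - x)
T \ge \frac{-\ln q \,(1 - x)}{x} = -\ln q \cdot \frac{M^2}{\alpha^2}\,(1 - x) \\[4pt]
% For x \le 1/2 the two limits differ by at most a factor of 2
T = \Theta\!\left(-\ln q \cdot \frac{M^2}{\alpha^2}\right)
\end{gather*}
```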

Multiple swaps (w)
• Probability of performing fewer than w swaps
• Expected number of iterations

Number of swaps needed: example from image quantization
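The slide's formulas are not recoverable, so the sketch below uses a simplified model that is an assumption here: every iteration independently produces a good swap with the same probability p. Under that model the number of good swaps in T iterations is binomial, and the expected number of iterations to collect w good swaps is w/p. The slide's actual formulas may differ.

```python
import math

def prob_fewer_than_w_good_swaps(p, t, w):
    """Binomial probability of fewer than w good swaps in t iterations."""
    return sum(math.comb(t, i) * p ** i * (1 - p) ** (t - i) for i in range(w))

def expected_iterations(p, w):
    """Expected iterations until w good swaps have occurred (w / p)."""
    return w / p
```

With w = 1 the first function reduces to (1 - p)^T, the single-swap failure probability.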

Efficiency of the random swap
Total time to find the correct clustering:
• Time per iteration × Number of iterations
Time complexity of a single step:
• Swap: O(1)
• Remove cluster: 2MN/M = O(N)
• Add cluster: 2N = O(N)
• Centroids: 2(2N/M) + 2 + 2 = O(N/M)
• (Fast) K-means iteration: 4N = O(N)*
*See Fast K-means for analysis.

Time complexity and the observed number of steps

Time spent by K-means iterations

Effect of K-means iterations

Total time complexity
• Time complexity of a single step (t): t = O(αN)
• Number of iterations needed (T)
• Total time: t × T
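The expressions for T and for the total time were lost in extraction. A hedged reconstruction, assuming T = Θ(-ln q · M^2/α^2) as in the cited efficiency paper, multiplies the per-step cost by the number of iterations:

```latex
% Assumption: T = \Theta(-\ln q \cdot M^2 / \alpha^2); t = O(\alpha N) is from the slide.
\begin{align*}
t \cdot T
  &= O(\alpha N) \cdot O\!\left(-\ln q \cdot \frac{M^2}{\alpha^2}\right) \\
  &= O\!\left(\frac{-\ln q \cdot N M^2}{\alpha}\right)
\end{align*}
```

This form is logarithmic in q, linear in N, quadratic in M, and inversely proportional to α.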

Time complexity: conclusions
• Logarithmic dependency on q
• Linear dependency on N
• Quadratic dependency on M (with a large number of clusters, the method can be too slow)
• Inverse dependency on α (worst case α = 2) (the higher the dimensionality and the cluster overlap, the faster the method)

Time-distortion performance

References
Random swap algorithm:
• P. Fränti and J. Kivijärvi, "Randomised local search algorithm for the clustering problem", Pattern Analysis and Applications, 3 (4), 358-369, 2000.
• P. Fränti, J. Kivijärvi and O. Nevalainen, "Tabu search algorithm for codebook generation in VQ", Pattern Recognition, 31 (8), 1139-1148, August 1998.
Pseudo code:
• http://cs.joensuu.fi/sipu/soft/
Efficiency of Random swap algorithm:
• P. Fränti, O. Virmajoki and V. Hautamäki, "Efficiency of random swap based clustering", IAPR Int. Conf. on Pattern Recognition (ICPR'08), Tampa, FL, Dec 2008.

Part III: Example when 4 swaps are needed

1st swap: MSE = 4.2 * 10^9, MSE = 3.4 * 10^9
