Agglomerative clustering (AC)

Clustering algorithms: Part 2c

- Pasi Fränti
- 25.3.2014
- Speech & Image Processing Unit
- School of Computing
- University of Eastern Finland
- Joensuu, FINLAND

Agglomerative clustering: Categorization by cost function

Single link

- Minimize distance of nearest vectors

Complete link

- Minimize distance of two furthest vectors

Ward’s method (we focus on this)

- Minimize mean square error
- In Vector Quantization, known as Pairwise Nearest Neighbor (PNN) method
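The three criteria can be compared on a toy example. The following is a minimal sketch, not taken from the slides: the function names and the sample clusters `A` and `B` are illustrative assumptions, and Ward's criterion is written as the increase in total squared error caused by a merge.

```python
from itertools import product
from math import dist  # Euclidean distance, Python 3.8+


def single_link(A, B):
    """Distance between the two nearest vectors, one from each cluster."""
    return min(dist(a, b) for a, b in product(A, B))


def complete_link(A, B):
    """Distance between the two furthest vectors, one from each cluster."""
    return max(dist(a, b) for a, b in product(A, B))


def sse(S):
    """Sum of squared distances from the vectors of S to their centroid."""
    centroid = tuple(sum(col) / len(S) for col in zip(*S))
    return sum(dist(x, centroid) ** 2 for x in S)


def ward_cost(A, B):
    """Increase in total squared error caused by merging A and B."""
    return sse(A + B) - sse(A) - sse(B)


# Hypothetical sample clusters (lists of 2-D vectors)
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(3.0, 0.0), (5.0, 0.0)]
costs = single_link(A, B), complete_link(A, B), ward_cost(A, B)
```

Single and complete link look only at one pair of vectors, whereas the Ward cost depends on every vector in both clusters through their centroids and sizes.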

Pseudo code

- PNN(X, M) → C, P
- FOR i←1 TO N DO
  - p[i]←i; c[i]←x[i];
- m←N;
- REPEAT
  - a,b ← FindSmallestMergeCost();
  - MergeClusters(a,b);
  - m←m-1;
- UNTIL m=M;
- FOR i←1 TO N DO

Annotated costs: O(N) for the initialization; O(N²) for FindSmallestMergeCost, repeated N times.

T(N) = O(N³)
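The pseudo code above can be sketched in Python as follows. This is an illustrative version under stated assumptions, not the author's implementation: clusters are kept as (centroid, size) pairs, the merge cost is the standard Ward cost, and the names `merge_cost` and `pnn` are chosen here for the example.

```python
def merge_cost(ca, na, cb, nb):
    """Ward merge cost of two clusters given by centroid and size."""
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def pnn(X, M):
    """Merge N singleton clusters down to M; returns (centroids, partition)."""
    C = [tuple(float(v) for v in x) for x in X]  # c[i] <- x[i]
    n = [1] * len(X)                             # cluster sizes
    P = list(range(len(X)))                      # p[i] <- i
    alive = list(range(len(X)))
    while len(alive) > M:
        # FindSmallestMergeCost: full O(N^2) scan over all cluster pairs
        best = None
        for ia in range(len(alive)):
            for ib in range(ia + 1, len(alive)):
                a, b = alive[ia], alive[ib]
                d = merge_cost(C[a], n[a], C[b], n[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        # MergeClusters: b is absorbed into a
        C[a] = tuple((n[a] * u + n[b] * v) / (n[a] + n[b])
                     for u, v in zip(C[a], C[b]))
        n[a] += n[b]
        P = [a if p == b else p for p in P]
        alive.remove(b)
    # Relabel the surviving clusters as 0..M-1
    label = {c: k for k, c in enumerate(alive)}
    return [C[c] for c in alive], [label[p] for p in P]


# Hypothetical data: two tight pairs and one outlier
X = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
centroids, partition = pnn(X, 3)
```

The nested scan inside the loop is what makes this variant cubic: each of the up to N merge steps re-evaluates every remaining pair.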

Ward’s method [Ward 1963: Journal of the American Statistical Association]

Merge cost:

Local optimization strategy:

Nearest neighbor search:

- Find the cluster pair to be merged
- Update of NN pointers
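As a sketch of these three ingredients, assume the standard form of Ward's merge cost, d(a, b) = n_a·n_b / (n_a + n_b) · ‖c_a − c_b‖², where c is a cluster centroid and n its size. The helper names below (`merge_cost`, `merged_centroid`, `find_nearest`) are hypothetical and chosen only for this illustration.

```python
def merge_cost(ca, na, cb, nb):
    """Increase in squared error when clusters a and b are merged."""
    squared_distance = sum((u - v) ** 2 for u, v in zip(ca, cb))
    return na * nb / (na + nb) * squared_distance


def merged_centroid(ca, na, cb, nb):
    """Centroid of the merged cluster: size-weighted mean of the two."""
    return tuple((na * u + nb * v) / (na + nb) for u, v in zip(ca, cb))


def find_nearest(i, centroids, sizes):
    """Nearest-neighbor search: the partner j != i with the smallest
    merge cost, returned as (j, cost)."""
    best_j, best_d = None, float("inf")
    for j in range(len(centroids)):
        if j == i:
            continue
        d = merge_cost(centroids[i], sizes[i], centroids[j], sizes[j])
        if d < best_d:
            best_j, best_d = j, d
    return best_j, best_d


# Hypothetical example with three singleton clusters
centroids = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0)]
sizes = [1, 1, 1]
partner, cost = find_nearest(0, centroids, sizes)
```

The local optimization strategy then amounts to merging the pair whose cost is smallest over all clusters, which is why each cluster's nearest neighbor is worth remembering between steps.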

Example - 25 to 15 clusters

- 25 clusters: MSE ≈ 1.01×10⁹
- 24 clusters: MSE ≈ 1.03×10⁹
- 23 clusters: MSE ≈ 1.06×10⁹
- 22 clusters: MSE ≈ 1.09×10⁹
- 21 clusters: MSE ≈ 1.12×10⁹
- 20 clusters: MSE ≈ 1.16×10⁹
- 19 clusters: MSE ≈ 1.19×10⁹
- 18 clusters: MSE ≈ 1.23×10⁹
- 17 clusters: MSE ≈ 1.26×10⁹
- 16 clusters: MSE ≈ 1.30×10⁹
- 15 clusters: MSE ≈ 1.34×10⁹

Storing distance matrix

- Maintain the distance matrix and update only the rows of the changed cluster.
- The number of distance calculations per step drops from O(N²) to O(N).
- Searching for the minimum pair still takes O(N²) time per step, so the total remains O(N³).
- It also requires O(N²) memory.
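A minimal sketch of this bookkeeping, assuming the standard Ward merge cost and a dense N×N list-of-lists matrix (both are choices made for this illustration, as are the function names). The counter `calcs` tracks how many merge costs are evaluated, so the O(N) update per step is visible.

```python
def merge_cost(ca, na, cb, nb):
    """Ward merge cost of two clusters given by centroid and size."""
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def pnn_distance_matrix(X, M):
    """Returns (clusters, calcs): clusters as (centroid, size) pairs and
    the number of merge-cost evaluations performed."""
    N = len(X)
    C = [tuple(float(v) for v in x) for x in X]
    n = [1] * N
    active = [True] * N
    D = [[float("inf")] * N for _ in range(N)]  # O(N^2) memory
    calcs = 0
    for i in range(N):
        for j in range(i + 1, N):
            D[i][j] = D[j][i] = merge_cost(C[i], 1, C[j], 1)
            calcs += 1
    for _ in range(N - M):
        # The search still scans the whole matrix: O(N^2) per step
        _, a, b = min((D[i][j], i, j)
                      for i in range(N) if active[i]
                      for j in range(i + 1, N) if active[j])
        C[a] = tuple((n[a] * u + n[b] * v) / (n[a] + n[b])
                     for u, v in zip(C[a], C[b]))
        n[a] += n[b]
        active[b] = False
        # Only the row and column of the changed cluster are recomputed
        for j in range(N):
            if active[j] and j != a:
                D[a][j] = D[j][a] = merge_cost(C[a], n[a], C[j], n[j])
                calcs += 1
    return [(C[i], n[i]) for i in range(N) if active[i]], calcs


# Hypothetical data: two tight pairs and one outlier
X = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
clusters, calcs = pnn_distance_matrix(X, 3)
```

The cost evaluations are cheap after the initial fill, but the minimum search inside the loop remains the bottleneck, which motivates the heap and nearest-neighbor pointer variants.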

Heap structure for fast search [Kurita 1991: Pattern Recognition]

- Search time reduces from O(N²) to O(log N).
- In total: O(N² log N)

Store nearest neighbor (NN) pointers [Fränti et al., 2000: IEEE Trans. Image Processing]

Time complexity reduces from O(N³) to Ω(N²)

Pseudo code

- PNN(X, M) → C, P
- FOR i←1 TO N DO
  - p[i]←i; c[i]←x[i];
- FOR i←1 TO N DO
  - NN[i]← FindNearestCluster(i);
- m←N;
- REPEAT
  - a ← SmallestMergeCost(NN);
  - b ← NN[a];
  - MergeClusters(C,P,NN,a,b);
  - UpdatePointers(C,NN);
  - m←m-1;
- UNTIL m=M;
- FOR i←1 TO N DO

Annotated costs: O(N) for the initialization, O(N²) for the initial nearest neighbor search, and O(N) for the annotated steps inside the loop.

http://cs.uef.fi/pages/franti/research/pnn.txt
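The NN-pointer variant can be sketched in Python as below. This is an illustration under assumptions, not a transcription of the linked pnn.txt: it uses the standard Ward merge cost, stores each pointer as a (cost, index) pair, and refreshes a pointer only for the merged cluster and for clusters that pointed to one of the two merged clusters.

```python
def merge_cost(ca, na, cb, nb):
    """Ward merge cost of two clusters given by centroid and size."""
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def pnn_nn_pointers(X, M):
    """PNN with nearest-neighbor pointers; returns (centroids, partition)."""
    N = len(X)
    C = [tuple(float(v) for v in x) for x in X]  # c[i] <- x[i]
    n = [1] * N
    P = list(range(N))                           # p[i] <- i
    active = list(range(N))

    def nearest(i):
        # (cost, index) of the cheapest merge partner of cluster i
        return min(((merge_cost(C[i], n[i], C[j], n[j]), j)
                    for j in active if j != i),
                   default=(float("inf"), None))

    NN = {i: nearest(i) for i in active}         # O(N^2) initialization
    while len(active) > M:
        a = min(active, key=lambda i: NN[i][0])  # O(N) scan of the pointers
        b = NN[a][1]
        # MergeClusters: b is absorbed into a
        C[a] = tuple((n[a] * u + n[b] * v) / (n[a] + n[b])
                     for u, v in zip(C[a], C[b]))
        n[a] += n[b]
        P = [a if p == b else p for p in P]
        active.remove(b)
        del NN[b]
        # UpdatePointers: a itself, plus clusters that pointed to a or b
        for i in active:
            if i == a or NN[i][1] in (a, b):
                NN[i] = nearest(i)
    label = {c: k for k, c in enumerate(active)}
    return [C[c] for c in active], [label[p] for p in P]


# Hypothetical data: two tight pairs and one outlier
X = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
centroids, partition = pnn_nn_pointers(X, 3)
```

Skipping the other pointers is safe for the Ward cost because merging the cheapest pair cannot bring the merged cluster closer to a third cluster than the nearer of its two parts was.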

Example with NN pointers [Virmajoki 2004: Pairwise Nearest Neighbor Method Revisited]

Example: Step 1, Step 2, Step 3, Step 4, Final (figures)

Processing time comparison

With NN pointers

Algorithm: Lazy-PNN

T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), 1862-1868, November 1999

Monotony property of merge cost [Kaukoranta et al., Optical Engineering, 1999]

Merge cost values are monotonically increasing:

$d(S_a, S_b) \le d(S_a, S_c) \le d(S_b, S_c) \;\Rightarrow\; d(S_a, S_c) \le d(S_{a+b}, S_c)$
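The property can be checked numerically. The sketch below assumes the standard Ward merge cost and uses randomly generated (centroid, size) triples. For each triple it merges the cheapest pair and verifies that the cost from the merged cluster to the third cluster is no smaller than the cheaper of the two old costs. All names are illustrative.

```python
import random


def merge_cost(ca, na, cb, nb):
    """Ward merge cost of two clusters given by centroid and size."""
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def merged_centroid(ca, na, cb, nb):
    """Size-weighted mean of the two centroids."""
    return tuple((na * u + nb * v) / (na + nb) for u, v in zip(ca, cb))


def monotony_holds(trials=1000, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        # Three random clusters, each a (centroid, size) pair
        cl = [((rng.random(), rng.random()), rng.randint(1, 5))
              for _ in range(3)]
        # Label the cheapest pair (a, b); the remaining cluster is c
        a, b, c = min([(0, 1, 2), (0, 2, 1), (1, 2, 0)],
                      key=lambda t: merge_cost(*cl[t[0]], *cl[t[1]]))
        (ca, na), (cb, nb), (cc, nc) = cl[a], cl[b], cl[c]
        d_old = min(merge_cost(ca, na, cc, nc), merge_cost(cb, nb, cc, nc))
        d_new = merge_cost(merged_centroid(ca, na, cb, nb), na + nb, cc, nc)
        if d_new < d_old - 1e-12:  # small tolerance for rounding
            return False
    return True
```

A random check like this is only evidence, not a proof. The result itself follows from the Lance-Williams update formula for Ward's method.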

Lazy variant of the PNN

- Store merge costs in heap.
- Update merge cost value only when it appears at top of the heap.
- Processing time is reduced by about 35%.
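One way to realize the lazy update is sketched below. It is an assumption-laden illustration rather than the published algorithm: each heap entry carries version stamps (a bookkeeping device chosen here) so that an outdated entry is detected only when it reaches the top, and the merge cost is the standard Ward cost.

```python
import heapq


def merge_cost(ca, na, cb, nb):
    """Ward merge cost of two clusters given by centroid and size."""
    return na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))


def lazy_pnn(X, M):
    """Lazy PNN sketch; expects 1 <= M <= N and N >= 2.
    Returns the final clusters as (centroid, size) pairs."""
    N = len(X)
    C = [tuple(float(v) for v in x) for x in X]
    n = [1] * N
    ver = [0] * N          # version stamp, bumped when a cluster changes
    alive = [True] * N

    def entry(i):
        # Fresh heap entry: cost to the current nearest neighbor of i
        d, j = min((merge_cost(C[i], n[i], C[j], n[j]), j)
                   for j in range(N) if alive[j] and j != i)
        return (d, i, ver[i], j, ver[j])

    heap = [entry(i) for i in range(N)]
    heapq.heapify(heap)
    m = N
    while m > M:
        d, i, vi, j, vj = heapq.heappop(heap)
        if not alive[i] or ver[i] != vi:
            continue                        # obsolete entry, drop it
        if not alive[j] or ver[j] != vj:
            heapq.heappush(heap, entry(i))  # outdated: refresh only now
            continue
        # The top entry is up to date, so (i, j) is the cheapest pair
        C[i] = tuple((n[i] * u + n[j] * v) / (n[i] + n[j])
                     for u, v in zip(C[i], C[j]))
        n[i] += n[j]
        ver[i] += 1
        alive[j] = False
        m -= 1
        if m > 1:
            heapq.heappush(heap, entry(i))
    return [(C[i], n[i]) for i in range(N) if alive[i]]


# Hypothetical data: two tight pairs and one outlier
X = [(0, 0), (0, 1), (10, 0), (10, 1), (5, 20)]
clusters = lazy_pnn(X, 3)
```

The laziness relies on the monotony property: an outdated cost can only be too small, so it never hides a cheaper pair, and refreshing it when it surfaces is enough to keep the merges exact.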

Combining PNN and K-means

K-means

Algorithm: Iterative shrinking

P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.

Agglomerative clustering based on merging

Agglomeration based on cluster removal [Fränti and Virmajoki, Pattern Recognition, 2006]

Complexity analysis

Number of vectors per cluster:

If we iterate until M=1:

Adding the processing time per vector:

Algorithm: PNN with kNN-graph

P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph". IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006

Agglomerative clustering with kNN graph

Merging a and b

Effect on calculations: number of steps

Processing time as function of k (number of neighbors in graph)

-PNN (229 s)

Trivial-PNN (>9999 s)

Graph-PNN (1)

MSE = 5.36

Graph-PNN (2)

- Graph created by MSP
- Graph created by D-n-C

- Simple to implement, good clustering quality
- Straightforward algorithm is slow: O(N³)
- Fast exact (yet simple) algorithm: O(τN²)
- Beyond this possible:
  - O(τ∙N∙logN) complexity
  - Complicated graph data structure
  - Compromises the exactness of the merge

Literature

- P. Fränti, T. Kaukoranta, D.-F. Shen and K.-S. Chang, "Fast and memory efficient implementation of the exact PNN", IEEE Trans. on Image Processing, 9 (5), 773-777, May 2000.
- P. Fränti, O. Virmajoki and V. Hautamäki, "Fast agglomerative clustering using a k-nearest neighbor graph". IEEE Trans. on Pattern Analysis and Machine Intelligence, 28 (11), 1875-1881, November 2006.
- P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006.
- T. Kaukoranta, P. Fränti and O. Nevalainen, "Vector quantization by lazy pairwise nearest neighbor method", Optical Engineering, 38 (11), 1862-1868, November 1999.
- T. Kurita, "An efficient agglomerative clustering algorithm using a heap", Pattern Recognition 24 (3) (1991) 205-209.

- J. Shanbehzadeh and P.O. Ogunbona, "On the computational complexity of the LBG and PNN algorithms", IEEE Trans. on Image Processing, 6 (4), 614-616, April 1997.
- O. Virmajoki, P. Fränti and T. Kaukoranta, "Practical methods for speeding-up the pairwise nearest neighbor method", Optical Engineering, 40 (11), 2495-2504, November 2001.
- O. Virmajoki and P. Fränti, "Fast pairwise nearest neighbor based algorithm for multilevel thresholding", Journal of Electronic Imaging, 12 (4), 648-659, October 2003.
- O. Virmajoki, "Pairwise Nearest Neighbor Method Revisited", PhD thesis, Computer Science, University of Joensuu, 2004.
- J.H. Ward, "Hierarchical grouping to optimize an objective function", J. Amer. Statist. Assoc., 58, 236-244, 1963.
