
Fast PNN-based Clustering Using K-nearest Neighbor Graph



Fast PNN-based Clustering Using K-nearest Neighbor Graph. Pasi Fränti, Olli Virmajoki and Ville Hautamäki, 15.11.2003. University of Joensuu, Department of Computer Science, Finland. Agglomerative clustering: N = 22 (data vectors), M = 3 (final clusters). PNN method for clustering.


Presentation Transcript


Fast PNN-based Clustering Using K-nearest Neighbor Graph

Pasi Fränti, Olli Virmajoki and Ville Hautamäki

15.11.2003

UNIVERSITY OF JOENSUU

DEPARTMENT OF COMPUTER SCIENCE

FINLAND


Agglomerative clustering

N = 22 (data vectors)

M = 3 (final clusters)


PNN method for clustering

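Merge cost: $d(s_a, s_b) = \frac{n_a n_b}{n_a + n_b} \lVert c_a - c_b \rVert^2$, where $n_a, n_b$ are the cluster sizes and $c_a, c_b$ the cluster centroids

Local optimization strategy: merge the cluster pair with the smallest merge cost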
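The exact PNN idea can be sketched in a few lines. This is a minimal, deliberately naive illustration, assuming the standard PNN merge cost n_a·n_b/(n_a+n_b)·‖c_a − c_b‖²; the names `merge_cost` and `pnn` are hypothetical. Every step scans all pairs, so the whole run is O(N³), which is exactly the cost the graph-based method tries to avoid.

```python
# Naive exact PNN sketch (hypothetical names, illustration only).
# Assumption: standard PNN merge cost
#   d(a, b) = n_a * n_b / (n_a + n_b) * ||c_a - c_b||^2


def merge_cost(n_a, c_a, n_b, c_b):
    """Increase in total squared error caused by merging clusters a and b."""
    sq = sum((p - q) ** 2 for p, q in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq


def pnn(points, m):
    """Greedily merge the cheapest pair until m clusters remain.

    Each cluster is a (size, centroid) pair. Every step scans all pairs,
    so this version is O(N^3) overall.
    """
    clusters = [(1, tuple(map(float, p))) for p in points]
    while len(clusters) > m:
        # Local optimization: pick the pair whose merge increases error least.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                cost = merge_cost(*clusters[i], *clusters[j])
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        (n_a, c_a), (n_b, c_b) = clusters[i], clusters[j]
        n = n_a + n_b
        centroid = tuple((n_a * p + n_b * q) / n for p, q in zip(c_a, c_b))
        del clusters[j]  # j > i, so index i is still valid afterwards
        clusters[i] = (n, centroid)
    return clusters
```

For example, `pnn([(0, 0), (0, 1), (10, 10), (10, 11)], 2)` merges each nearby pair and leaves two clusters of size 2.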


NN search

O(N) searches with the PNN method

O(k) searches with the graph structure (k=3)
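The difference between the two searches can be shown by counting distance calculations for one cluster. In this hypothetical sketch, clusters are plain centroid tuples, `graph[a]` lists the k neighbors of cluster a, and squared Euclidean distance stands in for the merge cost to keep it short.

```python
# Sketch: nearest-neighbor search for one cluster, full vs graph-limited.
# Hypothetical helpers; squared distance stands in for the merge cost.


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def nn_full(a, centroids):
    """PNN-style search: compare cluster a with every other cluster, O(N)."""
    calcs, best = 0, None
    for b in range(len(centroids)):
        if b == a:
            continue
        calcs += 1
        d = sq_dist(centroids[a], centroids[b])
        if best is None or d < best[0]:
            best = (d, b)
    return best[1], calcs


def nn_graph(a, centroids, graph):
    """Graph-limited search: compare cluster a only with its k neighbors, O(k)."""
    calcs, best = 0, None
    for b in graph[a]:
        calcs += 1
        d = sq_dist(centroids[a], centroids[b])
        if best is None or d < best[0]:
            best = (d, b)
    return best[1], calcs
```

With 100 clusters on a line and `graph = {50: [49, 51, 48]}`, both searches return cluster 49 for cluster 50, but the full search needs 99 distance calculations and the graph search only 3.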


Graph-based PNN

  • Based on the exact PNN

  • Search is limited to the clusters that are connected in the graph structure

  • Reduces the time complexity of each search from O(N) to O(k) (example: N = 4096, k = 3-5)
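Choosing the pair to merge can then be restricted to the graph edges. Here is a hypothetical sketch of that selection, named after the slides' `GetNearestClustersInGraph`. It assumes clusters are (size, centroid) pairs in a dict, `graph[a]` holds the k neighbor ids of cluster a, and the cost is the standard PNN merge cost. A practical implementation would cache each node's best edge, for example in a heap, instead of rescanning every edge at every step.

```python
# Sketch: pick the cheapest pair using only graph edges (hypothetical names).


def merge_cost(n_a, c_a, n_b, c_b):
    """Assumed standard PNN merge cost."""
    sq = sum((p - q) ** 2 for p, q in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq


def get_nearest_clusters_in_graph(clusters, graph):
    """Return (cost, a, b) for the cheapest edge; k costs per node, not N."""
    best = None
    for a, neighbors in graph.items():
        for b in neighbors:
            cost = merge_cost(*clusters[a], *clusters[b])
            if best is None or cost < best[0]:
                best = (cost, a, b)
    return best
```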


Structure of the Graph-PNN

GraphPNN(X, M) → S

FOR i ← 1 TO N DO
    s_i ← {x_i};

FOR ∀ s_i ∈ S DO
    Find k nearest neighbors;

REPEAT
    (s_a, s_b) ← GetNearestClustersInGraph(S);
    s_ab ← Merge(s_a, s_b);
    Search the k nearest neighbors for s_ab;
    Update the nodes that had s_a and s_b as neighbors;
UNTIL |S| = M;
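The pseudocode can be turned into a small runnable sketch. This is not the authors' implementation, and several details are assumptions: the standard PNN merge cost, a brute-force initial graph, searching the new cluster's neighbors only among the former neighbors of s_a and s_b, and a full-search fallback when every remaining cluster is isolated. The update step scans all nodes, so it resembles a simple (not double-linked) variant. All names are hypothetical.

```python
# Minimal Graph-PNN sketch (hypothetical names, not the authors' code).
# Clusters are (size, centroid) pairs keyed by an integer id; graph[s]
# lists the neighbor ids of cluster s.


def merge_cost(a, b):
    """Assumed standard PNN merge cost: n_a*n_b/(n_a+n_b) * ||c_a - c_b||^2."""
    (n_a, c_a), (n_b, c_b) = a, b
    sq = sum((p - q) ** 2 for p, q in zip(c_a, c_b))
    return n_a * n_b / (n_a + n_b) * sq


def k_nearest(s, candidates, clusters, k):
    """Return the k candidate ids with the smallest merge cost to cluster s."""
    ranked = sorted((c for c in candidates if c != s),
                    key=lambda c: merge_cost(clusters[s], clusters[c]))
    return ranked[:k]


def graph_pnn(points, m, k=3):
    # FOR i <- 1 TO N DO s_i <- {x_i}
    clusters = {i: (1, tuple(map(float, p))) for i, p in enumerate(points)}
    # FOR each s_i: find its k nearest neighbors (brute force here).
    graph = {s: k_nearest(s, clusters, clusters, k) for s in clusters}
    next_id = len(points)
    while len(clusters) > m:  # REPEAT ... UNTIL |S| = M
        # GetNearestClustersInGraph: cheapest edge of the graph.
        edges = [(merge_cost(clusters[a], clusters[b]), a, b)
                 for a in graph for b in graph[a]]
        if edges:
            _, a, b = min(edges)
        else:
            # Assumption: if only isolated clusters remain, fall back to a
            # full search over all pairs.
            _, a, b = min((merge_cost(clusters[x], clusters[y]), x, y)
                          for x in clusters for y in clusters if x < y)
        # s_ab <- Merge(s_a, s_b): size-weighted centroid.
        (n_a, c_a), (n_b, c_b) = clusters.pop(a), clusters.pop(b)
        n = n_a + n_b
        ab, next_id = next_id, next_id + 1
        clusters[ab] = (n, tuple((n_a * p + n_b * q) / n
                                 for p, q in zip(c_a, c_b)))
        # k nearest neighbors of s_ab, searched only among the old
        # neighbors of s_a and s_b (assumption).
        candidates = (set(graph.pop(a)) | set(graph.pop(b))) - {a, b}
        graph[ab] = k_nearest(ab, candidates, clusters, k)
        # Update the nodes that had s_a or s_b as neighbors (O(N) scan).
        for s in graph:
            if s != ab and (a in graph[s] or b in graph[s]):
                cands = (set(graph[s]) - {a, b}) | {ab}
                graph[s] = k_nearest(s, cands, clusters, k)
    return clusters
```

For two well-separated groups of three points, `graph_pnn(points, 2, k=2)` returns two clusters of size 3; asking for `m=1` exercises the fallback, because the two groups never become connected in the graph.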


Graph structure


Sample graph (k=3 and k=4)

[Figure: sample graphs for k=3 and k=4; an isolated component is marked.]
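An isolated component matters because a graph-limited search can only merge clusters that are joined by an edge. The following hypothetical sketch builds a brute-force k-NN graph and lists its connected components, treating edges as undirected. The function names are illustrative, not from the slides.

```python
from collections import deque


def knn_graph(points, k):
    """Brute-force k-NN graph: node i -> ids of its k nearest points."""
    def sq(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q))
    graph = {}
    for i, p in enumerate(points):
        others = sorted((j for j in range(len(points)) if j != i),
                        key=lambda j: sq(p, points[j]))
        graph[i] = others[:k]
    return graph


def components(graph):
    """Connected components of the graph, edges treated as undirected."""
    adj = {v: set() for v in graph}
    for v, nbrs in graph.items():
        for u in nbrs:
            adj[v].add(u)
            adj[u].add(v)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = [], deque([start])
        seen.add(start)
        while queue:  # breadth-first traversal of one component
            v = queue.popleft()
            comp.append(v)
            for u in adj[v]:
                if u not in seen:
                    seen.add(u)
                    queue.append(u)
        comps.append(sorted(comp))
    return comps
```

For two distant groups of three points, k=2 yields two isolated components, while k=3 links them into one.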


Graph-PNN (double-linked)
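One plausible reading of "double-linked" (an assumption, since the slide shows only a figure) is that each node stores its forward neighbor list and also back-links to the nodes that point to it. After a merge, the nodes that must be updated can then be found directly instead of by scanning all N nodes. A hypothetical sketch follows. For brevity, the merge only substitutes s_ab for s_a and s_b; a real update would also re-rank each affected node's k neighbors.

```python
class DoubleLinkedGraph:
    """k-NN graph with forward and backward links (hypothetical sketch)."""

    def __init__(self):
        self.out = {}   # node -> list of its neighbors
        self.back = {}  # node -> set of nodes that list it as a neighbor

    def set_neighbors(self, node, neighbors):
        for old in self.out.get(node, []):
            if old in self.back:
                self.back[old].discard(node)
        self.out[node] = list(neighbors)
        self.back.setdefault(node, set())
        for nb in neighbors:
            self.back.setdefault(nb, set()).add(node)

    def nodes_pointing_to(self, *nodes):
        """Nodes that have any of `nodes` as a neighbor, found without a scan."""
        result = set()
        for n in nodes:
            result |= self.back.get(n, set())
        return result - set(nodes)

    def remove(self, node):
        for old in self.out.pop(node, []):
            if old in self.back:
                self.back[old].discard(node)
        self.back.pop(node, None)

    def merge(self, a, b, ab, ab_neighbors):
        """Replace a and b by ab; return the nodes whose lists were updated."""
        affected = self.nodes_pointing_to(a, b)
        self.set_neighbors(ab, ab_neighbors)
        for u in affected:
            kept = [v for v in self.out[u] if v not in (a, b)]
            self.set_neighbors(u, kept + [ab])
        self.remove(a)
        self.remove(b)
        return affected
```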


Observed number of steps and distance calculations for Bridge (k=3)

Method                       Steps         Distance calculations
Fast PNN                     81 960 610    40 166 328
Graph-PNN (simple)           50 468 663    47 370
Graph-PNN (double-linked)    517 905       47 413


Creation of nearest neighbor graph

  • Brute force: O(N²)

  • MPS (mean-distance-ordered partial search)

  • Divide-and-conquer (to be considered)
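MPS avoids many of the O(N²) distance calculations of brute force. A sketch of an MPS-style k-NN graph construction, written under my own assumptions (function names are hypothetical): vectors are sorted by their component mean, and each query searches outward from its position in that order. Cauchy-Schwarz gives ‖x − y‖² ≥ d·(mean(x) − mean(y))², so the search stops as soon as this lower bound reaches the current k-th best distance.

```python
import heapq


def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def mps_knn_graph(points, k):
    """k-NN graph via a mean-ordered partial search (MPS-style sketch).

    Returns (graph, calcs), where calcs counts full distance calculations.
    """
    d = len(points[0])
    order = sorted(range(len(points)), key=lambda i: sum(points[i]))
    mean = [sum(points[i]) / d for i in order]  # mean[pos] of order[pos]
    graph, calcs = {}, 0
    for pos, i in enumerate(order):
        heap = []  # max-heap of (-distance, id) holding the best k so far
        lo, hi = pos - 1, pos + 1
        while lo >= 0 or hi < len(order):
            # Visit the side whose mean is closer, so gaps never decrease.
            if hi >= len(order) or (
                lo >= 0 and mean[pos] - mean[lo] <= mean[hi] - mean[pos]
            ):
                cand, gap = lo, mean[pos] - mean[lo]
                lo -= 1
            else:
                cand, gap = hi, mean[hi] - mean[pos]
                hi += 1
            # Lower bound: ||x - y||^2 >= d * gap^2; once it reaches the
            # k-th best distance, no remaining candidate can be closer.
            if len(heap) == k and d * gap * gap >= -heap[0][0]:
                break
            j = order[cand]
            calcs += 1
            dist = sq_dist(points[i], points[j])
            if len(heap) < k:
                heapq.heappush(heap, (-dist, j))
            elif dist < -heap[0][0]:
                heapq.heapreplace(heap, (-dist, j))
        graph[i] = [j for _, j in sorted(heap, reverse=True)]
    return graph, calcs
```

The result matches a brute-force k-NN graph, while `calcs` stays below the N(N−1) distance calculations that brute force needs.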


Image datasets

Bridge (256×256): d = 16, N = 4096, M = 256

Miss America (360×288): d = 16, N = 6480, M = 256

House (256×256): d = 3, N = 34112, M = 256
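The values d = 16 for Bridge and Miss America are consistent with 4×4 pixel blocks, since 256·256/16 = 4096 and 360·288/16 = 6480 match the stated N. The slides do not state this construction, so the following sketch of splitting a grayscale image into block vectors is an assumption. The helper name is hypothetical.

```python
def image_to_blocks(pixels, block=4):
    """Split a 2-D grayscale image (list of rows) into block*block vectors.

    Assumption: training vectors are non-overlapping square pixel blocks.
    """
    h, w = len(pixels), len(pixels[0])
    vectors = []
    for top in range(0, h - h % block, block):
        for left in range(0, w - w % block, block):
            vectors.append(tuple(pixels[top + r][left + c]
                                 for r in range(block)
                                 for c in range(block)))
    return vectors
```

A 256×256 image gives 4096 vectors of dimension 16, and a 360×288 image (288 rows of 360 pixels) gives 6480.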


BIRCH datasets

Datasets BIRCH1, BIRCH2 and BIRCH3

d = 2

N = 100 000

M = 100


Two-dimensional datasets

Datasets S1, S2, S3 and S4

d = 2

N = 5 000

M = 15


Run time of the Graph-PNN


Quality of the Graph-PNN


Time-distortion performance


Final results for set S2


Comparison of the Graph-PNN (k=5) with other methods (BIRCH datasets)

                                          BIRCH1          BIRCH2          BIRCH3
Method                                    Time    MSE     Time    MSE     Time    MSE
Fast PNN, full search                     > 4 h   4.73    > 4 h   2.28    > 4 h   1.96
Fast PNN +PDS+MPS+Lazy                    2397    4.73    2115    2.28    2316    1.96
Graph-PNN + GLA (limited search, MPS)     41      4.64    16      2.28    44      1.90
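The last row pairs Graph-PNN with GLA, the generalized Lloyd algorithm (k-means iterations), which is presumably used to fine-tune the Graph-PNN result. Below is a sketch of GLA refinement and of an MSE measure. The MSE here is averaged per vector; the normalization behind the table's values is not stated, so that part is an assumption. Function names are hypothetical.

```python
def sq_dist(p, q):
    return sum((x - y) ** 2 for x, y in zip(p, q))


def gla(points, centroids, max_iter=20):
    """Generalized Lloyd algorithm (k-means) starting from given centroids."""
    cents = [tuple(map(float, c)) for c in centroids]
    for _ in range(max_iter):
        # Partition step: map each vector to its nearest centroid.
        groups = [[] for _ in cents]
        for p in points:
            j = min(range(len(cents)), key=lambda c: sq_dist(p, cents[c]))
            groups[j].append(p)
        # Centroid step: recompute means; keep the old centroid if empty.
        new = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else c
               for c, g in zip(cents, groups)]
        if new == cents:  # converged: partition no longer changes
            break
        cents = new
    return cents


def mse(points, centroids):
    """Mean squared error per vector (assumed normalization)."""
    return sum(min(sq_dist(p, c) for c in centroids)
               for p in points) / len(points)
```

Starting from a rough codebook such as `[(0, 0), (10, 0)]` for the points `(0, 0), (0, 2), (10, 0), (10, 2)`, GLA moves the centroids to `(0, 1)` and `(10, 1)` and lowers the MSE from 2.0 to 1.0.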


Conclusions

  • Small neighborhood size (k=3-5) can produce clustering with similar quality to that of full search.

  • The number of steps and distance calculations is remarkably lower than that of the exact PNN.

  • Graph creation is the bottleneck of the algorithm.


  • Login