CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling


Outline

  • Motivation

  • Objective

  • Research restriction

  • Literature review

    • An overview of related clustering algorithms

    • The limitations of clustering algorithms

  • CHAMELEON

  • Concluding remarks

  • Personal opinion


Motivation

  • Existing clustering algorithms can break down when

    • The choice of parameters is incorrect for the data set

    • The model is not adequate to capture the characteristics of the clusters

    • The clusters have diverse shapes, densities, and sizes


Objective

  • Presenting a novel hierarchical clustering algorithm – CHAMELEON

    • Facilitating discovery of natural and homogeneous clusters

    • Being applicable to all types of data, as long as a similarity matrix can be constructed


Research Restriction

  • In this paper, the authors do not address scaling to large data sets that cannot fit in main memory


Literature Review

  • Clustering

  • An overview of related clustering algorithms

  • The limitations of recently proposed state-of-the-art clustering algorithms


Clustering

  • Goal: group data so that intracluster similarity is maximized and intercluster similarity is minimized [Jain and Dubes, 1988]

  • Serves as a foundation for many data mining and analysis techniques


Clustering (cont’d)

  • Applications

    • Purchasing patterns

    • Categorization of documents on the WWW [Boley, et al., 1999]

    • Grouping of genes and proteins that have similar functionality [Harris, et al., 1992]

    • Grouping of spatial locations prone to earthquakes [Byers and Raftery, 1998]


An Overview of Related Clustering Algorithms

  • Partitional techniques

  • Hierarchical techniques


Partitional Techniques

  • K-means [Jain and Dubes, 1988]


Hierarchical Techniques

  • CURE [Guha, Rastogi and Shim, 1998]

  • ROCK [Guha, Rastogi and Shim, 1999]


Limitations of Existing Hierarchical Schemes

  • CURE

    • Fails to take into account the special characteristics of individual clusters


Limitations of Existing Hierarchical Schemes (cont’d)

  • ROCK

    • Judges cluster similarity irrespective of the densities and shapes of the clusters


CHAMELEON

  • Overview

  • Modeling the data

  • Modeling the cluster similarity

  • A two-phase clustering algorithm

  • Performance analysis

  • Experimental Results


Overall Framework of CHAMELEON


Modeling the Data

  • K-nearest neighbor graphs built from an original data set in 2D
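
The graph construction named on this slide can be sketched as follows. This is a minimal brute-force illustration, not the authors' implementation: the function name `knn_graph`, the sample points, and the edge weight `1 / (1 + distance)` are all assumptions chosen for the example.

```python
import numpy as np

def knn_graph(points, k):
    """Map each point index to a list of (neighbor index, similarity) edges."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # Pairwise Euclidean distances by brute force: O(n^2) time and memory.
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a point is never its own neighbor
    graph = {}
    for i in range(n):
        nearest = np.argsort(dist[i])[:k]  # indices of the k closest points
        # Assumed edge weight: a similarity that shrinks as distance grows.
        graph[i] = [(int(j), 1.0 / (1.0 + dist[i, j])) for j in nearest]
    return graph

# Hypothetical 2D data: three points near the origin, two far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
graph = knn_graph(points, k=2)
```

Each point keeps edges only to its k most similar neighbors, so points in dense regions are linked by heavier edges than points in sparse regions.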


Modeling the Cluster Similarity

  • Relative inter-connectivity
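
The formula for this slide did not survive extraction. As a reminder of the definition in the CHAMELEON paper, relative inter-connectivity normalizes the absolute inter-connectivity between two clusters by their internal inter-connectivity. The notation below is assumed: EC_{C_i, C_j} is the set of k-nearest neighbor graph edges connecting C_i and C_j, EC_{C_i} is the min-cut bisector of C_i, and |·| is the sum of the edge weights.

```latex
\[
RI(C_i, C_j) = \frac{\left| EC_{\{C_i, C_j\}} \right|}
                    {\dfrac{\left| EC_{C_i} \right| + \left| EC_{C_j} \right|}{2}}
\]
```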


Modeling the Cluster Similarity (cont’d)

  • Relative closeness
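
The formula for this slide did not survive extraction either. In the CHAMELEON paper, relative closeness normalizes the absolute closeness between two clusters by their internal closeness, weighting each cluster by its size. The notation below is assumed: \bar{S}_{EC_{\{C_i, C_j\}}} is the average weight of the edges connecting C_i and C_j, and \bar{S}_{EC_{C_i}} is the average weight of the edges in the min-cut bisector of C_i.

```latex
\[
RC(C_i, C_j) = \frac{\bar{S}_{EC_{\{C_i, C_j\}}}}
                    {\dfrac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}}
                     + \dfrac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}
\]
```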


A Two-phase Clustering Algorithm

  • Phase I: Finding initial sub-clusters


A Two-phase Clustering Algorithm (cont’d)

  • Phase I: Finding initial sub-clusters

    • Multilevel paradigm [Karypis & Kumar, 1999]

    • hMETIS [Karypis & Kumar, 1999]


A Two-phase Clustering Algorithm (cont’d)

  • Phase II: Merging sub-clusters using a dynamic framework

T_RI, T_RC: user-specified thresholds on relative inter-connectivity and relative closeness
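
The threshold test on this slide can be sketched as follows. This is a hedged illustration only: it assumes a pair of clusters qualifies for merging when both its relative inter-connectivity and its relative closeness reach the thresholds, and that ties between several qualifying neighbors are broken by absolute inter-connectivity. The function name `pick_merge_partner`, the dictionary keys, and the sample values are hypothetical.

```python
def pick_merge_partner(candidates, t_ri, t_rc):
    """Choose which neighboring cluster to merge with, or None.

    candidates: list of dicts with keys
      'cluster' (label), 'ri' (relative inter-connectivity),
      'rc' (relative closeness), 'ec' (absolute inter-connectivity).
    """
    # Keep only neighbors that pass both user-specified thresholds.
    passing = [c for c in candidates if c["ri"] >= t_ri and c["rc"] >= t_rc]
    if not passing:
        return None  # no neighbor is similar enough, so do not merge
    # Assumed tie-break: prefer the most strongly connected neighbor.
    return max(passing, key=lambda c: c["ec"])["cluster"]

# Hypothetical neighbors of one cluster.
neighbors = [
    {"cluster": "B", "ri": 0.9, "rc": 0.8, "ec": 5.0},
    {"cluster": "C", "ri": 1.2, "rc": 1.1, "ec": 3.0},
    {"cluster": "D", "ri": 1.5, "rc": 0.4, "ec": 9.0},  # fails the closeness test
]
partner = pick_merge_partner(neighbors, t_ri=0.8, t_rc=0.7)
```

Raising either threshold makes merging more conservative; with thresholds that no neighbor reaches, the function returns None and the cluster is left as it is.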


A Two-phase Clustering Algorithm (cont’d)

  • Phase II: Merging sub-clusters using a dynamic framework


Performance Analysis

  • The amount of time required to compute

    • K-nearest neighbor graph

    • Two-phase clustering


Performance Analysis (cont’d)

  • The amount of time required to compute

    • K-nearest neighbor graph

      • Low-dimensional data sets = O(n log n)

      • High-dimensional data sets = O(n^2)


Performance Analysis (cont’d)

  • The amount of time required to compute

    • Two-phase clustering (n: number of data items; m: number of initial sub-clusters from Phase I)

      • Computing internal inter-connectivity and closeness for each cluster: O(nm)

      • Selecting the most similar pair of clusters: O(n log n + m^2 log m)

    • Total time = O(nm + n log n + m^2 log m)


Experimental Results

  • Programs

    • DBSCAN: a publicly available version

    • CURE: a locally implemented version

  • Data sets

  • Qualitative comparison


Data Sets

  • Five clusters

    • Different size, shape, and density

    • Noise points

  • Two clusters

    • Close to each other

    • Different regions, different densities

  • Six clusters

    • Different size, shape, and orientation

    • Random noise points

    • Special artifacts

  • Eight clusters

    • Different size, shape, density, and orientation

    • Random noise points

  • Eight clusters

    • Different size, shape, and orientation

    • Random noise and special artifacts


Concluding Remarks

  • CHAMELEON can discover natural clusters of different shapes and sizes

  • It is possible to use other algorithms in place of the k-nearest neighbor graph

  • Different domains may require different models for capturing closeness and inter-connectivity


Personal Opinion

  • Without further work

