CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling

PowerPoint Slideshow about 'CHAMELEON: A Hierarchical Clustering Algorithm Using Dynamic Modeling' - Audrey

Presentation Transcript

Outline

  • Motivation

  • Objective

  • Research restriction

  • Literature review

    • An overview of related clustering algorithms

    • The limitations of clustering algorithms

  • CHAMELEON

  • Concluding remarks

  • Personal opinion


Motivation

  • Existing clustering algorithms can break down when

    • Choice of parameters is incorrect

    • Model is not adequate to capture the characteristics of clusters

    • Diverse shapes, densities, and sizes


Objective

  • Presenting a novel hierarchical clustering algorithm – CHAMELEON

    • Facilitating discovery of natural and homogeneous clusters

    • Being applicable to all types of data


Research Restriction

  • In this paper, the authors ignore the issue of scaling to large data sets that cannot fit in main memory


Literature Review

  • Clustering

  • An overview of related clustering algorithms

  • The limitations of recently proposed state-of-the-art clustering algorithms


Clustering

  • Grouping data so that intracluster similarity is maximized and intercluster similarity is minimized [Jain and Dubes, 1988]

  • Serving as the foundation for data mining and analysis techniques


Clustering (cont'd)

  • Applications

    • Purchasing patterns

    • Categorization of documents on WWW [Boley, et al., 1999]

    • Grouping of genes and proteins that have similar functionality [Harris, et al., 1992]

    • Grouping of spatial locations prone to earthquakes [Byers and Raftery, 1998]


An Overview of Related Clustering Algorithms

  • Partitional techniques

  • Hierarchical techniques


Partitional Techniques

  • K-means [Jain and Dubes, 1988]
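
As a reminder of what a partitional technique looks like, here is a minimal K-means sketch (plain Lloyd iteration). The function name `kmeans`, the caller-supplied starting centers, and the toy points are assumptions made for illustration; they do not come from the slides.

```python
import math

def kmeans(points, centers, iterations=10):
    """Plain Lloyd iteration; `centers` is the caller-supplied starting guess."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
            groups[nearest].append(p)
        # Update step: move each center to the mean of its group
        # (an empty group keeps its previous center).
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
    return centers, groups

# Hypothetical toy data: two compact, well-separated groups.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers, groups = kmeans(points, centers=[(1.0, 1.0), (9.0, 1.0)])
```

The number of clusters is fixed up front by the number of starting centers, and every point is assigned by distance to a single center, which is why this family of methods favors compact clusters of similar size.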


Hierarchical Techniques

  • CURE [Guha, Rastogi and Shim, 1998]

  • ROCK [Guha, Rastogi and Shim, 1999]


Limitations of Existing Hierarchical Schemes

  • CURE

    • Fails to take into account the special characteristics of individual clusters


Limitations of Existing Hierarchical Schemes (cont'd)

  • ROCK

    • Irrespective of densities and shapes


CHAMELEON

  • Overview

  • Modeling the data

  • Modeling the cluster similarity

  • A two-phase clustering algorithm

  • Performance analysis

  • Experimental Results



Modeling the Data

  • k-nearest neighbor graphs built from the original data in 2D
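
A k-nearest neighbor graph of this kind can be sketched as follows. This is a brute-force illustration, not the paper's implementation: the function name `knn_graph`, the similarity weight `1 / (1 + distance)`, and the sample points are all assumptions.

```python
import math

def knn_graph(points, k):
    """Return {i: {j: weight}} for an undirected k-nearest-neighbor graph."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        # Brute force: rank every other point by Euclidean distance to p.
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        for j in others[:k]:
            # Assumed weighting: closer points get a larger similarity weight.
            w = 1.0 / (1.0 + math.dist(p, points[j]))
            graph[i][j] = w
            graph[j][i] = w  # store the edge in both directions
    return graph

# Hypothetical 2D points: a tight group of three and a distant pair.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0)]
graph = knn_graph(points, k=2)
```

Each point links only to its k closest neighbors, so dense regions produce short, heavily weighted edges and sparse regions produce long, lightly weighted ones.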


Modeling the Cluster Similarity

  • Relative inter-connectivity
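
For reference, relative inter-connectivity between clusters C_i and C_j is usually written as below. The notation is a reconstruction and should be checked against the paper: EC_{C_i, C_j} denotes the edges of the k-nearest neighbor graph that connect the two clusters, and EC_{C_i} denotes the edges cut by a min-cut bisection of C_i, with |·| meaning the sum of edge weights.

```latex
RI(C_i, C_j) = \frac{\left| EC_{\{C_i, C_j\}} \right|}
                    {\dfrac{\left| EC_{C_i} \right| + \left| EC_{C_j} \right|}{2}}
```

The numerator measures how strongly the two clusters are connected to each other; the denominator normalizes by how strongly each cluster is connected internally.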


Modeling the Cluster Similarity (cont'd)

  • Relative closeness
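
For reference, relative closeness between clusters C_i and C_j is usually written as below. The notation is a reconstruction and should be checked against the paper: \bar{S}_{EC} denotes the average weight of the edges in the corresponding edge set, and |C_i| is the number of points in C_i.

```latex
RC(C_i, C_j) = \frac{\bar{S}_{EC_{\{C_i, C_j\}}}}
                    {\dfrac{|C_i|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_i}}
                     + \dfrac{|C_j|}{|C_i| + |C_j|}\,\bar{S}_{EC_{C_j}}}
```

The denominator is a size-weighted average of the two clusters' internal closeness, so the larger cluster contributes more to the normalization.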


A Two-phase Clustering Algorithm

  • Phase I: Finding initial sub-clusters


A Two-phase Clustering Algorithm (cont'd)

  • Phase I: Finding initial sub-clusters

    • Multilevel paradigm[Karypis & Kumar, 1999]

    • hMETIS [Karypis & Kumar, 1999]


A Two-phase Clustering Algorithm (cont'd)

  • Phase II: Merging sub-clusters using a dynamic framework

T_RI, T_RC: user-specified thresholds on relative inter-connectivity and relative closeness
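
The threshold test can be sketched as follows, assuming relative inter-connectivity and relative closeness have already been computed for every neighboring cluster. The function name `pick_merge_partner`, the tuple layout, and the choice to rank passing candidates by relative inter-connectivity are assumptions made for illustration.

```python
def pick_merge_partner(candidates, t_ri, t_rc):
    """candidates: list of (name, ri, rc) tuples for clusters adjacent to the current one.

    Returns the name of the cluster to merge with, or None when no candidate
    meets both user-specified thresholds.
    """
    # Keep only candidates whose RI and RC both reach their thresholds.
    passing = [c for c in candidates if c[1] >= t_ri and c[2] >= t_rc]
    if not passing:
        return None
    # Assumed ranking rule: prefer the passing candidate with the highest RI.
    return max(passing, key=lambda c: c[1])[0]

# Hypothetical values for three neighboring clusters.
candidates = [("A", 0.9, 0.2), ("B", 0.6, 0.8), ("C", 0.7, 0.7)]
chosen = pick_merge_partner(candidates, t_ri=0.5, t_rc=0.5)
rejected = pick_merge_partner(candidates, t_ri=0.95, t_rc=0.5)
```

Cluster "A" is well connected but not close enough, so it fails the closeness threshold even though its inter-connectivity is the highest; requiring both conditions is what keeps the merge from relying on a single static criterion.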


A Two-phase Clustering Algorithm (cont'd)

  • Phase II: Merging sub-clusters using a dynamic framework


Performance Analysis

  • The amount of time required to compute

    • K-nearest neighbor graph

    • Two-phase clustering


Performance Analysis (cont'd)

  • The amount of time required to compute

    • K-nearest neighbor graph

      • Low-dimensional data sets = O(n log n)

      • High-dimensional data sets = O(n^2)


Performance Analysis (cont'd)

  • The amount of time required to compute

    • Two-phase clustering

      • Computing internal inter-connectivity and closeness for each cluster: O(nm)

      • Selecting the most similar pair of clusters: O(n log n + m^2 log m)

    • Total time = O(nm + n log n + m^2 log m)
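
To get a feel for which term dominates, the three terms can be evaluated for sample sizes. This sketch assumes n is the number of data points and m is the number of initial sub-clusters, ignores constant factors, and uses base-2 logarithms; the function name and the sample values are illustrative only.

```python
import math

def phase_cost_terms(n, m):
    """Return the three terms of O(nm + n log n + m^2 log m), without constants."""
    return {
        "nm": n * m,
        "n log n": n * math.log2(n),
        "m^2 log m": m * m * math.log2(m),
    }

# Hypothetical sizes: 10,000 points split into 100 initial sub-clusters.
terms = phase_cost_terms(n=10_000, m=100)
dominant = max(terms, key=terms.get)
```

With these sizes the nm term is the largest of the three, so the cost of computing internal inter-connectivity and closeness outweighs the cost of selecting pairs to merge.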


Experimental Results

  • Programs compared

    • DBSCAN: a publicly available version

    • CURE: a locally implemented version

  • Data sets

  • Qualitative comparison


Data Sets

  • Five clusters

    • Different size, shape, and density

    • Noise points

  • Two clusters

    • Close to each other

    • Different regions with different densities

  • Six clusters

    • Different size, shape, and orientation

    • Random noise points

    • Special artifacts

  • Eight clusters

    • Different size, shape, density, and orientation

    • Random noise points

  • Eight clusters

    • Different size, shape, and orientation

    • Random noise and special artifacts


Concluding Remarks

  • CHAMELEON can discover natural clusters of different shapes and sizes

  • It is possible to use other graph representations instead of the k-nearest neighbor graph

  • Different domains may require different models for capturing closeness and inter-connectivity


Personal Opinion

  • Without further work
