Graph based hierarchical conceptual clustering


GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING

by Istvan Jonyer, Lawrence B. Holder and Diane J. Cook

The University of Texas at Arlington


Outline

  • What is hierarchical conceptual clustering?

  • Overview of Subdue

  • Conceptual clustering in Subdue

  • Evaluation of hierarchical clusterings

  • Experiments and results

  • Conclusions


What is clustering?


What is hierarchical conceptual clustering?

  • Unsupervised concept learning

  • Generating hierarchies to explain data

  • Applications

    • Hypothesis generation and testing

    • Prediction based on groups

    • Finding taxonomies


Example hierarchical conceptual clustering

  • Animals

    • HeartChamber: four, BodyTemp: regulated, Fertilization: internal

      • Name: mammal, BodyCover: hair

      • Name: bird, BodyCover: feathers

    • BodyTemp: unregulated

      • Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal

      • Fertilization: external

        • Name: amphibian, BodyCover: moist-skin, HeartChamber: three

        • Name: fish, BodyCover: scales, HeartChamber: two


The Problem

  • Hierarchical conceptual clustering in discrete-valued structural databases

  • Existing systems:

    • Continuous-valued

    • Discrete but unstructured

    • We can do better! (The field is underexplored.)


Related Work

  • Cobweb

  • Labyrinth

  • AutoClass

  • Snob

  • In Euclidean space: Chameleon, Cure

  • Unsupervised learning algorithms


The Solution

  • Take Subdue and extend it!


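Overview of Subdue

  • Data mining in graph representations of structural databases

[Figure: example input graph with vertices labeled A, B, C, D, E, F and edges labeled a to g; the vertices A, B, C, D and the edges a, b, c each appear twice]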
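As a rough illustration of what a graph representation of a structural database can look like, the sketch below stores a labeled graph as a vertex table plus an edge list and encodes one attribute-value record as a star-shaped graph. The class `LabeledGraph`, the helper `record_to_graph` and the star encoding are assumptions made for this example; they are not Subdue's actual input format.

```python
from dataclasses import dataclass, field


@dataclass
class LabeledGraph:
    """A directed graph whose vertices and edges both carry labels."""
    vertices: dict = field(default_factory=dict)  # vertex id -> label
    edges: list = field(default_factory=list)     # (source id, target id, label)

    def add_vertex(self, vid, label):
        self.vertices[vid] = label

    def add_edge(self, src, dst, label):
        if src not in self.vertices or dst not in self.vertices:
            raise KeyError("both endpoints must be added before the edge")
        self.edges.append((src, dst, label))

    def size(self):
        # Number of vertices plus number of edges.
        return len(self.vertices) + len(self.edges)


def record_to_graph(record, root_label="animal"):
    """Encode one attribute-value record as a star around a root vertex.

    Hypothetical encoding: each attribute becomes an edge label and each
    value becomes the label of the vertex that edge points to.
    """
    graph = LabeledGraph()
    graph.add_vertex(0, root_label)
    for i, (attribute, value) in enumerate(record.items(), start=1):
        graph.add_vertex(i, value)
        graph.add_edge(0, i, attribute)
    return graph


mammal = record_to_graph({
    "Name": "mammal",
    "BodyCover": "hair",
    "HeartChamber": "four",
    "BodyTemp": "regulated",
    "Fertilization": "internal",
})
print(mammal.size())
```

Keeping labels on both vertices and edges is what lets relational data and plain attribute-value data share one representation.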


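Overview of Subdue

  • Iteratively searching for the best substructure using the MDL (minimum description length) heuristic

[Figure: the best substructure, made of vertices A, B, C, D and edges a, b, c]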
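The idea behind the heuristic can be sketched with a simplified stand-in for description length: count vertices plus edges instead of bits. A candidate is scored by the size of the substructure plus the size of the graph after every instance has been collapsed into a single vertex, and the lowest total wins. The counting scheme and the function names are assumptions for illustration, not Subdue's actual bit-level MDL encoding. The numbers assume the example graph has 10 vertices and 10 edges, as the labels in the slide figure suggest; the smaller two-vertex candidate is hypothetical.

```python
def size(num_vertices, num_edges):
    """Crude proxy for description length: vertices plus edges."""
    return num_vertices + num_edges


def total_size(graph, sub, instances):
    """Size of the substructure plus size of the graph compressed by it.

    graph and sub are (vertex count, edge count) pairs; instances is the
    number of non-overlapping occurrences of sub in graph.
    """
    graph_v, graph_e = graph
    sub_v, sub_e = sub
    # Each instance loses its own vertices and gains one placeholder vertex.
    rest_v = graph_v - instances * sub_v + instances
    # Edges inside an instance disappear; edges to the outside remain.
    rest_e = graph_e - instances * sub_e
    return size(sub_v, sub_e) + size(rest_v, rest_e)


graph = (10, 10)
candidates = {
    "A,B,C,D substructure": ((4, 3), 2),
    "two-vertex candidate": ((2, 1), 2),  # hypothetical smaller candidate
}
scores = {name: total_size(graph, sub, n) for name, (sub, n) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

Under this proxy the four-vertex substructure gives a total of 15 against 19 for the smaller candidate, so it compresses the graph better.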


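Overview of Subdue

  • Compress using best substructure

[Figure: the compressed graph, in which both instances of the substructure are replaced by a vertex S, leaving vertices E, F, S, S and edges d, e, f, g]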
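A minimal sketch of the compression step, assuming the instances of the best substructure are already known and do not overlap: each instance is replaced by one new vertex, edges inside an instance are dropped, and edges that cross an instance boundary are rewired to the new vertex. The function `compress` and the edge layout of the example graph are hypothetical; only the vertex and edge labels come from the slides.

```python
def compress(vertices, edges, instances, label="S"):
    """Replace every instance (a set of vertex ids) with a single vertex.

    vertices: dict of vertex id -> label
    edges: list of (source id, target id, label)
    instances: list of disjoint sets of vertex ids
    """
    owner = {}          # original vertex id -> id of the replacing vertex
    new_vertices = {}
    for n, instance in enumerate(instances):
        new_id = f"{label}{n}"
        new_vertices[new_id] = label
        for vid in instance:
            owner[vid] = new_id
    for vid, vlabel in vertices.items():
        if vid not in owner:
            new_vertices[vid] = vlabel
    new_edges = []
    for src, dst, elabel in edges:
        s, d = owner.get(src, src), owner.get(dst, dst)
        if src in owner and s == d:
            continue  # edge lies entirely inside one instance
        new_edges.append((s, d, elabel))
    return new_vertices, new_edges


# Hypothetical topology: two A-B-C-D chains plus the vertices E and F.
vertices = {v: v[0] for v in ["A1", "B1", "C1", "D1", "A2", "B2", "C2", "D2"]}
vertices.update({"E": "E", "F": "F"})
edges = [
    ("A1", "B1", "a"), ("B1", "C1", "b"), ("C1", "D1", "c"),
    ("A2", "B2", "a"), ("B2", "C2", "b"), ("C2", "D2", "c"),
    ("E", "A1", "e"), ("D1", "A2", "g"), ("E", "D2", "d"), ("C2", "F", "f"),
]
instances = [{"A1", "B1", "C1", "D1"}, {"A2", "B2", "C2", "D2"}]
new_vertices, new_edges = compress(vertices, edges, instances)
print(sorted(new_vertices.values()), len(new_edges))
```

The compressed graph becomes the input of the next iteration, so later substructures can contain the placeholder vertex S.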


Overview of Subdue

  • Fuzzy match

    • Inexact matching of subgraphs

    • Applications:

      • Defining fuzzy concepts

      • Evaluation of clusterings
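Inexact matching can be pictured as an edit cost between two small graphs. The sketch below tries every assignment of the smaller graph's vertices to the larger graph's vertices and charges one unit per unmatched vertex, per changed vertex label, and per edge that must be deleted or inserted. This brute-force search, the unit costs and the names `match_cost` and `star` are assumptions chosen for clarity; they are not Subdue's matching algorithm and only scale to tiny graphs.

```python
from itertools import permutations


def match_cost(g1, g2):
    """Smallest edit cost between two graphs over all vertex assignments.

    A graph is (labels, edges): labels is a list indexed by vertex number,
    edges is a set of (source, target, label) triples.
    """
    labels1, edges1 = g1
    labels2, edges2 = g2
    if len(labels1) > len(labels2):
        labels1, edges1, labels2, edges2 = labels2, edges2, labels1, edges1
    n1, n2 = len(labels1), len(labels2)
    best = None
    for targets in permutations(range(n2), n1):
        cost = n2 - n1  # vertices of the larger graph left unmatched
        cost += sum(labels1[i] != labels2[t] for i, t in enumerate(targets))
        mapped = {(targets[i], targets[j], lab) for i, j, lab in edges1}
        cost += len(mapped ^ edges2)  # edges to delete or to insert
        if best is None or cost < best:
            best = cost
    return best


ATTRIBUTES = ["Name", "BodyCover", "HeartChamber", "BodyTemp", "Fertilization"]


def star(values):
    """Hypothetical star encoding of one animal record."""
    labels = ["animal"] + list(values)
    edges = {(0, i, attr) for i, attr in enumerate(ATTRIBUTES, start=1)}
    return labels, edges


mammal = star(["mammal", "hair", "four", "regulated", "internal"])
bird = star(["bird", "feathers", "four", "regulated", "internal"])
fish = star(["fish", "scales", "two", "unregulated", "external"])
print(match_cost(mammal, bird), match_cost(mammal, fish))
```

Here mammal and bird differ in two labels while mammal and fish differ in five, which is the kind of graded similarity a fuzzy concept or a clustering evaluation can build on.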


Conceptual Clustering with Subdue

  • Use Subdue to identify clusters

    • The best subgraph in an iteration defines a cluster

  • When to stop within an iteration?

    • Use the -limit option

    • Use the -size option

    • Use first minimum heuristic (new)


The First Minimum Heuristic

  • Use subgraph at first local minimum

    • Detect it using the -prune2 option


The First Minimum Heuristic

  • Not a greedy heuristic!

    • Although first local minimum is usually the global minimum

    • First local minimum is caused by a smaller, more frequently occurring subgraph

    • Subsequent minima are caused by bigger, less frequently occurring subgraphs

      => First subgraph is more general
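The stopping rule can be sketched as a scan over the values the search produces as the candidate subgraph grows: stop at the first value that is followed by a larger one. The function name and the sample numbers are made up for illustration, and the sketch says nothing about how the -prune2 option works internally.

```python
def first_local_minimum(values):
    """Return the index of the first value that is followed by a larger one.

    If the sequence never rises, the last index is returned.
    """
    if not values:
        raise ValueError("need at least one value")
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


# Hypothetical description lengths as the candidate subgraph is extended.
lengths = [20, 17, 15, 16, 14, 18]
stop = first_local_minimum(lengths)
print(stop, lengths[stop])
```

In this made-up sequence the scan stops at index 2 (value 15) even though a lower value (14) appears later. That later minimum would belong to a bigger, less frequent subgraph, so the earlier one is kept as the more general cluster definition.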


The First Minimum Heuristic

A multi-minimum search space:


Lattice vs. Tree

  • Previous work defined classification trees

    • Inadequate in structured domains

  • Better hierarchical description: classification lattice

    • A cluster can have more than one parent

    • A parent can be at any level (not only one level above)


Hierarchical Clustering in Subdue

  • Subdue can compress by a subgraph after each iteration

  • Subsequent clusters may be defined in terms of previously defined clusters

  • This results in a hierarchy
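One way to picture how the hierarchy arises: if each cluster description is recorded as a set of labels, and labels that name an earlier cluster mark reuse of that cluster, then the parent links follow directly. The sketch below is an assumed bookkeeping scheme, not Subdue's data structure, and the cluster names S1 to S4 are hypothetical.

```python
def build_lattice(definitions):
    """Derive parent links from cluster definitions given in discovery order.

    definitions: dict of cluster name -> set of labels in its description.
    A label equal to the name of an earlier cluster means that cluster is
    reused, which makes it a parent. Clusters that reuse nothing hang off
    the root.
    """
    parents = {}
    for name, labels in definitions.items():
        reused = sorted(label for label in labels if label in parents)
        parents[name] = reused or ["Root"]
    return parents


definitions = {
    "S1": {"A", "B"},
    "S2": {"C", "D"},
    "S3": {"S1", "S2", "E"},  # built from two earlier clusters
    "S4": {"S1", "F"},
}
lattice = build_lattice(definitions)
print(lattice)
```

S3 ends up with two parents, which a classification tree cannot express but a lattice can.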


Hierarchical Conceptual Clustering of an Artificial Domain

[Figure: classification lattice for the artificial domain, with Root as its top node]


Evaluation of Clusterings

  • Traditional evaluation:

    • Not applicable to hierarchical domains

  • No known evaluation for hierarchical clusterings

    • Most hierarchical evaluations are anecdotal


New Evaluation Heuristic for Hierarchical Clusterings

Properties of a good clustering:

  • Small number of clusters

    • Large coverage => good generality

  • Big cluster descriptions

    • More features => more inferential power

  • Minimal or no overlap between clusters

    • More distinct clusters => better defined concepts


New Evaluation Heuristic for Hierarchical Clusterings

  • Big clusters: bigger distance between disjoint clusters

  • Overlap: less overlap => bigger distance

  • Few clusters: averaging comparisons
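The formula itself is not part of this transcript. As a toy stand-in that shows the same three tendencies, the sketch below treats each cluster description as a set of features, measures the distance between two clusters as the number of features they do not share, and averages that distance over all pairs. The function `toy_quality` is an assumption for illustration only and is not the authors' evaluation measure.

```python
from itertools import combinations


def toy_quality(clusters):
    """Average pairwise distance between cluster descriptions.

    clusters: list of sets of features. The distance between two clusters
    is the size of their symmetric difference, so larger descriptions and
    less overlap both increase it.
    """
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(len(a ^ b) for a, b in pairs) / len(pairs)


disjoint_big = [{"a", "b", "c"}, {"d", "e", "f"}]
overlapping = [{"a", "b", "c"}, {"a", "b", "d"}]
disjoint_small = [{"a"}, {"b"}]
print(toy_quality(disjoint_big), toy_quality(overlapping), toy_quality(disjoint_small))
```

Big disjoint descriptions score 6.0, the overlapping pair 2.0 and the small disjoint pair 2.0, which mirrors the first two annotations. Averaging keeps the score from growing just because a clustering contains more pairs to compare.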


Experiments and Results

  • Validation in an artificial domain

  • Validation in unstructured domains

  • Comparison to existing systems

  • Real-world applications


The Animal Domain

Name       Body Cover      Heart Chamber   Body Temp.   Fertilization
mammal     hair            four            regulated    internal
bird       feathers        four            regulated    internal
reptile    cornified-skin  imperfect-four  unregulated  internal
amphibian  moist-skin      three           unregulated  external
fish       scales          two             unregulated  external

[Figure: graph representation of the mammal record: an "animal" vertex with edges Name, BodyCover, HeartChamber, BodyTemp and Fertilization leading to the vertices mammal, hair, four, regulated and internal]


Hierarchical Clustering of the Animal Domain

[Figure: the classification lattice for the animal domain. Animals splits into a cluster described by HeartChamber: four, BodyTemp: regulated, Fertilization: internal, which contains mammal (BodyCover: hair) and bird (BodyCover: feathers), and a cluster described by BodyTemp: unregulated, which contains reptile (BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal) and a Fertilization: external subcluster with amphibian (BodyCover: moist-skin, HeartChamber: three) and fish (BodyCover: scales, HeartChamber: two).]


Hierarchical Clustering of the Animal Domain by Cobweb

  • animals

    • amphibian/fish

      • fish

      • amphibian

    • mammal/bird

      • mammal

      • bird

    • reptile


Comparison of Subdue and Cobweb

  • Quality of Subdue’s lattice (tree): 2.60

  • Quality of Cobweb’s tree: 1.74

  • Therefore Subdue's clustering is better by this measure

  • Reasons for a higher score:

    • Better generalization, resulting in fewer clusters

    • Eliminating overlap between (reptile) and (amphibian/fish)


Chemical Application: Clustering of a DNA sequence


[Figure: classification lattice for the DNA sequence, rooted at a DNA node. The cluster descriptions are molecular fragments, including a phosphate group (O == P - OH) and carbon, nitrogen and oxygen chains such as C - N and C - C.]

Coverage

  • 61%

  • 68%

  • 71%


Conclusions

  • Goal of hierarchical conceptual clustering of structured databases was achieved

  • Synthesized classification lattice

  • Developed new evaluation heuristic for hierarchical clusterings

  • Good performance in comparison to other systems, even in unstructured domains


Future Work

  • More experiments on real-world domains

  • Comparison to other systems

  • Incorporation of evaluation tool into Subdue

