GRAPH-BASED HIERARCHICAL CONCEPTUAL CLUSTERING

by Istvan Jonyer, Lawrence B. Holder and Diane J. Cook

The University of Texas at Arlington

Outline
  • What is hierarchical conceptual clustering?
  • Overview of Subdue
  • Conceptual clustering in Subdue
  • Evaluation of hierarchical clusterings
  • Experiments and results
  • Conclusions
What is hierarchical conceptual clustering?
  • Unsupervised concept learning
  • Generating hierarchies to explain data
  • Applications
    • Hypothesis generation and testing
    • Prediction based on groups
    • Finding taxonomies
Example hierarchical conceptual clustering

  • Animals
    • HeartChamber: four, BodyTemp: regulated, Fertilization: internal
      • Name: mammal, BodyCover: hair
      • Name: bird, BodyCover: feathers
    • BodyTemp: unregulated
      • Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
      • Fertilization: external
        • Name: amphibian, BodyCover: moist-skin, HeartChamber: three
        • Name: fish, BodyCover: scales, HeartChamber: two
The Problem
  • Hierarchical conceptual clustering in discrete-valued structural databases
  • Existing systems:
    • Continuous-valued
    • Discrete but unstructured
  • We can do better! (The field is underexplored.)
Related Work
  • Cobweb
  • Labyrinth
  • AutoClass
  • Snob
  • In Euclidean space: Chameleon, Cure
  • Unsupervised learning algorithms
The Solution
  • Take Subdue and extend it!
Overview of Subdue
  • Data mining in graph representations of structural databases

[Figure: example input graph. Labels shown: A, B, C and D (each twice), E and F; a, b and c (each twice), d, e, f and g.]
Overview of Subdue
  • Iteratively searching for best substructure by MDL heuristic

[Figure: a substructure with labels A, B, C, D and a, b, c.]
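
The idea of scoring a candidate substructure by how much it shortens the description of the graph can be sketched as follows. This is a rough illustration only: a real MDL measure counts bits, whereas this sketch uses "vertices plus edges" as a crude stand-in. It also assumes instances do not overlap and that each instance collapses to a single vertex. The function names are hypothetical, and the counts in the example are read off the labels listed for the figures, so treat them as assumptions.

```python
def size(num_vertices: int, num_edges: int) -> int:
    """Crude description length: one unit per vertex and per edge."""
    return num_vertices + num_edges


def compressed_cost(g_v: int, g_e: int, s_v: int, s_e: int, instances: int) -> int:
    """Cost of describing substructure S plus graph G compressed by S.

    Assumes non-overlapping instances; each instance becomes one vertex
    and its internal edges disappear.
    """
    rest_v = g_v - instances * s_v + instances
    rest_e = g_e - instances * s_e
    return size(s_v, s_e) + size(rest_v, rest_e)


# Assumed counts: 10 vertices and 10 edges in the input graph,
# a 4-vertex, 3-edge substructure occurring twice.
original = size(10, 10)
cost = compressed_cost(10, 10, 4, 3, 2)
print(original, cost)  # 20 15
```

Under this stand-in measure a lower cost is better, so a candidate is worth keeping only when its cost falls below the size of the uncompressed graph.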
Overview of Subdue
  • Compress using best substructure

[Figure: the compressed graph. Labels shown: S (twice), E and F; d, e, f and g.]
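
A minimal sketch of the compression step, assuming the instances of the chosen substructure are already known and do not overlap. The graph encoding (a dict of vertex labels plus a list of labeled edges), the function name and the example topology are all hypothetical; only the labels come from the slides.

```python
def compress(vertices, edges, instances, new_label="S"):
    """Replace each instance (a set of vertex ids) by one vertex labeled new_label."""
    owner = {}          # original vertex id -> id of the vertex that replaces it
    new_vertices = {}
    for i, inst in enumerate(instances):
        new_id = f"{new_label}{i}"
        new_vertices[new_id] = new_label
        for v in inst:
            owner[v] = new_id
    for v, label in vertices.items():
        if v not in owner:
            new_vertices[v] = label
    new_edges = []
    for u, v, label in edges:
        cu, cv = owner.get(u, u), owner.get(v, v)
        if u in owner and cu == cv:
            continue    # edge lies inside one instance, so it is absorbed
        new_edges.append((cu, cv, label))
    return new_vertices, new_edges


# Hypothetical topology using the labels from the slides.
vertices = {1: "A", 2: "B", 3: "C", 4: "D",
            5: "A", 6: "B", 7: "C", 8: "D",
            9: "E", 10: "F"}
edges = [(1, 2, "a"), (2, 3, "b"), (3, 4, "c"),
         (5, 6, "a"), (6, 7, "b"), (7, 8, "c"),
         (9, 1, "e"), (9, 5, "g"), (4, 8, "d"), (7, 10, "f")]
instances = [{1, 2, 3, 4}, {5, 6, 7, 8}]

new_vertices, new_edges = compress(vertices, edges, instances)
print(sorted(new_vertices.values()))        # ['E', 'F', 'S', 'S']
print(sorted(lbl for _, _, lbl in new_edges))  # ['d', 'e', 'f', 'g']
```

Edges that cross an instance boundary are rerouted to the replacement vertex, which is what lets later substructures refer to S.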
Overview of Subdue
  • Fuzzy match
    • Inexact matching of subgraphs
    • Applications:
      • Defining fuzzy concepts
      • Evaluation of clusterings
Conceptual Clustering with Subdue
  • Use Subdue to identify clusters
    • The best subgraph in an iteration defines a cluster
  • When to stop within an iteration?
    • Use -limit option
    • Use -size option
    • Use first minimum heuristic (new)
The First Minimum Heuristic
  • Use subgraph at first local minimum
    • Detect it using -prune2 option
The First Minimum Heuristic
  • Not a greedy heuristic!
    • Although first local minimum is usually the global minimum
    • First local minimum is caused by a smaller, more frequently occurring subgraph
    • Subsequent minima are caused by bigger, less frequently occurring subgraphs

=> First subgraph is more general
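
The general idea of stopping at the first local minimum can be sketched as below. This shows the concept only: the slides do not say how Subdue detects the minimum internally, and the function name and the sample values are made up for illustration.

```python
def first_local_minimum(values):
    """Return the index of the first value that is followed by a larger one.

    If the sequence never rises, the last index is returned.
    """
    if not values:
        raise ValueError("values must not be empty")
    for i in range(len(values) - 1):
        if values[i + 1] > values[i]:
            return i
    return len(values) - 1


# Hypothetical description lengths seen as the search considers larger subgraphs.
lengths = [20, 17, 15, 16, 14, 18]
i = first_local_minimum(lengths)
print(i, lengths[i])  # 2 15
```

In this made-up sequence the global minimum (14) comes later, so the heuristic deliberately settles for the earlier, more general candidate.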

The First Minimum Heuristic

A multi-minimum search space:

Lattice vs. Tree
  • Previous work defined classification trees
    • Inadequate in structured domains
  • Better hierarchical description: classification lattice
    • A cluster can have more than one parent
    • A parent can be at any level (not only one level above)
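
The difference from a tree can be made concrete with a small data structure in which a cluster keeps a list of parents rather than a single one. This is an illustrative sketch; the class, its fields and the cluster names are hypothetical and not taken from Subdue.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    name: str
    parents: list = field(default_factory=list)  # zero, one or several parents

    def ancestors(self):
        """Names of all clusters reachable by following parent links."""
        seen = set()
        stack = list(self.parents)
        while stack:
            node = stack.pop()
            if node.name not in seen:
                seen.add(node.name)
                stack.extend(node.parents)
        return seen


root = Cluster("root")
a = Cluster("a", [root])
b = Cluster("b", [root])
c = Cluster("c", [a])
# "d" has two parents that sit at different levels of the lattice.
d = Cluster("d", [c, b])

print(sorted(d.ancestors()))  # ['a', 'b', 'c', 'root']
```

A tree would force "d" under exactly one of "c" or "b"; the lattice records both relationships.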
Hierarchical Clustering in Subdue
  • Subdue can compress by a subgraph after each iteration
  • Subsequent clusters may be defined in terms of previously defined clusters
  • This results in a hierarchy
Evaluation of Clusterings
  • Traditional evaluation:
    • Not applicable to hierarchical domains
  • No known evaluation for hierarchical clusterings
    • Most hierarchical evaluations are anecdotal
New Evaluation Heuristic for Hierarchical Clusterings

Properties of a good clustering:

  • Small number of clusters
    • Large coverage => good generality
  • Big cluster descriptions
    • More features => more inferential power
  • Minimal or no overlap between clusters
    • More distinct clusters => better defined concepts
New Evaluation Heuristic for Hierarchical Clusterings

Big clusters: bigger distance between disjoint clusters

Overlap: less overlap => bigger distance

Few clusters: averaging comparisons
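
The three properties can be illustrated with a deliberately simplified score: the average pairwise distance between cluster descriptions. This is not the authors' formula, which is not reproduced in the transcript. It assumes each cluster is a flat set of attribute-value pairs, uses the number of unshared pairs as the distance (a stand-in for an inexact graph match cost), and ignores the hierarchy. All names are hypothetical.

```python
from itertools import combinations


def distance(a, b):
    """Number of attribute-value pairs found in exactly one of the two clusters."""
    return len(a ^ b)


def average_pairwise_distance(clusters):
    """Mean distance over all unordered pairs of clusters (0.0 if fewer than two)."""
    pairs = list(combinations(clusters, 2))
    if not pairs:
        return 0.0
    return sum(distance(a, b) for a, b in pairs) / len(pairs)


a = {("HeartChamber", "four"), ("BodyTemp", "regulated"), ("Fertilization", "internal")}
b = {("BodyTemp", "unregulated"), ("Fertilization", "external")}
c = {("BodyTemp", "unregulated"), ("Fertilization", "internal")}

print(average_pairwise_distance([a, b]))  # 5.0 (disjoint descriptions)
print(average_pairwise_distance([a, c]))  # 3.0 (one shared pair lowers the score)
```

Bigger disjoint descriptions raise the distance, shared features lower it, and averaging over pairs keeps the score from growing simply because more clusters were produced.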

Experiments and Results
  • Validation in an artificial domain
  • Validation in unstructured domains
  • Comparison to existing systems
  • Real world applications
The Animal Domain

Name       Body Cover      Heart Chamber   Body Temp.   Fertilization
mammal     hair            four            regulated    internal
bird       feathers        four            regulated    internal
reptile    cornified-skin  imperfect-four  unregulated  internal
amphibian  moist-skin      three           unregulated  external
fish       scales          two             unregulated  external

[Figure: graph representation of the mammal instance. Labels shown: a node "animal"; Name, BodyCover, HeartChamber, BodyTemp and Fertilization; and the values mammal, hair, four, regulated and internal.]
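
One plausible way to turn a row of the table into a graph is a star: a central "animal" vertex with one labeled edge per attribute, pointing at a vertex that holds the value. The sketch below assumes that layout. The encoding (vertex dict plus edge list) and the function name are hypothetical; the data values are taken from the table.

```python
ANIMALS = [
    {"Name": "mammal", "BodyCover": "hair", "HeartChamber": "four",
     "BodyTemp": "regulated", "Fertilization": "internal"},
    {"Name": "bird", "BodyCover": "feathers", "HeartChamber": "four",
     "BodyTemp": "regulated", "Fertilization": "internal"},
    {"Name": "reptile", "BodyCover": "cornified-skin", "HeartChamber": "imperfect-four",
     "BodyTemp": "unregulated", "Fertilization": "internal"},
    {"Name": "amphibian", "BodyCover": "moist-skin", "HeartChamber": "three",
     "BodyTemp": "unregulated", "Fertilization": "external"},
    {"Name": "fish", "BodyCover": "scales", "HeartChamber": "two",
     "BodyTemp": "unregulated", "Fertilization": "external"},
]


def to_star_graph(record, center_label="animal"):
    """Build (vertices, edges) with vertex 0 as the center of the star."""
    vertices = {0: center_label}
    edges = []
    for i, (attr, value) in enumerate(record.items(), start=1):
        vertices[i] = value            # value vertex
        edges.append((0, i, attr))     # edge labeled with the attribute name
    return vertices, edges


vertices, edges = to_star_graph(ANIMALS[0])
print(vertices[0], len(vertices), len(edges))  # animal 6 5
```

With this encoding, two animals that share an attribute value share an identical edge-and-vertex pattern, which is the kind of repeated substructure a graph-based learner can pick up.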
Hierarchical Clustering of the Animal Domain

  • Animals
    • HeartChamber: four, BodyTemp: regulated, Fertilization: internal
      • Name: mammal, BodyCover: hair
      • Name: bird, BodyCover: feathers
    • BodyTemp: unregulated
      • Name: reptile, BodyCover: cornified-skin, HeartChamber: imperfect-four, Fertilization: internal
      • Fertilization: external
        • Name: amphibian, BodyCover: moist-skin, HeartChamber: three
        • Name: fish, BodyCover: scales, HeartChamber: two
Hierarchical Clustering of the Animal Domain by Cobweb

  • animals
    • amphibian/fish
      • fish
      • amphibian
    • mammal/bird
      • mammal
      • bird
    • reptile
Comparison of Subdue and Cobweb
  • Quality of Subdue’s lattice (tree): 2.60
  • Quality of Cobweb’s tree: 1.74
  • Subdue's clustering therefore scores higher under the new evaluation heuristic
  • Reasons for a higher score:
    • Better generalization resulting in fewer clusters
    • Eliminating overlap between (reptile) and (amphibian/fish)
Chemical Application: Clustering of a DNA sequence

[Figure: clustering rooted at a node labeled DNA. The cluster definitions are chemical substructures built from C, N, O, P, OH and CH2. The original layout is not recoverable.]

Coverage

  • 61%
  • 68%
  • 71%
Conclusions
  • Goal of hierarchical conceptual clustering of structured databases was achieved
  • Synthesized classification lattice
  • Developed new evaluation heuristic for hierarchical clusterings
  • Good performance in comparison to other systems, even in unstructured domains
Future Work
  • More experiments on real-world domains
  • Comparison to other systems
  • Incorporation of evaluation tool into Subdue