TOWARDS HIERARCHICAL CLUSTERING
Presentation Transcript


Towards hierarchical clustering

TOWARDS HIERARCHICAL CLUSTERING

Mark Sh. Levin

Inst. for Inform. Transm. Problems, Russian Acad. of Sci.

Email: [email protected]   http://www.iitp.ru/mslevin/

PLAN:

1. Basic agglomerative algorithm for hierarchical clustering
2. Multicriteria decision making (DM) approach to proximity of objects
3. Integration of objects into several groups/clusters (i.e., clustering with intersection): algorithms & applications
4. Towards resultant performance (quality of results)
5. Conclusion

CSR’2007, Ural State University, Ekaterinburg, Russia, Sept. 4, 2007


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Objects (alternatives) A1, …, Ai, …, An evaluated upon criteria (characteristics) C1, …, Cj, …, Cm.

Matrix Z (object-by-criteria):
  A1 = ( z11, …, z1j, …, z1m )
  …
  Ai = ( zi1, …, zij, …, zim )
  …
  An = ( zn1, …, znj, …, znm )

Matrix D (pairwise "distances" over A1, …, Ai, …, An):
  A1 = ( d11, …, d1i, …, d1n )
  …
  Ai = ( di1, …, dii, …, din )
  …
  An = ( dn1, …, dni, …, dnn )


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Matrix Z is as above (objects A1, …, An upon criteria C1, …, Cm); in matrix D the diagonal elements are zero (dii = 0):

Matrix D:
  A1 = ( 0, …, d1i, …, d1n )
  …
  Ai = ( di1, …, 0, …, din )
  …
  An = ( dn1, …, dni, …, 0 )


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Matrix D:   dil = sqrt( ∑j=1..m ( zij – zlj )² )

[Scale for D: values range from 0 (min) to max.]

Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Agglomerative algorithm:

Stage 1. Compute the matrix D = |dil| of pairwise "distances".
Stage 2. Find the smallest pairwise "distance" (the minimal element of matrix D) and integrate the corresponding objects (Ax, Ay) into a new joint (integrated) object A = Ax * Ay.
Stage 3. Stop the process, or re-compute the matrix D and GO TO Stage 2.
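A minimal Python sketch of this loop (illustrative only; it reuses the distance_matrix sketch above, merges exactly one closest pair per stage, and records the cluster labels after every stage):

def agglomerate(Z):
    # Basic agglomerative loop: at every stage the closest pair of current
    # objects is integrated into a joint object, taken here as the
    # component-wise average of the pair (the merge rule on the next slide).
    objects = [list(row) for row in Z]
    labels = [str(i + 1) for i in range(len(Z))]   # "1", "2", ... ; "(3*4)" after a merge
    history = [list(labels)]                       # Stage 0, Stage 1, ...
    while len(objects) > 1:
        D = distance_matrix(objects)               # Stage 1: (re)compute D
        # Stage 2: the minimal off-diagonal element of D
        x, y = min(((i, l) for i in range(len(objects))
                    for l in range(i + 1, len(objects))),
                   key=lambda p: D[p[0]][p[1]])
        joint = [(a + b) / 2 for a, b in zip(objects[x], objects[y])]
        label = "(" + labels[x].strip("()") + "*" + labels[y].strip("()") + ")"
        # Stage 3: replace Ax, Ay by the joint object A = Ax * Ay
        objects = [o for k, o in enumerate(objects) if k not in (x, y)] + [joint]
        labels = [s for k, s in enumerate(labels) if k not in (x, y)] + [label]
        history.append(list(labels))
    return history                                 # (n-1) integrations in total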


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Pair of objects Ax and Ay

Ax = ( zx1 , … , zxj , … , zxm )

Ay = ( zy1 , … , zyj , … , zym )

Integrated object A = ( Ax * Ay ):

∀ j (j = 1, …, m):   zj(A) = ( zxj + zyj ) / 2
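A one-line Python sketch of this merge rule (illustrative; Ax and Ay are plain lists of criterion values):

def merge(Ax, Ay):
    # Joint object A = Ax * Ay: component-wise average over the m criteria,
    # z_j(A) = (z_xj + z_yj) / 2 for every j.
    return [(zx + zy) / 2 for zx, zy in zip(Ax, Ay)]

# merge([1.0, 2.0, 3.0], [3.0, 0.0, 1.0]) -> [2.0, 1.0, 2.0]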


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

Example (7 objects), dendrogram built stage by stage:

Stage 0:  1, 2, 3, 4, 5, 6, 7
Stage 1:  1, 2, (3*4), 5, 6, 7
Stage 2:  1, 2, (3*4), 5, (6*7)
Stage 3:  2, (1*3*4), 5, (6*7)
Stage 4:  2, (1*3*4), (5*6*7)
. . .
Stage 6:  (1*2*3*4*5*6*7)


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

ILLUSTRATIVE EXAMPLE

[Figure: resultant clusters F1, F2, F3, F4, F5, F6.]


Towards hierarchical clustering

Hierarchical clustering: agglomerative algorithm

First, complexity of the agglomerative algorithm:
1. Number of stages (each stage performs one integration): (n-1) stages
2. Each stage: (a) computing "distances" (about n² * m operations)

THUS:
Operations: O(m·n³)
Memory: O(n(n+m))

Second, we have obtained a TREE-LIKE STRUCTURE.


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (to do better)

Question 1: What can we do better in the algorithm?


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (to do better)

Question 1: What can we do better in the algorithm?

Question 2: What is needed in practice (e.g., in applications)? What can we do for applications?


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (to do better)

Question 1: What can we do better?

1. Computing the pairwise "distance" (pair proximity): use more "correct" approaches from multicriteria decision making, e.g., revealing Pareto-layers and using an ordinal scale for pair proximity.

2. Complexity: decrease the number of stages by integrating several pairs of objects at each stage.


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (to do better)

2. Complexity: decrease the number of stages by integrating several pairs of objects at each stage.

Usage of an ordinal scale: divide the interval [0, max{dxy}] into sub-intervals (interval 0, interval 1, …, interval k) to get an ordinal scale of pair proximity.

Example: pairs of objects (a,b), (u,v), (p,q), (g,h) with "distances" dab, duv, dpq, dgh placed on this scale.

RESULT:  dab = 0,  duv = 0,  dpq = 1,  dgh = 1
(so the pairs (a,b) and (u,v) fall into interval 0 and can be integrated at the same stage).
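A small Python sketch of this idea (illustrative; k, the index of the last sub-interval, is a parameter; D is a "distance" matrix as above): each pairwise "distance" is mapped to the index of the sub-interval it falls into, and all pairs with ordinal value 0 may then be integrated in the same stage.

def ordinal_proximity(D, k):
    # Map each pairwise "distance" to an ordinal value 0..k by dividing
    # [0, max{dxy}] into (k + 1) equal sub-intervals; interval 0 = most similar.
    pairs = [(i, l) for i in range(len(D)) for l in range(i + 1, len(D))]
    d_max = max(D[i][l] for i, l in pairs)
    width = d_max / (k + 1)
    return {(i, l): (min(int(D[i][l] / width), k) if width > 0 else 0)
            for i, l in pairs}

# All pairs that land in interval 0 could be integrated at the same stage:
# closest_pairs = [p for p, v in ordinal_proximity(D, 4).items() if v == 0]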


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (to do better)

1. Computing the pairwise "distance" (pair proximity): use more "correct" approaches from multicriteria decision making, e.g., revealing Pareto-layers and using an ordinal scale for pair proximity.

Objects: { A1, …, Ai, …, An }
Ax -> ( zx1, …, zxj, …, zxm )
Ay -> ( zy1, …, zyj, …, zym )

Vector of "differences" for Ax, Ay:
( (zx1 - zy1), …, (zxj - zyj), …, (zxm - zym) )

Space of these vectors => ordinal scale & ordinal proximity.

[Figure: difference vectors in the plane of criteria C1, C2; the Pareto-effective layer (layer 1), layer 2, and the ideal point (equal objects).]
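A Python sketch of revealing Pareto-layers over the difference vectors (assumptions here: component-wise absolute differences are compared, so the zero vector is the ideal point of equal objects; layer 1 is the non-dominated set, layer 2 is non-dominated after layer 1 is removed, and so on; the layer index then serves as an ordinal proximity):

def dominates(u, v):
    # u dominates v if u is no farther from the ideal (zero) vector in every
    # component of the absolute difference vector and strictly closer in one.
    au, av = [abs(x) for x in u], [abs(x) for x in v]
    return all(a <= b for a, b in zip(au, av)) and any(a < b for a, b in zip(au, av))

def pareto_layers(diff_vectors):
    # Layer 1 = Pareto-effective (non-dominated) vectors; layer 2 = the
    # non-dominated vectors after layer 1 is removed; and so on.
    remaining = dict(enumerate(diff_vectors))
    layers = []
    while remaining:
        layer = [i for i in remaining
                 if not any(dominates(remaining[j], remaining[i])
                            for j in remaining if j != i)]
        layers.append(layer)
        for i in layer:
            del remaining[i]
    return layers

# Example with m = 2 criteria (each tuple is a difference vector for one pair):
# pareto_layers([(0.1, 0.2), (0.5, 0.1), (0.4, 0.6), (0.0, 0.9)])
#   -> [[0, 1, 3], [2]]   (layer 1, layer 2)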


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Question 2: What is needed in practice (e.g., in applications)? What can we do for applications?


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Question 2: What is needed in practice (e.g., in applications)? What can we do for applications?

Integration of objects into several groups (clusters) to obtain a richer resultant structure
(tree => hierarchy, i.e., clusters with intersection).

Examples of applied domains:
1. Engineering: structures of complex systems
2. CS: structures of software/hardware
3. Communication networks (topology)
4. Biology
5. Others


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Clustering with intersection:

[Figure: overlapping clusters F1, F2, F3, F4, F5, F6.]


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Example (7 objects), clustering with intersection, stage by stage:

Stage 0:  1, 2, 3, 4, 5, 6, 7
Stage 1:  1, (2*3), (3*4), (5*6), (6*7)
Stage 2:  1, (2*3*4), (3*4*5*6), (6*7)
Stage 3:  2, (1*2*3*4), (3*4*5*6*7)
Stage 4:  (1*2*3*4*5*6*7)


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Resultant structure:

[Figure: resultant hierarchy over objects 1-7, from the separate objects at Stage 0 up to the single cluster (1*2*3*4*5*6*7).]


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Example from biology (evolution): the traditional evolution process as a tree.


Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Example from biology (evolution): a hierarchical structure.

Towards hierarchical clustering

Hierarchical clustering: IMPROVEMENTS (practice)

Algorithm 1 (the number of inclusions for each object is not limited):
(i) the initial set of objects -> vertices;
(ii) "small" proximity -> edges.
Thus we obtain a graph. Problem: to reveal cliques in the graph (an NP-hard problem).

Algorithm 2. The number of inclusions per object is limited by t (e.g., t = 2/3/4). Here the complexity is polynomial.
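A Python sketch of Algorithm 1 under stated assumptions (NetworkX is used only for illustration; the threshold for "small" proximity is a parameter). Maximal cliques of the proximity graph give clusters that may intersect; since clique enumeration is NP-hard in general, this is practical only for modest n.

import networkx as nx   # assumption: NetworkX is acceptable for the sketch

def clusters_with_intersection(D, threshold):
    # Objects -> vertices; an edge joins each pair whose "distance" is small
    # (<= threshold); clusters = maximal cliques of the graph, so one object
    # may belong to several clusters.
    n = len(D)
    G = nx.Graph()
    G.add_nodes_from(range(n))
    G.add_edges_from((i, l) for i in range(n) for l in range(i + 1, n)
                     if D[i][l] <= threshold)
    return list(nx.find_cliques(G))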


Towards hierarchical clustering

Hierarchical clustering: performance (i.e., quality)

Performance (i.e., quality) of clustering procedures:
1. Issues of complexity
2. Quality of results (??)

Some traditional approaches:
(a) computing a clustering quality: InterCluster Distance / IntraCluster Distance
(b) Coverage, Diversity

Our case: a research procedure (for investigation and problem structuring).
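A Python sketch of the traditional quality index in (a), with the convention (an assumption here) that a larger ratio of mean inter-cluster distance to mean intra-cluster distance suggests better separated, more compact clusters:

from itertools import combinations
from statistics import mean

def inter_intra_ratio(D, clusters):
    # InterCluster Distance / IntraCluster Distance over a partition of the
    # objects; `clusters` is a list of lists of object indices into D.
    # (Assumes at least one intra-cluster and one inter-cluster pair exist.)
    member = {i: c for c, cluster in enumerate(clusters) for i in cluster}
    intra, inter = [], []
    for i, l in combinations(range(len(D)), 2):
        (intra if member[i] == member[l] else inter).append(D[i][l])
    return mean(inter) / mean(intra)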


Towards hierarchical clustering

Hierarchical clustering: performance (i.e., quality)

Decision Making Paradigm (stages) by Herbert A. Simon:
1. Analysis of an applied problem (to understand the problem: main contradictions, etc.)
2. Structuring the problem:
   2.1. Generation of alternatives
   2.2. Design of criteria
   2.3. Design of scales for assessment of alternatives upon criteria
3. Evaluation of alternatives upon criteria
4. Selection of the best alternative(s)
5. Analysis of results

Basic DM problems: choice, ranking, …


Towards hierarchical clustering

Hierarchical clustering: performance (i.e., quality)

FOR CLUSTERING:
1. Analysis of an applied problem (to understand the problem: main contradictions, etc.)
2. Structuring the problem:
   2.1. Generation of alternatives
   2.2. Design of criteria
   2.3. Design of scales for assessment of alternatives upon criteria
3. Evaluation of alternatives upon criteria
4. Design of CLUSTERS and of the STRUCTURE OF THE CLUSTERING PROCESS
5. Analysis of results

THUS: we have obtained some prospective RESEARCH PROCEDURES.


Towards hierarchical clustering

CONCLUSION

1. Algorithms, procedures & their analysis
2. New approaches to performance/quality for research procedures
3. Various applied examples
4. Usage in education


Towards hierarchical clustering

That’s All

Thanks!

http://www.iitp.ru/mslevin/

Mark Sh. Levin

