Formal Foundations of Clustering

Margareta Ackerman

[email protected]

Work with Shai Ben-David, Simina Branzei, and David Loker


The Theory-Practice Gap

Clustering is one of the most widely used tools for exploratory data analysis.

Social Sciences

Biology

Astronomy

Computer Science

….

All apply clustering to gain a first understanding of the structure of large data sets.


The Theory-Practice Gap

“While the interest in and application of cluster analysis has been rising rapidly, the abstract nature of the tool is still poorly understood” (Wright, 1973)

“There has been relatively little work aimed at reasoning about clustering independently of any particular algorithm, objective function, or generative data model” (Kleinberg, 2002)

Both statements still apply today.


Inherent Obstacles:

  • Clustering is ill-defined

Clustering aims to partition data into groups of similar items

Beyond that, there is very little consensus on the definition of clustering


Inherent Obstacles

  • Clustering is inherently ambiguous
    • There may be multiple reasonable clusterings
    • There is usually no ground truth
  • There are many clustering algorithms with different (often implicit) objective functions
  • Different algorithms have radically different input-output behaviour

Clustering Algorithm Selection

There are a wide variety of clustering algorithms, which can produce very different clusterings.

How should a user decide which algorithm to use for a given application?


Clustering Algorithm Selection

Users rely on cost-related considerations: running times, space usage, software purchasing costs, etc.

There is inadequate emphasis on input-output behaviour.


Our Framework for Algorithm Selection

We propose a framework that lets a user utilize prior knowledge to select an algorithm

  • Identify properties that distinguish between different input-output behaviour of clustering paradigms
  • The properties should be:

1) Intuitive and “user-friendly”

2) Useful for distinguishing clustering algorithms

Our Framework for Algorithm Selection

In essence, our goal is to understand fundamental differences between clustering methods, and convey them formally, clearly, and as simply as possible.


Previous Work

  • Axiomatic perspective
    • Impossibility Result: Kleinberg (NIPS, 2003)
    • Consistent axioms for quality measures: Ackerman & Ben-David (NIPS, 2009)
    • Axioms in the weighted setting: Wright (Pattern Recognition, 1973)

Previous Work

  • Characterizations of Single-Linkage
    • Partitional Setting: Bosagh Zadeh and Ben-David (UAI, 2009)
    • Hierarchical Setting: Jardine and Sibson (Mathematical Taxonomy, 1971) and Carlsson and Memoli (JMLR, 2010).
  • Characterizations of Linkage-Based Clustering
    • Partitional Setting: Ackerman, Ben-David, and Loker (COLT, 2010).
    • Hierarchical Setting: Ackerman & Ben-David (IJCAI, 2011).

Previous Work

  • Classifications of clustering methods
    • Fisher and Van Ness (Biometrika, 1971)
    • Ackerman, Ben-David, and Loker (NIPS, 2010)

What’s Left To Be Done?

  • Despite much work on clustering properties, some basic questions remain unanswered.
  • Consider some of the most popular clustering methods:
  • k-means, single-linkage, average-linkage, etc…
  • What are the advantages of k-means over other methods?
  • Previous classifications are missing key properties.

Our Contributions (at a high level)

We identify three fundamental categories that clearly delineate some essential differences between common clustering methods.

The strength of these categories is in their simplicity.

We hope this gives insight into core differences between popular clustering methods.

To define these categories, we first present the weighted clustering setting.

Outline
  • Outline
  • Formal framework
  • Categories and classification
  • A result from each category
  • Conclusions and future work

Weighted Clustering

Every element is associated with a real valued weight, representing its mass or importance.

Generalizes the notion of element duplication.

Algorithm design, particularly design of approximation algorithms, is often done in this framework.


Other Reasons to Add Weight:

  • An Example
  • Apply clustering to facility allocation, such as the placement of police stations in a new district.
  • The distribution of stations should enable quick access to most areas in the district.
  • Accessibility of different institutions to a station may have varying importance.
  • The weighted setting enables a convenient method for prioritizing certain landmarks.

Algorithms in the Weighted Clustering Setting

Traditional clustering algorithms can be readily translated into the weighted setting by considering their behavior on data containing element duplicates.
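
For intuition, here is a minimal sketch (ours, not from the slides) of this translation for positive integer weights: repeat each point w(x) times and run the unweighted algorithm on the expanded data. The helper name is illustrative.

```python
def expand_duplicates(points, weights):
    """Simulate integer weights by element duplication.

    points  -- list of data points
    weights -- list of positive integer weights, one per point
    Returns a list in which points[i] occurs weights[i] times.
    """
    expanded = []
    for x, weight in zip(points, weights):
        expanded.extend([x] * weight)
    return expanded


# Running any unweighted clustering algorithm on the expanded list defines
# its behaviour on the weighted data (w[X], d) for integer weights.
print(expand_duplicates([0.0, 1.0, 5.0], [3, 1, 2]))
# -> [0.0, 0.0, 0.0, 1.0, 5.0, 5.0]
```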


Formal Setting

For a finite domain set X, a weight function

w: X → R+

defines the weight of every element.

For a finite domain set X, a distance function

d: X × X → R+ ∪ {0}

is the distance defined between the domain points.
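
Purely as illustration (representation and names are ours), the weighted data (w[X], d) can be written down directly in code: X as a list of points, w as a map into the positive reals, and d as a symmetric, non-negative function on pairs of points.

```python
from math import dist  # Euclidean distance (Python 3.8+)

# Domain set X: three points in the plane.
X = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0)]

# Weight function w: X -> R+, stored here as a dictionary.
w = {(0.0, 0.0): 2.0, (1.0, 0.0): 1.0, (5.0, 5.0): 0.5}

# Distance function d: X x X -> R+ ∪ {0}, symmetric with d(x, x) = 0.
def d(x, y):
    return dist(x, y)
```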


Formal Setting

  • Partitional Clustering Algorithm

(X,d) denotes unweighted data

(w[X],d) denotes weighted data

A Partitional Algorithm maps

Input: (w[X],d,k)

to

Output: a k-partition (k-clustering) of X


Formal Setting

  • Hierarchical Clustering Algorithm

A Hierarchical Algorithm maps

Input: (w[X],d)

to

Output: dendrogram of X

A dendrogram of (X,d) is a strictly binary tree whose leaves correspond to elements of X.

A clustering C appears in A(w[X],d) if each of its clusters is in the dendrogram (i.e., is the set of leaves of some subtree).
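
To make “appears in” concrete, here is a small sketch (representation and names are ours): a dendrogram as a binary tree whose nodes expose their leaf sets, and a check that every cluster of C is the leaf set of some node.

```python
class Node:
    """A dendrogram node; leaves carry a single element of X."""
    def __init__(self, element=None, left=None, right=None):
        self.element, self.left, self.right = element, left, right

    def leaves(self):
        if self.element is not None:            # leaf node
            return frozenset([self.element])
        return self.left.leaves() | self.right.leaves()

    def all_clusters(self):
        """All clusters represented in the dendrogram (leaf sets of nodes)."""
        clusters = {self.leaves()}
        if self.element is None:
            clusters |= self.left.all_clusters() | self.right.all_clusters()
        return clusters


def appears_in(clustering, dendrogram):
    """C appears in a dendrogram if each cluster of C is some node's leaf set."""
    nodes = dendrogram.all_clusters()
    return all(frozenset(cluster) in nodes for cluster in clustering)


# Dendrogram ((a, b), c): {{a, b}, {c}} appears in it, {{a, c}, {b}} does not.
tree = Node(left=Node(left=Node("a"), right=Node("b")), right=Node("c"))
print(appears_in([{"a", "b"}, {"c"}], tree))   # True
print(appears_in([{"a", "c"}, {"b"}], tree))   # False
```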


Our Contributions

  • We utilize the weighted framework to identify three fundamental categories, describing how algorithms respond to weight.
  • Classify traditional algorithms according to these categories
  • Fully characterize when different algorithms react to weight

Towards Basic Categories: Range(X,d)

PARTITIONAL:

Range(A(X, d, k)) = {C | ∃ w s.t. C = A(w[X], d, k)}

The set of clusterings that A outputs on (X, d) over all possible weight functions.

HIERARCHICAL:

Range(A(X, d)) = {D | ∃ w s.t. D=A(w[X], d)}

The set of dendrograms that A outputs on (X, d) over all possible weight functions.
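
Range is defined over all weight functions, so it cannot be computed by exhaustive enumeration; the sketch below (our illustration, with a hypothetical algorithm callable) samples only finitely many weight functions and therefore yields a lower bound on |Range|: it can witness that an algorithm responds to weight on a given data set, but it can never prove that it does not.

```python
def observed_range(algorithm, X, d, k, weight_functions):
    """Collect the distinct clusterings that `algorithm` outputs on (X, d, k)
    over a finite sample of weight functions.

    `algorithm(X, d, k, w)` is assumed to return a clustering as a list of
    sets of points.  The result only lower-bounds |Range(A(X, d, k))|.
    """
    outputs = set()
    for w in weight_functions:
        clustering = algorithm(X, d, k, w)
        # Canonical form, so the same partition is counted only once.
        outputs.add(frozenset(frozenset(cluster) for cluster in clustering))
    return outputs

# len(observed_range(...)) > 1 witnesses that A responds to weight on (X, d);
# len(observed_range(...)) == 1 is inconclusive.
```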

Outline
  • Outline
  • Formal framework
  • Categories and classification
  • A result from each category
  • Conclusions and future work

Categories:

  • Weight Robust

A is weight-robust if for all (X, d), |Range(X,d)| = 1.

  • A never responds to weight.

Categories:

  • Weight Sensitive
  • A is weight-sensitive if for all (X, d), |Range(X,d)| > 1.
  • A always responds to weight.

Categories:

  • Weight Considering
  • An algorithm A is weight-considering if
  • There exists (X, d) where |Range(X,d)|=1.
  • There exists (X, d) where |Range(X,d)|>1.
  • A responds to weight on some data sets, but not on others.

Summary of Categories

  • Partitional: Range(A(X, d, k)) = {C | ∃ w such that A(w[X], d, k) = C}
  • Hierarchical: Range(A(X, d)) = {D | ∃ w such that A(w[X], d) = D}
  • Weight-robust: for all (X, d), |Range(X,d)| = 1.
  • Weight-sensitive: for all (X, d), |Range(X,d)| > 1.
  • Weight-considering:
    • ∃ (X, d) where |Range(X,d)| = 1.
    • ∃ (X, d) where |Range(X,d)| > 1.

Connecting To Applications

  • The desired category depends on the application.

In the facility allocation example above, a weight-sensitive algorithm may be preferred.

  • In phylogeny, where sampling procedures can be highly biased, some degree of weight robustness may be desired.

Classification

For the weight-considering algorithms, we fully characterize when they are sensitive to weight.

Outline
  • Outline
  • Formal framework
  • Categories and classification
  • A result from each category
  • Classification of heuristics
  • Conclusions and future work

Zooming Into:

  • Weight Sensitive Algorithms
  • We show that k-means is weight-sensitive.
  • A is weight-separable if for any data set (X, d) and subset S of X with at most k points, ∃ w so that A(w[X],d,k) separates all points of S.
  • Fact: Every algorithm that is weight-separable is also weight-sensitive.

K-means is Weight-Sensitive

Proof:

  • Show that k-means is weight-separable:
  • Consider any (X,d) and S ⊂ X with at most k points.
  • Increase the weight of the points in S until each belongs to a distinct cluster (a toy sketch of this appears below).
  • Theorem: k-means is weight-sensitive.
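
A minimal sketch of this argument on a toy example (ours, not the authors' code): k-means is treated as the partition minimizing the weighted k-means cost, found here by brute force over all 3-clusterings of six points on a line. With uniform weights the optimum groups the three left-most points together; once those three points (the set S) are made heavy enough, the optimum places them in three distinct clusters.

```python
from itertools import product

def weighted_kmeans_cost(clusters, w):
    """Sum over clusters of w(x)*(x - centroid)^2, centroid = weighted mean (1-D)."""
    cost = 0.0
    for cluster in clusters:
        total_w = sum(w[x] for x in cluster)
        centroid = sum(w[x] * x for x in cluster) / total_w
        cost += sum(w[x] * (x - centroid) ** 2 for x in cluster)
    return cost

def best_kmeans_partition(X, w, k):
    """Exhaustively find a minimum-cost k-clustering (tiny inputs only)."""
    best, best_cost = None, float("inf")
    for labels in product(range(k), repeat=len(X)):
        if len(set(labels)) < k:                 # require exactly k non-empty clusters
            continue
        clusters = [[x for x, lab in zip(X, labels) if lab == j] for j in range(k)]
        cost = weighted_kmeans_cost(clusters, w)
        if cost < best_cost:
            best, best_cost = clusters, cost
    return best

X = [0.0, 0.1, 0.2, 5.0, 5.1, 10.0]
S = {0.0, 0.1, 0.2}                              # |S| <= k points to be separated

uniform = {x: 1.0 for x in X}
heavy = {x: (1e6 if x in S else 1.0) for x in X}

print(best_kmeans_partition(X, uniform, k=3))    # S ends up in a single cluster
print(best_kmeans_partition(X, heavy, k=3))      # the points of S land in three distinct clusters
```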

Zooming Into:

  • Weight Considering Algorithms
  • We show that Average-Linkage is weight-considering.
  • Characterize the precise conditions under which it is sensitive to weight.
  • Recall:
  • An algorithm A is weight-considering if
  • There exists (X, d) where |Range(X,d)|=1.
  • There exists (X, d) where |Range(X,d)|>1.

Average Linkage

  • Average-Linkage is a hierarchical algorithm.
  • It starts by creating a leaf for every element.
  • It then repeatedly merges the “closest” clusters using the following linkage function:

Average weighted distance between clusters
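
One natural weighted form of this linkage, consistent with viewing weights as duplication counts (we are not quoting the authors' exact definition), is Σ w(a)·w(b)·d(a,b) over pairs a in A, b in B, divided by the product of the clusters' total weights. The sketch below (ours) uses it in a plain greedy merging loop; with all weights equal to 1 it reduces to standard average-linkage.

```python
def weighted_average_linkage(A, B, d, w):
    """Weighted average distance between clusters A and B."""
    total = sum(w[a] * w[b] * d(a, b) for a in A for b in B)
    return total / (sum(w[a] for a in A) * sum(w[b] for b in B))

def average_linkage_merges(X, d, w):
    """Start from singleton clusters and repeatedly merge the closest pair
    under the weighted linkage; returns the merge sequence (the dendrogram,
    read bottom-up)."""
    clusters = [frozenset([x]) for x in X]
    merges = []
    while len(clusters) > 1:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: weighted_average_linkage(clusters[p[0]], clusters[p[1]], d, w))
        merges.append((clusters[i], clusters[j]))
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [clusters[i] | clusters[j]]
    return merges
```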


Average Linkage is Weight Considering

  • (X,d) where |Range(X,d)| = 1: the same dendrogram is output for every weight function.

[Figure: a four-point data set on points A, B, C, D, together with the single dendrogram that Average-Linkage outputs on it under every weight function.]


Average Linkage is Weight Considering

  • (X,d) where |Range(X,d)| > 1:

[Figure: a five-point data set on points A, B, C, D, E with edge lengths 1, 1, 1+ϵ, and 2+2ϵ; two different weight functions lead Average-Linkage to output two different dendrograms.]

When is Average Linkage Sensitive to Weight?

  • We showed that Average-Linkage is weight-considering.
  • Can we show when it is sensitive to weight?
  • We provide a complete characterization of when Average-Linkage is sensitive to weight, and when it is not.

Nice Clustering

  • A clustering is nice if every point is closer to all points within its cluster than to all other points.

[Figures: two example clusterings that are nice, and one example clustering that is not nice.]

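A direct check of this definition, as a small sketch (ours; names illustrative): a clustering is nice when every point's largest within-cluster distance is smaller than its smallest distance to a point outside its cluster.

```python
def is_nice(clustering, d):
    """True iff every point is closer to all points within its cluster
    than to all points outside it."""
    for cluster in clustering:
        others = [z for other in clustering if other is not cluster for z in other]
        for x in cluster:
            within = [d(x, y) for y in cluster if y != x]
            if within and others and max(within) >= min(d(x, z) for z in others):
                return False
    return True


d = lambda x, y: abs(x - y)
print(is_nice([{0.0, 1.0}, {5.0, 6.0}], d))   # True
print(is_nice([{0.0, 5.0}, {1.0, 6.0}], d))   # False: 0.0 is closer to 1.0 than to 5.0
```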

Characterizing When Average Linkage is Sensitive to Weight

  • A dendrogram is nice if all of its clusterings are nice.
  • Theorem: |Range(AL(X,d))| = 1 if and only if (X,d) has a nice dendrogram.

Characterizing When Average Linkage is Sensitive to Weight: Proof

  • Theorem: |Range(AL(X,d))| = 1 if and only if (X,d) has a nice dendrogram.
  • Proof:
  • Show that:
  • If there is a nice dendrogram for (X,d), then Average-Linkage outputs it.
  • If a clustering that is not nice appears in the dendrogram AL(w[X],d) for some w, then |Range(AL(X,d))| > 1.

Characterizing When Average Linkage is Sensitive to Weight: Proof (cnt.)

  • Lemma: If there is a nice dendrogram for (X,d), then Average-Linkage outputs it.
  • Proof Sketch:
  • Assume that (w[X],d) has a nice dendrogram.
  • Main idea: Show that every nice clustering of the data appears in AL(w[X],d).
  • For that, we show that each cluster in a nice clustering is formed by the algorithm.

Characterizing When Average Linkage is Sensitive to Weight: Proof (cnt.)

  • Given a nice clustering C, it can be shown that for any clusters Ci and Cj of C, any disjoint subsets Y and Z of Ci, and any subset W of Cj, Y and Z are closer than Y and W.
  • This implies that C appears in the dendrogram.

Characterizing When Average Linkage Responds to Weight: Proof (cnt.)

  • Lemma: If a clustering C that is not nice appears in AL(w[X],d), then |Range(X,d)| > 1.
  • Proof:
  • Since C is not nice, there exist points x, y, and z such that
    • x and y belong to the same cluster in C,
    • x and z belong to different clusters,
    • yet d(x,z) < d(x,y).
  • If x, y, and z are made sufficiently heavier than all other points, then x and z will be merged before x and y, so C will not be formed.

Characterizing When Average Linkage is Sensitive to Weight

  • Theorem: |Range(AL(X,d))| = 1 if and only if (X,d) has a nice dendrogram.
  • Average-Linkage is robust to weight whenever there is a dendrogram of (X,d) consisting of only nice clusterings, and it is sensitive to weight otherwise.

Zooming Into:

  • Weight Robust Algorithms
  • These algorithms are invariant to element duplication.
  • Ex. Min-Diameter returns a clustering that minimizes the length of the longest within-cluster edge.
  • As this quantity is not affected by the number of points (or weight) at any location, Min-Diameter is weight-robust (a minimal sketch of the objective follows below).
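
For contrast with the weight-sensitive objectives above, a minimal sketch of the Min-Diameter objective (ours): it only examines pairwise distances, so a weight function has nowhere to enter, which is why duplicating or reweighting points cannot change which clustering minimizes it.

```python
def max_diameter(clustering, d):
    """The Min-Diameter objective: the largest within-cluster distance.
    Note the absence of any weight argument -- its value is unchanged if
    points are duplicated or reweighted."""
    return max(
        (d(x, y) for cluster in clustering for x in cluster for y in cluster),
        default=0.0,
    )

d = lambda x, y: abs(x - y)
print(max_diameter([{0.0, 1.0}, {5.0, 7.0}], d))   # 2.0
```
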
Outline
  • Outline
  • Introduce framework
  • Present categories and classification
  • Show several results from different categories
  • Conclusions and future work

Conclusions

  • We introduced three basic categories describing how algorithms respond to weights
  • We characterize the precise conditions under which algorithms respond to weights
  • The same results apply in the non-weighted setting for data duplicates
  • This classification can be used to help select clustering algorithms for specific applications

Future Directions

  • Capture differences between objective functions similar to k-means (e.g., k-medians, k-medoids, min-sum)
  • Show bounds on the size of the Range of weight considering and weight sensitive methods
  • Analyze clustering algorithms for categorical data
  • Analyze clustering algorithms with a noise bucket
  • Identify properties that are significant for specific clustering applications (some previous work in this direction by Ackerman, Brown, and Loker (ICCABS, 2012)).