
# A UNIQUENESS THEOREM FOR CLUSTERING

A UNIQUENESS THEOREM FOR CLUSTERING. Reza Bosagh Zadeh (Carnegie Mellon), Shai Ben-David (Waterloo). UAI 09, Montréal, June 2009. Talk outline: questions being addressed; introduce axioms; introduce properties; uniqueness theorem for Single-Linkage; taxonomy of partitioning functions.


### A UNIQUENESS THEOREM FOR CLUSTERING

Reza Bosagh Zadeh (Carnegie Mellon)

Shai Ben-David (Waterloo)

UAI 09, Montréal, June 2009

TALK OUTLINE
• Questions being addressed
• Introduce Axioms
• Introduce Properties
• Uniqueness Theorem for Single-Linkage
• Taxonomy of Partitioning Functions
WHAT IS CLUSTERING?
• Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detect the presence of distinct groups and assign objects to groups.
• Are there principles governing all clustering paradigms?

“Clustering” is an ill-defined problem

• There are many different clustering tasks,

WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING

Independently of any

• particular algorithm,
• particular objective function, or
• particular generative data model
WHAT FOR?
• Choosing a suitable algorithm for a given task.
• Axioms: to capture intuition about clustering in general.
• Expected to be satisfied by all clustering functions
• Properties: to capture differences between different clustering paradigms
TIMELINE
• Jardine, Sibson 1971
• Considered only hierarchical functions
• Kleinberg 2003
• Presented an impossibility result
• This paper
• Presents a uniqueness result for Single-Linkage
THE BASIC SETTING
• For a finite domain set S, a distance function is a symmetric mapping d: S × S → R+ such that d(x, y) = 0 iff x = y.

• A partitioning function takes a dissimilarity function on S and returns a partition of S.
• We wish to define the axioms that distinguish clustering functions from any other functions that output domain partitions.
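The setting above can be made concrete with a small check. This is only a sketch under assumptions: the domain S is taken to be the indices 0..n-1, the distance function is stored as a nested list `d`, and the helper name `is_distance_function` is hypothetical. Note that the definition as stated on the slide does not ask for the triangle inequality, so the check does not either.

```python
from itertools import combinations

def is_distance_function(d):
    """Check the slide's conditions on a square matrix d over S = {0, ..., n-1}:
    symmetric, non-negative, and d(x, y) = 0 iff x = y."""
    n = len(d)
    # d(x, x) must be exactly zero for every point.
    if any(d[x][x] != 0 for x in range(n)):
        return False
    for x, y in combinations(range(n), 2):
        # Symmetry, and strictly positive distance between distinct points.
        if d[x][y] != d[y][x] or d[x][y] <= 0:
            return False
    return True

# Hypothetical example: three points with pairwise distances 1, 2 and 4.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
print(is_distance_function(d))
```

A partitioning function in this representation would take such a matrix (and later a number of clusters k) and return a set of disjoint index sets covering 0..n-1.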
KLEINBERG’S AXIOMS
• Scale Invariance

F(λd)=F(d) for all d and all strictly positive λ.

• Richness

The range of F(d) over all d is the set of all possible partitionings

• Consistency

If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances,

then F(d) = F(d’).

Inconsistent! No algorithm can satisfy all 3 of these.

CONSISTENT AXIOMS:

Fix k

• Scale Invariance

F(λd, k)=F(d, k) for all d and all strictly positive λ.

• k-Richness

The range of F(d, k) over all d is the set of all possible k-partitionings

• Consistency

If d’ equals d except for shrinking distances within clusters of F(d, k) or stretching between-cluster distances,

then F(d, k)=F(d’, k).

Consistent! (And satisfied by Single-Linkage, Min-Sum, …)
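The Consistency axiom refers to a transformed distance function d’. As an illustration only, the sketch below builds one such d’ from a given partition by shrinking within-cluster distances and stretching between-cluster distances. The function name `consistent_transform` and the factors `shrink` and `stretch` are assumptions made for this example; the axiom itself allows any amount of shrinking or stretching, not just uniform factors.

```python
def consistent_transform(d, partition, shrink=0.5, stretch=2.0):
    """Return d' that equals d except that within-cluster distances are
    multiplied by `shrink` (<= 1) and between-cluster distances by
    `stretch` (>= 1). `partition` is an iterable of disjoint index sets."""
    if not (0 < shrink <= 1 <= stretch):
        raise ValueError("need 0 < shrink <= 1 <= stretch")
    # Map every point to the index of the cluster that contains it.
    label = {x: i for i, cluster in enumerate(partition) for x in cluster}
    n = len(d)
    return [[d[x][y] * (shrink if label[x] == label[y] else stretch)
             for y in range(n)]
            for x in range(n)]

# Hypothetical example: clusters {0, 1} and {2}.
d = [[0, 1, 4],
     [1, 0, 2],
     [4, 2, 0]]
d_prime = consistent_transform(d, [{0, 1}, {2}])
print(d_prime)
```

A function F satisfies Consistency at d if F(d’, k) = F(d, k) whenever d’ is obtained this way from the partition F(d, k).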

CLUSTERING FUNCTIONS
• Definition. Call any partitioning function which satisfies
• Scale Invariance
• k-Richness
• Consistency

a Clustering Function
TWO CLUSTERING FUNCTIONS

Single-Linkage

• While there are more than k clusters
• Merge the two most similar clusters

Similarity between two clusters is the similarity of the most similar two points from differing clusters

Hierarchical

Min-Sum k-Clustering

Find the k-partitioning Γ which minimizes the sum of pairwise distances within clusters, Σ_{C ∈ Γ} Σ_{x,y ∈ C} d(x, y)

Not Hierarchical

(Is NP-Hard to optimize)

Both Functions satisfy:

• Scale Invariance
• k-Richness
• Consistency

Proofs in paper.
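The two functions on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' code: distances are a nested list over indices 0..n-1, ties in Single-Linkage are broken by scan order (an assumption), and Min-Sum is solved by brute force over all labelings, which is only feasible for tiny inputs given that the problem is NP-hard.

```python
from itertools import product

def single_linkage(d, k):
    """Repeatedly merge the two clusters whose closest pair of points is nearest."""
    clusters = [{i} for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                gap = min(d[x][y] for x in clusters[a] for y in clusters[b])
                if best is None or gap < best[0]:
                    best = (gap, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]
        del clusters[b]
    return {frozenset(c) for c in clusters}

def min_sum(d, k):
    """Brute force: the k-partition minimizing the sum of within-cluster distances."""
    n = len(d)
    best_cost, best_labels = None, None
    for labels in product(range(k), repeat=n):
        if len(set(labels)) != k:      # every one of the k clusters must be non-empty
            continue
        cost = sum(d[x][y] for x in range(n) for y in range(x + 1, n)
                   if labels[x] == labels[y])
        if best_cost is None or cost < best_cost:
            best_cost, best_labels = cost, labels
    return {frozenset(x for x in range(n) if best_labels[x] == c) for c in range(k)}

# Hypothetical example: five points on a line at positions 0, 1, 2, 3, 5.
positions = [0, 1, 2, 3, 5]
d = [[abs(p - q) for q in positions] for p in positions]
print(single_linkage(d, 2))
print(min_sum(d, 2))
```

On this example the two functions disagree for k = 2: Single-Linkage isolates the far point (index 4), while Min-Sum prefers the more balanced split {0, 1, 2} and {3, 4}.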

CLUSTERING FUNCTIONS
• Single-Linkage and Min-Sum are both Clustering functions.
• How to distinguish between them in an Axiomatic framework? Use Properties
• Not all properties are desired in every clustering situation: pick and choose properties for your task
PROPERTIES - ORDER-CONSISTENCY
• Order-Consistency
• If two datasets d and d’ have the same ordering of the distances, then for all k,
• F(d, k)=F(d’, k)
• In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points.
• NOT satisfied by most objective functions (Min-Sum, k-means, …)
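To make "the same ordering of the distances" concrete, the sketch below compares two distance matrices pair by pair. It assumes one particular reading of the property, namely that every strict comparison between two pairs of points must agree in d and d’ (so ties must also be preserved); the helper name `same_ordering` is hypothetical.

```python
from itertools import combinations

def same_ordering(d1, d2):
    """True if, for all pairs of point pairs, d1 and d2 agree on which pair is closer."""
    pairs = list(combinations(range(len(d1)), 2))
    for a, b in pairs:
        for x, y in pairs:
            # Checking '<' in both directions also catches broken ties.
            if (d1[a][b] < d1[x][y]) != (d2[a][b] < d2[x][y]):
                return False
    return True

# Hypothetical example: points on a line, and the same distances squared.
positions = [0, 1, 3, 7, 15]
d = [[abs(p - q) for q in positions] for p in positions]
d_squared = [[v * v for v in row] for row in d]
print(same_ordering(d, d_squared))
```

Squaring is a monotone transformation of non-negative values, so it leaves the ordering intact; an Order-Consistent function must therefore return the same partition on `d` and `d_squared` for every k.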
PATH-DISTANCE

In other words, we find the path from x to y, which has the smallest longest jump in it.

[Figure: example graph with edge weights 1, 2, 7, 4 and 1; undrawn edges are large. For the two marked points, Pd = 2, since the path above has a jump of distance 2.]

PATH-DISTANCE
• Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks.
• Being human, we are restricted in how far we can jump from island to island.
• Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
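The "smallest longest jump" description can be computed directly. The sketch below uses a minimax variant of the Floyd-Warshall algorithm, which is one standard way to do this and not necessarily how the authors compute it; the function name `path_distance` and the matrix representation are assumptions.

```python
def path_distance(d):
    """For every pair (x, y), return the smallest possible value of the
    longest jump over all paths from x to y."""
    n = len(d)
    p = [row[:] for row in d]          # start from the direct jumps
    for m in range(n):                 # allow m as an intermediate island
        for x in range(n):
            for y in range(n):
                # Longest jump on the route x -> m -> y.
                via = max(p[x][m], p[m][y])
                if via < p[x][y]:
                    p[x][y] = via
    return p

# Hypothetical example: five islands on a line at positions 0, 1, 2, 3, 5.
positions = [0, 1, 2, 3, 5]
d = [[abs(a - b) for b in positions] for a in positions]
p = path_distance(d)
print(p[0][3], p[0][4])
```

In this example the direct jump from island 0 to island 3 has length 3, but hopping through islands 1 and 2 needs no jump longer than 1, so the path distance between them is 1. Reaching island 4 always requires the final jump of length 2.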
PROPERTIES – PATH-DISTANCE COHERENCE
• Path-Distance Coherence
• If two datasets d and d’ have the same induced path distance then for all k,
• F(d, k)=F(d’, k)
UNIQUENESS THEOREM
• Theorem (This work)
• Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance Coherence
• Is Path-Distance-Coherence doing all the work? No.
• Consistency is necessary for uniqueness
• k-Richness is necessary
• “X is Necessary”: if all other axioms/properties are satisfied and only X is missing, that is still not enough to get uniqueness
PRACTICAL CONSIDERATIONS
• Single-Linkage is not always the right function to use.
• Because Path-Distance-Coherence is not always desirable.
• It’s not always immediately obvious when we want a function to focus on the Path Distance
• Introduce a different formulation involving Minimum Spanning Trees
PROPERTIES - MST-COHERENCE

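[Figure: example with k = 2 and edge weights of 20. If F( · , 2) returns a given partition on one dataset, then F( · , 2) returns the same partition on datasets with the same Minimum Spanning Tree.]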
• MST-Coherence
• If two datasets d and d’ have the same Minimum Spanning Tree then for all k,
• F(d, k)=F(d’, k)
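The Minimum Spanning Tree formulation can be illustrated with a short sketch. It builds the tree with Prim's algorithm and then drops the k-1 heaviest tree edges, a well-known way of obtaining the Single-Linkage k-clustering. The function names are hypothetical, and the example assumes all distances are distinct so that the tree is unique.

```python
def mst_edges(d):
    """Prim's algorithm on the complete graph given by matrix d.
    Returns the tree as a list of (weight, x, y) edges."""
    n = len(d)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        # Cheapest edge leaving the current tree.
        w, x, y = min((d[x][y], x, y)
                      for x in in_tree for y in range(n) if y not in in_tree)
        edges.append((w, x, y))
        in_tree.add(y)
    return edges

def cut_mst(d, k):
    """Keep the n-k lightest tree edges and return the connected components."""
    n = len(d)
    kept = sorted(mst_edges(d))[: n - k]
    parent = list(range(n))

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for _, x, y in kept:
        parent[find(x)] = find(y)
    groups = {}
    for x in range(n):
        groups.setdefault(find(x), set()).add(x)
    return {frozenset(g) for g in groups.values()}

# Hypothetical example: points on a line with all pairwise distances distinct.
positions = [0, 1, 3, 7, 15]
d = [[abs(p - q) for q in positions] for p in positions]
print(sorted(mst_edges(d)))
print(cut_mst(d, 2))
```

Because `cut_mst` reads nothing from `d` except its spanning tree, it returns the same partition on any two datasets that share that tree, which is the behaviour MST-Coherence asks for.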
A TAXONOMY OF CLUSTERING FUNCTIONS
• Min-Sum satisfies neither MST-Coherence nor Order-Consistency
• Future work: Characterize other clustering functions