a uniqueness theorem for clustering n.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
A UNIQUENESS THEOREM FOR CLUSTERING PowerPoint Presentation
Download Presentation
A UNIQUENESS THEOREM FOR CLUSTERING

Loading in 2 Seconds...

play fullscreen
1 / 26

A UNIQUENESS THEOREM FOR CLUSTERING - PowerPoint PPT Presentation


  • 144 Views
  • Uploaded on

A UNIQUENESS THEOREM FOR CLUSTERING. Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009. TALK OUTLINE. Questions being addressed Introduce Axioms Introduce Properties Uniqueness Theorem for Single-Linkage Taxonomy of Partitioning Functions.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about 'A UNIQUENESS THEOREM FOR CLUSTERING' - thy


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
a uniqueness theorem for clustering

A UNIQUENESS THEOREMFOR CLUSTERING

Reza Bosagh Zadeh (Carnegie Mellon)

Shai Ben-David (Waterloo)

UAI 09, Montréal, June 2009

talk outline
TALK OUTLINE
  • Questions being addressed
  • Introduce Axioms
  • Introduce Properties
  • Uniqueness Theorem for Single-Linkage
  • Taxonomy of Partitioning Functions
what is clustering
WHAT IS CLUSTERING?
  • Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.
some basic unanswered questions
SOME BASIC UNANSWERED QUESTIONS
  • Are there principles governing all clustering paradigms?
  • Which clustering paradigm should I use for a given task?
slide5

THERE ARE MANY CLUSTERING TASKS

“Clustering” is an ill-defined problem

  • There are many different clustering tasks,

leading to different clustering paradigms:

slide6

THERE ARE MANY CLUSTERING TASKS

“Clustering” is an ill-defined problem

  • There are many different clustering tasks,

leading to different clustering paradigms:

we would like to discuss the broad notion of clustering
WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING

Independently of any

  • particular algorithm,
  • particular objective function, or
  • particular generative data model
what for
WHAT FOR?
  • Choosing a suitable algorithm for a given task.
    • Axioms:to capture intuition about clustering in general.
      • Expected to be satisfied by all clustering functions
    • Properties: to capture differences between different clustering paradigms
timeline
TIMELINE
  • Jardine, Sibson 1971
    • Considered only hierarchical functions
  • Kleinberg 2003
    • Presented an impossibility result
  • This paper
    • Presents a uniqueness result for Single-Linkage
the basic setting
THE BASIC SETTING
  • For a finite domain set S, a distance function is a symmetric mapping

d:SxS → R+ such that

d(x,y)=0 iff x=y.

  • A partitioning function takes a dissimilarity function on S and returns a partition of S.
  • We wish to define the axioms that distinguishclustering functions, from any other functions that output domain partitions.
kleinberg s axioms
KLEINBERG’S AXIOMS
  • Scale Invariance

F(λd)=F(d) for all d and all strictly positive λ.

  • Richness

The range of F(d) over all d is the set of all possible partitionings

  • Consistency

If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances,

then F(d) = F(d’).

kleinberg s axioms1
KLEINBERG’S AXIOMS
  • Scale Invariance

F(λd)=F(d) for all d and all strictly positive λ.

  • Richness

The range of F(d) over all d is the set of all possible partitionings

  • Consistency

If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances,

then F(d) = F(d’).

Inconsistent! No algorithm can satisfy all 3 of these.

consistent axioms
CONSISTENT AXIOMS:

Fix k

  • Scale Invariance

F(λd, k)=F(d, k) for all d and all strictly positive λ.

  • k-Richness

The range of F(d, k) over all d is the set of all possible

k-partitionings

  • Consistency

If d’ equals d except for shrinking distances within clusters of F(d, k) or stretching between-cluster distances,

then F(d, k)=F(d’, k).

Consistent! (And satisfied by Single-Linkage, Min-Sum, …)

clustering functions
CLUSTERING FUNCTIONS
  • Definition. Call any partitioning function which satisfies

a Clustering Function

  • Scale Invariance
  • k-Richness
  • Consistency
two clustering functions
TWO CLUSTERING FUNCTIONS

Single-Linkage

Min-Sum k-Clustering

  • Start with with all points in their own cluster
  • While there are more than k clusters
    • Merge the two most similar clusters

Find the k-partitioning Γ which minimizes

Not Hierarchical

(Is NP-Hard to optimize)

Both Functions satisfy:

Similarity between two clusters is the similarity of the most similar two points from differing clusters

  • Scale Invariance
  • k-Richness
  • Consistency

Hierarchical

Proofs in paper.

clustering functions1
CLUSTERING FUNCTIONS
  • Single-Linkage and Min-Sum are both Clustering functions.
  • How to distinguish between them in an Axiomatic framework? Use Properties
  • Not all properties are desired in every clustering situation: pick and choose properties for your task
properties order consistency
PROPERTIES - ORDER-CONSISTENCY
  • Order-Consistency
      • If two datasets d and d’ have the same ordering of the distances, then for all k,
      • F(d, k)=F(d’, k)
  • In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points.
  • Satisfied by Single-Linkage, Max-Linkage, Average-Linkage…
  • NOT satisfied by most objective functions (Min-Sum, k-means, …)
path distance
PATH-DISTANCE

In other words, we find the path from x to y, which has the smallest longest jump in it.

1

2

e.g.

Pd( , ) = 2

Since the path from above has a jump of distance 2

7

Undrawn edges are large

4

1

path distance1
PATH-DISTANCE
  • Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks.
  • Being human,we are restricted in how far we can jump from island to island.
  • Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.
properties path distance coherence
PROPERTIES – PATH-DISTANCE COHERENCE
  • Path-Distance Coherence
      • If two datasets d and d’ have the same induced path distance then for all k,
      • F(d, k)=F(d’, k)
uniqueness theorem
UNIQUENESS THEOREM
  • Theorem (This work)
    • Single-Linkage is the only clustering function satisfying Order-Consistency and Path Distance-Coherence
uniqueness theorem1
UNIQUENESS THEOREM
  • Theorem (This work)
    • Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance-Coherence
  • Is Path-Distance-Coherence doing all the work? No.
    • Consistency is necessary for uniqueness
    • k-Richness is necessary
  • “X is Necessary”: All other axioms/properties satisfied, just X missing, still not enough to get uniqueness
practical considerations
PRACTICAL CONSIDERATIONS
  • Single-Linkage is not always the right function to use.
    • Because Path-Distance-Coherence is not always desirable.
  • It’s not always immediately obvious when we want a function to focus on the Path Distance
    • Introduce a different formulation involving Minimum Spanning Trees
properties mst coherence
PROPERTIES - MST-COHERENCE

If

F

, 2

Then

F

, 2

, 2

20

20

20

20

  • MST-Coherence
      • If two datasets d and d’ have the same Minimum Spanning Tree then for all k,
      • F(d, k)=F(d’, k)
a taxonomy of clustering functions
A TAXONOMY OF CLUSTERING FUNCTIONS
  • Min-Sum satisfies neither MST-Coherence nor Order-Consistency
  • Future work: Characterize other clustering functions