1 / 26

A UNIQUENESS THEOREM FOR CLUSTERING

A UNIQUENESS THEOREM FOR CLUSTERING. Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009. TALK OUTLINE. Questions being addressed Introduce Axioms Introduce Properties Uniqueness Theorem for Single-Linkage Taxonomy of Partitioning Functions.

thy
Download Presentation

A UNIQUENESS THEOREM FOR CLUSTERING

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. A UNIQUENESS THEOREMFOR CLUSTERING Reza Bosagh Zadeh (Carnegie Mellon) Shai Ben-David (Waterloo) UAI 09, Montréal, June 2009

  2. TALK OUTLINE • Questions being addressed • Introduce Axioms • Introduce Properties • Uniqueness Theorem for Single-Linkage • Taxonomy of Partitioning Functions

  3. WHAT IS CLUSTERING? • Given a collection of objects (characterized by feature vectors, or just a matrix of pair-wise similarities), detects the presence of distinct groups, and assign objects to groups.

  4. SOME BASIC UNANSWERED QUESTIONS • Are there principles governing all clustering paradigms? • Which clustering paradigm should I use for a given task?

  5. THERE ARE MANY CLUSTERING TASKS “Clustering” is an ill-defined problem • There are many different clustering tasks, leading to different clustering paradigms:

  6. THERE ARE MANY CLUSTERING TASKS “Clustering” is an ill-defined problem • There are many different clustering tasks, leading to different clustering paradigms:

  7. WE WOULD LIKE TO DISCUSS THE BROAD NOTION OF CLUSTERING Independently of any • particular algorithm, • particular objective function, or • particular generative data model

  8. WHAT FOR? • Choosing a suitable algorithm for a given task. • Axioms:to capture intuition about clustering in general. • Expected to be satisfied by all clustering functions • Properties: to capture differences between different clustering paradigms

  9. TIMELINE • Jardine, Sibson 1971 • Considered only hierarchical functions • Kleinberg 2003 • Presented an impossibility result • This paper • Presents a uniqueness result for Single-Linkage

  10. THE BASIC SETTING • For a finite domain set S, a distance function is a symmetric mapping d:SxS → R+ such that d(x,y)=0 iff x=y. • A partitioning function takes a dissimilarity function on S and returns a partition of S. • We wish to define the axioms that distinguishclustering functions, from any other functions that output domain partitions.

  11. KLEINBERG’S AXIOMS • Scale Invariance F(λd)=F(d) for all d and all strictly positive λ. • Richness The range of F(d) over all d is the set of all possible partitionings • Consistency If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’).

  12. KLEINBERG’S AXIOMS • Scale Invariance F(λd)=F(d) for all d and all strictly positive λ. • Richness The range of F(d) over all d is the set of all possible partitionings • Consistency If d’ equals d except for shrinking distances within clusters of F(d) or stretching between-cluster distances, then F(d) = F(d’). Inconsistent! No algorithm can satisfy all 3 of these.

  13. CONSISTENT AXIOMS: Fix k • Scale Invariance F(λd, k)=F(d, k) for all d and all strictly positive λ. • k-Richness The range of F(d, k) over all d is the set of all possible k-partitionings • Consistency If d’ equals d except for shrinking distances within clusters of F(d, k) or stretching between-cluster distances, then F(d, k)=F(d’, k). Consistent! (And satisfied by Single-Linkage, Min-Sum, …)

  14. CLUSTERING FUNCTIONS • Definition. Call any partitioning function which satisfies a Clustering Function • Scale Invariance • k-Richness • Consistency

  15. TWO CLUSTERING FUNCTIONS Single-Linkage Min-Sum k-Clustering • Start with with all points in their own cluster • While there are more than k clusters • Merge the two most similar clusters Find the k-partitioning Γ which minimizes Not Hierarchical (Is NP-Hard to optimize) Both Functions satisfy: Similarity between two clusters is the similarity of the most similar two points from differing clusters • Scale Invariance • k-Richness • Consistency Hierarchical Proofs in paper.

  16. CLUSTERING FUNCTIONS • Single-Linkage and Min-Sum are both Clustering functions. • How to distinguish between them in an Axiomatic framework? Use Properties • Not all properties are desired in every clustering situation: pick and choose properties for your task

  17. PROPERTIES - ORDER-CONSISTENCY • Order-Consistency • If two datasets d and d’ have the same ordering of the distances, then for all k, • F(d, k)=F(d’, k) • In other words the clustering function only cares about whether a pair of points are closer/further than another pair of points. • Satisfied by Single-Linkage, Max-Linkage, Average-Linkage… • NOT satisfied by most objective functions (Min-Sum, k-means, …)

  18. PATH-DISTANCE In other words, we find the path from x to y, which has the smallest longest jump in it. 1 2 e.g. Pd( , ) = 2 Since the path from above has a jump of distance 2 7 Undrawn edges are large 4 1

  19. PATH-DISTANCE • Imagine each point is an island, and we would like to go from island a to island b. As if we’re trying to cross a river by jumping on rocks. • Being human,we are restricted in how far we can jump from island to island. • Path-Distance would have us find the path with the smallest longest jump, ensuring that we could complete all the jumps successfully.

  20. PROPERTIES – PATH-DISTANCE COHERENCE • Path-Distance Coherence • If two datasets d and d’ have the same induced path distance then for all k, • F(d, k)=F(d’, k)

  21. UNIQUENESS THEOREM • Theorem (This work) • Single-Linkage is the only clustering function satisfying Order-Consistency and Path Distance-Coherence

  22. UNIQUENESS THEOREM • Theorem (This work) • Single-Linkage is the only clustering function satisfying Order-Consistency and Path-Distance-Coherence • Is Path-Distance-Coherence doing all the work? No. • Consistency is necessary for uniqueness • k-Richness is necessary • “X is Necessary”: All other axioms/properties satisfied, just X missing, still not enough to get uniqueness

  23. PRACTICAL CONSIDERATIONS • Single-Linkage is not always the right function to use. • Because Path-Distance-Coherence is not always desirable. • It’s not always immediately obvious when we want a function to focus on the Path Distance • Introduce a different formulation involving Minimum Spanning Trees

  24. PROPERTIES - MST-COHERENCE If F , 2 Then F , 2 , 2 20 20 20 20 • MST-Coherence • If two datasets d and d’ have the same Minimum Spanning Tree then for all k, • F(d, k)=F(d’, k)

  25. A TAXONOMY OF CLUSTERING FUNCTIONS • Min-Sum satisfies neither MST-Coherence nor Order-Consistency • Future work: Characterize other clustering functions

  26. THANKS FOR YOUR ATTENTION!

More Related