An Impossibility Theorem for Clustering

By Jon Kleinberg. Definitions. Clustering function: f(S, d) = Γ operates on a set S of n ≥ 2 points and the distances d among them, where Γ is a partition of S. Distance function: d(i, j) = 0 only when i = j.

niveditha


  1. An Impossibility Theorem for Clustering By Jon Kleinberg

  2. Definitions • Clustering function: f(S, d) = Γ operates on a set S of n ≥ 2 points and the distances d among them, where Γ is a partition of S • Distance function: d(i, j) = 0 only when i = j • Does not require the triangle inequality.

  3. Many different clustering criteria • k-center • k-median • k-means • Inter-Intra • etc

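  4. k-center • Choose k centers to minimize the maximum distance from any point to its nearest center

  5. k-median • Minimize the average distance from each point to its nearest center • k-means: minimize the squared distances instead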
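The three center-based criteria on slides 4 and 5 differ only in how they aggregate each point's distance to its nearest center. A minimal sketch, assuming 1-D points with absolute difference as the distance and centers that are already given (the function name `clustering_costs` is hypothetical, not from the slides):

```python
def clustering_costs(points, centers):
    """Return (k-center, k-median, k-means) costs for 1-D points and fixed centers."""
    # Distance from each point to its nearest center.
    nearest = [min(abs(p - c) for c in centers) for p in points]
    k_center = max(nearest)                  # maximum distance
    k_median = sum(nearest) / len(nearest)   # average distance
    k_means = sum(x * x for x in nearest)    # sum of squared distances
    return k_center, k_median, k_means

# Illustrative data: nine points on a line and three hand-picked centers.
points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
costs = clustering_costs(points, centers=[1, 10, 19])
print(costs)
```

Each criterion then asks for the set of k centers that makes its own number as small as possible, which is why the same data can yield different clusterings under different criteria.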

  6. Inter-Intra • T(C): intra-cluster distance • D(C): inter-cluster distance • Maximize D(C) − T(C)

  7. Motivation • Each criterion optimizes different features • Is there one clustering criterion with phenomenal cosmic powers?

  8. Method • Give three intuitive axioms that any criterion should satisfy • Surprise: it is not possible to satisfy all three • Reminiscent of Arrow’s impossibility theorem, where no rule for aggregating rankings satisfies a small set of natural axioms at once

  9. Axiom 1 – Scale-Invariance • For any distance function d and any β > 0, we have f(S, d) = f(S, βd)
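To see what Scale-Invariance rules out, here is a small sketch of a rule that fails it. It assumes 1-D points and a hypothetical rule `threshold_clusters` that splits wherever the gap between neighbours exceeds a fixed distance r; multiplying every coordinate by β multiplies every distance by β.

```python
def threshold_clusters(points, r):
    """Split sorted 1-D points wherever the gap to the next point exceeds r."""
    pts = sorted(points)
    clusters = [[pts[0]]]
    for prev, cur in zip(pts, pts[1:]):
        if cur - prev > r:
            clusters.append([cur])      # gap too wide: start a new cluster
        else:
            clusters[-1].append(cur)    # close enough: extend the current cluster
    return clusters

def as_index_partition(points, clusters):
    """Name points by index so scaled and unscaled results can be compared."""
    index = {p: i for i, p in enumerate(points)}
    return sorted(sorted(index[p] for p in c) for c in clusters)

points = [0, 1, 4, 9, 10]
beta = 3
scaled = [beta * p for p in points]     # every pairwise distance is multiplied by beta

before = as_index_partition(points, threshold_clusters(points, r=3))
after = as_index_partition(scaled, threshold_clusters(scaled, r=3))
scale_invariant_here = before == after  # False: the partition changed under scaling
```

A rule with a fixed length scale built in reacts to a change of units, which is exactly what the axiom forbids.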

  10. Axiom 2 - Richness • Range(f) is the set of all partitions of S • i.e. every possible clustering can be generated given the right distances
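Richness quantifies over every partition of S, so it helps to see how many there are. The sketch below enumerates them recursively; the helper name `all_partitions` is an illustrative choice, not something from the slides.

```python
def all_partitions(items):
    """Yield every partition of `items` as a list of blocks."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for partition in all_partitions(rest):
        # Put `first` into each existing block in turn ...
        for i in range(len(partition)):
            yield partition[:i] + [[first] + partition[i]] + partition[i + 1:]
        # ... or give it a block of its own.
        yield [[first]] + partition

three = list(all_partitions([1, 2, 3]))   # 5 partitions
four = list(all_partitions([1, 2, 3, 4])) # 15 partitions
```

A rich clustering function must be able to output each of these for some choice of distances.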

  11. Axiom 3 - Consistency • Let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d, and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ
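The condition on d’ in the Consistency axiom can be checked mechanically. A minimal sketch, assuming the inequalities are read non-strictly (equal distances are allowed) and that distances are given as Python functions; `is_gamma_transformation` is a hypothetical helper name.

```python
def is_gamma_transformation(d, d_new, partition):
    """True if d_new never grows a within-cluster distance and
    never shrinks a between-cluster distance, relative to d."""
    cluster_of = {p: idx for idx, block in enumerate(partition) for p in block}
    points = list(cluster_of)
    for i, a in enumerate(points):
        for b in points[i + 1:]:
            same = cluster_of[a] == cluster_of[b]
            if same and d_new(a, b) > d(a, b):
                return False
            if not same and d_new(a, b) < d(a, b):
                return False
    return True

partition = [[0, 1], [5, 6]]
d = lambda a, b: abs(a - b)

# Illustrative new positions: clusters tighten and move apart.
moved = {0: 0.0, 1: 0.5, 5: 7.0, 6: 7.5}
d_good = lambda a, b: abs(moved[a] - moved[b])

# Illustrative new positions: the first cluster spreads out.
spread = {0: 0.0, 1: 2.0, 5: 5.0, 6: 6.0}
d_bad = lambda a, b: abs(spread[a] - spread[b])

ok = is_gamma_transformation(d, d_good, partition)    # True
not_ok = is_gamma_transformation(d, d_bad, partition) # False
```

Consistency then says that whenever this check passes for the partition f(d), the function must return the same partition on the new distances.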

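  12. Definition • Anti-chain: a collection of partitions is an anti-chain if it does not contain two distinct partitions such that one is a refinement of the other • A function whose range is an anti-chain cannot satisfy Richness, because the set of all partitions contains partitions that refine one another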
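The anti-chain definition reduces to a pairwise refinement test. A short sketch, assuming all partitions are over the same ground set; the helper names are illustrative.

```python
def is_refinement(fine, coarse):
    """True if every block of `fine` lies inside some block of `coarse`."""
    coarse_sets = [set(b) for b in coarse]
    return all(any(set(block) <= c for c in coarse_sets) for block in fine)

def canonical(partition):
    """Order-independent form, so equal partitions compare equal."""
    return frozenset(frozenset(b) for b in partition)

def is_antichain(partitions):
    """True if no two distinct partitions are related by refinement."""
    distinct = {canonical(p) for p in partitions}
    return not any(a != b and is_refinement(a, b)
                   for a in distinct for b in distinct)

incomparable = [[[1, 2], [3]], [[1], [2, 3]]]
nested = [[[1, 2, 3]], [[1, 2], [3]]]

a = is_antichain(incomparable)  # True: neither refines the other
b = is_antichain(nested)        # False: {1,2},{3} refines {1,2,3}
```

The set of all partitions of S always contains nested pairs like the second example, which is why an anti-chain range is incompatible with Richness.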

  13. Main Result • For each n ≥ 2, there is no clustering function f that satisfies Scale-Invariance, Richness and Consistency • This follows from a stronger statement: if f satisfies Scale-Invariance and Consistency, then Range(f) is an anti-chain

  14. Reminder of Axioms • Scale-Invariance: for any distance function d and any β > 0, we have f(d) = f(βd) • Richness: Range(f) is the set of all partitions of S • Consistency: let d and d’ be two distance functions. If f(d) = Γ and d’ is such that the distance between any two points in the same cluster of Γ is no larger than in d, and the distance between any two points in different clusters is no smaller than in d, then f(d’) = Γ

  15. Single Linkage • Cluster by repeatedly combining the closest points • Example points on a line: 0, 1, 4, 9, 10, 12, 15, 19, 20
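For the points on slide 15, single linkage can be sketched very compactly. On a line, the closest pair of clusters is always a pair of neighbours, so stopping at k clusters amounts to cutting the k − 1 widest gaps. This shortcut is specific to 1-D data and is an illustrative simplification; ties between equal gaps are broken by position here.

```python
def single_linkage_1d(points, k):
    """Single linkage on points of a line, stopped at k clusters."""
    pts = sorted(points)
    # Indices i where a cut would separate pts[i - 1] from pts[i], widest gap first.
    gaps = sorted(range(1, len(pts)),
                  key=lambda i: pts[i] - pts[i - 1], reverse=True)
    cuts = sorted(gaps[:k - 1])           # keep the k - 1 widest gaps as cuts
    bounds = [0] + cuts + [len(pts)]
    return [pts[a:b] for a, b in zip(bounds, bounds[1:])]

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
clusters = single_linkage_1d(points, k=3)
# [[0, 1, 4], [9, 10, 12, 15], [19, 20]]
```

The two cuts fall at the gaps of width 5 (between 4 and 9) and 4 (between 15 and 19).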

  16. Any two axioms • For every pair of axioms, there is a stopping condition for single linkage that satisfies both • Consistency + Richness: only link points at distance at most r • Consistency + Scale-Invariance (SI): stop when you have k connected components • Richness + SI: if x is the diameter of the graph, only add edges with weight at most βx, for a fixed β < 1
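The three stopping conditions can be plugged into one single-linkage routine. The sketch below is a Kruskal-style implementation with a union-find structure; the `stop` callback, the example points, and the values r = 3, k = 3 and β = 0.2 are illustrative assumptions, and thresholds are treated as inclusive.

```python
def single_linkage(points, d, stop):
    """Merge along edges in increasing weight until stop(weight, components) is True."""
    parent = {p: p for p in points}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]   # path halving
            p = parent[p]
        return p

    edges = sorted((d(a, b), a, b)
                   for i, a in enumerate(points) for b in points[i + 1:])
    components = len(points)
    for w, a, b in edges:
        if stop(w, components):
            break
        ra, rb = find(a), find(b)
        if ra != rb:                        # edge joins two different clusters
            parent[ra] = rb
            components -= 1

    clusters = {}
    for p in points:
        clusters.setdefault(find(p), []).append(p)
    return sorted(clusters.values())

points = [0, 1, 4, 9, 10, 12, 15, 19, 20]
d = lambda a, b: abs(a - b)
diameter = max(d(a, b) for a in points for b in points)   # 20

by_distance = single_linkage(points, d, lambda w, c: w > 3)             # distance r = 3
by_count = single_linkage(points, d, lambda w, c: c <= 3)               # k = 3 components
by_scale = single_linkage(points, d, lambda w, c: w > 0.2 * diameter)   # scale β = 0.2
```

On this data the first two rules both give [[0, 1, 4], [9, 10, 12, 15], [19, 20]], while the scale rule links up to distance 4 and merges the last two groups. Only the distance rule changes its answer when all distances are rescaled.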

  17. Centroid-Based Clustering • (k,g)-centroid clustering function: choose T ⊆ S, a set of k centroid points, such that Σ_{i ∈ S} g(d(i, T)) is minimized, where d(i, T) = min_{j ∈ T} d(i, j) • If g is the identity, we get k-median, etc. • Result: for every k ≥ 2, every function g, and n sufficiently large relative to k, the (k,g)-centroid clustering function does not satisfy Consistency.
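For small inputs the (k,g)-centroid function can be evaluated by brute force. A minimal sketch, assuming the centroids are chosen among the data points and each point is assigned to its nearest centroid; the function name and example data are hypothetical, and the exhaustive search is only practical for tiny n.

```python
from itertools import combinations

def centroid_clustering(points, d, k, g=lambda x: x):
    """Pick the k centroids minimizing the sum of g(distance to nearest centroid)."""
    def cost(T):
        return sum(g(min(d(i, c) for c in T)) for i in points)

    centers = min(combinations(points, k), key=cost)   # exhaustive search
    clusters = {c: [] for c in centers}
    for p in points:
        nearest = min(centers, key=lambda c: d(p, c))
        clusters[nearest].append(p)
    return list(centers), list(clusters.values())

points = [0, 1, 10, 11]
d = lambda a, b: abs(a - b)

centers, clusters = centroid_clustering(points, d, k=2)            # g = identity
_, squared = centroid_clustering(points, d, k=2, g=lambda x: x * x)  # squared distances
```

Swapping g changes the objective without changing the search, which is why the inconsistency result on this slide covers the whole family at once.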

  18. Proof: A contradiction • [Figure: two point sets, X (size m) and Y (size λm), with distances labeled r, ε and r+δ]

  19. A new distance function • [Figure: X split into X0 and X1 (size m/2 each) next to Y (size λm), with distances labeled r’ < r, r, ε and r+δ]

  20. Wrapping Up • If we pick λ, r, r’, ε and δ right then we can have: • But then our new centers are in X0 and X1 • But the new distance function only shrank distances inside a cluster, so Consistency says it should still give us X and Y • This covers the case where k is 2.
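The argument on slides 18 to 20 can be checked numerically for one choice of parameters. The sketch below assumes the figure labels mean: distance r inside X, ε inside Y, and r+δ between X and Y, with r’ replacing r inside X0 and inside X1. The concrete numbers are illustrative only (the slides give none), and g is taken to be the identity with k = 2.

```python
from itertools import combinations

# Illustrative values only; the slides do not give concrete numbers.
M, Y_SIZE = 10, 2                      # |X| = m = 10, |Y| = λm with λ = 0.2
R, R_NEW, EPS, DELTA = 1.0, 0.01, 0.5, 0.1
POINTS = list(range(M + Y_SIZE))       # indices 0..9 are X, 10..11 are Y
X0 = set(range(M // 2))                # first half of X; the rest of X is X1

def in_x(i):
    return i < M

def d_old(i, j):
    """Assumed reading of slide 18: r inside X, ε inside Y, r + δ across."""
    if i == j:
        return 0.0
    if in_x(i) and in_x(j):
        return R
    if not in_x(i) and not in_x(j):
        return EPS
    return R + DELTA

def d_new(i, j):
    """Slide 19: shrink distances inside X0 and inside X1 to r' < r; keep the rest."""
    if i != j and in_x(i) and in_x(j) and (i in X0) == (j in X0):
        return R_NEW
    return d_old(i, j)

def best_centers(d, k=2):
    """Brute-force k-median centers (g = identity), chosen among the points."""
    def cost(T):
        return sum(min(d(i, c) for c in T) for i in POINTS)
    return min(combinations(POINTS, k), key=cost)

old_centers = best_centers(d_old)   # one center in X, one in Y
new_centers = best_centers(d_new)   # one center in X0, one in X1
```

With these numbers the best old solution costs 9.5 and separates X from Y. After the change, placing both centers in X costs 2.28 against 5.54 for keeping one in Y, so the output can no longer be {X, Y} even though only within-cluster distances shrank.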

  21. Discussion: Relaxing Axioms • Refinement-Consistency: if d’ is an f(d)-transformation of d, then f(d’) is a refinement of f(d) • Near-Richness: all partitions except the trivial one can be obtained • With these two relaxations, there is a clustering function that satisfies Scale-Invariance, Refinement-Consistency and Near-Richness together • What other relaxations could we have?

  22. Discussion • Does this mean there is a law of continuous employment for clustering criterion creators? • Is the clustering function properly defined? • Allow overlaps • Allow outliers • Are these the right axioms? • All partitions possible vs. power set • Axioms for graph clustering?

  23. Questions?
