Analyzing and Improving Local Search:  k-means and ICP


Presentation Transcript


1. Analyzing and Improving Local Search: k-means and ICP David Arthur Special University Oral Exam Stanford University

2. What is this talk about? Two popular but poorly understood algorithms Fast in practice Nobody knows why Find (highly) sub-optimal solutions The big questions: What makes these algorithms fast? How can the solutions they find be improved?

3. What is k-means? Divides point-set into k “tightly-packed” clusters

4. What is ICP (Iterative Closest Point)? Finds a subset of A “similar” to B

5. Main outline Focus on k-means in talk Goals: Understand running time Harder than you might think! Worst case = exponential “Smoothed” = polynomial Find better clusterings k-means++ (modification of k-means) Provably “near”-optimal In practice: faster and more accurate than the competition

6. Main outline What is k-means? k-means worst-case complexity k-means smoothed complexity k-means++

7. Main outline What is k-means? What exactly is being solved? Why this algorithm? How does it work? k-means worst-case complexity k-means smoothed complexity k-means++

8.–9. The k-means problem Input: an integer k and a set X of n points in R^d Task: partition the points into k clusters C1, C2, …, Ck and choose "centers" c1, c2, …, ck for the clusters Minimize the objective function f = Σ_{x∈X} ‖x − c(x)‖², where c(x) is the center of the cluster containing x Similar to variance
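As a concrete illustration (not from the slides), here is a minimal Python sketch of this objective, assuming the points are rows of a NumPy array and the centers are given as a (k, d) array:

    import numpy as np

    def kmeans_objective(X, centers):
        """k-means potential f: sum over all points of the squared
        distance to the nearest center."""
        # Pairwise squared distances between points and centers: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        # Each point is charged to its closest center.
        return d2.min(axis=1).sum()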

10. The k-means problem Problem is NP-hard Even when k = 2 (Drineas et al., 04) Even when d = 2 (Mahajan et al., 09) Some (1+ε)-approximation algorithms known: Example running times: O(n + k^{k+2} ε^{−(2d+1)k} log^{k+1}(n) log^{k+1}(1/ε)) (Har-Peled and Mazumdar, 04) O(2^{(k/ε)^{O(1)}} d n) (Kumar et al., 04) All exponential (or worse) in k

11. The k-means problem An example real-world data set: From the UC-Irvine Machine Learning Repository Looking to detect malevolent network connections: n=494,021 k=100 d=38 (1+ε)-approximation algorithms are too slow for this!

12. k-means method The fast way: k-means method (Lloyd 82, MacQueen 67) “By far the most popular clustering algorithm used in scientific and industrial applications” (Berkhin, 02)

13.–31. k-means method (illustrated step by step) Start with k arbitrary centers {ci} (in practice: chosen at random from the data points) Assign points to clusters Ci based on the closest ci Set each ci to be the center of mass of Ci Repeat the last two steps until stable
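For reference, a minimal Python sketch of these steps (an illustration, not code from the talk), assuming the data are rows of a NumPy array and the initial centers are sampled from the data points:

    import numpy as np

    def kmeans(X, k, max_iter=100, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        # Start with k arbitrary centers (here: distinct random data points).
        centers = X[rng.choice(len(X), size=k, replace=False)]
        labels = None
        for _ in range(max_iter):
            # Assign each point to the cluster of its closest center.
            d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            labels = d2.argmin(axis=1)
            # Set each center to the center of mass of its cluster
            # (keep the old center if a cluster becomes empty).
            new_centers = np.array([
                X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
                for i in range(k)
            ])
            # Repeat until stable.
            if np.allclose(new_centers, centers):
                break
            centers = new_centers
        return centers, labels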

32. What is known already Number of iterations: Finite! Each iteration decreases f In practice: sub-linear (Duda, 00) Worst case: Ω(n) lower bound (Har-Peled and Sadri, 04); min(O(n^{3kd}), O(k^n)) upper bound (Inaba et al., 00) Very large gap!

33. Accuracy: Only finds a local optimum Local optimum can be arbitrarily bad i.e., f / fOPT unbounded Even with high probability in 1 dimension Even in “natural” examples: well separated Gaussians What is known already

34. Main outline What is k-means? k-means worst-case complexity* Number of iterations can be: Super-polynomial! k-means smoothed complexity k-means++ * “How slow is the k-means method?” (Arthur and Vassilvitskii, 06)

35. Worst-case overview Recursive construction: Start with input X (goes from A to B in T iterations) Then modify X: Add 1 dimension, O(k) points, O(1) clusters Old part of input still goes from A to B in T iterations New part: Resets everything once

36. Worst-case overview Recursive construction: Repeat m times: O(m²) points O(m) clusters 2^m iterations Lower bound follows: with n points, k-means can require 2^{Ω(√n)} iterations

37.–38. Recursive construction (Overview)

39. Recursive construction (Trace: t=0)

40. Recursive construction (Trace: t=0...T)

41.–46. Recursive construction (Trace: t=T+1) Assigning points to clusters, then recomputing centers

47.–51. Recursive construction (Trace: t=T+2) Assigning points to clusters, then recomputing centers

52.–56. Recursive construction (Trace: t=T+3) Assigning points to clusters, then recomputing centers

57.–61. Recursive construction (Trace: t=T+4) Assigning points to clusters, then recomputing centers

62.–63. Worst-case complexity summary k-means can require 2^{Ω(√n)} iterations For random centers: even with high probability Even when d=2 (Vattani, 09) (d=1 is open) ICP: can require Ω(n/d)^d iterations Similar (but easier) argument

64. Main outline What is k-means? k-means worst-case complexity k-means smoothed complexity* Take an arbitrary input, but randomly perturb it Expected number of iterations is polynomial Works for any k, d k-means++ * “k-means has smoothed polynomial complexity” (Arthur, Manthey, and Röglin, 09)

65. The problem with worst-case complexity What is the problem? k-means has bad worst-case complexity But is not actually slow in practice Need a different model to understand the real world A simple explanation would be average-case analysis, but real-world data is not random

66. The problem with worst-case complexity A better explanation: smoothed analysis (Spielman and Teng, 01) Between average case and worst case Perturb each point by a normal distribution with variance σ² Show expected running time is polynomial in n and D/σ, where D = diameter of the point-set
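A minimal sketch of this input model (illustration only, not from the talk): every coordinate of every point is independently perturbed by Gaussian noise of variance σ² before the algorithm runs.

    import numpy as np

    def smoothed_instance(X, sigma, rng=None):
        """Perturb each coordinate of each point by independent
        N(0, sigma^2) noise, as in the smoothed-analysis model."""
        rng = np.random.default_rng() if rng is None else rng
        return X + rng.normal(scale=sigma, size=X.shape)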

67.–69. Proof overview Recall the potential function f = Σ_{x∈X} ‖x − c(x)‖², where X is the set of all data points and c(x) is the corresponding cluster center Bound f: f ≤ nD² initially Will prove f is very likely to drop by at least ε² each iteration Gives: # of iterations is at most n(D/ε)²

70. The easy approach Do a union bound over all possible k-means steps What defines a step? Original clustering A (≤ k^n choices) Actually ≤ n^{3kd} choices (Inaba et al., 00) Resulting clustering B Total number of possible steps ≤ n^{6kd} Probability a fixed step can be bad: Bounded by the probability that A and B have near-identical f Probability ≈ (ε/σ)^d

71. The easy approach The argument: P[k-means takes more than n(D/ε)² iterations] ≤ P[there exists a possible bad step] ≤ (# of possible steps) · P[step is bad] ≤ n^{6kd} · (ε/σ)^d = small... if ε < σ · (1/n)^{O(k)} Resulting bound: n^{O(k)} iterations Not polynomial! (Arthur and Vassilvitskii, 06), (Manthey and Röglin, 09)

72.–75. How can this be improved? The union bound is wasteful! These two k-means steps can be analyzed together:

76. How can this be improved? A transition blueprint: Which points switched clusters Approximate positions for relevant centers Bonus: most approximate centers are determined by the above! Not obvious facts: m = number of points switching clusters # of transition blueprints ≤ (nk²)^m · (D/ε)^{O(m)} P[blueprint is bad] ≤ (ε/σ)^m (for most blueprints)

77. A good approach The new argument: P[k-means takes more than n(D/ε)² iterations] ≤ P[there exists a possible bad blueprint] ≤ (# of possible blueprints) · P[blueprint is bad] ≤ (nk²)^m · (D/ε)^{O(m)} · (ε/σ)^m = small... if ε < σ · (σ/nD)^{O(1)} Resulting bound: polynomially many iterations!

78. Smoothed complexity summary Smoothed complexity is polynomial Still have work to do: O(n^{26}) Getting tight exponents in smoothed analysis is hard Original theorem for Simplex: O(n^{96})! ICP: also polynomial smoothed complexity Much easier argument!

79. Main outline What is k-means? k-means worst-case complexity k-means smoothed complexity k-means++* What’s wrong with k-means? What’s k-means++? O(log k)-competitive with OPT Experimental results * “k-means++: The advantages of careful seeding” (Arthur and Vassilvitskii, 07)

80. What’s wrong with k-means? Recall: Only finds a local optimum Local optimum can be arbitrarily bad i.e., f / fOPT unbounded Even with high probability in 1 dimension Even in “natural” examples: well separated Gaussians

81. What’s wrong with k-means? k-means locally optimizes a clustering But can miss the big picture!

82. What’s wrong with k-means? k-means locally optimizes a clustering But can miss the big picture!

83. What’s wrong with k-means? k-means locally optimizes a clustering But can miss the big picture!

84. The solution Easy way to fix this mistake: Make centers far apart

85.–86. k-means++ The "right" way of choosing initial centers Choose the first center uniformly at random Repeat until there are k centers: Add a new center Choose x₀ with probability D(x₀)² / Σ_{x∈X} D(x)², where D(x) = distance between x and the closest center chosen so far (a sketch follows below)
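A minimal Python sketch of this seeding rule (illustration only; names and structure are my own), assuming the data are rows of a NumPy array:

    import numpy as np

    def kmeans_pp_seed(X, k, rng=None):
        """Pick k initial centers with D^2 weighting (k-means++ seeding)."""
        rng = np.random.default_rng() if rng is None else rng
        # First center: a data point chosen uniformly at random.
        centers = [X[rng.integers(len(X))]]
        for _ in range(k - 1):
            # D(x)^2 = squared distance from x to the closest chosen center.
            diffs = X[:, None, :] - np.array(centers)[None, :, :]
            d2 = (diffs ** 2).sum(axis=2).min(axis=1)
            # Add a new center, chosen with probability proportional to D(x)^2.
            centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
        return np.array(centers)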

87.–91. k-means++ Example:

92. Theoretical guarantee Claim: E[f] ≤ O(log k) · fOPT The guarantee holds as soon as the centers are picked But running k-means steps afterwards still helps in practice

93. Proof idea Let C1, C2, ..., Ck be the OPT clusters Points in Ci contribute: fOPT(Ci) to the OPT potential and fkm++(Ci) to the k-means++ potential Lemma: If we pick a center from Ci: E[fkm++(Ci)] ≤ 8 fOPT(Ci) Proof: linear algebra magic True for any reasonable probability distribution

94. Proof idea The real danger is that we waste a center on an already covered Ci Probability of choosing a covered cluster: fcurrent(covered clusters) / fcurrent ≤ 8 fOPT / fcurrent Cost of choosing one: if t uncovered clusters are left, a 1/t fraction of fcurrent becomes unfixable, so cost ≈ fcurrent / t Expected cost (up to constants): (fOPT / fcurrent) · (fcurrent / t) = fOPT / t

95. Proof idea Cost over all k steps: fOPT · (1/1 + 1/2 + ... + 1/k) = fOPT · O(log k)
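Written out in LaTeX, the final summation is just the harmonic number bound (my reading of the slide):

    \[
    \mathbb{E}[f] \;\lesssim\; f_{\mathrm{OPT}} \sum_{t=1}^{k} \frac{1}{t}
    \;=\; f_{\mathrm{OPT}} \cdot H_k \;=\; O(\log k)\cdot f_{\mathrm{OPT}},
    \qquad H_k = 1 + \tfrac{1}{2} + \dots + \tfrac{1}{k} \le 1 + \ln k .
    \]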

96. k-means++ accuracy improvement

97. k-means++ vs k-means (* values above 1 indicate k-means++ is out-performing k-means)

98. Other algorithms? The theory community has proposed other reasonable algorithms too Iterative swapping is the best in practice (Kanungo et al., 04) Theoretically an O(1)-approximation The implementation gives this up to be viable in practice Actual guarantees: none

99. k-means++ accuracy improvement vs Iterative Swapping (Kanungo et al., 04)

100. k-means++ vs Iterative Swapping (* values above 1 indicate k-means++ is out-performing Iterative Swapping)

101. k-means++ summary Friends don’t let friends use vanilla k-means! k-means++ has provable accuracy guarantee O(log k)-competitive with OPT k-means++ is faster on average k-means++ gets better clusterings (almost) always

102. Main outline Goals: Understand number of iterations Harder than you might think! Worst case = exponential Smoothed = polynomial Find better clusterings k-means++ (modification of k-means) Provably “near”-optimal In practice: faster and better than the competition

103. Special thanks! My advisor: Rajeev Motwani My defense committee: Ashish Goel Vladlen Koltun Serge Plotkin Tim Roughgarden

104. Special thanks! My co-authors: Bodo Manthey Rina Panigrahy Heiko Röglin Aneesh Sharma Sergei Vassilvitskii Ying Xu

105. Special thanks! My other fellow students: Gagan Aggarwal Brian Babcock Bahman Bahmani Krishnaram Kenthapadi Aleksandra Korolova Shubha Nabar Dilys Thomas

106. Special thanks! And my listeners!
