
Genetic Algorithms for clustering problem


Presentation Transcript


  1. Pasi Fränti: Genetic Algorithms for clustering problem. 7.4.2016

  2. General structure
  Genetic Algorithm:
    Generate S initial solutions
    REPEAT Z iterations
      Select best solutions
      Create new solutions by crossover
      Mutate solutions
    END-REPEAT
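A minimal, self-contained Python sketch of this loop (not the original implementation; the random crossover, the two k-means iterations, and the population handling below are illustrative placeholders for the components discussed on the following slides):

```python
import numpy as np

rng = np.random.default_rng(0)

def distortion(data, codebook):
    # Mean squared error of mapping each vector to its nearest centroid
    d2 = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def kmeans(data, codebook, iterations=2):
    # Local fine-tuning of a child solution by a couple of k-means iterations
    for _ in range(iterations):
        labels = ((data[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)
        for j in range(len(codebook)):
            if np.any(labels == j):
                codebook[j] = data[labels == j].mean(axis=0)
    return codebook

def crossover(a, b):
    # Placeholder: the simple random crossover (M/2 centroids from each parent)
    m = len(a)
    return np.vstack([a[rng.choice(m, m // 2, replace=False)],
                      b[rng.choice(m, m - m // 2, replace=False)]])

def genetic_clustering(data, m, S=10, Z=20):
    # Generate S initial solutions: random data points as centroids
    population = [data[rng.choice(len(data), m, replace=False)] for _ in range(S)]
    for _ in range(Z):                                      # REPEAT Z iterations
        population.sort(key=lambda c: distortion(data, c))  # select best solutions
        parents = population[:max(2, S // 2)]
        children = []
        for _ in range(S):                                  # create new solutions by crossover
            i, j = rng.choice(len(parents), 2, replace=False)
            children.append(kmeans(data, crossover(parents[i], parents[j])))
        population = children                               # (mutation omitted in this sketch)
    return min(population, key=lambda c: distortion(data, c))

data = rng.random((200, 2))
best = genetic_clustering(data, m=4)
print(distortion(data, best))
```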

  3. Main principle

  4. Components of GA • Representation of solution • Selection method • Crossover method (most critical!) • Mutation

  5. Representation

  6. Representation of solution
  • Partition (P): optimal centroids can be calculated from P, but only local changes can be made
  • Codebook (C): optimal partition can be calculated from C, but calculating P takes O(NM) → slow
  • Combined (C, P): both data structures are needed anyway; computationally more efficient
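As a small illustration (the 2-D data and the sizes below are assumptions, not from the slides), the combined representation can be held as two NumPy arrays, and either structure can be recomputed from the other:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((100, 2))                      # N = 100 data vectors (illustrative)
C = X[rng.choice(len(X), 4, replace=False)]   # codebook C with M = 4 centroids

# Optimal partition P from C: nearest centroid of each vector, O(NM) distances -> slow
P = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)

# Optimal centroids from P: mean of each cluster's vectors
C_opt = np.array([X[P == j].mean(axis=0) for j in range(len(C))])
```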

  7. Selection method • To select which solutions will be used in crossover for generating new solutions • Main principle: good solutions should be used rather than weak solutions • Two main strategies: • Roulette wheel selection • Elitist selection • Exact implementation not so important

  8. Roulette wheel selection • Select two candidate solutions for the crossover randomly. • Probability for a solution to be selected is weighted according to its distortion:
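The weighting formula itself appears on the slide as an image; one common choice for a minimization problem, assumed here, is to weight each candidate by the inverse of its distortion:

```python
import numpy as np

def roulette_select(distortions, rng=np.random.default_rng()):
    # Lower distortion -> higher selection probability (inverse weighting assumed)
    w = 1.0 / np.asarray(distortions, dtype=float)
    p = w / w.sum()
    # Draw two distinct parents for the crossover
    return rng.choice(len(p), size=2, replace=False, p=p)

# Example: five candidate solutions with these distortion values (illustrative)
print(roulette_select([11.9, 8.8, 7.3, 9.5, 15.1]))
```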

  9. Elitist selection • Main principle: select all possible pairs among the best candidates. Elitist approach using zigzag scanning among the best solutions
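A short sketch of the elitist strategy; the zigzag scanning order of the figure is not reproduced here, but plain enumeration of the pairs yields the same set of parents:

```python
from itertools import combinations

def elitist_pairs(distortions, n_best):
    # Indices of the n_best lowest-distortion solutions ...
    best = sorted(range(len(distortions)), key=lambda i: distortions[i])[:n_best]
    # ... and all possible crossover pairs among them
    return list(combinations(best, 2))

# Example with five candidates and the three best used for crossover (illustrative)
print(elitist_pairs([11.9, 8.8, 7.3, 9.5, 15.1], n_best=3))
```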

  10. Crossover

  11. Crossover methods Different variants for crossover: • Random crossover • Centroid distance • Pairwise crossover • Largest partitions • PNN Local fine-tuning: • All methods give a new allocation of the centroids. • Local fine-tuning must be made by K-means. • Two iterations of K-means are enough.

  12. Random crossover Select M/2 centroids randomly from each of the two parents. [Figure: Solution 1 + Solution 2]

  13. Random crossover example (M = 4). [Figure: parent solutions A and B with centroids c1–c4 marked on the data points.] How to create a new solution? Pick M/2 randomly chosen cluster centroids from each of the two parents. How many solutions are there? 36 possibilities to create a new solution. Probability to select a good one? Not high: some are good but still need K-means, most are bad. Rough statistics: optimal 1, good 7, bad 28.
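The 36 possibilities quoted above follow from choosing which M/2 centroids to take from each parent; a quick check of that count:

```python
from itertools import combinations
from math import comb

M = 4
# Every way of picking M/2 centroid indices from parent A and M/2 from parent B
choices = [(a, b) for a in combinations(range(M), M // 2)
                  for b in combinations(range(M), M // 2)]
print(len(choices), comb(M, M // 2) ** 2)   # 36 36
```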

  14. [Figure: parent solutions A and B with centroids c1–c4, and three example child solutions: bad, good and optimal.]

  15. Centroid distance crossover [Pan, McInnes, Jack, 1995: Electronics Letters] [Scheunders, 1996: Pattern Recognition Letters] • For each centroid, calculate its distance to the center point of the entire data set. • Sort the centroids according to the distance. • Divide into two sets: central vectors (M/2 closest) and distant vectors (M/2 furthest). • Take the central vectors from one codebook and the distant vectors from the other.
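A sketch of these four steps for variant (a) of the following slides (central vectors from parent A, distant vectors from parent B); variant (b) simply swaps the parents' roles. NumPy arrays are assumed:

```python
import numpy as np

def centroid_distance_crossover(parent_a, parent_b, data):
    c_ed = data.mean(axis=0)                    # centroid of the entire data set
    m = len(parent_a)
    # Sort each parent's centroids by their distance to c_ed
    a_sorted = parent_a[np.argsort(np.linalg.norm(parent_a - c_ed, axis=1))]
    b_sorted = parent_b[np.argsort(np.linalg.norm(parent_b - c_ed, axis=1))]
    return np.vstack([a_sorted[:m // 2],        # M/2 central vectors of A
                      b_sorted[m // 2:]])       # M/2 distant vectors of B
```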

  16. Centroid distance crossover example (M = 4). [Figure: parent solutions A and B with centroids c1–c4 and the centroid Ced of the entire data set.] 1) Distances d(ci, Ced): A: d(c4, Ced) < d(c2, Ced) < d(c1, Ced) < d(c3, Ced); B: d(c1, Ced) < d(c3, Ced) < d(c2, Ced) < d(c4, Ced). 2) Sort the centroids according to the distance: A: c4, c2, c1, c3; B: c1, c3, c2, c4. 3) Divide into two sets: A: central vectors c4, c2 and distant vectors c1, c3; B: central vectors c1, c3 and distant vectors c2, c4. New solution: variant (a) takes the central vectors from parent solution A and the distant vectors from parent solution B, OR variant (b) takes the distant vectors from parent solution A and the central vectors from parent solution B.

  17. [Figure: the two child solutions of the centroid distance crossover. Variant (a): central vectors from parent A and distant vectors from parent B. Variant (b): distant vectors from parent A and central vectors from parent B.]

  18. Pairwise crossover [Fränti et al, 1997: The Computer Journal] Greedy approach: • For each centroid, find its nearest centroid in the other parent solution that is not yet used. • Among all pairs, select one of the two randomly. Small improvement: • No reason to consider the parents as separate solutions. • Take the union of all centroids. • Make the pairing independent of the parent.
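A sketch of the greedy variant (the improved version would first pool all 2M centroids and pair them independently of the parent):

```python
import numpy as np

def pairwise_crossover(parent_a, parent_b, rng=np.random.default_rng()):
    # Pair each centroid of A with its nearest still-unused centroid of B
    unused = list(range(len(parent_b)))
    child = []
    for ca in parent_a:
        dists = [np.linalg.norm(ca - parent_b[j]) for j in unused]
        cb = parent_b[unused.pop(int(np.argmin(dists)))]
        # From each pair, keep one of the two centroids at random
        child.append(ca if rng.random() < 0.5 else cb)
    return np.array(child)
```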

  19. Pairwise crossover example. Initial parent solutions: MSE = 11.92·10⁹ and MSE = 8.79·10⁹

  20. Pairwise crossover example. Pairing between parent solutions: MSE = 7.34·10⁹

  21. Pairwise crossover example. Pairing without restrictions: MSE = 4.76·10⁹

  22. Largest partitions [Fränti et al, 1997: The Computer Journal] Crossover algorithm: • Each cluster in the solutions A and B is assigned a number, the cluster size S, indicating how many data objects belong to it. • In each phase we pick the centroid of the largest cluster. • Assume that cluster i was chosen from A. The cluster centroid Ci is removed from A to avoid its reselection. • For the same reason we update the cluster sizes of B by removing the effect of those data objects in B that were assigned to the chosen cluster i in A.
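A sketch of this crossover, assuming both parents partition the same N data vectors (part_a and part_b are label arrays, cent_a and cent_b the corresponding codebooks):

```python
import numpy as np

def largest_partitions_crossover(cent_a, part_a, cent_b, part_b, m):
    # Repeatedly take the centroid of the currently largest cluster; its data
    # objects then stop counting towards the cluster sizes of the other parent.
    covered = np.zeros(len(part_a), dtype=bool)
    cents = [np.asarray(cent_a), np.asarray(cent_b)]
    parts = [np.asarray(part_a), np.asarray(part_b)]
    sizes = [np.bincount(parts[0], minlength=len(cents[0])).astype(float),
             np.bincount(parts[1], minlength=len(cents[1])).astype(float)]
    child = []
    for _ in range(m):
        p = 0 if sizes[0].max() >= sizes[1].max() else 1   # parent with the largest cluster
        i = int(np.argmax(sizes[p]))
        child.append(cents[p][i])
        sizes[p][i] = -1                                   # avoid reselecting cluster i
        newly = (parts[p] == i) & ~covered                 # its not-yet-covered data objects
        covered |= newly
        sizes[1 - p] -= np.bincount(parts[1 - p][newly], minlength=len(cents[1 - p]))
    return np.array(child)
```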

  23. Largest partitions [Fränti et al, 1997: The Computer Journal] [Figure: parent solutions A and B with cluster sizes S = 100, 50, 30 and 20; the centroid of the largest cluster is picked first.]

  24. PNN crossover for GA [Fränti et al, 1997: The Computer Journal] [Figure panels: Initial 1, Initial 2, Combined (union of the centroids), After PNN.]

  25. The PNN crossover method (1)[Fränti, 2000: Pattern Recognition Letters]

  26. The PNN crossover method (2)
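Slides 25–26 show the PNN crossover as pseudocode figures. A minimal sketch of the idea: take the union of the parents' centroids (with their cluster sizes) and merge the cheapest pair repeatedly until M clusters remain. The merge cost below is the standard PNN cost n_i·n_j/(n_i+n_j)·||c_i - c_j||², an assumption here rather than a formula taken from the slides:

```python
import numpy as np

def pnn_crossover(cent_a, size_a, cent_b, size_b, m):
    cents = [np.asarray(c, dtype=float) for c in np.vstack([cent_a, cent_b])]
    sizes = [float(s) for s in np.concatenate([size_a, size_b])]
    while len(cents) > m:
        best_cost, best_pair = None, None
        for i in range(len(cents)):                # find the cheapest merge
            for j in range(i + 1, len(cents)):
                cost = (sizes[i] * sizes[j] / (sizes[i] + sizes[j])
                        * np.sum((cents[i] - cents[j]) ** 2))
                if best_cost is None or cost < best_cost:
                    best_cost, best_pair = cost, (i, j)
        i, j = best_pair                           # merge clusters i and j
        cents[i] = (sizes[i] * cents[i] + sizes[j] * cents[j]) / (sizes[i] + sizes[j])
        sizes[i] += sizes[j]
        del cents[j], sizes[j]
    return np.array(cents)
```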

  27. Importance of K-means (random crossover) [Figure: worst and best results on the Bridge data set.]

  28. Effect of crossover method (with k-means iterations) [Figure: results on Bridge.]

  29. Effect of crossover method (with k-means iterations) [Figure: results on binary data, Bridge2.]

  30. Mutations

  31. Mutations • Purpose is to implement small random changes to the solutions. • Happens with a small probability. • Sensible approach: change the location of one centroid by the random swap! • Role of mutations is to simulate local search. • If mutations are needed → the crossover method is not very good.
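A minimal sketch of the random-swap mutation; the mutation probability below is an illustrative value, not one taken from the slides:

```python
import numpy as np

def random_swap_mutation(codebook, data, prob=0.01, rng=np.random.default_rng()):
    # With a small probability, move one randomly chosen centroid to the
    # location of a randomly chosen data vector (random swap)
    mutated = codebook.copy()
    if rng.random() < prob:
        mutated[rng.integers(len(mutated))] = data[rng.integers(len(data))]
    return mutated
```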

  32. Effect of k-means and mutations [Figure annotations: k-means improves the result but is less vital; mutations alone perform better than random crossover!]

  33. GAIS – Going extreme

  34. Agglomerative clustering PNN: Pairwise Nearest Neighbor method • Merges two clusters • Preserves the hierarchy of clusters IS: Iterative shrinking method • Removes one cluster • Repartitions the data vectors of the removed cluster

  35. Iterative shrinking

  36. Pseudo code

  37. Local optimization of IS Finding secondary cluster: Removal cost of single vector:
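The two formulas referred to here appear on the slide as images. The sketch below uses one plausible formulation, assumed rather than copied from the slide: the secondary cluster of a vector is its nearest centroid other than its own, and the removal cost is the increase in squared error when the vector is repartitioned there:

```python
import numpy as np

def secondary_and_removal_cost(data, labels, centroids):
    # Squared distances from every vector to every centroid (N x M)
    d2 = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    idx = np.arange(len(data))
    own = d2[idx, labels].copy()             # distance to the vector's own centroid
    d2[idx, labels] = np.inf                 # exclude the own cluster
    secondary = d2.argmin(axis=1)            # nearest remaining cluster
    removal_cost = d2[idx, secondary] - own  # increase in squared error if moved
    return secondary, removal_cost
```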

  38. Example (1)

  39. Example (2)

  40. Pseudo code of GAIS[Virmajoki & Fränti, 2006: Pattern Recognition]

  41. PNN vs. IS crossovers Further improvement of about 1%

  42. Optimized GAIS variants GAIS short (optimized for speed): • Create new generations only as long as the best solution keeps improving (T=*). • Use a small population size (Z=10). • Apply two iterations of k-means (G=2). GAIS long (optimized for quality): • Create a large number of generations (T=100). • Use a large population size (Z=100). • Iterate k-means relatively long (G=10).

  43. Comparison with image data [Table of results, annotated: "popular", "simplest of the good ones", "previous GA", "BEST!"]

  44. What does it cost? (Bridge) Random: ~0 s; K-means: 8 s; SOM: 6 minutes; GA-PNN: 13 minutes; GAIS short: ~1 hour; GAIS long: ~3 days.

  45. Comparison of algorithms

  46. Variation of the result

  47. Time vs. quality comparison (Bridge)

  48. Conclusions • Best clustering obtained by GA • Crossover method most important • Mutations not needed

  49. References • P. Fränti and O. Virmajoki, "Iterative shrinking method for clustering problems", Pattern Recognition, 39 (5), 761-765, May 2006. • P. Fränti, "Genetic algorithm with deterministic crossover for vector quantization", Pattern Recognition Letters, 21 (1), 61-68, January 2000. • P. Fränti, J. Kivijärvi, T. Kaukoranta and O. Nevalainen, "Genetic algorithms for large scale clustering problems", The Computer Journal, 40 (9), 547-554, 1997. • J. Kivijärvi, P. Fränti and O. Nevalainen, "Self-adaptive genetic algorithm for clustering", Journal of Heuristics, 9 (2), 113-129, 2003. • J.S. Pan, F.R. McInnes and M.A. Jack, "VQ codebook design using genetic algorithms", Electronics Letters, 31, 1418-1419, August 1995. • P. Scheunders, "A genetic Lloyd-Max quantization algorithm", Pattern Recognition Letters, 17, 547-556, 1996.

