Modified global k-means algorithm for minimum sum-of-squares clustering problems

Presenter : Lin, Shu-Han

- Author : Adil M. Bagirov

Pattern Recognition (PR, 2008)

- Motivation
- Objective
- Methodology
- Experiments
- Conclusion
- Comments

- k-Means algorithm
- sensitive to the choice of starting points
- inefficient for solving clustering problems in large data sets
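The sensitivity to starting points can be seen on a toy example. The following is a minimal sketch in plain Python, not the implementation used in the paper; the helper names (`sq_dist`, `mean`, `kmeans`) and the six-point data set are assumptions made for illustration.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(data, centers, max_iter=100):
    """Plain k-means from the given starting centers.

    Returns the final centers and the mean squared distance of each
    point to its nearest center.
    """
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[j].append(p)
        # Update step: move each center to the centroid of its cluster
        # (an empty cluster keeps its old center).
        new_centers = [mean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    cost = sum(min(sq_dist(p, c) for c in centers) for p in data) / len(data)
    return centers, cost


# Illustrative data: three well-separated pairs of points.
data = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]

# One start per group versus two starts inside the same group.
_, good_cost = kmeans(data, [(0, 0), (10, 0), (20, 0)])
_, bad_cost = kmeans(data, [(0, 0), (0, 1), (10, 0)])
print(good_cost, bad_cost)  # the second start converges to a worse local minimum
```

Both runs converge, but the second one stops at a local minimum in which two centers share one group and a single center covers the other two.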

- Global k-Means (GKM) algorithm
- incremental algorithm (adds one cluster center at a time)
- uses each data point as a candidate for the k-th cluster center
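The incremental scheme can be sketched as follows. This is a hedged illustration of the fast variant of global k-means, which, as commonly described, scores every data point as a candidate for the next center by the guaranteed reduction in the sum of squares and then refines with k-means. The function names and the toy data are assumptions, not taken from the slides.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def kmeans(data, centers, max_iter=100):
    """Plain k-means refinement from the given starting centers."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in data:
            j = min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            clusters[j].append(p)
        new_centers = [mean(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers


def fast_global_kmeans(data, k):
    """Add one center at a time, starting from the centroid of all data."""
    centers = [mean(data)]
    while len(centers) < k:
        # Squared distance of each point to its nearest current center.
        d = [min(sq_dist(p, c) for c in centers) for p in data]

        def gain(candidate):
            # Guaranteed decrease of the sum of squares if `candidate`
            # were added as a new center.
            return sum(max(0.0, di - sq_dist(candidate, p))
                       for p, di in zip(data, d))

        # Every data point is a candidate for the next center.
        best = max(data, key=gain)
        centers = kmeans(data, centers + [best])
    return centers


data = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
centers = fast_global_kmeans(data, 3)
print(sorted(centers))
```

On this toy data the incremental procedure places one center in each of the three groups without depending on a random start.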

Propose a new version of GKM

sensitive to the choice of a starting point

Objective function

- Old version
- Reformulated version
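The formulas from this slide are not preserved in the transcript. As a hedged reconstruction, the two formulations usually given for the minimum sum-of-squares clustering problem are shown here; the notation (data points a^i, i = 1..m, centers x^j, j = 1..k, assignment weights w_ij) is an assumption and may differ from the slide.

```latex
% Old version: mixed-integer formulation with assignment weights w_{ij}
\min_{x,\,w} \; \psi_k(x, w) = \frac{1}{m} \sum_{i=1}^{m} \sum_{j=1}^{k} w_{ij} \, \| x^j - a^i \|^2
\quad \text{subject to} \quad \sum_{j=1}^{k} w_{ij} = 1, \quad w_{ij} \in \{0, 1\}

% Reformulated version: nonsmooth formulation in the centers only
\min_{x^1, \dots, x^k} \; f_k(x^1, \dots, x^k) = \frac{1}{m} \sum_{i=1}^{m} \min_{j = 1, \dots, k} \| x^j - a^i \|^2
```

The second form removes the assignment variables: each point simply contributes its squared distance to the nearest center.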

- Old version
- Proposed version (auxiliary cluster function)

- Proposed version
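The slide content for the proposed version is not preserved here, so the following is only a sketch under stated assumptions. It takes the auxiliary cluster function to be the mean over all points of min(d_i, ||y - a_i||^2), where d_i is the squared distance from point a_i to its nearest existing center, and minimizes it with a k-means-like iteration over the points the new center would attract. The names `aux_value`, `attracted` and `starting_point` are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise mean (centroid) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def aux_value(y, data, d):
    """Assumed auxiliary cluster function for a candidate new center y."""
    return sum(min(di, sq_dist(y, p)) for p, di in zip(data, d)) / len(data)


def attracted(y, data, d):
    """Points strictly closer to y than to their nearest existing center."""
    return [p for p, di in zip(data, d) if sq_dist(y, p) < di]


def starting_point(data, centers, max_iter=100):
    """Starting point for the next center, given the current centers."""
    d = [min(sq_dist(p, c) for c in centers) for p in data]
    # Each data point proposes the centroid of the points it would attract.
    candidates = [mean(s) for s in (attracted(a, data, d) for a in data) if s]
    if not candidates:
        raise ValueError("every data point already coincides with a center")
    y = min(candidates, key=lambda c: aux_value(c, data, d))
    # Refine: recompute the attracted set and its centroid until stable.
    for _ in range(max_iter):
        s = attracted(y, data, d)
        if not s:
            break
        new_y = mean(s)
        if new_y == y:
            break
        y = new_y
    return y


data = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
current = [(15, 0.5), (0, 0.5)]  # illustrative 2-center solution
y = starting_point(data, current)
print(y)
```

Unlike the candidate selection restricted to data points, the returned starting point need not be a data point; k-means would then be run from the existing centers plus this point.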

MS k-means: Multi-start k-means

GKM: fast Global k-Means

MGKM: Modified Global k-Means

- Overall (14 data sets, 140 results)
- The MS k-means algorithm finds the best known (or near best known) solutions 42 (33.3%) times
- GKM algorithm 76 (60.3%) times
- MGKM algorithm 102 (81.0%) times

- Large k in large data sets (m)
- The MS k-means algorithm failed to find the best known (or near best known) solutions
- GKM algorithm finds such solutions 22 (45.8%) times
- MGKM algorithm 42 (87.5%) times
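The counts and percentages above can be cross-checked with a little arithmetic. The sketch below searches for the totals that reproduce every quoted percentage to one decimal place; the function name `implied_totals` is hypothetical.

```python
def implied_totals(counts_and_percents, max_total=1000):
    """Totals n for which every count/n rounds to its quoted percentage."""
    return [n for n in range(1, max_total + 1)
            if all(round(100 * c / n, 1) == p for c, p in counts_and_percents)]


overall = implied_totals([(42, 33.3), (76, 60.3), (102, 81.0)])
large_k = implied_totals([(22, 45.8), (42, 87.5)])
print(overall, large_k)
```

The overall percentages are consistent with a total of 126 comparisons, which differs from the 140 results quoted on the slide, and the large-k percentages with a total of 48.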

- A new version of the GKM
- Changes the computation of starting points
- By minimizing the auxiliary cluster function
- Given tolerance
- Is more effective than GKM
- especially on large data sets

- The choice of starting points in k-means is crucial

- Advantage
- Theoretical analysis

- Drawback
- The authors should describe why each thing they choose to modify is important or needs modifying.

- Application
- GKM outperforms the k-means algorithm