K-means algorithm


  1. K-means algorithm Jelena Vukovic 53/07 jeca.zr@gmail.com

  2. Introduction • Basic idea of the k-means algorithm • Detailed explanation • Most common problems of the algorithm • Applications • Possible improvements Elektrotehnički fakultet u Beogradu

  3. Basic principles of the algorithm • Given a set of points (x1, x2, … , xn) • Partition the n points into k sets (n > k): (S1, S2, … , Sk) • The goal is to minimize the within-cluster sum of squares • µi is the mean of the points in Si
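
The goal stated on this slide is commonly written as the objective below. The notation is the standard textbook form and is an assumption here; it uses only the symbols the slide defines (xj, Si, µi, k).

```latex
\underset{S_1,\dots,S_k}{\arg\min} \; \sum_{i=1}^{k} \sum_{x_j \in S_i} \lVert x_j - \mu_i \rVert^2,
\qquad
\mu_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j
```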

  4. The algorithm • Choose the number of means (k) and initialize them • Iterate: • Assign each point to the nearest mean • Move each mean to the center of its cluster
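
The two iterated steps can be sketched in plain Python as follows. This is a minimal illustration, not the author's implementation: the function names, the `seed` and `max_iter` parameters, the choice of k random data points as initial means, and the stopping rule (stop when no assignment changes) are all assumptions.

```python
import math
import random


def nearest(point, means):
    """Return the index of the mean closest to point (Euclidean distance)."""
    return min(range(len(means)), key=lambda i: math.dist(point, means[i]))


def kmeans(points, k, seed=0, max_iter=100):
    """Cluster a list of coordinate tuples into k groups; returns (means, labels)."""
    rng = random.Random(seed)
    # Assumption: initialize the means as k randomly chosen data points.
    means = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Step 1: assign each point to the nearest mean.
        new_labels = [nearest(p, means) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Step 2: move each mean to the center of its cluster.
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:  # an empty cluster keeps its previous mean
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(kmeans(data, k=2))
```

Stopping when assignments no longer change is one common convention; a cap on iterations (`max_iter`) guards against slow convergence.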

  5. The algorithm [Figure: move means; assign points to the nearest mean]

  6. The algorithm • The complexity is O(n * k * I * d) • n – number of points • k – number of clusters • I – number of iterations • d – number of attributes [Figure: re-assign points]
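
To get a feel for the O(n * k * I * d) bound, the short sketch below multiplies the four factors for one made-up workload. The function name and the example numbers are illustrative assumptions, and the result is an order-of-magnitude count, not a measured running time.

```python
def distance_operations(n, k, iterations, d):
    """Rough operation count n * k * I * d for one k-means run."""
    return n * k * iterations * d


# Hypothetical workload: 1000 points, 3 clusters, 10 iterations, 2 attributes.
print(distance_operations(n=1000, k=3, iterations=10, d=2))  # 60000
```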

  7. The algorithm

  8. K nearest neighbors • A similarly named, distance-based algorithm, but used for classification rather than clustering • The decision is made by a simple majority vote among the k closest neighbors • In k-means the Euclidean distance measure is used
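
For contrast with k-means, the majority-vote rule of k nearest neighbors can be sketched as below. The function name `knn_classify`, the `(point, label)` data layout, and the use of Euclidean distance for the neighbor search are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_classify(query, labeled_points, k):
    """Classify query by majority vote among its k nearest labeled points.

    labeled_points is a list of (point, label) pairs.
    """
    # Sort the labeled points by Euclidean distance to the query, keep the k closest.
    neighbors = sorted(labeled_points, key=lambda pl: math.dist(query, pl[0]))[:k]
    # Count the labels among those neighbors and return the most frequent one.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    training = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"), ((5, 5), "b"), ((5, 6), "b")]
    print(knn_classify((0.2, 0.2), training, k=3))
```

Unlike k-means, this needs labeled training data, and its k is the number of neighbors consulted, not the number of clusters.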

  9. Some limitations of the algorithm • The number of clusters needs to be known in advance • The result depends on the initial positions of the means • Problems appear when clusters have different • Shapes • Sizes • Densities

  10. Initial centroids problem • Random distribution (the most common) • Multiple runs • Testing on a data sample • Analyze the data
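
The "multiple runs" remedy can be sketched as follows: run k-means several times from different random initial centroids and keep the run with the lowest within-cluster sum of squares. This is a minimal sketch under stated assumptions; the names `kmeans`, `wcss`, `best_of_runs`, and `n_runs`, and the use of the seed as the only difference between runs, are illustrative choices.

```python
import math
import random


def kmeans(points, k, seed, max_iter=100):
    """Plain k-means on coordinate tuples; returns (means, labels)."""
    rng = random.Random(seed)
    means = rng.sample(points, k)  # assumption: random data points as initial means
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(k), key=lambda i: math.dist(p, means[i])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for i in range(k):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                means[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return means, labels


def wcss(points, means, labels):
    """Within-cluster sum of squared distances to the assigned mean."""
    return sum(math.dist(p, means[lab]) ** 2 for p, lab in zip(points, labels))


def best_of_runs(points, k, n_runs=10):
    """Run k-means with seeds 0..n_runs-1 and keep the lowest-WCSS result."""
    best = None
    for seed in range(n_runs):
        means, labels = kmeans(points, k, seed)
        score = wcss(points, means, labels)
        if best is None or score < best[0]:
            best = (score, means, labels)
    return best


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0), (10, 1)]
    print(best_of_runs(data, k=3))
```

The runs are independent of one another, so they could also be executed in parallel.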

  11. Different density [Figure: original points; k-means result with 3 clusters]

  12. Non-globular shapes [Figure: original points; k-means result with 2 clusters]

  13. Pros and cons • Pros: • Simple to implement • Fast • Not highly demanding • Cons: • K needs to be known • Ellipsoid shape is assumed • Requires some knowledge about the data in advance • May run many iterations without significant changes in the clusters

  14. Applications of the algorithm • Many different uses • Computer vision • Market segmentation • Geostatistics • Astronomy • etc.

  15. Improvements • Pre-process the data in order to better estimate k • Run multiple instances in parallel with different centroid initializations • Ignore possible errors to avoid non-standard cluster shapes

  16. Thank you!
