Data Clustering Michael J. Watts mike.watts.nz

Presentation Transcript


  1. Data Clustering Michael J. Watts http://mike.watts.net.nz

  2. Lecture Outline • Data clustering • Distance measures • K-Means clustering • Hierarchical clustering

  3. Data Clustering • Groups similar items together • Assigns items to subsets (clusters) • Usually each item belongs to a single subset • Usually uses a distance measure to judge similarity • Vector quantisation • Data reduction

  4. Distance Measures • Many distance measures exist • Euclidean • Manhattan • Mahalanobis • Each useful in different ways
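
The three measures named on this slide can be sketched in a few lines of Python. This is a minimal illustration rather than the lecturer's own code: the function names are hypothetical, vectors are plain sequences of numbers, and the Mahalanobis version assumes the caller already has the inverse covariance matrix of the data (`inv_cov`, a list of rows).

```python
import math

def euclidean(a, b):
    # Straight-line distance: square root of the summed squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    # City-block distance: sum of the absolute differences per dimension.
    return sum(abs(x - y) for x, y in zip(a, b))

def mahalanobis(a, b, inv_cov):
    # sqrt(d^T * inv_cov * d), where d = a - b.
    # inv_cov is assumed to be the inverse covariance matrix of the data,
    # supplied by the caller (computing it is outside this sketch).
    d = [x - y for x, y in zip(a, b)]
    n = len(d)
    total = sum(d[i] * inv_cov[i][j] * d[j] for i in range(n) for j in range(n))
    return math.sqrt(total)
```

With the identity matrix as `inv_cov`, the Mahalanobis distance reduces to the Euclidean distance; a non-identity matrix rescales each dimension by its variance and accounts for correlation between dimensions.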

  5. k-Means Clustering • Simple clustering technique • Widely used • Requires a priori selection of the number of clusters (k) • Iterative, can be slow • Result depends on the choice of initial centres (seeds)

  6. k-Means Clustering • 1) Select k seed examples as initial centres • 2) Calculate the distance from each centre to each example • 3) Assign each example to its nearest centre • 4) Recalculate each centre as the mean of the examples assigned to it • 5) Repeat steps 2-4 until a stopping condition is reached (e.g. the assignments no longer change)
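
The steps on this slide can be sketched as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function name `kmeans` is hypothetical, the first k examples are used as seeds (the slide does not say how seeds are chosen), Euclidean distance is used, and the stopping condition is "assignments unchanged or `max_iter` reached".

```python
import math

def kmeans(examples, k, max_iter=100):
    # Step 1: take the first k examples as the initial centres (an assumption;
    # random selection is another common choice).
    centres = [list(e) for e in examples[:k]]
    assignment = None
    for _ in range(max_iter):
        # Steps 2-3: measure the distance to every centre and assign each
        # example to the index of its nearest centre.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(e, centres[c]))
            for e in examples
        ]
        # Step 5: stop once the assignments no longer change.
        if new_assignment == assignment:
            break
        assignment = new_assignment
        # Step 4: move each centre to the mean of its assigned examples.
        for c in range(k):
            members = [e for e, a in zip(examples, assignment) if a == c]
            if members:  # leave a centre in place if no example chose it
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return centres, assignment
```

For example, `kmeans([(0, 0), (10, 10), (0, 1), (10, 11)], 2)` places one centre between the two points near the origin and the other between the two points near (10, 10). Different seeds can give different final clusters, which is the seed dependence noted on the previous slide.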

  7. Hierarchical Clustering • The bottom-up form is called agglomerative hierarchical clustering • Constructs a tree structure (a dendrogram) • Close examples are joined lower down the tree • Each cluster is treated as a single entity • All clusters are eventually grouped into a super-cluster • Widely used in bioinformatics

  8. Hierarchical Clustering • 1) Calculate the distance between each pair of vectors • 2) Merge the closest two vectors / entities (the new entity has the average of its parents) • 3) Re-calculate the distances • 4) Repeat steps 1-3 until all vectors / entities are merged
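
The merge loop on this slide can be sketched as below. This is an illustrative sketch only: the function name `agglomerate` is hypothetical, Euclidean distance is assumed, at least one input vector is assumed, and the merged entity is represented by the plain average of its two parents, as the slide states. The tree is returned as nested tuples whose leaves are the indices of the input vectors.

```python
import math
from itertools import combinations

def agglomerate(vectors):
    # Each entity is (representative vector, subtree); leaves are input indices.
    entities = [(list(v), i) for i, v in enumerate(vectors)]
    while len(entities) > 1:
        # Steps 1 and 3: compare every pair of current entities and pick
        # the closest pair (distances are recomputed on each pass).
        i, j = min(
            combinations(range(len(entities)), 2),
            key=lambda p: math.dist(entities[p[0]][0], entities[p[1]][0]),
        )
        (va, ta), (vb, tb) = entities[i], entities[j]
        # Step 2: merge the pair; the new entity is the average of its parents.
        merged = ([(x + y) / 2 for x, y in zip(va, vb)], (ta, tb))
        entities = [e for n, e in enumerate(entities) if n not in (i, j)]
        entities.append(merged)
    # Step 4 ends here: one super-cluster containing every vector remains.
    return entities[0][1]
```

For four points forming two tight pairs, such as `[(0, 0), (0, 1), (10, 10), (10, 12)]`, each pair is merged first and the two resulting entities are joined last, so the closest examples sit lowest in the tree. Recomputing all pairwise distances on every pass keeps the sketch short but is slow for large data sets.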

  9. Summary • Clustering groups together similar items • Similarity is measured by distance • Many distance measures exist • k-Means groups examples into individual clusters • Hierarchical clustering merges examples into ever-larger nested groups
