
Pattern Recognition Chapter 8: Clustering Large Data Sets


Presentation Transcript


  1. Pattern Recognition, Chapter 8: Clustering Large Data Sets. First Semester 2013, Department of Computer Science, Faculty of Science, Chiang Mai University

  2. Learning Objectives • What is clustering? • k-Means Algorithm 204453: Pattern Recognition

  3. Clustering • The process of grouping a set of patterns • Clusters: a partition of a given collection of patterns into cohesive groups • Unsupervised: the patterns are unlabelled, as in clustering • Supervised: the patterns are labelled • Similar: patterns in the same cluster • Dissimilar: patterns in different clusters

  4. The input-output behavior of a clustering algorithm

  5. The input-output behavior of clustering

  6. Cluster Distance • Intra-cluster distance: small (similarity is high) • Inter-cluster distance: large (similarity is low)
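
The contrast between intra-cluster and inter-cluster distance can be checked numerically. The following is a minimal sketch, not taken from the slides: the six points, the Euclidean metric, and the helper names `mean_intra_distance` and `mean_inter_distance` are all illustrative assumptions.

```python
from itertools import combinations, product
from math import dist

# Hypothetical 2-D patterns, already grouped into two clusters (not from the slides).
cluster_a = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0)]
cluster_b = [(8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]


def mean_intra_distance(cluster):
    """Average Euclidean distance over all pairs of patterns inside one cluster."""
    pairs = list(combinations(cluster, 2))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


def mean_inter_distance(first, second):
    """Average Euclidean distance over all pairs drawn from two different clusters."""
    pairs = list(product(first, second))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


intra_a = mean_intra_distance(cluster_a)
intra_b = mean_intra_distance(cluster_b)
inter_ab = mean_inter_distance(cluster_a, cluster_b)

print(f"intra A = {intra_a:.2f}, intra B = {intra_b:.2f}, inter A-B = {inter_ab:.2f}")
```

For a cohesive partition such as this one, both intra-cluster averages come out much smaller than the inter-cluster average.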

  7. A two-dimensional data set of 10 vectors (cont.)

  8. The ij-th entry in the matrix is the distance between Xi and Xj (threshold = 5)
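
A pairwise distance matrix of this kind can be built as sketched below. The transcript does not reproduce the slide's 10 vectors or say which metric is used, so the five points and the choice of Euclidean distance are assumptions. Treating the threshold as "two patterns are close when their distance is at most 5" is one plausible reading of the slide title.

```python
from math import dist

# Hypothetical 2-D patterns; the slide's own 10 vectors are not in the transcript.
patterns = [(1.0, 1.0), (2.0, 2.0), (1.0, 3.0), (9.0, 9.0), (10.0, 8.0)]
THRESHOLD = 5.0  # value taken from the slide title

n = len(patterns)

# distance[i][j] is the Euclidean distance between pattern i and pattern j.
distance = [[dist(patterns[i], patterns[j]) for j in range(n)] for i in range(n)]

# close[i][j] is True when the two patterns lie within the threshold (assumed usage).
close = [[distance[i][j] <= THRESHOLD for j in range(n)] for i in range(n)]

for row in distance:
    print("  ".join(f"{d:5.2f}" for d in row))
```

The matrix is symmetric with zeros on the diagonal, so only the upper triangle needs to be inspected.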

  9. K-Means Algorithm STEP 1: Select k out of the given n patterns as the initial cluster centres. Assign each of the remaining n – k patterns to one of the k clusters; a pattern is assigned to its closest centre/cluster. STEP 2: Compute the cluster centres based on the current assignment of patterns.

  10. K-Means Algorithm (cont.) STEP 3: Assign each of the n patterns to its closest centre/cluster. STEP 4: If there is no change in the assignment of patterns to clusters during two successive iterations, then stop; else, go to step 2. * Selecting the initial cluster centres is a very important issue, because the final partition depends on them.
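
Steps 1 to 4 can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions rather than the course's reference implementation: the first k patterns are taken as the initial centres, distance is Euclidean, an empty cluster keeps its previous centre, and the names `closest`, `k_means`, and `max_iterations` as well as the sample points are hypothetical.

```python
from math import dist


def closest(pattern, centres):
    """Return the index of the centre nearest to the pattern (ties go to the lowest index)."""
    return min(range(len(centres)), key=lambda c: dist(pattern, centres[c]))


def k_means(patterns, k, max_iterations=100):
    # STEP 1: pick k patterns as initial centres (assumption: simply the first k),
    # then assign every pattern to its closest centre.
    centres = [tuple(p) for p in patterns[:k]]
    assignment = [closest(p, centres) for p in patterns]

    for _ in range(max_iterations):
        # STEP 2: recompute each centre as the mean of the patterns assigned to it.
        for c in range(k):
            members = [p for p, a in zip(patterns, assignment) if a == c]
            if members:  # assumption: an empty cluster keeps its old centre
                centres[c] = tuple(sum(xs) / len(members) for xs in zip(*members))

        # STEP 3: reassign all n patterns to their closest centre.
        new_assignment = [closest(p, centres) for p in patterns]

        # STEP 4: stop when the assignment did not change between iterations.
        if new_assignment == assignment:
            break
        assignment = new_assignment

    return centres, assignment


# Hypothetical data: two well-separated groups of three points each.
patterns = [(1.0, 1.0), (1.0, 2.0), (8.0, 8.0), (9.0, 8.0), (2.0, 1.0), (8.0, 9.0)]
centres, assignment = k_means(patterns, k=2)
print(centres)
print(assignment)
```

Both initial centres here fall in the same group, yet on this data the iterations still recover the two groups. With less favourable data a poor initial choice can end in a worse partition, which is why initialisation matters.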

  11. Optimal partition when A, D and F are the initial means

  12. Reference • Pattern Recognition: An Algorithmic Approach (Undergraduate Topics in Computer Science), M. Narasimha Murty and V. Susheela Devi, Springer, 2012
