Prepared by: Mahmoud Rafeek Al-Farra

College of Science & Technology • Dep. of Computer Science & IT • BSc of Information Technology • Data Mining • Chapter 6: Clustering Methods • Prepared by: Mahmoud Rafeek Al-Farra • 2013 • www.cst.ps/staff/mfarra

Course Outline • Introduction • Data Preparation and Preprocessing • Data Representation • Classification Methods • Evaluation • Clustering Methods • Mid Exam • Association Rules • Knowledge Representation • Special Case Study: Document Clustering • Discussion of Case Studies by Students

Outline • Definition of Clustering • Why clustering? • Where to use clustering? • Next: Types of Data in Cluster Analysis • Next: A Categorization of Major Clustering Methods

Definition of Clustering • Clustering is often regarded as the most important unsupervised learning technique; like every problem of this kind, it deals with finding structure in a collection of unlabeled data. • Clustering is “the process of organizing objects into groups whose members are similar in some way”. • A cluster is therefore a collection of objects that are “similar” to one another and “dissimilar” to the objects belonging to other clusters.

Definition of Clustering • Cluster: a collection of data objects • Similar to one another within the same cluster • Dissimilar to the objects in other clusters • Cluster analysis • Grouping a set of data objects into clusters • Clustering is unsupervised classification: no predefined classes
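
Example (illustrative only): the definition above, "similar within a cluster, dissimilar across clusters, no predefined classes", can be sketched in a few lines of Python. The slides do not prescribe an algorithm at this point; k-means is used here only as one familiar method, and the function name `kmeans`, the sample points, and the choice of the first k points as starting centers are all assumptions made for this sketch.

```python
import math

def kmeans(points, k, iterations=10):
    """Group 2-D points into k clusters; returns one cluster label per point."""
    # Assumption: the first k points are used as initial centers so that the
    # example is deterministic. Real uses typically pick starting centers randomly.
    centers = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return labels

# Unlabeled data: no class information is supplied, only coordinates.
points = [(1.0, 1.0), (8.0, 8.0), (1.2, 0.8), (7.9, 8.2), (0.9, 1.1), (8.1, 7.8)]
labels = kmeans(points, k=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

Points near (1, 1) receive one label and points near (8, 8) the other, although no labels were given in advance. Similarity here is Euclidean distance, which is itself an assumption; other data types call for other similarity measures.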

Learning

Why clustering? • Simplifications • Pattern detection • Useful in data concept construction • Unsupervised learning process

Where to use clustering? • Data mining • Information retrieval • Text mining • Web analysis • Marketing • Medical diagnosis

Which method should I use? • Type of attributes in the data • Scalability to larger datasets • Ability to work with irregular data • Time cost • Complexity • Data order dependency • Result presentation
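
Example (illustrative only): "data order dependency" in the list above means that some methods return different clusters when the same objects arrive in a different order. A minimal sketch follows, assuming a simple single-pass "leader"-style scheme that is not taken from the slides; the function name `leader_clustering`, the threshold value, and the sample values are hypothetical.

```python
def leader_clustering(values, threshold):
    """Single-pass grouping: each value joins the first cluster whose leader
    is within `threshold`; otherwise it starts a new cluster as its leader."""
    leaders = []   # the first member of each cluster
    clusters = []  # members of each cluster, in arrival order
    for v in values:
        for i, leader in enumerate(leaders):
            if abs(v - leader) <= threshold:
                clusters[i].append(v)
                break
        else:
            # No existing leader is close enough: open a new cluster.
            leaders.append(v)
            clusters.append([v])
    return clusters

# The same three objects, presented in two different orders.
first = leader_clustering([1.0, 2.0, 3.0], threshold=1.5)
second = leader_clustering([2.0, 1.0, 3.0], threshold=1.5)
print(first)   # [[1.0, 2.0], [3.0]]
print(second)  # [[2.0, 1.0, 3.0]]
```

With 1.0 arriving first, 3.0 is too far from the leader and forms its own cluster; with 2.0 arriving first, all three values fall within the threshold of the leader. A method with this property may need several runs over shuffled input before its result can be trusted.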

Thanks
