
Cluster Analysis




  1. Cluster Analysis • Dr. Michael R. Hyman

  2. Introduction • Also called classification analysis and numerical taxonomy • Goal: assign objects to groups so that intra-group similarity and inter-group dissimilarity are maximized • No distinction between dependent and independent variables • Find naturally occurring groupings of objects

  3. Uses in Studying Consumers • Benefit segmentation • Finding market niches • Finding homogeneous market segments for future study • Data reduction

  4. Clusters Formed by Using Data on Two Characteristics

  5. Scatter Plot of Income and Education Data for PC Owners and Non-owners

  6. Procedure #1: Divisive (tear down) • Start with profile data • Find the variable with the highest variance • Split objects into those above and those below the mean on this variable • Find the remaining variable with the highest variance and split at its mean
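
The divisive steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `divisive_split` and the parameter `n_splits` are hypothetical, and variances and means are computed once over all objects (the slide does not say whether later splits use overall or within-group statistics).

```python
from statistics import mean, pvariance

def divisive_split(profiles, n_splits):
    """Split objects at the mean of the n_splits highest-variance variables.

    profiles: list of equal-length numeric tuples, one per object.
    Returns a dict mapping a tuple of above-mean flags to object indices.
    """
    n_vars = len(profiles[0])
    columns = [[p[j] for p in profiles] for j in range(n_vars)]
    # Rank variables from highest to lowest variance (assumption: overall variance).
    order = sorted(range(n_vars), key=lambda j: pvariance(columns[j]), reverse=True)
    chosen = order[:n_splits]
    means = {j: mean(columns[j]) for j in chosen}
    groups = {}
    for i, p in enumerate(profiles):
        # One True/False flag per chosen variable: is the object above its mean?
        key = tuple(p[j] > means[j] for j in chosen)
        groups.setdefault(key, []).append(i)
    return groups

# Hypothetical data: (income in $000, years of schooling beyond high school)
profiles = [(10, 1), (12, 5), (30, 1), (32, 5)]
print(divisive_split(profiles, 1))
print(divisive_split(profiles, 2))
```

With one split the objects divide on the first variable only; with two splits each half is divided again on the second variable.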

  7. Procedure #2: Agglomerative (build up) • Select a similarity measure: distance (Euclidean, city block), correlation, or similarity • Search the similarity matrix for the most similar cluster pair and merge it • Repeat iteratively until only one cluster remains
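
A minimal sketch of the build-up loop, assuming Euclidean distance and nearest-neighbor (single) linkage; the slide leaves both choices open, and the function name `agglomerate` is hypothetical. It recomputes distances on every pass for clarity rather than maintaining a similarity matrix.

```python
from math import dist  # Euclidean distance, Python 3.8+

def agglomerate(points):
    """Merge the closest pair of clusters until one cluster remains.

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are lists of indices into points.
    """
    clusters = [[i] for i in range(len(points))]  # start: every object alone
    history = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage (assumption): distance between closest members.
                d = min(dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return history

# Hypothetical two-variable profiles
points = [(0, 0), (0, 1), (5, 5), (5, 7), (20, 20)]
for left, right, d in agglomerate(points):
    print(left, right, round(d, 2))
```

The history lists merges in order of increasing distance, which is the information a dendrogram displays.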

  8. Commonly Used Similarity Coefficients
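
The slide's table is not part of this transcript. As a stand-in, here is a sketch of the three measures named on slide 7 (Euclidean distance, city-block distance, and correlation); the function names are illustrative, and the correlation shown is the Pearson coefficient, which is an assumption about what the slide intends.

```python
from math import sqrt
from statistics import mean

def euclidean(x, y):
    """Straight-line distance between two profiles."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def city_block(x, y):
    """Sum of absolute differences (also called Manhattan distance)."""
    return sum(abs(a - b) for a, b in zip(x, y))

def correlation(x, y):
    """Pearson correlation between two profiles (higher means more similar)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)  # undefined if either profile is constant

print(euclidean((0, 0), (3, 4)))
print(city_block((0, 0), (3, 4)))
print(correlation((1, 2, 3), (2, 4, 6)))
```

Distances grow as objects become less alike, while correlation grows as they become more alike, so the two kinds of measure must be searched in opposite directions.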

  9. Procedure #2: Agglomerative Stopping Rules • Theory and practice • Distance at which clusters combine • Within/between group variance • Relative sizes of clusters
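
One way to apply the "distance at which clusters combine" rule is to stop just before the largest jump in merge distance. The slide does not spell out a procedure, so this is a common heuristic offered as an assumption; the function name is hypothetical and the input is the ordered list of merge distances from an agglomerative run.

```python
def clusters_before_largest_jump(merge_distances, n_objects):
    """Suggest a cluster count by stopping before the biggest distance jump.

    merge_distances: distances of successive merges, in merge order
    (needs at least two entries). n_objects: number of objects clustered.
    """
    jumps = [later - earlier
             for earlier, later in zip(merge_distances, merge_distances[1:])]
    biggest = max(range(len(jumps)), key=jumps.__getitem__)
    # Merges 0..biggest are accepted; each merge reduces the count by one.
    return n_objects - (biggest + 1)

# Hypothetical merge distances for five objects
print(clusters_before_largest_jump([1.0, 2.0, 6.4, 19.8], n_objects=5))
```

Here the final merge is far more costly than the ones before it, so the heuristic keeps the first three merges and suggests two clusters.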

  10. Procedure #2: Agglomerative Linkage Methods • Single (nearest neighbor): makes long, thin clusters • Complete (farthest neighbor): sensitive to outliers • Average (mean distance between objects) • Variance methods (minimum within-cluster variance) • Nodal (begin with two least similar objects as nodes)
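
The first three linkage methods differ only in how they summarize the pairwise distances between two clusters. A minimal sketch, assuming Euclidean distance and a hypothetical helper named `linkage_distance`:

```python
from math import dist  # Euclidean distance, Python 3.8+

def linkage_distance(cluster_a, cluster_b, method="single"):
    """Distance between two clusters of points under a given linkage method."""
    pairwise = [dist(p, q) for p in cluster_a for q in cluster_b]
    if method == "single":      # nearest neighbors
        return min(pairwise)
    if method == "complete":    # farthest neighbors
        return max(pairwise)
    if method == "average":     # mean over all cross-cluster pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage method: {method}")

# Hypothetical one-variable clusters
a = [(0,), (1,)]
b = [(3,), (6,)]
for method in ("single", "complete", "average"):
    print(method, linkage_distance(a, b, method))
```

Because complete linkage uses the maximum, a single extreme object can push two otherwise close clusters apart, which is why it is sensitive to outliers.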

  11. Procedure #2: Agglomerative Reliability and Validity Assessment • Use different distance measures • Use different clustering methods • Split data, run both halves, and compare • Shuffle cases (objects) • Solve with subset of profile variables
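
Comparing two solutions (for example, from two distance measures or two clustering methods) needs some measure of agreement. The slide does not name one; the Rand index below is one standard choice, shown here as an assumption. It is the share of object pairs that both solutions treat the same way, either together in both or apart in both.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Fraction of object pairs on which two cluster solutions agree.

    labels_a, labels_b: cluster labels for the same objects, in the same order.
    """
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

# Same grouping with different label names counts as full agreement
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The index ignores the label names themselves, so it can compare solutions whose clusters are numbered differently.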

  12. General Problems • Early assignments treated as permanent, which precludes later revision for improved fit • Number of clusters: more clusters mean greater intra-group homogeneity but less descriptive power • No good measure of cluster compactness • Lack of statistical properties makes inference difficult

  13. General Problems (cont.) • Coping with inter-correlated profile variables • Must select profile variables that can discriminate among objects • Sensitive to unit of measurement and outliers (fix: standardize data and delete outliers) • Subjective interpretation of results (i.e., naming clusters)
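
Standardizing means rescaling each profile variable to mean 0 and standard deviation 1 so that no variable dominates the distance calculation because of its units. A minimal sketch, assuming z-scores based on the population standard deviation and a hypothetical helper named `standardize`:

```python
from statistics import mean, pstdev

def standardize(profiles):
    """Convert each variable (column) to z-scores.

    profiles: list of equal-length numeric tuples, one per object.
    Assumes no variable is constant (a zero standard deviation would
    cause a division by zero).
    """
    columns = list(zip(*profiles))
    means = [mean(c) for c in columns]
    sds = [pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, sds))
        for row in profiles
    ]

# Hypothetical data: second variable is on a scale 100 times larger
print(standardize([(1, 100), (3, 300)]))
```

After standardizing, both variables contribute equally to Euclidean or city-block distances even though their raw scales differ by a factor of 100.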

  14. Steps for Conducting a Cluster Analysis: A Summary
