This presentation is the property of its rightful owner.
1 / 39

# Cluster analysis PowerPoint PPT Presentation

Cluster analysis. Partition Methods Divide data into disjoint clusters Hierarchical Methods Build a hierarchy of the observations and deduce the clusters from it. K-means. Criteria. Same criteria with multivariate data:. Justifying the criteria. Anova: decomposition of the variance.

Cluster analysis

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

## Cluster analysis

• Partition Methods

Divide data into disjoint clusters

• Hierarchical Methods

Build a hierarchy of the observations and deduce the clusters from it.

### Justifying the criteria

• Anova: decomposition of the variance.

Univariate:

SST=SSW+SSB

Multivariate:

Minimizing the withing clusters variance is equivalent to maximize the between clusters variance (the difference between clusters).

### Number of clusters

Consequences of standardization

### Problems of k-means

• Very sensitive to outliers

• Euclidean distances not appropriate for eliptical clusters

• It does not give the number of clusters.

### Problems of hierarchical cluster

• If n is large, slow. Each time n(n-1)/2 comparisons.

• Euclidean distances not always appropriate

• If n is large, dendogram difficult to interpret