Cluster analysis

• Partition Methods

Divide data into disjoint clusters

• Hierarchical Methods

Build a hierarchy of the observations and deduce the clusters from it.

### Justifying the criteria

• ANOVA: decomposition of the variance.

Univariate:

SST = SSW + SSB (total sum of squares = within-cluster sum of squares + between-cluster sum of squares)

Multivariate:

Because the total variance is fixed for a given data set, minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the difference between clusters).
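The univariate decomposition can be checked numerically. The sketch below is only an illustration: the data values, the two labelings, and the helper name `variance_decomposition` are made up for this example and are not part of the notes.

```python
# Toy univariate data (illustrative values only).
values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
good_labels = [0, 0, 0, 1, 1, 1]   # separates the two obvious groups
bad_labels = [0, 1, 0, 1, 0, 1]    # mixes the two groups


def mean(xs):
    return sum(xs) / len(xs)


def variance_decomposition(values, labels):
    """Return (SST, SSW, SSB) for one variable and a given cluster assignment."""
    grand_mean = mean(values)
    sst = sum((x - grand_mean) ** 2 for x in values)
    ssw = 0.0
    ssb = 0.0
    for g in set(labels):
        members = [x for x, lab in zip(values, labels) if lab == g]
        centre = mean(members)
        # Within: spread of the members around their own cluster mean.
        ssw += sum((x - centre) ** 2 for x in members)
        # Between: spread of the cluster mean around the grand mean,
        # weighted by cluster size.
        ssb += len(members) * (centre - grand_mean) ** 2
    return sst, ssw, ssb


sst_good, ssw_good, ssb_good = variance_decomposition(values, good_labels)
sst_bad, ssw_bad, ssb_bad = variance_decomposition(values, bad_labels)

for name, (sst, ssw, ssb) in {
    "good": (sst_good, ssw_good, ssb_good),
    "bad": (sst_bad, ssw_bad, ssb_bad),
}.items():
    print(f"{name}: SST={sst:.2f}  SSW={ssw:.2f}  SSB={ssb:.2f}")
```

SST is the same for both labelings, so the labeling with the smaller SSW necessarily has the larger SSB.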

### Number of clusters

Consequences of standardization

### Problems of k-means

• Very sensitive to outliers

• Euclidean distances not appropriate for elliptical clusters

• It does not give the number of clusters.
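The sensitivity to outliers follows from the fact that each k-means centre is the mean of its cluster. A minimal sketch, using made-up numbers and showing the median only as a point of comparison (it is not part of k-means):

```python
from statistics import mean, median

# One tight cluster (illustrative values only).
cluster = [4.8, 5.0, 5.1, 4.9, 5.2]
# The same cluster after a single extreme observation is assigned to it.
with_outlier = cluster + [50.0]

centroid_clean = mean(cluster)
centroid_outlier = mean(with_outlier)

# The median barely moves, which shows how much of the shift
# is caused by averaging.
median_clean = median(cluster)
median_outlier = median(with_outlier)

print("mean without / with outlier:  ", centroid_clean, centroid_outlier)
print("median without / with outlier:", median_clean, median_outlier)
```

A single point moves the centre far outside the range of the original five observations, which in turn can change the cluster assignment of other points in the next iteration.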

### Problems of hierarchical clustering

• If n is large, it is slow: each merge step compares up to n(n-1)/2 pairwise distances.

• Euclidean distances not always appropriate

• If n is large, the dendrogram is difficult to interpret
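To get a feel for the n(n-1)/2 cost, the sketch below builds the pairwise Euclidean distances for a few made-up points and then prints how the number of pairs grows with n. The function names `pairwise_distances` and `n_pairs` are hypothetical helpers written for this illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def pairwise_distances(points):
    """Distance for every unordered pair (i, j) with i < j."""
    return {
        (i, j): dist(points[i], points[j])
        for i, j in combinations(range(len(points)), 2)
    }


def n_pairs(n):
    """Number of unordered pairs among n observations."""
    return n * (n - 1) // 2


# Four illustrative points in the plane.
points = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0), (0.0, 1.0)]
distances = pairwise_distances(points)
print(len(distances), "distances for", len(points), "points")

# The count grows quadratically with the number of observations.
for size in (10, 1_000, 100_000):
    print(size, "observations ->", n_pairs(size), "pairs")
```

With 100,000 observations there are already about five billion pairs, which is why the full distance matrix becomes impractical for large n.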