Cluster analysis


Summary: Partition methods divide data into disjoint clusters; hierarchical methods build a hierarchy of the observations and deduce the clusters from it. Topics covered include K-means, its criteria for univariate and multivariate data, and the justification of those criteria through the ANOVA decomposition of the variance.


Presentation Transcript


Cluster analysis

  • Partition Methods

    Divide data into disjoint clusters

  • Hierarchical Methods

    Build a hierarchy of the observations and deduce the clusters from it.


K-means


Criteria


Same criteria with multivariate data:


Justifying the criteria

  • ANOVA: decomposition of the variance.

    Univariate:

    SST = SSW + SSB

    Multivariate:

    Because the total variance is fixed for a given data set, minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the difference between clusters).
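
The formulas on this slide did not survive in the transcript. As a sketch, the standard decomposition can be written as follows. The notation is an assumption made here, not taken from the slides: $G$ clusters, $n_g$ observations in cluster $g$, cluster means $\bar{x}_g$, and overall mean $\bar{x}$.

```latex
% Univariate: SST = SSW + SSB
\sum_{g=1}^{G}\sum_{i=1}^{n_g} (x_{ig}-\bar{x})^2
  = \sum_{g=1}^{G}\sum_{i=1}^{n_g} (x_{ig}-\bar{x}_g)^2
  + \sum_{g=1}^{G} n_g\,(\bar{x}_g-\bar{x})^2

% Multivariate: T = W + B (sums of squares become scatter matrices)
\sum_{g=1}^{G}\sum_{i=1}^{n_g} (\mathbf{x}_{ig}-\bar{\mathbf{x}})(\mathbf{x}_{ig}-\bar{\mathbf{x}})^{\top}
  = \sum_{g=1}^{G}\sum_{i=1}^{n_g} (\mathbf{x}_{ig}-\bar{\mathbf{x}}_g)(\mathbf{x}_{ig}-\bar{\mathbf{x}}_g)^{\top}
  + \sum_{g=1}^{G} n_g\,(\bar{\mathbf{x}}_g-\bar{\mathbf{x}})(\bar{\mathbf{x}}_g-\bar{\mathbf{x}})^{\top}
```

The left-hand side does not depend on how the observations are assigned to clusters, so any assignment that lowers the within term raises the between term by the same amount.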


K-means algorithm
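
The slide body for the algorithm is missing from this transcript. A minimal sketch of the usual alternating scheme (assign each observation to its nearest center, then recompute each center as the mean of its cluster) might look like the following. The function name `kmeans`, the `init_centers` parameter, and the toy data are illustrative assumptions, not the presentation's own code.

```python
def kmeans(points, init_centers, iters=100):
    """Sketch of k-means on a list of equal-length tuples.

    init_centers is supplied by the caller (an assumption made here so that
    the example is deterministic); in practice it is often chosen at random.
    """
    centers = [tuple(c) for c in init_centers]
    k = len(centers)
    labels = None
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old center if a cluster becomes empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers


points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, init_centers=[points[0], points[3]])
print(labels)
```

Each pass can only keep or lower the within-cluster sum of squares, which is why the loop stops once the assignments stabilize. The result still depends on the starting centers.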


Number of clusters


Consequences of standardization
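
The slide content is not in the transcript, but the issue can be illustrated with a short sketch: standardizing each variable to mean 0 and standard deviation 1 changes which variables dominate Euclidean distances, and therefore can change the clusters. The helper name `standardize` and the toy age/income rows are assumptions for illustration only.

```python
from math import dist
from statistics import mean, pstdev


def standardize(rows):
    """Z-score each column; assumes no column is constant (nonzero std dev)."""
    cols = list(zip(*rows))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) for c in cols]
    return [tuple((x - m) / s for x, m, s in zip(r, mus, sds)) for r in rows]


# Hypothetical rows: (age in years, income in currency units).
rows = [(30, 20000.0), (40, 50000.0), (50, 80000.0)]
z = standardize(rows)

# Before: income dominates the distance. After: both variables weigh equally.
print(dist(rows[0], rows[1]), dist(z[0], z[1]))
```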


Ruspini example


Problems of k-means

  • It is very sensitive to outliers.

  • Euclidean distances are not appropriate for elliptical clusters.

  • It does not give the number of clusters.


Hierarchical algorithms


Agglomerative algorithms
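
As a rough sketch of the agglomerative idea: start with every observation in its own cluster and repeatedly merge the two closest clusters until one remains. The function name `agglomerate` and the `linkage` parameter are assumptions made for this example. Passing `min` gives a nearest-neighbour rule and `max` a farthest-neighbour rule.

```python
from math import dist


def agglomerate(points, linkage=min):
    """Naive agglomerative clustering; returns the merge history.

    Each entry is (cluster_a, cluster_b, distance), with clusters given as
    lists of indices into `points`.
    """
    clusters = [[i] for i in range(len(points))]
    merges = []
    while len(clusters) > 1:
        best = None
        # Compare every pair of current clusters.
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(dist(points[p], points[q])
                            for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges


points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.5)]
for a, b, d in agglomerate(points):
    print(a, b, round(d, 3))
```

The merge history, with the distance at which each merge happens, is the information a tree diagram of the hierarchy displays.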


Nearest neighbour distance


Farthest neighbour distance


Average distance
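
The nearest neighbour, farthest neighbour, and average distances between two clusters can be sketched as the minimum, maximum, and mean of all pairwise distances between their members. The function names are illustrative, and Euclidean distance is assumed as the underlying metric.

```python
from itertools import product
from math import dist


def nearest_neighbour(a, b):
    """Smallest pairwise distance between clusters a and b."""
    return min(dist(p, q) for p, q in product(a, b))


def farthest_neighbour(a, b):
    """Largest pairwise distance between clusters a and b."""
    return max(dist(p, q) for p, q in product(a, b))


def average_distance(a, b):
    """Mean of all pairwise distances between clusters a and b."""
    return sum(dist(p, q) for p, q in product(a, b)) / (len(a) * len(b))


a = [(0.0, 0.0), (0.0, 1.0)]
b = [(3.0, 0.0), (4.0, 0.0)]
print(nearest_neighbour(a, b), average_distance(a, b), farthest_neighbour(a, b))
```

For any pair of clusters the average lies between the other two values.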


Centroid method distance


Ward’s method distance
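
As a sketch under the usual definitions (the slide formulas are not in the transcript): the centroid method measures the distance between cluster means, while Ward's method measures how much the within-cluster sum of squares would grow if the two clusters were merged. That increase equals n_a n_b / (n_a + n_b) times the squared distance between the centroids. All function names below are assumptions for illustration.

```python
def centroid(cluster):
    """Mean vector of a cluster given as a list of tuples."""
    return tuple(sum(c) / len(cluster) for c in zip(*cluster))


def ssw(cluster):
    """Sum of squared distances from each point to the cluster centroid."""
    m = centroid(cluster)
    return sum(sum((x - mu) ** 2 for x, mu in zip(p, m)) for p in cluster)


def centroid_distance_sq(a, b):
    """Squared Euclidean distance between the two centroids."""
    return sum((x - y) ** 2 for x, y in zip(centroid(a), centroid(b)))


def ward_distance(a, b):
    """Increase in within-cluster sum of squares caused by merging a and b."""
    na, nb = len(a), len(b)
    return na * nb / (na + nb) * centroid_distance_sq(a, b)


a = [(0.0, 0.0), (0.0, 2.0)]
b = [(4.0, 1.0)]
print(centroid_distance_sq(a, b), ward_distance(a, b))
```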


Dendrograms


Example


Problems of hierarchical clustering

  • If n is large, it is slow: each merge step compares up to n(n-1)/2 pairs.

  • Euclidean distances are not always appropriate.

  • If n is large, the dendrogram is difficult to interpret.


Clustering by variables


Distances between quantitative variables
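
The measure used on this slide is not preserved in the transcript. One common choice, shown here purely as an assumption, derives a distance from the Pearson correlation r, for example 1 - r², so that strongly related variables (positively or negatively) are close together. The function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def variable_distance(x, y):
    """Assumed distance between variables: 1 - r^2, in [0, 1]."""
    return 1.0 - pearson(x, y) ** 2


x = [1.0, 2.0, 3.0, 4.0]
print(variable_distance(x, [2.0, 4.0, 6.0, 8.0]))    # linearly related
print(variable_distance(x, [1.0, -1.0, -1.0, 1.0]))  # uncorrelated
```

With a matrix of such distances between variables, the same hierarchical algorithms used for observations can be applied to the variables instead.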


Distances between qualitative variables


Similarity between attributes

