
# Classification: Cluster Analysis and Related Techniques

Classification: Cluster Analysis and Related Techniques. Tanya, Caroline, Nick. Introduction to Classification. Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together


### Classification: Cluster Analysis and Related Techniques

Tanya, Caroline, Nick

• Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together

• Like ordination, classification helps researchers explore data and generate hypotheses

• Ordination techniques vs. Classification techniques

• What is a cluster?

• No formal rule exists for identifying clusters → it is subjective; you make the call

• Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters → results are shown as dendrograms

• Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered

• Non-hierarchical techniques divide data into clusters without looking at relationships between clusters

Hierarchical Techniques

• Monothetic vs. Polythetic

• Monothetic imposes classifications based on the presence or absence of one attribute at a time

• Association analysis

• Polythetic uses all information within data

• Most common modern approach

• Cluster analysis

• TWINSPAN
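A minimal sketch of one monothetic division, assuming a hypothetical presence/absence table (the site and species names are invented for illustration). It splits the observations on a single attribute, which is the core idea of the monothetic approach; how the dividing attribute is chosen is left out here.

```python
# Hypothetical presence/absence data: 1 = species present at the site.
sites = {
    "site_a": {"heather": 1, "bracken": 0},
    "site_b": {"heather": 1, "bracken": 1},
    "site_c": {"heather": 0, "bracken": 1},
    "site_d": {"heather": 0, "bracken": 0},
}


def monothetic_split(table, attribute):
    """Divide observations into two groups using one attribute only."""
    present = sorted(name for name, row in table.items() if row[attribute] == 1)
    absent = sorted(name for name, row in table.items() if row[attribute] == 0)
    return present, absent


with_heather, without_heather = monothetic_split(sites, "heather")
print(with_heather)     # ['site_a', 'site_b']
print(without_heather)  # ['site_c', 'site_d']
```

Each resulting group could be split again on another attribute, giving a hierarchy built from one attribute at a time. A polythetic method would instead compare sites using all species at once.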

• Many procedures and algorithms may be used to create a valid dendrogram

• Similar in technique to Bray-Curtis Ordination

• Procedure:

• Square matrix of dissimilarities → find the lowest distance in the matrix → identify the pair that generated it → fuse the two observations together (first cluster)
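The first fusion in the procedure above can be sketched as follows. The dissimilarity values and the labels A to D are hypothetical, chosen only to illustrate the search for the smallest off-diagonal entry.

```python
# Hypothetical symmetric dissimilarity matrix for four observations.
labels = ["A", "B", "C", "D"]
dissimilarity = [
    [0.0, 0.7, 0.2, 0.9],
    [0.7, 0.0, 0.6, 0.4],
    [0.2, 0.6, 0.0, 0.8],
    [0.9, 0.4, 0.8, 0.0],
]


def closest_pair(matrix):
    """Return (distance, i, j) for the smallest off-diagonal entry."""
    best = None
    n = len(matrix)
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only: the matrix is symmetric
            if best is None or matrix[i][j] < best[0]:
                best = (matrix[i][j], i, j)
    return best


distance, i, j = closest_pair(dissimilarity)
first_cluster = {labels[i], labels[j]}
print(f"Fuse {labels[i]} and {labels[j]} at dissimilarity {distance}")
```

Here A and C form the first cluster. What happens next depends on how the distance from this new cluster to the remaining observations is defined, which is where the linkage rules differ.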

• Single-link clustering (AKA nearest-neighbor clustering)

• Clusters are defined by fusing the individual pairs with the smallest distance

• Chaining: two individuals end up in the same cluster despite a large dissimilarity → occurs when they are linked by a series of closely connected intermediate points

• Constituent clusters may grow gradually, with each fusion adding one or a small number of elements → results are inconclusive and hard to interpret
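Chaining can be reproduced with a small sketch of single-link clustering. This assumes hypothetical one-dimensional data with absolute difference as the dissimilarity; the function name `single_link` is illustrative and not taken from any package.

```python
from itertools import combinations


def single_link(points, n_clusters):
    """Fuse 1-D points by nearest-neighbor linkage until n_clusters remain."""
    clusters = [[p] for p in points]
    history = []  # the merged cluster produced by each fusion, in order
    while len(clusters) > n_clusters:
        # Single link: cluster distance = smallest distance between any two members.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: min(
                abs(a - b) for a in clusters[pair[0]] for b in clusters[pair[1]]
            ),
        )
        merged = sorted(clusters[i] + clusters[j])
        history.append(merged)
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters, history


# Hypothetical data: five points with small gaps, then one distant point.
points = [0.0, 1.0, 2.1, 3.3, 4.6, 10.0]
clusters, history = single_link(points, n_clusters=2)
for step in history:
    print(step)
```

Each fusion adds a single point to the same growing cluster, so 0.0 and 4.6 end up together even though they are 4.6 units apart. They are joined only through the closely spaced points between them.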

• Complete-link clustering (AKA furthest-neighbor clustering): the distance between two clusters is set by their two members separated by the greatest distance

• Exact opposite of Single Link

• May end up separating individuals that are very similar
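The furthest-neighbor behavior described above (commonly called complete-link clustering) can be sketched in the same way. The one-dimensional data and the function name `complete_link` are hypothetical, and absolute difference is assumed as the dissimilarity.

```python
from itertools import combinations


def complete_link(points, n_clusters):
    """Fuse 1-D points by furthest-neighbor linkage until n_clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Complete link: cluster distance = largest distance between any two members.
        # The pair of clusters with the smallest such distance is fused.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: max(
                abs(a - b) for a in clusters[pair[0]] for b in clusters[pair[1]]
            ),
        )
        merged = sorted(clusters[i] + clusters[j])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return sorted(clusters)


# Hypothetical data: five points with small gaps, then one distant point.
points = [0.0, 1.0, 2.1, 3.3, 4.6, 10.0]
result = complete_link(points, n_clusters=3)
print(result)
```

With three clusters requested, 1.0 and 2.1 fall into different clusters although they are only 1.1 apart, while 3.3 and 4.6 (1.3 apart) share a cluster. This is one way very similar individuals can end up separated.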

• Minimum Variance Clustering (Ward’s technique)

• Intermediate between single-link and complete-link clustering
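Ward's technique fuses, at each step, the pair of clusters whose fusion increases the total within-cluster sum of squares the least. A minimal sketch of that criterion, assuming hypothetical one-dimensional clusters (the helper names are illustrative):

```python
from itertools import combinations


def sum_of_squares(cluster):
    """Within-cluster sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def ward_cost(a, b):
    """Increase in total within-cluster sum of squares if a and b are fused."""
    return sum_of_squares(a + b) - sum_of_squares(a) - sum_of_squares(b)


# Hypothetical current clusters partway through an analysis.
clusters = [[1.0, 2.0], [4.0], [9.0, 10.0]]

# Evaluate every candidate fusion and pick the cheapest one.
best_pair = min(
    combinations(range(len(clusters)), 2),
    key=lambda pair: ward_cost(clusters[pair[0]], clusters[pair[1]]),
)
for i, j in combinations(range(len(clusters)), 2):
    print(clusters[i], clusters[j], round(ward_cost(clusters[i], clusters[j]), 3))
print("Fuse:", clusters[best_pair[0]], "and", clusters[best_pair[1]])
```

The cost depends on both the distance between cluster means and the cluster sizes, so the criterion does not rely on a single nearest or furthest pair of members.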

• There are NO objective rules for interpreting dendrograms

• Use dendrogram for Hypothesis Formation → look for divisions that coincide with existing knowledge about the data → Metadata (Chapter 1)

• Complementary Analysis

• Takes an entire dataset and divides it into categories

• As always, the boundaries for these categories are subjective

• On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn’t tell us

• TWINSPAN: acronym for Two-Way INdicator SPecies ANalysis

• Polythetic divisive classification technique

• Output is in two-way tables

• There are two ordered lists, one for species and one for observations

• There are two dendrograms, one to classify species, and one to classify observations

• Pseudospecies are constructs that convert continuous distributions into discrete presence/absence data
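The pseudospecies idea can be sketched as a thresholding step. The cut levels, the species names, and the `>=` rule below are assumptions made for illustration; the thresholds and exact convention used in a real TWINSPAN run may differ.

```python
# Hypothetical percent-cover thresholds (an assumption, not program defaults).
CUT_LEVELS = (1, 5, 25)


def pseudospecies(abundances, cut_levels=CUT_LEVELS):
    """Expand {species: abundance} into {pseudospecies name: 0 or 1}."""
    result = {}
    for species, abundance in abundances.items():
        for level_number, threshold in enumerate(cut_levels, start=1):
            # A pseudospecies is "present" when abundance reaches its threshold.
            result[f"{species}_{level_number}"] = int(abundance >= threshold)
    return result


# Hypothetical percent cover recorded in one plot.
plot = {"heather": 30, "bracken": 4, "moss": 0}
table = pseudospecies(plot)
for name, present in table.items():
    print(name, present)
```

One continuous abundance value becomes several presence/absence attributes, so a method built on presence/absence data can still use some of the information about how abundant a species is.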

1) What is the difference between hierarchical and non-hierarchical classification techniques?

2) Define cluster.

3) T/F: There can be only one valid dendrogram for a single data set. (Correct the statement if false.)

**********Bonus**********

What is the background of the PowerPoint supposed to represent?