Classification: Cluster Analysis and Related Techniques

Tanya, Caroline, Nick

Introduction to Classification

  • Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together

  • Like ordination, classification helps researchers explore data and generate hypotheses

    • Ordination techniques vs. Classification techniques

Objective ??

  • What is a cluster?

  • No formal rule exists for identifying clusters → it is subjective; you make the call

Hierarchical vs. Non-Hierarchical

  • Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters → create dendrograms

    • Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered

  • Non-hierarchical techniques divide data into clusters without looking at relationships between clusters

Hierarchical Techniques

  • Monothetic vs. Polythetic

    • Monothetic imposes classifications based on the presence or absence of one attribute at a time

      • Association analysis

    • Polythetic uses all information within data

      • Most common modern approach

      • Cluster analysis

      • TWINSPAN
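
The monothetic/polythetic distinction can be illustrated with a minimal Python sketch. The site names, species, and the simple-matching dissimilarity below are assumptions made for illustration; they are not taken from the presentation, and association analysis itself uses a more elaborate criterion for choosing the dividing attribute.

```python
# Hypothetical presence/absence data: 1 = species present at the site.
sites = {
    "site1": {"oak": 1, "fern": 1, "moss": 0},
    "site2": {"oak": 1, "fern": 0, "moss": 0},
    "site3": {"oak": 0, "fern": 1, "moss": 1},
    "site4": {"oak": 0, "fern": 0, "moss": 1},
}


def monothetic_split(sites, attribute):
    """Divide sites using the presence or absence of ONE attribute."""
    present = sorted(name for name, attrs in sites.items() if attrs[attribute] == 1)
    absent = sorted(name for name, attrs in sites.items() if attrs[attribute] == 0)
    return present, absent


def polythetic_dissimilarity(a, b):
    """Compare two sites using ALL attributes: the fraction that differ."""
    differing = sum(1 for key in a if a[key] != b[key])
    return differing / len(a)


groups = monothetic_split(sites, "oak")
d12 = polythetic_dissimilarity(sites["site1"], sites["site2"])
d14 = polythetic_dissimilarity(sites["site1"], sites["site4"])

print(groups)  # (['site1', 'site2'], ['site3', 'site4'])
print(d14)     # 1.0 (the two sites differ on every species)
```

The monothetic split looks only at "oak", while the polythetic measure draws on every species at once, which is the information a cluster analysis would work from.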

Cluster Analysis

  • Many procedures and algorithms may be used to create a valid dendrogram

  • Similar in technique to Bray-Curtis Ordination

  • Procedure:

    • Square Matrix of Dissimilarities → Find lowest distance in matrix → Identify pair that generated this → Fuse two observations together (First Cluster)
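
The first fusion step of the procedure above can be sketched in a few lines of Python. The labels and dissimilarity values are made up for illustration, and the helper name `find_closest_pair` is hypothetical.

```python
def find_closest_pair(dist):
    """Return (i, j, d) for the smallest off-diagonal entry of a
    square dissimilarity matrix given as a list of lists."""
    n = len(dist)
    best = None
    for i in range(n):
        for j in range(i + 1, n):  # upper triangle only; matrix is symmetric
            if best is None or dist[i][j] < best[2]:
                best = (i, j, dist[i][j])
    return best


# Hypothetical dissimilarities between four observations A-D.
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.6, 0.2, 0.9],
    [0.6, 0.0, 0.7, 0.4],
    [0.2, 0.7, 0.0, 0.8],
    [0.9, 0.4, 0.8, 0.0],
]

i, j, d = find_closest_pair(dist)
first_cluster = (labels[i], labels[j])
print(first_cluster, d)  # ('A', 'C') 0.2
```

What happens after this first fusion depends on the rule used to measure distance to the new cluster.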

Rules for Cluster Formation

  • Single-link clustering (AKA nearest-neighbor clustering)

    • Clusters are defined by fusing the individual pairs with the smallest distance

    • Chaining: two individuals end up in the same cluster despite having a large dissimilarity → occurs if they are linked by a series of closely connected points

    • Constituent clusters may increase in size gradually, with each fusion adding one or a small number of elements → inconclusive and hard to interpret
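
Chaining can be seen in a small Python sketch of single-link clustering. The one-dimensional points are hypothetical and chosen so that the gaps between neighbors grow slowly; the function name is an assumption, not something from the presentation.

```python
def single_link_clusters(points, n_clusters):
    """Fuse clusters by nearest-neighbor distance until n_clusters remain.
    Returns the final clusters and a history of (distance, fused cluster)."""
    clusters = [[p] for p in points]
    history = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single link: distance between the closest pair of members.
                d = min(abs(x - y) for x in clusters[a] for y in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
        history.append((d, list(clusters[a])))
    return clusters, history


# Hypothetical observations along one axis; neighbor gaps are 10, 11, 12, 13, 44.
points = [0, 10, 21, 33, 46, 90]
clusters, history = single_link_clusters(points, 2)

print(clusters)  # [[0, 10, 21, 33, 46], [90]]
for d, fused in history:
    print(d, fused)  # the same cluster grows by one element per fusion
```

Here 0 and 46 end up in the same cluster although they are 46 units apart, which is farther than 46 is from the excluded point 90 (44 units). They are joined only through the chain of close neighbors between them.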

Other Rules

  • Complete-Link Clustering

    • Judges the distance between two clusters by their most distant pair of members, and fuses the clusters for which this greatest distance is smallest

    • Exact opposite of Single Link

    • May end up separating individuals that are very similar

  • Minimum Variance Clustering (Ward’s technique)

    • Intermediate between single-link and complete-link
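
To compare the fusion rules side by side, here is a rough Python sketch that runs the same agglomerative loop with three different distance rules. The data are hypothetical one-dimensional values, and the Ward rule is written in its usual form as the increase in within-cluster sum of squares caused by a fusion; this is an illustrative sketch rather than the implementation any particular package uses.

```python
def single_link(a, b):
    """Distance between the closest pair of members."""
    return min(abs(x - y) for x in a for y in b)


def complete_link(a, b):
    """Distance between the most distant pair of members."""
    return max(abs(x - y) for x in a for y in b)


def ward(a, b):
    """Increase in within-cluster sum of squares if a and b are fused."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return len(a) * len(b) / (len(a) + len(b)) * (mean_a - mean_b) ** 2


def agglomerate(points, n_clusters, rule):
    """Repeatedly fuse the two clusters with the smallest rule value."""
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = rule(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Hypothetical observations along one axis.
points = [0, 10, 21, 33, 46, 90]
by_single = agglomerate(points, 3, single_link)
by_complete = agglomerate(points, 3, complete_link)
by_ward = agglomerate(points, 3, ward)

print(by_single)    # [[0, 10, 21, 33], [46], [90]]
print(by_complete)  # [[0, 10], [21, 33, 46], [90]]
print(by_ward)      # [[0, 10], [21, 33, 46], [90]]
```

On this data set, complete link separates 10 and 21 (11 apart) while grouping 21 with 46 (25 apart), showing how it can split similar individuals. Ward's rule happens to agree with complete link here; on other data the three rules can give three different dendrograms.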


  • There are NO objective rules for interpreting dendrograms

  • Use dendrogram for Hypothesis Formation → look for divisions that coincide with existing knowledge about the data → Metadata (Chapter 1)

  • Complementary Analysis

Divisive Classification Techniques

  • Takes an entire dataset and divides it into categories

  • As always, the boundaries for these categories are subjective

  • On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn’t tell us


TWINSPAN

  • Acronym for Two-Way Indicator Species Analysis

  • Polythetic divisive classification technique

  • Output is in two-way tables

TWINSPAN Tables

  • There are two ordered lists, one for species and one for observations

  • There are two dendrograms, one to classify species, and one to classify observations

  • Pseudospecies are constructs that convert continuous distributions into discrete presence/absence values
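
As a minimal sketch of the pseudospecies idea, the Python below turns one continuous abundance value into a set of presence/absence flags, one per cut level. The cut levels, the abundance values, and the choice of a strict "greater than" comparison are all assumptions for illustration; the presentation does not specify them.

```python
def pseudospecies(abundance, cut_levels):
    """Return one presence/absence flag per cut level.
    Flag k is 1 when the abundance exceeds cut_levels[k] (assumed convention)."""
    return [1 if abundance > cut else 0 for cut in cut_levels]


# Illustrative cut levels (e.g. percent cover); not taken from the slides.
cut_levels = [0, 2, 5, 10, 20]

absent = pseudospecies(0, cut_levels)
moderate = pseudospecies(7, cut_levels)
dominant = pseudospecies(25, cut_levels)

print(absent)    # [0, 0, 0, 0, 0]
print(moderate)  # [1, 1, 1, 0, 0]
print(dominant)  # [1, 1, 1, 1, 1]
```

Each flag can then be treated as a separate presence/absence "species", so a technique that works on discrete data still retains some information about abundance.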


1) What is the difference between hierarchical and non-hierarchical classification techniques?

2) Define "cluster."

3) T/F: There can be only one valid dendrogram for a single data set. (Correct the statement if false.)


What is the background of the PowerPoint supposed to represent?