Classification: Cluster Analysis and Related Techniques

Tanya, Caroline, Nick


Introduction to Classification

  • Search for divisions within data → identify groups of individuals with similar characteristics and cluster them together

  • Like ordination, classification helps researchers explore data and generate hypotheses

    • Ordination techniques vs. Classification techniques


Objective?

  • What is a cluster?

  • No formal rule exists for identifying clusters → it is subjective; you make the call


Hierarchical vs. Non-Hierarchical

  • Hierarchical techniques divide data into clusters and look for relationships between them to create higher-order clusters → produce dendrograms

    • Dendrograms subdivide a set of individuals into progressively smaller clusters until a stopping condition is encountered

  • Non-hierarchical techniques divide data into clusters without looking at relationships between clusters



Hierarchical Techniques

  • Monothetic vs. Polythetic

    • Monothetic imposes classifications based on the presence or absence of one attribute at a time

      • Association analysis

    • Polythetic uses all the information within the data

      • Most common modern approach

      • Cluster analysis

      • TWINSPAN
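
A minimal sketch of a single monothetic division, assuming made-up presence/absence records. The site names, species names, and the helper `monothetic_split` are hypothetical, and the dividing attribute is picked by hand here; how a method such as association analysis selects that attribute is outside this sketch.

```python
# Hypothetical presence/absence records: site -> set of species present.
records = {
    "site1": {"oak", "fern"},
    "site2": {"oak"},
    "site3": {"moss", "fern"},
    "site4": {"moss"},
}


def monothetic_split(records, attribute):
    """Divide sites into two groups by presence or absence of ONE attribute."""
    present = sorted(s for s, species in records.items() if attribute in species)
    absent = sorted(s for s, species in records.items() if attribute not in species)
    return present, absent


# The attribute "oak" is chosen by hand for illustration only.
with_oak, without_oak = monothetic_split(records, "oak")
print("oak present:", with_oak)
print("oak absent: ", without_oak)
```

A polythetic method would instead compare the full species lists of the sites rather than looking at one attribute at a time.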


Cluster Analysis

  • Many procedures and algorithms may be used to create a valid dendrogram

  • Similar in technique to Bray-Curtis Ordination

  • Procedure:

    • Square Matrix of Dissimilarities → Find lowest distance in matrix → Identify pair that generated this → Fuse two observations together (First Cluster)
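
The procedure above can be sketched roughly as follows. The site data are invented, and Euclidean distance is only a stand-in for whichever dissimilarity measure is actually used; the names `sites`, `euclidean`, and `dissim` are assumptions for illustration.

```python
from itertools import combinations

# Hypothetical observations: three made-up abundance values per site.
sites = {
    "A": [1.0, 0.0, 3.0],
    "B": [1.0, 1.0, 3.0],
    "C": [6.0, 5.0, 0.0],
    "D": [7.0, 5.0, 1.0],
}


def euclidean(u, v):
    """Euclidean distance, standing in for the chosen dissimilarity measure."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5


# Step 1: square matrix of dissimilarities, stored as a dict keyed by pair.
names = sorted(sites)
dissim = {(p, q): euclidean(sites[p], sites[q]) for p in names for q in names}

# Steps 2 and 3: find the lowest off-diagonal distance and the pair behind it.
first_pair = min(combinations(names, 2), key=lambda pair: dissim[pair])

# Step 4: fuse the two observations into the first cluster.
first_cluster = set(first_pair)
print("first fusion:", first_pair, "at distance", dissim[first_pair])
```

Building the rest of the dendrogram means repeating the search after each fusion, which is where the rules for cluster formation come in.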





Rules for cluster formation

  • Single-link clustering (AKA nearest-neighbor clustering)

    • Clusters are fused according to the smallest distance between any pair of individuals, one from each cluster

    • Chaining: two individuals end up in the same cluster despite a large dissimilarity → occurs when they are linked by a series of closely connected points

    • Constituent clusters may grow gradually, with each fusion adding one or a small number of elements → inconclusive and hard to interpret
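
Chaining can be illustrated with a small sketch, assuming one-dimensional made-up points and absolute difference as the dissimilarity. The helper names `single_link` and `agglomerate` are hypothetical, and ties are broken simply by whichever pair is found first.

```python
def single_link(c1, c2, dist):
    """Single-link rule: cluster distance is the smallest pairwise distance."""
    return min(dist(a, b) for a in c1 for b in c2)


def agglomerate(points, cluster_distance, dist):
    """Repeatedly fuse the two closest clusters; return the fusion history."""
    clusters = [(p,) for p in points]
    history = []
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest cluster distance.
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]], dist))
        d = cluster_distance(clusters[i], clusters[j], dist)
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history


# Hypothetical data: a chain of evenly spaced points plus one outlier.
points = [0.0, 1.0, 2.0, 3.0, 4.0, 10.0]
history = agglomerate(points, single_link, lambda a, b: abs(a - b))
for left, right, d in history:
    print(left, "+", right, "fused at distance", d)
```

In this toy data set the points 0.0 and 4.0 join the same cluster at a fusion distance of 1.0, even though they are 4.0 apart, because the intermediate points link them.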


Other Rules

  • Complete-Link Clustering

    • Distance between two clusters is set by their most distant pair of members, so a fusion requires every member to be close

    • Exact opposite of Single Link

    • May end up separating individuals that are very similar

  • Minimum Variance Clustering (Ward’s technique)

    • Intermediate between single link and complete link
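
A rough comparison of the three rules, assuming two made-up one-dimensional clusters and absolute difference as the dissimilarity. The function names are hypothetical; the minimum variance score is computed here as the increase in within-cluster sum of squares that a fusion would cause, which is the usual reading of Ward's technique.

```python
def single_link(c1, c2):
    """Nearest neighbor: smallest distance between members of the two clusters."""
    return min(abs(a - b) for a in c1 for b in c2)


def complete_link(c1, c2):
    """Farthest neighbor: greatest distance between members of the two clusters."""
    return max(abs(a - b) for a in c1 for b in c2)


def sum_of_squares(cluster):
    """Within-cluster sum of squared deviations from the cluster mean."""
    mean = sum(cluster) / len(cluster)
    return sum((x - mean) ** 2 for x in cluster)


def ward_increase(c1, c2):
    """Minimum variance: growth in within-cluster sum of squares if fused."""
    return sum_of_squares(c1 + c2) - sum_of_squares(c1) - sum_of_squares(c2)


# Hypothetical clusters for illustration.
left, right = [0.0, 1.0, 2.0], [3.0, 4.0]
scores = {
    "single": single_link(left, right),
    "complete": complete_link(left, right),
    "ward": ward_increase(left, right),
}
print(scores)
```

Under each rule, the pair of clusters with the smallest score is the one fused next; the rules differ only in how that score is defined.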


Interpretation

  • There are NO objective rules for interpreting dendrograms

  • Use the dendrogram for hypothesis formation → look for divisions that coincide with existing knowledge about the data → Metadata (Chapter 1)

  • Complementary Analysis


Divisive Classification Techniques

  • Take an entire dataset and divide it into categories

  • As always, the boundaries for these categories are subjective

  • On the plus side, this forces us to admit that there is some uncertainty, which a software package wouldn’t tell us


TWINSPAN

  • Acronym for Two-Way Indicator Species Analysis

  • Polythetic divisive classification technique

  • Output is in two-way tables


TWINSPAN Tables

  • There are two ordered lists, one for species and one for observations

  • There are two dendrograms, one to classify species, and one to classify observations

  • Pseudospecies are constructs that convert continuous distributions into discrete presence/absence data
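
The pseudospecies idea can be sketched as below. The cut levels, the species name, and the helper `pseudospecies` are assumptions for illustration and are not taken from the slides; the point is only that one continuous value becomes several presence/absence records.

```python
# Illustrative cut levels (an assumption, not taken from the slides).
CUT_LEVELS = [0, 2, 5, 10, 20]


def pseudospecies(species, abundance, cut_levels=CUT_LEVELS):
    """Return the pseudospecies that count as 'present' for one abundance value."""
    if abundance <= 0:
        return []  # species absent: no pseudospecies present
    return [
        f"{species}{level}"
        for level, cut in enumerate(cut_levels, start=1)
        if abundance >= cut
    ]


# A hypothetical species recorded at three different abundances.
for value in (0, 7, 25):
    print(value, "->", pseudospecies("oak", value))
```

Each pseudospecies is then treated as present or absent, so a species at abundance 7 counts as present for the first three levels and absent for the remaining two under these assumed cut levels.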


HOMEWORK!!!!!!

1) What is the difference between hierarchical and non-hierarchical classification techniques?

2) Define Cluster

3) T/F: There can be only one valid dendrogram for a single data set. (Correct the statement if false.)

**********Bonus**********

What is the background of the PowerPoint supposed to represent?

