Multivariate analysis in community ecology Gerry Quinn Deakin University
Data sets in community ecology • Multivariate abundance data • Sampling or experimental units • plots, cores, panels, quadrats …… • usually in hierarchical spatial or temporal structure • Abundances recorded for multiple taxa in each unit • simple counts, densities, % cover, presence-absence …… • Environmental variables recorded in each unit • pH, salinity, temperature, nutrients, sediment load, elevation …..
Typical aims • Examine spatial and temporal patterns in species composition • assemblage/community “structure”, more than simply biodiversity (e.g. taxon richness/diversity) • test formal hypotheses about spatial and temporal differences in composition • Relate patterns to unit-level (or higher-level) environmental predictors • a typical linear-model question • Determine which taxa are most important in “driving” the patterns • which taxa most typify differences across spatial and temporal hierarchies
Why multivariate? • Individual taxa of main interest • concern over multiple univariate hypothesis testing (Type 1 error rates) • referees and editors won’t accept a paper with 50-100 separate ANOVAs • Community (assemblage) structure of interest • recognition of the limitations of univariate biodiversity measures (richness, diversity, evenness) • hypotheses about community/assemblage composition • Most multivariate analyses in community ecology also incorporate univariate models (of individual taxa or environmental predictors)
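To put a number on the Type 1 error concern, the sketch below computes the familywise error rate, 1 - (1 - alpha)^m, for m separate tests each run at alpha = 0.05, plus the Bonferroni per-test level alpha/m that would hold it at 0.05. It assumes the tests are independent, which abundances of co-occurring taxa usually are not, so treat the figures as indicative only. With 50 tests the chance of at least one false rejection is about 0.92, and with 100 tests about 0.99.

```python
# Familywise Type 1 error for m tests, each run at level alpha.
# Assumes independent tests (an idealisation for correlated taxa).

def familywise_error(m: int, alpha: float = 0.05) -> float:
    """Probability of at least one false rejection across m independent tests."""
    return 1.0 - (1.0 - alpha) ** m


def bonferroni_alpha(m: int, alpha: float = 0.05) -> float:
    """Per-test level that keeps the familywise rate at or below alpha."""
    return alpha / m


for m in (1, 10, 50, 100):
    print(f"{m:>3} ANOVAs: familywise error = {familywise_error(m):.3f}, "
          f"Bonferroni per-test alpha = {bonferroni_alpha(m):.4f}")
```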
Forest bird communities • Does bird community composition vary between forest types? • 5 types (box-ironbark, river redgum, Gippsland manna gum etc.) plus mixed • Maximum bird abundance (across 4 seasons) • 102 species across 37 sites • Mac Nally (1989) • Images: beechworthonline.com.au; Swift parrot (Wikipedia)
Estuary nematode communities • Does nematode community composition vary between sites and with environmental variables? • Nematode abundance (6 seasonal “replicates”) • 182 species across 19 “sites” • Environmental variables • 6 (sediment particle size, % organic matter etc.) at each site • Clarke & Warwick (1993) • Images: Exe estuary (Wikipedia); marine nematodes (http://www.ipm.iastate.edu)
Impact assessment • Does sessile marine animal community composition vary between sewage impact and control sites? • 3 control locations and 1 impact location • 4 randomly chosen times • replicate sites and photographic quadrats at each location • Percent cover of 58 taxa • Classic “beyond BACI” design • split-plot type linear model • Terlizzi et al (2005) • Image: http://www.conisma.it/total/t_aim.html
Three broad approaches • Eigenanalyses • distance measure implied • Distance-based analyses • distance measure explicit and user-selected • Multi-species linear models • combine taxon-specific univariate (linear) models • no distance measure required
Eigenanalysis methods • Principal components analysis (PCA) • implied Euclidean distance • Correspondence analysis (CA) • implied chi-square distance • Canonical correspondence analysis (CCA, as implemented in CANOCO) • constrains the ordination axes to be linear combinations of environmental variables • Strengths • biplots of sample and species ordinations • CCA provides measures of fit with covarying environmental variables • Cajo ter Braak
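The implied distances can be made concrete. The sketch below, on a small hypothetical table, computes the Euclidean distance that PCA works with and the chi-square distance between site profiles that CA works with: d²(i, j) = Σ_k (p_ik - p_jk)² / c_k, where p_ik is taxon k's share of site i and c_k is taxon k's share of the grand total (texts differ by a constant scaling factor). Two site pairs differ by the same two individuals, once in a common taxon and once in a rare one. Euclidean distance treats the two pairs identically, while the chi-square distance is roughly 13 times larger for the rare-taxon difference because of the 1/c_k weight, which is the up-weighting of rare taxa attributed to CA and CCA.

```python
import math

def euclidean(x, y):
    """Straight-line distance between two abundance vectors (PCA's implied metric)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def chi_square_distances(table):
    """Chi-square distances between all pairs of sites (rows), as implied by CA.

    d^2(i, j) = sum_k (p_ik - p_jk)^2 / c_k, with p_ik = y_ik / (row total of i)
    and c_k = (column total of k) / (grand total).
    Assumes no empty sites and no taxon absent from every site (else division by zero).
    """
    grand = sum(sum(row) for row in table)
    col_mass = [sum(col) / grand for col in zip(*table)]
    profiles = [[v / sum(row) for v in row] for row in table]
    n = len(table)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((pi - pj) ** 2 / c
                     for pi, pj, c in zip(profiles[i], profiles[j], col_mass))
            d[i][j] = d[j][i] = math.sqrt(d2)
    return d

# Hypothetical sites x taxa: A and B common, C rare (2 individuals in total).
sites = [
    [50, 40, 0],  # s1
    [50, 40, 2],  # s2: s1 plus 2 individuals of rare taxon C
    [52, 40, 0],  # s3: s1 plus 2 individuals of common taxon A
]
chi = chi_square_distances(sites)
print(euclidean(sites[0], sites[1]), euclidean(sites[0], sites[2]))  # 2.0 2.0
print(f"chi-square: s1-s2 = {chi[0][1]:.3f}, s1-s3 = {chi[0][2]:.3f}")
```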
Rodents in habitat fragments etc. Bolger et al (1997)
Rodent data – CA biplot [Figure: habitat fragments and rodent taxa plotted on CA Axis 1 vs Axis 2]
Rodent data – CCA triplot [Figure: habitat fragments, rodent taxa and environmental vectors (Area, Dist, Age) plotted on CCA Axis 1 vs Axis 2]
Issues • Both PCA and CA “compress” distances at the ends of axes (the so-called arch or horseshoe effect) • detrended CA (DCA) is a brute-force “fix” for this effect • CA and CCA implicitly up-weight rarer taxa through their use of the chi-square distance • No choice of distance measure [Figure: PCA of bird community data, Comp 1 vs Comp 2]
Distance-based methods • Include principal coordinates analysis (PCoA), multidimensional scaling (MDS), generalised dissimilarity modelling (GDM) • Hypothesis testing • compare groups using multi-response permutation procedure (MRPP), analysis of similarities (ANOSIM), permutational multivariate ANOVA (PERMANOVA) • relate to environmental variables with Mantel test, BIO-ENV • John Curtis, Bob Clarke, Marti Anderson
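As an illustration of the permutation logic these tests share, here is a minimal one-way PERMANOVA-style sketch. It builds a Bray-Curtis matrix, partitions the sum of squared dissimilarities into among-group and within-group parts, forms a pseudo-F, and compares it with pseudo-F values from shuffled group labels. The data, group labels, 999 permutations and the seed are all hypothetical choices; the PERMANOVA software itself also handles multi-factor and nested designs with restricted permutations, which this sketch does not.

```python
import random

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den > 0 else 0.0

def pseudo_f(dist, groups):
    """One-way pseudo-F computed directly from a full dissimilarity matrix."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total SS: squared dissimilarities over all pairs, divided by N.
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group SS: the same within each group, divided by that group's size.
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permanova(samples, groups, n_perm=999, seed=1):
    """Observed pseudo-F and a permutation p-value from shuffling group labels."""
    dist = [[bray_curtis(x, y) for y in samples] for x in samples]
    f_obs = pseudo_f(dist, groups)
    rng = random.Random(seed)
    perm = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)
        if pseudo_f(dist, perm) >= f_obs:
            hits += 1
    return f_obs, (hits + 1) / (n_perm + 1)

# Hypothetical counts: 8 sites x 3 taxa, two groups of 4 sites.
sites = [[10, 0, 5], [12, 1, 4], [9, 0, 6], [11, 2, 5],
         [1, 8, 0], [0, 10, 1], [2, 9, 0], [1, 7, 2]]
groups = ["A"] * 4 + ["B"] * 4
f_obs, p = permanova(sites, groups)
print(f"pseudo-F = {f_obs:.2f}, permutation p = {p:.3f}")
```

With only 4 + 4 sites there are 70 distinct ways to split the labels, and two of them (the observed split and its mirror image) give the same pseudo-F, so the smallest p-value this design can reach is about 2/70, roughly 0.03, however strong the effect.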
Distance-based methods • Strengths • flexibility of distance/dissimilarity measure, standardisation and transformation • consistency in that ordination and subsequent analyses based on original dissimilarities • some dissimilarities can be “decomposed” into relative taxon contributions (similarity percentages - SIMPER)
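The SIMPER decomposition works because Bray-Curtis is a sum over taxa: for two samples, taxon k contributes |y_1k - y_2k| / Σ_l (y_1l + y_2l), and these terms add up to the dissimilarity. The sketch below, on hypothetical data, averages those terms over all between-group pairs of samples and ranks taxa by average contribution. It omits the consistency (contribution/SD) column that SIMPER output normally includes.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

def taxon_contributions(x, y):
    """Split one Bray-Curtis value into per-taxon terms that sum to it."""
    den = sum(a + b for a, b in zip(x, y))
    return [abs(a - b) / den for a, b in zip(x, y)]

def simper(group1, group2, taxa):
    """Average between-group Bray-Curtis and each taxon's average contribution."""
    n_pairs = len(group1) * len(group2)
    totals = [0.0] * len(taxa)
    for x in group1:
        for y in group2:
            for k, c in enumerate(taxon_contributions(x, y)):
                totals[k] += c
    avg = [t / n_pairs for t in totals]
    overall = sum(avg)
    ranked = sorted(zip(taxa, avg), key=lambda item: item[1], reverse=True)
    # Each entry: (taxon, average contribution, percentage of average dissimilarity)
    return overall, [(name, a, 100.0 * a / overall) for name, a in ranked]

# Hypothetical counts for two groups of 4 sites and 3 taxa.
group_a = [[10, 0, 5], [12, 1, 4], [9, 0, 6], [11, 2, 5]]
group_b = [[1, 8, 0], [0, 10, 1], [2, 9, 0], [1, 7, 2]]
overall, table = simper(group_a, group_b, ["t1", "t2", "t3"])
print(f"average Bray-Curtis = {overall:.3f}")
for name, contrib, pct in table:
    print(f"{name}: {contrib:.3f} ({pct:.1f}%)")
```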
Issues • Flexible choice of distance/dissimilarity measure • ecologists nearly always default to Bray-Curtis • does B-C represent ecological differences of interest? • Modelling dissimilarities is tricky • appropriate probability distributions? – permutation procedures usually applied – robustness for complex models? • PERMANOVA partitions sums of squares, not likelihoods • lack of independence among pairwise dissimilarities – rely on robustness of permutation • Limited predictive capacity • Distance-based methods cannot easily separate location and dispersion effects
Location vs dispersion • Warton et al (2012)
Location vs dispersion • Transformation of abundances may help BUT many taxa have very skewed distributions • Issue recognised by PRIMER/PERMANOVA • “we can consider the homogeneity of dispersions to be included as part of the general null hypothesis of "no differences" among groups being tested by PERMANOVA (even though the focus of the PERMANOVA test is to detect location effects)” (PERMANOVA manual p.22) • Ongoing debate: PRIMER/PERMANOVA vs mvabund
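A small hypothetical example shows why location and dispersion get tangled for count data. Two groups of sites differ in mean abundance and, as is typical when variance rises with the mean, the higher-abundance group is also more spread out. The sketch measures dispersion as the average distance from each site to its group centroid, the idea behind PERMDISP-style dispersion tests, simplified here to Euclidean distance so the centroid is just the arithmetic mean. On the raw scale the high group is roughly 11 times as dispersed; after log(y+1) the ratio falls to about 1.3, so the transformation helps but does not remove the difference.

```python
import math

def centroid(points):
    """Arithmetic mean of each coordinate (taxon) across sites."""
    return [sum(col) / len(points) for col in zip(*points)]

def mean_dist_to_centroid(points):
    """Average Euclidean distance from each site to its group centroid."""
    c = centroid(points)
    dists = [math.sqrt(sum((v - cv) ** 2 for v, cv in zip(p, c))) for p in points]
    return sum(dists) / len(dists)

def log1p_table(points):
    """Apply log(y+1) to every abundance."""
    return [[math.log1p(v) for v in p] for p in points]

# Hypothetical counts for 2 taxa: a low-abundance and a high-abundance group.
low = [[1, 0], [2, 1], [0, 2], [3, 1], [1, 0], [2, 2]]
high = [[10, 5], [25, 12], [5, 20], [40, 8], [15, 30], [30, 15]]

ratio_raw = mean_dist_to_centroid(high) / mean_dist_to_centroid(low)
ratio_log = (mean_dist_to_centroid(log1p_table(high))
             / mean_dist_to_centroid(log1p_table(low)))
print(f"dispersion ratio (high/low): raw {ratio_raw:.1f}, log(y+1) {ratio_log:.2f}")
```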
“Univariate” linear model approach • Fit separate generalised linear models to each taxon • based on the negative binomial distribution (for over-dispersed counts) • Testing overall group or covariate effects • sum likelihood ratio (LR) statistics across taxa • use permutation (resampling) methods to generate the null distribution of the summed statistic • Relative taxon contribution to patterns • LR statistic as a measure of the strength of each taxon’s contribution • Strengths • linear models framework, univariate predictive capacity • handles mean-variance relationship • Issues • not an “ordination” method • David Warton
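The summed-LR idea can be sketched for a one-way design. Each taxon gets its own negative binomial likelihood ratio test of a group effect, the statistics are summed across taxa, and significance comes from shuffling group labels across whole sites, which keeps the correlation among taxa within a site intact. Two simplifying assumptions make this a sketch rather than mvabund: the negative binomial size parameter theta is fixed at one hypothetical value for every taxon instead of being estimated, and the resampling is plain label permutation. With theta fixed and one mean per group, the maximum likelihood fitted means are simply the group means (and the overall mean under the null), which keeps the code short.

```python
import math
import random

def nb_loglik(y, mu, theta):
    """Negative binomial (mean mu, size theta) log-probability of count y."""
    ll = math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
    ll += theta * math.log(theta / (theta + mu))
    if y > 0:  # mu > 0 whenever y > 0, because mu is a mean that includes y
        ll += y * math.log(mu / (theta + mu))
    return ll

def taxon_lr(counts, groups, theta):
    """LR statistic for a one-way group effect on a single taxon (theta fixed)."""
    overall = sum(counts) / len(counts)
    means = {}
    for g in set(groups):
        vals = [c for c, h in zip(counts, groups) if h == g]
        means[g] = sum(vals) / len(vals)
    ll_alt = sum(nb_loglik(y, means[g], theta) for y, g in zip(counts, groups))
    ll_null = sum(nb_loglik(y, overall, theta) for y in counts)
    return 2.0 * (ll_alt - ll_null)

def sum_of_lr_test(table, groups, theta=1.0, n_perm=999, seed=1):
    """Summed LR across taxa, with a p-value from permuting site labels."""
    taxa = [list(col) for col in zip(*table)]  # one count vector per taxon

    def per_taxon(labels):
        return [taxon_lr(counts, labels, theta) for counts in taxa]

    observed = per_taxon(groups)
    total = sum(observed)
    rng = random.Random(seed)
    perm = list(groups)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(perm)  # relabelling moves whole sites, so taxa stay correlated
        if sum(per_taxon(perm)) >= total:
            hits += 1
    return total, (hits + 1) / (n_perm + 1), observed

# Hypothetical counts: 8 sites x 3 taxa, two groups of 4 sites.
sites = [[10, 0, 5], [12, 1, 4], [9, 0, 6], [11, 2, 5],
         [1, 8, 0], [0, 10, 1], [2, 9, 0], [1, 7, 2]]
groups = ["A"] * 4 + ["B"] * 4
total, p, per_taxon_lr = sum_of_lr_test(sites, groups)
print(f"summed LR = {total:.1f}, p = {p:.3f}")
print("per-taxon LR:", [round(v, 1) for v in per_taxon_lr])
```

The per-taxon LR values are the taxon contributions listed on the slide; ranking them plays the role that SIMPER percentages play in the distance-based approach.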
Methods in community ecology • Journals searched 2011-2012 • Austral Ecology • Oikos • Analyses of community/assemblage data (species abundance, incl. presence-absence data) • 62 papers found • Methods used • overall multivariate “philosophy” • choice of dissimilarity measure (if relevant) • transformation/standardisation used • modelling (hypothesis testing) method • choice of “ordination” plot
Eigenanalyses Majority of “ordinations” based on biplots, many with vectors fitted for environmental predictors (triplots)
Distance/dissimilarity • Why do ecologists default to Bray-Curtis? • Faith et al (1987 – Vegetatio) strongly recommended B-C as robust indicator of ecological gradients • ranges between 0 (identical samples) and 1 (no species in common) • handles joint absences (taxa missing from both samples) • default in PRIMER/PERMANOVA, PC-ORD • Does B-C represent patterns ecologists are really interested in?
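The properties listed above can be checked on a hypothetical three-taxon example. Site s1 shares no species with s2, but has the same two species as s3 at higher abundance. Bray-Curtis gives 0 for identical samples, 1 for s1 vs s2 and 0.6 for s1 vs s3, while Euclidean distance calls s1 closer to s2, the sample it has nothing in common with. Appending a taxon absent from both samples leaves Bray-Curtis unchanged.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    if den == 0:
        raise ValueError("both samples are empty")
    return num / den

def euclidean(x, y):
    """Straight-line distance, shown for contrast."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

s1 = [0, 1, 1]
s2 = [1, 0, 0]  # shares no species with s1
s3 = [0, 4, 4]  # same species as s1, higher abundances

print(bray_curtis(s1, s1))  # 0.0 (identical samples)
print(bray_curtis(s1, s2))  # 1.0 (no species in common)
print(bray_curtis(s1, s3))  # 0.6
print(euclidean(s1, s2) < euclidean(s1, s3))  # True: Euclidean ranks s2 as closer

# A taxon absent from both samples (a joint absence) does not change Bray-Curtis.
print(bray_curtis(s1 + [0], s3 + [0]) == bray_curtis(s1, s3))  # True
```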
Distance-based Majority of “ordinations” based on non-metric MDS, 3 papers used cluster analysis
Transformations • Transformations of abundances common in ecology • log (y+1) or square/fourth root • original PRIMER program had 4th root as default! • Most common reason - to reduce the influence of most abundant (dominant) taxa and give relatively greater weighting to rarer taxa • each taxon will be affected differently depending on its distribution? • effects on interaction terms almost never considered • Issues of unequal dispersions almost never raised in ecological papers • “it is not at all difficult to understand that transformations will also affect relative dispersions in multivariate space” (PERMANOVA manual p. 97)
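To see how strongly each transformation damps a dominant taxon, the sketch below takes one hypothetical sample whose abundances span three orders of magnitude and reports each taxon's share of the transformed total. The dominant taxon's share falls from about 0.90 untransformed to about 0.69 (square root), 0.49 (fourth root) and 0.47 (log(y+1)). log(y+1) also gives the singleton taxon a smaller share than the fourth root does, so these transformations do not rank consistently in strength across all taxa.

```python
import math

# Candidate transformations of a single abundance value y.
transforms = {
    "none": lambda y: y,
    "square root": math.sqrt,
    "fourth root": lambda y: y ** 0.25,
    "log(y+1)": math.log1p,
}

def shares(sample, f):
    """Each taxon's share of the sample total after transforming with f."""
    transformed = [f(y) for y in sample]
    total = sum(transformed)
    return [v / total for v in transformed]

# Hypothetical sample: one dominant taxon and three progressively rarer ones.
sample = [1000, 100, 10, 1]
results = {name: shares(sample, f) for name, f in transforms.items()}
for name, s in results.items():
    print(f"{name:>12}: " + ", ".join(f"{v:.2f}" for v in s))
```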
Standardisations • Invertebrate assemblages in a lake (Quinn et al 1996) • Four site-season combinations • nMDS on Bray-Curtis • Four standardisations: • None • By sample totals • By taxa totals • Double • Bray-Curtis vs Canberra [Figure: nMDS panels labelled None, Sample, Taxa and Double]
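The standardisations and the Canberra distance are short to write down. In the sketch, "double" means dividing by taxon totals and then by sample totals; that is an assumption about the slide's meaning, and other software defines double standardisation differently (Wisconsin double standardisation, for instance, uses species maxima before sample totals). This Canberra version skips taxa absent from both samples and does not divide by the number of retained terms, as some implementations do. The last line shows why Canberra behaves so differently from Bray-Curtis: each taxon's term is at most 1, so a single individual of a taxon missing from the other sample (term 1) counts three times as much as halving a dominant taxon (term 1/3).

```python
def by_sample_totals(table):
    """Divide each site's abundances by that site's total (rows sum to 1)."""
    out = []
    for row in table:
        total = sum(row)  # assumes no empty sites
        out.append([v / total for v in row])
    return out

def by_taxon_totals(table):
    """Divide each taxon's abundances by its total over all sites (columns sum to 1)."""
    col_totals = [sum(col) for col in zip(*table)]  # assumes no all-zero taxa
    return [[v / t for v, t in zip(row, col_totals)] for row in table]

def double_standardise(table):
    """Taxon totals first, then sample totals: one possible 'double' standardisation."""
    return by_sample_totals(by_taxon_totals(table))

def canberra(x, y):
    """Canberra distance; taxa absent from both samples are skipped."""
    return sum(abs(a - b) / (a + b) for a, b in zip(x, y) if a + b > 0)

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative vectors."""
    return sum(abs(a - b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))

lake = [[10, 0, 5], [12, 1, 4], [1, 8, 0]]  # hypothetical sites x taxa
options = [("none", lambda t: t), ("sample", by_sample_totals),
           ("taxa", by_taxon_totals), ("double", double_standardise)]
for name, f in options:
    s = f(lake)
    print(f"{name:>6}: BC(site1, site2) = {bray_curtis(s[0], s[1]):.3f}, "
          f"Canberra(site1, site2) = {canberra(s[0], s[1]):.3f}")

# Halving the dominant taxon contributes 1/3; one individual of a new taxon contributes 1.
print(canberra([100, 0], [50, 1]), bray_curtis([100, 0], [50, 1]))
```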
Bayesian approaches • Detecting transitions between upslope and riparian vegetation • management of stream riparian zones • Based on plant assemblage data (% cover) along transects away from stream • pairwise Canberra distances between quadrats along each transect • Aim - to find the model with the highest probability of being the break between riparian and upslope vegetation • usual MCMC estimation of models Acheron River
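The break-detection idea can be sketched with a deliberately simple model, which is not the model used by Mac Nally et al. Suppose each quadrat along a transect is summarised by one number (for example, its dissimilarity to a reference riparian quadrat; the values below are hypothetical). A single-break model says values before and after the break have different means, each with a Normal prior N(m0, s0²) integrated out analytically, plus Gaussian noise with known standard deviation sigma. A uniform prior over break positions then gives a posterior probability for each break, and a Bayes factor compares "one break" with "no break". sigma, m0 and s0 are assumed known here, whereas the slide's analysis used MCMC.

```python
import math

def log_marginal(segment, sigma, m0, s0):
    """Log marginal likelihood of a segment: values ~ N(mu, sigma^2), mu ~ N(m0, s0^2).

    Integrating mu out gives a multivariate normal with covariance
    sigma^2 I + s0^2 11'; its inverse and determinant have closed forms.
    """
    m = len(segment)
    r = [x - m0 for x in segment]
    s_r = sum(r)
    s_r2 = sum(v * v for v in r)
    quad = (s_r2 - s0 ** 2 * s_r ** 2 / (sigma ** 2 + m * s0 ** 2)) / sigma ** 2
    return (-0.5 * m * math.log(2 * math.pi * sigma ** 2)
            - 0.5 * math.log(1 + m * s0 ** 2 / sigma ** 2)
            - 0.5 * quad)

def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def break_posterior(series, sigma, m0, s0):
    """Posterior over one break (uniform prior) and log Bayes factor vs no break.

    Break position k means series[:k] and series[k:] have separate means, k = 1..n-1.
    """
    n = len(series)
    log_lik = [log_marginal(series[:k], sigma, m0, s0) +
               log_marginal(series[k:], sigma, m0, s0) for k in range(1, n)]
    norm = logsumexp(log_lik)
    posterior = [math.exp(v - norm) for v in log_lik]
    # Evidence for 'one break' averages over the n-1 equally likely positions.
    log_bf = (norm - math.log(n - 1)) - log_marginal(series, sigma, m0, s0)
    return posterior, log_bf

# Hypothetical per-quadrat values along one transect.
values = [0.20, 0.25, 0.22, 0.18, 0.21, 0.80, 0.85, 0.78, 0.82, 0.79]
posterior, log_bf = break_posterior(values, sigma=0.05, m0=0.5, s0=1.0)
best = posterior.index(max(posterior)) + 1
print(f"most probable break after quadrat {best} (posterior {max(posterior):.3f})")
print(f"log Bayes factor, one break vs none: {log_bf:.1f}")
```

A log Bayes factor above log(10), about 2.3, corresponds to the "Bayes factor > 10" threshold used in the results.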
[Figure: transitions with Bayes factors > 10, shown separately for higher-elevation and lower-elevation sites] Mac Nally et al (2008) Plant Ecology
Bayesian approaches • May be more robust than ML for complex models • already being used for variance estimation and confidence (credible) intervals in some mixed model software • Straightforward(?) under the mvabund generalised linear model approach • select suitable prior probability distributions for parameters • use uninformative priors if appropriate • More difficult with distance-based methods • but can be adapted (see Mac Nally 2005 Divers & Distr) • other examples using MDS and clustering (Oh & Raftery 2007 J Comp Graph Stat) focus on graphical representation (“ordination”)
Questions for discussion • Is the confounding of location and dispersion a “fatal” flaw for distance-based measures? • more direct comparisons between distance-based and linear model approaches needed • Comparison to other new methods • generalised dissimilarity modelling (Ferrier et al 2007) • gradient forests (Ellis et al 2012) • If distance-based measures are used: • what does Bray-Curtis actually measure ecologically? • What do multivariate models actually predict?
Questions for discussion • Should ecologists re-think their use of transformations? • NOT just a multivariate issue! • How do ecologists determine optimum sample sizes for community ecology? • power characteristics will vary between taxa in the linear models approach • power for distance-based permutation analyses?