QUANTITATIVE PALAEOECOLOGY

QUANTITATIVE PALAEOECOLOGY Lecture 1. Introduction BIO-351

Contents What is palaeoecology? What are palaeoecological data? Why attempt quantification in palaeoecology? What are the main approaches to quantification in palaeoecology? What are the major numerical techniques in quantitative palaeoecology? How to transform palaeoecological data? What are the basics behind the major techniques (some revision!)?

What is Palaeoecology? Palaeoecology is, in theory, the ecology of the past and is a combination of biology and geology. In practice, it is largely concerned with the reconstruction of past communities, landscapes, environments, and ecosystems. It is difficult to study the ecology of organisms in the past and hence deduce organism – environment relationships in the past. Often the only record of the past environment is the fossil record. Cannot use the fossil record to reconstruct the past environment, and then use the past environment to explain changes in the fossil record!

There are several approaches to palaeoecology • Descriptive – basic description, common • Narrative - ‘story telling’, frequent • Analytical - rigorous hypothesis testing, rare • Qualitative - common • Quantitative – increasing • Descriptive - common • Deductive - rare, but increasing • Experimental – very rare

Why Study Palaeoecology? • Present-day ecology benefits from historical perspective • "Palaeoecology can provide the only record of complete in situ successions. The framework of classical succession theory (probably the most well known and widely discussed notion of ecology) rests largely upon the inferences from separated areas in different stages of a single hypothetical process (much like inferring phylogeny from the comparative analogy of modern forms). Palaeo-ecology can provide direct evidence to supplement ecological theory." • S.J. Gould, 1976 • "There is scarcely a feature in the countryside today which does not have its explanation in an evolution whose roots pass deep into the twilight of time. Human hands have played a leading role in this evolutionary process, and those who study vegetation cannot afford to neglect history." C.D. Pigott, 1978 • 2. Past analogue for future • 3. Intellectual challenge and desire to understand our past • 4. Reconstruction of past environment important to evaluate extent of natural variability • 5. 'Coaxing history to conduct experiments' (Deevey, 1969) • 6. Fun!

Mechanisms and modes of studying environmental change over different timescales (modified from Oldfield, 1983)

Descriptive historical science, depends on inductive reasoning. Uniformitarianism “present is key to the past”. Method of multiple working hypotheses. Simplicity “Ockham’s razor”. Sound taxonomy essential. Language – largely biological and geological. Data frequently quantitative and multivariate. Philosophy of Palaeoecology

What are Palaeoecological Data? Presence/absence or, more commonly, counts of fossil remains in sediments (lake muds, peats, marine sediments, etc). Fossils - pollen diatoms chironomids cladocera radiolaria testate amoebae mollusca ostracods plant macrofossils foraminifera chrysophyte cysts - biochemical markers (e.g. pigments, lipids, DNA) Sediments - geochemistry grain size physical properties composition magnetics stable isotopes (C,N,O)

Data are usually quantitative and multivariate (many variables (e.g. 30-300 taxa), many samples (50-300)). Quantitative data usually expressed as percentages of some sum (e.g. total pollen). Data may contain many zero values (taxa absent in many samples). Closed, compositional data, containing many zero values, strong inter-relationships between variables. If not percentages, data are presence/absence, categorical classes (e.g. <5, 5-10, 10-25, >25 individuals), or ‘absolute’ values (e.g. pollen grains cm-2 year-1). Samples usually in known stratigraphical order (time sequence). Some types of data may be modern ‘surface’ samples (e.g. top 1 cm of lake mud) and associated modern environmental data. Such data form ‘training sets’ or ‘calibration data-sets’.

Palaeoecological data are thus usually • stratigraphical sequences at one point in space or samples from one point in time but geographically dispersed • percentage data • contain many zero values

Multivariate Data Matrix Matrix X with n columns x m rows. n x m matrix. Order (n x m). subscript X21 Xik element in row two column one row i column k

Why Attempt Quantification in Palaeoecology? • Data are very time consuming (and expensive) to collect. • Data are quantitative counts. Why spend time on counting if the quantitative aspect of the data is then ignored? • Data are complex, multivariate, and often stratigraphically ordered. Methods to help summarise, describe, characterise, and interpret data are needed (Lectures 3 and 5). • Quantitative environmental reconstructions (e.g. lake-water pH, mean July temperature) important in much environmental science (e.g. to validate model hindcasts or back-predictions) (Lecture 4). • Often easier to test hypotheses using numerical methods (Lecture 5).

Reasons for Quantifying Palaeoecology 1: Data simplification and data reduction “signal from noise” 2: Detect features that might otherwise escape attention. 3: Hypothesis generation, prediction, and testing. 4: Data exploration as aid to further data collection. 5: Communication of results of complex data. Ease of display of complex data. 6: Aids communication and forces us to be explicit. “The more orthodox amongst us should at least reflect that many of the same imperfections are implicit in our own cerebrations and welcome the exposure which numbers bring to the muddle which words may obscure”.D Walker (1972) 7: Tackle problems not otherwise soluble. Hopefully better science. 8:Fun!

What are the Main Approaches to Quantification in Palaeoecology? • Model building • explanatory • statistical • Hypothesis generation ‘exploratory data analysis’ (EDA) • detective work • Hypothesis testing ‘confirmatory data analysis’ (CDA) • CDA and EDA – different aims, philosophies, methods • “We need both exploratory and confirmatory” • J.W. Tukey (1980)

Model Building in Palaeoecology Model building approach Cause of sudden and dramatic extinction of large mammals in North America 10-12,000 years ago at end of Pleistocene. One hypothesis - arrival and expansion of humans into the previously uninhabited North American continent, resulting in overkill and extinction. Model - arrival of humans 12,000 years ago across Bering Land Bridge. Start model with 100 humans at Edmonton, Alberta. Population doubles every 30 years. Wave of 300,000 humans reaching Gulf of Mexico in 300 years, populated area of 780 x 106 ha. Population could easily kill a biomass of 42 x 109 kg corresponding to an animal density of modern African plains. Model predicts mammal extinction in 300 years, then human population crash to new, low population density.

A hypothetical model for the spread of man and the overkill of large mammals in North America. Upon arrival the population of hunters reached a critical density, and then moved southwards in a quarter-circle front. One thousand miles south of Edmonton, the front is beginning to sweep past radiocarbon-dated Palaeoindian mammoth kill sites, which will be overrun in less than 2000 years. By the time the front has moved nearly 2000 miles to the Gulf of Mexico, the herds of North America will have been hunted to extinction. (After Mosimann and Martin, 1975.)

CONFIRMATORY DATA ANALYSIS EXPLORATORYDATA ANALYSIS Real world ’facts’ Hypotheses Real world ‘facts’ Observations Measurements Data Observations Measurements Data Data analysis Statistical testing Patterns ‘Information’ Hypothesis testing Hypotheses Decisions Theory

What are the Major Numerical Techniques in Palaeoecology? • Exploratory data analysis • 1a. Numerical summaries - means • medians • standard deviations • ranges • 1b. Graphical approaches - box-and-whisker plots • scatter plots • stratigraphical diagrams • 1c. Multivariate data analysis - classification • ordination (including discriminant analysis)

What are the Major Numerical Techniques in Palaeoecology? • Confirmatory data analysis or hypothesis testing • Statistical modelling (regression analysis) • Quantitative environmental reconstruction (calibration = inverse regression) • Time-series analysis

1. Exploratory Data Analysis 1a. Summary Statistics • Measures of location ‘typical value’ • (1) Arithmetic mean • (2) Weighted mean • (3) Mode ‘most frequent’ value • (4) Median ‘middle values’ Robust statistic • (5) Trimmed mean 1 or 2 extreme observations at both tails deleted • (6) Geometric mean R

Q1 Q2 Q3 (B) Measures of dispersion (1) Range A = 0.37 B = 0.07 (2) Interquartile range ‘percentiles’ (3) Mean absolute deviation ignore negative signs Mean absolute difference 10/n = 2.5

(B) Measures of dispersion (cont.) (4) Variance and standard deviation Variance = mean of squares of deviation from mean Root mean square value SD (5) Coefficient of variation Relative standard deviation Percentage relative SD (independent of units) mean (6) Standard error of mean R

CI around median 95% Median  1.58 (Q3) / (n)½ quartile 1b. Graphical Approaches (A) Graphical display of univariate data Box-and-whisker plots – box plots R

Box plots for samples of more than ten wing lengths of adult male winged blackbirds taken in winter at 12 localities in the southern United States, and in order of generally increasing latitude. From James et al. (1984a). Box plots give the median, the range, and upper and lower quartiles of the data.

(B) Graphical display of bivariate or trivariate data R

Triangular arrangement of all pairwise scatter plots for four variables. Variables describe length and width of sepals and petals for 150 iris plants, comprising 3 species of 50 plants. Three-dimensional perspective view for the first three variables of the iris data. Plants of the three species are coded A,B and C.

(C) Graphical display of multivariate data FOURIER PLOTS Andrews (1972) Plot multivariate data into a function. where data are [x1, x2, x3, x4, x5... xm] Plot over range -π ≤ t ≤ π Each object is a curve. Function preserves distances between objects. Similar objects will be plotted close together. MULTPLOT

Andrews' plot for artificial data

Andrews’ plots for all twenty-two Indian tribes.

Stratigraphical plot of multivariate palaeoecological data

Other types of graphical display of multivariate data involve some dimension reduction methods (e.g. ordination or classification techniques), namely multivariate data analysis.

1c. Multivariate Data Analysis EUROPEAN FOOD (From A Survey of Europe Today, The Reader’s Digest Association Ltd.) Percentage of all households with various foods in house at time of questionnaire. Foods by countries. Country

Classification Dendrogram showing the results of minimum variance agglomerative cluster analysis of the 16 European countries for the 20 food variables listed in the table. Key: Countries: A Austria, B Belgium, CH Switzerland, D West Germany, E Spain, F France, GB Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S Sweden, SF Finland

Ordination Key: Countries: A Austria, B Belgium, CH Switzerland, D West Germany, E Spain, F France, GB Great Britain, I Italy, IRL Ireland, L Luxembourg, N Norway, NL Holland, P Portugal, S Sweden, SF Finland Correspondence analysis of percentages of households in 16 European countries having each of 20 types of food.

Minimum spanning tree fitted to the full 15-dimensional correspondence analysis solution superimposed on a rotated plot of countries from previous figure.

Geometric models Pollen data - 2 pollen types x 15 samples Depths are in centimetres, and the units for pollen frequencies may be either in grains counted or percentages. Adam (1970)

Alternate representations of the pollen data Palynological representation Geometrical representation In (a) the data are plotted as a standard diagram, and in (b) they are plotted using the geometric model. Units along the axes may be either pollen counts or percentages. Adam (1970)

Geometrical model of a vegetation space containing 52 records (stands). A: A cluster within the cloud of points (stands) occupying vegetation space. B: 3-dimensional abstract vegetation space: each dimension represents an element (e.g. proportion of a certain species) in the analysis (X Y Z axes). A, the results of a classification approach (here attempted after ordination) in which similar individuals are grouped and considered as a single cell or unit. B, the results of an ordination approach in which similar stands nevertheless retain their unique properties and thus no information is lost (X1 Y1 Z1 axes). N. B. Abstract space has no connection with real space from which the records were initially collected.

Concept of Similarity, Dissimilarity, Distance and Proximity sij – how similar objectiis objectj Proximity measure DC or SC Dissimilarity=Distance _________________________________ Convert sijdij sij = C – dijwhere C is constant

2. Hypothesis Testing or Confirmatory Data Analysis Hypothesis of interest may by ‘human impact on the landscape caused major changes in the lake-water nutrient status’. Called H1 – alternative hypothesis. Require ‘response’ variables (Y) e.g. lake-water total P reconstructed from fossil diatoms. Require ‘predictor’ or ‘explanatory’ variables (X) e.g. terrestrial pollen of unambiguous indicators of human impact (e.g. cereal pollen). Need to quantify the predictive power of X to explain variation in Y Y = f (X) e.g. Y = b0 + b1X (linear regression)

Null hypothesis (H0) is the opposite of our hypothesis (H1), namely that human impact had no effect on the lake-water nutrient status; i.e. b1 = 0 in Y = b0 + b1X (H0) b1 0 in Y = b0 + b1X (H1) Can do a regression-type analysis of Y in relation to X, estimate b1. How to evaluate statistical significance when data are non-normal and samples are not random? Use so-called randomisation or Monte Carlo permutation tests (Lecture 5). R CANOCO

3. Statistical Modelling or Regression Analysis Regression model Y = b0 + b1X [Inverse regression (= calibration) X = a0 + a1Y ] Types of regression depend on numbers of variables in Y and X Y = 1 X = 1 simple linear or non-linear regression Y = 1 X > 1 linear or non-linear multiple regression Y > 1 X 1 linear or non-linear multivariate regression (Y = response variable(s) X = predictor or explanatory variable(s)) Lectures 2 and 5 R CANOCO

4. Calibration (=Inverse Regression) Quantitative Environmental Reconstruction Xm = gYm + error whereXm = modern environment (e.g. July temperature) Ym = modern biological data (e.g. diatom %) g = modern ‘transfer function’ Xf = g Yf whereXf = past environmental variable Yf = fossil biological data Lecture 4 C2

5. Time-Series Analysis Values of one or more variables recorded over a long period of time as in a stratigraphical sequence. Values may vary with time. Variations may be long-term trends, short-term fluctuations, cyclical variation, and irregular or ‘random’ variation. Time-series analysis looks at correlation structure within a variable in relation to time, between variables in relation to time, trends within a variable, and periodicities or cycles within and between variables. Lecture 5 R

How to Transform Palaeoecological Data? Percentage data – square-root transformation helps to stabilise the variances and maximises the ‘signal’ to ‘noise’ ratio. Absolute data – log transformations (log(y+1)) helps to stabilise the variances and may maximise the ‘signal’ to ‘noise’ ratio. Often also very effective with percentage data. Stratigraphical data are in a fixed order. Need numerical methods that take account of this ordering (constrained classifications, constrained ordinations, restricted or constrained Monte Carlo permutation tests, time-series analysis). Basis of much quantitative palaeoecology is not only the stratigraphical ordering but also age chronology of the samples. Transformation of depth to age key stage. Chronology and age-depth modelling: Lecture 2.

What are the Basics Behind the Major Techniques? • Multivariate data analysis (Lecture 3) • Classification • Ordination • Constrained ordination (Lectures 4 and 5, Practical 4) • Confirmatory data analysis (Lecture 5, Practical 4) • Statistical modelling (Lecture 2, Practicals 1 and 2) • Quantitative environmental reconstruction (Lecture 4, Practical 3) • Time-series analysis (Lecture 5) • Only discuss topics 1, 2, and 3 in this lecture. Topics 4 and 5 will be covered in Lectures 4 and 5.

Classification – Two Major Types used in Palaeoecology 1. MULTIVARIATE DATA ANALYSIS • Agglomerative Hierarchical Cluster Analysis • Calculate matrix of proximity or dissimilarity coefficients between all pairs of n samples (½n(n-1)) • Clustering of objects into groups using stated criterion – ‘clustering’ or sorting strategy • Graphical display of results • Check for distortion

i Xi2 dij2 Variable 2 j Xj2 Xi1 Xj1 Variable 1 • Proximity or Distance or Dissimilarity Measures • Quantitative Data Euclidean distance dominated by large values Manhattan or city-block metric less dominated by large values sensitive to extreme values relates minima to average values and represents the relative influence of abundant and uncommon variables Bray & Curtis (percentage similarity)

QUANTITATIVE PALAEOECOLOGY