John Birks University of Bergen, University College London, and University of Oxford

Quantitative Environmental Reconstructions in Palaeoecology: Progress, current status, & future needs
John Birks, University of Bergen, University College London, and University of Oxford
Tage Nilsson Lecture, Centre for GeoBiosphere Science, University of Lund, 7 March 2013

INTRODUCTION
Early attempts at quantitative environmental reconstruction used the presence of one or more ‘indicator species’ (e.g. Andersson, Samuelsson, Iversen, Grichuk, Coope) or species groups (e.g. Hustedt, Nygaard).
A major development in Quaternary science came in 1971 with the publication of the classic paper by Imbrie & Kipp. The paper laid the foundation of calibration functions (transfer functions) as a tool for the quantitative reconstruction of past environments using the whole fossil assemblage, not just a few indicator species.
A paradigm shift, not only in palaeoceanography but also in quantitative palaeoecology.

Quickly followed by Webb & Bryson (1972), who used pollen data from the Midwest, USA, to reconstruct climate. Used linear-based canonical correlation analysis. Palaeoclimatology

Basic Biological Assumptions
Marine planktonic foraminifera – Imbrie & Kipp 1971
Foraminifera are a function of sea-surface temperature (SST) → foraminifera can be used to reconstruct past SST
Pollen is a function of regional vegetation – Webb & Bryson 1972
Regional vegetation is a function of climate → pollen is an indirect function of climate and can be used to reconstruct past regional climate at a broad spatial scale
Chironomids (aquatic non-biting midges) are a function of lake-water temperature – Walker et al. 1991
Lake-water temperature is a function of climate → chironomids are an indirect function of climate and can be used to reconstruct past climate, but problems may arise
Freshwater diatoms are a function of lake-water chemistry – Renberg & Hellberg 1982 → diatoms can be used to reconstruct past lake-water chemistry

Basic Approach to Quantitative Environmental Reconstruction – Calibration-in-Space
Fossil data Yf (e.g. diatoms; ‘proxy data’): t samples × m taxa (1, ..., m)
Environmental variable Xf (e.g. pH): t samples × 1 variable. Unknown, to be estimated or reconstructed
To solve for Xf, need modern data about species and pH from n samples

Modern biology Ym (e.g. diatoms): n samples × m taxa (1, ..., m)
Modern environment Xm (e.g. pH): n samples × 1 variable
Model Ym in relation to Xm to derive modern calibration function Ûm
Apply Ûm to Yf to estimate past environment Xf
Imbrie & Kipp provided the basic theory and assumptions, a robust method, and modern and fossil data

[Figure: calibration-in-space. Xm and Ym give the transfer function Ûm; Ûm applied to Yf gives Xf. Juggins & Birks (2012)]

Alternative Approach – Calibration-in-Time
Fossil data (e.g. diatoms), m taxa (1, ..., m): Y0 (p samples) and Yf (t samples)
Environmental variable (e.g. pH), 1 variable: X0 known from historical data (p observations); Xf unknown, to be reconstructed (t samples)
To solve for Xf, model Y0 in relation to X0, derive calibration function F̂0, and apply it to Yf to estimate Xf
All done at one site

Potential problems
• Temporal autocorrelation in Y0 and X0. How many independent samples are there? What is n?
• Chronological – sample correlation between Y0 and X0
• Applicability – can the model be applied to sites other than the one where the calibration is made?
Similar problem of applicability with intra-lake approach (Ym and Xm to derive Ûm from one lake applied to other non-training set lakes).
Only consider Calibration-in-Space

In palaeolimnology, after Nygaard’s (1956) α, ω, and ε indices and Meriläinen’s (1967) calibration, the first major step towards robust environmental reconstructions was made in 1982 by Renberg & Hellberg with their Index B
ind = indifferent species (either side of pH 7)
acp = acidophilous (pH < 7)
acb = acidobiontic (pH < 7, optimum 5.5 or less)
alk = alkaliphilous (pH 7 or more)
alb = alkalibiontic (pH > 7)

Renberg & Hellberg (1982) Represented a great breakthrough, only 30 years ago

State of the subject in palaeolimnology prior to 1989
Discussed diatom-pH calibration functions in Lund with Rick Battarbee in 1986. Suggested how they might be improved

Major breakthrough occurred in 1989 as a result of the work of Cajo ter Braak
1987 doctoral thesis; Advances in Ecological Research (1988)

Several important papers that have been very influential on quantitative palaeolimnology

Through his work at the Research Institute for Nature Management at Leersum, ter Braak advised ecologists about data analysis and developed many new techniques to help answer particular ecological questions. One such ecologist was the diatomist Herman van Dam who was working on the impact of acidification on diatoms and water chemistry of Dutch moorland ponds (this work led ter Braak to publish his first paper on multivariate data analysis (principal component biplots) in 1982).

This collaboration led to ter Braak & van Dam (1989), which changed the approaches to quantitative environmental reconstruction in palaeolimnology (and in much of palaeoecology)

Fortunately coincided with Surface Water Acidification Project’s (SWAP) Palaeolimnology Programme led by Rick Battarbee and Ingemar Renberg 1987-1990.

ter Braak & van Dam (1989)
99 training-set diatom-pH samples; 61 independent test-set diatom samples.
RMSEP is root mean squared error of prediction (‘standard error’). Generally want it as low as possible.
Set the scene for weighted-averaging based methods: computationally simple, heuristic equivalents to the theoretically more rigorous maximum-likelihood methods.
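As a minimal sketch of the RMSEP statistic named above: the square root of the mean squared difference between inferred and observed values in a test set. The function name and the pH values are hypothetical illustrations, not data from ter Braak & van Dam (1989).

```python
import math


def rmsep(observed, predicted):
    """Root mean squared error of prediction for paired test-set values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - o) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical measured and diatom-inferred pH for four test lakes
observed_ph = [4.5, 5.0, 5.5, 6.0]
inferred_ph = [4.7, 4.8, 5.9, 6.0]
print(round(rmsep(observed_ph, inferred_ph), 3))
```

RMSEP is in the units of the environmental variable (here pH units), which is why it is read as a ‘standard error’ of prediction.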

Biological Proxy-Data Properties
• Contain many taxa (200-300)
• Contain many zero values (absences)
• Commonly expressed as proportions or percentages: "closed" compositional data
• Multicollinearity between variables
• Quantitative data are highly variable and invariably show a skewed distribution. Few common taxa, many rare taxa
• Can show spatial autocorrelation, e.g. forams, dinocysts, pollen
• Taxa generally have a non-linear relationship with their environment, and the relationship is often a unimodal function of the environmental variables

Species Response Models
LINEAR: a straight line displays the linear relation between the abundance value (y) of a species and an environmental variable (x). Modelled by linear regression.
UNIMODAL: a unimodal relation between the abundance value (y) of a species and an environmental variable (x) (u = optimum or mode; t = tolerance; c = maximum). Modelled by Gaussian logit regression (GLR).
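A small sketch of how the unimodal parameters relate to a fitted Gaussian logit curve, assuming the usual quadratic parameterisation logit(p) = b0 + b1·x + b2·x² with b2 < 0. Under that assumption u = −b1/(2·b2), t = 1/√(−2·b2), and c is the fitted probability at x = u. The function names and coefficient values are hypothetical.

```python
import math


def glr_probability(x, b0, b1, b2):
    """Expected probability (or proportion) at x under a Gaussian logit curve."""
    linear_predictor = b0 + b1 * x + b2 * x * x
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def glr_to_utc(b0, b1, b2):
    """Convert quadratic-logit coefficients to optimum u, tolerance t, maximum c."""
    if b2 >= 0:
        raise ValueError("b2 must be negative for a unimodal (peaked) response")
    u = -b1 / (2.0 * b2)              # position of the peak
    t = 1.0 / math.sqrt(-2.0 * b2)    # width of the curve
    c = glr_probability(u, b0, b1, b2)  # height at the peak
    return u, t, c


# Hypothetical coefficients for one taxon along a pH gradient
u, t, c = glr_to_utc(b0=-15.125, b1=5.5, b2=-0.5)
print(u, t, c)
```

A taxon whose fitted b2 is not negative has no peak within this model, which is one reason some taxa cannot be given an optimum by GLR.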

Environmental Data Properties
• Generally few variables, often show a skewed distribution
• Strong multicollinearity (e.g. July mean temperature, growing season duration, annual mean temperature)
• Often difficult to obtain (few modern climate stations, corrections for altitude of sampling sites, etc.)
• Strong spatial autocorrelation (tendency of values at sites close to each other to resemble one another more than randomly selected sites). Values at one site can be partially predicted from values at neighbouring sites.
• Problem of nearly all data in the real world. Recognised by Francis Galton in 1889. First methods to eliminate spurious correlation due to spatial position developed by ‘Student’ in 1914.

PROGRESS
Since 1971, calibration functions have been widely used in palaeoceanography, terrestrial palaeoecology, and palaeolimnology
Used with a wide range of biological proxies
• foraminifera, radiolaria, marine diatoms, coccolithophores
• pollen, testate amoebae, mollusca, bryophytes, plant macrofossils
• diatoms, chrysophytes, chironomids, ostracods, cladocerans
Now many different numerical reconstruction methods: at least 26 methods published, many of them minor variants of established methods

Reconstruction methods can be divided into three main types (Birks et al. 2010)
• Indicator-species approach – one or many taxa considered as presence/absence
• Similarity-based assemblage methods involving a quantitative comparison between past assemblages Yf and modern assemblages Ym (e.g. MAT, smooth response surfaces)
• Multivariate calibration methods involving a quantitative calibration function Ûm estimated from Xm and Ym, the modern calibration or training data-set (e.g. weighted averaging regression and calibration)
Concentrate on calibration-function approach

Approaches to Estimating Calibration Functions
1. Basic Numerical Models
• Classical Approach
Y = f(X) + error (Y = biology, X = environment)
Estimate f by some mathematical procedure and 'invert' the estimated f to find the unknown past environment Xf from fossil data Yf
Xf ≈ f⁻¹(Yf)
Can be difficult computationally

• Inverse Approach
In practice, for various mathematical reasons, do an inverse regression or calibration
X = g(Y) + error
Xf = g(Yf)
Obtain 'plug-in' estimate of past environment Xf from fossil data Yf
f or g are calibration functions
Easier to compute g, and it nearly always performs as well as the classical approach

2. Assumed Species Response Model
• Linear or unimodal
• No response model assumed (linear or non-linear)
3. Dimensionality of Model
• Full (all species considered)
• Reduced (selected components of species used)
4. Estimation Procedure for Model
• Global (estimate parametric functions, extrapolation possible)
• Local (estimate non-parametric functions, extrapolation not possible)
Birks et al. (2010)

Commonly Used Methods
Key: I = inverse; C = classical; L = linear; U = unimodal; NA = not assumed; R = reduced dimensionality; F = full dimensionality; G = global parametric estimation; Ln = local non-parametric estimation; CF = calibration-function based; S = similarity-based

Good reasons for preferring methods with an assumed biological response model, full dimensionality, and global parametric estimation (ter Braak (1995), ter Braak et al. (1993), etc.)
• Can test statistically if taxon A has a statistically significant relation to particular environmental variables
• Can develop ‘artificial’ simulated data with realistic assumptions for numerical ‘experiments’
• Such methods have clear and testable assumptions: less of a ‘black box’ than e.g. artificial neural networks
• Can develop model evaluation or diagnostic procedures analogous to regression diagnostics in statistical modelling
• Having a statistical basis, can adopt well-established principles of statistical model selection and testing. Minimises ‘ad hoc’ aspects of MAT
“To make sense of an observation, everyone needs a model … whether he or she knows it or not” Marc Kéry (2010)

Basic Requirements in Quantitative Palaeoenvironmental Reconstructions
1. Need biological system with abundant fossils that is responsive and sensitive to environmental variables of interest.
2. Need a large, high-quality training set of modern samples. Should be representative of the likely range of variables, be of consistent taxonomy and nomenclature, be of highest possible taxonomic detail, be of comparable quality (methodology, count size, etc.), and be from the same sedimentary environment.
3. Need fossil set of comparable taxonomy, nomenclature, quality, and sedimentary environment.

4. Need robust statistical methods for regression and calibration that can adequately model taxa and their environment with the lowest possible error of prediction and the lowest bias possible, and sound methods for model selection.
5. Need means of establishing if reconstruction is statistically significant.
6. Need statistical estimation of standard errors of prediction for each reconstructed value.
7. Need statistical and ecological evaluation and validation of the reconstruction and of each reconstructed value.
Birks et al. (1990)

Early Methods Used
Principal components regression (PCR) = Imbrie & Kipp (1971) approach
[Diagram: Ym summarised as PC1, PC2, PC3; components related to Xm]
Multiple linear regression or quadratic regression of Xm on PC1, PC2, PC3, etc., to derive Ûm. Express Yf as principal components and apply Ûm to estimate Xf.
Principal components maximise variance within Ym only.
Selection of PCA components done visually until recently. Now cross-validation is used to select the model with fewest components, lowest root mean squared error of prediction (RMSEP), & lowest maximum bias. ‘Minimal adequate model’ in statistical modelling.
Inverse, linear, reduced dimensionality, global estimation. Linear response model is assumed, although non-linear responses are possible.

Index B approach
[Diagram: Ym grouped as Ind, Acp, Acb, Alk, Alb; groups + Xm give Index B (Ûm); Index B + Yf (fossil data) give Xf (pH reconstruction)]
Inverse, linear, reduced dimensionality, global parametric estimation. Needs a priori taxon groupings

Related inverse multiple linear regression approach (Davis & Berge 1980, Charles 1982, Davis et al. 1983, Davis & Anderson 1984, Flower 1986)
[Diagram: Ym grouped as Ind, Acp, Acb, Alk, Alb; groups + Xm give Ûm; Ûm + Yf (fossil data) give Xf (pH reconstruction)]
Inverse, linear, reduced dimensionality, global parametric estimation. Linear model is assumed, although non-linear responses are possible. Can be done with a priori species groups or individual taxa (forward selection).

Major Methods Used
Gaussian logit regression (GLR) and maximum likelihood (ML) calibration – ter Braak & van Dam (1989)
[Diagram: modern data Ym + Xm → GLR regression → coefficients b0, b1, b2 for all taxa (Ûm); fossil data Yf + Ûm → ML calibration → Xf (environmental reconstruction)]
Classical, unimodal, full dimensionality, global estimation. Robust to spatial autocorrelation. Can be computationally difficult.
ML finds the value of Xf that maximises the likelihood function given Yf and Ûm

Two-way weighted averaging regression and calibration (WA) – ter Braak & van Dam (1989); Birks et al. (1990)
[Diagram: modern data Ym + Xm → WA regression → taxa WA optima U1, U2, ..., Ut (‘calibration function’ Ûm); fossil data Yf + Ûm → WA calibration → Xf (environmental reconstruction)]
Inverse, unimodal, full dimensionality, global parametric estimation. Robust to spatial autocorrelation.
First used in Quaternary science by Lynts and Judd (1971) Science 171: 1143-1144
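A minimal sketch of two-way WA, assuming the standard formulation: WA regression estimates each taxon optimum as the abundance-weighted mean of the modern environmental values, and WA calibration estimates a sample's environment as the abundance-weighted mean of the optima of the taxa it contains. Because averages are taken twice, the initial estimates are shrunk towards the training-set mean; the sketch uses inverse deshrinking (a linear regression of observed on initial values), which is one of several options. All function names and data are hypothetical.

```python
def wa_optima(Y, x):
    """WA regression: taxon optimum = abundance-weighted mean of x."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        if total > 0:
            optima.append(sum(row[k] * xi for row, xi in zip(Y, x)) / total)
        else:
            optima.append(None)  # taxon absent from the training set
    return optima


def wa_initial(Y, optima):
    """WA calibration: abundance-weighted mean of optima of the taxa present."""
    estimates = []
    for row in Y:
        pairs = [(a, u) for a, u in zip(row, optima) if a > 0 and u is not None]
        total = sum(a for a, _ in pairs)
        if total == 0:
            raise ValueError("sample has no taxa with a modern optimum")
        estimates.append(sum(a * u for a, u in pairs) / total)
    return estimates


def inverse_deshrink(initial, x):
    """Least-squares line x = a + b * initial; returns (a, b)."""
    n = len(x)
    mean_i, mean_x = sum(initial) / n, sum(x) / n
    sxx = sum((v - mean_i) ** 2 for v in initial)
    sxy = sum((v - mean_i) * (xi - mean_x) for v, xi in zip(initial, x))
    slope = sxy / sxx
    return mean_x - slope * mean_i, slope


def wa_reconstruct(Ym, xm, Yf):
    """Fit WA on modern data (Ym, xm) and apply it to fossil data Yf."""
    optima = wa_optima(Ym, xm)
    a, b = inverse_deshrink(wa_initial(Ym, optima), xm)
    return [a + b * v for v in wa_initial(Yf, optima)]


# Hypothetical training set: 4 lakes x 3 taxa (percentages) with measured pH
Ym = [[80, 20, 0], [40, 50, 10], [10, 50, 40], [0, 20, 80]]
xm = [4.5, 5.5, 6.5, 7.5]
# Hypothetical fossil samples from one core
Yf = [[20, 60, 20], [70, 25, 5]]
print(wa_reconstruct(Ym, xm, Yf))
```

Absent taxa contribute nothing to either weighted average, which is the sense in which WA ignores absences.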

• Ecologically plausible – based on unimodal species response model.
• Mathematically simple but has a rigorous mathematical theory. Properties fairly well known now.
• Empirically powerful:
  • does not assume linear responses
  • not hindered by too many taxa, in fact helped by many taxa! Full dimensionality
  • relatively insensitive to outliers
• Tests with simulated and real data – at its best with noisy, taxon-rich compositional percentage data with many zero values over long environmental gradients.
• Because of its computational simplicity, can derive error estimates for inferred values by bootstrapping.
• Does well in ‘non-analogue’ situations as it is not based on the assemblage as a whole but on INDIVIDUAL taxa optima and/or tolerances. Robust to spatial autocorrelation. Global parametric estimation.
• Ignores absences of taxa.

Weaknesses
1. Sensitive to distribution of environmental variable in training set, leading to ‘edge effects’ where responses are truncated. J. Oksanen (2002)
[Figure: WA and GLR estimates along the pH gradient]
2. Disregards residual correlations in biological data.
Can extend WA to WA-partial least squares to include residual correlations in biological data in an attempt to improve estimates of taxon optima

Weighted averaging partial least squares regression and calibration (WA-PLS) – ter Braak & Juggins (1993) and ter Braak et al. (1993)
[Diagram: Ym + Xm → WA-PLS regression → components PLS1, PLS2, PLS3 → coefficients βm (Ûm); Yf + Ûm → WA-PLS calibration → Xf]
Components selected to maximise covariance between taxon weighted averages and environmental variable X.
Selection of number of PLS components to include based on cross-validation. Model selected should have fewest components possible and low RMSEP and maximum bias: minimal adequate model.
Inverse, unimodal, reduced dimensionality, global parametric estimation. Can be sensitive to spatial autocorrelation.

Comparison of different methods
Imbrie & Kipp (1971) data
Model performance statistic is root mean squared error of prediction (RMSEP) based on leave-one-out cross-validation
[Table: linear methods compared with unimodal methods]
Shows importance of using a unimodal-based method (ter Braak et al. (1993))

Other Areas of Progress
Besides the development of new methods for deriving calibration functions and of modern calibration data-sets, there have been major developments in model evaluation and selection, in reconstruction assessment (the statistics of calibration functions), and in understanding the strengths and weaknesses of different methods and their underlying theory.
See Juggins (2013 QSR)

1. Model evaluation and selection Tendency to use several different methods and to select so-called ‘best’ method. Resulted in a shift from an obsession with the model with lowest RMSEP or, even worse, the highest r2. More concern with model performance statistics including estimates of bias and number of components fitted (e.g. in WA-PLS). Model performance usually based on some form of internal cross-validation (leave-one-out, n-fold cross-validation, or bootstrapping) or external cross-validation with independent test-set. Juggins & Birks (2012)
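The leave-one-out form of internal cross-validation can be sketched as follows: each modern sample is removed in turn, the model is refitted on the remaining samples, and the removed sample is predicted. RMSEP and mean bias are then computed from the prediction residuals. The sketch assumes a simple WA model without deshrinking as the model under test; the function names and data are hypothetical, and any other fit/predict pair could be passed in.

```python
import math


def wa_fit(Y, x):
    """Taxon optima as abundance-weighted means of x (None if taxon absent)."""
    optima = []
    for k in range(len(Y[0])):
        total = sum(row[k] for row in Y)
        optima.append(
            sum(row[k] * xi for row, xi in zip(Y, x)) / total if total > 0 else None
        )
    return optima


def wa_predict(optima, row):
    """Abundance-weighted mean of the optima of the taxa present in one sample."""
    pairs = [(a, u) for a, u in zip(row, optima) if a > 0 and u is not None]
    total = sum(a for a, _ in pairs)
    if total == 0:
        raise ValueError("sample has no taxa with a modern optimum")
    return sum(a * u for a, u in pairs) / total


def leave_one_out(Y, x, fit, predict):
    """Return (RMSEP, mean bias, residuals) from leave-one-out cross-validation."""
    residuals = []
    for i in range(len(Y)):
        model = fit(Y[:i] + Y[i + 1:], x[:i] + x[i + 1:])  # refit without sample i
        residuals.append(predict(model, Y[i]) - x[i])
    n = len(residuals)
    rmsep = math.sqrt(sum(r * r for r in residuals) / n)
    mean_bias = sum(residuals) / n
    return rmsep, mean_bias, residuals


# Hypothetical training set: 4 lakes x 3 taxa (percentages) with measured pH
Ym = [[80, 20, 0], [40, 50, 10], [10, 50, 40], [0, 20, 80]]
xm = [4.5, 5.5, 6.5, 7.5]
rmsep, mean_bias, residuals = leave_one_out(Ym, xm, wa_fit, wa_predict)
print(rmsep, mean_bias)
```

Refitting inside the loop is the essential step: if the optima were estimated once from all samples, the resulting error would be an apparent error rather than a cross-validated one.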

Birks & Simpson (2013) revisited the classical SWAP 167-sample diatom-pH calibration-set using modern methods (WA, WAPLS, GLR, MAT, etc.)
1. Internal cross-validation, done 50 times
167 samples → 110 training-set samples + 20 optimisation-samples (no. of WAPLS components, etc.) + 37 test-samples
2. External cross-validation, done 50 times
167 samples → 167 training-set samples + 23 external optimisation-samples + 50 external test-samples

Internal cross-validation 37 test-samples 50 randomisations Birks & Simpson (2013)

External cross-validation 50 test-samples 50 randomisations Birks & Simpson (2013)

Internal cross-validation RMSEP values (I = inverse; C = classical; M = monotonic; T = tolerance downweighting)
WAI = WAC = WAM = WTM < WATI = WATC = MAT < WAPLS < GLR
External cross-validation
GLR < WAM = WTM < WAI = WAPLS < WAC < WATI < MAT < WATC
Which to use as a guide to model selection?
External cross-validation involving independent test-set samples is ‘the appropriate benchmark to compare methods’ because all sources of error are considered (ter Braak & van Dam 1989)

van der Voet (1994) randomisation test of models helps find ‘minimal adequate model’ (MAM). Model with good performance statistics and fewest number of fitted parameters. May be more than one MAM. More work needed on model selection using criteria like Akaike Information Criterion (AIC) where unnecessary parameters are penalised. Active research area in ecology and evolutionary biology today. Of course, performance of modern model is being assessed with other modern data, not with fossil data! Major problem. External cross-validation provides as rigorous a test as possible of performance.

2. Effects of spatial autocorrelation
Estimating model performance in terms of RMSEP, r2, maximum bias, etc., assumes that the test-set is statistically independent of the training-set. Cross-validation in the presence of spatial autocorrelation violates this assumption, as test samples are not spatially and statistically independent.
Spatial autocorrelation is a property of almost all environmental data and much ecological and biological data.
Telford & Birks (2005) Quat. Sci. Rev. 24: 2173-2179
Telford (2006) Quat. Sci. Rev. 25: 1375-1382
Telford & Birks (2009) Quat. Sci. Rev. 28: 1309-1316
Telford & Birks (2011) Quat. Sci. Rev. 30: 3210-3213

Results show that the apparent performance of some models is enhanced as a result of spatial autocorrelation in oceans and on land.
Problems in finding spatially independent test-sets to test inference models.
Telford & Birks (2009) have developed methods for cross-validating a calibration function in the presence of spatial autocorrelation: h-block cross-validation.
Spatial autocorrelation does not appear to be a problem in many palaeolimnological calibration-sets. May be a problem in within-lake calibration-sets developed for water-level reconstructions (Velle et al. 2012)
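The idea of h-block cross-validation can be sketched as a leave-one-out loop in which, for each test sample, all training samples within distance h of it are also excluded. The sketch below is an illustration under stated assumptions, not the published implementation: it uses planar Euclidean coordinates rather than geographic distances, and a single nearest-analogue predictor as a stand-in for the model under test. All names and data are hypothetical.

```python
import math


def h_block_training_indices(coords, i, h):
    """Indices of samples farther than h from sample i (sample i itself excluded)."""
    return [
        j for j in range(len(coords))
        if j != i and math.dist(coords[i], coords[j]) > h
    ]


def nearest_analogue(Y_train, x_train, row):
    """Stand-in model: environment of the most similar training assemblage."""
    best = min(range(len(Y_train)), key=lambda j: math.dist(Y_train[j], row))
    return x_train[best]


def h_block_rmsep(Y, x, coords, h):
    """RMSEP from leave-one-out with an exclusion radius h around each test sample."""
    squared = []
    for i in range(len(Y)):
        train = h_block_training_indices(coords, i, h)
        if not train:
            raise ValueError("h is too large: no training samples remain")
        prediction = nearest_analogue(
            [Y[j] for j in train], [x[j] for j in train], Y[i]
        )
        squared.append((prediction - x[i]) ** 2)
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical transect: six sites one distance unit apart, with assemblages
# and temperature both changing smoothly along it (strong autocorrelation)
coords = [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0), (5, 0)]
Y = [[100, 0], [80, 20], [60, 40], [40, 60], [20, 80], [0, 100]]
x = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0]
print(h_block_rmsep(Y, x, coords, h=0.0))  # ordinary leave-one-out
print(h_block_rmsep(Y, x, coords, h=1.5))  # immediate neighbours excluded
```

With h = 0 every site is predicted from an adjacent site, so the error looks small; excluding neighbours within h gives a larger and less optimistic RMSEP, which is the enhancement of apparent performance described above.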

3. Partitioning Root Mean Squared Error of Prediction
Model uncertainty commonly expressed as RMSEP
Can only hope to reduce RMSEP by 20-25%

4. Testing the statistical significance of a quantitative palaeoenvironmental reconstruction
All calibration-function programs will produce output or ‘reconstruction’.
Does the resulting reconstruction explain more of the variance in the fossil data than most (say 95%) reconstructions derived from calibration functions trained on random environmental data? If it does, then it is statistically significant.
Global test of significance
Telford & Birks 2011 Quat. Sci. Rev. 30: 1272-1278
H.H. Birks et al. 2012 Quat. Sci. Rev. 33: 100-120
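The logic of this test can be sketched as follows, with several simplifying assumptions that are not taken from the cited papers: the calibration function is plain WA without deshrinking, the variance in the fossil data explained by a reconstruction is computed as in a redundancy analysis with a single constraining variable on untransformed data, and the random environmental variables are drawn from a uniform distribution. Function names, data, and the number of random trials are hypothetical.

```python
import random


def wa_reconstruct(Ym, xm, Yf):
    """Plain WA (no deshrinking): taxon optima from (Ym, xm) applied to Yf."""
    optima = []
    for k in range(len(Ym[0])):
        total = sum(row[k] for row in Ym)
        optima.append(
            sum(row[k] * xi for row, xi in zip(Ym, xm)) / total if total > 0 else None
        )
    recon = []
    for row in Yf:
        pairs = [(a, u) for a, u in zip(row, optima) if a > 0 and u is not None]
        total = sum(a for a, _ in pairs)
        if total == 0:
            raise ValueError("fossil sample has no taxa with a modern optimum")
        recon.append(sum(a * u for a, u in pairs) / total)
    return recon


def variance_explained(Yf, recon):
    """Proportion of total fossil variance explained by a linear fit on recon."""
    n = len(Yf)
    mean_r = sum(recon) / n
    rc = [r - mean_r for r in recon]
    ss_r = sum(v * v for v in rc)
    total_ss, explained_ss = 0.0, 0.0
    for k in range(len(Yf[0])):
        column = [row[k] for row in Yf]
        mean_c = sum(column) / n
        cc = [c - mean_c for c in column]
        total_ss += sum(c * c for c in cc)
        if ss_r > 0:
            cross = sum(c * r for c, r in zip(cc, rc))
            explained_ss += cross * cross / ss_r  # regression SS for this taxon
    return explained_ss / total_ss if total_ss > 0 else 0.0


def random_tf_test(Ym, xm, Yf, n_random=199, seed=1):
    """Observed variance explained and its p-value against random-environment models."""
    rng = random.Random(seed)
    observed = variance_explained(Yf, wa_reconstruct(Ym, xm, Yf))
    null = []
    for _ in range(n_random):
        x_random = [rng.uniform(0.0, 1.0) for _ in xm]
        null.append(variance_explained(Yf, wa_reconstruct(Ym, x_random, Yf)))
    p_value = (1 + sum(v >= observed for v in null)) / (n_random + 1)
    return observed, p_value


# Hypothetical modern (6 lakes x 3 taxa, with pH) and fossil (5 levels) data
Ym = [[80, 15, 5], [60, 30, 10], [40, 40, 20], [20, 40, 40], [10, 30, 60], [5, 15, 80]]
xm = [4.5, 5.0, 5.5, 6.0, 6.5, 7.0]
Yf = [[70, 20, 10], [50, 35, 15], [30, 40, 30], [15, 35, 50], [10, 20, 70]]
observed, p_value = random_tf_test(Ym, xm, Yf)
print(observed, p_value)
```

A reconstruction would be called significant at the 5% level when p_value is at most 0.05. With so few taxa almost any variable yields a reconstruction that tracks the main fossil gradient, so a toy data-set like this one mainly illustrates the mechanics.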
