Analysing the link between traits & invasive spread in German flora: accounting for residence time

Joint work between

Eva Küster, Ingolf Kühn ~ UFZ

Adam Butler, Stijn Bierman, Glenn Marion ~ BioSS

Athens ALARM meeting, January 2007

- Direct data on the arrival, establishment & spread of invasive species are typically not available at the national or pan-European levels
- Indirect data about the traits & current spatial distribution of species that invaded in the past can be used to identify correlative relationships between traits and invasive success, accounting for phylogeny
- Data on traits are often missing or ambiguous, however, creating serious problems for the analysis – we look at how to address these using Bayesian methods

- We analyse data on German vascular plants
- Biolflor (www.ufz.de/biolflor): database with information on traits & phylogeny of 3660 species
- Florkart (www.floraweb.de): database with information on presence/absence of 4000+ species for 2995 grid cells within Germany

- We look at neophyte species (arrivals since 1490), excluding ephemerophytes: there are 388 such species
- We use the # of grid cells occupied as a measure of invasive success

Morphology

Life form

Growth form

Life span

Generative reproductive cycles

Propagation & dispersal

Types of storage organs

Existence of storage organs

Types of shoot metamorphoses

Types of root metamorphoses

Leaf traits

Leaf persistence

Leaf anatomy

Leaf form

Flowering phenology

Beginning of flowering season

Length of flowering season

End of flowering season

Genetics

Ploidy

DNA content

Niche breadth in Germany

# hemerobic levels

Urbanity

# of habitat types

# of vegetation formations

# phytosociological classes

Diaspores & germinules

Types of diaspores

Weights of diaspores

Weights of germinules

Invasive history

Mode of introduction

Residence time

Life strategy

Ecological strategy

Ruderal life strategy

Native global distribution

Floristic zones of native area

# floristic zones in native area

Continent of native area

# continents in native area

Native in old or new world?

Oceanity of native area

Amplitude of oceanity

Floral & reproductive biology

Strategy types of reproduction

Mating strategy

Pollen vector

Flower colour

Floral UV pattern

Floral UV reflection

Blossom type

- Regress log(# grid cells occupied) onto each of the ~40 individual traits in turn, in the presence of phylogenetic variables
- Retain only traits that are significant at the 95% level, exclude non-predictive traits, & then use cluster analysis to further reduce the set of traits
- Use AIC to select the best model from within this set of traits, including interactions
- At all stages, use only those species that have complete data for all traits currently in the model
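The AIC step with complete cases can be sketched as follows. This is a minimal illustration in Python with NumPy, not the UFZ implementation: it assumes numeric traits, ignores interactions and phylogenetic variables, and takes complete cases over the whole candidate set so that the AIC values are comparable. The function names, trait names and simulated data are all hypothetical.

```python
import itertools
import numpy as np

def gaussian_aic(y, X):
    """AIC of an ordinary least-squares fit of y on X (X includes the intercept)."""
    n, k = X.shape
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ coef) ** 2))
    # k regression coefficients plus one residual variance
    return n * np.log(rss / n) + 2 * (k + 1)

def best_subset_by_aic(log_cells, traits):
    """traits: dict of name -> 1-D float array, with np.nan marking missing values."""
    names = list(traits)
    T = np.column_stack([traits[name] for name in names])
    complete = ~np.isnan(T).any(axis=1)      # species with data on every trait
    y, T = log_cells[complete], T[complete]
    best = (np.inf, ())
    for size in range(len(names) + 1):
        for subset in itertools.combinations(range(len(names)), size):
            X = np.column_stack([np.ones(len(y))] + [T[:, j] for j in subset])
            aic = gaussian_aic(y, X)
            if aic < best[0]:
                best = (aic, tuple(names[j] for j in subset))
    return best, int(complete.sum())

# Simulated example (hypothetical data): occupancy depends on residence time only
rng = np.random.default_rng(0)
n = 60
residence = rng.uniform(10, 500, n)
diaspore_weight = rng.normal(0, 1, n)
log_cells = 1.0 + 0.01 * residence + rng.normal(0, 0.5, n)
residence[:5] = np.nan                       # introduce missing trait values
diaspore_weight[3:8] = np.nan
(best_aic, best_traits), n_used = best_subset_by_aic(
    log_cells, {"residence": residence, "diaspore_weight": diaspore_weight})
print(best_traits, n_used)
```

Note how eight of the sixty simulated species drop out even though each trait is missing for only five of them: this is the loss of data that motivates the missing-data work later in the talk.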

- Compute the patristic distance matrix based on the phylogenetic codes given in biolflor
- For the current set of species –
- apply a principal coordinate analysis to the relevant part of the distance matrix
- retain only axes associated with positive eigenvalues
- then retain the axes that account for the first 80% of variation
- then regress log(# grid cells occupied) onto the remaining axes and retain only those that are significant at the 95% level

- The phylogenetic variables need to be recomputed whenever the set of species is changed
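The principal coordinate steps above can be sketched as follows. This is a minimal NumPy illustration of classical principal coordinate analysis, assuming that "80% of variation" is measured relative to the sum of the positive eigenvalues; the function name `pcoa_axes` and the toy distance matrix are hypothetical, and the final regression-based screening of axes is not shown.

```python
import numpy as np

def pcoa_axes(dist, keep_fraction=0.8):
    """Classical principal coordinate analysis of a symmetric distance matrix."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n           # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J                # Gower double-centring
    eigval, eigvec = np.linalg.eigh(B)
    order = np.argsort(eigval)[::-1]              # largest eigenvalue first
    eigval, eigvec = eigval[order], eigvec[:, order]
    positive = eigval > 1e-10 * eigval.max()      # retain positive eigenvalues only
    eigval, eigvec = eigval[positive], eigvec[:, positive]
    explained = np.cumsum(eigval) / eigval.sum()
    # smallest number of axes whose cumulative share reaches keep_fraction
    n_keep = min(int(np.searchsorted(explained, keep_fraction)) + 1, len(eigval))
    return eigvec[:, :n_keep] * np.sqrt(eigval[:n_keep]), explained[:n_keep]

# Toy example: distances between six points in the plane (not phylogenetic data)
pts = np.array([[0., 0.], [3., 0.], [0., 4.], [3., 4.], [1., 1.], [5., 2.]])
dist = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
axes, explained = pcoa_axes(dist)
print(axes.shape, explained)
```

Because the axes depend on which rows and columns of the distance matrix are used, the function has to be called again on the relevant sub-matrix whenever the set of species changes.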

- A large number of species are currently excluded from the final analysis as data are missing on some of their traits
- This is inefficient, & could potentially lead to bias if the data are missing not at random
- The missing data arise from different sources –
- there being no record in the Biolflor database
- the qualifier in Biolflor suggesting that data quality is poor
- multiple states being recorded for a particular trait
- a very rare state being recorded

- Residence time is a particularly important variable because
- it has good explanatory power to describe occupancy
- it partly accounts for the dynamic nature of invasive processes
- it allows us to make time-specific predictions about occupancy

- However, data on German residence times are only available for 171 species, & for 35 of these only to the nearest century
- Some auxiliary data are available for neighbouring countries
- How can we properly include residence time into the analysis, given the large proportion of missing data?

- The aims of our research on this at BioSS –
- to explore how sensitive the results of inferences are to the assumptions that we make about missing data
- to analyse the data in such a way that species with missing data for some traits do not need to be excluded
- to relate the outputs from the analysis to invasive risk

- We work with the Biolflor-Florkart data, and focus upon missing data for residence times; however, the methodological ideas are widely applicable

- Application to the prediction of invasive risk
- e.g. Use traits & phylogeny to infer the number of cells that a recently arrived species is likely to occupy after N years of residence
- This number is uncertain, so it will be a probability distribution rather than a single number

- An alternative approach to statistical modelling and inference, in which data are regarded as fixed and parameters are regarded as random
- Increasingly widely used: due to improvements in computational power it is now often possible to fit more advanced models using Bayesian inference than using classical statistical methods
- Particularly suitable for problems that involve missing data
- Implemented using free software called WinBUGS: extremely powerful but not particularly user-friendly…

- Basic model
- log yi ~ N(α + βxi + γzi + δri, σ²)
- …just the same as a GLM
- Prior distributions
We use uninformative priors

α, β, γ, δ ~ N(0, 1000)

σ² ~ Gamma(1/1000, 1/1000)
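As a rough illustration of what the basic model and its priors imply, the unnormalised log posterior can be written down directly. This is a sketch under several assumptions that the slide does not settle: the regression parameters are named alpha, beta, gamma, delta and sigma2 here; x and z are each a single numeric variable per species; the second argument of N(0, 1000) is read as a variance; and the Gamma(1/1000, 1/1000) prior is read as shape and rate, placed on the variance. WinBUGS itself parameterises normal distributions by precision, so the actual model file may differ.

```python
import math
import numpy as np

def log_posterior(alpha, beta, gamma, delta, sigma2, log_y, x, z, r,
                  prior_var=1000.0, shape=0.001, rate=0.001):
    """Unnormalised log posterior for
    log y_i ~ N(alpha + beta*x_i + gamma*z_i + delta*r_i, sigma2)."""
    if sigma2 <= 0:
        return -math.inf
    resid = log_y - (alpha + beta * x + gamma * z + delta * r)
    log_lik = (-0.5 * len(log_y) * math.log(2 * math.pi * sigma2)
               - 0.5 * float(np.sum(resid ** 2)) / sigma2)
    # Independent N(0, prior_var) priors on the four regression parameters
    log_prior_coef = sum(-0.5 * c ** 2 / prior_var
                         for c in (alpha, beta, gamma, delta))
    # Gamma(shape, rate) prior on sigma2 (assumed parameterisation)
    log_prior_sigma2 = (shape - 1) * math.log(sigma2) - rate * sigma2
    return float(log_lik + log_prior_coef + log_prior_sigma2)

# Simulated data (hypothetical values, standardised covariates)
rng = np.random.default_rng(2)
x, z, r = rng.normal(size=50), rng.normal(size=50), rng.normal(size=50)
log_y = 1.0 + 0.5 * x - 0.3 * z + 0.8 * r + rng.normal(0, 0.5, 50)
print(log_posterior(1.0, 0.5, -0.3, 0.8, 0.25, log_y, x, z, r))
```

An MCMC sampler such as the one in WinBUGS explores this surface; with priors this diffuse, the posterior is dominated by the likelihood.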

- Recast the UFZ methodology in a Bayesian context, and implement this in WinBUGS
- Use this to explore potential refinements or extensions to the current analysis
- Assess sensitivity to the assumptions about missing data, phylogenetic dependence and distribution of the response variable (log-normal or Binomial)
- Develop ways of dealing more efficiently with missing data

- Bayesian
LPJ code: Ben Smith, Stephen Sitch, Sybil Schapoff

CRU data: David Viner

GCM data: PCMDI

Statistical methods: Jonathan Rougier, Chris Glasbey

Uncertainty analysis: Bjoern Reineking, Stijn Bierman

Notation, for species i:

yi = # of grid cells occupied

ri = residence time

xi = other trait data

zi = phylogenetic variables

MCMC details:

Burn-in = 5000, Sample = 2000

Thinning ratio = 1:50

- When data on residence times are missing, we can treat them as random variables
- We can use data on the other traits, phylogeny & number of grid cells occupied to infer the distribution of the residence time for a particular species i
e.g.

log ri ~ N(exp{a + bxi + czi + dyi}, s2)

- Use of the cut function ensures this does not bias inferences about the basic-model parameters α, β, γ, δ and σ²
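A single imputation draw from the residence-time model above might look like the following sketch. It takes the model exactly as written on the slide, treats x, z and y as one numeric value per species, and uses made-up parameter values; the function name `impute_residence` is hypothetical. It does not reproduce the WinBUGS cut function, which controls how information flows between the two models during sampling.

```python
import numpy as np

def impute_residence(r, x, z, y, a, b, c, d, s2, rng):
    """Fill missing residence times (np.nan) with one draw from
    log r_i ~ N(exp{a + b*x_i + c*z_i + d*y_i}, s2)."""
    r = np.array(r, dtype=float)                 # copy, so the input is untouched
    missing = np.isnan(r)
    mean_log_r = np.exp(a + b * x[missing] + c * z[missing] + d * y[missing])
    r[missing] = np.exp(rng.normal(mean_log_r, np.sqrt(s2)))
    return r, missing

# Hypothetical example: four species, two with unknown residence time
rng = np.random.default_rng(1)
r_obs = np.array([120.0, np.nan, 300.0, np.nan])
x = np.array([0.2, -0.1, 0.5, 0.0])
z = np.array([0.1, 0.3, -0.2, 0.0])
y = np.array([120.0, 40.0, 800.0, 15.0])         # grid cells occupied
filled, missing = impute_residence(r_obs, x, z, y,
                                   a=1.5, b=0.1, c=0.1, d=0.0005, s2=0.25, rng=rng)
print(filled)
```

In a full Bayesian fit this draw is repeated at every MCMC iteration, so the uncertainty about each missing residence time carries through to the other inferences.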

Figure annotation: pink result based on 124 species; other results based on 345 species; 42 species excluded

Figure annotation: pink result based on 135 species; other results based on 379 species; 8 species excluded

Figure annotation: pink result based on 135 species; other results based on 379 species; 8 species excluded

Figure annotation: pink result based on 108 species; other results based on 329 species; 58 species excluded

(Note: the posterior probability that δ > 0, where δ is the coefficient of residence time, is always > 0.99)

- Our model assumes that the data on residence times are missing at random, as does the approach of excluding missing data
- We can also consider possible mechanisms by which the missing data might be related to the variables of interest

Let oi = 1 if residence time observed for species i, 0 otherwise

- We could assume that
- oi ~ Binomial(1, logit⁻¹{A + Bxi + Czi + Dyi + Eri})
- The parameter E cannot be estimated, but we can assess sensitivity to its value; we assume here that E is negative
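The role of E can be illustrated by evaluating the observation probability for a few fixed values. This is a minimal sketch using only the Python standard library; the function name `prob_observed` and all parameter values are hypothetical, chosen only to show the direction of the effect.

```python
import math

def prob_observed(x, z, y, r, A, B, C, D, E):
    """P(o_i = 1) = logit^-1(A + B*x_i + C*z_i + D*y_i + E*r_i)."""
    eta = A + B * x + C * z + D * y + E * r
    return 1.0 / (1.0 + math.exp(-eta))

# Sensitivity check: hold everything fixed except E and the residence time r
for E in (0.0, -0.005, -0.01):
    p_short = prob_observed(0.0, 0.0, 100.0, 50.0, A=0.5, B=0.2, C=0.1, D=0.001, E=E)
    p_long = prob_observed(0.0, 0.0, 100.0, 400.0, A=0.5, B=0.2, C=0.1, D=0.001, E=E)
    print(f"E={E}: P(observed | r=50)={p_short:.3f}, P(observed | r=400)={p_long:.3f}")
```

With E = 0 the probability of observing a residence time does not depend on the residence time itself. With E negative, species with longer residence times are less likely to have a recorded value, so the data are no longer missing at random.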

- Relatively low proportions of missing data for the other key traits: we can just exclude these species when we look at traits individually, but this is more problematic when we look at the effects of multiple traits
- Most “missing data” for the other key traits arise because rare or duplicate trait states are recorded in Biolflor
- We would like to incorporate this information directly into the analysis, rather than attempting to impute the missing values
- We can deal with duplicate states either by:
- assuming that the parameter for species that have both states is the average of the parameters for the two states; or
- including a separate parameter for species that have duplicate states
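Both options amount to different codings of the design matrix for a categorical trait. The sketch below is a minimal NumPy illustration; the function name `state_design`, the column label "duplicate" and the example states are hypothetical and not taken from Biolflor.

```python
import numpy as np

def state_design(records, states, separate_duplicates=False):
    """Design-matrix columns for one categorical trait.

    records: one set of recorded states per species.
    separate_duplicates=False: a species with several states gets equal weight
        on each, so its effect is the average of the state parameters.
    separate_duplicates=True: such species get their own 'duplicate' column.
    """
    cols = list(states) + (["duplicate"] if separate_duplicates else [])
    X = np.zeros((len(records), len(cols)))
    for i, rec in enumerate(records):
        if separate_duplicates and len(rec) > 1:
            X[i, cols.index("duplicate")] = 1.0
        else:
            for s in rec:
                X[i, cols.index(s)] = 1.0 / len(rec)
    return cols, X

# Hypothetical example: the third species has both states recorded
records = [{"annual"}, {"perennial"}, {"annual", "perennial"}]
cols_avg, X_avg = state_design(records, ["annual", "perennial"])
cols_sep, X_sep = state_design(records, ["annual", "perennial"], separate_duplicates=True)
print(cols_avg, X_avg, cols_sep, X_sep, sep="\n")
```

Under the averaging option no extra parameter is needed, while the separate-parameter option lets species with duplicate states differ from both single-state groups at the cost of one more parameter per trait.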

Missing data in current analysis

- The imputation model allows us to draw inferences about residence times for species where the arrival date is unknown
- The performance of the imputation model depends upon it containing regressors that are strongly correlated with residence time in Germany
- Possibility of using data on residence in a neighbouring country, ni, as an explanatory variable:
log ri ~ N(exp{a + bxi + czi + dyi + eni }, s2)

- UFZ are using the species-level model to identify key traits for invasive success, & then a spatial approach to estimate impact of environmental change on these
- A non-spatial approach might involve grouping cells according to environmental characteristics, & fitting the species-level model separately for each group of cells
- We are interested in comparing these approaches