Niches, distributions… and data Miguel Nakamura Centro de Investigación en Matemáticas (CIMAT), Guanajuato, Mexico email@example.com Warsaw, November 2007
“Data” Environmental layers Presences
inferred using realized in Niche models Nature produces use Data Niche and distribution concepts conceived in Ecological theory defines Theoretical niche Distribution
Premise #1: an observation is the result of at least two, multi-factor processes • Biology: the fundamental niche, biotic conditions, sink populations, etc. • Humans: the collector introduces bias, methods used determine detection, etc.
Premise #2: randomness involved • If sites 1 and 2 both have equal conditions X as far as we can see, it does NOT necessarily follow that “species present at site 1 implies species present at site 2” • Reason: apart from conditions X, there may be other, non-visualized conditions, Z, that also influence presence. These may differ between site 1 and site 2. • Refer to probabilities of presence at a site having conditions X, instead of a deterministic statement, “species is present at a site”.
“Probability trees” • Graphical devices for tracking random experiments, especially when sequential processes or stages are involved. • Probabilities can be assigned to branches, for calculations. • Example: die is cast to observe number of spots (N), then N coins are tossed and number of heads counted.
Die # Heads Probability of this branch=(1/6)×(.50) .50 0 .50 1 1 0 .25 .50 1 2 1/6 .25 2 Probability of this cluster=sum of probabilities of individual branches 1/6 .125 0 .375 3 1/6 1 .375 2 1/6 .125 4 3 Begin 1/6 etc. 5 1/6 6
Species present? Site visited? Species detect? False absence False presence False absence False presence False presence False absence True presence True absence A site Elementary probability tree for describing occurrence data
Species moved? Biotic OK? Abiotic OK? Site visited? Species detect? Presence-only data: the probability of this branch is product of all probabilities in its path. This is data niche models will use. More-elaborate probability tree: “biological presence” has been expanded
Species moved? Biotic OK? Abiotic OK? Site visited? Species detect? E A×B×C×D×E D C B A Filling-in probabilities in the tree
Species moved? Biotic OK? Abiotic OK? Site visited? Species detect? E A×B×C×D×E D C Probability of detection (methods and effort of collection) B Suitability of abiotic conditions (resistance to temperature extremes, water stress, etc.) Sampling bias (accessibility, roads, etc.) A Suitability of biotic conditions (competitors, predators, mutualists, etc.) Motility of species (history, barriers, dispersal capacities, etc.) Interpreting the probabilities
Occurrence Data (reptiles) Spatial sampling bias
? Spatial sampling bias Environmental sampling bias e2 e1 Geographical space Environmental space
Species moved? Biotic OK? Abiotic OK? Site visited? Species detect? Prob=.32×.50=.16 .50 .32 Two different sampling schemes Important conclusion: Factors can combine in different ways and still produce the same observed presence rate! Two different species 1 Prob=.20×.80=.16 .80 .20 1 1 A pet example
Issue raised by pet example • Distribution of presence-only data is a function of all factors in the tree. Factors can combine in different ways and still produce the same observed presence rate! • Since observed data is probabilistically identical, any method that uses observed data only, is unable to discern between Species #1 and Species #2. • Sampling bias and other conditions become crucial.
Species moved? Biotic OK? Abiotic OK? Site visited? Species detect? E Data=A×B×C×D×E D C B Abiotically suitable area=C A Colonizable area=B×C Occupied area=A×B×C In general, areas of distribution≠data
Conclusions • One thing is distribution of species, and another issue is distribution of observed data. Relationship between data and the niche must be understood. • Previous tree diagram is far more complicated: • Interactions. • Sink populations. • Grid resolution (more on this shortly). • Recording errors, classification errors. • Some special cases allow for simplifications: • Uniform sampling. • Sure detection. • Unrestricted species motility.
Conclusions • Algorithms use observed data. They will all try to fit observed data to environmental variables. • This may or may not produce what you are interested in. It may if you are willing to make some assumptions regarding data. • It is your responsibility to determine if these assumptions are met and to interpret results accordingly. A modeling algorithm will not know better. • It is useful to think of “data” as including operational assumptions, not merely “numbers”.
Probability trees used to understand changes in grid resolution 1km 2km
E1 D1 C1 B1 E12 A1 D12 C12 B12 A12 E2 Site 1-2 D2 C2 B2 A2 Site 2 Site 1 Merging two sites
Is there a relationship between A1, B1, C1, D1, E1, A2, B2, C2, D2, E2 and A12, B12, C12, D12 , D12? • If new probabilities are derived from the pair of old sets, then merged tree is function of components. Will show that this cannot be done coherently.
To produce coherent interpretations for the new tree, the “biotic” probability in the new tree must necessarily depend on accessibility, biotic, and abiotic terms of the original trees. • Since this interpretation is senseless, the conclusion is that a change in resolution implies a new description of niches/distributions.