
Monte-Carlo Particle Filters for Wildlife Population Dynamics

This presentation discusses the use of Monte-Carlo particle filters to fit and compare models for the dynamics of wild animal populations. It covers basic particle filtering techniques, practical tricks for implementation, and various applications. The focus is on the comparison of particle filters with Kalman filters and Markov chain Monte Carlo methods for inference in state-space models.



  1. Use of Monte-Carlo particle filters to fit and compare models for the dynamics of wild animal populations I always wanted to be a model…. Len Thomas Newton Inst., 21st Nov 2006

  2. Outline • 1. Introduction • 2. Basic particle filtering • 3. Tricks to make it work in practice • 4. Applications • (i) PF, Obs error fixed • (ii) PF vs KF, One colony model • (iii) PF vs MCMC • 5. Discussion

  3. References • Our work: http://www.creem.st-and.ac.uk/len/

  4. Joint work with… • Methods and framework: • Ken Newman, Steve Buckland: NCSE St Andrews • Seal models: • John Harwood, Jason Matthiopoulos: NCSE & Sea Mammal Research Unit • Many others at SMRU • Comparison with Kalman filter: • Takis Besbeas, Byron Morgan: NCSE Kent • Comparison with MCMC • Carmen Fernández: Univ. Lancaster

  5. 1. Introduction

  6. Answering questions about wildlife systems • How many? • Population trends • Vital rates • What if? • scenario planning • risk assessment • decision support • Survey design • adaptive management

  7. State space model • State process density gt(nt | nt-1; Θ) • Observation process density ft(yt | nt; Θ) • Initial state density g0(n0; Θ) • Bayesian approach, so: • Priors on Θ • Initial state density + state process density give a prior on n1:T

  8. British grey seal • Population in recovery from historical exploitation • NERC • Special Committee on Seals

  9. Data • Aerial surveys of breeding colonies since 1960s count pups • Other data: intensive studies, radio tracking, genetic, counts at haul-outs

  10. Pup production estimates

  11. Orkney example colonies

  12. State process model: life cycle graph representation [Figure: life-cycle graph with age classes pup, 1, 2, 3, 4, 5, 6+; density dependence can enter at different transitions]

  13. Density dependence, e.g. in pup survival • Carrying capacity χr

  14. More flexible models of density dependence

  15. State process model: 4 regions • Movement depends on: • distance • density dependence • site faithfulness [Figure: the life-cycle graph (pup, 1, 2, 3, 4, 5, 6+) repeated for each of the four regions: North Sea, Inner Hebrides, Outer Hebrides, Orkneys]

  16. SSMs of wildlife population dynamics: summary of features • State vector is high dimensional (seal model: 7 x 4 x 22 = 616) • Observations are only available on a subset of these states (seal model: 1 x 4 x 22 = 88) • State process density is a convolution of sub-processes, so it is hard to evaluate • Parameter vector is often quite large (seal model: 11-12) • Parameters are often partially confounded, and some are poorly informed by the data

  17. Fitting state-space models • Analytic approaches • Kalman filter (Gaussian linear model; Besbeas et al.) • Extended Kalman filter (Gaussian nonlinear model – approximate) + other KF variations • Numerical maximization of the likelihood • Monte Carlo approximations • Likelihood-based (Geyer; de Valpine) • Bayesian • Rejection sampling (Damien Clancy) • Markov chain Monte Carlo (MCMC; Bob O'Hara, Ruth King) • Sequential Importance Sampling (SIS), a.k.a. Monte Carlo particle filtering

  18. Inference tasks for time series data • Observe data y1:t = (y1,...,yt) • We wish to infer the unobserved states n1:t = (n1,...,nt) and parameters Θ • Fundamental inference tasks: • Smoothing p(n1:t, Θ | y1:t) • Filtering p(nt, Θt | y1:t) • Prediction p(nt+x | y1:t), x > 0

  19. Filtering • Filtering forms the basis for the other inference tasks • Filtering is easier than smoothing (and can be very fast) • Filtering recursion: divide-and-conquer approach that considers each new data point one at a time, so we only need to integrate over nt, not n1:t [Figure: the recursion p(n0) → p(n1|y1) → p(n2|y1:2) → p(n3|y1:3) → p(n4|y1:4), with each new observation yt folded in at step t]
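
In the g/f notation of slide 7 (suppressing Θ), the standard form of this recursion is a prediction integral followed by a Bayes correction:

```latex
p(n_{t+1} \mid y_{1:t+1}) \;\propto\;
f_{t+1}(y_{t+1} \mid n_{t+1})
\int g_{t+1}(n_{t+1} \mid n_t)\, p(n_t \mid y_{1:t})\, \mathrm{d}n_t
```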

  20. Monte-Carlo particle filters:online inference for evolving datasets • Particle filtering used when fast online methods required to produce updated (filtered) estimates as new data arrives: • Tracking applications in radar, sonar, etc. • Finance • Stock prices, exchange rates arrive sequentially. Online update of portfolios. • Medical monitoring • Online monitoring of ECG data for sick patients • Digital communications • Speech recognition and processing

  21. 2. Monte Carlo Particle Filtering Variants/Synonyms: Sequential Monte Carlo methods Sequential Importance Sampling (SIS) Sampling Importance Sampling Resampling (SISR) Bootstrap Filter Interacting Particle Filter Auxiliary Particle Filter

  22. Importance sampling • Want to make inferences about some function p(), but cannot evaluate it directly • Solution: • Sample from another function q() (the importance function) that has the same support as p() (or wider support) • Correct using importance weights

  23. Example:

  24. Importance sampling algorithm • Given p(nt|y1:t) and yt+1, we want to update to p(nt+1|y1:t+1) • Prediction step: make K random draws (i.e., simulate K “particles”) from the importance function • Correction step: calculate each particle's importance weight (target density over proposal density) • Normalize the weights so that they sum to 1 • Approximate the target density by the weighted particle set

  25. Importance sampling:take home message • The key to successful importance sampling is finding a proposal q() that: • we can generate random values from • has weights p()/q() that can be evaluated • The key to efficient importance sampling is finding a proposal q() that: • we can easily/quickly generate random values from • has weights p()/q() that can be evaluated easily/quickly • is close to the target distribution
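
A minimal numerical illustration of these points (not from the original slides; the target, proposal, and quantity estimated are chosen for this sketch): sample from a wide Gaussian proposal, weight by p()/q(), and self-normalize.

```python
import math
import random

random.seed(1)

# Target: unnormalised N(0,1) density (pretend we cannot normalise it)
def target_unnorm(x):
    return math.exp(-0.5 * x * x)

# Proposal: N(0, 2^2) -- easy to sample from, wider support than the target
def proposal_pdf(x):
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))

K = 20000
xs = [random.gauss(0.0, 2.0) for _ in range(K)]

# Importance weights p()/q(), then self-normalise so they sum to 1
ws = [target_unnorm(x) / proposal_pdf(x) for x in xs]
total = sum(ws)
ws = [w / total for w in ws]

# Weighted estimate of E[X^2] under the target (true value: 1)
est = sum(w * x * x for w, x in zip(ws, xs))
print(est)
```

Because the proposal is close to (and wider than) the target, the weights are well behaved and the estimate is accurate; a poorly chosen q() would concentrate the weight on a few samples.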

  26. Sequential importance sampling • SIS is just repeated application of importance sampling at each time step • Basic sequential importance sampling: • Proposal distribution q() = g(nt+1|nt) • Leads to weights that update multiplicatively by the observation density: wt+1 = wt × f(yt+1|nt+1) • To do basic SIS, we need to be able to: • Simulate forward from the state process • Evaluate the observation process density (the likelihood)

  27. Basic SIS algorithm • Generate K “particles” from the prior on {n0, Θ}, each with weight 1/K • For each time period t=1,...,T • For each particle i=1,...,K • Prediction step: simulate the particle forward through the state process • Correction step: multiply the particle's weight by the observation density of yt

  28. Justification of weights
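
A standard sketch of the argument (the textbook derivation, using the basic SIS proposal q() = g(nt+1|nt) from slide 26): the importance weight is target over proposal, the state process density cancels, and only the observation density survives as the incremental weight:

```latex
w_{t+1}^{(i)} \;\propto\; w_t^{(i)}\,
\frac{f_{t+1}\!\left(y_{t+1}\mid n_{t+1}^{(i)}\right)\,
      g_{t+1}\!\left(n_{t+1}^{(i)}\mid n_t^{(i)}\right)}
     {g_{t+1}\!\left(n_{t+1}^{(i)}\mid n_t^{(i)}\right)}
\;=\; w_t^{(i)}\, f_{t+1}\!\left(y_{t+1}\mid n_{t+1}^{(i)}\right)
```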

  29. Example of basic SIS • State-space model of exponential population growth • State model • Observation model • Priors
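
The exact state, observation, and prior densities on the slide were images and are not recoverable, so the sketch below substitutes plausible choices: state nt ~ N(λ·nt-1, 1), observations yt ~ N(nt, 2^2), uniform prior on n0 and a Gaussian prior on the growth rate λ. The observed counts 12 and 14 match the toy example on slides 30-31.

```python
import math
import random

random.seed(2)

def norm_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

K = 5000
ys = [12, 14]                                      # observed counts, as in the toy example

# Assumed priors for this sketch: n0 ~ U(5, 25), growth rate lambda ~ N(1.05, 0.1)
n = [random.uniform(5.0, 25.0) for _ in range(K)]
lam = [random.gauss(1.05, 0.1) for _ in range(K)]
w = [1.0 / K] * K

for y in ys:
    # Prediction step: simulate each particle forward through the state process
    n = [random.gauss(l * x, 1.0) for l, x in zip(lam, n)]
    # Correction step: reweight by the observation density (y ~ N(n, 2^2) assumed)
    w = [wi * norm_pdf(y, x, 2.0) for wi, x in zip(w, n)]
    tot = sum(w)
    w = [wi / tot for wi in w]

# Filtered posterior mean of the population size at the last time step
post_mean_n = sum(wi * x for wi, x in zip(w, n))
print(post_mean_n)
```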

  30. Example of basic SIS, t=1 [Table: K=10 particles traced through one filtering step. Sample (n0, Θ0) from the prior, each with weight 0.1; the prediction step gives the prior at t=1 (n1, Θ0); the observation y1 = 12 gives observation densities f() for each particle, which are normalized into the posterior weights w1 (0.063, 0.034, 0.558, 0.202, ...)]

  31. Example of basic SIS, t=2 [Table: the same 10 particles carried forward. The prediction step takes the posterior at t=1 (n1, Θ1, w1) to the prior at t=2 (n2, Θ1); the observation y2 = 14 gives f() for each particle, and the weights update from w1 to w2. One particle already carries most of the weight (0.691)]

  32. Problem: particle depletion • Variance of the weights increases with time, until a few particles have almost all the weight • Results in large Monte Carlo error in the approximation • Can quantify this, e.g., via the effective sample size of the weights • In the previous example, one particle already carries ~70% of the weight after two time steps
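
One common way to quantify depletion (an assumption here; the slide's own formula was an image) is the effective sample size, 1 / Σ wi² for normalized weights:

```python
def effective_sample_size(weights):
    """ESS = 1 / sum(w_i^2) for weights summing to 1: equals K when the
    weights are uniform, and approaches 1 as one particle dominates."""
    return 1.0 / sum(w * w for w in weights)

uniform = [0.01] * 100                 # healthy particle set
skewed = [0.901] + [0.001] * 99        # one particle holds most of the weight

ess_uniform = effective_sample_size(uniform)
ess_skewed = effective_sample_size(skewed)
print(ess_uniform, ess_skewed)
```

A depleted filter can have thousands of particles in memory but an ESS of only a handful, which is what drives the large Monte Carlo error.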

  33. Problem: particle depletion • Worse when: • Observation error is small • Lots of data at any one time point • State process has little stochasticity • Priors are diffuse or not congruent with observations • State process model incorrect (e.g., time varying) • Outliers in the data

  34. Some intuition • In a (basic) PF, we simulate particles from the prior, and gradually focus in on the full posterior by filtering the particles using data from one time period at a time • Analogies with MCMC: • In MCMC, we take correlated samples from the posterior. We make proposals that are accepted stochastically. • Problem is to find a “good” proposal • Limitation is time – has the sampler converged yet? • In PF, we get an importance sample from the posterior. We generate particles from a proposal, that are assigned weights (and other stuff – see later). • Problem is to find a “good” proposal • Limitation is memory – do we have enough particles? • So, for each “trick” in MCMC, there is probably an analogous “trick” in PF (and vice versa)

  35. 3. Particle filtering “tricks” An advanced randomization technique

  36. Tricks: solutions to the problem of particle depletion • Pruning: throw out “bad” particles (rejection) • Enrichment: boost “good” particles (resampling) • Directed enrichment (auxiliary particle filter) • Mutation (kernel smoothing) • Other stuff • Better proposals • Better resampling schemes • …

  37. Rejection control • Idea: throw out particles with low weights • Basic algorithm, at time t: • Have a pre-determined threshold ct, where 0 < ct <= 1 • For i = 1, … , K, accept particle i with probability min(1, wt(i)/ct) • If the particle is accepted, update its weight to max(wt(i), ct) • Now we have fewer than K samples • Can make up the numbers by sampling from the priors, projecting forward to the current time point and repeating the rejection control
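
A sketch of one step of rejection control, using the standard accept-probability min(1, w/c) and compensating weight w/Pr(accept) = max(w, c) so the weighted estimator stays unbiased (the threshold and toy weights below are invented for illustration):

```python
import random

random.seed(3)

def rejection_control(particles, weights, c):
    """Accept particle i with probability min(1, w_i / c); an accepted
    particle's weight becomes w_i / accept_prob = max(w_i, c), which
    compensates for the particles that were thrown away."""
    kept_p, kept_w = [], []
    for p, w in zip(particles, weights):
        accept_prob = min(1.0, w / c)
        if random.random() < accept_prob:
            kept_p.append(p)
            kept_w.append(w / accept_prob)
    return kept_p, kept_w

# Toy weight set: 90 near-worthless particles and 10 good ones
particles = list(range(100))
weights = [0.001] * 90 + [0.091] * 10
kept_p, kept_w = rejection_control(particles, weights, c=0.01)
print(len(kept_p))
```

Low-weight particles mostly vanish, every good particle survives, and the survivors' weights are bounded below by the threshold, so the weight variance drops.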

  38. Rejection control - discussion • Particularly useful at t=1 with diffuse priors • Can have a sequence of control points (not necessarily every year) • Check points don’t need to be fixed – can trigger when variance of weights gets too high • Thresholds, ct, don’t need to be set in advance but can be set adaptively (e.g., mean of weights) • Instead of restarting at time t=0, can restart by sampling from particles at previous check point (= partial rejection control)

  39. Resampling: pruning and enrichment • Idea: allow “good” particles to amplify themselves while killing off “bad” particles • Algorithm, before and/or after each time step (not necessarily every time step): • For j = 1, … , K • Sample independently from the set of particles with probabilities proportional to their weights • Assign each resampled particle the new weight 1/K • Reduces particle depletion of states, as “children” particles with the same “parent” now evolve independently
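
The simplest version of this step, multinomial resampling (the toy particles and weights are invented for illustration):

```python
import random

random.seed(4)

def resample(particles, weights):
    """Multinomial resampling: draw K particles with replacement in
    proportion to their weights, then reset every weight to 1/K."""
    K = len(particles)
    children = random.choices(particles, weights=weights, k=K)
    return children, [1.0 / K] * K

particles = ["a", "b", "c", "d"]
weights = [0.70, 0.25, 0.04, 0.01]
new_p, new_w = resample(particles, weights)
print(new_p)
```

High-weight particles tend to be duplicated and low-weight ones dropped; the schemes on slide 44 (residual, stratified) reduce the extra Monte Carlo noise this draw introduces.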

  40. Resample probabilities • Should be related to the weights • Proportional to the weights themselves (as in the bootstrap filter) • Or proportional to the weights raised to a power α, where α could vary according to the variance of the weights • α = ½ has been suggested • Or related to the “future trend”, as in the auxiliary particle filter

  41. Directed resampling: auxiliary particle filter • Idea: pre-select particles likely to have high weights in the future (these can be obtained by projecting forward deterministically) • Example algorithm: • For j = 1, … , K • Sample independently from the set of particles according to probabilities based on the projected one-step-ahead likelihood • Predict • Correct • If “future” observations are available, can extend to look >1 time step ahead – e.g., a protein folding application

  42. Kernel smoothing: enrichment of parameters through mutation • Idea: introduce small “mutations” into parameter values when resampling • Algorithm: • Given particles with parameter values Θt(i) • Let Vt be the variance matrix of the Θt(i) • For i = 1, … , K • Sample a perturbed value from a kernel centred on Θt(i), where h controls the size of the perturbations • Variance of the parameters is now (1+h2)Vt, so shrinkage is needed to preserve the first two moments
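
A one-dimensional sketch of the shrinkage idea (the Liu-West construction; the particle values and h below are invented): shrink each value toward the cloud mean by a = sqrt(1 - h²), then add N(0, h²·Var) jitter, so the mean and variance of the cloud are unchanged.

```python
import math
import random

random.seed(5)

def kernel_smooth(thetas, h=0.1):
    """Shrink each parameter value toward the mean by a = sqrt(1 - h^2),
    then add N(0, h^2 * Var) noise; the first two moments of the particle
    cloud are preserved while ties between duplicates are broken."""
    K = len(thetas)
    m = sum(thetas) / K
    var = sum((t - m) ** 2 for t in thetas) / K
    a = math.sqrt(1.0 - h * h)
    sd = h * math.sqrt(var)
    return [a * t + (1.0 - a) * m + random.gauss(0.0, sd) for t in thetas]

def mean(xs):
    return sum(xs) / len(xs)

def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

thetas = [random.gauss(1.0, 0.2) for _ in range(20000)]
smoothed = kernel_smooth(thetas, h=0.1)
print(mean(smoothed), variance(smoothed))
```

Without the shrinkage step, repeated jittering would inflate the parameter variance by (1+h²) at every resample, which is exactly the problem the slide notes.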

  43. Kernel smoothing - discussion • Previous algorithm does not preserve the relationship between parameters and states • Leads to poor smoothing inference • Possibly unreliable filtered inference? • Pragmatically – use as small a value of h as possible • Extensions: • Kernel smooth states as well as parameters • Local kernel smoothing

  44. Other “tricks” • Reducing dimension: • Rao-Blackwellization – integrating out some part of the model • Better proposals: • Start with an importance sample (rather than from the priors) • Conditional proposals • Better resampling: • Residual resampling • Stratified resampling • Alternative “mutation” algorithms: • MCMC within PF • Gradual focusing on the posterior: • Tempering/annealing • …

  45. 4. Applications

  46. (i) Faray example • Motivation: Comparison with Kalman Filter (KF) via Integrated Population Modelling methods of Besbeas et al.

  47. Example state process model: density-dependent emigration • τ fixed at 1991 [Figure: life-cycle graph (pup, 1, 2, 3, 4, 5, 6+) with density-dependent emigration]

  48. Observation Process Model • Ψ = CV of observations

  49. Priors • Parameters: • Informative priors on survival rates from intensive studies (mark-recapture) • Informative priors on fecundity, carrying capacity and observation CV from expert opinion • Initial values for states in 1984: • For pups, assume • For other ages: • Stable age prior • More diffuse prior

  50. Fitting the Faray data • One colony: relatively low dimension problem • So few “tricks” required • Pruning (rejection control) in first time period • Multiple runs of sampler until required accuracy reached (note – ideal for parallelization) • Pruning of final results (to reduce number of particles stored)
