
MCMC and SMC for Nonlinear Time Series Models

Chiranjit Mukherjee

STA395 Talk

Department of Statistical Science, Duke University

February 16, 2009


Outline

  • Problem Statement

  • Markov Chain Monte Carlo

    Dynamic Linear Models, Forward Filtering Backward Sampling, Nonlinear Models, Mixture of Gaussians, Approximate FFBS as a proposal to Metropolis-Hastings

  • Sequential Monte Carlo

    Importance Sampling, Sequential Importance Sampling, Optimal Proposal, Resampling, Auxiliary Particle Filters, Parameter Degeneracy, Marginal Likelihood Calculation, Issues with Resampling, Scalability of SMC techniques

  • Minimal Quorum Sensing Model

    Background, Differential Equations Model, Discretized Version, Features

  • Results

  • Summary

  • References


Problem Statement

  • We will focus on Markovian, nonlinear, non-Gaussian State Space Models:

    Priors: θ ~ p(θ), x0 ~ p(x0 | θ)

    System Evolution: xt ~ p(xt | xt-1, θ), t = 1, …, T

    Observation: yt ~ p(yt | xt, θ), t = 1, …, T

  • Given the data y1, y2, …, yT, the objective is to find the joint posterior distribution p(θ, x0:T | y1:T) (see the factorization below).
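For reference (generic notation assumed here, since the slide's original equations do not survive in this transcript), the joint posterior targeted by both MCMC and SMC factorizes as

    p(\theta, x_{0:T} \mid y_{1:T}) \;\propto\;
    p(\theta)\, p(x_0 \mid \theta) \prod_{t=1}^{T} p(x_t \mid x_{t-1}, \theta)\, p(y_t \mid x_t, \theta).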


MCMC Techniques for State Space Models

  • All primary MCMC algorithm designs perform the following two Gibbs steps

    in sequence:

    (i) sample θ from p(θ | x0:T, y1:T); (ii) sample x0:T from p(x0:T | θ, y1:T).

  • Usually θ can be sampled using a Gibbs step or a Metropolis-Hastings-within-Gibbs step.

  • Sampling the latent states x0:T from the joint conditional posterior p(x0:T | θ, y1:T)

    is the primary challenge in this task (a schematic sketch of the two-block sampler follows below).
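A hedged sketch of this two-block structure in Python. The callables sample_theta and sample_states are placeholders for the model-specific conditionals (they are assumptions for illustration, not part of the talk); in practice sample_states would be an FFBS-type draw.

    import numpy as np

    def gibbs_ssm(y, theta0, x0, n_iter, sample_theta, sample_states, rng=None):
        """Generic two-block Gibbs sampler for a state space model.

        sample_theta(y, x, rng)      -> draw from p(theta | x_{0:T}, y_{1:T})
        sample_states(y, theta, rng) -> draw from p(x_{0:T} | theta, y_{1:T})
        """
        rng = np.random.default_rng() if rng is None else rng
        theta, x = theta0, x0
        draws = []
        for _ in range(n_iter):
            theta = sample_theta(y, x, rng)    # step 1: parameters given states
            x = sample_states(y, theta, rng)   # step 2: states given parameters
            draws.append((theta, np.copy(x)))
        return draws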


Dynamic Linear Models

[West and Harrison, 1997]

xt = Gt xt-1 + wt,  wt ~ N(0, Wt)

yt = Ft' xt + vt,   vt ~ N(0, Vt)

where Gt, Ft, Wt and Vt are all known.

One can sample from the joint distribution of x0:T given θ and y1:T

using a Forward Filtering Backward Sampling Algorithm.

[Carter & Kohn, 1994]


Forward Filtering

Note that:

    p(xt | y1:t) ∝ p(yt | xt) ∫ p(xt | xt-1) p(xt-1 | y1:t-1) dxt-1

If p(xt-1 | y1:t-1), p(xt | xt-1) and p(yt | xt) are all Gaussian distributions,

then p(xt | y1:t) is also Gaussian.

Filtering:

  • Start with p(x0 | y1:0) = p(x0).

  • For t = 1, …, T update p(xt-1 | y1:t-1) to p(xt | y1:t).


Backward Sampling

Note that:

    p(x0:T | y1:T) = p(xT | y1:T) ∏_{t=0}^{T-1} p(xt | xt+1, y1:t)

Since p(xt | y1:t) and p(xt+1 | xt) are Gaussian, p(xt | xt+1, y1:t) is also Gaussian.

  • Sample: xT ~ p(xT | y1:T).

  • For t = T-1, …, 0 sample: xt ~ p(xt | xt+1, y1:t) (a worked FFBS sketch for a simple DLM follows below).
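As a concrete illustration of these two passes, here is a minimal FFBS sketch in Python for a toy local-level DLM (x_t = x_{t-1} + w_t, y_t = x_t + v_t with known scalar variances W and V). The model choice and the function name are assumptions for illustration, not the talk's model.

    import numpy as np

    def ffbs_local_level(y, W, V, m0=0.0, C0=1.0, rng=None):
        """One joint draw of x_{1:T} | y_{1:T} for the local-level DLM
        x_t = x_{t-1} + w_t, w_t ~ N(0, W);  y_t = x_t + v_t, v_t ~ N(0, V)."""
        rng = np.random.default_rng() if rng is None else rng
        T = len(y)
        m, C = np.empty(T), np.empty(T)      # filtered means and variances
        a, R = m0, C0 + W                    # one-step-ahead prior for x_1
        for t in range(T):                   # --- forward filtering (Kalman recursions) ---
            Q = R + V                        # predictive variance of y_t
            K = R / Q                        # Kalman gain
            m[t] = a + K * (y[t] - a)
            C[t] = R - K * R
            a, R = m[t], C[t] + W            # prior for x_{t+1}
        x = np.empty(T)                      # --- backward sampling ---
        x[T - 1] = rng.normal(m[T - 1], np.sqrt(C[T - 1]))
        for t in range(T - 2, -1, -1):
            B = C[t] / (C[t] + W)            # backward gain
            h = m[t] + B * (x[t + 1] - m[t])
            H = C[t] * (1.0 - B)
            x[t] = rng.normal(h, np.sqrt(H))
        return x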


Nonlinear Dynamic Models

xt = ft(xt-1) + wt,  wt ~ N(0, Wt)

yt = gt(xt) + vt,    vt ~ N(0, Vt)

where ft, gt are known nonlinear functions and Wt, Vt are all known.

An approximate FFBS is based on a first-order Taylor Series expansion of the functions

about the current state estimates (e.g. the filtered means), which makes the approximating model linear and amenable to the DLM recursions.


Mixture Normal Approximation

Filtering:

When each of p(xt-1 | y1:t-1), p(xt | xt-1) and p(yt | xt) is Normal or a mixture of Normals, then p(xt | y1:t) is also a mixture of Normals.

Smoothing:

If p(xt | y1:t) is a mixture of Normals and p(xt+1 | xt) is Normal, then

p(xt | xt+1, y1:t) is a mixture of Normals.


Metropolis Step

In order to sample from p(x0:T | θ, y1:T) for a general State Space model, we can

  • Use the approximate FFBS procedure to propose a sample and accept it

    with a Metropolis-Hastings step.

  • Let us call this proposal q(x0:T).

  • One can explicitly write an expression for the joint proposal density q(x0:T), as

    it is a product of Normal densities or mixtures of Normal densities.

  • One accepts the proposed sample x*0:T with probability

    min{ 1, [ p(x*0:T | θ, y1:T) q(x0:T) ] / [ p(x0:T | θ, y1:T) q(x*0:T) ] }

    (see the accept/reject sketch below).

  • The main problem is that, as T increases, the approximation deteriorates and the

    Metropolis acceptance rate falls quickly.
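A minimal sketch of that accept/reject step in log space. The callables log_target and log_proposal are hypothetical: they should return log p(x_{0:T} | θ, y_{1:T}) up to a constant and log q(x_{0:T}), respectively.

    import numpy as np

    def mh_accept(x_curr, x_prop, log_target, log_proposal, rng):
        """Independence MH step: accept the proposed path with probability
        min{1, [pi(x*) q(x)] / [pi(x) q(x*)]}, computed in log space for stability."""
        log_alpha = (log_target(x_prop) + log_proposal(x_curr)
                     - log_target(x_curr) - log_proposal(x_prop))
        if np.log(rng.uniform()) < min(0.0, log_alpha):
            return x_prop, True      # accept
        return x_curr, False         # reject, keep the current path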



Importance Sampling

  • Objective: we want to sample from π(x), which is difficult. We use an approximate

    distribution q(x) which is easy to sample from.

  • For any distribution q(·) such that π(x) > 0 implies q(x) > 0, we have

    π(x) = w(x) q(x) / ∫ w(x) q(x) dx,  where  w(x) ∝ π(x) / q(x)

    (a small numerical example follows below).
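A small self-normalized importance sampling example in Python (the toy target and proposal are assumptions chosen for illustration, not from the talk):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)

    # Toy example: target pi(x) = N(3, 1), proposal q(x) = N(0, 3^2).
    M = 100_000
    x = rng.normal(0.0, 3.0, size=M)                          # samples from q
    w = stats.norm.pdf(x, 3, 1) / stats.norm.pdf(x, 0, 3)     # unnormalized weights pi/q
    w /= w.sum()                                              # self-normalize
    print("IS estimate of E_pi[X]:", np.sum(w * x))           # should be close to 3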


A Sequential Importance Sampling Approach

Let πn(x0:n) denote the target distribution at time n, for n = 1, 2, ….

Key Idea: If πn-1 is not too different from πn, then we should be able to reuse our approximation of πn-1 as an Importance Distribution for πn.

ALGORITHM

  • Start with sampling from the prior:

  • Suppose at time (n-1) we have the following particle approximation of πn-1: {x0:n-1(i), Wn-1(i)}, i = 1, …, M.

  • Update:


Updating the IS Approximation

  • We want to reuse the samples from qn-1 that were

    used to build the approximation of πn-1.

  • This only makes sense if πn-1(x0:n-1) is close to the marginal πn(x0:n-1).

  • We select a proposal of the form: qn(x0:n) = qn-1(x0:n-1) qn(xn | x0:n-1).

  • Unnormalized particle weights are updated in the following way:

    wn(x0:n) = wn-1(x0:n-1) × πn(x0:n) / [ πn-1(x0:n-1) qn(xn | x0:n-1) ]


A Simple SIS for State Space Models

For a State Space model with evolution density p(xn | xn-1) and observation density p(yn | xn):

Let πn(x0:n) = p(x0:n | y1:n).

If we use the prior (the state evolution density) as the proposal, qn(xn | x0:n-1, y1:n) = p(xn | xn-1),

then the unnormalized weights satisfy wn ∝ wn-1 p(yn | xn) (a minimal sketch of this scheme follows below).
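A hedged sketch of this simple SIS scheme for the same toy local-level model used in the FFBS example (the N(0, 1) prior on x_0 and the scalar variances W, V are assumptions for illustration):

    import numpy as np

    def sis_local_level(y, W, V, M=1000, rng=None):
        """Plain sequential importance sampling with the prior p(x_n | x_{n-1}) as
        proposal, so the incremental weight is p(y_n | x_n).  No resampling yet,
        so the weights degenerate as n grows (addressed later in the talk)."""
        rng = np.random.default_rng() if rng is None else rng
        T = len(y)
        x = rng.normal(0.0, 1.0, size=M)          # particles from an assumed N(0, 1) prior on x_0
        logw = np.zeros(M)
        filt_mean = np.empty(T)
        for n in range(T):
            x = x + rng.normal(0.0, np.sqrt(W), size=M)   # propagate: x_n ~ p(x_n | x_{n-1})
            logw += -0.5 * (y[n] - x) ** 2 / V            # add log p(y_n | x_n), up to a constant
            w = np.exp(logw - logw.max())                 # stabilize before normalizing
            w = w / w.sum()
            filt_mean[n] = np.sum(w * x)                  # IS estimate of E[x_n | y_{1:n}]
        return filt_mean, w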


Optimal Importance Distribution

  • The algorithm described above collapses as n increases, because after a few steps

    only a few particles have non-negligible weights.

  • An optimal, zero-variance proposal at time n would be the target itself,

    qn(x0:n) = p(x0:n | y1:n).

  • To perform SIS in this optimal setting we would need to sample from this distribution directly,

    which is not readily available in general.

  • Instead one deploys a Locally Optimal Importance Distribution, which

    consists of selecting, at time n, the proposal qn(xn | x0:n-1, y1:n) that minimizes the variance of

    the importance weights.


Locally Optimal Importance Distribution

It follows that:

    qn^opt(xn | x0:n-1) = πn(xn | x0:n-1)

and the resulting incremental weight does not depend on the new value xn.

In the case of State Space Models:

    qn^opt(xn | xn-1, yn) = p(xn | xn-1, yn)   and   wn ∝ wn-1 p(yn | xn-1).


Resampling

  • Even with the locally optimal proposal, as the time index n increases, the variance of the

    unnormalized weights {wn(x0:n)} tends to increase, and all the mass is concentrated

    on a few particles/samples.

  • We wish to focus our computational efforts on high-density regions of the space.

  • IS approximation:  πn(x0:n) ≈ Σ_{i=1}^{M} Wn(i) δ( x0:n − x0:n(i) )

  • Resample M times with probabilities {Wn(i)}:

    to build a new, equally weighted approximation (a multinomial resampling sketch follows below).

  • Now the samples become statistically dependent, so they are hard to analyze theoretically. However, resampling is a necessary step to avoid particle degeneracy.
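A minimal multinomial resampling sketch in Python (the function name is an assumption for illustration; particles is an (M, ...) array and weights need not be normalized):

    import numpy as np

    def multinomial_resample(particles, weights, rng=None):
        """Draw M ancestor indices i.i.d. with probabilities proportional to the
        weights, then return the selected particles with equal weights 1/M."""
        rng = np.random.default_rng() if rng is None else rng
        p = np.asarray(weights, dtype=float)
        p = p / p.sum()
        M = len(p)
        idx = rng.choice(M, size=M, replace=True, p=p)   # ancestor indices
        return particles[idx], np.full(M, 1.0 / M)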


Resampling

With the Locally Optimal Filter a Standard SIS Algorithm would be:

  • Sample:

  • Compute weights:

  • Resample to obtain equal weighted samples

    An alternative strategy is:

  • Calculate weights:

  • Resample to obtain equal weighted samples

  • Sample:

  • This algorithm ensures more diverse particles at time n.

  • The order can be changed because the locally optimal weight p(yn | xn-1)

    is independent of xn.


    Auxiliary Particle Filter

    • For a general State Space Model it is not always possible to either explicitly sample from p(xn | xn-1, yn) or calculate the weights p(yn | xn-1).

    • We can use an approximation to p(yn | xn-1), say p̂(yn | xn-1).

    • In the literature it is often suggested to take p̂(yn | xn-1) = p(yn | x̄n),

      where x̄n is the mean, median or mode of the distribution p(xn | xn-1).

    • Let:

      ALGORITHM

    • Compute weights:

    • Resample to obtain

    • Sample:

    • Calculate weights:


    Degeneracy Issues

    • The SMC strategy performs remarkably well in terms of estimating the marginals p(xn | y1:n).

    • However, the joint distribution p(x0:n | y1:n) is poorly estimated when n is large.

    • One cannot hope to estimate a target distribution of increasing dimension with fixed

      precision when the number of particles remains fixed.

    • Since we are interested in the marginal, SMC serves our purpose well.

    • For bounded functions φ and p > 1, we can expect results of the following form

      if the model has nice forgetting/mixing properties. ML is increasing in L.


    Degeneracy in the Parameter Space

    • All the algorithms we have described so far try to minimize degeneracy in the state space. Resampling is performed in order to achieve diverse particles for xn.

    • However, we sampled particles for θ ~ π(θ) right at the beginning, and the resampling step reduces the number of distinct particles of θ as time n increases.

    • [Liu & West, 2001] suggest using a smooth kernel density estimate of p(θ | y1:n) and sampling

      θ particles from the smoothed density to break degeneracy.

    • Let {θn(i)} denote samples from the time-n posterior (not that θ itself is time dependent).

      [Liu & West, 2001] suggest:

      p(θ | y1:n) ≈ Σ_{i=1}^{M} Wn(i) N(θ | mn(i), h² Vn),

      where the mn(i) are kernel locations (see the next slide); θ̄n and Vn are the sample mean and

      variance of the {θn(i)}.


    Liu & West, 2001

    • The authors suggested shrinkage of the kernel locations, mn(i) = a θn(i) + (1 − a) θ̄n with a = √(1 − h²),

      in order to avoid over-dispersion of the smooth kernel density.

    • The choice of h comes from the choice of a discount factor, usually 0.95–0.99.

    • They also recommend using Auxiliary Particle Filtering to improve performance

      (a sketch of the kernel-shrinkage step follows the algorithm below).

      ALGORITHM

  • For calculate , and

  • Resample with weights

  • Sample: and

  • Evaluate the corresponding weights:
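A hedged sketch of the kernel-shrinkage step in Python, using the discount-factor convention noted above (a = (3δ − 1)/(2δ), h² = 1 − a²); the function name and array layout are assumptions for illustration:

    import numpy as np

    def liu_west_rejuvenate(theta, weights, delta=0.98, rng=None):
        """Shrink each parameter particle toward the weighted mean and add
        Gaussian jitter with covariance h^2 * V (V = weighted sample covariance)."""
        rng = np.random.default_rng() if rng is None else rng
        theta = np.asarray(theta, dtype=float)
        if theta.ndim == 1:
            theta = theta[:, None]                       # (M, d) array of particles
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        a = (3.0 * delta - 1.0) / (2.0 * delta)
        h2 = 1.0 - a * a
        mean = np.average(theta, axis=0, weights=w)      # weighted sample mean
        cov = np.atleast_2d(np.cov(theta.T, aweights=w, bias=True))
        m = a * theta + (1.0 - a) * mean                 # shrunk kernel locations
        jitter = rng.multivariate_normal(np.zeros(theta.shape[1]), h2 * cov, size=theta.shape[0])
        return m + jitter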


    Using Sufficient Statistics

    • Another approach to break particle degeneracy in the parameter space is to use conditional sufficient statistics st for the parameters.

    • One can propagate the following joint distribution over time: p(xn, sn | y1:n).

    • Usually the conditional sufficient statistic follows a recursive relationship: sn = S(sn-1, xn, yn).

    • One can use any of the algorithms for updating the conditional distribution of the

      states. For example with the locally optimal importance distribution one should

      have the following relationship:

    • Note that, unlike the smooth kernel density approximation technique for avoiding degeneracy, this is an exact technique, so it should be used whenever possible (a scalar example follows below).
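As a worked illustration (a scalar analogue, not the talk's exact Inverse-Wishart case): for an observation equation y_t = x_t + v_t with v_t ~ N(0, V) and prior V ~ IG(a_0, b_0), the conditional sufficient statistic is s_n = (a_n, b_n) with recursion

    a_n = a_{n-1} + \tfrac{1}{2}, \qquad
    b_n = b_{n-1} + \tfrac{1}{2}\,(y_n - x_n)^2, \qquad
    V \mid x_{0:n}, y_{1:n} \sim \mathrm{IG}(a_n, b_n).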


    Marginal Likelihood Calculation

    • Often we need to compute the marginal likelihood for model comparison purposes.

    • For a general State Space Model, the marginal likelihood of y1:n is:

      p(y1:n) = ∏_{t=1}^{n} p(yt | y1:t-1)

    • Note that for the case of the vanilla filter (with the resampling step) this is obtained as a by-product of the particle weights (see the estimator below).
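A standard estimator of this quantity from the filter output (assuming resampling at every step, and writing w_t^{(i)} for the unnormalized incremental weights) is

    \hat p(y_{1:n}) \;=\; \prod_{t=1}^{n} \Big( \frac{1}{M} \sum_{i=1}^{M} w_t^{(i)} \Big),
    \qquad\text{matching}\qquad
    p(y_{1:n}) = \prod_{t=1}^{n} p(y_t \mid y_{1:t-1}).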


    Issues with Resampling

    • The most intuitive resampling scheme is Multinomial resampling. At time n we draw

      (N1, …, NM) ~ Multinomial(M; Wn(1), …, Wn(M)), where Ni = # times particle i is replicated.

    • Drawing a single particle by inverting the empirical CDF of the weights has complexity O(M).

    • Drawing all M particles this way therefore has complexity O(M²).

    • Resampling becomes the bottleneck computation in an SMC procedure if a naive Multinomial sampler is used.

    [Figure: particle weights W1, W2, …, WM-1 laid out along the unit interval [0, 1].]


    A Faster Resampling Scheme

    Systematic Resampling:

    • Like Multinomial, but only one random

      sample U ~ Uniform[0, 1/M) is drawn; the M points U + i/M, i = 0, …, M-1, are then mapped through the cumulative weights (see the sketch below).

    • Complexity O(M)

    [Figure: the same weight layout on [0, 1], shown with the evenly spaced systematic-resampling points.]
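A minimal sketch of systematic resampling in Python (the function name is an assumption; it returns ancestor indices, so the resampled set is particles[idx]):

    import numpy as np

    def systematic_resample(weights, rng=None):
        """One uniform draw U ~ U[0, 1), placed as stratified points (i + U)/M,
        i = 0, ..., M-1, and located in the cumulative weights in a single pass
        (np.searchsorted used here for brevity)."""
        rng = np.random.default_rng() if rng is None else rng
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()
        M = len(w)
        positions = (np.arange(M) + rng.uniform()) / M
        cumw = np.cumsum(w)
        cumw[-1] = 1.0                              # guard against floating-point round-off
        return np.searchsorted(cumw, positions)     # ancestor index for each new particle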


    Scalability of SMC

    • Every SMC algorithm has three essential steps:

      (i) Sampling Step – Generate new particles from the proposal distribution.

      (ii) Importance Step – Compute particle weights.

      (iii) Resampling Step – Draw M particles from the current particle set with

      probability proportional to weight.

    • The Sampling and Importance steps are completely parallelizable without the need for

      any form of communication between the particles.

    • The Resampling step needs communication while normalizing the weights.

    • Some algorithms need further communication; for example, Liu & West need to compute the

      sample mean and variance of the parameter particles.

    • If we implement an SMC algorithm on a distributed architecture, we should

      transfer some particles from particle-surplus processors to particle-deficient processors

      after a resampling step in order to keep the computational load even.


    Resampling on a Distributed Architecture

    ALGORITHM (1 master, K slaves)

    • Each slave processor k calculates its total weight and sends

      it to the master processor.

    • Master processor performs Inter-Resampling:

    • Master processor sends the result of the Inter-Resampling for processor k back to processor k.

    • Each slave processor performs Intra-Resampling: (in parallel)

    • Particle Routing – to equalize computational load of the processors:

      -- This depends on the architecture.



    Minimal Quorum Sensing Motif

    Tanouchi Y, Tu D, Kim J, You L (2008) : “Noise Reduction by Diffusional Dissipation in a Minimal Quorum Sensing Motif”. PLoS Comput Biol 4(8).

    • Two genes, encoding proteins LuxI and LuxR

    • LuxI is AHL synthase

    • AHL freely diffuses in and out of the cell

    • As cell density increases, AHL density increases in the environment and in the cell

    • At sufficiently high concentrations, AHL binds to and activates LuxR

    • This will in turn activate downstream genes.

    Ai: Intracellular AHL level

    Ae: Extracellular AHL level

    R: LuxR protein level

    C: AHL-LuxR complex level



    Discretized Model

    The Stochastic Differential Equation

    When discretized, it yields the following difference equation:

    For our Minimal QS Motif the discretized version is the following:

    where


    Some Notations

    Let

    With these notations our discretized model becomes:

    Let us use the notation for

    Then,



    As a State Space Model

    Systems Equation:

    We assume that we can observe xt = (Ai,t, Ae,t, Rt, Ct)’ with some measurement errors.

    Let yt denote the observations made for the unknown states xt. Let us represent it as:

    Observational Equation:

    where V is unknown.

    This makes our model fall into the general category of State Space Models:


    Features of this Model

    • This model is nonlinear.

    • The system evolution variance matrix is not fixed; it depends on the

      latent states and parameters.

    • So the basic assumption for a DLM (that the variance matrices are known)

      does not hold here.

    • This does not cause any problem in Forward Filtering.

    • Note that for Backward Sampling the key identity is:

    • Now p(xt+1 | xt) has xt appearing in the variance matrix, so p(xt | xt+1, y1:t)

      is no longer a Gaussian density.


    MCMC Algorithm

    • Note that the relevant conditional distributions

      are linear in the mean.

    • We have used an approximation to run a FFBS, approximating the filtering

      distributions p(xt | y1:t, θ) and the backward distributions p(xt | xt+1, y1:t, θ).

    • As mentioned before, proposed states are accepted with a Metropolis-Hastings

      acceptance step.

    • The complete conditional distribution of V is Inverse-Wishart. It is updated using a

      Gibbs step.

    • Component parameters of θ appear in the system evolution variance. Therefore the complete conditionals

      for the θ parameters are NOT Gaussian. We update them using a Random-Walk

      Metropolis-Hastings step within Gibbs.


    Synthetic Data

    We do not have real data.

    For data simulation:

    • We use values for the parameter θ as suggested in the literature.

    • For V we’ve made an arbitrary choice.

    • The choice for Ai,0, Ae,0, R0, C0 are the expected values at a steady state.

      We have generated synthetic observations y1, y2, …, y999, y1000


    Bayesian Analysis

    Prior Distributions:

    • Relatively flat Normal distributions, truncated at zero (restricted to positive values), for the θ parameters.

    • Relatively flat Normal distributions, truncated at zero, for the initial states

      Ai,0, Ae,0, R0, C0.

    • Inverse Wishart distribution for the unknown variance matrix V.

      An Identification Issue:

      Since the parameters P, Vc, Ve are involved in the model only through two

      ratios, we do not have identifiability for all three parameters.

      We can only learn about these two ratios. Therefore we use the ratios as model

      parameters rather than the individual ones.


    MCMC Results

    • We have run the MCMC for 10^6 iterations and the following results are from the

      last 10^5 iterations of the generated Markov Chain.



    SMC Algorithm

    • We have used an Auxiliary Particle Filter to reduce particle degeneracy.

    • For the observational equation variance matrix V, a sufficient statistics structure

      exists. We use the sufficient statistics to sample exactly from p(V | x0:t, y1:t)

      at each step.

    • For the parameters in θ no sufficient statistics structure exists. We use the

      kernel smoothing technique to reduce particle degeneracy in the parameter space.

    • We have run our particle filters with 10^6 particles.


    Quantiles

    [Figure: quantile plots; RED for MCMC, GREEN for SMC.]


    [Figure: additional comparison plots; RED for MCMC, GREEN for SMC.]



    Smoothed Posteriors at time T = 1000

    [Figure: smoothed posterior densities; RED for MCMC, GREEN for SMC, GREY for the prior.]



    Summary

    • For nonlinear, non-Gaussian State Space models with long time series data, MCMC is slow and has issues with convergence.

    • Sequential Monte Carlo techniques provide an alternative class of non-iterative algorithms to solve this class of problems.

    • For a long time series, SMC methods suffer from degeneracy issues, particularly while computing quantities such as the Marginal Likelihood.

    • SMC is scalable, so with enough resources one can imagine tackling big-data problems that would otherwise take an enormous amount of time to solve with MCMC methods.

    • Model comparison is convenient because the marginal likelihood is easy to compute.


    References

    • M West. Approximating Posterior Distributions by Mixtures. Journal of the Royal Statistical Society, (55): 409–422, 1993a.

    • M West. Mixture Models, Monte Carlo, Bayesian Updating and Dynamic Models. J. H. Newton (ed.), Computing Science and Statistics: Proceedings of the 24th Symposium on the Interface, pages 325–333, 1993b.

    • C K Carter and R Kohn. On Gibbs Sampling for State Space Models. Biometrika, 81(3): 541–553, August 1994.

    • J Liu and M West. Combined Parameter and State Estimation in Simulation-based Filtering. Sequential Monte Carlo Methods in Practice, pages 197–223, 2001.

    • P Fearnhead. MCMC, Sufficient Statistics, and Particle Filters. Journal of Computational and Graphical Statistics, (11): 848–862, 2002.

    • G Storvik. Particle Filters in State Space Models with the Presence of Unknown Static Parameters. IEEE Transactions on Signal Processing, (50): 281–289, 2002.


    References

    • S J Godsill, A Doucet, and M West. Monte Carlo Smoothing for Nonlinear Time Series. Journal of the American Statistical Association, 99(465), March 2004. DOI: 10.1198/016214504000000151.

    • M Bolić, P M Djurić, and S Hong. New Resampling Algorithms for Particle Filters. IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings, April 2003.

    • M Bolić, P M Djurić, and S Hong. Resampling Algorithms for Particle Filters: A Computational Complexity Perspective. EURASIP Journal of Applied Signal Processing, (15): 2267–2277, 2004.

    • M S Johannes and N Polson. Particle Filtering and Parameter Learning. Social Science Research Network, http://ssrn.com/abstract=983646, March 2007.

    • C M Carvalho, M Johannes, H F Lopes, and N Polson. Particle Learning and Smoothing. Working Paper, 2008.

    • Y Tanouchi, D Tu, J Kim, and L You. Noise Reduction by Diffusional Dissipation in a Minimal Quorum Sensing Motif. PLoS Computational Biology, 4(8): e1000167, August 2008. doi:10.1371/journal.pcbi.1000167.


