Particle Filtered MCMC-MLE with Connections to Contrastive Divergence

Arthur Asuncion, Qiang Liu, Alexander Ihler, Padhraic Smyth

Department of Computer Science, University of California, Irvine

  • 1. Motivation

  • Undirected models are useful in many settings. Consider models in exponential family form:

    p(x|θ) = exp(θᵀ s(x)) / Z(θ),   where Z(θ) = Σ_x exp(θᵀ s(x))

  • The partition function Z(θ) is usually intractable (see the sketch after this section)

  • Task: Given i.i.d. data x_1, …, x_N, estimate the parameters accurately and quickly via maximum likelihood estimation (MLE):

    θ̂ = argmax_θ Σ_i log p(x_i|θ)

  • Since Z(θ) is intractable, we need to resort to approximate techniques:

    • Pseudolikelihood / composite likelihoods

    • Sampling-based techniques (e.g., MCMC-MLE)

    • Contrastive divergence (CD) learning

  • We propose particle filtered MCMC-MLE
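A minimal sketch of why exact MLE is infeasible (illustrative code, not from the poster; the model and names such as `log_partition` and `W` are hypothetical): computing Z(θ) for even a small visible Boltzmann machine requires enumerating all 2^d joint states.

```python
import numpy as np
from itertools import product

# Tiny visible Boltzmann machine over d binary units with pairwise weights W.
# Exact log Z enumerates all 2^d states -- feasible only for tiny d, which is
# why learning must fall back on the approximate techniques listed above.

def log_partition(W, states):
    # log Z = log sum_x exp(0.5 * x^T W x), with W symmetric, zero diagonal
    scores = np.array([0.5 * x @ W @ x for x in states])
    m = scores.max()
    return m + np.log(np.exp(scores - m).sum())

d = 4
rng = np.random.default_rng(0)
W = rng.normal(scale=0.1, size=(d, d))
W = (W + W.T) / 2                      # symmetrize
np.fill_diagonal(W, 0.0)               # no self-interactions

states = [np.array(s, dtype=float) for s in product([0, 1], repeat=d)]
print("log Z =", log_partition(W, states))   # cost grows as O(2^d)
```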

  • 2. MCMC-MLE

  • Widely used in statistics [Geyer, 1991]

  • Idea: run MCMC under an alternate distribution p(x|θ0) until equilibrium, and use those samples to approximate the likelihood by importance sampling:

    log [L(θ)/L(θ0)] ≈ (θ − θ0)ᵀ Σ_i s(x_i) − N log [ (1/S) Σ_s exp((θ − θ0)ᵀ s(x_s)) ],   x_s ~ p(x|θ0)

  • To optimize the approximate likelihood, use its gradient, a Monte Carlo approximation of the exact gradient with self-normalized importance weights w_s (see the sketch after this section):

    ∇ log L(θ) ≈ Σ_i s(x_i) − N Σ_s w_s s(x_s),   w_s ∝ exp((θ − θ0)ᵀ s(x_s))

  • Degeneracy problems arise if θ moves far from the initial θ0: the weights concentrate on a few samples
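A minimal sketch of the MCMC-MLE gradient estimate under the formulas above (hypothetical helper names; drawing the samples x_s from p(x|θ0) by MCMC is assumed to happen elsewhere):

```python
import numpy as np

def mcmc_mle_gradient(theta, theta0, data_stats, sample_stats):
    # data_stats:   (N, d) array of s(x_i) for the observed data
    # sample_stats: (S, d) array of s(x_s) for MCMC samples from p(x|theta0)
    log_w = sample_stats @ (theta - theta0)   # log importance weights
    log_w -= log_w.max()                      # subtract max for stability
    w = np.exp(log_w)
    w /= w.sum()                              # self-normalize
    model_expectation = w @ sample_stats      # estimate of E_theta[s(x)]
    return data_stats.sum(axis=0) - len(data_stats) * model_expectation
```

As θ drifts from θ0, the weights w degenerate (most of the mass lands on a few samples), which is exactly the failure mode the particle-filtered variant in Section 4 addresses.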

  • 3. Contrastive Divergence (CD)

  • Widely used machine learning algorithm for learning undirected models [Hinton, 2002]

  • CD can be motivated by taking the gradient of the log-likelihood directly:

    ∇ log L(θ) = Σ_i s(x_i) − N E_{p(x|θ)}[s(x)]

  • CD-n approximately samples from the current model to estimate the expectation (see the sketch after this list):

    • Initialize chains at the empirical data distribution

    • Run MCMC under θ for only n steps

  • Persistent CD: initialize chains at the samples from the previous iteration [Tieleman, 2008]
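A minimal sketch of one CD-n update (hypothetical interfaces: `gibbs_step(X, theta)` is assumed to perform one MCMC sweep per row of X under p(x|theta), and `stats` computes s(x) row-wise on a NumPy array):

```python
def cd_n_update(theta, data, stats, gibbs_step, n=1, lr=0.01):
    X = data.copy()                          # CD-n: chains start at the data
    for _ in range(n):                       # only n MCMC steps, far from equilibrium
        X = gibbs_step(X, theta)
    positive = stats(data).sum(axis=0)       # data term of the gradient
    negative = stats(X).sum(axis=0)          # (biased) model term
    return theta + lr * (positive - negative)
```

For persistent CD, X would instead be carried over from the previous update rather than reset to the data.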

  • 4. Particle Filtered MCMC-MLE (PF)

  • Use sampling-importance-resampling (SIR) with MCMC rejuvenation to estimate the gradient

  • Monitor the effective sample size (ESS) of the normalized importance weights w:

    ESS = 1 / Σ_s w_s²

  • If the ESS (the "health" of the particles) is low:

    • Resample particles in proportion to w

    • Rejuvenate with n MCMC steps under the current θ

  • PF can avoid MCMC-MLE's degeneracy issues (a sketch of the full step follows this list)

  • PF can potentially be faster than CD, since it only "rejuvenates" when the ESS is low

  • As the number of particles approaches infinity, PF recovers the exact log-likelihood gradient
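A minimal sketch of one PF gradient step (same hypothetical `stats`/`gibbs_step` interfaces as above; the ESS threshold `ess_frac` is an assumed hyperparameter, and resetting the reference θ0 after rejuvenation is this sketch's reading of the method):

```python
import numpy as np

def pf_model_expectation(theta, theta0, particles, stats, gibbs_step,
                         n=1, ess_frac=0.5, rng=np.random.default_rng(0)):
    S = len(particles)
    log_w = stats(particles) @ (theta - theta0)   # reweight old particles
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    if 1.0 / np.sum(w ** 2) < ess_frac * S:       # ESS low: particles unhealthy
        idx = rng.choice(S, size=S, p=w)          # resample proportional to w
        particles = particles[idx]
        for _ in range(n):                        # rejuvenate under current theta
            particles = gibbs_step(particles, theta)
        theta0, w = theta, np.full(S, 1.0 / S)    # reset reference and weights
    return w @ stats(particles), particles, theta0
```

The returned weighted average estimates E_θ[s(x)] and plugs into the same gradient expression as in Sections 2 and 3; rejuvenation runs only when needed, which is the source of the potential speedup over CD.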

  • 5. Experimental Analysis

  • Visible Boltzmann machines

  • Exponential random graph models (ERGMs), whose sufficient statistics are network statistics: # edges, # 2-stars, # triangles

  • Conditional random fields (CRFs)

  • Restricted Boltzmann machines (RBMs): experiments on MNIST data, with 500 hidden units

[Results plots comparing the learning algorithms on each model class are not reproduced in this transcript.]
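For concreteness, a minimal sketch (illustrative, not from the poster) of the three ERGM network statistics named above, computed from a symmetric binary adjacency matrix A:

```python
import numpy as np

def ergm_stats(A):
    deg = A.sum(axis=1)                        # node degrees
    edges = A.sum() / 2                        # each edge appears twice in A
    two_stars = (deg * (deg - 1) / 2).sum()    # edge pairs sharing a node
    triangles = np.trace(A @ A @ A) / 6        # each triangle counted 6 times
    return np.array([edges, two_stars, triangles])

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)         # triangle graph K3
print(ergm_stats(A))                           # -> [3. 3. 1.]
```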

  • 6. Conclusions

  • Particle filtered MCMC-MLE can avoid the degeneracy issues of MCMC-MLE by performing resampling and rejuvenation

  • Particle filtered MCMC-MLE is sometimes faster than CD, since it only rejuvenates when needed

  • There is a unified view of all of these algorithms: PF can be viewed as a "hybrid" between MCMC-MLE and CD

[Unified-view diagram, summarized: each algorithm iterates between estimating the gradient and updating θ. MCMC-MLE runs MCMC under p(x|θ0) until equilibrium once, then calculates new importance weights at every θ update. CD runs MCMC under the current θ for n steps at every update. PF calculates new weights and checks the ESS at every update, resampling and rejuvenating with MCMC only when the ESS is low.]