
Particle Filtered MCMC-MLE with Connections to Contrastive Divergence

Arthur Asuncion, Qiang Liu, Alexander Ihler, Padhraic Smyth

Department of Computer Science, University of California, Irvine

- 1. Motivation
- Undirected models are useful in many settings. Consider models in exponential family form:
- Task: given i.i.d. data, estimate the parameters θ accurately and quickly
- Maximum likelihood estimation (MLE):
- Exact MLE is generally intractable, so we resort to approximate techniques:
- Pseudolikelihood / composite likelihoods
- Sampling-based techniques (e.g. MCMC-MLE)
- Contrastive divergence (CD) learning

- We propose particle filtered MCMC-MLE
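The slide's equations did not survive the transcript; the exponential-family form and MLE gradient referred to above are conventionally written as follows (notation assumed here: sufficient statistics s(x), partition function Z(θ)):

```latex
\[
p(\mathbf{x}\mid\theta) \;=\; \frac{1}{Z(\theta)}\,\exp\!\big(\theta^{\top} s(\mathbf{x})\big),
\qquad
Z(\theta) \;=\; \sum_{\mathbf{x}} \exp\!\big(\theta^{\top} s(\mathbf{x})\big),
\]
\[
\nabla_{\theta}\,\frac{1}{N}\sum_{i=1}^{N}\log p(\mathbf{x}_i\mid\theta)
\;=\; \frac{1}{N}\sum_{i=1}^{N} s(\mathbf{x}_i)\;-\;\mathbb{E}_{p(\mathbf{x}\mid\theta)}\big[s(\mathbf{x})\big].
\]
```

The expectation under the model is what makes the gradient hard: evaluating it requires the intractable Z(θ), which is why the approximate techniques listed above are needed.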

- 3. Contrastive Divergence (CD)
- Widely-used machine learning algorithm for learning undirected models [Hinton, 2002]
- CD can be motivated by taking the gradient of the log-likelihood directly:
- CD-n samples from current model (approx.):
- Initialize chains at empirical data distribution
- Only run n MCMC steps

- Persistent CD: initialize chains at the samples from the previous iteration [Tieleman, 2008]
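As a concrete illustration of CD-n, here is a minimal NumPy sketch for a fully visible Boltzmann machine. The function names (`gibbs_sweep`, `cd_n_update`), the {0,1} encoding, and the learning rate are illustrative assumptions, not the authors' code:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def gibbs_sweep(X, W, b, rng):
    """One full Gibbs sweep for a visible Boltzmann machine
    p(x) ∝ exp(0.5 x'Wx + b'x), x ∈ {0,1}^d, zero-diagonal W."""
    X = X.copy()
    for i in range(X.shape[1]):
        # conditional: p(x_i = 1 | x_{-i}) = sigmoid(sum_j W_ij x_j + b_i)
        act = X @ W[:, i] + b[i]
        X[:, i] = (rng.random(X.shape[0]) < sigmoid(act)).astype(float)
    return X

def cd_n_update(X_data, W, b, n=1, lr=0.05, rng=None):
    """One CD-n update: start chains at the empirical data (the CD trick),
    run only n Gibbs sweeps, then move the parameters along the
    difference of sufficient statistics (data average minus sample average)."""
    rng = np.random.default_rng(0) if rng is None else rng
    X_model = X_data
    for _ in range(n):
        X_model = gibbs_sweep(X_model, W, b, rng)
    pos = X_data.T @ X_data / len(X_data)     # data statistics
    neg = X_model.T @ X_model / len(X_model)  # model (CD-n) statistics
    gW = pos - neg
    np.fill_diagonal(gW, 0.0)                 # no self-couplings
    return W + lr * gW, b + lr * (X_data.mean(0) - X_model.mean(0))
```

Persistent CD would differ only in the initialization: instead of restarting `X_model` at `X_data`, it would carry the chains over from the previous update.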

- 5. Experimental Analysis
- Visible Boltzmann machines:
- Exponential random graph models (ERGMs):
- Conditional random fields (CRFs):
- Restricted Boltzmann machines (RBMs):

Slide notes (fragments recovered from the figures):

- The partition function Z(θ) is usually intractable (Section 1).
- ERGM network statistics (Section 5): # edges, # 2-stars, # triangles.
- CD diagram (Section 3): run MCMC under θ for n steps.
- 2. MCMC-MLE
- Widely used in statistics [Geyer, 1991]
- Idea: draw samples from an alternative distribution p(x|θ0) using MCMC, and reweight them to approximate the likelihood:
- To optimize approximate likelihood, use gradient:
- Degeneracy problems if θ moves far from initial θ0
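The equations on this slide are missing from the transcript; the importance-sampling approximation of Geyer's MCMC-MLE being described is standardly written as follows, with samples x^{(j)} ~ p(x | θ0) and the notation of Section 1 assumed:

```latex
\[
\frac{Z(\theta)}{Z(\theta_0)}
 \;=\; \mathbb{E}_{p(\mathbf{x}\mid\theta_0)}\!\Big[\exp\!\big((\theta-\theta_0)^{\top} s(\mathbf{x})\big)\Big]
 \;\approx\; \frac{1}{S}\sum_{j=1}^{S}\exp\!\big((\theta-\theta_0)^{\top} s(\mathbf{x}^{(j)})\big),
\]
\[
\nabla_{\theta}\,\hat{\ell}(\theta)
 \;\approx\; \frac{1}{N}\sum_{i=1}^{N} s(\mathbf{x}_i)
 \;-\; \sum_{j=1}^{S} w_j\, s(\mathbf{x}^{(j)}),
\qquad
w_j \;\propto\; \exp\!\big((\theta-\theta_0)^{\top} s(\mathbf{x}^{(j)})\big),
\quad \textstyle\sum_j w_j = 1.
\]
```

When θ drifts far from θ0, the weights w_j concentrate on a handful of samples, which is the degeneracy problem noted above.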

- 4. Particle Filtered MCMC-MLE (PF)
- Use sampling-importance-resampling (SIR) with MCMC rejuvenation to estimate gradient
- Monitor effective sample size (ESS):
- If ESS (“health” of particles) is low:
- Resample particles in proportion to w
- Rejuvenate with n MCMC steps based on θ

- PF can avoid MCMC-MLE’s degeneracy issues
- PF can potentially be faster than CD, since it only “rejuvenates” when the ESS is low
- As the number of particles approaches infinity, PF recovers the exact log-likelihood gradient
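The monitoring/resampling/rejuvenation loop above can be sketched in NumPy as follows. The function name `pf_step`, the ESS threshold of half the particle count, and the generic `suff_stats`/`mcmc_kernel` callbacks are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def pf_step(particles, log_w, theta, theta_old, suff_stats, mcmc_kernel,
            n_steps, rng, ess_frac=0.5):
    """One particle-filtered MCMC-MLE step (sketch).

    Importance weights are updated multiplicatively by
    exp((theta - theta_old)' s(x)); the gradient estimate would then use
    the weighted statistics sum_j w_j s(x_j). If the effective sample
    size ESS = 1 / sum_j w_j^2 falls below ess_frac * S, resample the
    particles in proportion to w and rejuvenate with n MCMC steps at theta.
    """
    S = len(particles)
    log_w = log_w + suff_stats(particles) @ (theta - theta_old)
    w = np.exp(log_w - log_w.max())            # stabilized normalization
    w /= w.sum()
    ess = 1.0 / np.sum(w ** 2)                 # "health" of the particles
    if ess < ess_frac * S:
        particles = particles[rng.choice(S, size=S, p=w)]  # resample
        for _ in range(n_steps):                           # rejuvenate
            particles = mcmc_kernel(particles, theta, rng)
        log_w = np.zeros(S)                    # weights reset after resampling
        w = np.full(S, 1.0 / S)
    return particles, log_w, w
```

`suff_stats` maps a batch of particles to their sufficient statistics s(x), and `mcmc_kernel` is any transition operator that leaves p(x|θ) invariant (e.g. a Gibbs sweep); both are model-specific and supplied by the caller.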

Slide notes (fragments recovered from the figures):

- RBM experiments (Section 5): MNIST data, 500 hidden units.
- Monte Carlo approximation of the intractable expectations (Section 2).
- 6. Conclusions
- Particle filtered MCMC-MLE can avoid the degeneracy issues of MCMC-MLE by performing resampling and rejuvenation
- Particle filtered MCMC-MLE is sometimes faster than CD since it only rejuvenates when needed
- There is a unified view of these algorithms (MCMC-MLE, CD, persistent CD, and PF)

Algorithm diagrams (captions recovered from the figures):

- MCMC-MLE: run MCMC under p(x|θ0) until equilibrium; use importance sampling to estimate the gradient; update θ using the approximate gradient.
- CD: run MCMC under θ for n steps; update θ using the approximate gradient.
- PF: calculate the weight and check the ESS; if the ESS is low, resample and rejuvenate; calculate the new weight.
- PF can be viewed as a “hybrid” between MCMC-MLE and CD.