
Expectation-Maximization & Belief Propagation

Alan Yuille

Dept. Statistics UCLA


Goal of this Talk

  • The goal is to introduce the Expectation-Maximization (EM) and Belief Propagation (BP) algorithms.

  • EM is one of the major algorithms used for inference in models with hidden/missing/latent variables.



Images are piecewise smooth

Assume that images are smooth except at sharp discontinuities (edges). Justification from the statistics of real images (Zhu & Mumford).


Graphical Model & Potential

The graphical model: an undirected graph (a hidden Markov model) relating the observed image d, the reconstructed image u, and the line process.

The potential: if the gradient in u becomes too large, the line process is activated and the smoothness term is cut.
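The potential itself did not survive this transcript. A standard weak-membrane energy with a binary line process l (an assumed form, in the spirit of Geman & Geman and Blake & Zisserman, not necessarily the exact expression on the slide) is

$$ E(u, l \mid d) \;=\; \sum_i \frac{(u_i - d_i)^2}{2\sigma^2} \;+\; \lambda \sum_{\langle i,j \rangle} (1 - l_{ij})\,(u_i - u_j)^2 \;+\; \mu \sum_{\langle i,j \rangle} l_{ij}, $$

so once $\lambda (u_i - u_j)^2 > \mu$ it becomes cheaper to set $l_{ij} = 1$ and cut the smoothness term.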


The Posterior Distribution

  • We apply Bayes rule to get a posterior distribution:
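The posterior written on the slide is missing here; with an energy of the assumed form above, Bayes' rule gives

$$ P(u, l \mid d) \;=\; \frac{P(d \mid u)\, P(u, l)}{P(d)} \;\propto\; \exp\bigl(-E(u, l \mid d)\bigr). $$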


Line Process: Off and On

  • Illustration of line processes. [Figure: line process off (no edge, smoothness retained) vs. line process on (edge, smoothness cut).]


Choice of Task

What do we want to estimate?
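The options listed on the slide are lost in this transcript. The two standard choices for this model (stated here as an assumption about the slide's content) are the joint estimate and the estimate of u with the line process summed out:

$$ (u^*, l^*) \;=\; \arg\max_{u, l} P(u, l \mid d), \qquad u^* \;=\; \arg\max_{u} \sum_{l} P(u, l \mid d). $$

EM, introduced next, targets the second choice by treating the line process l as the hidden variable.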






Neural Networks and the Brain

  • An early variant of this algorithm was formulated as a Hopfield network.

  • Koch, Marroquin, Yuille (1987)

  • It is just possible that a variant of this algorithm is implemented in V1 – Prof. Tai Sing Lee (CMU).


EM for Mixture of two Gaussians

  • A mixture model is of the form:
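The formula did not survive the transcript; the standard two-component form (an assumption about what the slide shows) is

$$ p(x \mid \theta) \;=\; \pi\, \mathcal{N}(x \mid \mu_1, \sigma_1^2) \;+\; (1 - \pi)\, \mathcal{N}(x \mid \mu_2, \sigma_2^2), \qquad 0 \le \pi \le 1, $$

with parameters $\theta = (\pi, \mu_1, \sigma_1, \mu_2, \sigma_2)$.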


EM for a Mixture of two Gaussians

  • Each observation has been generated by one of two Gaussians. But we do not know the parameters (i.e. mean and variance) of the Gaussians and we do not know which Gaussian generated each observation.

  • Colours indicate the assignment of points to clusters (red and blue). Intermediates (e.g. purple) represent probabilistic assignments. The ellipses represent the current parameter values of each cluster.
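A minimal sketch of EM for this two-Gaussian mixture, assuming one-dimensional data and the parameterisation above (illustrative code, not taken from the talk):

```python
import numpy as np

def normal_pdf(x, mu, var):
    # Gaussian density; used to compute the E-step responsibilities.
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def em_two_gaussians(x, n_iter=100):
    # Crude initialisation: start the two means at the lower and upper quartiles.
    pi = 0.5
    mu1, mu2 = np.percentile(x, 25), np.percentile(x, 75)
    var1 = var2 = np.var(x)
    for _ in range(n_iter):
        # E-step: responsibility r[n] = P(point n came from component 1 | current params).
        p1 = pi * normal_pdf(x, mu1, var1)
        p2 = (1.0 - pi) * normal_pdf(x, mu2, var2)
        r = p1 / (p1 + p2)
        # M-step: re-estimate parameters by responsibility-weighted averages.
        pi = r.mean()
        mu1 = np.sum(r * x) / np.sum(r)
        mu2 = np.sum((1 - r) * x) / np.sum(1 - r)
        var1 = np.sum(r * (x - mu1) ** 2) / np.sum(r)
        var2 = np.sum((1 - r) * (x - mu2) ** 2) / np.sum(1 - r)
    return pi, mu1, var1, mu2, var2

# Example: two clusters with unknown assignments, as in the figure described above.
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])
print(em_two_gaussians(data))
```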


Expectation-Maximization: Summary

  • We can apply EM to any inference problem with hidden variables.

  • The following limitations apply:

  • (1) Can we perform the E and M steps? For the image problem, the E step was analytic and the M step required solving linear equations.

  • (2) Does the algorithm converge to the global maximum of P(u|d)? This is true for some problems, but not for all.
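For the image problem, and under the weak-membrane energy assumed earlier, the two steps take the following form (a sketch, not the slide's own derivation). The E-step computes the expected line process analytically,

$$ \bar{l}_{ij} \;=\; P(l_{ij} = 1 \mid u, d) \;=\; g\bigl(\lambda (u_i - u_j)^2 - \mu\bigr), \qquad g(z) = \frac{1}{1 + e^{-z}}, $$

and the M-step minimizes the expected energy $\sum_i (u_i - d_i)^2 / (2\sigma^2) + \lambda \sum_{\langle i,j \rangle} (1 - \bar{l}_{ij})\,(u_i - u_j)^2$ over u, which is quadratic and so reduces to solving linear equations.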


Expectation-Maximization: Summary (continued)

  • For an important class of problems, EM has a nice symbiotic relationship with dynamic programming (see next lecture).

  • Mathematically, the EM algorithm falls into a class of optimization techniques known as Majorization (Statistics) and Variational Bounding (Machine Learning). Majorization (De Leeuw) is considerably older…
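The bound in question (a standard identity, not reproduced on the slide) is

$$ \log P(d \mid \theta) \;\ge\; \sum_{l} q(l) \log \frac{P(d, l \mid \theta)}{q(l)} \quad \text{for any distribution } q, $$

and EM alternates maximizing the right-hand side over q (the E-step, which sets $q(l) = P(l \mid d, \theta)$) and over the parameters $\theta$ (the M-step).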


Belief Propagation (BP) and Message Passing

  • BP is an inference algorithm that is exact for graphical models defined on trees. It is similar to dynamic programming (see next lecture).

  • It is often known as “loopy BP” when applied to graphs with closed loops.

  • Empirically, it is often a successful approximate algorithm for graphs with closed loops. But it tends to degrade badly when the number of closed loops increases.


BP and Message Passing

  • We define a distribution on an undirected graph (a standard pairwise form is written out after this list).

  • BP comes in two forms: (I) sum-product, and (II) max-product.

  • Sum-product (Pearl) is used for estimating the marginal distributions of the variables x.
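A standard pairwise form, consistent with the message updates that follow (the slide's own equation is missing from this transcript), is

$$ P(x) \;=\; \frac{1}{Z} \prod_i \phi_i(x_i) \prod_{\langle i,j \rangle} \psi_{ij}(x_i, x_j), $$

where the $\phi_i$ are unary potentials, the $\psi_{ij}$ are pairwise potentials on the edges of the graph, and Z is the normalization constant.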


Message Passing: Sum Product

  • Sum-product proceeds by passing messages between nodes.
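The update equation is lost in this transcript; in the notation of the pairwise model above, the standard rule for the message from node i to node j is

$$ m_{ij}^{t+1}(x_j) \;=\; \sum_{x_i} \psi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus j} m_{ki}^{t}(x_i), $$

where $N(i)$ denotes the neighbours of node i.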


Message Passing: Max Product

  • The max-product algorithm (Gallager) also uses messages but it replaces the sum by a max.

  • The update rule is:
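The rule itself did not survive the transcript; replacing the sum in the previous update by a maximization gives the standard form

$$ m_{ij}^{t+1}(x_j) \;=\; \max_{x_i}\; \psi_{ij}(x_i, x_j)\, \phi_i(x_i) \prod_{k \in N(i) \setminus j} m_{ki}^{t}(x_i). $$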


Beliefs and Messages

  • We construct “beliefs”, estimates of the marginal probabilities, from the messages (a standard form is written out after this list).

  • For graphical models defined on trees (i.e. no closed loops):

  • (i) sum-product will converge to the marginals of the distribution P(x).

    (ii) max-product converges to the maximum probability states of P(x).

    But this is not very special, because other algorithms do this – see next lecture.
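The belief construction referred to in the first bullet is missing from this transcript; the standard expressions are

$$ b_i(x_i) \;\propto\; \phi_i(x_i) \prod_{k \in N(i)} m_{ki}(x_i), \qquad b_{ij}(x_i, x_j) \;\propto\; \psi_{ij}(x_i, x_j)\, \phi_i(x_i)\, \phi_j(x_j) \prod_{k \in N(i) \setminus j} m_{ki}(x_i) \prod_{k \in N(j) \setminus i} m_{kj}(x_j). $$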


Loopy BP

  • The major interest in BP is that it performs well empirically when applied to graphs with closed loops.

  • But:

  • (i) convergence is not guaranteed (the algorithm can oscillate)

  • (ii) the resulting beliefs are only approximations to the correct marginals.


Bethe Free Energy

  • There is one major theoretical result (Yedidia et al.).

  • The fixed points of BP correspond to extrema of the Bethe free energy.

  • The Bethe free energy is one of a set of approximations to the free energy.
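For the pairwise model used here, the Bethe free energy (standard form from Yedidia et al., not shown on the slide) is

$$ F_{\text{Bethe}} \;=\; \sum_{\langle i,j \rangle} \sum_{x_i, x_j} b_{ij}(x_i, x_j) \log \frac{b_{ij}(x_i, x_j)}{\psi_{ij}(x_i, x_j)\, \phi_i(x_i)\, \phi_j(x_j)} \;-\; \sum_i (d_i - 1) \sum_{x_i} b_i(x_i) \log \frac{b_i(x_i)}{\phi_i(x_i)}, $$

where $d_i$ is the number of neighbours of node i, and the minimization is over beliefs that are normalized and pairwise-consistent.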


BP without messages

  • Use the beliefs to construct local approximations B(.) to the distribution.

  • Update beliefs by repeated marginalization


BP without messages

  • Local approximations (consistent on trees).
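The expression is not in the transcript; the standard local approximation, which is exact on trees (an assumption about what the slide shows), is

$$ P(x) \;\approx\; \frac{\prod_{\langle i,j \rangle} b_{ij}(x_i, x_j)}{\prod_i b_i(x_i)^{\,d_i - 1}}, \qquad \text{with consistency} \quad \sum_{x_j} b_{ij}(x_i, x_j) \;=\; b_i(x_i). $$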


Another Viewpoint of BP

  • There is also a relationship between BP and Markov Chain Monte Carlo (MCMC).

  • BP is like a deterministic form of the Gibbs sampler.

  • MCMC will be described in later lectures.


Summary of BP

  • BP gives exact results on trees (similar to dynamic programming).

  • BP gives surprisingly good approximate results on graphs with loops. No guarantees of convergence, but fixed points of BP correspond to extrema of the Bethe free energy.

  • BP can be formulated without messages.

  • BP is like a deterministic version of the Gibbs sampler in MCMC.

