
Extending Expectation Propagation for Graphical Models

Yuan (Alan) Qi

Joint work with Tom Minka



Motivation

  • Graphical models are widely used in real-world applications, such as wireless communications and bioinformatics.

  • Inference techniques on graphical models often sacrifice accuracy for efficiency, or vice versa.

  • We need methods that strike a better balance between accuracy and efficiency.


[Figure: error vs. computational time for current techniques, marking the desired operating point ("what we want") at low error and low computation.]



Outline

  • Background on expectation propagation (EP)

  • Extending EP on Bayesian networks for dynamic systems

    • Poisson tracking

    • Signal detection for wireless communications

  • Tree-structured EP on loopy graphs

  • Conclusions and future work




Graphical Models

[Figure: example graphical models on hidden nodes x1, x2 with observations y1, y2; a directed (Bayesian network) version and an undirected version.]



Inference on Graphical Models

  • Bayesian inference techniques:

    • Belief propagation (BP): Kalman filtering/smoothing, the forward-backward algorithm

    • Monte Carlo: particle filters/smoothers, MCMC

  • Loopy BP: typically efficient, but not accurate on general loopy graphs

  • Monte Carlo: accurate, but often not efficient



Expectation Propagation in a Nutshell

  • Approximate a probability distribution by a product of simpler parametric terms:

    For directed graphs: $p(\mathbf{x}) = \prod_i p(x_i \mid \mathrm{pa}_i) \approx \prod_i \tilde{f}_i(\mathbf{x})$

    For undirected graphs: $p(\mathbf{x}) \propto \prod_a f_a(\mathbf{x}) \approx \prod_a \tilde{f}_a(\mathbf{x})$

  • Each approximation term $\tilde{f}_a$ lives in an exponential family (e.g. Gaussian)



EP in a Nutshell

  • Each approximate term is chosen to minimize the following KL divergence by moment matching:

$$\tilde{f}_a^{\,\mathrm{new}} = \arg\min_{\tilde{f}_a}\; \mathrm{KL}\!\left( f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \,\Big\|\, \tilde{f}_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x}) \right)$$

where the leave-one-out approximation is $q^{\backslash a}(\mathbf{x}) = q(\mathbf{x}) / \tilde{f}_a(\mathbf{x})$.
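As a concrete illustration of moment matching (a minimal sketch of mine, not code from the talk; the helper name and the probit example are assumptions), here is one scalar-Gaussian EP site update with Gauss-Hermite quadrature supplying the moments:

```python
import numpy as np
from scipy.stats import norm

def ep_site_update(m_cav, v_cav, log_f, n_quad=30):
    """One EP site update for a scalar Gaussian approximation.

    Moment-matches the tilted distribution p_hat(x) ~ f(x) N(x; m_cav, v_cav),
    then divides the cavity back out, returning the new site term in natural
    parameters (nu = mean/var, tau = 1/var).
    """
    # Gauss-Hermite nodes/weights for integrals of the form int g(t) exp(-t^2) dt
    nodes, weights = np.polynomial.hermite.hermgauss(n_quad)
    x = m_cav + np.sqrt(2.0 * v_cav) * nodes       # change of variables t -> x
    w = weights * np.exp(log_f(x))                 # unnormalized tilted weights
    z = w.sum()
    mean = (w * x).sum() / z                       # first moment of the tilted dist.
    var = (w * (x - mean) ** 2).sum() / z          # second central moment
    # New site = tilted / cavity, as a difference of natural parameters
    tau_site = 1.0 / var - 1.0 / v_cav
    nu_site = mean / var - m_cav / v_cav
    return nu_site, tau_site

# Example: a probit site f(x) = Phi(2x) with a N(0, 1) cavity
nu, tau = ep_site_update(0.0, 1.0, lambda x: norm.logcdf(2.0 * x))
```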



Limitations of Plain EP

  • It can be difficult or expensive to analytically compute the moments needed to minimize the desired KL divergence.

  • It can be expensive to compute and maintain a valid approximate distribution q(x) that is coherent under marginalization.

    • Tree-structured q(x): see the factorization below.
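For a spanning tree $T$ this coherence constraint means $q$ has the standard tree factorization (a general fact about tree-structured distributions, not a formula specific to this talk):

$$q(\mathbf{x}) = \frac{\prod_{(i,j) \in T} q(x_i, x_j)}{\prod_i q(x_i)^{d_i - 1}},$$

where $d_i$ is the degree of node $i$ in the tree, and each pairwise marginal $q(x_i, x_j)$ must stay consistent with the single-node marginals $q(x_i)$.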



Three Extensions

1. Instead of choosing the approximate term to minimize the KL divergence $\mathrm{KL}\!\left( f_a\, q^{\backslash a} \,\|\, \tilde{f}_a\, q^{\backslash a} \right)$, use other criteria.

2. Use numerical approximation to compute moments: quadrature or Monte Carlo.

3. Allow the tree-structured q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.



Efficiency vs. Accuracy

[Figure: error vs. computational time. Loopy BP (factorized EP) is fast but less accurate; Monte Carlo is accurate but slow; extended EP aims at the region in between.]


Outline (recap): next, extending EP on Bayesian networks for dynamic systems.



Object Tracking

Guess the position of an object given noisy observations



Bayesian Network

[Figure: hidden Markov chain x1 → x2 → … → xT with observations y1, y2, …, yT.]

e.g. $x_t = x_{t-1} + \nu_t$ (random walk); we want the distribution of the x's given the y's, i.e. the posterior $p(x_{1:T} \mid y_{1:T})$.



Approximation

The posterior is approximated by a distribution that is factorized and Gaussian in x.


Message Interpretation

[Figure: hidden node $x_t$ with its observation $y_t$.]

$q(x_t)$ = (forward msg)(observation msg)(backward msg)

The three factors are the forward message from $x_{t-1}$, the observation message from $y_t$, and the backward message from $x_{t+1}$.



EP on Dynamic Systems

  • Filtering: t = 1, …, T

    • Incorporate forward message

    • Initialize observation message

  • Smoothing: t = T, …, 1

    • Incorporate the backward message

    • Compute the leave-one-out approximation by dividing out the old observation messages

    • Re-approximate the new observation messages

  • Re-filtering: t = 1, …, T

    • Incorporate forward and observation messages
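To make the schedule concrete, here is a minimal sketch (my own illustration, not the talk's code) for a scalar linear-Gaussian chain, where the observation messages are exact and one forward/backward pass suffices; extended EP re-approximates the observation messages on each iteration for non-Gaussian likelihoods:

```python
import numpy as np

def ep_smoother(y, q_dyn, r_obs, prior_mean=0.0, prior_var=10.0):
    """EP on a scalar linear-Gaussian chain: x_t = x_{t-1} + w_t, y_t = x_t + e_t.

    Represents q(x_t) as (forward msg)(observation msg)(backward msg), each
    Gaussian in natural parameters (nu = mean/var, tau = 1/var).
    """
    T = len(y)
    # Observation messages: exact here, N(x_t; y_t, r_obs)
    nu_obs, tau_obs = y / r_obs, np.full(T, 1.0 / r_obs)
    nu_fwd, tau_fwd = np.zeros(T), np.zeros(T)
    nu_bwd, tau_bwd = np.zeros(T), np.zeros(T)

    # Forward (filtering) pass
    m, v = prior_mean, prior_var
    for t in range(T):
        v_pred = v + q_dyn if t > 0 else prior_var   # predict through dynamics
        m_pred = m if t > 0 else prior_mean
        tau_fwd[t], nu_fwd[t] = 1.0 / v_pred, m_pred / v_pred
        tau_post = tau_fwd[t] + tau_obs[t]           # incorporate observation msg
        m, v = (nu_fwd[t] + nu_obs[t]) / tau_post, 1.0 / tau_post

    # Backward (smoothing) pass
    for t in range(T - 2, -1, -1):
        tau = tau_obs[t + 1] + tau_bwd[t + 1]        # combine obs and backward msgs
        v_b = 1.0 / tau + q_dyn                      # propagate back through dynamics
        m_b = (nu_obs[t + 1] + nu_bwd[t + 1]) / tau
        tau_bwd[t], nu_bwd[t] = 1.0 / v_b, m_b / v_b

    tau_q = tau_fwd + tau_obs + tau_bwd              # q(x_t): product of the 3 msgs
    return (nu_fwd + nu_obs + nu_bwd) / tau_q, 1.0 / tau_q

means, vars_ = ep_smoother(np.array([0.3, 0.7, 1.2, 0.9]), q_dyn=0.1, r_obs=0.5)
```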



Extension of EP

  • Instead of matching moments, use any method for approximate filtering.

    • Examples: statistical linearization, unscented Kalman filter (UKF), mixture of Kalman filters

      Turn any filtering method into a smoothing method!

  • All methods can be interpreted as finding linear/Gaussian approximations to original terms.



Example: Poisson Tracking

  • $y_t$ is an integer-valued Poisson variate with mean $\exp(x_t)$


Poisson Tracking Model

$$p(x_t \mid x_{t-1}) = \mathcal{N}(x_t;\, x_{t-1},\, \sigma^2) \qquad p(y_t \mid x_t) = \frac{e^{-e^{x_t}}\, e^{x_t y_t}}{y_t!}$$

(random-walk dynamics on the log intensity $x_t$, Poisson observations with mean $e^{x_t}$)



Extension of EP: Approximate Observation Message

  • The observation message $p(y_t \mid x_t)$ is not Gaussian in $x_t$

  • The moments of $x_t$ under the tilted distribution are not analytic

  • Two approaches:

    • Gauss-Hermite quadrature for moments

    • Statistical linearization instead of moment-matching (Turn unscented Kalman filters into a smoothing method)

  • Both work well
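As an illustration of the quadrature approach (again a sketch of mine, reusing the hypothetical `ep_site_update` helper from the earlier example), the Poisson log-likelihood simply plays the role of the site function:

```python
import numpy as np
from scipy.special import gammaln

def poisson_loglik(y):
    # log p(y | x) for Poisson observations with log intensity x:
    #   log p = y*x - exp(x) - log(y!)
    return lambda x: y * x - np.exp(x) - gammaln(y + 1.0)

# Moment-match the tilted distribution p_hat(x) ~ p(y_t | x) N(x; m_cav, v_cav),
# reusing ep_site_update from the earlier sketch
nu_site, tau_site = ep_site_update(m_cav=0.5, v_cav=0.2, log_f=poisson_loglik(3))
```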



Approximate vs. Exact Posterior

[Figure: approximate (Gaussian) vs. exact posterior p(xT | y1:T), plotted against xT.]



Extended EP vs. Monte Carlo: Accuracy

[Figure: posterior mean and variance estimates, extended EP vs. Monte Carlo.]


Accuracy/Efficiency Tradeoff

[Figure: estimation error vs. computation time for extended EP and Monte Carlo methods.]



EP for Digital Wireless Communication

  • Signal detection problem

  • The amplitude and phase of the transmitted signal $s_t$ are varied to encode each symbol

  • Complex representation: each symbol is a point in the complex (Re/Im) plane

[Figure: symbol constellation in the complex plane.]



Binary Symbols, Gaussian Noise

  • Symbols are 1 and –1 (in complex plane)

  • Received signal: $y_t = s_t + \epsilon_t$ with Gaussian noise $\epsilon_t$

  • Optimal detection is easy



Fading Channel

  • The channel systematically changes the amplitude and phase of the signal: $y_t = x_t s_t + \epsilon_t$

  • The channel coefficient $x_t$ changes over time



Benchmark: Differential Detection

  • Classical technique

  • Uses the previous observation to estimate the channel state

  • Binary symbols only


Bayesian Network for Signal Detection

[Figure: symbols s1, …, sT and channel states x1, …, xT jointly generating observations y1, …, yT.]



Extended-EP Joint Signal Detector and Channel Estimation

  • Turn mixture of Kalman filters into a smoothing method

  • Smoothing over a window of the last L observations

  • Observations before the window act as a prior for the current estimate



Computational Complexity

  • Expectation propagation: O(nLd²)

  • Stochastic mixture of Kalman filters: O(LMd²)

  • Rao-Blackwellised particle smoothers: O(LMNd²)

    • n: Number of EP iterations (Typically, 4 or 5)

    • d: Dimension of the parameter vector

    • L: Smooth window length

    • M: Number of samples in filtering (Often larger than 500)

    • N: Number of samples in smoothing (Larger than 50)

    • EP is about 5,000 times faster than Rao-Blackwellised particle smoothers.
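To see where the 5,000× comes from: with the typical values above, the ratio of the two costs is O(LMNd²)/O(nLd²) = MN/n = (500 × 50)/5 = 5,000.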



Experimental Results

[Figure: bit-error rate vs. signal-to-noise ratio, comparing extended EP with the particle methods of (Chen, Wang, Liu 2000).]

EP outperforms particle smoothers in efficiency with comparable accuracy.


Bayesian Networks for Adaptive Decoding

[Figure: information bits e1, …, eT and channel states x1, …, xT generating observations y1, …, yT.]

The information bits $e_t$ are coded by a convolutional error-correcting encoder.



EP Outperforms Viterbi Decoding

[Figure: bit-error rate vs. signal-to-noise ratio for EP and Viterbi decoding.]


Outline (recap): next, tree-structured EP on loopy graphs.


Inference on Loopy Graphs

[Figure: a 4x4 grid of nodes X1 through X16, containing many loops.]

Problem: estimate marginal distributions of the variables indexed by the nodes in a loopy graph, e.g., p(xi), i = 1, . . . , 16.



4-node Loopy Graph

The joint distribution is a product of pairwise potentials, one per edge: $p(\mathbf{x}) \propto \prod_{(i,j)} f_{ij}(x_i, x_j)$.

We want to approximate it by a simpler, tree-structured distribution $q(\mathbf{x})$.
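For intuition (an illustrative sketch of mine, not code from the talk), exact marginals on such a tiny binary graph can be computed by brute-force enumeration, which is the ground truth that approximate methods are judged against:

```python
import itertools
import numpy as np

def exact_marginals(potentials, n_nodes):
    """Brute-force marginals of p(x) ~ prod f_ij(x_i, x_j) for binary x.

    potentials: dict mapping edge (i, j) -> 2x2 array f_ij.
    Only feasible for tiny graphs.
    """
    marg = np.zeros((n_nodes, 2))
    z = 0.0
    for x in itertools.product([0, 1], repeat=n_nodes):
        p = 1.0
        for (i, j), f in potentials.items():
            p *= f[x[i], x[j]]
        z += p
        for i in range(n_nodes):
            marg[i, x[i]] += p
    return marg / z

# 4-node loop with random positive potentials
rng = np.random.default_rng(0)
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
pots = {e: rng.random((2, 2)) + 0.1 for e in edges}
print(exact_marginals(pots, 4))
```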



BP vs. TreeEP

[Figure: BP approximates the loopy graph with a fully factorized distribution; TreeEP approximates it with a spanning tree.]



Junction Tree Representation

[Figure: the original graph p(x), its tree approximation q(x), and the corresponding junction tree of cliques and separators.]



Two Kinds of Edges

  • On-tree edges, e.g. (x1, x4): exactly incorporated into the junction tree

  • Off-tree edges, e.g. (x1, x2): approximated by projecting them onto the tree structure



KL Minimization

  • KL minimization reduces to moment matching

  • Match the single-node and pairwise marginals of $f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ and $q(\mathbf{x})$

  • Reduces to exact inference on single loops

    • Use cutset conditioning


Matching Marginals on Graph

[Figure: the junction-tree cliques, e.g. (x1 x2), (x1 x3), (x1 x4), (x3 x5), (x3 x6), (x5 x7), updated as each off-tree edge is incorporated.]

(1) Incorporate edge (x3 x4)

(2) Incorporate edge (x6 x7)



Drawbacks of Global Propagation

  • Updates all the cliques even when incorporating only one off-tree edge

    • Computationally expensive

  • Stores each off-tree data message as a whole tree

    • Requires a large amount of memory



Solution: Local Propagation

  • Allow q(x) to be non-coherent during the iterations. It only needs to be coherent in the end.

  • Exploit the junction tree representation: only locally propagate information within the minimal loop (subtree) that is directly connected to the off-tree edge.

    • Reduce computational complexity

    • Save memory


Local Propagation Example

[Figure: junction-tree cliques updated by local propagation.]

(1) Incorporate edge (x3 x4)

(2) Propagate evidence locally

(3) Incorporate edge (x6 x7)

On this simple graph, local propagation runs roughly 2 times faster and uses half as much memory to store messages as plain (global) EP.



New Interpretation of TreeEP

  • Marries EP with the junction tree algorithm

  • Can perform efficiently over hypertrees and hypernodes



4-node Graph

  • TreeEP = the proposed method

  • GBP = generalized belief propagation on triangles

  • TreeVB = variational tree

  • BP = loopy belief propagation = Factorized EP

  • MF = mean-field



Fully-connected graphs

  • Results are averaged over 10 graphs with randomly generated potentials

  • TreeEP performs the same or better than all other methods in both accuracy and efficiency!


8x8 Grids, 10 Trials

[Figure: accuracy and timing results on 8x8 grids, averaged over 10 trials.]



TreeEP versus BP and GBP

  • TreeEP is always more accurate than BP and is often faster

  • TreeEP is much more efficient than GBP and more accurate on some problems

  • TreeEP converges more often than BP and GBP


Outline (recap): next, conclusions and future work.



Conclusions

  • Extend EP on graphical models:

    • Instead of minimizing KL divergence, use other sensible criteria to generate messages. Effectively turn any filtering method into a smoothing method.

    • Use quadrature to approximate messages.

    • Use local propagation to save computation and memory in tree-structured EP.


Conclusions

[Figure: error vs. computational time, extended EP vs. state-of-the-art techniques.]

  • Extended EP algorithms outperform state-of-the-art inference methods on graphical models in the trade-off between accuracy and efficiency



Future Work

  • More extensions of EP:

    • How to choose a sensible approximation family (e.g. which tree structure)

    • More flexible approximation: mixture of EP?

    • Error bound?

    • Bayesian conditional random fields

  • More real-world applications



End

Contact information:

[email protected]


Extended EP Accuracy Improves Significantly in Only a Few Iterations

[Figure: estimation error vs. EP iteration number.]



EP versus BP

  • EP approximation is in a restricted family, e.g. Gaussian

  • EP approximation does not have to be factorized

  • EP applies to many more problems

    • e.g. mixture of discrete/continuous variables



EP versus Monte Carlo

  • Monte Carlo is general but expensive

  • EP exploits underlying simplicity of the problem if it exists

  • Monte Carlo is still needed for complex problems (e.g. large isolated peaks)

  • The trick is to know which kind of problem you have



(Loopy) Belief propagation

  • Specialize to fully factorized approximations: $q(\mathbf{x}) = \prod_i q(x_i)$

  • Minimizing the KL divergence = matching the marginals of $f_a(\mathbf{x})\, q^{\backslash a}(\mathbf{x})$ (partially factorized) and $q(\mathbf{x})$ (fully factorized)

  • The resulting updates "send messages" between neighboring nodes
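A minimal sum-product sketch of this message-passing view (my own illustration, not the talk's code), for a binary pairwise MRF such as the 4-node loop above; its output can be checked against the brute-force marginals from the earlier sketch:

```python
import numpy as np

def loopy_bp(potentials, n_nodes, n_iters=50):
    """Sum-product loopy BP for a binary pairwise MRF p(x) ~ prod f_ij(x_i, x_j).

    potentials: dict mapping edge (i, j) -> 2x2 array f_ij[x_i, x_j].
    Messages m_{i->j}(x_j) are stored per directed edge; beliefs are the
    normalized products of incoming messages.
    """
    msgs = {}
    neighbors = {i: [] for i in range(n_nodes)}
    for (i, j) in potentials:
        msgs[(i, j)] = np.ones(2) / 2    # directed messages, initially uniform
        msgs[(j, i)] = np.ones(2) / 2
        neighbors[i].append(j)
        neighbors[j].append(i)

    def edge_pot(i, j):
        # f indexed as [x_i, x_j], transposing if stored under (j, i)
        return potentials[(i, j)] if (i, j) in potentials else potentials[(j, i)].T

    for _ in range(n_iters):
        new = {}
        for (i, j) in msgs:
            # product of messages into i from all neighbors except j
            prod = np.ones(2)
            for k in neighbors[i]:
                if k != j:
                    prod *= msgs[(k, i)]
            m = edge_pot(i, j).T @ prod   # sum over x_i
            new[(i, j)] = m / m.sum()
        msgs = new

    beliefs = np.ones((n_nodes, 2))
    for i in range(n_nodes):
        for k in neighbors[i]:
            beliefs[i] *= msgs[(k, i)]
    return beliefs / beliefs.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
pots = {e: rng.random((2, 2)) + 0.1 for e in [(0, 1), (1, 2), (2, 3), (3, 0)]}
print(loopy_bp(pots, 4))   # compare against exact_marginals(pots, 4) above
```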



Limitation of BP

  • If the dynamics or measurements are not linear and Gaussian, the complexity of the posterior increases with the number of measurements

  • I.e. the BP equations are not "closed": beliefs need not stay within a given family, such as the Gaussians or any other exponential family



Approximate filtering

  • Compute a Gaussian belief that approximates the true posterior: $q(x_t) \approx p(x_t \mid y_{1:t})$

  • E.g. Extended Kalman filter, statistical linearization, unscented filter, assumed-density filter



EP perspective

  • Approximate filtering is equivalent to replacing the true measurement/dynamics equations with linear/Gaussian equations

[Diagram: a Gaussian state passed through linear/Gaussian equations implies a Gaussian posterior.]



EP perspective

  • EKF, UKF, and ADF are all algorithms for the same task:

[Diagram: approximating a nonlinear, non-Gaussian model by a linear, Gaussian one.]



Terminology

  • Filtering: $p(x_t \mid y_{1:t})$

  • Smoothing: $p(x_t \mid y_{1:t+L})$ where $L > 0$

  • On-line: old data is discarded (fixed memory)

  • Off-line: old data is re-used (unbounded memory)



Kalman filtering / Belief propagation

  • Prediction: $p(x_t \mid y_{1:t-1}) = \int p(x_t \mid x_{t-1})\, p(x_{t-1} \mid y_{1:t-1})\, dx_{t-1}$

  • Measurement: $p(x_t \mid y_{1:t}) \propto p(y_t \mid x_t)\, p(x_t \mid y_{1:t-1})$

  • Smoothing: $p(x_t \mid y_{1:T}) = p(x_t \mid y_{1:t}) \int \frac{p(x_{t+1} \mid x_t)\, p(x_{t+1} \mid y_{1:T})}{p(x_{t+1} \mid y_{1:t})}\, dx_{t+1}$
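For the linear-Gaussian model $x_t = A x_{t-1} + w_t$, $y_t = H x_t + e_t$ (my notation; the slide's own symbols are lost), these recursions reduce to the standard Kalman filter and RTS smoother updates:

$$\begin{aligned}
&\text{Predict:} && m_{t|t-1} = A m_{t-1}, \qquad P_{t|t-1} = A P_{t-1} A^\top + Q \\
&\text{Update:} && K_t = P_{t|t-1} H^\top \left(H P_{t|t-1} H^\top + R\right)^{-1}, \\
& && m_t = m_{t|t-1} + K_t \left(y_t - H m_{t|t-1}\right), \qquad P_t = (I - K_t H)\, P_{t|t-1} \\
&\text{Smooth:} && J_t = P_t A^\top P_{t+1|t}^{-1}, \qquad m_{t|T} = m_t + J_t \left(m_{t+1|T} - m_{t+1|t}\right), \\
& && P_{t|T} = P_t + J_t \left(P_{t+1|T} - P_{t+1|t}\right) J_t^\top
\end{aligned}$$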



Approximate an Edge by a Tree

Each potential $f_a$ in $p$ is projected onto the tree structure of $q$.

Correlations between two nodes are not lost, but projected onto the tree






EP on Boltzmann Machines

[Figure: backup slide showing EP applied to a small Boltzmann machine over nodes x1, x2 with observations y1, y2.]



