Divergence measures and message passing

Tom Minka

Microsoft Research

Cambridge, UK

with thanks to the Machine Learning and Perception Group



Outline

  • Example of message passing

  • Interpreting message passing

  • Divergence measures

  • Message passing from a divergence measure

  • Big picture



Estimation Problem

[Figure: a factor graph with variable nodes x, y, z and factor nodes a, b, c, d, e, f]


Estimation Problem

[Figure: the same factor graph; the variables are binary (0/1), and the values of x, y, z are unknown]



Estimation Problem

Queries: the marginals, the normalizing constant, and the argmax

Want to compute these quickly



Belief Propagation

[Figure: the final messages on the factor graph over x, y, z]


Belief Propagation

Marginals: [Figure: exact vs. BP marginals for x, y, z]

Normalizing constant: 0.45 (Exact), 0.44 (BP)

Argmax: (0,0,0) (Exact), (0,0,0) (BP)
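A toy reconstruction of this comparison, as a minimal sketch only: the graph, potentials, and numbers below are hypothetical, so it will not reproduce the 0.45/0.44 figures above. It runs loopy sum-product BP on a small binary loop and compares marginals and argmax against brute-force enumeration; the BP estimate of the normalizing constant (Bethe free energy) is omitted for brevity.

```python
# Toy reconstruction, not the talk's actual graph: exact inference vs. loopy
# sum-product BP on a 3-variable binary loop with random potentials.
import itertools
import numpy as np

rng = np.random.default_rng(0)
n = 3                                     # three binary variables (like x, y, z)
edges = [(0, 1), (1, 2), (0, 2)]          # a loop, so BP is only approximate
unary = rng.uniform(0.5, 1.5, size=(n, 2))                      # phi_i(x_i)
pair = {e: rng.uniform(0.5, 1.5, size=(2, 2)) for e in edges}   # psi_ij(x_i, x_j)

nbrs = {i: [] for i in range(n)}
for i, j in edges:
    nbrs[i].append(j)
    nbrs[j].append(i)

def psi(i, j):
    """Pairwise potential as a matrix indexed [x_i, x_j]."""
    return pair[(i, j)] if (i, j) in pair else pair[(j, i)].T

def joint(xs):
    """Unnormalized p(x) = prod_i phi_i(x_i) * prod_ij psi_ij(x_i, x_j)."""
    val = np.prod([unary[i, xs[i]] for i in range(n)])
    for i, j in edges:
        val *= pair[(i, j)][xs[i], xs[j]]
    return val

# Exact answers by enumerating all 2^n states.
states = list(itertools.product([0, 1], repeat=n))
Z = sum(joint(s) for s in states)
exact = np.zeros((n, 2))
for s in states:
    for i in range(n):
        exact[i, s[i]] += joint(s) / Z

# Loopy BP: normalized messages msg[(i, j)](x_j), synchronous updates.
msg = {(i, j): np.full(2, 0.5) for e in edges for (i, j) in (e, e[::-1])}
for _ in range(100):
    new = {}
    for i, j in msg:
        b = unary[i] * np.prod([msg[(k, i)] for k in nbrs[i] if k != j], axis=0)
        m = psi(i, j).T @ b               # sum over x_i
        new[(i, j)] = m / m.sum()
    msg = new

bp = np.zeros((n, 2))
for i in range(n):
    b = unary[i] * np.prod([msg[(k, i)] for k in nbrs[i]], axis=0)
    bp[i] = b / b.sum()

print("exact marginals:\n", np.round(exact, 3))
print("BP marginals:\n", np.round(bp, 3))
print("exact argmax:", max(states, key=joint), " Z:", round(Z, 3))
```

On a tree the two sets of marginals would agree exactly; the loop is what makes BP approximate.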



Message Passing = Distributed Optimization

  • Messages represent a simpler distribution q(x) that approximates p(x)

    • A distributed representation

  • Message passing = optimizing q to fit p

    • q stands in for p when answering queries

  • Parameters:

    • What type of distribution to construct (approximating family)

    • What cost to minimize (divergence measure)


How to make a message-passing algorithm

  • Pick an approximating family

    • fully-factorized, Gaussian, etc.

  • Pick a divergence measure

  • Construct an optimizer for that measure

    • usually fixed-point iteration

  • Distribute the optimization across factors
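As a concrete instantiation of this recipe (an illustrative sketch, not code from the talk): choose the fully factorized family, the divergence KL(q||p), and coordinate fixed-point updates as the optimizer. That combination is mean field; the code below reuses the hypothetical unary/nbrs/psi model from the BP sketch earlier.

```python
# Mean field as an instance of the recipe: family = fully factorized,
# divergence = KL(q||p), optimizer = coordinate fixed-point iteration.
import numpy as np

def mean_field(unary, nbrs, psi, iters=200):
    n = len(unary)
    q = np.full((n, 2), 0.5)              # q(x) = prod_i q_i(x_i)
    for _ in range(iters):
        for i in range(n):
            # Fixed point: log q_i(x_i) = log phi_i(x_i)
            #              + sum_j E_{q_j}[log psi_ij(x_i, x_j)] + const
            s = np.log(unary[i])
            for j in nbrs[i]:
                s = s + np.log(psi(i, j)) @ q[j]
            q[i] = np.exp(s - s.max())    # subtract max for numerical stability
            q[i] /= q[i].sum()
    return q

# print(np.round(mean_field(unary, nbrs, psi), 3))   # compare with BP/exact
```

Because KL(q||p) is zero-forcing, mean field typically commits to a single mode rather than averaging over several; the divergence analysis that follows explains why.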



Divergence measures

Let p, q be unnormalized distributions.

Kullback-Leibler (KL) divergence:

KL(p \| q) = \int p(x) \log \frac{p(x)}{q(x)} \, dx + \int \big( q(x) - p(x) \big) \, dx

Alpha-divergence (α is any real number):

D_\alpha(p \| q) = \frac{1}{\alpha(1-\alpha)} \int \big( \alpha\, p(x) + (1-\alpha)\, q(x) - p(x)^\alpha\, q(x)^{1-\alpha} \big) \, dx

Both are asymmetric and convex in (p, q); KL(p||q) and KL(q||p) are the limits α → 1 and α → 0.
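These definitions are easy to check numerically. Below is an illustrative sketch (the grid, the bimodal target, and the optimizer are all assumptions of this example, not from the talk) that evaluates D_α on a grid and fits an unnormalized Gaussian q to a bimodal p by directly minimizing D_α, for the α values shown on the next slides.

```python
# Alpha-divergence on a grid, plus a Gaussian fit that minimizes it.
import numpy as np
from scipy.optimize import minimize

x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

p = 0.7 * gauss(x, -2.0, 1.0) + 0.3 * gauss(x, 3.0, 0.5)   # bimodal target

def alpha_div(p, q, alpha):
    """D_alpha(p||q) for unnormalized p, q sampled on a shared grid."""
    if alpha == 1:            # limit: KL(p||q)
        return np.sum(p * np.log(p / q) + q - p) * dx
    if alpha == 0:            # limit: KL(q||p)
        return np.sum(q * np.log(q / p) + p - q) * dx
    integrand = alpha * p + (1 - alpha) * q - p**alpha * q**(1 - alpha)
    return np.sum(integrand) * dx / (alpha * (1 - alpha))

for alpha in [-1.0, 0.0, 0.5, 1.0]:
    def objective(theta):
        m, log_v, log_s = theta           # mean, log variance, log scale
        q = np.exp(log_s) * gauss(x, m, np.exp(log_v))
        q = np.maximum(q, 1e-300)         # guard against underflow in the tails
        return alpha_div(p, q, alpha)
    res = minimize(objective, x0=[0.0, 1.0, 0.0], method="Nelder-Mead")
    m, log_v, _ = res.x
    print(f"alpha={alpha:5.1f}: fitted mean={m:6.2f}, var={np.exp(log_v):6.2f}")
```

Small α locks the fit onto one mode (zero-forcing), while α = 1 matches the overall mean and variance (inclusive), previewing the figures that follow.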



Minimum alpha-divergence

q is Gaussian, minimizes D_α(p||q)

α = -1

[Figure: the fitted Gaussian q overlaid on p]


Minimum alpha-divergence

q is Gaussian, minimizes D_α(p||q)

α = 0

[Figure: the fitted Gaussian q overlaid on p]


Minimum alpha-divergence

q is Gaussian, minimizes D_α(p||q)

α = 0.5

[Figure: the fitted Gaussian q overlaid on p]


Minimum alpha-divergence

q is Gaussian, minimizes D_α(p||q)

α = 1

[Figure: the fitted Gaussian q overlaid on p]



Properties of alpha-divergence

  • α ≤ 0 seeks the mode with largest mass (not tallest)

    • zero-forcing: p(x)=0 forces q(x)=0

    • underestimates the support of p

  • α ≥ 1 stretches to cover everything

    • inclusive: p(x)>0 forces q(x)>0

    • overestimates the support of p

[Frey,Patrascu,Jaakkola,Moran 00]


Structure of alpha space

[Figure: the real line of α values. α ≤ 0 is the zero-forcing side, with MF at α = 0; α ≥ 1 is the inclusive (zero-avoiding) side, with BP and EP at α = 1 and TRW at α > 1; FBP and PEP cover arbitrary α]


Other properties

  • If q is an exact minimum of alpha-divergence:

    • Normalizing constant: \int q(x)\,dx = \int p(x)^\alpha\, q(x)^{1-\alpha}\,dx (so for α = 1, the mass of q equals the mass of p)

  • If α = 1: Gaussian q matches mean, variance of p

    • Fully factorized q matches marginals of p
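A quick numeric check of the α = 1 claims above, on the same illustrative bimodal p used in the earlier sketch:

```python
# For alpha = 1, the minimizing Gaussian matches the mass, mean, and
# variance of p; compute those targets directly on the grid.
import numpy as np

x = np.linspace(-10, 10, 2001); dx = x[1] - x[0]
def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)
p = 0.7 * gauss(x, -2.0, 1.0) + 0.3 * gauss(x, 3.0, 0.5)

mass = np.sum(p) * dx
mean = np.sum(x * p) * dx / mass
var = np.sum((x - mean) ** 2 * p) * dx / mass
print(f"mass={mass:.3f}, mean={mean:.3f}, var={var:.3f}")   # ~1.0, -0.5, 6.1
```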


Two-node example

[Figure: two variables x and y connected by a single factor]

  • q is fully-factorized, minimizes the α-divergence to p

  • q has correct marginals only for α = 1 (BP)


Two-node example

[Figure: a bimodal two-node distribution p, with factorized fits for α = 1 (BP), α = 0 (MF), and α = 0.5]


Two-node example

[Figure: the same bimodal distribution, fit with α = 1]


Lessons

  • Neither method is inherently superior – depends on what you care about

  • A factorized approx does not imply matching marginals (only for α = 1)

  • Adding y to the problem can change the estimated marginal for x (though true marginal is unchanged)




Distributed divergence minimization

  • Write p as a product of factors: p(x) = \prod_a f_a(x)

  • Approximate the factors one by one: f_a(x) \approx \tilde{f}_a(x)

  • Multiply to get the approximation: q(x) = \prod_a \tilde{f}_a(x)


Global divergence to local divergence

  • Global divergence: D\big( \prod_a f_a(x) \,\big\|\, \prod_a \tilde{f}_a(x) \big)

  • Local divergence: D\big( f_a(x)\, q^{\backslash a}(x) \,\big\|\, \tilde{f}_a(x)\, q^{\backslash a}(x) \big), where q^{\backslash a}(x) = \prod_{b \neq a} \tilde{f}_b(x) is the product of the other approximate factors


Message passing

  • Messages are passed between factors

  • Messages are factor approximations: the message from factor a is \tilde{f}_a(x)

  • Factor a receives the product of the other factors' messages, q^{\backslash a}(x)

    • Minimize the local divergence to get a new \tilde{f}_a(x)

    • Send \tilde{f}_a to the other factors

    • Repeat until convergence

  • Produces all six algorithms (a sketch of the α = 1 case follows below)
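To make this loop concrete, here is a minimal EP-style sketch for the α = 1 case (local KL, minimized by moment matching) on a hypothetical 1-D model: a Gaussian prior factor times two non-Gaussian factors, with each approximate factor \tilde{f}_a an unnormalized Gaussian. The model, grid, and every name are assumptions of this example, not the talk's.

```python
# EP-style local divergence minimization in 1-D (illustrative sketch).
# p(x) ∝ prior(x) * f_1(x) * f_2(x); sites (approximate factors) are
# unnormalized Gaussians stored as natural parameters.
import numpy as np

x = np.linspace(-12, 12, 4001)
dx = x[1] - x[0]

def gauss(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2 * np.pi * v)

prior = gauss(x, 0.0, 9.0)
factors = [0.5 * gauss(x, -2.0, 0.5) + 0.5 * gauss(x, 2.0, 0.5),
           0.7 * gauss(x, -2.5, 1.0) + 0.3 * gauss(x, 3.0, 1.0)]

def moments(w):
    """Mass, mean, variance of an unnormalized density on the grid."""
    mass = np.sum(w) * dx
    mean = np.sum(x * w) * dx / mass
    var = np.sum((x - mean) ** 2 * w) * dx / mass
    return mass, mean, var

def naturals(mass, mean, var):
    """Natural parameters (h, r, c) with log f(x) = c + h*x - r*x^2/2."""
    r = 1.0 / var
    h = mean / var
    c = np.log(mass) - 0.5 * np.log(2 * np.pi * var) - 0.5 * mean ** 2 / var
    return np.array([h, r, c])

def site_eval(s):
    h, r, c = s
    return np.exp(c + h * x - 0.5 * r * x * x)

sites = [np.zeros(3) for _ in factors]        # start from flat messages
for sweep in range(10):
    for a, f_a in enumerate(factors):
        # Receive: cavity q^{\a} = prior * other sites (an exact Gaussian here,
        # so its grid moments recover its parameters).
        cavity = prior * np.prod([site_eval(s) for b, s in enumerate(sites)
                                  if b != a], axis=0)
        tilted = f_a * cavity                 # f_a(x) q^{\a}(x)
        # Minimize local KL( f_a q^{\a} || f~_a q^{\a} ) by moment matching,
        # then send: new site = matched Gaussian / cavity.
        sites[a] = naturals(*moments(tilted)) - naturals(*moments(cavity))

q = prior * np.prod([site_eval(s) for s in sites], axis=0)
print("EP q   :", tuple(round(v, 3) for v in moments(q)))
print("exact p:", tuple(round(v, 3) for v in moments(prior * factors[0] * factors[1])))
```

Each update divides the moment-matched fit of f_a(x) q^{\backslash a}(x) by the cavity, exactly the receive/minimize/send cycle described above. (A production EP would guard against negative site precisions, which this sketch does not.)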


Global divergence vs. local divergence

In general, local ≠ global

  • but results are similar

  • BP doesn’t minimize global KL, but comes close

[Figure: the α line. At α = 0 (MF), local = global, so there is no loss from message passing; for other α, local ≠ global]


Experiment

  • Which message-passing algorithm is best at minimizing the global D_α(p||q)?

  • Procedure:

  • Run FBP with various α_L

  • Compute the global divergence for various α_G

  • Find the best α_L (best algorithm) for each α_G


Results

  • Average over 20 graphs, random singleton and pairwise potentials: exp(w_ij x_i x_j)

  • Mixed potentials (w ~ U(-1,1)):

    • best α_L = α_G (local should match global)

    • FBP with the same α is best at minimizing D_α

      • BP is best at minimizing KL



Hierarchy of algorithms

Each algorithm is a special case of Power EP, obtained by restricting the approximating family and/or the divergence:

  • Power EP: exponential family, D_α(p||q)

  • EP: exponential family, KL(p||q)

  • Structured MF: exponential family, KL(q||p)

  • FBP: fully factorized, D_α(p||q)

  • TRW: fully factorized, D_α(p||q) with α > 1

  • BP: fully factorized, KL(p||q)

  • MF: fully factorized, KL(q||p)


Matrix of algorithms

                        approximation family
divergence measure      fully factorized    exponential family
KL(q||p)                MF                  Structured MF
KL(p||q)                BP                  EP
D_α(p||q), α > 1        TRW
D_α(p||q)               FBP                 Power EP

Other families? (mixtures)   Other divergences?


Other Message Passing Algorithms

Do they correspond to divergence measures?

  • Generalized belief propagation [Yedidia,Freeman,Weiss 00]

  • Iterated conditional modes [Besag 86]

  • Max-product belief revision

  • TRW-max-product [Wainwright,Jaakkola,Willsky 02]

  • Laplace propagation [Smola,Vishwanathan,Eskin 03]

  • Penniless propagation [Cano,Moral,Salmerón 00]

  • Bound propagation [Leisink,Kappen 03]


Future work

  • Understand existing message passing algorithms

  • Understand local vs. global divergence

  • New message passing algorithms:

    • Specialized divergence measures

    • Richer approximating families

  • Other ways to minimize divergence

