Genome evolution
Lecture 8: Belief propagation
I am P(H|all data)
You are P(H|our data)
Simple Tree: Inference as message passing
Understanding the tree model (and BNs): reversing edges
The joint probability of the simple tree model:
Can we change the position of the root and keep the joint probability as is?
If the potentials are conditional probabilities, what will Z be?
Not necessarily 1! (can you think of an example?)
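A quick numeric check of both points above (a sketch with assumed, illustrative CPD numbers): reversing an edge via Bayes' rule keeps the joint unchanged, while using a conditional probability as a lone potential (with no prior on the conditioning variable) gives Z ≠ 1.

```python
import numpy as np

# Two-node tree X -> Y with assumed (illustrative) CPDs.
p_x = np.array([0.3, 0.7])            # P(X)
p_y_given_x = np.array([[0.9, 0.1],   # P(Y | X = 0)
                        [0.2, 0.8]])  # P(Y | X = 1)

joint = p_x[:, None] * p_y_given_x    # P(X, Y) = P(X) P(Y | X)

# Reverse the edge via Bayes' rule: the joint is unchanged.
p_y = joint.sum(axis=0)               # P(Y)
p_x_given_y = joint / p_y[None, :]    # P(X | Y)
assert np.allclose(joint, p_y[None, :] * p_x_given_y)

# Conditional-probability potentials alone do not guarantee Z = 1:
# with the single potential phi(x, y) = P(x | y) and no prior on y,
# Z = sum_{x,y} P(x | y) = (number of y values), not 1.
Z = p_x_given_y.sum()
print(round(Z, 6))  # -> 2.0
```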
Things are difficult when there are several modes
Potentials can be defined over discrete, real-valued, etc. variables
It is also common to define general log-linear models directly:
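Spelled out (a generic sketch; the feature functions $f_k$ and weights $\theta_k$ are standard log-linear notation, not symbols defined in the slides):

```latex
P(x) \;=\; \frac{1}{Z(\theta)} \exp\!\Big(\sum_k \theta_k f_k(x)\Big),
\qquad
Z(\theta) \;=\; \sum_{x} \exp\!\Big(\sum_k \theta_k f_k(x)\Big)
```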
Find the factor parameterization:
No (also not in BN!)
Forward sampling (likelihood weighting):
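Forward sampling with likelihood weighting can be sketched as follows (a minimal example on an assumed two-variable network A → B; the numbers and function name are illustrative): non-evidence variables are sampled forward from their priors, and each sample is weighted by the likelihood of the clamped evidence.

```python
import random

random.seed(0)

# Assumed tiny network A -> B with evidence B = 1;
# we estimate P(A = 1 | B = 1).
p_a = {0: 0.4, 1: 0.6}               # P(A)
p_b_given_a = {0: {0: 0.7, 1: 0.3},  # P(B | A = 0)
               1: {0: 0.1, 1: 0.9}}  # P(B | A = 1)

def likelihood_weighting(n_samples=100_000, evidence_b=1):
    num = den = 0.0
    for _ in range(n_samples):
        # Forward-sample the non-evidence variable from its prior...
        a = 1 if random.random() < p_a[1] else 0
        # ...and weight the sample by the likelihood of the evidence.
        w = p_b_given_a[a][evidence_b]
        num += w * (a == 1)
        den += w
    return num / den

est = likelihood_weighting()
exact = (p_a[1] * p_b_given_a[1][1]) / (
    p_a[0] * p_b_given_a[0][1] + p_a[1] * p_b_given_a[1][1])
print(round(est, 3), round(exact, 3))
```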
Structural variational inference:
Directed models are sometimes more natural and easy to understand.
Their popularity stems from their original role as expressing knowledge in AI
They are not very natural for modeling physical phenomena, except for time-dependent processes
Undirected models are analogous to well-developed models in statistical physics (e.g., spin glass models)
We borrow computational ideas from physicists (who have a lot of experience with approximations)
The models are convex, which gives them important algorithmic properties (Wainwright and Jordan 2003, and further developments since)
(any value attainable by x_i) → real values
Messages update rules:
Messages from variables to factors:
Messages from factors to variables:
The algorithm proceeds by updating messages:
Initialize all messages to uniform
Iterate until messages stop changing:
Update factors to variables messages
Update variables to factors messages
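The update loop above can be sketched in code (a minimal sum-product implementation on a factor graph; the graph encoding and function names are my own, not from the lecture):

```python
import numpy as np

def lbp(n_vars, card, factors, n_iters=200, tol=1e-9):
    """Sum-product loopy BP. factors: list of (vars, table) pairs,
    with table.shape == (card,) * len(vars)."""
    edges = [(a, i) for a, (vs, _) in enumerate(factors) for i in vs]
    # Initialize all messages to uniform.
    m_vf = {(i, a): np.ones(card) / card for a, i in edges}  # var -> factor
    m_fv = {(a, i): np.ones(card) / card for a, i in edges}  # factor -> var
    for _ in range(n_iters):
        delta = 0.0
        # Factors -> variables: multiply in incoming messages, sum out
        # all variables except the target.
        for a, (vs, tab) in enumerate(factors):
            for k, i in enumerate(vs):
                t = tab.astype(float).copy()
                for l, j in enumerate(vs):
                    if l != k:
                        shape = [1] * len(vs)
                        shape[l] = card
                        t = t * m_vf[(j, a)].reshape(shape)
                msg = t.sum(axis=tuple(l for l in range(len(vs)) if l != k))
                msg = msg / msg.sum()
                delta = max(delta, float(np.abs(msg - m_fv[(a, i)]).max()))
                m_fv[(a, i)] = msg
        # Variables -> factors: product of messages from the other factors.
        for a, i in edges:
            msg = np.ones(card)
            for b, (ws, _) in enumerate(factors):
                if b != a and i in ws:
                    msg = msg * m_fv[(b, i)]
            m_vf[(i, a)] = msg / msg.sum()
        if delta < tol:
            break
    # Variable beliefs: normalized product of all incoming messages.
    beliefs = []
    for i in range(n_vars):
        b = np.ones(card)
        for a, (vs, _) in enumerate(factors):
            if i in vs:
                b = b * m_fv[(a, i)]
        beliefs.append(b / b.sum())
    return beliefs

# On a tree (a 3-variable chain) the fixed point gives exact marginals:
f01 = np.array([[1.0, 0.5], [0.5, 2.0]])
f12 = np.array([[2.0, 1.0], [1.0, 0.5]])
beliefs = lbp(3, 2, [((0, 1), f01), ((1, 2), f12)])
print(np.round(beliefs[1], 4))
```

On a tree this reduces to the exact up-down computation, which the chain example checks against brute-force marginalization.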
Beliefs on factor inputs
This is far from mean field, since for example:
BP on Tree = Up-Down
This is not a hypothetical scenario – it frequently happens when there is too much symmetry
For example, most mutational effects are double-stranded and therefore symmetric, which can result in loops.
Theorem: beliefs are LBP fixed points if and only if they are locally optimal for the Bethe free energy
Region average energy
Region Free energy
Region-based average energy
Region-based free energy
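In formulas (a sketch in standard region-graph notation, following Yedidia–Freeman–Weiss; $A_R$ is the set of factors assigned to region $R$ and $c_R$ its multiplicity constant):

```latex
U_R(b_R) = -\sum_{x_R} b_R(x_R) \sum_{a \in A_R} \ln f_a(x_a),
\qquad
H_R(b_R) = -\sum_{x_R} b_R(x_R) \ln b_R(x_R),
\\[4pt]
F_R = U_R - H_R,
\qquad
F_{\text{region}} \;=\; \sum_R c_R \, F_R
```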
We compensate for the multiple counting of variables using the multiplicity constant
We can add larger regions
As long as we update the multipliers:
Claim: For valid regions, if the regions’ beliefs are exact:
then the average region-based energy is exact:
We cannot guarantee much on the region-based entropy:
Claim: the region-based entropy is exact when the model is a uniform distribution
Proof: exercise. This means the entropy counts the correct number of degrees of freedom – e.g., for N binary variables, H = N ln 2
Definition: a region based free energy approximation is said to be max-ent normal if its region-based entropy is maximized when the beliefs are uniform.
A non-max-ent approximation can minimize the region free energy by selecting erroneously high-entropy beliefs!
Claim: The Bethe regions give a max-ent normal approximation (i.e., the region-based entropy is maximized on the uniform distribution)
(maximal on uniform)
(nonnegative, and 0 on uniform)
Start with a complete graph over 6 binary variables, with pairwise factors
Add all variable triplets, pairs and singletons as regions. The multiplicity constants:
triplets: c = 1 (20 overall)
pairs: c = -3 (15 overall)
singletons: c = 6 (6 overall) (this guarantees validity: for each variable, the constants sum to 1)
Look at the consistent beliefs:
The region entropy of the uniform distribution is k ln 2 for a region with k binary variables. The total region-based entropy is: 20·1·(3 ln 2) + 15·(−3)·(2 ln 2) + 6·6·(ln 2) = 6 ln 2
We claimed before that the entropy of the uniform distribution will be exact: 6 ln 2
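As a check, the multiplicity constants and the entropy count for this example can be verified directly (a sketch; the variable count n = 6 is inferred from the 20/15/6 region counts above):

```python
import math
from itertools import combinations

# Regions: all triplets, pairs and singletons over 6 variables.
# Multiplicity constant: c_R = 1 - sum of c_S over strict supersets S.
n = 6
regions = [frozenset(c) for k in (3, 2, 1) for c in combinations(range(n), k)]
c = {}
for R in regions:  # ordered largest-first, so supersets are done already
    c[R] = 1 - sum(c[S] for S in regions if R < S)

assert all(c[R] == 1 for R in regions if len(R) == 3)   # 20 triplets
assert all(c[R] == -3 for R in regions if len(R) == 2)  # 15 pairs
assert all(c[R] == 6 for R in regions if len(R) == 1)   # 6 singletons

# Validity: for each variable, the constants sum to 1.
for i in range(n):
    assert sum(c[R] for R in regions if i in R) == 1

# Uniform-distribution entropy: each region contributes c_R * |R| * ln 2;
# the total equals the exact value 6 ln 2.
H = sum(c[R] * len(R) * math.log(2) for R in regions)
print(round(H / math.log(2), 9))  # -> 6.0
```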
We want to solve a variational problem:
While enforcing constraints on the regions’ beliefs:
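Written out (a sketch in the same region notation as above; $S$ ranges over the child regions of $R$):

```latex
\min_{\{b_R\}} \; \sum_R c_R \big( U_R(b_R) - H_R(b_R) \big)
\quad \text{s.t.} \quad
\sum_{x_R} b_R(x_R) = 1,
\qquad
\sum_{x_{R \setminus S}} b_R(x_R) = b_S(x_S) \;\; \forall\, S \subset R
```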
Unlike the structured variational approximation we discussed before, even though the beliefs are (regionally) compatible, the optimal beliefs may fail to represent any true global posterior distribution
Optimal region beliefs are identical to the factors:
It can be shown that these cannot be the marginals of any joint distribution on the three variables
(note the negative feedback loop here)
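A concrete instance of this (a standard counterexample with assumed numbers, in the spirit of the negative feedback loop above): three binary variables on a triangle with perfectly anticorrelated pairwise beliefs. Each pair belief is locally consistent, yet no joint distribution has these as its pairwise marginals.

```python
import numpy as np
from itertools import product

# Pairwise belief on each triangle edge (0,1), (1,2), (2,0):
# mass only on disagreeing assignments.
b_pair = np.array([[0.0, 0.5],
                   [0.5, 0.0]])

# Local consistency: both single-variable marginals are uniform.
assert np.allclose(b_pair.sum(axis=0), [0.5, 0.5])
assert np.allclose(b_pair.sum(axis=1), [0.5, 0.5])

# Any joint with these pairwise marginals could only put mass on
# assignments where all three pairs disagree -- but for binary
# variables on an odd cycle no such assignment exists.
feasible = [xs for xs in product([0, 1], repeat=3)
            if xs[0] != xs[1] and xs[1] != xs[2] and xs[2] != xs[0]]
print(feasible)  # -> []
```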
Claim: When it converges, LBP finds a minimum of the Bethe free energy.
Proof idea: we have an optimization problem (minimum energy) with constraints (beliefs are consistent and add up to 1). We write down a Lagrangian that expresses both the minimization goal and the constraints, and show that it is minimized when the LBP update rules hold.
Important technical point: we shall assume that in the fixed point all beliefs are non zero. This can be shown to hold if all factors are “soft” (do not contain zero values for any assignment).
Large region beliefs are normalized
Variable region beliefs are normalized
Take the derivatives with respect to each b_a and b_i:
So here are the conditions:
And we can solve them if:
We saw before that these conditions, together with the marginalization constraint, generate the update rules! So L minimum → LBP fixed point is proven.
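For reference, a sketch of the Lagrangian and its stationarity conditions (standard Bethe-derivation notation; $d_i$ is the number of factors containing $x_i$, $\gamma$ and $\lambda$ are the multipliers):

```latex
L = F_{\text{Bethe}}
  + \sum_a \gamma_a \Big(1 - \sum_{x_a} b_a(x_a)\Big)
  + \sum_i \gamma_i \Big(1 - \sum_{x_i} b_i(x_i)\Big)
  + \sum_a \sum_{i \in a} \sum_{x_i} \lambda_{ai}(x_i)
      \Big( b_i(x_i) - \sum_{x_a \setminus x_i} b_a(x_a) \Big)
```

Setting the derivatives with respect to $b_a$ and $b_i$ to zero gives

```latex
b_a(x_a) \propto f_a(x_a)\, \exp\!\Big(\sum_{i \in a} \lambda_{ai}(x_i)\Big),
\qquad
b_i(x_i) \propto \exp\!\Big(\tfrac{1}{d_i - 1} \sum_{a \ni i} \lambda_{ai}(x_i)\Big)
```

and the identification $\lambda_{ai}(x_i) = \ln \prod_{b \ni i,\, b \neq a} m_{b \to i}(x_i)$ recovers the LBP message-update rules.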
The other direction is quite direct – see the exercise
LBP is in fact computing the Lagrange multipliers – a very powerful observation
A region graph is a graph on subsets of nodes in the factor graph, with valid multipliers (as defined above)
D(R) – Descendants of R
P(R) – Parents of R
N(P,R) = {(I,J) : I not in D(P)+P, and J in D(P)+P but not in D(R)+R}
D(P,R) = {(I,J) : I in D(P)+P but not in D(R)+R, and J in D(R)+R}
LBP is very attractive for users: really simple to implement, very fast
LBP performance is limited by the size of the region assignments X_a, which can grow rapidly with the factors' degrees or the size of large regions
GLBP will be powerful when large regions can capture significant dependencies that are not captured by individual factors – think of a small positive loop or other symmetric effects
LBP messages can be computed synchronously (factors → variables → factors …); other scheduling options may boost performance considerably
LBP is just one (quite indirect) way by which Bethe energies can be minimized. Other approaches are possible – some of which are guaranteed to converge
The Bethe/region energy minimization can be further constrained to force the beliefs to be realizable. This gives rise to the concept of the Wainwright–Jordan marginal polytope and convex algorithms on it.