
Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes


Presentation Transcript


  1. Direct Approach to Cluster Variation Method in Graphs with Discrete Nodes Michal Rosen-Zvi Computer Science Division, UC Berkeley Michael I. Jordan Computer Science Division and the Statistics Department, UC Berkeley

  2. Outline • Introduction • The exponential family distributions • Variational approximation • Gibbs sampling • Undirected graphs • New approach to Gibbs sampling • Belief propagation revisited • The FNA • Directed graphs (if time allows)

  3. Estimating marginals of discrete random variables. The exponential family form: $P(x\mid\theta)=\exp[\theta^{T}f(x)-A(\theta)]$, where $f(x)$ are the features (in a quadratic model, $f_{ij}=x_i x_j$ and $f_i=x_i$) and $A(\theta)$ is the log partition function.
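As a concrete illustration (not part of the slides), here is a minimal Python sketch that evaluates this exponential family form by brute force for a small binary pairwise model; the helper names `log_partition` and `prob` are my own.

import itertools
import numpy as np

def log_partition(theta_i, theta_ij):
    # A(theta) = ln sum_x exp[theta^T f(x)], enumerating all 2^N binary states.
    # theta_i: length-N vector; theta_ij: N x N upper-triangular edge weights.
    n = len(theta_i)
    scores = []
    for x in itertools.product([0, 1], repeat=n):
        x = np.array(x)
        scores.append(theta_i @ x + x @ theta_ij @ x)  # f_i = x_i, f_ij = x_i * x_j
    return np.log(np.sum(np.exp(scores)))

def prob(x, theta_i, theta_ij):
    # P(x | theta) = exp[theta^T f(x) - A(theta)]
    x = np.asarray(x)
    return np.exp(theta_i @ x + x @ theta_ij @ x - log_partition(theta_i, theta_ij))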

  4. Some definitions. $x$ is the set of random variables, $\theta$ the set of parameters, and the nodes are binary, $x_i\in\{0,1\}$. Marginalizing onto the $m$th node: $\mu_m=\sum_{x\setminus x_m}P(x\mid\theta)$; similarly for pairs: $\mu_{mn}=\sum_{x\setminus\{x_m,x_n\}}P(x\mid\theta)$.
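For small graphs these marginals can be computed exactly by enumeration; the sketch below (my own illustration, reusing the hypothetical `prob` helper above) returns both $\mu_m$ and $\mu_{mn}$.

import itertools
import numpy as np

def exact_marginals(theta_i, theta_ij):
    # mu[m] = P(x_m = 1); mu_pair[m, n] = P(x_m = 1, x_n = 1)
    n = len(theta_i)
    mu = np.zeros(n)
    mu_pair = np.zeros((n, n))
    for x in itertools.product([0, 1], repeat=n):
        p = prob(np.array(x), theta_i, theta_ij)
        x = np.array(x)
        mu += p * x
        mu_pair += p * np.outer(x, x)
    return mu, mu_pair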

  5. What set is this??? Variational presentation of the log-partition function: $A(\theta)=\ln\sum_x\exp[\theta^{T}f(x)]$. It is a convex function, so $(A^{*})^{*}=A$, where $A^{*}(\mu)=\sup_{\theta\in\mathbb{R}^{|I|}}\{\mu^{T}\theta-A(\theta)\}$ and $A(\theta)=\sup_{\mu\in M}\{\mu^{T}\theta-A^{*}(\mu)\}$.

  6. Dual parameters = marginals: $\mu=E_\theta[f(x)]$. For discrete random variables, $M$ is the marginal polytope, defined by $M:=\{\mu\in\mathbb{R}^{|I|}\mid\exists\,p(\cdot)\ \text{s.t.}\ \sum_x f(x)p(x)=\mu\}$. CVM approximations work with pseudomarginals $\mu$.
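The identification of dual parameters with marginals follows from differentiating the log partition function (a standard exponential-family fact, spelled out here for completeness):

\[
\frac{\partial A(\theta)}{\partial \theta_\alpha}
  = \frac{\sum_x f_\alpha(x)\,\exp[\theta^{T} f(x)]}{\sum_x \exp[\theta^{T} f(x)]}
  = E_\theta[f_\alpha(x)] = \mu_\alpha ,
\]

so with $f_i=x_i$ and $f_{ij}=x_i x_j$ the mean parameters are exactly the single-node and pairwise marginals defined on slide 4.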

  7. Mean field. Factorizing the joint probability distribution: $P(x\mid\mu)=\prod_i P(x_i\mid\mu_i)=\prod_i \mu_i^{x_i}(1-\mu_i)^{1-x_i}$. There is only one Lagrange multiplier per node, but it is found by an iterative algorithm, so the numerical results might not lie in the approximated $M$. The objective function is not concave; the pseudomarginal set is convex and lies within $M$.

  8. Mean field (cont.). With $P(x\mid\mu)=\prod_i P(x_i\mid\mu_i)=\prod_i \mu_i^{x_i}(1-\mu_i)^{1-x_i}$, approximate the canonical parameters and pad with zeros: $A^{*}(\mu)=\sup_{\theta\in\mathbb{R}^{|I|}}\{\mu^{T}\theta-\sum_i A_i(\theta_i)\}$. For pairwise and single-node iterations: $\tilde\theta_i=\theta_i+\theta_{ij}\mu_i/2$. Then $A(\theta)=\sup_{\mu\in M}\{\mu^{T}\theta-A^{*}(\mu)\}$ becomes $A(\theta)=\sup_{\mu\in M}\{\sum_i\mu_i\theta_i+\sum_{ij}\mu_i\mu_j\theta_{ij}-\sum_i[\mu_i\ln\mu_i+(1-\mu_i)\ln(1-\mu_i)]\}$. The objective function is not concave; the pseudomarginal set is convex and lies within $M$.
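A generic naive mean-field sketch for this pairwise binary model follows; the slide's exact iteration (including its 1/2 factor) is not fully recoverable from the transcript, so this uses the standard coordinate-ascent update $\mu_i \leftarrow \sigma(\theta_i + \sum_j \theta_{ij}\mu_j)$, with all names my own.

import numpy as np

def sigmoid(y):
    return 1.0 / (1.0 + np.exp(-y))

def mean_field(theta_i, theta_sym, iters=100):
    # theta_sym: symmetric N x N coupling matrix with zero diagonal.
    mu = np.full(len(theta_i), 0.5)
    for _ in range(iters):
        for i in range(len(mu)):
            mu[i] = sigmoid(theta_i[i] + theta_sym[i] @ mu)
    return mu  # pseudomarginals; the objective is not concave, so restarts may help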

  9. Gibbs Sampling • Local updates according to the conditional probability: $p(x_i{=}1)=\sum_{x_{N(i)}}p(x_{N(i)})\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$, where $\sigma(y)=\exp(y)/[1+\exp(y)]$ • The measure converges to the Gibbs distribution, i.e. the exponential form • All moments are calculated using samples from the equilibrium
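A minimal Gibbs sampler sketch for the same model (my own illustration; the slide's conditional omits the single-node term, which corresponds to $\theta_i=0$ here):

import numpy as np

def gibbs_sample(theta_i, theta_sym, sweeps=1000, seed=0):
    # theta_sym: symmetric N x N coupling matrix with zero diagonal.
    rng = np.random.default_rng(seed)
    n = len(theta_i)
    x = rng.integers(0, 2, size=n)
    samples = []
    for _ in range(sweeps):
        for i in range(n):
            # p(x_i = 1 | x_N(i)) = sigma(theta_i + sum_j theta_ij * x_j)
            p1 = 1.0 / (1.0 + np.exp(-(theta_i[i] + theta_sym[i] @ x)))
            x[i] = rng.random() < p1
        samples.append(x.copy())
    return np.mean(samples, axis=0)  # moment estimates from (near-)equilibrium samples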

  10. Gibbs Sampling – dual space. $p(x_i{=}1)=\sum_{x_{N(i)}}p(x_{N(i)})\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$, $p(x_i{=}1,x_j{=}1)=\dots$, $p(x_i{=}1,x_j{=}1,x_k{=}1)=\dots$ The update $p_{t+1}(x_i{=}1)=(1-1/N)\,p_t(x_i{=}1)+(1/N)\sum_{x_{N(i)}}p_t(x_{N(i)})\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$ gives a set of $2^N$ fixed point equations that yields exact relations between marginals.
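One cancellation step, implicit on the slide: setting $p_{t+1}=p_t=p$ in the update recovers the exact relation,

\[
p = (1 - 1/N)\,p + (1/N)\sum_{x_{N(i)}} p(x_{N(i)})\,\sigma\Big(\sum_{j\in N(i)}\theta_{ij}x_j\Big)
\;\Rightarrow\;
p(x_i{=}1) = \sum_{x_{N(i)}} p(x_{N(i)})\,\sigma\Big(\sum_{j\in N(i)}\theta_{ij}x_j\Big).
\]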

  11. Gibbs Sampling and Bethe app. Plugging the Bethe-like factorization $p(x_{N(i)})\approx\sum_{x_i}\prod_{j\in N(i)}p(x_i,x_j)/p(x_i)^{|N(i)|-1}$ into $p(x_i{=}1)=\sum_{x_{N(i)}}p(x_{N(i)})\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$ gives $\mu_i=\sum_{x_{N(i)}}\sum_{x_i}\prod_{j\in N(i)}\frac{p(x_i,x_j)}{p(x_i)^{|N(i)|-1}}\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$ and $\mu_{ij}=f(\mu_i,\mu_{ij'},\theta_{ij'})$, where $j'$ stands for all neighbors of $i$ and $j$.

  12. Gibbs Sampling and the Factorized Neighbors Algorithm. In $p(x_i{=}1)=\sum_{x_{N(i)}}p(x_{N(i)})\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$, factorize the neighbors: $p(x_{N(i)})\approx\prod_{j\in N(i)}p(x_j)$. The FNA: $\mu_i=\sum_{x_{N(i)}}\prod_{j\in N(i)}p(x_j)\,\sigma(\sum_{j\in N(i)}\theta_{ij}x_j)$.
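A sketch of the FNA fixed point under these assumptions (names my own); each node update enumerates its neighbors' joint states, which is where the $O(N\exp(n))$ time quoted on the next slide comes from:

import itertools
import numpy as np

def fna(theta_sym, neighbors, iters=50):
    # neighbors[i]: list of nodes adjacent to i; theta_sym[i, j] = theta_ij.
    n = theta_sym.shape[0]
    mu = np.full(n, 0.5)
    for _ in range(iters):
        for i in range(n):
            total = 0.0
            # enumerate neighbor states under p(x_N(i)) ~= prod_j p(x_j)
            for xs in itertools.product([0, 1], repeat=len(neighbors[i])):
                w = np.prod([mu[j] if xj else 1.0 - mu[j]
                             for j, xj in zip(neighbors[i], xs)])
                field = sum(theta_sym[i, j] * xj for j, xj in zip(neighbors[i], xs))
                total += w / (1.0 + np.exp(-field))  # sigma(field)
            mu[i] = total
    return mu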

  13. The F. N. A. • The approximation is less restricted than MF • The algorithm is not exact on trees • The approximation is more restricted than Bethe. Some comparisons for graphs with N nodes, M edges and up to n neighbors per node:

              MF      Bethe   FNA
  Time:       O(N)    O(M)    O(N exp(n))
  Space:      O(N)    O(M)    O(N)

  14. The F. N. A. results on a grid

  15. Pseudomarginals vs. Exact

  16. Errors: $\epsilon_i=(\mu_i-\mu_i^{\text{exact}})^2/2$, the squared deviation of each pseudomarginal from the exact marginal.

  17. Directed Gibbs sampling and the parents factored app. • As soon as a node is chosen, all its descendants are updated • The local updates are according to the parents' current state • Factorized parents assumption: $p(x_{\pi(i)})=\prod_{j\in\pi(i)}p(x_j)$, so $p(x_i{=}1)=\sum_{x_{\pi(i)}}\prod_{j\in\pi(i)}p(x_j)\,\sigma(\sum_{j\in\pi(i)}\theta_{ij}x_j)$ (a sketch of this update follows below)
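A hedged sketch of one factorized-parents update (my own names; structurally the same enumeration as the FNA sketch, but over the parent set $\pi(i)$ rather than the neighborhood $N(i)$):

import itertools
import numpy as np

def fpa_update(i, parents, theta, mu):
    # parents[i]: list of parents of node i; theta[i, j] = theta_ij; mu[j] = p(x_j = 1).
    total = 0.0
    for xs in itertools.product([0, 1], repeat=len(parents[i])):
        # weight of this parent configuration under p(x_pi(i)) ~= prod_j p(x_j)
        w = np.prod([mu[j] if xj else 1.0 - mu[j] for j, xj in zip(parents[i], xs)])
        field = sum(theta[i, j] * xj for j, xj in zip(parents[i], xs))
        total += w / (1.0 + np.exp(-field))  # sigma(field)
    return total  # new estimate of p(x_i = 1)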

  18. Directed Gibbs Sampling – dual space. For a per-node rate $\epsilon_i$: $p_{t+1}(x_i{=}1)=(1-\epsilon_i)\,p_t(x_i{=}1)+\sum_{x_{\pi(i)}}[(1/N)\,p_t(x_{\pi(i)})+(1/N-\epsilon_i)\,p_{t+1}(x_{\pi(i)})]\,\sigma(\sum_{j\in\pi(i)}\theta_{ij}x_j)$. At the fixed point, $p(x_i{=}1)=\sum_{x_{\pi(i)}}p(x_{\pi(i)})\,\sigma(\sum_{j\in\pi(i)}\theta_{ij}x_j)$ and $p(x_i{=}1,x_j{=}1)=\sum_{x_{\pi(i)\setminus j},\,x_{\pi(j)}}p(x_{\pi(i)\setminus j},x_{\pi(j)})\,\sigma(\sum_{k\in\pi(j)}\theta_{jk}x_k)\,\sigma(\sum_{k\in\pi(i)\setminus j}\theta_{ik}x_k+\theta_{ij})$.

  19. Back to the CVM approach. $P(x\mid\mu)=\prod_i P(x_i,x_{\pi(i)}\mid\mu_i,\mu_{\pi(i)})/\dots$ Padding with zeros to a higher space: $A^{*}(\mu)$ = entropy of some approximate canonical set. For pairwise and single-node iterations: $\tilde\theta_i=\theta_i$, $\tilde\theta_{ij}=\theta_{ij}$, $\tilde\theta_{i,\pi(i)}=0$, and $A(\theta)=\sup_{\mu\in M}\{\mu^{T}\theta-A^{*}(\mu)\}$. The objective function is concave; the pseudomarginal set is not necessarily within $M$.

  20. Directed lattice

  21. Numerical results, parents factorization. Evidence: $x_{17}=x_{18}=x_{19}=1$. The FPA makes use of the exact results in the evidence-free graph.
