
Algorithms for MAP estimation in Markov Random Fields

Vladimir Kolmogorov, University College London


Presentation Transcript


  1. Algorithms for MAP estimation in Markov Random Fields. Vladimir Kolmogorov, University College London

  2. Energy function: E(x | θ) = Σp θp(xp) + Σ(p,q) θpq(xp, xq), where the unary terms θp(·) encode the data and the pairwise terms θpq(·,·) encode coherence. Here xp are discrete variables (for example, xp ∈ {0,1}), θp(·) are unary potentials, and θpq(·,·) are pairwise potentials.
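To make the definition concrete, here is a minimal sketch (with made-up numbers, not the slides' example) that evaluates E(x | θ) on a 3-node binary chain and finds the MAP labelling by brute force:

```python
# Evaluate E(x | theta) = sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)
# on a tiny hypothetical 3-node chain with binary labels. All numbers are
# made-up example values, not taken from the slides.
from itertools import product

# unary potentials: unary[p][label]
unary = [
    [0.0, 2.0],   # node 0
    [1.0, 0.0],   # node 1
    [3.0, 1.0],   # node 2
]
# pairwise potentials on edges (0,1) and (1,2): table[x_p][x_q]
edges = {
    (0, 1): [[0.0, 1.0], [1.0, 0.0]],  # Potts-like: penalise disagreement
    (1, 2): [[0.0, 1.0], [1.0, 0.0]],
}

def energy(x):
    e = sum(unary[p][x[p]] for p in range(len(unary)))
    e += sum(t[x[p]][x[q]] for (p, q), t in edges.items())
    return e

# brute-force MAP: enumerate all 2^3 labellings
best = min(product([0, 1], repeat=3), key=energy)
print(best, energy(best))  # -> (0, 1, 1) 2.0
```

Brute force is only viable for toy sizes; the rest of the talk is about algorithms that avoid this exponential enumeration.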

  3. Minimisation algorithms • Min Cut / Max Flow [Ford & Fulkerson '56]: [Greig, Porteous, Seheult '89] non-iterative (binary variables); [Boykov, Veksler, Zabih '99] iterative: alpha-expansion, alpha-beta swap, … (multi-valued variables). + If applicable, gives very accurate results. – Can be applied only to a restricted class of functions. • BP – Max-product Belief Propagation [Pearl '86]. + Can be applied to any energy function. – In vision, results are usually worse than those of graph cuts. – Does not always converge. • TRW – Max-product Tree-reweighted Message Passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]. + Can be applied to any energy function. + For stereo, finds lower energy than graph cuts. + Convergence guarantees for the algorithm in [Kolmogorov '05].

  4. Main idea: LP relaxation • Goal: minimize the energy E(x) under the constraints xp ∈ {0,1}. • In general, an NP-hard problem! • Relax the discreteness constraints: allow xp ∈ [0,1]. • This yields a linear program, which can be solved in polynomial time! The relaxation may be tight (its optimum coincides with the discrete optimum) or not tight. [figure: energy function with discrete variables vs. its LP relaxation]
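The tight/not-tight distinction can be illustrated on the smallest non-tight case, a frustrated 3-cycle; below is a sketch of its local-polytope LP using scipy.optimize.linprog (the instance and all numbers are my own hypothetical example, not from the slides):

```python
# LP relaxation of MAP on a "frustrated" 3-cycle: three binary variables,
# each edge pays cost 1 if its endpoints AGREE. Any discrete labelling of a
# triangle has at least one agreeing pair (integer minimum 1), but the LP
# achieves 0 with the half-integral solution mu = 1/2 everywhere, so here
# the relaxation is NOT tight. Hypothetical illustration, not from slides.
from itertools import product
import numpy as np
from scipy.optimize import linprog

nodes, cyc = 3, [(0, 1), (1, 2), (0, 2)]
n_vars = 2 * nodes + 4 * len(cyc)                     # mu_p(i), mu_pq(i,j)
node = lambda p, i: 2 * p + i                         # index of mu_p(i)
pair = lambda e, i, j: 2 * nodes + 4 * e + 2 * i + j  # index of mu_pq(i,j)

c = np.zeros(n_vars)
for e in range(len(cyc)):
    c[pair(e, 0, 0)] = c[pair(e, 1, 1)] = 1.0         # cost 1 for agreement

A_eq, b_eq = [], []
for p in range(nodes):                   # normalisation: sum_i mu_p(i) = 1
    row = np.zeros(n_vars); row[node(p, 0)] = row[node(p, 1)] = 1
    A_eq.append(row); b_eq.append(1.0)
for e, (p, q) in enumerate(cyc):         # marginalisation consistency
    for i in (0, 1):                     # sum_j mu_pq(i,j) = mu_p(i)
        row = np.zeros(n_vars)
        row[pair(e, i, 0)] = row[pair(e, i, 1)] = 1; row[node(p, i)] = -1
        A_eq.append(row); b_eq.append(0.0)
    for j in (0, 1):                     # sum_i mu_pq(i,j) = mu_q(j)
        row = np.zeros(n_vars)
        row[pair(e, 0, j)] = row[pair(e, 1, j)] = 1; row[node(q, j)] = -1
        A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, 1), method="highs")

# integer optimum by brute force
int_opt = min(sum(x[p] == x[q] for p, q in cyc)
              for x in product([0, 1], repeat=3))
print(res.fun, int_opt)  # LP value 0.0 < integer optimum 1
```

On a tree-structured graph the same LP would be tight; the cycle is what allows the half-integral solution to undercut every discrete labelling.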

  5. Solving the LP relaxation • Too large for general-purpose LP solvers (e.g. interior-point methods). • Solve the dual problem instead of the primal: formulate a lower bound on the energy, then maximize this bound; when done, this solves the primal problem (the LP relaxation). • Two different ways to formulate the lower bound: via posiforms (leads to the maxflow algorithm), and via a convex combination of trees (leads to tree-reweighted message passing).

  6. Notation and Preliminaries

  7. Energy function – visualisation: each of the nodes p and q carries a cost for label 0 and label 1, and the edge (p,q) carries a cost for each label pair. [figure: example costs on nodes p, q and edge (p,q)]

  8. Energy function – visualisation: the same costs collected into a single vector θ of all parameters. [figure: the same example, annotated with the parameter vector]

  9. Reparameterisation: costs can be moved between an edge and the unary terms of its endpoints without changing the energy. [figure: example of adding a constant to unary costs and subtracting it from the pairwise costs of edge (p,q)]

  10. Reparameterisation • Definition: θ' is a reparameterisation of θ if they define the same energy: E(x | θ') = E(x | θ) for all x. • Maxflow, BP and TRW all perform reparameterisations. [figure: the example costs before and after a reparameterisation]
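The definition can be checked mechanically: move a constant between a pairwise term and a unary term and verify that every labelling keeps its energy. A minimal sketch in Python with made-up numbers (the slide's own example values are garbled in the transcript):

```python
# A reparameterisation changes the individual potentials but not the energy.
# Hypothetical 2-node example: move a constant delta from the pairwise term
# of edge (p,q) into the unary term of p. All numbers are made-up.
from itertools import product

unary = {0: [1.0, 3.0], 1: [0.0, 2.0]}
pair = [[4.0, 1.0], [2.0, 0.0]]            # pair[x_p][x_q]

def energy(u, t, x):
    return u[0][x[0]] + u[1][x[1]] + t[x[0]][x[1]]

# reparameterise: subtract delta from row x_p = 0 of the pairwise term
# and add it to theta_p(0); the energy is unchanged for every labelling
delta = 1.0
unary2 = {0: [unary[0][0] + delta, unary[0][1]], 1: list(unary[1])}
pair2 = [[pair[0][0] - delta, pair[0][1] - delta], list(pair[1])]

for x in product([0, 1], repeat=2):
    assert energy(unary, pair, x) == energy(unary2, pair2, x)
print("same energy for all labellings")
```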

  11. Part I: Lower bound via posiforms (→ maxflow algorithm)

  12. Lower bound via posiforms [Hammer, Hansen, Simeone '84]: write the energy as E(x | θ) = Φ(θ) + (a sum of non-negative terms). The constant Φ(θ) is then a lower bound on the energy; the goal is to maximize Φ(θ) while keeping all remaining terms non-negative.

  13. Outline of part I • Maximisation algorithm? • Consider functions of binary variables only • Maximising lower bound for submodular functions • Definition of submodular functions • Overview of min cut/max flow • Reduction to max flow • Global minimum of the energy • Maximising lower bound for non-submodular functions • Reduction to max flow • More complicated graph • Part of optimal solution

  14. Submodular functions of binary variables • Definition: E is submodular if every pairwise term satisfies θpq(0,0) + θpq(1,1) ≤ θpq(0,1) + θpq(1,0). • Can be converted to a “canonical form” in which the minimal entries are zero cost. [figure: example energy in canonical form]
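The submodularity condition is a per-edge inequality and can be checked directly; a minimal sketch with made-up example terms:

```python
# Check submodularity: theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0)
# for every pairwise term. The example terms are made-up.
def is_submodular(pairwise_terms):
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0] for t in pairwise_terms)

potts = [[0, 1], [1, 0]]          # penalises disagreement: submodular
antipotts = [[1, 0], [0, 1]]      # penalises agreement: NOT submodular
print(is_submodular([potts]), is_submodular([antipotts]))  # True False
```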

  15. Overview of min cut/max flow

  16. Min Cut problem: a directed weighted graph with two special nodes, the source and the sink. [figure: example graph with edge weights]

  17. Min Cut problem: a cut partitions the nodes into S = {source, node 1} and T = {sink, node 2, node 3}. [figure: the example graph with this cut drawn]

  18. Min Cut problem • Task: compute the cut with minimum cost, where the cost is the sum of weights of edges going from S to T. In the example cut, Cost(S,T) = 1 + 1 = 2. [figure: the example cut]

  19-25. Maxflow algorithm (animation): repeatedly find an augmenting path from source to sink in the residual graph and push flow along it, updating the residual capacities; in the example, value(flow) grows from 0 to 1 to 2, at which point no augmenting path remains and the flow is maximal. [figures: residual graphs after each augmentation]
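The augmenting-path procedure animated in these slides can be sketched as the Edmonds-Karp variant (BFS chooses a shortest augmenting path). Since the slides' exact graph is garbled in the transcript, the example graph below is hypothetical:

```python
# Edmonds-Karp maxflow: shortest augmenting paths found by BFS.
# The example graph is hypothetical (the slides' graph is garbled).
from collections import deque

def max_flow(cap, s, t):
    """cap maps (u, v) -> capacity; it is mutated into residual capacities."""
    adj = {}
    for (u, v) in list(cap):               # adjacency, both directions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for (u, v) in list(cap):               # add zero-capacity reverse edges
        cap.setdefault((v, u), 0)
    flow = 0
    while True:
        parent = {s: None}                 # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                # no augmenting path: flow is maximal
            return flow
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for (u, v) in path:                # push flow, update residuals
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck

graph = {('s', 'a'): 1, ('s', 'b'): 2, ('a', 't'): 2,
         ('b', 't'): 1, ('b', 'a'): 1}
print(max_flow(dict(graph), 's', 't'))     # -> 3
```

The returned value equals the min-cut cost (here the cut {s} has capacity 1 + 2 = 3), per the max-flow/min-cut theorem.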

  26. Maximising lower bound for submodular functions: reduction to maxflow

  27-33. Maxflow algorithm and reparameterisation (animation): each augmentation step of maxflow corresponds to a reparameterisation of the energy; pushing flow subtracts the bottleneck capacity along the path and adds it to the constant term. At termination value(flow) = 2, and the constant term of the resulting reparameterisation equals the minimum of the energy. [figures: graph and energy parameters after each augmentation]
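The reduction rests on decomposing each submodular pairwise term so that cut costs reproduce energies exactly. A minimal algebraic check for a single edge, with made-up numbers (the decomposition is the standard one; building actual graph capacities would additionally require adding per-node constants to remove negative unary entries):

```python
# Sketch of the reduction from a submodular binary energy to min cut, for one
# edge (p,q). Convention: x_p = 1 means "p on the sink side"; an edge u -> v
# is cut iff u is on the source side and v on the sink side. A, B, C, D
# denote theta_pq(0,0), (0,1), (1,0), (1,1). Numbers are made-up.
from itertools import product

A, B, C, D = 1.0, 4.0, 3.0, 2.0    # submodular: A + D <= B + C
up = [0.0, 2.0]                    # theta_p
uq = [1.0, 0.0]                    # theta_q

def energy(xp, xq):
    return up[xp] + uq[xq] + [[A, B], [C, D]][xp][xq]

# decomposition behind the graph construction:
#   theta_pq = A + (C - A)[xp = 1] + (D - C)[xq = 1] + w [xp = 0, xq = 1]
const = A                          # goes into the constant term
tp = [up[0], up[1] + C - A]        # reparameterised unary for p (t-links)
tq = [uq[0], uq[1] + D - C]        # reparameterised unary for q (t-links)
w = B + C - A - D                  # capacity of edge p -> q; >= 0 iff submodular

def cut_cost(xp, xq):
    # t-links pay tp[xp] + tq[xq]; edge p -> q is cut iff xp = 0, xq = 1
    return const + tp[xp] + tq[xq] + (w if (xp, xq) == (0, 1) else 0.0)

for xp, xq in product([0, 1], repeat=2):
    assert abs(cut_cost(xp, xq) - energy(xp, xq)) < 1e-12
print("cut cost = energy for every labelling, so min cut = MAP")
```

Because cut cost equals energy assignment-by-assignment, the minimum cut found by maxflow is exactly the global minimum of the energy, as slide 33 states.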

  34. Maximising lower bound for non-submodular functions

  35. Arbitrary functions of binary variables: again maximize the constant term of the posiform subject to the remaining terms being non-negative. • Can be solved via maxflow [Boros, Hammer, Sun '91] on a specially constructed graph. • Gives the solution to the LP relaxation: for each node, xp ∈ {0, 1/2, 1}.

  36. Arbitrary functions of binary variables: the LP solution is half-integral; nodes that receive an integral label (0 or 1) are part of an optimal solution, while half-labelled nodes remain undetermined [Hammer, Hansen, Simeone '84]. [figure: example labelling with values 0, 1 and 1/2]

  37. Part II: Lower bound via convex combination of trees (→ tree-reweighted message passing)

  38. Convex combination of trees [Wainwright, Jaakkola, Willsky '02] • Goal: compute the minimum of the energy, Φ(θ) = min over x of E(x | θ). • In general, intractable! • Obtaining a lower bound: split θ into several components, θ = θ¹ + θ² + …; compute the minimum Φ(θⁱ) for each component; then Φ(θ) ≥ Φ(θ¹) + Φ(θ²) + …, since the minimum of a sum is at least the sum of the minima. • Use trees as components, so each Φ(θⁱ) is computable exactly!
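The bound can be verified by brute force on a tiny example: split θ over two forests covering a 3-cycle and compare the sum of their minima with the true minimum. All numbers below are made-up:

```python
# Lower bound via splitting theta: on a 3-cycle, split the energy into two
# forests (each edge goes to exactly one part, unaries are halved) and check
#   min_x E(x|theta) >= min_x E(x|theta1) + min_x E(x|theta2).
# All numbers are made-up.
from itertools import product

cyc = [(0, 1), (1, 2), (0, 2)]
pair = {e: [[0.0, 1.0], [1.0, 0.0]] for e in cyc}     # Potts terms
unary = [[0.0, 0.5], [0.5, 0.0], [0.0, 0.0]]

def energy(un, pw, x):
    return (sum(un[p][x[p]] for p in range(3))
            + sum(t[x[p]][x[q]] for (p, q), t in pw.items()))

# split: each part gets half the unary terms and a subset of the edges
half = [[v / 2 for v in row] for row in unary]
part1 = {e: pair[e] for e in [(0, 1), (1, 2)]}        # a chain
part2 = {e: pair[e] for e in [(0, 2)]}                # a single edge

def minimum(un, pw):
    return min(energy(un, pw, x) for x in product([0, 1], repeat=3))

bound = minimum(half, part1) + minimum(half, part2)
print(bound, minimum(unary, pair))  # -> 0.25 0.5  (bound <= true minimum)
```

Each part is cycle-free, so in a real implementation its minimum would be computed exactly by dynamic programming on the tree rather than by enumeration.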

  39. Convex combination of trees (cont'd): decompose the graph into trees T and T', and maximize the resulting lower bound on the energy over reparameterisations. [figure: a graph and two of its spanning trees]

  40. TRW algorithms • Goal: find a reparameterisation maximizing the lower bound. • Apply a sequence of different reparameterisation operations: node averaging and ordinary BP on trees. • The order of operations affects performance dramatically. • Algorithms: [Wainwright et al. '02] uses a parallel schedule and may not converge; [Kolmogorov '05] uses a specific sequential schedule for which the lower bound does not decrease and convergence guarantees hold.

  41. Node averaging: a node shared by two trees, with unary vectors (0, 4) in one tree and (1, 0) in the other. [figure]

  42. Node averaging: both copies are replaced by their label-wise average (0.5, 2): for label 0, (0 + 1)/2 = 0.5; for label 1, (4 + 0)/2 = 2. The total parameter vector, and hence the energy, is unchanged. [figure]

  43. Belief propagation (BP) on trees • Send messages • Equivalent to reparameterising node and edge parameters • Two passes (forward and backward)

  44. Belief propagation (BP) on trees • Key property (Wainwright et al.): upon termination, θp gives the min-marginals for node p: θp(j) = min { E(x | θ) : xp = j } (up to an additive constant). [figure: example min-marginal values]
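The min-marginal property can be checked on a chain, where the two passes of min-sum message passing suffice; a sketch with made-up numbers, verified against brute-force enumeration:

```python
# Min-sum BP on a chain computes min-marginals exactly: after a forward and
# a backward pass, m_f[p][j] + theta_p(j) + m_b[p][j] equals the minimum
# energy over labellings with x_p = j. Hypothetical 3-node chain, made-up
# numbers.
from itertools import product

unary = [[0.0, 2.0], [1.0, 0.0], [0.5, 1.5]]
pair = [[[0.0, 1.0], [1.0, 0.0]],   # edge (0,1): pair[p][x_p][x_{p+1}]
        [[0.0, 1.0], [1.0, 0.0]]]   # edge (1,2)
n = 3

def energy(x):
    return (sum(unary[p][x[p]] for p in range(n))
            + sum(pair[p][x[p]][x[p + 1]] for p in range(n - 1)))

# forward messages: best cost of x_0..x_{p-1} plus the edge into (p, j)
m_f = [[0.0, 0.0] for _ in range(n)]
for p in range(1, n):
    for j in (0, 1):
        m_f[p][j] = min(m_f[p - 1][i] + unary[p - 1][i] + pair[p - 1][i][j]
                        for i in (0, 1))
# backward messages: best cost of x_{p+1}..x_{n-1} plus the edge out of (p, j)
m_b = [[0.0, 0.0] for _ in range(n)]
for p in range(n - 2, -1, -1):
    for j in (0, 1):
        m_b[p][j] = min(m_b[p + 1][i] + unary[p + 1][i] + pair[p][j][i]
                        for i in (0, 1))

for p in range(n):
    for j in (0, 1):
        mm = m_f[p][j] + unary[p][j] + m_b[p][j]
        brute = min(energy(x) for x in product([0, 1], repeat=n) if x[p] == j)
        assert abs(mm - brute) < 1e-12
print("BP min-marginals match brute force")
```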

  45. TRW algorithm of Wainwright et al. with tree-based updates (TRW-T): run BP on all trees, then “average” all nodes. • If it converges, gives a (local) maximum of the lower bound. • Not guaranteed to converge; the lower bound may go down.

  46. Sequential TRW algorithm (TRW-S) [Kolmogorov '05]: pick node p, run BP on all trees containing p, then “average” node p; repeat.

  47. Main property of TRW-S • Theorem: the lower bound never decreases. • Proof sketch: before averaging, the shared node contributes min(0, 4) + min(1, 0) = 0 to the sum of tree minima. [figure]

  48. Main property of TRW-S • Theorem: the lower bound never decreases. • Proof sketch (cont'd): after averaging, it contributes min(0.5, 2) + min(0.5, 2) = 1 ≥ 0, so the bound cannot decrease. [figure]

  49. TRW-S algorithm • Particular order of averaging and BP operations • Lower bound guaranteed not to decrease • There exists a limit point that satisfies the weak tree agreement condition • Efficiency?

  50. Efficient implementation: pick node p, run BP on all trees containing p, then “average” node p. Rerunning full BP passes for every node in this way would be inefficient.
