
Algorithms for MAP estimation in Markov Random Fields

Vladimir Kolmogorov, University College London


Presentation Transcript


  1. Algorithms for MAP estimation in Markov Random Fields. Vladimir Kolmogorov, University College London

  2. Energy function: E(x | θ) = Σp θp(xp) + Σ(p,q) θpq(xp, xq), where the unary terms θp(·) encode the data and the pairwise terms θpq(·,·) encode coherence. Here xp are discrete variables (for example, xp ∈ {0,1}), θp(·) are unary potentials, and θpq(·,·) are pairwise potentials.
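To make the definition concrete, here is a minimal sketch (with made-up numbers, not the slides' example) that evaluates E(x | θ) on a 3-node binary chain and finds the MAP labelling by brute force:

```python
# Evaluate E(x | theta) = sum_p theta_p(x_p) + sum_{(p,q)} theta_pq(x_p, x_q)
# on a tiny hypothetical 3-node chain with binary labels. All numbers are
# made-up example values, not taken from the slides.
from itertools import product

# unary potentials: unary[p][label]
unary = [
    [0.0, 2.0],   # node 0
    [1.0, 0.0],   # node 1
    [3.0, 1.0],   # node 2
]
# pairwise potentials on edges (0,1) and (1,2): table[x_p][x_q]
edges = {
    (0, 1): [[0.0, 1.0], [1.0, 0.0]],  # Potts-like: penalise disagreement
    (1, 2): [[0.0, 1.0], [1.0, 0.0]],
}

def energy(x):
    e = sum(unary[p][x[p]] for p in range(len(unary)))
    e += sum(t[x[p]][x[q]] for (p, q), t in edges.items())
    return e

# brute-force MAP: enumerate all 2^3 labellings
best = min(product([0, 1], repeat=3), key=energy)
print(best, energy(best))  # -> (0, 1, 1) 2.0
```

Brute force is only viable for toy sizes; the rest of the talk is about algorithms that avoid this exponential enumeration.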

  3. Minimisation algorithms • Min Cut / Max Flow [Ford & Fulkerson '56]: [Greig, Porteous, Seheult '89] non-iterative (binary variables); [Boykov, Veksler, Zabih '99] iterative: alpha-expansion, alpha-beta swap, … (multi-valued variables). + If applicable, gives very accurate results. – Can be applied only to a restricted class of functions. • BP – Max-product Belief Propagation [Pearl '86]. + Can be applied to any energy function. – In vision, results are usually worse than those of graph cuts. – Does not always converge. • TRW – Max-product Tree-reweighted Message Passing [Wainwright, Jaakkola, Willsky '02], [Kolmogorov '05]. + Can be applied to any energy function. + For stereo, finds lower energy than graph cuts. + Convergence guarantees for the algorithm in [Kolmogorov '05].

  4. Main idea: LP relaxation • Goal: minimize the energy E(x) under the constraints xp ∈ {0,1}. • In general, an NP-hard problem! • Relax the discreteness constraints: allow xp ∈ [0,1]. • This yields a linear program, which can be solved in polynomial time! The relaxation may be tight (its optimum coincides with the discrete optimum) or not tight. [figure: energy function with discrete variables vs. its LP relaxation]
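The tight/not-tight distinction can be illustrated on the smallest non-tight case, a frustrated 3-cycle; below is a sketch of its local-polytope LP using scipy.optimize.linprog (the instance and all numbers are my own hypothetical example, not from the slides):

```python
# LP relaxation of MAP on a "frustrated" 3-cycle: three binary variables,
# each edge pays cost 1 if its endpoints AGREE. Any discrete labelling of a
# triangle has at least one agreeing pair (integer minimum 1), but the LP
# achieves 0 with the half-integral solution mu = 1/2 everywhere, so here
# the relaxation is NOT tight. Hypothetical illustration, not from slides.
from itertools import product
import numpy as np
from scipy.optimize import linprog

nodes, cyc = 3, [(0, 1), (1, 2), (0, 2)]
n_vars = 2 * nodes + 4 * len(cyc)                     # mu_p(i), mu_pq(i,j)
node = lambda p, i: 2 * p + i                         # index of mu_p(i)
pair = lambda e, i, j: 2 * nodes + 4 * e + 2 * i + j  # index of mu_pq(i,j)

c = np.zeros(n_vars)
for e in range(len(cyc)):
    c[pair(e, 0, 0)] = c[pair(e, 1, 1)] = 1.0         # cost 1 for agreement

A_eq, b_eq = [], []
for p in range(nodes):                   # normalisation: sum_i mu_p(i) = 1
    row = np.zeros(n_vars); row[node(p, 0)] = row[node(p, 1)] = 1
    A_eq.append(row); b_eq.append(1.0)
for e, (p, q) in enumerate(cyc):         # marginalisation consistency
    for i in (0, 1):                     # sum_j mu_pq(i,j) = mu_p(i)
        row = np.zeros(n_vars)
        row[pair(e, i, 0)] = row[pair(e, i, 1)] = 1; row[node(p, i)] = -1
        A_eq.append(row); b_eq.append(0.0)
    for j in (0, 1):                     # sum_i mu_pq(i,j) = mu_q(j)
        row = np.zeros(n_vars)
        row[pair(e, 0, j)] = row[pair(e, 1, j)] = 1; row[node(q, j)] = -1
        A_eq.append(row); b_eq.append(0.0)

res = linprog(c, A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, 1), method="highs")

# integer optimum by brute force
int_opt = min(sum(x[p] == x[q] for p, q in cyc)
              for x in product([0, 1], repeat=3))
print(res.fun, int_opt)  # LP value 0.0 < integer optimum 1
```

On a tree-structured graph the same LP would be tight; the cycle is what allows the half-integral solution to undercut every discrete labelling.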

  5. Solving the LP relaxation • Too large for general-purpose LP solvers (e.g. interior-point methods). • Solve the dual problem instead of the primal: formulate a lower bound on the energy, then maximize this bound; when done, this solves the primal problem (the LP relaxation). • Two different ways to formulate the lower bound: via posiforms (leads to the maxflow algorithm), and via a convex combination of trees (leads to tree-reweighted message passing).

  6. Notation and Preliminaries

  7. Energy function – visualisation: each of the nodes p and q carries a cost for label 0 and label 1, and the edge (p,q) carries a cost for each label pair. [figure: example costs on nodes p, q and edge (p,q)]

  8. Energy function – visualisation: the same costs collected into a single vector θ of all parameters. [figure: the same example, annotated with the parameter vector]

  9. Reparameterisation: costs can be moved between an edge and the unary terms of its endpoints without changing the energy. [figure: example of adding a constant to unary costs and subtracting it from the pairwise costs of edge (p,q)]

  10. Reparameterisation • Definition: θ' is a reparameterisation of θ if they define the same energy: E(x | θ') = E(x | θ) for all x. • Maxflow, BP and TRW all perform reparameterisations. [figure: the example costs before and after a reparameterisation]
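The definition can be checked mechanically: move a constant between a pairwise term and a unary term and verify that every labelling keeps its energy. A minimal sketch in Python with made-up numbers (the slide's own example values are garbled in the transcript):

```python
# A reparameterisation changes the individual potentials but not the energy.
# Hypothetical 2-node example: move a constant delta from the pairwise term
# of edge (p,q) into the unary term of p. All numbers are made-up.
from itertools import product

unary = {0: [1.0, 3.0], 1: [0.0, 2.0]}
pair = [[4.0, 1.0], [2.0, 0.0]]            # pair[x_p][x_q]

def energy(u, t, x):
    return u[0][x[0]] + u[1][x[1]] + t[x[0]][x[1]]

# reparameterise: subtract delta from row x_p = 0 of the pairwise term
# and add it to theta_p(0); the energy is unchanged for every labelling
delta = 1.0
unary2 = {0: [unary[0][0] + delta, unary[0][1]], 1: list(unary[1])}
pair2 = [[pair[0][0] - delta, pair[0][1] - delta], list(pair[1])]

for x in product([0, 1], repeat=2):
    assert energy(unary, pair, x) == energy(unary2, pair2, x)
print("same energy for all labellings")
```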

  11. Part I: Lower bound via posiforms (→ maxflow algorithm)

  12. Lower bound via posiforms [Hammer, Hansen, Simeone '84]: write the energy as E(x | θ) = Φ(θ) + (a sum of non-negative terms). The constant Φ(θ) is then a lower bound on the energy; the goal is to maximize Φ(θ) while keeping all remaining terms non-negative.

  13. Outline of part I • Maximisation algorithm? • Consider functions of binary variables only • Maximising lower bound for submodular functions • Definition of submodular functions • Overview of min cut/max flow • Reduction to max flow • Global minimum of the energy • Maximising lower bound for non-submodular functions • Reduction to max flow • More complicated graph • Part of optimal solution

  14. Submodular functions of binary variables • Definition: E is submodular if every pairwise term satisfies θpq(0,0) + θpq(1,1) ≤ θpq(0,1) + θpq(1,0). • Can be converted to a “canonical form” in which the minimal entries are zero cost. [figure: example energy in canonical form]
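The submodularity condition is a per-edge inequality and can be checked directly; a minimal sketch with made-up example terms:

```python
# Check submodularity: theta_pq(0,0) + theta_pq(1,1) <= theta_pq(0,1) + theta_pq(1,0)
# for every pairwise term. The example terms are made-up.
def is_submodular(pairwise_terms):
    return all(t[0][0] + t[1][1] <= t[0][1] + t[1][0] for t in pairwise_terms)

potts = [[0, 1], [1, 0]]          # penalises disagreement: submodular
antipotts = [[1, 0], [0, 1]]      # penalises agreement: NOT submodular
print(is_submodular([potts]), is_submodular([antipotts]))  # True False
```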

  15. Overview of min cut/max flow

  16. Min Cut problem: a directed weighted graph with two special nodes, the source and the sink. [figure: example graph with edge weights]

  17. Min Cut problem: a cut partitions the nodes into S = {source, node 1} and T = {sink, node 2, node 3}. [figure: the example graph with this cut drawn]

  18. Min Cut problem • Task: compute the cut with minimum cost, where the cost is the sum of weights of edges going from S to T. In the example cut, Cost(S,T) = 1 + 1 = 2. [figure: the example cut]

  19-25. Maxflow algorithm (animation): repeatedly find an augmenting path from source to sink in the residual graph and push flow along it, updating the residual capacities; in the example, value(flow) grows from 0 to 1 to 2, at which point no augmenting path remains and the flow is maximal. [figures: residual graphs after each augmentation]
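The augmenting-path procedure animated in these slides can be sketched as the Edmonds-Karp variant (BFS chooses a shortest augmenting path). Since the slides' exact graph is garbled in the transcript, the example graph below is hypothetical:

```python
# Edmonds-Karp maxflow: shortest augmenting paths found by BFS.
# The example graph is hypothetical (the slides' graph is garbled).
from collections import deque

def max_flow(cap, s, t):
    """cap maps (u, v) -> capacity; it is mutated into residual capacities."""
    adj = {}
    for (u, v) in list(cap):               # adjacency, both directions
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for (u, v) in list(cap):               # add zero-capacity reverse edges
        cap.setdefault((v, u), 0)
    flow = 0
    while True:
        parent = {s: None}                 # BFS for an augmenting path
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:                # no augmenting path: flow is maximal
            return flow
        path, v = [], t                    # recover the path and its bottleneck
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[e] for e in path)
        for (u, v) in path:                # push flow, update residuals
            cap[(u, v)] -= bottleneck
            cap[(v, u)] += bottleneck
        flow += bottleneck

graph = {('s', 'a'): 1, ('s', 'b'): 2, ('a', 't'): 2,
         ('b', 't'): 1, ('b', 'a'): 1}
print(max_flow(dict(graph), 's', 't'))     # -> 3
```

The returned value equals the min-cut cost (here the cut {s} has capacity 1 + 2 = 3), per the max-flow/min-cut theorem.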

  26. Maximising lower bound for submodular functions: reduction to maxflow

  27-33. Maxflow algorithm and reparameterisation (animation): each augmentation step of maxflow corresponds to a reparameterisation of the energy; pushing flow subtracts the bottleneck capacity along the path and adds it to the constant term. At termination value(flow) = 2, and the constant term of the resulting reparameterisation equals the minimum of the energy. [figures: graph and energy parameters after each augmentation]
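The reduction rests on decomposing each submodular pairwise term so that cut costs reproduce energies exactly. A minimal algebraic check for a single edge, with made-up numbers (the decomposition is the standard one; building actual graph capacities would additionally require adding per-node constants to remove negative unary entries):

```python
# Sketch of the reduction from a submodular binary energy to min cut, for one
# edge (p,q). Convention: x_p = 1 means "p on the sink side"; an edge u -> v
# is cut iff u is on the source side and v on the sink side. A, B, C, D
# denote theta_pq(0,0), (0,1), (1,0), (1,1). Numbers are made-up.
from itertools import product

A, B, C, D = 1.0, 4.0, 3.0, 2.0    # submodular: A + D <= B + C
up = [0.0, 2.0]                    # theta_p
uq = [1.0, 0.0]                    # theta_q

def energy(xp, xq):
    return up[xp] + uq[xq] + [[A, B], [C, D]][xp][xq]

# decomposition behind the graph construction:
#   theta_pq = A + (C - A)[xp = 1] + (D - C)[xq = 1] + w [xp = 0, xq = 1]
const = A                          # goes into the constant term
tp = [up[0], up[1] + C - A]        # reparameterised unary for p (t-links)
tq = [uq[0], uq[1] + D - C]        # reparameterised unary for q (t-links)
w = B + C - A - D                  # capacity of edge p -> q; >= 0 iff submodular

def cut_cost(xp, xq):
    # t-links pay tp[xp] + tq[xq]; edge p -> q is cut iff xp = 0, xq = 1
    return const + tp[xp] + tq[xq] + (w if (xp, xq) == (0, 1) else 0.0)

for xp, xq in product([0, 1], repeat=2):
    assert abs(cut_cost(xp, xq) - energy(xp, xq)) < 1e-12
print("cut cost = energy for every labelling, so min cut = MAP")
```

Because cut cost equals energy assignment-by-assignment, the minimum cut found by maxflow is exactly the global minimum of the energy, as slide 33 states.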

  34. Maximising lower bound for non-submodular functions

  35. Arbitrary functions of binary variables: again maximize the constant term of the posiform subject to the remaining terms being non-negative. • Can be solved via maxflow [Boros, Hammer, Sun '91] on a specially constructed graph. • Gives the solution to the LP relaxation: for each node, xp ∈ {0, 1/2, 1}.

  36. Arbitrary functions of binary variables: the LP solution is half-integral; nodes that receive an integral label (0 or 1) are part of an optimal solution, while half-labelled nodes remain undetermined [Hammer, Hansen, Simeone '84]. [figure: example labelling with values 0, 1 and 1/2]

  37. Part II: Lower bound via convex combination of trees (→ tree-reweighted message passing)

  38. Convex combination of trees [Wainwright, Jaakkola, Willsky '02] • Goal: compute the minimum of the energy, Φ(θ) = min over x of E(x | θ). • In general, intractable! • Obtaining a lower bound: split θ into several components, θ = θ¹ + θ² + …; compute the minimum Φ(θⁱ) for each component; then Φ(θ) ≥ Φ(θ¹) + Φ(θ²) + …, since the minimum of a sum is at least the sum of the minima. • Use trees as components, so each Φ(θⁱ) is computable exactly!
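The bound can be verified by brute force on a tiny example: split θ over two forests covering a 3-cycle and compare the sum of their minima with the true minimum. All numbers below are made-up:

```python
# Lower bound via splitting theta: on a 3-cycle, split the energy into two
# forests (each edge goes to exactly one part, unaries are halved) and check
#   min_x E(x|theta) >= min_x E(x|theta1) + min_x E(x|theta2).
# All numbers are made-up.
from itertools import product

cyc = [(0, 1), (1, 2), (0, 2)]
pair = {e: [[0.0, 1.0], [1.0, 0.0]] for e in cyc}     # Potts terms
unary = [[0.0, 0.5], [0.5, 0.0], [0.0, 0.0]]

def energy(un, pw, x):
    return (sum(un[p][x[p]] for p in range(3))
            + sum(t[x[p]][x[q]] for (p, q), t in pw.items()))

# split: each part gets half the unary terms and a subset of the edges
half = [[v / 2 for v in row] for row in unary]
part1 = {e: pair[e] for e in [(0, 1), (1, 2)]}        # a chain
part2 = {e: pair[e] for e in [(0, 2)]}                # a single edge

def minimum(un, pw):
    return min(energy(un, pw, x) for x in product([0, 1], repeat=3))

bound = minimum(half, part1) + minimum(half, part2)
print(bound, minimum(unary, pair))  # -> 0.25 0.5  (bound <= true minimum)
```

Each part is cycle-free, so in a real implementation its minimum would be computed exactly by dynamic programming on the tree rather than by enumeration.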

  39. Convex combination of trees (cont'd): decompose the graph into trees T and T', and maximize the resulting lower bound on the energy over reparameterisations. [figure: a graph and two of its spanning trees]

  40. TRW algorithms • Goal: find a reparameterisation maximizing the lower bound. • Apply a sequence of different reparameterisation operations: node averaging and ordinary BP on trees. • The order of operations affects performance dramatically. • Algorithms: [Wainwright et al. '02] uses a parallel schedule and may not converge; [Kolmogorov '05] uses a specific sequential schedule for which the lower bound does not decrease and convergence guarantees hold.

  41. Node averaging: a node shared by two trees, with unary vectors (0, 4) in one tree and (1, 0) in the other. [figure]

  42. Node averaging: both copies are replaced by their label-wise average (0.5, 2): for label 0, (0 + 1)/2 = 0.5; for label 1, (4 + 0)/2 = 2. The total parameter vector, and hence the energy, is unchanged. [figure]

  43. Belief propagation (BP) on trees • Send messages • Equivalent to reparameterising node and edge parameters • Two passes (forward and backward)

  44. Belief propagation (BP) on trees • Key property (Wainwright et al.): upon termination, θp gives the min-marginals for node p: θp(j) = min { E(x | θ) : xp = j } (up to an additive constant). [figure: example min-marginal values]
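The min-marginal property can be checked on a chain, where the two passes of min-sum message passing suffice; a sketch with made-up numbers, verified against brute-force enumeration:

```python
# Min-sum BP on a chain computes min-marginals exactly: after a forward and
# a backward pass, m_f[p][j] + theta_p(j) + m_b[p][j] equals the minimum
# energy over labellings with x_p = j. Hypothetical 3-node chain, made-up
# numbers.
from itertools import product

unary = [[0.0, 2.0], [1.0, 0.0], [0.5, 1.5]]
pair = [[[0.0, 1.0], [1.0, 0.0]],   # edge (0,1): pair[p][x_p][x_{p+1}]
        [[0.0, 1.0], [1.0, 0.0]]]   # edge (1,2)
n = 3

def energy(x):
    return (sum(unary[p][x[p]] for p in range(n))
            + sum(pair[p][x[p]][x[p + 1]] for p in range(n - 1)))

# forward messages: best cost of x_0..x_{p-1} plus the edge into (p, j)
m_f = [[0.0, 0.0] for _ in range(n)]
for p in range(1, n):
    for j in (0, 1):
        m_f[p][j] = min(m_f[p - 1][i] + unary[p - 1][i] + pair[p - 1][i][j]
                        for i in (0, 1))
# backward messages: best cost of x_{p+1}..x_{n-1} plus the edge out of (p, j)
m_b = [[0.0, 0.0] for _ in range(n)]
for p in range(n - 2, -1, -1):
    for j in (0, 1):
        m_b[p][j] = min(m_b[p + 1][i] + unary[p + 1][i] + pair[p][j][i]
                        for i in (0, 1))

for p in range(n):
    for j in (0, 1):
        mm = m_f[p][j] + unary[p][j] + m_b[p][j]
        brute = min(energy(x) for x in product([0, 1], repeat=n) if x[p] == j)
        assert abs(mm - brute) < 1e-12
print("BP min-marginals match brute force")
```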

  45. TRW algorithm of Wainwright et al. with tree-based updates (TRW-T): run BP on all trees, then “average” all nodes. • If it converges, gives a (local) maximum of the lower bound. • Not guaranteed to converge; the lower bound may go down.

  46. Sequential TRW algorithm (TRW-S) [Kolmogorov '05]: pick node p, run BP on all trees containing p, then “average” node p; repeat.

  47. Main property of TRW-S • Theorem: the lower bound never decreases. • Proof sketch: before averaging, the shared node contributes min(0, 4) + min(1, 0) = 0 to the sum of tree minima. [figure]

  48. Main property of TRW-S • Theorem: the lower bound never decreases. • Proof sketch (cont'd): after averaging, it contributes min(0.5, 2) + min(0.5, 2) = 1 ≥ 0, so the bound cannot decrease. [figure]

  49. TRW-S algorithm • Particular order of averaging and BP operations • Lower bound guaranteed not to decrease • There exists a limit point that satisfies the weak tree agreement condition • Efficiency?

  50. Efficient implementation: pick node p, run BP on all trees containing p, then “average” node p. Rerunning full BP passes for every node in this way would be inefficient.
