Tractable Higher Order Models in Computer Vision (Part II)

Presentation Transcript

Presented by Xiaodan Liang

Slides from Carsten Rother, Sebastian Nowozin, Pushmeet Kohli

Microsoft Research Cambridge


Part II

  • Submodularity

  • Move making algorithms

  • Higher-order model: Pn Potts model



Factoring distributions

Problem inherently combinatorial!



Key property: Diminishing returns

[Figure: Naïve Bayes model with class variable Y = "Sick" and features X1 = "Fever", X2 = "Rash", X3 = "Male".]

Selection A = {}: adding X1 will help a lot!

Selection B = {X2, X3}: adding X1 doesn't help much.

Theorem [Krause, Guestrin UAI ‘05]: Information gain F(A) in Naïve Bayes models is submodular!

[Figure: adding a new feature (element s) to the small set A gives a large improvement; adding it to the larger set B ⊇ A gives only a small improvement.]

Submodularity: F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) for all A ⊆ B, s ∉ B


Why is submodularity useful?

Theorem [Nemhauser et al. '78]: the greedy maximization algorithm returns A_greedy with

F(A_greedy) ≥ (1 − 1/e) max_{|A| ≤ k} F(A)    (1 − 1/e ≈ 63%)

  • Greedy algorithm gives near-optimal solution!

  • For info-gain: Guarantees best possible unless P = NP! [Krause, Guestrin UAI ’05]
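As an illustration of the greedy algorithm behind this bound (a minimal sketch, not taken from the slides; the coverage objective stands in for the information-gain objective):

```python
# Sketch (not from the slides) of the greedy algorithm behind the Nemhauser et al. '78
# bound: pick k elements, each time adding the element with the largest marginal gain
# F(A + {s}) - F(A).

def greedy_maximize(F, ground_set, k):
    """Greedily build a set A of size k for a monotone submodular F."""
    A = set()
    for _ in range(k):
        # Marginal gain of each remaining candidate element.
        gains = {s: F(A | {s}) - F(A) for s in ground_set - A}
        best = max(gains, key=gains.get)
        A.add(best)
    return A

# Toy example: a coverage function (monotone and submodular).
# F(A) = number of items covered by the sets chosen in A.
coverage = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"c", "d", "e"}, 4: {"a"}}
F = lambda A: len(set().union(*(coverage[s] for s in A))) if A else 0

print(greedy_maximize(F, set(coverage), k=2))  # e.g. {3, 1}
```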


Submodularity in Machine Learning

  • Many ML problems are submodular, i.e., they require optimizing a submodular F:

  • Minimization: A* = argmin F(A)

    • Structure learning (A* = argmin I(X_A; X_{V\A}))

    • Clustering

    • MAP inference in Markov Random Fields

  • Maximization: A* = argmax F(A)

    • Feature selection

    • Active learning

    • Ranking



Submodular set functions

  • Set function F on V is called submodular if

    F(A ∪ B) + F(A ∩ B) ≤ F(A) + F(B) for all A, B ⊆ V

  • Equivalent diminishing returns characterization:

    F(A ∪ {s}) − F(A) ≥ F(B ∪ {s}) − F(B) for all A ⊆ B ⊆ V, s ∉ B

[Figure: Venn diagram of A ∪ B and A ∩ B; adding element S to the smaller set A gives a large improvement, adding it to the larger set B gives a small improvement.]
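To make the two definitions above concrete, here is a small brute-force check on a tiny ground set (an added illustration, not from the slides; the coverage function and the |A|² example are assumed):

```python
from itertools import combinations

def subsets(V):
    """All subsets of the ground set V (exponential -- fine for tiny examples)."""
    V = list(V)
    for r in range(len(V) + 1):
        for c in combinations(V, r):
            yield frozenset(c)

def is_submodular(F, V):
    """Brute-force check of F(A ∪ B) + F(A ∩ B) <= F(A) + F(B) for all A, B."""
    S = list(subsets(V))
    return all(F(A | B) + F(A & B) <= F(A) + F(B) + 1e-12 for A in S for B in S)

# Toy examples: coverage functions are submodular, |A|^2 is not.
cover = {1: {"x"}, 2: {"x", "y"}, 3: {"z"}}
F_cov = lambda A: len(set().union(*(cover[i] for i in A))) if A else 0
F_sq = lambda A: len(A) ** 2

V = {1, 2, 3}
print(is_submodular(F_cov, V))  # True
print(is_submodular(F_sq, V))   # False (it is supermodular instead)
```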


Submodularity and supermodularity

A set function F is supermodular if −F is submodular, and modular (additive) if it is both submodular and supermodular.



Closedness properties

F_1,…,F_m submodular functions on V and λ_1,…,λ_m > 0

Then: F(A) = ∑_i λ_i F_i(A) is submodular!

Submodularity closed under nonnegative linear combinations!

Extremely useful fact!!

  • F(A) submodular ) P() F(A) submodular!

  • Multicriterion optimization: F1,…,Fm submodular, i¸0 )i i Fi(A) submodular



Maximum of submodular functions

Suppose F1(A) and F2(A) are submodular. Is F(A) = max(F1(A), F2(A)) submodular?

[Figure: F1(A), F2(A) and their upper envelope F(A) = max(F1(A), F2(A)) plotted against |A|.]

max(F1,F2) not submodular in general!


Minimum of submodular functions

Well, maybe F(A) = min(F1(A),F2(A)) instead?

For example, take F1(A) = min(|A ∩ {a}|, 1) and F2(A) = min(|A ∩ {b}|, 1), both submodular, and F = min(F1, F2). Then

F({b}) − F(∅) = 0  <  F({a,b}) − F({a}) = 1,

so diminishing returns fails.

min(F1,F2) not submodular in general!

But stay tuned



The submodular polyhedron P_F

P_F = {x ∈ R^V : x(A) ≤ F(A) for all A ⊆ V}, where x(A) = ∑_{s ∈ A} x_s

Example: V = {a, b}. P_F is the region of the (x({a}), x({b})) plane defined by

x({a}) ≤ F({a}),  x({b}) ≤ F({b}),  x({a,b}) ≤ F({a,b})

[Figure: P_F drawn in the (x({a}), x({b})) plane.]
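A small added sketch (not from the slides) that tests membership in P_F by checking x(A) ≤ F(A) for every subset A; the F values are chosen to be consistent with the two-element example here, with F(∅) = 0 and F({a}) = −1 assumed:

```python
from itertools import combinations

def in_submodular_polyhedron(x, F, V):
    """Check x(A) <= F(A) for every A ⊆ V, i.e. x ∈ P_F (exponential check)."""
    V = list(V)
    for r in range(len(V) + 1):
        for A in combinations(V, r):
            if sum(x[e] for e in A) > F(frozenset(A)) + 1e-12:
                return False
    return True

# Assumed F values consistent with the 2-element example above.
F_table = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}
F = lambda A: F_table[A]

print(in_submodular_polyhedron({"a": -2, "b": 2}, F, "ab"))  # True: a vertex of P_F
print(in_submodular_polyhedron({"a": 1, "b": 1}, F, "ab"))   # False: violates x({a}) <= -1
```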


Lovász extension


Example: Lovász extension

g(w) = max {wᵀx : x ∈ P_F}

[Figure: P_F for V = {a, b}, with vertex [−2, 2] corresponding to {b} and vertex [−1, 1] corresponding to {a, b}.]

w = [0, 1]; want g(w)

Greedy ordering: e1 = b, e2 = a, since w(e1) = 1 > w(e2) = 0

x_w(e1) = F({b}) − F(∅) = 2
x_w(e2) = F({b,a}) − F({b}) = −2
⟹ x_w = [−2, 2]

g([0,1]) = [0,1]ᵀ [−2, 2] = 2 = F({b})
g([1,1]) = [1,1]ᵀ [−1, 1] = 0 = F({a,b})


Why is this useful?

Theorem [Lovász '83]: g(w) attains its minimum in [0,1]^n at a corner!

If we can minimize g on [0,1]^n, we can minimize F… (at corners, g and F take the same values)

F(A) submodular ⟹ g(w) convex (and efficient to evaluate)

Does the converse also hold, i.e., does having a convex extension imply that F is submodular?

No, consider g(w1, w2, w3) = max(w1, w2 + w3): it is convex, but the set function F it induces at the corners (w1 ↔ {a}, w2 ↔ {b}, w3 ↔ {c}) violates diminishing returns:

F({a,b}) − F({a}) = 0 < F({a,b,c}) − F({a,c}) = 1
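Returning to the worked example two slides back, the greedy evaluation of g(w) can be written in a few lines. This is an added sketch, not code from the lecture; the F table mirrors the example, with F({a}) = −1 assumed:

```python
def lovasz_extension(F, w, V):
    """Evaluate g(w) = max{w·x : x in P_F} by Edmonds' greedy rule: sort elements by
    decreasing w, then x_w(e_i) = F({e_1..e_i}) - F({e_1..e_{i-1}})."""
    order = sorted(V, key=lambda e: -w[e])       # e_1, e_2, ... with w(e_1) >= w(e_2) >= ...
    prefix, prev, x = set(), F(frozenset()), {}
    for e in order:
        prefix.add(e)
        val = F(frozenset(prefix))
        x[e] = val - prev                        # marginal gain along the greedy ordering
        prev = val
    g = sum(w[e] * x[e] for e in V)
    return g, x

# The 2-element example from the slides (F({a}) = -1 assumed).
F_table = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}
F = lambda A: F_table[A]

print(lovasz_extension(F, {"a": 0, "b": 1}, "ab"))  # (2, {'b': 2, 'a': -2}) -> g = F({b})
print(lovasz_extension(F, {"a": 1, "b": 1}, "ab"))  # g = 0 = F({a,b}) (ties broken arbitrarily)
```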


Minimizing a submodular function

  • Ellipsoid algorithm

  • Interior-point algorithm
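The sketch below is neither the ellipsoid nor the interior-point method; it is an added illustration of the preceding idea, i.e. minimizing the Lovász extension over [0,1]^n by projected subgradient descent (the greedy vector is a subgradient of g at w) and rounding to the best thresholded set:

```python
def greedy_vector(F, w, V):
    """Edmonds' greedy vector: a subgradient of the Lovász extension at w."""
    order = sorted(V, key=lambda e: -w[e])
    prefix, prev, x = set(), F(frozenset()), {}
    for e in order:
        prefix.add(e)
        x[e] = F(frozenset(prefix)) - prev
        prev = F(frozenset(prefix))
    return x

def minimize_submodular(F, V, iters=200, step=0.5):
    """Illustrative sketch (assumes F is submodular): projected subgradient descent on the
    Lovász extension over [0,1]^n, keeping the best thresholded set seen so far."""
    w = {e: 0.5 for e in V}
    best_A, best_val = frozenset(), F(frozenset())
    for t in range(1, iters + 1):
        x = greedy_vector(F, w, V)
        w = {e: min(1.0, max(0.0, w[e] - step / t ** 0.5 * x[e])) for e in V}
        for theta in sorted(set(w.values())):     # round: try every threshold of w
            A = frozenset(e for e in V if w[e] >= theta)
            if F(A) < best_val:
                best_A, best_val = A, F(A)
    return best_A, best_val

# Using the 2-element example F from the Lovász-extension slide (F({a}) = -1 assumed):
F_table = {frozenset(): 0, frozenset("a"): -1, frozenset("b"): 2, frozenset("ab"): 0}
F = lambda A: F_table[A]
print(minimize_submodular(F, "ab"))  # (frozenset({'a'}), -1)
```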


Example: Image denoising


Example: Image denoising

[Figure: 3×3 pairwise MRF grid: observed noisy pixels X1,…,X9, each connected to its latent "true" pixel Y1,…,Y9; neighbouring Yi are connected to each other.]

Pairwise Markov Random Field

P(x_1,…,x_n, y_1,…,y_n) = ∏_{i,j} ψ_{i,j}(y_i, y_j) ∏_i φ_i(x_i, y_i)

Want argmax_y P(y | x) = argmax_y log P(x, y) = argmin_y ∑_{i,j} E_{i,j}(y_i, y_j) + ∑_i E_i(y_i)

E_{i,j}(y_i, y_j) = −log ψ_{i,j}(y_i, y_j)

Xi: noisy pixels

Yi: “true” pixels

When is this MAP inference efficiently solvable (in high-treewidth graphical models)?


MAP inference in Markov Random Fields [Kolmogorov et al., PAMI '04; see also Hammer, Ops. Res. '65]

For binary labels, the energy can be minimized exactly via graph cuts (st-mincut) whenever every pairwise term is submodular: E_{i,j}(0,0) + E_{i,j}(1,1) ≤ E_{i,j}(0,1) + E_{i,j}(1,0).
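A minimal added sketch (not from the slides) of a binary pairwise energy together with this per-edge submodularity test; all numeric values are illustrative assumptions:

```python
def energy(y, unary, pairwise, edges):
    """E(y) = sum_i E_i(y_i) + sum_(i,j) E_ij(y_i, y_j) for binary labels y_i in {0, 1}."""
    E = sum(unary[i][y[i]] for i in range(len(y)))
    E += sum(pairwise[e][y[e[0]]][y[e[1]]] for e in edges)
    return E

def is_graph_cut_solvable(pairwise, edges):
    """Binary MAP inference reduces to an st-mincut iff every pairwise term is submodular:
    E_ij(0,0) + E_ij(1,1) <= E_ij(0,1) + E_ij(1,0)."""
    return all(
        pairwise[e][0][0] + pairwise[e][1][1] <= pairwise[e][0][1] + pairwise[e][1][0]
        for e in edges
    )

# Toy denoising-style chain of 3 pixels (all numbers are illustrative): the unary terms
# pull each y_i towards its observed pixel x_i, the pairwise terms penalise label changes.
edges = [(0, 1), (1, 2)]
unary = [[0.0, 2.0], [1.5, 0.0], [0.0, 1.0]]             # E_i(y_i) = -log φ_i(x_i, y_i)
pairwise = {e: [[0.0, 1.0], [1.0, 0.0]] for e in edges}  # E_ij = |y_i - y_j| (Ising-like)

print(is_graph_cut_solvable(pairwise, edges))     # True: attractive pairwise terms
print(energy([0, 0, 0], unary, pairwise, edges))  # 1.5
```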



Part II

  • Submodularity

  • Move making algorithms

  • Higher-order model: Pn Potts model



Move making

Expansion moves and swap moves for this problem:

  • If the pairwise potential functions define a metric, then the energy function in equation (8) can be approximately minimized using α-expansions.

  • If the pairwise potential functions define a semi-metric, it can be approximately minimized using αβ-swaps.
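These conditions can be checked mechanically. The sketch below (added here, not from the slides) brute-forces the semi-metric and metric conditions for a label-cost matrix; the truncated-linear and squared-difference potentials are illustrative assumptions:

```python
def is_semi_metric(theta):
    """theta[a][b] = pairwise cost of assigning labels a and b to neighbouring pixels.
    Semi-metric (enough for αβ-swap): symmetric, non-negative, zero exactly when a == b."""
    L = range(len(theta))
    return all(
        theta[a][b] == theta[b][a] and theta[a][b] >= 0 and (theta[a][b] == 0) == (a == b)
        for a in L for b in L
    )

def is_metric(theta):
    """Metric (needed for α-expansion): semi-metric plus the triangle inequality."""
    L = range(len(theta))
    return is_semi_metric(theta) and all(
        theta[a][b] <= theta[a][c] + theta[c][b] for a in L for b in L for c in L
    )

# Illustrative potentials over 4 labels: truncated linear min(|a-b|, 2) is a metric,
# while the squared difference (a-b)^2 is only a semi-metric (triangle inequality fails).
labels = range(4)
trunc_linear = [[min(abs(a - b), 2) for b in labels] for a in labels]
squared = [[(a - b) ** 2 for b in labels] for a in labels]

print(is_metric(trunc_linear), is_semi_metric(trunc_linear))  # True True
print(is_metric(squared), is_semi_metric(squared))            # False True
```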


Move Energy

  • Each move:

  • A transformation function:

  • The energy of a move t:

  • The optimal move:

    Submodular set functions play an important role in energy minimization as they can be minimized in polynomial time
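The transformation function and move energy listed above can be sketched as follows. This is an added illustration using the usual α-expansion / αβ-swap definitions with one possible 0/1 convention for the move variables (conventions differ between papers), not the exact formulation from the slides:

```python
def expansion_transform(x, t, alpha):
    """α-expansion: each pixel either keeps its label or switches to α.
    Convention assumed here: t_i = 1 means "take α"."""
    return [alpha if ti == 1 else xi for xi, ti in zip(x, t)]

def swap_transform(x, t, alpha, beta):
    """αβ-swap: only pixels currently labelled α or β may move, choosing between them."""
    return [
        (alpha if ti == 0 else beta) if xi in (alpha, beta) else xi
        for xi, ti in zip(x, t)
    ]

def move_energy(E, x, t, transform, *labels):
    """Energy of the move t: E_m(t) = E(T(x, t)). The optimal move is argmin_t E_m(t),
    which is tractable when E_m is a submodular function of the binary vector t."""
    return E(transform(x, t, *labels))

# Tiny usage example with a hypothetical energy E (a Potts smoothness term on a chain):
E = lambda x: sum(int(x[i] != x[i + 1]) for i in range(len(x) - 1))
x = [1, 2, 3, 2]
print(expansion_transform(x, [0, 1, 1, 0], alpha=2))            # [1, 2, 2, 2]
print(move_energy(E, x, [0, 1, 1, 0], expansion_transform, 2))  # 1
```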




Higher order potential

  • The class of higher order clique potentials for which the expansion and swap moves can be computed in polynomial time.

    The clique potentials take the form:


Can my higher order potential be solved using α-expansions?


Moves for Higher Order Potentials

  • Form of the Higher Order Potentials

A clique inconsistency function f(·) is combined with a pairwise potential φ(x_i, x_j) over the variables of a clique c, either in

Sum Form: ψ_c(x_c) = f( ∑_{i,j ∈ c} φ(x_i, x_j) ), or

Max Form: ψ_c(x_c) = f( max_{i,j ∈ c} φ(x_i, x_j) )

[Figure: clique c over variables x_i, x_j, x_k, x_l, x_m.]


Theoretical Results: Swap

  • The swap move energy is always submodular if f(·) is non-decreasing and concave.

See paper for proofs.


Condition for Swap move

Concave function: f(λ·x1 + (1 − λ)·x2) ≥ λ·f(x1) + (1 − λ)·f(x2) for all λ ∈ [0, 1]


Proof

  • All projections onto two variables of any αβ-swap move energy are submodular.

  • The cost of any configuration


Substitute:

Constraint 1:

Lemma 1:

Constraint 2:

Hence the theorem holds.





Part II

  • Submodularity

  • Move making algorithms

  • Higher-order model: Pn Potts model


Image Segmentation

n = number of pixels, E: {0,1}^n → R, 0 → fg, 1 → bg

E(X) = ∑_i c_i x_i + ∑_{i,j} d_ij |x_i − x_j|

[Figure: input image, unary cost, and resulting segmentation.]

[Boykov and Jolly '01] [Blake et al. '04] [Rother et al. '04]
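As an added illustration (not from the slides), the following sketch minimizes this binary segmentation energy exactly with an st-mincut via networkx, assuming all d_ij ≥ 0; the graph-construction convention and the toy costs are assumptions:

```python
import networkx as nx

def segment_by_mincut(c, d_edges):
    """Minimize E(x) = sum_i c_i*x_i + sum_(i,j) d_ij*|x_i - x_j| over x in {0,1}^n
    (requires d_ij >= 0). A node i on the sink side of the cut means x_i = 1."""
    n = len(c)
    G = nx.DiGraph()
    G.add_nodes_from(["s", "t"] + list(range(n)))
    const = 0.0
    for i, ci in enumerate(c):
        if ci >= 0:
            G.add_edge("s", i, capacity=ci)    # edge cut (and c_i paid) exactly when x_i = 1
        else:
            G.add_edge(i, "t", capacity=-ci)   # rewrite c_i*x_i = c_i + (-c_i)*(1 - x_i)
            const += ci
    for (i, j), dij in d_edges.items():
        G.add_edge(i, j, capacity=dij)         # cut iff x_i = 0 and x_j = 1
        G.add_edge(j, i, capacity=dij)         # cut iff x_j = 0 and x_i = 1
    cut_value, (source_side, sink_side) = nx.minimum_cut(G, "s", "t")
    x = [1 if i in sink_side else 0 for i in range(n)]
    return x, cut_value + const

# Illustrative 1-D "image" of 4 pixels in a chain; negative c_i prefers label 1.
c = [-3, 2, -1, 4]
d_edges = {(0, 1): 1, (1, 2): 1, (2, 3): 1}
print(segment_by_mincut(c, d_edges))  # ([1, 0, 0, 0], -2.0)
```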


Pn Potts Potentials

Patch Dictionary (Tree)

h(X_p) = { 0       if x_i = 0 for all i ∈ p
         { C_max   otherwise

[Figure: image with an overlaid patch p; the patch costs 0 when consistent and C_max otherwise.]

  • [slide credits: Kohli]


Pn Potts Potentials

n = number of pixels, E: {0,1}^n → R, 0 → fg, 1 → bg

E(X) = ∑_i c_i x_i + ∑_{i,j} d_ij |x_i − x_j| + ∑_p h_p(X_p)

h(X_p) = { 0       if x_i = 0 for all i ∈ p
         { C_max   otherwise

  • [slide credits: Kohli]
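For illustration (added, not from the slides), a direct evaluation of this energy with the patch term h(X_p); the general Pn Potts potential is noted in a comment, and all numeric values are assumed:

```python
def patch_potential(x_patch, C_max):
    """Higher-order term from the slide: 0 if every pixel in the patch takes label 0
    (matches the foreground patch), C_max otherwise. (The general P^n Potts potential
    instead pays gamma_k when all variables agree on label k, gamma_max otherwise.)"""
    return 0.0 if all(xi == 0 for xi in x_patch) else C_max

def energy(x, c, d_edges, patches, C_max):
    """E(X) = sum_i c_i x_i + sum_(i,j) d_ij |x_i - x_j| + sum_p h_p(X_p)."""
    E = sum(ci * xi for ci, xi in zip(c, x))
    E += sum(dij * abs(x[i] - x[j]) for (i, j), dij in d_edges.items())
    E += sum(patch_potential([x[i] for i in p], C_max) for p in patches)
    return E

# Toy example: 4 pixels in a chain, one 2-pixel patch {2, 3}; labels 0 = fg, 1 = bg.
c, d_edges, patches = [-1, -1, 2, 2], {(0, 1): 1, (1, 2): 1, (2, 3): 1}, [(2, 3)]
print(energy([0, 0, 0, 0], c, d_edges, patches, C_max=3.0))  # 0.0 (patch satisfied)
print(energy([0, 0, 1, 1], c, d_edges, patches, C_max=3.0))  # 8.0 (unary 4 + smoothness 1 + C_max 3)
```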


Theoretical Results: Expansion

  • The expansion move energy is always submodular if f(·) is increasing and linear.

See paper for proofs


PN Potts Model

[Figure: a clique c of variables. When all variables in c take the same label the cost is γ; when they take different labels the cost is γ_max.]


Optimal moves for PN Potts

  • Computing the optimal swap move

Case 1: not all variables in the clique are assigned label 1 (a) or label 2 (b).

[Figure: clique c with variables taking labels 1(a), 2(b), 3 and 4.]

The move energy is independent of t_c and can be ignored.


Optimal moves for PN Potts

  • Computing the optimal swap move

Case 2: all variables in the clique are assigned label 1 (a) or label 2 (b).

[Figure: clique c with every variable taking label 1(a) or 2(b).]


Optimal moves for PN Potts

  • Computing the optimal swap move

Case 2: all variables in the clique are assigned label 1 (a) or label 2 (b).

[Figure: clique c with every variable taking label 1(a) or 2(b).]

This move energy can be minimized by solving an st-mincut problem.


Solving the Move Energy

Add a constant

Adding a constant K to all possible values of the clique potential does not change the optimal move: this transformation does not affect the solution.


Solving the Move Energy

  • Computing the optimal swap move

[Figure: st-mincut graph with terminals Source and Sink, clique variables v_1, v_2, …, v_n, and auxiliary nodes M_s and M_t.]

v_i ∈ Source set ⟹ t_i = 0;  v_j ∈ Sink set ⟹ t_j = 1


Solving the Move Energy

  • Computing the optimal swap move

Case 1: all x_i = a (every v_i in the Source set)

[Figure: the same st-mincut graph with all v_i on the source side.]

Cost:


Solving the Move Energy

  • Computing the optimal swap move

Case 2: all x_i = b (every v_i in the Sink set)

[Figure: the same st-mincut graph with all v_i on the sink side.]

Cost:


Solving the Move Energy

  • Computing the optimal swap move

Case 3: the x_i take both labels a and b (the v_i are split between the Source and Sink sets)

[Figure: the same st-mincut graph with the v_i split across the cut.]

Cost:

Recall that the cost of an st-mincut is the sum of the weights of the edges in the cut that go from the source set to the sink set.

Optimal moves for PN Potts

  • The expansion move energy

  • Similar graph construction.


Experimental Results

  • Texture Segmentation

[Figure: texture segmentation. Energy terms: Unary (Colour), Pairwise (Smoothness), Higher Order (Texture). Results shown for the original image, the pairwise model, and the higher-order model.]


Experimental Results

[Figure: original image with pairwise and higher-order segmentations. Runtimes: Swap 3.2 s / 4.2 s, Expansion 2.5 s / 3.0 s.]


Experimental Results

[Figure: original image with pairwise and higher-order segmentations. Runtimes: Swap 4.7 s / 5.0 s, Expansion 3.7 s / 4.4 s.]


