Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers


Fast Parallel and Adaptive Updates

for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago

Umut Acar, MPI-SWS

Alexander Ihler, UC Irvine

Ramgopal Mettu, UMass Amherst


Graphical models

  • Structured (neg) energy function: f(x) = Σα fα(xα)

  • Goal: MAP inference, x* = arg maxx f(x)

  • Examples

    • Stereo depth (stereo image pair → MRF model → depth)

    • Protein design & prediction

    • Weighted constraint satisfaction problems

  • Pairwise: f(x) = Σi fi(xi) + Σij fij(xi, xj)

[Figure: the same model over variables A, B, C drawn as a factor graph, a Markov random field, and a Bayesian network]

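To make the pairwise form concrete, here is a minimal sketch (a toy model; all table values are hypothetical, not from the talk) that evaluates the structured energy and recovers the MAP assignment of a three-variable model by brute force:

```python
import itertools

# Toy pairwise model over three binary variables A, B, C.
# All table entries are hypothetical illustration values.
unary = {"A": [0.0, 1.0], "B": [0.5, 0.0], "C": [0.0, 0.2]}
pairwise = {("A", "B"): [[1.0, 0.0], [0.0, 1.0]],   # rewards A == B
            ("B", "C"): [[0.8, 0.0], [0.0, 0.8]]}   # rewards B == C

def f(x):
    """Structured (negative) energy: sum of unary and pairwise terms."""
    val = sum(unary[v][x[v]] for v in unary)
    val += sum(tab[x[u]][x[v]] for (u, v), tab in pairwise.items())
    return val

# MAP inference by brute force -- only feasible for tiny models.
best = max((dict(zip("ABC", vals))
            for vals in itertools.product((0, 1), repeat=3)), key=f)
```

With these numbers the agreement potentials pull all three variables to 1, so `best` is A = B = C = 1 with energy 3.0.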


Dual decomposition methods

  • Decompose graph into smaller subproblems

  • Solve each independently; optimistic bound

  • Exact if all copies agree

  • Enforce lost equality constraints via Lagrange multipliers

  • Same bound by different names

    • Dual decomposition (Komodakis et al. 2007)

    • TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)

    • Soft arc consistency (Cooper & Schiex 2004)

[Figure: original graph vs. its decomposition; the relaxed problems’ energy upper-bounds the MAP energy, and consistent solutions close the gap]

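A toy sketch of the decomposition bound (illustrative numbers, not from the talk): splitting a two-edge chain A - B - C at its shared variable B and solving each edge with its own copy of B gives an optimistic bound, strict here because the copies of B disagree:

```python
import itertools

# Chain A - B - C with one table per edge (hypothetical values):
# theta_AB rewards A == B (preferring 0,0); theta_BC rewards B = C = 1.
theta_AB = [[2.0, 0.0], [0.0, 1.0]]
theta_BC = [[0.0, 0.0], [0.0, 2.0]]

def map_joint():
    # Exact MAP of the original problem: both edges share variable B.
    return max(theta_AB[a][b] + theta_BC[b][c]
               for a, b, c in itertools.product((0, 1), repeat=3))

def map_decomposed():
    # Each edge solved with its own copy of B: an optimistic upper bound.
    return (max(theta_AB[a][b] for a in (0, 1) for b in (0, 1))
            + max(theta_BC[b][c] for b in (0, 1) for c in (0, 1)))
```

Here edge AB prefers B = 0 and edge BC prefers B = 1, so the bound (4.0) sits strictly above the true MAP value (3.0); if the copies agreed, the bound would be exact.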

Optimizing the bound

Subgradient descent

  • Find each subproblem’s optimal configuration

  • Adjust entries for mismatched solutions



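The two steps above can be sketched as follows (a minimal illustration with hypothetical potentials, not the talk's implementation): the Lagrange multipliers reparameterize each subproblem's view of the shared variable, and a subgradient step nudges them whenever the copies disagree:

```python
# Sketch of subgradient descent on a toy two-edge decomposition
# (hypothetical potentials; not the talk's implementation).
theta_AB = [[2.0, 0.0], [0.0, 1.0]]   # edge AB prefers A = B = 0
theta_BC = [[0.0, 0.0], [0.0, 2.0]]   # edge BC prefers B = C = 1
lam = [0.0, 0.0]                       # Lagrange multipliers on shared B

def solve_edges(lam):
    # Each subproblem sees a reparameterized unary term on its copy of B.
    _, b1 = max((theta_AB[a][b] + lam[b], b) for a in (0, 1) for b in (0, 1))
    _, b2 = max((theta_BC[b][c] - lam[b], b) for b in (0, 1) for c in (0, 1))
    return b1, b2

step = 0.5
for _ in range(20):
    b1, b2 = solve_edges(lam)
    if b1 == b2:                       # copies agree: bound is tight, stop
        break
    # Subgradient step: lower b1's score in subproblem 1 (raising it in
    # subproblem 2), and vice versa for b2.
    lam[b1] -= step
    lam[b2] += step
```

With these numbers the copies agree after a single multiplier update (ties break toward the larger label via tuple comparison), ending with both copies at B = 1.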

Equivalent decompositions

  • Any collection of tree-structured parts is equivalent

  • Two extreme cases

    • Set of all individual edges

    • Single “covering tree” of all edges; variables duplicated

[Figure: the original graph, its decomposition into individual “edges”, and a covering tree]

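A small sanity check of the covering-tree idea (names hypothetical, e.g. the duplicate "A2"): duplicating one variable turns a 3-cycle into a tree, which we can verify by counting edges and checking connectivity:

```python
from collections import defaultdict

# The 3-cycle A-B-C-A is not a tree, but duplicating one variable
# (the name "A2" is hypothetical) yields a covering tree over the same edges.
covering_tree = [("A", "B"), ("B", "C"), ("C", "A2")]

adj = defaultdict(list)
for u, v in covering_tree:
    adj[u].append(v)
    adj[v].append(u)

# A connected graph with |V| - 1 edges is a tree: check both properties.
nodes = set(adj)
seen, stack = set(), ["A"]
while stack:
    node = stack.pop()
    if node not in seen:
        seen.add(node)
        stack.extend(adj[node])
assert seen == nodes and len(covering_tree) == len(nodes) - 1
```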

Speeding up inference

  • Parallel updates

    • Easy to solve subproblems in parallel

      (e.g. Komodakis et al. 2007)

  • Adaptive updates


Some complications…

  • Example: Markov chain

    • Can pass messages in parallel, but…

    • If xn depends on x1, takes O(n) time anyway

    • Slow “convergence rate”

  • Larger problems are more “efficient”

  • Smaller problems are easily parallel & adaptive

  • Similar effects in message passing

    • Residual Splash (Gonzalez et al. 2009)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
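The O(n) critical path shows up directly in a minimal max-product sketch (hypothetical agreement potentials): each message needs the previous one, so the loop below cannot be parallelized across iterations:

```python
# Max-product on a 10-variable chain, eliminating "in order" (DP).
# The message into x_{i+1} needs the message into x_i first, so the
# critical path is O(n) even with unlimited cores. Potentials hypothetical.
n = 10
pair = [[1.0, 0.0], [0.0, 1.0]]        # same agreement potential per edge

m = [0.0, 0.0]                          # message into x1 (no prior)
for _ in range(n - 1):                  # strictly sequential dependencies
    m = [max(pair[a][b] + m[a] for a in (0, 1)) for b in (0, 1)]

map_value = max(m)                      # 9 agreeing edges, 1.0 each -> 9.0
```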


Cluster trees

  • Alternative means of parallel computation

    • Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)

  • Simple chain model

    • Normally, eliminate variables “in order” (DP)

    • Each calculation depends on all previous results

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10






Cluster trees

  • Alternative means of parallel computation

  • Eliminate variables in an alternative order

    • Eliminate some intermediate (degree-2) nodes

    • Balanced: depth log(n)

[Figure: balanced cluster tree over the chain — root x10; then x5; then x2, x6; then x3, x8; leaves x1, x4, x7, x9]

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

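A sketch of why balanced elimination gives logarithmic depth (a simplification of the cluster-tree construction, not the authors' code): each round removes alternate degree-2 nodes in parallel, halving the chain, so the number of rounds is O(log n):

```python
import math

# Simplified cluster-tree construction on a chain (not the authors' code):
# each round eliminates alternate degree-2 nodes in parallel, halving the
# chain, so the number of rounds -- the tree depth -- is O(log n).
def cluster_tree_depth(n):
    chain = list(range(1, n + 1))
    depth = 0
    while len(chain) > 1:
        chain = chain[::2]    # survivors; eliminated nodes become clusters
        depth += 1
    return depth

assert cluster_tree_depth(10) == math.ceil(math.log2(10))   # depth 4
```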



Adapting to changes

  • 1st pass: update O(log n) cluster functions

  • 2nd pass: mark changed configurations, repeat decoding: O(m log(n/m))

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

n = sequence length; m = # of changes



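The O(log n)-per-change cost can be sketched as a root-to-leaf path in a balanced binary cluster tree (an illustrative model, not the authors' implementation): only the clusters whose span contains the changed factor need recomputation:

```python
import math

# After one factor changes, only the clusters on the leaf-to-root path of a
# balanced binary cluster tree need recomputation: O(log n) of them.
# (Illustrative model, not the authors' implementation.)
def clusters_to_update(n, changed_leaf):
    path = []
    lo, hi = 0, n                      # cluster covering leaves [lo, hi)
    while hi - lo > 1:
        path.append((lo, hi))
        mid = (lo + hi) // 2
        lo, hi = (lo, mid) if changed_leaf < mid else (mid, hi)
    return path

assert len(clusters_to_update(1024, changed_leaf=3)) == int(math.log2(1024))
```

With m simultaneous changes the marked paths overlap near the root, which is where the O(m log(n/m)) total comes from.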

Experiments

  • Random synthetic problems

    • Random, irregular but “grid-like” connectivity

  • Stereo depth images

    • Superpixel representation

    • Irregular graphs

  • Compare “edges” and “covering tree”

  • 32-core Intel Xeon, Cilk++ implementation


Synthetic problems

  • Larger problems improve convergence rate

  • Adaptivity helps significantly

  • Cluster overhead

  • Parallelism

[Figure: convergence plots annotated with the observations above]


Synthetic models

  • As a function of problem size





Stereo depth

  • Time to convergence for different problems


Conclusions

  • Fast methods for dual decomposition

    • Parallel computation

    • Adaptive updating

  • Subproblem choice

    • Small problems: highly parallel, easily adaptive

    • Large problems: better convergence rates

  • Cluster trees

    • Alternative form for parallel & adaptive updates

    • Benefits of both large & small subproblems

