Fast Parallel and Adaptive Updates

Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago

Umut Acar, MPI-SWS

Alexander Ihler, UC Irvine

Ramgopal Mettu, UMass Amherst


Graphical models

  • Structured (neg) energy function
  • Goal:
  • Examples
    • Stereo depth (stereo image pair → MRF model → depth)
    • Protein design & prediction
    • Weighted constraint satisfaction problems

Pairwise:

[Diagram: the same model over variables A, B, C drawn as a factor graph, a Markov random field, and a Bayesian network]


Dual decomposition methods

  • Decompose graph into smaller subproblems
  • Solve each independently; optimistic bound
  • Exact if all copies agree
  • Enforce lost equality constraints via Lagrange multipliers

Same bound by different names:

  • Dual decomposition (Komodakis et al. 2007)
  • TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
  • Soft arc consistency (Cooper & Schiex 2004)

[Diagram: original graph vs. its decomposition; energy scale showing the relaxed problems bounding the MAP energy on one side and consistent solutions on the other]

Optimizing the bound

Subgradient descent

  • Find each subproblem’s optimal configuration
  • Adjust entries for mismatched solutions
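The two steps above can be sketched on a toy two-edge model; all potentials, names, and the diminishing step size below are invented. The update is subgradient ascent on the dual bound, which is how the "descent" step is typically realized: penalize each copy's chosen state until the copies of the shared variable agree.

```python
# A hedged sketch of subgradient updates for one shared variable.
# Two edge subproblems share x1; each holds its own copy of it.
theta_a = [[0.0, 1.2], [0.7, 0.3]]   # invented energy over (x0, x1)
theta_b = [[0.5, 0.6], [0.1, 0.4]]   # invented energy over (x1, x2)
lam = [0.0, 0.0]                     # Lagrange multipliers on x1's two states

def argmin_copies(lam):
    """Each subproblem minimizes independently; lam reparameterizes the two
    copies of x1 with opposite signs, so the total energy is unchanged."""
    xa = min(((x0, x1) for x0 in (0, 1) for x1 in (0, 1)),
             key=lambda s: theta_a[s[0]][s[1]] + lam[s[1]])
    xb = min(((x1, x2) for x1 in (0, 1) for x2 in (0, 1)),
             key=lambda s: theta_b[s[0]][s[1]] - lam[s[0]])
    return xa[1], xb[0]              # the two copies of x1

for t in range(50):
    ca, cb = argmin_copies(lam)
    if ca == cb:                     # copies agree: the bound is tight here
        break
    step = 0.3 / (t + 1)             # diminishing steps avoid oscillation
    lam[ca] += step                  # penalize subproblem A's choice
    lam[cb] -= step                  # and reward it in subproblem B

print(argmin_copies(lam))            # → (1, 1): both copies settle on x1 = 1
```

With a fixed step size this toy example oscillates between the two states, which is why subgradient schemes use a decaying step.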


Equivalent decompositions

  • Any collection of tree-structured parts is equivalent

  • Two extreme cases

    • Set of all individual edges

    • Single “covering tree” of all edges; variables duplicated

[Diagram: original graph, its decomposition into individual “edges”, and a covering tree]


Speeding up inference

  • Parallel updates

    • Easy to solve subproblems in parallel

      (e.g. Komodakis et al. 2007)

  • Adaptive updates


Some complications…

  • Example: Markov chain

    • Can pass messages in parallel, but…

    • If xn depends on x1, takes O(n) time anyway

    • Slow “convergence rate”

  • Larger problems are more “efficient”

  • Smaller problems are easily parallel & adaptive

  • Similar effects in message passing

    • Residual splash (Gonzalez et al. 2009)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10


Cluster trees

  • Alternative means of parallel computation
    • Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
  • Simple chain model
    • Normally, eliminate variables “in order” (DP)
    • Each calculation depends on all previous results
  • Eliminate variables in an alternative order
    • Eliminate some intermediate (degree-2) nodes
    • Balanced: depth log(n)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

[Diagram: balanced cluster tree of depth log(n) over the chain, formed by eliminating degree-2 nodes]
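One way to see the log-depth claim is a min-sum sketch on an invented binary chain: contracting every other interior (degree-2) variable halves the chain each round, the contractions within a round are independent (hence parallelizable), and the result matches the serial DP.

```python
# A hedged sketch of balanced elimination on a chain (the cluster-tree idea);
# the chain length and potentials are invented, variables are binary.
import itertools, random

random.seed(0)
n = 10
# edges[i] is the pairwise energy between x_i and x_{i+1}
edges = [[[random.random() for _ in (0, 1)] for _ in (0, 1)] for _ in range(n - 1)]

def contract(a, b):
    """Eliminate the shared middle variable of two adjacent edges (min-sum)."""
    return [[min(a[u][m] + b[m][v] for m in (0, 1)) for v in (0, 1)]
            for u in (0, 1)]

def brute_force_min():
    return min(sum(edges[i][x[i]][x[i + 1]] for i in range(n - 1))
               for x in itertools.product((0, 1), repeat=n))

def cluster_min():
    chain, rounds = list(edges), 0
    while len(chain) > 1:
        # Every contraction in this round is independent -> parallelizable.
        nxt = [contract(chain[i], chain[i + 1]) for i in range(0, len(chain) - 1, 2)]
        if len(chain) % 2:           # an odd leftover edge passes through
            nxt.append(chain[-1])
        chain, rounds = nxt, rounds + 1
    return min(chain[0][u][v] for u in (0, 1) for v in (0, 1)), rounds

best, depth = cluster_min()
print(abs(best - brute_force_min()) < 1e-9, depth)  # → True 4 (4 ≈ log2(10) rounds)
```

The serial elimination order needs n − 1 dependent steps; the balanced order reaches the same minimum in ⌈log₂(n − 1)⌉ rounds of independent contractions.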


Adapting to changes

  • 1st pass: update O(log n) cluster functions
  • 2nd pass: mark changed configurations, repeat decoding: O(m log(n/m))

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

n = sequence length; m = # of changes
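The O(log n) first-pass claim can be sketched with the same contraction idea (chain length and potentials invented): store every round's cluster functions, and when one leaf potential changes, recompute only its ancestors.

```python
# A hedged sketch of adaptive updates in a contraction tree: changing one
# edge potential dirties only its O(log n) ancestor clusters, so there is
# no need to rebuild the whole tree.
import random

random.seed(1)
n = 16                                # 15 edges in the chain, binary variables

def rand_edge():
    return [[random.random() for _ in (0, 1)] for _ in (0, 1)]

edges = [rand_edge() for _ in range(n - 1)]

def contract(a, b):
    """Min-sum elimination of the shared middle variable of two edges."""
    return [[min(a[u][m] + b[m][v] for m in (0, 1)) for v in (0, 1)]
            for u in (0, 1)]

def build(leaves):
    """Bottom-up contraction tree; levels[r] holds round r's edge functions."""
    levels = [leaves]
    while len(levels[-1]) > 1:
        cur = levels[-1]
        nxt = [contract(cur[i], cur[i + 1]) for i in range(0, len(cur) - 1, 2)]
        if len(cur) % 2:
            nxt.append(cur[-1])
        levels.append(nxt)
    return levels

levels = build(edges)

def update(pos, new_edge):
    """Change one leaf edge; recompute only its ancestor clusters."""
    levels[0][pos] = new_edge
    work = 0
    for r in range(1, len(levels)):
        pos //= 2                     # index of the affected cluster one level up
        cur = levels[r - 1]
        if 2 * pos + 1 < len(cur):
            levels[r][pos] = contract(cur[2 * pos], cur[2 * pos + 1])
        else:                         # leftover edge passes through unchanged
            levels[r][pos] = cur[2 * pos]
        work += 1
    return work                       # number of clusters touched: O(log n)

work = update(6, rand_edge())
fresh = build(levels[0])              # full rebuild, for comparison only
print(levels[-1] == fresh[-1], work)  # → True 4: same root, log2(16) updates
```

With m independent changes the dirty paths overlap near the root, which is where the O(m log(n/m)) total comes from.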


Experiments

  • Random synthetic problems

    • Random, irregular but “grid-like” connectivity

  • Stereo depth images

    • Superpixel representation

    • Irregular graphs

  • Compare “edges” and “cover-tree”

  • 32-core Intel Xeon, Cilk++ implementation


Synthetic problems

  • Larger problems improve convergence rate
  • Adaptivity helps significantly

[Plots: bound vs. time, annotated “cluster overhead” and “parallelism”]


Synthetic models

  • As a function of problem size


Stereo depth

  • Time to convergence for different problems


Conclusions

  • Fast methods for dual decomposition

    • Parallel computation

    • Adaptive updating

  • Subproblem choice

    • Small problems: highly parallel, easily adaptive

    • Large problems: better convergence rates

  • Cluster trees

    • Alternative form for parallel & adaptive updates

    • Benefits of both large & small subproblems

