
Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Presentation Transcript


Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago

Umut Acar, MPI-SWS

Alexander Ihler, UC Irvine

Ramgopal Mettu, UMass Amherst


[Figure: stereo image pair, the corresponding MRF model, and the resulting depth map]


Graphical models

- Structured (negative) energy function: f(x) = Σ_α f_α(x_α)
- Goal: find the minimum-energy (MAP) configuration x* = argmin_x f(x)
- Examples
  - Stereo depth
  - Protein design & prediction
  - Weighted constraint satisfaction problems

Pairwise case: f(x) = Σ_i f_i(x_i) + Σ_ij f_ij(x_i, x_j)

[Figure: the same three-variable model over A, B, C drawn as a factor graph, a Markov random field, and a Bayesian network]
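The pairwise energy above can be made concrete with a toy model. All states, cost tables, and the edge set below are illustrative choices, not taken from the talk:

```python
import itertools
import numpy as np

# Toy pairwise model over binary variables A, B, C (illustrative numbers):
# f(x) = sum_i f_i(x_i) + sum_ij f_ij(x_i, x_j)
unary = {v: np.array([0.0, 1.0]) for v in "ABC"}
pair = {
    ("A", "B"): np.array([[0.0, 2.0], [2.0, 0.0]]),  # agreement preferred
    ("B", "C"): np.array([[0.0, 2.0], [2.0, 0.0]]),
    ("A", "C"): np.array([[1.0, 0.0], [0.0, 1.0]]),  # disagreement preferred
}

def energy(x):
    """Total energy of a configuration x, mapping variable name -> state."""
    e = sum(unary[v][x[v]] for v in "ABC")
    e += sum(pair[u, v][x[u], x[v]] for (u, v) in pair)
    return float(e)

# Brute-force MAP (the argmin goal above); exact only for tiny models.
best = min((dict(zip("ABC", s)) for s in itertools.product([0, 1], repeat=3)),
           key=energy)
```

Brute force is exponential in the number of variables, which is why structured solvers such as dual decomposition are needed for the real problems listed above.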


Dual decomposition methods

- Decompose the graph into smaller subproblems
- Solve each independently; this gives an optimistic bound
- Exact if all variable copies agree
- Enforce the lost equality constraints via Lagrange multipliers

[Figure: original graph and its decomposition into subproblems]

Dual decomposition methods

Same bound by different names:

- Dual decomposition (Komodakis et al. 2007)
- TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
- Soft arc consistency (Cooper & Schiex 2004)

Optimizing the bound

Subgradient descent:

- Find each subproblem's optimal configuration
- Adjust the corresponding cost entries where the solutions disagree

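These two steps can be sketched on a minimal example: a single variable shared by two subproblems of a decomposition, with Lagrange multipliers shifting cost between the two copies until their independent minimizers agree. The cost tables and step-size schedule are illustrative, not from the talk:

```python
import numpy as np

theta_a = np.array([0.0, 2.0, 1.0])  # subproblem A's costs for the shared variable
theta_b = np.array([1.5, 0.0, 2.0])  # subproblem B's costs for its copy
lam = np.zeros(3)                    # Lagrange multipliers on the equality constraint

for t in range(100):
    xa = int(np.argmin(theta_a + lam))  # solve subproblem A independently
    xb = int(np.argmin(theta_b - lam))  # solve subproblem B independently
    if xa == xb:                        # copies agree: decomposition is exact here
        break
    # Dual subgradient is indicator(xa) - indicator(xb); ascend the dual:
    # raise A's cost at xa and B's cost at xb, pushing the copies together.
    step = 1.0 / (t + 1)
    lam[xa] += step
    lam[xb] -= step

# Sum of independent minima: a lower bound on the true minimum energy.
bound = float(np.min(theta_a + lam) + np.min(theta_b - lam))
```

The diminishing step size 1/(t+1) is a standard choice that guarantees convergence of the subgradient method; on this toy instance the copies agree after a single update and the bound is tight.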

Equivalent decompositions

- Any collection of tree-structured parts gives the same bound
- Two extreme cases:
  - The set of all individual edges
  - A single "covering tree" of all edges, with variables duplicated

[Figure: original graph, its "edges" decomposition, and a covering tree]

Speeding up inference

- Parallel updates
  - Subproblems are easy to solve in parallel (e.g. Komodakis et al. 2007)
- Adaptive updates

Some complications…

- Example: a Markov chain
  - Messages can be passed in parallel, but…
  - If xn depends on x1, it still takes O(n) time
  - Slow "convergence rate"
- Larger subproblems are more "efficient"
- Smaller subproblems are easily parallelized & adaptive
- Similar effects in message passing
  - Residual splash (Gonzalez et al. 2009)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

Cluster trees

- An alternative means of parallel computation
  - Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)
- Simple chain model
  - Normally, variables are eliminated "in order" (dynamic programming)
  - Each calculation depends on all previous results

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10



Cluster trees

- Eliminate variables in an alternative order
  - Eliminate some intermediate (degree-2) nodes first
- A balanced elimination order gives a tree of depth O(log n)

[Figure: balanced cluster tree over the chain, with leaves x1, x4, x7, x9 and root x10]

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10
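A sketch of this idea under simplifying assumptions: each chain edge is reduced to a pairwise cost matrix (unaries folded in), eliminating an interior variable is a min-plus matrix product, and combining edges pairwise yields O(log n) rounds, each of which could run in parallel:

```python
import numpy as np

def minplus(A, B):
    """Min-plus product: eliminate the variable shared by edges A and B."""
    return np.min(A[:, :, None] + B[None, :, :], axis=1)

def chain_min_balanced(mats):
    """Minimum chain energy, combining edge matrices pairwise.

    Each while-iteration is one round; rounds are independent across
    pairs, so a balanced combination has depth O(log n) rather than the
    O(n) depth of left-to-right elimination.
    """
    while len(mats) > 1:
        nxt = []
        for i in range(0, len(mats) - 1, 2):   # pairs can run in parallel
            nxt.append(minplus(mats[i], mats[i + 1]))
        if len(mats) % 2:                      # odd matrix carries over
            nxt.append(mats[-1])
        mats = nxt
    return float(np.min(mats[0]))

# Chain of 10 ternary variables = 9 edge matrices (random illustrative costs).
rng = np.random.default_rng(0)
mats = [rng.random((3, 3)) for _ in range(9)]
```

Because the min-plus product is associative, the balanced combination order gives exactly the same answer as the usual sequential dynamic program.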


Adapting to changes

- 1st pass: update the O(log n) cluster functions on the changed leaf's path
- 2nd pass: mark changed configurations and repeat decoding: O(m log(n/m))

[Figure: chain x1…x10 with a changed factor propagating up the cluster tree]

n = sequence length; m = number of changes

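The first pass can be illustrated with a deliberately simplified stand-in: scalar costs in a segment-tree-style balanced tree rather than the paper's cluster functions over variable configurations. A single changed factor then touches only the O(log n) ancestors of the modified leaf; everything else is reused:

```python
class ClusterChain:
    """Balanced binary tree over leaf costs; the root holds the minimum.

    Toy stand-in for cluster functions: values are scalars, not tables,
    but the update pattern (leaf-to-root path only) is the same.
    """

    def __init__(self, costs):
        self.n = len(costs)
        self.t = [float("inf")] * (2 * self.n)
        self.t[self.n:] = costs                      # leaves in t[n:2n]
        for i in range(self.n - 1, 0, -1):           # build internal nodes
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def update(self, leaf, cost):
        """Change one factor: recompute only the O(log n) ancestors."""
        i = leaf + self.n
        self.t[i] = cost
        while i > 1:
            i //= 2
            self.t[i] = min(self.t[2 * i], self.t[2 * i + 1])

    def minimum(self):
        return self.t[1]
```

After a change, `update` walks one leaf-to-root path, mirroring the first pass above; a full rebuild would cost O(n) instead.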

Experiments

- Random synthetic problems
  - Random, irregular but "grid-like" connectivity
- Stereo depth images
  - Superpixel representation
  - Irregular graphs
- Compare "edges" and "cover-tree" decompositions
- 32-core Intel Xeon; Cilk++ implementation


Synthetic problems

- Larger subproblems improve the convergence rate
- Adaptivity helps significantly

[Figure: convergence plots, showing cluster-tree overhead and parallel speedup]

Synthetic models

[Figure: performance as a function of problem size]

Stereo depth

[Figure: time to convergence for different problems]

Conclusions

- Fast methods for dual decomposition
- Parallel computation
- Adaptive updating

- Subproblem choice
- Small problems: highly parallel, easily adaptive
- Large problems: better convergence rates

- Cluster trees
- Alternative form for parallel & adaptive updates
- Benefits of both large & small subproblems
