Fast Parallel and Adaptive Updates for Dual-Decomposition Solvers

Ozgur Sumer, U. Chicago

Umut Acar, MPI-SWS

Alexander Ihler, UC Irvine

Ramgopal Mettu, UMass Amherst

- Structured (neg) energy function
- Goal: find the maximizing (MAP) configuration
- Examples
- Stereo depth
- Protein design & prediction
- Weighted constraint satisfaction problems

[Figure: the same pairwise model over variables A, B, C drawn as a factor graph, a Markov random field, and a Bayesian network]

[Figure: stereo depth example showing a stereo image pair, the MRF model, and the resulting depth]
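
The formulas on this slide did not survive extraction. As a minimal sketch, assuming the usual pairwise form in which the (negative) energy is a sum of terms theta_ij(x_i, x_j) and the goal is the highest-scoring joint configuration, the problem can be stated in a few lines. The three-variable model, its numbers, and all names below are made up for illustration.

```python
import itertools

STATES = (0, 1)
VARIABLES = ("A", "B", "C")
# Hypothetical pairwise scores on the 3-cycle A-B-C (illustrative numbers).
THETA = {
    ("A", "B"): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 0.0, (1, 1): 1.0},
    ("A", "C"): {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0},
}

def score(x):
    """Negative energy of a full assignment: sum of all pairwise terms."""
    return sum(table[(x[i], x[j])] for (i, j), table in THETA.items())

def brute_force_map():
    """Enumerate every joint configuration and keep the best one."""
    best_score, best_x = None, None
    for values in itertools.product(STATES, repeat=len(VARIABLES)):
        x = dict(zip(VARIABLES, values))
        s = score(x)
        if best_score is None or s > best_score:
            best_score, best_x = s, x
    return best_score, best_x

best_score, best_x = brute_force_map()
```

Enumeration is exponential in the number of variables, which is why the rest of the talk turns to decomposition.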

[Figure: original graph and its decomposition into subproblems]

- Decompose graph into smaller subproblems
- Solve each independently; optimistic bound
- Exact if all copies agree
- Enforce lost equality constraints via Lagrange multipliers
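
The bullets above can be written out as one inequality. This is a sketch in assumed notation, since the slide's own symbols were lost: theta_ij are the pairwise terms, x^{ij} is the private copy of the two variables held by edge subproblem (i,j), and lambda are the Lagrange multipliers attached to the dropped equality constraints.

```latex
% Assumed notation, not taken from the slides.
\begin{align*}
\max_{x} \sum_{(i,j)} \theta_{ij}(x_i, x_j)
  &= \max_{x,\,\{x^{ij}\}} \sum_{(i,j)} \theta_{ij}(x^{ij}_i, x^{ij}_j)
     \quad \text{s.t. } x^{ij}_i = x_i,\; x^{ij}_j = x_j \\
  &\le \min_{\lambda} \sum_{(i,j)} \max_{x^{ij}}
     \Big[ \theta_{ij}(x^{ij}_i, x^{ij}_j)
         + \lambda_{ij \to i}(x^{ij}_i)
         + \lambda_{ij \to j}(x^{ij}_j) \Big],
  \qquad \text{where } \sum_{j} \lambda_{ij \to i}(x_i) = 0
     \;\; \forall\, i,\, x_i .
\end{align*}
```

Each inner maximization is an independent subproblem, and the sum of their optima can only overestimate the true maximum, hence the optimistic bound. If, for some setting of the multipliers, every copy of every variable takes the same value, the inequality holds with equality and the agreed configuration is optimal.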

Same bound by different names

- Dual decomposition (Komodakis et al. 2007)
- TRW, MPLP (Wainwright et al. 2005; Globerson & Jaakkola 2007)
- Soft arc consistency (Cooper & Schiex 2004)

[Figure: energy axis showing the relaxed problems' bound, the MAP value, and the consistent solutions]

Subgradient descent

- Find each subproblem’s optimal configuration
- Adjust entries for mis-matched solutions
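
The two steps above can be sketched as a projected subgradient loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the subproblems are single edges of a made-up 3-cycle model, each edge keeps one multiplier vector per variable copy, and the step size simply decays as 1/(1+t). All names and numbers are hypothetical.

```python
import itertools

STATES = (0, 1)
# Hypothetical pairwise scores on the 3-cycle A-B-C (illustrative numbers).
THETA = {
    ("A", "B"): {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 2.0},
    ("B", "C"): {(0, 0): 0.5, (0, 1): 1.5, (1, 0): 0.0, (1, 1): 1.0},
    ("A", "C"): {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 2.0, (1, 1): 0.0},
}

def solve_edge(edge, lam):
    """Find one edge subproblem's optimal configuration and its value."""
    i, j = edge

    def value(v):
        return THETA[edge][v] + lam[edge, i][v[0]] + lam[edge, j][v[1]]

    best = max(itertools.product(STATES, STATES), key=value)
    return value(best), {i: best[0], j: best[1]}

def subgradient(num_iters=50, step0=0.5):
    # One multiplier vector per (edge, variable) copy, all zero at the start.
    lam = {(e, v): [0.0 for _ in STATES] for e in THETA for v in e}
    bounds = []
    for t in range(num_iters):
        solutions = {e: solve_edge(e, lam) for e in THETA}
        # Sum of subproblem optima: an upper bound on the MAP value.
        bounds.append(sum(val for val, _ in solutions.values()))
        step = step0 / (1 + t)  # diminishing step size (an assumption)
        for var in ("A", "B", "C"):
            edges = [e for e in THETA if var in e]
            for s in STATES:
                picks = [1.0 if solutions[e][1][var] == s else 0.0 for e in edges]
                avg = sum(picks) / len(picks)
                for e, p in zip(edges, picks):
                    # Copies that agree leave lam unchanged; mismatched
                    # copies are pushed toward the average choice.
                    lam[e, var][s] -= step * (p - avg)
    return bounds, lam

bounds, lam = subgradient()
```

Because each update subtracts the average, the multipliers for a given variable and state always sum to zero across its copies, so every entry of `bounds` remains a valid upper bound. The bound need not decrease monotonically from one iteration to the next.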

- Any collection of tree-structured parts that covers the graph gives the same bound
- Two extreme cases
- Set of all individual edges
- Single “covering tree” of all edges; variables duplicated

[Figure: original graph, its “edges” decomposition, and a covering tree]

- Parallel updates
- Easy to solve subproblems in parallel (e.g. Komodakis et al. 2007)
- Adaptive updates

- Example: Markov chain
- Can pass messages in parallel, but…
- If xn depends on x1, information still needs O(n) steps to propagate
- Slow “convergence rate”

- Larger problems are more “efficient”
- Smaller problems are easily parallel & adaptive
- Similar effects in message passing
- Residual splash (Gonzalez et al. 2009)

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

- Alternative means of parallel computation
- Applied to Bayes nets (Pennock 1998; Namasivayam et al. 2006)

- Simple chain model
- Normally, eliminate variables “in order” (DP)
- Each calculation depends on all previous results
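
For reference, in-order elimination on a chain is ordinary max-sum dynamic programming. The sketch below assumes a hypothetical layout in which `pairwise[t][a][b]` scores the pair (x_t = a, x_{t+1} = b); the random integer tables exist only to exercise the code.

```python
import itertools
import random

def chain_map_value(pairwise):
    """Eliminate x1, x2, ... in order, carrying one message forward."""
    k = len(pairwise[0])
    message = [0] * k  # best score of the eliminated prefix, per state
    for table in pairwise:
        # Each step reads the previous message: a sequential dependency.
        message = [
            max(message[a] + table[a][b] for a in range(k))
            for b in range(k)
        ]
    return max(message)

def brute_force_value(pairwise):
    """Exhaustive check, feasible only for tiny chains."""
    k = len(pairwise[0])
    n = len(pairwise) + 1
    return max(
        sum(pairwise[t][x[t]][x[t + 1]] for t in range(n - 1))
        for x in itertools.product(range(k), repeat=n)
    )

random.seed(0)
K = 3  # states per variable (illustrative)
tables = [[[random.randint(0, 9) for _ in range(K)] for _ in range(K)]
          for _ in range(5)]  # 5 edges, i.e. a 6-variable chain
dp_value = chain_map_value(tables)
```

The loop cannot be parallelized as written, because every iteration consumes the message produced by the one before it.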

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

- Alternative means of parallel computation
- Eliminate variables in alternative order
- Eliminate some intermediate (degree 2) nodes
- Balanced: depth log(n)
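
One way to see why a balanced order gives logarithmic depth: treat each chain edge as a table and merge neighbouring tables with a max-plus product, which eliminates the variable they share. The sketch below is a simplification that pairs up adjacent clusters in rounds; it is not the tree-contraction scheme of the talk, and the function names and random tables are hypothetical.

```python
import random

def maxplus(P, Q):
    """Max-plus product: eliminates the variable shared by two clusters."""
    return [
        [max(P[a][c] + Q[c][b] for c in range(len(Q))) for b in range(len(Q[0]))]
        for a in range(len(P))
    ]

def balanced_value(pairwise):
    """Merge neighbouring clusters pairwise until a single one remains.

    Returns (MAP value, number of rounds). Products within one round do
    not depend on each other, so they could run in parallel.
    """
    level, rounds = list(pairwise), 0
    while len(level) > 1:
        merged = [maxplus(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # odd cluster out is carried up unchanged
            merged.append(level[-1])
        level, rounds = merged, rounds + 1
    return max(max(row) for row in level[0]), rounds

def sequential_value(pairwise):
    """In-order elimination, used here only as a reference."""
    message = [0] * len(pairwise[0])
    for table in pairwise:
        message = [max(message[a] + table[a][b] for a in range(len(table)))
                   for b in range(len(table[0]))]
    return max(message)

random.seed(0)
K = 3  # states per variable (illustrative)
tables = [[[random.randint(0, 9) for _ in range(K)] for _ in range(K)]
          for _ in range(9)]  # 9 edges, i.e. the 10-variable chain
value, rounds = balanced_value(tables)
```

For the 10-variable chain the nine edge tables shrink to 5, 3, 2, and then 1 cluster, so four rounds suffice where in-order elimination needs nine dependent steps.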

[Figure: balanced cluster tree for the chain below, one elimination round per level from the bottom up: x1, x4, x7, x9; then x3, x8; then x2, x6; then x5; root x10]

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

- 1st pass: update O(log n) cluster functions
- 2nd pass: mark changed configurations, repeat decoding: O(m log(n/m))

x1---x2---x3---x4---x5---x6---x7---x8---x9---x10

n = sequence length; m = # of changes
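
The first pass can be sketched by storing every level of a balanced merge tree and, after one edge table changes, recomputing only the clusters on the path to the root. This is an illustrative simplification with hypothetical names (`ClusterChain`, `update`); it covers only the O(log n) cluster update, not the second decoding pass.

```python
import random

def maxplus(P, Q):
    """Max-plus product: eliminates the variable shared by two clusters."""
    return [
        [max(P[a][c] + Q[c][b] for c in range(len(Q))) for b in range(len(Q[0]))]
        for a in range(len(P))
    ]

class ClusterChain:
    """Balanced merge tree over the edge tables of a chain."""

    def __init__(self, pairwise):
        self.levels = [list(pairwise)]
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append(
                [self._merge(prev, i) for i in range(0, len(prev), 2)]
            )

    @staticmethod
    def _merge(prev, i):
        # An unpaired last cluster is carried up unchanged.
        return maxplus(prev[i], prev[i + 1]) if i + 1 < len(prev) else prev[i]

    def update(self, t, table):
        """Replace edge table t and refresh only its ancestors."""
        self.levels[0][t] = table
        recomputed = 0
        for depth in range(1, len(self.levels)):
            t //= 2
            self.levels[depth][t] = self._merge(self.levels[depth - 1], 2 * t)
            recomputed += 1
        return recomputed

    def map_value(self):
        return max(max(row) for row in self.levels[-1][0])

random.seed(1)
K = 3  # states per variable (illustrative)
tables = [[[random.randint(0, 9) for _ in range(K)] for _ in range(K)]
          for _ in range(9)]
tree = ClusterChain(tables)
new_table = [[random.randint(0, 9) for _ in range(K)] for _ in range(K)]
recomputed = tree.update(4, new_table)
# Reference: rebuild everything from scratch with the same change.
fresh = ClusterChain(tables[:4] + [new_table] + tables[5:])
```

With nine edge tables the tree has four levels above the leaves, so a single change refreshes four clusters instead of rebuilding all of them.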

- Random synthetic problems
- Random, irregular but “grid-like” connectivity

- Stereo depth images
- Superpixel representation
- Irregular graphs

- Compare “edges” and “covering tree” decompositions
- 32-core Intel Xeon, Cilk++ implementation

- Larger problems improve convergence rate
- Adaptivity helps significantly
- Cluster overhead
- Parallelism

- As a function of problem size

- Time to convergence for different problems

- Fast methods for dual decomposition
- Parallel computation
- Adaptive updating

- Subproblem choice
- Small problems: highly parallel, easily adaptive
- Large problems: better convergence rates

- Cluster trees
- Alternative form for parallel & adaptive updates
- Benefits of both large & small subproblems