Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth

1 / 1

Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth - PowerPoint PPT Presentation

Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth. Amortized Inference. General Recipe.

I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.

PowerPoint Slideshow about 'Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth' - robert-marshall

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

Margin-based Decomposed Amortized Inference

Gourab Kundu, Vivek Srikumar, Dan Roth

Amortized Inference

General Recipe

Key Observations: (1) In NLP, we solve a lot of inference problems, at least one per sentence. (2)Redundancy of structures: The number of observed structures (blue solid line) is much smaller than the number of inputs (red dotted line). Moreover, the distribution of observed structures is highly skewed (inset). (Eg. for POS, a small number of tag sequences are much more frequent than the rest.)  Pigeon Principle Applies.

Research Question: Can we solve the k-th inference instance much fast than the 1st?

Amortized inference (Srikumar et al 2012) shows how computation from earlier inference instances can be used to speed up inference for new, previously unseen instances.

If CONDITION(problem cache, newproblem)

then (no need to call the solver)

SOLUTION(new problem) = old solution

Else

Call base solver and update cache

End

+A theorem guaranteeing correctness

2. Decomposed Amortized Inference using Lagrangian Relaxation

ILP formulations

1. Margin based Amortization

• Given a problem: max cqT y s.t. MT y <= b:
• Partitionconstraints into two sets, say C1 and C2 (with constraints M2T y – b2)
• Define L(λ): max y ϵ C1 cqT y – λT (M2T y – b2), solved using any amortized algorithm
• Dual problem: minλ >= 0 L(λ), solved using gradient descent over dual variables λ.
• The dual can be decomposed into multiple smaller problems if no constraint in C1 has active variables from two different smaller problems. Even otherwise, it has fewer constraints than the original problem. Considering the dual can have two advantages from the perspective of amortization: (1) Smaller problems (2) Higher chance of cache hits
• Let B be the solution yp for inference
• problem p with objective cp, denoted
• by the red hyperplane and let A be the
• second best assignment.
• For a new objective function cq (blue), if the margin δ is greater than the sum of
• the decrease in the objective value of yp
• and the maximum increase in the
• objective of another solution (Δ), then
• the solution to the new inference
• problem will still be yp.
• ILP (Integer Linear Programming) is a general formulation for inference in structured prediction tasks [Roth & Yih, 04, 07]
• Inference using ILP has been successful in NLP tasks e.g. SRL, Dep. Parsing, Information extraction and more.
• ILP can be expressed as:
• max cx max2x1 + 3x2
• s.t. Ax ≤ b x1 + x2 <= 1
• x integer
• Amortized inference depends on having an ILP formulation; but multiple solvers might be used.

More redundancy among

smaller structures

Redundancy in components of structures: We extend amortization to cases where the full structured output is not repeatedby storing partial computation for future inference problems.

• Formally , the condition is given by: - ( cq - cp)T yp + Δ <= δ
• At the caching stage, δ is computed for each ILP and stored in
• cache.
• At test time, computing Δ exactly requires solving an ILP.
• Instead, we compute an approximate Δ by solving a relaxed problem
• which can be done efficiently.

Experimental Setup

Semantic Role Labeling

Entities and Relations

[Roth & Yih 2004]

[Punyakanok, et al 2008]

We simulate a long-running NLP process by caching problems and solutions from Gigaword corpus. We used a database engine to cache ILPs and their solutions along their and structured margin. We compare our approaches to a state-of-the-art ILP solver (Gurobi) and also to Theorem 1 from (Srikumar et al.2012).

This work continues the original work on Amortization: Srikumar, Kundu and Roth. On Amortizing Inference Cost for Structured Prediction. EMNLP, 2012

Solve only one in four problems

Solve only one in six problems

Wall clock improvements too

This research is sponsored by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053, Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181, DARPA under agreement number FA8750-13-2-0008, Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155.