
Inference


Presentation Transcript


  1. Inference Shalmoli Gupta, Yuzhe Wang

  2. Outline • Introduction • Dual Decomposition • Incremental ILP • Amortizing Inference

  3. What is Inference • In NLP problems, inference maps an input x to a structured output y* = argmax_{y in Y(x)} f(x, y) • Y(x) is the set of possible outputs for x • f is a scoring function assigning a score to each output • f can be learned using any learning model • POS Tagging : x is a sentence and Y(x) is the set of all possible tag sequences • Parsing : x is a sentence and Y(x) is the set of all parse trees • Solving the inference problem : • LP / ILP solvers • Dynamic Programming (sketched below)
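
  As an illustration of dynamic-programming inference, the sketch below runs Viterbi decoding for a toy POS-tagging model; the tag set, transition and emission scores are made-up placeholders, not from the slides.

      import numpy as np

      def viterbi(words, tags, emit, trans):
          """Return the highest-scoring tag sequence under an HMM-style model.
          emit[t][w] and trans[t_prev][t] are arbitrary (log-)scores."""
          n, T = len(words), len(tags)
          score = np.full((n, T), -np.inf)
          back = np.zeros((n, T), dtype=int)
          for j, t in enumerate(tags):                      # initialise first word
              score[0, j] = emit[t].get(words[0], -5.0)
          for i in range(1, n):                             # recurrence over positions
              for j, t in enumerate(tags):
                  cand = [score[i-1, k] + trans[tags[k]][t] for k in range(T)]
                  back[i, j] = int(np.argmax(cand))
                  score[i, j] = max(cand) + emit[t].get(words[i], -5.0)
          best = [int(np.argmax(score[n-1]))]               # follow back-pointers
          for i in range(n - 1, 0, -1):
              best.append(back[i, best[-1]])
          return [tags[j] for j in reversed(best)]

      # toy example with invented scores
      tags = ["N", "V"]
      emit = {"N": {"John": 1.0, "bird": 1.0}, "V": {"saw": 1.0}}
      trans = {"N": {"N": -1.0, "V": 0.5}, "V": {"N": 0.5, "V": -1.0}}
      print(viterbi(["John", "saw", "bird"], tags, emit, trans))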

  4. Difficulty in Inference • Search space grows exponentially with the input size • Problem becomes intractable, or polynomial but very slow • 3 different approaches : • Dual Decomposition • Incremental ILP (cutting plane) • Amortized Inference

  5. Lagrangian Relaxation • Original Problem : y* = argmax_{y in Y} f(y) • Choose a set Y', s.t. Y = {y in Y' : Ay = b}; the linear constraint Ay = b is what makes the problem NP-hard, while maximizing over Y' alone is easy • For any u (the Lagrangian multiplier), let L(u, y) = f(y) + u·(Ay - b) • Dual Objective : L(u) = max_{y in Y'} L(u, y) • For any u, L(u) >= f(y*) • Dual Problem : min_u L(u), solved with the subgradient algorithm

  6. Dual Decomposition • Special case of LR • Decoding problem : argmax_{y in Y, z in Z} f(y) + g(z) s.t. y = z (a linear constraint specifying that the two structures are coherent) • Clearly the problem is similar to LR • Lagrangian (with multiplier u) : L(u, y, z) = f(y) + g(z) + u·(y - z) • Dual Objective : L(u) = max_{y in Y} (f(y) + u·y) + max_{z in Z} (g(z) - u·z) • Dual Problem : min_u L(u) • L(u) is convex, but may not be differentiable • Iteratively solved using subgradient descent

  7. Dual Decomposition • Initialize the Lagrangian multiplier u to 0 • In each iteration k, find the structures y^(k) = argmax_{y in Y} (f(y) + u·y) and z^(k) = argmax_{z in Z} (g(z) - u·z) • If y^(k) = z^(k), then return it as the solution • It will be an optimal solution to the original decoding problem • Else update the multiplier : u <- u - a_k (y^(k) - z^(k)), with step size a_k (see the loop sketch below) • The DD problem is in fact the dual of the LP relaxation of the original decoding problem (the primal) • If this relaxation is tight, the optimal solution to the original problem is always found
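
  A minimal sketch of this subgradient loop in Python, assuming two black-box decoders decode_f and decode_g that return structures encoded as numpy 0/1 vectors over the shared parts (the decoder names and the encoding are illustrative, not from the slides):

      import numpy as np

      def dual_decomposition(decode_f, decode_g, dim, max_iter=100, step=1.0):
          """Generic subgradient loop for max f(y) + g(z) s.t. y = z.
          decode_f(u) returns argmax_y f(y) + u.y; decode_g(u) returns argmax_z g(z) - u.z."""
          u = np.zeros(dim)                      # one Lagrangian multiplier per shared part
          for k in range(1, max_iter + 1):
              y = decode_f(u)                    # sub-problem 1 (e.g. arc-factored / MST)
              z = decode_g(u)                    # sub-problem 2 (e.g. sibling model)
              if np.array_equal(y, z):           # agreement => provably optimal solution
                  return y, True
              u -= (step / k) * (y - z)          # subgradient step with decaying step size
          return y, False                        # no agreement within the iteration budget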

  8. Non-Projective Dependency Parsing • An edge (h, m) indicates h : head-word, m : modifier • Edges form a directed spanning tree rooted at “root” • Non-Projective : dependency edges can cross • Y : set of well-formed dependency parses • Optimal parse tree : y* = argmax_{y in Y} f(y), where f assigns scores to parse trees • Example : root John saw a bird yesterday which was a Parrot

  9. Arc-Factored Model • f(y) decomposes over edges : f(y) = sum_{(h,m) in y} score(h, m) • Edge scores (e.g. score(saw, bird)) can be obtained using any learning method • Optimal parse found using a simple maximum spanning tree (MST) algorithm (sketch below) • Example : root John saw a bird yesterday which was a Parrot
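
  A sketch of arc-factored decoding as a maximum spanning arborescence, using networkx as an assumed off-the-shelf Edmonds-style MST implementation; the toy scores are invented:

      import networkx as nx

      def arc_factored_parse(n_tokens, score):
          """Return the highest-scoring dependency tree as a list of (head, modifier) edges.
          score(h, m) is any learned arc score; token 0 is the artificial root."""
          g = nx.DiGraph()
          for h in range(n_tokens + 1):
              for m in range(1, n_tokens + 1):   # no edges into the root
                  if h != m:
                      g.add_edge(h, m, weight=score(h, m))
          tree = nx.maximum_spanning_arborescence(g, attr="weight")
          return sorted(tree.edges())

      # toy 3-token sentence with made-up scores favouring 0->2, 2->1, 2->3
      toy = {(0, 2): 10.0, (2, 1): 8.0, (2, 3): 7.0}
      print(arc_factored_parse(3, lambda h, m: toy.get((h, m), 0.1)))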

  10. Sibling Model • g(z) scores, for each head-word, the sequence of its modifiers, so adjacent siblings are scored together • Again the individual scores can be obtained using any learning method • Finding argmax_{z in Y} g(z), with Y the set of well-formed parse trees, is NP-hard • Example : root John saw a bird yesterday which was a Parrot

  11. Optimal Modifiers for each Head-word • Easy to find the set of modifiers which maximizes the sibling score for a single head-word • 2^n possible choices for each head-word • Can still be solved using dynamic programming in polynomial time (see the sketch below) • Example : root John saw a bird yesterday which was a Parrot
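
  A minimal dynamic-programming sketch for one head-word, assuming a sibling score sib(h, prev, m) for attaching modifier m when the previously attached modifier on the same side is prev; the function names, the left-to-right scan and the toy scores are illustrative assumptions, not from the slides:

      from functools import lru_cache

      def best_modifiers(head, candidates, sib):
          """Pick the subset of candidate modifiers (scanned in order) maximizing the
          sum of sibling scores sib(head, prev, m), where prev is the previously
          attached modifier or None for the first one. O(n^2) states, not 2^n."""
          cands = tuple(candidates)

          @lru_cache(maxsize=None)
          def go(i, prev):
              if i == len(cands):
                  return 0.0, ()
              skip_score, skip_set = go(i + 1, prev)            # do not attach cands[i]
              take_score, take_set = go(i + 1, cands[i])        # attach cands[i]
              take_score += sib(head, prev, cands[i])
              if take_score > skip_score:
                  return take_score, (cands[i],) + take_set
              return skip_score, skip_set

          return go(0, None)

      # toy scores: "saw" likes "bird" first and "yesterday" as its next sibling
      scores = {("saw", None, "bird"): 2.0, ("saw", "bird", "yesterday"): 1.5}
      print(best_modifiers("saw", ["a", "bird", "yesterday"],
                           lambda h, p, m: scores.get((h, p, m), -1.0)))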

  12. A Simple Algorithm • Find the optimal set of modifiers for each word independently • If the result is a valid parse tree, we are done !! • But the resulting parse may not be well formed • It may contain cycles ! • This computes argmax_{z in Z} g(z), where Z is the set of graphs obtained by relaxing the set of valid parse trees • Example : root John saw a bird yesterday which was a Parrot

  13. Parsing : Dual Decomposition • Consider a more generalized model combining both scores : argmax_{y in Y} f(y) + g(y), with f the arc-factored model (decoded by MST, which ensures a tree) and g the sibling model (decoded head-by-head, i.e. individual decoding) • Rewrite as : argmax_{y in Y, z in Z} f(y) + g(z) s.t. y(h, m) = z(h, m) for every possible arc • Without the constraint, the objective function splits into two easy, independent decoding problems

  14. Parsing : Dual Decomposition • Use a Lagrangian multiplier u(h, m) for each arc and move the constraint into the objective function • Dual Objective : L(u) = max_{y in Y} (f(y) + sum_{h,m} u(h, m) y(h, m)) + max_{z in Z} (g(z) - sum_{h,m} u(h, m) z(h, m)) • Dual Problem : min_u L(u) • Use the subgradient algorithm

  15. Algorithm Outline • Set u^(1) = 0 • For k = 1 to K do : • z^(k) = argmax_{z in Z} (g(z) - sum u^(k)(h, m) z(h, m)) (by individual decoding) • y^(k) = argmax_{y in Y} (f(y) + sum u^(k)(h, m) y(h, m)) (by MST) • if y^(k) = z^(k), return y^(k) • else update u^(k+1) = u^(k) - a_k (y^(k) - z^(k))

  16. Comparison LP/ILP

  17. References • A Tutorial on Dual Decomposition and Lagrangian Relaxation for Inference in Natural Language Processing, Alexander M. Rush and Michael Collins, JAIR, 2013. • Dual Decomposition for Parsing with Non-Projective Head Automata, Terry Koo, Alexander M. Rush, Michael Collins, Tommi Jaakkola, and David Sontag, EMNLP, 2010.

  18. Inference Part 2

  19. Incremental Integer Linear Programming 1. Many problems and models can be reformulated as an ILP, e.g. HMMs 2. The ILP formulation makes it easy to incorporate additional constraints

  20. Non-projective Dependency Parsing • "come" is a head • "I'll" is a child of "come" • "subject" is the label between "come" and "I'll" • Non-projective : dependency edges can cross

  21. Model • Objective function : s(x, y) = sum_{(i,j,l) in y} w·f(i, j, l) • x is a sentence, y is a set of labelled dependencies • f(i, j, l) is the feature vector for tokens i, j with label l, and w is the learned weight vector

  22. Constraints • T1 : For every non-root token in x there exists exactly one head; the root token has no head • T2 : There are no cycles • A1 : Heads are not allowed to have more than one outgoing edge labelled l, for every l in a set of labels U • C1 : In a symmetric coordination there is exactly one argument to the right of the conjunction and at least one argument to the left • C4 : Arguments of a coordination must have compatible word classes • P1 : Two dependencies must not cross if one of their labels is in a set of labels P

  23. Reformulate into ILP • Labelled edges : binary variables e(i, j, l) in {0, 1}, one per candidate head i, child j and label l • Existence of a dependency between tokens i and j : binary variable d(i, j), with d(i, j) = sum_l e(i, j, l) • Objective function : maximize sum_{i,j,l} s(i, j, l) * e(i, j, l)

  24. Reformulate constraints • Only one head (T1) : sum_i d(i, j) = 1 for every non-root token j • Label uniqueness (A1) : sum_j e(i, j, l) <= 1 for every head i and every label l in U • Symmetric coordination (C1) : counting constraints (exactly one argument to the right, at least one to the left of the conjunction) • No cycles (T2) : sum_{(i,j) in c} d(i, j) <= |c| - 1 for every possible cycle c; these are exponentially many, hence added incrementally • A small solver sketch of the objective and the T1 constraint follows
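
  A sketch of the core ILP in PuLP (an assumed off-the-shelf solver wrapper, not named on the slides), with only the objective, the d/e linking constraints and the single-head constraint T1; the scores are invented and the label-set, coordination and cycle constraints are omitted:

      import pulp

      def build_ilp(n_tokens, labels, score):
          """ILP over labelled edges e[i,j,l] and unlabelled edges d[i,j].
          Token 0 is the root; score(i, j, l) is any learned arc-label score."""
          prob = pulp.LpProblem("dependency_parsing", pulp.LpMaximize)
          toks = range(n_tokens + 1)
          e = {(i, j, l): pulp.LpVariable(f"e_{i}_{j}_{l}", cat="Binary")
               for i in toks for j in toks if i != j and j != 0 for l in labels}
          d = {(i, j): pulp.LpVariable(f"d_{i}_{j}", cat="Binary")
               for i in toks for j in toks if i != j and j != 0}
          # objective: total score of the selected labelled edges
          prob += pulp.lpSum(score(i, j, l) * e[i, j, l] for (i, j, l) in e)
          # link d and e: a dependency i->j exists iff some labelled edge is chosen
          for (i, j) in d:
              prob += d[i, j] == pulp.lpSum(e[i, j, l] for l in labels)
          # T1: every non-root token has exactly one head
          for j in range(1, n_tokens + 1):
              prob += pulp.lpSum(d[i, j] for i in toks if i != j) == 1
          return prob, d, e

      prob, d, e = build_ilp(3, ["subject", "object"], lambda i, j, l: 1.0 if i < j else -1.0)
      prob.solve(pulp.PULP_CBC_CMD(msg=False))
      print([(i, j, l) for (i, j, l), var in e.items() if var.value() == 1])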

  25. Algorithm • Bx : constraints added in advance • Ox : objective function • Vx : variables • W : violated constraints • The incremental (cutting-plane) loop : solve the ILP with Bx, collect the constraints W violated by the current solution (e.g. cycles), add them, and re-solve until W is empty (see the sketch below)
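
  A sketch of the incremental (cutting-plane) loop, re-using the build_ilp sketch above and networkx's cycle enumeration to find violated cycle constraints; the helper names are illustrative:

      import networkx as nx
      import pulp

      def find_cycles(edges):
          """Return each directed cycle in the edge list as a list of (i, j) edges."""
          g = nx.DiGraph(edges)
          return [list(zip(c, c[1:] + c[:1])) for c in nx.simple_cycles(g)]

      def incremental_ilp(prob, d_vars, max_rounds=50):
          """Cutting-plane loop: solve, collect violated cycle constraints W, add them, re-solve."""
          for _ in range(max_rounds):
              prob.solve(pulp.PULP_CBC_CMD(msg=False))
              edges = [(i, j) for (i, j), var in d_vars.items() if var.value() == 1]
              cycles = find_cycles(edges)
              if not cycles:                     # no violated constraints: solution is optimal
                  return edges
              for cycle in cycles:               # forbid each detected cycle
                  prob += pulp.lpSum(d_vars[i, j] for (i, j) in cycle) <= len(cycle) - 1
          return edges

      # usage with the ILP built on the previous slide:
      # prob, d, e = build_ilp(3, ["subject", "object"], my_score)
      # print(incremental_ilp(prob, d))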

  26. Experiment results • LAC : labelled accuracy • UAC : unlabelled accuracy • LC : percentage of sentences with 100% labelled accuracy • UC : percentage of sentences with 100% unlabelled accuracy

  27. Runtime Evaluation

  28. Amortizing Inference • Idea : under some conditions, we can re-use earlier solutions for new instances • Why does this help? 1. Only a small fraction of structures actually occurs, compared with the large space of possible structures 2. The distribution of observed structures is heavily skewed towards a small number of them • In practice : # of examples >> # of distinct ILPs >> # of distinct solutions

  29. Amortizing Inference : Conditions • Consider solving a 0-1 ILP problem • Idea of Theorem 1 : for every inference variable that is active (value 1) in the solution, increasing the corresponding objective coefficient will not change the optimal assignment; for every variable whose value in the solution is 0, decreasing its objective coefficient will not change the optimal solution

  30. Amortizing Inference : Conditions • Theorem 1 : Let p denote an inference problem posed as an integer linear program belonging to an equivalence class [P], and let q in [P] be another inference instance in the same equivalence class. Define d(c) = c_q - c_p to be the difference of the objective coefficients of the two ILPs. Then y_p is also the solution of problem q if for each i in {1, ..., n_p} we have (2 y_p,i - 1) d(c_i) >= 0
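
  A small sketch of the resulting cache lookup, assuming solutions are cached per equivalence class (i.e. per ILP structure) and objectives are plain numpy coefficient vectors; names such as theorem1_holds and solve_ilp are illustrative:

      import numpy as np

      def theorem1_holds(y_p, c_p, c_q):
          """Check the Theorem 1 condition: (2*y_p[i] - 1) * (c_q[i] - c_p[i]) >= 0 for all i."""
          return np.all((2 * np.asarray(y_p) - 1) * (np.asarray(c_q) - np.asarray(c_p)) >= 0)

      def amortized_solve(c_q, cache, solve_ilp):
          """Re-use a cached (c_p, y_p) pair if Theorem 1 applies, otherwise call the solver."""
          for c_p, y_p in cache:
              if theorem1_holds(y_p, c_p, c_q):
                  return y_p                      # provably the same optimum, no solver call
          y_q = solve_ilp(c_q)                    # fall back to a full ILP solve
          cache.append((np.asarray(c_q), y_q))
          return y_q

      # toy check: raising the coefficient of an active variable keeps the solution
      y_p = np.array([1, 0, 1])
      print(theorem1_holds(y_p, c_p=[2.0, 1.0, 3.0], c_q=[2.5, 0.5, 3.0]))   # True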

  31. Amortizing Inference : Conditions • Idea of Theorem 2 : suppose y* is the optimal solution of both p and q, i.e. c_p·y <= c_p·y* and c_q·y <= c_q·y* for every feasible y; then for any x_1, x_2 >= 0, (x_1 c_p + x_2 c_q)·y <= (x_1 c_p + x_2 c_q)·y* • Theorem 2 : y* is also the solution of any problem in the equivalence class whose objective coefficients are x_1 c_p + x_2 c_q with x_1, x_2 >= 0

  32. Amortizing Inference : Conditions • Theorem 3 (combining Theorems 1 and 2) : given previously solved problems p_1, ..., p_m in [P] that share the same optimal solution y_p, define D(c, x) = c_q - sum_j x_j c_p_j. If there is some positive x such that for every i, (2 y_p,i - 1) D(c_i, x) >= 0, then q has the same optimal solution y_p (see the feasibility sketch below)
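
  Finding such an x is a linear feasibility problem; below is a sketch using scipy.optimize.linprog with a zero objective. The cached problems are assumed to share the solution y_p, non-negative x is used in place of strictly positive x, and all names and toy numbers are illustrative:

      import numpy as np
      from scipy.optimize import linprog

      def theorem3_applies(y_p, cached_cs, c_q):
          """Look for x >= 0 with (2*y_p[i]-1) * (c_q[i] - sum_j x[j]*cached_cs[j][i]) >= 0 for all i.
          cached_cs is a list of objective vectors of cached problems sharing solution y_p."""
          sign = 2 * np.asarray(y_p) - 1                 # +1 for active variables, -1 otherwise
          C = np.asarray(cached_cs)                      # shape (m, n)
          a_ub = sign[:, None] * C.T                     # row i, col j: sign_i * c_pj[i]
          b_ub = sign * np.asarray(c_q)
          res = linprog(c=np.zeros(C.shape[0]), A_ub=a_ub, b_ub=b_ub,
                        bounds=[(0, None)] * C.shape[0])
          return res.success                             # feasible => re-use y_p for q

      # toy example: c_q is (roughly) a positive combination of two cached objectives
      y_p = [1, 0, 1]
      print(theorem3_applies(y_p, [[2.0, 1.0, 3.0], [1.0, 2.0, 1.0]], [2.6, 0.9, 3.4]))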

  33. Conditions

  34. Approximation schemes 1. Most frequent solution 2. Top-K approximation 3. Approximations to theorem 1 and theorem 3

  35. Experiment Result

  36. Thank you!
