
Parsing with Soft and Hard Constraints on Dependency Length


Presentation Transcript


  1. Parsing with Soft and Hard Constraints on Dependency Length. Jason Eisner and Noah A. Smith, Department of Computer Science / Center for Language and Speech Processing, Johns Hopkins University. {jason,nasmith}@cs.jhu.edu. IWPT 2005.

  2. Premise. Here at IWPT 2005: Burstein; Sagae & Lavie; Tsuruoka & Tsujii; Dzikovska and Rosé; ... Many parsing consumers (IE, ASR, MT) will benefit more from fast, precise partial parsing than from full, deep parses that are slow to build.

  3. Outline of the Talk. • The Short Dependency Preference. • Soft constraints: review of split bilexical grammars (SBGs); the O(n³) algorithm; modeling dependency length; experiments. • Hard constraints: constraining dependency length in a parser; an O(n) algorithm with the same grammar constant as SBG parsing; experiments.

  4. Short-Dependency Preference. A word’s dependents (adjuncts, arguments) tend to fall near it in the string.

  5. Length of a dependency ≈ surface distance. [Figure: an example sentence whose four dependencies have lengths 3, 1, 1, 1.]

  6. 50% of English dependencies have length 1, another 20% have length 2, 10% have length 3, ... [Histogram: fraction of all dependencies vs. length.]
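These statistics are easy to reproduce from any dependency treebank. A minimal sketch (the (head, child) pair encoding is my own assumption, not a format from the talk):

```python
from collections import Counter

def length_histogram(parses):
    """parses: an iterable of dependency parses, each a list of
    (head_index, child_index) pairs. Returns the fraction of all
    dependencies at each surface distance |head - child|."""
    counts = Counter(abs(h - c) for parse in parses for (h, c) in parse)
    total = sum(counts.values())
    return {length: n / total for length, n in sorted(counts.items())}

# Toy check on slide 5's example "It takes two to tango":
# takes->It, takes->two, takes->tango, tango->to
parse = [(1, 0), (1, 2), (1, 4), (4, 3)]
print(length_histogram([parse]))  # {1: 0.75, 3: 0.25}
```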

  7. Related Ideas. • Score parses based on what’s between a head and child (Collins, 1997; Zeman, 2004; McDonald et al., 2005). • Assume short → faster human processing (Church, 1980; Gibson, 1998). • “Attach low” heuristic for PPs (English) (Frazier, 1979; Hobbs and Bear, 1990). • Obligatory and optional re-orderings (English) (see paper).

  8. Split Bilexical Grammars (Eisner, 1996; 2000). • Bilexical: capture relationships between two words using rules of the form X[p] → Y[p] Z[c], X[p] → Z[c] Y[p], and X[w] → w, where p is the parent (head) word and c the child; grammar size = N³|Σ|². • Split: a head’s left children are conditionally independent of its right children, given the parent (equivalent to split HAGs; Eisner and Satta, 1999).

  9. Generating with SBGs. • Start with the left wall $. • Generate the root w0. • Generate left children w-1, w-2, ..., w-ℓ from the FSA λw0. • Generate right children w1, w2, ..., wr from the FSA ρw0. • Recurse on each wi, i ∈ {-ℓ, ..., -1, 1, ..., r}, to sample its subtree αi (steps 2-4). • Return α-ℓ ... α-1 w0 α1 ... αr.
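The recursive process above is short enough to sketch directly. Here `sample_children(word, direction)` stands in for drawing a child sequence from λw or ρw; that interface is hypothetical, chosen only to keep the sketch self-contained:

```python
def generate(word, sample_children):
    """Sample a subtree rooted at `word` and return its yield.
    sample_children(word, 'L') / (word, 'R') draws the (inside-out)
    child sequence from the automata lambda_w / rho_w."""
    left = [generate(c, sample_children) for c in sample_children(word, 'L')]
    right = [generate(c, sample_children) for c in sample_children(word, 'R')]
    out = []
    for sub in reversed(left):   # w-1, w-2, ... were drawn inside-out
        out.extend(sub)
    out.append(word)
    for sub in right:
        out.extend(sub)
    return out

# Deterministic toy "FSAs" for slide 5's sentence:
demo = {('takes', 'L'): ['It'], ('takes', 'R'): ['two', 'tango'],
        ('tango', 'L'): ['to']}
print(generate('takes', lambda w, d: demo.get((w, d), [])))
# ['It', 'takes', 'two', 'to', 'tango']
```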

  10. Naïve Recognition/Parsing. CKY with lexical heads: an item spanning (i, k) with head word p is built from items over (i, j) and (j, k), one of them headed by a child c of p; three span indices and two head indices give O(n⁵) combinations, or O(n⁵N³) with N nonterminals. [Figure: chart items over “It takes two to tango”.]

  11. Cubic Recognition/Parsing (Eisner & Satta, 1999). A triangle is a head with some left (or right) subtrees; one trapezoid per dependency. [Figure: triangles and trapezoids covering “It takes two to tango”.]

  12. Cubic Recognition/Parsing (Eisner & Satta, 1999). [Diagram: the goal item is assembled in O(n) combinations; each trapezoid-building step and each triangle-building step ranges over spans i ≤ k ≤ j, i.e., O(n³) combinations.] Total: O(n³g²N) if N nonterminals with polysemy g.
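For concreteness, here is a Viterbi version of the triangle/trapezoid recurrences, stripped of the grammar constant (arc-factored log-weights only, token 0 playing the wall $); this is my condensation, not the talk's Dyna program:

```python
def eisner(score):
    """O(n^3) Eisner/Satta-style projective dependency parsing.
    score[h][c] is the log-weight of an arc h -> c over tokens 0..n-1
    (token 0 is the wall $). Returns the log-weight of the best parse.
    C = complete items (triangles), I = incomplete items (trapezoids);
    the last index is 0 for head-at-right, 1 for head-at-left."""
    n, NEG = len(score), float('-inf')
    C = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    I = [[[NEG, NEG] for _ in range(n)] for _ in range(n)]
    for i in range(n):
        C[i][i][0] = C[i][i][1] = 0.0
    for w in range(1, n):             # span width
        for i in range(n - w):
            j = i + w
            # one trapezoid per dependency (i -> j or j -> i)
            best = max(C[i][k][1] + C[k + 1][j][0] for k in range(i, j))
            I[i][j][1] = best + score[i][j]
            I[i][j][0] = best + score[j][i]
            # extend trapezoids into triangles
            C[i][j][1] = max(I[i][k][1] + C[k][j][1] for k in range(i + 1, j + 1))
            C[i][j][0] = max(C[i][k][0] + I[k][j][0] for k in range(i, j))
    return C[0][n - 1][1]             # best tree rooted at the wall
```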

  13. Implementation. • Augment items with (Viterbi) weights; order by weight. • Agenda-based, best-first algorithm. • We use Dyna [see the HLT-EMNLP paper] to implement all parsers here. • Count the number of items built → a measure of runtime.
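A generic shape for such an agenda loop, with the item-counting instrumentation mentioned above (the axioms/combine interface is illustrative, not Dyna's):

```python
import heapq

def agenda_parse(axioms, combine):
    """Best-first deduction. axioms: list of (logweight, item).
    combine(item, chart): yields (logweight, new_item) consequences of
    combining `item` with items already in the chart. Returns the chart
    and the number of items popped, the runtime proxy used in the talk."""
    chart, agenda, built = {}, [], 0
    for w, x in axioms:
        heapq.heappush(agenda, (-w, x))      # max-heap via negated weights
    while agenda:
        negw, x = heapq.heappop(agenda)
        if x in chart:                       # a better derivation already won
            continue
        chart[x], built = -negw, built + 1
        for w2, y in combine(x, chart):
            if y not in chart:
                heapq.heappush(agenda, (-w2, y))
    return chart, built
```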

  14. Very Simple Model for λw and ρw. (*We parse POS tag sequences, not words.) The FSAs distinguish only whether a child is the first on its side: p(child | first, parent, direction); p(stop | first, parent, direction); p(child | not-first, parent, direction); p(stop | not-first, parent, direction). [Figure: λtakes and ρtakes generating “It takes two to ...”.]
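This first/not-first model is simple enough to spell out; the table layout below is an assumption of mine, and the result plugs into the generate() sketch from slide 9:

```python
import random

def make_sampler(params):
    """params maps (parent, direction, is_first) to a distribution over
    children: a dict child -> probability with reserved key '<stop>'.
    Since the talk parses POS tag sequences, the 'words' here are tags."""
    def sample_children(word, direction):
        children, is_first = [], True
        while True:
            dist = params[(word, direction, is_first)]
            choice, = random.choices(list(dist), weights=list(dist.values()))
            if choice == '<stop>':
                return children              # p(stop | first?, parent, dir)
            children.append(choice)          # p(child | first?, parent, dir)
            is_first = False
    return sample_children
```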

  15. Baseline.

  16. Improvements to the 73% baseline could come from many directions: smoothing/max-ent; parsing words, not tags; bigger FSAs/more nonterminals; LTAG, CCG, etc.; special NP treatment and punctuation handling; discriminative training; ... or: model dependency length?

  17. Modeling Dependency Length. (*When running the parsing algorithm, just multiply in these probabilities at the appropriate time.) [Figure: a tree over words r, a, b, c, d, e, f.] Its score becomes

p′ = p · p(3 | r, a, L) · p(2 | r, b, L) · p(1 | b, c, R) · p(1 | r, d, R) · p(1 | d, e, R) · p(1 | e, f, R)   (DEFICIENT)
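Multiplying these factors in at arc-building time amounts to wrapping the arc scorer; a sketch in log-space (function names are illustrative, supplied elsewhere):

```python
def with_length_model(arc_logprob, length_logprob):
    """Deficient length model of this slide: every dependency also pays
    p(|h - c| | parent tag, child tag, direction). Both arguments are
    assumed to be log-probability functions."""
    def scored(h, c, parent_tag, child_tag):
        direction = 'R' if c > h else 'L'
        return (arc_logprob(h, c, parent_tag, child_tag)
                + length_logprob(abs(h - c), parent_tag, child_tag, direction))
    return scored
```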

  18. Modeling Dependency Length. [Results chart: each model with vs. without the length feature (“+ length”).]

  19. Conclusion (I). Modeling dependency length can cut the runtime of simple models by 26-37%, with effects ranging from -3% to +4% on recall. (The loss in recall is perhaps due to deficient/MLE estimation.)

  20. Going to Extremes. Longer dependencies are less likely. What if we eliminate them completely?

  21. Hard Constraints. Disallow dependencies between words at distance > b. Risk: the best parse may be contrived, or there may be no parse at all! Solution: allow fragments (partial parsing; Hindle, 1990, inter alia). Why not model the sequence of fragments?

  22. From SBG to Vine SBG. An SBG wall ($) has one child: L(λ$) = {ε}, L(ρ$) ⊆ Σ. A vine SBG wall has a sequence of children: L(λ$) = {ε}, L(ρ$) ⊆ Σ⁺.

  23. Building a Vine SBG Parser. Grammar: generates a sequence of trees hanging from $. Parser: recognizes sequences of trees without long dependencies. We need to modify the training data so that the model is consistent with the parser (see the sketch after slides 24-29).

  24. [Figure: a dependency tree from the Penn Treebank for “According to some estimates, the rule changes would cut insider filings by more than a third.”, hanging from $, with each dependency labeled by its length (the longest are 8 and 9).]

  25. [Figure: the same tree with b = 4: every dependency longer than 4 is cut and the orphaned subtree is grafted onto the vine $.]

  26. [Figure: the same transformation with b = 3.]

  27. [Figure: the same transformation with b = 2.]

  28. [Figure: the same transformation with b = 1.]

  29. [Figure: the same transformation with b = 0: every word becomes its own fragment attached to $.]
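The transformation illustrated in slides 24-29 is a one-pass rewrite of the training trees. A minimal sketch, with heads[i] giving the head index of token i and index 0 standing for $:

```python
def graft(heads, b):
    """Break every dependency longer than b and graft the orphaned
    subtree onto the wall $ (index 0). Arcs whose head already is $
    are vine attachments and are exempt from the bound."""
    return [h if h is None or h == 0 or abs(h - i) <= b else 0
            for i, h in enumerate(heads)]

# With b = 0, every word becomes its own fragment on the vine (slide 29).
```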

  30. Observation. • Even for small b, “bunches” can grow to arbitrary size. • But arbitrary center embedding is out. [Examples on slide.]

  31. Vine SBG is Finite-State. Could compile into an FSA and get O(n) parsing! Problem: what’s the grammar constant? EXPONENTIAL. [Figure: after reading “According to some estimates, the rule changes would cut insider ...”, the FSA state must remember that “insider” has no parent yet, that “cut” and “would” can take more children, and that $ can take more children.]

  32. Alternative. Instead, we adapt to the vine case an SBG chart parser, which implicitly shares fragments of stack state, eliminating unnecessary work.

  33. Quadratic Recognition/Parsing. [Diagram: the cubic algorithm’s combination steps, now width-restricted.] • Assembling the goal along the vine: O(n²b) combinations. • Trapezoids: only construct those with k − i ≤ b, so O(n³) combinations become O(nb²). • Triangle-building is restricted in the same way.
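One can emulate the hard bound by filtering the arc scores fed to the eisner() sketch from slide 12: forbid any dependency longer than b unless its head is the wall. Note this only reproduces the constraint's output; the O(nb²) speedup comes from never building the wide trapezoids and triangles in the first place.

```python
def bounded_scores(score, b):
    """Return a copy of the arc-score table in which every dependency
    longer than b is impossible, except arcs from the wall (index 0),
    which form the vine. Feed the result to eisner() to emulate (but
    not speed up) vine parsing with bound b."""
    n, NEG = len(score), float('-inf')
    return [[NEG if (h != 0 and abs(h - c) > b) else score[h][c]
             for c in range(n)]
            for h in range(n)]
```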

  34. [Figure: the b = 4 vine for “According to some estimates, the rule changes would cut insider filings by more than a third.”: fragments rooted at “According”, “,”, “changes”, “would”, “cut”, and “.” hang from $, and every fragment has width ≤ 4.] O(nb) vine construction.

  35. Parsing Algorithm. • Same grammar constant as Eisner and Satta (1999). • O(n³) → O(nb²) runtime. • Includes some overhead (a low-order term) for constructing the vine. • Reality check: is it worth it?

  36. Results: Penn Treebank. [Plot: accuracy and runtime as the bound varies from b = 20 down to b = 1; evaluation against the original ungrafted Treebank, non-punctuation only.]

  37. Results: Chinese Treebank. [Plot: accuracy and runtime as the bound varies from b = 20 down to b = 1; evaluation against the original ungrafted Treebank, non-punctuation only.]

  38. Results: TIGER Corpus. [Plot: accuracy and runtime as the bound varies from b = 20 down to b = 1; evaluation against the original ungrafted Treebank, non-punctuation only.]

  39. Type-Specific Bounds. • b can be specific to the dependency type: e.g., b(V-O) can be longer than b(S-V). • b can be specific to ⟨parent, child, direction⟩, gradually tightened based on training data (see the sketch below).
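Such bounds can be read off the training data, e.g. as (a percentile of) the longest observed length per type; the tree encoding and the percentile knob below are illustrative choices, not the talk's exact recipe:

```python
from collections import defaultdict

def type_specific_bounds(trees, percentile=1.0):
    """trees yields lists of (head_index, child_index, parent_tag,
    child_tag). Returns b for each (parent_tag, child_tag, direction);
    lowering `percentile` below 1.0 'gradually tightens' the bounds."""
    lengths = defaultdict(list)
    for tree in trees:
        for h, c, ptag, ctag in tree:
            direction = 'R' if c > h else 'L'
            lengths[(ptag, ctag, direction)].append(abs(h - c))
    return {key: sorted(ls)[max(0, int(percentile * len(ls)) - 1)]
            for key, ls in lengths.items()}
```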

  40. Results with type-specific bounds: • English: 50% runtime, no loss. • Chinese: 55% runtime, no loss. • German: 44% runtime, 2% loss.

  41. Related Work. Nederhof (2000) surveys finite-state approximation of context-free languages (CFG → FSA). We limit all dependency lengths (not just center-embedding) and derive weights from the Treebank (not by approximation); a chart parser keeps the grammar constant reasonable.

  42. Future Work. • Apply to state-of-the-art parsing models. • Better parameter estimation. • Applications: MT, IE, grammar induction.

  43. Conclusion (II). Dependency length can be a helpful feature in improving the speed and accuracy (or trading off between them) of simple parsing models that consider dependencies.

  44. This Talk in a Nutshell. Length of a dependency ≈ surface distance. [Figure: the example sentence with dependency lengths 3, 1, 1, 1.] • Formal results: a hard bound b on dependency length (i) results in a regular language and (ii) allows O(nb²) parsing. • Empirical results (English, Chinese, German): hard constraints cut runtime in half or more with no accuracy loss (English, Chinese), or by 44% with -2.2% accuracy (German); soft constraints affect accuracy of simple models by -3% to +24% and cut runtime by 25% to 40%.
