
Grammatical processing with LFG and XLE



Presentation Transcript


  1. Advanced QUestion Answering for INTelligence Grammatical processing with LFG and XLE Ron Kaplan ARDA Symposium, August 2004

  2. Layered Architecture for Question Answering
     [Diagram: Text → XLE/LFG Parsing → F-structure → Conceptual semantics → KR Mapping → Target KRR, matched against KR Sources and Assertions; Question → Query → Answers, Explanations, Subqueries → Composed F-Structure Templates → XLE/LFG Generation → Text to user]

  3. Layered Architecture for Question Answering
     [The same layered-architecture diagram, repeated]

  4. • Infrastructure: XLE, MaxEnt models, linear deduction, term rewriting
     • Theories: Lexical Functional Grammar, ambiguity management, Glue Semantics
     • Resources: English Grammar, Glue lexicon, KR mapping
     [Layered-architecture diagram repeated]

  5. Deep analysis matters… if you care about the answer
     Example: A delegation led by Vice President Philips, head of the chemical division, flew to Chicago a week after the incident.
     Question: Who flew to Chicago?
     Candidate answers:
     • division (closest noun), head (next closest), V.P. Philips (next): shallow but wrong
     • delegation (furthest away, but Subject of flew, the "grammatical function"): deep and right

  6. F-structure: localizes arguments ("lexical dependency")
     Was John pleased?
     "John was easy to please": Yes
       [PRED easy(SUBJ, COMP), SUBJ John, COMP [PRED please(SUBJ, OBJ), SUBJ someone, OBJ John]]
     "John was eager to please": Unknown
       [PRED eager(SUBJ, COMP), SUBJ John, COMP [PRED please(SUBJ, OBJ), SUBJ John, OBJ someone]]

  7. Topics
     • Basic LFG architecture
     • Ambiguity management in XLE
     • Pargram project: large-scale grammars
     • Robustness
     • Stochastic disambiguation
     • [Shallow markup]
     • [Semantic interpretation]
     Focus on the language end, not knowledge.

  8. Mapping: LFG & XLE
     [Diagram: the Language ("Tony decided to go.") mapped to Knowledge: Sentence → Tokens/Morphology → Named Entities → LFG Grammar (English, German, etc.) → Parse / Generate → Functional structures, with XLE and a Stochastic Model; XLE: efficient ambiguity management]

  9. Why deep analysis is difficult
     • Languages are hard to describe
       - Meaning depends on complex properties of words and sequences
       - Different languages rely on different properties
       - Errors and disfluencies
     • Languages are hard to compute
       - Expensive to recognize complex patterns
       - Sentences are ambiguous
       - Ambiguities multiply: explosion in time and space

  10. Different patterns code same meaning
     The small children are chasing the dog.
     • English: group, order [tree: S → NP (the small children) V' (are chasing) NP (the dog)]
     • Japanese: group, mark [tree with case particles: kodomotati 'children' + ga (Sbj), tiisai 'small', inu 'dog' + o (Obj), oikaketeiru 'are chasing']

  11. Different patterns code same meaning; LFG theory makes minor adjustments on a universal theme
     The small children are chasing the dog. = chase(small(children), dog)
     • English: group, order [tree as on the previous slide]
     • Japanese: group, mark [case particles ga (Sbj), o (Obj)]
     • Warlpiri: mark only [kurdujarrarlu 'children-Sbj', kapala 'Present', maliki 'dog-Obj', wajilipinyi 'chase', witajarrarlu 'small-Sbj']
     [Shared f-structure: PRED 'chase<Subj, Obj>', TENSE Present, SUBJ [PRED children, MOD small], OBJ [PRED dog]]

  12. Modularity
     Nearly-decomposable LFG architecture: C(onstituent)-structures and F(unctional)-structures related by a piecewise correspondence φ.
     • C-structure: formal encoding of order and grouping [tree: S → NP (John) VP (V likes, NP Mary)]
     • F-structure: formal encoding of grammatical relations
       [PRED 'like<SUBJ,OBJ>', TENSE PRESENT, SUBJ [PRED 'John', NUM SG], OBJ [PRED 'Mary', NUM SG]]

  13. LFG grammar: rules and lexical entries
     • Context-free rules define valid c-structures (trees).
     • Annotations on rules give constraints that corresponding f-structures must satisfy.
     • Satisfiability of constraints determines grammaticality.
     • F-structure is the solution for the constraints (if satisfiable).
     Rules:
       S → NP VP        (↑ SUBJ)=↓   ↑=↓
       VP → V (NP)      ↑=↓   (↑ OBJ)=↓
       NP → (Det) N     ↑=↓   ↑=↓
     Lexical entries:
       N → John    (↑ PRED)='John'   (↑ NUM)=SG
       V → likes   (↑ PRED)='like<SUBJ, OBJ>'   (↑ SUBJ NUM)=SG   (↑ SUBJ PERS)=3
     (A data-structure sketch follows below.)
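The following is a minimal sketch of how annotated rules and lexical entries like those above might be written down as data. Python is used only for illustration; this is not XLE's grammar notation, and the "down" encoding and all names are assumptions made for the example.

```python
# Illustrative only: not XLE's grammar notation. "down" stands for the
# daughter's f-structure (↓); a path such as ("SUBJ",) is an attribute path
# on the mother's f-structure (↑).

# S -> NP VP with (↑ SUBJ)=↓ on NP and ↑=↓ on VP, etc.
RULES = {
    "S":  [("NP", (("SUBJ",), "down")),     # (↑ SUBJ) = ↓
           ("VP", ((), "down"))],           # ↑ = ↓
    "VP": [("V",  ((), "down")),            # ↑ = ↓
           ("NP", (("OBJ",), "down"))],     # (↑ OBJ) = ↓   (the NP is optional)
    "NP": [("N",  ((), "down"))],           # ↑ = ↓
}

# Lexical entries pair a category with equations on ↑.
LEXICON = {
    "John":  ("N", {("PRED",): "'John'", ("NUM",): "SG"}),
    "Mary":  ("N", {("PRED",): "'Mary'", ("NUM",): "SG"}),
    "likes": ("V", {("PRED",): "'like<SUBJ,OBJ>'",
                    ("SUBJ", "NUM"): "SG", ("SUBJ", "PERS"): "3"}),
}

for word, (cat, equations) in LEXICON.items():
    print(word, cat, equations)
```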

  14. Rules as well-formedness conditions
     S → NP VP, with (↑ SUBJ)=↓ annotated on NP and ↑=↓ on VP.
     If * denotes a particular daughter node:
       ↑ : the f-structure of the mother, φ(M(*))
       ↓ : the f-structure of the daughter, φ(*)
     A tree containing S over NP VP is OK if:
     • the f-unit corresponding to the NP node is the SUBJ of the f-unit corresponding to the S node, and
     • the same f-unit corresponds to both the S and VP nodes.
     [Sketch: the S's f-structure contains SUBJ [ ], linked to the NP node]

  15. What's wrong with "They walks"?
     S → NP (↑ SUBJ)=↓   VP ↑=↓
     they: (↑ NUM)=PL    walks: (↑ SUBJ NUM)=SG
     Let f be the (unknown) f-structure of the S. Then (substituting equals for equals):
       Let s be the f-structure of the NP: (f SUBJ)=s and (s NUM)=PL ⇒ (f SUBJ NUM)=PL
       Let v be the f-structure of the VP: f=v and (v SUBJ NUM)=SG ⇒ (f SUBJ NUM)=SG
       (f SUBJ NUM)=PL and (f SUBJ NUM)=SG ⇒ SG=PL ⇒ FALSE
     Inconsistent equations = ungrammatical.
     If a valid inference chain yields FALSE, the premises are unsatisfiable: no f-structure.
     (A toy solver sketch follows below.)
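A minimal sketch of the constraint solving this slide walks through, assuming equations over a single f-structure f with atomic values only (a toy solver, not XLE's):

```python
# A toy solver (not XLE's): instantiate the equations of "They walks" over a
# single f-structure f and look for clashes between atomic values.

def solve(constraints):
    """Build an f-structure from (path, value) equations; None if inconsistent."""
    f = {}
    for path, value in constraints:
        node = f
        for attr in path[:-1]:
            node = node.setdefault(attr, {})
        old = node.get(path[-1])
        if old is not None and old != value:
            return None                      # e.g. SG = PL: FALSE, no f-structure
        node[path[-1]] = value
    return f

# "They walk": they contributes (f SUBJ NUM)=PL, walk adds no NUM demand.
print(solve([(("SUBJ", "NUM"), "PL")]))                           # {'SUBJ': {'NUM': 'PL'}}

# "They walks": they gives (f SUBJ NUM)=PL, walks gives (f SUBJ NUM)=SG.
print(solve([(("SUBJ", "NUM"), "PL"), (("SUBJ", "NUM"), "SG")]))  # None (ungrammatical)
```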

  16. English and Japanese
     English: one NP before the verb, one after; Subject and Object
       S → NP (↑ SUBJ)=↓   V ↑=↓   NP (↑ OBJ)=↓
     Japanese: any number of NPs before the verb; the particle on each defines its grammatical function
       S → NP* (↑ (↓ GF))=↓   V ↑=↓
       ga: (↑ GF)=SUBJ    o: (↑ GF)=OBJ

  17. Warlpiri: discontinuous constituents
     Like Japanese: any number of NPs, and the particle on each defines its grammatical function
       S → … NP* … (↑ (↓ GF))=↓
       rlu: (↑ GF)=SUBJ    ki: (↑ GF)=OBJ
     Unlike Japanese, the head Noun is optional in NP:
       NP → A* (↑ MOD)=↓   N ↑=↓
     [Example: kurdujarrarlu 'children-Sbj', kapala 'Present', maliki 'dog-Obj', wajilipinyi 'chase', witajarrarlu 'small-Sbj'; f-structure: PRED 'chase<Subj, Obj>', TENSE Present, SUBJ [PRED children, MOD small], OBJ [PRED dog]]

  18. English: discontinuity in questions
     Who did Mary see?   Who did Bill think Mary saw?   Who did Bill think saw Mary?
     Who is understood as subject/object of a distant verb. Uncertainty: which function of which verb?
     S' → NP S    (↑ Q)=↓   ↑=↓   (↑ COMP* SUBJ|OBJ)=↓
     [F-structure for "Who did Bill think Mary saw": Q Who, PRED think<SUBJ, COMP>, TENSE past, COMP [PRED see<SUBJ,OBJ>, TENSE past, SUBJ Mary, OBJ …]; the OBJ of the COMP is identified with Q]
     (A path-matching sketch follows below.)
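A minimal sketch of resolving the functional-uncertainty path COMP* SUBJ|OBJ: walk zero or more COMP links, then pick up a SUBJ or OBJ. The nested-dict representation and helper name are assumptions for illustration, not XLE's representation.

```python
# Enumerate every f-structure reachable via the regular path COMP* {SUBJ|OBJ}.

def comp_star_gf(fstruct):
    """Yield (path, value) pairs for COMP* followed by SUBJ or OBJ."""
    node, path = fstruct, []
    while node is not None:
        for gf in ("SUBJ", "OBJ"):
            if gf in node:
                yield path + [gf], node[gf]
        path = path + ["COMP"]
        node = node.get("COMP")

# "Who did Bill think Mary saw?"
f = {"PRED": "think<SUBJ,COMP>", "SUBJ": {"PRED": "Bill"},
     "COMP": {"PRED": "see<SUBJ,OBJ>", "SUBJ": {"PRED": "Mary"}, "OBJ": "WHO?"}}

for path, value in comp_star_gf(f):
    print(path, value)
# ['SUBJ'] {'PRED': 'Bill'}
# ['COMP', 'SUBJ'] {'PRED': 'Mary'}
# ['COMP', 'OBJ'] WHO?   <- a candidate position for the fronted "Who"
```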

  19. Summary: Lexical Functional Grammar (Kaplan and Bresnan, 1982)
     • Modular: c-structure and f-structure in correspondence
     • Mathematically simple, computationally transparent
       - combination of context-free grammar and quantifier-free equality theory
       - closed under composition with regular relations: finite-state morphology
     • Grammatical functions are universal primitives
       - Subject and Object are expressed differently in different languages (English: Subject is the first NP; Japanese: Subject has ga)
       - but Subject and Object behave similarly in all languages (Active to Passive: Object becomes Subject; English: move words, Japanese: move ga)
     • Adopted by a world-wide community of linguists
       - large literature: papers, (text)books, conferences; reference theory
       - (relatively) easy to describe all languages
       - linguists contribute to practical computation
     • Stable: only minor changes in 25 years

  20. Efficient computation with LFG grammars: Ambiguity Management in XLE

  21. Computation challenge: pervasive ambiguity
     At every level (tokenization, morphology, syntax, semantics, knowledge):
     • I like Jan. |Jan|.| or |Jan.|.| (sentence end or abbreviation)?
     • walks: Noun or Verb?
     • untieable knot: (untie)able or un(tieable)?
     • bank: river or financial?
     • The duck is ready to eat. Cooked or hungry?
     • Every proposer wants an award. The same award or each their own?
     • The sheet broke the beam. Atoms or photons?

  22. Coverage vs. Ambiguity
     I fell in the park. + I know the girl in the park. ⇒ I see the girl in the park.
     (Once the grammar covers both PP attachments, verb-attached and noun-attached, the third sentence gets both analyses: more coverage means more ambiguity.)

  23. Ambiguity can be explosive: if alternatives multiply within or across components (tokenize, morphology, syntax, semantics, knowledge)…

  24. Computational consequences of ambiguity
     • Serious problem for computational systems
       - broad-coverage, hand-written grammars frequently produce thousands of analyses, sometimes millions
       - machine-learned grammars easily produce hundreds of thousands of analyses if allowed to parse to completion
     • Three approaches to ambiguity management:
       - Prune: block unlikely analysis paths early
       - Procrastinate: do not expand alternative analysis paths until something else requires them (also known as underspecification)
       - Manage: compact representation and computation of all possible analyses

  25. Pruning ⇒ premature disambiguation
     Conventional approach: use statistics and heuristics to kill analyses as soon as possible, at each stage (tokenize, morphology, syntax, semantics, knowledge).
     Oops: strong constraints downstream may reject the so-far-best (= only surviving) option.
     Fast computation, wrong result.

  26. Procrastination: passing the buck
     Chunk parsing as an example:
     • Collect noun groups, verb groups, PP groups
     • Leave it to later processing to put these together
     • Some combinations are nonsense
     Later processing must either:
     • call (another) parser to check constraints,
     • have its own model of constraints (= a grammar), or
     • solve constraints that the chunker includes with its output.

  27. Computational complexity of LFG
     • LFG is a simple combination of two simple theories:
       - context-free grammars for trees
       - quantifier-free theory of equality for f-structures
     • Both theories are easy to compute: cubic CFG parsing, linear equation solving
     • But the combination is difficult: the parsing problem is NP-complete
       - exponential/intractable in the worst case (but computable, unlike some other linguistic theories)
     • Can we avoid the worst case?

  28. Some syntactic dependencies
     • Local dependencies: These dogs / *This dogs (agreement)
     • Nested dependencies: The dogs [in the park] bark (agreement)
     • Cross-serial dependencies: Jan Piet Marie zag helpen zwemmen (predicate/argument map): See(Jan, help(Piet, swim(Marie)))
     • Long-distance dependencies: The girl who John says that Bob believes … likes Henry left. Left(girl), Says(John, believes(Bob, (… likes(girl, Henry))))

  29. Expressiveness vs. complexity: the Chomsky Hierarchy
     [Chart: recognition complexity by grammar class, where n is the length of the sentence: regular = linear, context-free = cubic, richer classes = exponential/intractable]
     But languages have mostly local and nested dependencies... so (mostly) cubic performance should be possible.

  30. NP-complete problems
     • Problems that can be solved by a nondeterministic Turing machine in polynomial time
     • General characterization: generate and test
       - lots of candidate solutions that need to be verified for correctness (n elements, 2^n candidates)
       - every candidate is easy to confirm or disconfirm
     • A nondeterministic TM has an oracle that provides only the right candidates to test; it doesn't search.
     • A deterministic TM has no oracle, so it must test all (exponentially many) candidates.

  31. Polynomial search problems
     • Subparts of a candidate are independent of other parts: the outcome is not influenced by other parts (context-free)
     • The same independent subparts appear in many candidates
     • We can (easily) determine that this is the case
     • Consequence: test subparts independently of context, share results (see the sketch below)
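This consequence is what chart parsers exploit. The following is a minimal, generic sketch (CKY-style recognition with memoization, not XLE; the toy grammar and lexicon are invented for the example) of testing each subpart once and sharing the result across all candidate parses:

```python
# Each (i, j) span is analyzed exactly once and its categories are shared by
# every candidate parse that contains that span.
from functools import lru_cache

GRAMMAR = {                       # binary rules: (B, C) -> A
    ("NP", "VP"): "S",
    ("V", "NP"): "VP",
    ("Det", "N"): "NP",
}
LEX = {"the": "Det", "dog": "N", "girl": "N", "saw": "V"}

def categories(words):
    @lru_cache(maxsize=None)      # memoization = the chart
    def cats(i, j):
        if j - i == 1:
            return frozenset([LEX[words[i]]])
        found = set()
        for k in range(i + 1, j):
            for b in cats(i, k):
                for c in cats(k, j):
                    if (b, c) in GRAMMAR:
                        found.add(GRAMMAR[(b, c)])
        return frozenset(found)
    return cats(0, len(words))

print(categories(("the", "girl", "saw", "the", "dog")))   # frozenset({'S'})
```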

  32. Why is LFG parsing NP-complete? A classic generate-and-test search problem
     • Exponentially many tree candidates
       - a CFG chart parser quickly produces a packed representation of all trees
       - but a CFG can be exponentially ambiguous
     • Each tree must be tested for f-structure satisfiability
       - Boolean combinations of per-tree constraints
       - e.g. English base verbs (not 3rd singular): (↑ SUBJ NUM)≠SG ∨ (↑ SUBJ PERS)≠3. Disjunction!
     • Exponentially many exponential problems

  33. XLE ambiguity management: the intuition
     The sheep saw the fish. How many sheep? How many fish?
     Options multiplied out:
       The sheep-sg saw the fish-sg. / The sheep-pl saw the fish-sg. / The sheep-sg saw the fish-pl. / The sheep-pl saw the fish-pl.
     Options packed:
       The sheep {sg|pl} saw the fish {sg|pl}
     In principle, a verb might require agreement of Subject and Object: you have to check it out. But English doesn't do that: the subparts are independent.
     The packed representation is a "free choice" system:
     • encodes all dependencies without loss of information
     • common items represented, computed once
     • key to practical efficiency
     (See the sketch below.)
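A minimal sketch of the free-choice idea: independent choice-sets are stored once, analyses are counted without enumeration, and enumeration happens only on demand. The structure and names are illustrative, not XLE's internal format.

```python
from itertools import product
from math import prod

packed = {                                   # shared skeleton + local choices
    "SUBJ NUM": ["SG", "PL"],                # the sheep: sg or pl
    "OBJ NUM":  ["SG", "PL"],                # the fish: sg or pl
}

# Count analyses without enumerating them: product of choice-set sizes.
print(prod(len(v) for v in packed.values()))          # 4

# Enumerate only if/when something downstream really needs all of them.
for combo in product(*packed.values()):
    print(dict(zip(packed.keys(), combo)))
```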

  34. Dependent choices
     Again, packing avoids duplication... but here it's wrong: it doesn't encode all the dependencies, and the choices are not free.
     Das Mädchen sah die Katze (both NPs are ambiguous between nom and acc):
       Das Mädchen-nom sah die Katze-nom: bad
       Das Mädchen-nom sah die Katze-acc: the girl saw the cat
       Das Mädchen-acc sah die Katze-nom: the cat saw the girl
       Das Mädchen-acc sah die Katze-acc: bad
     Another case: Who do you want to succeed?
       I want to succeed John (want intransitive, succeed transitive)
       I want John to succeed (want transitive, succeed intransitive)

  35. Solution: label dependent choices
     • Label each choice with a distinct Boolean variable: p: Mädchen nom, ¬p: Mädchen acc; q: Katze nom, ¬q: Katze acc
     • Record the acceptable combinations as a Boolean expression Φ, here Φ = (p ∧ ¬q) ∨ (¬p ∧ q):
       p ∧ q: bad; p ∧ ¬q: the girl saw the cat; ¬p ∧ q: the cat saw the girl; ¬p ∧ ¬q: bad
     • Each analysis corresponds to a satisfying truth-value assignment (free choice from the true lines of Φ's truth table)
     (See the sketch below.)
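A minimal sketch of the labeling step for the German example: p and q pick the cases, and phi records which combinations are acceptable. The function and variable names are assumptions for illustration.

```python
from itertools import product

def phi(p, q):
    # exactly one of the two NPs is nominative, the other accusative
    return (p and not q) or (not p and q)

CASE = {True: "nom", False: "acc"}
for p, q in product([True, False], repeat=2):
    mark = "ok " if phi(p, q) else "bad"
    print(mark, "Mädchen-" + CASE[p], "...", "Katze-" + CASE[q])
# Only the two "ok" lines (nom-acc and acc-nom) are satisfying assignments,
# i.e. the readings "the girl saw the cat" and "the cat saw the girl".
```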

  36. Boolean satisfiability
     Can solve Boolean formulas by multiplying out into Disjunctive Normal Form:
       (a ∨ b) ∧ x ∧ (c ∨ d)  ⇒  (a∧x∧c) ∨ (a∧x∧d) ∨ (b∧x∧c) ∨ (b∧x∧d)
     • Produces simple conjunctions of literal propositions ("facts", i.e. equations)
     • Easy checks for satisfiability: if a∧d ⊨ FALSE, replace any conjunction containing a and d by FALSE
     • But the disjunctive structure blows up before fact processing
     • Individual facts are replicated (and re-processed): exponential

  37. Alternative: "contexted" normal form
     (a ∨ b) ∧ x ∧ (c ∨ d)  ⇒  (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)
     Produce a flat conjunction of contexted facts; each conjunct is context → fact.

  38. Alternative: "contexted" normal form (continued)
     (a ∨ b) ∧ x ∧ (c ∨ d)  ⇒  (p→a) ∧ (¬p→b) ∧ x ∧ (q→c) ∧ (¬q→d)
     No blow-up, no duplicates:
     • each fact is labeled with its position in the disjunctive structure; the Boolean hierarchy is discarded
     • each fact appears, and can be processed, once
     • Claims: checks for satisfiability are still easy; facts can be processed first, disjunctions deferred

  39. Ambiguity-enabled inference (by trivial logic)
     If φ ∧ ψ ⊢ χ is a rule of inference, then so is [C1→φ] ∧ [C2→ψ] ⊢ [(C1∧C2)→χ]. Valid for any theory.
     E.g. substitution of equals for equals: x=y ∧ φ ⊢ φ[x/y] is a rule of inference; therefore (C1→x=y) ∧ (C2→φ) ⊢ (C1∧C2)→φ[x/y].
     A sound and complete method (Maxwell & Kaplan, 1987, 1991): conversion to a logically equivalent contexted form.
     Lemma: φ ∨ ψ is satisfiable iff (p→φ) ∧ (¬p→ψ) is (p a new Boolean variable).
     Proof sketch: (If) If φ is true, let p be true, in which case (p→φ) ∧ (¬p→ψ) is true. (Only if) If p is true, then φ is true, in which case φ ∨ ψ is true.

  40. Test for satisfiability
     Suppose R→FALSE is deduced from a contexted formula Φ (e.g. R→SG=PL ⇒ R→FALSE). Then Φ is satisfiable only under assignments that make R false; R is called a "nogood" context.
     • Perform all fact-inferences, conjoining contexts
     • If FALSE is inferred, add its context to the nogoods
     • Solve the conjunction of nogoods (find assignments that falsify every nogood context)
       - Boolean satisfiability: exponential in the nogood context-Booleans
       - Independent facts: no FALSE, no nogoods
     • Implicitly notices independence / context-freeness
     (See the sketch below.)
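A toy version of this procedure (not XLE's algorithm), run on "The sheep walks": facts are (context, equation) pairs, a context being a partial truth assignment over the choice variables.

```python
from itertools import combinations

facts = [
    ({},           ("SUBJ NUM", "SG")),   # from "walks": holds in the True context
    ({"p": True},  ("SUBJ NUM", "SG")),   # "sheep" read as singular
    ({"p": False}, ("SUBJ NUM", "PL")),   # "sheep" read as plural
]

def conjoin(c1, c2):
    """Conjoin two contexts; None if they are contradictory."""
    merged = dict(c1)
    for var, val in c2.items():
        if merged.get(var, val) != val:
            return None
        merged[var] = val
    return merged

nogoods = []
for (c1, (path1, v1)), (c2, (path2, v2)) in combinations(facts, 2):
    ctx = conjoin(c1, c2)
    if ctx is not None and path1 == path2 and v1 != v2:
        nogoods.append(ctx)               # this combination of choices derives FALSE

print(nogoods)                            # [{'p': False}]: plural "sheep" is a nogood

# The surviving analyses are the assignments that falsify every nogood.
solutions = []
for val in (True, False):
    assignment = {"p": val}
    if not any(all(assignment[k] == v for k, v in ng.items()) for ng in nogoods):
        solutions.append(assignment)
print(solutions)                          # [{'p': True}]: only one sheep
```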

  41. Example 1
     "They walk"
     • No disjunction: all facts are in the default "True" context
     • No change to inference: T→(f SUBJ NUM)=SG ∧ T→(f SUBJ NUM)=SG ⊢ T→SG=SG, which reduces to (f SUBJ NUM)=SG ∧ (f SUBJ NUM)=SG ⊢ SG=SG
     "They walks"
     • No disjunction: all facts still in the default "True" context
     • No change to inference: T→(f SUBJ NUM)=PL ∧ T→(f SUBJ NUM)=SG ⊢ T→PL=SG ⊢ T→FALSE
     • Satisfiable only if ¬T, so unsatisfiable

  42. Example 2
     "The sheep walks"
     • Disjunction of the NUM feature from sheep: (f SUBJ NUM)=SG ∨ (f SUBJ NUM)=PL
     • Contexted facts: p→(f SUBJ NUM)=SG ∧ ¬p→(f SUBJ NUM)=PL ∧ (f SUBJ NUM)=SG (from walks)
     • Inferences:
       p→(f SUBJ NUM)=SG ∧ (f SUBJ NUM)=SG ⊢ p→SG=SG
       ¬p→(f SUBJ NUM)=PL ∧ (f SUBJ NUM)=SG ⊢ ¬p→PL=SG ⊢ ¬p→FALSE
     • ¬p→FALSE is true iff ¬p is false, i.e. iff p is true.
     • Conclusion: the sentence is grammatical in context p: only 1 sheep.

  43. Contexts and packing: index by facts
     The sheep saw the fish.
     [Packed f-structure: SUBJ NUM = {p→SG, ¬p→PL}; OBJ NUM = {q→SG, ¬q→PL}]
     Contexted unification ≈ concatenation, when choices don't interact.

  44. Compare: DNF unification
     The sheep saw the fish.
     ([SUBJ [NUM SG]] ∨ [SUBJ [NUM PL]]) unified with ([OBJ [NUM SG]] ∨ [OBJ [NUM PL]]) gives the DNF cross-product of alternatives:
       [SUBJ [NUM SG], OBJ [NUM SG]] ∨ [SUBJ [NUM SG], OBJ [NUM PL]] ∨ [SUBJ [NUM PL], OBJ [NUM SG]] ∨ [SUBJ [NUM PL], OBJ [NUM PL]]
     Exponential.

  45. The XLE wager (for real sentences of real languages)
     • Alternatives from distant choice-sets can be freely chosen without affecting satisfiability
     • FALSE is unlikely to appear
     • The contexted method optimizes for independence: no FALSE ⇒ no nogoods ⇒ nothing to solve
     Bet: the worst case 2^n reduces to k·2^m, where m << n.

  46. Ambiguity-enabled inference: choice-logic common to all modules
     If φ ∧ ψ ⊢ χ is a rule of inference, then so is C1→φ ∧ C2→ψ ⊢ (C1∧C2)→χ.
     1. Substitution of equals for equals (e.g. for LFG syntax): x=y ∧ φ ⊢ φ[x/y]; therefore C1→x=y ∧ C2→φ ⊢ (C1∧C2)→φ[x/y]
     2. Reasoning: Cause(x,y) ∧ Prevent(y,z) ⊢ Prevent(x,z); therefore C1→Cause(x,y) ∧ C2→Prevent(y,z) ⊢ (C1∧C2)→Prevent(x,z)
     3. Log-linear disambiguation: Prop1(x) ∧ Prop2(x) ⊢ Count(Feature_n); therefore C1→Prop1(x) ∧ C2→Prop2(x) ⊢ (C1∧C2)→Count(Feature_n)
     Ambiguity-enabled components propagate choices; they can defer choosing and enumerating.

  47. Summary: contexted constraint satisfaction
     • Packed
       - facts not duplicated
       - facts not hidden in Boolean structure
     • Efficient
       - deductions not duplicated
       - fast fact processing (e.g. equality) can prune slow disjunctive processing
       - optimized for independence
     • General and simple
       - applies to any deductive system, uniform across modules
       - not limited to special-case disjunctions
       - mathematically trivial
     • Compositional free-choice system
       - enumeration of (exponentially many?) valid solutions deferred across module boundaries
       - enables backtrack-free, linear-time, on-demand enumeration
       - enables packed refinement by cross-module constraints: new nogoods

  48. The remaining exponential • Contexted constraint satisfaction (typically) avoids the Boolean explosion in solving f-structure constraints for single trees • How can we suppress tree enumeration? (and still determine satisfiability)

  49. Ordering strategy: easy things first
     • Do all c-structure processing before any f-structure processing
     • The chart is a free-choice representation: it guarantees valid trees
     • Only produce/solve f-structure constraints for constituents in complete, well-formed trees
     [NB: interleaved, bottom-up pruning is a bad idea]
     Bets on inconsistency, not independence.

  50. Asking the right question
     • How can we make it faster?
       - a more efficient unifier: undoable operations, better indexing, clever data structures, compiling
       - reordering for more effective pruning
     • Why not cubic?
       - Intuitively, the problem isn't that hard.
       - GPSG: natural language is nearly context-free.
       - Surely for context-free-equivalent grammars!
