
Probabilistic Parsing


Presentation Transcript


  1. Probabilistic Parsing CSPP 56553 Artificial Intelligence February 25, 2004

  2. Roadmap • Probabilistic CFGs • Handling ambiguity – more likely analyses • Adding probabilities • Grammar • Issues with probabilities • Resolving issues • Lexicalized grammars • Independence assumptions

  3. Representation: Context-free Grammars • CFGs: 4-tuple • A set of terminal symbols: Σ • A set of non-terminal symbols: N • A set of productions P of the form A -> α, where A is a non-terminal and α ∈ (Σ ∪ N)* • A designated start symbol S • The language L = {w | w ∈ Σ* and S =>* w}, where S =>* w means S derives w by some sequence of rule applications

  4. Representation: Context-free Grammars • Partial example • Σ: the, cat, dog, bit, bites, man • N: S, NP, VP, Det, Nom, N, V • P: S -> NP VP; NP -> Det Nom; Nom -> N Nom | N; VP -> V NP • [Parse tree for "The dog bit the man": S -> NP VP; NP -> Det Nom -> "The dog"; VP -> V NP -> "bit the man"]
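A grammar this small can be written out directly as data. A minimal Python sketch (the dictionary encoding is a hypothetical choice for illustration, not anything from the course):

```python
# Toy CFG as plain data (hypothetical encoding). Productions map each
# non-terminal to the list of right-hand sides it can expand to; symbols
# not appearing as keys are terminals.
cfg = {
    "terminals": {"the", "cat", "dog", "bit", "bites", "man"},
    "start": "S",
    "productions": {
        "S":   [["NP", "VP"]],
        "NP":  [["Det", "Nom"]],
        "Nom": [["N", "Nom"], ["N"]],
        "VP":  [["V", "NP"]],
        "Det": [["the"]],
        "N":   [["cat"], ["dog"], ["man"]],
        "V":   [["bit"], ["bites"]],
    },
}
```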

  5. Parsing Goals • Accepting: • Legal string in language? • Formally: rigid • Practically: degrees of acceptability • Analysis • What structure produced the string? • Produce one (or all) parse trees for the string

  6. Parsing Search Strategies • Top-down constraints: • All analyses must start with start symbol: S • Successively expand non-terminals with RHS • Must match surface string • Bottom-up constraints: • Analyses start from surface string • Identify POS • Match substring of ply with RHS to LHS • Must ultimately reach S

  7. Integrating Strategies • Left-corner parsing: top-down parsing with bottom-up constraints • Begin at the start symbol • Apply a depth-first search strategy • Expand the leftmost non-terminal • The parser does not consider a rule unless the current input word can be the first word on the left edge of some derivation from that rule • Tabulate all left-corners for each non-terminal
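The left-corner table is a fixpoint computation over the grammar. A sketch reusing the dictionary encoding above (function name hypothetical):

```python
def left_corners(productions):
    """For each non-terminal, collect every symbol that can begin one of
    its derivations (the transitive left-corner relation)."""
    corners = {nt: {rhs[0] for rhs in rhss if rhs}
               for nt, rhss in productions.items()}
    changed = True
    while changed:                  # propagate corners until nothing new
        changed = False
        for firsts in corners.values():
            for sym in list(firsts):
                extra = corners.get(sym, set()) - firsts
                if extra:
                    firsts |= extra
                    changed = True
    return corners
```

For the toy grammar, left_corners(cfg["productions"])["S"] contains NP, Det, and "the", so a top-down expansion of S is considered only when the current input word can connect to it this way.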

  8. Issues • Left recursion • If the first symbol of a rule's RHS is the rule's own LHS, top-down expansion can recurse forever without reaching a terminal node • The grammar can be rewritten to remove left recursion (see the sketch below) • Ambiguity: pervasive (costly) • Lexical (POS) & structural: attachment, coordination, NP bracketing • Repeated subtree parsing: the same subtrees are re-built alongside other, failing analyses
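The standard rewrite turns direct left recursion into right recursion using a fresh helper non-terminal. A sketch under the same list encoding (the primed-name convention is an assumption):

```python
def remove_direct_left_recursion(nt, rhss):
    """Rewrite A -> A a | b as A -> b A' and A' -> a A' | epsilon,
    which derives the same strings without left recursion."""
    rec  = [rhs[1:] for rhs in rhss if rhs and rhs[0] == nt]     # "a" parts
    base = [rhs for rhs in rhss if not rhs or rhs[0] != nt]      # "b" parts
    if not rec:
        return {nt: rhss}                 # no direct left recursion
    prime = nt + "'"
    return {
        nt:    [rhs + [prime] for rhs in base],
        prime: [alpha + [prime] for alpha in rec] + [[]],        # [] = epsilon
    }
```

For example, NP -> NP PP | Det N becomes NP -> Det N NP' with NP' -> PP NP' | ε.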

  9. Earley Parsing • Avoid repeated work/recursion problem • Dynamic programming • Store partial parses in “chart” • Compactly encodes ambiguity • O(N^3) • Chart entries: • Subtree for a single grammar rule • Progress in completing subtree • Position of subtree wrt input

  10. Earley Algorithm • Uses dynamic programming to do parallel top-down search in (worst case) O(N^3) time • First, a left-to-right pass fills out a chart with N+1 state sets • Think of chart entries as sitting between words in the input string, keeping track of states of the parse at these positions • For each word position, the chart contains the set of states representing all partial parse trees generated to date; e.g. chart[0] contains all partial parse trees generated at the beginning of the sentence
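The recognizer fits in a few dozen lines. Below is a compact sketch (not the course's implementation); it reuses the dictionary grammar encoding from earlier and assumes a grammar with no epsilon rules:

```python
from collections import deque

def earley_recognize(words, productions, start="S"):
    """Earley recognizer sketch. A state is (lhs, rhs, dot, origin);
    chart[i] holds states whose dot sits at input position i."""
    nonterminals = set(productions)
    n = len(words)
    chart = [set() for _ in range(n + 1)]
    agenda = [deque() for _ in range(n + 1)]

    def add(i, state):
        if state not in chart[i]:
            chart[i].add(state)
            agenda[i].append(state)

    for rhs in productions[start]:
        add(0, (start, tuple(rhs), 0, 0))

    for i in range(n + 1):
        while agenda[i]:
            lhs, rhs, dot, origin = agenda[i].popleft()
            if dot == len(rhs):                              # completer
                for l2, r2, d2, o2 in list(chart[origin]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        add(i, (l2, r2, d2 + 1, o2))
            elif rhs[dot] in nonterminals:                   # predictor
                for r2 in productions[rhs[dot]]:
                    add(i, (rhs[dot], tuple(r2), 0, i))
            elif i < n and rhs[dot] == words[i]:             # scanner
                add(i + 1, (lhs, rhs, dot + 1, origin))

    return any((start, tuple(rhs), len(rhs), 0) in chart[n]
               for rhs in productions[start])
```

earley_recognize("the dog bit the man".split(), cfg["productions"]) returns True under the toy grammar above.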

  11. Representation: Probabilistic Context-free Grammars • PCFGs: 5-tuple • A set of terminal symbols: Σ • A set of non-terminal symbols: N • A set of productions P of the form A -> α, where A is a non-terminal and α ∈ (Σ ∪ N)* • A designated start symbol S • A function D assigning a probability to each rule • The language L = {w | w ∈ Σ* and S =>* w}, where S =>* w means S derives w by some sequence of rule applications

  12. Parse Ambiguity • Two parse trees for "I saw the man with the duck" • T1: the PP attaches inside the object NP (S -> NP VP; VP -> V NP; NP -> NP PP) • T2: the PP attaches to the VP (S -> NP VP; VP -> V NP PP)

  13. Small Probabilistic Grammar • S -> NP VP 0.85 • S -> S conj S 0.15 • NP -> Det N 0.65 • NP -> N 0.20 • NP -> NP PP 0.10 • NP -> Det Adj N 0.05 • VP -> V NP 0.45 • VP -> V NP PP 0.45 • VP -> V 0.10 • PP -> P NP 1.0
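The same grammar as data, with a sanity check that each non-terminal's expansions sum to 1. The NP -> N probability of 0.20 is inferred from the totals and from the computation on the next slide; the rest is read off the table:

```python
# PCFG: each right-hand side carries its probability.
pcfg = {
    "S":  [(["NP", "VP"], 0.85), (["S", "conj", "S"], 0.15)],
    "NP": [(["Det", "N"], 0.65), (["N"], 0.20),
           (["NP", "PP"], 0.10), (["Det", "Adj", "N"], 0.05)],
    "VP": [(["V", "NP"], 0.45), (["V", "NP", "PP"], 0.45), (["V"], 0.10)],
    "PP": [(["P", "NP"], 1.0)],
}

# A proper PCFG is a probability distribution over expansions of each LHS.
for lhs, rhss in pcfg.items():
    total = sum(p for _, p in rhss)
    assert abs(total - 1.0) < 1e-9, f"{lhs} expansions sum to {total}"
```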

  14. Parse Probabilities • Notation: T(ree), S(entence), n(ode), R(ule) • P(T) = Π p(r(n)), the product over all nodes n in T of the probability of the rule r(n) used at n • T1 = 0.85*0.2*0.1*0.65*1*0.65 = 0.007 • T2 = 0.85*0.2*0.45*0.05*0.65*1*0.65 = 0.003 • Select T1, the more probable tree • Best systems achieve 92-93% accuracy
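Selecting an analysis is just computing P(T) for each candidate tree and keeping the argmax. A sketch with a hypothetical tree encoding; preterminal rules such as ("Det", ("the",)) also need entries (probability 1.0 if the tagging is taken as given):

```python
from math import prod

def tree_prob(tree, rule_probs):
    """P(T) = product of the probabilities of the rules used at each node.
    tree: (label, [children]) with terminal children as plain strings;
    rule_probs maps (lhs, child-label tuple) -> probability."""
    label, children = tree
    child_labels = tuple(c[0] if isinstance(c, tuple) else c
                         for c in children)
    p = rule_probs[(label, child_labels)]
    return p * prod(tree_prob(c, rule_probs) for c in children
                    if isinstance(c, tuple))
```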

  15. Issues with PCFGs • Non-local dependencies • Rules are context-free; language isn't • Example: subject vs non-subject NPs • Subjects are ~90% pronouns (Switchboard) • NP -> Pron vs NP -> Det Nom: the rule probability cannot see whether the NP is a subject • Lexical context: verb subcategorization • "send NP PP" vs "saw NP PP" • One approach: lexicalization

  16. Probabilistic Lexicalized CFGs • Key notion: "head" • Each non-terminal is associated with a lexical head • E.g. the verb heads the verb phrase, the noun heads the noun phrase • Each rule must identify one RHS element as the head • Heads propagate up the tree • Conceptually like adding one rule per head value • VP(dumped) -> VBD(dumped) NP(sacks) PP(into) • VP(dumped) -> VBD(dumped) NP(cats) PP(into)
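Head propagation can be written as one recursive pass that copies the head word up from the designated head daughter. A sketch; the tuple encoding and the head-daughter table are assumptions for illustration:

```python
def percolate_heads(tree, head_index):
    """Annotate every node with its lexical head.
    tree: (label, children), where children is a word string (preterminal)
    or a list of subtrees; head_index maps (label, child-label tuple) to
    the position of the head daughter. Returns (label, head, children)."""
    label, children = tree
    if isinstance(children, str):        # preterminal: the word is the head
        return (label, children, children)
    kids = [percolate_heads(c, head_index) for c in children]
    head = kids[head_index[(label, tuple(k[0] for k in kids))]][1]
    return (label, head, kids)
```

With head_index[("VP", ("VBD", "NP", "PP"))] = 0, the VP over "dumped sacks into a bin" receives the head "dumped", as in the tree on the next slide.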

  17. PLCFG with head features • [Parse tree for "Workers dumped sacks into a bin", heads in parentheses: S(dumped) -> NP(workers) VP(dumped); NP(workers) -> NNS(workers); VP(dumped) -> VBD(dumped) NP(sacks) PP(into); NP(sacks) -> NNS(sacks); PP(into) -> P(into) NP(bin); NP(bin) -> DT(a) NN(bin)]

  18. PLCFGs • Issue: too many rules • No corpus has enough examples to estimate them all • (Partial) solution: independence assumptions • Condition the rule probability on the category of the LHS and its head • Condition the head on the category of the LHS and on the parent's head
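These conditioned probabilities are estimated from treebank counts by relative frequency, e.g. P(LHS -> RHS | LHS, head) ≈ Count(LHS -> RHS, head) / Count(LHS, head). A sketch (all names hypothetical):

```python
from collections import Counter

rule_head_counts = Counter()    # Count(LHS -> RHS, head)
lhs_head_counts = Counter()     # Count(LHS, head)

def observe(lhs, rhs, head):
    """Tally one rule application seen in a treebank tree."""
    rule_head_counts[(lhs, tuple(rhs), head)] += 1
    lhs_head_counts[(lhs, head)] += 1

def p_rule_given_head(lhs, rhs, head):
    """Relative-frequency estimate of P(LHS -> RHS | LHS, head)."""
    denom = lhs_head_counts[(lhs, head)]
    return rule_head_counts[(lhs, tuple(rhs), head)] / denom if denom else 0.0
```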

  19. Disambiguation Example • [Parse tree for "Workers dumped sacks into a bin", as on the previous slide, with the competing attachment of PP(into) shown: VP(dumped) -> VBD(dumped) NP(sacks) PP(into) vs NP(sacks) -> NP(sacks) PP(into)]

  20. Disambiguation Example II

  21. Evaluation • Compare to a "Gold Standard" manual parse • A constituent is correct if it has the same start, end, and label as one in the gold parse • Three measures: • Labeled Recall = (# correct constituents in candidate parse) / (# constituents in treebank parse) • Labeled Precision = (# correct constituents in candidate parse) / (# total constituents in candidate parse) • Cross-brackets: counts spans bracketed as (A (B C)) in one parse but ((A B) C) in the other • Standard results: ~90%, ~90%, ~1%
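Once each parse is flattened to a set of labeled spans, scoring is set intersection. A sketch assuming constituents are encoded as (label, start, end) triples:

```python
def parseval(candidate, gold):
    """Labeled precision and recall over (label, start, end) span sets."""
    correct = len(candidate & gold)
    precision = correct / len(candidate) if candidate else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall
```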

  22. Dependency Grammars • Pure lexical dependency • Relate words by binary syntactic relations • No constituency or phrase-structure rules • Why? • Better for languages with flexible word order • Abstracts away from surface form/order • A root node plus one node per word • Linked by dependency relations drawn from a fixed set • E.g. subj, obj, dat, loc, attr, mode, comp, …

  23. Dependency Grammar Example • "Workers dumped sacks into a bin" • <Root> -> dumped (main) • dumped -> workers (subj) • dumped -> sacks (obj) • dumped -> into (loc) • into -> bin (pcomp) • bin -> a (det)
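The same analysis written as (head, relation, dependent) triples; the encoding is a hypothetical sketch, with relation names as on the slide:

```python
# "Workers dumped sacks into a bin" as a dependency structure.
dependencies = [
    ("<Root>", "main",  "dumped"),
    ("dumped", "subj",  "workers"),
    ("dumped", "obj",   "sacks"),
    ("dumped", "loc",   "into"),
    ("into",   "pcomp", "bin"),
    ("bin",    "det",   "a"),
]
```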
