
Probabilistic Parsing



Presentation Transcript


  1. Probabilistic Parsing Ling 571 Fei Xia Week 4: 10/18-10/20/05

  2. Outline • Misc: Hw3 and Hw4: lexicalized rules • CYK recap • Converting CFG into CNF • N-best • Quiz #2 • Common prob equations • Independence assumption • Lexicalized models

  3. CYK Recap

  4. Converting CFG into CNF • CNF • Extended CNF • CFG in general vs. CFG for natural languages • Converting CFG into CNF • Converting PCFG into CNF • Recovering parse trees

  5. Definition of CNF • A, B, C are non-terminals, a is a terminal, S is the start symbol • Definition 1: • A → B C • A → a • S → ε, where B, C are not the start symbol • Definition 2: ε-free grammar • A → B C • A → a

  6. Extended CNF • Definition 3: • A → B C • A → a or A → B • We use Def 3: • Unit rules such as NP → N are allowed. • No need to remove unit rules during conversion. • The CYK algorithm needs to be modified.

  7. CYK algorithm with Def 2 • For every rule A → w_i, add A to chart[i][i] • For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: if A → B C, B ∈ chart[begin][m], and C ∈ chart[m+1][end], then add A to chart[begin][end]
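The loop structure above can be sketched in Python as a recognizer for a Def-2 (ε-free, unit-free) CNF grammar. The grammar below is a toy example for illustration, not one given in the slides:

```python
from collections import defaultdict

def cyk_recognize(words, lexical, binary):
    """CYK recognizer for a CNF grammar (Definition 2).

    lexical: dict terminal -> set of non-terminals A with A -> a
    binary:  dict (B, C)   -> set of non-terminals A with A -> B C
    Returns chart[(begin, end)] = set of non-terminals that derive
    words[begin..end] (0-based, inclusive).
    """
    n = len(words)
    chart = defaultdict(set)
    # Base case: spans of length 1, filled by lexical rules A -> w_i.
    for i, w in enumerate(words):
        chart[(i, i)] |= lexical.get(w, set())
    # Recursive case: combine two adjacent smaller spans with A -> B C.
    for span in range(2, n + 1):
        for begin in range(0, n - span + 1):
            end = begin + span - 1
            for m in range(begin, end):
                for B in chart[(begin, m)]:
                    for C in chart[(m + 1, end)]:
                        chart[(begin, end)] |= binary.get((B, C), set())
    return chart

# Toy grammar (hypothetical, for illustration only):
lexical = {"Mary": {"NP"}, "books": {"NP"}, "bought": {"V"}}
binary = {("V", "NP"): {"VP"}, ("NP", "VP"): {"S"}}
chart = cyk_recognize(["Mary", "bought", "books"], lexical, binary)
print("S" in chart[(0, 2)])  # True: the whole sentence is an S
```

The chart is indexed by (begin, end) word positions rather than (begin, span), but the fill order is exactly the span-by-span order in the pseudocode above.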

  8. CYK algorithm with Def 3 • For every position i: for all A, if A → w_i, add A to chart[i][i]; then for all A and B, if A → B and B ∈ chart[i][i], update chart[i][i] • For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: … (as with Def 2); then for all non-terminals A and B, if A → B and B ∈ chart[begin][end], update chart[begin][end]
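The extra "update" step for unit rules must be repeated to a fixpoint, since adding B to a cell may license A → B, which in turn may license some A' → A. A minimal sketch of that closure step:

```python
def unit_closure(cell, unit):
    """Extend a chart cell with unit rules A -> B (Definition 3).

    cell: set of non-terminals already in the cell (modified in place)
    unit: dict B -> set of non-terminals A such that A -> B
    Repeats until no new non-terminal can be added, so chains of
    unit rules are handled regardless of insertion order.
    """
    agenda = list(cell)
    while agenda:
        B = agenda.pop()
        for A in unit.get(B, set()):
            if A not in cell:
                cell.add(A)
                agenda.append(A)
    return cell

# Hypothetical chain of unit rules: NP -> N, and some X -> NP.
print(unit_closure({"N"}, {"N": {"NP"}, "NP": {"X"}}))
```

In the full algorithm this closure is applied to chart[i][i] after the lexical rules and to chart[begin][end] after the binary-rule loop.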

  9. CFG • CFG in general: • G=(N, T, P, S) • Rules: A → α, where A ∈ N and α ∈ (N ∪ T)* • CFG for natural languages: • G=(N, T, P, S) • Pre-terminals: POS tags, which rewrite only to words • Rules: • Syntactic rules: A → B1 … Bn, where each Bi is a non-terminal • Lexicon: A → a, where A is a pre-terminal and a is a word

  10. Conversion from CFG to CNF • CFG (in general) to CNF (Def 1) • Add S0 → S • Remove ε-rules • Remove unit rules • Replace n-ary rules with binary rules • CFG (for NL) to CNF (Def 3) • CFG (for NL) has no ε-rules • Unit rules are allowed in CNF (Def 3) • Only the last step is necessary

  11. An example • VP → V NP PP PP • To recover the parse tree w.r.t. the original CFG, just remove the added non-terminals.

  12. Converting PCFG into CNF • VP → V NP PP PP 0.1 => • VP → V X1 0.1 • X1 → NP X2 1.0 • X2 → PP PP 1.0
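The binarization step above (the only step needed for an NL grammar under Def 3) can be sketched as a small function. The original rule probability stays on the first new rule and every introduced rule gets probability 1.0, so the product over the binarized rules equals the original probability; the names X1, X2, … follow the slide's example, and the `start` parameter is an assumption added here to keep fresh names unique across rules:

```python
def binarize_pcfg_rule(lhs, rhs, prob, start=1):
    """Binarize one n-ary PCFG rule into a chain of binary rules.

    lhs:  left-hand non-terminal, e.g. "VP"
    rhs:  list of right-hand symbols, e.g. ["V", "NP", "PP", "PP"]
    prob: probability of the original rule
    Returns a list of (lhs, (sym1, sym2), prob) triples.
    """
    rules = []
    i = start
    while len(rhs) > 2:
        new_nt = f"X{i}"          # fresh intermediate non-terminal
        i += 1
        rules.append((lhs, (rhs[0], new_nt), prob))
        lhs, rhs, prob = new_nt, rhs[1:], 1.0
    rules.append((lhs, tuple(rhs), prob))
    return rules

for rule in binarize_pcfg_rule("VP", ["V", "NP", "PP", "PP"], 0.1):
    print(rule)
```

Running this reproduces the slide's conversion: VP → V X1 (0.1), X1 → NP X2 (1.0), X2 → PP PP (1.0).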

  13. CYK with N-best output

  14. N-best parse trees • Best parse tree: T* = arg max_T P(T|S) • N-best parse trees: the N trees with the highest P(T|S)

  15. CYK algorithm for N-best • For every rule A → w_i, add its probability to chart[i][i][A] • For span=2 to N for begin=1 to N-span+1 end = begin + span – 1; for m=begin to end-1; for all non-terminals A, B, C: for each entry i in chart[begin][m][B] and entry j in chart[m+1][end][C], let val = P(A → B C) × prob_i × prob_j; if val > one of the probs in chart[begin][end][A], then remove the last element in chart[begin][end][A] and insert val into the array, and remove the last element in B[begin][end][A] and insert (m, B, C, i, j) into B[begin][end][A].
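The "remove the last element and insert" step above keeps each cell's entries as the N best analyses seen so far. A minimal sketch of that per-cell update, using a min-heap (an implementation choice assumed here, not specified in the slides) so that each update is O(log N):

```python
import heapq

def update_nbest(cell, prob, backpointer, n=5):
    """Insert one analysis into a chart cell that keeps the N best.

    cell: list of (prob, backpointer) pairs, kept as a min-heap so
          cell[0] is always the worst of the stored analyses.
    When the cell is full, a new analysis replaces the worst one
    only if its probability is higher.
    """
    if len(cell) < n:
        heapq.heappush(cell, (prob, backpointer))
    elif prob > cell[0][0]:                 # better than current worst
        heapq.heapreplace(cell, (prob, backpointer))
    return cell

cell = []
for p, bp in [(0.2, "t1"), (0.5, "t2"), (0.1, "t3")]:
    update_nbest(cell, p, bp, n=2)
print(sorted(cell, reverse=True))  # the two best: 0.5 then 0.2
```

The backpointer tuple (m, B, C, i, j) recorded alongside each probability is what lets the N-best trees be read off the chart afterwards.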

  16. Mary bought books with cash

  17. Common probability equations

  18. Three types of probability • Joint prob: P(x,y) = prob of x and y happening together • Conditional prob: P(x|y) = prob of x given a specific value of y • Marginal prob: P(x) = prob of x, summed over all possible values of y: P(x) = Σ_y P(x,y)

  19. Common equations • P(x,y) = P(x|y) P(y) = P(y|x) P(x) • P(x) = Σ_y P(x,y) • Bayes' rule: P(x|y) = P(y|x) P(x) / P(y)

  20. An example • #(words)=100, #(nouns)=40, #(verbs)=20 • “books” appears 10 times, 3 times as a verb, 7 times as a noun • P(w=books) = 10/100 = 0.1 • P(w=books, t=noun) = 7/100 = 0.07 • P(t=noun | w=books) = 7/10 = 0.7 • P(t=noun) = 40/100 = 0.4 • P(w=books | t=noun) = 7/40 = 0.175
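The slide's numbers can be reproduced directly from the counts, which also checks that the joint, conditional, and marginal probabilities satisfy the identity P(x,y) = P(x|y) P(y) in both directions:

```python
# Counts from the slide's toy corpus of 100 word tokens.
n_words = 100
n_books = 10          # "books" appears 10 times
n_books_noun = 7      # ... 7 of them tagged as noun
n_noun = 40           # 40 noun tokens overall

p_books = n_books / n_words                  # marginal  P(w=books)         = 0.1
p_books_noun = n_books_noun / n_words        # joint     P(w=books, t=noun) = 0.07
p_noun_given_books = n_books_noun / n_books  # cond.     P(t=noun | w=books)= 0.7
p_noun = n_noun / n_words                    # marginal  P(t=noun)          = 0.4
p_books_given_noun = n_books_noun / n_noun   # cond.     P(w=books | t=noun)= 0.175

# The joint factors through either conditional: P(x,y) = P(x|y) P(y).
assert abs(p_books_noun - p_noun_given_books * p_books) < 1e-12
assert abs(p_books_noun - p_books_given_noun * p_noun) < 1e-12
```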

  21. More general cases

  22. Independence assumption

  23. Independence assumption • Two variables A and B are independent if • P(A,B)=P(A)*P(B) • P(A)=P(A|B) • P(B)=P(B|A) • Two variables A and B are conditionally independent given C if • P(A,B|C)=P(A|C) * P(B|C) • P(A|B,C)=P(A|C) • P(B|A,C)=P(B|C) • An independence assumption is used to remove some conditioning factors, which reduces the number of parameters in a model.
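The first definition can be checked mechanically on a small joint distribution: compute both marginals by summing out the other variable, then test P(A,B) = P(A)·P(B) everywhere. A minimal sketch with two hypothetical coin examples:

```python
from itertools import product

def is_independent(joint, tol=1e-9):
    """Check P(A,B) = P(A) P(B) for a joint distribution given as
    a dict {(a, b): prob}. Marginals are obtained by summing out
    the other variable, as on the slide above."""
    pa, pb = {}, {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return all(abs(joint.get((a, b), 0.0) - pa[a] * pb[b]) <= tol
               for a, b in product(pa, pb))

# A fair coin flipped twice: the two flips are independent.
indep = {(a, b): 0.25 for a in "HT" for b in "HT"}
# Two perfectly correlated flips: not independent.
dep = {("H", "H"): 0.5, ("T", "T"): 0.5}
print(is_independent(indep), is_independent(dep))  # True False
```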

  24. PCFG parsers • A PCFG assumes that each rule application is independent of the others.

  25. Problems of independence assumptions • Lexical independence: • P(VP → V, V → bought) = P(VP → V) * P(V → bought) • See Table 12.2 on M&S p. 418.

  26. Problems of independence assumptions (cont) • Structural independence: • P(S → NP VP, NP → Pron) = P(S → NP VP) * P(NP → Pron) • See Table 12.3 on M&S p. 420.

  27. Dealing with the problems • Lexical rules: • P(VP → V | V=come) • P(VP → V | V=think) • Adding context info: condition each rule on a function of its context that groups contexts into equivalence classes.

  28. PCFG • A PCFG assumes that each rule application is independent of the others.

  29. A lexicalized model

  30. An example • he likes her

  31. Head-head probability

  32. Head-rule probability

  33. Collecting the counts

  34. Remaining problems • he likes her • P(T,S) is the same if the sentence is changed to “her likes he”.

  35. Previous model

  36. A new model

  37. New formula • he likes her
