
Lecture 7: Probabilistic CKY Parser

This lecture covers the probabilistic CKY parser for natural language processing, focusing on the challenge of ambiguity. It works through a weighted chart-parsing example, showing how to resolve ambiguity by weighting grammar rules, keeping only best-in-class chart entries, and recovering the best parse with backpointers.


Presentation Transcript


  1. Lecture 7: Probabilistic CKY Parser
  CSCI 544: Applied Natural Language Processing
  Nanyun (Violet) Peng, based on slides by Jason Eisner

  2. Our bane: Ambiguity
  • John saw Mary
  • Iron Mary
  • Phillips screwdriver Mary
  (note how rare rules interact)
  • I see a bird
  • is this 4 nouns, parsed like "city park scavenger bird"? (rare parts of speech)
  • Time flies like an arrow
  • Fruit flies like a banana
  • Time reactions like this one
  • Time reactions like a chemist
  • or is it just an NP?

  3. Our bane: Ambiguity
  • John saw Mary
  • Iron Mary
  • Phillips screwdriver Mary
  (note how rare rules interact)
  • I see a bird
  • is this 4 nouns, parsed like "city park scavenger bird"? (rare parts of speech)
  • Time | flies like an arrow (NP VP)
  • Fruit flies | like a banana (NP VP)
  • Time | reactions like this one (V[stem] NP)
  • Time reactions | like a chemist (S PP)
  • or is it just an NP?

  4. How to solve this combinatorial explosion of ambiguity?
  • First try parsing without any weird rules, throwing them in only if needed.
  • Better: every rule has a weight. A tree's weight is the total weight of all its rules. Pick the overall lightest (best) parse of the sentence.
  • Can we pick the weights automatically? We'll get to this later …

  5–27. [These slides step through filling the CKY parse chart for the example sentence "time flies like an arrow", one entry at a time; the chart figures did not survive transcription. Each slide repeats the weighted grammar:]
  1 S → NP VP
  6 S → Vst NP
  2 S → S PP
  1 VP → V NP
  2 VP → VP PP
  1 NP → Det N
  2 NP → NP PP
  3 NP → NP NP
  0 PP → P NP
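
To make the example concrete, the same grammar can be written down as plain data. A minimal Python sketch; the (weight, lhs, rhs) encoding and the zero-weight toy lexicon are my own illustration, not part of the deck (the slides never list lexical weights):

# Weighted grammar from slides 5-27, one (weight, lhs, rhs) triple per rule.
# Lower total weight = better parse.
GRAMMAR = [
    (1, "S",  ("NP", "VP")),
    (6, "S",  ("Vst", "NP")),
    (2, "S",  ("S", "PP")),
    (1, "VP", ("V", "NP")),
    (2, "VP", ("VP", "PP")),
    (1, "NP", ("Det", "N")),
    (2, "NP", ("NP", "PP")),
    (3, "NP", ("NP", "NP")),
    (0, "PP", ("P", "NP")),
]

# Hypothetical lexicon for the example sentence: each word maps to the
# categories it can rewrite as, with made-up weights of 0.
LEXICON = {
    "time":  [("NP", 0), ("Vst", 0), ("V", 0)],
    "flies": [("VP", 0), ("NP", 0)],
    "like":  [("P", 0), ("V", 0)],
    "an":    [("Det", 0)],
    "arrow": [("N", 0)],
}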

  28–32. Follow backpointers … Starting from the S entry that spans the whole sentence, the backpointers unfold the best parse one step at a time: S; then S → NP VP; then the VP expands as VP → VP PP; then the PP as PP → P NP; and finally the inner NP as NP → Det N. [The chart figures on these slides, each repeating the grammar above, did not survive transcription.]
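
The backpointer walk itself is a short recursion. A sketch, assuming each chart entry's backpointer is either None (the entry came straight from a word) or a (mid, left, right) triple recording how it was built; this data layout is my assumption, not the deck's:

def extract_tree(back, words, sym, i, j):
    """Recover the best tree for (sym, i, j) by following backpointers."""
    bp = back[(sym, i, j)]
    if bp is None:                      # lexical entry: sym rewrote as words[i]
        return (sym, words[i])
    mid, left, right = bp               # best split point and child symbols
    return (sym, extract_tree(back, words, left, i, mid),
                 extract_tree(back, words, right, mid, j))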

  33–38. Which entries do we need? An entry is not worth keeping if the chart already holds a lighter entry for the same nonterminal over the same span: the heavier one just breeds worse options, since any larger constituent built from it could be built more cheaply from its rival. So keep only the best-in-class entry for each nonterminal and span (and its backpointers, so you can recover the best parse), and discard the inferior stock.
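
The best-in-class rule is a two-line update. A sketch, assuming a dict-based chart keyed by (symbol, start, end) (my encoding, not the deck's):

def relax(chart, back, key, weight, bp):
    """Keep an entry only if it beats the current best for its class:
    a heavier duplicate could never appear in the lightest parse."""
    if weight < chart.get(key, float("inf")):
        chart[key] = weight             # new best-in-class weight
        back[key] = bp                  # backpointer for recovering the parse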

  39. Chart (CKY) Parsing
  phrase(X,I,J) :- rewrite(X,W), word(W,I,J).
  phrase(X,I,J) :- rewrite(X,Y,Z), phrase(Y,I,Mid), phrase(Z,Mid,J).
  goal :- phrase(start_symbol, 0, sentence_length).
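
These three inference rules translate directly into a bottom-up loop. A Python sketch of the unweighted recognizer, reusing the GRAMMAR/LEXICON encodings assumed above (rule weights are ignored, since this version only asks whether a parse exists):

def cky_recognize(words, lexicon, grammar, start="S"):
    """phrase(X,I,J) holds iff X can derive words[I:J]; returns goal."""
    n = len(words)
    phrase = set()
    for i, w in enumerate(words):                 # phrase(X,i,i+1) :- rewrite(X,w)
        for x, _wt in lexicon.get(w, ()):
            phrase.add((x, i, i + 1))
    for width in range(2, n + 1):                 # widen spans bottom-up
        for i in range(n - width + 1):
            j = i + width
            for mid in range(i + 1, j):           # phrase(X,i,j) :- rewrite(X,Y,Z),
                for _wt, lhs, (y, z) in grammar:  #   phrase(Y,i,mid), phrase(Z,mid,j)
                    if (y, i, mid) in phrase and (z, mid, j) in phrase:
                        phrase.add((lhs, i, j))
    return (start, 0, n) in phrase                # goal :- phrase(S, 0, n)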

  40. Weighted Chart Parsing ("min cost")
  phrase(X,I,J) min= rewrite(X,W) + word(W,I,J).
  phrase(X,I,J) min= rewrite(X,Y,Z) + phrase(Y,I,Mid) + phrase(Z,Mid,J).
  goal min= phrase(start_symbol, 0, sentence_length).
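
The min= version keeps the lightest weight per entry instead of a boolean. A sketch under the same assumed encodings (backpointers omitted for brevity; see the relax helper above):

import math

def cky_min_cost(words, lexicon, grammar, start="S"):
    """best[(X,I,J)] = weight of the lightest derivation of words[I:J] from X."""
    n = len(words)
    best = {}
    for i, w in enumerate(words):                 # phrase(X,i,i+1) min= lexical weight
        for x, wt in lexicon.get(w, ()):
            key = (x, i, i + 1)
            best[key] = min(best.get(key, math.inf), wt)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for mid in range(i + 1, j):
                for rule_wt, lhs, (y, z) in grammar:
                    if (y, i, mid) in best and (z, mid, j) in best:
                        cand = rule_wt + best[(y, i, mid)] + best[(z, mid, j)]
                        key = (lhs, i, j)         # min= rule + left + right
                        best[key] = min(best.get(key, math.inf), cand)
    return best.get((start, 0, n))                # goal min= phrase(S, 0, n)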

  41. Probabilistic Trees
  • Instead of the lightest-weight tree, take the highest-probability tree.
  • Given any tree, your parser would have some probability of producing it!
  • Just like using n-grams to choose among strings …
  • What is the probability of this tree?
  [S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]]

  42. Probabilistic Trees
  • Instead of the lightest-weight tree, take the highest-probability tree.
  • Given any tree, your assignment 1 generator would have some probability of producing it!
  • Just like using n-grams to choose among strings …
  • What is the probability of this tree, p(tree | S)?
  • You rolled a lot of independent dice …
  [S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]]

  43. Chain rule: One word at a time
  p(time flies like an arrow)
  = p(time) * p(flies | time) * p(like | time flies) * p(an | time flies like) * p(arrow | time flies like an)

  44. Chain rule + backoff (to get trigram model)
  p(time flies like an arrow)
  ≈ p(time) * p(flies | time) * p(like | time flies) * p(an | flies like) * p(arrow | like an)
  (Each word now conditions on at most the two preceding words.)
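
As a sketch, the truncated chain rule is a few lines; cond_logprob here is a hypothetical model interface returning log p(word | context), not anything defined in the lecture:

def sentence_logprob(words, cond_logprob, order=3):
    """Chain rule with trigram truncation: each word conditions on at most
    the order-1 = 2 preceding words."""
    total = 0.0
    for k, w in enumerate(words):
        context = tuple(words[max(0, k - (order - 1)):k])  # at most 2 prior words
        total += cond_logprob(w, context)
    return total                                  # = log p(w1 ... wn)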

  45. Chain rule: One node at a time
  Generate the tree one node at a time; by the chain rule, each expansion is conditioned on the entire partial tree built so far:
  p(tree | S) = p(S → NP VP | S) * p(NP expands to time | tree so far) * p(VP expands to VP PP | tree so far) * …

  46. Chain rule + backoff (PCFG)
  Back off from "the whole tree so far" to just the nonterminal being expanded, so each rule depends only on its left-hand side. [The step-by-step tree diagrams on these two slides did not survive transcription.]

  47. Simplified notation (PCFG)
  p( [S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] | S)
  = p(S → NP VP | S) * p(NP → time | NP) * p(VP → VP PP | VP) * p(VP → flies | VP) * …

  48. Already have a CKY alg for weights …
  w( [S [NP time] [VP [VP flies] [PP [P like] [NP [Det an] [N arrow]]]]] )
  = w(S → NP VP) + w(NP → time) + w(VP → VP PP) + w(VP → flies) + …
  Just let w(X → Y Z) = -log p(X → Y Z | X). Then the lightest tree has the highest probability.
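
The conversion itself is one line. A sketch:

import math

def rule_weight(rule_prob):
    """w(X -> Y Z) = -log p(X -> Y Z | X). Probabilities multiply along a tree,
    so their negative logs add, and the lightest tree is the most probable."""
    return -math.log(rule_prob)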

  49. Weighted Chart Parsing ("min cost")
  phrase(X,I,J) min= rewrite(X,W) + word(W,I,J).
  phrase(X,I,J) min= rewrite(X,Y,Z) + phrase(Y,I,Mid) + phrase(Z,Mid,J).
  goal min= phrase(start_symbol, 0, sentence_length).

  50. Probabilistic Chart Parsing ("max prob")
  phrase(X,I,J) max= rewrite(X,W) * word(W,I,J).
  phrase(X,I,J) max= rewrite(X,Y,Z) * phrase(Y,I,Mid) * phrase(Z,Mid,J).
  goal max= phrase(start_symbol, 0, sentence_length).
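
Equivalently, by the -log trick of slide 48, max-prob parsing reduces to min-cost parsing. A sketch that reuses the cky_min_cost illustration from above (plexicon and pgrammar hold probabilities in place of weights; these names are mine, not the deck's):

import math

def cky_max_prob(words, plexicon, pgrammar, start="S"):
    """Probability of the best parse, via weights w = -log p."""
    lexicon = {w: [(x, -math.log(p)) for x, p in entries]
               for w, entries in plexicon.items()}
    grammar = [(-math.log(p), lhs, rhs) for p, lhs, rhs in pgrammar]
    w = cky_min_cost(words, lexicon, grammar, start)
    return None if w is None else math.exp(-w)    # lightest tree = most probable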
