6.863J Natural Language Processing Lecture 8: Not an Earley finish

Presentation Transcript

  1. 6.863J Natural Language Processing, Lecture 8: Not an Earley finish. Instructor: Robert C. Berwick, berwick@csail.mit.edu

  2. The Menu Bar • Administrivia • Agenda: Earley’s algorithm; time complexity; parsing strategies: Earley algorithm; what do people do? 6.863J/9.611J SP05

  3. Example Grammar • Rules: S → NP VP; NP → Name; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP • Lexicon: N → pjs; N → elephant; V → shot; P → in; Det → an; Det → my; Name → I • Sentence: I shot an elephant in my pjs (tags: Name V Det N P Det N)

  4. Marxist analysis • Rules: Start -> S; S -> NP VP; NP -> Det N; NP -> NP PP; NP -> Name; PP -> P NP; VP -> VP PP; VP -> V NP; VP -> V • Lexicon: an Det; my Det; shot V; shot N; elephant N; pajamas N; I Name; in P • Sentence: I shot an elephant in my pajamas
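
One way to hold the rules and lexicon of slide 4 in code is sketched below. This is only an illustration: the names `RULES`, `LEXICON`, `tags_for`, and `expansions` are assumptions introduced here, not part of the lecture materials. It makes visible that "shot" carries two tags (V and N) in this lexicon.

```python
# Rules and lexicon transcribed from slide 4; all names below are illustrative.
RULES = [
    ("Start", ("S",)),
    ("S", ("NP", "VP")),
    ("NP", ("Det", "N")),
    ("NP", ("NP", "PP")),
    ("NP", ("Name",)),
    ("PP", ("P", "NP")),
    ("VP", ("VP", "PP")),
    ("VP", ("V", "NP")),
    ("VP", ("V",)),
]

LEXICON = {
    "an": {"Det"},
    "my": {"Det"},
    "shot": {"V", "N"},   # lexically ambiguous on the slide
    "elephant": {"N"},
    "pajamas": {"N"},
    "I": {"Name"},
    "in": {"P"},
}


def tags_for(sentence):
    """Return the sorted list of possible tags for each word of the sentence."""
    return [sorted(LEXICON[word]) for word in sentence.split()]


def expansions(lhs):
    """Return every right-hand side the grammar offers for a nonterminal."""
    return [rhs for (left, rhs) in RULES if left == lhs]


print(tags_for("I shot an elephant in my pajamas"))
print(expansions("NP"))
```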

  5. State set simulation of nondeterministic machine • State set S0 = set of all states (‘edges’) we can be in after reading 0 words/POS • March along, constructing each state set Si+1 from the previous state set Si

  6. State-set construction • Initialize: S0 = initial state set = initial state edge [Start → • S, 0, 0] ∪ ε-closure of this set under predict, complete • Loop: for word i = 1,…,n, Si+1 is computed from Si (using scan, predict, complete): scan; then predict, complete • Final: is a final edge in Sn? [Start → S •, 0, n] ∈ Sn?

  7. The basic state representation: an item triple [NP → Det • N, 0, 2], made up of a dotted rule, a start position (left edge), and a progress position (right edge)
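
The item triple can be sketched as a small immutable record. The class name `Item` and its methods are hypothetical, chosen here for illustration; the dotted rule is stored as a left-hand side, a right-hand side, and a dot index.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """An item triple: dotted rule, start position, progress position."""
    lhs: str
    rhs: tuple
    dot: int      # number of right-hand-side symbols already recognized
    start: int    # left edge
    end: int      # right edge (progress position)

    def next_symbol(self):
        """Symbol just after the dot, or None if the dot is at the end."""
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def is_complete(self):
        return self.dot == len(self.rhs)

    def advance(self, new_end):
        """Move the dot one symbol to the right and extend the right edge."""
        return Item(self.lhs, self.rhs, self.dot + 1, self.start, new_end)

    def __str__(self):
        symbols = list(self.rhs)
        symbols.insert(self.dot, "•")
        return f"[{self.lhs} → {' '.join(symbols)}, {self.start}, {self.end}]"


item = Item("NP", ("Det", "N"), 1, 0, 1)
print(item)             # [NP → Det • N, 0, 1]
print(item.advance(2))  # [NP → Det N •, 0, 2]
```

Freezing the dataclass makes items hashable, so a state set can reject duplicates cheaply.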

  8. Another way to view it: scanning “the guy” (Det N) moves the dot across the rule: [NP → • Det N, 0, 0], then Scan gives [NP → Det • N, 0, 1], then Scan gives [NP → Det N •, 0, 2]

  9. Earley Parser

  10. The chart represents ambiguity by multiple back links in a matrix (axes: start position of edge, end position of edge)

  11. Example: Top-down init w/ chart • [S → • NP VP, 0, 0] • Positions: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7 • We are constructing state set S0

  12. In picture form • [S → • NP VP, 0, 0] • [NP → • Det N, 0, 0] (from td rule, or predict) • [NP → • Name, 0, 0] • [NP → • NP PP, 0, 0] • + all POS expansions • Positions: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7 • State set S0 now done

  13. Construct S1 from S0: Scan to next word… follow the bouncing dot… • S0: [S → • NP VP, 0, 0]; [NP → • Det N, 0, 0]; [NP → • Name, 0, 0]; [NP → • NP PP, 0, 0] • Sentence: I shot an elephant in my pjs • New edge: [NP → Name •, 0, 1]

  14. The Fundamental Rule (“complete”) Applies… • As time goes by… • Actually, as NP goes by… • We can also extend the length of all the other edges that had an NP with a dot before them… • That is,

  15. In picture form • [NP → Name •, 0, 1] • [S → NP • VP, 0, 1] (complete) • [NP → NP • PP, 0, 1] (complete) • [VP → • V NP, 1, 1] (predict) • [VP → • VP PP, 1, 1] (predict) • [PP → • P NP, 1, 1] (predict) • Positions: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7 • State set S1 now done

  16. Scan Verb: extend edge [VP → V • NP, 1, 2] • Other edges in the picture: VP → • VP PP; NP → N •; S → NP • VP • Sentence: I shot an elephant in my pjs • What next? … Predict NP: add all edges corresponding to expansion of NP

  17. Picture: Complete combines edges (the “fundamental rule”) • Edges: S → NP VP •; VP → V NP •; VP → VP • PP; NP → D N •; NP → N •; NP → • NP PP; S → NP • VP • Sentence: I shot an elephant in my pjs

  18. State set construction – cols in chart … …

  19. How does each of the 3 ops change Si? • Scan (jump over a token): Before: [A → α • t β, k, i-1] in state set Si-1 and word i = t. Result: add [A → α t • β, k, i] to state set Si (do this for all items [triples] in state set Si-1) • Predict (Push) (encounter nonterminal): Before: [A → α • B β, k, i], B = a nonterminal, in Si. After: add all new triples of form [B → • γ, i, i] to state set Si • Complete (Pop) (finish w/ nonterminal): Before: Si contains a triple of form [B → γ •, k, i]. After: go to state set Sk and, for all items of form [A → α • B β, j, k], add triples [A → α B • β, j, i] to state set Si
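
The three operations can be sketched as functions over item tuples `(lhs, rhs, dot, start, end)`. This is a minimal sketch, not the lecture's code: the function names, the tuple layout, and the toy grammar for "the guy" are assumptions made for illustration, and words are scanned directly as terminals.

```python
def next_symbol(item):
    """Symbol just after the dot, or None if the item is complete."""
    lhs, rhs, dot, start, end = item
    return rhs[dot] if dot < len(rhs) else None


def scan(item, word, i):
    """[A → α • t β, k, i-1] with word i == t yields [A → α t • β, k, i]."""
    lhs, rhs, dot, start, end = item
    if end == i - 1 and next_symbol(item) == word:
        return [(lhs, rhs, dot + 1, start, i)]
    return []


def predict(item, grammar):
    """[A → α • B β, k, i] yields [B → • γ, i, i] for every rule B → γ."""
    lhs, rhs, dot, start, end = item
    wanted = next_symbol(item)
    return [(wanted, gamma, 0, end, end) for gamma in grammar.get(wanted, [])]


def complete(item, state_sets):
    """[B → γ •, k, i] plus [A → α • B β, j, k] in Sk yields [A → α B • β, j, i]."""
    finished, rhs, dot, k, i = item
    if dot < len(rhs):
        return []   # not a completed item, nothing to attach
    return [(a, r, d + 1, j, i)
            for (a, r, d, j, _end) in state_sets[k]
            if d < len(r) and r[d] == finished]


# Hypothetical toy grammar for the phrase "the guy".
grammar = {"NP": [("Det", "N")], "Det": [("the",)], "N": [("guy",)]}
np_item = ("NP", ("Det", "N"), 0, 0, 0)
s0 = [np_item] + predict(np_item, grammar)   # adds [Det → • the, 0, 0]
det_done = scan(s0[1], "the", 1)[0]          # [Det → the •, 0, 1]
print(complete(det_done, [s0]))              # [('NP', ('Det', 'N'), 1, 0, 1)]
```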

  20. The main deal • Input x = x1 … xn • S0 = {[Start → • S, 0, 0]} • For 0 ≤ i ≤ n do: process each item s ∈ Si in order by applying to it the single applicable operation among: (a) Predictor (adds new items to Si) (b) Completer (adds new items to Si) (c) Scanner (adds new items to Si+1) • If Si+1 = ∅, reject the input • If i = n and Sn = {[Start → S •, 0, n], …} then accept the input; else reject
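
The loop above can be sketched as a small recognizer. This is a sketch under stated assumptions rather than the course implementation: `earley_recognize`, `RULES`, and `LEXICON` are names invented here, items are stored as `(lhs, rhs, dot, start)` with the right edge implied by the state set that holds them, part-of-speech tags are matched against a lexicon by the scanner, and the grammar is assumed to have no empty right-hand sides.

```python
def earley_recognize(words, rules, lexicon, start="S"):
    """Return True if the word list is derivable from `start` (recognition only)."""
    n = len(words)
    chart = [[] for _ in range(n + 1)]        # chart[i] plays the role of Si

    def add(i, item):
        if item not in chart[i]:              # never add duplicates
            chart[i].append(item)

    add(0, ("Start", (start,), 0, 0))
    for i in range(n + 1):
        for item in chart[i]:                 # the list grows while we walk it
            lhs, rhs, dot, k = item
            if dot == len(rhs):               # completer: attach to customers in Sk
                for (a, r, d, j) in chart[k]:
                    if d < len(r) and r[d] == lhs:
                        add(i, (a, r, d + 1, j))
            elif rhs[dot] in rules:           # predictor: expand the nonterminal
                for gamma in rules[rhs[dot]]:
                    add(i, (rhs[dot], gamma, 0, i))
            elif i < n and rhs[dot] in lexicon.get(words[i], ()):
                add(i + 1, (lhs, rhs, dot + 1, k))   # scanner: jump over the word
        if i < n and not chart[i + 1]:
            return False                      # Si+1 is empty: reject
    return ("Start", (start,), 1, 0) in chart[n]


# Grammar and lexicon of the "Example Grammar" slide.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {"I": {"Name"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
           "elephant": {"N"}, "pjs": {"N"}, "in": {"P"}}

print(earley_recognize("I shot an elephant in my pjs".split(), RULES, LEXICON))  # True
print(earley_recognize("shot I".split(), RULES, LEXICON))                        # False
```

Because duplicates are never added, the left-recursive rule NP → NP PP is predicted once per state set and the loop terminates.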

  21. Earley’s Algorithm: Predictor • Predictor(A → α • B β, [i, j]) • For each rule B → γ, add a new edge B → • γ that starts at the right end of the input edge • (Example diagram: Input, Rule)

  22. Predictor (wishor) • Predict (Push): Before: [A → α • B β, k, i], B = nonterminal, in Si. After: add all new edges of form [B → • γ, i, i] to state set Si • Cries out its need for a phrase of type B

  23. Earley’s Algorithm: Scanner • Scanner(A → α • B β, [i, j]) • For each rule B → w, where w is the next input word, add edge A → α B • β, [i, j+1] • (Example diagram: Input, Rule)

  24. Scan – formally (“Find a word/POS”) • Scan (jump over a token): Before: [A → α • t β, k, i] in state set Si and word i = t. Result: add [A → α t • β, k, i+1] to state set Si+1

  25. Earley’s Algorithm: Completer • Completer(B → γ •, [i, j]) • For each edge A → α • B β, [k, i], add A → α B • β, [k, j] • (Example diagram: Rule, Input)

  26. More precisely • Complete (Pop) (finish w/ phrase): Before: if Si contains an edge e of form [B → γ •, k, i], then go back to state set Sk and, for all items of form [A → α • B β, j, k], add edges e’ = [A → α B • β, j, i] to state set Si

  27. “The fundamental rule”: glues smaller trees into larger ones • VP → V • NP, [1, 2] (shot; start pos = 1, len 1) + NP → d n •, [2, 4] (an elephant; start = 2, len = 2) = VP → V NP •, [1, 4] (start pos = 1, len 3)
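
The arithmetic on this slide can be checked with a pairwise version of the rule. The function name `fundamental_rule` and the edge layout `(lhs, rhs, dot, start, end)` are assumptions made for this sketch.

```python
def fundamental_rule(active, passive):
    """Glue a completed edge onto an adjacent edge that is waiting for it.

    Returns the extended edge, or None when the two edges do not combine.
    """
    a_lhs, a_rhs, a_dot, a_start, a_end = active
    p_lhs, p_rhs, p_dot, p_start, p_end = passive
    wanted = a_rhs[a_dot] if a_dot < len(a_rhs) else None
    passive_is_complete = p_dot == len(p_rhs)
    if passive_is_complete and wanted == p_lhs and a_end == p_start:
        return (a_lhs, a_rhs, a_dot + 1, a_start, p_end)
    return None


vp_edge = ("VP", ("V", "NP"), 1, 1, 2)   # VP → V • NP over "shot"
np_edge = ("NP", ("d", "n"), 2, 2, 4)    # NP → d n • over "an elephant"
print(fundamental_rule(vp_edge, np_edge))  # ('VP', ('V', 'NP'), 2, 1, 4)
```

The result spans positions 1 to 4, that is, start position 1 and length 3, matching the slide.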

  28. Earley’s Algorithm: Rules • Initialization • Predictor • Scanner • Completer

  29. State set construction – cols in chart … …

  30. Take note of the start-stop indices (example: “the dog” spanning k to k+2) • Initially: S [i, k] • Predict (start: k): NP [k, k] • Scan: NP [k, k+1] • Scan: NP [k, k+2] • Complete: NP [k, k+2] • After NP: S [i, k+2]

  31. Indices: the right-hand edge is ‘you are here’ • Predict: does not increment: NP[k,k] • Scan: does increment, by 1, the right-hand edge: NP[k,k] → NP[k,k+1] → NP[k,k+2] • Complete: increments the right-hand edge of an item in a previous state set: S[j,k] → S[j,k+2]

  32. Example Grammar • Rules: S → NP VP; NP → Name; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP • Lexicon: N → pjs; N → elephant; V → shot; P → in; Det → an; Det → my; Name → I • Sentence: I shot an elephant in my pjs (tags: Name V Det N P Det N)

  33. Initialize (remember this stands for (0, Start → • S))

  34. predict the kind of S we are looking for (remember this stands for (0, S → • NP VP))

  35. predict the kind of NP we are looking for (actually we’ll look for 3 kinds: any of the 3 will do)

  36. predict the kind of Det we are looking for (2 kinds)

  37. predict the kind of NP we’re looking for, but we were already looking for these, so don’t add duplicates! Note that this happened when we were processing a left-recursive rule.

  38. scan: the desired word is in the input!

  39. scan: failure

  40. scan: failure

  41. attach the newly created NP (which starts at 0) to its customers (incomplete constituents that end at 0 and have NP after the dot)

  42. predict

  43. predict

  44. predict

  45. predict

  46. predict

  47. scan: success!

  48. scan: failure

  49. complete

  50. predict