6.863J Natural Language Processing Lecture 8: Not an Earley finish

##### Presentation Transcript

1. 6.863J Natural Language Processing, Lecture 8: Not an Earley finish. Instructor: Robert C. Berwick, berwick@csail.mit.edu

2. The Menu Bar • Administrivia • Agenda: Earley’s algorithm; time complexity; parsing strategies: the Earley algorithm; what do people do? (6.863J/9.611J SP05)

3. Example Grammar
   - Rules: S → NP VP; NP → Name; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP
   - Lexicon: N → pjs; N → elephant; V → shot; P → in; Det → an; Det → my; Name → I
   - Sentence: I shot an elephant in my pjs (tags: Name V Det N P Det N)

4. Marxist analysis
   - Rules: Start → S; S → NP VP; NP → Det N; NP → NP PP; NP → Name; PP → P NP; VP → VP PP; VP → V NP; VP → V
   - Lexicon: an: Det; my: Det; shot: V, N; elephant: N; pajamas: N; I: Name; in: P
   - Sentence: I shot an elephant in my pajamas
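
The lexicon on this slide lists "shot" as both V and N, so a single word can carry several part-of-speech tags. Below is a minimal Python sketch, assuming a word-to-tag-set dictionary of our own design (`build_lexicon` and `candidate_tags` are illustrative names, not from the course), that collects the slide's lexicon and lists each word's candidate tags:

```python
from collections import defaultdict

# (word, tag) pairs transcribed from the slide's lexicon
LEXICON_PAIRS = [
    ("an", "Det"), ("my", "Det"), ("shot", "V"), ("shot", "N"),
    ("elephant", "N"), ("pajamas", "N"), ("I", "Name"), ("in", "P"),
]

def build_lexicon(pairs):
    """Map each word to the set of tags it may carry."""
    lexicon = defaultdict(set)
    for word, tag in pairs:
        lexicon[word].add(tag)
    return lexicon

def candidate_tags(sentence, lexicon):
    """For each word, return its possible tags in sorted order ([] if unknown)."""
    return [sorted(lexicon.get(w, ())) for w in sentence.split()]

LEX = build_lexicon(LEXICON_PAIRS)
print(candidate_tags("I shot an elephant in my pajamas", LEX))
# [['Name'], ['N', 'V'], ['Det'], ['N'], ['P'], ['Det'], ['N']]
```

Only "shot" is ambiguous here; the parser, not the lexicon, has to decide which tag survives.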

5. State-set simulation of a nondeterministic machine • State set S0 = the set of all states (‘edges’) we can be in after reading 0 words/POS • March along, constructing each state set Si+1 from the previous state set Si

6. State-set construction • Initialize: S0 (the initial state set) = the initial edge [Start → • S, 0, 0] plus the ε-closure of this set under predict and complete • Loop: for i = 0, …, n−1, compute Si+1 from Si: scan word i+1, then predict and complete • Final: is a final edge in Sn, that is, is [Start → S •, 0, n] ∈ Sn?
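
The initialization step can be sketched as a worklist computation that closes {[Start → • S, 0, 0]} under predict. This is a minimal sketch under stated assumptions: the slide 3 grammar plus the Start → S rule of slide 4, no empty rules (so complete cannot fire before any word is read), and POS symbols such as Name and Det left to the scanner rather than expanded. The tuple layout `(lhs, rhs, dot, start)` and the name `initial_state_set` are our own.

```python
from collections import deque

# Slide 3 grammar plus Start -> S (slide 4); POS symbols are left to the scanner.
RULES = {
    "Start": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}

def initial_state_set(rules, start_symbol="Start"):
    """Close {[Start -> . S, 0, 0]} under predict. Items are (lhs, rhs, dot, start);
    every item in S0 also ends at 0, so the end index is left out."""
    state_set, seen = [], set()
    worklist = deque((start_symbol, rhs, 0, 0) for rhs in rules[start_symbol])
    while worklist:
        item = worklist.popleft()
        if item in seen:          # already predicted: NP -> . NP PP would loop otherwise
            continue
        seen.add(item)
        state_set.append(item)
        lhs, rhs, dot, start = item
        wanted = rhs[dot]         # nothing has been scanned in S0, so dot == 0
        for gamma in rules.get(wanted, []):   # POS tags have no rules here
            worklist.append((wanted, gamma, 0, 0))
    return state_set

for lhs, rhs, dot, start in initial_state_set(RULES):
    print(f"[{lhs} -> . {' '.join(rhs)}, {start}, 0]")
```

This prints the five items [Start -> . S], [S -> . NP VP], [NP -> . Name], [NP -> . Det N], and [NP -> . NP PP], each spanning 0 to 0; the `seen` set is what keeps the left-recursive NP rule from being predicted forever.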

7. The basic state representation: an item triple [NP → Det • N, 0, 2], made of a dotted rule, a start position (left edge), and a progress position (right edge)
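
One way to hold such a triple in code, as a sketch: a frozen (hence hashable) dataclass whose fields mirror the dotted rule, the start position, and the progress position. The class name `Item` and its methods are illustrative assumptions, not the course's code.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Item:
    lhs: str     # left-hand side of the dotted rule, e.g. "NP"
    rhs: tuple   # right-hand side symbols, e.g. ("Det", "N")
    dot: int     # how many rhs symbols have been recognized so far
    start: int   # left edge: where the phrase begins
    end: int     # right edge: how far we have progressed

    def is_complete(self) -> bool:
        return self.dot == len(self.rhs)

    def next_symbol(self):
        """Symbol just after the dot, or None if the rule is complete."""
        return None if self.is_complete() else self.rhs[self.dot]

    def advance(self, new_end: int) -> "Item":
        """Move the dot over one symbol, stretching the right edge to new_end."""
        return replace(self, dot=self.dot + 1, end=new_end)

    def __str__(self) -> str:
        syms = list(self.rhs)
        syms.insert(self.dot, "•")
        return f"[{self.lhs} → {' '.join(syms)}, {self.start}, {self.end}]"

print(Item("NP", ("Det", "N"), 1, 0, 2))   # [NP → Det • N, 0, 2]
```

Because the dataclass is frozen, items can go straight into sets, which is how a state set can refuse duplicates.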

8. Another way to view it: over "the guy" (Det N), each scan moves the dot one word: [NP → • Det N, 0, 0] → scan → [NP → Det • N, 0, 1] → scan → [NP → Det N •, 0, 2]

9. Earley Parser

10. The chart represents ambiguity by multiple back links in a matrix whose cells are indexed by the start position and the end position of each edge
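
To see how multiple back links encode ambiguity, here is a hedged sketch (not the course's implementation): an Earley chart in which every scanned or completed item records a back link, namely the item before its dot moved plus the word or completed child that moved it. Counting trees by following every link gives the number of parses. It assumes the slide 3 grammar, no empty rules, and no unary cycles; the tuple layout `(lhs, rhs, dot, start, end)` and the name `count_parses` are our own.

```python
from collections import defaultdict

# Slide 3 grammar (assumed encoding); POS tags are scanned, not predicted.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
LEXICON = {"I": {"Name"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
           "elephant": {"N"}, "in": {"P"}, "pjs": {"N"}}


def count_parses(words, top="S"):
    n = len(words)
    chart = [[] for _ in range(n + 1)]    # chart[i]: items whose right edge is i
    seen = [set() for _ in range(n + 1)]
    links = defaultdict(set)              # item -> {(item before the dot moved, child)}

    def add(item, link=None):
        end = item[4]                     # item = (lhs, rhs, dot, start, end)
        if item not in seen[end]:
            seen[end].add(item)
            chart[end].append(item)
        if link is not None:
            links[item].add(link)         # a second link means a second analysis

    for rhs in RULES[top]:
        add((top, rhs, 0, 0, 0))
    for i in range(n + 1):
        k = 0
        while k < len(chart[i]):          # chart[i] may grow while we process it
            item = chart[i][k]
            k += 1
            lhs, rhs, dot, start, _ = item
            if dot == len(rhs):           # complete: link customers to this child
                for cust in chart[start]:
                    c_lhs, c_rhs, c_dot, c_start, _ = cust
                    if c_dot < len(c_rhs) and c_rhs[c_dot] == lhs:
                        add((c_lhs, c_rhs, c_dot + 1, c_start, i), (cust, item))
            elif rhs[dot] in RULES:       # predict
                for gamma in RULES[rhs[dot]]:
                    add((rhs[dot], gamma, 0, i, i))
            elif i < n and rhs[dot] in LEXICON.get(words[i], ()):   # scan
                add((lhs, rhs, dot + 1, start, i + 1), (item, words[i]))

    memo = {}

    def count(item):
        """Number of distinct trees for this item, following every back link."""
        if item[2] == 0:
            return 1                      # nothing recognized yet: exactly one way
        if item not in memo:
            memo[item] = sum(
                count(prev) * (1 if isinstance(child, str) else count(child))
                for prev, child in links[item])
        return memo[item]

    return sum(count(it) for it in chart[n]
               if it[0] == top and it[3] == 0 and it[2] == len(it[1]))


print(count_parses("I shot an elephant in my pjs".split()))   # 2
```

The two parses differ only in where "in my pjs" attaches: to the VP (via VP → VP PP) or to the NP "an elephant" (via NP → NP PP). The completed S edge over the whole sentence holds two back links, one per attachment.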

11. Example: top-down initialization with the chart. We are constructing state set S0, starting from [S → • NP VP, 0, 0]. Input with positions: 0 I 1 shot 2 an 3 elephant 4 in 5 my 6 pjs 7

12. In picture form: [S → • NP VP, 0, 0] predicts (top-down rule, or predict) [NP → • Det N, 0, 0], [NP → • Name, 0, 0], and [NP → • NP PP, 0, 0], plus all POS expansions. State set S0 is now done.

13. Construct S1 from S0: scan to the next word… follow the bouncing dot… Of the S0 items [S → • NP VP, 0, 0], [NP → • Det N, 0, 0], [NP → • Name, 0, 0], and [NP → • NP PP, 0, 0], only [NP → • Name, 0, 0] matches "I", yielding [NP → Name •, 0, 1].

14. The Fundamental Rule (“complete”) applies… • As time goes by… • Actually, as NP goes by… • We can also extend all the other edges that had the dot right before an NP… • That is,

15. In picture form: from [NP → Name •, 0, 1], complete adds [S → NP • VP, 0, 1] and [NP → NP • PP, 0, 1]; predict then adds [VP → • V NP, 1, 1], [VP → • VP PP, 1, 1], and [PP → • P NP, 1, 1]. State set S1 is now done.

16. Scan the verb "shot" and extend the edge to [VP → V • NP, 1, 2]. What next? Predict NP: add all edges corresponding to the expansions of NP.

17. Picture: complete combines edges (the “fundamental rule”). Edges shown over "I shot an elephant in my pjs": [S → NP • VP], [NP → • NP PP], [NP → N •], [NP → Det N •], [VP → V NP •], [VP → VP • PP], [S → NP VP •]

18. State set construction: the state sets are the columns of the chart

19. How does each of the 3 ops change Si?
   - Scan (jump over a token): before, [A → α • t β, k, i−1] is in state set Si−1 and word i = t; result, add [A → α t • β, k, i] to state set Si. (Do this for all items [triples] in state set Si−1.)
   - Predict (push; encounter a nonterminal): before, [A → α • B β, k, i] is in Si, with B a nonterminal; after, add all new triples of the form [B → • γ, i, i] to state set Si.
   - Complete (pop; finish with a nonterminal): before, Si contains a triple of the form [B → γ •, k, i]; after, go to state set Sk and, for all items there of the form [A → α • B β, j, k], add the triple [A → α B • β, j, i] to state set Si.

20. The main deal
   - Input x = x1 … xn; S0 = {[Start → • S, 0, 0]}
   - For 0 ≤ i ≤ n: process each item s ∈ Si, in order, by applying to it the single applicable operation among (a) Predictor (adds new items to Si), (b) Completer (adds new items to Si), (c) Scanner (adds new items to Si+1)
   - If i < n and Si+1 = ∅, reject the input
   - If i = n and Sn contains [Start → S •, 0, n], accept the input; else reject
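
The loop above can be sketched as a small Python recognizer. It is a minimal sketch under stated assumptions: the slide 3 grammar plus Start → S, POS symbols checked by the scanner instead of predicted, no empty rules (so the completer never has to revisit the state set it is filling), and the right edge of an item left implicit as the index of the state set that holds it. `recognize`, `RULES`, and `LEXICON` are illustrative names.

```python
from typing import NamedTuple, Optional

# Slide 3 grammar plus the Start -> S rule of slide 4 (assumed encoding).
RULES = {
    "Start": [("S",)],
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}
# POS symbols (Name, Det, N, V, P) have no rules; the scanner checks them here.
LEXICON = {
    "I": {"Name"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
    "elephant": {"N"}, "in": {"P"}, "pjs": {"N"},
}


class Item(NamedTuple):
    lhs: str
    rhs: tuple
    dot: int    # number of rhs symbols already recognized
    start: int  # left edge; the right edge is the index of the state set holding it

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None


def recognize(words, rules=RULES, lexicon=LEXICON) -> bool:
    n = len(words)
    sets = [[] for _ in range(n + 1)]      # sets[i] is state set S_i, in order
    seen = [set() for _ in range(n + 1)]   # same items, for fast duplicate checks

    def add(i, item):
        if item not in seen[i]:            # never add duplicates (left recursion!)
            seen[i].add(item)
            sets[i].append(item)

    add(0, Item("Start", ("S",), 0, 0))
    for i in range(n + 1):
        j = 0
        while j < len(sets[i]):            # S_i may grow while we process it
            item = sets[i][j]
            j += 1
            sym = item.next_symbol()
            if sym is None:                # Completer: advance customers in S_start
                for cust in sets[item.start]:
                    if cust.next_symbol() == item.lhs:
                        add(i, cust._replace(dot=cust.dot + 1))
            elif sym in rules:             # Predictor: [B -> . gamma, i, i] into S_i
                for gamma in rules[sym]:
                    add(i, Item(sym, gamma, 0, i))
            elif i < n and sym in lexicon.get(words[i], ()):
                add(i + 1, item._replace(dot=item.dot + 1))   # Scanner: into S_{i+1}
        if i < n and not sets[i + 1]:
            return False                   # S_{i+1} is empty: reject early
    return Item("Start", ("S",), 1, 0) in seen[n]


print(recognize("I shot an elephant in my pjs".split()))   # True
```

"I shot" is rejected, because this grammar has no VP → V rule, while "shot I" dies at S1, which the scanner leaves empty.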

21. Earley’s Algorithm: Predictor • Predictor([A → α • B β, i, j]): for each rule B → γ, add [B → • γ, j, j]

22. Predictor (“wishor”) • Predict (push): • Before: [A → α • B β, k, i], B a nonterminal, in Si • After: add all new edges of the form [B → • γ, i, i] to state set Si • It cries out its need for a phrase of type B
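
A sketch of the predictor on tuples `(lhs, rhs, dot, start, end)` (an assumed layout; `predict` and `RULES` are illustrative names). It shows that predicted items start and end where the predicting item ends, so they stay in the same state set:

```python
# Slide 3 grammar (assumed encoding); POS tags have no rules, so nothing is predicted for them.
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Name",), ("Det", "N"), ("NP", "PP")],
    "VP": [("V", "NP"), ("VP", "PP")],
    "PP": [("P", "NP")],
}

def predict(item, rules):
    """item = (lhs, rhs, dot, start, end) sits in S_end with symbol B after the dot.
    Return [B -> . gamma, end, end] for every rule B -> gamma; both indices equal the
    predicting item's right edge, so the new items go into the same state set."""
    lhs, rhs, dot, start, end = item
    b = rhs[dot]
    return [(b, gamma, 0, end, end) for gamma in rules.get(b, [])]

# [VP -> V . NP, 1, 2] cries out for an NP starting at position 2
for new in predict(("VP", ("V", "NP"), 1, 1, 2), RULES):
    print(new)
```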

23. Earley’s Algorithm: Scanner • Scanner([A → α • B β, i, j]): for each lexical rule B → w whose word w is the next input word, add the edge [A → α B • β, i, j+1]

24. Scan, formally (“find a word/POS”) • Scan (jump over a token): • Before: [A → α • t β, k, i] in state set Si and word i+1 = t • Result: add [A → α t • β, k, i+1] to state set Si+1
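
A sketch of the scanner under the same assumed `(lhs, rhs, dot, start, end)` tuple layout. Python lists are 0-based, so `words[i]` plays the role of word i+1 in the slides' numbering. `scan`, `LEXICON`, and the sample state set (slide 15's S1) are illustrative.

```python
def scan(state_set, i, words, lexicon):
    """Return the items to add to S_{i+1}: every item in S_i whose next symbol is
    a tag of the next input word, with the dot moved over it."""
    if i >= len(words):
        return []                      # no word left to scan
    tags = lexicon.get(words[i], set())
    return [(lhs, rhs, dot + 1, start, i + 1)
            for lhs, rhs, dot, start, _end in state_set
            if dot < len(rhs) and rhs[dot] in tags]

LEXICON = {"I": {"Name"}, "shot": {"V"}, "an": {"Det"}, "my": {"Det"},
           "elephant": {"N"}, "in": {"P"}, "pjs": {"N"}}
WORDS = "I shot an elephant in my pjs".split()
# State set S1 as on slide 15
S1 = [("NP", ("Name",), 1, 0, 1), ("S", ("NP", "VP"), 1, 0, 1),
      ("NP", ("NP", "PP"), 1, 0, 1), ("VP", ("V", "NP"), 0, 1, 1),
      ("VP", ("VP", "PP"), 0, 1, 1), ("PP", ("P", "NP"), 0, 1, 1)]
print(scan(S1, 1, WORDS, LEXICON))   # [('VP', ('V', 'NP'), 1, 1, 2)]
```

Only [VP → • V NP, 1, 1] wants a V, so "shot" extends just that edge, as on slide 16.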

25. Earley’s Algorithm: Completer • Completer([B → γ •, k, j]): for each edge [A → α • B β, i, k], add [A → α B • β, i, j]

26. More precisely • Complete (pop; finish with a phrase): • Before: if Si contains an edge e of the form [B → γ •, k, i], then go back to state set Sk and, for all edges of the form [A → α • B β, j, k], add edges E’ [A → α B • β, j, i] to state set Si

27. “The fundamental rule” glues smaller trees into larger ones: [VP → V • NP, 1, 2] (over "shot": start position 1, length 1) + [NP → Det N •, 2, 4] (over "an elephant": start 2, length 2) = [VP → V NP •, 1, 4] (start position 1, length 3)
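
The fundamental rule as a sketch, again on assumed `(lhs, rhs, dot, start, end)` tuples, with `chart[k]` standing for state set Sk (a dict here, purely for brevity; `complete` is an illustrative name). The sample data reproduce slide 27 and add the left-recursive customer [NP → • NP PP, 2, 2], which prediction also places in S2:

```python
def complete(finished, chart):
    """finished = (B, gamma, dot, k, i) with the dot at the end of gamma;
    chart[k] is state set S_k, holding items (A, rhs, dot, j, k).
    Return every [A -> alpha B . beta, j, i] to add to S_i."""
    b, gamma, dot, k, i = finished
    if dot != len(gamma):
        raise ValueError("only a completed edge can be attached")
    return [(a, rhs, d + 1, j, i)
            for a, rhs, d, j, _ in chart[k]
            if d < len(rhs) and rhs[d] == b]      # customers waiting for a B at k

# Slide 27: [VP -> V . NP, 1, 2] + [NP -> Det N ., 2, 4] gives [VP -> V NP ., 1, 4]
chart = {2: [("VP", ("V", "NP"), 1, 1, 2),
             ("NP", ("Det", "N"), 0, 2, 2),
             ("NP", ("NP", "PP"), 0, 2, 2)]}
print(complete(("NP", ("Det", "N"), 2, 2, 4), chart))
# [('VP', ('V', 'NP'), 2, 1, 4), ('NP', ('NP', 'PP'), 1, 2, 4)]
```

The second result is the seed of the NP-attachment reading: once a PP follows, [NP → NP • PP, 2, 4] can itself be completed.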

28. Earley’s Algorithm, the rules: Initialization, Predictor, Scanner, Completer

29. State set construction: the state sets are the columns of the chart

30. Take note of the start-stop indices. Suppose an S needs an NP, "the dog", starting at position k: initially S [i, k]; predict NP [k, k]; scan "the": NP [k, k+1]; scan "dog": NP [k, k+2]; complete NP [k, k+2]; after the NP, S [i, k+2]

31. Indices: the right-hand (progress) edge is ‘you are here’ • Predict does not increment it: NP[k,k] • Scan increments it by 1: NP[k,k] → NP[k,k+1] → NP[k,k+2] • Complete extends the right-hand edge of an item from a previous state set: S[j,k] → S[j,k+2]

32. Example Grammar
   - Rules: S → NP VP; NP → Name; NP → Det N; NP → NP PP; VP → V NP; VP → VP PP; PP → P NP
   - Lexicon: N → pjs; N → elephant; V → shot; P → in; Det → an; Det → my; Name → I
   - Sentence: I shot an elephant in my pjs (tags: Name V Det N P Det N)

33. Remember, this stands for (0, S → • NP VP). Predict the kind of S we are looking for.

34. predict the kind of NP we are looking for (actually we’ll look for 3 kinds: any of the 3 will do)

35. predict the kind of Det we are looking for (2 kinds)

36. predict the kind of NP we’re looking for, but we were already looking for these, so don’t add duplicates! Note that this happened when we were processing a left-recursive rule.

37. scan: the desired word is in the input!

38. scan: failure

39. scan: failure

40. attach the newly created NP (which starts at 0) to its customers (incomplete constituents that end at 0 and have NP after the dot)

41. predict

42. predict

43. predict

44. predict

45. predict

46. scan: success!

47. scan: failure

48. complete

49. predict