
Improved Inference for Unlexicalized Parsing



Presentation Transcript


  1. Improved Inference for Unlexicalized Parsing Slav Petrov and Dan Klein

  2. Unlexicalized Parsing [Petrov et al. ‘06] Hierarchical, adaptive refinement: DT → DT1, DT2 → DT1…DT4 → DT1…DT8. 91.2 F1 score on Dev Set (1600 sentences)

  3. Baseline parsing time with the fully refined grammar: 1621 min

  4. Coarse-to-Fine Parsing [Goodman ‘97, Charniak & Johnson ‘05] Treebank → coarse grammar (NP, VP, …) → parse → prune → parse with the refined grammar (NP-1, NP-12, NP-17, NP-apple, NP-dog, NP-cat, …; VP-6, VP-31, VP-run, …)

  5. Prune? For each chart item X[i,j], compute its posterior probability under the coarse grammar; prune the item if the posterior falls below a threshold. E.g. consider the span 5 to 12: the surviving coarse items determine which refined items are built.
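The pruning rule can be sketched as follows; the chart scores, spans, and threshold below are invented for illustration (a real parser computes inside/outside scores with the coarse grammar):

```python
# Sketch of posterior pruning: an item X[i,j] survives only if
# inside * outside / sentence_prob clears a threshold.

def prune_chart(inside, outside, sentence_prob, threshold=1e-4):
    """Return the set of chart items whose posterior exceeds the threshold.

    inside/outside map (label, i, j) -> score under the coarse grammar;
    sentence_prob is the inside score of the root over the whole sentence.
    """
    keep = set()
    for item, in_score in inside.items():
        posterior = in_score * outside.get(item, 0.0) / sentence_prob
        if posterior >= threshold:
            keep.add(item)
    return keep

# Toy chart entries for the span (5, 12) from the slide, numbers invented:
inside = {("NP", 5, 12): 0.02, ("VP", 5, 12): 1e-9}
outside = {("NP", 5, 12): 0.5, ("VP", 5, 12): 0.3}
keep = prune_chart(inside, outside, sentence_prob=0.04)
# ("NP", 5, 12) survives; ("VP", 5, 12) is pruned.
```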

  6. Parsing time: 1621 min → 111 min (no search error)

  7. Multilevel Coarse-to-Fine Parsing [Charniak et al. ‘06] Add more rounds of pre-parsing, with grammars coarser than X-bar: a single symbol X covering {A, B, …}, refined to NP, VP, …, and finally to the refined grammar (NP-apple, NP-dog, NP-cat, VP-run, VP-eat, …)

  8. Hierarchical Pruning Consider again the span 5 to 12, pruned at every level: coarse → split in two → split in four → split in eight
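A toy illustration of hierarchical pruning, with invented symbol names (DT11, DT12, …) and posteriors: at each finer level, a split symbol is only considered if its coarser parent survived pruning at the previous level.

```python
# Toy hierarchical pruning for one span (all numbers invented).
posteriors = [
    {"DT": 0.9},                                  # coarse
    {"DT1": 0.7, "DT2": 0.001},                   # split in two
    {"DT11": 0.6, "DT12": 0.1,
     "DT21": 0.3, "DT22": 0.2},                   # split in four
]
parent = {"DT1": "DT", "DT2": "DT",
          "DT11": "DT1", "DT12": "DT1", "DT21": "DT2", "DT22": "DT2"}

def hierarchical_prune(posteriors, parent, threshold=0.01):
    # Prune the coarsest level by threshold alone.
    survivors = {s for s, p in posteriors[0].items() if p >= threshold}
    # At each finer level, require both a surviving parent and a
    # posterior above the threshold.
    for level in posteriors[1:]:
        survivors = {s for s, p in level.items()
                     if parent[s] in survivors and p >= threshold}
    return survivors

# DT2 falls below the threshold, so DT21/DT22 are skipped even though
# their own (invented) posteriors are high.
print(hierarchical_prune(posteriors, parent))
```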

  9. Intermediate Grammars Learning proceeds through a sequence of grammars X-Bar = G0, G1, G2, G3, G4, G5, G6 = G, each tag splitting further at every stage (DT → DT1, DT2 → DT1…DT4 → DT1…DT8)

  10. Parsing time: 1621 min → 111 min → 35 min (no search error)

  11. EM State Drift (DT tag) The determiners assigned to each DT substate (the, that, this, these, some, That, …) shift between successive grammars during EM training, so the intermediate grammars are not consistent coarsenings of the final grammar.

  12. 0(G) 1(G) 2(G) 3(G) 4(G) 5(G) G1 G2 G3 G4 G5 G6 G1 G2 G3 G4 G5 G6 Learning Learning Projection i G Projected Grammars X-Bar=G0 G=

  13. Estimating Projected Grammars Nonterminals? Easy: the projection π maps each split nonterminal in G to its unsplit symbol in π(G) (NP0, NP1 → NP; VP0, VP1 → VP; S0, S1 → S)

  14. Estimating Projected Grammars Rules? How do the rules in G determine the rules in π(G)? E.g. for S → NP VP, given the split rules: S1 → NP1 VP1 0.20; S1 → NP1 VP2 0.12; S1 → NP2 VP1 0.02; S1 → NP2 VP2 0.03; S2 → NP1 VP1 0.11; S2 → NP1 VP2 0.05; S2 → NP2 VP1 0.08; S2 → NP2 VP2 0.12

  15. S  NP VP S1  NP1 VP1 0.20 S1  NP1 VP2 0.12 S1  NP2 VP1 0.02 S1  NP2 VP2 0.03 S2  NP1 VP1 0.11 S2  NP1 VP2 0.05 S2  NP2 VP1 0.08 S2  NP2 VP2 0.12 Rules in G Rules in (G) … Treebank Infinite tree distribution [Corazza & Satta ‘06] Estimating Projected Grammars Estimating Grammars 0.56

  16. Calculating Expectations • Nonterminals: ck(X), the expected counts up to depth k • Converges within 25 iterations (a few seconds) • Rules: analogous expected counts
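The fixed-point computation can be sketched on a toy PCFG (the grammar below is invented); iterating the recursion, the counts approach the expected number of occurrences of each nonterminal in a random tree:

```python
# Fixed-point computation of expected nonterminal counts c(X): the
# expected number of times X appears in a tree drawn from the grammar.
rules = {  # parent -> list of (children, probability); toy grammar
    "S":  [(("NP", "VP"), 1.0)],
    "NP": [(("n",), 0.7), (("NP", "PP"), 0.3)],
    "VP": [(("v",), 1.0)],
    "PP": [(("p", "NP"), 1.0)],
}

def expected_counts(rules, root="S", iterations=50):
    counts = {x: 0.0 for x in rules}
    for _ in range(iterations):
        # The root occurs once; every other occurrence of X comes from
        # some parent expanding with X on its right-hand side.
        new = {x: (1.0 if x == root else 0.0) for x in rules}
        for parent, expansions in rules.items():
            for children, prob in expansions:
                for child in children:
                    if child in rules:  # skip terminals
                        new[child] += counts[parent] * prob
        counts = new
    return counts

c = expected_counts(rules)
# Closed form for this grammar: c(NP) = 1 / (1 - 0.3 - 0.3) = 2.5,
# c(PP) = 0.3 * c(NP) = 0.75.
```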

  17. Parsing time: 1621 min → 111 min → 35 min → 15 min (no search error)

  18. Parsing times per level: X-Bar = G0 60%, G1 12%, G2 7%, G3 6%, G4 6%, G5 5%, G = G6 4%

  19. Bracket Posteriors (after G0)

  20. Bracket Posteriors (after G1)

  21. Bracket Posteriors (Final Chart; shown as a movie in the talk)

  22. Bracket Posteriors (Best Tree)

  23. Parse Selection Many derivations over split categories collapse to the same parse once the annotations are removed, and computing the most likely unsplit tree is NP-hard. Options: • Settle for the best derivation • Rerank an n-best list • Use an alternative objective function

  24. Parse Risk Minimization [Titov & Henderson ‘06] • Expected loss according to our beliefs • TT: true tree • TP: predicted tree • L: loss function (0/1, precision, recall, F1) • Use an n-best candidate list and approximate the expectation with samples
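A minimal sketch of the n-best approximation, representing each tree as a set of labeled brackets and using 1 − F1 as the loss; the candidates and posteriors below are invented:

```python
# Risk minimization over an n-best list: pick the candidate with the
# lowest expected loss, approximating the expectation over true trees
# by the n-best candidates themselves, weighted by their posteriors.

def f1(pred, gold):
    """Bracket F1 between two trees encoded as sets of brackets."""
    overlap = len(pred & gold)
    if not pred or not gold or overlap == 0:
        return 0.0
    p, r = overlap / len(pred), overlap / len(gold)
    return 2 * p * r / (p + r)

def min_risk_parse(candidates):
    """candidates: list of (tree, posterior); posteriors sum to ~1."""
    best, best_risk = None, float("inf")
    for tree, _ in candidates:
        risk = sum(w * (1.0 - f1(tree, other)) for other, w in candidates)
        if risk < best_risk:
            best, best_risk = tree, risk
    return best

# Invented 3-best list; brackets are just opaque labels here.
t1 = frozenset({"S05", "NP02", "VP25"})
t2 = frozenset({"S05", "NP03", "VP35"})
t3 = frozenset({"S05", "NP02", "VP25", "PP34"})
candidates = [(t1, 0.5), (t2, 0.3), (t3, 0.2)]
# t1 wins: it is both probable and close to the other likely candidates.
```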

  25. Reranking Results

  26. Dynamic Programming [Matsuzaki et al. ‘05] • Approximate the posterior parse distribution à la [Goodman ‘98] • Maximize the expected number of correct rules
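One way to sketch the "expected correct rules" objective is a CKY-style dynamic program over precomputed rule and tag posteriors; the posterior tables below are invented for a 3-word sentence:

```python
# Maximize the expected number of correct rules: sum rule posteriors
# along the tree instead of multiplying rule probabilities, and take
# the tree with the largest total via a CKY-style dynamic program.

tag_post = {("DT", 0): 0.9, ("NN", 1): 0.8, ("VB", 2): 0.7}
# (parent, left, right, i, k, j) -> posterior of that anchored rule
rule_post = {
    ("NP", "DT", "NN", 0, 1, 2): 0.85,
    ("VP", "NN", "VB", 1, 2, 3): 0.1,
    ("S", "NP", "VB", 0, 2, 3): 0.9,
    ("S", "DT", "VP", 0, 1, 3): 0.2,
}

def best_tree(n, tag_post, rule_post):
    best = {}  # (X, i, j) -> (score, backpointer)
    for (tag, i), p in tag_post.items():
        best[(tag, i, i + 1)] = (p, None)
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for (x, y, z, ri, k, rj), p in rule_post.items():
                if (ri, rj) != (i, j):
                    continue
                left, right = best.get((y, i, k)), best.get((z, k, j))
                if left is None or right is None:
                    continue
                score = p + left[0] + right[0]
                if score > best.get((x, i, j), (-1.0, None))[0]:
                    best[(x, i, j)] = (score, (y, z, k))
    return best

chart = best_tree(3, tag_post, rule_post)
# The S via NP VB analysis wins: 0.9 + (0.85 + 0.9 + 0.8) + 0.7 = 4.15.
```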

  27. Dynamic Programming Results

  28. Final Results (Efficiency) • Berkeley Parser: 15 min, 91.2 F-score, implemented in Java • Charniak & Johnson ‘05 parser: 19 min, 90.7 F-score, implemented in C

  29. Final Results (Accuracy)

  30. Conclusions • Hierarchical coarse-to-fine inference • Projections • Marginalization • Multi-lingual unlexicalized parsing

  31. Thank You! Parser available at http://nlp.cs.berkeley.edu
