
Parsing German with Latent Variable Grammars

Slav Petrov and Dan Klein, UC Berkeley

Presentation Transcript


  1. Parsing German with Latent Variable Grammars Slav Petrov and Dan Klein UC Berkeley

  2. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98] • Head lexicalization [Collins ’99, Charniak ’00] • Automatic clustering?
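Parent annotation is easy to picture in code. Below is a minimal sketch (not from the talk or the parser) that refines each nonterminal in a nested-list tree with its parent's label, so a subject NP under S becomes NP^S while an object NP under VP becomes NP^VP; the tree encoding and function name are my own illustration.

```python
# Hypothetical sketch of parent annotation [Johnson '98] on a nested-list
# tree such as ["S", ["NP", "he"], ["VP", ["V", "was"], ["ADJP", "right"]]].
# Each nonterminal label is refined with its parent's base label.
def parent_annotate(tree, parent=None):
    if isinstance(tree, str):          # a leaf word: leave it unchanged
        return tree
    label, *children = tree
    new_label = f"{label}^{parent}" if parent else label
    return [new_label] + [parent_annotate(child, label) for child in children]

print(parent_annotate(["S", ["NP", "he"], ["VP", ["V", "was"], ["ADJP", "right"]]]))
# ['S', ['NP^S', 'he'], ['VP^S', ['V^VP', 'was'], ['ADJP^VP', 'right']]]
```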

  3. Previous Work: Manual Annotation [Klein & Manning ’03] • Manually split categories • NP: subject vs. object • DT: determiners vs. demonstratives • IN: sentential vs. prepositional • Advantages: • Fairly compact grammar • Linguistic motivations • Disadvantages: • Performance leveled out • Manually annotated

  4. Previous Work: Automatic Annotation Induction [Matsuzaki et al. ’05, Prescher ’05] • Label all nodes with latent variables; the same number k of subcategories for every category. • Advantages: • Automatically learned • Disadvantages: • Grammar gets too large • Most categories are oversplit while others are undersplit.

  5. Overview [Petrov, Barrett, Thibaux & Klein, ACL ’06] • Learning: • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Inference: • Coarse-To-Fine Decoding • Variational Approximation • German Analysis [Petrov & Klein, NAACL ’07]

  6. Learning Latent Annotations [Figure: forward and backward passes over a latent-annotated tree X1 … X7 spanning “He was right”] EM algorithm: • Brackets are known • Base categories are known • Only induce subcategories Just like Forward-Backward for HMMs.
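Because the bracketing and base categories are fixed, the E-step only has to distribute mass over subcategory indices at each node. Here is a minimal sketch of the upward (inside) pass, with illustrative data structures of my own (rule_probs keyed by base-category triples, emit_probs as per-subcategory word distributions); the downward (outside) pass and the expected-count accumulation follow the same pattern.

```python
# Sketch of the inside pass of EM over latent annotations on one observed,
# binarized treebank tree. Only the subcategory index at each node is latent.
import numpy as np

class Node:
    def __init__(self, label, children=(), word=None):
        self.label, self.children, self.word = label, list(children), word

def inside(node, rule_probs, emit_probs, k):
    """Length-k vector: P(yield below node | node has subcategory i)."""
    if node.word is not None:                        # preterminal
        return np.array([emit_probs[node.label][i].get(node.word, 1e-10)
                         for i in range(k)])
    left, right = node.children
    in_left = inside(left, rule_probs, emit_probs, k)
    in_right = inside(right, rule_probs, emit_probs, k)
    # rule_probs[(A, B, C)][i, j, m] = P(A_i -> B_j C_m), shape (k, k, k)
    rules = rule_probs[(node.label, left.label, right.label)]
    return np.einsum('ijm,j,m->i', rules, in_left, in_right)
```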

  7. Starting Point [Figure: baseline grammar results, annotated with the limit of computational resources]

  8. Refinement of the DT tag [Figure: DT split into subcategories DT-1 … DT-4]

  9. Refinement of the DT tag

  10. Hierarchical Refinement of the DT tag
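One way to realize a single hierarchical split step is sketched below: every subcategory is duplicated and its parameters are perturbed slightly so that EM can break the symmetry between the two copies. The function, array layout, and noise level are illustrative assumptions, not the parser's actual code.

```python
# Sketch of one split-in-two step for a tag's emission distributions:
# duplicate each subcategory's row and add a little noise so the two
# copies are no longer identical and EM can specialize them.
import numpy as np

def split_in_two(probs, noise=0.01, rng=np.random.default_rng(0)):
    """probs: (k, V) array of P(word | subcategory). Returns a (2k, V) array."""
    doubled = np.repeat(probs, 2, axis=0)                    # each row twice
    doubled = doubled * (1.0 + noise * rng.uniform(-1, 1, doubled.shape))
    return doubled / doubled.sum(axis=1, keepdims=True)      # renormalize rows
```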

  11. Hierarchical Estimation Results

  12. Refinement of the “,” tag • Splitting all categories the same amount is wasteful:

  13. The DT tag revisited: oversplit?

  14. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful

  15. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful

  16. Adaptive Splitting • Evaluate the loss in likelihood from removing each split: loss = (data likelihood with split reversed) / (data likelihood with split) • No loss in accuracy when 50% of the splits are reversed.
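A hedged sketch of how that ratio can be evaluated at a single node, assuming inside scores I1, I2 and outside scores O1, O2 for the two subcategories and their relative frequencies p1, p2. The variable names are mine, and a full implementation would combine these per-node ratios over all occurrences of the split.

```python
# Sketch: approximate the change in one sentence's likelihood if the split
# (A-1, A-2) were merged back into A at a single node.
def likelihood_ratio_if_merged(I1, I2, O1, O2, p1, p2, sentence_prob):
    merged_inside = p1 * I1 + p2 * I2     # frequency-weighted inside score
    merged_outside = O1 + O2              # outside mass simply adds up
    # Swap this node's contribution to P(sentence) for its merged version.
    merged_prob = sentence_prob - (I1 * O1 + I2 * O2) + merged_inside * merged_outside
    return merged_prob / sentence_prob    # ratio < 1 means the split helped
```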

  17. Adaptive Splitting Results

  18. Number of Phrasal Subcategories

  19. Number of Lexical Subcategories

  20. Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics

  21. Linear Smoothing
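A minimal sketch of linear smoothing as motivated on slide 20: each subcategory's distribution is shrunk toward the mean over all subcategories of the same base category, so rare subcategories pool statistics with their siblings. The array layout and the value of alpha are illustrative assumptions.

```python
# Sketch of linear smoothing across the subcategories of one base category.
import numpy as np

def linear_smooth(probs, alpha=0.01):
    """probs: (k, n) array, one row of rule/emission probabilities per subcategory."""
    mean = probs.mean(axis=0, keepdims=True)       # pooled sibling statistics
    return (1.0 - alpha) * probs + alpha * mean    # shrink each row toward it
```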

  22. Result Overview

  23. Coarse-to-Fine Parsing [Goodman ’97, Charniak & Johnson ’05] [Figure: a coarse grammar (NP, VP, …) learned from the treebank is used to prune, and a refined grammar (lexicalized: NP-apple, NP-dog, NP-cat, NP-eat, …; latent: NP-1, NP-12, NP-17, VP-6, VP-31, …) is used to parse what survives]

  24. Hierarchical Pruning • Consider the span 5 to 12: [Figure: surviving chart items for that span under the coarse grammar, then split in two, split in four, and split in eight]
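A sketch of one pruning pass under these assumptions: chart items carry inside and outside scores from the current, coarser grammar, and an item survives to the next refinement level only if its posterior exceeds a threshold. The data structures and threshold value are illustrative.

```python
# Sketch of posterior pruning between coarse-to-fine levels: keep a
# (label, span) item only if its posterior under the coarser grammar is
# above a threshold; refinements of kept items are considered at the next level.
def prune_chart(inside_chart, outside_chart, sentence_prob, threshold=1e-4):
    allowed = set()
    for (label, i, j), in_score in inside_chart.items():
        posterior = in_score * outside_chart[(label, i, j)] / sentence_prob
        if posterior > threshold:
            allowed.add((label, i, j))
    return allowed
```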

  25. Intermediate Grammars [Figure: hierarchical training produces a sequence of grammars X-Bar = G0, G1, …, G6 = G; e.g. DT splits into DT1–DT2, then DT1–DT4, then DT1–DT8]

  26. EM State Drift (DT tag) [Figure: the assignment of determiners such as “the”, “that”, “this”, “these”, “some” to DT subcategories drifts from one EM iteration to the next]

  27. Projected Grammars [Figure: instead of keeping the intermediate grammars, each coarser grammar is computed from the final grammar G by a projection: X-Bar = G0; Gi = πi(G)]
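With binary splits, the symbol-level part of a projection is just index arithmetic, as in the sketch below (0-based subcategory indices; my own illustration). The real projection also has to estimate rule probabilities for the coarser grammar from the refined one, which this sketch leaves out.

```python
# Sketch of projecting a refined subcategory index to a coarser level of the
# binary split hierarchy: the coarse index is the high-order bits.
def project(subcat_index, refined_level, target_level):
    """E.g. index 5 at level 3 (8 subcats) projects to index 1 at level 1 (2 subcats)."""
    return subcat_index >> (refined_level - target_level)
```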

  28. Bracket Posteriors (after G0)

  29. Bracket Posteriors (after G1)

  30. Bracket Posteriors (Movie) (Final Chart)

  31. Bracket Posteriors (Best Tree)

  32. Parse Selection [Figure: a single parse corresponds to many derivations with different probabilities] Computing the most likely unsplit tree is NP-hard: • Settle for the best derivation • Rerank an n-best list • Use an alternative objective function / variational approximation
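The n-best option is easy to make concrete. A hedged sketch with an invented interface: given derivations paired with their log probabilities, sum the probability mass of all derivations that project to the same unsplit tree and return the tree with the largest total.

```python
# Sketch of reranking an n-best list by parse (not derivation) probability:
# a parse's probability is the sum over derivations that project to it.
from collections import defaultdict
import math

def best_parse_from_nbest(derivations):
    """derivations: iterable of (unsplit_tree, derivation_log_prob) pairs."""
    mass = defaultdict(float)
    for tree, logp in derivations:
        mass[tree] += math.exp(logp)               # accumulate derivation mass
    return max(mass.items(), key=lambda kv: kv[1]) # (best tree, its probability)
```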

  33. Efficiency Results • Berkeley Parser: • 15 min • Implemented in Java • Charniak & Johnson ‘05 Parser • 19 min • Implemented in C

  34. Accuracy Results

  35. Parsing German Shared Task • Two Pass Parsing • Determine constituency structure (F1: 85/94) • Assign grammatical functions • One Pass Approach • Treat categories+grammatical functions as labels

  36. Parsing German Shared Task • Two Pass Parsing • Determine constituency structure • Assign grammatical functions • One Pass Approach • Treat categories+grammatical functions as labels
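The one-pass approach amounts to a preprocessing step on the treebank labels. A tiny illustrative sketch follows; the TiGer-style function tag "SB" for subject is an assumption about the label scheme, not taken from the slides.

```python
# Sketch: fold the grammatical function into the category label before
# training, so e.g. a subject NP becomes the single symbol "NP-SB".
def merge_label(category, function=None):
    return f"{category}-{function}" if function else category

print(merge_label("NP", "SB"))   # NP-SB
```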

  37. Development Set Results

  38. Shared Task Results

  39. Part-of-speech splits

  40. Linguistic Candy

  41. Conclusions • Split & Merge Learning • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • Hierarchical Coarse-to-Fine Inference • Projections • Marginalization • Multi-lingual Unlexicalized Parsing

  42. Thank You! Parser is available at http://nlp.cs.berkeley.edu
