
Learning Accurate, Compact, and Interpretable Tree Annotation


Presentation Transcript


  1. Learning Accurate, Compact, and Interpretable Tree Annotation • Slav Petrov, Leon Barrett, Romain Thibaux, Dan Klein

  2. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98]

  3. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98] • Head lexicalization [Collins ’99, Charniak ’00]

  4. The Game of Designing a Grammar • Annotation refines base treebank symbols to improve statistical fit of the grammar • Parent annotation [Johnson ’98] • Head lexicalization [Collins ’99, Charniak ’00] • Automatic clustering?

  5. Previous Work: Manual Annotation [Klein & Manning ’03] • Manually split categories • NP: subject vs. object • DT: determiners vs. demonstratives • IN: sentential vs. prepositional • Advantages: • Fairly compact grammar • Linguistic motivations • Disadvantages: • Performance leveled out • Manually annotated

  6. Previous Work: Automatic Annotation Induction [Matsuzaki et al. ’05, Prescher ’05] • Label all nodes with latent variables; same number k of subcategories for every category • Advantages: • Automatically learned • Disadvantages: • Grammar gets too large • Most categories are oversplit while others are undersplit

  7. Previous work is complementary

  8. Learning Latent Annotations • EM algorithm, just like Forward-Backward for HMMs: • Brackets are known • Base categories are known • Only induce subcategories [Figure: parse tree with latent subcategory variables X1–X7 over the sentence "He was right", annotated with forward/backward passes]
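With brackets and base categories fixed, the E-step is the tree analogue of the HMM forward pass: only the latent subcategories are summed over. A minimal Python sketch of this inside pass over one observed binary tree is given below; the Node class and the rule_prob / emit_prob layouts are illustrative assumptions, not the authors' code.

```python
# Minimal sketch (not the authors' implementation) of the inside pass for one
# observed binary tree whose base categories each carry k latent subcategories.
import numpy as np

class Node:
    def __init__(self, category, children=None, word=None):
        self.category = category        # observed treebank symbol, e.g. "NP"
        self.children = children or []  # two children, or none for a leaf
        self.word = word                # terminal word if this is a leaf

def inside(node, rule_prob, emit_prob):
    """Return a length-k vector of inside probabilities, one per subcategory.

    Assumed layouts:
      rule_prob[(A, B, C)][x, y, z] = P(A_x -> B_y C_z | A_x)   shape (k, k, k)
      emit_prob[(A, w)][x]          = P(w | A_x)                shape (k,)

    Because the bracketing and base categories are fixed, only subcategories
    are summed over, just as states are in the HMM forward pass."""
    if node.word is not None:
        return emit_prob[(node.category, node.word)]
    left, right = node.children
    in_left = inside(left, rule_prob, emit_prob)
    in_right = inside(right, rule_prob, emit_prob)
    probs = rule_prob[(node.category, left.category, right.category)]
    # For each parent subcategory x, sum over child subcategories y and z.
    return np.einsum('xyz,y,z->x', probs, in_left, in_right)
```

An outside pass and expected rule counts, collected the same way backward probabilities and transition counts are for HMMs, would complete the E-step.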

  9. Overview • Hierarchical Training • Adaptive Splitting • Parameter Smoothing [Chart annotation: limit of computational resources]

  10. Refinement of the DT tag [Figure: DT split into subcategories DT-1, DT-2, DT-3, DT-4]

  11. Refinement of the DT tag

  12. Hierarchical refinement of the DT tag
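Hierarchical refinement grows each tag gradually: every current subcategory is split in two and the grammar is re-estimated with EM, so DT goes from 1 to 2 to 4 to 8 subcategories. Below is a hedged sketch of one such split step, reusing the rule-probability layout assumed above; the symmetry-breaking noise and the renormalisation details here are illustrative, not taken from the paper.

```python
# Hedged sketch of one split-in-two step over the binary-rule table assumed in
# the inside-pass sketch above.
import numpy as np

def split_in_two(rule_prob, noise=0.01, seed=0):
    """Double the number of subcategories of every category.

    Each subcategory's rules are copied to its two daughters and perturbed
    slightly so that a following EM run can break the symmetry between them."""
    rng = np.random.default_rng(seed)
    doubled = {}
    for (A, B, C), probs in rule_prob.items():            # probs: (k, k, k)
        p = np.repeat(np.repeat(np.repeat(probs, 2, 0), 2, 1), 2, 2)
        p *= 1.0 + noise * rng.standard_normal(p.shape)   # break symmetry
        doubled[(A, B, C)] = p
    # Renormalise so each parent subcategory A_x sums to one over all of its
    # binary rules (unary rules and emissions are ignored in this sketch).
    totals = {}
    for (A, _, _), p in doubled.items():
        totals[A] = totals.get(A, 0.0) + p.sum(axis=(1, 2))
    return {key: p / totals[key[0]][:, None, None] for key, p in doubled.items()}
```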

  13. Hierarchical Estimation Results

  14. Refinement of the , tag • Splitting all categories the same amount is wasteful:

  15. The DT tag revisited • Oversplit?

  16. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful

  17. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful

  18. Adaptive Splitting • Want to split complex categories more • Idea: split everything, roll back splits which were least useful

  19. Adaptive Splitting • Evaluate the loss in likelihood from removing each split = (data likelihood with split reversed) / (data likelihood with split) • No loss in accuracy when 50% of the splits are reversed.
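In code, the slide's criterion amounts to scoring each split by how much the log-likelihood drops when that split is reversed, then undoing the lowest-scoring half. A small sketch follows, assuming the two likelihood values per split are already available (the paper approximates them rather than re-parsing); the function name and inputs are hypothetical.

```python
# Sketch of the adaptive-splitting criterion from this slide.
import math

def splits_to_reverse(splits, likelihood_with, likelihood_without, fraction=0.5):
    """Rank each candidate split by the drop in log-likelihood its reversal
    would cause, and return the least useful `fraction` of the splits."""
    scored = []
    for s in splits:
        loss = math.log(likelihood_with[s]) - math.log(likelihood_without[s])
        scored.append((loss, s))
    scored.sort(key=lambda pair: pair[0])   # smallest loss = least useful split
    return [s for _, s in scored[:int(len(scored) * fraction)]]
```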

  20. Adaptive Splitting Results

  21. Number of Phrasal Subcategories

  22. Number of Phrasal Subcategories [chart highlights: NP, VP, PP]

  23. Number of Phrasal Subcategories [chart highlights: NAC, X]

  24. Number of Lexical Subcategories [chart highlights: POS, TO, ,]

  25. Number of Lexical Subcategories [chart highlights: RB, VBx, IN, DT]

  26. Number of Lexical Subcategories [chart highlights: NNP, JJ, NNS, NN]

  27. Smoothing • Heavy splitting can lead to overfitting • Idea: Smoothing allows us to pool statistics

  28. Linear Smoothing
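Linear smoothing pools statistics across the subcategories of a base category: each subcategory's rule probability is interpolated with the average over its sibling subcategories, so rare subcategories borrow strength from common ones. A minimal sketch under the same layout assumptions as the earlier snippets; alpha here is illustrative, not the paper's tuned value.

```python
# Minimal sketch of linear smoothing over the binary-rule table assumed above.
import numpy as np

def linear_smooth(rule_prob, alpha=0.01):
    """Shrink each parent subcategory's rule probabilities toward the average
    over all subcategories of the same base category."""
    smoothed = {}
    for (A, B, C), probs in rule_prob.items():            # probs: (k, k, k)
        mean_over_parents = probs.mean(axis=0, keepdims=True)
        smoothed[(A, B, C)] = (1.0 - alpha) * probs + alpha * mean_over_parents
    return smoothed
```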

  29. Result Overview

  30. Result Overview

  31. Result Overview

  32. Final Results

  33. Final Results

  34. Linguistic Candy • Proper Nouns (NNP): • Personal pronouns (PRP):

  35. Linguistic Candy • Relative adverbs (RBR): • Cardinal Numbers (CD):

  36. Conclusions • New Ideas: • Hierarchical Training • Adaptive Splitting • Parameter Smoothing • State-of-the-Art Parsing Performance: • Improves from the X-Bar initializer’s 63.4 F1 to 90.2 F1 • Linguistically interesting grammars to sift through.

  37. Thank You! petrov@eecs.berkeley.edu

  38. Other things we tried • X-Bar vs structurally annotated grammar: • X-Bar grammar starts at lower performance, but provides more flexibility • Better Smoothing: • Tried different (hierarchical) smoothing methods, all worked about the same • (Linguistically) constraining rewrite possibilities between subcategories: • Hurts performance • EM automatically learns that most subcategory combinations are meaningless: ≥ 90% of the possible rewrites have 0 probability
