
Adaptor Grammars



  1. Adaptor Grammars • Ehsan Khoddammohammadi • Recent Advances in Parsing Technology, WS 2012/13 • Saarland University

  2. Outline • Definition and motivation behind unsupervised grammar learning • Non-parametric Bayesian statistics • Adaptor grammars vs. PCFGs • A short introduction to the Chinese Restaurant Process • Applications of adaptor grammars

  3. Unsupervised Learning • How many categories of objects? • How many features does an object have? • How many words and rules are in a language?

  4. Grammar Induction Goal: • study how a grammar and parses can be learnt from terminal strings alone Motivation: • Help us understand human language acquisition • Inducing parsers for low-resource languages

  5. Nonparametric Bayesian statistics • Learning the things people learn requires using rich, unbounded hypothesis spaces • Language learning is non-parametric inference: there is no (obvious) bound on the number of words, grammatical rules, or morphemes • Use stochastic processes to define priors on infinite hypothesis spaces

  6. Nonparametric Bayesian statistics • Likelihood: how well the grammar describes the data • Prior: encodes our knowledge or expectations about grammars before seeing the data • Universal Grammar (very specific) • shorter grammars (general constraints) • Posterior: shows the learner's uncertainty about which grammar is correct (a distribution over grammars) • Posterior ∝ Likelihood × Prior
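
In symbols this is just Bayes' rule, with G a candidate grammar and D the observed strings:

```latex
P(G \mid D) \;\propto\; \underbrace{P(D \mid G)}_{\text{likelihood}} \;\times\; \underbrace{P(G)}_{\text{prior}}
```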

  7. Is PCFG good enough for our purpose? • A PCFG can be learnt within a Bayesian framework, but … • the set of rules is fixed in standard PCFG estimation • PCFG rules are “too small” to be effective units of generalization. How can we solve this problem?

  8. Two non-parametric Bayesian extensions to PCFGs • Let the set of non-terminals grow unboundedly: • start with a short, un-lexicalized grammar • split and join non-terminals • Let the set of rules grow unboundedly: • generate new rules whenever you need them • learn sub-trees and their probabilities (bigger units of generalization)

  9. Adaptor Grammars • CFG rules are used to generate trees, as in an ordinary CFG • We have two types of non-terminals: • un-adapted (normal) non-terminals: pick a rule and recursively expand its children, as in a PCFG • adapted non-terminals: either pick a rule and recursively expand its children, or regenerate a previously generated tree (with probability proportional to the number of times it has already been generated) • We have a Chinese Restaurant Process for each adapted non-terminal

  10. The Story of Adaptor Grammars • In a PCFG, rules are applied independently of each other. • The sequence of trees generated by an adaptor grammar is not independent: if an adapted sub-tree has been used frequently in the past, it is more likely to be used again. • An un-adapted non-terminal A expands using rule A → β with probability proportional to θ(A → β). • An adapted non-terminal A expands: • to a sub-tree t rooted in A with probability proportional to the number of times t was previously generated • using rule A → β with probability proportional to α(A) · θ(A → β), where α(A) is the prior (concentration) parameter. • A minimal sketch of this generative process is given below.
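
To make the two expansion modes concrete, here is a minimal Python sketch of the generative process. The toy grammar (RULES), the ADAPTED table, and the alpha values are invented for illustration; this is not the authors' implementation.

```python
import random
from collections import defaultdict

# Hypothetical toy grammar for illustration only: lhs -> [(rhs, theta), ...]
RULES = {
    "Word":   [(("Stem", "Suffix"), 1.0)],
    "Stem":   [(("c", "a", "t"), 0.5), (("d", "o", "g"), 0.5)],
    "Suffix": [(("s",), 0.7), ((), 0.3)],
}
ADAPTED = {"Word": 1.0}            # adapted non-terminals and their alpha (concentration) values
cache = defaultdict(list)          # adapted non-terminal -> list of generated sub-tree tokens

def expand_with_rule(symbol):
    """PCFG step: pick a rule with probability proportional to theta, expand children."""
    rhs_options, thetas = zip(*RULES[symbol])
    rhs = random.choices(rhs_options, weights=thetas)[0]
    return (symbol,) + tuple(expand(child) for child in rhs)

def expand(symbol):
    if symbol not in RULES:                        # terminal: return it unchanged
        return symbol
    if symbol in ADAPTED:
        alpha = ADAPTED[symbol]
        n = len(cache[symbol])
        if n > 0 and random.random() < n / (n + alpha):
            tree = random.choice(cache[symbol])    # reuse: prob of tree t proportional to its count
        else:
            tree = expand_with_rule(symbol)        # fresh draw, prob proportional to alpha
        cache[symbol].append(tree)                 # every generated token goes into the cache
        return tree
    return expand_with_rule(symbol)                # un-adapted: ordinary PCFG expansion

if __name__ == "__main__":
    print([expand("Word") for _ in range(5)])
```

Running it a few times shows the caching effect: once a particular Word tree has been generated, it becomes increasingly likely to be generated again.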

  11. Properties of Adaptor grammars • In adaptor grammars: • the probabilities of adapted sub-trees are learnt separately; they are not just the product of the probabilities of their rules • “rich get richer” (Zipfian distribution) • useful compound structures are more probable than their parts • there is no recursion amongst adapted non-terminals (an adapted non-terminal never expands to itself)

  12. The Chinese Restaurant Process

  13. The Chinese Restaurant Process • n customers walk into a restaurant; customer i chooses table z_i with probability P(z_i = k) ∝ n_k for an occupied table k (where n_k customers are already seated there) and ∝ α for a new table • This defines an exchangeable distribution over seating arrangements (including the counts on tables) • A small sampler illustrating this process follows below
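
A few lines of Python illustrate the rich-get-richer dynamics (a minimal sketch; the function name and default alpha are just for this example):

```python
import random

def crp_seating(n_customers, alpha=1.0, seed=0):
    """Sample one seating arrangement from a Chinese Restaurant Process.

    Table k attracts the next customer with probability proportional to the
    number of customers already seated there; a new table is opened with
    probability proportional to the concentration parameter alpha.
    """
    rng = random.Random(seed)
    tables = []                                # tables[k] = customers at table k
    for _ in range(n_customers):
        weights = tables + [alpha]             # existing tables, then the "new table" option
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(tables):
            tables.append(1)                   # open a new table
        else:
            tables[k] += 1                     # join an existing table
    return tables

print(crp_seating(20, alpha=1.0))
```

With 20 customers and alpha = 1.0, typically only a few tables are opened, and most customers end up concentrated at the earliest, largest ones.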

  14.–18. CRP (figure-only slides: a step-by-step illustration of customers choosing tables)

  19. Applications of Adaptor grammars • Not used for parsing itself, because grammar induction is hard • Word segmentation • Learning concatenative morphology • Learning the structure of named-entity (NE) NPs • Topic modeling

  20. Unsupervised Word Segmentation • Input: phoneme sequences with sentence boundaries (e.g. a phoneme string such as "y u w a n t t u s i D 6 b U k") • Task: identify the words ("yu want tu si D6 bUk", i.e. "you want to see the book")

  21. Word segmentation with PCFG

  22. Unigram word segmentation
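
The grammar on this slide is not reproduced in the transcript; a standard unigram adaptor grammar for this task (roughly following the Johnson 2008 paper in the references, with adapted non-terminals underlined) looks like this:

```latex
\begin{align*}
\text{Sentence} &\rightarrow \text{Word}^{+} \\
\underline{\text{Word}} &\rightarrow \text{Phoneme}^{+}
\end{align*}
```

Because Word is adapted, frequently generated phoneme sequences are cached and reused as whole words.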

  23. Collocation word segmentation
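
The collocation grammar adds an adapted level above words, so that frequently co-occurring word sequences are also cached as units (again a reconstruction in the spirit of Johnson 2008):

```latex
\begin{align*}
\text{Sentence} &\rightarrow \text{Colloc}^{+} \\
\underline{\text{Colloc}} &\rightarrow \text{Word}^{+} \\
\underline{\text{Word}} &\rightarrow \text{Phoneme}^{+}
\end{align*}
```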

  24. Performance • Evaluated on the Brent corpus

  25. Morphology • Input: raw text • Task: identify stems and morphemes, and decompose a word into its morphological components • Adaptor grammars can only be applied to simple concatenative morphology

  26. CFG for morphological analysis

  27. Adaptor grammar for morphological analysis • Generated words: cats, dogs, cats (a sketch of the grammar follows below)
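
The grammar on this slide is roughly of the following shape (a reconstruction; which non-terminals the slide actually adapts is not recoverable from the transcript, so all three are underlined here):

```latex
\begin{align*}
\underline{\text{Word}} &\rightarrow \text{Stem}\ \text{Suffix} \\
\underline{\text{Stem}} &\rightarrow \text{Char}^{+} \\
\underline{\text{Suffix}} &\rightarrow \text{Char}^{+}
\end{align*}
```

With Word adapted, the second occurrence of "cats" reuses the entire cached sub-tree (cat + s) rather than being rebuilt rule by rule, which is why the repeated word in "cats dogs cats" matters in the example.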

  28. Performance • For a more sophisticated model: • 116,129 tokens: 70% correctly segmented • 7,170 verb types: 66% correctly segmented

  29. Inference • The distribution over adapted trees is exchangeable, so Gibbs sampling can be used • A variational inference method has also been proposed for learning adaptor grammars; covering it is beyond the objectives of this talk

  30. Conclusion • We are interested in inducing grammars without supervision for two reasons: • language acquisition • low-resource languages • PCFG rules are too small to support larger generalizations • Learning the things people learn requires rich, unbounded hypothesis spaces • Adaptor grammars use the CRP to learn rules (cached sub-trees) from such unbounded hypothesis spaces

  31. References • M. Johnson et al. Adaptor Grammars: A Framework for Specifying Compositional Nonparametric Bayesian Models. Advances in Neural Information Processing Systems, 2007. • Mark Johnson. Using Adaptor Grammars to Identify Synergies in the Unsupervised Acquisition of Linguistic Structure. ACL-08: HLT, 2008. • Tom Griffiths. Inferring Structure from Data. Machine Learning Summer School, Sardinia, 2010.

  32. Thank you for your attention!
