BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING


  1. BAYESIAN LEARNING OF N-GRAM STATISTICAL LANGUAGE MODELING Shuanhu Bai and Haizhou Li Institute for Infocomm Research, Republic of Singapore

  2. Outline • Introduction • N-gram Model: Bayesian Learning; QB Estimation for Incremental Learning • Continuous N-gram Model: Bayesian Learning; QB Estimation for Incremental Learning • Experimental Results • Conclusions

  3. Introduction • Even with ample training data, n-gram language models are still far from optimal • Studies show that they are extremely sensitive to changes in style, topic, or genre • LM adaptation aims to bridge the mismatch between the models and the test domain • A typical n-gram LM is trained under the maximum likelihood estimation (MLE) criterion

  4. Introduction (cont.) • One typical adaptation technique, deleted interpolation, combines the flat but reliable general model (the baseline model) with the sharp but volatile domain-specific model, as sketched below • In this paper, we study the Bayesian learning formulation for n-gram LM adaptation • Under the Bayesian learning framework, an incremental adaptation procedure is also proposed for dynamically updating cache-based n-grams
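
  A standard form of deleted interpolation, given here for reference (the weight \lambda is generic, not the paper's exact notation), combines the two models as

      P_{DI}(w \mid h) = \lambda \, P_{general}(w \mid h) + (1 - \lambda) \, P_{domain}(w \mid h), \qquad 0 \le \lambda \le 1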

  5. N-gram Model • N-gram model • The quality of a given n-gram LM on a corpus D of size T is commonly assessed by its log-likelihood, as written below • Unigram & bigram
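
  In standard notation (assumed here, since the slide's equations were images; w_t is the t-th word of D), the n-gram model and the corpus log-likelihood are

      P(w_1^T) = \prod_{t=1}^{T} P(w_t \mid w_{t-n+1}^{t-1}), \qquad LL(D) = \sum_{t=1}^{T} \log P(w_t \mid w_{t-n+1}^{t-1})

  with the unigram and bigram as the special cases P(w_t) and P(w_t \mid w_{t-1}).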

  6. N-gram Model (cont.) • MLE • Smoothing • Backoff • Cache
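
  The MLE bigram estimate is the familiar count ratio, and a cache model is typically interpolated with the static model (a standard construction; the weight \lambda is illustrative):

      P_{ML}(w_t \mid w_{t-1}) = \frac{c(w_{t-1} w_t)}{c(w_{t-1})}, \qquad P(w_t \mid w_{t-1}) = \lambda \, P_{static}(w_t \mid w_{t-1}) + (1 - \lambda) \, P_{cache}(w_t \mid w_{t-1})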

  7. Bayesian Learning for N-gram Model • Dirichlet prior • The probability of generating a text corpus is obtained by integrating over the parameter space • MAP estimation
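
  A reconstruction in standard Dirichlet-multinomial notation (hyperparameters \alpha_i assumed; c_i is the count of event i in D): the conjugate prior, the corpus probability with the parameters integrated out, and the closed-form MAP estimate are

      g(\theta) \propto \prod_{i=1}^{I} \theta_i^{\alpha_i - 1}, \qquad P(D) = \int P(D \mid \theta) \, g(\theta) \, d\theta, \qquad \hat{\theta}_i^{MAP} = \frac{c_i + \alpha_i - 1}{\sum_{j=1}^{I} (c_j + \alpha_j - 1)}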

  8. QB Estimation for Incremental Learning for N-gram Model • It is of practical use to devise an incremental learning mechanism that adapts both the parameters and the prior knowledge over time • Sub-corpora D^n = {D_1, D_2, …, D_n} • The updating of parameters can be iterated between the reproducible prior and posterior estimates
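
  Because the Dirichlet is conjugate, the posterior after each sub-corpus is again a Dirichlet, so the hyperparameters simply accumulate counts; a standard quasi-Bayes update (notation assumed) is

      \alpha_i^{(n)} = \alpha_i^{(n-1)} + c_i(D_n)

  where the posterior after D_n serves as the reproducible prior for D_{n+1}.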

  9. ML • MAP • QB
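
  In this notation, the three point estimates the slide lists differ only in how counts and hyperparameters enter (a reconstruction, not the authors' exact formulas):

      \hat{\theta}_i^{ML} = \frac{c_i}{\sum_j c_j}, \qquad \hat{\theta}_i^{MAP} = \frac{c_i + \alpha_i - 1}{\sum_j (c_j + \alpha_j - 1)}, \qquad \hat{\theta}_i^{QB(n)} = \frac{c_i(D_n) + \alpha_i^{(n-1)} - 1}{\sum_j \big( c_j(D_n) + \alpha_j^{(n-1)} - 1 \big)}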

  10. Continuous N-gram Model • The continuous n-gram model is also called the aggregate Markov model • We introduce a hidden variable z taking Z values, acting as "soft" word classes • Z = 1 reduces to the unigram; Z = I recovers the full bigram (I is the vocabulary size) • The continuous bigram model has two obvious advantages over the discrete bigram: • Parameters: I × I → I × Z × 2 • EM can be applied to estimate the parameters under the MLE criterion
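
  The aggregate Markov (continuous bigram) decomposition is

      P(w_t \mid w_{t-1}) = \sum_{z=1}^{Z} P(z \mid w_{t-1}) \, P(w_t \mid z)

  so the I × I bigram table is replaced by the two smaller tables P(z \mid w) (I × Z) and P(w \mid z) (Z × I), i.e. I × Z × 2 parameters.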

  11. Continuous N-gram Model (cont.) • Parameters: the class-membership probabilities P(z | w_{t-1}) and the class-emission probabilities P(w_t | z)
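
  A minimal EM sketch for fitting these parameters under the MLE criterion, as mentioned on slide 10. This is an illustrative NumPy implementation, not the authors' code, and the toy usage at the end is invented:

      import numpy as np

      def em_aggregate_markov(bigram_counts, Z, iters=50, seed=0):
          # Fits P(w2 | w1) = sum_z P(z | w1) P(w2 | z) by EM.
          # bigram_counts: I x I array with bigram_counts[w1, w2] = c(w1, w2).
          I = bigram_counts.shape[0]
          rng = np.random.default_rng(seed)
          p_z_w = rng.random((I, Z))                          # P(z | w1), rows sum to 1
          p_z_w /= p_z_w.sum(axis=1, keepdims=True)
          p_w_z = rng.random((Z, I))                          # P(w2 | z), rows sum to 1
          p_w_z /= p_w_z.sum(axis=1, keepdims=True)
          for _ in range(iters):
              # E-step: responsibilities P(z | w1, w2) for every word pair.
              joint = p_z_w[:, :, None] * p_w_z[None, :, :]   # I x Z x I
              resp = joint / np.maximum(joint.sum(axis=1, keepdims=True), 1e-12)
              # M-step: expected class counts, weighted by observed bigram counts.
              ec = bigram_counts[:, None, :] * resp           # I x Z x I
              p_z_w = ec.sum(axis=2)
              p_z_w /= np.maximum(p_z_w.sum(axis=1, keepdims=True), 1e-12)
              p_w_z = ec.sum(axis=0)
              p_w_z /= np.maximum(p_w_z.sum(axis=1, keepdims=True), 1e-12)
          return p_z_w, p_w_z

      # Toy usage on an invented 4-word vocabulary with Z = 2 classes.
      counts = np.array([[0, 3, 1, 0], [2, 0, 0, 4], [1, 0, 0, 2], [0, 5, 1, 0]], dtype=float)
      p_z_w, p_w_z = em_aggregate_markov(counts, Z=2)
      p_bigram = p_z_w @ p_w_z                                # reconstructed P(w2 | w1)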

  12. Bayesian Learning for Continuous N-gram Model • Prior • After applying the EM algorithm • The result can be interpreted as a smoothing between the known priors and the current observations (the cache corpus)
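
  With Dirichlet priors on each row of P(z | w) and P(w | z), the MAP-EM M-step adds the prior pseudo-counts to the expected counts from the cache corpus; a reconstruction for the emission parameters (hyperparameters \beta assumed) is

      \hat{P}(w' \mid z) = \frac{E[c(z, w')] + \beta_{z w'} - 1}{\sum_{v} \big( E[c(z, v)] + \beta_{z v} - 1 \big)}

  which makes the smoothing reading explicit: the estimate weights the known prior (\beta) against the current observations.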

  13. QB Estimation for Incremental Learning for Continuous N-gram Model • Updating of parameters • Initial parameters
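
  A minimal sketch of the quasi-Bayes loop this updating implies, shown for a single multinomial to keep it concrete; the initialization values and sub-corpus counts are invented:

      import numpy as np

      def qb_update(alpha, counts):
          # Conjugacy: fold the new sub-corpus counts into the Dirichlet
          # hyperparameters, so today's posterior is tomorrow's prior.
          return alpha + counts

      def map_estimate(alpha):
          # MAP point estimate of the multinomial under Dirichlet(alpha).
          num = np.maximum(alpha - 1.0, 0.0)
          return num / num.sum()

      alpha = np.full(5, 2.0)                    # initial prior pseudo-counts
      for counts in [np.array([3., 0., 1., 0., 2.]), np.array([0., 4., 1., 1., 0.])]:
          alpha = qb_update(alpha, counts)       # posterior hyperparameters
          theta = map_estimate(alpha)            # refreshed model parameters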

  14. Experimental Results • Corpus • A: 60 million words from LDC98T30, finance and business • B: 20 million words from LDC98T30, sports and fashion, for incremental training • C: A + B, for adaptation • D: 20 million words in the same domain as C (open test set) • Vocabulary: 50,000 words from A + B

  15. Experimental Results (cont.)

  16. Conclusions • We propose a Bayesian learning approach to n-gram modeling • It offers an interpretation of language model smoothing or adaptation as a weighting between prior knowledge and current observations • The Dirichlet conjugate prior leads not only to a batch adaptation procedure but also to a quasi-Bayes incremental learning strategy for on-line language modeling
