Posterior Regularization for Structured Latent Variable Models Li Zhonghua I2R SMT Reading Group
Outline • Motivation and Introduction • Posterior Regularization • Application • Implementation • Some Related Frameworks
Motivation and Introduction Prior Knowledge We possess a wealth of prior knowledge about most NLP tasks.
Motivation and Introduction Leveraging Prior Knowledge Possible approaches and their limitations
Motivation and Introduction--Limited Approach Bayesian Approach: Encode prior knowledge with a prior on the parameters • Limitation: Our prior knowledge is not about parameters! • Parameters are difficult to interpret; it is hard to get the desired effect.
Motivation and Introduction--Limited Approach Augmenting the Model: Encode prior knowledge with additional variables and dependencies. • Limitation: may make exact inference intractable.
Posterior Regularization • A declarative language for specifying prior knowledge -- Constraint Features & Expectations • Methods for learning with knowledge in this language -- EM-style learning algorithm
Posterior Regularization Original Objective:
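Following the PR paper this talk is based on (Ganchev et al., JMLR 2010), the original objective is the (penalized) marginal log-likelihood, and posterior regularization penalizes the KL distance from the posterior to a constraint set defined by feature expectations; a standard statement, restated here for reference:

$$\mathcal{L}(\theta) = \log p(\theta) + \sum_{x} \log \sum_{y} p_\theta(x, y)$$

$$J_Q(\theta) = \mathcal{L}(\theta) - \sum_x \min_{q \in Q_x} \mathrm{KL}\big(q(y) \,\|\, p_\theta(y \mid x)\big), \qquad Q_x = \{\, q(y) : \mathbb{E}_q[\phi(x, y)] \le \mathbf{b} \,\}$$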
Posterior Regularization EM-style learning algorithm
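As a sketch (following the PR paper; the slide's exact presentation may differ), the algorithm alternates a modified E-step, which projects the model posterior onto the constraint set, with the usual M-step:

$$\text{E-step:}\quad q^{t+1}_x = \arg\min_{q \in Q_x} \mathrm{KL}\big(q(y) \,\|\, p_{\theta^t}(y \mid x)\big)$$

$$\text{M-step:}\quad \theta^{t+1} = \arg\max_\theta \; \log p(\theta) + \sum_x \mathbb{E}_{q^{t+1}_x}\big[\log p_\theta(x, y)\big]$$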
Posterior Regularization Computing the Posterior Regularizer
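The projection in the E-step is usually computed via its dual (a sketch following the standard derivation): maximize over non-negative multipliers $\lambda$

$$\max_{\lambda \ge 0} \; -\mathbf{b}^\top \lambda - \log \sum_y p_\theta(y \mid x) \exp\{-\lambda^\top \phi(x, y)\}$$

and recover the projected posterior as

$$q^*(y) \propto p_\theta(y \mid x) \exp\{-\lambda^{*\top} \phi(x, y)\}.$$

When the constraint features decompose the same way as the model, the sum over y is computed with the model's own dynamic program (e.g. forward-backward for an HMM).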
Application Statistical Word Alignments IBM Model 1 and HMM
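For reference, a standard parameterization of the HMM alignment model, with e the source sentence, f the target sentence, and y_j the source position aligned to target position j (IBM Model 1 is the special case with a uniform distortion distribution); the notation here is the usual one, not necessarily the slide's:

$$p(\mathbf{f}, \mathbf{y} \mid \mathbf{e}) = \prod_{j=1}^{|\mathbf{f}|} p_d(y_j \mid y_{j-1}) \; p_t(f_j \mid e_{y_j})$$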
Application One feature for each source word m that counts how many times it is aligned to a target word in the alignment y.
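Written out (a sketch of the bijectivity constraint as it is usually formulated in this line of work; the notation is mine, not necessarily the slide's):

$$\phi_m(x, y) = \sum_j \mathbf{1}[\, y_j = m \,], \qquad \mathbb{E}_q[\phi_m(x, y)] \le 1 \;\; \text{for every source position } m,$$

so that, in expectation, each source word is aligned to at most one target word.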
Application Define a feature for each target-source position pair (i, j). The feature takes the value zero in expectation if the word pair (i, j) is aligned with equal probability in both directions.
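One way to write this symmetry constraint (a sketch consistent with the B-HMM/S-HMM setup; the exact definition on the slide may differ): let q be a joint distribution over the forward alignment $\overrightarrow{y}$ and the backward alignment $\overleftarrow{y}$, and define

$$\phi_{ij}(x, \overrightarrow{y}, \overleftarrow{y}) = \mathbf{1}[\, \overrightarrow{y}_j = i \,] - \mathbf{1}[\, \overleftarrow{y}_i = j \,], \qquad \mathbb{E}_q[\phi_{ij}] = 0,$$

which is zero in expectation exactly when the pair (i, j) is aligned with equal probability in the two directions.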
Application Graça, Ganchev, and Taskar, "Learning Tractable Word Alignment Models with Complex Constraints", Computational Linguistics, 2010
Application • Six language pairs • Both types of constraints improve over the HMM in terms of both precision and recall • They improve over the HMM by 10% to 15% • S-HMM performs slightly better than B-HMM • S-HMM performs better than B-HMM in 10 out of 12 cases • Improves over IBM Model 4 in 9 out of 12 cases
Implementation • http://code.google.com/p/pr-toolkit/
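To give a feel for what the E-step projection involves, here is a small, self-contained NumPy sketch of the KL projection computed by gradient ascent on the dual. It enumerates a tiny label space explicitly and is purely illustrative: it is not the pr-toolkit's API, and the function and variable names are made up for this example.

import numpy as np

def pr_project(p, phi, b, step=0.1, iters=1000):
    # Project a base posterior p(y) onto {q : E_q[phi(y)] <= b} in KL divergence.
    # p   : array (Y,)   base posterior p_theta(y | x) over Y enumerated outcomes
    # phi : array (Y, K) constraint feature values phi_k(y)
    # b   : array (K,)   constraint bounds
    lam = np.zeros(phi.shape[1])
    for _ in range(iters):
        # The current dual iterate defines q_lam(y) proportional to p(y) * exp(-lam . phi(y)).
        logq = np.log(p) - phi @ lam
        q = np.exp(logq - logq.max())
        q /= q.sum()
        # The dual gradient is E_q[phi] - b; take an ascent step and keep lam non-negative.
        lam = np.maximum(0.0, lam + step * (q @ phi - b))
    logq = np.log(p) - phi @ lam
    q = np.exp(logq - logq.max())
    return q / q.sum()

# Toy usage: three outcomes, one constraint feature, expectation bounded by 1.
p = np.array([0.6, 0.3, 0.1])           # base posterior
phi = np.array([[2.0], [1.0], [0.0]])   # feature value of each outcome
b = np.array([1.0])                     # require E_q[phi] <= 1
print(pr_project(p, phi, b))            # projected distribution q*

For a structured model such as the HMM aligner, the explicit enumeration over y would be replaced by the model's dynamic program (forward-backward).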
More info: http://sideinfo.wikkii.com (many of these slides were taken from there). Thanks!