Part-of-Speech Tagging Foundations of Statistical NLP, Chapter 10
Contents • Markov Model Taggers • Hidden Markov Model Taggers • Transformation-Based Learning of Tags • Tagging Accuracy and Uses of Taggers
Markov Model Taggers • Markov properties • Limited horizon • Time invariant
cf. Wh-extraction (Chomsky), a long-distance dependency that a limited-horizon model cannot capture:
a. Should Peter buy a book?
b. Which book should Peter buy?
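A compact way to state these two properties, with X_t denoting the tag (state) at position t (standard notation, not from the original slide):

P(X_{t+1} = t^k | X_1, ..., X_t) = P(X_{t+1} = t^k | X_t)        (limited horizon)
P(X_{t+1} = t^k | X_t) = P(X_2 = t^k | X_1) for all t            (time invariant)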
Markov Model Taggers • The probabilistic model • Finding the best tagging t_{1,n} for a sentence w_{1,n}, e.g. P(AT NN BEZ IN AT VB | The bear is on the move)
Assumptions • words are independent of each other • a word's identity depends only on its tag
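Under these assumptions the tagging problem factors into per-word emission and tag-transition terms; a standard formulation, in the t_{1,n}, w_{1,n} notation above, is:

t̂_{1,n} = argmax_{t_{1,n}} P(t_{1,n} | w_{1,n}) = argmax_{t_{1,n}} ∏_{i=1..n} P(w_i | t_i) · P(t_i | t_{i-1})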
Markov Model Taggers • Training (maximum likelihood estimates from a tagged corpus)
for all tags t^j do
  for all tags t^k do
    P(t^k | t^j) = C(t^j, t^k) / C(t^j)
  end
end
for all tags t^j do
  for all words w^l do
    P(w^l | t^j) = C(w^l : t^j) / C(t^j)
  end
end
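A minimal Python sketch of this counting step, assuming the tagged corpus is a list of sentences given as (word, tag) pairs; the function and variable names are illustrative, not from the slides:

    from collections import defaultdict

    def train_mm_tagger(tagged_sentences):
        """Estimate P(t^k | t^j) and P(w^l | t^j) by maximum likelihood counts."""
        tag_bigrams = defaultdict(int)   # C(t^j, t^k)
        tag_counts = defaultdict(int)    # C(t^j)
        emissions = defaultdict(int)     # C(w^l : t^j)
        for sentence in tagged_sentences:
            prev = "<s>"                 # sentence-start pseudo-tag
            tag_counts[prev] += 1
            for word, tag in sentence:
                tag_bigrams[(prev, tag)] += 1
                emissions[(word, tag)] += 1
                tag_counts[tag] += 1
                prev = tag
        trans = {(tj, tk): c / tag_counts[tj] for (tj, tk), c in tag_bigrams.items()}
        emit = {(w, tj): c / tag_counts[tj] for (w, tj), c in emissions.items()}
        return trans, emit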
Markov Model Taggers • Tagging (the Viterbi algorithm)
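A hedged sketch of Viterbi decoding for the bigram model, reusing the trans/emit dictionaries from the training sketch above; giving unseen transitions and emissions a small floor probability is a simplification here, not the chapter's smoothing method:

    def viterbi(words, tags, trans, emit, unk=1e-6):
        """Return the most probable tag sequence for `words` under a bigram model."""
        # delta[i][t]: probability of the best tag sequence for words[:i+1] ending in tag t
        delta = [{t: trans.get(("<s>", t), unk) * emit.get((words[0], t), unk) for t in tags}]
        back = [{}]
        for i in range(1, len(words)):
            delta.append({})
            back.append({})
            for t in tags:
                prev_best, score = max(
                    ((p, delta[i - 1][p] * trans.get((p, t), unk)) for p in tags),
                    key=lambda x: x[1])
                delta[i][t] = score * emit.get((words[i], t), unk)
                back[i][t] = prev_best
        # read off the best final tag and follow the back-pointers
        last = max(delta[-1], key=delta[-1].get)
        path = [last]
        for i in range(len(words) - 1, 0, -1):
            path.append(back[i][path[-1]])
        return list(reversed(path))

In practice log probabilities are used to avoid underflow on long sentences; the multiplicative form above mirrors the equations for readability.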
Variations • Models for unknown words • 1. assuming that they can be any part of speech • 2. using morphological information (e.g. suffixes, capitalization) to infer the possible parts of speech, as sketched below
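An illustrative sketch of option 2; the particular suffix-to-tag associations are invented examples, not the chapter's model:

    def guess_tags(word, all_tags):
        """Heuristically restrict the candidate tags of an unknown word."""
        if word[0].isupper():
            return ["NP"]                 # capitalized: likely a proper noun
        if word.endswith("ing"):
            return ["VBG", "NN"]          # gerund or nominalization
        if word.endswith("ed"):
            return ["VBD", "VBN"]         # past tense or past participle
        if word.endswith("ly"):
            return ["RB"]                 # adverb
        return list(all_tags)             # fall back to option 1: any part of speech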
Variations • Trigram taggers • Interpolation • Variable Memory Markov Model (VMMM)
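Linear interpolation combines unigram, bigram, and trigram tag probabilities; a standard form (the λ weights sum to one and are estimated separately, e.g. on held-out data) is:

P(t_i | t_{i-2}, t_{i-1}) = λ_1 P_1(t_i) + λ_2 P_2(t_i | t_{i-1}) + λ_3 P_3(t_i | t_{i-2}, t_{i-1}),   with λ_1 + λ_2 + λ_3 = 1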
Variations • Smoothing (where K_l denotes the number of possible parts of speech of w_l) • Reversibility
Variations • Sequence vs. tag by tag
Time flies like an arrow.
a. NN VBZ RB AT NN.  P(.) = 0.01
b. NN NNS VB AT NN.  P(.) = 0.01
• in practice there is little difference in accuracy between maximizing the probability of the whole tag sequence and maximizing each tag individually
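The two objectives, in the notation used above:

sequence:   t̂_{1,n} = argmax_{t_{1,n}} P(t_{1,n} | w_{1,n})
tag by tag: t̂_i = argmax_{t_i} P(t_i | w_{1,n}) for each position i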
Hidden Markov Model Taggers • When we have no tagged training data • Initializing all parameters with dictionary information • Jelinek's method • Kupiec's method
Hidden Markov Model Taggers • Jelinek's method • initializing the HMM with the MLE for P(w^k | t^i) • assuming that words occur equally likely with each of their possible tags • T(w^j): the number of tags allowed for w^j
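One way to write this initialization (a reconstruction in the book's style of notation; treat the exact indexing as a sketch), where C(w^l) is the corpus frequency of w^l:

b_{j.l} = b*_{j.l} · C(w^l) / Σ_{w^m} b*_{j.m} · C(w^m)
b*_{j.l} = 0 if t^j is not an allowed tag for w^l, and b*_{j.l} = 1 / T(w^l) otherwise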
Hidden Markov Model Taggers • Kupiec's method • grouping all words with the same set of possible parts of speech into 'metawords' u_L • so that parameters are not fine-tuned for each individual word
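A minimal sketch of the metaword grouping, assuming a lexicon that maps each word to its set of admissible tags (the names here are illustrative):

    from collections import defaultdict

    def build_metawords(lexicon):
        """Group words sharing the same set of admissible tags into one metaword u_L."""
        groups = defaultdict(list)
        for word, tags in lexicon.items():
            groups[frozenset(tags)].append(word)
        # all words in a group share emission parameters, so rare words borrow
        # statistics from frequent words with the same tag possibilities
        return dict(groups)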
Hidden Markov Model Taggers • Training • after initialization, the HMM is trained with the Forward-Backward algorithm • Tagging • the same as for the Visible Markov Model: the difference between VMM tagging and HMM tagging lies in how we train the model, not in how we tag (both use the Viterbi algorithm)
Hidden Markov Model Taggers • The effect of initialization on HMM training (the overtraining problem)
Lexical (dictionary) conditions:
D0: maximum likelihood estimates from a tagged training corpus
D1: correct ordering only of lexical probabilities
D2: lexical probabilities proportional to overall tag probabilities
D3: equal lexical probabilities for all tags admissible for a word
Transition conditions:
T0: maximum likelihood estimates from a tagged training corpus
T1: equal probabilities for all transitions
Hidden Markov Model Taggers • When to use which training method
• Use the Visible Markov Model (maximum likelihood training) when a sufficiently large tagged training text is available and it is similar to the intended text of application
• Run Forward-Backward for a few iterations when there is no tagged training text, or the training and test text are very different, but at least some lexical information is available
• Run Forward-Backward for a larger number of iterations when there is no lexical information