CS621: Artificial Intelligence
Pushpak Bhattacharyya, CSE Dept., IIT Bombay
Lecture 36, 37 – Part of Speech Tagging and HMM
21st and 25th Oct, 2010
(forward, backward computation and the Baum-Welch Algorithm will be done later)

Part of Speech Tagging
A man returns to his parked car and finds a sticker saying “Parking fine”. He goes and thanks the policeman for appreciating his parking skill. fine_adverb vs. fine_noun
Examples from an Indian language
(one who gets love can give love)
(the love that you are imagining exists is impossible in this world)
DEM vs. PRON cannot be disambiguated at the level of the POS tagger: the tagger can assume neither parsing nor semantics.
Best tag sequence T*
= argmax_T P(T|W)
= argmax_T P(T) P(W|T)   (by Bayes' Theorem; P(W) is constant over T and can be dropped)
P(T) = P(t_0 = ^, t_1, t_2, …, t_{n+1} = .)
     = P(t_0) P(t_1|t_0) P(t_2|t_1 t_0) P(t_3|t_2 t_1 t_0) …   (chain rule)
     = P(t_0) P(t_1|t_0) P(t_2|t_1) … P(t_n|t_{n-1}) P(t_{n+1}|t_n)
     = ∏_{i=0}^{n+1} P(t_i|t_{i-1})   (Bigram Assumption, taking P(t_0|t_{-1}) = P(t_0))

Here ^ and . are the sentence-start and sentence-end tags.
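As a quick illustration, a minimal Python sketch of this bigram computation; the tagset and all probability values below are invented for the example, not taken from the lecture.

```python
# A toy sketch of P(T) under the bigram assumption.
# Tags and transition probabilities are invented for illustration only.
bigram = {
    ("^", "DT"): 0.6, ("DT", "NN"): 0.9,
    ("NN", "VB"): 0.7, ("VB", "."): 0.5,
}

def p_tag_sequence(tags):
    """P(T) = product over i of P(t_i | t_{i-1}), with t_0 = ^ and t_{n+1} = ."""
    p = 1.0
    for prev, cur in zip(tags, tags[1:]):
        p *= bigram.get((prev, cur), 0.0)  # unseen bigrams get probability 0
    return p

print(p_tag_sequence(["^", "DT", "NN", "VB", "."]))  # 0.6 * 0.9 * 0.7 * 0.5 = 0.189
```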
P(W|T) = P(w_0|t_0 … t_{n+1}) P(w_1|w_0, t_0 … t_{n+1}) P(w_2|w_1 w_0, t_0 … t_{n+1}) …   (chain rule)

Assumption: a word is determined completely by its own tag, i.e., it is independent of all other words and tags. This is inspired by speech recognition.

P(W|T) = P(w_0|t_0) P(w_1|t_1) … P(w_{n+1}|t_{n+1})
       = ∏_{i=0}^{n+1} P(w_i|t_i)   (Lexical Probability Assumption)

Combining the two factors:
T* = argmax_T ∏_{i=1}^{n+1} P(t_i|t_{i-1}) P(w_i|t_i)
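The lexical factor can be sketched the same way; again, the emission probabilities below are toy values for illustration only.

```python
# A toy sketch of P(W|T) under the lexical probability assumption:
# each word is generated by its own tag alone. Values are invented.
lexical = {
    ("^", "^"): 1.0, (".", "."): 1.0,   # boundary symbols emitted with probability 1
    ("DT", "the"): 0.5, ("NN", "dog"): 0.02, ("VB", "barks"): 0.01,
}

def p_words_given_tags(words, tags):
    """P(W|T) = product over i of P(w_i | t_i)."""
    p = 1.0
    for word, tag in zip(words, tags):
        p *= lexical.get((tag, word), 0.0)
    return p

print(p_words_given_tags(["^", "the", "dog", "barks", "."],
                         ["^", "DT", "NN", "VB", "."]))  # 0.5 * 0.02 * 0.01 = 1e-4
```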
This model is called a generative model: the tags are states, and the words are observations generated (emitted) from those states. This is exactly the structure of a Hidden Markov Model (HMM).
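Putting the two probability tables together gives the whole generative tagger in miniature. The sketch below finds T* by brute-force enumeration over all tag sequences; everything here (tagset, words, probabilities) is invented for illustration.

```python
from itertools import product

# A toy generative tagger: T* = argmax_T P(T) * P(W|T), found by brute-force
# enumeration over every possible tag sequence. All probabilities are invented.
TAGS = ("DT", "NN", "VB")
trans = {("^", "DT"): 0.6, ("^", "NN"): 0.3, ("^", "VB"): 0.1,
         ("DT", "NN"): 0.9, ("NN", "VB"): 0.6, ("NN", "."): 0.3,
         ("VB", "."): 0.5, ("VB", "NN"): 0.3}
emit = {("DT", "the"): 0.5, ("NN", "dog"): 0.02,
        ("VB", "barks"): 0.01, ("NN", "barks"): 0.001}

def score(words, tags):
    """P(T) * P(W|T) for one candidate sequence, with boundary tags ^ and . added."""
    full = ("^",) + tags + (".",)
    p = 1.0
    for prev, cur in zip(full, full[1:]):
        p *= trans.get((prev, cur), 0.0)
    for word, tag in zip(words, tags):
        p *= emit.get((tag, word), 0.0)
    return p

words = ["the", "dog", "barks"]
best = max(product(TAGS, repeat=len(words)), key=lambda t: score(words, t))
print(best)  # ('DT', 'NN', 'VB') under these toy numbers
```

Brute-force enumeration is exponential in sentence length; efficient dynamic-programming computations over this HMM are what the later lectures mentioned above will cover.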