
Part of Speech Tagging of Indian Languages using Hidden Markov Model

Ph.D. Seminar Report by Manish Shrivastava (Roll no. 03405002), under the guidance of Dr. Pushpak Bhattacharyya


Presentation Transcript


  1. Part of Speech Tagging of Indian languages using Hidden Markov Model Ph. D. Seminar Report by Manish Shrivastava Roll no. 03405002 Under the guidance of Dr. Pushpak Bhattacharyya

  2. Presentation Outline • Part of Speech Tagging • Motivation • Existing Taggers • Need for Part of Speech Taggers for Indian languages • Part of Speech Tagging of Indian languages • The Morphological Perspective • Morphological Advantages • Hidden Markov Model • Conclusions • Future work

  3. Part of Speech Tagging • The task of assigning POS tags to words • Involves selecting among the multiple tags that may apply to a word • Can be used for further NLP tasks • Information extraction, question answering, etc.

  4. Example of POS tagging

  5. Motivation • Lack of significant tools for Indian languages • Dependence of other NLP activities on PoS tagging • Failure of existing techniques on Indian Languages

  6. Existing Taggers • Techniques used for foreign languages • Rule Based Tagging • Stochastic Tagging

  7. Overview of PoS tagging

  8. Existing Taggers • Rule Based Taggers • Brill tagger • Stochastic Taggers • CLAWS tagger • Tree tagger

  9. Need for New Taggers for Hindi • The existing taggers fail on Indian languages • The grammatical structure differs • Free word order of Hindi • Stochastic taggers cannot give good performance • Morphological information is not taken into account

  10. Example of free word order

  11. Part of Speech Tagging of Indian Languages • To make efficient taggers • Get morphological information • Use heuristics to exploit the morphological information

  12. Morphological Perspective • Three kinds of word morphology • Verb • Noun • Adjective

  13. Morphological Perspective • Noun Morphology • Depicting possession • लड़का → possessive: लड़के का • Depicting number • लड़का → plural: लड़के

  14. Morphological Perspective • Verb Morphology • Tense • खेल: लड़के खेल रहे है. • खेल: लड़के खेलते थे. • खेल: लड़के खेलना चाहते हैं.

  15. Morphological Advantage • POS tag heuristic • Noun • लड़कों: suffix -oM ("ों") • सहेलियों: suffix -iyoN ("इयों") • Verb • पढ़ूँगा: suffix -UMgA ("ऊँगा") • पढ़ता: suffix -wA ("ता")
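The suffix heuristic on slide 15 can be sketched as a longest-suffix lookup. This is only an illustrative sketch, not the author's implementation: the function name `guess_tag`, the tag labels `NOUN`/`VERB`, and the romanized test words are assumptions. Only the four suffixes (`oM`, `iyoN`, `UMgA`, `wA`) come from the slide.

```python
# Suffix -> tag table built from the four romanized suffixes on the slide.
# A real tagger would need far more rules and a way to handle ambiguity.
SUFFIX_TAGS = {
    "oM": "NOUN",
    "iyoN": "NOUN",
    "UMgA": "VERB",
    "wA": "VERB",
}


def guess_tag(word, suffix_tags=SUFFIX_TAGS):
    """Return the tag of the longest matching suffix, or None if none match.

    `guess_tag` is a hypothetical helper name used only for this sketch.
    """
    # Try longer suffixes first so a more specific suffix wins.
    for suffix in sorted(suffix_tags, key=len, reverse=True):
        # Require a non-empty stem before the suffix.
        if len(word) > len(suffix) and word.endswith(suffix):
            return suffix_tags[suffix]
    return None


# Romanized word forms below are assumed spellings for illustration only.
for word in ["laDakoM", "saheliyoN", "paDhUMgA", "paDhawA", "ghar"]:
    print(word, guess_tag(word))
```

Words that match no suffix return `None`, leaving the decision to another component such as a statistical model.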

  16. Morphological Advantages • Morphological strength of Hindi helps in efficient tagging • The morphological information can be used for further tasks

  17. The Tool: Hidden Markov Model • Why HMM • Underlying (hidden) events probabilistically generate the observed surface events • The models can be trained using the Expectation Maximization algorithm • Easy to port to other languages

  18. Hidden Markov Model Example of a Hidden Markov Model

  19. Hidden Markov Model • The Parameters • πi = initial state probabilities • aij = probability of a transition from state i to state j • bij(k) = probability of observing the kth symbol on the transition from state i to state j • Estimation • Initial estimation done with training data • Re-estimation done using the Baum-Welch algorithm
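The "initial estimation done with training data" step on slide 19 could look like the relative-frequency sketch below. It is not the author's implementation and it makes several assumptions. It uses the more common state-emission form, where a symbol is emitted by a state, rather than the slide's arc-emission form, where a symbol is emitted on a transition. The function name `estimate_hmm` and the toy corpus are made up for illustration. No smoothing is applied, and Baum-Welch re-estimation is not shown.

```python
from collections import Counter, defaultdict


def estimate_hmm(tagged_sentences):
    """Relative-frequency estimates (pi, a, b) from tagged sentences.

    tagged_sentences: list of sentences, each a list of (word, tag) pairs.
    Assumption: state-emission HMM, i.e. b[tag][word], not the arc-emission
    b_ij(k) written on the slide.
    """
    start_counts = Counter()            # how often each tag starts a sentence
    trans_counts = defaultdict(Counter)  # tag -> next tag -> count
    emit_counts = defaultdict(Counter)   # tag -> word -> count

    for sentence in tagged_sentences:
        if not sentence:
            continue
        start_counts[sentence[0][1]] += 1
        for word, tag in sentence:
            emit_counts[tag][word] += 1
        for (_, prev_tag), (_, next_tag) in zip(sentence, sentence[1:]):
            trans_counts[prev_tag][next_tag] += 1

    def normalize(counter):
        # Turn raw counts into a probability distribution.
        total = sum(counter.values())
        return {key: count / total for key, count in counter.items()}

    pi = normalize(start_counts)
    a = {tag: normalize(counts) for tag, counts in trans_counts.items()}
    b = {tag: normalize(counts) for tag, counts in emit_counts.items()}
    return pi, a, b


# Toy corpus, invented purely for illustration.
corpus = [
    [("dogs", "NOUN"), ("bark", "VERB")],
    [("birds", "NOUN"), ("sing", "VERB")],
    [("sing", "VERB")],
]
pi, a, b = estimate_hmm(corpus)
print(pi)
print(a)
print(b)
```

Estimates obtained this way would serve only as the starting point; unseen words and tag pairs get zero probability, which is one reason re-estimation or smoothing is needed.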

  20. Conclusions • Part of Speech taggers for Hindi should use morphological information • To make efficient taggers we must allow the use of heuristics • Hidden Markov Models can be used to build portable taggers
