
Part-of-Speech Tagging and Chunking with Maximum Entropy Model

Presentation Transcript


  1. Part-of-Speech Tagging and Chunking with Maximum Entropy Model Sandipan Dandapat Department of Computer Science & Engineering Indian Institute of Technology Kharagpur

  2. Goal • Lexical Analysis • Part-Of-Speech (POS) Tagging: Assigning a part of speech to each word, e.g. Noun, Verb, ... • Syntactic Analysis • Chunking: Identifying and labeling phrases such as noun phrases and verb phrases
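
  As an illustration (the sentence and tag labels here are chosen for readability, not taken from the slides), the two tasks look like this:

      Input:   the dog barks
      POS:     the/DET dog/NOUN barks/VERB
      Chunks:  [NP the dog] [VP barks]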

  3. Machine Learning Approaches to POS Tagging and Chunking • HMM • Supervised (DeRose,88; Mcteer,91; Brants,2000; etc.) • Semi-supervised (Cutting,92; Merialdo,94; Kupiec,92; etc.) • Maximum Entropy (Ratnaparkhi,96; etc.) • TB(ED)L (Brill,92,94,95; etc.) • Decision Tree (Black,92; Marquez,97; etc.)

  4. Our Approach • Maximum Entropy based • Diverse and overlapping features • Language Independence • Reasonably good accuracy • Data intensive • Absence of sequence information

  5. POS Tagging Schema [diagram: raw text → possible POS class restriction → disambiguation algorithm, driven by a language model → tagged text]

  6. POS Tagging: Our Approach [diagram: raw text → possible POS class restriction → disambiguation algorithm, driven by the ME model → tagged text] ME model: the current state depends on the history (features)

  7. POS Tagging: Our Approach [diagram: raw text → possible POS class restriction → disambiguation algorithm, driven by the ME model → tagged text] ME model: the current state depends on the history (features)
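
  The exponential form behind "current state depends on history" is not shown in the transcript; in the standard notation for a conditional maximum entropy model (standard formulation, not copied from the slides), the tag probability is

      p(t \mid h) = \frac{1}{Z(h)} \exp\Big( \sum_{j} \lambda_j f_j(h, t) \Big),
      \qquad
      Z(h) = \sum_{t'} \exp\Big( \sum_{j} \lambda_j f_j(h, t') \Big)

  where h is the history (the contextual features of the current word), t a candidate tag, f_j the binary feature functions of slide 12, and λ_j their learned weights.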

  8. POS Tagging: Our Approach [diagram: raw text → disambiguation algorithm, driven by the ME model → tagged text] {T}: the set of all tags; TMA(wi): the set of tags computed by the Morphological Analyzer; the tag ti is restricted to ti ∈ {T} or ti ∈ TMA(wi)

  9. POS Tagging: Our Approach [diagram: raw text → beam search over the ME model → tagged text] {T}: the set of all tags; TMA(wi): the set of tags computed by the Morphological Analyzer; ti ∈ {T} or ti ∈ TMA(wi)
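
  A minimal sketch of such a beam-search tagger in Python, assuming a trained ME model wrapped in a hypothetical function prob(tag, history) and a function candidate_tags(word) that returns TMA(wi) when the morphological analyzer covers the word and the full tagset {T} otherwise; the function names and the beam width are illustrative, not taken from the slides.

      import math

      def beam_search_tag(words, prob, candidate_tags, beam_width=5):
          # Each beam entry pairs a log-probability with a partial tag sequence.
          beam = [(0.0, [])]
          for i, word in enumerate(words):
              extended = []
              for logp, tags in beam:
                  # Candidate tags: TMA(w_i) if the morphological analyzer
                  # knows the word, otherwise the full tagset {T}.
                  for tag in candidate_tags(word):
                      history = (words, tags, i)  # context the ME features are drawn from
                      extended.append((logp + math.log(prob(tag, history)),
                                       tags + [tag]))
              # Keep only the best beam_width partial sequences.
              beam = sorted(extended, key=lambda e: e[0], reverse=True)[:beam_width]
          # Return the highest-scoring complete tag sequence.
          return max(beam, key=lambda e: e[0])[1]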

  10. Disambiguation Algorithm Text: w1 w2 … wn. Tags: t1 t2 … tn, where ti ∈ {T} for each word wi, and {T} = the set of tags

  11. Disambiguation Algorithm Text: w1 w2 … wn. Tags: t1 t2 … tn, where ti ∈ TMA(wi) for each word wi, and {T} = the set of tags
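
  The disambiguation objective on these two slides does not survive in the transcript; as it is usually written for ME taggers of this kind (a reconstruction in standard notation, not copied from the slides), the decoder searches for

      (t_1, \ldots, t_n) \;=\; \operatorname*{argmax}_{t_1 \ldots t_n} \; \prod_{i=1}^{n} p(t_i \mid h_i)

  with h_i the history of word w_i and each t_i drawn from {T} (slide 10) or from TMA(wi) (slide 11); the beam search of slide 9 approximates this argmax.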

  12. What are Features? • Feature function • A binary function of the history and the target tag • Example: see the illustrative feature below
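
  The example on the slide is an image and is lost in the transcript; a typical binary feature of this kind (an illustrative stand-in, not the slide's own example) is

      f_j(h, t) =
      \begin{cases}
        1 & \text{if the current word carries a given verbal suffix and } t = \text{Verb} \\
        0 & \text{otherwise}
      \end{cases}

  i.e. the feature fires only for one particular combination of history and tag, and the model learns one weight λ_j per such feature.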

  13. POS Tagging Features [figure: a context window over positions i-3 … i+3, showing for each position the word (w) and its POS tag (t); the feature set is drawn from this window and used to estimate the tag at the current position i] • 40 different experiments were conducted taking several combinations from the set 'F'

  14. POS Tagging Features [figure: the context window over positions i-3 … i+3 again, showing the words, their POS tags, the feature set and the estimated tag]

  15. Chunking Features [figure: a context window over positions i-3 … i+3, showing for each position the word (w), its POS tag (t) and its chunk tag (c); the feature set is drawn from this window and used to estimate the chunk tag at the current position i]
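
  A small Python sketch of how such window features might be collected; the window offsets, the feature names and the restriction of chunk tags to already-decoded positions are assumptions read off the figures, not the actual feature set 'F' used in the 40 experiments.

      def window_features(words, pos_tags, chunk_tags, i, window=2):
          # Word and POS-tag features from a symmetric window around position i;
          # chunk_tags holds only the chunk tags already assigned to the left of i.
          feats = {}
          n = len(words)
          for d in range(-window, window + 1):
              j = i + d
              if 0 <= j < n:
                  feats["word[%+d]" % d] = words[j]
                  feats["pos[%+d]" % d] = pos_tags[j]
          for d in (-2, -1):
              j = i + d
              if 0 <= j < len(chunk_tags):
                  feats["chunk[%+d]" % d] = chunk_tags[j]
          return feats

  For example, window_features(['the', 'dog', 'barks'], ['DET', 'NOUN', 'VERB'], ['B-NP', 'I-NP'], 2) returns the word and POS features of the last three tokens plus the two chunk tags already assigned to their left.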

  16. Experiments: POS tagging • Baseline Model • Maximum Entropy Model • ME (Bengali, Hindi and Telugu) • ME + IMA (Bengali) • ME + CMA (Bengali) • Data Used

  17. Tagset and Corpus Ambiguity • Tagset consists of 27 grammatical classes • Corpus Ambiguity • Mean number of possible tags for each word • Measured on the tagged training data (Dermatas et al., 1995)
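
  Read literally, "mean number of possible tags for each word" over a tagged training corpus of N word tokens w_1 … w_N would be (a plausible formalization; the slide does not spell out the formula)

      \text{corpus ambiguity} \;=\; \frac{1}{N} \sum_{i=1}^{N} \lvert T(w_i) \rvert

  where T(w) is the set of distinct tags with which the word w occurs in the training data.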

  18. POS Tagging Results on Development Set [table: overall accuracy]

  19. POS Tagging Results on Development Set [table: accuracy on known words, unknown words, and overall]

  20. POS Tagging Results - Bengali

  21. Results on Development Set

  22. Chunking Results • Two different measures • Per word basis • Per chunk basis: correctly identified groups along with correctly labeled groups
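
  Read per chunk, the second measure amounts to something like (a plausible formalization, not spelled out on the slide)

      \text{per-chunk accuracy} \;=\; \frac{\#\{\text{chunks with correct boundaries and correct label}\}}{\#\{\text{reference chunks}\}}

  with the denominator taken over the chunks of the gold standard.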

  23. Assessment of Error Types [tables: error analysis for Bengali, Hindi and Telugu]

  24. Results on Test Set • Bengali data has been tagged using the ME+IMA model • Hindi and Telugu data have been tagged with the simple ME model • Chunk accuracy has been measured on a per-word basis

  25. Conclusion and Future Scope • Morphological restriction on tags gives an efficient tagging model even when only a small amount of labeled text is available • The performance on Hindi and Telugu can be improved using morphological analyzers for those languages • Linguistic prefix and suffix information can be adopted • More features can be explored for chunking

  26. Thank You
