Part-of-Speech Tagging and Chunking with Maximum Entropy Model
Sandipan Dandapat
Department of Computer Science & Engineering, Indian Institute of Technology Kharagpur
Goal • Lexical Analysis • Part-Of-Speech (POS) Tagging: assigning a part of speech (e.g. noun, verb) to each word • Syntactic Analysis • Chunking: identifying and labelling phrases, e.g. noun phrases and verb phrases
Machine Learning to Resolve POS Tagging and Chunking • HMM • Supervised (DeRose, 1988; Mcteer, 1991; Brants, 2000; etc.) • Semi-supervised (Cutting, 1992; Merialdo, 1994; Kupiec, 1992; etc.) • Maximum Entropy (Ratnaparkhi, 1996; etc.) • TB(ED)L (Brill, 1992, 1994, 1995; etc.) • Decision Tree (Black, 1992; Marquez, 1997; etc.)
Our Approach • Maximum Entropy based • Strengths: handles diverse and overlapping features; language independent; reasonably good accuracy • Limitations: data intensive; the basic model uses no sequence information
POS Tagging Schema Raw text → Possible POS Class Restriction → Disambiguation Algorithm (uses a Language Model) → Tagged text
POS Tagging: Our Approach The language model is an ME model: the current state depends on the history (features). Raw text → Possible POS Class Restriction → Disambiguation Algorithm → Tagged text
POS Tagging: Our Approach {T}: set of all tags; TMA(wi): set of tags computed by the Morphological Analyzer for word wi. The candidate tags are restricted to ti ∈ {T} or ti ∈ TMA(wi). Raw text → Disambiguation Algorithm → Tagged text
POS Tagging: Our Approach {T}: set of all tags; TMA(wi): set of tags computed by the Morphological Analyzer. Candidate tags: ti ∈ {T} or ti ∈ TMA(wi). Raw text → Beam Search → Tagged text
Disambiguation Algorithm Text: w1 … wn Tags: t1 … tn, where ti ∈ {T} for each word wi, and {T} = set of all tags
Disambiguation Algorithm Text: w1 … wn Tags: t1 … tn, where ti ∈ TMA(wi) for each word wi, and {T} = set of all tags
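The beam-search disambiguation with morphological restriction can be sketched as follows; the tagset, scoring table, and TMA lookup here are illustrative stand-ins, not the trained ME model from the paper:

```python
# Minimal beam-search decoder over tag sequences. score() is a toy
# stand-in for the conditional model p(t | h); in the real system this
# probability comes from the trained maximum entropy model.

def score(word, tag, prev_tag):
    # Hypothetical transition-flavoured scores; favours NOUN after DET.
    table = {("DET", "NOUN"): 0.8, ("DET", "VERB"): 0.2,
             (None, "DET"): 0.9, (None, "NOUN"): 0.05, (None, "VERB"): 0.05,
             ("NOUN", "VERB"): 0.7, ("NOUN", "NOUN"): 0.3}
    return table.get((prev_tag, tag), 0.1)

def allowed_tags(word, tma=None):
    # Restrict candidates to TMA(w) when the morphological analyzer
    # covers the word; otherwise fall back to the full tagset {T}.
    full = ["DET", "NOUN", "VERB"]
    return tma.get(word, full) if tma else full

def beam_search(words, beam_width=2, tma=None):
    beams = [([], 1.0)]  # (partial tag sequence, cumulative probability)
    for w in words:
        expanded = []
        for tags, p in beams:
            prev = tags[-1] if tags else None
            for t in allowed_tags(w, tma):
                expanded.append((tags + [t], p * score(w, t, prev)))
        # Keep only the beam_width best partial sequences.
        beams = sorted(expanded, key=lambda x: -x[1])[:beam_width]
    return beams[0][0]

tags = beam_search(["the", "dog", "runs"], tma={"the": ["DET"]})  # ['DET', 'NOUN', 'VERB']
```

The morphological restriction shrinks the search space per word, which is what makes the restricted model both faster and more accurate on small training sets.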
What are Features? • Feature function: a binary function of the history and the target tag • Example: f(h, t) = 1 if the current word ends in “-ing” and t = VBG; 0 otherwise
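A minimal sketch of how such binary features combine into the conditional ME distribution p(t | h) = exp(Σ λ·f(h, t)) / Z; the feature names and weights below are hypothetical (in practice the weights are learned from labelled data):

```python
import math

# Hypothetical learned weights (lambda values) for (feature, tag) pairs.
WEIGHTS = {
    ("suffix=-ing", "VERB"): 1.2,
    ("suffix=-ing", "NOUN"): 0.3,
    ("prev_tag=DET", "NOUN"): 0.5,
}

def active_features(history, tag):
    """Return the (feature, tag) pairs whose binary function fires."""
    feats = []
    if history["word"].endswith("ing"):
        feats.append(("suffix=-ing", tag))
    if history.get("prev_tag") == "DET":
        feats.append(("prev_tag=DET", tag))
    return feats

def p_tag(history, tags):
    """ME distribution: exponentiate summed weights, normalize by Z."""
    scores = {t: math.exp(sum(WEIGHTS.get(f, 0.0)
                              for f in active_features(history, t)))
              for t in tags}
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

dist = p_tag({"word": "running", "prev_tag": "DET"}, ["NOUN", "VERB"])
```

Because features are simple indicator functions, overlapping and interdependent cues (suffixes, neighbouring tags, context words) can all be thrown into the same model without independence assumptions.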
POS Tagging Features [Context-window diagram] The feature set is drawn from a window around position i: the words wi-3 … wi+3 and the POS tags ti-3 … ti-1 of the already-tagged preceding words; from these the tag at position i is estimated. • 40 different experiments were conducted taking several combinations from the feature set F
Chunking Features [Context-window diagram] The feature set for chunking draws on the same window: the words wi-3 … wi+3, their POS tags, and the chunk tags ci-3 … ci-1 of the preceding words; from these the chunk tag at position i is estimated.
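One way the context-window features above might be extracted; the feature names and padding convention are illustrative assumptions, not the paper's exact encoding:

```python
# Extract word, POS-tag, and chunk-tag features from a [i-3, i+3]
# window. Only already-decided positions (i-3 .. i-1) contribute tag
# features, since tags to the right are unknown at decode time.

def window_features(words, pos_tags, chunk_tags, i):
    feats = {}
    for off in range(-3, 4):
        j = i + off
        feats[f"w[{off}]"] = words[j] if 0 <= j < len(words) else "<PAD>"
    for off in range(-3, 0):  # preceding positions only
        j = i + off
        feats[f"t[{off}]"] = pos_tags[j] if j >= 0 else "<PAD>"
        feats[f"c[{off}]"] = chunk_tags[j] if j >= 0 else "<PAD>"
    return feats

words = ["the", "big", "dog", "barked"]
pos = ["DET", "ADJ", "NN"]        # tags decided so far (positions 0..i-1)
chunks = ["B-NP", "I-NP", "I-NP"]
f = window_features(words, pos, chunks, 3)
```

Each key/value pair (e.g. `w[0]=barked`, `t[-1]=NN`) corresponds to one binary indicator feature in the ME model.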
Experiments: POS Tagging • Baseline model • Maximum Entropy models: ME (Bengali, Hindi and Telugu); ME + IMA (Bengali); ME + CMA (Bengali) • Data used
Tagset and Corpus Ambiguity • Tagset consists of 27 grammatical classes • Corpus ambiguity: the mean number of possible tags per word, measured on the tagged training data (Dermatas et al., 1995)
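Corpus ambiguity as defined above can be computed as follows; the toy corpus and the per-token averaging are illustrative assumptions:

```python
from collections import defaultdict

# Mean number of distinct tags seen (in training data) for each token's
# word type, averaged over all tokens in the corpus.

def corpus_ambiguity(tagged_corpus):
    tags_seen = defaultdict(set)
    tokens = 0
    for sentence in tagged_corpus:
        for word, tag in sentence:
            tags_seen[word].add(tag)
            tokens += 1
    total = sum(len(tags_seen[w]) for s in tagged_corpus for w, _ in s)
    return total / tokens

corpus = [[("time", "NN"), ("flies", "VBZ")],
          [("fruit", "NN"), ("flies", "NNS")]]
amb = corpus_ambiguity(corpus)  # "flies" takes 2 tags, the rest 1 -> 1.5
```

A higher value means the disambiguation algorithm has more candidate tags to choose between, so the measure gives a rough difficulty estimate for each language's corpus.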
POS Tagging Results on Development Set [Table: accuracy reported separately for known words, unknown words, and overall]
Chunking Results • Two different measures • Per-word basis • Per-chunk basis: a chunk counts as correct only when its boundaries are correctly identified and it is correctly labelled
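The two measures can be sketched as follows, assuming IOB-style chunk labels; note that this per-chunk score only counts how many gold chunks are exactly recovered (a recall-like simplification of the full precision/recall evaluation):

```python
# Per-word accuracy vs per-chunk accuracy for IOB chunk tag sequences.
# A chunk is correct only if boundaries AND label both match.

def iob_spans(tags):
    """Extract (start, end, label) spans from a well-formed IOB sequence."""
    spans, start, label = [], None, None
    for i, t in enumerate(tags + ["O"]):  # sentinel flushes the last span
        if t.startswith("B-") or t == "O":
            if start is not None:
                spans.append((start, i, label))
                start, label = None, None
            if t.startswith("B-"):
                start, label = i, t[2:]
    return spans

def per_word_accuracy(gold, pred):
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def per_chunk_accuracy(gold, pred):
    gold_spans = set(iob_spans(gold))
    return len(gold_spans & set(iob_spans(pred))) / len(gold_spans)

gold = ["B-NP", "I-NP", "O", "B-VP"]
pred = ["B-NP", "I-NP", "B-VP", "B-VP"]
```

On this toy pair the per-word score penalizes the spurious `B-VP`, while every gold chunk is still recovered exactly, so the two measures diverge.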
Assessment of Error Types [Table: error-type breakdown for Bengali, Hindi and Telugu]
Results on Test Set • Bengali data was tagged using the ME + IMA model • Hindi and Telugu data were tagged with the simple ME model • Chunk accuracy was measured on a per-word basis
Conclusion and Future Scope • Restricting tags with morphological information yields an efficient tagging model even when only a small amount of labelled text is available • Performance on Hindi and Telugu can be improved using morphological analyzers for those languages • Linguistic prefix and suffix information can be incorporated • More features can be explored for chunking