Persian Part Of Speech Tagging
Mostafa Keikha
Database Research Group (DBRG)
ECE Department, University of Tehran
Decision Trees • Decision Tree (DT): • Tree where the root and each internal node are labeled with a question. • The arcs represent each possible answer to the associated question. • Each leaf node represents a prediction of a solution to the problem. • A popular technique for classification; the leaf node indicates the class to which the corresponding tuple belongs.
Decision Trees • A Decision Tree Model is a computational model consisting of three parts: • A decision tree • An algorithm to create the tree • An algorithm that applies the tree to data • Creation of the tree is the most difficult part. • Processing is basically a search similar to that in a binary search tree (although a DT need not be binary), as sketched below.
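As a rough illustration of the "apply the tree to data" part, here is a minimal sketch (the class and function names are hypothetical, not from the presentation): each internal node asks a question, the answer selects which arc to follow, and the walk ends at a leaf whose class label is returned.

```python
class Node:
    """One node of a decision tree: either an internal question node or a leaf."""
    def __init__(self, question=None, branches=None, prediction=None):
        self.question = question        # callable mapping a tuple to an answer; None on leaves
        self.branches = branches or {}  # answer -> child Node (one arc per possible answer)
        self.prediction = prediction    # class label, set only on leaf nodes

def classify(root, item):
    """Walk from the root to a leaf, following the arc that matches each answer."""
    node = root
    while node.question is not None:
        node = node.branches[node.question(item)]
    return node.prediction
```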
Using DT in POS Tagging • Compute ambiguity classes • Each term may take several different tags • The ambiguity class of a term is the set of all its possible tags • Compute the number of occurrences of each tag in each ambiguity class (a sketch follows below)
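A minimal sketch of how the ambiguity classes and their tag counts could be computed from a tagged training corpus; the data format and names are assumptions for illustration only.

```python
from collections import Counter, defaultdict

def ambiguity_class_counts(tagged_corpus):
    """tagged_corpus: list of (word, tag) pairs from the training data."""
    tags_per_word = defaultdict(set)
    for word, tag in tagged_corpus:
        tags_per_word[word].add(tag)

    # Ambiguity class of a word = the set of all tags it can take.
    # For each class, count how often each member tag actually occurs.
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[frozenset(tags_per_word[word])][tag] += 1
    return counts
```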
Using DT in POS Tagging • Create the decision tree over the ambiguity classes • At each level, delete the tag with the minimum number of occurrences (see the sketch below). For example, starting from the class {a, b, c, d} with counts 10, 20, 25, 40, tag a is removed first; at the next level {b, c, d} has counts 40, 39, 50 and c is removed; then {b, d} has counts 60, 55 and d is removed, leaving b at the leaf.
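A sketch of the pruning rule described above. For brevity it keeps one fixed set of counts, whereas the slide's figure recomputes the counts at each level; all names are illustrative.

```python
def elimination_order(tag_counts):
    """Repeatedly drop the tag with the fewest occurrences until one tag remains."""
    remaining = dict(tag_counts)
    dropped = []
    while len(remaining) > 1:
        weakest = min(remaining, key=remaining.get)
        dropped.append(weakest)
        del remaining[weakest]
    return dropped, next(iter(remaining))   # (order of removal, surviving tag)

# elimination_order({"a": 10, "b": 20, "c": 25, "d": 40}) -> (["a", "b", "c"], "d")
```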
Using DT in POS Tagging • Advantage • Easy to understand • Easy to implement • Disadvantage • Context independent
Using DT in POS Tagging • Known Tokens Results
POS tagging using HMMs Let W be a sequence of words, W = w1, w2, …, wn, and let T be the corresponding tag sequence, T = t1, t2, …, tn. Task: find the T which maximizes P(T | W), i.e. T’ = argmaxT P(T | W).
POS tagging using HMMs By Bayes' rule, P(T | W) = P(W | T) * P(T) / P(W), so T’ = argmaxT P(W | T) * P(T).
Transition probability: P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * … * P(tn | t1 … tn-1).
Applying the trigram approximation: P(T) = P(t1) * P(t2 | t1) * P(t3 | t1 t2) * … * P(tn | tn-2 tn-1).
Introducing a dummy tag, $, to represent the beginning of a sentence: P(T) = P(t1 | $) * P(t2 | $ t1) * P(t3 | t1 t2) * … * P(tn | tn-2 tn-1).
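A minimal sketch of estimating the trigram transition probabilities from tagged training sentences. The slide introduces a single dummy tag $; padding each sentence with two $ symbols, as done here, is an equivalent convenience so that every real tag has a two-tag history. All names are illustrative, not the author's code.

```python
from collections import Counter

DUMMY = "$"   # dummy tag marking the beginning of a sentence

def transition_counts(tag_sequences):
    """tag_sequences: list of tag sequences, one per training sentence."""
    trigram, history = Counter(), Counter()
    for tags in tag_sequences:
        padded = [DUMMY, DUMMY] + list(tags)
        for i in range(2, len(padded)):
            trigram[(padded[i - 2], padded[i - 1], padded[i])] += 1
            history[(padded[i - 2], padded[i - 1])] += 1
    return trigram, history

def p_transition(trigram, history, t_prev2, t_prev1, t):
    """Maximum-likelihood estimate of P(t | t_prev2 t_prev1)."""
    denom = history[(t_prev2, t_prev1)]
    return trigram[(t_prev2, t_prev1, t)] / denom if denom else 0.0
```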
POS tagging using HMMs Smoothing transition probabilities: the sparse-data problem is handled with the linear interpolation method, P'(ti | ti-2, ti-1) = λ1 P(ti) + λ2 P(ti | ti-1) + λ3 P(ti | ti-2, ti-1), such that the λs sum to 1.
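The interpolation itself is a one-liner once the component estimates are available; a sketch, with the λs as defined above:

```python
def p_interpolated(p_uni, p_bi, p_tri, lambdas):
    """P'(t_i | t_{i-2}, t_{i-1}) = λ1·P(t_i) + λ2·P(t_i | t_{i-1}) + λ3·P(t_i | t_{i-2}, t_{i-1}),
    where lambdas = (λ1, λ2, λ3) sum to 1."""
    l1, l2, l3 = lambdas
    return l1 * p_uni + l2 * p_bi + l3 * p_tri
```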
POS tagging using HMMs • Calculation of the λs
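The slide's own formula for the λs is not preserved in this transcript. Purely as an assumption, a common way to set them is the deleted-interpolation procedure from Brants' TnT tagger: for every observed trigram, check which history length predicts its last tag best once that trigram is removed from the counts, and credit the corresponding λ.

```python
def estimate_lambdas(trigram, bigram, unigram, total_tags):
    """Deleted-interpolation estimate of (λ1, λ2, λ3); `trigram`, `bigram`, `unigram`
    are Counters over tag n-grams and `total_tags` is the number of tag tokens."""
    l1 = l2 = l3 = 0.0
    for (t1, t2, t3), c in trigram.items():
        # Score each history length with the current trigram removed from the counts.
        tri = (c - 1) / (bigram[(t1, t2)] - 1) if bigram[(t1, t2)] > 1 else 0.0
        bi = (bigram[(t2, t3)] - 1) / (unigram[t2] - 1) if unigram[t2] > 1 else 0.0
        uni = (unigram[t3] - 1) / (total_tags - 1) if total_tags > 1 else 0.0
        best = max(tri, bi, uni)
        if best == tri:
            l3 += c
        elif best == bi:
            l2 += c
        else:
            l1 += c
    s = l1 + l2 + l3
    return (l1 / s, l2 / s, l3 / s) if s else (1 / 3, 1 / 3, 1 / 3)
```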
POS tagging using HMMs Emission probability: P(W | T) ≈ P(w1 | t1) * P(w2 | t2) * … * P(wn | tn).
Context dependency: to make the model more dependent on the context, the emission probability is instead calculated as P(W | T) ≈ P(w1 | $ t1) * P(w2 | t1 t2) * … * P(wn | tn-1 tn).
POS tagging using HMMs • The same smoothing technique is applied: P'(wi | ti-1 ti) = θ1 P(wi | ti) + θ2 P(wi | ti-1 ti), where the θs sum to 1. • The θs are different for different words.
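A sketch of the smoothed, context-dependent emission probability from the last two slides. The count tables (Counters filled from the training corpus) and the per-word θ values are assumptions for illustration.

```python
def p_emission(word, t_prev, t, word_tag, tag_count, word_tagpair, tagpair, theta1, theta2):
    """P'(w_i | t_{i-1} t_i) = θ1·P(w_i | t_i) + θ2·P(w_i | t_{i-1} t_i), with θ1 + θ2 = 1."""
    p_tag = word_tag[(word, t)] / tag_count[t] if tag_count[t] else 0.0
    p_ctx = (word_tagpair[(word, t_prev, t)] / tagpair[(t_prev, t)]
             if tagpair[(t_prev, t)] else 0.0)
    return theta1 * p_tag + theta2 * p_ctx
```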
POS tagging using HMMs • Lexicon generation probability
POS tagging using HMMs P(N V ART N | files like a flower) = 4.37*10-6
POS tagging using HMMs • Known Tokens Results