
# Natural Language Processing - Lesson 5: POS Tagging Algorithms






### Natural Language Processing - Lesson 5: POS Tagging Algorithms

Ido Dagan

Department of Computer Science

Bar-Ilan University

Course 88-680

### Supervised Learning Scheme

[Diagram: "Labeled" Examples → Training Algorithm → Classification Model; Classification Model + New Examples → Classification Algorithm → Classifications]

### Transformation-Based Learning

• Introduced by Brill (1995)

• Can exploit a wider range of lexical and syntactic regularities via transformation rules, each consisting of a triggering environment and a rewrite rule

• Tagger:

• Construct initial tag sequence for input – most frequent tag for each word

• Iteratively refine the tag sequence by applying "transformation rules" in rank order (see the sketch after this list)

• Learner:

• Construct initial tag sequence for the training corpus

• Loop until done:

• Try all possible rules, compare the results to the known tags, apply the best rule r* to the sequence, and append it to the rule ranking
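A minimal sketch of the tagger side, assuming each rule stores its triggering environment as a predicate over the current tag sequence; the Rule class and all names here are illustrative, not Brill's code:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Rule:
    from_tag: str   # rewrite rule: change from_tag ...
    to_tag: str     # ... to to_tag
    trigger: Callable[[List[str], int], bool]   # triggering environment

def tag_sentence(words, most_frequent_tag, ranked_rules, default="NN"):
    # 1. Initial tag sequence: most frequent tag for each word
    tags = [most_frequent_tag.get(w, default) for w in words]
    # 2. Iteratively refine by applying transformations in rank order
    for rule in ranked_rules:
        for i in range(len(tags)):
            if tags[i] == rule.from_tag and rule.trigger(tags, i):
                tags[i] = rule.to_tag
    return tags

# Rule 1 of the examples in the next section, as a Rule instance:
rule1 = Rule("NN", "VB", lambda tags, i: i > 0 and tags[i - 1] == "TO")
```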

### Example Transformation Rules

1. Change NN to VB if previous is TO

• to/TO conflict/NN with → VB

2. Change VBP to VB if MD in previous three

• might/MD vanish/VBP → VB

3. Change NN to VB if MD in previous two

4. Change VB to NN if DT in previous two

### Transformation Templates

Templates specify which transformations are possible.

For example: change tag A to tag B when:

• The preceding (following) tag is Z

• The tag two before (after) is Z

• One of the two previous (following) tags is Z

• One of the three previous (following) tags is Z

• The preceding tag is Z and the following is W

• The preceding (following) tag is Z and the tag two before (after) is W
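One way to read these templates: each is a factory that, when instantiated with concrete tags Z (and W), yields the trigger predicate of a candidate rule. The factory names below are illustrative, reusing the hypothetical Rule class from the earlier sketch:

```python
# Illustrative template factories; instantiating one gives a rule trigger.

def prev_tag_is(z):
    return lambda tags, i: i >= 1 and tags[i - 1] == z

def tag_two_before_is(z):
    return lambda tags, i: i >= 2 and tags[i - 2] == z

def one_of_prev_two_is(z):
    return lambda tags, i: z in tags[max(0, i - 2):i]

def prev_is_and_next_is(z, w):
    return lambda tags, i: (i >= 1 and tags[i - 1] == z
                            and i + 1 < len(tags) and tags[i + 1] == w)

# e.g. Rule("NN", "VB", prev_tag_is("TO")) reproduces rule 1 above.
```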

### Lexicalized Templates

New templates add dependencies on surrounding words (not just tags):

Change tag A to tag B when:

• The preceding (following) word is w

• The word two before (after) is w

• One of the two preceding (following) words is w

• The current word is w

• The current word is w and the preceding (following) word is v

• The current word is w and the preceding (following) tag is X (Notice: word-tag combination)

• etc…

### Unknown Words

• How do we choose the most likely tag for unseen words?

Transformation-based approach:

• Learn "morphological" transformations from templates of the form:

Change tag from X to Y if:

• Deleting prefix (suffix) x results in a known word

• The first (last) characters of the word are x

• Adding x as a prefix (suffix) results in a known word

• Word W ever appears immediately before (after) the word

• Character Z appears in the word
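A minimal sketch of one of these triggers, "deleting suffix x results in a known word", under the assumption that the lexicon is a plain set of known words; all names are illustrative:

```python
def suffix_delete_trigger(suffix, known_words):
    # Fires if stripping `suffix` from the word leaves a known word
    def trigger(word):
        return (word.endswith(suffix) and len(word) > len(suffix)
                and word[:-len(suffix)] in known_words)
    return trigger

# e.g. for the unknown word "played": stripping "ed" leaves the known
# word "play", so a learned rule could retag it from NN to VBD.
fires = suffix_delete_trigger("ed", {"play", "walk"})
assert fires("played")
```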

### TBL Learning Scheme

[Diagram: Unannotated Input Text → Setting Initial State → Annotated Text; the Learning Algorithm compares the Annotated Text with the Ground Truth for the Input Text and produces Rules]

### The Learning Algorithm

• Initial tagging of training corpus – most frequent tag per word

• At each iteration:

• Identify rules that fix errors and compute “error reduction” for each transformation rule:

• #errors fixed - #errors introduced

• Find the best rule; if its error reduction exceeds a threshold (to avoid overfitting):

• Apply best rule to training corpus

• Append best rule to ordered list of transformations
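A minimal sketch of this loop, reusing the hypothetical Rule class from above; generating the candidate rules from the templates is assumed to exist and is not shown:

```python
def error_reduction(rule, corpus_tags, gold_tags):
    # #errors fixed minus #errors introduced if `rule` were applied
    fixed = introduced = 0
    for tags, gold in zip(corpus_tags, gold_tags):
        for i, tag in enumerate(tags):
            if tag == rule.from_tag and rule.trigger(tags, i):
                if rule.to_tag == gold[i]:
                    fixed += 1          # a wrong tag becomes right
                elif tag == gold[i]:
                    introduced += 1     # a right tag becomes wrong
    return fixed - introduced

def learn(corpus_tags, gold_tags, candidate_rules, threshold=1):
    learned = []                        # ordered list of transformations
    while True:
        scored = [(error_reduction(r, corpus_tags, gold_tags), r)
                  for r in candidate_rules]
        best_score, best = max(scored, key=lambda x: x[0])
        if best_score <= threshold:     # stop early to avoid overfitting
            return learned
        for tags in corpus_tags:        # apply the best rule r* to the corpus
            for i in range(len(tags)):
                if tags[i] == best.from_tag and best.trigger(tags, i):
                    tags[i] = best.to_tag
        learned.append(best)            # append r* to the ranking
```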

### POS Tagging in a Statistical Framework

• POS tagging: for a given sentence W = w1…wn, find the matching POS tags T = t1…tn

• In a statistical framework: T' = argmax_T P(T|W)

### Deriving the Model

Expanding with Bayes' rule, and noting that the denominator doesn't depend on the tags:

$$T' = \arg\max_T P(T \mid W) = \arg\max_T \frac{P(W \mid T)\,P(T)}{P(W)} = \arg\max_T P(W \mid T)\,P(T)$$

Assuming words are independent of each other, and that a word's identity depends only on its own tag:

$$P(W \mid T) = \prod_{i=1}^{n} P(w_i \mid t_i)$$

By the chaining rule and then the Markovian assumptions (next section):

$$P(T) = \prod_{i=1}^{n} P(t_i \mid t_1 \dots t_{i-1}) \approx \prod_{i=1}^{n} P(t_i \mid t_{i-1})$$

Notation: $P(t_1) = P(t_1 \mid t_0)$ for a start pseudo-tag $t_0$.

### Markov Assumptions

• Limited Horizon

• P(Xi+1 = tk |X1,…,Xi) = P(Xi+1 = tk | Xi)

• Time invariant

• P(Xi+1 = tk | Xi) = P(Xj+1 = tk | Xj)

### Parameter Estimation

• In order to estimate P(wi|ti) and P(ti|ti-1), we can use maximum likelihood estimation:

• P(wi|ti) = c(wi, ti) / c(ti)

• P(ti|ti-1) = c(ti-1, ti) / c(ti-1)

• Notice the estimation for i = 1, which uses the start pseudo-tag t0
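A minimal sketch of these counts over a toy tagged corpus; the corpus format (lists of (word, tag) pairs) and all names are assumptions for illustration:

```python
from collections import Counter

def mle_estimates(tagged_sentences):
    """MLE: P(w|t) = c(w,t)/c(t) and P(t|t') = c(t',t)/c(t')."""
    tag_count = Counter()        # c(t); "<s>" plays the role of t0
    word_tag_count = Counter()   # c(w, t)
    bigram_count = Counter()     # c(t_{i-1}, t_i)
    for sentence in tagged_sentences:
        prev = "<s>"             # handles the estimation for i = 1
        tag_count[prev] += 1
        for word, tag in sentence:
            word_tag_count[(word, tag)] += 1
            bigram_count[(prev, tag)] += 1
            tag_count[tag] += 1
            prev = tag
    emit = {(w, t): c / tag_count[t] for (w, t), c in word_tag_count.items()}
    trans = {(p, t): c / tag_count[p] for (p, t), c in bigram_count.items()}
    return emit, trans

# e.g. mle_estimates([[("the", "AT"), ("dog", "NN"), ("barks", "VBZ")]])
# gives trans[("<s>", "AT")] == 1.0 and emit[("dog", "NN")] == 1.0
```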

### Unknown Words

• Many words will not appear in the training corpus.

• Unknown words are a major problem for taggers (!)

• Solutions:

• Incorporate morphological analysis

• Consider words appearing only once in the training data as UNKNOWNs
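A minimal sketch of the second solution, mapping hapax legomena (words occurring once in training) to a single pseudo-word; names are illustrative:

```python
from collections import Counter

def replace_hapax(tagged_sentences, unk="<UNK>"):
    # Words appearing only once in training stand in for unknown words
    freq = Counter(w for s in tagged_sentences for w, _ in s)
    return [[(w if freq[w] > 1 else unk, t) for w, t in s]
            for s in tagged_sentences]

# At tagging time, any word not seen in training is mapped to <UNK> as well,
# so P(<UNK>|t) acts as the emission probability for unknown words.
```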

### Smoothing

• For P(ti|ti-1)

• Optionally – for P(ti|ti-1)

### The Viterbi Algorithm

• Finding the most probable tag sequence can be done with the Viterbi algorithm.

• No need to calculate every single possible tag sequence (!)

A sketch is given after the state-machine description below.

### HMM as a State Machine

• Assume a state machine with

• Nodes that correspond to tags

• A start and end state

• Arcs corresponding to transition probabilities - P(ti|ti-1)

• A set of observation likelihoods for each state - P(wi|ti)
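As promised above, a minimal Viterbi sketch over such a state machine. It assumes the HMM is given as dictionaries trans for P(ti|ti-1) and emit for P(wi|ti) (for instance from the hypothetical mle_estimates above, with the same "<s>" start state); this is an illustrative sketch, not a reference implementation:

```python
def viterbi(words, tags, trans, emit, start="<s>"):
    # delta[t]: probability of the best partial tag sequence ending in t
    delta = {t: trans.get((start, t), 0.0) * emit.get((words[0], t), 0.0)
             for t in tags}
    backpointers = []
    for word in words[1:]:
        prev_delta, delta, ptrs = delta, {}, {}
        for t in tags:
            # best predecessor state for reaching tag t at this position
            best = max(tags, key=lambda p: prev_delta[p] * trans.get((p, t), 0.0))
            ptrs[t] = best
            delta[t] = (prev_delta[best] * trans.get((best, t), 0.0)
                        * emit.get((word, t), 0.0))
        backpointers.append(ptrs)
    # recover the best path by following backpointers from the best final tag
    path = [max(tags, key=lambda t: delta[t])]
    for ptrs in reversed(backpointers):
        path.append(ptrs[path[-1]])
    return list(reversed(path))
```

The dynamic program runs in O(n·|S|²) time instead of the O(|S|ⁿ) cost of scoring every possible tag sequence.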


[Diagram: an example HMM as a state machine. States: AT, NN, NNS, VB, VBZ, RB. Per-state emission probabilities include P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5 (VBZ); P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36 (VB); and P(the)=0.4, P(a)=0.3, P(an)=0.2, … (AT). Two of the transition arcs carry probabilities 0.6 and 0.4.]

### Hidden Markov Models

• An HMM is similar to an automaton augmented with probabilities.

• Note that the states in an HMM do not correspond to the input symbols.

• The input symbols don't uniquely determine the next state.

### Formal Definition of an HMM

• HMM=(S,K,A,B)

• Set of states S={s1,…sn}

• Output alphabet K={k1,…km}

• State transition probabilities A = {aij}, i, j ∈ S

• Symbol emission probabilities B = {b(i,k)}, i ∈ S, k ∈ K

• Start and end states (non-emitting)

• Alternatively: initial state probabilities

• Note: for a given i, Σj aij = 1 and Σk b(i,k) = 1
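As a concrete reading of this definition, a small sketch of an HMM container that checks the two normalization constraints; the class and field names are mine, and the non-emitting start/end states are assumed to be kept out of states so the checks apply only to emitting states:

```python
from dataclasses import dataclass

@dataclass
class HMM:
    states: set      # S = {s1, ..., sn} (emitting states only, by assumption)
    alphabet: set    # K = {k1, ..., km}
    trans: dict      # A: (i, j) -> a_ij
    emit: dict       # B: (i, k) -> b(i, k)

    def check(self, tol=1e-9):
        # For a given i: sum_j a_ij = 1 and sum_k b(i,k) = 1
        for i in self.states:
            assert abs(sum(self.trans.get((i, j), 0.0)
                           for j in self.states) - 1.0) < tol
            assert abs(sum(self.emit.get((i, k), 0.0)
                           for k in self.alphabet) - 1.0) < tol
```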

### Decoding

• Because we only observe the input, the underlying states are hidden.

• Decoding: the problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W = w1,…,wn, find the state sequence T = t1,…,tn that best explains the observations.
