Natural Language Processing - Lecture 5: POS Tagging Algorithms

Ido Dagan

Department of Computer Science

Bar-Ilan University

88-680


Supervised Learning Scheme

Diagram: "labeled" examples are fed to a training algorithm, which produces a classification model; the classification algorithm then applies this model to new examples to produce classifications.


Transformation-Based Learning (TBL) for Tagging

  • Introduced by Brill (1995)

  • Can exploit a wider range of lexical and syntactic regularities via transformation rules – triggering environment and rewrite rule

  • Tagger:

    • Construct initial tag sequence for input – most frequent tag for each word

    • Iteratively refine tag sequence by applying “transformation rules” in rank order

  • Learner:

    • Construct initial tag sequence for the training corpus

    • Loop until done:

      • Try all possible rules and compare the results to the known tags; apply the best rule r* to the sequence and append it to the ordered rule list
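As an illustration of the tagger just described, here is a minimal Python sketch; the most-frequent-tag dictionary and the (from_tag, to_tag, trigger) rule representation are assumptions made for the example, not Brill's actual implementation:

def initial_tagging(words, most_frequent_tag, default="NN"):
    # Initial state: assign each word its most frequent tag from the training corpus
    return [most_frequent_tag.get(w, default) for w in words]

def apply_rules(words, tags, ranked_rules):
    # Iteratively refine the tag sequence by applying transformation rules in rank order
    for from_tag, to_tag, trigger in ranked_rules:
        for i in range(len(tags)):
            if tags[i] == from_tag and trigger(words, tags, i):
                tags[i] = to_tag
    return tags

# Rule 1 from the next slide: change NN to VB if the previous tag is TO
rule1 = ("NN", "VB", lambda ws, ts, i: i > 0 and ts[i - 1] == "TO")

words = ["to", "conflict", "with"]
tags = initial_tagging(words, {"to": "TO", "conflict": "NN", "with": "IN"})
print(apply_rules(words, tags, [rule1]))   # ['TO', 'VB', 'IN']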



Some Examples

1. Change NN to VB if the previous tag is TO

  • to/TO conflict/NN with → VB

2. Change VBP to VB if MD appears in the previous three tags

  • might/MD vanish/VBP → VB

3. Change NN to VB if MD appears in the previous two tags

  • might/MD reply/NN → VB

4. Change VB to NN if DT appears in the previous two tags

  • the/DT reply/VB → NN


Transformation Templates

Specify which transformations are possible

For example: change tag A to tag B when:

  • The preceding (following) tag is Z

  • The tag two before (after) is Z

  • One of the two previous (following) tags is Z

  • One of the three previous (following) tags is Z

  • The preceding tag is Z and the following is W

  • The preceding (following) tag is Z and the tag two before (after) is W
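For illustration, a sketch of how one such template can be instantiated into concrete candidate rules by enumerating tag values; the toy tagset is an assumption:

from itertools import product

TAGSET = ["NN", "VB", "VBP", "MD", "TO", "DT"]   # toy tagset (assumption)

def preceding_tag_is(z):
    # Triggering environment for "the preceding tag is Z"
    return lambda words, tags, i: i > 0 and tags[i - 1] == z

# Instantiate "change tag A to tag B when the preceding tag is Z" over all (A, B, Z)
candidates = [(a, b, preceding_tag_is(z))
              for a, b, z in product(TAGSET, TAGSET, TAGSET) if a != b]
print(len(candidates))   # 6 tags: 6 * 5 * 6 = 180 candidate rules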



Lexicalization

New templates to include dependency on surrounding words (not just tags):

Change tag A to tag B when:

  • The preceding (following) word is w

  • The word two before (after) is w

  • One of the two preceding (following) words is w

  • The current word is w

  • The current word is w and the preceding (following) word is v

  • The current word is w and the preceding (following) tag is X (Notice: word-tag combination)

  • etc…



Initializing Unseen Words

  • How to choose most likely tag for unseen words?

    Transformation-based approach:

    • Start with NP for capitalized words, NN for others

    • Learn “morphological” transformations from:

      Change tag from X to Y if:

      • Deleting prefix (suffix) x results in a known word

      • The first (last) characters of the word are x

      • Adding x as a prefix (suffix) results in a known word

      • Word W ever appears immediately before (after) the word

      • Character Z appears in the word
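A small sketch of two of these triggers; the known_words lexicon is a toy assumption standing in for the training vocabulary:

known_words = {"walk", "play", "quick"}   # toy lexicon (assumption)

def deleting_suffix_gives_known_word(word, suffix):
    # "Deleting the suffix x results in a known word"
    return word.endswith(suffix) and word[:-len(suffix)] in known_words

def last_characters_are(word, suffix):
    # "The last characters of the word are x"
    return word.endswith(suffix)

print(deleting_suffix_gives_known_word("walked", "ed"))   # True -> suggests a verb tag
print(last_characters_are("quickly", "ly"))               # True -> suggests an adverb tag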



TBL Learning Scheme

Diagram: unannotated input text is passed through initial-state setting to produce an annotated text; the learning algorithm compares this annotated text with the ground truth for the input text and outputs the ordered list of rules.


Greedy Learning Algorithm

  • Initial tagging of training corpus – most frequent tag per word

  • At each iteration:

    • Identify rules that fix errors and compute “error reduction” for each transformation rule:

      • #errors fixed - #errors introduced

    • Find the best rule; if its error reduction exceeds a threshold (to avoid overfitting):

      • Apply best rule to training corpus

      • Append best rule to ordered list of transformations
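A minimal sketch of this greedy loop, treating the training corpus as one flat word/tag sequence and reusing the (from_tag, to_tag, trigger) rule representation from the earlier sketch; this illustrates the scheme, not Brill's implementation:

def apply_rule(rule, words, tags):
    # Apply a single transformation rule to a copy of the tag sequence
    from_tag, to_tag, trigger = rule
    new_tags = list(tags)
    for i in range(len(new_tags)):
        if new_tags[i] == from_tag and trigger(words, new_tags, i):
            new_tags[i] = to_tag
    return new_tags

def error_reduction(rule, words, tags, gold_tags):
    # #errors fixed minus #errors introduced by applying the rule
    new_tags = apply_rule(rule, words, tags)
    fixed = sum(t != g and n == g for t, n, g in zip(tags, new_tags, gold_tags))
    introduced = sum(t == g and n != g for t, n, g in zip(tags, new_tags, gold_tags))
    return fixed - introduced

def greedy_tbl(words, tags, gold_tags, candidate_rules, threshold=2):
    ranked_rules = []
    while True:
        best_rule = max(candidate_rules,
                        key=lambda r: error_reduction(r, words, tags, gold_tags))
        if error_reduction(best_rule, words, tags, gold_tags) <= threshold:
            break                                   # stop to avoid overfitting
        tags = apply_rule(best_rule, words, tags)   # apply best rule to training corpus
        ranked_rules.append(best_rule)              # append to ordered transformation list
    return ranked_rules

Rules are appended in the order they are learned, which is the rank order in which the tagger later applies them.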



Stochastic POS Tagging

  • POS tagging: for a given sentence W = w1…wn, find the matching POS tags T = t1…tn

  • In a statistical framework: T' = argmaxT P(T|W)




Bayes’ Rule

Starting from T' = argmaxT P(T|W):

Bayes' rule: P(T|W) = P(W|T) P(T) / P(W)

The denominator P(W) doesn't depend on the tags, so T' = argmaxT P(W|T) P(T)

Chaining rule: P(W|T) = ∏i P(wi | w1…wi-1, T) and P(T) = ∏i P(ti | t1…ti-1)

Words are independent of each other, and a word's identity depends only on its own tag, so P(W|T) ≈ ∏i P(wi|ti)

Markovian assumptions: P(T) ≈ ∏i P(ti|ti-1)

Notation: P(t1) = P(t1 | t0)



The Markovian assumptions

  • Limited Horizon

    • P(Xi+1 = tk |X1,…,Xi) = P(Xi+1 = tk | Xi)

  • Time invariant

    • P(Xi+1 = tk | Xi) = P(Xj+1 = tk | Xj)



Maximum Likelihood Estimations

  • In order to estimate P(wi|ti) and P(ti|ti-1), we can use maximum likelihood estimation

    • P(wi|ti) = c(wi,ti) / c(ti)

    • P(ti|ti-1) = c(ti-1ti) / c(ti-1)

      • Notice the estimation for i=1, where ti-1 is the start pseudo-tag t0
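A small sketch of these counts and ratios, assuming the training corpus is a list of sentences given as (word, tag) pairs and using "<s>" as the t0 pseudo-tag for i=1:

from collections import Counter

def mle_estimates(tagged_sentences):
    # Collect c(wi, ti), c(ti) and c(ti-1 ti); "<s>" plays the role of t0
    word_tag, tag, tag_bigram = Counter(), Counter(), Counter()
    for sent in tagged_sentences:
        prev = "<s>"
        tag[prev] += 1
        for w, t in sent:
            word_tag[(w, t)] += 1
            tag[t] += 1
            tag_bigram[(prev, t)] += 1
            prev = t
    # P(wi|ti) = c(wi,ti)/c(ti),  P(ti|ti-1) = c(ti-1 ti)/c(ti-1)
    p_word_given_tag = {(w, t): c / tag[t] for (w, t), c in word_tag.items()}
    p_tag_given_prev = {(p, t): c / tag[p] for (p, t), c in tag_bigram.items()}
    return p_word_given_tag, p_tag_given_prev

corpus = [[("the", "DT"), ("reply", "NN")], [("to", "TO"), ("conflict", "VB")]]
p_w, p_t = mle_estimates(corpus)
print(p_t[("<s>", "DT")], p_w[("reply", "NN")])   # 0.5 1.0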



Unknown Words

  • Many words will not appear in the training corpus.

  • Unknown words are a major problem for taggers (!)

  • Solutions –

    • Incorporate Morphological Analysis

    • Consider words appearing only once in the training data as UNKNOWN




Smoothing for Tagging

  • For P(ti|ti-1)

  • Optionally – for P(wi|ti)
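The slides don't fix a particular smoothing method; as one common choice, here is a sketch of add-λ smoothing for P(ti|ti-1), using the bigram and tag counters from the MLE sketch (the λ value and tagset are assumptions):

def smoothed_p_tag_given_prev(t, prev, tag_bigram, tag, tagset, lam=0.1):
    # Add-lambda smoothing: (c(ti-1 ti) + lambda) / (c(ti-1) + lambda * |tagset|)
    return (tag_bigram.get((prev, t), 0) + lam) / (tag.get(prev, 0) + lam * len(tagset))

# An unseen transition now gets a small non-zero probability instead of zero
print(smoothed_p_tag_given_prev("VB", "DT", {}, {}, ["DT", "NN", "VB"]))   # ~0.33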



Viterbi

  • Finding the most probable tag sequence can be done with the Viterbi algorithm.

  • No need to calculate every single possible tag sequence (!)
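A compact sketch of Viterbi decoding for this bigram tagger; the probability lookups p_tag(t, prev) and p_word(w, t), and the "<s>" start pseudo-tag, are assumptions matching the earlier sketches (real implementations usually work in log space to avoid underflow):

def viterbi(words, tagset, p_tag, p_word):
    # best[i][t]: probability of the best tag sequence for words[:i+1] that ends in tag t
    best = [{t: p_tag(t, "<s>") * p_word(words[0], t) for t in tagset}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tagset:
            prev = max(tagset, key=lambda p: best[i - 1][p] * p_tag(t, p))
            best[i][t] = best[i - 1][prev] * p_tag(t, prev) * p_word(words[i], t)
            back[i][t] = prev
    # Follow the back-pointers from the best final tag
    last = max(tagset, key=lambda t: best[-1][t])
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))

The dynamic program keeps only the best predecessor per tag at each position, so the cost is linear in sentence length rather than exponential in the number of tag sequences.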



HMMs

  • Assume a state machine with

    • Nodes that correspond to tags

    • A start and end state

    • Arcs corresponding to transition probabilities - P(ti|ti-1)

    • A set of observation likelihoods for each state - P(wi|ti)



Example HMM (figure): tag states AT, NN, NNS, VB, VBZ, RB, with transition probabilities on the arcs (e.g. 0.6 and 0.4) and an emission distribution attached to each state, e.g. P(the)=0.4, P(a)=0.3, P(an)=0.2, …; P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5; P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36.


HMMs

  • An HMM is similar to an automaton augmented with probabilities

  • Note that the states in an HMM do not correspond to the input symbols.

  • The input symbols don’t uniquely determine the next state.



HMM definition

  • HMM=(S,K,A,B)

    • Set of states S={s1,…sn}

    • Output alphabet K={k1,…kn}

    • State transition probabilities A = {aij}, i,j ∈ S

    • Symbol emission probabilities B = b(i,k), i ∈ S, k ∈ K

    • Start and end states (non-emitting)

      • Alternatively: initial state probabilities

  • Note: for a given i, Σj aij = 1 and Σk b(i,k) = 1
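A small sketch of this definition as a Python data structure, checking the two normalization constraints; the toy states, alphabet and numbers are assumptions, not values from the slides:

import math

class HMM:
    def __init__(self, states, alphabet, A, B):
        self.states, self.alphabet, self.A, self.B = states, alphabet, A, B
        for i in states:
            # For a given i: sum over j of a_ij = 1 and sum over k of b(i,k) = 1
            assert math.isclose(sum(A[i][j] for j in states), 1.0)
            assert math.isclose(sum(B[i][k] for k in alphabet), 1.0)

hmm = HMM(
    states=["AT", "NN"],
    alphabet=["the", "dog"],
    A={"AT": {"AT": 0.1, "NN": 0.9}, "NN": {"AT": 0.6, "NN": 0.4}},
    B={"AT": {"the": 1.0, "dog": 0.0}, "NN": {"the": 0.0, "dog": 1.0}},
)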



Why Hidden?

  • Because we only observe the input - the underlying states are hidden

  • Decoding: The problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W=w1,…,wn, find a state sequence T=t1,…,tn that best explains the observation.



Homework
