
Natural Language Processing - Lecture 5: POS Tagging Algorithms






Presentation Transcript


  1. Natural Language Processing - Lecture 5: POS Tagging Algorithms. Ido Dagan, Department of Computer Science, Bar-Ilan University. Course 88-680.

  2. Supervised Learning Scheme
  • Training: “Labeled” Examples → Training Algorithm → Classification Model
  • Classification: New Examples → Classification Algorithm (using the Classification Model) → Classifications

  3. Transformation-Based Learning (TBL) for Tagging
  • Introduced by Brill (1995)
  • Can exploit a wider range of lexical and syntactic regularities via transformation rules – a triggering environment and a rewrite rule
  • Tagger:
    • Construct an initial tag sequence for the input – most frequent tag for each word
    • Iteratively refine the tag sequence by applying “transformation rules” in rank order (see the sketch below)
  • Learner:
    • Construct an initial tag sequence for the training corpus
    • Loop until done: try all possible rules, compare to the known tags, apply the best rule r* to the sequence, and add it to the rule ranking
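A minimal sketch of the tagging phase described above, assuming transformation rules are represented as (from_tag, to_tag, trigger) triples where trigger is a predicate over the current tagged context; the representation is illustrative, not Brill's original implementation:

```python
# Sketch of the TBL tagging phase: initialize with the most frequent tag
# per word, then apply the learned transformation rules in rank order.
def tbl_tag(words, most_frequent_tag, ranked_rules, default_tag="NN"):
    # Initial state: most frequent tag for each known word, a default otherwise.
    tags = [most_frequent_tag.get(w, default_tag) for w in words]
    # Apply each rule, in the order it was learned, over the whole sequence.
    for from_tag, to_tag, trigger in ranked_rules:
        for i in range(len(words)):
            if tags[i] == from_tag and trigger(words, tags, i):
                tags[i] = to_tag
    return tags
```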

  4. Some examples
  1. Change NN to VB if the previous tag is TO: to/TO conflict/NN → VB
  2. Change VBP to VB if MD is in the previous three tags: might/MD vanish/VBP → VB
  3. Change NN to VB if MD is in the previous two tags: might/MD reply/NN → VB
  4. Change VB to NN if DT is in the previous two tags: the/DT reply/VB → NN
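Under the illustrative rule representation from the earlier sketch, these four rules could be written as data; the slices approximate "in the previous two/three tags":

```python
# The four example rules as (from_tag, to_tag, trigger) triples.
example_rules = [
    ("NN",  "VB", lambda ws, ts, i: i >= 1 and ts[i-1] == "TO"),   # rule 1
    ("VBP", "VB", lambda ws, ts, i: "MD" in ts[max(0, i-3):i]),    # rule 2
    ("NN",  "VB", lambda ws, ts, i: "MD" in ts[max(0, i-2):i]),    # rule 3
    ("VB",  "NN", lambda ws, ts, i: "DT" in ts[max(0, i-2):i]),    # rule 4
]
```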

  5. Transformation Templates
  Specify which transformations are possible. For example, change tag A to tag B when:
  • The preceding (following) tag is Z
  • The tag two before (after) is Z
  • One of the two previous (following) tags is Z
  • One of the three previous (following) tags is Z
  • The preceding tag is Z and the following is W
  • The preceding (following) tag is Z and the tag two before (after) is W
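As an illustration of how a template yields concrete candidate rules, a sketch for the first template ("the preceding tag is Z") could enumerate all tag combinations; the function name and the tagset argument are assumptions:

```python
# Instantiate "change A to B when the preceding tag is Z" for every
# combination of tags (A, B, Z), yielding candidate rules for the learner.
def prev_tag_template(tagset):
    for a in tagset:
        for b in tagset:
            if b == a:
                continue
            for z in tagset:
                yield (a, b, lambda ws, ts, i, z=z: i >= 1 and ts[i-1] == z)
```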

  6. Lexicalization
  New templates to include dependency on surrounding words (not just tags). Change tag A to tag B when:
  • The preceding (following) word is w
  • The word two before (after) is w
  • One of the two preceding (following) words is w
  • The current word is w
  • The current word is w and the preceding (following) word is v
  • The current word is w and the preceding (following) tag is X (notice: word-tag combination)
  • etc.

  7. Initializing Unseen Words
  • How do we choose the most likely tag for unseen words?
  Transformation-based approach:
  • Start with NP for capitalized words, NN for others (see the one-line sketch below)
  • Learn “morphological” transformations of the form – change the tag from X to Y if:
    • Deleting the prefix (suffix) x results in a known word
    • The first (last) characters of the word are x
    • Adding x as a prefix (suffix) results in a known word
    • Word W ever appears immediately before (after) the word
    • Character Z appears in the word
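A one-line sketch of the initial assignment mentioned above (capitalized → NP, otherwise NN); the function name is illustrative:

```python
# Initial tag for a word not seen in training, as described on the slide:
# proper noun (NP in the slide's tagset) if capitalized, NN otherwise.
def initial_tag_for_unknown(word):
    return "NP" if word[:1].isupper() else "NN"
```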

  8. TBL Learning Scheme (diagram): Unannotated Input Text → Setting Initial State → Annotated Text → Learning Algorithm (compared against the Ground Truth for the Input Text) → Rules

  9. Greedy Learning Algorithm
  • Initial tagging of the training corpus – most frequent tag per word
  • At each iteration:
    • Identify rules that fix errors and compute the “error reduction” for each transformation rule: #errors fixed − #errors introduced
    • Find the best rule; if its error reduction is greater than a threshold (to avoid overfitting):
      • Apply the best rule to the training corpus
      • Append the best rule to the ordered list of transformations
  (A sketch of this loop follows.)
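A minimal sketch of the greedy loop, reusing the illustrative rule representation from the earlier sketches and flat word/current-tag/gold-tag sequences; all names are assumptions:

```python
# Sketch of the greedy learner: score every candidate rule on the current
# corpus tagging by (#errors fixed - #errors introduced), keep the best one
# if its error reduction exceeds the threshold, apply it, and repeat.
def tbl_learn(words, tags, gold, candidate_rules, threshold=0):
    learned = []
    while True:
        best_rule, best_gain = None, threshold
        for from_tag, to_tag, trigger in candidate_rules:
            fixed = introduced = 0
            for i in range(len(words)):
                if tags[i] == from_tag and trigger(words, tags, i):
                    if gold[i] == to_tag:
                        fixed += 1          # error fixed by this rule
                    elif gold[i] == tags[i]:
                        introduced += 1     # correct tag would be overwritten
            gain = fixed - introduced
            if gain > best_gain:
                best_rule, best_gain = (from_tag, to_tag, trigger), gain
        if best_rule is None:               # no rule beats the threshold: done
            return learned
        from_tag, to_tag, trigger = best_rule
        for i in range(len(words)):         # apply the winning rule to the corpus
            if tags[i] == from_tag and trigger(words, tags, i):
                tags[i] = to_tag
        learned.append(best_rule)
```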

  10. Stochastic POS Tagging
  • POS tagging: for a given sentence W = w1…wn, find the matching POS tags T = t1…tn
  • In a statistical framework: T' = argmax_T P(T|W)

  11. Bayes’ Rule
  T' = argmax_T P(T|W) = argmax_T P(W|T) P(T) / P(W) = argmax_T P(W|T) P(T)  (the denominator P(W) doesn’t depend on the tags)
  P(W|T) = Π_i P(wi | w1…wi-1, T) = Π_i P(wi|ti)  (chaining rule; words are independent of each other, and a word’s identity depends only on its own tag)
  P(T) = Π_i P(ti | t1…ti-1) = Π_i P(ti|ti-1)  (chaining rule; Markovian assumptions)
  Notation: P(t1) = P(t1 | t0)
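Putting the decomposition together, a small sketch that scores one candidate tag sequence in log space, assuming emission and transition probabilities stored as dictionaries keyed by (word, tag) and (previous_tag, tag):

```python
import math

# Score of a tag sequence under the model above:
# P(T|W) ∝ P(W|T) P(T) ≈ prod_i P(wi|ti) * P(ti|ti-1), with t0 a start symbol.
def log_score(words, tags, p_emit, p_trans, start="<s>"):
    score, prev = 0.0, start
    for w, t in zip(words, tags):
        score += math.log(p_emit[(w, t)]) + math.log(p_trans[(prev, t)])
        prev = t
    return score
```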

  12. The Markovian assumptions
  • Limited horizon: P(Xi+1 = tk | X1,…,Xi) = P(Xi+1 = tk | Xi)
  • Time invariant: P(Xi+1 = tk | Xi) = P(Xj+1 = tk | Xj)

  13. Maximum Likelihood Estimation
  • To estimate P(wi|ti) and P(ti|ti-1) we can use maximum likelihood estimation (see the counting sketch below):
  • P(wi|ti) = c(wi,ti) / c(ti)
  • P(ti|ti-1) = c(ti-1,ti) / c(ti-1)
  • Notice the estimation for i = 1, which uses the start symbol t0
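A sketch of these counts over a tagged corpus, assumed to be a list of sentences of (word, tag) pairs, with a start symbol standing in for t0; the function name is illustrative:

```python
from collections import defaultdict

# Maximum likelihood estimates from a tagged corpus: relative counts,
# following P(wi|ti) = c(wi,ti)/c(ti) and P(ti|ti-1) = c(ti-1,ti)/c(ti-1).
def mle_estimates(tagged_sentences, start="<s>"):
    c_tag, c_word_tag, c_bigram = defaultdict(int), defaultdict(int), defaultdict(int)
    for sent in tagged_sentences:
        prev = start
        c_tag[start] += 1                     # start symbol handles the i = 1 case
        for word, tag in sent:
            c_tag[tag] += 1
            c_word_tag[(word, tag)] += 1
            c_bigram[(prev, tag)] += 1
            prev = tag
    p_emit = {(w, t): c / c_tag[t] for (w, t), c in c_word_tag.items()}
    p_trans = {(t1, t2): c / c_tag[t1] for (t1, t2), c in c_bigram.items()}
    return p_emit, p_trans
```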

  14. Unknown Words
  • Many words will not appear in the training corpus.
  • Unknown words are a major problem for taggers (!)
  • Solutions –
    • Incorporate morphological analysis
    • Treat words appearing only once in the training data as UNKNOWNs (see the sketch below)
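A sketch of the second solution, assuming the same corpus representation as above: words seen only once in training are mapped to a single placeholder token, whose statistics are then applied to unseen words at test time.

```python
from collections import Counter

# Treat words that occur only once in the training data as a single UNKNOWN
# token, so the model has usable statistics for unseen words at test time.
def replace_hapax(tagged_sentences, unk="<UNK>"):
    counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    return [[(w if counts[w] > 1 else unk, t) for w, t in sent]
            for sent in tagged_sentences]
```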

  15. “Add-1” / Add-Constant Smoothing
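A minimal sketch of standard add-constant (add-λ) smoothing of a conditional estimate; λ = 1 gives add-1 smoothing, and V is the number of possible outcomes being smoothed over:

```python
# Add-constant (add-lambda) smoothing of a conditional estimate:
# P(x|y) = (c(y,x) + lambda) / (c(y) + lambda * V),
# where V is the number of possible outcomes x; lambda = 1 gives add-1.
def add_lambda(count_xy, count_y, vocab_size, lam=1.0):
    return (count_xy + lam) / (count_y + lam * vocab_size)
```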

  16. Smoothing for Tagging
  • Apply smoothing to P(ti|ti-1)
  • Optionally – also to P(wi|ti)

  17. Viterbi
  • Finding the most probable tag sequence can be done with the Viterbi algorithm (sketched below).
  • There is no need to calculate every single possible tag sequence (!)
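A sketch of Viterbi decoding for the bigram tagger, using the dictionary-based estimates from the earlier sketches; the 1e-12 floor is an illustrative stand-in for proper smoothing of unseen events:

```python
import math

# Viterbi for the bigram HMM tagger: dynamic programming over positions and
# tags instead of enumerating every possible tag sequence.
def viterbi(words, tagset, p_trans, p_emit, start="<s>"):
    # best[i][t] = log-probability of the best tag sequence for words[:i+1] ending in t
    best = [{t: math.log(p_trans.get((start, t), 1e-12))
                + math.log(p_emit.get((words[0], t), 1e-12)) for t in tagset}]
    back = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tagset:
            prev_t = max(tagset,
                         key=lambda s: best[i-1][s] + math.log(p_trans.get((s, t), 1e-12)))
            best[i][t] = (best[i-1][prev_t]
                          + math.log(p_trans.get((prev_t, t), 1e-12))
                          + math.log(p_emit.get((words[i], t), 1e-12)))
            back[i][t] = prev_t
    # Follow the back-pointers from the best final tag.
    last = max(tagset, key=lambda t: best[-1][t])
    tags = [last]
    for i in range(len(words) - 1, 0, -1):
        tags.append(back[i][tags[-1]])
    return list(reversed(tags))
```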

  18. HMMs
  • Assume a state machine with:
    • Nodes that correspond to tags
    • A start and an end state
    • Arcs corresponding to transition probabilities P(ti|ti-1)
    • A set of observation likelihoods for each state, P(wi|ti)

  19. Example HMM fragment (diagram): states AT, NN, NNS, VB, VBZ, RB, with transition probabilities on the arcs (e.g. 0.6 and 0.4 out of AT) and per-state emission probabilities, e.g. for VBZ: P(likes)=0.3, P(flies)=0.1, …, P(eats)=0.5; for VB: P(like)=0.2, P(fly)=0.3, …, P(eat)=0.36; for AT: P(the)=0.4, P(a)=0.3, P(an)=0.2, …

  20. HMMs
  • An HMM is similar to an automaton augmented with probabilities.
  • Note that the states in an HMM do not correspond to the input symbols.
  • The input symbols do not uniquely determine the next state.

  21. HMM definition
  • HMM = (S, K, A, B)
  • Set of states S = {s1,…,sn}
  • Output alphabet K = {k1,…,km}
  • State transition probabilities A = {aij}, i,j ∈ S
  • Symbol emission probabilities B = b(i,k), i ∈ S, k ∈ K
  • Start and end states (non-emitting)
  • Alternatively: initial state probabilities
  • Note: for a given i, Σj aij = 1 and Σk b(i,k) = 1
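One possible container for this definition, with the normalization constraints noted in comments; the field names are assumptions:

```python
from dataclasses import dataclass

# Holds the HMM = (S, K, A, B) definition above; probabilities are stored as
# dictionaries, and for each state i the rows of A and B should each sum to 1.
@dataclass
class HMM:
    states: set            # S = {s1, ..., sn} (the tags)
    alphabet: set          # K = {k1, ..., km} (the words)
    trans: dict            # A: (s_i, s_j) -> P(s_j | s_i)
    emit: dict             # B: (s_i, k)  -> P(k | s_i)
    start: str = "<s>"     # non-emitting start state
    end: str = "</s>"      # non-emitting end state
```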

  22. Why Hidden?
  • Because we only observe the input – the underlying states are hidden.
  • Decoding: the problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W = w1,…,wn, find a state sequence T = t1,…,tn that best explains the observations.

  23. Homework
