
Three Basic Problems



  1. Three Basic Problems • Compute the probability of a text: Pm(W1,N) • Compute maximum probability tag sequence: arg maxT1,N Pm(T1,N | W1,N) • Compute maximum likelihood model arg maxm Pm(W1,N)
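
Problem 2 (the maximum probability tag sequence) is standardly solved with the Viterbi algorithm, which the slides do not spell out. Below is a minimal sketch, assuming the transition matrix a, emission matrix b, and an initial tag distribution pi (the notation of the next slide), with tags and words coded as integers; note that maximizing P(T, W) is equivalent to maximizing P(T | W).

```python
import numpy as np

def viterbi(pi, a, b, w):
    """arg max over tag sequences T of P(T, W | m); tags/words are integer indices."""
    N, T = len(w), len(pi)
    delta = np.zeros((N, T))            # best path probability ending in tag i at position k
    back = np.zeros((N, T), dtype=int)  # backpointers
    delta[0] = pi * b[:, w[0]]
    for k in range(1, N):
        scores = delta[k - 1][:, None] * a * b[:, w[k]][None, :]
        back[k] = scores.argmax(axis=0)
        delta[k] = scores.max(axis=0)
    tags = [int(delta[N - 1].argmax())]
    for k in range(N - 1, 0, -1):       # follow backpointers from the end
        tags.append(int(back[k][tags[-1]]))
    return tags[::-1]
```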

  2. Notation • aij = Estimate of P(ti → tj) • bjk = Estimate of P(wk | tj) • Ak(i) = P(w1,k-1, tk=ti) (from the Forward algorithm) • Bk(i) = P(wk+1,N | tk=ti) (from the Backward algorithm)
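
A minimal sketch of the Forward and Backward computations under exactly these conventions (Ak(i) covers words 1..k-1, Bk(i) covers words k+1..N). The arrays pi, a, b and the integer-coded word list w are hypothetical stand-ins; the last function answers problem 1 from slide 1.

```python
import numpy as np

def forward(pi, a, b, w):
    """A[k, i] = P(w_1..w_{k-1}, t_k = i), with 0-based positions."""
    N, T = len(w), len(pi)
    A = np.zeros((N, T))
    A[0] = pi                                   # no words emitted before the first position
    for k in range(1, N):
        A[k] = (A[k - 1] * b[:, w[k - 1]]) @ a  # emit w[k-1], then take a transition
    return A

def backward(a, b, w):
    """B[k, i] = P(w_{k+1}..w_N | t_k = i), with 0-based positions."""
    N, T = len(w), a.shape[0]
    B = np.zeros((N, T))
    B[N - 1] = 1.0                              # nothing left to emit after the last position
    for k in range(N - 2, -1, -1):
        B[k] = a @ (b[:, w[k + 1]] * B[k + 1])
    return B

def text_probability(pi, a, b, w):
    """P(W | m): the answer to problem 1 on slide 1."""
    A, B = forward(pi, a, b, w), backward(a, b, w)
    k = 0                                       # any position k gives the same value
    return float(np.sum(A[k] * b[:, w[k]] * B[k]))
```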

  3. EM Algorithm (Expectation-Maximization) • Start with some initial model • Compute the most likely states for each output symbol from the current model • Use this tagging to revise the model, increasing the probability of the most likely transitions and outputs • Repeat until convergence Note: No labeled training required!

  4. Estimating transition probabilities Define pk(i,j) as the probability of traversing the arc ti → tj at time k given the observations: pk(i,j) = P(tk = ti, tk+1 = tj | W, m) = P(tk = ti, tk+1 = tj, W | m) / P(W | m) = Ak(i) P(wk | ti) aij P(wk+1 | tj) Bk+1(j) / P(W | m), where P(W | m) = Σi Ak(i) P(wk | ti) Bk(i)
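
A sketch of pk(i,j) in code, reusing the hypothetical A, B, a, b arrays from the forward/backward sketch above (positions are 0-based).

```python
import numpy as np

def arc_probabilities(A, B, a, b, w, k):
    """p_k(i, j) = P(t_k = i, t_{k+1} = j | W, m) for one position k (0-based)."""
    # joint[i, j] = A_k(i) * P(w_k | t_i) * a_ij * P(w_{k+1} | t_j) * B_{k+1}(j)
    joint = (A[k] * b[:, w[k]])[:, None] * a * (b[:, w[k + 1]] * B[k + 1])[None, :]
    return joint / joint.sum()          # the sum over all (i, j) is P(W | m)
```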

  5. Expected transitions • Define gi(k) = P(tk = ti | W, m), then: gi(k) = Σj pk(i,j) • Now note that: • Expected number of transitions from tag i = Σk gi(k) • Expected transitions from tag i to tag j = Σk pk(i,j) (sums over k = 1..N-1)
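
The same quantities in code, again under the hypothetical arrays above (with 0-based positions the transition sums run over k = 0..N-2).

```python
import numpy as np

def expected_counts(A, B, a, b, w):
    """g_i(k) plus the two expected-count sums from the slide."""
    N, T = len(w), len(a)
    p = np.zeros((N - 1, T, T))
    for k in range(N - 1):
        joint = (A[k] * b[:, w[k]])[:, None] * a * (b[:, w[k + 1]] * B[k + 1])[None, :]
        p[k] = joint / joint.sum()
    g = p.sum(axis=2)              # g_i(k) = sum over j of p_k(i, j)
    from_i = g.sum(axis=0)         # expected transitions out of each tag i
    from_i_to_j = p.sum(axis=0)    # expected transitions from tag i to tag j
    return g, from_i, from_i_to_j
```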

  6. Reestimation • a’ij = (expected transitions from tag i to tag j) / (expected transitions from tag i) = Σk pk(i,j) / Σk gi(k) • b’jk = (expected number of times tag j emits word wk) / (expected number of occurrences of tag j) = Σn: wn = wk gj(n) / Σn gj(n)
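
A sketch of one full reestimation (M) step under the same hypothetical arrays; it also returns g(1), the revised initial-tag distribution that appears in the model on the next slide.

```python
import numpy as np

def reestimate(A, B, a, b, w):
    """One M-step: a'_ij, b'_jk, and the new initial distribution, from the expected counts."""
    N, T = len(w), len(a)
    V = b.shape[1]
    PW = float(np.sum(A[0] * b[:, w[0]] * B[0]))      # P(W | m)
    g = A * b[:, w].T * B / PW                        # g[k, i] = P(t_k = i | W, m)
    trans = np.zeros((T, T))                          # expected transition counts (slide 5)
    for k in range(N - 1):
        trans += (A[k] * b[:, w[k]])[:, None] * a * (b[:, w[k + 1]] * B[k + 1])[None, :] / PW
    a_new = trans / trans.sum(axis=1, keepdims=True)  # a'_ij
    emit = np.zeros((T, V))
    for k in range(N):
        emit[:, w[k]] += g[k]                         # expected emissions of word w_k by each tag
    b_new = emit / emit.sum(axis=1, keepdims=True)    # b'_jk
    pi_new = g[0]                                     # g'(1): revised initial-tag distribution
    return pi_new, a_new, b_new
```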

  7. EM Algorithm Outline • Choose initial model = <a,b,g(1)> • Repeat until results don’t improve much: • Compute pk(i,j) based on the current model, using the Forward & Backward algorithms to compute A and B (Estimation) • Compute new model <a’,b’,g’(1)> (Maximization) Note: Only guarantees a local maximum!
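
Tying the earlier sketches together; the fixed iteration count here stands in for “until results don’t improve much” (a fuller version would track P(W | m) across iterations). It relies on the forward, backward, and reestimate sketches above.

```python
def em_tagger(pi, a, b, w, iterations=20):
    """Sketch of the outline: alternate Forward/Backward (E-step) and reestimation (M-step)."""
    for _ in range(iterations):
        A = forward(pi, a, b, w)                  # Estimation
        B = backward(a, b, w)
        pi, a, b = reestimate(A, B, a, b, w)      # Maximization
    return pi, a, b
```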

  8. Example • Tags: a, b • Words: x, y, z • z can only be tagged b • Text: x y z z y
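
One possible way to run the EM sketch on this toy data; the starting numbers are arbitrary, and the single zero entry encodes “z can only be tagged b” (a zero emission stays zero under EM).

```python
import numpy as np

# Tags: 0 = a, 1 = b;  Words: 0 = x, 1 = y, 2 = z
w = [0, 1, 2, 2, 1]                      # the text "x y z z y"
pi = np.array([0.5, 0.5])
a = np.array([[0.6, 0.4],
              [0.5, 0.5]])
b = np.array([[0.4, 0.6, 0.0],           # tag a never emits z ...
              [0.3, 0.3, 0.4]])          # ... so z can only be tagged b
pi, a, b = em_tagger(pi, a, b, w, iterations=10)
print(np.round(b, 3))                    # emission probabilities after EM
```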

  9. Some extensions for HMM POS tagging • Higher-order models: P(ti1,…,tin → tj) • Incorporating text features: • Output prob = P(wi, fj | tk) where f is a vector of features (capitalized, ends in –d, etc.) • Combining labeled and unlabeled training (initialize with labeled then do EM)

  10. Transformational Tagging • Introduced by Brill (1995) • Tagger: • Construct initial tag sequence for input • Iteratively refine tag sequence by applying “transformation rules” in rank order • Learner: • Construct initial tag sequence • Loop until done: • Try all possible rules, apply the best rule r* to the sequence and add it to the rule ranking

  11. [Diagram: unannotated input text passes through the initial-state annotator to produce annotated text; the learning algorithm compares this annotation with the ground truth for the input text and outputs an ordered list of rules]

  12. Learning Algorithm • May assign tag X to word w only if: • w occurred in the corpus with tag X, or • w did not occur in the corpus at all • Try to find the best transformation from some tag X to some other tag Y • Greedy algorithm: at each step, choose the rule that maximizes accuracy on the training set

  13. Transformation Template Change tag A to tag B when: • The preceding (following) tag is Z • The tag two before (after) is Z • One of the two previous (following) tags is Z • One of the three previous (following) tags is Z • The preceding tag is Z and the following is W • The preceding (following) tag is Z and the tag two before (after) is W

  14. Initial tag annotation
  while transformations can be found, do:
    for each from_tag, do:
      for each to_tag, do:
        for pos = 1 to corpus_size, do:
          if (correct_tag(pos) = to_tag && tag(pos) = from_tag) then num_good_trans(tag(pos – 1))++
          else if (correct_tag(pos) = from_tag && tag(pos) = from_tag) then num_bad_trans(tag(pos – 1))++
        find maxT (num_good_trans(T) – num_bad_trans(T))
        if this is the best score so far, store as best rule: Change from_tag to to_tag if previous tag is T
    Apply best rule to training corpus
    Append best rule to ordered list of transformations
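
A runnable rendering of this pseudocode, restricted (as above) to the previous-tag template. The names correct, tags, and tagset are hypothetical: correct holds the ground-truth tags, tags the current (initially baseline) tagging, and tagset the tag inventory; max_rules is an added safety cap.

```python
from collections import Counter

def learn_transformations(correct, tags, tagset, max_rules=20):
    """Greedy learner for rules 'change from_tag to to_tag if the previous tag is T'."""
    rules = []
    while len(rules) < max_rules:
        best = None                                    # (score, from_tag, to_tag, prev_tag)
        for from_tag in tagset:
            for to_tag in tagset:
                if from_tag == to_tag:
                    continue
                good, bad = Counter(), Counter()
                for pos in range(1, len(tags)):
                    if tags[pos] == from_tag:
                        if correct[pos] == to_tag:
                            good[tags[pos - 1]] += 1   # changing here would fix an error
                        elif correct[pos] == from_tag:
                            bad[tags[pos - 1]] += 1    # changing here would introduce one
                for prev in set(good) | set(bad):
                    score = good[prev] - bad[prev]
                    if best is None or score > best[0]:
                        best = (score, from_tag, to_tag, prev)
        if best is None or best[0] <= 0:
            break                                      # no transformation improves accuracy
        score, from_tag, to_tag, prev = best
        for pos in range(1, len(tags)):                # apply the best rule to the corpus
            if tags[pos] == from_tag and tags[pos - 1] == prev:
                tags[pos] = to_tag
        rules.append((from_tag, to_tag, prev))         # append to the ordered rule list
    return rules
```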

  15. Some examples 1. Change NN to VB if previous is TO • to/TO conflict/NN with → VB 2. Change VBP to VB if MD in previous three • might/MD vanish/VBP → VB 3. Change NN to VB if MD in previous two • might/MD reply/NN → VB 4. Change VB to NN if DT in previous two • might/MD the/DT reply/VB → NN
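
A small sketch of the tagger side: applying ranked rules of the (from_tag, to_tag, prev_tag) form produced by the learner sketch above, illustrated on example 1 (the IN tag for “with” is assumed for illustration).

```python
def apply_rules(tags, rules):
    """Apply learned transformations in rank order (previous-tag template only)."""
    for from_tag, to_tag, prev in rules:
        for pos in range(1, len(tags)):
            if tags[pos] == from_tag and tags[pos - 1] == prev:
                tags[pos] = to_tag
    return tags

# Example 1 above: "to/TO conflict/NN with" and the rule NN -> VB after TO
print(apply_rules(["TO", "NN", "IN"], [("NN", "VB", "TO")]))   # ['TO', 'VB', 'IN']
```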

  16. Lexicalization New templates to include dependency on surrounding words (not just tags): Change tag A to tag B when: • The preceding (following) word is w • The word two before (after) is w • One of the two preceding (following) words is w • The current word is w • The current word is w and the preceding (following) word is v • The current word is w and the preceding (following) tag is X • etc…

  17. Initializing Unseen Words • How to choose the most likely tag for unseen words? Transformation-based approach: • Start with NP for capitalized words, NN for others • Learn transformations from: Change tag from X to Y if: • Deleting prefix (suffix) x results in a known word • The first (last) characters of the word are x • Adding x as a prefix (suffix) results in a known word • Word W ever appears immediately before (after) the word • Character Z appears in the word
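
A small sketch of the initial guess and of one of the affix templates; the lexicon argument (the set of known words) is a hypothetical stand-in.

```python
def initial_unknown_tag(word):
    """Initial guess for an unseen word, as on the slide: NP if capitalized, else NN."""
    return "NP" if word[:1].isupper() else "NN"

def deletable_suffixes(word, lexicon, max_len=4):
    """Suffixes x such that deleting x leaves a known word (one of the templates above)."""
    return [word[-n:] for n in range(1, max_len + 1)
            if len(word) > n and word[:-n] in lexicon]

print(initial_unknown_tag("Jerusalem"), initial_unknown_tag("conflict"))  # NP NN
print(deletable_suffixes("walked", {"walk", "talk"}))                     # ['ed']
```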

  18. Morphological Richness • Parts of speech really include features: • NN2 → Noun(type=common, num=plural) This is more visible in other languages with richer morphology: • Hebrew nouns: number, gender, possession • German nouns: number, gender, case, ??? • And so on…
