Natural Language Processing, Lecture 5: Hidden Markov Models

Presentation Transcript


  1. Natural Language Processing, Lecture 5: Hidden Markov Models. Oren Glickman, Department of Computer Science, Bar-Ilan University. 88-680

  2. Stochastic POS Tagging • POS tagging: for a given sentence W = w1…wn, find the matching POS tags T = t1…tn • In a statistical framework: T' = argmax_T P(T|W)

  3. Bayes’ Rule • Markovian assumptions: words are independent of each other, and a word's presence depends only on its tag.
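The derivation itself was an image on the slide and did not survive the transcript; a standard reconstruction under the two assumptions above would be:

```latex
T' = \arg\max_T P(T \mid W)
   = \arg\max_T \frac{P(W \mid T)\,P(T)}{P(W)}
   = \arg\max_T P(W \mid T)\,P(T)
   \approx \arg\max_T \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})
```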

  4. The Markovian assumptions • Limited horizon: P(Xi+1 = tk | X1,…,Xi) = P(Xi+1 = tk | Xi) • Time invariant: P(Xi+1 = tk | Xi) = P(Xj+1 = tk | Xj)

  5. Maximum Likelihood Estimations • In order to estimate P(wi|ti) and P(ti|ti-1) we can use the maximum likelihood estimation: • P(wi|ti) = c(wi,ti) / c(ti) • P(ti|ti-1) = c(ti-1,ti) / c(ti-1)
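A minimal sketch of these counts and ratios in Python, assuming the training corpus is a list of sentences, each a list of (word, tag) pairs; the function and variable names are illustrative, not from the slides:

```python
from collections import defaultdict

def mle_estimates(tagged_sentences):
    """Relative-frequency (MLE) estimates for the bigram HMM tagger.

    Returns P(w|t) = c(w,t)/c(t) and P(t|t_prev) = c(t_prev,t)/c(t_prev).
    """
    tag_count = defaultdict(int)         # c(t)
    word_tag_count = defaultdict(int)    # c(w, t)
    tag_bigram_count = defaultdict(int)  # c(t_prev, t)

    for sentence in tagged_sentences:
        prev = "<s>"                     # non-emitting start pseudo-tag
        tag_count[prev] += 1
        for word, tag in sentence:
            tag_count[tag] += 1
            word_tag_count[(word, tag)] += 1
            tag_bigram_count[(prev, tag)] += 1
            prev = tag
        tag_bigram_count[(prev, "</s>")] += 1   # end-of-sentence transition

    emission = {(w, t): c / tag_count[t] for (w, t), c in word_tag_count.items()}
    transition = {(p, t): c / tag_count[p] for (p, t), c in tag_bigram_count.items()}
    return emission, transition
```

For example, `mle_estimates([[("the", "AT"), ("dog", "NN"), ("barks", "VBZ")]])` yields P(the|AT) = 1 and P(NN|AT) = 1 from the single training sentence.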

  6. Viterbi • Finding the most probable tag sequence can be done with the Viterbi algorithm. • No need to calculate every single possible tag sequence (!)

  7. HMMs • Assume a state machine with: • Nodes that correspond to tags • A start and an end state • Arcs corresponding to transition probabilities, P(ti|ti-1) • A set of observation likelihoods for each state, P(wi|ti)

  8. [Figure: an example HMM for POS tagging with states AT, NN, NNS, VB, VBZ, and RB. Only fragments of its tables survive the transcript: partial emission probabilities (P(likes)=0.3, P(flies)=0.1, P(eats)=0.5; P(like)=0.2, P(fly)=0.3, P(eat)=0.36; P(the)=0.4, P(a)=0.3, P(an)=0.2) and two transition weights, 0.6 and 0.4; the arc structure is not recoverable.]
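The surviving fragments can be read back into parameter tables. Which emission table belongs to which state is an assumption inferred from the word forms, and the rows are partial (they do not sum to 1):

```python
# Partial emission tables reconstructed from the example figure.
# State assignments are assumptions based on the word forms shown.
example_emissions = {
    "VBZ": {"likes": 0.3, "flies": 0.1, "eats": 0.5},   # third-person verb forms
    "VB":  {"like": 0.2, "fly": 0.3, "eat": 0.36},       # base verb forms
    "AT":  {"the": 0.4, "a": 0.3, "an": 0.2},            # articles
    # The NN, NNS, and RB tables, and the arcs carrying the weights
    # 0.6 and 0.4, are not recoverable from the transcript.
}
```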

  9. HMMs • An HMM is similar to an automaton augmented with probabilities. • Note that the states in an HMM do not correspond to the input symbols. • The input symbols don’t uniquely determine the next state.

  10. HMM definition • HMM = (S, K, A, B) • Set of states S = {s1,…,sn} • Output alphabet K = {k1,…,kn} • State transition probabilities A = {aij}, i,j ∈ S • Symbol emission probabilities B = b(i,k), i ∈ S, k ∈ K • Start and end states (non-emitting) • Note: for a given i, Σj aij = 1 and Σk b(i,k) = 1
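A minimal sketch of this definition as a Python data structure, assuming dict-of-dict probability tables; the class and field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class HMM:
    """Discrete HMM = (S, K, A, B), mirroring the slide's definition.

    transitions[i][j] = a_ij = P(next state j | state i)
    emissions[i][k]   = b(i, k) = P(symbol k | state i)
    The non-emitting start and end states are kept outside `states`.
    """
    states: set          # S
    alphabet: set        # K
    transitions: dict    # A
    emissions: dict      # B
    start: str = "<s>"
    end: str = "</s>"

    def check_stochastic(self, tol=1e-9):
        # For every emitting state i: sum_j a_ij = 1 and sum_k b(i,k) = 1.
        for i in self.states:
            assert abs(sum(self.transitions.get(i, {}).values()) - 1.0) < tol
            assert abs(sum(self.emissions.get(i, {}).values()) - 1.0) < tol
```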

  11. Why Hidden • Because we only observe the input - the underlying states are hidden

  12. Decoding • The problem of part-of-speech tagging can be viewed as a decoding problem: given an observation sequence W = w1,…,wn, find a state sequence T = t1,…,tn that best explains the observation.

  13. Viterbi • A dynamic programming algorithm: • For every state j in the HMM, δj(i) = the probability of the best path that leads to state j given observations o1,…,oi • For every state j in the HMM, ψj(i) = back-pointers…

  14. Viterbi… • Initialization: δj(0) = πj • Induction: δj(i+1) = maxk δk(i) · akj · b(j,oi+1) • Set ψj(i+1) accordingly • Termination: backtrace from the end state using ψ
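A compact sketch of the recursion above in Python, assuming nested-dict parameter tables. All names are illustrative; indexing starts directly at the first observation (no separate step 0), and the sketch assumes non-zero probabilities along some path (no smoothing of unseen words):

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state (tag) sequence for a list of observations (words).

    start_p[j] ~ pi_j, trans_p[k][j] ~ a_kj, emit_p[j][o] ~ b(j, o).
    """
    delta = [{}]  # delta[i][j]: probability of the best path ending in j at step i
    psi = [{}]    # psi[i][j]:   best predecessor of state j at step i

    for j in states:  # initialization
        delta[0][j] = start_p.get(j, 0.0) * emit_p.get(j, {}).get(observations[0], 0.0)
        psi[0][j] = None

    for i in range(1, len(observations)):  # induction
        delta.append({})
        psi.append({})
        for j in states:
            best_k, best_p = None, 0.0
            for k in states:
                p = delta[i - 1][k] * trans_p.get(k, {}).get(j, 0.0)
                if p > best_p:
                    best_k, best_p = k, p
            delta[i][j] = best_p * emit_p.get(j, {}).get(observations[i], 0.0)
            psi[i][j] = best_k

    # Termination: pick the best final state and backtrace using psi.
    last = max(states, key=lambda j: delta[-1][j])
    path = [last]
    for i in range(len(observations) - 1, 0, -1):
        path.append(psi[i][path[-1]])
    path.reverse()
    return path
```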

  15. A*, N-best decoding • Sometimes one wants not just the best state sequence for a given input but rather the top-n best sequences, e.g. as input for a different model. • A* / stack decoding is an alternative to Viterbi.

  16. Finding the probability of an observation • Given an HMM, how do we efficiently compute how likely a certain observation is? • Why would we want this? • For speech decoding, language modeling. • Not trivial, because the observation can result from different paths.

  17. Naïve approach
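The slide's formula did not survive the transcript; the naive computation it presumably illustrated sums the joint probability over every possible state sequence, on the order of N^n terms for N states and n observations:

```latex
P(O) = \sum_{T} P(O, T)
     = \sum_{t_1,\dots,t_n} \prod_{i=1}^{n} P(t_i \mid t_{i-1})\, P(o_i \mid t_i)
```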

  18. The forward algorithm • A dynamic programming algorithm similar to the Viterbi can be applied to efficiently calculate the probability of a given observation. • The algorithm can work forward from the beginning of the observation or backward from its end.
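A sketch of the forward direction under the same assumptions as the Viterbi sketch above; replacing the max with a sum gives the probability of the observation:

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """P(observations) under the HMM, summed over all state sequences.

    Same nested-dict table layout as the Viterbi sketch; alpha[j] holds the
    forward probability of being in state j after the observations seen so far.
    """
    alpha = {j: start_p.get(j, 0.0) * emit_p.get(j, {}).get(observations[0], 0.0)
             for j in states}
    for obs in observations[1:]:
        alpha = {j: sum(alpha[k] * trans_p.get(k, {}).get(j, 0.0) for k in states)
                    * emit_p.get(j, {}).get(obs, 0.0)
                 for j in states}
    return sum(alpha.values())
```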

  19. Up from bigrams • The POS tagging model we described used a history of just the previous tag: P(ti|t1,…,ti-1) = P(ti|ti-1), i.e. a first-order Markovian assumption. • In this case each state in the HMM corresponds to a POS tag. • One can build an HMM for POS trigrams: P(ti|t1,…,ti-1) = P(ti|ti-2,ti-1)

  20. POS Trigram HMM Model • More accurate than a bigram model • He clearly marked • is clearly marked • In such a model the HMM states do NOT correspond to POS tags. • Why not 4-grams? • Too many states, not enough data!
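One common way to realize this as an HMM (a sketch of the standard construction, not necessarily the one used in the lecture) is to let each state be a pair of consecutive tags, so an ordinary bigram transition between states encodes P(ti | ti-2, ti-1):

```python
from itertools import product

def trigram_states(tags):
    """In a trigram HMM each state is a pair (t_prev2, t_prev1) of tags."""
    return set(product(tags, repeat=2))

def trigram_transition(trans_trigram, state, next_tag):
    """Moving from state (t_i-2, t_i-1) to (t_i-1, t_i) uses P(t_i | t_i-2, t_i-1).

    trans_trigram is assumed to map (t_i-2, t_i-1, t_i) triples to probabilities.
    """
    t_prev2, t_prev1 = state
    new_state = (t_prev1, next_tag)
    return new_state, trans_trigram.get((t_prev2, t_prev1, next_tag), 0.0)
```

This also makes the slide's point concrete: the number of states is the number of tag pairs, which is why 4-gram models quickly run into data sparsity.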

  21. Question • Is HMM-based tagging a supervised algorithm? • Yes, because we need a tagged corpus to estimate the transition and emission probabilities (!) • What do we do if we don’t have an annotated corpus but: • have a dictionary • have an annotated corpus from a different domain and an un-annotated corpus in the desired domain?

  22. Baum-Welch Algorithm • Also known as the Forward-Backward Algorithm • An EM algorithm for HMMs • Maximization by iterative hill climbing • The algorithm iteratively improves the model parameters based on un-annotated training data.

  23. Baum-Welch Algorithm… • Start off with parameters based on the dictionary: • P(w|t) = 1 if t is a possible tag for w • P(w|t) = 0 otherwise • Uniform distribution on state transitions • This is enough to bootstrap from. • Could also be used to tune a system to a new domain.
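A minimal sketch of that bootstrap initialization, assuming a tag dictionary mapping each word to its set of possible tags (function and variable names are illustrative). The 0/1 emission indicators from the slide are normalized per tag so each row is a proper distribution before Baum-Welch refines it:

```python
def bootstrap_parameters(tag_dict, tags):
    """Initial HMM parameters from a tag dictionary, as on the slide.

    Emissions allow only the tags the dictionary licenses for each word;
    transitions start out uniform. Baum-Welch then re-estimates both.
    """
    # Emissions: P(w|t) > 0 only if the dictionary allows tag t for word w.
    emissions = {t: {} for t in tags}
    for word, allowed in tag_dict.items():
        for t in allowed:
            emissions[t][word] = 1.0
    for t in tags:  # normalize each tag's row into a distribution
        total = sum(emissions[t].values()) or 1.0
        emissions[t] = {w: p / total for w, p in emissions[t].items()}

    # Transitions: uniform distribution over next tags.
    transitions = {t: {t2: 1.0 / len(tags) for t2 in tags} for t in tags}
    return transitions, emissions
```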

  24. Unknown Words • Many words will not appear in the training corpus. • Unknown words are a major problem for taggers (!) • Solutions: • Incorporate morphological analysis • Consider words appearing once in the training data as UNKNOWNs
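A sketch of the second solution, reusing the (word, tag) sentence format from the MLE sketch: words seen only once in training are mapped to an UNKNOWN token (the token name is illustrative), and unseen words at tagging time get the same treatment, so the tagger has emission estimates for them:

```python
from collections import Counter

UNK = "<UNK>"

def replace_hapax(tagged_sentences):
    """Map words that occur exactly once in the training data to the UNK token."""
    counts = Counter(w for sent in tagged_sentences for w, _ in sent)
    return [
        [(w if counts[w] > 1 else UNK, t) for w, t in sent]
        for sent in tagged_sentences
    ]

def normalize_word(word, vocabulary):
    """At tagging time, unseen words are mapped to UNK as well."""
    return word if word in vocabulary else UNK
```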

  25. Completely unsupervised • What if there is no dictionary and no annotated corpus?

  26. Evaluation

  27. Homework
