
HMM (I)



Presentation Transcript


  1. HMM (I) LING 570 Fei Xia Week 7: 11/5-11/7/07

  2. HMM • Definition and properties of HMM • Two types of HMM • Three basic questions in HMM

  3. Definition of HMM

  4. Hidden Markov Models • There are n states s1, …, sn in an HMM, and the states are connected. • The output symbols are produced by the states or edges in HMM. • An observation O=(o1, …, oT) is a sequence of output symbols. • Given an observation, we want to recover the hidden state sequence. • An example: POS tagging • States are POS tags • Output symbols are words • Given an observation (i.e., a sentence), we want to discover the tag sequence.

  5. Same observation, different state sequences: "time flies like an arrow" can be tagged time/N flies/V like/P an/DT arrow/N or time/N flies/N like/V an/DT arrow/N.

  6. Two types of HMMs • State-emission HMM (Moore machine): the output symbol is produced by the states: • by the from-state, or • by the to-state • Arc-emission HMM (Mealy machine): the output symbol is produced by the edges, i.e., by the (from-state, to-state) pairs.

  7. PFA recap

  8. Formal definition of PFA A PFA is a tuple (Q, Σ, δ, I, F, P): • Q: a finite set of N states • Σ: a finite set of input symbols • δ ⊆ Q × Σ × Q: the transition relation between states • I: Q → R+ (initial-state probabilities) • F: Q → R+ (final-state probabilities) • P: δ → R+ (transition probabilities)

  9. Constraints on the functions: Σ over q in Q of I(q) = 1, and for every state q: F(q) + Σ over (a, q') of P(q, a, q') = 1. Probability of a string x1…xn: P(x1…xn) = Σ over state sequences q0 q1 … qn of I(q0) · Π over t of P(qt-1, xt, qt) · F(qn).

  10. An example of a PFA: two states q0 and q1, an arc q0 → q1 labeled a:1.0, and a self-loop on q1 labeled b:0.8. I(q0)=1.0, I(q1)=0.0, F(q0)=0, F(q1)=0.2. P(ab^n) = I(q0) * P(q0, ab^n, q1) * F(q1) = 1.0 * 1.0 * 0.8^n * 0.2
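As a sanity check, here is a minimal Python sketch (mine, not from the slides) that evaluates the formula above for this two-state PFA; the dictionary-based encoding and the function name prob_ab_n are illustrative assumptions.

def prob_ab_n(n):
    # P(a b^n) in the example PFA: start in q0, read 'a' to q1, loop on 'b' n times.
    I = {"q0": 1.0, "q1": 0.0}                    # initial-state probabilities
    F = {"q0": 0.0, "q1": 0.2}                    # final-state probabilities
    P = {("q0", "a", "q1"): 1.0,                  # transition probabilities
         ("q1", "b", "q1"): 0.8}
    prob = I["q0"] * P[("q0", "a", "q1")]
    for _ in range(n):
        prob *= P[("q1", "b", "q1")]
    return prob * F["q1"]

# prob_ab_n(3) == 1.0 * 1.0 * 0.8**3 * 0.2  (i.e., about 0.1024)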

  11. Arc-emission HMM

  12. Definition of arc-emission HMM • An HMM is a tuple (S, Σ, π, A, B): • A set of states S={s1, s2, …, sN} • A set of output symbols Σ={w1, …, wM} • Initial state probabilities π={πi} • Transition prob: A={aij} • Emission prob: B={bijk}, where bijk = P(wk | si, sj)

  13. Constraints in an arc-emission HMM: Σi πi = 1; for each state si: Σj aij = 1; for each state pair (si, sj): Σk bijk = 1. For any integer n and any HMM μ, the probabilities of all output sequences of length n sum to 1.

  14. An example: an arc-emission HMM structure (states s1, s2, …, sN with output symbols w1, w2, … emitted on the arcs). Same kinds of parameters, but the emission probabilities depend on both states: P(wk | si, sj) ⇒ # of parameters: O(N²M + N²).

  15. A path in an arc-emission HMM: states X1 → X2 → … → Xn → Xn+1, with output symbols o1, o2, …, on emitted on the arcs. • State sequence: X1,n+1 • Output sequence: O1,n

  16. PFA vs. arc-emission HMM A PFA is a tuple (Q, Σ, δ, I, F, P): • Q: a finite set of N states • Σ: a finite set of input symbols • δ ⊆ Q × Σ × Q: the transition relation between states • I: Q → R+ (initial-state probabilities) • F: Q → R+ (final-state probabilities) • P: δ → R+ (transition probabilities) An HMM is a tuple (S, Σ, π, A, B): • A set of states S={s1, s2, …, sN} • A set of output symbols Σ={w1, …, wM} • Initial state probabilities π={πi} • Transition prob: A={aij} • Emission prob: B={bijk}

  17. State-emission HMM

  18. Definition of state-emission HMM • An HMM is a tuple (S, Σ, π, A, B): • A set of states S={s1, s2, …, sN} • A set of output symbols Σ={w1, …, wM} • Initial state probabilities π={πi} • Transition prob: A={aij} • Emission prob: B={bjk}, where bjk = P(wk | sj) • We use si and wk to refer to what is in the HMM structure; we use Xi and Oi to refer to what is in a particular HMM path and its output.

  19. Constraints in a state-emission HMM: Σi πi = 1; for each state si: Σj aij = 1; for each state sj: Σk bjk = 1. For any integer n and any HMM μ, the probabilities of all output sequences of length n sum to 1.

  20. An example: the state-emission HMM structure (states s1, s2, …, sN, each emitting output symbols w1, w2, …) • Two kinds of parameters: • Transition probability: P(sj | si) • Emission probability: P(wk | si) • ⇒ # of parameters: O(NM + N²)

  21. Output symbols are generated by the from-states: states X1 → X2 → … → Xn, emitting o1, o2, …, on • State sequence: X1,n • Output sequence: O1,n

  22. Output symbols are generated by the to-states: states X1 → X2 → … → Xn+1, emitting o1, o2, …, on (the first state emits nothing) • State sequence: X1,n+1 • Output sequence: O1,n

  23. A path in a state-emission HMM • Output symbols produced by the from-states: X1 → X2 → … → Xn, emitting o1, …, on • Output symbols produced by the to-states: X1 → X2 → … → Xn+1, emitting o1, …, on

  24. Arc-emission vs. state-emission: in the arc-emission HMM the symbols o1, …, on are produced by the arcs between X1, …, Xn+1; in the state-emission HMM they are produced by the states themselves.

  25. Properties of HMM • Markov assumption (limited horizon): P(Xt+1 = sj | X1, …, Xt) = P(Xt+1 = sj | Xt) • Stationary distribution (time invariance): the probabilities do not change over time: P(Xt+1 = sj | Xt = si) is the same for every t • The states are hidden because we know the structure of the machine (i.e., S and Σ), but we don’t know which state sequences generate a particular output.

  26. Are the two types of HMMs equivalent? • For each state-emission HMM1, there is an arc-emission HMM2, such that for any sequence O, P(O|HMM1)=P(O|HMM2). • The reverse is also true. • How to prove that?

  27. Applications of HMM • N-gram POS tagging • Bigram tagger: oi is a word, and si is a POS tag. • Other tagging problems: • Word segmentation • Chunking • NE tagging • Punctuation prediction • … • Other applications: ASR, …

  28. Three HMM questions

  29. Three fundamental questions for HMMs • Training an HMM: given a set of observation sequences, learn its distribution, i.e. learn the transition and emission probabilities • HMM as a parser: Finding the best state sequence for a given observation • HMM as an LM: compute the probability of a given observation

  30. Training an HMM: estimating the probabilities • Supervised learning: • The state sequences in the training data are known • ML estimation • Unsupervised learning: • The state sequences in the training data are unknown • forward-backward algorithm
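For the supervised case, a minimal sketch of ML estimation by relative-frequency counting (my own illustration, not the course's required code; it assumes the training data is given as a list of (state, symbol) sequences for a state-emission HMM):

from collections import defaultdict

def mle_train(tagged_sequences):
    # tagged_sequences: list of training sequences, each a list of (state, symbol)
    # pairs, e.g. [[("N", "time"), ("V", "flies"), ...], ...]
    init_count = defaultdict(float)
    trans_count = defaultdict(float)
    emit_count = defaultdict(float)
    for seq in tagged_sequences:
        init_count[seq[0][0]] += 1
        for (s1, _), (s2, _) in zip(seq, seq[1:]):
            trans_count[(s1, s2)] += 1
        for s, w in seq:
            emit_count[(s, w)] += 1
    # relative-frequency (ML) estimates
    pi = {s: c / len(tagged_sequences) for s, c in init_count.items()}
    from_total = defaultdict(float)
    for (s1, _), c in trans_count.items():
        from_total[s1] += c
    A = {(s1, s2): c / from_total[s1] for (s1, s2), c in trans_count.items()}
    state_total = defaultdict(float)
    for (s, _), c in emit_count.items():
        state_total[s] += c
    B = {(s, w): c / state_total[s] for (s, w), c in emit_count.items()}
    return pi, A, B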

  31. HMM as a parser

  32. oT o1 o2 XT+1 … XT X1 X2 HMM as a parser: Finding the best state sequence • Given the observation O1,T=o1…oT, find the state sequence X1,T+1=X1 … XT+1 that maximizes P(X1,T+1 | O1,T).  Viterbi algorithm

  33. “time flies like an arrow” \emission N time 0.1, V time 0.1, N flies 0.1, V flies 0.2, V like 0.2, P like 0.1, DT an 0.3, N arrow 0.1 \init BOS 1.0 \transition BOS N 0.5, BOS DT 0.4, BOS V 0.1, DT N 1.0, N N 0.2, N V 0.7, N P 0.1, V DT 0.4, V N 0.4, V P 0.1, V V 0.1, P DT 0.6, P N 0.4
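A sketch of how such a specification might be read into dictionaries. The section names \init, \transition, \emission and the whitespace-separated entries are taken from the slide above; the one-entry-per-line file layout, the function name, and the filename in the usage comment are my own assumptions, and HW7's actual input format may differ.

def read_hmm(lines):
    # lines: the HMM specification, one entry per line, with section headers
    # such as "\init", "\transition", "\emission".
    pi, A, B = {}, {}, {}
    section = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("\\"):
            section = line
        elif section == "\\init":
            state, p = line.split()
            pi[state] = float(p)
        elif section == "\\transition":
            s1, s2, p = line.split()
            A[(s1, s2)] = float(p)
        elif section == "\\emission":
            s, w, p = line.split()
            B[(s, w)] = float(p)
    return pi, A, B

# e.g. pi, A, B = read_hmm(open("hmm_file").read().splitlines())   # hypothetical filename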

  34. Finding all the paths: build the trellis for "time flies like an arrow", with a column of candidate states {N, V, P, DT} for each word and BOS at the start.

  35. Finding all the paths (cont): the completed trellis over "time flies like an arrow".

  36. Viterbi algorithm The probability of the best path that produces O1,t-1 while ending up in state sj: δj(t) = max over X1,t-1 of P(X1,t-1, O1,t-1, Xt = sj) Initialization: δj(1) = πj Induction: δj(t+1) = maxi δi(t) · aij · bjk, where ot = wk ⇒ Modify it to allow ε-emission

  37. Proof of the recursive function

  38. Viterbi algorithm: calculating δj(t)
# N is the number of states in the HMM structure
# observ is the observation O, and leng is the length of observ
Initialize viterbi[0..leng][0..N-1] to 0
for each state j
    viterbi[0][j] = π[j]
    back-pointer[0][j] = -1    # dummy
for (t=0; t<leng; t++)
    for (j=0; j<N; j++)
        k = observ[t]          # the symbol at time t
        viterbi[t+1][j] = maxi viterbi[t][i] aij bjk
        back-pointer[t+1][j] = arg maxi viterbi[t][i] aij bjk

  39. Viterbi algorithm: retrieving the best path
# find the best path
best_final_state = arg maxj viterbi[leng][j]
# start with the last state in the sequence
j = best_final_state
push(arr, j)
for (t=leng; t>0; t--)
    i = back-pointer[t][j]
    push(arr, i)
    j = i
return reverse(arr)
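A runnable Python version of the two fragments above, as a sketch only: it keys the probabilities by state names in dicts instead of integer indices, handles no ε-emission, and does not use log-probs; the function and variable names are mine.

def viterbi_decode(observ, states, pi, A, B):
    # observ: list of output symbols; pi[s], A[(i, j)], B[(j, w)]: dicts of probs.
    # viterbi[t][j]: prob of the best path producing observ[0..t-1] and ending in j.
    leng = len(observ)
    viterbi = [dict() for _ in range(leng + 1)]
    backptr = [dict() for _ in range(leng + 1)]
    for j in states:
        viterbi[0][j] = pi.get(j, 0.0)
        backptr[0][j] = None                                   # dummy
    for t in range(leng):
        w = observ[t]                                          # the symbol at time t
        for j in states:
            best_i, best_p = None, 0.0
            for i in states:
                p = viterbi[t][i] * A.get((i, j), 0.0) * B.get((j, w), 0.0)
                if p > best_p:
                    best_i, best_p = i, p
            viterbi[t + 1][j] = best_p
            backptr[t + 1][j] = best_i
    # retrieve the best path by following the back-pointers
    # (assumes the best path has nonzero probability)
    best_final = max(states, key=lambda j: viterbi[leng][j])
    path = [best_final]
    j = best_final
    for t in range(leng, 0, -1):
        j = backptr[t][j]
        path.append(j)
    return list(reversed(path)), viterbi[leng][best_final]

# usage sketch, with the slide-33 model loaded into pi, A, B:
# viterbi_decode("time flies like an arrow".split(), ["BOS", "N", "V", "P", "DT"], pi, A, B)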

  40. Hw7 and Hw8 • Hw7: write an HMM “class”: • Read HMM input file • Output HMM • Hw8: implement the algorithms for two HMM tasks: • HMM as parser: Viterbi algorithm • HMM as LM: the prob of an observation

  41. Implementation issue: storing HMM Approach #1: • πi: pi {state_str} • aij: a {from_state_str} {to_state_str} • bjk: b {state_str} {symbol} Approach #2: • state2idx{state_str} = state_idx • symbol2idx{symbol_str} = symbol_idx • πi: pi [state_idx] = prob • aij: a [from_state_idx] [to_state_idx] = prob • bjk: b [state_idx] [symbol_idx] = prob • idx2state[state_idx] = state_str • idx2symbol[symbol_idx] = symbol_str

  42. Storing HMM: sparse matrix Full matrix: • aij: a[i][j] = prob • bjk: b[j][k] = prob Sparse (store only the nonzero entries): • aij: a[i] = “j1 p1 j2 p2 …” or a[j] = “i1 p1 i2 p2 …” • bjk: b[j] = “k1 p1 k2 p2 …” or b[k] = “j1 p1 j2 p2 …”
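One way to realize the sparse row-major idea (a[i] = "j1 p1 j2 p2 …") in Python is a dict of dicts rather than a packed string; this is my own variation on the slide's scheme, not the required implementation.

from collections import defaultdict

# Sparse transition matrix: row i stores only the to-states j with nonzero prob.
a = defaultdict(dict)
a["BOS"]["N"] = 0.5
a["BOS"]["DT"] = 0.4
a["BOS"]["V"] = 0.1

for j, p in a["BOS"].items():          # row-major access: only nonzero entries
    print(j, p)

p = a["N"].get("V", 0.0)               # missing entries count as probability 0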

  43. Other implementation issues • Indices start from 0 in programs, but often start from 1 in algorithm descriptions • The sum of log probs is used in practice to replace the product of probs • Check the constraints and print out a warning if they are not met
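A small sketch of the last two points (summing log-probs and checking that a distribution sums to one); the helper name and the 1e-6 tolerance are arbitrary choices of mine.

import math

# Sum of log probs replaces the product of probs (avoids floating-point underflow).
probs = [0.5, 0.1, 0.7, 0.2]
logprob = sum(math.log(p) for p in probs)     # log P(path); compare in log space

# Constraint check: each distribution (e.g. a row of A) should sum to 1.
def check_sums_to_one(dist, name, tol=1e-6):
    total = sum(dist.values())
    if abs(total - 1.0) > tol:
        print(f"warning: probabilities for {name} sum to {total}, not 1")

check_sums_to_one({"N": 0.5, "DT": 0.4, "V": 0.1}, "BOS")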

  44. HMM as LM

  45. HMM as an LM: computing P(o1, …, oT) 1st try: - enumerate all possible paths - add the probabilities of all paths
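The 1st try can be written directly, and it shows why it is only a first try: there are N^(T+1) state sequences, so the cost is exponential in the length of the observation. A sketch (mine, matching the to-state-emission formulation of the Viterbi sketch above):

from itertools import product

def brute_force_prob(observ, states, pi, A, B):
    # Enumerate every state sequence X0 X1 ... XT (N**(T+1) of them) and add up
    # pi(X0) * prod over t of A(X_{t-1}, X_t) * B(X_t, o_t).  Exponential in T.
    T = len(observ)
    total = 0.0
    for path in product(states, repeat=T + 1):
        p = pi.get(path[0], 0.0)
        for t in range(T):
            p *= A.get((path[t], path[t + 1]), 0.0) * B.get((path[t + 1], observ[t]), 0.0)
        total += p
    return total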

  46. Forward probabilities • Forward probability: the probability of producing O1,t-1 while ending up in state si: αi(t) = P(O1,t-1, Xt = si)

  47. Calculating forward probability Initialization: αj(1) = πj Induction: αj(t+1) = Σi αi(t) · aij · bjk, where ot = wk Total: P(O1,T) = Σi αi(T+1)
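A sketch of the same computation in Python, mirroring the Viterbi sketch above with max replaced by sum; the dict-based model format and the names are mine.

def forward_prob(observ, states, pi, A, B):
    # alpha[t][j]: probability of producing observ[0..t-1] and ending up in state j.
    leng = len(observ)
    alpha = [dict() for _ in range(leng + 1)]
    for j in states:
        alpha[0][j] = pi.get(j, 0.0)                           # initialization
    for t in range(leng):
        w = observ[t]
        for j in states:                                       # induction
            alpha[t + 1][j] = sum(alpha[t][i] * A.get((i, j), 0.0) * B.get((j, w), 0.0)
                                  for i in states)
    return sum(alpha[leng].values())                           # P(observ)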

  48. Summary • Definition: hidden states, output symbols • Properties: Markov assumption • Applications: POS-tagging, etc. • Three basic questions in HMM • Find the probability of an observation: forward probability • Find the best sequence: Viterbi algorithm • Estimate probability: MLE • Bigram POS tagger: decoding with Viterbi algorithm
