
Hidden Markov Models


Presentation Transcript


  1. Hidden Markov Models, Part 2 (CBB 231 / COMPSCI 261)

  2. Recall: Training Unambiguous Models When the training sequences come with state labels, the parameters are simply normalized counts: • transitions: Pt(qj|qi) = Ai,j / Σj' Ai,j' • emissions: Pe(sk|qi) = Ei,k / Σk' Ei,k' where Ai,j is the number of observed transitions from qi to qj and Ei,k is the number of times state qi emits symbol sk in the labeled training data.
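
As a concrete illustration of these count-normalization formulas, here is a minimal sketch; the numpy representation, the function name, and the optional pseudocount are assumptions of mine rather than part of the course materials.

    import numpy as np

    def normalize_counts(A_counts, E_counts, pseudocount=0.0):
        # Turn transition counts A[i, j] and emission counts E[i, s] into
        # probabilities by normalizing each row, as in labeled-sequence training.
        # The pseudocount (my addition, not from the slides) avoids zero probabilities.
        A = np.asarray(A_counts, dtype=float) + pseudocount
        E = np.asarray(E_counts, dtype=float) + pseudocount
        A_row = A.sum(axis=1, keepdims=True)
        E_row = E.sum(axis=1, keepdims=True)
        trans = np.divide(A, A_row, out=np.zeros_like(A), where=A_row > 0)  # Pt(qj|qi)
        emit = np.divide(E, E_row, out=np.zeros_like(E), where=E_row > 0)   # Pe(s|qi)
        return trans, emit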

  3. Training Ambiguous Models • The Problem: We have training sequences, but not the associated paths (state labels). • Two Solutions: • 1. Viterbi Training: Start with random HMM parameters. Use Viterbi to find the most probable path for each training sequence, and then label the sequence with that path. Use labeled sequence training on the resulting set of sequences and paths. • 2. Baum-Welch Training: Sum over all possible paths (rather than the single most probable one) to estimate expected counts Ai,j and Ei,k; then apply the same normalized-count formulas as for labeled sequence training (slide 2) to these expected counts.

  4. Viterbi Training [Flowchart] Starting from an initial submodel M, repeat n times: iterate over the training sequences; for each one, the Viterbi Decoder finds the most probable path and the Sequence Labeler labels the sequence with that path; the Labeled Sequence Trainer then turns the labeled features and paths into a new submodel M. The last iteration yields the final submodel M*.
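
A minimal sketch of this training loop, under simplified conventions of my own: symbols are integer-encoded, there is no silent start/end state (the decoder assumes a uniform initial distribution unless one is given), and normalize_counts is the helper sketched under slide 2. It is illustrative rather than the course's reference implementation.

    import numpy as np

    def viterbi_decode(trans, emit, seq, init=None):
        # Most probable state path (log-space Viterbi), recapping part 1.
        # trans[i, j] = Pt(qj|qi); emit[i, s] = Pe(s|qi); seq = integer-encoded symbols.
        n = trans.shape[0]
        init = np.full(n, 1.0 / n) if init is None else init   # assumed uniform start
        logv = np.log(init + 1e-300) + np.log(emit[:, seq[0]] + 1e-300)
        back = []
        for sym in seq[1:]:
            scores = logv[:, None] + np.log(trans + 1e-300) + np.log(emit[:, sym] + 1e-300)[None, :]
            back.append(scores.argmax(axis=0))      # best predecessor for each state
            logv = scores.max(axis=0)
        path = [int(logv.argmax())]
        for bp in reversed(back):                   # trace back the best path
            path.append(int(bp[path[-1]]))
        return path[::-1]

    def count_from_labels(path, seq, n_states, n_symbols):
        # Transition and emission counts from one (path, sequence) pair.
        A = np.zeros((n_states, n_states))
        E = np.zeros((n_states, n_symbols))
        for k, (state, sym) in enumerate(zip(path, seq)):
            E[state, sym] += 1
            if k + 1 < len(path):
                A[state, path[k + 1]] += 1
        return A, E

    def viterbi_train(seqs, trans, emit, n_iter=10):
        # Label each sequence with its most probable path, re-estimate, repeat.
        n_states, n_symbols = emit.shape
        for _ in range(n_iter):
            A = np.zeros((n_states, n_states))
            E = np.zeros((n_states, n_symbols))
            for seq in seqs:
                path = viterbi_decode(trans, emit, seq)
                dA, dE = count_from_labels(path, seq, n_states, n_symbols)
                A += dA
                E += dE
            trans, emit = normalize_counts(A, E, pseudocount=1e-3)  # sketch from slide 2
        return trans, emit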

  5. Recall: The Forward Algorithm F(i,k) represents the probability P(x0...xk-1, qi) that the machine emits the subsequence x0...xk-1 by any path ending in state qi, i.e., so that symbol xk-1 is emitted by state qi.
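
For reference, a sketch of the Forward computation under the same simplified conventions as above (integer symbols, an explicit init vector standing in for the transitions out of the silent start state); column k of F corresponds to the prefix x0...xk-1.

    import numpy as np

    def forward(trans, emit, seq, init):
        # F[i, k] ~ P(x0...xk-1, xk-1 emitted by state qi); column 0 (the empty
        # prefix) is unused here. init[i] plays the role of Pt(qi|q0).
        n, L = trans.shape[0], len(seq)
        F = np.zeros((n, L + 1))
        F[:, 1] = init * emit[:, seq[0]]
        for k in range(2, L + 1):
            # sum over every predecessor state j of the shorter prefix
            F[:, k] = (F[:, k - 1] @ trans) * emit[:, seq[k - 1]]
        return F    # with no explicit end state, P(S) = F[:, L].sum()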

  6. The Backward Algorithm B(i,k) = probability that the machine M will emit the subsequence xk...xL-1 and then terminate, given that M is currently in state qi (which has already emitted xk-1). THEREFORE: P(S) = B(0,0), since in the silent start state q0 the entire sequence x0...xL-1 remains to be emitted.

  7. Backward Algorithm: Pseudocode
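
A sketch of the backward recursion, consistent with the definition on the previous slide and with the Forward sketch above (same simplified conventions, no explicit end state); names and indexing are mine.

    import numpy as np

    def backward(trans, emit, seq):
        # B[i, k] ~ P(xk...xL-1 | state qi has just emitted xk-1), filled
        # right to left; B[i, L] = 1 because nothing remains to be emitted.
        n, L = trans.shape[0], len(seq)
        B = np.zeros((n, L + 1))
        B[:, L] = 1.0
        for k in range(L - 1, 0, -1):
            # next state qj emits xk, then accounts for the rest of the sequence
            B[:, k] = trans @ (emit[:, seq[k]] * B[:, k + 1])
        return B    # check: (init * emit[:, seq[0]] * B[:, 1]).sum() equals P(S)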

  8. Baum-Welch: Summing over All Paths [Figure: all left paths into position (i, k-1) joined to all right paths leaving it] F(i,k)B(i,k) = P(M emits x0...xk-1...xL-1, with xk-1 being emitted by state qi). F(i,k)Pt(qj|qi)Pe(xk|qj)B(j,k+1) = P(M emits x0...xk-1xk...xL-1 and transitions from state qi to qj at time k-1→k).

  9. Combining Forward & Backward F(i,k) = P(x0...xk-1, qi) = P(M emits x0...xk-1 by any path ending in state qi, with xk-1 emitted by qi). B(i,k) = P(xk...xL-1 | qi) = P(M emits xk...xL-1 and then terminates, given that M is in state qi, which has emitted xk-1). Multiplying the two: F(i,k)B(i,k) = P(x0...xk-1, qi) P(xk...xL-1 | qi) = P(x0...xL-1, qi)*, and therefore F(i,k)B(i,k)/P(S) = P(qi at position k-1 | S). Summing these posterior terms gives the expected counts: Eq,s = (1/P(S)) Σi Σk F(i,k)B(i,k)C(qi,k), where C(qi,k) = 1 if qi = q and xk-1 = s; otherwise 0. Aqm,qn = (1/P(S)) Σi Σj Σk F(i,k)Pt(qj|qi)Pe(xk|qj)B(j,k+1)C(qm,qn,k), where C(qm,qn,k) = 1 if qm = qi and qn = qj; otherwise 0. *Assuming P(xk...xL-1) is conditionally independent of P(x0...xk-1), given qi.
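
Slides 8 and 9 translate into the following sketch, which accumulates the expected emission and transition counts for a single sequence from the F and B matrices of the sketches above; the array names and conventions are mine.

    import numpy as np

    def expected_counts(trans, emit, seq, F, B):
        # E[q, s] accumulates F(i,k)B(i,k)/P(S) at positions where qi = q and xk-1 = s;
        # A[i, j] accumulates F(i,k)Pt(qj|qi)Pe(xk|qj)B(j,k+1)/P(S), summed over k.
        n, L = trans.shape[0], len(seq)
        P_S = F[:, L].sum()                        # total sequence probability
        E = np.zeros_like(emit)
        A = np.zeros_like(trans)
        for k in range(1, L + 1):
            gamma = F[:, k] * B[:, k] / P_S        # posterior that qi emitted xk-1
            E[:, seq[k - 1]] += gamma
            if k < L:
                # posterior of a transition qi -> qj between xk-1 and xk
                xi = (F[:, k][:, None] * trans * (emit[:, seq[k]] * B[:, k + 1])[None, :]) / P_S
                A += xi
        return A, E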

  10. Baum-Welch Training [Flowchart] For each training sequence, compute the Fwd & Bkwd DP matrices, then accumulate the expected counts for E & A; once all sequences have been processed, re-estimate the parameters from these counts and repeat.
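
A sketch of the overall loop, wiring together the forward, backward, expected_counts, and normalize_counts sketches from the previous slides (all of them my own illustrative names); a production version would use the log-space arithmetic described on the next slide, or rescaling, to avoid underflow on long sequences.

    import numpy as np

    def baum_welch(seqs, trans, emit, init, n_iter=10, pseudocount=1e-3):
        # One pass per iteration: Fwd/Bkwd matrices per sequence, accumulate
        # expected counts for A and E, then re-estimate the parameters.
        for _ in range(n_iter):
            A_tot = np.zeros_like(trans)
            E_tot = np.zeros_like(emit)
            for seq in seqs:
                F = forward(trans, emit, seq, init)              # sketch, slide 5
                B = backward(trans, emit, seq)                   # sketch, slide 7
                A, E = expected_counts(trans, emit, seq, F, B)   # sketch, slide 9
                A_tot += A
                E_tot += E
            trans, emit = normalize_counts(A_tot, E_tot, pseudocount)  # slide 2
        return trans, emit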

  11. Using Logarithms in Forward & Backward log(p0 + p1 + ... + pn) = log p0 + log(1 + Σi=1..n e^(log pi - log p0)) (6.43) In the log-space version of these algorithms, we can replace the raw probabilities pi with their logarithmic counterparts, log pi, and apply Equation (6.43) whenever probabilities are to be summed. Evaluating the e^(log pi - log p0) term should generally not cause numerical underflow in practice, since this term equals pi/p0, which for probabilities of similar events should not deviate too far from unity. (Due to Kingsbury & Rayner, 1971.)
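
A sketch of this log-space summation; it follows the same idea as Equation (6.43) but uses the largest log probability as the reference term, a common safety refinement that is my choice rather than the slide's. Library equivalents such as numpy.logaddexp (pairwise) or scipy.special.logsumexp do the same job.

    import numpy as np

    def log_sum(log_ps):
        # Sum probabilities given only their logs:
        # log(sum_i pi) = log p0 + log(sum_i e^(log pi - log p0)),
        # with p0 chosen as the largest pi so that exp() never overflows.
        log_ps = np.asarray(log_ps, dtype=float)
        log_p0 = log_ps.max()
        if np.isneginf(log_p0):
            return -np.inf                 # every probability is zero
        return log_p0 + np.log(np.exp(log_ps - log_p0).sum())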

  12. Monotonic Convergence Behavior

  13. Posterior Decoding [Figure: a putative exon with a fixed path through it; the Forward algorithm sums over all left parses up to its start, and the Backward algorithm sums over all right parses after its end.]

  14. Posterior Decoding
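
Following the definitions on slide 9, a sketch of posterior decoding for one sequence, reusing the forward and backward sketches above: post[i, k] is the posterior probability that state qi emitted symbol xk, and summing rows over a set of states (e.g. all exon states) gives a per-position posterior for that feature type.

    import numpy as np

    def posterior_decode(trans, emit, seq, init):
        # post[i, k] = F(i, k+1) B(i, k+1) / P(S) = P(qi emitted xk | S);
        # also return the highest-posterior state at each position.
        F = forward(trans, emit, seq, init)       # sketch, slide 5
        B = backward(trans, emit, seq)            # sketch, slide 7
        P_S = F[:, -1].sum()
        post = (F[:, 1:] * B[:, 1:]) / P_S        # drop the unused column 0
        return post, post.argmax(axis=0)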

  15. Summary • Training of ambiguous HMMs can be accomplished using Viterbi training or the Baum-Welch algorithm • Viterbi training performs labeling using the single most probable path • Baum-Welch training instead estimates transition & emission events by computing expectations via Forward-Backward, summing over all paths containing a given event • Posterior decoding can be used to estimate the probability that a given symbol or substring was generated by a particular state
