Hidden Markov Models

Presentation Transcript


  1. Hidden Markov Models Yves Moreau Katholieke Universiteit Leuven

  2. Regular expressions • Alignment: ACA---ATG, TCAACTATC, ACAC--AGC, AGA---ATC, ACCG--ATC • Regular expression derived from the alignment: [AT][CG][AC][ACGT]*A[TG][GC] • Problem: the regular expression does not distinguish the exceptional match TGCTAGG from the consensus ACACATC
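
A small Python check (not part of the original slides) makes the problem concrete: the pattern from the slide accepts the consensus ACACATC and the exceptional TGCTAGG equally, because a regular expression gives no score.

```python
import re

# Pattern from the slide, derived from the alignment columns.
pattern = re.compile(r"[AT][CG][AC][ACGT]*A[TG][GC]")

# Both the consensus and the exceptional sequence match, so the
# regular expression cannot rank one above the other.
for seq in ("ACACATC", "TGCTAGG"):
    print(seq, bool(pattern.fullmatch(seq)))
```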

  3. Sequence score [Figure: a small hidden Markov model built from the alignment, with per-state emission probabilities (e.g. A .8 / C 0 / G 0 / T .2 for the first state) and transition probabilities (1.0 along the main chain, .6/.4 around the insert state); every sequence receives a score under this model]

  4. Log odds • Use logarithms for scaling and normalize by a random model • Log odds for a sequence S: log P(S | model) − log P(S | random) [Figure: the same model annotated with log-odds scores, e.g. an emission probability of .8 becomes log(.8/.25) ≈ 1.16, an emission of 1 becomes ≈ 1.39, and a transition of .6 becomes log(.6) ≈ −0.51]

  5. Log odds
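
As an aside (not on the slides), the log-odds scores of slide 4 can be reproduced with a few lines of Python; the uniform 0.25 background is the random model assumed there.

```python
from math import log

BACKGROUND = 0.25   # uniform random model over A, C, G, T

def emission_log_odds(p, background=BACKGROUND):
    # Zero-probability emissions (score minus infinity) are simply not used.
    return log(p / background)

def transition_log_odds(a):
    return log(a)

print(round(emission_log_odds(0.8), 2))    # 1.16, as on slide 4
print(round(emission_log_odds(1.0), 2))    # 1.39
print(round(transition_log_odds(0.6), 2))  # -0.51
```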

  6. Markov chain • Probabilistic model of a DNA sequence x = x1 x2 ... xL • Example of a Markov chain: four states A, C, G, T connected by transition probabilities [Figure: the four nucleotide states with arrows labelled by the transition probabilities]

  7. Markov property • Probability of a sequence through Bayes' rule: P(x) = P(xL | xL-1, ..., x1) P(xL-1 | xL-2, ..., x1) ... P(x1) • Markov property: P(xi | xi-1, ..., x1) = P(xi | xi-1), so that P(x) = P(x1) ∏i P(xi | xi-1) • "The future is only a function of the present and not of the past"
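
A minimal Python sketch of this factorization, with illustrative (made-up) initial and transition probabilities:

```python
# Illustrative (made-up) initial and transition probabilities for a
# first-order Markov chain over the nucleotides.
initial = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
transition = {
    "A": {"A": 0.30, "C": 0.20, "G": 0.30, "T": 0.20},
    "C": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "G": {"A": 0.20, "C": 0.30, "G": 0.30, "T": 0.20},
    "T": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
}

def chain_probability(seq):
    # P(x) = P(x1) * prod_i P(x_i | x_{i-1}), by the Markov property
    p = initial[seq[0]]
    for prev, cur in zip(seq, seq[1:]):
        p *= transition[prev][cur]
    return p

print(chain_probability("ACACATC"))
```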

  8. Beginning and end of a sequence [Figure: the A/C/G/T chain extended with a begin state α and an end state ω] • The computation of the probability is not homogeneous (P(x1) is treated differently from the other factors) • The length distribution is not modelled: P(length = L) is left unspecified • Solution: model the beginning and the end of the sequence explicitly • With an end state, the probability of observing a sequence of a given length decreases with the length of the sequence
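
A tiny sketch of the length effect: if each position continues with probability 1 − τ and moves to the end state with probability τ (the value of τ below is purely illustrative), the probability of a length-L sequence decays geometrically with L.

```python
# If each position continues with probability 1 - tau and moves to the
# end state with probability tau, the sequence length is geometric in L.
tau = 0.1   # illustrative value
for L in (1, 5, 10, 20):
    print(L, (1 - tau) ** (L - 1) * tau)
```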

  9. Sequence score (recap of slide 3) [Figure: the same small model with its transition and emission probabilities, now viewed as a hidden Markov model]

  10. Hidden Markov Model • In a hidden Markov model we observe the symbol sequence x but want to reconstruct the hidden state sequence (the path π) • Transition probabilities akl, including the transitions a0l from the begin state α and ak0 to the end state ω • Emission probabilities ek(b) • Joint probability of the sequence α, x1, ..., xL, ω and the path π: P(x, π) = a0,π1 ∏i=1..L eπi(xi) aπi,πi+1, where πL+1 denotes the end state
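
The following Python sketch (not from the slides) evaluates this joint probability for the two-state casino model introduced on the next slide; the uniform begin-state probabilities are an assumption, and the end state is omitted.

```python
# Casino parameters from the next slide; begin probabilities are assumed.
begin = {"F": 0.5, "L": 0.5}
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}

def joint_probability(x, path):
    # P(x, pi) = a_{0,pi_1} e_{pi_1}(x_1) * prod_i a_{pi_{i-1},pi_i} e_{pi_i}(x_i)
    p = begin[path[0]] * emit[path[0]][x[0]]
    for i in range(1, len(x)):
        p *= trans[path[i - 1]][path[i]] * emit[path[i]][x[i]]
    return p

# Same throws, two candidate paths: staying fair versus using the loaded die.
print(joint_probability("266612", "FFFFFF"))
print(joint_probability("266612", "FLLLLF"))
```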

  11. Casino (I) – problem setup • The casino uses mostly a fair die but sometimes switches to a loaded die • Fair die: each outcome 1–6 with probability 1/6; loaded die: outcomes 1–5 with probability 1/10 each and outcome 6 with probability 1/2 • Transition probabilities: fair→fair 0.95, fair→loaded 0.05, loaded→loaded 0.9, loaded→fair 0.1 • We observe the outcomes x of the successive throws but want to know when the die was fair and when it was loaded (the path π)
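
A hedged simulation sketch of this setup in Python; the uniform choice of the initial state is an assumption, and there is no end state.

```python
import random

trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}

def draw(dist):
    return random.choices(list(dist), weights=list(dist.values()))[0]

def simulate(n):
    state = random.choice("FL")          # assumed uniform initial state
    rolls, path = "", ""
    for _ in range(n):
        path += state
        rolls += draw(emit[state])
        state = draw(trans[state])
    return rolls, path

rolls, path = simulate(300)
print(rolls[:60])
print(path[:60])
```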

  12. Estimation of the sequence and state probabilities

  13. The Viterbi algorithm • We look for the most probable path π* = argmaxπ P(x, π) • This problem can be tackled by dynamic programming • Let us define vk(i) as the probability of the most probable path that ends in state k with the emission of symbol xi • Then we can compute this probability recursively as vl(i+1) = el(xi+1) maxk [vk(i) akl]

  14. The Viterbi algorithm • The Viterbi algorithm grows the best path dynamically • Initial condition: the sequence starts in the begin state, v0(0) = 1 and vk(0) = 0 for k > 0 • Traceback pointers to follow the best path back from the end (= decoding)
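
A sketch of Viterbi decoding for the casino model, in log space; the 0.5 begin-state probabilities are assumed and the end state is left out, as in the casino example.

```python
from math import log

states = "FL"
begin = {"F": 0.5, "L": 0.5}              # assumed
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}

def viterbi(x):
    # v[k] = log-probability of the best path ending in state k at position i
    v = {k: log(begin[k]) + log(emit[k][x[0]]) for k in states}
    pointers = []                          # traceback pointers
    for symbol in x[1:]:
        ptr = {l: max(states, key=lambda k: v[k] + log(trans[k][l]))
               for l in states}
        v = {l: v[ptr[l]] + log(trans[ptr[l]][l]) + log(emit[l][symbol])
             for l in states}
        pointers.append(ptr)
    # Traceback from the best final state (= decoding)
    state, path = max(states, key=v.get), ""
    for ptr in reversed(pointers):
        path = state + path
        state = ptr[state]
    return state + path

print(viterbi("266612666651236"))
```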

  15. Casino (II) - Viterbi

  16. The forward algorithm • The forward algorithm lets us compute the probability P(x) of a sequence under an HMM • This is important for the computation of posterior probabilities and for the comparison of HMMs • The sum over all paths (exponentially many) can be computed by dynamic programming • Let us define fk(i) as the probability of the sequence x1, ..., xi for the paths that end in state k with the emission of symbol xi • Then we can compute this probability recursively as fl(i+1) = el(xi+1) Σk fk(i) akl

  17. The forward algorithm • The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence • Initial condition: the sequence starts in the begin state, f0(0) = 1 • End: all states converge to the end state, P(x) = Σk fk(L) ak0
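
A corresponding forward-algorithm sketch for the casino model (same assumptions as before; without an explicit end state, P(x) is simply the sum of the final forward variables).

```python
states = "FL"
begin = {"F": 0.5, "L": 0.5}              # assumed
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}

def forward(x):
    # f[i][k] = probability of the sequence up to position i+1, path ending in k
    f = [{k: begin[k] * emit[k][x[0]] for k in states}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][symbol] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())          # forward table and P(x)

f, px = forward("266612")
print(px)
```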

  18. The backward algorithm • The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol xi is emitted from state k, i.e. P(x, πi = k) • This is important to compute the probability of a given state at symbol xi • P(x1, ..., xi, πi = k) can be computed by the forward algorithm as fk(i) • Let us define bk(i) as the probability of the rest of the sequence for the paths that pass through state k at symbol xi: bk(i) = P(xi+1, ..., xL | πi = k)

  19. The backward algorithm • The backward algorithm grows the probability bk(i) dynamically backwards (from the end to the beginning): bk(i) = Σl akl el(xi+1) bl(i+1) • Border condition: start in the end state, bk(L) = ak0 • Once both the forward and the backward probabilities are available, we can compute the posterior probability of the state: P(πi = k | x) = fk(i) bk(i) / P(x)
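
A sketch of the backward recursion and the resulting posterior state probabilities for the casino model (same assumptions; without an end state the border condition becomes bk(L) = 1).

```python
states = "FL"
begin = {"F": 0.5, "L": 0.5}              # assumed
trans = {"F": {"F": 0.95, "L": 0.05}, "L": {"F": 0.10, "L": 0.90}}
emit = {"F": {s: 1 / 6 for s in "123456"},
        "L": {**{s: 0.1 for s in "12345"}, "6": 0.5}}

def forward(x):
    f = [{k: begin[k] * emit[k][x[0]] for k in states}]
    for symbol in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][symbol] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

def backward(x):
    b = [{k: 1.0 for k in states}]         # b_k(L) = 1 without an end state
    for symbol in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][symbol] * nxt[l]
                            for l in states) for k in states})
    return b

def posterior(x):
    # P(pi_i = k | x) = f_k(i) * b_k(i) / P(x)
    f, px = forward(x)
    b = backward(x)
    return [{k: f[i][k] * b[i][k] / px for k in states} for i in range(len(x))]

for row in posterior("266612"):
    print(row)
```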

  20. Posterior decoding • Instead of using the most probable path for decoding (Viterbi), we can use the path of the individually most probable states: π̂i = argmaxk P(πi = k | x) • The path π̂ can be "illegal" (P(π̂ | x) = 0), since consecutive states of π̂ need not be linked by an allowed transition • This approach can also be used when we are interested in a function g(k) of the state (e.g., labelling)
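
A short sketch of both uses, applied to a small hypothetical posterior table of the kind produced by the previous sketch (the numbers and the label function g are made up).

```python
# Hypothetical posterior table (three positions) of the kind produced above.
posterior = [{"F": 0.9, "L": 0.1}, {"F": 0.4, "L": 0.6}, {"F": 0.2, "L": 0.8}]

# Most probable state at each position; the result may be an "illegal" path.
pi_hat = [max(p, key=p.get) for p in posterior]

# Posterior average of a label g(k), here g(fair) = 1, g(loaded) = 0.
g = {"F": 1.0, "L": 0.0}
expected_label = [sum(g[k] * p[k] for k in p) for p in posterior]

print(pi_hat)           # ['F', 'L', 'L']
print(expected_label)   # [0.9, 0.4, 0.2]
```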

  21. Casino (III) – posterior decoding • Posterior probability of the state "fair" for the sequence of die throws

  22. Casino (IV) – posterior decoding • New situation: P(πi+1 = FAIR | πi = FAIR) = 0.99 and P(fair→loaded) = 0.01, with the loaded-die transitions (0.9 and 0.1) and the emission probabilities unchanged • Viterbi decoding cannot detect the cheating in 1000 throws, while posterior decoding does

  23. Parameter estimation for HMMs

  24. Choice of the architecture • For the parameter estimation, we assume that the architecture of the HMM is known • Choice of architecture is an essential design choice • Duration modeling • “Silent states” for gaps

  25. Parameter estimation with known paths • HMM with parameters θ (transition and emission probabilities) • Training set D of N sequences x1, ..., xN • The score of the model is the likelihood of the parameters given the training data: score(θ) = log P(x1, ..., xN | θ) = Σj log P(xj | θ)

  26. Parameter estimation with known paths • If the state paths are known, the parameters are estimated through counts (how often is a transition used, how often is a symbol emitted by a given state) • Use 'pseudocounts' if necessary to avoid zero probabilities • Akl = number of transitions from k to l in the training set + pseudocount rkl • Ek(b) = number of emissions of b from k in the training set + pseudocount rk(b) • The estimates are then akl = Akl / Σl' Akl' and ek(b) = Ek(b) / Σb' Ek(b')
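
A count-and-normalize sketch in Python; the single training pair (rolls and their known fair/loaded path) and the pseudocount values are illustrative only.

```python
states, symbols = "FL", "123456"

# One illustrative training pair: observed rolls and their known path.
training = [("266612", "FLLLLF")]
r_trans, r_emit = 1.0, 1.0               # pseudocounts

A = {k: {l: r_trans for l in states} for k in states}
E = {k: {b: r_emit for b in symbols} for k in states}
for x, path in training:
    for i, (symbol, k) in enumerate(zip(x, path)):
        E[k][symbol] += 1
        if i + 1 < len(path):
            A[k][path[i + 1]] += 1

# Normalize the counts to get the maximum-likelihood (plus pseudocount) estimates.
a = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
e = {k: {b: E[k][b] / sum(E[k].values()) for b in symbols} for k in states}
print(a)
print(e)
```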

  27. Parameter estimation with unknown paths: Viterbi training • Strategy: iterative method • Assume the current parameters are correct and find the best paths with Viterbi decoding • Re-estimate the parameters from these paths, as if they were known • Iterate until convergence • Viterbi training does not maximize the likelihood of the parameters • Viterbi training converges exactly in a finite number of steps (the decoded paths stop changing)
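
A self-contained sketch of Viterbi training on the casino model; the training rolls, the starting parameters and the pseudocount are illustrative assumptions.

```python
from math import log

states, symbols = "FL", "123456"
begin = {"F": 0.5, "L": 0.5}                       # assumed

def viterbi(x, trans, emit):
    v = {k: log(begin[k]) + log(emit[k][x[0]]) for k in states}
    pointers = []
    for symbol in x[1:]:
        ptr = {l: max(states, key=lambda k: v[k] + log(trans[k][l]))
               for l in states}
        v = {l: v[ptr[l]] + log(trans[ptr[l]][l]) + log(emit[l][symbol])
             for l in states}
        pointers.append(ptr)
    state, path = max(states, key=v.get), ""
    for ptr in reversed(pointers):
        path = state + path
        state = ptr[state]
    return state + path

def estimate(pairs, r=1.0):
    # Count transitions/emissions along the decoded paths, plus pseudocount r.
    A = {k: {l: r for l in states} for k in states}
    E = {k: {b: r for b in symbols} for k in states}
    for x, path in pairs:
        for i, (symbol, k) in enumerate(zip(x, path)):
            E[k][symbol] += 1
            if i + 1 < len(path):
                A[k][path[i + 1]] += 1
    trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
    emit = {k: {b: E[k][b] / sum(E[k].values()) for b in symbols} for k in states}
    return trans, emit

rolls = ["26661234566626", "12345612666665"]       # illustrative data
trans = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit = {"F": {s: 1 / 6 for s in symbols},
        "L": {**{s: 0.14 for s in "12345"}, "6": 0.30}}

paths, old_paths = None, None
for _ in range(20):
    paths = [viterbi(x, trans, emit) for x in rolls]
    if paths == old_paths:                         # paths stopped changing
        break
    trans, emit = estimate(zip(rolls, paths))
    old_paths = paths
print(paths)
```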

  28. Parameter estimation with unknown paths: Baum-Welch training • Strategy: parallel to Viterbi training, but we use the expected transition and emission counts, computed from the forward and backward probabilities over all paths, instead of the counts along the best path only • For the transitions: Akl = Σj (1/P(xj)) Σi fk(i) akl el(xi+1) bl(i+1), with f and b computed on sequence xj • For the emissions: Ek(b) = Σj (1/P(xj)) Σ{i: xi = b} fk(i) bk(i), again per sequence xj

  29. Parameter estimation with unknown paths: Baum-Welch training • Initialization: choose arbitrary model parameters • Recursion: • Set all transition and emission variables to their pseudocounts • For all sequences j = 1, ..., n • Compute fk(i) for sequence j with the forward algorithm • Compute bk(i) for sequence j with the backward algorithm • Add the contributions to A and E • Compute the new model parameters akl = Akl / Σl' Akl' and ek(b) = Ek(b) / Σb' Ek(b') • Compute the log-likelihood of the model • End: stop when the log-likelihood changes by less than some threshold or when the maximum number of iterations is exceeded
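
A compact Baum-Welch sketch for the two-state casino model, reusing the forward and backward recursions from above; the training rolls, the starting parameters, the pseudocount and the number of iterations are assumptions for illustration.

```python
from math import log

states, symbols = "FL", "123456"
begin = {"F": 0.5, "L": 0.5}                       # assumed

def forward(x, trans, emit):
    f = [{k: begin[k] * emit[k][x[0]] for k in states}]
    for s in x[1:]:
        prev = f[-1]
        f.append({l: emit[l][s] * sum(prev[k] * trans[k][l] for k in states)
                  for l in states})
    return f, sum(f[-1].values())

def backward(x, trans, emit):
    b = [{k: 1.0 for k in states}]
    for s in reversed(x[1:]):
        nxt = b[0]
        b.insert(0, {k: sum(trans[k][l] * emit[l][s] * nxt[l] for l in states)
                     for k in states})
    return b

def baum_welch(seqs, trans, emit, n_iter=10, r=0.1):
    for _ in range(n_iter):
        A = {k: {l: r for l in states} for k in states}
        E = {k: {b: r for b in symbols} for k in states}
        loglik = 0.0
        for x in seqs:
            f, px = forward(x, trans, emit)
            b = backward(x, trans, emit)
            loglik += log(px)
            for i in range(len(x)):
                for k in states:
                    E[k][x[i]] += f[i][k] * b[i][k] / px      # expected emissions
                    if i + 1 < len(x):
                        for l in states:                      # expected transitions
                            A[k][l] += (f[i][k] * trans[k][l] *
                                        emit[l][x[i + 1]] * b[i + 1][l] / px)
        trans = {k: {l: A[k][l] / sum(A[k].values()) for l in states} for k in states}
        emit = {k: {s: E[k][s] / sum(E[k].values()) for s in symbols} for k in states}
        print(round(loglik, 3))                # typically increases over iterations
    return trans, emit

rolls = ["2666123456662612345666", "6662612345123456666126"]  # illustrative
trans0 = {"F": {"F": 0.9, "L": 0.1}, "L": {"F": 0.1, "L": 0.9}}
emit0 = {"F": {s: 1 / 6 for s in symbols},
         "L": {**{s: 0.14 for s in "12345"}, "6": 0.30}}
trans, emit = baum_welch(rolls, trans0, emit0)
print(trans)
```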

  30. Casino (V) – Baum-Welch training [Figure: the original model next to the Baum-Welch estimates] • Original model: fair die 1/6 each; loaded die 1/10 for 1–5 and 1/2 for 6; stay-fair 0.95, stay-loaded 0.9 • Estimated from 300 throws: fair die 0.19/0.19/0.23/0.08/0.23/0.08; loaded die 0.07/0.10/0.10/0.17/0.05/0.52; stay-fair 0.73, stay-loaded 0.71 • Estimated from 30 000 throws: fair die 0.17/0.17/0.17/0.17/0.17/0.15; loaded die 0.10/0.11/0.10/0.11/0.10/0.48; stay-fair 0.93, stay-loaded 0.88

  31. Numerical stability • Many expressions contain products of many probabilities • This causes underflow when we compute these expressions • For Viterbi, this can be solved by working with logarithms • For the forward and backward algorithms, we can work with an approximation of the logarithm of a sum (log-sum-exp) or with rescaled variables
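
A minimal sketch of the log-sum-exp trick that such a log-space forward/backward implementation relies on (the probabilities below only illustrate the underflow range).

```python
from math import exp, log

def log_sum_exp(values):
    # log(sum(exp(v))) computed stably by factoring out the maximum.
    m = max(values)
    return m + log(sum(exp(v - m) for v in values))

# Two probabilities of order 1e-300: their products soon underflow,
# but in log space the computation stays well-behaved.
log_p = [log(1e-300), log(2e-300)]
print(log_sum_exp(log_p))        # about log(3e-300), roughly -689.7
```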

  32. Summary • Hidden Markov Models • Computation of sequence and state probabilities • Viterbi computation of the best state path • The forward algorithm for the computation of the probability of a sequence • The backward algorithm for the computation of state probabilities • Parameter estimation for HMMs • Parameter estimation with known paths • Parameter estimation with unknown paths • Viterbi training • Baum-Welch training
