
Hidden Markov Models


sadie


Presentation Transcript


  1. Hidden Markov Models
     [Figure: trellis with states 1, 2, …, K at each position, emitting the letters x1, x2, x3, …]

  2. Generating a sequence by the model
     Given an HMM, we can generate a sequence of length n as follows:
     • Start at state $\pi_1$ according to prob $a_{0\pi_1}$
     • Emit letter $x_1$ according to prob $e_{\pi_1}(x_1)$
     • Go to state $\pi_2$ according to prob $a_{\pi_1\pi_2}$
     • … until emitting $x_n$
     [Figure: trellis from begin state 0 through states 1, 2, …, K, with transition $a_{02}$ into state 2 and emission $e_2(x_1)$, over letters $x_1, x_2, x_3, \ldots, x_n$]
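
The generation steps above can be sketched in Python. This is only an illustration: the function name `sample_hmm` and the dictionary layout (`start[k]` for the begin-state transition into k, `trans[k][l]` for the transition k to l, `emit[k][x]` for the emission of x in state k) are assumptions, not part of the slides.

```python
import random

def sample_hmm(n, start, trans, emit, rng=None):
    """Generate n letters from an HMM; returns (states, letters).

    Hypothetical layout: start[k], trans[k][l], emit[k][x] are probabilities.
    """
    rng = rng or random.Random()
    states, letters = [], []
    probs = start                      # first step uses the begin-state transitions
    for _ in range(n):
        ks = list(probs)
        # choose the next state according to the current transition row
        k = rng.choices(ks, weights=[probs[s] for s in ks])[0]
        syms = list(emit[k])
        # emit a letter according to the chosen state's emission row
        x = rng.choices(syms, weights=[emit[k][s] for s in syms])[0]
        states.append(k)
        letters.append(x)
        probs = trans[k]               # next transition starts from state k
    return states, letters
```

Passing a seeded `random.Random` makes a run reproducible.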

  3. Evaluation
     We will develop algorithms that allow us to compute:
     • $P(x)$: probability of x given the model
     • $P(x_i \ldots x_j)$: probability of a substring of x given the model
     • $P(\pi_i = k \mid x)$: “posterior” probability that the ith state is k, given x; a more refined measure of which states x may be in

  4. The Forward Algorithm
     $f_k(i) = P(x_1 \ldots x_i, \pi_i = k)$ (the forward probability)
     Initialization: $f_0(0) = 1$; $f_k(0) = 0$, for all $k > 0$
     Iteration: $f_k(i) = e_k(x_i) \sum_l f_l(i-1)\, a_{lk}$
     Termination: $P(x) = \sum_k f_k(N)$
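
A minimal sketch of the forward recurrence, assuming states are numbered 0…K−1 and using a hypothetical layout: `a0[k]` for the begin transitions $a_{0k}$, `a[l][k]` for $a_{lk}$, and `e[k][x]` for $e_k(x)$. It works with raw probabilities, so it will underflow on long sequences.

```python
def forward(x, a0, a, e):
    """Return (f, px): f[i-1][k] = f_k(i) and px = P(x) = sum_k f_k(N)."""
    K = len(a0)
    # First column: f_k(1) = e_k(x_1) * a_0k, since f_0(0) = 1
    f = [[a0[k] * e[k][x[0]] for k in range(K)]]
    for i in range(1, len(x)):
        prev = f[-1]
        # f_k(i) = e_k(x_i) * sum_l f_l(i-1) * a_lk
        f.append([e[k][x[i]] * sum(prev[l] * a[l][k] for l in range(K))
                  for k in range(K)])
    return f, sum(f[-1])
```

Because this model has no explicit end state, the values of P(x) over all sequences of a fixed length sum to 1, which gives a quick sanity check.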

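  5. Motivation for the Backward Algorithm
     We want to compute $P(\pi_i = k \mid x)$, the probability distribution on the ith position, given x
     We start by computing
     $P(\pi_i = k, x) = P(x_1 \ldots x_i, \pi_i = k, x_{i+1} \ldots x_N)$
     $= P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_N \mid x_1 \ldots x_i, \pi_i = k)$
     $= P(x_1 \ldots x_i, \pi_i = k)\, P(x_{i+1} \ldots x_N \mid \pi_i = k)$
     The first factor is the forward probability $f_k(i)$; the second is the backward probability $b_k(i)$
     Then, $P(\pi_i = k \mid x) = P(\pi_i = k, x) / P(x)$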

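  6. The Backward Algorithm – derivation
     Define the backward probability:
     $b_k(i) = P(x_{i+1} \ldots x_N \mid \pi_i = k)$ (“starting from ith state = k, generate rest of x”)
     $= \sum_{\pi_{i+1} \ldots \pi_N} P(x_{i+1}, x_{i+2}, \ldots, x_N, \pi_{i+1}, \ldots, \pi_N \mid \pi_i = k)$
     $= \sum_l \sum_{\pi_{i+2} \ldots \pi_N} P(x_{i+1}, x_{i+2}, \ldots, x_N, \pi_{i+1} = l, \pi_{i+2}, \ldots, \pi_N \mid \pi_i = k)$
     $= \sum_l e_l(x_{i+1})\, a_{kl} \sum_{\pi_{i+2} \ldots \pi_N} P(x_{i+2}, \ldots, x_N, \pi_{i+2}, \ldots, \pi_N \mid \pi_{i+1} = l)$
     $= \sum_l e_l(x_{i+1})\, a_{kl}\, b_l(i+1)$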

  7. The Backward Algorithm
     We can compute $b_k(i)$ for all k, i, using dynamic programming
     Initialization: $b_k(N) = 1$, for all k
     Iteration: $b_k(i) = \sum_l e_l(x_{i+1})\, a_{kl}\, b_l(i+1)$
     Termination: $P(x) = \sum_l a_{0l}\, e_l(x_1)\, b_l(1)$
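
The backward recurrence can be sketched the same way. The parameter layout (`a0[k]`, `a[k][l]`, `e[k][x]`, states numbered 0…K−1) is an assumption made for this example, and raw probabilities are used, so long sequences will underflow.

```python
def backward(x, a0, a, e):
    """Return (b, px): b[i-1][k] = b_k(i) and px = sum_l a_0l e_l(x_1) b_l(1)."""
    K, N = len(a0), len(x)
    b = [[1.0] * K for _ in range(N)]          # last row: b_k(N) = 1
    for i in range(N - 2, -1, -1):
        for k in range(K):
            # b_k(i) = sum_l e_l(x_{i+1}) * a_kl * b_l(i+1)
            b[i][k] = sum(e[l][x[i + 1]] * a[k][l] * b[i + 1][l]
                          for l in range(K))
    px = sum(a0[l] * e[l][x[0]] * b[0][l] for l in range(K))
    return b, px
```

The termination value should agree with the P(x) obtained from the forward pass on the same model and sequence.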

  8. Computational Complexity
     What is the running time, and space required, for Forward and Backward?
     Time: $O(K^2 N)$
     Space: $O(KN)$
     Useful implementation techniques to avoid underflows:
     • Viterbi: sum of logs
     • Forward/Backward: rescaling every few positions by multiplying by a constant
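
One way to apply the rescaling idea is sketched below. It is a variant of what the slide describes: it rescales at every position rather than every few positions, and it accumulates the logs of the scaling constants to recover $\log P(x)$. The function name and parameter layout (`a0[k]`, `a[l][k]`, `e[k][x]`) are assumptions. Only the current column is kept, so space is O(K).

```python
import math

def forward_log_prob(x, a0, a, e):
    """Return log P(x) using a forward pass rescaled at every position."""
    K = len(a0)
    f = [a0[k] * e[k][x[0]] for k in range(K)]
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            # ordinary forward step on the rescaled previous column
            f = [e[k][x[i]] * sum(f[l] * a[l][k] for l in range(K))
                 for k in range(K)]
        s = sum(f)                 # scaling constant for this position
        log_px += math.log(s)      # log P(x) is the sum of the log constants
        f = [v / s for v in f]     # rescale so the column sums to 1
    return log_px
```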

  9. Posterior Decoding
     $P(\pi_i = k \mid x) = P(\pi_i = k, x) / P(x)$
     $= P(x_1, \ldots, x_i, \pi_i = k, x_{i+1}, \ldots, x_N) / P(x)$
     $= P(x_1, \ldots, x_i, \pi_i = k)\, P(x_{i+1}, \ldots, x_N \mid \pi_i = k) / P(x)$
     $= f_k(i)\, b_k(i) / P(x)$
     We can now calculate $P(\pi_i = k \mid x) = \dfrac{f_k(i)\, b_k(i)}{P(x)}$
     Then, we can ask: what is the most likely state at position i of sequence x?
     Define $\hat{\pi}$ by Posterior Decoding: $\hat{\pi}_i = \operatorname{argmax}_k P(\pi_i = k \mid x)$
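
Posterior decoding combines the two passes. The sketch below recomputes both tables inline so that it stands alone; the function name and parameter layout (`a0[k]`, `a[k][l]`, `e[k][x]`, states numbered 0…K−1) are assumptions, and raw probabilities are used for brevity.

```python
def posterior_decode(x, a0, a, e):
    """Return (post, path): post[i-1][k] = P(pi_i = k | x), path = argmax per position."""
    K, N = len(a0), len(x)
    # forward table
    f = [[a0[k] * e[k][x[0]] for k in range(K)]]
    for i in range(1, N):
        f.append([e[k][x[i]] * sum(f[i - 1][l] * a[l][k] for l in range(K))
                  for k in range(K)])
    # backward table
    b = [[1.0] * K for _ in range(N)]
    for i in range(N - 2, -1, -1):
        b[i] = [sum(e[l][x[i + 1]] * a[k][l] * b[i + 1][l] for l in range(K))
                for k in range(K)]
    px = sum(f[-1])
    # posterior: f_k(i) * b_k(i) / P(x)
    post = [[f[i][k] * b[i][k] / px for k in range(K)] for i in range(N)]
    path = [max(range(K), key=lambda k: post[i][k]) for i in range(N)]
    return post, path
```

Each row of `post` is a probability distribution over states, so it should sum to 1.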

  10. Posterior Decoding
      • For each state, Posterior Decoding gives us a curve of likelihood of the state for each position
      • That is sometimes more informative than the Viterbi path $\pi^*$
      • Posterior Decoding may give an invalid sequence of states (of prob 0)
      • Why?

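  11. Posterior Decoding
      $P(\pi_i = k \mid x) = \sum_\pi P(\pi \mid x)\, \mathbf{1}(\pi_i = k) = \sum_{\{\pi : \pi[i] = k\}} P(\pi \mid x)$
      where $\mathbf{1}(\psi) = 1$ if $\psi$ is true, 0 otherwise
      [Figure: grid of states (State 1, …, l, …, k) against positions x1, x2, x3, …, xN, showing $P(\pi_i = l \mid x)$]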

  12. Viterbi, Forward, Backward
      VITERBI
      Initialization: $V_0(0) = 1$; $V_k(0) = 0$, for all $k > 0$
      Iteration: $V_l(i) = e_l(x_i) \max_k V_k(i-1)\, a_{kl}$
      Termination: $P(x, \pi^*) = \max_k V_k(N)$
      FORWARD
      Initialization: $f_0(0) = 1$; $f_k(0) = 0$, for all $k > 0$
      Iteration: $f_l(i) = e_l(x_i) \sum_k f_k(i-1)\, a_{kl}$
      Termination: $P(x) = \sum_k f_k(N)$
      BACKWARD
      Initialization: $b_k(N) = 1$, for all k
      Iteration: $b_l(i) = \sum_k e_k(x_{i+1})\, a_{lk}\, b_k(i+1)$
      Termination: $P(x) = \sum_k a_{0k}\, e_k(x_1)\, b_k(1)$
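
For comparison with Forward and Backward, here is a sketch of Viterbi using sums of logs to avoid underflow. The function name, the parameter layout (`a0[k]`, `a[k][l]`, `e[k][x]`) and the traceback bookkeeping are assumptions added for illustration.

```python
import math

def viterbi(x, a0, a, e):
    """Return (path, log_prob) of the most likely state path, in log space."""
    K, N = len(a0), len(x)
    log = lambda p: math.log(p) if p > 0 else float("-inf")
    # V[0][k] = log a_0k + log e_k(x_1)
    V = [[log(a0[k]) + log(e[k][x[0]]) for k in range(K)]]
    ptr = []                                   # back-pointers per position
    for i in range(1, N):
        row, back = [], []
        for l in range(K):
            # best predecessor k for state l at this position
            k_best = max(range(K), key=lambda k: V[i - 1][k] + log(a[k][l]))
            row.append(log(e[l][x[i]]) + V[i - 1][k_best] + log(a[k_best][l]))
            back.append(k_best)
        V.append(row)
        ptr.append(back)
    last = max(range(K), key=lambda k: V[-1][k])
    path = [last]
    for back in reversed(ptr):                 # follow back-pointers to the start
        path.append(back[path[-1]])
    path.reverse()
    return path, V[-1][last]
```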

  13. Variants of HMMs

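  14. Higher-order HMMs
      • How do we model “memory” larger than one time point?
      • $P(\pi_{i+1} = l \mid \pi_i = k)$: $a_{kl}$
      • $P(\pi_{i+1} = l \mid \pi_i = k, \pi_{i-1} = j)$: $a_{jkl}$
      • …
      • A second order HMM with K states is equivalent to a first order HMM with $K^2$ states
      [Figure: two-state example. Second-order version with states H and T and transitions $a_{HT}(\text{prev} = H)$, $a_{HT}(\text{prev} = T)$, $a_{TH}(\text{prev} = H)$, $a_{TH}(\text{prev} = T)$; equivalent first-order version with states HH, HT, TH, TT and transitions $a_{HHT}$, $a_{HTH}$, $a_{HTT}$, $a_{THH}$, $a_{THT}$, $a_{TTH}$]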
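
The equivalence with a first-order model over $K^2$ states can be made concrete with a small sketch. The function name and the layout `a2[j][k][l]` for $a_{jkl}$ are assumptions: each pair state (j, k) means "previous state j, current state k", and it may only move to a pair state (k, l).

```python
from itertools import product

def expand_second_order(a2):
    """Turn second-order transitions a2[j][k][l] into first-order ones over pairs.

    Returns (pairs, trans), where trans[(j, k)][(k, l)] = a2[j][k][l] and every
    other entry is 0.
    """
    K = len(a2)
    pairs = list(product(range(K), repeat=2))          # the K^2 pair states
    trans = {p: {q: 0.0 for q in pairs} for p in pairs}
    for j, k, l in product(range(K), repeat=3):
        # pair (j, k) can only be followed by a pair whose first element is k
        trans[(j, k)][(k, l)] = a2[j][k][l]
    return pairs, trans
```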

  15. Similar Algorithms to 1st Order
      • $P(\pi_{i+1} = l \mid \pi_i = k, \pi_{i-1} = j)$
      • $V_{lk}(i) = \max_j \{ V_{kj}(i-1) + \ldots \}$
      • Time? Space?

  16. Modeling the Duration of States
      [Figure: states X and Y with self-loop probabilities p and q and exit probabilities 1 − p and 1 − q]
      Length distribution of region X: $E[l_X] = 1/(1-p)$
      • Geometric distribution, with mean $1/(1-p)$
      This is a significant disadvantage of HMMs
      Several solutions exist for modeling different length distributions
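
The mean quoted above follows from a short calculation, assuming state X has self-loop probability p, so that a stay of length m consists of m − 1 self-loops followed by one exit:

```latex
\begin{align*}
P(l_X = m) &= p^{\,m-1}(1-p), \qquad m \ge 1 \\
E[l_X] &= \sum_{m \ge 1} m\, p^{\,m-1}(1-p)
        = (1-p)\,\frac{d}{dp}\sum_{m \ge 0} p^{m}
        = \frac{1-p}{(1-p)^2}
        = \frac{1}{1-p}
\end{align*}
```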

  17. Example: exon lengths in genes

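  18. Solution 1: Chain several states
      [Figure: a chain of several copies of state X leading to state Y, labeled with probabilities p, 1 − p, q, 1 − q]
      $l_X = C + {}$ geometric with mean $1/(1-p)$
      Disadvantage: Still very inflexible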

  19. Solution 2: Negative binomial distribution
      [Figure: chain of states X(1), X(2), …, X(n), each with self-loop probability p and probability 1 − p of moving to the next state, ending in Y]
      Duration in X: m turns, where
      • During first m – 1 turns, exactly n – 1 arrows to next state are followed
      • During mth turn, an arrow to next state is followed
      $P(l_X = m) = \binom{m-1}{n-1} (1-p)^{n-1+1}\, p^{(m-1)-(n-1)} = \binom{m-1}{n-1} (1-p)^{n}\, p^{m-n}$
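
The duration formula is easy to check numerically. In this sketch the function name `duration_pmf` is an assumption; with n = 1 it reduces to the geometric distribution of a single self-looping state.

```python
from math import comb

def duration_pmf(m, n, p):
    """P(l_X = m) for a chain of n copies of X, each with self-loop probability p."""
    if m < n:
        return 0.0                 # at least one turn is spent in each copy
    # choose which n-1 of the first m-1 turns follow an arrow to the next copy
    return comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)
```

Summing `m * duration_pmf(m, n, p)` over m gives a mean close to n / (1 − p), the sum of n geometric stays.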

  20. Example: genes in prokaryotes
      • EasyGene: Prokaryotic gene-finder (Larsen TS, Krogh A)
      • Negative binomial with n = 3

  21. Solution 3: Duration modeling
      [Figure: state F with duration $d < D_f$ drawn from $P_f$, emitting $x_i \ldots x_{i+d-1}$]
      Upon entering a state:
      • Choose duration d, according to probability distribution
      • Generate d letters according to emission probs
      • Take a transition to next state according to transition probs
      Disadvantage: Increase in complexity of Viterbi:
      Time: $O(D)$
      Space: $O(1)$
      where D = maximum duration of state
      Warning: Rabiner’s tutorial claims $O(D^2)$ and $O(D)$ increases

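  22. Viterbi with duration modeling
      [Figure: states F and L joined by transitions, with durations $d < D_f$ and $d < D_l$ drawn from $P_f$ and $P_l$, emitting $x_i \ldots x_{i+d-1}$ and $x_j \ldots x_{j+d-1}$]
      Recall original iteration:
      $V_l(i) = \max_k V_k(i-1)\, a_{kl}\, e_l(x_i)$
      New iteration:
      $V_l(i) = \max_k \max_{d = 1 \ldots D_l} V_k(i-d)\, P_l(d)\, a_{kl} \prod_{j = i-d+1 \ldots i} e_l(x_j)$
      Precompute cumulative values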
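
The new iteration can be sketched in log space, with prefix sums playing the role of the precomputed cumulative values. Everything about the interface is an assumption made for this example: `dur[l][d-1]` holds $P_l(d)$ for d = 1…D_l, `a0[l]` is the begin transition, emission probabilities are assumed strictly positive, and only the best log probability is returned (no traceback).

```python
import math

NEG_INF = float("-inf")

def _log(p):
    return math.log(p) if p > 0 else NEG_INF

def duration_viterbi(x, a0, a, e, dur):
    """Best log probability under a duration model; dur[l][d-1] = P_l(d)."""
    K, N = len(a0), len(x)
    # cum[l][i] = sum of log e_l(x_j) over the first i letters (cumulative values)
    cum = [[0.0] * (N + 1) for _ in range(K)]
    for l in range(K):
        for i in range(N):
            cum[l][i + 1] = cum[l][i] + _log(e[l][x[i]])
    # V[i][l]: best log prob of x_1..x_i with a segment of state l ending at i
    V = [[NEG_INF] * K for _ in range(N + 1)]
    for i in range(1, N + 1):
        for l in range(K):
            best = NEG_INF
            for d in range(1, min(len(dur[l]), i) + 1):
                # log P_l(d) + log of the d emissions x_{i-d+1}..x_i
                seg = _log(dur[l][d - 1]) + cum[l][i] - cum[l][i - d]
                if i == d:
                    enter = _log(a0[l])        # segment starts the sequence
                else:
                    enter = max(V[i - d][k] + _log(a[k][l]) for k in range(K))
                best = max(best, enter + seg)
            V[i][l] = best
    return max(V[N])
```

The loops over i, l, d and k give time proportional to $N K^2 D$, which matches the O(D) increase stated on the previous slide.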

  23. Proteins, Pair HMMs, and Alignment

  24. A state model for alignment
      States: M (+1,+1), I (+1, 0), J (0, +1)
      Alignments correspond 1-to-1 with sequences of states M, I, J
      -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
      TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
      IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII

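  25. Let’s score the transitions
      States: M (+1,+1), I (+1, 0), J (0, +1)
      [Figure: state diagram in which each transition into M scores $s(x_i, y_j)$, the transitions from M into I and into J score $-d$, and the self-loops on I and J score $-e$]
      Alignments correspond 1-to-1 with sequences of states M, I, J
      -AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---
      TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC
      IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII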

  26. Alignment with affine gaps – state version
      Dynamic Programming:
      M(i, j): Optimal alignment of $x_1 \ldots x_i$ to $y_1 \ldots y_j$ ending in M
      I(i, j): Optimal alignment of $x_1 \ldots x_i$ to $y_1 \ldots y_j$ ending in I
      J(i, j): Optimal alignment of $x_1 \ldots x_i$ to $y_1 \ldots y_j$ ending in J
      The score is additive, therefore we can apply DP recurrence formulas

  27. Alignment with affine gaps – state version
      Here d and e are the positive gap-open and gap-extend penalties, matching the transition scores $-d$ and $-e$ of the state model
      Initialization:
      $M(0, 0) = 0$; $M(i, 0) = M(0, j) = -\infty$, for $i, j > 0$
      $I(i, 0) = -d - (i-1)e$; $J(0, j) = -d - (j-1)e$
      Iteration:
      $M(i, j) = s(x_i, y_j) + \max\{ M(i-1, j-1),\ I(i-1, j-1),\ J(i-1, j-1) \}$
      $I(i, j) = \max\{ I(i-1, j) - e,\ M(i-1, j) - d \}$
      $J(i, j) = \max\{ J(i, j-1) - e,\ M(i, j-1) - d \}$
      Termination: Optimal alignment given by $\max\{ M(m, n),\ I(m, n),\ J(m, n) \}$
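
A sketch of the three-state recurrences as code. It assumes d and e are positive gap-open and gap-extend penalties, consistent with the transition scores −d and −e of the state model, so a gap of length g costs d + (g − 1)e. The function name, the scoring callback `s`, and the choice to set the unlisted border cells I(0, j) and J(i, 0) to −∞ are assumptions. Only the optimal score is returned, not the alignment.

```python
NEG_INF = float("-inf")

def affine_align_score(x, y, s, d, e):
    """Optimal global alignment score with affine gaps (open d, extend e)."""
    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    M[0][0] = 0.0
    for i in range(1, m + 1):
        I[i][0] = -d - (i - 1) * e         # leading gap consuming x only
    for j in range(1, n + 1):
        J[0][j] = -d - (j - 1) * e         # leading gap consuming y only
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            # M: align x_i with y_j, coming from any state
            M[i][j] = s(x[i - 1], y[j - 1]) + max(
                M[i - 1][j - 1], I[i - 1][j - 1], J[i - 1][j - 1])
            # I: extend an existing gap (-e) or open one from M (-d)
            I[i][j] = max(I[i - 1][j] - e, M[i - 1][j] - d)
            # J: same for the other sequence
            J[i][j] = max(J[i][j - 1] - e, M[i][j - 1] - d)
    return max(M[m][n], I[m][n], J[m][n])
```

For example, with match +1, mismatch −1, d = 2 and e = 1, aligning ACGT to AGT opens one gap and scores 3 − 2 = 1.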
