
Hidden Markov Models



Presentation Transcript


  1. Hidden Markov Models Fundamentals and applications to bioinformatics.

  2. Markov Chains • Given a finite discrete set S of possible states, a Markov chain process occupies one of these states at each unit of time. • The process either stays in the same state or moves to some other state in S. • This occurs in a stochastic way, rather than in a deterministic one. • The process is memoryless and time homogeneous.

  3. Transition Matrix • Let S = {S1, S2, S3}. A Markov chain is described by a table of transition probabilities, such as the following: • [State diagram over S1, S2, S3 with transition probabilities 1, 1/3, 2/3, 1/3, 1/2, 1/6.]
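
A minimal sketch of how such a transition table can be stored and simulated. The slide's diagram is not fully recoverable: only the individual probabilities (1, 1/3, 2/3, 1/3, 1/2, 1/6) survive, so the row arrangement below is one plausible reading, not the slide's actual matrix.

```python
# Sketch of a 3-state Markov chain. The row arrangement of A is an assumption;
# only the individual probabilities come from the slide's diagram.
import random

states = ["S1", "S2", "S3"]

# A[i][j] = probability of moving from state i to state j; each row sums to 1.
A = [
    [0.0, 1.0, 0.0],   # from S1 (the diagram's probability 1)
    [1/3, 0.0, 2/3],   # from S2
    [1/3, 1/2, 1/6],   # from S3
]

def simulate(start, n_steps, seed=0):
    """Simulate n_steps transitions starting from state index `start`."""
    rng = random.Random(seed)
    path = [start]
    for _ in range(n_steps):
        current = path[-1]
        # Draw the next state according to the current row of A.
        path.append(rng.choices(range(len(states)), weights=A[current])[0])
    return [states[i] for i in path]

print(simulate(start=0, n_steps=10))
```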

  4. A simple example • Consider a 3-state Markov model of the weather. We assume that once a day the weather is observed as being one of the following: rainy or snowy, cloudy, or sunny. • We postulate that on day t the weather is characterized by exactly one of the three states above, and we give ourselves a transition probability matrix A given by:

  5. - 2 - • Given that the weather on day 1 is sunny, what is the probability that the weather for the next 7 days will be “sun-sun-rain-rain-sun-cloudy-sun”?
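
A sketch of this calculation. The slide's matrix A is not reproduced in the transcript, so the values below are taken from Rabiner's classic weather example, which is an assumption; it is at least consistent with the slide's later claim that the expected run of sunny days is 5 (i.e. a_sun,sun = 0.8).

```python
# Probability of the 7-day sequence given that day 1 is sunny.
# The matrix below is Rabiner's textbook example, assumed rather than taken
# from the slide image.
RAIN, CLOUDY, SUN = 0, 1, 2

A = [
    [0.4, 0.3, 0.3],   # from rainy/snowy
    [0.2, 0.6, 0.2],   # from cloudy
    [0.1, 0.1, 0.8],   # from sunny
]

# Day 1 is sunny; the next 7 days are sun, sun, rain, rain, sun, cloudy, sun.
days = [SUN, SUN, SUN, RAIN, RAIN, SUN, CLOUDY, SUN]

p = 1.0
for prev, curr in zip(days, days[1:]):
    p *= A[prev][curr]       # one transition probability per day

print(f"P(sequence | day 1 sunny) = {p:.3e}")   # ~1.536e-04 with this matrix
```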

  6. - 3 - • Given that the model is in a known state Si, what is the probability that it stays in that state for exactly d days? • The answer is P(d) = (a_ii)^(d-1) (1 - a_ii). • Thus the expected number of consecutive days in the same state is the sum over d of d·P(d), which equals 1 / (1 - a_ii). • So the expected number of consecutive sunny days, according to the model, is 1 / (1 - 0.8) = 5.
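
A quick numerical check of these two formulas, assuming the self-transition probability a_ii = 0.8 for the sunny state (the value implied by the expected run of 5 sunny days):

```python
# Duration of a state in a Markov chain: P(d) = a^(d-1) * (1 - a),
# with expectation 1 / (1 - a). Here a_ii = 0.8 is an assumed value.
a_ii = 0.8

def p_duration(d, a=a_ii):
    """Probability of staying in the same state for exactly d days."""
    return a ** (d - 1) * (1 - a)

expected = 1 / (1 - a_ii)                                  # closed form
approx = sum(d * p_duration(d) for d in range(1, 1000))    # numerical check

print(expected, round(approx, 6))                          # 5.0  5.0
```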

  7. Elements of an HMM • What if each state does not correspond to an observable (physical) event? What if the observation is a probabilistic function of the state? An HMM is characterized by the following: • N, the number of states in the model. • M, the number of distinct observation symbols per state. • The state transition probability distribution A = {a_ij}, where a_ij = P(q_{t+1} = Sj | q_t = Si). • The observation symbol probability distribution in state Sj, B = {b_j(k)}, where b_j(k) = P(O_t = v_k | q_t = Sj) is the probability that the k-th observation symbol is emitted at time t, given that the model is in state Sj. • The initial state distribution π = {π_i}, where π_i = P(q_1 = Si).
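
A minimal sketch of how these parameters can be bundled together in code. The field names and the toy values are my own illustration, chosen to mirror the slide's notation λ = (A, B, π); they are not from the presentation.

```python
# Container for an HMM's parameters lambda = (A, B, pi).
from dataclasses import dataclass
from typing import List

@dataclass
class HMM:
    A: List[List[float]]    # N x N state transition probabilities a_ij
    B: List[List[float]]    # N x M observation (emission) probabilities b_j(k)
    pi: List[float]         # length-N initial state distribution

    @property
    def N(self) -> int:     # number of states
        return len(self.pi)

    @property
    def M(self) -> int:     # number of distinct observation symbols
        return len(self.B[0])

# A toy 2-state, 2-symbol model (illustrative values only).
toy = HMM(A=[[0.7, 0.3], [0.4, 0.6]],
          B=[[0.9, 0.1], [0.2, 0.8]],
          pi=[0.6, 0.4])
print(toy.N, toy.M)
```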

  8. Three Basic Problems for HMMs • Given the observation sequence O = O1 O2 O3 … OT and a model λ = (A, B, π), how do we efficiently compute P(O | λ)? • Given the observation sequence O and a model λ, how do we choose a corresponding state sequence Q = q1 q2 q3 … qT which is optimal in some meaningful sense? • How do we adjust the model parameters to maximize P(O | λ)?

  9. Solution to Problem (1) • Given an observed output sequence O, we have that P(O | λ) = Σ over all state sequences Q of π_{q1} b_{q1}(O1) a_{q1 q2} b_{q2}(O2) ··· a_{q_{T-1} q_T} b_{qT}(OT). • This calculation involves a sum of N^T terms, each being a product of about 2T factors, so the total number of operations is on the order of 2T·N^T. • Fortunately, there is a much more efficient algorithm, called the forward algorithm.
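
A sketch of the brute-force sum over all N^T state paths, mainly to make the cost concrete. The toy parameters and observation sequence are my own illustration, not from the presentation.

```python
# Brute-force P(O | lambda): sum the path probability over every state sequence.
from itertools import product

A  = [[0.7, 0.3], [0.4, 0.6]]        # a_ij (toy values)
B  = [[0.9, 0.1], [0.2, 0.8]]        # b_j(k) (toy values)
pi = [0.6, 0.4]
O  = [0, 1, 1, 0]                    # observation symbol indices, length T

def brute_force_likelihood(O, A, B, pi):
    N, T = len(pi), len(O)
    total = 0.0
    for Q in product(range(N), repeat=T):        # every possible state sequence
        p = pi[Q[0]] * B[Q[0]][O[0]]
        for t in range(1, T):
            p *= A[Q[t-1]][Q[t]] * B[Q[t]][O[t]]
        total += p
    return total

print(brute_force_likelihood(O, A, B, pi))
```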

  10. The Forward Algorithm • It focuses on the calculation of the quantity α(t, i) = P(O1, …, Ot and q_t = Si | λ), which is the joint probability that the sequence of observations seen up to and including time t is O1, …, Ot, and that the state of the HMM at time t is Si. Once these quantities are known, P(O | λ) = Σ_i α(T, i).

  11. …continuation • The calculation of the α(t, i)'s is by induction on t. • The base case is α(1, i) = π_i b_i(O1). • From the recursion α(t+1, j) = [Σ_i α(t, i) a_ij] b_j(O_{t+1}) we get all the α's, working forward from t = 1 to t = T.
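
A minimal sketch of this recursion in code, using the same illustrative toy parameters as above (not values from the presentation):

```python
# Forward algorithm: alpha[t][i] = P(O_1..O_{t+1}, q_{t+1} = S_i) with 0-based t.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 1, 1, 0]

def forward(O, A, B, pi):
    N, T = len(pi), len(O)
    alpha = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # base case t = 1
        alpha[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                                # induction on t
        for j in range(N):
            alpha[t][j] = sum(alpha[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    return alpha

alpha = forward(O, A, B, pi)
print(sum(alpha[-1]))    # P(O | lambda) = sum_i alpha(T, i); matches brute force
```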

  12. Backward Algorithm • Another approach is the backward algorithm. • Specifically, we calculate β(t, i) = P(O_{t+1}, …, OT | q_t = Si) by the formula β(t, i) = Σ_j a_ij b_j(O_{t+1}) β(t+1, j), with β(T, i) = 1. • Again, by induction one can find the β(t, i)'s starting with the value t = T − 1, then for the value t = T − 2, and so on, eventually working back to t = 1.
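
A matching sketch of the backward pass, again with illustrative toy parameters rather than values from the presentation:

```python
# Backward algorithm: beta[t][i] = P(O_{t+2}..O_T | q_{t+1} = S_i) with 0-based t,
# filled in from t = T-1 back to t = 1.
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 1, 1, 0]

def backward(O, A, B):
    N, T = len(A), len(O)
    beta = [[1.0] * N for _ in range(T)]                 # beta(T, i) = 1
    for t in range(T - 2, -1, -1):                       # work back toward t = 1
        for i in range(N):
            beta[t][i] = sum(A[i][j] * B[j][O[t+1]] * beta[t+1][j] for j in range(N))
    return beta

beta = backward(O, A, B)
# Same likelihood as the forward pass: P(O) = sum_i pi_i * b_i(O_1) * beta(1, i).
print(sum(pi[i] * B[i][O[0]] * beta[0][i] for i in range(len(pi))))
```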

  13. Solution to Problem (2) • Given an observed sequence O = O1,…,OT of outputs, we want to compute efficiently a state sequence Q = q1,…,qT that has the highest conditional probability given O. • In other words, we want to find a Q that makes P[Q | O] maximal. • There may be many Q’s that make P[Q | O] maximal. We give an algorithm to find one of them.

  14. The Viterbi Algorithm • It is divided into two steps. First it finds max_Q P[Q | O], and then it backtracks to find a Q that realizes this maximum. • First define, for arbitrary t and i, δ(t, i) to be the maximum probability of all ways to end in state Si at time t and have observed the sequence O1 O2 … Ot. • Then max_Q P[Q and O] = max_i δ(T, i).

  15. - 2 - • But P[Q | O] = P[Q and O] / P[O]. • Since the denominator on the RHS does not depend on Q, maximizing P[Q | O] over Q is the same as maximizing P[Q and O]. • We calculate the δ(t, i)'s inductively: δ(1, i) = π_i b_i(O1) and δ(t+1, j) = [max_i δ(t, i) a_ij] b_j(O_{t+1}).

  16. - 3 - • Finally, we recover the q_t's as follows. Define q_T = argmax_i δ(T, i); this is the last state in the desired state sequence. • The remaining q_t for t < T are found recursively, working backwards, by putting q_t = argmax_i δ(t, i) a_{i, q_{t+1}}.
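
A sketch of the full Viterbi procedure as described on slides 14–16, including the backward recovery of the q_t's. The toy parameters are illustrative, not from the presentation.

```python
# Viterbi: delta[t][i] is the best joint probability of any state path ending in
# state S_i at time t together with the observations O_1..O_{t+1} (0-based t).
A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 1, 1, 0]

def viterbi(O, A, B, pi):
    N, T = len(pi), len(O)
    delta = [[0.0] * N for _ in range(T)]
    for i in range(N):                                   # delta(1, i)
        delta[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):                                # delta(t+1, j) recursion
        for j in range(N):
            delta[t][j] = max(delta[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    # Recover the states from the end backwards, as on the slide:
    # q_T = argmax_i delta(T, i), then q_t = argmax_i delta(t, i) * a_{i, q_{t+1}}.
    q = [max(range(N), key=lambda i: delta[T-1][i])]
    for t in range(T - 2, -1, -1):
        nxt = q[-1]
        q.append(max(range(N), key=lambda i: delta[t][i] * A[i][nxt]))
    return list(reversed(q)), max(delta[T-1])

path, score = viterbi(O, A, B, pi)
print(path, score)       # one optimal state sequence and max_Q P[Q and O]
```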

  17. Solution to Problem (3) • We are given a set of observed data from an HMM for which the topology is known. We wish to estimate the parameters of that HMM. • We briefly describe the intuition behind the Baum-Welch method of parameter estimation. • Assume that the number of observation symbols M and the number of states N are fixed at the outset. • The data we use to estimate the parameters constitute a set of observed sequences {O(d)}.

  18. The Baum-Welch Algorithm • We start by setting the parameters π_i, a_ij, b_i(k) at some initial values. • We then calculate, using these initial parameter values: 1) π_i* = the expected proportion of times in state Si at the first time point, given {O(d)}.

  19. - 2 - 2) a_ij* = E[N_ij | {O(d)}] / E[N_i | {O(d)}]  3) b_i*(k) = E[N_i(k) | {O(d)}] / E[N_i | {O(d)}], where N_ij is the random number of times q_t(d) = Si and q_{t+1}(d) = Sj for some d and t; N_i is the random number of times q_t(d) = Si for some d and t; and N_i(k) is the random number of times q_t(d) = Si and it emits symbol k, for some d and t.

  20. Upshot • It can be shown that if λ = (π_i, a_ij, b_i(k)) is replaced by λ* = (π_i*, a_ij*, b_i*(k)), then P[{O(d)} | λ*] ≥ P[{O(d)} | λ], with equality holding if and only if λ* = λ. • Thus successive iterations continually increase the probability of the data, given the model. Iterations continue until a local maximum of the probability is reached.
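
A minimal sketch of one such re-estimation step, using the expected-count ratios described on slides 18–19. It is simplified to a single observed sequence (the slides allow a set {O(d)}), and the toy parameters and sequence are my own illustration.

```python
# One Baum-Welch re-estimation step for a single observation sequence.
def forward(O, A, B, pi):
    N, T = len(pi), len(O)
    al = [[0.0] * N for _ in range(T)]
    for i in range(N):
        al[0][i] = pi[i] * B[i][O[0]]
    for t in range(1, T):
        for j in range(N):
            al[t][j] = sum(al[t-1][i] * A[i][j] for i in range(N)) * B[j][O[t]]
    return al

def backward(O, A, B):
    N, T = len(A), len(O)
    be = [[1.0] * N for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(N):
            be[t][i] = sum(A[i][j] * B[j][O[t+1]] * be[t+1][j] for j in range(N))
    return be

def baum_welch_step(O, A, B, pi):
    N, M, T = len(pi), len(B[0]), len(O)
    al, be = forward(O, A, B, pi), backward(O, A, B)
    PO = sum(al[T-1][i] for i in range(N))
    # gamma[t][i] = P(q_t = S_i | O); xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O)
    gamma = [[al[t][i] * be[t][i] / PO for i in range(N)] for t in range(T)]
    xi = [[[al[t][i] * A[i][j] * B[j][O[t+1]] * be[t+1][j] / PO for j in range(N)]
           for i in range(N)] for t in range(T - 1)]
    new_pi = gamma[0][:]                                     # state use at t = 1
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(N)] for i in range(N)]          # E[N_ij] / E[N_i]
    new_B = [[sum(gamma[t][i] for t in range(T) if O[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(M)] for i in range(N)]          # E[N_i(k)] / E[N_i]
    return new_A, new_B, new_pi

A  = [[0.7, 0.3], [0.4, 0.6]]
B  = [[0.9, 0.1], [0.2, 0.8]]
pi = [0.6, 0.4]
O  = [0, 1, 1, 0, 0, 1]
print(baum_welch_step(O, A, B, pi))
```

Iterating this step and recomputing P(O | λ) after each pass illustrates the monotone improvement described above, up to a local maximum.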
