
Hidden Markov Model Lecture #6



  1. Hidden Markov Model, Lecture #6. Background Readings: Chapters 3.1, 3.2 in the textbook, Biological Sequence Analysis, Durbin et al., 2001.

  2. Use of Markov Chains in Genome Search: Modeling CpG Islands. In human genomes the pair CG often transforms to (methyl-C)G, which in turn often transforms to TG. Hence the pair CG appears less frequently than expected from the independent frequencies of C and G alone. For biological reasons, this process is sometimes suppressed in short stretches of the genome, such as the start regions of many genes. These areas are called CpG islands (C-phosphate-G).

  3. Example: CpG Island (cont.) We consider two questions (and some variants). Question 1: Given a short stretch of genomic data, does it come from a CpG island? We solved Question 1 by using two models for DNA strings: a “+” model and a “–” model, for strings inside and outside CpG islands. Each model is a Markov chain over the states {A,C,G,T} with appropriate transition probabilities.
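
As an illustration of this two-model comparison, the following sketch scores a stretch by the log-odds of the “+” chain against the “–” chain, summed over consecutive letter pairs; a positive score favours the CpG-island model. The transition probabilities below are rough placeholders (not the estimated values from the textbook), and the function name log_odds is ours.

    import math

    # Illustrative transition probabilities p(next | current) for the "+" (CpG island)
    # and "-" (background) Markov chains; real values would be estimated from data.
    PLUS = {
        'A': {'A': 0.18, 'C': 0.27, 'G': 0.43, 'T': 0.12},
        'C': {'A': 0.17, 'C': 0.37, 'G': 0.27, 'T': 0.19},
        'G': {'A': 0.16, 'C': 0.34, 'G': 0.38, 'T': 0.12},
        'T': {'A': 0.08, 'C': 0.36, 'G': 0.38, 'T': 0.18},
    }
    MINUS = {
        'A': {'A': 0.30, 'C': 0.21, 'G': 0.28, 'T': 0.21},
        'C': {'A': 0.32, 'C': 0.30, 'G': 0.08, 'T': 0.30},
        'G': {'A': 0.25, 'C': 0.24, 'G': 0.30, 'T': 0.21},
        'T': {'A': 0.18, 'C': 0.24, 'G': 0.29, 'T': 0.29},
    }

    def log_odds(seq):
        """Sum of log[ p+(b|a) / p-(b|a) ] over consecutive pairs ab; positive favours a CpG island."""
        return sum(math.log(PLUS[a][b] / MINUS[a][b]) for a, b in zip(seq, seq[1:]))

    print(log_odds("CGCGCGCG"))   # clearly positive: CpG-island-like
    print(log_odds("ATATATAT"))   # clearly negative: background-like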

  4. CpG Island: Question 2. Question 2: Given a long piece of genomic data, does it contain CpG islands, and where? To solve this question, we need to decide which parts of a given long sequence of letters are more likely to come from the “+” model, and which parts are more likely to come from the “–” model.

  5. Model for Question 2. Given a long genomic string with possible CpG islands, we define a Markov chain over 8 states, A+, C+, G+, T+, A–, C–, G–, T–, all interconnected (hence it is ergodic). The problem is that we do not know the sequence of states that is traversed, only the sequence of letters. Therefore we use a Hidden Markov Model, which we define and study next.

  6. Hidden Markov Model. [Figure: a chain of states S1, S2, …, SL-1, SL with transitions M between consecutive states and an emitted letter xi below each state Si.] An HMM consists of a Markov chain over a set of states, and for each state s and symbol x, an emission probability p(Xi=x | Si=s). Notations: Markov chain transition probabilities: p(Si+1=t | Si=s) = mst. Emission probabilities: p(Xi=b | Si=s) = es(b).
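
The two ingredients can be made concrete with a small generative sketch: walk the Markov chain according to the transition probabilities mst and emit a letter according to es(b) at each visited state. The two states (a “+”-like state and a “–”-like background state), the alphabet, and all numbers below are made up for illustration.

    import random

    # Hypothetical two-state HMM ("+" = CpG-like, "-" = background) with made-up numbers.
    STATES = ['+', '-']
    ALPHABET = ['A', 'C', 'G', 'T']
    M = {'+': {'+': 0.90, '-': 0.10},                         # m_st = p(S_{i+1}=t | S_i=s)
         '-': {'+': 0.05, '-': 0.95}}
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},   # e_s(b) = p(X_i=b | S_i=s)
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}
    INIT = {'+': 0.5, '-': 0.5}                               # distribution of the first state S_1

    def sample(length):
        """Draw a state path and an emitted letter sequence of the given length."""
        states, letters = [], []
        s = random.choices(STATES, weights=[INIT[t] for t in STATES])[0]
        for _ in range(length):
            states.append(s)
            letters.append(random.choices(ALPHABET, weights=[E[s][b] for b in ALPHABET])[0])
            s = random.choices(STATES, weights=[M[s][t] for t in STATES])[0]
        return states, letters

    path, seq = sample(20)
    print("".join(path))
    print("".join(seq))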

  7. Hidden Markov Model. The probability of the chain S and the emitted letters X is: p(s1,…,sL, x1,…,xL) = ∏i=1..L msi-1,si · esi(xi), with the convention that ms0,s1 denotes the probability of the initial state s1.
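
Given concrete matrices, the joint probability of a particular state path and emitted string is just the product of the corresponding factors. A minimal sketch, assuming a hypothetical two-state model with an explicit initial distribution for S1:

    # Hypothetical two-state parameters of the same shape as in the previous sketch.
    M = {'+': {'+': 0.90, '-': 0.10}, '-': {'+': 0.05, '-': 0.95}}        # m_st
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}               # e_s(b)
    INIT = {'+': 0.5, '-': 0.5}                                           # p(S_1 = s)

    def joint_prob(states, letters):
        """p(s_1..s_L, x_1..x_L) = p(s_1) e_{s_1}(x_1) * prod_{i>1} m_{s_{i-1},s_i} e_{s_i}(x_i)."""
        p = INIT[states[0]] * E[states[0]][letters[0]]
        for prev, cur, x in zip(states, states[1:], letters[1:]):
            p *= M[prev][cur] * E[cur][x]
        return p

    print(joint_prob(['+', '+', '-'], ['C', 'G', 'A']))   # 0.5*0.35 * 0.9*0.35 * 0.1*0.3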

  8. Probability distribution defined by HMM. Claim: Let M = (mst) and Es = (es(b)) be given stochastic matrices. Then for each fixed L > 0, the function p defined by p(s1,…,sL, x1,…,xL) = ∏i=1..L msi-1,si · esi(xi) is a probability distribution over all state and output sequences of length L. That is: Σs1,…,sL Σx1,…,xL p(s1,…,sL, x1,…,xL) = 1.

  9. Probability distribution defined by HMM. Proof by induction on L. For L = 1: Σs1 Σx1 ms0,s1 · es1(x1) = Σs1 ms0,s1 (Σx1 es1(x1)) = Σs1 ms0,s1 = 1, since the emission matrix Es1 and the transition matrix M are stochastic (each row sums to 1).

  10. Probability distribution defined by HMM. Induction step: assume correctness for L and prove for L+1: Σs1..sL+1 Σx1..xL+1 p(s1,…,sL+1, x1,…,xL+1) = Σs1..sL Σx1..xL p(s1,…,sL, x1,…,xL) · ΣsL+1 msL,sL+1 (ΣxL+1 esL+1(xL+1)). The two rightmost sums each equal 1 because the matrices are stochastic, and the remaining sum equals 1 by the induction hypothesis.

  11. Independence properties in HMM. We would like the HMM to satisfy certain independence properties, e.g.: 1. The distribution of the state Sk is completely determined by the identity of the preceding state sk-1. 2. The distribution of the emitted letter Xk is completely determined by the emitting state sk. In the next slides we formally prove that property 2 is implied by the probability distribution of the HMM which we just defined.

  12. Independence of emission probabilities. Claim: the following equality holds: p(Xk=xk | x1,..,xk-1, xk+1,..,xL, s1,..,sk,..,sL) = esk(xk).

  13. Independence of emission probabilities. Proof of claim: we use the definition of conditional probability, P(A|B) = P(A,B)/P(B), where p(A,B) denotes p(A∩B). A is the event Xk=xk (the k-th output is xk). B is the event which specifies the entire sequence except Xk: (X1=x1,.., Xk-1=xk-1, Xk+1=xk+1,.., XL=xL, S1=s1,.., SL=sL). (A,B) is the event (X1=x1,.., XL=xL, S1=s1,.., SL=sL).

  14. Independence of emission probabilities. Proof (cont.): p(A,B) = p(x1,…,xL, s1,…,sL) is a product of transition and emission factors in which esk(xk) appears exactly once, so we can write p(A,B) = esk(xk)·C, where C does not depend on the value of xk. Summing over all possible values b of Xk: p(B) = Σb p(x1,.., xk-1, b, xk+1,.., xL, s1,.., sL) = C · Σb esk(b) = C.

  15. Independence of emission probabilities. Proof (end): from the previous equalities we have p(B) = C = p(A,B)/esk(xk). Thus we conclude: P(A|B) = P(A,B)/P(B) = esk(xk). QED

  16. Independence of emission probabilities. Exercise: using the definition of conditional probability, P(A|B) = P(A,B)/P(B), prove formally that for any set of constraints B ⊆ {X1=x1,.., Xi-1=xi-1, Xi+1=xi+1,.., XL=xL, S1=s1,.., Si=si,.., SL=sL} such that “Si=si” ∈ B, it holds that p(Xi=xi | B) = esi(xi). Hint: express the probabilities as sums of p(S,X) over all possible S and X.

  17. Hidden Markov Model: three questions of interest. Given the “visible” sequence x = (x1,…,xL), find: 1. A most probable (hidden) path. 2. The probability of x. 3. For each i = 1,..,L and for each state k, the probability that si = k.

  18. 1. Most probable state path. Given an output sequence x = (x1,…,xL), a most probable path s* = (s*1,…,s*L) is one which maximizes p(s|x).

  19. Most probable path (cont.) Since p(s|x) = p(s,x)/p(x) and p(x) does not depend on s, we need to find an s which maximizes p(s,x). We use a dynamic programming (DP) algorithm, called Viterbi’s algorithm, which we present next.

  20. Viterbi’s algorithm for a most probable path. The task: compute a most probable path s* and its probability maxs p(s1,…,sL; x1,…,xL). Let the states be {1,…,m}. Idea: for i = 1,…,L and for each state l, compute vl(i) = the probability p(s1,..,si; x1,..,xi | si=l) of a most probable path up to position i which ends in state l.

  21. Viterbi’s algorithm for a most probable path. vl(i) = the probability p(s1,..,si; x1,..,xi | si=l) of a most probable path up to position i which ends in state l. For i = 1,…,L and for each state l we have the recursion: vl(i) = el(xi) · maxk { vk(i-1) mkl }.

  22. Viterbi’s algorithm. We add the special initial state 0. Initialization: v0(0) = 1, vk(0) = 0 for k > 0. For i = 1 to L, for each state l: vl(i) = el(xi) · maxk { vk(i-1) mkl }, and ptri(l) = argmaxk { vk(i-1) mkl } [storing the previous state for reconstructing the path]. Termination and result: p(s1*,…,sL*; x1,…,xL) = maxk { vk(L) }; the path s* is read off by following the pointers ptr back from the maximizing final state.
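
A runnable sketch of this algorithm for a hypothetical two-state HMM (the parameters are made up; log probabilities are used only to avoid numerical underflow, and the traceback follows the stored pointers to recover the path s*):

    import math

    # Hypothetical two-state HMM; state 0 of the slide plays the role of M0 below.
    STATES = ['+', '-']
    M0 = {'+': 0.5, '-': 0.5}                              # m_{0,l}: transition out of the initial state
    M = {'+': {'+': 0.90, '-': 0.10}, '-': {'+': 0.05, '-': 0.95}}
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}

    def viterbi(x):
        """Return (max_s p(s, x), a most probable state path s*)."""
        # v[l] holds log v_l(i) for the current position i.
        v = {l: math.log(M0[l] * E[l][x[0]]) for l in STATES}
        ptrs = []                                          # ptrs[i-1][l] = argmax predecessor of l
        for xi in x[1:]:
            ptr, nxt = {}, {}
            for l in STATES:
                best = max(STATES, key=lambda k: v[k] + math.log(M[k][l]))
                ptr[l] = best
                nxt[l] = v[best] + math.log(M[best][l]) + math.log(E[l][xi])
            ptrs.append(ptr)
            v = nxt
        # Termination: best final state, then follow the pointers backwards.
        last = max(STATES, key=lambda l: v[l])
        path = [last]
        for ptr in reversed(ptrs):
            path.append(ptr[path[-1]])
        return math.exp(v[last]), path[::-1]

    prob, path = viterbi("CGCGAATT")
    print(prob, "".join(path))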

  23. 2. Computing p(x). Given an output sequence x = (x1,…,xL), compute the probability that this sequence was generated by the given HMM: p(x) = Σs p(s,x), the summation taken over all state-paths s generating x.

  24. Forward algorithm for computing p(x). The task: compute p(x) = Σs p(s,x). Idea: for i = 1,…,L and for each state l, compute Fl(i) = p(x1,…,xi; si=l), the probability of all the paths which emit (x1,..,xi) and end in state si=l. Use the recursive formula: Fl(i) = el(xi) · Σk Fk(i-1) mkl.

  25. Forward algorithm for computing p(x). Similar to Viterbi’s algorithm (use a sum instead of a maximum). Initialization: F0(0) := 1, Fk(0) := 0 for k > 0. For i = 1 to L, for each state l: Fl(i) = el(xi) · Σk Fk(i-1) mkl. Result: p(x1,…,xL) = Σk Fk(L).
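
The corresponding sketch for the forward algorithm, with the same hypothetical parameters; it mirrors the Viterbi sketch with the maximum replaced by a sum (a production implementation would use log-space or scaling to avoid underflow on long sequences):

    # Hypothetical two-state HMM parameters, as in the earlier sketches.
    STATES = ['+', '-']
    M0 = {'+': 0.5, '-': 0.5}                              # transition out of the initial state 0
    M = {'+': {'+': 0.90, '-': 0.10}, '-': {'+': 0.05, '-': 0.95}}
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}

    def forward(x):
        """Return (p(x), table F where F[i-1][l] = F_l(i) = p(x_1..x_i, S_i=l))."""
        F = [{l: M0[l] * E[l][x[0]] for l in STATES}]                       # F_l(1)
        for xi in x[1:]:
            F.append({l: E[l][xi] * sum(F[-1][k] * M[k][l] for k in STATES)
                      for l in STATES})                                     # F_l(i) recursion
        return sum(F[-1].values()), F                                       # p(x) = sum_k F_k(L)

    px, F = forward("CGCGAATT")
    print(px)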

  26. 3. The distribution of Si, given x. Given an output sequence x = (x1,…,xL), compute, for each i = 1,…,L and for each state k, the probability that si = k. This helps to answer queries like: what is the probability that position i is in a CpG island, etc.

  27. Solution in two stages. 1. For a fixed i and each state k, an algorithm to compute p(si=k | x1,…,xL). 2. An algorithm which performs this task for every i = 1,..,L, without repeating the first task L times.

  28. Computing for a fixed i: by the definition of conditional probability, p(Si=l | x1,…,xL) = p(x1,…,xL, Si=l) / p(x1,…,xL), so it suffices to compute the joint probability p(x1,…,xL, Si=l) for each state l (p(x1,…,xL) is given by the forward algorithm).

  29. Computing for a fixed i (cont.): p(x1,…,xL, Si=l) = p(x1,…,xi, Si=l) · p(xi+1,…,xL | x1,…,xi, Si=l) (by the equality p(A,B) = p(A)·p(B|A)). p(x1,…,xi, Si=l) = Fl(i), which is computed by the forward algorithm.

  30. B(si): the Backward algorithm. Recall: p(x1,…,xL, Si=l) = p(x1,…,xi, Si=l) · p(xi+1,…,xL | x1,…,xi, Si=l). We are left with the task of computing, by the backward algorithm, Bl(i) ≡ p(xi+1,…,xL | x1,…,xi, Si=l), which gives the desired result: p(x1,…,xL, Si=l) = p(x1,…,xi, Si=l) · p(xi+1,…,xL | Si=l) ≡ Fl(i)·Bl(i).

  31. B(si): the Backward algorithm. From the probability distribution of the Hidden Markov Model and the definition of conditional probability: Bl(i) = p(xi+1,…,xL | x1,…,xi, Si=l) = p(xi+1,…,xL | Si=l) = Σk mlk · ek(xi+1) · p(xi+2,…,xL | Si+1=k).

  32. B(si): the backward algorithm (cont.) Thus we compute Bl(i) from the values of Bk(i+1) for all states k, using the backward recursion: Bl(i) = Σk mlk · ek(xi+1) · Bk(i+1).

  33. B(si): the backward algorithm (end). First step, step L-1: compute Bl(L-1) for each possible state l: Bl(L-1) = Σk mlk · ek(xL). Then, for i = L-2 down to 1, for each possible state l, compute Bl(i) from the values of Bk(i+1): Bl(i) = Σk mlk · ek(xi+1) · Bk(i+1).
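
A matching sketch of the backward recursion with the same hypothetical parameters; it is written with the equivalent convention Bl(L) = 1, so that the step computing Bl(L-1) = Σk mlk · ek(xL) is just the first iteration of the loop:

    # Hypothetical two-state HMM parameters, as in the earlier sketches.
    STATES = ['+', '-']
    M = {'+': {'+': 0.90, '-': 0.10}, '-': {'+': 0.05, '-': 0.95}}
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}

    def backward(x):
        """Return table B where B[i-1][l] = B_l(i) = p(x_{i+1}..x_L | S_i=l)."""
        L = len(x)
        B = [dict.fromkeys(STATES, 1.0)]                   # B_l(L) = 1 (empty remaining suffix)
        for i in range(L - 1, 0, -1):                      # fill B_l(i) for i = L-1 down to 1
            B.insert(0, {l: sum(M[l][k] * E[k][x[i]] * B[0][k] for k in STATES)
                         for l in STATES})
        return B

    B = backward("CGCGAATT")
    print(B[0])                                            # B_l(1) for each state l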

  34. The combined answer. 1. To compute the probability that Si=l for all states l, given x = (x1,…,xL): run the forward algorithm to compute Fl(i) = P(x1,…,xi, Si=l) and the backward algorithm to compute Bl(i) = P(xi+1,…,xL | Si=l); the product Fl(i)·Bl(i) = p(x1,…,xL, Si=l), and dividing it by p(x) gives p(Si=l | x). 2. To compute these probabilities for every i, simply run the forward and backward algorithms once, storing Fl(i) and Bl(i) for all i and l, and compute Fl(i)·Bl(i) for all i and l.
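
Putting the pieces together, one forward pass and one backward pass give the posterior p(Si=l | x) = Fl(i)·Bl(i) / p(x) for every i and l at once. A self-contained sketch with the same hypothetical two-state parameters:

    # Hypothetical two-state HMM parameters, as in the earlier sketches.
    STATES = ['+', '-']
    M0 = {'+': 0.5, '-': 0.5}
    M = {'+': {'+': 0.90, '-': 0.10}, '-': {'+': 0.05, '-': 0.95}}
    E = {'+': {'A': 0.15, 'C': 0.35, 'G': 0.35, 'T': 0.15},
         '-': {'A': 0.30, 'C': 0.20, 'G': 0.20, 'T': 0.30}}

    def posteriors(x):
        """p(S_i=l | x) for every position i and state l, from one forward and one backward pass."""
        L = len(x)
        F = [{l: M0[l] * E[l][x[0]] for l in STATES}]                       # forward table F_l(i)
        for xi in x[1:]:
            F.append({l: E[l][xi] * sum(F[-1][k] * M[k][l] for k in STATES) for l in STATES})
        B = [dict.fromkeys(STATES, 1.0)]                                    # backward table, B_l(L) = 1
        for i in range(L - 1, 0, -1):
            B.insert(0, {l: sum(M[l][k] * E[k][x[i]] * B[0][k] for k in STATES) for l in STATES})
        px = sum(F[-1].values())                                            # p(x)
        return [{l: F[i][l] * B[i][l] / px for l in STATES} for i in range(L)]

    for i, dist in enumerate(posteriors("CGCGAATT"), start=1):
        print(i, dist)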

  35. Time and space complexity of the forward/backward algorithms. Time complexity is O(m²L), where m is the number of states: it is linear in the length of the chain, provided the number of states is a constant. Space complexity is also O(m²L).
