
Hidden Markov Models



  1. Hidden Markov Models

  2. Overview • Markov models • Hidden Markov models (HMM) • Issues regarding HMM • Algorithmic approaches to the issues of HMM

  3. Markov Models • A Markov model is a finite state machine with N distinct states that begins (at time t = 1) in an initial state. • It moves from the current state to the next state according to the transition probabilities associated with the current state. • This kind of system is called a finite or discrete Markov model.
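
As a minimal sketch of such a chain (not from the slides; the two states and their probabilities are made-up illustrative values), the next state is drawn using only the transition probabilities of the current state:

```python
import random

# Illustrative two-state Markov model (states and probabilities are assumed,
# not taken from the slides).
transition = {
    "Rain": {"Rain": 0.7, "Sun": 0.3},
    "Sun":  {"Rain": 0.2, "Sun": 0.8},
}

def simulate(initial_state, n_steps):
    """Walk the chain: each next state depends only on the current state."""
    path = [initial_state]
    for _ in range(n_steps):
        current = path[-1]
        next_states = list(transition[current])
        probs = [transition[current][s] for s in next_states]
        path.append(random.choices(next_states, weights=probs)[0])
    return path

print(simulate("Sun", 10))
```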

  4. Hidden Markov Models • A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states. • In a regular Markov model the state is directly visible to the observer, so the state transition probabilities are the only parameters, whereas in an HMM the state is not visible but the output is.

  5. Hidden Markov Models - HMM • Diagram: hidden variables H1, H2, …, Hi, …, HL-1, HL emit the observed data X1, X2, …, Xi, …, XL-1, XL.

  6. Description • Formally, an HMM is defined by a tuple M = (∑, Q, A, E). • ∑ is an alphabet of symbols. • Q is a set of states; each state emits symbols from the alphabet ∑. • A = (ak,l) is a |Q| × |Q| matrix describing the probability of changing to state l when the HMM is in state k. • E = (ek(b)) is a |Q| × |∑| matrix describing the probability of emitting the symbol b during a step in which the HMM is in state k.
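
A minimal sketch of this M = (∑, Q, A, E) definition in Python, filled with the Fair Bet Casino parameters used later in the deck (the 0.9 / 0.1 transition probabilities are an assumption read off the 9/10 factors on slide 14):

```python
# A minimal encoding of an HMM M = (Sigma, Q, A, E).
Sigma = ["H", "T"]                    # alphabet of emitted symbols
Q = ["F", "B"]                        # hidden states: Fair, Biased

A = {                                 # A[k][l] = probability of moving from state k to state l
    "F": {"F": 0.9, "B": 0.1},
    "B": {"F": 0.1, "B": 0.9},
}

E = {                                 # E[k][b] = probability of emitting symbol b in state k
    "F": {"H": 0.5,  "T": 0.5},
    "B": {"H": 0.75, "T": 0.25},
}
```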

  7. Fair Bet Casino Problem • Given a sequence x = x1, x2, x3, …, xn of coin tosses (each heads or tails) made with two possible coins (F, fair, or B, biased) as input. • We need to find a sequence of states ∏ = ∏1, ∏2, ∏3, …, ∏n, with each ∏i being either F or B, indicating whether xi is the result of tossing the fair or the biased coin.

  8. Explanation • The above problem is ill-defined, because any sequence of coins, such as FFFF…F or BBB…B, could have generated the observed tosses. • We need a way to grade different coin sequences. • This ill-defined problem is therefore converted into the decoding problem under the HMM paradigm.

  9. Fair Bet Casino Problem • This is a sample HMM designed for the Fair Bet Casino problem. There are two states, F (fair) and B (biased). • Each state can emit either heads (H) or tails (T) with the probabilities given below.

  10. For the above problem we define the parameters as follows: the probability of getting heads or tails with the fair coin is 0.5. • The probability of getting heads with the biased coin is 0.75 and of getting tails is 0.25. • If the resulting sequence of tosses is X = x1, x2, x3, …, xn, then the probability that X was generated by the fair coin is P(X|fair coin) = ∏i=1..n p(xi) = 1/2^n, and P(X|biased coin) = 3^k/4^n, where k is the number of heads in X.

  11. If P(X|fair coin) > P(X|biased coin), then the dealer most likely used a fair coin. • If P(X|fair coin) < P(X|biased coin), then the dealer most likely used a biased coin. • The two probabilities are equal when k = n / log2 3. • If k < n / log2 3 the dealer most likely used the fair coin, otherwise the biased coin. • This follows from log2(P(X|fair coin) / P(X|biased coin)) = n − k·log2 3.
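
A small sketch of this decision rule, using the emission probabilities above; the toss strings passed to it are hypothetical examples:

```python
from math import log2

def classify(tosses):
    """Decide fair vs. biased from a string of 'H'/'T' tosses.

    P(X|fair) = (1/2)^n and P(X|biased) = 3^k / 4^n with k heads,
    so log2( P(X|fair) / P(X|biased) ) = n - k*log2(3).
    """
    n = len(tosses)
    k = tosses.count("H")
    log_odds = n - k * log2(3)
    return "fair coin" if log_odds > 0 else "biased coin"

print(classify("HTHHHTHTTH"))   # 6 heads out of 10 -> fair coin
print(classify("HHHHHHHHTH"))   # 9 heads out of 10 -> biased coin
```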

  12. A path ∏ = ∏1, ∏2, ∏3, …, ∏n is a sequence of states. If the dealer uses the fair coin for the first three and the last three tosses and the biased coin in between, the corresponding path is ∏ = FFFBBBBBFFF; suppose the resulting sequence of tosses is 01011101001. • The probability of xi being generated by state ∏i is denoted P(xi|∏i), as defined on the next slide.

  13. We write P(xi|∏i) for the probability that the symbol xi was emitted from state ∏i. • We write the transition probability as P(∏i → ∏i+1). • For the model above, the transition probabilities are 0.9 for staying with the same coin and 0.1 for switching coins (the 9/10 factors in the product on the next slide).

  14. The probability of generating X through the path ∏ can be calculated as • (1/2·1/2)(1/2·9/10)……(1/2·9/10)(1/2·9/10) ≈ 2.66 × 10^-6. • This probability should be the maximum; if it is not, ∏ is not the most probable path. • We need to select another sequence ∏ that achieves the maximum probability. • If we select ∏ = FFFBBBFFFFF, the probability is 3.54 × 10^-6.

  15. The probability that the sequence x was generated by the path ∏, given the model M, is P(x|∏) = ∏i P(xi|∏i) · P(∏i → ∏i+1). • A single guessed path is not an optimal solution: only the dealer knows the real sequence of states ∏ that emitted x. We therefore say that ∏ is hidden and attempt to solve the following decoding problem.
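
A minimal sketch of this joint probability, reusing the A and E dictionaries from the earlier sketch; the initial probability of 0.5 and the 0 = T, 1 = H encoding of the slide's toss sequence are assumptions, and the result reproduces the 2.66 × 10^-6 value quoted on slide 14:

```python
def path_probability(x, pi, A, E, initial=0.5):
    """P(x | pi): product of emission and transition probabilities along
    the path, starting from an assumed initial probability of 0.5."""
    p = initial
    for i, (symbol, state) in enumerate(zip(x, pi)):
        p *= E[state][symbol]               # emission P(x_i | pi_i)
        if i + 1 < len(pi):
            p *= A[state][pi[i + 1]]        # transition P(pi_i -> pi_i+1)
    return p

x  = "THTHHHTHTTH"    # the slide's toss sequence 01011101001 written with 0 = T, 1 = H
pi = "FFFBBBBBFFF"    # the path from slide 12
print(path_probability(x, pi, A, E))        # ~2.66e-06
```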

  16. Main issues • Evaluation problem: Given the HMM M = (∑, Q, A, E) and an observation sequence X = x1, x2, …, xk, calculate the probability that model M generated the sequence X. • Decoding problem: Given the HMM M = (∑, Q, A, E) and an observation sequence X = x1, x2, …, xk, calculate the most likely sequence of hidden states ∏ that generated the sequence X.

  17. Solution to the decoding problem • Decoding problem: the Viterbi algorithm. • In this algorithm we go through the observations from start to end, maintaining a state of the hidden machine for each observation. • We also record the overall probability, the Viterbi path (the sequence of states), and the Viterbi probability (the probability of the observed sequence along the Viterbi path). • The probability of a possible step, given its corresponding observation, is the transition probability times the emission probability.

  18. Viterbi algorithm as dynamic programming • Overall probability: multiply each new probability with the old one and then add the alternatives together. • Viterbi probability: take the highest next-step probability and multiply it with the current Viterbi probability. • Viterbi path: append the chosen next step to the Viterbi path.

  19. Viterbi algorithm • Here we use an HMM-inspired analog of the Manhattan grid for the decoding problem.

  20. To calculate P(X|∏) we need to set the edge weights in this graph so that the product of the edge weights along a path gives the probability of generating the sequence. • There are |Q|^2·(n−1) edges in the graph, where the weight of the edge from (k, i) to (l, i+1) is given by el(xi+1) · ak,l. • The probability sl,i+1 of the most probable path ending at a particular vertex (l, i+1) is calculated as sl,i+1 = maxk∈Q { sk,i · ak,l · el(xi+1) }.

  21. The decoding problem is now reduced to finding a longest path in a directed acyclic graph (DAG). • The length of a path is defined as the product of its edge weights, instead of the sum of weights used in standard dynamic programming algorithms. • Taking logarithms of the edge weights turns the product into a sum, reducing the problem to the previous case. • To find the most likely path, we compute, for every state k and position i, the probability of the most likely path ending at state k.

  22. The computations in the Viterbi algorithm are usually done with logarithmic scores Sk,i = log sk,i to avoid numerical underflow. • The Viterbi algorithm is essentially a search through the space of all possible paths in the graph for the one that maximizes the value of P(X|∏).
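
A compact sketch of the Viterbi algorithm with log scores, reusing the Q, A, E dictionaries from the earlier sketch; the uniform initial distribution over states is an assumption:

```python
from math import log, inf

def viterbi(x, Q, A, E):
    """Most likely hidden path for observations x, using log scores
    S[k][i] = log s_{k,i} to avoid numerical underflow."""
    n = len(x)
    S = {k: [-inf] * n for k in Q}        # best log score ending in state k at position i
    back = {k: [None] * n for k in Q}     # backpointers for path reconstruction
    for k in Q:                           # initialization: uniform start (assumption)
        S[k][0] = log(1.0 / len(Q)) + log(E[k][x[0]])
    for i in range(1, n):
        for l in Q:
            best_k = max(Q, key=lambda k: S[k][i - 1] + log(A[k][l]))
            S[l][i] = S[best_k][i - 1] + log(A[best_k][l]) + log(E[l][x[i]])
            back[l][i] = best_k
    # traceback from the best final state
    last = max(Q, key=lambda k: S[k][n - 1])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[path[-1]][i])
    return "".join(reversed(path))

print(viterbi("THTHHHTHTTH", Q, A, E))    # decode the slide's toss sequence
```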

  23. We can also calculate the probability that the HMM was in state k at time i: P(x, ∏i = k) = ∑ over all paths ∏ with ∏i = k of P(x, ∏). • The probability that the dealer had the biased coin at moment i is then given by P(∏i = B | x) = P(x, ∏i = B) / P(x).
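
The slide does not spell out how this sum over paths is computed; a sketch of one standard way to obtain the posterior, the forward-backward algorithm, is shown below (again reusing Q, A, E and assuming a uniform initial distribution):

```python
def forward_backward_posterior(x, Q, A, E):
    """Posterior P(pi_i = k | x) = P(x, pi_i = k) / P(x) via forward/backward sums."""
    n = len(x)
    fwd = [{} for _ in range(n)]
    bwd = [{} for _ in range(n)]
    for k in Q:
        fwd[0][k] = (1.0 / len(Q)) * E[k][x[0]]   # uniform start is an assumption
        bwd[n - 1][k] = 1.0
    for i in range(1, n):                          # forward pass
        for l in Q:
            fwd[i][l] = E[l][x[i]] * sum(fwd[i - 1][k] * A[k][l] for k in Q)
    for i in range(n - 2, -1, -1):                 # backward pass
        for k in Q:
            bwd[i][k] = sum(A[k][l] * E[l][x[i + 1]] * bwd[i + 1][l] for l in Q)
    p_x = sum(fwd[n - 1][k] for k in Q)            # total probability P(x)
    return [{k: fwd[i][k] * bwd[i][k] / p_x for k in Q} for i in range(n)]

posterior = forward_backward_posterior("THTHHHTHTTH", Q, A, E)
print([round(p["B"], 3) for p in posterior])       # P(biased coin at each moment i)
```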

  24. HMM Parameter Estimation • So far we have assumed that the transition and emission probabilities of the HMM are known, which makes it easy to calculate the hidden states from an observed sequence and those probabilities. • It is much more difficult to find the most probable path when both sets of probabilities are unknown. • Let Θ be the vector combining the unknown transition and emission probabilities of the HMM.

  25. We define P(x|Θ) as the probability of x given the assignment of parameters Θ. • Our goal is to find maxΘ P(x|Θ). • Instead of a single string x, we can use a sample of training sequences x1, x2, …, xm and maximize maxΘ ∏i=1..m P(xi|Θ).

  26. The common algorithms used for this problem are heuristics for parameter optimization. • If Ak,l is the number of transitions from state k to state l and Ek(b) is the number of times b is emitted from state k, then the reasonable estimators are ak,l = Ak,l / ∑q∈Q Ak,q and ek(b) = Ek(b) / ∑σ∈∑ Ek(σ).
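
A minimal sketch of these count-based estimators, assuming labeled training data in which the hidden state path is known for each observation sequence (the example pairs below are made up):

```python
from collections import defaultdict

def estimate_parameters(training_pairs, Q, Sigma):
    """Estimate a_{k,l} and e_k(b) from (observations, known state path) pairs:
    a_{k,l} = A_{k,l} / sum_q A_{k,q},  e_k(b) = E_k(b) / sum_sigma E_k(sigma)."""
    A_counts = defaultdict(lambda: defaultdict(int))
    E_counts = defaultdict(lambda: defaultdict(int))
    for x, pi in training_pairs:
        for i, (symbol, state) in enumerate(zip(x, pi)):
            E_counts[state][symbol] += 1
            if i + 1 < len(pi):
                A_counts[state][pi[i + 1]] += 1
    a = {k: {l: A_counts[k][l] / sum(A_counts[k].values()) for l in Q} for k in Q}
    e = {k: {b: E_counts[k][b] / sum(E_counts[k].values()) for b in Sigma} for k in Q}
    return a, e

# hypothetical labeled training data
pairs = [("THTHHHTHTTH", "FFFBBBBBFFF"), ("HHTHHHHTTH", "BBBBBBFFFF")]
a, e = estimate_parameters(pairs, ["F", "B"], ["H", "T"])
print(a["F"], e["B"])
```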

  27. Profile HMM Alignment • Given a set of functionally related biological sequences, pairwise alignments can be used to search a database for new family members. • This approach may fail, because distantly related sequences may have similarities too weak to pass the statistical significance test. • The family of related proteins is instead represented by their multiple alignment and the corresponding profile.

  28. A profile is represented in terms of the frequencies of nucleotides (or amino acids) at each position of the alignment. • An HMM can also be used for sequence comparison, in particular for aligning a sequence against a profile. • Such a profile HMM contains n sequentially linked match states M1, M2, …, Mn.
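
A small sketch of building such a profile (column-wise frequencies) from a multiple alignment; the toy nucleotide alignment is made up for illustration, with one match state per column:

```python
def build_profile(alignment):
    """Column-wise nucleotide frequencies of a multiple alignment
    (gaps ignored); each column corresponds to one match state M_i."""
    n_cols = len(alignment[0])
    profile = []
    for i in range(n_cols):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        counts = {b: column.count(b) for b in "ACGT"}
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in "ACGT"})
    return profile

# toy multiple alignment (made up for illustration)
alignment = ["ACG-T", "ACGCT", "A-GCT", "TCGCT"]
for i, col in enumerate(build_profile(alignment), start=1):
    print(f"M{i}:", col)
```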

  29. Profile HMM

  30. References • www.cedar.buffalo.edu/~govind/CS661/Lec12.ppt • www.bios.niu.edu/johns/bioinf.../Hidden%20Markov%20Models.ppt • www.mathcs.emory.edu/~cs153000/share/0123/book123.pdf

  31. Thank you
