
Hidden Markov Models



  1. Hidden Markov Models

  2. Overview • Markov models • Hidden Markov models (HMM) • Issues regarding HMM • Algorithmic approaches to the issues of HMM

  3. Markov Models • A Markov model is a finite state machine with N distinct states that begins (at time t = 1) in an initial state. • It moves from the current state to the next state according to the transition probabilities associated with the current state. • This kind of system is called a finite or discrete Markov model.
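
As a minimal sketch of such a chain (not from the slides; the two states and their probabilities are made-up illustrative values), the next state is drawn using only the transition probabilities of the current state:

```python
import random

# Illustrative two-state Markov model (states and probabilities are assumed,
# not taken from the slides).
transition = {
    "Rain": {"Rain": 0.7, "Sun": 0.3},
    "Sun":  {"Rain": 0.2, "Sun": 0.8},
}

def simulate(initial_state, n_steps):
    """Walk the chain: each next state depends only on the current state."""
    path = [initial_state]
    for _ in range(n_steps):
        current = path[-1]
        next_states = list(transition[current])
        probs = [transition[current][s] for s in next_states]
        path.append(random.choices(next_states, weights=probs)[0])
    return path

print(simulate("Sun", 10))
```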

  4. Hidden Markov Models • A Hidden Markov model is a statistical model in which the system being modelled is assumed to be a Markov process with unobserved (hidden) states. • In a regular Markov model the state is directly visible to the observer, so the state transition probabilities are the only parameters, whereas in an HMM the state is not visible but the output is.

  5. Hidden Markov Models - HMM • Diagram: hidden variables H1, H2, …, Hi, …, HL-1, HL emit the observed data X1, X2, …, Xi, …, XL-1, XL.

  6. Description • Formally, an HMM is defined by a tuple M = (∑, Q, A, E). • ∑ is an alphabet of symbols. • Q is a set of states; each state emits symbols from the alphabet ∑. • A = (ak,l) is a |Q| × |Q| matrix describing the probability of changing to state l when the HMM is in state k. • E = (ek(b)) is a |Q| × |∑| matrix describing the probability of emitting the symbol b during a step in which the HMM is in state k.
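
A minimal sketch of this M = (∑, Q, A, E) definition in Python, filled with the Fair Bet Casino parameters used later in the deck (the 0.9 / 0.1 transition probabilities are an assumption read off the 9/10 factors on slide 14):

```python
# A minimal encoding of an HMM M = (Sigma, Q, A, E).
Sigma = ["H", "T"]                    # alphabet of emitted symbols
Q = ["F", "B"]                        # hidden states: Fair, Biased

A = {                                 # A[k][l] = probability of moving from state k to state l
    "F": {"F": 0.9, "B": 0.1},
    "B": {"F": 0.1, "B": 0.9},
}

E = {                                 # E[k][b] = probability of emitting symbol b in state k
    "F": {"H": 0.5,  "T": 0.5},
    "B": {"H": 0.75, "T": 0.25},
}
```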

  7. Fair Bet Casino Problem • Given a sequence x = x1, x2, x3, …, xn of coin tosses (each heads or tails) made with two possible coins (F, fair, or B, biased) as input. • We need to find a sequence of states ∏ = ∏1, ∏2, ∏3, …, ∏n, with each ∏i being either F or B, indicating whether xi is the result of tossing the fair or the biased coin.

  8. Explanation • The above problem is ill-defined, because any sequence of coins, such as FFFF…F or BBB…B, could have generated the observed tosses. • We need a way to grade different coin sequences. • This ill-defined problem is therefore converted into the decoding problem under the HMM paradigm.

  9. Fair Bet Casino Problem • This is a sample HMM designed for the Fair Bet Casino problem. There are two states, F (fair) and B (biased). • Each state can emit either heads (H) or tails (T) with the probabilities given below.

  10. For the above problem we define the parameters as follows: the probability of getting heads or tails with the fair coin is 0.5. • The probability of getting heads with the biased coin is 0.75 and of getting tails is 0.25. • If the resulting sequence of tosses is X = x1, x2, x3, …, xn, then the probability that X was generated by the fair coin is P(X|fair coin) = ∏i=1..n p(xi) = 1/2^n, and P(X|biased coin) = 3^k/4^n, where k is the number of heads in X.

  11. If P(X|fair coin) > P(X|biased coin), then the dealer most likely used a fair coin. • If P(X|fair coin) < P(X|biased coin), then the dealer most likely used a biased coin. • The two probabilities are equal when k = n / log2 3. • If k < n / log2 3 the dealer most likely used the fair coin, otherwise the biased coin. • This follows from log2(P(X|fair coin) / P(X|biased coin)) = n − k·log2 3.
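
A small sketch of this decision rule, using the emission probabilities above; the toss strings passed to it are hypothetical examples:

```python
from math import log2

def classify(tosses):
    """Decide fair vs. biased from a string of 'H'/'T' tosses.

    P(X|fair) = (1/2)^n and P(X|biased) = 3^k / 4^n with k heads,
    so log2( P(X|fair) / P(X|biased) ) = n - k*log2(3).
    """
    n = len(tosses)
    k = tosses.count("H")
    log_odds = n - k * log2(3)
    return "fair coin" if log_odds > 0 else "biased coin"

print(classify("HTHHHTHTTH"))   # 6 heads out of 10 -> fair coin
print(classify("HHHHHHHHTH"))   # 9 heads out of 10 -> biased coin
```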

  12. A path ∏ = ∏1, ∏2, ∏3, …, ∏n is a sequence of states. If the dealer uses the fair coin for the first three and the last three tosses and the biased coin in between, the corresponding path is ∏ = FFFBBBBBFFF; suppose the resulting sequence of tosses is 01011101001. • The probability of xi being generated by state ∏i is denoted P(xi|∏i), as defined on the next slide.

  13. We write P(xi|∏i) for the probability that the symbol xi was emitted from state ∏i. • We write the transition probability as P(∏i → ∏i+1). • For the model above, the transition probabilities are 0.9 for staying with the same coin and 0.1 for switching coins (the 9/10 factors in the product on the next slide).

  14. The probability of generating X through the path ∏ can be calculated as • (1/2·1/2)(1/2·9/10)……(1/2·9/10)(1/2·9/10) ≈ 2.66 × 10^-6. • This probability should be the maximum; if it is not, ∏ is not the most probable path. • We need to select another sequence ∏ that achieves the maximum probability. • If we select ∏ = FFFBBBFFFFF, the probability is 3.54 × 10^-6.

  15. The probability that the sequence x was generated by the path ∏, given the model M, is P(x|∏) = ∏i P(xi|∏i) · P(∏i → ∏i+1). • A single guessed path is not an optimal solution: only the dealer knows the real sequence of states ∏ that emitted x. We therefore say that ∏ is hidden and attempt to solve the following decoding problem.
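
A minimal sketch of this joint probability, reusing the A and E dictionaries from the earlier sketch; the initial probability of 0.5 and the 0 = T, 1 = H encoding of the slide's toss sequence are assumptions, and the result reproduces the 2.66 × 10^-6 value quoted on slide 14:

```python
def path_probability(x, pi, A, E, initial=0.5):
    """P(x | pi): product of emission and transition probabilities along
    the path, starting from an assumed initial probability of 0.5."""
    p = initial
    for i, (symbol, state) in enumerate(zip(x, pi)):
        p *= E[state][symbol]               # emission P(x_i | pi_i)
        if i + 1 < len(pi):
            p *= A[state][pi[i + 1]]        # transition P(pi_i -> pi_i+1)
    return p

x  = "THTHHHTHTTH"    # the slide's toss sequence 01011101001 written with 0 = T, 1 = H
pi = "FFFBBBBBFFF"    # the path from slide 12
print(path_probability(x, pi, A, E))        # ~2.66e-06
```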

  16. Main issues • Evaluation problem: Given the HMM M = (∑, Q, A, E) and an observation sequence X = x1, x2, …, xk, calculate the probability that model M generated the sequence X. • Decoding problem: Given the HMM M = (∑, Q, A, E) and an observation sequence X = x1, x2, …, xk, calculate the most likely sequence of hidden states ∏ that generated the sequence X.

  17. Solution to the decoding problem • Decoding problem: the Viterbi algorithm. • In this algorithm we go through the observations from start to end, maintaining a state of the hidden machine for each observation. • We also record the overall probability, the Viterbi path (the sequence of states), and the Viterbi probability (the probability of the observed sequence along the Viterbi path). • The probability of a possible step, given its corresponding observation, is the transition probability times the emission probability.

  18. Viterbi algorithm as dynamic programming • Overall probability: multiply each new probability with the old one and then add the alternatives together. • Viterbi probability: take the highest next-step probability and multiply it with the current Viterbi probability. • Viterbi path: append the chosen next step to the Viterbi path.

  19. Viterbi algorithm • Here we use an HMM-inspired analog of the Manhattan grid for the decoding problem.

  20. To calculate P(X|∏) we need to set the edge weights in this graph so that the product of the edge weights along a path gives the probability of generating the sequence. • There are |Q|^2·(n−1) edges in the graph, where the weight of the edge from (k, i) to (l, i+1) is given by el(xi+1) · ak,l. • The probability sl,i+1 of the most probable path ending at a particular vertex (l, i+1) is calculated as sl,i+1 = maxk∈Q { sk,i · ak,l · el(xi+1) }.

  21. The decoding problem is now reduced to finding a longest path in a directed acyclic graph (DAG). • The length of a path is defined as the product of its edge weights, instead of the sum of weights used in standard dynamic programming algorithms. • Taking logarithms of the edge weights turns the product into a sum, reducing the problem to the previous case. • To find the most likely path, we compute, for every state k and position i, the probability of the most likely path ending at state k.

  22. The computations in the Viterbi algorithm are usually done with logarithmic scores Sk,i = log sk,i to avoid numerical underflow. • The Viterbi algorithm is essentially a search through the space of all possible paths in the graph for the one that maximizes the value of P(X|∏).
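
A compact sketch of the Viterbi algorithm with log scores, reusing the Q, A, E dictionaries from the earlier sketch; the uniform initial distribution over states is an assumption:

```python
from math import log, inf

def viterbi(x, Q, A, E):
    """Most likely hidden path for observations x, using log scores
    S[k][i] = log s_{k,i} to avoid numerical underflow."""
    n = len(x)
    S = {k: [-inf] * n for k in Q}        # best log score ending in state k at position i
    back = {k: [None] * n for k in Q}     # backpointers for path reconstruction
    for k in Q:                           # initialization: uniform start (assumption)
        S[k][0] = log(1.0 / len(Q)) + log(E[k][x[0]])
    for i in range(1, n):
        for l in Q:
            best_k = max(Q, key=lambda k: S[k][i - 1] + log(A[k][l]))
            S[l][i] = S[best_k][i - 1] + log(A[best_k][l]) + log(E[l][x[i]])
            back[l][i] = best_k
    # traceback from the best final state
    last = max(Q, key=lambda k: S[k][n - 1])
    path = [last]
    for i in range(n - 1, 0, -1):
        path.append(back[path[-1]][i])
    return "".join(reversed(path))

print(viterbi("THTHHHTHTTH", Q, A, E))    # decode the slide's toss sequence
```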

  23. We can also calculate the probability that the HMM was in state k at time i: P(x, ∏i = k) = ∑ over all paths ∏ with ∏i = k of P(x, ∏). • The probability that the dealer had the biased coin at moment i is then given by P(∏i = B | x) = P(x, ∏i = B) / P(x).
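
The slide does not spell out how this sum over paths is computed; a sketch of one standard way to obtain the posterior, the forward-backward algorithm, is shown below (again reusing Q, A, E and assuming a uniform initial distribution):

```python
def forward_backward_posterior(x, Q, A, E):
    """Posterior P(pi_i = k | x) = P(x, pi_i = k) / P(x) via forward/backward sums."""
    n = len(x)
    fwd = [{} for _ in range(n)]
    bwd = [{} for _ in range(n)]
    for k in Q:
        fwd[0][k] = (1.0 / len(Q)) * E[k][x[0]]   # uniform start is an assumption
        bwd[n - 1][k] = 1.0
    for i in range(1, n):                          # forward pass
        for l in Q:
            fwd[i][l] = E[l][x[i]] * sum(fwd[i - 1][k] * A[k][l] for k in Q)
    for i in range(n - 2, -1, -1):                 # backward pass
        for k in Q:
            bwd[i][k] = sum(A[k][l] * E[l][x[i + 1]] * bwd[i + 1][l] for l in Q)
    p_x = sum(fwd[n - 1][k] for k in Q)            # total probability P(x)
    return [{k: fwd[i][k] * bwd[i][k] / p_x for k in Q} for i in range(n)]

posterior = forward_backward_posterior("THTHHHTHTTH", Q, A, E)
print([round(p["B"], 3) for p in posterior])       # P(biased coin at each moment i)
```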

  24. HMM Parameter Estimation • So far we have assumed that the transition and emission probabilities of the HMM are known, which makes it easy to calculate the hidden states from an observed sequence and those probabilities. • It is much more difficult to find the most probable path when both sets of probabilities are unknown. • Let Θ be the vector combining the unknown transition and emission probabilities of the HMM.

  25. We define P(x|Θ) as the probability of x given the assignment of parameters Θ. • Our goal is to find maxΘ P(x|Θ). • Instead of a single string x, we can use a sample of training sequences x1, x2, …, xm and maximize maxΘ ∏i=1..m P(xi|Θ).

  26. The common algorithms used for this problem are heuristics for parameter optimization. • If Ak,l is the number of transitions from state k to state l and Ek(b) is the number of times b is emitted from state k, then the reasonable estimators are ak,l = Ak,l / ∑q∈Q Ak,q and ek(b) = Ek(b) / ∑σ∈∑ Ek(σ).
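
A minimal sketch of these count-based estimators, assuming labeled training data in which the hidden state path is known for each observation sequence (the example pairs below are made up):

```python
from collections import defaultdict

def estimate_parameters(training_pairs, Q, Sigma):
    """Estimate a_{k,l} and e_k(b) from (observations, known state path) pairs:
    a_{k,l} = A_{k,l} / sum_q A_{k,q},  e_k(b) = E_k(b) / sum_sigma E_k(sigma)."""
    A_counts = defaultdict(lambda: defaultdict(int))
    E_counts = defaultdict(lambda: defaultdict(int))
    for x, pi in training_pairs:
        for i, (symbol, state) in enumerate(zip(x, pi)):
            E_counts[state][symbol] += 1
            if i + 1 < len(pi):
                A_counts[state][pi[i + 1]] += 1
    a = {k: {l: A_counts[k][l] / sum(A_counts[k].values()) for l in Q} for k in Q}
    e = {k: {b: E_counts[k][b] / sum(E_counts[k].values()) for b in Sigma} for k in Q}
    return a, e

# hypothetical labeled training data
pairs = [("THTHHHTHTTH", "FFFBBBBBFFF"), ("HHTHHHHTTH", "BBBBBBFFFF")]
a, e = estimate_parameters(pairs, ["F", "B"], ["H", "T"])
print(a["F"], e["B"])
```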

  27. Profile HMM Alignment • Given a set of functionally related biological sequences, pairwise alignments can be used to search a database for new family members. • This approach may fail, because distantly related sequences may have similarities too weak to pass the statistical significance test. • The family of related proteins is instead represented by their multiple alignment and the corresponding profile.

  28. A profile is represented in terms of the frequencies of nucleotides (or amino acids) at each position of the alignment. • An HMM can also be used for sequence comparison, in particular for aligning a sequence against a profile. • Such a profile HMM contains n sequentially linked match states M1, M2, …, Mn.
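
A small sketch of building such a profile (column-wise frequencies) from a multiple alignment; the toy nucleotide alignment is made up for illustration, with one match state per column:

```python
def build_profile(alignment):
    """Column-wise nucleotide frequencies of a multiple alignment
    (gaps ignored); each column corresponds to one match state M_i."""
    n_cols = len(alignment[0])
    profile = []
    for i in range(n_cols):
        column = [seq[i] for seq in alignment if seq[i] != "-"]
        counts = {b: column.count(b) for b in "ACGT"}
        total = sum(counts.values())
        profile.append({b: counts[b] / total for b in "ACGT"})
    return profile

# toy multiple alignment (made up for illustration)
alignment = ["ACG-T", "ACGCT", "A-GCT", "TCGCT"]
for i, col in enumerate(build_profile(alignment), start=1):
    print(f"M{i}:", col)
```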

  29. Profile HMM

  30. References • www.cedar.buffalo.edu/~govind/CS661/Lec12.ppt • www.bios.niu.edu/johns/bioinf.../Hidden%20Markov%20Models.ppt • www.mathcs.emory.edu/~cs153000/share/0123/book123.pdf

  31. Thank you
