Online Viterbi Algorithm for Analysis of Long Biological Sequences

By Niloofar Hezarjaribi. Hidden Markov models (HMMs) are generative probabilistic models commonly used for analysis of long genomic sequences.


Presentation Transcript


  1. Online Viterbi Algorithm for Analysis of Long Biological Sequences • By Niloofar Hezarjaribi

  2. Hidden Markov Model

  3. Hidden Markov Model • Hidden Markov models (HMMs) are commonly used for analysis of long genomic sequences • An HMM is a generative probabilistic model • The linear-time Viterbi algorithm is the most commonly used algorithm • Its space complexity is O(mn) • This makes it unsuitable for long sequences

  4. Hidden Markov Model • An HMM is composed of states and transitions • It generates sequences over a given alphabet • Each state has emission probabilities and transition probabilities • An HMM defines a joint probability Pr(X,S) • X: the given sequence • S: a state path; the Viterbi algorithm finds the S that maximizes the joint probability

  5. Hidden Markov Model

  6. Hidden Markov Model • P(i,j) stores the probability of the most probable path that generates the first i symbols and ends in state j • B(i,j) stores the second-to-last state of that path • t_k(j): transition probability from state k to state j • e_j(X_i): emission probability of X_i in state j • The back pointer B(i,j) is the value of k that maximizes P(i,j) • After computing these values, we follow the back pointers from right to left to recover the most probable path
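The recurrence behind the two tables on slide 6 can be written out as follows. This is the standard Viterbi recurrence in the slide's notation; the initial-state distribution \pi_j is an assumed symbol that does not appear in the slides.

```latex
\begin{align*}
P(1,j) &= \pi_j \, e_j(X_1) \\
P(i,j) &= e_j(X_i) \cdot \max_{k} \, P(i-1,k)\, t_k(j), \qquad 2 \le i \le n \\
B(i,j) &= \arg\max_{k} \, P(i-1,k)\, t_k(j)
\end{align*}
```

The most probable path ends in the state j that maximizes P(n,j), and the remaining states are read off by following B from column n back to column 1.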

  7. Hidden Markov Model • For an HMM with m states and a sequence X of length n: space complexity O(nm), running time O(nm^2)
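A minimal sketch of the standard Viterbi algorithm from slides 6 and 7, to make the O(nm) table and the O(nm^2) loop concrete. The function name, the argument layout (`init`, `trans[k][j]`, `emit[j][x]`), the 0-based positions and the use of log probabilities are assumptions made for this illustration, not part of the presentation.

```python
import math


def viterbi(obs, init, trans, emit):
    """Return the most probable state path for a non-empty observation list.

    init[j]     : probability of starting in state j (assumed parameter)
    trans[k][j] : t_k(j), transition probability from state k to state j
    emit[j][x]  : e_j(x), probability of emitting symbol x in state j
    """
    m, n = len(init), len(obs)

    def log(p):
        return math.log(p) if p > 0 else -math.inf

    # P and B are full n-by-m tables: this is the O(nm) space cost.
    P = [[-math.inf] * m for _ in range(n)]
    B = [[0] * m for _ in range(n)]
    for j in range(m):
        P[0][j] = log(init[j]) + log(emit[j][obs[0]])

    # Three nested loops over i, j, k: this is the O(n m^2) running time.
    for i in range(1, n):
        for j in range(m):
            k = max(range(m), key=lambda s: P[i - 1][s] + log(trans[s][j]))
            B[i][j] = k
            P[i][j] = P[i - 1][k] + log(trans[k][j]) + log(emit[j][obs[i]])

    # Start at the best final state and follow back pointers right to left.
    j = max(range(m), key=lambda s: P[n - 1][s])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = B[i][j]
        path.append(j)
    return path[::-1]
```

Note that the traceback can only start after the whole sequence has been read, which is why the full back-pointer table has to stay in memory.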

  8. Impractical for long sequences O(mn)???!!

  9. Hidden Markov Model • Example: 250 million symbols, 100 states, memory: 25 GB • Completely impractical!
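The 25 GB figure can be reproduced with a short calculation, assuming each back pointer takes one byte (enough to index 100 states) and 1 GB = 10^9 bytes. Both assumptions are ours; the slide does not state them.

```python
def backpointer_table_bytes(n, m, bytes_per_entry=1):
    """Size of the n-by-m back-pointer table B (bytes_per_entry is an assumption)."""
    return n * m * bytes_per_entry


size = backpointer_table_bytes(n=250_000_000, m=100)
print(size / 1e9, "GB")  # 2.5e10 bytes, i.e. 25 GB under the stated assumptions
```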

  10. Solutions

  11. Split the sequence • Use of Checkpointing

  12. Proposed Solution?

  13. Online Viterbi Algorithm • Space complexity: requires much less memory

  14. Online Viterbi Algorithm

  15. Online Viterbi Algorithm • Represent the back pointer matrix B of the Viterbi algorithm as a tree • The parent of node (i,j) is (i-1, B(i,j)) • Eliminate the nodes that are not on any path ending in column i • The most probable path is the path to the root from the leaf (n,j) with the highest P(n,j)

  16. Online Viterbi Algorithm • The paths are not necessarily edge disjoint • Often all the paths share the same prefix up to some node, called the coalescence point • After processing D symbols, we have to check whether a coalescence point has been reached • If not, we have to choose one of the potential paths heuristically
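To make the notion of a coalescence point concrete, here is a naive sketch that finds it by following all surviving paths backward in lockstep. It is not the compressed-tree method of the later slides; the function name and the list-of-columns layout of `B` (0-based, with `B[0]` unused) are assumptions for illustration.

```python
def coalescence_point(B):
    """Return (position, state) of the latest node shared by all paths that
    end in the last column of B, or None if the paths never merge.

    B[i][j] is the back pointer of state j at position i; B[0] is unused.
    """
    states = set(range(len(B[-1])))  # every state is a leaf in the last column
    if len(states) == 1:             # a one-state model coalesces trivially
        return len(B) - 1, next(iter(states))
    for i in range(len(B) - 1, 0, -1):
        # Step every surviving path one column to the left.
        states = {B[i][j] for j in states}
        if len(states) == 1:
            return i - 1, next(iter(states))
    return None
```

Everything up to the returned node is already part of the final answer, whatever symbols come later. This scan costs time proportional to the number of stored columns on every call, which is the cost the compressed tree is meant to avoid.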

  17. Online Viterbi Algorithm

  18. Online Viterbi Algorithm • How to find a coalescence point? • Maintain a compressed version of the back pointer tree • Each node stores the number of its children and a pointer to its parent node • Keep a linked list of all nodes of the compressed tree, ordered by sequence position • Keep a list of pointers to all of the leaves

  19. Online Viterbi Algorithm • While processing the k-th sequence position, for each state: • First, create a new leaf • Second, link it to its parent • Third, insert it into the linked list • Once the new leaves are created, eliminate all the former leaves that have no children and, recursively, all of their ancestors that are left without children • Finally, compress the tree

  20. Online Viterbi Algorithm • How to compress the tree? • Examine all the nodes in order of decreasing sequence position • Delete the nodes with zero or one child, other than the current leaves • If a node has at least two children, follow its parent links • Link the node to the first ancestor that has at least two children • A node that has no ancestor with at least two children is the coalescence point • Make it the new root • Output the path up to that point and remove it from memory
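The update and compression steps from slides 18 to 20 can be sketched as below. This is a simplified illustration under several assumptions, not the authors' implementation: it uses a plain Python list instead of a position-ordered linked list, adds a virtual root in front of the first position, marks deleted nodes with a child count of -1, and makes no attempt to reach the O(m) per-position bound. All names (`Node`, `online_viterbi`, `trace`) are hypothetical.

```python
import math


class Node:
    """Node (pos, state) of the compressed back-pointer tree."""

    def __init__(self, pos, state, parent):
        self.pos, self.state, self.parent, self.children = pos, state, parent, 0
        if parent is not None:
            parent.children += 1


def _log(p):
    return math.log(p) if p > 0 else -math.inf


def online_viterbi(obs, init, trans, emit):
    """Yield the most probable state path for a non-empty list obs, emitting
    each piece as soon as a new coalescence point fixes it."""
    m = len(init)
    cols = {0: [None] * m}        # back-pointer columns still held in memory
    root = Node(-1, None, None)   # virtual root in front of position 0 (assumption)
    leaves = [Node(0, j, root) for j in range(m)]
    internal = []                 # branching nodes strictly below the root
    P = [_log(init[j]) + _log(emit[j][obs[0]]) for j in range(m)]

    def trace(pos, state, stop):
        """States for positions stop+1..pos, freeing each column on the way."""
        path = []
        while pos > stop:
            path.append(state)
            state = cols.pop(pos)[state]
            pos -= 1
        return path[::-1]

    for i in range(1, len(obs)):
        B = [max(range(m), key=lambda k: P[k] + _log(trans[k][j])) for j in range(m)]
        P = [P[B[j]] + _log(trans[B[j]][j]) + _log(emit[j][obs[i]]) for j in range(m)]
        cols[i] = B
        old = leaves
        leaves = [Node(i, j, old[B[j]]) for j in range(m)]  # new leaves, linked to parents

        # Remove former leaves without children and, recursively, their ancestors.
        for node in old:
            while node is not None and node.children == 0:
                node.children = -1                          # mark as deleted
                if node.parent is not None:
                    node.parent.children -= 1
                node = node.parent

        # Compress: relink every kept node to its first branching ancestor.
        kept = [v for v in internal + old if v.children >= 2] + leaves
        new_root = None
        for v in kept:
            p = v.parent
            while p is not None and p.children < 2:         # skip one-child nodes
                p = p.parent
            v.parent = p
            if p is None:                                   # no branching ancestor:
                new_root = v                                # v is the coalescence point
        internal = [v for v in kept if v.children >= 2 and v is not new_root]
        if new_root is not None:
            yield from trace(new_root.pos, new_root.state, root.pos)
            root = new_root

    best = max(range(m), key=lambda j: P[j])
    yield from trace(len(obs) - 1, best, root.pos)
```

Because the function is a generator, a caller can consume the decoded prefix while later symbols are still being processed, and `cols` only ever holds the columns after the current root.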

  21. Online Viterbi Algorithm • Running time of this update: O(m) per sequence position • Space for the compressed tree: O(m) • So the update does not increase the asymptotic running time • Overhead of this update is less than 5% • In the worst case the algorithm still requires O(mn) space

  22. Online Viterbi Algorithm • Advantages of this algorithm: • For two-state HMMs, the expected maximum space requirement is O(m log n) • Online Viterbi leads to a significant decrease in memory usage • It can output the initial segment of the most probable path before the whole sequence has been processed

  23. Memory Requirements of Online Viterbi Algorithm

  24. Memory Requirements of Online Viterbi Algorithm • Symmetric two-state HMM: • Two symmetric states over a binary alphabet

  25. Memory Requirements of Online Viterbi Algorithm • Assume t < ½ and e < ½ • Configuration of back pointers can be as shown below:

  26. Memory Requirements of Online Viterbi Algorithm • Configuration iv never occurs for t < ½ • A coalescence point occurs whenever configuration ii or iii occurs

  27. Memory Requirements of Online Viterbi Algorithm • The upper bound on the expected maximum memory requirement is O(m log n)

  28. Memory Requirements of Online Viterbi Algorithm • Multi-state HMM: • In a two-state HMM each new coalescence point clears the memory, but in a multi-state HMM a tree of substantial length can remain in memory • So the sizes of consecutive runs are not independent

  29. Memory Requirements of Online Viterbi Algorithm • How to evaluate the memory requirements of a multi-state HMM: • Generalize the two-state HMM to multiple states • A symmetric HMM with m states emits symbols over an m-letter alphabet • Each state emits one symbol with higher probability than the others • All transitions are equiprobable except the self-transitions
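One way to build the symmetric m-state test model described on slide 29 is sketched below. The parametrization is an assumption: `t` is taken to be the total probability of leaving a state and `e` the total probability of emitting a symbol other than the state's preferred one, so that m = 2 gives a two-state model with parameters t and e. The function name and the uniform initial distribution are also assumptions.

```python
def symmetric_hmm(m, t, e):
    """Return (init, trans, emit) for a symmetric HMM with m >= 2 states
    over an m-letter alphabet (parametrization is an assumption)."""
    init = [1.0 / m] * m
    # Self-transition with probability 1 - t, the rest shared equally.
    trans = [[1 - t if j == k else t / (m - 1) for j in range(m)]
             for k in range(m)]
    # State j prefers symbol j with probability 1 - e, the rest shared equally.
    emit = [[1 - e if x == j else e / (m - 1) for x in range(m)]
            for j in range(m)]
    return init, trans, emit
```

For example, `symmetric_hmm(2, 0.1, 0.2)` gives a two-state model that stays in its state with probability 0.9 and emits its preferred symbol with probability 0.8.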

  30. Memory Requirements of Online Viterbi Algorithm • The algorithm was tested for m ≤ 6 on sequences generated by the HMM • The data are consistent with logarithmic growth of the average maximum memory needed

  31. Conclusion • The algorithm is based on efficient detection of coalescence points in trees • It requires variable space that depends on the HMM and on the local properties of the analyzed sequence • Experiments on both simulated and real data suggest that the asymptotic bound Θ(m log n) extends to multi-state HMMs; in fact, for most of the execution the algorithm uses much less memory • The algorithm can be used for online processing of streamed sequences

  32. Use of Online Viterbi algorithm in My Research • Using the Viterbi algorithm in DVFS (dynamic voltage and frequency scaling) • Assign a low frequency to less busy windows and a high frequency to busy windows

  33. Use of Online Viterbi algorithm in My Research
      dynamicEnergy = transitionEnergy + 0.5*C*V^2*t;
      dynamicTime = activeCycles/VF[i][1] + switchLatency*abs(5 - i);
      profileEnergy = 0.5*capacity*pow(VF[5][0], 2)*activeCycles;
      profileTime = activeCycles/VF[5][1];
      normalizedEnergy = dynamicEnergy / profileEnergy;
      normalizedTime = dynamicTime / profileTime;
      myCost = (1-alpha)*normalizedEnergy + (alpha)*normalizedTime;

  34. Use of Online Viterbi algorithm in My Research

  35. Questions?
