
Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions


Presentation Transcript


  1. Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions. Shay Mozes, Oren Weimann (MIT), Michal Ziv-Ukelson (Tel-Aviv U.)

  2. Shortly: • Hidden Markov Models are extensively used to model processes in many fields • The runtime of HMM algorithms is usually linear in the length of the input • We show how to exploit repetitions to obtain speedup • First provable speedup of Viterbi’s algorithm • Can use different compression schemes • Applies to several decoding and training algorithms

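  3. Markov Models
     • states $q_1, \ldots, q_k$
     • transition probabilities $P_{i \leftarrow j}$
     • emission probabilities $e_i(\sigma)$, $\sigma \in \Sigma$
     • time independent, discrete, finite
     Example (two states $q_1$, $q_2$): $P_{1 \leftarrow 1} = 0.9$, $P_{2 \leftarrow 1} = 0.1$, $P_{2 \leftarrow 2} = 0.8$, $P_{1 \leftarrow 2} = 0.2$
     $e_1(A) = 0.3$, $e_1(C) = 0.2$, $e_1(G) = 0.2$, $e_1(T) = 0.3$
     $e_2(A) = 0.2$, $e_2(C) = 0.3$, $e_2(G) = 0.3$, $e_2(T) = 0.2$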
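
The two-state example on this slide can be written down directly. The sketch below is only an illustration: the names `P`, `e`, and `is_valid_hmm` are assumptions of this example (not from the talk), and states are 0-indexed, so `P[i][j]` stands for the slide's transition probability into state i+1 from state j+1.

```python
# Two-state example from the slide. P[i][j] is the probability of moving
# to state i from state j (the slide's P_{i<-j}); states are 0-indexed here.
P = [[0.9, 0.2],
     [0.1, 0.8]]

# e[i][sigma] is the probability that state i emits character sigma.
e = [{"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
     {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2}]


def is_valid_hmm(P, e, tol=1e-9):
    """Check that the transitions out of each state, and the emissions
    of each state, sum to 1."""
    k = len(P)
    transitions_ok = all(
        abs(sum(P[i][j] for i in range(k)) - 1.0) < tol for j in range(k)
    )
    emissions_ok = all(abs(sum(dist.values()) - 1.0) < tol for dist in e)
    return transitions_ok and emissions_ok
```

Because the first index is the destination state, it is the columns of `P` that sum to 1.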

  4. Hidden Markov Models
     [Figure: trellis with time on the horizontal axis and states 1, 2, …, k on the vertical axis, above the observed string $x_1 x_2 x_3 \ldots x_n$]
     • We are only given the description of the model and the observed string
     • Decoding: find the hidden sequence of states that is most likely to have generated the observed string

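  5. Decoding – Viterbi’s Algorithm
     [Figure: trellis with time on the horizontal axis and states on the vertical axis]
     • $v_5[j]$: probability of the best sequence of states that emits the first 5 chars and ends in state j (for example, $v_5[2]$ ends in state 2)
     • $v_6[4] = \max_j \{ e_4(c) \cdot P_{4 \leftarrow j} \cdot v_5[j] \}$
     • The candidate for j = 2 is built up step by step: $v_5[2]$, then $P_{4 \leftarrow 2} \cdot v_5[2]$, then $e_4(c) \cdot P_{4 \leftarrow 2} \cdot v_5[2]$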
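
The recurrence on this slide can be sketched as a short loop. This is a minimal illustration, not the authors' code: the function name `viterbi_probability`, the 0-indexed states, and the initial vector `v0` passed in by the caller are assumptions of this example.

```python
def viterbi_probability(P, e, x, v0):
    """Return v_n, where v_t[i] is the probability of the best sequence of
    states that emits the first t characters of x and ends in state i.

    P[i][j] is the transition probability into state i from state j,
    e[i][ch] is the probability that state i emits ch (both 0-indexed).
    """
    k = len(P)
    v = list(v0)
    for ch in x:
        # v_t[i] = max_j e_i(x_t) * P_{i<-j} * v_{t-1}[j]
        v = [max(e[i][ch] * P[i][j] * v[j] for j in range(k)) for i in range(k)]
    return v
```

Each character costs k² multiplications, so the whole string costs on the order of k²n, the baseline the rest of the talk improves on.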

  6. Outline • Overview • Exploiting repetitions • Using LZ78 • Using Run-Length Encoding • Summary of results

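  7. VA in Matrix Notation
     • $v_1[i] = \max_j \{ e_i(x_1) \cdot P_{i \leftarrow j} \cdot v_0[j] \}$
     • Define $M_{ij}(\sigma) = e_i(\sigma) \cdot P_{i \leftarrow j}$, so that $v_1[i] = \max_j \{ M_{ij}(x_1) \cdot v_0[j] \}$
     • Define $(A \otimes B)_{ij} = \max_k \{ A_{ik} \cdot B_{kj} \}$, so that $v_1 = M(x_1) \otimes v_0$, $v_2 = M(x_2) \otimes M(x_1) \otimes v_0$, and in general $v_n = M(x_n) \otimes M(x_{n-1}) \otimes \cdots \otimes M(x_1) \otimes v_0$
     • Evaluated right to left with matrix-vector products, this is Viterbi’s algorithm: $O(k^2 n)$. Evaluated with matrix-matrix products it costs $O(k^3 n)$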
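
The matrix notation can be checked numerically. The following sketch uses plain Python lists; the helper names `emission_transition_matrix`, `otimes`, and `otimes_vec` are assumptions of this example, not names from the talk.

```python
def emission_transition_matrix(P, e, ch):
    """M(ch)[i][j] = e_i(ch) * P_{i<-j}."""
    k = len(P)
    return [[e[i][ch] * P[i][j] for j in range(k)] for i in range(k)]


def otimes(A, B):
    """Max-times matrix product: result[i][j] = max over m of A[i][m] * B[m][j]."""
    inner, cols = len(B), len(B[0])
    return [[max(A[i][m] * B[m][j] for m in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def otimes_vec(A, v):
    """Max-times matrix-vector product: result[i] = max over j of A[i][j] * v[j]."""
    return [max(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]
```

Because the max-times product is associative, `otimes_vec(otimes(M2, M1), v0)` and `otimes_vec(M2, otimes_vec(M1, v0))` agree up to floating-point rounding. The first costs a matrix-matrix product, the second only two matrix-vector products.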

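  8. Exploiting Repetitions
     • Observed string: c a t g a a c t g a a c
     • $v_n = M(c) \otimes M(a) \otimes M(a) \otimes M(g) \otimes M(t) \otimes M(c) \otimes M(a) \otimes M(a) \otimes M(g) \otimes M(t) \otimes M(a) \otimes M(c) \otimes v_0$: 12 steps
     • Compute $M(W) = M(c) \otimes M(a) \otimes M(a) \otimes M(g)$ once (W = gaac) and use it twice!
     • $v_n = M(W) \otimes M(t) \otimes M(W) \otimes M(t) \otimes M(a) \otimes M(c) \otimes v_0$: 6 steps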
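
The 12-step and 6-step computations on this slide can be compared directly. This is a sketch under stated assumptions: it reuses the two-state example from the Markov Models slide with lowercase characters, and the helper names and the uniform start vector are illustrative choices, not part of the talk.

```python
import math

# Assumed model: the two-state example, with lowercase emission symbols.
P = [[0.9, 0.2], [0.1, 0.8]]
e = [{"a": 0.3, "c": 0.2, "g": 0.2, "t": 0.3},
     {"a": 0.2, "c": 0.3, "g": 0.3, "t": 0.2}]


def M(ch):
    return [[e[i][ch] * P[i][j] for j in range(2)] for i in range(2)]


def otimes(A, B):
    return [[max(A[i][m] * B[m][j] for m in range(2)) for j in range(2)]
            for i in range(2)]


def otimes_vec(A, v):
    return [max(A[i][j] * v[j] for j in range(2)) for i in range(2)]


def plain(x, v0):
    """One matrix-vector step per character: 12 steps for catgaactgaac."""
    v = list(v0)
    for ch in x:
        v = otimes_vec(M(ch), v)
    return v


def with_repetition(v0):
    """Precompute M(W) for W = gaac once, then take 6 matrix-vector steps."""
    MW = M("g")
    for ch in "aac":
        MW = otimes(M(ch), MW)  # later characters multiply on the left
    v = list(v0)
    for step in (M("c"), M("a"), M("t"), MW, M("t"), MW):
        v = otimes_vec(step, v)
    return v


same = all(math.isclose(a, b, rel_tol=1e-9)
           for a, b in zip(plain("catgaactgaac", [0.5, 0.5]),
                           with_repetition([0.5, 0.5])))
```

Here `same` is a consistency check that both orders of evaluation give the same vector up to rounding. With only two repeats in a two-state model the precomputation does not pay for itself, which is the point the next slide makes.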

  9. Exploiting repetitions
     • $\ell$: length of the repetition W
     • $\lambda$: number of times W repeats in the string
     • Computing M(W) costs $(\ell - 1) k^3$ (matrix-matrix multiplications)
     • Each time W appears we save $(\ell - 1) k^2$ (matrix-vector multiplications)
     • W is good if $\lambda (\ell - 1) k^2 > (\ell - 1) k^3$, that is, if $\lambda > k$: the number of repeats must exceed the number of states
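
The condition can be checked with a little arithmetic. The sketch below counts operations the way the slide does, ignoring constant factors; the function names `net_savings` and `is_good` are assumptions of this example.

```python
def net_savings(length, repeats, k):
    """Operations saved by reusing M(W) minus the cost of computing it once.

    length  -- the slide's l, the length of the repeated substring W
    repeats -- the slide's lambda, how many times W occurs
    k       -- number of states
    """
    precompute_cost = (length - 1) * k ** 3        # matrix-matrix products
    saved = repeats * (length - 1) * k ** 2        # matrix-vector products avoided
    return saved - precompute_cost


def is_good(length, repeats, k):
    """W is good when reusing M(W) saves more than it costs."""
    return net_savings(length, repeats, k) > 0
```

For example, a word of length 4 in a two-state model breaks even at two repeats (`net_savings(4, 2, 2)` is 0) and pays off from three repeats on, matching the condition that the number of repeats exceeds k.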

  10. Offline General Scheme
     I. dictionary selection: choose the set $D = \{W_i\}$ of good substrings
     II. encoding: compute $M(W_i)$ for every $W_i$ in D
     III. parsing: partition the input X into good substrings, $X = W_{i_1} W_{i_2} \ldots W_{i_{n'}}$, and set $X' = i_1, i_2, \ldots, i_{n'}$
     IV. propagation: run Viterbi’s Algorithm on X’ using $M(W_i)$
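
The four steps can be sketched end to end. This is an illustration under several assumptions that are not from the talk: the dictionary is handed in by the caller rather than selected automatically, it must contain every single character of the input so that parsing always makes progress, parsing is greedy longest-match, and all function names are hypothetical.

```python
def M(P, e, ch):
    k = len(P)
    return [[e[i][ch] * P[i][j] for j in range(k)] for i in range(k)]


def otimes(A, B):
    k = len(A)
    return [[max(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def otimes_vec(A, v):
    return [max(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def encode(P, e, words):
    """Encoding step: compute M(W) for every word W in the dictionary."""
    table = {}
    for w in words:
        mat = M(P, e, w[0])
        for ch in w[1:]:
            mat = otimes(M(P, e, ch), mat)  # later characters go on the left
        table[w] = mat
    return table


def parse(x, words):
    """Parsing step: greedy longest-match partition of x into words.

    Assumes every single character of x is itself in `words`.
    """
    by_length = sorted(words, key=len, reverse=True)
    parsed, pos = [], 0
    while pos < len(x):
        w = next(w for w in by_length if x.startswith(w, pos))
        parsed.append(w)
        pos += len(w)
    return parsed


def propagate(table, parsed, v0):
    """Propagation step: one matrix-vector product per parsed word."""
    v = list(v0)
    for w in parsed:
        v = otimes_vec(table[w], v)
    return v
```

With the dictionary `{"gaac", "c", "a", "t", "g"}`, the string `catgaactgaac` parses into six words, so propagation takes six matrix-vector steps instead of twelve.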

  11. Outline • Overview • Exploiting repetitions • Using LZ78 • Using Run-Length Encoding • Summary of results

  12. LZ78 • The next LZ-word is the longest LZ-word previously seen plus one character • Use a trie • Number of LZ-words is asymptotically < n ∕ log n
     [Figure: LZ78 trie for the example string aacgacg, with edges labeled a, c, g, g]
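
The parsing rule can be sketched with a trie of nested dictionaries. The function name `lz78_parse` and the handling of a trailing, already-seen word are assumptions of this example.

```python
def lz78_parse(x):
    """Split x into LZ78 words: each new word is the longest previously
    seen word plus one character."""
    trie = {}                    # nested dicts, one node per LZ-word
    words = []
    node, start = trie, 0
    for pos, ch in enumerate(x):
        if ch in node:
            node = node[ch]      # still extending a previously seen word
        else:
            node[ch] = {}        # new word = longest seen word + ch
            words.append(x[start:pos + 1])
            node, start = trie, pos + 1
    if start < len(x):
        words.append(x[start:])  # leftover suffix that is already a known word
    return words
```

On the slide's example string, `lz78_parse("aacgacg")` splits the input into `a`, `ac`, `g`, `acg`.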

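  13. Using LZ78
     • dictionary selection: D = words in the LZ parse of X. Cost: $O(n)$
     • encoding: use the incremental nature of LZ, $M(W\sigma) = M(\sigma) \otimes M(W)$ (the new character multiplies on the left, as in $v_n = M(x_n) \otimes \cdots \otimes M(x_1) \otimes v_0$). Cost: $O(k^3 n / \log n)$
     • parsing: X’ = LZ parse of X. Cost: $O(n)$
     • propagation: run VA on X’ using $M(W_i)$. Cost: $O(k^2 n / \log n)$
     • Speedup: $\frac{k^2 n}{k^3 n / \log n} = \frac{\log n}{k}$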
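
The incremental encoding can be sketched by computing each new LZ-word's matrix from its prefix with a single max-times product. This is an illustration, not the authors' implementation: it keys matrices by word strings instead of trie nodes for brevity (a real trie avoids the string-building overhead), and the function names are hypothetical.

```python
def M(P, e, ch):
    k = len(P)
    return [[e[i][ch] * P[i][j] for j in range(k)] for i in range(k)]


def otimes(A, B):
    k = len(A)
    return [[max(A[i][m] * B[m][j] for m in range(k)) for j in range(k)]
            for i in range(k)]


def otimes_vec(A, v):
    return [max(A[i][j] * v[j] for j in range(len(v))) for i in range(len(A))]


def lz78_viterbi(P, e, x, v0):
    """Run the propagation over the LZ78 parse of x, building M(W) on the fly."""
    mats = {}                      # M(W) for every LZ-word W seen so far
    v, word = list(v0), ""
    for ch in x:
        candidate = word + ch
        if candidate in mats:
            word = candidate       # still inside a previously seen LZ-word
            continue
        single = M(P, e, ch)
        # New LZ-word = known word + ch: one matrix-matrix product.
        mats[candidate] = otimes(single, mats[word]) if word else single
        v = otimes_vec(mats[candidate], v)   # one propagation step per LZ-word
        word = ""
    if word:
        v = otimes_vec(mats[word], v)        # trailing word is already encoded
    return v
```

Each LZ-word costs one matrix-matrix product to encode and one matrix-vector product to propagate, which is where the two costs on the slide come from.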

  14. Improvement
     [Figure: LZ78 trie with edges labeled a, c, g, g]
     • Remember the speedup condition: $\lambda > k$
     • Use just the LZ-words that appear more than k times
     • These words are represented by trie nodes with more than k descendants
     • Now X must be parsed differently (step III)
     • Ensures graceful degradation with increasing k. Speedup: $\max(1, \log n / k)$
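
Selecting the frequently used LZ-words can be sketched by counting trie descendants. The sketch below relies on the fact that every proper prefix of an LZ78 word is itself an LZ78 word, so a word's descendants are exactly the longer LZ-words it prefixes. The function names, and the reading of "descendants" as nodes strictly below a word's node, are assumptions of this example.

```python
def lz78_descendant_counts(x):
    """Map each LZ78 word of x to the number of trie nodes strictly below it."""
    words, seen, word = [], set(), ""
    for ch in x:
        word += ch
        if word not in seen:       # longest seen word plus one character
            seen.add(word)
            words.append(word)
            word = ""
    counts = {w: 0 for w in words}
    for w in words:
        for cut in range(1, len(w)):
            counts[w[:cut]] += 1   # every proper prefix is an ancestor in the trie
    return counts


def good_lz_words(x, k):
    """LZ-words whose trie node has more than k descendants."""
    return {w for w, d in lz78_descendant_counts(x).items() if d > k}
```

For the string `aacgacg`, the word `a` has two descendants (`ac` and `acg`) and `ac` has one, so with k = 1 only `a` is kept.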

  15. Experimental results: ~5x faster • Short: 1.5 Mbp chromosome 4 of S. cerevisiae (yeast) • Long: 22 Mbp human Y-chromosome

  16. Outline • Overview • Exploiting repetitions • Using LZ78 • Using Run-Length Encoding • Summary of results

  17. Run Length Encoding
     • aaaccggggg → $a^3 c^2 g^5$
     • aaaccggggg → $a^2 a^1 c^2 g^4 g^1$
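
Both encodings on this slide can be reproduced in a few lines. The second one appears to split every run into power-of-two lengths; that reading is inferred from the example, and the function names are assumptions of this sketch.

```python
from itertools import groupby


def run_length_encode(x):
    """Collapse each maximal run of equal characters into (char, length)."""
    return [(ch, len(list(run))) for ch, run in groupby(x)]


def split_into_powers_of_two(runs):
    """Split each run into pieces whose lengths are powers of two, largest first."""
    pieces = []
    for ch, length in runs:
        power = 1 << (length.bit_length() - 1)   # largest power of two <= length
        while length:
            if power <= length:
                pieces.append((ch, power))
                length -= power
            power >>= 1
    return pieces
```

A plausible motivation for the power-of-two form is that the matrix for a run of length 2^i can be obtained from the matrix for length 2^(i-1) with one max-times squaring, but the slide itself does not spell this out.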

  18. Summary of results • General framework • LZ78: log(n) ∕ k • RLE: r ∕ log(r) • Byte-Pair Encoding: r • Path reconstruction: O(n) • F/B algorithms (standard matrix multiplication) • Viterbi training: same speedups apply • Baum-Welch training: speedup, many details • Parallelization

  19. Thank you! Any questions?

  20. Path traceback
     • In VA, easy to do in O(n) time by keeping track of the maximizing states during the computation
     • The problem: we run VA on X’, so we get the sequence of states for X’, not for X. We only get the states on the boundaries of good substrings of X
     • Solution: keep track of the maximizing states when computing the matrices M(w). Takes O(n) time and $O(n k^2)$ space
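
The solution on this slide can be sketched by recording, for every max-times product, which inner state achieved the maximum. This is an illustration rather than the authors' implementation; the function names and the per-word list of argmax tables are assumptions of this example.

```python
def M(P, e, ch):
    k = len(P)
    return [[e[i][ch] * P[i][j] for j in range(k)] for i in range(k)]


def otimes_arg(A, B):
    """Max-times product that also records the maximizing inner index."""
    k = len(A)
    C = [[0.0] * k for _ in range(k)]
    arg = [[0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            best = max(range(k), key=lambda m: A[i][m] * B[m][j])
            C[i][j], arg[i][j] = A[i][best] * B[best][j], best
    return C, arg


def encode_with_traceback(P, e, word):
    """Return M(word) plus one argmax table per character after the first."""
    mat = M(P, e, word[0])
    args = []
    for ch in word[1:]:
        mat, arg = otimes_arg(M(P, e, ch), mat)
        args.append(arg)   # arg[i][j]: best state just before ch, ending in i, entering from j
    return mat, args


def interior_states(args, start, end):
    """States visited while emitting the word, given the state before the
    word (start) and the state at its last character (end)."""
    states = [end]
    for arg in reversed(args):
        states.append(arg[states[-1]][start])
    return states[::-1]
```

Running VA on X’ yields the boundary states, and `interior_states` then fills in the states inside each good substring. The stored tables take k² entries per character, in line with the space bound on the slide.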

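  21. Training
     • Estimate the unknown parameters $P_{i \leftarrow j}$, $e_i(\sigma)$
     • Use Expectation Maximization: 1. Decoding 2. Recalculate parameters
     • Viterbi Training: each iteration costs $O(VA + n + k^2)$, where VA is the decoding step (the bottleneck, and the part that gets the speedup) and $n + k^2$ covers path traceback plus the update of $P_{i \leftarrow j}$, $e_i(\sigma)$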

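  22. Baum Welch Training
     • Each iteration costs $O(FB + n k^2)$, where FB is the decoding step, $O(n k^2)$, and the $n k^2$ term covers path traceback plus the update of $P_{i \leftarrow j}$, $e_i(\sigma)$
     • If a substring w of length $\ell$ that repeats $\lambda$ times satisfies [condition missing in source], then the entire process can be sped up by precalculation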
