Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions


### Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions

Yuri Lifshits (Caltech)

Shay Mozes (Brown Uni.)

Oren Weimann (MIT)

Michal Ziv-Ukelson (Ben-Gurion Uni.)

Hidden Markov Models (HMMs)

Sunny

Rainy

Cloudy

Transition probabilities: the probability of the weather given the previous day's weather. Pi←j = the probability of making a transition to state i from state j

Hidden states: sunny, rainy, cloudy. Denoted q1, …, qk

Hidden Markov Models (HMMs)

Observable states: the states of the process that are 'visible'. Σ = {soggy, damp, dryish, dry}

Humidity in IBM

Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state. ei(σ) = the probability of observing σ ∈ Σ given that the state is i

In short:
• Hidden Markov Models are extensively used to model processes in many fields (error correction, speech recognition, computational linguistics, bioinformatics)
• We show how to exploit repetitions to obtain speedups of HMM algorithms
• Can use different compression schemes
• Applies to several decoding and training algorithms
HMM Decoding and Training
• Decoding:

Given the model (emission and transition probabilities) and an observed sequence X, find the hidden sequence of states that is most likely to have generated the observed sequence.

• X = dry,dry,damp,soggy…
• Training:

Given the number of hidden states and an observed sequence X, estimate the emission and transition probabilities Pi←j, ei(σ)

Decoding

[Trellis diagram: the observed string x1 x2 x3 … xn along the time axis; at each time step the hidden state is one of 1, 2, …, k.]

• Decoding: Given the model and the observed string, find the hidden sequence of states that is most likely to have generated the observed string
Decoding – Viterbi’s Algorithm (VA)

[Trellis diagram: time on the horizontal axis, hidden states on the vertical axis.]

vt[i] = the probability of the best sequence of states that emits the first t characters and ends in state i

The recurrence, illustrated for t = 6, state 4, and observed character c:

v6[4] = maxj { e4(c) · P4←j · v5[j] }

If the maximum is attained at j = 2, then v6[4] = e4(c) · P4←2 · v5[2], where v5[2] is the probability of the best sequence of states that emits the first 5 characters and ends in state 2.
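
As a concrete illustration, here is a minimal sketch of this recurrence in Python/NumPy; the names `P`, `e`, `x`, and `v0` are illustrative, not from the original presentation.

```python
def viterbi_vectors(P, e, x, v0):
    """v_t[i] = max_j e_i(x_t) * P[i, j] * v_{t-1}[j], for t = 1..n.

    P  : (k, k) NumPy array, P[i, j] = probability of a transition to state i from state j
    e  : (k, |Sigma|) NumPy array, e[i, s] = probability that state i emits symbol s
    x  : observed string, given as a sequence of symbol indices
    v0 : (k,) NumPy array of initial state probabilities
    """
    v = v0.copy()
    for sym in x:
        # best predecessor j for every target state i, then multiply by the emission probability
        v = e[:, sym] * (P * v[None, :]).max(axis=1)
    return v  # v[i] = probability of the best state sequence that emits x and ends in state i
```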

Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
VA in Matrix Notation

v1[i] = maxj { ei(x1) · Pi←j · v0[j] }

Define Mij(σ) = ei(σ) · Pi←j, so that v1[i] = maxj { Mij(x1) · v0[j] }.

With the max-times product (A ⊗ B)ij = maxk { Aik · Bkj }:

v1 = M(x1) ⊗ v0

v2 = M(x2) ⊗ M(x1) ⊗ v0

vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0

Viterbi's algorithm evaluates this product right to left with matrix-vector multiplications: O(k²n). Multiplying the matrices together instead would cost O(k³n).
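
The product form translates almost directly into code. Below is a minimal Python/NumPy sketch of the ⊗ product and of evaluating vn = M(xn) ⊗ ··· ⊗ M(x1) ⊗ v0 right to left; all names are illustrative.

```python
def max_times(A, B):
    """(A (x) B)[i, j] = max_k A[i, k] * B[k, j] -- the max-times matrix product."""
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def char_matrix(P, e, sym):
    """M_ij(sigma) = e_i(sigma) * P[i, j]."""
    return e[:, sym, None] * P

def viterbi_matrix_form(P, e, x, v0):
    """Evaluate v_n = M(x_n) (x) ... (x) M(x_1) (x) v_0 right to left.

    Each step is a max-times matrix-vector product, so the total cost is O(k^2 n);
    multiplying the k x k matrices together instead would cost O(k^3) per character.
    """
    v = v0[:, None]                      # column vector of shape (k, 1)
    for sym in x:
        v = max_times(char_matrix(P, e, sym), v)
    return v[:, 0]
```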

Exploiting Repetitions

c a t g a a c t g a a c

vn=M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0

12 steps

• compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once
• use it twice! vn=M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c) ⊗v0

6 steps

Exploiting repetitions

ℓ - length of repetition W

λ – number of times W repeats in string

computing M(W) costs (ℓ − 1)·k³ (matrix-matrix multiplications)

each time W appears we save (ℓ − 1)·k² (matrix-vector multiplications)

W is good if λ(ℓ − 1)k² > (ℓ − 1)k³, i.e. if the number of repeats λ > k, the number of states


General Scheme
• dictionary selection: choose the set D = {Wi} of good substrings
• encoding: compute M(Wi) for every Wi in D
• parsing: partition the input X into good substrings, X′ = Wi1 Wi2 … Win′
• propagation: run Viterbi's Algorithm on X′ using the M(Wi) (see the sketch after this list)
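
A minimal sketch of the propagation step, assuming the dictionary matrices M(Wi) have already been computed; `max_times`, `word_matrices`, and `parse` are illustrative names, not from the paper.

```python
def max_times(A, B):
    """(A (x) B)[i, j] = max_k A[i, k] * B[k, j]."""
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def propagate(parse, word_matrices, v0):
    """Propagation step of the general scheme.

    parse         : X' = [W_i1, W_i2, ..., W_in'], the input split into good substrings
    word_matrices : dict mapping each good substring W to its precomputed matrix M(W)
    v0            : (k,) NumPy array, the initial vector
    Returns v_n using one max-times matrix-vector product per word of X'
    instead of one per character of X.
    """
    v = v0[:, None]
    for word in parse:
        v = max_times(word_matrices[word], v)
    return v[:, 0]
```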
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Using the Four-Russians Method

Cost of each step:

• dictionary selection: D = all strings over Σ of length ≤ ℓ — O(1)
• encoding: incremental construction M(Wσ) = M(σ) ⊗ M(W) — O(2·|Σ|^ℓ·k³) (see the sketch below)
• parsing: X′ = X split into words of length ℓ — O(n)
• propagation: run VA on X′ using the M(Wi) — O(k²n ∕ ℓ)

Speedup: k²n ∕ O(2·|Σ|^ℓ·k³ + k²n ∕ ℓ) = Θ(log n), choosing ℓ = Θ(log n)
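
A sketch of the encoding step under these assumptions: enumerate the words of length ≤ ℓ by increasing length and build each M(Wσ) from the stored M(W) with a single ⊗. The code follows the convention vn = M(xn) ⊗ ··· ⊗ M(x1) ⊗ v0, so the appended character multiplies on the left; all names are illustrative.

```python
import numpy as np

def max_times(A, B):
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def char_matrix(P, e, sym):
    """M_ij(sigma) = e_i(sigma) * P[i, j]."""
    return e[:, sym, None] * P

def four_russians_dictionary(P, e, alphabet, ell):
    """Precompute M(W) for every word W over `alphabet` (symbol indices) of length <= ell.

    M(W + sigma) = M(sigma) (x) M(W), so each of the O(|Sigma|^ell) words
    costs one k x k max-times product: O(2 |Sigma|^ell k^3) in total.
    """
    k = P.shape[0]
    dictionary = {(): np.identity(k)}      # the empty word acts as the identity for (x)
    words = [()]
    for _ in range(ell):
        longer = []
        for w in words:
            for sym in alphabet:
                dictionary[w + (sym,)] = max_times(char_matrix(P, e, sym), dictionary[w])
                longer.append(w + (sym,))
        words = longer
    return dictionary

def parse_blocks(x, ell):
    """Parsing step: split X into consecutive words of length ell (the last word may be shorter)."""
    return [tuple(x[i:i + ell]) for i in range(0, len(x), ell)]
```

The propagation step then runs on parse_blocks(x, ell), touching only about n ∕ ℓ words instead of n characters.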

Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Lempel Ziv 78
• The next LZ-word is the longest LZ-word previously seen plus one character
• Use a trie
• Number of LZ-words is asymptotically < n ∕ log n

[Trie of LZ-words for the example string aacgacg, parsed as a | ac | g | acg.]
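
A small sketch of LZ78 parsing with a trie, in Python; the nested-dict trie and the function name are illustrative.

```python
def lz78_parse(x):
    """Split x into LZ-words; each new word is the longest previously seen word plus one character."""
    trie = {}                         # nested dicts: follow a word character by character
    words = []
    node, word = trie, ""
    for c in x:
        if c in node:                 # keep extending the longest previously seen word
            node, word = node[c], word + c
        else:                         # longest previous word + one new character ends the LZ-word
            node[c] = {}
            words.append(word + c)
            node, word = trie, ""
    if word:                          # leftover characters at the end of x form a final, repeated word
        words.append(word)
    return words

print(lz78_parse("aacgacg"))          # ['a', 'ac', 'g', 'acg']
```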

Using LZ78

Cost of each step:

• dictionary selection: D = all LZ-words in X — O(n)
• encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W) — O(k³n ∕ log n) (see the sketch below)
• parsing: X′ = the LZ78 parse of X — O(n)
• propagation: run VA on X′ using the M(Wi) — O(k²n ∕ log n)

Speedup: k²n ∕ (k³n ∕ log n) = log n ∕ k
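
A sketch of the encoding step for this dictionary: since every LZ-word is a previously seen word plus one character, each M(W) costs a single ⊗ with the already stored matrix of its prefix. The character-to-index mapping `sym_index` and the other names are illustrative; the convention vn = M(xn) ⊗ ··· ⊗ M(x1) ⊗ v0 is assumed, so the new character multiplies on the left.

```python
import numpy as np

def max_times(A, B):
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def char_matrix(P, e, s):
    return e[:, s, None] * P

def lz78_encode(P, e, words, sym_index):
    """Compute M(W) for every LZ-word W in parse order.

    Every prefix of an LZ-word is itself an LZ-word emitted earlier, so M(W)
    reuses M(W[:-1]): one k x k max-times product per word, O(k^3 n / log n) in total.
    """
    matrices = {"": np.identity(P.shape[0])}
    for w in words:
        matrices[w] = max_times(char_matrix(P, e, sym_index[w[-1]]), matrices[w[:-1]])
    return matrices
```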
Improvement

[Trie of LZ-words; only nodes with more than k descendants are kept in the dictionary.]

• Remember the speedup condition: λ > k
• Use only LZ-words that appear more than k times
• These words are represented by trie nodes with more than k descendants (see the sketch after this list)
• X must now be parsed (step III) differently
• This ensures graceful degradation with increasing k. Speedup: min(1, log n ∕ k)
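
A sketch of the improved dictionary selection: count the trie nodes below each LZ-word and keep only words with more than k descendants (so λ > k). This builds on the nested-dict trie from the parsing sketch above; all names are illustrative.

```python
def count_descendants(node):
    """Number of trie nodes strictly below `node`; each of them is an LZ-word that
    extends it, so this is a lower bound on how often the word occurs in X."""
    return sum(1 + count_descendants(child) for child in node.values())

def good_words(trie, k, prefix=""):
    """LZ-words represented by trie nodes with more than k descendants (speedup condition lambda > k).

    Descendant counts only shrink going down the trie, so failing subtrees can be pruned.
    """
    words = []
    for c, child in trie.items():
        if count_descendants(child) > k:
            words.append(prefix + c)
            words.extend(good_words(child, k, prefix + c))
    return words
```

A real implementation would compute the descendant counts once, bottom-up, rather than recomputing them at every node as this sketch does.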
Experimental Results – CpG Islands

Roughly 5× faster on both:

• Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
• Long: the 22 Mbp human Y chromosome
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results


Run Length Encoding

aaaccggggg → a³c²g⁵

Each run is further split into runs whose lengths are powers of two, so that their matrices can be built by repeated squaring:

aaaccggggg → a²a¹ c² g⁴g¹
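
A sketch of this step: compute the run-length encoding and precompute, for each run, the matrices M(σ^(2^i)) by repeated squaring; a run of length r is then covered by the at most ⌈log r⌉ matrices of its binary decomposition. Names are illustrative, and the repeated-squaring detail is spelled out from the a²a¹ c² g⁴g¹ decomposition above.

```python
from itertools import groupby

def max_times(A, B):
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def run_length_encode(x):
    """'aaaccggggg' -> [('a', 3), ('c', 2), ('g', 5)]"""
    return [(c, len(list(g))) for c, g in groupby(x)]

def powers_of_two(r):
    """5 -> [4, 1]: the binary decomposition of a run length."""
    return [1 << i for i in range(r.bit_length()) if r & (1 << i)][::-1]

def rle_matrices(P, e, runs, sym_index):
    """Precompute M(sigma^(2^i)) for every run (sigma, r), by repeated squaring of M(sigma)."""
    mats = {}
    for sym, r in runs:
        power, current = 1, e[:, sym_index[sym], None] * P     # M(sigma)
        mats.setdefault((sym, 1), current)
        while 2 * power <= r:
            current = max_times(current, current)              # M(sigma^(2p)) = M(sigma^p) (x) M(sigma^p)
            power *= 2
            mats.setdefault((sym, power), current)
    return mats
```

The propagation step then applies, for each run (σ, r), the matrices mats[(σ, p)] for the powers p in powers_of_two(r).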

Path traceback
• In VA this is easy to do in O(n) time by keeping track of the maximizing states during the computation
• The problem: here we only get the states on the boundaries of the good substrings of X
• Solution: keep track of the maximizing states while computing each matrix M(W) as a ⊗-product of two smaller matrices (see the sketch after this list). Takes O(n) time and O(n′k²) space
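
A minimal sketch of how the maximizing states can be recorded and later expanded, assuming each M(W) is built as a ⊗-product of two smaller matrices; the node structure and names are illustrative, not the paper's data structure.

```python
def max_times_arg(A, B):
    """Max-times product plus, for every entry, the maximizing intermediate state."""
    prod = A[:, :, None] * B[None, :, :]            # prod[i, m, j] = A[i, m] * B[m, j]
    return prod.max(axis=1), prod.argmax(axis=1)

def leaf(M_sigma):
    """A single character of the substring: its matrix only, no interior states."""
    return ('leaf', M_sigma)

def combine(first, second):
    """M(first + second) = M(second) (x) M(first); record the maximizing middle states."""
    M, arg = max_times_arg(second[1], first[1])
    return ('split', M, arg, first, second)

def expand(node, i, j):
    """Hidden states strictly inside the substring covered by `node`, in time order,
    given that the substring is entered from state j and left in state i."""
    if node[0] == 'leaf':
        return []
    _, _, arg, first, second = node
    m = arg[i, j]                                   # maximizing state at the split point
    return expand(first, m, j) + [m] + expand(second, i, m)
```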
Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Training
• Estimate the model θ = {Pi←j, ei(σ)} given X.
• Find the θ that maximizes P(X | θ).
• Use Expectation Maximization (EM):
• Decode using the current θ
• Use the decoding result to update θ
VA Training
• Aij = the number of times state i follows state j in the most likely sequence of states
• Ei(σ) = the number of times the letter σ is emitted by state i in the most likely sequence
• Each iteration costs O(VA + n + k²): the decoding step (the bottleneck, which is exactly what our speedup accelerates), plus O(n) for path traceback, plus O(k²) for updating Pi←j, ei(σ)

The Forward-Backward Algorithm
• The forward algorithm calculates ft[i], the probability of observing the sequence x1, x2, …, xt and ending in state i at time t.
• The backward algorithm calculates bt[i], the probability of observing the sequence xt+1, xt+2, …, xn given that the state at time t is i.

In matrix form, with the same matrices M(σ) but ordinary (sum-product) matrix multiplication ●:

ft = M(xt) ● M(xt-1) ● … ● M(x1) ● f0

bt = bn ● M(xn) ● M(xn-1) ● … ● M(xt+2) ● M(xt+1)   (bt treated as a row vector)
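
A short sketch of both passes in this matrix form; ● is ordinary matrix multiplication, so NumPy's `@`. All names are illustrative.

```python
def char_matrix(P, e, sym):
    """M_ij(sigma) = e_i(sigma) * P[i, j] -- the same matrices as in Viterbi's algorithm."""
    return e[:, sym, None] * P

def forward_vectors(P, e, x, f0):
    """f_t = M(x_t) . f_{t-1}, i.e. f_t = M(x_t) . M(x_{t-1}) . ... . M(x_1) . f0."""
    f, fs = f0, []
    for sym in x:
        f = char_matrix(P, e, sym) @ f
        fs.append(f)
    return fs                                  # fs[t-1] = f_t for t = 1..n

def backward_vectors(P, e, x, bn):
    """b_t = b_{t+1} . M(x_{t+1}), with b treated as a row vector (bn is typically all ones)."""
    b, bs = bn, [bn]
    for sym in reversed(x[1:]):                # x_n, x_{n-1}, ..., x_2
        b = b @ char_matrix(P, e, sym)
        bs.append(b)
    return bs[::-1]                            # bs[t-1] = b_t for t = 1..n
```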

Baum-Welch Training (in a nutshell)

• Aij = Σt ft[j] · Pi←j · ei(xt+1) · bt+1[i]
• Each iteration costs O(FB + nk²): the forward-backward computation, O(nk²), is the decoding bottleneck that we speed up; the remaining O(nk²) covers accumulating the Aij and updating Pi←j, ei(σ)
• If a substring W of length ℓ that repeats λ times satisfies the speedup condition, the entire process can be sped up by precalculation
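
A sketch of accumulating these expected transition counts from the forward and backward vectors; names are illustrative and the common normalization by P(X) is omitted.

```python
import numpy as np

def expected_transitions(P, e, x, fs, bs):
    """A[i, j] ~ sum_t f_t[j] * P[i, j] * e_i(x_{t+1}) * b_{t+1}[i].

    fs[t-1] = forward vector f_t and bs[t-1] = backward vector b_t, for t = 1..n.
    """
    A = np.zeros_like(P)
    for t in range(1, len(x)):                 # pair x_t with x_{t+1}; rows are the next state i
        A += (e[:, x[t]] * bs[t])[:, None] * P * fs[t - 1][None, :]
    return A
```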

Outline
• Overview
• Exploiting repetitions
• Using Four-Russians speedup
• Using LZ78
• Using Run-Length Encoding
• Training
• Summary of results
Summary of results
• General framework for exploiting repetitions
• Speedups by compression scheme:
  • Four-Russians: log(n)
  • LZ78: log(n) ∕ k
  • RLE: r ∕ log(r)
  • Byte-Pair Encoding: r
  • SLP, LZ77: r ∕ k
• Path reconstruction in O(n)
• Forward-Backward: same speedups
• Viterbi training: same speedups
• Baum-Welch training: speedup possible, many details
• Parallelization