
### Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions


Yuri Lifshits (Caltech)

Shay Mozes (Brown Uni.)

Oren Weimann (MIT)

Michal Ziv-Ukelson (Ben-Gurion Uni.)

Hidden Markov Models (HMMs)

(State diagram: the hidden weather states Sunny, Rainy, Cloudy.)

Transition probabilities: the probability of the weather given the previous day's weather. Pi←j = the probability of making a transition to state i from state j.

Hidden states q1, …, qk: sunny, rainy, cloudy.

Observable states: the states of the process that are 'visible'. Σ = {soggy, damp, dryish, dry}.


Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state. ei(σ) = the probability of observing σ ∈ Σ given that the state is i.

In short:

- Hidden Markov Models are extensively used to model processes in many fields (error correction, speech recognition, computational linguistics, bioinformatics)
- We show how to exploit repetitions in the observed sequence to obtain speedups of HMM algorithms
- The approach can use different compression schemes
- It applies to several decoding and training algorithms

HMM Decoding and Training

- Decoding:

Given the model (emission and transition probabilities) and an observed sequence X, find the hidden sequence of states that is most likely to have generated the observed sequence.

- X = dry,dry,damp,soggy…
- Training:

Given the number of hidden states and an observed sequence, estimate the emission and transition probabilities Pi←j, ei(σ).

(Trellis diagram: the k states repeated at each time step 1, …, n, with the observed string x1, x2, x3, …, xn along the bottom.)

- Decoding: Given the model and the observed string, find the hidden sequence of states that is most likely to have generated the observed string.

Decoding – Viterbi’s Algorithm (VA)

Let vt[i] be the probability of the best sequence of states that emits the first t characters and ends in state i. Viterbi's recurrence is

vt[i] = maxj{ei(xt)·Pi←j·vt-1[j]}

For example, v6[4] = maxj{e4(c)·P4←j·v5[j]}, where v5[j] is the probability of the best sequence of states that emits the first 5 characters and ends in state j. If the maximum is attained at j = 2, this gives v6[4] = e4(c)·P4←2·v5[2].
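As a concrete reference, here is a minimal sketch of VA in Python/numpy (the data layout is our own: P[i, j] = Pi←j, e[i, s] = ei(s) with symbols as integer indices, v0 an initial distribution; plain probabilities, with no underflow protection):

```python
import numpy as np

def viterbi(P, e, x, v0):
    """Return the most likely state sequence for the observed symbol indices x."""
    back = []                                    # back[t][i] = best predecessor of state i
    v = v0
    for s in x:
        scores = e[:, s, None] * P * v[None, :]  # scores[i, j] = ei(s) * Pi<-j * v[j]
        back.append(scores.argmax(axis=1))       # remember the maximizing j for each i
        v = scores.max(axis=1)                   # vt[i] = maxj scores[i, j]
    state = int(v.argmax())                      # best final state
    path = [state]
    for bt in reversed(back):                    # follow the back-pointers
        state = int(bt[state])
        path.append(state)
    return path[::-1][1:]                        # drop the time-0 state
```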

Outline

- Overview
- Exploiting repetitions
- Using Four-Russians speedup
- Using LZ78
- Using Run-Length Encoding
- Training
- Summary of results

VA in Matrix Notation

Define Mij(σ) = ei(σ)·Pi←j. Then

v1[i] = maxj{ei(x1)·Pi←j·v0[j]} = maxj{Mij(x1)·v0[j]}

With the max-times matrix product (A⊗B)ij = maxk{Aik·Bkj}, this becomes

v1 = M(x1) ⊗ v0

v2 = M(x2) ⊗ M(x1) ⊗ v0

vn = M(xn) ⊗ M(xn-1) ⊗ ··· ⊗ M(x1) ⊗ v0

Viterbi's algorithm evaluates this expression right-to-left with matrix-vector products, in O(k²n) time; evaluating it with matrix-matrix products would cost O(k³n).
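The ⊗ product is an ordinary matrix product with (max, ·) in place of (+, ·). A minimal numpy sketch (illustrative helper names, reused in the sketches below):

```python
import numpy as np

def otimes(A, B):
    """(A ⊗ B)[i, j] = max over m of A[i, m] * B[m, j]."""
    # A[:, :, None] * B[None, :, :] has shape (k, k, k); maximize over the middle axis.
    return (A[:, :, None] * B[None, :, :]).max(axis=1)

def M(P, e, s):
    """M(s)[i, j] = ei(s) * Pi<-j, the matrix defined above."""
    return e[:, s, None] * P
```

A matrix-vector ⊗ step is then `(A * v[None, :]).max(axis=1)`, the O(k²) operation VA performs once per character.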

Exploiting Repetitions

c a t g a a c t g a a c

vn = M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0 (12 steps)

- compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once, for the repeated substring W = gaac

- use it twice! vn = M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c)⊗v0 (6 steps)
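In code the saving is just hoisting the shared product, using the otimes and M helpers sketched above (c, a, t, g stand for symbol indices; illustrative only):

```python
def decode_catgaactgaac(P, e, v0, c, a, t, g):
    """v_n for the string catgaactgaac, reusing M(W) for W = gaac."""
    # Precompute once: M(W) = M(c) ⊗ M(a) ⊗ M(a) ⊗ M(g).
    MW = otimes(M(P, e, c), otimes(M(P, e, a), otimes(M(P, e, a), M(P, e, g))))
    # Six matrix-vector steps instead of twelve; factors apply right-to-left,
    # so M(c) for the first character comes first.
    v = v0
    for A in (M(P, e, c), M(P, e, a), M(P, e, t), MW, M(P, e, t), MW):
        v = (A * v[None, :]).max(axis=1)
    return v
```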

Exploiting repetitions

ℓ = length of the repeated substring W

λ = number of times W repeats in the string

Computing M(W) costs (ℓ-1)k³: ℓ-1 matrix-matrix multiplications.

Each time W appears we save (ℓ-1)k²: ℓ-1 matrix-vector multiplications are replaced by one.

W is good if the matrix-vector savings exceed the matrix-matrix cost, λ(ℓ-1)k² > (ℓ-1)k³, i.e., if the number of repeats λ exceeds the number of states k.

General Scheme

- dictionary selection: choose the set D = {Wi} of good substrings
- encoding: compute M(Wi) for every Wi in D
- parsing: partition the input X into good substrings X' = Wi1Wi2 … Win'
- propagation: run Viterbi's algorithm on X' using the M(Wi)
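A skeleton of the four steps (our own sketch; select_dictionary and parse are hypothetical stand-ins for whichever compression scheme is plugged in):

```python
def compressed_viterbi(P, e, x, v0, select_dictionary, parse):
    """General scheme: dictionary selection, encoding, parsing, propagation."""
    # I.   Dictionary selection: choose the set D of good substrings.
    D = select_dictionary(x)
    # II.  Encoding: compute M(W) for every W in D
    #      (right-to-left convention: M(W) = M(w_l) ⊗ ... ⊗ M(w_1)).
    MD = {}
    for W in D:
        acc = M(P, e, W[0])
        for s in W[1:]:
            acc = otimes(M(P, e, s), acc)
        MD[W] = acc
    # III. Parsing: partition x into dictionary words W_i1 W_i2 ... W_in'.
    words = parse(x, D)
    # IV.  Propagation: run VA on the parsed string using the M(W_i).
    v = v0
    for W in words:
        v = (MD[W] * v[None, :]).max(axis=1)
    return v
```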


Using the Four-Russians Method

- dictionary selection: D = all strings over Σ of length ≤ l. Cost: O(1)
- encoding: incremental construction M(Wσ) = M(W) ⊗ M(σ). Cost: O(2|Σ|^l k³)
- parsing: X' = split X into words of length l. Cost: O(n)
- propagation: run VA on X' using the M(Wi). Cost: O(k²n ∕ l)

Speedup: k²n ∕ O(2|Σ|^l k³ + k²n ∕ l) = Θ(log n) for l = Θ(log n).
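A sketch of this instantiation (assumes a small alphabet and l on the order of log n so the table stays small; uses the otimes and M helpers above; the identity matrix is the unit of ⊗ for nonnegative matrices):

```python
from itertools import product
import numpy as np

def four_russians_viterbi(P, e, x, v0, l):
    k = P.shape[0]
    alphabet = range(e.shape[1])
    # Encoding: M(W) for every word of length <= l, built incrementally
    # (right-to-left convention: appending a character multiplies on the left).
    MD = {(): np.eye(k)}
    for length in range(1, l + 1):
        for W in product(alphabet, repeat=length):
            MD[W] = otimes(M(P, e, W[-1]), MD[W[:-1]])
    # Parsing + propagation: consume x in blocks of length l.
    v, cut = v0, len(x) - len(x) % l
    for t in range(0, cut, l):
        v = (MD[tuple(x[t:t + l])] * v[None, :]).max(axis=1)
    for s in x[cut:]:                      # leftover tail: plain VA steps
        v = (M(P, e, s) * v[None, :]).max(axis=1)
    return v
```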


Lempel Ziv 78

- The next LZ-word is the longest LZ-word previously seen plus one character
- Use a trie

- Number of LZ-words is asymptotically < n ∕ log n

(Trie of the LZ78 parse of aacgacg = a · ac · g · acg.)
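A minimal LZ78 parse with a dict-of-dicts trie (our sketch; e.g. aacgacg parses into a, ac, g, acg):

```python
def lz78_parse(x):
    """Split x into LZ78 words: each word = longest previously seen word + one character."""
    trie, words = {}, []
    node, word = trie, []
    for s in x:
        word.append(s)
        if s in node:
            node = node[s]            # extend the current word inside the trie
        else:
            node[s] = {}              # new word: add a trie node and emit the word
            words.append(tuple(word))
            node, word = trie, []
    if word:                          # trailing characters form a final partial word
        words.append(tuple(word))
    return words
```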

Using LZ78

- dictionary selection: D = all LZ-words of X. Cost: O(n)
- encoding: use the incremental nature of LZ, M(Wσ) = M(W) ⊗ M(σ). Cost: O(k³n ∕ log n)
- parsing: X' = the LZ parse of X. Cost: O(n)
- propagation: run VA on X' using the M(Wi). Cost: O(k²n ∕ log n)

Speedup: k²n ∕ (k³n ∕ log n) = log n ∕ k.

Improvement


- Remember the speedup condition: λ > k
- Use only LZ-words that appear more than k times
- These words correspond to trie nodes with more than k descendants
- Step III must now parse X differently
- This ensures graceful degradation with increasing k. Speedup: max(1, log n ∕ k)

Experimental Results – CpG Islands

≈5× faster on:

- Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
- Long: the 22 Mbp human Y chromosome


Path traceback

- In VA this is easy to do in O(n) time by keeping track of the maximizing states during the computation
- The problem: we only get the states on the boundaries of the good substrings of X
- Solution: also keep track of the maximizing states when computing the matrices M(W) = M(W1) ⊗ M(W2). This takes O(n) time and O(n'k²) space
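One way to realize this bookkeeping (a sketch, not necessarily the paper's exact data structure): have the ⊗ product also return the maximizing intermediate state, so each stored product M(W) = M(W1) ⊗ M(W2) can later be expanded into the states inside W:

```python
def otimes_traceback(A, B):
    """Max-times product that also records the maximizing middle state.

    mid[i, j] is the state passed through, between the B-part and the
    A-part, on the best path that enters at j and leaves at i.
    """
    scores = A[:, :, None] * B[None, :, :]   # scores[i, m, j] = A[i, m] * B[m, j]
    return scores.max(axis=1), scores.argmax(axis=1)
```

Recursing into the two halves with the stored mid tables expands the full state sequence in O(n) total time.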


Training

- Estimate the model θ = {Pi←j, ei(σ)} given X.
- Find the θ that maximizes P(X | θ).
- Use Expectation-Maximization:
- decode using the current θ
- use the decoding result to update θ

VA Training

- Aij = # of times state i follows state j in the most likely sequence of states.
- Ei(σ) = # of times the letter σ is emitted by state i in the most likely sequence.
- Each iteration costs O(VA + n + k²): decoding (the bottleneck, and exactly what our speedup accelerates), plus O(n) path traceback and O(k²) to update Pi←j, ei(σ).

The Forward-Backward Algorithm

- The forward algorithm calculates ft[i], the probability of observing the sequence x1, x2, …, xt with the t'th state being i.
- The backward algorithm calculates bt[i], the probability of observing the sequence xt+1, xt+2, …, xn given that the t'th state is i.

ft = M(xt) ● M(xt-1) ● ··· ● M(x1) ● f0

bt = bn ● M(xn) ● M(xn-1) ● ··· ● M(xt+2) ● M(xt+1)

Here ● is the ordinary matrix product: the same structure as VA with max replaced by summation, so the same repetition speedups apply.
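A sketch of the forward pass in the same style as the VA code above (same assumed data layout, again without the scaling needed against underflow in practice):

```python
def forward(P, e, x, f0):
    """f_t = M(x_t) ● f_{t-1}: ordinary matrix-vector products."""
    f = f0
    for s in x:
        f = (M(P, e, s) * f[None, :]).sum(axis=1)   # sum over j instead of max
    return f
```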

Baum-Welch Training (in a nutshell)

- Aij = Σt ft[j]·Pi←j·ei(xt+1)·bt+1[i]
- Each iteration costs O(FB + nk²): the O(nk²) decoding (forward-backward) plus path traceback and updating Pi←j, ei(σ).
- If a substring W of length ℓ that repeats λ times satisfies the analogous goodness condition, the entire process can be sped up by precalculation.
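A sketch of the Aij accumulation (our own indexing: f[t] = ft with f[0] = f0, b[t] = bt, and the 0-based x[t] holding xt+1 of the slides' 1-based notation):

```python
import numpy as np

def expected_transitions(P, e, x, f, b):
    """A[i, j] = sum over t of f_t[j] * Pi<-j * ei(x_{t+1}) * b_{t+1}[i]."""
    k = P.shape[0]
    A = np.zeros((k, k))
    for t in range(1, len(x)):
        # Broadcasts to A[i, j] += f_t[j] * P[i, j] * e[i, x_{t+1}] * b_{t+1}[i].
        A += f[t][None, :] * P * e[:, x[t], None] * b[t + 1][:, None]
    return A
```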


Summary of results

- General framework for exploiting repetitions
- Four-Russians: log(n)
- LZ78: log(n) ∕ k
- RLE: r ∕ log(r)
- Byte-Pair Encoding: r
- SLP, LZ77: r ∕ k
- Path reconstruction: O(n)
- Forward-Backward: same speedups
- Viterbi training: same speedups
- Baum-Welch training: speedup, many details
- Parallelization
