
Speeding Up Algorithms for Hidden Markov Models by Exploiting Repetitions

Yuri Lifshits (Caltech)

Shay Mozes (Brown Uni.)

Oren Weimann (MIT)

Michal Ziv-Ukelson (Ben-Gurion Uni.)

Hidden Markov Models (HMMs)

[Figure: three hidden weather states — Sunny, Rainy, Cloudy — with transition arrows between them.]

Transition probabilities: the probability of today's weather given the previous day's weather. Pi←j = the probability of making a transition to state i from state j.

Hidden states q1, …, qk: sunny, rainy, cloudy.


Hidden Markov Models (HMMs)

Observable states: the states of the process that are 'visible'. Σ = {soggy, damp, dryish, dry}

Humidity in IBM

Emission probabilities: the probability of observing a particular observable state given that we are in a particular hidden state. ei(σ) = the probability of observing σ ∈ Σ given that the hidden state is i.
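To make these objects concrete, here is a minimal sketch of the weather HMM as numpy arrays (the numbers are illustrative only, not taken from the slides): P[i, j] holds Pi←j and E[i, s] holds ei(σ).

```python
import numpy as np

# Hidden states q1..qk (here k = 3) and the observable alphabet Σ.
states = ["sunny", "rainy", "cloudy"]
alphabet = ["soggy", "damp", "dryish", "dry"]

# Transition probabilities: P[i, j] = P_{i<-j}, the probability of moving to
# state i given that the previous state was j (each column sums to 1).
P = np.array([
    [0.6, 0.2, 0.3],   # -> sunny
    [0.1, 0.6, 0.3],   # -> rainy
    [0.3, 0.2, 0.4],   # -> cloudy
])

# Emission probabilities: E[i, s] = e_i(σ), the probability of observing
# symbol σ while in hidden state i (each row sums to 1).
E = np.array([
    [0.05, 0.15, 0.30, 0.50],  # sunny
    [0.50, 0.30, 0.15, 0.05],  # rainy
    [0.20, 0.30, 0.30, 0.20],  # cloudy
])
```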

In short:
  • Hidden Markov Models are extensively used to model processes in many fields (error correction, speech recognition, computational linguistics, bioinformatics)
  • We show how to exploit repetitions in the observed sequence to obtain speedups of HMM algorithms
  • Can use different compression schemes
  • Applies to several decoding and training algorithms
HMM Decoding and Training
  • Decoding:

Given the model (emission and transition probabilities) and an observed sequence X, find the hidden sequence of states that is most likely to have generated the observed sequence.

    • X = dry, dry, damp, soggy, …
  • Training:

Given the number of hidden states and an observed sequence, estimate the emission and transition probabilities Pi←j, ei(σ)

Decoding

[Figure: the Viterbi trellis — the k hidden states (rows 1…k) laid out over the n time steps, with the observed string x1, x2, x3, …, xn along the bottom.]

  • Decoding: Given the model and the observed string, find the hidden sequence of states that is most likely to have generated the observed string
Decoding – Viterbi’s Algorithm (VA)

[Figure: the trellis (states × time); the animation shows how the entry v6[4] is computed from column 5.]

v5[j] = the probability of the best sequence of states that emits the first 5 characters and ends in state j.

v6[4] = maxj { e4(c) · P4←j · v5[j] }

If the maximum is attained at j = 2, this equals e4(c) · P4←2 · v5[2], where v5[2] is the probability of the best sequence of states that emits the first 5 characters and ends in state 2.
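A minimal sketch of this recurrence (assuming the P and E arrays and index conventions from the earlier sketch; x is the observed string given as a list of symbol indices):

```python
import numpy as np

def viterbi_probs(P, E, x, v0):
    """Return v_n: v_n[i] is the probability of the best state sequence
    that emits all of x and ends in state i (traceback omitted here)."""
    k = P.shape[0]
    v = v0.copy()
    for sym in x:
        # v_t[i] = max_j e_i(x_t) * P_{i<-j} * v_{t-1}[j]
        v = np.array([E[i, sym] * np.max(P[i, :] * v) for i in range(k)])
    return v

# Example: uniform start, observe dry, dry, damp, soggy (indices 3, 3, 1, 0):
# v = viterbi_probs(P, E, [3, 3, 1, 0], np.full(3, 1.0 / 3))
```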

Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
VA in Matrix Notation

v1[i] = maxj { ei(x1) · Pi←j · v0[j] }

Define Mij(σ) = ei(σ) · Pi←j and the max-times matrix product (A⊗B)ij = maxk { Aik · Bkj }. Then

v1[i] = maxj { Mij(x1) · v0[j] }, i.e. v1 = M(x1) ⊗ v0
v2 = M(x2) ⊗ M(x1) ⊗ v0
vn = M(xn) ⊗ M(xn−1) ⊗ ··· ⊗ M(x1) ⊗ v0

Viterbi's algorithm evaluates this chain right to left with matrix-vector products: O(k²n).
Evaluating it with matrix-matrix products instead would cost O(k³n).
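The same computation written with the max-times product, as a sketch (numpy; the helper names M, maxtimes, viterbi_vec are mine):

```python
import numpy as np

def M(E, P, sym):
    """M_ij(σ) = e_i(σ) * P_{i<-j}."""
    return E[:, sym][:, None] * P

def maxtimes(A, B):
    """(A ⊗ B)_ij = max_k A_ik * B_kj (max-times matrix product)."""
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def viterbi_vec(E, P, x, v0):
    """Evaluate v_n = M(x_n) ⊗ ... ⊗ M(x_1) ⊗ v_0 right to left,
    one k x k matrix-vector step per character: O(k^2 n)."""
    v = v0
    for sym in x:
        v = np.max(M(E, P, sym) * v[None, :], axis=1)   # v_t = M(x_t) ⊗ v_{t-1}
    return v
```

Chaining maxtimes over all the M(xt) first would give the same vn but cost O(k³n); keeping every step matrix-vector is what makes plain VA O(k²n).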

Exploiting Repetitions

Observed string: c a t g a a c t g a a c

vn = M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(c)⊗M(a)⊗M(a)⊗M(g)⊗M(t)⊗M(a)⊗M(c)⊗v0 — 12 steps

  • compute M(W) = M(c)⊗M(a)⊗M(a)⊗M(g) once for the repeated substring W = gaac
  • use it twice! vn = M(W)⊗M(t)⊗M(W)⊗M(t)⊗M(a)⊗M(c)⊗v0 — 6 steps
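A self-contained sketch of this trick (helper names are mine, repeating the small helpers from the previous sketch): build M(W) for the repeated word W = gaac once with matrix-matrix ⊗ products, then apply it as a single step at every occurrence.

```python
import numpy as np
from functools import reduce

def maxtimes(A, B):                        # (A ⊗ B)_ij = max_k A_ik * B_kj
    return np.max(A[:, :, None] * B[None, :, :], axis=1)

def M(E, P, sym):                          # M_ij(σ) = e_i(σ) * P_{i<-j}
    return E[:, sym][:, None] * P

def M_word(E, P, word):
    """M(W) = M(w_l) ⊗ ... ⊗ M(w_1) for a word given as symbol indices."""
    return reduce(lambda acc, s: maxtimes(M(E, P, s), acc),
                  word[1:], M(E, P, word[0]))

# For "catgaactgaac", W = "gaac" repeats twice: pay the (l-1) = 3 matrix-matrix
# products for MW = M_word(E, P, W) once, then each occurrence of W costs a
# single matrix-vector step:  v = np.max(MW * v[None, :], axis=1)
```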

Exploiting Repetitions

ℓ = length of the repeated substring W
λ = number of times W repeats in the string

Computing M(W) costs (ℓ − 1)k³ (matrix-matrix multiplications).
Each time W appears we save (ℓ − 1)k² (matrix-vector multiplications).

W is good if λ(ℓ − 1)k² > (ℓ − 1)k³, i.e. if
λ > k: the number of repeats exceeds the number of states.

General Scheme
(dictionary selection and encoding can be done offline)
  • dictionary selection: choose the set D={Wi } of good substrings
  • encoding: compute M(Wi ) for every Wi in D
  • parsing: partition the input X into good substrings: X' = Wi1 Wi2 … Win'
  • propagation: run Viterbi's Algorithm on X' using the M(Wi) (a sketch of the full scheme is given below)
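A sketch of how the four steps fit together (all names are mine; select_dictionary, encode, and parse stand in for whichever compression scheme is plugged in):

```python
import numpy as np

def maxtimes_vec(Mw, v):
    """(M ⊗ v)_i = max_j M_ij * v_j."""
    return np.max(Mw * v[None, :], axis=1)

def compressed_viterbi(x, v0, select_dictionary, encode, parse):
    D = select_dictionary(x)                # I.   choose the good substrings W_i
    matrices = {W: encode(W) for W in D}    # II.  precompute M(W_i) for each W_i
    x_parsed = parse(x, D)                  # III. X' = W_i1 W_i2 ... W_in'
    v = v0
    for W in x_parsed:                      # IV.  propagation: n' steps, not n
        v = maxtimes_vec(matrices[W], v)
    return v
```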
Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
Using the Four-Russians Method
  • dictionary selection: D = all strings over Σ of length ≤ ℓ — cost O(1)
  • encoding: incremental construction M(Wσ) = M(σ) ⊗ M(W) — cost O(2|Σ|^ℓ·k³)
  • parsing: X' = split X into blocks of length ℓ — cost O(n)
  • propagation: run VA on X' using the M(Wi) — cost O(k²n ∕ ℓ)
  • Speedup: k²n ∕ O(2|Σ|^ℓ·k³ + k²n ∕ ℓ) = Θ(log n) for ℓ = Θ(log n)
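A sketch of the encoding step under this scheme (names are mine; alphabet is the set of symbol indices): every word of length up to ℓ gets its matrix, each built from a one-character-shorter word with a single ⊗ product.

```python
from itertools import product
import numpy as np

def four_russians_dictionary(E, P, alphabet, ell):
    """Build M(W) for every word W over `alphabet` of length 1..ell,
    incrementally via M(Wσ) = M(σ) ⊗ M(W); total cost O(|Σ|^ell * k^3)."""
    def maxtimes(A, B):                     # (A ⊗ B)_ij = max_k A_ik * B_kj
        return np.max(A[:, :, None] * B[None, :, :], axis=1)
    table = {(s,): E[:, s][:, None] * P for s in alphabet}   # length-1 words
    for length in range(2, ell + 1):
        for w in product(alphabet, repeat=length):
            # w = Wσ with σ = w[-1]; reuse the already-built M(W) = table[w[:-1]]
            table[w] = maxtimes(table[(w[-1],)], table[w[:-1]])
    return table
```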

Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
Lempel-Ziv 78 (LZ78)
  • The next LZ-word is the longest LZ-word previously seen plus one character
  • Use a trie
  • Number of LZ-words is asymptotically < n ∕ log n

[Figure: the LZ78 trie for the example string aacgacg, whose LZ-words are a, ac, g, acg.]
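A sketch of the LZ78 parse with a nested-dict trie (the function name is mine):

```python
def lz78_parse(x):
    """Greedy LZ78 parse: each word is the longest previously seen word
    plus one extra character. Returns the list of words (as strings)."""
    trie = {}                 # nested dicts: node -> {char: child}
    words = []
    node, word = trie, ""
    for c in x:
        if c in node:                 # extend the longest known word
            node, word = node[c], word + c
        else:                         # new word = known word + one character
            node[c] = {}
            words.append(word + c)
            node, word = trie, ""
    if word:                          # trailing (possibly repeated) word
        words.append(word)
    return words

# lz78_parse("aacgacg") -> ['a', 'ac', 'g', 'acg']
```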

Using LZ78
  • dictionary selection: D = all LZ-words of X — cost O(n)
  • encoding: use the incremental nature of LZ78, M(Wσ) = M(σ) ⊗ M(W) — cost O(k³n ∕ log n)
  • parsing: X' = the LZ78 parse of X — cost O(n)
  • propagation: run VA on X' using the M(Wi) — cost O(k²n ∕ log n)
  • Speedup: k²n ∕ O(k³n ∕ log n + k²n ∕ log n) = log n ∕ k
Improvement

[Figure: the LZ78 trie again; only nodes with more than k descendants will be used.]

  • Remember speedup condition: λ > k
  • Use just LZ-words that appear more than k times
  • These words are represented by trie nodes with more than k descendants
  • Now must parse X (step III) differently
  • Ensures graceful degradation with increasing k — Speedup: max(1, log n ∕ k)
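A sketch of how the frequent words could be identified (assuming the nested-dict trie from the LZ78 sketch; a word appears in X at least as many times as there are LZ-words in its trie subtree, since each of those LZ-words has it as a prefix):

```python
def subtree_sizes(trie):
    """Map each trie word to the number of LZ-words in its subtree
    (the node itself included)."""
    sizes = {}
    def visit(node, word):
        total = 1
        for c, child in node.items():
            total += visit(child, word + c)
        sizes[word] = total
        return total
    visit(trie, "")
    return sizes

# Keep only words whose subtree holds more than k LZ-words, so that the word
# appears more than k times (the speedup condition λ > k):
# good_words = {w for w, s in subtree_sizes(trie).items() if w and s > k}
```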
Experimental Results – CpG Islands

Roughly 5× faster on:

  • Short: the 1.5 Mbp chromosome 4 of S. cerevisiae (yeast)
  • Long: the 22 Mbp human Y chromosome
Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
Run-Length Encoding
(dictionary selection and encoding can be done offline)

Run-length encoding: aaaccggggg → a3c2g5

Each run is then split into runs whose lengths are powers of two: aaaccggggg → a2a1c2g4g1
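A sketch of this parsing step (function name mine): run-length encode X, then decompose every run according to the binary representation of its length.

```python
from itertools import groupby

def rle_power_of_two_parse(x):
    """Split x into runs c^(2^i): each run c^r is decomposed following
    the binary representation of r."""
    parse = []
    for char, group in groupby(x):
        r = len(list(group))
        power = 1
        while r:
            if r & 1:
                parse.append((char, power))   # the word c^power
            r >>= 1
            power <<= 1
    return parse

# rle_power_of_two_parse("aaaccggggg") -> [('a',1), ('a',2), ('c',2), ('g',1), ('g',4)]
# (same words as a2 a1 c2 g4 g1 on the slide; the order inside a run does not
#  matter, since all of its characters are identical)
```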

Path traceback
  • In VA, easy to do in O(n) time by keeping track of maximizing states during computation
  • The problem: we only get the states on the boundaries of good substrings of X
  • Solution: keep track of the maximizing states when computing the matrices M(W) = M(W1) ⊗ M(W2). Takes O(n) time and O(n'k²) space
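A sketch of the bookkeeping (names mine): when combining two matrices with ⊗, also record the maximizing intermediate state, so the states inside a good substring can be recovered later.

```python
import numpy as np

def maxtimes_with_argmax(A, B):
    """(A ⊗ B)_ij = max_k A_ik * B_kj, plus arg[i, j] = the maximizing k,
    i.e. the best state at the split point between the two halves."""
    prod = A[:, :, None] * B[None, :, :]      # prod[i, k, j] = A_ik * B_kj
    return np.max(prod, axis=1), np.argmax(prod, axis=1)

# Storing these arg tables for every M(W) = M(W1) ⊗ M(W2) lets the traceback
# descend recursively into each occurrence of W and recover all hidden states.
```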
Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
Training
  • Estimate the model θ = {Pi←j, ei(σ)} given X:
    • find the θ that maximizes P(X | θ).
  • Use Expectation Maximization:
      • Decoding using current θ
      • Use decoding result to update θ
VA Training
  • Aij = # of times state i follows state j in the most likely sequence of states.
  • Ei(σ) = # of times the letter σ is emitted by state i in the most likely sequence.
  • Each iteration costs O(VA + n + k²): decoding (the bottleneck — this is what we speed up!) + path traceback + updating Pi←j, ei(σ)

The Forward-Backward Algorithm
  • The forward algorithm calculates ft[i], the probability of observing the sequence x1, x2, …, xt with the t-th state being i.
  • The backward algorithm calculates bt[i], the probability of observing the sequence xt+1, xt+2, …, xn given that the t-th state is i.

ft = M(xt) • M(xt−1) • … • M(x1) • f0

bt = bn • M(xn) • M(xn−1) • … • M(xt+2) • M(xt+1)

(here • is ordinary matrix multiplication: sums replace the maxima of ⊗)
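A sketch of the forward pass in this notation (names mine; the only change from the Viterbi sketch is summing instead of maximizing):

```python
import numpy as np

def forward(E, P, x, f0):
    """f[t][i] = probability of observing x[0..t-1] with the t-th state equal to i.
    The same chain of matrix-vector products as VA, with ordinary (sum) products."""
    f = [f0]
    for sym in x:
        Msig = E[:, sym][:, None] * P      # M_ij(σ) = e_i(σ) * P_{i<-j}
        f.append(Msig @ f[-1])             # f_t = M(x_t) · f_{t-1}
    return f
```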

Baum-Welch Training (in a nutshell)

  • Aij = Σt ft[j] · Pi←j · ei(xt+1) · bt+1[i]
  • Each iteration costs O(FB + nk²): decoding (forward-backward, O(nk²)) + path traceback + updating Pi←j, ei(σ)
  • If a substring W of length ℓ that repeats λ times satisfies the speedup condition, then the entire process can be sped up by precalculation

Outline
  • Overview
  • Exploiting repetitions
  • Using Four-Russians speedup
  • Using LZ78
  • Using Run-Length Encoding
  • Training
  • Summary of results
Summary of results
  • General framework
  • Four-Russians log(n)
  • LZ78 log(n) ∕ k
  • RLE r ∕ log(r)
  • Byte-Pair Encoding r
  • SLP, LZ77 r/k
  • Path reconstruction O(n)
  • Forward-Backward same speedups
  • Viterbi training same speedups
  • Baum-Welch training speedup, many details
  • Parallelization