Hidden Markov Models

(Slide figure: the HMM trellis, with hidden states 1, 2, …, K at each position and emitted symbols x1, x2, x3, …)


Generating a sequence by the model

Given an HMM, we can generate a sequence of length n as follows:

  • Start at state 1 according to prob a01

  • Emit letter x1 according to prob e1(x1)

  • Go to state 2 according to prob a12

  • … until emitting xn

(Slide figure: from the begin state 0, a transition is taken to state 2 with probability a02, which emits x1 with probability e2(x1); the walk continues through the states, emitting x2, x3, …, xn.)
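To make the generation procedure concrete, here is a minimal Python sketch of sampling a state path and a sequence from an HMM. The two-state parameters a0, a, and e are toy values invented for illustration; only the sampling loop mirrors the steps listed above.

```python
# Minimal sketch of sampling from an HMM (toy two-state parameters invented for
# illustration): a0[k] = a_0k, a[k][l] = a_kl, e[k][x] = e_k(x).
import random

a0 = [0.5, 0.5]                      # initial transitions a_0k from the begin state
a  = [[0.9, 0.1], [0.2, 0.8]]        # transition probabilities a_kl
e  = [[0.5, 0.5], [0.9, 0.1]]        # emission probabilities e_k(x), alphabet {0, 1}

def sample(probs):
    """Draw an index from a discrete distribution."""
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r < acc:
            return i
    return len(probs) - 1

def generate(n):
    """Generate a state path and an emitted sequence of length n."""
    states, symbols = [], []
    k = sample(a0)                    # start according to a_0k
    for _ in range(n):
        states.append(k)
        symbols.append(sample(e[k]))  # emit x_i according to e_k(x_i)
        k = sample(a[k])              # move to the next state according to a_kl
    return states, symbols

print(generate(10))
```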


Evaluation

We will develop algorithms that allow us to compute:

P(x) Probability of x given the model

P(xi…xj) Probability of a substring of x given the model

P(i = k | x) “Posterior” probability that the ith state is k, given x

A more refined measure of which states x may be in


The Forward Algorithm

fk(i) = P(x1…xi, πi = k) (the forward probability)

Initialization:

f0(0) = 1

fk(0) = 0, for all k > 0

Iteration:

fk(i) = ek(xi) Σl fl(i – 1) alk

Termination:

P(x) = k fk(N)


Motivation for the Backward Algorithm

We want to compute

P(i = k | x),

the probability distribution on the ith position, given x

We start by computing

P(i = k, x) = P(x1…xi, i = k, xi+1…xN)

= P(x1…xi, i = k) P(xi+1…xN | x1…xi, i = k)

= P(x1…xi, i = k) P(xi+1…xN | i = k)

Then, P(i = k | x) = P(i = k, x) / P(x)

Forward, fk(i)

Backward, bk(i)


The Backward Algorithm – derivation

Define the backward probability:

bk(i) = P(xi+1…xN | πi = k) “starting from ith state = k, generate rest of x”

= Σπi+1…πN P(xi+1, xi+2, …, xN, πi+1, …, πN | πi = k)

= Σl Σπi+2…πN P(xi+1, xi+2, …, xN, πi+1 = l, πi+2, …, πN | πi = k)

= Σl el(xi+1) akl Σπi+2…πN P(xi+2, …, xN, πi+2, …, πN | πi+1 = l)

= Σl el(xi+1) akl bl(i+1)


The Backward Algorithm

We can compute bk(i) for all k, i, using dynamic programming

Initialization:

bk(N) = 1, for all k

Iteration:

bk(i) = l el(xi+1) akl bl(i+1)

Termination:

P(x) = l a0l el(x1) bl(1)


Computational Complexity

What are the running time and space required for Forward and Backward?

Time: O(K2N)

Space: O(KN)

Useful implementation techniques to avoid underflows:

Viterbi: sum of logs

Forward/Backward: rescaling every few positions by multiplying by a constant (a sketch follows)
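Here is one way the rescaling idea can look in code, as a hedged sketch with the same invented toy parameters: normalize the forward values at every position, keep the scaling constants, and recover log P(x) as the sum of their logs.

```python
# Sketch of the rescaling trick for Forward (toy parameters invented for illustration):
# normalize f(i) at every position, remember the scaling constants, and recover
# log P(x) as the sum of their logs, so nothing ever underflows.
import math

a0 = [0.5, 0.5]
a  = [[0.9, 0.1], [0.2, 0.8]]
e  = [[0.5, 0.5], [0.9, 0.1]]

def forward_scaled(x):
    K = len(a0)
    f = [a0[k] * e[k][x[0]] for k in range(K)]       # unscaled values at position 1
    log_px = 0.0
    for i in range(len(x)):
        if i > 0:
            f = [e[k][x[i]] * sum(f[l] * a[l][k] for l in range(K)) for k in range(K)]
        s = sum(f)                                   # scaling constant for position i
        f = [v / s for v in f]                       # keep the values O(1)
        log_px += math.log(s)
    return log_px                                    # log P(x)

print(forward_scaled([0, 1, 1, 0] * 100))            # fine even for long sequences
```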


Posterior Decoding

P(i = k | x) =

P(i = k , x)/P(x) =

P(x1, …, xi, i = k, xi+1, … xn) / P(x) =

P(x1, …, xi, i = k) P(xi+1, … xn | i = k) / P(x) =

fk(i) bk(i) / P(x)

We can now calculate

fk(i) bk(i)

P(i = k | x) = –––––––

P(x)

Then, we can ask

What is the most likely state at position i of sequence x:

Define ^ by Posterior Decoding:

^i = argmaxkP(i = k | x)
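Putting the pieces together, a sketch of posterior decoding with the same invented toy parameters: build the forward and backward tables, form fk(i) bk(i) / P(x), and take the argmax at each position.

```python
# Sketch of posterior decoding (toy parameters invented for illustration): build the
# forward and backward tables, form P(pi_i = k | x) = f_k(i) b_k(i) / P(x), take argmax.
a0 = [0.5, 0.5]
a  = [[0.9, 0.1], [0.2, 0.8]]
e  = [[0.5, 0.5], [0.9, 0.1]]
K  = 2

def forward(x):
    f = [[a0[k] * e[k][x[0]] for k in range(K)]]
    for i in range(1, len(x)):
        f.append([e[k][x[i]] * sum(f[-1][l] * a[l][k] for l in range(K)) for k in range(K)])
    return f

def backward(x):
    b = [[1.0] * K]                                  # b_k(N) = 1
    for i in range(len(x) - 2, -1, -1):
        b.insert(0, [sum(e[l][x[i+1]] * a[k][l] * b[0][l] for l in range(K)) for k in range(K)])
    return b

def posterior_decode(x):
    f, b = forward(x), backward(x)
    px = sum(f[-1][k] for k in range(K))             # P(x)
    post = [[f[i][k] * b[i][k] / px for k in range(K)] for i in range(len(x))]
    path = [max(range(K), key=lambda k: post[i][k]) for i in range(len(x))]
    return path, post

path, post = posterior_decode([0, 1, 1, 0, 0])
print(path)
```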


Posterior Decoding

  • For each state,

    • Posterior Decoding gives us a curve of the likelihood of each state along the positions

    • That is sometimes more informative than the Viterbi path π*

  • Posterior Decoding may give an invalid sequence of states (of prob 0)

    • Why?


Posterior Decoding

  • P(πi = k | x) = Σπ P(π | x) 1(πi = k)

    = Σ{π : πi = k} P(π | x)

(Slide figure: for each state l, the curve of P(πi = l | x) plotted along positions x1 x2 x3 …… xN.)

1(A) = 1, if A is true; 0, otherwise


Viterbi, Forward, Backward

VITERBI

Initialization:

V0(0) = 1

Vk(0) = 0, for all k > 0

Iteration:

Vl(i) = el(xi) maxk Vk(i-1) akl

Termination:

P(x, *) = maxkVk(N)

  • FORWARD

  • Initialization:

  • f0(0) = 1

  • fk(0) = 0, for all k > 0

  • Iteration:

  • fl(i) = el(xi) Σk fk(i-1) akl

  • Termination:

  • P(x) = k fk(N)

BACKWARD

Initialization:

bk(N) = 1, for all k

Iteration:

bl(i) = k el(xi+1) akl bk(i+1)

Termination:

P(x) = k a0k ek(x1) bk(1)



Higher-order HMMs

  • How do we model “memory” larger than one time point?

  • P(i+1 = l | i = k) akl

  • P(i+1 = l | i = k, i -1 = j) ajkl

  • A second-order HMM with K states is equivalent to a first-order HMM with K2 states

(Slide figure: the two coin-like states H and T redrawn as the four pair states HH, HT, TH, TT; second-order parameters such as aHT(prev = H) and aHT(prev = T) become ordinary first-order transitions aHHT, aHTH, aHTT, aTHH, aTHT, aTTH between pair states.)
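A small sketch of the equivalence stated above: given hypothetical second-order transition probabilities ajkl over the states H and T, build the first-order transition matrix between the pair states HH, HT, TH, TT. All numbers are invented for illustration.

```python
# Sketch of the equivalence above: fold hypothetical second-order transitions
# a_jkl over the states {H, T} into first-order transitions between the pair
# states HH, HT, TH, TT. All numbers are invented for illustration.
states = ["H", "T"]

# a2[(j, k)][l] = P(pi_{i+1} = l | pi_i = k, pi_{i-1} = j)
a2 = {
    ("H", "H"): {"H": 0.7, "T": 0.3},
    ("H", "T"): {"H": 0.4, "T": 0.6},
    ("T", "H"): {"H": 0.5, "T": 0.5},
    ("T", "T"): {"H": 0.2, "T": 0.8},
}

# First-order transitions between pair states: (j, k) -> (k, l) with probability
# a_jkl; any transition that disagrees on the shared state (e.g. HT -> HH) gets 0.
pair_states = [(j, k) for j in states for k in states]
a1 = {s: {t: 0.0 for t in pair_states} for s in pair_states}
for (j, k), row in a2.items():
    for l, p in row.items():
        a1[(j, k)][(k, l)] = p

for s in pair_states:
    print("".join(s), {"".join(t): a1[s][t] for t in pair_states})
```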


Similar Algorithms to 1st Order

  • P(i+1 = l | i = k, i -1 = j)

    • Vlk(i) = maxj{ Vkj(i – 1) + … }

    • Time? Space?


Modeling the Duration of States

(Slide figure: two states X and Y; X has a self-loop with probability p and moves to Y with probability 1 – p, and Y has a self-loop q and exit probability 1 – q.)

Length distribution of region X:

P(lX = i) = (1 – p) p^(i – 1), so E[lX] = 1/(1 – p)

  • Geometric distribution, with mean 1/(1 – p)

    This is a significant disadvantage of HMMs

    Several solutions exist for modeling different length distributions



Solution 1: Chain several states

(Slide figure: state X replaced by a short chain of X states feeding into Y; one X keeps the self-loop probability p and the 1 – p exit, while Y keeps its self-loop q and exit 1 – q.)

Disadvantage: Still very inflexible

lX = C + geometric with mean 1/(1-p)


Solution 2: Negative binomial distribution

Duration in X: m turns, where

  • During first m – 1 turns, exactly n – 1 arrows to next state are followed

  • During mth turn, an arrow to next state is followed

P(lX = m) = (m – 1 choose n – 1) (1 – p)^(n – 1 + 1) p^((m – 1) – (n – 1)) = (m – 1 choose n – 1) (1 – p)^n p^(m – n)

(Slide figure: states X(1), X(2), …, X(n) in series before Y, each X with self-loop probability p and forward transition probability 1 – p.)
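A quick simulation sketch (with assumed values for p and n) of the chained-X construction, comparing the empirical duration distribution with the formula (m – 1 choose n – 1)(1 – p)^n p^(m – n) above.

```python
# Simulation sketch of the chained-X construction (p and n are assumed values):
# compare the empirical duration distribution with
# P(lX = m) = (m-1 choose n-1) (1-p)^n p^(m-n).
import random
from math import comb

p, n, trials = 0.6, 3, 200_000

def duration():
    m = 0
    for _ in range(n):               # pass through X(1), ..., X(n)
        m += 1                       # the turn on which this copy is left
        while random.random() < p:   # self-loop with probability p
            m += 1
    return m

counts = {}
for _ in range(trials):
    m = duration()
    counts[m] = counts.get(m, 0) + 1

for m in range(n, n + 6):
    theory = comb(m - 1, n - 1) * (1 - p) ** n * p ** (m - n)
    print(m, round(counts.get(m, 0) / trials, 4), round(theory, 4))
```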


Example: genes in prokaryotes

  • EasyGene: prokaryotic gene-finder (Larsen TS, Krogh A)

  • Negative binomial with n = 3


Solution 3: Duration modeling

Upon entering a state:

  • Choose duration d, according to probability distribution

  • Generate d letters according to emission probs

  • Take a transition to next state according to transition probs

    Disadvantage: Increase in complexity of Viterbi:

    Time: O(D)

    Space: O(1)

    where D = maximum duration of state

(Slide figure: a state F with duration distribution Pf; a duration d < Df is drawn and the letters xi…xi+d-1 are emitted before transitioning on.)

Warning: Rabiner’s tutorial claims O(D2) (time) and O(D) (space) increases


Viterbi with duration modeling


Recall original iteration:

Vl(i) = maxk Vk(i – 1) akl el(xi)

New iteration:

Vl(i) = maxk maxd=1…Dl Vk(i – d) Pl(d) akl Πj=i-d+1…i el(xj)

(Slide figure: two duration-modeled states F and L with duration distributions Pf and Pl; F emits xi…xi+d-1 and L emits xj…xj+d-1 for chosen durations d < Df and d < Dl.)

Precompute cumulative values of the emission terms, so each product over xi-d+1…xi costs O(1) (sketch below)
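A sketch of that precomputation (toy emissions assumed): with prefix sums of log el(xj), the emission product over any window xi-d+1…xi in the new iteration can be looked up in O(1).

```python
# Sketch of the precomputation (toy emissions assumed): with prefix sums of
# log e_l(x_j), the emission product over any window x_{i-d+1}..x_i needed by
# the duration-modeling iteration costs O(1) instead of O(D).
import math

e = [[0.5, 0.5], [0.9, 0.1]]          # e_k(x) over the alphabet {0, 1}
x = [0, 1, 1, 0, 1, 0, 0]
K, N = len(e), len(x)

# cum[l][i] = sum of log e_l(x_j) for j = 1..i  (positions are 1-based, cum[l][0] = 0)
cum = [[0.0] * (N + 1) for _ in range(K)]
for l in range(K):
    for i in range(1, N + 1):
        cum[l][i] = cum[l][i - 1] + math.log(e[l][x[i - 1]])

def log_emission_window(l, i, d):
    """log of prod_{j = i-d+1..i} e_l(x_j), in O(1)."""
    return cum[l][i] - cum[l][i - d]

print(math.exp(log_emission_window(1, 5, 3)))   # e_1(x3) e_1(x4) e_1(x5)
```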



A state model for alignment

(Slide figure: three states: M, which advances both sequences (+1, +1); I, which advances x only (+1, 0); and J, which advances y only (0, +1).)

Alignments correspond 1-to-1 with sequences of states M, I, J

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---

TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC

IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII


Let’s score the transitions

(Slide figure: the same three states with scores on the transitions: moving into or staying in M scores s(xi, yj), opening a gap (M to I or M to J) scores -d, and extending a gap (I to I or J to J) scores -e.)

Alignments correspond 1-to-1 with sequences of states M, I, J

-AGGCTATCACCTGACCTCCAGGCCGA--TGCCC---

TAG-CTATCAC--GACCGC-GGTCGATTTGCCCGACC

IMMJMMMMMMMJJMMMMMMJMMMMMMMIIMMMMMIII


Alignment with affine gaps – state version

Dynamic Programming:

M(i, j): Optimal alignment of x1…xi to y1…yj ending in M

I(i, j): Optimal alignment of x1…xi to y1…yj ending in I

J(i, j): Optimal alignment of x1…xi to y1…yj ending in J

The score is additive, therefore we can apply DP recurrence formulas


Alignment with affine gaps – state version

Initialization:

M(0,0) = 0;

M(i, 0) = M(0, j) = -∞, for i, j > 0

I(i,0) = d + ie; J(0, j) = d + je

Iteration:

M(i, j) = s(xi, yj) + max { M(i – 1, j – 1), I(i – 1, j – 1), J(i – 1, j – 1) }

I(i, j) = max { e + I(i – 1, j), d + M(i – 1, j) }

J(i, j) = max { e + J(i, j – 1), d + M(i, j – 1) }

Termination:

Optimal alignment given by max { M(m, n), I(m, n), J(m, n) }
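A sketch of the recurrence as code, assuming the gap scores d and e are supplied as negative numbers (matching the -d and -e labels on the state diagram) and following the slide's initialization; the match/mismatch scores and the example sequences are invented.

```python
# Sketch of the three-matrix recurrence above, assuming the gap scores d and e
# are supplied as negative numbers (matching the -d / -e labels on the state
# diagram); match/mismatch values are invented for the example.
NEG_INF = float("-inf")

def affine_align_score(x, y, match=1, mismatch=-1, d=-3, e=-1):
    def s(a, b):
        return match if a == b else mismatch

    m, n = len(x), len(y)
    M = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    I = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # gap in y (consumes x)
    J = [[NEG_INF] * (n + 1) for _ in range(m + 1)]   # gap in x (consumes y)

    M[0][0] = 0.0
    for i in range(1, m + 1):
        I[i][0] = d + i * e                           # initialization as on the slide
    for j in range(1, n + 1):
        J[0][j] = d + j * e

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = s(x[i-1], y[j-1]) + max(M[i-1][j-1], I[i-1][j-1], J[i-1][j-1])
            I[i][j] = max(e + I[i-1][j], d + M[i-1][j])
            J[i][j] = max(e + J[i][j-1], d + M[i][j-1])

    return max(M[m][n], I[m][n], J[m][n])             # termination

print(affine_align_score("AGGCTATCACCTGACC", "TAGCTATCACGACC"))
```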

