Hidden Markov Models



### Hidden Markov Models

Yves Moreau

Katholieke Universiteit Leuven

Regular expressions
• Alignment:

ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

• Regular expression for the alignment:

[AT][CG][AC][ACGT]*A[TG][GC]

• Problem: the regular expression does not distinguish the exceptional sequence TGCTAGG from the consensus ACACATC

Sequence score

[Figure: profile HMM built from the alignment, with six match states, one insert state between states 3 and 4, and the emission and transition probabilities below]

Emission probabilities:

| State | A | C | G | T |
|---|---|---|---|---|
| 1 | .8 | 0 | 0 | .2 |
| 2 | 0 | .8 | .2 | 0 |
| 3 | .8 | .2 | 0 | 0 |
| Insert | .2 | .4 | .2 | .2 |
| 4 | 1 | 0 | 0 | 0 |
| 5 | 0 | 0 | .2 | .8 |
| 6 | 0 | .8 | .2 | 0 |

Transition probabilities: 1.0 along the main path, except state 3 → insert (.4), state 3 → state 4 (.6), insert → insert (.4), and insert → state 4 (.6).
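As a concrete illustration of the sequence score, here is a minimal sketch (not from the slides) that scores a sequence under this profile HMM. For this left-to-right architecture the state path is fully determined by the number of inserted letters, so no search is needed:

```python
# Emission tables for the six match states and the insert state (values from the figure).
MATCH = [
    {"A": .8, "T": .2},
    {"C": .8, "G": .2},
    {"A": .8, "C": .2},
    {"A": 1.0},
    {"G": .2, "T": .8},
    {"C": .8, "G": .2},
]
INSERT = {"A": .2, "C": .4, "G": .2, "T": .2}

def prob(seq):
    """P(seq) under the profile HMM; any letters beyond six pass through the insert state."""
    k = len(seq) - 6                         # number of inserted letters
    if k < 0:
        return 0.0
    p = 1.0
    head, ins, tail = seq[:3], seq[3:3 + k], seq[3 + k:]
    for x, e in zip(head, MATCH[:3]):        # match states 1-3
        p *= e.get(x, 0.0)
    if k == 0:
        p *= 0.6                             # state 3 -> state 4 directly
    else:
        p *= 0.4 * 0.4 ** (k - 1) * 0.6      # enter insert, self-loop, leave
        for x in ins:
            p *= INSERT[x]
    for x, e in zip(tail, MATCH[3:]):        # match states 4-6
        p *= e.get(x, 0.0)
    return p
```

Unlike the regular expression, the model now separates the two sequences: the consensus ACACATC receives a far higher probability than the exceptional TGCTAGG.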

Log odds
• Use logarithms for scaling and normalize by a random model
• Log odds for sequence S

[Figure: the profile HMM of the previous slide, annotated with log-odds scores (natural logarithms, emissions normalized by a uniform background of 0.25 per nucleotide)]

Emission log odds (empty cells: letters with emission probability 0):

| State | A | C | G | T |
|---|---|---|---|---|
| 1 | 1.16 | | | -0.22 |
| 2 | | 1.16 | -0.22 | |
| 3 | 1.16 | -0.22 | | |
| Insert | -0.22 | 0.47 | -0.22 | -0.22 |
| 4 | 1.39 | | | |
| 5 | | | -0.22 | 1.16 |
| 6 | | 1.16 | -0.22 | |

Transition log odds: 0 along the main path, -0.92 for the two .4 transitions, -0.51 for the two .6 transitions.
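The log-odds score follows the standard definition; the uniform background 0.25 is the convention that matches the values shown (e.g. log(0.8/0.25) ≈ 1.16, log 0.4 ≈ -0.92):

```latex
S = \log \frac{P(x \mid M)}{P(x \mid \text{random})}
  = \sum_{i} \left[ \log a_{p_i\,p_{i+1}} + \log \frac{e_{p_i}(x_i)}{0.25} \right]
```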

Markov chain
• Sequence x = (x1, x2, ..., xL)
• Example of a Markov chain
• Probabilistic model of a DNA sequence

[Figure: Markov chain with one state per nucleotide A, C, G, T and transitions between all pairs]

Transition probabilities: a_st = P(xi = t | xi-1 = s)

Markov property
• Probability of a sequence through Bayes’ rule
• Markov property: “the future is only a function of the present and not of the past”
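Written out in the standard notation (begin and end states, introduced on the next slide, omitted here):

```latex
P(x) = P(x_L \mid x_{L-1}, \ldots, x_1) \cdots P(x_2 \mid x_1)\, P(x_1)
     = P(x_1) \prod_{i=2}^{L} P(x_i \mid x_{i-1})
     = P(x_1) \prod_{i=2}^{L} a_{x_{i-1} x_i}
```

The second equality is exactly where the Markov property is used: each conditional collapses to a dependence on the previous symbol only.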

Beginning and end of a sequence
• Computation of the probability is not homogeneous
• The length distribution is not modeled: P(length = L) is unspecified
• Solution: model the beginning and the end of the sequence explicitly
• The probability of observing a sequence of a given length then decreases with the length of the sequence

[Figure: the Markov chain over A, C, G, T extended with a begin state a and an end state w]

Hidden Markov Model
• In a hidden Markov model, we observe the symbol sequence x but want to reconstruct the hidden state sequence (path p)
• Transition probabilities (begin state a: a0l, end state w: ak0)
• Emission probabilities ek(b)
• Joint probability of the sequence x1,...,xL and the path a, p1,...,pL, w
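In formulas, with p0 = pL+1 = 0 denoting the begin and end states:

```latex
P(x, p) = a_{0\,p_1} \prod_{i=1}^{L} e_{p_i}(x_i)\, a_{p_i\, p_{i+1}}
```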

Casino (I) – problem setup
• The casino mostly uses a fair die but sometimes switches to a loaded die
• We observe the outcomes x of the successive throws but want to know when the die was fair or loaded (path p)

[Figure: two-state HMM. Fair state: emissions 1–6 each with probability 1/6, stay probability 0.95, switch to Loaded 0.05. Loaded state: emissions 1–5 each with probability 1/10 and 6 with probability 1/2, stay probability 0.9, switch to Fair 0.1.]
The Viterbi algorithm
• We look for the most probable path p*
• This problem can be tackled by dynamic programming
• Define vk(i) as the probability of the most probable path that ends in state k with the emission of symbol xi
• Then we can compute this probability recursively
The Viterbi algorithm
• The Viterbi algorithm grows the best path dynamically
• Initial condition: sequence in the beginning state
• Traceback pointers to follow the best path back (= decoding)
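A minimal log-space sketch of Viterbi decoding for the casino model above. The uniform start distribution is an assumption for the sketch; the slides use an explicit begin state instead:

```python
import math

# Casino HMM from the slides: Fair ("F") and Loaded ("L") dice.
STATES = ["F", "L"]
TRANS = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2}}
INIT = {"F": 0.5, "L": 0.5}   # assumed uniform start (not given on the slides)

def viterbi(rolls):
    """Most probable state path, computed in log space to avoid underflow."""
    v = [{s: math.log(INIT[s]) + math.log(EMIT[s][rolls[0]]) for s in STATES}]
    ptr = []                                      # traceback pointers
    for x in rolls[1:]:
        col, back = {}, {}
        for l in STATES:
            best_k = max(STATES, key=lambda k: v[-1][k] + math.log(TRANS[(k, l)]))
            col[l] = v[-1][best_k] + math.log(TRANS[(best_k, l)]) + math.log(EMIT[l][x])
            back[l] = best_k
        v.append(col)
        ptr.append(back)
    # traceback from the best final state
    last = max(STATES, key=lambda s: v[-1][s])
    path = [last]
    for back in reversed(ptr):
        path.append(back[path[-1]])
    return path[::-1]
```

For example, a long run of sixes decodes as Loaded, while a long run without sixes decodes as Fair.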
The forward algorithm
• The forward algorithm lets us compute the probability P(x) of a sequence w.r.t. an HMM
• This is important for the computation of posterior probabilities and the comparison of HMMs
• The sum over all paths (exponentially many) can be computed by dynamic programming
• Let us define fk(i) as the probability of the sequence for the paths that end in state k with the emission of symbol xi
• Then we can compute this probability recursively
The forward algorithm
• The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence
• Initial condition: sequence in the beginning state
• End: all states converge to the end state
The backward algorithm
• The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol xi is emitted from state k
• This is important to compute the probability of a given state at symbol xi
• P(x1,...,xi, pi = k) can be computed by the forward algorithm as fk(i)
• Let us define bk(i) as the probability of the rest of the sequence for the paths that pass through state k at symbol xi
The backward algorithm
• The backward algorithm grows the probability bk(i) dynamically backwards (from end to beginning)
• Border condition: start in the end state
• Once both forward and backward probabilities are available, we can compute the posterior probability of the state
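Combining both directions gives the posterior P(pi = k | x) = fk(i) bk(i) / P(x). A sketch under the same assumptions as the previous examples (casino model, uniform start):

```python
STATES = ["F", "L"]
TRANS = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
EMIT = {"F": {r: 1 / 6 for r in range(1, 7)},
        "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2}}
INIT = {"F": 0.5, "L": 0.5}   # assumed uniform start

def forward_cols(rolls):
    """All forward columns: f[i][k] = P(x_1..i, state k at i)."""
    f = [{s: INIT[s] * EMIT[s][rolls[0]] for s in STATES}]
    for x in rolls[1:]:
        f.append({l: EMIT[l][x] * sum(f[-1][k] * TRANS[(k, l)] for k in STATES)
                  for l in STATES})
    return f

def backward_cols(rolls):
    """All backward columns: b[i][k] = P(x_{i+1}..x_L | state k at i)."""
    b = [{s: 1.0 for s in STATES}]
    for x in reversed(rolls[1:]):
        b.append({k: sum(TRANS[(k, l)] * EMIT[l][x] * b[-1][l] for l in STATES)
                  for k in STATES})
    return b[::-1]

def posterior(rolls):
    """Posterior state probabilities: f_k(i) * b_k(i) / P(x) at each position."""
    f, b = forward_cols(rolls), backward_cols(rolls)
    px = sum(f[-1][s] for s in STATES)
    return [{k: f[i][k] * b[i][k] / px for k in STATES} for i in range(len(rolls))]
```

At every position the posterior over states sums to 1, and positions inside a run of sixes get a higher Loaded posterior than positions emitting other outcomes.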
Posterior decoding
• Instead of using the most probable path for decoding (Viterbi), we can use the path of the most probable states
• The path p^ can be “illegal” (P(p^|x) = 0)
• This approach can also be used when we are interested in a function g(k) of the state (e.g., labeling)
Casino (III) – posterior decoding
• Posterior probability of the state “Fair” w.r.t. the die throws

[Figure: the casino HMM with modified transitions. Fair state: emissions 1–6 each 1/6, stay probability 0.99, switch to Loaded 0.01. Loaded state: emissions 1–5 each 1/10 and 6 with 1/2, stay probability 0.9, switch to Fair 0.1.]

Casino (IV) – posterior decoding
• New situation: P(pi+1 = Fair | pi = Fair) = 0.99
• Viterbi decoding cannot detect the cheating in 1000 throws, while posterior decoding can
Choice of the architecture
• For parameter estimation, we assume that the architecture of the HMM is known
• The choice of architecture is an essential design decision
• Duration modeling
• “Silent states” for gaps
Parameter estimation with known paths
• HMM with parameters q (transition and emission probabilities)
• Training set D of N sequences x1,...,xN
• Score of the model is the likelihood of the parameters given the training data
Parameter estimation with known paths
• If the state paths are known, the parameters are estimated through counts (how often is a transition used, how often is a symbol produced by a given state)
• Use of ‘pseudocounts’ if necessary
• Akl = number of transitions from k to l in the training set + pseudocount rkl
• Ek(b) = number of emissions of b from k in the training set + pseudocount rk(b)
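The resulting maximum-likelihood estimates, in the notation above:

```latex
a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad
e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}
```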
Parameter estimation with unknown paths: Viterbi training
• Strategy: iterative method
• Assume the parameters are known and find the best paths with Viterbi decoding
• Re-estimate the parameters from these paths, as in the known-path case
• Iterate until convergence
• Viterbi training does not maximize the likelihood of the parameters
• Viterbi training converges exactly in a finite number of steps
Parameter estimation with unknown paths: Baum-Welch training
• Strategy: parallel to Viterbi training, but we use the expected values of the transition and emission counts (instead of counts from the best path only)
• For the transitions
• For the emissions
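With the forward and backward variables of the previous slides, these expected counts are (standard Baum-Welch formulas, summing over training sequences j and positions i):

```latex
A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x_{i+1}^j)\, b_l^j(i+1), \qquad
E_k(b) = \sum_j \frac{1}{P(x^j)} \sum_{i \,:\, x_i^j = b} f_k^j(i)\, b_k^j(i)
```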
Parameter estimation with unknown paths: Baum-Welch training
• Initialization: choose arbitrary model parameters
• Recursion:
• Set all transition and emission variables to their pseudocounts
• For all sequences j = 1,...,n
• Compute fk(i) for sequence j with the forward algorithm
• Compute bk(i) for sequence j with the backward algorithm
• Add the contributions to A and E
• Compute the new model parameters akl = Akl / Σl' Akl' and ek(b) analogously
• Compute the log-likelihood of the model
• End: stop when the log-likelihood changes by less than some threshold or when the maximum number of iterations is exceeded
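The recursion above, sketched for a single training sequence on the casino model. The uniform start distribution is an assumption and is kept fixed; pseudocounts are omitted for brevity:

```python
import math

STATES = ["F", "L"]
SYMS = [1, 2, 3, 4, 5, 6]
INIT = {"F": 0.5, "L": 0.5}   # assumed fixed start distribution (not re-estimated)
TRANS0 = {("F", "F"): 0.95, ("F", "L"): 0.05, ("L", "L"): 0.9, ("L", "F"): 0.1}
EMIT0 = {"F": {r: 1 / 6 for r in SYMS},
         "L": {**{r: 1 / 10 for r in range(1, 6)}, 6: 1 / 2}}

def forward_cols(rolls, trans, emit):
    f = [{s: INIT[s] * emit[s][rolls[0]] for s in STATES}]
    for x in rolls[1:]:
        f.append({l: emit[l][x] * sum(f[-1][k] * trans[(k, l)] for k in STATES)
                  for l in STATES})
    return f

def backward_cols(rolls, trans, emit):
    b = [{s: 1.0 for s in STATES}]
    for x in reversed(rolls[1:]):
        b.append({k: sum(trans[(k, l)] * emit[l][x] * b[-1][l] for l in STATES)
                  for k in STATES})
    return b[::-1]

def baum_welch(rolls, trans, emit, iters=5):
    """Accumulate expected counts A and E, renormalize, and track the log-likelihood."""
    ll_history = []
    for _ in range(iters):
        f = forward_cols(rolls, trans, emit)
        b = backward_cols(rolls, trans, emit)
        px = sum(f[-1][s] for s in STATES)
        ll_history.append(math.log(px))
        A = {kl: 0.0 for kl in trans}
        E = {(k, s): 0.0 for k in STATES for s in SYMS}
        for i, x in enumerate(rolls):                 # expected emission counts
            for k in STATES:
                E[(k, x)] += f[i][k] * b[i][k] / px
        for i in range(len(rolls) - 1):               # expected transition counts
            for k in STATES:
                for l in STATES:
                    A[(k, l)] += f[i][k] * trans[(k, l)] * emit[l][rolls[i + 1]] * b[i + 1][l] / px
        trans = {(k, l): A[(k, l)] / sum(A[(k, m)] for m in STATES)
                 for k in STATES for l in STATES}
        emit = {k: {s: E[(k, s)] / sum(E[(k, t)] for t in SYMS) for s in SYMS}
                for k in STATES}
    return trans, emit, ll_history
```

Each iteration is one EM step, so the log-likelihood never decreases, and the re-estimated transition and emission rows remain normalized.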

Casino (V) – Baum-Welch training

[Figure: the casino model re-estimated with Baum-Welch from 300 throws and from 30000 throws, compared with the original model]

Transition probabilities:

| Model | F→F | F→L | L→L | L→F |
|---|---|---|---|---|
| Original model | 0.95 | 0.05 | 0.9 | 0.1 |
| 300 throws | 0.73 | 0.27 | 0.71 | 0.29 |
| 30000 throws | 0.93 | 0.07 | 0.88 | 0.12 |

Emission probabilities:

| Model | 1 | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|---|
| Original, Fair | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 | 1/6 |
| 300 throws, Fair | 0.19 | 0.19 | 0.23 | 0.08 | 0.23 | 0.08 |
| 30000 throws, Fair | 0.17 | 0.17 | 0.17 | 0.17 | 0.17 | 0.15 |
| Original, Loaded | 1/10 | 1/10 | 1/10 | 1/10 | 1/10 | 1/2 |
| 300 throws, Loaded | 0.07 | 0.10 | 0.10 | 0.17 | 0.05 | 0.52 |
| 30000 throws, Loaded | 0.10 | 0.11 | 0.10 | 0.11 | 0.10 | 0.48 |

Numerical stability
• Many expressions contain products of many probabilities
• Computing these products directly causes underflow
• For Viterbi, this is solved by working with logarithms
• For the forward and backward algorithms, we can work with an approximation to the logarithm or with rescaled variables
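The difficulty for the forward and backward algorithms is that they sum probabilities, and sums do not distribute over logarithms; the standard workaround is the log-sum-exp trick, a minimal sketch of which is:

```python
import math

def logsumexp(log_vals):
    """Stable log(sum(exp(v))): factor out the maximum before exponentiating."""
    m = max(log_vals)
    if m == -math.inf:          # all terms are zero probabilities
        return -math.inf
    return m + math.log(sum(math.exp(v - m) for v in log_vals))

# One forward-recursion step then reads, entirely in log space:
# log f_l(i+1) = log e_l(x_{i+1}) + logsumexp([log f_k(i) + log a_kl for k in states])
```

Factoring out the maximum keeps every exponent at or below 0, so the sum never underflows to zero even when the individual probabilities are astronomically small.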
Summary
• Hidden Markov Models
• Computation of sequence and state probabilities
• Viterbi computation of the best state path
• The forward algorithm for the computation of the probability of a sequence
• The backward algorithm for the computation of state probabilities
• Parameter estimation for HMMs
• Parameter estimation with known paths
• Parameter estimation with unknown paths
• Viterbi training
• Baum-Welch training