
Hidden Markov Models

Yves Moreau

Katholieke Universiteit Leuven

Regular expressions
  • Alignment
  • Regular expression
  • Problem: the regular expression does not distinguish between (see the example below the alignment)
    • the exceptional match TGCTAGG
    • the consensus match ACACATC

ACA---ATG
TCAACTATC
ACAC--AGC
AGA---ATC
ACCG--ATC

[AT][CG][AC][ACGT]*A[TG][GC]
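This limitation can be checked directly. A minimal sketch using Python's standard re module (my own example, not part of the slides) shows that both the exceptional and the consensus sequence satisfy the regular expression equally well:

```python
import re

# Regular expression derived from the alignment above
pattern = re.compile(r"[AT][CG][AC][ACGT]*A[TG][GC]")

for seq in ["ACACATC", "TGCTAGG"]:
    # fullmatch: the whole sequence must fit the pattern
    print(seq, "matches" if pattern.fullmatch(seq) else "does not match")

# Both sequences match: the regular expression cannot express that
# ACACATC is much closer to the family than TGCTAGG.
```
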

Hidden Markov Models
  • Sequence score
  • Transition probabilities
  • Emission probabilities

[Figure: profile-like HMM built from the alignment above. Six match states with emission probabilities (1) A .8, T .2; (2) C .8, G .2; (3) A .8, C .2; (4) A 1; (5) T .8, G .2; (6) C .8, G .2, plus an insert state with emissions A .2, C .4, G .2, T .2. Transition probabilities are 1.0 along the chain of match states, except out of state 3: .6 into the insert state and .4 directly to state 4; the insert state loops on itself with probability .4 and moves on to state 4 with probability .6.]
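To make the sequence score concrete, here is a small sketch (my own illustration, assuming the reconstruction above and that both example sequences take the path with a single pass through the insert state). With these numbers the consensus scores about 4.7e-2 and the exceptional sequence about 2.3e-5:

```python
# Emission probabilities of the six match states and the insert state.
match = [
    {"A": 0.8, "T": 0.2},   # state 1
    {"C": 0.8, "G": 0.2},   # state 2
    {"A": 0.8, "C": 0.2},   # state 3
    {"A": 1.0},             # state 4
    {"T": 0.8, "G": 0.2},   # state 5
    {"C": 0.8, "G": 0.2},   # state 6
]
insert = {"A": 0.2, "C": 0.4, "G": 0.2, "T": 0.2}

def score(seq):
    """Probability of a 7-symbol sequence whose 4th symbol goes through the insert state."""
    x = list(seq)
    p = 1.0
    # match states 1-3 (transitions between them are 1.0)
    for i in range(3):
        p *= match[i].get(x[i], 0.0)
    # one pass through the insert state: 0.6 in, 0.6 out
    p *= 0.6 * insert.get(x[3], 0.0) * 0.6
    # match states 4-6 (transitions 1.0)
    for i in range(3):
        p *= match[3 + i].get(x[4 + i], 0.0)
    return p

for seq in ["ACACATC", "TGCTAGG"]:
    print(seq, score(seq))   # about 4.7e-2 and 2.3e-5
```
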
Log odds
  • Use logarithms for scaling and normalize by a random (background) model
  • Log odds for sequence S: sum of the emission log-odds scores and the transition log probabilities along the path (see the formula below the figure)

[Figure: the same HMM annotated with log-odds scores. Emission scores, e.g. A: 1.16 where A has probability .8, A: 1.39 for state 4 (probability 1), C: 0.47 in the insert state, and -0.22 for symbols with probability .2. Transition scores: 0 for the probability-1 transitions, -0.51 for the .6 transitions and -0.92 for the .4 transitions.]
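The log-odds score written out (my reconstruction from the numbers in the figure, using natural logarithms and a uniform background probability of 0.25 per nucleotide):

```latex
\log\text{-}\mathrm{odds}(S) \;=\; \sum_{i}\log\frac{e_{\pi_i}(s_i)}{0.25} \;+\; \sum_{i}\log a_{\pi_i\pi_{i+1}}
```

With the numbers above, the consensus ACACATC scores roughly +6.6 while the exceptional TGCTAGG scores roughly -1.0, so the model separates the two sequences that the regular expression treated identically.
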

Markov chain

[Figure: Markov chain over the four states A, C, G, T, with a transition probability attached to every edge between nucleotides.]

  • Sequence: x = (x1, x2, ..., xL)
  • Example of a Markov chain
    • Probabilistic model of a DNA sequence
  • Transition probabilities a_st = P(xi = t | xi-1 = s)
Markov property
  • The probability of a sequence factorizes through Bayes' rule (see the chain-rule decomposition below)
  • Markov property
    • "The future is a function only of the present, not of the past"
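The decompositions the slide refers to, written out (standard chain rule plus the first-order Markov assumption):

```latex
P(x) = P(x_L \mid x_{L-1},\dots,x_1)\, P(x_{L-1} \mid x_{L-2},\dots,x_1) \cdots P(x_1)

P(x_i \mid x_{i-1},\dots,x_1) = P(x_i \mid x_{i-1})
\quad\Rightarrow\quad
P(x) = P(x_1)\prod_{i=2}^{L} a_{x_{i-1} x_i}
```
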
Beginning and end of a sequence

[Figure: the Markov chain over A, C, G, T extended with a begin state (α) and an end state (ω).]

  • The computation of the probability is not homogeneous (the first symbol has no predecessor)
  • The length distribution is not modeled
    • P(length = L) is left unspecified
  • Solution
    • Model the beginning and the end of the sequence explicitly
  • The probability of observing a sequence of a given length then decreases with the length of the sequence (see below)
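With a begin state α and an end state ω the sequence probability becomes (my sketch of the standard construction):

```latex
P(x) = a_{\alpha x_1}\,\Bigl(\prod_{i=2}^{L} a_{x_{i-1} x_i}\Bigr)\, a_{x_L \omega}
```

Every additional symbol multiplies in one more factor smaller than 1, and the end transition must eventually be taken, so longer sequences receive lower probability (a roughly geometric length distribution).
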
Hidden Markov Models

[Figure: the profile-like HMM from the earlier slide repeated, with its sequence score, transition probabilities and emission probabilities.]
Hidden Markov Model
  • In a hidden Markov model, we observe the symbol sequence x but we want to reconstruct the hidden state sequence (the path π)
  • Transition probabilities a_kl (begin state α: a_0l, end state ω: a_k0)
  • Emission probabilities e_k(b)
  • Joint probability of the sequence x1, ..., xL (flanked by the begin and end states) and the path π (see the formula below)
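The joint probability the last bullet refers to, in the usual notation (with state 0 playing the role of both begin and end state):

```latex
P(x,\pi) \;=\; a_{0\pi_1}\prod_{i=1}^{L} e_{\pi_i}(x_i)\, a_{\pi_i \pi_{i+1}},
\qquad \pi_{L+1} := 0
```
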
Casino (I) – problem setup

[Figure: two-state HMM. The "Fair" state emits 1-6 each with probability 1/6; the "Loaded" state emits 1-5 with probability 1/10 each and 6 with probability 1/2. Transitions: Fair→Fair 0.95, Fair→Loaded 0.05, Loaded→Loaded 0.9, Loaded→Fair 0.1.]

  • The casino mostly uses a fair die but sometimes switches to a loaded die
  • We observe the outcome x of the successive throws but want to know when the die was fair or loaded (the path π) – see the model sketch below
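A minimal sketch of this casino HMM as plain Python data structures (my own encoding, reused by the algorithm sketches further down; the variable names and the uniform start distribution are assumptions, since the slides do not give a_0k):

```python
# Casino HMM: two hidden states, six possible outcomes per throw.
STATES = ["Fair", "Loaded"]

# Transition probabilities a_kl.
TRANS = {
    "Fair":   {"Fair": 0.95, "Loaded": 0.05},
    "Loaded": {"Fair": 0.10, "Loaded": 0.90},
}
START = {"Fair": 0.5, "Loaded": 0.5}   # assumption: uniform start probabilities

# Emission probabilities e_k(b) for outcomes 1..6.
EMIT = {
    "Fair":   {o: 1 / 6 for o in range(1, 7)},
    "Loaded": {**{o: 1 / 10 for o in range(1, 6)}, 6: 1 / 2},
}
```
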
The Viterbi algorithm
  • We look for the most probable path π*
  • This problem can be tackled by dynamic programming
  • Let us define vk(i) as the probability of the most probable path that ends in state k with the emission of symbol xi
  • Then we can compute this probability recursively, as shown below
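The recursion referred to above (the standard Viterbi recursion):

```latex
\pi^{*} \;=\; \arg\max_{\pi} P(x,\pi),
\qquad
v_l(i+1) \;=\; e_l(x_{i+1})\,\max_{k}\bigl(v_k(i)\,a_{kl}\bigr)
```
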
The Viterbi algorithm
  • The Viterbi algorithm grows the best path dynamically
  • Initial condition: the sequence starts in the begin state (v0(0) = 1, vk(0) = 0 for k > 0)
  • Traceback pointers to follow the best path back (= decoding) – see the sketch below
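A minimal log-space Viterbi sketch (my own code; it assumes the STATES, START, TRANS and EMIT dictionaries from the casino sketch above and no explicit end state):

```python
from math import log

def viterbi(obs, states, start, trans, emit):
    """Most probable state path for an observation sequence (log-space Viterbi)."""
    # v[i][k]: log probability of the best path ending in state k at position i
    v = [{k: log(start[k]) + log(emit[k][obs[0]]) for k in states}]
    ptr = [{}]
    for x in obs[1:]:
        row, back = {}, {}
        for l in states:
            # best predecessor k for state l
            k_best = max(states, key=lambda k: v[-1][k] + log(trans[k][l]))
            row[l] = v[-1][k_best] + log(trans[k_best][l]) + log(emit[l][x])
            back[l] = k_best
        v.append(row)
        ptr.append(back)
    # traceback from the best final state
    last = max(states, key=lambda k: v[-1][k])
    path = [last]
    for back in reversed(ptr[1:]):
        path.append(back[path[-1]])
    return list(reversed(path))

# a long run of sixes should be decoded as coming from the loaded die
throws = [1, 2, 6, 6, 6, 6, 6, 6, 6, 6, 3, 4]
print(viterbi(throws, STATES, START, TRANS, EMIT))
```
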
The forward algorithm
  • The forward algorithm lets us compute the probability P(x) of a sequence with respect to an HMM
  • This is important for the computation of posterior probabilities and for the comparison of HMMs
  • The sum over all paths (exponentially many) can be computed by dynamic programming
  • Let us define fk(i) as the probability of the partial sequence x1, ..., xi summed over all paths that end in state k with the emission of symbol xi
  • Then we can compute this probability recursively, as shown below
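The forward recursion, written out:

```latex
f_k(i) \;=\; P(x_1,\dots,x_i,\ \pi_i = k),
\qquad
f_l(i+1) \;=\; e_l(x_{i+1})\sum_{k} f_k(i)\,a_{kl},
\qquad
P(x) \;=\; \sum_{k} f_k(L)\,a_{k0}
```
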
The forward algorithm
  • The forward algorithm grows the total probability dynamically from the beginning to the end of the sequence
  • Initial condition: the sequence starts in the begin state (f0(0) = 1)
  • End: all states converge to the end state (see the sketch below)
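A matching forward-algorithm sketch (same assumptions as the Viterbi sketch; it simply sums over the final states instead of using an explicit end state, and has no rescaling, so it is only suitable for short sequences – see the numerical-stability slide at the end):

```python
def forward(obs, states, start, trans, emit):
    """Total probability P(x) of the observations, summed over all state paths."""
    # f[k]: probability of x_1..x_i over all paths ending in state k
    f = {k: start[k] * emit[k][obs[0]] for k in states}
    for x in obs[1:]:
        f = {l: emit[l][x] * sum(f[k] * trans[k][l] for k in states) for l in states}
    return sum(f.values())   # no explicit end state: sum over final states

print(forward([1, 6, 6, 3, 6, 6], STATES, START, TRANS, EMIT))
```
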
The backward algorithm
  • The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol xi is emitted from state k
  • This is important to compute the probability of a given state at symbol xi
  • P(x1, ..., xi, πi = k) can be computed by the forward algorithm as fk(i)
  • Let us define bk(i) as the probability of the rest of the sequence xi+1, ..., xL for the paths that pass through state k at symbol xi (see the recursion below)
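In symbols:

```latex
b_k(i) \;=\; P(x_{i+1},\dots,x_L \mid \pi_i = k),
\qquad
b_k(i) \;=\; \sum_{l} a_{kl}\, e_l(x_{i+1})\, b_l(i+1),
\qquad
P(x,\ \pi_i = k) \;=\; f_k(i)\, b_k(i)
```
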
The backward algorithm
  • The backward algorithm grows the probability bk(i) dynamically backwards (from the end to the beginning of the sequence)
  • Boundary condition: start in the end state (bk(L) = a_k0)
  • Once both the forward and backward probabilities are available, we can compute the posterior probability of a state (see the sketch below)
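A matching backward sketch together with the posterior state probability it enables (my own code, same casino dictionaries as before; here the boundary condition is bk(L) = 1 because no explicit end state is modeled):

```python
def backward(obs, states, trans, emit):
    """b[i][k] = P(x_{i+1}..x_L | state k at position i)."""
    n = len(obs)
    b = [dict() for _ in range(n)]
    b[n - 1] = {k: 1.0 for k in states}   # boundary condition (no end state)
    for i in range(n - 2, -1, -1):
        b[i] = {k: sum(trans[k][l] * emit[l][obs[i + 1]] * b[i + 1][l] for l in states)
                for k in states}
    return b

def posterior_fair(obs):
    """P(state = Fair | x) at every position, from forward and backward values."""
    # forward values per position (same recursion as the forward() sketch above)
    f = [{k: START[k] * EMIT[k][obs[0]] for k in STATES}]
    for x in obs[1:]:
        f.append({l: EMIT[l][x] * sum(f[-1][k] * TRANS[k][l] for k in STATES)
                  for l in STATES})
    b = backward(obs, STATES, TRANS, EMIT)
    px = sum(f[-1].values())
    return [f[i]["Fair"] * b[i]["Fair"] / px for i in range(len(obs))]

print(posterior_fair([1, 6, 6, 3, 6, 6]))
```
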
Posterior decoding
  • Instead of decoding with the most probable path (Viterbi), we can use the path of the most probable states
  • The resulting path π̂ can be "illegal" (P(π̂ | x) = 0)
  • This approach can also be used when we are interested in a function g(k) of the state (e.g., labeling) – see below
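The two quantities referred to above, written out:

```latex
\hat{\pi}_i \;=\; \arg\max_{k}\, P(\pi_i = k \mid x),
\qquad
G(i \mid x) \;=\; \sum_{k} P(\pi_i = k \mid x)\, g(k)
```
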
Casino (III) – posterior decoding
  • Posterior probability of the state “fair” w.r.t. the die throws
Casino (IV) – posterior decoding

[Figure: the same two-state casino HMM, but with Fair→Fair = 0.99 and Fair→Loaded = 0.01 (Loaded→Loaded 0.9, Loaded→Fair 0.1); the emission probabilities are unchanged.]

  • New situation: P(πi+1 = FAIR | πi = FAIR) = 0.99
  • Viterbi decoding cannot detect the cheating from 1000 throws, while posterior decoding can
Choice of the architecture
  • For parameter estimation, we assume that the architecture of the HMM is known
  • The choice of architecture is an essential design decision
  • Duration modeling (see the note below)
  • "Silent states" for gaps
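One concrete reason duration modeling matters (a standard observation, not from the slides): a single state with self-transition probability p implies a geometric distribution over the number of symbols l it emits,

```latex
P(l \text{ symbols in the state}) \;=\; p^{\,l-1}(1-p),
```

so chains or arrays of states are needed when the real length distribution is not geometric.
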
Parameter estimation with known paths
  • HMM with parameters θ (transition and emission probabilities)
  • Training set D of N sequences x1, ..., xN
  • The score of the model is the likelihood of the parameters given the training data (see below)
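The score written out (assuming independent training sequences):

```latex
\mathrm{Score}(\theta \mid D) \;=\; \log P(D \mid \theta) \;=\; \sum_{j=1}^{N} \log P(x^{j} \mid \theta)
```
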
Parameter estimation with known paths
  • If the state paths are known, the parameters are estimated through counts (how often is a transition used, how often is a symbol emitted by a given state); the estimates are the normalized counts, as shown below
  • Use 'pseudocounts' if necessary
    • Akl = number of transitions from k to l in the training set + pseudocount rkl
    • Ek(b) = number of emissions of b from k in the training set + pseudocount rk(b)
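The normalized-count estimates, written out:

```latex
a_{kl} \;=\; \frac{A_{kl}}{\sum_{l'} A_{kl'}},
\qquad
e_{k}(b) \;=\; \frac{E_{k}(b)}{\sum_{b'} E_{k}(b')}
```
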
Parameter estimation with unknown paths: Viterbi training
  • Strategy: iterative method
    • Suppose that the parameters are known and find the best state paths with Viterbi decoding
    • Re-estimate the parameters from these paths as if they were known
    • Iterate until convergence
  • Viterbi training does not maximize the likelihood of the parameters
  • Viterbi training converges exactly in a finite number of steps
Parameter estimation with unknown paths: Baum-Welch training
  • Strategy: parallel to Viterbi training, but we use the expected values of the transition and emission counts (instead of counting only along the best path)
  • For the transitions and for the emissions, the expected counts are computed from the forward and backward variables, as shown below
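The expected counts the slide refers to (the standard Baum-Welch formulas, summing over training sequences j and positions i):

```latex
A_{kl} \;=\; \sum_{j} \frac{1}{P(x^{j})} \sum_{i} f^{\,j}_{k}(i)\, a_{kl}\, e_{l}(x^{j}_{i+1})\, b^{\,j}_{l}(i+1),
\qquad
E_{k}(b) \;=\; \sum_{j} \frac{1}{P(x^{j})} \sum_{i:\,x^{j}_{i}=b} f^{\,j}_{k}(i)\, b^{\,j}_{k}(i)
```
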
Parameter estimation with unknown paths: Baum-Welch training
  • Initialization: choose arbitrary model parameters
  • Recursion:
    • Set all transition and emission count variables A and E to their pseudocount values
    • For all sequences j = 1, ..., N
      • Compute fk(i) for sequence j with the forward algorithm
      • Compute bk(i) for sequence j with the backward algorithm
      • Add the contributions of sequence j to A and E
    • Compute the new model parameters akl = Akl / Σl' Akl' and ek(b) = Ek(b) / Σb' Ek(b')
    • Compute the log-likelihood of the model
  • End: stop when the log-likelihood changes by less than some threshold or when the maximum number of iterations is exceeded (a compact sketch follows below)
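A compact, unoptimized sketch of these steps for a small discrete HMM (my own illustration: observations are assumed to be encoded as integers 0..n_symbols-1, there is no rescaling, and a fixed number of iterations replaces the log-likelihood stopping rule on the slide):

```python
import random

def baum_welch(seqs, n_states, n_symbols, n_iter=50, pseudo=0.01):
    """Estimate start, transition and emission probabilities from unlabeled sequences."""
    random.seed(0)
    def norm(row):
        s = sum(row)
        return [v / s for v in row]
    # arbitrary initial parameters (rows normalized)
    start = norm([random.random() for _ in range(n_states)])
    trans = [norm([random.random() for _ in range(n_states)]) for _ in range(n_states)]
    emit = [norm([random.random() for _ in range(n_symbols)]) for _ in range(n_states)]
    for _ in range(n_iter):
        # expected counts, initialized to a small pseudocount (cf. the slide)
        A = [[pseudo] * n_states for _ in range(n_states)]
        E = [[pseudo] * n_symbols for _ in range(n_states)]
        S = [pseudo] * n_states
        for x in seqs:
            L = len(x)
            # forward pass
            f = [[start[k] * emit[k][x[0]] for k in range(n_states)]]
            for i in range(1, L):
                f.append([emit[l][x[i]] * sum(f[i - 1][k] * trans[k][l] for k in range(n_states))
                          for l in range(n_states)])
            px = sum(f[L - 1])
            # backward pass
            b = [[1.0] * n_states for _ in range(L)]
            for i in range(L - 2, -1, -1):
                b[i] = [sum(trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l] for l in range(n_states))
                        for k in range(n_states)]
            # add this sequence's contributions to the expected counts
            for k in range(n_states):
                S[k] += f[0][k] * b[0][k] / px
                for i in range(L):
                    E[k][x[i]] += f[i][k] * b[i][k] / px
                for l in range(n_states):
                    A[k][l] += sum(f[i][k] * trans[k][l] * emit[l][x[i + 1]] * b[i + 1][l]
                                   for i in range(L - 1)) / px
        # re-estimate the parameters from the expected counts
        start = norm(S)
        trans = [norm(row) for row in A]
        emit = [norm(row) for row in E]
    return start, trans, emit

# Hypothetical usage: die throws encoded as 0..5 (face value minus one)
# rolls = [[0, 5, 5, 2, 5], [3, 5, 5, 5, 0, 2, 5]]
# print(baum_welch(rolls, n_states=2, n_symbols=6))
```
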
Casino (V) – Baum-Welch training

[Figure: three versions of the casino HMM side by side – the original model (Fair: all outcomes 1/6; Loaded: 1/10 for 1-5 and 1/2 for 6; Fair→Fair 0.95, Loaded→Loaded 0.9) and the models re-estimated by Baum-Welch from 300 throws and from 30000 throws, each shown with its estimated transition and emission probabilities.]

Numerical stability
  • Many expressions contain products of many probabilities
  • This causes underflow when we compute these expressions naively
  • For Viterbi, this can be solved by working with logarithms
  • For the forward and backward algorithms, we can work with an approximation to the logarithm of a sum or with rescaled variables (see the sketch below)
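A minimal sketch of the log-space option (my own example; the "approximation to the logarithm" on the slide is commonly implemented as a log-sum-exp, shown here with numpy for convenience):

```python
import numpy as np

def logsumexp(log_terms):
    """log(sum(exp(t))) computed stably: factor out the largest term first."""
    m = max(log_terms)
    return m + np.log(sum(np.exp(t - m) for t in log_terms))

# The log-space forward recursion then becomes
#   log f_l(i+1) = log e_l(x_{i+1}) + logsumexp_k( log f_k(i) + log a_kl )
# Example: combining log probabilities whose plain values would underflow to 0.0
print(logsumexp([-1000.0, -1001.0]))   # about -999.69
```
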
Summary
  • Hidden Markov Models
  • Computation of sequence and state probabilities
    • Viterbi computation of the best state path
    • The forward algorithm for the computation of the probability of a sequence
    • The backward algorithm for the computation of state probabilities
  • Parameter estimation for HMMs
    • Parameter estimation with known paths
    • Parameter estimation with unknown paths
      • Viterbi training
      • Baum-Welch training