
Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability


Presentation Transcript


  1. Parameter estimation for HMMs, Baum-Welch algorithm, Model topology, Numerical stability Chapter 3.3-3.7 Elze de Groot

  2. Overview • Parameter estimation for HMMs • Baum-Welch algorithm • HMM model structure • More complex Markov chains • Numerical stability of HMM algorithms

  3. Specifying an HMM model • The most difficult problem in using HMMs is specifying the model • Design of the structure • Assignment of parameter values

  5. Parameter estimation for HMMs • Estimate the transition and emission probabilities a_kl and e_k(b) • Two ways of learning: estimation when the state sequence is known, and estimation when the paths are unknown • Assume that we have a set of example sequences (training sequences x1, …, xn)

  6. Parameter estimation for HMMs • Assume that x1, …, xn are independent • Joint probability: P(x1, …, xn | θ) = Π_j P(xj | θ) • In log space: log P(x1, …, xn | θ) = Σ_j log P(xj | θ), since log(ab) = log a + log b

  7. Estimation when the state sequence is known • Easier than estimation when the paths are unknown • A_kl = number of transitions from k to l in the training data, plus a pseudocount r_kl • E_k(b) = number of emissions of b from state k in the training data, plus a pseudocount r_k(b) • The maximum likelihood estimates are the normalized counts: a_kl = A_kl / Σ_l' A_kl' and e_k(b) = E_k(b) / Σ_b' E_k(b') (see the sketch below)
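
A minimal sketch of this counting-and-normalizing estimation, assuming a hypothetical two-state dice model; the state names, alphabet, and pseudocount value of 1 are illustrative choices, not from the slides:

```python
from collections import defaultdict

STATES = ["F", "L"]              # hypothetical fair / loaded dice states
ALPHABET = "123456"
R = 1.0                          # pseudocounts r_kl and r_k(b)

def estimate_known_paths(paths, seqs):
    """ML estimates of a_kl and e_k(b) from sequences with known state paths."""
    A = defaultdict(lambda: R)   # transition counts A_kl, seeded with pseudocounts
    E = defaultdict(lambda: R)   # emission counts E_k(b)
    for path, seq in zip(paths, seqs):
        for k, l in zip(path, path[1:]):
            A[k, l] += 1         # count transition k -> l
        for k, b in zip(path, seq):
            E[k, b] += 1         # count emission of b from state k
    # Normalize the counts into probabilities a_kl and e_k(b)
    a = {(k, l): A[k, l] / sum(A[k, m] for m in STATES)
         for k in STATES for l in STATES}
    e = {(k, b): E[k, b] / sum(E[k, c] for c in ALPHABET)
         for k in STATES for b in ALPHABET}
    return a, e

# Example: one training sequence of rolls with its known state path
a, e = estimate_known_paths(paths=["FFFLL"], seqs=["31462"])
```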

  8. Estimation when the paths are unknown • More complex than when the paths are known • We cannot use the maximum likelihood estimators directly • Instead, an iterative algorithm is used: Baum-Welch

  9. The Baum-Welch algorithm • We don't know the real values of A_kl and E_k(b) • Estimate A_kl and E_k(b) using the current model • Update a_kl and e_k(b) • Repeat with the new model parameters a_kl and e_k(b)

  10. Baum-Welch algorithm • Forward value: f_k(i) = P(x_1 … x_i, π_i = k) • Backward value: b_k(i) = P(x_{i+1} … x_L | π_i = k) • Expected transition counts: A_kl = Σ_j (1/P(x^j)) Σ_i f^j_k(i) a_kl e_l(x^j_{i+1}) b^j_l(i+1) • Expected emission counts: E_k(b) = Σ_j (1/P(x^j)) Σ_{i: x^j_i = b} f^j_k(i) b^j_k(i), summing over all training sequences x^j

  11. Baum-Welch algorithm • Now that we have estimated A_kl and E_k(b), use the maximum likelihood estimators to compute the new a_kl and e_k(b) • Use these values to estimate A_kl and E_k(b) in the next iteration • Continue iterating until the change is very small or a maximum number of iterations is exceeded

  12. Baum-Welch algorithm
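
Combining slides 9 to 12, a compact sketch of one Baum-Welch iteration, assuming numpy arrays a (transitions), e (emissions), pi (initial probabilities) and sequences given as lists of symbol indices; all names are mine, and a real implementation would add the scaling or log transformation of slides 33-36:

```python
import numpy as np

def forward(x, pi, a, e):
    n, L = a.shape[0], len(x)
    f = np.zeros((L, n))
    f[0] = pi * e[:, x[0]]
    for i in range(1, L):
        f[i] = e[:, x[i]] * (f[i - 1] @ a)   # f_l(i) = e_l(x_i) Σ_k f_k(i-1) a_kl
    return f                                 # f[i, k] = P(x_1..i, state_i = k)

def backward(x, a, e):
    n, L = a.shape[0], len(x)
    b = np.zeros((L, n))
    b[-1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = a @ (e[:, x[i + 1]] * b[i + 1])
    return b                                 # b[i, k] = P(x_{i+1}..L | state_i = k)

def baum_welch_step(seqs, pi, a, e):
    A = np.zeros_like(a)                     # expected transition counts A_kl
    E = np.zeros_like(e)                     # expected emission counts E_k(b)
    for x in seqs:
        f, b = forward(x, pi, a, e), backward(x, a, e)
        px = f[-1].sum()                     # P(x) under the current model
        for i in range(len(x) - 1):
            # A_kl += f_k(i) a_kl e_l(x_{i+1}) b_l(i+1) / P(x)
            A += np.outer(f[i], e[:, x[i + 1]] * b[i + 1]) * a / px
        for i, sym in enumerate(x):
            E[:, sym] += f[i] * b[i] / px    # posterior of being in k at i
    # (pseudocounts r_kl and r_k(b) could be added to A and E here)
    return A / A.sum(axis=1, keepdims=True), E / E.sum(axis=1, keepdims=True)
```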

  13. Example • Models estimated from 300 rolls and from 30,000 rolls of the dice

  14. Drawbacks • ML estimators: vulnerable to overfitting if there is not enough data; estimates can be undefined if a state or transition is never used in the training set (hence the use of pseudocounts) • Baum-Welch: many local maxima can be found instead of the global maximum, depending on the starting values of the parameters; this problem gets worse for large HMMs

  15. Viterbi training • The most probable path is derived using the Viterbi algorithm • Re-estimate from these paths and continue until none of the paths change • Finds the value of θ that maximises the contribution of the most probable paths to the likelihood • Generally performs less well than Baum-Welch

  16. Modelling of labelled sequences • Only the -- and ++ transition counts are calculated • Better than using plain ML estimators when many different classes are present

  17. Specifying an HMM model • The most difficult problem in using HMMs is specifying the model • Design of the structure • Assignment of parameter values

  18. Design of the structure • Design: how to connect the states by transitions • A good HMM is based on knowledge about the problem under investigation • Local maxima are the biggest disadvantage of fully connected models • Baum-Welch still works after a transition is deleted from the model: set its probability to zero, and re-estimation will keep it at zero

  19. Example 1 • Geometric distribution • A single state with self-transition probability p and exit probability 1-p • The emitted run length L follows P(L) = p^(L-1) (1-p), a geometric distribution
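
A tiny sketch of this geometric length distribution; the value p = 0.9 is an arbitrary illustration:

```python
# P(L) = p**(L - 1) * (1 - p): stay in the state L-1 times, then leave
p = 0.9
dist = {L: p ** (L - 1) * (1 - p) for L in range(1, 6)}
print(dist)                       # mean run length is 1 / (1 - p) = 10
```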

  20. Example 2 • Model a length distribution between 2 and 10

  21. Example 3 • Negative binomial distribution • n states in a row, each with self-transition probability p, give P(L) = C(L-1, n-1) p^(L-n) (1-p)^n • Shown with p = 0.99 and n ≤ 5
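
A short sketch of this length distribution, using the slide's p = 0.99 and n = 5 as defaults:

```python
from math import comb

# P(L) = C(L-1, n-1) * p**(L-n) * (1-p)**n for a row of n states,
# each with self-transition probability p
def neg_binom_length(L, n=5, p=0.99):
    return comb(L - 1, n - 1) * p ** (L - n) * (1 - p) ** n

print(neg_binom_length(500))      # probability of a run of length 500
```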

  22. Silent states • States that do not emit symbols, such as the begin state B • Silent states can also appear in other places in an HMM

  23. Silent states: example

  24. Silent states • Advantage: fewer transition probabilities need to be estimated • Drawback: limits the possibilities for defining a model

  25. Silent states • Change in the forward algorithm • For 'real' (emitting) states the recursion stays the same • For silent states, set f_l(i) = Σ_k f_k(i) a_kl, summing over the emitting states k • Then, starting from the lowest-numbered silent state l, add f_k(i) a_kl for all silent states k < l (see the sketch below)
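
A sketch of the modified forward update at a single position, under the slide's assumption that silent states are numbered so they can be processed in increasing order (no silent cycles); the data layout (integer state ids, nested dicts a[k][l]) is my own choice:

```python
def forward_column(f_prev, a, e, x_i, emitting, silent):
    f = {}
    # Emitting ('real') states: the usual forward recursion over the
    # previous column, which already includes silent-state values
    for l in emitting:
        f[l] = e[l][x_i] * sum(f_prev[k] * a[k].get(l, 0.0) for k in f_prev)
    # Silent states, lowest-numbered first: collect from emitting states
    # in this column, then add contributions from silent states k < l
    for l in sorted(silent):
        f[l] = sum(f[k] * a[k].get(l, 0.0) for k in emitting)
        f[l] += sum(f[k] * a[k].get(l, 0.0) for k in silent if k < l)
    return f
```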

  26. More complex Markov chains • So far we assumed that the probability of a symbol in a sequence depends only on the previous symbol • More complex variants: high order Markov chains, and inhomogeneous Markov chains

  27. High order Markov chains • An nth order Markov process: the probability of a symbol in a sequence depends on the n previous symbols • An nth order Markov chain over some alphabet A is equivalent to a first order Markov chain over the alphabet A^n of n-tuples, because (for n = 2) P((x_{i-1}, x_i) | (x_{i-2}, x_{i-1})) = P(x_i | x_{i-2}, x_{i-1})

  28. Example • A second order Markov chain with two different symbols {A, B} can be translated into a first order Markov chain of 2-tuples {AA, AB, BA, BB} (see the sketch below) • Sometimes the framework of a higher order model is more convenient
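
A sketch of this translation; the second order probabilities are made-up numbers:

```python
# 2nd-order conditional probabilities P(z | x, y); values are illustrative
p2 = {("A", "A"): {"A": 0.7, "B": 0.3},
      ("A", "B"): {"A": 0.4, "B": 0.6},
      ("B", "A"): {"A": 0.5, "B": 0.5},
      ("B", "B"): {"A": 0.2, "B": 0.8}}

# First-order transitions between 2-tuples: (x, y) -> (y, z) gets the
# 2nd-order probability of z given (x, y); every other transition is 0
a1 = {(x, y): {(y, z): p2[(x, y)][z] for z in "AB"}
      for x in "AB" for y in "AB"}

print(a1[("A", "B")])   # {('B', 'A'): 0.4, ('B', 'B'): 0.6}
```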

  29. Finding prokaryotic genes • Gene candidates in DNA: a sequence of triplets of nucleotides, i.e. a start codon, a number of non-stop codons, and a stop codon: an open reading frame (ORF) • An ORF can be either a gene or a non-coding ORF (NORF)

  30. Finding prokaryotic genes • Experiment: DNA from the bacterium E. coli • The dataset contains 1100 genes (900 used for training, 200 for testing) • Two models: a normal model with first order Markov chains over nucleotides, and a first order Markov chain that uses codons instead of nucleotides as symbols

  31. Finding prokaryotic genes • Outcomes (results figure not included in the transcript)

  32. Inhomogeneous Markov chains • Use the position information within the codon: three separate models for codon positions 1, 2 and 3 • Example, for the sequence CATGCA (codon positions 1 2 3 1 2 3): • Homogeneous: P(C) a_CA a_AT a_TG a_GC a_CA • Inhomogeneous: P(C) a2_CA a3_AT a1_TG a2_GC a3_CA, where ak is the transition matrix used when the target symbol is at codon position k
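
A sketch of the inhomogeneous factorization; the layout of the three position-specific matrices (a dict a[k] of nested dicts) is my own choice:

```python
# a[k] is the transition matrix used when the *target* symbol sits at
# codon position k; p0 is the initial symbol distribution
def prob_inhomogeneous(seq, p0, a):
    p = p0[seq[0]]                     # seq[0] is at codon position 1
    for i in range(1, len(seq)):
        k = i % 3 + 1                  # codon position of seq[i]: 2, 3, 1, ...
        p *= a[k][seq[i - 1]][seq[i]]
    return p

# For "CATGCA" this computes P(C) a2[C][A] a3[A][T] a1[T][G] a2[G][C] a3[C][A],
# matching the slide's inhomogeneous factorization
```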

  33. Numerical Stability of HMM algorithms • Multiplying many probabilities can cause numerical problems: • Underflow errors • Wrong numbers are calculated • Solutions: • Log transformation • Scaling of probabilities

  34. The log transformation • Compute with log probabilities • For example, log(10^-100000) = -100000, which is well within floating-point range • The underflow problem is essentially solved • A sum operation is often faster than a product operation • In the Viterbi algorithm the recursion becomes: V_l(i+1) = log e_l(x_{i+1}) + max_k (V_k(i) + log a_kl)
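
A sketch of the Viterbi recursion in log space, assuming the log parameters log_pi, log_a, log_e are precomputed as numpy arrays (with -inf for zero-probability transitions); the names are mine:

```python
import numpy as np

def viterbi_log(x, log_pi, log_a, log_e):
    n, L = log_a.shape[0], len(x)
    V = np.full((L, n), -np.inf)
    ptr = np.zeros((L, n), dtype=int)
    V[0] = log_pi + log_e[:, x[0]]
    for i in range(1, L):
        scores = V[i - 1][:, None] + log_a    # scores[k, l] = V_k(i-1) + log a_kl
        ptr[i] = scores.argmax(axis=0)        # best predecessor k for each l
        V[i] = log_e[:, x[i]] + scores.max(axis=0)
    # Traceback of the most probable path
    path = [int(V[-1].argmax())]
    for i in range(L - 1, 0, -1):
        path.append(int(ptr[i][path[-1]]))
    return path[::-1], float(V[-1].max())     # path and its log probability
```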

  35. Scaling of probabilities • Scale the f and b variables • Forward variable: for each position i a scaling value s_i is defined • New f variables: f~_l(i) = f_l(i) / (s_1 · … · s_i) • New forward recursion: f~_l(i+1) = (1/s_{i+1}) e_l(x_{i+1}) Σ_k f~_k(i) a_kl • A common choice is to set s_i so that each scaled column sums to one; then P(x) = Π_i s_i

  36. Scaling of probabilities • Backward variable: the scaling has to use the same values s_i as for the forward variable • New backward recursion: b~_k(i) = (1/s_{i+1}) Σ_l a_kl e_l(x_{i+1}) b~_l(i+1) • This normally works well; however, underflow errors can still occur in models with many silent states (chapter 5)
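
A sketch of the scaled forward and backward recursions with the column-sum choice of s_i; the names are mine:

```python
import numpy as np

def scaled_forward_backward(x, pi, a, e):
    n, L = a.shape[0], len(x)
    f = np.zeros((L, n)); b = np.zeros((L, n)); s = np.zeros(L)
    col = pi * e[:, x[0]]
    s[0] = col.sum(); f[0] = col / s[0]       # each scaled column sums to 1
    for i in range(1, L):
        col = e[:, x[i]] * (f[i - 1] @ a)
        s[i] = col.sum(); f[i] = col / s[i]
    # Backward must be scaled with the SAME s_i, as the slide notes
    b[-1] = 1.0
    for i in range(L - 2, -1, -1):
        b[i] = (a @ (e[:, x[i + 1]] * b[i + 1])) / s[i + 1]
    log_px = np.log(s).sum()                  # log P(x), safe from underflow
    return f, b, log_px
```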

  37. Summary • Hidden Markov Models • Parameter estimation • State sequence known • State sequence unknown • Model structure • Silent states • More complex Markov chains • Numerical stability
