
Expectation-Maximization for HMMs and Motif Discovery

An overview of the Expectation Maximization (EM) algorithm for Hidden Markov Models (HMMs) and motif discovery. Includes interpretation of the Baum-Welch algorithm, optimization strategies, convergence, why EM is used, and generalized EM.


Presentation Transcript


  1. Expectation-Maximization for HMMs and Motif Discovery (Yves Moreau)

  2. Overview • The general Expectation-Maximization algorithm • EM interpretation of the Baum-Welch algorithm for learning HMMs • MEME for motif discovery

  3. The general EM algorithm

  4. EM algorithm • Maximum likelihood estimation • Let us assume we have an algorithm that tries to optimize the likelihood • Let us look at the change in likelihood between two iterations of the algorithm

  5. EM algorithm • The likelihood is sometimes difficult to compute • We use a simpler generative model based on unobserved data (data augmentation) • We try to integrate out the unobserved data • The expectation can be an integral or a sum

  6. EM algorithm • Without loss of generality, we work with a sum • Problem: the expression contains the logarithm of a sum • Jensen’s inequality

  7. EM algorithm • Application of Jensen’s inequality • This gives a lower bound for the variation of the likelihood

  8. EM algorithm • Let us try to maximize the (lower bound on the) variation of the likelihood • The terms that do not depend on θ can be dropped, leaving the function Q(θ | θ^t) to be maximized (see the derivation below)
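For reference, the derivation sketched on slides 4-8 can be written out explicitly (using D for the data, m for the unobserved data, θ for the parameters and θ^t for the current estimate):

$$\ln P(D \mid \theta) - \ln P(D \mid \theta^t) = \ln \sum_m P(m \mid D, \theta^t)\,\frac{P(D, m \mid \theta)}{P(m \mid D, \theta^t)\,P(D \mid \theta^t)} \;\ge\; \sum_m P(m \mid D, \theta^t)\,\ln \frac{P(D, m \mid \theta)}{P(m \mid D, \theta^t)\,P(D \mid \theta^t)}$$

by Jensen's inequality. The only θ-dependent part of the right-hand side is

$$Q(\theta \mid \theta^t) = \sum_m P(m \mid D, \theta^t)\,\ln P(D, m \mid \theta),$$

so maximizing Q over θ maximizes the lower bound on the increase in log-likelihood.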

  9. Optimization strategy of EM

  10. EM for independent records • If the data set consists of N independent records, we can introduce independent unobserved data for each record • The expectation step (including the use of Jensen’s inequality) then takes place “inside” the summation over the records, as written out below
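Concretely, for N independent records D_1, ..., D_N with unobserved data m_1, ..., m_N, the Q function decomposes into a sum of per-record expectations:

$$Q(\theta \mid \theta^t) = \sum_{n=1}^{N} \sum_{m_n} P(m_n \mid D_n, \theta^t)\,\ln P(D_n, m_n \mid \theta).$$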

  11. Convergence of the EM algorithm • The likelihood increases monotonically • An equilibrium point θ* of EM maximizes the lower bound ln P(D | θ*) + Δ(θ, θ*) as a function of θ • Because this bound is tangent to the log-likelihood at θ*, θ* must be a stationary point of the likelihood • There is no guarantee of a global optimum (EM often ends in a local maximum) • In some cases the stationary point is not even a maximum (it can be a saddle point)

  12. Why EM? • EM serves to find a maximum likelihood solution • This could also be achieved by gradient ascent (or gradient descent on the negative log-likelihood) • But computing the gradient of the likelihood P(D | θ) is often difficult • By introducing the unobserved data, EM replaces this computation with an Expectation step that is usually much easier

  13. Generalized EM • It is not absolutely necessary to fully maximize Q(θ | θ^t) at the Maximization step • If each iteration merely ensures Q(θ_{i+1} | θ_i) ≥ Q(θ_i | θ_i), convergence is still obtained • This is the generalized EM algorithm • It is used when the Q function produced by the Expectation step is too complex to maximize directly (a schematic loop is sketched below)
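A schematic view of the (generalized) EM loop in code; this is a minimal sketch rather than the course's implementation, and `e_step`, `improve_q`, and `log_likelihood` are hypothetical callables supplied by the model:

```python
# Schematic (generalized) EM loop -- a minimal sketch, not tied to any specific model.
# e_step computes the posterior over the unobserved data given the current parameters;
# improve_q returns any theta that does not decrease Q (full maximization = ordinary EM).

def generalized_em(data, theta, e_step, improve_q, log_likelihood, tol=1e-6, max_iter=200):
    prev_ll = log_likelihood(data, theta)
    for _ in range(max_iter):
        posterior = e_step(data, theta)            # E-step: P(missing | data, theta)
        theta = improve_q(data, posterior, theta)  # (G)M-step: increase Q(theta | theta_old)
        ll = log_likelihood(data, theta)
        if ll - prev_ll < tol:                     # likelihood increases monotonically
            break
        prev_ll = ll
    return theta
```

Ordinary EM is the special case where `improve_q` fully maximizes Q; generalized EM only requires that it does not decrease it.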

  14. Baum-Welch algorithm

  15. Hidden Markov Models [Figure: an example HMM used to score a DNA sequence, showing per-state emission probabilities over A, C, G, T and transition probabilities between the states.]

  16. Hidden Markov Model • In a hidden Markov model, we observe the symbol sequence x but we want to reconstruct the hidden state sequence (the path π) • Transition probabilities a_kl (with a_0l for transitions from the begin state and a_k0 for transitions to the end state) • Emission probabilities e_k(b) • Joint probability P(x, π) of the sequence x_1, ..., x_L and the path, including the begin and end states (see the factorization below)
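In this notation the joint probability is the usual HMM factorization (with the convention that π_0 = 0 is the begin state and π_{L+1} = 0 the end state):

$$P(x, \pi) = a_{0 \pi_1}\,\prod_{i=1}^{L} e_{\pi_i}(x_i)\,a_{\pi_i \pi_{i+1}}.$$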

  17. The forward algorithm • The forward algorithm lets us compute the probability P(x) of a sequence under an HMM • This is important for computing posterior probabilities and for comparing HMMs • The sum over all paths (exponentially many) can be computed by dynamic programming • Let us define f_k(i) as the probability of the sequence prefix x_1, ..., x_i over all paths that end in state k with the emission of symbol x_i • Then we can compute this probability with the recursion sketched below
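A minimal sketch of this recursion in code, assuming K states indexed 0..K-1, a begin distribution `a0`, a transition matrix `a`, per-state emission probabilities `e`, and no explicit end state; the names are illustrative rather than taken from the slides:

```python
import numpy as np

def forward(x, a0, a, e):
    """Forward algorithm: f[k, i] = P(x_1..x_i, pi_i = k).

    x  : sequence of symbol indices, length L
    a0 : (K,) begin transition probabilities a_{0k}
    a  : (K, K) transition matrix a_{kl}
    e  : (K, S) emission probabilities e_k(b)
    Returns the forward matrix f and the total probability P(x).
    """
    K, L = a.shape[0], len(x)
    f = np.zeros((K, L))
    f[:, 0] = a0 * e[:, x[0]]                  # initialization
    for i in range(1, L):
        # f_l(i) = e_l(x_i) * sum_k f_k(i-1) * a_{kl}
        f[:, i] = e[:, x[i]] * (f[:, i - 1] @ a)
    px = f[:, -1].sum()                        # P(x), no explicit end state here
    return f, px
```

In practice the recursion is carried out in log space or with per-column rescaling to avoid numerical underflow on long sequences.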

  18. The backward algorithm • The backward algorithm lets us compute the probability of the complete sequence together with the condition that symbol x_i is emitted from state k • This is important for computing the probability of a given state at symbol x_i • P(x_1, ..., x_i, π_i = k) is given by the forward algorithm as f_k(i) • Let us define b_k(i) as the probability of the rest of the sequence over the paths that pass through state k at symbol x_i; the recursion is given below
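In the same notation, b_k(i) = P(x_{i+1}, ..., x_L | π_i = k), computed by the backward recursion, and combining it with the forward values gives the posterior state probabilities:

$$b_k(i) = \sum_l a_{kl}\, e_l(x_{i+1})\, b_l(i+1), \qquad P(\pi_i = k \mid x) = \frac{f_k(i)\, b_k(i)}{P(x)}.$$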

  19. EM interpretation of Baum-Welch • We want to estimate the parameters of the hidden Markov model (transition and emission probabilities) that maximize the likelihood of the sequence(s) • Unobserved data = the paths π • EM algorithm

  20. EM interpretation of Baum-Welch • Let us work out the function Q further • The generative model gives the joint probability of the sequence and the path • Define the number of times that a given transition probability gets used in a given path • Define the number of times that a given emission is observed for a given sequence and a given path

  21. EM interpretation of Baum-Welch • The joint probability of the sequence and the path can be written in terms of these counts • Taking the logarithm, the function Q becomes a sum of a transition term (the A term) and an emission term (the E term)

  22. EM interpretation of Baum-Welch • Define A_kl, the expected number of times that a transition gets used (averaging over all paths) • Define E_k(b), the expected number of times that an emission is observed (averaging over all paths)
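One standard way to write these expected counts, using the forward and backward variables and summing over the training sequences x^j, is

$$A_{kl} = \sum_j \frac{1}{P(x^j)} \sum_i f_k^j(i)\, a_{kl}\, e_l(x^j_{i+1})\, b_l^j(i+1), \qquad E_k(b) = \sum_j \frac{1}{P(x^j)} \sum_{\{i \,:\, x^j_i = b\}} f_k^j(i)\, b_k^j(i).$$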

  23. EM interpretation of Baum-Welch • For the function Q, we obtain a sum over paths of the transition and emission counts weighted by their log-probabilities • Since P(x, π | θ^t) does not depend on k, l, or b, we can reorder the sums and use the definitions of A_kl and E_k(b) • Let us now maximize Q with respect to θ, that is, with respect to a_kl and e_k(b)

  24. EM interpretation of Baum-Welch • Let us look at the A term • Let us define the following candidate for the optimum • Compare with other parameter choices

  25. EM interpretation of Baum-Welch • The previous sum has the form of a relative entropy and is therefore always nonnegative • Our candidate thus maximizes the A term • The same procedure applies to the E term

  26. EM interpretation of Baum-Welch • Baum-Welch • Expectation step • Compute the expected number of times that a transition gets used • Compute the expected number of times that an emission is observed • Use the forward and backward algorithm for this • Maximization step • Update the parameters with the normalized counts

  27. EM interpretation of Baum-Welch • For the transitions and for the emissions, the new parameters are the normalized expected counts (written out below)
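Written out, the maximization step simply normalizes the expected counts:

$$\hat a_{kl} = \frac{A_{kl}}{\sum_{l'} A_{kl'}}, \qquad \hat e_k(b) = \frac{E_k(b)}{\sum_{b'} E_k(b')}.$$

In practice pseudocounts are usually added to A_kl and E_k(b) to avoid zero probabilities.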

  28. Motif finding

  29. Combinatorial control • Complex integration of multiple cis-regulatory signals controls gene activity

  30. Sequence model: one occurrence per sequence

  31-38. Iterative motif discovery (built up step by step over several slides) • Initialization: the input sequences and a random motif matrix • Iteration: score the sequences, then update the alignment (motif instances) and the motif matrix • Termination: convergence of the alignment and of the motif matrix


  39. Multiple EM for Motif Elicitation (MEME)

  40. MEME • Expectation-Maximization • Data = set of independent sequences • Likelihood = “one occurrence per sequence” model • Parameters = motif matrix (+ background model) • Missing data = alignment
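Under the one-occurrence-per-sequence model with a uniform prior on the start position, the missing alignment for sequence x^i is a variable Z_ij indicating that the motif starts at position j, and its expectation is

$$Z_{ij} = \frac{P(x^i \mid Z_{ij} = 1, \theta^t)}{\sum_{j'} P(x^i \mid Z_{ij'} = 1, \theta^t)},$$

where the numerator scores positions j, ..., j+W-1 with the motif matrix and all other positions with the background model (W being the motif width).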

  41. MEME • Sequence scoring (per sequence) • Uniform prior • Sequence scoring for uniform prior

  42. MEME • Expectation • Maximization, intuitively • If we had only one alignment: • Background model: observed frequencies at the background positions • Motif matrix: observed frequencies at the aligned positions • Here: a sum over all possible alignments (computed independently for each sequence) • Weighted sum over alignments, as sketched below
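A compact sketch of one such EM iteration, assuming `seqs` are sequences of symbol indices (A, C, G, T mapped to 0..3), `motif` is a W x 4 probability matrix and `bg` a background distribution; the names and the fixed pseudocount are illustrative, and the background update is omitted for brevity:

```python
import numpy as np

def meme_em_step(seqs, motif, bg, pseudo=1e-3):
    """One EM iteration of the 'one occurrence per sequence' motif model."""
    W, S = motif.shape
    counts = np.full((W, S), pseudo)               # weighted motif counts (with pseudocounts)
    for x in seqs:
        starts = len(x) - W + 1
        # E-step: posterior z_j over start positions (uniform prior; background factors
        # cancel outside the window, so each window is scored by motif/background odds).
        scores = np.array([
            np.prod([motif[w, x[j + w]] / bg[x[j + w]] for w in range(W)])
            for j in range(starts)
        ])
        z = scores / scores.sum()
        # M-step contribution: add each window's symbols, weighted by its posterior.
        for j, zj in enumerate(z):
            for w in range(W):
                counts[w, x[j + w]] += zj
    return counts / counts.sum(axis=1, keepdims=True)
```

Iterating this step until the motif matrix stops changing corresponds to the loop on the "Iterative motif discovery" slides.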

  43. Summary • The abstract Expectation-Maximization algorithm • EM interpretation of Baum-Welch training for HMMs • EM for motif finding • MEME
