
Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment


Presentation Transcript


  1. Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment
  Lawrence, Altschul, Boguski, Liu, Neuwald, Wootton. Science, 1993

  2. Motif Finding Problem
  • Biological description: given a set of sequences, find the motif shared by all or most of the sequences, where the motif's starting position in each sequence is unknown
  • Assumptions:
    • Each motif appears exactly once in each sequence
    • The motif has a fixed length

  3. Generative Model
  • Suppose the sequences are aligned; the aligned regions are generated from a motif model (a position-specific multinomial distribution)
  • The unaligned regions are generated from a background model (a single multinomial distribution); see the sketch below
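
Below is a minimal Python sketch of this generative process, not code from the paper: the DNA alphabet, the motif width, and the probability tables (motif_probs, background) are all invented for illustration.

    import random

    ALPHABET = "ACGT"
    W = 3                                   # motif width (arbitrary)
    # position-specific multinomial: motif_probs[i][j] = qij
    motif_probs = [[0.7, 0.1, 0.1, 0.1],    # position 1 favours A
                   [0.1, 0.7, 0.1, 0.1],    # position 2 favours C
                   [0.1, 0.1, 0.1, 0.7]]    # position 3 favours T
    background = [0.25, 0.25, 0.25, 0.25]   # pj

    def sample_sequence(length=20):
        # background letters everywhere, then one motif occurrence implanted
        seq = [random.choices(ALPHABET, weights=background)[0] for _ in range(length)]
        a = random.randrange(length - W + 1)          # unknown start position
        for i in range(W):
            seq[a + i] = random.choices(ALPHABET, weights=motif_probs[i])[0]
        return "".join(seq), a

    seqs_and_starts = [sample_sequence() for _ in range(5)]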

  4. Notation
  • Set of symbols: the alphabet of residues (e.g. 4 nucleotides or 20 amino acids)
  • Sequences: S = {S1, S2, …, SN}
  • Starting positions of motifs: A = {a1, a2, …, aN}
  • Motif model (θ): qij = P(symbol at the i-th motif position = j)
  • Background model: pj = P(symbol = j)
  • Counts: cij = count of symbol j in the i-th column of the aligned regions
  A toy instance of this notation is sketched below.
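
To make the notation concrete, here is a hypothetical toy instance (all values invented):

    S = ["ACGTAC", "TTACGT", "ACGGGT"]   # N = 3 sequences
    A = [0, 2, 0]                        # az: motif start in each sequence
    W = 3                                # motif width

    # aligned regions: the W symbols starting at az in each sequence z
    aligned = [s[a:a + W] for s, a in zip(S, A)]        # ['ACG', 'ACG', 'ACG']

    # cij: count of symbol j in the i-th column of the aligned regions
    ALPHABET = "ACGT"
    c = [[col.count(j) for j in ALPHABET] for col in zip(*aligned)]
    # -> [[3, 0, 0, 0], [0, 3, 0, 0], [0, 0, 3, 0]]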

  5. Motif Finding Problem
  • Problem: find the starting positions and model parameters simultaneously to maximize the posterior probability P(A, θ | S)
  • By Bayes' theorem, P(A, θ | S) ∝ P(S | A, θ) · P(A, θ), so under a uniform prior on (A, θ) this is equivalent to maximizing the likelihood P(S | A, θ)

  6. Equivalent Scoring Function
  • Maximize the log-odds ratio F = Σi Σj cij · log(qij / pj), where i ranges over motif positions and j over symbols
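
A small Python sketch of this scoring function, with counts, q, and p laid out as in the toy notation above (the function name log_odds_score is made up):

    import math

    def log_odds_score(counts, q, p):
        # F = sum over positions i and symbols j of cij * log(qij / pj)
        return sum(c_ij * math.log(q[i][j] / p[j])
                   for i, row in enumerate(counts)
                   for j, c_ij in enumerate(row))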

  7. Sampling and Optimization
  • To maximize a function f(x):
    • Brute-force method: try all possible x
    • Sampling method: draw x from a probability distribution p(x) ∝ f(x)
  • Idea: if xmax is the argmax of f(x), then it is also the argmax of p(x), so we have a high probability of drawing xmax (see the sketch below)
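
A quick illustration of the sampling idea on an invented toy objective (f and its domain are arbitrary):

    import random

    def f(x):                                  # toy objective to maximize
        return (x - 3) ** 2 + 1

    domain = range(10)
    weights = [f(x) for x in domain]           # p(x) prop. to f(x); no need to normalize
    samples = random.choices(domain, weights=weights, k=1000)
    # the argmax of f carries the most probability, so it is drawn most often
    best = max(set(samples), key=samples.count)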

  8. Detailed Balance of the Markov Chain
  • To sample from a probability distribution p(x), we set up a Markov chain such that each state represents a value of x and, for any two states x and y, the transition probabilities satisfy detailed balance: p(x) · T(x→y) = p(y) · T(y→x)
  • This implies that p(x) is the stationary distribution of the chain, so running the chain long enough yields samples from p(x)
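
A numerical sanity check of detailed balance on a 3-state toy chain; the transition matrix here is built with the Metropolis rule, one standard construction that satisfies the condition (the slide does not prescribe a particular construction):

    p = [0.2, 0.3, 0.5]                       # target distribution (arbitrary)
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                # propose y uniformly, accept with probability min(1, p(y)/p(x))
                T[x][y] = (1.0 / (n - 1)) * min(1.0, p[y] / p[x])
        T[x][x] = 1.0 - sum(T[x])             # remaining mass: stay at x

    for x in range(n):
        for y in range(n):
            assert abs(p[x] * T[x][y] - p[y] * T[y][x]) < 1e-12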

  9. Gibbs Sampling
  • Idea: a joint distribution may be hard to sample from directly, but it may be easy to sample from the conditional distributions in which all variables are fixed except one
  • To sample from p(x1, x2, …, xn), let each state of the Markov chain represent a full assignment (x1, x2, …, xn); the chain moves by redrawing a single coordinate xi from p(xi | x1, …, xi-1, xi+1, …, xn). Detailed balance is then satisfied.
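
A minimal Gibbs sampler for a made-up two-variable discrete joint distribution; note that only the one-variable conditionals are ever sampled:

    import random

    joint = {(0, 0): 0.1, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.2}

    def sample_conditional(fixed_idx, fixed_val):
        # draw the free variable given that variable fixed_idx equals fixed_val
        if fixed_idx == 0:
            weights = [joint[(fixed_val, v)] for v in (0, 1)]   # x2 | x1
        else:
            weights = [joint[(v, fixed_val)] for v in (0, 1)]   # x1 | x2
        return random.choices((0, 1), weights=weights)[0]

    x1, x2 = 0, 0                        # arbitrary start state
    samples = []
    for _ in range(10000):
        x1 = sample_conditional(1, x2)   # redraw x1 | x2
        x2 = sample_conditional(0, x1)   # redraw x2 | x1
        samples.append((x1, x2))
    # the empirical frequencies of samples approach the joint table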

  10. Gibbs Sampling (illustration)

  11. Gibbs Sampling in Motif Finding
  • We would like to sample from the joint distribution of A and θ, but θ is high-dimensional, so this is inefficient
  • Instead, we draw samples from p(az | A*, S), where A* is the set of starting positions of all sequences except z
  • It can be shown that computing this conditional probability is roughly equivalent to: estimate θ from A*, then compute the probability of generating the motif region at az according to this θ

  12. Estimator of θ
  • Given an alignment A, i.e. the starting positions of the motifs, θ can be estimated by its MLE with smoothing (equivalently, under a Dirichlet prior with parameters bj):
  qij = (cij + bj) / (N - 1 + B), where B = Σj bj and the counts cij are taken over the N - 1 aligned regions of A*
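
A sketch of this estimator in Python; the helper name estimate_theta is made up, and uniform pseudocounts are assumed when none are supplied:

    ALPHABET = "ACGT"

    def estimate_theta(aligned, b=None):
        # qij = (cij + bj) / (N - 1 + B); `aligned` holds the N - 1 motif
        # regions of the sequences other than z
        if b is None:
            b = [1.0] * len(ALPHABET)         # assumed uniform pseudocounts
        B = sum(b)
        n = len(aligned)                      # N - 1 regions
        return [[(col.count(sym) + b[k]) / (n + B)
                 for k, sym in enumerate(ALPHABET)]
                for col in zip(*aligned)]     # one row per motif column i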

  13. Algorithm
  Randomly initialize A0
  Repeat:
    (1) Randomly choose a sequence z from S; set A* = At \ {az}; compute θt, the estimator of θ given S and A*
    (2) Sample az according to P(az = x) ∝ Qx / Px; update At+1 = A* ∪ {x}
  Return the At that maximizes F
  • Qx: the probability of generating x according to θt
  • Px: the probability of generating x according to the background model
  A runnable sketch of the whole loop follows.
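
Putting the pieces together, here is a compact, illustrative Python sketch of the whole loop; it assumes DNA sequences, uniform pseudocounts, and background frequencies estimated from the data, and it is not the authors' implementation:

    import math
    import random

    ALPHABET = "ACGT"

    def motif_gibbs(S, W, iters=2000, b=1.0):
        N = len(S)
        total = sum(len(s) for s in S)
        p = {j: sum(s.count(j) for s in S) / total for j in ALPHABET}
        A = [random.randrange(len(s) - W + 1) for s in S]   # random A0

        def theta_from(regions):
            # smoothed estimator qij = (cij + b) / (len(regions) + B)
            B = b * len(ALPHABET)
            return [{j: (col.count(j) + b) / (len(regions) + B) for j in ALPHABET}
                    for col in zip(*regions)]

        def score(A_cur):
            # F = sum_ij cij log(qij / pj), written per motif occurrence
            q = theta_from([s[a:a + W] for s, a in zip(S, A_cur)])
            return sum(math.log(q[i][s[a + i]] / p[s[a + i]])
                       for s, a in zip(S, A_cur) for i in range(W))

        best_A, best_F = list(A), score(A)
        for _ in range(iters):
            z = random.randrange(N)                         # (1) pick sequence z
            regions = [s[a:a + W]                           # A* = At \ {az}
                       for k, (s, a) in enumerate(zip(S, A)) if k != z]
            q = theta_from(regions)                         # theta_t from A*
            weights = []                                    # (2) P(az = x) prop. to Qx / Px
            for x in range(len(S[z]) - W + 1):
                site = S[z][x:x + W]
                Qx = math.prod(q[i][site[i]] for i in range(W))
                Px = math.prod(p[ch] for ch in site)
                weights.append(Qx / Px)
            A[z] = random.choices(range(len(weights)), weights=weights)[0]
            F = score(A)
            if F > best_F:                                  # keep the best At seen
                best_A, best_F = list(A), F
        return best_A, best_F

Estimating θt from A* (i.e. excluding sequence z) and sampling az rather than greedily maximizing it are the two choices that let the chain explore and escape locally optimal alignments.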
