Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment

Lawrence, Altschul, Boguski, Liu, Neuwald, Wootton, Science, 1993


Motif Finding Problem

  • Biological description: given a set of sequences, find the motif shared by all or most of them; the motif's starting position in each sequence is unknown

  • Assumption:

    • The motif appears exactly once in each sequence

    • The motif has fixed length


Generative Model

  • Suppose the sequences are aligned; the aligned regions are generated from a motif model (a position-specific multinomial distribution)

  • The unaligned regions are generated from a background model
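This generative model can be sketched in a few lines of Python. It is a toy illustration, not the paper's code: the DNA alphabet, motif width, and the specific multinomial values are all assumed for the example.

```python
import random

random.seed(0)

ALPHABET = "ACGT"
MOTIF_LEN = 4

# Hypothetical position-specific multinomial: motif_model[i][j] = P(symbol j at motif position i)
motif_model = [
    {"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.85, "G": 0.05, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.85, "T": 0.05},
    {"A": 0.05, "C": 0.05, "G": 0.05, "T": 0.85},
]
# Assumed uniform background model: p[j] = P(symbol j)
background = {s: 0.25 for s in ALPHABET}

def draw(dist):
    symbols, weights = zip(*dist.items())
    return random.choices(symbols, weights=weights)[0]

def generate_sequence(length, motif_start):
    """Background everywhere except the motif window, which uses the motif model."""
    seq = []
    for pos in range(length):
        if motif_start <= pos < motif_start + MOTIF_LEN:
            seq.append(draw(motif_model[pos - motif_start]))
        else:
            seq.append(draw(background))
    return "".join(seq)

seq = generate_sequence(20, motif_start=8)
print(seq)  # a motif instance (most likely "ACGT") embedded at position 8
```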


Notations

  • Set of symbols (the alphabet): e.g. {A, C, G, T} for DNA, or the 20 amino acids for proteins

  • Sequences: S = {S1, S2, …, SN}

  • Starting positions of motifs: A = {a1, a2, …, aN}

  • Motif model (θ): qij = P(symbol at the i-th position = j)

  • Background model: pj = P(symbol = j)

  • Count of symbols in each column: cij = count of symbol j in the i-th column of the aligned region


Motif Finding Problem

  • Problem: find the starting positions A and the model parameters θ simultaneously, so as to maximize the posterior probability P(θ, A | S)

  • By Bayes' Theorem, assuming a uniform prior distribution, this is equivalent to maximizing the likelihood P(S | θ, A)


Equivalent Scoring Function

  • Maximize the log-odds ratio: F = Σi Σj cij log(qij / pj)
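The score F = Σi Σj cij log(qij / pj) can be computed directly from the aligned motif columns. The alignment, pseudocount smoothing (used to keep qij nonzero), and uniform background below are assumed toy values for illustration.

```python
import math

ALPHABET = "ACGT"
background = {s: 0.25 for s in ALPHABET}   # assumed uniform background

# Toy aligned motif columns from 4 sequences, motif width 2 (hypothetical alignment)
columns = [["A", "A", "A", "C"], ["T", "T", "G", "T"]]

def log_odds_score(columns, background, b=0.5):
    """F = sum over columns i and symbols j of cij * log(qij / pj),
    with qij smoothed by pseudocount b (an assumed smoothing choice)."""
    F = 0.0
    B = b * len(ALPHABET)
    for col in columns:
        for j in ALPHABET:
            c_ij = col.count(j)
            if c_ij:
                q_ij = (c_ij + b) / (len(col) + B)
                F += c_ij * math.log(q_ij / background[j])
    return F

print(round(log_odds_score(columns, background), 3))
```

A conserved column (three A's out of four) scores well above zero, while a column matching the background contributes nothing.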


Sampling and Optimization

  • To maximize a function f(x):

    • Brute-force method: try every possible x

    • Sampling method: draw x from a probability distribution p(x) ∝ f(x)

    • Idea: if xmax is the argmax of f(x), then it is also the argmax of p(x), so we have a high probability of drawing xmax
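A minimal illustration of this idea on a small discrete domain (the function and its values are made up for the example):

```python
import random
from collections import Counter

random.seed(1)

# Toy objective on a small discrete domain
domain = [0, 1, 2, 3, 4]
values = [1, 2, 10, 2, 1]           # f(x), peaking at x = 2

# Sample x with probability proportional to f(x)
samples = random.choices(domain, weights=values, k=1000)
counts = Counter(samples)
print(counts.most_common(1)[0][0])  # the argmax of f is drawn most often
```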


Detailed Balance of a Markov Chain

  • To sample from a probability distribution p(x), we set up a Markov chain in which each state represents a value of x and, for any two states x and y, the transition probabilities satisfy: p(x)·T(x→y) = p(y)·T(y→x)

  • This implies that p(x) is the stationary distribution of the chain, so long-run samples from the chain are distributed according to p(x)


Gibbs Sampling

  • Idea: a joint distribution may be hard to sample from, but it may be easy to sample from the conditional distribution of one variable with all the others fixed

  • To sample from p(x1, x2, …, xn), let each state of the Markov chain represent a value of (x1, x2, …, xn). The probability of moving to a state that differs only in coordinate i, with new value x'i, is p(x'i | x1, …, xi-1, xi+1, …, xn). Detailed balance is then satisfied.
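A toy Gibbs sampler over a two-variable joint distribution shows the mechanics: each step resamples one variable from its conditional, and the visit frequencies approach the joint distribution. The probability table below is an assumed example.

```python
import random

random.seed(2)

# Toy joint distribution p(x, y) over {0, 1} x {0, 1} (assumed table)
p = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def sample_x_given_y(y):
    # p(x | y) is proportional to p(x, y) with y held fixed
    return random.choices([0, 1], weights=[p[(0, y)], p[(1, y)]])[0]

def sample_y_given_x(x):
    # p(y | x) is proportional to p(x, y) with x held fixed
    return random.choices([0, 1], weights=[p[(x, 0)], p[(x, 1)]])[0]

x, y = 0, 0
counts = {k: 0 for k in p}
for _ in range(20000):
    x = sample_x_given_y(y)   # resample x with y fixed
    y = sample_y_given_x(x)   # resample y with x fixed
    counts[(x, y)] += 1

freqs = {k: v / 20000 for k, v in counts.items()}
print(freqs)  # visit frequencies approach the joint table p
```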




Gibbs Sampling in Motif Finding

  • We should sample from the joint distribution of A and θ, but θ is high-dimensional, so this is inefficient.

  • Instead, we draw a sample from the conditional distribution p(az | A*, S), where A* = A \ {az} (the alignment with sequence z held out)

  • It can be shown that sampling from the conditional probability p(az | A*, S) is roughly equivalent to: compute the estimator of θ from A*, then compute the probability of generating az according to this θ, relative to the background model.


Estimator of θ

  • Given an alignment A, i.e. the starting positions of the motifs, θ can be estimated by its MLE with smoothing (equivalently, using a Dirichlet prior with parameters bj): qij = (cij + bj) / (N − 1 + B), where B = Σj bj and the counts cij are taken over the N − 1 sequences of A*
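The estimator can be written directly from the column counts. The sketch below assumes a DNA alphabet and uniform pseudocounts bj = b; the sequences and starting positions are toy values.

```python
ALPHABET = "ACGT"

def estimate_theta(sequences, starts, width, b=0.5):
    """qij = (cij + bj) / (n + B), with bj = b for all j and B = sum of the bj.
    Counts come from whichever sequences are passed in; during sampling the
    caller passes the N - 1 held-in sequences of A*, giving N - 1 + B below."""
    B = b * len(ALPHABET)
    theta = []
    for i in range(width):
        col = [seq[a + i] for seq, a in zip(sequences, starts)]
        theta.append({j: (col.count(j) + b) / (len(col) + B) for j in ALPHABET})
    return theta

# Toy alignment: each start points at an occurrence of "ACGT"
seqs = ["ACGTAC", "TTACGT", "GACGTT"]
starts = [0, 2, 1]
theta = estimate_theta(seqs, starts, width=4)
print(theta[0])  # column 1 is all 'A': q for 'A' is (3 + 0.5) / (3 + 2) = 0.7
```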


Algorithm

Randomly initialize A0;

Repeat:

(1) randomly choose a sequence z from S;

    A* = At \ {az};

    compute θt = estimator of θ given S and A*;

(2) sample az according to P(az = x), which is proportional to Qx / Px;

    update At+1 = A* ∪ {az = x};

Select the At that maximizes F;

Qx: the probability of generating x according to θt;

Px: the probability of generating x according to the background model
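The full loop can be sketched as follows. Everything concrete here (DNA alphabet, uniform background, pseudocount, the toy sequences) is assumed for illustration; the structure follows the steps above: hold one sequence out, estimate θt, resample the held-out start with weight Qx / Px, and keep the highest-scoring alignment.

```python
import math
import random

random.seed(3)
ALPHABET = "ACGT"

def score_F(seqs, A, W, b, background):
    """Log-odds score F of alignment A, with pseudocount smoothing b."""
    B = b * len(ALPHABET)
    F = 0.0
    for i in range(W):
        col = [s[a + i] for s, a in zip(seqs, A)]
        for j in ALPHABET:
            c = col.count(j)
            if c:
                q = (c + b) / (len(col) + B)
                F += c * math.log(q / background[j])
    return F

def gibbs_motif_search(seqs, W, iters=2000, b=0.5):
    background = {s: 0.25 for s in ALPHABET}  # assumed uniform background
    B = b * len(ALPHABET)
    A = [random.randrange(len(s) - W + 1) for s in seqs]  # random A0
    best_A, best_F = list(A), score_F(seqs, A, W, b, background)
    for _ in range(iters):
        z = random.randrange(len(seqs))                   # (1) choose a sequence z
        rest = [(s, a) for i, (s, a) in enumerate(zip(seqs, A)) if i != z]
        # theta_t estimated from A* = A \ {az}, with pseudocounts
        theta = [
            {j: ([s[a + i] for s, a in rest].count(j) + b) / (len(rest) + B)
             for j in ALPHABET}
            for i in range(W)
        ]
        # (2) sample az with P(az = x) proportional to Qx / Px
        weights = []
        for x in range(len(seqs[z]) - W + 1):
            Qx = Px = 1.0
            for i in range(W):
                Qx *= theta[i][seqs[z][x + i]]
                Px *= background[seqs[z][x + i]]
            weights.append(Qx / Px)
        A[z] = random.choices(range(len(weights)), weights=weights)[0]
        F = score_F(seqs, A, W, b, background)
        if F > best_F:                                    # keep the At maximizing F
            best_A, best_F = list(A), F
    return best_A

# Toy sequences, each containing one planted "ACGT" occurrence
seqs = ["TTACGTTT", "ACGTTTTT", "TTTTACGT"]
best_A = gibbs_motif_search(seqs, W=4)
print(best_A, [s[a:a + 4] for s, a in zip(seqs, best_A)])
```

With such a strong planted signal the sampler typically locks onto the planted starts; on real data, multiple restarts and phase-shift moves (as in the paper) are needed.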

