### Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment

Lawrence, Altschul, Boguski, Liu, Neuwald, Wootton, Science, 1993

### Motif Finding Problem

• Biological description: given a set of sequences, find the motif shared by all or most of the sequences, where the motif's starting position in each sequence is unknown

• Assumptions:

• The motif appears exactly once in each sequence

• The motif has a fixed length

### Generative Model

• Suppose the sequences are aligned; the aligned regions are generated from a motif model (a position-specific multinomial distribution)

• The unaligned regions are generated from a background model
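As a small runnable sketch of this generative model (the function name, alphabet, and parameter values are illustrative, not from the paper), each sequence gets one motif occurrence at a random start, drawn from the position-specific multinomial, with the background filling the rest:

```python
import random

def generate_sequence(length, motif_pwm, background, rng):
    """Plant one motif occurrence at a random start; fill the rest
    of the sequence from the background distribution."""
    W = len(motif_pwm)
    start = rng.randrange(length - W + 1)      # unknown motif start
    symbols = sorted(background)
    seq = []
    for i in range(length):
        # inside the motif window use the position-specific distribution,
        # outside it use the background model
        dist = motif_pwm[i - start] if start <= i < start + W else background
        seq.append(rng.choices(symbols, weights=[dist[s] for s in symbols])[0])
    return "".join(seq), start

rng = random.Random(0)
pwm = [{"A": 0.85, "C": 0.05, "G": 0.05, "T": 0.05}] * 4  # a strong AAAA-like motif
bg = {s: 0.25 for s in "ACGT"}
seq, start = generate_sequence(30, pwm, bg, rng)
```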

### Notations

• Set of symbols: the alphabet the sequences are drawn from (e.g. the 4 nucleotides or the 20 amino acids)

• Sequences: $S = \{S_1, S_2, \dots, S_N\}$

• Starting positions of motifs: $A = \{a_1, a_2, \dots, a_N\}$

• Motif model ($\theta$): $q_{ij} = P(\text{symbol at the } i\text{-th motif position} = j)$

• Background model: $p_j = P(\text{symbol} = j)$

• Counts of symbols in each column: $c_{ij}$ = count of symbol $j$ in the $i$-th column of the aligned regions

### Motif Finding Problem

• Problem: find the starting positions and the model parameters simultaneously to maximize the posterior probability:

$$\hat{A}, \hat{\theta} = \arg\max_{A,\,\theta} P(A, \theta \mid S)$$

• By Bayes' theorem, assuming a uniform prior, this is equivalent to maximizing the likelihood:

$$\arg\max_{A,\,\theta} P(S \mid A, \theta)$$

### Equivalent Scoring Function

• Maximize the log-odds ratio:

$$F = \sum_{i} \sum_{j} c_{ij} \log \frac{q_{ij}}{p_j}$$
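A minimal sketch of computing the log-odds score $F = \sum_{i,j} c_{ij} \log(q_{ij}/p_j)$, with `counts`, `q`, and `p` following the notation above (the helper name is mine):

```python
import math

def log_odds_score(counts, q, p):
    """F = sum over motif positions i and symbols j of
    c_ij * log(q_ij / p_j)."""
    F = 0.0
    for i, column in enumerate(counts):      # motif positions
        for j, c in column.items():          # symbols
            if c > 0:
                F += c * math.log(q[i][j] / p[j])
    return F

# one-column toy example: 4 aligned 'A's, motif model favors A
counts = [{"A": 4, "C": 0, "G": 0, "T": 0}]
q = [{"A": 0.8, "C": 0.1, "G": 0.05, "T": 0.05}]
p = {s: 0.25 for s in "ACGT"}
F = log_odds_score(counts, q, p)             # 4 * log(0.8 / 0.25)
```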

### Sampling and Optimization

• To maximize a function f(x):

• Brute-force method: try all possible x

• Sampling method: sample x from a probability distribution $p(x) \propto f(x)$

• Idea: if $x_{\max}$ is the argmax of f(x), then it is also the argmax of p(x), so we have a high probability of sampling $x_{\max}$
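A toy illustration of this idea (my own example): sampling x with probability proportional to f(x) concentrates the draws on the argmax of f, without ranking every value explicitly.

```python
import random
from collections import Counter

f = {0: 1.0, 1: 3.0, 2: 10.0, 3: 2.0}     # f(2) is the maximum
rng = random.Random(1)

# draw x with probability proportional to f(x)
draws = rng.choices(list(f), weights=list(f.values()), k=10_000)
mode = Counter(draws).most_common(1)[0][0]  # most frequently sampled x
```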

### Detailed Balance of a Markov Chain

• To sample from a probability distribution p(x), we set up a Markov chain such that each state represents a value of x and, for any two states x and y, the transition probabilities satisfy detailed balance:

$$p(x)\,T(x \to y) = p(y)\,T(y \to x)$$

• This then implies that p is the stationary distribution of the chain:

$$\sum_x p(x)\,T(x \to y) = p(y)$$
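A numerical check on a toy example of mine: a 3-state Metropolis-style chain built to satisfy detailed balance for a target p, with power iteration confirming that p is the stationary distribution.

```python
# target distribution over 3 states
p = [0.2, 0.3, 0.5]
n = len(p)

# Metropolis-style transitions: propose another state uniformly,
# accept with probability min(1, p(y)/p(x)); this construction
# satisfies detailed balance by design.
T = [[0.0] * n for _ in range(n)]
for x in range(n):
    for y in range(n):
        if x != y:
            T[x][y] = (1.0 / (n - 1)) * min(1.0, p[y] / p[x])
    T[x][x] = 1.0 - sum(T[x])                # stay put otherwise

# detailed balance: p(x) T(x->y) == p(y) T(y->x)
assert all(abs(p[x] * T[x][y] - p[y] * T[y][x]) < 1e-12
           for x in range(n) for y in range(n))

# power-iterate an arbitrary start distribution toward stationarity
dist = [1.0 / n] * n
for _ in range(200):
    dist = [sum(dist[x] * T[x][y] for x in range(n)) for y in range(n)]
```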

### Gibbs Sampling

• Idea: a joint distribution may be hard to sample from directly, but it may be easy to sample from the conditional distribution of each variable with all the other variables fixed

• To sample from $p(x_1, x_2, \dots, x_n)$, let each state of the Markov chain be a configuration $(x_1, x_2, \dots, x_n)$; each move resamples one coordinate $x_i$ from the conditional $p(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$. Detailed balance is then satisfied.
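A tiny Gibbs sampler on a joint distribution of my own choosing: two binary variables sampled only through their conditionals, with empirical frequencies converging to the true joint table.

```python
import random
from collections import Counter

# joint p(x1, x2) over {0,1}^2
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}

def prob_one(i, state):
    """P(x_i = 1 | the other variable), read off the joint table."""
    if i == 0:
        w0, w1 = joint[(0, state[1])], joint[(1, state[1])]
    else:
        w0, w1 = joint[(state[0], 0)], joint[(state[0], 1)]
    return w1 / (w0 + w1)

rng = random.Random(2)
state = [0, 0]
freq = Counter()
for _ in range(20_000):
    for i in (0, 1):                 # resample one coordinate at a time
        state[i] = 1 if rng.random() < prob_one(i, state) else 0
    freq[tuple(state)] += 1          # record the state after each sweep
```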

### Gibbs Sampling in Motif Finding

• In principle we should sample from the joint distribution of A and θ, but θ is high-dimensional, so this is inefficient

• Instead, we draw samples from the conditional distribution of one starting position given all the others:

$$P(a_z \mid A^*, S), \qquad A^* = A \setminus \{a_z\}$$

• It can be shown that computing this conditional probability is roughly equivalent to: estimate θ from $A^*$, then compute the probability of generating $a_z$'s aligned segment under this θ (relative to the background)

### Estimator of θ

• Given an alignment A, i.e. the starting positions of the motifs, θ can be estimated by its MLE with smoothing (equivalently, a Dirichlet prior with parameters $b_j$), where the counts come from the $N-1$ sequences retained in $A^*$:

$$q_{ij} = \frac{c_{ij} + b_j}{N - 1 + B}, \qquad B = \sum_j b_j$$
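A sketch of this smoothed estimator, $q_{ij} = (c_{ij} + b_j)/(N - 1 + B)$ with $B = \sum_j b_j$ (function and variable names are mine; the uniform pseudocount default is an assumption):

```python
def estimate_theta(segments, alphabet, b=None):
    """Smoothed MLE of the motif model from the N-1 held-in
    motif windows (`segments`), all of equal width."""
    if b is None:
        b = {s: 1.0 for s in alphabet}   # assume a uniform Dirichlet prior
    B = sum(b.values())
    W = len(segments[0])                 # motif width
    q = []
    for i in range(W):
        col = dict(b)                    # start each column from pseudocounts
        for seg in segments:
            col[seg[i]] += 1             # add observed counts c_ij
        q.append({s: col[s] / (len(segments) + B) for s in alphabet})
    return q

# N - 1 = 3 held-in motif windows of width 2
theta = estimate_theta(["AC", "AG", "AT"], "ACGT")
# column 0: c_A = 3, so q_0A = (3 + 1) / (3 + 4) = 4/7
```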

### Algorithm

```
Randomly initialize A_0;
Repeat:
    (1) randomly choose a sequence z from S;
        A* = A_t \ {a_z};
        compute θ_t = estimator of θ given S and A*;
    (2) sample a_z according to P(a_z = x) ∝ Q_x / P_x;
        update A_{t+1} = A* ∪ {x};
Select the A_t that maximizes F;
```

• $Q_x$: the probability of generating the segment starting at x according to $\theta_t$

• $P_x$: the probability of generating that segment according to the background model
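The loop above can be put in runnable form. This is a sketch under stated assumptions, not the paper's implementation: uniform background model, pseudocounts $b_j = 1$, and the best alignment tracked by recomputing F after every sampling step; all names are mine.

```python
import math
import random

def gibbs_motif(seqs, W, iters=200, seed=0):
    """Gibbs sampler for motif starts of width W, one occurrence
    per sequence; returns the best (A, F) seen."""
    rng = random.Random(seed)
    alphabet = sorted(set("".join(seqs)))
    p = {a: 1.0 / len(alphabet) for a in alphabet}     # background model
    N = len(seqs)
    A = [rng.randrange(len(s) - W + 1) for s in seqs]  # randomly initialize A_0

    def score():
        # F = sum_ij c_ij log(q_ij / p_j), q estimated from all N windows
        denom = N + len(alphabet)
        F = 0.0
        for i in range(W):
            col = {a: 1.0 for a in alphabet}           # pseudocounts
            for k, s in enumerate(seqs):
                col[s[A[k] + i]] += 1
            for a, c in col.items():
                F += (c - 1.0) * math.log((c / denom) / p[a])
        return F

    best_A, best_F = list(A), score()
    for _ in range(iters):
        z = rng.randrange(N)                           # (1) choose a sequence z
        denom = N - 1 + len(alphabet)                  # theta_t from A* = A \ {a_z}
        q = []
        for i in range(W):
            col = {a: 1.0 for a in alphabet}
            for k, s in enumerate(seqs):
                if k != z:
                    col[s[A[k] + i]] += 1
            q.append({a: col[a] / denom for a in alphabet})
        s = seqs[z]
        weights = []                                   # (2) P(a_z = x) ∝ Q_x / P_x
        for x in range(len(s) - W + 1):
            w = 1.0
            for i in range(W):
                w *= q[i][s[x + i]] / p[s[x + i]]
            weights.append(w)
        A[z] = rng.choices(range(len(weights)), weights=weights)[0]
        F = score()
        if F > best_F:                                 # keep the A_t maximizing F
            best_A, best_F = list(A), F
    return best_A, best_F
```

For a fixed seed the run is reproducible, and on easy inputs (a strong motif planted in random background) the sampler typically locks onto the shared windows within a few hundred iterations.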