### Detecting Subtle Sequence Signals: a Gibbs Sampling Strategy for Multiple Alignment

Lawrence, Altschul, Boguski, Liu, Neuwald, Wootton, Science, 1993

Motif Finding Problem for Multiple Alignment

- Biological description: given a set of sequences, find the motif shared by all or most sequences, while its starting position in each sequence is unknown
- Assumptions:
  - The motif appears exactly once in each sequence
  - The motif has a fixed length

Generative Model for Multiple Alignment

- Suppose the sequences are aligned. The aligned regions are then generated from a motif model (a position-specific multinomial distribution)
- The unaligned regions are generated from a background model

Notations for Multiple Alignment

- Set of symbols:
- Sequences: S = {S1, S2, …, SN}
- Starting positions of motifs: A = {a1, a2, …, aN}
- Motif model (θ): qij = P(symbol at the i-th position of the motif = j)
- Background model: pj = P(symbol = j)
- Count of symbols in each column: cij = count of symbol j in the i-th column of the aligned region

Motif Finding Problem for Multiple Alignment

- Problem: find the starting positions and the model parameters simultaneously so as to maximize the posterior probability:

$$\max_{A,\theta} P(A, \theta \mid S)$$

- By Bayes' theorem, with a uniform prior on A and θ, this is equivalent to maximizing the likelihood:

$$\max_{A,\theta} P(S \mid A, \theta)$$

Equivalent Scoring Function for Multiple Alignment

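- Maximize the log-odds ratio of the aligned region under the motif model versus the background model:

$$F = \sum_{i=1}^{W} \sum_{j} c_{ij} \log \frac{q_{ij}}{p_j}$$

  where W is the motif length.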
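As a rough sketch, the score can be computed from the column counts. The code takes the log-odds score to be the sum over columns i and symbols j of cij · log(qij / pj), which follows from the notation above. The toy sequences, the uniform background, and the pseudocount of 0.25 used to build `motif_model` are illustrative assumptions.

```python
import math


def column_counts(sequences, starts, width, alphabet):
    """counts[i][j] = number of times symbol j occurs in motif column i."""
    counts = [{s: 0 for s in alphabet} for _ in range(width)]
    for seq, a in zip(sequences, starts):
        for i in range(width):
            counts[i][seq[a + i]] += 1
    return counts


def log_odds_score(counts, motif_model, background):
    """Sum of c_ij * log(q_ij / p_j) over columns i and symbols j."""
    score = 0.0
    for i, column in enumerate(counts):
        for s, c in column.items():
            if c > 0:
                score += c * math.log(motif_model[i][s] / background[s])
    return score


def score_alignment(sequences, starts, width, alphabet, background):
    counts = column_counts(sequences, starts, width, alphabet)
    n = len(sequences)
    # Assumed smoothing: pseudocount 0.25 per symbol (1.0 in total per column).
    motif_model = [{s: (col[s] + 0.25) / (n + 1.0) for s in alphabet} for col in counts]
    return log_odds_score(counts, motif_model, background)


alphabet = "ACGT"
background = {s: 0.25 for s in alphabet}
sequences = ["GGTATAGG", "CTATACCC", "AATATAAA"]  # toy data with "TATA" planted
good = score_alignment(sequences, [2, 1, 2], 4, alphabet, background)
bad = score_alignment(sequences, [0, 0, 0], 4, alphabet, background)
print(good, bad)
```

On this toy input the alignment that lines up the planted "TATA" copies scores higher than the misaligned one, which is the behaviour the scoring function is meant to reward.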

Sampling and Optimization for Multiple Alignment

- To maximize a nonnegative function f(x):
  - Brute-force method: try all possible x
  - Sampling method: sample x from a probability distribution proportional to f(x), i.e. p(x) ∝ f(x)
  - Idea: if xmax is the argmax of f(x), it is also the argmax of p(x), so it is the single most likely value to be drawn
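The contrast between brute force and sampling can be sketched on a toy problem. The function `f` and the candidate set below are hypothetical, chosen only so that the maximum is easy to see.

```python
import random
from collections import Counter


def f(x):
    """Hypothetical nonnegative objective with its peak at x = 7."""
    return 1.0 / (1.0 + (x - 7) ** 2)


def sample_proportional(func, candidates, rng):
    """Draw one candidate with probability proportional to func(candidate)."""
    weights = [func(x) for x in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]


candidates = list(range(20))
rng = random.Random(1)

# Brute force: evaluate f everywhere.
best_by_brute_force = max(candidates, key=f)

# Sampling: draw from p(x) proportional to f(x) and keep the most frequent value.
draws = Counter(sample_proportional(f, candidates, rng) for _ in range(5000))
best_by_sampling = draws.most_common(1)[0][0]

print(best_by_brute_force, best_by_sampling)
```

Here the normalizing constant is cheap to compute, so direct sampling is possible. In motif finding the space of alignments is too large for that, which is what motivates a Markov chain.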

Detailed Balance of Markov Chain for Multiple Alignment

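- To sample from a probability distribution p(x), we set up a Markov chain in which each state represents a value of x and, for any two states x and y, the transition probabilities T satisfy detailed balance:

$$p(x)\, T(x \to y) = p(y)\, T(y \to x)$$

- Summing both sides over x shows that p is a stationary distribution of the chain:

$$\sum_x p(x)\, T(x \to y) = p(y)$$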
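A small numerical check can make the two properties concrete. The sketch below builds a transition matrix with a Metropolis-style rule, which is not part of the slides and is used here only as a convenient way to obtain a chain that satisfies detailed balance. The target `p` is an arbitrary example.

```python
def metropolis_matrix(p):
    """Transition matrix T[x][y] for a chain targeting p (assumed all positive).

    Proposal: pick one of the other states uniformly.
    Acceptance: min(1, p[y] / p[x]).
    """
    n = len(p)
    T = [[0.0] * n for _ in range(n)]
    for x in range(n):
        for y in range(n):
            if x != y:
                T[x][y] = (1.0 / (n - 1)) * min(1.0, p[y] / p[x])
        # Remaining probability mass: the chain stays where it is.
        T[x][x] = 1.0 - sum(T[x])
    return T


p = [0.2, 0.3, 0.5]
T = metropolis_matrix(p)
n = len(p)

# Detailed balance: p(x) T(x -> y) == p(y) T(y -> x) for every pair.
balanced = all(
    abs(p[x] * T[x][y] - p[y] * T[y][x]) < 1e-12 for x in range(n) for y in range(n)
)

# Stationarity: sum_x p(x) T(x -> y) == p(y) for every y.
stationary = all(
    abs(sum(p[x] * T[x][y] for x in range(n)) - p[y]) < 1e-12 for y in range(n)
)

print(balanced, stationary)
```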

Gibbs Sampling for Multiple Alignment

- Idea: a joint distribution may be hard to sample from, but it may be easy to sample from the conditional distributions where all variables are fixed except one
- To sample from p(x1, x2, …, xn), let each state of the Markov chain represent a full assignment (x1, x2, …, xn). A move picks one coordinate i and changes only xi; the new value is drawn with probability p(xi | x1, …, xi-1, xi+1, …, xn). Detailed balance is then satisfied.
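A minimal sketch of this idea on two binary variables follows. The `joint` table is made up for illustration, and the coordinate to update is chosen at random at each step (one common variant, assumed here). The conditional is obtained by comparing the joint probabilities of the states that differ only in the chosen coordinate.

```python
import random
from collections import Counter

# Hypothetical joint distribution over (x1, x2), each in {0, 1}.
joint = {(0, 0): 0.1, (0, 1): 0.2, (1, 0): 0.3, (1, 1): 0.4}


def resample_coordinate(state, i, rng):
    """Resample coordinate i from p(x_i | all other coordinates)."""
    options = []
    weights = []
    for value in (0, 1):
        candidate = list(state)
        candidate[i] = value
        options.append(tuple(candidate))
        # Unnormalized conditional: the joint with the other coordinates fixed.
        weights.append(joint[tuple(candidate)])
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)
steps = 40000
state = (0, 0)
visits = Counter()
for _ in range(steps):
    state = resample_coordinate(state, rng.randrange(2), rng)
    visits[state] += 1

freqs = {s: visits[s] / steps for s in joint}
print(freqs)
```

The visit frequencies should approach the entries of `joint`, even though the code only ever samples one coordinate at a time.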

Gibbs Sampling in Motif Finding for Multiple Alignment

- Ideally we would sample from the joint distribution of A and θ, but θ is high-dimensional, which makes this inefficient.
- Instead, we sample one starting position at a time from its conditional distribution given the others:

$$p(a_z \mid A^*, S), \qquad A^* = A \setminus \{a_z\}$$

- It can be shown that computing the conditional probability p(az | A*, S) is roughly equivalent to estimating θ from A* and then computing the probability of generating the segment at az under this θ.

Estimator of θ for Multiple Alignment

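- Given an alignment A, i.e. the starting positions of the motifs, θ can be estimated by its MLE with smoothing (equivalently, the posterior mean under a Dirichlet prior with parameters bj):

$$q_{ij} = \frac{c_{ij} + b_j}{n + B}, \qquad B = \sum_j b_j$$

  where n is the number of aligned sequences (N for the full alignment, N − 1 when one sequence is held out).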
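The smoothed estimator can be sketched as follows. The code assumes the form (cij + bj) / (n + B), where n is the number of aligned sequences and B is the sum of the pseudocounts; this is the normalization that makes each column sum to one. The toy sequences and the pseudocount values are illustrative.

```python
def estimate_motif_model(sequences, starts, width, pseudocounts):
    """Return model[i][j] = (c_ij + b_j) / (n + B) for each motif column i."""
    alphabet = list(pseudocounts)
    total_pseudo = sum(pseudocounts.values())  # B
    n = len(sequences)
    model = []
    for i in range(width):
        # Start each column from the pseudocounts b_j, then add observed counts.
        column = {s: pseudocounts[s] for s in alphabet}
        for seq, a in zip(sequences, starts):
            column[seq[a + i]] += 1
        model.append({s: column[s] / (n + total_pseudo) for s in alphabet})
    return model


sequences = ["GGTATAGG", "CTATACCC", "AATATAAA"]  # toy data
starts = [2, 1, 2]
pseudocounts = {s: 0.25 for s in "ACGT"}  # assumed values of b_j
model = estimate_motif_model(sequences, starts, 4, pseudocounts)
print(model[0])
```

The pseudocounts keep every probability strictly positive, so a symbol that has not yet been seen in a column does not force the score of a candidate segment to zero.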

Algorithm for Multiple Alignment

Randomly initialize A0;

Repeat:

(1) randomly choose a sequence z from S;

A* = At \ az;

compute θt = estimator of θ given S and A*;

(2) sample az according to P(az = x), which is proportional to Qx/Px;

update At+1 = A* union x;

Return the At, over all iterations, that maximizes F;

Qx: the probability of generating x according to θt;

Px: the probability of generating x according to the background model
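The loop above can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the background model estimated from overall symbol frequencies, the single pseudocount value `pseudo`, the fixed iteration count, and the toy input sequences are all assumptions.

```python
import math
import random


def estimate_model(segments, width, alphabet, pseudo):
    """Smoothed position-specific frequencies from the aligned segments."""
    total = len(segments) + pseudo * len(alphabet)
    model = []
    for i in range(width):
        col = {s: pseudo for s in alphabet}
        for seg in segments:
            col[seg[i]] += 1
        model.append({s: col[s] / total for s in alphabet})
    return model


def log_odds(segments, model, background):
    """F: summed log(q / p) over every aligned symbol."""
    return sum(
        math.log(model[i][ch] / background[ch])
        for seg in segments
        for i, ch in enumerate(seg)
    )


def gibbs_motif_search(sequences, width, iterations=2000, pseudo=0.5, seed=0):
    rng = random.Random(seed)
    text = "".join(sequences)
    alphabet = sorted(set(text))
    # Assumption: background = overall symbol frequencies in the input.
    background = {s: text.count(s) / len(text) for s in alphabet}
    # Randomly initialize A0.
    starts = [rng.randrange(len(s) - width + 1) for s in sequences]
    best_starts, best_score = None, float("-inf")
    for _ in range(iterations):
        # (1) Choose a sequence z and estimate theta from the others (A*).
        z = rng.randrange(len(sequences))
        others = [
            s[a:a + width]
            for k, (s, a) in enumerate(zip(sequences, starts))
            if k != z
        ]
        model = estimate_model(others, width, alphabet, pseudo)
        # (2) Sample a new start x in z with weight Qx / Px.
        seq = sequences[z]
        weights = []
        for x in range(len(seq) - width + 1):
            ratio = 1.0
            for i, ch in enumerate(seq[x:x + width]):
                ratio *= model[i][ch] / background[ch]
            weights.append(ratio)
        starts[z] = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        # Keep the alignment with the highest F seen so far.
        segments = [s[a:a + width] for s, a in zip(sequences, starts)]
        full_model = estimate_model(segments, width, alphabet, pseudo)
        score = log_odds(segments, full_model, background)
        if score > best_score:
            best_starts, best_score = list(starts), score
    return best_starts, best_score


# Toy input: "GATTACA" planted at positions 4, 0, 7, 2, 6.
sequences = [
    "TGCAGATTACAGTC",
    "GATTACATCGGACT",
    "CAGTCGTGATTACA",
    "ACGATTACACTGGT",
    "GTCCAGGATTACAC",
]
best_starts, best_score = gibbs_motif_search(sequences, width=7)
print(best_starts, best_score)
```

Because the sampler is stochastic, it can settle on a shifted or partial alignment for some seeds. Running it from several random initializations and keeping the highest-scoring result is a common precaution.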
