
Finding Subtle Motifs by Branching from Sample Strings

Finding Subtle Motifs by Branching from Sample Strings. Xuan Qi, Computer Science Dept., Utah State Univ. This presentation is based on the paper “Finding subtle motifs by branching from sample strings” by Alkes Price, Sriram Ramabhadran and Pavel A. Pevzner.


Presentation Transcript


  1. Finding Subtle Motifs by Branching from Sample Strings Xuan Qi Computer Science Dept. Utah State Univ.

  2. Preface • This presentation is based on the paper “Finding subtle motifs by branching from sample strings” by Alkes Price, Sriram Ramabhadran and Pavel A. Pevzner.

  3. Outline • Motif finding problem. • Methods that have been proposed to address this problem. • The contribution of the method presented in this paper. • The algorithms proposed in this paper. • Experimental results. • Discussion of the advantages and disadvantages of the method proposed in this paper. • Future research directions.

  4. Motif Finding Problem • Given a set of DNA sequences, find a set of l-mers, one from each sequence, that maximizes the consensus score. Input: a t × n matrix of DNA (t sequences, each of length n) and l, the length of the pattern to find. Output: an array of t starting positions s = (s1, s2, …, st) maximizing Score(s, DNA). • Subtle motif: a motif whose instances yield a low consensus score, so the pattern does not stand out among the sequences and is harder to identify.
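The slide does not define Score(s, DNA). The sketch below assumes the common consensus-score definition (for each aligned column, count the most frequent nucleotide, then sum over columns). The function name `consensus_score` and the toy sequences are illustrative, not taken from the presentation.

```python
from collections import Counter

def consensus_score(dna, starts, l):
    """Assumed Score(s, DNA): for each of the l alignment columns, count the
    most frequent nucleotide, then sum those counts."""
    # One l-mer per sequence, taken at the given 0-based starting positions.
    lmers = [seq[s:s + l] for seq, s in zip(dna, starts)]
    return sum(Counter(col).most_common(1)[0][1] for col in zip(*lmers))

# Toy data (hypothetical): three sequences, pattern length 4.
dna = ["ACGTACGT", "TTACGTAA", "GGACTTCC"]
print(consensus_score(dna, [0, 2, 2], 4))  # aligned l-mers: ACGT, ACGT, ACTT
```

Under this definition a perfectly conserved motif scores t · l, and a subtle motif scores well below that.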

  5. Methods Proposed • Category 1: searching over possible starting positions of the motif. Methods: CONSENSUS, Gibbs sampling. Disadvantages: the search space is very large, and these methods do not always find the optimal motif. • Category 2: searching over possible samples of the motif. Methods: Vanet et al. 2000, Marsan and Sagot 2000, Pavesi et al. 2001, Apostolico et al. 2002, Eskin and Pevzner 2002. Advantages: reduces the search space. Disadvantages: the computational cost is still high, especially for long motifs, and a selected sample may converge to a local optimum rather than the global optimum. • An alternative: the extended sample-driven approach, which exhaustively searches the neighbors of every sample.

  6. Contribution of This Paper • Basic idea: branching from the sample strings. • Contribution: much more efficient than previous algorithms, and effective at finding subtle motifs.

  7. Comparison between the Methods

  8. The Algorithms Proposed • Two ways to model a motif: 1. as a pattern; 2. as a profile (a 4 × l matrix). • Two algorithms proposed: 1. Pattern-Branching algorithm; 2. Profile-Branching algorithm.

  9. Pattern-Branching Algorithm • Distance between the motif $M$ and a sample string $A_0$: the Hamming distance $d(M, A_0) = k$. • $D_{=k}(A_0)$: the set of patterns at distance exactly $k$ from $A_0$. • Neighbor of $A$: any pattern in $D_{=1}(A)$, obtained by changing a single nucleotide of $A$. E.g., ATTGCCAG has neighbors ATTGCCTG and GTTGCCAG. • Score of a pattern: its total distance from the sequences. 1. For each sequence $s_i$, $d(A, s_i) = \min\{d(A, P) \mid P \in s_i\}$, where $P$ ranges over the l-mers (substrings of length $l$) of $s_i$. 2. The total distance of $A$ from $S$ is $d(A, S) = \sum_{s_i \in S} d(A, s_i)$. • BestNeighbor($A$): the pattern $B \in D_{=1}(A)$ with the lowest total distance $d(B, S)$.
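The definitions on this slide translate almost directly into code. The following is a minimal sketch, assuming a DNA alphabet of A, C, G, T and sequences at least as long as the pattern; all function names are illustrative.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def dist_to_seq(pattern, seq):
    """d(A, s_i): distance from the pattern to its closest l-mer in seq."""
    l = len(pattern)
    return min(hamming(pattern, seq[i:i + l]) for i in range(len(seq) - l + 1))

def total_distance(pattern, seqs):
    """d(A, S): sum of d(A, s_i) over all sequences in the sample."""
    return sum(dist_to_seq(pattern, s) for s in seqs)

def neighbors(pattern, alphabet="ACGT"):
    """All patterns at Hamming distance exactly 1 from the pattern."""
    for i, c in enumerate(pattern):
        for b in alphabet:
            if b != c:
                yield pattern[:i] + b + pattern[i + 1:]

def best_neighbor(pattern, seqs):
    """Neighbor with the lowest total distance (ties broken by first found)."""
    return min(neighbors(pattern), key=lambda b: total_distance(b, seqs))

# Toy sample (hypothetical) built around the slide's example strings.
seqs = ["CCATTGCCAGTT", "GGATTGCCTGAA"]
print(total_distance("ATTGCCAG", seqs))
print(best_neighbor("ATTGCCAG", seqs))
```

A pattern of length l has 3l neighbors, so one call to `best_neighbor` costs 3l total-distance evaluations.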

  10. Pattern-Branching Algorithm • Input: a set of sequences S, the motif length l, and the number of mutations k. • Output: a motif of length l with k mutations. • Algorithm:
      PatternBranching(S, l, k)
        M ← arbitrary motif pattern
        For each l-mer A0 in S (each sample string)
          For j ← 0 to k
            If d(Aj, S) < d(M, S)
              M ← Aj
            Aj+1 ← BestNeighbor(Aj)
        Output M
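A runnable sketch of the Pattern-Branching loop, written from the slide's pseudocode rather than from the authors' implementation. It assumes Hamming distance, the alphabet A, C, G, T, and replaces the "arbitrary motif pattern" with an empty start (no motif, infinite distance); helper names are illustrative.

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))

def total_distance(pattern, seqs):
    """d(A, S): for each sequence take the closest l-mer, then sum."""
    l = len(pattern)
    return sum(
        min(hamming(pattern, s[i:i + l]) for i in range(len(s) - l + 1))
        for s in seqs
    )

def best_neighbor(pattern, seqs, alphabet="ACGT"):
    """Pattern at distance 1 with the lowest total distance."""
    candidates = (
        pattern[:i] + b + pattern[i + 1:]
        for i, c in enumerate(pattern)
        for b in alphabet
        if b != c
    )
    return min(candidates, key=lambda p: total_distance(p, seqs))

def pattern_branching(seqs, l, k):
    motif, motif_dist = None, float("inf")
    for s in seqs:
        for i in range(len(s) - l + 1):
            a = s[i:i + l]                  # sample string A0
            for j in range(k + 1):          # A0, A1, ..., Ak
                d = total_distance(a, seqs)
                if d < motif_dist:
                    motif, motif_dist = a, d
                if j < k:                   # no need to branch past Ak
                    a = best_neighbor(a, seqs)
    return motif, motif_dist

# Toy sample (hypothetical): GATTAC implanted with one mutation per sequence.
mutated = ["AAGATTAGAA", "CCCATTACCC", "TTGAATACTT"]
print(pattern_branching(mutated, 6, 1))
```

Because each branch follows only the single best neighbor, the search visits k + 1 patterns per sample string instead of every pattern within distance k.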

  11. Profile-Branching Algorithm • Similar to Pattern-Branching • Some changes: 1. convert each sample string to a profile X(A0) 2. generalize the scoring method to score profiles 3. modify the branching method to apply to profiles 4. use the top-scoring profile we find as a seed to the EM algorithm

  12. Profile-Branching Algorithm • Convert a sample string to a profile X(A0):

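  13. Profile-Branching Algorithm • Use entropy to score profiles: given a profile $X = (x_{vw})$ and a pattern $P = p_1 \dots p_l$, let $e(X, P)$ be the log probability of sampling $P$ from $X$, i.e. $e(X, P) = \sum_{w=1}^{l} \log x_{p_w w}$. • Example (from the slide figure): P = GTGACAT with per-position probabilities 1/6, 1/2, 1/2, 1/6, 1/2, 1/2, 1/2, so $e(X, P) = 2\log(1/6) + 5\log(1/2)$.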

  14. Profile-Branching Algorithm • For each sequence $s_i$ in the sample $S = \{s_1, \dots, s_n\}$, let $e(X, s_i) = \max\{e(X, P) \mid P \in s_i\}$. • Then the entropy score of $X$ is $e(X, S) = \sum_{s_i \in S} e(X, s_i)$. • Intuitively, $e(X, S)$ describes how well $X$ matches its best occurrence in each sequence of the sample.
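The entropy score can be sketched as follows. The transcript does not preserve the definition of X(A0); this sketch assumes the sample's nucleotide gets frequency 1/2 in its column and the other three get 1/6 each, which matches the values 1/2 and 1/6 that survive on the preceding slide. Natural logarithms, the function names, and the sample string ATGCCAT are also assumptions.

```python
import math

ALPHABET = "ACGT"

def profile_from_sample(a0, hit=0.5):
    """Assumed X(A0): one column (dict) per position; the sample's base gets
    frequency `hit`, the remaining bases share the rest equally."""
    miss = (1.0 - hit) / (len(ALPHABET) - 1)
    return [{v: (hit if v == c else miss) for v in ALPHABET} for c in a0]

def e_pattern(profile, pattern):
    """e(X, P): log probability of sampling the pattern from the profile."""
    return sum(math.log(col[p]) for col, p in zip(profile, pattern))

def e_sequence(profile, seq):
    """e(X, s_i): best-scoring l-mer of the sequence."""
    l = len(profile)
    return max(e_pattern(profile, seq[i:i + l]) for i in range(len(seq) - l + 1))

def e_sample(profile, seqs):
    """e(X, S): sum of the per-sequence best scores."""
    return sum(e_sequence(profile, s) for s in seqs)

# Hypothetical sample string that differs from GTGACAT at positions 1 and 4.
X = profile_from_sample("ATGCCAT")
print(e_pattern(X, "GTGACAT"))  # two mismatching columns, five matching ones
```

Higher (less negative) scores mean the profile matches its best occurrence in each sequence more closely.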

  15. Profile-Branching Algorithm • Branching from the sample string: 1. Amplify only one column of the profile (corresponding to one position in the sample pattern), and amplify a nucleotide $v$ in column $w$ only if $x_{vw} < 0.5$. 2. Choose the amplified frequencies $x'_{vw}$ so that the relative entropy satisfies $\sum_{v} x_{vw} \log(x'_{vw} / x_{vw}) = \rho$. We use $\rho = -0.3$.
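The slide fixes the relative-entropy target (-0.3) but does not say how the rest of the column is adjusted. The sketch below assumes the amplified nucleotide's frequency is raised while the other three are scaled down proportionally, and it finds the new frequency by bisection. Natural logarithms, the parameter name `rho`, and the function names are assumptions.

```python
import math

def relative_entropy(old_col, new_col):
    """sum_v x_v * log(x'_v / x_v); zero when unchanged, negative otherwise."""
    return sum(old_col[v] * math.log(new_col[v] / old_col[v]) for v in old_col)

def amplify(col, v, rho=-0.3, iters=60):
    """Raise col[v] until the relative entropy to the old column equals rho.

    Assumes every entry of col is strictly positive.
    """
    if col[v] >= 0.5:
        raise ValueError("only nucleotides with frequency < 0.5 are amplified")

    def rescaled(x_new):
        # Other entries shrink proportionally so the column still sums to 1.
        scale = (1.0 - x_new) / (1.0 - col[v])
        return {u: (x_new if u == v else col[u] * scale) for u in col}

    # The relative entropy decreases monotonically as col[v] grows,
    # so bisection on the new frequency converges to the target.
    lo, hi = col[v], 1.0 - 1e-12
    for _ in range(iters):
        mid = (lo + hi) / 2
        if relative_entropy(col, rescaled(mid)) > rho:
            lo = mid  # not amplified enough yet
        else:
            hi = mid
    return rescaled((lo + hi) / 2)

col = {"A": 0.5, "C": 1 / 6, "G": 1 / 6, "T": 1 / 6}
print(amplify(col, "C"))
```

Fixing the relative entropy rather than the frequency increment makes every branching step move the profile by the same amount, whatever the starting frequency.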

  16. Profile-Branching Algorithm • Algorithm:
      ProfileBranching(S, l, k)
        Motif ← arbitrary motif profile
        For each l-mer A0 in S
          X0 ← X(A0)
          For j ← 0 to k
            If e(Xj, S) > e(Motif, S)
              Motif ← Xj
            Xj+1 ← BestNeighbor(Xj)
        Run EM algorithm with Motif as seed
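A compact sketch of the Profile-Branching loop, written from the slide's pseudocode and not from the authors' code. It carries several assumptions: X(A0) uses frequencies 1/2 and 1/6, amplification rescales the other nucleotides proportionally and is solved by bisection, logarithms are natural, and the final EM refinement is left out. All names are illustrative.

```python
import math

ALPHABET = "ACGT"

def profile_from_sample(a0, hit=0.5):
    miss = (1.0 - hit) / 3
    return [{v: (hit if v == c else miss) for v in ALPHABET} for c in a0]

def entropy_score(profile, seqs):
    """e(X, S): sum over sequences of the best l-mer log probability."""
    l = len(profile)
    def e(pattern):
        return sum(math.log(col[p]) for col, p in zip(profile, pattern))
    return sum(max(e(s[i:i + l]) for i in range(len(s) - l + 1)) for s in seqs)

def amplify(col, v, rho=-0.3):
    """Raise col[v] (others scaled proportionally) to relative entropy rho."""
    def rescaled(x):
        scale = (1.0 - x) / (1.0 - col[v])
        return {u: (x if u == v else col[u] * scale) for u in col}
    def rel_entropy(new):
        return sum(col[u] * math.log(new[u] / col[u]) for u in col)
    lo, hi = col[v], 1.0 - 1e-12
    for _ in range(60):
        mid = (lo + hi) / 2
        if rel_entropy(rescaled(mid)) > rho:
            lo = mid
        else:
            hi = mid
    return rescaled((lo + hi) / 2)

def best_neighbor(profile, seqs):
    """Try every single-column amplification; keep the best-scoring profile."""
    best, best_score = None, -math.inf
    for w, col in enumerate(profile):
        for v in ALPHABET:
            if col[v] < 0.5:
                cand = profile[:w] + [amplify(col, v)] + profile[w + 1:]
                score = entropy_score(cand, seqs)
                if score > best_score:
                    best, best_score = cand, score
    return best

def profile_branching(seqs, l, k):
    motif, motif_score = None, -math.inf
    for s in seqs:
        for i in range(len(s) - l + 1):
            x = profile_from_sample(s[i:i + l])   # X0 = X(A0)
            for j in range(k + 1):
                score = entropy_score(x, seqs)
                if score > motif_score:
                    motif, motif_score = x, score
                if j < k:
                    x = best_neighbor(x, seqs)
    return motif, motif_score  # would seed EM; that step is omitted here

seqs = ["AAGATTACAA", "CCGATTACCC", "TTGATTACTT"]
motif, score = profile_branching(seqs, 6, 1)
print("".join(max(col, key=col.get) for col in motif), score)
```

Each branching step evaluates up to 4l candidate profiles, which is one reason the profile version runs slower than the pattern version.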

  17. Results on Implanted Motifs • Pattern-Branching algorithm vs. previous pattern-based motif finding algorithms: WINNOWER and SP-STAR are unable to find subtle motifs; the other algorithms compared are PROJECTION, MITRA, and MULTIPROFILER.

  18. Results on Implanted Motifs • Profile-Branching algorithm vs. previous profile-based motif finding algorithms. • Performance coefficient: let $K$ be the set of known positions of the n implanted motif instances and $P$ the set of predicted motif positions; the performance coefficient is $|K \cap P| / |K \cup P|$.
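The performance coefficient is a Jaccard-style overlap ratio. A minimal sketch, assuming one motif instance per sequence and that K and P are sets of (sequence index, nucleotide position) pairs covered by the instances; the function name and the 0-based start positions are illustrative.

```python
def performance_coefficient(known_starts, predicted_starts, l):
    """|K ∩ P| / |K ∪ P| over the nucleotide positions covered by the
    known and predicted motif instances of length l."""
    def positions(starts):
        return {(i, s + off) for i, s in enumerate(starts) for off in range(l)}
    k, p = positions(known_starts), positions(predicted_starts)
    return len(k & p) / len(k | p)

# Two sequences: the first prediction is exact, the second is shifted by 2.
print(performance_coefficient([2, 5], [2, 7], 4))
```

A value of 1 means every predicted position coincides with an implanted one; partial overlaps are penalized in both the numerator and the denominator.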

  19. Results on Biological Samples • Pattern-Branching Algorithm: • Profile-Branching Algorithm: The pattern returned by profile-branching matches the reference motif.

  20. Discussion • Advantages: much more efficient than previous algorithms; effective at finding subtle motifs. • Disadvantages: 1. Pattern-Branching has difficulty finding motifs with many degenerate positions, although Profile-Branching handles such motifs well. 2. Profile-Branching is effective at finding subtle motifs but is comparatively slow.

  21. Future Work • Apply the Pattern-Branching and Profile-Branching algorithms to more challenging biological samples: 1. larger samples; 2. corrupted samples. • Extend the algorithms to motifs defined over not only A, T, G, and C but also the degenerate symbols purine (R), pyrimidine (Y), weak bond (W), and strong bond (S).
