1 / 26

Randomized Algorithms for Motif Finding [1] Ch 12.2

Randomized Algorithms for Motif Finding [1] Ch 12.2. l = 8. DNA. cctgatagacgctatctggctatcc a G gtac T t aggtcctctgtgcgaatctatgcgtttccaaccat agtactggtgtacatttgat C c A tacgt acaccggcaacctgaaacaaacgctcagaaccagaagtgc aa acgt TA gt gcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt

Download Presentation

Randomized Algorithms for Motif Finding [1] Ch 12.2

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Randomized Algorithmsfor Motif Finding[1] Ch 12.2

  2. l= 8 DNA cctgatagacgctatctggctatccaGgtacTtaggtcctctgtgcgaatctatgcgtttccaaccat agtactggtgtacatttgatCcAtacgtacaccggcaacctgaaacaaacgctcagaaccagaagtgc aaacgtTAgtgcaccctctttcttcgtggctctggccaacgagggctgatgtataagacgaaaatttt agcctccgatgtaagtcatagctgtaactattacctgccacccctattacatcttacgtCcAtataca ctgttatacaacgcgtcatggcggggtatgcgttttggtcgtcgtacgctcgatcgttaCcgtacgGc t=5 n = 69 s s1= 26s2= 21s3= 3 s4= 56s5= 60

  3. a G g t a c T t C c A t a c g t a c g t T A g t a c g t C c A t C c g t a c g G _________________ A 3 0 1 0 3 1 1 0 C 2 4 0 0 1 4 0 0 G 0 1 4 0 0 0 3 1 T 0 0 0 5 1 0 1 4 _________________ Consensusa c g t a c g t Score= 3+4+4+5+3+4+3+4=30 Profile P l t

  4. When P is Given P by count: P by probability: Score: 4+7+4+5+3+7=30

  5. Scoring a String with P The probability of a possible consensus string: Prob(aaacct|P) = 1/2 x 7/8 x 3/8 x 5/8 x 3/8 x 7/8 = .033646 Prob(atacag|P) = 1/2 x 1/8 x 3/8 x 5/8 x 1/8 x 1/8 = .001602

  6. P-Most Probable l-mer P = Given P and a sequence ctataaaccttacatc, find the P-most probable l-mer

  7. P-Most Probable l-mer Find the Prob(a|P) of every possible 6-mer: First try: c t a t a a a c c t t a c a t c Second try: c t a t a a a c c t t a c a t c Third try: c t a t a a a c c t t a c a t c -Continue this process to evaluate every possible 6-mer

  8. P-Most Probable l-mer

  9. P-Most Probable l-mers in All Sequences • Find the P-most probable l-mer in each of the sequences. ctataaacgttacatc atagcgattcgactg cagcccagaaccct cggtataccttacatc tgcattcaatagctta tatcctttccactcac ctccaaatcctttaca ggtcatcctttatcct P=

  10. New Profile ctataaacgttacat atagcgattcgactg cagcccagaaccctg cggtgaaccttacat tgcattcaatagctt tgtcctttccactca ctccaaatcctttac ggtctacctttatcc P-Most Probable l-mers form a new profile

  11. Compare New and Old Profiles New Score: 5+5+4+6+4+7=31 Old Score: 4+7+4+5+3+7=30

  12. Greedy Profile Motif Search • Select random starting positions s • Create profile P • Compute Score • Find P-most probable l-mers • Compute new profile based on new starting positions s • Compute Score • Repeat step 4-5 as long as Score increases Cost: Polynomial for each round, and number of round can be controlled

  13. Performance • Since we choose starting positions randomly, there is little chance that our guess will be close to an optimal motif, meaning it will take a very long time to find the optimal motif. • It is unlikely that the random starting positions will lead us to the correct solution at all. • In practice, this algorithm is run many times with the hope that random starting positions will be close to the optimum solution simply by chance.

  14. Gibbs Sampling • Gibbs Sampling improves on one l-mer at a time • Thus proceeds more slowly and cautiously

  15. Gibbs Sampling: an Example Input: t = 5 sequences, motif length l = 8 1. GTAAACAATATTTATAGC 2. AAAATTTACCTCGCAAGG 3. CCGTACTGTCAAGCGTGG 4. TGAGTAAACGACGTCCCA 5. TACTTAACACCCTGTCAA

  16. Gibbs Sampling: an Example 1) Randomly choose starting positions, s=(s1,s2,s3,s4,s5) in the 5 sequences: s1=7GTAAACAATATTTATAGC s2=11AAAATTTACCTTAGAAGG s3=9CCGTACTGTCAAGCGTGG s4=4TGAGTAAACGACGTCCCA s5=1TACTTAACACCCTGTCAA

  17. Gibbs Sampling: an Example 2) Choose one of the sequences at random: Sequence 2: AAAATTTACCTTAGAAGG s1=7GTAAACAATATTTATAGC s2=11AAAATTTACCTTAGAAGG s3=9CCGTACTGTCAAGCGTGG s4=4TGAGTAAACGACGTCCCA s5=1TACTTAACACCCTGTCAA

  18. Gibbs Sampling: an Example 2) Choose one of the sequences at random: Sequence 2: AAAATTTACCTTAGAAGG s1=7GTAAACAATATTTATAGC s3=9CCGTACTGTCAAGCGTGG s4=4TGAGTAAACGACGTCCCA s5=1TACTTAACACCCTGTCAA

  19. Gibbs Sampling: an Example 3) Create profile P from l-mers in remaining 4 sequences:

  20. Gibbs Sampling: an Example 4) Calculate the prob(a|P) for every possible 8-mer in the removed sequence

  21. Gibbs Sampling: an Example 5) Create a distribution of probabilities of l-mers prob(a|P), and randomly select a new starting position based on this distribution. a) To create this distribution, divide each probability prob(a|P) by the lowest probability: Starting Position 1: prob( AAAATTTA | P ) = .000732 / .000122 = 6 Starting Position 2: prob( AAATTTAC | P ) = .000122 / .000122 = 1 Starting Position 8: prob( ACCTTAGA | P ) = .000183 / .000122 = 1.5 Ratio = 6 : 1 : 1.5

  22. Turning Ratios into Probabilities b) Define probabilities of starting positions according to computed ratios Probability (Selecting Starting Position 1): 6/(6+1+1.5)= 0.706 Probability (Selecting Starting Position 2): 1/(6+1+1.5)= 0.118 Probability (Selecting Starting Position 8): 1.5/(6+1+1.5)=0.176

  23. Gibbs Sampling: an Example c) Select the start position according to computed ratios: P(selecting starting position 1): .706 P(selecting starting position 2): .118 P(selecting starting position 8): .176

  24. Gibbs Sampling: an Example Assume we select the substring with the highest probability – then we are left with the following new substrings and starting positions. s1=7 GTAAACAATATTTATAGC s2=1 AAAATTTACCTCGCAAGG s3=9 CCGTACTGTCAAGCGTGG s4=5TGAGTAATCGACGTCCCA s5=1 TACTTCACACCCTGTCAA Earlier s1=7GTAAACAATATTTATAGC s2=11AAAATTTACCTTAGAAGG s3=9CCGTACTGTCAAGCGTGG s4=4TGAGTAAACGACGTCCCA s5=1TACTTAACACCCTGTCAA

  25. Gibbs Sampling: an Example 6) iterate the procedure again with the above starting positions until we can not improve the score any more.

  26. How Gibbs Sampling Works • Randomly choose starting positions s = (s1,...,st) • Randomly choose one of the t sequences • Create a profile P from the other t -1 sequences • For each position in the removed sequence, calculate the probability that the l-mer starting at that position was generated by P. • Choose a new starting position for the removed sequence at random based on the probabilities calculated in step 4. • Repeat steps 2-5 until there is no improvement

More Related