
Rolf Backofen Danny Hermelin Gad M. Landau Oren Weimann

Local Alignment of RNA Sequences with Arbitrary Scoring Schemes



Presentation Transcript


  1. Local Alignment of RNA Sequences with Arbitrary Scoring Schemes. Rolf Backofen, Danny Hermelin, Gad M. Landau, Oren Weimann

  2. RNA sequences [figure]

  3. RNA sequences [figure]

  4. RNA sequences [figure]

  5. Alignment of Strings. Global Alignment:
       S1 = U C A C C G _ A _ G
       S2 = U C G C G G U A U G
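To make the string alignment on slide 5 concrete, here is a minimal sketch of a global alignment by dynamic programming. The unit costs (`sub=1`, `gap=1`) and the function name are assumptions chosen for illustration; the talk itself allows arbitrary scoring schemes and does not specify these values.

```python
def global_alignment_cost(s1, s2, sub=1, gap=1):
    """Minimum cost of aligning all of s1 with all of s2.

    sub and gap are hypothetical unit costs, not values from the slides.
    """
    n, m = len(s1), len(s2)
    # cost[i][j] = cheapest alignment of s1[:i] with s2[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * gap          # s1[:i] aligned against gaps only
    for j in range(1, m + 1):
        cost[0][j] = j * gap          # s2[:j] aligned against gaps only
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diagonal = cost[i - 1][j - 1] + (0 if s1[i - 1] == s2[j - 1] else sub)
            cost[i][j] = min(diagonal,
                             cost[i - 1][j] + gap,   # s1[i-1] against a gap
                             cost[i][j - 1] + gap)   # s2[j-1] against a gap
    return cost[n][m]
```

With these unit costs, `global_alignment_cost("UCACCGAG", "UCGCGGUAUG")` returns 4, which matches the two mismatches and two gaps in the alignment shown on the slide.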

  6. Alignment of RNA sequences [figure]

  7. Alignment of RNA sequences [figure]

  8. Alignment of RNA sequences [figure] RNA Global Alignment via tree edit distance: [SZ 1989], [K 1998], [DMRW 2006]. Theorem: All these algorithms compute the edit distance between any two arcs provided we match these arcs.

  9. The Alignment graph (for the strings U C A C C G A G and U C G C G G U A U G). Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.

  10. The Alignment graph (for the strings U C A C C G A G and U C G C G G U A U G). Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2.

  11. The Alignment graph [figure]

  12. The Alignment graph [figure]

  13. The Alignment graph [figure] Theorem: There is a one-to-one correspondence between all paths in the alignment graph and all alignments of substrings of R1 and R2 in which all arcs are deleted.

  14. The Alignment graph [figure]

  15. The Alignment graph [figure] Theorem: There is a one-to-one correspondence between HEAVIEST paths in the alignment graph and OPTIMAL alignments of substrings of R1 and R2.

  16. The Local Alignment algorithms • We use the alignment graph to compute the local similarity between two RNA sequences according to two well-known metrics: • Smith-Waterman: the highest-scoring alignment between any pair of substrings of the input RNAs. • Its normalized version.

  17. Standard Local Similarity (Smith-Waterman) • The score is computed via a dynamic program: Score(i,j) = max { 0, Score(i',j') + weight of the incoming edge from (i',j') }, where the maximum is taken over all edges entering (i,j). Time complexity: O(mn) + one run of a global algorithm.
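The recurrence on slide 17 can be sketched as follows. This is only an illustration under stated assumptions: the scoring values (`match=2`, `mismatch=-1`, `gap=-1`) are hypothetical, and so is the `arc_edges` dictionary, which stands in for the arc-match edge weights that the talk obtains from one run of a global algorithm. Each entry maps a vertex `(i, j)` to a list of `(i0, j0, w)` triples, meaning an edge of weight `w` from `(i0, j0)` to `(i, j)` with `i0 < i` and `j0 < j`.

```python
def local_similarity(r1, r2, match=2, mismatch=-1, gap=-1, arc_edges=None):
    """Weight of the heaviest path in a simple alignment graph of r1 and r2.

    All scoring values and the arc_edges format are assumptions made for
    this sketch; they are not taken from the slides.
    """
    arc_edges = arc_edges or {}
    n, m = len(r1), len(r2)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(n + 1):
        for j in range(m + 1):
            candidates = [0]                      # a path may start at (i, j)
            if i > 0:
                candidates.append(score[i - 1][j] + gap)
            if j > 0:
                candidates.append(score[i][j - 1] + gap)
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                candidates.append(score[i - 1][j - 1] + w)
            # Long edges that jump over a matched pair of arcs (hypothetical input).
            for i0, j0, w in arc_edges.get((i, j), ()):
                candidates.append(score[i0][j0] + w)
            score[i][j] = max(candidates)
            best = max(best, score[i][j])
    return best
```

For example, `local_similarity("UUACGCC", "GGACGAA")` returns 6 for the shared run `ACG`. Visiting vertices row by row guarantees that the source of every edge is scored before its target.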

  18. Normalized Local Similarity • The weakness of the Smith-Waterman approach [AP 2001]: [figure] • Solution: look for the substrings (with their arcs) that maximize: [formula] and some given value.

  19. Normalized Local Similarity • Again, a dynamic program: define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k. • The best k/Length(k,i,j) over all i,j,k is the normalized score.

  20. Normalized Local Similarity • Again, a dynamic program: define Length(k,i,j) to be the length of the shortest path that ends at vertex (i,j) and has weight equal to k. For every k,i,j compute Length(k,i,j) = min { Length(k-w,i',j') + (j'-j+i'-i) }, where w = weight of the incoming edge from (i',j'). Time complexity: [term not preserved] + one run of a global algorithm.
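The Length(k,i,j) table from slides 19 and 20 can be sketched in the same style. Several things here are assumptions rather than content of the talk: integer edge weights, the scoring values, the `arc_edges` format (a dictionary from `(i, j)` to `(i0, j0, w)` triples), and the integer `pad` parameter, which is one guess at how the "given value" of slide 18 might enter the denominator. The length added by an edge is taken as the number of characters it spans, `|i - i0| + |j - j0|`.

```python
from fractions import Fraction


def normalized_local_similarity(r1, r2, match=2, mismatch=-1, gap=-1,
                                arc_edges=None, pad=0):
    """Best k / (Length(k, i, j) + pad) over all vertices and weights k > 0.

    Scoring values, arc_edges and pad are hypothetical; weights must be
    integers so that k can index the table.
    """
    arc_edges = arc_edges or {}
    n, m = len(r1), len(r2)
    # length[i][j][k] = fewest characters spanned by a path of weight k
    # ending at (i, j); the empty path has weight 0 and length 0.
    length = [[{0: 0} for _ in range(m + 1)] for _ in range(n + 1)]
    best = Fraction(0)
    for i in range(n + 1):
        for j in range(m + 1):
            incoming = []
            if i > 0:
                incoming.append((i - 1, j, gap))
            if j > 0:
                incoming.append((i, j - 1, gap))
            if i > 0 and j > 0:
                w = match if r1[i - 1] == r2[j - 1] else mismatch
                incoming.append((i - 1, j - 1, w))
            incoming.extend(arc_edges.get((i, j), ()))
            table = length[i][j]
            for i0, j0, w in incoming:
                span = abs(i - i0) + abs(j - j0)
                for k0, len0 in length[i0][j0].items():
                    k, cand = k0 + w, len0 + span
                    if cand < table.get(k, cand + 1):
                        table[k] = cand       # shorter path of weight k found
            for k, path_len in table.items():
                if k > 0:                     # positive weight implies length > 0
                    best = max(best, Fraction(k, path_len + pad))
    return best
```

With `pad=0` a single matching pair already reaches the maximum ratio, so a positive `pad` is what makes longer alignments preferable in this sketch: `normalized_local_similarity("ACGU", "ACGU", pad=4)` returns `Fraction(2, 3)`, from the full alignment of weight 8 and length 8.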

  21. Open Problems • Arc deletion: [figure] • Improve global tree edit distance

  22. Thank you very much for your attention
