Algorithms for Two Versions of LCS

Algorithms for Two Versions of LCS Problem for Indeterminate Strings. Goal of this paper: study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings, and present efficient algorithms to solve them. IWOCA 2007, 5-9 Nov 2007.

Presentation Transcript


  1. Algorithms for Two Versions of LCS Problem for Indeterminate Strings

  2. Goal of this paper • Study the classic LCS and the Constrained LCS (CLCS) problems for indeterminate strings • Present efficient algorithms to solve them

  3. Longest Common Subsequence • Given two sequences: X = CAAGCTAAGCTAC and Y = TCAAGTAGAAC • Common subsequence: a subsequence common to both X and Y • LCS: a common subsequence of maximum length

  4. LCS Example • X = CAAGCTAAGCT (positions 1-11) • Y = CCGTAT (positions 1-6) • A common subsequence: CCT, length = 3
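The check on slide 4 can be reproduced with a short Python sketch; `is_subsequence` and `is_common_subsequence` are illustrative helper names, not taken from the slides.

```python
def is_subsequence(s: str, t: str) -> bool:
    """Return True if s can be obtained from t by deleting zero or more letters."""
    it = iter(t)
    # `ch in it` advances the iterator until it finds ch (or runs out),
    # so the letters of s must occur in t in the same order.
    return all(ch in it for ch in s)


def is_common_subsequence(s: str, x: str, y: str) -> bool:
    """True if s is a subsequence of both x and y."""
    return is_subsequence(s, x) and is_subsequence(s, y)


X = "CAAGCTAAGCT"  # slide 4, positions 1-11
Y = "CCGTAT"       # slide 4, positions 1-6
print(is_common_subsequence("CCT", X, Y))  # True
```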

  5. LCS Example • X = CAAGCTAAGCGT (positions 1-12) • Y = CCGTAT (positions 1-6) • A longest common subsequence: CCTAT, length = 5

  6. LCS Example • X = CAAGCTAAGCGT • Y = CCGTAT • A longest common subsequence: CCTAT, length = 5 • Another LCS: CGTAT, length = 5
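A minimal sketch of the classic quadratic dynamic program for LCS, with a traceback that recovers one optimal subsequence. The function name `lcs` is hypothetical, and ties in the traceback are broken arbitrarily, so which of the two LCSs above it returns depends on that choice.

```python
def lcs(x: str, y: str) -> str:
    """O(|x|*|y|) DP; returns one longest common subsequence of x and y."""
    n, m = len(x), len(y)
    # L[i][j] = LCS length of x[:i] and y[:j]; row/column 0 are empty prefixes.
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    # Walk back from L[n][m], collecting matched letters in reverse order.
    out = []
    i, j = n, m
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


print(len(lcs("CAAGCTAAGCGT", "CCGTAT")))  # 5
```

For the slide 5 and 6 strings the result is one of CCTAT or CGTAT, the only two common subsequences of length 5.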

  7. CLCS: A Relatively New Variant • Given X and Y plus a constraint string Z, find a longest common subsequence of X and Y that contains Z as a subsequence • Example: X = TCCACA, Y = ACCAAG (positions 1-6), Z = AC
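Assuming the usual definition of CLCS (a longest common subsequence of X and Y that has Z as a subsequence), the sketch below is a plain 3-D dynamic program over prefixes of X, Y and Z, running in O(|X||Y||Z|) time. It is a generic illustration for ordinary strings, not necessarily the algorithm used in the paper, and `clcs_length` and `NEG` are hypothetical names.

```python
from typing import Optional

NEG = -10**9  # sentinel meaning "no common subsequence contains this prefix of z"


def clcs_length(x: str, y: str, z: str) -> Optional[int]:
    """Length of a longest common subsequence of x and y that contains z
    as a subsequence, or None if no common subsequence contains z."""
    n, m, r = len(x), len(y), len(z)
    # L[i][j][k]: best length for prefixes x[:i], y[:j] that contains z[:k].
    L = [[[NEG] * (r + 1) for _ in range(m + 1)] for _ in range(n + 1)]
    # With an empty prefix of x or y, only the empty constraint is satisfiable.
    for i in range(n + 1):
        L[i][0][0] = 0
    for j in range(m + 1):
        L[0][j][0] = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            for k in range(r + 1):
                # Drop the last letter of x or of y.
                best = max(L[i - 1][j][k], L[i][j - 1][k])
                if x[i - 1] == y[j - 1]:
                    # Match x[i-1] with y[j-1] without using it for z.
                    best = max(best, L[i - 1][j - 1][k] + 1)
                    # Match them and let this letter play the role of z[k-1].
                    if k > 0 and x[i - 1] == z[k - 1]:
                        best = max(best, L[i - 1][j - 1][k - 1] + 1)
                L[i][j][k] = best
    result = L[n][m][r]
    return result if result >= 0 else None


print(clcs_length("TCCACA", "ACCAAG", ""))    # 4
print(clcs_length("TCCACA", "ACCAAG", "AC"))  # 3
```

On the slide 7 strings the unconstrained LCS has length 4 (for example CCAA), while forcing Z = AC leaves ACA, of length 3.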

  8. Different Setting • We study LCS and CLCS for indeterminate strings (i-strings) • We call the two problems ILCS and CILCS, respectively

  9. i-strings • Let Σ = {A, C, G, T} • Then there are 2^4 - 1 = 15 non-empty sets of letters • Each position of an i-string holds one of those sets

  10. i-strings • The 15 non-empty subsets of Σ = {A, C, G, T}: {A}, {C}, {G}, {T}, {A, C}, {A, G}, {A, T}, {C, G}, {C, T}, {G, T}, {A, C, G}, {A, C, T}, {A, G, T}, {C, G, T}, {A, C, G, T}
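Since |Σ| = 4, each of the 15 sets fits in a 4-bit mask. The encoding below (A = 1, C = 2, G = 4, T = 8) is an assumed, illustrative representation, and names such as `to_mask` and `from_mask` are hypothetical.

```python
from itertools import combinations

SIGMA = "ACGT"
BIT = {ch: 1 << k for k, ch in enumerate(SIGMA)}  # A=1, C=2, G=4, T=8


def to_mask(letters: str) -> int:
    """Encode a set of DNA letters, e.g. "AT", as a 4-bit integer."""
    mask = 0
    for ch in letters:
        mask |= BIT[ch]
    return mask


def from_mask(mask: int) -> str:
    """Decode a mask back to its letters, in the order of SIGMA."""
    return "".join(ch for ch in SIGMA if mask & BIT[ch])


# All non-empty subsets of SIGMA, grouped by size.
subsets = ["".join(c) for r in range(1, len(SIGMA) + 1) for c in combinations(SIGMA, r)]
print(len(subsets))  # 15
print(sorted(to_mask(s) for s in subsets) == list(range(1, 16)))  # True
```

With this encoding the non-empty sets are exactly the integers 1 to 15, which is one way to see the count 2^4 - 1.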

  11. i-strings [Figure: an example i-string X with positions 1 to 7; some positions hold more than one letter.]

  12. i-strings: Equality/Match [Figure: i-strings X (positions 1 to 7) and Y (length 3).] • X[3] = Y[1]. Why? Because X[3] ∩ Y[1] = {A} ≠ ∅ • Likewise Y = X[1..3], Y = X[3..5] and Y = X[4..6] • Interestingly, X[1..3] ≠ X[3..5]!

  13. i-strings: Equality/Match • Writing =d for indeterminate equality: X[3] =d Y[1], because X[3] ∩ Y[1] = {A} ≠ ∅ • Likewise Y =d X[1..3], Y =d X[3..5] and Y =d X[4..6]
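A small sketch of =d, assuming letter sets are stored as Python `frozenset`s and that two i-strings of equal length match when every aligned pair of positions intersects. The three one-position i-strings are hypothetical, chosen to show why matches need not be transitive, as slide 12 remarks for X[1..3] and X[3..5].

```python
from typing import FrozenSet, Sequence


def eq_d(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Indeterminate equality of two positions: their letter sets intersect."""
    return not a.isdisjoint(b)


def eq_d_string(u: Sequence[FrozenSet[str]], v: Sequence[FrozenSet[str]]) -> bool:
    """Equal-length i-strings match if every aligned pair of positions matches."""
    return len(u) == len(v) and all(eq_d(a, b) for a, b in zip(u, v))


# Hypothetical one-position i-strings, not taken from the slides.
p = [frozenset("A")]
q = [frozenset("AC")]
r = [frozenset("C")]
print(eq_d_string(p, q), eq_d_string(q, r), eq_d_string(p, r))  # True True False
```

Both p and r match q, yet p and r do not match each other, which is exactly the situation of Y, X[1..3] and X[3..5].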

  14. ILCS [Figure: an ILCS example on two i-strings X and Y, with X indexed 1 to 7.]

  15. CILCS [Figure: the i-strings X and Y of slide 14 together with a constraint i-string Z.]

  16. CILCS [Figure: a second frame of the CILCS example of slide 15.]

  17. Motivation • Motivations for LCS and CLCS are well known • But why indeterminate strings? • Indeterminate strings are ubiquitous in biological motifs • And both LCS and CLCS draw their motivation from bioinformatics

  18. Naive Algorithms • Existing LCS and CLCS algorithms solve ILCS and CILCS directly once character equality is replaced by indeterminate equality (=d)

  19. Naive ILCS Algorithm • We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS: L[i, j] = L[i-1, j-1] + 1 if X[i] = Y[j], and L[i, j] = max(L[i-1, j], L[i, j-1]) otherwise, with L[i, 0] = L[0, j] = 0

  20. Naive ILCS Algorithm • We use the basic and well-known O(n^2) DP solution (Wagner & Fischer) to LCS, with the equality test X[i] = Y[j] replaced by X[i] =d Y[j]

  21. Naive ILCS Algorithm… • We assume a sorted order among the letters in the sets of the i-strings • Then deciding whether two sets intersect takes O(|Σ|) time • So the total running time is O(|Σ|n^2)
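A sketch of the naive ILCS algorithm under the slide's assumption that each set is kept as a sorted list of letters: a merge-style scan decides whether X[i] ∩ Y[j] is non-empty in O(|Σ|) time, and the LCS recurrence is otherwise unchanged. The function names and the list-of-lists input format are illustrative assumptions.

```python
from typing import List, Sequence


def intersects_sorted(a: Sequence[str], b: Sequence[str]) -> bool:
    """Merge-style test on two sorted letter lists; O(len(a) + len(b)) = O(|Σ|)."""
    i = j = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            return True
        if a[i] < b[j]:
            i += 1
        else:
            j += 1
    return False


def ilcs_naive(x: List[List[str]], y: List[List[str]]) -> int:
    """LCS DP with X[i] =d Y[j] as the match test: O(|Σ| * n * m) in total."""
    n, m = len(x), len(y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if intersects_sorted(x[i - 1], y[j - 1]):
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


# Hypothetical i-strings: X = [{A,T}, {C}, {G}], Y = [{T}, {G}].
print(ilcs_naive([["A", "T"], ["C"], ["G"]], [["T"], ["G"]]))  # 2
```

Matching X[i] with Y[j] whenever they intersect stays optimal even though =d is not transitive, because the usual exchange argument for the LCS recurrence only needs the match relation on the pair being aligned.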

  22. Our Goal • Our goal is to get a better running time than O(|Σ|n^2)

  23. Our Strategy • We want to evaluate =d, i.e., indeterminate equality, in O(1) time • To achieve that, we preprocess the input i-strings • Then we employ existing LCS algorithms

  24. Preprocessing 1 for ILCS • We compute the following table (Table I; not reproduced in the transcript) • With this table, indeterminate equality can be evaluated in O(1) time
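The slide's table is not reproduced, so this sketch assumes Table I is the n × m 0/1 matrix with I[i][j] = 1 exactly when X[i] ∩ Y[j] ≠ ∅. One simple way to fill it, assumed here and not necessarily the paper's method, is to encode each set as a bitmask and AND the masks; each entry then costs one word operation when |Σ| fits in a machine word. Python integers are unbounded, so this only illustrates the word-RAM argument, and `encode` and `build_table` are hypothetical names.

```python
from typing import List


def encode(istring: List[str], sigma: str) -> List[int]:
    """Turn each position's letter set (a string such as "AT") into a bitmask over sigma."""
    bit = {ch: 1 << k for k, ch in enumerate(sigma)}
    masks = []
    for letters in istring:
        mask = 0
        for ch in letters:
            mask |= bit[ch]
        masks.append(mask)
    return masks


def build_table(x: List[str], y: List[str], sigma: str = "ACGT") -> List[List[bool]]:
    """I[i][j] is True iff X[i] ∩ Y[j] is non-empty (0-based indices)."""
    xm, ym = encode(x, sigma), encode(y, sigma)
    # One AND per entry: constant time when |sigma| fits in a machine word.
    return [[(a & b) != 0 for b in ym] for a in xm]


print(build_table(["AT", "C"], ["T", "G"]))  # [[True, False], [False, False]]
```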

  25. Computation of Table [Figure: two i-strings X and Y of length 4 over Σ = {A, C, G, T}, each position also written as a 0/1 row over A, C, G, T.]

  26. Computation of Table [Figure: 0/1 matrices from the table computation; the entries are not recoverable from the transcript.]

  27. Complete Algorithm • With Table I, we can evaluate =d in O(1) time • So the DP requires only O(n^2) time! • But how long does it take to compute Table I?
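Once a match table is available, the DP never touches the letter sets again. In this sketch (hypothetical name `lcs_from_table`), the table is taken as given, however it was computed, and each of the n·m cells does constant work, which is where the O(n^2) bound on the slide comes from.

```python
from typing import Sequence


def lcs_from_table(table: Sequence[Sequence[bool]]) -> int:
    """LCS length when the match test X[i] =d Y[j] is the lookup table[i][j].

    Each cell does O(1) work, so the DP costs O(n * m) once the table exists.
    """
    n = len(table)
    m = len(table[0]) if n else 0
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        row = table[i - 1]
        for j in range(1, m + 1):
            if row[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m]


# Hypothetical 2 x 2 table: X[1] =d Y[1] and X[2] =d Y[2], nothing else matches.
print(lcs_from_table([[True, False], [False, True]]))  # 2
```

Keeping the table as the only input makes the split explicit: the total cost is the DP's O(n^2) plus whatever it costs to build Table I.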

  28. Thank You
