1. Searching for gapped palindromes
Gregory Kucherov
LIFL/CNRS/INRIA, Lille, France
joint work with Roman Kolpakov (Moscow University)
3. Palindromes (basic definitions)
$w^R$: reverse image of $w$
$vv^R$: even palindrome
$vav^R$ (with $a$ a single letter): odd palindrome
$vuv^R$: gapped palindrome
$v$, $v^R$: arms, $u$: spacer, $|u|$: gap
Example:
Before the talk I was so stressed that ate 3 desserts at once
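The sentence above hides a gapped palindrome: "stressed" reversed is "desserts". A minimal Python sketch of the definition follows; the function names `split_gapped` and `is_gapped_palindrome` are illustrative and do not come from the talk.

```python
def split_gapped(word, start, arm, gap):
    """Cut word[start:] into (left arm, spacer, right arm) of the given lengths."""
    left = word[start:start + arm]
    spacer = word[start + arm:start + arm + gap]
    right = word[start + arm + gap:start + 2 * arm + gap]
    return left, spacer, right


def is_gapped_palindrome(word, start, arm, gap):
    """True if the right arm is the reverse image of the left arm."""
    left, spacer, right = split_gapped(word, start, arm, gap)
    return (arm > 0 and len(right) == arm and len(spacer) == gap
            and right == left[::-1])


sentence = "Before the talk I was so stressed that ate 3 desserts at once"
start = sentence.index("stressed")
gap = sentence.index("desserts") - start - len("stressed")
print(split_gapped(sentence, start, 8, gap))
print(is_gapped_palindrome(sentence, start, 8, gap))
```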
4. Computing (odd/even) palindromes: fundamental string matching problem
palindrome recognition on Turing machines [Slisenko 73, Galil 78, Biedl et al 03]
... and on parallel models of computation [Apostolico&Breslauer&Galil 94, ...]
palindrome recognition is considered in the seminal Knuth-Morris-Pratt paper (1977)
Manacher (1975) proposed a beautiful algorithm for computing (a compact representation of) all palindromes in time O(n)
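As an illustration of the linear-time bound, here is a sketch of Manacher's algorithm in its common textbook form. The separator trick and the names `manacher_radii` and `longest_palindrome` are presentation choices made here, not taken from the talk. The returned array is the compact representation: one radius per center.

```python
def manacher_radii(s):
    """Radii of maximal palindromes around every center of s.

    Works on t = #s0#s1#...#, so even and odd palindromes are handled
    uniformly; radius[i] equals the length (in s) of the maximal
    palindrome centered at position i of t.
    """
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radius = [0] * n
    center = right = 0  # rightmost palindrome found so far: t[2*center-right .. right]
    for i in range(n):
        if i < right:
            # Reuse the value at the mirror position, clipped to the known window.
            radius[i] = min(right - i, radius[2 * center - i])
        lo, hi = i - radius[i] - 1, i + radius[i] + 1
        while lo >= 0 and hi < n and t[lo] == t[hi]:
            radius[i] += 1
            lo -= 1
            hi += 1
        if i + radius[i] > right:
            center, right = i, i + radius[i]
    return radius


def longest_palindrome(s):
    """Longest palindromic substring of s, read off the radii."""
    radius = manacher_radii(s)
    i = max(range(len(radius)), key=radius.__getitem__)
    start = (i - radius[i]) // 2
    return s[start:start + radius[i]]


print(manacher_radii("abaab"))
print(longest_palindrome("abaab"))
```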
5. Maximal gapped palindromes
Example:
10. Computing gapped palindromes
applications to DNA sequences (will talk later)
some related results:
computing maximal repeats in time [cf Gusfield’s book]
gapped palindromes with fixed gap can be easily computed in time with LCA queries on suffix tree
computing all repeats with fixed gap can be done in time [Kolpakov,Kucherov, SPIRE 00]
computing repeats with . can be done in time [Brodal et al, CPM 99]
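The fixed-gap case can be sketched as follows: for every placement of a spacer of the given length, extend the two arms outward as far as the letters match. This sketch uses naive character comparisons for the extension, so it is quadratic in the worst case. The slide's approach would answer each extension with a constant-time LCA query on a suffix tree, which is not implemented here. The function name and the output format are assumptions.

```python
def fixed_gap_palindromes(word, gap):
    """List (left_start, arm_length) for palindromes with the given gap
    whose arms cannot be extended outward."""
    n = len(word)
    results = []
    for spacer_start in range(n - gap + 1):
        left, right = spacer_start - 1, spacer_start + gap
        arm = 0
        # Naive longest common extension; an LCA query would replace this loop.
        while left >= 0 and right < n and word[left] == word[right]:
            arm += 1
            left -= 1
            right += 1
        if arm > 0:
            results.append((spacer_start - arm, arm))
    return results


print(fixed_gap_palindromes("abcXYcba", 2))
```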
11. Two classes of gapped palindromes
Long-armed palindromes
Length-constrained palindromes
for pre-defined constants
12. Problems considered here
Compute, in a given word, all maximal palindromes that are
(i) long-armed
(ii) length-constrained
14. Main ideas of the algorithms
Computing long-armed palindromes
15. Computing long-armed palindromes
Similar to computing periodicities (squares, runs, ...), the algorithm is based on two basic techniques:
extension functions
Lempel-Ziv factorization
16. Extension function: simplest definition
all values can be computed in time [Main&Lorentz 84]
a refined algorithm is presented in [K&K, Lothaire 05]
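The definition itself did not survive on this slide. A minimal sketch, assuming the simplest extension function maps each position i of a word w to the length of the longest common prefix of w and w[i:] (often called the Z-array). The name `prefix_extensions` is hypothetical. The scheme below computes all values in time linear in the length of w.

```python
def prefix_extensions(w):
    """z[i] = length of the longest common prefix of w and w[i:]."""
    n = len(w)
    z = [0] * n
    if n:
        z[0] = n
    left = right = 0  # w[left:right] is the rightmost segment known to match a prefix
    for i in range(1, n):
        if i < right:
            # Inside the known window, start from the value at the mirrored position.
            z[i] = min(right - i, z[i - left])
        while i + z[i] < n and w[z[i]] == w[i + z[i]]:
            z[i] += 1
        if i + z[i] > right:
            left, right = i, i + z[i]
    return z


print(prefix_extensions("aabaab"))
```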
17. Extension function: variants
19. Using extension functions to compute periodicities (squares)
Lemma: There exists a square of period iff
21. Using extension functions to compute periodicities (squares)
This implies that
one can compute a compact representation of all squares (maximal periodicities) in time
one can compute all squares in time [Crochemore 81, Main&Lorentz 84]
one can test the square-freeness in time
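The lemma on squares lost its formula in this transcript. The sketch below assumes it is the classical Main-Lorentz criterion: for words u and v, the word uv contains a square of period p that starts in u and has its center in v (or on the border) iff ls ≥ 1 and ls + lp ≥ p, where lp is the longest common prefix of v and v[p:], and ls is the longest common suffix of u and the first |u| + p letters of uv. The helper names are hypothetical, and the two extensions are computed naively here rather than with linear-time extension functions.

```python
def common_prefix_len(x, y):
    """Length of the longest common prefix of x and y."""
    k = 0
    while k < len(x) and k < len(y) and x[k] == y[k]:
        k += 1
    return k


def common_suffix_len(x, y):
    """Length of the longest common suffix of x and y."""
    return common_prefix_len(x[::-1], y[::-1])


def has_right_square(u, v, p):
    """True iff u+v has a square of period p starting in u with center in v."""
    if p > len(v):
        return False  # the second half of the square would not fit in v
    lp = common_prefix_len(v, v[p:])
    ls = common_suffix_len(u, (u + v)[:len(u) + p])
    return ls >= 1 and ls + lp >= p


print(has_right_square("ab", "ab", 2))   # "abab" is a square of period 2
print(has_right_square("b", "aa", 1))    # "aa" lies entirely inside v
```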
26. Using extension functions to compute palindromes
There exists a long-armed palindrome with an arm of length iff
31. Using extension functions to compute palindromes
There exists a corresponding long-armed palindrome iff
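32. s-factorization (Lempel-Ziv factorization)
$w = u_1 u_2 \cdots u_k$, where:
if the letter $a$ which immediately follows $u_1 \cdots u_{i-1}$ does not occur in $u_1 \cdots u_{i-1}$, then $u_i = a$
otherwise $u_i$ is the longest subword occurring at least twice in $u_1 \cdots u_i$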
Example:
s-factorization (Lempel-Ziv factorization) can be computed in linear time using suffix tree or DAWG
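To make the factorization concrete, here is a naive sketch that follows the usual statement of the definition: each factor is either a fresh letter or the longest subword starting at the current position that also has an occurrence starting earlier (overlaps allowed). It relies on repeated substring search and is far from linear time. A suffix tree or DAWG, as named on the slide, would be needed for that bound. The function name is an assumption.

```python
def s_factorization(w):
    """Naive s-factorization (Lempel-Ziv factorization) of w."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        if w[i] not in w[:i]:
            # Fresh letter: the factor is that single letter.
            factors.append(w[i])
            i += 1
            continue
        length = 1
        # Grow the factor while the longer candidate still has an
        # occurrence starting before position i.
        while i + length < n and w.find(w[i:i + length + 1]) < i:
            length += 1
        factors.append(w[i:i + length])
        i += length
    return factors


print(s_factorization("abaababa"))
```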
35. Why s-factorization is useful here
lemma of [Main 89]
36. Computing (a compact representation of) all squares in linear time
compute the s-factorization of the input word (in linear time)
for each factor
compute all maximal periodicities ending inside and crossing the border between and (in )
recover all maximal periodicities occurring inside from a left copy of (in )
Important: the number of maximal periodicities is $O(n)$ while the number of squares can be $\Theta(n \log n)$
37. Using extension functions + s-factorization to compute periodicities
This implies that
one can compute a compact representation of all squares (maximal periodicities) in linear time [Kolpakov,Kucherov 99]
one can compute all squares (but also cubes, ...) in time
one can test the square-freeness in time [Crochemore 83, Main&Lorentz 85]
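38. Reversed Lempel-Ziv factorization
$w = u_1 u_2 \cdots u_k$, where:
if the letter $a$ which immediately follows $u_1 \cdots u_{i-1}$ does not occur in $u_1 \cdots u_{i-1}$, then $u_i = a$
otherwise $u_i$ is the longest subword following $u_1 \cdots u_{i-1}$ which occurs in $(u_1 \cdots u_{i-1})^R$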
Example:
reversed Lempel-Ziv factorization can be computed in linear time using Weiner’s algorithm to construct the suffix tree in the right-to-left fashion
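A naive sketch of the reversed factorization, assuming each factor is the longest subword at the current position that occurs in the reverse of the prefix already factorized (or a single fresh letter). It uses plain substring search instead of the right-to-left suffix tree construction mentioned above, so it does not achieve linear time. The function name is hypothetical.

```python
def reversed_lz_factorization(w):
    """Naive reversed Lempel-Ziv factorization of w."""
    factors = []
    i, n = 0, len(w)
    while i < n:
        reversed_prefix = w[:i][::-1]
        length = 0
        # Grow the factor while it still occurs in the reversed prefix.
        while i + length < n and w[i:i + length + 1] in reversed_prefix:
            length += 1
        length = max(length, 1)  # a fresh letter forms a factor on its own
        factors.append(w[i:i + length])
        i += length
    return factors


print(reversed_lz_factorization("abcbaab"))
```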
42. Why reversed LZ-factorization is useful here
all those palindromes can be found in time
46. Why reversed LZ-factorization is useful here
right arm starts inside
the span is bounded by
all those palindromes can be found in time
47. Computing all long-armed palindromes in time
compute the reversed Lempel-Ziv factorization of the input word
for each factor
compute all palindromes ending inside and crossing the border between and (in time )
recover all palindromes occurring inside from an inverse copy of (in overall time )
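For checking an implementation of the steps above on small inputs, a brute-force reference can help. The sketch below is not the algorithm of the talk. It rests on two assumptions that are not spelled out in this transcript: a gapped palindrome is maximal when its arms can be extended neither outward nor inward, and it is long-armed when its gap is at most its arm length. All names and the `min_gap` parameter are hypothetical.

```python
def maximal_gapped_palindromes(w, min_gap=2):
    """Brute force: (left_start, arm, gap) for each maximal gapped palindrome."""
    n = len(w)
    found = []
    for a in range(n):                        # a: last position of the left arm
        for b in range(a + 1 + min_gap, n):   # b: first position of the right arm
            if w[a] != w[b]:
                continue
            if w[a + 1] == w[b - 1]:
                continue  # arms could be extended inward, so not maximal
            arm = 0
            # Extend outward as far as possible (outward maximality).
            while a - arm >= 0 and b + arm < n and w[a - arm] == w[b + arm]:
                arm += 1
            found.append((a - arm + 1, arm, b - a - 1))
    return found


def long_armed(w):
    """Keep only palindromes whose gap does not exceed the arm length (assumed definition)."""
    return [(s, arm, gap) for s, arm, gap in maximal_gapped_palindromes(w)
            if gap <= arm]


print(long_armed("abcXYcba"))
```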
48. Computing length-constrained palindromes
49. Computing length-constrained palindromes
For pre-defined constants
a palindrome is length-constrained if it verifies
51. Algorithm: first step
for each position , consider “o-positions”
on all o-positions, define equivalence relation
annotate each o-position with its equivalence class
use suffix array for . Takes time
54. Algorithm: second step
Goal: find all pairs of positions s.t.
(arm length constraint)
(gap length constraint)
(maximality condition)
For each such pair, make a longest extension query to obtain the resulting palindrome. Use suffix array [Kärkkäinen&Sanders, 2003] or LCS query on suffix tree [Gusfield].
All such pairs are found by an on-line traversal algorithm maintaining a position list for each equivalence class (details left out)
55. Length-constrained palindromes: summary
All maximal length-constrained palindromes can be found in time
56. Extensions to biological palindromes
alphabet $\{A, C, G, T\}$ (or $\{A, C, G, U\}$ for RNA)
reverse complement: reversal + complementarity (A ↔ T, C ↔ G)
Example:
all definitions extend trivially
any character-comparison-based algorithm extends to biological palindromes (just check complementarity instead of equality)
Algo 1: extension of reversed LZ-factorization: easy
Algo 2: extension of the 1st step and longest extension queries: straightforward
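The switch from equality to complementarity can be sketched for DNA as follows, assuming the standard Watson-Crick pairs A-T and C-G. The function names are illustrative only.

```python
# Watson-Crick complement table for the DNA alphabet.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s):
    """Complement every letter, then reverse the word."""
    return s.translate(COMPLEMENT)[::-1]


def is_biological_gapped_palindrome(left_arm, right_arm):
    """The right arm must be the reverse complement of the left arm."""
    return len(left_arm) > 0 and right_arm == reverse_complement(left_arm)


print(reverse_complement("ACCTG"))
print(is_biological_gapped_palindrome("ACCTG", "CAGGT"))
```

With arms `ACCTG` and `CAGGT` and any spacer in between, the word can fold into a hairpin, which is the biological counterpart of a gapped palindrome.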
57. Generalization
Generalized long-armed palindromes verifying
can be found in time
58. THAT’S IT