Searching for gapped palindromes Gregory Kucherov LIFL

Presentation Transcript


    1. Searching for gapped palindromes Gregory Kucherov LIFL/CNRS/INRIA Lille, France joint work with Roman Kolpakov (Moscow University)

    3. Palindromes (basic definitions). u^R: reverse image of u. u u^R: even palindrome; u a u^R: odd palindrome (a is a single letter); u v u^R: gapped palindrome, where u and u^R are the arms, v is the spacer, and |v| is the gap. Example: Before the talk I was so stressed that ate 3 desserts at once
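
The definitions can be checked mechanically. The sketch below assumes the usual reading of a gapped palindrome as u v u^R (left arm u, spacer v, right arm the reverse of u); the function name `is_gapped_palindrome` and its `arm` parameter are hypothetical, chosen only for illustration.

```python
def is_gapped_palindrome(s: str, arm: int) -> bool:
    """Return True if s can be written as u v u^R with |u| == arm.

    Assumption: arms of length `arm` sit at both ends of s and the
    spacer v (possibly empty) is whatever lies between them.
    """
    if arm < 0 or 2 * arm > len(s):
        return False
    left = s[:arm]
    right = s[len(s) - arm:]
    # The right arm must be the mirror image of the left arm.
    return left == right[::-1]


# "stressed" read backwards is "desserts"; the spacer is " that ate 3 ".
print(is_gapped_palindrome("stressed that ate 3 desserts", 8))
```

With an empty spacer this reduces to the even-palindrome test, and with a one-letter spacer to the odd-palindrome test.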

    4. Computing (odd/even) palindromes: a fundamental string matching problem. Palindrome recognition on Turing machines [Slisenko 73, Galil 78, Biedl et al 03] ... and on parallel models of computation [Apostolico&Breslauer&Galil 94, ...]. Palindrome recognition is considered in the seminal Knuth-Morris-Pratt paper (1977). Manacher (1975) proposed a beautiful algorithm for computing (a compact representation of) all palindromes in time O(n)
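
For reference, a minimal sketch of Manacher's linear-time idea in its common textbook form: the word is interleaved with separators so that odd and even palindromes are handled uniformly, and one radius per center serves as the compact representation. The names `manacher_radii` and `longest_palindrome` are illustrative, not taken from the talk.

```python
def manacher_radii(s: str) -> list[int]:
    """Palindrome radii for every center of '#'-interleaved s, in O(n).

    radii[i] equals the length (in s) of the maximal palindrome whose
    center corresponds to position i of the interleaved string.
    """
    t = "#" + "#".join(s) + "#"
    n = len(t)
    radii = [0] * n
    center = right = 0  # rightmost palindrome found so far is t[2*center-right .. right]
    for i in range(n):
        if i < right:
            # Reuse the radius of the mirror position, clipped to the known window.
            radii[i] = min(right - i, radii[2 * center - i])
        a, b = i - radii[i] - 1, i + radii[i] + 1
        while a >= 0 and b < n and t[a] == t[b]:
            radii[i] += 1
            a -= 1
            b += 1
        if i + radii[i] > right:
            center, right = i, i + radii[i]
    return radii


def longest_palindrome(s: str) -> str:
    """Recover one longest palindromic factor of s from the radii."""
    radii = manacher_radii(s)
    if not radii:
        return ""
    i = max(range(len(radii)), key=radii.__getitem__)
    start = (i - radii[i]) // 2
    return s[start:start + radii[i]]


print(longest_palindrome("xabbay"))
```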

    5. Maximal gapped palindromes. Example:

    10. Computing gapped palindromes: applications to DNA sequences (will talk later). Some related results: computing maximal repeats in time [cf Gusfield’s book]; gapped palindromes with fixed gap can be easily computed in time with LCA queries on suffix tree; computing all repeats with fixed gap can be done in time [Kolpakov,Kucherov, SPIRE 00]; computing repeats with . can be done in time [Brodal et al, CPM 99]

    11. Two classes of gapped palindromes: long-armed palindromes; length-constrained palindromes, for pre-defined constants

    12. Problems considered here: compute, in a given word, all maximal palindromes that are (i) long-armed, (ii) length-constrained

    14. Main ideas of the algorithms Computing long-armed palindromes

    15. Computing long-armed palindromes: similar to computing periodicities (squares, runs, ...), the algorithm is based on two basic techniques: extension functions and Lempel-Ziv factorization

    16. Extension function: simplest definition. All values can be computed in time [Main&Lorentz 84]; a refined algorithm is presented in [K&K, Lothaire 05]
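
As an illustration, one common reading of the simplest extension function is: for every position i of a word w, the length of the longest common prefix of w and its suffix starting at i. That reading is an assumption here. Under it, all values can be obtained in linear time with the well-known Z-algorithm, sketched below; this is not necessarily the exact procedure of the cited papers.

```python
def extension_function(w: str) -> list[int]:
    """ext[i] = length of the longest common prefix of w and w[i:].

    Linear-time Z-algorithm: [l, r) is the rightmost window known to
    match a prefix of w, and values inside it are reused.
    """
    n = len(w)
    ext = [0] * n
    if n:
        ext[0] = n
    l = r = 0
    for i in range(1, n):
        if i < r:
            ext[i] = min(r - i, ext[i - l])
        # Extend the match character by character beyond what is known.
        while i + ext[i] < n and w[ext[i]] == w[i + ext[i]]:
            ext[i] += 1
        if i + ext[i] > r:
            l, r = i, i + ext[i]
    return ext


print(extension_function("aabaab"))
```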

    17. Extension function: variants

    19. Using extension functions to compute periodicities (squares). Lemma: There exists a square of period iff

    20. Using extension functions to compute periodicities (squares). Example:

    21. Using extension functions to compute periodicities (squares). This implies that: one can compute a compact representation of all squares (maximal periodicities) in time; one can compute all squares in time [Crochemore 81, Main&Lorentz 84]; one can test the square-freeness in time
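
A hedged sketch of how extension values detect squares, assuming the standard Main-Lorentz formulation for a word split as u v. Let ls(p) be the longest common suffix of u and v[:p], and lp(p) the longest common prefix of v and v[p:]. A square of period p that starts in u and has its center in v (or at the border) then exists iff ls(p) >= 1 and ls(p) + lp(p) >= p. The names `u`, `v`, `ls`, `lp` and the helper functions are illustrative, and the extension values are computed naively here rather than in linear time.

```python
def common_prefix_len(a: str, b: str) -> int:
    """Length of the longest common prefix of a and b (naive scan)."""
    k = 0
    while k < len(a) and k < len(b) and a[k] == b[k]:
        k += 1
    return k


def common_suffix_len(a: str, b: str) -> int:
    """Length of the longest common suffix of a and b."""
    return common_prefix_len(a[::-1], b[::-1])


def border_square_periods(u: str, v: str) -> list[int]:
    """Periods p of squares in u+v that start in u and are centered in v
    (or exactly at the border), found via the ls/lp condition."""
    periods = []
    for p in range(1, len(v) + 1):
        ls = common_suffix_len(u, v[:p])
        lp = common_prefix_len(v, v[p:])
        if ls >= 1 and ls + lp >= p:
            periods.append(p)
    return periods


print(border_square_periods("aab", "abb"))  # the square "abab" crosses the border
```

Squares centered in u are handled symmetrically by applying the same test to the reversed words.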

    22. Using extension functions to compute palindromes

    26. Using extension functions to compute palindromes. There exists a long-armed palindrome with an arm of length iff

    31. Using extension functions to compute palindromes. There exists a corresponding long-armed palindrome iff

    32. s-factorization (Lempel-Ziv factorization): w = f_1 f_2 ... f_k, where: if the letter a which immediately follows f_1 ... f_{i-1} does not occur in f_1 ... f_{i-1}, then f_i = a; otherwise f_i is the longest subword occurring at least twice in f_1 ... f_i. The s-factorization (Lempel-Ziv factorization) can be computed in linear time using suffix tree or DAWG
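
To make the definition concrete, here is a naive quadratic-or-worse sketch of the s-factorization. It assumes the standard reading: each factor is either a new letter or the longest subword at the current position that also starts at an earlier position, with overlaps allowed. It uses plain substring search in place of the suffix tree or DAWG mentioned on the slide, so it illustrates the output, not the linear-time construction.

```python
def s_factorization(w: str) -> list[str]:
    """Naive s-factorization (Lempel-Ziv factorization) of w."""
    factors = []
    pos = 0
    while pos < len(w):
        length = 1  # a new letter, or at least one repeated letter
        if w[pos] in w[:pos]:
            # Grow the factor while it still has an occurrence starting
            # strictly before pos (that occurrence may overlap the factor).
            while (pos + length < len(w)
                   and w.find(w[pos:pos + length + 1]) < pos):
                length += 1
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(s_factorization("abababab"))
```

On "abababab" the third factor is "ababab": its earlier occurrence starts at position 0 and overlaps the factor itself, which the definition permits.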

    35. Why s-factorization is useful here: lemma of [Main 89]

    36. Computing (a compact representation of) all squares in linear time: compute the s-factorization of the word (in ); for each factor, compute all maximal periodicities ending inside it and crossing its left border (in ); recover all maximal periodicities occurring inside the factor from a left copy of it (in ). Important: the number of maximal periodicities is while the number of squares can be

    37. Using extension functions + s-factorization to compute periodicities. This implies that: one can compute a compact representation of all squares (maximal periodicities) in time [Kolpakov,Kucherov 99]; one can compute all squares (but also cubes, ...) in time; one can test the square-freeness in time [Crochemore 83, Main&Lorentz 85]

    38. Reversed Lempel-Ziv factorization: w = f_1 f_2 ... f_k, where: if the letter a which immediately follows f_1 ... f_{i-1} does not occur in f_1 ... f_{i-1}, then f_i = a; otherwise f_i is the longest subword following f_1 ... f_{i-1} which occurs in (f_1 ... f_{i-1})^R. The reversed Lempel-Ziv factorization can be computed in linear time using Weiner’s algorithm to construct the suffix tree in the right-to-left fashion
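
A naive sketch of the reversed factorization, under the assumption that each factor is the longest subword at the current position that occurs in the reverse of everything factorized so far (or a single new letter). Substring search stands in for the right-to-left suffix tree construction, so the running time is far from linear; the function name is illustrative.

```python
def reversed_lz_factorization(w: str) -> list[str]:
    """Naive reversed Lempel-Ziv factorization of w."""
    factors = []
    pos = 0
    while pos < len(w):
        reversed_prefix = w[:pos][::-1]
        length = 0
        # Grow the factor while it still occurs in the reversed prefix.
        while (pos + length < len(w)
               and w[pos:pos + length + 1] in reversed_prefix):
            length += 1
        length = max(length, 1)  # a letter not seen before forms its own factor
        factors.append(w[pos:pos + length])
        pos += length
    return factors


print(reversed_lz_factorization("abbaab"))
```

Because every non-trivial factor has a reversed copy to its left, a palindrome lying entirely inside a factor mirrors one already seen, which is the property the algorithm exploits.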

    42. Why reversed LZ-factorization is useful here: all those palindromes can be found in time

    46. Why reversed LZ-factorization is useful here: right arm starts inside; the span is bounded by; all those palindromes can be found in time

    47. Computing all long-armed palindromes in time: compute the reversed Lempel-Ziv factorization of the word; for each factor, compute all palindromes ending inside it and crossing its left border (in time ); recover all palindromes occurring inside the factor from an inverse copy of it (in overall time )

    48. Computing length-constrained palindromes

    49. Computing length-constrained palindromes. For pre-defined constants a palindrome is length-constrained if it verifies

    51. Algorithm, first step: for each position, consider “o-positions”; on all o-positions, define an equivalence relation; annotate each o-position with its equivalence class; use suffix array for . Takes time

    54. Algorithm, second step. Goal: find all pairs of positions s.t. (arm length constraint), (gap length constraint), (maximality condition). For each such pair, make a longest extension query to obtain the resulting palindrome; use suffix array [Kärkkäinen&Sanders, 2003] or LCS query on suffix tree [Gusfield]. All such pairs are found by an on-line traversal algorithm maintaining a position list for each equivalence class (details left out)
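
A brute-force sketch of what the second step produces, with loud assumptions: the constraint names `min_arm`, `min_gap`, `max_gap` are hypothetical stand-ins for the pre-defined constants, every pair of inner arm positions is tried instead of using equivalence classes and position lists, and the longest extension query is answered by direct character comparison rather than by a suffix array or suffix tree.

```python
def longest_extension(w: str, e: int, s: int) -> int:
    """Largest k such that w[e-t] == w[s+t] for all 0 <= t < k."""
    k = 0
    while e - k >= 0 and s + k < len(w) and w[e - k] == w[s + k]:
        k += 1
    return k


def length_constrained_palindromes(w: str, min_arm: int,
                                   min_gap: int, max_gap: int):
    """Maximal gapped palindromes as (left arm start, arm length, gap).

    e is the last position of the left arm, s the first position of
    the right arm, so the gap length is s - e - 1.
    """
    found = []
    for e in range(len(w)):
        for gap in range(min_gap, max_gap + 1):
            s = e + gap + 1
            if s >= len(w):
                break
            # Maximality inward: the arms must not be extendable into the gap.
            if gap >= 2 and w[e + 1] == w[s - 1]:
                continue
            # Maximality outward comes from taking the longest extension.
            k = longest_extension(w, e, s)
            if k >= min_arm:
                found.append((e - k + 1, k, gap))
    return found


print(length_constrained_palindromes("abcXYcba", 2, 1, 3))
```

The quadratic pair enumeration is only there to make the three conditions explicit; the slide's point is that the pairs can be found far more economically.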

    55. Length-constrained palindromes: summary. All maximal length-constrained palindromes can be found in time

    56. Extensions to biological palindromes: alphabet {A, C, G, T} (or {A, C, G, U} for RNA); reversal + complementarity (A ↔ T, C ↔ G). All definitions extend trivially; any character-comparison-based algorithm extends to biological palindromes (just check complementarity instead of equality). Algo 1: extension of reversed LZ-factorization: easy. Algo 2: extension of the 1st step and longest extension queries: straightforward
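
The switch from equality to complementarity can be sketched for DNA as follows, assuming the standard Watson-Crick pairing A-T and C-G. The function names and the `arm` parameter are illustrative.

```python
# Watson-Crick complement table for DNA.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_biological_gapped_palindrome(seq: str, arm: int) -> bool:
    """True if the last `arm` bases are the reverse complement of the
    first `arm` bases; whatever lies between them is the spacer."""
    if arm < 0 or 2 * arm > len(seq):
        return False
    return seq[len(seq) - arm:] == reverse_complement(seq[:arm])


print(reverse_complement("GAATTC"))                      # equals its own reverse complement
print(is_biological_gapped_palindrome("GGCATTTGCC", 3))  # arms GGC / GCC, spacer ATTT
```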

    57. Generalization: generalized long-armed palindromes verifying can be found in time

    58. THAT’S IT
