Approximate On-line Palindrome Recognition, and Applications

Approximate On-line Palindrome Recognition, and Applications. Amihood Amir Benny Porat. Moskva River. Confluence of 4 Streams. Approximate Matching. Palindrome Recognition. CPM 2014. Online Algorithms. Interchange Matching. Palindrome Recognition.

Presentation Transcript


  1. Approximate On-line Palindrome Recognition, and Applications. Amihood Amir, Benny Porat

  2. Moskva River

  3. Confluence of 4 Streams: Approximate Matching, Palindrome Recognition, Online Algorithms, Interchange Matching. CPM 2014

  4. Palindrome Recognition. "- Voz'mi-ka slovo ropot, - govoril Cincinnatu ego shurin, ostriak, - i prochti obratno. A? Smeshno poluchaetsia?" Vladimir Nabokov, Invitation to a Beheading (1). In translation: "Take the word ropot [murmur]," Cincinnatus' brother-in-law, the wit, was saying to him, "and read it backwards. Eh? Comes out funny, doesn't it?" [ropot → topor: the axe]. A palindrome is a string that reads the same from right to left as from left to right. Examples: доход (Russian for "income"); "A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal - Panama!"

  5. Palindrome Example Ibn Ezra: Medieval Jewish philosopher, poet, Biblical commentator, and mathematician. Was asked:"אבי אל חי שמך למה מלך משיח לא יבא" [ My Father, the Living God, why does the king messiah not arrive?] His response: "דעו מאביכם כי לא בוש אבוש, שוב אשוב אליכם כי בא מועד" [ Know you from your Father that I will not be delayed. I will return to you when the time will come ]

  6. Palindromes in Computer Science. A great programming exercise in CS 101. An example of a problem that can be solved in linear time by a RAM, but not in linear time by a 1-tape Turing machine. (It can be done in linear time by a 2-tape TM.)

  7. Palindrome Concatenation. We may be interested in finding out whether a string is a concatenation of palindromes of length > 1. Example: ABCCBABBCCBCAACB = ABCCBA · BB · CC · BCAACB. Why would we be interested in such a funny problem? We'll soon see. Exercise: do this in linear time.
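To make the problem on this slide concrete, here is a minimal sketch of a checker. It is a plain quadratic dynamic program, not the linear-time solution the exercise asks for, and the function name `is_palindrome_concatenation` is a hypothetical choice for illustration.

```python
def is_palindrome_concatenation(s: str) -> bool:
    """Return True if s splits into palindromes that each have length > 1.

    Quadratic-time sketch for illustration only (not the linear-time
    algorithm mentioned on the slide).
    """
    n = len(s)
    # pal[i][j] is True when s[i..j] (inclusive) is a palindrome.
    pal = [[False] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i, n):
            pal[i][j] = s[i] == s[j] and (j - i < 2 or pal[i + 1][j - 1])
    # ok[j] is True when the prefix s[:j] can be split as required.
    ok = [False] * (n + 1)
    ok[0] = True
    for j in range(2, n + 1):
        # The last piece is s[i..j-1]; i <= j-2 forces its length to be >= 2.
        ok[j] = any(ok[i] and pal[i][j - 1] for i in range(j - 1))
    return ok[n]
```

On the slide's example string the function returns `True`, using the split ABCCBA, BB, CC, BCAACB.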

  8. Stream 2 - Approximations. As in exact matching, there may be errors. Find the minimum number of errors that, if fixed, give a string that is a concatenation of palindromes of length > 1. Example: ABCCBCBBCCBCABCB, which differs in two positions from the palindrome concatenation ABCCBABBCCBCAACB. For Hamming distance: Amir & Porat [ISAAC 13], algorithm of time $O(n^2)$.
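The quantity defined on this slide can be computed by a straightforward dynamic program. The sketch below is an illustration under that reading of the definition, with substitutions as the only allowed error. It is not claimed to be the ISAAC 13 algorithm, and the name `min_palindrome_errors` is hypothetical.

```python
def min_palindrome_errors(s: str):
    """Minimum number of substitutions that turn s into a concatenation
    of palindromes of length > 1 (returns float('inf') if len(s) == 1)."""
    n = len(s)
    # cost[i][j]: substitutions needed to make s[i..j] a palindrome,
    # i.e. the number of mismatching symmetric pairs inside s[i..j].
    cost = [[0] * n for _ in range(n)]
    for i in range(n - 1, -1, -1):
        for j in range(i + 1, n):
            inner = cost[i + 1][j - 1] if j - i >= 2 else 0
            cost[i][j] = inner + (s[i] != s[j])
    # best[j]: cheapest way to split the prefix s[:j] into valid pieces.
    best = [float("inf")] * (n + 1)
    best[0] = 0
    for j in range(2, n + 1):
        best[j] = min(best[i] + cost[i][j - 1] for i in range(j - 1))
    return best[n]
```

For the slide's erroneous string ABCCBCBBCCBCABCB this gives 2, and for ABCCBABBCCBCAACB it gives 0.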

  9. Stream 3 - Reversals. Why is this funny problem interesting? Sorting by reversals: in the evolutionary process a substring may "detach" and "reconnect" in reverse: ABCA BCDAABC BAD → ABCA CBAADCB BAD

  10. Sorting by Reversals. What is the minimum number of reversals that, when applied to string A, result in string B? History: introduced by Bafna & Pevzner [95]; NP-hard: Caprara [97]; approximations: Christie [98], Berman, Hannenhalli, Karpinski [02], Hartman [03]

  11. Sorting by Reversals - Polynomial-time Relaxations • Signed reversals: Hannenhalli & Pevzner [99]; Kaplan, Shamir, Tarjan [00]; Tannier & Sagot [04]; ... • Disjointness: Swap Matching, Muthu [96]. Two constraints: (1) the length of the reversed substring is limited to 2; (2) all swaps are disjoint.

  12. Pattern Matching with Disjoint Reversals • Reversal Distance (RD): the RD between s1 and s2 is the minimum number k such that there exists a string s2' with HAM(s1, s2') = k that reversal-matches s2. • For the pair S1, S2 pictured on the slide: RD(S1, S2) = 2.

  13. Connection between Reversal Matching and Palindrome Matching. Interleave the strings S1 and S2: A C D D C A B A A B E A D B B D A E
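The interleaving step can be sketched as follows. The two input strings used in the usage note are an assumption: they are obtained by de-interleaving the pairs AC DD CA BA AB EA DB BD AE shown on slide 15, since the transcript does not preserve S1 and S2 themselves. The helper names are hypothetical, and the even-palindrome check is a simple unoptimized loop.

```python
def interleave(s1: str, s2: str) -> str:
    """Alternate the characters of two equal-length strings."""
    if len(s1) != len(s2):
        raise ValueError("strings must have equal length")
    return "".join(a + b for a, b in zip(s1, s2))


def is_even_palindrome_concatenation(s: str) -> bool:
    """True if s splits into palindromes of even length (unoptimized)."""
    n = len(s)
    ok = [False] * (n + 1)
    ok[0] = True
    for j in range(2, n + 1, 2):
        # The last piece is s[i:j]; both i and j are even, so its length is even.
        ok[j] = any(ok[i] and s[i:j] == s[i:j][::-1] for i in range(0, j, 2))
    return ok[n]
```

With the assumed S1 = ADCBAEDBA and S2 = CDAABABDE, `interleave` reproduces the slide's string ACDDCABAABEADBBDAE, which splits into the even palindromes ACDDCA, BAAB and EADBBDAE. Each of these corresponds to one reversed block (ADC, BA, EDBA) of S1.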

  14. On-line Input Suppose that we get the input a byte at a time: For the palindrome problem: A C D D C A B A A B E A D B B D A E A A A

  15. On-line Input Suppose that we get the input a byte at a time: For the reversal problem: AC DD CA BA AB EA DB BD AE A A A

  16. Main Idea - Palindrome Fingerprint. Let $S = s_0, s_1, s_2, \dots, s_{m-1}$. The Rabin-Karp fingerprint: $\Phi(S) = r^{1}s_0 + r^{2}s_1 + \dots + r^{m}s_{m-1} \bmod p$. The reversal fingerprint: $\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \dots + r^{-m}s_{m-1} \bmod p$. If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome w.h.p.

  17. Palindrome Fingerprint. Recall $\Phi(S) = r^{1}s_0 + r^{2}s_1 + \dots + r^{m}s_{m-1} \bmod p$ and $\Phi^R(S) = r^{-1}s_0 + r^{-2}s_1 + \dots + r^{-m}s_{m-1} \bmod p$. If $r^{m+1}\Phi^R(S) = \Phi(S)$, then S is a palindrome. Example: S = A B C B A (m = 5): $r^{6}\Phi^R(S) = r^{6}(r^{-1}A + r^{-2}B + r^{-3}C + r^{-4}B + r^{-5}A) = r^{5}A + r^{4}B + r^{3}C + r^{2}B + rA = \Phi(S)$.
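The two fingerprints and the test $r^{m+1}\Phi^R(S) = \Phi(S)$ can be sketched in a few lines. The concrete modulus `P`, the base `R`, and the use of `ord` to turn characters into numbers are assumptions made for illustration. In a real randomized algorithm r would be drawn at random. The modular inverse via `pow(r, -1, p)` needs Python 3.8 or later.

```python
P = 2_147_483_647  # assumed prime modulus (2**31 - 1)
R = 1_000_003      # assumed base; fixed here only to keep the sketch deterministic


def fingerprints(s: str, r: int = R, p: int = P):
    """Return (Phi(S), Phi^R(S)) with Phi = sum r^(i+1) s_i and
    Phi^R = sum r^-(i+1) s_i, both taken mod p."""
    r_inv = pow(r, -1, p)  # modular inverse of r
    phi = phi_rev = 0
    power, inv_power = 1, 1
    for ch in s:
        power = power * r % p              # r^(i+1)
        inv_power = inv_power * r_inv % p  # r^-(i+1)
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
    return phi, phi_rev


def looks_like_palindrome(s: str, r: int = R, p: int = P) -> bool:
    """Fingerprint test: r^(m+1) * Phi^R(S) == Phi(S) (mod p)."""
    phi, phi_rev = fingerprints(s, r, p)
    return pow(r, len(s) + 1, p) * phi_rev % p == phi
```

The test always accepts a true palindrome such as ABCBA. A non-palindrome is rejected unless r happens to be a root of the difference polynomial modulo p, which is why the slide says "w.h.p.".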

  18. Simple Online Algorithm for Finding a Palindrome in a Text. Text: $t_1, t_2, t_3, \dots, t_n$. For the window of length m starting at position i, maintain $\Phi = r^{1}t_i + r^{2}t_{i+1} + \dots + r^{m}t_{i+m-1} \bmod p$ and $\Phi^R = r^{-1}t_i + r^{-2}t_{i+1} + \dots + r^{-m}t_{i+m-1} \bmod p$. If $r^{m+1}\Phi^R = \Phi$, there is a palindrome starting at the i-th position. If not, extend the window by the next character: $\Phi \leftarrow \Phi + r^{m+1}t_{i+m} \bmod p$ and $\Phi^R \leftarrow \Phi^R + r^{-(m+1)}t_{i+m} \bmod p$. Note: this algorithm finds online whether a prefix of the text is a palindrome. For finding online whether the text is a concatenation of palindromes, assume even-length palindromes; otherwise, every text is a concatenation of length-1 palindromes.
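A minimal sketch of the online update, reading one character at a time and reporting every prefix length that passes the fingerprint test. The generator name, the default base and modulus, and the use of `ord` for character values are assumptions for illustration. `pow(r, -1, p)` needs Python 3.8 or later.

```python
from typing import Iterable, Iterator


def palindromic_prefix_lengths(
    stream: Iterable[str], r: int = 1_000_003, p: int = 2_147_483_647
) -> Iterator[int]:
    """Yield m whenever the first m characters pass the palindrome test."""
    r_inv = pow(r, -1, p)
    phi = phi_rev = 0
    power, inv_power = 1, 1
    m = 0
    for ch in stream:
        m += 1
        power = power * r % p              # r^m
        inv_power = inv_power * r_inv % p  # r^-m
        # Constant work per character: add the new term to each fingerprint.
        phi = (phi + power * ord(ch)) % p
        phi_rev = (phi_rev + inv_power * ord(ch)) % p
        # Compare r^(m+1) * Phi^R with Phi.
        if power * r % p * phi_rev % p == phi:
            yield m
```

Feeding the characters of ABACABA yields at least the lengths 1, 3 and 7, which are the true palindromic prefixes. Any extra length reported would be a fingerprint false positive.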

  19. Palindrome with mismatches Start with 1 mismatch case.

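  20. 1-Mismatch. $S = s_0, s_1, s_2, \dots, s_{m-1}$. Choose $\ell$ prime numbers $q_1, \dots, q_\ell < m$ such that their product exceeds m (the condition used in the C.R.T. argument on slide 27).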

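  21. 1-Mismatch. $S = s_0, s_1, s_2, \dots, s_{m-1}$. For each $q_i$ construct $q_i$ subsequences of S as follows: subsequence $S_{q_i,j}$ consists of all elements of S whose index is j mod $q_i$. Examples with $q_1 = 2$, $q_2 = 3$. mod 2: $S_{2,0} = s_0, s_2, s_4, \dots$; $S_{2,1} = s_1, s_3, s_5, \dots$. mod 3: $S_{3,0} = s_0, s_3, s_6, \dots$; $S_{3,1} = s_1, s_4, s_7, \dots$; $S_{3,2} = s_2, s_5, s_8, \dots$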

  22. Example. $S = s_0, s_1, s_2, s_3, s_4, s_5$. mod 2: $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$. mod 3: $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$.
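The partition into residue classes maps directly onto extended slicing. The sketch below uses a hypothetical helper name and a concrete six-letter string standing in for $s_0, \dots, s_5$.

```python
def residue_subsequences(s: str, q: int) -> list:
    """Return [S_{q,0}, ..., S_{q,q-1}], where S_{q,j} keeps the
    characters of s whose index is congruent to j modulo q."""
    return [s[j::q] for j in range(q)]


# With s = "abcdef" standing in for s0..s5:
print(residue_subsequences("abcdef", 2))  # ['ace', 'bdf']
print(residue_subsequences("abcdef", 3))  # ['ad', 'be', 'cf']
```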

  23. 1-Mismatch • We need to compare $s_0, s_1, s_2, \dots, s_{m-2}, s_{m-1}$ with its reversal $s_{m-1}, s_{m-2}, s_{m-3}, \dots, s_1, s_0$. • We prove that the comparison carries over to the partitioned strings: $S_{q,j} = S^R_{q,(m-1-j) \bmod q}$.

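  24. Example. $S = s_0, s_1, s_2, s_3, s_4, s_5$ and $S^R = s_5, s_4, s_3, s_2, s_1, s_0$. Partitions of S: $S_{2,0} = s_0, s_2, s_4$; $S_{2,1} = s_1, s_3, s_5$; $S_{3,0} = s_0, s_3$; $S_{3,1} = s_1, s_4$; $S_{3,2} = s_2, s_5$. Comparisons shown: $S_{2,0} = s_0, s_2, s_4$ against $S^R_{2,1} = s_5, s_3, s_1$; $S_{3,0} = s_0, s_3$ against $S^R_{3,2} = s_5, s_2$; $S_{3,1} = s_1, s_4$ against $S^R_{3,1} = s_4, s_1$.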

  25. Exact Matching. Lemma: $S = S^R$ if and only if $S_{q,j} = S^R_{q,(m-1-j) \bmod q}$ for all q and all $0 \le j < q$.
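The lemma can be checked directly on small strings. The sketch assumes, as the example on slide 24 suggests, that $S^R_{q,k}$ denotes the residue class k of S read in reverse order. The function name is hypothetical, and plain string comparison stands in for the fingerprint comparison.

```python
def matches_reversal_pairs(s: str, q: int) -> bool:
    """Compare every S_{q,j} with its reversal pair S^R_{q,(m-1-j) mod q}."""
    m = len(s)
    parts = [s[j::q] for j in range(q)]  # S_{q,0}, ..., S_{q,q-1}
    return all(
        parts[j] == parts[(m - 1 - j) % q][::-1]  # reversed partner class
        for j in range(q)
    )
```

For ABCCBA every pair matches for q = 2 and q = 3. For ABCCBB, which has a single mismatching symmetric pair, the comparison fails.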

  26. 1-Mismatch. Lemma: there is exactly one mismatch if and only if there is exactly one subpattern in each group that does not match (by the C.R.T.).

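  27. Chinese Remainder Theorem. Let n and m be two coprime positive integers. Then for any a and b the system $x \equiv a \pmod n$, $x \equiv b \pmod m$ has exactly one solution modulo nm. In our case: if two indices i and j both have an error while only one subsequence per group is erroneous, then $i \equiv j \pmod q$ for every chosen q; since the product of all q's is > m, it means that i = j.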
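The uniqueness argument can be illustrated by brute force: when the product of the chosen primes is at least m, no two indices below m share all their residues. The function names and the small prime sets are assumptions chosen for illustration.

```python
def residue_vector(i: int, primes: list) -> tuple:
    """Residues of index i modulo each chosen prime."""
    return tuple(i % q for q in primes)


def indices_are_distinguishable(m: int, primes: list) -> bool:
    """True if no two indices in range(m) share every residue."""
    seen = {residue_vector(i, primes) for i in range(m)}
    return len(seen) == m


print(indices_are_distinguishable(100, [3, 5, 7]))  # product 105 > 100: True
print(indices_are_distinguishable(31, [2, 3, 5]))   # product 30 < 31: False
```

In the second call the indices 0 and 30 have the same residues modulo 2, 3 and 5, so a single erroneous subsequence per group could not tell them apart.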

  28. Complexity There exists a constant c such that, for any x<m, there are at least x/log m prime numbers between x and cx. Therefore, choose prime numbers between log m and c log m.
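A minimal sketch of one way to pick the primes: start at about log m and take consecutive primes until their product exceeds m. The base-2 logarithm, the helper names, and the stopping rule are assumptions. The constant c from the slide is not enforced here, and m is assumed to be at least 1.

```python
import math


def is_prime(x: int) -> bool:
    """Trial division; adequate for the small primes needed here."""
    if x < 2:
        return False
    return all(x % d for d in range(2, math.isqrt(x) + 1))


def choose_primes(m: int) -> list:
    """Consecutive primes starting near log2(m) whose product exceeds m."""
    primes, product = [], 1
    q = max(2, math.ceil(math.log2(m)))
    while product <= m:
        if is_prime(q):
            primes.append(q)
            product *= q
        q += 1
    return primes


print(choose_primes(1000))  # [11, 13, 17], product 2431 > 1000
```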

  29. Complexity For each qi we compute 2qi different fingerprints: Overall space: Each character participates in exactly two fingerprints (the regular and the reverse). Overall time:

  30. Online. All fingerprint calculations can be done online. At every input character we know the current length m, which is what we need to compute the comparisons. Conclusion: our algorithm is online.

  31. k-Mismatches Use Group testing…

  32. k-Mismatches: Group Testing • Given n items with some positive ones, identify all positive ones by a small number of tests. • Each test is on a subset of items. • Test outcome is positive iff there is a positive item in the subset.

  33. k-Mismatch • Group: partition of the text. • Test: distinguish (using the 1-mismatch algorithm) between: • match • 1 mismatch • more than 1 mismatch

  34. k-Mismatches. $S = s_0, s_1, s_2, \dots, s_{m-1}$. Each $S_{q,j}$ is a group in our group testing. mod 2: $S_{2,0} = s_0, s_2, s_4, \dots$; $S_{2,1} = s_1, s_3, s_5, \dots$. mod 3: $S_{3,0} = s_0, s_3, s_6, \dots$; $S_{3,1} = s_1, s_4, s_7, \dots$; $S_{3,2} = s_2, s_5, s_8, \dots$ Similar to the 1-mismatch algorithm, just with more prime numbers.

  35. Our tests • We define the reversal pair of $S_{q,j}$ to be $S^R_{q,(m-1-j) \bmod q}$. • Each partition is "tested against" its reversal pair.

  36. Correctness. $S = s_0, s_1, s_2, \dots, s_j, \dots, s_{m-1}$. For any group of k characters $i_1, i_2, \dots, i_k$ there exists a partition where $s_j$ appears alone (by the C.R.T.).

  37. Correctness. $S = s_0, s_1, s_2, \dots, s_j, \dots, s_{m-1}$. If $s_j$ invokes a mismatch, we will catch it.
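The isolation claim behind the correctness argument can be illustrated with a small search: given a position j and the other mismatch positions, look for a prime q for which no other position falls in j's residue class. The function name and the sample numbers are hypothetical. The sketch only shows what "appears alone" means and does not prove that such a prime always exists among the chosen ones.

```python
from typing import Iterable, Optional


def isolating_prime(j: int, others: Iterable[int], primes: Iterable[int]) -> Optional[int]:
    """Return the first prime q such that no other position is
    congruent to j modulo q, or None if every prime has a collision."""
    others = [i for i in others if i != j]
    for q in primes:
        if all(i % q != j % q for i in others):
            return q
    return None


# Position 10 collides with 4 modulo 2 and 3, and with 25 modulo 5,
# but it is alone in its residue class modulo 7.
print(isolating_prime(10, [4, 16, 25], [2, 3, 5, 7]))  # 7
```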

  38. Complexity • Overall space: • Overall time:

  39. Approximate Reversal Distance Using the palindrome up to k-mismatches algorithm, can be solved in time, and space.

  40. Thank you (спасибо)
