2 Dimensional Parameterized Matching

Carmit Hazay

Moshe Lewenstein

Dekel Tsur



CPM 2005




Parameterized Matching

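  • Input: two strings $s$ and $t$, $|s| = |t|$, over alphabets $\Sigma_s$ and $\Sigma_t$.

  • $s$ parameterized-matches (p-matches) $t$ if there is a bijection $\pi : \Sigma_s \to \Sigma_t$ such that $\pi(s) = t$, with $\pi$ applied character by character.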

Example:

s = a a b b b

t = x x y y y

$\pi(a) = x$, $\pi(b) = y$
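The definition can be checked directly by building $\pi$ and $\pi^{-1}$ at the same time. The following is a minimal Python sketch, assuming the inputs are plain strings; `p_matches` is a hypothetical helper name. A conflict in either direction means no bijection exists.

```python
def p_matches(s: str, t: str) -> bool:
    """Return True if some bijection pi maps s onto t character by character."""
    if len(s) != len(t):
        return False
    forward = {}   # pi: characters of s -> characters of t
    backward = {}  # pi^-1: characters of t -> characters of s
    for a, b in zip(s, t):
        # setdefault stores the first pairing seen and returns the stored value
        if forward.setdefault(a, b) != b or backward.setdefault(b, a) != a:
            return False
    return True

print(p_matches("aabbb", "xxyyy"))  # True: pi(a)=x, pi(b)=y
print(p_matches("aab", "xyy"))      # False: a would map to both x and y
```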



Parameterized Matching

  • Input: Two strings T, P; |T|=n, |P|=m.

  • Output: all text locations $i$ such that $\pi(P) = T_i \cdots T_{i+m-1}$ for some bijection $\pi$ (which may differ from one location to another).



2D Parameterized Matching

  • Input: text $T$ and pattern $P$; $|T| = n \times n$, $|P| = m \times m$.

  • Output: all text locations $(i,j)$ such that $\pi(P) = T[i..i+m-1,\ j..j+m-1]$ for some bijection $\pi$.

  • Example:

T

a b c

a a b

b b b

P

x y z

x x y

y y y

$\pi(x) = a$, $\pi(y) = b$, $\pi(z) = c$



2D Parameterized Matching

(Figure: pattern)

‘A horse is a horse, it ain’t make a difference what color it is’ (John Wayne)



Parameterized Matching History

  • Introduced by Brenda Baker [Baker93].

  • Others: [AFM94], [Bak95], [Bak97].

  • Two dimensions: [AACLP03], [This work].

  • Used in scaled matching [ABL99].

  • Periodicity of parameterized matching [ApostolicoGiancarlo].

  • Approximate parameterized matching [AEL], [HLS04].



Naïve Algorithm

For every location (i,j) of the text, check whether P p-matches at (i,j):

1. For each character a ∈ alphabet of P, check that all a's of P align with the same text character.

2. For each character b ∈ alphabet of T, check that all b's of the text window align with the same pattern character.
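The two checks above translate into two dictionaries per location. This is a minimal Python sketch, assuming the text and pattern are lists of equal-length strings (one string per row) and 0-based coordinates; `p_matches_at` and `naive_2d_pmatch` are hypothetical names. Each location costs $O(m^2)$, giving the $O(n^2 m^2)$ bound for an $n \times n$ text.

```python
from typing import Dict, List, Tuple

Grid = List[str]  # one string per row, all rows of equal length

def p_matches_at(T: Grid, P: Grid, i: int, j: int) -> bool:
    """Check whether the m x m pattern P p-matches T at (i, j)."""
    m = len(P)
    p_to_t: Dict[str, str] = {}  # step 1: each pattern char aligns with one text char
    t_to_p: Dict[str, str] = {}  # step 2: each text char aligns with one pattern char
    for r in range(m):
        for c in range(m):
            a, b = P[r][c], T[i + r][j + c]
            if p_to_t.setdefault(a, b) != b or t_to_p.setdefault(b, a) != a:
                return False
    return True

def naive_2d_pmatch(T: Grid, P: Grid) -> List[Tuple[int, int]]:
    """All (i, j) where P p-matches T, checking every location independently."""
    m = len(P)
    return [(i, j)
            for i in range(len(T) - m + 1)
            for j in range(len(T[0]) - m + 1)
            if p_matches_at(T, P, i, j)]

print(naive_2d_pmatch(["abc", "aab", "bbb"], ["xyz", "xxy", "yyy"]))  # [(0, 0)]
```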



Naïve Algorithm

Time analysis: if done properly, $O(n^2 m^2)$.



Mismatch pairs

  • A pair of locations at which the two strings disagree under parameterization: the characters at the two locations are equal in one string but different in the other.

  • Example:

a a b a a a

x x y x z y
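A brute-force Python sketch that lists every mismatch pair of two equal-length strings, using 0-based positions; `mismatch_pairs` is a hypothetical name. It is quadratic and meant only to make the definition concrete on the example above.

```python
from itertools import combinations

def mismatch_pairs(s: str, t: str) -> list:
    """All pairs (k, l), k < l, where s and t disagree under parameterization:
    the characters at k and l are equal in one string but not in the other."""
    assert len(s) == len(t)
    return [(k, l) for k, l in combinations(range(len(s)), 2)
            if (s[k] == s[l]) != (t[k] == t[l])]

print(mismatch_pairs("aabaaa", "xxyxzy"))
# [(0, 4), (0, 5), (1, 4), (1, 5), (2, 5), (3, 4), (3, 5), (4, 5)]
```

The strings p-match exactly when this list is empty.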



1D Encoding

  • Encode every text location by its predecessor location: the position of the nearest previous occurrence of the same character (0 if there is none).

First a to its left

T

a b a d d a b d b c b d a a b d a a a a b b b

1 3 6 13 14 15 16 17 18

Encoded T

0 1 3 6 13 14 15 16 17



1D Encoding

  • Two strings that p-match have the same encodings.

S

a b b c b a a c b b c b a

Encoded S

0 0 2 0 3 1 6 4 5 9 8 10 7

T

x y y z y x x z y y z y x

Encoded T

0 0 2 0 3 1 6 4 5 9 8 10 7
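A short Python sketch of this encoding, using 1-based positions and 0 for "no predecessor" as in the example; `prev_encode` is a hypothetical name. It reproduces the encodings of S and T shown above.

```python
def prev_encode(s: str) -> list:
    """Encode each 1-based position by the position of the previous occurrence
    of the same character, or 0 if the character has not occurred before."""
    last_seen = {}
    code = []
    for pos, ch in enumerate(s, start=1):
        code.append(last_seen.get(ch, 0))
        last_seen[ch] = pos
    return code

S, T = "abbcbaacbbcba", "xyyzyxxzyyzyx"
print(prev_encode(S))                    # [0, 0, 2, 0, 3, 1, 6, 4, 5, 9, 8, 10, 7]
print(prev_encode(S) == prev_encode(T))  # True
```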



1D Encoding

  • Hence, to check whether two strings p-match, it is enough to compare their encodings.

  • This reduces p-matching to exact matching.

S

a b b c b b a c b b c b a

Encoded S

0 0 2 0 3 5 1 4 6 9 8 10 7

T

x y y z y x x z y y z y x

Encoded T

0 0 2 0 3 1 6 4 5 9 8 10 7
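To search for a pattern in a longer text, the encoding has to be shift-invariant, so the sketch below records the *distance* back to the predecessor rather than its absolute position, and treats a predecessor that lies before the current window as "none". This is a naive $O(nm)$ Python sketch of the reduction under those assumptions (`prev_distances` and `pmatch_search` are hypothetical names); a linear-time version would replace the window-by-window comparison with a KMP-style matcher.

```python
def prev_distances(s: str) -> list:
    """Distance back to the previous occurrence of the same character (0 if none).
    Unlike absolute positions, distances do not depend on where s starts."""
    last_seen = {}
    out = []
    for pos, ch in enumerate(s):
        out.append(pos - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = pos
    return out

def pmatch_search(T: str, P: str) -> list:
    """0-based starts i where P p-matches T[i:i+m], by comparing encodings."""
    n, m = len(T), len(P)
    enc_p = prev_distances(P)
    enc_t = prev_distances(T)  # encode the whole text once
    hits = []
    for i in range(n - m + 1):
        # A predecessor more than k steps back lies before the window start,
        # so inside the window position k has no predecessor (encoded as 0).
        window = [d if d <= k else 0 for k, d in enumerate(enc_t[i:i + m])]
        if window == enc_p:
            hits.append(i)
    return hits

print(pmatch_search("abbaab", "xyyx"))  # [0, 2]
```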



2D Mismatch Pairs

  • Same as 1D mismatch pairs, but with 2D strings.

    Example:

a b a

b a b

b a b

x y x

y y y

y y y



2D Encoding

  • First idea: encode the linearizations of the text and the pattern.

As you will all see this box frames the text that it

contains. That is 2D textall in this little box.

As you will see this box

frames the texts that it

Contains. That is 2D text

All in this little box.




2D Encoding

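  • First idea: encode the linearizations of the text and the pattern.

    Overflow problem: in the linearized text, the predecessor of a character in an m×m window can lie outside that window, so the window's encoding need not equal the pattern's even when they p-match.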

a

b

Different character than b

b

a

b



2D Encoding

  • Second idea: use strips.

  • Strip: an n×m sub-array of T.

  • The i-th strip of T is the n×m sub-array T[1:n, i:i+m-1].

i



Second Solution

  • For the pattern P, compute predecessors on its linearization.

  • For each strip of T, compute predecessors on its linearization.

  • Run pattern matching on the encoding of each strip.

  • Time: $O(n^2 m)$.

    Can we do better?
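A Python sketch of the strip approach, assuming 0-based coordinates and the text as a list of equal-length row strings; `prev_distances` and `strip_pmatch_2d` are hypothetical names. Each strip is linearized row by row with width m, exactly like the pattern, so rows i..i+m-1 of a strip form one contiguous block of length $m^2$. For brevity each block is compared naively, which costs $O(n^2 m^2)$ rather than the $O(n^2 m)$ obtained with a linear-time matcher per strip.

```python
def prev_distances(seq):
    """Distance back to the previous equal element (0 if there is none)."""
    last_seen, out = {}, []
    for pos, ch in enumerate(seq):
        out.append(pos - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = pos
    return out

def strip_pmatch_2d(T, P):
    """All (i, j) where the m x m pattern P p-matches the text T."""
    n_rows, n_cols, m = len(T), len(T[0]), len(P)
    enc_p = prev_distances([ch for row in P for ch in row])  # row-major, width m
    hits = []
    for j in range(n_cols - m + 1):
        # Strip j: columns j..j+m-1 of every row, linearized row by row.
        enc_s = prev_distances([ch for row in T for ch in row[j:j + m]])
        for i in range(n_rows - m + 1):
            start = i * m  # the window's m rows are enc_s[start:start + m*m]
            # A predecessor further back than k lies above row i, i.e. outside
            # the window, so the window itself has no predecessor there.
            window = [d if d <= k else 0
                      for k, d in enumerate(enc_s[start:start + m * m])]
            if window == enc_p:
                hits.append((i, j))
    return hits

print(strip_pmatch_2d(["abc", "aab", "bbb"], ["xyz", "xxy", "yyy"]))  # [(0, 0)]
```

Because predecessors are taken inside a strip of width m, the overflow problem of the first idea cannot occur: every cell between a window cell and its in-strip predecessor belongs to the same m columns.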



A Faster Solution

  • Cast the problem in the duel-and-sweep setting.

  • Both the duel and the sweep need special care.

  • Especially difficult: pattern preprocessing.

  • Desired time: $O(n^2 + \mathrm{poly}(m))$.

  • We achieve: $O(n^2 + m^{2.5}\,\mathrm{polylog}\,m)$.



Remember…

  • Observation: the text window T p-matches P if and only if

    no text location and its predecessor form a mismatch pair,

    and

    P and T contain the same number of distinct characters.
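The observation can be tested on a single window. The Python sketch below does this for two equal-length 1D strings, W (the text window) and P; since the argument does not depend on the reading order, the same test applies to a 2D window read in any fixed order. `pmatch_by_predecessors` is a hypothetical name. Condition (1) makes the map from W's characters to P's characters well defined and onto, and condition (2) then forces it to be a bijection.

```python
def pmatch_by_predecessors(W: str, P: str) -> bool:
    """W p-matches P iff (1) no position and its predecessor in W form a
    mismatch pair and (2) W and P have equally many distinct characters."""
    if len(W) != len(P):
        return False
    last_seen = {}
    for k, ch in enumerate(W):
        if ch in last_seen:
            q = last_seen[ch]  # predecessor of k in W, so W[q] == W[k]
            if P[q] != P[k]:   # P disagrees there: (q, k) is a mismatch pair
                return False
        last_seen[ch] = k
    # (1) holds; now (2) decides whether the onto map is a bijection.
    return len(set(W)) == len(set(P))

print(pmatch_by_predecessors("abba", "xyyx"))  # True
print(pmatch_by_predecessors("ab", "xx"))      # False: 2 vs 1 distinct characters
```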



Algorithm Outline

  • Duel-and-sweep paradigm:

    • Find candidates (dueling).

    • Divide the candidates by strips.

    • Update the predecessors for every new strip.

    • Check the new predecessors (sweep).

    • Assume the pattern's witness table is given.



Witness

  • Witness: a mismatch pair between P and a copy of P shifted by (a,b).

+a

+b
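To make one witness-table entry concrete, here is a brute-force Python sketch; it is not the $O(m^{2.5}\,\mathrm{polylog}\,m)$ preprocessing of the talk. Assumptions: 0-based cells, the shifted copy has its top-left corner at (a, b) in P's coordinates, and `find_witness` is a hypothetical name. One pass over the overlap with two first-occurrence dictionaries suffices: if the overlap does not p-match, some character class is split, and comparing against the first cell of that class exposes it.

```python
from typing import List, Optional, Tuple

Cell = Tuple[int, int]

def find_witness(P: List[str], a: int, b: int) -> Optional[Tuple[Cell, Cell]]:
    """Witness for shifting the m x m pattern P by (a, b), or None.

    Cell (r, c) of P lies under cell (r - a, c - b) of the shifted copy.
    A witness is a pair x, y of overlap cells with
    (P[x] == P[y]) != (copy[x] == copy[y])."""
    m = len(P)
    first_by_p = {}  # character of P -> first overlap cell holding it
    first_by_q = {}  # character of the copy -> first overlap cell holding it
    for r in range(max(0, a), min(m, m + a)):
        for c in range(max(0, b), min(m, m + b)):
            p, q = P[r][c], P[r - a][c - b]
            x = first_by_p.setdefault(p, (r, c))
            if P[x[0] - a][x[1] - b] != q:  # equal in P, different in the copy
                return x, (r, c)
            y = first_by_q.setdefault(q, (r, c))
            if P[y[0]][y[1]] != p:          # equal in the copy, different in P
                return y, (r, c)
    return None

print(find_witness(["aa", "ab"], 0, 1))  # ((0, 1), (1, 1))
print(find_witness(["ab", "ba"], 0, 1))  # None
```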



Set Candidates

  • Using duels:

    for any two text locations whose offset has a witness, checking the witness in the text eliminates at least one of them.

  • Apply the algorithm of [ABF94] and return the list of candidates.

  • Time: $O(n^2)$.
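A single duel can be sketched as follows, assuming 0-based (row, column) locations and a witness for the offset c2 − c1 given in P's coordinates relative to c1 (the shifted-copy convention, with the copy's corner at that offset). `duel` is a hypothetical name, and the [ABF94] scheduling of duels is not reproduced. Both witness cells lie under both candidates, and the two candidates relate them differently, so the text can agree with at most one of them. The survivor is not yet verified; that is left to the sweep.

```python
from typing import List, Tuple

Cell = Tuple[int, int]

def duel(T: List[str], P: List[str], c1: Cell, c2: Cell,
         witness: Tuple[Cell, Cell]) -> Cell:
    """Return the candidate that survives a duel between c1 and c2."""
    (xr, xc), (yr, yc) = witness
    text_equal = T[c1[0] + xr][c1[1] + xc] == T[c1[0] + yr][c1[1] + yc]
    if text_equal != (P[xr][xc] == P[yr][yc]):
        return c2  # (x, y) is a mismatch pair for c1, so c1 is eliminated
    return c1      # the text agrees with c1, hence disagrees with c2

P = ["aa", "ab"]
w = ((0, 1), (1, 1))  # a witness for offset (0, 1) of this pattern
print(duel(["aaa", "aab"], P, (0, 0), (0, 1), w))  # (0, 1)
print(duel(["aab", "abb"], P, (0, 0), (0, 1), w))  # (0, 0)
```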



Sweep Technique

  • Observation:

    • All candidates agree with each other (after dueling, no two candidates conflict where they overlap).

  • Hence:

    • A mismatch pair eliminates all candidates that contain it.

  • Therefore:

    • For every predecessor pair, it is enough to find one candidate that contains it.



Sweep Technique

  • How do we find such a candidate?

  • Create a new 2m×2m array A such that

    A[i,j] = the largest row among the candidates that start at column j and overlap row i.

x



Sweep Technique

  • For every predecessor pair (i,j), (x,y), use a range-minimum query to find the highest candidate that contains the pair.
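The slides do not say which range-minimum structure is used; a sparse table is one standard choice with $O(1)$ queries after $O(N \log N)$ preprocessing. The Python sketch below shows only that building block (`SparseTableMin` is a hypothetical name); how the array A is turned into query ranges is not reconstructed here.

```python
class SparseTableMin:
    """Range-minimum queries in O(1) after O(N log N) preprocessing."""

    def __init__(self, values):
        # table[k][i] = min(values[i : i + 2**k])
        self.table = [list(values)]
        k = 1
        while (1 << k) <= len(values):
            prev, half = self.table[-1], 1 << (k - 1)
            self.table.append([min(prev[i], prev[i + half])
                               for i in range(len(values) - (1 << k) + 1)])
            k += 1

    def query(self, lo, hi):
        """Minimum of values[lo..hi] (inclusive), assuming lo <= hi."""
        k = (hi - lo + 1).bit_length() - 1  # largest power of two fitting the range
        return min(self.table[k][lo], self.table[k][hi - (1 << k) + 1])

st = SparseTableMin([5, 2, 8, 1, 9, 3])
print(st.query(0, 2), st.query(2, 5))  # 2 1
```

Two overlapping blocks of length $2^k$ cover any range, which is why each query needs only two lookups.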



Sweep Technique

  • When a mismatch pair is found,

    eliminate all candidates that contain it.

  • How?

    Use a mismatch vector.

    Every mismatch pair translates into a range of candidates.

    For each new strip, delete the old mismatches and add the new ones.

All candidates within this range are eliminated.
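One way to apply many such ranges at once is a difference array. This Python sketch assumes, as an illustration and not as the talk's exact mismatch vector, that the candidates of the current strip are indexed by starting row: a mismatch pair in rows r1 ≤ r2 is contained exactly in the candidates whose start row lies in [r2 − m + 1, r1], which is one contiguous range. `eliminate_ranges` is a hypothetical name, and ranges are assumed to be already clipped to valid indices.

```python
def eliminate_ranges(num_candidates, ranges):
    """Return alive[k] for each candidate index k after eliminating every
    inclusive range [lo, hi]; runs in O(num_candidates + len(ranges))."""
    diff = [0] * (num_candidates + 1)
    for lo, hi in ranges:
        diff[lo] += 1      # a covered stretch starts at lo
        diff[hi + 1] -= 1  # and ends after hi
    alive, cover = [], 0
    for k in range(num_candidates):
        cover += diff[k]   # number of ranges covering candidate k
        alive.append(cover == 0)
    return alive

print(eliminate_ranges(6, [(1, 2), (2, 4)]))
# [True, False, False, False, False, True]
```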



Sweep Technique

  • Reminder:

    the text window T p-matches P if and only if

    no text location and its predecessor form a mismatch pair,

    and

    P and T contain the same number of distinct characters.

  • What is left?

  • Count the distinct characters for every candidate.

  • Use the algorithm of Amir and Cole, time $O(m^2)$.
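The Amir and Cole algorithm is not reproduced here. As a simpler stand-in, this Python sketch counts distinct characters in every m×m window by sliding a `collections.Counter` along each band of m rows, which takes $O(n^2 m)$ time for an $n \times n$ text. The name `distinct_counts` and the list-of-strings input are assumptions.

```python
from collections import Counter

def distinct_counts(T, m):
    """counts[i][j] = number of distinct characters in the m x m window at (i, j)."""
    n_rows, n_cols = len(T), len(T[0])
    counts = []
    for i in range(n_rows - m + 1):
        band = T[i:i + m]
        window = Counter(ch for row in band for ch in row[:m])
        row_counts = [len(window)]
        for j in range(1, n_cols - m + 1):
            for row in band:  # slide one column to the right
                old, new = row[j - 1], row[j + m - 1]
                window[old] -= 1
                if window[old] == 0:
                    del window[old]  # keep len(window) equal to the distinct count
                window[new] += 1
            row_counts.append(len(window))
        counts.append(row_counts)
    return counts

print(distinct_counts(["abc", "aab", "bbb"], 2))  # [[2, 3], [2, 2]]
```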



Overview

Checking all predecessors takes linear time.

Total time: $O(n^2)$.



Pattern Preprocessing

  • Witness: a mismatch pair between P and a copy of P shifted by (a,b).

+a

+b



Pattern Preprocessing

  • Find the witness table for P in $O(m^{2.5}\,\mathrm{polylog}\,m)$ time.

  • For every pattern location (i,j), create a list of O( ) pointers.

  • Pointer i points to a predecessor of (i,j) in the lines above it.

  • Reduce to exact matching with don’t cares.



Pattern Preprocessing

  • End cases, multiple cases.

(Figure: end cases, blocks A1 to A4 and B1 to B4, annotated “Less than”)



Open Questions

  • Can the time complexity be reduced to $O(n^2 + m^2)$?

