資料結構
Download
1 / 36

資料結構 與 演算法 - PowerPoint PPT Presentation


  • 128 Views
  • Uploaded on

資料結構 與 演算法. 台大資工系 呂學一 http://www.csie.ntu.edu.tw /~hil/algo/. Today. String matching. The Problem. Input a string P –– the pattern a string S –– the text Output all the occurrences of P in S . Illustration. 1 2 3 4 5 6 7 8 9 0 1 2 3 S = t a t a t t a t a t a t a

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about ' 資料結構 與 演算法' - callia


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript

資料結構與演算法

台大資工系

呂學一

http://www.csie.ntu.edu.tw/~hil/algo/


Today
Today

  • String matching


The problem
The Problem

  • Input

    • a string P –– the pattern

    • a string S –– the text

  • Output

    • all the occurrences of P in S.


Illustration
Illustration

1 2 3 4 5 6 7 8 9 0 1 2 3

  • S = t a t a t t a t a t a t a

  • P =t a t a

  • Output

    1

    6

    8

    10


Notation for strings
Notation for strings

  • S is a string

    • |S| = the length of S.

    • substring: S[i, j] = S[i] S[i + 1] … S[j].


A na ve algorithm
A naïve algorithm

  • Input: S and P.

  • Output: all occurrences of P in S.

    for i=1 to |S|

    if S[i,i+|P|-1] = P

    output i;


Illustration1
Illustration

1 2 3 4 5 6 7 8 9 0 1 2 3

  • S = t a t a t t a t a t a t a

  • P =t a t a

  • Output

    1

    6

    8

    10


Time complexity

Input: S and P.

Output: all occurrences of P in S.

for i=1 to |S|

if S[i, i+|P|-1] = P then

output i;

O(|S|2 |P|)

O(|S| |P|)

O(|P| log|S|)

O(|S| + |P|)

Time complexity?


Tightness
Tightness

  • O(|S| |P|) is tight.

    • Why?

    • S = 1 1 1 1 1 1 1 1 1 1 1

    • P = 1 1 1 1 1 1 1

  • O(|S|2 |P|) is not tight.

    • Why?


Time q s p

Time = Q(|S| |P|)


Can we do better than the na ve algorithm

Can we do better than the naïve algorithm?

Yes, if we are careful enough…


Dan gusfield s z value algorithm

Dan Gusfield’s Z value algorithm


The z values of a string s
The Z values of a string S

  • Z(i) of a string S is the largest integer d such that S[1, d] = S[i, i+d-1]

i+d-1

i

S

Note that d could be zero.


Clearly
Clearly, …

  • If Z(1), Z(2), …, Z(|S|) are the Z values of S, then

    • Z(1) = |S|;

    • Z(i) ≥ 0 for each i = 1, 2, …, |S|.


For example
For example, …

S = a a g c a a t a a a g c

Z = 12 1 0 0 2 1 0 2 4 1 0 0

a a a a a

a a a g c

a


Question

Question

How do we find all occurrences of P in S using Z values (of what)?


String matching with z values
String Matching with Z values

computing Z values of PS;

for i=1 to |S|

if Z(i+|P|)>=|P| then

output i;

i+|P|

i+|P|+d-1

P

S


Time complexity1
Time complexity?

computing Z values of PS;

for i=1 to |S|

if Z(i+n)>=|P| then

output i;

  • O(|S|) + time for computing the Z values of PS.


Computing z i naively
Computing Z(i) naively

Time complexity?

For i=1 to |S| {

let j = i;

let Z(i) = 0;

while (S[j]==S[j-i+1]){

Z(i)++;

j++;

}

}

O(|S|2)

Is it tight?

S = 000…000


Z i can be naively computed in using z i 1 character comparisons
Z(i) can be naively computed in using Z(i)+1 character comparisons

We need this observation later.

For i=1 to |S| {

let j = i;

let Z(i) = 0;

while (S[j]==S[j-i+1]){

Z(i)++;

j++;

}

}


Z values in linear time

Z comparisons values in linear time


Definitions
Definitions comparisons

  • 右護法(i): max{j + Z(j) – 1 | 1<j ≤ i}.

    • Abbreviated as 右(i).

  • 左護法(i): can be any index j with 1<j ≤ i and j + Z(j) – 1 = 右(i).

    • Abbreviated as 左(i).


Illustration2
Illustration comparisons

i – 1 +Z(i – 1) – 1

i+Z(i) – 1

i – 1

i

S


Illustration3
Illustration comparisons

左(i)

1

2

i

右(i)


Q i i
Q: comparisons右(i) ≥ i一定成立嗎?

  • 右(i) 有可能會比 i小,但是右(i)至少大於或等於i – 1。

    • 因為Z(i)有可能等於0,

    • 所以右(i)可能等於i +Z(i) – 1 = i– 1。


Z values
例如 comparisons:假設已知所有Z-values

1 1 1 1 1 1 1 1

1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7

S = a a b a a b c a x a a b a a b c y

Z = 1 0 3 1 0 0 1 0 7 1 0 3 1 0 0 0

1 1 1 1 1 1 1 1

i+Z(i)-1 = 2 2 6 5 5 6 8 86 1 1 5 4 4 5 6

1 1 1 1 1 1 1 1

右(i) = 2 2 6 6 6 6 8 8 6 6 6 6 6 6 6 6

1 1 1 1 1 1 1 1

左(i) = 2 2 4 4 4 4 8 8 0 0 0 0 0 0 0 0

可以很容易算出左右護法


Z values1
問題是 comparisons:想算的正是Z-values

  • Our strategy:

    • Computing Z(i), 右(i), 左(i) from

      • Some of Z(1), Z(2), …, Z(i – 1);

      • 右(i – 1);

      • 左(i – 1).


Case 1 i 1 i 1
Case 1: comparisons右(i-1) ≤ i-1.

  • 右(i-1) does not cover i.

    • (S[i]未能受到右護法的庇護)

  • Computing Z(i) using Z(i)+1 character comparisons

  • 左(i) = i.

  • 右(i) = i + Z(i) – 1.

  • Observation (need this later)

    Z(i) + 1 = 右(i) – i + 2 ≤右(i) – 右(i – 1) + 1.


可能的情況 comparisons

左(i - 1)

右(i-1)

1

2


Case 2 i 1 i z j i 1 i 1
Case 2: comparisons右(i-1) ≥ i & Z(j) < 右(i-1)-i+1.

左(i-1)

右(i-1)

i

i – 左(i-1)+1 = j

Z(j)


Z i z j i i 1 i i 1
Z comparisons(i) = Z(j), 左(i) = 左(i-1), 右(i) = 右(i-1).

左(i-1)

右(i-1)

i

i – 左(i-1)+1 = j

Z(j)


Case 3 i 1 i z j i 1 i 1
Case 3: comparisons右(i-1) ≥ i & Z(j) ≥ 右(i-1)-i+1.

左(i-1)

右(i-1)

i

i – 左(i-1)+1 = j

Z(j)


Finding z i by comparsions starting from i 1 1 why
Finding comparisonsZ(i) by comparsions starting from 右(i-1)+1. Why?

左(i-1)

右(i-1)

i

i – 左(i-1)+1 = j

Z(j)


Computing i and i
Computing comparisons左(i) and 右(i).

  • 右(i) = i + Z(i) -1.

  • 左(i) = i.

  • How many comparisons?

    • 右(i)-右(i-1)+1.

左(i-1)

右(i-1)

i

i – 左(i-1) + 1 = j

Z(j)


Number of comparisons
Number of comparisonscomparisons

  • Case 1:

    • Z(i)+1 = 右(i)-右(i-1)+1.

  • Case 2:

    • 0 = 右(i)-右(i-1).

  • Case 3:

    • 右(i)-右(i -1)+1


Overall number of comparisons

= comparisons

+

O

(|

S

|)

O

(

(|

S

|))

=

O

(|

S

|).

Overall number of comparisons

|

S

|

å

-

-

+

(

(

i

)

(

i

1

)

1

)

=

i

1


ad