
Bit-parallel algorithms for computing all the runs in a string

Kazunori Hirashima¹, Hideo Bannai¹, Wataru Matsubara², Kazuhiko Kusano², Akira Ishino², Ayumi Shinohara²

¹ Kyushu University, Japan

² Tohoku University, Japan



Contents

  • Runs

  • Bit-parallel algorithms for counting runs

    • Counting prefix runs

    • Removing duplicate runs by position

    • Removing duplicates by Sieve

  • Computational Experiments

  • Conclusion



Runs

  • run: an occurrence of a periodic factor that is

    • non-extendable (maximal)

    • exponent at least two

    • primitive-rooted

  • example:

    w = abbabbaccbcbcbc

  • run(w) : number of runs in string w
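As a reference point for the definition above, here is a minimal brute-force sketch in Python. It is not one of the algorithms of this presentation, and the name `count_runs_naive` is hypothetical. For every period p it collects the maximal stretches with w[k] = w[k+p] that are at least p long, and it counts distinct (begin, end) pairs, because a maximal repetition seen under a multiple of its smallest period covers the same interval.

```python
def count_runs_naive(w):
    """Count runs in w by brute force (illustrative, about O(n^2) time)."""
    n = len(w)
    intervals = set()  # (begin, end) of maximal repetitions with exponent >= 2
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            if w[k] != w[k + p]:
                k += 1
                continue
            start = k
            while k + p < n and w[k] == w[k + p]:
                k += 1
            # w[start .. k-1+p] has period p and cannot be extended either way
            if k - start >= p:  # length >= 2p, i.e. exponent at least two
                intervals.add((start, k - 1 + p))
    return len(intervals)


print(count_runs_naive("abbabbaccbcbcbc"))
```

For the example string w above this reports 5 runs: abbabba, bb (twice), cc and cbcbcbc. Indices in the sketch are 0-based.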



Calculating run(w)

  • Linear time algorithm [Kolpakov & Kucherov ’99]

    • requires LZ-factorization of string

  • We present 3 bit-parallel algorithms to calculate run(w)

    • do not require complicated data structures

    • very efficient for short strings




Bit-parallel algorithms for counting runs

For general alphabets:

  • Counting prefix runs

For binary alphabets:

  • Removing duplicate runs by position

  • Removing duplicate runs by Sieve



Algorithm (counting prefix runs)

prefix repetition = a repetition that is also a prefix

prefix run = a run that is also a prefix

  • Idea

    For each suffix:

    • detect right maximal prefix repetitions of each period

    • count only repetitions with exponent at least 2

    • count only left maximal repetitions



Algorithm (counting prefix runs)

Detect right maximal prefix repetitions of each period

example:

w=aabaabaaaacaac

[Figure: two prefix runs of w and the ActiveArea; w[1]=w[4], w[2]=w[4]]



Algorithm (counting prefix runs)

Detect right maximal prefix repetitions of each period

example:

w=aabaabaaaacaac

pseudo code

nextChar = w[i];

bitmask = (occ[nextChar] >> (Length - i)) | ((~0) << i);

alive = alive & bitmask;

[Figure: the alive bit vector after successive steps, with occ[nextChar] shifted by Length - i]



Algorithm (counting prefix runs)

  • Count only repetitions with exponent at least 2

example:

w=aabaabaaaacaac

pseudo code

nextChar = w[i];

bitmask = (occ[nextChar] >> (Length - i)) | ((~0) << i);

prevAlive = alive;

alive = alive & bitmask;

If ((prevAlive ^ alive) & activeArea) ≠ 0 then count++;

If i mod 2 = 1 then

  activeArea := (activeArea << 1) | 1;

[Figure: prevAlive ^ alive marks the periods whose prefix repetition ends at position i; only those inside activeArea have exponent at least 2]



Algorithm (counting prefix runs)

  • Count only left maximal repetitions

w[3:8] looks like a run, but it can be extended to the left, so w[3:8] is not a run.

example:

w=aabaabaaaacaac

w[2]≠w[2+1]

w[2]=w[2+2]

w[2]=w[2+3]
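The three steps of the prefix-run idea can be put together as in the following Python sketch. It is one possible reading of the fragments above, not the authors' code: the names `count_runs_prefix`, `occ` and `rocc` are hypothetical, indices are 0-based, bit p of `alive` stands for period p, and activeArea is recomputed at each step instead of being shifted incrementally. Python integers are unbounded, so no word-size limit appears here.

```python
def count_runs_prefix(w):
    """Count runs by following, for every suffix, all prefix periods at once."""
    n = len(w)
    occ = {}   # occ[c]:  bit j set      iff w[j] == c
    rocc = {}  # rocc[c]: bit n-1-j set  iff w[j] == c (reversed order)
    for j, c in enumerate(w):
        occ[c] = occ.get(c, 0) | (1 << j)
        rocc[c] = rocc.get(c, 0) | (1 << (n - 1 - j))

    count = 0
    for start in range(n - 1):
        m = n - start                        # length of the suffix w[start:]
        alive = ((1 << (m // 2)) - 1) << 1   # bit p set: period p still alive
        if start > 0:
            # Left maximality: drop periods p with w[start-1] == w[start-1+p].
            alive &= ~(occ[w[start - 1]] >> (start - 1))
        for i in range(1, m):
            c = w[start + i]
            same = rocc[c] >> (n - 1 - start - i)     # bit p: w[start+i-p] == c
            untested = -1 << (i + 1)                  # periods longer than i
            prev_alive = alive
            alive &= same | untested
            active_area = ((1 << (i // 2)) - 1) << 1  # periods p with 2p <= i
            if (prev_alive ^ alive) & active_area:
                count += 1   # a prefix run w[start .. start+i-1] ends here
            if alive == 0:
                break
        else:
            if alive:
                count += 1   # a prefix run reaches the end of the string
    return count


print(count_runs_prefix("aabaabaaaacaac"))
```

For the example string w=aabaabaaaacaac the sketch counts 6 runs. Each (suffix, end position) pair is counted at most once, which is why several periods dying in the same step do not produce duplicates.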



Algorithm (binary strings)

Idea

  • detect maximal repetitions for each period 1, 2, ..., |w|/2.

  • count only repetitions with exponent at least 2.

  • count only repetitions of minimum period



Algorithm (efficient algorithm for binary strings)

Detect maximal repetitions for each period 1, 2, ..., |w|/2.

  • v = w ^ ((~w) >> p)

  • Example: p = 3

[Figure: w, ~w shifted right by p, and their XOR v]

maximal repetition of period p in w ⇔ stretch of 1’s in v
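A small Python sketch of this step, assuming the leftmost character of w is the highest bit of the integer. The helper name `period_matches` is hypothetical, and so is the explicit mask that clears the first p positions (they have no partner p places to the left).

```python
def period_matches(w, p):
    """Return v as a bit string: v[j] == '1' iff j >= p and w[j] == w[j - p]."""
    n = len(w)                             # w: non-empty string over '0' and '1'
    full = (1 << n) - 1                    # n ones
    x = int(w, 2)                          # leftmost character = highest bit
    not_x = ~x & full                      # complement restricted to n bits
    v = (x ^ (not_x >> p)) & (full >> p)   # clear the first p positions
    return format(v, "0{}b".format(n))


print(period_matches("00110011111111", 4))  # 00001111001111
```

Each stretch of k consecutive 1's in the result corresponds to a maximal repetition of period p and length k + p.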



Algorithm (binary strings)

Delete repetitions with exponent less than 2.

  • v = w ^ ((~w) >> p)

  • Example: p = 3

[Figure: w, ~w shifted right by p, and v; one stretch of 1’s in v is too short to be a run of period p = 3]

A stretch of 1’s must have length at least p = 3.



Algorithm (efficient algorithm for binary strings)

Delete repetitions with exponent less than 2.

s = v;

While (p > 1)

  s = s & (v >> (p - 1));

  p--;

END

v = s;

This calculation shortens each stretch of 1’s by p-1: v is ANDed with copies of itself shifted by p-1, ..., 2, 1.



Algorithm (efficient algorithm for binary strings)

Delete repetitions with exponent less than 2.

  • Example

    • v = 00111111110010, p = 7

  • selfAND(v,p)

  • While p>1

    s = p>>1;

    v = v & (v>>s);

    p = p - s;

    END

Shifting by about half of the remaining amount in each step reduces the number of AND operations from O(p) to O(log p).
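The two selfAND variants can be compared directly. The sketch below assumes v is a non-negative Python integer; `self_and_linear` is a hypothetical name for the O(p) loop, and `self_and` follows the halving loop shown above.

```python
def self_and_linear(v, p):
    """AND v with its copies shifted by 1, ..., p-1: O(p) operations."""
    s = v
    for shift in range(1, p):
        s &= v >> shift
    return s


def self_and(v, p):
    """Same result with O(log p) operations, following the halving loop."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


v = 0b00111111110010
print(format(self_and(v, 7), "014b"))  # 00000000110000
```

In the example the stretch of eight 1's is shortened by p - 1 = 6 to two 1's, and the isolated 1 disappears. The halving version works because the shifts it applies (3, 2, 1 for p = 7) combine to every offset from 0 to p - 1.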



Algorithm (efficient algorithm for binary strings)

  • Example: w = 00110011111111, p = 4

    • v = w ^ ((~w) >> p) = 00001111001111 (first p bits cleared)

    • selfAND(v,p) = 00000001000001

The second remaining bit belongs to a run with minimum period 1, which is found again for p = 4. We need to remove such duplicates.

  • 2 approaches to remove repetitions of non-minimum periods:

    • Removing duplicates by Position

    • Removing duplicates by Sieve



Algorithm (Removing duplicates by Position)

Only count maximal repetitions with different begin and end positions.

For period = 1 to length/2 do

  v = (w ^ ((~w) >> period)) & (((1 << length) - 1) >> period);

  x = SelfAND(v, period);

  While x ≠ 0 do

    begPos = lsb(x);

    y = x + (1 << begPos);

    x = x & y;

    y = y & (-y);

    y = y << ((period - 1) << 1);

    If (runEndsByBegPos[begPos] & y) = 0 then

      count++;

      runEndsByBegPos[begPos] = runEndsByBegPos[begPos] | y;

    End

  End

End

[Figure: the same maximal repetition of w appears in both w^((~w)>>2) and w^((~w)>>4), with the same begin position and end position]
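The position-based loop can be tried out with the following Python sketch. It assumes w is a non-empty string over '0' and '1' whose leftmost character is the highest bit, so "begin" and "end" refer to bit indices counted from the right. The names `count_runs_by_position` and `run_ends` are hypothetical, and `self_and` is the halving loop from the earlier slide.

```python
def self_and(v, p):
    """Shorten every stretch of 1's in v by p - 1 (O(log p) steps)."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def count_runs_by_position(w):
    """Count runs of a binary string, skipping repeated (begin, end) pairs."""
    n = len(w)
    full = (1 << n) - 1
    x_w = int(w, 2)
    run_ends = [0] * n   # run_ends[b]: bit e set iff interval (b, e) was counted
    count = 0
    for period in range(1, n // 2 + 1):
        v = (x_w ^ ((~x_w & full) >> period)) & (full >> period)
        x = self_and(v, period)
        while x:
            beg = (x & -x).bit_length() - 1   # lowest set bit: begin position
            y = x + (1 << beg)                # carry runs through the lowest stretch
            x &= y                            # remove that stretch from x
            y &= -y                           # bit just above the stretch
            y <<= (period - 1) << 1           # move it to the end of the repetition
            if (run_ends[beg] & y) == 0:
                count += 1
                run_ends[beg] |= y
    return count


print(count_runs_by_position("00110011111111"))  # 5
```

The five runs in this example are 00, 11, 00, 11111111 and 00110011. The run of eight 1's is found for periods 1 to 4 but recorded only once.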



Algorithm (Removing duplicates by Sieve)

  • Example: w=11110101010

For period = 1 to length/2 do

  pvec[period] = w ^ ((~w) >> period);

End

For period = 1 to length/2 do

  x = SelfAND(pvec[period], period);

  count = count + oneRuns(x);

  For p = 2*period to length/2 step period do

    x = x & (x >> period);

    If x = 0 then break

    pvec[p] = pvec[p] ^ x;

  End

End

[Figure: runs found in w^((~w)>>1) are deleted by XOR from the vectors of larger periods w^((~w)>>2), w^((~w)>>3), w^((~w)>>4), ...]



Algorithm (Removing duplicates by Sieve)

oneRuns(v): bit operations to count the number of stretches of 1’s

  • count=0;

    While (v ≠ 0)

    v = v & ((v | (v - 1)) + 1);

    count++;

    END

  • Example: v = 100111011

    • v | (v - 1) = 100111011

    • (v | (v - 1)) + 1 = 100111100

    • v & ((v | (v - 1)) + 1) = 100111000

    • v | (v - 1) = 100111111

    • v & ((v | (v - 1)) + 1) = 100000000

    • v & ((v | (v - 1)) + 1) = 000000000
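The sieve and oneRuns can be combined as in the Python sketch below. It makes a few assumptions: w is a non-empty string over '0' and '1', the first `period` positions of each vector are masked out, and the inner loop visits only multiples of `period`. The names `count_runs_by_sieve` and `one_runs` are hypothetical spellings of the pseudo code names.

```python
def self_and(v, p):
    """Shorten every stretch of 1's in v by p - 1."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def one_runs(v):
    """Number of maximal stretches of 1's in v."""
    count = 0
    while v:
        v &= (v | (v - 1)) + 1   # clear the lowest stretch of 1's
        count += 1
    return count


def count_runs_by_sieve(w):
    """Count runs of a binary string, deleting duplicates in larger periods."""
    n = len(w)
    full = (1 << n) - 1
    x_w = int(w, 2)
    half = n // 2
    pvec = [0] * (half + 1)
    for period in range(1, half + 1):
        pvec[period] = (x_w ^ ((~x_w & full) >> period)) & (full >> period)
    count = 0
    for period in range(1, half + 1):
        x = self_and(pvec[period], period)
        count += one_runs(x)
        for p in range(2 * period, half + 1, period):
            x &= x >> period
            if x == 0:
                break
            pvec[p] ^= x         # what is left is too short to count for p
    return count


print(one_runs(0b100111011))                  # 3
print(count_runs_by_sieve("00110011111111"))  # 5
```

After the XOR a stretch may keep up to period - 1 leading 1's in pvec[p]. That remainder is shorter than p, so the later SelfAND(pvec[p], p) removes it.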




Computational Experiments

Calculate run(w) for all binary strings of length n

  • CPU: 3.2GHz dual core Xeon

  • GPU: GeForce 8800GT

  • Memory: 18GB

  • OS: Mac OS X 10.5 Leopard



Computational Experiments

GPU

Use the programming tool CUDA

count = 0

For period = 1 to length/2 do

  pvec[period] = w ^ ((~w) >> period);

End

For period = 1 to length/2 do

  x = SelfAND(pvec[period], period);

  count = count + oneRuns(x);

  For p = 2 * period to length/2 step period do

    x = x & (x >> period);

    If x = 0 then break

    pvec[p] = pvec[p] ^ x;

  End

End

[Figure: GPU architecture with several multiprocessors, each containing stream processors]



Computational Experiments

Running time (seconds) for calculating run(w) for all binary strings of length n



Computational Experiments

The maximum number of runs function ρ(n) = max { run(w) : |w| = n } for binary strings, calculated for n up to 47

[Table of ρ(n) values, with parts labelled “Kolpakov & Kucherov ’99” and “New!”]



Lower and Upper bounds of ρ(n)

Upper bounds:

  • cn [Kolpakov & Kucherov ’99]

  • 5n [Rytter ’06]

  • 3.48n [Puglisi et al. ’08]

  • 3.44n [Rytter ’07]

  • 1.6n [Crochemore & Ilie ’08]

  • 1.029n [Crochemore et al. ’08]

Lower bounds:

  • 0.927n [Franek & Simpson ’06]

  • 0.944565n [Matsubara et al. ’08]

  • 0.94457571235n [Matsubara et al. ’09] [Simpson ’09]



Computational Experiments

f(n, r) : number of binary strings of length n with r runs

n f(n, 1) f(n, 2) f(n, 3) f(n, 4)

2 2 0 0 0

3 6 0 0 0

4 14 2 0 0

5 18 14 0 0

6 18 38 8 0

7 20 66 38 4

8 20 98 102 34

9 20 138 202 130

10 20 170 376 306

11 20 210 596 682

12 20 242 880 1314

13 20 282 1220 2296

14 20 314 1622 3736

15 20 354 2080 5686

16 20 386 2598 8260

17 20 426 3174 11562

18 20 458 3808 15642

19 20 498 4502 20626

20 20 530 5252 26574

21 20 570 6064 33590

22 20 602 6930 41754

23 20 642 7860 51184

24 20 674 8842 61898

25 20 714 9890 74070

26 20 746 10988 87732

27 20 786 12154 103000

28 20 818 13368 119922

29 20 858 14652 138664

30 20 890 15982 159216

31 20 930 17384 181764

32 20 962 18830 206308

33 20 1002 20350 233012

34 20 1034 21912 261896

35 20 1074 23550 293138

36 20 1106 25228 326696

37 20 1146 26984 362804

38 20 1178 28778 401434

39 20 1218 30652 442762

40 20 1250 32562 486776

41 20 1290 34554 533702

42 20 1322 36580 583470

f(n, 2) = f(n - 2, 2) + 72 for n ≥ 9.

f(n, 3) = 2f(n - 2, 3) - f(n - 4, 3) + 234 for n ≥ 16.
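For small n the values of f(n, r) can be recomputed by plain enumeration. The Python sketch below uses a brute-force run counter instead of the bit-parallel algorithms, and the names `count_runs` and `run_histogram` are hypothetical. It visits all 2^n binary strings, so it is only practical for small n.

```python
from collections import Counter
from itertools import product


def count_runs(w):
    """Brute-force run counter: distinct (begin, end) of maximal repetitions."""
    n = len(w)
    found = set()
    for p in range(1, n // 2 + 1):
        k = 0
        while k + p < n:
            start = k
            while k + p < n and w[k] == w[k + p]:
                k += 1
            if k - start >= p:       # repetition of length >= 2p
                found.add((start, k - 1 + p))
            if k == start:           # mismatch at k: move on
                k += 1
    return len(found)


def run_histogram(n):
    """f(n, r) for all r, as a Counter mapping r to a number of strings."""
    return Counter(count_runs(bits) for bits in product("01", repeat=n))


for n in range(4, 8):
    print(n, sorted(run_histogram(n).items()))
```

For n = 4 and n = 5 this gives f(4, 1) = 14, f(4, 2) = 2, f(5, 1) = 18 and f(5, 2) = 14, in agreement with the table.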



Conclusion

  • We presented 3 bit-parallel algorithms for efficiently computing all the runs in short strings.

    • O(n^2) time if n = O(word size)

    • First algorithm

      can be used for strings over larger alphabets, at some cost

    • The latter two algorithms

      are specialized for binary strings* and very efficient

      * We recently noticed that they can be adapted to handle larger alphabets

  • Calculated ρ(n) for binary strings of length up to n = 47

