# Bit-parallel algorithms for computing all the runs in a string


Kazunori Hirashima¹, Hideo Bannai¹, Wataru Matsubara², Kazuhiko Kusano², Akira Ishino², Ayumi Shinohara²

¹ Kyushu University, Japan

² Tohoku University, Japan

Contents
• Runs
• Bit-parallel algorithms for counting runs
• Counting prefix runs
• Removing duplicate runs by position
• Removing duplicate runs by Sieve
• Computational Experiments
• Conclusion
Runs
• run: an occurrence of a periodic factor that is
  • non-extendable (maximal)
  • of exponent at least two
  • primitive-rooted (its root is not a power of a shorter word)
• example:

w = abbabbaccbcbcbc

• run(w): number of runs in string w
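As a reference point for the bit-parallel methods, here is a minimal brute-force sketch of run(w) in Python (the name count_runs_naive is ours, not the authors'). For every period p it scans for maximal stretches of positions with w[i] = w[i+p]; a stretch of length L is a maximal repetition of length L + p, whose exponent is at least 2 exactly when L ≥ p. A repetition whose smallest period is a proper divisor of p occupies the same begin and end positions, so keeping (begin, end) pairs in a set leaves one entry per run. It uses O(n²) character comparisons.

```python
def count_runs_naive(w: str) -> int:
    """Brute-force run(w): collect each run once as a (begin, end) interval."""
    n = len(w)
    runs = set()
    for p in range(1, n // 2 + 1):
        i = 0
        while i + p < n:
            if w[i] != w[i + p]:
                i += 1
                continue
            start = i
            # extend the stretch of positions where w[i] == w[i + p]
            while i + p < n and w[i] == w[i + p]:
                i += 1
            # w[start .. i + p - 1] has period p and length (i - start) + p;
            # its exponent is at least 2 exactly when i - start >= p
            if i - start >= p:
                runs.add((start, i + p - 1))
    return len(runs)
```

On the example w = abbabbaccbcbcbc it returns 5: bb (twice), abbabba, cc and cbcbcbc.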
Calculating run(w)
• Linear-time algorithm [Kolpakov & Kucherov '99]
• requires the LZ-factorization of the string
• We present 3 bit-parallel algorithms to calculate run(w)
• they do not require complicated data structures
• they are very efficient for short strings
Bit-parallel algorithms for counting runs

For a general alphabet:

• Counting prefix runs

For a binary alphabet:

• Removing duplicate runs by position
• Removing duplicate runs by Sieve
Algorithm (counting prefix runs)

prefix repetition = a repetition that is also a prefix

prefix run = a run that is also a prefix

• Idea

For each suffix:

• detect right-maximal prefix repetitions of each period
• count only repetitions with exponent at least 2
• count only left-maximal repetitions
Algorithm (counting prefix runs)

Detect right-maximal prefix repetitions of each period

[Figure: two prefix runs of the example string, with the ActiveArea marked]

example:

w = aabaabaaaacaac

w[1] = w[4], w[2] = w[4]

Algorithm (counting prefix runs)

Detect right-maximal prefix repetitions of each period

pseudo code

example:

w = aabaabaaaacaac

nextChar = w[i];
bitmask = (occ[nextChar] >> (Length - i)) | ((~0) << i);
・・・

[Figure: the alive bits of each period, with occ[nextChar] shifted by Length - i]

Algorithm (counting prefix runs)

pseudo code

• Count only repetitions with exponent at least 2

nextChar = w[i];
bitmask = (occ[nextChar] >> (Length - i)) | ((~0) << i);
prevAlive = alive;
alive = alive & bitmask;
If ((prevAlive ^ alive) & ActiveArea) ≠ 0 then count++;
If i mod 2 = 1 then
    ActiveArea := (ActiveArea << 1) | 1;

example:

w = aabaabaaaacaac

[Figure: prevAlive ^ alive for the example]

Algorithm (counting prefix runs)
• Count only left-maximal repetitions

w[3:8] looks like a run, but it can be extended to the left, so w[3:8] is not a run.

example:

w = aabaabaaaacaac

w[2] ≠ w[2+1]

w[2] = w[2+2]

w[2] = w[2+3]
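The three steps can be put together in a short Python sketch. It is our own reconstruction, not the authors' code: the per-character bit vectors fwd and rev stand in for occ, bit p of alive says that the current prefix of the suffix still has period p, and active holds the periods with 2p ≤ i. All periods that stop at the same step and lie in active describe the same interval, so at most one run is counted per (suffix, end position); left-maximality is tested against the character just before the suffix. Python's unbounded integers stand in for machine words; with n = O(word size) each suffix costs O(n) word operations.

```python
def count_runs_prefix(w: str) -> int:
    """run(w) by counting, for every suffix, its left-maximal prefix runs."""
    n = len(w)
    # fwd[c]: bit j set iff w[j] == c;  rev[c]: bit j set iff w[n-1-j] == c
    fwd: dict[str, int] = {}
    rev: dict[str, int] = {}
    for j, c in enumerate(w):
        fwd[c] = fwd.get(c, 0) | (1 << j)
        rev[c] = rev.get(c, 0) | (1 << (n - 1 - j))
    count = 0
    for s in range(n):
        m = n - s                                  # suffix u = w[s:]
        periods = ((1 << (m // 2 + 1)) - 1) & ~1   # bit p for p = 1 .. m//2
        if not periods:
            continue
        alive = periods                            # u[0 .. i-1] has period p
        active = 0                                 # bit p set once 2p <= i
        if s == 0:
            not_left_ext = periods
        else:  # period p extends to the left iff w[s-1] == w[s-1+p]
            not_left_ext = periods & ~(fwd[w[s - 1]] >> (s - 1))
        for i in range(1, m + 1):
            if i % 2 == 0:
                active |= 1 << (i // 2)
            if i < m:
                a = s + i                          # absolute position of u[i]
                match = rev[w[a]] >> (n - 1 - a)   # bit p: w[a - p] == w[a]
                # periods p > i have nothing to compare yet, so keep them
                unconstrained = periods & ~((1 << (i + 1)) - 1)
                new_alive = alive & (match | unconstrained)
            else:
                new_alive = 0                      # the suffix ends here
            died = alive & ~new_alive              # right-maximal at length i
            # every dying period in `active` gives the same interval
            if died & active & not_left_ext:
                count += 1
            alive = new_alive
            if not alive:
                break
    return count
```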

Algorithm (binary strings)

Idea

• detect maximal repetitions for each period 1, 2, ..., |w|/2
• count only repetitions with exponent at least 2
• count only repetitions of minimum period
Algorithm (efficient algorithm for binary strings)

Detect maximal repetitions for each period 1, 2, ..., |w|/2.

• v = w ^ ((~w) >> p)
• Example: p = 3

[Figure: rows w, w XOR ~w and v for p = 3]

Maximal repetition of period p in w ⇔ stretch of 1's in v
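A small Python sketch of this step, assuming character i of the binary string is stored in bit i of an integer (least significant bit first; the slides may print bits in the other order). Bit i of v is 1 exactly when w_i = w_(i+p). The mask ((1 << n) - 1) >> p, the same mask the position-based algorithm applies, drops the comparisons that would run past the end of the string. The names from_bits, repetition_vector, stretches and maximal_repetitions are illustrative.

```python
def from_bits(s: str) -> int:
    """Pack a binary string into an int, character i -> bit i (LSB first)."""
    return int(s[::-1], 2)


def repetition_vector(w: int, n: int, p: int) -> int:
    """Bit i is 1 iff w_i == w_(i+p), for 0 <= i < n - p."""
    # w ^ ((~w) >> p) is an XNOR of w with w shifted by p; the mask clears
    # the bits i >= n - p, which would compare against positions past the end
    return (w ^ ((~w) >> p)) & (((1 << n) - 1) >> p)


def stretches(v: int) -> list[tuple[int, int]]:
    """(start, length) of each maximal stretch of 1 bits, lowest first."""
    out, pos = [], 0
    while v:
        zeros = (v & -v).bit_length() - 1        # skip trailing zeros
        v >>= zeros
        pos += zeros
        ones = (~v & (v + 1)).bit_length() - 1   # count trailing ones
        out.append((pos, ones))
        v >>= ones
        pos += ones
    return out


def maximal_repetitions(w: int, n: int, p: int) -> list[tuple[int, int]]:
    """(begin, end) of every maximal repetition of period p, any exponent."""
    return [(b, b + length + p - 1)
            for b, length in stretches(repetition_vector(w, n, p))]
```

For w = 00110011111111 and p = 4 this yields (0, 7) and (6, 13), that is 00110011 and 11111111; the second has exponent 2 at period 4 but its minimum period is 1.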

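Algorithm (binary strings)

Delete repetitions with exponent less than 2.

• v = w ^ ((~w) >> p)
• Example: p = 3

[Figure: rows w, w XOR ~w and v for p = 3; repetitions of length 5 and 7 give stretches of 1's of length 2 = 5 - 3 and 4 = 7 - 3 in v]

The stretch of length 2 is too short to be a run of period p = 3: a stretch of 1's must have length at least p = 3.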

Algorithm (efficient algorithm for binary strings)

Delete repetitions with exponent less than 2.

s = v;
While (p > 1)
    p--;
    s = s & (v >> p);
END
v = s;

This calculation shortens each stretch of 1's by p - 1.

[Figure: v ANDed with v shifted by p - 1, ..., 2, 1]
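In Python, ANDing v with v >> k for every k from p - 1 down to 1 looks like this (self_and_linear is an illustrative name). Bit i survives only if bits i, ..., i + p - 1 of v are all 1, so a stretch of length L ≥ p keeps its lowest L - p + 1 bits and shorter stretches vanish.

```python
def self_and_linear(v: int, p: int) -> int:
    """Shorten every stretch of 1 bits in v by p - 1 (O(p) word operations)."""
    s = v
    for k in range(p - 1, 0, -1):   # shifts p-1, ..., 2, 1
        s &= v >> k
    return s
```

For v = 00111111110010 and p = 7 (read as an ordinary binary number), the stretch of eight 1's shrinks to two and the isolated 1 disappears, giving 110000.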

Algorithm (efficient algorithm for binary strings)

Delete repetitions with exponent less than 2.

• Example: v = 00111111110010, p = 7
• selfAND(v, p):

While (p > 1)
    s = p >> 1;
    v = v & (v >> s);
    p = p - s;
END

O(p) → O(log p).
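A Python sketch of the logarithmic version (self_and_log is an illustrative name). After all steps, bit i is the AND of v at i + d for every d that is a sum of some subset of the shifts used. Each shift ⌊p/2⌋ is at most one more than the total of the shifts still to come, and all shifts add up to p - 1, so these sums are exactly 0, ..., p - 1 and the result equals the O(p) loop.

```python
def self_and_log(v: int, p: int) -> int:
    """Same result as ANDing v with v >> 1, ..., v >> (p - 1), in O(log p) steps."""
    while p > 1:
        s = p >> 1      # shift by half of the shortening still owed
        v &= v >> s     # every stretch of 1s loses s bits at its high end
        p -= s          # (new p) - 1 bits of shortening remain
    return v
```

For p = 7 the shifts are 3, 2 and 1, i.e. three word operations instead of six.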

Algorithm (efficient algorithm for binary strings)

• Example: w = 00110011111111, p = 4
• v = w ^ ((~w) >> p) = 000011110001111
• selfAND(v, p) = 000000010000001

One of the two surviving stretches is a run whose minimum period is 1, so it would be counted again at period 4. We need to remove such duplicates.

• 2 approaches to remove repetitions of non-minimum periods:
  • Removing duplicates by position
  • Removing duplicates by Sieve
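Algorithm (removing duplicates by position)

For period = 1 to length/2 do
    v = (w ^ ((~w) >> period)) & (((1 << length) - 1) >> period);
    x = SelfAND(v, period);
    While x ≠ 0 do
        begPos = lsb(x);                // start of the lowest stretch of 1's
        y = x + (1 << begPos);          // carry clears the stretch, sets the bit after it
        x = x & y;                      // remove that stretch from x
        y = y & (-y);                   // isolate the bit after the stretch
        y = y << ((period - 1) << 1);   // move it to the end position of the run
        If (runEndsByBegPos[begPos] & y) = 0 then
            count++;
        runEndsByBegPos[begPos] = runEndsByBegPos[begPos] | y;
    End
End

Only count maximal repetitions with different (begin, end) position pairs.

[Figure: w, w^((~w)>>2) and w^((~w)>>4), with begin and end positions marked for periods 2 and 4]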
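A Python sketch of the position-based method, assuming the LSB-first layout (bit i holds character i) and our own names (run_ends_by_beg for runEndsByBegPos). If [b, e] is a stretch that survives SelfAND at period p, the repetition is w[b .. e + 2p - 1]: adding 1 << b carries through the stretch and sets bit e + 1, y & -y isolates that bit, and shifting by 2(p - 1) moves it to the end position. Periods are processed in increasing order, so a repetition of non-minimum period finds its (begin, end) pair already recorded.

```python
def self_and(v: int, p: int) -> int:
    """Shorten each stretch of 1 bits in v by p - 1 (logarithmic version)."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def count_runs_by_position(w: int, n: int) -> int:
    """run(w) for a binary string of length n stored LSB-first in w."""
    run_ends_by_beg = [0] * n     # bit e of entry b: run w[b..e] already counted
    count = 0
    for period in range(1, n // 2 + 1):
        v = (w ^ ((~w) >> period)) & (((1 << n) - 1) >> period)
        x = self_and(v, period)
        while x:
            beg = (x & -x).bit_length() - 1     # lsb(x)
            y = x + (1 << beg)        # clears stretch [beg, e], sets bit e + 1
            x &= y                    # so this removes the stretch from x
            y &= -y                   # isolate bit e + 1
            y <<= (period - 1) << 1   # bit e + 2*period - 1: the run's end
            if (run_ends_by_beg[beg] & y) == 0:
                count += 1
            run_ends_by_beg[beg] |= y
    return count
```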

Algorithm (removing duplicates by Sieve)
• Example: w = 11110101010

For period = 1 to length/2 do
    pvec[period] = (w ^ ((~w) >> period)) & (((1 << length) - 1) >> period);
End

For period = 1 to length/2 do
    x = SelfAND(pvec[period], period);
    count = count + oneRuns(x);
    For p = 2*period to length/2 step period do   // multiples of period only
        x = x & (x >> period);
        If x = 0 then break
        pvec[p] = pvec[p] ^ x;                     // delete runs in larger periods
    End
End

[Figure: w^((~w)>>1), w^((~w)>>2), w^((~w)>>3), w^((~w)>>4), ...; runs found at one period are XORed out of the vectors of larger periods]
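A Python sketch of the sieve, under the same LSB-first assumption and with illustrative names. In this sketch every vector compares w with w shifted by `period` (as the w^((~w)>>1), w^((~w)>>2), ... labels indicate), is masked to its first n - period bits, and the inner loop visits only the multiples 2·period, 3·period, .... At the k-th multiple of a run's minimum period q, x holds a prefix of that run's stretch in pvec[kq] that is q - 1 bits shorter, so the XOR leaves at most q - 1 < kq bits, which SelfAND then removes.

```python
def self_and(v: int, p: int) -> int:
    """Shorten each stretch of 1 bits in v by p - 1 (logarithmic version)."""
    while p > 1:
        s = p >> 1
        v &= v >> s
        p -= s
    return v


def one_runs(v: int) -> int:
    """Number of maximal stretches of 1 bits in v >= 0."""
    count = 0
    while v:
        v &= (v | (v - 1)) + 1      # clear the lowest stretch
        count += 1
    return count


def count_runs_by_sieve(w: int, n: int) -> int:
    """run(w) for a binary string of length n stored LSB-first in w."""
    half = n // 2
    pvec = [0] * (half + 1)
    for period in range(1, half + 1):
        # bit i: w_i == w_(i + period), for i < n - period
        pvec[period] = (w ^ ((~w) >> period)) & (((1 << n) - 1) >> period)
    count = 0
    for period in range(1, half + 1):
        x = self_and(pvec[period], period)
        count += one_runs(x)
        # delete these runs from the vectors of the multiples of `period`
        for p in range(2 * period, half + 1, period):
            x &= x >> period
            if x == 0:
                break
            pvec[p] ^= x
    return count
```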

Algorithm (removing duplicates by Sieve)

oneRuns(v):
count = 0;
While (v ≠ 0)
    v = v & ((v | (v - 1)) + 1);
    count++;
END

• Example: v = 100111011
• v | (v - 1) = 100111011
• (v | (v - 1)) + 1 = 100111100
• v & ((v | (v - 1)) + 1) = 100111000
• v | (v - 1) = 100111111
• v & ((v | (v - 1)) + 1) = 100000000
• v & ((v | (v - 1)) + 1) = 000000000

Bit operations to count the number of stretches of 1's (here 3).
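The same loop in Python, with a tracing variant that records v after each iteration (one_runs and one_runs_trace are illustrative names). v | (v - 1) fills the trailing zeros with 1's, so adding 1 carries through the lowest stretch of 1's and sets the bit just above it; ANDing with v then clears exactly that stretch, one stretch per iteration.

```python
def one_runs(v: int) -> int:
    """Count maximal stretches of 1 bits in a non-negative integer v."""
    count = 0
    while v:
        # fill trailing zeros, add 1 to carry through the lowest stretch,
        # then AND to clear that stretch from v
        v &= (v | (v - 1)) + 1
        count += 1
    return count


def one_runs_trace(v: int) -> list[int]:
    """Values of v after each iteration of the loop in one_runs."""
    states = []
    while v:
        v &= (v | (v - 1)) + 1
        states.append(v)
    return states
```

For v = 0b100111011 the trace is 100111000, 100000000, 0, so one_runs returns 3.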

Computational Experiments

Calculate run(w) for all binary strings of length n

• CPU: 3.2 GHz dual-core Xeon
• GPU: GeForce 8800 GT
• Memory: 18 GB
• OS: Mac OS X 10.5 Leopard
Computational Experiments

GPU

Use the programming tool CUDA

count = 0

For period = 1 to length/2 do
    pvec[period] = (w ^ ((~w) >> period)) & (((1 << length) - 1) >> period);
End

For period = 1 to length/2 do
    x = SelfAND(pvec[period], period);
    count = count + oneRuns(x);
    For p = 2*period to length/2 step period do
        x = x & (x >> period);
        If x = 0 then break
        pvec[p] = pvec[p] ^ x;
    End
End

[Figure: GPU architecture: multiprocessors, each made of stream processors]

Computational Experiments

[Table: running time (seconds) for calculating run(w) for all binary strings of length n]

Computational Experiments

The maximum number of runs function ρ(n) = max{ run(w) : |w| = n } for binary strings, calculated for n up to 47.

[Table: ρ(n) for binary strings; earlier values from Kolpakov & Kucherov '99, new values marked "New!"]

Lower and upper bounds of ρ(n)

• Upper bounds: cn [Kolpakov & Kucherov '99], 5n [Rytter '06], 3.48n [Puglisi et al. '08], 3.44n [Rytter '07], 1.6n [Crochemore & Ilie '08], 1.029n [Crochemore et al. '08]
• Lower bounds: 0.927n [Franek & Simpson '06], 0.944565n [Matsubara et al. '08], 0.94457571235n [Matsubara et al. '09; Simpson '09]

[Figure: the bounds plotted on an axis from 0 to 5n, with an enlarged view from 0.90n to 1.05n]

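Computational Experiments

f(n, r): number of binary strings of length n with r runs

| n | f(n, 1) | f(n, 2) | f(n, 3) | f(n, 4) |
|---|---|---|---|---|
| 2 | 2 | 0 | 0 | 0 |
| 3 | 6 | 0 | 0 | 0 |
| 4 | 14 | 2 | 0 | 0 |
| 5 | 18 | 14 | 0 | 0 |
| 6 | 18 | 38 | 8 | 0 |
| 7 | 20 | 66 | 38 | 4 |
| 8 | 20 | 98 | 102 | 34 |
| 9 | 20 | 138 | 202 | 130 |
| 10 | 20 | 170 | 376 | 306 |
| 11 | 20 | 210 | 596 | 682 |
| 12 | 20 | 242 | 880 | 1314 |
| 13 | 20 | 282 | 1220 | 2296 |
| 14 | 20 | 314 | 1622 | 3736 |
| 15 | 20 | 354 | 2080 | 5686 |
| 16 | 20 | 386 | 2598 | 8260 |
| 17 | 20 | 426 | 3174 | 11562 |
| 18 | 20 | 458 | 3808 | 15642 |
| 19 | 20 | 498 | 4502 | 20626 |
| 20 | 20 | 530 | 5252 | 26574 |
| 21 | 20 | 570 | 6064 | 33590 |
| 22 | 20 | 602 | 6930 | 41754 |
| 23 | 20 | 642 | 7860 | 51184 |
| 24 | 20 | 674 | 8842 | 61898 |
| 25 | 20 | 714 | 9890 | 74070 |
| 26 | 20 | 746 | 10988 | 87732 |
| 27 | 20 | 786 | 12154 | 103000 |
| 28 | 20 | 818 | 13368 | 119922 |
| 29 | 20 | 858 | 14652 | 138664 |
| 30 | 20 | 890 | 15982 | 159216 |
| 31 | 20 | 930 | 17384 | 181764 |
| 32 | 20 | 962 | 18830 | 206308 |
| 33 | 20 | 1002 | 20350 | 233012 |
| 34 | 20 | 1034 | 21912 | 261896 |
| 35 | 20 | 1074 | 23550 | 293138 |
| 36 | 20 | 1106 | 25228 | 326696 |
| 37 | 20 | 1146 | 26984 | 362804 |
| 38 | 20 | 1178 | 28778 | 401434 |
| 39 | 20 | 1218 | 30652 | 442762 |
| 40 | 20 | 1250 | 32562 | 486776 |
| 41 | 20 | 1290 | 34554 | 533702 |
| 42 | 20 | 1322 | 36580 | 583470 |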

f(n, 2) = f(n – 2, 2) + 72 for n 9.

f(n, 3) = 2f(n – 2, 3)- f(n – 4, 3) + 234 for n  16.
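Both recurrences can be checked mechanically against the tabulated values. The Python sketch below copies the f(n, 2) and f(n, 3) columns for n = 2, ..., 42 from the table (nothing beyond n = 42 is assumed) and lists the n at which each recurrence holds; the names F2, F3, f2_ok and f3_ok are illustrative.

```python
# Columns f(n, 2) and f(n, 3) for n = 2, ..., 42, copied from the table.
F2 = [0, 0, 2, 14, 38, 66, 98, 138, 170, 210, 242, 282, 314, 354, 386,
      426, 458, 498, 530, 570, 602, 642, 674, 714, 746, 786, 818, 858,
      890, 930, 962, 1002, 1034, 1074, 1106, 1146, 1178, 1218, 1250,
      1290, 1322]
F3 = [0, 0, 0, 0, 8, 38, 102, 202, 376, 596, 880, 1220, 1622, 2080,
      2598, 3174, 3808, 4502, 5252, 6064, 6930, 7860, 8842, 9890, 10988,
      12154, 13368, 14652, 15982, 17384, 18830, 20350, 21912, 23550,
      25228, 26984, 28778, 30652, 32562, 34554, 36580]


def f(column: list[int], n: int) -> int:
    return column[n - 2]            # the lists start at n = 2


# n at which f(n, 2) = f(n - 2, 2) + 72 holds in the table
f2_ok = [n for n in range(4, 43) if f(F2, n) == f(F2, n - 2) + 72]
# n at which f(n, 3) = 2 f(n - 2, 3) - f(n - 4, 3) + 234 holds in the table
f3_ok = [n for n in range(6, 43)
         if f(F3, n) == 2 * f(F3, n - 2) - f(F3, n - 4) + 234]
```

Each recurrence holds at every tabulated n from its stated lower bound on and fails for the smaller n (for example, the right-hand side for f(15, 3) gives 2078 instead of 2080). This only confirms the pattern up to n = 42; it is not a proof.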

Conclusion
• We presented 3 bit-parallel algorithms for efficiently computing all the runs in short strings.
• O(n²) time if n = O(word size)
• The first algorithm can be used for strings over larger alphabets, at some cost
• The latter two algorithms are specialized for binary strings* and are very efficient

* We recently noticed that they can be adapted to handle larger alphabets.

• Calculated ρ(n) for binary strings of length up to n = 47