Average Value of Sum of Exponents of Runs in Strings

This paper discusses the average value of the sum of exponents of runs in strings, along with conjectures and previous research on the topic.

Presentation Transcript


  1. Average Value of Sum of Exponents of Runs in Strings • Kazuhiko Kusano, Wataru Matsubara, Akira Ishino, Ayumi Shinohara • Graduate School of Information Sciences, Tohoku University, Japan

  2. Background

  3. Run (Maximal Repetition) • A substring w that has a period p and length at least 2p • It can be extended neither to the left nor to the right with the same period • Each run is counted once, with its minimal period • Examples: 21010122122, 201101101122122, 22102102102102102011

  4. The Number & The Sum of Exponents • The number of runs and the sum of exponents (repetition counts) of runs are interesting issues • Example: 101011010111 has runs with exponents 2.5, 2.5, 2, 3, 2 and 2.2 • Number: 6, sum of exponents: 14.2
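
The definition of a run on slide 3 and the example on slide 4 can be checked by brute force. This is a minimal sketch, not the authors' method: it assumes letters are single characters, the helper names (`smallest_period`, `runs`, `sum_of_exponents`) are hypothetical, and the search over every substring is only meant for short strings.

```python
from fractions import Fraction


def smallest_period(s: str) -> int:
    """Smallest p >= 1 with s[i] == s[i + p] for every valid i."""
    n = len(s)
    for p in range(1, n + 1):
        if all(s[i] == s[i + p] for i in range(n - p)):
            return p
    return n


def runs(w: str) -> list[tuple[int, int, int]]:
    """Runs of w as (start, end, period), with end inclusive.

    w[i..j] is a run when its smallest period p satisfies j - i + 1 >= 2p
    and period p cannot be extended one letter to the left or right.
    Brute force over all substrings: fine for short strings only.
    """
    n = len(w)
    found = []
    for i in range(n):
        for j in range(i + 1, n):
            p = smallest_period(w[i:j + 1])
            if j - i + 1 < 2 * p:
                continue  # too short to be a repetition
            left_blocked = i == 0 or w[i - 1] != w[i - 1 + p]
            right_blocked = j == n - 1 or w[j + 1] != w[j + 1 - p]
            if left_blocked and right_blocked:
                found.append((i, j, p))
    return found


def sum_of_exponents(w: str) -> Fraction:
    """Sum of (length / smallest period) over all runs of w."""
    return sum((Fraction(j - i + 1, p) for i, j, p in runs(w)), Fraction(0))


w = "101011010111"
print(len(runs(w)), float(sum_of_exponents(w)))
```

For 101011010111 it prints `6 14.2`, matching slide 4; as (start, end, period) the runs are (0,4,2), (0,10,5), (2,7,3), (4,5,1), (5,9,2) and (9,11,1).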

  5. Maximum • The maximum number of runs and the maximum value of the sum of exponents of runs are still unknown
       Number of runs:
         ≦ cn (Kolpakov and Kucherov, 1999)
         ≦ 5n (Rytter, 2006)
         ≦ 3.48n (Puglisi et al., 2007)
         ≦ 3.44n (Rytter, 2007)
         ≦ 1.048n (Crochemore and Ilie, 2008)
         = n? (conjecture)
         ≧ 0.945n (Matsubara et al., 2008)
         ≧ 0.927n (Franek et al., 2003)
       Sum of exponents:
         ≦ cn (Kolpakov and Kucherov, 1999)
         ≦ 25n (Rytter, 2006)
         ≦ 2.9n (Crochemore and Ilie, 2007)
         = 2n? (conjecture)
         ≧ 1.889n (Matsubara et al., 2008)
         ≧ 1.854n (Franek et al., 2003)

  6. Average • The average number of runs has been presented by Puglisi & Simpson (Australasian Journal of Combinatorics, to appear, 2008) • We show the average value of the sum of exponents of runs (our result) • Notation: σ: alphabet size, μ(d): Möbius function

  7. Our result

  8. Our result • The average value of the sum of exponents of runs in strings of length n is expressed in terms of σ (alphabet size) and L(p) (number of Lyndon words of length p) • Number of runs [Puglisi & Simpson, 2008] • Sum of exponents (our result)

  9. Detail

  10. Runs in all strings of length n • E.g., all binary strings of length 6: 000000, 000001, 000010, 000011, ..., 111111 • Complicated!

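  11. d(w,p) • A string d(w,p) of length |w| - p is defined by d(w,p)[i] = (w[i+p] - w[i]) mod σ • w[i..j+p] is a run of period p if and only if d(w,p)[i..j] is a 0-segment (maximal block of 0's) of length l ≧ p • Example (σ = 3): w = 2101012; subtracting w from w>>2 (w shifted by 2 positions) letter by letter gives d(w,2) = 10002, whose 0-segment d(w,2)[1..3] gives the run w[1..5] = 10101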
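
The transform on slide 11 is easy to compute. This sketch assumes letters are encoded as integers 0..σ-1 and that d(w,p)[i] = (w[i+p] - w[i]) mod σ, which is what the slide's example gives for σ = 3; the function names are hypothetical. Each 0-segment d(w,p)[i..j] of length at least p is reported as the repetition w[i..j+p] of period p, whose minimal period may still be shorter (the case treated on slide 22).

```python
def d(w: list[int], p: int, sigma: int) -> list[int]:
    """d(w,p)[i] = (w[i+p] - w[i]) mod sigma, for 0 <= i < len(w) - p."""
    return [(w[i + p] - w[i]) % sigma for i in range(len(w) - p)]


def zero_segments(x: list[int]) -> list[tuple[int, int]]:
    """Maximal blocks of 0's in x, as inclusive (start, end) pairs."""
    segments = []
    start = None
    for k, value in enumerate(x + [1]):  # trailing non-zero closes a final block
        if value == 0 and start is None:
            start = k
        elif value != 0 and start is not None:
            segments.append((start, k - 1))
            start = None
    return segments


def repetitions_with_period(w: list[int], p: int, sigma: int) -> list[tuple[int, int]]:
    """(i, j + p) for every 0-segment d(w,p)[i..j] of length >= p."""
    return [(i, j + p) for i, j in zero_segments(d(w, p, sigma)) if j - i + 1 >= p]


w = [int(ch) for ch in "2101012"]
print(d(w, 2, 3))                        # d(w,2) as a list of letters
print(repetitions_with_period(w, 2, 3))  # the run w[1..5] = 10101
```

It prints `[1, 0, 0, 0, 2]` and `[(1, 5)]`: the 0-segment of length l = 3 yields a run of length l + p = 5.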

  12. Runs are classified according to their period (σ = 2)
       w        d(w,1)  d(w,2)  d(w,3)
       000000   00000   0000    000
       010000   11000   0100    010
       000001   00001   0001    001
       010001   11001   0101    011
       000010   00011   0010    010
       010010   11011   0110    000
       000011   00010   0011    011
       010011   11010   0111    001
       000100   00110   0101    100
       010100   11110   0001    110
       000101   00111   0100    101
       010101   11111   0000    111
       :        :       :       :

  13. 0-segments in d(w,p) are classified according to their length l (the slide shows l = 2, 3, 4 for the binary strings of slide 12)

  14. c(n,p) • The number of 0-segments of length exactly p in Σ^n • Example: σ = 2, n = 5, p = 2, over all 32 strings of length 5 (00000, 00001, ..., 11111)

  15. c(n,p) • The number of 0-segments of length p in Σ^n • Instead of counting 0-segments directly, pairs of strings (a, b) that are separated by a 0-segment of length p are counted • Example: 01202100002100001121

  16. c(n,p) • The number of 0-segments of length p in Σ^n • Instead of counting 0-segments directly, pairs of strings (a, b) that are separated by a 0-segment of length p are counted (a precedes the 0-segment, b follows it) • Example: 01202100002100001121

  17. c(n,p) • The number of 0-segments of length p in Σ^n • Instead of counting 0-segments directly, pairs of strings (a, b) that are separated by a 0-segment of length p are counted • Example: 01202100002100001121 • (n - p + 1) choices for the position of the 0-segment • a, b ≠ ε: (σ-1)^2 choices for the nonzero letters adjacent to the 0-segment and σ^(n-p-2) choices for the remaining letters • Boundary cases: a = ε, b = ε, and the all-zero string 00000000000000000000
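
The case analysis on slide 17 can be written as a formula and checked by enumeration. This is a sketch under one reading of the slide, since the formula shown in the talk is not preserved in this transcript; it writes l for the segment length. A 0-segment of length l in a string of length n either fills the whole string, touches one end with a single nonzero neighbour (2 positions), or lies strictly inside with two nonzero neighbours (n - l - 1 positions). The function names are hypothetical.

```python
from itertools import product


def c_formula(n: int, l: int, sigma: int) -> int:
    """Number of 0-segments of length exactly l over all sigma**n strings of length n."""
    if l < 1 or l > n:
        return 0
    if l == n:
        return 1  # a = b = empty: the all-zero string
    # a = empty or b = empty: the segment touches one end, one bordering letter is non-zero
    count = 2 * (sigma - 1) * sigma ** (n - l - 1)
    if n - l >= 2:
        # a, b non-empty: n - l - 1 inner positions, both bordering letters non-zero
        count += (n - l - 1) * (sigma - 1) ** 2 * sigma ** (n - l - 2)
    return count


def c_brute(n: int, l: int, sigma: int) -> int:
    """Same count by enumerating every string and scanning its maximal 0-blocks."""
    total = 0
    for x in product(range(sigma), repeat=n):
        block = 0
        for letter in x + (-1,):  # a non-zero sentinel closes the last block
            if letter == 0:
                block += 1
            else:
                total += block == l
                block = 0
    return total


print(c_formula(5, 2, 2), c_brute(5, 2, 2))
```

Both functions give 12 for σ = 2, n = 5, l = 2, the example of slide 14.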

  18. C(n,p) • 0-segments of length l in d(w,p) correspond to runs of period p in w • The length of such a run is l + p and its exponent is (l + p)/p • We denote by C(n,p) the sum of (l + p)/p over all 0-segments of length p or longer in Σ^n • Example: w = 2101012, d(w,2) = 10002: p = 2, l = 3

  19. C(n,p) • Example: σ = 2, n = 5, p = 2, over all 32 strings of length 5 (00000, 00001, ..., 11111)
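
C(n,p) can be computed the same brute-force way. A sketch, assuming C(n,p) ranges over all σ^n strings of length n as in the examples on slides 14 and 19; it also returns the segment-length counts c(n,l), so that C(n,p) = Σ_{l=p..n} c(n,l)·(l+p)/p can be read off directly. The names are hypothetical, and `fractions.Fraction` keeps the exponents exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def segment_length_counts(n: int, sigma: int) -> Counter:
    """Counter mapping l to c(n, l), the number of 0-segments of length l in Sigma^n."""
    counts = Counter()
    for x in product(range(sigma), repeat=n):
        block = 0
        for letter in x + (-1,):  # a non-zero sentinel closes the last block
            if letter == 0:
                block += 1
            elif block:
                counts[block] += 1
                block = 0
    return counts


def C(n: int, p: int, sigma: int) -> Fraction:
    """Sum of (l + p) / p over all 0-segments of length l >= p in Sigma^n."""
    counts = segment_length_counts(n, sigma)
    return sum((Fraction(l + p, p) * k for l, k in counts.items() if l >= p), Fraction(0))


counts = segment_length_counts(5, 2)
print([counts[l] for l in range(1, 6)], C(5, 2, 2))
```

For σ = 2 and n = 5 it prints `[28, 12, 5, 2, 1] 46`, i.e. C(5,2) = 12·4/2 + 5·5/2 + 2·6/2 + 1·7/2 = 46.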

  20. 0-segments and runs • A 0-segment of length l ≧ p in d(w,p) corresponds to σ^p runs having period p, one in each of the σ^p strings w that share that d(w,p), because d(w,p) and w[0..p-1] determine w[p..n-1] • Example (σ = 3, p = 2): w = 002000002, 012101012, 022202022, 100010100, 110111110, 120212120, 201020201, 211121211, 221222221, all with d(w,2) = 2010002 • 00000, 11111 and 22222 are runs of period 1, not of period 2

  21. 0-segments and runs • A 0-segment of length l ≧ p in d(w,p) corresponds to σ^p runs having period p, one in each of the σ^p strings w that share that d(w,p), because d(w,p) and w[0..p-1] determine w[p..n-1] • Among the roots of these runs, every string of length p appears exactly once • Example (σ = 3, p = 2): w = 002000002, 012101012, 022202022, 100010100, 110111110, 120212120, 201020201, 211121211, 221222221, all with d(w,2) = 2010002 • 00000, 11111 and 22222 are runs of period 1, not of period 2
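
The bijection on slides 20 and 21 can be made concrete: w[i+p] = (w[i] + d(w,p)[i]) mod σ, so the prefix w[0..p-1] together with d(w,p) rebuilds w. The sketch below uses hypothetical function names and assumes σ ≤ 10 so that letters print as single digits. It regenerates the nine strings of slide 20 from d(w,2) = 2010002 and collects the roots w[3..4] of the run marked by the 0-segment d(w,2)[3..5].

```python
from itertools import product


def rebuild(prefix: tuple[int, ...], dvec: list[int], sigma: int) -> str:
    """Rebuild w from w[0..p-1] and d(w,p) using w[i+p] = (w[i] + d[i]) mod sigma."""
    w = list(prefix)
    for i, di in enumerate(dvec):
        w.append((w[i] + di) % sigma)
    return "".join(str(letter) for letter in w)


def strings_sharing_d(dvec: list[int], p: int, sigma: int) -> list[str]:
    """All sigma**p strings w with d(w,p) == dvec, one per prefix w[0..p-1]."""
    return [rebuild(prefix, dvec, sigma) for prefix in product(range(sigma), repeat=p)]


ws = strings_sharing_d([2, 0, 1, 0, 0, 0, 2], p=2, sigma=3)
print(ws)
# The 0-segment d[3..5] (l = 3) marks the repetition w[3..7]; its root is w[3..4].
print(sorted(w[3:5] for w in ws))
# For p = 2 a root is non-primitive exactly when its two letters are equal.
print([w[3:8] for w in ws if w[3] == w[4]])
```

The first line lists the nine strings in the slide's order, the roots are all nine strings of length 2, and the last line shows the three period-1 runs 00000, 11111 and 22222.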

  22. Counting a run once • To avoid counting a run more than once, a repetition of period p that also has a shorter period should be ignored • A run of period p has no shorter period ⇔ its root (first p letters) is primitive • The number of primitive strings of length p is pL(p), where L(p) is the number of Lyndon words of length p • Example: 0120202020202020221

  23. Counting a run once • To avoid counting a run more than once, a repetition of period p that also has a shorter period should be ignored • A run of period p has no shorter period ⇔ its root (first p letters) is primitive • The number of primitive strings of length p is pL(p), where L(p) is the number of Lyndon words of length p • Example: 0120202120202120221

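
The count pL(p) on slide 22 follows from the standard formula for Lyndon words, L(p) = (1/p) Σ_{d|p} μ(d) σ^{p/d}, a well-known identity rather than something stated on these slides. A sketch with hypothetical names that checks pL(p) against a brute-force count of primitive strings:

```python
from itertools import product


def mobius(n: int) -> int:
    """Möbius function mu(n), by trial division."""
    result = 1
    k = 2
    while k * k <= n:
        if n % k == 0:
            n //= k
            if n % k == 0:
                return 0  # square factor
            result = -result
        k += 1
    if n > 1:
        result = -result  # one remaining prime factor
    return result


def lyndon_count(p: int, sigma: int) -> int:
    """L(p): number of Lyndon words of length p over sigma letters."""
    total = sum(mobius(d) * sigma ** (p // d) for d in range(1, p + 1) if p % d == 0)
    return total // p


def is_primitive(s) -> bool:
    """True unless s equals a proper prefix repeated len(s) / q times."""
    n = len(s)
    return all(s != s[:q] * (n // q) for q in range(1, n) if n % q == 0)


for p in range(1, 7):
    brute = sum(is_primitive(x) for x in product(range(2), repeat=p))
    print(p, lyndon_count(p, 2), p * lyndon_count(p, 2), brute)
```

Each row reads p, L(p), pL(p) and the brute-force count for σ = 2, e.g. `6 9 54 54`. In the slide's example the root 20 is primitive while 2020 = (20)^2 is not, so that repetition is counted with period 2 only.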

  26. Average value of sum of exponents • The sum of exponents of runs over all strings in Σ^n, and the average value e(n) of the sum of exponents of runs in strings of length n, are obtained from C(n,p) and L(p)
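
The pieces can be assembled into e(n). The sketch below is a reconstruction from slides 14 to 25, not the formula printed on slide 26: every 0-segment of length l ≧ p in a string of length n - p gives, across the σ^p strings sharing it, exactly pL(p) runs of minimal period p (one per primitive root), so the total over Σ^n is Σ_{p=1..⌊n/2⌋} pL(p)·C(n-p,p), and e(n) divides it by σ^n. It compares that value with a brute-force average over all strings. All names are hypothetical, and pL(p) is obtained from σ^p = Σ_{d|p} dL(d) instead of the Möbius form.

```python
from fractions import Fraction
from itertools import product


def primitive_count(p: int, sigma: int) -> int:
    """p * L(p), using sigma**p = sum of primitive_count(d) over the divisors d of p."""
    return sigma ** p - sum(primitive_count(d, sigma) for d in range(1, p) if p % d == 0)


def c(m: int, l: int, sigma: int) -> int:
    """Number of 0-segments of length exactly l (1 <= l <= m) in all strings of length m."""
    if l == m:
        return 1  # the all-zero string
    count = 2 * (sigma - 1) * sigma ** (m - l - 1)  # segment touches one end
    if m - l >= 2:
        count += (m - l - 1) * (sigma - 1) ** 2 * sigma ** (m - l - 2)  # inner segment
    return count


def e_formula(n: int, sigma: int) -> Fraction:
    """Reconstructed average: sum over p of pL(p) * C(n - p, p), divided by sigma**n."""
    total = Fraction(0)
    for p in range(1, n // 2 + 1):
        m = n - p  # length of d(w, p)
        # p * C(m, p) = sum over l >= p of (l + p) * c(m, l)
        p_times_C = sum((l + p) * c(m, l, sigma) for l in range(p, m + 1))
        total += Fraction(primitive_count(p, sigma) * p_times_C, p)
    return total / sigma ** n


def sum_of_exponents_brute(w: tuple[int, ...]) -> Fraction:
    """Sum of exponents of runs in w, straight from the definition of a run."""
    n = len(w)
    total = Fraction(0)
    for i in range(n):
        for j in range(i + 1, n):
            s = w[i:j + 1]
            # smallest period of s (q = len(s) always qualifies)
            p = next(q for q in range(1, len(s) + 1)
                     if all(s[k] == s[k + q] for k in range(len(s) - q)))
            is_run = (len(s) >= 2 * p
                      and (i == 0 or w[i - 1] != w[i - 1 + p])
                      and (j == n - 1 or w[j + 1] != w[j + 1 - p]))
            if is_run:
                total += Fraction(len(s), p)
    return total


def e_brute(n: int, sigma: int) -> Fraction:
    """Average of sum_of_exponents_brute over all sigma**n strings of length n."""
    strings = list(product(range(sigma), repeat=n))
    return sum((sum_of_exponents_brute(w) for w in strings), Fraction(0)) / len(strings)


for n in range(2, 9):
    print(n, e_formula(n, 2), e_brute(n, 2))
```

Each line prints n and the two averages for binary strings, which agree, e.g. `4 11/4 11/4`.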

  27. Limit of e(n) • The average value e(n) grows almost linearly as n increases

  28. Limit of e(n) • The limit of e(n)/n, and actual values of e(n)/n • μ(d): Möbius function
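
The near-linear growth on slides 27 and 28 can be observed numerically by evaluating the counting exactly for larger n. A self-contained sketch, again a reconstruction from slides 14 to 25 rather than the closed form from the talk, with hypothetical names: c(m,l) counts 0-segments of length l in Σ^m, prim[p] = pL(p), and e(n) = Σ_p prim[p]·C(n-p,p) / σ^n.

```python
from fractions import Fraction


def primitive_counts(max_p: int, sigma: int) -> list[int]:
    """prim[p] = p * L(p), using sigma**p = sum of prim[d] over divisors d of p."""
    prim = [0] * (max_p + 1)
    for p in range(1, max_p + 1):
        prim[p] = sigma ** p - sum(prim[d] for d in range(1, p) if p % d == 0)
    return prim


def c(m: int, l: int, sigma: int) -> int:
    """Number of 0-segments of length exactly l (1 <= l <= m) in all strings of length m."""
    if l == m:
        return 1
    count = 2 * (sigma - 1) * sigma ** (m - l - 1)
    if m - l >= 2:
        count += (m - l - 1) * (sigma - 1) ** 2 * sigma ** (m - l - 2)
    return count


def e(n: int, sigma: int) -> Fraction:
    """Average sum of exponents of runs over all sigma**n strings of length n."""
    prim = primitive_counts(n // 2, sigma)
    total = Fraction(0)
    for p in range(1, n // 2 + 1):
        m = n - p  # length of d(w, p)
        total += Fraction(prim[p] * sum((l + p) * c(m, l, sigma) for l in range(p, m + 1)), p)
    return total / sigma ** n


for n in (10, 20, 50, 100, 200):
    print(n, float(e(n, 2) / n))
```

Exact `Fraction` arithmetic keeps large n precise; only the final ratio e(n)/n is converted to a float.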

  29. Summary

  30. Summary • c(n,p): the number of 0-segments of length p in Σ^n • C(n,p): the sum of (l + p)/p over all 0-segments of length l ≧ p in Σ^n • e(n): the average value of the sum of exponents of runs in strings of length n • Thank you for your attention

  31. Period? NG 010010100100101001010
