The Minimum DAWG for

Shunsuke Inenaga, Masayuki Takeda, Ayumi Shinohara, Hiromasa Hoshino, Setsuo Arikawa. The Minimum DAWG for All Suffixes of a String and its Applications.

Presentation Transcript


  1. The Minimum DAWG for All Suffixes of a String and its Applications. Shunsuke Inenaga, Masayuki Takeda, Ayumi Shinohara, Hiromasa Hoshino, Setsuo Arikawa.

  2. The Minimum DAWG for All Suffixes of a String and its Applications. DAWG: Dynamic Attractive Worldcup Game?

  3. Directed Acyclic Word Graph!!

  4. Directed Acyclic Word Graph. The Directed Acyclic Word Graph of a string w, DAWG(w), represents all substrings of w (Blumer et al., 1985). DAWG(w) is the smallest automaton that accepts all suffixes of w (Crochemore, 1986).

  5. DAWG of string “abbab”. [Figure: DAWG(abbab), with accepting nodes marked.] For any string w, DAWG(w) can be built in linear time and space in |w| (Blumer et al., 1985).
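
A minimal sketch of the linear-time construction mentioned on this slide, written as the standard online suffix-automaton algorithm with suffix links. The class name `Dawg` and its method names are illustrative choices, not taken from the slides.

```python
class Dawg:
    """Suffix automaton (DAWG) of a string, built online, one character at a time."""

    def __init__(self, w):
        self.next = [{}]    # next[s][c] = target state of the c-edge leaving s
        self.link = [-1]    # suffix links
        self.length = [0]   # length of the longest string reaching each state
        last = 0
        for c in w:
            last = self._extend(last, c)
        # Accepting states lie on the suffix-link path from the last state.
        self.accepting = set()
        s = last
        while s != -1:
            self.accepting.add(s)
            s = self.link[s]

    def _new_state(self, length, nxt, link):
        self.next.append(dict(nxt))
        self.length.append(length)
        self.link.append(link)
        return len(self.next) - 1

    def _extend(self, last, c):
        cur = self._new_state(self.length[last] + 1, {}, -1)
        p = last
        while p != -1 and c not in self.next[p]:
            self.next[p][c] = cur
            p = self.link[p]
        if p == -1:
            self.link[cur] = 0
            return cur
        q = self.next[p][c]
        if self.length[q] == self.length[p] + 1:
            self.link[cur] = q
            return cur
        # Split q: the clone keeps the shorter strings that used to reach q.
        clone = self._new_state(self.length[p] + 1, self.next[q], self.link[q])
        while p != -1 and self.next[p].get(c) == q:
            self.next[p][c] = clone
            p = self.link[p]
        self.link[q] = clone
        self.link[cur] = clone
        return cur

    def _walk(self, x):
        s = 0
        for c in x:
            s = self.next[s].get(c)
            if s is None:
                return None
        return s

    def is_substring(self, x):
        return self._walk(x) is not None

    def is_suffix(self, x):
        return self._walk(x) in self.accepting


d = Dawg("abbab")
print(len(d.next), sum(len(t) for t in d.next))  # number of nodes, number of edges
print(d.is_substring("bba"), d.is_suffix("bba"), d.is_suffix("bab"))
```

Every path from the initial state spells a substring of w, and the paths that end in an accepting state spell exactly the suffixes.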

  6. The DAWGs for All Suffixes of “abbab”. [Figure: DAWG(abbab), DAWG(bbab), DAWG(bab), DAWG(ab), DAWG(b), DAWG(ε).] The collection of these DAWGs is called the naive All-Suffixes DAWG (ASDAWG) of w.

  7. Minimizing the naive ASDAWG(abbab). [Figure: the naive ASDAWG(abbab) with initial nodes 0 to 5; 22 nodes, 22 edges.] DAG minimization algorithm (Revuz, 1992).

  8. The Minimum ASDAWG(abbab). [Figure: MASDAWG(abbab) with initial nodes 0 to 5; 12 nodes, 15 edges.]
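
The step from the naive ASDAWG to the minimized one can be sketched as follows. This is a hedged illustration, not Revuz's algorithm itself: each DAWG is built by brute force (one state per set of end positions), and equivalent nodes are merged by hashing bottom-up signatures rather than by the linear-time bucket sort of the original method. The function names `brute_force_dawg` and `minimize` are hypothetical.

```python
def brute_force_dawg(w):
    """DAWG(w) by brute force: one state per distinct set of end positions."""
    n = len(w)
    ends = {}  # substring -> set of positions where an occurrence ends
    for i in range(n + 1):
        for j in range(i, n + 1):
            ends.setdefault(w[i:j], set()).add(j)
    state = {x: frozenset(e) for x, e in ends.items()}
    trans = {}  # state -> {char: state}
    for x, s in state.items():
        trans.setdefault(s, {})
        if x:
            trans.setdefault(state[x[:-1]], {})[x[-1]] = s
    accepting = {s for s in trans if n in s}  # states that spell suffixes
    return state[""], trans, accepting


def minimize(dawgs):
    """Merge equivalent nodes across several DAGs using bottom-up signatures."""
    class_of_sig = {}  # signature -> class id of the merged node
    edges = {}         # class id -> {char: class id}

    def visit(k, s, memo):
        if s in memo:
            return memo[s]
        _, trans, accepting = dawgs[k]
        out = {c: visit(k, t, memo) for c, t in trans[s].items()}
        sig = (s in accepting, tuple(sorted(out.items())))
        cid = class_of_sig.setdefault(sig, len(class_of_sig))
        edges[cid] = out
        memo[s] = cid
        return cid

    # One memo table per DAWG, since state keys are only unique within a DAWG.
    initials = [visit(k, d[0], {}) for k, d in enumerate(dawgs)]
    return initials, edges


w = "abbab"
dawgs = [brute_force_dawg(w[k:]) for k in range(len(w) + 1)]
naive_nodes = sum(len(trans) for _, trans, _ in dawgs)
naive_edges = sum(len(out) for _, trans, _ in dawgs for out in trans.values())
initials, edges = minimize(dawgs)
print(naive_nodes, naive_edges, len(edges))
```

For “abbab” the naive collection has 22 nodes and 22 edges, and merging leaves 12 nodes, in agreement with the node counts on the slides.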

  9. What is MASDAWG exactly? MASDAWG(w) is the smallest DAG with n initial nodes, where the subgraph consisting of the nodes reachable from the k-th initial node and their outgoing edges is DAWG(w[k:n]).

  10. [Figure: the string abbab with positions 0 to 5, and MASDAWG(abbab) shown alongside the subgraph DAWG(bbab).]

  11. Time Taken to Construct MASDAWGs. MASDAWG(w) can be obtained in time linear in the number of edges of the naive ASDAWG(w). On the other hand, the size of the naive ASDAWG(w) is O(n²).

  12. The Size of MASDAWG. Theorem 1: The number of nodes in MASDAWG(w) is Θ(n) if |Σ| = 1, and Θ(n²) if |Σ| > 1.

  13. The Size of MASDAWG. For a unary alphabet Σ = {a}: [Figure: MASDAWG(a⁵), with nodes 0 to 5 and five edges labeled a.]

  14. The Size of MASDAWG. For an alphabet Σ with |Σ| > 1: the family of strings (ab)^m (ba)^m gives the lower bound Ω(n²).
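
To get a feel for these bounds on small inputs, the node count can be computed by brute force. The sketch below relies on an observation that is an assumption of this example rather than a statement from the slides: a node of MASDAWG(w) can be identified with the set of end positions of a substring x among its occurrences that start at or after some position i. The function name `masdawg_node_count` is hypothetical.

```python
def masdawg_node_count(w):
    """Count MASDAWG(w) nodes by brute force (fine for short strings only)."""
    n = len(w)
    nodes = set()
    for i in range(n + 1):
        ends = {}  # substring of w[i:] -> end positions of its occurrences in w[i:]
        for a in range(i, n + 1):
            for b in range(a, n + 1):
                ends.setdefault(w[a:b], set()).add(b)
        nodes.update(frozenset(e) for e in ends.values())
    return len(nodes)


print("unary alphabet:")
for n in range(1, 7):
    print(n, masdawg_node_count("a" * n))

print("(ab)^m (ba)^m:")
for m in range(1, 6):
    w = "ab" * m + "ba" * m
    print(len(w), masdawg_node_count(w))
```

On the unary strings the count is n + 1, matching the chain drawn for MASDAWG(a⁵), while the second family grows visibly faster than linearly in n.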

  15. Direct Construction of MASDAWGs. Question: Is it possible to construct MASDAWG(w) directly?

  16. On-Line Construction of MASDAWG(abbab). [Figure: MASDAWG(ε), MASDAWG(a), and MASDAWG(ab).]

  17. Direct Construction of MASDAWG(abbab). [Figure: MASDAWG(abb), intermediate step.]

  18. Direct Construction of MASDAWG(abbab). [Figure: MASDAWG(abb).]

  19. Direct Construction of MASDAWG(abbab). [Figure: MASDAWG(abba).]

  20. Direct Construction of MASDAWG(abbab). [Figure: MASDAWG(abbab).] Finish!!!

  21. Direct Construction of MASDAWG(abbab). [Figure: MASDAWG(abbab) with initial nodes 0 to 5.]

  22. Length Information. [Figure: MASDAWG(abbab) with annotations DAWG(abbab); abba, DAWG(bbab); bba, DAWG(bab); ba, DAWG(ab); a.]

  23. Length Information. [Figure: a chain of annotations DAWG(w[i:n]); x1, DAWG(w[i+1:n]); x2, …, DAWG(w[i+k-1:n]); xk, DAWG(w[i+k:n]); xk+1, …, DAWG(w[i+l-1:n]); xl, summarized by the tuples (i, k, k, |xk|), (i, l, k, |xk|), and (i+k, l, k+1, |xk+1|).]

  24. Direct Construction of MASDAWGs. Theorem 2: For any string w, MASDAWG(w) can be constructed directly, in linear time and space in its output size.

  25. Application of MASDAWGs. Question: Why MASDAWGs? • Beginning Sensitive Pattern Matching • Region Sensitive Pattern Matching • VLDC Pattern Matching

  26. Beginning Sensitive Pattern Matching. A Beginning Sensitive Pattern (BS-pattern) is a pair <p, i>, where p is a string and i is a non-negative integer. BS-Pattern Matching Problem. Instance: text w and BS-pattern <p, i>. Determine: whether p is a substring of w[i:n]. [Figure: text w of length n with position i marked; does p appear in the part from i to n?]

  27. BS-Pattern Matching with “abbab”. [Figure: the string abbab with positions 0 to 5, and MASDAWG(abbab) with initial nodes 0 to 5.] Query: BS-pattern <ab, 1>?
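
A small sketch of how such a query could be answered by walking from the i-th initial node. It does not build the MASDAWG; instead it simulates the walk by carrying the set of end positions that identifies the current node, which is an assumption of this example. A stored MASDAWG would follow one precomputed edge per character instead. The function name `bs_match` is hypothetical, and positions are the gaps 0 to n, so w[i:n] corresponds to the Python slice `w[i:]`.

```python
def bs_match(w, p, i):
    """Decide whether p is a substring of w[i:n] by a simulated MASDAWG walk."""
    n = len(w)
    ends = set(range(i, n + 1))  # stands for the i-th initial node
    for c in p:
        # Follow the c-edge: keep the occurrences that can be extended by c.
        ends = {e + 1 for e in ends if e < n and w[e] == c}
        if not ends:
            return False  # no c-edge from the current node
    return True


print(bs_match("abbab", "ab", 1))   # the query on the slide: is ab in bbab?
print(bs_match("abbab", "abb", 1))  # abb occurs in abbab but not in bbab
```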

  28. Region Sensitive Pattern Matching. A Region Sensitive Pattern (RS-pattern) is a triple <p, (i, j)>, where p is a string and i, j are non-negative integers. RS-Pattern Matching Problem. Instance: text w and RS-pattern <p, (i, j)>. Determine: whether p is a substring of w[i:j]. [Figure: text w of length n with positions i and j marked; does p appear in the part from i to j?]

  29. RS-Pattern Matching with “abbab”. [Figure: the string abbab with positions 0 to 5, and MASDAWG(abbab) with numeric annotations on its nodes.] Query: RS-pattern <ab, (1, 4)>?
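
The region-sensitive query can be sketched in the same style. The example assumes, as a guess at what the numbers in the figure stand for, that it is enough to know the smallest end position reachable at the node where the walk stops: p fits inside w[i:j] exactly when some occurrence starting at or after i ends at or before j. The function name `rs_match` is hypothetical, and the walk is simulated with sets of end positions rather than a stored automaton.

```python
def rs_match(w, p, i, j):
    """Decide whether p is a substring of w[i:j] by a simulated MASDAWG walk."""
    n = len(w)
    ends = set(range(i, n + 1))  # stands for the i-th initial node
    for c in p:
        ends = {e + 1 for e in ends if e < n and w[e] == c}
    # The earliest occurrence that starts at or after i must end by position j.
    return bool(ends) and min(ends) <= j


print(rs_match("abbab", "ab", 1, 4))  # the query on the slide: is ab in bba?
print(rs_match("abbab", "ab", 0, 4))  # is ab in abba?
```

In a stored structure the minimum would be a single number kept at each node, so the final check costs constant time after the walk.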

  30. VLDC-Pattern Matching. Let * be a variable-length-don’t-care (wildcard) that matches any string. A pattern containing characters in Σ and *’s is called a VLDC-pattern. An example of a VLDC-pattern is ab*ba*. The VLDC-pattern ab*ba* matches the string abababb, with the first and the second *’s being replaced by a and bb, respectively.
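
As a reference check for this definition, a VLDC-pattern can be translated into a regular expression and matched against the whole string. This is only a sketch of the matching semantics using Python's `re` module; it is not the automaton-based method of the talk, and the function name `vldc_matches` is hypothetical.

```python
import re


def vldc_matches(q, w):
    """Return True if VLDC-pattern q matches string w, where * matches any string."""
    # Escape the literal pieces and let each * become ".*" (any string, possibly empty).
    regex = ".*".join(re.escape(part) for part in q.split("*"))
    return re.fullmatch(regex, w, flags=re.DOTALL) is not None


print(vldc_matches("ab*ba*", "abababb"))  # the example on the slide
print(vldc_matches("a*a", "abbab"))       # abbab does not end with a
```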

  31. Wildcard DAWGs. The Wildcard DAWG of a string w, WDAWG(w), is the smallest automaton recognizing all VLDC-patterns matching w. WDAWG(w) is inherently the same structure as MASDAWG(w).

  32. [Figure: WDAWG(abbab), with edges labeled a, b, and *.]

  33. VLDC-Pattern Matching. VLDC-Pattern Matching Problem. Instance: text w and VLDC-pattern q. Determine: whether q matches w.

  34. Coming Soon. • Space-Economical Construction of Index Structures for All Suffixes of a String. Shunsuke Inenaga, Ayumi Shinohara, Masayuki Takeda, Hideo Bannai, Setsuo Arikawa (To appear in MFCS 2002). • Discovering Best Variable-Length-Don’t-Care Patterns. Shunsuke Inenaga, Hideo Bannai, Ayumi Shinohara, Masayuki Takeda, Setsuo Arikawa (Submitted to DS 2002).
