1 / 29

On-line Linear-time Construction of Word Suffix Trees

On-line Linear-time Construction of Word Suffix Trees. Shunsuke Inenaga (Japan Society for the Promotion of Science & Kyushu University) Masayuki Takeda (Kyushu University & JST). Pattern Searching Problem. Given : text T in S * and pattern P in S *

geoff
Download Presentation

On-line Linear-time Construction of Word Suffix Trees

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. On-line Linear-time Construction of Word Suffix Trees Shunsuke Inenaga (Japan Society for thePromotion of Science & Kyushu University) Masayuki Takeda (Kyushu University & JST)

  2. Pattern Searching Problem • Given:text T in S* and pattern P in S* • Find:an occurrence of P in T • S: alphabet • S*: set of strings • Using an indexing structure for T, we can solve the above problem in O(|P|) time.

  3. Suffix Trie • A trie representing all suffixes of T $ T = aacb$ a b c a aacb$ acb$ cb$ b$ $ $ c b c b $ b $ $

  4. Unwanted Matching pattern s pace runner text

  5. Unwanted Matching pattern s pace runner text

  6. Unwanted Matching pattern s pace runner text

  7. Introducing Word Separator # • # : word separator - special symbol not in S • D = S* # : dictionary of words • Text T : an element of D+(T is a sequence T1T2…Tk of k words in D) • e.g., T = This#is#a#pen# • S = {A,…,z} • D = {...,This#,...,a#,...is#,...pen#,...}

  8. Word-level Pattern Searching Problem • Given: text T in D+ and pattern P in D+ • Find: an occurrence of P in T which immediately follows # e.g. The#space#runner#is#not#your#good#pace#runner#

  9. Word-level Pattern Searching Problem • Given: text T in D+ and pattern P in D+ • Find: an occurrence of P in T which immediately follows # e.g. The#space#runner#is#not#your#good#pace#runner#

  10. Word Suffix Trie • A trie representing the suffixes of T which immediately follows # (and T itself). T = aa#b# a b a aa#b# a#b# #b# b# # # # b #

  11. Comparison T = aa#b# a a b b # a a # # # b # # b # b b # # # Word Suffix Trie Suffix Trie

  12. Construction • Suffix Trie : Ukkonen’s on-line algorithm (1995) • Word Suffix Trie : We modify Ukkonen’s algorithm by: • Using minimum DFA accepting dictionary D • Redefining suffix links

  13. Minimum DFA • The minimum DFA accepting D = S* # clearly requires constant space (for fixed S). • We replace the root node of the suffix trie with the final state of the DFA. S #

  14. Suffix Links T = aa#b# a,b # a b a # # b # Word Suffix Trie

  15. Suffix Links [cont.] T = aa#b# a,b # a a b b # a # a # # b # # b # b b # # # Word Suffix Trie Suffix Trie

  16. On-line Construction T = aa#b# a,b # S, # a a Word Suffix Trie Suffix Trie

  17. On-line Construction T = aa#b# a,b # S, # a a a a Word Suffix Trie Suffix Trie

  18. On-line Construction T = aa#b# a,b # S, # a a # a a # # # Word Suffix Trie Suffix Trie

  19. On-line Construction T = aa#b# a,b # S, # a a b b # a a # b # # b b b Word Suffix Trie Suffix Trie

  20. On-line Construction T = aa#b# a,b # S, # a a b b # a a # # # b # # b # b b # # # Word Suffix Trie Suffix Trie

  21. Pseudo Code S,# # Just change here!! a b # a # # b # b b # # # a,b # a b a # # b #

  22. Like Dress-up Doll

  23. Like Dress-up Doll [cont.] S,# # Hair is different a b # a # # b # b b # # # Different looking!!!! a,b # a b a # # b # Body is the same!!

  24. Drawback of Word Suffix Trie • Word suffix tries require O(k|T|) space. • Andersson et al. introduced word suffix trees which can be implemented in O(k) space.

  25. Construction of Word Suffix Trees • Algorithm by Andersson et al.(1996) • for text T = T1T2…Tk, constructs word suffix trees in O(|T|) expected time with O(k) space. • Our algorithm • simulates the on-line word suffix trie algorithm on word suffix trees. • runs in O(|T|) time in the worst cases, with O(k) space.

  26. Normal and Word Suffix Trees T = aa#b# # a b b # # a # a a b # # # # b b b # # # Word Suffix Tree Suffix Tree

  27. Construction Algorithm Just change here!!

  28. Conclusions • We first proposed an on-line word suffix trie construction algorithm. • The keys to the algorithm are the minimal DFA accepting D and the re-defined suffix links. • Further, we introduced an on-line algorithm to build word suffix trees that works with O(k) space and in O(|T|) time in the worst cases.

  29. Further Work • “Sparse Directed Acyclic Word Graphs”by Shunsuke Inenaga and Masayuki TakedaAccepted to SPIRE’06 • “Sparse Compact Directed Acyclic Word Graphs”by Shunsuke Inenaga and Masayuki TakedaAccepted to PSC’06

More Related