
Speaker: L. C. Chen Advisor: R. C. T. Lee

Approximate string matching using factor automata. Jan Holub and Borivoj Melichar. Theoretical Computer Science, vol. 249, p. 305-311.


Presentation Transcript


  1. Approximate string matching using factor automata. Jan Holub and Borivoj Melichar. Theoretical Computer Science, vol. 249, p. 305-311. Speaker: L. C. Chen. Advisor: R. C. T. Lee.

  2. Problem • DL(P, X), the edit distance between strings P and X, is the minimum number of edit operations (replace, insert, delete) needed to convert P into X. • Given a text T of length n, a pattern P of length m, and an integer k with k ≤ m ≤ n, approximate string matching asks whether some string X occurs in T such that DL(P, X) ≤ k.
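
The distance DL(P, X) defined above can be computed with the textbook dynamic-programming recurrence. The following is a minimal sketch, not code from the paper; the function name `edit_distance` is an illustrative choice.

```python
def edit_distance(p, x):
    """Minimum number of replace/insert/delete operations turning p into x."""
    # prev[j] = distance between the empty prefix of p and x[:j]
    prev = list(range(len(x) + 1))
    for i, pc in enumerate(p, start=1):
        curr = [i]  # distance between p[:i] and the empty string
        for j, xc in enumerate(x, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pc from p
                curr[j - 1] + 1,           # insert xc
                prev[j - 1] + (pc != xc),  # match (cost 0) or replace (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("bab", "bbab"))  # one insertion
print(edit_distance("bab", "aab"))   # one replacement
```

With P = bab and k = 1, a substring X of the text is a solution when `edit_distance(P, X) <= 1`.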

  3. Basic definition • Fac(T): the set of all substrings (factors) of text T. • A nondeterministic finite automaton (NFA) is a five-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is a finite input alphabet, δ is a mapping from Q × (Σ ∪ {ε}) into the set of subsets of Q, q0 ∈ Q is the initial state, and F ⊆ Q is the set of final states. • M(Fac(T)): a factor automaton, that is, an automaton that accepts Fac(T).

  4. Factor automaton • The factor automaton M(Fac(T)) is a deterministic finite automaton (DFA) that accepts all substrings of the given text T. • T = aabbabd • Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}
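
One way to obtain a deterministic factor automaton is to start from a simple NFA over text positions (every position may start a factor, every state is final) and determinize it by subset construction. The sketch below follows that idea under stated assumptions: the function names and the representation of DFA states as frozen sets of text positions are illustrative choices, not the paper's notation.

```python
def factors(t):
    """All non-empty substrings of t, i.e. Fac(t) as listed on the slide."""
    return {t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)}


def build_factor_dfa(t):
    """Subset construction; a DFA state is the set of positions reached in t."""
    n = len(t)
    start = frozenset(range(n + 1))  # a factor may begin at any position
    trans = {}
    todo, seen = [start], {start}
    while todo:
        state = todo.pop()
        for c in set(t):
            nxt = frozenset(i + 1 for i in state if i < n and t[i] == c)
            if nxt:  # an empty set would be a dead state, so it is left out
                trans[(state, c)] = nxt
                if nxt not in seen:
                    seen.add(nxt)
                    todo.append(nxt)
    return start, trans


def is_factor(x, start, trans):
    """Every reachable state is final, so x is accepted if the walk never dies."""
    state = start
    for c in x:
        state = trans.get((state, c))
        if state is None:
            return False
    return True


start, trans = build_factor_dfa("aabbabd")
print(len(factors("aabbabd")))          # 23 non-empty factors
print(is_factor("bbab", start, trans))  # True
print(is_factor("bad", start, trans))   # False
```

The automaton also accepts the empty string, which the list on the slide omits.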

  5. A suffix tree can also be used to recognize all substrings of T = aabbabd. Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}
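
To illustrate how a suffix structure recognizes substrings, here is a small sketch that uses an uncompressed suffix trie (nested dictionaries) instead of a compact suffix tree. It takes quadratic space and is meant only to show the lookup; the names `build_suffix_trie` and `occurs` are illustrative.

```python
def build_suffix_trie(t):
    """Insert every suffix of t into a trie of nested dicts."""
    root = {}
    for i in range(len(t)):
        node = root
        for c in t[i:]:
            node = node.setdefault(c, {})
    return root


def occurs(trie, x):
    """x is a substring of t iff x spells a path from the root."""
    node = trie
    for c in x:
        if c not in node:
            return False
        node = node[c]
    return True


trie = build_suffix_trie("aabbabd")
print(occurs(trie, "bbab"))  # True
print(occurs(trie, "bad"))   # False
```

Every substring is a prefix of some suffix, which is why walking from the root is enough.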

  6. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. (Figure label: one matched, 0 errors.)
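
The list Lk(P) above can be reproduced by simulating a small NFA whose states are pairs (i, e): i pattern symbols consumed, e errors used. The transition rules below are inferred from the example list, not taken from the paper: replace and insert fire only on a symbol that differs from the next pattern symbol, insertions are allowed only strictly inside the pattern, and a delete is an ε-move. The alphabet {a, b, d} is assumed from the symbols of T = aabbabd.

```python
from itertools import product


def closure(states, m, k):
    """Follow ε (delete) moves: skip one pattern symbol for one error."""
    stack, seen = list(states), set(states)
    while stack:
        i, e = stack.pop()
        if i < m and e < k and (i + 1, e + 1) not in seen:
            seen.add((i + 1, e + 1))
            stack.append((i + 1, e + 1))
    return seen


def accepts(x, p, k):
    """True if x is accepted by the assumed NFA for Lk(p)."""
    m = len(p)
    current = closure({(0, 0)}, m, k)
    for c in x:
        nxt = set()
        for i, e in current:
            if i == m:
                continue                 # nothing leaves the last column
            if c == p[i]:
                nxt.add((i + 1, e))      # match
            elif e < k:
                nxt.add((i + 1, e + 1))  # replace
                if i > 0:
                    nxt.add((i, e + 1))  # insert, only inside the pattern
        current = closure(nxt, m, k)
    return any(i == m for i, _ in current)


def language(p, k, alphabet):
    """Enumerate all accepted strings of length at most len(p) + k."""
    words = ("".join(w) for n in range(len(p) + k + 1)
             for w in product(alphabet, repeat=n))
    return {w for w in words if accepts(w, p, k)}


print(sorted(language("bab", 1, "abd"), key=lambda w: (len(w), w)))
```

Under these assumed rules the enumeration yields the same 14 strings as the slide; strings such as abab or babd, which need an insertion before or after the pattern, are not accepted.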

  7. Recognize ab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  8. Recognize aab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  9. Recognize bbab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

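  10. Definition • Let M1 and M2 be finite automata. An automaton for the intersection of M1 and M2 is an automaton M that accepts exactly the strings accepted by both M1 and M2, that is, L(M) = L(M1) ∩ L(M2).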
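
An intersection automaton is commonly built with the product construction: run both automata in lockstep on pairs of states and accept when both components are final. The sketch below shows this for deterministic automata without ε-transitions, which is a simplification and may differ from the definition used in the paper. The tuple layout `(start, finals, trans)` and the two toy automata are assumptions made for illustration.

```python
def intersect(dfa1, dfa2):
    """Product construction; each dfa is (start, finals, trans) with
    trans[(state, symbol)] -> state. Only reachable pairs are built."""
    start1, finals1, trans1 = dfa1
    start2, finals2, trans2 = dfa2
    start = (start1, start2)
    trans, finals = {}, set()
    todo, seen = [start], {start}
    while todo:
        q1, q2 = todo.pop()
        if q1 in finals1 and q2 in finals2:
            finals.add((q1, q2))
        for (s, a), t1 in trans1.items():
            if s != q1 or (q2, a) not in trans2:
                continue  # both automata must be able to read a
            nxt = (t1, trans2[(q2, a)])
            trans[((q1, q2), a)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return start, finals, trans


def run(dfa, x):
    """Return True if the DFA accepts x."""
    state, finals, trans = dfa
    for a in x:
        if (state, a) not in trans:
            return False
        state = trans[(state, a)]
    return state in finals


# Toy example: M1 accepts strings with an even number of a's,
# M2 accepts strings that end with b.
m1 = (0, {0}, {(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1})
m2 = ("n", {"y"}, {("n", "a"): "n", ("n", "b"): "y",
                   ("y", "a"): "n", ("y", "b"): "y"})
both = intersect(m1, m2)
print(run(both, "aab"))  # True: two a's and ends with b
print(run(both, "ab"))   # False: odd number of a's
```

Building only the reachable pairs keeps the product smaller than the full set Q1 × Q2 in many cases.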

  11. T = aabbabd, P = bab, k = 1. Intersection of M(Lk(P)) and M(Fac(T)). Solutions: {ba, bab, bb, bbab, aab, ab}. (All end with {3,0} or {3,1}.)
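
At the level of languages, the intersection on this slide can be checked directly: take the list Lk(P) given in the slides and keep the strings that are also factors of T. This is a brute-force check on the sets, not the automaton construction itself; the function name `approximate_occurrences` is illustrative.

```python
def approximate_occurrences(t, lk_p):
    """Strings of lk_p that occur as substrings of t, i.e. Lk(P) ∩ Fac(T)."""
    fac_t = {t[i:j] for i in range(len(t)) for j in range(i + 1, len(t) + 1)}
    return lk_p & fac_t


# Lk(P) for P = bab, k = 1, copied from the slides.
LK_P = {"ab", "bb", "ba", "aab", "bab", "dab", "bbb", "bdb",
        "baa", "bad", "bbab", "bdab", "baab", "badb"}

print(sorted(approximate_occurrences("aabbabd", LK_P)))
# ['aab', 'ab', 'ba', 'bab', 'bb', 'bbab']
```

The result matches the six solutions listed on the slide.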

  12. T = aabbabd, P = bab, k = 1. Intersection of M(Lk(P)) and M(Fac(T)).

  13. Intersection T P Delete!

  14. Intersection T P Match!

  15. Intersection T P Delete!

  16. Intersection T P Insert!

  17. Intersection T P Replace!

  18. Intersection T P Delete!

  19. Lemma • The number of automaton is always lower than .

  20. T = aabbabd, P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  21. Thank you!
