
Speaker: L. C. Chen Advisor: R. C. T. Lee

Approximate string matching using factor automata. Jan Holub and Borivoj Melichar. Theoretical Computer Science, vol. 249, pp. 305-311.


Presentation Transcript


  1. Approximate string matching using factor automata. Jan Holub and Borivoj Melichar. Theoretical Computer Science, vol. 249, pp. 305-311. Speaker: L. C. Chen. Advisor: R. C. T. Lee.

  2. Problem • The edit distance DL(P, X) between strings P and X is the minimum number of edit operations (substitution, insertion, and deletion) needed to convert P into X. • Given a text T of length n, a pattern P of length m, and an integer k with k ≤ m ≤ n, approximate string matching asks whether T contains a substring X whose edit distance DL(P, X) from the pattern P is at most k.

  3. An example of edit distance. To convert P = abcde into T = bcfeg: delete a (P1 = bcde), substitute d with f (P2 = bcfe), then insert g (bcfeg). Three edit operations are used.
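
The edit distance defined on slide 2 can be computed with the classic dynamic-programming recurrence. The sketch below is an illustration only and is not taken from the paper; the function name `edit_distance` is a hypothetical choice.

```python
def edit_distance(p: str, x: str) -> int:
    """Minimum number of substitutions, insertions and deletions turning p into x."""
    # prev[j] holds the distance between the current prefix of p and x[:j].
    prev = list(range(len(x) + 1))
    for i, pc in enumerate(p, start=1):
        curr = [i]  # turning p[:i] into the empty string takes i deletions
        for j, xc in enumerate(x, start=1):
            curr.append(min(
                prev[j] + 1,               # delete pc
                curr[j - 1] + 1,           # insert xc
                prev[j - 1] + (pc != xc),  # match (cost 0) or substitute (cost 1)
            ))
        prev = curr
    return prev[-1]


print(edit_distance("abcde", "bcfeg"))  # 3
```

For the slide's example this gives 3, so the three-step conversion shown above is also a shortest one.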

  4. Basic definition • Fac(T): the set of all substrings (factors) of text T. • A nondeterministic finite automaton (NFA) is a five-tuple M = (Q, Σ, δ, q0, F), where Q is a finite set of states, Σ is a finite input alphabet, δ is a mapping from Q × (Σ ∪ {ε}) into the set of subsets of Q, q0 ∈ Q is the initial state, and F ⊆ Q is a set of final states. • M(Fac(T)): a factor automaton that accepts Fac(T).

  5. Factor automaton. The factor automaton M(Fac(T)) is a deterministic finite automaton (DFA) that accepts all substrings of the given text T. T = aabbabd, Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}.
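
To make the factor automaton concrete, the sketch below enumerates Fac(T) directly and also builds a deterministic automaton for it. It rests on one assumption: the DFA is obtained by applying the subset construction to a simple ε-free NFA whose initial state can jump to any position of T. The paper may construct M(Fac(T)) differently. The names `factors`, `build_factor_dfa`, and `accepts` are hypothetical.

```python
def factors(text: str) -> set:
    """All non-empty substrings of text."""
    n = len(text)
    return {text[i:j] for i in range(n) for j in range(i + 1, n + 1)}


def build_factor_dfa(text: str):
    """Subset construction over an epsilon-free NFA with states 0..len(text)."""
    n = len(text)

    def nfa_step(q, c):
        if q == 0:
            # From the initial state, reading c may start a factor at any position.
            return {i + 1 for i in range(n) if text[i] == c}
        return {q + 1} if q < n and text[q] == c else set()

    start = frozenset({0})
    delta, seen, todo = {}, {start}, [start]
    while todo:
        state = todo.pop()
        for c in sorted(set(text)):
            target = frozenset(r for q in state for r in nfa_step(q, c))
            if target:
                delta[state, c] = target
                if target not in seen:
                    seen.add(target)
                    todo.append(target)
    return start, delta


def accepts(dfa, word: str) -> bool:
    """Every state is final, so a word is accepted if it can be read completely."""
    state, delta = dfa
    for c in word:
        state = delta.get((state, c))
        if state is None:
            return False
    return True


T = "aabbabd"
dfa = build_factor_dfa(T)
print(len(factors(T)))      # 23
print(accepts(dfa, "bba"))  # True
print(accepts(dfa, "aba"))  # False
```

Since every state is final, this automaton also accepts the empty string, which the list on the slide omits.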

  6. A suffix tree can also be used to recognize all substrings of T = aabbabd. Fac(T) = {a, b, d, aa, ab, bb, ba, bd, aab, abb, bba, bab, abd, aabb, abba, bbab, babd, aabba, abbab, bbabd, aabbab, abbabd, aabbabd}.

  7. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}. Figure labels: one matched, 0 errors; three matched, 0 errors; one matched, one error.

  8. Recognize ab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  9. Recognize aab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  10. Recognize bbab. P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  11. Definition • Let M1 and M2 be finite automata. An automaton for the intersection of M1 and M2 is an automaton that accepts exactly the strings accepted by both M1 and M2.
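
An intersection automaton is commonly built with the textbook product construction: states are pairs (q1, q2), both components move on the same symbol, and a pair is final when both components are final. The sketch below assumes ε-free automata and is not necessarily the formulation used in the paper. The `Automaton` type and all function names are hypothetical, the factor automaton is a simple NFA rather than the DFA of slide 5, and M(Lk(P)) is replaced by a trie over the Lk(P) list from slide 7.

```python
from typing import Callable, Hashable, NamedTuple


class Automaton(NamedTuple):
    initial: Hashable
    step: Callable      # (state, symbol) -> set of next states
    is_final: Callable  # state -> bool


def intersect(m1: Automaton, m2: Automaton) -> Automaton:
    """Product construction: run m1 and m2 in lockstep on the same input."""
    def step(state, symbol):
        q1, q2 = state
        return {(r1, r2) for r1 in m1.step(q1, symbol) for r2 in m2.step(q2, symbol)}

    def is_final(state):
        return m1.is_final(state[0]) and m2.is_final(state[1])

    return Automaton((m1.initial, m2.initial), step, is_final)


def factor_nfa(text: str) -> Automaton:
    """Epsilon-free NFA for the substrings of text; every state is final."""
    n = len(text)

    def step(q, c):
        if q == 0:
            return {i + 1 for i in range(n) if text[i] == c}
        return {q + 1} if q < n and text[q] == c else set()

    return Automaton(0, step, lambda q: True)


def trie_automaton(words: set) -> Automaton:
    """DFA for a finite language; states are the prefixes of the words."""
    prefixes = {w[:i] for w in words for i in range(len(w) + 1)}

    def step(q, c):
        return {q + c} if q + c in prefixes else set()

    return Automaton("", step, lambda q: q in words)


def accepted_words(m: Automaton, alphabet: str, max_len: int) -> set:
    """Enumerate all accepted words of length at most max_len."""
    found, todo = set(), [("", m.initial)]
    while todo:
        word, state = todo.pop()
        if m.is_final(state):
            found.add(word)
        if len(word) < max_len:
            for c in alphabet:
                todo.extend((word + c, nxt) for nxt in m.step(state, c))
    return found


# Lk(P) for P = bab, k = 1, copied from the slides.
LK_P = {"ab", "bb", "ba", "aab", "bab", "dab", "bbb", "bdb",
        "baa", "bad", "bbab", "bdab", "baab", "badb"}
product = intersect(factor_nfa("aabbabd"), trie_automaton(LK_P))
solutions = accepted_words(product, "abd", max_len=4)
print(sorted(solutions))  # ['aab', 'ab', 'ba', 'bab', 'bb', 'bbab']
```

With T = aabbabd, the product accepts exactly the six strings that occur both in Fac(T) and in the listed Lk(P).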

  12. T = aabbabd, P = bab, k = 1. Intersection of M(Lk(P)) and M(Fac(T)). Solutions: {ba, bab, bb, bbab, aab, ab}. (All end with {3,0} or {3,1}.)

  13. T = aabbabd, P = bab, k = 1. Intersection of M(Lk(P)) and M(Fac(T)).

  14. Intersection. P = bab, DL(P, ba) = 1.

  15. Intersection. P = bab, DL(P, bab) = 0.

  16. Intersection. P = bab, DL(P, bb) = 1.

  17. Intersection. P = bab, DL(P, bbab) = 1.

  18. Intersection. P = bab, DL(P, aab) = 1.

  19. Intersection. P = bab, DL(P, ab) = 1.

  20. Lemma • The number of automaton is always lower than .

  21. T = aabbabd, P = bab, k = 1. The finite automaton M(Lk(P)) accepts Lk(P). Lk(P) = {ab, bb, ba, aab, bab, dab, bbb, bdb, baa, bad, bbab, bdab, baab, badb}.

  22. Thank you!
