
Attempts to extend correction queries


Presentation Transcript


  1. 31st of October 2005, Seminar IV. Attempts to extend correction queries. Cristina Bibire, Research Group on Mathematical Linguistics, Rovira i Virgili University, Pl. Imperial Tarraco 1, 43005 Tarragona, Spain. E-mail: cristina.bibire@estudiants.urv.es

  2. Correction queries • PAC learning of DFA • Learning CFL • Learning WFA • Redefining the correcting string • References

  3. Learning from corrections The correcting string of s in the language L is the smallest string s′ (in lex-length order) such that s·s′ belongs to L. The answer to a correction query for a string consists of its correcting string. Myhill-Nerode theorem: the number of states in the smallest DFA accepting L is equal to the number of equivalence classes of the canonical relation ≡L on Σ*.
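Given a DFA for L, the correcting string can be computed by a breadth-first search that tries extensions in lex-length order. A minimal Python sketch, assuming the DFA is given as a transition table delta[state][char]; all names here are illustrative, not from the slides:

```python
from collections import deque

def correcting_string(delta, start, accepting, alphabet, s):
    """Smallest s' (in lex-length order) with s.s' in L, or None if none exists."""
    state = start
    for ch in s:                         # run the DFA on the queried string s
        state = delta[state][ch]
    queue = deque([(state, "")])         # BFS: shorter suffixes first, and within
    seen = {state}                       # one length, lexicographically smaller ones
    while queue:
        q, suffix = queue.popleft()
        if q in accepting:
            return suffix                # first accepting state reached gives C(s)
        for ch in sorted(alphabet):
            nxt = delta[q][ch]
            if nxt not in seen:          # any state seen before was reached by a
                seen.add(nxt)            # lex-length-smaller suffix already
                queue.append((nxt, suffix + ch))
    return None                          # no completion of s lies in L
```

Note that C(s) = λ (the empty string) exactly when s itself belongs to L, which is how membership is recovered from correction queries.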

  4. Learning from corrections

  5. How can we extend CQ? • PAC learning of DFA with CQ ? • Learning CFL with CQ ? • Learning WFA with CQ ? • Redefining the correcting string ?

  6. PAC learning of DFA with CQ • We assume that there is some probability distribution Pr on the set of all strings over the alphabet Σ, and let L be an unknown regular set • The Learner has access to information about L by means of two oracles: • C(x) returns the correcting string for x • Ex( ) is a random sampling oracle that selects a string x from Σ* according to the distribution Pr and returns the pair (x, C(x)). • In addition, the Learner is given the accuracy ε and the confidence δ. • Definition: We say that the language L1 is an ε-approximation of the language L2 provided that Pr(L1 Δ L2) ≤ ε, where L1 Δ L2 denotes the symmetric difference of the two languages. • If A is a DFA, it is said to be an ε-approximation of the set L if L(A) is an ε-approximation of L.
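The two oracles can be modeled directly on top of the previous sketch; the Teacher class and the sample parameter below are illustrative assumptions, not notation from the slides:

```python
class Teacher:
    """Toy oracle pair for a fixed target DFA and sampling distribution Pr."""
    def __init__(self, delta, start, accepting, alphabet, sample):
        self.delta, self.start = delta, start
        self.accepting, self.alphabet = accepting, alphabet
        self.sample = sample                 # callable drawing a string x ~ Pr

    def C(self, x):
        """Correction query for x."""
        return correcting_string(self.delta, self.start, self.accepting,
                                 self.alphabet, x)

    def Ex(self):
        """Random sampling oracle: draw x ~ Pr and return the pair (x, C(x))."""
        x = self.sample()
        return x, self.C(x)
```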

  7. PAC learning of DFA with CQ • If A is an ε-approximation of L, then the probability of finding a discrepancy between L(A) and L with one call of the random sampling oracle Ex( ) is at most ε. • The approximate learner LCAapprox is obtained by modifying LCA. A correction query for the string x is satisfied by a call to C(x). Each conjecture is tested by a number of calls to Ex( ). • If any of the calls to Ex( ) returns a pair (t, C(t)) such that either C(t)=λ but A(S,E,C) rejects t, or C(t)≠λ but A(S,E,C) accepts t, then t is said to be a counterexample and LCAapprox proceeds as LCA does • If none of the calls to Ex( ) returns a counterexample, then LCAapprox halts and outputs A(S,E,C)
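As a sketch, the conjecture test reduces to comparing each sampled correction with the conjecture's verdict. The accepts method of the hypothesis DFA is an assumed interface, and the empty string stands in for λ:

```python
def find_counterexample(hypothesis, teacher, num_calls):
    """Probe a conjecture with num_calls draws from Ex(); return t or None."""
    for _ in range(num_calls):
        t, c = teacher.Ex()
        in_target = (c == "")            # C(t) = lambda  iff  t belongs to L
        if in_target != hypothesis.accepts(t):
            return t                     # discrepancy found: t is a counterexample
    return None                          # the conjecture survived every test
```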

  8. PAC learning of DFA with CQ • How many calls to Ex( ) does LCAapprox make to test a given conjecture? It depends on: • the accuracy and confidence parameters, ε and δ • how many previous conjectures have been tested • Let r_i = (1/ε)(ln(1/δ) + (i+1)·ln 2). If i previous conjectures have been tested, then LCAapprox makes ⌈r_i⌉ calls to Ex( ). • Theorem. If n is the number of states in the minimum DFA for the target language L, then LCAapprox terminates after O(n + (1/ε)(n·ln(1/δ) + n²)) calls to the Ex( ) oracle. Moreover, the probability that the automaton output by LCAapprox is an ε-approximation of L is at least 1−δ.
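With this schedule the number of tests grows linearly in the conjecture index i; a quick computation (the ε and δ values are chosen only for illustration):

```python
import math

def calls_for_conjecture(i, eps, delta):
    # ceiling of r_i = (1/eps) * (ln(1/delta) + (i+1) * ln 2)
    return math.ceil((math.log(1 / delta) + (i + 1) * math.log(2)) / eps)

# eps = 0.1, delta = 0.05 gives 37, 44, 51 calls for conjectures i = 0, 1, 2
print([calls_for_conjecture(i, 0.1, 0.05) for i in range(3)])
```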

  9. PAC learning of DFA with CQ • Sketch of the proof: • the total number of counterexamples is at most n−1, so the total number of calls to Ex( ) is at most Σ_{i=0}^{n−1} ⌈r_i⌉ ≤ n + (1/ε)(n·ln(1/δ) + (n(n+1)/2)·ln 2) = O(n + (1/ε)(n·ln(1/δ) + n²)) • the probability that LCAapprox will terminate with an automaton that is not an ε-approximation of L is at most Σ_{i≥0} (1−ε)^{r_i} ≤ Σ_{i≥0} e^{−ε·r_i} = Σ_{i≥0} δ·2^{−(i+1)} ≤ δ

  10. How can we extend CQ? • PAC learning of DFA with CQ √ • Learning CFL with CQ ? • Learning WFA with CQ ? • Redefining the correcting string ?

  11. Learning CFL • The setting • There is an unknown CFG G in Chomsky normal form. The Learner knows the set T of terminal symbols, the set N of nonterminal symbols and the start symbol S of G. The Teacher is assumed to answer two types of questions: • MEMBER(x,A) – if the string x can be derived from the non-terminal A in the grammar G, the answer is yes; otherwise, it is no • EQUIV(H) – if H is equivalent to G, the answer is yes; otherwise, it replies with a counterexample t.

  12. Learning CFL • The Learner LCF • LCF can explicitly enumerate all the possible productions of G in polynomial time (in |T| and |N|). Initially LCF places all possible productions of G in the hypothesized set of productions P. • The main loop of LCF asks an EQUIV(H) question for the grammar H=(T,N,S,P). • if H is equivalent to G, then LCF halts and outputs H • otherwise, it “diagnoses” the counterexample t returned, which results in removing at least one production from P; the main loop is then repeated.
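A compact sketch of LCF's loop under these assumptions. The parse helper (any parser for the hypothesis grammar, e.g. CKY), the Node shape, and the teacher interface are names introduced here for concreteness; the diagnosis shown is the standard parse-tree descent that the slide summarizes:

```python
from collections import namedtuple

# One node of the counterexample's parse tree in the hypothesis grammar H:
# the nonterminal, the production used at this node, the derived substring,
# and the child nodes (an empty tuple for leaves using A -> a).
Node = namedtuple("Node", "nt prod derived children")

def learn_cfg(teacher, N, T, S, parse):
    """Start from all CNF productions and drop at least one per counterexample."""
    P = ({(A, (B, C)) for A in N for B in N for C in N} |
         {(A, (a,)) for A in N for a in T})
    while True:
        ok, t = teacher.EQUIV((T, N, S, P))
        if ok:
            return (T, N, S, P)          # H is equivalent to G
        # P only ever shrinks and starts as a superset of G's productions, so
        # L(H) contains L(G) and t must lie in L(H) \ L(G): diagnose its parse.
        node = parse(t, P, S)
        while True:
            bad = next((c for c in node.children
                        if not teacher.MEMBER(c.derived, c.nt)), None)
            if bad is None:              # every child derives its piece in G, yet
                P.discard((node.nt, node.prod))   # the node itself fails: the
                break                             # production used here is wrong
            node = bad                   # otherwise descend into a failing child
```

The descent keeps the invariant that the current node's nonterminal does not derive its substring in G, so when all children check out, the production at that node cannot belong to G and is safely removed.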

  13. How can we extend CQ? • PAC learning of DFA with CQ √ • Learning CFL with CQ √ • Learning WFA with CQ ? • Redefining the correcting string ?

  14. Learning WFA Let 𝕂 be a field and f : Σ* → 𝕂 be a function. Associate with f an infinite matrix F with rows indexed by strings x in Σ* and columns indexed by strings y in Σ*. The (x, y) entry of F contains the value f(x·y). The function f is called a power series and F its Hankel matrix. If we have a WFA A we can associate with it a function f_A and, vice versa, for every such function f there exists a smallest WFA A such that f_A = f. Theorem [Carlyle, Paz 1971]. Let f : Σ* → 𝕂 be such that f ≢ 0 and let F be the corresponding Hankel matrix. Then the size r of the smallest WFA A such that f_A = f satisfies r = rank(F).
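For intuition, one can tabulate a finite block of the Hankel matrix and check its rank numerically. A toy example for the characteristic function of "even number of 1s", whose smallest WFA has 2 states (numpy; all names illustrative):

```python
import numpy as np
from itertools import product

def strings(alphabet, max_len):
    """All strings over alphabet up to max_len, in lex-length order."""
    yield ""
    for n in range(1, max_len + 1):
        for tup in product(alphabet, repeat=n):
            yield "".join(tup)

f = lambda w: 1.0 if w.count("1") % 2 == 0 else 0.0   # even number of 1s

index = list(strings("01", 2))                # a finite block: rows x, columns y
F = np.array([[f(x + y) for y in index] for x in index])
print(np.linalg.matrix_rank(F))               # prints 2 = rank(F) = smallest WFA size
```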

  15. Learning WFA • Let f be a target function. The learning algorithm may ask the oracle two types of query: • EQ(h): if h is equivalent to f on all input assignments then the answer to the query is yes; otherwise, the answer is no and the algorithm receives a counterexample z (a string with f(z) ≠ h(z)). • MQ(z): the oracle has to return f(z) • The algorithm learns a function f using its Hankel matrix, F. Because of the theorem above, it is enough to keep a sub-matrix of F of full rank. Therefore the learning algorithm can be viewed as a search for appropriate r rows and r columns.

  16. Learning WFA • The algorithm • (1) Initialize: ℓ := 1, x_1 := y_1 := λ • (2) Define a hypothesis h: • Let F̂ be the ℓ×ℓ matrix with entries [F̂]_{i,j} = f(x_i·y_j) • For every σ ∈ Σ, define a matrix F̂_σ such that [F̂_σ]_{i,j} = f(x_i·σ·y_j), and let μ̂(σ) = F̂_σ·F̂^{−1} • For every z = σ_1…σ_k, define h(z) = [μ̂(σ_1)⋯μ̂(σ_k)·F̂]_{1,1} (so h(λ) = f(λ)) • (3) Ask an equivalence query EQ(h) • If the answer is yes, halt and output h • Otherwise, the answer is no and we receive a counterexample z • (4) Using MQs find a string w·σ, prefix of z, such that, writing r(u) = (f(u·y_1), …, f(u·y_ℓ)) for the true row of u and r̂(u) for the row the hypothesis predicts for u: • (a) r̂(w) = r(w) • (b) r̂(w·σ) ≠ r(w·σ), say in column y_j • (5) Add the row x_{ℓ+1} := w and the column y_{ℓ+1} := σ·y_j, set ℓ := ℓ+1 (this keeps F̂ non-singular), and go to (2)
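A numpy sketch of step (2) under the reconstruction above, assuming x_1 = y_1 = λ and access to membership queries through a callable mq; the function and variable names are illustrative:

```python
import numpy as np

def build_hypothesis(mq, X, Y, alphabet):
    """Hypothesis h from current rows X and columns Y (F assumed invertible)."""
    F = np.array([[mq(x + y) for y in Y] for x in X])       # [F]_ij = f(x_i.y_j)
    F_inv = np.linalg.inv(F)
    mu = {s: np.array([[mq(x + s + y) for y in Y] for x in X]) @ F_inv
          for s in alphabet}                                # mu(s) = F_s . F^-1
    def h(z):
        v = np.zeros(len(X))
        v[0] = 1.0                       # start at the row of x_1 = lambda
        for s in z:
            v = v @ mu[s]                # track the predicted row of the prefix
        return float((v @ F)[0])         # read off column y_1 = lambda
    return h
```

The running vector v @ F is exactly the predicted row r̂ of the prefix read so far, which is what the counterexample diagnosis in step (4) compares against true rows obtained via MQ.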

  17. How can we extend CQ? • PAC learning of DFA with CQ √ • Learning CFL with CQ √ • Learning WFA with CQ √ • Redefining the CQ ?

  18. Redefining the correcting string • Hamming distance (only for strings of the same length). For two strings s and t, H(s, t) is the number of places in which the two strings differ, i.e., have different characters.
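In code, this is a direct transcription of the definition:

```python
def hamming(s, t):
    """H(s, t): positions where s and t differ; only for equal-length strings."""
    if len(s) != len(t):
        raise ValueError("Hamming distance is defined only for equal lengths")
    return sum(a != b for a, b in zip(s, t))

# hamming("0110", "0011") == 2
```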

  19. Redefining the correcting string • Hamming distance [figure: a DFA over {0, 1} with states q0, q1, q2, q3]

  20. Redefining the correcting string • Hamming distance [figure: the four-state DFA alongside a second automaton over {0, 1}]

  21. Redefining the correcting string • Hamming distance

  22. Redefining the correcting string • Hamming distance [figure: a DFA over {0, 1} with states q0, q1, q2, q3]

  23. Redefining the correcting string • Hamming distance [figure: the four-state DFA alongside a smaller automaton over {0, 1}]

  24. Redefining the correcting string • Hamming distance [figure: a DFA over {0, 1} with states q0, q1, q2, q3]

  25. Redefining the correcting string • Hamming distance [figure: a three-state DFA over {0, 1} with states q0, q1, q2]

  26. Redefining the correcting string • Hamming distance [figure: a three-state DFA over {0, 1} with states q0, q1, q2]

  27. Redefining the correcting string • Hamming distance [figure: a three-state DFA over {0, 1} with states q0, q1, q2]

  28. Redefining the correcting string • Levenshtein (or edit) distance. Unlike the Hamming distance, it also counts positions where one string has a character and the other does not. • For two characters a and b, define δ(a, b) = 0 if a = b and δ(a, b) = 1 otherwise. • Assume we are given two strings s and t of length n and m, respectively. We are going to fill an (n+1)×(m+1) array d with integers such that the lower right corner element d(n+1, m+1) will furnish the required value of the Levenshtein distance Lev(s, t). • The definition of the entries of d is recursive. • First set d(i, 1) = i−1 for all i and d(1, j) = j−1 for all j. • For other pairs i, j use d(i, j) = min( d(i−1, j) + 1, d(i, j−1) + 1, d(i−1, j−1) + δ(s(i−1), t(j−1)) ), where s(k) denotes the k-th character of s.
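The recursion transcribes directly into a small dynamic program; this version uses 0-based indices, so d[i][j] compares the first i characters of s with the first j characters of t, and d[n][m] is the corner value from the slide:

```python
def levenshtein(s, t):
    """Lev(s, t) via the (n+1) x (m+1) table described above."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                               # i deletions
    for j in range(m + 1):
        d[0][j] = j                               # j insertions
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            subst = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete s[i-1]
                          d[i][j - 1] + 1,        # insert t[j-1]
                          d[i - 1][j - 1] + subst)  # match or substitute
    return d[n][m]

# levenshtein("kitten", "sitting") == 3
```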

  29. Redefining the correcting string • Levenshtein distance [figure: a three-state DFA over {0, 1} with states q0, q1, q2]

  30. Redefining the correcting string • Levenshtein distance [figure: two three-state automata over {0, 1}]

  31. Redefining the correcting string • Levenshtein distance [figure: a three-state DFA over {0, 1} with states q0, q1, q2]

  32. How can we extend CQ? • PAC learning of DFA with CQ √ • Learning CFL with CQ √ • Learning WFA with CQ √ • Redefining the correcting string ?

  33. References • D. Angluin. Learning Regular Sets from Queries and Counterexamples. Information and Computation 75, 87–106 (1987) • L. Lee. Learning of Context-Free Languages: A Survey of the Literature. Harvard University Technical Report TR-12-96 (written in 1994) • C. de la Higuera. Learning Stochastic Finite Automata from Experts. In Proceedings of the 4th International Colloquium on Grammatical Inference, Lecture Notes in Computer Science 1433, 79–89 (1998) • A. Beimel, F. Bergadano, N. Bshouty, E. Kushilevitz and S. Varricchio. Learning Functions Represented as Multiplicity Automata. Journal of the ACM 47, 506–530 (2000) • http://www.cut-the-knot.org/do_you_know/Strings.shtml

  34. Thank You!
