1 / 132

Learning from an informant

Learning from an informant. Colin de la Higuera University of Nantes. Acknowledgements. Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides.

joanne
Download Presentation

Learning from an informant

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Learning from an informant Colin de la Higuera University of Nantes Zadar, August 2010

  2. Acknowledgements • Laurent Miclet, Jose Oncina and Tim Oates for previous versions of these slides. • Rafael Carrasco, Paco Casacuberta, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Thierry Murgue, Franck Thollard, Enrique Vidal, Frédéric Tantini,... • List is necessarily incomplete. Excuses to those that have been forgotten. http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ Chapter 12 (and part of 14) Zadar, August 2010

  3. Outline • The rules of the game • Basic elements for learning DFA • Gold’s algorithm • RPNI • Complexity discussion • Heuristics • Open questions and conclusion Zadar, August 2010

  4. 1 The rules of the game Zadar, August 2010

  5. Motivation • We are given a set of strings S+ and a set of strings S- • Goal is to build a classifier • This is a traditional (or typical) machine learning question • How should we solve it? * Zadar, August 2010

  6. Ideas • Use a distance between strings and try k-NN • Embed strings into vectors and use some off-the-shelf technique (decision trees, SVMs, other kernel methods) Zadar, August 2010

  7. Alternative • Suppose the classifier is some grammatical formalism • Thus we have L and *\L L * Zadar, August 2010

  8. Informed presentations • An informed presentation (or an informant) of L* is a function f : ℕ *  {-,+} such that f(ℕ)=(L,+)(L,-) • f is an infinite succession of all the elements of * labelled to indicate if they belong or not to L. Zadar, August 2010

  9. Obviously many possible candidates • Any Grammar G such that • S+ L(G) • S- L(G) = • But there is an infinity of such grammars! Zadar, August 2010

  10. Structural completeness • (of S+re a DFA) each edge is used at least onceeach final state accepts at least one string • Look only at DFA for which the sample is structurally complete! Zadar, August 2010

  11. Example • S+={aab, b, aaaba, bbaba} a b a a b a b b Zadar, August 2010

  12. S+={aab, b, aaaba, bbaba} … a b a a b a b b add  and abb Zadar, August 2010

  13. Defining the search space by structural completeness (Dupont, Miclet, Vidal 94) • the basic operation: merging two states • a bias on the concepts : structural completeness of the positive sample S+ • a theorem: every biased solution can and can only be obtained by merging states in CA(S+) • the search space is a partition lattice. Zadar, August 2010

  14. {0}{1}{2}{3} {0, 1}{2}{3} {2, 3}{0}{1} {1, 3}{0}{2} {0, 2}{1}{3} {1, 2}{0}{3} {0, 3}{1}{2} {0, 1}{2, 3} {0, 2}{1, 3} {0, 3}{1, 2} {0} {1, 2, 3} {1}{0, 2, 3} {2}{0, 1, 3} {3}{0, 1, 2} {0, 1, 2, 3} Zadar, August 2010

  15. 3 1 1 1 2 3 2 0 0 2 0 a 1 a a a S- = {a} S+={aaa} 0 1 2 a a a a a a a a 3 2,3 0,3 0,2 0 0,1 a a a a a a a a a 1,2 3 1,3 a a a a a a a 1,2,3 3 0,3 0,1,2 1,2 a a a a a a a 0,1 2,3 0,1,3 0,2,3 1,3 2 0,2 a a a a a 0, 1, 2, 3 Zadar, August 2010

  16. The partition lattice • Let E be a set with n elements • The number of partitions of E is given by the Bell number (16)=10 480 142 147 Zadar, August 2010

  17. Regular inference as search • another result: the smallest DFA fitting the examples is in the lattice constructed on PTA(S+) • generally, algorithms would start from PTA(S+) and explore the corresponding lattice of solutions using the merging operation. S-is used to control the generalization. Zadar, August 2010

  18. CA (S+) or PTA(S+) Border Set  Universal Automaton Zadar, August 2010

  19. 2 Basic structures Zadar, August 2010

  20. Two types of final states S+={, aaa} S-={aa, aaaaa} 1 is accepting 3 is rejecting What about state 2? a 2 a 1 3 a Zadar, August 2010

  21. What is determinism about? a 2 a 2 1 1 a Merge 1 and 3? 4 3 a 4 But… a 2,4 1 Zadar, August 2010

  22. The prefix tree acceptor • The smallest tree-like DFA consistent with the data • Is a solution to the learning problem • Corresponds to a rote learner Zadar, August 2010

  23. From the sample to the PTA PTA(S+) 7 a 11 a a 4 8 b 2 a b a 14 a b 9 12 5 1 a 15 b 13 3 b a a 10 6 S+={, aaa, aaba, ababa, bb, bbaaa} S-={aa, ab, aaaa, ba} Zadar, August 2010

  24. From the sample to the PTA (full PTA) a 12 PTA(S+,S-) a 8 13 a a 4 9 b 2 a b a 16 a 10 b 5 14 1 a 17 b a 6 15 3 a a 11 b 7 S+={, aaa, aaba, ababa, bb, bbaaa} S-={aa, ab, aaaa, ba} Zadar, August 2010

  25. Red, Blue and White states -Red states are confirmed states -Blue states are the (non Red) successors of the Red states -White states are the others a b a a b a b a b Zadar, August 2010

  26. Merge and fold Suppose we want to merge state 3 with state 2 a 2 b a a 6 1 4 b a 9 3 b a 7 5 8 b Zadar, August 2010

  27. Merge and fold First disconnect 3 and reconnect to 2 a,b 2 b a a 6 1 4 a 9 3 b a 7 5 8 b Zadar, August 2010

  28. Merge and fold Then fold subtree rooted in 3 into the DFA starting in 2 a,b 2 b a a 6 9 3 1 4 a a 7 b 5 b 8 Zadar, August 2010

  29. Merge and fold Then fold subtree rooted in 3 into the DFA starting in 2 a,b 9 a 2 b a a 6 1 4 8 b Zadar, August 2010

  30. Other search spaces an augmented PTA can be constructed on both S+ and S-(Coste98, Oliveira 98) • but not every merge is possible • the search algorithms must run under a set of dynamic constraints Zadar, August 2010

  31. State splitting Searching by splitting • start from the one-state universal automaton, keep constructing DFA controlling the search with <S+, S-> Zadar, August 2010

  32. That seems a good idea… but take a5*. What 4 (or 3, 2, 1) state automaton is a decent approximation of a5* ? a a a a a Zadar, August 2010

  33. 3 Gold’s algorithm E. M. Gold. Complexity of automaton identification from given data. Information and Control, 37:302–320, 1978. Zadar, August 2010

  34. Key ideas • Use an observation table • Represent the states of an automaton as strings, prefixes of the strings in the learning set • Find some incompatibilities between these prefixes due to separating suffixes • This leads to equivalent prefixes • Invent the other equivalences Zadar, August 2010

  35. Strings as states a [b] [] b b a a [a] [bb] Zadar, August 2010

  36. Incompatible prefixes • S+={aab} • S-={bab} • Then clearly there are at least 2 states, one corresponding to a and another to b. Zadar, August 2010

  37. Observation table • The information is organised in a table <STA,EXP,OT> where: • RED* is the set of states • BLUE* is the set of transitions BLUE=(RED.)\RED • EXP* is the experiment set • OT: (STA=REDBLUE)EXP {0,1,*} such that: OT[u][e] = 1 if ueS+ 0 if ueS- * otherwise Zadar, August 2010

  38. An observation table The experiments (EXP)  a 1 0  The states (RED) a 0 * 1 0 b The transitions (BLUE) aa * 0 ab * * Zadar, August 2010

  39. Meaning  a  1 0 (q0, .)F   L a 0 * b 1 * aa * 0 * 0 ab Zadar, August 2010

  40. a  1 0 0 * (q0,ab.a)F  abaL a b 1 * aa * 0 ab 1 0 Zadar, August 2010

  41. Redundancy  a aa Must have identical label (redundancy)  1 * 0 a * 0 0 b * * 1 aa 0 0 0 ab 1 * 1 Zadar, August 2010

  42. DFA consistent with a table b  a 2 1  1 0 a a 0 * a b b 1 * a,b aa * 0 ab 1 0 2 1 b a Both DFA are consistent with this table Zadar, August 2010

  43. Closed table without holes • Let uREDBLUE and let OT[u][e] denote a cell OT[u] denotes the row indexed by u • We say that the table isclosedif tBLUE, sRED : OT[t] = OT[s] • We say that the table has noholesif uRED BLUE, eE OT[u][e]{0,1} Zadar, August 2010

  44. This table is closed  a  1 0 0 0 a b 1 0 aa 0 0 ab 1 0 Zadar, August 2010

  45. This table is not closed  a  1 0 0 0 a b 1 0 0 1 aa Not closed 1 0 ab Zadar, August 2010

  46. An equivalence relation Let <RED,EXP,OT> be a closed table and with no holes. Consider the equivalence relation over STA: s1s2 if OT[s1] =OT[s2] a OT[s1a] = OT[s2a] Class of s is denoted by[s] (also!) Zadar, August 2010

  47. Equivalent prefixes  a  1 0 0 0 a prefixes  and b are equivalent since OT[]=OT[b] b 1 0 0 0 aa 1 0 ab Zadar, August 2010

  48. Building an automaton from a table We define A(<STA,EXP,OT>) = (,Q,,q0,F) as follows: • Q = {[s]: s RED } • q0 = [] • F = {[ue]: OT[u][e] = 1} • ([s1], a)= • [s2a] ifs2[s1]: s2a RED • any [s]: s RED OT[s] =OT[s1a] Zadar, August 2010

  49. a  1 0 0 1 a a b 1 0 1 0 aa [a ] [] 0 1 a ab b b Zadar, August 2010

  50. Compatibility Theorem: • Let <STA,EXP,0T> be an observation table closed and with no holes • If STA isprefix-complete and EXP is suffix-complete then A(<STA,EXP,OT>) is compatible with the data in <STA,EXP,OT> Zadar, August 2010

More Related