Advances in Learning k-Testable Languages from Text: Challenges and Insights

Learning from Text Colin de la Higuera University of Nantes Zadar, August 2010

Acknowledgements • Laurent Miclet, Jose Oncina, Tim Oates, Anne-Muriel Arigon, Leo Becerra-Bonache, Rafael Carrasco, Paco Casacuberta, Pierre Dupont, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Satoshi Kobayachi, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal, Menno van Zaanen,... http://pagesperso.lina.univ-nantes.fr/~cdlh/ http://videolectures.net/colin_de_la_higuera/ Zadar, August 2010

Outline • Motivations, definition and difficulties • Some negative results • Learning k-testable languages from text • Learning k-reversible languages from text • Conclusions http://pagesperso.lina.univ-nantes.fr/~cdlh/slides/ Chapters 8 and 11 Zadar, August 2010

1 Identification in the limit yields A class of languages L Pres  ℕX a L A learner The naming function G A class of grammars L(a())=yields() (ℕ)=(ℕ) yields()=yields() Zadar, August 2010

Learning from text • Only positive examples are available • Danger of over-generalization: why not return *? • The problem is “basic”: • Negative examples might not be available • Or they might be heavily biased: near-misses, absurd examples… • Base line: all the rest is learning with help Zadar, August 2010

GI as a search problem PTA ?  Zadar, August 2010

Questions? • Data is unlabelled… • Is this a clustering problem? • Is this a problem posed in other settings? Zadar, August 2010

2 The theory • Gold 67: No super-finite class can be identified from positive examples (or text) only • Necessary and sufficient conditions for learning • Literature: • inductive inference, • ALT series, … Zadar, August 2010

Limit point • A class L of languages has a limit pointif there exists an infinite sequence Lnnℕ of languages in L such that L0  L1  … Ln  …, and there exists another language L L such that L= nℕLn • L is called a limit point of L Zadar, August 2010

L is a limit point L0 L1 Li L2 L L3 Zadar, August 2010

Theorem If L admits a limit point, then L is not learnable from text Proof: Let sibe a presentation in length-lex order for Li,and s be a presentation in length-lex order for L. Thennℕi / kn sik= sk Note: having a limit point is a sufficient condition for non learnability; not a necessary condition Zadar, August 2010

Mincons classes • A class is mincons if there is an algorithm which, given a sample S, builds a GG such that S L L(G) L =L(G) • Ie there is a unique minimum (for inclusion) consistent grammar Zadar, August 2010

Accumulation point (Kapur 91) A class L of languages has an accumulation pointif there exists an infinite sequence Sn nℕ of sets such that S0  S1  … Sn  …, and L= nℕSn  L …and for any nℕ there exists a language Ln’ in L such that Sn  Ln’  L. The language L is called an accumulation point of L Zadar, August 2010

Ln’ S0 S1 S2 S3 L is an accumulation point Sn L Zadar, August 2010

Theorem (for Mincons classes) L admits an accumulation point iff L is not learnable from text Zadar, August 2010

Infinite Elasticity • If a class of languages has a limit point there exists an infinite ascending chain of languages L0 L1  …  Ln  …. • This property is called infinite elasticity Zadar, August 2010

Infinite Elasticity x0 x1 xi Xi+1 Xi+2 Xi+3 Xi+4 x2 x3 Zadar, August 2010

Finite elasticity L has finite elasticity if it does not have infinite elasticity Zadar, August 2010

Theorem (Wright) If L(G) has finite elasticity and is mincons, then G is learnable. Zadar, August 2010

L(G’) TG x1 x3 x2 x4 Tell tale sets Forbidden L(G) Zadar, August 2010

Theorem (Angluin) G is learnable iff there is a computable partial function : Gℕ* such that: • nℕ, (G,n) is defined iffGG and L(G) • GG, TM={(G,n): nℕ} is a finite subset of L(G) called a tell-tale subset • G,G’M, if TM L(G’) then L(G’) L(G) Zadar, August 2010

Proposition (Kapur 91) A language L in L has a tell-tale subsetiffL is not an accumulation point. (for mincons) Zadar, August 2010

Summarizing • Many alternative ways of proving that identification in the limit is not feasible • Methodological-philosophical discussion • We still need practical solutions Zadar, August 2010

3 Learning k-testable languages P. García and E. Vidal. Inference of K-testable languages in the strict sense and applications to syntactic pattern recognition. Pattern Analysis and Machine Intelligence, 12(9):920–925, 1990 P. García, E. Vidal, and J. Oncina. Learning locally testable languages in the strict sense. In Workshop on Algorithmic Learning Theory (Alt 90), pages 325–338, 1990 Zadar, August 2010

Definition Let k0, a k-testable language in the strict sense (k-TSS) is a 5-tuple Zk=(, I, F, T, C) with: •  a finite alphabet • I, Fk-1(allowed prefixes of length k-1 and suffixes of length k-1) • Tk (allowed segments) • C<k contains all strings of length less than k • Note that I∩F=C∩Σk-1 Zadar, August 2010

The k-testable language is L(Zk)=I*  *F - *(k-T)*C • Strings (of length at least k) have to use a good prefix and a good suffix of length k-1, and all sub-strings have to belong to T. Strings of length less than k should be in C • Or: k-T defines the prohibited segments • Key idea: use a window of size k Zadar, August 2010

a b a a  a b An example (2-testable) I={a} F={a} T={aa, ab, ba} C={,a} Zadar, August 2010

Window language • By sliding a window of size 2 over a string we can parse • ababaaababababaaaab OK • aaabbaaaababab not OK Zadar, August 2010

The hierarchy of k-TSS languages • k-TSS()={L*: L is k-TSS} • All finite languages are in k-TSS() if k is large enough! • k-TSS()  [k+1]-TSS() • (bak)* [k+1]-TSS() • (bak)* k-TSS() Zadar, August 2010

a a a  b b a A language that is not k-testable Zadar, August 2010

K-TSS inference Given a sample S, L(ak-TSS(S))= Zk where Zk=((S), I(S), F(S), T(S), C(S) ) and • (S) is the alphabet used in S • C(S)=(S)<kS • I(S)=(S)k-1Pref(S) • F(S)= (S)k-1Suff(S) • T(S)=(S)k {v: uvwS} Zadar, August 2010

Example • S={a, aa, abba, abbbba} • Let k=3 • (S)={a, b} • I(S)= {aa, ab} • F(S)= {aa, ba} • C(S)= {a , aa} • T(S)={abb, bbb, bba} • L(a3-TSS(S))= ab*a+a Zadar, August 2010

Building the corresponding automaton • Each string in IC and PREF(IC) is a state • Each substring of length k-1 of strings in T is a state •  is the initial state • Add a transition labeled b from u to ub for each state ub • Add a transition labeled b from au to ub for each aub in T • Each state/substring that is in F is a final state • Each state/substring that is in C is a final state Zadar, August 2010

a aa aa a a a I={aa, ab}   b F={aa, ba} ab ab T={abb, bbb, bba} C={a, aa} b bb a bb ba ba b Running the algorithm S={a, aa, abba, abbbba} Zadar, August 2010

Properties (1) • S L(ak-TSS(S)) • L(ak-TSS(S)) is the smallest k-TSS language that contains S • If there is a smaller one, some prefix, suffix or substring has to be absent Zadar, August 2010

Properties (2) • ak-TSS identifies any k-TSS language in the limit from polynomial data • Once all the prefixes, suffixes and substrings have been seen, the correct automaton is returned • If YS, L(ak-TSS(Y)) L(ak-TSS(S)) Zadar, August 2010

Properties (3) • L(ak+1-TSS(S)) L(ak-TSS(S)) In Ik+1 (resp. Fk+1 and Tk+1) there are less allowed prefixes (resp. suffixes or substrings) than in Ik (resp. Fk and Tk) • k>maxxSx, L(ak-TSS(S))= S • Because for a large k, Tk(S)= Zadar, August 2010

4 Learning k-reversible languages from text D. Angluin. Inference of reversible languages. Journal of the Association for Computing Machinery, 29(3):741–765, 1982 Zadar, August 2010

The k-reversible languages • The class was proposed by Angluin (1982) • The class is identifiable in the limit from text • The class is composed by regular languages that can be accepted by a DFA such that its reverse is deterministic with a look-ahead of k Zadar, August 2010

Let A=(, Q, , I, F) be a NFA, we denote by AT=(, Q, T, F, I) the reversal automaton with: T(q,a)={q’Q: q(q’,a)} Zadar, August 2010

a b A a 2 b 1 0 a 4 a a 3 a AT b a 2 b 1 0 a 4 a a 3 Zadar, August 2010

Some definitions • u is a k-successor of q if │u│=k and (q,u) • u is a k-predecessor of q if │u│=k and T(q,uT) •  is 0-successor and 0-predecessor of any state Zadar, August 2010

a b A a 2 b • aa is a 2-successor of 0 and 1 but not of 3 • a is a 1-successor of 3 • aa is a 2-predecessor of 3 but not of 1 1 0 a 4 a a 3 Zadar, August 2010

A NFA is deterministic with look-ahead kifq,q’Q: qq’ (q,q’I)  (q,q’(q”,a))  (u is a k-successor of q)  (v is a k-successor of q’)  uv Zadar, August 2010

Prohibited: u 1 │u│=k a a u 2 Zadar, August 2010

Example a b This automaton is not deterministic with look-ahead 1 but is deterministic with look-ahead 2 a 2 b 1 0 a 4 a a 3 Zadar, August 2010

K-reversible automata • A is k-reversible if A is deterministic and AT is deterministic with look-ahead k • Example b b a a b a 2 b a 2 1 0 1 0 b b deterministic with look-ahead 1 deterministic Zadar, August 2010

Notations • RL(, k) is the set of all k reversible languages over alphabet  • RL() is the set of all k-reversible languages over alphabet  (ie for all values of k) • ak-RL is the learning algorithm we describe Zadar, August 2010

Properties • There are some regular languages that are not in RL() • RL(,k)RL(,k-1) check Zadar, August 2010

Violation of k-reversibility Two states q, q’ violate the k-reversibility condition if • they violate the deterministic condition: q,q’(q”,a) or • they violate the look-ahead condition: • q,q’F, uk: u is k-predecessor of both q and q’ • uk, (q,a)=(q’,a) and u is k-predecessor of both q and q’ Zadar, August 2010

Advances in Learning k-Testable Languages from Text: Challenges and Insights

Advances in Learning k-Testable Languages from Text: Challenges and Insights

Presentation Transcript

Learning Objective : Make predictions using evidence from the text

Supervised learning for text

Learning to Classify Text

Ontology Learning from Text

Ontology Learning from Text: A Survey of Methods

Lecture 16: Unsupervised Learning from Text

Text Learning

Learning for Text Categorization

Bloom’s text learning objectives

Ontology learning and population from from text

Ontology Learning and Population from Text

From Pictograph , Text to Hyper-Text:

Learning from Challenging Text

Search and Comprehension Processes in Learning from Text

Starting from Text Files

Learning from Text

Lecture 16: Unsupervised Learning from Text

Learning Text Mining

Information extraction from text

Learning to Classify Text