

Grammatical inference: techniques and algorithms. Colin de la Higuera.



  1. Grammatical inference: techniques and algorithms Colin de la Higuera

  2. Acknowledgements • Laurent Miclet, Tim Oates, Jose Oncina, Rafael Carrasco, Paco Casacuberta, Pedro Cruz, Rémi Eyraud, Philippe Ezequel, Henning Fernau, Jean-Christophe Janodet, Thierry Murgue, Frédéric Tantini, Franck Thollard, Enrique Vidal,... • … and a lot of other people to whom I am grateful

  3. Outline 1 An introductory example 2 About grammatical inference 3 Some specificities of the task 4 Some techniques and algorithms 5 Open issues and questions

  4. 1 How do we learn languages? A very simple example

  5. The problem: • You are in an unknown city and have to eat. • You therefore go to some selected restaurants. • Your goal is to build a model of the city (a map).

  6. The data • Up Down Right Left Left Restaurant • Down Down Right Not a restaurant • Left Down Restaurant
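In learning terms, each observed walk is a string over the move alphabet {u, d, l, r} together with a Boolean label. A minimal sketch of this data (the Python encoding is my own, not from the slides):

```python
# Each observation: a string over the move alphabet {u, d, l, r}
# and a label (True = the walk ended at a restaurant).
sample = [
    ("udrll", True),   # Up Down Right Left Left -> Restaurant
    ("ddr",   False),  # Down Down Right -> not a restaurant
    ("ld",    True),   # Left Down -> Restaurant
]

# Grammatical inference algorithms usually split this into
# positive and negative examples.
positive = [w for w, label in sample if label]
negative = [w for w, label in sample if not label]
```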

  7. Hopefully something like this: [Figure: the target automaton, with states labelled R (restaurant) or N (not a restaurant) and transitions labelled u, d, l, r]

  8. [Figure: a larger automaton of the same kind, states labelled R or N, transitions labelled u, d, l, r]

  9. Further arguments (1) • How did we get hold of the data? • Random walks • Following someone • Someone knowledgeable • Someone trying to lose us • Someone on a diet • Exploring

  10. Further arguments (2) • Can we not have better information (for example the names of the restaurants)? • But then we may only have the information about the routes to restaurants (not to the “non restaurants”)…

  11. Further arguments (3) What if instead of getting the information “Elimo” or “restaurant”, I get the information “good meal” or “7/10”? Reinforcement learning: POMDP

  12. Further arguments (4) • Where is my algorithm to learn these things? • Should I perhaps consider several algorithms for the different types of data?

  13. Further arguments (5) • What can I say about the result? • What can I say about the algorithm?

  14. Further arguments (6) • What if I want something richer than an automaton? • A context-free grammar • A transducer • A tree automaton…

  15. Further arguments (7) • Why do I want something as rich as an automaton? • What about • A simple pattern? • Some SVM obtained from features over the strings? • A neural network that would tell me, with high probability, whether some path brings me to a restaurant?

  16. Our goal/idea • The ancient Greeks: the whole is more than the sum of its parts • Gestalt theory: the whole is different from the sum of its parts

  17. Better said • There are cases where the data cannot be analyzed by considering it in bits • There are cases where intelligibility of the pattern is important

  18. What do people know about formal language theory? Nothing Lots

  19. A small reminder on formal language theory • Chomsky hierarchy • Pros and cons of grammars

  20. A crash course in Formal language theory • Symbols • Strings • Languages • Chomsky hierarchy • Stochastic languages

  21. Symbols are taken from some alphabet Σ. Strings are sequences of symbols from Σ.

  22. Languages are sets of strings over Σ: languages are subsets of Σ*.

  23. Special languages • Are recognised by finite state automata • Are generated by grammars

  24. [Figure: a DFA over {a, b}] DFA: Deterministic Finite State Automaton

  25. [Figure: the same DFA reading the string abab] abab ∈ L
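The automaton in the figure cannot be recovered from the transcript, so here is a hypothetical two-state DFA over {a, b} (accepting the strings with an even number of b's) together with a generic membership routine; membership testing is linear in the length of the string:

```python
def accepts(dfa, start, accepting, word):
    """Run a DFA given as {(state, symbol): next_state}.

    Membership is decided in one left-to-right pass (linear time).
    """
    state = start
    for symbol in word:
        if (state, symbol) not in dfa:  # missing transition: reject
            return False
        state = dfa[(state, symbol)]
    return state in accepting

# Hypothetical DFA: strings over {a, b} with an even number of b's.
dfa = {
    (0, "a"): 0, (0, "b"): 1,
    (1, "a"): 1, (1, "b"): 0,
}
accepts(dfa, 0, {0}, "abab")  # abab has two b's, so it is accepted
```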

  26. What is a context free grammar? A 4-tuple (Σ, S, V, P) such that: • Σ is the alphabet; • V is a finite set of non-terminals; • S ∈ V is the start symbol; • P ⊆ V × (V ∪ Σ)* is a finite set of rules.

  27. Example of a grammar The Dyck1 grammar • (Σ, S, V, P) • Σ = {a, b} • V = {S} • P = {S → aSbS, S → ε}

  28. Derivations and derivation trees S ⇒ aSbS ⇒ aaSbSbS ⇒ aabSbS ⇒ aabbS ⇒ aabb [Figure: the corresponding derivation tree]
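Membership in Dyck1 can be tested with a simple counter, reading a as an opening bracket and b as a closing one; a minimal sketch:

```python
def in_dyck1(word):
    """A string over {a, b} is in Dyck1 iff the bracket depth
    (a opens, b closes) never goes negative and ends at zero."""
    depth = 0
    for symbol in word:
        depth += 1 if symbol == "a" else -1
        if depth < 0:
            return False
    return depth == 0

in_dyck1("aabb")  # True: this is the string derived above
```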

  29. Chomsky Hierarchy • Level 0: no restriction • Level 1: context-sensitive • Level 2: context-free • Level 3: regular

  30. Chomsky Hierarchy • Level 0: Whatever Turing machines can do • Level 1: • {a^n b^n c^n : n ∈ ℕ} • {a^n b^m c^n d^m : n, m ∈ ℕ} • {uu : u ∈ Σ*} • Level 2: context-free • {a^n b^n : n ∈ ℕ} • brackets • Level 3: regular • Regular expressions (GREP)

  31. The membership problem • Level 0: undecidable • Level 1: decidable • Level 2: polynomial • Level 3: linear
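The polynomial bound for level 2 is typically achieved by the CYK algorithm on a grammar in Chomsky normal form. A sketch, using a hypothetical CNF grammar for {a^n b^n : n ≥ 1} (the grammar and encoding are my own illustration):

```python
def cyk(word, rules, start="S"):
    """CYK membership test for a grammar in Chomsky normal form.

    rules maps each non-terminal to a list of bodies: either a
    1-character terminal string or a pair of non-terminals.
    Runs in O(n^3 * |grammar|).
    """
    n = len(word)
    if n == 0:
        return False
    # table[l][i] = non-terminals deriving word[i : i + l + 1]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, bodies in rules.items():
            if ch in bodies:
                table[0][i].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                for lhs, bodies in rules.items():
                    for body in bodies:
                        if (isinstance(body, tuple)
                                and body[0] in table[split - 1][i]
                                and body[1] in table[length - split - 1][i + split]):
                            table[length - 1][i].add(lhs)
    return start in table[n - 1][0]

# Hypothetical CNF grammar for {a^n b^n : n >= 1}.
rules = {
    "S": [("A", "B"), ("A", "X")],
    "X": [("S", "B")],
    "A": ["a"],
    "B": ["b"],
}
```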

  32. The equivalence problem • Level 0: undecidable • Level 1: undecidable • Level 2: undecidable • Level 3: polynomial, but only when the representation is a DFA.

  33. [Figure: a PFA over {a, b}] PFA: Probabilistic Finite (state) Automaton

  34. [Figure: a DPFA over {a, b} with transition and stopping probabilities] DPFA: Deterministic Probabilistic Finite (state) Automaton
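In a DPFA every state has a stopping probability and at most one outgoing transition per symbol, so each string has a unique path; its probability is the product of the transition probabilities along that path times the stopping probability of the last state. A sketch with invented numbers (not those of the figure):

```python
def string_prob(transitions, stop, start, word):
    """Probability of a string under a DPFA.

    transitions: {(state, symbol): (next_state, prob)}
    stop: {state: stopping probability}
    """
    state, p = start, 1.0
    for symbol in word:
        if (state, symbol) not in transitions:
            return 0.0  # no path: probability zero
        state, q = transitions[(state, symbol)]
        p *= q
    return p * stop[state]

# Hypothetical two-state DPFA.
transitions = {(0, "a"): (1, 0.5), (1, "b"): (0, 0.3)}
stop = {0: 0.5, 1: 0.7}

string_prob(transitions, stop, 0, "a")  # 0.5 * 0.7 = 0.35
```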

  35. What is nice with grammars? • Compact representation • Recursivity • Says how a string belongs, not just if it belongs • Graphical representations (automata, parse trees)

  36. What is not so nice with grammars? • Even the easiest class (level 3) contains SAT, Boolean functions, parity functions… • Noise is very harmful: • Think about adding edit noise to the language {w : |w|_a ≡ 0 [2] and |w|_b ≡ 0 [2]}
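To see why edit noise is harmful here: any single edit (inserting, deleting or substituting one symbol) changes the parity of |w|_a or |w|_b, so it always moves a string across the language boundary. A small sketch of the parity check:

```python
def in_parity_language(word):
    """Even number of a's AND even number of b's."""
    return word.count("a") % 2 == 0 and word.count("b") % 2 == 0

w = "aabb"
in_parity_language(w)  # True
# Deleting any single symbol flips membership:
all(not in_parity_language(w[:i] + w[i + 1:]) for i in range(len(w)))  # True
```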

  37. 2 Specificities of grammatical inference Grammatical inference consists (roughly) in finding the (a) grammar or automaton that has produced a given set of strings (sequences, trees, terms, graphs).
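A common starting point for such algorithms (in particular the state-merging family, e.g. RPNI) is the prefix tree acceptor of the positive strings: a tree-shaped DFA that accepts exactly the sample, which is then generalised by merging states. A minimal sketch:

```python
def build_pta(positive):
    """Build the prefix tree acceptor of a positive sample.

    Returns (transitions, accepting): a tree-shaped DFA, with integer
    states, whose language is exactly the set of sample strings.
    """
    transitions, accepting, next_state = {}, set(), 1
    for word in positive:
        state = 0  # state 0 is the root / initial state
        for symbol in word:
            if (state, symbol) not in transitions:
                transitions[(state, symbol)] = next_state
                next_state += 1
            state = transitions[(state, symbol)]
        accepting.add(state)
    return transitions, accepting

def pta_accepts(transitions, accepting, word):
    state = 0
    for symbol in word:
        if (state, symbol) not in transitions:
            return False
        state = transitions[(state, symbol)]
    return state in accepting
```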

  38. The field Inductive Inference Pattern Recognition Machine Learning Grammatical Inference Computational linguistics Computational biology Web technologies

  39. The data • Strings, trees, terms, graphs • Structural objects • Basically the same gap of information as in programming between tables/arrays and data structures

  40. Alternatives to grammatical inference • 2 steps: • Extract features from the strings • Use a very good method over ℝⁿ.
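In this two-step approach each string is first mapped to a feature vector in ℝⁿ, and a standard learner (an SVM, say) is then trained on the vectors. A hypothetical feature map (the choice of features is my own):

```python
def features(word):
    """Map a string over {a, b} to a vector in R^3:
    (length, number of a's, number of b's)."""
    return (len(word), word.count("a"), word.count("b"))

features("aabab")  # (5, 3, 2)
```

The price of this alternative is that most of the string's structure (the order of the symbols) is lost by such a map.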

  41. Examples of strings A string in Gaelic and its translation to English: • Tha thu cho duaichnidh ri èarr àirde de a’ coisich deas damh • You are as ugly as the north end of a southward traveling ox

  42. >A BAC=41M14 LIBRARY=CITB_978_SKB AAGCTTATTCAATAGTTTATTAAACAGCTTCTTAAATAGGATATAAGGCAGTGCCATGTA GTGGATAAAAGTAATAATCATTATAATATTAAGAACTAATACATACTGAACACTTTCAAT GGCACTTTACATGCACGGTCCCTTTAATCCTGAAAAAATGCTATTGCCATCTTTATTTCA GAGACCAGGGTGCTAAGGCTTGAGAGTGAAGCCACTTTCCCCAAGCTCACACAGCAAAGA CACGGGGACACCAGGACTCCATCTACTGCAGGTTGTCTGACTGGGAACCCCCATGCACCT GGCAGGTGACAGAAATAGGAGGCATGTGCTGGGTTTGGAAGAGACACCTGGTGGGAGAGG GCCCTGTGGAGCCAGATGGGGCTGAAAACAAATGTTGAATGCAAGAAAAGTCGAGTTCCA GGGGCATTACATGCAGCAGGATATGCTTTTTAGAAAAAGTCCAAAAACACTAAACTTCAA CAATATGTTCTTTTGGCTTGCATTTGTGTATAACCGTAATTAAAAAGCAAGGGGACAACA CACAGTAGATTCAGGATAGGGGTCCCCTCTAGAAAGAAGGAGAAGGGGCAGGAGACAGGA TGGGGAGGAGCACATAAGTAGATGTAAATTGCTGCTAATTTTTCTAGTCCTTGGTTTGAA TGATAGGTTCATCAAGGGTCCATTACAAAAACATGTGTTAAGTTTTTTAAAAATATAATA AAGGAGCCAGGTGTAGTTTGTCTTGAACCACAGTTATGAAAAAAATTCCAACTTTGTGCA TCCAAGGACCAGATTTTTTTTAAAATAAAGGATAAAAGGAATAAGAAATGAACAGCCAAG TATTCACTATCAAATTTGAGGAATAATAGCCTGGCCAACATGGTGAAACTCCATCTCTAC TAAAAATACAAAAATTAGCCAGGTGTGGTGGCTCATGCCTGTAGTCCCAGCTACTTGCGA GGCTGAGGCAGGCTGAGAATCTCTTGAACCCAGGAAGTAGAGGTTGCAGTAGGCCAAGAT GGCGCCACTGCACTCCAGCCTGGGTGACAGAGCAAGACCCTATGTCCAAAAAAAAAAAAA AAAAAAAGGAAAAGAAAAAGAAAGAAAACAGTGTATATATAGTATATAGCTGAAGCTCCC TGTGTACCCATCCCCAATTCCATTTCCCTTTTTTGTCCCAGAGAACACCCCATTCCTGAC TAGTGTTTTATGTTCCTTTGCTTCTCTTTTTAAAAACTTCAATGCACACATATGCATCCA TGAACAACAGATAGTGGTTTTTGCATGACCTGAAACATTAATGAAATTGTATGATTCTAT

  43. <book> <part> <chapter> <sect1/> <sect1> <orderedlist numeration="arabic"> <listitem/> <f:fragbody/> </orderedlist> </sect1> </chapter> </part> </book>

  44. <?xml version="1.0"?><?xml-stylesheet href="carmen.xsl" type="text/xsl"?><?cocoon-process type="xslt"?> <!DOCTYPE pagina [<!ELEMENT pagina (titulus?, poema)><!ELEMENT titulus (#PCDATA)><!ELEMENT auctor (praenomen, cognomen, nomen)><!ELEMENT praenomen (#PCDATA)><!ELEMENT nomen (#PCDATA)><!ELEMENT cognomen (#PCDATA)><!ELEMENT poema (versus+)><!ELEMENT versus (#PCDATA)>]> <pagina><titulus>Catullus II</titulus><auctor><praenomen>Gaius</praenomen><nomen>Valerius</nomen><cognomen>Catullus</cognomen></auctor>
