
Grammar Induction



Presentation Transcript


  1. Grammar Induction So what did we have?

  2. The Model: a graph representation with words as vertices and sentences as paths. [Figure: a graph whose vertices are BEGIN, END, and the words is, and, where, the, that, a, horse, dog, cat; each training sentence is traced as a path through numbered edges.] And is that a horse? Is that a dog? Where is the dog? Is that a cat?
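
The graph model above can be sketched in a few lines (a toy illustration, not the ADIOS code; `build_graph` and its edge representation are invented for this example):

```python
from collections import defaultdict

def build_graph(sentences):
    """Words become vertices; each sentence becomes a path of
    directed edges from BEGIN to END."""
    edges = defaultdict(int)   # (from_word, to_word) -> number of paths using it
    paths = []
    for sentence in sentences:
        words = sentence.lower().rstrip("?.! ").split()
        path = ["BEGIN"] + words + ["END"]
        paths.append(path)
        for a, b in zip(path, path[1:]):
            edges[(a, b)] += 1
    return edges, paths

corpus = ["and is that a horse?", "is that a dog?",
          "where is the dog?", "is that a cat?"]
edges, paths = build_graph(corpus)
# Shared sub-paths are automatically aligned: the edge (is, that)
# is traversed by three of the four sentence paths.
```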

  3. Detecting significant patterns • Identifying patterns becomes easier on a graph • Sub-paths are automatically aligned

  4. Motif EXtraction

  5. Pattern significance • Say we found a potential pattern-edge from nodes 1 to n. Define • m – the number of paths from 1 to n • r – the number of paths from 1 to n+1 • Because it’s a pattern edge, we know that r/m is high • Let’s suppose that the true probability of n+1 given 1 through n is p • r/m is our best estimate of p, but just an estimate • What are the odds of getting these r and m while p is actually below the significance threshold?

  6. Pattern significance • Assume the true probability p equals the threshold exactly • The odds of getting a result at least as extreme as r out of m are then given by the binomial tail probability • If this is smaller than a predetermined α, we say the pattern-edge candidate is significant
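
The formulas missing from these two slides appear to describe a one-sided binomial tail test; here is a sketch under that assumption (the function name and default α are invented):

```python
from math import comb

def significant_pattern_edge(r, m, eta, alpha=0.01):
    """If the true continuation probability were only eta, the chance
    of seeing r or more of the m paths continue is the binomial tail.
    The candidate edge is significant when that tail falls below alpha."""
    tail = sum(comb(m, i) * eta**i * (1 - eta) ** (m - i)
               for i in range(r, m + 1))
    return tail < alpha
```

With m = 50 paths reaching node n and r = 48 continuing to n+1, a threshold of eta = 0.65 yields a vanishingly small tail, so the candidate passes; r = 35 of 50 does not.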

  7. Rewiring the graph Once a pattern is identified as significant, the sub-paths it subsumes are merged into a new vertex and the graph is rewired accordingly. Repeating this process leads to the formation of complex, hierarchically structured patterns.
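
The merge step can be illustrated on a single path (a toy helper; the real implementation must also rewire the surrounding edges, which this sketch omits):

```python
def merge_pattern(path, pattern):
    """Replace each occurrence of the significant sub-path `pattern`
    with a single new pattern vertex."""
    merged, i, k = [], 0, len(pattern)
    while i < len(path):
        if path[i:i + k] == pattern:
            merged.append(("P", tuple(pattern)))  # the new merged vertex
            i += k
        else:
            merged.append(path[i])
            i += 1
    return merged

path = ["BEGIN", "is", "that", "a", "dog", "END"]
merged = merge_pattern(path, ["is", "that", "a"])
# The three-word sub-path is now one hierarchical vertex.
```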

  8. Evaluating performance • Define • Recall – the probability of ADIOS recognizing an unseen grammatical sentence • Precision – the proportion of grammatical ADIOS productions • Recall can be assessed by leaving out some of the training corpus • Precision is trickier • Unless we’re learning a known CFG
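
Both measures reduce to simple ratios once an acceptor and a set of productions are available (the toy acceptor below is invented purely for illustration):

```python
def recall(accepts, held_out):
    """Fraction of unseen grammatical sentences the learner accepts."""
    return sum(accepts(s) for s in held_out) / len(held_out)

def precision(productions, is_grammatical):
    """Fraction of the learner's own productions that are grammatical."""
    return sum(is_grammatical(s) for s in productions) / len(productions)

# Toy stand-in: a "learner" that accepts only three-word sentences.
accepts = lambda s: len(s.split()) == 3
held_out = ["the dog barks", "a cat sleeps", "horses run fast today"]
```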

  9. Determining L • Involves a tradeoff • Larger L will demand more context sensitivity in the inference • Will hamper generalization • Smaller L will detect more patterns • But many might be spurious

  10. The effects of context window width

  11. An ADIOS drawback • ADIOS is inherently a heuristic and greedy algorithm • Once a pattern is created it remains forever – errors accumulate • Sentence ordering affects the outcome • Running ADIOS with different orderings yields patterns that ‘cover’ different parts of the grammar

  12. An ad-hoc solution • Train multiple learners on the corpus • Each on a different sentence ordering • Create a ‘forest’ of learners • To create a new sentence • Pick one learner at random • Use it to produce sentence • To check grammaticality of given sentence • If any learner accepts sentence, declare as grammatical
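
The forest idea maps directly onto a small ensemble wrapper (the `StubLearner` and its methods are invented stand-ins for trained ADIOS instances):

```python
import random

class Forest:
    """Ensemble of learners, each trained on a different sentence ordering."""
    def __init__(self, learners):
        self.learners = learners

    def produce(self, rng=random):
        # To create a new sentence: pick one learner at random.
        return rng.choice(self.learners).produce()

    def grammatical(self, sentence):
        # Declare grammatical if ANY learner accepts the sentence.
        return any(l.accepts(sentence) for l in self.learners)

class StubLearner:
    """Trivial stand-in: accepts sentences built from its vocabulary."""
    def __init__(self, vocab):
        self.vocab = set(vocab)
    def accepts(self, sentence):
        return set(sentence.split()) <= self.vocab
    def produce(self):
        return " ".join(sorted(self.vocab))

forest = Forest([StubLearner(["is", "that", "a", "dog"]),
                 StubLearner(["where", "is", "the", "cat"])])
```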

  13. The ATIS experiments • ATIS-NL is a 13,043-sentence corpus of natural language • Transcribed phone calls to an airline reservation service • ADIOS was trained on 12,700 sentences of ATIS-NL • The remaining 343 sentences were used to assess recall • Precision was determined with the help of 8 graduate students from Cornell University

  14. The ATIS experiments • ADIOS’ performance scores (40 learners) – • Recall – 40% • Precision – 70% • For comparison, ATIS-CFG reached – • Recall – 45% • Precision - <1%(!)

  15. ADIOS/ATIS-N comparison

  16. Meta-analysis of ADIOS results • Define a pattern spectrum as the histogram of pattern types for an individual learner • A pattern type is determined by its contents • E.g. TT, TET, EE, PE… • A single ADIOS learner was trained on each of 6 translations of the Bible
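
A pattern spectrum is just a histogram over type strings; a sketch with an invented encoding (terminals as plain strings, equivalence classes and nested patterns as tagged tuples):

```python
from collections import Counter

def pattern_type(pattern):
    """Type string of a pattern: T per terminal, E per equivalence
    class, P per nested pattern -- giving types like TT, TET, EE, PE."""
    return "".join("T" if isinstance(x, str) else x[0] for x in pattern)

def pattern_spectrum(patterns):
    """Histogram of pattern types for a single learner."""
    return Counter(pattern_type(p) for p in patterns)

patterns = [["is", "that"],                # TT
            ["the", ("E", "animals")],     # TE
            [("P", 7), ("E", 3)]]          # PE
spectrum = pattern_spectrum(patterns)
```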

  17. Pattern spectra

  18. Language dendrogram

  19. So why doesn’t it work?

  20. Our experience • ADIOS does nicely on • ATIS-N • CHILDES • Artificial CFGs • Fails miserably on almost anything else • The Wall Street Journal • Children’s literature • The Bible

  21. Results • CHILDES • Very high recall + precision • The ESL test • ATIS-N • Up to 70% recall (with 700 learners) • Superior language model • Children’s lit • Very few patterns are detected

  22. Some example sentences • CHILDES • baby go ing to go up the ladder ? • the dog won 't sit in the chaise lounge . • take the lady for a ride • ATIS-N • i would like one coach reservation for may ninth from pittsburgh to atlanta leaving pittsburgh before ten o'clock in the morning • where is the stopover of american airlines flight five four five nine • what are the flights from boston to washington on october fifteenth nineteen ninety one

  23. Some example sentences • Children’s lit • The Tin Woodman and the Scarecrow didn ' t mind the dark at all , but Woot the Wanderer felt worried to be left in this strange place in this strange manner , without being able to see any danger that might threaten . • I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

  24. Some corpus statistics

  25. Possible causes for failure I • Sentence complexity and structural diversity • CHILDES and ATIS-N have very few sentence ‘types’ • Most of which are simple, single-clause sentences • Children’s lit has many complex sentences with multiple clauses

  26. Types of complex sentences • Complement clauses • Peter promised that he would come • Sue wants Peter to leave • Relative clauses • Sally bought the bike that was on sale • Is that the driver causing the accidents? • Adverbial clauses • He arrived when Mary was just about to leave • She left the door open to hear the baby • Coordinate clauses • He tried hard, but he failed

  27. That example again I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

  28. Possible causes for failure • Sentence complexity and structural diversity • CHILDES and ATIS-N have very few sentence ‘types’ • Most of which are simple, single-clause sentences • Children’s lit has many complex sentences with multiple clauses • The music lesson

  29. Possible remedies • How do children do it? • Incremental learning • On the importance of starting small • How might we mimic that? • Sorting sentences according to complexity • Starting out with a simpler corpus • The problem of the growing lexicon
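
"Starting small" can be approximated by a crude complexity ordering (the scoring heuristic below is entirely hypothetical):

```python
CLAUSE_MARKERS = {"that", "which", "because", "when", "before", "but", "whom"}

def complexity(sentence):
    """Crude proxy: sentence length plus extra weight for words
    that typically introduce a new clause."""
    words = sentence.lower().split()
    return len(words) + 2 * sum(w in CLAUSE_MARKERS for w in words)

def curriculum(corpus):
    """Order the corpus simplest-first, so learning can start small."""
    return sorted(corpus, key=complexity)

ordered = curriculum(["he tried hard , but he failed",
                      "is that a dog",
                      "take the lady for a ride"])
```

Note the heuristic's crudeness: it weights the demonstrative "that" in "is that a dog" as if it introduced a clause, which is exactly the kind of error a real complexity measure would need to avoid.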

  30. Generalizing patterns New sentence: I like the cow [Figure: pattern P1 = “I like the _E1” with _E1 = {dog, cat, horse}; after the new sentence, _E1 is extended to {dog, cat, horse, cow}.]

  31. May cause overgeneralization New sentence: I like the finer things in life [Figure: the same pattern P1 = “I like the _E1” wrongly extends _E1 = {dog, cat, horse} to include ‘finer’.]

  32. Allowing gaps New sentence: I like the red dog [Figure: pattern P1 = “I like the _E1” with _E1 = {dog, cat, horse} is generalized to P2 = “I like the red _E1”.]

  33. Another approach • Two-phase learning • Split complex sentences into simple clauses • Learn simple clauses • Combine results back to complex sentences and resume learning • Sidesteps the problem of the growing lexicon • Introduces the problem of identifying clause boundaries
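
The split step of the two-phase approach can be sketched with a naive boundary detector (the marker list is invented; reliably finding clause boundaries is exactly the open problem this introduces):

```python
import re

# Cut at a few common subordinators/coordinators. The capture group
# makes re.split alternate text chunks with the matched markers.
BOUNDARY = re.compile(r"\s*(?:,\s*)?\b(that|because|whom|before|when|but|and)\b\s*")

def split_clauses(sentence):
    """Naively split a complex sentence into candidate simple clauses."""
    parts = BOUNDARY.split(sentence)
    return [p.strip() for p in parts[::2] if p.strip()]
```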

  34. That example again I know that some of you have been waiting for this story of the Tin Woodman , because many of my correspondents have asked me , time and again what ever became of the " pretty Munchkin girl " whom Nick Chopper was engaged to marry before the Wicked Witch enchanted his axe and he traded his flesh for tin .

  35. Possible causes for failure II • Sentence complexity and structural diversity • Lexicon size vs. #sentences • Large lexicon might curtail alignments necessary for generalization

  36. Possible remedies • How do children do it? • Have access to semantic information • Which may be used for alignment • How can we mimic it? • Introducing pre-existing ECs • WordNet • Distributional Clustering • Semantic tagging?

  37. An aside - bootstrapping • Used for very small corpora • Iteratively do – • Train a set of learners on the current corpus • Generate sentences • Replace corpus with generated sentences • Problematic for large corpora • Must be performed by transforming the existing sentences
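
The bootstrapping loop itself is tiny; `train` and `generate` below are placeholders for the real training and generation steps:

```python
def bootstrap(corpus, train, generate, rounds=3):
    """Iteratively train a set of learners on the current corpus,
    generate sentences from them, and replace the corpus with the
    generated sentences."""
    for _ in range(rounds):
        learners = train(corpus)
        corpus = generate(learners)
    return corpus

# Placeholder steps: 'learners' are just the corpus itself, and
# generation echoes it plus one extra sentence per round.
grown = bootstrap(["is that a dog"],
                  train=lambda c: c,
                  generate=lambda ls: ls + ["where is the dog"])
```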

  38. A word about the code

  39. A little on Java classes • Similar to struct in C • Also allow the definition of class-specific functions • Data members may be • Private – only accessible to class functions • Public – accessible to everyone • Protected – like private, for most of our purposes

  40. The code • Consists of three packages • com.ADIOS.Model – contains classes defining the graph (Graph.java, Node.java, Edge.java, etc.) • com.ADIOS.Algorithm – the ‘brains’ of the implementation (most importantly contains MarkovMatrix.java and Trainer.java) • com.ADIOS.Helpers – various helper classes

  41. The model • Node • EquivalenceClass • Pattern • Edge • Path • Graph

  42. The algorithm • Trainer • MarkovMatrix • also finds new equivalence class • Generator • calculates recall and generates new sentences

  43. The main package • Main • Processes command line arguments (context window width, corpus file name, etc.) • Finals • A repository of constants used throughout the code

  44. The Model – Node.java • Data members • Label, inEdges, outEdges • Nontrivial functions • getOutEdges(Vector inEdges) • Returns the edges going out of this node that come from inEdges • getInEdges(Vector outEdges) • Same, only in the other direction

  45. The model – EquivalenceClass.java • Inherits from Node • Additional data members – • Nodes • Nontrivial functions – • getOutEdges(), getOutEdges(Vector inEdges) • Same as in Node, only sums for all constituent nodes

  46. The model – Pattern.java • Inherits from Node • Additional data members • Id, path (the pattern specification)

  47. The model – Path.java • Data members – • Id, nodes • Nontrivial functions – • Init(StringTokenizer st) – inits the path according to a line of text • Squeeze(Pattern p, int, int) – finds the instances of p in the path and replaces them by the single node p • Does not rewire the graph!

  48. The model – Edge.java • Data members – • fromNode, toNode • prevEdge, nextEdge, • path • No nontrivial functions

  49. The model – Graph.java • Main data members – • nodes, edges, paths, equivalenceClasses, patterns • Nontrivial functions – • addPattern(Pattern p) – rewires the graph • Print functions – print various data to files

  50. The algorithm – MarkovMatrix.java • Main data members – • path, matrix, pathsCountMatrix • winSize, winIndex, wildcardIndex • ec • Nontrivial functions – • findWildcardCandidate() – generates the new equivalence class in the wildcard position • initMarkovMatrix() – calculates the matrix
