
Modelling Language Evolution Lecture 2: Learning Syntax



Presentation Transcript


  1. Modelling Language EvolutionLecture 2: Learning Syntax Simon Kirby University of Edinburgh Language Evolution & Computation Research Unit

  2. Multi-layer networks • For many modelling problems, multi-layer networks are used • Three layers are common: • Input layer • Hidden layer • Output layer • What do the hidden-node activations correspond to? • Internal representation • For some problems, networks need to compute an “intermediate” representation of the data

  3. XOR network - step 1 • XOR is the same as OR but not AND • Calculate OR • Calculate NOT AND • AND the results [diagram: OR, NOT AND, and AND nodes]

  4. XOR network - step 2 [diagram: INPUT 1 and INPUT 2 feed two hidden units — HIDDEN 1 (OR, input weights 10 and 10, bias -7.5) and HIDDEN 2 (NOT AND, input weights -5 and -5, bias 7.5) — which feed the OUTPUT unit (AND, weights 5 and 5, bias -7.5); biases come from a BIAS NODE]
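The weights in the diagram can be checked directly; a minimal sketch in Python, assuming simple threshold (step) units:

```python
def step(x):
    """Threshold activation: the unit fires if its net input exceeds zero."""
    return 1 if x > 0 else 0

def xor_net(i1, i2):
    # Hidden unit 1 computes OR: input weights 10, 10, bias -7.5
    h1 = step(10 * i1 + 10 * i2 - 7.5)
    # Hidden unit 2 computes NOT AND: input weights -5, -5, bias 7.5
    h2 = step(-5 * i1 - 5 * i2 + 7.5)
    # Output unit ANDs the two hidden results: weights 5, 5, bias -7.5
    return step(5 * h1 + 5 * h2 - 7.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "->", xor_net(a, b))
```

Running the truth table confirms the output is 1 exactly when the inputs differ.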

  5. Simple example (Smith 2003) • Smith wanted to model a simple language-using population • Needed a model that learned vocabulary • 3 “meanings” (1 0 0), (0 1 0), (0 0 1) • 6 possible signals (0 0 0), (1 0 0), (1 1 0) … • Used networks for reception and production: [diagram: reception network maps SIGNAL → MEANING, production network maps MEANING → SIGNAL; each is first trained, then performs] • After training, knowledge of language stored in the weights • During reception/production, internal representation is in the activations of the hidden nodes
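As an illustration only (not Smith's actual architecture), a production network can be sketched with a single weight layer trained by the delta rule; the three-word lexicon below is made up:

```python
import math

# One-hot meanings from the slide; the signal for each meaning is an
# arbitrary example lexicon, not Smith's data.
meanings = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
signals  = [(1, 0, 0), (1, 1, 0), (0, 1, 1)]

sigmoid = lambda x: 1 / (1 + math.exp(-x))
w = [[0.0] * 3 for _ in range(3)]          # w[output_unit][input_unit]

# Delta-rule training: nudge each weight toward reproducing the signal
for epoch in range(2000):
    for m, s in zip(meanings, signals):
        out = [sigmoid(sum(w[o][i] * m[i] for i in range(3))) for o in range(3)]
        for o in range(3):
            err = s[o] - out[o]
            for i in range(3):
                w[o][i] += 0.5 * err * out[o] * (1 - out[o]) * m[i]

def produce(meaning):
    """After training, knowledge of the lexicon is stored in the weights."""
    acts = [sigmoid(sum(w[o][i] * meaning[i] for i in range(3))) for o in range(3)]
    return tuple(1 if a > 0.5 else 0 for a in acts)
```

After training, `produce` maps each meaning back to its trained signal, illustrating the slide's point that the language lives in the weights.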

  6. Can a network learn syntax? (Elman 1993) • Important question for the evolution of language: how much knowledge of grammar are we born with? • Modelling can tell us what we can do without • Can we model the acquisition of syntax using a neural network? • One problem… sentences can be arbitrarily long

  7. Representing time • Imagine we presented words one at a time to a network • Would it matter what order the words were given? • No: Each word is a brand new experience • The net has no way of relating each experience with what has gone before • Needs some kind of working memory • Intuitively: each word needs to be presented along with what the network was thinking about when it heard the previous word

  8. The Simple Recurrent Net (SRN) • At each time step, the input is: • a new experience • plus a copy of the hidden unit activations at the last time step [diagram: Input and Context layers feed the Hidden layer, which feeds the Output layer; copy-back connections copy Hidden activations into Context]
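One SRN time step can be sketched as follows; the layer sizes and random weights are illustrative, and no training is shown:

```python
import math
import random

random.seed(0)
N_IN, N_HID, N_OUT = 4, 3, 4   # illustrative sizes, not Elman's

# Hidden layer sees the new input AND the copied-back context
W_ih = [[random.uniform(-1, 1) for _ in range(N_IN + N_HID)] for _ in range(N_HID)]
W_ho = [[random.uniform(-1, 1) for _ in range(N_HID)] for _ in range(N_OUT)]
sigmoid = lambda x: 1 / (1 + math.exp(-x))

def srn_step(word, context):
    x = list(word) + list(context)   # new experience plus working memory
    hidden = [sigmoid(sum(wt * v for wt, v in zip(row, x))) for row in W_ih]
    output = [sigmoid(sum(wt * v for wt, v in zip(row, hidden))) for row in W_ho]
    return output, hidden            # hidden activations become next context

context = [0.5] * N_HID              # neutral starting context
for word in [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0)]:
    out, context = srn_step(word, context)
```

The copy-back is just the assignment of the returned hidden activations to `context` before the next word arrives.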

  9. What inputs and outputs? • How do we force the network to learn syntactic relations? • Can we do it without an external “teacher”? • Answer: the next-word prediction task • Inputs: Current word (and context) • Outputs: Predicted next word • The error signal is implicit in the data
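The task needs no external teacher because the targets come from the corpus itself; a sketch of building (input, target) pairs, with a made-up four-word vocabulary:

```python
# Hypothetical vocabulary and sentence, just to show the pairing
vocab = {w: i for i, w in enumerate(["boys", "chase", "dogs", "see"])}
sentence = ["boys", "chase", "dogs"]

def one_hot(word):
    """Each word is a single unit 'on' in the input, as on slide 11."""
    v = [0] * len(vocab)
    v[vocab[word]] = 1
    return v

# The target at each step is simply the next word in the sentence,
# so the error signal is implicit in the data.
pairs = [(one_hot(cur), one_hot(nxt))
         for cur, nxt in zip(sentence, sentence[1:])]
```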

  10. Long distance dependencies and hierarchy • Elman’s question: how much is innate? • Many argue: • Long-distance dependencies and hierarchical embedding are “unlearnable” without an innate language faculty • How well can an SRN learn them? • Examples: • boys who chase dogs see girls • cats chase dogs • dogs see boys who cats who mary feeds chase • mary walks

  11. First experiments • Each word encoded as a single unit “on” in the input.

  12. Initial results • How can we tell if the net has learned syntax? • Check whether it predicts the correct number agreement • Gets some things right, but makes many mistakes: boys who girl chase see dog • Seems not to have learned long-distance dependencies.

  13. Incremental input • Elman tried teaching the network in stages • Five stages: • 10,000 simple sentences (x 5) • 7,500 simple + 2,500 complex (x 5) • 5,000 simple + 5,000 complex (x 5) • 2,500 simple + 7,500 complex (x 5) • 10,000 complex sentences (x 5) • Surprisingly, this training regime led to success!
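The five-stage regime can be written down directly as data; the sketch below only expresses the schedule, with corpus generation and training omitted:

```python
# Elman's five stages as (simple, complex) sentence counts per pass;
# each mixture is presented five times before moving on.
stages = [
    (10000, 0),
    (7500, 2500),
    (5000, 5000),
    (2500, 7500),
    (0, 10000),
]
REPEATS = 5

# Every stage presents 10,000 sentences per pass, so the whole regime
# amounts to 5 stages x 5 passes x 10,000 = 250,000 presentations.
total_sentences = sum(REPEATS * (s + c) for s, c in stages)
```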

  14. Is this realistic? • Elman reasons that this is in some ways like children’s behaviour • Children seem to learn to produce simple sentences first • Is this a reasonable suggestion? • Where is the incremental input coming from? • Developmental schedule appears to be a product of changing the input.

  15. Another route to incremental learning • Rather than the experimenter selecting simple, then complex sentences, could the change come from the network itself? • Children’s data isn’t changing… children are changing • Elman gets the network to change throughout its “life” • What is a reasonable way for the network to change? • One possibility: memory

  16. Reducing the attention span of a network • Destroy memory by setting context nodes to 0.5 • Five stages of learning (with both simple and complex sentences): • Memory blanked every 3-4 words (x 12) • Memory blanked every 4-5 words (x 5) • Memory blanked every 5-6 words (x 5) • Memory blanked every 6-7 words (x 5) • No memory limitations (x 5) • The network learned the task.
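A sketch of the manipulation: instead of filtering the input, the context nodes are reset to 0.5 every few words. The context update below is a hypothetical stand-in for a real SRN step, which would compute hidden activations:

```python
N_HID = 3   # illustrative size

def process(sentence, window):
    """Run through a sentence, blanking memory every `window` words.
    window=None means no memory limitation (the final stage)."""
    context = [0.5] * N_HID
    since_blank = 0
    for word in sentence:
        if window is not None and since_blank >= window:
            context = [0.5] * N_HID              # destroy working memory
            since_blank = 0
        # Stand-in for one SRN step: context drifts as words come in
        context = [min(1.0, c + 0.1) for c in context]
        since_blank += 1
    return context

# The training schedule from the slide: (blanking window, passes);
# the window grows over learning, then is removed entirely.
schedule = [(3, 12), (4, 5), (5, 5), (6, 5), (None, 5)]
```

With a short window the context never drifts far from its blank state, which is the "starting small" restriction the slide describes.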

  17. Counter-intuitive conclusion: starting small • A fully-functioning network cannot learn syntax. • A network that is initially limited (but matures) learns well. • This seems a strange result, suggesting that networks aren’t good models of language learning after all • On the other hand… • Children mature during learning • Infancy in humans is prolonged relative to other species • Ultimate language ability seems to be related to how early learning starts • i.e., there is a critical period for language acquisition.

  18. Next lecture [diagram: individual learning, cultural evolution, biological evolution] • We’ve seen how we can model aspects of language learning in simulations • What about evolution?
