Modelling Language Evolution Lecture 1: Introduction to Learning

Modelling Language Evolution Lecture 1: Introduction to Learning Simon Kirby, University of Edinburgh, Language Evolution & Computation Research Unit

Course Overview • Learning: introduction to neural nets; learning syntax • Evolution: syntax; learning bias and structure • Culture: iterated learning; the Talking Heads (practical)

Computers for modelling • Computers in linguistics • Engineering (speech and language technologies) • Research tools (waveform analysis, psycholinguistic stimuli etc.) • Recently: model building • Why build models? • Why use computers? • What is a model anyway?

What is a model? • One view: • We use models when we can’t be sure what our theories predict • Especially useful when dealing with complex systems • [Diagram: THEORY, MODEL, PREDICTION, OBSERVATION]

A simple example • Vowels exist in a “space” • Only some patterns arise cross-linguistically • E.g. vowel space seems to be symmetrically filled • Why?

Theory to Model • We need a theory to explain the vowel-space universal • Possible theory: • Vowels tend to avoid being close to each other to maintain perceptual distinctiveness. • Use a model to test the theory (Liljencrants & Lindblom 1972) • In general, computational models are useful when dealing with “complex systems”

Is language a complex system? • Yes – evolution on many different timescales: • Individual learning • Cultural evolution • Biological evolution • Computational models will help us understand these interactions…

Learning • Language learning is crucial to language evolution • What is learning? • Learning occurs when an organism changes its internal state on the basis of experience • What do we need to model learning? • 1. A model of internal states • 2. A model of experience • 3. An algorithm to change 1 in response to 2

One approach: Neural nets • An approach to internal states based on the brain • An artificial neuron is a computational unit that sums inputs and uses them to decide whether to produce an output

Networks of neurons • Typically there will be many connected neurons • Information is stored in weights on the connections • Weights multiply signals sent between nodes • Signals into a node can be excitatory or inhibitory

An artificial neuron • Add up all the inputs multiplied by their weights to get the net input, net • f(net) is the “activation function” that scales the net input to give the output
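
As a minimal sketch of the computation on this slide, a single artificial neuron could be written as below. The names `net_input` and `neuron_output` and the example values are illustrative assumptions, not part of the lecture.

```python
def net_input(inputs, weights):
    """Add up all the inputs multiplied by their weights."""
    return sum(x * w for x, w in zip(inputs, weights))


def neuron_output(inputs, weights, activation):
    """Pass the summed, weighted input through the activation function f(net)."""
    return activation(net_input(inputs, weights))


# Illustrative values only: one excitatory (positive) and one inhibitory
# (negative) weight. The identity function stands in for f(net) here,
# so the output is simply the net input.
output = neuron_output([1.0, 0.5], [2.0, -1.0], activation=lambda net: net)
print(output)
```

Keeping the activation function as a parameter makes it easy to swap in a different f(net) without touching the summation.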

A useful activation function • All or nothing for big excitations or inhibitions… • … but more sensitive in between.
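
The slide does not name the function, but the logistic sigmoid has the shape described, so the sketch below assumes it. The function name `sigmoid` and the sample net inputs are illustrative.

```python
import math


def sigmoid(net):
    """Logistic function: squashes any net input into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-net))


# Large excitation or inhibition saturates near 1 or 0 ("all or nothing");
# net inputs near zero fall on the steep, more sensitive part of the curve.
for net in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"f({net:+.1f}) = {sigmoid(net):.4f}")
```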

AND: a very simple network • A network that works out if both inputs are activated: • Weights into OUTPUT: 5 from INPUT 1, 5 from INPUT 2, -7.5 from the BIAS NODE (always set to 1.0) • Network gives an output over 0.5 only if both inputs are 1.

OR: another very simple network • A network that works out if either input is activated: • Weights into OUTPUT: 10 from INPUT 1, 10 from INPUT 2, -7.5 from the BIAS NODE (always set to 1.0) • Network gives an output over 0.5 if either input is 1.
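
The AND and OR networks can be checked by hand or with a few lines of code. This sketch uses the weights given on the slides and assumes a logistic sigmoid as the activation function; the helper names `unit`, `and_net` and `or_net` are illustrative.

```python
import math


def sigmoid(net):
    """Assumed activation function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-net))


def unit(input1, input2, w1, w2, bias_weight):
    """One output node with two inputs and a bias node fixed at 1.0."""
    return sigmoid(input1 * w1 + input2 * w2 + 1.0 * bias_weight)


def and_net(input1, input2):
    return unit(input1, input2, 5.0, 5.0, -7.5)


def or_net(input1, input2):
    return unit(input1, input2, 10.0, 10.0, -7.5)


# Print the full truth table for both networks.
for a in (0, 1):
    for b in (0, 1):
        print(a, b, f"AND={and_net(a, b):.3f}", f"OR={or_net(a, b):.3f}")
```

With both inputs at 1 the AND net input is 5 + 5 - 7.5 = 2.5, which is positive, so the output is above 0.5. With only one input on it is 5 - 7.5 = -2.5, so the output stays below 0.5.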

XOR: a difficult challenge • A network that works out if only one input is activated: • Weights into OUTPUT from INPUT 1, INPUT 2 and the BIAS NODE (always set to 1.0): ?, ?, ? • Solution needs a more complex net with three layers. Why?

XOR network - step 1 • XOR is the same as OR but not AND • Calculate OR • Calculate NOT AND • AND the results • [Diagram: NOT AND and OR feeding into AND]

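XOR network - step 2 • OUTPUT (AND of the hidden nodes): weights 5 from HIDDEN 1, 5 from HIDDEN 2, -7.5 from the BIAS NODE • HIDDEN 1 (NOT AND): weights -5 from INPUT 1, -5 from INPUT 2, 7.5 from the BIAS NODE • HIDDEN 2 (OR): weights 10 from INPUT 1, 10 from INPUT 2, -7.5 from the BIAS NODE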
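
One reading of the diagram can be run as a feed-forward pass. The sketch assumes the output node uses the AND weights (5, 5, bias -7.5), the NOT AND hidden node uses -5, -5 and bias 7.5, the OR hidden node uses 10, 10 and bias -7.5, and the activation function is a logistic sigmoid. The function names are illustrative.

```python
import math


def sigmoid(net):
    """Assumed activation function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-net))


def unit(x1, x2, w1, w2, bias_weight):
    """One node with two inputs and a bias node fixed at 1.0."""
    return sigmoid(x1 * w1 + x2 * w2 + 1.0 * bias_weight)


def xor_net(input1, input2):
    # Hidden layer: both nodes see the same two inputs.
    not_and = unit(input1, input2, -5.0, -5.0, 7.5)   # high unless both inputs are on
    either = unit(input1, input2, 10.0, 10.0, -7.5)   # high if at least one input is on
    # Output layer: AND of the two hidden activations.
    return unit(not_and, either, 5.0, 5.0, -7.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, f"XOR={xor_net(a, b):.3f}")
```

The output rises above 0.5 only when exactly one input is 1, because that is the only case where both hidden nodes are strongly active at once.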

But what about learning? • We now have: • a model of internal states (connection weights) • a model of experience (inputs and outputs) • Learning: • set the weights in response to experience • How? • Compare network behaviour with “correct” behaviour • Adjust the weights to reduce network error

Error-driven learning • Set weights to random values • Present input pattern • Feed-forward activation through the network to get an output • Calculate difference between output and desired output (i.e. error) • Adjust weights so that the error is reduced • Repeat until network is producing the desired results.
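
The loop above can be sketched for a single output node learning AND. The slide does not specify an update rule, so this assumes a simple one: each weight changes in proportion to the error and to the input on that connection. The learning rate, number of passes, random seed and all names are illustrative assumptions, and the loop runs for a fixed number of passes rather than testing for the desired results.

```python
import math
import random


def sigmoid(net):
    """Assumed activation function (logistic sigmoid)."""
    return 1.0 / (1.0 + math.exp(-net))


def train(patterns, epochs=2000, rate=0.5, seed=0):
    rng = random.Random(seed)
    # Set weights to random values (two inputs plus a bias node fixed at 1.0).
    weights = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    for _ in range(epochs):
        for (x1, x2), target in patterns:
            inputs = [x1, x2, 1.0]
            # Present the input pattern and feed activation forward.
            output = sigmoid(sum(x * w for x, w in zip(inputs, weights)))
            # Error is the difference between desired and actual output.
            error = target - output
            # Adjust each weight so that the error is reduced.
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights


def predict(weights, x1, x2):
    return sigmoid(x1 * weights[0] + x2 * weights[1] + 1.0 * weights[2])


AND_PATTERNS = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
learned = train(AND_PATTERNS)
for (x1, x2), target in AND_PATTERNS:
    print(x1, x2, target, f"{predict(learned, x1, x2):.3f}")
```

The learned weights need not match the hand-set values 5, 5 and -7.5; any weights that put the output on the right side of 0.5 for all four patterns count as a solution.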

Gradient descent • Gradient descent is a form of error-driven learning • Start at a random point on the “error surface” • Move across the surface in the direction of steepest downhill slope • Potential problems: • May overshoot the global minimum • Might get stuck in a local minimum
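
Both problems can be seen on a toy error surface with a single weight. The surface `error(w)` below is made up for illustration (it is not from the lecture), and the step sizes and starting points are assumptions chosen to show the effects.

```python
def error(w):
    """A made-up one-weight error surface with a local and a global minimum."""
    return w**4 - 3 * w**2 + w


def slope(w):
    """Derivative of error(w) with respect to w."""
    return 4 * w**3 - 6 * w + 1


def descend(w, rate=0.01, steps=500):
    """Repeatedly take a small step downhill, against the slope."""
    for _ in range(steps):
        w -= rate * slope(w)
    return w


stuck = descend(2.0)    # settles in the shallower (local) minimum near w = 1.13
best = descend(-2.0)    # settles in the deeper (global) minimum near w = -1.30

# A single step with a much larger rate jumps from w = 2.0 to about w = -0.1,
# straight past the minimum near 1.13.
overshoot = 2.0 - 0.1 * slope(2.0)

print(stuck, error(stuck))
print(best, error(best))
print(overshoot)
```

Where the descent ends up depends on where it starts, which is why the random starting point matters.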

Example: learning past tense of verbs • Network that takes present tense form of verb… • …and produces past tense. • Uses examples to set weights • Generalises to add /-ed/ to verbs it’s never seen before. • Has it learnt a linguistic rule?

Is this psychologically plausible? • We need an error signal • Where does this error signal come from? • Possibilities: • A teacher • Reinforcement • The outcome of some prediction: • e.g. what’s the next word? • what’s the past tense of this verb?

Summary • Modelling tests theories • Computer modelling appropriate for complex systems • Language evolution involves several complex systems • Neural nets are one approach to modelling learning • Networks can be made to adapt to data through error-driven learning • Next lecture: how to model acquisition of syntax
