
9.012 Brain and Cognitive Sciences II


Presentation Transcript


  1. 9.012 Brain and Cognitive Sciences II, Part VIII: Intro to Language & Psycholinguistics - Dr. Ted Gibson

  2. Presented by Liu Lab Fighting for Freedom with Cultured Neurons

  3. Distributed Representations, Simple Recurrent Networks, and Grammatical Structure. Jeffrey L. Elman (1991), Machine Learning. Presented by Nathan Wilson.

  4. Distributed Representations/ Neural Networks • are meant to capture the essence of neural computation: many small, independent units calculating very simple functions in parallel.

  5. Distributed Representations/ Neural Networks: EXPLICIT RULES?

  6. Distributed Representations/ Neural Networks: EXPLICIT RULES?

  7. Distributed Representations/ Neural Networks: EXPLICIT RULES? EMERGENCE!

  8. Distributed Representations/ Neural Networks • are meant to capture the essence of neural computation: many small, independent units calculating very simple functions in parallel.

  9. FeedForward Neural Network (from Sebastian’s Teaching)

  10. Don’t forget the nonlinearity!
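
As a concrete illustration of slides 9 and 10, here is a minimal NumPy sketch of a two-layer feedforward pass with a sigmoid squashing function. This code is not from the slides; the weight matrices are placeholders supplied by the caller.

```python
import numpy as np

def sigmoid(x):
    # The nonlinearity: without it, stacked layers collapse
    # into a single linear transformation.
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(x, W_ih, W_ho):
    # One forward pass: input -> hidden -> output,
    # squashing the activations at each layer.
    hidden = sigmoid(W_ih @ x)
    output = sigmoid(W_ho @ hidden)
    return output, hidden
```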

  11. FeedForward Neural Network (from Sebastian’s Teaching)

  12. Recurrent Network (also from Sebastian)
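
The simple recurrent (Elman) architecture differs from the feedforward sketch above in one respect: the hidden layer also receives a copy of its own activations from the previous time step. A minimal sketch, reusing sigmoid from above; the name W_ch for the context-to-hidden weights is my labeling, not the slides'.

```python
def srn_step(x, context, W_ih, W_ch, W_ho):
    # Hidden units see the current input AND the previous
    # hidden state (the "context" units).
    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)
    # The caller copies `hidden` into `context` before processing
    # the next input, giving the network word-to-word memory.
    return output, hidden
```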

  13. Why Apply Network / Connectionist Modeling to Language Processing? • Connectionist Modeling is Good at What it Does • Language is a HARD problem

  14. What We Are Going to Do

  15. What We Are Going to Do • Build a network

  16. What We Are Going to Do • Build a network • Let it learn how to “read”

  17. What We Are Going to Do • Build a network • Let it learn how to “read” • Then test it!

  18. What We Are Going to Do • Build a network • Let it learn how to “read” • Then test it! • Give it some words in a reasonably grammatical sentence • Let it try to predict the next word, • Based on what it knows about grammar

  19. What We Are Going to Do • Build a network • Let it learn how to “read” • Then test it! • Give it some words in a reasonably grammatical sentence • Let it try to predict the next word, • Based on what it knows about grammar • BUT: We’re not going to tell it any of the rules

  20. What We Are Going to Do • Build a network

  21. FeedForward Neural Network (from Sebastian’s Teaching)

  22. Methods > Network Implementation > Structure [diagram: a one-hot INPUT layer feeds a HIDDEN layer carrying a distributed activation pattern, which feeds a one-hot OUTPUT layer]
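
Reading the diagram as code: the input and output layers have one unit per vocabulary word (one-hot patterns), while the hidden layer carries a distributed pattern. A sketch of the setup; the hidden size and weight range are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N_WORDS = 24    # one unit per word in the slide-24 vocabulary
N_HIDDEN = 24   # illustrative assumption; Elman's hidden size may differ

W_ih = rng.uniform(-0.1, 0.1, size=(N_HIDDEN, N_WORDS))  # input -> hidden
W_ho = rng.uniform(-0.1, 0.1, size=(N_WORDS, N_HIDDEN))  # hidden -> output
```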

  23. What We Are Going to Do • Build a network • Let it learn how to “read”

  24. Methods > Network Implementation > Training. Words we're going to teach it: • Nouns: boy | girl | cat | dog | boys | girls | cats | dogs • Proper nouns: John | Mary • “Who” • Verbs: chase | feed | see | hear | walk | live | chases | feeds | sees | hears | walks | lives • “End Sentence”
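
Elman generated his training corpus from a small phrase-structure grammar. The sketch below is only a toy stand-in over exactly this vocabulary: it enforces subject-verb number agreement but omits the relative-clause constructions with "who" that the real grammar allowed, and the transitivity classes are my guesses for illustration.

```python
import random

SINGULAR_NOUNS = ["boy", "girl", "cat", "dog", "John", "Mary"]
PLURAL_NOUNS   = ["boys", "girls", "cats", "dogs"]
SINGULAR_VERBS = ["chases", "feeds", "sees", "hears", "walks", "lives"]
PLURAL_VERBS   = ["chase", "feed", "see", "hear", "walk", "live"]

def make_sentence():
    # Subject and verb must agree in number.
    if random.random() < 0.5:
        subj, verb = random.choice(SINGULAR_NOUNS), random.choice(SINGULAR_VERBS)
    else:
        subj, verb = random.choice(PLURAL_NOUNS), random.choice(PLURAL_VERBS)
    words = [subj, verb]
    # Guessed transitive verbs get a direct object (an assumption).
    if verb.rstrip("s") in ("chase", "feed", "see", "hear"):
        words.append(random.choice(SINGULAR_NOUNS + PLURAL_NOUNS))
    words.append("END")  # the "End Sentence" marker
    return words
```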

  25. Methods > Network Implementation > Training 1. Encode Each Word with Unique Activation Pattern

  26. Methods > Network Implementation > Training 1. Encode Each Word with a Unique Activation Pattern: • boy => 000000000000000000000001 • girl => 000000000000000000000010 • feed => 000000000000000000000100 • sees => 000000000000000000001000 • . . . • who => 010000000000000000000000 • End sentence => 100000000000000000000000

  27. Methods > Network Implementation > Training 1. Encode Each Word with a Unique Activation Pattern: • boy => 000000000000000000000001 • girl => 000000000000000000000010 • feed => 000000000000000000000100 • sees => 000000000000000000001000 • . . . • who => 010000000000000000000000 • End sentence => 100000000000000000000000 2. Feed these words sequentially to the network (only feed words in sequences that make good grammatical sense!)
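
Steps 1 and 2 in code: each of the 24 vocabulary items gets a unique one-hot vector, and a sentence becomes an ordered sequence of those vectors. The particular word ordering (and hence bit assignment) here is an illustrative assumption.

```python
import numpy as np

VOCAB = ["boy", "girl", "cat", "dog", "boys", "girls", "cats", "dogs",
         "John", "Mary", "who",
         "chase", "feed", "see", "hear", "walk", "live",
         "chases", "feeds", "sees", "hears", "walks", "lives",
         "END"]

def one_hot(word):
    # Exactly one unit active per word.
    vec = np.zeros(len(VOCAB))
    vec[VOCAB.index(word)] = 1.0
    return vec

# Step 2: feed words sequentially, one vector per time step.
sequence = [one_hot(w) for w in ["boy", "chases", "dogs", "END"]]
```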

  28. Methods > Network Implementation > Structure [diagram build: the INPUT layer]

  29. Methods > Network Implementation > Structure [diagram build: a one-hot activation pattern on the INPUT layer]

  30. Methods > Network Implementation > Structure [diagram build: a HIDDEN layer added above the INPUT layer]

  31. Methods > Network Implementation > Structure [diagram build: a distributed activation pattern across the HIDDEN units]

  32. Methods > Network Implementation > Structure [diagram build: an OUTPUT layer added above the HIDDEN layer]

  33. Methods > Network Implementation > Structure [diagram build: a one-hot activation pattern on the OUTPUT layer, completing the INPUT -> HIDDEN -> OUTPUT network]

  34. Methods > Network Implementation > Training 1. Encode Each Word with a Unique Activation Pattern: • boy => 000000000000000000000001 • girl => 000000000000000000000010 • feed => 000000000000000000000100 • sees => 000000000000000000001000 • . . . • who => 010000000000000000000000 • End sentence => 100000000000000000000000 2. Feed these words sequentially to the network (only feed words in sequences that make good grammatical sense!)

  35. Methods > Network Implementation > Structure [diagram: the complete INPUT -> HIDDEN -> OUTPUT network]

  36. What We Are Going to Do • Build a network • Let it learn how to “read”

  37. Methods > Network Implementation > Structure [diagram: the same INPUT -> HIDDEN -> OUTPUT network]

  38. Methods > Network Implementation > Structure [diagram: the same INPUT -> HIDDEN -> OUTPUT network] If learning word relations, we need some sort of memory from word to word!

  39. FeedForward Neural Network (from Sebastian’s Teaching)

  40. Recurrent Network (also from Sebastian)

  41. Methods > Network Implementation > Structure [diagram: the same INPUT -> HIDDEN -> OUTPUT network]

  42. Methods > Network Implementation > Structure [diagram: a CONTEXT layer is added alongside the INPUT layer; both feed the HIDDEN layer, which feeds the one-hot OUTPUT layer]

  43.–45. Methods > Network Implementation > Structure [animation steps over the same diagram: the hidden activation pattern is copied into the CONTEXT units and fed back to the HIDDEN layer with the next word]

  46. Methods > Network Implementation > Structure BACKPROP! [diagram: the same network with context; the next-word prediction error is backpropagated to train the weights]
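
A hedged sketch of what "BACKPROP!" amounts to here: one forward pass through the recurrent network, then backpropagation of the next-word prediction error. Following the usual Elman-network recipe, the context is treated as just another input (error is not propagated back through earlier time steps); the squared-error loss and learning rate are assumptions, and sigmoid is from the earlier sketch.

```python
import numpy as np

def train_step(x, target, context, W_ih, W_ch, W_ho, lr=0.1):
    # Forward pass: input + context -> hidden -> predicted next word.
    hidden = sigmoid(W_ih @ x + W_ch @ context)
    output = sigmoid(W_ho @ hidden)

    # Backward pass (squared error; sigmoid derivative is y * (1 - y)).
    err_out = (output - target) * output * (1 - output)
    err_hid = (W_ho.T @ err_out) * hidden * (1 - hidden)

    # Gradient-descent updates (in place).
    W_ho -= lr * np.outer(err_out, hidden)
    W_ih -= lr * np.outer(err_hid, x)
    W_ch -= lr * np.outer(err_hid, context)

    return hidden  # copied into the context units for the next word
```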

  47. What We Are Going to Do • Build a network • Let it learn how to “read” • Then test it! • Give it some words in a reasonably grammatical sentence • Let it try to predict the next word, • Based on what it knows about grammar • BUT: We’re not going to tell it any of the rules

  48. Results > Emergent Properties of Network > Subject-Verb Agreement • After Hearing: • “boy….” • Network SHOULD predict next word is: • “chases” • NOT: • “chase” • Subject and verb should agree!

  49. Results > Emergent Properties of Network > Noun-Verb Agreement • After Hearing: • “boy….” • Network SHOULD predict next word is: • “chases” • NOT: • “chase” • Subject and verb should agree!

  50. Results > Emergent Properties of Network > Noun-Verb Agreement [bar chart, “What Word Network Predicts is Next” after hearing “boy…”: activation (0.0 to 1.0) for each candidate class: singular noun, plural noun, singular verb (direct object optional / required / impossible), plural verb (direct object optional / required / impossible), “Who”, and End of Sentence]
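
One way to read the chart programmatically: after presenting "boy", compare the output activation summed over singular-verb units against plural-verb units; agreement means the singular mass dominates. A sketch assuming the VOCAB list from the earlier encoding example.

```python
def verb_agreement(output):
    # Sum predicted activation over singular vs. plural verb units.
    singular = sum(output[VOCAB.index(w)]
                   for w in ["chases", "feeds", "sees", "hears", "walks", "lives"])
    plural = sum(output[VOCAB.index(w)]
                 for w in ["chase", "feed", "see", "hear", "walk", "live"])
    return singular, plural  # agreement after "boy": singular > plural
```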
