
Note: These slides have been provided online for the convenience of students attending the 2003 Merck summer school, and for individuals who have explicitly been given permission by Ken Norman. Please do not distribute these slides to third parties without permission from Ken (which is easy to get… just email Ken at knorman@princeton.edu).

Secrets of Neural Network Models

Ken Norman

Princeton University

July 24, 2003


The Plan, and Acknowledgements

  • The Plan:
  • I will teach you all of the secrets of neural network models in 2.5 hours
  • Lecture for the first half
  • Hands-on workshop for the second half
  • Acknowledgements:
  • Randy O’Reilly
  • my lab: Greg Detre, Ehren Newman, Adler Perotte, and Sean Polyn

The Big Question

  • How does the gray glop in your head give rise to cognition?
  • We know a lot about the brain, and we also know a lot about cognition
  • The real challenge is to bridge between these two levels

Complexity and Levels of Analysis

  • The brain is very complex: billions of neurons, trillions of synapses, all changing every nanosecond
  • Each neuron is a very complex entity unto itself
  • We need to abstract away from this complexity!
  • Is there some simpler, higher level for describing what the brain does during cognition?

We want to draw on neurobiology for ideas about how the brain performs a particular kind of task

  • Our models should be consistent with what we know about how the brain performs the task
  • But at the same time, we want to include only aspects of neurobiology that are essential for explaining task performance

Learning and Development

  • Neural network models provide an explicit, mechanistic account of how the brain changes as a function of experience
  • Goals of learning:
  • To acquire an internal representation (a model) of the world that allows you to predict what will happen next, and to make inferences about “unseen” aspects of the environment
  • The system must be robust to noise/degradation/damage
  • Focus of workshop: Use neural networks to explore how the brain meets these goals

Outline of Lecture

  • What is a neural network?
  • Principles of learning in neural networks:
  • Hebbian learning: Simple learning rules that are very good at extracting the statistical structure of the environment (i.e., what things are there in the world, and how are they related to one another)
  • Shortcomings of Hebbian learning: It’s good at acquiring coarse category structure (prototypes) but it’s less good at learning about atypical stimuli and arbitrary associations
  • Error-driven learning: Very powerful rules that allow networks to learn from their mistakes

Outline, Continued

  • The problem of interference in neocortical networks, and how the hippocampus can help alleviate this problem
  • Brief discussion of PFC and how networks can support active maintenance in the face of distracting information
  • Background information for the “hands-on” portion of the workshop

Overall Philosophy

  • The goal is to give you a good set of intuitions for how neural networks function
  • I will simplify and gloss over lots of things.
  • Please ask questions if you don’t understand what I’m saying...

What is a neural network?

  • Neurons measure how much input they receive from other neurons; they “fire” (send a signal) if input exceeds a threshold value
  • Input is a function of firing rate and connection strength
  • Learning in neural networks involves adjusting connection strength

What is a neural network?

  • Key simplifications:
  • We reduce all of the complexity of neuronal firing to a single number, the activity of the neuron, that reflects how often the neuron is spiking
  • We reduce all of the complexity of synaptic connections between neurons to a single number, the synaptic weight, that reflects how strong the connection is
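
To make these two simplifications concrete, here is a minimal sketch in Python (mine, not from the slides; the hard threshold and names like `unit_activity` are illustrative choices — real models, including the workshop software, use smoother activation functions):

```python
# Minimal detector unit: rate-coded activities in [0, 1], one weight per input.
import numpy as np

def unit_activity(acts, weights, threshold=0.5):
    """Input is a function of firing rate and connection strength;
    the unit 'fires' only if its summed input exceeds the threshold."""
    net_input = float(np.dot(acts, weights))   # sum of activity * weight
    return 1.0 if net_input > threshold else 0.0

pre_acts = np.array([1.0, 1.0, 0.0])   # activities of three sending neurons
weights  = np.array([0.4, 0.3, 0.1])   # synaptic weights onto our unit
print(unit_activity(pre_acts, weights))  # net input 0.7 > 0.5, so it fires: 1.0
```
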

Neurons are Detectors

  • Each neuron detects some set of conditions, just as a smoke detector detects smoke. A neuron’s representation is whatever set of conditions it detects.

Detector Model

  • Neurons feed on each other’s outputs, forming layers of ever more complicated detectors
  • Things can get very complex in terms of content, but each neuron is still carrying out the basic detector function
Two-layer Attractor Networks

Hidden Layer (Internal Representation)

Input/Output Layer

  • Model of processing in neocortex
  • Circles = units (neurons); lines = connections (synapses)
  • Unit brightness = activity; line thickness = synaptic weight
  • Connections are symmetric
Two-layer Attractor Networks (continued)

  • Units within a layer compete to become active.
  • Competition is enforced by inhibitory interneurons that sample the amount of activity in the layer and send back a proportional amount of inhibition
  • Inhibitory interneurons prevent epilepsy in the network
  • Inhibitory interneurons are not pictured in subsequent diagrams
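
As a rough sketch of how this kind of feedback inhibition can enforce competition (my simplification, not the actual equations of the workshop software; the gain and damping constants are arbitrary):

```python
# Feedback inhibition sketch: an inhibitory pool samples the layer's total
# activity and sends back a proportional amount of inhibition.
import numpy as np

def settle_with_inhibition(net_inputs, gain=0.8, steps=50):
    acts = np.clip(net_inputs, 0.0, 1.0)
    for _ in range(steps):
        inhibition = gain * acts.sum() / len(acts)      # proportional feedback
        target = np.clip(net_inputs - inhibition, 0.0, 1.0)
        acts = 0.5 * acts + 0.5 * target                # damped settling
    return acts

# Strongly driven units survive the competition; weakly driven ones are squelched.
print(np.round(settle_with_inhibition(np.array([0.9, 0.7, 0.3, 0.2])), 2))
```
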
Two-layer Attractor Networks (continued)

  • These networks are capable of sustaining a stable pattern of activity on their own.
  • “Attractor” = a fancy word for “stable pattern of activity”
  • Real networks are much larger than this, also > 1 unit is active in the hidden layer...

Properties of Two-Layer Attractor Networks

  • I will show that these networks are capable of meeting the “learning goals” outlined earlier
  • Given partial information (e.g., seeing something that has wings and feathers), the networks can make a “guess” about other properties of that thing (e.g., it probably flies)
  • Networks show graceful degradation

Learning: Overview

  • Learning = changing connection weights
  • Learning rules: How to adjust weights based on local information (presynaptic and postsynaptic activity) to produce appropriate network behavior
  • Hebbian learning: building a statistical model of the world, without an explicit teacher...
  • Error-driven learning: rules that detect undesirable states and change weights to eliminate these undesirable states...

Building a Statistical Model of the World

  • The world is inhabited by things with relatively stable sets of features
  • We want to wire detectors in our brains to detect these things. How can we do this?
  • Answer: Leverage correlation
    • The features of a particular thing tend to appear together, and to disappear together; a thing is nothing more than a correlated cluster of features
    • Learning mechanisms that are sensitive to correlation will end up representing useful things

Hebbian Learning

  • How does the brain learn about correlations?
  • Donald Hebb proposed the following mechanism:
  • When the pre-synaptic (sending) neuron and post-synaptic (receiving) neuron are active at the same time, strengthen the connection between them
    • “neurons that fire together, wire together”
  • When two neurons are connected, and one is active but the other is not, weaken the connection between them
    • “neurons that fire apart, unwire”
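
In update-rule form, a minimal version of these two statements might look like this (a sketch assuming binary activities; the workshop software uses a normalized variant of the same idea):

```python
# Hebbian weight update for one postsynaptic unit.
import numpy as np

def hebb_update(weights, pre, post, lrate=0.1):
    # Fire together, wire together: strengthen where pre AND post are active.
    together = post * pre
    # Fire apart, unwire: weaken where exactly one of the two is active.
    apart = post * (1.0 - pre) + (1.0 - post) * pre
    return weights + lrate * (together - apart)

pre = np.array([1.0, 1.0, 0.0])                 # two senders fired, one didn't
print(hebb_update(np.zeros(3), pre, post=1.0))  # -> [ 0.1  0.1 -0.1]
```
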

Biology of Hebbian Learning:

NMDA-Mediated Long-Term Potentiation


Biology of Hebbian Learning:

  • Long-Term Depression
  • When the postsynaptic neuron is depolarized, but presynaptic activity is relatively weak, you get weakening of the synapse

What Does Hebbian Learning Do?

  • Hebbian learning tunes units to represent correlated sets of input features.
  • Here is why:
  • Say that a unit has 1,000 inputs
  • In this case, turning on and off a single input feature won’t have a big effect on the unit’s activity
  • In contrast, turning on and off a large cluster of 900 input features will have a big effect on the unit’s activity
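
A quick numeric illustration of this point (the 1,000 inputs and 900-feature cluster come from the bullets above; the uniform weight value is an arbitrary choice of mine):

```python
# With 1,000 equally weak connections, one feature barely moves the net input,
# but a correlated cluster of 900 features easily drives the unit.
import numpy as np

weights = np.full(1000, 0.001)                  # 1,000 weak synapses
single = np.zeros(1000); single[0] = 1.0        # one input feature on
cluster = np.zeros(1000); cluster[:900] = 1.0   # a correlated 900-feature cluster

print(weights @ single)   # 0.001 -- toggling one feature is negligible
print(weights @ cluster)  # 0.9   -- the big cluster dominates the unit's activity
```
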
Hebbian Learning
  • Because small clusters of inputs do not reliably activate the receiving unit, the receiving unit does not learn much about these inputs
Hebbian Learning

Big clusters of inputs reliably activate the receiving unit, so the network learns more about big (vs. small) clusters (the “gang effect”).

What Does Hebbian Learning Do?

  • Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more!
Hebbian Learning

[Animation over several slides: a receiving unit gets inputs labeled scaly, slithers, wings, beak, feathers, and flies; over repeated presentations, the unit’s weights are tuned toward the large correlated cluster of bird features.]

What Does Hebbian Learning Do?

  • Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more!
  • The outcome of Hebbian learning is a function of how well different inputs activate the unit, and how frequently they are presented

Self-Organizing Learning

  • One detector can only represent one thing (i.e., pattern of correlated features)
  • Goal: We want to present input patterns to the network and have different units in the network “specialize” for different things, such that each thing is represented by at least one unit
  • Random weights (different initial receptive fields) and competition are important for achieving this goal
  • What happens without competition ...
No Competition

[Animation over several slides: input features scaly, slithers, lives under water, wings, beak, feathers, flies; both hidden units come to respond to the same large feature cluster.]

Without competition, all units end up representing the same “gang” of features; other, smaller correlations get ignored

Competition is important

[Animation over several slides: the same input features, now with inhibition between the hidden units; a final example uses the features striped, orange, sharp teeth, furry, yellow, chirps, and lives under water.]

When units have different initial “receptive fields” and they compete to represent input patterns, units end up representing different things

Hebbian Learning: Summary

  • Hebbian learning finds the thing in the world that most reliably activates the unit, and tunes the unit to like that thing even more
  • When:
    • There are multiple hidden units competing to represent input patterns
    • Each hidden unit starts out with a distinct receptive field
  • Then:
    • Hebbian learning will tune these units so that each thing in the world (i.e., each cluster of correlated features) is represented by at least one unit
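
Putting competition and Hebbian tuning together, here is a toy sketch of self-organizing learning (illustrative only: hard winner-take-all stands in for the softer inhibitory competition described earlier, and the two feature vectors are made up):

```python
# Two hidden units with random initial receptive fields compete for patterns;
# each winner moves toward the pattern it won, so the units specialize.
import numpy as np

rng = np.random.default_rng(0)
bird  = np.array([0, 0, 1, 1, 1, 1.0])   # scaly, slithers, wings, beak, feathers, flies
snake = np.array([1, 1, 0, 0, 0, 0.0])
W = rng.uniform(0.0, 0.1, size=(2, 6))   # random initial receptive fields

for _ in range(100):
    x = bird if rng.random() < 0.5 else snake
    winner = int(np.argmax(W @ x))        # competition: best-matching unit wins
    W[winner] += 0.1 * (x - W[winner])    # Hebbian-style move toward the input

print(np.round(W, 2))  # each row typically ends up tuned to one feature cluster
```
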
Problems with Penguins

[Animation over several slides: a penguin input (lives in Antarctica, waddles, wings, beak, feathers) is presented alongside the features slithers and flies; the “gang” of typical-bird features captures the penguin pattern, so the network activates the typical-bird hidden unit and wrongly fills in “flies.”]

Problems with Hebb, and Possible Solutions

  • Self-organizing Hebbian learning is capable of discovering the “high-level” (coarse) categorical structure of the inputs
  • However, it sometimes collapses across more subtle (but important) distinctions, and the learning rule does not have any provisions for fixing these errors once they happen

Problems with Hebb, and Possible Solutions

  • In the penguin problem, if we want the network to remember that typical birds fly, but penguins don’t, then penguins and typical birds need to have distinct (non-identical) hidden representations
  • Hebbian learning assigns the same hidden unit to penguins and typical birds
  • We need to supplement Hebbian learning with another learning rule that is sensitive to when the network makes an error (e.g., saying that penguins fly) and corrects the error by pulling apart the hidden representations of penguins vs. typical birds.

What is an error, exactly?

  • One common way of conceptualizing error is in terms of predictions and outcomes
  • If you give the network a partial version of a studied pattern, the network will make a prediction as to the missing features of that pattern (e.g., given something that has “feathers”, the network will guess that it probably flies)
  • Later, you learn what the missing features are (the outcome). If the network’s guess about the missing features is wrong, we want the network to be able to change its weights based on the difference between the prediction and the outcome.
  • Today, I will present the GeneRec error-driven learning rule developed by Randy O’Reilly.
Error-Driven Learning

  • Prediction phase:
  • Present a partial pattern
  • The network makes a guess about the missing features.

[Animation over several slides: a partial penguin pattern (lives in Antarctica, wings, beak, feathers) is presented, and activity spreads so that the network fills in a guess for the missing features.]

  • Outcome phase:
  • Present the full pattern
  • Let the network settle

[Animation: the full penguin pattern, with “waddles” present, is shown next to the network’s prediction.]

  • We now need to compare these two activity patterns and figure out which weights to change.

Motivating the Learning Rule

  • The goal of error-driven learning is to discover an internal representation for the item that activates the correct answer.
  • Basically, we want to find hidden units that are associated with the correct answer (in this case, “waddles”).
  • The best way to do this is to examine how activity changes when “waddles” is clamped on during the “outcome” phase.
  • Hidden units that are associated with “waddles” should show an increase in activity in the outcome (vs. prediction) phase.
  • Hidden units that are not associated with “waddles” should show a decrease in activity in the outcome phase (because of increased competition from other units that are associated with “waddles”).

Motivating the Learning Rule

  • Hidden units that are associated with “waddles” should show an increase in activity in the outcome (vs. prediction) phase.
  • Hidden units that are not associated with “waddles” should show a decrease in activity in the outcome phase
  • Here is the learning rule:
  • If a hidden unit shows increased activity (i.e., it’s associated with the correct answer), increase its weights to the input pattern
  • If a hidden unit shows decreased activity (i.e., it’s not associated with the correct answer), reduce its weights to the input pattern
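
In code, the core of this rule might look as follows (a sketch in the spirit of O’Reilly’s GeneRec rule rather than a faithful reproduction; the activity values are invented for illustration):

```python
# Weight change is proportional to input activity times the hidden unit's
# change in activity from the prediction (minus) to the outcome (plus) phase.
import numpy as np

def generec_style_update(W, input_acts, hidden_minus, hidden_plus, lrate=0.05):
    """W[i, j] is the weight from input unit i to hidden unit j."""
    return W + lrate * np.outer(input_acts, hidden_plus - hidden_minus)

inputs  = np.array([1.0, 1.0, 0.0])   # e.g., wings and feathers active
h_minus = np.array([0.9, 0.1])        # prediction phase: wrong unit wins
h_plus  = np.array([0.2, 0.8])        # outcome phase: correct unit wins
print(np.round(generec_style_update(np.zeros((3, 2)), inputs, h_minus, h_plus), 3))
# Weights to the correct unit grow; weights to the incorrect unit shrink.
```
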
Error-Driven Learning

[Animation over several slides: applying the rule to the penguin example, weights from the penguin input to the hidden unit tied to the correct answer are strengthened, and weights to the other hidden unit are weakened.]

  • Hebb and error have opposite effects on weights here!
  • Error increases the extent to which penguin is linked to the right-hand unit, whereas Hebb reinforced penguin’s tendency to activate the left-hand unit

Catastrophic Interference

  • If you change the weights too strongly in response to “penguin”, then the network starts to behave as if all birds waddle. New learning interferes with stored knowledge...
  • The best way to avoid this problem is to make small weight changes, and to interleave “penguin” learning trials with “typical bird” trials
  • The “typical bird” trials serve to remind the network to retain the association between wings/feathers/beak and “flies”...
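
Here is a sketch of what such an interleaved schedule looks like (illustrative; the 90/10 mixture matches the training regime described later in the workshop section):

```python
# Interleaved schedule: rare "penguin" trials scattered among "typical bird"
# reminder trials, each producing only a small weight change.
import random

rng = random.Random(0)
pool = ["typical bird"] * 9 + ["penguin"]   # 90% typical, 10% atypical

schedule = [rng.choice(pool) for _ in range(20)]
print(schedule)   # penguin trials are sprinkled among typical-bird trials

# Each trial would then run a prediction/outcome pair with a SMALL learning
# rate, so no single trial can overwrite the wings/feathers/beak -> flies link.
```
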
Interleaved Training

[Animation over several slides: “penguin” trials are alternated with “typical bird” trials, so the network learns that penguins waddle without forgetting that wings/feathers/beak go with “flies.”]

Gradual vs. One-Trial Learning

  • Problem: It appears that the solution to the catastrophic interference problem is to learn slowly.
  • But we also need to be able to learn quickly!

Gradual vs. One-Trial Learning

  • Put another way: There appears to be a trade-off between learning rate and interference in the cortical network
  • Our claim is that the brain avoids this trade-off by having two separate networks:
  • A slow-learning cortical network that gradually develops internal representations that support generalization, prediction, categorization, etc.
  • A fast-learning hippocampal network that is specialized for rapid memorization (but does not support generalization, categorization, etc.)
[Diagram: the hippocampus (Dentate Gyrus, CA3, CA1) communicates with neocortex through Entorhinal Cortex input and output layers, which in turn connect to lower-level cortex.]

Interactions Between Hippo and Cortex

  • According to the Complementary Learning Systems theory (McClelland et al., 1995), hippocampus rapidly memorizes patterns of cortical activity.
  • The hippocampus manages to learn rapidly without suffering catastrophic interference because it has a built-in tendency to assign distinct, minimally overlapping representations to input patterns, even when they are very similar. Of course this hurts its ability to categorize.

Interactions Between Hippo and Cortex

  • The theory states that, when you are asleep, the hippocampus “plays back” stored patterns in an interleaved fashion, thereby allowing cortex to weave new facts and experiences into existing knowledge structures.
  • Even if something just happens once in the real world, hippocampus can keep re-playing it to cortex, interleaved with other events, until it sinks in...
  • Detailed theory:
  • slow-wave sleep = hippo playback to cortex
  • REM sleep = cortex randomly activates stored representations; this strengthens pre-existing knowledge and protects it against interference
Role of the Hippocampus

[Animation over several slides: a hippocampus layer sits above the two-layer cortical network and rapidly memorizes the full penguin pattern; it can then replay the pattern to cortex, interleaved with other patterns, until it sinks in.]

Error-Driven Learning: Summary

  • Error-driven learning algorithms are very powerful: So long as the learning rate is small, and training patterns are presented in an interleaved fashion, algorithms like GeneRec can learn internal representations that support good “pattern completion” of missing features.
  • Error-driven learning is not meant to be a replacement for Hebbian learning: The two algorithms can co-exist!
  • Hebbian learning actually improves the performance of GeneRec by ensuring that hidden units represent meaningful clusters of features

Error-Driven Learning: Summary

  • Theoretical issues to resolve with error-driven learning: The algorithm requires that the network “know” whether it is in a “prediction” phase or an “outcome” phase; how does the network know this?
  • For that matter, the whole “phases” idea is sketchy
  • GeneRec based on “prediction/outcome” differences is not the only way to do error-driven learning...
    • Backpropagation
    • Learning by reconstruction
    • Adaptive Resonance Theory (Grossberg & Carpenter)

Learning by Reconstruction

  • Instead of doing error-driven learning by comparing predictions and outcomes, you can also do error-driven learning as follows:
  • First, you clamp the correct, full pattern onto the network and let it settle.
  • Then, you erase the input pattern and see whether the network can reconstruct the input pattern based on its internal representation
  • The algorithm is basically the same: you are still comparing two phases...
Learning by Reconstruction

  • Clamp the to-be-learned pattern onto the input and let the network settle
  • Next, wipe the input layer clean (but not the hidden layer) and let the network settle
  • Compare hidden activity in the two phases and adjust weights accordingly (i.e., if activation was higher with the correct answer clamped, increase weights; if activation was lower, decrease weights)

[Animation over several slides: the full penguin pattern is clamped and the network settles; the input is then wiped, and the network tries to reconstruct the pattern from its internal representation.]

Adaptive Resonance Theory

[Animation over several slides: the network compares its top-down expectation against the actual input pattern; several frames flag a MISMATCH! between the two, and this mismatch signal drives learning.]

Spreading Activation vs. Active Maintenance

  • Spreading activation is generally very useful... it lets us make predictions/inferences/etc.
  • But sometimes you just want to hold on to a pattern of activation without letting activation spread (e.g., a phone number, or a person’s name).
  • How do we maintain specific patterns of activity in the face of distraction?

Spreading Activation vs. Active Maintenance

  • As you will see in the “hands-on” part of the workshop, the networks we have been discussing are not very robust to noise/distraction.
  • Thus, there appears to be another tradeoff:
    • Networks that are good at generalization/prediction are lousy at holding on to phone numbers/plans/ideas in the face of distraction

Spreading Activation vs. Active Maintenance

  • Solution: We have evolved a network that is optimized for active maintenance: Prefrontal cortex! This complements the rest of cortex, which is good at generalization but not so good at active maintenance.
  • PFC uses isolated representations to prevent spread of activity...
  • Evidence for isolated stripes in PFC

Tripartite Functional Organization

  • PC = posterior perceptual & motor cortex
  • FC = prefrontal cortex
  • HC = hippocampus and related structures

Tripartite Functional Organization

  • PC = incremental learning about the structure of the environment
  • FC = active maintenance, cognitive control
  • HC = rapid memorization
  • Roles are defined by functional tradeoffs…

Key Trade-offs

  • Extracting what is generally true (across events) vs. memorizing specific events
  • Inference (spreading activation) vs. robust active maintenance

Hands-On Exercises

  • The goal of the hands-on part of the workshop is to get a feel for the kinds of representations that are acquired by Hebbian vs. error-driven learning, and for network dynamics more generally.

Here is the network that we will be using:

  • Activity constraints: Only 10% of hidden units can be strongly active at once; in the input layer, only one unit per row
  • Think of each row in the input as a feature dimension (e.g., shape) and the units in that row are mutually exclusive features along that dimension (square, circle, etc.)
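
Here is a sketch of that input encoding (the dimension and feature names are made up for illustration):

```python
# Each row of the input is a feature dimension; exactly one unit per row is
# active, picking out one mutually exclusive feature along that dimension.
import numpy as np

dims = {"shape": ["square", "circle", "triangle"],
        "color": ["red", "green", "blue"]}

def encode(features):
    rows = []
    for dim, values in dims.items():
        row = np.zeros(len(values))
        row[values.index(features[dim])] = 1.0   # one active unit per row
        rows.append(row)
    return np.stack(rows)

print(encode({"shape": "circle", "color": "red"}))
```
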

This diagram illustrates the connectivity of the network:

  • Each hidden unit is connected to 50% of the input units; there are also recurrent connections from each hidden unit to all of the other hidden units
  • Weights are symmetric
  • Initial weight values were set randomly

I trained up the network on the following 8 patterns:

  • Birds: Typical Bird Number 1, Typical Bird Number 2, Typical Bird Number 3, Atypical Bird (duck)
  • Fish: Typical Fish Number 1, Typical Fish Number 2, Typical Fish Number 3, Atypical Fish (flying fish)

  • In each pattern, the bottom 16 rows encode prototypical features that tend to be shared across patterns within a category; the top 8 rows encode item-specific features that are unique to each pattern.
  • Each category has 3 “typical” items and one “atypical” item
  • During training, the network studied typical patterns 90% of the time and atypical patterns 10% of the time

To save time, the networks you will be using have been pre-trained on the 8 patterns (by presenting them repeatedly, in an interleaved fashion)

  • For some of the simulations, you will be using a network that was trained with (purely) Hebbian learning

For other simulations, you will be using a network that was trained with a combination of error-driven (GeneRec) and Hebbian learning. Training of this network used a three-phase design:

    • First, there was a “prediction” (minus) phase where a partial pattern was presented
    • Second, there was an “outcome” (plus) phase where the full version of the pattern was presented
    • Finally, there was a nothing phase where the input pattern was erased (but not the hidden pattern)
    • Error-driven learning occurred based on the difference in activity between the minus and plus patterns, and based on the difference in activity between the plus and nothing patterns

When you get to the computer room, the simulation should already be open on the computer (some of you may have to double up; I think there are slightly fewer computers than students), and there will be a handout on the desk explaining what to do

  • You can proceed at your own pace
  • I will be there to answer questions (about the lecture and about the computer exercises) and my two grad students Ehren Newman and Sean Polyn will also be there to answer questions.

Your Helpers

[Photo: Ehren, Sean, and me]