Itti: CS564 - Brain Theory and Artificial IntelligenceUniversity of Southern California
  • Lecture 28. Overview & Summary
  • Reading Assignment:
  • TMB2 Section 8.3
  • Supplementary reading: Article on Consciousness in HBTNN
You said “brain” theory??
  • First step: let’s get oriented!
Major Functional Areas
  • Primary motor: voluntary movement
  • Primary somatosensory: tactile, pain, pressure, position, temp., mvt.
  • Motor association: coordination of complex movements
  • Sensory association: processing of multisensory information
  • Prefrontal: planning, emotion, judgement
  • Speech center (Broca’s area): speech production and articulation
  • Wernicke’s area: comprehension of speech
  • Auditory: hearing
  • Auditory association: complex auditory processing
  • Visual: low-level vision
  • Visual association: higher-level vision

http://www.radiology.wisc.edu/Med_Students/neuroradiology/fmri/

Limbic System
  • Cortex “inside” the brain.
  • Involved in emotions, sexual behavior, memory, etc. (not yet well understood)
Some general brain principles
  • Cortex is layered
  • Retinotopy
  • Columnar organization
  • Feedforward/feedback
Layered Organization of Cortex
  • Cortex is 1 to 5 mm thick, folded at the surface of the brain (grey matter), and organized as 6 superimposed layers.
  • Layer names:
  • 1: Molecular layer
  • 2: External granular layer
  • 3: External pyramidal layer
  • 4: Internal granular layer
  • 5: Internal pyramidal layer
  • 6: Fusiform layer
  • Basic layer functions:
  • Layers 1/2: connectivity
  • Layer 4: Input
  • Layers 3/5: Pyramidal cell bodies
  • Layers 5/6: Output
Retinotopy
  • Many visual areas are organized as retinotopic maps: locations next to each other in the outside world are represented by neurons close to each other in cortex.
  • Although the topology is thus preserved, the mapping typically is highly non-linear (yielding large deformations in representation).
  • Stimulus shown on screen… and corresponding activity in cortex!
Columnar Organization
  • Very general principle in cortex: neurons processing similar “things” are grouped together in small patches, or “columns,” of cortex.
  • In primary visual cortex… as in higher (object recognition) visual areas…
  • and in many, non-visual, areas as well (e.g., auditory, motor, sensory, etc).
Interconnect

Felleman & Van Essen, 1991

Neurons???
  • Abstracting from biological neurons to neuron models
The "basic" biological neuron
  • The soma and dendrites act as the input surface; the axon carries the outputs.
  • The tips of the branches of the axon form synapses upon other neurons or upon effectors (though synapses may occur along the branches of an axon as well as the ends). The arrows indicate the direction of "typical" information flow from inputs to outputs.
Transmembrane Ionic Transport
  • Ion channels act as gates that allow or block the flow of specific ions into and out of the cell.
Action Potential and Ion Channels
  • Initial depolarization due to opening sodium (Na+) channels
  • Repolarization due to opening potassium (K+) channels
  • Hyperpolarization happens because K+ channels stay open longer than Na+ channels (and longer than necessary to exactly come back to resting potential).
Warren McCulloch and Walter Pitts (1943)
  • A McCulloch-Pitts neuron operates on a discrete time-scale, t = 0, 1, 2, 3, ... with time tick equal to one refractory period.
  • At each time step, an input or output is on or off — 1 or 0, respectively.
  • Each connection, or synapse, from the output of one neuron to the input of another has an attached weight.
From Logical Neurons to Finite Automata

[Figure: a Boolean net → finite automaton. McCulloch-Pitts units realize the logic gates: AND (weights 1, 1; threshold 1.5), OR (weights 1, 1; threshold 0.5), NOT (weight −1; threshold 0). From Brains, Machines, and Mathematics, 2nd Edition, 1987.]
Leaky Integrator Neuron
  • The simplest "realistic" neuron model is a continuous-time model based on using the firing rate (e.g., the number of spikes traversing the axon in the most recent 20 msec) as a continuously varying measure of the cell's activity.
  • The state of the neuron is described by a single variable, the membrane potential.
  • The firing rate is approximated by a sigmoid function of the membrane potential.
Leaky Integrator Model

  • τ dm(t)/dt = − m(t) + h
  • has solution m(t) = e^(−t/τ) m(0) + (1 − e^(−t/τ)) h → h for time constant τ > 0.
  • We now add synaptic inputs to get the Leaky Integrator Model:
  • τ dm(t)/dt = − m(t) + Σi wi Xi(t) + h
  • where Xi(t) is the firing rate at the ith input.
  • Excitatory input (wi > 0) will increase m(t); inhibitory input (wi < 0) will have the opposite effect.
Models of what?
  • We need data to constrain the models
  • Empirical data comes from various experimental techniques:
  • Physiology
  • Psychophysics
  • Various imaging
  • Etc.
Electrode setup
  • drill hole in cranium under anesthesia
  • install and seal “recording chamber”
  • allow animal to wake up and heal
  • because there are no pain receptors in the brain, electrodes can then be inserted & moved in the chamber with no discomfort to the animal.
Example: yes/no task

[Figure: trial timeline; the subject fixates a cross (+) and the stimulus appears later in the trial.]

Example of contrast discrimination using the yes/no paradigm.

  • subject fixates cross.
  • subject initiates trial by pressing space bar.
  • stimulus appears at random location, or may not appear at all.
  • subject presses “1” for “stimulus present” or “2” for “stimulus absent.”
  • if subject keeps giving correct answers, experimenter decreases contrast of stimulus (so that it becomes harder to see).
Staircase procedure
  • The staircase procedure is a method for adjusting the stimulus to each observer so as to find the observer’s threshold. The stimulus is parametrized, and the parameter(s) are adjusted during the experiment depending on responses.
  • Typically:
  • - start with a stimulus that is very easy to see.
  • - 4 consecutive correct answers make stimulus more difficult to see by a fixed amount.
  • - 2 consecutive incorrect answers make stimulus easier to see by a fixed amount.
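The 4-correct-down / 2-wrong-up rule above can be sketched as follows (the simulated observer, step size, and trial count are my own assumptions for illustration):

```python
import random

# Staircase: 4 consecutive correct answers lower the contrast by one
# step (harder to see), 2 consecutive errors raise it (easier). The
# observer is a crude model: near-perfect above threshold, at chance below.

def staircase(true_threshold, start=1.0, step=0.05, n_trials=400, seed=0):
    rng = random.Random(seed)
    contrast = start
    correct_run = error_run = 0
    for _ in range(n_trials):
        p_correct = 0.99 if contrast >= true_threshold else 0.5
        if rng.random() < p_correct:
            correct_run, error_run = correct_run + 1, 0
            if correct_run == 4:
                contrast = max(0.0, contrast - step)   # harder to see
                correct_run = 0
        else:
            error_run, correct_run = error_run + 1, 0
            if error_run == 2:
                contrast += step                       # easier to see
                error_run = 0
    return contrast

final = staircase(true_threshold=0.3)   # hovers near the threshold
```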
BOLD contrast

The magnetic properties of blood change with the amount of oxygenation, resulting in small changes of the MR signal (in the field B0) with oxygenated vs. deoxygenated blood.

Vascular System

arteries → arterioles → capillaries → venules → veins (capillaries and venules: < 0.1 mm)

Oxygen consumption

The exclusive source of metabolic energy of the brain is glycolysis:

C6H12O6 + 6 O2 → 6 H2O + 6 CO2

BOLD Contrast

stimulation → neuronal activation → metabolic changes → hemodynamic changes → local susceptibility changes → MR-signal changes → signal detection → data processing → functional image

Example of Blocked paradigm

Gandhi et al., 1999

First BOLD-effect experiment
  • Kwong and colleagues at Mass. General Hospital (Boston).
  • Stimulus: flashing light.
Case study: Vision
  • Vision is the most widely studied brain function
  • Our goals:
  • analyze fundamental issues
  • understand basic algorithms that may address those issues
  • look at computer implementations
  • look at evidence for biological implementations
  • look at neural network implementations
Origin of Center-Surround
  • Neurons at every location receive inhibition from neurons at neighboring locations.
Origin of Orientation Selectivity
  • Feedforward model of Hubel & Wiesel: V1 cells receive inputs from LGN cells arranged along a given orientation.
Oriented RFs

Gabor function: product of a grating and a Gaussian.

Feedforward model: equivalent to convolving the input image by sets of Gabor filters.
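A minimal sketch of such a receptive-field model (parameter values are my own choices): a Gabor function as the product of a cosine grating and a Gaussian envelope.

```python
import math

# Gabor function: Gaussian envelope times an oriented cosine grating.
def gabor(x, y, wavelength=4.0, theta=0.0, sigma=2.0, phase=0.0):
    xr = x * math.cos(theta) + y * math.sin(theta)   # rotated coordinate
    envelope = math.exp(-(x * x + y * y) / (2 * sigma * sigma))
    grating = math.cos(2 * math.pi * xr / wavelength + phase)
    return envelope * grating

# Sampling it on a grid gives a small filter kernel; convolving the
# image with a bank of such kernels implements the feedforward model.
kernel = [[gabor(x, y) for x in range(-4, 5)] for y in range(-4, 5)]
```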

Cortical Hypercolumn
  • A hypercolumn represents one visual location, but many visual attributes.
  • Basic processing “module” in V1.
  • “Blobs”: discontinuities in the columnar structure. Patches of neurons concerned mainly with color vision.
From neurons to mind
  • A good conceptual intermediary between patterns of neural activity and mental events is provided by schema theory.
The Famous Duck-Rabbit: From Schemas to Schema Assemblages

Bringing in Context

For Further Reading:

TMB2: Section 5.2 for the VISIONS system for schema-based interpretation of visual scenes.

HBTNN: Visual Schemas in Object Recognition and Scene Analysis

A First “Useful” Network
  • Example of a fully-engineered neural net that performs a useful computation: the Didday max-selector
  • Issues:
  • how can we design a network that performs a given task?
  • how can we analyze non-linear networks?
Winner-take-all Networks
  • Goal: given an array of inputs, enhance the strongest (or strongest few) and suppress the others

No clear strong input yields global suppression; the strongest input is enhanced and suppresses the other inputs.
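One way to sketch such dynamics in code (a generic self-excitation/global-inhibition iteration with constants I chose, not Didday's actual equations):

```python
# Each unit receives its external input, recurrent self-excitation, and
# inhibition proportional to the total activity; rectification keeps
# activities non-negative. The strongest input wins; the rest decay to 0.

def winner_take_all(inputs, steps=30, w_self=1.2, w_inh=0.6):
    a = list(inputs)
    for _ in range(steps):
        total = sum(a)
        a = [max(0.0, inp + w_self * x - w_inh * total)
             for x, inp in zip(a, inputs)]
    return a

out = winner_take_all([0.2, 0.9, 0.4])   # unit 1 wins, others suppressed
```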

Didday’s Model

[Figure: Didday’s model — retinotopic input; one population holds a copy of the input; S-cells are inhibitory inter-neurons; each unit receives excitation from the foodness layer and inhibition from the S-cells.]

NN & Physics
  • Perceptrons = layered networks, weights tuned to learn a given input/output mapping
  • Winner-take-all = specific recurrent architecture for a specific purpose
  • Now: Hopfield nets = view neurons as physical entities and analyze the network using methods inspired from statistical physics
Hopfield Networks
  • A Hopfield net (Hopfield 1982) is a net of such units subject to the asynchronous rule for updating one neuron at a time:
  • "Pick a unit i at random. If Σj wij sj ≥ θi, turn it on. Otherwise turn it off."
  • Moreover, Hopfield assumes symmetric weights: wij = wji
“Energy” of a Neural Network
  • Hopfield defined the “energy”:
  • E = − ½ Σij si sj wij + Σi si θi
  • If we pick unit i and the firing rule (previous slide) does not change its si, it will not change E.
si: 0 to 1 transition
  • If si initially equals 0, and Σj wij sj ≥ θi,
  • then si goes from 0 to 1 with all other sj constant,
  • and the "energy gap", or change in E, is given by
  • ΔE = − ½ Σj (wij sj + wji sj) + θi
  •    = − (Σj wij sj − θi) (by symmetry)
  •    ≤ 0.
si: 1 to 0 transition
  • If si initially equals 1, and Σj wij sj < θi,
  • then si goes from 1 to 0 with all other sj constant.
  • The "energy gap," or change in E, is given, for symmetric wij, by:
  • ΔE = Σj wij sj − θi < 0
  • On every updating we have ΔE ≤ 0.
Minimizing Energy
  • On every updating we have ΔE ≤ 0.
  • Hence the dynamics of the net tends to move E toward a minimum.
  • We stress that there may be different such states — they are local minima. Global minimization is not guaranteed.
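The monotonic descent is easy to verify numerically. A sketch (random symmetric weights of my choosing, zero self-connections) applying the asynchronous rule and checking that E never increases:

```python
import random

# E = -1/2 sum_ij s_i s_j w_ij + sum_i s_i theta_i; each asynchronous
# update ("on iff sum_j w_ij s_j >= theta_i") must leave E unchanged
# or lower when the weights are symmetric.

def energy(s, w, theta):
    n = len(s)
    e = -0.5 * sum(w[i][j] * s[i] * s[j] for i in range(n) for j in range(n))
    return e + sum(theta[i] * s[i] for i in range(n))

def update(s, w, theta, i):
    s[i] = 1 if sum(w[i][j] * s[j] for j in range(len(s))) >= theta[i] else 0

rng = random.Random(1)
n = 8
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i + 1, n):
        w[i][j] = w[j][i] = rng.uniform(-1, 1)   # symmetric, zero diagonal
theta = [rng.uniform(-0.5, 0.5) for _ in range(n)]
s = [rng.randint(0, 1) for _ in range(n)]

monotone = True
for _ in range(200):
    e_before = energy(s, w, theta)
    update(s, w, theta, rng.randrange(n))
    monotone = monotone and energy(s, w, theta) <= e_before + 1e-12
```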
Attractors
For all recurrent networks of interest (i.e., neural networks comprised of leaky integrator neurons, and containing loops), given initial state and fixed input, there are just three possibilities for the asymptotic state:

  • 1. The state vector comes to rest, i.e. the unit activations stop changing. This is called a fixed point. For given input data, the region of initial states which settles into a fixed point is called its basin of attraction.
  • 2. The state vector settles into a periodic motion, called a limit cycle.

Strange attractors
  • 3. Strange attractors describe such complex paths through the state space that, although the system is deterministic, a path which approaches the strange attractor gives every appearance of being random.
  • Two copies of the system which initially have nearly identical states will grow more and more dissimilar as time passes.
  • Such a trajectory has become the accepted mathematical model of chaos, and is used to describe a number of physical phenomena such as the onset of turbulence in weather.
The traveling salesman problem 1
  • There are n cities, with a road of length lij joining city i to city j.
  • The salesman wishes to find a way to visit the cities that is optimal in two ways: each city is visited only once, and the total route is as short as possible.
  • This is an NP-complete problem: the only known algorithms (so far) to solve it have exponential complexity.
Associative Memories
  • http://www.shef.ac.uk/psychology/gurney/notes/l5/l5.html
  • Idea: store a pattern so that we can recover it if presented with corrupted data.
Associative memory with Hopfield nets
  • Set up a Hopfield net such that local minima correspond to the stored patterns.
  • Issues:
  • because of weight symmetry, anti-patterns (binary reverse) are stored as well as the original patterns (also, spurious local minima are created when many patterns are stored)
  • if one tries to store more than about 0.14 × (number of neurons) patterns, the network exhibits unstable behavior
  • works well only if patterns are uncorrelated
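A toy demonstration of the idea (using the common ±1 unit convention and a made-up 8-bit pattern; this is my sketch, not course code): Hebbian weights store one pattern, and asynchronous updates pull a corrupted probe back to it.

```python
# Store one pattern with Hebbian weights w_ij = p_i * p_j (zero
# diagonal), then recover it from a probe with two flipped bits.

pattern = [1, 1, -1, -1, 1, -1, 1, -1]
n = len(pattern)
w = [[0 if i == j else pattern[i] * pattern[j] for j in range(n)]
     for i in range(n)]

probe = list(pattern)
probe[0], probe[3] = -probe[0], -probe[3]   # corrupt two bits

for _ in range(3):                          # a few asynchronous sweeps
    for i in range(n):
        h = sum(w[i][j] * probe[j] for j in range(n))
        probe[i] = 1 if h >= 0 else -1
```

Flipping more than half of the bits would instead drive the net to the anti-pattern mentioned above.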
Learning
  • All this is nice, but finding the synaptic weights that achieve a given computation is hard (e.g., as shown in the TSP example or the Didday example).
  • Could we learn those weights instead?
Simple vs. General Perceptrons
  • The associator units are not interconnected, and so the simple perceptron has no short-term memory.
  • If cross-connections are present between units, the perceptron is called cross-coupled; it may then have multiple layers and loops back from an “earlier” to a “later” layer.
Linear Separability
  • A linear function of the form
  • f(x) = w1x1 + w2x2 + ... + wdxd + wd+1 (wd+1 = −θ)
  • is a two-category pattern classifier.
  • f(x) = 0 ⇔ w1x1 + w2x2 + ... + wdxd = θ
  • gives a hyperplane as the decision surface.
  • Training involves adjusting the coefficients (w1, w2, ..., wd, wd+1) so that the decision surface produces an acceptable separation of the two classes.
  • Two categories are linearly separable patterns if in fact an acceptable setting of such linear weights exists.
Classic Models for Adaptive Networks
  • The two classic learning schemes for McCulloch-Pitts formal neurons (fire iff Σi wi xi ≥ θ):
  • Hebbian Learning (The Organization of Behavior, 1949)
    • — strengthen a synapse whose activity coincides with the firing of the postsynaptic neuron
    • [cf. Hebbian Synaptic Plasticity, Comparative and Developmental Aspects (HBTNN)]
  • The Perceptron (Rosenblatt 1962)
    • — strengthen an active synapse if the efferent neuron fails to fire when it should have fired;
    • — weaken an active synapse if the efferent neuron fires when it should not have fired.
Hebb’s Rule

The simplest formalization of Hebb’s rule is to increase wij by:

Δwij = k yi xj (1)

  • where synapse wij connects a presynaptic neuron with firing rate xj to a postsynaptic neuron with firing rate yi.
  • Peter Milner noted the saturation problem.
  • von der Malsburg (1973), modeling the development of oriented edge detectors in cat visual cortex [Hubel-Wiesel: simple cells], augmented Hebb-type synapses with:
  • a normalization rule to stop all synapses "saturating": Σi wi = constant
  • lateral inhibition to stop the first "experience" from "taking over" all "learning circuits": it prevents nearby cells from acquiring the same pattern, thus enabling the set of neurons to "span the feature space"
Perceptron Learning Rule
  • The best known perceptron learning rule
    • strengthens an active synapse if the efferent neuron fails to fire when it should have fired, and
    • weakens an active synapse if the neuron fires when it should not have:
  • Δwij = k (Yi − yi) xj (2)
  • As before, synapse wij connects a neuron with firing rate xj to a neuron with firing rate yi, but now
    • Yi is the "correct" output supplied by the "teacher."
  • The rule changes the response to xj in the right direction:
    • If the output is correct, Yi = yi and there is no change, Δwij = 0.
    • If the output is too small, then Yi − yi > 0, and the change in wij will add Δwij xj = k (Yi − yi) xj xj > 0 to the output unit's response to (x1, . . ., xd).
    • If the output is too large, Δwij will decrease the output unit's response.
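Rule (2) in code (a sketch; the AND training set and constants are my choices — AND is linearly separable, so the rule converges):

```python
# Single threshold unit trained with delta_w_j = k * (Y - y) * x_j,
# with the threshold handled as a bias input clamped to 1.

def predict(w, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else 0

def train(samples, k=0.1, epochs=50):
    w = [0.0, 0.0, 0.0]                  # two inputs + bias
    for _ in range(epochs):
        for x, target in samples:
            xb = list(x) + [1.0]
            y = predict(w, xb)
            w = [wi + k * (target - y) * xi for wi, xi in zip(w, xb)]
    return w

and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w = train(and_data)
```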
Back-Propagation
  • Backpropagation: a method for training a loop-free network which has three types of unit:
  • input units;
  • hidden units carrying an internal representation;
  • output units.
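A toy sketch of the scheme (the 2-4-1 architecture, XOR task, learning rate, and random seed are all my own choices): output-layer errors are propagated back through the hidden layer, and the squared error shrinks with training.

```python
import math, random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

rng = random.Random(0)
H = 4                                                    # hidden units
w_h = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(H)]  # 2 in + bias
w_o = [rng.uniform(-1, 1) for _ in range(H + 1)]                  # H + bias

data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]       # XOR

def forward(x):
    hid = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w_h]
    out = sigmoid(sum(wo * h for wo, h in zip(w_o, hid)) + w_o[H])
    return hid, out

def loss():
    return sum((forward(x)[1] - t) ** 2 for x, t in data)

before = loss()
k = 0.5
for _ in range(2000):
    for x, t in data:
        hid, out = forward(x)
        d_out = (out - t) * out * (1 - out)              # output delta
        for j in range(H):                               # backpropagate
            d_hid = d_out * w_o[j] * hid[j] * (1 - hid[j])
            w_h[j][0] -= k * d_hid * x[0]
            w_h[j][1] -= k * d_hid * x[1]
            w_h[j][2] -= k * d_hid
        for j in range(H):
            w_o[j] -= k * d_out * hid[j]
        w_o[H] -= k * d_out
after = loss()                                           # should shrink
```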
Example: face recognition
  • Here using the 2-stage approach:
Non-Associative and Associative Reinforcement Learning
  • In non-associative reinforcement learning, the only input to the learning system is the reinforcement signal.
  • Objective: find the optimal action.
  • In associative reinforcement learning, the learning system also receives information about the process and maybe more.
  • Objective: learn an associative mapping that produces the optimal action on any trial as a function of the stimulus pattern present on that trial.


Self-Organizing Feature Maps
  • Localized competition & cooperation yield an emergent global mapping
Capabilities and Limitations of Layered Networks
  • To approximate a set of functions of the inputs by a layered network with continuous-valued units and sigmoidal activation function…
  • Cybenko, 1988: … at most two hidden layers are necessary, with arbitrary accuracy attainable by adding more hidden units.
  • Cybenko, 1989: one hidden layer is enough to approximate any continuous function.
  • Intuition of proof: decompose the function to be approximated into a sum of localized “bumps.” The bumps can be constructed with two hidden layers.
  • Similar in spirit to Fourier decomposition. Bumps = radial basis functions.
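The bump construction is easy to see in one dimension (constants here are illustrative, not from Cybenko's proof): the difference of two shifted steep sigmoids is approximately 1 on an interval and approximately 0 outside it, and a network can sum many such bumps to tile a function.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bump(x, a, b, steepness=50.0):
    """~1 on (a, b), ~0 outside: difference of two shifted sigmoids."""
    return sigmoid(steepness * (x - a)) - sigmoid(steepness * (x - b))

inside = bump(0.5, 0.3, 0.7)    # well inside (0.3, 0.7): close to 1
outside = bump(0.9, 0.3, 0.7)   # outside the interval: close to 0
```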
Optimal Network Architectures
  • How can we determine the number of hidden units?
  • genetic algorithms: evaluate variations of the network, using a metric that combines its performance and its complexity. Then apply various mutations to the network (change number of hidden units) until the best one is found.
  • Pruning and weight decay:
  • apply weight decay (remember reinforcement learning) during training
  • eliminate connections with weight below threshold
  • re-train
  • How about eliminating units? For example, eliminate units with total synaptic input weight smaller than threshold.

Large Network Example
  • Example of a network with many cooperating brain areas: Dominey & Arbib
  • Issues:
  • how to use empirical data to design the overall architecture?
  • how to implement?
  • how to test?
Filling in the Schemas: Neural Network Models Based on Monkey Neurophysiology. Peter Dominey & Michael Arbib, Cerebral Cortex, 2:153-175. Develop hypotheses on neural networks that yield an equivalent functionality: mapping schemas (functions) to the cooperative computation of sets of brain regions (structures).

Low-Level Processing
  • Remember: Vision as a change in representation.
  • At the low-level, such change can be done by fairly streamlined mathematical transforms:
  • - Fourier transform
  • - Wavelet transform
  • These transforms yield a simpler but more organized image of the input.
  • Additional organization is obtained through multiscale representations.
Laplacian Edge Detection
  • Edges are defined as zero-crossings of the second derivative (Laplacian if more than one-dimensional) of the signal.
  • This is very sensitive to image noise; thus typically we first blur the image to reduce noise. We then use a Laplacian-of-Gaussian filter to extract edges.

[Figure: the smoothed signal and its first derivative (gradient).]
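A 1-D sketch of the scheme (the Gaussian width, kernel radius, and test signal are my choices): blur a step edge with a Gaussian, then mark edges at zero-crossings of the second difference of the smoothed signal.

```python
import math

def gaussian_kernel(sigma, radius):
    k = [math.exp(-(i * i) / (2 * sigma * sigma))
         for i in range(-radius, radius + 1)]
    s = sum(k)
    return [v / s for v in k]

def convolve(signal, kernel):
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = min(max(i + j - r, 0), len(signal) - 1)  # clamp borders
            acc += kv * signal[idx]
        out.append(acc)
    return out

signal = [0.0] * 20 + [1.0] * 20                 # step edge at index 20
smoothed = convolve(signal, gaussian_kernel(sigma=2.0, radius=6))
second = [smoothed[i - 1] - 2 * smoothed[i] + smoothed[i + 1]
          for i in range(1, len(smoothed) - 1)]  # second difference
edges = [i + 1 for i in range(len(second) - 1)
         if second[i] > 0 > second[i + 1] or second[i] < 0 < second[i + 1]]
```

The single zero-crossing lands at the step; in 2-D the same role is played by the Laplacian-of-Gaussian filter.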

Illusory Contours
  • Some mechanism is responsible for our illusory perception of contours where there are none…
Correspondence problem
  • Segment & recognize objects in each eye separately first, then establish correspondence?
  • No! (at least not only): Julesz’ random-dot stereograms
Higher Visual Function
  • Examine components of mid/high-level vision:
  • Attention
  • Object recognition
  • Gist
  • Action recognition
  • Scene understanding
  • Memory & consciousness
Challenges of Object Recognition
  • The binding problem: binding different features (color, orientation, etc) to yield a unitary percept. (see next slide)
  • Bottom-up vs. top-down processing: how much is assumed top-down vs. extracted from the image?
  • Perception vs. recognition vs. categorization: seeing an object vs. seeing it as something. Matching views of known objects to memory vs. matching a novel object to object categories in memory.
  • Viewpoint invariance: a major issue is to recognize objects irrespective of the viewpoint from which we see them.
Eye Movements
  • 1) free examination
  • 2) estimate the material circumstances of the family
  • 3) give the ages of the people
  • 4) surmise what the family has been doing before the arrival of the “unexpected visitor”
  • 5) remember the clothes worn by the people
  • 6) remember the position of people and objects
  • 7) estimate how long the “unexpected visitor” has been away from the family
Several Problems…
  • with the “progressive visual buffer hypothesis:”
  • Change blindness:
  • Attention seems to be required for us to perceive changes in images, while these could easily be detected in a visual buffer!
  • Amount of memory required is huge!
  • Interpretation of buffer contents by high-level vision is very difficult if buffer contains very detailed representations (Tsotsos, 1990)!
The World as an Outside Memory
  • Kevin O’Regan, early 90s:
  • why build a detailed internal representation of the world?
  • too complex…
  • not enough memory…
  • … and useless?
  • The world is the memory. Attention and the eyes are a look-up tool!
The “Attention Hypothesis”
  • Rensink, 2000
  • No “integrative buffer”
  • Early processing extracts information up to “proto-object” complexity in massively parallel manner
  • Attention is necessary to bind the different proto-objects into complete objects, as well as to bind object and location
  • Once attention leaves an object, the binding “dissolves.” Not a problem, it can be formed again whenever needed, by shifting attention back to the object.
  • Only a rather sketchy “virtual representation” is kept in memory, and attention/eye movements are used to gather details as needed
Gist of a Scene
  • Biederman, 1981:
  • from very brief exposure to a scene (120ms or less), we can already extract a lot of information about its global structure, its category (indoors, outdoors, etc) and some of its components.
  • “riding the first spike:” 120ms is the time it takes the first spike to travel from the retina to IT!
  • Thorpe, van Rullen:
  • very fast classification (down to 27ms exposure, no mask), e.g., for tasks such as “was there an animal in the scene?”
One lesson…
  • From 50+ years of research…
  • Solving vision in general is impossible!
  • But solving purposive vision can be done. Example: vision for action.
Grip Selectivity in a Single AIP Cell

A cell that is selective for side opposition (Sakata)

FARS (Fagg-Arbib-Rizzolatti-Sakata) Model Overview

[Figure: FARS overview. Dorsal stream (via AIP): affordances, i.e. ways to grab this “thing,” modulated by task constraints (F6), working memory (46?), and instruction stimuli (F2). Ventral stream (via IT to PFC): recognition, “It’s a mug.”]

AIP extracts the set of affordances for an attended object. These affordances highlight the features of the object relevant to physical interaction with it.

AT and DF: "How" versus "What"

[Figure: visual cortex feeds parietal cortex for reach and grasp programming (“How,” dorsal) and inferotemporal cortex (“What,” ventral).]

  • Lesson: even schemas that seem to be normally under conscious control can in fact proceed without our being conscious of their activity.
  • “What” versus “How”:

DF (Goodale and Milner): object parameters available for grasp (How) but not for saying or pantomiming.

AT (Jeannerod et al.): saying and pantomiming (What) but no “How,” except for familiar objects with specific sizes.