Connectionism & Consciousness

  • This week’s question:

    How have connectionism, AI and dynamical systems influenced cognitive information processing accounts, and what issues do they raise with regard to conscious and unconscious processing?

Connectionism & Consciousness

  • Serial vs. parallel processing

  • Computer metaphor

  • Information processing approach

Serial to parallel processing models example: memory

How is memory organised?

If memory used a library-style addressing system, memory errors would be unpredictable. BUT memory errors tend to be near misses that are related in terms of meaning.

If asked a question that demands information you have not encoded directly, your memory system often pulls related information that allows you to make an inference to answer the question. So the question is:

How does the memory system ‘know’ the right information to


The hierarchical theory (Collins & Quillian, 1972)

  • Memory consists of nodes and links.

  • Nodes become activated when concepts that they represent

    are present in the environment.

  • Links represent the relationships between concepts and, in a hierarchical structure, can provide property descriptions of concepts (animal to bird to chicken).

    True or false? (response times)

    1,000 msec  A canary is a canary

    1,160 msec  A canary is a bird

    1,240 msec  A canary is an animal

    BUT this did not always hold:

    people are sometimes faster at ‘a chicken is an animal’ than ‘a chicken is a bird’

  • The hierarchical theory therefore did not appear to have cognitive economy [a cognitive system that conserves resources].
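One way to read the canary timings is that verification time grows with the number of links travelled. The sketch below is a minimal illustration only: the `IS_A` table and the helper `links_to` are hypothetical names, not part of the original theory's notation, and the hierarchy is limited to the concepts named on the slide.

```python
# Hypothetical toy hierarchy built from the concepts on the slide.
IS_A = {"canary": "bird", "chicken": "bird", "bird": "animal"}


def links_to(concept, category):
    """Count the is-a links climbed from concept up to category.

    Returns None if category is not above concept in the hierarchy.
    """
    steps = 0
    node = concept
    while node != category:
        if node not in IS_A:
            return None
        node = IS_A[node]
        steps += 1
    return steps


for category in ("canary", "bird", "animal"):
    print("a canary is a", category, "->", links_to("canary", category), "links")
```

A strict hierarchy of this kind predicts that ‘a chicken is an animal’ (2 links) should take longer than ‘a chicken is a bird’ (1 link), which is the prediction that did not always hold.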

Spreading activation theories: Collins & Loftus (1975)

Links represent associations between semantically related concepts.

Stimulation from the environment activates nodes, which send some activation to linked nodes, which can also become active.

Rumelhart, Hinton & McClelland (1986) proposed 6 properties of a semantic network:

  • A set of units (each represents a concept)

  • Each unit has its own state of activation

  • Output function – units pass activation to one another

  • Pattern of connectivity – links of different strengths (weights)

  • Activation rule that determines how the input activations to a node should be combined

  • Learning rules to change weights
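The properties listed above can be sketched in a few lines. This is a hedged illustration, not the authors' model: the units, the weight values in `WEIGHTS`, the `decay` parameter and the function `spread` are all assumptions chosen for the example. The output function is simply the unit's own activation, and no learning rule is included.

```python
# Pattern of connectivity: hypothetical link strengths between concept units.
WEIGHTS = {
    "lion": {"tiger": 0.8},
    "tiger": {"lion": 0.8, "stripes": 0.7},
    "stripes": {"tiger": 0.7},
}


def spread(activation, weights, decay=0.5):
    """Activation rule: keep part of the old activation, add the weighted input."""
    new = {}
    for unit in weights:
        net_input = sum(activation[src] * out[unit]
                        for src, out in weights.items() if unit in out)
        new[unit] = min(1.0, decay * activation[unit] + net_input)
    return new


# State of activation: only 'lion' is stimulated by the environment.
activation = {"lion": 1.0, "tiger": 0.0, "stripes": 0.0}
for _ in range(2):
    activation = spread(activation, WEIGHTS)
print(activation)
```

After one update only ‘tiger’ has received activation from ‘lion’; after the second update ‘stripes’ is active too, although it has no direct link to ‘lion’.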

Evidence for Spreading activation theories

Concepts in memory become active and, when activity surpasses a threshold, they enter awareness.

Repetition priming.

Semantic priming.


Vagueness about how to determine the location & strength of links.

If one item primes another, we assume it is linked (the argument becomes circular).

How far does the activation spread?

Mediated priming – lion primes stripes (presumably through tiger).

Ratcliff & McKoon (1994): if each word spreads activation to 20 words, then within 3 steps 8,000 concepts are active. Spreading activation becomes pointless if most concepts are active most of the time in ordinary conversation (does this point to the importance of context?).
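The 8,000 figure follows from simple arithmetic, sketched below under the assumption that the 20 words reached from each word never overlap. The variable names are illustrative.

```python
fan_out = 20  # words activated by each word (figure from the slide)
steps = 3     # number of spreading steps

# Assumes no two words share a neighbour, so the count multiplies at each step.
reached_at_final_step = fan_out ** steps
print(reached_at_final_step)  # 8000
```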

Parallel Distributed Processing – distributed representation

A concept is not represented with a local representation but is distributed over a number of nodes simultaneously.

If part of the system is damaged it does not shut down, but performance gets worse (degradation).

Degradation also seems to be a feature of human memory – minimal memory damage causes minimal memory loss (unlike computer memory, where minimal damage can be catastrophic).

Learning ability- automatically finds prototypes & exceptions to


Generalisation – responds to a new stimulus the way it responded to an old stimulus – a key property of human memory.

Criticisms of Parallel Distributed Processing

Catastrophic interference if the network learns one set and then another set of associations – not a feature of human memory.

Pinker & Prince: aspects of children’s learning are not accounted for by the model and require implementation of rules that are difficult for PDP models.

Computer metaphor

  • Mind as information processor

  • Brain processes symbols and stores them in LTM

  • Cognitive processes take time – use of experiments to measure reaction time and infer processes

  • Mind has a limited processing capacity

  • Symbol system has a neurological basis (brain)

Early serial theories

  • Emphasis on a sequence of stages

  • Example Atkinson & Shiffrin’s (1968) serial theory of memory




[Figure: Atkinson & Shiffrin’s serial model of memory. Surviving labels: Short term, Long term, Displaced information]

Serial theory of speech production

Thought (meaning)

Grammatical structure

Selection of words

Order of words

Selection of phonological code

Serial theory of speech production

Can be demonstrated experimentally

Supply the appropriate words for the following:

  • A tyrant, absolute ruler

  • Large black beetle used in Egyptian hieroglyphs

  • Large hairy elephant that lived in the Pleistocene

  • Having leaves that fall in autumn





Serial theory of speech production

Tip-of-the-tongue phenomenon

Results suggest that word meaning is accessed first

(feeling of knowing)

Then the sound form (phonological) of the word is accessed

(but may fail!)

Results argue for 2-stage process of lexical retrieval.

Alternative: parallel processing

  • Rather than speech production involving a sequence of stages

  • Perhaps activation spreads to many stages of processing simultaneously

Thought (meaning)

Order of words

Grammatical structure

Selection of words

Selection of phonological code

Alternative parallel processing: Connectionism (Rumelhart & McClelland, 1986)

  • Brain metaphor

  • Millions of interconnected neurons

  • Activity flows along connections rather than being stored in one distinct location

  • Stored in simple nodes and connections as a pattern of activity

Alternative parallel processing: Connectionism (Rumelhart & McClelland, 1986)

[Figure: pictures of neurons, with a pattern of activation]

Holistic / Distributed Debate

Associative / Semantic Memory Debate

Associative priming:

nodes and connection strength

More frequent activation - stronger connection strength

Nodes as entities

Associative strength databases

Ask 100 people: what is the first word that comes to mind when I say ‘salt’? Most say ‘pepper’, but associative strength is not equivalently bidirectional, because if presented with the word ‘pepper’ most say ‘corn’.

Note these are associative-semantic pairs because, as well as having strong associative strengths, all are types of food and therefore also have a semantic relationship.

Associative-only pairs = traffic jam – no semantic connection between the words apart from the association, which is a symbol for a 3rd and completely separate concept.

High associative strength > .1 (tomato – sauce)

Low associative strength = 0! Absent from the tables! (tiger – neuron)

Alternative parallel processing: Connectionism

Semantic priming:

nodes and connection strength

More frequent activation - stronger connection strength

Nodes as properties NOT entities

A pattern of activation within the network results in the concept being activated.

Red, round, can be eaten, grows on trees, juicy = ?
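When nodes stand for properties rather than entities, a concept can be read off as the stored pattern that best matches the active property units. The sketch below is a minimal illustration: the `CONCEPTS` table (including ‘apple’ as the answer to the puzzle above) and the overlap measure in `best_match` are assumptions made for the example, not taken from the slides.

```python
# Hypothetical property patterns; each concept is a set of active property units.
CONCEPTS = {
    "apple": {"red", "round", "can be eaten", "grows on trees", "juicy"},
    "tomato": {"red", "round", "can be eaten", "juicy"},
    "ball": {"red", "round"},
}


def best_match(active_properties):
    """Return the concept whose pattern overlaps most with the active units."""
    def overlap(name):
        pattern = CONCEPTS[name]
        shared = len(pattern & active_properties)
        total = len(pattern | active_properties)
        return shared / total
    return max(CONCEPTS, key=overlap)


print(best_match({"red", "round", "can be eaten", "grows on trees", "juicy"}))
```

No single unit stands for the concept here; removing or adding one property changes how well each pattern matches rather than switching a concept node on or off.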

Semantic distance databases

> .3 = high semantic distance score (bag – box)

< .1 = low semantic distance score (chatter – box)

Top down and bottom up processing

  • Higher level processes (top-down) affect more basic level processing (bottom-up)

  • Semantic priming

  • Read this

The procedure is quite simple. First you arrange things into two different groups. Of course one pile may be sufficient depending on how much there is to do. If you have to go somewhere else due to lack of facilities, that is the next step; otherwise you are pretty well set. It is important not to overdo things. That is, it is better to do fewer things at once rather than too many. In the short run this might not seem important, but complications can easily arise. A mistake can be expensive as well. At first the whole procedure will seem complicated. Soon, however, it will become just another facet of life. After the procedure is completed, one arranges the material into different groups again. Then they can be put into their appropriate places. Eventually they will be used once more and the whole procedure will be repeated. However, this is part of life.

Top down and bottom up processing

  • Higher level processes (top-down) affect more basic level processing (bottom-up)

  • Semantic priming

  • Lack of title inhibits reading and comprehension

Automatic & Controlled processing


  • Automatic processing requires little processing capacity

  • Automatic processing occurs without deliberate thought (outside of awareness)

  • Automatic processing as effortless and spontaneous?

    e.g. in reading simple common words (drink)


  • Controlled processing requires lots of capacity

  • Controlled processing requires awareness

  • Controlled processing as effortful, requires time.

    e.g. in reading complex uncommon words (hemidecortication)

Key issues for critique of experimental approach

  • External validity: can we generalise to a wider population, beyond the context of testing, over time?

  • Speed & accuracy measurements are indirect evidence about internal processes

  • If the ultimate aim is to understand brain processes, then ERP & fMRI methods will need to be involved in experimental research (beginning!)

  • Computer simulations can only be created by specifying a theory with precision and detail, which makes theories less vague

  • Experimental methods do not generally take heed of individual differences.

  • Research tends to generate theories with a narrow focus that do not tend to explain the ‘cognitive system’.

CONNECTIONISM approach

  • A class of models that all have in common the principle that processing occurs through the action of many simple interconnected units

  • Concept 1: many simple processing units connected together

  • Concept 2: activation spreads around the network in a way that is determined by the strength of the connections between the units.

CONNECTIONISM approach

  • Distinction between connectionist models that do and do not learn.

  • Interactive Activation & Competition model (IAC)

    example of connectionist model that does not learn

    (McClelland & Rumelhart, 1981)

  • Back-propagation

    models that learn through back-propagation training

CONNECTIONISM approach

  • Distinction between:

  • Architecture of a network (describes the layout: number of units & how they are connected)

  • Algorithm, which determines how the activation spreads around the network

  • Learning rule, which specifies how the network learns (e.g. Hebbian learning)

IAC model

  • Many simple processing units arranged in 3 levels.

    input level – visual feature unit level

    middle level – units correspond to individual letters

    output level – each unit corresponds to a word

  • Each unit is connected to units in the levels immediately before and after it. Each of these connections is either facilitatory (excitatory, positive) or inhibitory (negative). Facilitatory connections make the unit at the receiving end more active; inhibitory connections make it less active.

  • When a unit becomes activated, activation is sent simultaneously along connections to all connected units (positive or negative activation).

IAC model

Fragment of an IAC model of word recognition – draw inhibitory (o) and facilitatory ( ) connections!












IAC model

  • The letter ‘T’ would excite the word units ‘TAKE’ and ‘TASK’ in the level above, but would inhibit ‘CAKE’ and ‘CASK’.

  • An element of competition is introduced.

  • ‘T’ will mean ‘TIME’, ‘TAKE’ & ‘TASK’ will be activated, but at the same time words that do not begin with ‘T’ will be inhibited (‘CAKE’, ‘COKE’ & ‘CASK’).

  • Activation feeding back from the word level to the letter level means that the letters of all words beginning with ‘T’ will be slightly activated and ‘easier to see’.

  • Letters in the context of a word receive activation from the word units, which explains the word superiority effect: ‘T’ in a word is easier to see than ‘T’ in isolation (no top-down activation).

  • If the next letter is an ‘A’, it activates ‘TAKE’ & ‘TASK’ and inhibits ‘TIME’, which will also be inhibited within the word level by ‘TAKE’ & ‘TASK’. ‘A’ will also activate ‘CASK’ & ‘CAKE’ (some way behind the words beginning with ‘T’), but if the next letter is ‘K’ the clear leader will be ‘TAKE’.
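The competition described above can be sketched with a heavily simplified word level. This is an assumption-laden illustration, not McClelland & Rumelhart's equations: it omits the feature level and the top-down feedback to letters, and the parameters `excite`, `inhibit` and `compete`, like the functions `step` and `recognise`, are hypothetical.

```python
WORDS = ["TAKE", "TASK", "TIME", "CAKE", "CASK", "COKE"]


def step(word_act, letters, excite=0.2, inhibit=0.2, compete=0.3):
    """One update of word-level activation.

    letters maps a position to the letter seen there, e.g. {0: "T", 1: "A"}.
    """
    new = {}
    for word in WORDS:
        # Bottom-up input: consistent letters excite, inconsistent letters inhibit.
        bottom_up = 0.0
        for pos, letter in letters.items():
            bottom_up += excite if word[pos] == letter else -inhibit
        # Within-level competition: active rival words inhibit this word.
        rivals = sum(max(a, 0.0) for w, a in word_act.items() if w != word)
        value = word_act[word] + bottom_up - compete * rivals
        new[word] = max(-1.0, min(1.0, value))  # keep activation bounded
    return new


def recognise(letters, cycles=10):
    """Run several update cycles and return the final word activations."""
    act = {word: 0.0 for word in WORDS}
    for _ in range(cycles):
        act = step(act, letters)
    return act


final = recognise({0: "T", 1: "A", 2: "K"})
print(max(final, key=final.get))  # TAKE
```

With only ‘T’ presented, the three ‘T’ words stay ahead of the ‘C’ words; once ‘A’ and ‘K’ are added, ‘TAKE’ suppresses its rivals and is the only word left with positive activation.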

IAC model

  • Over time the pattern of activation settles down into a stable configuration so only ‘TAKE’ remains active and the word is ‘seen’/ recognised

  • Because this model of letter and word recognition is highly interactive, evidence that places a restriction on the role of context is problematic to the IAC model.

  • In more recent models connection strengths are learnt, back-propagation being the most widely used connectionist learning rule.

Back-propagation

  • Enables networks to learn to associate input patterns with output patterns

  • Error reduction learning is an algorithm that enables networks to be trained to reduce the error between what the network actually outputs given a particular input, and what it should output.

  • The simplest net architecture that can be trained by back propagation has 3 layers (levels): input, hidden & output.

Back-propagation

  • Connections all start with random weights.

  • In the case of ‘DOG’ the pattern of activation is distributed over the input units, with no single unit corresponding to a single letter. For example, units 1 and 3 might be on and unit 2 off.

  • Activation is then passed on to the hidden units according to the values of the connections between the input and hidden units. In the hidden layer, activation is summed by each unit.

  • The output of a unit is a complex function of its input and is non-linear (logistic function).

Back-propagation

  • Each unit has an individual threshold (bias) that can be learnt like any other weight, and activation is passed on to the output units so that they eventually have an activation value.

  • Output activation values are unlikely to be the correct ones, as the initial weights were random.

  • The learning rule then modifies the connections in the network so that the output will be a bit more like it should be.

  • The difference between the actual and the target outputs is computed and the values of all the weights from the hidden to the output units are adjusted to make the difference smaller. This process is then back-propagated to change the weights between the input and hidden units.

Back-propagation

  • The whole process can then be repeated for a different input-output pair.

  • Eventually the weights of the network converge on values that give the best output averaged across all input-output pairs.

  • Common modification is to introduce recurrent connections between the hidden layer and a context layer that stores the past state of the hidden layer. The network can then learn to encode sequential information.

  • Most interest is in the behaviour of the trained network rather than the training.
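The training cycle described on these slides can be sketched as a small 3-layer network. This is a minimal illustration under stated assumptions: the layer sizes, the learning rate, the two input-output pairs in `PAIRS` and all function names are hypothetical, and the error measure is the squared difference between target and output.

```python
import math
import random


def logistic(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_layer(n_in, n_out, rng):
    # One row per receiving unit; the last entry in each row is the bias.
    return [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_out)]


def forward(layer, inputs):
    # Each unit sums its weighted input (plus bias) and applies the logistic function.
    return [logistic(sum(w * x for w, x in zip(row, inputs + [1.0]))) for row in layer]


def train_pair(hidden_w, output_w, inputs, target, rate=0.5):
    hidden = forward(hidden_w, inputs)
    output = forward(output_w, hidden)
    # Error signal at the output units (target minus actual, times logistic slope).
    delta_out = [(t - o) * o * (1 - o) for t, o in zip(target, output)]
    # Back-propagate the error signal to the hidden units.
    delta_hid = [h * (1 - h) * sum(d * output_w[k][j] for k, d in enumerate(delta_out))
                 for j, h in enumerate(hidden)]
    # Adjust hidden-to-output weights, then input-to-hidden weights.
    for k, d in enumerate(delta_out):
        for j, x in enumerate(hidden + [1.0]):
            output_w[k][j] += rate * d * x
    for j, d in enumerate(delta_hid):
        for i, x in enumerate(inputs + [1.0]):
            hidden_w[j][i] += rate * d * x


def total_error(hidden_w, output_w, pairs):
    err = 0.0
    for inputs, target in pairs:
        output = forward(output_w, forward(hidden_w, inputs))
        err += sum((t - o) ** 2 for t, o in zip(target, output))
    return err


rng = random.Random(0)
hidden_w = make_layer(3, 2, rng)  # 3 input units -> 2 hidden units
output_w = make_layer(2, 2, rng)  # 2 hidden units -> 2 output units
# Hypothetical patterns: input units 1 and 3 on, unit 2 off, and the reverse.
PAIRS = [([1.0, 0.0, 1.0], [1.0, 0.0]), ([0.0, 1.0, 0.0], [0.0, 1.0])]

error_before = total_error(hidden_w, output_w, PAIRS)
for _ in range(500):
    for inputs, target in PAIRS:
        train_pair(hidden_w, output_w, inputs, target)
error_after = total_error(hidden_w, output_w, PAIRS)
print(error_before, error_after)
```

The weights start random, so the first outputs are poor; repeated small adjustments over the input-output pairs bring the total error down.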

Connectionism, AI & Dynamic Systems

  • Dramatic new theoretical framework (Kuhn: paradigm shift)

  • Recasts old problems in new terms

  • May discover new solutions obscured by prior ways of thinking

  • Redefines what the problems in cognition are

1970s: Metaphor of the brain as a digital computer.

  • Rationalised cognition: possible to study cognition in an explicit formal manner.

  • When cognition was first thought of in computational terms the digital computer was used as a framework to understanding: processing carried out by discrete processor operations that could be described as rules and were executed in serial order. The memory component was distinct from the processor.

1980s: difference between the brain & a digital computer may be important.

  • Connectionist approaches came into mainstream cognitive psychology

1. Connectionism

Connectionism

  • Response function is non-linear and has important consequences for processing. Nonlinearity allows crisp categorical behaviour and in other circumstances graded continuous responses.

  • What the system knows is captured by the pattern of connections & the weights associated with the connections.

  • Connectionist systems consist of patterns of activation across different units rather than using symbolic systems.

Connectionism

  • Key question: who determines the weights?

  • Key development: learning algorithms that allowed the network to learn the values for weights.

  • The style of learning was inductive: exposed to samples of the target behaviour, the network would adjust its weights in small incremental steps so that over time response accuracy would improve.

  • Ideally the network would also be able to generalise performance to novel stimuli, demonstrating that it had learnt the underlying function that relates output to input rather than just memorising trained examples.

Connectionism: Issues & Controversies

  • Computer metaphor: use of rules for analysis replaced the pattern recognition (template-matching) of early models

  • Information processing was bottom-up: perceptual features first that yielded a representation that was passed on to successively higher levels of processing

  • Bottom-up processing was challenged by the influence of context. A letter flashed on a screen was identified better if it appeared in a real word rather than appearing in isolation or appearing embedded in a non-word. This seems to be an example of top-down processing – higher processing influencing supposedly lower processes.

Connectionism: Issues & Controversies

  • Growing evidence that the cognitive system was able to process at multiple levels in parallel rather than being restricted to executing a single instruction at a time. The interactive activation (IAC) model of word perception (McClelland & Rumelhart, 1981) was the first to depart from the digital framework.

Connectionism: Issues & Controversies

  • How do we account for the regularity of human behaviour? This is a key question for cognitive science.

  • There would be nothing to explain if it was entirely random or limited to a fixed repertoire that could be memorised.

  • But behaviour is both patterned and often productive when these patterns are generalised to novel circumstances.

  • Do we use rules or associations to account for the regularity?

Connectionism: Issues & Controversies

  • Assumption: underlying behaviour is a set of rules = explanation for the patterned nature of human cognition

  • Many children initially only know a small number of verbs and produce both correct regular & irregular forms. Later they go on to make mistakes, giving irregular verbs regular endings (‘goed’ instead of ‘went’). Children then learn which verbs follow the rule (regular) and which have to be memorised (irregular). Development is therefore ‘U-shaped’ (good – worse – good again).

  • A rule-based account of this phenomenon is initial memorising, discovery of the past tense regularity rule, use and overgeneralisation, before learning which verbs are regular & irregular.

Connectionism: Issues & Controversies

  • Rule-based explanation runs into difficulty because:

    some irregular verbs are unique (is – was, go – went)

    some group in terms of phonological similarity (sing, ring, catch, teach – sang, rang, caught, taught) that is generalised to a novel word (pling – plang [analogy to ring – rang])

    Rumelhart & McClelland (1986) produced a connectionist learning simulation that produced the same U-shaped performance over time as that of the children, and therefore performance may not necessarily arise from explicit rules.

    Pinker & Prince (1988) disagree and state that the qualitative difference between irregular & regular verbs means that the latter are still rule-based.

Connectionism

Connectionist models

  • are powerful induction engines

  • Learn by example

  • Use statistics of those examples to drive learning

  • Although learning is statistically driven, the outcome of the learning process is a system whose knowledge can be generalised to novel instances


Connectionism


Connectionist models:

  • As disembodied intelligence they ignore the role of bodies, although cognitive behaviour is tightly coupled to bodies; the way we think about the world depends on how we experience it, and that experience can be vastly different (e.g. disabled, stigmatised).

  • Passive vs. Active: importance of goals. Most connectionist models are reactive (or determined by the programmer!). Biological organisms have an agenda whereas neural networks are passive learners.

  • Social vs. Asocial cognition. Almost all connectionist models view cognition as an essentially individual phenomenon when it is a social phenomenon. Cognitive capacity depends on physical and social structures we create to solve problems that cannot be solved by one person alone.

2. Artificial Intelligence

Artificial Intelligence (AI)

Braitenberg (1984) – Vehicles: Experiments in Synthetic Psychology

The book consisted of 14 short thought experiments.

Reader invited to imagine different primitive vehicles.

Each was a block of wood with wheels at the rear, sensors (for headlights) and connections between the sensors and the motor that drove each wheel.

The nature of the sensors and their connection to the motor varied between vehicles.

Artificial Intelligence (AI)

Braitenberg then considered how each vehicle might behave when placed on a surface with other vehicles and exposed to a stimulus (light source).

Some moved to the light and then veered off

Some sped aggressively towards the light and crashed into it

Some circled the light

Each vehicle’s chapter heading bore a name (“love”, “hate”, “values”, “logic”) – it was easy to imagine them as animated and motivated by anger, affection or complex reasoning BUT the circuits inside were quite simple.
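How simple such a circuit can be is sketched below. This is a hedged illustration of a two-sensor, two-motor vehicle with purely excitatory wiring; the functions `motor_speeds` and `turn_direction` and the sensor readings are assumptions made for the example, and the only thing varied is whether each sensor drives the motor on its own side or on the opposite side.

```python
def motor_speeds(left_sensor, right_sensor, crossed):
    """Return (left_motor, right_motor) speeds for excitatory sensor-motor wiring."""
    if crossed:
        # Each sensor drives the motor on the opposite side.
        return right_sensor, left_sensor
    # Each sensor drives the motor on its own side.
    return left_sensor, right_sensor


def turn_direction(left_motor, right_motor):
    # A faster right wheel swings the vehicle to the left, and vice versa.
    if right_motor > left_motor:
        return "left"
    if left_motor > right_motor:
        return "right"
    return "straight"


# Light source on the vehicle's left: the left sensor is stimulated more strongly.
left_sensor, right_sensor = 0.9, 0.3

for crossed in (True, False):
    motors = motor_speeds(left_sensor, right_sensor, crossed)
    print("crossed" if crossed else "uncrossed", "->", turn_direction(*motors))
```

With crossed wiring the vehicle turns towards the light and speeds up as it approaches; with uncrossed wiring it turns away. An observer could easily read the first as aggression and the second as avoidance, although only one wire pair differs.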

Artificial Intelligence (AI)

Braitenberg’s point

  • simple systems can give rise to complex behaviour

  • Attribution problem: attributing more than is warranted to a mechanism when we have preconceived notions about what mechanisms underlie a given behaviour.

Artificial Intelligence (AI)


  • Emergent properties that result from the collective behaviour of the system’s components rather than from the actions of any single component

  • Unanticipated, unplanned, unprogrammed

  • Complex interpersonal dynamics within an autocratic social organisation give rise to group behaviours that cannot be predicted in advance.

Artificial Intelligence (AI): Emergentism

Cellular automata

2-D grid of cells.

Each cell is either on (alive) or off (dead).

At the tick of the clock a cell may change its state according to the following rules:

If the cell is alive and has exactly 2 or 3 neighbours which are also alive, it survives to the next cycle.

If the cell is dead but has exactly 3 alive neighbours it is born.

In all other cases the cell remains dead or dies.

Artificial Intelligence (AI): Emergentism

Cellular automata

Imagine the shaded ‘on’ cells are part of a much larger grid.

GLIDER: looks like biological behaviour. Over time the pattern of ‘on’ cells changes, which looks as if it is falling and deforming and in the process gliding down to the right.

Research has investigated the way in which such glider systems can solve computational problems!
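The survival and birth rules for the cellular automaton, together with the glider, can be sketched as follows. This is a minimal illustration: representing the grid as a set of live (row, column) cells on an unbounded plane, and the names `step` and `glider`, are choices made for the example.

```python
from collections import Counter


def step(live):
    """Advance a set of live (row, col) cells by one clock tick."""
    # Count, for every cell, how many of its 8 neighbours are alive.
    counts = Counter((r + dr, c + dc)
                     for r, c in live
                     for dr in (-1, 0, 1)
                     for dc in (-1, 0, 1)
                     if (dr, dc) != (0, 0))
    # Born with exactly 3 live neighbours; survives with 2 or 3.
    return {cell for cell, n in counts.items()
            if n == 3 or (n == 2 and cell in live)}


# A glider; rows increase downwards and columns increase to the right.
glider = {(0, 1), (1, 2), (2, 0), (2, 1), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

print(sorted(state))  # the same shape, one cell down and one cell to the right
```

Nothing in `step` mentions movement: the gliding is an emergent property of the local rules applied to every cell at once.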

Artificial Intelligence (AI)


Biological systems evolve, AI systems are built!

Biological change has a random element (genetic variation) and a quasi-directed element (some variants are better adapted) that alters the genetic makeup of succeeding generations. Holland (1975) proposed a “genetic algorithm” (GA) that has much in common with natural evolution. It is powerful when there are higher-order interactions between sub-parts of the problem and is widely used in conjunction with neural networks.

There are multiple candidate solutions, which depend on how different questions are answered. The GA models each as an artificial chromosome (a vector of 1s & 0s). How well it does is its fitness. Preferential replication of the fitter solutions constructs a new generation, but at the same time a random switching of 1s & 0s occurs. The new generation is tested and its fitness passed to a third generation, and so on until the best solution is found.
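The replication-plus-mutation cycle can be sketched as below. This is an illustrative sketch, not Holland's full algorithm: it includes only the two ingredients named above (preferential replication and random bit switching) and leaves out crossover, which genetic algorithms normally also use. The fitness function (count of 1s), population size, chromosome length and mutation rate are all hypothetical.

```python
import random


def fitness(chromosome):
    # Hypothetical fitness: the number of 1s stands in for "how well it does".
    return sum(chromosome)


def next_generation(population, rng, mutation_rate=0.01):
    """Fitter chromosomes are copied more often; copies suffer random bit flips."""
    weights = [fitness(c) + 1 for c in population]  # +1 keeps every weight positive
    parents = rng.choices(population, weights=weights, k=len(population))
    return [[1 - bit if rng.random() < mutation_rate else bit for bit in parent]
            for parent in parents]


rng = random.Random(0)
population = [[rng.randint(0, 1) for _ in range(16)] for _ in range(20)]

for _ in range(30):
    population = next_generation(population, rng)

best = max(population, key=fitness)
print(fitness(best), best)
```

Selection here is only probabilistic, so fitness is not guaranteed to rise in every generation; the pressure comes from fitter chromosomes being copied more often on average.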

Artificial Intelligence (AI)

Goals: behaviours are usually adaptive and not taught to us.

Connectionist models are disturbingly passive.

Nolfi et al (1994)

Goal-directed behaviour in a neural network: a 10x10 grid with a small number of cells containing food.

Tested 100 organisms (simple neural networks) in the grid. Input of direction & strength (smell of food) was connected to hidden units (the organism’s brain), which projected to 2 output motors (allowing forward, left, right, pause). With random weights (connection strengths) in the initial set-up, behaviour was disorganised (some stood still for their whole life! some marched and then fell off). If an organism stumbled on food it could be cloned. Random weight changes were introduced too. After 50 generations very different individuals evolved (they looked goal-directed towards food, but remember Braitenberg’s warning!!!!).

Artificial Intelligence (AI)


  • Young approach

  • Corrective to connectionist approach

  • Emphasis on emergentism

    role of the environment

    importance of the organism’s body

    social nature of intelligence

  • By trying to understand life as it could be – broadens what might count as intelligent behaviour and fresh ways of thinking about biological behaviour

  • Difficulty of crossing bridge between toy models and realistic models

3. Dynamical Systems

Dynamical Systems

The digital framework ‘saw’ mind as computer

Connectionism in part ‘sees’ mind as brain


Some researchers studying the dynamical quality of motor activity state that it is difficult to ignore the complex manner in which behaviours change over time. These researchers do not use the digital framework but instead use the dynamical systems framework as an alternative way of thinking about


Dynamical Systems

A dynamical system changes over time according to some lawful rule.

  • A formal characterisation of how the system (state space) changes over time uses differential equations.

  • Differential equations capture the way in which the variables’ values evolve over time and in relation to each other.

  • An attractor is a state towards which a dynamical system will tend to move under normal conditions (may not actually get there).
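A one-variable example may make the idea of an attractor concrete. The sketch below is an assumption-based illustration: the differential equation dx/dt = -rate * (x - attractor), the parameter values and the function `simulate` are all chosen for the example, and the equation is integrated with simple Euler steps.

```python
def simulate(x0, attractor=1.0, rate=1.0, dt=0.1, steps=100):
    """Euler integration of dx/dt = -rate * (x - attractor)."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        # The rate of change depends only on the current distance from the attractor.
        x += dt * (-rate * (x - attractor))
        trajectory.append(x)
    return trajectory


for start in (5.0, -3.0):
    path = simulate(start)
    print(start, "->", round(path[-1], 4))
```

Whatever the starting state, the variable moves towards the attractor value and gets ever closer without, in this system, exactly reaching it.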

Dynamical Systems


Reasons why one should think of cognition as a dynamical system

Dynamical Systems

    1. Cognition & Time

  • Cognitive behaviours are not atemporal; they exist and unfold over time.

  • Goal of dynamical systems is to specify how changes in states occur.

  • Useful accounts of cognitive behaviour need to explain temporal changes.

    2. Continuity in state

  • Natural cognitive systems change in a continuous manner, states are not discretely separated from the next. [ Compare with serial processing models].

  • Sentence meaning unfolds gradually, not all at once at the end!

  • Sometimes there never is an end state!

Dynamical Systems

    3. Multiple simultaneous interactions

  • Problem of thinking conceptually about many things interacting in complex ways. Digital computers carry out few instructions at the same time.

  • As system complexity increases, the number of interactions increases exponentially. Impossible to do by digital computer. Dynamical systems focus precisely on how simultaneous interactions in a system affect overall behaviour.

    4. Self-organisation & emergence of structure

  • Emergence as the ability to develop a structure on its own through its own natural behaviour.

All 3 approaches pay closer attention to how natural systems might elucidate cognition.

None are complete.

Together they complement each other.

All 3 approaches attempt to deal with the shortcomings of cognitive models.

Focus of Connectionism

Concern with biological implementation

Concern with problems in learning

Concern with problems of representation

Can an inductive learning procedure discover abstract generalisations, using only examples (associations?), rather than explicitly formulated instructions (categorical?)?

How do the resulting knowledge bases capture generalisations?

Are there important differences between traditional rule systems and the ways in which networks represent generalisations?

Focus of Artificial Intelligence

Rejects cognition as highly developed mental activity (chess)

Emphasis is on intelligence for survival

Role of evolution in achieving adaptation

Role of Adaptation

Emergence of structures and behaviours not designed

Emergence of structures and behaviours as an outgrowth of

complex interactions.

Focus of Dynamical Systems

Concern with interaction & emergence

Mathematical framework for understanding emergence &

high-order interaction found in connectionist and AI models.

Deeper commitment to incorporation of the importance of

time into models.

Connectionist Approaches to Semantics

  • In connectionist models semantic representations do not correspond to particular semantic units but to a pattern of activation across semantic units (microfeatures).

  • A microfeature is an individual active unit involved in low-level processing rather than symbolic processing (Hinton, 1989).

  • Semantic microfeatures mediate between perception, action & language and do not necessarily have straightforward linguistic counterparts. They form the interface between perceptual and conceptual systems (Jackendoff, 1987).

  • No reason to assume a microfeature will map onto a linguistic equivalent (hidden units in a connectionist network do not always acquire an easily identifiable specific function)

  • Loss of specific semantic information has been shown to affect a set of related concepts (Gainotti et al., 1996). Semantic microfeatures may therefore encode knowledge at a very low level of semantic representation.

  • Encoding of visual information by some of the semantic microfeatures would predict that lesions to the semantic system result in visual errors, difficulty in perceptual processing & naming difficulties in dementia.

Connectionist Attractor Semantic Network

Hinton & Shallice (1991)

  • Mutually exclusive features inhibit each other, and only one can be active at a time (e.g. an object cannot be both ‘hard’ & ‘soft’).

  • ‘Clean-up’ units modulate the activation by allowing combinations of units to influence each other.

  • Visualising a semantic network as a series of hills & valleys, words that have a similar meaning will be in valleys that are close together.

  • A word will cause an initial particular pattern of activation but this may be very different from its ultimate semantic representation.

  • Valley bottoms correspond to particular word meanings (attractors) and if you start somewhere along the sides of the valley you will eventually find the bottom.
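The hills-and-valleys picture can be illustrated with a very small attractor network. The sketch below is a Hopfield-style network, which is an assumption made for illustration and not the Hinton & Shallice (1991) architecture (their model used trained clean-up units). The two stored "meanings" (`CAT`, `ROCK`) and their 8 microfeature values are hypothetical. Starting from a degraded pattern on the side of a valley, the network settles to the valley bottom.

```python
# Minimal Hopfield-style attractor sketch (illustrative assumption only;
# NOT the Hinton & Shallice (1991) network). Units take values +1 / -1.

def train(patterns):
    """Hebbian weights: microfeatures that agree in a stored meaning excite
    each other, microfeatures that disagree inhibit each other."""
    n = len(patterns[0])
    w = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    w[i][j] += p[i] * p[j]
    return w

def settle(w, state, max_sweeps=20):
    """Update units one at a time until no unit changes (an attractor)."""
    state = list(state)
    n = len(state)
    for _ in range(max_sweeps):
        changed = False
        for i in range(n):
            net = sum(w[i][j] * state[j] for j in range(n))
            new = 1 if net >= 0 else -1
            if new != state[i]:
                state[i] = new
                changed = True
        if not changed:
            break
    return state

# Hypothetical word meanings as patterns over 8 microfeatures.
CAT = [1, 1, 1, 1, -1, -1, -1, -1]
ROCK = [1, -1, 1, -1, 1, -1, 1, -1]
weights = train([CAT, ROCK])

# Initial activation caused by a word: close to CAT, but one feature is wrong.
noisy_cat = [1, 1, 1, -1, -1, -1, -1, -1]
print(settle(weights, noisy_cat) == CAT)
```

The initial pattern differs from the final one, which mirrors the point that a word's first pattern of activation need not be its ultimate semantic representation.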

If meanings are represented by a pattern of activation distributed over many microfeatures, a general degradation of performance will result from the loss of microfeatures and not of individual items.

Semantic microfeature loss hypothesis: Dementia

  • Loss of semantic microfeatures distorts semantic space so that some attractors are lost whilst others may become inaccessible on some tasks due to the erosion of attractor basin boundaries.

  • Damage to a subset of microfeatures will lead to probabilistic decline in performance.

  • Pattern of observed performance will vary from patient to patient depending on the importance of the microfeature lost to a particular item in a particular patient.

  • Task performance will also vary because each task will provide differing amounts of residual activation to the damaged system.

  • When tested experimentally the permanent loss of microfeatures may therefore sometimes look like the loss of information and at other times like a difficulty of accessing information.

Example

The clearest indication of item loss is taken as response consistency

If ‘vampire’ as a unit is lost then the meaning of the word will be

unavailable. The same would apply to ‘bites’

If a unit is lost that is not easily encoded linguistically, then the consequences may not be obvious in a linguistic task, but it may mean that higher-level linguistically encoded units become permanently unavailable or are less easily accessed.

Connectionist models are sensitive to multiple constraints.

The availability of items will depend on the degree to which the task provides constraints.

Alzheimer's Disease patients perform relatively well on highly constrained tasks.

Latent Semantic Analysis (Landauer et al., 1998)

  • Connectionist models are good at picking out statistical regularities.

  • LSA makes explicit use of co-occurrence information in acquiring knowledge (learning) and needs no prior linguistic knowledge. A mathematical procedure abstracts dimensions of similarity from a large corpus of items, based upon an analysis of the context in which the words are found. Context provides a powerful constraint on word meaning and LSA makes use of this context to acquire knowledge about words.

  • LSA learns about interrelations through induction: it infers the meaning of new words from the context.
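The role of context can be sketched with plain co-occurrence counts. This is a simplified illustration and not LSA proper: it omits the dimension-reduction step (singular value decomposition) that LSA applies to the word-by-context matrix, and the three-sentence corpus is hypothetical. It shows how two words that never occur together can still come out as similar because they occur in the same contexts.

```python
import math
from collections import Counter

def context_vectors(sentences):
    """For each word, count the other words that share a sentence with it."""
    vectors = {}
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            ctx = vectors.setdefault(word, Counter())
            for j, other in enumerate(words):
                if i != j:
                    ctx[other] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Hypothetical toy corpus; a real analysis would use a large text corpus.
corpus = [
    "the doctor treated the patient",
    "the nurse treated the patient",
    "the pilot flew the plane",
]
vecs = context_vectors(corpus)

# 'doctor' and 'nurse' never co-occur, yet they share their contexts.
print(cosine(vecs["doctor"], vecs["nurse"]) > cosine(vecs["doctor"], vecs["pilot"]))
```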

Example

We can reach agreement on the usage of words without

any external referent

It would explain how we acquire words describing private

mental experiences


How do you know that I mean the same thing by

‘I’m sad today’ as you do?


In the context in which these words repeatedly do and do not occur.

Semantic priming

  • Test the effect of an independent variable on the dependent variable

  • Independent variable: semantic relationship between word pairs.

  • Dependent variable: reaction time and accuracy.

  • Priming is said to have occurred if reaction time is faster and accuracy is greater for a target (count) preceded by a related prime (multiple) compared to those preceded by an unrelated prime (facial).

  • Previous research confounded the semantic and associative properties of words. Beach (in preparation) separated associative & semantic properties of words in a half-visual-field paradigm.

  • In the RVF (left hemisphere), at a short SOA (automatic processing), both categorical (semantic) and associatively primed targets showed a speed-accuracy trade-off, but were NOT primed, because reaction times to targets following unrelated primes were faster than to targets following related primes. Both purely associative and purely semantic meaning appeared to interfere with word recognition in the LH: inhibition rather than facilitation as previously reported.

  • No priming occurred in the LVF (RH) despite the much slower recognition of targets preceded by unrelated words compared to the LH.

    Accuracy for associative word pairs was high; this may reflect the time needed to visually recognise the word pair as a separate distinct compound (e.g. pollen and count as the compound pollen-count).

    Semantic word pairs were not primed in the LVF (RH) and in addition accuracy

    was poor.

  • TASK is superficial and does not require the meaning of the word to be accessed. Lexical decision task: is the word a real word (count) or a nonword (bount)?

  • INSTRUCTIONS are important: the prime word is presented before the target word BUT participants are only told that the prime word is to warn them that the word they are to respond to is about to appear – the experimenter does not use the word ‘prime’.

  • CONTROLLING for confounding variables is also important: words are matched on frequency so that targets are not responded to faster just because they are more common. Non-words are matched on word length. Semantic distance & associative strength of all word pairs are carefully controlled using databases.

  • EXPERIMENTAL DESIGN is important: prime-target word pairs are counterbalanced across participants so that all target words are preceded by both unrelated and related primes.

  • Series of 36 trials presented twice (72 trials in total) so that each prime type is presented once to a ‘word’ target and once to a ‘nonword’ target.

  • 36 word pairs are divided into 2 lots of 18

  • 2 participant groups – both prime types presented to all targets.

    related prime    unrelated prime    target
    1-18             1-18               1-18
    19-36            19-36              19-36

    Participant group 1: 1-18 related, 19-36 unrelated primes

    Participant group 2: 19-36 related, 1-18 unrelated primes

    Trials presented in random order.
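The counterbalancing scheme can be sketched as follows. This is a minimal illustration under stated assumptions: the function name `build_trials` and the tuple layout are hypothetical, and only the between-group assignment of related and unrelated primes to the 36 word pairs is modelled (the repetition with ‘word’ and ‘nonword’ targets is left out).

```python
import random

def build_trials(group, n_pairs=36, seed=None):
    """Return a randomly ordered list of (pair_id, prime_type) for one group.

    Group 1 sees pairs 1-18 with related primes and 19-36 with unrelated
    primes; group 2 sees the reverse assignment.
    """
    half = n_pairs // 2
    first_half = range(1, half + 1)
    second_half = range(half + 1, n_pairs + 1)
    related = first_half if group == 1 else second_half
    trials = []
    for pair_id in range(1, n_pairs + 1):
        prime_type = "related" if pair_id in related else "unrelated"
        trials.append((pair_id, prime_type))
    random.Random(seed).shuffle(trials)  # trials presented in random order
    return trials

group1 = build_trials(1, seed=0)
group2 = build_trials(2, seed=0)

# Every target appears in both groups, with the opposite prime type.
by_pair_1, by_pair_2 = dict(group1), dict(group2)
print(all(by_pair_1[p] != by_pair_2[p] for p in by_pair_1))
```

Across the two groups every target is therefore preceded by both prime types, while each participant sees each target only in one condition.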

All participants see exactly the same 36 target words.

Prime word (e.g. multiply) presented for 100 msec

Interstimulus cross presented for 150 msec

Target presented

Response

Blank screen for 2000 msec between trials

SOA (stimulus onset asynchrony) = 250 msec (100 + 150) = time between presentation of the prime and target word
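The timing arithmetic can be written out as a small sketch. The durations are the ones given on the slide; the function and constant names are hypothetical.

```python
PRIME_MS = 100    # prime word duration
CROSS_MS = 150    # interstimulus cross duration
BLANK_MS = 2000   # blank screen between trials

def trial_onsets(prime_ms=PRIME_MS, cross_ms=CROSS_MS):
    """Onset of each event in msec, measured from the onset of the prime."""
    return {
        "prime": 0,
        "cross": prime_ms,
        "target": prime_ms + cross_ms,
    }

def soa(prime_ms=PRIME_MS, cross_ms=CROSS_MS):
    """Stimulus onset asynchrony: target onset minus prime onset."""
    onsets = trial_onsets(prime_ms, cross_ms)
    return onsets["target"] - onsets["prime"]

print(soa())  # 250
```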

Critical thinking


representation & computation

Representation & computation

Representations are ‘things’ that stand for something.

When you visually perceive a cat you don’t have a cat in

your head, but a mental image (percept) of the cat.

To shout ‘cat’ when you have this percept requires that

you have some idea of what a cat is (cat concept).

Percepts, ideas & concepts are necessarily entities that

stand for something – internal representations.

Representation & computation

What is the ontological status of internal representations?

What representation means varies from discipline to

discipline; from theory to theory.

It is almost universally assumed that all cognitive

processes are computational processes that require

internal representations as the medium of computation.

Representation & computation

Are intelligent systems computational systems?

Is a symbolic computational framework plausible for

explaining biological cognitive processing?

Do computer simulations explain how the mind/brain


OR is the status of internal representations problematic?

Representation & computation

What makes something a representation?

What is the relation between computationalism,

representationalism & the medium of computation?

Questioning the notion that computation and symbolic-

digital processing are the same thing.

Are all internal representations either symbols or


Representation & computation

Cognition without representation?

Intrinsic internal representations are required if a system

traffics in entities whose content is separate from our


BUT mental images of past experience are intrinsic


By contrast extrinsic representations (arbitrary rocks as

content-entities) do depend on our descriptions /


Representation & computation

Stufflebeam (2002)

“If brains implement symbolic-digital processing, we should be

able to structurally decompose brains into their discrete rules and

symbols. But we cannot. Whilst we can describe the brain as if it

implemented symbolic-digital processing, that is a far cry from the

brain actually doing it”

Representation & computation

Stufflebeam (2002)

Connectionists (distributed networks) and anti-computationalists take the absence of discrete rules & symbols as evidence that the brain is not a symbolic-digital computer.

Brains could implement non-symbolic-analog processing; there are important differences between artificial networks & real populations of neurons in the brain. Analog quantities do the computational work in brains, but in artificial neural nets it is the activation pattern among processing units. BOTH of these are poor candidates for being representations; instead they are the


Representation & computation

Stufflebeam (2002)

Given the right sort of interpretation, analog quantities or distributed patterns of activation could be representations.


It is the interpretation process that makes them representations and

therefore at best they are extrinsic representations that may be

descriptively useful constructs, but they are not internal


How much computational labour do biologically intelligent

systems off-load to their environment, thus minimising the need for

internal representations?

Computer simulations

  • How are decisions represented in each domain?

  • Does the content play a role in determining the type of representation?

Kintsch (1998) Construction-integration Model

Hybrid symbolic/connectionist model

  • Computer simulation: takes discourse in the form of propositions and “reads it”: creating a network of connections between concepts in memory

  • Network can then be probed to answer questions about the text.

  • Words associated with main concepts in the text are primed

Kintsch (1998) Construction-integration Model

  • Nodes are connected by links

  • Each node can be differentially activated

  • Each link has a strength: specifies relation between nodes

  • Nodes represent propositions “boy hits girl” hit[boy,girl]

  • In a simulation propositions are derived from:

    (i) the text (ii) the reader’s LTM

  • Links are created when nodes

    (i) share an argument (ii) have a meaningful relationship

  • Information derived from the text: text-base

  • elaboration by background knowledge: situation model

Kintsch (1998) Construction-integration Model: Representations of a text are created in a 2 stage process

Stage 1: Construction

Propositions are generated from

all text propositions & all relevant LTM propositions

(therefore information in excess of relevance at this stage)

Related propositions are linked

Hit[boy,girl] linked to Run[girl,home] – girl shared

Information from LTM imported [girl,female]

Inhibition [-ve weighted links]: nodes inhibit one another

Process is BOTTOM-UP : simple local rules

Representation is comprehensive > coherent

Kintsch (1998) Construction-integration Model: Representations of a text are created in a 2 stage process

Stage 2: Integration

Constructed representation converted into a coherent one

Emphasis = important information = develop situation model

Accomplished by spreading activation between nodes

Nodes with strong connection strengths share activation

Nodes with more activation spread more to neighbours


  • Normalised: most activation = 1, -ve activations = 0

  • All links & nodes are assumed to start with the same activation

    BETTER CONNECTED nodes gain most of the activation,

    less well connected nodes gain little/no activation

    coherent > comprehensive
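The integration stage can be sketched as repeated spreading of activation over the constructed network. This is a minimal sketch under stated assumptions, not Kintsch's implementation: the self-link strength of 1, the link strengths, the stopping tolerance, and the LTM proposition `isa[home,building]` are all hypothetical. After each cycle negative activations are set to 0 and the values are rescaled so that the most active node is 1, as on the slide.

```python
def build_weights(n_nodes, links, self_strength=1.0):
    """Symmetric link matrix; self_strength is an illustrative assumption."""
    w = [[0.0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        w[i][i] = self_strength
    for (i, j), strength in links.items():
        w[i][j] = strength
        w[j][i] = strength
    return w

def integrate(w, max_cycles=100, tol=1e-6):
    """Spread activation until the pattern stops changing."""
    n = len(w)
    act = [1.0] * n  # all nodes start with the same activation
    for _ in range(max_cycles):
        raw = [sum(w[i][j] * act[j] for j in range(n)) for i in range(n)]
        raw = [max(0.0, a) for a in raw]      # -ve activations = 0
        peak = max(raw)
        if peak == 0.0:
            return raw
        new = [a / peak for a in raw]         # most activation = 1
        if max(abs(x - y) for x, y in zip(new, act)) < tol:
            return new
        act = new
    return act

propositions = ["hit[boy,girl]", "run[girl,home]",
                "isa[girl,female]", "isa[home,building]"]
links = {
    (0, 1): 1.0,  # share the argument 'girl'
    (0, 2): 1.0,  # share the argument 'girl'
    (1, 2): 1.0,  # share the argument 'girl'
    (1, 3): 0.2,  # weak link through 'home' (hypothetical strength)
}
activation = integrate(build_weights(len(propositions), links))
for name, value in zip(propositions, activation):
    print(f"{name}: {value:.3f}")
```

In this toy network the three propositions that share the argument `girl` keep nearly all of the activation, while the weakly connected proposition ends up with very little.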

Kintsch (1998) Construction-integration Model: Representations of a text are created in a 2 stage process

Critical: Comprehension is cyclic

Information not currently in use → LT store

Strength of LT proposition = accrued activation in cycles

Nodes kept active for more than one cycle accrue more activation → LTM

Leads to a situation model:

The most important nodes [as measured by connectedness]

gain most of the activation

Model predicts reader’s memory will be best for

best-connected propositions = model of the story.

Kintsch (1998) Construction-integration Model

Kintsch: comprehension as a broad framework for cognition

Networks show same conjunction errors as Kahneman & Tversky (1982)

Framework can be used to model preference choices

Model is agnostic to probability theory – no numerical data

only valence (+/-) of proposition

If a Construction-integration Model is able to model participant decision

And if its success depends on underlying text representations

Then strong argument for the role of content in preferential judgement

Rettinger & Hastie (2003)

Belief: some of the differences in choices caused by

framing & content manipulations may be due to how the

problem elements are organised in a representation of the


3 types of representation

    (i) Narrative-story

    (ii) Decision tree structure

    (iii) Tabular: spreadsheet, list of numbers

Rettinger & Hastie (2003)

  • Narrative-story representation

    A temporally ordered sequence of events; antecedent-consequent (causal) relationships dominate over a hierarchical structure. A decision problem has several courses of action, each represented as a story line, with contingent future paths subordinate to the main sequence in each storyline.

    Empirical signatures

  • Temporal ordering in recall

  • High recall of non-numerical data

  • High recall of causal relationships

Rettinger & Hastie (2003)

  • Decision tree representations

    Format of traditional decision problem

    Several alternative courses of action

    Alternative courses conditioned on chance/cause nodes

    Leads to outcomes with consequences

    Empirical Signatures:

  • More balanced than narrative representation

  • Would not subordinate alternative future branches to a single dominant story line

  • numerical data more prominently preserved

Rettinger & Hastie (2003): In each content domain at least 1 model may be ruled out


content-rich domains: tabular model utterly fails

gambles: narrative model fails

legal: good fit for narrative & decision tree models

stock: good fit for decision tree

grade: fit for narrative & decision tree

2. MEMORY RECALL: critical to whether the simulation is capturing representations. Models do not fit this criterion equally.

legal: narrative better at predicting what parts of the story are remembered. DISCARD decision tree.

gamble: decision tree is excellent at predicting what parts of the story are remembered. Use more formal representations. DISCARD narrative.

grade: no consensus representation, both moderate correlations

stock: neither model does well.

Rettinger & Hastie (2003)


Narrative representations are more appropriate to legal

stories whereas decision trees are more appropriate for


Despite an identical underlying decision structure

they produce different mental representations of that

structure in participants.

Rettinger & Hastie (2003)


Decision making relies on a variety of cognitive processes

Domain content is associated with

  • the size of a framing effect

  • changes in decision strategy

  • changes in the mental representation


    do not generalise to medical, legal or financial decision contexts.

    Context-specific information

    is critical in determining decisions

    If the Context changes: change in the way we think about decisions

    interacting with personal & moral relevance so that the information

    inputs to the decision process are represented differently.

Future Research

Information processing theories that integrate

Cognition computational models & decision making processes

Combining CT model & models of LTM

Construction-integration model simulates on-line processing

but the relationship with the structure of LTM is unknown:

May allow stronger predictions of decisions.

Individual Difference

  • knowledge base

  • decision preference

  • mental representation

    Use CT: assess what knowledge is used in making a decision; specify the mental representation