Causal Cognition 1: learning

Causal Cognition 1: learning • David Lagnado, University College London

Causal knowledge • Causality is ‘the cement of the universe’ • Mirrored by our cognitive system • Causal knowledge binds our concepts and shapes our reasoning • Fundamental to prediction, control, explanation, attribution • How do people acquire this knowledge? • How does it influence their reasoning?

Internal models in learning • People construct internal models to represent the causal texture of the environment (Tolman & Brunswik, 1935) • Subsequent developments favoured probabilistic models (e.g., associative networks; regression models; connectionist models)

Internal models in learning • Recent emphasis on causal models • Inspired by formal work on causality in AI and statistics • Pearl, 2000: people encode stable aspects of their experience in terms of qualitative causal relations • Inverts common view that probabilistic relations are primary

Sources of causal knowledge • Instruction • Analogy (both require prior causal knowledge) • Direct perception? • Michotte’s launching effect • Agency • Induction from patterns of experience

Causal learning • Infer causal relations from patterns of data • Often difficult because: • Probabilistic and incomplete data • Small samples • Different models can generate same data • How do people do it?

Methods • Variety of experimental paradigms • Physical set-ups • Exposure to data in on-line tasks • Presentation of summary statistics • Verbal scenarios • Measured responses • Explicit (e.g., judgments of structure or strength) • Implicit (e.g., performance on control or prediction task)

Models • Aim: construct a descriptive model of causal learning • Computational level (what & why) • Algorithmic/Process level (how) • Implementation level • Status of normative models • As standard of appraisal • As guide/framework for developing computational models • Ideal learner

Structure before strength • E.g., does the MMR jab cause autism? [Diagram: nodes MMR and Autism] • Structure • Does a causal link exist? • Strength • To what extent does MMR cause autism? • Conceptually, the question of structure is primary • Need to posit a causal link before estimating its strength • But most psychological research is concerned with strength

Learning about causal strength • Covariation theories • Associative (e.g., Shanks & Dickinson, 1989) • Rule-based (e.g., power PC: Cheng, 1997) • People estimate the strength of causal relations on the basis of covariation between events

Associative theories • People form associations between event representations • Strength of association determined by contingency between events • Associations updated via Rescorla-Wagner learning rule • incremental, error-driven • Equivalent to delta learning rule in neural networks • computes delta P at asymptote • Delta P as normative index of degree of contingency: delta P = P(E|C) – P(E|~C)
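The claim that an error-driven rule computes delta P at asymptote can be checked with a small simulation. The following is a minimal sketch, not the procedure of any cited study: the trial counts, learning rate and number of passes are hypothetical, and it assumes the usual set-up in which an always-present background (context) cue competes with the cause C.

```python
import random

def delta_p(a, b, c, d):
    """Delta P from a 2x2 table: a = C&E, b = C&~E, c = ~C&E, d = ~C&~E."""
    return a / (a + b) - c / (c + d)

def rescorla_wagner(trials, rate=0.005, epochs=500, seed=0):
    """Trial-by-trial error-driven learning.

    trials: list of (cause_present, effect_present) pairs coded 0/1.
    Returns (v_cause, v_context): the associative strengths of the cause
    and of an always-present context cue (an assumption of this sketch).
    """
    rng = random.Random(seed)
    trials = list(trials)
    v_cause, v_context = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(trials)
        for cause, effect in trials:
            prediction = v_context + v_cause * cause
            error = effect - prediction          # prediction error
            v_context += rate * error            # context is present on every trial
            v_cause += rate * error * cause      # cause updated only when present
    return v_cause, v_context

# Hypothetical data: P(E|C) = 16/20 = 0.8, P(E|~C) = 4/20 = 0.2
trials = [(1, 1)] * 16 + [(1, 0)] * 4 + [(0, 1)] * 4 + [(0, 0)] * 16
target = delta_p(16, 4, 4, 16)
v_cause, v_context = rescorla_wagner(trials)
print(f"delta P = {target:.2f}, learned strength of C = {v_cause:.2f}")
```

With these settings the learned strength of C settles close to delta P (0.6) and the context strength close to P(E|~C), up to small trial-by-trial fluctuations.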

Power PC • People assume that objects have hidden causal powers (generative or preventative) • Strength of causal powers inferred from observed frequencies • Normatively derived given certain independence assumptions: power p = delta P / (1 – P(E|~C)) (generative case) • Corresponds to noisy-OR gate in Causal Bayes Net
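A short worked example may help separate delta P from causal power. This sketch covers the generative case only; the probabilities are hypothetical, and the function names are illustrative rather than taken from any library.

```python
def delta_p(p_e_given_c, p_e_given_not_c):
    """Contingency: P(E|C) - P(E|~C)."""
    return p_e_given_c - p_e_given_not_c

def generative_power(p_e_given_c, p_e_given_not_c):
    """Generative causal power: delta P / (1 - P(E|~C))."""
    if p_e_given_not_c >= 1.0:
        raise ValueError("power is undefined when E always occurs without C")
    return delta_p(p_e_given_c, p_e_given_not_c) / (1.0 - p_e_given_not_c)

def noisy_or(base_rate, power):
    """P(E|C) when C and the background act as independent generative causes."""
    return 1.0 - (1.0 - base_rate) * (1.0 - power)

# Two hypothetical data sets with the same delta P (0.6) but different power
power_low_base = generative_power(0.8, 0.2)    # 0.6 / 0.8
power_high_base = generative_power(1.0, 0.4)   # 0.6 / 0.6

# The noisy-OR gate recovers P(E|C) from the base rate and the power
reconstructed = noisy_or(0.2, power_low_base)

print(round(power_low_base, 3), round(power_high_base, 3), round(reconstructed, 3))
```

The two data sets share a delta P of 0.6, yet power is about 0.75 in the first and 1.0 in the second, because a higher base rate P(E|~C) leaves less room for the cause to reveal itself.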

Typical experimental paradigm • Subjects given a cover story that identifies a potential cause C and a potential effect E (e.g., drugs and recovery) • Learning phase • Exposure to numerous trials in which C is present (or absent) and effect E present (or absent) • E.g., C = drug taken, E = recovery • Test phase • Subjects judge the effectiveness of the cause [Table: 2×2 contingency table with rows C, ~C and columns E, ~E]

Comparison of models

Summary • Neither associative nor power models give a complete account of the empirical data on people’s causal strength judgments • Perhaps no unitary model for strength estimates • People use various learning strategies according to context and probe questions

Incompleteness of covariation-based models • Focus on simple models where potential causes and effects pre-sorted (by time order, prior knowledge, instructions) • But people often confronted with more complex structures (many variables, different functional relations etc), and have to infer structure (what is a cause, what an effect)

Two approaches to causal learning • Data-driven • Focus on how people make causal judgments from patterns of covariation (Cheng; Shanks & Dickinson) • Events pre-sorted as potential causes and effects • Estimate strength of causal links • Hypothesis-driven • Learning guided by prior knowledge or assumptions about structure (Waldmann & Hagmayer) • But neither approach tells us how structure is learned

Causal Bayesian networks • Normative framework • Spirtes, Glymour & Scheines, 1993; Pearl, 2000 • Clarifies relationship between probabilistic data and causal structure • Formalizes notion of intervention • Distinguishes between observation and intervention • Development of various structure learning algorithms • Constraint-based • Bayesian

Strong claim • Causal Bayes nets as model for representation, inference and learning • Adults and children use causal maps (Gopnik, Glymour et al., 2004) • Represent causal structure in terms of causal Bayes nets • Predictions via Bayesian updating (with special rules for interventions) • Use formal learning procedures to discover causal structure

Problems? • People often make causal judgments on basis of a few trials but structure learning algorithms require large sample sizes • Structure learning models both over- and under-estimate human capabilities: • Memory and processing limitations • People are immersed in a spatiotemporal environment with various other cues to causal structure (time order, spatial contiguity, …) • Experimental evidence lacking • Gopnik et al.’s data with children admit of alternative explanations (and child can’t tell you!)

Cues to causal structure • Multiple fallible cues (cf. Einhorn & Hogarth, 1986) • Statistical covariation • Temporal order • Intervention • Proximity (space & time) • Similarity… • These can cohere or conflict • In natural environment cues often correlated • Statistical covariation is focus of most research • But covariation alone is insufficient to infer unique causal structure

Two central cues • Temporal order (study 1) • Previous work focuses on how time delays affect strength estimates • Not on structure questions • Intervention (study 2) • Previous work does not fully distinguish intervention from observation • (Both studies from Lagnado & Sloman, 2006, JEP:LMC)

Temporal order [Diagram: nodes MMR and Autism] • Temporal order of events provides a basic cue to causal structure • Causes occur before their effects • Suggests simple heuristic: use temporal order as cue to causal order • If MMR jabs are reliably followed by autism, infer that MMR causes autism • But temporal order is a fallible cue

Temporal order can be misleading • Order of appearance of virus along the TIME axis: A, then B • Structure suggested by temporal order: A infects B [Diagram: A → B]

Temporal order can be misleading • Order of appearance of virus along the TIME axis: A, then B • Another possible structure: A and B infected by common cause C [Diagram: C → A, C → B]

Temporal order can be misleading • Order of appearance of virus along the TIME axis: A, then B • Yet another possible structure: B infects A [Diagram: B → A]

Study 1 • Pits covariation against temporal order • 2 main questions • How does temporal order influence causal learning? • Do people make spurious inferences when temporal order is misleading?

Email virus task • Task • Participants send viruses to a small computer network • Must infer which connections are working • Vary time order of receipt of information • Participants told that there is variability in time delays • Transmission between computers • Between infection and appearance of virus

Learning phase • 1. Send virus to A • 2. Observe which other computers receive virus • 3. Infer which connections work [Figure: network of computers A, B, C, D]

Design of experiment • Participants complete four similar problems • All problems have same underlying network structure • Each problem displays viruses in a different temporal order • Response mode – binary choice for each link • Links only work 80% of the time • No spontaneous viruses • Initial intervention always to send virus to A • 100 test trials [Figure: network of computers A, B, C, D]

Frequencies of patterns [Figure: A, B, C, D network with pattern frequencies] • Note: C and D are conditionally independent given B
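The expected pattern frequencies can be worked out by enumeration. The sketch below assumes the underlying network is A → B, B → C, B → D; the slide text does not state the structure explicitly, but this reading is consistent with C and D being conditionally independent given B. The 80% link reliability, the absence of spontaneous viruses and the initial virus sent to A come from the design slide.

```python
from itertools import product

LINK_RELIABILITY = 0.8
# Assumed structure (not stated in the slide text): A -> B, B -> C, B -> D
PARENT = {"B": "A", "C": "B", "D": "B"}

def pattern_probability(infected):
    """Probability of an infection pattern, given that the virus is sent to A."""
    prob = 1.0
    for node, parent in PARENT.items():
        # No spontaneous viruses: a node can only be infected via its parent
        p_infected = LINK_RELIABILITY if infected[parent] else 0.0
        prob *= p_infected if infected[node] else 1.0 - p_infected
    return prob

# Enumerate every pattern over B, C, D (A is always infected)
patterns = {}
for b, c, d in product([True, False], repeat=3):
    p = pattern_probability({"A": True, "B": b, "C": c, "D": d})
    if p > 0:
        patterns[(b, c, d)] = p

# Check conditional independence of C and D given B
p_b = sum(p for (b, c, d), p in patterns.items() if b)
p_c_given_b = sum(p for (b, c, d), p in patterns.items() if b and c) / p_b
p_d_given_b = sum(p for (b, c, d), p in patterns.items() if b and d) / p_b
p_cd_given_b = patterns[(True, True, True)] / p_b

for (b, c, d), p in sorted(patterns.items(), key=lambda item: -item[1]):
    print(f"B={b!s:5} C={c!s:5} D={d!s:5} -> {p:.3f}")
```

Under this assumed structure, all four computers are infected on about 51% of trials, and A, B and C without D on about 13%, which matches the percentages quoted for the confirming and disconfirming patterns in the conclusions slides.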

FOUR TIME CONDITIONS (within-subject) • Simultaneous • Time order ABDC • Time order AB[CD] • Time order ADCB [Figure: one A, B, C, D network diagram per condition]

Which connections are working? [Figure: endorsed links in the A, B, C, D network for each condition: Simultaneous; Time order ABDC; Time order AB[CD]; Time order ADCB] • Legend: links endorsed by > 50% of subjects; significantly > 50%

CHOICE: Use B or D to send message to C? [Figure: one A, B, C, D network diagram per time condition. Labels and percentages in extracted order: Time order ADCB, 75%, 62%, Simultaneous, Time order ABDC, 21%, 75%, Time order AB[CD]]

Conclusions • Subjects use time order to hypothesize causal links • They confirm or revise these links through patterns of covariation data • Revision not optimal [Figure: A, B, C, D network, Time order ABDC]

Conclusions (continued) • Confirming pattern of data: 51% of trials [Figure: A, B, C, D network]

Conclusions (continued) • Disconfirming pattern of data: 13% of trials [Figure: A, B, C, D network]

Conclusions (continued) • Add an extra link to account for data [Figure: A, B, C, D network]

Conclusions • Temporal order cue overrides covariation information • Can lead to spurious causal inferences & memory distortions • Hypothesis-driven learning • Temporal order used to generate initial causal hypotheses • Revised in light of covariational data • Sequential testing of individual models rather than full Bayesian updating

Intervention • Manipulating a variable in the system • Conducting an experiment • Often critical to establishing causal relations • demo • Causal models as ‘oracles for intervention’ (Pearl, 2000) • Prediction of consequences of actions (even those you have never tried) • A key benefit of intervention is that it can discriminate between ‘Markov equivalent’ models

Does listening to country music cause suicide? • Strong correlation between air-time dedicated to country music and suicide rates (across districts) (Stack & Gundlach, 1992) • Covariational data alone cannot distinguish between these models [Diagrams: Country Music → Suicide (left); Suicide → Country Music (right)]

Does listening to country music cause suicide? • Intervene by banning country music • Do suicide rates go down? • If so, model on left is correct • Graph surgery (Pearl, 2000; Spirtes, Glymour & Scheines, 1993) • Formal representation of intervention [Diagrams: intervention I applied to Country Music in each model: Country Music → Suicide (left); Suicide → Country Music (right)]
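The contrast between observing and intervening can be made concrete with two binary variables. This is a minimal sketch with made-up probabilities, not the Stack & Gundlach data: both models are parameterised to produce exactly the same observational joint distribution, and graph surgery is implemented by hand as cutting every arrow that points into the intervened-on variable.

```python
# Hypothetical parameters for the left-hand model: Music -> Suicide
P_MUSIC = 0.5
P_SUICIDE_GIVEN_MUSIC = {1: 0.3, 0: 0.1}

# Observational joint P(music, suicide) implied by Music -> Suicide
joint = {}
for m in (0, 1):
    p_m = P_MUSIC if m else 1 - P_MUSIC
    for s in (0, 1):
        p_s = P_SUICIDE_GIVEN_MUSIC[m] if s else 1 - P_SUICIDE_GIVEN_MUSIC[m]
        joint[(m, s)] = p_m * p_s

# Right-hand model, Suicide -> Music, fitted to the same joint, so the two
# models are indistinguishable from covariation data alone.
# p_music_given_suicide is shown for completeness; it is not needed below.
p_suicide = joint[(0, 1)] + joint[(1, 1)]
p_music_given_suicide = {
    s: joint[(1, s)] / (joint[(0, s)] + joint[(1, s)]) for s in (0, 1)
}

def p_suicide_do_music_forward(m):
    """Music -> Suicide: setting Music leaves the Music -> Suicide arrow intact."""
    return P_SUICIDE_GIVEN_MUSIC[m]

def p_suicide_do_music_reverse(m):
    """Suicide -> Music: surgery cuts the arrow into Music, so Suicide keeps its marginal."""
    return p_suicide

# Intervene by banning country music: do(Music = 0)
after_ban_forward = p_suicide_do_music_forward(0)
after_ban_reverse = p_suicide_do_music_reverse(0)
print(f"before ban: {p_suicide:.2f}")
print(f"after ban, Music -> Suicide: {after_ban_forward:.2f}")
print(f"after ban, Suicide -> Music: {after_ban_reverse:.2f}")
```

With these numbers the ban lowers the suicide rate only under Music → Suicide; under the reverse model the rate stays at its pre-ban value, which is why the intervention discriminates between models that covariation alone cannot.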

Benefits of intervention • Recent studies suggest that adults & children (& rats?) can distinguish models by making appropriate interventions • Blaisdell et al., 2006; Gopnik et al., 2004; Lagnado & Sloman, 2002, 2004; Sobel, 2003; Steyvers et al., 2003 • But intervention and temporal order are often confounded • Interventions occur before their effects • Learners might use temporal order rather than intervention per se

Temporal order based heuristic? • Infer that changes that occur after one’s intervention are effects of the intervened-on variable • Inference involves no explicit representation of how interventions can modify structure • Interveners benefit from ‘surgery’ without representing it

Study 2 • Assess separate effects of intervention and temporal order • How do these cues combine, and what happens when they conflict? • Demo of slider study

Design • Learning status crossed with temporal order [design labels on slide: within, between] • Participants each completed 6 problems • Response mode – binary choice for each link

Causal models [Figure: diagrams over variables A, B, C, D. Labels in extracted order: chain, long chain, simple, common cause, chain + common cause, common effect]

Results • Intervention (active or yoked) better than observation • Inconsistent time reduces learning in yoked intervention & observation • But no effect on active intervention

Follow-up study • Why aren’t interveners affected by time reversal? • Do interveners overcome inconsistent temporal order by figuring out that variable information is reversed? • Use randomized rather than reverse temporal order

Results Intervention2 shows similar decline in performance with inconsistent temporal order
