
The causal matrix: Learning the background knowledge that makes causal learning possible

Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences and Computer Science and AI Lab (CSAIL). Collaborators: Tom Griffiths, Noah Goodman, Vikash Mansinghka, Charles Kemp.

Presentation Transcript


  1. The causal matrix: Learning the background knowledge that makes causal learning possible. Josh Tenenbaum, MIT Department of Brain and Cognitive Sciences, Computer Science and AI Lab (CSAIL)

  2. Collaborators: Tom Griffiths, Noah Goodman, Vikash Mansinghka, Charles Kemp

  3. Learning causal relations. Goal: Computational models that explain how people learn causal relations from data. [Diagram labels: Structure, Data]

  4. A Bayesian approach. [Figure: data d and candidate causal hypotheses h, drawn as networks over X1, X2, X3, X4.] 1. What is the most likely network h given observed data d? 2. How likely is there to be a link X4 → X2? (e.g., Griffiths & Tenenbaum, 2005; Steyvers et al., 2003)
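
The two questions on this slide can be sketched in a few lines, assuming each hypothesis is represented as a set of directed links and that a prior and a likelihood are already available for each one. The three candidate networks and all numbers below are hypothetical, chosen only to make the arithmetic visible; they are not taken from the talk.

```python
from fractions import Fraction

def posterior(prior, likelihood):
    """Bayes' rule over a finite hypothesis space: P(h | d) is proportional to P(d | h) P(h)."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: w / z for h, w in unnorm.items()}

def most_likely(post):
    """Question 1: the single most probable network."""
    return max(post, key=post.get)

def link_probability(post, link):
    """Question 2: total posterior mass of all networks containing the link."""
    return sum(p for h, p in post.items() if link in h)

# Hypothetical candidate networks, each a set of directed links.
h1 = frozenset({("X1", "X3"), ("X4", "X2")})
h2 = frozenset({("X1", "X3")})
h3 = frozenset({("X4", "X2"), ("X2", "X1")})

# Hypothetical uniform prior and made-up likelihoods P(d | h).
prior = {h: Fraction(1, 3) for h in (h1, h2, h3)}
likelihood = {h1: Fraction(1, 10), h2: Fraction(3, 10), h3: Fraction(1, 10)}

post = posterior(prior, likelihood)
best = most_likely(post)                          # h2 under these numbers
p_link = link_probability(post, ("X4", "X2"))     # 2/5 under these numbers
```

Note that the most likely network (h2) lacks the link X4 → X2 even though the link itself still carries posterior probability 2/5, which is why the two questions are asked separately.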

  5. What’s missing from this account? • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these classes • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention

  6. The approach • What we want to understand: How are these different aspects of background knowledge represented, used to support causal learning, and themselves acquired? • Abstract domain-specific frameworks or causal schemas • Causal variables grounded in sensorimotor experience • Domain-general causal understanding • What we need to answer these questions: • Bayesian inference in probabilistic generative models. • Probabilities defined over structured representations: graphs, grammars, predicate logic. • Hierarchical probabilistic models, with inference at multiple levels of abstraction. • Flexible representations, growing in response to observed data.

  7. Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention

  8. Causal machines (Gopnik, Sobel, Schulz et al.): “See this? It’s a blicket machine. Blickets make it go. Let’s put this one on the machine. Oooh, it’s a blicket!”

  9. “Backward blocking” (Sobel, Tenenbaum & Gopnik, 2004) • Initially: nothing on detector, detector silent (A=0, B=0, E=0) • Trial 1 (AB trial): A and B on detector, detector active (A=1, B=1, E=1) • Trial 2 (A trial): A on detector, detector active (A=1, B=0, E=1) • 4-year-olds judge if each object is a blicket. A: a blicket (100% say yes). B: probably not a blicket (34% say yes). [Figure: AB trial and A trial; graph with uncertain links A ? E and B ? E.]

  10. Possible hypotheses? [Figure: grid of candidate causal graphs over A, B, and E, three rows of eight.]

  11. Bayesian causal learning. With a uniform prior on hypotheses and a generic parameterization: [Figure: probability of being a blicket for A and B; values shown: 0.32, 0.32, 0.34, 0.34.]

  12. A stronger hypothesis space generated by abstract domain knowledge • Links can only exist from blocks to detectors. • Blocks are blickets with prior probability q. • Blickets always activate detectors, and detectors never activate on their own (i.e., deterministic OR parameterization, no hidden causes).
      Priors: P(h00) = (1 - q)^2, P(h01) = (1 - q) q, P(h10) = q (1 - q), P(h11) = q^2
      Likelihoods (hXY: X = 1 if A → E exists, Y = 1 if B → E exists):
                              h00  h01  h10  h11
      P(E=1 | A=0, B=0):       0    0    0    0
      P(E=1 | A=1, B=0):       0    0    1    1
      P(E=1 | A=0, B=1):       0    1    0    1
      P(E=1 | A=1, B=1):       0    1    1    1
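
As a minimal sketch of how this hypothesis space produces backward blocking, the code below encodes the slide's four priors and the likelihood table directly and applies Bayes' rule trial by trial. The function names (`prior`, `update`, `p_blicket`) and the choice q = 1/3 in the usage lines are illustrative assumptions, not part of the talk.

```python
from fractions import Fraction

# P(E=1 | A, B) under each hypothesis, copied from the slide's table.
# In "hXY", X = 1 means A is a blicket and Y = 1 means B is a blicket.
P_E = {
    (0, 0): {"h00": 0, "h01": 0, "h10": 0, "h11": 0},
    (1, 0): {"h00": 0, "h01": 0, "h10": 1, "h11": 1},
    (0, 1): {"h00": 0, "h01": 1, "h10": 0, "h11": 1},
    (1, 1): {"h00": 0, "h01": 1, "h10": 1, "h11": 1},
}

def prior(q):
    """Prior over the four hypotheses when each block is a blicket with probability q."""
    return {"h00": (1 - q) ** 2, "h01": (1 - q) * q,
            "h10": q * (1 - q), "h11": q ** 2}

def update(belief, a, b, e):
    """One Bayesian update after observing blocks (a, b) on the detector and outcome e."""
    unnorm = {}
    for h, p in belief.items():
        p_e1 = P_E[(a, b)][h]
        unnorm[h] = p * (p_e1 if e == 1 else 1 - p_e1)
    z = sum(unnorm.values())  # zero only if the data contradict every hypothesis
    return {h: w / z for h, w in unnorm.items()}

def p_blicket(belief, block):
    """Marginal probability that block "A" or "B" is a blicket."""
    idx = 1 if block == "A" else 2
    return sum(p for h, p in belief.items() if h[idx] == "1")

q = Fraction(1, 3)                           # illustrative prior, not from the talk
after_ab = update(prior(q), a=1, b=1, e=1)   # AB trial: detector active
after_a = update(after_ab, a=1, b=0, e=1)    # A trial: detector active
```

With q = 1/3, both blocks rise to 3/5 after the AB trial; after the A trial, A goes to 1 and B falls back to exactly q. In general B returns to its prior, which is why the size of the backward-blocking effect depends on q.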

  13. Manipulating prior probability (Tenenbaum, Sobel, Griffiths, & Gopnik). [Figure panels: Initial, AB Trial, A Trial]

  14. Inferences from ambiguous data. I. Pre-training phase: Blickets are rare. II. Two trials: A B → detector (Trial 1), B C → detector (Trial 2). After each trial, adults judge the probability that each object is a blicket.

  15. Same domain theory generates hypothesis space for 3 objects: • Hypotheses: h000, h100, h010, h001, h110, h011, h101, h111 [Figure: one causal graph over A, B, C, and E per hypothesis.] • Likelihoods: P(E=1 | A, B, C; h) = 1 if A = 1 and A → E exists, or B = 1 and B → E exists, or C = 1 and C → E exists, else 0.

  16. “Rare” condition: First observe 12 objects on detector, of which 2 set it off.
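
The three-object inference can be sketched by enumerating the eight hypotheses and keeping those consistent with both ambiguous trials under the deterministic-OR likelihood. Setting q = 2/12 from the pre-training counts is an assumption made here for illustration; the talk does not say how the model's prior was set, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

def posterior_over_blickets(q, trials, n_objects):
    """Posterior over blicket assignments under the deterministic-OR theory.

    trials: list of (indices_of_objects_on_detector, detector_active) pairs.
    """
    weights = {}
    for h in product((0, 1), repeat=n_objects):
        prior = Fraction(1)
        for is_blicket in h:
            prior *= q if is_blicket else 1 - q
        # The detector fires exactly when at least one object on it is a blicket.
        consistent = all(
            any(h[i] for i in on_detector) == bool(active)
            for on_detector, active in trials
        )
        weights[h] = prior if consistent else Fraction(0)
    z = sum(weights.values())
    return {h: w / z for h, w in weights.items()}

def marginals(post, n_objects):
    """Probability that each object is a blicket."""
    return [sum(p for h, p in post.items() if h[i]) for i in range(n_objects)]

q_rare = Fraction(2, 12)  # assumed: 2 of 12 pre-training objects set off the detector
# Objects indexed A=0, B=1, C=2. Trial 1: A B -> detector; Trial 2: B C -> detector.
post = posterior_over_blickets(q_rare, [((0, 1), 1), ((1, 2), 1)], 3)
p_a, p_b, p_c = marginals(post, 3)  # 11/41, 36/41, 11/41
```

Under this assumed prior, B, the object present on both trials, absorbs most of the belief (36/41), while A and C each stay at 11/41.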

  17. 4-year-olds (w/ Dave Sobel). I. “Backward blocking” (Trial 1, Trial 2), “Is this a blicket?” for A and B: 100%, 25% (Rare); 100%, 81% (Common). II. Two trials: A B → detector (Trial 1), B C → detector (Trial 2), “Is this a blicket?”: 87%, 56%, 56%.

  18. Formalizing framework theories. [Diagram levels: Framework theory, Causal structure, Event data]

  19. Formalizing framework theories. [Diagram levels: Framework theory, Causal structure, Event data, shown alongside the linguistic analogy: Grammar, Phrase structure, Utterance (“You shot the wumpus.”)]

  20. A framework theory for detectors: probabilistic first-order logic

  21. Formalizing framework theories. [Diagram levels: Framework theory, Causal structure, Event data]

  22. Alternative framework theories • Classes = {C}, Laws = {C → C} • Classes = {R, D, S}, Laws = {R → D, D → S} • Classes = {R, D, S}, Laws = {S → D}

  23. The abstract theory constrains possible hypotheses, and rules out others. • Allows strong inferences about causal structure from very limited data. • Very different from conventional Bayes net learning.
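
One way to see how strongly a framework theory constrains the hypothesis space is to count candidate graphs with and without it. The sketch below assumes each law is a directed class-to-class permission (for example, R → D means a node of class R may point to a node of class D) and that any subset of permitted links counts as a hypothesis. The node names and class assignments are hypothetical.

```python
from itertools import product

def allowed_edges(node_class, laws):
    """Directed node pairs (i, j) whose classes match some law (class_i, class_j)."""
    return [
        (i, j)
        for i, j in product(node_class, repeat=2)
        if i != j and (node_class[i], node_class[j]) in laws
    ]

def hypothesis_space_size(n_candidate_edges):
    """Each candidate edge is either present or absent."""
    return 2 ** n_candidate_edges

# Hypothetical system: two nodes of class R, two of class D, one of class S.
node_class = {"r1": "R", "r2": "R", "d1": "D", "d2": "D", "s1": "S"}
laws = {("R", "D"), ("D", "S")}  # assumed reading of Laws = {R -> D, D -> S}

candidates = allowed_edges(node_class, laws)
n = len(node_class)
unconstrained_size = hypothesis_space_size(n * (n - 1))  # any directed edge allowed
constrained_size = hypothesis_space_size(len(candidates))
```

For these five nodes the theory leaves 6 candidate links and 64 graphs, against 1,048,576 directed graphs with no constraint. That gap is what lets a few observations pick out a structure.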

  24. Learning with a uniform prior on network structures. [Figure: true network; sample of 75 observations; observed data matrix of patients by attributes (1-12).]

  25. Learning a block-structured prior on network structures (Mansinghka et al. 2006). [Figure: class assignments z over attributes 1-12; class-to-class link probabilities h, values as extracted: 0.8, 0.0, 0.01; 0.0, 0.0, 0.75; 0.0, 0.0, 0.0; true network; sample of 75 observations; observed data matrix of patients by attributes (1-12).]

  26. True structure of graphical model G (nodes 1-16). [Figure: graphs recovered for # of samples: 20, 80, 1000, under two models: (a) Graph G, with edge (G), generating Data D; (b) Abstract Theory, with classes Z, class (z), and class-to-class link probabilities h over c1, c2, ..., generating Graph G, with edge (G), which generates Data D.] (Mansinghka, Kemp, Tenenbaum, Griffiths UAI 06)
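
The generative direction of a block-structured prior (classes first, then edges) can be sketched as follows: each node has a class, and a link i → j appears with a probability that depends only on the two classes. This is a simplified illustration, not the model from the cited paper: the prior over class assignments and any acyclicity constraint are left out, and `sample_graph`, `classes`, and `eta` are hypothetical names.

```python
import random

def sample_graph(classes, eta, rng):
    """Sample directed edges given node classes.

    classes: list giving the class index of each node.
    eta[a][b]: probability of an edge from a class-a node to a class-b node.
    """
    n = len(classes)
    edges = set()
    for i in range(n):
        for j in range(n):
            if i != j and rng.random() < eta[classes[i]][classes[j]]:
                edges.add((i, j))
    return edges

# Hypothetical example: 4 nodes of class 0 followed by 4 nodes of class 1,
# with links allowed only from class 0 to class 1.
classes = [0, 0, 0, 0, 1, 1, 1, 1]
eta = [[0.0, 0.8],
       [0.0, 0.0]]
graph = sample_graph(classes, eta, random.Random(0))
```

Learning runs this process in reverse, inferring the classes and the class-level probabilities jointly with the graph. Once a block pattern is detected, every node pair in the same block shares evidence about whether links of that kind exist.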

  27. Human learning of abstract causal frameworks • Lien & Cheng (2000) • Shanks & Darby (1998) • Tenenbaum & Niyogi (2003) • Schulz, Goodman, Tenenbaum & Jenkins (submitted) • Kemp, Goodman & Tenenbaum (in progress)

  28. The causal blocks world (Tenenbaum and Niyogi, 2003). [Figure: blocks labeled G, F, O, W, C, L, A, C.]

  29. [Figure panels: Learning curves; Model predictions]

  30. Animal learning of abstract causal frameworks? [Figure: blocks labeled G, F, O, W, C, L, A, C; diagram levels: Framework theory, Causal structure, Event data.]

  31. Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention

  32. The problem: A child learns that petting the cat leads to purring, while pounding leads to growling. But what are the origins of these symbolic event concepts (“variables”) over which causal links are defined? • Option 1: Variables are innate. • Option 2 (“clusters then causes”): Variables are learned first, independent of causal relations, through a kind of bottom-up perceptual clustering. • Option 3: Variables are learned together with causal relations.

  33. A hierarchical Bayesian framework for learning grounded causal models (Goodman, Mansinghka & Tenenbaum, CogSci 07). [Figure: hypotheses; data observed at Time t and Time t’.]

  34. “Alien control panel” experiment. [Figure panels: Condition A, Condition B, Condition C]

  35. Mean responses vs. model. Blue bars: human proportion of responses. Red bars: model posterior probability.

  36. Outline • Framework theories or causal schemas: domain-specific constraints on “natural” causal hypotheses • Abstract classes of variables and mechanisms • Causal laws defined over these concepts • Causal variables: constituents of causal hypotheses • Which variables are relevant • How variables ground out in perceptual and motor experience • Causal understanding: domain-general properties of causal models • Directionality • Locality (sparsity, minimality) • Intervention

  37. Domain-general causal understanding. World: [Figure: causal graph over x, y, z, a, b, c.] Possible alternative models: • Causal Bayesian networks (BNs + interventions) • Bayesian networks: minimal structure fitting conditional dependencies • Correlations • Temporally directed associative strengths [Figure: four candidate graphs over the same variables.]

  38. Domain-general causal understanding. An abstract schema for causal learning in any domain. Essentially equivalent to Pearl-style learning for CBNs. [Figure: schema over W and A, applied to System 1, System 2, System 3, ..., System X.]

  39. Some alternatives. [Figure: alternative schemas over W, A, and V.]

  40. Some alternatives. [Figure: further alternative schemas over W and A.]

  41. Can a Bayesian learner infer the correct domain-general properties of causality, using data from multiple systems, while simultaneously learning how each system works? (Goodman & Tenenbaum) [Figure: candidate schemas over W, A, and V; System 1, System 2, ..., System N, each with Sample 1, Sample 2, Sample 3, ...]

  42. Yes.

  43. Summary • What we want to understand: How are different aspects of background knowledge represented, used to support causal learning, and themselves acquired? • Abstract domain-specific frameworks or causal schemas • Causal variables grounded in sensorimotor experience • Domain-general causal understanding • What we need to answer these questions: • Bayesian inference in probabilistic generative models. • Probabilities defined over structured representations: graphs, grammars, predicate logic. • Hierarchical probabilistic models, with inference at multiple levels of abstraction. • Flexible representations, growing in response to observed data.

  44. Insights • Aspects of background knowledge which have been either taken for granted or presumed to be innate could in fact be learned from data by rational inferential means, together with specific causal relations. • Domain-specific frameworks or schemas and domain-general properties of causality could be learned by similar means. • Abstract causal knowledge can in some cases be learned more quickly and more easily than specific concrete causal relations (the “blessing of abstraction”).

  45. Bayesian Occam’s Razor (MacKay, 2003; Ghahramani tutorials). For any model M, Σ_d p(D = d | M) = 1. • Law of “conservation of belief”: A model that can predict many possible data sets must assign each of them low probability. [Figure: p(D = d | M) for models M1 and M2, plotted over all possible data sets d.]
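
The “conservation of belief” argument can be checked numerically with a toy example. The sketch assumes two models over ten possible data sets: M1 spreads its probability over only two of them, and M2 spreads it uniformly over all ten. The data sets, the support of each model, and the equal model priors are made up for illustration.

```python
from fractions import Fraction

def uniform_model(supported, all_datasets):
    """p(D = d | M) for a model that spreads its belief evenly over `supported`."""
    p = Fraction(1, len(supported))
    return {d: (p if d in supported else Fraction(0)) for d in all_datasets}

def model_posterior(evidences, prior):
    """P(M | d) from each model's evidence p(d | M) and prior P(M)."""
    unnorm = {m: prior[m] * evidences[m] for m in evidences}
    z = sum(unnorm.values())
    return {m: w / z for m, w in unnorm.items()}

datasets = range(10)
m1 = uniform_model({4, 5}, datasets)         # simple model: predicts few data sets
m2 = uniform_model(set(datasets), datasets)  # flexible model: predicts all of them

d = 4  # an observed data set that both models can explain
post = model_posterior(
    {"M1": m1[d], "M2": m2[d]},
    {"M1": Fraction(1, 2), "M2": Fraction(1, 2)},
)
```

Both models' probabilities sum to 1 over the ten data sets, so M2 can give d = 4 only 1/10 while M1 gives it 1/2. With equal priors the posterior for M1 is 5/6, even though M2 fits the observation equally well in the sense that it does not rule it out.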

  46. Learning causation from contingencies, e.g., “Does injecting this chemical cause mice to express a certain gene?” Subjects judge the extent to which C causes E (rate on a scale from 0 to 100).
      Contingency table (cell counts a, b, c, d):
                           C present (c+)   C absent (c-)
      E present (e+)             a                c
      E absent (e-)              b                d

  47. Learning more complex structures • Tenenbaum et al., Griffiths & Sobel: detectors with more than two objects and noisy mechanisms • Steyvers et al., Sobel & Kushnir: active learning with interventions (cf. Tong & Koller, Murphy) • Lagnado & Sloman: learning from interventions on continuous dynamical systems

  48. Inferring hidden causes: the “stick ball” machine (Kushnir, Schulz, Gopnik, & Danks, 2003). [Figure: three conditions with event counts as extracted. Common unobserved cause: 4 x, 2 x, 2 x. Independent unobserved causes: 1 x, 2 x, 2 x, 2 x, 2 x. One observed cause: 2 x, 4 x.]
