Bayesian models of human learning and reasoning

Josh Tenenbaum

MIT

Department of Brain and Cognitive Sciences

Computer Science and AI Lab (CSAIL)



Collaborators

Tom Griffiths

Charles Kemp

Noah Goodman

Chris Baker

Amy Perfors

Vikash Mansinghka

Lauren Schmidt

Pat Shafto


The probabilistic revolution in AI

  • Principled and effective solutions for inductive inference from ambiguous data:

    • Vision

    • Robotics

    • Machine learning

    • Expert systems / reasoning

    • Natural language processing

  • Standard view: no necessary connection to how the human brain solves these problems.


Bayesian models of cognition

Visual perception [Weiss, Simoncelli, Adelson, Richards, Freeman, Feldman, Kersten, Knill, Maloney, Olshausen, Jacobs, Pouget, ...]

Language acquisition and processing [Brent, de Marcken, Niyogi, Klein, Manning, Jurafsky, Keller, Levy, Hale, Johnson, Griffiths, Perfors, Tenenbaum, …]

Motor learning and motor control [Ghahramani, Jordan, Wolpert, Kording, Kawato, Doya, Todorov, Shadmehr,…]

Associative learning [Dayan, Daw, Kakade, Courville, Touretzky, Kruschke, …]

Memory [Anderson, Schooler, Shiffrin, Steyvers, Griffiths, McClelland, …]

Attention [Mozer, Huber, Torralba, Oliva, Geisler, Movellan, Yu, Itti, Baldi, …]

Categorization and concept learning [Anderson, Nosofsky, Rehder, Navarro, Griffiths, Feldman, Tenenbaum, Rosseel, Goodman, Kemp, Mansinghka, …]

Reasoning [Chater, Oaksford, Sloman, McKenzie, Heit, Tenenbaum, Kemp, …]

Causal inference [Waldmann, Sloman, Steyvers, Griffiths, Tenenbaum, Yuille, …]

Decision making and theory of mind [Lee, Stankiewicz, Rao, Baker, Goodman, Tenenbaum, …]


Everyday inductive leaps

How can people learn so much about the world from such limited evidence?

  • Learning concepts from examples

[Images of three different horses, each labeled “horse”]


Learning concepts from examples

[Images of three novel objects, each labeled “tufa”]


Everyday inductive leaps

How can people learn so much about the world from such limited evidence?

  • Kinds of objects and their properties

  • The meanings of words, phrases, and sentences

  • Cause-effect relations

  • The beliefs, goals and plans of other people

  • Social structures, conventions, and rules


Modeling Goals

  • Principled quantitative models of human behavior, with broad coverage and a minimum of free parameters and ad hoc assumptions.

  • Explain how and why human learning and reasoning work, in terms of (approximations to) optimal statistical inference in natural environments.

  • A framework for studying people’s implicit knowledge about the structure of the world: how it is structured, used, and acquired.

  • A two-way bridge to state-of-the-art AI and machine learning.


The approach: from statistics to intelligence

  1. How does background knowledge guide learning from sparsely observed data?

     Bayesian inference: $P(h \mid d) \propto P(d \mid h)\,P(h)$.

  2. What form does background knowledge take, across different domains and tasks?

     Probabilities defined over structured representations: graphs, grammars, predicate logic, schemas, theories.

  3. How is background knowledge itself acquired?

     Hierarchical probabilistic models, with inference at multiple levels of abstraction. Flexible nonparametric models in which complexity grows with the data.


Outline

  • Predicting everyday events

  • Learning concepts from examples

  • The big picture


Basics of Bayesian inference

  • Bayes’ rule:

    $$P(h \mid d) = \frac{P(d \mid h)\,P(h)}{\sum_{h' \in H} P(d \mid h')\,P(h')}$$

  • An example

    • Data d: John is coughing

    • Some hypotheses:

      1. John has a cold

      2. John has lung cancer

      3. John has a stomach flu

    • Likelihood P(d|h) favors 1 and 2 over 3

    • Prior probability P(h) favors 1 and 3 over 2

    • Posterior probability P(h|d) favors 1 over 2 and 3
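To make the arithmetic concrete, here is a minimal sketch of this example in Python. The prior and likelihood values are illustrative assumptions, not numbers from the talk:

```python
# Hedged sketch of the coughing example: illustrative numbers only.
hypotheses = ["cold", "lung cancer", "stomach flu"]
prior      = {"cold": 0.50, "lung cancer": 0.01, "stomach flu": 0.49}  # P(h)
likelihood = {"cold": 0.80, "lung cancer": 0.70, "stomach flu": 0.05}  # P(d|h), d = coughing

# Bayes' rule: P(h|d) = P(d|h) P(h) / sum over h' of P(d|h') P(h')
evidence  = sum(likelihood[h] * prior[h] for h in hypotheses)
posterior = {h: likelihood[h] * prior[h] / evidence for h in hypotheses}

for h in hypotheses:
    print(f"P({h} | coughing) = {posterior[h]:.3f}")
# "cold" wins: it is favored by both the likelihood and the prior.
```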


Bayesian inference in perception and sensorimotor integration

(Weiss, Simoncelli & Adelson 2002)

(Kording & Wolpert 2004)


Everyday prediction problems (Griffiths & Tenenbaum, 2006)

  • You read about a movie that has made $60 million to date. How much money will it make in total?

  • You see that something has been baking in the oven for 34 minutes. How long until it’s ready?

  • You meet someone who is 78 years old. How long will they live?

  • Your friend quotes to you from line 17 of his favorite poem. How long is the poem?

  • You meet a US congressman who has served for 11 years. How long will he serve in total?

  • You encounter a phenomenon or event with an unknown extent or duration $t_{\text{total}}$, at a random time or value $t < t_{\text{total}}$. What is the total extent or duration $t_{\text{total}}$?


Bayesian analysis

$$P(t_{\text{total}} \mid t) \;\propto\; P(t \mid t_{\text{total}})\,P(t_{\text{total}})
\;=\; \frac{1}{t_{\text{total}}}\,P(t_{\text{total}})$$

Assume the observation is a random sample: $P(t \mid t_{\text{total}}) = 1/t_{\text{total}}$ for $0 < t < t_{\text{total}}$, and $0$ otherwise.

What form does $P(t_{\text{total}})$ take? E.g., the uninformative (Jeffreys) prior $P(t_{\text{total}}) \propto 1/t_{\text{total}}$.


Bayesian analysis

Combining random sampling with the “uninformative” prior:

$$P(t_{\text{total}} \mid t) \;\propto\; \frac{1}{t_{\text{total}}} \cdot \frac{1}{t_{\text{total}}} \;=\; \frac{1}{t_{\text{total}}^2}$$

[Figure: the posterior probability $P(t_{\text{total}} \mid t)$ as a function of $t_{\text{total}} \ge t$.]

Best guess for $t_{\text{total}}$: the posterior median $t^*$, such that $P(t_{\text{total}} > t^* \mid t) = 0.5$.

This yields Gott’s Rule: guess $t^* = 2t$.
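A quick numerical check of this result (a sketch; the grid and its truncation at 100t are arbitrary assumptions):

```python
import numpy as np

# Posterior under random sampling + Jeffreys prior: P(t_total | t) ∝ 1/t_total^2.
t = 34.0                                  # observed duration so far
grid = np.linspace(t, 100 * t, 200_000)  # support: t_total >= t (truncated)
weights = 1.0 / grid**2
weights /= weights.sum()                  # normalize over the grid

cdf = np.cumsum(weights)
t_star = grid[np.searchsorted(cdf, 0.5)]  # posterior median
print(t_star / t)                         # ≈ 2, i.e. Gott's rule t* = 2t
```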


Evaluating Gott’s Rule

  • You read about a movie that has made $78 million to date. How much money will it make in total?

    • “$156 million” seems reasonable.

  • You meet someone who is 35 years old. How long will they live?

    • “70 years” seems reasonable.

  • Not so simple:

    • You meet someone who is 78 years old. How long will they live?

    • You meet someone who is 6 years old. How long will they live?


Priors $P(t_{\text{total}})$ are based on empirically measured durations or magnitudes for many real-world events in each class.

[Figure: median human judgments of the total duration or magnitude $t_{\text{total}}$ of events in each class, given that they are first observed at a duration or magnitude $t$, versus Bayesian predictions (median of $P(t_{\text{total}} \mid t)$).]
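The effect of the prior’s form can be sketched in a few lines. The two priors below (a power-law prior, as for movie grosses, and a roughly Gaussian prior, as for life spans) use invented parameter values for illustration:

```python
import numpy as np

def posterior_median(t, prior, upper=10_000.0):
    """Median of P(t_total | t) ∝ (1/t_total) * prior(t_total), for t_total >= t."""
    grid = np.linspace(t, upper, 100_000)
    w = prior(grid) / grid                  # random-sampling likelihood times prior
    w /= w.sum()
    return grid[np.searchsorted(np.cumsum(w), 0.5)]

power_law = lambda x: x ** -1.3                            # heavy-tailed (e.g., grosses)
gaussian  = lambda x: np.exp(-0.5 * ((x - 75) / 15) ** 2)  # ~life spans, in years

for t in [10.0, 40.0, 70.0]:
    print(t, posterior_median(t, power_law), posterior_median(t, gaussian))
# Power-law prior: predictions scale multiplicatively with t (Gott-like).
# Gaussian prior: predictions hug the prior mean until t pushes past it.
```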


You learn that in ancient Egypt, there was a great flood in the 11th year of a pharaoh’s reign. How long did he reign?

How long did the typical pharaoh reign in ancient Egypt?


Summary: prediction

  • Predictions about the extent or magnitude of everyday events follow Bayesian principles.

  • Contrast with Bayesian inference in perception, motor control, memory: no “universal priors” here.

  • Predictions depend rationally on priors that are appropriately calibrated for different domains.

    • Form of the prior (e.g., power-law or exponential)

    • Specific distribution given that form (parameters)

    • Non-parametric distribution when necessary.

  • In the absence of concrete experience, priors may be generated by qualitative background knowledge.


Learning concepts from examples

  • Word learning: [three novel objects, each labeled “tufa”]

  • Property induction:

    Cows have T9 hormones.
    Seals have T9 hormones.
    Squirrels have T9 hormones.
    Therefore, all mammals have T9 hormones.

    versus:

    Cows have T9 hormones.
    Sheep have T9 hormones.
    Goats have T9 hormones.
    Therefore, all mammals have T9 hormones.


The computational problem (c.f., semi-supervised learning)

[Figure: a species × features matrix for ten mammals (Horse, Cow, Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Rhino, Elephant), plus a “new property” column whose values are unobserved (?).]

(85 features from Osherson et al. E.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘quadrapedal’,…)


Similarity-based models

[Figure: human judgments of argument strength vs. model predictions, for argument sets such as:]

  Cows have property P. Elephants have property P. Horses have property P. → All mammals have property P.

  Gorillas have property P. Mice have property P. Seals have property P. → All mammals have property P.


Beyond similarity-based induction

  • Reasoning based on dimensional thresholds (Smith et al., 1993):

    Poodles can bite through wire. → German shepherds can bite through wire.

    Dobermans can bite through wire. → German shepherds can bite through wire.

  • Reasoning based on causal relations (Medin et al., 2004; Coley & Shafto, 2003):

    Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.

    Grizzly bears carry E. Spirus bacteria. → Salmon carry E. Spirus bacteria.


Bayesian property induction

Premises X:
  Horses have T9 hormones.
  Rhinos have T9 hormones.
  Cows have T9 hormones.

Query Y: which of the other species (Chimp, Gorilla, Mouse, Squirrel, Dolphin, Seal, Elephant, …) have T9 hormones?

Hypotheses h: candidate extensions of the property over the species, each with prior probability P(h).

Prediction: $P(Y \mid X) = \sum_h P(Y \mid h)\,P(h \mid X)$, averaging over the hypotheses consistent with the premises.
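A minimal sketch of this computation, with an invented hypothesis space and prior (in the talk the prior comes from structured knowledge, e.g., a tree over species):

```python
species = ["horse", "cow", "rhino", "elephant", "gorilla", "mouse"]

# Each hypothesis is a candidate extension of the property, with a prior weight.
hypotheses = [
    ({"horse", "cow", "rhino", "elephant"}, 0.3),  # large herbivores
    (set(species), 0.2),                           # all mammals
    ({"gorilla", "mouse"}, 0.2),                   # a rival grouping
    ({"horse", "cow"}, 0.3),                       # farm animals only
]

def predict(X, y):
    """P(y has the property | all species in X do).
    Weak sampling: P(X | h) = 1 if X is a subset of h, else 0."""
    consistent = [(h, p) for h, p in hypotheses if X <= h]
    total = sum(p for _, p in consistent)
    return sum(p for h, p in consistent if y in h) / total

X = {"horse", "cow", "rhino"}
print(predict(X, "elephant"))  # 1.0: in every hypothesis consistent with X
print(predict(X, "mouse"))     # 0.4: only "all mammals" includes it
```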


Where does the prior come from?

[Figure: the ten species, with candidate hypotheses h and their prior probabilities P(h).]

Why not just enumerate all logically possible hypotheses along with their relative prior probabilities? (With $n$ species there are $2^n$ logically possible extensions, so the prior must come from more structured knowledge.)


Different sources for priors

  Chimps have T9 hormones. → Gorillas have T9 hormones.   (taxonomic similarity)

  Poodles can bite through wire. → Dobermans can bite through wire.   (jaw strength)

  Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.   (food web relations)


Hierarchical Bayesian Framework

  F: form — background knowledge about the kind of structure a domain has, e.g., a tree with species at the leaf nodes.   [P(form)]

  S: structure — a particular structure of that form, e.g., a tree over mouse, squirrel, chimp, gorilla.   [P(structure | form)]

  D: data — observed features F1–F4, plus a partially observed new property (“has T9 hormones”) whose missing values (?) are filled in by property induction.   [P(data | structure)]


P(D|S): How the structure constrains the data of experience

  • Define a stochastic process over structure S that generates hypotheses h.

    • Intuitively, properties should vary smoothly over structure.

[Figure: a smooth property assignment over the tree has high P(h); a non-smooth assignment has low P(h).]

Concretely, a Gaussian process over S (akin to a random walk or diffusion; Zhu, Ghahramani & Lafferty, 2003) generates a continuous value y at each node; thresholding y yields the binary hypothesis h.


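A sketch of that generative process, assuming a simple chain structure and invented parameters (in the spirit of the diffusion covariance of Zhu, Ghahramani & Lafferty, 2003):

```python
import numpy as np

# A chain structure over 10 species: adjacency matrix and graph Laplacian.
n = 10
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0
L = np.diag(A.sum(axis=1)) - A

# Covariance that penalizes variation across edges of S.
K = np.linalg.inv(L + 0.1 * np.eye(n))       # 0.1 regularizer: an assumption

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), K)  # smooth latent values over S
h = (y > 0).astype(int)                      # threshold -> binary hypothesis
print(h)  # neighboring species tend to share the property
```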


Structure S and data D

[Figure: a tree structure S over Species 1–10, alongside the data D: a species × features matrix, later extended with a new, partially observed property (?) to be predicted.]

85 features for 50 animals (Osherson et al.): e.g., for Elephant: ‘gray’, ‘hairless’, ‘toughskin’, ‘big’, ‘bulbous’, ‘longleg’, ‘tail’, ‘chewteeth’, ‘tusks’, ‘smelly’, ‘walks’, ‘slow’, ‘strong’, ‘muscle’, ‘fourlegs’,…


[Figure: model predictions vs. human judgments under a tree-structured prior (“Tree”) and a 2D spatial prior (“2D”), for arguments such as: Cows, Elephants, Horses have property P → All mammals have property P; Gorillas, Mice, Seals have property P → All mammals have property P.]


Testing different priors

[Figure: a spectrum of inductive biases — correct bias; wrong bias; no bias; too strong a bias.]


A connectionist alternative (Rogers and McClelland, 2004)

[Figure: a feed-forward network mapping species to features. Emergent structure: clustering on hidden-unit activation vectors.]


Reasoning about spatially varying properties

The “Native American artifacts” task.


Theories for different property types:

  Property type:     “has T9 hormones”     “can bite through wire”   “carry E. Spirus bacteria”
  Theory structure:  taxonomic tree        directed chain            directed network
                     + diffusion process   + drift process           + noisy transmission

[Figure: the hypotheses each theory generates over classes A–G — property extensions respecting the tree, the chain, and the food-web network, respectively.]


Reasoning with two property types (Shafto, Kemp, Bonawitz, Coley & Tenenbaum)

“Given that X has property P, how likely is it that Y does?”

Species: Herring, Tuna, Mako shark, Sand shark, Dolphin, Human, Kelp.

[Figure: the same seven species organized two ways — a taxonomic tree (used for a biological property) and a food web (used for a disease property).]


Summary so far

  • A framework for modeling human inductive reasoning as rational statistical inference over structured knowledge representations

    • Qualitatively different priors are appropriate for different domains of property induction.

    • In each domain, a prior that matches the world’s structure fits people’s judgments well, and better than alternative priors.

    • A language for representing different theories: graph structure defined over objects + probabilistic model for the distribution of properties over that graph.

  • Remaining question: How can we learn appropriate theories for different domains?


Hierarchical Bayesian Framework

  F: form — chain, tree, or space.

  S: structure — the best structure of each form over mouse, squirrel, chimp, gorilla (a linear order, a tree, or a 2D configuration).

  D: data — the observed feature matrix F1–F4.


Discovering structural forms

[Figure: the same species (Snake, Turtle, Crocodile, Robin, Ostrich, Bat, Orangutan) organized in different candidate forms — a linear order and a tree.]

  • “Great chain of being”: a single linear order running from Rock and Plant up through the animals to Angel and God.

  • Linnaeus: a taxonomic tree over the same species.


People can discover structural forms

  • Scientific discoveries:

    • Tree structure for biological species

    • Periodic structure for chemical elements

  • Children’s cognitive development:

    • Hierarchical structure of category labels

    • Clique structure of social groups

    • Cyclical structure of seasons or days of the week

    • Transitive structure for value

[Figure: the “great chain of being” (1579); Linnaeus’s Systema Naturae (1735): Kingdom Animalia > Phylum Chordata > Class Mammalia > Order Primates > Family Hominidae > Genus Homo > Species Homo sapiens; an early tree diagram of species (1837).]


Typical structure learning algorithms assume a fixed structural form

  Flat clusters:    K-means, mixture models, competitive learning
  Line:             Guttman scaling, ideal point models
  Circle:           circumplex models
  Grid:             self-organizing map, generative topographic mapping
  Tree:             hierarchical clustering, Bayesian phylogenetics
  Euclidean space:  MDS, PCA, factor analysis


The ultimate goal

A “Universal Structure Learner”:

  Data → [K-means | hierarchical clustering | factor analysis | Guttman scaling | circumplex models | self-organizing maps | ···] → Representation


A “universal grammar” for structural forms

[Figure: a grammar of structural forms — each form paired with the simple generative process that grows structures of that form.]


Hierarchical Bayesian Framework

  F: form

  S: structure — P(S | F) favors simplicity.

  D: data — P(D | S) favors smoothness [Zhu et al., 2003].

[Figure: the form → structure → data hierarchy over mouse, squirrel, chimp, gorilla, with features F1–F4.]


Model fitting

  • Evaluate each form in parallel.

  • For each form, heuristic search over structures, based on greedy growth from a one-node seed (see the sketch below):
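A hedged rendering of that search loop, with a stand-in score (within-cluster variance plus a complexity penalty) in place of the model’s actual P(D | S) P(S | F):

```python
import numpy as np

def score(X, nodes, penalty=1.0):
    """Stand-in structure score: fit (negative within-node variance)
    minus a simplicity penalty per node."""
    cost = sum(((X[list(n)] - X[list(n)].mean(axis=0)) ** 2).sum() for n in nodes)
    return -cost - penalty * len(nodes)

def greedy_grow(X, penalty=1.0):
    nodes = [set(range(len(X)))]                 # one-node seed holding everything
    while True:
        best_gain, best_split = 0.0, None
        for i, n in enumerate(nodes):
            if len(n) < 2:
                continue
            members = list(n)
            f = X[members].var(axis=0).argmax()  # highest-variance feature
            members.sort(key=lambda j: X[j, f])
            for k in range(1, len(members)):     # try every cut point
                cand = nodes[:i] + [set(members[:k]), set(members[k:])] + nodes[i+1:]
                gain = score(X, cand, penalty) - score(X, nodes, penalty)
                if gain > best_gain:
                    best_gain, best_split = gain, cand
        if best_split is None:
            return nodes                         # no split improves the score
        nodes = best_split

X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 4.9], [9.0, 0.0]])
print(greedy_grow(X))                            # e.g. [{0, 1}, {2, 3}, {4}]
```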


Structural forms from relational data

  Form:      dominance hierarchy   tree                  cliques          ring
  Domain:    primate troop         Bush administration   prison inmates   Kula islands
  Relation:  “x beats y”           “x told y”            “x likes y”      “x trades with y”


Beyond “Nativism” versus “Empiricism”

  • “Nativism”: Explicit knowledge of structural forms for core domains is innate.

    • Atran (1998): The tendency to group living kinds into hierarchies reflects an “innately determined cognitive structure”.

    • Chomsky (1980): “The belief that various systems of mind are organized along quite different principles leads to the natural conclusion that these systems are intrinsically determined, not simply the result of common mechanisms of learning or growth.”

  • “Empiricism”: General-purpose learning systems without explicit knowledge of structural form.

    • Connectionist networks (e.g., Rogers and McClelland, 2004).

    • Traditional structure learning in probabilistic graphical models.


Summary: learning from examples

Bayesian inference over hierarchies of structured representations provides a framework to understand core questions of human learning:

  • What is the content and form of human knowledge, at multiple levels of abstraction?

  • How does abstract domain knowledge guide learning of new concepts?

  • How is abstract domain knowledge learned? What must be built in?

  • How can domain-general learning mechanisms acquire domain-specific representations? How can probabilistic inference work together with symbolic, flexibly structured representations?

[Figure: the F (form) → S (structure) → D (data) hierarchy over mouse, squirrel, chimp, gorilla, with features F1–F4.]


Learning word meanings (Xu & Tenenbaum; Schmidt & Tenenbaum)

Bayesian inference over a tree-structured hypothesis space of candidate word extensions.

[Figure: three example objects, each labeled “tufa”, embedded in a taxonomy of objects.]
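A sketch of the key inferential effect here (suspicious coincidences from multiple examples), using the standard strong-sampling “size principle” and an invented three-level hypothesis space:

```python
# Candidate extensions of "tufa", with illustrative sizes and a flat prior.
hypotheses = {"subordinate": 8, "basic": 40, "superordinate": 200}
prior = {h: 1 / 3 for h in hypotheses}

def posterior(n_examples):
    """P(h | n examples consistent with all three levels).
    Strong sampling: P(example | h) = 1 / |h|."""
    w = {h: prior[h] * (1.0 / size) ** n_examples
         for h, size in hypotheses.items()}
    z = sum(w.values())
    return {h: round(v / z, 3) for h, v in w.items()}

print(posterior(1))  # one "tufa": subordinate favored, but alternatives remain
print(posterior(3))  # three "tufas": the narrowest hypothesis dominates sharply
```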


Learning word meanings

A hierarchy of principles → structure → data:

  Principles: shape bias, taxonomic principle, contrast principle, basic-level bias.

  Data: representative examples.


Learning causal relations (Griffiths, Tenenbaum, Kemp et al.)

A hierarchy of abstract principles → structure → data.


Causal learning with prior knowledge (Griffiths, Sobel, Tenenbaum & Gopnik)

The “backwards blocking” paradigm: an initial phase, then an AB trial (A and B together produce the effect), then an A trial (A alone produces the effect).



Learning causal relations

  Structure: a causal graph over conditions.

  Data: has(patient, condition), a patients × conditions matrix.

Abstract causal theories

  Classes = {C};        Laws = {C → C}

  Classes = {R, D, S};  Laws = {R → D, D → S}

  Classes = {R, D, S};  Laws = {S → D}


Learning causal relations

  Abstract principles: Classes = {R, D, S}; Laws = {R → D, D → S}

    R (root causes): working in factory, smoking, stress, high fat diet, …

    D (diseases): flu, bronchitis, lung cancer, heart disease, …

    S (symptoms): headache, fever, coughing, chest pain, …

  Structure: a causal graph over conditions, constrained by the abstract theory.

  Data: has(patient, condition), over patients × conditions.


(Mansinghka, Kemp, Tenenbaum & Griffiths, UAI 2006)

[Figure: the true graphical model G over 16 variables; graphs learned from 20, 80, and 1000 samples. A flat model learns the edges of G directly from data D; the hierarchical model also infers classes Z over the variables and an abstract theory of class-level edge probabilities (here one class pair has edge probability 0.4 and the rest 0.0), which constrains graph learning.]
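A sketch of how such a theory constrains graph learning: class assignments plus class-level edge probabilities define a structured prior over edges (in the spirit of the model above; the class assignment and probabilities below are illustrative):

```python
import numpy as np

classes = np.array([0] * 6 + [1] * 10)    # class of each of the 16 variables
eta = np.array([[0.4, 0.0],               # eta[a, b] = P(edge i -> j)
                [0.0, 0.0]])              # when class(i) = a, class(j) = b

# Prior probability of every possible edge under the abstract theory:
prior_edge = eta[classes[:, None], classes[None, :]]

rng = np.random.default_rng(0)
G = rng.random((16, 16)) < prior_edge     # sample a graph from the theory
np.fill_diagonal(G, False)
print(G.sum(), "edges, all among class-0 variables:",
      G[classes == 1].sum() == 0 and G[:, classes == 1].sum() == 0)
```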


“Universal Grammar”

Hierarchical phrase structure grammars (e.g., CFG, HPSG, TAG):

  Grammar          ~ P(grammar | UG)
  Phrase structure ~ P(phrase structure | grammar)
  Utterance        ~ P(utterance | phrase structure)
  Speech signal    ~ P(speech | utterance)

(Jurafsky; Levy & Jaeger; Klein & Manning; Perfors et al., ….)
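One level of this hierarchy, P(utterance | grammar), can be sketched by sampling from a toy probabilistic context-free grammar (the grammar itself is invented for illustration):

```python
import random

pcfg = {  # nonterminal -> list of (expansion, probability)
    "S":  [(["NP", "VP"], 1.0)],
    "NP": [(["the", "N"], 1.0)],
    "VP": [(["V", "NP"], 0.5), (["V"], 0.5)],
    "N":  [(["dog"], 0.5), (["cat"], 0.5)],
    "V":  [(["sees"], 0.5), (["sleeps"], 0.5)],
}

def sample(symbol):
    if symbol not in pcfg:                       # terminal symbol
        return [symbol]
    rules, probs = zip(*pcfg[symbol])
    rule = random.choices(rules, weights=probs)[0]
    return [tok for s in rule for tok in sample(s)]

print(" ".join(sample("S")))  # e.g. "the cat sees the dog"
```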


Vision as probabilistic parsing

“Analysis by synthesis” (Han & Zhu, 2006).


Goal-directed action (production and comprehension)

(Wolpert et al., 2003)


Understanding goal-directed actions

  • Heider and Simmel

  • Csibra & Gergely

Principle of rationality: an intentional agent plans actions to achieve its goals most efficiently given its environmental constraints. (Constraints + goals → actions.)


Goal inference as inverse probabilistic planning (Baker, Tenenbaum & Saxe)

Forward model: constraints + goals → rational planning ((PO)MDP) → actions. Inverting this model by Bayes’ rule yields inferences about goals from observed actions.

[Figure: model predictions vs. human judgments.]
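A minimal inverse-planning sketch in this spirit: assume the agent picks actions by a softmax over one-step progress toward its goal, then invert with Bayes’ rule. The grid world, goals, and rationality parameter are all invented for illustration:

```python
import numpy as np

goals = {"A": np.array([4.0, 0.0]), "B": np.array([4.0, 4.0])}
moves = {"up": np.array([0.0, 1.0]), "down": np.array([0.0, -1.0]),
         "left": np.array([-1.0, 0.0]), "right": np.array([1.0, 0.0])}
beta = 2.0                                    # degree of rationality (assumed)

def action_prob(state, action, goal):
    """P(action | state, goal): softmax over one-step progress to the goal."""
    logits = np.array([-beta * np.linalg.norm(state + m - goals[goal])
                       for m in moves.values()])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return probs[list(moves).index(action)]

def infer_goal(trajectory, prior={"A": 0.5, "B": 0.5}):
    post = dict(prior)
    for state, action in trajectory:          # multiply in each observed step
        for g in post:
            post[g] *= action_prob(np.array(state), action, g)
    z = sum(post.values())
    return {g: round(p / z, 3) for g, p in post.items()}

# Starting at (0, 0), the agent moves right, then up: headed to A or B?
print(infer_goal([((0.0, 0.0), "right"), ((1.0, 0.0), "up")]))  # favors B
```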


The big picture

  • What we need to understand: the mind’s ability to build rich models of the world from sparse data.

    • Learning about objects, categories, and their properties.

    • Causal inference

    • Language comprehension and production

    • Scene understanding

    • Understanding other people’s actions, plans, thoughts, goals

  • What do we need to understand these abilities?

    • Bayesian inference in probabilistic generative models

    • Hierarchical models, with inference at all levels of abstraction

    • Structured representations: graphs, grammars, logic

    • Flexible representations, growing in response to observed data


Open directions and challenges

  • Effective methods for learning structured knowledge

    • How to balance expressiveness and learnability? … flexibility and constraint?

  • More precise relation to psychological processes

    • To what extent do mental processes implement boundedly rational methods of approximate inference?

  • Relation to neural computation

    • How to implement structured representations in brains?

  • Understanding failure cases

    • Are these simply “not Bayesian”, or are people using a different model? How do we avoid circularity?


The “standard model” of learning in neuroscience

[Figure: supervised and unsupervised learning architectures.]


Learning grounded causal models (Goodman, Mansinghka & Tenenbaum)

A child learns that petting the cat leads to purring, while pounding leads to growling. But how are the symbolic event concepts over which causal links are defined learned in the first place?

[Figure: raw event streams over primitives a, b, c, to be segmented into the event concepts that enter into causal relations.]


The chicken-and-egg problem of structure learning and feature selection

A raw data matrix:

[Figure: an unsorted objects × features matrix.]


The chicken-and-egg problem of structure learning and feature selection

Conventional clustering (CRP mixture):

[Figure: the same matrix with objects grouped into a single system of clusters over all features.]


Learning multiple structures to explain different feature subsets (Shafto, Kemp, Mansinghka, Gordon & Tenenbaum, 2006)

CrossCat:

[Figure: three systems of categories (System 1, System 2, System 3), each clustering the objects with respect to a different subset of features.]


The “nonparametric safety net”

[Figure: a true graphical model G over 12 variables arranged in a ring; graphs learned from 40, 100, and 1000 samples by a flat model (edge priors on G directly) and by the hierarchical model with an abstract theory Z over variable classes.]


Bayesian prediction

$$P(t_{\text{total}} \mid t_{\text{past}}) \;\propto\; \frac{1}{t_{\text{total}}}\,P(t_{\text{total}})$$

(random sampling × a domain-dependent prior)

What is the best guess for $t_{\text{total}}$? Compute $t^*$ such that $P(t_{\text{total}} > t^* \mid t_{\text{past}}) = 0.5$.

We compared the median of the Bayesian posterior with the median of subjects’ judgments… but what about the distribution of subjects’ judgments?


Sources of individual differences

  • Individuals’ judgments could be noisy.

  • Individuals’ judgments could be optimal, but with different priors.

    • e.g., each individual has seen only a sparse sample of the relevant population of events.

  • Individuals’ inferences about the posterior could be optimal, but their judgments could be based on probability (or utility) matching rather than maximizing (see the sketch below).
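The probability-matching idea can be sketched directly: simulate subjects who each report a sample from the posterior rather than its median, and look at the quantiles of their judgments (the prior and parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
t = 30.0
grid = np.linspace(t, 50 * t, 100_000)
w = 1.0 / grid**2                 # posterior under the Jeffreys prior
w /= w.sum()

judgments = rng.choice(grid, size=1000, p=w)      # one sample per "subject"
print(np.quantile(judgments, [0.25, 0.5, 0.75]))  # spread traces the posterior
# Probability matching predicts the full posterior shape in the response
# distribution; maximizing predicts everyone reports roughly 2t.
```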


Individual differences in prediction

[Figure: proportion of judgments below the predicted value vs. quantile of the Bayesian posterior distribution $P(t_{\text{total}} \mid t_{\text{past}})$.]


Individual differences in prediction

[Figure: the same quantile-quantile analysis averaged over all prediction tasks: movie run times, movie grosses, poem lengths, life spans, terms in congress, cake baking times.]