Part III: Hierarchical Bayesian Models

Universal Grammar → Grammar (a hierarchical phrase structure grammar, e.g., CFG, HPSG, TAG) → Phrase structure → Utterance → Speech signal

Vision

(Han and Zhu, 2006)

Word learning

Principles (whole-object principle, shape bias, taxonomic principle, contrast principle, basic-level bias) → Structure → Data

Hierarchical Bayesian models
  • Can represent and reason about knowledge at multiple levels of abstraction.
  • Have been used by statisticians for many years.
  • Have been applied to many cognitive problems:
    • causal reasoning (Mansinghka et al., 2006)
    • language (Chater & Manning, 2006)
    • vision (Fei-Fei, Fergus & Perona, 2003)
    • word learning (Kemp, Perfors & Tenenbaum, 2006)
    • decision making (Lee, 2006)
Outline
  • A high-level view of HBMs
  • A case study
    • Semantic knowledge
Universal Grammar
 ↓ P(grammar | UG)
Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG)
 ↓ P(phrase structure | grammar)
Phrase structure
 ↓ P(utterance | phrase structure)
Utterance
 ↓ P(speech | utterance)
Speech signal

Hierarchical Bayesian model

U (Universal Grammar)
 ↓ P(G|U)
G (Grammar)
 ↓ P(s|G)
s1 … s6 (Phrase structures)
 ↓ P(u|s)
u1 … u6 (Utterances)

Hierarchical Bayesian model

U (Universal Grammar)
 ↓ P(G|U)
G (Grammar)
 ↓ P(s|G)
s1 … s6 (Phrase structures)
 ↓ P(u|s)
u1 … u6 (Utterances)

A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:

P({ui}, {si}, G | U) = P({ui} | {si}) P({si} | G) P(G | U)
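The factorization can be written out directly: one conditional factor per level, multiplied together (summed in log space). A minimal numeric sketch, where every probability below is a made-up toy value chosen only to illustrate the structure of the product:

```python
import math

# Hypothetical toy values for the conditional probabilities at each level.
log_p_G_given_U = math.log(0.1)                              # P(G | U)
log_p_s_given_G = [math.log(p) for p in (0.3, 0.2, 0.5)]     # P(s_i | G)
log_p_u_given_s = [math.log(p) for p in (0.8, 0.6, 0.9)]     # P(u_i | s_i)

# P({u_i}, {s_i}, G | U) = P({u_i} | {s_i}) P({s_i} | G) P(G | U)
log_joint = log_p_G_given_U + sum(log_p_s_given_G) + sum(log_p_u_given_s)
print(round(math.exp(log_joint), 6))   # 0.001296
```

Working in log space is the usual practice here, since products of many small conditional probabilities underflow quickly.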

Knowledge at multiple levels
  • Top-down inferences:
    • How does abstract knowledge guide inferences at lower levels?
  • Bottom-up inferences:
    • How can abstract knowledge be acquired?
  • Simultaneous learning at multiple levels of abstraction
Top-down inferences

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Given grammar G and a collection of utterances, construct a phrase structure for each utterance.

Top-down inferences

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Infer {si} given {ui} and G:

P({si} | {ui}, G) ∝ P({ui} | {si}) P({si} | G)

Bottom-up inferences

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Given a collection of phrase structures, learn a grammar G.

Bottom-up inferences

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Infer G given {si} and U:

P(G | {si}, U) ∝ P({si} | G) P(G | U)

Simultaneous learning at multiple levels

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Given a set of utterances {ui} and innate knowledge U, construct a grammar G and a phrase structure for each utterance.

Simultaneous learning at multiple levels

[Hierarchy: U → G → s1 … s6 → u1 … u6]

  • A chicken-or-egg problem:
    • Given a grammar, phrase structures can be constructed
    • Given a set of phrase structures, a grammar can be learned
Simultaneous learning at multiple levels

[Hierarchy: U → G → s1 … s6 → u1 … u6]

Infer G and {si} given {ui} and U:

P(G, {si} | {ui}, U) ∝ P({ui} | {si}) P({si} | G) P(G | U)

Hierarchical Bayesian model

[Hierarchy: U → G → s1 … s6 → u1 … u6, with conditional distributions P(G|U), P(s|G), P(u|s)]

Knowledge at multiple levels
  • Top-down inferences:
    • How does abstract knowledge guide inferences at lower levels?
  • Bottom-up inferences:
    • How can abstract knowledge be acquired?
  • Simultaneous learning at multiple levels of abstraction
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
Folk Biology

The relationships between living kinds are well described by tree-structured representations.

R (principles)
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data): “Gorillas have hands”

Folk Biology

R (principles): structural form = tree
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
    • Property induction
    • Learning structured representations
    • Learning the abstract organizing principles of a domain
Property induction

R (principles): structural form = tree
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Property Induction

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Approach: work with the distribution P(D|S,R)

Property Induction

Horses have T4 cells. Elephants have T4 cells. → All mammals have T4 cells.
Horses have T4 cells. Seals have T4 cells. → All mammals have T4 cells.

Previous approaches: Rips (1975); Osherson et al. (1990); Sloman (1993); Heit (1998)

Choosing a prior

Horses have T4 cells. Elephants have T4 cells. Cows have T4 cells.  (premises D; conclusion C)

  • Chimps have T4 cells. → Gorillas have T4 cells. (taxonomic similarity)
  • Poodles can bite through wire. → Dobermans can bite through wire. (jaw strength)
  • Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria. (food web relations)
Bayesian Property Induction
  • A challenge:
    • We have to specify the prior, which typically includes many numbers
  • An opportunity:
    • The prior can capture knowledge about the problem.
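To make the role of the prior concrete, here is a minimal sketch of Bayesian property induction under the simplest possible prior: a uniform distribution over all candidate extensions of the novel property. The species names and the uniform prior are illustrative assumptions; the point of the models that follow is precisely that a structured prior should replace it.

```python
from itertools import combinations

species = ["horse", "elephant", "cow", "seal"]

# Hypothesis space: every subset of species is a candidate extension of the
# novel property. Uniform prior (an assumption for illustration only; a
# tree-based prior would weight these subsets very differently).
hypotheses = [frozenset(c)
              for r in range(len(species) + 1)
              for c in combinations(species, r)]
prior = {h: 1.0 / len(hypotheses) for h in hypotheses}

observed = {"horse", "elephant"}   # D: species observed to have the property

# Condition on D: keep only hypotheses that contain all observed species.
post = {h: (prior[h] if observed <= h else 0.0) for h in hypotheses}
z = sum(post.values())
post = {h: p / z for h, p in post.items()}

# P(cows have the property | D) = posterior mass on hypotheses with "cow".
p_cow = sum(p for h, p in post.items() if "cow" in h)
print(p_cow)   # 0.5
```

Under the uniform prior the answer is 0.5 no matter which species are observed, which illustrates the opportunity on this slide: capturing taxonomic similarity, jaw strength, or food-web knowledge requires building that structure into the prior.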
Property Induction

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Biological properties
  • Structure:
    • Living kinds are organized into a tree
  • Stochastic process:
    • Nearby species in the tree tend to share properties
Stochastic Process
  • Nearby species in the tree tend to share properties.
  • In other words, properties tend to be smooth over the tree.

[Figure: a property that is smooth over the tree vs. one that is not]

Generating a property

Draw y, where y tends to be smooth over the tree, then threshold it to obtain a binary property h.

The diffusion process

h_i = θ(y_i), where θ(y_i) is 1 if y_i ≥ 0 and 0 otherwise; the covariance K encourages y to be smooth over the graph S.

p(y|S,R): Generating a property

Let y_i be the feature value at node i; values at neighboring nodes i and j are encouraged to be similar.

(Zhu, Lafferty & Ghahramani, 2003)
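A sketch of this generative process on a small toy graph, building the smoothness-favoring covariance from the graph Laplacian. The edge list and the sigma value are illustrative assumptions; the real model uses a tree with internal nodes.

```python
import numpy as np

rng = np.random.default_rng(0)

names = ["mouse", "squirrel", "chimp", "gorilla"]
edges = [(0, 1), (1, 2), (2, 3)]   # assumed toy graph for illustration

# Graph Laplacian L; the precision matrix L + I/sigma^2 defines a proper
# Gaussian whose covariance K favors values that vary smoothly over the
# graph (cf. Zhu, Lafferty & Ghahramani, 2003).
n = len(names)
L = np.zeros((n, n))
for i, j in edges:
    L[i, i] += 1.0; L[j, j] += 1.0
    L[i, j] -= 1.0; L[j, i] -= 1.0
sigma2 = 4.0
K = np.linalg.inv(L + np.eye(n) / sigma2)

y = rng.multivariate_normal(np.zeros(n), K)   # smooth latent values
f = (y >= 0).astype(int)                      # threshold: f_i = theta(y_i)
print(dict(zip(names, f.tolist())))
```

Because neighboring nodes covary strongly under K, sampled properties tend to hold on connected regions of the graph rather than on arbitrary subsets.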

Biological properties

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Approach: work with the distribution P(D|S,R)

Horses have T4 cells. Elephants have T4 cells. Cows have T4 cells.  (premises D; conclusion C)

Results

Cows have property P. Elephants have property P. → Horses have property P.
Dolphins have property P. Seals have property P. → Horses have property P.

[Figure: model predictions vs. human judgments] (Osherson et al.)

Results

Gorillas have property P. Mice have property P. Seals have property P. → All mammals have property P.
Cows have property P. Elephants have property P. Horses have property P. → All mammals have property P.

[Figure: model predictions vs. human judgments]

Spatial model

R (principles): structural form = 2D space; stochastic process = diffusion
S (structure): [2D layout of squirrel, mouse, gorilla, chimp]
D (data)

Tree vs. 2D

[Figure: tree + diffusion vs. 2D + diffusion model predictions for the “horse” and “all mammals” arguments]

Biological Properties

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Three inductive contexts

  • “has T4 cells”: tree + diffusion process
  • “can bite through wire”: chain + drift process
  • “carries E. Spirus bacteria”: network + causal transmission

[Figure: for each context, the principles R and a structure S over classes A–G]

Threshold properties

  • “can bite through wire”
  • “has skin that is more resistant to penetration than most synthetic fibers”

[Figure: Doberman, Poodle, Collie, Hippo, Elephant, Cat, Lion, Camel arranged along a single dimension]

(Osherson et al.; Blok et al.)

Threshold properties
  • Structure:
    • The categories can be organized along a single dimension
  • Stochastic process:
    • Categories towards one end of the dimension are more likely to have the novel property
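A sketch of a drift process of this kind, with assumed positions along the dimension and a logistic curve standing in for "more likely toward one end". Both the positions and the logistic form are illustrative choices, not the slides' exact model:

```python
import math

# Assumed positions of categories along a single strength dimension.
position = {"Poodle": 0.2, "Collie": 0.4, "Doberman": 0.7, "Hippo": 0.9}

def p_property(x, threshold=0.5, slope=10.0):
    """P(category has the novel property): increases monotonically along
    the dimension. The logistic form and parameters are assumptions."""
    return 1.0 / (1.0 + math.exp(-slope * (x - threshold)))

probs = {c: p_property(x) for c, x in position.items()}
print({c: round(p, 3) for c, p in probs.items()})
```

The key contrast with diffusion is directionality: a drift process makes the property monotonically more likely toward one end of the dimension, rather than merely smooth over nearby categories.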
Results

“has skin that is more resistant to penetration than most synthetic fibers”

[Figure: human judgments vs. 1D + drift and 1D + diffusion models] (Blok et al.; Smith et al.)

Three inductive contexts

  • “has T4 cells”: tree + diffusion process
  • “can bite through wire”: chain + drift process
  • “carries E. Spirus bacteria”: network + causal transmission

[Figure: for each context, the principles R and a structure S over classes A–G]

Causally transmitted properties

Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.

[Figure: food web linking salmon and grizzly bear] (Medin et al.; Shafto and Coley)

Causally transmitted properties
  • Structure:
    • The categories can be organized into a directed network
  • Stochastic process:
    • Properties are generated by a noisy transmission process
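A sketch of such a noisy transmission process on an assumed two-species food web. The edge direction, transmission probability, and trial count are all illustrative assumptions:

```python
import random

random.seed(1)

# Assumed toy food web: an edge prey -> predator means the predator can
# catch the disease from infected prey.
food_web = {"salmon": ["grizzly bear"], "grizzly bear": []}

def simulate(source, web, p_transmit=0.7, trials=10000):
    """Monte Carlo estimate of P(species carries the disease | source)."""
    counts = {s: 0 for s in web}
    for _ in range(trials):
        infected, frontier = {source}, [source]
        while frontier:
            prey = frontier.pop()
            for predator in web[prey]:
                if predator not in infected and random.random() < p_transmit:
                    infected.add(predator)
                    frontier.append(predator)
        for s in infected:
            counts[s] += 1
    return {s: c / trials for s, c in counts.items()}

result = simulate("salmon", food_web)
print(result)   # salmon: 1.0; grizzly bear: roughly 0.7
```

Unlike diffusion, this process is asymmetric: infection flows along the direction of the edges, so observing the disease in prey raises the probability for predators more than the reverse.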
Experiment: disease properties

[Figure: two food webs, “Island” and “Mammals”] (Shafto et al.)

Results: disease properties

[Figure: human judgments vs. web + transmission model, for the Island and Mammals webs]

Three inductive contexts

  • “has T4 cells”: tree + diffusion process
  • “can bite through wire”: chain + drift process
  • “carries E. Spirus bacteria”: network + causal transmission

[Figure: for each context, the principles R and a structure S over classes A–G]

Property Induction

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Approach: work with the distribution P(D|S,R)

Conclusions : property induction
  • Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
    • Property induction
    • Learning structured representations
    • Learning the abstract organizing principles of a domain
Structure learning

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Structure learning

R (principles): structural form = tree; stochastic process = diffusion
S (structure): ?
D (data)

Goal: find the S that maximizes P(S|D,R)

Structure learning

R (principles): structural form = tree; stochastic process = diffusion
S (structure): ?
D (data)

Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)

Structure learning

R (principles): structural form = tree; stochastic process = diffusion
S (structure): ?
D (data)

Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R), where P(D|S,R) is the distribution previously used for property induction.
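A sketch of how the two factors trade off when scoring candidate structures. The Gaussian likelihood follows the smoothness-favoring diffusion covariance from earlier, and a log(θ)-per-node penalty stands in for P(S|R); the data, graphs, and parameter values below are assumptions for illustration:

```python
import numpy as np

def covariance(n, edges, sigma2=4.0):
    """Smoothness-favoring covariance over a graph (diffusion process)."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    return np.linalg.inv(L + np.eye(n) / sigma2)

def log_score(D, edges, theta=0.5):
    """log P(S|D,R) up to a constant: log P(D|S,R) + |S| log(theta)."""
    n = D.shape[0]
    K = covariance(n, edges)
    Kinv = np.linalg.inv(K)
    sign, logdet = np.linalg.slogdet(K)
    ll = sum(-0.5 * (f @ Kinv @ f) - 0.5 * logdet - 0.5 * n * np.log(2 * np.pi)
             for f in D.T)                  # one Gaussian term per feature
    return ll + n * np.log(theta)           # node-count penalty for P(S|R)

# Two features that vary smoothly along the order 0-1-2-3.
D = np.array([[1.0, 0.9], [0.9, 0.8], [-0.8, -1.0], [-1.0, -0.9]])
ordered   = [(0, 1), (1, 2), (2, 3)]
scrambled = [(0, 2), (2, 1), (1, 3)]
print(log_score(D, ordered) > log_score(D, scrambled))   # True
```

The chain that respects the feature ordering wins because the features vary smoothly over it; both candidates have the same node count, so the complexity penalty cancels and the likelihood decides.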

Generating features over the tree

[Figure: features generated by diffusion over the tree of mouse, squirrel, chimp, gorilla]

Structure learning

R (principles): structural form = tree; stochastic process = diffusion
S (structure): ?
D (data)

Goal: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)

P(S|R): Generating structures

[Figure: a structure over mouse, squirrel, chimp, gorilla that is consistent with R, and one that is inconsistent with R]

P(S|R): Generating structures

[Figure: a simple structure vs. a complex structure over the same animals]

P(S|R): Generating structures

  • Each structure is weighted by the number of nodes it contains:

P(S|R) ∝ 0 if S is inconsistent with R, and θ^|S| otherwise, where |S| is the number of nodes in S and 0 < θ < 1.

Structure Learning

R (principles)
S (structure)
D (data)

P(S|D,R) will be high when:

  • The features in D vary smoothly over S
  • S is a simple graph (a graph with few nodes)

Aim: find the S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)

Structure learning example

  • Participants rated the goodness of 85 features for 48 animals (Osherson et al.)
  • E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group

Biological Data

[Figure: animals × features data matrix]

Spatial model

R (principles): structural form = 2D space; stochastic process = diffusion
S (structure): [2D layout of squirrel, mouse, gorilla, chimp]
D (data)

Conclusions: structure learning
  • Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
    • Property induction
    • Learning structured representations
    • Learning the abstract organizing principles of a domain
Learning structural form

R (principles): structural form = tree; stochastic process = diffusion
S (structure): [tree over mouse, squirrel, chimp, gorilla]
D (data)

Which form is best?

[Figure: the same animals (snake, turtle, crocodile, robin, ostrich, bat, orangutan) organized in two different candidate structures]

Structural forms

  • Order
  • Partition
  • Chain
  • Ring
  • Hierarchy
  • Tree
  • Grid
  • Cylinder

Learning structural form

R (principles): structural form = F (could be a tree, 2D space, ring, …); stochastic process = diffusion
S (structure): ?
D (data)

Goal: find the S and F that maximize P(S,F|D)

Learning structural form

R (principles): structural form = F; stochastic process = diffusion
S (structure): ?
D (data)

Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(F) is a uniform distribution on the set of forms.

Learning structural form

R (principles): structural form = F; stochastic process = diffusion
S (structure): ?
D (data)

Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(D|S) is the distribution used for property induction.

Learning structural form

R (principles): structural form = F; stochastic process = diffusion
S (structure): ?
D (data)

Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(S|F) is the distribution used for structure learning.
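A sketch of form comparison on these terms: for each form, search over consistent structures, score each with the smooth-feature likelihood, and add the form prior (uniform here, so constant). Everything concrete below, from the two candidate forms to the scoring details and the toy feature, is an assumption for illustration:

```python
import numpy as np
from itertools import permutations

def log_lik(D, edges, sigma2=4.0):
    """log P(D|S): Gaussian favoring features that are smooth over S."""
    n = D.shape[0]
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0
        L[i, j] -= 1.0; L[j, i] -= 1.0
    prec = L + np.eye(n) / sigma2          # precision = inverse covariance
    sign, logdet_prec = np.linalg.slogdet(prec)
    return sum(0.5 * logdet_prec - 0.5 * (f @ prec @ f)
               - 0.5 * n * np.log(2 * np.pi) for f in D.T)

def structures(form, n):
    """All structures consistent with a form (over node relabelings)."""
    for p in permutations(range(n)):
        if form == "chain":
            yield [(p[k], p[k + 1]) for k in range(n - 1)]
        elif form == "ring":
            yield [(p[k], p[(k + 1) % n]) for k in range(n)]

def form_score(D, form, theta=0.5):
    n = D.shape[0]
    best = max(log_lik(D, S) for S in structures(form, n))
    return best + n * np.log(theta)        # node penalty; P(F) uniform

D = np.array([[1.0], [0.5], [-0.5], [-1.0]])   # one feature, smooth on a line
print(form_score(D, "chain") > form_score(D, "ring"))   # True for this data
```

For this linearly varying feature the chain beats the ring, since closing the cycle forces one large jump between the endpoints. Both candidates have four nodes, so the θ penalty cancels here; in the full model the P(S|F) normalization also favors forms that generate fewer structures.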

P(S|F): Generating structures from forms

  • Each structure is weighted by the number of nodes it contains:

P(S|F) ∝ 0 if S is inconsistent with F, and θ^|S| otherwise, where |S| is the number of nodes in S and 0 < θ < 1.

P(S|F): Generating structures from forms

  • Simpler forms are preferred: P(S|F) is spread over all possible graph structures S, so a form that generates fewer structures (e.g., a chain) assigns each of them higher probability than a form that generates many (e.g., a grid).

[Figure: P(S|F) for chain vs. grid over all possible graph structures S]

Learning structural form

F (form): ?
S (structure): ?
D (data)

Goal: find the S and F that maximize P(S,F|D)

Learning structural form

F (form)
S (structure)
D (data)

P(S,F|D) will be high when:

  • The features in D vary smoothly over S
  • S is a simple graph (a graph with few nodes)
  • F is a simple form (a form that can generate only a few structures)

Aim: find the S and F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F)

Form learning: Biological Data

  • 33 animals, 110 features

[Figure: animals × features data matrix]

Supreme Court (Spaeth)
  • Votes on 1600 cases (1987-2005)
Outline
  • A high-level view of HBMs
  • A case study: Semantic knowledge
    • Property induction
    • Learning structured representations
    • Learning the abstract organizing principles of a domain
[Recap figures: the stochastic process (diffusion) and the structural form (tree), illustrated on the tree over mouse, squirrel, chimp, gorilla]

Where do structural forms come from?

  • Order, Partition, Chain, Ring, Hierarchy, Tree, Grid, Cylinder

Node-replacement graph grammars

[Figure: the production for the chain form, and a derivation that grows a chain by repeatedly applying it]

When can we stop adding levels?
  • When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.
Conclusions
  • Hierarchical Bayesian models provide a unified framework which can
    • Explain how abstract knowledge is used for induction
    • Explain how abstract knowledge can be acquired
Learning abstract knowledge

Applications of hierarchical Bayesian models at this conference:

  • Semantic knowledge: Schmidt et al.
    • Learning the M-constraint
  • Syntax: Perfors et al.
    • Learning that language is hierarchically organized
  • Word learning: Kemp et al.
    • Learning the shape bias