Part III: Hierarchical Bayesian Models


[Diagram: Universal Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG) → Grammar → Phrase structure → Utterance → Speech signal]


Vision

(Han and Zhu, 2006)


Word learning

[Diagram: Principles → Structure → Data. Principles include the whole-object principle, shape bias, taxonomic principle, contrast principle, and basic-level bias]


Hierarchical Bayesian models

  • Can represent and reason about knowledge at multiple levels of abstraction.

  • Have been used by statisticians for many years.

  • Have been applied to many cognitive problems:

    • causal reasoning (Mansinghka et al., 2006)

    • language (Chater and Manning, 2006)

    • vision (Fei-Fei, Fergus, and Perona, 2003)

    • word learning (Kemp, Perfors, and Tenenbaum, 2006)

    • decision making (Lee, 2006)


Outline

  • A high-level view of HBMs

  • A case study

    • Semantic knowledge


[Diagram: Universal Grammar (hierarchical phrase structure grammars, e.g., CFG, HPSG, TAG) → Grammar → Phrase structure → Utterance → Speech signal, with conditional distributions P(grammar | UG), P(phrase structure | grammar), P(utterance | phrase structure), P(speech | utterance) linking the levels]


Hierarchical Bayesian model

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances), with conditional distributions P(G|U), P(s|G), P(u|s) linking the levels]


Hierarchical Bayesian model

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances), with conditional distributions P(G|U), P(s|G), P(u|s) linking the levels]

A hierarchical Bayesian model specifies a joint distribution over all variables in the hierarchy:

P({ui}, {si}, G | U) = P({ui} | {si}) P({si} | G) P(G | U)
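A minimal sketch of this factorization with made-up discrete distributions (the grammars G1/G2, structures sA/sB, and utterances u1/u2 are all hypothetical):

```python
# Toy sketch of the HBM factorization
# P({u}, {s}, G | U) = P({u}|{s}) P({s}|G) P(G|U).

p_G = {"G1": 0.7, "G2": 0.3}                 # P(G | U): prior over grammars

p_s_given_G = {                              # P(s | G): structures per grammar
    "G1": {"sA": 0.6, "sB": 0.4},
    "G2": {"sA": 0.2, "sB": 0.8},
}

p_u_given_s = {                              # P(u | s): utterances per structure
    "sA": {"u1": 0.9, "u2": 0.1},
    "sB": {"u1": 0.3, "u2": 0.7},
}

def joint(G, structures, utterances):
    """P({u}, {s}, G | U): product of the three levels."""
    p = p_G[G]
    for s, u in zip(structures, utterances):
        p *= p_s_given_G[G][s] * p_u_given_s[s][u]
    return p

print(joint("G1", ["sA", "sB"], ["u1", "u2"]))  # 0.7 * 0.54 * 0.28 = 0.10584
```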


Knowledge at multiple levels

  • Top-down inferences:

    • How does abstract knowledge guide inferences at lower levels?

  • Bottom-up inferences:

    • How can abstract knowledge be acquired?

  • Simultaneous learning at multiple levels of abstraction


Top-down inferences

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Given grammar G and a collection of utterances, construct a phrase structure for each utterance.


Top-down inferences

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Infer {si} given {ui} and G:

P({si} | {ui}, G) ∝ P({ui} | {si}) P({si} | G)


Bottom-up inferences

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Given a collection of phrase structures, learn a grammar G.


Bottom-up inferences

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Infer G given {si} and U:

P(G | {si}, U) ∝ P({si} | G) P(G | U)


Simultaneous learning at multiple levels

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Given a set of utterances {ui} and innate knowledge U, construct a grammar G and a phrase structure for each utterance.


Simultaneous learning at multiple levels

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

  • A chicken-or-egg problem:

    • Given a grammar, phrase structures can be constructed

    • Given a set of phrase structures, a grammar can be learned


Simultaneous learning at multiple levels

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances)]

Infer G and {si} given {ui} and U:

P(G, {si} | {ui}, U) ∝ P({ui} | {si}) P({si} | G) P(G | U)
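The simultaneous inference can be sketched as a brute-force search for the jointly most probable grammar and structure assignment; every distribution below is a made-up toy number, not anything from the talk:

```python
from itertools import product

# Toy MAP search: maximize P(G, {s} | {u}, U) ∝ P({u}|{s}) P({s}|G) P(G|U)
# over grammars G and one structure per observed utterance.

p_G = {"G1": 0.7, "G2": 0.3}                   # P(G | U)
p_s = {"G1": {"sA": 0.6, "sB": 0.4},           # P(s | G)
       "G2": {"sA": 0.2, "sB": 0.8}}
p_u = {"sA": {"u1": 0.9, "u2": 0.1},           # P(u | s)
       "sB": {"u1": 0.3, "u2": 0.7}}

observed = ["u1", "u2"]                        # the utterances {ui}

best, best_score = None, 0.0
for G in p_G:
    for structures in product(p_s[G], repeat=len(observed)):
        score = p_G[G]
        for s, u in zip(structures, observed):
            score *= p_s[G][s] * p_u[s][u]
        if score > best_score:
            best, best_score = (G, structures), score

print(best)  # the jointly most probable grammar and structures
```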


Hierarchical Bayesian model

[Diagram: U (Universal Grammar) → G (Grammar) → s1 … s6 (Phrase structures) → u1 … u6 (Utterances), with conditional distributions P(G|U), P(s|G), P(u|s) linking the levels]


Knowledge at multiple levels

  • Top-down inferences:

    • How does abstract knowledge guide inferences at lower levels?

  • Bottom-up inferences:

    • How can abstract knowledge be acquired?

  • Simultaneous learning at multiple levels of abstraction


Outline

  • A high-level view of HBMs

  • A case study: Semantic knowledge


Folk Biology

The relationships between living kinds are well described by tree-structured representations.

[Diagram: R (principles) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data, e.g. "Gorillas have hands")]


Folk Biology

[Diagram: R (principles: structural form = tree) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Outline

  • A high-level view of HBMs

  • A case study: Semantic knowledge

    • Property induction

    • Learning structured representations

    • Learning the abstract organizing principles of a domain


Property induction

[Diagram: R (principles: structural form = tree) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Property Induction

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]

Approach: work with the distribution P(D|S,R)


Property Induction

Horses have T4 cells.
Elephants have T4 cells.
→ All mammals have T4 cells.

Horses have T4 cells.
Seals have T4 cells.
→ All mammals have T4 cells.

Previous approaches: Rips (1975); Osherson et al. (1990); Sloman (1993); Heit (1998)




[Example: the premises D (e.g., "Horses have T4 cells", "Elephants have T4 cells", "Cows have T4 cells") are used to evaluate a conclusion C.]


Choosing a prior

Chimps have T4 cells. → Gorillas have T4 cells.  (taxonomic similarity)

Poodles can bite through wire. → Dobermans can bite through wire.  (jaw strength)

Salmon carry E. Spirus bacteria. → Grizzly bears carry E. Spirus bacteria.  (food web relations)


Bayesian Property Induction

  • A challenge:

    • We have to specify the prior, which typically includes many numbers

  • An opportunity:

    • The prior can capture knowledge about the problem.


Property Induction

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Biological properties

  • Structure:

    • Living kinds are organized into a tree

  • Stochastic process:

    • Nearby species in the tree tend to share properties




Stochastic Process

  • Nearby species in the tree tend to share properties.

  • In other words, properties tend to be smooth over the tree.

[Figure: a property that is smooth over the tree vs. one that is not smooth]


Stochastic process

[Figure: the hypothesis space of properties generated over the tree]


Generating a property

[Figure: a continuous value y at each node, where y tends to be smooth over the tree, is thresholded to give a binary property h]




The diffusion process

where θ(yi) is 1 if yi ≥ 0 and 0 otherwise, and the covariance K encourages y to be smooth over the graph S.


p(y|S,R): Generating a property

Let yi be the feature value at node i.

[Figure: neighboring nodes i and j in the graph S]

(Zhu, Lafferty, and Ghahramani, 2003)
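One way to sketch this generative process: draw y from a Gaussian whose covariance is built from the graph Laplacian (a smoothness-inducing kernel in the spirit of the Zhu et al. graph kernels), then threshold at zero. The 7-node tree and eps = 0.1 below are illustrative assumptions, not values from the talk:

```python
import numpy as np

# Diffusion-style property generation over a small tree:
# y ~ N(0, K) with K = (L + eps*I)^{-1}, then h_i = 1 iff y_i >= 0.

# Leaves 0-3 (mouse, squirrel, chimp, gorilla) joined through
# internal nodes 4, 5 and a root node 6.
edges = [(0, 4), (1, 4), (2, 5), (3, 5), (4, 6), (5, 6)]
n = 7
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

L = np.diag(A.sum(axis=1)) - A          # graph Laplacian
K = np.linalg.inv(L + 0.1 * np.eye(n))  # smoothness-inducing covariance

rng = np.random.default_rng(0)
y = rng.multivariate_normal(np.zeros(n), K)
h = (y >= 0).astype(int)                # thresholded binary property

print(h[:4])  # property values at the four leaf species
```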


Biological properties

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]

Approach: work with the distribution P(D|S,R)


[Example: the premises D (e.g., "Horses have T4 cells", "Elephants have T4 cells", "Cows have T4 cells") are used to evaluate a conclusion C.]


Results

[Scatter plots: human judgments vs. model predictions for specific arguments, e.g. "Cows, Elephants have property P → Horses have property P" and "Dolphins, Seals have property P → Horses have property P" (data from Osherson et al.)]


Results

[Scatter plots: human judgments vs. model predictions for general arguments, e.g. "Gorillas, Mice, Seals have property P → All mammals have property P" and "Cows, Elephants, Horses have property P → All mammals have property P"]


Spatial model

[Diagram: R (principles: structural form = 2D space; stochastic process = diffusion) → S (structure: a 2D layout of squirrel, mouse, gorilla, chimp) → D (data)]




Tree vs 2D

[Figure: model fits for the "horse" and "all mammals" arguments under tree + diffusion vs. 2D + diffusion]


Biological Properties

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Three inductive contexts

[Table: each property is matched to its own principles R and structure S.
"has T4 cells": tree + diffusion process;
"can bite through wire": chain + drift process;
"carries E. Spirus bacteria": network + causal transmission.
Each structure arranges classes A–G as a tree, a chain, or a directed network.]

Threshold properties

  • "can bite through wire"

  • "has skin that is more resistant to penetration than most synthetic fibers"

[Figure: categories (Doberman, Poodle, Collie, Hippo, Elephant, Cat, Lion, Camel) arranged along a single dimension]

(Osherson et al.; Blok et al.)


Threshold properties

  • Structure:

    • The categories can be organized along a single dimension

  • Stochastic process:

    • Categories towards one end of the dimension are more likely to have the novel property
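A minimal sketch of such a drift-style process, assuming the probability of having the property is a logistic function of position along the dimension (the positions and parameters below are made up for illustration):

```python
import math

# Drift-style process over a 1D structure: categories toward one end of
# the dimension are more likely to have the novel property.

positions = {"Poodle": 0.0, "Collie": 1.0, "Doberman": 2.0,
             "Lion": 4.0, "Hippo": 5.0, "Elephant": 6.0}

def p_has_property(x, midpoint=3.0, slope=1.5):
    """Probability rises monotonically with position x."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))

for name, x in positions.items():
    print(f"{name}: {p_has_property(x):.2f}")
```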


Results

[Figure: model fits for "has skin that is more resistant to penetration than most synthetic fibers" under 1D + drift vs. 1D + diffusion]

(Blok et al.; Smith et al.)


Three inductive contexts

[Table: each property is matched to its own principles R and structure S.
"has T4 cells": tree + diffusion process;
"can bite through wire": chain + drift process;
"carries E. Spirus bacteria": network + causal transmission.
Each structure arranges classes A–G as a tree, a chain, or a directed network.]

Causally transmitted properties

Salmon carry E. Spirus bacteria.
→ Grizzly bears carry E. Spirus bacteria.

[Figure: food web link from salmon to grizzly bear]

(Medin et al.; Shafto and Coley)


Causally transmitted properties

  • Structure:

    • The categories can be organized into a directed network

  • Stochastic process:

    • Properties are generated by a noisy transmission process
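This transmission process can be sketched as a simulation: a property arises spontaneously at each species with some background probability and passes along each prey-to-predator link with some transmission probability. The food web, b, and t below are illustrative assumptions in the spirit of noisy-transmission models (Shafto et al.), not their exact formulation:

```python
import random

# Noisy causal transmission over a directed food web.
web = {"herring": ["salmon", "seal"],   # edges point from prey to predator
       "salmon": ["grizzly bear"],
       "seal": [],
       "grizzly bear": []}
b, t = 0.1, 0.8                         # background and transmission rates

def sample_property(rng):
    has = {s: rng.random() < b for s in web}          # spontaneous origin
    edge_on = {(prey, pred): rng.random() < t          # each link transmits
               for prey in web for pred in web[prey]}  # at most once
    changed = True
    while changed:                       # propagate (the web is acyclic)
        changed = False
        for (prey, pred), on in edge_on.items():
            if on and has[prey] and not has[pred]:
                has[pred] = True
                changed = True
    return has

rng = random.Random(0)
samples = [sample_property(rng) for _ in range(20000)]
given = [s for s in samples if s["salmon"]]
rate = sum(s["grizzly bear"] for s in given) / len(given)
print(round(rate, 2))  # close to 1 - (1-b)(1-t) = 0.82
```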


Experiment: disease properties

(Shafto et al.)

[Figure: the two food webs used in the experiment, a mammals web and an island web]


Results: disease properties

[Figure: web + transmission model fits for the island and mammals conditions]


Three inductive contexts

[Table: each property is matched to its own principles R and structure S.
"has T4 cells": tree + diffusion process;
"can bite through wire": chain + drift process;
"carries E. Spirus bacteria": network + causal transmission.
Each structure arranges classes A–G as a tree, a chain, or a directed network.]

Property Induction

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]

Approach: work with the distribution P(D|S,R)


Conclusions : property induction

  • Hierarchical Bayesian models help to explain how abstract knowledge can be used for induction


Outline

  • A high-level view of HBMs

  • A case study: Semantic knowledge

    • Property induction

    • Learning structured representations

    • Learning the abstract organizing principles of a domain


Structure learning

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Structure learning

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: ?) → D (data)]

Goal: find S that maximizes P(S|D,R)


Structure learning

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: ?) → D (data)]

Goal: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)


Structure learning

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: ?) → D (data)]

Goal: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R), where P(D|S,R) is the distribution previously used for property induction.


Generating features over the tree

[Figure: features sampled from the diffusion process over the tree of mouse, squirrel, chimp, gorilla; each sampled feature tends to be smooth over the tree]


Structure learning

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: ?) → D (data)]

Goal: find S that maximizes P(S|D,R) ∝ P(D|S,R) P(S|R)


P(S|R): Generating structures

[Figure: candidate graphs over mouse, squirrel, chimp, gorilla; some are inconsistent with R (not trees), others are consistent with R (trees)]


P(S|R): Generating structures

[Figure: two trees over mouse, squirrel, chimp, gorilla; one simple (few internal nodes), one complex (many internal nodes)]


P(S|R): Generating structures

[Figure: candidate trees over mouse, squirrel, chimp, gorilla]

  • Each structure is weighted by the number of nodes it contains:

P(S|R) = 0 if S is inconsistent with R; otherwise P(S|R) ∝ θ^|S|, where |S| is the number of nodes in S (θ < 1, so simpler structures are preferred).


Structure Learning

[Diagram: R (principles) → S (structure) → D (data)]

P(S|D,R) will be high when:

  • The features in D vary smoothly over S

  • S is a simple graph (a graph with few nodes)

Aim: find S that maximizes P(S|D,R) ∝ P(D|S) P(S|R)
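These two criteria can be sketched as a score combining a smoothness-based Gaussian likelihood (precision L + eps·I, so features that vary smoothly over S score higher) with a node-count prior P(S|R) ∝ θ^|S|. The candidate graphs, eps, and theta are illustrative assumptions:

```python
import numpy as np

# log P(S|D,R) = log P(D|S) + log P(S|R) + const.

def laplacian(edges, n):
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    return np.diag(A.sum(axis=1)) - A

def log_score(edges, n, D, eps=0.1, theta=0.5):
    P = laplacian(edges, n) + eps * np.eye(n)   # Gaussian precision matrix
    _, logdet_P = np.linalg.slogdet(P)
    ll = sum(-0.5 * f @ P @ f + 0.5 * logdet_P - 0.5 * n * np.log(2 * np.pi)
             for f in D.T)                      # one Gaussian per feature
    return ll + n * np.log(theta)               # node-count log prior

# One feature that is smooth along the chain 0-1-2-3 ...
D = np.array([[1.0], [0.5], [-0.5], [-1.0]])
good = [(0, 1), (1, 2), (2, 3)]                 # chain matching the data
bad = [(0, 2), (2, 1), (1, 3)]                  # a scrambled chain

print(log_score(good, 4, D) > log_score(bad, 4, D))  # True
```

Both candidates have the same number of nodes, so the prior ties and the smoother fit wins on likelihood alone.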


Structure learning example

  • Participants rated the goodness of 85 features for 48 animals

  • E.g., elephant: gray, hairless, toughskin, big, bulbous, longleg, tail, chewteeth, tusks, smelly, walks, slow, strong, muscle, quadrapedal, inactive, vegetation, grazer, oldworld, bush, jungle, ground, timid, smart, group

(Osherson et al.)


Biological Data

[Figure: animals × features data matrix]



Spatial model

[Diagram: R (principles: structural form = 2D space; stochastic process = diffusion) → S (structure: a 2D layout of squirrel, mouse, gorilla, chimp) → D (data)]


Conclusions: structure learning

  • Hierarchical Bayesian models provide a unified framework for the acquisition and use of structured representations


Outline

  • A high-level view of HBMs

  • A case study: Semantic knowledge

    • Property induction

    • Learning structured representations

    • Learning the abstract organizing principles of a domain


Learning structural form

[Diagram: R (principles: structural form = tree; stochastic process = diffusion) → S (structure: a tree over mouse, squirrel, chimp, gorilla) → D (data)]


Which form is best?

[Figure: the same seven animals (snake, turtle, crocodile, robin, ostrich, bat, orangutan) arranged both as a tree and as a linear order]


Structural forms

Order, partition, chain, ring, hierarchy, tree, grid, cylinder


Learning structural form

[Diagram: R (principles: structural form F, which could be a tree, 2D space, ring, …; stochastic process = diffusion) → S (structure: ?) → D (data)]

Goal: find S, F that maximize P(S,F|D)


Learning structural form

[Diagram: R (principles: structural form F; stochastic process = diffusion) → S (structure: ?) → D (data)]

Aim: find S, F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F), where P(D|S) is the distribution used for property induction, P(S|F) is the distribution used for structure learning, and P(F) is uniform on the set of forms.


P(S|F): Generating structures from forms

[Figure: candidate structures over mouse, squirrel, chimp, gorilla]

  • Each structure is weighted by the number of nodes it contains:

P(S|F) = 0 if S is inconsistent with F; otherwise P(S|F) ∝ θ^|S|, where |S| is the number of nodes in S.
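A minimal sketch of such a node-count prior (theta and the candidate structures are illustrative; each structure is reduced here to a node count plus a flag for consistency with the form):

```python
# Node-count prior: P(S|F) ∝ theta^{|S|} over structures consistent
# with the form F, and zero otherwise.

theta = 0.5
candidates = [
    {"name": "simple tree",  "nodes": 6, "consistent": True},
    {"name": "complex tree", "nodes": 9, "consistent": True},
    {"name": "ring",         "nodes": 6, "consistent": False},  # wrong form
]

weights = [theta ** c["nodes"] if c["consistent"] else 0.0 for c in candidates]
Z = sum(weights)
prior = {c["name"]: w / Z for c, w in zip(candidates, weights)}
print(prior)  # simple tree gets most of the mass; the ring gets none
```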


P(S|F): Generating structures from forms

  • Simpler forms are preferred

[Figure: P(S|F) plotted over all possible graph structures S; a chain can generate fewer structures than a grid, so each structure the chain can generate receives more probability mass]


Learning structural form

[Diagram: F (form: ?) → S (structure: ?) → D (data)]

Goal: find S, F that maximize P(S,F|D)


Learning structural form

[Diagram: F (form) → S (structure) → D (data)]

  • P(S,F|D) will be high when:

    • The features in D vary smoothly over S

    • S is a simple graph (a graph with few nodes)

    • F is a simple form (a form that can generate only a few structures)

Aim: find S, F that maximize P(S,F|D) ∝ P(D|S) P(S|F) P(F)


Form learning: Biological Data

  • 33 animals, 110 features

[Figure: animals × features data matrix]



Supreme Court (Spaeth)

  • Votes on 1,600 cases (1987–2005)



Outline

  • A high-level view of HBMs

  • A case study: Semantic knowledge

    • Property induction

    • Learning structured representations

    • Learning the abstract organizing principles of a domain



[Diagram: the tree over mouse, squirrel, chimp, gorilla, generated by a structural form (tree) and a stochastic process (diffusion)]


Where do structural forms come from?

Order, partition, chain, ring, hierarchy, tree, grid, cylinder


Where do structural forms come from?

[Figure: each structural form paired with the process that generates it]


Node-replacement graph grammars

[Figure: a production rule for the chain form, and a derivation that grows a chain by repeatedly applying the production]


Where do structural forms come from?

[Figure: each structural form paired with the process that generates it]



When can we stop adding levels?

  • When the knowledge at the top level is simple or general enough that it can be plausibly assumed to be innate.


Conclusions

  • Hierarchical Bayesian models provide a unified framework that can:

    • Explain how abstract knowledge is used for induction

    • Explain how abstract knowledge can be acquired


Learning abstract knowledge

Applications of hierarchical Bayesian models at this conference:

  • Semantic knowledge: Schmidt et al.

    • Learning the M-constraint

  • Syntax: Perfors et al.

    • Learning that language is hierarchically organized

  • Word learning: Kemp et al.

    • Learning the shape bias

