Optimality in Cognition and Grammar

Paul Smolensky

Cognitive Science Department, Johns Hopkins University

Plan of lectures

  • Cognitive architecture: Symbols & optimization in neural networks
  • Optimization in grammar: HG → OT. From numerical to algebraic optimization in grammar
  • OT and nativism: The initial state & neural/genomic encoding of UG
The ICS Hypothesis

The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)

  • In higher cognitive domains, representations and functions are well approximated by symbolic computation
  • The Connectionist Hypothesis is correct
  • Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation
The ICS Architecture

[Diagram: function ƒ / grammar G maps /kæt/ to [σ k [æ t]] at the symbolic level; algorithm A realizes the mapping over network activation patterns for σ, k, æ, t.]
Representation

[Diagram: the parse tree for [σ k [æ t]] realized as filler/role bindings σ/rε, k/r0, æ/r01, t/r11.]
Tensor Product Representations
  • Representations:
    • Filler vectors: A, B, X, Y
    • Role vectors: rε = (1; 0 0), r0 = (0; 1 1), r1 = (0; 1 –1)
    • Depth 0: a single filler i; Depth 1: fillers j, k — with i, j, k ∊ {A, B, X, Y}

Local Tree Realizations
  • Representations: [diagram of the local tree realizations]
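The filler/role binding scheme can be sketched in a few lines of Python. The filler and role vectors below are hypothetical orthonormal stand-ins, not the vectors from the slides; they make "unbinding" by inner product exact.

```python
# Sketch of a tensor product representation, with hypothetical
# orthonormal filler/role vectors (stand-ins, not the slides' values).
# A structure's vector is the sum over constituents of filler (x) role.

def outer(f, r):
    """Tensor (outer) product of filler f and role r, flattened."""
    return [fi * rj for fi in f for rj in r]

def vec_sum(vectors):
    return [sum(comps) for comps in zip(*vectors)]

fillers = {'k': [1, 0, 0], 'ae': [0, 1, 0], 't': [0, 0, 1]}
roles = {'r0': [1, 0, 0], 'r01': [0, 1, 0], 'r11': [0, 0, 1]}

# Realize [k [ae t]]: k bound to role r0, ae to r01, t to r11
s = vec_sum([outer(fillers['k'], roles['r0']),
             outer(fillers['ae'], roles['r01']),
             outer(fillers['t'], roles['r11'])])

def unbind(s_vec, role, filler_dim):
    """Recover a constituent by taking the inner product with its
    role vector (exact when the role vectors are orthonormal)."""
    n = len(role)
    return [sum(s_vec[i * n + j] * role[j] for j in range(n))
            for i in range(filler_dim)]

assert unbind(s, roles['r01'], 3) == fillers['ae']
```

With distributed (non-one-hot) role vectors the same sum-of-bindings construction applies; unbinding then uses the dual basis rather than the roles themselves.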
The ICS Isomorphism

[Diagram: a passive construction (nodes A–G; Aux, V, by; Agent, Patient; Passive; Input → Output LF) computed at the symbolic level and, isomorphically, by a weight matrix W over the vectorial level.]

Tensor product representations

Tensorial networks
Binding by Synchrony

[Diagram: the bindings of give(John, book, Mary) — book/give-obj, John/giver, Mary/recipient — realized as units firing in the same time slices: Filler ⊗ Formal Role.]

s = r1 ⊗ [fbook + fgive-obj] + r3 ⊗ [fMary + frecipient] + r2 ⊗ [fgiver + fJohn]

give(John, book, Mary) (Shastri & Ajjanagadde 1993)

[Tesar & Smolensky 1994]
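The synchrony idea can be sketched directly: a filler and the role it binds fire in the same phase of a cycle, so co-firing recovers the binding. The phase assignments below are made up for illustration.

```python
# Sketch of binding by temporal synchrony (after Shastri &
# Ajjanagadde 1993), with hypothetical phase assignments: a filler
# and its role fire in the same time slot, distinguishing the three
# bindings of give(John, book, Mary).

bindings = {   # phase -> units firing in that phase
    1: {'book', 'give-obj'},
    2: {'John', 'giver'},
    3: {'Mary', 'recipient'},
}

ROLES = {'give-obj', 'giver', 'recipient'}

def bound_filler(role):
    """Recover a role's filler: the co-firing non-role unit."""
    for units in bindings.values():
        if role in units:
            return next(iter(units - ROLES))

assert bound_filler('recipient') == 'Mary'
```

Here temporal phase plays the part that the role vector plays in a tensor product representation: both are mechanisms for keeping multiple simultaneous bindings distinct.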

The ICS Architecture

[Architecture diagram repeated.]
Two Fundamental Questions

Harmony maximization is satisfaction of parallel, violable constraints

2. What are the constraints?

Knowledge representation

Prior question:

1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?

Representation

[Diagram repeated: [σ k [æ t]] as bindings σ/rε, k/r0, æ/r01, t/r11.]
Two Fundamental Questions

[Slide repeated: 1. What are the representations? 2. What are the constraints?]

Constraints

NOCODA: A syllable has no coda [Māori/French/English]

[Diagram: in the network realization (weight matrix W) of the parse a[σ k [æ t]] of ‘cat’, the coda t incurs a *violation.]

H(a[σ k [æ t]]) = –s_NOCODA < 0

The ICS Architecture

[Architecture diagram repeated.]

The ICS Architecture

[Architecture diagram repeated, now asking: Constraint Interaction ??]
Constraint Interaction I
  • ICS → Grammatical theory
    • Harmonic Grammar
      • Legendre, Miyata, Smolensky 1990 et seq.
Constraint Interaction I

[Diagram: the Harmony of the parse of /kæt/ decomposes as H = H(k, σ) + H(σ, t), where H(k, σ) > 0 (ONSET: Onset/k reward) and H(σ, t) < 0 (NOCODA: Coda/t penalty).]

The grammar generates the representation that maximizes H: this best satisfies the constraints, given their differential strengths.

Any formal language can be so generated.
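Harmony maximization over weighted constraints can be sketched concretely. The constraints and numerical strengths below are hypothetical (English-like, with faithfulness outweighing NOCODA); they are not values from the slides.

```python
# A minimal Harmonic Grammar sketch with hypothetical constraints
# and strengths: each candidate output for input /kaet/ is scored by
# a weighted sum of violations, and the grammar generates the
# Harmony-maximal candidate.

INPUT = 'kaet'   # 'ae' stands in for the single segment /ae/

def violations(parse):
    """Violation counts for three illustrative constraints."""
    return {
        'FAITH': abs(len(INPUT) - len(parse)),       # segments changed
        'ONSET': 0 if parse.startswith('k') else 1,  # needs an onset
        'NOCODA': 1 if parse.endswith('t') else 0,   # penalize codas
    }

# Hypothetical strengths: FAITH outweighs NOCODA (English-like)
weights = {'FAITH': 5.0, 'ONSET': 3.0, 'NOCODA': 1.0}

def harmony(parse):
    v = violations(parse)
    return -sum(weights[c] * v[c] for c in weights)

candidates = ['kaet', 'kae', 'aet']
best = max(candidates, key=harmony)
assert best == 'kaet'   # the faithful parse wins despite its coda
```

Reversing the relative strengths of FAITH and NOCODA would instead select the coda-less candidate, which is the numerical analogue of the cross-linguistic variation discussed later in the deck.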

The ICS Architecture

[Architecture diagram repeated, highlighting: Constraint Interaction I: HG.]
Harmonic Grammar Parser

[Diagram: top-down structures (X, Y) and bottom-up structures (A B, B A).]

  • Simple, comprehensible network
  • Simple grammar G: X → A B; Y → B A
  • Language
  • Processing: Completion
Simple Network Parser
  • Fully self-connected, symmetric network
  • Like previously shown network …

… except with 12 units; representations and connections shown below

Harmonic Grammar Parser

H(Y, — A) > 0   H(Y, B —) > 0

  • Weight matrix for Y → B A

Harmonic Grammar Parser
  • Weight matrix for X → A B

Harmonic Grammar Parser
  • Weight matrix for entire grammar G
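Parsing-as-completion in a symmetric network can be sketched with a toy example. The 3-unit weight matrix below is made up (it is not the parser's 12-unit matrix); it just shows how a rule-mate pair encoded in symmetric weights gets completed by greedy Harmony ascent.

```python
# Sketch of completion in a tiny symmetric network with a made-up
# weight matrix. Harmony is H(a) = (1/2) a^T W a; greedy single-unit
# flips climb to a local Harmony maximum.

W = [[0, 2, -1],    # units 0 and 1 excite each other (a rule-mate
     [2, 0, -1],    # pair, standing in for a rule like X -> A B);
     [-1, -1, 0]]   # unit 2 inhibits both

def harmony(a):
    n = len(a)
    return 0.5 * sum(W[i][j] * a[i] * a[j]
                     for i in range(n) for j in range(n))

def complete(a):
    """Flip one unit at a time whenever the flip raises Harmony."""
    a = list(a)
    improved = True
    while improved:
        improved = False
        for i in range(len(a)):
            flipped = a[:]
            flipped[i] = 1 - flipped[i]
            if harmony(flipped) > harmony(a):
                a, improved = flipped, True
    return a

completed = complete([1, 0, 0])   # start with only unit 0 active
# the network completes the pattern by activating unit 0's rule-mate
```

Starting from the partial pattern [1, 0, 0], the net settles into [1, 1, 0]: the missing half of the pair is filled in, which is the network analogue of completing a partial parse.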
Scaling up
  • Not yet …
  • Still conceptual obstacles to surmount
Explaining Productivity
  • Approaching full-scale parsing of formal languages by neural-network Harmony maximization
  • Have other networks (like PassiveNet) that provably compute recursive functions

⇒ productive competence

  • How to explain?
Proof of Productivity
  • Productive behavior follows mathematically from combining
    • the combinatorial structure of the vectorial representations encoding inputs & outputs

and

    • the combinatorial structure of the weight matrices encoding knowledge
Explaining Productivity I

[Diagram: PSA & ICS — functions/semantics decomposed into processes at each level.]

  • Intra-level decomposition: [A B] ⇝ {A, B}
  • Inter-level decomposition: [A B] ⇝ {1, 0, 1, …, 1}
Explaining Productivity II

[Diagram: ICS & PSA — functions/semantics decomposed into processes at each level.]

  • Intra-level decomposition: G ⇝ {X → A B, Y → B A}
  • Inter-level decomposition: W(G) ⇝ {1, 0, 1, 0; …}
The ICS Architecture

[Architecture diagram repeated.]
The ICS Architecture

[Architecture diagram repeated, highlighting: Constraint Interaction II.]
Constraint Interaction II: OT
  • ICS → Grammatical theory
    • Optimality Theory
      • Prince & Smolensky 1991, 1993/2004
Constraint Interaction II: OT
  • Differential strength encoded in strict domination hierarchies (≫):
    • Every constraint has complete priority over all lower-ranked constraints (combined)
    • Approximate numerical encoding employs special (exponentially growing) weights
    • “Grammars can’t count”
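The relation between strict domination and exponentially growing weights can be illustrated with hypothetical violation profiles (the counts below are invented for the example):

```python
# Strict domination vs. weighted sums, on made-up violation profiles:
# weighting constraint rank k by base**k (exponentially growing
# weights) reproduces OT's lexicographic comparison once the base
# exceeds any attainable violation count.

def ot_compare(v1, v2):
    """OT comparison: the highest-ranked differing constraint decides.
    Returns -1 if v1 is better, 1 if v2 is better, 0 if tied."""
    for a, b in zip(v1, v2):
        if a != b:
            return -1 if a < b else 1
    return 0

def hg_harmony(violations, base=10):
    """Weighted-sum Harmony with exponentially decaying weights."""
    n = len(violations)
    return -sum(v * base ** (n - 1 - k) for k, v in enumerate(violations))

cand1 = [0, 3, 5]   # many low-ranked violations
cand2 = [1, 0, 0]   # a single top-ranked violation
# OT prefers cand1 despite its larger violation total:
# 'grammars can't count'.
assert ot_compare(cand1, cand2) == -1
assert hg_harmony(cand1) > hg_harmony(cand2)
# With too small a base the sum can count, and disagrees with OT:
assert hg_harmony([0, 3], base=2) < hg_harmony([1, 0], base=2)
```

The last assertion is the numerical face of the slogan: a weighted sum with small weights trades one high-ranked violation against enough low-ranked ones, which strict domination never does.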
Constraint Interaction II: OT
  • “Grammars can’t count”
  • Stress is on the initial heavy syllable iff the number of light syllables n obeys

No way, man

Constraint Interaction II: OT
  • Differential strength encoded in strict domination hierarchies (≫)
  • Constraints are universal (Con)
  • Candidate outputs are universal (Gen)
  • Human grammars differ only in how these constraints are ranked
    • ‘factorial typology’
  • First true contender for a formal theory of cross-linguistic typology
  • 1st innovation of OT: constraint ranking
  • 2nd innovation: ‘Faithfulness’
The Faithfulness/Markedness Dialectic
  • ‘cat’: /kat/ → kæt violates NOCODA — why?
    • FAITHFULNESS requires pronunciation = lexical form
    • MARKEDNESS often opposes it
  • Markedness/Faithfulness dialectic → diversity
    • English: FAITH ≫ NOCODA
    • Polynesian: NOCODA ≫ FAITH (~ French)
  • Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA):
    • velar: ŋg ≻ ŋb, ŋd
    • coronal: nd ≻ md, ŋd
    • labial: mb ≻ nb, ŋb
The ICS Architecture

[Architecture diagram repeated, highlighting: Constraint Interaction II: OT.]
Optimality Theory
  • Diversity of contributions to theoretical linguistics
    • Phonology & phonetics
    • Syntax
    • Semantics & pragmatics
    • … e.g., following lectures. Now:
  • Can strict domination be explained by connectionism?
Case Study
  • Syllabification in Berber
  • Plan
    • Data, then:
      • OT grammar
      • Harmonic Grammar
      • Network
Syllabification in Berber
  • Dell & Elmedlaoui, 1985: Imdlawn Tashlhit Berber
  • Syllable nucleus can be any segment
  • But driven by universal preference for nuclei to be highest-sonority segments
OT Grammar: BrbrOT

HNUC: A syllable nucleus is sonorous

ONSET: A syllable has an onset

Strict Domination

Prince & Smolensky ’93/04
Harmonic Grammar: BrbrHG
  • HNUC: A syllable nucleus is sonorous
    • Nucleus of sonority s: Harmony = 2^(s–1)
    • s ∊ {1, 2, …, 8} ~ {t, d, f, z, n, l, i, a}
  • ONSET: *VV, Harmony = –2^8
  • Theorem. The global Harmony maxima are the correct Berber core syllabifications

[of Dell & Elmedlaoui; no sonority plateaux, as in the OT analysis, here & henceforth]
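The exponential HNUC/ONSET trade-off can be sketched with a toy scoring function. The parse encoding and candidate inputs below are invented for illustration; only the reward/penalty scheme (2^(s–1) per nucleus, –2^8 per onsetless syllable) follows the slide.

```python
# Toy scoring in the spirit of BrbrHG (simplified; the parse
# representation and candidates are made up): each nucleus of
# sonority s earns Harmony 2**(s - 1), and each onsetless syllable
# pays the ONSET penalty 2**8, which outweighs any single HNUC
# reward.

def harmony(parse):
    """parse: list of (onset, nucleus_sonority) pairs,
    with onset=None when the syllable lacks an onset."""
    h = 0
    for onset, s in parse:
        h += 2 ** (s - 1)       # HNUC: reward sonorous nuclei
        if onset is None:
            h -= 2 ** 8         # ONSET: penalize onsetless syllables
    return h

# Two candidate parses of a hypothetical input:
cand_a = [('t', 5), ('z', 5)]    # both syllables have onsets
cand_b = [(None, 8), ('t', 5)]   # highest-sonority nucleus, no onset
assert harmony(cand_a) > harmony(cand_b)   # ONSET dominates HNUC
```

Because the rewards grow exponentially in sonority and the ONSET penalty exceeds the largest possible reward, the weighted sum behaves like the strict ranking ONSET ≫ HNUC: this is how the numerical grammar mimics OT's domination.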

BrbrNet’s Global Harmony Maximum Is the Correct Parse
  • Contrasts with Goldsmith’s Dynamic Linear Models (Goldsmith & Larson ’90; Prince ’93)

For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm

BrbrNet’s Search Dynamics

Greedy local optimization

  • at each moment, make a small change of state so as to maximally increase Harmony
  • (gradient ascent: mountain climbing in fog)
  • guaranteed to construct a local maximum
The Hardest Case: /12378/ → t́.bx́.yá*

* hypothetical, but compare t́.bx́.lá.kḱw ‘she even behaved as a miser’ [tbx́.lákkw]

Subsymbolic Parsing

[Diagram: activation of the nucleus (V) units over time as the network parses the segments t, b, x, i, a.]
Parsing Sonority Profile 8121345787: á.tb́.kf́.zń.yáy

[Diagram: the sonority profile 8 1 2 1 3 4 5 7 8 7 and the network’s parse.]

Finds best of infinitely many representations: 1024 corners/parses
BrbrNet has many Local Harmony Maxima

An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen)

That is, every activation value is 0 or 1, and the sequence of values is that realizing a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC},

where C = 0, V = 1 and # = word edge

Greedy optimization avoids local maxima: why?

HG → OT’s Strict Domination
  • Strict Domination: Baffling from a connectionist perspective?
  • Explicable from a connectionist perspective?
    • Exponential BrbrNet escapes local H maxima
    • Linear BrbrNet does not
Linear BrbrNet makes errors
  • (~ Goldsmith-Larson network)
  • Error: /12378/ → .123.78. (correct: .1.23.78.)
Subsymbolic Harmony Optimization Can Be Stochastic
  • The search for an optimal state can employ randomness
  • Equations for units’ activation values have random terms
    • pr(a) ∝ e^{H(a)/T}
    • T (‘temperature’) ~ randomness → 0 during search
    • Boltzmann Machine (Hinton & Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
  • Can guarantee computation of global optimum in principle
  • In practice: how fast? Exponential vs. linear BrbrNet
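The temperature's role can be demonstrated on a made-up two-state Harmony landscape: sampling states with pr(a) ∝ e^{H(a)/T} and lowering T shifts virtually all probability onto the global Harmony maximum.

```python
# Sketch of stochastic (Boltzmann-style) Harmony optimization on a
# hypothetical two-state landscape: states are sampled with
# pr(a) proportional to exp(H(a)/T); low temperature T concentrates
# probability on the global Harmony maximum.

import math
import random

H = {'local_max': 1.0, 'global_max': 2.0}

def sample(T, rng):
    """Draw a state from the Boltzmann distribution at temperature T."""
    weights = {a: math.exp(h / T) for a, h in H.items()}
    z = sum(weights.values())
    r = rng.random() * z
    for state, w in weights.items():
        r -= w
        if r <= 0:
            return state
    return state

rng = random.Random(0)
hot = sum(sample(2.0, rng) == 'global_max' for _ in range(2000)) / 2000
cold = sum(sample(0.1, rng) == 'global_max' for _ in range(2000)) / 2000
# cold-temperature sampling all but guarantees the global maximum
```

Annealing (gradually lowering T during the search) combines the hot regime's ability to escape local maxima with the cold regime's concentration on the global one; the practical question raised above is how slowly T must fall.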
The ICS Architecture

[Architecture diagram repeated.]