Optimality in Cognition and Grammar

Paul Smolensky

Cognitive Science Department, Johns Hopkins University

Plan of lectures

  • Cognitive architecture: Symbols & optimization in neural networks

  • Optimization in grammar: HG → OT — from numerical to algebraic optimization in grammar

  • OT and nativism: the initial state & neural/genomic encoding of UG

  • ?


The ICS Hypothesis

The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)

  • In higher cognitive domains, representations and functions are well approximated by symbolic computation

  • The Connectionist Hypothesis is correct

  • Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation



The ICS Architecture

[Figure: the ICS architecture. At the symbolic level, the grammar G defines a function ƒ mapping the input /kæt/ to the parse [σ k [æ t]]; at the subsymbolic level, these syllable trees over σ, k, æ, t are realized as activation patterns A in a connectionist network.]


Representation

[Figure: the syllable tree [σ k [æ t]] realized as a superposition of filler/role bindings σ/rε, k/r0, æ/r01, t/r11]


Tensor Product Representations

  • Filler vectors: A, B, X, Y
  • Role vectors: rε = (1; 0 0), r0 = (0; 1 1), r1 = (0; 1 −1)
  • Representations: a constituent at depth 0 is bound to rε; constituents at depth 1 are bound to r0 (left child) or r1 (right child); deeper positions get recursively built roles (e.g., r01 = r0 ⊗ r1), with fillers i, j, k ∊ {A, B, X, Y}
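As a concrete illustration, here is a minimal NumPy sketch of these tensor product representations for the tree [σ k [æ t]]. The one-hot filler vectors are an assumption for readability, and r1 is taken to be (0; 1 −1) so the role vectors are linearly independent (the deck prints r0 and r1 identically, presumably an extraction error):

```python
import numpy as np

# Filler vectors for {sigma, k, ae, t} -- one-hot here for readability
# (any linearly independent set works).
F = {name: v for name, v in zip(["sigma", "k", "ae", "t"], np.eye(4))}

# Role vectors from the slide; r1 = (0; 1 -1) is assumed.
r_eps = np.array([1.0, 0.0, 0.0])
r0 = np.array([0.0, 1.0, 1.0])
r1 = np.array([0.0, 1.0, -1.0])

def bind(filler, role):
    """A filler/role binding is their tensor (outer) product, flattened."""
    return np.kron(filler, role)

# Recursive roles: r01 = "left child of the right child", etc.
r01 = np.kron(r0, r1)
r11 = np.kron(r1, r1)

# [sigma k [ae t]] as one summed vector per depth (bindings at
# different depths live in spaces of different dimension).
s_shallow = bind(F["sigma"], r_eps) + bind(F["k"], r0)
s_deep = bind(F["ae"], r01) + bind(F["t"], r11)

def unbind(s, role, filler_dim=4):
    """Recover a filler by contracting with its (orthogonal) role vector."""
    return s.reshape(filler_dim, role.size) @ role / (role @ role)

print(unbind(s_deep, r01))  # -> [0. 0. 1. 0.], the 'ae' filler
```

Because the roles within each depth are mutually orthogonal here, unbinding by inner product recovers each filler exactly.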


Local Tree Realizations

[Figure: local network realizations of these tree representations]


The ICS Isomorphism

[Figure: the ICS isomorphism illustrated on a passive sentence. A symbolic input/output mapping at LF (constituents such as Aux, V, by, Agent, Patient, Passive) is realized below by tensor product representations, and the symbolic function by a tensorial network with weight matrix W.]

Tensor product representations → Tensorial networks



Binding by Synchrony

[Figure: temporal-synchrony binding of give(John, book, Mary): each filler (John, Mary, book, giver, recipient, give-obj) fires in phase with its role (Shastri & Ajjanagadde 1993)]

Binding by synchrony = tensor product binding of fillers to temporal role vectors [Tesar & Smolensky 1994]: with ri the activation pattern over time for firing phase i,

s = r1 ⊗ [fbook + fgive-obj] + r2 ⊗ [fgiver + fJohn] + r3 ⊗ [fMary + frecipient]


The ICS Architecture

[Figure: the ICS architecture diagram, repeated as a roadmap]


Two Fundamental Questions

Harmony maximization is satisfaction of parallel, violable constraints.

  1. Prior question: what are the activation patterns — data structures, mental representations — evaluated by these constraints?

  2. What are the constraints? (Knowledge representation)


Representation

[Figure: the syllable tree [σ k [æ t]] and its filler/role bindings σ/rε, k/r0, æ/r01, t/r11, repeated]


Two Fundamental Questions

Harmony maximization is satisfaction of parallel, violable constraints.

  1. Prior question (now addressed): what are the activation patterns evaluated by these constraints?

  2. What are the constraints? (Knowledge representation)


Constraints

NOCODA: a syllable has no coda [Maori/French/English]

[Figure: the network pattern a[σ k [æ t]] realizing the parse of ‘cat’; the coda t incurs a *violation through negative weights W]

H(a[σ k [æ t]]) = −sNOCODA < 0, where sNOCODA > 0 is the constraint’s strength
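A minimal sketch of how such a constraint can live in a weight matrix, using the standard second-order Harmony H(a) = ½ aᵀW a; the two-unit reduction (one syllable node, one coda node) and the strength value are assumptions for illustration:

```python
import numpy as np

# NOCODA realized as a negative weight between a syllable node and a
# coda-consonant node; s_nocoda is an assumed strength.
s_nocoda = 2.0
W = np.array([[0.0, -s_nocoda],
              [-s_nocoda, 0.0]])

a = np.array([1.0, 1.0])  # pattern with the coda t active: one violation
H = 0.5 * a @ W @ a       # H(a) = 1/2 a^T W a
print(H)                  # -2.0 = -s_nocoda < 0
```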


The ICS Architecture

[Figure: the ICS architecture diagram, repeated; the next topic is highlighted: “Constraint Interaction ??”]


Constraint Interaction I

  • ICS  Grammatical theory

    • Harmonic Grammar

      • Legendre, Miyata, Smolensky 1990 et seq.


Constraint Interaction I

[Figure: the Harmony of the parse [σ k [æ t]] decomposes into constraint contributions: H(k, σ) > 0 — ONSET rewards the onset k — and H(σ, t) < 0 — NOCODA penalizes the coda t]

The grammar generates the representation that maximizes H: this best satisfies the constraints, given their differential strengths.

Any formal language can be so generated.
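To make the weighted-constraint logic concrete, here is a small Harmonic Grammar evaluation sketch; the candidate set, the PARSE constraint, and the numeric strengths are illustrative assumptions, not values from the lecture:

```python
# Assumed constraint strengths.
strengths = {"ONSET": 3.0, "NOCODA": 2.0, "PARSE": 4.0}

# Candidate parses of /kaet/ with signed constraint marks:
# +1 = satisfaction (reward), -1 = violation (penalty).
candidates = {
    "[k ae t]":   {"ONSET": +1, "NOCODA": -1, "PARSE": 0},   # onset k, coda t
    "[k ae] <t>": {"ONSET": +1, "NOCODA": 0,  "PARSE": -1},  # t unparsed
    "[ae t] <k>": {"ONSET": 0,  "NOCODA": -1, "PARSE": -1},  # onsetless
}

def harmony(marks):
    """H = sum of strength-weighted constraint marks."""
    return sum(strengths[c] * m for c, m in marks.items())

best = max(candidates, key=lambda c: harmony(candidates[c]))
print({c: harmony(m) for c, m in candidates.items()}, "->", best)
# {'[k ae t]': 1.0, '[k ae] <t>': -1.0, '[ae t] <k>': -6.0} -> [k ae t]
```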


The ICS Architecture

[Figure: the ICS architecture diagram, repeated; highlighted: “Constraint Interaction I: HG”]


Harmonic Grammar Parser

  • Simple, comprehensible network
  • Simple grammar G
    • X → A B; Y → B A
  • Language: {A B, B A}
  • Processing: completion — bottom-up (from a string to its parent X or Y) and top-down (from a parent to its string)


Simple Network Parser

  • Fully self-connected, symmetric network with weight matrix W
  • Like the previously shown network, except with 12 units; representations and connections shown below


Harmonic Grammar Parser

  • Weight matrix for Y → B A: H(Y, B—) > 0 and H(Y, —A) > 0


Harmonic Grammar Parser

  • Weight matrix for X → A B


Harmonic Grammar Parser

  • Weight matrix for entire grammar G
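Here is a toy completion network in the spirit of this parser; the six-unit inventory and the weight values are assumptions (the lecture's network has 12 units), but the mechanics — a symmetric weight matrix encoding X → A B and Y → B A, settled by greedy Harmony ascent — are the same idea:

```python
import numpy as np

# Units: child symbols at positions 1, 2 plus the parent symbols X, Y.
idx = {u: i for i, u in enumerate(["A1", "B1", "A2", "B2", "X", "Y"])}
W = np.zeros((6, 6))

def tie(u, v, w):
    """Symmetric connection between units u and v."""
    W[idx[u], idx[v]] = W[idx[v], idx[u]] = w

tie("X", "A1", +2); tie("X", "B2", +2)   # X -> A B
tie("Y", "B1", +2); tie("Y", "A2", +2)   # Y -> B A
tie("X", "Y", -3)                        # competing parents
tie("A1", "B1", -3); tie("A2", "B2", -3) # competing symbols per slot

def complete(clamped, steps=20):
    """Greedy Harmony ascent: each free unit turns on iff its net
    input is positive; clamped units stay fixed."""
    a = np.zeros(6)
    for u, v in clamped.items():
        a[idx[u]] = v
    free = [i for u, i in idx.items() if u not in clamped]
    for _ in range(steps):
        for i in free:
            a[i] = 1.0 if W[i] @ a > 0 else 0.0
    return {u: a[i] for u, i in idx.items()}

# Bottom-up completion: clamp the string "A B"; the net fills in X.
print(complete({"A1": 1, "B1": 0, "A2": 0, "B2": 1}))
```

Clamping X or Y instead of the string units runs the same network top-down, as on the next two slides.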


Bottom-up Processing

[Figure: with the string units A B (or B A) clamped, the network completes the parent X (or Y)]


Top-down Processing

[Figure: with a parent X (or Y) clamped, the network completes the string A B (or B A)]


Scaling up

  • Not yet …

  • Still conceptual obstacles to surmount


Explaining Productivity

  • Approaching full-scale parsing of formal languages by neural-network Harmony maximization

  • Have other networks (like PassiveNet) that provably compute recursive functions

    ⇒ productive competence

  • How to explain?




Proof of Productivity

  • Productive behavior follows mathematically from combining

    • the combinatorial structure of the vectorial representations encoding inputs & outputs

      and

    • the combinatorial structure of the weight matrices encoding knowledge


Explaining Productivity I

[Figure: functions/semantics vs. processes in PSA and ICS]

  • Intra-level decomposition: [A B] ⇝ {A, B} — the symbolic constituents (PSA & ICS)
  • Inter-level decomposition: [A B] ⇝ {1, 0, 1, …, 1} — an activation vector (ICS)


Explaining Productivity II

[Figure: functions/semantics vs. processes in PSA and ICS, applied to the grammar]

  • Intra-level decomposition: G ⇝ {X→AB, Y→BA} — the grammar’s rules (ICS & PSA)
  • Inter-level decomposition: W(G) ⇝ {1, 0, 1, 0; …} — a weight matrix (ICS)


The ICS Architecture

[Figure: the ICS architecture diagram, repeated; highlighted next: “Constraint Interaction II”]


Constraint Interaction II: OT

  • ICS  Grammatical theory

    • Optimality Theory

      • Prince & Smolensky 1991, 1993/2004


Constraint Interaction II: OT

  • Differential strength encoded in strict domination hierarchies (≫):

    • Every constraint has complete priority over all lower-ranked constraints (combined)

    • Approximate numerical encoding employs special (exponentially growing) weights

    • “Grammars can’t count”


Constraint Interaction II: OT

  • “Grammars can’t count”

  • Stress is on the initial heavy syllable iff the number of light syllables n obeys

No way, man


Constraint Interaction II: OT

  • Differential strength encoded in strict domination hierarchies (≫)

  • Constraints are universal (Con)

  • Candidate outputs are universal (Gen)

  • Human grammars differ only in how these constraints are ranked

    • ‘factorial typology’

  • First true contender for a formal theory of cross-linguistic typology

  • 1st innovation of OT: constraint ranking

  • 2nd innovation: ‘Faithfulness’


The Faithfulness/Markedness Dialectic

  • ‘cat’: /kat/ → kæt, violating NOCODA — why?
    • FAITHFULNESS requires pronunciation = lexical form
    • MARKEDNESS often opposes it
  • The Markedness/Faithfulness dialectic → diversity
    • English: FAITH ≫ NOCODA
    • Polynesian: NOCODA ≫ FAITH (~ French)
  • Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA):
    • labial: mb ≻ nb, ŋb
    • coronal: nd ≻ md, ŋd
    • velar: ŋg ≻ ŋb, ŋd


The ICS Architecture

[Figure: the ICS architecture diagram, repeated; highlighted: “Constraint Interaction II: OT”]


Optimality Theory

  • Diversity of contributions to theoretical linguistics

    • Phonology & phonetics

    • Syntax

    • Semantics & pragmatics

    • … e.g., the following lectures. Now:

  • Can strict domination be explained by connectionism?


Case study

  • Syllabification in Berber

  • Plan: data, then
    • OT grammar
    • Harmonic Grammar
    • Network


Syllabification in Berber

  • Dell & Elmedlaoui, 1985: Imdlawn Tashlhit Berber

  • Syllable nucleus can be any segment

  • But driven by universal preference for nuclei to be highest-sonority segments



OT Grammar: BrbrOT

  • HNUC: a syllable nucleus is sonorous
  • ONSET: a syllable has an onset
  • Strict domination: ONSET ≫ HNUC (Prince & Smolensky ’93/04)


Harmonic Grammar: BrbrHG

  • HNUC: a syllable nucleus is sonorous
    • Nucleus of sonority s: Harmony = 2^s − 1
    • s ∊ {1, 2, …, 8} ~ {t, d, f, z, n, l, i, a}
  • ONSET: *VV: Harmony = −2^8
  • Theorem. The global Harmony maxima are the correct Berber core syllabifications [of Dell & Elmedlaoui; no sonority plateaux, as in the OT analysis, here & henceforth]
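For reference, a compact sketch of the serial Dell–Elmedlaoui core-syllabification algorithm that the theorem compares these Harmony maxima against. The sonority scale follows the slide ({t, d, f, z, n, l, i, a} ~ 1..8); giving /x/ the sonority of /f/, /y/ that of /i/, and the final coda mop-up step are my assumptions:

```python
SONORITY = {"t": 1, "d": 2, "f": 3, "x": 3, "z": 4,
            "n": 5, "l": 6, "i": 7, "y": 7, "a": 8}

def syllabify(word):
    role = [None] * len(word)            # "O"nset, "N"ucleus, "C"oda
    def free(i):
        return 0 <= i < len(word) and role[i] is None
    def candidates():                    # positions that may host a nucleus
        return [i for i in range(len(word))
                if free(i) and (i == 0 or free(i - 1))]
    while candidates():
        i = max(candidates(), key=lambda j: SONORITY[word[j]])
        role[i] = "N"                    # most sonorous free segment -> nucleus
        if free(i - 1):
            role[i - 1] = "O"            # annex preceding segment as onset
    for i in range(len(word)):           # leftover segments become codas
        if role[i] is None:
            role[i] = "C"
    out = ""                             # dot before each new syllable
    for i, r in enumerate(role):
        if r == "O" or (r == "N" and (i == 0 or role[i - 1] != "O")):
            out += "."
        out += word[i]
    return out

print(syllabify("txznt"))       # -> .tx.znt   (the tx'.zn't of the next slide)
print(syllabify("atbkfznyay"))  # -> .a.tb.kf.zn.yay
```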


BrbrNet Realizes BrbrHG

[Figure: BrbrNet, whose connection weights implement the ONSET and HNUC constraints]


BrbrNet’s Global Harmony Maximum is the correct parse

  • Contrasts with Goldsmith’s Dynamic Linear Models (Goldsmith & Larson ’90; Prince ’93)

    For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm


BrbrNet’s Search Dynamics

Greedy local optimization

  • at each moment, make a small change of state so as to maximally increase Harmony

  • (gradient ascent: mountain climbing in fog)

  • guaranteed to construct a local maximum
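A two-unit sketch of this greedy ascent; the toy weight matrix, step size, and [0, 1] clipping are assumptions, not BrbrNet's actual dynamics:

```python
import numpy as np

W = np.array([[0.0, 1.0],
              [1.0, 0.0]])   # two mutually supporting units
a = np.array([0.1, 0.0])     # start near the all-off corner

for _ in range(200):
    a = np.clip(a + 0.05 * (W @ a), 0.0, 1.0)  # da ~ dH/da = W a

print(a)  # -> [1. 1.]: climbs to the (here global) Harmony maximum
```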


/txznt/ → tx́.zńt ‘you (sg.) stored’

[Figure: Harmony H rising over time as BrbrNet settles on the parse of /t x z n t/]


The Hardest Case: 12378 / t́.bx́.yá*

* Hypothetical, but compare t́.bx́.lá.kḱw ‘she even behaved as a miser’ [tbx́.lákkw]


Subsymbolic Parsing

[Figure: BrbrNet unit activations while parsing the hardest case /t b x y a/ (12378) into t́.bx́.yá; V marks the units chosen as syllable nuclei]


Parsing Sonority Profile 8121345787 → á.tb́.kf́.zń.yáy

[Figure: BrbrNet parsing the ten-segment sonority profile 8 1 2 1 3 4 5 7 8 7]

Finds the best of infinitely many representations: 1024 corners/parses.


BrbrNet has many Local Harmony Maxima

An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen)

That is, every activation value is 0 or 1, and the sequence of values is that realizing a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC},

where C = 0, V = 1 and # = word edge
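This characterization is easy to operationalize; a hedged sketch, treating a 0/1 corner as a string and testing it against the syllable inventory (the regex formulation is mine):

```python
import re

# A corner of BrbrNet's state space, written with C = 0 and V = 1, is a
# local Harmony maximum iff it parses into syllables from
# {CV, CVC, #V, #VC}: the first syllable may lack an onset, later ones
# may not; regex backtracking resolves the CV vs. CVC ambiguity.
LEGAL = re.compile(r"0?10?(010?)*")

def is_local_maximum(corner: str) -> bool:
    return LEGAL.fullmatch(corner) is not None

print(is_local_maximum("01001"))  # True:  .CVC.CV.
print(is_local_maximum("011"))    # False: onsetless nucleus mid-word
```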

Greedy optimization avoids local maxima: why?


HG → OT’s Strict Domination

  • Strict Domination: Baffling from a connectionist perspective?

  • Explicable from a connectionist perspective?

    • Exponential BrbrNet escapes local H maxima

    • Linear BrbrNet does not


Linear BrbrNet makes errors

  • (~ Goldsmith-Larson network)

  • Error: /12378/ → .123.78. (correct: .1.23.78.)


Subsymbolic Harmony optimization can be stochastic

  • The search for an optimal state can employ randomness

  • Equations for units’ activation values have random terms

    • pr(a) ∝ e^{H(a)/T}

    • T (‘temperature’) ~ randomness → 0 during search

    • Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)

  • Can guarantee computation of global optimum in principle

  • In practice: how fast? Exponential vs. linear BrbrNet
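A minimal Boltzmann-style sketch of such a stochastic search; the single-unit Gibbs update rule is standard, but the annealing schedule and toy weight matrix are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def anneal(W, steps=2000, T0=2.0):
    """Stochastic Harmony search: a randomly chosen unit turns on with
    probability 1/(1 + exp(-dH/T)) -- i.e., pr proportional to
    e^{H/T} -- while the temperature T is lowered toward 0."""
    n = len(W)
    a = rng.integers(0, 2, size=n).astype(float)
    for t in range(steps):
        T = T0 * (1 - t / steps) + 1e-3   # anneal T -> 0
        i = rng.integers(n)
        dH = W[i] @ a                     # Harmony gain of setting a[i] = 1
        a[i] = float(rng.random() < 1.0 / (1.0 + np.exp(-dH / T)))
    return a

# Two mutually supporting units: annealing settles into the global
# maximum a = (1, 1).
W = np.array([[0.0, 1.0],
              [1.0, 0.0]])
print(anneal(W))
```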


Stochastic BrbrNet: Exponential Can Succeed ‘Fast’

[Figure: Harmony vs. time, 5-run average]


Stochastic BrbrNet: Linear Can’t Succeed ‘Fast’

[Figure: Harmony vs. time, 5-run average]


The ICS Architecture

[Figure: the ICS architecture diagram, repeated in closing]

