Optimality in Cognition and Grammar

Paul Smolensky

Cognitive Science Department, Johns Hopkins University

Plan of lectures

- Cognitive architecture: Symbols & optimization in neural networks
- Optimization in grammar: HG → OT. From numerical to algebraic optimization in grammar
- OT and nativism: The initial state & neural/genomic encoding of UG
- ?

The ICS Hypothesis

The Integrated Connectionist/Symbolic Cognitive Architecture (ICS)

- In higher cognitive domains, representations and functions are well approximated by symbolic computation
- The Connectionist Hypothesis is correct
- Thus, cognitive theory must supply a computational reduction of symbolic functions to PDP computation

Tensor Product Representations

- Representations:

[Figure: tensor product representation (filler ⊗ role) of a binary tree with symbol i at depth 0 and symbols j, k at depth 1, where i, j, k ∈ {A, B, X, Y}. Filler vectors: A, B, X, Y. Role vectors: rε = (1; 0 0), r0 = (0; 1 1), r1 = (0; 1 1). Units are numbered ① to ⑫.]
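A minimal sketch of the filler ⊗ role binding described above, using NumPy. The one-hot filler vectors are an assumption, and the third role vector is taken as (0, 1, -1), also an assumption, made so that the left-child and right-child roles are linearly independent and can be unbound. The names `encode`, `unbind`, and `decode` are hypothetical helpers, not part of the lectures.

```python
import numpy as np

# One-hot filler vectors for the four symbols (an assumption; any
# linearly independent set would serve).
FILLERS = {sym: np.eye(4)[i] for i, sym in enumerate("ABXY")}

# Role vectors for the root, left-child and right-child positions.
# The right-child vector (0, 1, -1) is an assumption chosen so that the
# three roles are mutually orthogonal.
ROLES = {
    "root": np.array([1.0, 0.0, 0.0]),
    "left": np.array([0.0, 1.0, 1.0]),
    "right": np.array([0.0, 1.0, -1.0]),
}


def encode(tree):
    """Sum the filler (x) role outer products of a {role: symbol} tree."""
    return sum(np.outer(FILLERS[sym], ROLES[role]) for role, sym in tree.items())


def unbind(tensor, role):
    """Recover the filler bound to `role` via the dual role vector."""
    r = ROLES[role]
    return tensor @ (r / r.dot(r))


def decode(tensor, role):
    """Name the symbol whose filler vector best matches the unbound filler."""
    filler = unbind(tensor, role)
    return max(FILLERS, key=lambda sym: FILLERS[sym].dot(filler))


tree = {"root": "X", "left": "A", "right": "B"}
t = encode(tree)  # a 4 x 3 array: 12 activation values
```

With 4-dimensional fillers and 3-dimensional roles the pattern occupies 12 units, and `decode(t, "left")` recovers the symbol bound to the left-child role.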

Local tree realizations

- Representations:

[Figure: Input (Passive) and Output (LF) trees related by weight matrix W; node labels include Aux, V, by, Agent, Patient.]

The ICS Isomorphism

Tensor product representations

Tensorial networks

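Binding by Synchrony

[Figure: unit activity over time for give(John, book, Mary) (Shastri & Ajjanagadde 1993). Each formal role (time slot) is bound to its fillers: give-obj with book, giver with John, recipient with Mary.]

$s = r_1 \otimes [f_{\text{book}} + f_{\text{give-obj}}] + r_3 \otimes [f_{\text{Mary}} + f_{\text{recipient}}] + r_2 \otimes [f_{\text{giver}} + f_{\text{John}}]$

[Tesar & Smolensky 1994]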

Two Fundamental Questions

Harmony maximization is satisfaction of parallel, violable constraints

2. What are the constraints?

Knowledge representation

Prior question:

1. What are the activation patterns — data structures — mental representations — evaluated by these constraints?

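Constraints

NOCODA: A syllable has no coda [Maori/French/English]

[Figure: syllable tree [σ k [æ t]] for ‘cat’, realized as activation pattern a in a network with weights W; the coda t incurs a violation, marked *.]

$H(a_{[\sigma\, k\, [æ\, t]]}) = -s_{\text{NOCODA}} < 0$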

Constraint Interaction I

- ICS Grammatical theory
- Harmonic Grammar
- Legendre, Miyata, Smolensky 1990 et seq.

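[Figure: the Harmony H of the syllable [σ k [æ t]] is built from the contributions of its constituents: H(k, σ) > 0 (ONSET: Onset/k) and H(σ, t) < 0 (NOCODA: Coda/t).]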

Constraint Interaction I

The grammar generates the representation that maximizes H: this best satisfies the constraints, given their differential strengths.

Any formal language can be so generated.
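As a rough illustration of Harmony maximization with numerically weighted constraints, the sketch below scores a few candidate parses of ‘cat’ and picks the one with the highest H. The strengths and the candidate set are hypothetical values chosen for the example; only the sign convention (an onset adds Harmony, a coda subtracts it) follows the slides.

```python
# Hypothetical constraint strengths; the slides give no numeric values.
STRENGTH = {"ONSET": 2.0, "NOCODA": 1.0}

# Hypothetical candidate parses, described by how many onsets and codas
# each one contains.
CANDIDATES = {
    "[σ k [æ t]]": {"onsets": 1, "codas": 1},
    "[σ k [æ]]": {"onsets": 1, "codas": 0},
    "[σ [æ t]]": {"onsets": 0, "codas": 1},
}


def harmony(parse):
    """H = (ONSET strength x onsets) - (NOCODA strength x codas)."""
    return STRENGTH["ONSET"] * parse["onsets"] - STRENGTH["NOCODA"] * parse["codas"]


def best_parse(candidates):
    """Return the name of the candidate with maximal Harmony."""
    return max(candidates, key=lambda name: harmony(candidates[name]))


scores = {name: harmony(parse) for name, parse in CANDIDATES.items()}
winner = best_parse(CANDIDATES)
```

With no constraint penalizing deletion, the coda-less parse wins here; a constraint of that kind is what Faithfulness supplies in the OT part of the lectures.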

Harmonic Grammar Parser

- Simple, comprehensible network
- Simple grammar G
- X → A B, Y → B A
- Language

[Figure: the trees of the language, parsed top-down (given X or Y, complete the children) and bottom-up (given A B or B A, complete the parent).]

Processing: Completion

Simple Network Parser

- Fully self-connected, symmetric network
- Like previously shown network, except with 12 units; representations and connections shown below

Harmonic Grammar Parser

- Weight matrix for X → A B

Harmonic Grammar Parser

- Weight matrix for entire grammar G

Scaling up

- Not yet …
- Still conceptual obstacles to surmount

Explaining Productivity

- Approaching full-scale parsing of formal languages by neural-network Harmony maximization
- Have other networks (like PassiveNet) that provably compute recursive functions

⇒ productive competence

- How to explain?

= Proof of Productivity

- Productive behavior follows mathematically from combining
- the combinatorial structure of the vectorial representations encoding inputs & outputs

and

- the combinatorial structure of the weight matrices encoding knowledge

Explaining Productivity I

PSA & ICS

[Figure: Functions, Semantics, and Processes in PSA and in ICS.]

Intra-level decomposition: [A B] ⇝ {A, B}

Inter-level decomposition: [A B] ⇝ {1,0,1,…,1}

Explaining Productivity II

ICS & PSA

[Figure: Functions, Semantics, and Processes in ICS and in PSA.]

Intra-level decomposition: G ⇝ {X → A B, Y → B A}

Inter-level decomposition: W(G) ⇝ {1,0,1,0;…}

Constraint Interaction II: OT

- ICS Grammatical theory
- Optimality Theory
- Prince & Smolensky 1991, 1993/2004

Constraint Interaction II: OT

- Differential strength encoded in strict domination hierarchies (≫):
- Every constraint has complete priority over all lower-ranked constraints (combined)
- Approximate numerical encoding employs special (exponentially growing) weights
- “Grammars can’t count”
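A small sketch of the difference between strict domination and ordinary numerical weighting. The three constraints (ranked C1 ≫ C2 ≫ C3), the candidates, and their violation counts are all hypothetical. Strict domination is modelled as lexicographic comparison of violation profiles; the weighted version computes Harmony as a negative weighted sum.

```python
# Hypothetical violation profiles (C1, C2, C3), listed highest rank first.
CANDIDATES = {
    "a": (0, 0, 5),  # five violations of the lowest-ranked constraint
    "b": (0, 1, 0),  # one violation of the middle constraint
    "c": (1, 0, 0),  # one violation of the top constraint
}


def ot_winner(candidates):
    """Strict domination: compare violation profiles lexicographically."""
    return min(candidates, key=lambda name: candidates[name])


def hg_winner(candidates, weights):
    """Weighted sum: Harmony is minus the weighted violation count."""
    def harmony(profile):
        return -sum(w * v for w, v in zip(weights, profile))
    return max(candidates, key=lambda name: harmony(candidates[name]))


linear_weights = (3, 2, 1)
# Base 6 exceeds the largest violation count (5), so no number of
# lower-ranked violations in this candidate set can outweigh a single
# higher-ranked one.
exponential_weights = (36, 6, 1)

ot = ot_winner(CANDIDATES)
hg_linear = hg_winner(CANDIDATES, linear_weights)
hg_exponential = hg_winner(CANDIDATES, exponential_weights)
```

With linear weights the five low-ranked violations gang up and candidate b wins; with exponentially growing weights the weighted grammar agrees with the strict-domination winner a, for violation counts below the base.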

Constraint Interaction II: OT

- “Grammars can’t count”

- Stress is on the initial heavy syllable iff the number of light syllables n obeys

No way, man

Constraint Interaction II: OT

- Differential strength encoded in strict domination hierarchies (≫)
- Constraints are universal (Con)
- Candidate outputs are universal (Gen)
- Human grammars differ only in how these constraints are ranked
- ‘factorial typology’
- First true contender for a formal theory of cross-linguistic typology
- 1st innovation of OT: constraint ranking
- 2nd innovation: ‘Faithfulness’

The Faithfulness/Markedness Dialectic

- ‘cat’: /kat/ → kæt violates NOCODA (*NOCODA). Why?
- FAITHFULNESS requires pronunciation = lexical form
- MARKEDNESS often opposes it
- Markedness-Faithfulness dialectic ⇒ diversity
- English: FAITH ≫ NOCODA
- Polynesian: NOCODA ≫ FAITH (~French)
- Another markedness constraint M:
- Nasal Place Agreement [‘Assimilation’] (NPA):

velar: ŋg ≻ ŋb, ŋd

coronal: nd ≻ md, ŋd

labial: mb ≻ nb, ŋb
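The reranking idea behind factorial typology can be sketched for the two constraints FAITH and NOCODA. The candidate set below is hypothetical: the faithful output kæt, and an unfaithful output kæ that drops the coda (deletion is assumed here only for illustration). Each permutation of the constraints is treated as one possible grammar.

```python
from itertools import permutations

CON = ("FAITH", "NOCODA")

# Hypothetical candidates for the input /kat/ with their violation counts.
CANDIDATES = {
    "kæt": {"FAITH": 0, "NOCODA": 1},
    "kæ": {"FAITH": 1, "NOCODA": 0},
}


def winner(ranking):
    """Optimal candidate under a ranking, listed highest constraint first."""
    return min(
        CANDIDATES,
        key=lambda cand: tuple(CANDIDATES[cand][c] for c in ranking),
    )


# One grammar per ranking of Con.
typology = {ranking: winner(ranking) for ranking in permutations(CON)}
```

Under FAITH ≫ NOCODA the coda survives (the English-type pattern); under NOCODA ≫ FAITH it does not (the Polynesian-type pattern).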

Optimality Theory

- Diversity of contributions to theoretical linguistics
- Phonology & phonetics
- Syntax
- Semantics & pragmatics
- … e.g., following lectures. Now:
- Can strict domination be explained by connectionism?

Syllabification in Berber

- Dell & Elmedlaoui, 1985: Imdlawn Tashlhit Berber
- Syllable nucleus can be any segment
- But driven by universal preference for nuclei to be highest-sonority segments

OT Grammar: BrbrOT

HNUC A syllable nucleus is sonorous

ONSET A syllable has an onset

Strict Domination

Prince & Smolensky ’93/04

Harmonic Grammar: BrbrHG

- HNUC A syllable nucleus is sonorous

Nucleus of sonority $s$: Harmony $= 2^{s} - 1$

$s \in \{1, 2, \dots, 8\}$ ~ {t, d, f, z, n, l, i, a}

- ONSET *VV: Harmony $= -2^{8}$
- Theorem. The global Harmony maxima are the correct Berber core syllabifications

[of Dell & Elmedlaoui; no sonority plateaux, as in OT analysis, here & henceforth]
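A brute-force check of the theorem on small inputs, as a sketch. It assumes that a parse is a 0/1 vector (1 = nucleus), that each nucleus of sonority s contributes 2^s - 1, and that each adjacent pair of nuclei (*VV) contributes -2^8. The function names are hypothetical, and the syllable-boundary rule in `syllabify` is a simplification adequate for these examples.

```python
from itertools import product


def harmony(parse, sonority):
    """Assumed BrbrHG Harmony: HNUC rewards plus *VV penalties."""
    hnuc = sum(2 ** s - 1 for a, s in zip(parse, sonority) if a)
    vv = sum(a * b for a, b in zip(parse, parse[1:]))
    return hnuc - 2 ** 8 * vv


def global_maximum(sonority):
    """Search all 2^n corners (0/1 parses) for the highest Harmony."""
    corners = product((0, 1), repeat=len(sonority))
    return max(corners, key=lambda parse: harmony(parse, sonority))


def syllabify(parse, sonority):
    """Render a parse as a dotted sonority string such as '.1.23.78.'."""
    out = []
    for i, s in enumerate(sonority):
        starts_onset = parse[i] == 0 and i + 1 < len(parse) and parse[i + 1] == 1
        onsetless_nucleus = parse[i] == 1 and (i == 0 or parse[i - 1] == 1)
        if i == 0 or starts_onset or onsetless_nucleus:
            out.append(".")
        out.append(str(s))
    return "".join(out) + "."


hard_case = [1, 2, 3, 7, 8]
best = global_maximum(hard_case)
```

For the profile 12378, `syllabify(best, hard_case)` gives `.1.23.78.`, the parse the slides list as correct. Exhaustive search is only feasible for short strings; the network is meant to find the same maximum without enumerating corners.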

BrbrNet’s Global Harmony Maximum is the correct parse

- Contrasts with Goldsmith’s Dynamic Linear Models (Goldsmith & Larson ’90; Prince ’93)

For a given input string, a state of BrbrNet is a global Harmony maximum if and only if it realizes the syllabification produced by the serial Dell-Elmedlaoui algorithm

BrbrNet’s Search Dynamics

Greedy local optimization

- at each moment, make a small change of state so as to maximally increase Harmony
- (gradient ascent: mountain climbing in fog)
- guaranteed to construct a local maximum
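Greedy local optimization can be sketched in a discrete form: starting from the all-zero state, repeatedly flip the single unit that raises Harmony the most, and stop when no flip helps. This is a stand-in for the network's continuous gradient ascent, not the network itself, and the Harmony function (2^s - 1 per nucleus, -2^8 per adjacent pair of nuclei) is the same assumed form of BrbrHG.

```python
def harmony(state, sonority):
    """Assumed BrbrHG Harmony of a 0/1 state (1 = nucleus)."""
    hnuc = sum(2 ** s - 1 for a, s in zip(state, sonority) if a)
    vv = sum(a * b for a, b in zip(state, state[1:]))
    return hnuc - 2 ** 8 * vv


def greedy_ascent(sonority):
    """Flip the unit with the largest Harmony gain until none improves H."""
    state = [0] * len(sonority)
    while True:
        current = harmony(state, sonority)
        best_gain, best_unit = 0, None
        for i in range(len(state)):
            state[i] ^= 1  # try flipping unit i
            gain = harmony(state, sonority) - current
            state[i] ^= 1  # undo the trial flip
            if gain > best_gain:
                best_gain, best_unit = gain, i
        if best_unit is None:
            return tuple(state)  # no single flip increases Harmony
        state[best_unit] ^= 1


result = greedy_ascent([1, 2, 3, 7, 8])
```

The stopping condition guarantees only a local maximum with respect to single flips. On this input the exponentially scaled rewards lead the climb to pick nuclei in order of decreasing sonority.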

The Hardest Case: 12378 /t́.bx́.yá/*

* hypothetical, but compare t́.bx́.lá.kḱw ‘she even behaved as a miser’ [tbx́.lákkw]

Parsing sonority profile 8121345787: á.tb́.kf́.zń.yáy

Finds best of infinitely many representations: 1024 corners/parses

BrbrNet has many Local Harmony Maxima

An output pattern in BrbrNet is a local Harmony maximum if and only if it realizes a sequence of legal Berber syllables (i.e., an output of Gen)

That is, every activation value is 0 or 1, and the sequence of values is that realizing a sequence of substrings taken from the syllable inventory {CV, CVC, #V, #VC},

where C = 0, V = 1 and # = word edge
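The legality condition can be checked mechanically. The sketch below encodes a corner of the state space as a string of 0s and 1s and tests whether it splits into syllables from {CV, CVC, #V, #VC}, with the onsetless types allowed only at the left word edge. The regular expression and the helper name are illustrative assumptions.

```python
import re

# Optional word-initial V or VC, followed by any number of CV or CVC
# syllables, with C = 0 and V = 1.
LEGAL_SEQUENCE = re.compile(r"(?:10?)?(?:010?)*")


def is_legal(state):
    """True if the 0/1 state is a non-empty sequence of legal syllables."""
    bits = "".join(str(b) for b in state)
    return bool(bits) and LEGAL_SEQUENCE.fullmatch(bits) is not None


examples = {
    (1, 0, 1, 0, 1): is_legal((1, 0, 1, 0, 1)),  # V.CV.CV
    (0, 1, 0, 0, 1): is_legal((0, 1, 0, 0, 1)),  # CVC.CV
    (1, 1, 0): is_legal((1, 1, 0)),              # adjacent nuclei
    (0, 0, 1): is_legal((0, 0, 1)),              # stranded initial C
}
```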

Greedy optimization avoids local maxima: why?

HG → OT’s Strict Domination

- Strict Domination: Baffling from a connectionist perspective?
- Explicable from a connectionist perspective?
- Exponential BrbrNet escapes local H maxima
- Linear BrbrNet does not

Linear BrbrNet makes errors

- (~ Goldsmith-Larson network)
- Error: /12378/ → .123.78. (correct: .1.23.78.)

Subsymbolic Harmony optimization can be stochastic

- The search for an optimal state can employ randomness
- Equations for units’ activation values have random terms
- $\mathrm{pr}(a) \propto e^{H(a)/T}$
- T (‘temperature’) ~ randomness → 0 during search
- Boltzmann Machine (Hinton and Sejnowski 1983, 1986); Harmony Theory (Smolensky 1983, 1986)
- Can guarantee computation of global optimum in principle
- In practice: how fast? Exponential vs. linear BrbrNet
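A sketch of stochastic search on a binary version of the Berber Harmony function. Each unit is resampled with probability given by pr(a) ∝ e^{H(a)/T}, which for a single 0/1 unit reduces to a logistic function of its Harmony gain divided by T, while T is lowered geometrically. The cooling schedule, the number of sweeps, the seed, and the final zero-temperature clean-up pass are all assumptions, and the discrete units stand in for the continuous activations of the actual network.

```python
import math
import random


def local_field(state, sonority, i):
    """Harmony gain from setting unit i to 1 rather than 0 (assumed BrbrHG)."""
    neighbours = sum(state[j] for j in (i - 1, i + 1) if 0 <= j < len(state))
    return (2 ** sonority[i] - 1) - 2 ** 8 * neighbours


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def anneal(sonority, t_start=200.0, t_end=0.5, sweeps=200, seed=0):
    rng = random.Random(seed)
    n = len(sonority)
    state = [rng.randint(0, 1) for _ in range(n)]
    for k in range(sweeps):
        # Geometric cooling from t_start down to t_end.
        temp = t_start * (t_end / t_start) ** (k / (sweeps - 1))
        for i in range(n):
            p_on = sigmoid(local_field(state, sonority, i) / temp)
            state[i] = 1 if rng.random() < p_on else 0
    # Zero-temperature clean-up: settle deterministically into a state
    # that no single flip can improve.
    changed = True
    while changed:
        changed = False
        for i in range(n):
            target = 1 if local_field(state, sonority, i) > 0 else 0
            if state[i] != target:
                state[i], changed = target, True
    return tuple(state)


profile = [1, 2, 3, 7, 8]
final_state = anneal(profile)
```

The returned state is guaranteed only to be a local maximum under single flips; whether, and how quickly, a given schedule reaches the global maximum is the practical question the slides raise.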

Stochastic BrbrNet: Exponential can succeed ‘fast’

[Figure: 5-run average.]
