The Harmonic Mind • Paul Smolensky, Cognitive Science Department, Johns Hopkins University • with: Géraldine Legendre, Donald Mathis, Melanie Soderstrom, Alan Prince, Suzanne Stevenson, Peter Jusczyk†
Advertisement • The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, Paul Smolensky & Géraldine Legendre, Blackwell 2003 (??) • Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture • A case study in formalist multidisciplinary cognitive science (points out imports/exports of ICS)
Cognitive Science 101 • Computation is cognition • But what type? • Fundamental question of research on the human cognitive architecture
Table of Contents Implications of architecture for nativism • Learnability • Initial state • Experimental test: Infants • (Genomic encoding of UG)
Processing Algorithm: Activation • Computational neuroscience → ICS • Key sources: Hopfield 1982, 1984; Cohen and Grossberg 1983; Hinton and Sejnowski 1983, 1986; Smolensky 1983, 1986; Geman and Geman 1984; Golden 1986, 1988 • Processing — spreading activation — is optimization: Harmony maximization • [Figure: two units a1 and a2 with external inputs i1 (0.6) and i2 (0.5), linked by an inhibitory connection of weight –λ (–0.9)]
Function: Optimization • Cognitive psychology → ICS • Key sources: Hinton & Anderson 1981; Rumelhart, McClelland, & the PDP Group 1986 • Harmony maximization is satisfaction of parallel, violable constraints • [Figure: a1 must be active (strength 0.6); a2 must be active (strength 0.5); a1 and a2 must not be simultaneously active (strength λ = 0.9). CONFLICT!! Optimal compromise: a1 = 0.79, a2 = –0.21]
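To make the optimal compromise concrete, here is a minimal Python sketch. It assumes the Harmony function H(a) = iᵀa + ½ aᵀW a with a self-weight of –1 on each unit (an assumption; the slide gives only λ = 0.9 and the two input strengths), under which gradient ascent settles exactly at the activations shown on the slide.

```python
import numpy as np

# Two-unit network from the slide: inputs i1 = 0.6, i2 = 0.5, and an
# inhibitory connection -lambda = -0.9 between a1 and a2. The -1
# diagonal (unit self-decay) is an assumption that makes H concave.
i = np.array([0.6, 0.5])
W = np.array([[-1.0, -0.9],
              [-0.9, -1.0]])

def harmony(a):
    return i @ a + 0.5 * a @ W @ a

# Spreading activation as gradient ascent on Harmony.
a = np.zeros(2)
for _ in range(1000):
    a += 0.1 * (i + W @ a)      # dH/da = i + W a

print(np.round(a, 2))           # [ 0.79 -0.21]: the optimal compromise
print(round(harmony(a), 3))     # the maximal Harmony value
```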
Representation • Symbolic theory → ICS: complex symbol structures • Generative linguistics → ICS: particular linguistic representations • PDP connectionism → ICS: distributed activation patterns • ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)
Representation • [Figure: the structure [σ k [æ t]] realized as a tensor product representation, with filler/role bindings σ/rε, k/r0, æ/r01, t/r11]
Knowledge: Constraints • NOCODA: A syllable has no coda • [Figure: in a[σ k [æ t]], the coda t incurs a *violation] • The constraint is encoded in the weight matrix W: H(a[σ k [æ t]]) = –s_NOCODA < 0
Constraint Interaction I • ICS → grammatical theory: Harmonic Grammar • Legendre, Miyata, Smolensky 1990 et seq.
Constraint Interaction I • The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths • Any formal language can be so generated • [Figure: H(k æ t) sums a positive term from ONSET (onset k) and a negative term from NOCODA (coda t)]
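A minimal sketch of this idea, with hypothetical constraint strengths and hand-coded violation counts (not the network's actual numbers): Harmony is the negated weighted sum of violations, and the grammar outputs the candidate that maximizes it.

```python
# Harmonic Grammar in miniature: weighted, violable constraints.
weights = {"FAITH": 2.0, "NOCODA": 1.0}       # assumed strengths

candidates = {                                 # hand-coded violations
    "[.kæt.]": {"FAITH": 0, "NOCODA": 1},     # keep the coda
    "[.kæ.]":  {"FAITH": 1, "NOCODA": 0},     # delete /t/
}

def harmony(viols):
    # Each violation lowers Harmony by the violated constraint's strength.
    return -sum(weights[c] * n for c, n in viols.items())

best = max(candidates, key=lambda c: harmony(candidates[c]))
print(best, harmony(candidates[best]))        # [.kæt.] -1.0
```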
Harmonic Grammar Parser • Simple, comprehensible network • Simple grammar G: X → A B, Y → B A • Task: language completion • [Figure: top-down completion (given X or Y, fill in A B or B A) and bottom-up completion (given A B or B A, fill in X or Y)]
Harmonic Grammar Parser • Representations: filler vectors A, B, X, Y; role vectors rε = 1, r0 = (1 1), r1 = (1 –1); fillers are bound to roles by the tensor product ⊗ • [Figure: twelve units ①–⑫ spanning the depth-0 and depth-1 positions, with i, j, k ∊ {A, B, X, Y}]
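A small sketch of these tensor product representations using the role vectors above; the filler coordinates are assumed one-hot, since the slide does not give them. Because r0 and r1 are orthogonal, bindings superimpose without interference and fillers are recovered by contraction.

```python
import numpy as np

# Role vectors from the slide; fillers A, B, X, Y assumed one-hot.
fillers = {f: v for f, v in zip("ABXY", np.eye(4))}
roles = {"r0": np.array([1.0, 1.0]), "r1": np.array([1.0, -1.0])}

def bind(f, r):
    # One filler/role binding: the tensor (Kronecker) product.
    return np.kron(fillers[f], roles[r])

# The string 'A B' = A in position 0 + B in position 1.
s = bind("A", "r0") + bind("B", "r1")

# Unbinding: r0 and r1 are orthogonal, so the filler in position 0
# is recovered by contracting s with r0 (up to the norm |r0|^2 = 2).
F = s.reshape(4, 2)          # filler index x role index
print(F @ roles["r0"] / 2)   # [1. 0. 0. 0.] = filler A
```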
Harmonic Grammar Parser • Weight matrix for Y → B A: H(Y, B—) > 0 and H(Y, —A) > 0 (positive Harmony for Y with B as its left child, and for Y with A as its right child)
Harmonic Grammar Parser • Weight matrix for entire grammar G
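The slides do not spell out how the full weight matrix is assembled; one standard construction (a sketch only, assuming one-hot fillers and unit rule strengths) adds, for each rule, symmetric outer-product Harmony terms linking the parent at the root to each child in its position.

```python
import numpy as np

fillers = {f: v for f, v in zip("ABXY", np.eye(4))}
r0, r1 = np.array([1.0, 1.0]), np.array([1.0, -1.0])

def vec(f, r=None):
    """One binding embedded in the 12-dim state (4 root + 8 child units)."""
    if r is None:                      # filler at the root
        return np.concatenate([fillers[f], np.zeros(8)])
    return np.concatenate([np.zeros(4), np.kron(fillers[f], r)])

def add_rule(W, parent, left, right, s=1.0):
    # Each rule rewards parent-at-root co-occurring with each child.
    p = vec(parent)
    for child, role in [(left, r0), (right, r1)]:
        c = vec(child, role)
        W += s * (np.outer(p, c) + np.outer(c, p))   # symmetric H term
    return W

W = np.zeros((12, 12))
W = add_rule(W, "X", "A", "B")   # X -> A B
W = add_rule(W, "Y", "B", "A")   # Y -> B A

# H(state) = 0.5 * state.W.state rewards rule-licensed combinations,
# e.g. Y at the root with B as its left child:
state = vec("Y") + vec("B", r0)
print(0.5 * state @ W @ state)   # 2.0 > 0, matching H(Y, B-) > 0
```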
Scaling up • Not yet … • Still conceptual obstacles to surmount
Constraint Interaction II: OT • ICS → grammatical theory: Optimality Theory • Prince & Smolensky 1993
Constraint Interaction II: OT • Differential strength encoded in strict domination hierarchies: • Every constraint has complete priority over all lower-ranked constraints (combined) • Approximate numerical encoding employs special (exponentially growing) weights
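A quick illustration of that claim, with hypothetical constraints C1 ≫ C2 ≫ C3: as long as each weight exceeds the combined worst-case cost of all lower-ranked constraints (here powers of 10, with violation counts below 10), the weighted-sum comparison agrees with strict domination.

```python
# Strict domination vs. exponentially growing numerical weights.
ranking = ["C1", "C2", "C3"]                  # C1 >> C2 >> C3
weights = {c: 10 ** (len(ranking) - i - 1) for i, c in enumerate(ranking)}

def ot_better(a, b):
    """a beats b under strict domination (lexicographic comparison)."""
    return [a[c] for c in ranking] < [b[c] for c in ranking]

def hg_better(a, b):
    """a beats b under the weighted-sum (numerical) criterion."""
    cost = lambda v: sum(weights[c] * v[c] for c in ranking)
    return cost(a) < cost(b)

# One violation of C1 outweighs many violations of C2 and C3 combined:
a = {"C1": 0, "C2": 5, "C3": 4}
b = {"C1": 1, "C2": 0, "C3": 0}
print(ot_better(a, b), hg_better(a, b))       # True True
```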
Constraint Interaction II: OT • “Grammars can’t count” • Stress is on the initial heavy syllable iff the number of light syllables n obeys a numerical condition: no human grammar counts like that (“No way, man”)
Constraint Interaction II: OT • Constraints are universal • Human grammars differ only in how these constraints are ranked • ‘factorial typology’ • First true contender for a formal theory of cross-linguistic typology
The Faithfulness / Markedness Dialectic • ‘cat’: /kæt/ → kæt *NOCODA — why? • FAITHFULNESS requires identity • MARKEDNESS often opposes it • Markedness/Faithfulness dialectic → diversity • English: FAITH ≫ NOCODA (~French) • Polynesian: NOCODA ≫ FAITH • Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA): labial mb ≻ nb, ŋb; coronal nd ≻ md, ŋd; velar ŋg ≻ mg, ng
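The same dialectic as a re-ranking sketch, with hand-coded violations for the two ‘cat’ candidates: exchanging the ranking of the two constraints flips the winner, giving the English-type and Polynesian-type patterns.

```python
# Factorial typology in miniature: same constraints, different rankings.
candidates = {
    "[.kæt.]": {"FAITH": 0, "NOCODA": 1},   # faithful, keeps the coda
    "[.kæ.]":  {"FAITH": 1, "NOCODA": 0},   # deletes /t/: no coda
}

def optimal(ranking):
    # Strict domination = lexicographic comparison of violation vectors.
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

print(optimal(["FAITH", "NOCODA"]))   # [.kæt.]  (English-type)
print(optimal(["NOCODA", "FAITH"]))   # [.kæ.]   (Polynesian-type)
```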
Nativism I: Learnability • Learning algorithm • Provably correct and efficient (under strong assumptions) • Sources: • Tesar 1995 et seq. • Tesar & Smolensky 1993, …, 2000 • If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
Constraint Demotion Learning • If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E • Correctly handles the difficult case: multiple violations in E
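A toy sketch of Constraint Demotion (after Tesar & Smolensky), simplified to single marks per constraint (properly handling the multiple-violation case the slide mentions needs multisets of marks): constraints live in strata, shared marks cancel, and each constraint violated by the heard form A is minimally demoted below the highest stratum containing a constraint violated by the expected form E.

```python
def demote(strata, winner_marks, loser_marks):
    """One Constraint Demotion step on a stratified hierarchy."""
    # Mark cancellation: ignore violations shared by winner and loser.
    w = winner_marks - loser_marks
    l = loser_marks - winner_marks
    # Highest stratum (smallest index) with an uncancelled loser mark.
    pivot = min(i for i, s in enumerate(strata) if s & l)
    for c in list(w):
        i = next(i for i, s in enumerate(strata) if c in s)
        if i <= pivot:                       # c is not yet dominated
            strata[i].discard(c)
            if pivot + 1 == len(strata):
                strata.append(set())
            strata[pivot + 1].add(c)         # minimal demotion
    return [s for s in strata if s]

# Toy run, echoing /in+possible/ -> impossible: the heard form violates
# F(AITH), the expected form violates M (NPA), so F is demoted below M.
strata = [{"M", "F"}]                        # initial single stratum
print(demote(strata, winner_marks={"F"}, loser_marks={"M"}))
# [{'M'}, {'F'}]  i.e. M >> F after one step
```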
Nativism I: Learnability • M ≫ F is learnable with /in+possible/ → impossible • ‘not’ = in- except when followed by … • “exception that proves the rule”: M = NPA • M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: √M and √F, no M vs. F conflict, no evidence for their ranking • Thus the initial state ℌ₀ must have M ≫ F
Nativism II: Experimental Test • Collaborators • Peter Jusczyk • Theresa Allocco • (Elliott Moreton, Karen Arnold) • Linking hypothesis: More harmonic phonological stimuli ⇒ Longer listening time • More harmonic: • M ≻ *M, when equal on F • F ≻ *F, when equal on M • When must choose one or the other, more harmonic to satisfy M: M ≫ F • M = Nasal Place Assimilation (NPA)
Experimental Paradigm • Headturn Preference Procedure (Kemler Nelson et al. ’95; Jusczyk ’97) • X/Y/XY paradigm (P. Jusczyk): e.g. um…b…umb vs. um…b…iŋgu; iŋ…gu…iŋgu vs. iŋ…gu…umb • Highly general paradigm • Main result: with un…b…umb stimuli (*FNP), p = .006 ⇒ ∃FAITH
Nativism III: UGenome • Can we combine • Connectionist realization of harmonic grammar • OT’s characterization of UG to examine the biological plausibility of UG as innate knowledge? • Collaborators • Melanie Soderstrom • Donald Mathis • Oren Schwartz
Nativism III: UGenome • The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device • Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ • Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
Summary • Described an attempt to integrate • Connectionist theory of mental processes (computational neuroscience, cognitive psychology) • Symbolic theory of mental functions (philosophy, linguistics) and of representations: general structure (philosophy, AI), specific structure (linguistics) • Informs theory of UG: form, content, genetic encoding
The Problem • No concrete examples of such a LAD exist • Even highly simplified cases pose a hard problem: how can genes, which regulate production of proteins, encode symbolic principles of grammar? • Test preparation: Syllable Theory
Approach: Multiple Levels of Encoding • [Diagram: the Abstract Neural Network instantiates the Grammar (innate constraints); the Abstract Genome encodes the Abstract Neural Network; in parallel, the Biological Genome encodes the Biological Neural Network]
Basic syllabification: Function • ƒ: /underlying form/ → [surface form] • Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.], i.e. /CVCC/ → [.CV.CVC.] • Basic CV Syllable Structure Theory • Prince & Smolensky 1993: Chapter 6 • ‘Basic’ — no more than one segment per syllable position: .(C)V(C).
Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output • FILL(V/C): Every output V/C segment corresponds to an input V/C segment • ONSET: No V without a preceding C • NOCODA: No C without a following V
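A sketch of how these constraints pick the /CVCC/ → [.CV.CVC.] mapping from the earlier slide. The candidate set and violation counts are hand-coded, and the ranking PARSE ≫ FILL ≫ ONSET ≫ NOCODA is one assumption consistent with that mapping (others give other languages).

```python
# Candidate parses of /CVCC/ with hand-coded violation counts.
candidates = {
    "[.CV.CVC.]":   {"PARSE": 0, "FILL": 1, "ONSET": 0, "NOCODA": 1},  # one epenthetic V
    "[.CV.CV.CV.]": {"PARSE": 0, "FILL": 2, "ONSET": 0, "NOCODA": 0},  # two epenthetic V
    "[.CV.CV.]":    {"PARSE": 1, "FILL": 1, "ONSET": 0, "NOCODA": 0},  # delete one C
    "[.CV.]":       {"PARSE": 2, "FILL": 0, "ONSET": 0, "NOCODA": 0},  # delete both C
}
ranking = ["PARSE", "FILL", "ONSET", "NOCODA"]   # assumed ranking

def optimal(cands, ranking):
    # Strict domination = lexicographic comparison of violation vectors.
    return min(cands, key=lambda c: [cands[c][k] for k in ranking])

print(optimal(candidates, ranking))   # [.CV.CVC.]
```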
SAnet architecture • /C1 C2/ → [C1 V C2] • [Figure: network units for the C and V positions realizing this input–output mapping]
Connection substructure • Each weight factors into a local part, fixed and genetically determined (the content of constraint i), and a global part, variable during learning (the strength si of constraint i) • Network weight: W = Σi si Wi • Network input: ι = W a
PARSE • All connection coefficients are +2 • [Figure: the PARSE subnetwork over the C and V units]
ONSET • All connection coefficients are –1 • [Figure: the ONSET subnetwork over the C and V units]
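Pulling the last three slides together, a sketch of the substructure idea: the full weight matrix is the strength-weighted sum of fixed constraint subnetworks, W = Σi si Wi, with input ι = W a. The 2×2 matrices below are placeholders, not the SAnet's actual connectivity; only the coefficient magnitudes echo the preceding slides (the negative sign for the penalizing ONSET constraint is an inference).

```python
import numpy as np

# Each constraint i contributes a fixed pattern of connection
# coefficients W_i (its content, genetically determined) scaled by a
# strength s_i (plastic, tuned during learning).
W_parse = np.array([[0.0,  2.0], [ 2.0, 0.0]])   # PARSE: coefficients +2
W_onset = np.array([[0.0, -1.0], [-1.0, 0.0]])   # ONSET: coefficients -1
s = {"PARSE": 1.0, "ONSET": 3.0}                 # strengths (learned)

W = s["PARSE"] * W_parse + s["ONSET"] * W_onset  # W = sum_i s_i W_i
a = np.array([1.0, 0.5])                         # some activation state
iota = W @ a                                     # net input to each unit
print(W, iota, sep="\n")
```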
Crucial Open Question (Truth in Advertising) • Relation between strict domination and neural networks? • Apparently not a problem in the case of the CV Theory
To be encoded • How many different kinds of units are there? • What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it? • How are constraints initially specified? • How are they maintained through the learning process?