The Harmonic Mind • Paul Smolensky, Cognitive Science Department, Johns Hopkins University • with: Géraldine Legendre, Donald Mathis, Melanie Soderstrom, Alan Prince, Suzanne Stevenson, Peter Jusczyk†
Advertisement • The Harmonic Mind: From Neural Computation to Optimality-Theoretic Grammar, Paul Smolensky & Géraldine Legendre, Blackwell 2003 (??) • Develops the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture • A case study in formalist multidisciplinary cognitive science (points out imports/exports of ICS)
Cognitive Science 101 • Computation is cognition • But what type? • Fundamental question of research on the human cognitive architecture
Table of Contents Implications of architecture for nativism • Learnability • Initial state • Experimental test: Infants • (Genomic encoding of UG)
Processing Algorithm: Activation • Computational neuroscience → ICS • Key sources: Hopfield 1982, 1984; Cohen and Grossberg 1983; Hinton and Sejnowski 1983, 1986; Smolensky 1983, 1986; Geman and Geman 1984; Golden 1986, 1988 • Processing — spreading activation — is optimization: Harmony maximization • [Figure: two units a1 and a2 with external inputs i1 (0.6) and i2 (0.5), linked by an inhibitory connection of weight –λ (–0.9)]
Function: Optimization • Cognitive psychology → ICS • Key sources: Hinton & Anderson 1981; Rumelhart, McClelland, & the PDP Group 1986 • Harmony maximization is satisfaction of parallel, violable constraints • [Figure: a1 must be active (strength 0.6); a2 must be active (strength 0.5); a1 and a2 must not be simultaneously active (strength λ = 0.9). CONFLICT!! Optimal compromise: a1 = 0.79, a2 = –0.21]
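To make the optimal compromise concrete, here is a minimal Python sketch. It assumes the Harmony function H(a) = iᵀa + ½ aᵀW a with a self-weight of –1 on each unit (an assumption; the slide gives only λ = 0.9 and the two input strengths), under which gradient ascent settles exactly at the activations shown on the slide.

```python
import numpy as np

# Two-unit network from the slide: inputs i1 = 0.6, i2 = 0.5, and an
# inhibitory connection -lambda = -0.9 between a1 and a2. The -1
# diagonal (unit self-decay) is an assumption that makes H concave.
i = np.array([0.6, 0.5])
W = np.array([[-1.0, -0.9],
              [-0.9, -1.0]])

def harmony(a):
    return i @ a + 0.5 * a @ W @ a

# Spreading activation as gradient ascent on Harmony.
a = np.zeros(2)
for _ in range(1000):
    a += 0.1 * (i + W @ a)      # dH/da = i + W a

print(np.round(a, 2))           # [ 0.79 -0.21]: the optimal compromise
print(round(harmony(a), 3))     # the maximal Harmony value
```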
Representation • Symbolic theory → ICS: complex symbol structures • Generative linguistics → ICS: particular linguistic representations • PDP connectionism → ICS: distributed activation patterns • ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)
Representation • [Figure: the structure [σ k [æ t]] realized as a tensor product representation, with filler/role bindings σ/rε, k/r0, æ/r01, t/r11]
Knowledge: Constraints • NOCODA: A syllable has no coda • [Figure: in a[σ k [æ t]], the coda t incurs a *violation] • The constraint is encoded in the weight matrix W: H(a[σ k [æ t]]) = –s_NOCODA < 0
Constraint Interaction I • ICS → grammatical theory: Harmonic Grammar • Legendre, Miyata, Smolensky 1990 et seq.
Constraint Interaction I • The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths • Any formal language can be so generated • [Figure: H(k æ t) sums a positive term from ONSET (onset k) and a negative term from NOCODA (coda t)]
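A minimal sketch of this idea, with hypothetical constraint strengths and hand-coded violation counts (not the network's actual numbers): Harmony is the negated weighted sum of violations, and the grammar outputs the candidate that maximizes it.

```python
# Harmonic Grammar in miniature: weighted, violable constraints.
weights = {"FAITH": 2.0, "NOCODA": 1.0}       # assumed strengths

candidates = {                                 # hand-coded violations
    "[.kæt.]": {"FAITH": 0, "NOCODA": 1},     # keep the coda
    "[.kæ.]":  {"FAITH": 1, "NOCODA": 0},     # delete /t/
}

def harmony(viols):
    # Each violation lowers Harmony by the violated constraint's strength.
    return -sum(weights[c] * n for c, n in viols.items())

best = max(candidates, key=lambda c: harmony(candidates[c]))
print(best, harmony(candidates[best]))        # [.kæt.] -1.0
```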
Harmonic Grammar Parser • Simple, comprehensible network • Simple grammar G: X → A B, Y → B A • Task: language completion • [Figure: top-down completion (given X or Y, fill in A B or B A) and bottom-up completion (given A B or B A, fill in X or Y)]
Harmonic Grammar Parser • Representations: filler vectors A, B, X, Y; role vectors rε = 1, r0 = (1 1), r1 = (1 –1); fillers are bound to roles by the tensor product ⊗ • [Figure: twelve units ①–⑫ spanning the depth-0 and depth-1 positions, with i, j, k ∊ {A, B, X, Y}]
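A small sketch of these tensor product representations using the role vectors above; the filler coordinates are assumed one-hot, since the slide does not give them. Because r0 and r1 are orthogonal, bindings superimpose without interference and fillers are recovered by contraction.

```python
import numpy as np

# Role vectors from the slide; fillers A, B, X, Y assumed one-hot.
fillers = {f: v for f, v in zip("ABXY", np.eye(4))}
roles = {"r0": np.array([1.0, 1.0]), "r1": np.array([1.0, -1.0])}

def bind(f, r):
    # One filler/role binding: the tensor (Kronecker) product.
    return np.kron(fillers[f], roles[r])

# The string 'A B' = A in position 0 + B in position 1.
s = bind("A", "r0") + bind("B", "r1")

# Unbinding: r0 and r1 are orthogonal, so the filler in position 0
# is recovered by contracting s with r0 (up to the norm |r0|^2 = 2).
F = s.reshape(4, 2)          # filler index x role index
print(F @ roles["r0"] / 2)   # [1. 0. 0. 0.] = filler A
```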
Harmonic Grammar Parser • Weight matrix for Y → B A: H(Y, B—) > 0 and H(Y, —A) > 0 (positive Harmony for Y with B as its left child, and for Y with A as its right child)
Harmonic Grammar Parser • Weight matrix for entire grammar G
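The slides do not spell out how the full weight matrix is assembled; one standard construction (a sketch only, assuming one-hot fillers and unit rule strengths) adds, for each rule, symmetric outer-product Harmony terms linking the parent at the root to each child in its position.

```python
import numpy as np

fillers = {f: v for f, v in zip("ABXY", np.eye(4))}
r0, r1 = np.array([1.0, 1.0]), np.array([1.0, -1.0])

def vec(f, r=None):
    """One binding embedded in the 12-dim state (4 root + 8 child units)."""
    if r is None:                      # filler at the root
        return np.concatenate([fillers[f], np.zeros(8)])
    return np.concatenate([np.zeros(4), np.kron(fillers[f], r)])

def add_rule(W, parent, left, right, s=1.0):
    # Each rule rewards parent-at-root co-occurring with each child.
    p = vec(parent)
    for child, role in [(left, r0), (right, r1)]:
        c = vec(child, role)
        W += s * (np.outer(p, c) + np.outer(c, p))   # symmetric H term
    return W

W = np.zeros((12, 12))
W = add_rule(W, "X", "A", "B")   # X -> A B
W = add_rule(W, "Y", "B", "A")   # Y -> B A

# H(state) = 0.5 * state.W.state rewards rule-licensed combinations,
# e.g. Y at the root with B as its left child:
state = vec("Y") + vec("B", r0)
print(0.5 * state @ W @ state)   # 2.0 > 0, matching H(Y, B-) > 0
```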
Scaling up • Not yet … • Still conceptual obstacles to surmount
Constraint Interaction II: OT • ICS → grammatical theory: Optimality Theory • Prince & Smolensky 1993
Constraint Interaction II: OT • Differential strength encoded in strict domination hierarchies: • Every constraint has complete priority over all lower-ranked constraints (combined) • Approximate numerical encoding employs special (exponentially growing) weights
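A quick illustration of that claim, with hypothetical constraints C1 ≫ C2 ≫ C3: as long as each weight exceeds the combined worst-case cost of all lower-ranked constraints (here powers of 10, with violation counts below 10), the weighted-sum comparison agrees with strict domination.

```python
# Strict domination vs. exponentially growing numerical weights.
ranking = ["C1", "C2", "C3"]                  # C1 >> C2 >> C3
weights = {c: 10 ** (len(ranking) - i - 1) for i, c in enumerate(ranking)}

def ot_better(a, b):
    """a beats b under strict domination (lexicographic comparison)."""
    return [a[c] for c in ranking] < [b[c] for c in ranking]

def hg_better(a, b):
    """a beats b under the weighted-sum (numerical) criterion."""
    cost = lambda v: sum(weights[c] * v[c] for c in ranking)
    return cost(a) < cost(b)

# One violation of C1 outweighs many violations of C2 and C3 combined:
a = {"C1": 0, "C2": 5, "C3": 4}
b = {"C1": 1, "C2": 0, "C3": 0}
print(ot_better(a, b), hg_better(a, b))       # True True
```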
Constraint Interaction II: OT • “Grammars can’t count” • Stress is on the initial heavy syllable iff the number of light syllables n obeys a numerical condition: no human grammar counts like that (“No way, man”)
Constraint Interaction II: OT • Constraints are universal • Human grammars differ only in how these constraints are ranked • ‘factorial typology’ • First true contender for a formal theory of cross-linguistic typology
The Faithfulness / Markedness Dialectic • ‘cat’: /kæt/ → kæt *NOCODA — why? • FAITHFULNESS requires identity • MARKEDNESS often opposes it • Markedness/Faithfulness dialectic → diversity • English: FAITH ≫ NOCODA (~French) • Polynesian: NOCODA ≫ FAITH • Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA): labial mb ≻ nb, ŋb; coronal nd ≻ md, ŋd; velar ŋg ≻ mg, ng
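The same dialectic as a re-ranking sketch, with hand-coded violations for the two ‘cat’ candidates: exchanging the ranking of the two constraints flips the winner, giving the English-type and Polynesian-type patterns.

```python
# Factorial typology in miniature: same constraints, different rankings.
candidates = {
    "[.kæt.]": {"FAITH": 0, "NOCODA": 1},   # faithful, keeps the coda
    "[.kæ.]":  {"FAITH": 1, "NOCODA": 0},   # deletes /t/: no coda
}

def optimal(ranking):
    # Strict domination = lexicographic comparison of violation vectors.
    return min(candidates, key=lambda c: [candidates[c][k] for k in ranking])

print(optimal(["FAITH", "NOCODA"]))   # [.kæt.]  (English-type)
print(optimal(["NOCODA", "FAITH"]))   # [.kæ.]   (Polynesian-type)
```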
Nativism I: Learnability • Learning algorithm • Provably correct and efficient (under strong assumptions) • Sources: • Tesar 1995 et seq. • Tesar & Smolensky 1993, …, 2000 • If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E
Constraint Demotion Learning • If you hear A when you expected to hear E, minimally demote each constraint violated by A below a constraint violated by E • Correctly handles the difficult case: multiple violations in E
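A toy sketch of Constraint Demotion (after Tesar & Smolensky), simplified to single marks per constraint (properly handling the multiple-violation case the slide mentions needs multisets of marks): constraints live in strata, shared marks cancel, and each constraint violated by the heard form A is minimally demoted below the highest stratum containing a constraint violated by the expected form E.

```python
def demote(strata, winner_marks, loser_marks):
    """One Constraint Demotion step on a stratified hierarchy."""
    # Mark cancellation: ignore violations shared by winner and loser.
    w = winner_marks - loser_marks
    l = loser_marks - winner_marks
    # Highest stratum (smallest index) with an uncancelled loser mark.
    pivot = min(i for i, s in enumerate(strata) if s & l)
    for c in list(w):
        i = next(i for i, s in enumerate(strata) if c in s)
        if i <= pivot:                       # c is not yet dominated
            strata[i].discard(c)
            if pivot + 1 == len(strata):
                strata.append(set())
            strata[pivot + 1].add(c)         # minimal demotion
    return [s for s in strata if s]

# Toy run, echoing /in+possible/ -> impossible: the heard form violates
# F(AITH), the expected form violates M (NPA), so F is demoted below M.
strata = [{"M", "F"}]                        # initial single stratum
print(demote(strata, winner_marks={"F"}, loser_marks={"M"}))
# [{'M'}, {'F'}]  i.e. M >> F after one step
```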
Nativism I: Learnability • M ≫ F is learnable with /in+possible/ → impossible • ‘not’ = in- except when followed by … • “exception that proves the rule”: M = NPA • M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort, e.g., if there are no affixes and all underlying morphemes have mp: √M and √F, no M vs. F conflict, no evidence for their ranking • Thus the initial state ℌ₀ must have M ≫ F
Nativism II: Experimental Test • Collaborators • Peter Jusczyk • Theresa Allocco • (Elliott Moreton, Karen Arnold) • Linking hypothesis: More harmonic phonological stimuli ⇒ Longer listening time • More harmonic: • M ≻ *M, when equal on F • F ≻ *F, when equal on M • When must choose one or the other, more harmonic to satisfy M: M ≫ F • M = Nasal Place Assimilation (NPA)
Experimental Paradigm • Headturn Preference Procedure (Kemler Nelson et al. ’95; Jusczyk ’97) • X/Y/XY paradigm (P. Jusczyk): e.g. um…b…umb vs. um…b…iŋgu; iŋ…gu…iŋgu vs. iŋ…gu…umb • Highly general paradigm • Main result: with un…b…umb stimuli (*FNP), p = .006 ⇒ ∃FAITH
Nativism III: UGenome • Can we combine • Connectionist realization of harmonic grammar • OT’s characterization of UG to examine the biological plausibility of UG as innate knowledge? • Collaborators • Melanie Soderstrom • Donald Mathis • Oren Schwartz
Nativism III: UGenome • The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device • Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ • Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!
Summary • Described an attempt to integrate • Connectionist theory of mental processes (computational neuroscience, cognitive psychology) • Symbolic theory of mental functions (philosophy, linguistics) and of representations: general structure (philosophy, AI), specific structure (linguistics) • Informs theory of UG: form, content, genetic encoding
The Problem • No concrete examples of such a LAD exist • Even highly simplified cases pose a hard problem: how can genes, which regulate production of proteins, encode symbolic principles of grammar? • Test preparation: Syllable Theory
Approach: Multiple Levels of Encoding • [Diagram: the Abstract Neural Network instantiates the Grammar (innate constraints); the Abstract Genome encodes the Abstract Neural Network; in parallel, the Biological Genome encodes the Biological Neural Network]
Basic syllabification: Function • ƒ: /underlying form/ → [surface form] • Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.], i.e. /CVCC/ → [.CV.CVC.] • Basic CV Syllable Structure Theory • Prince & Smolensky 1993: Chapter 6 • ‘Basic’ — no more than one segment per syllable position: .(C)V(C).
Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output • FILL(V/C): Every output V/C segment corresponds to an input V/C segment • ONSET: No V without a preceding C • NOCODA: No C without a following V
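A sketch of how these constraints pick the /CVCC/ → [.CV.CVC.] mapping from the earlier slide. The candidate set and violation counts are hand-coded, and the ranking PARSE ≫ FILL ≫ ONSET ≫ NOCODA is one assumption consistent with that mapping (others give other languages).

```python
# Candidate parses of /CVCC/ with hand-coded violation counts.
candidates = {
    "[.CV.CVC.]":   {"PARSE": 0, "FILL": 1, "ONSET": 0, "NOCODA": 1},  # one epenthetic V
    "[.CV.CV.CV.]": {"PARSE": 0, "FILL": 2, "ONSET": 0, "NOCODA": 0},  # two epenthetic V
    "[.CV.CV.]":    {"PARSE": 1, "FILL": 1, "ONSET": 0, "NOCODA": 0},  # delete one C
    "[.CV.]":       {"PARSE": 2, "FILL": 0, "ONSET": 0, "NOCODA": 0},  # delete both C
}
ranking = ["PARSE", "FILL", "ONSET", "NOCODA"]   # assumed ranking

def optimal(cands, ranking):
    # Strict domination = lexicographic comparison of violation vectors.
    return min(cands, key=lambda c: [cands[c][k] for k in ranking])

print(optimal(candidates, ranking))   # [.CV.CVC.]
```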
SAnet architecture • /C1 C2/ → [C1 V C2] • [Figure: network units for the C and V positions realizing this input–output mapping]
Connection substructure • Each weight factors into a local part, fixed and genetically determined (the content of constraint i), and a global part, variable during learning (the strength si of constraint i) • Network weight: W = Σi si Wi • Network input: ι = W a
PARSE • All connection coefficients are +2 • [Figure: the PARSE subnetwork over the C and V units]
ONSET • All connection coefficients are –1 • [Figure: the ONSET subnetwork over the C and V units]
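Pulling the last three slides together, a sketch of the substructure idea: the full weight matrix is the strength-weighted sum of fixed constraint subnetworks, W = Σi si Wi, with input ι = W a. The 2×2 matrices below are placeholders, not the SAnet's actual connectivity; only the coefficient magnitudes echo the preceding slides (the negative sign for the penalizing ONSET constraint is an inference).

```python
import numpy as np

# Each constraint i contributes a fixed pattern of connection
# coefficients W_i (its content, genetically determined) scaled by a
# strength s_i (plastic, tuned during learning).
W_parse = np.array([[0.0,  2.0], [ 2.0, 0.0]])   # PARSE: coefficients +2
W_onset = np.array([[0.0, -1.0], [-1.0, 0.0]])   # ONSET: coefficients -1
s = {"PARSE": 1.0, "ONSET": 3.0}                 # strengths (learned)

W = s["PARSE"] * W_parse + s["ONSET"] * W_onset  # W = sum_i s_i W_i
a = np.array([1.0, 0.5])                         # some activation state
iota = W @ a                                     # net input to each unit
print(W, iota, sep="\n")
```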
Crucial Open Question (Truth in Advertising) • Relation between strict domination and neural networks? • Apparently not a problem in the case of the CV Theory
To be encoded • How many different kinds of units are there? • What information is necessary (from the source unit’s point of view) to identify the location of a target unit, and the strength of the connection with it? • How are constraints initially specified? • How are they maintained through the learning process?