
The Harmonic Mind


Presentation Transcript


  1. The Harmonic Mind Paul Smolensky Cognitive Science Department Johns Hopkins University with: Géraldine Legendre Donald Mathis Melanie Soderstrom Alan Prince Peter Jusczyk†

  2. Advertisement The Harmonic Mind: From neural computation to optimality-theoretic grammar Paul Smolensky & Géraldine Legendre • Blackwell 2002 (??) • Develop the Integrated Connectionist/Symbolic (ICS) Cognitive Architecture • Apply it to the theory of grammar • Present a case study in formalist multidisciplinary cognitive science; show inputs/outputs of ICS

  3. Talk Outline ‘Sketch’ the ICS cognitive architecture, pointing to contributions from/to traditional disciplines • Connectionist processing as optimization • Symbolic representations as activation patterns • Knowledge representation: Constraints • Constraint interaction I: Harmonic Grammar, Parser • Explaining ‘productivity’ in ICS (Fodor et al. ‘88 et seq.) • Constraint interaction II: Optimality Theory (‘OT’) • Nativism I: Learnability theory in OT • Nativism II: Experimental test • Nativism III: UGenome

  4. Processing I: Activation • Computational neuroscience → ICS • Key sources: • Hopfield 1982, 1984 • Cohen and Grossberg 1983 • Hinton and Sejnowski 1983, 1986 • Smolensky 1983, 1986 • Geman and Geman 1984 • Golden 1986, 1988

  5. Processing I: Activation Processing — spreading activation — is optimization: Harmony maximization [Figure: two units a1, a2 receiving external inputs i1 (0.6) and i2 (0.5), linked by a mutual inhibitory connection of weight –λ (–0.9)]

  6. Processing II: Optimization • Cognitive psychology → ICS • Key sources: • Hinton & Anderson 1981 • Rumelhart, McClelland, & the PDP Group 1986 Processing — spreading activation — is optimization: Harmony maximization [same network figure as slide 5]

  7. Processing II: Optimization Processing — spreading activation — is optimization: Harmony maximization. Harmony maximization is satisfaction of parallel, violable constraints: • a1 must be active (strength: 0.6) • a2 must be active (strength: 0.5) • a1 and a2 must not be simultaneously active (strength: λ = 0.9) • Optimal compromise: a1 = 0.79, a2 = –0.21 [network figure as before; see the sketch below]
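A minimal numerical sketch of this network (Python/NumPy). The quadratic Harmony function with a self-decay term is an assumption chosen to match the figure; with it, gradient ascent reproduces the slide's optimal compromise:

```python
import numpy as np

# Two units a1, a2 with external inputs i = (0.6, 0.5) and a mutual
# inhibitory connection of weight -0.9 (the figure's -lambda).
i = np.array([0.6, 0.5])
W = np.array([[0.0, -0.9],
              [-0.9, 0.0]])

# Assumed Harmony: H(a) = i.a + (1/2) a.W.a - (1/2)||a||^2;
# spreading activation = gradient ascent on H.
a = np.zeros(2)
for _ in range(2000):
    a += 0.05 * (i + W @ a - a)   # grad H = i + W a - a

print(np.round(a, 2))  # [ 0.79 -0.21] -- the slide's "optimal compromise"
```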

  8. Two Fundamental Questions Harmony maximization is satisfaction of parallel, violable constraints. 1. Prior question: What are the activation patterns — data structures — mental representations — evaluated by these constraints? 2. What are the constraints? (Knowledge representation)

  9. Representation • Symbolic theory → ICS • Complex symbol structures • Generative linguistics → ICS • Particular linguistic representations • PDP connectionism → ICS • Distributed activation patterns • ICS: realization of (higher-level) complex symbolic structures in distributed patterns of activation over (lower-level) units (‘tensor product representations’ etc.)

  10. Representation [Figure: the syllable structure [σ k [æ t]] for ‘cat’, realized as an activation pattern through the filler/role bindings σ/rε, k/r0, æ/r01, t/r11]
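A small sketch of the tensor-product construction (Python/NumPy). The particular filler and role vectors, and the convention that a position string such as ‘01’ is the Kronecker product of its step roles, are illustrative assumptions:

```python
import numpy as np

# Fillers (one-hot for clarity) and orthonormal role vectors for left/right child
fillers = {'k':  np.array([1., 0., 0.]),
           'ae': np.array([0., 1., 0.]),
           't':  np.array([0., 0., 1.])}
r0, r1 = np.array([1., 0.]), np.array([0., 1.])

# Recursive roles: r01 = "left child of a right child", r11 = "right of right"
r01, r11 = np.kron(r0, r1), np.kron(r1, r1)

# Bind fillers to roles with outer products; sum the bindings at each tree depth
depth1 = np.outer(fillers['k'], r0)                                  # k/r0
depth2 = np.outer(fillers['ae'], r01) + np.outer(fillers['t'], r11)  # ae/r01 + t/r11

# Unbinding: contracting with a role vector recovers its filler exactly,
# because the roles are orthonormal
print(depth2 @ r01)  # [0. 1. 0.] = the filler 'ae'
```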

  11. Constraints • Linguistics (markedness theory) → ICS • ICS → Generative linguistics: Optimality Theory • Key sources: • Prince & Smolensky 1993 [ms.; Rutgers TR] • McCarthy & Prince 1993 [ms.] • Texts: Archangeli & Langendoen 1997, Kager 1999, McCarthy 2001 • Electronic archive: http://roa.rutgers.edu

  12. Constraints NOCODA: A syllable has no coda. The parse a[σ k [æ t]] of ‘cat’ has a coda t, so it incurs a violation *: H(a[σ k [æ t]]) = –sNOCODA < 0 [figure: syllable tree for ‘cat’ with the coda violation marked]

  13. Constraint Interaction I • ICS → Grammatical theory • Harmonic Grammar • Legendre, Miyata, Smolensky 1990 et seq.

  14. Constraint Interaction I The grammar generates the representation that maximizes H: this best-satisfies the constraints, given their differential strengths. Any formal language can be so generated. [Figure: Harmony contributions to the parse of ‘cat’: H(k, σ) > 0 from ONSET (onset k), H(σ, t) < 0 from NOCODA (coda t)]
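A toy rendering of “the grammar maximizes H” as weighted constraint satisfaction (Python). The strengths are hypothetical, and the deletion-penalizing constraint, here called PARSE in anticipation of slide 41, is an assumption:

```python
# Hypothetical constraint strengths (Harmonic Grammar weights)
strength = {'PARSE': 2.0, 'NOCODA': 1.0}

# Candidate parses of /kaet/ 'cat' with their constraint violation counts
candidates = {
    '.kaet.': {'NOCODA': 1},   # faithful, but the syllable has a coda
    '.kae.':  {'PARSE': 1},    # codaless, but deletes /t/
}

def H(violations):
    """Harmony = negative strength-weighted sum of violations."""
    return -sum(strength[c] * n for c, n in violations.items())

best = max(candidates, key=lambda c: H(candidates[c]))
print(best, H(candidates[best]))  # '.kaet.' -1.0: best-satisfies the constraints
```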

  15. Harmonic Grammar Parsing • Simple, comprehensible network • Simple grammar G: X → A B, Y → B A • Language Parsing [Figure: top-down and bottom-up parses of the trees X → A B and Y → B A]

  16. Simple Network Parser • Fully self-connected, symmetric network with weight matrix W • Like the previously shown network, except with 12 units; representations and connections shown in the accompanying figure

  17. Explaining Productivity • Approaching full-scale parsing of formal languages by neural-network Harmony maximization • Have other networks that provably compute recursive functions ⇒ productive competence • How to explain?

  18. 1. Structured representations

  19. + 2. Structured connections

  20. = Proof of Productivity • Productive behavior follows mathematically from combining • the combinatorial structure of the vectorial representations encoding inputs & outputs and • the combinatorial structure of the weight matrices encoding knowledge
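A sketch of what combining the two kinds of combinatorial structure buys (Python/NumPy). The “extract left child” operator built as identity-over-fillers ⊗ role-extractor is the standard tensor-product construction; the dimensions here are illustrative:

```python
import numpy as np

r0, r1 = np.array([1., 0.]), np.array([0., 1.])             # left/right roles
f_A, f_B = np.array([1., 0., 0.]), np.array([0., 1., 0.])   # filler vectors

# Structured representation: psi = A/r0 + B/r1, flattened to an activation vector
psi = (np.outer(f_A, r0) + np.outer(f_B, r1)).flatten()

# Structured weight matrix: "extract left child" = I_filler (x) r0^T
W = np.kron(np.eye(3), r0.reshape(1, 2))

# The same W works for ANY fillers in ANY such structure: productivity follows
# from the combinatorial structure of representations and weights together
print(W @ psi)  # [1. 0. 0.] = f_A
```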

  21. Constraint Interaction II: OT • ICS → Grammatical theory • Optimality Theory • Prince & Smolensky 1993

  22. Constraint Interaction II: OT • Differential strength encoded in strict domination hierarchies: • Every constraint has complete priority over all lower-ranked constraints (combined) • Approximate numerical encoding employs special (exponentially growing) weights • “Grammars can’t count” — question period
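A sketch of the exponential-weight point (Python): with weights growing faster than any violation count, numerical Harmony comparison reproduces strict domination. Base 3 is an arbitrary choice, valid while every violation count stays below it:

```python
# Violation profiles over constraints ranked high -> low
a = (0, 2, 2)   # two violations each of the two lowest constraints
b = (1, 0, 0)   # one violation of the highest constraint

# OT: compare profiles lexicographically; b is worse despite fewer total marks
ot_prefers_a = a < b

# HG with exponentially growing weights (powers of a base > any violation count)
weights = [9, 3, 1]
def H(profile):
    return -sum(w * v for w, v in zip(weights, profile))
hg_prefers_a = H(a) > H(b)          # -8 > -9

print(ot_prefers_a, hg_prefers_a)   # True True: the two orders agree
```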

  23. Constraint Interaction II: OT • Constraints are universal (Con) • Candidate outputs are universal (Gen) • Human grammars differ only in how these constraints are ranked: ‘factorial typology’ • First true contender for a formal theory of cross-linguistic typology • 1st innovation of OT: constraint ranking • 2nd innovation: ‘Faithfulness’

  24. The Faithfulness / Markedness Dialectic • ‘cat’: /kæt/ → [kæt] violates NOCODA — why? • FAITHFULNESS requires pronunciation = lexical form • MARKEDNESS often opposes it • The Markedness/Faithfulness dialectic ⇒ diversity (see the sketch below) • English: FAITH ≫ NOCODA • Polynesian: NOCODA ≫ FAITH (~French) • Another markedness constraint M: Nasal Place Agreement [‘Assimilation’] (NPA): velar ŋg ≻ ŋb, ŋd; coronal nd ≻ md, ŋd; labial mb ≻ nb, ŋb
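The two rankings of this mini-system, evaluated OT-style (Python sketch; candidates and violation marks are hand-entered for illustration):

```python
candidates = {
    '.kaet.': {'NOCODA': 1},   # faithful, keeps the coda
    '.kae.':  {'FAITH': 1},    # drops the coda
}

def winner(ranking):
    """OT evaluation: compare violation profiles lexicographically by rank."""
    profile = lambda c: tuple(candidates[c].get(k, 0) for k in ranking)
    return min(candidates, key=profile)

print(winner(['FAITH', 'NOCODA']))  # '.kaet.' -- the English-type ranking
print(winner(['NOCODA', 'FAITH']))  # '.kae.'  -- the Polynesian-type ranking
```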

  25. Optimality Theory • Diversity of contributions to theoretical linguistics • Phonology • Syntax • Semantics • Here: New connections between linguistic theory & the cognitive science of language more generally • Learning • Neuro-genetic encoding

  26. Nativism I: Learnability • Learning algorithm • Provably correct and efficient (under strong assumptions) • Sources: • Tesar 1995 et seq. • Tesar & Smolensky 1993, …, 2000 • If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E

  27. Constraint Demotion Learning If you hear A when you expected to hear E, increase the Harmony of A above that of E by minimally demoting each constraint violated by A below a constraint violated by E. Correctly handles a difficult case: multiple violations in E
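A minimal sketch of one demotion step over a stratified hierarchy, following the recipe above (Python; the data layout is my own):

```python
def demote(strata, winner_viols, loser_viols):
    """One Constraint Demotion step (after Tesar & Smolensky).
    strata: list of sets of constraint names, highest-ranked first.
    winner = the form actually heard (A); loser = the learner's output (E)."""
    prefer_loser  = {c for c in winner_viols if winner_viols[c] > loser_viols.get(c, 0)}
    prefer_winner = {c for c in loser_viols if loser_viols[c] > winner_viols.get(c, 0)}
    # highest stratum containing a constraint the winner satisfies better
    top = min(i for i, s in enumerate(strata) if s & prefer_winner)
    if top + 1 == len(strata):
        strata.append(set())
    # minimally demote winner-violating constraints to just below that stratum
    for stratum in strata[:top + 1]:
        moved = stratum & prefer_loser
        stratum -= moved
        strata[top + 1] |= moved
    return strata

# Hearing coda-ful A = '.kaet.' when expecting E = '.kae.' demotes NOCODA below FAITH
print(demote([{'NOCODA'}, {'FAITH'}], {'NOCODA': 1}, {'FAITH': 1}))
# [set(), {'FAITH'}, {'NOCODA'}]
```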

  28. Nativism I: Learnability • M ≫ F is learnable with /in+possible/ → impossible • ‘not’ = in- except when followed by … • “the exception that proves the rule”; M = NPA • M ≫ F is not learnable from data if there are no ‘exceptions’ (alternations) of this sort — e.g., if the lexicon supplies only inputs with mp, never np: then M and F never conflict, and there is no evidence for their ranking • Thus M ≫ F must hold in the initial state, ℌ0

  29. Nativism II: Experimental Test • Collaborators • Peter Jusczyk • Theresa Allocco • (Elliott Moreton, Karen Arnold)

  30. Nativism II: Experimental Test • Linking hypothesis: More harmonic phonological stimuli ⇒ longer listening time • More harmonic: • M ≻ *M, when equal on F • F ≻ *F, when equal on M • When one must be chosen over the other, it is more harmonic to satisfy M: M ≫ F • M = Nasal Place Assimilation (NPA)

  31.–34. 4.5 Months (NPA) [four figure slides: experimental results]

  35. Nativism III: UGenome • Can we combine the connectionist realization of Harmonic Grammar with OT’s characterization of UG, to examine the biological plausibility of UG as innate knowledge? • Collaborators • Melanie Soderstrom • Donald Mathis

  36. Nativism III: UGenome • The game: take a first shot at a concrete example of a genetic encoding of UG in a Language Acquisition Device — no commitment to its (in)correctness • Introduce an ‘abstract genome’ notion parallel to (and encoding) ‘abstract neural network’ • Is connectionist empiricism clearly more biologically plausible than symbolic nativism? No!

  37. The Problem • No concrete examples of such a LAD exist • Even highly simplified cases pose a hard problem: How can genes — which regulate production of proteins — encode symbolic principles of grammar? • Test preparation: Syllable Theory

  38. Basic syllabification: Function • ƒ: /underlying form/ → [surface form] • Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.] • /CVCC/ → [.CV.CVC.] (second V epenthetic)

  39. Basic syllabification: Function • ƒ: /underlying form/ → [surface form] • Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.] • /CVCC/ → [.CV.CVC.] • Basic CV Syllable Structure Theory • Prince & Smolensky 1993: Chapter 6 • ‘Basic’ — No more than one segment per syllable position: .(C)V(C).

  40. Basic syllabification: Function • ƒ: /underlying form/ → [surface form] • Plural form of dish: /dɪš+s/ → [.dɪ.šɪz.] • /CVCC/ → [.CV.CVC.] • Basic CV Syllable Structure Theory • Correspondence Theory • McCarthy & Prince 1995 (‘M&P’) • /C1V2C3C4/ → [.C1V2.C3VC4.] (the epenthetic V has no input correspondent)

  41. Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output — “no deletion” [M&P 95: ‘MAX’]

  42. Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output • FILL V/C: Every output V/C segment corresponds to an input V/C segment [every syllable position in the output is filled by an input segment] — “no insertion/epenthesis” [M&P 95: ‘DEP’]

  43. Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output • FILL V/C: Every output V/C segment corresponds to an input V/C segment • ONSET: No V without a preceding C

  44. Syllabification: Constraints (Con) • PARSE: Every element in the input corresponds to an element in the output • FILL V/C: Every output V/C segment corresponds to an input V/C segment • ONSET: No V without a preceding C • NOCODA: No C without a following V
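Putting the four constraints to work on /CVCC/ (Python sketch). The candidate set is a hand-picked fragment of Gen, and the ranking shown is just one of the factorial possibilities:

```python
ranking = ['PARSE', 'ONSET', 'NOCODA', 'FILL']   # one ranking; others = other languages

# A few of Gen's candidates for /CVCC/, with violation marks ('_' = epenthetic V)
candidates = {
    '.CV.C_C.': {'FILL': 1, 'NOCODA': 1},   # epenthesis, final coda
    '.CVC.':    {'PARSE': 1, 'NOCODA': 1},  # final C unparsed
    '.CV.':     {'PARSE': 2},               # both final Cs unparsed
}

def profile(c):
    """Violation profile in ranking order; lexicographic min = OT winner."""
    return tuple(candidates[c].get(k, 0) for k in ranking)

print(min(candidates, key=profile))  # '.CV.C_C.' -- i.e. /CVCC/ -> [.CV.CVC.]
```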

  45. Network Architecture • /C1C2/ → [.C1VC2.] [Figure: network units for C and V positions realizing this mapping]

  46. Connection substructure [Figure: each weight factors into a local part — fixed, genetically determined: the content of constraint i — and a global part — variable during learning: the strength si of constraint i] • Network weight: W = Σi si Wi • Network input: ι = W a
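A toy version of this factorization (Python/NumPy). The 2×2 constraint matrices are placeholders; only the strength-times-content decomposition comes from the slide:

```python
import numpy as np

# Fixed, "genetically determined" constraint contents -- toy values
W_parse = np.array([[0.,  2.], [ 2., 0.]])
W_onset = np.array([[0., -1.], [-1., 0.]])

# Learned, global constraint strengths
s = {'PARSE': 2.0, 'ONSET': 1.0}

W = s['PARSE'] * W_parse + s['ONSET'] * W_onset   # W = sum_i s_i W_i
a = np.array([1.0, 0.0])
iota = W @ a                                      # network input: iota = W a
print(iota)
```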

  47. C 1 1 1 1 V 1 1 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 PARSE • All connection coefficients are +2

  48. ONSET [Figure: ONSET connection pattern over the C/V network units] • All connection coefficients are 1

  49. Network Dynamics • Activation: stochastic, binary Boltzmann Machine/Harmony Theory dynamics (T → 0); maximizes Harmony • Learning: gradient descent in error: during the processing of training data in phase P, whenever unit φ (of type Φ) and unit ψ (of type Ψ) are simultaneously active, modify si by ε
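A sketch of these dynamics (Python/NumPy). The logistic flip rule is standard Boltzmann-machine dynamics; the annealing schedule and toy weights are placeholder assumptions:

```python
import numpy as np
rng = np.random.default_rng(0)

def settle(W, a, T_schedule):
    """Stochastic binary dynamics: flip units with logistic probability;
    as T -> 0 the network settles into a Harmony maximum."""
    for T in T_schedule:
        k = rng.integers(len(a))
        iota = W[k] @ a                          # net input to unit k
        a[k] = float(rng.random() < 1 / (1 + np.exp(-iota / T)))
    return a

W = np.array([[0., 1.], [1., 0.]])               # toy symmetric weights
a = settle(W, np.array([1., 0.]), np.geomspace(1.0, 0.01, 500))
print(a)

# Learning (sketch): a strength s_i moves by +eps when units phi and psi
# co-fire in the teacher-clamped phase, and by -eps when they co-fire
# in the free-running phase -- gradient descent in error.
```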

  50. Crucial Open Question (Truth in Advertising) • Relation between strict domination and neural networks? • Apparently not a problem in the case of the CV Theory
