
Neural Network Model for Natural Language Learning: Examples from Cognitive Metaphor and Constructional Polysemy

This presentation discusses a neural network model for natural language learning, focusing on examples from cognitive metaphor and constructional polysemy. It explores older approaches to NL learning, motivations for the ART0 neural network model, and the adaptive resonance theory approach. The presentation also presents illustrative examples and proposes some critical questions in NL learning.


Presentation Transcript


  1. A Neural Net Model for natural language learning: examples from cognitive metaphor and constructional polysemy. Eleni Koutsomitopoulou, PhD candidate, Computational Linguistics, Georgetown University, Washington DC, and Senior Indexing Analyst, LexisNexis Butterworths Tolley, London UK. GURT 2003: Cognitive and Discourse Perspectives on Language and Language Learning

  2. Summary of the presentation • Older approaches to NL Learning • Initial Motivations for the ART0 neural network Model • The Adaptive Resonance Theory Approach • Learning through differentiation • Some critical questions in NL learning vis-à-vis cognition • Illustrative examples from cognitive metaphor and constructional polysemy • Conclusions

  3. A high-level overview of related models • The ‘classical’ Hierarchical Propositional Approach (e.g. Quillian 1969) • A distributed connectionist model that learns from exposure to information about the relations between concepts and their properties (Rumelhart & McClelland, PDP, 1986 et seq.) • A natural language based propositional distributed connectionist model that learns about concepts and their properties in discourse.

  4. Quillian’s Hierarchical Propositional Model

  5. Initial Motivations for the Model • Provide a connectionist alternative to traditional hierarchical propositional models of conceptual knowledge representation. • Account for the development of conceptual knowledge as a gradual process involving progressive differentiation.

  6. The ART Approach • Processing occurs via propagation of activation among simple processing units (represented as nodes in the network). • Knowledge is stored in the weights on connections between the nodes (LTM), as well as in individual nodes (STM). • Propositions are stored directly after being parsed and mapped as nodes in the network. • The ability to produce resonant propositions from partial probes based on previously learned propositional input arises through the activation process, based on the interaction between STM and LTM knowledge stored in the nodes and their interconnections respectively. • Learning occurs via adjustment in time of the strength of the nodes and that of their connections (ART differential equations). • Semantic knowledge is gradually acquired through repeated exposure to new propositional input, mirroring the gradual nature of cognitive and NL development.

  7. ART equations. ART0 basic equations (Grossberg 1980, Loritz 2000): • STM (node activation): ẋj = −Cxj + Dxj(nij) − Exk(nkj) • LTM (connection weight): żij = −Azij + Bxixj • where A = LTM decay rate, B = learning rate, C = inhibition, D = node excitation, E = node decay
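
The update rules on this slide can be sketched numerically. The following is a minimal illustration, not the authors' implementation: a two-node "dipole" whose activations (STM) and connection weights (LTM) are integrated with a simple Euler step. All parameter values, the initial state, and the exact summation pattern over neighboring nodes are assumptions made for illustration.

```python
# Toy Euler-step integration of the two ART0 equations:
#   STM: dx_j/dt = -C*x_j + D*(excitatory input) - E*(inhibitory input)
#   LTM: dz_ij/dt = -A*z_ij + B*x_i*x_j
# Parameter values below are illustrative, not the presentation's settings.
A, B, C, D, E = 0.05, 0.45, 0.6, 1.0, 0.3  # LTM decay, learning rate, inhibition, excitation, node decay
dt = 0.01

def step(x, z):
    """One Euler step for node activations x (STM) and weights z (LTM)."""
    n = len(x)
    x_new = list(x)
    z_new = [row[:] for row in z]
    for j in range(n):
        excite = D * sum(x[i] * z[i][j] for i in range(n) if i != j)   # weighted input
        inhibit = E * sum(x[k] for k in range(n) if k != j)            # lateral inhibition
        x_new[j] = x[j] + dt * (-C * x[j] + excite - inhibit)
    for i in range(n):
        for j in range(n):
            z_new[i][j] = z[i][j] + dt * (-A * z[i][j] + B * x[i] * x[j])
    return x_new, z_new

x = [1.0, 0.2]                    # initial activations (a probe on node 0)
z = [[0.0, 0.05], [0.05, 0.0]]    # initial connection weights Zij
for _ in range(100):
    x, z = step(x, z)
print(x, z)                       # activations decay while weights adjust
```

The decay terms (−Cxj, −Azij) keep both activations and weights bounded, which is the stability property the later "minimal anatomies" slide relies on.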

  8. Differentiation in Learning

  9. Some critical questions in NL learning • Which properties are central to particular natural language categories? (prototype effects, Rosch 1975 et seq.) • How should properties be generalized from one category to another? (inference through experience) • Must some “constraints” on acquiring natural language be available ‘initially’? (signal decay, habituation, rebounds, expectancies) • Is reorganization of such NL knowledge possible through experience, and how?

  10. ART0 Basics • In the network, the salient properties for a given NL concept are represented in an antagonistic dipole anatomy. By probing the activation patterns of the mapped NL concepts we represent learning. • The traditional notions of “category aptness” and “feature salience” are a matter of gradual structural and functional modification via specialization of the NL input. • Attributes/concepts activated as part of the same pattern create conceptual clusters contiguous in semantic space, facilitating learning. • Granularity: primary concepts (“feature-centric”) are the building blocks of more complex super-ordinate concepts (“cognitive categories”), but whether we classify (learn) “concepts” or “features” we do it via the vehicle of NL propositions. • Learning via self-similarity is easier, faster and more economical. Individual concepts (i.e. concepts in no relation with any others) are learned at a slower pace and only after certain pertinent subnetworks have been acquired. • The principle of differentiation via inhibition and its effects on NL learning

  11. Certain ART0 assumptions about Conceptual Reorganization • General assumption: higher-level concepts are acquired only after certain crucial lower-level (primary) concepts have been acquired (Carey 1985). However, the acquisition comes via quantification (assimilation of information and acquisition via differentiation and classification), not qualification (granularity is irrelevant – there is no a priori hierarchy of concepts/features). • Primary metaphors (Grady 1997) are basic dipole anatomies; resemblance metaphors are complex conceptual clusters learned around each dipole. • For the emergence of a new concept or for feature assimilation, different kinds of information are needed. If a new concept cannot be readily accommodated in the cognitive system (because some prerequisite factoids have not yet been acquired), a new cognitive category is built to retain it in memory for as long as some supportive factoids reinforce this learning. If no related factoids are presented, the new category will be “forgotten”.

  12. Testing conceptual contiguity: methods (1) • Representations are generated by using the ART differential equations, testing the effects of the nodes and weights across the links in the network. • Instead of comparing separately trained representations as a typical Rumelhart-McClelland model would do, we check patterns of activation by comparing the activation numbers to see whether the anticipated relationships between concepts were successfully modeled.
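
The comparison step described above might be sketched as follows. The node pairs and activation numbers here are invented for illustration (loosely echoing the later "Down is bad" example) and are not the presentation's results:

```python
# Hypothetical sketch of the evaluation method: rather than comparing
# separately trained representations, read off activation levels after a
# probe and check that the anticipated relation resonates more strongly.
probe_results = {
    ("down", "bad"): 0.82,    # anticipated relation (cf. "Mary feels down")
    ("down", "good"): 0.31,   # competing relation
}

def anticipated_relation_modeled(expected, competitor, acts):
    """True if the expected concept pairing out-resonates its competitor."""
    return acts[expected] > acts[competitor]

result = anticipated_relation_modeled(("down", "bad"), ("down", "good"), probe_results)
print(result)  # -> True
```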

  13. Domain specific vs. domain generic: methods (2) • The simulation suggests that generic inter-domain/discourse learning mechanisms such as inhibition can teach the network the aptness of different features for different concepts and of different concepts for different discourses. • The network is able to map and acquire stable domain-specific conceptual knowledge. • Knowledge acquisition in the network is possible via introduction/mapping of factoids based on NL input and native-speaker intuitions about it, without the need for initial or a priori domain knowledge.

  14. Running the ART0 simulations • First we construct a few "minimal anatomies" which display requisite properties such as stability in the face of (plastic) inputs and LTM stability in the absence of inputs. These minimal anatomies are generated by metaphoric and non-metaphoric sentential inputs to an artificial neural network constructed on ART principles. • The ART network takes as input parse trees for sentences drawn from some major classes of metaphor identified by CMT. A basic parser generates the parse trees, and the parse tree of each input sentence is converted to a resonant network according to the ART equations. Each input sentence is connected (mapped) to the network at the terminal nodes, i.e. the lexical items, which may be common to multiple input sentences.
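
A minimal sketch of the mapping step just described, assuming a toy word list in place of the parser's terminal yield; the sentence IDs and sentences are illustrative, drawn from the resemblance-metaphor example later in the presentation:

```python
# Each input sentence contributes its lexical items as terminal nodes;
# sentences sharing a lexical item connect through the same node.
from collections import defaultdict

sentences = [
    ("S1", ["Wilbur", "is", "a", "pig"]),
    ("S2", ["John", "is", "a", "pig"]),
]

lexical_nodes = defaultdict(set)   # terminal node -> sentences mapped onto it
for sent_id, words in sentences:
    for w in words:
        lexical_nodes[w].add(sent_id)

# Terminal nodes common to multiple inputs are where resonance can spread
# between sentences.
shared = sorted(w for w, ids in lexical_nodes.items() if len(ids) > 1)
print(shared)  # -> ['a', 'is', 'pig']
```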

  15. Conceptual Reorganization in the Model • The ART0 simulation model provides a vehicle for exploring how conceptual reorganization can occur. • By changing the links (relations) between the nodes (concepts), as well as the nodes involved in each simulation, the ART0 model is capable of forming initial representations based on “superficial” appearances (for instance, internal sentence structure). • Later, after the phasic input has been introduced to the network, the model reorganizes its previous representations as it learns new discourse-dependent concept relations. • The network can categorize patterns across different discourses, and the emergent structure may be used as a basis for a deeper NL understanding.

  16. Examples

  17. Metaphoric feature probe (resemblance metaphor) • John is a Hominidae. • Wilbur is a Suidae. • Wilbur is a pig. t --------------------------------------------------- t+1 • John is a pig.
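
A toy sketch (not the model itself) of what this probe is meant to capture: after "John is a pig" is asserted at t+1, John comes to share the resemblance feature "pig" with Wilbur without inheriting Wilbur's taxonomic class. The fact-set representation is an assumption for illustration:

```python
# Facts at time t, as (entity, feature) pairs.
facts_t = {
    ("John", "Hominidae"),
    ("Wilbur", "Suidae"),
    ("Wilbur", "pig"),
}

def features_of(entity, facts):
    """All features asserted of an entity."""
    return {f for e, f in facts if e == entity}

# At t+1 the probe adds a resemblance link only.
facts_t1 = facts_t | {("John", "pig")}
shared = sorted(features_of("John", facts_t1) & features_of("Wilbur", facts_t1))
print(shared)  # -> ['pig']  (the taxonomic classes are not transferred)
```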

  18. What the network looks like

  19. Experimental Results (activation patterns) • Activation values: S1 Was_kicked_out_of_the_house = 7.930; S3 Was_asked_out_the_house = 3.648 • Parameters: C (inhibition) = .6, connection weight Zij = .05, B (learning rate) = .45

  20. Orientational Primary metaphor • The boy ran down the stairs. • Mary feels down. • John feels bad. t ---------------------------------------------- t+1 • Down is bad.

  21. What the network looks like

  22. Experimental Results (activation patterns) • Table here

  23. A glimpse at event-structure metaphor • John is at a crossroads in his business. • John is at a crossroads in his life. • Life is a journey. • A journey may lead to an intersection. t ---------------------------------------------- t+1 • John is at an intersection.

  24. What the network looks like

  25. Experimental Results (activation patterns) • Table here

  26. Constructional polysemy • The dog was kicked out of the house. • John was asked out of the house. • John is out of the house. t ------------------------------------------------- t+1 • Bill is out of the house.

  27. What the network looks like

  28. Experimental Results (activation patterns)

  29. Conclusions • The model exhibits certain characteristics of human cognition, and of NL learning in particular. • The model does this simply by mapping NL propositional input as nodes in the network, by adjusting over time both the weights on the connections and the connectivity between and activation patterns of individual nodes, and by propagating signals forward (in time and structure) through these connections.

  30. Review of ART0 system features • It provides explicit mechanisms indicating how intra-domain and inter-domain knowledge influences semantic cognition and NL learning. • It offers a learning process that provides a means for the acquisition of such knowledge. • It demonstrates that some of the constraints people have suggested might be innate can in fact be acquired from experience. • Unlike other connectionist models (e.g. PDP), the ART0 learning algorithm emphasizes the role of memory in NL learning.
