

  1. A Naturalistic, Functional Approach to NLU, November 2008. Jerry Ball, Air Force Research Laboratory

  2. Introduction • By Naturalistic, I mean… • Models language behavior below the level of input-output behavior • Inside the cognitive “black box” (Ball, 2006) • (But above the neural level) • Adheres to well-established cognitive constraints on human language representation and processing

  3. Introduction • By Naturalistic, I mean… • Avoids computational techniques which are obviously not cognitively plausible, e.g. • Algorithmic backtracking • Requiring the full input in advance • Strictly autonomous processing modules • Staged part of speech tagging followed by parsing • Using the right context to make parsing decisions • Full unification (unlimited depth of recursion) • Backward inferencing (running productions in reverse)

  4. Introduction • By Functional, I mean… • Handles a broad range of linguistic inputs • Not limited to some specialized collection of inputs which tests some isolated psycholinguistic phenomenon or models a toy world • Doesn’t assume away lexical and structural ambiguity • Supports the addition of linguistic categories and mechanisms, as needed, to model a broad range of inputs • Functionally motivated linguistic categories • Focus on meaning, not just form • Intended for use in real-world applications • Synthetic Air Vehicle Operator (AVO) Teammate project

  5. Introduction • Empirically validated at a gross level • Small-scale laboratory studies conducted without a functional system in place are likely to be counter-productive • Don’t generalize well to more complex systems • From the functionalist perspective, it is premature to enforce minimalist assumptions in the absence of a functional model • Ockham’s Razor may well be inappropriate • Ockham’s Razor favors the simplest model that covers a set of phenomena, but does not simultaneously favor modeling the simplest set of phenomena (Roelofs, 2005)

  6. Key Assumption • Given the inherently human nature of language processing, adhering to well-established cognitive constraints may actually facilitate development by pushing development in directions that are more likely to be successful • Short-term costs associated with adherence to cognitive constraints will ultimately yield long-term benefits • System for handling variability in word input form (e.g. H-AREA, h-area, H Area, harea) also supports processing of multi-word expressions (e.g. “kick the bucket”), as the sketch below illustrates • Don’t know what you’re giving up when you adopt cognitively implausible mechanisms • Microsoft parser – processes input from right to left! • Can’t be integrated with speech recognition systems • Full input required in advance • Can’t be used in interactive applications
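A minimal sketch (Python, with invented names such as normalize_form and LEXICON; not the actual model) of the idea behind that bullet: one normalization step maps variant surface forms to a single canonical key, and the same keying scheme indexes multi-word expressions.

```python
# Minimal sketch (not the actual model): one normalization step serves both
# variant word forms (H-AREA, h-area, H Area, harea) and multi-word
# expressions ("kick the bucket"). All names here are invented for illustration.
import re

def normalize_form(surface):
    """Collapse case, hyphens, and internal spaces into one canonical key."""
    return re.sub(r"[-\s]+", "", surface.lower())

# Single-word entries and multi-word expressions share the same key space.
LEXICON = {
    normalize_form("H-AREA"): {"category": "obj-head", "referent": "h-area"},
    normalize_form("kick the bucket"): {"category": "verb", "sense": "die"},
}

def lookup(tokens):
    """Try the longest token span first so multi-word expressions win over their parts."""
    for length in range(len(tokens), 0, -1):
        entry = LEXICON.get(normalize_form(" ".join(tokens[:length])))
        if entry:
            return entry
    return None

print(lookup(["harea"]))                   # same entry as "H-AREA"
print(lookup(["H", "Area"]))               # same entry, different surface form
print(lookup(["kick", "the", "bucket"]))   # multi-word expression
```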

  7. Constraints on Human Language Processing • Visual World Paradigm (Tanenhaus et al. 1995) • Subjects presented with a visual scene • Subjects listen to auditory linguistic input describing scene • Immediate determination of meaning • Subjects look immediately at referents of linguistic expressions, sometimes before end of expression • Incremental processing • Interactive, highly context-sensitive processing (Trueswell et al. 1999) • Ambiguous expressions are processed consistent with scene • Example stimuli: “the green…”, “put the arrow on the paper into the box”

  8. Constraints on Human Language Processing • Largely serial and deterministic • Empirical evidence that we don’t retract previously built representations (Christianson et al. 2001) • “While Mary dressed the baby sat up on the bed” • Empirical evidence that we don’t carry forward multiple representations in parallel – Garden Path Sentences • “The horse raced past the barn fell” (Bever 1970) • Some evidence of parallelism • Empirical evidence that we may carry forward multiple representations in parallel – Garden Path Effects can be eliminated with sufficient context • Sensitive to frequency of language experience • Limited recursive capabilities (no unbounded stack) • Center embedded constructions are extremely difficult to process • “The mouse the cat the dog chased bit ate the cheese”

  9. Linguistic Representations • Psycholinguistic studies reveal little about linguistic representations • Levelt’s early studies are an exception • However, if language processing is highly context sensitive, then linguistic representations are likely to reflect this… • No autonomous syntactic processing → no strictly syntactic representations

  10. Linguistic Representations • Encode syntactic, functional and linguistically relevant semantic information • No sharp distinction between syntax and semantics (or pragmatics) • Most form-based variation is functional and meaningful • Linguistic categories are functionally motivated • Handling wh-questions requires mechanisms for recognizing the fronted wh-expression and binding the fronted expression to a trace of an implicit argument (or equivalent functionality) • What₁ did he do t₁?

  11. Linguistic Representations • Two key dimensions of meaning which get grammatically encoded are Referential and Relational meaning (Double R Grammar) • X-Bar Semantics (sketched in code below): • (Ref-Pt) + Spec + Head → Referring Expression (aka Maximal Projection) • Rel-Head + 1..4 Complements → Relational Expression • Nominals refer to objects → Object Referring Expression • Clauses refer to situations → Situation Referring Expression • Encoding additional dimensions of meaning leads to more complex grammatical representations • Topic/Focus • Given/New
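The two X-Bar Semantics templates can be read as construction schemas. A minimal sketch in Python, assuming illustrative class and slot names of my own (ReferringExpression, RelationalExpression, etc.), not the model's actual representation:

```python
# Sketch of the two X-Bar Semantics templates from Double R Grammar.
# Class and slot names are illustrative only.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ReferringExpression:
    """(Ref-Pt) + Spec + Head -> referring expression (maximal projection)."""
    head: str
    specifier: Optional[str] = None
    ref_point: Optional[str] = None   # the optional reference point
    refers_to: str = "object"         # nominals -> objects, clauses -> situations

@dataclass
class RelationalExpression:
    """Rel-Head + 1..4 complements -> relational expression."""
    rel_head: str
    complements: List[Tuple[str, ReferringExpression]] = field(default_factory=list)

    def add_complement(self, role, filler):
        assert len(self.complements) < 4, "relations take at most 4 complements"
        self.complements.append((role, filler))

ball = ReferringExpression(head="ball", specifier="the")   # "the ball"
kick = RelationalExpression(rel_head="kick")               # relational head
kick.add_complement("object", ball)
print(kick)
```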

  12. Wh-question • “Who did he kick the ball to?” • (Diagram labels: major grammatical unit, part of speech) • Flat representations akin to Simpler Syntax and Construction Grammar • Wh-question → Wh-focus + Operator-Specifier + Subject + Head

  13. Wh-question • “Who did he kick the ball to?” • (Diagram label: grammatical function) • Functional categories from X-Bar Theory explicitly represented • Head • Specifier -- Operator Specifier • Modifier -- Post-head Modifier • Complement -- Subject, Object…

  14. Wh-question • “Who did he kick the ball to?” • (Diagram label: referring expression) • All referring expressions have a bind-indx slot

  15. Wh-question • “Who did he kick the ball to?” • (Diagram labels: relation, complement) • Relations (verb, preposition, adjective, adverb) take 1 to 4 complements (subj, obj, iobj, sit-comp, loc-comp)

  16. Wh-question • “Who did he kick the ball to?” • (Diagram label: semantic feature)

  17. Wh-question • “Who did he kick the ball to?” • (Diagram labels: *1*, trace-*1*) • Implicit object of preposition binds to fronted wh-obj-refer-expr
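A sketch of the kind of representation slides 12-17 depict, using plain Python dictionaries (the slot names bind-indx, wh-focus, etc. follow the slides; the dictionary layout and resolve_trace helper are my own, illustrative only): the fronted wh-expression carries a bind-indx, and the implicit object of the preposition is a trace bound to that same index.

```python
# Illustrative sketch of the representation in slides 12-17; not the model's
# actual ACT-R chunk encoding.
who = {"type": "wh-obj-refer-expr", "head": "who", "bind-indx": "*1*"}

wh_question = {
    "type": "wh-question",
    "wh-focus": who,
    "operator-specifier": "did",
    "subject": {"type": "obj-refer-expr", "head": "he", "bind-indx": "*2*"},
    "head": {
        "type": "pred-relation", "rel-head": "kick",
        "object": {"type": "obj-refer-expr", "specifier": "the", "head": "ball",
                   "bind-indx": "*3*"},
        "iobj-comp": {"type": "prep-relation", "rel-head": "to",
                      # implicit object of the preposition: a trace bound to the
                      # fronted wh-expression via the shared index *1*
                      "object": {"type": "trace", "bind-indx": "*1*"}},
    },
}

def resolve_trace(trace, fronted):
    """Return the fronted expression if it shares the trace's binding index."""
    return fronted if trace["bind-indx"] == fronted["bind-indx"] else None

trace = wh_question["head"]["iobj-comp"]["object"]
print(resolve_trace(trace, wh_question["wh-focus"]))   # -> the "who" expression
```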

  18. “Well-Established” Cognitive Constraint • At a gross level, humans process language incrementally in real time – performance cannot slow down with the length of the input • Non-determinism must somehow be managed at Marr’s algorithmic level • Via parallel processing • Spreading activation • Via non-monotonic processing • Context accommodation • Heuristics • Using probabilities • (Restricted language)
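One way to picture the parallel-processing-with-probabilities option from this slide is a toy activation-and-selection step (a sketch only, not the model's actual ACT-R spreading-activation mechanism; the candidate names, cues, and numbers are invented): candidate constructions receive activation from their frequency of experience plus the prior context, and the most active one is selected for integration.

```python
# Toy sketch of activation-and-selection; the construction names, cue weights,
# and frequencies below are invented for illustration.
def select_construction(word, context, candidates):
    def activation(cand):
        base = cand["frequency"]                                   # frequency of experience
        ctx = sum(w for cue, w in cand["cues"].items() if cue in context)
        return base + ctx
    return max(candidates, key=activation)

candidates = [
    {"name": "obj-head(airspeed)", "frequency": 0.9, "cues": {"object-specifier": 1.5}},
    {"name": "verb(airspeed)",     "frequency": 0.1, "cues": {"subject": 1.0}},
]

# After "no", an object specifier is in the context, so the nominal reading wins.
print(select_construction("airspeed", {"object-specifier"}, candidates)["name"])
```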

  19. Language Processing in the Model • “Nearly” deterministic serial processing (integration) without backtracking or lookahead! • Parallel, probabilistic, spreading activation mechanism (activation and selection) proposes linguistic constructions which are likely to be correct given current input & prior context – highly context sensitive • If current input is unexpected given the prior context, then accommodate the input without backtracking • The following example is from the Language Processing Model • “no airspeed or altitude restrictions”

  20. “no” → object specifier • (Diagram labels: object referring expression; no = nominal construction)

  21. “airspeed” → object head • (Diagram: “no airspeed”; label: integration)

  22. “airspeed or altitude” → object head • (Diagram: “no airspeed or altitude”; label: override) • Accommodation of conjunction via function overriding

  23. “airspeed or altitude” → modifier, “restrictions” → object head • (Diagram: “no airspeed or altitude restrictions”; label: shift) • Appearance of parallel processing! “airspeed or altitude” = head vs. “airspeed or altitude” = mod • Accommodation of new head via function shift
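A compressed sketch of the serial walk-through in slides 20-23 (plain Python; the dictionary layout and integrate helper are invented, not the model's actual productions): each word is integrated into the evolving nominal without backtracking, the conjunction is accommodated by overriding the head function, and the arrival of “restrictions” shifts the conjoined phrase from head to modifier.

```python
# Sketch of slides 20-23: incremental, non-backtracking construction of
# "no airspeed or altitude restrictions". Illustrative only.
nominal = {"type": "obj-refer-expr", "specifier": None, "modifier": None, "head": None}

def integrate(word, nominal):
    if word == "no":                                    # object specifier
        nominal["specifier"] = word
    elif word == "or":                                  # accommodate the conjunction:
        nominal["head"] = {"conj": "or",                # override the head function
                           "parts": [nominal["head"]]}  # with a conjoined structure
    elif nominal["head"] is None:                       # first head candidate
        nominal["head"] = word
    elif isinstance(nominal["head"], dict) and len(nominal["head"]["parts"]) == 1:
        nominal["head"]["parts"].append(word)           # second conjunct
    else:                                               # a new head arrives:
        nominal["modifier"] = nominal["head"]           # shift the old head to modifier
        nominal["head"] = word                          # (function shift, no backtracking)
    return nominal

for w in "no airspeed or altitude restrictions".split():
    integrate(w, nominal)

print(nominal)
# {'type': 'obj-refer-expr', 'specifier': 'no',
#  'modifier': {'conj': 'or', 'parts': ['airspeed', 'altitude']}, 'head': 'restrictions'}
```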

  24. Combining Serial, Deterministic and Parallel, Probabilistic Mechanisms • The parallel probabilistic substrate makes a nearly deterministic serial processing mechanism possible! • (Diagram: a spectrum from Parallel, Probabilistic to Serial, Deterministic – Parallel Distributed Processing (PDP); Supertag Stapling over a Probabilistic LTAG with Tree Supertagging; Double R’s Construction Activation & Selection plus Construction Integration, in the “Nearly Deterministic Range”; Lexicalized PCFG with Lexical Rule Selection and Rule Application; PCFG with Rule Selection and Rule Application; Non-deterministic CFG with Rule Selection & Application)

  25. Some Pitfalls to Avoid • Risk of becoming detached from empirical reality • Competence/Performance distinction allowed generative grammarians to ignore performance • No theory of performance • Not constrained by computational implementation • Core/Peripheral distinction exacerbates the problem • No sharp distinction between core and peripheral grammar • Language is full of pseudo-regular constructions • No sharp distinction between lexicon and grammar • Grammaticality judgements are the primary empirical tool • OK as a gross-level tool when used judiciously, but not exclusively

  26. Our Empirical Reality

  27. Some Pitfalls to Avoid • Computational linguistic systems which use machine learning techniques to identify linguistic categories are at risk of overfitting the data • Trade-off between simplicity and fit (Tenenbaum, 2007) • The Bikel reimplementation of the Collins parser learns a rule like “if the noun following the verb is ‘milk’ attach low, else attach high” based on a single occurrence of “milk” following a verb in the Penn Treebank corpus where “milk” was annotated as attaching low (Fong, 2007) • On our corpus, the Brill part of speech tagger tagged “airspeed” as a verb based on the “ed” ending, due to over-reliance on morphological information and lack of context for when to apply the rule • Silly to tag “airspeed” in “the airspeed” as a verb!
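A toy illustration of that failure mode (not the Brill tagger's actual transformation rules; the function names and determiner list are invented): a suffix rule alone tags “airspeed” as a verb, while even a one-word look at the left context blocks the error.

```python
# Toy illustration of the "-ed ends a verb" failure mode; not the Brill
# tagger's actual rule format.
DETERMINERS = {"the", "a", "an", "no", "this", "that"}

def suffix_only_tag(word):
    """Morphology-only rule: any word ending in 'ed' is tagged as a verb."""
    return "VB" if word.endswith("ed") else "NN"

def context_aware_tag(word, prev_word):
    """Same rule, but guarded by the left context."""
    if prev_word in DETERMINERS:        # a determiner almost never precedes a verb
        return "NN"
    return suffix_only_tag(word)

print(suffix_only_tag("airspeed"))              # VB  (the silly tagging)
print(context_aware_tag("airspeed", "the"))     # NN  (context blocks the rule)
```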

  28. The Problem of Complexity • Manual development may be overcome by inherent complexity • Computational linguistic systems built using machine learning techniques outperform manually built systems on large corpora, but provide only superficial analysis • Overcoming complexity may require • Better theories • Staged models of language processing were never practical for large systems – too much non-determinism and errors at lower levels get propagated to higher levels! • Integrating statistical and manual techniques • Use statistical mechanisms to compute frequencies and probabilities over theoretically motivated linguistic categories
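A minimal sketch of the suggested integration of statistical and manual techniques, assuming a tiny, invented annotated sample: the categories stay theoretically motivated (here, Double R-style functional labels), and the statistics are simple relative frequencies computed over them.

```python
# Sketch: relative frequencies of functional categories for a word, computed
# over a tiny, invented annotated sample. Labels and data are illustrative only.
from collections import Counter

annotated = [("no", "obj-specifier"), ("airspeed", "obj-head"),
             ("or", "conjunction"), ("altitude", "obj-head"),
             ("restrictions", "obj-head"), ("no", "obj-specifier")]

counts = Counter()   # (word, category) occurrences
totals = Counter()   # word occurrences
for word, category in annotated:
    counts[(word, category)] += 1
    totals[word] += 1

def p(category, word):
    """Relative frequency of a functional category given the word."""
    return counts[(word, category)] / totals[word]

print(p("obj-specifier", "no"))     # 1.0 in this toy sample
print(p("obj-head", "airspeed"))    # 1.0 in this toy sample
```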

  29. Conclusions • A Naturalistic, Functional approach to NLU has much to recommend it • Adhering to well-established cognitive constraints pushes development in directions that are more likely to be successful • What is needed is a demonstration that the approach is capable of delivering a functional system that is cognitively plausible…

  30. Questions?

  31. References Ball, J. (2007). A Bi-Polar Theory of Nominal and Clause Structure and Function. Annual Review of Cognitive Linguistics. Ball, J. (2007). Construction-Driven Language Processing. Proceedings of the 2nd European Cognitive Science Conference. Ball, J., Heiberg, A. & Silber, R. (2007). Toward a Large-Scale Model of Language Comprehension in ACT-R 6. Proceedings of the 8th International Conference on Cognitive Modeling. Heiberg, A., Harris, J. & Ball, J. (2007). Dynamic Visualization of ACT-R Declarative Memory Structure. Proceedings of the 8th International Conference on Cognitive Modeling. Ball, J. (2006). Can NLP Systems be a Cognitive Black Box? In Papers from the AAAI Spring Symposium, Technical Report SS-06-02, 1-6. Menlo Park, CA: AAAI Press

  32. Other References Anderson, J. R., Bothell, D., Byrne, M. D., Douglass, S., Lebiere, C. and Qin, Y. (2004). An Integrated Theory of the Mind. Psychological Review, 111(4), 1036-1060. Bever, T. (1970). The cognitive basis for linguistic structures. In J.R. Hayes (ed.), Cognition and Language Development, 277-360. New York: Wiley. Christianson et al. (2001). Thematic roles assigned along the garden path linger. Cognitive Psychology, 42, 368-407. Cooke, N. & Shope, S. (2005). Synthetic Task Environments for Teams: CERTT’s UAV-STE. Handbook of Human Factors and Ergonomics Methods, 46-1-46-6. Boca Raton, FL: CRC Press, LLC. Prince, A. & Smolensky, P. (1993/2004). Optimality Theory: Constraint interaction in generative grammar. Tech Report, Rutgers University & University of Colorado at Boulder. Revised version published by Blackwell, 2004. Rutgers Optimality Archive 537. Tanenhaus et al. (1995). Integration of visual and linguistic information in spoken language comprehension. Science, 268, 632-634. Trueswell, J., Sekerina, I., Hill, N. & Logrip, M. (1999). The kindergarten path effect: studying on-line sentence processing in young children. Cognition, 73, 89-134.

  33. Some Pitfalls to Avoid • Typical computational linguistic systems perform only low-level analysis of the linguistic input • “most of the current research on statistical NLP is focused on shallow syntactic analysis, due to the difficulty of modeling deep analysis with basic statistical learning algorithms” (Shen, 2006) • Sergei & Marge’s system is an exception!

  34. Some Pitfalls to Avoid • Risk of proliferation of functional elements • Incremental addition of categories for each new phenomenon of study can be explosive • Too many levels of representation and hidden elements in pre-minimalist generative grammar based representations • No psychological “face validity” (cf. Ferreira, 2000) • How can hidden elements be learned? • The Minimalist Program is attempting to simplify grammar to redress the language acquisition problem • Explanatory adequacy

  35. Some Pitfalls to Avoid • Trade-off between simplicity and fit (Tenenbaum, 2007) • The simplest theory will seldom be the best fit, but we don’t want to overfit the data • Minimalist syntax is a much simpler theory than its predecessors, but is a poor fit to much of the linguistic data that earlier theories handled (Culicover & Jackendoff, 2005) • Descriptive adequacy has been sacrificed in pursuit of a “perfect” system of core grammar

  36. Some Pitfalls to Avoid • Culicover and Jackendoff’s Simpler Syntax is redressing empirical and functional shortcomings of generative grammar by simplifying syntax and adding a generative semantic component • Not all meaning distinctions must be represented syntactically  syntax can be simplified • Scope of quantification, noun-noun combination, binding • By complicating semantic representations, and the interface between semantic and syntactic representations, syntactic representations can be simplified without loss in empirical coverage • Is overall complexity reduced?
