
Statistical Methods and Linguistics - Steven Abney

Statistical Methods and Linguistics - Steven Abney. 1998. 09. 24. Thur. POSTECH Computer Science NLP Lab 9425021 Shim Jun-Hyuk. Contents. Introduction Linguistics Review under Statistical methods Language Acquisition Language Change Language Variation


Presentation Transcript


  1. Statistical Methods and Linguistics - Steven Abney 1998. 09. 24. Thur. POSTECH Computer Science NLP Lab 9425021 Shim Jun-Hyuk

  2. Contents • Introduction • Linguistics Review under Statistical methods • Language Acquisition • Language Change • Language Variation • Language Structure and Performance • Language Property • Grammaticality and Ambiguity v. Performance • Non-Linguistic Factors for Performance • Grammaticality and Acceptability • Grammar and Computation • The Frictionless Plane, Autonomy and Isolation • Holy Grail CS730B - Statistical NLP

  3. Contents • How Statistics Helps • Disambiguation • Degrees of Grammaticality • Naturalness • Structural Preferences • Error Tolerance • Learning on the Fly • Lexical Acquisition • Objections • Are Stochastic Methods only for engineers? • Didn't Chomsky debunk all this ages ago? • Conclusion

  4. Introduction • Linguistics • Computational Linguistics • Performance • Practical Application • little concern with human language processing • Rationale provided by the Statistical Method • Theoretical Linguistics • Competence • Theoretical Research on grammars and structures • concerned with human language processing • Objectives • Theoretical Background of Statistical analyses • Review from the viewpoint of Linguistics • Importance of Weighted Grammars

  5. 1. Linguistics Review under Statistical Models (1) • Objective • Linguistic issues viewed in terms of populations of grammars • A general population of grammars can be usefully examined with Statistical Models • Language Acquisition (LA) • Probabilistic (stochastic) or weighted grammars in children's LA process • Co-existence and decay of grammars • Algebraic (non-stochastic) grammar as a supplement

  6. 1. Linguistics Review under Statistical Models (2) • Language Change • Change in the probability of a linguistic construction • EX) Rule, Parameter setting • Not “Abrupt”, but “Gradual” • Statistical Co-existence and Decay • “Adult monolingual speaker” - in the end, the grammar is stochastic at the community level • Language Variation • Dialectology • Arbitrary continuum of language created by geographic distance • Contact Frequency and intelligibility • Typology • EX) Language Features, Conditional Probability distributions • Statistical Modeling using stochastic grammars

  7. 2. Language Structure and Performance (1) • Language • Algebraic Properties • Idealization - Adult monolingual Speaker • Theoretical syntax - Linguistic Data • Structure judgments for competence • Statistical Properties • Stochastic Model - Performance data • Adjustments to structure-judgment data for “performance effects” • Grammaticality and ambiguity judgments about sentences, as opposed to structures

  8. 2. Language Structure and Performance (2) • Grammaticality and Ambiguity v. Performance • Example • The a are of I • The cows are grazing in the meadow • John saw Mary • Ambiguity Problem under Grammatical structures • The problem of genuine ambiguities and spurious ambiguities • Analyses that are not ungrammatical but are undesired • case1 - elided sentence • case2 - rare usage • The problem is how to identify the correct structure from among the possible ones. • Can be solved by the use of weighted grammars in computational linguistics

  9. 2. Language Structure and Performance (3) • Non-Linguistic Factors for Performance • Perception is a matter of Performance, and it involves Non-Linguistic Factors in addition to Grammaticality • Grammaticality and Acceptability • Perceptions of grammaticality and ambiguity - Performance data • What is “Performance data”? - finding some choice of words and context that yields a clear positive judgment (Acceptability) • Grammar and Computation • The problem: how can we compute the linguistic data simply and absolutely? • Competence v. Computation • Autonomy of syntax - not the same as isolation; syntax cannot be reduced to semantics • Holy Grail • The larger picture and ultimate goal of Generative linguistics is to make sense of language production, comprehension, acquisition, variation, and change

  10. 3. How Statistics Helps (1) • Disambiguation (ambiguity resolution) • Describing an algorithm to compute the correct parse among the possible ones • correct parse - the parse that humans perceive • various statistical methods exist • EX) “John walks” - Context-free grammar with weights on rules • Degrees of Grammaticality • Gradations of acceptability • Degrees of error in speech production • The measure of goodness is a global measure that combines the degrees of grammaticality with naturalness and structural preference • By parameter estimation, we can obtain the measure of “degrees of grammaticality”
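The “John walks” example on this slide can be sketched as a small weighted context-free grammar in which the weight of a parse is the product of the weights of the rules it uses, and disambiguation picks the heaviest parse. The rule weights, the tree encoding, and the names `RULE_WEIGHTS`, `parse_weight`, and `best_parse` are assumptions made for this illustration; the slides do not give them.

```python
from math import prod

# Illustrative rule weights (assumed for this sketch, not taken from the slides).
RULE_WEIGHTS = {
    ("S", ("NP", "V")): 0.7,
    ("S", ("NP",)): 0.3,
    ("NP", ("N",)): 0.8,
    ("NP", ("N", "N")): 0.2,
    ("N", ("John",)): 0.6,
    ("N", ("walks",)): 0.4,
    ("V", ("walks",)): 1.0,
}

def parse_weight(tree):
    """Return the product of the weights of all rules used in the tree.

    A tree is (label, word) for a lexical rule, or (label, [subtrees]).
    """
    label, children = tree
    if isinstance(children, str):  # lexical rule such as N -> John
        return RULE_WEIGHTS[(label, (children,))]
    rhs = tuple(child[0] for child in children)
    return RULE_WEIGHTS[(label, rhs)] * prod(parse_weight(c) for c in children)

def best_parse(parses):
    """Disambiguate by choosing the parse with the highest weight."""
    return max(parses, key=parse_weight)

# Two competing analyses of "John walks".
SENTENCE_PARSE = ("S", [("NP", [("N", "John")]), ("V", "walks")])
NOUN_PHRASE_PARSE = ("S", [("NP", [("N", "John"), ("N", "walks")])])

if __name__ == "__main__":
    for parse in (SENTENCE_PARSE, NOUN_PHRASE_PARSE):
        print(parse_weight(parse), parse)
    print("preferred:", best_parse([SENTENCE_PARSE, NOUN_PHRASE_PARSE]))
```

Under these assumed weights the subject-plus-verb reading outweighs the noun-noun compound reading, which matches the parse a human reader would perceive.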

  11. 3. How Statistics Helps (2) • Naturalness • plausibility - in the sense of selectional preferences • collocational knowledge - “how do you say it” • statistical methods are applied to collocations and selectional restrictions • Structural Preference • One of the parsing strategies • longest-match preference • plays an important role in the dispreference for certain structures • Error tolerance • Detecting errors in sentences and selecting the best analysis • A primary motivation for Shannon’s noisy channel model
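The error-tolerance point can be illustrated with a toy noisy channel decoder: choose the intended word that maximizes P(intended) × P(observed | intended). All probabilities below, and the names `SOURCE`, `CHANNEL`, and `decode`, are hypothetical values invented for this sketch rather than anything stated in the slides.

```python
# Hypothetical source model: P(intended word).
SOURCE = {"the": 0.05, "ten": 0.001, "tea": 0.0005}

# Hypothetical channel model: P(observed string | intended word).
CHANNEL = {
    ("teh", "the"): 0.01,
    ("teh", "ten"): 0.02,
    ("teh", "tea"): 0.02,
}

def decode(observed, source=SOURCE, channel=CHANNEL):
    """Return the intended word maximizing P(word) * P(observed | word)."""
    scores = {
        word: prior * channel.get((observed, word), 0.0)
        for word, prior in source.items()
    }
    return max(scores, key=scores.get)

if __name__ == "__main__":
    print(decode("teh"))
```

In this toy setting the channel model alone slightly prefers “ten” or “tea”, but the source model makes “the” the best overall analysis of the erroneous input.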

  12. 3. How Statistics Helps (3) • Learning on the Fly • much like error correction • admits a space of learning operations • assigning a new part of speech to a word • adding a new subcategorization frame to a verb, etc. • Lexical Acquisition • the sheer richness of natural language grammars and lexica • primary area of application for distributional and statistical approaches to acquisition • Examples of distributional approaches • acquisition of parts of speech • collocations • selectional restrictions, etc.
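One common distributional measure for collocations is pointwise mutual information (PMI), which compares how often two words occur together with how often they would co-occur by chance. The slides do not name a specific measure, so PMI, the toy corpus, and the function name `pmi_table` are assumptions chosen for this sketch.

```python
from collections import Counter
from math import log2

def pmi_table(sentences):
    """Return PMI scores for every adjacent word pair seen in the corpus."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())
    return {
        (a, b): log2((count / n_bi) / ((unigrams[a] / n_uni) * (unigrams[b] / n_uni)))
        for (a, b), count in bigrams.items()
    }

# Toy corpus invented for illustration only.
CORPUS = [
    "the strong tea",
    "the strong tea",
    "the powerful computer",
    "the powerful computer",
    "the cup",
    "the dog",
]

if __name__ == "__main__":
    table = pmi_table(CORPUS)
    for pair, score in sorted(table.items(), key=lambda item: -item[1]):
        print(pair, round(score, 2))
```

On this corpus “strong tea” scores higher than “the strong”, because “the” combines freely with many words, and the unattested pair “powerful tea” receives no score at all.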

  13. 4. Objections to Statistical Methods • Are Stochastic Models only for Engineers? • Are stochastic models practically always a stopgap approximation? • Given a complex deterministic system and its initial conditions, we can compute its state at all times • In fact, stochastic models can be more insightful and successful than identifying every deterministic factor • What did Chomsky really prove? • Syntactic Structures (1957) • Chomsky: $\mathrm{grammatical}(s) \Leftrightarrow P_n(s) > \epsilon$ • no choice of $n$ and $\epsilon$ works • $P_n(s)$: best $n$-th order approximation to English • Shannon’s Markov model: $\mathrm{grammatical}(s) \Leftrightarrow \lim_{n \to \infty} P_n(s) > \epsilon$ • as $n$ increases, the non-zero probability erroneously assigned decreases • Handbook of Mathematical Psychology (1963)
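The claim that raising n shrinks the probability wrongly assigned to ungrammatical strings can be illustrated with unsmoothed maximum-likelihood n-gram models as a stand-in for the n-th order approximation Pn(s). The two-sentence corpus, the scrambled test string, and the names `train_ngram` and `sentence_prob` are assumptions for this sketch only.

```python
from collections import Counter

def train_ngram(sentences, n):
    """Collect n-gram and context counts for an unsmoothed MLE model."""
    ngrams, contexts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            ngrams[context + (tokens[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts

def sentence_prob(sentence, model, n):
    """Probability of the sentence under the n-gram model (0.0 if unseen)."""
    ngrams, contexts = model
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    prob = 1.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        if contexts[context] == 0:
            return 0.0
        prob *= ngrams[context + (tokens[i],)] / contexts[context]
    return prob

# Toy corpus built from example sentences used earlier in the slides.
CORPUS = ["the cows are grazing in the meadow", "John saw Mary"]
GRAMMATICAL = "the cows are grazing in the meadow"
SCRAMBLED = "meadow the in grazing are cows the"

if __name__ == "__main__":
    for n in (1, 2):
        model = train_ngram(CORPUS, n)
        print(n, sentence_prob(GRAMMATICAL, model, n), sentence_prob(SCRAMBLED, model, n))
```

The unigram model (n = 1) gives the scrambled string a non-zero probability, while the bigram model (n = 2) gives it zero. An unsmoothed model also assigns zero to grammatical sentences it has never seen, so this toy only illustrates the direction of the effect and says nothing about a workable threshold.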

  14. 5. Conclusion • Statistical methods • weighted grammars, distributional induction methods • relevant to Linguistics • Performance v. Competence • Performance is not a goal but a useful tool of Computational Linguistics • Competence is needed to understand the algebraic properties of language • Algebraic methods are inadequate for understanding human language • The Age of Computational Linguistics using Statistical Technology
