Statistical Methods and Linguistics - Steven Abney. 1998. 09. 24. Thur. POSTECH Computer Science NLP Lab 9425021 Shim Jun-Hyuk. Contents. Introduction Linguistics Review under Statistical methods Language Acquisition Language Change Language Variation
Statistical Methods and Linguistics - Steven Abney 1998. 09. 24. Thur. POSTECH Computer Science NLP Lab 9425021 Shim Jun-Hyuk
Contents
• Introduction
• Linguistics Review under Statistical Methods
  • Language Acquisition
  • Language Change
  • Language Variation
• Language Structure and Performance
  • Language Properties
  • Grammaticality and Ambiguity v. Performance
  • Non-Linguistic Factors for Performance
  • Grammaticality and Acceptability
  • Grammar and Computation
  • The Frictionless Plane, Autonomy and Isolation
  • Holy Grail
CS730B - Statistical NLP
Contents (continued)
• How Statistics Helps
  • Disambiguation
  • Degrees of Grammaticality
  • Naturalness
  • Structural Preferences
  • Error Tolerance
  • Learning on the Fly
  • Lexical Acquisition
• Objections
  • Are stochastic methods only for engineers?
  • Didn't Chomsky debunk all this ages ago?
• Conclusion
Introduction
• Linguistics
  • Computational Linguistics
    • Performance
    • Practical applications
    • Little concerned with human language processing
    • Rationale provided by statistical methods
  • Theoretical Linguistics
    • Competence
    • Theoretical research on grammars and structures
    • Concerned with human language processing
• Objectives
  • Theoretical background of statistical analyses
  • Review from the viewpoint of linguistics
  • Importance of weighted grammars
1. Linguistics Review under Statistical Models (1)
• Objective
  • Linguistic issues viewed in terms of populations of grammars
  • A general population of grammars can be usefully examined with statistical models
• Language Acquisition (LA)
  • Probabilistic (stochastic) or weighted grammars in children's LA process
  • Co-existence and decay of grammars
  • Algebraic (non-stochastic) grammar as a supplement
1. Linguistics Review under Statistical Models (2)
• Language Change
  • Change in the probability of a language construction
    • e.g. rules, parameter settings
  • Not "abrupt" but "gradual"
  • Statistical co-existence and decay
  • "Adult monolingual speaker": in the end, the grammar is stochastic within the community
• Language Variation
  • Dialectology
    • Dialect continuum created by geographic distance
    • Contact frequency and intelligibility
  • Typology
    • e.g. language features, conditional probability distributions
  • Statistical modeling using stochastic grammars
2. Language Structure and Performance (1)
• Language
  • Algebraic properties
    • Idealization: the adult monolingual speaker
    • Theoretical syntax: linguistic data
    • Structure judgments as evidence of competence
  • Statistical properties
    • Stochastic model: performance data
    • Structure-judgment data adjusted for "performance effects"
    • Grammaticality and ambiguity judgments about sentences, as opposed to structures
2. Language Structure and Performance (2)
• Grammaticality and Ambiguity v. Performance
  • Examples
    • The a are of I
    • The cows are grazing in the meadow
    • John saw Mary
  • The ambiguity problem under grammatical structures
    • Genuine ambiguities versus spurious ambiguities
    • Spurious analyses are not ungrammatical, only undesired
      • Case 1: elided sentences
      • Case 2: rare usages
  • The problem is how to identify the correct structure from among the possible ones
  • It can be addressed with weighted grammars in computational linguistics
2. Language Structure and Performance (3)
• Non-Linguistic Factors for Performance
  • Perception is a matter of performance; it involves non-linguistic factors in addition to grammaticality
• Grammaticality and Acceptability
  • Perceptions of grammaticality and ambiguity are performance data
  • Working with "performance data": find some choice of words and context that gives a clear positive judgment (acceptability)
• Grammar and Computation
  • The problem of how to compute the linguistic data simply and absolutely
  • Competence v. computation
  • Autonomy of syntax: not the same as isolation; syntax cannot be reduced to semantics
• Holy Grail
  • The larger picture and ultimate goal of generative linguistics is to make sense of language production, comprehension, acquisition, variation, and change
3. How Statistics Helps (1)
• Disambiguation
  • Describing an algorithm that computes the correct parse among the possible ones
  • Correct parse: the parse that humans perceive
  • Various statistical methods exist
  • e.g. "John walks": a context-free grammar with weighted rules
• Degrees of Grammaticality
  • Gradations of acceptability
  • Degrees of error in speech production
  • The measure of goodness is a global measure that combines degrees of grammaticality with naturalness and structural preference
  • Parameter estimation gives us a measure of "degrees of grammaticality"
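The "John walks" example can be made concrete with a small sketch. The rule weights below are hypothetical values chosen for illustration, not figures taken from the slides, and the two candidate parses are written out by hand rather than produced by a parser. The weight of a parse is assumed to be the product of the weights of the rules it uses.

```python
from math import prod

# Hypothetical rule weights for a toy grammar (illustrative values only).
RULE_WEIGHTS = {
    ("S", ("NP", "V")): 0.7,
    ("S", ("NP",)): 0.3,
    ("NP", ("N",)): 0.8,
    ("NP", ("N", "N")): 0.2,
    ("N", ("John",)): 0.6,
    ("N", ("walks",)): 0.4,
    ("V", ("walks",)): 1.0,
}

# Two hand-written candidate parses of "John walks",
# each given as the list of rules used in its derivation.
PARSES = {
    "sentence [S [NP [N John]] [V walks]]": [
        ("S", ("NP", "V")),
        ("NP", ("N",)),
        ("N", ("John",)),
        ("V", ("walks",)),
    ],
    "noun phrase [S [NP [N John] [N walks]]]": [
        ("S", ("NP",)),
        ("NP", ("N", "N")),
        ("N", ("John",)),
        ("N", ("walks",)),
    ],
}


def parse_weight(rules):
    """Weight of a parse: the product of the weights of its rules."""
    return prod(RULE_WEIGHTS[rule] for rule in rules)


def best_parse(parses):
    """Return the name of the candidate parse with the highest weight."""
    return max(parses, key=lambda name: parse_weight(parses[name]))


if __name__ == "__main__":
    for name, rules in PARSES.items():
        print(f"{parse_weight(rules):.4f}  {name}")
    print("preferred:", best_parse(PARSES))
```

Both analyses are grammatical under the toy grammar; the weights only rank them, which is the sense in which a weighted grammar picks out the parse a human would perceive.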
3. How Statistics Helps (2)
• Naturalness
  • Plausibility, in the sense of selectional preferences
  • Collocational knowledge: "how do you say it"
  • Statistical methods are applied to collocations and selectional restrictions
• Structural Preference
  • One of the parsing strategies
  • Longest-match preference
  • Plays an important role in the dispreference for certain structures
• Error Tolerance
  • Detecting errors in sentences and selecting the best analysis
  • A primary motivation for Shannon's noisy channel model
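The noisy channel idea behind error tolerance can be sketched as follows: given an observed, possibly corrupted form, choose the intended form w that maximizes P(w) · P(observed | w). All probabilities below are made-up illustrative numbers, and the names `PRIOR`, `CHANNEL`, and `decode` are hypothetical.

```python
# Hypothetical prior P(w) over intended words (illustrative values only).
PRIOR = {"the": 0.6, "then": 0.3, "ten": 0.1}

# Hypothetical channel model P(observed | intended) (illustrative values only).
CHANNEL = {
    ("teh", "the"): 0.05,
    ("teh", "then"): 0.001,
    ("teh", "ten"): 0.01,
}


def decode(observed):
    """Return the intended word maximizing P(w) * P(observed | w), plus all scores."""
    scores = {
        word: PRIOR[word] * CHANNEL.get((observed, word), 0.0)
        for word in PRIOR
    }
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    best, scores = decode("teh")
    print("best guess:", best)
    for word, score in sorted(scores.items(), key=lambda item: -item[1]):
        print(f"{word}: {score:.5f}")
```

The same argmax applies when the candidates are whole analyses of a sentence rather than single words; only the prior and the channel model change.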
3. How Statistics Helps (3)
• Learning on the Fly
  • Much like error correction
  • Admits a space of learning operations
    • Assigning a new part of speech to a word
    • Adding a new subcategorization frame to a verb, etc.
• Lexical Acquisition
  • The sheer richness of natural language grammars and lexica
  • The primary area of application for distributional and statistical approaches to acquisition
  • Examples of distributional approaches
    • Acquisition of parts of speech
    • Collocations
    • Selectional restrictions, etc.
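A minimal sketch of the distributional approach to part-of-speech acquisition: describe each word by counts of its left and right neighbours, then compare words by cosine similarity. The toy corpus and the function names are assumptions made for illustration; real systems use far larger corpora and a clustering step on top.

```python
from collections import Counter, defaultdict
from math import sqrt

# Tiny hypothetical corpus (illustrative only).
CORPUS = ["the cat sleeps", "the dog sleeps", "a cat runs", "a dog runs"]


def context_vectors(sentences):
    """Map each word to counts of its left ("L") and right ("R") neighbours."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(tokens) - 1):
            vectors[tokens[i]][("L", tokens[i - 1])] += 1
            vectors[tokens[i]][("R", tokens[i + 1])] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    vectors = context_vectors(CORPUS)
    print("cat ~ dog :", cosine(vectors["cat"], vectors["dog"]))
    print("cat ~ runs:", cosine(vectors["cat"], vectors["runs"]))
```

Words that occur in the same contexts (here the two nouns) come out as similar, while a noun and a verb share no contexts in this corpus; grouping words by such similarity is one way a part-of-speech inventory can be induced from distribution alone.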
4. Objections to Statistical Methods
• Are stochastic models only for engineers?
  • Are stochastic models practically always a stopgap approximation?
  • In principle, given a complex deterministic system and its initial conditions, we can compute its state at any time
  • In fact, a stochastic model often gives more insight and is more successful than identifying every deterministic factor
• What does Chomsky really prove?
  • Syntactic Structures (1957)
    • Chomsky: $\mathrm{grammatical}(s) \Leftrightarrow P_n(s) > \varepsilon$
      • No choice of $n$ and $\varepsilon$ makes this work
      • $P_n(s)$: probability of $s$ under the best $n$-th order approximation to English
    • Shannon's Markov model: $\mathrm{grammatical}(s) \Leftrightarrow \lim_{n \to \infty} P_n(s) > \varepsilon$
      • As $n$ increases, erroneous assignments of non-zero probability decrease
  • Handbook of Mathematical Psychology (1963)
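The effect of the order n on P_n(s) can be illustrated with a small maximum-likelihood n-gram model. This is only a sketch on a hypothetical three-sentence corpus, with no smoothing; it is not the "best n-th order approximation to English" itself, and the function names are assumptions.

```python
from collections import Counter

# Tiny hypothetical training corpus (illustrative only).
CORPUS = ["the cat sleeps", "the dog sleeps", "the cat sees the dog"]


def ngram_model(sentences, n):
    """Collect n-gram counts and context counts for an order-n model."""
    ngram_counts, context_counts = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
        for i in range(n - 1, len(tokens)):
            context = tuple(tokens[i - n + 1:i])
            ngram_counts[context + (tokens[i],)] += 1
            context_counts[context] += 1
    return ngram_counts, context_counts


def sentence_probability(sentence, model, n):
    """Unsmoothed maximum-likelihood probability of a sentence under the model."""
    ngram_counts, context_counts = model
    tokens = ["<s>"] * (n - 1) + sentence.split() + ["</s>"]
    probability = 1.0
    for i in range(n - 1, len(tokens)):
        context = tuple(tokens[i - n + 1:i])
        if context_counts[context] == 0:
            return 0.0  # unseen context: the model assigns no probability
        probability *= ngram_counts[context + (tokens[i],)] / context_counts[context]
    return probability


if __name__ == "__main__":
    scrambled = "sleeps cat the"  # an ungrammatical word salad
    for n in (1, 2):
        model = ngram_model(CORPUS, n)
        print(n, sentence_probability(scrambled, model, n))
```

With n = 1 the scrambled string still receives non-zero probability because every word in it was seen; with n = 2 it drops to zero because no training sentence starts with "sleeps". That is a toy instance of higher orders assigning non-zero probability to fewer ungrammatical strings.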
5. Conclusion
• Statistical methods
  • Weighted grammars, distributional induction methods
  • Relevant to linguistics
• Performance v. Competence
  • Performance is not a goal but a useful tool of computational linguistics
  • Competence is needed to understand the algebraic properties of language
  • Algebraic methods are inadequate for understanding human language
• The age of computational linguistics using statistical technology