What cross-linguistic variation tells us about information density in on-line processing

What cross-linguistic variation tells us about information density in on-line processing John A. Hawkins UC Davis & University of Cambridge

Patterns of variation across languages provide relevant evidence for current issues in psychology on information density in on-line processing.

Some background, first of all. I have argued (Hawkins 1994, 2004, 2009, to appear) for a ‘Performance-Grammar Correspondence Hypothesis’:

Performance-Grammar Correspondence Hypothesis (PGCH) Languages have conventionalized grammatical properties in proportion to their degree of preference in performance, as evidenced by patterns of selection in corpora and by ease of processing in psycholinguistic experiments.

I.e. languages have conventionalized or ‘fixed’ in their grammars the same kinds of preferences and principles that we see in performance, esp. in those languages in which speakers have alternatives to choose from in language use

E.g. between: alternative word orders relative clauses with or without a relativizer, with a gap or a resumptive pronoun extraposedvs non-extraposed phrases ‘Heavy’ NP Shift or no shift alternative ditransitive constructions zero vs non-zero case markers and so on

The patterns and principles found in these selections are, according to the PGCH, the same patterns and principles that we see in grammars in languages with fewer conventionalized options (more fixed orderings, gaps only in certain relativization environments, etc).

If so, linguists developing theories of grammar and of typological variation need to look seriously at theories of processing, in order to understand which structures are selected in performance, when, and why, with the result that grammars come to conventionalize these, and not other, patterns. See Hawkins (2004, 2009, to appear)

Conversely, psychologists need to look at grammars and at cross-linguistic variation in order to see what they tell us about processing. since grammars are conventionalized processing preferences.

Alternative variants across grammars are also, by hypothesis, alternatives for efficient processing. And the frequency with which these alternatives are conventionalized is, again by hypothesis, correlated with their degree of preference and efficiency in processing.

Looking at grammatical variation from a processing perspective can be revealing, therefore.

E.g. Japanese, Korean, Dravidian languages do not move heavy and complex phrases to the end of their clauses, like English does, they move them to the beginning, in proportion to their (relative) complexity. If your psychological model predicts that all languages should be like English, then you need to go back to the drawing board and look at these different grammars, and at their performance, before you define and test your model further.

Which brings me to today’s topic: What do grammars and typological variation tell us about information density in on-line processing?

Let us define Information as: the set of linguistic forms {F} (phonemes, morphemes, words, etc) and the set of properties {P} (ultimately semantic properties in a semantic representation) that are assigned to them by linguistic convention and in processing.

Let us define Density as: the number of these forms and properties that are assigned at a particular point in processing, i.e. the size of a given {Fi}-{Pi} pairing at point … i … in on-line comprehension or production.

I see evidence for two very general and complementary principles of information density in cross-linguistic patterns.

First, minimize {Fi} minimize the set {Fi} required for the assignment of a particular Pior {Pi} I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line.

The conditions that determine the degree of permissible minimization can be inferred from the patterns themselves and essentially involve efficiency and ease of processing in the assignment of {Pi} to {Fi}.

Examples will be given from morphological hierarchies and from syntactic patterns such as word order and filler-gap dependencies.

Second, maximize {Pi} maximize the set {Pi} that can be assigned to a particular Fior {Fi}. I.e. select and arrange linguistic forms so that as many as possible of their (correct) syntactic and semantic properties can be assigned to them at each point in on-line processing.

A set of linear ordering universals will be presented in which category A is systematically preferred before B regardless of language type, i.e. A + B. Positioning B first would always result in incomplete or incorrect assignments of properties to B on-line, whereas positioning it after A permits the full assignment of properties to B at the time it is processed. These universals provide systematic evidence for maximize {Pi}.

Consider first some grammatical patterns from morphology that support the minimize {Fi} principle minimize the set {Fi} required for the assignment of a particular Pi or {Pi}

In Hawkins (2004) I formulated the following principle of form minimization based on parallel data from cross-linguistic variation and language-internal selection patterns.

Minimize Forms (MiF) The human processor prefers to minimize the formal complexity of each linguistic form F (its phoneme, morpheme, word or phrasal units) and the number of forms with unique conventionalized property assignments, thereby assigning more properties to fewer forms. These minimizations apply in proportion to the ease with which a given property P can be assigned in processing to a given F.

The basic premise of MiF is that the processing of linguistic forms and their conventionalized property assignments requires effort. Minimizing the forms required for property assignments is efficient since it reduces that effort by fine-tuning it to information that is already active in processing through accessibility, high frequency, and inferencing strategies of various kinds.

MiF is visible in two sets of variation data across and within languages. The first involves complexity differences between surface forms (morphology and syntax), with preferences for minimal expression (e.g. zero morphemes) in proportion to their frequency of occurrence and hence ease of processing through degree of expectedness (cf. Levy 2008, Jaeger 2006).

E.g. singular number for nouns is much more frequent than plural, absolutive case is more frequent than ergative. Correspondingly singularity on nouns is expressed by shorter or equal morphemes, often zero (cf. English cat vs. cat-s), almost never by more. Similarly for absolutive and ergative case marking.

A second data pattern captured in MiF involves the number and nature of lexical and grammatical distinctions that languages conventionalize. The preferences are again in proportion to their efficiency, including frequency of use.

There are preferred lexicalization patterns across languages. Certain grammatical distinctions are cross-linguistically preferred: certain numbers on nouns certain tenses aspects causativity some basic speech act types thematical roles like Agent, Patient etc

The result is numerous ‘hierarchies’ of lexical and grammatical patterns E.g. the famous color term hierarchy of Berlin & Kay (1969), and the Greenbergian morphological hierarchies

Where we have comparative performance and grammatical data for these hierarchies it is very clear that the grammatical rankings (e.g. Singular > Plural) correspond to a frequency/ease of processing ranking, with higher positions receiving less or equal formal marking and more or equal unique forms for the expression of that category alone.

Form Minimization Prediction 1 The formal complexity of each F is reduced in proportion to the frequency of that F and/or the processing ease of assigning a given P to a reduced F (e.g. to zero).

The cross-linguistic effects of this can be seen in the following Greenbergian (1966) morphological hierarchies (with reformulations and revisions by the authors shown):

Sing > Plur > Dual > Trial/Paucal (for number) [Greenberg 1966, Croft 2003] Nom/Abs > Acc/Erg > Dat > Other (for case marking) [Primus 1999] Masc,Fem > Neut (for gender) [Hawkins 2004] Positive > Comparative > Superlative [Greenberg 1966]

Greenberg pointed out that these grammatical hierarchies define performance frequency rankings for the relevant properties in each domain. The frequencies of number inflections on nouns in a corpus of Sanskrit, for example, were: Singular = 70.3%; Plural = 25.1%; Dual = 4.6%

By MiF Prediction 1 we therefore expect: For each hierarchy H the amount of formal marking (i.e. phonological and morphological complexity) will be greater or equal down each hierarchy position.

E.g. in (Austronesian) Manam: 3rd Singular suffix on nouns = 0 3rd Plural suffix = -di, 3rd Dual suffix = -di-a-ru 3rd Paucal = -di-a-to (Lichtenberk 1983) The amount of formal marking increases from singular to plural, and from plural to dual, and is equal from dual to paucal, in accordance with the hierarchy prediction.

Form Minimization Prediction 2 The number of unique F:P pairings in a language is reduced by grammaticalizing or lexicalizing a given F:P in proportion to the frequency and preferred expressiveness of that P in performance.

In the lexicon the property associated with teacheris frequently used in performance, that of teacher who is late for classmuch less so. The event of X hitting Yis frequently selected, that of X hitting Y with X’s right handless so. The more frequently selected properties are conventionalized in single lexemes or unique categories and constructions. Less frequently used properties must then be expressed through word and phrase combinations and their meanings must be derived by semantic composition.

This makes the expression of more frequently used meanings shorter, that of less frequently used meanings longer, and this pattern matches the first pattern of less versus more complexity in the surface forms themselves correlating with relative frequency. Both patterns make utterances shorter and the communication of meanings more efficient overall, which is why I have collapsed them both into one common Minimize Forms principle.

By MiF Prediction 2 we expect: For each hierarchy H (A > B > C) if a language assigns at least one morpheme uniquely to C, then it assigns at least one uniquely to B; if it assigns at least one uniquely to B, it does so to A.

E.g.adistinct Dual implies a distinct Plural and Singular in the grammar of Sanskrit. A distinct Dative implies a distinct Accusative and Nominative in the case grammar of Latin and German (or a distinct Ergative and Absolutive in Basque, cf. Primus 1999).

A unique number or case assignment low in the hierarchy implies unique and differentiated numbers and cases in all higher positions.

I.e. grammars prioritize categories for unique formal expression in each of these areas in proportion to their relative frequency and preferred expressiveness. This results in these hierarchies for conventionalized categories whereby languages with fewer categories match the performance frequency rankings of languages with many.

By MiF Prediction 2 we also expect: For each hierarchy H any combinatorial features that partition references to a given position on H will result in fewer or equal morphological distinctions down each lower position of H.

E.g. when gender features combine with and partition number, unique gender-distinctive pronouns often exist for the singular and not for the plural English he/she/it vs they the reverse uniqueness is not found (i.e. with a gender-distinctive plural, but gender-neutral singular).

More generallyMiF Prediction 2 leads to a general principle of cross-linguistic morphology: Morphologization A morphological distinction will be grammaticalized in proportion to the performance frequency with which it can uniquely identify a given subset of entities {E} in a grammatical and/or semantic domain D.

This enables us to make sense of ‘markedness reversals’. E.g. in certain nouns in Welsh whose referents are much more frequently plural than singular, like ‘leaves’ and ‘beans’, it is the singular form that is morphologically more complex than the plural: deilen("leaf") vs. dail ("leaves") ffäen("bean") vs. ffa ("beans") Cf. Haspelmath(2002:244).

All of these data provide support for our minimize {Fi} principle: minimize the set {Fi} required for the assignment of a particular Pi or {Pi} I.e. minimize the number of linguistic forms that need to be processed at each point in order to assign a given morphological, syntactic or semantic property or set of properties to these forms on-line.

Either the surface forms of the morphology are reduced, in proportion to frequency and/or ease of processing. Or lexical and grammatical categories are given priority for unique formal expression, in proportion to frequency and/or preferred expression, resulting in reduced morpheme and word combinations for their expression.

What cross-linguistic variation tells us about information density in on-line processing

What cross-linguistic variation tells us about information density in on-line processing

Presentation Transcript

Starlight and What it Tells Us

Perspectives on Human Linguistic Variation

Morphology: Cross-linguistic variation

Starlight and What it Tells Us

Tracking Linguistic Variation in Historical Corpora

GOD TELLS US ABOUT HIMSELF What God Is Like

Tornado deaths: What the past tells us about the future

What Mythology Tells Us About People

Tornado deaths: What the past tells us about the future

What the research tells us about evaluating teacher effectiveness

Verb Learning and Cross-Linguistic Variation

Morphology: Cross-linguistic variation

Morphology: Cross-linguistic variation in word formation

Linguistic Variation: Speech Communities

WHAT FAMILY CAREGIVING RESEARCH TELLS US ABOUT COLLABORATION

What SNAAP Tells Us About Music Alumni Sally Gaskill

What Research Tells us About Designing Online Content

Morphology: Cross-linguistic variation

What your data tells us

Cross-Linguistic Transfer:

What Current Research Tells us about Incorporating Efficiency Measurement in P4P

4 On Line Tells Very Visible