A CORPUS LINGUISTIC STUDY OF ELLIPSIS AS A COHESIVE DEVICE Katrin Menzel Institute of Applied Linguistics, Translation and Interpreting, Saarland University Corpus Linguistics Conference – 27 June 2013, St. Petersburg
GECCo project http://www.gecco.uni-saarland.de/GECCo/Home.html
GECCo project GECCo: German-English Contrasts in Cohesion • supported by DFG: 1st phase 2011-2013, 2nd phase 2013-2016 • Project Team: Marilisa Amoia, Kerstin Kunz, Ekaterina Lapshinova, Katrin Menzel, Erich Steiner
Main research questions Which systemic resources of cohesion are instantiated in English and German texts in different registers/genres? How frequent are they? Which cohesive meanings do they express?
Research goals • analyse cohesive resources provided by the language systems and instantiations in texts • explore contrasts in form, frequency, function and meaning relations across and between languages, registers and production types
Motivation Filling major research gaps: • comprehensive accounts of cohesion exist only from a monolingual perspective (e.g. Halliday & Hasan 1976) • empirical monolingual or contrastive analyses at text and discourse level mainly deal with individual phenomena
CORPUS RESOURCES procedures to extract cohesive phenomena require compilation, annotation and exploitation of the GECCo corpus (written and spoken texts) assumption: no clear dividing line but a continuum from written to spoken
written part of GECCo is a translation corpus and consists of various genres (popular-scientific, fictional and tourism texts, prepared speeches, political essays, corporate communication, instruction manuals, websites) of English and German original texts that are aligned with their translations
spoken part of the corpus is a comparable corpus of English and German original texts (interviews, academic lectures, web forum, talk shows…)
Present Study: ellipsis types: nominal, verbal, clausal (cf. Halliday & Hasan 1976) across: - languages: English vs. German - registers: different text types - production types: originals vs. translations
Research goals describing ellipsis from a cross-linguistic viewpoint in English and German enhancing corpus linguistic methods to cover a comprehensive variety of ellipses in different registers of spoken and written language in a bilingual corpus (GECCo) of about 1 million words
Defining cohesive ellipsis
Ellipsis as a cohesive device is the omission of an element normally required by the grammar that can be recovered from the linguistic context.
Halliday/Hasan: nominal, verbal, clausal ellipsis
Examples:
There are two approaches to problem-solving: the empirical [ ] and the rational [ ].
I want to help you, but I can’t [ ].
What is the capital of the Philippines? – Manila [ ].
Ellipsis as a cohesive device • cohesive ellipsis vs. other types of ellipsis and fragments (e.g. headlines, exophoric ellipsis without textual antecedent, lexicalised ellipsis) • missing information must be supplied from the surrounding co-text (usually anaphorically)
Some differences between English and German e.g. nominal ellipsis: in German, the ellipsis remnant has to show strong morphological agreement in order to license the elided noun: ein grünes [Haus], keine [Häuser], keins [?]; in a few cases, this also happens in English (mine, none…)
Verbal ellipsis in English and German
lack of correspondence between the English and German verbal systems: more differences between E/G than with regard to nominal ellipsis
e.g. inclusive imperative: Let’s [go]. / Let’s not. (does not exist in German: *Lass uns!)
English examples in GECCo: many subtypes of verbal ellipsis with varying degrees of complexity
German: mainly ellipses of modal verb complement (Er muss [ ])
Clausal ellipsis in English and German
• Differences G/E: case
Von wem wurde der Junge untersucht? – (Von) Einer Psychologin. / * Eine Psychologin.
Who was the boy examined by? – A psychologist.
• Sluicing:
Er will jemandem schmeicheln, aber sie wissen nicht wem [ ].
He wants to flatter someone, but they don't know who [ ].
Practical Issues Annotating / querying ellipsis in corpora
Manual annotation with MMAX2 to compare with automatic annotation http://www.h-its.org/english/research/nlp/download/mmax.php Pointer relation can be used to link a bridging expression to its bridging antecedent.
CQP queries: to query empty elements we have to find syntactically incomplete or deficient structures German: Stuttgart-Tübingen-TagSet STTS, English: Penn Treebank tagset
Querying the corpus with CQP (German: Stuttgart-Tübingen-TagSet STTS; English: Penn Treebank tagset)
Sample CQP queries
GO:
• [pos='adja'][pos='vafin']; (adjective + finite auxiliary verb)
• [pos='art'] [pos='adja'][pos!='nn|ne']; (article + adjective, not followed by noun/proper noun)
EO/ETRANS (different tagset):
• [pos='jj'][pos='vv.*']; (adj. + verb)
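To make the logic of these token patterns concrete, here is a minimal Python sketch that emulates them on a hand-tagged toy sentence. It is not CQP and not the project's actual pipeline: the helper `find_matches`, the pattern encoding and the tags on the toy sentence are assumptions made for illustration only. Like CQP, it matches each regular expression against the whole tag value.

```python
import re

def find_matches(tokens, pattern):
    """Return (start, end) spans where consecutive tokens satisfy the pattern.

    tokens:  list of (word, pos) pairs
    pattern: list of (regex, negated) pairs, one per token position;
             negated=True emulates the CQP operator != for that position
    """
    spans = []
    for start in range(len(tokens) - len(pattern) + 1):
        ok = True
        for offset, (regex, negated) in enumerate(pattern):
            pos = tokens[start + offset][1]
            # The regex must cover the whole tag, as in CQP attribute matching
            hit = re.fullmatch(regex, pos) is not None
            if hit == negated:
                ok = False
                break
        if ok:
            spans.append((start, start + len(pattern)))
    return spans

# Emulates [pos='art'] [pos='adja'] [pos!='nn|ne']
ART_ADJ_NO_NOUN = [("art", False), ("adja", False), ("nn|ne", True)]
# Emulates [pos='adja'] [pos='vafin']
ADJ_FINITE_AUX = [("adja", False), ("vafin", False)]

# Toy sentence with hand-assigned lowercase STTS-style tags (assumed, not tagger output)
sentence = [
    ("Der", "art"), ("größte", "adja"), ("und", "kon"),
    ("schönste", "adja"), ("ist", "vafin"), ("der", "art"),
    ("Naschmarkt", "ne"), (".", "$."),
]

hits_no_noun = find_matches(sentence, ART_ADJ_NO_NOUN)  # "Der größte und"
hits_adj_aux = find_matches(sentence, ADJ_FINITE_AUX)   # "schönste ist"
```

Both patterns only flag candidates: a hit says that an adjective is not followed by a noun, not that a noun was elided, so the results still need manual checking.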
some manual correction is necessary
difficulty for the tagger: in English, many ellipsis remnants belong to more than one word class
- pronouns (e.g. "other": det/adj/pron)
- words ending in -ing: "the second being very..." (to know whether "being" is a verb or a noun, the context has to be taken into account; tagging is sometimes wrong and leads to irrelevant examples in query results)
- "one": number/pronoun/det/adj/noun, sometimes used with nominal ellipsis, sometimes with nominal substitution: the green one (= nominal substitution), we saw one [lion] (= nominal ellipsis)
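The contrast between "the green one" and "we saw one" can be turned into a rough pre-sorting heuristic for query hits. The sketch below is an assumption-laden illustration, not the procedure used in the study: the function `classify_one`, its labels and the choice of context tags are hypothetical, and the Penn Treebank tags on the toy examples are assigned by hand. Its output is only a suggestion for manual review.

```python
# Penn Treebank tags that, directly before "one", suggest substitution
# ("the green one", "which one"); this selection is an assumption.
SUBSTITUTION_CONTEXT = {"JJ", "JJR", "JJS", "DT", "WDT"}
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}

def classify_one(tokens, i):
    """Roughly classify the token 'one' at index i of a (word, tag) list."""
    word, _ = tokens[i]
    if word.lower() != "one":
        raise ValueError("token at index i is not 'one'")
    prev_tag = tokens[i - 1][1] if i > 0 else None
    next_tag = tokens[i + 1][1] if i + 1 < len(tokens) else None
    if next_tag in NOUN_TAGS:
        return "no gap"                  # "one lion": the noun is present
    if prev_tag in SUBSTITUTION_CONTEXT:
        return "substitution candidate"  # "the green one"
    return "ellipsis candidate"          # "we saw one [lion]"

green_one = [("the", "DT"), ("green", "JJ"), ("one", "NN")]
saw_one = [("we", "PRP"), ("saw", "VBD"), ("one", "CD")]
saw_one_lion = saw_one + [("lion", "NN")]

label_green = classify_one(green_one, 2)
label_saw = classify_one(saw_one, 2)
label_lion = classify_one(saw_one_lion, 2)
```

The heuristic deliberately ignores the tag assigned to "one" itself, since that tag is exactly what the tagger tends to get wrong.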
sometimes ellipsis remnants are zero derivations (especially in English, this additionally contributes to word class ambiguity for taggers, e.g. N/V: salt, ship; Adj./N: modal)
some nominalised elements (tagged as adj. / numerals), which often refer to people or abstract concepts, and lexicalised / context-free ellipsis also have to be sorted out manually:
- the immoral, the rich, the elderly, a 1 year old
- the Fantastic Four
- the big two [?] (referring to Oxford and Cambridge university, lexicalised?)
- lexicalised idiomatic ellipsis: eine [ ] rauchen
normalized frequencies of typical ellipsis subtypes per 100,000 words in 4 German & English registers of GECCo
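Normalizing per 100,000 words makes hit counts from sub-corpora of different sizes comparable. A minimal sketch of the calculation follows; the function name `per_100k` and the example counts are hypothetical and are not GECCo figures.

```python
def per_100k(raw_hits, corpus_tokens):
    """Normalize a raw hit count to a frequency per 100,000 words."""
    if corpus_tokens <= 0:
        raise ValueError("corpus_tokens must be positive")
    return raw_hits * 100_000 / corpus_tokens

# Hypothetical example: 12 hits in a register sub-corpus of 40,000 words
example_rate = per_100k(12, 40_000)
```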
Spoken registers EO/GO in GECCo: redundant elements were inserted rather than elided; words were repeated, even ungrammatically, to remind the hearer of items that were mentioned earlier in the text.
- Da machen wir etwas was es absolut verrückt ist.
- Ich war bis 1975 war ich in Stuttgart (GO Interview)
- For me it’s important is identifying where you come from. (EO Interview)
Translation as a cause of linguistic change with regard to cohesive devices
cohesive devices, especially ellipsis and substitution, are elements where translations show specific shifts and some kind of ‘fingerprints’ (Gellerstam 2005) or ‘shining through’ (Teich 2003) of the source language in the target language
‘shining through’ (Teich 2003) from source language into target language: empirically identifiable traces of source language interference in terms of proportional frequencies of constructions that have the potential to spread from translated to non-translated target language texts
- translation-induced language change is subtle and often overlooked, but in recent years, some interesting studies have demonstrated the significance of translation as a site of language contact (e.g. House 2006)
- the lexical and orthographic levels are probably affected most frequently, as words are sometimes borrowed through translation
- source language interference with regard to syntactic or discourse-structural patterns, such as the use of cohesive devices, is more complex and less easily perceptible without a quantitative analysis of proportional frequencies in larger text corpora - using translation and parallel text corpora, House (2011) for instance has demonstrated that textual norms in German are adapted to anglophone ones
analysis of the GECCo corpus indicates that, compared to English originals, English translations of German texts include a higher frequency of nominal ellipsis after adjectives where we would normally expect for example ‘one/s’, ‘of them’, a general or a specific noun:
(1) ein Denken …, das strenger ist als das begriffliche [ ]
translation: a thinking more rigorous than the conceptual [ ]
(2) Der größte und schönste [ ] ist der Naschmarkt.
translation: The largest and most impressive [ ] is Naschmarkt.
On the other hand, translations into English seem to have a higher frequency of 'one' as a substitute where it is not obligatory (e.g. after 'next', 'second', 'another', 'which').
translators often insert ‘tun’ in the case of English lexical verb ellipsis or use it as a direct translation of ‘do’
If we do not, no one else will [ ].
translation in corpus: Wenn wir es nicht tun, wird niemand es tun.
just as Ukraine and South Africa had done and as Libya is doing today
translation: so wie es die Ukraine und Südafrika getan haben, und wie Libyen es heute tut
corpus extraction results show that the number of hits of the lemma ‘tun’ is much higher in German translations (41 / 100,000 words) than in German originals (29 / 100,000 words)
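The size of this difference can be expressed as a simple ratio of the two normalized frequencies reported above. The sketch below uses only those two figures; the variable names are chosen for illustration. A significance test would additionally need the raw hit counts and sub-corpus sizes, which are not given here.

```python
# Normalized frequencies of the lemma 'tun' per 100,000 words, as reported above
rate_translations = 41
rate_originals = 29

# How many times more frequent 'tun' is in translations than in originals
ratio = rate_translations / rate_originals

# The same difference expressed as a relative increase over the originals
relative_increase = (rate_translations - rate_originals) / rate_originals

print(f"ratio: {ratio:.2f}, relative increase: {relative_increase:.0%}")
```

On these figures, ‘tun’ is roughly 1.4 times as frequent in the translated German texts as in the German originals.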
translations contribute to semantic bleaching of this verb (writers of German original texts usually tend to avoid the verb ‘tun’ as a substitute for a main verb for stylistic reasons)
depending on various factors such as standardization of the language and genre and amount and prestige of translated texts, language specific structures and innovations may spread from translated to non-translated target language texts
References:
Evert, S. 2005. The CQP Query Language Tutorial. IMS, Universität Stuttgart.
Gellerstam, M. 2005. Fingerprints in Translation. In: G. Anderman & M. Rogers (eds.). In and out of English: For Better, for Worse. Clevedon: Multilingual Matters. pp. 201-13.
Halliday, M. A. K. & R. Hasan. 1976. Cohesion in English. London: Longman.
House, J. 2006. Covert Translation, Language Contact, Variation and Change. In: SYNAPS 19. pp. 25-47.
House, J. 2011. Using translation and parallel text corpora to investigate the influence of Global English on text norms in other languages. In: A. Kruger et al. (eds.). Corpus-based Translation Studies. London: Continuum.
Kunz, K. & E. Lapshinova-Koltunski. 2011. Tools to Analyse German-English Contrasts in Cohesion. In: Proceedings of GSCL-2011, Hamburg, Germany.
Neumann, S. & S. Hansen-Schirra. 2005. The CroCo Project. Cross-linguistic corpora for the investigation of explicitation in translations. In: Proceedings from the Corpus Linguistics Conference Series (PCLC), Vol. 1, no. 1.
Steiner, E. 2008. Empirical studies of translations as a mode of language contact - “explicitness” of lexicogrammatical encoding as a relevant dimension. In: P. Siemund & N. Kintana (eds.). Language contact and contact languages. Amsterdam: John Benjamins (Hamburg Studies in Multilingualism Vol. 7). pp. 317-346.
Teich, E. 2003. Cross-Linguistic Variation in System and Text: A Methodology for the Investigation of Translations and Comparable Texts. Berlin: Mouton de Gruyter.
Thank you for your attention! Do you have any questions? Comments? Katrin Menzel firstname.lastname@example.org