Collocational properties of translated language. Silvia Bernardini School for Translators and Interpreters University of Bologna at Forlì 30 July 07 Collocations Brief overview Frequency vs. Phraseology A note on statistics Translation studies Theory

Collocational properties of translated language

Silvia Bernardini

School for Translators and Interpreters

University of Bologna at Forlì

30 July 07


  • Collocations
    • Brief overview
    • Frequency vs. Phraseology
    • A note on statistics
  • Translation studies
    • Theory
  • Current study
  • Ways forward

What is a collocation?
  • “[…] I would like to put forward the concept of collocation which I have introduced in my own work. This is the study of key-words, pivotal words, leading words, by presenting them in the company they usually keep – that is to say, an element of their meaning is indicated when their habitual word accompaniments are shown.” (Firth 1956:106-107)
  • E.g.: English: the English people, English literature, English reserve, English manners, English countryside, the English and all that can be said about them, the English public schools, English Universities (!)


Frequency-oriented views
  • “Significant” collocation is regular collocation between items, such that they occur more often than their respective frequencies and the length of the text in which they occur would predict (Jones and Sinclair 1974:19)
  • A collocation is a sequence of words that occurs more than once in identical form and is grammatically well-structured (Kjellmer 1987: 133)


Phraseology-oriented views
  • Restricted collocations are fully institutionalised phrases, memorized as wholes and used as conventional form-meaning pairings (Howarth 1996: 37)


Frequency vs. Phraseology
  • Frequency-oriented view
    • Sum of many occurrences in texts
    • Position important
    • Number of words involved important
    • Syntactic relationship can be important
    • Frequency/statistics important
  • Phraseology-oriented view
    • An abstract entity with instantiations in texts (PERFORM + TASK)
    • Position/number of constituents not central
    • Different restrictions distinguished (DOG+BARK not a collocation)
    • Main criterion: semantic unpredictability


2 ways of finding collocations
  • Starting from a (set of) keyword(s) and looking left and right
    • Gledhill (2000): phraseology surrounding “keywords” in different sections of cancer research articles
  • Selecting all sequences of N words that recur a certain number of times
    • Kjellmer (1994): All two-word sequences appearing more than two times in the Brown corpus
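The two search strategies above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the text is already tokenised into a list of strings, the function names `keyword_collocates` and `recurrent_ngrams` and the `span` parameter are hypothetical, and neither function reproduces the actual procedures of Gledhill or Kjellmer.

```python
from collections import Counter


def keyword_collocates(tokens, keyword, span=4):
    """Keyword-based search: count the words found within `span`
    tokens to the left or right of each occurrence of `keyword`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = tokens[max(0, i - span):i]
            right = tokens[i + 1:i + 1 + span]
            counts.update(left + right)
    return counts


def recurrent_ngrams(tokens, n=2, min_count=3):
    """Recurrence-based search: keep every contiguous n-word sequence
    that occurs at least `min_count` times (3 = "more than two times")."""
    grams = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {gram: c for gram, c in grams.items() if c >= min_count}


# Toy text, invented for the example.
text = ("the dog barked and the dog ran and the dog slept "
        "while the cat slept").split()

print(keyword_collocates(text, "dog", span=1))
print(recurrent_ngrams(text, n=2, min_count=3))
```

The first function needs a keyword chosen in advance. The second needs no prior hypothesis, but returns every recurrent sequence, grammatical or not.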


A note on statistics
  • Frequency (Danielsson 2001)
  • Statistics: pointwise Mutual Information (MI)
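    • Compares the probability of observing x and y together (the joint probability) with the probabilities of observing x and y independently (chance) (Church and Hanks 1990: 77).
    • Formula: $MI(x;y) = \log_2 \frac{p(xy)}{p(x)\,p(y)} = \log_2 \frac{f(xy) \cdot N}{f(x) \cdot f(y)}$, where each probability p is estimated as a corpus frequency f divided by the corpus size N.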

  • Limits of MI
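As a rough illustration of the measure and of one commonly cited limit (the slide does not spell out which limits it has in mind), the sketch below computes pointwise MI from raw frequencies. The corpus size and all counts are invented, and `pointwise_mi` is a hypothetical helper name.

```python
import math


def pointwise_mi(f_xy, f_x, f_y, n, base=2):
    """MI = log( f(xy) * N / (f(x) * f(y)) ), computed from raw frequencies."""
    return math.log(f_xy * n / (f_x * f_y), base)


N = 1_000_000  # hypothetical corpus size

# A well-attested pair: two fairly common words, seen together 200 times.
frequent = pointwise_mi(f_xy=200, f_x=2_000, f_y=5_000, n=N)

# A pair of hapaxes: each word occurs once, and they occur together.
rare = pointwise_mi(f_xy=1, f_x=1, f_y=1, n=N)

print(round(frequent, 2), round(rare, 2))
```

The single co-occurrence of two hapaxes scores far higher than the pair attested 200 times, which is why MI rankings are usually combined with a minimum-frequency threshold.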


Corpus-based TS
  • Theoretical background
  • Methodological background
  • Studies of collocation within TS
  • Limits
Theoretical background 1

Baker (1993: 243)

The most important task that awaits the application of corpus techniques in translation studies […] is the elucidation of the nature of translated text as a mediated communicative event.

Corpus-based Translation Studies

Theoretical background 2

Toury (1995)

Translation as norm-governed behaviour:

‘translatorship’ amounts first and foremost to being able to play a social role, i.e. to fulfil a function allotted by a community […] in a way which is deemed appropriate in its own terms of reference (ibid.: 53)

Operationalising it
  • Studies should be carried out focusing on the nature of translational norms as compared to those governing non-translational kinds of text production (Toury 1995: 61).
  • Corpus research in TS should focus on the identification of universal features of translation, that is features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems. (Baker 1993:243).

  • Explicitation/explicitness
  • Simplification
  • Disambiguation
  • Levelling out (homogeneity)
  • Preference for conventional grammar
  • Avoidance of repetition
  • Exaggeration of features of the target language
  • Normalisation/sanitisation
  • Absence of TL-specific “unique items”
  • “Shining-through”

Methodological background
  • Monolingual comparable corpora
    • Originals in Language A and comparable translations into Language A
    • They should make visible “patterning which is specific to translated texts, irrespective of the source or target languages involved” (Baker 1995: 234).
  • Parallel corpora
    • Originals in Language A and their translations into Language B, usually combined with reference corpora

TS: research on collocation

Olohan (2004): Collocation and moderation

  • Quite, rather, pretty and fairly in translated vs. original English fiction
  • Pretty and rather, and more marginally quite, “are used a lot less in [TEC-Fiction] but, when they are, there is usually more variation in usage than in [BNC-fiction] and less repetition of common collocates”

TS: research on collocation

Øverås (1998): Collocation and explicitation

  • First 50 sentences of 40 novel extracts (English + Norwegian)
  • Additions enriching the text with a common target language collocation

ST: Det var en blanding av vill dristighet og en frøkenaktig, fornem finhet i hans slekt.

(a mixture)

TT: There was a strange mixture of wild boldness and dignified gentility in the family.

  • A collocational clash in the ST is rendered with a conventional TL combination

ST: the cook's fat son would play plump tunes on his accordion.

TT: kokkens fete sønn spille trivelige melodier på trekkspillet sitt.

(pleasant tunes)

TS: research on collocation

Kenny (2001): Collocation and sanitisation

  • Three-way comparison: a parallel corpus (English/German) and reference corpora of SL/TL
  • Treatment of lexical creativity in translation
  • Starting points: collocation hapaxes and clusters that are repeated in the work of a single author but not attested in any other texts

Augen ~ trinken

ich trinke mit gierigen Augen

(literally: I drink with greedy eyes)

translated as: “my avid eyes drank in…”

TS: research on collocation

Baroni and Bernardini (2003): Collocation in MCC

  • Monolingual comparable corpus of Italian original and translated articles from a single geopolitics journal.
  • All bigrams from the translated sub-corpus and from the original sub-corpus
  • Ranked according to their log-likelihood ratio value
  • “Translated language is repetitive, possibly more repetitive than original language. Yet the two differ in what they tend to repeat: translations show a tendency to repeat structural patterns and strongly topic-dependent sequences, whereas originals show a higher incidence of topic-independent sequences, i.e. the more usual lexicalised collocations in the language”

TS: research on collocation

Danielsson (2001): Collocation: monolingual & translational

  • Units of meaning in two large corpora of English and Swedish
  • Words occurring ≥200 times
  • Collocates (≥5)

plugs sockets (6 occurrences)

headphone sockets (7 occurrences)

sunken sockets (6 occurrences)

bulging their sockets (5 occurrences)

  • Data-sparseness problem: only 2 units of meaning (of the 12,099 previously identified) occur five times or more in the ST component of the parallel fiction corpus (Swedish into English, ~400,000 words per component)

  • General limits of MCC
    • Variables
    • Tools and methods: too crude?
    • Excessive downplay of the source text
    • Over-generalisation of translation universals
  • Specific difficulty of collocational studies
    • Data-sparseness in relatively small corpora

Collocations: a new approach
  • Aim and method
  • Results (monolingual and parallel)
  • Discussion, limits, ways forward
Research questions
  • Are translated texts more/less collocational than original texts in the same language?
    • i.e., are their collocation types overall more/less frequently attested and significant?
  • If so, is this a consequence of the translation process?
    • i.e., can we identify shifts that could account for the observed overall differences?

Aim and method

  • The point is not to look for collocations that repeat themselves frequently within small and hardly comparable “translation-driven” corpora, but to identify those collocations that are frequent and/or significant in the language as a whole.

2 sets of corpus resources
  • Study corpora
    • Small monolingual comparable corpora of fiction texts (English => Italian; Italian => English)
  • Reference corpora
    • The British National Corpus
      • (100 million words from a variety of sources)
    • The Repubblica Corpus
      • (340 million words from a single Italian newspaper)
    • The English and Italian Web via Google/Yahoo automatic API queries

Study corpora (fiction)
  • Translated Italian (author/translator)
    • M. Atwood/C. Penati: Il racconto dell’ancella
    • M. Atwood/M. Papi: Occhio di gatto
    • M. Cruz Smith/P. F. Paolini: Gorky Park
    • C. Fowler/S. Bini: Nozze di sangue
    • N. Gordimer/F. Cavagnoli: Storia di mio figlio
    • G. Greene/B. Oddera: Il decimo uomo
    • D. Leavitt/A. Cossiga: Un luogo dove non sono mai stato
    • R. Rendell/H. Brinis: Oltre il cancello
  • Original Italian (author)
    • F. Camon: La malattia chiamata uomo
    • G. Celati: I narratori delle pianure
    • C. Comencini: Le pagine strappate
    • L. Blissett
    • D. Maraini: Donna in guerra
    • G. Pontiggia: Il giocatore invisibile
    • G. Tomasi di Lampedusa: Il Gattopardo

Corpus preparation
  • Scanning in
  • Tokenisation
  • Tagging (part-of-speech)
  • Lemmatisation
    • TreeTagger
  • Metadata annotation
  • Alignment (easyalign)
  • Indexing (Corpus Workbench, CWB)

Extraction of candidates 1
  • Target sequences
    • Lexical collocations
    • Made of two words
    • Contiguous
  • POS-based extraction from study corpora
    • Based on literature, e.g.
      • JN, NN, VN, V * N, N * * N
  • Collection of frequencies from reference corpora
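A minimal sketch of how POS-based extraction with patterns such as JN, V * N or N * * N might look. It assumes the corpus is already lemmatised and tagged with the coarse tags used on the slide (J, N, V). Real tagger output would first have to be mapped onto these, and `extract_candidates` is a hypothetical name, not part of any toolkit mentioned in the talk.

```python
def extract_candidates(tagged, pattern):
    """Slide a window over (lemma, pos) pairs and return the first and
    last lemma of every window matching `pattern`.
    In `pattern`, "*" stands for exactly one token of any category."""
    width = len(pattern)
    hits = []
    for i in range(len(tagged) - width + 1):
        window = tagged[i:i + width]
        if all(p == "*" or p == pos for p, (_, pos) in zip(pattern, window)):
            hits.append((window[0][0], window[-1][0]))
    return hits


# Invented example sentence, already lemmatised and tagged.
sentence = [
    ("she", "PRO"), ("perform", "V"), ("the", "DET"), ("task", "N"),
    ("with", "PREP"), ("delicate", "J"), ("finger", "N"),
]

print(extract_candidates(sentence, ["J", "N"]))
print(extract_candidates(sentence, ["V", "*", "N"]))
print(extract_candidates(sentence, ["N", "*", "*", "N"]))
```

Each extracted pair would then be looked up in the reference corpora to collect the frequencies needed for the association scores.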

Extraction of candidates 2
  • Calculate MI
    • UCS (Evert 2004-2006)
  • Rank sequences
  • Take top
    • Arbitrary cut-off point: MI>2 and fq2
  • Calculate significance of difference between original and translated
    • Mann-Whitney significance tests
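The filtering and testing steps can be sketched as follows. Several things here are assumptions rather than the study's actual setup: the frequency threshold on the slide is not legible, so `min_freq` is a placeholder parameter, and the test is a plain-Python Mann-Whitney U with a normal approximation and no tie or continuity correction, so a statistics package may report slightly different p-values. All scores are invented.

```python
import math


def select_top(candidates, min_mi, min_freq):
    """Keep (pair, mi, freq) candidates with MI > min_mi and freq >= min_freq."""
    return [c for c in candidates if c[1] > min_mi and c[2] >= min_freq]


def mann_whitney_u(xs, ys):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    pooled = sorted([(v, 0) for v in xs] + [(v, 1) for v in ys])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share their average rank
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(xs), len(ys)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u = min(u_x, n1 * n2 - u_x)
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u, p


# Invented candidates: (pair, MI, frequency).
cands = [(("muro", "cinta"), 4.1, 12), (("bel", "giorno"), 1.3, 40),
         (("raggio", "sole"), 3.7, 1)]
print(select_top(cands, min_mi=2.0, min_freq=2))

# Invented MI scores for one pattern in the two sub-corpora.
original = [2.1, 2.4, 2.2, 2.6, 2.3]
translated = [3.0, 3.4, 2.9, 3.8, 3.1]
u, p = mann_whitney_u(original, translated)
print(u, round(p, 4))
```

A rank-based test suits this comparison because association scores such as MI are not normally distributed.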

Results (MCC, Mann-Whitney)
  • J N lit eng (MI; higher in original, p=.08)
  • N V lit ita (MI; p=.008)
  • N V lit eng (FQ; p=.05)
  • V N lit ita (MI; p=.01)
  • J * J lit ita (MI; p=.06)
  • N prep/conj N lit ita (MI; p=.007)
  • N * N lit eng (FQ; p=.06)
  • N * * N lit ita (FQ; p=.07)


Results (parallel, summary)

Shifts leading to increased “collocativeness”


Creative => collocational (7)

TT: Ricordo l’odore della terra smossa, il <senso di pienezza> che davano le forme tonde dei bulbi chiusi nella mano

LIT: I remember the smell of the turned earth, the sense of fullness that gave the round shapes of the bulbs held in the hand

ST: I can remember the smell of the turned earth, the plump shapes of bulbs held in the hands, fullness

The handmaid’s tale

TT: Il <rumore dei tacchi> risuonò sulle piastrelle del corridoio.

LIT: the noise of the heels resounded on the tiles of the corridor

ST: Her heels clicked on the hall tiles.

Red bride


Different meaning (7)

TT: Fa collezione di <cartine di sigarette> con disegni di aeroplani, e ne conosce tutti i nomi.

ST: He collects cigarette cards with pictures of airplanes on them, and knows the names of all the planes.

Cat’s eye

Free => collocational (11)

ST: handpainted by Alex with purple garlic bulbs, she sees that

A place I’ve never been

TT: decorazioni di <spicchi d' aglio>, si rende conto che

Web data


Explicitation (86) - general

TT: All'apertura nel basso <muro di cinta> l'autista esitò, poi accelerò

LIT: At the opening in the low perimeter wall the driver hesitated, then accelerated

ST: He hesitated at the gap in the low wall, then accelerated and went ahead

A place I’ve never been

TT: schiacciato sotto il <tacco della scarpa>, seppellito

LIT: ground away under the heel of the shoe, buried

ST: ground away under my heel, buried

My son’s story


Explicitation (86) - partitives

TT: Non riuscivo a prendere sonno, così sono sceso a bere un <sorso d'acqua>

LIT: I couldn’t sleep, so I came down to drink a gulp of water

ST: I couldn't sleep, so I came down for water

The tenth man

TT: i <raggi del sole> filtrano dalla lunetta sulla porta

LIT: the rays of the sun filter through the fanlight

ST: Sun comes through the fanlight

The handmaid’s tale


Explicitation (86) - head nouns

TT: manifesti di Bon Jovi e dei Guns' n Roses attaccati con le <puntine da disegno> sul grande mare della parete

ST: Bon Jovi and Guns' n Roses posters thumbtacked into the great sea wall

A place I’ve never been

TT: Osserviamo il <cerume delle orecchie>, il muco del naso e lo sporco tra le dita dei piedi

ST: We look at ear-wax, or snot, or dirt from our toes

Cat’s eye


More formal/more exact (16)

TT: Spostando col piede i <capi di vestiario> sul pavimento, non trovò traccia della prova incriminante.

LIT: items of clothing

ST: Kicking around among the clothes on the floor, he found no trace of the incriminating article.

Red bride

TT: Si stava frugando tra le <pieghe dell'abito>, per prendere il lasciapassare

LIT: folds of the robe

ST: She was fumbling in her robe, for her pass

The handmaid’s tale


Other cases (9)
  • Adverbs

TT: Dal <punto di vista> domestico, si adattarono l' uno all' altra

ST: Domestically they adjusted to one another

My son’s story

  • Domestication

TT: Il cadavere era stato fatto a fettine da una lama larga e pesante, non trovata sul <luogo del delitto>

ST: The corpse had been slashed to ribbons by a large, heavy blade, not found on the premises.

Red bride

  • Gratuitous changes

TT: del greco c'era anche qualche tavolino con sudici <vasetti di fiori> artificiali e bottiglie di ketchup

ST: the Greek had a few tables set out with flyspotted artificial flowers and tomato sauce bottles

My son’s story


Discussion - MCC
  • Are Italian translated texts more/less collocational than originals?
    • Translated texts would seem to be more collocational than originals
    • A single exception: JN into Eng
      • Translated less collocational than original, why?
        • Probable shining-through
        • Over-representation of collocations with common words

Discussion, limits, ways forward

JN in Eng: shining-through?

Delicate fingers

TT: I put some soft golden apricots as big as eggs on his plate, and watch him split them open, hardly moving his long, <delicate fingers>.

ST: Gli ho messo nel piatto delle albicocche grandi come uova, morbide, dorate, e l'ho osservato mentre le spaccava, muovendo appena le dita lunghe e delicate.

Donna in guerra

Collocation        fq1   fq2   fq1-2  MI      LL
delicate fingers   1646  5346  5      2.7545  53.4624
gentle fingers     2477  5346  12     2.9572  139.5338
slender fingers    701   5346  15     3.6023  219.2139
nimble fingers     101   5346  15     4.4437  279.3528
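The figures in this table can be recomputed as a check. Two settings below are inferred from the numbers and are not stated anywhere in the slides: a sample size N of 10^9 and a base-10 logarithm for MI, rather than the log2 shown on the statistics slide. Under those assumptions both the MI and the LL columns come out as reported. The LL function is the standard log-likelihood ratio over a 2x2 contingency table, and the function names are illustrative.

```python
import math

N = 10**9  # ASSUMPTION: inferred from the reported values, not stated in the slides


def mi_log10(f12, f1, f2, n=N):
    """Pointwise MI with a base-10 logarithm (assumed base, see lead-in)."""
    return math.log10(f12 * n / (f1 * f2))


def log_likelihood(f12, f1, f2, n=N):
    """Log-likelihood ratio: 2 * sum(O * ln(O / E)) over the 2x2 table."""
    observed = [f12, f1 - f12, f2 - f12, n - f1 - f2 + f12]
    row_totals = [f1, f1, n - f1, n - f1]
    col_totals = [f2, n - f2, f2, n - f2]
    return 2 * sum(
        o * math.log(o * n / (r * c))
        for o, r, c in zip(observed, row_totals, col_totals)
        if o > 0
    )


# (fq1, fq2, fq1-2, reported MI, reported LL), copied from the table.
TABLE = {
    "delicate fingers": (1646, 5346, 5, 2.7545, 53.4624),
    "gentle fingers": (2477, 5346, 12, 2.9572, 139.5338),
    "slender fingers": (701, 5346, 15, 3.6023, 219.2139),
    "nimble fingers": (101, 5346, 15, 4.4437, 279.3528),
}

for name, (f1, f2, f12, mi, ll) in TABLE.items():
    print(name, round(mi_log10(f12, f1, f2), 4), round(log_likelihood(f12, f1, f2), 4))
```

On both measures "delicate fingers" ranks lowest of the four, which is the pattern the slide offers in support of the shining-through reading.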

JN in Eng: common words

Overall frequency of few: translated 133, original 39

Discussion - parallel
  • Is higher collocativeness a consequence of the translation process?
    • Probably…
  • NB: shifts towards higher collocativeness would appear to be
    • partly independent
      • free => collocational, different meaning (normalisation)
    • partly related to other strategies/procedures
      • explicitation, shining-through

  • Just how certain are we that these shifts are the cause of the observed differences?
    • Shifts are no doubt observable also in non-significant rankings…
  • (To what extent) could single author or translator preferences account for these differences?
    • The corpora are very small…

Further work
  • Bottom-up search for regularities
    • Other genres?
  • Source-oriented approach
    • Starting from ST collocations
  • Role of reference corpora
    • BNC, WWW, ukwac / Repubblica, WWW, itwac
  • Collocation extraction
    • Evaluation of method: no hands!
  • Creative exploitation of collocations
    • Can it be automatised?

Thank you


Baker, M. 1993. “Corpus linguistics and translation studies”. In Baker et al. (eds) Text and Technology. Benjamins.

Baker, M. 1995. “Corpora in translation studies: An overview and some suggestions for future research”. Target 7, 2.

Baroni, M. and S. Bernardini. 2003. “A preliminary analysis of collocational differences in monolingual comparable corpora”. In Archer et al. (eds), Proceedings of CL 2003. UCREL.

Danielsson, P. 2001. The automatic identification of meaningful units in language. PhD Thesis. Göteborg University.

Evert, S. 2004-2006. The UCS Toolkit.

Firth, J.R. 1956 (1968). “Descriptive linguistics and the study of English”. In Palmer (ed) Selected papers of J.R. Firth 1952-1959. Longman.

Gledhill, C. 2000. Collocations in science writing. Gunter Narr.

Howarth, P. 1996. Phraseology in English academic writing. Max Niemeyer.

Kenny, D. 2001. Lexis and creativity in translation. St. Jerome.

Kjellmer, G. 1987. “Aspects of English collocations”. In Meijs (ed) Corpus Linguistics and Beyond. Rodopi.

Kjellmer, G. 1994. A Dictionary of English collocations. Clarendon Press.

Olohan, M. 2004. Introducing corpora in translation studies. Routledge.

Øverås, L. 1998. “In search of the third code: An investigation of norms in literary translation”. Meta 43, 4.

Sinclair, J. McH. 1991. Corpus, concordance, collocation. Oxford University Press.

Sinclair, J. McH. and S. Jones. 1974. “English lexical collocations”. Cahiers de Lexicologie 24.

Toury, G. 1995. Descriptive translation studies and beyond. Benjamins.