Collocations in translated text: issues, insights, implications. Silvia Bernardini, University of Bologna, Italy. Aston Corpus symposium, 23 May 2008.

Collocations in translated text: issues, insights, implications

Silvia Bernardini

University of Bologna, Italy

Aston Corpus symposium

23 May 2008

Talk outline
  • Collocations
    • Corpus Linguistics
    • Corpus-Based Translation Studies
  • Research questions, methodology, results
    • Fiction
    • Open source software
  • Implications
    • Descriptive and applied
  • Methodological follow up
  • Future work
Background: Collocations in CL
  • “Phraseology-oriented” approaches
    • E.g. (Howarth 1996:47)

[Restricted collocations are] combinations in which one component is used in its literal meaning, while the other is used in a specialised sense. The specialised meaning of one element can be figurative, delexical or in some way technical and is an important determinant of limited collocability at the other. These combinations are, however, fully motivated.

Background: Collocations in CL
  • “Parameters” of collocation within phraseology approaches
    • Motivation/arbitrariness
    • Commutability
    • Non-literalness
    • Transparency
    • Unpredictability
Background: Collocations in CL
  • “Frequency-oriented” approaches
    • “Automatisation” is the result of repetition
    • British school of linguistics (Firth)
      • The statistical tendency of words to co-occur (Hunston 2002: 12)
      • “Significant” collocation is regular collocation between items, such that they occur more often than their respective frequencies and the length of the text in which they occur would predict (Jones and Sinclair 1974:19)
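
The Jones and Sinclair definition compares what is observed with what frequency and text length would predict. A minimal sketch of that comparison follows, assuming the two words are independent under the chance model; the function names and all counts are hypothetical and are not taken from the talk.

```python
def expected_cooccurrences(f_node, f_collocate, corpus_size, span=1):
    """Co-occurrences expected by chance if the two words were independent.

    span is the number of token positions inspected around each node occurrence.
    """
    return f_node * f_collocate * span / corpus_size


def observed_to_expected(f_pair, f_node, f_collocate, corpus_size, span=1):
    """A ratio above 1 means the pair occurs more often than chance predicts."""
    return f_pair / expected_cooccurrences(f_node, f_collocate, corpus_size, span)


# Hypothetical counts, for illustration only.
ratio = observed_to_expected(f_pair=30, f_node=1_000, f_collocate=500,
                             corpus_size=1_000_000, span=1)
print(round(ratio, 1))  # 60.0
```

Whether a ratio of this size counts as "significant" depends on the statistical test layered on top of it.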
Searching for collocations in text
  • “Keyword” method
    • Starting from a (set of) keyword(s) and looking left and right
      • E.g. Sinclair 1998, Stubbs 2001, Danielsson 2001
  • “Sequence” method
    • Selecting all sequences of N words (or lemmas, or POS tags) that recur a certain number of times
      • E.g. Kjellmer 1994, Biber et al. 1999, Johansson 1993
  • MI, t-score, z-score, log-likelihood…
    • P. Baker (2006), McEnery et al (2006)
  • Bare frequency
    • Krenn and Evert (2001)
  • A mixture of both
    • MI * log fq
      • Kilgarriff and Tugwell (2001)
    • frequency-based cut-offs
      • Krenn (2000)
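
The "sequence" method with a bare-frequency cut-off can be sketched in a few lines. This is only an illustration: the function name, the whitespace tokenisation and the toy sentence are assumptions, and real studies would work on lemmatised or POS-tagged corpora.

```python
from collections import Counter


def recurrent_sequences(tokens, n=2, min_freq=2):
    """Return all contiguous n-token sequences occurring at least min_freq times."""
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {seq: c for seq, c in counts.items() if c >= min_freq}


# Toy input, for illustration only.
text = "the web site lists each web site and every search engine on the web"
bigrams = recurrent_sequences(text.lower().split(), n=2, min_freq=2)
print(bigrams)
```

Note that a purely frequency-based filter also keeps grammatical sequences such as "the web", which is one reason association measures or POS filters are usually added.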
NN in ukWaC (bare fq, top 10)

fq      collocation
175642  web site
81127   case study
70514   search engine
66693   application form
65198   credit card
60626   web page
56721   car park
48833   health care
47655   climate change
46643   email address

Collocations in CBTS: applied perspectives
  • Bahumaid (2006)
    • Arab university lecturers translating sentences containing collocations (make a noise, domino effect) into English and into Arabic with any reference tools available
      • Less than 50% “correct” answers even when translating into their L1
      • Paraphrase most common strategy (40-48%)
Collocations in CBTS: applied perspectives
  • Hatim and Mason (1997:205)
    • Collocations should in general be neither less unexpected (i.e. more banal) nor more unexpected (i.e. demanding greater processing effort) than in the ST
  • Baker (1992: 56ff)
    • Engrossing effect of source text patterning
    • Tension between accuracy and naturalness
    • The use of established patterns of collocation […] helps to distinguish between a smooth translation, one that reads like an original, and a clumsy translation which sounds ‘foreign’.
Issues in descriptive CBTS
  • Translation “norms” or “universals”
    • Corpus research in TS should focus on the identification of “features which typically occur in translated text rather than original utterances and which are not the result of interference from specific linguistic systems”. (Baker 1993:243)
    • E.g.: explicitation/explicitness, simplification, disambiguation, levelling out (homogeneity), preference for conventional grammar, avoidance of repetition, exaggeration of features of the target language, normalisation/sanitisation…
Collocations in CBTS: descriptive perspectives
  • Anecdotal evidence by Øverås (1998):

ST: Arket i skrivemaskinen var like skinnende nyfødt blankt som da hun satte det inn i valsen for en time siden.

(newborn blank)

TT: The sheet of paper in her typewriter was as pristinely white as when she had inserted it over an hour ago.

  • Confirms Toury’s (1995) hypothesis that translators often produce repertoremes in place of textemes, i.e. they “produce ready-made, cliché structures”.
Collocations in CBTS: descriptive perspectives
  • Kenny (2001)
    • Normalisation/sanitisation in the translation of creative lexical combinations
  • Danielsson (2001)
    • Automatic identification of collocations (keyword-based) in ST corpus and analysis of renderings in TT corpus
  • Dayrell (2007)
    • Range of collocations employed in original vs. translated language (monolingual comparable comparison)
    • 10 nouns with frequency >200 and their collocates in a span ±4, fq4, MI4
  • Kenny (2001)
    • Habitual collocations not covered; method not scalable
  • Danielsson (2001)
    • Plagued by data-sparseness
      • Only 2 units of meaning (of the ~12K identified in a large monolingual corpus) occur 5 times in an 800K-word parallel corpus
  • Dayrell (2007)
    • Main issue investigated is lexical repetitiveness at the collocational level
    • Selective focus: collocations of frequent words only
    • No cross-check with source texts
    • Uncontrolled variable makes results difficult to interpret
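
The node-plus-span setup mentioned above for Dayrell (2007), i.e. collocates of a node word within ±4 positions, can be sketched as follows. The function name and the toy sentence are hypothetical, and no frequency or MI threshold is applied here.

```python
from collections import Counter


def window_collocates(tokens, node, span=4):
    """Count every token found within `span` positions left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        lo, hi = max(0, i - span), min(len(tokens), i + span + 1)
        # Skip the node position itself; everything else in the window counts.
        counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


# Toy input, for illustration only.
tokens = "she made a loud noise and he made no noise at all".split()
coll = window_collocates(tokens, "noise", span=4)
print(coll.most_common(3))
```

Overlapping windows count a shared collocate once per node occurrence, which is one of several conventions in use.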
An alternative approach: research questions
  • Are translated texts more/less collocational than original texts in the same language?
    • i.e., are their collocation types overall more/less frequently attested and/or significant?
  • If so, is this a consequence of the translation process?
    • i.e., can we identify shifts that could account for the observed overall differences?
An alternative approach: corpus resources
  • Literary and specialised texts English/Italian
    • Monolingual comparable corpora (MCC)
      • Originals in Language A and comparable translations into Language A
    • Parallel corpora
      • Originals in Language A and their translations into Language B, usually combined with reference corpora

+ Reference corpora of English (BNC) and Italian (Repubblica)

An alternative approach: corpus resources
  • Literary texts
    • 8 English STs → Italian TTs (samples)
    • 7 Italian STs → English TTs (samples)
    • ~150K words per component
  • Specialised texts
    • Open-source software documentation
      • 10 English STs → Italian TTs (full texts)
      • 6 Italian originals (full texts) (→ 1 English translation)
    • ~250K words per component
OSS texts sampled

S. Frampton Linux administration made easy
L. Wirzenius The Linux System Administrator’s Guide
M. Cooper The Advanced Bash-Scripting Guide
G. Beekmans Linux from scratch
G. Short 3-button mouse HOWTO
D. Jarvis 3D Graphics Modelling and Rendering mini HOWTO
J. Tranter Linux Amateur Radio AX.25 HOWTO
E. Raymond The DocBook Demystification HOWTO
P. Gortmaker Linux Ethernet HOWTO

A. Madesani IDE e SoundBlaster 32 creative – HOWTO
L. Pulici Adaptec AVA 1505 mini-HOWTO
G. Paolone LDR Linux Domande e Risposte
D. Medri Linux facile
G. Giusti Programmare in PHP
D. Giacomini Appunti di informatica libera

Extracting collocations
  • Target sequences
    • Lexical collocations
    • Made of two words
    • Contiguous
  • POS-based extraction from study corpora
    • JN, NN, VN, V * N, N * * N (types)
    • Collection of token frequencies from reference corpora (BNC and Repubblica)
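
A rough sketch of POS-pattern extraction over a tagged text is given below. It assumes a simplified one-letter tagset (J, N, V and so on) and reads each `*` in the patterns as exactly one intervening token of any category; both are assumptions made for illustration, as are the function name and the toy sentence.

```python
from collections import Counter

# Pattern name -> (tag of first word, number of intervening tokens, tag of last word)
PATTERNS = {
    "JN": ("J", 0, "N"),
    "NN": ("N", 0, "N"),
    "VN": ("V", 0, "N"),
    "V * N": ("V", 1, "N"),
    "N * * N": ("N", 2, "N"),
}


def extract_candidates(tagged, patterns=PATTERNS):
    """Collect word pairs whose tags match a pattern; gap tokens may carry any tag."""
    found = {name: Counter() for name in patterns}
    for name, (first, gap, last) in patterns.items():
        for i in range(len(tagged) - gap - 1):
            (w1, t1), (w2, t2) = tagged[i], tagged[i + gap + 1]
            if t1 == first and t2 == last:
                found[name][(w1, w2)] += 1
    return found


# Toy tagged sentence, for illustration only.
tagged = [("install", "V"), ("the", "D"), ("kernel", "N"), ("source", "N"),
          ("on", "P"), ("a", "D"), ("disk", "N"), ("with", "P"),
          ("free", "J"), ("space", "N")]
cands = extract_candidates(tagged)
for name, pairs in cands.items():
    print(name, dict(pairs))
```

The pairs collected this way are types from the study corpus; their token frequencies would then be looked up in the reference corpus.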
Extracting collocations
  • Calculate Mutual Information (MI)
  • Rank sequences
  • Take top
    • Arbitrary cut-off point: MI>2 and fq>1
  • Calculate significance of difference between original and translated texts
    • Mann-Whitney significance tests
Mutual Information

MI compares the probability of observing x and y together (the joint probability) with the probabilities of observing x and y independently (chance). If there is a genuine association between x and y, then the joint probability P(x,y) will be much larger than chance […].

(Church & Hanks 1990:77)

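$$\mathrm{MI}(x;y) = \log_2 \frac{p(x,y)}{p(x)\,p(y)} = \log_2 \frac{f(x,y) \cdot N}{f(x)\,f(y)}$$

where f(·) are observed frequencies and N is the corpus size, so that p = f/N.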
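
The MI score and the cut-off described earlier (MI>2 and fq>1) can be sketched as below. The sketch reads the quantities in the formula as raw reference-corpus frequencies with N as the corpus size, which is an assumption; the function names and all counts are hypothetical.

```python
import math


def mutual_information(f_xy, f_x, f_y, n):
    """Pointwise MI in bits: log2 of observed over chance co-occurrence."""
    return math.log2((f_xy * n) / (f_x * f_y))


def select_collocations(candidates, n, min_mi=2.0, min_fq=1):
    """Keep pairs with fq > min_fq and MI > min_mi, ranked by decreasing MI.

    candidates maps (w1, w2) -> (f_xy, f_x, f_y), all from the reference corpus.
    """
    scored = [
        (mutual_information(f_xy, f_x, f_y, n), pair, f_xy)
        for pair, (f_xy, f_x, f_y) in candidates.items()
        if f_xy > min_fq
    ]
    return sorted((s for s in scored if s[0] > min_mi), reverse=True)


# Hypothetical reference-corpus counts, for illustration only.
N = 1_000_000
candidates = {
    ("barbed", "wire"): (64, 100, 625),   # strongly associated
    ("new", "wire"): (2, 4000, 800),      # rarer together than chance predicts
    ("rare", "pair"): (1, 10, 10),        # high MI but attested only once
}
selected = select_collocations(candidates, N)
print(selected)
```

The third pair shows why a frequency floor is combined with MI: pairs of rare words reach very high MI on the strength of a single occurrence.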

Mann-Whitney-Wilcoxon ranks test
  • Confidence with which we can reject the null hypothesis that two ranked sets of observations are taken from the same population
  • Non-parametric, i.e. makes no assumptions about observations being normally distributed
  • Used (and tested) by Kilgarriff (2001) in comparisons of the LOB and Brown corpora and of male and female speech in the BNC
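
As an illustration of the test, the sketch below computes the Mann-Whitney U statistic with a normal approximation for the p-value, using only the standard library. It applies no tie or continuity correction, so it is a teaching sketch rather than a replacement for a statistics package; the two samples are hypothetical MI scores.

```python
import math


def average_ranks(values):
    """Rank values from 1 upwards, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Return (U for sample a, approximate two-sided p-value)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail of the standard normal
    return u1, p


# Hypothetical MI scores for two sets of collocations, for illustration only.
sample_a = [5.1, 4.8, 6.2, 5.9, 5.5]
sample_b = [3.9, 4.2, 4.0, 4.7, 3.5]
u, p = mann_whitney_u(sample_a, sample_b)
print(u, round(p, 4))
```

With samples of realistic size, a library routine such as `scipy.stats.mannwhitneyu` would normally be preferred, since it handles ties and exact small-sample distributions.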
Rankings (top 10) for JN (eng)

Original fiction corpus

MI      collocation         fq (BNC)
7.0621  Shredded Wheat      9
6.4372  open-toed sandals   5
5.9465  beta carotene       5
5.7365  Milky Way           80
5.5479  barbed wire         193
5.4172  floppy disks        63
5.3891  eternal damnation   14
5.3798  cursive script      18
5.3046  pearl necklace      14
5.2500  herbal teas         7

Translated fiction corpus

MI      collocation             fq (BNC)
6.2687  wall-to-wall carpeting  6
6.1698  vous plait              10
5.6773  pistachio nuts          10
5.3305  boric acid              5
5.2218  submachine gun          9
5.2170  Venetian blinds         16
5.2060  Neapolitan dialect      4
5.1170  nasal twang             2
5.0816  westering sun           4
5.0775  hard-boiled eggs        30

Summing up
  • Translated fiction texts (Italian and English) tend to be (overall) richer in salient collocations than original texts in the same language
  • Italian (and English) open source software manuals however show the opposite trend…
Implications for descriptive TS
  • Norm/law-governed (rather than universal) trends (Toury 1995)
    • Law of interference
      • Stronger in OSS translation
    • Law of growing standardization
      • Stronger in fiction translation
Implications for applied TS
  • Parallel comparison (not discussed here) highlights strategies displayed by professional translators at the collocational level
  • Starting point for awareness-raising and revision exercises focusing on:
    • Normalization
    • Rise in formality
    • Explicitation
Methodological follow up
  • Crucial role played by reference corpora
  • What happens if we repeat the calculations with MI data from different reference corpora?
Adjective-Noun (Italian OSS texts)
  • Repubblica (fq>1 and MI>2)
  • itWaC (fq>10 and MI>1)
Noun - prep|conj - Noun (Italian fiction texts)
  • Repubblica (fq>1 and MI>2)
  • itWaC (fq>10 and MI>1)
Further work
  • Bottom-up search for regularities
    • Other genres?
  • Source-oriented approach
    • Starting from ST collocations
  • Collocation extraction and reference corpora
    • Evaluation of method
  • Search for creative exploitation of collocations
    • Can it be automatised?

Thank you

Silvia Bernardini

University of Bologna, Italy

Aston Corpus symposium

23 May 2008