
Towards a Methodology for a Corpus-Based Approach to Translation Evaluation


  1. The main advantage of concordancing tools is that they allow translators to see terms in a variety of contexts simultaneously to detect various kinds of linguistic and conceptual patterns. The majority of corpus analysis tools also offer a number of other features, which often combine the data produced by the concordancer and word frequency counts.

  2. Towards a Methodology for a Corpus-Based Approach to Translation Evaluation. By: Mojgan Heydarali. Professor: Dr. Behbahani. Course: Translation Assessment. Azad University of Literature & Foreign Languages, Tehran South Branch.

  3. Content
     1. Translator trainers' responsibility
     2. Limitations of evaluation tools
     3. Importance of the corpus-based approach
     4. Characteristics of the corpus-based approach
     5. Challenges facing evaluators in an academic context
     6. Corpora and corpus analysis tools
     7. Designing an evaluation corpus
        a. Comparable Source Corpus
        b. Quality Corpus
        c. Quantity Corpus
        d. Inappropriate Corpus

  4. Translator trainers are responsible for: grading students' work and, importantly, providing useful feedback.

  5. In the past, translators and trainers worked with resources such as: dictionaries, printed parallel texts, unverified intuition, and subject field experts. But these were not always conducive to providing the conceptual and linguistic knowledge necessary for an objective translation evaluation.

  6. What is the importance of a corpus-based approach? 1: It removes a great deal of subjectivity. 2: It provides improved access to appropriate conceptual and linguistic information about a specialized subject field, as documented by experts in that field.

  7. In other words, a specially designed evaluation corpus can act as a benchmark for comparing students' translations on a number of different levels.

  8. So translator trainers, by having access to a wide range of authentic and suitable texts, can: verify or correct both conceptual and linguistic information, and provide more constructive feedback based on evidence.

  9. What are the characteristics of a corpus-based approach? Firstly, it is based on the analysis of a comparatively large and carefully selected collection of naturally occurring texts that are stored in machine-readable form (i.e., a corpus).

  10. Secondly, because it analyzes actual patterns of language use in the corpus, it is empirical and therefore objective.

  11. Thirdly, this approach takes advantage of computational tools and methods for manipulating the corpus and arranging the data in ways that make it possible to spot items and patterns that would be difficult to identify in other types of resources.

  12. Additionally, computers provide consistent and reliable analysis (i.e., they do not change their minds or get distracted).

  13. Finally, the corpus-based approach combines both quantitative and qualitative techniques: a computer is capable of churning out counts of linguistic features, but the translator trainer is responsible for exploring and interpreting the data in order to learn about patterns of language use.

  14. 1. Challenges Facing Evaluators in an Academic Context. A. The main difficulty surrounding translation evaluation is its subjective nature; the notion of quality has very fuzzy and shifting boundaries. B. Clients who commission translations are not interested in educating the translator, while a trainer has an obligation to help students improve their performance.

  15. C. In order to properly prepare students for entering the translation profession, students need to be exposed to a wide range of translation material and text types, but naturally trainers are not experts in all subjects. A specially designed evaluation corpus can help to meet this need.

  16. Corpora and corpus analysis tools. Similarity between a corpus and conventional parallel texts: in a translation context, a suitable corpus might be one containing texts that correspond to the intended skopos of the target text. In this way a corpus is similar to the conventional parallel texts used by many translators.

  17. However, an electronic corpus is generally much larger and can be processed with the help of computerized tools known as corpus analysis tools.

  18. Most corpus analysis tools contain at least two main features: word frequency lists and concordancers.
      A. Word frequency lists allow users to discover how many different words are in the corpus and how often each appears.

          DVD   765    video  126    not   89    player  80
          Is    341    we     121    said  85    all     79
          Will  208    have   116    PC    82    MPEG    81

  19. “I really like translation because I think that translation is really, really interesting.” Tokens (total words) = 13. Types (different words) = 9. Word lists can be sorted in: alphabetical order, ascending order of frequency, or descending order of frequency.
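The token and type counts for the example sentence can be reproduced with a few lines of Python. This is only a minimal sketch: the `tokenize` helper and its lowercasing rule are assumptions made for illustration, and real corpus analysis tools may split words differently.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed rule: lowercase the text and keep runs of letters
    # (allowing internal apostrophes or hyphens) as tokens.
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text.lower())

sentence = ("I really like translation because I think that "
            "translation is really, really interesting.")

tokens = tokenize(sentence)
freq = Counter(tokens)

print("tokens:", len(tokens))   # total running words
print("types:", len(freq))      # different words

# The same list sorted two ways: by descending frequency, and alphabetically.
by_frequency = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
alphabetical = sorted(freq.items())
print(by_frequency)
print(alphabetical)
```

With this tokenization the sentence yields 13 tokens and 9 types, matching the figures on the slide.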

  20. Words belonging to the same lemma can be counted together or separately, as can words beginning with upper or lower case. A lemma groups words which have the same stem and belong to the same major word class, differing only in spelling or inflection. Stop lists are lists of words to be ignored; they can be used to eliminate common function words such as prepositions or conjunctions. Frequency information can help translators decide which term to use when faced with a number of potential synonyms or translation equivalents.
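As a rough illustration of these options, the sketch below counts words with an optional stop list, case folding, and lemma grouping. The `STOP_LIST` and `LEMMA_MAP` contents are hand-made placeholders invented for this example; an actual tool would rely on a proper lemmatizer and a fuller stop list.

```python
from collections import Counter

# Illustrative placeholders only, not taken from any real tool.
STOP_LIST = {"the", "a", "of", "and", "in", "to"}
LEMMA_MAP = {"plays": "play", "played": "play", "playing": "play",
             "players": "player"}

def lemma_counts(tokens, stop_list=STOP_LIST, lemma_map=LEMMA_MAP,
                 fold_case=True):
    """Count tokens, skipping stop words and merging forms of one lemma."""
    counts = Counter()
    for tok in tokens:
        key = tok.lower() if fold_case else tok
        if key.lower() in stop_list:
            continue  # ignore common function words
        counts[lemma_map.get(key, key)] += 1
    return counts

sample = ["The", "player", "plays", "the", "DVD", "and", "played", "a", "dvd"]
print(lemma_counts(sample))                   # case folded, lemmas merged
print(lemma_counts(sample, fold_case=False))  # "DVD" and "dvd" kept apart
```

Counting "plays" and "played" under "play" gives the translator a frequency for the term as a whole rather than for each inflected form.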

  21. B. Concordancer. A concordancer retrieves all the occurrences of a particular search pattern in its immediate contexts and displays these in an easy-to-read format. The most commonly used format is KWIC (key word in context), which shows one occurrence of the search pattern per line, with the search pattern itself highlighted in the center of the screen.

            Atsushita slick, portable DVD  player  with a color LCD and
          Ndows explorer, but their movie  player  software refused to pla
          Ers with a “record” button. The  player  will not even have the
          o three years,” he says. Such a  player  would have a display

  22. * The extent of the context on either side of the search pattern is variable. * These contexts can be sorted in a variety of ways, such as: a. order of appearance in the corpus, b. alphabetically, c. by the words preceding or following the search pattern.
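The KWIC display and its sorting options can be sketched in Python as follows. The function names, the character-based `width` parameter, and the sample text are assumptions for illustration; they do not reproduce the behavior of any particular concordancing package.

```python
import re

def kwic(text, pattern, width=30, sort_by="appearance"):
    """Return (left, keyword, right) tuples for each match of `pattern`.

    `width` is the number of context characters kept on each side.
    """
    hits = []
    for m in re.finditer(pattern, text):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    if sort_by == "left":
        # Sort on the word immediately preceding the keyword.
        hits.sort(key=lambda h: (h[0].split() or [""])[-1].lower())
    elif sort_by == "right":
        # Sort on the word immediately following the keyword.
        hits.sort(key=lambda h: (h[2].split() or [""])[0].lower())
    return hits

def show(hits, width=30):
    # Right-align the left context so the keyword lines up in a column.
    for left, keyword, right in hits:
        print(f"{left:>{width}} [{keyword}] {right}")

text = ("A portable DVD player with a color LCD. Their movie player "
        "software refused to play. The player will not record.")
show(kwic(text, r"\bplayer\b", sort_by="right"))
```

Sorting on the following word groups lines such as "player software" and "player will" together, which is what makes recurring patterns easy to spot.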

  23. Concordancers are flexible and allow functions such as: * Case-sensitive vs. non-case-sensitive searches (e.g., 'Bill', the former US president, vs. 'bill'; 'Polish', the people of Poland, vs. 'polish'). * Wildcard searches (e.g., 'play*' to retrieve 'play', 'player', 'played', etc.). * Context searches, where another term must appear within a user-specified distance of the search term (e.g., contexts where 'play' appears within five words of 'DVD').
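These three search options map fairly directly onto regular expressions. The sketch below is one possible way to express them; the helper names, the `*` wildcard convention, and the sample tokens are assumptions for this example rather than the syntax of a specific tool.

```python
import re

def wildcard_to_regex(term, case_sensitive=False):
    """Compile a search term, treating '*' as 'any run of word characters'."""
    body = re.escape(term).replace(r"\*", r"\w*")
    flags = 0 if case_sensitive else re.IGNORECASE
    return re.compile(rf"\b{body}\b", flags)

def within_distance(tokens, term_a, term_b, max_gap=5):
    """Indexes of tokens matching term_a that have a term_b match
    within max_gap tokens on either side."""
    a = wildcard_to_regex(term_a)
    b = wildcard_to_regex(term_b)
    pos_b = [i for i, tok in enumerate(tokens) if b.fullmatch(tok)]
    return [i for i, tok in enumerate(tokens)
            if a.fullmatch(tok) and any(0 < abs(i - j) <= max_gap for j in pos_b)]

# Wildcard search: 'play*' retrieves 'play', 'player', 'played'.
print(wildcard_to_regex("play*").findall("play player played display"))

# Case-sensitive search separates 'Polish' from 'polish'.
print(wildcard_to_regex("Polish", case_sensitive=True)
      .findall("Polish people polish shoes"))

# Proximity search: 'play' within five words of 'DVD'.
words = ("the DVD player can play most discs but will not play "
         "old tapes").split()
print(within_distance(words, "play", "DVD", max_gap=5))
```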

  24. The majority of corpus analysis tools also offer a number of other features, which often combine the data produced by the concordancer and the frequency counts. Two points must be considered: * The value of what comes out of a corpus is largely dependent on what texts are included in it. * Criteria for designing general language corpora have been well documented in the literature; however, these criteria cannot be adopted wholesale for the design of a special-purpose corpus such as an Evaluation Corpus.

  25. Designing an Evaluation Corpus. The Evaluation Corpus is the collective name for a collection of texts that is divided into four main sub-corpora: 1. the Comparable Source Corpus, 2. the Quality Corpus, 3. the Quantity Corpus, 4. the Inappropriate Corpus. These sub-corpora differ in content and intended function.

  26. 1. Comparable Source Corpus (CSC). It is optional and depends on factors such as time, text type, and the skopos of the target text. The CSC contains a selection of SL texts that are similar to the source text in terms of text type, publication date, and subject matter.

  27. The purpose of the CSC. Its purpose is to allow the evaluator to gauge the “normality” of the source text with regard to other source language texts of that type. Normalization is a feature of translated texts; normalized texts display exaggerated features of the target language and conform to its typical patterns (Baker, 1997).

  28. Sanitization: the suspected adaptation of a source text reality to make it more palatable for target audiences (Kenny). Both normalization and sanitization result in deliberately chosen unconventional lexical or syntactic ST features being changed in translation so that the TT fits in with the conventions of the target language.

  29. Determining inappropriate normalization or sanitization. First, evaluators can use the CSC as a reference corpus to establish the relative normality of the ST. Second, they can use the Quantity Corpus as a reference corpus to establish the relative normality of the TT.

  30. If the ST is deemed to be normal (in vocabulary, register, style, etc.) with reference to texts in the Comparable Source Corpus, then the target text should be normal when compared with texts in the Quantity Corpus (and vice versa).

  31. 2. Quality Corpus. The Quality Corpus is a high-quality sub-corpus consisting of texts handpicked primarily for their conceptual content. It is very small by corpus linguistics standards, containing four or five texts with a total of 5,000 words.

  32. The Quality Corpus is used primarily as a source of conceptual information rather than linguistic information, so it is not necessary for all texts to be of the same text type. But it is important that they be complete texts (not samples or extracts). At least some of the texts should be current. Using the Quality Corpus will help the translator trainer become familiar with basic concepts in the field and identify some of the key terms. If the texts are well chosen, they can serve as a benchmark for evaluating students' translations.

  33. 3. Quantity Corpus. Why is it not appropriate to rely exclusively on the Quality Corpus? Firstly, because it is a relatively small collection, there is no real way to know whether the selected texts are truly representative of the text type at large. Secondly, the texts contained in the Quality Corpus may be “older” texts, and a term which was appropriate in the past may no longer be so.

  34. The Quantity Corpus is designed to provide a larger and more representative sample of the specialized language in question. External factors such as time and availability of data influence the question of how large and how representative it can be. In practice, Quantity Corpora of 20,000 to 200,000 words have proved useful: 20,000 for highly specialized subject fields, 200,000 for subject fields that are not extremely narrow.

  35. It is useful to divide the Quantity Corpus into further sub-corpora, one for each year; this enables translators or evaluators to track terminological changes over time. A corpus analysis tool such as WordSmith allows users to consult multiple corpora at once.
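Tracking a term across yearly sub-corpora amounts to counting it separately in each year's texts. The sketch below assumes the sub-corpora are held in a dictionary keyed by year; the function name and the toy sentences are invented for illustration and are not data from the presentation.

```python
import re

def term_by_year(subcorpora, term):
    """Map each year to (raw hits, hits per 1,000 tokens) for `term`.

    `subcorpora` is assumed to be a dict: year -> list of text strings.
    """
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    trend = {}
    for year in sorted(subcorpora):
        text = " ".join(subcorpora[year])
        hits = len(pattern.findall(text))
        total = len(re.findall(r"\w+", text))
        trend[year] = (hits, 1000 * hits / total if total else 0.0)
    return trend

# Invented toy data standing in for real yearly sub-corpora.
toy = {
    1998: ["The videodisc player was new.", "A videodisc holds a film."],
    1999: ["The DVD player replaced the videodisc player.", "DVD sales grew."],
}
for term in ("videodisc", "DVD"):
    print(term, term_by_year(toy, term))
```

Normalizing per 1,000 tokens keeps years comparable when the sub-corpora differ in size.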

  36. The Quantity Corpus: pros and cons. Pros: The Quantity Corpus is compiled in a semi-automated fashion and can be used by the translator trainer to verify the terminological, phraseological, and stylistic appropriateness of choices made by students. Most corpus analysis software gives users the option of expanding the context to several lines or to the complete text. The volume of data makes it possible to spot patterns more easily, to make generalizations, and to provide concrete evidence to support decisions.

  37. Cons: Interacting solely with a large electronic corpus can cause one to lose sight of the fact that translation is a text-based activity. In corpus analysis the focus is on micro-contexts, and the primary power of corpus analysis remains at a sub-text level. Not all texts are readily available in electronic form.

  38. 4. Inappropriate Corpus. It is a corpus containing “inappropriate” parallel texts. Its size varies with the subject: for well-established subjects or those of wider interest it will be larger, while for very recent subjects it will be smaller. Its purpose is to help the translator trainer uncover the source of unsuitable equivalents in students' translations. If a student has used a term which does not appear in the Quality or Quantity Corpus, it can be checked in this corpus.
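The checking order described here (Quality and Quantity Corpus first, then the Inappropriate Corpus) can be sketched as a small lookup. The function names, the returned labels, and the toy texts are all hypothetical; they only illustrate the workflow and say nothing about how a trainer should weigh the result.

```python
import re

def count_term(term, texts):
    """Count whole-word, case-insensitive occurrences of `term` in `texts`."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return sum(len(pattern.findall(t)) for t in texts)

def triage_term(term, quality, quantity, inappropriate):
    """Report which sub-corpus, if any, attests a student's term."""
    if count_term(term, quality) or count_term(term, quantity):
        return "attested in Quality/Quantity Corpus"
    if count_term(term, inappropriate):
        return "found only in Inappropriate Corpus"
    return "not attested in any sub-corpus"

# Invented toy texts standing in for the three sub-corpora.
quality = ["A DVD player reads optical discs."]
quantity = ["The DVD player sold well.", "Each DVD player supports MPEG video."]
inappropriate = ["Buy a cheap DVD machine today!"]

for term in ("DVD player", "DVD machine", "disc reader"):
    print(term, "->", triage_term(term, quality, quantity, inappropriate))
```

A term found only in the Inappropriate Corpus suggests the student drew on an unsuitable parallel text, which gives the trainer concrete evidence for feedback.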

  39. THE END
