The link between controlled language and post editing

The Link between Controlled Language and Post-Editing:

An Empirical Investigation of Technical, Temporal and Cognitive Effort

Sharon O’Brien, CTTS/SALIS

Overview

  • Research Parameters

  • Temporal Effort

  • Technical Effort

  • Cognitive Effort

  • Conclusions


Definition

  • Controlled language (CL): “an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style.”

    (Huijsen, 1998: 2)

Motivation – In a Nutshell

  • Can the introduction of CL rules really improve MT output such that post-editing effort is reduced?

Machine “Translatability”

  • One of the main “goals” of CL

  • The notion of translatability is based on so-called "translatability indicators" where the occurrence of such an indicator in the text is considered to have a negative effect on the quality of machine translation. The fewer translatability indicators, the better suited the text is to translation using MT.

    (Underwood and Jongejan, 2001: 363)

Machine “Translatability”

  • “Negative” Translatability Indicators

    • “NTIs” for short

    • Examples (for English as SL)

      • Long noun phrases

      • Passive voice

      • Ungrammatical constructs

      • Use of slang…

    • Use of NTI list (Bernth and Gdaniec, 2001)

    • Use of the term “minimal NTI” for sentences from which all listed NTIs have been removed

Research Design

  • SL: English; TL: German

  • Text Type: User Manual (1 777 words)

  • Users: 12 Professional Translators

  • Tools: IBM WebSphere, Translog, IBM’s EasyEnglishAnalyzer, Sun Microsystems’ Sunproof

  • Place of Data Capture: IBM Stuttgart


  • Edit SL text to create two sentence types:

    • S(nti) = sentences with known negative translatability indicators

    • S(min-nti) = sentences where all listed NTIs had been removed

  • 9 subjects: post-editing (P1-P9)

  • 3 subjects: translating (T1-T3)

  • First-pass exercise, no QA

Temporal Effort

  • Post-Editing vs. Translation

    • median words per minute

Temporal Effort (2)

  • Post-Editing vs. Translation

    • median processing speed

  • Processing speed is the total number of source words in each segment divided by the total processing time for that segment

    • i.e. words processed per second
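The processing-speed measure defined above, and the per-segment-type medians compared on the next slide, can be sketched in Python as follows. This is a minimal illustration assuming hypothetical per-segment records of (segment type, source words, processing time in seconds); the numbers are invented and are not data from the study, and the function names are hypothetical.

```python
from statistics import median

# Hypothetical per-segment records: (segment type, source words, processing time in seconds).
# The numbers are invented for illustration and are not data from the study.
SEGMENTS = [
    ("nti", 12, 40.0),
    ("nti", 8, 32.0),
    ("nti", 15, 45.0),
    ("min-nti", 10, 20.0),
    ("min-nti", 14, 28.0),
    ("min-nti", 9, 15.0),
]


def processing_speed(source_words: int, seconds: float) -> float:
    """Source words in a segment divided by the time spent on it (words per second)."""
    if seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / seconds


def median_speed_by_type(records) -> dict:
    """Median processing speed per segment type, e.g. S(nti) vs. S(min-nti)."""
    speeds = {}
    for seg_type, words, seconds in records:
        speeds.setdefault(seg_type, []).append(processing_speed(words, seconds))
    return {seg_type: median(values) for seg_type, values in speeds.items()}


print(median_speed_by_type(SEGMENTS))  # {'nti': 0.3, 'min-nti': 0.5}
```

A median is less sensitive than a mean to a few unusually slow segments, which suits a comparison of segment types with a small number of subjects.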

Median Processing Speed

  • S(nti) vs. S(min-nti)

Temporal Effort: Conclusions

  • The post-editing task was completed faster than the translation task.

    • First-pass exercise/No QA

  • The median processing speeds for S(min-nti) segments were significantly higher than those for S(nti) segments

  • So, from a temporal point of view, it seems that the introduction of CL benefits turnaround times

Technical Effort

  • Measured using Translog:

    • Keyboarding

      • Deletions, insertions, cuts, pastes

    • Dictionary Look-Up Activity
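Translog captures insertions and deletions from the keystroke log itself. When only the raw MT output and the final post-edited text are available, a character-level diff gives a rough stand-in. The sketch below uses Python's `difflib.SequenceMatcher`; it is a swapped-in proxy, not Translog's method, and it counts only net changes, so edits that were typed and later undone are missed. The function name and the post-edited example string are hypothetical.

```python
from difflib import SequenceMatcher


def count_edits(mt_output: str, post_edited: str) -> dict:
    """Approximate insertions/deletions as the characters a diff adds or removes
    when turning the raw MT output into the post-edited text.

    Proxy only: unlike a keystroke log, it ignores edits that were later undone."""
    insertions = deletions = 0
    matcher = SequenceMatcher(None, mt_output, post_edited, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # A "replace" opcode counts as deleting the old span and inserting the new one.
        if tag in ("delete", "replace"):
            deletions += i2 - i1
        if tag in ("insert", "replace"):
            insertions += j2 - j1
    return {"insertions": insertions, "deletions": deletions}


# Hypothetical post-edit of a raw MT segment (not taken from the study).
print(count_edits("Sichern Sie das Dokument(s).", "Sichern Sie die Dokumente."))
```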

Keyboarding Median Measurements

  • Small difference between the two segment types, but statistically significant for insertions/deletions

  • Cutting and pasting: very limited even though post-editors recycled whole chunks of text

Use of the Translog Dictionary

  • Training and practice prior to task

  • All users reported being comfortable with the feature

Possible Explanations?

  • Subjects not as familiar with the feature as they reported

  • Subjects felt it was unnecessary to use the dictionary

  • Subjects used to having terms suggested on-screen by TM/terminology tools

  • Subjects lost faith in the feature when they encountered problems

Conclusions on Technical Effort

  • S(min-nti) segments require significantly fewer deletions and insertions than S(nti) segments.

  • Cutting and pasting was a very rare activity for both segment types.

  • Dictionary searches were uncommon during this study. When they were carried out, the search facility was frequently used incorrectly.

Technical/Temporal Combined

  • Results on technical post-editing effort add to the evidence presented above on temporal post-editing effort and further support the claim that eliminating NTIs from a segment can reduce post-editing effort.

Cognitive Effort

  • Potential Methodologies

    • Think-aloud protocols (TAP; rejected)

    • Pause Analysis

    • Choice Network Analysis

    • Eye tracking (unavailable at the time)

Pause Behaviour

  • No discernible correlations between pause behaviour and post-editing activity

    • Pause analysis rejected

Cognitive Effort

  • Choice Network Analysis

Choice Network Analysis

  • …Choice Network Analysis compares the renditions of a single string of translation by multiple translators in order to propose a network of choices that theoretically represents the cognitive model available to any translator for translating that string. The technique is favoured over the think-aloud method, which is acknowledged as not being able to access automaticized processes.

    (Campbell, 2000: 215)
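A full Choice Network Analysis builds a network of the choices made across renditions. The Python sketch below covers only a simplified first step, assuming the final text of each post-editor is available for a segment: it counts how many post-editors changed the raw MT output and tallies the distinct renditions. It is not a complete CNA; the function name and data layout are hypothetical, and the German renditions are invented for illustration.

```python
from collections import Counter


def tally_renditions(raw_mt: str, renditions: dict) -> tuple:
    """Summarise the renditions of one segment produced by several post-editors.

    renditions maps a post-editor ID (e.g. "P1") to that post-editor's final text.
    Returns (number of post-editors who changed the raw MT output,
             Counter of distinct renditions and how many post-editors chose each)."""
    baseline = raw_mt.strip()
    changed = sum(1 for text in renditions.values() if text.strip() != baseline)
    variants = Counter(text.strip() for text in renditions.values())
    return changed, variants


# Hypothetical renditions for illustration only; not data from the study.
raw = "Sichern Sie das Dokument(s)."
finals = {
    "P1": "Sichern Sie die Dokumente.",
    "P2": "Sichern Sie das Dokument bzw. die Dokumente.",
    "P3": "Sichern Sie das Dokument(s).",
    "P4": "Sichern Sie die Dokumente.",
}
changed, variants = tally_renditions(raw, finals)
print(changed)                   # 3
print(variants.most_common(1))   # [('Sichern Sie die Dokumente.', 2)]
```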

Example – Sentence with NTIs

  • ST:

    • “Save the document(s).”

  • Raw MT output:

    • „Sichern Sie das Dokument(s).“

  • NTIs for this sentence:

    • Short segment

    • Use of “(s)” for plural
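The two NTIs listed for this sentence can be flagged with simple surface checks. The Python sketch below is a hypothetical illustration, not the rule set of IBM’s EasyEnglishAnalyzer or of the Bernth/Gdaniec NTI list; in particular, the four-word cut-off for a “short segment” is an assumption, since the slides do not define it.

```python
import re

# Hypothetical cut-off: the presentation does not say how many words make a segment "short".
SHORT_SEGMENT_MAX_WORDS = 4

# A word character immediately followed by "(s)", as in "document(s)".
PLURAL_S = re.compile(r"\w\(s\)")


def find_ntis(sentence: str) -> list:
    """Flag two of the NTIs listed for the example sentence (illustrative rules only)."""
    found = []
    if len(sentence.split()) <= SHORT_SEGMENT_MAX_WORDS:
        found.append("Short segment")
    if PLURAL_S.search(sentence):
        found.append('Use of "(s)" for plural')
    return found


print(find_ntis("Save the document(s)."))
# ['Short segment', 'Use of "(s)" for plural']
print(find_ntis("The editor contains a menu and a toolbar."))
# []
```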

Example – Sentence with minimal NTIs

  • ST:

    • “The editor contains a menu and a toolbar.”

  • Raw MT output:

    • „Der Editor enthält ein Menü und eine Symbolleiste.“

NTIs and Cognitive Effort

  • Using CNA as a guide, NTIs were categorised into:

  • High impact on post-editing effort

    • 50% or more of the occurrences of the NTI resulted in post-editing by two or more post-editors

  • Moderate impact on post-editing effort

    • Between 31% and 49% of occurrences

  • Low impact on post-editing effort

    • 30% or less of occurrences
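These thresholds translate directly into a classification rule. A minimal Python sketch, assuming that for each occurrence of an NTI we know how many post-editors changed the MT output at that point; the function name and data layout are hypothetical, and the counts in the usage lines are invented. The slide states the bands in whole percentages, so the sketch assumes any share above 30% and below 50% counts as moderate.

```python
def impact_category(editors_per_occurrence: list, min_editors: int = 2) -> str:
    """Classify an NTI by the share of its occurrences post-edited by at least
    `min_editors` post-editors: >= 50% high, 31-49% moderate, <= 30% low.

    editors_per_occurrence[i] is the number of post-editors who changed the
    MT output at the i-th occurrence of the NTI."""
    total = len(editors_per_occurrence)
    if total == 0:
        raise ValueError("the NTI has no occurrences")
    edited = sum(1 for n in editors_per_occurrence if n >= min_editors)
    # Integer comparisons avoid floating-point rounding at the cut-offs.
    if edited * 100 >= 50 * total:
        return "high"
    if edited * 100 > 30 * total:  # assumption: any share strictly between 30% and 50%
        return "moderate"
    return "low"


# Hypothetical counts for three NTIs (invented for illustration).
print(impact_category([3, 1, 0, 5]))     # 2 of 4 occurrences (50%) -> high
print(impact_category([1, 2, 0]))        # 1 of 3 (about 33%) -> moderate
print(impact_category([0, 1, 4, 0, 1]))  # 1 of 5 (20%) -> low
```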

Correlating Measurements

  • High impact NTIs, combining data on temporal, technical and cognitive effort:

    • Use of the gerund

    • Proper nouns

    • Problematic punctuation

    • Ungrammatical constructs

    • Use of (s) for plural

    • Non-finite verbs

    • Incomplete syntactic unit

    • Long NP

    • Short segment

Correlating Measurements

  • Moderate impact NTIs:

    • Multiple coordinators

    • Passive voice

    • Personal pronouns

    • Use of a slash as a separator

    • Ambiguous scope in coordination

    • Parentheses

Correlating Measurements

  • Low impact NTIs:

    • Abbreviations

    • Demonstrative pronouns

    • Missing “in order to”

    • Contractions


Conclusions

  • Within the limited scope of this research, we now have empirical evidence to support the assertion that controlling the input to MT leads to lower post-editing effort.

  • Eliminating some NTIs can have a greater impact than eliminating others

    • Is it worth having a relatively high number of CL rules?

  • Even if we remove known NTIs, MT engines are still likely to produce some errors and post-editors are still likely to post-edit.