The Link between Controlled Language and Post-Editing:

An Empirical Investigation of Technical, Temporal and Cognitive Effort

Sharon O’Brien, CTTS/SALIS

Overview
  • Research Parameters
  • Temporal Effort
  • Technical Effort
  • Cognitive Effort
  • Conclusions
Definition
  • Controlled language (CL): an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style.

(Huijsen, 1998: 2)

Motivation – In a Nutshell
  • Can the introduction of CL rules really improve MT output such that post-editing effort is reduced?
Machine “Translatability”
  • One of the main “goals” of CL
  • The notion of translatability is based on so-called "translatability indicators" where the occurrence of such an indicator in the text is considered to have a negative effect on the quality of machine translation. The fewer translatability indicators, the better suited the text is to translation using MT.

(Underwood and Jongejan 2001: 363)

Machine “Translatability”
  • “Negative” Translatability Indicators
    • “NTIs” for short
    • Examples (for English as SL)
      • Long noun phrases
      • Passive voice
      • Ungrammatical constructs
      • Use of slang…
    • Use of NTI list (Bernth/Gdaniec 2001)
    • Use of term “minimal NTI”
Research Design
  • SL: English; TL: German
  • Text Type: User Manual (1,777 words)
  • Users: 12 Professional Translators
  • Tools: IBM WebSphere, Translog, IBM’s EasyEnglishAnalyzer, Sun Microsystems’ Sunproof
  • Place of Data Capture: IBM Stuttgart
  • Edit SL text to create two sentence types:
    • S(nti) = sentences with known negative translatability indicators
    • S(min-nti) = sentences where all listed NTIs had been removed
  • 9 subjects: post-editing (P1-P9)
  • 3 subjects: translating (T1-T3)
  • First pass exercise, no QA
Temporal Effort
  • Post-Editing vs. Translation
    • median words per minute
Temporal Effort (2)
  • Post-Editing vs. Translation
    • median processing speed
  • Processing speed is the total number of source words in each segment divided by the total processing time for that segment
    • i.e. words processed per second
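The processing-speed definition above can be sketched in a few lines of Python. This is only an illustration: the function names and the sample timings are assumptions, not data or tooling from the study.

```python
from statistics import median


def processing_speed(source_words, seconds):
    """Source words in a segment divided by its processing time in seconds."""
    if seconds <= 0:
        raise ValueError("processing time must be positive")
    return source_words / seconds


def median_speed(segments):
    """Median processing speed over (source_words, seconds) pairs."""
    return median(processing_speed(words, secs) for words, secs in segments)


# Hypothetical timings for three segments (not taken from the study).
example_segments = [(8, 16.0), (3, 12.0), (12, 20.0)]
print(median_speed(example_segments))
```

Using the median rather than the mean keeps a single very slow or very fast segment from dominating the comparison between segment types.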
Median Processing Speed
  • S(nti) vs. S(min-nti)
Temporal Effort: Conclusions
  • The post-editing task was completed faster than the translation task.
    • First-pass exercise/No QA
  • The median processing speeds for S(min-nti) segments were significantly higher than those for S(nti) segments
  • So, from a temporal point of view, it seems that the introduction of CL benefits turnaround times
Technical Effort
  • Measured using Translog:
    • Keyboarding
      • Deletions, insertions, cuts, pastes
    • Dictionary Look-Up Activity
Keyboarding Median Measurements
  • Small difference between the two segment types, but statistically significant for insertions/deletions
  • Cutting and pasting: very limited even though post-editors recycled whole chunks of text
Use of the Translog Dictionary
  • Training and practice prior to task
  • All users reported being comfortable with the feature
Possible Explanations?
  • Subjects not as familiar with feature as they reported
  • Subjects felt it was unnecessary to use dictionary
  • Subjects used to having terms suggested on-screen with TM/Terminology tool
  • Subjects lost faith in the feature when they encountered problems
Conclusions on Technical Effort
  • S(min-nti) segments require significantly fewer deletions and insertions than S(nti) segments.
  • Cutting and pasting was a very rare activity for both segment types.
  • Dictionary searches were uncommon during this study. When they were carried out, the search facility was frequently used incorrectly.
Technical/Temporal Combined
  • The results on technical post-editing effort add to the evidence presented above on temporal post-editing effort and further support the claim that eliminating NTIs from a segment can reduce post-editing effort.
Cognitive Effort
  • Potential Methodologies
    • Think-aloud protocols, TAP (rejected)
    • Pause Analysis
    • Choice Network Analysis
    • Eye tracking (unavailable at the time)
Pause Behaviour
  • No discernible correlations between pause behaviour and post-editing activity
    • Pause analysis rejected
Cognitive Effort
  • Choice Network Analysis
Choice Network Analysis
  • …Choice Network Analysis compares the renditions of a single string of translation by multiple translators in order to propose a network of choices that theoretically represents the cognitive model available to any translator for translating that string. The technique is favoured over the think-aloud method, which is acknowledged as not being able to access automaticized processes.

(Campbell, 2000: 215)
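As a very rough illustration of the comparison step described in the quotation, the sketch below counts the distinct renditions that several post-editors produce for one source string. It is a crude proxy under stated assumptions, not Campbell's method or the procedure used in this study: the function name and the German renditions are hypothetical.

```python
from collections import Counter


def rendition_counts(renditions):
    """Count how often each distinct rendition of one source string occurs.

    Whitespace is normalised so that spacing differences alone
    do not count as separate choices.
    """
    return Counter(" ".join(text.split()) for text in renditions)


# Hypothetical post-edited versions of one segment (not study data).
versions = [
    "Sichern Sie das Dokument.",
    "Sichern Sie die Dokumente.",
    "Sichern Sie  das Dokument.",
    "Speichern Sie die Dokumente.",
]
counts = rendition_counts(versions)
print(len(counts), counts.most_common(1))
```

Under this simplification, a segment with more distinct renditions is taken to offer more choices to the post-editor; a full choice network would also record where in the string the renditions diverge.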

Example – Sentence with NTIs
  • ST:
    • “Save the document(s).”
  • Raw MT output:
    • „Sichern Sie das Dokument(s).“
  • NTIs for this sentence:
    • Short segment
    • Use of “(s)” for plural
Example – Sentence with minimal NTIs
  • ST:
    • “The editor contains a menu and a toolbar.”
  • Raw MT output:
    • „Der Editor enthält ein Menü und eine Symbolleiste.“
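The two example sentences can be run through a toy checker for the two NTIs named above. This is a minimal sketch: the word-count threshold for a "short segment" and the regular expression are assumptions made for illustration, not the rules used by the CL checkers in the study.

```python
import re

# Hypothetical threshold: segments of this many words or fewer count as short.
SHORT_SEGMENT_MAX_WORDS = 3


def find_ntis(sentence):
    """Return the toy NTIs detected in a source sentence."""
    found = []
    # A word immediately followed by "(s)", as in "document(s)".
    if re.search(r"\w\(s\)", sentence):
        found.append("use of (s) for plural")
    if len(sentence.split()) <= SHORT_SEGMENT_MAX_WORDS:
        found.append("short segment")
    return found


print(find_ntis("Save the document(s)."))
print(find_ntis("The editor contains a menu and a toolbar."))
```

With these assumed rules, the first sentence is flagged for both NTIs and the second for none, matching the classification given on the slides.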
NTIs and Cognitive Effort
  • Using CNA as a guide, NTIs categorised into:
  • High impact on post-editing effort
    • 50% or more of the occurrences of the NTI resulted in post-editing by two or more post-editors
  • Moderate impact on post-editing effort
    • Between 31% and 49% of occurrences
  • Low impact on post-editing effort
    • 30% or fewer of occurrences
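The three impact bands can be expressed as a small classifier. This is a sketch under assumptions: the function names and sample counts are hypothetical, and shares that fall between the stated integer percentages (for example 30.5%) are assigned to the moderate band.

```python
def edited_share(editors_per_occurrence, min_editors=2):
    """Share of an NTI's occurrences post-edited by at least `min_editors` post-editors."""
    if not editors_per_occurrence:
        raise ValueError("need at least one occurrence")
    hits = sum(1 for n in editors_per_occurrence if n >= min_editors)
    return hits / len(editors_per_occurrence)


def impact_category(share):
    """Map a share (0.0 to 1.0) to the high / moderate / low bands."""
    if share >= 0.50:
        return "high"
    if share > 0.30:  # assumption: anything above 30% and below 50% is moderate
        return "moderate"
    return "low"


# Hypothetical data: number of post-editors who edited each occurrence of one NTI.
occurrences = [3, 0, 2, 1]
print(impact_category(edited_share(occurrences)))
```

Here two of the four hypothetical occurrences were edited by two or more post-editors, so the NTI lands in the high-impact band.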
Correlating Measurements
  • By combining data on temporal, technical and cognitive effort: High Impact NTIs
    • Use of the gerund
    • Proper nouns
    • Problematic punctuation
    • Ungrammatical constructs
    • Use of (s) for plural
    • Non-finite verbs
    • Incomplete syntactic unit
    • Long NP
    • Short segment
Correlating Measurements
  • Moderate impact NTIs:
    • Multiple coordinators
    • Passive voice
    • Personal pronouns
    • Use of a slash as a separator
    • Ambiguous scope in coordination
    • Parentheses
Correlating Measurements
  • Low impact NTIs:
    • Abbreviations
    • Demonstrative pronouns
    • Missing “in order to”
    • Contractions
Conclusions
  • Within the limited scope of this research, we now have empirical evidence to support the assertion that controlling the input to MT leads to lower post-editing effort.
  • The elimination of some NTIs can have a higher impact than the elimination of others
    • Is it worth having a relatively high number of CL rules?
  • Even if we remove known NTIs, MT engines are still likely to produce some errors and post-editors are still likely to post-edit.