Current Research

A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input

Stephen Doherty, CNGL/SALIS

stephen.doherty2@mail.dcu.ie

Overview
  • Past Research
  • Readability & Comprehensibility
  • Controlled Language
  • Research Proposal (Methodology)
  • Evaluation (Eye Tracking)
  • Conclusion
Past Research
  • Translating Versus Post-Editing: A Segmentation Comparison Based on Pauses (B.A. Dissertation)
  • Think-Aloud Protocols in Translation Studies (Interessen der kognitiv orientierten Translationswissenschaft)
Research Proposal
  • CNGL Work Package: ILT1.8 Controlled Language:
  • Supervisors – Dr. Sharon O’Brien, Dr. Dorothy Kenny
  • “adapt the systems developed by other ILT WPs to deal with in-house data which conforms to both source and target controlled language guidelines”
Readability & Comprehensibility
  • What is readability?
  • (Gray 1935: “In the reader, those features affecting readability are 1. prior knowledge, 2. reading skill, 3. interest, and 4. motivation. In the text, those features are 1. content, 2. style, 3. design, and 4. structure”.)
  • What is comprehensibility?
Readability & Comprehensibility
  • Metrics: (Reading scores, recall tests...)
  • E.g. Flesch Reading Ease:
  • Gunning-Fog Index, SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969)
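
The Flesch Reading Ease formula itself does not survive on the slide. In its standard published form it is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words), and higher scores indicate easier text. A minimal Python sketch of that calculation follows. The function names are illustrative, and the syllable counter is only a rough vowel-group heuristic (an assumption made here, not something the presentation specifies), so scores will differ slightly from tools that use a pronunciation dictionary.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as runs of consecutive vowels (assumed heuristic)."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))


def flesch_reading_ease(text: str) -> float:
    """Return the Flesch Reading Ease score for an English text."""
    # Sentences: non-empty chunks between runs of ., ! or ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Words: alphabetic tokens, optionally with one internal apostrophe
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


if __name__ == "__main__":
    simple = "The cat sat on the mat."
    dense = "Comprehensibility evaluation necessitates methodological sophistication."
    print(round(flesch_reading_ease(simple), 1))
    print(round(flesch_reading_ease(dense), 1))
```

Short sentences made of short words score high, while the long, polysyllabic sentence scores far lower, which is the behaviour the metric is designed to capture.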


Controlled Language
  • What is controlled language?

“an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style”

(Huijsen, 1998)

Controlled Language
  • Types of CL:
      • Human-Orientated Controlled Language (HOCL): readability & comprehensibility e.g. AECMA Simplified English
      • Machine-Orientated Controlled Language (MOCL): improved translatability, MT system specific

(Huijsen, 1998)

Controlled Language
  • Examples of CLs: AECMA Simplified English, Sun Microsystems' Controlled English, IBM Easy English, Caterpillar Technical English, GM...
  • Usage (mostly English, but…)
  • Symantec (CNGL Industry Partner)
Controlled Language
  • Roturier (2006):
      • Consistent spelling (54)
      • Do not use pronouns that have no specific referent (19)
      • Avoid unusual punctuation (35)
      • Avoid embedded clauses introduced by commas or dashes (41)
      • Do not use more than 25 words per sentence (5)
      • Use a question mark only at the end of a direct question (48)
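
Two of the rules listed above lend themselves to a mechanical check: the 25-word sentence limit and the restriction on question marks. The following Python sketch shows roughly how such checks might look. It is not Acrocheck and not the actual rule set used in the study; the function names, the naive sentence splitter, and the simplification of "direct question" to "question mark only in sentence-final position" are all assumptions made for illustration.

```python
import re

MAX_WORDS = 25  # sentence-length limit taken from the rule list above


def split_sentences(text):
    """Naive splitter: break after ., ! or ? when followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def check_sentence(sentence):
    """Return a list of rule violations for one sentence."""
    violations = []
    n_words = len(sentence.split())
    if n_words > MAX_WORDS:
        violations.append(f"too long: {n_words} words (limit {MAX_WORDS})")
    # Simplification: any '?' before the final character counts as misuse.
    if "?" in sentence[:-1]:
        violations.append("question mark used inside the sentence")
    return violations


def check_text(text):
    """Return (sentence, violation) pairs for every violation in the text."""
    report = []
    for sentence in split_sentences(text):
        for violation in check_sentence(sentence):
            report.append((sentence, violation))
    return report


if __name__ == "__main__":
    sample = "Run the scan. Set the value (?) to zero. Is the file safe?"
    for sentence, violation in check_text(sample):
        print(f"{violation}: {sentence}")
```

Rules such as consistent spelling or pronoun reference need lexical resources or parsing, which is why authoring tools go well beyond pattern checks of this kind.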
Controlled Language
  • O’Brien (2003) - three types of rule categories:
      • Lexical (e.g. Rules that allow or rule out the use of specific acronyms or abbreviations)
      • Syntactic (e.g. specifying when and where past participles can be used and avoiding the present participle)
      • Textual:
        • Text Structure (e.g. Specifying admissible sentence length)
        • Pragmatic (e.g. Using certain verb forms for specific text purposes – imperative for instructions)
Research Proposal

A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input


Hypotheses
I. Controlled input to an MT system results in a higher level of readability and comprehensibility than uncontrolled input.
II. The above is true regardless of whether the MT system is rule-based or statistics-based.


Proposed Methodology
  • A corpus will be gathered to train the MT system (DCU School of Computing)
  • A set of CL rules (Symantec)
  • Four corpora (Symantec):
      1. Uncontrolled English – IT security domain
      2. Same corpus but with Symantec CL rules applied using Acrocheck, an authoring control tool
      3. RBMT output in French for corpus one
      4. RBMT output in French for corpus two


Proposed Methodology
Most of the uncontrolled and controlled bilingual corpora (the training data) will then be used to train the SMT system. The remaining subset of the source-language side of corpora one and two (the test data) will then be translated using the resulting MT system (exact size/composition to be decided).
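
The train/test division described above can be sketched in a few lines of Python. The presentation leaves the size and composition of the test set open, so the 10% default below is purely a placeholder, and the function name, the fixed seed, and the representation of the corpus as (source, target) sentence pairs are likewise assumptions for illustration.

```python
import random


def split_corpus(pairs, test_fraction=0.1, seed=0):
    """Split (source, target) sentence pairs into training pairs and test sources.

    test_fraction is a hypothetical value; the actual split is still to be decided.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_test = max(1, round(len(shuffled) * test_fraction))
    # Only the source-language side of the held-out pairs is kept for translation.
    test_sources = [source for source, _ in shuffled[:n_test]]
    train_pairs = shuffled[n_test:]
    return train_pairs, test_sources


if __name__ == "__main__":
    corpus = [(f"English sentence {i}", f"French sentence {i}") for i in range(20)]
    train, test = split_corpus(corpus, test_fraction=0.25, seed=1)
    print(len(train), "training pairs;", len(test), "test sentences")
```

Holding the test sentences out of training keeps the later readability and comprehensibility evaluation from being run on material the SMT system has already seen.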

Evaluation
  • Both automatic and human evaluation (focus)
  • Automatic evaluation (BLEU…)
  • Human evaluation: eye tracking & retrospective protocols (recall tests & interviews)
  • Eye Tracking:
      • What is it exactly? (background)
      • Successful application in this research area
  • Tobii Eye Tracker & ClearView software
  • Additional video recording, keystroke & mouse logging
  • Recall tests (comprehensibility)
  • Retrospective interviews (generation of additional data & resolving possible issues)

In Conclusion…
  • What: SMT & RBMT output given controlled and uncontrolled input
  • How: Automatic and human evaluation (eye tracking)
  • Why (Future): Success of application of CL, comparison of MT systems with & without CL usage, Controlled Translation, implementing new technology & methodologies in research area, commercial benefits...