Stephen Doherty, CNGL/SALIS, stephen.doherty2@mail.dcu.ie
Current Research: A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input

Stephen Doherty, CNGL/SALIS

stephen.doherty2@mail.dcu.ie


Overview

  • Past Research

  • Readability & Comprehensibility

  • Controlled Language

  • Research Proposal (Methodology)

  • Evaluation (Eye Tracking)

  • Conclusion


Past Research

  • Translating Versus Post-Editing: A Segmentation Comparison Based on Pauses (B.A. Dissertation)

  • Think-Aloud Protocols in Translation Studies (Interessen der kognitiv orientierten Translationswissenschaft)


Research Proposal

  • CNGL Work Package ILT1.8: Controlled Language

  • Supervisors – Dr. Sharon O’Brien, Dr. Dorothy Kenny

  • “adapt the systems developed by other ILT WPs to deal with in-house data which conforms to both source and target controlled language guidelines”


Readability & Comprehensibility

  • What is readability?

  • “In the reader, those features affecting readability are 1. prior knowledge, 2. reading skill, 3. interest, and 4. motivation. In the text, those features are 1. content, 2. style, 3. design, and 4. structure” (Gray 1935).

  • What is comprehensibility?


Readability & Comprehensibility

  • Metrics: (Reading scores, recall tests...)

  • E.g. Flesch Reading Ease:

  • Gunning-Fog Index; SMOG (Simple Measure of Gobbledygook) (McLaughlin 1969)
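
For reference, the standard Flesch Reading Ease score is 206.835 − 1.015 × (words / sentences) − 84.6 × (syllables / words); higher scores indicate easier text. The following is a minimal sketch rather than a reference implementation: the function names are illustrative, and the syllable counter is a crude vowel-group heuristic (an assumption; real readability tools usually rely on pronunciation dictionaries).

```python
import re


def count_syllables(word: str) -> int:
    """Estimate syllables by counting vowel groups (heuristic, an assumption)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing silent 'e' as non-syllabic ("time" -> 1), but keep "-le" ("table" -> 2).
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Compute the Flesch Reading Ease score for an English text."""
    # Naive segmentation: sentences end at '.', '!' or '?'; words are runs of letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Short sentences made of short words score high, while long, polysyllabic sentences score low, which is why such metrics are only a rough proxy for the reader-side factors listed on the previous slide.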


Controlled Language

  • What is controlled language?

    “an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style”

    (Huijsen, 1998)


Controlled Language

  • Types of CL:

    • Human-Orientated Controlled Language (HOCL): readability & comprehensibility e.g. AECMA Simplified English

    • Machine-Orientated Controlled Language (MOCL): improved translatability, MT system specific

      (Huijsen, 1998)


Controlled Language

  • Examples of CLs: AECMA Simplified English, Sun Microsystems’ Controlled English, IBM Easy English, Caterpillar Technical English, GM...

  • Usage (mostly English, but…)

  • Symantec (CNGL Industry Partner)


Controlled Language

  • Roturier (2006):

    • Consistent spelling (54)

    • Do not use pronouns that have no specific referent (19)

    • Avoid unusual punctuation (35)

    • Avoid embedded clauses introduced by commas or dashes (41)

    • Do not use more than 25 words per sentence (5)

    • Use a question mark only at the end of a direct question (48)
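
Some of the rules above are mechanical enough to check automatically. The sketch below is purely illustrative and is not how Acrocheck or any Symantec tooling works: the function names, the naive sentence splitter, and the set of "unusual" punctuation marks are all assumptions made for the example. Only the 25-word limit comes from the rule list itself.

```python
import re

MAX_WORDS = 25  # from the rule "Do not use more than 25 words per sentence"
# Hypothetical set: the slide does not define which punctuation counts as "unusual".
UNUSUAL_PUNCTUATION = set(";~^|{}[]")


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def check_sentence(sentence: str) -> list[str]:
    """Return descriptions of the rules this sentence appears to violate."""
    violations = []
    if len(sentence.split()) > MAX_WORDS:
        violations.append("more than 25 words")
    if any(ch in UNUSUAL_PUNCTUATION for ch in sentence):
        violations.append("unusual punctuation")
    # Flag a question mark anywhere except the final character of the sentence.
    if "?" in sentence[:-1]:
        violations.append("question mark not at end of sentence")
    return violations


def check_text(text: str) -> dict[str, list[str]]:
    """Map each violating sentence to its list of violations."""
    report = {}
    for sentence in split_sentences(text):
        violations = check_sentence(sentence)
        if violations:
            report[sentence] = violations
    return report
```

Rules such as "do not use pronouns that have no specific referent" need linguistic analysis and cannot be handled by surface checks of this kind.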


Controlled Language

  • O’Brien (2003) - three types of rule categories:

    • Lexical (e.g. rules that allow or rule out the use of specific acronyms or abbreviations)

    • Syntactic (e.g. specifying when and where past participles can be used and avoiding the present participle)

    • Textual:

      • Text Structure (e.g. specifying admissible sentence length)

      • Pragmatic (e.g. using certain verb forms for specific text purposes – imperative for instructions)


Research Proposal

A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input


Hypotheses

  I. Controlled input to an MT system results in a higher level of readability and comprehensibility than uncontrolled input

  II. The above is true regardless of whether the MT system is rule-based or statistics-based


Proposed Methodology

  • A corpus will be gathered to train the MT system (DCU School of Computing)

  • A set of CL rules (Symantec)

  • Four corpora (Symantec):

    1. Uncontrolled English – IT security domain

    2. Same corpus but with Symantec CL rules applied using Acrocheck, an authoring control tool

    3. RBMT output in French for corpus one

    4. RBMT output in French for corpus two


Proposed Methodology

  • Most of the uncontrolled and controlled bilingual corpora (the training data) will then be used to train the SMT system.

  • The remaining subset of the source-language side of corpora one and two (the test data) will then be translated using the resulting MT system (exact size/composition to be decided).
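
The train/test division described above can be sketched as follows. This is only an illustration under stated assumptions: the function name, the `test_fraction` default, and the fixed random seed are hypothetical, since the slides leave the exact size and composition of the test set undecided.

```python
import random


def split_parallel_corpus(pairs, test_fraction=0.1, seed=0):
    """Split aligned (source, target) sentence pairs into training pairs and test sources.

    test_fraction and seed are hypothetical values chosen for illustration only.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(pairs)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    test_pairs, train_pairs = shuffled[:n_test], shuffled[n_test:]
    # Only the source-language side of the held-out data is sent to the MT system.
    test_sources = [source for source, _ in test_pairs]
    return train_pairs, test_sources
```

Holding the test sentences out of training keeps the evaluation from rewarding the SMT system for simply reproducing sentence pairs it has already seen.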


Evaluation

  • Both automatic and human evaluation (focus)

  • Automatic evaluation (BLEU…)

  • Human evaluation: eye tracking & retrospective protocols (recall tests & interviews)


Evaluation

  • Eye Tracking:

    • What is it exactly? (background)

    • Successful application in this research area

  • Tobii Eye Tracker & ClearView software

  • Additional video recording, keystroke & mouse logging


Tobii 1750 Eye Tracker (www.tobii.se)


Evaluation

  • Recall tests (comprehensibility)

  • Retrospective interviews (generation of additional data & resolving possible issues)


In Conclusion…

  • What: SMT & RBMT output given controlled and uncontrolled input

  • How: Automatic and human evaluation (eye tracking)

  • Why (Future): Success of application of CL, comparison of MT systems with & without CL usage, Controlled Translation, implementing new technology & methodologies in research area, commercial benefits...


Thanks for your attention! Questions?

