Current Research
A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input
Stephen Doherty, CNGL/SALIS

Overview
Past Research
Readability & Comprehensibility
Controlled Language
“an explicitly defined restriction of a natural language that specifies constraints on lexicon, grammar, and style”
A comparative investigation of the readability and comprehensibility of SMT and RBMT output for controlled and uncontrolled input
Hypotheses
I. Controlled input to an MT system results in a higher level of readability and comprehensibility than uncontrolled input.
II. The above is true regardless of whether the MT system is rule-based or statistics-based.
Proposed Methodology
A corpus will be gathered to train the MT system (DCU School of Computing)
A set of CL rules (Symantec)
Four corpora (Symantec):
1. Uncontrolled English (IT security domain)
2. Same corpus but with Symantec CL rules applied using Acrocheck, an authoring control tool
3. RBMT output in French for corpus one
4. RBMT output in French for corpus two
Proposed Methodology
Most of the uncontrolled and controlled bilingual corpora (the training data) will then be used to train the SMT system.
The remaining subset of the source-language side of corpora one and two (the test data) will then be translated using the resulting MT system (exact size/composition to be decided).
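The training/test division described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name split_parallel_corpus, the 10% held-out fraction, and the fixed random seed are all hypothetical, since the exact size and composition of the test data are still to be decided.

```python
import random


def split_parallel_corpus(pairs, test_fraction=0.1, seed=0):
    """Split aligned (source, target) sentence pairs into training pairs
    and held-out source sentences.

    test_fraction and seed are illustrative assumptions, not values
    taken from the proposed methodology.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")

    # Shuffle indices rather than the pairs themselves, so the same seed
    # selects the same sentence positions in any corpus of equal length.
    indices = list(range(len(pairs)))
    random.Random(seed).shuffle(indices)

    n_test = max(1, round(len(pairs) * test_fraction))
    test_idx = set(indices[:n_test])

    # Training data keeps both sides; test data keeps only the source side,
    # because it is what gets translated by the trained system.
    train = [pair for i, pair in enumerate(pairs) if i not in test_idx]
    test_sources = [pairs[i][0] for i in sorted(test_idx)]
    return train, test_sources


# Toy stand-in for an English-French corpus (not real project data).
toy_pairs = [(f"English sentence {i}", f"French sentence {i}") for i in range(10)]
train_pairs, held_out_sources = split_parallel_corpus(toy_pairs, test_fraction=0.2)
```

If corpora one and two are sentence-aligned with each other, calling the function with the same seed on both would hold out the same sentence positions, which keeps the controlled and uncontrolled test sets comparable.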
In Conclusion…
What: SMT & RBMT output given controlled and uncontrolled input
How: Automatic and human evaluation (eye tracking)
Why (Future): Success of application of CL, comparison of MT systems with & without CL usage, Controlled Translation, implementing new technology & methodologies in research area, commercial benefits...