
The Chinese Room: Understanding and Correcting Machine Translation

Josh Albrecht, Rebecca Hwa, and G. Elisabeta Marai ({jsa8,hwa,marai}@cs.pitt.edu), Department of Computer Science, University of Pittsburgh.



What is Machine Translation?
• Machine Translation (MT) is the process of automatically converting text from one human language to another (e.g., Chinese to English).
• MT is performed by algorithms that extract statistical translation rules from millions of human-generated translation pairs (sentences with the same meaning in both Chinese and English).
• Uses of MT:
  • People who want to read text in a foreign language they do not know
  • People who are barely proficient in a language and can use MT to learn it
  • Businesses that want to translate documents into other languages
• We focus on the first case, though our work could easily be extended to the other cases as well.

The Problem With Machine Translation
• Machine-translated sentences are often difficult or impossible to understand.
• Example machine translation: "He utter eyes and not the slightest attention As leakage."
• Intended meaning: "His eyes were wide apart; nothing in their field of vision escaped."
• Errors are caused by the machine's lack of world knowledge and its inability to form coherent sentences or understand ideas.

Solution: The Chinese Room
• Idea: We propose a collaborative approach between users, who have good world knowledge and writing skills, and the machine, which is good at processing large amounts of data into useful linguistic resources.
• We have created an interactive visualization of these linguistic resources that enables the user to explore alternative translations in order to better understand and correct machine translations.
• The design was based on iterative improvement with expert users.
• A pilot study gave promising preliminary results.

Interaction:
• Clicking on English words allows them to be edited.
• Dragging English words allows the user to visually experiment with different word orders.
• Mousing over the definitions highlights the corresponding Chinese character or word.
• Clicking on the Chinese Syntax Tree lines causes that section of the sentence to collapse (or expand if clicked again later), allowing the user to better focus on difficult parts of the sentence.
• Clicking and dragging selects a Chinese phrase (and begins the search for similar example translations).
• Clicking on an example search result puts that sentence in the main view for more detailed inspection.
• Clicking on the translation tab requests N-Best translations.
• Clicking on a sentence in the document view selects it as the current sentence.
• Clicking on the edit tab allows the user to type and directly modify the translated text.

Visualization:
Screenshot of the Chinese Room. By interacting with the various components, users can better understand the original meaning of the Chinese text.

Chinese Text:
• Displays the original characters, automatically segmented into approximate words

Chinese Syntax Tree:
• Shows the automatically generated grammatical structure of the source sentence
• Colors correspond to different parts of speech (blue for verbs, red for nouns, etc.)

English Text:
• Displays the English translation generated by the MT system for the selected Chinese sentence

Word Alignments:
• Displays the mapping (given by the MT system) between Chinese and English words
• English words are clustered together based on these alignments.
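The poster does not say how English words are clustered from the word alignments, so the following is only a minimal sketch of one plausible reading: English words that share an aligned Chinese word, directly or through a chain of shared alignments, end up in the same cluster. The function name `cluster_english_words`, the `(chinese_index, english_index)` pair format, and the sample alignment are all assumptions made for illustration, not the tool's actual data structures.

```python
from collections import defaultdict


def cluster_english_words(english_words, alignments):
    """Group English words linked, directly or transitively, to the same Chinese words.

    alignments: iterable of hypothetical (chinese_index, english_index) pairs.
    Returns a list of clusters in sentence order; unaligned English words
    become singleton clusters.
    """
    # Union-find over English word positions.
    parent = list(range(len(english_words)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    def union(i, j):
        ri, rj = find(i), find(j)
        if ri != rj:
            # Keep the smaller index as root so clusters sort by first word.
            parent[max(ri, rj)] = min(ri, rj)

    # Collect the English positions aligned to each Chinese word.
    by_chinese = defaultdict(list)
    for zh, en in alignments:
        by_chinese[zh].append(en)

    # English words sharing a Chinese word belong to one cluster.
    for en_indices in by_chinese.values():
        for other in en_indices[1:]:
            union(en_indices[0], other)

    clusters = defaultdict(list)
    for i, word in enumerate(english_words):
        clusters[find(i)].append(word)
    return [clusters[root] for root in sorted(clusters)]


# Hypothetical alignment: Chinese word 2 maps to both "wide" and "apart".
words = ["His", "eyes", "were", "wide", "apart"]
links = [(0, 0), (1, 1), (2, 3), (2, 4)]
print(cluster_english_words(words, links))
```

A union-find structure is used here so that many-to-many alignments merge into a single cluster; a real interface might instead prefer simpler per-Chinese-word groups.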
Translation Dictionary:
• Shows definitions for words (first column) and individual Chinese characters (second column)
• Definitions are aligned horizontally with the word or character that they define

Additional Resources:
• Other resources are displayed as text in the rightmost pane:
  • N-Best Re-Translations: A list of candidate English sentences (or phrases) that the Machine Translation system (in this case, Google) was considering for the phrase selected by the user.
  • Document View: Every sentence in the document can be seen at once, giving a better sense of the meaning in the context of the document.
  • Edit Area: The English translation can be edited in a small text area so that users can quickly edit and annotate the sentence.
  • Example Search: Search results are displayed in the rightmost column, with the matches shown in pink, and are sorted by relevance.

Related Work
• DerivTool: An interface for observing the inner workings of a specific MT system. It requires knowledge of both languages and an in-depth knowledge of how MT works. [DeNeefe et al., 2005]
• Design cues from systems such as TreeJuxtaposer [Munzner, 2003], and from Envisioning Information [Tufte, 1990]

Conclusions
• "The Chinese Room" is an interface that allows users to explore and interact with linguistic resources as they attempt to understand poor automatic translations.
• Our tool can manageably expose a variety of resources and a huge amount of data to the user, allowing monolingual speakers to determine the most likely translation without any knowledge of the foreign language.
• Many challenges remain, including integrating other forms of information and exploiting uncertain sources of information.

Future Work
• Further applications of this basic collaborative approach (language education, end-user understanding, commercial translation processes, MT design, and more)
• Extending the tool to other language pairs (the poster shows it working with Arabic)
• Further efforts in usability and ease of use could be very beneficial.
• Other resources (manually created translation rules, incorporation of translation memory) might be helpful to the user.

This work has been supported by NSF Grant IIS-0612791.
