Example-based Machine Translation based on Deeper NLP

Toshiaki Nakazawa (1), Kun Yu (1), Sadao Kurohashi (2). (1) Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan, 113-8656 (2) Graduate School of Informatics, Kyoto University, Kyoto, Japan, 606-8501


Presentation Transcript


  1. Example-based Machine Translation based on Deeper NLP • Toshiaki Nakazawa (1), Kun Yu (1), Sadao Kurohashi (2) • (1) Graduate School of Information Science and Technology, The University of Tokyo, Tokyo, Japan, 113-8656 • (2) Graduate School of Informatics, Kyoto University, Kyoto, Japan, 606-8501

  2. Outline • Why EBMT? • Description of Kyoto-U EBMT System • Japanese-Specific Processing - Pronoun Estimation - Japanese Flexible Matching • Results and Discussion • Conclusion and Future Work

  4. Why EBMT? • Pursuing deep NLP - Improvement of fundamental analyses leads to improvement of MT - Feedback from MT to the analyses can be expected • The EBMT setting is suitable in many cases - Not a large corpus, but similar translation examples in a relatively close domain - e.g. translation of manuals, patent translation, …

  6. Kyoto-U System Overview [Figure: the input 交差点に入る時 私の信号は青でした。 is matched against translation examples, e.g. 交差点で、突然あの車が飛び出して来たのです ↔ The car came at me from the side at the intersection; 家に入る時 (脱ぐ) ↔ (to remove) when entering a house; 私のサイン ↔ my signature; 信号は青でした。 ↔ The traffic light was green. The matched parts are combined, using a language model, into the output: My traffic light was green when entering the intersection.]

  7. Structure-based Alignment - Step1: Dependency structure transformation - Step2: Word/phrase correspondence detection - Step3: Correspondence disambiguation - Step4: Handling remaining words - Step5: Registration to database

  8. Step1 Dependency Structure Transformation • J: JUMAN/KNP • E: Charniak's nlparser → Dependency tree • Example: J: 交差点で、突然あの車が飛び出して来たのです。 E: The car came at me from the side at the intersection. [Figure: dependency trees of J (nodes: 交差点で、 / 突然 / あの / 車が / 飛び出して来たのです) and E (nodes: the car / came / at me / from the side / at the intersection)]
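Step1 turns the English phrase-structure parse into a dependency tree, while KNP already outputs a dependency structure for Japanese. The slide does not say how the English conversion is done; a common technique is head percolation: pick a head child for each phrase and make the heads of the other children depend on it. The sketch below assumes exactly that, with a tiny, hypothetical head table (`HEAD_RULES`) and `Tree` class; it is not the conversion used in the Kyoto-U system.

```python
from dataclasses import dataclass, field

# Hypothetical toy head table (NOT the rules used in the Kyoto-U system):
# for each phrase label, the child labels to try, in priority order, as head.
HEAD_RULES = {
    "S": ["VP"],
    "VP": ["VBD", "VB", "VP"],
    "NP": ["NN", "NNS", "NP"],
    "PP": ["IN"],
}


@dataclass
class Tree:
    label: str
    children: list = field(default_factory=list)
    word: str = ""  # non-empty only for preterminal (leaf) nodes


def head_child(node):
    """Pick the head child of a phrase node using HEAD_RULES."""
    for wanted in HEAD_RULES.get(node.label, []):
        for child in node.children:
            if child.label == wanted:
                return child
    return node.children[-1]  # fallback (assumption): rightmost child


def to_dependencies(node, deps):
    """Return the head word of `node`, appending (dependent, head) arcs to `deps`."""
    if node.word:
        return node.word
    head = head_child(node)
    head_word = to_dependencies(head, deps)
    for child in node.children:
        if child is not head:
            # the head word of each non-head child depends on this phrase's head word
            deps.append((to_dependencies(child, deps), head_word))
    return head_word


# "the car came from the side" as a simplified phrase-structure tree
SAMPLE = Tree("S", [
    Tree("NP", [Tree("DT", word="the"), Tree("NN", word="car")]),
    Tree("VP", [
        Tree("VBD", word="came"),
        Tree("PP", [Tree("IN", word="from"),
                    Tree("NP", [Tree("DT", word="the"), Tree("NN", word="side")])]),
    ]),
])

if __name__ == "__main__":
    arcs = []
    root = to_dependencies(SAMPLE, arcs)
    print(root, arcs)
```

On `SAMPLE` the root is `came`, with `car` and `from` depending on it. Real head tables are much larger and usually treat prepositions and determiners differently.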

  9. Step2 Word Correspondence Detection • KENKYUSYA J-E and E-J dictionaries (300K entries) • Transliteration (person/place names, Katakana words) - Ex) 新宿 → shinjuku, sinjuku, synjucu, ...; 新宿 ⇔ shinjuku (similarity: 1.0) [Figure: word correspondences detected between the J and E trees of the Step1 example]
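A minimal sketch of Step2, assuming toy stand-ins for both resources: `DICTIONARY` replaces the KENKYUSYA dictionaries and `TRANSLIT_VARIANTS` replaces the output of a transliteration module. The slide does not specify the similarity measure, so the use of Python's `difflib.SequenceMatcher.ratio()` and the 0.8 threshold are assumptions.

```python
from difflib import SequenceMatcher

# Hypothetical toy entries standing in for the KENKYUSYA J-E/E-J dictionaries.
DICTIONARY = {"交差点": {"intersection", "crossing"}, "車": {"car", "vehicle"}}

# Hypothetical output of a transliteration module: romanization variants
# generated for a Japanese word (cf. 新宿 → shinjuku, sinjuku, synjucu, ...).
TRANSLIT_VARIANTS = {"新宿": ["shinjuku", "sinjuku", "synjucu"]}


def translit_similarity(ja_word, en_word):
    """Best string similarity between any romanization variant and the English word."""
    variants = TRANSLIT_VARIANTS.get(ja_word, [])
    return max((SequenceMatcher(None, v, en_word.lower()).ratio() for v in variants),
               default=0.0)


def detect_correspondences(ja_words, en_words, threshold=0.8):
    """Return (ja, en, source, score) candidates from dictionary and transliteration."""
    found = []
    for ja in ja_words:
        for en in en_words:
            if en.lower() in DICTIONARY.get(ja, set()):
                found.append((ja, en, "dict", 1.0))
                continue
            sim = translit_similarity(ja, en)  # assumed measure, see lead-in
            if sim >= threshold:
                found.append((ja, en, "translit", sim))
    return found


if __name__ == "__main__":
    print(detect_correspondences(["新宿", "車"], ["Shinjuku", "car", "station"]))
```

Here 新宿 ⇔ Shinjuku is found by transliteration with similarity 1.0 and 車 ⇔ car by the dictionary; a word may receive several candidates, which is what Step3 has to resolve.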

  10. Step3 Correspondence Disambiguation • Calculate a score for each ambiguous correspondence based on the unambiguous correspondences • Select the correspondence with the higher score • distJ / distE = distance to an unambiguous correspondence in the Japanese / English tree

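  11. Step3 Correspondence Disambiguation (cont.) [Figure: aligned trees of J: 日本で保険会社に対して保険請求の申し立てが可能ですよ and its English translation (nodes include you / will have / to file / claim / with the office / in Japan and two occurrences of insurance); 保険 and insurance each occur twice, so their correspondences are ambiguous, and candidate correspondences are scored 0.8, 1.5 and 1.0]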
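The scoring formula itself did not survive in this transcript, so the sketch below assumes one: a candidate correspondence scores the sum, over all unambiguous correspondences, of 1 / (distJ + distE), where distJ and distE are edge counts in the Japanese and English dependency trees. The trees, the numbered node names (`保険1`, `insurance1`, ...) and the anchors are hypothetical simplifications of the example above.

```python
def path_to_root(node, parent):
    """Nodes from `node` up to the root (inclusive)."""
    path = [node]
    while parent.get(node) is not None:
        node = parent[node]
        path.append(node)
    return path


def tree_distance(a, b, parent):
    """Number of edges between nodes a and b in a tree given as a child→parent dict."""
    up_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for i, n in enumerate(path_to_root(a, parent)):
        if n in up_b:  # lowest common ancestor
            return i + up_b[n]
    raise ValueError("nodes are not in the same tree")


def score(candidate, anchors, parent_j, parent_e):
    """ASSUMED score: sum over unambiguous correspondences of 1 / (distJ + distE)."""
    j, e = candidate
    return sum(1.0 / (tree_distance(j, aj, parent_j) + tree_distance(e, ae, parent_e))
               for aj, ae in anchors)


def disambiguate(candidates, anchors, parent_j, parent_e):
    """Keep the candidate correspondence with the highest score."""
    return max(candidates, key=lambda c: score(c, anchors, parent_j, parent_e))


# Toy trees; repeated words are numbered so that every node is unique.
PARENT_J = {"可能です": None, "日本で": "可能です", "会社に対して": "可能です",
            "保険1": "会社に対して", "申し立てが": "可能です",
            "請求の": "申し立てが", "保険2": "請求の"}
PARENT_E = {"will have": None, "you": "will have", "to file": "will have",
            "claim": "to file", "insurance1": "claim", "with the office": "to file",
            "insurance2": "with the office", "in Japan": "to file"}
# Unambiguous correspondences used as anchors.
ANCHORS = [("請求の", "claim"), ("会社に対して", "with the office"), ("日本で", "in Japan")]

if __name__ == "__main__":
    for ja in ("保険1", "保険2"):
        cands = [(ja, "insurance1"), (ja, "insurance2")]
        print(disambiguate(cands, ANCHORS, PARENT_J, PARENT_E))
```

With these toy trees, 保険 under 会社に対して is paired with the insurance under "with the office", and 保険 under 請求の with the insurance under "claim", because each pairing is closest to the anchors around it.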

  12. Step4 Handling Remaining Words • Align the root nodes if they remain unaligned • Merge base NP nodes • Merge other remaining nodes into their ancestor nodes [Figure: the Step1 example trees]
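Two of the Step4 heuristics can be sketched on the Step1 example: aligning the roots when both are still unaligned, and merging each remaining node into its nearest aligned ancestor. Base NP merging is left out. The function names and the tree encoding (a child-to-parent dict) are assumptions for illustration.

```python
def align_roots(root_j, root_e, alignment):
    """If both roots are still unaligned, align them with each other."""
    aligned_j = {j for j, _ in alignment}
    aligned_e = {e for _, e in alignment}
    if root_j not in aligned_j and root_e not in aligned_e:
        alignment = alignment + [(root_j, root_e)]
    return alignment


def merge_remaining(parent, aligned):
    """Map every node to itself if aligned, else to its nearest aligned ancestor (None if none)."""
    owner = {}
    for node in parent:
        cur = node
        while cur is not None and cur not in aligned:
            cur = parent.get(cur)  # climb towards the root
        owner[node] = cur
    return owner


# Step1 example: 交差点で、突然あの車が飛び出して来たのです / The car came at me ...
STEP1_PARENT_J = {"飛び出して来たのです": None, "交差点で": "飛び出して来たのです",
                  "突然": "飛び出して来たのです", "車が": "飛び出して来たのです", "あの": "車が"}
STEP1_PARENT_E = {"came": None, "the car": "came", "at me": "came",
                  "from the side": "came", "at the intersection": "came"}
STEP1_ALIGNMENT = [("交差点で", "at the intersection"), ("車が", "the car")]

if __name__ == "__main__":
    al = align_roots("飛び出して来たのです", "came", STEP1_ALIGNMENT)
    print(merge_remaining(STEP1_PARENT_J, {j for j, _ in al}))
    print(merge_remaining(STEP1_PARENT_E, {e for _, e in al}))
```

After the roots are aligned, あの is merged into 車が, 突然 into the root, and "at me" and "from the side" into "came".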

  13. Step5 Registration to Database • Register each correspondence • Register combinations of correspondences as well [Figure: the aligned Step1 example trees]
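Step5 can be pictured as filling a table keyed by Japanese sub-trees. This sketch assumes that combinations of correspondences means two correspondences whose Japanese sides are in a direct parent/child relation, and it keys entries by tuples of node strings; a real example database would store the tree structure, so treat `register` and the key format as hypothetical.

```python
from collections import defaultdict


def register(db, correspondences, parent_j):
    """Register each correspondence, plus pairs whose Japanese sides are parent/child."""
    for ja, en in correspondences:
        db[(ja,)].append((en,))
    for ja1, en1 in correspondences:
        for ja2, en2 in correspondences:
            if parent_j.get(ja1) == ja2:  # ja1 depends directly on ja2
                db[(ja1, ja2)].append((en1, en2))
    return db


# Step1 example after Step4 (nodes merged by hand for this sketch).
CORRESPONDENCES = [("交差点で", "at the intersection"), ("あの車が", "the car"),
                   ("突然飛び出して来たのです", "came at me from the side")]
MERGED_PARENT_J = {"突然飛び出して来たのです": None,
                   "交差点で": "突然飛び出して来たのです",
                   "あの車が": "突然飛び出して来たのです"}

if __name__ == "__main__":
    db = register(defaultdict(list), CORRESPONDENCES, MERGED_PARENT_J)
    for key, values in db.items():
        print(key, values)
```

This registers three single correspondences and two combined ones, so a later input can reuse either the small pieces or the larger combined example.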

  14. Translation • Translation example (TE) retrieval - for all the sub-trees in the input • TE selection - prefer larger examples • TE combination - greedily from the root node
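The three translation steps can be sketched as a breadth-first walk from the root of the input tree: at each uncovered node, retrieve the examples that contain it without overlapping already covered nodes, and keep the largest. Matching is reduced here to identical node strings, the final English word order is not modeled, and `greedy_translate` and the toy data are hypothetical.

```python
from collections import deque


def greedy_translate(parent, examples):
    """Cover the input dependency tree with translation examples, greedily from the root.

    parent:   input tree as {node: parent_node or None}
    examples: list of (frozenset of input nodes the example matches, English fragment)
    Returns the chosen (nodes, English) pairs in selection order.
    """
    children = {n: [] for n in parent}
    root = None
    for node, par in parent.items():
        if par is None:
            root = node
        else:
            children[par].append(node)

    covered, chosen = set(), []
    queue = deque([root])  # breadth-first, starting at the root
    while queue:
        node = queue.popleft()
        queue.extend(children[node])
        if node in covered:
            continue
        # retrieval: examples containing this node that do not overlap covered nodes
        fitting = [(nodes, en) for nodes, en in examples
                   if node in nodes and nodes.issubset(parent) and not (nodes & covered)]
        if not fitting:
            chosen.append((frozenset([node]), None))  # no example: left untranslated
            covered.add(node)
            continue
        # selection: prefer the largest example
        nodes, en = max(fitting, key=lambda ex: len(ex[0]))
        covered |= nodes
        chosen.append((nodes, en))
    return chosen


# Input 交差点に入る時 私の信号は青でした。 as a toy dependency tree.
INPUT_PARENT = {"青でした": None, "信号は": "青でした", "私の": "信号は",
                "時": "青でした", "入る": "時", "交差点に": "入る"}
EXAMPLES = [
    (frozenset({"信号は", "青でした"}), "the traffic light was green"),
    (frozenset({"青でした"}), "was green"),
    (frozenset({"入る", "時"}), "when entering"),
    (frozenset({"私の"}), "my"),
    (frozenset({"交差点に"}), "the intersection"),
]

if __name__ == "__main__":
    for nodes, en in greedy_translate(INPUT_PARENT, EXAMPLES):
        print(sorted(nodes), en)
```

At the root, the two-node example 信号は青でした wins over the one-node 青でした because larger examples are preferred.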

  15. Combination Example [Figure: the input 交差点に入る時 私の信号は青でした。 and the translation examples matched to its sub-trees]

  16. Combination Example (cont.) [Figure: the matched translation examples combined, starting from the root node of the input]

  18. Pronoun Estimation • Pronouns are often omitted in Japanese sentences • Omitted in TE: - TE: 胃が痛いのです → I've a stomachache - Input: 私は胃が痛いのです → I I've a stomachache × • Omitted in Input: - TE: これを日本に送ってください → Will you mail this to Japan? - Input: 日本へ送ってください → Will you mail to Japan? × △

  19. Pronoun Estimation (cont.) • Estimate the omitted pronoun from the modality and the subject case • Omitted in TE: - TE: (私は)胃が痛いのです → I've a stomachache - Input: 私は胃が痛いのです → I've a stomachache ○ • Omitted in Input: - TE: これを日本に送ってください → Will you mail this to Japan? - Input: (これを)日本へ送ってください → Will you mail this to Japan? ○
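The slides do not give the estimation rules, so this toy sketch only illustrates the idea with two invented ones: a request predicate (ending in ください) without a を-phrase gets これを, and any other predicate without a は-phrase gets 私は. Applying the same restoration to both the TE and the input lets 胃が痛いのです match 私は胃が痛いのです, and 日本へ送ってください match これを日本に送ってください up to the へ/に difference. `restore_omitted` is hypothetical.

```python
def restore_omitted(phrases, predicate):
    """Toy rules: prepend an estimated omitted argument based on the predicate's modality.

    phrases:   argument phrases (bunsetsu) of the predicate, e.g. ["胃が"]
    predicate: the predicate phrase, e.g. "痛いのです"
    """
    if predicate.endswith("ください"):  # request modality
        # hypothetical rule: a request with no を-case object gets これを
        if not any(p.endswith("を") for p in phrases):
            return ["これを"] + phrases
    else:  # treated as a declarative statement by the speaker
        # hypothetical rule: no は-marked subject, so assume the speaker (私は)
        if not any(p.endswith("は") for p in phrases):
            return ["私は"] + phrases
    return phrases


if __name__ == "__main__":
    print(restore_omitted(["胃が"], "痛いのです"))
    print(restore_omitted(["日本へ"], "送ってください"))
```

A real system would rely on the modality and case analysis from the parser rather than on string suffixes.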

  20. Various Expressions in Japanese • Synonymous relation - Hiragana/Katakana/Kanji variations: りんご = リンゴ = 林檎 (apple) - Variations of Katakana expressions: コンピュータ = コンピューター (computer) - Synonymous words: 登山 = 山登り (lit. "climbing mountain" vs. "mountain climbing") - Synonymous phrases: 最寄りの (nearest) = 一番近い (most near) • Hypernym-hyponym relation - 災難 ← 災害 (disaster) ← 地震 (earthquake), 台風 (typhoon) • Resources: morphological analyzer; relations automatically acquired from Japanese dictionaries
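A minimal sketch of flexible matching over the relations listed above, assuming small hand-written tables (`KANJI_VARIANTS`, `SYNONYM_GROUP`, `HYPERNYM`) in place of the morphological analyzer and the dictionary-derived resources. Katakana is folded to hiragana using the fixed Unicode offset 0x60 between the two blocks, and a trailing long-vowel mark ー is dropped so that コンピュータ and コンピューター coincide.

```python
# Hypothetical toy tables standing in for the analyzer and dictionary resources.
KANJI_VARIANTS = {"林檎": "りんご"}
SYNONYM_GROUP = {"山登り": "登山", "一番近い": "最寄りの"}  # word -> representative
HYPERNYM = {"地震": "災害", "台風": "災害", "災害": "災難"}  # hyponym -> hypernym


def normalize(word):
    """Fold kanji/katakana variants, trailing ー and synonyms onto one form."""
    word = KANJI_VARIANTS.get(word, word)
    word = word.rstrip("ー")
    # Katakana U+30A1..U+30F6 map to hiragana U+3041..U+3096
    word = "".join(chr(ord(c) - 0x60) if "\u30a1" <= c <= "\u30f6" else c for c in word)
    return SYNONYM_GROUP.get(word, word)


def ancestors(word):
    """All hypernyms of `word`, nearest first."""
    found = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        found.append(word)
    return found


def flexible_match(a, b):
    """Return 'equivalent', 'hypernym', or None for two Japanese words."""
    na, nb = normalize(a), normalize(b)
    if na == nb:
        return "equivalent"
    if nb in ancestors(na) or na in ancestors(nb):
        return "hypernym"
    return None


if __name__ == "__main__":
    print(flexible_match("リンゴ", "林檎"), flexible_match("地震", "災難"))
```

Sibling hyponyms such as 地震 and 台風 deliberately do not match here; whether to allow that is a design choice this sketch leaves out.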

  21. Japanese Flexible Matching

  22. IWSLT06 Evaluation Results • Open data track (JE) • Translation of correct recognition results and of ASR output

  23. Results Discussion • Failures in punctuation insertion caused parsing errors • Limited dictionary robustness reduced alignment accuracy • The TE selection criterion failed when choosing among ‘almost equal’ examples - e.g. Input: “買います” (buy a ticket) TE: “買いません” (not buy a ticket)

  24. Conclusion and Future Work • We aim not only at developing MT, but also at tackling this task from the viewpoint of structural NLP. • Apply statistical methods to alignment • Improve parsing accuracy (both J and E) • Improve the Japanese flexible matching method • J-C and C-J MT project with NICT
