

Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words. Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu. Institute of Computing Technology, Chinese Academy of Sciences. {songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn


Presentation Transcript


  1. Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu Institute of Computing Technology Chinese Academy of Sciences {songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn

  2. Motivation • Spoken language translation suffers from a serious problem: the translations of content words are often missing. Example output: "no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes"

  3. Motivation • Further investigation shows that this happens due to the use of incorrect MT rules. Source: 我 想 买 茶叶 送给 家人 做 礼物 。 Rule: #X1# 茶叶 #X2# -> #X1# #X2# Partial translations: 我 想 买 -> "I would like to buy"; 送给 家人 做 礼物 。 -> "souvenir for my family ." Result: "I would like to buy souvenir for my family ." The rule drops the translation of the content word 茶叶 ("tea").

  4. Motivation • The classic SMT framework has no specific feature for distinguishing bad rules from good ones. • An obvious way to tackle this problem is therefore to find a way to make that distinction.

  5. Two rules • R1 (a good rule): 推荐 的 茶 -> tea recommended • R2 (a bad rule that misses the translation of the content word "推荐"): 推荐 的 茶 -> tea

  6. Two rules • R1: 推荐 的 茶 -> tea recommended • R2: 推荐 的 茶 -> tea • R2 may be favored by a classic MT system, since it generates a shorter translation result.

  7. Our Model

  8. Our Model • R1: 推荐 的 茶 -> tea recommended • R2: 推荐 的 茶 -> tea

  9. Training • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information

  10. Training • Content phrase pairs extracted from the alignment: 推荐 - recommended, 茶 - tea, 日本 - japanese, 日本 茶 - japanese tea • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information

  11. Training • A content phrase stoplist (么, 吗, 的, …) marks which phrases are not content phrases • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended (content words are labeled in bold on the slide) • … a bilingual corpus with word alignment information
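The stoplist step above can be sketched in a few lines. The stoplist entries below are only the examples shown on the slide (a real system would use a fuller list), and the function name is illustrative, not from the paper.

```python
# Identify content words by filtering out functional words listed in a
# stoplist, as described on the slide. These entries are the slide's
# examples: two question particles and the structural particle 的.
STOPLIST = {"么", "吗", "的"}

def content_words(tokens):
    """Return the tokens that are not in the stoplist."""
    return [t for t in tokens if t not in STOPLIST]

sentence = "这里 有 推荐 的 日本 茶 吗".split()
print(content_words(sentence))  # → ['这里', '有', '推荐', '日本', '茶']
```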

  12. Training • Content phrase pairs: 推荐 - recommended, 茶 - tea, 日本 - japanese, 日本 茶 - japanese tea, … • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information • Co-relation table: 茶 - tea: 13.76; 茶 - japanese tea: 4.89; …
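The co-relation table can be built by scoring how strongly each source content phrase co-occurs with each target content phrase across the word-aligned corpus. The slides do not say which association measure produces scores such as 13.76, so this sketch uses pointwise mutual information as a hypothetical stand-in; the function name and data layout are assumptions.

```python
import math
from collections import Counter

def build_corelation_table(pairs, num_sents):
    """Score source/target content-phrase association.

    `pairs` is a list of (src_phrase, tgt_phrase) tuples, one per
    co-occurrence extracted from the word-aligned corpus. The measure
    here is pointwise mutual information (PMI), an assumed stand-in for
    whatever measure produced the slide's scores.
    """
    pair_count = Counter(pairs)
    src_count = Counter(s for s, _ in pairs)
    tgt_count = Counter(t for _, t in pairs)
    n = float(num_sents)
    table = {}
    for (s, t), c in pair_count.items():
        # PMI = log2( p(s, t) / (p(s) * p(t)) )
        table[(s, t)] = math.log2((c / n) / ((src_count[s] / n) * (tgt_count[t] / n)))
    return table

# Toy co-occurrences echoing the phrase pairs shown on the slide.
pairs = [("茶", "tea"), ("推荐", "recommended"), ("日本", "japanese"),
         ("日本 茶", "japanese tea"), ("茶", "tea")]
table = build_corelation_table(pairs, num_sents=100)
print(table[("茶", "tea")])
```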

  13. Two penalties • Source Unaligned Penalty: the number of unaligned source content words in a rule • Target Unaligned Penalty: the number of unaligned target content words in a rule
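The two penalties can be computed directly from a rule's alignment links. A minimal sketch, assuming a rule is represented by its source tokens, target tokens, and a set of (source index, target index) links; this representation, the one-word stoplist, and the function name are illustrative, not from the paper.

```python
# Functional words; everything else counts as a content word.
STOPLIST = {"的"}

def unaligned_penalties(src, tgt, links):
    """Return (source_penalty, target_penalty): the counts of unaligned
    source and target content words in a rule."""
    src_aligned = {i for i, _ in links}
    tgt_aligned = {j for _, j in links}
    src_pen = sum(1 for i, w in enumerate(src)
                  if w not in STOPLIST and i not in src_aligned)
    tgt_pen = sum(1 for j, w in enumerate(tgt)
                  if w not in STOPLIST and j not in tgt_aligned)
    return src_pen, tgt_pen

# R2 from the slides: 推荐 的 茶 -> tea, with only 茶 aligned to tea.
# 推荐 is an unaligned source content word, so the penalties are (1, 0).
print(unaligned_penalties(["推荐", "的", "茶"], ["tea"], {(2, 0)}))  # → (1, 0)
```

Under this sketch, the good rule R1 (with 推荐 aligned to "recommended" as well) would score (0, 0), so the two features let the model prefer R1 over R2.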

  14. Experiment • Data sets • training: 280K Chinese-English spoken-language sentence pairs • tuning: DEVSET2 of IWSLT 2010 • test: DEVSET3 ~ DEVSET6 of IWSLT 2010 • the training set is also used to train our model

  15. Experiment

  16. Thanks Q & A
