

Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words. Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu. Institute of Computing Technology, Chinese Academy of Sciences. {songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn


Presentation Transcript


  1. Rule Refinement for Spoken Language Translation by Retrieving the Missing Translation of Content Words Linfeng Song, Jun Xie, Xing Wang, Yajuan Lü and Qun Liu Institute of Computing Technology Chinese Academy of Sciences {songlinfeng,xiejun,wangxing,lvyajuan,liuqun}@ict.ac.cn

  2. Motivation • Spoken language translation suffers from a serious problem: the translations of content words are often missing. Example output: "no, you need 10 minutes to go to the main street, (the bus) comes every 10 minutes"

  3. Motivation • Further investigation shows that this happens due to the use of incorrect MT rules. Source: 我 想 买 茶叶 送给 家人 做 礼物 。 Rule: #X1# 茶叶 #X2# -> #X1# #X2# Partial translations: 我 想 买 -> "I would like to buy"; 送给 家人 做 礼物 。 -> "souvenir for my family ." Result: "I would like to buy souvenir for my family ." The rule drops the translation of the content word 茶叶 ("tea").

  4. Motivation • The classic SMT framework has no specific feature for distinguishing bad rules from good ones. • An obvious way to tackle this problem is therefore to find a way to make that distinction.

  5. Two rules • R1 (a good rule): 推荐 的 茶 -> tea recommended • R2 (a bad rule that misses the translation of the content word "推荐"): 推荐 的 茶 -> tea

  6. Two rules • R1: 推荐 的 茶 -> tea recommended • R2: 推荐 的 茶 -> tea • R2 may be favored by a classic MT system, since it generates a shorter translation result.

  7. Our Model

  8. Our Model • R1: 推荐 的 茶 -> tea recommended • R2: 推荐 的 茶 -> tea

  9. Training • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information

  10. Training • Content phrase pairs extracted from the alignment: 推荐 - recommended, 茶 - tea, 日本 - japanese, 日本 茶 - japanese tea • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information

  11. Training • A content phrase stoplist (么, 吗, 的, …) marks which phrases are not content phrases • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended (content words are labeled in bold on the slide) • … a bilingual corpus with word alignment information
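The stoplist step above can be sketched in a few lines. The stoplist entries below are only the examples shown on the slide (a real system would use a fuller list), and the function name is illustrative, not from the paper.

```python
# Identify content words by filtering out functional words listed in a
# stoplist, as described on the slide. These entries are the slide's
# examples: two question particles and the structural particle 的.
STOPLIST = {"么", "吗", "的"}

def content_words(tokens):
    """Return the tokens that are not in the stoplist."""
    return [t for t in tokens if t not in STOPLIST]

sentence = "这里 有 推荐 的 日本 茶 吗".split()
print(content_words(sentence))  # → ['这里', '有', '推荐', '日本', '茶']
```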

  12. Training • Content phrase pairs: 推荐 - recommended, 茶 - tea, 日本 - japanese, 日本 茶 - japanese tea, … • Example sentence pair: 这里 有 推荐 的 日本 茶 吗 / do you have any japanese tea recommended • … a bilingual corpus with word alignment information • Co-relation table: 茶 - tea: 13.76; 茶 - japanese tea: 4.89; …
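The co-relation table can be built by scoring how strongly each source content phrase co-occurs with each target content phrase across the word-aligned corpus. The slides do not say which association measure produces scores such as 13.76, so this sketch uses pointwise mutual information as a hypothetical stand-in; the function name and data layout are assumptions.

```python
import math
from collections import Counter

def build_corelation_table(pairs, num_sents):
    """Score source/target content-phrase association.

    `pairs` is a list of (src_phrase, tgt_phrase) tuples, one per
    co-occurrence extracted from the word-aligned corpus. The measure
    here is pointwise mutual information (PMI), an assumed stand-in for
    whatever measure produced the slide's scores.
    """
    pair_count = Counter(pairs)
    src_count = Counter(s for s, _ in pairs)
    tgt_count = Counter(t for _, t in pairs)
    n = float(num_sents)
    table = {}
    for (s, t), c in pair_count.items():
        # PMI = log2( p(s, t) / (p(s) * p(t)) )
        table[(s, t)] = math.log2((c / n) / ((src_count[s] / n) * (tgt_count[t] / n)))
    return table

# Toy co-occurrences echoing the phrase pairs shown on the slide.
pairs = [("茶", "tea"), ("推荐", "recommended"), ("日本", "japanese"),
         ("日本 茶", "japanese tea"), ("茶", "tea")]
table = build_corelation_table(pairs, num_sents=100)
print(table[("茶", "tea")])
```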

  13. Two penalties • Source Unaligned Penalty: the number of unaligned source content words in a rule • Target Unaligned Penalty: the number of unaligned target content words in a rule
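The two penalties can be computed directly from a rule's alignment links. A minimal sketch, assuming a rule is represented by its source tokens, target tokens, and a set of (source index, target index) links; this representation, the one-word stoplist, and the function name are illustrative, not from the paper.

```python
# Functional words; everything else counts as a content word.
STOPLIST = {"的"}

def unaligned_penalties(src, tgt, links):
    """Return (source_penalty, target_penalty): the counts of unaligned
    source and target content words in a rule."""
    src_aligned = {i for i, _ in links}
    tgt_aligned = {j for _, j in links}
    src_pen = sum(1 for i, w in enumerate(src)
                  if w not in STOPLIST and i not in src_aligned)
    tgt_pen = sum(1 for j, w in enumerate(tgt)
                  if w not in STOPLIST and j not in tgt_aligned)
    return src_pen, tgt_pen

# R2 from the slides: 推荐 的 茶 -> tea, with only 茶 aligned to tea.
# 推荐 is an unaligned source content word, so the penalties are (1, 0).
print(unaligned_penalties(["推荐", "的", "茶"], ["tea"], {(2, 0)}))  # → (1, 0)
```

Under this sketch, the good rule R1 (with 推荐 aligned to "recommended" as well) would score (0, 0), so the two features let the model prefer R1 over R2.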

  14. Experiment • Data sets • training: 280K Chinese-English spoken-language sentence pairs • tuning: DEVSET2 of IWSLT 2010 • test: DEVSET3 ~ DEVSET6 of IWSLT 2010 • the training set is also used to train our model

  15. Experiment

  16. Thanks Q & A
