
Bagging-based System Combination for Domain Adaptation


Presentation Transcript


  1. Bagging-based System Combination for Domain Adaptation Linfeng Song, Haitao Mi, Yajuan Lü and Qun Liu Institute of Computing Technology Chinese Academy of Sciences

  2. An Example

  3. An Example • Initial MT system

  4. An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • The translation styles of A and B are quite different

  5. An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • Test set: A 10%, B 90%

  6. An Example • Initial MT system • Tuned MT system that fits domain A • Development set: A 90%, B 10% • Test set: A 10%, B 90% • The translation style fits A, but we mainly want to translate B

  7. Traditional Methods Monolingual data with domain annotation

  8. Traditional Methods Monolingual data with domain annotation → Domain recognizer

  9. Traditional Methods Bilingual training data

  10. Traditional Methods Bilingual training data → Domain recognizer → training data for domain A / training data for domain B

  11. Traditional Methods Bilingual training data → Domain recognizer → training data for domain A → MT system for domain A; training data for domain B → MT system for domain B

  12. Traditional Methods Test set

  13. Traditional Methods Test set → Domain recognizer → test set for domain A / test set for domain B

  14. Traditional Methods Test set for domain A → MT system for domain A → translation result for domain A; test set for domain B → MT system for domain B → translation result for domain B; both parts are merged into the final translation result
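
A minimal sketch of this traditional pipeline, assuming hypothetical callables domain_recognizer and mt_systems (none of these names come from the slides): each test sentence is classified by the domain recognizer and then translated by the MT system trained for its predicted domain.

```python
def translate_with_domain_routing(test_set, domain_recognizer, mt_systems):
    """Traditional method: route each sentence to the MT system of its predicted domain."""
    results = []
    for sentence in test_set:
        domain = domain_recognizer(sentence)          # e.g. "A" or "B"
        results.append(mt_systems[domain](sentence))  # translate with that domain's system
    return results
```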

  15. The merits • Simple and effective • Fits human intuition

  16. The drawbacks • Classification Error (CE) • Especially for unsupervised methods • Supervised methods can keep CE low, but the need for annotated data limits their usage

  17. Our motivation • Move beyond doing adaptation directly • Statistical methods (such as Bagging) can help

  18. Preliminary The general framework of Bagging

  19. General framework of Bagging Training set D

  20. General framework of Bagging Training set D → bootstrapped training sets D1, D2, D3, …… → classifiers C1, C2, C3, ……

  21. General framework of Bagging The test sample is presented to C1, C2, C3, ……

  22. General framework of Bagging Test sample → C1, C2, C3, …… → result of C1, result of C2, result of C3, …… → voting result
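
A minimal sketch of this general Bagging framework, assuming a hypothetical train_one callable that trains a single classifier (nothing here is specific to the authors' MT setup): the training set D is resampled with replacement into D1, D2, D3, ..., one classifier is trained per resample, and predictions on a test sample are combined by majority vote.

```python
import random
from collections import Counter

def bagging_train(train_set, train_one, n_models=3, seed=0):
    """Train one classifier (C1, C2, C3, ...) per bootstrap resample (D1, D2, D3, ...)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Sample with replacement, same size as the original training set D.
        resample = [rng.choice(train_set) for _ in train_set]
        models.append(train_one(resample))
    return models

def bagging_predict(models, sample):
    """Combine the classifiers' outputs on a test sample by majority voting."""
    votes = Counter(model(sample) for model in models)
    return votes.most_common(1)[0][0]
```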

  23. Our method

  24. Training Suppose there is a development set A,A,A,B,B. For simplicity, there are only 5 sentences: 3 belong to domain A, 2 to domain B

  25. Training We bootstrap N new development sets: A,B,B,B,B; A,A,B,B,B; A,A,A,B,B; A,A,B,B,B; A,A,A,B,B; A,A,A,A,B; ……

  26. Training For each bootstrapped set, a subsystem is tuned: A,B,B,B,B → MT system-1; A,A,B,B,B → MT system-2; A,A,A,B,B → MT system-3; A,A,B,B,B → MT system-4; A,A,A,B,B → MT system-5; A,A,A,A,B → ……
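
A sketch of this training step under the slides' toy setup (tune_subsystem is a hypothetical stand-in for the actual tuner, e.g. a MERT-style optimizer): the development set is resampled with replacement N times, and one subsystem is tuned on each bootstrapped set.

```python
import random

def bootstrap_dev_sets(dev_set, n_systems, seed=0):
    """Draw n_systems development sets, each sampled from dev_set with replacement."""
    rng = random.Random(seed)
    return [[rng.choice(dev_set) for _ in dev_set] for _ in range(n_systems)]

# Toy development set from the slides: 3 sentences of domain A, 2 of domain B.
dev_set = ["A1", "A2", "A3", "B1", "B2"]
new_dev_sets = bootstrap_dev_sets(dev_set, n_systems=6)

# Each bootstrapped set then tunes one subsystem:
# subsystems = [tune_subsystem(d) for d in new_dev_sets]   # tune_subsystem is hypothetical
```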

  27. Decoding For simplicity, suppose only 2 subsystems have been tuned: Subsystem-1 with W: <-0.8, 0.2> and Subsystem-2 with W: <-0.6, 0.4>

  28. Decoding Now a sentence “A B” needs to be translated; it is fed to Subsystem-1 (W: <-0.8, 0.2>) and Subsystem-2 (W: <-0.6, 0.4>)

  29. Decoding After translation, each subsystem generates its N-best candidates: Subsystem-1 (W: <-0.8, 0.2>) produces “a b; <0.2, 0.2>” and “a c; <0.2, 0.3>”; Subsystem-2 (W: <-0.6, 0.4>) produces “a b; <0.2, 0.2>”, “a b; <0.1, 0.3>” and “a d; <0.3, 0.4>”

  30. Decoding Fuse these N-best lists and eliminate duplicates, giving the fused list “a b; <0.2, 0.2>”, “a b; <0.1, 0.3>”, “a c; <0.2, 0.3>”, “a d; <0.3, 0.4>”

  31. Decoding Candidates are considered identical only if their target strings and feature values are entirely equal: the two copies of “a b; <0.2, 0.2>” are merged, while “a b; <0.1, 0.3>” remains a separate candidate

  32. Decoding Calculate the voting score of each fused candidate over the S subsystems (S represents the number of subsystems). With Subsystem-1 W: <-0.8, 0.2> and Subsystem-2 W: <-0.6, 0.4>, the scores are: “a b; <0.2, 0.2>”: -0.16; “a b; <0.1, 0.3>”: +0.04; “a c; <0.2, 0.3>”: -0.10; “a d; <0.3, 0.4>”: -0.18

  33. Decoding The candidate with the highest voting score wins: here “a b; <0.1, 0.3>” with +0.04

  34. Decoding The one with the highest score wins • Since the subsystems are different copies of the same model and share the same training data, score calibration across subsystems is unnecessary
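
A sketch of the decoding step that reproduces the worked example above. The scoring rule is an assumption consistent with the numbers on the slides: the voting score of a fused candidate is the sum, over the S subsystems, of each subsystem's weight vector dotted with the candidate's feature vector (dividing by S would not change the ranking), and duplicates are removed only when both the target string and the feature values match.

```python
def fuse_nbest(nbest_lists):
    """Merge the subsystems' N-best lists, keeping one copy of each candidate.
    Two candidates are duplicates only if target string AND feature values match."""
    fused, seen = [], set()
    for nbest in nbest_lists:
        for target, features in nbest:
            key = (target, tuple(features))
            if key not in seen:
                seen.add(key)
                fused.append((target, features))
    return fused

def voting_score(features, subsystem_weights):
    """Sum over all subsystems of the dot product of that subsystem's
    weight vector with the candidate's feature vector."""
    return sum(sum(w * f for w, f in zip(weights, features))
               for weights in subsystem_weights)

# Worked example from the slides.
weights = [(-0.8, 0.2), (-0.6, 0.4)]                   # Subsystem-1, Subsystem-2
nbest_1 = [("a b", (0.2, 0.2)), ("a c", (0.2, 0.3))]   # Subsystem-1's N-best list
nbest_2 = [("a b", (0.2, 0.2)), ("a b", (0.1, 0.3)), ("a d", (0.3, 0.4))]  # Subsystem-2's

candidates = fuse_nbest([nbest_1, nbest_2])
best = max(candidates, key=lambda c: voting_score(c[1], weights))
# best == ("a b", (0.1, 0.3)) with voting score +0.04 (highest of -0.16, -0.10, +0.04, -0.18)
```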

  35. Experiments

  36. Basic Setups • Data: NTCIR-9 Chinese-English patent corpus • 1k sentence pairs as the development set • Another 1k pairs as the test set • The remaining pairs are used for training • System: hierarchical phrase-based model • Alignment: GIZA++ with grow-diag-final

  37. Effectiveness: Show and Prove • Tune 30 subsystems using Bagging • Tune 30 subsystems with random initial weights • Evaluate the fusion results of the first N (N = 5, 10, 15, 20, 30) subsystems of both and compare

  38. Results: 1-best [chart omitted: score vs. number of subsystems; gain of +0.82]

  39. Results: 1-best [chart omitted: score vs. number of subsystems; gain of +0.70]

  40. Results: Oracle [chart omitted: score vs. number of subsystems; gain of +6.22]

  41. Results: Oracle [chart omitted: score vs. number of subsystems; gain of +3.71]

  42. Compare with traditional methods • Evaluate a supervised method • To avoid data sparsity, it operates only on the development set and test set • Evaluate an unsupervised method • Similar to Yamada (2007) • To avoid data sparsity, only the LM is domain-specific

  43. Results

  44. Conclusions • Propose a bagging-based method to address the multi-domain translation problem. • Experiments show that: • Bagging is effective for the domain adaptation problem • Our method clearly surpasses the baseline, and is even better than some traditional methods.

  45. Thank you for listening. Any questions?
