Oasis

Oasis: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models. Machine Intelligence and Translation Lab, Harbin Institute of Technology. Contents: a user manual for Oasis, the files needed by Oasis, explanations of the parameters, and a summary of the parameters.

Presentation Transcript


  1. Oasis: A Beam Search Decoder for Phrase-Based Statistical Machine Translation Models. Machine Intelligence and Translation Lab, Harbin Institute of Technology

  2. Oasis
     • A user manual for Oasis
     • The files needed by Oasis
     • Some explanations for the parameters
     • A summary for the parameters
     • The details for the decoding process
     • Oasis for Phrase-based SMT
     • Core algorithm

  3. A User Manual for Oasis
     • The Designers: Li Jun, Liang Huashen, Jiang Hongfei, Sun Jiadong
     • CPU: P4 2.0 GHz or higher
     • RAM: 512 MB or larger
     • Windows (Visual C++ 6.0) or Linux (gcc 3.4.2)

  4. The Files Needed by Oasis
     • Phrase-Table File
     • Language-Model File (ARPA format)
     • The Configuration File for Oasis (The User Manual for Oasis)

  5. Some Explanations for the Parameters (1)
     • [ttable-limit]: keep the top n translation options (Translation, Language Model, English Phrase Length)
     • [stack] and [threshold]
       -s: maximum size of the beam (default 100)
       -b: minimum threshold of the beam (default 0.00001)

  6. Some Explanations for the Parameters (2)
     • [distortion]
       d = abs(last word position of previously translated phrase + 1 - first word position of newly translated phrase)
       -dl: maximum distance between two input phrases that are translated to two neighboring output phrases
       -d: minimum distortion score
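The distortion distance defined on slide 6 can be sketched in Python. This is only an illustration: the function names are hypothetical, word positions are assumed to be 0-based, and the position before the first source word is assumed to be -1 so that translating the sentence from its first word gives d = 0. How Oasis turns d into a distortion score is not stated on the slide, so that step is left out.

```python
def distortion_distance(prev_last: int, new_first: int) -> int:
    """d = abs(prev_last + 1 - new_first), as given on slide 6.

    prev_last: last source word position of the previously translated phrase
    new_first: first source word position of the newly translated phrase
    """
    return abs(prev_last + 1 - new_first)


def within_distortion_limit(d: int, limit: int) -> bool:
    """Hypothetical check against a maximum distance such as the -dl value."""
    return d <= limit


# Monotone continuation: the new phrase starts right after the previous one.
monotone = distortion_distance(prev_last=2, new_first=3)
# Jumping forward over two source words.
jump = distortion_distance(prev_last=2, new_first=5)
# First phrase of the sentence, assuming the start position is -1.
start = distortion_distance(prev_last=-1, new_first=0)
```

A monotone translation always yields d = 0, so only reordering contributes to the distance.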

  7. Some Explanations for the Parameters (3)
     • [lm-limit]
       -m: minimum language model score of the phrase
     • [nbest]
       -l: generate n-best lattice

  8. A summary for the parameters
     • -f Specify the configuration file
     • -in Specify the input data
     • -out Specify the output data
     • -s Specify the maximum size of the beam, default 100
     • -b Specify the beam threshold, default 0.00001
     • -l Specify the N-best output, default 1
     • -d Specify the minimum distortion score, default -2.30259
     • -dl Specify the distortion distance, default -9
     • -m Specify

  9. The details for the decoding process
     • Translation options
     • Future cost
     • Hypothesis element

  10. An example for the details
     • creating hypothesis 1 from 0
     • base score 0
     • translation cost -1.28215
     • distortion cost 0
     • language model cost for 'same' -2.57302
     • language model cost for 'the' -1.91582
     • word penalty 2
     • score -3.77099 + futureCost -15.8278 = -19.5988
     • new best estimate for this stack
     • merged hypothesis on stack 1, now size 1
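The figures in the slide 10 trace can be checked with a few lines of Python. The slide does not say how the components are combined; the sketch assumes a plain sum, which reproduces the reported score and estimate. The variable names are illustrative.

```python
# Component values copied from the slide 10 trace.
base_score = 0.0
translation_cost = -1.28215
distortion_cost = 0.0
lm_costs = {"same": -2.57302, "the": -1.91582}
word_penalty = 2.0
future_cost = -15.8278

# Assumption: the hypothesis score is the sum of all components.
score = (base_score + translation_cost + distortion_cost
         + sum(lm_costs.values()) + word_penalty)

# The stack ranks hypotheses by score plus the future cost estimate.
estimate = score + future_cost

print(f"score    = {score:.5f}")
print(f"estimate = {estimate:.4f}")
```

The sum gives -3.77099 for the score and, to four decimal places, -19.5988 for the estimate, matching the trace.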

  11. Oasis for Phrase-based SMT
     • Translation Option
       [ 同样 ]
         the same, -1.28215, -4.05183
       [ 的 ]
         in, -1.8011, -2.94581
         right, -2.71656, -4.88024
         of a, -2.82411, -4.96468
         flight, -2.62181, -4.90627
         's, -2.5144, -4.40283
         of the, -2.21918, -3.63043
       [ 东西 ]
         anything, -2.03706, -4.64327
         came, -2.2911, -5.35805

  12. Future cost (1)
     • future costs from 0 to 0 is -4.05183
     • future costs from 0 to 1 is -4.95449
     • ...
     • future costs from 0 to 7 is -22.5568
     • future costs from 0 to 8 is -19.8797
     • future costs from 1 to 1 is -2.29265
     • future costs from 1 to 2 is -4.18843
     • ...
     • future costs from 6 to 7 is -5.82631
     • future costs from 6 to 8 is -1.82436
     • future costs from 7 to 7 is -3.53366
     • future costs from 7 to 8 is -0.85651
     • future costs from 8 to 8 is -0.68513

  13. Future cost (2)
     • The calculation of future cost
       translation option cost
       language model cost
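A span-based future cost table like the one on slide 12 is commonly filled by dynamic programming over source spans. The slides do not show how Oasis does it, so the following Python sketch uses the standard phrase-based recipe as an assumption: the cost of a span is the best of (a) the cheapest translation option covering the whole span and (b) the best split into two adjacent sub-spans. The function name and the toy option costs are hypothetical; each option cost is assumed to already combine translation and language model cost.

```python
from typing import Dict, List, Tuple

NEG_INF = float("-inf")


def future_cost_table(n: int,
                      option_cost: Dict[Tuple[int, int], float]) -> List[List[float]]:
    """Return cost[i][j], the best (highest) log score for source span i..j.

    option_cost maps an inclusive span (i, j) to the score of its best
    translation option; spans with no option are simply absent.
    """
    cost = [[NEG_INF] * n for _ in range(n)]
    for length in range(1, n + 1):
        for i in range(0, n - length + 1):
            j = i + length - 1
            best = option_cost.get((i, j), NEG_INF)
            # Try every split point: i..k followed by k+1..j.
            for k in range(i, j):
                best = max(best, cost[i][k] + cost[k + 1][j])
            cost[i][j] = best
    return cost


# Toy three-word sentence with made-up option scores.
toy_options = {(0, 0): -4.0, (1, 1): -3.0, (2, 2): -4.5, (0, 1): -5.0}
table = future_cost_table(3, toy_options)
```

In the toy table the two-word option over span 0..1 beats translating the words separately, and the full span 0..2 reuses it. The same effect would explain why, on slide 12, the cost from 0 to 8 is better than the cost from 0 to 7.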

  14. Core algorithm
     • The generation of a phrase table
     • Future cost
     • The hypothesis and state expansion
     • Beam search
     • The generation of the English sentence

  15. The structure of the hypothesis data
     • The present cost
     • The translation cost for the new translation phrase
     • The distortion cost
     • The penalty for the new translation option
     • Language cost
     • The future cost
     • The new phrase positions in the foreign sentence
     • The new phrase
     • The last two English words generated
     • ID
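The fields listed on slide 15 map naturally onto a small record type. The Python sketch below is a hypothetical layout, not the Oasis source (which targets Visual C++ and gcc): field names and types are assumptions, and the `estimate` method follows the "score + futureCost" line of the slide 10 trace.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Hypothesis:
    hyp_id: int                      # ID
    present_cost: float              # the present cost (score so far)
    translation_cost: float          # cost of the new translation phrase
    distortion_cost: float           # the distortion cost
    penalty: float                   # penalty for the new translation option
    language_cost: float             # language model cost
    future_cost: float               # estimate for the uncovered source words
    source_span: Tuple[int, int]     # new phrase positions in the foreign sentence
    phrase: str                      # the new (English) phrase
    last_two_words: Tuple[str, ...]  # the last two English words generated

    def estimate(self) -> float:
        """Value used to rank the hypothesis within its stack."""
        return self.present_cost + self.future_cost


# Hypothesis 1 from the slide 10 trace; the source span (0, 0) is assumed.
h1 = Hypothesis(
    hyp_id=1,
    present_cost=-3.77099,
    translation_cost=-1.28215,
    distortion_cost=0.0,
    penalty=2.0,
    language_cost=-2.57302 + -1.91582,
    future_cost=-15.8278,
    source_span=(0, 0),
    phrase="the same",
    last_two_words=("the", "same"),
)
```

Storing only the last two English words is enough context for a trigram language model to score the next word, which is presumably why the slide lists exactly two.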

  16. Recombining hypotheses
     • The foreign words covered so far
     • The last two English words generated
     • The last word of the last foreign phrase covered
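The three items on slide 16 can be read as the key that decides whether two hypotheses are interchangeable for all future expansions. A minimal Python sketch under that reading follows. The helper names are hypothetical, and keeping only the better-scoring hypothesis is the usual recombination rule rather than something the slide states.

```python
from typing import Dict, FrozenSet, Iterable, Tuple

Key = Tuple[FrozenSet[int], Tuple[str, ...], int]


def recombination_key(covered: Iterable[int],
                      last_two_words: Iterable[str],
                      last_source_pos: int) -> Key:
    """Build the state key from the three items listed on slide 16."""
    return (frozenset(covered), tuple(last_two_words), last_source_pos)


def add_hypothesis(stack: Dict[Key, float], key: Key, score: float) -> bool:
    """Insert a hypothesis score, recombining with an equal state.

    Returns True if the hypothesis was kept (new state or better score).
    """
    if key not in stack or score > stack[key]:
        stack[key] = score
        return True
    return False


stack: Dict[Key, float] = {}
k1 = recombination_key({0, 1}, ("the", "same"), 1)
k2 = recombination_key({1, 0}, ["the", "same"], 1)  # same state, built differently
add_hypothesis(stack, k1, -5.0)
add_hypothesis(stack, k2, -4.0)  # better score replaces the earlier entry
```

A real decoder would store the hypothesis object and keep a pointer to the discarded one for n-best extraction; only the score is kept here to keep the sketch short.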

  17. Beam search
     • The fixed and relative threshold
     • The stack
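The two pruning criteria on slide 17 can be sketched in Python using the defaults from slide 5 (-s 100, -b 0.00001). The interpretation is an assumption: scores are treated as natural-log values, the relative threshold drops any hypothesis whose probability is below `threshold` times that of the best one, and the fixed limit caps the stack size. The function name is hypothetical.

```python
import math
from typing import List


def prune_stack(scores: List[float],
                max_size: int = 100,
                threshold: float = 0.00001) -> List[float]:
    """Apply relative-threshold pruning, then the fixed size limit.

    scores are log-domain estimates (score + future cost); higher is better.
    """
    if not scores:
        return []
    ranked = sorted(scores, reverse=True)
    # Relative threshold: stay within log(threshold) of the best estimate.
    cutoff = ranked[0] + math.log(threshold)
    kept = [s for s in ranked if s >= cutoff]
    # Fixed threshold: never keep more than max_size hypotheses.
    return kept[:max_size]


# -20.0 is more than log(0.00001) (about -11.5) below the best, so it is dropped.
by_threshold = prune_stack([-1.0, -2.0, -20.0])
# With a beam of size 2, the third-best hypothesis is dropped.
by_size = prune_stack([-3.0, -1.0, -2.0], max_size=2)
```

Applying the relative cut first keeps the size limit from being filled with hypotheses that are already far below the best one.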

  18. The Result of the Experiment(1)

  19. The Result of the Experiment(2)

  20. Thank you!