
Statistical Machine Translation




  1. Raghav Bashyal Statistical Machine Translation

  2. Statistical Machine Translation • Uses pre-translated text (corpora) • Compare translated text to original • Notice patterns, associate words
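One simple way to "notice patterns" in pre-translated text is to count how often a source word and a target word appear in the same sentence pair. The sketch below assumes a tiny, made-up Spanish–English corpus and a hypothetical helper `cooccurrence_counts`; it illustrates the idea only and is not the method used in the project.

```python
from collections import Counter

# Hypothetical toy parallel corpus (Spanish, English); not taken from the slides.
parallel = [
    ("la casa", "the house"),
    ("la flor", "the flower"),
    ("una casa", "a house"),
]

def cooccurrence_counts(pairs):
    """Count how often each source word shares a sentence pair with each target word."""
    counts = Counter()
    for src, tgt in pairs:
        # Use sets so a word repeated within one sentence is counted once per pair.
        for s in set(src.split()):
            for t in set(tgt.split()):
                counts[(s, t)] += 1
    return counts

counts = cooccurrence_counts(parallel)

# Words that are translations of each other tend to co-occur more often.
print(counts[("casa", "house")], counts[("casa", "the")])
```

Raw co-occurrence is only a starting signal: frequent words such as "the" co-occur with almost everything, which is why SMT goes on to estimate proper probabilities.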

  3. SMT Process • Knight – A Statistical MT Tutorial Workbook • Basic probabilities • P(word) • Conditional probabilities • P(word | word) • … • Pick the most probable translation
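The probabilities listed above can be estimated from plain counts. The following is a minimal sketch, assuming a made-up English corpus and reading P(word | word) as the probability of a word given the word before it; the names `p_word`, `p_next`, and `score` are hypothetical. It scores candidate sentences by multiplying these estimates and keeps the highest-scoring one.

```python
from collections import Counter

# Hypothetical toy English corpus, tokenized on whitespace.
corpus = "the house is small . the house is big . the flower is small .".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))
total = len(corpus)

def p_word(w):
    """P(word): relative frequency of w in the corpus."""
    return unigrams[w] / total

def p_next(w, prev):
    """P(w | prev): estimated as count(prev, w) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, w)] / unigrams[prev]

def score(sentence):
    """Probability of a sentence as P(first word) times each P(word | previous word)."""
    words = sentence.split()
    if not words:
        return 0.0
    p = p_word(words[0])
    for prev, w in zip(words, words[1:]):
        p *= p_next(w, prev)
    return p

# Pick the most probable of two candidate word orders.
candidates = ["the house is small", "house the small is"]
best = max(candidates, key=score)
print(best)
```

Unsmoothed counts assign probability zero to any unseen word pair, so a real system would need some form of smoothing; that is left out here to keep the sketch short.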

  4. SMT process http://isoft.postech.ac.kr/research/SMT/images/math.jpg
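The linked image is not reproduced in this transcript. Assuming it shows the standard noisy-channel formulation that Knight's workbook builds on, the translation rule can be written as follows, where $f$ is the source sentence (Spanish in this project) and $e$ ranges over candidate English sentences:

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \frac{P(e)\, P(f \mid e)}{P(f)}
        \;=\; \arg\max_{e} P(e)\, P(f \mid e)
```

The denominator $P(f)$ drops out because it does not depend on $e$. $P(e)$ is the language model (how plausible the English is) and $P(f \mid e)$ is the translation model (how well the English accounts for the Spanish).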

  5. Project • Translate basic text from Spanish to English • Test effectiveness • with/without hard-coded components (syntax) • Specific procedures/algorithms that add speed

  6. Literature • Guides on Statistical Machine Translation • Most research projects follow the procedure outlined by Knight • “State of the art” implementation • Google

  7. Literature • NLTK • Christina Wallin • UC Berkeley • Modifications • Larger corpora more useful • Syntax based • hard-code • Higher translation quality when used with SMT

  8. Procedure • NLTK – Natural Language Toolkit • Python • Built from natural language processing projects • Current procedure – read the SMT worksheet • Code along with worksheet

  9. Development • Create corpora • Tokenization • Clean string • Probability • P(word) in corpora
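The development steps above can be sketched with the Python standard library alone. This is an illustration under assumptions, not the project's code: the cleaning rule (lowercase, punctuation replaced by spaces) and the function names `clean`, `tokenize`, and `word_probabilities` are hypothetical choices.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase the text and replace punctuation with spaces.

    \\w is Unicode-aware for str in Python 3, so accented Spanish letters
    are kept. It also keeps digits and underscores.
    """
    return re.sub(r"[^\w\s]", " ", text.lower())

def tokenize(text):
    """Split cleaned text into word tokens on whitespace."""
    return clean(text).split()

def word_probabilities(tokens):
    """P(word): relative frequency of each token in the corpus."""
    total = len(tokens)
    return {w: c / total for w, c in Counter(tokens).items()}

# Hypothetical two-sentence corpus.
text = "La casa es grande. La flor es roja!"
tokens = tokenize(text)
probs = word_probabilities(tokens)
print(tokens)
print(probs["la"])
```

NLTK provides its own tokenizers, which could replace `tokenize` here; the plain-regex version is used only to keep the sketch dependency-free.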

  10. Expected Results • Translation will probably be very basic • SMT systems usually perform better on “sample” text than on “real” text • Highlighted errors • Program should use reference data to find some errors • Error frequency plots for certain words • Test the effectiveness of adjustments • Hard coding, other algorithms
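Per-word error counts against reference data could be collected along these lines. The sketch assumes a crude position-by-position comparison between each system output and its reference translation; the function `word_errors` and the example sentences are hypothetical, and a real evaluation would likely need an alignment-based comparison instead.

```python
from collections import Counter
from itertools import zip_longest

def word_errors(outputs, references):
    """Count, per reference word, how often the output differs at the same position."""
    errors = Counter()
    for out, ref in zip(outputs, references):
        # zip_longest pads the shorter sentence with None, so missing words count as errors.
        for o, r in zip_longest(out.split(), ref.split()):
            if r is not None and o != r:
                errors[r] += 1
    return errors

# Hypothetical system outputs and reference translations.
outputs = ["the house is big", "the flower are red"]
references = ["the house is large", "the flower is red"]

errors = word_errors(outputs, references)
print(errors.most_common())
```

The resulting counts are the kind of data an error frequency plot would display, and rerunning the count before and after an adjustment gives a rough measure of its effect.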
