Perform statistical machine translation on basic Spanish-to-English text and test its effectiveness with and without hard-coded components. Explore specific algorithms that improve speed and quality. Use NLTK and analyze error frequency to guide improvements.
Raghav Bashyal Statistical Machine Translation
Statistical Machine Translation • Uses pre-translated text (corpora) • Compares translated text to the original • Notices patterns, associates words
SMT Process • Knight – A Statistical MT Tutorial Workbook • Basic probabilities • P(word) • Conditional probabilities • P(word | word) • … • Pick the most probable translation
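As a rough illustration of "pick the most probable translation", the sketch below scores each English candidate by P(english) * P(spanish | english), the Bayes-rule form used in Knight's workbook. The toy corpus, the word-aligned pairs, and the function names (`unigram_probs`, `conditional_probs`, `best_translation`) are assumptions made for this example, not the project's code.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: word-aligned (spanish, english) pairs and some English text.
aligned_pairs = [
    ("casa", "house"), ("casa", "house"), ("casa", "home"),
    ("perro", "dog"), ("gato", "cat"),
]
english_text = "the house is a house and the dog is at home".split()


def unigram_probs(tokens):
    """P(word): relative frequency of each word in the token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def conditional_probs(pairs):
    """P(spanish | english), estimated from counts of aligned word pairs."""
    pair_counts = defaultdict(Counter)
    for spanish, english in pairs:
        pair_counts[english][spanish] += 1
    return {
        english: {es: c / sum(counts.values()) for es, c in counts.items()}
        for english, counts in pair_counts.items()
    }


def best_translation(spanish_word, p_e, p_f_given_e):
    """Return the English word maximizing P(e) * P(f | e), or None if all scores are zero."""
    scores = {
        english: p_e.get(english, 0.0) * table.get(spanish_word, 0.0)
        for english, table in p_f_given_e.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


p_e = unigram_probs(english_text)
p_f_given_e = conditional_probs(aligned_pairs)
print(best_translation("casa", p_e, p_f_given_e))
```

Here "house" beats "home" for "casa" only because it is more frequent in the English text. A word such as "gato", whose English candidate never appears in that text, gets no translation, which is one reason larger corpora help.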
SMT process http://isoft.postech.ac.kr/research/SMT/images/math.jpg
Project • Translate basic text from Spanish to English • Test effectiveness • with/without hard-coded components (syntax) • Specific procedures/algorithms that add speed
Literature • Guides on Statistical Machine Translation • Most research projects follow the same procedure as outlined by Knight • “state of the art” implementation • Google
Literature • NLTK • Christina Wallin • UC Berkeley • Modifications • Larger corpora more useful • Syntax based • hard-code • Higher translation quality when used with SMT
Procedure • NLTK – Natural Language ToolKit • Python • Made from Natural Language processing projects • Current procedure – read the SMT worksheet • Code along with worksheet
Development • Create corpora • Tokenization • Clean string • Probability • P(word) in corpora
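The development steps above (clean string, tokenization, P(word) in corpora) could look roughly like the following. This is a minimal sketch using only the standard library. The helper names `clean`, `tokenize`, and `word_probability` are hypothetical, and NLTK's `word_tokenize` could be substituted for the simple whitespace split.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)  # \w is Unicode-aware, so á, ñ, etc. are kept
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text):
    """Split a cleaned string into word tokens."""
    return clean(text).split()


def word_probability(word, tokens):
    """P(word): occurrences of the word divided by the total number of tokens."""
    if not tokens:
        return 0.0
    return Counter(tokens)[word.lower()] / len(tokens)


corpus = "¿Dónde está el gato? El gato está aquí."
tokens = tokenize(corpus)
print(tokens)
print(word_probability("gato", tokens))
```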
Expected Results • Probably will be a very basic translation • Usually performs better with “sample” text than “real” text • Highlighted errors • Program should use reference data to find some errors • Error frequency plots for certain words • Test the effectiveness of adjustments • Hard coding, other algorithms
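One possible way to gather the per-word error counts behind such plots is sketched below. It assumes a simple position-by-position comparison against a reference translation, which is a simplification chosen for this example. The sentence pairs and the function names are hypothetical.

```python
from collections import Counter


def word_errors(hypothesis, reference):
    """Count reference words that the output got wrong at the same position."""
    errors = Counter()
    hyp_words = hypothesis.split()
    for i, ref_word in enumerate(reference.split()):
        hyp_word = hyp_words[i] if i < len(hyp_words) else None
        if hyp_word != ref_word:
            errors[ref_word] += 1
    return errors


def error_frequencies(pairs):
    """Sum per-word error counts over (program output, reference) sentence pairs."""
    total = Counter()
    for hypothesis, reference in pairs:
        total.update(word_errors(hypothesis, reference))
    return total


# Hypothetical (program output, reference translation) pairs.
pairs = [
    ("the cat is here", "the cat is here"),
    ("the house big", "the big house"),
    ("a dog", "the dog"),
]
freqs = error_frequencies(pairs)
for word, count in freqs.most_common():
    print(word, count)
```

The resulting counts can be passed to any plotting tool. Rerunning the same pairs after a hard-coded rule or another adjustment is added gives a before-and-after comparison.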