Hebrew-to-English XFER MT Project - Update

Presentation Transcript

  1. Hebrew-to-English XFER MT Project - Update
     Alon Lavie
     March 17, 2004

  2. The Team
     • Alon Lavie
     • Shuly Wintner (Faculty at Haifa Univ.)
     • Yaniv Eytani (MS student at Haifa Univ.)
     • Erik Peterson and Kathrin Probst…
     Hebrew-to-English MT Update

  3. Main Tasks in Month-1
     • Hebrew Encoding Issues
     • Hebrew Language Resources:
        • H-to-E Translation Lexicon
        • Morphological Analyzer
     • Putting together a front-end to the XFER engine: morphology, format conversions
     • Elicitation for Hebrew (two versions of EC)
     • Strong Decoder for H-to-E
     • Installing system on local server in Haifa

  4. Hebrew Encoding Issues
     • Input texts are (mostly) in standard Windows encoding for Hebrew
     • Morphological analyzer and other resources are already set up to work in an "ASCII-like" representation
     • Converter script converts the input into the ASCII representation
     • All further processing is done in the ASCII representation
     • Lexicon and grammar rules are also in the ASCII representation
     • Elicitation is done in UTF-8 Hebrew; output is converted to the ASCII representation
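The conversion step on this slide can be illustrated with a minimal Python sketch. The slides do not give the actual mapping, so the table below is an assumption: a one-symbol-per-letter scheme chosen so that the spelling term quoted later in the deck comes out as "KTIB XSR". The function name `to_ascii_representation` and the default codec `cp1255` (Windows Hebrew) are likewise illustrative choices, not the project's converter script.

```python
# Assumed transliteration (not taken from the slides): the 27 Hebrew letters
# U+05D0..U+05EA in code-point order, each mapped to one ASCII symbol.
# Final letter forms share the symbol of their base letter.
ASCII_SYMBOLS = "ABGDHWZX@IKKLMMNNS&PPCCQR$T"
HEBREW_TO_ASCII = {chr(0x05D0 + i): sym for i, sym in enumerate(ASCII_SYMBOLS)}


def to_ascii_representation(raw: bytes, encoding: str = "cp1255") -> str:
    """Decode encoded Hebrew bytes and transliterate letter by letter.

    Characters outside the table (spaces, digits, punctuation) pass through.
    """
    text = raw.decode(encoding)
    return "".join(HEBREW_TO_ASCII.get(ch, ch) for ch in text)
```

For UTF-8 input, such as the elicitation output, the same function could be called with `encoding="utf-8"`.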

  5. Translation Lexicon
     • "Dahan" H-to-E and E-to-H dictionary available to us
     • Excel spreadsheet format
     • Coverage is not great but not bad
     • H-to-E is about 15K translation pairs
     • E-to-H is about 7K translation pairs
     • POS information on both sides
     • No proper names or named entities
     • Issue with spelling convention "KTIB XSR"

  6. Translation Lexicon
     • Yaniv wrote scripts that
        • Extract the relevant fields from the Excel file
        • Merge with added lexicons (i.e. names)
        • Sort and remove duplicate entries
        • Convert to the XFER lexicon format
     • Kathrin adapted a script that "enhances" the lexicon for English generation (plurals of nouns, tensed verb forms)
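The extract, merge, sort and de-duplicate steps listed on this slide can be sketched as follows. This is a hedged illustration, not the project's scripts: the column names (`hebrew`, `english`, `pos`), the function names, and the tab-separated output of `format_entry` are all hypothetical. In particular, the real XFER lexicon format is not described in the slides and is not reproduced here.

```python
def build_lexicon(dictionary_rows, extra_entries):
    """Merge dictionary rows with extra entries, dropping duplicates.

    dictionary_rows: iterable of dicts with hypothetical keys
                     "hebrew", "english", "pos" (e.g. rows exported
                     from the spreadsheet).
    extra_entries:   iterable of (pos, hebrew, english) tuples,
                     e.g. an added list of names.
    Returns a sorted list of unique (pos, hebrew, english) tuples.
    """
    entries = set()
    for row in dictionary_rows:
        hebrew = row["hebrew"].strip()
        english = row["english"].strip()
        pos = row["pos"].strip()
        if hebrew and english:  # skip rows with a missing side
            entries.add((pos, hebrew, english))
    entries.update(extra_entries)
    return sorted(entries)


def format_entry(pos, hebrew, english):
    """Placeholder output format (an assumption, NOT the XFER format)."""
    return f"{pos}\t{hebrew}\t{english}"
```

Using a set of tuples makes the de-duplication exact: two entries are merged only if part of speech, Hebrew side and English side all match.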

  7. Morphological Analyzer
     • Morphology is a big deal for Hebrew
     • Not just inflections and derivations, but also
        • Different words due to omission of vowels from the script
        • Attached prefixes for conj, det, prepositions, and some attached possessive suffixes
     • Analyzer program from MS student at Technion already available, works on Windows and with minimal adaptation on Linux
     • Coverage is reasonable…
     • Produces all analyses or a disambiguated analysis for each word
     • Entire sentence passed as input to morpher (not word-by-word)

  8. Morphology Work Completed
     • Split attached prefixes and suffixes into separate words for translation
     • Produce f-structures as output
     • Convert feature-value codes to our conventions
     • Install morpher as a server running on our Linux machines
     • Yaniv wrote java scripts to handle input-output from the morpher
     • Erik integrated a wrapper for running morpher as a server on our Linux machines
     • Currently works in single-analysis-per-word mode; almost ready to test the all-analyses mode
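To make the prefix-splitting step concrete, here is a toy Python sketch that enumerates every way to peel single-letter prefixes off a transliterated token until a known stem remains. It is not the Technion analyzer: the prefix table, the two-word stem lexicon, and the function name `analyses` are assumptions for illustration, and the sketch ignores suffixes, f-structures, and any constraints on prefix order.

```python
# Toy resources in an assumed ASCII transliteration (illustrative only).
PREFIXES = {
    "W": "conj",  # "and"
    "H": "det",   # "the"
    "B": "prep",  # "in"
    "L": "prep",  # "to"
    "M": "prep",  # "from"
}
STEMS = {"BIT": "N", "ILD": "N"}  # "house", "boy"


def analyses(token, lexicon=STEMS):
    """Return all splits of token into zero or more prefixes plus a known stem."""
    results = []

    def expand(rest, prefixes_so_far):
        if rest in lexicon:
            results.append(prefixes_so_far + [rest])
        # Try peeling one more prefix letter, leaving a non-empty remainder.
        if len(rest) > 1 and rest[0] in PREFIXES:
            expand(rest[1:], prefixes_so_far + [rest[0]])

    expand(token, [])
    return results
```

A token can receive more than one analysis whenever both the full string and a shorter remainder are in the lexicon, which is the situation the all-analyses mode has to handle.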

  9. Elicitation for Hebrew
     • Erik made sure Elicitation Tool works for Hebrew
     • Two reduced versions of full EC: Alon version, and Kathrin version
     • Shuly and Yaniv translated and aligned substantial portion of both
     • Kathrin trained an initial learned grammar

  10. Pending Issues
     • Strong Decoder for H-to-E:
        • Kathrin and Alon adapting script for running Stephan’s decoder.
        • No real amounts of parallel text, so no translation model scores for the edges…
        • It seems to be working, but questions about English LM and parameter settings
        • Need to consult with Stephan…
     • Installing system locally in Haifa
        • Erik is working with Yaniv and Alon to get it working there…

  11. Demo
     • Small input file of 10 sentences
     • Alon wrote very small manual grammar
     • Lexicon needs a lot of cleanup…
     • Morphology errors (only single analysis)
     • Translation output directly from XFER engine (no strong decoding)

  12. Plans for Month-2
     • Expanding and cleaning up translation lexicon
     • All morphological analyses: adapting to a lattice input!
     • Translation with Strong Decoder
     • More extensive manual grammar development for a solid comparison
     • Elicitation: finish translating and aligning
     • Testing and Evaluation
        • Collect some dev and eval sets with parallel translations
        • Evaluation with METEOR and BLEU
        • Frequent testing and fixing cycles
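As a small illustration of what automatic evaluation against parallel translations involves, the sketch below computes clipped n-gram precision, the core quantity inside BLEU. It is only one component: full BLEU also combines several n-gram orders and applies a brevity penalty, and METEOR works differently. The function names and the whitespace tokenization are assumptions for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(hypothesis, reference, n):
    """Fraction of hypothesis n-grams that also occur in the reference.

    Each n-gram is credited at most as many times as it appears in the
    reference, so repeating a correct word does not inflate the score.
    """
    hyp = ngrams(hypothesis.split(), n)
    ref = ngrams(reference.split(), n)
    total = sum(hyp.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in hyp.items())
    return matched / total
```

For instance, scoring the hypothesis "the the the cat" against the reference "the cat sat" credits "the" only once, so the unigram precision is 2 out of 4 rather than 4 out of 4.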