
Machine Translation MT – Research Landscape



Presentation Transcript


  1. Machine Translation (MT) – Research Landscape
     Stephan Vogel, Spring Semester 2011

  2. Overview
     • Some influential projects
     • Open source toolkits
     • Conferences
     • MT evaluations
     • Literature and general resources
     • Disclaimer: this is all incomplete, subjective, biased!
     11-711 Machine Translation

  3. MT Projects
     • Verbmobil
       • Large speech translation project in Germany
       • Different translation paradigms
       • Success story for SMT
     • TIDES
       • DARPA funded US MT project
       • SMT widely used, small and large data track evaluations
       • Chinese-English and Arabic-English
     • GALE
       • DARPA funded
       • Follow-up to TIDES
     • TransTac
       • DARPA funded
       • Speech-to-Speech Translation
       • Targeted towards force protection

  4. MT Projects
     • TC-Star
       • European project with partners from different universities
       • Technology and Corpora for Speech-to-Speech Translation
       • http://tcstar.org/
     • EuroMatrix
       • 2006-2009, EuroMatrixPlus 2009-2012
       • Translate all European languages
       • Spin-offs: WMT evaluations, MT Marathon
       • euromatrix.net
     • Quaero
       • French-German project
       • A kind of follow-up to TC-Star
       • http://www.quaero.org/modules/movie/scenes/home/index.php?FUSEBOX_LANG=2

  5. Open Source Toolkits: Word Alignment
     • Game changer
       • Lower barrier to enter the field
       • Transparency
     • Word Alignment
       • GIZA++
         • Started out at JHU workshop, subsequently extended by Franz Josef Och (at RWTH and ISI)
         • Most widely used alignment toolkit
       • mGIZA++
         • Multi-threaded/multi-core extension of GIZA++
         • By Qin Gao: http://geek.kyloo.net/software/doku.php/mgiza:overview
       • Berkeley Aligner
         • Word alignment via quadratic assignment
         • http://code.google.com/p/berkeleyaligner/
       • PostCAT (Posterior Constrained Alignment Toolkit)
         • http://www.seas.upenn.edu/~strctlrn/CAT/CAT.html

  6. Open Source Toolkits: Word Alignment (cont.)
     • Word Alignment tools
       • Alignment Set
         • Set of tools to manipulate and display alignments
         • From TALP research group
         • http://www.talp.upc.edu/talp/index.php/en/resources/tools/alingment-set

  7. Open Source Toolkits: Decoders
     • Decoders
       • Moses (Edinburgh): phrase-based and recently also hierarchical
       • Joshua (JHU): hiero reimplementation
         • sourceforge.net/projects/joshua
       • Jane (RWTH Aachen): hierarchical
         • http://www-i6.informatik.rwth-aachen.de/web/Software/index.html
       • cdec (UMD -> CMU): hierarchical and phrase-based
       • Marie (TALP): n-gram-based (a kind of phrase-based)
         • www.talp.upc.edu/talp/index.php/en/resources/tools/marie
       • Apertium (University of Alicante): rule-based
       • Phrasal (Stanford): phrase-based
         • http://www-nlp.stanford.edu/wiki/Software/Phrasal

  8. Open Source Toolkits: LMs
     • SRILM
       • Most widely known and used LM toolkit
     • SALM
       • Written by Joy Ying Zhang (while at LTI)
       • http://projectile.sv.cmu.edu/research/public/tools/salm/salm.htm
     • IRSTLM
       • http://sourceforge.net/projects/irstlm/
     • KenLM
       • Smaller footprint than SRILM
       • Written by Kenneth Heafield (LTI PhD student)
       • http://kheafield.com/code/kenlm/

  9. Conferences
     • General CL conferences
       • ACL
       • HLT
       • EMNLP
       • Coling
       • IJCNLP
         • Int. Joint Conf. on NLP
       • LREC
         • Language Resources and Evaluation
       • RANLP
         • Recent Advances in NLP
       • SALTMIL
         • Speech and Language Technology for Minority Languages
     • Specific MT conferences
       • MT Summit (every 2 years)
       • AMTA (US)
       • EAMT (Europe)
       • TMI
       • Translating and the Computer (organized by Aslib)
       • IWSLT (organized by C-Star consortium)
       • …
     • MT Workshops
       • WMT
         • Workshop on Machine Translation
       • SSST
         • Syntax, Semantics, and Structure in SMT
       • …

  10. Evaluations
      • It all started with TIDES
        • Comparative evaluations
        • Defined training and test data
        • Automatic evaluation metrics (NIST mteval, BLEU)
        • Organized by NIST
      • NIST Open MT Evaluations
        • Continuation and expansion of TIDES MT evaluations
        • Chinese-English, Arabic-English, Urdu-English
        • Restricted and unrestricted track
        • Originally every year, now moving to a two-year cycle
        • http://www.itl.nist.gov/iad/mig/tests/mt/2009/
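To give a feel for what an automatic metric such as BLEU computes, here is a minimal Python sketch. It is an illustration under stated assumptions, not the NIST mteval implementation: it scores a single segment, uses whitespace tokenization, uniform n-gram weights and no smoothing, and the function names `ngrams` and `bleu` are made up for this example. Official evaluations compute BLEU at the corpus level with their own tokenization.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, references, max_n=4):
    """Simplified single-segment BLEU (illustrative; not the NIST mteval script)."""
    hyp = hypothesis.split()
    refs = [r.split() for r in references]
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hyp, n)
        # Clip each hypothesis n-gram count by its maximum count in any reference.
        max_ref = Counter()
        for ref in refs:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        # Without smoothing, any zero n-gram precision makes the score zero.
        if clipped == 0 or total == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the hypothesis length.
    ref_len = min((abs(len(r) - len(hyp)), len(r)) for r in refs)[1]
    bp = 1.0 if len(hyp) > ref_len else math.exp(1 - ref_len / len(hyp))
    # Geometric mean of the modified n-gram precisions, scaled by the penalty.
    return bp * math.exp(sum(log_precisions) / max_n)
```

Calling `bleu("the cat sat on the mat", ["the cat sat on the mat"])` yields a perfect score, while a hypothesis that shares no 4-gram with any reference scores zero in this unsmoothed variant, which is one reason real evaluations aggregate counts over a whole test set.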

  11. Evaluations (cont.)
      • WMT Evaluations
        • Organized in connection with EuroMatrix
        • Based on Europarl corpora
        • Many languages
        • Automatic and manual evaluation
        • http://www.statmt.org/wmt11/translation-task.html
      • IWSLT Evaluations
        • Spoken language
        • Languages vary: Chinese, Japanese, Arabic, Italian, …
        • Speech 1-best and lattices provided
        • Based on the (small) BTEC corpus (Basic Travel Expression Corpus)
        • Last time also included lecture translation
        • http://iwslt2010.fbk.eu/node/15

  12. Evaluations (cont.)
      • Specific projects have evaluations
      • GALE
        • Arabic-English and Chinese-English
        • Broadcast news and broadcast conversations, newswire and blogs
        • Human evaluation (HTER)
        • Go/No-Go
      • Quaero
        • European languages, also Arabic-French
        • This year the WMT evaluation was used as the Quaero evaluation
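HTER measures how much a human has to edit an MT output to make it correct, relative to the length of the post-edited result. The Python sketch below is a rough approximation only: it counts word-level insertions, deletions and substitutions and omits the block shifts that real TER/HTER also allows, so it can overestimate the true score. The function names `word_edit_distance` and `approx_hter` are hypothetical, and whitespace tokenization is assumed.

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance over tokens (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete a hypothesis word
                            curr[j - 1] + 1,      # insert a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def approx_hter(mt_output, post_edited):
    """Approximate HTER: edits needed to reach the post-edit, per post-edit word.

    Assumption: no shift operations, unlike the actual TER algorithm.
    """
    hyp = mt_output.split()
    ref = post_edited.split()
    if not ref:
        raise ValueError("post-edited reference must not be empty")
    return word_edit_distance(hyp, ref) / len(ref)
```

Lower is better: an output that needs no post-editing scores 0, and the score can exceed 1 when the output needs more edits than the post-edited text has words.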

  13. Journals
      • Machine Translation
        • Springer Science, formerly Kluwer Academic Publishers, vol. 4–, 1989–
        • Articles available online from Springer (abstracts free, full texts on payment of a fee)
        • Chief editor: Andy Way
        • http://www.springer.com/computer/ai/journal/10590
      • Computational Linguistics
        • MIT Press
        • Now open access
        • http://www.mitpressjournals.org/loi/coli
      • ACM TSLP
        • Online publication
        • Started in 2005
        • http://tslp.acm.org/

  14. Journals (cont.)
      • IEEE Transactions on Audio, Speech, and Language Processing
        • http://www.signalprocessingsociety.org/publications/periodicals/taslp/
      • The Prague Bulletin of Mathematical Linguistics
        • Has papers from recent MT Marathons, especially descriptions of open source packages
        • http://ufal.mff.cuni.cz/pbml.html

  15. Literature
      • MT-Archive: http://www.mt-archive.info/
        • Compiled by John Hutchins for the EAMT
        • One-stop shop!
        • Also links to books, journals, conferences
        • Papers listed by author, language, organization
      • ACL Anthology: http://www.aclweb.org/anthology/
