
TectoMT

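TectoMT is a software framework developed at UFAL since 2005. It allows experimentation with machine translation (MT) based on deep-syntactic (tectogrammatical) transfer and integrates various NLP software components so that they can be tested within real-life applications. It emphasizes modularity: the MT task is implemented as a sequence of reusable modules called blocks. In 2008, around 100 new blocks were added, new applications were created, and large Czech and parallel Czech-English data sets were processed. Plans for 2009 include introducing TectoMT to a larger audience, experimenting with more sophisticated tools in the tecto-transfer phase, making it easier to add new languages, and performance tuning.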


Presentation Transcript


  1. TectoMT
     • two goals of TectoMT:
       • to allow experimenting with MT based on deep-syntactic (tectogrammatical) transfer
       • to create a software framework into which various NLP software components could be integrated and tested within real-life applications (such as MT)
     • developed at UFAL since 2005
     • around 10 programmers using (and contributing to) TectoMT in 2008

  2. Reminder 1: MT pyramid in terms of PDT layers
     • Key question in MT: optimal level of abstraction?
     • Our answer: somewhere around tectogrammatics
       • high generalization over different language characteristics, but still computationally (and mentally!) tractable

  3. Reminder 2: MT pyramid in TectoMT
     [Figure: MT triangle between source language and target language, with levels from bottom to top: raw text, morphological, surface-syntactic, tectogrammatical, interlingua]
     • modularity is emphasized in TectoMT: the MT task is implemented as a sequence of reusable NLP modules (called blocks)
     • around 80 blocks in the current version of English-Czech translation
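The idea of a translation task built as a sequence of reusable blocks can be illustrated with a minimal sketch. This is not TectoMT's actual API: the `Document` class, the two toy blocks, and `run_scenario` are hypothetical names chosen for illustration, and the "lemmatizer" is only a lowercasing placeholder.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Toy stand-in for the data structure passed from block to block."""
    text: str
    tokens: List[str] = field(default_factory=list)
    lemmas: List[str] = field(default_factory=list)


# Assumption: a block is any callable that takes a document and returns it
# after adding or modifying some annotation.
Block = Callable[[Document], Document]


def tokenize(doc: Document) -> Document:
    """Hypothetical block: split the raw text on whitespace."""
    doc.tokens = doc.text.split()
    return doc


def lowercase_lemmatize(doc: Document) -> Document:
    """Hypothetical block: placeholder 'lemmatizer' that only lowercases."""
    doc.lemmas = [token.lower() for token in doc.tokens]
    return doc


def run_scenario(blocks: List[Block], doc: Document) -> Document:
    """Apply the blocks in order; each block sees the output of the previous one."""
    for block in blocks:
        doc = block(doc)
    return doc


scenario = [tokenize, lowercase_lemmatize]
result = run_scenario(scenario, Document("The cat Sleeps"))
print(result.lemmas)  # ['the', 'cat', 'sleeps']
```

Because every block shares the same interface, an alternative implementation of one step (for example, a different tagger or parser) could be swapped in by changing a single entry in the list.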

  4. What is new in TectoMT in 2008?
     • new blocks added
     • new applications created
     • large data processed and used

  5. New blocks in TectoMT in 2008
     • around 100 new blocks in 2008
     • two types of extensions:
       • adding alternative (usually higher-performance) solutions to already implemented blocks, e.g.
         • McDonald's parser (Collins' parser and constituency-to-dependency conversion integrated already in 2005)
         • MORCE tagger (previously integrated taggers: TnT, MxPost, Jan Hajič's tagger, Lingua::EN::Tagger, Schmid's Tree Tagger)
       • blocks for new tasks:
         • relatively isolated tasks such as named entity recognition in Czech and English
         • sequence of blocks for English sentence synthesis

  6. New applications of TectoMT in 2008
     • existing:
       • real-time tecto-analysis of Czech sentences integrated in the tree editor TrEd
       • English sentence generator (within the Companions project)
       • sentence analysis for various purposes (intonation in TTS, information extraction)
       • segmentation of text into finite verb clauses
       • preprocessing of English text for the purpose of English-to-Hindi translation
     • pilot version in the very near future:
       • simple man-machine dialog manager
       • Czech-to-English MT

  7. Processing of large data in TectoMT
     • roughly 1 GW of Czech texts
       • analyzed up to simplified tecto
       • for the purposes of modeling Czech sentences or their trees (functions as the target-side language model in our translation scenario)
     • roughly 60 MW of parallel Czech-English texts from the CzEng corpus
       • analyzed up to simplified tecto and aligned
       • serves for generating several types of translation models
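The slides do not say how the translation models are derived from the aligned data. One simple possibility, shown here purely as an assumption, is a relative-frequency estimate of P(target lemma | source lemma) computed from aligned node pairs. The function name and the toy pairs below are hypothetical.

```python
from collections import Counter, defaultdict


def estimate_translation_probs(aligned_pairs):
    """Relative-frequency estimate of P(target | source).

    aligned_pairs is a list of (source_lemma, target_lemma) tuples,
    assumed here to come from aligned nodes of parallel trees.
    """
    pair_counts = Counter(aligned_pairs)
    source_counts = Counter(source for source, _ in aligned_pairs)
    probs = defaultdict(dict)
    for (source, target), count in pair_counts.items():
        probs[source][target] = count / source_counts[source]
    return dict(probs)


# Toy aligned English-Czech lemma pairs (invented for illustration).
pairs = [("dog", "pes"), ("dog", "pes"), ("dog", "psa"), ("house", "dům")]
model = estimate_translation_probs(pairs)
print(model["house"])  # {'dům': 1.0}
```

In this toy data, "dog" is aligned with "pes" twice and with "psa" once, so the estimate for "pes" given "dog" is two thirds.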

  8. Plans for 2009
     • introduce TectoMT to a larger audience (MT Marathon 2009)
     • experiment with more sophisticated tools during the tecto-transfer phase (log-linear combinations of translation and target-language tree models, tree HMM)
     • facilitate addition of new languages to be processed in TectoMT
     • performance tuning (now: roughly 1 translated sentence per second)
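A log-linear combination of a translation model and a target-language tree model can be sketched as a weighted sum of log-probabilities. The feature names, weights, and probabilities below are invented for illustration and are not taken from TectoMT.

```python
import math


def loglinear_score(feature_logprobs, weights):
    """Weighted sum of log-probabilities, one term per model."""
    return sum(weights[name] * value for name, value in feature_logprobs.items())


# Hypothetical candidates for one target node, each scored by two models.
candidates = {
    "pes": {"translation": math.log(0.6), "target_tree": math.log(0.2)},
    "psa": {"translation": math.log(0.3), "target_tree": math.log(0.7)},
}

# Assumed weights; in practice they would be tuned on held-out data.
weights = {"translation": 1.0, "target_tree": 1.0}

best = max(candidates, key=lambda c: loglinear_score(candidates[c], weights))
print(best)  # psa
```

With equal weights the target-side model outvotes the translation model here (0.3 × 0.7 > 0.6 × 0.2). Setting the `target_tree` weight to 0 would select "pes" instead, which shows how the weights trade off the two models.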
