
Current Trends in MT

  • Andy Way

  • NCLT, School of Computing,

  • Dublin City University,

  • Dublin 9, Ireland

  • away@computing.dcu.ie

  • www.nclt.dcu.ie/mt/


Overview of Talk

  • Current Trends

    • From EACL-06 to ACL-07

      • Topics

      • Country of Origin

  • Ongoing and Future Work at DCU

  • Other Important Research

  • Future General Directions

    • Increased convergence within MT

    • Increased convergence between MT and rest of NLP

  • Concluding Remarks

NCLT, Dublin, April 2007


Current Trends

EACL-06 MT Track featured 24 papers in a number of areas:


Current Trends: Country of Origin

  • Of the 24 MT papers:

  • 18 (75%) were from Europe

    • 6 from UK

    • 6 from Spain

    • 3 from Germany

    • 1 each from Romania, Italy & Ireland

  • 6 (25%) were from N. America (5 from USA)

  • 0 were from Asia


    Current Trends: Success Rates (by Country)

    • Of the 24 MT papers, 7 (29%) were accepted (general EACL acceptance rate 19.7%: 52/264)

      • 2 from USA (out of 5)

      • 2 from Germany (out of 3)

      • 1 from UK (out of 6)

      • 1 from Romania (out of 1)

      • 1 from Canada (out of 1)
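
The acceptance figures on this slide can be rechecked with a few lines of Python. This is only a minimal sketch: the counts are copied from the slide, while the function name `acceptance_rate` and the dictionary layout are illustrative assumptions, not part of the talk.

```python
# Counts copied from the slide: (accepted, submitted) per country.
# The slide lists only countries with at least one accepted paper.
EACL06_MT = {
    "USA": (2, 5),
    "Germany": (2, 3),
    "UK": (1, 6),
    "Romania": (1, 1),
    "Canada": (1, 1),
}


def acceptance_rate(accepted, submitted):
    """Return the acceptance rate as a percentage."""
    if submitted <= 0:
        raise ValueError("submitted must be positive")
    return 100.0 * accepted / submitted


# MT track: 7 of 24 papers accepted (figure from the slide).
mt_rate = acceptance_rate(7, 24)
# EACL-06 as a whole: 52 of 264 papers accepted (figure from the slide).
overall_rate = acceptance_rate(52, 264)

if __name__ == "__main__":
    print(f"MT track: {mt_rate:.1f}%, overall: {overall_rate:.1f}%")
    for country, (acc, sub) in EACL06_MT.items():
        print(f"{country}: {acc}/{sub} = {acceptance_rate(acc, sub):.0f}%")
```

With so few papers per country, the per-country rates are indicative at best; the sketch is meant only to make the arithmetic behind the percentages explicit.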


    Current Trends: Success Rates (by Topic)

    • Of the 7 accepted MT papers

      • 2 were on SMT (out of 8)

      • 2 were on word alignment (out of 4)

      • 2 were on evaluation (out of 5)

      • 1 was on hybrid MT (out of 1)

    Current Trends

    ACL-07 MT Track features 67 papers in a number of areas:

    Current Trends

    ACL-07 SMT Track features 29 papers in a number of areas:

    Current Trends: Summary of Themes

    • Of the 67 MT papers:

      • 54 (80%) involve corpus-based MT

      • 9 (13%) involve evaluation

      • 3 (4%) involve RBMT

    Current Trends: Country of Origin

    • Of the 67 MT papers:

    • 32 (48%) are from Asia

    • 19 (28%) are from N. America (18 from USA)

    • 16 (24%) are from Europe
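
The regional shares above can be set against the EACL-06 figures given earlier in the talk (18 papers from Europe, 6 from N. America, 0 from Asia, out of 24). The following is a rough sketch of that comparison, assuming the counts on the two slides are directly comparable; the names `shares` and `share_change` are hypothetical helpers introduced for illustration.

```python
# MT paper counts by region, copied from the two "Country of Origin" slides.
EACL06 = {"Europe": 18, "N. America": 6, "Asia": 0}    # 24 MT papers
ACL07 = {"Europe": 16, "N. America": 19, "Asia": 32}   # 67 MT papers


def shares(counts):
    """Convert raw counts into percentage shares of the total."""
    total = sum(counts.values())
    return {region: 100.0 * n / total for region, n in counts.items()}


def share_change(before, after):
    """Change in share for each region, in percentage points."""
    b, a = shares(before), shares(after)
    return {region: a[region] - b[region] for region in b}


change = share_change(EACL06, ACL07)

if __name__ == "__main__":
    for region, delta in change.items():
        print(f"{region}: {delta:+.1f} percentage points")
```

Because EACL and ACL are different conferences, the change in shares is suggestive rather than a like-for-like trend.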

    Current Trends: Country of Origin

    Of the 32 papers from Asia:

    Current Trends: Country of Origin

    Of the 16 papers from Europe:

    Change 06—07 (by Topic)

    Change 06—07 (by Country)

    Current Trends: Success Rates (by Country)

    • Of the 67 MT papers, 17 were accepted (25.4%; overall acceptance rate 22.4%), from the following countries:

      • USA: 8 (out of 18)

      • China: 3 (out of 20)

      • Ireland: 2 (out of 3)

      • UK: 2 (out of 2)

      • Canada: 1 (out of 1)

      • Singapore: 1 (out of 1)

    Current Trends: Success Rates (by Topic)

    • Of the 17 successful MT papers:

      • 3 were on language modelling/decoding

      • 2 were on evaluation

      • 2 were on word alignment

      • 2 were on reordering

      • 1 was on word-sense disambiguation

      • 1 was on treestring models

      • 1 was on SMT via pivot languages

      • 1 was on multi-parallel corpora

      • 1 was on hybrid MT

      • 1 was on transductive learning

    Consequences of these Trends

    • The ‘system’ is at breaking point

      • Do we need a pre-selection phase?

    • As in many other areas, a ‘new world order’ is emerging

      • There is very little internal QA as yet

      • Standard of English and basic structure is lacking

      • But … they’re doing OK already, and they’ll improve!

    • Relatively few ‘world centres’ in MT at present

    • Despite massive increase in MT use, big decrease in teaching of MT – paradox!

    Ongoing Work in DCU

    • Integrating Syntax into SMT

      • Supertag translation and target language models

      • Adding source language information

      • Tree-to-Tree Translation (DOT, LFG-DOT; also tree-to-string models), inc. porting monolingual parsing techniques to the bilingual case

    • Applications

      • Automatic Translation of DVD subtitles

      • Sign-Language MT

      • Large-Scale Open Evaluation (inc. parallel computation)

    • New Language Pairs, Corpora etc.

    System Development

    Ongoing Work in DCU (cont’d)

    • Dependency- (and Semantically) Marked-Up Corpora

    • New models of Word Alignment

    • New integrated models of subtree/substring alignment

    • New dependency-based Evaluation metrics

    • New Decoders

      • EBMT

      • Memory-Based

    • Open-Source Components

    Ongoing Work in DCU (cont’d)

    Collaborative work:

    • Tilburg (Memory-based Decoding)

    • Donostia (Basque MT)

    • Aachen (Sign-Language MT)

    • Amsterdam (Integrating Syntax & SMT)

    • St Andrews (DOT)

    • Edinburgh (SMT)

    • CMU (Hybrid SMT—EBMT)

    Future Work in DCU

    • Spoken Language Translation

    Future Work in DCU

    • MT via SMS

    • Automatic Interpreting

    • Enhanced hybrid models

    • Scalability

    • Tuning MT to text type & genre

    • MT using Pivot languages (‘triangulation’)

    • Better quality phrases (cf. CoNLL monolingual chunking shared task)

    Future General Directions

    • Corpus Building (integrating syntax, semantics … discourse …)

      • cf. data size vs. data quality …

      • Filtering/pruning training data (‘safe’ alignments)

    • Word Alignment

    • Language Modelling

    • Decoding

    • Evaluation Methods

    • Large-scale Open Evaluations

    • Further Convergence between models

    Dekai Wu’s 3D MT Space

    Convergence between MT and Rest of NLP

    • For some time now, not many MT researchers have been doing syntax, and vice versa.

    • With move (back) to trees instead of strings:

      • Reconnect with wealth of tree automata literature

      • Get lots of implemented algorithms for free!

    Concluding Remarks

    So … there’s plenty for us still to do!

    Two worries:

    • MT R&D seems to be at an all-time high, yet we’re not teaching MT any more.

    • Most (S)MT people come from different backgrounds, but huge danger that some people are merely reinventing the wheel …

    Thanks!

    The end

    beginning!
