Current Trends in MT

  • Andy Way
  • NCLT, School of Computing,
  • Dublin City University,
  • Dublin 9, Ireland
  • www.nclt.dcu.ie/mt/
Overview of Talk
  • Current Trends
    • From EACL-06 to ACL-07
      • Topics
      • Country of Origin
  • Ongoing and Future Work at DCU
  • Other Important Research
  • Future General Directions
    • Increased convergence within MT
    • Increased convergence between MT and rest of NLP
  • Concluding Remarks

NCLT, Dublin, April 2007

Current Trends

EACL-06 MT Track featured 24 papers in a number of areas:

Current Trends: Country of Origin

  • Of the 24 MT papers:
      • 18 (75%) were from Europe
          • 6 from UK
          • 6 from Spain
          • 3 from Germany
          • 1 each from Romania, Italy & Ireland
      • 6 (25%) were from N. America (5 from USA)
      • 0 were from Asia

Current Trends: Success Rates (by Country)

  • Of the 24 MT papers, 7 (29%) were accepted (general EACL acceptance rate 19.7%: 52/264)
        • 2 from USA (out of 5)
        • 2 from Germany (out of 3)
        • 1 from UK (out of 6)
        • 1 from Romania (out of 1)
        • 1 from Canada (out of 1)

Current Trends: Success Rates (by Topic)

  • Of the 7 accepted MT papers
        • 2 were on SMT (out of 8)
        • 2 were on word alignment (out of 4)
        • 2 were on evaluation (out of 5)
        • 1 was on hybrid MT (out of 1)

Current Trends

ACL-07 MT Track features 67 papers in a number of areas:

Current Trends

ACL-07 SMT Track features 29 papers in a number of areas:

Current Trends: Summary of Themes
  • Of the 67 MT papers:
      • 54 (81%) involve corpus-based MT
      • 9 (13%) involve evaluation
      • 3 (4%) involve RBMT

Current Trends: Country of Origin
  • Of the 67 MT papers:
      • 32 (48%) are from Asia
      • 19 (28%) are from N. America (18 from USA)
      • 16 (24%) are from Europe
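As a quick arithmetic check on the regional shift, the sketch below recomputes each region's share from the raw paper counts on the EACL-06 and ACL-07 slides. The function name `percentage_shares` is a hypothetical helper, and shares are rounded to the nearest whole percent (so in general they need not sum to exactly 100).

```python
def percentage_shares(counts: dict[str, int]) -> dict[str, int]:
    """Return each region's share of the total, rounded to the nearest whole percent."""
    total = sum(counts.values())
    return {region: round(100 * n / total) for region, n in counts.items()}

# Paper counts per region, as given on the EACL-06 and ACL-07 slides.
eacl06 = {"Europe": 18, "N. America": 6, "Asia": 0}
acl07 = {"Europe": 16, "N. America": 19, "Asia": 32}

for venue, counts in (("EACL-06", eacl06), ("ACL-07", acl07)):
    # Prints the venue, the total number of MT papers and the rounded shares.
    print(venue, sum(counts.values()), percentage_shares(counts))
# EACL-06 24 {'Europe': 75, 'N. America': 25, 'Asia': 0}
# ACL-07 67 {'Europe': 24, 'N. America': 28, 'Asia': 48}
```

The same counts make the headline change explicit: Asia goes from 0% of the MT papers to nearly half of them, while Europe's share falls from three quarters to under a quarter.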

Current Trends: Country of Origin

Of the 32 papers from Asia:

Current Trends: Country of Origin

Of the 16 papers from Europe:

Change 06—07 (by Topic)

Change 06—07 (by Country)

Current Trends: Success Rates (by Country)
  • Of the 67 MT papers, 17 were accepted (25.4%; overall acceptance rate 22.4%) from the following countries:
      • USA: 8 (out of 18)
      • China: 3 (out of 20)
      • Ireland: 2 (out of 3)
      • UK: 2 (out of 2)
      • Canada: 1 (out of 1)
      • Singapore: 1 (out of 1)
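A minimal sketch, using only the counts on this slide, that recomputes each country's acceptance rate and the MT track's overall rate (17/67) and compares it with the 22.4% conference-wide figure. The names `acl07_mt` and `acceptance_rate` are illustrative, not from the talk; only countries with at least one accepted paper are listed, as on the slide.

```python
# (accepted, submitted) ACL-07 MT papers per country, as listed on the slide.
acl07_mt = {
    "USA": (8, 18),
    "China": (3, 20),
    "Ireland": (2, 3),
    "UK": (2, 2),
    "Canada": (1, 1),
    "Singapore": (1, 1),
}
MT_SUBMISSIONS = 67   # all ACL-07 MT papers
OVERALL_RATE = 22.4   # conference-wide acceptance rate (%) quoted on the slide

def acceptance_rate(accepted: int, submitted: int) -> float:
    """Return the acceptance rate as a percentage."""
    return 100.0 * accepted / submitted

total_accepted = sum(acc for acc, _ in acl07_mt.values())
mt_rate = acceptance_rate(total_accepted, MT_SUBMISSIONS)

# List countries from highest to lowest acceptance rate.
ranked = sorted(acl07_mt.items(), key=lambda kv: acceptance_rate(*kv[1]), reverse=True)
for country, (acc, sub) in ranked:
    print(f"{country:<10} {acc:>2}/{sub:<2} {acceptance_rate(acc, sub):5.1f}%")

# Compare the MT track with the conference as a whole (difference in percentage points).
print(f"MT track: {total_accepted}/{MT_SUBMISSIONS} = {mt_rate:.1f}% "
      f"(overall {OVERALL_RATE}%, {mt_rate - OVERALL_RATE:+.1f} points)")
```

Ranking by rate rather than raw count brings out the contrast the slide implies: the USA converts 8 of 18 submissions (about 44%), China 3 of 20 (15%).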

Current Trends: Success Rates (by Topic)
  • Of the 17 successful MT papers:
      • 3 were on language modelling/decoding
      • 2 were on evaluation
      • 2 were on word alignment
      • 2 were on reordering
      • 1 was on word-sense disambiguation
      • 1 was on treestring models
      • 1 was on SMT via pivot languages
      • 1 was on multi-parallel corpora
      • 1 was on hybrid MT
      • 1 was on transductive learning

Consequences of these Trends
  • The ‘system’ is at breaking point
    • Do we need a pre-selection phase?
  • As in many other areas, a ‘new world order’ is emerging
    • There is very little internal QA as yet
    • Standard of English and basic structure are lacking
    • But … they’re doing OK already, and they’ll improve!
  • Relatively few ‘world centres’ in MT at present
  • Despite massive increase in MT use, big decrease in teaching of MT – paradox!

Ongoing Work in DCU
  • Integrating Syntax into SMT
    • Supertag translation and target language models
    • Adding source language information
    • Tree-to-Tree Translation (DOT, LFG-DOT: also treestring models), inc. porting monolingual parsing techniques to the bilingual case
  • Applications
    • Automatic Translation of DVD subtitles
    • Sign-Language MT
    • Large-Scale Open Evaluation (inc. parallel computation)
  • New Language Pairs, Corpora etc.

System Development

Ongoing Work in DCU (cont’d)
  • Dependency- (and Semantically) Marked-Up Corpora
  • New models of Word Alignment
  • New integrated models of subtree/substring alignment
  • New dependency-based Evaluation metrics
  • New Decoders
    • EBMT
    • Memory-Based
  • Open-Source Components

Ongoing Work in DCU (cont’d)

Collaborative work:

  • Tilburg (Memory-based Decoding)
  • Donostia (Basque MT)
  • Aachen (Sign-Language MT)
  • Amsterdam (Integrating Syntax & SMT)
  • St Andrews (DOT)
  • Edinburgh (SMT)
  • CMU (Hybrid SMT—EBMT)

Future Work in DCU
  • Spoken Language Translation

Future Work in DCU
  • MT via SMS
  • Automatic Interpreting
  • Enhanced hybrid models
  • Scalability
  • Tuning MT to text type & genre
  • MT using Pivot languages (‘triangulation’)
  • Better quality phrases (cf. CoNLL monolingual chunking shared task)
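Pivot-based ‘triangulation’ is commonly formulated in the SMT literature as combining a source-pivot and a pivot-target phrase table, p(t|s) ≈ Σ_p p(t|p) · p(p|s), which assumes the target phrase depends on the source only through the pivot phrase. The sketch below assumes that formulation; it is not a description of DCU's system, and the phrase tables and probabilities are invented toy data.

```python
from collections import defaultdict

# Toy phrase tables (invented for illustration): p(pivot | source) and p(target | pivot).
# Source = French, pivot = English, target = German.
src_to_pivot = {
    "maison": {"house": 0.8, "home": 0.2},
}
pivot_to_tgt = {
    "house": {"Haus": 0.9, "Gebäude": 0.1},
    "home": {"Heim": 0.6, "Haus": 0.4},
}

def triangulate(src_to_pivot, pivot_to_tgt):
    """Build p(t|s) = sum over pivot phrases p of p(t|p) * p(p|s)."""
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_to_pivot.items():
        for p, p_given_s in pivots.items():
            # Pivot phrases missing from the second table contribute nothing.
            for t, t_given_p in pivot_to_tgt.get(p, {}).items():
                table[s][t] += t_given_p * p_given_s
    # Convert the nested defaultdicts into plain dicts.
    return {s: dict(ts) for s, ts in table.items()}

result = triangulate(src_to_pivot, pivot_to_tgt)
print({t: round(prob, 2) for t, prob in result["maison"].items()})
# {'Haus': 0.8, 'Gebäude': 0.08, 'Heim': 0.12}
```

Note how "Haus" collects probability mass through two different English pivots (0.72 via "house" plus 0.08 via "home"); summing over pivots rather than taking a single best pivot is what makes this a marginalisation.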

Future General Directions
  • Corpus Building (integrating syntax, semantics … discourse …)
    • cf. data size vs. data quality …
    • Filtering/pruning training data (‘safe’ alignments)
  • Word Alignment
  • Language Modelling
  • Decoding
  • Evaluation Methods
  • Large-scale Open Evaluations
  • Further Convergence between models

Dekai Wu’s 3D MT Space

Convergence between MT and Rest of NLP
  • For some time now, few MT researchers have been working on syntax, and vice versa.
  • With move (back) to trees instead of strings:
    • Reconnect with wealth of tree automata literature
    • Get lots of implemented algorithms for free!
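To make the tree automata point concrete, here is a minimal deterministic bottom-up finite tree automaton, one of the basic devices in that literature: each node's state is computed from its label and its children's states, and a tree is accepted if the root ends in a final state. The toy grammar, the state names (`qN`, `qNP`, …) and the tuple encoding of trees are hypothetical illustrations, not taken from the talk.

```python
# Trees are encoded as either a leaf label (str) or a pair (label, [children]).
# Transitions map (node label, tuple of child states) -> state for that node.
TRANSITIONS = {
    ("John", ()): "qN",
    ("Mary", ()): "qN",
    ("sees", ()): "qV",
    ("NP", ("qN",)): "qNP",
    ("VP", ("qV", "qNP")): "qVP",
    ("S", ("qNP", "qVP")): "qS",
}
FINAL_STATES = {"qS"}

def run(tree):
    """Compute the automaton's state at the root bottom-up; None if no transition applies."""
    if isinstance(tree, str):
        label, children = tree, []
    else:
        label, children = tree
    child_states = tuple(run(child) for child in children)
    if None in child_states:  # some subtree was already rejected
        return None
    return TRANSITIONS.get((label, child_states))

def accepts(tree):
    """A tree is accepted iff the state reached at its root is final."""
    return run(tree) in FINAL_STATES

sentence = ("S", [("NP", ["John"]), ("VP", ["sees", ("NP", ["Mary"])])])
swapped = ("S", [("VP", ["sees", ("NP", ["Mary"])]), ("NP", ["John"])])
print(accepts(sentence))  # True
print(accepts(swapped))   # False
```

Because each node is visited once and the transition lookup is a dictionary access, recognition is linear in the size of the tree; richer devices from the same literature (e.g. tree transducers) extend this idea to map trees to trees or strings.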

Concluding Remarks

So … there’s plenty for us still to do!

Two worries:

  • MT R&D seems to be at an all-time high, yet we’re not teaching MT any more.
  • Most (S)MT people come from different backgrounds, but there is a huge danger that some of them are merely reinventing the wheel …

Thanks!

The end

beginning!
