TC-STAR Technology and Corpora for Speech to Speech Translation Introduction & Major Achievement

VI Framework Programme Priority: Multimodal Interfaces IST - 2002 - 2.3.1.6. TC-STAR Technology and Corpora for Speech to Speech Translation Introduction & Major Achievement. Luxembourg, 29 May 2007 Project Coordinator Gianni Lazzari FBK-IRST. Contract n° FP6 506738.

Presentation Transcript


  1. VI Framework Programme, Priority: Multimodal Interfaces, IST-2002-2.3.1.6. TC-STAR: Technology and Corpora for Speech to Speech Translation. Introduction & Major Achievement. Luxembourg, 29 May 2007. Project Coordinator: Gianni Lazzari, FBK-IRST. Contract n° FP6 506738

  2. Language Technology in Europe • Europe: Language technology is an economic, political and cultural necessity. • Breaking the language barrier would boost communication and the economy. • While LT is already the focus of considerable European research effort, the strategic importance of the technology to Europe warrants a much higher priority on the research & innovation agenda.

  3. An action towards a multilingual Europe • Communication across languages and cultures has become vitally important for trade, especially now that the internet is globalizing the economy. • Many Europeans speak two or more languages, but about half of European Union citizens speak no language other than their own. • Wouldn’t it be great progress if Europeans who do not share a language could talk easily with each other? • Any improvement in communication would be progress over the current situation.

  4. Research on speech to speech translation • Human language, in both its spoken and written forms, has been studied in recent decades by thousands of researchers worldwide.

  5. Research on speech to speech translation

  6. SST projects in the last 30 years • Restricted domain of discourse • Pioneers • C-STAR (Consortium for Advanced Research in Speech Translation) • IBM (statistical machine translation) • Demonstration-oriented projects • C-STAR II, VERBMOBIL, NESPOLE!, BABYLON, DIGITAL OLYMPICS, EUTRANS • Technology-oriented projects • TIDES*, C-STAR III (IWSLT), CESTA • Unrestricted domain of discourse • TC-STAR • GALE (* text translation only)

  7. Partners

  8. TC-STAR project at a glance • February 2002: Specific Support Action (SSA) proposal TC-STAR_P submitted to prepare an Integrated Project (IP) on speech to speech translation technologies • September 2002: TC-STAR_P started • April 2003: TC-STAR proposal submitted • April 2004: TC-STAR project started • April 2005, Trento: 1st evaluation campaign & workshop

  9. TC-STAR project at a glance • April 2006, Trento: “Open Lab” training workshop • May 2006: “Human Language Technology for Europe” public report signed by Viviane Reding and Ján Figeľ • June 2006, Barcelona: 2nd evaluation campaign & workshop • November 2006, Helsinki: IST 2006 • March 2007, Aachen: 3rd evaluation campaign & workshop • March 2007: TC-STAR project end

  10. Vision • Transcription and translation of broadcast news, speeches, lectures and interviews [Figure: a spoken utterance (“Hi, what do you think about”) delivered through simultaneous translation, vocal access and web access]

  11. Scenario • TC-STAR: a research project on speech to speech translation to cross the language barrier

  12. Application Scenario of TC-STAR • A selection of unconstrained conversational speech domains: • Broadcast news • European Parliament speeches • A few languages important for European society and economy: • European-accented English • European Spanish • Mandarin

  13. Objectives • The objectives of the project were extremely ambitious: • making a breakthrough in SST research to significantly reduce the gap between human and machine performance within six years • measuring performance in an objective way

  14. TC-STAR Project To achieve the project’s ambitious goals within the time frame, the partners adopted a strategy of comparative evaluation of the technologies. - Each partner developing a technology (or a technology component) was required to participate in periodic evaluations organized within the project. - A critical mass of researchers is necessary to cope with the challenges and to accelerate the rate of progress. For this reason, the operational model of TC-STAR was centered on two key elements designed to speed up progress: - competitive evaluation - sharing of knowledge

  15. Comparative Evaluation Approach • 1st phase: data collection and task definition (audio and text files, e.g. 200 hours of audio and 200 million words, split into 90% training and 10% test data) • 2nd phase: algorithms & components (ASR, SLT, TTS), scored with performance measures (% correct) • 3rd phase: evaluation workshop (presentation and discussion of results, list of competition results)
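
The 90% / 10% training/test split in the first phase can be sketched as follows. This is a minimal illustration, assuming the corpus is a flat list of items (file names or sentences) and that a random item-level split is acceptable; the function name `split_corpus` and the file names are hypothetical. Real campaigns often hold out whole sessions or speakers instead, so that no test material is seen in training.

```python
import random


def split_corpus(items, train_fraction=0.9, seed=0):
    """Randomly split corpus items into (train, test) lists.

    Hypothetical helper: items might be audio file IDs or sentences.
    A fixed seed makes the split reproducible.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(items)                  # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)   # local RNG, global state unaffected
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    files = [f"session_{i:03d}.wav" for i in range(200)]
    train, test = split_corpus(files)
    print(len(train), len(test))  # 180 20
```

Using a private `random.Random(seed)` instance keeps the split reproducible without touching the global random state.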

  16. Evaluation Infrastructure • An evaluation infrastructure was implemented by organizing periodic competitive evaluations of individual ASR, SLT and TTS components and of end-to-end systems. • Yearly evaluation campaigns were planned to measure the progress of all partners on common Language Resources (LRs) under equal conditions. Improvements in methods and technologies were systematically demonstrated on common test sets using common evaluation metrics. • Improvements were measured against state-of-the-art reference baselines established by the project.

  17. Sharing of Knowledge • Aside from assessing each partner’s progress, the main role of the evaluations was to give the Consortium useful information about the methods and models developed by each partner. • For this purpose, an evaluation workshop was organized after each evaluation campaign. • Beyond the competitive framework, evaluations allowed direct comparison of alternative methods, models and implementations. • Significant performance improvements were achieved through system and/or component combination, taking advantage of the partners’ complementary strengths.
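
System combination can take many forms; one well-known family, ROVER-style voting, keeps the word that most systems agree on at each position. The sketch below assumes the hypotheses have already been aligned slot by slot (the alignment step, usually the hard part, is omitted) and uses an empty string as a hypothetical deletion marker. It illustrates the idea only and is not the combination method used by the TC-STAR partners; `vote_combine` and `EPS` are hypothetical names.

```python
EPS = ""  # hypothetical marker for "no word" (deletion) in an aligned slot


def vote_combine(aligned_hyps, weights=None):
    """Per-slot weighted voting over pre-aligned word hypotheses.

    aligned_hyps: one word list per system, all of the same length and already
    aligned slot by slot. Ties go to the word proposed by the earliest system.
    """
    if not aligned_hyps:
        return []
    n_slots = len(aligned_hyps[0])
    if any(len(h) != n_slots for h in aligned_hyps):
        raise ValueError("hypotheses must be aligned to the same length")
    if weights is None:
        weights = [1.0] * len(aligned_hyps)
    if len(weights) != len(aligned_hyps):
        raise ValueError("need one weight per system")

    combined = []
    for slot in range(n_slots):
        scores, first_seen = {}, {}
        for sys_idx, (hyp, weight) in enumerate(zip(aligned_hyps, weights)):
            word = hyp[slot]
            scores[word] = scores.get(word, 0.0) + weight
            first_seen.setdefault(word, sys_idx)
        # Highest total weight wins; on a tie, the lower system index wins.
        best = max(scores, key=lambda w: (scores[w], -first_seen[w]))
        if best != EPS:  # a winning deletion emits nothing
            combined.append(best)
    return combined


if __name__ == "__main__":
    hyps = [
        ["the", "parliament", "adopted", "the", "report"],
        ["the", "parliament", "adopts", "the", "report"],
        ["a", "parliament", "adopted", EPS, "report"],
    ]
    print(" ".join(vote_combine(hyps)))  # the parliament adopted the report
```

Each system contributes one vote per slot here; per-system weights (for example, derived from development-set scores) are one way to favour stronger systems.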

  18. Results • During the three years of TC-STAR, impressive performance improvements in spoken language translation were achieved, although there is still a long way to go. • TC-STAR has created the infrastructure needed to accelerate the rate of progress in the field. • It has collected the data needed for the data-driven methodology pursued in the project, and it has implemented an evaluation infrastructure based on competitive evaluation.

  19. Results • This evaluation-driven approach ensures that research progress is recognized as such and that the methods developed by the various project partners can be compared and properly validated. • Combined with a mix of cooperation and healthy competition, this approach should maximize scientific progress. • At the same time, the Consortium benchmarks itself against the external scientific community, and project partners have reached top ranks in recent international evaluations.

  20. Scientific advancements • To the best of our knowledge, this was the first time that the problem of speech translation was studied for a real-life task with an unrestricted domain like the plenary speeches given in the European Parliament.

  21. Scientific advancements • Before TC-STAR, the general opinion had been that such a task was too difficult and too ambitious. • Earlier speech translation projects had worked on limited-domain, small-vocabulary tasks such as travel and tourism.

  22. Progress measured through periodic evaluations • The progress of the work was monitored in regular formal evaluation campaigns organized by ELDA and open to external participants. Evaluation was applied both to the recognition, translation and synthesis tasks in isolation and to the full ASR-SLT-TTS chain. To the best of our knowledge, these were the first evaluation campaigns worldwide to perform this type of performance analysis.
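
The slides do not name the metric used for the recognition task; word error rate (WER), the word-level edit distance between hypothesis and reference divided by the reference length, is the usual choice for ASR and is assumed in this minimal sketch. The function name is hypothetical, and tokenization is plain whitespace splitting.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution or match
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the session is now open", "the session now open"))  # 0.2
```

Only two rows of the dynamic-programming table are kept, so memory grows with the hypothesis length rather than with the product of both lengths.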

  23. Progress measured through periodic evaluations • Translation quality was measured with human judgements and with automatic accuracy metrics such as the BLEU score, which has been shown to correlate well with human judgements. • Over the three years of the project, translation quality improved dramatically across the various translation tasks. Comparing BLEU scores at the start and end of the project, the relative improvements range from 40% to 60%, depending on the task and conditions.
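
To make the BLEU comparison concrete, here is a minimal corpus-level BLEU sketch (clipped n-gram precisions up to 4-grams, geometric mean, brevity penalty), assuming one whitespace-tokenized reference per sentence and no smoothing. It is not the official scoring tool used in the campaigns, whose tokenization and options may differ. The hypothetical `relative_improvement` helper shows how a figure such as 40% to 60% is computed for a higher-is-better score: (end - start) / start.

```python
import math
from collections import Counter


def ngram_counts(words, n):
    """Count the n-grams (as tuples) in a list of words."""
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus-level BLEU, one whitespace-tokenized reference per sentence, no smoothing."""
    if len(hypotheses) != len(references):
        raise ValueError("need exactly one reference per hypothesis")
    matches = [0] * max_n  # clipped n-gram matches, per order n
    totals = [0] * max_n   # hypothesis n-grams, per order n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngram_counts(h, n), ngram_counts(r, n)
            # Clipping: an n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some precision is zero, so the geometric mean is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize output that is shorter than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return brevity_penalty * math.exp(log_precision)


def relative_improvement(start, end):
    """Relative gain of a higher-is-better score, e.g. 0.5 means +50%."""
    return (end - start) / start


if __name__ == "__main__":
    print(round(corpus_bleu(["the cat sat on the mat"], ["the cat sat on a mat"]), 4))
    print(relative_improvement(30.0, 45.0))  # 0.5
```

For example, a hypothetical rise from 30 to 45 BLEU points is a relative improvement of (45 - 30) / 30 = 0.5, i.e. 50%; these numbers are illustrative, not TC-STAR results.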

  24. Progress measured through periodic evaluations • Although performance is still far from that of professional human translators, the error rates on the European Parliament data are unexpectedly low for such a real-life task. More specifically, the best translation systems get about 70% of the words correct when word positions are ignored.
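
One reading of “about 70% of the words correct when word positions are ignored” is a position-independent word error rate (PER) of about 30%: hypothesis and reference are compared as bags of words, so word order does not matter. The sketch below assumes that interpretation and the common definition PER = (max(N_ref, N_hyp) - matches) / N_ref; the slide itself does not name the metric, and the function name is hypothetical.

```python
from collections import Counter


def position_independent_error_rate(reference, hypothesis):
    """PER: word errors counted on bags of words, ignoring word order."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Counter '&' keeps the minimum count of each word: the order-free matches.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    # Extra hypothesis words beyond the reference length count as errors too.
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    ref = "the report was adopted by a large majority"
    hyp = "by a large majority the report was adopted"
    per = position_independent_error_rate(ref, hyp)
    print(per, 1 - per)  # 0.0 1.0  (reordering costs nothing)
```

Because reordering is free under PER, it is always lower than or equal to WER on the same sentence pair, which is why it is quoted as a measure of lexical choice rather than of fluency.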

  25. Participation in international evaluation campaigns • Some TC-STAR systems participated in international evaluation campaigns such as the NIST MT evaluation (Arabic/Chinese to English) and the IWSLT evaluation for spoken language translation (Chinese/Japanese/... to English). TC-STAR systems were consistently among the top-performing systems.

  26. No other superior systems or technologies around • The results of both the public and the GALE evaluation campaigns show that the TC-STAR systems are state-of-the-art machine translation systems and that no better MT technology is available. • Subjective measures were used to evaluate speech synthesis. Human evaluation was also used to evaluate spoken language translation.
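
The slide does not say which subjective protocol was used for speech synthesis. A common one is the mean opinion score (MOS), in which listeners rate samples on a 1 to 5 scale; the sketch below assumes that scale and reports the mean with a normal-approximation 95% confidence interval. The function name, the ratings and the interval method are assumptions, not TC-STAR's procedure.

```python
import math
import statistics


def mean_opinion_score(ratings, z=1.96):
    """Mean of 1-5 listener ratings plus a normal-approximation confidence interval.

    z=1.96 gives roughly a 95% interval; a t-based interval is more accurate
    for small listener panels.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings to estimate spread")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    mean = statistics.mean(ratings)
    # Standard error of the mean, scaled by z for the interval half-width.
    half_width = z * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, (mean - half_width, mean + half_width)


if __name__ == "__main__":
    # Hypothetical ratings from ten listeners for one TTS system.
    ratings = [4, 3, 4, 5, 4, 3, 4, 4, 5, 3]
    mos, (low, high) = mean_opinion_score(ratings)
    print(f"MOS {mos:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Reporting the interval alongside the mean matters when comparing systems: two MOS values whose intervals overlap heavily may not reflect a real difference in perceived quality.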

  27. Training Initiative FBK-IRST organized the Open Lab on Speech Translation in Trento, Italy, from 30 March to 1 April 2006. This training event for students and young researchers aimed to expand the TC-STAR research community in ASR and SLT. A detailed report on the initiative is archived in the internal area of the Open Lab website: http://www.tc-star.org/openlab2006/

  28. Open Source Initiatives The open source initiatives carried out during the TC-STAR project contributed significantly to disseminating the project results to the scientific community. The detailed list of initiatives is available at: http://www.tc-star.org/pages/opensource.htm

  29. Policy Initiatives • A booklet on Human Language Technologies for Europe was published in May 2006 (available in English, Italian, German and French). The report was signed by the two Commissioners Viviane Reding and Ján Figeľ and distributed throughout Europe. It can be downloaded from the Europa portal: http://europa.eu/languages/en/document/88/17

  30. Policy Initiatives • Another important policy event was organized by TC-STAR in November 2006 in Helsinki, within the IST 2006 conference: the special session “Multilingualism and Language Technology: a Challenge for Europe”.

  31. Policy Initiatives • Further policy action was the seminar “Human Language Technology for Europe – A regional perspective” held on Wednesday, 28th February 2007, at 10 a.m. at the Representative Office of the European Region Tyrol – South Tyrol – Trentino, 45-47, rue de Pascale, 1040 Bruxelles. This event was organized with the collaboration of the Autonomous Province of Trento and the Autonomous Province of Bozen/South Tyrol.

  32. Dissemination towards the general public • TC-STAR website: www.tc-star.org • The project website was developed at the beginning of the project and continuously updated. • Many newspapers, magazines and TV programmes reported on the TC-STAR project and its results.

  33. Dissemination towards the general public • Video clip: the video clip released during the first project year was subtitled in German, Spanish, Dutch and Finnish. The TC-STAR video clip is available on DVD and on the project website.

  34. Scientific Dissemination • The Consortium was also involved in satellite workshops at major conferences and co-organized international workshops such as IWSLT in 2005, 2006 and 2007. • The Consortium published its scientific results in major scientific journals during the lifetime of the project and afterwards (about 200 publications). • A joint journal paper on system combination has been submitted to IEEE TASLP.

  35. In Conclusion • Mobilization of a critical mass of research on a strategic topic for Europe • Creation of a European evaluation infrastructure • To the best of our knowledge, this was the first time that the problem of speech translation was studied for a real-life task with an unrestricted domain like the plenary speeches given in the European Parliament. • The results of both the public and the GALE evaluation campaigns show that the TC-STAR systems are state-of-the-art machine translation systems and that no better MT technology is available. • A policy towards a multilingual Europe has been proposed

  36. TC-STAR