Network-Based Speech Translation: Connecting Languages with S2ST Technology

ITU-T Workshop on "Telecommunications relay services for persons with disabilities" (Geneva, 25 November 2011)

Telecommunications Relay Services in a Speech-to-Speech Translation System in Accordance with Recommendations F.745 and H.625

Chiori Hori, Ph.D.
Spoken Language Communication Laboratory
National Institute of Information and Communications Technology (NICT)

Speech-to-Speech Translation

Speech-to-Speech Translation (S2ST) technologies are an effective means of breaking through language barriers between people who do not speak the same language. An S2ST system chains three components:

Automatic Speech Recognition (ASR): converts phonemes to words. Japanese speech "w a t a shi w a g a xtu k o o n i….." becomes the text 「私は学校に行く」.
Machine Translation (MT): converts Japanese text to English text. 「私は学校に行く」 becomes "I go to school".
Speech Synthesis (TTS): converts text to a waveform. "I go to school" is spoken in English.

Each component relies on a large amount of training data for machine learning: Japanese speech and text corpora, Japanese-to-English parallel corpora, and English speech corpora.

Network-based S2ST systems

Communication between more languages can be realized with S2ST technology by connecting distributed S2ST servers (i.e., ASR, MT, and TTS) all over the world.

Telecommunications Relay Services in Speech-to-Speech translation, in accordance with ITU-T Recommendations F.745 and H.625

[Figure: The speaker of Language A and the speaker of Language B each use an MC client, which digitizes the speech signal and communicates over the network with MC servers (ASR, MT, and TTS servers). From A to B: conversion from speech signal to text in Language A, conversion from text in A to text in B, conversion from text in B to speech signal. From B to A: conversion from speech signal to text in Language B, conversion from text in B to text in A, conversion from text in A to speech signal.]
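The ASR, MT, and TTS chain described on this slide can be sketched as three functions composed in sequence. This is only an illustration under stated assumptions, not the NICT system: the function names, the lookup tables standing in for trained models, and the byte strings standing in for audio are all hypothetical. Only the example sentence comes from the slide.

```python
from dataclasses import dataclass

# Toy stand-ins for trained models (assumptions for illustration only).
ASR_TABLE = {b"<japanese-audio>": "私は学校に行く"}
MT_TABLE = {("ja", "en", "私は学校に行く"): "I go to school"}


@dataclass
class Utterance:
    language: str
    text: str


def recognize(audio: bytes, language: str) -> Utterance:
    """ASR step: speech signal -> text in the source language."""
    return Utterance(language, ASR_TABLE[audio])


def translate(utt: Utterance, target: str) -> Utterance:
    """MT step: text in the source language -> text in the target language."""
    return Utterance(target, MT_TABLE[(utt.language, target, utt.text)])


def synthesize(utt: Utterance) -> bytes:
    """TTS step: text -> waveform (here, just tagged bytes)."""
    return f"<{utt.language}-audio:{utt.text}>".encode("utf-8")


def speech_to_speech(audio: bytes, source: str, target: str) -> bytes:
    """Chain ASR -> MT -> TTS in the order shown on the slide."""
    return synthesize(translate(recognize(audio, source), target))
```

In a network-based deployment each of the three functions would presumably be a call to a remote server rather than a local lookup, which is what allows servers for different languages to be hosted in different places.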

Network-based Speech Translation System in accordance with ITU-T Recommendations F.745 and H.625

Network-based S2ST application via multilateral translation on smartphone/tablet/PC/TV, connecting an English speaker's device, a Japanese speaker's device, and a Chinese speaker's device.

Remote communication:
お父さん，お母さんお元気ですか？
Papa, maman, comment vas-tu?

On-site communication:
飲み水は13：00から市役所前で配給します．
Water to drink will be provided in front of the city hall from 13:00.
从下午一点开始，在市政府门前供应饮用水。
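Multilateral translation, where one utterance reaches several devices each in its own language, can be sketched as a fan-out loop. This is a hedged illustration: `fan_out`, `toy_translate`, the device names, and the language codes are hypothetical, and the lookup table merely stands in for an MT server using the on-site example sentences from the slide.

```python
from typing import Callable, Dict

JA_NOTICE = "飲み水は13：00から市役所前で配給します．"

# Lookup table standing in for an MT server (assumption for illustration).
EXAMPLE_MT = {
    (JA_NOTICE, "ja", "en"):
        "Water to drink will be provided in front of the city hall from 13:00.",
    (JA_NOTICE, "ja", "zh"):
        "从下午一点开始，在市政府门前供应饮用水。",
}


def toy_translate(text: str, source: str, target: str) -> str:
    """Hypothetical MT call: look the sentence up in the example table."""
    return EXAMPLE_MT[(text, source, target)]


def fan_out(
    text: str,
    source: str,
    devices: Dict[str, str],
    translate: Callable[[str, str, str], str],
) -> Dict[str, str]:
    """Return the text each device should receive, keyed by device name.

    `devices` maps a device name to the language of its user. Devices that
    already use the source language get the original text untranslated.
    """
    delivered = {}
    for device, language in devices.items():
        if language == source:
            delivered[device] = text
        else:
            delivered[device] = translate(text, source, language)
    return delivered
```

For example, `fan_out(JA_NOTICE, "ja", {"phone-a": "ja", "phone-b": "en", "phone-c": "zh"}, toy_translate)` yields the Japanese original for `phone-a` and the English and Chinese sentences for the other two devices.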

Network-based Speech Translation System in accordance with ITU-T Recommendations F.745 and H.625

Modality Conversion Markup Language (MCML)

XML schema, ITU-T namespace: http://www.itu.int/xml-namespace/itu-t/H.645/MCML.xsd

MCML includes the information needed for communication between multiple persons who use different modalities, e.g. speech, text, image, or video data input by users or output by MCML servers such as ASR, MT, TTS, and sign recognition systems.

http://www.itu.int/rec/T-REC-F.745-201010-I/en
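To give a feel for what an XML envelope carrying modality information might look like, here is a small sketch using Python's standard `xml.etree.ElementTree`. It does not follow the actual MCML schema: the element names `Request` and `Data`, the attributes, and the placeholder namespace are all assumptions made up for illustration, and the real structure is defined in the ITU-T Recommendations.

```python
import xml.etree.ElementTree as ET

# Placeholder namespace for this sketch only; NOT the ITU-T MCML namespace.
NS = "urn:example:modality-sketch"


def build_request(service: str, modality: str, language: str, payload: str) -> str:
    """Serialize a hypothetical request for a server such as ASR, MT, or TTS."""
    root = ET.Element(f"{{{NS}}}Request", {"service": service})
    data = ET.SubElement(
        root, f"{{{NS}}}Data", {"modality": modality, "language": language}
    )
    data.text = payload
    return ET.tostring(root, encoding="unicode")


def parse_request(xml_text: str) -> dict:
    """Read the hypothetical envelope back into a plain dictionary."""
    root = ET.fromstring(xml_text)
    data = root.find(f"{{{NS}}}Data")
    return {
        "service": root.get("service"),
        "modality": data.get("modality"),
        "language": data.get("language"),
        "payload": data.text,
    }
```

Tagging each payload with its modality and language is what would let a client route text to an MT server and speech to an ASR server through the same message format.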

U-STAR Consortium

The Universal Speech Translation Advanced Research (U-STAR) Consortium has been established as an international research collaboration entity with the goal of developing a worldwide network-based speech-to-speech translation system. The consortium's objective is to create a basic infrastructure for spoken language communication that overcomes the language barriers existing around the world. Currently, 15 institutes from 14 countries participate as members.

Plan for field experiment

Period: one year from April 2012, including the 2012 London Olympics
Application: multiparty conversation via a network-based S2ST system on iPhones and Android phones (free)
MCML servers: ASR, MT, and TTS servers will be provided by U-STAR members
Potential languages: Chinese, Dzongkha, English, Filipino, Hindi, Indonesian, Japanese, Korean, Mongolian, Malay, Nepali, Sinhala, Thai, Urdu, Vietnamese, and some European languages

The U-STAR members

Potential European languages: French, German, Italian, Portuguese, Spanish, Turkish, British English
