Rapid Development in new languages

Presentation Transcript


  1. A Thai Speech Translation System for Medical Dialogs
     Tanja Schultz, Dorcas Alexander, Alan W Black, Kay Peterson, Sinaporn Suebvisai, Alex Waibel

     [Figure: system architecture diagram. Labels: Source Lang Speech; Recognition/Analysis; SR+Parsing (CFG-Grammar); SR+LM; Stat. Analysis; SOUP; IF; Direct SMT; Symbolic Generation; GenKit; Statistical Generation; IF2NL; Target Language Text; Synthesis; Cepstral; Target Lang Speech; Thai / English medical; English / Thai medical]

     Speech Recognition
     • Rapid development in new languages
     • Limited training data (6 hours) provided by NECTEC from 34 speakers, plus 8 speakers for development and test
     • Romanization of Thai script in order to:
       • allow non-Thai researchers to work with the Roman representation, for example in grammar development
       • simplify the speech synthesis component, since the romanized output essentially provides the pronunciation
     • The current dictionary covers the given 6-hour database (734 words)
     • Rapid bootstrapping of acoustic models using a 7-lingual GlobalPhone model set (Ch, Cr, Fr, Ge, Ja, Sp, Tu)
     • ASR results indicate that rapid bootstrapping can be done successfully for a limited domain (see table)
     • Word accuracy [%] in Thai on the evaluation set:
       • CI-AM (context-independent acoustic model): 83.63%
       • CD-AM (context-dependent acoustic model) (500): 84.44%
       • CD-AM (1000): 82.71%

     System Architecture
     • Tcl/Tk-based Communication Server
     • Runs on Windows and Linux platforms
     • Integrates several languages: Thai, English, Spanish, Chinese, ...
     • Integrates different speech recognition approaches
       • Decoding along n-grams versus Context Free Grammars
     • Integrates different translation approaches
       • IF-based translation versus statistical MT
     • Integrates two approaches to natural language generation from IF
       • knowledge-based generation with pseudo-unification
       • statistical generation
     • Allows transmission of IF across devices for (wireless) multi-party translation (see demo: Laptop and PDA)
     • Interface:
       • Hypothesis
       • Thai + Roman script
       • Parse tree (CFG)
       • Translation
       • IF representation

     Translation
     • Interlingua-based Machine Translation component: Interchange Format (IF)
       • abstracts from variation in syntax across languages
       • allows monolingual development for analysis and generation
       • provides paraphrase back into the source language
       • can be easily extended to new languages due to its STAR structure
     • Some extensions due to Thai characteristics:
       • The use of a term to indicate the gender of the person:
         • Thai: zookhee kha1 - Eng: okay (ending)
         • s[acknowledge] (zookhee *[speaker=])
       • An affirmation that means more than simply "yes":
         • Thai: saap khrap - Eng: know (ending)
         • s[affirm+knowledge](saap *[speaker=])
       • Verb separation of terms for feasibility and other modalities

     Speech Synthesis
     • First Thai voice built in the Festival Speech Synthesis System
     • Limited domain, targeting the Hotel Reservation domain
     • 235 sentences covering the main aspects of immediate interest
     • Recorded, auto-labeled, and built into a synthetic voice using FestVox tools
     • Converted to a small-footprint portable version using Cepstral's Theta engine
     • Rapid synthesis development in new languages:
       • Phoneme set shared with Speech Recognition
       • Lexicon with a 522-word vocabulary constructed by hand
       • Statistically trained letter-to-sound rules to bootstrap the required word coverage
       • Unit selection concatenative synthesis
       • Phones tagged with syllable and tone information for more fluent results
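
The word accuracy figures in the Speech Recognition section are not accompanied by a scoring recipe. As a minimal sketch, assuming the conventional ASR definition (word accuracy = (N - S - D - I) / N, where N is the number of reference words and S, D, I are the substitutions, deletions, and insertions in a minimum-edit alignment), the metric could be computed as below. The function name `word_accuracy` and the toy tokens are hypothetical and are not taken from the poster or its data.

```python
def word_accuracy(reference: str, hypothesis: str) -> float:
    """Return word accuracy in percent under the assumed standard definition.

    Accuracy = (N - errors) / N, where errors is the word-level
    Levenshtein distance (substitutions + deletions + insertions)
    between the reference and the recognizer hypothesis.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1       # reference word missing in hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    errors = prev[len(hyp)]
    return 100.0 * (len(ref) - errors) / len(ref)


if __name__ == "__main__":
    # Toy romanized-style tokens, purely illustrative.
    print(word_accuracy("w1 w2 w3 w4", "w1 wx w3"))
```

Because insertions count as errors, this measure can drop below zero when the hypothesis is much longer than the reference; corpus-level scores are normally obtained by summing errors and reference lengths over all utterances rather than averaging per-utterance values.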
