
ICSLP’ 98 CONTROLLING A HIFI WITH A CONTINUOUS SPEECH UNDERSTANDING SYSTEM

J. Ferreiros, J. Colás, J. Macías-Guarasa, A. Ruiz, J. M. Pardo. Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica, E.T.S.I. Telecomunicación - Universidad Politécnica de Madrid

Presentation Transcript


  1. ICSLP’ 98 CONTROLLING A HIFI WITH A CONTINUOUS SPEECH UNDERSTANDING SYSTEM
     J. Ferreiros, J. Colás, J. Macías-Guarasa, A. Ruiz, J. M. Pardo
     Grupo de Tecnología del Habla - Departamento de Ingeniería Electrónica
     E.T.S.I. Telecomunicación - Universidad Politécnica de Madrid
     Ciudad Universitaria s/n, 28040 Madrid, Spain

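  2. General Architecture
     [Block diagram; modules and the resources each one uses, as labelled on the slide]
     • Speech Recogniser (SCHMM + word pair grammar)
     • Tagger (tagged dictionary)
     • Tags Refiner (context dependent rules)
     • Understanding (context dependent rules)
     • Actuator (HIFI status; IR-LED output)
     • Speech Generation Module (alternative expressions)
     • Text to Speech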
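
The block diagram reads as a chain of modules, each consuming the previous module's output. A minimal Python sketch of that hand-off follows. It assumes a strictly linear chain, which simplifies the diagram (both the understanding stage and the actuator contribute message strings to generation), and the stage functions are hypothetical placeholders that only record their own names.

```python
from functools import reduce
from typing import Any, Callable, List

# A stage maps the previous module's output to its own output.
Stage = Callable[[Any], Any]

# Module order as read from the architecture slide (assumption: a linear chain).
MODULES: List[str] = [
    "speech_recogniser", "tagger", "tags_refiner", "understanding",
    "actuator", "speech_generation", "text_to_speech",
]

def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Feed the data through each stage in order, passing each output to the next stage."""
    return reduce(lambda acc, stage: stage(acc), stages, data)

def make_trace_stage(name: str) -> Stage:
    """Hypothetical placeholder module: appends its own name to a trace list."""
    return lambda trace: trace + [name]

trace = run_pipeline([make_trace_stage(m) for m in MODULES], [])
```

Running it leaves `trace` equal to `MODULES`, showing the order in which the modules see each command. In a real system each placeholder would be replaced by the corresponding recogniser, tagger, refiner, and so on.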

  3. Speech recogniser
     • Characteristics:
        • Continuous speech commands
        • One-pass search with a word-pair grammar
        • 163-word vocabulary
        • SCHMM (semi-continuous HMM) phone models
     • Implementation:
        • Front-end: DSP LSI board
        • Rest of processing: PC
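
A word-pair grammar only licenses word sequences in which every adjacent pair is listed as allowed. During a one-pass search, that means a hypothesis ending in a given word may only be extended with that word's listed successors. The sketch below shows just the grammar constraint, not the acoustic search. The vocabulary, the pairs, and the `<s>`/`</s>` boundary markers are hypothetical.

```python
from typing import Dict, List, Set

# Hypothetical toy grammar: for each word, the set of words allowed to follow it.
# "<s>" and "</s>" mark sentence start and end (an assumption of this sketch).
WORD_PAIRS: Dict[str, Set[str]] = {
    "<s>": {"switch", "set"},
    "switch": {"the"},
    "set": {"the"},
    "the": {"radio", "volume"},
    "radio": {"on", "off"},
    "volume": {"higher", "lower"},
    "on": {"</s>"},
    "off": {"</s>"},
    "higher": {"</s>"},
    "lower": {"</s>"},
}

def allowed_successors(word: str) -> Set[str]:
    """Words the search may expand into after `word` ends."""
    return WORD_PAIRS.get(word, set())

def accepts(sentence: List[str]) -> bool:
    """True if every adjacent word pair, including the start/end markers, is licensed."""
    tokens = ["<s>"] + sentence + ["</s>"]
    return all(b in allowed_successors(a) for a, b in zip(tokens, tokens[1:]))
```

With this table, "switch the radio on" is accepted, but "switch the volume on" is rejected because "on" is not a listed successor of "volume".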

  4. Speech understanding (I)
     • TAGGER:
        • 78 semantic tags
        • Several tags may be applied to each word
        • A “garbage” tag is used for words with no meaning
           • Gives robustness against speech recogniser errors
           • Will allow out-of-vocabulary (OOV) words in the recognised string, e.g. “Please, set the volume higher”
        • Tagging is specified directly in the lexicon
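
Because the tags live in the lexicon, tagging reduces to a dictionary lookup that may return several tags per word. The following is a minimal sketch with a handful of hypothetical tag names (the real system uses 78). It also assumes that any word missing from the lexicon falls back to the “garbage” tag, which is one way the OOV behaviour described above could be obtained.

```python
from typing import Dict, List, Tuple

# Hypothetical tagged lexicon; tag names are illustrative, not the system's 78 tags.
LEXICON: Dict[str, List[str]] = {
    "please": ["garbage"],
    "set": ["action_set"],
    "the": ["garbage"],
    "volume": ["parameter_volume"],
    "higher": ["value_increment"],
    "right": ["position", "increment"],  # ambiguous; left for the tags refiner
}

GARBAGE = "garbage"

def tag(words: List[str]) -> List[Tuple[str, List[str]]]:
    """Attach every lexicon tag to each word; unknown words get 'garbage' (assumption)."""
    return [(w, list(LEXICON.get(w.lower(), [GARBAGE]))) for w in words]
```

Ambiguous words such as “right” keep all their candidate tags at this stage, so the decision is deferred to the context dependent rules of the tags refiner.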

  5. Speech understanding (II)
     • TAGS REFINER:
        • Aims:
           • Number processing
           • Disambiguation of words with several tags
           • “garbage” removal
        • May change the literal of the words: “two five” → “25”
        • May introduce new refined semantic tags
        • Context dependent rules, e.g.:
             word: “right”
             tags: “position”, “increment”
             rule: “if there exists any other word tagged as a tape parameter, then the word right is the position of this tape; else it is an increment indicator”
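
The three refiner aims can be sketched as successive passes over the tagged words: drop garbage, merge spoken digits into a single numeric literal, then apply the “right” rule from the slide. The pass order, the digit table, and the tag names (`number`, `tape_parameter`, and so on) are assumptions made for this illustration only.

```python
from typing import List, Tuple

Tagged = Tuple[str, List[str]]  # (word literal, candidate semantic tags)

# Hypothetical spoken-digit table used for number processing.
DIGITS = {"zero": "0", "one": "1", "two": "2", "three": "3", "four": "4",
          "five": "5", "six": "6", "seven": "7", "eight": "8", "nine": "9"}

def remove_garbage(tagged: List[Tagged]) -> List[Tagged]:
    """Drop words whose only tag is 'garbage'."""
    return [(w, tags) for w, tags in tagged if tags != ["garbage"]]

def join_digits(tagged: List[Tagged]) -> List[Tagged]:
    """Merge consecutive spoken digits into one literal, e.g. 'two five' -> '25'."""
    out: List[Tagged] = []
    prev_was_digit = False
    for word, tags in tagged:
        if word in DIGITS:
            if prev_was_digit:
                literal, _ = out[-1]
                out[-1] = (literal + DIGITS[word], ["number"])
            else:
                out.append((DIGITS[word], ["number"]))
            prev_was_digit = True
        else:
            out.append((word, tags))
            prev_was_digit = False
    return out

def resolve_right(tagged: List[Tagged]) -> List[Tagged]:
    """Slide rule: 'right' is a tape position if another word is tagged as a
    tape parameter; otherwise it is an increment indicator."""
    has_tape_param = any("tape_parameter" in tags for w, tags in tagged if w != "right")
    chosen = "position" if has_tape_param else "increment"
    return [(w, [chosen] if w == "right" else tags) for w, tags in tagged]

def refine(tagged: List[Tagged]) -> List[Tagged]:
    return resolve_right(join_digits(remove_garbage(tagged)))
```

Removing garbage before merging digits lets a stray no-meaning word between two digits disappear without splitting the number. That ordering is a design choice of this sketch, not something the slide specifies.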

  6. Speech understanding (III)
     • UNDERSTANDING STAGE:
        • Context dependent rules
        • Makes understanding independent of the order in which concepts appear
        • Tries to fill in frames:
             SUBSYSTEM = (radio, cd-player, cassette, ...)
             PARAMETER = (volume, tone, broadcast station, song, ...)
             VALUE = (higher, number, ...)
        • One or several frames for each command
        • More specific rules are executed first
        • Also fills in message strings:
           • With the “reasoning”
           • With the problems found in the understanding stage
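
The frame-filling behaviour can be sketched in Python as follows. Order independence comes from indexing the refined words by tag rather than by position, and "more specific first" is modelled by sorting rules by how many tags they require. The `Rule` structure, the rule names, and the message wording are hypothetical, and the sketch fills a single frame, whereas the system may produce several per command.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple

@dataclass
class Frame:
    subsystem: Optional[str] = None
    parameter: Optional[str] = None
    value: Optional[str] = None

@dataclass
class Rule:
    name: str
    requires: Set[str]     # tags that must all be present, in any order
    fills: Dict[str, str]  # frame slot -> tag whose word fills that slot

def understand(tagged: List[Tuple[str, str]], rules: List[Rule]) -> Tuple[Frame, List[str]]:
    """Fill one frame from (word, tag) pairs; return it with reasoning/problem messages."""
    by_tag = {tag: word for word, tag in tagged}  # word positions are deliberately ignored
    frame = Frame()
    messages: List[str] = []
    # More specific rules (more required tags) run first; sorted() keeps ties in order.
    for rule in sorted(rules, key=lambda r: len(r.requires), reverse=True):
        if rule.requires.issubset(by_tag):
            for slot, tag in rule.fills.items():
                # Only fill empty slots, so earlier (more specific) rules take precedence.
                if getattr(frame, slot) is None:
                    setattr(frame, slot, by_tag[tag])
            messages.append(f"applied rule {rule.name}")
    for slot in ("subsystem", "parameter", "value"):
        if getattr(frame, slot) is None:
            messages.append(f"no {slot} found")
    return frame, messages

RULES = [
    Rule("subsystem", {"subsystem"}, {"subsystem": "subsystem"}),
    Rule("parameter_value", {"parameter", "value"}, {"parameter": "parameter", "value": "value"}),
]
```

“volume radio higher” and “higher volume radio” both yield `Frame("radio", "volume", "higher")`. A command that names only the radio yields messages reporting the missing parameter and value, which is the kind of problem string the slide says is passed on.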

  7. Speech understanding (IV)
     • ACTUATOR:
        • Sends IR commands to the HIFI set
        • Keeps track of the set status
        • Informs the user of the actions performed or the problems found
     USER: “switch the radio on”
     ACTUATOR: “The radio was already on”
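
Keeping the set status is what lets the actuator answer “The radio was already on” instead of sending a redundant command. The sketch below tracks only the power state. The subsystem list, the IR code strings, and the reply wording other than the slide's example are hypothetical, and the `sent` list stands in for the IR-LED output.

```python
from typing import Dict, List

class Actuator:
    """Hypothetical actuator: remembers the HiFi power state and logs IR commands."""

    def __init__(self) -> None:
        self.power: Dict[str, bool] = {"radio": False, "cd-player": False, "cassette": False}
        self.sent: List[str] = []  # stands in for the IR-LED output

    def switch(self, subsystem: str, on: bool) -> str:
        """Switch a subsystem on or off and return the message for the user."""
        if subsystem not in self.power:
            return f"I do not know the subsystem {subsystem}"
        state = "on" if on else "off"
        if self.power[subsystem] == on:
            return f"The {subsystem} was already {state}"  # no IR command needed
        self.sent.append(f"IR:{subsystem}:power_{state}")  # hypothetical IR code name
        self.power[subsystem] = on
        return f"The {subsystem} is now {state}"
```

Calling `switch("radio", True)` twice sends one IR command and then returns “The radio was already on”, matching the dialogue on the slide.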

  8. Speech generation
     • Input: a pattern string of both literals and concepts coming from the rest of the architecture
     • Performs random substitution of concepts by text to achieve a certain degree of naturalness and variety
          Input: “C_SEEING the word higher with an increment meaning, C_THINK that put means an increasing action”
          C_SEEING → “As I can see”, “As I have discovered”, “As it appears”, ...
          C_THINK → “I think”, “I imagine”, “I suppose”, ...
     • Output through a text-to-speech subsystem
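
Random concept substitution can be sketched as a regular-expression replacement. Each concept marker in the pattern is swapped for a randomly chosen alternative, and literals pass through unchanged. The alternatives below are the ones listed on the slide. The assumption that every concept marker matches `C_` followed by capitals or underscores is generalised from the two examples.

```python
import random
import re
from typing import Dict, List, Optional

# Alternatives from the slide; other concepts would be added the same way.
ALTERNATIVES: Dict[str, List[str]] = {
    "C_SEEING": ["As I can see", "As I have discovered", "As it appears"],
    "C_THINK": ["I think", "I imagine", "I suppose"],
}

# Assumed marker syntax: "C_" followed by capitals or underscores.
CONCEPT = re.compile(r"\bC_[A-Z_]+\b")

def generate(pattern: str, rng: Optional[random.Random] = None) -> str:
    """Replace each known concept marker with a random alternative; keep literals as-is."""
    rng = rng or random.Random()

    def pick(match: re.Match) -> str:
        options = ALTERNATIVES.get(match.group(0))
        return rng.choice(options) if options else match.group(0)  # unknown: leave as-is

    return CONCEPT.sub(pick, pattern)
```

Passing a seeded `random.Random` makes the output reproducible for testing. Without one, repeated calls on the slide's input produce varied openings such as “As I have discovered the word higher ...”.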

  9. CONCLUSIONS & FUTURE WORK
     • Supporting ideas of the system:
        • Semantic-like tagging
        • Context dependent rules
        • “garbage” tag
        • Pattern-based generation
        • Random substitution of concepts for generation
     • Desirable new aspects:
        • Use more of the information in the recognised sentences
        • Handle more complex commands
        • Introduce semantic-syntactic parsing of the sentence structure
        • Introduce dialogue to complete information that was not understood or not given, and as a confirmation strategy