ETSI STQ Aurora Distributed Speech Recognition (DSR)

Distributed Speech Recognition
Dieter Kopp, Alcatel Research & Innovation
email: Dieter.Kopp@alcatel.de

DSR system vision

ETSI STQ Aurora
• Participants
  • Alcatel, AT&T, British Telecom, Ericsson, France Telecom, Hewlett Packard, Motorola, Nokia, Qualcomm, Siemens, Sony, Texas Instruments, IBM, Conversay, etc.
• MEL-Cepstrum DSR Front-End & Compression
  • Complete - ETSI standard published in February 2000
• Advanced Noise Robust DSR Front-End
  • Current activity - standard expected in 2002
• DSR Application & Protocols
  • Architecture definition, Client/Server protocol specification & contribution to other standardization groups
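The mel-cepstrum front-end named above relies on a filterbank spaced evenly on the mel frequency scale. The following is a minimal sketch of that spacing only, using the common textbook mel formula. The function names and the parameter values (64 Hz to 4000 Hz, 23 filters) are assumptions chosen for illustration; they are not taken from these slides and this is not the standardized algorithm.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (common textbook formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def filter_centers(f_low_hz, f_high_hz, num_filters):
    """Return filter centre frequencies (Hz) spaced evenly on the mel scale.

    The centres lie strictly between f_low_hz and f_high_hz.
    """
    mel_low = hz_to_mel(f_low_hz)
    mel_high = hz_to_mel(f_high_hz)
    step = (mel_high - mel_low) / (num_filters + 1)
    return [mel_to_hz(mel_low + step * (k + 1)) for k in range(num_filters)]


if __name__ == "__main__":
    # Illustrative values only (assumed, not from the slides).
    for centre in filter_centers(64.0, 4000.0, 23):
        print(f"{centre:8.1f} Hz")
```

The centres are dense at low frequencies and sparse at high frequencies, which is the property the cepstral features inherit from the mel scale.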

ETSI STQ Aurora Front-End Standardization

DSR Elements

Performance Enhancement with DSR Telephone Application & DSR

Benefit of DSR for IP transmission
• Worst performance is obtained when a speech codec is used
• At 50% packet loss, speech recognition over IP using DSR shows only 3% recognition rate degradation, compared to 63% for coded speech transmission (simulation done by BT)
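One reason a feature stream can tolerate packet loss is that missing feature frames can be filled in from their neighbours before recognition. The slides do not say how losses were handled in the BT simulation, so the sketch below is only an assumed, generic concealment scheme (repeat the nearest received frame); the function name and data layout are hypothetical.

```python
def conceal_lost_frames(frames, received):
    """Fill lost feature frames by repeating the nearest received frame.

    frames:   list of feature vectors (content of lost entries is ignored)
    received: list of booleans, True where the frame arrived
    On a tie, the earlier frame is preferred.
    """
    if len(frames) != len(received):
        raise ValueError("frames and received must have the same length")
    good = [i for i, ok in enumerate(received) if ok]
    if not good:
        raise ValueError("no frames received")
    output = []
    for i, ok in enumerate(received):
        if ok:
            output.append(frames[i])
        else:
            nearest = min(good, key=lambda g: (abs(g - i), g))
            output.append(frames[nearest])
    return output


if __name__ == "__main__":
    feats = [[1.0], [2.0], [3.0], [4.0]]
    mask = [True, False, False, True]  # 50% of the frames lost
    print(conceal_lost_frames(feats, mask))
```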

Advanced Noise Robust DSR Front-End
• Goals:
  • Standardization of a Noise Robust DSR Front-End algorithm under the following conditions:
    • 50% recognition rate improvement compared to the existing DSR Front-End standard
    • Latency below 250 ms
    • Complexity below 17 wMOPS
• Selection process using:
  • Aurora database, SpeechDatCar (top 2/3 cluster selection)
  • Large vocabulary database (final winner)
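The three conditions above can be read as a simple pass/fail check on a candidate front-end. The sketch below assumes that the "50% improvement" is measured as a relative reduction in word error rate against the existing standard; the slides do not define the metric, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    word_error_rate: float   # percent, on the evaluation set
    latency_ms: float
    complexity_wmops: float


def relative_improvement(baseline_wer, candidate_wer):
    """Relative error-rate reduction in percent (assumed metric)."""
    return 100.0 * (baseline_wer - candidate_wer) / baseline_wer


def meets_goals(candidate, baseline_wer,
                min_improvement=50.0, max_latency_ms=250.0, max_wmops=17.0):
    """Check a candidate against the three stated conditions."""
    return (
        relative_improvement(baseline_wer, candidate.word_error_rate) >= min_improvement
        and candidate.latency_ms < max_latency_ms
        and candidate.complexity_wmops < max_wmops
    )


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    proposal = Candidate("proposal-A", word_error_rate=10.0,
                         latency_ms=200.0, complexity_wmops=15.0)
    print(meets_goals(proposal, baseline_wer=20.0))
```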

ETSI STQ Aurora Application & Protocols

Application & Protocols Subgroup
• Definition of DSR scenarios for applications
  • Information applications
    • Voice portals (flight, weather, news, movies)
    • Location-specific information
    • Voice navigation of maps
  • Transaction-based applications
    • Finance
    • e-commerce (various)
  • Information capture
    • Dictation
    • Form filling

Application & Protocols Subgroup
• Specification of the Client/Server architecture
• Specification of the communications elements (voice transport interface, synchronization between Client/Server, etc.)
• Contribution to other standardization groups
• Participants: Alcatel, British Telecommunications, Ericsson, HP, IBM, ICSI, Intel Labs, Motorola, Nokia, Qualcomm, SpeechWorks, Temic/Daimler Chrysler, TI, Verbaltek, WaveMakers, Philips, etc.

ETSI/STQ-Aurora Protocol & Application
[Diagram. Labels: Voice Recognition; URL; Voice page; GUI page; Graphic I/O; DSR; Speech output; Mobile Network]
• Open & establish connection, capability negotiation
• Connection to DSR Back-End Server
• Pre-processing data, speech output, contents exchange
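The capability negotiation step can be pictured as the client offering its options in order of preference and the server agreeing to the first one it also supports. The sketch below is a hypothetical illustration of that idea only; the capability names and the dictionary layout are invented here and do not come from any Aurora specification.

```python
def negotiate(client_caps, server_caps):
    """Pick, per capability, the first client-preferred option the server supports.

    client_caps: dict mapping capability name -> options in preference order
    server_caps: dict mapping capability name -> options the server supports
    Raises ValueError if a capability has no common option.
    """
    agreed = {}
    for key, offered in client_caps.items():
        supported = server_caps.get(key, [])
        match = next((option for option in offered if option in supported), None)
        if match is None:
            raise ValueError(f"no common option for {key!r}")
        agreed[key] = match
    return agreed


if __name__ == "__main__":
    # Hypothetical capability names, for illustration only.
    client = {"front_end": ["advanced", "mel_cepstrum"],
              "speech_output": ["tts_stream"]}
    server = {"front_end": ["mel_cepstrum"],
              "speech_output": ["tts_stream", "prompt"]}
    print(negotiate(client, server))
```

Failing loudly when no common option exists keeps the session from proceeding with a front-end the back-end cannot decode.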

Applications for Multi-modal Distributed Speech Recognition
• Advanced applications towards 3G terminals

Multi-modal User Interaction
[Diagram. Labels: Input: Speech, Key, Pen, etc.; Output: Speech, Display; Presentation Manager; Application; Environment; User Profile; Capability; Service Request; Feedback/Interaction]
• Depending on the environment (background noise) and the user preferences, more or less speech I/O could be used
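The dependence of speech I/O on background noise and user preferences can be sketched as a small decision rule. Everything in the example below is an assumption made for illustration: the function name, the 70 dB threshold, and the rule itself are not described in the slides.

```python
def choose_output_modalities(noise_level_db, prefers_speech, has_display,
                             noisy_threshold_db=70.0):
    """Return the output modalities to use, as a list.

    Speech is used when the user prefers it and the environment is quiet
    enough, or when the terminal has no display to fall back on.
    The display is used whenever one is available.
    """
    quiet_enough = noise_level_db < noisy_threshold_db
    use_speech = (prefers_speech and quiet_enough) or not has_display
    modalities = []
    if use_speech:
        modalities.append("speech")
    if has_display:
        modalities.append("display")
    return modalities


if __name__ == "__main__":
    print(choose_output_modalities(50.0, prefers_speech=True, has_display=True))
    print(choose_output_modalities(85.0, prefers_speech=True, has_display=True))
```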

Scenario: Personal Information Manager
[Figure: dialogue between a user and a mobile terminal, with numbered callouts 1 to 5]
• Terminal screen: Mobile '02; How may I help you?; Menu, WAP, Select
• User: Tell me today's schedule!
• System: You have meetings at 9, 11:30 and 1 p.m. You have two meeting requests. Details: 9 until 10 o'clock, phone conference MAP; 10:30, possible meeting with M. Hauser, Marketing; 11:30 until 12:30, lunch ...
• Display (calendar, Tuesday, 26.6.2001): 8:30; 9:00 MAP TP 4; 9:30 phone conference; 10:00; 10:30 ? M. Hauser; 11:00 ? Marketing; 11:30 Lunch; 12:00; 12:30; 1:00 department conv.
• User: Who will be participating in the 9 o'clock phone call?
• Display: 9:00 e-business; O'Neill, Scott; Dumont, Denise
• User: Invite Jim Mason!

Multi-modal Architecture
[Diagram. Labels: DSR decoder; Audio Codec(s); Conversational Engines; DSR encoder; Audio drivers; Voice Browser; Audio I/O; DOM Wrapper; DOM; GUI Browser; Wrapper; GUI drivers; Content Server; MM Shell; GUI I/O; Network Server; Communication Manager; Voice Transport Interface; Gateway and router with voice transport and synchronization support; Network Transport Layer; Synchronization Interface; Synchronization Protocols; Data Transport Interface; HTTP]

P&A next steps
• Voice Transport protocol specification and contribution to 3GPP
• Definition of the Multi-modal Shell function
  • How the synchronization could be managed
• Liaison offer to W3C for the standardization of the DOM interface for VoiceXML
• Contribution to W3C Multi-modality group with ETSI multi-modal architecture
• Common interface to all speech recognizers (IBM activity)

Thank You
