ETSI STQ Aurora Distributed Speech Recognition (DSR)

Distributed Speech Recognition

Dieter Kopp

Alcatel Research & Innovation

email: Dieter.Kopp@alcatel.de

ETSI STQ Aurora
  • Participants
    • Alcatel, AT&T, British Telecom, Ericsson, France Telecom, Hewlett Packard, Motorola, Nokia, Qualcomm, Siemens, Sony, Texas Instruments, IBM, Conversay, etc.
  • MEL-Cepstrum DSR Front-End & Compression
    • Complete - ETSI standard published in February 2000
  • Advanced Noise Robust DSR Front-End
    • Current activity - standard expected in 2002
  • DSR Application & Protocols
    • Architecture definition, Client/Server protocol specification & contribution to other standardization groups
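As a rough illustration of what a mel-cepstrum front-end computes on the terminal, the sketch below derives cepstral coefficients for a single speech frame using the generic textbook pipeline (window, power spectrum, mel filter bank, log, DCT). It is not the algorithm of the ETSI standard: the function names, the 23 filters, the 13 coefficients and the 25 ms frame at 8 kHz are assumptions chosen for illustration, and the compression stage is omitted.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale mapping: 2595 * log10(1 + f / 700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Naive DFT power spectrum (bins 0..N/2) of a Hamming-windowed frame."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append(re * re + im * im)
    return spectrum


def mel_filterbank(num_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    edges_hz = [mel_to_hz(top * i / (num_filters + 1)) for i in range(num_filters + 2)]
    edges = [f * n_fft / sample_rate for f in edges_hz]  # fractional bin positions
    bank = []
    for j in range(1, num_filters + 1):
        left, centre, right = edges[j - 1], edges[j], edges[j + 1]
        weights = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                weights.append((k - left) / (centre - left))
            elif centre < k < right:
                weights.append((right - k) / (right - centre))
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def mel_cepstrum(frame, sample_rate, num_filters=23, num_ceps=13):
    """Log mel filter-bank energies followed by a DCT-II (illustrative sizes)."""
    spectrum = power_spectrum(frame)
    bank = mel_filterbank(num_filters, len(frame), sample_rate)
    log_energies = [math.log(max(sum(w * p for w, p in zip(weights, spectrum)), 1e-10))
                    for weights in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
                for j, e in enumerate(log_energies))
            for c in range(num_ceps)]
```

For example, `mel_cepstrum(frame, 8000)` with a 200-sample `frame` returns a list of 13 coefficients. In a DSR system it is a compact feature vector of this kind, rather than the coded waveform, that is sent to the recognition server.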
ETSI STQ Aurora

Front-End Standardization

Performance Enhancement with DSR

Telephone Application & DSR


Benefit of DSR for IP transmission

  • Worst performance is obtained when using a speech codec
  • At 50% packet loss, speech recognition over IP using DSR shows only a 3% degradation in recognition rate, compared to 63% for coded speech transmission

(Simulation done by BT)
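The slide does not say how lost packets were handled in the BT simulation. One plausible reason a feature stream tolerates loss is that missing feature frames can be approximated from their neighbours. The sketch below illustrates that idea only: the packetisation (two frames per packet), the random loss model and the nearest-frame concealment rule are all assumptions, not the method used in the simulation.

```python
import random


def transmit(frames, loss_rate, frames_per_packet=2, seed=0):
    """Group frames into packets and drop whole packets at random.

    Returns a list as long as `frames`, with None in place of lost frames.
    """
    rng = random.Random(seed)
    received = []
    for start in range(0, len(frames), frames_per_packet):
        packet = frames[start:start + frames_per_packet]
        if rng.random() < loss_rate:
            received.extend([None] * len(packet))  # packet lost
        else:
            received.extend(packet)
    return received


def conceal(received):
    """Replace each lost frame with the nearest received frame.

    Ties between an earlier and a later frame go to the earlier one.
    """
    good = [i for i, frame in enumerate(received) if frame is not None]
    if not good:
        raise ValueError("all frames were lost, nothing to conceal from")
    repaired = []
    for i, frame in enumerate(received):
        if frame is None:
            nearest = min(good, key=lambda g: (abs(g - i), g))
            frame = received[nearest]
        repaired.append(frame)
    return repaired
```

Calling `conceal(transmit(features, 0.5))` yields a gap-free feature sequence that a recogniser could still decode, provided at least one packet arrives.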

Advanced Noise Robust DSR Front-End
  • Goals:
    • Standardization of a Noise Robust DSR Front-End algorithm under the following conditions:
      • 50% recognition rate improvement compared to the existing DSR Front-End standard
      • Latency below 250 ms
      • Complexity below 17 wMOPS
  • Selection process using:
    • Aurora database, SpeechDatCar (top 2/3 cluster selection)
    • Large vocabulary database (final winner)
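The three goals can be read as a simple acceptance check on a candidate front-end. The sketch below assumes that the "50% recognition rate improvement" means a 50% relative reduction in word error rate against the existing standard; the slide does not define the metric, so that interpretation, the `Candidate` type and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    word_error_rate: float   # percent, measured on the evaluation database
    latency_ms: float
    complexity_wmops: float


def relative_improvement(baseline_wer, candidate_wer):
    """Fraction of the baseline's errors that the candidate removes."""
    return (baseline_wer - candidate_wer) / baseline_wer


def meets_goals(candidate, baseline_wer,
                min_improvement=0.5, max_latency_ms=250.0, max_wmops=17.0):
    """Apply the three goals; latency and complexity must be strictly below the limits."""
    return (relative_improvement(baseline_wer, candidate.word_error_rate) >= min_improvement
            and candidate.latency_ms < max_latency_ms
            and candidate.complexity_wmops < max_wmops)
```

With a baseline word error rate of 20%, a candidate at 10% with 200 ms latency and 15 wMOPS passes, while the same candidate at 250 ms latency does not.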
ETSI STQ Aurora

Application & Protocols

Application & Protocols Subgroup
  • Definition of DSR scenarios for applications
    • Information applications
      • Voice portals (flight, weather, news, movies)
      • Location-specific information
      • Voice Navigation of maps
    • Transaction-based applications
      • Finance
      • e-commerce (various)
    • Information capture
      • Dictation
      • Form filling
Application & Protocols Subgroup
  • Specification of the Client/Server architecture
  • Specification of the communications elements (voice transport interface, synchronization between Client/Server, etc.)
  • Contribution to other standardization groups
  • Participants: Alcatel, British Telecommunications, Ericsson, HP, IBM, ICSI, Intel Labs, Motorola, Nokia, Qualcomm, SpeechWorks, Temic/Daimler Chrysler, TI, Verbaltek, WaveMakers, Philips, etc.
ETSI/STQ-Aurora Protocol & Application

[Diagram. Labels: Voice Recognition, URL, Voice page, Graphic I/O, DSR, Speech output, GUI page, Mobile Network]

  • Open & establish connection, Capability negotiation
  • Connection to DSR Back-End Server
  • Pre-processing data, Speech output, contents exchange
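The three phases named on this slide (connection and capability negotiation, connection to the DSR back-end server, data exchange) can be pictured as a small session state machine. The sketch below is purely illustrative: the slide does not specify message formats or capability names, so `DsrSession`, `pick_common` and the capability keys `front_end` and `sample_rate_hz` are hypothetical.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    CONNECTED = auto()         # connection opened and established
    NEGOTIATED = auto()        # capabilities agreed
    BACKEND_ATTACHED = auto()  # connected to the DSR back-end server
    EXCHANGING = auto()        # pre-processed data, speech output, contents


def pick_common(client_caps, server_caps):
    """For each capability, pick the first client preference the server supports."""
    agreed = {}
    for key, preferences in client_caps.items():
        supported = server_caps.get(key, [])
        match = next((p for p in preferences if p in supported), None)
        if match is None:
            raise ValueError(f"no common option for {key!r}")
        agreed[key] = match
    return agreed


class DsrSession:
    """Enforces the order: open, negotiate, attach back-end, exchange."""

    def __init__(self):
        self.phase = Phase.IDLE
        self.agreed = {}

    def _step(self, expected, following):
        if self.phase is not expected:
            raise RuntimeError(
                f"cannot move to {following.name} from {self.phase.name}")
        self.phase = following

    def open(self):
        self._step(Phase.IDLE, Phase.CONNECTED)

    def negotiate(self, client_caps, server_caps):
        if self.phase is not Phase.CONNECTED:
            raise RuntimeError("negotiation requires an open connection")
        self.agreed = pick_common(client_caps, server_caps)
        self.phase = Phase.NEGOTIATED

    def attach_backend(self):
        self._step(Phase.NEGOTIATED, Phase.BACKEND_ATTACHED)

    def start_exchange(self):
        self._step(Phase.BACKEND_ATTACHED, Phase.EXCHANGING)
```

A terminal that prefers a newer front-end but talks to a server supporting only the mel-cepstrum one would, under these assumptions, settle on the mel-cepstrum front-end during negotiation and then proceed to the exchange phase.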


Applications for Multi-modal Distributed Speech Recognition

  • Advanced Applications towards 3G terminals
Multi-modal User Interaction

[Diagram. Labels: Application, Presentation Manager, Capability, Environment, User Profile, Service Request, Feedback/Interaction]

  • Input: Speech, Key, Pen, etc.
  • Output: Speech, Display
  • Depending on the environment (background noise) and the user's preferences, more or less speech I/O can be used
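The idea that speech I/O is used more or less depending on background noise and user preferences can be sketched as a small selection rule. Everything specific here is an assumption made for illustration: the `UserProfile` fields, the 70 dB noise limit and the fallback behaviour are not taken from the presentation.

```python
from dataclasses import dataclass


@dataclass
class UserProfile:
    prefers_speech: bool = True
    has_display: bool = True


def choose_modalities(noise_db, profile, noise_limit_db=70.0):
    """Return the input and output modalities to offer in the current situation."""
    speech_ok = profile.prefers_speech and noise_db < noise_limit_db
    inputs, outputs = [], []
    if speech_ok:
        inputs.append("speech")
        outputs.append("speech")
    if profile.has_display:
        inputs.extend(["key", "pen"])
        outputs.append("display")
    if not inputs:
        # Nothing else is available, so fall back to speech regardless.
        inputs.append("speech")
        outputs.append("speech")
    return {"input": inputs, "output": outputs}
```

In a quiet room this offers speech alongside key, pen and display; in a noisy environment it drops speech and relies on the graphical channel.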


Scenario: Personal Information Manager

  • 1 (user): "Tell me today's schedule!"
  • 2 (system): "You have meetings at 9, 11:30 and 1 p.m. You have two meeting requests."
    • Details: 9 until 10 o’clock, phone-conference MAP; 10:30 possible meeting with M. Hauser, Marketing; 11:30 until 12:30 lunch ...
  • 3 (user): "Who will be participating in the 9 o’clock phone call?"
  • 4 (display): 9:00 e-business: O’Neill, Scott; Dumont, Denise
  • 5 (user): "Invite Jim Mason!"

Calendar display, Tuesday, 26.6.2001:

  • 8:30
  • 9:00 MAP TP 4
  • 9:30 phone conference
  • 10:00
  • 10:30 ? M. Hauser
  • 11:00 ? Marketing
  • 11:30 Lunch
  • 12:00
  • 12:30
  • 1:00 department conv.

Phone display: Mobile `02, "How may I help you?", Menu / WAP / Select


Multi-modal Architecture

[Diagram. Labels grouped by type]

  • Components: Audio I/O, Audio drivers, Audio Codec(s), DSR encoder, DSR decoder, Conversational Engines, Voice Browser, GUI Browser, GUI I/O, GUI drivers, DOM Wrapper (shown twice), MM Shell, Content Server, Network, Server, Communication Manager
  • Interfaces: Voice Transport Interface, Synchronization Interface, Data Transport Interface
  • Transport: Network Transport Layer (shown twice), Synchronization Protocols, HTTP
  • Gateway and router with Voice transport and Synchronization Support


P&A next steps

  • Voice Transport protocol specification and contribution to 3GPP
  • Definition of the Multi-modal Shell function, including how synchronization could be managed
  • Liaison offer to W3C for the standardization of the DOM interface for VoiceXML
  • Contribution to W3C Multi-modality group with ETSI multi-modal architecture
  • Common interface to all speech recognizers (IBM activity)