


Speech in .NET

Sphinx CMU

November 2002


Presenter

  • casey chesnut

  • brains-N-brawn.com

    • Web Services

    • Mobile / Wireless

    • Speech


Audience

  • Java / C++ / VB / C# ?

  • VoiceXml ?

  • SALT / Speech .NET ?


Outline

  • MS Technologies

  • VoiceXml

    • Demo

  • Speech .NET

    • Demo

  • Future

  • Questions (throughout)

  • ~25 slides


MS Technologies

  • Tools

  • Devices

    • Phone

    • Desktop PC

    • Pocket PC

    • Tablet PC


Tools

  • MS Agents

  • SAPI / Speech SDK 5.1 (.NET wrappable)

  • Office

  • AutoPC ???

  • ASP .NET (VoiceXml)

  • (beta) Speech .NET / IE Speech Add-In

  • … SALT Telephony gateway (early 2003)

  • … Pocket IE Speech Add-In (mid 2003)


Devices

  • Phone

    • billions of devices; people are comfortable speaking into them

  • Desktop PC

    • large market, speech input is slower and uncomfortable

  • Pocket PC

    • small market, opportunities for speech (device limitations)

  • Tablet PC

    • new market, speech friendly (slate models don’t have keyboards)


Phone

  • ASP .NET w/ VoiceXml 2.0

    • Production quality now

    • Multiple vendor support

  • Speech .NET VoiceOnly

    • Currently no way to deploy and test over a phone

    • Speech .NET Beta 2 has telephony simulation

    • MS target market for Speech .NET


Desktop PC

  • Web

    • Speech .NET MultiModal

      • Beta 2 IE Speech Add-In

    • Embedded control w/SAPI

    • MS Agents

  • Fat

    • SAPI

    • MS Agents


Pocket PC

  • Web

    • SALT Pocket IE Speech Add-Ins (mid 2003)

  • Fat

    • 3rd parties only

    • MS Reader does not support TTS


Tablet PC - TODAY!

  • Web

    • … same as desktop PC

    • Beta 2 has added support for Tablet PC

    • Virtual keyboard has speech control

  • Fat

    • … same as desktop PC

    • Virtual keyboard has speech control

    • MS Reader should be able to support TTS

    • Digital Ink is currently more compelling to MS


VoiceXml

  • XML-based language

    • Declarative – XML tags, grammars

    • Procedural – JavaScript

      • Telephony Gateway is the client

    • Event driven – Bargein, Goodbye

    • Object oriented – Properties
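
The four traits above can be seen in one small document. The following is a minimal sketch in Python that holds a VoiceXml 2.0 document as a string and checks that it parses as XML. The dialog itself (the field name `drink`, the grammar file `drink.grxml`, the prompt wording) is hypothetical and is not the demo from this talk; the element names are the standard VoiceXml 2.0 ones.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"

# Hypothetical dialog: field name, grammar file, and prompts are illustrative only.
VXML_DOC = f"""<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0" xmlns="{VXML_NS}">
  <!-- Object oriented: properties tune platform behavior -->
  <property name="bargein" value="true"/>
  <!-- Procedural: script evaluated by the voice browser (the telephony gateway) -->
  <script>var attempts = 0;</script>
  <form id="order">
    <!-- Declarative: tags and grammars describe the dialog -->
    <field name="drink">
      <prompt>Would you like coffee or tea?</prompt>
      <grammar src="drink.grxml" type="application/srgs+xml"/>
      <!-- Event driven: handlers for silence or unrecognized input -->
      <noinput>
        <assign name="attempts" expr="attempts + 1"/>
        <reprompt/>
      </noinput>
      <nomatch>Sorry, I did not understand. <reprompt/></nomatch>
      <filled>
        <prompt>You chose <value expr="drink"/>.</prompt>
      </filled>
    </field>
  </form>
</vxml>
"""


def local_names(xml_text: str) -> list[str]:
    """Parse the document and return element names without the namespace."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [el.tag.split("}")[-1] for el in root.iter()]
```

Calling `local_names(VXML_DOC)` only confirms the markup is well-formed; it does not validate against the VoiceXml schema, and vendors may differ in what they accept.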


Usage

  • Input

    • Speech Recognition (Command and Control)

    • DTMF

    • Voice recording and posting to a server

  • Output

    • Text-To-Speech

    • Prerecorded audio files

  • Telephony control

    • Hang-up, Transfers, …
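
As a rough illustration of these input, output, and telephony-control features together, here is a sketch in Python that generates a VoiceXml 2.0 menu. The function name `build_menu`, the phrases, the audio file name, and the dialog targets are all assumptions made for the example; only the `#bye` target is actually defined in the generated document.

```python
import xml.etree.ElementTree as ET


def build_menu(options: dict[str, str], greeting_wav: str) -> str:
    """Return a VoiceXml 2.0 document containing one menu.

    options maps a spoken phrase to the URI of the dialog it selects.
    """
    vxml = ET.Element(
        "vxml", {"version": "2.0", "xmlns": "http://www.w3.org/2001/vxml"}
    )
    menu = ET.SubElement(vxml, "menu", {"id": "main"})

    # Output: a prerecorded audio file, with the element text as TTS fallback.
    prompt = ET.SubElement(menu, "prompt")
    audio = ET.SubElement(prompt, "audio", {"src": greeting_wav})
    audio.text = "Welcome."

    # Input: each choice can be spoken or keyed in as a DTMF digit.
    for digit, (phrase, target) in enumerate(options.items(), start=1):
        choice = ET.SubElement(
            menu, "choice", {"dtmf": str(digit), "next": target}
        )
        choice.text = phrase

    # Telephony control: a separate dialog that hangs up the call.
    bye = ET.SubElement(vxml, "form", {"id": "bye"})
    block = ET.SubElement(bye, "block")
    ET.SubElement(block, "disconnect")

    return ET.tostring(vxml, encoding="unicode")
```

For example, `build_menu({"sales": "#sales", "support": "#support", "goodbye": "#bye"}, "welcome.wav")` yields a menu where the caller can say "goodbye" or press 3 to reach the hang-up dialog. Voice recording and transfers would use the `<record>` and `<transfer>` elements, which are left out here to keep the sketch short.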



VoiceXml

  • DEMO

    • /vxml (VS.NET)

    • Mobile ADK (menu1.aspx)

    • BeVocal


VoiceXml - SALT

  • VoiceXml : ??? :: SALT : Speech .NET

    • Nuance has some WYSIWYG

  • SALT is considered lightweight compared to VoiceXml

  • SALT was submitted to W3C August 2002

  • VoiceXml is v2.0 in W3C

    • Mandatory W3C grammar spec

      • Beta 2 Speech .NET has moved to W3C SRGS

  • VoiceXml has complementary specs (ccXml)

  • VoiceXml is moving to MultiModal as well
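
Since both camps now use the W3C grammar format, a short sketch of what an SRGS XML grammar looks like may help. The Python below builds a simple command-and-control grammar; the function name `command_grammar`, the rule id, and the phrases are hypothetical, and real grammars in either platform would usually carry more structure (semantic tags, nested rules).

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def command_grammar(rule_id: str, phrases: list[str], lang: str = "en-US") -> str:
    """Return an SRGS 1.0 XML grammar that accepts any one of the phrases."""
    grammar = ET.Element(
        "grammar",
        {
            "version": "1.0",
            "xmlns": SRGS_NS,
            "xml:lang": lang,
            "mode": "voice",
            "root": rule_id,  # the rule the recognizer starts from
        },
    )
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    # <one-of> lists the alternatives; each <item> is one allowed phrase.
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")
```

A call such as `command_grammar("drink", ["coffee", "tea"])` produces a grammar file that, in principle, a VoiceXml `<grammar src="...">` or a SALT `<grammar>` element could both point at.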


VoiceXml - SALT

  • VoiceXml = AT&T, Motorola, TellMe, (IBM)

  • SALT = MS, SpeechWorks, Intel, (BeVocal)

  • VoiceXml has multiple vendor support, with venture capital from before the dot-com bubble burst

  • Most vendors will support both specs

  • VoiceXml has ~ 15,000 developers

  • SALT has potentially millions


SALT

  • I have not read the new spec 

  • I remember mentally mapping it to VoiceXml when reading an early spec

  • Why

    • Common spec for MultiModal operation

    • Multiple modes of interaction with the same syntax

    • Speech enabling existing sites

  • Why not VoiceXml

    • Retrofitting MultiModal is harder than starting over


Speech .NET

  • MS implementation of SALT

  • (VoiceWebSolutions + DreamWeaver MX)

  • Some Beta 1 Speech .NET apps still work because SALT has not changed much, but the Speech .NET Beta 2 controls have

  • VoiceXml is not as portable between vendors as it should be; the Speech .NET controls could help mitigate this for SALT

    • i.e. layer of abstraction for voice browser wars



Code

  • Creating static grammars and prompts

  • Very little server-side code

    • Only dynamic grammars / prompts

    • Server-side code mods to better support speech

  • Mainly setting properties on Speech controls and tying them to client-side JavaScript

  • Tie JavaScript to mouse-click events to avoid redundant code
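
To show what tying speech to a mouse-click event can look like, here is a sketch using raw SALT tags rather than the Speech .NET server controls (whose markup is not reproduced here). It is written as a Python string so it can be checked for well-formedness. The ids `txtCity` and `listenCity`, the grammar file `city.grxml`, and the `//city` result path are hypothetical; the tag and attribute names follow my reading of the SALT 1.0 spec and should be treated as assumptions.

```python
import xml.etree.ElementTree as ET

SALT_NS = "http://www.saltforum.org/2002/SALT"

# Hypothetical MultiModal page: one text box that can be typed or spoken into.
SALT_PAGE = f"""<html xmlns:salt="{SALT_NS}">
  <body>
    <input id="txtCity" name="txtCity" type="text"/>
    <!-- The click handler starts recognition, so GUI and speech share one path -->
    <input type="button" value="Speak" onclick="listenCity.Start();"/>
    <salt:listen id="listenCity">
      <salt:grammar src="city.grxml"/>
      <!-- Copy the recognized city from the result into the text box -->
      <salt:bind targetelement="txtCity" value="//city"/>
    </salt:listen>
  </body>
</html>
"""


def bound_fields(page: str) -> list[str]:
    """Return the ids of page elements that SALT bind tags write into."""
    root = ET.fromstring(page)
    return [b.get("targetelement") for b in root.iter(f"{{{SALT_NS}}}bind")]
```

Here `bound_fields(SALT_PAGE)` lists the form fields that speech fills in, which is a quick way to see that the existing text box is reused rather than duplicated for the voice path.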


Impression

  • Separate app layers to reduce complexity

    • Voice UI will be less functional, design is key

  • Learning low-level SALT might be easier than the high-level Speech .NET controls

  • Application controls change this in Beta 2

  • Speech .NET has a great debugger (now server side too), grammar, and prompt tools

  • Speech Control Editor was needed for dev

  • IE Audio meter was needed for MultiModal

  • MultiModal has some time to grow


Speech .NET

  • DEMO

    • Speech .NET Beta 2 (VS .NET)

    • /noHands (VoiceOnly web app)


Industry

  • Wrote 1st VoiceXml article a year ago

    • Received 1st proposal request last month

    • 1 other proposal request since then

  • Wrote 1st Speech .NET article 5 months ago

    • Request for an article from MSDN magazine


Voice Recognition

  • PSTN is less secure than Internet!

    • More accessible, and attacks are easier to automate

  • Traditionally a spoken password OR DTMF PIN, also #

  • Clients always confuse it with speech recognition

  • Not a part of VoiceXml or SALT specs

    • Telephony gateways have proprietary implementations

  • Not useful for identifying somebody

  • Useful for confirming somebody is who they say they are

  • Voice prints have to change when the device changes


Future (MS Speech)

  • SALT Telephony gateways

  • Speech .NET (VoiceOnly then MultiModal)

  • Pocket IE Speech Add-In

  • .NET Fat-client Speech APIs

    • Desktop / Tablet / PPC

  • MS or 3rd party VS .NET VoiceXml controls

  • Possibility for Speech .NET controls to render both SALT and VoiceXml


Future

  • Lots of W3C Voice specs …

  • VoiceXml MultiModal browser

  • Auto (hands-free, navigation, radio)

  • 3G (bridge voice and wireless web)

    • offload Speech processing

    • VOIP or PSTN

    • Pocket PC Phone Edition / SmartPhones

  • IBM recently announced chip for Speech on mobile devices


