Explore the latest advancements in speech technology including synthesis, recognition, and voice user interfaces (VUI). Learn about Speech Server 2007, Vista Speech Recognition, SAPI 5.3, and more. Discover how VUI modes, applications, and tips can optimize user interactions. Dive into demos and articles to understand the workings of speech synthesis and recognition. Enhance your knowledge of VoiceXML, SALT, and Speech Workflow for effective integration. Unlock the potential of speech technology for a seamless user experience.
The Speech Speech • casey chesnut • brains-N-brawn.com • Madison .NET • April 2007
Powerpoint • Page Up • Page Down
brains-N-brawn.com • Pervasive Computing • Tablet PC (MVP 03) • Compact Framework (MVP 04) • Advanced Web Services (MVP 05) • Media Center (MVP 06) • Speech • Location Based Services • Artificial Intelligence • 3D
Outline • Speech Overview • Vista Speech Recognition • SAPI 5.3 / System.Speech • Speech Server 2007
Outline : Speech Overview • Voice User Interface • How does it work? • Synthesis (TTS) • Recognition (SR)
Overview • Speech is just another presentation system • Synthesis = Output to user • Recognition = User input • Voice User Interface (VUI)
VUI Modes • Applications • Multi-modal • Voice-only
VUI Tips • Don't replicate the touch-tone-based menu system • Restrict options on the main (opening) menu to 4 or fewer • Make sure your opening greeting is short • Don't design the app solely for the new user • Focus on task completion above all • What can I say? • http://blogs.msdn.com/anandis_thoughts/archive/2006/02/08/528181.aspx
Speech Synthesis • Text to Speech • Dynamic • Prompt database
How Synthesis Works • Text parsing • Sentences, numbers, symbols, pauses • Natural language processing • Part of speech, tense • Phonemes are looked up or sounded out • Diphones are appended together • Post process audio to add emphasis • Play speech audio
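The first steps on this slide (text parsing, then looking phonemes up or sounding them out) can be sketched in a few lines of Python. This is only an illustration of the idea, not how any SAPI voice is implemented: the tiny `LEXICON`, the ARPAbet-like symbols, and the letter-by-letter fallback are all assumptions made for the example, and the later steps (diphone concatenation, emphasis, audio playback) are left out.

```python
import re

# Toy pronunciation dictionary (hypothetical entries, ARPAbet-like symbols).
LEXICON = {
    "call": ["K", "AO", "L"],
    "me": ["M", "IY"],
    "at": ["AE", "T"],
    "two": ["T", "UW"],
    "five": ["F", "AY", "V"],
}

DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def split_sentences(text):
    """Split after sentence-final punctuation; each break implies a pause."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def normalize(sentence):
    """Lower-case, spell out digits and a few symbols, drop punctuation."""
    sentence = sentence.lower().replace("&", " and ").replace("%", " percent ")
    sentence = re.sub(r"\d", lambda m: " " + DIGITS[int(m.group())] + " ", sentence)
    return re.findall(r"[a-z']+", sentence)


def to_phonemes(word):
    """Look the word up, or crudely 'sound it out' one letter at a time."""
    if word in LEXICON:
        return LEXICON[word]
    return [c.upper() for c in word if c.isalpha()]


def front_end(text):
    """Return one phoneme list per sentence."""
    result = []
    for sentence in split_sentences(text):
        phonemes = []
        for word in normalize(sentence):
            phonemes.extend(to_phonemes(word))
        result.append(phonemes)
    return result
```

Calling `front_end("Call me. At 25!")` yields two phoneme lists, one per sentence, with "25" read digit by digit. A real engine would read it as "twenty-five" and use part-of-speech information to pick between pronunciations.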
How Synthesis Works • Demo • /xnaSynth app • Article • http://www.brains-N-brawn.com/ttSpeech/ • http://www.brains-N-brawn.com/xnaSynth/ (codebase from /ttSpeech)
Speech Recognition • Speech to Text • Dictation • Command and Control
How Recognition Works • Audio signal is processed • Look for signals which might be speech • Phonemes are found in audio signals • Phonemes are mapped to a dictionary of words • Dictation or grammar-based • Apply natural language processing
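The "look for signals which might be speech" step is often approximated with a simple energy threshold. The Python sketch below shows that idea only; the frame length, the threshold value, and the 440 Hz tone standing in for a voice are assumptions for the example, and real recognizers use far more robust detectors before any phoneme matching happens.

```python
import math


def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)
    return energies


def speech_segments(samples, frame_len=160, threshold=0.01):
    """Return (first_frame, last_frame) pairs whose energy exceeds threshold."""
    energies = frame_energies(samples, frame_len)
    segments = []
    start = None
    for i, energy in enumerate(energies):
        if energy > threshold and start is None:
            start = i                          # a loud region begins
        elif energy <= threshold and start is not None:
            segments.append((start, i - 1))    # the loud region ended
            start = None
    if start is not None:                      # signal ended while still loud
        segments.append((start, len(energies) - 1))
    return segments


def demo_signal(rate=8000):
    """0.1 s of silence, 0.2 s of a 440 Hz tone, then 0.1 s of silence."""
    silence = [0.0] * (rate // 10)
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate)
            for n in range(rate // 5)]
    return silence + tone + silence
```

With 160-sample frames at 8 kHz (20 ms each), `speech_segments(demo_signal())` reports a single segment covering the frames that contain the tone. Only those frames would be passed on to phoneme detection.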
How Recognition Works • Demo • /wavReader app • Article • http://www.brains-N-brawn.com/noReco/ • http://www.brains-N-brawn.com/speakerVerify/ (codebase from /noReco)
Outline : Vista Speech Recognizer • Built-in to Vista’s shell • Microphone bar • Language support • Can be trained to improve accuracy • Command-and-control, also Dictation • Automagic application support • Horrible Office integration • UAC problems
Demo • Say what you see • Show numbers • Correct • Spell it • Mouse grid • http://www.istartedsomething.com/20060808/vista-speech-recognition-screencast/
Hack • http://news.bbc.co.uk/1/hi/technology/6320865.stm • /micBarExtend – tap and talk
Narrator • Vista’s screen reader
Outline : SAPI 5.3 / System.Speech • Desktop applications • SAPI 5.3 • System.Speech
SAPI 5.3 • COM based • Native applications • Managed apps which need more control
System.Speech • Part of .NET 3.0 WPF • Managed wrapper built on SAPI 5.3 • Simple API • Standards support (SSML, SRGS) • Language support • Vista Speech Recognition integration • Does not work in XBAP
System.Speech.Synthesis • SpeechSynthesizer • SSML • PromptBuilder • Voices
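SSML is a W3C XML format, so a prompt can be assembled with any XML library before it is handed to a synthesizer. The sketch below builds a small SSML 1.0 document in Python rather than with the .NET classes on this slide; `build_ssml` and its parameters are hypothetical names chosen for the example.

```python
import xml.etree.ElementTree as ET

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(greeting, emphasized, pause_ms=500, lang="en-US"):
    """Return an SSML string: greeting, a pause, then an emphasized phrase."""
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": lang,
    })
    speak.text = greeting + " "
    # <break> inserts silence; time is given in milliseconds here.
    pause = ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    pause.tail = " "
    # <emphasis> asks the voice to stress the enclosed text.
    strong = ET.SubElement(speak, "emphasis", {"level": "strong"})
    strong.text = emphasized
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Welcome to the demo.", "Please listen carefully."))
```

Building the markup through an XML library rather than string concatenation means characters such as `&` or `<` in the prompt text are escaped automatically.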
System.Speech.Synthesis • Demo • /speechSamples - /speechSynth
System.Speech.Recognition • SpeechRecognizer / SpeechRecognitionEngine • SRGS • GrammarBuilder • Advanced users • Deep-link functionality • Mixed initiative
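An SRGS grammar for command and control is also plain XML. As a rough illustration (in Python, not the GrammarBuilder API named on this slide), the hypothetical `build_srgs` helper below emits an SRGS 1.0 grammar whose root rule accepts exactly one phrase from a fixed list.

```python
import xml.etree.ElementTree as ET

SRGS_NS = "http://www.w3.org/2001/06/grammar"


def build_srgs(rule_id, phrases, lang="en-US"):
    """Return an SRGS XML grammar whose root rule matches one of `phrases`."""
    grammar = ET.Element("grammar", {
        "version": "1.0",
        "xmlns": SRGS_NS,
        "xml:lang": lang,
        "root": rule_id,      # the rule a recognizer starts from
        "mode": "voice",
    })
    rule = ET.SubElement(grammar, "rule", {"id": rule_id, "scope": "public"})
    # <one-of> lists the alternatives; each <item> is one allowed phrase.
    one_of = ET.SubElement(rule, "one-of")
    for phrase in phrases:
        ET.SubElement(one_of, "item").text = phrase
    return ET.tostring(grammar, encoding="unicode")


if __name__ == "__main__":
    print(build_srgs("command", ["play", "pause", "next track"]))
```

Restricting the recognizer to a short list like this is what separates grammar-based command and control from open dictation.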
System.Speech.Recognition • Demo • /speechSamples - /speechReco
System.Speech • Demo • /micBarExtend • /mceSapiMcpl • Article • http://www.brains-N-brawn.com/speechSamples/ • http://www.brains-N-brawn.com/micBarExtend/ • http://www.brains-N-brawn.com/mceSapi/ (not updated for Vista yet)
What about Mobile Devices? • OEMs can add VoiceCommand • VoiceCommand is not accessible to developers • Windows Mobile has the SAPI API, but no engines • Platform Builder is supposed to have engines • There are 3rd party engines for purchase
Speech Server 2007 • Telephony Applications • Outgoing calls • Speaker Independent
Speech Server 2007 • VoIP • Language support • VoiceXML / SALT • Workflow development model • Reports • Still in beta
Speech Server 2007 • Speech Synthesis • Inline • PromptBuilder • SSML • Prompt databases • Speech Recognition • Inline • Dynamic Grammar • SRGS • Conversational Grammar Builder • DTMF
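DTMF, the last recognition input listed here, encodes each telephone key as one low and one high tone. The Python sketch below generates a key's tone pair and decodes it with the Goertzel algorithm. It illustrates the signalling scheme, not how Speech Server detects key presses, and the sample rate, tone length, and function names are assumptions for the example.

```python
import math

ROWS = [697, 770, 852, 941]        # low-group frequencies in Hz
COLS = [1209, 1336, 1477, 1633]    # high-group frequencies in Hz
KEYS = ["123A", "456B", "789C", "*0#D"]


def dtmf_frequencies(key):
    """Return the (low, high) frequency pair for a single keypad character."""
    for r, row in enumerate(KEYS):
        c = row.find(key) if len(key) == 1 else -1
        if c >= 0:
            return ROWS[r], COLS[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_tone(key, rate=8000, duration=0.05):
    """Synthesize the two summed sine waves for `key`."""
    low, high = dtmf_frequencies(key)
    return [0.5 * math.sin(2 * math.pi * low * t / rate)
            + 0.5 * math.sin(2 * math.pi * high * t / rate)
            for t in range(int(rate * duration))]


def goertzel_power(samples, freq, rate=8000):
    """Signal power at one frequency, computed with the Goertzel recurrence."""
    coeff = 2 * math.cos(2 * math.pi * freq / rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, rate=8000):
    """Pick the strongest row and column frequency and map them to a key."""
    row = max(range(4), key=lambda i: goertzel_power(samples, ROWS[i], rate))
    col = max(range(4), key=lambda i: goertzel_power(samples, COLS[i], rate))
    return KEYS[row][col]
```

For a clean tone, `detect_key(dtmf_tone("5"))` recovers the key. A production detector would also check minimum energy and tone duration so that speech is not mistaken for a key press.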
VoiceXML • Declarative language • Article • http://www.brains-N-brawn.com/vxml/ • http://www.brains-N-brawn.com/myVoices/ • http://www.brains-N-brawn.com/voiceBio/
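"Declarative" here means the dialog is described as a document and the voice browser decides how to run it. As a minimal sketch, the hypothetical `build_vxml` helper below emits a VoiceXML 2.0 form with one field, a prompt, a list of spoken options, and a confirmation that plays once the field is filled; the form and field names are made up for the example.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def build_vxml(form_id, field_name, prompt_text, options):
    """Return a VoiceXML 2.0 document asking one question with fixed options."""
    vxml = ET.Element("vxml", {"version": "2.0", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": form_id})
    field = ET.SubElement(form, "field", {"name": field_name})
    ET.SubElement(field, "prompt").text = prompt_text
    # Each <option> is a phrase the caller may say to fill the field.
    for option in options:
        ET.SubElement(field, "option").text = option
    # <filled> runs after the field has a value; <value> reads it back.
    filled = ET.SubElement(field, "filled")
    confirm = ET.SubElement(filled, "prompt")
    confirm.text = "You chose "
    ET.SubElement(confirm, "value", {"expr": field_name})
    return ET.tostring(vxml, encoding="unicode")


if __name__ == "__main__":
    print(build_vxml("order", "drink", "Coffee, tea, or milk?",
                     ["coffee", "tea", "milk"]))
```

The document says nothing about reprompting or timeouts. The VoiceXML interpreter supplies that behavior, which is the practical difference from writing the dialog loop by hand.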
SALT • Yet another declarative language • Multimodal support has been dropped • Article • http://www.brains-N-brawn.com/noHands/ • http://www.brains-N-brawn.com/speechMulti/ • http://www.brains-N-brawn.com/tabletWeb/ • http://www.brains-N-brawn.com/mceSalt/
Speech Workflow • Speech Sequence Workflow designer • Speech activities • Statement • QuestionAnswer • Debugging tools
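The two activities named on this slide map onto a simple pattern: a statement plays a prompt, and a question/answer step prompts, listens, and reprompts until it hears something it accepts. The Python sketch below models that pattern only. The class names are borrowed from the slide, but the classes, their parameters, and the `say`/`listen` callbacks are hypothetical and do not mirror the Speech Server 2007 API.

```python
class Statement:
    """Plays a prompt and moves on (hypothetical model, not the real activity)."""

    def __init__(self, prompt):
        self.prompt = prompt

    def run(self, say, listen, results):
        say(self.prompt)


class QuestionAnswer:
    """Asks a question and reprompts until the answer is in the allowed set."""

    def __init__(self, name, prompt, answers, max_attempts=3):
        self.name = name
        self.prompt = prompt
        self.answers = {a.lower() for a in answers}
        self.max_attempts = max_attempts

    def run(self, say, listen, results):
        for _ in range(self.max_attempts):
            say(self.prompt)
            heard = listen().strip().lower()
            if heard in self.answers:
                results[self.name] = heard
                return
            say("Sorry, I did not understand.")
        results[self.name] = None   # gave up after max_attempts


def run_sequence(activities, say, listen):
    """Run activities in order, like a sequential workflow, and collect answers."""
    results = {}
    for activity in activities:
        activity.run(say, listen, results)
    return results


if __name__ == "__main__":
    flow = [
        Statement("Welcome."),
        QuestionAnswer("confirm", "Continue? Say yes or no.", ["yes", "no"]),
        Statement("Goodbye."),
    ]
    print(run_sequence(flow, say=print, listen=input))
```

Passing `say` and `listen` in as functions keeps the dialog logic testable from a console or a scripted list of inputs, without any telephony hardware.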
Speech Workflow • Demo • /speechTextAdv • /speakerVerify • /mobileRecord • Article • http://www.brains-N-brawn.com/speechTextAdv/ • http://www.brains-N-brawn.com/speakerVerify/
Where • Accessibility • Telephony • Telematics • Home automation • Mobile Devices / Tablets • Gaming • Warehouses • …
Possible Future • Telematics • Service Pack for Office Support • Exchange Server 2007 • Speech Server 2007 release • Rumors that Windows Mobile will get a public API • Dictation has room to improve • Hope that System.Speech will ultimately work in XBAP