NEXiWAVE.COM CMUSphinx
Presentation Transcript
slide1

NEXiWAVE.COM CMUSphinx.org

The Use of Open Source Speech Recognition

Nickolay Shmyrev

VP of Research

slide2

The state of speech-related open source products

AT&T Crystal vs Flite Kal

Voxeo Prophecy vs JVoiceXML

G729 vs Speex

slide3

The reasons for such difference are complex

Lack of resources

Lack of knowledge

Patents (PSOLA, US Patent 6766295, ...)

slide4

Even when open source projects exist, they are hard to use

Always a prototype

No documentation

No support/community

No releases

Single-person knowledge

slide5

You could do many very interesting things

Intelligent dialog management

Talk topic detection (speech adsense, anti-advertising)

Transcription of the talks/voicemail

System integration

Accurate conference transcription

Real-Time transcription

slide6

It's quite common to see the following

User on CMUSphinx forum:

We need someone to help us get things going with Sphinx. We are looking for sequences of numbers within an audio file and returning the Timed Results to be analyzed by an external program.

Just using the basic "what you get when you download" sphinx 4 we have a proof of concept, but when it comes to working with the actual grammars/models we are completely lost.
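
As a rough illustration of the post-processing half of that request, here is a minimal sketch. It assumes the recognizer has already produced a list of words with start and end times. The `TimedWord` type, the `find_digit_sequences` helper, and the `min_len` and `max_gap` parameters are hypothetical names made up for this example; they are not part of Sphinx 4 or any CMUSphinx API.

```python
from typing import List, NamedTuple, Tuple


class TimedWord(NamedTuple):
    """One recognized word with its time span in seconds (hypothetical type)."""
    word: str
    start: float
    end: float


# Spoken digit words mapped to their written form.
DIGITS = {
    "zero": "0", "oh": "0", "one": "1", "two": "2", "three": "3",
    "four": "4", "five": "5", "six": "6", "seven": "7",
    "eight": "8", "nine": "9",
}


def find_digit_sequences(
    words: List[TimedWord], min_len: int = 2, max_gap: float = 0.5
) -> List[Tuple[str, float, float]]:
    """Return (digits, start, end) for each run of consecutive digit words.

    A run ends at the first non-digit word, or when the silence between
    two digit words is longer than max_gap seconds. Runs shorter than
    min_len words are dropped.
    """
    runs: List[Tuple[str, float, float]] = []
    current: List[TimedWord] = []

    def flush() -> None:
        # Emit the current run if it is long enough, then reset it.
        if len(current) >= min_len:
            digits = "".join(DIGITS[w.word.lower()] for w in current)
            runs.append((digits, current[0].start, current[-1].end))
        current.clear()

    for w in words:
        if w.word.lower() not in DIGITS:
            flush()
            continue
        if current and w.start - current[-1].end > max_gap:
            flush()
        current.append(w)
    flush()
    return runs
```

The timed runs it returns can be handed to the external analysis program. Getting accurate word timings out of the recognizer in the first place is the hard part, and it is what the grammars and models are for.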

slide7

It's enjoyable to see that a computer can understand some of your commands

Download the package

Set it up with a lot of pain

Discover that it doesn't work (for example, it's very hard to recognize a single word)

Do you know what "fMPE discriminative training, lextree search, count-based language model" mean? You shouldn't have to.

slide8

It's a huge amount of work

Collect a test/training database

Tune and adapt the system

Test extensively

slide9

The Plan

Stable and frequent releases

Packages

Stable and usable API

Good documentation/website

Online support (#cmusphinx @ freenode)

Pure BSD license (no JSAPI)

IVR in Freeswitch

Implementation of missing parts

Commercial support

http://cmusphinx.org

slide10

Voxforge, The Free Speech Database

http://voxforge.org

Free speech recordings, ready for processing

Acoustic databases

Many languages

Free acoustic models

slide11

The Plan for TTS

Support OpenMARY (http://mary.dfki.de)

or

Develop a usable practical TTS, mostly from scratch

slide12

What If You Want It Now

Visit

http://nexiwave.com

Customizable speech recognition, boxes, appliances, web-services

Try it for free

http://searchmymeeting.com