Voice Recognition (Presentation 2)

Voice Recognition (Presentation 2). By: Priya Devi A. S/W Developer, Xsys technologies Bangalore. Preparing Grammar. Grammar file currently extended to 56 tokens. Dynamic generation of grammar file is possible. User Interface for entering grammar token and action is implemented.


Presentation Transcript


  1. Voice Recognition (Presentation 2) By: Priya Devi A., S/W Developer, Xsys technologies, Bangalore.

  2. Preparing Grammar
     • The grammar file has been extended to 56 tokens.
     • The grammar file can be generated dynamically.
     • A user interface for entering a grammar token and its action is implemented.
     • Tokens entered into the grammar file are recognized by the Sphinx recognizer when they are detected in the microphone input.
     • Actions are associated with tokens and stored in a hash table.
     • The grammar file follows the JSGF format.
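The token-to-action hash table mentioned on this slide could look roughly like the sketch below. The slides do not show the actual implementation, so the class name `CommandTable`, its methods, and the case-insensitive lookup are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: maps recognized tokens to actions (not the author's code).
class CommandTable {
    private final Map<String, Runnable> actions = new HashMap<>();

    // Associates a grammar token with the action to run when it is recognized.
    void register(String token, Runnable action) {
        actions.put(normalize(token), action);
    }

    // Runs the action for a recognized token; returns false if none is registered.
    boolean dispatch(String recognized) {
        Runnable action = actions.get(normalize(recognized));
        if (action == null) {
            return false;
        }
        action.run();
        return true;
    }

    // Assumption: tokens are compared ignoring case and surrounding whitespace.
    private static String normalize(String token) {
        return token.trim().toLowerCase(Locale.ROOT);
    }
}
```

A caller would register each token once, for example `table.register("open computer", someAction)`, and then pass every string returned by the recognizer to `dispatch`.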

  3. JSGF (Java Speech Grammar Format)
     • The JSpeech Grammar Format (JSGF) is a platform-independent, vendor-independent textual representation of grammars for use in speech recognition.
     • Example token definitions in JSGF:
         public <desktopAction> = open (Computer | Document | Recycle | Network | <defaultApplication>);
         public <defaultApplication> = player | word | powerpoint | internet | start | tasks;
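Since the slides say the grammar file can be generated dynamically, here is a minimal sketch of how JSGF text might be built from a list of tokens. `JsgfWriter`, `rule`, and `grammar` are hypothetical names, and the grammar name passed in is an assumption. The `#JSGF V1.0;` header and the `grammar <name>;` declaration follow the JSGF specification.

```java
import java.util.List;

// Hypothetical helper that builds JSGF text; not taken from the presentation.
class JsgfWriter {
    // Builds one public rule of alternatives, e.g. public <app> = player | word;
    static String rule(String name, List<String> alternatives) {
        return "public <" + name + "> = " + String.join(" | ", alternatives) + ";";
    }

    // Assembles a complete grammar: self-identifying header, grammar name, then rules.
    static String grammar(String grammarName, List<String> rules) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n");
        sb.append("grammar ").append(grammarName).append(";\n");
        for (String r : rules) {
            sb.append(r).append('\n');
        }
        return sb.toString();
    }
}
```

The resulting string can be written to a `.gram` file for the recognizer to load. How the original application reloads the grammar after regeneration is not described in the slides.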

  4. Major Challenge - Accuracy
     • Accuracy is currently only 45%.
     • Accuracy depends on many factors, such as background noise and microphone quality.
     • Accuracy depends heavily on the recognizer.
     • The recognizer searches the grammar file for tokens using a best-first scheme.
     • The best-first scheme fails when the textual comparison matches the wrong token; for example, “word” can be recognized as “ward”.

  5. Improving Accuracy
     • Limit the size of the grammar file.
     • Remove trivial tokens from the grammar file.
     • All the tokens shown on slide 3 are trivial tokens.
     • Trivial tokens can be identified through .wav file training instead of being included in the grammar file, which reduces the search space of the grammar file.
     • Accuracy increased to 72%.
     • With this, the command and control application is complete.

  6. .WAV file training
     • .wav file training is the process of recording short .wav files in the user’s voice to improve accuracy in a speech recognition application.
     • Users are given an interface for reading a set of lines before they start using the speech recognition application.
     • The set of lines consists of words that are trivial for a command and control application, such as open, close, file, computer, document, player, internet.
     • The recognizer first matches a token against the .wav files. If the token is not found there, the grammar file is searched.
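The lookup order described on this slide (trained .wav files first, grammar file second) can be sketched as a simple fallback chain. The slides do not say how audio is compared against the recorded .wav files, so both stages are represented here as hypothetical functions supplied by the caller. `TwoStageRecognizer` is an illustrative name, not part of Sphinx.

```java
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch of the two-stage lookup; both matchers are placeholders.
class TwoStageRecognizer {
    private final Function<byte[], Optional<String>> trainedMatcher;
    private final Function<byte[], Optional<String>> grammarMatcher;

    TwoStageRecognizer(Function<byte[], Optional<String>> trainedMatcher,
                       Function<byte[], Optional<String>> grammarMatcher) {
        this.trainedMatcher = trainedMatcher;
        this.grammarMatcher = grammarMatcher;
    }

    // Tries the matcher built from the user's recordings first,
    // and searches the grammar only when that stage finds nothing.
    Optional<String> recognize(byte[] audio) {
        Optional<String> hit = trainedMatcher.apply(audio);
        return hit.isPresent() ? hit : grammarMatcher.apply(audio);
    }
}
```

Keeping the trivial tokens in the first stage is what lets the grammar file stay small, which is the search-space reduction that slide 5 credits for the accuracy gain.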

  7. Next task: Dictation
     • Dictation is different from command and control: it requires a large number of words to be recognized.
     • Dictation should start when the “Start dictation” token is recognized; from then on, microphone input should be treated as keystrokes rather than as commands.
     • This is a complex task because the grammar file and .wav file training fail here: the user can say anything, including words that are not present in the grammar file or the .wav files.

  8. Thank You

  9. Voice Recognition (Presentation 3) By: Priya Devi A., S/W Developer, Xsys technologies, Bangalore.

  10. Dictation Functionality
      • Speech dictation treats voice input as text rather than as a command.
      • Spoken words are recognized in the same way as in the command and control application.
      • Once “Start Dictation” is recognized, all subsequent words are treated as text until the recognizer recognizes “Stop Dictation”.
      • After “Stop Dictation” is recognized, the application works as command and control again.
      • Dictation is implemented using the algorithm given on the next slide.

  11. Algorithm: Dictation
      Changes in command and control:
        if Recognizer(spoken_word) = “Start Dictation”
          call RecognizeDictation()
        else
          match spoken_word in hash table
      RecognizeDictation:
        create an object of the Robot class in the java.awt package
        while true
          start recording
          recognized_word = Recognizer(spoken_word)
          if recognized_word != “Stop Dictation”
            for i = 0 to recognized_word.length - 1
              RobotObject.keyPress(key code of recognized_word.charAt(i))
              RobotObject.keyRelease(key code of recognized_word.charAt(i))
            end for
          else
            return
        end while
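The dictation loop above might translate to Java along the following lines. This is a sketch under several assumptions. `KeySink`, `RobotKeySink`, and `Dictation` are hypothetical names. The iterator of phrases stands in for successive results from the recognizer. A space is typed after each phrase, which the slide does not specify. `java.awt.Robot` and `KeyEvent.getExtendedKeyCodeForChar` are real JDK APIs. The Robot-backed sink only presses the unshifted key, so capitals and shifted punctuation would need extra handling.

```java
import java.awt.AWTException;
import java.awt.Robot;
import java.awt.event.KeyEvent;
import java.util.Iterator;

// Hypothetical abstraction so the loop can be exercised without a display.
interface KeySink {
    void type(char c);
}

// Sends characters as key presses through java.awt.Robot (needs a non-headless JVM).
class RobotKeySink implements KeySink {
    private final Robot robot;

    RobotKeySink() throws AWTException {
        robot = new Robot();
    }

    public void type(char c) {
        // Robot expects a virtual key code, not the character's ASCII value.
        int code = KeyEvent.getExtendedKeyCodeForChar(c);
        if (code == KeyEvent.VK_UNDEFINED) {
            return; // no key code known for this character; skip it
        }
        robot.keyPress(code);
        robot.keyRelease(code);
    }
}

class Dictation {
    static final String STOP = "stop dictation";

    // Types each recognized phrase until the stop phrase is heard.
    // Returns the number of phrases that were typed.
    static int run(Iterator<String> phrases, KeySink sink) {
        int typed = 0;
        while (phrases.hasNext()) {
            String phrase = phrases.next();
            if (phrase.trim().equalsIgnoreCase(STOP)) {
                break; // hand control back to command and control mode
            }
            for (int i = 0; i < phrase.length(); i++) {
                sink.type(phrase.charAt(i));
            }
            sink.type(' '); // assumption: separate phrases with a space
            typed++;
        }
        return typed;
    }
}
```

In the real application the call would be something like `Dictation.run(recognizedPhrases, new RobotKeySink())`. Passing a sink that appends to a `StringBuilder` instead makes the loop easy to test without generating actual keystrokes.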

  12. Open Points
      • Paragraph framing for training .wav files.
      • Modification of the dictation functionality, since “Stop Dictation” itself cannot be dictated.
      • Proper GUI creation with logo and standard design.
      • Deployment with the existing system on CentOS.
      • Testing on CentOS.
      • Code cleanup.
      • Complete testing of command and control and dictation.
      • Documentation.
