
Two digits recognition

Two digits recognition. By: Meghal Bhatt. Sphinx4. Sphinx4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language.


Presentation Transcript


  1. Two digits recognition By: Meghal Bhatt

  2. Sphinx4 • Sphinx4 is a state-of-the-art, speaker-independent, continuous speech recognition system written entirely in the Java programming language. • The design of Sphinx4 is based on patterns that have emerged from the design of past systems, as well as on new requirements arising from areas that researchers currently want to explore. • Sphinx4 also includes several implementations of both simple and state-of-the-art techniques.

  3. Sphinx4 • It has different parts: 1) Recognizer 2) Decoder 3) Linguist 4) Acoustic model 5) Front end 6) Instrumentation

  4. Recognizer • It recognizes the audio signal spoken by a human and then searches for the recognized words in the transcript file. • It is capable of recognizing both discrete and continuous speech.

  5. Decoder • The decoder of the Sphinx-4 speech recognition system incorporates several new design strategies that have not been used in HMM-based large-vocabulary speech recognition systems. • It contains the search manager, which performs the search using algorithms such as breadth-first search, best-first search, and depth-first search; it also contains the feature scorer and the pruner. • It uses new aspects of graph construction, with multilevel parallel decoding of independent simultaneous feature streams without the use of compound HMM structures.
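
The pruner's role in the search can be illustrated with a small beam-pruning sketch. This is not Sphinx-4's actual pruner: the `Token` type, the `prune` method, and its `maxTokens` and `beamWidth` parameters are hypothetical names chosen for illustration. It only shows the general idea of keeping the best-scoring hypotheses and discarding the rest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class BeamPrunerSketch {
    /** A search hypothesis: a state label plus its accumulated log score (hypothetical type). */
    static final class Token {
        final String state;
        final double logScore;

        Token(String state, double logScore) {
            this.state = state;
            this.logScore = logScore;
        }
    }

    /**
     * Keeps at most maxTokens hypotheses, and only those whose log score is
     * within beamWidth of the best one. Returned list is sorted best first.
     */
    static List<Token> prune(List<Token> active, int maxTokens, double beamWidth) {
        List<Token> sorted = new ArrayList<>(active);
        sorted.sort(Comparator.comparingDouble((Token t) -> t.logScore).reversed());

        List<Token> kept = new ArrayList<>();
        if (sorted.isEmpty()) {
            return kept;
        }
        double best = sorted.get(0).logScore;
        for (Token t : sorted) {
            // Stop once the list is full or the score falls outside the beam.
            if (kept.size() >= maxTokens || t.logScore < best - beamWidth) {
                break;
            }
            kept.add(t);
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Token> active = Arrays.asList(
                new Token("one", -1.0),
                new Token("nine", -2.0),
                new Token("five", -10.0),
                new Token("four", -3.0));
        for (Token t : prune(active, 2, 5.0)) {
            System.out.println(t.state + " " + t.logScore);
        }
    }
}
```

A narrower beam makes the search faster but increases the risk of discarding the hypothesis that would have turned out to be correct.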

  6. Front end • Performs digital signal processing on the incoming data. The sequence of operations performed by the Sphinx-4 front end creates mel-cepstra from an audio file. • Sphinx-4 also includes pluggable language model support for ASCII formats, and the front end provides a Hamming window, FFT, Mel frequency filter bank, discrete cosine transform, cepstral mean normalization, and feature extraction of cepstra and delta cepstra features.
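
Two of the front-end stages named above, the Hamming window and the Mel frequency filter bank, can be sketched in a few lines. This is a minimal illustration using the textbook formulas, not the Sphinx-4 front-end classes. All method names are hypothetical, and the choice of 40 filters between 130 Hz and 6800 Hz is an assumption read off the acoustic model file names.

```java
import java.util.Arrays;

class FrontEndSketch {
    /** Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (N - 1)). */
    static double[] hammingWindow(int size) {
        if (size < 2) {
            throw new IllegalArgumentException("window size must be at least 2");
        }
        double[] w = new double[size];
        for (int n = 0; n < size; n++) {
            w[n] = 0.54 - 0.46 * Math.cos(2.0 * Math.PI * n / (size - 1));
        }
        return w;
    }

    /** Common Hz-to-mel mapping: mel = 2595 * log10(1 + f / 700). */
    static double hzToMel(double hz) {
        return 2595.0 * Math.log10(1.0 + hz / 700.0);
    }

    /** Inverse of hzToMel. */
    static double melToHz(double mel) {
        return 700.0 * (Math.pow(10.0, mel / 2595.0) - 1.0);
    }

    /** Center frequencies (Hz) of filters spaced evenly on the mel scale. */
    static double[] filterCenters(int numFilters, double minHz, double maxHz) {
        double lo = hzToMel(minHz);
        double hi = hzToMel(maxHz);
        double step = (hi - lo) / (numFilters + 1);
        double[] centers = new double[numFilters];
        for (int i = 0; i < numFilters; i++) {
            centers[i] = melToHz(lo + (i + 1) * step);
        }
        return centers;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(hammingWindow(5)));
        // Assumed settings: 40 filters from 130 Hz to 6800 Hz.
        System.out.println(Arrays.toString(filterCenters(40, 130.0, 6800.0)));
    }
}
```

The filter centers come out closely spaced at low frequencies and widely spaced at high frequencies, which is the point of working on the mel scale.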

  7. Acoustic model • In Sphinx-4 we have two important models that serve different purposes. • TIDIGITS_8gau_13dCep_16k_40mel_130Hz_6800.jar is designed for numbers; use it as the acoustic model when recognizing digits. • WSJ_8gau_13dCep_16k_40mel_130Hz_6800.jar is designed for text data; a user who wants to recognize text should use this model.

  8. Dictionary • The dictionary provides pronunciations for the words found in the language model. Each pronunciation splits a word into a sequence of phonemes, which are found in the acoustic model. • Its main task is to define how each word is pronounced.
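
The mapping from words to phoneme sequences can be sketched as a simple lookup table. This is an illustration only, assuming a plain-text format with one word per line followed by its phonemes. The sample entries use CMU-dictionary-style phoneme symbols and are not taken from the TIDIGITS model's own dictionary file. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

class DictionarySketch {
    // Illustrative entries only: word, then its phoneme sequence.
    static final String SAMPLE =
            "ONE    W AH N\n"
          + "TWO    T UW\n"
          + "THREE  TH R IY\n"
          + "FOUR   F AO R\n"
          + "FIVE   F AY V\n";

    /** Parses "WORD PH1 PH2 ..." lines into a word-to-phonemes map. */
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> dict = new LinkedHashMap<>();
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split("\\s+");
            List<String> phonemes =
                    new ArrayList<>(Arrays.asList(parts).subList(1, parts.length));
            dict.put(parts[0], phonemes);
        }
        return dict;
    }

    /** Returns the phonemes for a word, failing loudly if the word is missing. */
    static List<String> pronounce(Map<String, List<String>> dict, String word) {
        List<String> phonemes = dict.get(word.toUpperCase(Locale.ROOT));
        if (phonemes == null) {
            throw new IllegalArgumentException("missing word: " + word);
        }
        return phonemes;
    }

    public static void main(String[] args) {
        Map<String, List<String>> dict = parse(SAMPLE);
        System.out.println("two -> " + pronounce(dict, "two"));
        System.out.println("four -> " + pronounce(dict, "four"));
    }
}
```

Every phoneme that appears in the dictionary has to exist in the acoustic model, which is why a dictionary and an acoustic model are normally chosen as a matching pair.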

  9. Language model • It contains a representation of the probability of occurrence of words. There are basically two types of model that describe a language: • Statistical language model: estimates the probability distribution of word sequences in natural language. The most widely used statistical language model is the N-gram. • Grammar language model: a grammar describes very simple types of languages for command and control; it is written by hand or generated automatically by plain code.
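
As an example of a grammar generated by plain code, the sketch below builds a JSGF grammar that accepts exactly two spoken digits. The grammar name, the rule names `<twoDigits>` and `<digit>`, and the word list are assumptions made for illustration, not the grammar used in the presentation. The `accepts` helper is a plain-Java stand-in for what the grammar allows, not a JSGF parser.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class TwoDigitGrammarSketch {
    // Assumed vocabulary: the ten digit words.
    static final String[] DIGITS = {
        "zero", "one", "two", "three", "four",
        "five", "six", "seven", "eight", "nine"
    };

    /** Builds the text of a JSGF grammar whose public rule is two digits in a row. */
    static String buildGrammar(String grammarName) {
        StringBuilder sb = new StringBuilder();
        sb.append("#JSGF V1.0;\n\n");
        sb.append("grammar ").append(grammarName).append(";\n\n");
        sb.append("public <twoDigits> = <digit> <digit>;\n\n");
        sb.append("<digit> = ").append(String.join(" | ", DIGITS)).append(";\n");
        return sb.toString();
    }

    /** Checks whether an utterance is exactly two digit words. */
    static boolean accepts(String utterance) {
        String[] words = utterance.trim().toLowerCase(Locale.ROOT).split("\\s+");
        if (words.length != 2) {
            return false;
        }
        List<String> digits = Arrays.asList(DIGITS);
        return digits.contains(words[0]) && digits.contains(words[1]);
    }

    public static void main(String[] args) {
        System.out.print(buildGrammar("digits"));
        System.out.println(accepts("four two"));
        System.out.println(accepts("four"));
    }
}
```

Saving the generated text under the grammar's name with a `.gram` extension is the usual convention, but the exact file location depends on the configuration.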

  10. XML configuration file • The configuration file determines the configuration of the open-source Sphinx-4 framework. This configuration file defines the following: • The different types of components and their names. • The connectivity between the components, that is, how they relate to each other. • The detailed configuration of each of these components.
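
To make the idea of named components and their connectivity concrete, the sketch below reads a small configuration fragment with the JDK's built-in XML parser and lists which components each one refers to. It does not use Sphinx-4's own configuration classes. The sample fragment is abbreviated, the class and method names are hypothetical, and treating a property value as a reference whenever it matches another component's name is a simplifying assumption.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigGraphSketch {
    // Abbreviated sample; the location value is a placeholder, not a real path.
    static final String SAMPLE =
            "<config>\n"
          + "  <component name=\"acousticModel\" type=\"edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel\">\n"
          + "    <property name=\"loader\" value=\"sphinx3Loader\"/>\n"
          + "    <property name=\"unitManager\" value=\"unitManager\"/>\n"
          + "  </component>\n"
          + "  <component name=\"sphinx3Loader\" type=\"edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader\">\n"
          + "    <property name=\"unitManager\" value=\"unitManager\"/>\n"
          + "    <property name=\"location\" value=\"the path to the model folder\"/>\n"
          + "  </component>\n"
          + "  <component name=\"unitManager\" type=\"edu.cmu.sphinx.linguist.acoustic.UnitManager\"/>\n"
          + "</config>\n";

    /** Maps each component name to its property name/value pairs. */
    static Map<String, Map<String, String>> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, Map<String, String>> components = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("component");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element component = (Element) nodes.item(i);
                Map<String, String> props = new LinkedHashMap<>();
                NodeList properties = component.getElementsByTagName("property");
                for (int j = 0; j < properties.getLength(); j++) {
                    Element p = (Element) properties.item(j);
                    props.put(p.getAttribute("name"), p.getAttribute("value"));
                }
                components.put(component.getAttribute("name"), props);
            }
            return components;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse configuration", e);
        }
    }

    /** Lists property values of one component that name another defined component. */
    static List<String> references(Map<String, Map<String, String>> components, String name) {
        List<String> refs = new ArrayList<>();
        for (String value : components.get(name).values()) {
            if (components.containsKey(value)) {
                refs.add(value);
            }
        }
        return refs;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> components = parse(SAMPLE);
        for (String name : components.keySet()) {
            System.out.println(name + " -> " + references(components, name));
        }
    }
}
```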

  11. To use a model in Sphinx-4 • Basically, there are three steps to use a new model in Sphinx-4: • Defining a language model. • Defining a dictionary. • Defining an acoustic model.

  12. Defined language model
      <component name="jsgfGrammar" type="edu.cmu.sphinx.jsapi.JSGFGrammar">
          <property name="grammarLocation" value="the path to the grammar folder"/>
          <property name="dictionary" value="dictionary"/>
          <property name="grammarName" value="the name of grammar"/>
          <property name="logMath" value="logMath"/>
      </component>

  13. Defined acoustic model
      <component name="sphinx3Loader" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.Sphinx3Loader">
          <property name="logMath" value="logMath"/>
          <property name="unitManager" value="unitManager"/>
          <property name="location" value="the path to the model folder"/>
      </component>
      <component name="acousticModel" type="edu.cmu.sphinx.linguist.acoustic.tiedstate.TiedStateAcousticModel">
          <property name="loader" value="sphinx3Loader"/>
          <property name="unitManager" value="unitManager"/>
      </component>

  14. Defined dictionary model
      <component name="dictionary" type="edu.cmu.sphinx.linguist.dictionary.FastDictionary">
          <property name="dictionaryPath" value="the name of the dictionary file"/>
          <property name="fillerPath" value="the name of the filler file"/>
          <property name="addSilEndingPronunciation" value="false"/>
          <property name="allowMissingWords" value="false"/>
          <property name="unitManager" value="unitManager"/>
      </component>

  15. Thank you
