
SPEECH RECOGNITION

SPEECH RECOGNITION. Presented to Dr. V. Kepuska. Presented by Lisa & Za, ECE 5526. How does Sphinx3 work? Sphinx3 uses HMMs with continuous probability density functions. Flat initialization state:


Presentation Transcript


  1. SPEECH RECOGNITION. Presented to Dr. V. Kepuska. Presented by Lisa & Za. ECE 5526.

  2. How does Sphinx3 work?
     • Sphinx3 uses HMMs with continuous probability density functions.
     • Flat initialization sets the starting values of:
       • Mixture weights: the weights given to every Gaussian in the Gaussian mixture corresponding to a state
       • Transition matrices: the matrices of state transition probabilities
       • Means: the means of all Gaussians
       • Variances: the variances of all Gaussians
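The four flat-initialization quantities listed above can be sketched in a few lines. This is a minimal illustration, not Sphinx3's own code: the helper name `flat_init`, the one-dimensional features, the left-to-right topology with an exit state, and the use of the global data mean and variance as the flat starting values are all assumptions made for the example.

```python
from statistics import fmean, pvariance

def flat_init(features, n_states=3, n_mix=2):
    """Flat-start parameters for one HMM (hypothetical helper, 1-D features).

    Every state and every Gaussian receives identical starting values.
    """
    global_mean = fmean(features)
    global_var = pvariance(features, mu=global_mean)

    # Mixture weights: uniform over the Gaussians of each state.
    mixture_weights = [[1.0 / n_mix] * n_mix for _ in range(n_states)]

    # Transition matrix: left-to-right topology assumed. Each state either
    # stays or advances with equal probability; the last column is an exit.
    transitions = [[0.0] * (n_states + 1) for _ in range(n_states)]
    for s in range(n_states):
        transitions[s][s] = 0.5
        transitions[s][s + 1] = 0.5

    # Means and variances: the same global statistics for every Gaussian.
    means = [[global_mean] * n_mix for _ in range(n_states)]
    variances = [[global_var] * n_mix for _ in range(n_states)]

    return {
        "mixture_weights": mixture_weights,
        "transitions": transitions,
        "means": means,
        "variances": variances,
    }

params = flat_init([1.0, 2.0, 3.0, 4.0])
```

Because every Gaussian starts out identical, the later re-estimation passes are what actually differentiate the states.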

  3. How does Sphinx3 work?
     • The forward-backward re-estimation algorithm (Baum-Welch algorithm) is used to repeat training until the likelihood converges.
     • Untied modeling: training for all context-dependent phones (usually triphones) that are seen in the training corpus.
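To make the forward-backward re-estimation step concrete, here is a minimal sketch of Baum-Welch on a toy model. It simplifies heavily and should not be read as Sphinx3's implementation: it assumes discrete emission symbols rather than continuous densities, a single short observation sequence, and no numerical scaling. The function names and the toy parameters are hypothetical.

```python
def forward(obs, pi, A, B):
    """alpha[t][i] = P(obs[0..t], state i at time t)."""
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(obs, A, B):
    """beta[t][i] = P(obs[t+1..] | state i at time t)."""
    n = len(A)
    beta = [[1.0] * n]
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(obs, pi, A, B):
    """One re-estimation pass; also returns the likelihood before the update."""
    n, m, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward(obs, pi, A, B), backward(obs, A, B)
    lik = sum(alpha[-1])
    # gamma[t][i]: posterior probability of being in state i at time t.
    gamma = [[alpha[t][i] * beta[t][i] / lik for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t.
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / lik
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = gamma[0][:]
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][i] for t in range(T) if obs[t] == k) /
              sum(gamma[t][i] for t in range(T))
              for k in range(m)] for i in range(n)]
    return new_pi, new_A, new_B, lik

# Toy two-state, two-symbol model (values chosen only for illustration).
obs = [0, 0, 1, 0, 1, 1, 1, 0, 1, 1]
pi = [0.6, 0.4]
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.8, 0.2], [0.3, 0.7]]

likelihoods = []
for _ in range(5):
    pi, A, B, lik = baum_welch_step(obs, pi, A, B)
    likelihoods.append(lik)
```

Each pass cannot lower the likelihood of the training data, so training is typically stopped once the improvement between passes becomes negligible.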

  4. How does Sphinx3 work?
     • Building decision trees: used to decide which of the HMM states of all the triphones (seen and unseen) are similar to each other.
     • Pruning the decision trees.
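One way to picture how a decision tree groups similar triphone states is to score each candidate question by how much the data likelihood rises when the states are split on it. The sketch below uses that likelihood-gain criterion, which is common in the literature; the slides do not say which criterion Sphinx3 applies. The phone labels, the two questions, the per-state statistics, and the one-dimensional single-Gaussian model are all hypothetical.

```python
import math

def cluster_loglik(stats):
    """Log-likelihood of pooled data under one maximum-likelihood Gaussian.

    stats: list of (count, sum_x, sum_x_squared) tuples, one per state.
    """
    n = sum(c for c, _, _ in stats)
    sx = sum(s for _, s, _ in stats)
    sxx = sum(q for _, _, q in stats)
    var = sxx / n - (sx / n) ** 2
    return -0.5 * n * (math.log(2.0 * math.pi * var) + 1.0)

def split_gain(states, phones):
    """Likelihood gain from splitting states by 'is the left context in phones?'."""
    yes = [s["stats"] for s in states if s["left"] in phones]
    no = [s["stats"] for s in states if s["left"] not in phones]
    if not yes or not no:
        return 0.0  # the question does not separate anything
    parent = [s["stats"] for s in states]
    return cluster_loglik(yes) + cluster_loglik(no) - cluster_loglik(parent)

# Hypothetical states of one base phone, keyed by left context.
states = [
    {"left": "M", "stats": (10, 10.0, 12.5)},
    {"left": "N", "stats": (8, 9.6, 13.52)},
    {"left": "S", "stats": (12, 36.0, 111.0)},
    {"left": "F", "stats": (6, 18.6, 59.16)},
]
questions = {
    "left_is_nasal": {"M", "N"},
    "left_is_M_or_S": {"M", "S"},
}

gains = {name: split_gain(states, phones) for name, phones in questions.items()}
best = max(gains, key=gains.get)  # question chosen for this tree node
```

States that end up in the same leaf share parameters, which is also how an unseen triphone can be assigned a model: it answers the same questions and lands in an existing leaf.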

  5. Our project: Spelling Bees
     • Use Sphinx3 to train on the recorded data.
     • Compare the training data with the test data.
     • Result: we used 224 training recordings and 73 test recordings. The dictionary has 46 words and uses 33 phones. The word error rate is 32.7% and the sentence error rate is 49.3%.
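For reference, the two error rates quoted above are conventionally defined as follows, where S, D, and I are the substitution, deletion, and insertion counts (the #S, #D, #I columns of the scoring output) and N is the number of words in the reference transcripts:

```latex
\mathrm{WER} = \frac{S + D + I}{N},
\qquad
\mathrm{SER} = \frac{\text{number of test sentences with at least one error}}
                    {\text{number of test sentences}}
```

Assuming all 73 test recordings were scored as sentences, a 49.3% sentence error rate is consistent with 36 sentences containing at least one error, since 36/73 ≈ 0.493. The slides do not give the total reference word count, so the 32.7% word error rate cannot be cross-checked in the same way.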

  6. The result:

  7. The result:
     id: (fash-cen2-fash-b)
     Scores: (#C #S #D #I) 3 0 0 0
     REF: a m y
     HYP: a m y

     Speaker sentences 1: moe #utts: 8
     id: (moe-m_oses1)
     Scores: (#C #S #D #I) 4 0 1 1
     REF:  * m o s e S
     HYP:  E m o s e *
     Eval: I         D

     id: (moe-m_oses2)
     Scores: (#C #S #D #I) 5 0 0 0
     REF: m o s e s
     HYP: m o s e s
     Eval:
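The #C, #S, #D, and #I counts in the listing above come from aligning each reference spelling (REF) with the recognizer's hypothesis (HYP). The sketch below reproduces that kind of count with a standard minimum-edit-distance alignment. It is an illustration only: the function name `align_counts` is hypothetical, tokens are lowercased for comparison, and the actual scoring tool may break ties differently.

```python
def align_counts(ref, hyp):
    """Return (correct, substitutions, deletions, insertions) for a
    minimum-edit-distance alignment of two token lists."""
    R, H = len(ref), len(hyp)
    # cost[i][j]: minimum number of edits turning ref[:i] into hyp[:j].
    cost = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(1, R + 1):
        cost[i][0] = i
    for j in range(1, H + 1):
        cost[0][j] = j
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Walk back from the end of both sequences, counting each edit type.
    c = s = d = ins = 0
    i, j = R, H
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            d += 1  # reference token missing from the hypothesis
            i -= 1
        else:
            ins += 1  # extra token in the hypothesis
            j -= 1
    return c, s, d, ins

ref = "m o s e s".split()
hyp = "e m o s e".split()
c, s, d, ins = align_counts(ref, hyp)   # -> (4, 0, 1, 1), as for moe-m_oses1
wer = (s + d + ins) / len(ref)          # -> 0.4 for this one utterance
```

For the utterance `moe-m_oses1`, the extra leading "e" is counted as an insertion and the missing final "s" as a deletion, matching the `4 0 1 1` score line.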

  8. Reference:
     • http://www.speech.cs.cmu.edu/sphinxman/fr4.html
     • Lecture notes from Speech recognition class
     • http://www.ele.uri.edu/~hansenj/projects/ele585/
     • makeraw.m, record.m
