Real-Time Speech Recognition

Presentation Transcript

  1. Real-Time Speech Recognition Thang Pham Advisor: Shane Cotter

  2. Background • Types of speech recognition systems: • Word recognition, • Connected speech recognition, • Speech understanding systems • Simplest: • User-dependent, limited vocabulary • Any system is hard to design because of: • Variations in speech, e.g. amplitude, duration, and signal-to-noise ratio • Background noise • Reverberation noise • Implemented in banking, telephone systems, etc. • IBM ViaVoice

  3. Project Outline • Design a user-dependent speech recognition system to control the movement of a small remote control car • Limited in vocabulary: Backward, Forward, Left, and Right • Trained to my voice • Different speech recognition algorithms were examined to understand the advantages and disadvantages of each system • Linear Predictive Coding • Cepstrum Coefficients • Mel-frequency Cepstrum Coefficients

  4. System Design • TI 6713 DSP Board • Signal flow: Microphone → Sample word at 8 kHz → Segment word into time frames → Find mel-cepstrum coefficients for each frame → Compare input word to a codebook of defined words using dynamic time warping → Recognized word
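
The "segment word into time frames" step can be sketched as follows. This is a minimal illustration, not the project's code: the 8 kHz rate comes from the slide, while the 20 ms frame length, the 10 ms hop, and the name `split_into_frames` are assumptions chosen only for the example.

```python
def split_into_frames(samples, sample_rate=8000, frame_ms=20, hop_ms=10):
    """Cut a recorded word into overlapping fixed-length frames.

    frame_ms and hop_ms are assumed values; the slides do not state them.
    """
    frame_len = sample_rate * frame_ms // 1000  # samples per frame
    hop_len = sample_rate * hop_ms // 1000      # samples between frame starts
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


if __name__ == "__main__":
    one_second = [0.0] * 8000  # one second of silence at 8 kHz
    frames = split_into_frames(one_second)
    print(len(frames), "frames of", len(frames[0]), "samples")
```

Each frame would then be passed to the feature-extraction stage.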

  5. Components List • Texas Instruments TMS320C6713 DSP Board • Audio-Technica ATR35S Omnidirectional Microphone • Two stepper motors

  6. Linear Predictive Coding • Provides a good model of the speech signal. • Can approximate a speech sample at time n from the p past samples: $\hat{s}(n) = \sum_{k=1}^{p} a_k \, s(n-k)$, where $a_1, a_2, \ldots, a_p$ are coefficients that weight each past sample.
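
One common way to obtain the weights a1, a2, …, ap is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method; the slides do not say which estimation procedure the project used, and the function names are illustrative.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over n of frame[n] * frame[n + lag]."""
    n = len(frame)
    return [sum(frame[i] * frame[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(frame, order):
    """Return [a1, ..., ap] such that s(n) ~ sum_k a_k * s(n - k).

    Assumed method: autocorrelation + Levinson-Durbin recursion.
    """
    r = autocorrelation(frame, order)
    a = [0.0] * (order + 1)  # a[k] weights s(n - k); a[0] is unused
    err = r[0]               # prediction error energy so far
    for i in range(1, order + 1):
        if err == 0.0:
            break            # silent or perfectly predicted frame
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err        # reflection coefficient for this order
        updated = a[:]
        updated[i] = k
        for j in range(1, i):
            updated[j] = a[j] - k * a[i - j]
        a = updated
        err *= (1.0 - k * k)
    return a[1:]


def predict(past, coeffs):
    """Predict the next sample; past[-1] is s(n-1), past[-2] is s(n-2), ..."""
    return sum(c * past[-k] for k, c in enumerate(coeffs, start=1))


if __name__ == "__main__":
    decay = [0.9 ** n for n in range(200)]  # s(n) = 0.9 * s(n-1)
    print(lpc_coefficients(decay, 1))       # close to [0.9]
```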

  7. Mel-frequency Cepstrum Coefficients • Research has shown mel-frequency cepstrum coefficients to outperform cepstrum coefficients and LPC for speech recognition • Modeled around the human auditory system (ear): $c_n = \sum_{k=1}^{K} \log(S_k) \cos\left[ n \left( k - \tfrac{1}{2} \right) \tfrac{\pi}{K} \right]$, where $c_n$ is the nth order mel-frequency cepstrum, $S_k$ is the power of the kth mel filter, and $K$ is the number of mel filters. • 12 mel-frequency cepstrum coefficients characterize each time frame
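
As an illustration, the coefficients can be computed from the mel filter powers with a cosine transform of their logarithms. This sketch assumes the widely used form c_n = Σ log(S_k) cos[n (k − 1/2) π / K]; the count of 12 coefficients comes from the slide, while the number of filters, the small floor on the powers, and the function name are assumptions.

```python
import math


def mel_cepstrum(filter_powers, num_coeffs=12):
    """Turn the powers S_k of K mel filters into cepstrum coefficients c_1..c_N.

    Assumes the common cosine-transform-of-log-powers definition.
    """
    num_filters = len(filter_powers)
    # Floor the powers so that log() never sees zero (assumed safeguard).
    log_powers = [math.log(max(s, 1e-12)) for s in filter_powers]
    coeffs = []
    for n in range(1, num_coeffs + 1):
        # k runs from 0 here, so (k + 0.5) equals (k - 1/2) in 1-based indexing.
        c_n = sum(log_powers[k] * math.cos(n * (k + 0.5) * math.pi / num_filters)
                  for k in range(num_filters))
        coeffs.append(c_n)
    return coeffs


if __name__ == "__main__":
    powers = [1.0 + 0.1 * k for k in range(20)]  # made-up filter powers
    print(mel_cepstrum(powers))
```

A frame with identical power in every filter yields coefficients of zero, which reflects that these coefficients describe the shape of the spectrum rather than its overall level.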

  8. Dynamic Time Warping • Arrange the mel-frequency cepstrum coefficients into vectors, one per frame • Use dynamic time warping to find the best match • It compares words that are uttered over different lengths of time • You have a reference word that you are listening for • You have a sampled word • You want to compare both words, sampled and reference, and see if they match • Compare the mel-frequency cepstrum coefficients for each frame of speech
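
The matching step described above can be sketched with the textbook dynamic-programming form of DTW. This is an illustration under assumptions, not the project's implementation: the Euclidean frame distance, the unconstrained step pattern, and the names `dtw_distance` and `recognize` are all choices made for the example.

```python
import math


def frame_distance(u, v):
    """Euclidean distance between two per-frame coefficient vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def dtw_distance(sample, reference):
    """Cost of the cheapest alignment between two sequences of frame vectors."""
    n, m = len(sample), len(reference)
    inf = float("inf")
    # cost[i][j]: best cost of aligning the first i sample frames
    # with the first j reference frames.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(sample[i - 1], reference[j - 1])
            # A frame may stretch (repeat) in either word or advance in both.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def recognize(sample, codebook):
    """Return the codebook word whose template is closest to the sample."""
    return min(codebook, key=lambda word: dtw_distance(sample, codebook[word]))


if __name__ == "__main__":
    # Toy one-dimensional "features"; real frames would hold 12 coefficients.
    codebook = {"Left": [[0.0], [1.0], [2.0]], "Right": [[2.0], [1.0], [0.0]]}
    slow_left = [[0.0], [0.0], [1.0], [2.0], [2.0]]
    print(recognize(slow_left, codebook))
```

In the toy example the sampled word is a slower utterance of the "Left" template, and the warping absorbs the difference in duration.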

  9. Dynamic Time Warping • Example of DTW:

  10. Dynamic Time Warping • Solution:

  11. Results • Sources of error: 1. Noise, e.g. a computer fan or fluorescent light. 2. Voice changes, e.g. a word spoken on one day might not sound the same the next day. 3. Training used only one template per word.

  12. Problems Encountered • Warping the frequency domain onto the mel-frequency scale, e.g. the log10 mapping. • Translating the MATLAB code into C, e.g. dynamic arrays and the debugging process. • Dynamic time warping, both the theory and the algorithm.
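
For reference, a frequently used log10 mapping between hertz and mels is mel = 2595 · log10(1 + f / 700). The sketch below assumes that particular formula, which the slides do not spell out; the 4000 Hz upper limit follows from the 8 kHz sampling rate, and the number of filters and the function names are illustrative.

```python
import math


def hz_to_mel(freq_hz):
    """Map a frequency in Hz onto the mel scale (assumed 2595/700 formula)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filter_centers(num_filters, max_hz=4000.0):
    """Centre frequencies in Hz, spaced evenly on the mel scale up to max_hz."""
    top = hz_to_mel(max_hz)
    return [mel_to_hz(top * (i + 1) / (num_filters + 1))
            for i in range(num_filters)]


if __name__ == "__main__":
    for centre in mel_filter_centers(20):
        print(round(centre, 1))
```

The centres come out densely packed at low frequencies and spread apart at high frequencies, which is the warping the mel scale is meant to provide.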

  13. Future Work • The C implementation of this system is being developed. It will be uploaded onto the TI 6713 DSP Board once it is completed. • The code will be modified to allow the recognition system to operate in real time. • More comprehensive testing of the system will be performed under a variety of noise conditions.

  14. That is all.