Real-Time Speech Recognition
Thang Pham
Advisor: Shane Cotter
Background
• Types of speech recognition systems:
  • Word recognition
  • Connected speech recognition
  • Speech understanding systems
• Simplest case: a user-dependent system with a limited vocabulary
• Any system is hard to design because of:
  • Variations in speech, e.g. amplitude, duration, and signal-to-noise ratio
  • Background noise
  • Reverberation noise
• Implemented in banking, telephone services, etc.
• IBM ViaVoice
Project Outline
• Design a user-dependent speech recognition system to control the movement of a small remote-control car
  • Limited vocabulary: Backward, Forward, Left, and Right
  • Trained to my voice
• Different speech recognition algorithms were examined to understand the advantages and disadvantages of each:
  • Linear Predictive Coding
  • Cepstrum Coefficients
  • Mel-frequency Cepstrum Coefficients
System Design
• Platform: TI 6713 DSP Board
• Processing chain:
  1. Microphone input
  2. Sample word at 8 kHz
  3. Segment word into time frames
  4. Find mel-cepstrum coefficients for each frame
  5. Compare input word to a codebook of defined words using dynamic time warping
  6. Output the recognized word
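The "segment word into time frames" step can be sketched as follows. This is a minimal illustration, not the project's code: the slides give only the 8 kHz sampling rate, so the frame length (256 samples, 32 ms), the hop (128 samples), and the function name `split_into_frames` are assumptions chosen for the example.

```python
import numpy as np

SAMPLE_RATE = 8000  # Hz, as stated on the slide


def split_into_frames(signal, frame_len=256, hop=128):
    """Split a 1-D signal into overlapping frames, one frame per row.

    frame_len and hop are assumed values; the slides do not specify them.
    """
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        # Zero-pad short inputs so that at least one full frame exists.
        signal = np.pad(signal, (0, frame_len - len(signal)))
    n_frames = 1 + (len(signal) - frame_len) // hop
    return np.stack(
        [signal[i * hop : i * hop + frame_len] for i in range(n_frames)]
    )


# Hypothetical one-second recording (silence) just to show the shapes.
word = np.zeros(SAMPLE_RATE)
frames = split_into_frames(word)
```

Each row of `frames` would then be passed to the feature-extraction step; trailing samples that do not fill a whole frame are dropped in this sketch.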
Components List
• Texas Instruments TMS320C6713 DSP Board
• Audio-Technica ATR35S omnidirectional microphone
• Two stepper motors
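Linear Predictive Coding
• Provides a good model of the speech signal.
• Approximates the speech sample at time n as a weighted sum of the p previous samples:
  $$\hat{s}(n) = a_1 s(n-1) + a_2 s(n-2) + \dots + a_p s(n-p) = \sum_{k=1}^{p} a_k s(n-k)$$
  where a_1, a_2, …, a_p are coefficients that weight each past sample.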
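One common way to estimate the weights a_1, …, a_p is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below assumes that method; the slides do not say which estimator the project used, and the function name `lpc_coefficients` is hypothetical.

```python
import numpy as np


def lpc_coefficients(frame, order):
    """Estimate a_1..a_p so that s(n) is approximated by sum_k a_k * s(n-k).

    Uses the autocorrelation method with the Levinson-Durbin recursion
    (an assumed choice; the slides do not name the estimator).
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    # Autocorrelation values r[0], ..., r[order].
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order)  # predictor coefficients a_1..a_p
    err = r[0]           # prediction error energy
    for i in range(order):
        if err <= 0:
            break  # silent or perfectly predicted frame: stop early
        # Reflection coefficient for model order i + 1.
        k = (r[i + 1] - np.dot(a[:i], r[i:0:-1])) / err
        prev = a[:i].copy()
        a[:i] = prev - k * prev[::-1]
        a[i] = k
        err *= 1.0 - k * k
    return a


# A decaying exponential obeys s(n) = 0.9 * s(n-1), so a_1 should be
# close to 0.9 and a_2 close to 0.
frame = 0.9 ** np.arange(200)
a = lpc_coefficients(frame, order=2)
```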
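Mel-frequency Cepstrum Coefficients
• Research has shown mel-frequency cepstrum coefficients to perform better than cepstrum coefficients and LPC
• Modeled on the human auditory system (the ear)
• In the commonly used form:
  $$c_n = \sum_{k=1}^{K} \log(S_k) \cos\left[ n \left( k - \tfrac{1}{2} \right) \frac{\pi}{K} \right]$$
  where c_n is the nth-order mel-frequency cepstrum coefficient, S_k is the power of the kth mel filter, and K is the number of mel filters.
• 12 mel-frequency cepstrum coefficients characterize each time frame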
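Given the mel filter powers S_k for one frame, the coefficients c_n can be computed with a cosine transform of the log powers. The sketch below assumes the widely used form c_n = sum over k of log(S_k) * cos(n (k - 1/2) pi / K); the number of filters (K = 20 in the usage lines), the natural logarithm, and the function name are assumptions, while the 12 coefficients per frame come from the slide.

```python
import numpy as np


def mfcc_from_filter_powers(S, n_coeffs=12):
    """Return c_1..c_n_coeffs from the powers S_1..S_K of K mel filters."""
    S = np.asarray(S, dtype=float)
    K = len(S)
    # Floor the powers so that log() never sees zero. The log base only
    # rescales the coefficients; natural log is an assumed choice here.
    log_S = np.log(np.maximum(S, 1e-12))
    k = np.arange(1, K + 1)         # filter index 1..K
    n = np.arange(1, n_coeffs + 1)  # cepstrum order 1..n_coeffs
    basis = np.cos(np.outer(n, k - 0.5) * np.pi / K)
    return basis @ log_S


# Hypothetical filter-bank output for one frame: 20 filters, flat spectrum.
flat_powers = np.full(20, np.e)
c = mfcc_from_filter_powers(flat_powers)
```

A flat spectrum carries no spectral shape, so every coefficient from c_1 upward comes out as zero (up to rounding), which is a quick sanity check for the cosine basis.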
Dynamic Time Warping
• Arrange the mel-frequency cepstrum coefficients of each frame into vectors
• Use dynamic time warping to find the best match
• Allows comparison of words uttered at different speeds, so that their durations differ
  • There is a reference word that the system is listening for
  • There is a sampled word
  • The goal is to compare the sampled and reference words and see whether they match
• Compare the mel-frequency cepstrum coefficients for each frame of speech
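The comparison described above can be sketched with the textbook DTW recurrence. This is an illustration under assumptions, not the project's implementation: it uses Euclidean distance between frame vectors and the basic three-neighbor step pattern, and the names `dtw_distance`, `recognize`, and the toy codebook are hypothetical.

```python
import numpy as np


def dtw_distance(sample, reference):
    """Accumulated DTW cost between two sequences of feature vectors.

    sample: array of shape (n_frames, n_coeffs)
    reference: array of shape (m_frames, n_coeffs)
    """
    sample = np.asarray(sample, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n, m = len(sample), len(reference)
    # Local cost: Euclidean distance between every pair of frames.
    local = np.linalg.norm(sample[:, None, :] - reference[None, :, :], axis=2)
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Extend the cheapest of the three allowed predecessor paths.
            acc[i, j] = local[i - 1, j - 1] + min(
                acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1]
            )
    return acc[n, m]


def recognize(sample, codebook):
    """Return the codebook word whose template has the lowest DTW cost."""
    return min(codebook, key=lambda word: dtw_distance(sample, codebook[word]))


# Toy one-coefficient "words": the sample is a slowed-down "forward".
codebook = {
    "forward": np.array([[0.0], [1.0], [2.0]]),
    "backward": np.array([[2.0], [1.0], [0.0]]),
}
sample = np.array([[0.0], [0.0], [1.0], [1.0], [2.0]])
best = recognize(sample, codebook)
```

In the toy data the sample has five frames and the template three, yet the warped cost against "forward" is zero because repeated frames are absorbed by the alignment.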
Dynamic Time Warping
• Example of DTW: [figure]
Dynamic Time Warping
• Solution: [figure]
Results
Sources of error:
1. Noise, e.g. computer fan, fluorescent light
2. Voice changes, e.g. a word spoken on one day might not sound the same the next day
3. Training used only one template per word
Problems Encountered
• Warping the frequency domain onto the mel-frequency scale, e.g. the Log10 computation
• Translating MATLAB code into C, e.g. dynamic arrays and the debugging process
• Dynamic time warping, both the theory and the algorithm
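For the frequency warping mentioned in the first item, one widely used convention maps hertz to mels with a base-10 logarithm: mel = 2595 * log10(1 + f / 700). The sketch below assumes that convention; the slides mention Log10 but do not give the exact formula the project used, and the function names are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mels (assumed 2595 * log10 convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


# With 8 kHz sampling, the usable band ends at the 4 kHz Nyquist frequency.
nyquist_mel = hz_to_mel(4000.0)
```

Under this convention 1000 Hz maps to roughly 1000 mels, and mel filter center frequencies are typically spaced evenly between 0 and `nyquist_mel` before being converted back to hertz.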
Future Work
• The C implementation of this system is being developed and will be uploaded onto the TI 6713 DSP Board once it is completed.
• The code will be modified to allow the recognition system to operate in real time.
• More comprehensive testing of the system will be performed under a variety of noise conditions.