Explore neural network topologies, structures, and spectral analysis models for speech recognition. Learn about filter bank implementation and practical examples in speech processing. Discover signal processing methods like Mel-scaling, FFT, IDCT, and Cepstra analysis.
2.5.4.6 Neural Network Structures for Speech Recognition
3.2.2 Implementations of Filter Banks
• Direct convolution is computationally expensive. Instead, each bandpass filter impulse response is assumed to be represented by h_i(n) = w(n) e^(jω_i n), where w(n) is a fixed lowpass filter and ω_i is the center frequency of the i-th band.
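A minimal sketch of this idea, assuming the modulated-lowpass form h_i(n) = w(n) exp(jω_i n). The prototype (a 64-tap Hamming window), the center frequency, and the function name `modulated_bandpass` are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def modulated_bandpass(w, omega_i):
    """Return h_i(n) = w(n) * exp(j * omega_i * n) for a lowpass prototype w."""
    n = np.arange(len(w))
    return w * np.exp(1j * omega_i * n)

# Assumed prototype: 64-tap Hamming window, normalized to unit gain at DC.
w = np.hamming(64)
w = w / w.sum()

# Assumed center frequency in radians per sample (1/8 of the sampling rate).
omega_i = 2 * np.pi * 0.125
h = modulated_bandpass(w, omega_i)

# Magnitude response on a 512-point grid. Modulation shifts the lowpass
# response so that its peak sits at omega_i (bin 0.125 * 512 = 64).
H = np.abs(np.fft.fft(h, 512))
peak_bin = int(np.argmax(H))
print(peak_bin)
```

Because every channel reuses the same prototype w(n), only the modulation frequency changes from band to band, which is what makes an FFT-based implementation possible.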
3.2.2.1 Frequency Domain Interpretation of the Short-Time Fourier Transform
3.2.2.7 Tree Structure Realizations of Nonuniform Filter Banks
Mel-cepstrum method
Time signal → Framing → |FFT|² → Mel-scaling → Logarithm → IDCT → Cepstra (low-order coefficients) → Differentiator → Delta & Delta-Delta Cepstra
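The chain of blocks can be sketched roughly as follows. This is an illustration under stated assumptions, not the slides' implementation: the mel formula (2595 log10(1 + f/700)), triangular filters, Hamming window, frame sizes, and all function names are assumed, and the final transform uses a DCT-II cosine basis where the slide names an IDCT.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(num_filters, n_fft, fs):
    """Triangular filters with centers equally spaced on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), num_filters + 2)
    hz_pts = mel_to_hz(mel_pts)
    freqs = np.linspace(0.0, fs / 2.0, n_fft // 2 + 1)
    fb = np.zeros((num_filters, len(freqs)))
    for i in range(num_filters):
        lo, ctr, hi = hz_pts[i], hz_pts[i + 1], hz_pts[i + 2]
        rising = (freqs - lo) / (ctr - lo)
        falling = (hi - freqs) / (hi - ctr)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def cosine_transform(x, num_coeffs):
    """DCT-II along the last axis, keeping the first num_coeffs outputs."""
    M = x.shape[-1]
    n = np.arange(M)
    k = np.arange(num_coeffs)[:, None]
    basis = np.cos(np.pi * k * (n + 0.5) / M)
    return x @ basis.T

def mel_cepstra(x, fs, N=256, p=128, num_filters=24, num_ceps=13):
    """Framing -> |FFT|^2 -> mel-scaling -> log -> cosine transform."""
    w = np.hamming(N)
    num_frames = 1 + (len(x) - N) // p          # requires len(x) >= N
    frames = np.stack([x[m * p:m * p + N] * w for m in range(num_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    mel_energy = power @ mel_filterbank(num_filters, N, fs).T
    log_mel = np.log(mel_energy + 1e-10)        # small floor avoids log(0)
    return cosine_transform(log_mel, num_ceps)  # low-order coefficients only

def delta(c):
    """Differentiator: time derivative estimated by finite differences."""
    return np.gradient(c, axis=0)

# Usage on synthetic noise (a stand-in for a speech signal).
rng = np.random.default_rng(0)
x = rng.standard_normal(2048)
c = mel_cepstra(x, fs=8000)
d = delta(c)          # delta cepstra
dd = delta(d)         # delta-delta cepstra
features = np.hstack([c, d, dd])
print(features.shape)
```

Stacking the static, delta, and delta-delta coefficients gives one feature vector per frame, here 39 values when 13 cepstra are kept.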
Time-Frequency analysis
• Short-term Fourier Transform
• Standard way of frequency analysis: decompose the incoming signal into its constituent frequency components.
• W(n): windowing function
• N: frame length
• p: step size
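A minimal sketch of the short-term Fourier transform using the symbols above. It assumes the common discrete form X(m, k) = Σ_{n=0}^{N-1} x(n + m·p) W(n) e^(-j2πkn/N); the function name `stft`, the Hamming window, and the test tone are illustrative choices.

```python
import numpy as np

def stft(x, w, p):
    """Short-term Fourier transform.

    x: input signal, w: window of length N (frame length), p: step size.
    Returns an array of shape (num_frames, N // 2 + 1).
    """
    N = len(w)
    num_frames = 1 + (len(x) - N) // p   # requires len(x) >= N
    frames = np.stack([x[m * p:m * p + N] * w for m in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

# Assumed example: a 1 kHz tone sampled at 8 kHz, N = 256, p = 128.
fs, N, p = 8000, 256, 128
t = np.arange(2048) / fs
x = np.sin(2 * np.pi * 1000 * t)
X = stft(x, np.hamming(N), p)

# Each frame peaks at bin 1000 / fs * N = 32.
peak_bins = np.argmax(np.abs(X), axis=1)
print(X.shape, peak_bins[:3])
```

A smaller step size p gives more overlap between frames and a finer time grid, while the frame length N sets the frequency resolution fs / N.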
Critical band integration
• Related to the masking phenomenon: the detection threshold of a sinusoid is elevated when its frequency is close to the center frequency of a narrow-band noise.
• Frequency components within a critical band are not resolved. The auditory system interprets the signals within a critical band as a whole.
Feature orthogonalization
• Spectral values in adjacent frequency channels are highly correlated.
• This correlation leads to a Gaussian model with many parameters: every element of the covariance matrix has to be estimated.
• Decorrelating the features reduces the number of parameters to estimate and so improves parameter estimation.
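The effect of decorrelation can be illustrated with synthetic data. The sketch below assumes channels whose correlation decays as rho^|i-j| (a simple stand-in for correlated spectral channels, not data from the slides) and uses an orthonormal DCT-II as the decorrelating transform; all names and values are illustrative.

```python
import numpy as np

def dct_matrix(M):
    """Orthonormal DCT-II matrix of size M x M."""
    n = np.arange(M)
    k = np.arange(M)[:, None]
    D = np.sqrt(2.0 / M) * np.cos(np.pi * k * (n + 0.5) / M)
    D[0] /= np.sqrt(2.0)
    return D

def mean_abs_offdiag_corr(x):
    """Average |correlation| between different feature dimensions."""
    R = np.corrcoef(x, rowvar=False)
    off = ~np.eye(R.shape[0], dtype=bool)
    return float(np.mean(np.abs(R[off])))

# Assumed model: 16 channels, neighbouring channels correlated with rho = 0.9.
M, rho = 16, 0.9
idx = np.arange(M)
C = rho ** np.abs(idx[:, None] - idx[None, :])

# Draw samples with covariance C, then rotate them with the DCT.
rng = np.random.default_rng(0)
x = rng.standard_normal((5000, M)) @ np.linalg.cholesky(C).T
y = x @ dct_matrix(M).T

before = mean_abs_offdiag_corr(x)
after = mean_abs_offdiag_corr(y)
print(before, after)
```

With 16 channels, a full covariance matrix has 16·17/2 = 136 free parameters, whereas a diagonal one has 16. Once the off-diagonal correlations are small, the diagonal model becomes a reasonable approximation.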