Landmark-Based Speech Recognition: Spectrogram Reading, Support Vector Machines, Dynamic Bayesian Networks, and Phonology. Mark Hasegawa-Johnson, University of Illinois at Urbana-Champaign, USA. Lecture 6: Speech Recognition Acoustic & Auditory Model Features.
Cepstrum = Even Part of Complex Cepstrum
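The identity in this slide title can be checked numerically. The sketch below is illustrative only: it assumes a real, minimum-phase test signal (a decaying exponential, chosen here so that the phase needs no real unwrapping), and the names `complex_cep`, `real_cep`, and `even_part` are hypothetical. The real cepstrum is the inverse DFT of the log magnitude alone; the complex cepstrum also keeps the phase.

```python
import numpy as np

# Assumed test signal: x[n] = a^n, which is minimum phase for |a| < 1.
N = 256
a = 0.5
x = a ** np.arange(N)

X = np.fft.fft(x)
log_mag = np.log(np.abs(X))         # real and even in frequency
phase = np.unwrap(np.angle(X))      # real and odd in frequency

# Complex cepstrum: inverse DFT of log|X| + j*arg(X).
complex_cep = np.fft.ifft(log_mag + 1j * phase).real
# Real cepstrum: inverse DFT of log|X| only.
real_cep = np.fft.ifft(log_mag).real

# Circular time reversal gives complex_cep[-n mod N].
reversed_cep = np.roll(complex_cep[::-1], 1)
even_part = 0.5 * (complex_cep + reversed_cep)

print(np.allclose(real_cep, even_part))
```

The phase term is odd in frequency, so it contributes only to the odd part of the complex cepstrum; dropping it leaves the even part, which is the real cepstrum.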
… but Windowed Cepstral Distance = Distance Between Smoothed Spectra
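A minimal sketch of why this holds, by Parseval's relation: the squared Euclidean distance between two windowed (liftered) cepstra equals the mean squared difference between the log spectra obtained by transforming those windowed cepstra back to frequency. The random frames, the rectangular low-time lifter, and its length `L` are assumptions made for illustration, not values from the lecture.

```python
import numpy as np

def cepstrum(frame):
    """Real cepstrum: inverse DFT of the log magnitude spectrum."""
    log_mag = np.log(np.abs(np.fft.fft(frame)) + 1e-12)
    return np.fft.ifft(log_mag).real

rng = np.random.default_rng(0)
N, L = 256, 20                      # frame length and lifter cutoff (assumed)
frame_a = rng.standard_normal(N)
frame_b = rng.standard_normal(N)

# Symmetric rectangular lifter that keeps quefrencies |m| < L.
lifter = np.zeros(N)
lifter[:L] = 1.0
lifter[-(L - 1):] = 1.0

cep_a = lifter * cepstrum(frame_a)
cep_b = lifter * cepstrum(frame_b)

# Distance measured in the cepstral domain.
cepstral_dist = np.sum((cep_a - cep_b) ** 2)

# The DFT of a liftered cepstrum is a cepstrally smoothed log spectrum.
smooth_a = np.fft.fft(cep_a).real
smooth_b = np.fft.fft(cep_b).real

# Distance measured between the smoothed log spectra (averaged over bins).
spectral_dist = np.mean((smooth_a - smooth_b) ** 2)

print(np.isclose(cepstral_dist, spectral_dist))
```

A different lifter shape changes the amount and kind of smoothing, but the equality between the two distances is unaffected.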
Cepstrally smoothed spectra Distance…
Figure from Niyogi & Burges, 2002
(Hasegawa-Johnson, JASA 2000)
$Y(b) = S(b)^{0.33}$
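As a worked illustration of this power-law compression (output equals band intensity raised to the power 0.33), the sketch below applies it to hypothetical band intensities spaced 10 dB apart. The intensity values and variable names are assumptions chosen only to show the arithmetic.

```python
import numpy as np

# Hypothetical band intensities S(b), each 10 dB above the previous one.
band_intensity = np.array([1.0, 10.0, 100.0, 1000.0])

# Power-law compression with exponent 0.33.
compressed = band_intensity ** 0.33

# Ratio between successive compressed values.
step_ratio = compressed[1:] / compressed[:-1]
print(step_ratio)
```

Each tenfold increase in intensity raises the compressed value by a factor of 10^0.33, roughly 2.14, so a large input dynamic range is squeezed into a much smaller output range.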
$x[n] = \sum_{k=0}^{\infty} a_k\, s[n - d_k]$
$X(z) = R(z)\, S(z)$
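The two relations above say the same thing in time and in frequency: a sum of delayed, scaled copies of s[n] is a convolution with an impulse response r[n] that has a tap of height a_k at delay d_k, so the transforms multiply. The sketch below checks this with three echoes; the gains, delays, and random source signal are hypothetical, and the infinite sum is truncated to a finite one for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
s = rng.standard_normal(200)        # stand-in for the source signal s[n]

gains = [1.0, 0.6, 0.3]             # a_k (hypothetical values)
delays = [0, 35, 80]                # d_k in samples (hypothetical values)

# Time domain: sum of delayed, scaled copies of s[n].
x = np.zeros(len(s) + max(delays))
for a_k, d_k in zip(gains, delays):
    x[d_k:d_k + len(s)] += a_k * s

# Equivalent impulse response r[n] with a tap a_k at each delay d_k.
r = np.zeros(max(delays) + 1)
for a_k, d_k in zip(gains, delays):
    r[d_k] += a_k

# Frequency domain: the DFT of x equals the product of the DFTs of r and s
# when the DFT length covers the full linear convolution.
nfft = len(x)
X = np.fft.fft(x, nfft)
RS = np.fft.fft(r, nfft) * np.fft.fft(s, nfft)

print(np.allclose(x, np.convolve(r, s)), np.allclose(X, RS))
```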
$\log|X_t(\omega)| = \log|R_t(\omega)| + \log|T_t(\omega)| + \log|P_t(\omega)|$
$\log|T_t^*(\omega)| = \sum_k h_k \log|X_{t-k}(\omega)|$
$c_t^*[m] = \sum_k h_k\, c_{t-k}[m]$
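Because the cepstrum is a linear transform of the log spectrum, filtering log spectra across frames and filtering cepstra across frames give the same result, which is why the last two relations share the same coefficients h_k. The sketch below checks this, and also shows that a filter whose coefficients sum to zero removes a log-spectral term that is constant over time. The three-tap filter, the random frames, and the fixed channel term are assumptions for illustration; the lecture does not specify h_k here.

```python
import numpy as np

def filter_over_time(traj, h):
    """Apply out[t] = sum_k h[k] * traj[t - k] along the frame axis (axis 0)."""
    out = np.zeros_like(traj)
    for k, h_k in enumerate(h):
        out[k:] += h_k * traj[:len(traj) - k]
    return out

rng = np.random.default_rng(2)
T, N = 50, 64                               # number of frames, DFT length (assumed)
frames = rng.standard_normal((T, N))
log_spec = np.log(np.abs(np.fft.fft(frames, axis=1)))   # log|X_t(w)|, one row per frame

h = np.array([0.5, 0.0, -0.5])              # hypothetical filter with zero DC gain

# Path A: filter the log spectra over time, then take the cepstrum.
cep_of_filtered = np.fft.ifft(filter_over_time(log_spec, h), axis=1).real
# Path B: take the cepstrum of each frame, then filter over time.
cep = np.fft.ifft(log_spec, axis=1).real
filtered_cep = filter_over_time(cep, h)
print(np.allclose(cep_of_filtered, filtered_cep))

# A time-invariant additive term (for example a fixed log|R(w)|) is cancelled
# once the filter has seen len(h) frames, because the coefficients sum to zero.
log_channel = rng.standard_normal(N)
with_channel = filter_over_time(log_spec + log_channel, h)
without_channel = filter_over_time(log_spec, h)
print(np.allclose(with_channel[len(h) - 1:], without_channel[len(h) - 1:]))
```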
Inner and Outer Hair Cells on the Basilar Membrane. Each column of hair cells is tuned to a slightly different center frequency.
Close-up view of outer hair cells, in a “V” configuration
Gandhi and Hasegawa-Johnson, ICSLP 2004
Excites a Neural Response