
Presentation Transcript


  1. Introduction • The PDF of cepstral coefficients of speech signals is usually regarded as a quasi-Gaussian distribution • Under this assumption, the purpose of moment normalization of order N is then to have • For odd order moments: the Nth-order moment driven to zero • For even order moments: the Nth-order moment driven to a constant (one) • CMS is to normalize the first moment • CN is to normalize the second moment
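
For the two classical special cases the slide names, here is a minimal NumPy sketch of CMS plus variance normalization (CN) applied per cepstral dimension; the (frames × coefficients) layout and the function name are assumptions for this illustration, not from the paper.

```python
import numpy as np

def cms_cn(cepstra: np.ndarray) -> np.ndarray:
    """Cepstral Mean Subtraction (CMS) + variance normalization (CN).

    cepstra: (num_frames, num_coeffs) array of cepstral coefficients.
    Returns features whose first moment is 0 and second moment is 1
    in every dimension, matching the quasi-Gaussian assumption above.
    """
    mean = cepstra.mean(axis=0)          # first-order moment per dimension
    centered = cepstra - mean            # CMS: first moment -> 0
    std = centered.std(axis=0) + 1e-12   # guard against division by zero
    return centered / std                # CN: second moment -> 1
```

After this step every dimension has zero first moment and unit second moment, which is the starting point for the higher-order normalization discussed below.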

  2. [Figure: PDFs of cepstral coefficient DIM=0 over 132097 silence frames, with CN vs. without CN; y-axis ×10⁻³, x-axis −15 to 15]

  3. [Figure: PDFs of cepstral coefficient DIM=11 over 1401927 total frames, with CN vs. without CN]

  4. Higher Order Cepstral Moment Normalization for Robust Speech Recognition • Chang-Wen Hsu and Lin-Shan Lee • Graduate Institute of Communication Engineering, National Taiwan University • ICASSP 2004 • 2004/1/7 • Presented by Chen-Wei Liu

  5. Introduction • In real-world speech recognition applications • Robust features are highly desired for better recognition performance under various noisy conditions • MFCCs have been well accepted as a good choice • Many advanced techniques have been developed based on them • CMS and CN have been two commonly used methods

  6. Introduction • CMS could • Reduce the effects of channel distortion • Avoid further amplifying low-frequency noise • CN could • Reduce the differences in PDF between the clean and noisy signals • It has also been proposed that • Normalizing the third-order cepstral moment may achieve better performance than CMS and CN

  7. HOCMN • What is the so-called Nth moment? • The purpose of moment normalization of order N is then to drive this moment to its target value (zero for odd N, one for even N)
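
The moment definition on this slide was an image that did not survive extraction; the standard empirical quantity it refers to, for one cepstral dimension x_t over an utterance of T frames, is presumably:

```latex
% Nth-order sample moment of one cepstral dimension over T frames
m_N = \frac{1}{T}\sum_{t=1}^{T} x_t^{\,N}
```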

  8. HOCMN • For example • First moment: CMS (first-order moment to zero) • Second moment: CN (second-order moment to one) • With the above, we could extend the moment normalization to higher orders

  9. HOCMN for an even integer • When performing HOCMN with an even integer N • Simply scale the first-order moment normalized coefficients by a constant b • Such normalization usually co-exists with the first-order normalization or CMS

  10. HOCMN for an even integer • We could obtain the scaling constant b from the Nth-order moment of the mean-normalized coefficients, as the derivation below shows • Different N gives different values of b
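
The slide's formula for b was also lost in extraction; assuming (per the introduction) that the even-order target is one, requiring the scaled coefficients b·x_t to have unit Nth-order moment yields:

```latex
% Even N: choose b so the Nth-order moment of b x_t equals one
\frac{1}{T}\sum_{t=1}^{T}(b\,x_t)^{N} = 1
\quad\Longrightarrow\quad
b = \Big(\frac{1}{T}\sum_{t=1}^{T} x_t^{\,N}\Big)^{-1/N} = m_N^{-1/N}
```

which makes explicit how different N gives different values of b.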

  11. HOCMN for an odd integer • It usually also co-exists with the first-order normalization • The required bias could be expressed with the first as well as the (N-1)th order moments

  12. HOCMN for an odd integer • It could be expanded as follows • Since a is small, we can drop the higher-order terms in a
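
Reconstructing the lost expansion from the surrounding text: set the Nth-order moment of the shifted coefficients x_t − a to zero, expand binomially, and keep only the terms linear in the small bias a:

```latex
% Odd N: choose the bias a so the Nth-order moment vanishes
\frac{1}{T}\sum_{t=1}^{T}(x_t - a)^{N} = 0,
\qquad (x_t - a)^{N} = x_t^{N} - N a\,x_t^{N-1} + O(a^{2})
% Dropping the O(a^2) terms:
m_N - N a\, m_{N-1} \approx 0
\quad\Longrightarrow\quad
a \approx \frac{m_N}{N\, m_{N-1}}
```

consistent with slide 11's remark that the bias involves the (N-1)th-order moment.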

  13. HOCMN for both odd & even • Figure 1
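
Putting slides 9-12 together, here is a minimal NumPy sketch of one HOCMN pass under the assumed targets above (one for even N, zero for odd N via the small-bias approximation); the function and variable names are illustrative, not the paper's:

```python
import numpy as np

def hocmn(cepstra: np.ndarray, N: int) -> np.ndarray:
    """One HOCMN pass per cepstral dimension.

    cepstra: (num_frames, num_coeffs) array. The slides note HOCMN
    usually co-exists with first-order normalization, so the mean is
    removed first.
    """
    x = cepstra - cepstra.mean(axis=0)       # first-order normalization
    m_N = np.mean(x ** N, axis=0)            # Nth-order sample moment
    if N % 2 == 0:
        b = m_N ** (-1.0 / N)                # even: scale so m_N = 1
        return b * x
    else:
        m_Nm1 = np.mean(x ** (N - 1), axis=0)
        a = m_N / (N * m_Nm1)                # odd: small-bias shift, m_N ~ 0
        return x - a
```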

  14. Experimental Setup • Aurora 2.0 • Training set • Clean condition / multi condition (8 kinds of noise) • Testing set • A - 4 kinds of noise • B - another 4 kinds of noise • C - 2 kinds of noise with additional channel distortion • HOCMN approaches • Full utterances • Segments

  15. Experimental Results • Baseline Experiments • Clean-condition training for all 3 testing sets A, B, C • Word accuracy was averaged over different noise types and different SNRs (0 dB ~ 20 dB)

  16. Experimental Results • Curve (a) • Full utterance with CN

  17. Experimental Results • Baseline for CN

  18. Experimental Results • Averaged for all SNR values

  19. Experimental Results • Averaged for all noise types

  20. Weighting Observation Vectors for Robust Speech Recognition in Noisy Environments • Zhenyu Xiong, Thomas Fang Zheng, and Wenhu Wu • Tsinghua University, Beijing, 100084, China • ICSLP 2004 • 2004/1/7 • Presented by Chen-Wei Liu

  21. Introduction • A key issue in practical speech recognition • Is to improve the robustness against the mismatch between the training and testing environments • Such as background noise, channel distortion, acoustic echo, etc. • In most recognition systems • The probability of generating a sequence of observation vectors for some model is calculated as the product of the probabilities of generating each observation • Each observation vector is treated with an equal weight

  22. Introduction • In noisy environments, clean speech and background noise are both time-varying • Speech is corrupted slightly at some times and violently at others • Hence • Observation vectors extracted from the slightly-corrupted speech should be considered more reliable

  23. Front-end Module • [Figure: block diagram of the front-end module]

  24. Noise Estimation and Spectral Subtraction • Noise estimation is based on the result of speech/non-speech detection • Spectral subtraction is then applied to the noisy spectrum (a sketch follows below)
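
The slide's formulas were images; the following is a minimal sketch of the generic power-domain spectral subtraction scheme the text describes, with the noise spectrum averaged over frames flagged as non-speech. The VAD labels, flooring factor, and all names are assumptions for illustration, not the paper's exact formulation.

```python
import numpy as np

def spectral_subtraction(power_spec: np.ndarray,
                         is_speech: np.ndarray,
                         floor: float = 0.01) -> np.ndarray:
    """Generic power spectral subtraction.

    power_spec: (num_frames, num_bins) noisy power spectra |X|^2.
    is_speech:  (num_frames,) booleans from speech/non-speech detection.
    Returns an estimate of the clean power spectra |S|^2.
    """
    # Estimate the noise power spectrum from non-speech frames only
    noise = power_spec[~is_speech].mean(axis=0)
    # Subtract, flooring negatives at a fraction of the noisy power
    clean = power_spec - noise
    return np.maximum(clean, floor * power_spec)
```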

  25. Frame SNR Estimation • This indicates the degree to which the current speech frame is uncorrupted by noise
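
The estimation formula itself was lost in extraction; a common per-frame definition consistent with the module above, comparing the subtracted clean estimate against the noise estimate, would be:

```latex
% Hedged reconstruction: per-frame SNR from the estimates above
\mathrm{SNR}_t = 10\log_{10}
\frac{\sum_{k} \hat{S}_t^{2}(k)}{\sum_{k} \hat{N}^{2}(k)}
```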

  26. Weighting Algorithm • Conventional: every frame contributes equally to the likelihood • Weighted: each frame's contribution is scaled by its weighting factor (see below)
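
The slide's two equations did not survive; based on slide 21, the conventional score is the product of per-frame likelihoods, and the natural weighted counterpart (frame weights as exponents, i.e., scaled log-likelihoods) would be:

```latex
% Conventional: equal frame weights
P(O \mid \lambda) = \prod_{t=1}^{T} p(o_t \mid \lambda)
% Weighted: frame t scaled by its weight w_t (log domain: w_t \log p)
\tilde{P}(O \mid \lambda) = \prod_{t=1}^{T} p(o_t \mid \lambda)^{\,w_t}
```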

  27. Weighting Factor • The weighting factor should be an indicator of • The degree to which the corresponding speech frame is uncorrupted by noise

  28. Relationship between SNR and the Weighting Factor • [Figure: mapping from estimated frame SNR to the weighting factor]
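
The actual mapping is only in the lost figure; as a purely hypothetical stand-in that matches the described behavior (weights near 1 at high SNR, near 0 at very low SNR), a sigmoid of the frame SNR could look like:

```python
import numpy as np

def snr_to_weight(snr_db: np.ndarray,
                  midpoint: float = 5.0,
                  slope: float = 0.5) -> np.ndarray:
    """Hypothetical SNR-to-weight mapping (NOT the paper's curve):
    a sigmoid rising from ~0 at low SNR to ~1 at high SNR."""
    return 1.0 / (1.0 + np.exp(-slope * (snr_db - midpoint)))
```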

  29. Experiment Setup • Database • Clean speech with isolated words from 10 male and 10 female speakers • 7893 word utterances in total • Almost every speaker speaks 100 Chinese names 4 times • Noise types • Factory noise, pink noise, white noise, babble noise • SNR levels • -5, 0, 5, 10, 15, 20 dB

  30. Experiment Results • [Figure: recognition results, part 1]

  31. Experiment Results • [Figure: recognition results, part 2]
