Amin Fazel Sharif University of Technology Hossein Sameti, Mohammad T. Manzuri February 2005

RCC-Mean Subtraction Robust Feature and a Comparison of Various Feature-Based Methods for Robust Speech Recognition in the Presence of Telephone Noise


Presentation Transcript


  1. RCC-Mean Subtraction Robust Feature and a Comparison of Various Feature-Based Methods for Robust Speech Recognition in the Presence of Telephone Noise
     Amin Fazel, Hossein Sameti, Mohammad T. Manzuri, February 2005
     Computer Engineering Department, Sharif University of Technology

  2. Outline
     • Introduction
     • Feature-based methods
        • MFCC, RCC, CMN, PLP, RASTA
     • Root Cepstral Coefficients with Mean Normalization
     • Experimental Results
        • Experiment 1 – Sharif CSR and TFARSDAT Database
        • Experiment 2 – HTK CSR and AURORA 2 Database
     • Summary

  3. Effect of Noise on ASR
     • Most ASR systems have two phases
        • Training
        • Operation (testing)
     • A mismatch between the two phases reduces accuracy
     • Mismatch arises from
        • Environment: microphone, babble, distance, transmission channel
        • Speaker
           • Specific speaker: speaking rate, …
           • Different speakers: gender, age, accent, …

  4. Effect of Noise on ASR
     [Figure: noise, classified as stationary or non-stationary]
     • Noise
        • Additive noise
           • Babble, car, subway
           • Exhibition, office, …
        • Convolutional noise
           • Channel, telephone line
           • Microphone effect
           • Distance of speaker to microphone
        • Others
           • Lombard effect, reflections from buildings

  5. Effect of Noise on ASR
     [Figure: simple model in which clean speech is affected by convolutional noise and additive noise, producing corrupted speech]
     • Simple model
     • Robust speech recognition is the study of building speech recognizers that can handle mismatched conditions.
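One common way to write the simple model is y(t) = x(t) * h(t) + n(t): the clean speech x is convolved with a channel h (the convolutional noise), and an additive noise n is added. The Python sketch below simulates that model; the FIR channel taps, the white Gaussian noise, and the SNR-based scaling are illustrative assumptions, not the conditions used in the experiments.

```python
import numpy as np


def corrupt(clean: np.ndarray, channel: np.ndarray, snr_db: float, seed: int = 0) -> np.ndarray:
    """Apply y = (x * h) + n with white Gaussian n scaled to the requested SNR.

    `channel` is a hypothetical FIR impulse response standing in for the
    convolutional noise (microphone, telephone line).
    """
    rng = np.random.default_rng(seed)
    # Convolutional part: filter the clean speech through the channel,
    # truncated to the input length.
    filtered = np.convolve(clean, channel)[: len(clean)]
    # Additive part: white Gaussian noise rescaled so that
    # 10 * log10(P_filtered / P_noise) equals snr_db.
    noise = rng.standard_normal(len(clean))
    target_noise_power = np.mean(filtered ** 2) / 10 ** (snr_db / 10)
    noise *= np.sqrt(target_noise_power / np.mean(noise ** 2))
    return filtered + noise


# Example: a 200 Hz tone sampled at 8 kHz, a two-tap channel, 10 dB SNR.
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 200 * t)
y = corrupt(x, np.array([1.0, 0.5]), snr_db=10.0)
```

Without the additive term the power spectrum factors as |Y(f)|² = |X(f)|²·|H(f)|², so taking the log turns the channel into an additive offset; that is the property cepstral mean subtraction and RASTA exploit.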

  6. Robustness Methods
     [Figure: training phase, speech signal → feature extraction → features → model training → model; testing phase, speech signal → features → model]
     • Signal
        • Speech enhancement
     • Feature
        • Robust feature extraction
     • Model
        • Changing the model parameters
        • Model training

  8. Mel-Frequency Cepstral Coefficients (MFCC)
     • Compute the magnitude-squared of the Fourier transform
     • Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
     • Take the log of the outputs (for RCC, take a root instead of the log)
     • Compute the cepstrum using the discrete cosine transform
     • Smooth by dropping higher-order coefficients
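The MFCC steps above, with the root option that turns them into RCC, can be sketched as follows. The 8 kHz sampling rate (telephone band), Hamming window, 256-point FFT, 20 mel filters, 13 coefficients, and the 1/3 root exponent are assumptions chosen for illustration; the slides do not give these parameters.

```python
import numpy as np


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters with edges equally spaced on the mel scale (0 Hz to Nyquist)."""
    edges_hz = mel_to_hz(np.linspace(0.0, hz_to_mel(sample_rate / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * edges_hz / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):
            fb[i, k] = (k - left) / (centre - left)    # rising slope
        for k in range(centre, right):
            fb[i, k] = (right - k) / (right - centre)  # falling slope
    return fb


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II over the last axis, keeping the first n_coeffs terms."""
    m = x.shape[-1]
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (np.arange(m)[None, :] + 0.5) / m)
    return x @ basis.T


def cepstra(frame, sample_rate=8000, n_fft=256, n_filters=20, n_ceps=13, root=None):
    """MFCC if root is None, otherwise root-compressed cepstra (energies ** root)."""
    windowed = frame * np.hamming(len(frame))
    power = np.abs(np.fft.rfft(windowed, n_fft)) ** 2                  # |FFT|^2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power   # triangular weights
    energies = np.maximum(energies, 1e-10)                             # guard against log(0)
    compressed = np.log(energies) if root is None else energies ** root
    return dct_ii(compressed, n_ceps)                                  # DCT, drop high orders


# Example: one 25 ms frame of a 1 kHz tone at 8 kHz.
frame = np.sin(2 * np.pi * 1000 * np.arange(200) / 8000)
mfcc = cepstra(frame)
rcc = cepstra(frame, root=1 / 3)
```

Only the compression line differs between the two feature types; the power spectrum, filterbank, and DCT stages are shared.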

  9. Temporal Processing
     • Captures the temporal behavior of the spectral envelope and adds robustness:
        • Delta features: first- and second-order differences, computed by regression
        • Cepstral mean subtraction (CMS): normalizes for channel effects and adjusts for spectral slope
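Both temporal steps can be sketched on a (frames × coefficients) array. The regression window of ±2 frames and the replication of the first and last frames at the edges are common choices assumed here, not values taken from the slides.

```python
import numpy as np


def deltas(features, window=2):
    """Regression deltas: d_t = sum_k k * (c[t+k] - c[t-k]) / (2 * sum_k k^2)."""
    n_frames = features.shape[0]
    # Replicate the first and last frames so every t has neighbours.
    padded = np.pad(features, ((window, window), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, window + 1))
    out = np.zeros(features.shape, dtype=float)
    for k in range(1, window + 1):
        ahead = padded[window + k : window + k + n_frames]
        behind = padded[window - k : window - k + n_frames]
        out += k * (ahead - behind)
    return out / denom


def cepstral_mean_subtraction(features):
    """Subtract each coefficient's mean over the utterance (removes a constant channel offset)."""
    return features - features.mean(axis=0, keepdims=True)


# Static + delta + delta-delta features for a toy utterance of 100 frames.
c = np.random.default_rng(0).standard_normal((100, 13))
c = cepstral_mean_subtraction(c)
full = np.hstack([c, deltas(c), deltas(deltas(c))])
```

Second-order deltas are obtained by applying the same regression to the first-order deltas. A fixed channel adds the same vector to every frame in the log-cepstral domain, which is exactly what the mean subtraction cancels.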

  10. Perceptual Linear Prediction (PLP)
     • Compute the magnitude-squared of the Fourier transform
     • Apply triangular frequency weights that represent the effects of peripheral auditory frequency resolution
     • Apply compressive nonlinearities
     • Compute the inverse DFT (a cosine transform here, since the power spectrum is real and even) to obtain autocorrelation values
     • Smooth using autoregressive modeling
     • Compute cepstral coefficients using a linear recursion
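The last three PLP steps (inverse DFT to autocorrelation, autoregressive smoothing, cepstral recursion) can be sketched as follows, assuming the auditory spectrum is already available as power values sampled uniformly from 0 Hz to the Nyquist frequency. The predictor convention A(z) = 1 + a1 z^-1 + … + ap z^-p, the model order of 8, and the log-amplitude cepstral scaling are illustrative choices.

```python
import numpy as np


def levinson_durbin(r, order):
    """Solve for A(z) = 1 + a1 z^-1 + ... + ap z^-p from autocorrelation r[0..order]."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                                   # reflection coefficient
        a[1 : i + 1] = a[1 : i + 1] + k * a[i - 1 :: -1]
        err *= 1.0 - k * k                               # prediction error power
    return a, err


def lpc_to_cepstrum(a, err, n_ceps):
    """Cepstrum of the all-pole model sqrt(err) / A(z) via the standard linear recursion."""
    p = len(a) - 1
    c = np.zeros(n_ceps)
    c[0] = 0.5 * np.log(err)                             # log gain
    for n in range(1, n_ceps):
        acc = -a[n] if n <= p else 0.0
        for k in range(max(1, n - p), n):
            acc -= (k / n) * c[k] * a[n - k]
        c[n] = acc
    return c


def plp_cepstrum(auditory_spectrum, order=8, n_ceps=13):
    # Inverse DFT of the real, even power spectrum gives the autocorrelation.
    r = np.fft.irfft(auditory_spectrum)[: order + 1]
    a, err = levinson_durbin(r, order)
    return lpc_to_cepstrum(a, err, n_ceps)
```

A flat auditory spectrum gives a flat all-pole model, so every cepstral coefficient comes out as zero; in PLP proper the input would be the critical-band, equal-loudness, and intensity-loudness processed spectrum.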

  11. PLP (Cont.)
     • Algorithm: speech signal → critical band analysis → equal-loudness pre-emphasis → intensity-loudness conversion → inverse DFT → find autoregressive coefficients → all-pole model
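The first three blocks of this diagram can be sketched with the formulas usually quoted for PLP (Hermansky, 1990): Bark warping Ω = 6 asinh(f/600), an asymmetric critical-band masking curve, an equal-loudness weighting in the angular frequency ω, and a 0.33-power intensity-loudness law. The band centres every 1 Bark, the pure-Python loops, and evaluating the equal-loudness weight at each band centre are simplifying assumptions of this sketch.

```python
import math


def hz_to_bark(f_hz):
    """PLP Bark warping: 6 * ln(w/1200pi + sqrt((w/1200pi)^2 + 1)) = 6 * asinh(f/600)."""
    return 6.0 * math.asinh(f_hz / 600.0)


def bark_to_hz(z):
    return 600.0 * math.sinh(z / 6.0)


def critical_band_curve(dz):
    """Asymmetric masking curve; dz = (bin Bark) - (band centre Bark)."""
    if dz < -1.3 or dz > 2.5:
        return 0.0
    if dz <= -0.5:
        return 10.0 ** (2.5 * (dz + 0.5))    # steep lower skirt
    if dz < 0.5:
        return 1.0                            # flat top
    return 10.0 ** (-1.0 * (dz - 0.5))       # shallower upper skirt


def equal_loudness(f_hz):
    """Approximate equal-loudness weighting, with w the angular frequency."""
    w2 = (2.0 * math.pi * f_hz) ** 2
    return (w2 + 56.8e6) * w2 ** 2 / ((w2 + 6.3e6) ** 2 * (w2 + 0.38e9))


def auditory_spectrum(power, sample_rate, centres_bark):
    """Critical-band analysis -> equal-loudness pre-emphasis -> 0.33-power loudness law."""
    n_bins = len(power)
    barks = [hz_to_bark(k * sample_rate / (2 * (n_bins - 1))) for k in range(n_bins)]
    out = []
    for z in centres_bark:
        band = sum(p * critical_band_curve(b - z) for p, b in zip(power, barks))
        out.append((equal_loudness(bark_to_hz(z)) * band) ** 0.33)
    return out


# Example: a flat 129-bin power spectrum at 8 kHz, bands every 1 Bark.
top = hz_to_bark(4000.0)
centres = [float(z) for z in range(1, int(top))]
spectrum = auditory_spectrum([1.0] * 129, 8000, centres)
```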

  12. RelAtive SpecTrAl (RASTA) Analysis
     • Makes PLP (and possibly other short-term-spectrum-based techniques) more robust to linear spectral distortions
     • The new spectral estimate is less sensitive to slow variations in the short-term spectrum
     • Filters the temporal trajectory of some function of each spectral value to provide more reliable spectral features
     • The filter is usually band-pass, preserving the linguistically important spectral-envelope modulations (1–16 Hz)

  13. RASTA (Cont.)
     • Algorithm: speech signal → spectral analysis → bank of compressing static nonlinearities → bank of linear band-pass filters → bank of expanding static nonlinearities → optional processing
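The compress, band-pass, expand chain can be sketched per frequency band, using log as the compressing nonlinearity and exp as the expanding one. The filter is the one commonly quoted for RASTA, H(z) = 0.1 (2 + z^-1 - z^-3 - 2z^-4) / (1 - 0.98 z^-1), made causal here by dropping its z^4 advance (a four-frame delay). The pole value differs between implementations, and the optional processing block is omitted.

```python
import math

# Numerator of 0.1 * (2 + z^-1 - z^-3 - 2 z^-4): a regression-style slope over 5 frames.
RASTA_NUM = [0.2, 0.1, 0.0, -0.1, -0.2]


def rasta_filter(trajectory, pole=0.98):
    """Causal IIR band-pass filter applied to one band's trajectory over frames."""
    out = []
    for t in range(len(trajectory)):
        # FIR part: the taps sum to zero, so a constant (channel) offset is rejected.
        acc = sum(b * trajectory[t - i] for i, b in enumerate(RASTA_NUM) if t >= i)
        # Single pole near 1: with the zero at DC it sets the low-frequency edge of the pass band.
        if t > 0:
            acc += pole * out[t - 1]
        out.append(acc)
    return out


def rasta(band_energies, pole=0.98):
    """band_energies: list of frames, each a list of positive critical-band energies."""
    n_frames, n_bands = len(band_energies), len(band_energies[0])
    filtered = []
    for j in range(n_bands):
        log_traj = [math.log(frame[j]) for frame in band_energies]   # compress
        filtered.append(rasta_filter(log_traj, pole))                # band-pass
    # Expand and transpose back to frames x bands.
    return [[math.exp(filtered[j][t]) for j in range(n_bands)] for t in range(n_frames)]
```

Because the numerator taps sum to zero, a fixed log-spectral offset (a stationary channel) produces only a start-up transient that decays at the rate set by the pole.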

  14. RASTA-PLP
     • Algorithm

  16. RCC-Mean Normalization
     • Root cepstral coefficients (RCC)
        • Derived using root compression rather than log compression of the filterbank energies
     • Advantages of RCC over MFCC
        • More robust to noise
        • Faster decoding

  17. RCC-Mean Normalization (Cont.)
     • Mean normalization
     • If we approximate the root with a logarithm
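The slide's equations were not preserved, so the following is one plausible reading offered as an assumption, not the authors' derivation: for a small exponent γ, E^γ = e^(γ ln E) ≈ 1 + γ ln E, so root cepstra behave approximately like scaled log cepstra, and a stationary channel that multiplies each filterbank energy by a fixed gain becomes approximately a constant offset that per-utterance mean subtraction removes. The sketch computes mean-subtracted root cepstra; the exponent 0.1 and the 13 coefficients are hypothetical.

```python
import numpy as np


def dct_ii(x, n_coeffs):
    """Unnormalized DCT-II over the last axis, keeping the first n_coeffs terms."""
    m = x.shape[-1]
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (np.arange(m)[None, :] + 0.5) / m)
    return x @ basis.T


def rcc_mean_normalized(energies, gamma=0.1, n_ceps=13):
    """energies: (frames, filters) filterbank energies of one utterance.

    gamma is a hypothetical root exponent; the slides do not state the value used.
    """
    rcc = dct_ii(np.power(energies, gamma), n_ceps)    # root instead of log compression
    return rcc - rcc.mean(axis=0, keepdims=True)       # per-utterance mean subtraction


rng = np.random.default_rng(0)
E = rng.uniform(0.5, 2.0, size=(50, 20))
feats = rcc_mean_normalized(E)
```

Under exact root compression, a broadband gain g does not become an additive offset; it scales the mean-normalized features by g^γ. Since g^γ ≈ 1 + γ ln g is close to 1 for small γ, the features are then nearly unaffected by the gain, which is the sense in which the log approximation matters.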

  19. Experiment 1
     • Database: TFARSDAT
        • 64 speakers
        • 8 hours of telephone speech
     • ASR: Sharif ASR system
        • HMM-based
        • Training: segmental k-means
        • Search: beam Viterbi

  20. Experiment 1: Test Results

  21. Experiment 2
     • Database: Aurora 2.0
        • Noisy connected-digit recognition
        • 4 hours of training data; 2 hours of test data in 70 noise-type/SNR conditions
     • ASR: HTK
        • HMM-based
        • One model per digit
        • 16 states, with 3 Gaussian mixture components per state

  22. Experiment 2
     • Average results on Aurora 2
     • For each noise, results are averaged over its SNR conditions

  23. Experiment 2
     • Subway noise at various SNRs

  24. Experiment 2
     • Babble noise at various SNRs

  25. Experiment 2
     • Car noise at various SNRs

  26. Experiment 2
     • Exhibition noise at various SNRs

  28. Summary
     • Various robust features were tested
     • RCC_MN was introduced
     • First experiment: RASTA-PLP, although RCC_MN also performed well
     • Second experiment: RCC_MN

  30. Thanks for your patience!
