Compensating speaker-to-microphone playback system for robust speech recognition

So-Young Jeong and Soo-Young Lee
Brain Science Research Center and Department of Electrical Engineering and Computer Science
Korea Advanced Institute of Science and Technology

Motivation
• ASR in mismatched environments
• Environmental information: background noise, acoustic/transmission channel
• Assume environment degradation model
[Figure: degradation model; labels: clean speech, channel, additive noise, distorted speech]
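The degradation model named on this slide can be illustrated with a minimal sketch. It assumes the usual reading of the figure labels: clean speech passes through a linear channel and additive noise is then summed in. The function name `degrade`, the three-tap impulse response, the sine-wave stand-in for speech, and the noise level are all hypothetical choices for illustration, not values from the presentation.

```python
import math
import random

def degrade(clean, channel_ir, noise_std, rng):
    """Filter `clean` with a linear channel, then add white Gaussian noise."""
    distorted = []
    for n in range(len(clean)):
        # Causal FIR convolution: y[n] = sum_k h[k] * x[n - k]
        acc = 0.0
        for k, h in enumerate(channel_ir):
            if n - k >= 0:
                acc += h * clean[n - k]
        distorted.append(acc + rng.gauss(0.0, noise_std))
    return distorted

rng = random.Random(0)
# Stand-in for a clean speech signal: 0.1 s of a 440 Hz tone at 8 kHz.
clean = [math.sin(2 * math.pi * 440.0 * n / 8000.0) for n in range(800)]
channel_ir = [0.6, 0.3, 0.1]  # toy channel impulse response (assumption)
distorted = degrade(clean, channel_ir, noise_std=0.01, rng=rng)
```

With a unit impulse as the channel and zero noise, `degrade` returns the clean signal unchanged, which is a quick sanity check on the model.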

Channel impact on features
Channel Assumption 1
• P.S. (power spectrum)
• F.B. (filter bank)
• L.S. (log spectrum)
• C.S. (cepstrum)
Channel Assumption 2
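The equations for the two channel assumptions did not survive in this transcript, so the sketch below does not reproduce them. It illustrates one common assumption instead: a linear, time-invariant channel whose power gain is constant within each filter-bank band. It reads P.S., F.B., L.S., and C.S. as power spectrum, filter bank, log spectrum, and cepstrum. Under that assumption the channel is multiplicative at the P.S. and F.B. levels and becomes a signal-independent additive offset at the L.S. level. The equal-width bands and the `band_gain` values are hypothetical.

```python
import cmath
import math
import random

def power_spectrum(frame):
    """P.S. level: squared DFT magnitude for bins 0..N/2."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
        spec.append(abs(s) ** 2)
    return spec

def filter_bank(ps, n_bands):
    """F.B. level: sum power in equal-width bands (toy stand-in for mel filters)."""
    width = len(ps) // n_bands
    return [sum(ps[b * width:(b + 1) * width]) for b in range(n_bands)]

def log_spectrum(fb):
    """L.S. level: natural log of band energies, floored to avoid log(0)."""
    return [math.log(max(e, 1e-12)) for e in fb]

def cepstrum(ls):
    """C.S. level: DCT-II of the log spectrum."""
    m = len(ls)
    return [sum(v * math.cos(math.pi * q * (j + 0.5) / m) for j, v in enumerate(ls))
            for q in range(m)]

N_BANDS = 8
band_gain = [1.0, 0.8, 0.5, 0.3, 0.3, 0.5, 0.8, 1.0]  # hypothetical |H|^2 per band

def apply_channel(ps, gains):
    """Multiply each power-spectrum bin by the gain of the band it falls in."""
    width = len(ps) // len(gains)
    return [p * gains[min(k // width, len(gains) - 1)] for k, p in enumerate(ps)]

def ls_offset(frame):
    """Difference between distorted and clean log spectra for one frame."""
    ps = power_spectrum(frame)
    clean_ls = log_spectrum(filter_bank(ps, N_BANDS))
    dist_ls = log_spectrum(filter_bank(apply_channel(ps, band_gain), N_BANDS))
    return [d - c for d, c in zip(dist_ls, clean_ls)]

rng = random.Random(0)
frame_a = [rng.gauss(0.0, 1.0) for _ in range(64)]
frame_b = [rng.gauss(0.0, 1.0) for _ in range(64)]
offset_a = ls_offset(frame_a)   # equals log(band_gain) for any frame
offset_b = ls_offset(frame_b)
cep_offset = cepstrum(offset_a)  # the same channel seen at the C.S. level
```

Because the DCT is linear, the additive offset at the L.S. level carries over to the C.S. level as another fixed offset. Nonlinear distortion, such as that attributed to the voice coil elsewhere in this talk, would not reduce to a fixed offset in this way.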

Speaker-to-Microphone compensation
• Speaker-to-Microphone playback
• Speaker distortion
  • Nonlinearity caused by the voice coil
• Microphone distortion
  • Frequency response differences caused by fabrication
  • Nonlinearity caused by dynamic range
  • Ambient noise pickup depending on directionality

Speaker-to-Microphone mapping
• Mapper training
  [Figure, mapper training; labels: distorted, F.E., Mapper, clean, F.E., +, Error]
• Where and which type of mapper should be deployed?
• Mapper application
  [Figure, mapper application; labels: distorted, F.E., Trained Mapper, To recognizer]
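The train-then-apply flow on this slide can be sketched as follows. The slides do not specify the mapper architecture, so a single linear layer trained by stochastic gradient descent on squared error stands in for it here. The synthetic features, the per-dimension `gain` and `offset` distortion, the learning rate, and the function names are all assumptions made for illustration.

```python
import random

def apply_mapper(weights, bias, x):
    """Map one distorted feature vector toward the clean feature space."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def train_mapper(distorted, clean, epochs=100, lr=0.02):
    """Fit weights and bias so that mapper(distorted) approximates clean."""
    dim = len(clean[0])
    # Start from the identity mapping (no compensation).
    weights = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    bias = [0.0] * dim
    for _ in range(epochs):
        for x, target in zip(distorted, clean):
            out = apply_mapper(weights, bias, x)
            for i in range(dim):
                err = out[i] - target[i]  # error between mapped and clean features
                for j in range(dim):
                    weights[i][j] -= lr * err * x[j]
                bias[i] -= lr * err
    return weights, bias

def mean_squared_error(predicted, target):
    total = sum((p - t) ** 2
                for pv, tv in zip(predicted, target)
                for p, t in zip(pv, tv))
    return total / (len(target) * len(target[0]))

rng = random.Random(0)
gain, offset = [0.8, 0.6, 0.9], [0.5, -0.3, 0.2]  # hypothetical distortion
clean = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
distorted = [[g * c + o + rng.gauss(0.0, 0.05) for g, c, o in zip(gain, row, offset)]
             for row in clean]

# Mapper training on paired clean/distorted features.
weights, bias = train_mapper(distorted[:150], clean[:150])

# Mapper application on held-out distorted features, before the recognizer.
before = mean_squared_error(distorted[150:], clean[150:])
mapped = [apply_mapper(weights, bias, x) for x in distorted[150:]]
after = mean_squared_error(mapped, clean[150:])
```

Comparing `before` and `after` on held-out data shows how much of the mismatch the mapper removes. A nonlinear network would be needed where the distortion itself is nonlinear.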

Mapping error at L.S.
• Diamond, plus, and cross markers denote the P.S., F.B., and L.S. levels

Frequency correlation plots

Recognition Experiments
• Task
  • Phoneme recognition over a 40-phone TIMIT set
  • Phone accuracy = (N - D - S - I) × 100 / N
• Database
  • HTIMIT: TIMIT sentences re-recorded through 10 different telephone handsets
  • Training: 246 speakers × 8 sentences = 1968 sentences
  • Test: 48 speakers × 8 sentences = 384 sentences
• Baseline
  • 3-state monophone HMM with 16 Gaussian mixtures
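The phone accuracy formula above can be written as a small helper. Here N is taken to be the number of reference phones and D, S, and I the deletion, substitution, and insertion counts, which is the standard reading of this formula. The counts in the example call are made up for illustration and are not results from the presentation.

```python
def phone_accuracy(n_ref, deletions, substitutions, insertions):
    """Phone accuracy in percent: (N - D - S - I) * 100 / N."""
    if n_ref <= 0:
        raise ValueError("n_ref must be positive")
    return (n_ref - deletions - substitutions - insertions) * 100.0 / n_ref

# Hypothetical counts, for illustration only.
example_accuracy = phone_accuracy(n_ref=1000, deletions=50, substitutions=120, insertions=30)

# Sentence counts stated on the slide.
training_sentences = 246 * 8
test_sentences = 48 * 8
```

Because insertions are subtracted, this measure can drop below zero when a recognizer inserts many spurious phones.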

Experiment I – CI result

Conclusion
• Speech distorted by a low-quality speaker-to-microphone playback system can be compensated with a feature mapping network
• A feature mapping scheme would be useful when environmental conditions make it hard to collect a database
