## A Tutorial on Bayesian Speech Feature Enhancement


**SCALE Workshop, January 2010**
A Tutorial on Bayesian Speech Feature Enhancement
Friedrich Faubel

**I**
**Motivation**

**Speech Recognition System: Overview**
• A speech recognition system converts speech to text. It consists of two components:
• Front End: extracts speech features from the audio signal
• Decoder: finds the sentence (sequence of acoustic states) that is the most likely explanation for the observed sequence of speech features
Speech → Front End → Decoder → Text

**Speech Feature Extraction: Time Frequency Analysis**
• Performing spectral analysis separately for each frame yields a time-frequency representation

**Speech Feature Extraction: Perceptual Representation**
• Emulation of the logarithmic frequency and intensity perception of the human auditory system

**Background Noise**
• Background noise distorts speech features
• Result: features don’t match the features used during training
• Consequence: severely degraded recognition performance

**Overview of the Tutorial**
I - Motivation
II - The effect of noise on speech features
III - Transforming probabilities
IV - The MMSE solution to speech feature enhancement
V - Model-based speech feature enhancement
VI - Experimental results
VII - Extensions

**II**
**The Effect of Noise: Interaction Function**

**Interaction Function**
• Principle of superposition: signals are additive
noisy speech = clean speech + noise

**Interaction Function**
• In the signal domain we have the following relationship between noisy speech $y(t)$, clean speech $x(t)$ and noise $n(t)$:
$$y(t) = x(t) + n(t)$$
• After Fourier transformation, this becomes:
$$Y(\omega) = X(\omega) + N(\omega)$$
• Taking the magnitude square on both sides, we get:
$$|Y|^2 = |X + N|^2 = |X|^2 + |N|^2 + 2\,|X|\,|N|\cos\theta$$
• Hence, in the power spectral domain we have, with $P_Y = |Y|^2$, $P_X = |X|^2$ and $P_N = |N|^2$:
$$P_Y = P_X + P_N + 2\sqrt{P_X P_N}\,\cos\theta$$
The last summand is the phase term; $\theta$ is the relative phase between clean speech and noise.

**Interaction Function**
• The relative phase between two waves describes their relative offset in time (delay)

**Interaction Function**
• When two sound sources are present, the following can happen, depending on the relative phase: amplification, attenuation or cancellation

**Interaction Function**
• The phase term is zero on average, which leaves:
$$P_Y = P_X + P_N$$
• In the log power spectral domain that becomes (Acero, 1990):
$$\log P_Y = \log P_X + \log\left(1 + e^{\log P_N - \log P_X}\right)$$
• But is that really right?

**Interaction Function**
• The mean of a nonlinearly transformed random variable is not necessarily equal to the nonlinear transform of the random variable’s mean: in general, $E[f(x)] \neq f(E[x])$ for a nonlinear transform $f$.

**Interaction Function**
• Phase-averaged relationship between clean and noisy speech

**III**
**Transforming Probabilities**
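
As a bridge from the interaction function to the transformation of probabilities, the following minimal sketch checks numerically what Part II argues: the phase term vanishes on average in the power spectral domain, but taking the logarithm before averaging gives a different value than taking it afterwards. The power values `px` and `pn`, the uniform distribution of the relative phase, and the helper name `log_sum_interaction` are assumptions made for illustration only; they are not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)


def log_sum_interaction(log_px, log_pn):
    """Phase-free interaction in the log power spectral domain:
    log P_Y = log P_X + log(1 + exp(log P_N - log P_X)).
    np.logaddexp evaluates log(exp(a) + exp(b)) in a numerically stable way.
    """
    return np.logaddexp(log_px, log_pn)


# Hypothetical clean-speech and noise power in a single frequency bin.
px, pn = 4.0, 1.0

# Assumption: the relative phase is uniformly distributed on [-pi, pi).
theta = rng.uniform(-np.pi, np.pi, size=200_000)

# Power spectral domain relationship including the phase term.
py = px + pn + 2.0 * np.sqrt(px * pn) * np.cos(theta)

# Averaging the power: the phase term drops out, leaving px + pn.
mean_power = py.mean()

# Averaging the log power is NOT the same as the log of the averaged power.
mean_log_power = np.log(py).mean()
log_of_mean = log_sum_interaction(np.log(px), np.log(pn))

print(f"mean power        : {mean_power:.3f} (px + pn = {px + pn:.3f})")
print(f"mean of log power : {mean_log_power:.3f}")
print(f"log of mean power : {log_of_mean:.3f}")
```

Under these assumptions the mean of the log power comes out below the log of the mean power, which is the gap that the question "But is that really right?" points at and that a phase-averaged relationship has to account for.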