A Tutorial on Bayesian Speech Feature Enhancement • Friedrich Faubel • SCALE Workshop, January 2010

I. Motivation

Speech Recognition System Overview • A speech recognition system converts speech to text. It consists of two components: • Front End: extracts speech features from the audio signal • Decoder: finds the sentence (sequence of acoustic states) that is the most likely explanation for the observed sequence of speech features • Pipeline: Speech → Front End → Decoder → Text

Background Noise • Background noise distorts speech features • Result: features don’t match the features used during training • Consequence: severely degraded recognition performance

Overview of the Tutorial • I - Motivation • II - The effect of noise on speech features • III - Transforming probabilities • IV - The MMSE solution to speech feature enhancement • V - Model-based speech feature enhancement • VI - Experimental results • VII - Extensions

Interaction Function • In the signal domain we have the following relationship: • After Fourier transformation, this becomes: • Taking the magnitude square on both sides, we get:
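
As a sketch of the three steps named on this slide, assuming the usual additive-noise model: the symbols $x(t)$, $n(t)$, $y(t)$ (clean speech, noise, noisy speech), their Fourier transforms $X$, $N$, $Y$, and the relative phase $\theta$ between speech and noise are notation assumed here, not necessarily the notation of the slides.

```latex
\begin{align}
y(t) &= x(t) + n(t) \\
Y(\omega) &= X(\omega) + N(\omega) \\
|Y(\omega)|^2 &= |X(\omega)|^2 + |N(\omega)|^2
                 + 2\,|X(\omega)|\,|N(\omega)|\cos\theta(\omega)
\end{align}
```

The last line follows from expanding $|X + N|^2 = (X + N)(X + N)^*$; the cross term is the only one that depends on the phases of the two signals.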

Interaction Function • Taking the magnitude square on both sides, we get: • Hence, in the power spectral domain we have: (the cross term is the phase term; it depends on the relative phase between speech and noise)

Interaction Function • The relative phase between two waves describes their relative offset in time (delay) • [Figure: two waves plotted over time, with the relative phase marked between them]

Interaction Function • When two sound sources are present, the following can happen: amplification, attenuation, or cancellation • [Figure: four examples of summed waves, two showing amplification, one cancellation, one attenuation]
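
The effect of the relative phase can be checked numerically. The following is a minimal sketch, assuming two components at the same frequency represented as complex numbers; the function names `combined_power` and `power_formula` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def combined_power(amp_x, amp_n, theta):
    """Power |X + N|^2 of two same-frequency components.

    amp_x, amp_n: magnitudes of the two components (assumed names).
    theta: relative phase between them, in radians.
    """
    x = complex(amp_x, 0.0)        # reference component with phase 0
    n = cmath.rect(amp_n, theta)   # second component, shifted by theta
    return abs(x + n) ** 2


def power_formula(amp_x, amp_n, theta):
    """Same quantity via |X|^2 + |N|^2 + 2|X||N|cos(theta)."""
    return amp_x ** 2 + amp_n ** 2 + 2.0 * amp_x * amp_n * math.cos(theta)


in_phase = combined_power(1.0, 1.0, 0.0)      # amplification
opposite = combined_power(1.0, 1.0, math.pi)  # cancellation
partial = combined_power(1.0, 0.5, math.pi)   # attenuation

print(in_phase, opposite, partial)
```

With equal magnitudes, a relative phase of 0 quadruples the power of a single component, while a relative phase of π cancels it (up to floating-point rounding). With unequal magnitudes, a phase of π only attenuates.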

Interaction Function • Taking the magnitude square on both sides, we get: • Hence, in the power spectral domain we have: (the phase term is zero on average)

Interaction Function • Taking the magnitude square on both sides, we get: • Hence, in the power spectral domain we have: • In the log power spectral domain that becomes: (Acero, 1990) • But is that really right?
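
A minimal sketch of the log power spectral relationship, assuming it takes the commonly used form y = x + log(1 + exp(n - x)), which results from dropping the phase term and substituting x = log|X|², n = log|N|², y = log|Y|². The symbol names and the function name `log_add_noise` are assumptions made for illustration.

```python
import math


def log_add_noise(x, n):
    """Noisy log power y = x + log(1 + exp(n - x)), phase term dropped.

    x: clean speech log power, n: noise log power (assumed names).
    The expression equals log(exp(x) + exp(n)), which is symmetric in
    x and n, so the larger value is factored out to avoid overflow.
    """
    hi, lo = max(x, n), min(x, n)
    return hi + math.log1p(math.exp(lo - hi))


y_high_snr = log_add_noise(10.0, -10.0)  # speech dominates: y close to x
y_low_snr = log_add_noise(-10.0, 10.0)   # noise dominates: y close to n
y_equal = log_add_noise(0.0, 0.0)        # equal powers: y = log(2)

print(y_high_snr, y_low_snr, y_equal)
```

At high signal-to-noise ratio the correction term vanishes and y approaches x; at low signal-to-noise ratio y approaches n, so the noise masks the speech feature.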

Interaction Function • The mean of a nonlinearly transformed random variable is not necessarily equal to the nonlinear transform of the random variable’s mean. • [Figure: nonlinear transform]
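
This point can be illustrated on the power spectrum itself. The sketch below assumes a relative phase that is uniformly distributed over one full cycle, and compares the log of the phase-averaged power with the phase average of the log power. The function name `phase_averages` and the example magnitudes are hypothetical.

```python
import math


def phase_averages(amp_x, amp_n, num_phases=10000):
    """Average over a uniform grid of relative phases in [0, 2*pi).

    Returns (log of mean power, mean of log power) for two components
    with magnitudes amp_x and amp_n (assumed names). The magnitudes
    must differ so that the power never reaches zero.
    """
    thetas = [2.0 * math.pi * k / num_phases for k in range(num_phases)]
    powers = [
        amp_x ** 2 + amp_n ** 2 + 2.0 * amp_x * amp_n * math.cos(t)
        for t in thetas
    ]
    mean_power = sum(powers) / num_phases
    mean_log_power = sum(math.log(p) for p in powers) / num_phases
    return math.log(mean_power), mean_log_power


log_of_mean, mean_of_log = phase_averages(2.0, 1.0)
print(log_of_mean, mean_of_log)
```

Averaging the power first removes the cosine term, so the log of the mean equals log(|X|² + |N|²). Averaging after taking the logarithm gives a strictly smaller value, because the logarithm is concave. Dropping the phase term before the log is therefore not the same as averaging over the phase in the log domain.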

Interaction Function • Phase-averaged relationship between clean and noisy speech: