
Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition

Kenichi Kumatani, Disney Research, Pittsburgh

Bhiksha Raj, Carnegie Mellon University

Rita Singh, Carnegie Mellon University

John McDonough, Carnegie Mellon University

Organization of Presentation

  • Our Goal: Distant Speech Recognition (DSR)

  • Conventional Post-filtering Methods

  • Our Post-filtering Method

  • DSR Experiments on Real Array Data

Our Goal ~ Distant Speech Recognition (DSR) System


Replace the close-talking microphone with far-field sensors to make human-machine interfaces more interactive.

Overview of our DSR System:

[Block diagram: the microphone array captures distant speech; speaker tracking provides the speaker's position; the resulting enhanced speech is passed to speech recognition, which outputs the recognition result.]

Merits of this Approach:

  • By using the geometry of the microphone array and the speaker's position, our system has the following merits:

    • stable performance in real environments, and

    • straightforward extension to the use of other information sources.

Avoid being blind!

Background of this Work

Background:

  • Beamforming alone does not provide the optimal solution in the minimum mean square error (MMSE) sense.

  • Post-filtering can further improve speech recognition performance.

Basic Block Chart:

[Block diagram: multi-channel input vector X, time delay compensation, post-filtering H]

Key issue:

  • Estimate the power spectral densities (PSDs) of the target and noise signals to build the Wiener filter.
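For reference, the Wiener filter that the post-filter approximates has, in each frequency bin, the gain H(ω) = φ_ss(ω) / (φ_ss(ω) + φ_nn(ω)), where φ_ss and φ_nn are the target and noise PSDs. The minimal sketch below assumes both PSDs are already known, which is exactly what the post-filter methods must estimate in practice; the function name `wiener_gain` and the `eps` guard are hypothetical.

```python
import numpy as np

def wiener_gain(phi_ss, phi_nn, eps=1e-12):
    """Per-bin Wiener gain H = phi_ss / (phi_ss + phi_nn), assuming known PSDs."""
    phi_ss = np.asarray(phi_ss, dtype=float)
    phi_nn = np.asarray(phi_nn, dtype=float)
    # eps avoids division by zero in bins with no energy at all
    return phi_ss / np.maximum(phi_ss + phi_nn, eps)

# Three bins: noise-free, 0 dB SNR, noise only
H = wiener_gain([1.0, 1.0, 0.0], [0.0, 1.0, 1.0])
print(H)  # gains 1.0, 0.5 and 0.0
```

The gain passes bins dominated by the target and attenuates bins dominated by noise, so its quality depends entirely on the PSD estimates.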



Conventional Post-filter Design Method 1

Zelinski Post-filter:

  • Zelinski assumed that

    • the target and noise signals are uncorrelated,

    • the noise signals are uncorrelated between different channels, and

    • the noise PSD is the same across all channels.

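  • Then, for $M$ microphones, the cross- and auto-spectral densities between channels simplify to

$\phi_{x_i x_j}(\omega) = \phi_{ss}(\omega) \;\; (i \neq j), \qquad \phi_{x_i x_i}(\omega) = \phi_{ss}(\omega) + \phi_{nn}(\omega)$,

where $\phi_{ss}$ and $\phi_{nn}$ are the PSDs of the target and noise signals.

  • By substituting them into the Wiener filter $H(\omega) = \phi_{ss}(\omega) / (\phi_{ss}(\omega) + \phi_{nn}(\omega))$ and averaging over all channels and sensor pairs, we obtain the Zelinski post-filter:

$H(\omega) = \dfrac{\frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \mathrm{Re}\{\phi_{x_i x_j}(\omega)\}}{\frac{1}{M} \sum_{i=1}^{M} \phi_{x_i x_i}(\omega)}$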
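The Zelinski estimate can be sketched directly from multichannel STFT data: the numerator averages the real cross-spectra over all sensor pairs, and the denominator averages the auto-spectra over all channels. This is a minimal sketch under stated assumptions: the input is already time-delay compensated, the spectral densities are estimated by plain averaging over frames (rather than recursive smoothing), and clipping the gain to [0, 1] is a common practical choice, not something the slide specifies. The name `zelinski_postfilter` is hypothetical.

```python
import numpy as np

def zelinski_postfilter(X, eps=1e-12):
    """Zelinski post-filter gain per frequency bin (sketch).

    X: complex array (M, K, F) of time-aligned STFT coefficients for
       M channels, K frames and F frequency bins.
    """
    M, K, _ = X.shape
    # phi[i, j, f] = mean over frames of X_i X_j^*
    phi = np.einsum('ikf,jkf->ijf', X, X.conj()) / K
    auto = np.real(np.einsum('iif->f', phi)) / M          # mean of phi_{x_i x_i}
    i, j = np.triu_indices(M, k=1)
    cross = np.real(phi[i, j, :]).mean(axis=0)            # mean of Re{phi_{x_i x_j}}, i < j
    return np.clip(cross / np.maximum(auto, eps), 0.0, 1.0)

rng = np.random.default_rng(0)
M, K, F = 4, 2000, 3

def cgauss(*shape):
    """Circular complex Gaussian samples with unit variance."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

s = cgauss(1, K, F)            # target, identical on all (aligned) channels
X = s + cgauss(M, K, F)        # plus uncorrelated noise with equal PSD
H = zelinski_postfilter(X)
print(H)                       # close to 0.5 in every bin (0 dB input SNR)
```

Because the uncorrelated noise averages out of the cross-spectra, the numerator tracks φ_ss while the denominator tracks φ_ss + φ_nn, which is exactly where the assumptions listed above enter.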

Conventional Post-filter Design Method 2

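Issue with the Zelinski Post-filter:

  • In many situations, the noise signals are spatially correlated.

McCowan Post-filter:

  • McCowan and Bourlard introduced the coherence of the diffuse noise field. Coherence is an indicator of the similarity of signals at different positions:

$\Gamma_{ij}(\omega) = \dfrac{\phi_{n_i n_j}(\omega)}{\sqrt{\phi_{n_i n_i}(\omega)\,\phi_{n_j n_j}(\omega)}}$, which for a diffuse noise field becomes $\Gamma_{ij}(\omega) = \dfrac{\sin(\omega d_{ij}/c)}{\omega d_{ij}/c}$,

where $n_i$ is the noise at microphone $i$, $d_{ij}$ is the distance between microphones $i$ and $j$, and $c$ is the speed of sound. They then compute the cross- and auto-spectral densities as

$\phi_{x_i x_j}(\omega) = \phi_{ss}(\omega) + \Gamma_{ij}(\omega)\,\phi_{nn}(\omega), \qquad \phi_{x_i x_i}(\omega) = \phi_{ss}(\omega) + \phi_{nn}(\omega)$,

where $\phi_{ss}$ and $\phi_{nn}$ are the target and noise PSDs. This differs from the Zelinski method, which assumes $\Gamma_{ij}(\omega) = 0$ for $i \neq j$.

  • Then, for $M$ microphones, the McCowan post-filter can be written as

$H(\omega) = \dfrac{\frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \hat{\phi}_{ss}^{(ij)}(\omega)}{\frac{1}{M} \sum_{i=1}^{M} \phi_{x_i x_i}(\omega)}, \qquad \hat{\phi}_{ss}^{(ij)}(\omega) = \dfrac{\mathrm{Re}\{\phi_{x_i x_j}(\omega)\} - \frac{1}{2}\mathrm{Re}\{\Gamma_{ij}(\omega)\}\left(\phi_{x_i x_i}(\omega) + \phi_{x_j x_j}(\omega)\right)}{1 - \mathrm{Re}\{\Gamma_{ij}(\omega)\}}$

where $\hat{\phi}_{ss}^{(ij)}(\omega)$ is a PSD estimate of the target signal for each sensor pair.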
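A sketch of the McCowan post-filter, under these assumptions: the input is time-aligned STFT data, spectral densities are plain frame averages, the diffuse coherence is the real-valued sinc of the microphone spacing (so Re{Γ} = Γ), and the gain is clipped to [0, 1]. The helper names are hypothetical. Note that `np.sinc(x)` is sin(πx)/(πx), so `np.sinc(2 f d / c)` equals sin(ωd/c)/(ωd/c). The usage example synthesizes two-channel noise with exactly the diffuse coherence Γ so that the pairwise target estimate should recover φ_ss.

```python
import numpy as np

def diffuse_coherence(d, freqs, c=343.0):
    """Diffuse-field coherence sin(w d / c) / (w d / c) for spacing d [m]."""
    return np.sinc(2.0 * np.asarray(freqs) * d / c)

def mccowan_postfilter(X, positions, freqs, c=343.0, eps=1e-12):
    """X: (M, K, F) aligned STFT; positions: (M, 3) in metres; freqs: (F,) in Hz."""
    M, K, _ = X.shape
    phi = np.einsum('ikf,jkf->ijf', X, X.conj()) / K
    auto = np.real(np.einsum('iif->if', phi))              # phi_{x_i x_i}, shape (M, F)
    pair_estimates = []
    for i in range(M - 1):
        for j in range(i + 1, M):
            gamma = diffuse_coherence(np.linalg.norm(positions[i] - positions[j]), freqs, c)
            # Pairwise target PSD estimate; gamma is real here, so Re{gamma} = gamma
            num = np.real(phi[i, j]) - 0.5 * gamma * (auto[i] + auto[j])
            pair_estimates.append(num / np.maximum(1.0 - gamma, eps))
    H = np.mean(pair_estimates, axis=0) / np.maximum(auto.mean(axis=0), eps)
    return np.clip(H, 0.0, 1.0)

rng = np.random.default_rng(1)
K = 20000
freqs = np.array([1000.0, 2000.0, 3000.0])
positions = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])   # hypothetical 10 cm pair
gamma = diffuse_coherence(0.1, freqs)

def cgauss(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

a, b, s = cgauss(K, 3), cgauss(K, 3), cgauss(K, 3)
n2 = gamma * a + np.sqrt(1.0 - gamma ** 2) * b   # unit-power noise with coherence gamma to a
X = np.stack([s + a, s + n2])                    # shape (2, K, F)
H = mccowan_postfilter(X, positions, freqs)
print(H)                                         # close to 0.5 per bin (0 dB SNR)
```

As 1 − Γ approaches zero at low frequencies, the pairwise estimate becomes ill-conditioned; the `eps` guard and clipping only keep the sketch numerically safe.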

Lefkimmiatis Post-filter:

  • Lefkimmiatis et al. model the diffuse noise field more accurately by also applying the coherence to the denominator of the McCowan post-filter.

Motivation of our Method

Common Problem of Conventional Methods:

  • A static noise field model will not match every situation.

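Example of Noise Coherence in a Car:

  • The figures show the magnitude-squared coherence $|\Gamma_{ij}(\omega)|^2 = |\phi_{x_i x_j}(\omega)|^2 / (\phi_{x_i x_i}(\omega)\,\phi_{x_j x_j}(\omega))$ observed in a car.

[Figures: (a) engine idling; (b) driving at a speed of 65 mph]

  • It is clear that the actual noise field is neither uncorrelated nor diffuse.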
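The magnitude-squared coherence (MSC) plotted on this slide can be estimated from STFT frames of two channels by averaging the cross- and auto-spectra over frames (with a single frame the estimate is always 1, so averaging is essential). A minimal sketch, assuming frame-averaged spectral densities; `magnitude_squared_coherence` and the synthetic signals are hypothetical. For reference, an uncorrelated field gives MSC near 0, a single coherent source gives 1, and a diffuse field gives the squared sinc of the spacing.

```python
import numpy as np

def magnitude_squared_coherence(X1, X2, eps=1e-12):
    """MSC per frequency bin from STFT frames X1, X2 of shape (K, F).

    C(w) = |phi_12(w)|^2 / (phi_11(w) phi_22(w)), each density averaged over K frames.
    """
    phi_12 = np.mean(X1 * np.conj(X2), axis=0)
    phi_11 = np.mean(np.abs(X1) ** 2, axis=0)
    phi_22 = np.mean(np.abs(X2) ** 2, axis=0)
    return np.abs(phi_12) ** 2 / np.maximum(phi_11 * phi_22, eps)

rng = np.random.default_rng(0)
K, F = 20000, 4

def cgauss(*shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2)

common, n1, n2 = cgauss(K, F), cgauss(K, F), cgauss(K, F)
c_full = magnitude_squared_coherence(common, 2 * common)          # fully coherent
c_none = magnitude_squared_coherence(n1, n2)                      # uncorrelated
c_part = magnitude_squared_coherence(common + n1, common + n2)    # half correlated
print(c_full, c_none, c_part)   # about 1, about 0, about 0.25
```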

Our Motivation:

Measure the most dominant noise signal directly instead of relying on static noise field assumptions.

Our Strategy - How can we measure a noise signal?

Estimate a speaker’s position,

Build a beamformer and steer a beam toward the target source,

Find where the most dominant interfering source is, and

Build another beamformer to measure a noise signal.



[Figure: Beamformer 1, steered toward the target speech, outputs the enhanced speech; Beamformer 2 (noise extractor), steered toward the noise source, outputs the separated noise; both are used for further noise removal.]
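Steering a beam toward a known position starts from the array manifold (steering) vector. The sketch below shows the simplest case, a delay-and-sum beamformer, under assumptions that are not from the slides: a far-field plane wave, a linear array along one axis, and an arrival angle measured from broadside. The system on these slides uses super-directive and maximum negentropy beamformers instead; all names and the array geometry here are hypothetical.

```python
import numpy as np

def steering_vector(mic_x, theta, freq, c=343.0):
    """Far-field array manifold vector for a linear array along the x-axis.

    mic_x: (M,) microphone positions in metres; theta: arrival angle in
    radians from broadside; freq: frequency in Hz.
    """
    tau = np.asarray(mic_x) * np.sin(theta) / c   # relative propagation delays [s]
    return np.exp(-2j * np.pi * freq * tau)

def delay_and_sum_weights(v):
    """w = v / M, which gives unity gain toward v (w^H v = 1)."""
    return v / len(v)

def beamform(w, X):
    """Beamformer output Y = w^H X for snapshots X of shape (M, K)."""
    return w.conj() @ X

mic_x = np.arange(8) * 0.05                      # hypothetical 8-mic array, 5 cm spacing
v_target = steering_vector(mic_x, 0.3, 1000.0)
v_other = steering_vector(mic_x, -0.6, 1000.0)
w = delay_and_sum_weights(v_target)
print(abs(np.vdot(w, v_target)))                 # unity gain toward the target
print(abs(np.vdot(w, v_other)))                  # attenuated response elsewhere
```

`np.vdot` conjugates its first argument, so `np.vdot(w, v)` is the array response w^H v.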

Our Post-filter System

  • We build a maximum negentropy (MN) beamformer for the target source and a null-steering beamformer to extract the noise signal.

[Block diagram: maximum negentropy beamformer for the target source; null-steering beamformer for the noise source; both feed the post-filter estimation.]

Our Post-filter System - Maximum Negentropy (MN) Beamformer (Speech Emphasizer)

[Block diagram: MN beamformer for the target source, null-steering beamformer for the noise source, and post-filter estimation]

Maximum Negentropy Criterion:

  • The distribution of clean speech is non-Gaussian, whereas that of noisy and reverberant speech becomes closer to Gaussian.

  • Negentropy is an indicator of how far the distribution of a signal is from Gaussian.



Maximum Negentropy Beamformer:

  • Build a super-directive beamformer to obtain the quiescent weight vector $\mathbf{w}_{SD}$.

  • Compute the blocking matrix $\mathbf{B}$ satisfying $\mathbf{B}^H \mathbf{w}_{SD} = \mathbf{0}$ to maintain the distortionless constraint for the look direction.

  • Find the active weight vector $\mathbf{w}_a$ that maximizes the negentropy $J(\cdot)$ of the output:

$\hat{\mathbf{w}}_a = \arg\max_{\mathbf{w}_a} J(Y_{SDMN}), \qquad Y_{SDMN} = (\mathbf{w}_{SD} - \mathbf{B}\,\mathbf{w}_a)^H \mathbf{X}$

  • This enhances a structured, information-bearing signal arriving from the direction of interest without signal cancellation or distortion.
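The structure of this beamformer (quiescent super-directive weights, a blocking matrix with B^H w_SD = 0, and the output Y = (w_SD − B w_a)^H X) can be sketched as below. Assumptions, not from the slides: super-directive weights computed as (Γ + μI)⁻¹v / (vᴴ(Γ + μI)⁻¹v) with a diffuse-noise coherence matrix Γ and diagonal loading μ; a blocking matrix built from an SVD; a far-field linear array. The negentropy-maximizing search for w_a is not shown, so any w_a passed in is a placeholder. All names are hypothetical.

```python
import numpy as np

def superdirective_weights(v, Gamma, loading=1e-2):
    """Quiescent weights w_SD = R^-1 v / (v^H R^-1 v), with R = Gamma + loading * I."""
    R = Gamma + loading * np.eye(len(v))
    Rinv_v = np.linalg.solve(R, v)
    return Rinv_v / np.vdot(v, Rinv_v)        # normalised so that w_SD^H v = 1

def blocking_matrix(w_sd):
    """(M, M-1) matrix whose orthonormal columns satisfy B^H w_sd = 0."""
    U, _, _ = np.linalg.svd(w_sd[:, None])    # U[:, 0] is parallel to w_sd
    return U[:, 1:]

def gsc_output(w_sd, B, w_a, X):
    """Y = (w_SD - B w_a)^H X for snapshots X of shape (M, K)."""
    return (w_sd - B @ w_a).conj() @ X

# Hypothetical 4-mic linear array, 4 cm spacing, one frequency bin at 1 kHz
mic_x = np.arange(4) * 0.04
f, c = 1000.0, 343.0
d = np.abs(mic_x[:, None] - mic_x[None, :])
Gamma = np.sinc(2.0 * f * d / c)                         # diffuse coherence sin(wd/c)/(wd/c)
v = np.exp(-2j * np.pi * f * mic_x * np.sin(0.3) / c)    # far-field steering vector
w_sd = superdirective_weights(v, Gamma)
B = blocking_matrix(w_sd)
print(abs(np.vdot(w_sd, v)))              # about 1.0: distortionless quiescent beam
print(np.abs(B.conj().T @ w_sd).max())    # about 0: blocking constraint holds
```

With w_a = 0 the output reduces to the super-directive beamformer; the MN criterion then adjusts only the component passing through B.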

Our Post-filter System - Null-Steering Beamformer (Noise Extractor)













Null-steering Beamformer (Noise Extractor):

  • Place a null in the direction of interest (DOI) while maintaining unity gain in the direction of the noise source.

  • Given the array manifold vectors $\mathbf{v}$ for the target source and $\mathbf{v}_N$ for the noise source, we obtain the weights of such a beamformer by solving the linear equation

$[\mathbf{v} \;\; \mathbf{v}_N]^H \mathbf{w}_{null} = [0 \;\; 1]^T$.

  • We can thus extract a noise signal simply by eliminating the target signal arriving directly from the source position.
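With M > 2 microphones the linear equation above has two constraints and M unknowns, so it is underdetermined. One natural choice, assumed here rather than taken from the slides, is the minimum-norm solution, which `np.linalg.lstsq` returns for an underdetermined full-rank system. The sketch also assumes a far-field linear array with distinct target and noise directions (otherwise v and v_N are parallel and the constraints conflict); the function names and geometry are hypothetical.

```python
import numpy as np

def steering_vector(mic_x, theta, freq, c=343.0):
    """Far-field manifold vector for a linear array (theta from broadside, radians)."""
    return np.exp(-2j * np.pi * freq * np.asarray(mic_x) * np.sin(theta) / c)

def null_steering_weights(v, v_n):
    """Minimum-norm w_null solving [v v_N]^H w_null = [0 1]^T."""
    A = np.stack([v, v_n], axis=1).conj().T          # 2 x M constraint matrix
    b = np.array([0.0, 1.0], dtype=complex)
    w_null, *_ = np.linalg.lstsq(A, b, rcond=None)   # minimum-norm solution for M > 2
    return w_null

mic_x = np.arange(8) * 0.05                    # hypothetical 8-mic array, 5 cm spacing
v = steering_vector(mic_x, 0.2, 1000.0)        # target direction (DOI)
v_n = steering_vector(mic_x, -0.7, 1000.0)     # dominant noise direction
w_null = null_steering_weights(v, v_n)
print(abs(np.vdot(w_null, v)))                 # about 0: null toward the target
print(abs(np.vdot(w_null, v_n)))               # about 1: unity gain toward the noise
```

The noise observation is then Y_null = w_nullᴴ X, for example `w_null.conj() @ X` for snapshots X of shape (M, K).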

Our Post-filter System




Our post-filter design:

  • Now that we have an estimate of the target signal, $Y_{SDMN} = (\mathbf{w}_{SD} - \mathbf{B}\,\mathbf{w}_a)^H \mathbf{X}$, and a noise observation, $Y_{null} = \mathbf{w}_{null}^H \mathbf{X}$,

we can design the post-filter as
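The exact post-filter equation from this slide did not survive extraction. Purely to illustrate how a measured noise signal such as Y_null can replace a static noise-field model, the sketch below uses an assumed, generic Wiener-type gain H = max(1 − φ_NN/φ_YY, floor), with first-order recursive PSD smoothing. This is not necessarily the authors' post-filter; it also ignores any level mismatch between the noise in Y_SDMN and in Y_null. The names `recursive_psd`, `noise_reference_postfilter`, `alpha` and `floor` are hypothetical.

```python
import numpy as np

def recursive_psd(Z, alpha=0.9):
    """First-order recursive PSD estimate per bin; Z has shape (K, F)."""
    psd = np.empty(Z.shape, dtype=float)
    acc = np.abs(Z[0]) ** 2
    for k in range(Z.shape[0]):
        acc = alpha * acc + (1.0 - alpha) * np.abs(Z[k]) ** 2
        psd[k] = acc
    return psd

def noise_reference_postfilter(Y, Y_null, alpha=0.9, floor=0.0, eps=1e-12):
    """Assumed gain H = max(1 - phi_NN / phi_YY, floor), per frame and bin.

    Y: (K, F) target estimate (e.g. Y_SDMN); Y_null: (K, F) noise observation.
    """
    phi_yy = recursive_psd(Y, alpha)
    phi_nn = recursive_psd(Y_null, alpha)
    return np.maximum(1.0 - phi_nn / np.maximum(phi_yy, eps), floor)

rng = np.random.default_rng(0)
Y = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))
H_clean = noise_reference_postfilter(Y, np.zeros_like(Y))   # no noise measured: H = 1
H_half = noise_reference_postfilter(Y, 0.5 * Y)             # noise PSD = 1/4 of Y's: H = 0.75
enhanced = H_half * Y                                       # post-filtered output
```

Because the noise PSD is re-estimated from Y_null in every frame, the gain adapts to whatever noise field is present instead of assuming uncorrelated or diffuse noise.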

Speech Recognition Results ~ Word Error Rates in Different Conditions

[Chart: word error rates in different conditions]

  • We used actual noise measurements for the microphone array post-filter.

  • It turned out that the noise fields in the car conditions are neither uncorrelated nor spherically isotropic (diffuse).

  • The results demonstrate that our post-filter method provides the best recognition performance (the lowest word error rate) among the popular post-filter methods compared.

  • We attribute this to our method's ability to update the noise PSD adaptively without any static noise coherence assumption.

Speech Samples (65-Wind)

Single Distant Channel

Post-filtered Speech

Extracted Noise Signal

Actual Speech Distribution ~ Super-Gaussian

[Figure: distributions of clean speech with super-Gaussian distributions]

*The histograms are computed from the real part of actual subband samples.

  • The distribution of speech is not Gaussian but super-Gaussian.

  • It has "spiky" and "heavy-tailed" characteristics.

How about maximizing the degree of super-Gaussianity?

Why do we need non-Gaussianity measures?

The reasoning rests on two points:

  • By the central limit theorem, the distribution of a sum of independent random variables (r.v.s) approaches a Gaussian as more components are added.

  • Information-bearing signals have a structure that makes them predictable.

Therefore, to find the original independent components that bear information, we have to look for signals that are not Gaussian.

[Figures: distributions of clean and noise-corrupted speech; distributions of clean and reverberated speech]

  • The distributions of noise-corrupted and reverberated speech are closer to Gaussian than that of clean speech.
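The first point can be checked numerically: summing independent super-Gaussian components drives the normalized (excess) kurtosis toward the Gaussian value of zero. A minimal sketch, assuming real-valued Laplacian components as a stand-in for speech (the slides work with complex subband samples); for n i.i.d. Laplacian terms the theoretical excess kurtosis is 3/n. The function name and sample sizes are hypothetical.

```python
import numpy as np

def excess_kurtosis(y):
    """Normalised kurtosis of a real sample: E[y^4] / E[y^2]^2 - 3 (0 for a Gaussian)."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
ks = {}
for n in (1, 4, 16):
    # Sum of n independent super-Gaussian (Laplacian) components
    mixture = rng.laplace(size=(n, 200_000)).sum(axis=0)
    ks[n] = excess_kurtosis(mixture)
    print(n, round(ks[n], 2))
# Theory for i.i.d. Laplacian components: 3 / n, i.e. 3, 0.75 and about 0.19
```

Noise and reverberation act like such sums (many added components), which is why corrupted speech looks more Gaussian than clean speech.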

Negentropy Criterion for Super-Gaussianity

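Definition of entropy:

  • The entropy of a r.v. $Y$ with probability density $p_Y(y)$ is defined as $H(Y) = -\int p_Y(y) \log p_Y(y)\, dy$.

  • Entropy indicates the degree of uncertainty of the information carried by $Y$.

Definition of negentropy:

  • Negentropy is defined as the difference between the entropy of a Gaussian r.v. and that of the super-Gaussian r.v.:

$J(Y) = H(Y_{gauss}) - H(Y)$,

where the first term is the entropy of a Gaussian r.v. $Y_{gauss}$ with the same variance as $Y$, and the second term is the entropy of the super-Gaussian r.v. $Y$.

  • Negentropy is nonnegative and is zero only for a Gaussian r.v.; the higher the negentropy, the farther the distribution of $Y$ is from Gaussian.

  • Negentropy is generally a more robust measure of non-Gaussianity than kurtosis, which is sensitive to outliers.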
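For a concrete value, the negentropy of a Laplacian r.v. (one example of a super-Gaussian distribution) follows in closed form from the two differential entropies: ½·log(2πeσ²) for a Gaussian and 1 + log(2b) for a Laplacian with scale b = √(σ²/2). The sketch assumes real-valued r.v.s for simplicity and uses hypothetical function names; it also shows that negentropy does not depend on the variance.

```python
import math

def gaussian_entropy(var):
    """Differential entropy (nats) of a real Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def laplace_entropy(var):
    """Differential entropy (nats) of a real Laplacian with variance var (scale b = sqrt(var/2))."""
    b = math.sqrt(var / 2.0)
    return 1.0 + math.log(2.0 * b)

def laplace_negentropy(var):
    # J(Y) = H(Y_gauss) - H(Y), with Y_gauss having the same variance as Y
    return gaussian_entropy(var) - laplace_entropy(var)

print(round(laplace_negentropy(1.0), 4))   # 0.0724
print(round(laplace_negentropy(9.0), 4))   # 0.0724 (independent of the variance)
```

The positive value reflects that a Gaussian has the largest entropy among all distributions with the same variance.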

Analysis of the MN Beamforming Algorithm

[Figure: simulated environment generated by the image method, with the target source]

The strong reflection would cause signal cancellation in conventional adaptive beamformers.

Observe that MN beamforming enhances the target signal by strengthening the reflection, which suggests that it does not suffer from signal cancellation.



Measures for non-Gaussianity

  • Negentropy

  • Empirical kurtosis

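Definition of kurtosis:

The kurtosis of a r.v. $Y$ is defined as

$\mathrm{kurt}(Y) = E\{|Y|^4\} - \beta \left(E\{|Y|^2\}\right)^2$,

where $\beta$ is a positive value.

  • Super-Gaussian pdfs have positive kurtosis,

  • sub-Gaussian pdfs have negative kurtosis, and

  • the Gaussian pdf has zero kurtosis.

Kurtosis can therefore measure the degree of non-Gaussianity.

Empirical approximation of kurtosis:

$\mathrm{kurt}(Y) \approx \dfrac{1}{K} \sum_{k=1}^{K} |Y_k|^4 - \beta \left( \dfrac{1}{K} \sum_{k=1}^{K} |Y_k|^2 \right)^2$,

where $K$ is the number of frames.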
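The empirical kurtosis is a direct average over the K frames. In this sketch β is a parameter: β = 2 gives zero kurtosis for a circular complex Gaussian (E|Y|⁴ = 2σ⁴), and β = 3 does so for a real-valued Gaussian (E[Y⁴] = 3σ⁴); which value the slides use is not stated, so treat the default as an assumption. The function name and test signals are hypothetical.

```python
import numpy as np

def empirical_kurtosis(Y, beta=2.0):
    """kurt(Y) ~ mean(|Y_k|^4) - beta * mean(|Y_k|^2)^2 over K frames.

    beta = 2 gives zero kurtosis for a circular complex Gaussian;
    beta = 3 does so for a real-valued Gaussian.
    """
    Y = np.asarray(Y)
    p2 = np.mean(np.abs(Y) ** 2)
    p4 = np.mean(np.abs(Y) ** 4)
    return p4 - beta * p2 ** 2

print(empirical_kurtosis([1.0, -1.0, 1.0, -1.0], beta=3.0))   # -2.0

rng = np.random.default_rng(0)
K = 200_000
g = (rng.standard_normal(K) + 1j * rng.standard_normal(K)) / np.sqrt(2)
k_gauss = empirical_kurtosis(g, beta=2.0)                       # near 0 (Gaussian)
k_super = empirical_kurtosis(rng.laplace(size=K), beta=3.0)     # > 0 (super-Gaussian)
k_sub = empirical_kurtosis(rng.uniform(-1, 1, K), beta=3.0)     # < 0 (sub-Gaussian)
print(k_gauss, k_super, k_sub)
```

Unlike negentropy, this estimate needs only second- and fourth-order moments, which makes it cheap to compute frame by frame, at the cost of sensitivity to outliers.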