Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition

Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition. Kenichi Kumatani, Disney Research, Pittsburgh; Bhiksha Raj, Carnegie Mellon University; Rita Singh, Carnegie Mellon University; John McDonough, Carnegie Mellon University.


Microphone Array Post-filter based on Spatially-Correlated Noise Measurements for Distant Speech Recognition

Kenichi Kumatani, Disney Research, Pittsburgh

Bhiksha Raj, Carnegie Mellon University

Rita Singh, Carnegie Mellon University

John McDonough, Carnegie Mellon University


Organization of Presentation

  • Our Goal: Distant Speech Recognition (DSR)

  • Background

  • Conventional Post-filtering Methods

  • Motivations

  • Our Post-filtering Method

  • DSR Experiments on Real Array Data

  • Conclusions



Our Goal ~ Distant Speech Recognition (DSR) System

Goal:

Replace the close-talking microphone with the far-field sensors to make human-machine interfaces more interactive.

Overview of our DSR System:

[Block diagram: Microphone array (distant speech) → Speaker Tracking (speaker’s position) → Beamforming → Post-filtering (enhanced speech) → Speech Recognition (recognition result)]

Merits of this Approach:

  • By using the geometry of the microphone array and the speaker’s position, our system has the following merits:

    • stable performance in real environments and

    • straightforward extension to the use of other information sources.

Avoid being blind!



Background of this Work

Background:

  • Beamforming alone does not provide the optimal solution in the minimum mean square error (MMSE) sense.

  • Post-filtering can further improve speech recognition performance.

Key issue:

  • Estimate the power spectral densities (PSD) of the target and noise signals to build the Wiener filter.

Basic Block Chart:

[Block diagram: Multi-channel Input Vector X → Time Delay Compensation → Beamforming → Post-filtering H, with a Post-filter Estimation block supplying H]
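As a minimal sketch of the key issue, the per-frequency Wiener gain can be written directly in terms of the two PSDs. The function name `wiener_gain`, the `floor` parameter, and the example PSD values are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def wiener_gain(psd_target, psd_noise, floor=1e-12):
    """Per-frequency Wiener gain H = phi_ss / (phi_ss + phi_nn)."""
    # Negative PSD estimates are not physical, so clip them to zero.
    psd_target = np.maximum(np.asarray(psd_target, dtype=float), 0.0)
    psd_noise = np.maximum(np.asarray(psd_noise, dtype=float), 0.0)
    # The floor only guards against division by zero in empty bins.
    return psd_target / np.maximum(psd_target + psd_noise, floor)

# Hypothetical PSD values for three frequency bins.
gain = wiener_gain([4.0, 1.0, 0.0], [1.0, 1.0, 2.0])
```

The gain approaches 1 where the target dominates and 0 where the noise dominates, which is why the quality of the two PSD estimates decides the quality of the post-filter.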



Conventional Post-filter Design Method 1

Zelinski Post-filter :

  • Zelinski assumed that

    • The target and noise signals are uncorrelated,

    • The noise signals are uncorrelated between different channels, and

    • The noise PSD is the same among all the channels.

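  • Then, with $\phi_{ss}$ and $\phi_{nn}$ denoting the target and noise PSDs, the auto- and cross-spectral densities of channels $i$ and $j$ simplify, because the target-noise and inter-channel noise cross terms are 0, to

    $$\phi_{x_i x_i} = \phi_{ss} + \phi_{nn}, \qquad \phi_{x_i x_j} = \phi_{ss} \quad (i \neq j)$$

  • By substituting them into the Wiener filter formulation, we have the Zelinski post-filter for $M$ channels:

    $$h_{Z} = \frac{\frac{2}{M(M-1)} \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} \Re\{\hat{\phi}_{x_i x_j}\}}{\frac{1}{M} \sum_{i=1}^{M} \hat{\phi}_{x_i x_i}}$$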
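The Zelinski estimate can be sketched for a single frequency bin as follows. This assumes the channels are already time-aligned toward the target and that the spectral densities are estimated by plain averaging over frames. The function name, the clipping to [0, 1], and the synthetic test data are assumptions made for illustration.

```python
import numpy as np

def zelinski_postfilter(X):
    """Zelinski gain for one frequency bin.

    X: complex array of shape (M channels, K frames), time-aligned.
    """
    M, K = X.shape
    # phi[i, j] is the frame average of X_i * conj(X_j).
    phi = (X @ X.conj().T) / K
    upper = np.triu_indices(M, k=1)
    # Averaged real cross-spectral densities estimate the target PSD.
    numerator = np.mean(np.real(phi[upper]))
    # Averaged auto-spectral densities estimate target plus noise PSD.
    denominator = np.mean(np.real(np.diag(phi)))
    return float(np.clip(numerator / denominator, 0.0, 1.0))

# Synthetic check: target PSD 4, spatially uncorrelated noise PSD 1.
rng = np.random.default_rng(0)
M, K = 4, 20000

def complex_gaussian(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

target = 2.0 * complex_gaussian(K)
X = target[None, :] + complex_gaussian((M, K))
gain = zelinski_postfilter(X)
```

With these synthetic PSDs the ideal Wiener gain is 4 / (4 + 1) = 0.8, so the estimate should land close to that value when the noise really is uncorrelated between channels.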



Conventional Post-filter Design Method 2

Issues of the Zelinski Post-filter :

  • In many situations, the noise signals are spatially correlated.

McCowan Post-filter :

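  • McCowan and Bourlard introduced the coherence of the diffuse noise field, an indicator of the similarity of signals at different positions,

    $$\Gamma_{ij}(\omega) = \frac{\sin(\omega d_{ij} / c)}{\omega d_{ij} / c},$$

    where $d_{ij}$ is the distance between sensors $i$ and $j$ and $c$ is the speed of sound, and compute the cross- and auto-spectral densities from the target PSD $\phi_{ss}$ and noise PSD $\phi_{nn}$ as

    $$\phi_{x_i x_j} = \phi_{ss} + \Gamma_{ij}\,\phi_{nn}, \qquad \phi_{x_i x_i} = \phi_{ss} + \phi_{nn}.$$

    The cross-spectral density is where this differs from the Zelinski method.

  • Then, the McCowan post-filter for $M$ channels can be written as

    $$h_{M} = \frac{\frac{2}{M(M-1)} \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} \hat{\phi}_{ss}^{(ij)}}{\frac{1}{M}\sum_{i=1}^{M}\hat{\phi}_{x_i x_i}}, \qquad \hat{\phi}_{ss}^{(ij)} = \frac{\Re\{\hat{\phi}_{x_i x_j}\} - \frac{1}{2}\Re\{\Gamma_{ij}\}\,(\hat{\phi}_{x_i x_i} + \hat{\phi}_{x_j x_j})}{1 - \Re\{\Gamma_{ij}\}},$$

    where $\hat{\phi}_{ss}^{(ij)}$ is a PSD estimate of the target signal for each sensor pair.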

Lefkimmiatis Post-filter:

  • Lefkimmiatis et al. more accurately model the diffuse noise field by applying the coherence to the denominator of the McCowan post-filter.
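A minimal sketch of the McCowan estimate for one frequency bin is given below, using the sinc coherence model for a diffuse field. The function names, the speed of sound `c=343.0`, the `max_coh` cap that keeps the denominator away from zero, and the synthetic array geometry are all assumptions for illustration. The Lefkimmiatis refinement of the denominator is not included.

```python
import numpy as np

def diffuse_coherence(freq_hz, dist_m, c=343.0):
    # np.sinc(x) = sin(pi x) / (pi x), so this equals sin(w d / c) / (w d / c).
    return np.sinc(2.0 * freq_hz * dist_m / c)

def mccowan_postfilter(X, positions, freq_hz, c=343.0, max_coh=0.99):
    """McCowan gain for one bin. X: (M, K) complex, positions: (M, 3) in metres."""
    M, K = X.shape
    phi = (X @ X.conj().T) / K
    auto = np.real(np.diag(phi))
    estimates = []
    for i in range(M - 1):
        for j in range(i + 1, M):
            d = np.linalg.norm(positions[i] - positions[j])
            g = min(diffuse_coherence(freq_hz, d, c), max_coh)
            # Remove the coherent part of the noise from the cross density.
            phi_ss = (np.real(phi[i, j]) - 0.5 * g * (auto[i] + auto[j])) / (1.0 - g)
            estimates.append(phi_ss)
    return float(np.clip(np.mean(estimates) / np.mean(auto), 0.0, 1.0))

# Synthetic check: target PSD 4 plus diffuse noise with PSD 1 per channel.
rng = np.random.default_rng(0)
M, K, freq = 4, 20000, 1500.0
positions = np.column_stack([0.1 * np.arange(M), np.zeros(M), np.zeros(M)])
dist = np.linalg.norm(positions[:, None] - positions[None, :], axis=-1)
evals, evecs = np.linalg.eigh(diffuse_coherence(freq, dist))
mix = evecs * np.sqrt(np.clip(evals, 0.0, None))  # mix @ mix.T equals the coherence matrix

def complex_gaussian(shape):
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

X = 2.0 * complex_gaussian(K)[None, :] + mix @ complex_gaussian((M, K))
gain = mccowan_postfilter(X, positions, freq)
```

The estimate is only as good as the diffuse-field model: when the measured coherence departs from the sinc curve, the subtracted term no longer matches the actual noise.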



Motivation of our Method

Common Problem of Conventional Methods:

  • The static noise field model does not match every situation.

Example of Noise Coherence in a Car:

  • Figures show the magnitude-squared coherence observed in a car.

[Figure panels: engine idling state; driving at a speed of 65 mph]

  • It is clear that the actual noise field is neither uncorrelated nor diffuse.
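For reference, magnitude-squared coherence between two microphone signals can be estimated with a Welch-style average over overlapping windowed segments. The sketch below uses only NumPy. The function name, segment length, and 50% overlap are assumptions, and the white-noise inputs are synthetic stand-ins for real recordings.

```python
import numpy as np

def magnitude_squared_coherence(x, y, nperseg=256):
    """Welch-style MSC estimate per frequency bin for two real signals."""
    hop = nperseg // 2
    window = np.hanning(nperseg)
    n_segments = (len(x) - nperseg) // hop + 1
    pxx = np.zeros(nperseg // 2 + 1)
    pyy = np.zeros(nperseg // 2 + 1)
    pxy = np.zeros(nperseg // 2 + 1, dtype=complex)
    for k in range(n_segments):
        seg = slice(k * hop, k * hop + nperseg)
        fx = np.fft.rfft(window * x[seg])
        fy = np.fft.rfft(window * y[seg])
        pxx += np.abs(fx) ** 2
        pyy += np.abs(fy) ** 2
        pxy += fx * np.conj(fy)
    # |cross density|^2 normalized by the two auto densities.
    return np.abs(pxy) ** 2 / (pxx * pyy)

rng = np.random.default_rng(0)
common = rng.standard_normal(65536)
msc_same = magnitude_squared_coherence(common, common)
msc_indep = magnitude_squared_coherence(common, rng.standard_normal(65536))
```

Identical signals give a coherence of 1 in every bin and independent noise gives values near 0. A diffuse field would follow the squared sinc curve in between, which is the reference a measured curve can be compared against.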

Our Motivation:

Measure the most dominant noise signal instead of relying on those static noise field assumptions.



Our Strategy: How can we measure a noise signal?

  • Estimate a speaker’s position,

  • Build a beamformer and steer a beam toward the target source,

  • Find where the most dominant interfering source is, and

  • Build another beamformer to measure a noise signal.

[Block diagram: the microphones feed Beamformer 1 (for the target speech), which outputs the enhanced speech, and Beamformer 2 (noise extractor, steered toward the noise source), which outputs the separated noise; both outputs feed the post-filter for further noise removal.]



Our Post-filter System

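  • We build a maximum negentropy beamformer for a target source and a null-steering beamformer for extracting the noise signal.

[Block diagram: the input X feeds the maximum negentropy beamformer for the target source (upper branch $w_{SD}^H$ minus lower branch $B^H$ followed by $w_a^H$) and the null-steering beamformer $w_{null}^H$ for the noise source; both outputs feed the post-filter estimation, which sets the post-filter Hp.]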



Our Post-filter System: Maximum Negentropy (MN) Beamformer (Speech Emphasizer)

[Block diagram: MN beamformer for the target source ($w_{SD}^H$ minus $B^H$ followed by $w_a^H$), null-steering beamformer $w_{null}^H$ for the noise source, and post-filter estimation setting the post-filter Hp]

Maximum Negentropy Criterion:

  • The distribution of clean speech is non-Gaussian and

  • that of noisy and reverberant speech becomes closer to Gaussian.

  • Negentropy is an indicator of how far the distribution of signals is from Gaussian.

Maximum Negentropy Beamformer:

  • Build a super-directive beamformer for the quiescent vector $w_{SD}$.

  • Compute the blocking matrix $B$ to maintain the distortionless constraint for the look direction, $B^H w_{SD} = 0$.

  • Find the active weight vector $w_a$ which provides the maximum negentropy of the output $Y_{SDMN} = (w_{SD} - B w_a)^H X$:

  • $w_a = \arg\max_{w_a} J(Y_{SDMN})$, where $J$ denotes negentropy.

Advantage:

  • We can enhance a structured-information signal coming from the direction of interest without signal cancellation and distortion.
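The fixed parts of this structure, the quiescent vector and the blocking matrix, can be sketched as below. The negentropy search for the active weights is omitted and they are simply set to zero here. The sinc diffuse-coherence model, the diagonal loading value, the far-field steering convention, and all function names are assumptions for illustration, not the authors' settings.

```python
import numpy as np

def steering_vector(positions, direction, freq_hz, c=343.0):
    """Far-field array manifold vector for a unit direction vector."""
    delays = positions @ direction / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def superdirective_weights(v, positions, freq_hz, c=343.0, loading=1e-2):
    """w_SD = Gamma^-1 v / (v^H Gamma^-1 v) with a diffuse coherence matrix Gamma."""
    dist = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    gamma = np.sinc(2.0 * freq_hz * dist / c) + loading * np.eye(len(v))
    g_inv_v = np.linalg.solve(gamma, v)
    return g_inv_v / (v.conj() @ g_inv_v)

def blocking_matrix(w_q):
    """Orthonormal columns spanning the subspace orthogonal to w_q, so B^H w_q = 0."""
    # The first right singular vector of the row w_q^H is parallel to w_q;
    # the remaining ones span its orthogonal complement.
    _, _, vh = np.linalg.svd(w_q.conj()[None, :])
    return vh[1:].conj().T

M, freq = 4, 1000.0
positions = np.column_stack([0.05 * np.arange(M), np.zeros(M), np.zeros(M)])
look = np.array([np.sin(np.deg2rad(30.0)), np.cos(np.deg2rad(30.0)), 0.0])
v = steering_vector(positions, look, freq)
w_sd = superdirective_weights(v, positions, freq)
B = blocking_matrix(w_sd)

w_a = np.zeros(M - 1, dtype=complex)   # placeholder: the negentropy search is not shown
X = v * (1.0 + 0.5j)                   # one snapshot of a plane wave from the look direction
Y = np.vdot(w_sd - B @ w_a, X)         # (w_SD - B w_a)^H X
```

With `w_a` at zero the output reduces to the super-directive beamformer, which passes the plane wave from the look direction with unit gain.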



Our Post-filter System: Null-steering Beamformer (Noise Extractor)

[Block diagram: MN beamformer for the target source ($w_{SD}^H$ minus $B^H$ followed by $w_a^H$), null-steering beamformer $w_{null}^H$ for the noise source, and post-filter estimation setting the post-filter Hp]

Null-steering Beamformer (Noise Extractor):

  • Place a null on the direction of interest (DOI) while maintaining the unity gain for the direction of the noise source.

  • Assuming the array manifold vectors $v$ for the target source and $v_N$ for the noise source, we obtain such a beamformer’s weight by solving the linear equation:

  • $[\, v \;\; v_N \,]^H w_{null} = [\, 0 \;\; 1 \,]^T$.

Advantage:

  • We can extract only the noise signal by eliminating the target signal arriving directly from the source point.
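The two-constraint linear system for the null-steering weights is underdetermined when there are more than two microphones. One common choice, assumed here rather than stated on the slide, is the minimum-norm solution. The linear-array manifold helper and the source angles are hypothetical.

```python
import numpy as np

def linear_array_manifold(n_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Far-field manifold vector of a uniform linear array (angle from broadside)."""
    delays = np.arange(n_mics) * spacing_m * np.sin(np.deg2rad(angle_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)

def null_steering_weights(v_target, v_noise):
    """Minimum-norm w solving [v v_N]^H w = [0 1]^T."""
    A = np.column_stack([v_target, v_noise])          # M x 2 matrix [v v_N]
    rhs = np.array([0.0, 1.0], dtype=complex)
    # lstsq returns the minimum-norm solution of an underdetermined system.
    w, *_ = np.linalg.lstsq(A.conj().T, rhs, rcond=None)
    return w

v = linear_array_manifold(4, 0.05, 0.0, 1000.0)       # target at broadside
v_n = linear_array_manifold(4, 0.05, 60.0, 1000.0)    # dominant noise source
w_null = null_steering_weights(v, v_n)

target_response = np.vdot(v, w_null)    # v^H w_null, should vanish
noise_response = np.vdot(v_n, w_null)   # v_N^H w_null, should equal 1
```

Only the direct path from the target is nulled, so reflections of the target arriving from other directions still leak into the noise estimate.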



Our Post-filter System

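[Block diagram: MN beamformer for the target source ($w_{SD}^H$ minus $B^H$ followed by $w_a^H$), null-steering beamformer $w_{null}^H$ for the noise source, and post-filter estimation setting the post-filter Hp]

Our post-filter design:

  • Now that we have an estimate of the target signal, $Y_{SDMN} = (w_{SD} - B w_a)^H X$, and

  • a noise observation, $Y_{null} = w_{null}^H X$,

we can design the post-filter as

[Equation: post-filter computed from $Y_{SDMN}$ and $Y_{null}$]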
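The exact post-filter equation is not legible in this transcript. One plausible form, offered only as an assumption, is a Wiener-type gain whose target and noise PSDs are tracked by recursive averaging of the two beamformer output powers. The smoothing factor `alpha`, the function name, and the use of the smoothed output power of the target beamformer as the target PSD are all illustrative choices, not the authors' formula.

```python
import numpy as np

def measured_noise_postfilter(y_target, y_noise, alpha=0.9, floor=1e-12):
    """Apply a Wiener-type gain per frame for one frequency bin.

    y_target: output of the target beamformer, shape (K frames,)
    y_noise:  output of the noise-extracting beamformer, shape (K frames,)
    """
    y_target = np.asarray(y_target, dtype=complex)
    y_noise = np.asarray(y_noise, dtype=complex)
    psd_s = 0.0
    psd_n = 0.0
    gains = np.empty(len(y_target))
    for k in range(len(y_target)):
        # Assumption: first-order recursive averages serve as the PSD estimates.
        psd_s = alpha * psd_s + (1.0 - alpha) * abs(y_target[k]) ** 2
        psd_n = alpha * psd_n + (1.0 - alpha) * abs(y_noise[k]) ** 2
        gains[k] = psd_s / max(psd_s + psd_n, floor)
    return gains * y_target, gains

frames = np.ones(50, dtype=complex)
silence = np.zeros(50, dtype=complex)
_, gains_clean = measured_noise_postfilter(frames, silence)
_, gains_equal = measured_noise_postfilter(frames, frames)
_, gains_noise_only = measured_noise_postfilter(silence, frames)
```

Because the noise PSD is updated frame by frame from a measured signal, the gain follows changes in the noise without any fixed coherence model.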



Distant Speech Recognition Experiments



Speech Recognition Results: Word Error Rates in Different Conditions

[Figure: word error rate for each condition]



Conclusions

  • We used actual noise measurements for the microphone array post-filter.

  • It turned out that the noise fields in car conditions are neither uncorrelated nor spherically isotropic (diffuse).

  • It has been demonstrated that our post-filter method can provide the best recognition performance among the popular post-filter methods.

  • This is because our method can update a noise PSD adaptively without any static noise coherence assumption.



Thank you



Speech Samples (65-Wind)

Single Distant Channel

Post-filtered Speech

Extracted Noise Signal



Actual Speech Distribution ~ Super-Gaussian

[Figure: histograms of clean speech compared with super-Gaussian distributions]

*The histograms are computed from the real part of actual subband samples.

  • The distribution of speech is not Gaussian but super-Gaussian.

  • It has “spiky” and “heavy-tailed” characteristics.

How about maximizing a degree of super-Gaussianity?



Why do we need non-Gaussianity measures?

The reasoning rests on two points:

  • The distribution of a sum of independent random variables (r.v.s) approaches Gaussian in the limit as more components are added.

  • Information-bearing signals have a structure which makes them predictable.

If we want the original independent components which bear information, we have to look for a signal that is not Gaussian.

[Figures: distributions of clean and noise-corrupted speech; distributions of clean and reverberated speech]

  • The distributions of noise-corrupted and reverberated speech are closer to Gaussian than that of clean speech.
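The first point can be checked numerically: summing more independent super-Gaussian sources pulls the distribution toward Gaussian. The sketch below uses Laplacian samples as a stand-in for speech subband samples, which is an assumption, and measures the distance from Gaussian with the normalized excess kurtosis.

```python
import numpy as np

def excess_kurtosis(y):
    """Normalized fourth moment minus 3; zero for a Gaussian."""
    y = y - np.mean(y)
    return np.mean(y ** 4) / np.mean(y ** 2) ** 2 - 3.0

rng = np.random.default_rng(0)
n_samples = 200_000
kurt = {}
for n_sources in (1, 4, 16):
    # Sum n_sources independent Laplacian signals sample by sample.
    mixture = rng.laplace(size=(n_sources, n_samples)).sum(axis=0)
    kurt[n_sources] = excess_kurtosis(mixture)
```

For independent, identically distributed sources the excess kurtosis of the sum shrinks in proportion to one over the number of sources, so each additional interfering or reflected component makes the observed signal look more Gaussian.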



Negentropy Criterion for super-Gaussianity

Definition of entropy:

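  • Entropy of an r.v. $Y$ with pdf $p_Y$ is defined as:

    $$H(Y) = -\int p_Y(y) \log p_Y(y)\, dy = -E\{\log p_Y(Y)\}$$

  • Entropy indicates a degree of uncertainty of information.

Definition of negentropy:

  • Negentropy is defined as the difference between the entropy of a Gaussian r.v. and that of a super-Gaussian r.v.:

    $$J(Y) = H(Y_{gauss}) - H(Y)$$

    where $H(Y_{gauss})$ is the entropy of a Gaussian r.v. with the same variance as $Y$ and $H(Y)$ is the entropy of the super-Gaussian r.v.

  • Negentropy indicates how far the distribution of the r.v. is from Gaussian: the higher the value, the farther it is.

  • Negentropy is generally more robust than other non-Gaussianity criteria.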
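A sample-average version of the negentropy definition can be sketched for real-valued data by fixing a super-Gaussian pdf model. A zero-mean Laplacian is assumed here purely for illustration; the slides do not name the pdf used. Entropies are in nats.

```python
import numpy as np

def gaussian_entropy(variance):
    """Differential entropy of a real Gaussian r.v. with the given variance."""
    return 0.5 * np.log(2.0 * np.pi * np.e * variance)

def laplace_log_pdf(y, scale):
    return -np.abs(y) / scale - np.log(2.0 * scale)

def negentropy_laplace_model(y):
    """H(Y_gauss) - H(Y), with H(Y) estimated as -mean(log p(y)) under a Laplacian."""
    scale = np.mean(np.abs(y))                      # ML scale of a zero-mean Laplacian
    entropy = -np.mean(laplace_log_pdf(y, scale))   # sample estimate of -E{log p(Y)}
    return gaussian_entropy(np.var(y)) - entropy

rng = np.random.default_rng(0)
j_laplace = negentropy_laplace_model(rng.laplace(size=200_000))
j_gauss = negentropy_laplace_model(rng.standard_normal(200_000))

# Closed form for a Laplacian: 0.5*log(4*pi) - 0.5 - log(2), about 0.072 nats.
j_laplace_exact = 0.5 * np.log(4.0 * np.pi) - 0.5 - np.log(2.0)
```

The Laplacian samples score higher than the Gaussian ones. With a fixed pdf model the entropy term is really a cross-entropy, so the value for Gaussian data can fall slightly below zero; what matters for the beamformer is that the criterion grows as the output becomes more super-Gaussian.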



Analysis of the MN Beamforming Algorithm

[Figure: simulated environment by the image method, showing the target source, its image, and the reflection; labels 30°, 4 m, 70.9°]

The signal cancellation will occur because of the strong reflection.

[Figure panels: 650 Hz and 1600 Hz]

Observe that MN beamforming can enhance the target signal by strengthening the reflection, which suggests it does not suffer from signal cancellation.



Measures for non-Gaussianity

  • Negentropy

  • Empirical kurtosis

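Definition of kurtosis:

  • Kurtosis of an r.v. $Y$ is defined as:

    $$kurt(Y) = E\{|Y|^4\} - \beta \left(E\{|Y|^2\}\right)^2$$

    where $\beta$ is a positive value (3 for a zero-mean real-valued r.v., which gives the Gaussian zero kurtosis).

  • Super-Gaussian pdfs: positive kurtosis,

  • Sub-Gaussian pdfs: negative kurtosis,

  • The Gaussian pdf: zero kurtosis.

Kurtosis can measure the degree of non-Gaussianity.

Empirical approximation of kurtosis:

    $$kurt(Y) \approx \frac{1}{K}\sum_{k=1}^{K} |Y(k)|^4 - \beta \left(\frac{1}{K}\sum_{k=1}^{K} |Y(k)|^2\right)^2$$

where K is the number of frames.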
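The empirical kurtosis is a pair of frame averages and can be sketched in a few lines. The default `beta=3.0` assumes zero-mean real-valued samples, and the three synthetic distributions are illustrative stand-ins, not data from the experiments.

```python
import numpy as np

def empirical_kurtosis(y, beta=3.0):
    """(1/K) sum |Y(k)|^4 - beta * ((1/K) sum |Y(k)|^2)^2 over K frames."""
    second_moment = np.mean(np.abs(y) ** 2)
    fourth_moment = np.mean(np.abs(y) ** 4)
    return fourth_moment - beta * second_moment ** 2

rng = np.random.default_rng(0)
K = 200_000
kurt_laplace = empirical_kurtosis(rng.laplace(size=K))             # super-Gaussian
kurt_gauss = empirical_kurtosis(rng.standard_normal(K))            # Gaussian
kurt_uniform = empirical_kurtosis(rng.uniform(-1.0, 1.0, size=K))  # sub-Gaussian
```

The signs follow the classification above: positive for the Laplacian, close to zero for the Gaussian, and negative for the uniform samples. Because the fourth power weights large samples heavily, a few outliers can move the estimate noticeably.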

