
Multi-channel speech enhancement



  1. Multi-channel speech enhancement. Chunjian Li, DICOM, Aalborg University. Lecture notes for Speech Communications.

  2. Methods & applied fields • Dual-channel spectral subtraction - noise reduction in speech • Adaptive Noise Canceling (ANC) - noise reduction and interference elimination - echo canceling - adaptive beamforming • Blind Source Separation (BSS) • Blind Source Extraction (BSE)

  3. Dual-channel spectral subtraction - Hanson and Wong, ICASSP84.

  4. The method • The exponent is chosen to be a = 1, based on listening tests and a spectral distortion measure. • The noisy phase is used to reconstruct the signal. • The estimate of the noise spectrum is either obtained from a reference channel or estimated from the noisy signal, assuming the SNR is very low (about -12 dB).
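
A minimal sketch of the subtraction step for one frame, assuming the generalized form $|\hat{S}|^a = |X|^a - |N|^a$ with the noise spectrum taken from a reference channel. The function name `spectral_subtract`, the clipping of negative differences to zero, and the absence of windowing and overlap-add are assumptions made for illustration, not details taken from the slides.

```python
import numpy as np

def spectral_subtract(noisy_frame, noise_frame, a=1.0):
    """Subtract the noise magnitude spectrum from the noisy one in the |.|^a domain."""
    X = np.fft.rfft(noisy_frame)
    N = np.fft.rfft(noise_frame)
    # Subtract magnitudes raised to the power a; clip negative results to zero.
    diff = np.abs(X) ** a - np.abs(N) ** a
    mag = np.maximum(diff, 0.0) ** (1.0 / a)
    # Reconstruct with the noisy phase, since the clean phase is not estimated.
    S_hat = mag * np.exp(1j * np.angle(X))
    return np.fft.irfft(S_hat, n=len(noisy_frame))
```

With an all-zero noise frame the function returns the input frame unchanged, which is a quick sanity check of the phase handling.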

  5. Revisiting the phase issue To see the dependence of the magnitude on phase, write the noisy spectrum as the sum of the signal and noise spectra, $X = S + N$, so that $|X|^2 = |S|^2 + |N|^2 + 2|S||N|\cos\theta$, where $\theta$ is the phase difference between the two signals. It is clear that the estimate of the signal magnitude spectrum depends on both the SNR and the phase difference. The phase is nevertheless not estimated in this method, because the enhanced quality is acceptable without it.
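
A quick numerical illustration of this dependence, assuming the noisy spectrum at one frequency bin is the complex sum of a signal component and a noise component. The helper name `noisy_magnitude` and the example magnitudes are hypothetical.

```python
import numpy as np

def noisy_magnitude(s_mag, n_mag, theta):
    """|S + N| for magnitudes s_mag, n_mag and phase difference theta (radians)."""
    return np.sqrt(s_mag**2 + n_mag**2 + 2.0 * s_mag * n_mag * np.cos(theta))

# Same magnitudes, three different phase differences.
for theta in (0.0, np.pi / 2, np.pi):
    print(f"theta = {theta:.2f} rad -> |X| = {noisy_magnitude(1.0, 0.5, theta):.3f}")
```

Subtracting a fixed noise magnitude from these three values gives three different signal estimates, which is why the magnitude estimate cannot be exact without knowledge of the phase difference.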

  6. Comments • The simplest (and somewhat unrealistic) way of exploiting multiple channels. • Aims at improving intelligibility. • Significant intelligibility gains only at very low SNR (-12 dB). • Unvoiced speech is not processed.

  7. Adaptive Noise Canceling • First proposed by Widrow et al. [1] in 1975. • It is adaptive because it uses an adaptive filter, typically updated by an algorithm such as LMS. • The objective: estimate the noise in the primary channel from the noise recorded in the secondary (reference) channel, and subtract the estimate from the primary-channel recording. [1] B. Widrow, J. R. Glover, J. M. McCool, et al., "Adaptive noise canceling: Principles and applications," Proceedings of the IEEE, vol. 63, pp. 1692-1716, Dec. 1975.

  8. Signal model

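  9. Signal estimation The estimated signal: $\hat{s}(n) = e(n) = y(n) - \mathbf{w}^T\mathbf{x}(n)$, where $y(n)$ is the primary-channel sample, $\mathbf{x}(n) = [x(n), x(n-1), \ldots, x(n-M+1)]^T$ holds the latest $M$ reference-channel samples, and $\mathbf{w}$ is the vector of filter weights. The optimization criterion: $\min_{\mathbf{w}} E[e^2(n)]$.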

  10. Signal estimation The minimization can be solved by applying the orthogonality principle: $E[e(n)\,\mathbf{x}(n)] = \mathbf{0}$, where $e(n)$ is the output error (the signal estimate) and $\mathbf{x}(n)$ is the vector of reference samples used by the filter. • This can be solved in the same way as the normal equations. • In practice it is usually solved by a sequential algorithm such as LMS. The advantages of LMS are: • No matrix inversion, low complexity • Fully adaptive, suitable for non-stationary signal and noise • Low delay
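
For comparison with the sequential approach, a sketch of the batch solution that solves the normal equations directly from sample correlations. The names `wiener_anc`, `primary`, `reference` and `num_taps` are assumptions for illustration; the whole recording is assumed to be available and stationary.

```python
import numpy as np

def wiener_anc(primary, reference, num_taps):
    """Batch noise canceller: solve R w = p, then subtract the filtered reference."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(reference)
    padded = np.concatenate([np.zeros(num_taps - 1), reference])
    # Column k of X holds the reference delayed by k samples (zeros before the start).
    X = np.stack(
        [padded[num_taps - 1 - k : num_taps - 1 - k + n] for k in range(num_taps)],
        axis=1,
    )
    R = X.T @ X / n            # sample correlation matrix of the reference
    p = X.T @ primary / n      # sample cross-correlation with the primary channel
    w = np.linalg.solve(R, p)  # normal equations
    return primary - X @ w, w
```

This needs a matrix solve and the full data block, which is exactly what the sequential algorithm avoids.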

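  11. LMS • It is a sequential, gradient-descent minimization method. • The estimate of the weights is updated each time a new sample is available: $w_k(n+1) = w_k(n) - \frac{\mu}{2}\,\nabla_k(n)$, $k = 0, \ldots, M-1$, where $\mu$ is the step size and the $k$-th element of the gradient vector is $\nabla_k(n) = \partial E[e^2(n)] / \partial w_k = -2\,E[e(n)\,x(n-k)]$, with $e(n)$ the output error and $x(n)$ the reference input.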

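  12. LMS Or, in matrix form: $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,[\mathbf{p} - \mathbf{R}\,\mathbf{w}(n)]$, where $\mathbf{R} = E[\mathbf{x}(n)\mathbf{x}^T(n)]$ is the correlation matrix of the reference input and $\mathbf{p} = E[y(n)\,\mathbf{x}(n)]$ is its cross-correlation with the primary input $y(n)$. The most important trick in this sequential implementation is to approximate the correlation matrix and the cross-correlation vector by their instantaneous estimates, $\hat{\mathbf{R}} = \mathbf{x}(n)\mathbf{x}^T(n)$ and $\hat{\mathbf{p}} = y(n)\,\mathbf{x}(n)$, which gives $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,e(n)\,\mathbf{x}(n)$ with $e(n) = y(n) - \mathbf{w}^T(n)\,\mathbf{x}(n)$.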
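
A minimal sketch of an LMS noise canceller, assuming the update convention w(n+1) = w(n) + mu * e(n) * x(n), where x(n) holds the most recent reference samples and e(n) is the primary sample minus the filtered reference. The function and parameter names (`lms_anc`, `num_taps`, `mu`) are hypothetical, and the defaults are illustrative rather than tuned.

```python
import numpy as np

def lms_anc(primary, reference, num_taps=8, mu=0.01):
    """Return (enhanced signal, final weights) of an LMS adaptive noise canceller."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    # Zero-pad so that the first samples have a full tap vector.
    padded = np.concatenate([np.zeros(num_taps - 1), reference])
    w = np.zeros(num_taps)
    out = np.zeros(len(primary))
    for n in range(len(primary)):
        # x[k] = reference[n - k]: newest sample first.
        x = padded[n : n + num_taps][::-1]
        e = primary[n] - w @ x   # error = signal estimate
        w = w + mu * e * x       # instantaneous-gradient update
        out[n] = e
    return out, w
```

Each iteration costs a dot product and a scaled vector addition, so the work per sample grows linearly with the number of taps.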

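  13. LMS The step size $\mu$ is often chosen empirically, as long as the following condition is satisfied for stability. For the update $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu\,e(n)\,\mathbf{x}(n)$ the condition is $0 < \mu < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of the correlation matrix $\mathbf{R}$ of the reference input. The larger the step size, the faster the convergence, but also the larger the estimation variance.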
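
A sketch of how an upper bound on the step size could be estimated from data, assuming the update w(n+1) = w(n) + mu * e(n) * x(n), for which convergence in the mean requires mu < 2 / lambda_max. If the update is written with a factor of two in front of mu, the bound halves. The function name `max_stable_step` is hypothetical.

```python
import numpy as np

def max_stable_step(reference, num_taps):
    """Estimate 2 / lambda_max of the reference-input correlation matrix."""
    x = np.asarray(reference, dtype=float)
    n = len(x)
    # Biased autocorrelation estimates r[0], ..., r[num_taps - 1].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(num_taps)])
    # Toeplitz correlation matrix with R[i, j] = r[|i - j|].
    lags = np.abs(np.arange(num_taps)[:, None] - np.arange(num_taps)[None, :])
    R = r[lags]
    lam_max = np.linalg.eigvalsh(R).max()
    return 2.0 / lam_max
```

In practice a step size well below this bound is used, since the bound only guarantees convergence of the mean weights and says nothing about the estimation variance.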

  14. Comments • LMS belongs to the family of stochastic gradient algorithms. • The algorithm is based on instantaneous estimates of the correlation functions, which have high variance. It nevertheless works well because of its iterative nature, which averages the estimates over time. • Low complexity: O(M), where M is the filter order. • Although the derivation is based on the WSS assumption, the algorithm is applicable to non-stationary signals, owing to the sequential implementation.

  15. Implementation issues of ANC • The microphones must be sufficiently separated in space, or separated by acoustic barriers. • Typically 1500 taps are needed => large misadjustment => pronounced echo => a small step size must be used => long convergence time. • The different delays from the sources to the two microphones must be compensated. • Frequency-domain LMS can reduce the number of taps needed. • ANC can be generalized to a multi-channel system, which can be seen as a generalized beamforming system.
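
One common way to handle the delay issue is to estimate the relative delay between the two channels from the peak of their cross-correlation and align them before adaptive filtering. The sketch below assumes equal-length recordings and an integer-sample delay; the name `estimate_delay` and the brute-force search over lags are illustrative choices, not taken from the slides.

```python
import numpy as np

def estimate_delay(primary, reference, max_lag):
    """Lag (in samples) by which `primary` trails `reference`, from the cross-correlation peak."""
    primary = np.asarray(primary, dtype=float)
    reference = np.asarray(reference, dtype=float)
    n = len(primary)
    best_lag, best_val = 0, -np.inf
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            val = np.dot(primary[lag:], reference[: n - lag])
        else:
            val = np.dot(primary[: n + lag], reference[-lag:])
        if abs(val) > best_val:
            best_lag, best_val = lag, abs(val)
    return best_lag
```

A positive result means the reference should be delayed by that many samples (or the filter given that many extra taps) before cancellation.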

  16. Eliminating cross-talk Cross-talk: if the signal is also captured in the reference channel, the ANC will suppress part of the signal. Cross-talk can be reduced by employing two adaptive filters within a feedback loop.

  17. Beamforming • Compared to ANC, beamforming is truly a spatial filtering technique. • First, locate the source direction; then form a beam directed at the source. • The source location problem is an analogy of the spectral analysis problem, with the frequency domain replaced by the spatial domain.

  18. A simple array model • Planar wave • Uniform linear array (ULA) • Sensor responses are identical and LTI • Sensors are omnidirectional • One parameter to estimate: the direction of arrival (DOA)

  19. ULA

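  20. ULA The signal model: $\mathbf{y}(t) = \mathbf{a}(\theta)\,s(t) + \mathbf{e}(t)$, where the array transfer vector is $\mathbf{a}(\theta) = [1, e^{-i\omega_c\tau_2}, \ldots, e^{-i\omega_c\tau_m}]^T$. Here $\tau_k = (k-1)\,d\sin\theta / c$ is the delay at sensor $k$ with reference to the first sensor, and $\omega_c$ is the center frequency of the signal. By defining the spatial frequency as $\omega_s = \omega_c\,d\sin\theta / c$, with $d$ the sensor spacing, $\theta$ the DOA measured from broadside, and $c$ the propagation speed, we can write the array transfer vector as $\mathbf{a}(\theta) = [1, e^{-i\omega_s}, \ldots, e^{-i(m-1)\omega_s}]^T$.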
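
A small sketch of the array transfer (steering) vector of a ULA, assuming a narrowband plane wave, the DOA measured from broadside, and the spatial frequency written as 2 * pi * (d / wavelength) * sin(theta). The function name `steering_vector` and the parameter `spacing_over_wavelength` are assumptions for illustration.

```python
import numpy as np

def steering_vector(theta_rad, num_sensors, spacing_over_wavelength=0.5):
    """Array transfer vector of a uniform linear array for a plane wave from theta_rad."""
    omega_s = 2.0 * np.pi * spacing_over_wavelength * np.sin(theta_rad)
    # Sensor k sees the first sensor's signal with a phase shift of -k * omega_s.
    return np.exp(-1j * omega_s * np.arange(num_sensors))
```

A wave from broadside reaches all sensors in phase, so the vector is all ones; with half-wavelength spacing, a wave from endfire alternates in sign from sensor to sensor.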

  21. ULA • There is a direct analogy between frequency analysis and spatial analysis via the spatial frequency. • To avoid spatial aliasing, the sensor spacing $d$ must not exceed half the signal wavelength $\lambda$: $d \le \lambda/2$. • All frequency analysis techniques can be applied to the DOA estimation problem.
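
As an example of carrying a frequency analysis technique over to DOA estimation, the sketch below applies the spatial counterpart of the periodogram: it scans candidate directions and picks the one with the largest output power. It assumes a ULA, narrowband complex snapshots arranged as sensors by time, and a single dominant source. The name `beamscan_doa` and the default grid are hypothetical.

```python
import numpy as np

def beamscan_doa(snapshots, spacing_over_wavelength=0.5, grid_deg=None):
    """Return (estimated DOA in degrees, spatial spectrum over the grid)."""
    if grid_deg is None:
        grid_deg = np.arange(-90.0, 90.5, 0.5)
    m, num_snap = snapshots.shape
    # Sample spatial covariance matrix of the sensor outputs.
    R = snapshots @ snapshots.conj().T / num_snap
    k = np.arange(m)
    power = []
    for deg in grid_deg:
        omega_s = 2.0 * np.pi * spacing_over_wavelength * np.sin(np.deg2rad(deg))
        a = np.exp(-1j * omega_s * k)
        # Output power of a beam steered to this direction.
        power.append(np.real(a.conj() @ R @ a) / m**2)
    power = np.array(power)
    return grid_deg[np.argmax(power)], power
```

As with the periodogram, the resolution of this estimator is limited by the aperture: closely spaced sources merge into one peak unless the array is long enough.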

  22. Spatial filtering • Analogy between spatial filter and temporal filter

  23. Spatial filtering • The spatially filtered signal: $y_F(t) = \mathbf{h}^H\mathbf{y}(t)$, where $\mathbf{h}$ is the vector of spatial filter weights and $\mathbf{y}(t)$ is the vector of sensor outputs. • Objective: find the filter that passes signals from a given DOA undistorted and attenuates signals from all other DOAs as much as possible.
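
The simplest filter that meets the first half of this objective is the delay-and-sum beamformer, whose weights are the steering vector of the look direction divided by the number of sensors, so that the look direction is passed with unit gain. The sketch below assumes a ULA with a narrowband source; all function names are hypothetical, and this is not necessarily the design shown on the slides.

```python
import numpy as np

def ula_steering(theta_deg, m, spacing_over_wavelength=0.5):
    """Steering vector of an m-sensor ULA for a plane wave from theta_deg."""
    omega_s = 2.0 * np.pi * spacing_over_wavelength * np.sin(np.deg2rad(theta_deg))
    return np.exp(-1j * omega_s * np.arange(m))

def delay_and_sum_weights(look_deg, m, spacing_over_wavelength=0.5):
    """Weights h with h^H a(look_deg) = 1."""
    return ula_steering(look_deg, m, spacing_over_wavelength) / m

def beam_pattern(h, thetas_deg, spacing_over_wavelength=0.5):
    """Power gain |h^H a(theta)|^2 for each direction in thetas_deg."""
    m = len(h)
    return np.array([
        abs(np.vdot(h, ula_steering(t, m, spacing_over_wavelength))) ** 2
        for t in thetas_deg
    ])

h = delay_and_sum_weights(0.0, 4)
pattern = beam_pattern(h, np.arange(-90, 91))
```

Plotting `pattern` against the angle grid shows the main lobe at the look direction and the side lobes that limit attenuation elsewhere; adding sensors narrows the main lobe.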

  24. The beam pattern

  25. Restrictions of beamforming • Very sensitive to array geometry; needs good calibration • Provides only directivity, no selectivity in range or other location parameters • Frequency response is not flat • Ambient noise is assumed to be spatially white • Beam width (or selectivity) depends on the size of the array • Spatial aliasing problem

  26. Blind Source Separation (BSS) • MIMO systems • Spatial processing techniques with no knowledge of array geometry • Invisible beam • Arbitrarily high spatial resolution • Does not depend on signal frequency • Spatial noise is not assumed to be white • Not a spatial sampling system

  27. Solutions to BSS • Independent Component Analysis (ICA) [2] • Independent Factor Analysis (IFA) [3] [2] A. Hyvärinen, J. Karhunen, and E. Oja, Independent Component Analysis, John Wiley & Sons, 2001. [3] H. Attias, "Independent factor analysis," Neural Computation, 1999.

  28. Independent component analysis (ICA) Assumptions of the basic model: • Instantaneous mixing • The number of sensors is greater than or equal to the number of sources • No system noise • The sources (components) are independent of each other • The sources are non-Gaussian processes

  29. ICA model Cocktail party problem. Three sources, three sensors: $x_i(t) = a_{i1}s_1(t) + a_{i2}s_2(t) + a_{i3}s_3(t)$, $i = 1, 2, 3$. Or, in matrix form: $\mathbf{x} = \mathbf{A}\mathbf{s}$. Neither $\mathbf{s}$ nor $\mathbf{A}$ is known, so the system cannot be solved by linear algebra alone. If the sources are independent and non-Gaussian, the matrix $\mathbf{A}$ can be found by maximizing the non-Gaussianity of the estimated sources.

  30. Contrast function An iterative gradient method. First initialize the A matrix. If the mixing matrix A is square and non-singular, move it to the left: $\mathbf{s} = \mathbf{A}^{-1}\mathbf{x}$. Calculate the non-Gaussianity of s, and find the next estimate of A that gives a higher non-Gaussianity. Iterate until convergence. The contrast function is the objective function to be maximized or minimized.
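
A sketch of one way to run such an iteration, with kurtosis as the contrast function. Note that this swaps in a FastICA-style fixed-point update with deflation (after Hyvärinen and Oja) in place of a plain gradient step, and it estimates the unmixing directions on whitened data rather than updating A directly. The function names `whiten` and `ica_kurtosis` are hypothetical.

```python
import numpy as np

def whiten(x):
    """Center the mixtures and transform them to unit covariance."""
    x = x - x.mean(axis=1, keepdims=True)
    cov = x @ x.T / x.shape[1]
    eigval, eigvec = np.linalg.eigh(cov)
    V = eigvec @ np.diag(eigval ** -0.5) @ eigvec.T
    return V @ x

def ica_kurtosis(x, num_iter=200, seed=0):
    """Estimate sources (up to order, sign and scale) from mixtures x (sensors x samples)."""
    z = whiten(x)
    n = z.shape[0]
    rng = np.random.default_rng(seed)
    W = np.zeros((n, n))
    for i in range(n):
        w = rng.standard_normal(n)
        w /= np.linalg.norm(w)
        for _ in range(num_iter):
            y = w @ z
            # Fixed-point step for the kurtosis contrast: E[z y^3] - 3 w.
            w_new = (z * y**3).mean(axis=1) - 3.0 * w
            # Deflation: stay orthogonal to the directions already found.
            w_new -= W[:i].T @ (W[:i] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < 1e-10
            w = w_new
            if converged:
                break
        W[i] = w
    return W @ z
```

The recovered components come out in arbitrary order and with arbitrary sign and scale, which is an inherent ambiguity of the ICA model rather than a defect of this particular iteration.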

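  31. Maximizing non-Gaussianity • Non-Gaussian is independent: by the central limit theorem, a mixture of independent sources is closer to Gaussian than the sources themselves, so the most non-Gaussian projections of the data are the independent components. • Measuring non-Gaussianity - by kurtosis - by negentropy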
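
A small illustration of kurtosis as a non-Gaussianity measure, assuming the excess kurtosis E[x^4] - 3 of a standardized variable, which is zero for a Gaussian. The helper name `excess_kurtosis` and the choice of uniform sources are assumptions for the example.

```python
import numpy as np

def excess_kurtosis(x):
    """Sample excess kurtosis of x after standardizing to zero mean, unit variance."""
    x = np.asarray(x, dtype=float)
    x = (x - x.mean()) / x.std()
    return np.mean(x**4) - 3.0

rng = np.random.default_rng(0)
s1 = rng.uniform(-1, 1, 100_000)
s2 = rng.uniform(-1, 1, 100_000)
mix = (s1 + s2) / np.sqrt(2)
gauss = rng.standard_normal(100_000)

print("source :", excess_kurtosis(s1))
print("mixture:", excess_kurtosis(mix))
print("gauss  :", excess_kurtosis(gauss))
```

The mixture's excess kurtosis lies closer to zero than that of either source, which is the effect the separation exploits: undoing the mixing pushes the measure away from zero again.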

  32. ICA methods • ICA by maximizing non-Gaussianity • ICA by Maximum Likelihood • ICA by minimizing mutual information • ICA by nonlinear decorrelation

  33. Extensions to ICA • Noisy ICA • ICA with non-square mixing matrix • Independent Factor Analysis • Convolutive mixture • Methods using time structure

  34. Blind Source Extraction • Only one or a few sources out of many are of interest (feature extraction) • Saves computation • The exact number of sources need not be known

  35. BSE D. Mandic and A. Cichocki, An Online Algorithm For Blind Extraction Of Sources With Different Dynamical Structures.
