
Blind Separation of Speech Mixtures

Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University.


Presentation Transcript


  1. Blind Separation of Speech Mixtures. Vaninirappuputhenpurayil Gopalan REJU, School of Electrical and Electronic Engineering, Nanyang Technological University

  2. Introduction Blind Source Separation Convolutive • Mixing process: s1 s2 • Unmixing process:

  3. Introduction Convolutive Blind Source Separation Instantaneous Blind Source Separation

  4. Introduction Convolutive Blind Source Separation: difficult to separate. Instantaneous Blind Source Separation: easy to separate. • In frequency domain:
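
The equations on this slide are not preserved in the transcript. As a sketch in assumed notation (the filter taps $h_{pq}(l)$, the filter length $L$ and the matrix $\mathbf{H}(f)$ are symbols chosen here, not taken from the slides), convolutive mixing of $Q$ sources at $P$ sensors and its usual narrowband approximation after a short-time Fourier transform can be written as:

```latex
\begin{align}
  x_p(t) &= \sum_{q=1}^{Q} \sum_{l=0}^{L-1} h_{pq}(l)\, s_q(t-l),
           \qquad p = 1, \dots, P \\
  \mathbf{X}(f,t) &\approx \mathbf{H}(f)\, \mathbf{S}(f,t)
\end{align}
```

Under this approximation, which assumes an analysis window that is long compared with the mixing filters, each frequency bin sees an instantaneous mixture with a complex mixing matrix $\mathbf{H}(f)$.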

  5. Introduction • Overdetermined mixing: No. of sources < No. of sensors • Determined mixing: No. of sources = No. of sensors • Underdetermined mixing: No. of sources > No. of sensors. Overdetermined and determined mixtures are easy to separate; underdetermined mixtures are difficult to separate.

  6. Approaches for BSS of Speech Signals Types of mixing Instantaneous mixing Convolutive mixing

  7. Approaches for BSS of Speech Signals Instantaneous mixing Step 1: Selection of cost function Step 2: Minimization or maximization of the cost function Block diagram: sources S1, S2 → mixing H → mixtures X1, X2 → unmixing W → outputs Y1, Y2 (separated?)

  8. Approaches for BSS of Speech Signals Instantaneous mixing Selection of cost function: • Statistical independence: signals from two different sources are independent (information theoretic measures) • Non-Gaussianity: by the central limit theorem, a mixture of two or more sources will be more Gaussian than its individual components. Non-Gaussianity measures: kurtosis, negentropy • Nonlinear cross moments • Temporal structure of speech • Non-stationarity of speech
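
The central limit theorem argument can be checked numerically. The following is a minimal sketch, assuming Laplacian noise as a stand-in for super-Gaussian speech samples (that choice, the helper name `excess_kurtosis` and the equal-weight mixture are illustrative assumptions, not part of the slides):

```python
import numpy as np

def excess_kurtosis(x):
    """Excess kurtosis of a signal: E[z^4] - 3 for the standardized signal z.
    It is 0 for a Gaussian and positive for super-Gaussian signals."""
    z = (x - x.mean()) / x.std()
    return np.mean(z ** 4) - 3.0

rng = np.random.default_rng(0)
n = 100_000

# Assumption: Laplacian samples stand in for speech-like (super-Gaussian) sources.
s1 = rng.laplace(size=n)
s2 = rng.laplace(size=n)

# Equal-weight instantaneous mixture of the two independent sources.
mix = (s1 + s2) / np.sqrt(2)

k_s1 = excess_kurtosis(s1)
k_s2 = excess_kurtosis(s2)
k_mix = excess_kurtosis(mix)
print(f"sources: {k_s1:.2f}, {k_s2:.2f}  mixture: {k_mix:.2f}")
```

The mixture's excess kurtosis lies closer to zero than that of either source, which is why maximizing a non-Gaussianity measure of the output pushes it towards a single source.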

  9. Approaches for BSS of Speech Signals Instantaneous mixing Minimization or maximization of the cost function: • Simple gradient method • Natural gradient method, e.g. Infomax ICA algorithm • Newton’s method, e.g. FastICA
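
As an illustration of the natural gradient option, here is a minimal batch sketch of the update W ← W + μ (I − φ(y) yᵀ) W. The `tanh` score function, the step size, the iteration count and the 2×2 mixing matrix are assumptions chosen for this toy example with Laplacian sources; it is not the implementation behind the slides.

```python
import numpy as np

def natural_gradient_ica(x, step=0.1, n_iter=1000):
    """Batch natural-gradient ICA for super-Gaussian sources.

    x: array of shape (n_channels, n_samples) holding the mixtures.
    Returns the estimated unmixing matrix W.
    """
    n, t = x.shape
    w = np.eye(n)
    for _ in range(n_iter):
        y = w @ x
        # tanh is a common score function for super-Gaussian signals (assumption).
        grad = np.eye(n) - (np.tanh(y) @ y.T) / t
        # Natural gradient: the ordinary gradient term right-multiplied by W.
        w = w + step * grad @ w
    return w

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20_000))          # toy super-Gaussian sources
a = np.array([[1.0, 0.6], [0.5, 1.0]])     # hypothetical mixing matrix
w = natural_gradient_ica(a @ s)

# Global matrix: close to a scaled permutation when separation succeeds.
g = w @ a
print(np.round(g, 3))
```

Each row of the global matrix `g` should be dominated by one entry, reflecting the scaling and ordering ambiguities that remain after separation.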

  10. Approaches for BSS of Speech Signals Convolutive Mixing Time domain: • Advantage: no permutation problem • Disadvantages: slow convergence; high computational cost for long filters. Frequency domain: • Advantages: low computational cost; fast convergence • Disadvantage: permutation problem (in each bin the outputs may appear in the order Y1, Y2 or Y2, Y1). Block diagram: sources S1, S2 → mixing H → mixtures X1, X2 → unmixing W → outputs

  11. Permutation Problem in Frequency Domain BSS Block diagram: mixed signals x1, x2, x3 → K point FFT → instantaneous ICA algorithm (BSS) applied in each frequency bin f1, f2, …, fk → solving permutation problem → K point IFFT → separated signals y1, y2, y3. Due to the permutation problem, the outputs at the same position in different bins may correspond to different sources (for example, one bin's output corresponding to y3), so the signals are still mixed until the permutation is solved.

  12. Motivation BSS taxonomy: • Determined/Overdetermined (# mixtures ≥ # sources): Instantaneous; Convolutive: Time domain, or Frequency domain (frequency bin-wise separation, permutation problem) • Underdetermined (# mixtures < # sources): Instantaneous (mixing matrix estimation, source estimation); Convolutive: Time domain, or Frequency domain (frequency bin-wise separation, permutation problem) • Automatic detection of no. of sources

  13. My Contribution - I (BSS taxonomy diagram, as on slide 12)

  14. Algorithm for Solving the Permutation Problem Block diagram: mixed signals x1, x2, x3 → K point FFT → instantaneous ICA algorithm (BSS) applied in each frequency bin f1, f2, …, fk (permutation problem) → solving permutation problem (permutation problem solved) → K point IFFT → separated signals y1, y2, y3

  15. Existing Method for Solving the Permutation Problem Direction Of Arrival (DOA) method: Direction of y1 = -30°, direction of y2 = 20°. Symbols used: position of the pth sensor, velocity of sound
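
The DOA formula itself is not preserved in the transcript. One commonly used far-field form is sketched below in assumed notation: $\mathbf{W}(f)$ is the unmixing matrix at frequency $f$, $d_j$ is the position of the $j$th sensor on a linear array, and $c$ is the velocity of sound. This is an illustration of the idea, not necessarily the exact expression on the slide.

```latex
\theta_k(f) = \arccos\!\left(
  \frac{\arg\!\left( [\mathbf{W}^{-1}(f)]_{jk} \,/\, [\mathbf{W}^{-1}(f)]_{j'k} \right)}
       {2\pi f c^{-1} \left( d_j - d_{j'} \right)}
\right)
```

The outputs in every bin are then ordered by their estimated direction. When $f\,(d_j - d_{j'})$ is small, the phase difference in the numerator is small too, so errors in measuring it have a large effect on $\theta_k(f)$.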

  16. Existing Method for Solving the Permutation Problem Direction Of Arrival (DOA) method: • Disadvantages: fails at lower frequencies; fails when the sources are near each other; is affected by room reverberation; requires the sensor positions to be known. • Reasons for failure at lower frequencies: lower spacing causes error in the phase difference measurement; the relation is approximated for a plane wave front under anechoic conditions.

  17. Existing Method for Solving the Permutation Problem Adjacent bands correlation method: Block diagram: mixed signals x1, x2, x3 → K point FFT → BSS in each frequency bin f1, f2, …, fk → solving permutation problem → K point IFFT → separated signals y1, y2, y3. Outputs of adjacent bins have high correlation when they belong to the same source and low correlation otherwise.

  18. Existing Method for Solving the Permutation Problem Adjacent bands correlation method: for the outputs s1 and s2 in adjacent bins (…, K-1, K, K+1, K+2, K+3, …), the correlation matrix [r11 r12; r21 r22] is computed between each pair of neighbouring bins. Example with confidence: change permutation. Example without confidence: no change.

  19. Existing Method for Solving the Permutation Problem Adjacent bands correlation method: for the outputs s1 and s2 in adjacent bins (…, K-1, K, K+1, K+2, K+3, …), the correlation matrix [r11 r12; r21 r22] is computed between each pair of neighbouring bins. Disadvantage: The method is not robust
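
The adjacent bands idea can be sketched for two outputs as follows. This is an assumed, simplified version: it correlates magnitude envelopes, swaps a bin whenever r12 + r21 exceeds r11 + r22, and omits the confidence check mentioned on slide 18. The function name and the synthetic envelopes are illustrative.

```python
import numpy as np

def align_adjacent_bins(env):
    """Align output order across bins using adjacent-bin envelope correlation.

    env: array (n_bins, 2, n_frames) with the magnitude envelopes of the
    two outputs in every frequency bin.
    Returns the aligned envelopes and a flag per bin (True if it was swapped).
    """
    env = env.copy()
    swapped = np.zeros(env.shape[0], dtype=bool)
    for f in range(1, env.shape[0]):
        # r[i, j]: correlation of output i in bin f-1 with output j in bin f.
        r = np.corrcoef(env[f - 1], env[f])[:2, 2:]
        if r[0, 1] + r[1, 0] > r[0, 0] + r[1, 1]:
            env[f] = env[f, ::-1].copy()
            swapped[f] = True
    return env, swapped

# Synthetic check: two sources with independent activity patterns.
rng = np.random.default_rng(0)
n_bins, n_frames = 64, 400
activity = rng.random((2, n_frames)) ** 4
gains = rng.uniform(0.5, 2.0, size=(n_bins, 2, 1))
noise = 1 + 0.1 * rng.standard_normal((n_bins, 2, n_frames))
true_env = np.abs(gains * activity * noise)

# Randomly permute the outputs in each bin, as bin-wise ICA might.
perm = rng.random(n_bins) < 0.5
mixed_env = np.where(perm[:, None, None], true_env[:, ::-1], true_env)

aligned, swapped = align_adjacent_bins(mixed_env)
# After alignment every bin should share the ordering of the first bin.
consistent = bool(np.all((swapped ^ perm) == perm[0]))
print(consistent)
```

Because each bin is compared only with its already-processed neighbour, one wrong decision would carry over to every later bin, which is one way to see why the method on its own is described as not robust.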

  20. Existing Method for Solving the Permutation Problem Combination of DOA and correlation methods: DOA + Harmonic Correlation + Adjacent bands correlation Advantage: Increased robustness

  21. Proposed algorithm: Partial separation method (Parallel configuration) Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Partial separation method for solving permutation problem in frequency domain blind source separation of speech signals,” Neurocomputing, Vol. 71, No. 10–12, June 2008, pp. 2098–2112. Time domain stage Frequency domain stage

  22. Partial separation method (Parallel configuration) Time domain stage Frequency domain stage

  23. Partial separation method (Cascade configuration) Parallel configuration Frequency domain stage Time domain stage

  24. Advantages of Partial Separation method • Robustness

  25. Comparison with Adjacent Bands Correlation Method

  26. Comparison with DOA method PS - Partial Separation method with confidence check, C1 - Correlation between the adjacent bins without confidence check, C2 - Correlation between adjacent bins with confidence check, Ha - Correlation between the harmonic components with confidence check, PS1 - Partial separation method alone without confidence check.

  27. My Contribution - II (BSS taxonomy diagram, as on slide 12)

  28. Underdetermined Blind Source Separation of Instantaneous Mixtures

  29. Mathematical Representation of Instantaneous Mixing Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “An algorithm for mixing matrix estimation in instantaneous blind source separation,” Signal Processing, Vol. 89, Issue 9, September 2009, pp. 1762–1773. Time domain: P – No. of mixtures Q – No. of sources Time-Frequency domain:

  30. Single Source Points in Time-Frequency domain Single source point 1 Single source point 2 0 0

  31. Single Source Points in Time-Frequency domain Single source point 1 Single source point 2

  32. Single Source Points in Time-Frequency domain Single source point 1 Single source point 2 (the source coefficients are scalars) ∴ At single source point 1: ∴ At single source point 2:
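
The equations for these slides are not preserved. A sketch of the underlying argument, in assumed notation (a real mixing matrix with columns $\mathbf{a}_q$, and $\mathbf{X}(k,t)$ the vector of mixture coefficients at frequency bin $k$ and frame $t$), is:

```latex
\begin{align}
  \mathbf{X}(k,t) &= \sum_{q=1}^{Q} \mathbf{a}_q\, S_q(k,t) \\
  \text{at a single source point of source 1:}\quad
  \mathbf{X}(k,t) &= \mathbf{a}_1\, S_1(k,t) \\
  \Rightarrow\quad
  \operatorname{Re}\{\mathbf{X}(k,t)\} &= \mathbf{a}_1 \operatorname{Re}\{S_1(k,t)\}, \qquad
  \operatorname{Im}\{\mathbf{X}(k,t)\} = \mathbf{a}_1 \operatorname{Im}\{S_1(k,t)\}
\end{align}
```

Since $\operatorname{Re}\{S_1\}$ and $\operatorname{Im}\{S_1\}$ are scalars, both the real and the imaginary part of the mixture vector point along $\mathbf{a}_1$ at such a point; the same holds with $\mathbf{a}_2$ at a single source point of source 2.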

  33. Scatter Diagram of the Mixtures When Sources are Perfectly Sparse (example)

  34. Scatter Diagram of the Mixtures When Sources are Not Perfectly Sparse (example)

  35. Scatter Diagram of the Mixtures when Sources are Sparse No. of sources = 6 No. of mixtures = 2

  36. Scatter Diagram of the Mixtures when Sources are Sparse, After Clustering No. of sources = 6 No. of mixtures = 2

  37. Scatter Diagram of the Mixtures when Sources are Not Perfectly Sparse Objective: Estimation of the single source points. No. of sources = 6 No. of mixtures = 2

  38. Principle of the Proposed Algorithm for the Detection of Single Source Points Single source point 1 Single source point 2 Scalar Scalar Scalar Scalar Multi source point

  39. Principle of the Proposed Algorithm for the Detection of Single Source Points Single source point 1 Single source point 2 Scalar Scalar Scalar Scalar Multi source point

  40. Principle of the Proposed Algorithm for the Detection of Single Source Points Average of 15 pairs of speech utterances of length 10 s each SSP MSP

  41. Proposed Algorithm for the Detection of Single Source Points SSP MSP

  42. Elimination of Outliers SSPs detection Clustering Outlier elimination

  43. Experimental Results No. of mixtures = 2, No. of sources = 6

  44. Detected Single Source Points, Three mixtures No. of mixtures = 3, No. of sources = 6

  45. Comparison with Classical Algorithms for Determined Case Average of 500 experimental results No. of mixtures = 2 No. of sources = 2

  46. Comparison with Method Proposed in [1], Underdetermined case Normalized mean square error (NMSE) in mixing matrix estimation (dB) P – No. of mixtures Q – No. of sources Order of the mixing matrices (P×Q) [1] Y. Li, S. Amari, A. Cichocki, D. W. C. Ho, and S. Xie, “Underdetermined blind source separation based on sparse representation,” IEEE Transactions on Signal Processing, vol. 54, pp. 423–437, Feb. 2006.

  47. Advantages of the Proposed algorithm 1) Much simpler constraint: the algorithm does not require a “single source zone”. 2) Separation performance is better. 3) The algorithm is extremely simple but effective: Step 1: Convert x in the time domain to the TF domain to get X. Step 2: Check the condition. Step 3: If the condition is satisfied, then X(k, t) is a sample at the SSP, and this sample is kept for mixing matrix estimation; otherwise, discard the point. Step 4: Repeat Steps 2 to 3 for all the points in the TF plane or until a sufficient number of SSPs are obtained.
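
Steps 1 to 4 can be sketched as below. The condition in Step 2 is not shown in the transcript; this sketch assumes it tests whether the real and imaginary parts of X(k, t) point in the same direction to within a small angle Δθ, in line with the single source point principle of slides 32 and 38. The minimal `stft` helper, its parameters, the 1° threshold and the toy signals are all illustrative assumptions.

```python
import numpy as np

def stft(x, n_fft=256, hop=128):
    """Minimal STFT (assumed parameters): Hann window, one-sided FFT.
    Returns an array of shape (bins, frames)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def detect_ssps(x, delta_theta_deg=1.0):
    """x: (P, n_samples) mixtures. Returns X of shape (P, bins, frames) and a
    boolean (bins, frames) mask marking the detected single source points."""
    X = np.stack([stft(xp) for xp in x])                 # Step 1
    R, I = X.real, X.imag
    # |cos| of the angle between Re{X(k,t)} and Im{X(k,t)} (assumed condition).
    num = np.abs(np.sum(R * I, axis=0))
    den = np.linalg.norm(R, axis=0) * np.linalg.norm(I, axis=0)
    cos_angle = np.divide(num, den, out=np.zeros_like(num), where=den > 0)
    # Steps 2 to 4, evaluated for every TF point at once.
    is_ssp = cos_angle > np.cos(np.deg2rad(delta_theta_deg))
    return X, is_ssp

rng = np.random.default_rng(0)
s = rng.standard_normal((2, 8192))                 # toy sources (white noise)
A = np.array([[1.0, 0.2], [0.2, 1.0]])             # hypothetical mixing matrix

# Only source 1 active: interior TF points should all be flagged.
_, ssp_single = detect_ssps(np.outer(A[:, 0], s[0]))
# Both sources active everywhere: only a small fraction should pass.
_, ssp_both = detect_ssps(A @ s)
print(ssp_single[1:-1].mean(), ssp_both.mean())
```

The first and last frequency bins are excluded in the check because their imaginary part is zero for real signals, so no direction can be compared there. The retained points would then go to the clustering and outlier elimination stages of slide 42.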

  48. My Contributions – III, IV and V (BSS taxonomy diagram, as on slide 12)

  49. Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking Reference: V. G. Reju, S. N. Koh and I. Y. Soon, “Underdetermined Convolutive Blind Source Separation via Time-Frequency Masking,” IEEE Transactions on Audio, Speech and Language Processing, Vol. 18, No. 1, Jan. 2010, pp. 101–116. Block diagram: Mic 1 … Mic P → STFT → mixture in TF domain → mask estimation → apply mask → separated signals in TF domain
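
The "apply mask" stage can be illustrated with binary masks. The sketch below is not the mask estimation of the cited paper: it assumes the per-bin mixing vectors `H` are already known and that at most one source is active at each TF point, and it assigns every point to the source whose mixing vector best matches the observed mixture vector. All names and the synthetic data are hypothetical.

```python
import numpy as np

def binary_masks(X, H):
    """X: (P, K, T) mixture STFTs; H: (K, P, Q) mixing vectors per bin
    (assumed known here). Returns boolean masks of shape (Q, K, T)."""
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)
    # |h_q(k)^H x(k, t)| for every source q, bin k and frame t.
    proj = np.abs(np.einsum('kpq,pkt->qkt', Hn.conj(), X))
    winner = np.argmax(proj, axis=0)                     # (K, T)
    return np.arange(H.shape[2])[:, None, None] == winner[None]

rng = np.random.default_rng(0)
P, Q, K, T = 2, 3, 16, 50
H = rng.standard_normal((K, P, Q)) + 1j * rng.standard_normal((K, P, Q))
S = rng.standard_normal((K, T)) + 1j * rng.standard_normal((K, T))

# Toy data: exactly one randomly chosen source is active at each TF point.
owner = rng.integers(Q, size=(K, T))
X = np.stack([H[np.arange(K)[:, None], p, owner] * S for p in range(P)])

masks = binary_masks(X, H)
Y = masks * X[0]          # masked TF representations taken from Mic 1
print(np.array_equal(np.argmax(masks, axis=0), owner))
```

Each `Y[q]` keeps only the TF points attributed to source q; an inverse STFT of each would give the time-domain estimates.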

  50. Mathematical Representation Time domain: P – No. of mixtures Q – No. of sources Frequency domain:
