
Presentation Transcript


  1. A Framework for Complex Probabilistic Latent Semantic Analysis and its Application to Single-Channel Source Separation • Brian King, bbking@uw.edu • Advised by Les Atlas, Electrical Engineering, University of Washington • This research was funded by the Air Force Office of Scientific Research

  2. Problem Statement • Develop a theoretical framework for complex probabilistic latent semantic analysis (CPLSA) and apply it to single-channel source separation

  3. Outline • Introduction • Background • My current contributions • Proposed work

  4. Nonnegative Matrix Factorization (NMF) • Factors a nonnegative matrix (e.g., a magnitude spectrogram) into bases and weights: $X_{f,t} \approx \sum_k B_{f,k} W_{k,t}$, where $f$ indexes frequency, $t$ time, and $k$ the basis index [1] D.D. Lee and H.S. Seung, “Algorithms for Non-Negative Matrix Factorization,” Neural Information Processing Systems, 2001, pp. 556-562.
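
As a concrete illustration, here is a minimal sketch of NMF with the classic Lee-Seung multiplicative updates from [1]; the matrix shapes, iteration count, and eps guard are illustrative choices, not the deck's implementation:

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-12):
    """Factor nonnegative X (F x T) into bases B (F x K) and weights
    W (K x T) by minimizing ||X - BW||_F^2 with multiplicative updates."""
    F, T = X.shape
    rng = np.random.default_rng(0)
    B = rng.random((F, K)) + eps
    W = rng.random((K, T)) + eps
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ B @ W + eps)   # update weights
        B *= (X @ W.T) / (B @ W @ W.T + eps)   # update bases
    return B, W
```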

  5. Using Matrix Factorization for Source Separation • Training: take the STFT* of each individual source x_indiv and find bases B from X_indiv • Separation: take the STFT of the mixture x_mixed, find the weights W over the fixed bases, separate X_mixed into Y1 and Y2, and apply the ISTFT** to obtain y1 and y2 *Short-Time Fourier Transform **Inverse Short-Time Fourier Transform
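
A hedged end-to-end sketch of this pipeline, reusing nmf() from above; fit_weights, the SciPy STFT parameters, and the Wiener-style masking step are my assumptions for illustration, not necessarily the deck's exact method:

```python
import numpy as np
from scipy.signal import stft, istft

def fit_weights(X, B, n_iter=200, eps=1e-12):
    """Weights-only NMF: hold the bases B fixed, update W."""
    W = np.full((B.shape[1], X.shape[1]), 1.0 / B.shape[1])
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ B @ W + eps)
    return W

def separate(x_mixed, x_train1, x_train2, fs=16000, K=20, n=512):
    """Learn per-source bases from clean audio, explain the mixture
    with the stacked dictionary, and mask the mixture STFT."""
    _, _, X1 = stft(x_train1, fs, nperseg=n)
    _, _, X2 = stft(x_train2, fs, nperseg=n)
    _, _, Xm = stft(x_mixed, fs, nperseg=n)
    B1, _ = nmf(np.abs(X1), K)
    B2, _ = nmf(np.abs(X2), K)
    B = np.hstack([B1, B2])                  # fixed, stacked bases
    W = fit_weights(np.abs(Xm), B)
    M1 = (B1 @ W[:K]) / (B @ W + 1e-12)      # source 1's share of the model
    _, y1 = istft(M1 * Xm, fs, nperseg=n)    # reuse the mixture phase
    _, y2 = istft((1 - M1) * Xm, fs, nperseg=n)
    return y1, y2
```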

  6. Using Matrix Factorization for Synthesis / Source Separation • Synthesis: bases $B_{f,k}$ and weights $W_{k,t}$ combine into a synthesized signal $X = BW$ • Source separation: partitioning the factorization into per-source blocks, $X \approx B_1 W_1 + B_2 W_2$, yields the separated signals $Y_1 = B_1 W_1$ and $Y_2 = B_2 W_2$

  7. NMF Cost Function: Frobenius Norm with Sparsity $$\min_{B,W \geq 0} \; \underbrace{\|X - BW\|_F^2}_{\text{Frobenius}^2} + \lambda \underbrace{\sum_{k,t} |W_{k,t}|}_{L_1 \text{ sparsity}}$$ where $X_{f,t}$ is the observed spectrogram, $B_{f,k}$ the bases, and $W_{k,t}$ the weights
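
The cost that the multiplicative updates above (approximately) minimize, as a one-line check; the sparsity weight lam is an illustrative value:

```python
import numpy as np

def nmf_cost(X, B, W, lam=0.1):
    """Squared Frobenius reconstruction error plus L1 sparsity on W."""
    return np.sum((X - B @ W) ** 2) + lam * np.sum(np.abs(W))
```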

  8. Probabilistic Latent Semantic Analysis (PLSA) • Views the (normalized) magnitude spectrogram as a joint probability distribution $P(f,t)$ [2] M. Shashanka, B. Raj, and P. Smaragdis, “Probabilistic Latent Variable Models as Nonnegative Factorizations,” Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9.

  9. Probabilistic Latent Semantic Analysis (PLSA) • Uses the following generative model • Pick a time, P(t) • Pick a base from that time, P(k|t) • Pick a frequency of that base, P(f|k) • Increment the chosen (f,t) bin by one • Repeat • Can be written as $P(f,t) = P(t) \sum_k P(f|k) P(k|t)$
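
A small sketch of this generative model, drawing quanta and histogramming them into a count matrix; the array shapes and draw count are illustrative:

```python
import numpy as np

def sample_plsa(Pt, Pk_given_t, Pf_given_k, n_draws=100_000, seed=0):
    """Draw quanta from the PLSA generative model into an F x T count
    matrix (an unnormalized magnitude spectrogram).
    Pt: (T,), Pk_given_t: (K, T), Pf_given_k: (F, K)."""
    rng = np.random.default_rng(seed)
    T = Pt.shape[0]
    F, K = Pf_given_k.shape
    counts = np.zeros((F, T))
    for _ in range(n_draws):
        t = rng.choice(T, p=Pt)                # pick a time, P(t)
        k = rng.choice(K, p=Pk_given_t[:, t])  # pick a base, P(k|t)
        f = rng.choice(F, p=Pf_given_k[:, k])  # pick a frequency, P(f|k)
        counts[f, t] += 1                      # increment the chosen (f,t)
    return counts
```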

  10. Probabilistic Latent Semantic Analysis (PLSA) • Relationship to NMF • P(t) is the sum of all magnitude at time t • P(k|t) is similar to the weight matrix $W_{k,t}$ • P(f|k) is similar to the base matrix $B_{f,k}$ • NMF: $X_{f,t} \approx \sum_k B_{f,k} W_{k,t}$ • PLSA: $P(f,t) = P(t) \sum_k P(f|k) P(k|t)$
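
The correspondence can be made explicit by renormalizing an NMF factorization into PLSA factors; a minimal sketch, where the helper name nmf_to_plsa is mine:

```python
import numpy as np

def nmf_to_plsa(B, W, eps=1e-12):
    """Convert X ~ BW into PLSA factors: normalize each base to get
    P(f|k), push its mass into the weights, then normalize those to
    get P(k|t) and P(t)."""
    mass = B.sum(axis=0)                  # magnitude carried by each base
    Pf_given_k = B / (mass + eps)         # columns sum to one
    W_scaled = W * mass[:, None]
    Pt = W_scaled.sum(axis=0)             # total magnitude at each time
    Pk_given_t = W_scaled / (Pt + eps)
    return Pt / Pt.sum(), Pk_given_t, Pf_given_k
```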

  11. Probabilistic Latent Semantic Analysis • Advantage of PLSA over NMF: extensibility • A tremendous amount of applicable literature on generative models • Entropic priors [2] • HMMs with state-dependent dictionaries [6] [2] M. Shashanka, B. Raj, and P. Smaragdis, “Probabilistic Latent Variable Models as Nonnegative Factorizations,” Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9. [6] G.J. Mysore, “A Non-Negative Framework for Joint Modeling of Spectral Structures and Temporal Dynamics in Sound Mixtures,” PhD Thesis, Stanford University, 2010.

  12. … but superposition? • [Figure: original sources #1 and #2, their mixture, and a comparison of proper separation against NMF separation of the mixture]

  13. CMF Cost Function: Frobenius Norm with Sparsity $$\min_{B,W,\theta} \; \underbrace{\sum_{f,t} \Bigl| X_{f,t} - \sum_k B_{f,k} W_{k,t} e^{j\theta_{k,f,t}} \Bigr|^2}_{\text{Frobenius}^2} + \lambda \underbrace{\sum_{k,t} |W_{k,t}|}_{L_1 \text{ sparsity}}$$ where $X_{f,t}$ is the observed complex spectrogram, $B_{f,k}$ the nonnegative bases, $W_{k,t}$ the weights, and $\theta_{k,f,t}$ a per-basis phase term [3] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, “Complex NMF: A New Sparse Representation for Acoustic Signals,” International Conference on Acoustics, Speech, and Signal Processing, 2009.
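
A direct transcription of that cost for checking candidate factorizations; a sketch only, with lam illustrative (Kameoka et al. optimize this with an auxiliary-function method not shown here):

```python
import numpy as np

def cmf_cost(X, B, W, theta, lam=0.1):
    """Complex-domain cost: squared error between the complex
    spectrogram X (F x T) and a sum of K basis components with
    per-basis phases theta (K x F x T), plus L1 sparsity on W."""
    # X_hat[f,t] = sum_k B[f,k] * W[k,t] * exp(j * theta[k,f,t])
    Xhat = np.einsum('fk,kt,kft->ft', B, W, np.exp(1j * theta))
    return np.sum(np.abs(X - Xhat) ** 2) + lam * np.sum(np.abs(W))
```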

  14. Comparing NMF and CMF via ASR: Introduction • Data • Boston University news corpus [7] • 150 utterances (72 minutes) • Two talkers synthetically mixed at 0 dB target/masker ratio • 1 minute each of clean speech used for training • Recognizers • Sphinx-3 (CMU) • SRI [7] M. Ostendorf, “The Boston University Radio Corpus,” 1995.

  15. Comparing NMF and CMF via ASR: Results • [Chart: word accuracy (%) for unprocessed, non-negative, and complex matrix factorization; error bars mark the 95% confidence level]

  16. Comparing NMF and CMF via ASR: Conclusion • Incorporating phase estimates into matrix factorization can improve source separation performance • Complex matrix factorization is worth further research [4] B. King and L. Atlas, “Single-Channel Source Separation Using Complex Matrix Factorization,” IEEE Transactions on Audio, Speech, and Language Processing (submitted). [5] B. King and L. Atlas, “Single-channel Source Separation using Simplified-training Complex Matrix Factorization,” International Conference on Acoustics, Speech, and Signal Processing, Dallas, TX: 2010.

  17. … but overparameterization? • CMF can result in a potentially infinite number of solutions… which isn’t a good thing! • Example: estimating an observation with 3 bases admits multiple distinct solutions [Figure: three equivalent solutions #1, #2, #3]
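
A toy illustration of the non-uniqueness (a hypothetical single-bin, three-basis example of mine, not from the deck): two different weight/phase assignments reconstruct the observation exactly:

```python
import numpy as np

x = 1.0 + 1.0j                               # observed complex value
B = np.array([1.0, 1.0, 1.0])                # three unit bases

# Solution 1: all energy on basis 0
W1, th1 = np.array([np.abs(x), 0.0, 0.0]), np.array([np.angle(x), 0.0, 0.0])
# Solution 2: extra energy on bases 1 and 2 with cancelling phases
W2, th2 = np.array([np.abs(x), 1.0, 1.0]), np.array([np.angle(x), 0.0, np.pi])

for W, th in [(W1, th1), (W2, th2)]:
    print(np.sum(B * W * np.exp(1j * th)))   # both print (1+1j)
```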

  18. Review of Current Methods • NMF: difficult to extend; unique solution; not additive (fails under superposition) • PLSA: extendible; unique solution; not additive (fails under superposition) • CMF: difficult to extend; overparameterized (no unique solution); additive (handles superposition)

  19. Proposed Solution: Complex Probabilistic Latent Semantic Analysis (CPLSA) • Goal: incorporate phase observation and estimation into the current nonnegative PLSA framework • Implicitly solves • Extensibility • Superposition • Proposal to solve • Overparameterization

  20. Proposed Solution: Outline • Transform complex to nonnegative data • 3 CPLSA variants • Phase constraints for STFT consistency • Unique solution

  21. Transform Complex to Nonnegative Data • Why is this important? • We model the observed data $X_{f,t}$ as a probability mass function • PMFs are nonnegative and real • The observation therefore needs to be nonnegative and real • If $X_{f,t} \in \mathbb{C}$, then it must be transformed before it can be modeled as a PMF

  22. Transform Complex to Nonnegative Data • Starting point: Shashanka [8] • N real → N+1 nonnegative • Algorithm • (N+1)-length orthogonal vectors ($A_{N+1,N}$) • Affine transform (for nonnegativity) • Normalize • My new, proposed method • N complex → 2N real • 2N real → 2N+1 nonnegative [8] M. Shashanka, “Simplex Decompositions for Real-Valued Datasets,” IEEE International Workshop on Machine Learning for Signal Processing, 2009, pp. 1-6.
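
A minimal sketch of the two steps, assuming the construction in [8] works roughly as summarized on the slide (the choice of basis, the shift constant c, and the helper names are mine):

```python
import numpy as np

def complex_to_real(z):
    """N complex -> 2N real by stacking real and imaginary parts."""
    return np.concatenate([z.real, z.imag])

def real_to_nonneg(x, seed=0):
    """N real -> N+1 nonnegative values summing to one. Columns of A
    are orthonormal and orthogonal to the all-ones vector, so the lift
    is invertible (x is recoverable from A and the shift c)."""
    N = x.shape[0]
    rng = np.random.default_rng(seed)
    M = np.column_stack([np.ones(N + 1), rng.standard_normal((N + 1, N))])
    Q, _ = np.linalg.qr(M)        # first column spans the ones vector
    A = Q[:, 1:]                  # (N+1) x N orthogonal vectors
    y = A @ x
    c = -y.min() + 1e-9           # affine shift for nonnegativity
    y = y + c
    return y / y.sum(), A, c      # normalize to a PMF
```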

  23. Transform Complex to Nonnegative Data • [Figure: illustration of the complex-to-nonnegative transform]

  24. 3 Variants of CPLSA • #1 Complex bases • Phase is associated with bases • Not a good model for STFT • #2 Nonnegative bases + base-dependent phases • Good model for audio, but overparameterized

  25. 3 Variants of CPLSA • #3 Nonnegative bases + source-dependent phases • Additive source model • Good model for audio • Fewer parameters • Simplifies to NMF for the single-source case • Compare with CPLSA #2

  26. Phase Constraints for STFT Consistency • The STFT is consistent when STFT(ISTFT(X)) = X, i.e., when the spectrogram is actually the transform of some time-domain signal • Incorporate STFT consistency [9] into the phase estimation step for the separated sources • Unique solution! [9] J. Le Roux, N. Ono, and S. Sagayama, “Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction,” 2008.
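
A quick numerical check of that condition using SciPy's STFT pair; the window and hop choices are illustrative (and must satisfy the COLA constraint for istft to invert stft):

```python
import numpy as np
from scipy.signal import stft, istft

def inconsistency(X, fs=16000, n=512):
    """Return ||STFT(ISTFT(X)) - X||_F; zero (up to numerical error)
    exactly when the spectrogram X is consistent."""
    _, x = istft(X, fs, nperseg=n)
    _, _, X2 = stft(x, fs, nperseg=n)
    F, T = min(X.shape[0], X2.shape[0]), min(X.shape[1], X2.shape[1])
    return np.linalg.norm(X[:F, :T] - X2[:F, :T])  # boundary frames trimmed
```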

  27. Summary of Proposed Theory • Goal: incorporate phase observation and estimation into current nonnegative PLSA framework (extensible, additive, unique) • Theory • Transform complex to nonnegative data • 3 CPLSA variants • Phase constraints for STFT consistency

  28. Proposed Experiments • Separating speech in structured, nonstationary noise • Methods • CPLSA, PLSA, CMF • Noise • Babble noise • Automotive noise • Measurements • Objective perceptual • ASR

  29. Objective Measurement Tests • Goal: explore the parameter space • How parameters affect performance in CPLSA • Find the best-performing parameters • Compare performance of CPLSA with PLSA, CMF • Data • TIMIT corpus [10] • Measurements • Blind Source Separation Evaluation Toolbox [11] • Perceptual Evaluation of Speech Quality (PESQ) [12] [10] J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus, NIST, 1993. [11] E. Vincent, R. Gribonval, and C. Fevotte, “Performance Measurement in Blind Audio Source Separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, 2006, pp. 1462-1469. [12] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual Evaluation of Speech Quality (PESQ) - A New Method for Speech Quality Assessment of Telephone Networks and Codecs,” ICASSP, 2001, pp. 749-752, vol. 2.
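
The deck cites the MATLAB BSS Eval toolbox [11]; as a sketch, the same SDR/SIR/SAR measures are available in Python via mir_eval (my substitution, not the toolbox the author used):

```python
import numpy as np
import mir_eval  # implements the BSS Eval measures of [11]

def evaluate_separation(references, estimates):
    """Score separated sources against clean references with the
    BSS Eval measures; inputs are (n_sources, n_samples) arrays."""
    sdr, sir, sar, perm = mir_eval.separation.bss_eval_sources(
        np.asarray(references), np.asarray(estimates))
    return {"SDR": sdr, "SIR": sir, "SAR": sar, "perm": perm}
```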

  30. Automatic Speech Recognition Tests • Goal: test robustness of parameters • Use best-performing parameters from objective measurements • Compare performance of CPLSA with PLSA, CMF • Data • Wall Street Journal corpus [13] • ASR System • Sphinx-3 (CMU) [13] D.B. Paul and J.M. Baker, “The Design for the Wall Street Journal-Based CSR Corpus,” Proceedings of the Workshop on Speech and Natural Language, Stroudsburg, PA, USA: Association for Computational Linguistics, 1992, pp. 357-362.

  31. Examples

  32. Subway Noise • [Spectrograms, frequency (Hz) vs. time (s): NMF separation, 4.3 dB improvement]

  33. Subway Noise • [Spectrograms, frequency (Hz) vs. time (s): NMF separation, 4.2 dB improvement]

  34. Fountain Noise Example #1 • Target speaker synthetically added at -3 dB SNR • Speaker model trained on 60 seconds of clean speech

  35. Fountain Noise Example #2 • No “clean speech” available for training of the target talker • Generic speaker model used

  36. Mixed Speech (0 dB, no reverb)

  37. Mixed Speech (0 dB, reverb)

  38. Thank you!

  39. Why not encode phase into bases? • Individual phase term: $X \approx \sum_k B_{f,k} W_{k,t} e^{j\theta_{k,f,t}}$, with a separate phase estimated per basis

  40. Why not encode phase into bases? • Complex B, W: $X \approx BW$ with complex-valued bases and weights

  41. BSS Evaluation Measures

  42. … but superposition?
