Presentation Transcript

A Framework for Complex Probabilistic Latent Semantic Analysis and its Application to Single-Channel Source Separation

Brian King, bbking@uw.edu
Advised by Les Atlas
Electrical Engineering, University of Washington

This research was funded by the Air Force Office of Scientific Research.

Problem Statement
  • Develop a theoretical framework for complex probabilistic latent semantic analysis (CPLSA) and its application to single-channel source separation

Intro Background Current Proposed

Outline
  • Introduction
  • Background
  • My current contributions
  • Proposed work

Nonnegative Matrix Factorization (NMF)

[Figure: NMF approximates the magnitude spectrogram X (X_{f,t}; frequency f × time t) as the product of a basis matrix B_{f,k} (frequency × basis index k) and a weight matrix W_{k,t} (basis index × time).]

[1] D.D. Lee and H.S. Seung, “Algorithms for Non-Negative Matrix Factorization,” Neural Information Processing Systems, 2001, pp. 556--562.
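
As a concrete illustration, here is a minimal NumPy sketch of the multiplicative updates of Lee and Seung [1] for the Frobenius-norm objective. The matrix shapes follow the slide (frequency F, time T, basis index K); the iteration count, epsilon, and random initialization are arbitrary illustrative choices.

```python
import numpy as np

def nmf(X, K, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative updates minimizing ||X - B @ W||_F^2.

    X: nonnegative (F, T) spectrogram; returns bases B (F, K), weights W (K, T).
    """
    rng = np.random.default_rng(seed)
    F, T = X.shape
    B = rng.random((F, K)) + eps   # random nonnegative initialization
    W = rng.random((K, T)) + eps
    for _ in range(n_iter):
        # each update multiplies by a nonnegative ratio, so B and W stay >= 0
        W *= (B.T @ X) / (B.T @ B @ W + eps)
        B *= (X @ W.T) / (B @ W @ W.T + eps)
    return B, W
```

Because every update is a multiplication by a nonnegative ratio, nonnegativity never has to be enforced explicitly.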

Using Matrix Factorization for Source Separation

[Diagram: Training: individual source signals x_indiv → STFT* → X_indiv → find bases B. Separation: mixed signal x_mixed → STFT* → X_mixed → find weights W (bases held fixed) → separate into Y1 and Y2 → ISTFT** → y1, y2.]

*Short-Time Fourier Transform

**Inverse Short-Time Fourier Transform
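
The training/separation pipeline above can be sketched as follows. This is a hypothetical illustration, not the authors' code: `fit_weights` reuses the standard multiplicative weight update with the bases held fixed, and the final Wiener-style soft-masking step is a common choice that the diagram leaves implicit.

```python
import numpy as np

def fit_weights(X, B, n_iter=200, eps=1e-9, seed=0):
    """Learn nonnegative weights W for spectrogram X with bases B held fixed."""
    rng = np.random.default_rng(seed)
    W = rng.random((B.shape[1], X.shape[1])) + eps
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ B @ W + eps)
    return W

def separate(X_mixed, B1, B2, eps=1e-9):
    """Split a mixture spectrogram into two source estimates.

    B1, B2: bases learned beforehand from each source's individual training data.
    """
    B = np.hstack([B1, B2])            # stack both sources' bases
    W = fit_weights(np.abs(X_mixed), B)
    K1 = B1.shape[1]
    Y1 = B1 @ W[:K1]                   # per-source magnitude reconstructions
    Y2 = B2 @ W[K1:]
    mask1 = Y1 / (Y1 + Y2 + eps)       # Wiener-style soft mask (assumption)
    return X_mixed * mask1, X_mixed * (1.0 - mask1)

# An ISTFT of each masked spectrogram would then give the time-domain estimates.
```

Masking (rather than returning Y1 and Y2 directly) guarantees the two estimates sum back to the observed mixture.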

Using Matrix Factorization for Synthesis / Source Separation

[Diagram: Matrix factorization synthesizes X ≈ B W (bases B_{f,k} × weights W_{k,t} → synthesized signal X_{f,t}). For source separation, B = [B1 B2] and W is partitioned into W1 and W2, giving the separated signals Y1 = B1 W1 and Y2 = B2 W2.]

NMF Cost Function: Frobenius Norm with Sparsity

C(B, W) = ‖X − B W‖²_F + λ Σ_{k,t} W_{k,t},  with B, W ≥ 0

where X_{f,t} is the magnitude spectrogram, B_{f,k} the bases, and W_{k,t} the weights. The first term is the squared Frobenius norm (data fit); the second is an L1 penalty that encourages sparse weights.
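
One way to realize this cost in code (a sketch, not the thesis implementation): the λ term simply enters the denominator of the multiplicative W update, and the basis columns are renormalized so the penalty cannot be dodged by growing B while shrinking W.

```python
import numpy as np

def sparse_nmf(X, K, lam=0.1, n_iter=200, eps=1e-9, seed=0):
    """Multiplicative updates for ||X - B @ W||_F^2 + lam * sum(W), B, W >= 0."""
    rng = np.random.default_rng(seed)
    B = rng.random((X.shape[0], K)) + eps
    W = rng.random((K, X.shape[1])) + eps
    for _ in range(n_iter):
        W *= (B.T @ X) / (B.T @ B @ W + lam + eps)   # L1 term shrinks W
        B *= (X @ W.T) / (B @ W @ W.T + eps)
        B /= B.sum(axis=0, keepdims=True) + eps      # fix the scale of each basis
    return B, W
```

With λ = 0 this reduces to the plain Frobenius-norm updates; larger λ drives more entries of W toward zero.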

Probabilistic Latent Semantic Analysis (PLSA)
  • Views the magnitude spectrogram as a joint probability distribution

[2] M. Shashanka, B. Raj, and P. Smaragdis, “Probabilistic Latent Variable Models as Nonnegative Factorizations,” Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9.

Probabilistic Latent Semantic Analysis (PLSA)
  • Uses the following generative model
    • Pick a time, P(t)
    • Pick a base from that time, P(k|t)
    • Pick a frequency of that base, P(f|k)
    • Increment the chosen (f,t) by one
    • Repeat
  • Can be written as P(f,t) = P(t) Σ_k P(k|t) P(f|k)
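
The generative steps above can be sampled directly. The distribution names follow the slide; the sizes and draw count below are illustrative assumptions.

```python
import numpy as np

def sample_plsa(Pt, Pk_t, Pf_k, n_draws=1000, seed=0):
    """Draw a count 'spectrogram' from the PLSA generative model.

    Pt: P(t), shape (T,); Pk_t: P(k|t), shape (K, T); Pf_k: P(f|k), shape (F, K).
    Each draw picks t ~ P(t), k ~ P(k|t), f ~ P(f|k) and increments (f, t).
    """
    rng = np.random.default_rng(seed)
    T, K, F = len(Pt), Pk_t.shape[0], Pf_k.shape[0]
    X = np.zeros((F, T))
    for _ in range(n_draws):
        t = rng.choice(T, p=Pt)          # pick a time
        k = rng.choice(K, p=Pk_t[:, t])  # pick a base from that time
        f = rng.choice(F, p=Pf_k[:, k])  # pick a frequency of that base
        X[f, t] += 1                     # increment the chosen (f, t)
    return X
```

As the number of draws grows, X / n_draws converges to the joint distribution P(f,t) = P(t) Σ_k P(k|t) P(f|k).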

Probabilistic Latent Semantic Analysis (PLSA)
  • Relationship to NMF
    • P(t) is the sum of all magnitude at time t
    • P(k|t) similar to weight matrix Wk,t
    • P(f|k) similar to base matrix Bf,k
  • NMF: X_{f,t} ≈ Σ_k B_{f,k} W_{k,t}
  • PLSA: P(f,t) = Σ_k P(f|k) P(k|t) P(t)

Probabilistic Latent Semantic Analysis
  • Advantage of PLSA over NMF: Extensibility
    • A tremendous amount of applicable literature on generative models
      • Entropic priors [2]
      • HMMs with state-dependent dictionaries [6]

[2] M. Shashanka, B. Raj, and P. Smaragdis, “Probabilistic Latent Variable Models as Nonnegative Factorizations,” Computational Intelligence and Neuroscience, vol. 2008, 2008, pp. 1-9.

[6] G.J. Mysore, “A Non-Negative Framework for Joint Modeling of Spectral Structures and Temporal Dynamics in Sound Mixtures,” PhD Thesis, Stanford University, 2010.  

… but superposition?

[Figure: original sources #1 and #2, their mixture, a proper separation, and an NMF separation; the NMF result shows errors (!!!) where the sources overlap, because magnitude spectrograms do not superpose.]
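
The failure mode has a one-line explanation: complex STFT coefficients add, magnitudes do not. A tiny numeric check:

```python
import numpy as np

# One time-frequency bin from each of two sources, equal magnitude,
# opposite phase: the mixture bin cancels, so the mixture magnitude (~0)
# is nowhere near the sum of the source magnitudes (2).
x1 = 1.0 * np.exp(1j * 0.0)     # bin of source 1
x2 = 1.0 * np.exp(1j * np.pi)   # same bin of source 2, opposite phase
mix = x1 + x2
print(abs(mix))                 # ~0: complex values superpose
print(abs(x1) + abs(x2))        # 2.0: magnitudes do not
```

Magnitude-domain methods such as NMF implicitly assume |X1 + X2| = |X1| + |X2|, which holds only when the sources are in phase.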

CMF Cost Function: Frobenius Norm with Sparsity

C(B, W, φ) = ‖X − X̂‖²_F + λ Σ_{k,t} |W_{k,t}|,  where X̂_{f,t} = Σ_k B_{f,k} W_{k,t} e^{jφ_{f,k,t}}

Here X_{f,t} is the complex spectrogram, B_{f,k} and W_{k,t} are nonnegative, and φ_{f,k,t} carries the phase of each component. The first term is the squared Frobenius norm (data fit); the second is an L1 sparsity penalty on the weights.

[3] H. Kameoka, N. Ono, K. Kashino, and S. Sagayama, “Complex NMF: A New Sparse Representation for Acoustic Signals,” International Conference on Acoustics, Speech, and Signal Processing, 2009.

Comparing NMF and CMF via ASR: Introduction
  • Data
    • Boston University news corpus [7]
    • 150 utterances (72 minutes)
    • Two talkers synthetically mixed at 0 dB target/masker ratio
    • 1 minute each of clean speech used for training
  • Recognizers
    • Sphinx-3 (CMU)
    • SRI

[7] M. Ostendorf, “The Boston University Radio Corpus,” 1995.

Comparing NMF and CMF via ASR: Results

[Bar chart: word accuracy (%) for unprocessed, nonnegative (NMF), and complex (CMF) separation; higher is better. Error bars mark the 95% confidence level.]

Comparing NMF and CMF via ASR: Conclusion
  • Incorporating phase estimates into matrix factorization can improve source separation performance
  • Complex matrix factorization is worth further research

[4] B. King and L. Atlas, “Single-Channel Source Separation Using Complex Matrix Factorization,” IEEE Transactions on Audio, Speech, and Language Processing (submitted).

[5] B. King and L. Atlas, “Single-channel Source Separation using Simplified-training Complex Matrix Factorization,” International Conference on Acoustics, Speech, and Signal Processing, Dallas, TX: 2010.

… but overparameterization?
  • Overparameterization can result in a potentially infinite number of solutions… which isn’t a good thing!
  • Example: estimating the observation with 3 bases admits multiple distinct solutions (#1, #2, #3) that fit it equally well

Review of Current Methods
  • NMF: unique, but difficult to extend and subject to the superposition problem
  • CMF: additive, but overparameterized
  • PLSA: extendible, but subject to the superposition problem
  • ? : no current method is at once extendible, unique, and additive

Proposed Solution: Complex Probabilistic Latent Semantic Analysis (CPLSA)
  • Goal: incorporate phase observation and estimation into current nonnegative PLSA framework
  • Implicitly solves
    • Extensibility
    • Superposition
  • Proposal to solve
    • Overparameterization

Proposed Solution: Outline
  • Transform complex to nonnegative data
  • 3 CPLSA variants
  • Phase constraints for STFT consistency
    • Unique solution

Transform Complex to Nonnegative Data
  • Why is this important?
    • The observed data X_{f,t} is modeled as a probability mass function (PMF)
    • PMFs are nonnegative and real
    • So the observation must be nonnegative and real

If X_{f,t} is modeled as a PMF, then X_{f,t} ≥ 0 and Σ_{f,t} X_{f,t} = 1.

Transform Complex to Nonnegative Data
  • Starting point: Shashanka[8]
    • N real → N+1 nonnegative
  • Algorithm
    • N+1-length orthogonal vectors (A_{N+1,N})
    • Affine transform (for nonnegativity)
    • Normalize
  • My new, proposed method
    • N complex → 2N real
    • 2N real data → 2N+1 nonnegative

[8] M. Shashanka, “Simplex Decompositions for Real-Valued Datasets,” IEEE International Workshop on Machine Learning for Signal Processing, 2009, pp. 1-6.
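
A sketch of the dimension counting in the proposed mapping, following the extra-dimension idea of Shashanka [8]. The particular shift and total-mass constants below are my own illustrative choices, not necessarily the transform in the proposal.

```python
import numpy as np

def complex_to_nonnegative(x):
    """Map N complex values -> 2N reals -> 2N+1 nonnegative values summing to 1.

    The shift c and total mass M are illustrative assumptions; with c and M
    stored, the map is invertible back to the original complex vector.
    """
    r = np.concatenate([x.real, x.imag])   # N complex -> 2N real
    c = max(0.0, -r.min())                 # shift so every entry is >= 0
    y = r + c
    M = 2.0 * y.sum() + 1.0                # total mass, large enough by construction
    y = np.append(y, M - y.sum())          # extra component -> 2N+1 entries
    return y / M                           # normalize to a PMF
```

The extra (2N+1)-th component absorbs the leftover mass, so the result is a valid PMF regardless of the input's sign pattern.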


3 Variants of CPLSA
  • #1 Complex bases
    • Phase is associated with bases
    • Not a good model for STFT
  • #2 Nonnegative bases + base-dependent phases
    • Good model for audio, but overparameterized

3 Variants of CPLSA
  • #3 Nonnegative bases + source-dependent phases
    • Additive source model
    • Good model for audio
    • Fewer parameters
    • Simplifies to NMF for single-source case
  • Compare with CPLSA #2

Phase Constraints for STFT Consistency
  • A spectrogram X is consistent when STFT(ISTFT(X)) = X, i.e., when it is the STFT of some time-domain signal
  • Incorporate STFT consistency [9] into phase estimation step for separated sources
  • Unique solution!

[9] J. Le Roux, N. Ono, and S. Sagayama, “Explicit Consistency Constraints for STFT Spectrograms and Their Application to Phase Reconstruction,” 2008.
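
To make the condition concrete, here is a toy NumPy STFT/ISTFT pair (periodic Hann window, 50% overlap, zero-padded edges; these are my choices, not necessarily those of [9]). A spectrogram computed from a real signal passes the round trip unchanged; an arbitrary complex array generally does not.

```python
import numpy as np

N, HOP = 8, 4
WIN = 0.5 - 0.5 * np.cos(2 * np.pi * np.arange(N) / N)  # periodic Hann

def stft(x):
    x = np.pad(x, (N, N))  # pad so edge samples are fully covered
    frames = [x[i:i + N] * WIN for i in range(0, len(x) - N + 1, HOP)]
    return np.fft.rfft(np.array(frames), axis=1)

def istft(X):
    frames = np.fft.irfft(X, n=N, axis=1)
    L = HOP * (len(frames) - 1) + N
    y, den = np.zeros(L), np.zeros(L)
    for i, f in enumerate(frames):                 # windowed overlap-add...
        y[i * HOP:i * HOP + N] += f * WIN
        den[i * HOP:i * HOP + N] += WIN ** 2
    return (y / np.maximum(den, 1e-12))[N:-N]      # ...normalized, padding trimmed
```

Applying STFT∘ISTFT to an arbitrary complex array projects it onto the set of consistent spectrograms, which is the constraint the proposal incorporates into the phase-estimation step.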

Summary of Proposed Theory
  • Goal: incorporate phase observation and estimation into current nonnegative PLSA framework (extensible, additive, unique)
  • Theory
    • Transform complex to nonnegative data
    • 3 CPLSA variants
    • Phase constraints for STFT consistency

Proposed Experiments
  • Separating speech in structured, nonstationary noise
  • Methods
    • CPLSA, PLSA, CMF
  • Noise
    • Babble noise
    • Automotive noise
  • Measurements
    • Objective perceptual
    • ASR

Objective Measurement Tests
  • Goal: explore parameter space
    • How parameters affect performance in CPLSA
    • Find best-performing parameters
    • Compare performance of CPLSA with PLSA, CMF
  • Data
    • TIMIT corpus [10]
  • Measurements
    • Blind Source Separation Evaluation Toolbox [11]
    • Perceptual Evaluation of Speech Quality (PESQ) [12]

[10] J.S. Garofolo, L.F. Lamel, W.M. Fisher, J.G. Fiscus, D.S. Pallett, and N.L. Dahlgren, DARPA TIMIT Acoustic Phonetic Continuous Speech Corpus, NIST, 1993.

[11] E. Vincent, R. Gribonval, and C. Fevotte, “Performance Measurement in Blind Audio Source Separation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, 2006, pp. 1462-1469.

[12] A. Rix, J. Beerends, M. Hollier, and A. Hekstra, “Perceptual Evaluation of Speech Quality (PESQ) - A New Method for Speech Quality Assessment of Telephone Networks and Codecs,” ICASSP, 2001, pp. 749-752 vol.2.

Automatic Speech Recognition Tests
  • Goal: test robustness of parameters
    • Use best-performing parameters from objective measurements
    • Compare performance of CPLSA with PLSA, CMF
  • Data
    • Wall Street Journal corpus [13]
  • ASR System
    • Sphinx-3 (CMU)

[13] D.B. Paul and J.M. Baker, “The Design for the Wall Street Journal-Based CSR Corpus,” Proceedings of the workshop on Speech and Natural Language, Stroudsburg, PA, USA: Association for Computational Linguistics, 1992, pp. 357–362.


[Spectrogram, frequency (Hz) vs. time (s): NMF separation in subway noise, 4.3 dB improvement.]

[Spectrogram, frequency (Hz) vs. time (s): NMF separation in subway noise, 4.2 dB improvement.]

Fountain Noise Example #1
  • Target speaker synthetically added at -3 dB SNR
  • Speaker model trained on 60 seconds clean speech
Fountain Noise Example #2
  • No “clean speech” available for training of target talker
    • Generic speaker model used
Why not encode phase into bases? Individual phase term

[Equation: X ≈ (B W) ⊙ e^{jθ}, i.e., nonnegative B and W with an individual phase term θ for every time-frequency point.]

Why not encode phase into bases? Complex B, W

[Equation: X ≈ B W with complex-valued B and W.]
