A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

NIPS 2009

- Problem: Single-channel signal separation
- Separating out signals from individual sources in a mixed recording

- General approach
- Derive a generalizable model that captures the salient features of each source
- Separation is achieved by extracting from the mixed signal the components that conform to the characterization of each individual source

Recover sources by reweighting the frequency subbands of a single recording

- Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins
- At a given time frame t, P_t(f) represents the probability of drawing frequency f
- The model assumes that P_t(f) is a weighted combination of basis distributions indexed by a latent variable z
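
A form consistent with the description above, assuming basis distributions written as P(f|z) and frame-dependent weights written as P_t(z) (both symbols are assumptions, chosen to match the notation used later in the slides), would be the usual latent-variable mixture:

```latex
P_t(f) = \sum_{z} P(f \mid z)\, P_t(z),
\qquad \sum_{f} P(f \mid z) = 1,
\qquad \sum_{z} P_t(z) = 1 .
```

Under this reading, the bases P(f|z) are shared by all frames, and only the weights P_t(z) change from one frame to the next.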

- Now let the F × T matrix V, with entries v_ft, represent the magnitude spectrogram of the mixture sound, and let v_t represent time frame t (the t-th column vector of V)
- First, we assume that we already have a trained model for each source s in the form of basis vectors P_s(f|z)
- These bases form a dictionary of spectra that best describes each source

- Decompose a new mixture of these known sources in terms of the contributions of each source's dictionary
- Use the EM algorithm to estimate P_t(z|s) and P_t(s)
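
The estimation step above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: it assumes the mixture model P_t(f) = Σ_s P_t(s) Σ_z P_s(f|z) P_t(z|s) with the dictionaries held fixed, and the function name `em_mixture_weights`, the random initialization, the fixed iteration count, and the small constant `eps` are all choices made here for illustration.

```python
import numpy as np

def em_mixture_weights(V, dictionaries, n_iter=100, seed=0, eps=1e-12):
    """Estimate P_t(s) and P_t(z|s) for a mixture spectrogram V (F x T).

    dictionaries: one F x Z_s array per source whose columns sum to one
    (the fixed bases P_s(f|z)). Returns (Ps, Pz_s), where Ps is S x T and
    Pz_s is a list of Z_s x T arrays.
    """
    rng = np.random.default_rng(seed)
    T = V.shape[1]
    S = len(dictionaries)
    Ps = np.full((S, T), 1.0 / S)           # uniform source priors (assumed init)
    Pz_s = []
    for W in dictionaries:                   # random mixture weights (assumed init)
        H = rng.random((W.shape[1], T))
        Pz_s.append(H / H.sum(axis=0, keepdims=True))

    for _ in range(n_iter):
        # Current model of every bin: sum_s P_t(s) sum_z P_s(f|z) P_t(z|s)
        model = sum(Ps[s] * (dictionaries[s] @ Pz_s[s]) for s in range(S)) + eps
        ratio = V / model                    # F x T
        # E- and M-step folded together: J[z, t] = sum_f v_ft P_t(s, z | f)
        counts = [Ps[s] * Pz_s[s] * (dictionaries[s].T @ ratio) for s in range(S)]
        totals = np.stack([J.sum(axis=0) for J in counts])     # S x T
        Ps = totals / (totals.sum(axis=0, keepdims=True) + eps)
        Pz_s = [J / (J.sum(axis=0, keepdims=True) + eps) for J in counts]
    return Ps, Pz_s
```

Each iteration costs one matrix product per source in each direction, so the cost grows linearly with the total number of dictionary elements.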

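- The contribution of source s to the mixture is reconstructed by reweighting each time-frequency entry of the mixture spectrogram according to the share of the model attributed to s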
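
One plausible reading of this reconstruction is a soft mask: each mixture bin is scaled by the fraction of the modeled probability that belongs to source s. The sketch below assumes that form; `reconstruct_sources` and its argument layout are hypothetical names introduced here, with `Ps` holding P_t(s) as an S x T array and `Pz_s` holding P_t(z|s) as one Z_s x T array per source.

```python
import numpy as np

def reconstruct_sources(V, dictionaries, Ps, Pz_s, eps=1e-12):
    """Return one F x T magnitude estimate per source by soft-masking V."""
    # Modeled contribution of each source: P_t(s) * sum_z P_s(f|z) P_t(z|s)
    per_source = [Ps[s] * (W @ Pz_s[s]) for s, W in enumerate(dictionaries)]
    total = sum(per_source) + eps            # modeled mixture, F x T
    # Reweight every bin of the mixture by the source's share of the model
    return [V * (part / total) for part in per_source]
```

Because the masks sum to (almost exactly) one in every bin, the source estimates add back up to the mixture spectrogram.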

- Use training data directly as a dictionary
- The authors argue that, given any sufficiently large collection of data from a source, the best possible characterization of the data is quite simply the data themselves (e.g., non-parametric density learning using Parzen windows)
- This sidesteps the need for a separate model-training step
- A large dictionary provides a better description of the sources than the less expressive learned-basis models
- Source estimates are guaranteed to lie on the source manifold, whereas trained approaches can produce arbitrary outputs that are not necessarily plausible source estimates

- Use each frame of the spectrograms of the training sequences as the bases P_s(f|z)
- Let W(s) be the training spectrogram from source s, with T(s) frames. In this case, the latent variable z for source s takes T(s) values, and the z-th basis function is given by the z-th column vector of W(s)
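
A minimal sketch of building such a dictionary is shown below. It assumes that each column is normalized to sum to one so that it can act as a distribution over frequency, and that all-zero (silent) frames are dropped because they cannot be normalized; both choices, and the name `frames_as_bases`, are assumptions made here rather than details taken from the slides.

```python
import numpy as np

def frames_as_bases(W_train, eps=1e-12):
    """Turn each non-silent column of an F x T(s) spectrogram into a basis P_s(f|z)."""
    W_train = np.asarray(W_train, dtype=float)
    energy = W_train.sum(axis=0)             # total magnitude of each frame
    keep = energy > eps                      # silent frames cannot be normalized
    return W_train[:, keep] / energy[keep]   # every kept column now sums to one
```

For example, `frames_as_bases([[1, 0, 2], [3, 0, 2]])` keeps the first and third frames and returns the columns (0.25, 0.75) and (0.5, 0.5).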

- With the above model, one would ideally want to use one dictionary element per source at any point in time
- This ensures that the output lies on the source manifold
- This is similar to a nearest-neighbor model (for which the search is computationally very expensive)
- Instead, the authors propose using sparsity

- Given a probability distribution θ, the entropic prior on θ is defined through its entropy H(θ)
- α is a weighting factor and determines the level of sparsity
- A sparse representation has low entropy (since only a few elements are “active”)
- Imposing this prior during MAP estimation minimizes entropy during estimation, which results in a sparse representation θ
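
To make the link between entropy and sparsity concrete, the sketch below assumes the commonly used form of the entropic prior, P_e(θ) ∝ exp(−α H(θ)) with H(θ) = −Σ_i θ_i log θ_i, and compares a peaked distribution with a uniform one. The function names and example values are illustrative only.

```python
import math

def entropy(theta):
    """Shannon entropy in nats of a discrete distribution (0 log 0 taken as 0)."""
    return -sum(p * math.log(p) for p in theta if p > 0)

def entropic_log_prior(theta, alpha):
    """Unnormalized log of the assumed entropic prior: -alpha * H(theta)."""
    return -alpha * entropy(theta)

sparse = [0.97, 0.01, 0.01, 0.01]    # almost all mass on one element
uniform = [0.25, 0.25, 0.25, 0.25]   # maximally spread out

# With alpha > 0 the sparse distribution receives the higher prior score.
sparse_score = entropic_log_prior(sparse, alpha=2.0)
uniform_score = entropic_log_prior(uniform, alpha=2.0)
```

With a positive α, the lower-entropy distribution always scores higher, and a larger α widens the gap, which is how α controls the level of sparsity.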

- We would like to minimize the entropies of both the speaker-dependent mixture weights P_t(z|s) and the source priors P_t(s) at every frame

- However, by the chain rule for entropy, H(s, z) = H(z|s) + H(s)
- Thus, reducing the entropy of the joint distribution P_t(s, z) is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors
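
The decomposition of the joint entropy into a conditional entropy plus the entropy of the source priors can be checked numerically. The distributions below are made-up values for two sources with three dictionary elements each; only the identity being checked comes from the text.

```python
import math

def H(probs):
    """Shannon entropy in nats (0 log 0 taken as 0)."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# Hypothetical example: source priors P(s) and mixture weights P(z|s)
P_s = [0.7, 0.3]
P_z_given_s = [[0.8, 0.1, 0.1], [0.2, 0.5, 0.3]]

# Joint distribution P(s, z) = P(s) P(z|s), flattened into one list
joint = [P_s[s] * pz for s in range(2) for pz in P_z_given_s[s]]

H_joint = H(joint)
H_cond = sum(P_s[s] * H(P_z_given_s[s]) for s in range(2))   # H(z|s)
H_prior = H(P_s)                                             # H(s)
```

Here `H_joint` equals `H_cond + H_prior` up to floating-point error, so a single sparsity penalty on the joint distribution acts on both terms at once.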

- The model is rewritten in terms of the joint parameter P_t(s, z)
- To impose sparsity, we apply the entropic prior to P_t(s, z)
- Apply EM to estimate P_t(s, z)
- Each source is then reconstructed from the estimated P_t(s, z) and its dictionary
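
Written out under the assumptions that P_t(s, z) = P_t(s) P_t(z|s), that the entropic prior has the form exp(−αH), and that reconstruction is a soft mask applied to the mixture entries v_ft, the steps above would take roughly the following form. The set {z_s} denotes the dictionary indices belonging to source s; this notation is introduced here for illustration.

```latex
\begin{align}
P_t(f) &= \sum_{s} \sum_{z \in \{z_s\}} P_t(s, z)\, P_s(f \mid z), \\
P_e\big(P_t(s, z)\big) &\propto \exp\!\big(-\alpha\, H(P_t(s, z))\big), \\
\hat{v}^{(s)}_{ft} &=
  \frac{\sum_{z \in \{z_s\}} P_t(s, z)\, P_s(f \mid z)}
       {\sum_{s'} \sum_{z \in \{z_{s'}\}} P_t(s', z)\, P_{s'}(f \mid z)}\; v_{ft}.
\end{align}
```

In this form the only free parameters at each frame are the joint weights P_t(s, z), since the bases P_s(f|z) are fixed to the training frames.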

- The use of sparsity ensures that the output is a plausible speech signal devoid of artifacts like distortion and musical noise
- An unfortunate side effect is the need for a very large dictionary
- However, a significant reduction in dictionary size can be achieved by using an energy threshold to select the loudest frames of the training spectrogram as bases
- This outperforms trained-basis models of the same size
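
The energy-based pruning mentioned above could look like the sketch below. The slides do not say how the threshold is set, so a threshold relative to the loudest frame is assumed here; `select_loud_frames` and `rel_threshold` are hypothetical names.

```python
import numpy as np

def select_loud_frames(W_train, rel_threshold=0.1):
    """Keep the columns of an F x T spectrogram whose energy reaches a
    fraction rel_threshold of the loudest frame's energy (assumed rule)."""
    W_train = np.asarray(W_train, dtype=float)
    energy = (W_train ** 2).sum(axis=0)          # energy of each frame
    keep = energy >= rel_threshold * energy.max()
    return W_train[:, keep]                      # kept frames stay in time order
```

Raising `rel_threshold` shrinks the dictionary further, trading coverage of quiet frames for a lower computational cost.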