A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

### A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds

Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj

NIPS 2009

Introduction

- Problem: single-channel signal separation
  - Separating out the signals from individual sources in a mixed recording

- General approach
  - Derive a generalizable model that captures the salient features of each source
  - Separation is achieved by extracting components from the mixed signal that conform to the characterization of the individual sources

Physical Intuition

Recover the sources by reweighting the frequency subbands of a single recording

Latent Variable Model

- Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins
- At a given time frame $t$, $P_t(f)$ represents the probability of drawing frequency $f$
- The model assumes that $P_t(f)$ is a mixture of bases $P(f|z)$ indexed by a latent variable $z$ and weighted by $P_t(z)$: $P_t(f) = \sum_z P(f|z) P_t(z)$
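A minimal sketch of this generative view for one frame, assuming the mixture form $P_t(f) = \sum_z P(f|z) P_t(z)$. The sizes, the random bases, and the weights are hypothetical values chosen only for illustration; they do not come from the presentation.

```python
import numpy as np

rng = np.random.default_rng(0)

F, Z = 6, 2  # hypothetical sizes: 6 frequency bins, 2 bases

# Columns of `bases` play the role of P(f|z): each is a distribution over bins.
bases = rng.random((F, Z))
bases /= bases.sum(axis=0, keepdims=True)

# Hypothetical mixture weights P_t(z) for a single frame t.
weights = np.array([0.7, 0.3])

# P_t(f) = sum_z P(f|z) P_t(z)
p_t = bases @ weights

# The spectral frame is treated as a histogram of repeated draws from P_t(f).
n_draws = 10_000
frame = rng.multinomial(n_draws, p_t)

print(np.round(p_t, 3))              # model distribution over frequency bins
print(np.round(frame / n_draws, 3))  # normalized histogram, close to p_t
```

With enough draws the normalized histogram approaches $P_t(f)$, which is why a magnitude spectrum can be read as a scaled distribution over frequency.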

Latent Variable Model (Contd.)

- Now let the $F \times T$ matrix $V$ with entries $v_{ft}$ represent the magnitude spectrogram of the mixture sound, and let $v_t$ represent time frame $t$ (the $t$-th column vector of $V$)
- First, we assume that we already have a trained model for each source $s$ in the form of basis vectors $P_s(f|z)$
- These bases represent a dictionary of spectra that best describes each source

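Source separation

- Decompose a new mixture of these known sources in terms of the contributions of the dictionaries of each source: $P_t(f) = \sum_s P_t(s) \sum_{z \in \{z_s\}} P_t(z|s) P_s(f|z)$
- Use the EM algorithm to estimate $P_t(z|s)$ and $P_t(s)$

- The reconstruction of the contribution of source $s$ in the mixture is given by $\hat{v}_{ft}^{(s)} = \frac{P_t(s) \sum_{z \in \{z_s\}} P_t(z|s) P_s(f|z)}{\sum_{s'} P_t(s') \sum_{z \in \{z_{s'}\}} P_t(z|s') P_{s'}(f|z)} \, v_{ft}$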
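The EM updates are not spelled out in the slides, so the following is only a sketch under stated assumptions. It keeps the dictionaries fixed and estimates, for every frame, the product $P_t(s) P_t(z|s)$ as a single weight vector, which without any prior is equivalent to estimating the two factors separately. The function name `separate`, its parameters, and the toy data are hypothetical.

```python
import numpy as np

def separate(V, dictionaries, n_iter=100, eps=1e-12):
    """Sketch of EM-based separation with fixed dictionaries.

    V            : (F, T) mixture magnitude spectrogram.
    dictionaries : list of (F, Z_s) arrays whose columns are P_s(f|z).
    Returns one (F, T) magnitude estimate per source and the weights H.
    """
    W = np.concatenate(dictionaries, axis=1)            # (F, Z_total)
    bounds = np.cumsum([0] + [D.shape[1] for D in dictionaries])
    T = V.shape[1]
    # H[z, t] holds P_t(s) * P_t(z|s); each column sums to one.
    H = np.full((W.shape[1], T), 1.0 / W.shape[1])
    for _ in range(n_iter):
        model = W @ H + eps                              # current P_t(f)
        # E-step posterior and M-step accumulation folded into one update.
        H *= W.T @ (V / model)
        H /= H.sum(axis=0, keepdims=True) + eps
    model = W @ H + eps
    estimates = []
    for s in range(len(dictionaries)):
        lo, hi = bounds[s], bounds[s + 1]
        # Share of each time-frequency bin explained by source s, times v_ft.
        estimates.append((W[:, lo:hi] @ H[lo:hi]) / model * V)
    return estimates, H

# Toy example with two random (hypothetical) dictionaries.
rng = np.random.default_rng(0)
F, T = 16, 5
D1 = rng.random((F, 3)); D1 /= D1.sum(axis=0)
D2 = rng.random((F, 4)); D2 /= D2.sum(axis=0)
V = D1 @ rng.random((3, T)) + D2 @ rng.random((4, T))
(est1, est2), H = separate(V, [D1, D2])
print(est1.shape, est2.shape)
```

Because each source estimate is a fraction of $v_{ft}$, the estimates always add up to the mixture; this is the "reweighting of frequency subbands" view of separation.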

Contribution of this paper

- Use the training data directly as a dictionary
  - The authors argue that, given a sufficiently large collection of data from a source, the best possible characterization of the data is simply the data themselves (as in non-parametric density estimation with Parzen windows)
  - Sidesteps the need for a separate model training step
  - A large dictionary describes the sources better than the less expressive learned basis models
  - Source estimates are guaranteed to lie on the source manifold, whereas trained approaches can produce arbitrary outputs that are not necessarily plausible source estimates

Using Training data as Dictionary

- Use each frame of the spectrograms of the training sequences as the bases $P_s(f|z)$
- Let $W^{(s)}$ be the training spectrogram from source $s$, with $T^{(s)}$ frames. In this case, the latent variable $z$ for source $s$ takes $T^{(s)}$ values, and the $z$-th basis function is given by the $z$-th column vector of $W^{(s)}$

- With the above model, one would ideally want to use a single dictionary element per source at any point in time
  - This ensures that the output lies on the source manifold
  - This is similar to a nearest-neighbor model, whose search is computationally very expensive
- In this paper, the authors propose using sparsity instead
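A minimal sketch of turning a training spectrogram into a dictionary. For each column to act as a distribution $P_s(f|z)$ it has to sum to one, so the columns are normalized here. Dropping silent frames is an assumption of this sketch, added to avoid dividing by zero, and the helper name `dictionary_from_training` is hypothetical.

```python
import numpy as np

def dictionary_from_training(W_train, eps=1e-12):
    """Turn an (F, T_s) training magnitude spectrogram into bases P_s(f|z).

    Each kept column is scaled to sum to one. All-zero (silent) frames are
    dropped, which is an assumption of this sketch.
    """
    W_train = np.asarray(W_train, dtype=float)
    column_sums = W_train.sum(axis=0)
    keep = column_sums > eps
    return W_train[:, keep] / column_sums[keep]

# Toy spectrogram: 2 frequency bins, 3 frames, the middle frame is silent.
W_train = np.array([[1.0, 0.0, 2.0],
                    [3.0, 0.0, 2.0]])
D = dictionary_from_training(W_train)
print(D)  # columns [0.25, 0.75] and [0.5, 0.5]
```

No training loop is involved: the dictionary is just the (normalized) data, so its size grows with the amount of training audio.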

Entropic prior

- Given a probability distribution $\theta$, the entropic prior is defined as $P_e(\theta) \propto e^{-\alpha H(\theta)}$, where $H(\theta) = -\sum_i \theta_i \log \theta_i$ is the entropy of $\theta$
- $\alpha$ is a weighting factor that determines the level of sparsity
- A sparse representation has low entropy (since only a few elements are "active")
- Imposing this prior during MAP estimation minimizes the entropy during estimation, which results in a sparse $\theta$
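To illustrate why low entropy corresponds to sparsity, the sketch below compares a flat and a nearly one-hot distribution. It assumes the prior has the unnormalized form $\exp(-\alpha H(\theta))$ with $H$ the Shannon entropy; the function names and the example vectors are hypothetical.

```python
import numpy as np

def entropy(theta, eps=1e-12):
    """Shannon entropy H(theta) = -sum_i theta_i log theta_i (in nats)."""
    theta = np.asarray(theta, dtype=float)
    return float(-np.sum(theta * np.log(theta + eps)))

def log_entropic_prior(theta, alpha):
    """Unnormalized log prior, assuming log P_e(theta) = -alpha * H(theta) + const."""
    return -alpha * entropy(theta)

flat = np.full(4, 0.25)                       # all elements equally active
sparse = np.array([0.97, 0.01, 0.01, 0.01])   # one dominant element

for name, theta in [("flat", flat), ("sparse", sparse)]:
    print(name, round(entropy(theta), 3), round(log_entropic_prior(theta, alpha=2.0), 3))
```

For positive $\alpha$ the sparse vector receives the higher prior value, so MAP estimation is pulled toward solutions with few active elements.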

Sparse approximation

- We would like to minimize the entropies of both the speaker-dependent mixture weights $P_t(z|s)$ and the source priors $P_t(s)$ at every frame
- However, $H(P_t(z, s)) = H(P_t(z|s)) + H(P_t(s))$
  - Thus reducing the entropy of the joint distribution $P_t(z, s)$ is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors
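The decomposition of the joint entropy into a conditional entropy plus the entropy of the source priors is the chain rule for entropy. It can be checked numerically on a small, made-up frame; all probabilities below are hypothetical.

```python
import numpy as np

def H(p):
    """Shannon entropy of a distribution given as an array of any shape."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return float(-np.sum(p * np.log(p)))

# Hypothetical frame: 2 sources with 3 dictionary elements each.
p_s = np.array([0.6, 0.4])                   # P_t(s)
p_z_given_s = np.array([[0.8, 0.1, 0.1],     # P_t(z|s=0)
                        [0.2, 0.5, 0.3]])    # P_t(z|s=1)
p_joint = p_s[:, None] * p_z_given_s         # P_t(z, s)

# Conditional entropy H(z|s) = sum_s P_t(s) * H(P_t(z|s)).
cond_entropy = sum(p_s[s] * H(p_z_given_s[s]) for s in range(len(p_s)))

print(round(H(p_joint), 6), round(cond_entropy + H(p_s), 6))
```

Both printed numbers agree, so a single sparsity prior on the joint weights acts on both terms at once.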

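Sparse approximation

- The model written in terms of this parameter, the joint distribution $P_t(z, s)$, is given by $P_t(f) = \sum_s \sum_{z \in \{z_s\}} P_t(z, s) P_s(f|z)$
- To impose sparsity, we apply the entropic prior given by $P_e(P_t(z, s)) \propto e^{-\alpha H(P_t(z, s))}$
- Apply EM to estimate $P_t(z, s)$
- The reconstructed source is given by $\hat{v}_{ft}^{(s)} = \frac{\sum_{z \in \{z_s\}} P_t(z, s) P_s(f|z)}{\sum_{s'} \sum_{z \in \{z_{s'}\}} P_t(z, s') P_{s'}(f|z)} \, v_{ft}$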
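A minimal sketch of the reconstruction step only, assuming the joint weights $P_t(z, s)$ have already been estimated. The sparse EM estimation itself is not shown, and the weights used below are random placeholders, not EM output. The function name `reconstruct_from_joint` and all data are hypothetical.

```python
import numpy as np

def reconstruct_from_joint(V, dictionaries, joint, eps=1e-12):
    """Split the mixture V among sources using joint weights P_t(z, s).

    V            : (F, T) mixture magnitude spectrogram.
    dictionaries : list of (F, Z_s) arrays with columns P_s(f|z).
    joint        : list of (Z_s, T) arrays; stacked over sources, every
                   column sums to one.
    """
    # Per-source model: sum_z P_t(z, s) P_s(f|z)
    parts = [D @ G for D, G in zip(dictionaries, joint)]
    total = sum(parts) + eps            # full model of P_t(f)
    return [part / total * V for part in parts]

# Toy data: random dictionaries and random (placeholder) joint weights.
rng = np.random.default_rng(1)
F, T = 8, 4
dicts = [rng.random((F, 3)), rng.random((F, 5))]
dicts = [D / D.sum(axis=0) for D in dicts]
weights = rng.random((8, T))
weights /= weights.sum(axis=0)
joint = [weights[:3], weights[3:]]
V = rng.random((F, T))

est = reconstruct_from_joint(V, dicts, joint)
print(np.allclose(est[0] + est[1], V))
```

When the weights are sparse, each per-source model is dominated by a few training frames, which keeps the estimate close to spectra actually observed for that source.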

Results on real data

Comments

- The use of sparsity ensures that the output is a plausible speech signal devoid of artifacts such as distortion and musical noise
- An unfortunate side effect is the need to use a very large dictionary
- However, a significant reduction in dictionary size may be achieved by using an energy threshold to select the loudest frames of the training spectrogram as bases
- Outperforms trained basis models of the same size
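The dictionary-reduction idea can be sketched as follows. The slides do not say how frame energy is measured or how the threshold is set, so this sketch assumes energy is the sum of squared magnitudes per frame and that the threshold is a fraction of the loudest frame's energy. The name `select_loud_frames` and the parameter `rel_threshold` are hypothetical.

```python
import numpy as np

def select_loud_frames(W_train, rel_threshold=0.5):
    """Keep training frames whose energy reaches a fraction of the maximum.

    Energy is taken as the sum of squared magnitudes per frame, which is an
    assumption of this sketch.
    """
    W_train = np.asarray(W_train, dtype=float)
    energy = np.sum(W_train ** 2, axis=0)
    keep = energy >= rel_threshold * energy.max()
    return W_train[:, keep], np.flatnonzero(keep)

# Toy spectrogram: 2 frequency bins, 4 frames with energies 2, 0.05, 25, 4.25.
W_train = np.array([[1.0, 0.1, 3.0, 2.0],
                    [1.0, 0.2, 4.0, 0.5]])
reduced, idx = select_loud_frames(W_train, rel_threshold=0.1)
print(idx)            # frames 2 and 3 pass the threshold of 2.5
print(reduced.shape)  # (2, 2)
```

Raising `rel_threshold` shrinks the dictionary further, trading description quality for lower computational cost.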
