# A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds


Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj. NIPS 2009.






### Introduction

• Problem: single-channel signal separation

• Separating out the signals from individual sources in a mixed recording

• General approach:

• Derive a generalizable model that captures the salient features of each source

• Separation is achieved by extracting components from the mixed signal that conform to the characterization of the individual sources

### Physical Intuition

Recover the sources by reweighting frequency subbands of a single recording

### Latent Variable Model

• Given magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins

• At a given time frame t, Pt(f) represents the probability of drawing frequency f

• The model assumes that Pt(f) is a mixture of bases indexed by a latent variable z
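In symbols, the single-source model described above can be written as follows (a reconstruction of the standard probabilistic latent component analysis (PLCA) formulation; the slide's original equation is not preserved in this transcript):

```latex
P_t(f) = \sum_z P_t(z)\, P(f \mid z)
```

Each basis P(f|z) is a distribution over frequency, and the frame-dependent weights Pt(z) determine how strongly each basis is active at time t.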

### Latent Variable Model (Contd.)

• Now let the matrix V of size F×T with entries vft represent the magnitude spectrogram of the mixture sound, and let vt denote time frame t (the t-th column vector of V)

• First, we assume that we have an already trained model in the form of basis vectors Ps(f|z)

• These bases represent a dictionary of spectra that best describe each source

### Source separation

• Decompose a new mixture of these known sources in terms of the contributions of the dictionaries of each source

• Use the EM algorithm to estimate Pt(z|s) and Pt(s)

• The reconstruction of the contribution of source s in the mixture is given by vft · Pt(s) Σz Pt(z|s) Ps(f|z) / Pt(f), where Pt(f) = Σs Pt(s) Σz Pt(z|s) Ps(f|z), i.e. the observed magnitudes reweighted by the posterior share of source s
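As a concrete sketch, the fixed-dictionary EM estimation and per-source reconstruction can be written in a few lines of NumPy (a minimal illustration under assumed conventions; `plca_separate`, the multiplicative update form, and the mask-style reconstruction are not the authors' code):

```python
import numpy as np

def plca_separate(V, W, src, n_iter=100, eps=1e-12):
    """EM estimation of per-frame weights P_t(z, s) with fixed bases.

    V   : (F, T) nonnegative magnitude spectrogram of the mixture
    W   : (F, Z) dictionary whose column z is P_s(f|z) (columns sum to 1)
    src : (Z,) integer source label of each dictionary column
    Returns a list of (F, T) per-source magnitude reconstructions.
    """
    Z, T = W.shape[1], V.shape[1]
    rng = np.random.default_rng(0)
    H = rng.random((Z, T))
    H /= H.sum(axis=0, keepdims=True)      # columns hold P_t(z, s)
    for _ in range(n_iter):
        R = W @ H                          # current model of each frame
        H *= W.T @ (V / (R + eps))         # multiplicative EM update
        H /= H.sum(axis=0, keepdims=True) + eps
    R = W @ H + eps
    outs = []
    for s in np.unique(src):
        part = W[:, src == s] @ H[src == s]
        outs.append(V * part / R)          # reweight observed magnitudes
    return outs
```

On a toy mixture whose sources occupy disjoint frequency bins this recovers each source's energy exactly; real dictionaries overlap in frequency, which is why the sparsity machinery of the later slides matters.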

### Contribution of this paper

• Use training data directly as a dictionary

• The authors argue that, given any sufficiently large collection of data from a source, the best possible characterization of that data is quite simply the data itself (cf. non-parametric density estimation with Parzen windows)

• Side-steps the need for a separate model-training step

• A large dictionary provides a better description of the sources than less expressive learned basis models

• Source estimates are guaranteed to lie on the source manifold, as opposed to trained approaches, which can produce arbitrary outputs that are not necessarily plausible source estimates

### Using Training data as Dictionary

• Use each frame of the spectrograms of the training sequences as the bases Ps(f|z)

• Let W(s) denote the magnitude spectrogram of the training data for source s, of size F×T(s). In this case, the latent variable z for source s takes T(s) values, and the z-th basis function is given by the z-th column vector of W(s), normalized to sum to 1

• With the above model, ideally one would want to use only one dictionary element per source at any point in time

• This ensures that the output lies on the source manifold

• It is similar to a nearest-neighbor model (but the search is computationally very expensive)

• Instead, the authors propose using sparsity
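In code, using the training frames themselves as the dictionary reduces to normalizing each frame into a distribution over frequency (a minimal sketch under assumed conventions; `dictionary_from_training` and the silent-frame filtering are illustrative additions):

```python
import numpy as np

def dictionary_from_training(spectrograms, eps=1e-12):
    """Turn training frames directly into bases P_s(f|z).

    spectrograms : list of (F, T_s) magnitude spectrograms, one per source
    Returns W of shape (F, sum_s T_s) with unit-sum columns, plus an array
    mapping each dictionary column z to its source index s.
    """
    cols, ids = [], []
    for s, S in enumerate(spectrograms):
        S = np.asarray(S, dtype=float)
        S = S[:, S.sum(axis=0) > eps]                  # drop silent frames
        cols.append(S / S.sum(axis=0, keepdims=True))  # frame -> P_s(f|z)
        ids.append(np.full(S.shape[1], s))
    return np.concatenate(cols, axis=1), np.concatenate(ids)
```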

### Entropic prior

• Given a probability distribution θ, the entropic prior is defined as P(θ) ∝ e^(−α H(θ)), where H(θ) = −Σ θi log θi is the entropy

• α is a weighting factor and determines the level of sparsity

• A sparse representation has low entropy (since only a few elements are ‘active’)

• Imposing this prior during MAP estimation is a way to minimize entropy during estimation, which results in a sparse representation of θ
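The connection between sparsity and low entropy is easy to check numerically (a small sketch; the example vectors and the choice α = 5 are arbitrary):

```python
import numpy as np

def entropy(theta, eps=1e-12):
    """Shannon entropy H(theta) = -sum_i theta_i log theta_i."""
    theta = np.asarray(theta, dtype=float)
    return float(-np.sum(theta * np.log(theta + eps)))

def entropic_prior(theta, alpha):
    """Unnormalized entropic prior: P(theta) proportional to exp(-alpha * H(theta))."""
    return float(np.exp(-alpha * entropy(theta)))

sparse = [0.97, 0.01, 0.01, 0.01]   # few 'active' elements -> low entropy
dense  = [0.25, 0.25, 0.25, 0.25]   # maximally spread -> high entropy (log 4)
```

The sparse distribution has lower entropy and therefore a higher prior value, so MAP estimation under this prior pulls the weights toward sparse configurations.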

### Sparse approximation

• We would like to minimize the entropies of both the speaker-dependent mixture weights Pt(z|s) and the source priors Pt(s) at every frame

• However, the joint distribution factors as Pt(z,s) = Pt(s) Pt(z|s), so its entropy decomposes into the entropy of the source priors plus the expected conditional entropy of the mixture weights

• Thus, reducing the entropy of the joint distribution Pt(z,s) is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors
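The entropy decomposition invoked here can be spelled out via the chain rule for entropy (a reconstruction stated as an assumption; the slide's original equation is not preserved in this transcript):

```latex
H\!\left(P_t(z,s)\right) = H\!\left(P_t(s)\right) + \sum_{s} P_t(s)\, H\!\left(P_t(z \mid s)\right)
```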

### Sparse approximation (Contd.)

• The model written in terms of this parameter is given by Pt(f) = Σs Σz Pt(z,s) Ps(f|z)

• To impose sparsity, we apply the entropic prior to the joint distribution: P(Pt(z,s)) ∝ e^(−α H(Pt(z,s)))

• Apply EM to estimate Pt(z,s)

• The reconstructed contribution of source s is given by vft · Σz Pt(z,s) Ps(f|z) / Pt(f)
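To see what the sparsity objective buys, here is a crude tempering heuristic (emphatically not the paper's entropic-prior M-step; `sharpen` and the exponent `beta` are hypothetical, shown only to illustrate that lowering the entropy of the weight columns concentrates each frame on a few dictionary atoms):

```python
import numpy as np

def sharpen(H, beta):
    """Raise each column of the weight matrix to a power and renormalize.

    A stand-in illustration of sparsification: for beta > 1 each column of
    P_t(z, s) becomes more peaked, i.e. its entropy drops, so fewer
    dictionary atoms explain each frame. The paper instead imposes the
    entropic prior inside the EM iterations; this heuristic only mimics
    the qualitative effect.
    """
    H = np.asarray(H, dtype=float) ** beta
    return H / H.sum(axis=0, keepdims=True)
```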