A Sparse Non-Parametric Approach for Single Channel Separation of Known Sounds
Paris Smaragdis, Madhusudana Shashanka, Bhiksha Raj. NIPS 2009.


Introduction

  • Problem: single-channel signal separation

    • Separating out signals from individual sources in a mixed recording

  • General approach

    • Derive a generalizable model that captures the salient features of each source

    • Separation is achieved by extracting the components of the mixed signal that conform to the characterization of each individual source


Physical Intuition

Recover each source by reweighting the frequency subbands of a single recording


Latent Variable Model

  • Given the magnitude spectrogram of a single source, each spectral frame is modeled as a histogram of repeated draws from a multinomial distribution over the frequency bins

    • At a given time frame t, $P_t(f)$ represents the probability of drawing frequency f

    • The model assumes that $P_t(f)$ is composed of bases indexed by a latent variable z: $P_t(f) = \sum_z P(f|z) P_t(z)$, where $P(f|z)$ is the z-th basis and $P_t(z)$ is its weight in frame t
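
As a minimal sketch of this mixture-of-bases idea, assuming the usual latent-variable decomposition $P_t(f) = \sum_z P(f|z) P_t(z)$, the snippet below builds one frame's distribution from two bases. The function name `frame_distribution` and the toy numbers are hypothetical and chosen only for illustration.

```python
def frame_distribution(bases, weights):
    """Return P_t(f) = sum_z P_t(z) * P(f|z) for one time frame.

    bases[z][f] holds P(f|z); weights[z] holds P_t(z).
    """
    num_bins = len(bases[0])
    return [
        sum(w * basis[f] for w, basis in zip(weights, bases))
        for f in range(num_bins)
    ]


# Two hypothetical bases over four frequency bins (each sums to 1).
bases = [
    [0.7, 0.3, 0.0, 0.0],  # energy concentrated in the low bins
    [0.0, 0.0, 0.5, 0.5],  # energy concentrated in the high bins
]
weights = [0.25, 0.75]  # P_t(z) for this frame, sums to 1

p_t = frame_distribution(bases, weights)
print(p_t)
```

Because both the bases and the weights are normalized, the resulting frame distribution also sums to one.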


Latent Variable Model (Contd.)

  • Now let the $F \times T$ matrix $V$, with entries $v_{ft}$, represent the magnitude spectrogram of the mixture sound, and let $\mathbf{v}_t$ represent time frame t (the t-th column vector of $V$)

  • First we assume that we have an already trained model in the form of basis vectors $P_s(f|z)$

    • These bases represent a dictionary of spectra that best describe each source


Source separation

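  • Decompose a new mixture of these known sources in terms of the contributions of each source's dictionary

    • Use the EM algorithm to estimate $P_t(z|s)$ and $P_t(s)$

  • The reconstruction of the contribution of source s in the mixture is given by $\hat{v}_{ft}^{(s)} = v_{ft} \, \frac{P_t(s) \sum_z P_s(f|z) P_t(z|s)}{\sum_{s'} P_t(s') \sum_z P_{s'}(f|z) P_t(z|s')}$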
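
The decomposition step can be sketched for a single frame as follows. This is an illustrative sketch, not the authors' code: it assumes the standard EM updates for a multinomial mixture with fixed bases, and it assumes each source receives the share of $v_{ft}$ that its part of the model explains. The name `separate_frame` and the toy dictionaries are hypothetical.

```python
def separate_frame(v, dictionaries, num_iters=50):
    """EM for one mixture frame with fixed per-source dictionaries.

    v[f]                  : mixture magnitude in frequency bin f
    dictionaries[s][z][f] : fixed basis P_s(f|z), each summing to 1 over f
    Returns (priors, cond_weights, estimates); estimates[s][f] is the share
    of v[f] attributed to source s.
    """
    num_bins = len(v)
    num_sources = len(dictionaries)
    total_bases = sum(len(d) for d in dictionaries)
    # joint[s][z] = P_t(s) * P_t(z|s), initialised uniformly.
    joint = [[1.0 / total_bases] * len(d) for d in dictionaries]

    def source_model(s, f):
        # Contribution of source s to the model spectrum at bin f.
        return sum(w * basis[f] for w, basis in zip(joint[s], dictionaries[s]))

    def full_model():
        return [sum(source_model(s, f) for s in range(num_sources))
                for f in range(num_bins)]

    for _ in range(num_iters):
        model = full_model()
        # E-step folded into the M-step: accumulate v[f] * P_t(s, z | f).
        updated = [
            [sum(v[f] * w * basis[f] / model[f]
                 for f in range(num_bins) if model[f] > 0)
             for w, basis in zip(joint[s], dictionaries[s])]
            for s in range(num_sources)
        ]
        norm = sum(sum(row) for row in updated)
        if norm == 0:  # silent frame: nothing to explain
            break
        joint = [[x / norm for x in row] for row in updated]

    model = full_model()
    priors = [sum(row) for row in joint]
    cond_weights = [[x / p if p > 0 else 0.0 for x in row]
                    for row, p in zip(joint, priors)]
    estimates = [
        [v[f] * source_model(s, f) / model[f] if model[f] > 0 else 0.0
         for f in range(num_bins)]
        for s in range(num_sources)
    ]
    return priors, cond_weights, estimates


# Toy example: source 0 occupies the low bins, source 1 the high bins.
dictionaries = [[[0.5, 0.5, 0.0, 0.0]], [[0.0, 0.0, 0.5, 0.5]]]
mixture = [2.0, 2.0, 1.0, 1.0]
priors, cond_weights, estimates = separate_frame(mixture, dictionaries)
print(priors)
print(estimates)
```

In every bin where the model is non-zero, the per-source estimates add up to the mixture value, which matches the picture of separation as a reweighting of frequency subbands.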


Contribution of this paper

  • Use training data directly as a dictionary

    • The authors argue that, given any sufficiently large collection of data from a source, the best possible characterization of the data is quite simply the data themselves (e.g., non-parametric density learning using Parzen windows)

    • Sidestep the need for a separate model-training step

    • A large dictionary provides a better description of the sources than the less expressive learned basis models

    • Source estimates are guaranteed to lie on the source manifold, whereas trained approaches can produce arbitrary outputs that are not necessarily plausible source estimates


Using Training data as Dictionary

  • Use each frame of the spectrograms of the training sequences as the bases $P_s(f|z)$

    • Let $W^{(s)}$, of size $F \times T^{(s)}$, be the training spectrogram from source s. In this case, the latent variable z for source s takes $T^{(s)}$ values, and the z-th basis function is given by the z-th column vector of $W^{(s)}$

  • With the above model, ideally one would want to use one dictionary element per source at any point in time

    • This ensures that the output lies on the source manifold

    • It is similar to a nearest-neighbor model (but the search is computationally very expensive)

    • Instead, the authors propose using sparsity
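
Turning training frames into bases can be sketched as below. Two assumptions are not stated on the slide: each frame is normalized to sum to one, so that it can act as a distribution over frequency, and silent frames are skipped because they cannot be normalized. The name `frames_to_bases` is hypothetical.

```python
def frames_to_bases(spectrogram):
    """Turn each non-silent frame (column) of a spectrogram into a basis.

    spectrogram[f][t] holds the magnitude at frequency bin f and frame t.
    Returns bases[z][f], where every basis sums to 1 over f.
    """
    num_frames = len(spectrogram[0])
    bases = []
    for t in range(num_frames):
        frame = [row[t] for row in spectrogram]
        total = sum(frame)
        if total > 0:  # a silent frame cannot be normalised
            bases.append([x / total for x in frame])
    return bases


# Hypothetical training spectrogram with F = 2 bins and T = 3 frames.
training = [
    [1.0, 0.0, 2.0],
    [3.0, 0.0, 2.0],
]
bases = frames_to_bases(training)
print(bases)
```

The dictionary then has as many elements as there are usable training frames, which is why it grows quickly with the amount of training data.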


Entropic prior

  • Given a probability distribution θ, the entropic prior is defined as $P(\theta) \propto e^{-\alpha H(\theta)}$, where $H(\theta) = -\sum_i \theta_i \log \theta_i$ is the entropy of θ

    • α is a weighting factor that determines the level of sparsity

    • A sparse representation has low entropy (since only a few elements are "active")

    • Imposing this prior during MAP estimation minimizes entropy during estimation, which results in a sparse θ
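
The link between entropy and sparsity can be checked numerically. The sketch below assumes the prior has the form $P(\theta) \propto e^{-\alpha H(\theta)}$ and evaluates only its unnormalized logarithm; the function names and example distributions are hypothetical.

```python
import math


def entropy(theta):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(p * math.log(p) for p in theta if p > 0)


def entropic_log_prior(theta, alpha):
    """Unnormalised log of the entropic prior: -alpha * H(theta)."""
    return -alpha * entropy(theta)


sparse = [0.97, 0.01, 0.01, 0.01]  # one dominant element
dense = [0.25, 0.25, 0.25, 0.25]   # all elements equally active

print(entropy(sparse), entropy(dense))
print(entropic_log_prior(sparse, 2.0), entropic_log_prior(dense, 2.0))
```

With a positive α the sparse distribution receives the higher prior value, so MAP estimation is pulled toward solutions that activate few dictionary elements.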


Sparse approximation

  • We would like to minimize the entropies of both the speaker-dependent mixture weights $P_t(z|s)$ and the source priors $P_t(s)$ at every frame

  • However, by the chain rule of entropy, $H(z, s) = H(z|s) + H(s)$

    • Thus reducing the entropy of the joint distribution $P_t(z, s)$ is equivalent to reducing the conditional entropy of the source-dependent mixture weights and the entropy of the source priors
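
The decomposition of the joint entropy used here is the chain rule $H(z, s) = H(z|s) + H(s)$. A quick numerical check on a hypothetical joint distribution (all values are illustrative):

```python
import math


def entropy(p):
    """Shannon entropy (in nats) of a discrete distribution."""
    return -sum(x * math.log(x) for x in p if x > 0)


# Hypothetical joint weights P_t(z, s): one row per source s, one entry per z.
joint = [
    [0.40, 0.10],
    [0.05, 0.45],
]

h_joint = entropy([x for row in joint for x in row])

priors = [sum(row) for row in joint]  # P_t(s)
h_s = entropy(priors)

# H(z|s) = sum_s P_t(s) * H(P_t(z|s))
h_z_given_s = sum(
    p * entropy([x / p for x in row])
    for p, row in zip(priors, joint) if p > 0
)

print(h_joint, h_z_given_s + h_s)
```

Both printed values agree up to floating-point error, so a single sparsity prior on the joint weights covers both terms.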


Sparse approximation (Contd.)

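  • The model written in terms of this parameter, the joint weights $P_t(z, s)$, is given by $P_t(f) = \sum_s \sum_z P_s(f|z) P_t(z, s)$

  • To impose sparsity we apply the entropic prior given by $P(P_t(z, s)) \propto e^{-\alpha H(P_t(z, s))}$

  • Apply EM to estimate $P_t(z, s)$

  • The reconstructed source is given by $\hat{v}_{ft}^{(s)} = v_{ft} \, \frac{\sum_z P_s(f|z) P_t(z, s)}{\sum_{s'} \sum_z P_{s'}(f|z) P_t(z, s')}$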


Results on real data



Comments

  • The use of sparsity ensures that the output is a plausible speech signal, devoid of artifacts such as distortion and musical noise

  • An unfortunate side effect is the need to use a very large dictionary

    • However, a significant reduction in dictionary size may be achieved by using an energy threshold to select the loudest frames of the training spectrogram as bases

    • Outperforms trained basis models of the same size
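
Selecting the loudest frames could look like the sketch below. The slide does not say how the threshold is defined, so two choices here are assumptions: frame energy is the sum of squared magnitudes, and the threshold is a fraction `ratio` of the largest frame energy. The function name is hypothetical.

```python
def loudest_frame_indices(spectrogram, ratio):
    """Indices of frames whose energy is at least ratio * (max frame energy).

    spectrogram[f][t] holds the magnitude at frequency bin f and frame t.
    """
    num_frames = len(spectrogram[0])
    energies = [
        sum(row[t] ** 2 for row in spectrogram) for t in range(num_frames)
    ]
    threshold = ratio * max(energies)
    return [t for t, e in enumerate(energies) if e >= threshold]


# Hypothetical training spectrogram: frame 1 is nearly silent.
training = [
    [1.0, 0.1, 3.0],
    [1.0, 0.1, 4.0],
]
kept = loudest_frame_indices(training, ratio=0.05)
print(kept)
```

Only the retained frames would then be normalized and used as dictionary elements, trading some coverage of the source for a smaller dictionary.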

