Presenter: 張庭豪 (PowerPoint presentation)

Modulation Spectrum Factorization for Robust Speech Recognition

Wen-Yi Chu¹, Jeih-weih Hung² and Berlin Chen¹

Presenter: 張庭豪

Outline

  • Introduction

  • Nonnegative Matrix Factorization (NMF)

  • Updating the Modulation Spectrum via NMF

  • Experimental Setup

  • Experimental Results and Discussions

  • Conclusion and Future Work

Introduction

  • NMF is a recently developed method that represents data as a linear, non-subtractive (purely additive) combination of nonnegative components, so that the extracted components tend to correspond better with the intuitive notion of the parts that make up the data.

  • Most of the useful linguistic information is carried by the modulation frequency components between 1 Hz and 16 Hz, with the dominant component centered around 4 Hz.

  • We attempt to refine the features via the magnitude part of their modulation spectra (which is always real and nonnegative), using nonnegative matrix factorization.
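To see where the 1 to 16 Hz band falls on the DFT grid, a quick conversion helps. This is a sketch under two assumptions not stated in the slides: the feature stream has a frame rate of 100 frames per second (a 10 ms frame shift), and the 1024-point DFT size from the experimental setup is the 2L-point modulation DFT. The function names are hypothetical.

```python
FRAME_RATE_HZ = 100.0  # assumed: 10 ms frame shift (not stated in the slides)
DFT_SIZE = 1024        # assumed to be the 2L-point modulation DFT


def bin_to_hz(k: int, n_fft: int = DFT_SIZE, fs: float = FRAME_RATE_HZ) -> float:
    """Modulation frequency (Hz) of DFT bin k."""
    return k * fs / n_fft


def hz_to_bin(f: float, n_fft: int = DFT_SIZE, fs: float = FRAME_RATE_HZ) -> int:
    """Nearest DFT bin index for modulation frequency f (Hz)."""
    return int(round(f * n_fft / fs))


if __name__ == "__main__":
    for f in (1.0, 4.0, 16.0):
        k = hz_to_bin(f)
        print(f"{f:4.1f} Hz -> bin {k} ({bin_to_hz(k):.3f} Hz)")
```

Under these assumptions, 1, 4, and 16 Hz map to bins 10, 41, and 164, so the speech-relevant band lies within roughly the lowest third of the 513 retained bins, which span 0 to 50 Hz.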

Nonnegative Matrix Factorization (1/3)

  • Nonnegative matrix factorization (NMF) is a subspace method that approximates data with an additive, linear combination of nonnegative components (or basis vectors).

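  • Given a nonnegative data matrix $V \in \mathbb{R}_{\ge 0}^{L \times M}$, NMF computes two other nonnegative matrices $W \in \mathbb{R}_{\ge 0}^{L \times r}$ and $H \in \mathbb{R}_{\ge 0}^{r \times M}$ such that $V \approx WH$.

    • $r \ll L$ and $r \ll M$, to ensure an efficient encoding

[Figure: $V \approx WH$, where $W$ ($L \times r$) is tall and thin and $H$ ($r \times M$) is short and wide]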
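The constraint on r is what makes the factorization compact: storing W and H takes r(L + M) numbers instead of the LM entries of V. The sizes below are hypothetical and chosen only for illustration (513 rows, as a 1024-point DFT would give; M = 8000 utterances; r = 10, one of the values tried in the experiments).

```python
def nmf_storage(rows: int, cols: int, rank: int) -> tuple[int, int]:
    """Return (#entries in V, #entries in W plus H) for a rank-`rank` NMF."""
    full = rows * cols               # V is rows x cols
    factored = rank * (rows + cols)  # W is rows x rank, H is rank x cols
    return full, factored


full, factored = nmf_storage(rows=513, cols=8000, rank=10)
print(f"V: {full} entries, W and H: {factored} entries ({factored / full:.1%})")
```

With these illustrative sizes, the two factors need about 2% of the storage of V.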

Nonnegative Matrix Factorization (2/3)

  • V ≈ WH

  • To find an approximate factorization V ≈ WH, the following cost function is defined:

  • Starting from an initial (random) guess of W and H, the following multiplicative update rule is applied iteratively to reach a local minimum of L:
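As a minimal sketch, assume the squared Euclidean cost $\lVert V - WH \rVert_F^2$ with the standard Lee-Seung multiplicative updates, $H \leftarrow H \odot (W^\top V) \oslash (W^\top W H)$ and $W \leftarrow W \odot (V H^\top) \oslash (W H H^\top)$. A Kullback-Leibler cost with its own multiplicative rules is an equally common choice, so treat the Euclidean version as an assumption. The function name `nmf`, the fixed iteration count, and the small `eps` guard against division by zero are illustrative.

```python
import numpy as np


def nmf(V: np.ndarray, r: int, n_iter: int = 200, eps: float = 1e-10, seed: int = 0):
    """Factor nonnegative V (L x M) into W (L x r) and H (r x M), V ~= W @ H.

    Lee-Seung multiplicative updates for the squared Euclidean cost
    ||V - WH||_F^2 (an assumed cost; KL divergence is a common alternative).
    """
    if np.any(V < 0):
        raise ValueError("V must be nonnegative")
    rng = np.random.default_rng(seed)
    L, M = V.shape
    # Random nonnegative initial guess of W and H.
    W = rng.random((L, r)) + eps
    H = rng.random((r, M)) + eps
    for _ in range(n_iter):
        # Each factor is rescaled elementwise by a nonnegative ratio,
        # so W and H never become negative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

Because every factor in the updates is nonnegative, W and H stay nonnegative without any explicit projection step.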


Nonnegative Matrix Factorization (3/3)

  • Procedures:

Updating the Modulation Spectrum via NMF (1/3)

  • First, the time sequence x[n] of each utterance in the training set (a feature stream such as MFCC c1) is converted to its spectrum X[k] via a 2L-point DFT. Owing to conjugate symmetry, only the first L+1 points of X[k] are retained, and their magnitudes (which are always nonnegative) form one column of the data matrix V.

  • Accordingly, if the training set consists of M utterances, V is an (L+1) × M matrix. Given V and a chosen number of basis vectors r, NMF yields the two nonnegative matrices W, of size (L+1) × r, and H, of size r × M.
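The construction of V can be sketched in NumPy as follows. Assumptions: each x[n] is one feature stream (for example c1) of one utterance, 2L = 1024 is taken from the experimental setup, and utterances shorter than 2L are zero-padded. The helper names are hypothetical.

```python
import numpy as np


def modulation_magnitudes(x: np.ndarray, two_L: int = 1024):
    """2L-point DFT of one feature stream x[n], keeping the first L+1 bins.

    Returns (magnitude, phase), each of length L+1. The stream is
    zero-padded to 2L points (assumes len(x) <= 2L).
    """
    if len(x) > two_L:
        raise ValueError("utterance longer than the DFT size")
    X = np.fft.rfft(x, n=two_L)  # rfft returns exactly the first L+1 points
    return np.abs(X), np.angle(X)


def build_data_matrix(streams, two_L: int = 1024) -> np.ndarray:
    """Stack the magnitude spectra of M training utterances as columns of V."""
    cols = [modulation_magnitudes(np.asarray(x, dtype=float), two_L)[0] for x in streams]
    return np.stack(cols, axis=1)  # shape (L+1, M)
```

`np.fft.rfft` with `n=2L` zero-pads the input and returns only the first L+1 points, which is exactly the half that conjugate symmetry makes sufficient; with 2L = 1024, V has 513 rows and one column per utterance.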

Updating the Modulation Spectrum via NMF (2/3)

  • For each utterance to be processed, the magnitude part of its modulation spectrum forms a vector $v$, which is approximated as $v \approx Wh$. W is kept fixed (it comes directly from the previous step), and only the encoding vector $h$ is obtained via the multiplicative update rule.

  • The new vector $\hat{v} = Wh$ is a linear combination of the basis vectors in W, which are learned from clean utterances. Therefore, we expect $\hat{v}$, representing the new magnitude spectrum, to highlight the information important for speech recognition and to alleviate the effect of noise present in the original $v$.

  • Finally, a 2L-point inverse DFT is performed on the new modulation spectrum, which consists of the updated magnitudes and the original phases (with the conjugate-symmetric second half appended), to obtain the new time sequence.
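The per-utterance update can be sketched as follows. It assumes a Euclidean multiplicative update for h with W held fixed (the slides do not confirm the exact cost), a 2L = 1024 point DFT, and zero-padding of the utterance before the forward DFT. `encode_fixed_W` and `enhance_stream` are hypothetical names.

```python
import numpy as np


def encode_fixed_W(v: np.ndarray, W: np.ndarray, n_iter: int = 200, eps: float = 1e-10) -> np.ndarray:
    """Find h >= 0 with v ~= W @ h while W stays fixed (Euclidean cost assumed)."""
    h = np.full(W.shape[1], 1.0 / W.shape[1])  # simple nonnegative starting guess
    WtW, Wtv = W.T @ W, W.T @ v
    for _ in range(n_iter):
        h *= Wtv / (WtW @ h + eps)  # multiplicative update on h only
    return h


def enhance_stream(x: np.ndarray, W: np.ndarray, two_L: int = 1024) -> np.ndarray:
    """Replace the modulation magnitude of x[n] with W @ h, keep the phase, invert."""
    X = np.fft.rfft(x, n=two_L)                  # first L+1 points of the 2L-point DFT
    h = encode_fixed_W(np.abs(X), W)
    new_mag = W @ h                              # updated magnitude spectrum
    X_new = new_mag * np.exp(1j * np.angle(X))   # combine with the original phase
    # irfft appends the conjugate-symmetric half and applies the 2L-point IDFT.
    y = np.fft.irfft(X_new, n=two_L)
    return y[: len(x)]                           # drop the zero-padded tail (assumed)
```

`np.fft.irfft` rebuilds the conjugate-symmetric second half internally, so it does not have to be constructed by hand. As a sanity check, an identity W reproduces the input stream, because h then converges to the original magnitudes.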

Updating the Modulation Spectrum via NMF (3/3)

  • The basis spectral vectors learned for MFCC c1 (r = 10):

    • They are localized and sparse, which is consistent with the fact that NMF often learns a parts-based representation of data.

    • They distill or emphasize the lower modulation frequency components of the speech features, which contain more speech information.

[Figure: basis spectral vectors for (a) original MFCC c1 and (b) MVN-processed MFCC c1]
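One way to check the claim that the basis vectors emphasize low modulation frequencies is to find where each column of W peaks. This sketch assumes W has L+1 rows from a 2L-point DFT and a 100 Hz feature frame rate (10 ms frame shift), which the slides do not state; `basis_peak_frequencies` is a hypothetical helper.

```python
import numpy as np


def basis_peak_frequencies(W: np.ndarray, frame_rate_hz: float = 100.0) -> np.ndarray:
    """Modulation frequency (Hz) at which each basis vector (column of W) peaks.

    Assumes W has L+1 rows coming from a 2L-point DFT.
    """
    two_L = 2 * (W.shape[0] - 1)
    freqs = np.arange(W.shape[0]) * frame_rate_hz / two_L  # bin k -> k * fs / 2L
    return freqs[np.argmax(W, axis=0)]
```

If most peaks fall below about 16 Hz, that supports the observation that NMF distills the lower modulation frequency components.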

Experimental Setup

  • Feature type: 39-dimensional MFCC

  • The number of basis vectors r is varied from 5 to 20

  • DFT size: 1024

Experimental Results (1/4)

Experimental Results (2/4)

NMF: r = 5; NMF + CMVN: r = 15

Experimental Results (3/4)

  • The power spectral density (PSD) curves of the feature streams at different signal-to-noise ratios (SNRs):

[Figure: PSD curves of original c1 and NMF-processed c1 at different SNRs]

  • Noise causes a significant mismatch in the PSD of MFCC features.

  • NMF reduces this PSD mismatch.
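A PSD curve for a feature stream such as c1 can be estimated with a simple periodogram. This is a sketch: the 100 Hz frame rate, the mean removal, and the periodogram estimator itself are assumptions, since the slides do not say how their PSD curves were computed. `feature_psd` is a hypothetical name.

```python
import numpy as np

FRAME_RATE_HZ = 100.0  # assumed frame rate of the feature stream (10 ms shift)


def feature_psd(c: np.ndarray, n_fft: int = 1024, fs: float = FRAME_RATE_HZ):
    """Periodogram estimate of the PSD of a feature stream such as c1.

    Returns (frequencies in Hz, PSD in dB). The mean is removed first so the
    DC bin does not dominate the curve.
    """
    c = np.asarray(c, dtype=float) - np.mean(c)
    C = np.fft.rfft(c, n=n_fft)                    # zero-padded to n_fft points
    psd = (np.abs(C) ** 2) / (fs * len(c))         # one-sided scaling omitted for simplicity
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    return freqs, 10.0 * np.log10(psd + 1e-12)     # small floor avoids log(0)
```

Computing this curve for the same utterance at several SNRs, before and after processing, gives the kind of comparison shown on the slide.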

Experimental Results (4/4)

  • The power spectral density (PSD) curves of the feature streams at different signal-to-noise ratios (SNRs):

[Figure: PSD curves of MVN-processed c1 and of c1 processed by MVN and NMF, at different SNRs]

  • MVN reduces the PSD mismatch more in the low-frequency region.

  • NMF further alleviates the high-frequency PSD mismatch that remains in MVN-processed features.
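To quantify the low- versus high-frequency comparison rather than judge it visually, one could average the absolute dB gap between two PSD curves in each region. The metric and the 16 Hz split point (borrowed from the 1 to 16 Hz range in the introduction) are hypothetical choices for illustration, not the authors' procedure.

```python
import numpy as np


def band_mismatch_db(freqs, psd_a_db, psd_b_db, split_hz: float = 16.0):
    """Mean absolute dB gap between two PSD curves below and above split_hz.

    Hypothetical metric: returns (low-band mismatch, high-band mismatch).
    """
    freqs = np.asarray(freqs)
    gap = np.abs(np.asarray(psd_a_db) - np.asarray(psd_b_db))
    low = freqs < split_hz
    return float(gap[low].mean()), float(gap[~low].mean())
```

Comparing, say, clean versus noisy curves for MVN features and for MVN plus NMF features would then show whether the extra NMF step mainly shrinks the high-band value.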

Conclusion and Future Work

  • We have presented a novel use of NMF for deriving noise-robust speech features.

    • The basis spectra learned by NMF correspond well with the intuitive notion of the important modulation frequency components.

    • NMF improves the recognition accuracy of both plain and MVN-processed MFCC.

  • For future work, we envisage the following two directions:

    (1) Further processing the encoding vectors in H during the NMF mapping to obtain better recognition accuracy.

    (2) Examining whether variants or extensions of NMF, such as probabilistic latent semantic analysis (PLSA), and other compressive sensing methods can further enhance the modulation spectrum.