Presenter : 張庭豪

Enhancing the sub-band modulation spectra of speech features via nonnegative matrix factorization for robust speech recognition
Hao-teng Fan, Yi-chang Tsai and Jeih-weih Hung

Outline
• INTRODUCTION
• PROPOSED METHOD
• EXPERIMENTAL SETUP
• EXPERIMENTAL RESULTS AND DISCUSSIONS
• CONCLUDING REMARKS AND FUTURE WORKS

Introduction
• NMF does not provide a closed-form solution, so the basis spectra and the new modulation spectrum are obtained iteratively, which often incurs relatively high computational complexity.
• Most of the useful linguistic information is carried by the low modulation frequency components, concentrated in a low-frequency sub-band.
• We propose to improve the computational efficiency of NMF in two ways. The first is to use orthogonal projection in place of the iterative procedure. The second is to process only the low half-band, rather than the entire band, of the modulation spectrum.

Proposed Method
• Introduction to NMF
• Nonnegative matrix factorization (NMF) is a subspace method that approximates data with an additive, linear combination of nonnegative components. Given a nonnegative N × M data matrix V, NMF calculates two further nonnegative matrices, W (N × r) and H (r × M), such that $V \approx WH$, with the quality of the approximation measured by a cost function.
• r: the number of basis vectors (r is often chosen to be smaller than both N and M).
• W: its r columns are often called basis vectors.
• H: each of its columns is often called an encoding.
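The factorization described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the squared Euclidean (Frobenius) cost and the standard multiplicative update rules, since the slide does not state which cost function is used. The function name `nmf`, the iteration count and the random data are all hypothetical.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-12, seed=0):
    """Factor a nonnegative N x M matrix V into W (N x r) and H (r x M).

    Assumption: squared Euclidean cost ||V - WH||^2 with multiplicative
    updates; the slides do not specify the cost function.
    """
    rng = np.random.default_rng(seed)
    N, M = V.shape
    W = rng.random((N, r)) + eps
    H = rng.random((r, M)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # stay nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical data: N = 256 frequency bins, M = 40 utterances, r = 10.
rng = np.random.default_rng(1)
V = rng.random((256, 40))
W, H = nmf(V, r=10)
```

With r smaller than both N and M, the product `W @ H` is a low-rank, nonnegative approximation of `V`.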


Proposed Method
• Initialization: Randomly assign a nonnegative vector as the encoding.
• Iteration: Update the encoding vector iteratively. Each update computes the kth component of the new encoding vector from the previous encoding vector (the subscript k labels the kth component of a vector).
• Termination: When the encoding vector no longer changes substantially between successive iterations, after L iterations, stop the iteration and compute the new magnitude modulation spectrum as the product of W and the final encoding vector. This spectrum is then combined with the original phase part to obtain the new feature time sequence via the inverse discrete Fourier transform (IDFT).
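The three steps above can be sketched as follows, under stated assumptions. The update rule is the standard multiplicative update for the squared Euclidean cost with W held fixed, which may differ from the rule on the original slide. A fixed count of L iterations stands in for the convergence check. The function name `enhance_iterative`, the sequence length and the random basis are hypothetical.

```python
import numpy as np

def enhance_iterative(x, W, L=100, eps=1e-12, seed=0):
    """Update the magnitude modulation spectrum of a feature sequence x.

    W is a fixed nonnegative (N x r) basis matrix, where N must equal
    the number of rfft bins, len(x) // 2 + 1.
    """
    spectrum = np.fft.rfft(x)
    mag, phase = np.abs(spectrum), np.angle(spectrum)

    # Initialization: random nonnegative encoding vector.
    rng = np.random.default_rng(seed)
    h = rng.random(W.shape[1]) + eps

    # Iteration: multiplicative update (assumed rule), W kept fixed.
    Wt_v = W.T @ mag
    Wt_W = W.T @ W
    for _ in range(L):
        h *= Wt_v / (Wt_W @ h + eps)

    # Termination: new magnitude = W h, recombined with the original
    # phase and transformed back to the time domain via the inverse DFT.
    new_mag = W @ h
    return np.fft.irfft(new_mag * np.exp(1j * phase), n=len(x))

# Hypothetical example: 62 frames give 62 // 2 + 1 = 32 frequency bins.
rng = np.random.default_rng(0)
x = rng.standard_normal(62)
W = rng.random((32, 4))
y = enhance_iterative(x, W)
```

Each pass of the loop costs on the order of r² multiplications once `W.T @ mag` and `W.T @ W` are cached, and it is repeated L times for every feature sequence.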

Proposed Method
• Using orthogonal projection to replace the iteration process
• The final magnitude spectrum is a linear (additive) combination of the basis spectra vectors contained in the matrix W. In other words, it lies within the column space of W, denoted by C(W). We therefore propose to obtain the new vector by projecting the original magnitude spectrum v onto C(W): $\tilde{v} = B B^{T} v$, where B is an N × r matrix (taken here to have orthonormal columns spanning C(W)), and v and $\tilde{v}$ are N × 1 vectors.
• Multiplying $B^{T} v$ first and then by B: 2Nr multiplications.
• Multiplying $B B^{T}$ first and then by v: $N^{2} r + N^{2} = (r + 1) N^{2}$ multiplications.
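A minimal sketch of the projection step, assuming B is obtained from a reduced QR decomposition of W so that its columns form an orthonormal basis of C(W). The slides do not say how B is computed, so this choice, the function name `project_onto_basis` and the random data are all illustrative.

```python
import numpy as np

def project_onto_basis(v, W):
    """Orthogonally project v onto the column space C(W)."""
    # Reduced QR: B is N x r with orthonormal columns spanning C(W),
    # provided W has full column rank (an assumption of this sketch).
    B, _ = np.linalg.qr(W)
    # Multiply B^T v first (N*r multiplications), then by B (another N*r),
    # instead of forming the N x N matrix B B^T.
    return B @ (B.T @ v)

# Hypothetical data: N = 256 frequency bins, r = 10 basis spectra.
rng = np.random.default_rng(0)
W = rng.random((256, 10))
v = rng.random(256)
v_new = project_onto_basis(v, W)
```

With N = 256 and r = 10, the cheaper ordering needs 2Nr = 5,120 multiplications, compared with (r + 1)N² = 720,896 when B Bᵀ is formed first. In practice B depends only on W, so it could be computed once and reused for every utterance.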

Proposed Method
• Processing the low half-band of the modulation spectrum rather than the full band
• Different modulation frequency components are not equally important for speech recognition, and the lower-frequency components carry more information than the higher-frequency ones. Accordingly, we propose to apply NMF only to the low half-band portion of the modulation spectrum. Compared with the original full-band process, this reduces the overall computation by a factor of approximately 2.
• Figure: matrix V, with N frequency points per column (low frequencies in the upper half, high frequencies in the lower half) and M utterances. Only the upper (low-frequency) half is processed.
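The half-band idea can be sketched as follows, combined here with the projection variant. Two assumptions are made that the slides do not confirm: index 0 of the magnitude vector is the lowest modulation frequency, and the high half-band is passed through unchanged. The function name `enhance_low_half` and the random data are hypothetical.

```python
import numpy as np

def enhance_low_half(mag, W_low):
    """Update only the low half-band of a magnitude modulation spectrum.

    mag   : length-N magnitude spectrum, lowest frequency first (assumed).
    W_low : (N/2 x r) basis matrix learned on the low half-band only.
    """
    n_low = W_low.shape[0]
    B, _ = np.linalg.qr(W_low)          # orthonormal basis of C(W_low)
    out = mag.copy()
    out[:n_low] = B @ (B.T @ mag[:n_low])
    # Assumption: the high half-band mag[n_low:] is left untouched.
    return out

# Hypothetical data: N = 256 bins, low half-band of 128 bins, r = 10.
rng = np.random.default_rng(0)
mag = rng.random(256)
W_low = rng.random((128, 10))
new_mag = enhance_low_half(mag, W_low)
```

Because every matrix involved has N/2 rows instead of N, the multiplication count of both the iterative and the projection variants drops by roughly half.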

Experimental Setup
• We conduct speech recognition experiments on the Aurora-2 connected English-digit database.
• Each utterance in the clean training set and the three testing sets is converted to a 13-dimensional MFCC (c0-c12) sequence. The MFCC features are then pre-processed by mean and variance normalization (MVN) to alleviate the effect of noise, and subsequently updated by one of the NMF methods.
• Parameters:
1. The number of frequency bins, N, in the full-band modulation spectrum is set to 256.
2. The number of basis spectra derived from the NMF process, r, is set to 10.
3. The number of iterations, L, used by the original NMF method to obtain the final encoding vector is set to 100.
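The MVN pre-processing step can be sketched as below. Normalizing each utterance separately is an assumption, because the slides do not say over which data the statistics are computed. The function name `mvn` and the random features are hypothetical.

```python
import numpy as np

def mvn(features, eps=1e-12):
    """Mean and variance normalization of one utterance.

    features : (T frames x D dimensions) array, e.g. D = 13 for c0-c12.
    Statistics are taken per dimension over the frames of this utterance
    (per-utterance normalization is an assumption of this sketch).
    """
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

# Hypothetical utterance: 200 frames of 13-dimensional features.
rng = np.random.default_rng(0)
mfcc = rng.standard_normal((200, 13)) * 3.0 + 5.0
mfcc_mvn = mvn(mfcc)
```

After this step every feature dimension has approximately zero mean and unit variance, and each dimension's time sequence is what the NMF methods then process.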

Experimental Setup
• For simplicity and clarity, the four NMF methods are distinguished as follows:
• Full-band, iterative (the original NMF): updates the full-band modulation spectrum with iterative processes.
• Full-band, projection: updates the full-band modulation spectrum via orthogonal projection.
• Low half-band, iterative: updates the low half-band modulation spectrum via iterative processes.
• Low half-band, projection: updates the low half-band modulation spectrum via orthogonal projection.

Experimental Results and Discussions
• In the first set of evaluation experiments, we compare the computational complexity of the four NMF-based methods; the results are shown in the accompanying table.

Experimental Results and Discussions
• Detailed word recognition accuracy (%) and relative error rate reduction (%), denoted by RR, for different feature types/methods on the different Test Sets, averaged over all noise types and SNR conditions of the Aurora-2 database.
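For reference, relative error rate reduction is commonly computed from two accuracies as in the sketch below. The slides do not spell out the formula, so this usual definition is an assumption, and the numbers in the example are purely illustrative and not taken from the paper.

```python
def relative_error_reduction(acc_baseline, acc_new):
    """Relative error rate reduction (%) from two accuracies given in %."""
    err_baseline = 100.0 - acc_baseline
    err_new = 100.0 - acc_new
    return 100.0 * (err_baseline - err_new) / err_baseline

# Illustrative values only: raising accuracy from 80% to 90% halves
# the error rate, giving RR = 50%.
rr = relative_error_reduction(80.0, 90.0)
```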

Experimental Results and Discussions
• Detailed word recognition accuracy (%) for different feature types at different SNR values, averaged over all noise types in the three Test Sets of the Aurora-2 database.

Concluding Remarks And Future Works
• In this paper, we present two procedures that refine the NMF approach for enhancing the modulation spectra of speech features for noise robustness.
• Experimental results reveal that, compared with the original NMF, the resulting new scheme reduces the computational complexity while retaining very similar recognition performance.
• In the future, we plan to reduce the bandwidth of the low sub-band, or to update the different sub-bands separately in NMF processing, to investigate whether higher computational efficiency and/or better recognition accuracy can be achieved.
