Modulation Spectrum Factorization for Robust Speech Recognition
Wen-Yi Chu 1, Jeih-weih Hung 2 and Berlin Chen 1
Presenter: 張庭豪

Outline
Introduction
Nonnegative Matrix Factorization (NMF)
Updating the Modulation Spectrum via NMF
Experimental Setup
modulation frequency components between 1 Hz and 16 Hz, with the
dominant component centered around 4 Hz.
[Figure labels: (short and wide), (tall and thin)]
the training set is converted to its spectrum X[k] via a 2L-point
DFT. Owing to conjugate symmetry, only the first L+1 points
of X[k] are retained, and their magnitudes (which are always
nonnegative) form each column of the data matrix V.
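
The construction of V described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `modulation_magnitude_matrix` is hypothetical, and zero-padding or truncating each stream to 2L points is an assumption, since the slides do not say how streams of different lengths are handled.

```python
import numpy as np

def modulation_magnitude_matrix(streams, L):
    """Stack magnitude modulation spectra as the columns of V.

    streams: iterable of 1-D feature time series (for example, the c1
             trajectory of each training utterance).
    L:       half the DFT size; every stream is zero-padded or truncated
             to 2L points (an assumption made for this sketch).
    """
    columns = []
    for x in streams:
        # rfft of a real 2L-point sequence returns exactly the first
        # L+1 DFT points; the rest follow from conjugate symmetry.
        X = np.fft.rfft(np.asarray(x, dtype=float), n=2 * L)
        columns.append(np.abs(X))  # magnitudes are always nonnegative
    return np.column_stack(columns)  # shape: (L + 1, number of streams)
```

Using `np.fft.rfft` avoids computing the redundant half of the spectrum and then discarding it.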
columns. Given the data matrix V and a chosen number r, we obtain
the two nonnegative matrices W and H, whose product WH approximates V.
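
One common way to obtain W and H is with the Lee-Seung multiplicative updates for the squared Euclidean error. The sketch below assumes that objective; the slides do not state which cost function was used, and the function name `nmf`, the iteration count and the random initialization are illustrative choices.

```python
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9, seed=0):
    """Factor a nonnegative matrix V (n x m) as W (n x r) times H (r x m).

    Uses multiplicative updates that minimize ||V - WH||_F^2 (assumed
    objective). eps guards against division by zero.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + eps  # basis vectors in the columns
    H = rng.random((r, m)) + eps  # encoding vectors in the columns
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H
        # stay nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

With r much smaller than both dimensions of V, the columns of W act as a compact set of basis spectra and each column of H holds the weights for one training spectrum.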
vector h can be obtained via the updating rule.
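
As a rough sketch of this step, the encoding vector h for a single magnitude spectrum v can be estimated by holding the basis matrix W fixed and iterating only the update for h. The squared Euclidean objective, the all-ones initialization and the function name `encode` are assumptions made for illustration.

```python
import numpy as np

def encode(v, W, n_iter=200, eps=1e-9):
    """Estimate a nonnegative h such that W @ h approximates v, with W fixed."""
    v = np.asarray(v, dtype=float)
    h = np.ones(W.shape[1])  # strictly positive starting point
    WtV = W.T @ v            # constant across iterations
    WtW = W.T @ W
    for _ in range(n_iter):
        # Multiplicative update for h only; W is never modified.
        h *= WtV / (WtW @ h + eps)
    return h
```

The product `W @ h` then gives a nonnegative approximation of v expressed in terms of the learned basis vectors.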
(a) original MFCC c1
(b) MVN-processed MFCC c1
NMF: r = 5; NMF + CMVN: r = 15
c1 processed by MVN and NMF
(2) To examine whether variants or extensions of NMF, such as probabilistic latent semantic analysis (PLSA), and other compressive sensing methods can further enhance the modulation spectrum.