Using Linear Interpolation to Improve Histogram Equalization for Speech Recognition
Filipp Korkmazsky, Dominique Fohr, Irina Illina
LORIA, France, ICSLP 2004
Presented by Chen-Wei Liu 2004/11/10
Outline • Introduction • The Effect of Noise • What’s Interpolated HEQ? • Linear Interpolation for HEQ • Experimental Results • Observations • Conclusions
Introduction (1/4) • Histogram equalization (HEQ) is a signal normalization technique • It adjusts the statistical parameters of the test data • So that the cumulative distribution function (CDF) of the test data matches a target CDF estimated from the training data • Positive results with HEQ were achieved either alone or in combination with other normalization methods • e.g., CMS, CN, SS
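The CDF-matching idea above can be sketched as follows. This is a minimal one-dimensional illustration; the function and variable names are my own, not from the paper, and in practice the mapping is applied independently to each feature dimension.

```python
import numpy as np

def histogram_equalize(test_feat, train_feat):
    """Map each test value to the training quantile at the same CDF level.

    Minimal 1-D sketch of CDF matching: the test data's empirical CDF
    is forced to match the training data's empirical CDF.
    """
    train_sorted = np.sort(train_feat)
    # Empirical CDF value of each test sample within the test data
    ranks = np.argsort(np.argsort(test_feat))
    cdf_vals = (ranks + 1) / len(test_feat)
    # Inverse target CDF: index into the sorted training values
    idx = np.clip((cdf_vals * len(train_sorted)).astype(int) - 1,
                  0, len(train_sorted) - 1)
    return train_sorted[idx]
```

After this mapping, the equalized test values are drawn from the training data's quantiles, so their distribution matches the target distribution by construction.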
Introduction (3/4) • Recently, a new approach was suggested • Instead of considering a single target histogram, two target histograms, one for silence and one for speech, are estimated • Then an adapted target histogram is computed for each speaker • By linear interpolation between the speech and silence histograms • With a weighting coefficient "a" for the silence histogram and "1-a" for the speech histogram
Introduction (4/4) • The coefficient "a" is estimated for each test speaker separately, as the fraction of silence frames in that speaker's data • That is, the estimation is speaker-wise • It is assumed that the global statistics of the test data do not change rapidly from one test sentence to another • So interpolation can be used to combine the local statistics estimated for test sentence i with the global statistics of the test sentences that precede it
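The speaker-wise coefficient described above is just a frame-count ratio. A sketch, under the assumption that silence frames carry a "sil" label (the label convention is illustrative, not from the paper):

```python
import numpy as np

def silence_weight(frame_labels):
    """Fraction of silence frames in a speaker's data.

    The labels would come from forced alignment (training) or a first
    recognition pass (testing); the "sil" tag is an assumed convention.
    """
    labels = np.asarray(frame_labels)
    return float(np.mean(labels == "sil"))
```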
Linear Interpolation for HEQ (1/5) • It is assumed that the environmental conditions do not change rapidly from one test sentence to another • Combining information from multiple test sentences can therefore provide a more accurate estimate of the test conditions
Linear Interpolation for HEQ (2/5) • For histogram equalization, the interpolated test histogram is matched against a target histogram • The target histogram is estimated using all frames of the training data • A linear interpolation was proposed to obtain a unique target histogram for each test sentence • By separating the silence frames from the speech frames • Via forced alignment for training; a two-pass decoding for testing
Linear Interpolation for HEQ (3/5) • According to the ratio of silence to speech frames, a target cumulative histogram for this sentence can be estimated as • C_target(x) = a · C_silence(x) + (1 − a) · C_speech(x) • where "a" is the fraction of silence frames in the sentence
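A sketch of this target-histogram interpolation, assuming the two class-conditional cumulative histograms are already estimated on a shared bin grid (the array names are illustrative):

```python
import numpy as np

def target_cdf(cdf_silence, cdf_speech, a):
    """Per-sentence target cumulative histogram: a weighted mix of the
    silence and speech cumulative histograms, where a is the fraction
    of silence frames in the sentence."""
    return a * np.asarray(cdf_silence) + (1.0 - a) * np.asarray(cdf_speech)
```

Since both inputs are valid CDFs (non-decreasing, ending at 1) and the weights sum to 1, the interpolated result is itself a valid CDF.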
Linear Interpolation for HEQ (4/5) • The interpolated test histogram for test sentence i is estimated by combining the local histogram of sentence i with the global histogram accumulated over the preceding sentences 1 to i−1
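This local/global combination can be sketched as below. The interpolation weight `lam` and the normalization step are my own illustrative choices, not notation from the paper:

```python
import numpy as np

def interpolated_test_hist(local_counts, global_counts, lam):
    """Interpolate the histogram of the current test sentence (local)
    with the histogram accumulated over the preceding test sentences
    (global), after normalizing each to a probability distribution."""
    local = np.asarray(local_counts, dtype=float)
    glob = np.asarray(global_counts, dtype=float)
    local /= local.sum()
    glob /= glob.sum()
    return lam * local + (1.0 - lam) * glob
```

With lam = 1 only the current sentence is used; lowering lam leans more on the statistics accumulated from the preceding sentences.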
Linear Interpolation for HEQ (5/5) • Example: for a frame F in dimension D of sentence S • The global silence histogram and the global speech histogram, each accumulated over sentences 1 to S−1, are combined with the local statistics of frame F
Experiments (1/4) • The experiments were conducted on the VODIS corpus • Speech data was recorded in a moving car • Recordings from 200 speakers • 22600 clean sentences for training • 6100 noisy sentences for testing • CMS was used to normalize all training and testing data • Each of the 39 phones was modeled by a 3-state HMM • Each state was represented by a mixture of 32 Gaussians
Experiments (2/4) • Interpolation of HEQ with only one kind of frame
Experiments (3/4) • Interpolation of HEQ with silence/speech frame
Experiments (4/4) • Interpolation of HEQ with multiple speech classes
Observations • Using interpolation for HEQ always gives better recognition results than setting the interpolation parameter to zero (i.e., no interpolation) • Better results are obtained when a smaller number of classes is used • A larger number of classes does not necessarily lead to better performance, since only a small amount of data is available for test histogram estimation
Conclusions • It was found that weighted interpolation between the histograms of a test sentence and past test sentences improved speech recognition performance • 49.42% → 48.59% → 44.85% • It is not certain that more speech classes yield further improvements • Alternative ways of estimating the weighting factor could be a subject of future research