SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland
UPC, Barcelona, Spain
Contents • Bayesian Networks • Automatic Speech Recognition using Dynamic BNs • Auxiliary variables • Experiments with energy as an auxiliary variable • Conclusions
What is a Bayesian Network?
• A BN is a type of graphical model composed of:
  • A directed acyclic graph (DAG)
  • A set of variables V = {v1,…,vN}
  • A set of probability density functions P(vn|parents(vn))
• Joint distribution of V: P(V) = ∏_{n=1}^{N} P(vn|parents(vn))
Example (v2 is the parent of both v1 and v3):
P(V) = P(v1,v2,v3) = P(v1|v2) × P(v2) × P(v3|v2)
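The factorization on this slide can be sketched in a few lines of Python. This is a minimal illustration of the example DAG (v2 → v1, v2 → v3) with hypothetical conditional probability tables for binary variables; none of the numbers come from the paper.

```python
# Example factorization P(v1,v2,v3) = P(v1|v2) * P(v2) * P(v3|v2)
# for binary variables with illustrative (made-up) probability tables.
from itertools import product

P_v2 = {0: 0.6, 1: 0.4}                    # P(v2)
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3},      # P(v1|v2); outer key is v2
                 1: {0: 0.2, 1: 0.8}}
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5},      # P(v3|v2); outer key is v2
                 1: {0: 0.9, 1: 0.1}}

def joint(v1, v2, v3):
    """Joint probability obtained from the BN factorization."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Summing the factorized joint over every assignment should give 1.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
```

Because each local table is a proper distribution, the product automatically defines a proper joint distribution, which is the key property a DAG factorization buys you.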
Automatic Speech Recognition (ASR)
• Feature extraction (LPC, MFCC, …): X = {x1,…,xT}
• Statistical models (HMM, ANN, …): M1: ‘cat’, M2: ‘dog’, …, MK: ‘tiger’
Mj = argmax_{Mk} P(Mk|X) = argmax_{Mk} P(X|Mk) × P(Mk)
P(X|Mk) = ∏_{t=1}^{T} p(xt|qt) × p(qt|qt-1)
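The decision rule above can be sketched with the standard forward algorithm, which sums ∏_t p(xt|qt) p(qt|qt-1) over all state paths to score each word model; the recognizer then picks the model with the highest score. The 2-state model and per-frame emission likelihoods below are toy values, not the paper's models.

```python
# Hedged sketch: scoring a word model with the forward algorithm,
# i.e. P(X|M) = sum over state paths of prod_t p(x_t|q_t) p(q_t|q_{t-1}).
import numpy as np

def forward_likelihood(pi, A, B):
    """pi: initial state probs (S,); A: transition matrix (S,S);
    B: per-frame emission likelihoods p(x_t|q_t=s), shape (T,S).
    Returns P(X|M)."""
    alpha = pi * B[0]                 # alpha_1(s) = pi(s) * p(x_1|s)
    for t in range(1, len(B)):
        alpha = (alpha @ A) * B[t]    # predict through A, then weight by emission
    return alpha.sum()                # marginalize over the final state

# Toy left-to-right model: 2 states, 3 frames (illustrative numbers only).
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
score = forward_likelihood(pi, A, B)
```

In a recognizer this function would be evaluated once per word model Mk, and `argmax` over the resulting scores (optionally weighted by the prior P(Mk)) gives the recognized word.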
ASR with Dynamic Bayesian Networks
[Figure: hidden phone states qt = /k/, /a/, /a/, /t/ emitting acoustic vectors xt, for t = 1, 2, 3, 4]
Equivalent to a standard HMM
ASR with Dynamic Bayesian Networks
• Transition: P(qt | qt-1)
• Emission: p(xt | qt = k) ~ N(μk, Σk)
Auxiliary information (1)
• Main advantage of BNs: flexibility in defining dependencies between variables
• Energy damages system performance if it is simply appended to the feature vector
• BNs allow us to use it in an alternative way:
  • Conditioning the emission distributions upon this auxiliary variable
  • Marginalizing it out during recognition
Auxiliary information (2)
The value of at affects the value of xt:
p(xt | qt = k, at = z) ~ N(μk + Bk·z, Σk)
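The energy-conditioned emission density is a Gaussian whose mean is shifted linearly by the auxiliary value z. A minimal sketch for a scalar feature follows; the parameter values (μk, Bk, Σk) are illustrative, not trained values from the experiments.

```python
# Sketch of p(x_t | q_t=k, a_t=z) = N(x_t; mu_k + B_k * z, Sigma_k)
# for a 1-D feature, with made-up parameters.
import math

def cond_emission(x, z, mu_k=0.0, B_k=0.5, var_k=1.0):
    """Gaussian density whose mean is shifted linearly by the auxiliary value z."""
    mean = mu_k + B_k * z
    return math.exp(-0.5 * (x - mean) ** 2 / var_k) / math.sqrt(2 * math.pi * var_k)
```

Note the effect of conditioning: observing z = 1.0 shifts the mean by Bk, so `cond_emission(0.5, 1.0)` equals `cond_emission(0.0, 0.0)` — the whole density slides along with the auxiliary variable.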
Auxiliary information (3)
The value of the auxiliary variable can itself be influenced by the hidden state qt:
p(at | qt = k) ~ N(μak, Σak)
p(xt | qt = k, at = z) ~ N(μk + Bk·z, Σk)
Auxiliary information (4)
Equivalent to appending the auxiliary variable to the feature vector:
p(xt, at | qt = k) ~ N(μk^xa, Σk^xa)
Hiding auxiliary information
• We can also marginalize out (hide) the auxiliary variable during recognition:
p(xt | qt) = ∫ p(xt | qt, at) × p(at | qt) dat
• Useful when the auxiliary variable:
  • Is noisy
  • Is not accessible
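For the linear-Gaussian model of the previous slides this integral has a closed form: if a ~ N(μa, σa²) and x|a ~ N(μ + B·a, σ²), then marginally x ~ N(μ + B·μa, σ² + B²·σa²). The sketch below checks the closed form against numerical integration of the marginalization integral, using illustrative scalar parameters rather than values from the paper.

```python
# Marginalizing the auxiliary variable: p(x|q) = ∫ p(x|q,a) p(a|q) da.
# In the linear-Gaussian case the result is Gaussian with a shifted mean
# and an inflated variance.
import math

def gauss(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def marginal_numeric(x, mu, B, var, mu_a, var_a, n=8000, width=10.0):
    """Trapezoid-rule approximation of ∫ p(x|q,a) p(a|q) da."""
    lo = mu_a - width * math.sqrt(var_a)
    hi = mu_a + width * math.sqrt(var_a)
    da = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        a = lo + i * da
        w = 0.5 if i in (0, n) else 1.0
        total += w * gauss(x, mu + B * a, var) * gauss(a, mu_a, var_a)
    return total * da

def marginal_closed(x, mu, B, var, mu_a, var_a):
    """Closed-form marginal: N(x; mu + B*mu_a, var + B^2 * var_a)."""
    return gauss(x, mu + B * mu_a, var + B * B * var_a)

num = marginal_numeric(1.0, 0.0, 0.5, 1.0, 0.2, 0.3)
cls = marginal_closed(1.0, 0.0, 0.5, 1.0, 0.2, 0.3)
```

This is exactly why hiding the auxiliary variable is cheap at recognition time: no numerical integration is needed, only a different Gaussian.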
Experimental setup • Isolated word recognition • Small vocabulary (75 words) • Feature extraction: Mel Frequency Cepstral Coefficients (MFCC) • p(xt|qt) modeled with a mixture of 4 Gaussians • p(at|qt) modeled with a single Gaussian
Experiments with Energy as an auxiliary variable
Energy computed per frame as E = log ∑_{n=1}^{N} s²[n]·w²[n]
[Figures: DBN topologies of Systems 1, 2 and 3]

Word Error Rates:
            Observed Energy   Hidden Energy
System 1    6.9 %             5.3 %
System 2    6.1 %             5.6 %
System 3    5.8 %             5.9 %
Baseline    5.9 %
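The per-frame log-energy formula above can be sketched directly. The choice of a Hamming window and the test frame below are illustrative assumptions; the slide specifies only E = log Σ s²[n]·w²[n] for signal samples s[n] and window w[n].

```python
# Sketch of the per-frame log-energy E = log( sum_n s[n]^2 * w[n]^2 ),
# using a Hamming window (an assumption; the slide does not name the window).
import math

def log_energy(frame):
    """Windowed log-energy of one analysis frame (list of samples)."""
    N = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    return math.log(sum((s * w) ** 2 for s, w in zip(frame, window)))

E = log_energy([1.0, 1.0, 1.0])  # toy 3-sample frame
```

In the experiments this scalar would play the role of the auxiliary observation at attached to each frame t.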
Conclusions • BNs are more flexible than HMMs. You can easily: • Change the topology of the distributions • Hide variables when necessary • Energy can improve the system performance if used in a non-traditional way