SPEECH RECOGNITION BASED ON BAYESIAN NETWORKS WITH ENERGY AS AN AUXILIARY VARIABLE
Jaume Escofet Carmona
IDIAP, Martigny, Switzerland
UPC, Barcelona, Spain
Contents • Bayesian Networks • Automatic Speech Recognition using Dynamic BNs • Auxiliary variables • Experiments with energy as an auxiliary variable • Conclusions
What is a Bayesian Network?
• A BN is a type of graphical model composed of:
  • A directed acyclic graph (DAG)
  • A set of variables V = {v1,…,vN}
  • A set of probability density functions P(vn|parents(vn))
• Joint distribution of V: P(V) = ∏_{n=1}^{N} P(vn|parents(vn))
Example (v2 is the parent of both v1 and v3):
P(V) = P(v1,v2,v3) = P(v1|v2) × P(v2) × P(v3|v2)
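The factorization on this slide can be sketched in a few lines of Python. This is a minimal illustration of the example DAG (v2 → v1, v2 → v3) with hypothetical conditional probability tables for binary variables; none of the numbers come from the paper.

```python
# Example factorization P(v1,v2,v3) = P(v1|v2) * P(v2) * P(v3|v2)
# for binary variables with illustrative (made-up) probability tables.
from itertools import product

P_v2 = {0: 0.6, 1: 0.4}                    # P(v2)
P_v1_given_v2 = {0: {0: 0.7, 1: 0.3},      # P(v1|v2); outer key is v2
                 1: {0: 0.2, 1: 0.8}}
P_v3_given_v2 = {0: {0: 0.5, 1: 0.5},      # P(v3|v2); outer key is v2
                 1: {0: 0.9, 1: 0.1}}

def joint(v1, v2, v3):
    """Joint probability obtained from the BN factorization."""
    return P_v1_given_v2[v2][v1] * P_v2[v2] * P_v3_given_v2[v2][v3]

# Summing the factorized joint over every assignment should give 1.
total = sum(joint(a, b, c) for a, b, c in product([0, 1], repeat=3))
```

Because each local table is a proper distribution, the product automatically defines a proper joint distribution, which is the key property a DAG factorization buys you.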
Automatic Speech Recognition (ASR)
• Feature extraction (LPC, MFCC, …): X = {x1,…,xT}
• Statistical models (HMM, ANN, …): M1: ‘cat’, M2: ‘dog’, …, MK: ‘tiger’
Mj = argmax_{Mk} P(Mk|X) = argmax_{Mk} P(X|Mk) × P(Mk)
P(X|Mk) = ∏_{t=1}^{T} p(xt|qt) × p(qt|qt-1)
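The decision rule above can be sketched with the standard forward algorithm, which sums ∏_t p(xt|qt) p(qt|qt-1) over all state paths to score each word model; the recognizer then picks the model with the highest score. The 2-state model and per-frame emission likelihoods below are toy values, not the paper's models.

```python
# Hedged sketch: scoring a word model with the forward algorithm,
# i.e. P(X|M) = sum over state paths of prod_t p(x_t|q_t) p(q_t|q_{t-1}).
import numpy as np

def forward_likelihood(pi, A, B):
    """pi: initial state probs (S,); A: transition matrix (S,S);
    B: per-frame emission likelihoods p(x_t|q_t=s), shape (T,S).
    Returns P(X|M)."""
    alpha = pi * B[0]                 # alpha_1(s) = pi(s) * p(x_1|s)
    for t in range(1, len(B)):
        alpha = (alpha @ A) * B[t]    # predict through A, then weight by emission
    return alpha.sum()                # marginalize over the final state

# Toy left-to-right model: 2 states, 3 frames (illustrative numbers only).
pi = np.array([1.0, 0.0])
A = np.array([[0.7, 0.3],
              [0.0, 1.0]])
B = np.array([[0.9, 0.1],
              [0.5, 0.5],
              [0.1, 0.9]])
score = forward_likelihood(pi, A, B)
```

In a recognizer this function would be evaluated once per word model Mk, and `argmax` over the resulting scores (optionally weighted by the prior P(Mk)) gives the recognized word.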
ASR with Dynamic Bayesian Networks
[Figure: hidden phone states qt = /k/, /a/, /a/, /t/ emitting acoustic vectors xt, for t = 1, 2, 3, 4]
Equivalent to a standard HMM
ASR with Dynamic Bayesian Networks
• Transition: P(qt | qt-1)
• Emission: p(xt | qt = k) ~ N(μk, Σk)
Auxiliary information (1)
• Main advantage of BNs: flexibility in defining dependencies between variables
• Energy damages system performance if it is simply appended to the feature vector
• BNs allow us to use it in an alternative way:
  • Conditioning the emission distributions upon this auxiliary variable
  • Marginalizing it out during recognition
Auxiliary information (2)
The value of at affects the value of xt:
p(xt | qt = k, at = z) ~ N(μk + Bk·z, Σk)
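The energy-conditioned emission density is a Gaussian whose mean is shifted linearly by the auxiliary value z. A minimal sketch for a scalar feature follows; the parameter values (μk, Bk, Σk) are illustrative, not trained values from the experiments.

```python
# Sketch of p(x_t | q_t=k, a_t=z) = N(x_t; mu_k + B_k * z, Sigma_k)
# for a 1-D feature, with made-up parameters.
import math

def cond_emission(x, z, mu_k=0.0, B_k=0.5, var_k=1.0):
    """Gaussian density whose mean is shifted linearly by the auxiliary value z."""
    mean = mu_k + B_k * z
    return math.exp(-0.5 * (x - mean) ** 2 / var_k) / math.sqrt(2 * math.pi * var_k)
```

Note the effect of conditioning: observing z = 1.0 shifts the mean by Bk, so `cond_emission(0.5, 1.0)` equals `cond_emission(0.0, 0.0)` — the whole density slides along with the auxiliary variable.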
Auxiliary information (3)
The value of the auxiliary variable can itself be influenced by the hidden state qt:
p(at | qt = k) ~ N(μak, Σak)
p(xt | qt = k, at = z) ~ N(μk + Bk·z, Σk)
Auxiliary information (4)
Equivalent to appending the auxiliary variable to the feature vector:
p(xt, at | qt = k) ~ N(μk^xa, Σk^xa)
Hiding auxiliary information
• We can also marginalize out (hide) the auxiliary variable during recognition:
p(xt | qt) = ∫ p(xt | qt, at) × p(at | qt) dat
• Useful when the auxiliary variable:
  • Is noisy
  • Is not accessible
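For the linear-Gaussian model of the previous slides this integral has a closed form: if a ~ N(μa, σa²) and x|a ~ N(μ + B·a, σ²), then marginally x ~ N(μ + B·μa, σ² + B²·σa²). The sketch below checks the closed form against numerical integration of the marginalization integral, using illustrative scalar parameters rather than values from the paper.

```python
# Marginalizing the auxiliary variable: p(x|q) = ∫ p(x|q,a) p(a|q) da.
# In the linear-Gaussian case the result is Gaussian with a shifted mean
# and an inflated variance.
import math

def gauss(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def marginal_numeric(x, mu, B, var, mu_a, var_a, n=8000, width=10.0):
    """Trapezoid-rule approximation of ∫ p(x|q,a) p(a|q) da."""
    lo = mu_a - width * math.sqrt(var_a)
    hi = mu_a + width * math.sqrt(var_a)
    da = (hi - lo) / n
    total = 0.0
    for i in range(n + 1):
        a = lo + i * da
        w = 0.5 if i in (0, n) else 1.0
        total += w * gauss(x, mu + B * a, var) * gauss(a, mu_a, var_a)
    return total * da

def marginal_closed(x, mu, B, var, mu_a, var_a):
    """Closed-form marginal: N(x; mu + B*mu_a, var + B^2 * var_a)."""
    return gauss(x, mu + B * mu_a, var + B * B * var_a)

num = marginal_numeric(1.0, 0.0, 0.5, 1.0, 0.2, 0.3)
cls = marginal_closed(1.0, 0.0, 0.5, 1.0, 0.2, 0.3)
```

This is exactly why hiding the auxiliary variable is cheap at recognition time: no numerical integration is needed, only a different Gaussian.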
Experimental setup • Isolated word recognition • Small vocabulary (75 words) • Feature extraction: Mel Frequency Cepstral Coefficients (MFCC) • p(xt|qt) modeled with a mixture of 4 Gaussians • p(at|qt) modeled with a single Gaussian
Experiments with Energy as an auxiliary variable
Energy computed per frame as E = log ∑_{n=1}^{N} s²[n]·w²[n]
[Figures: DBN topologies of Systems 1, 2 and 3]

Word Error Rates:
            Observed Energy   Hidden Energy
System 1    6.9 %             5.3 %
System 2    6.1 %             5.6 %
System 3    5.8 %             5.9 %
Baseline    5.9 %
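The per-frame log-energy formula above can be sketched directly. The choice of a Hamming window and the test frame below are illustrative assumptions; the slide specifies only E = log Σ s²[n]·w²[n] for signal samples s[n] and window w[n].

```python
# Sketch of the per-frame log-energy E = log( sum_n s[n]^2 * w[n]^2 ),
# using a Hamming window (an assumption; the slide does not name the window).
import math

def log_energy(frame):
    """Windowed log-energy of one analysis frame (list of samples)."""
    N = len(frame)
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]
    return math.log(sum((s * w) ** 2 for s, w in zip(frame, window)))

E = log_energy([1.0, 1.0, 1.0])  # toy 3-sample frame
```

In the experiments this scalar would play the role of the auxiliary observation at attached to each frame t.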
Conclusions • BNs are more flexible than HMMs. You can easily: • Change the topology of the distributions • Hide variables when necessary • Energy can improve the system performance if used in a non-traditional way