This paper presents a comprehensive study on hyperparameter estimation for speech recognition systems using a Variational Bayesian approach. It explores the relationships between phoneme accuracy and the lower bound F, emphasizing the importance of model structures and prior distributions. The authors demonstrate that optimizing hyperparameters can significantly improve recognition performance, particularly when large datasets are used. Experimental results show promising gains in accuracy, supporting the benefits of integrating prior knowledge into recognition systems.
Hyperparameter Estimation for Speech Recognition Based on Variational Bayesian Approach
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee and Keiichi Tokuda (Nagoya Institute of Technology)

Notation: observation vectors; hidden variables; model parameters; hyperparameters; mean vector; inverse covariance; number of dimensions; prior, posterior and predictive distributions; likelihood; evidence; variational posterior; posterior parameters; root-node mean vector and covariance matrix; leaf-node index, occupation and mean vector.

1. Introduction
• Recent speech recognition systems
  • ML (Maximum Likelihood) criterion ⇒ reduced estimation accuracy
  • MDL (Minimum Description Length) criterion ⇒ based on an asymptotic approximation
• Variational Bayesian (VB) approach
  • Higher generalization ability
  • Appropriate model structures can be selected
  • Performance depends on the hyperparameters
• Objective: estimate appropriate hyperparameters by maximizing the marginal likelihood ⇒ maximize F w.r.t. the hyperparameters

2. Bayesian Framework
• Model parameters are regarded as probabilistic variables
• Recognition is based on the posterior distributions
• Advantages
  • Prior knowledge can be integrated
  • Model structure can be selected
  • Robust classification
• Disadvantage
  • Includes integral and expectation calculations ⇒ an effective approximation technique is required
• Output probability distribution ⇒ Gaussian distribution
• Conjugate prior distribution ⇒ Gauss-Wishart distribution
• Likelihood function ⇒ proportional to a Gauss-Wishart distribution

3. Variational Bayesian Approach
• Variational Bayes [Attias; 1999]
  • Approximate the posterior distributions by a variational method
  • Define a lower bound F on the log-likelihood
  • Maximize F w.r.t. the variational posteriors
• Context clustering based on VB [Watanabe et al.; 2002]
  • Split nodes by phonetic questions Q (Yes/No)
  • ΔF = F_Y + F_N − F_P (F_P: parent node; F_Y, F_N: "yes"/"no" child nodes)
  • Stop the node split if ΔF < 0

[Figure: decision-tree context clustering of HMM states (e.g. /a/.state[2], /a/.state[4]) by phonetic questions]

4. Hyperparameter Estimation
• Conventional: use monophone HMM state statistics ⇒ maximize F at the root node
• Proposed: use the statistics of all leaf nodes ⇒ maximize F of the tree structure
• Tying structure of prior distributions
  • Consider four kinds of tying structure (all / phone / state / leaf)
  • The appropriate tying structure depends on the amount of training data
    ・ Large training data set ⇒ tie few prior distributions
    ・ Small training data set ⇒ tie many prior distributions
• Define a new hyperparameter T representing the amount of prior data

5. Experimental Conditions
• Training sets of 20,000, 2,500 and 200 sentences

6. Experimental Results
• Relationships between F and recognition accuracy
  • F and the recognition accuracy behaved similarly
  • The proposed technique gives a consistent improvement in the value of F
• If the prior distributions have a tying structure ⇒ F is good for model selection
• Otherwise ⇒ F increases monotonically as the tree size increases
• The VB clustering with an appropriate prior distribution improves the recognition performance

[Figure: the value of F and phoneme accuracy (%) as functions of the hyperparameter T (0–10); 20,000 sentences]
[Figure: phoneme accuracy (%) vs. tree size (%) for 20,000, 2,500 and 200 training sentences]
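The poster's node-splitting gain ΔF = F_Y + F_N − F_P (parent P, yes/no children Y and N), with splitting stopped once ΔF turns negative, can be sketched with a toy per-node score. Everything below is hypothetical: `node_score` is a stand-in (Gaussian log-likelihood minus a fixed per-node penalty), not the variational lower bound, and numeric threshold questions replace real phonetic questions; only the greedy control flow mirrors the clustering procedure.

```python
import math

def node_score(xs, penalty=4.0):
    # Hypothetical stand-in for the per-node value of F:
    # Gaussian log-likelihood of the node's data minus a fixed
    # penalty (the real F would come from the VB lower bound).
    n = len(xs)
    if n < 2:
        return -penalty
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n + 1e-6
    loglik = -0.5 * n * (math.log(2 * math.pi * var) + 1.0)
    return loglik - penalty

def split_node(xs, questions):
    """Greedily split one node: pick the question maximizing
    dF = F_yes + F_no - F_parent; keep the node a leaf if dF < 0."""
    f_parent = node_score(xs)
    best = None
    for q in questions:
        yes = [x for x in xs if x < q]
        no = [x for x in xs if x >= q]
        d_f = node_score(yes) + node_score(no) - f_parent
        if best is None or d_f > best[0]:
            best = (d_f, q, yes, no)
    d_f, q, yes, no = best
    if d_f < 0:          # stop criterion from the poster
        return None      # node stays a leaf
    return q, yes, no

# Two clear clusters: the split is accepted.
print(split_node([0.1, 0.2, 0.15, 5.0, 5.2, 4.9], questions=[1.0, 3.0, 6.0]))
```

Running the same routine on homogeneous data (e.g. `[0.1, 0.2, 0.3, 0.4]`) makes every candidate ΔF negative, so the node is kept as a leaf.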
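Estimating hyperparameters by maximizing the marginal likelihood (the quantity that F lower-bounds) can be illustrated in one dimension, where the Gauss-Wishart prior reduces to a Normal-Gamma prior and the evidence has a closed form. The data, the grid, and all prior values below are invented for illustration; the prior mean (0.5) is deliberately a little off the data mean, so an intermediate prior-data weight wins, which is the behaviour the poster's hyperparameter T is meant to capture.

```python
import math

def log_evidence(xs, mu0, kappa0, alpha0, beta0):
    """Closed-form log marginal likelihood of 1-D data under a
    Normal-Gamma prior (scalar analogue of the Gauss-Wishart)."""
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    alpha_n = alpha0 + n / 2.0
    beta_n = (beta0 + 0.5 * ss
              + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n))
    return (math.lgamma(alpha_n) - math.lgamma(alpha0)
            + alpha0 * math.log(beta0) - alpha_n * math.log(beta_n)
            + 0.5 * (math.log(kappa0) - math.log(kappa_n))
            - 0.5 * n * math.log(2.0 * math.pi))

# Treat kappa0 as the "amount of prior data" and pick the grid value
# with the highest evidence (a grid-search stand-in for maximizing F
# w.r.t. the hyperparameters).
data = [0.9, 1.1, 1.0, 1.2, 0.8]
grid = [0.1, 0.5, 1.0, 2.0, 5.0, 10.0]
best_T = max(grid, key=lambda k: log_evidence(data, mu0=0.5, kappa0=k,
                                              alpha0=2.0, beta0=0.5))
print(best_T)  # an interior grid point, not an extreme
```

The trade-off is visible in the formula: a larger `kappa0` raises the `log(kappa0) - log(kappa_n)` term but inflates `beta_n` when the prior mean disagrees with the data, so the evidence peaks at a moderate prior weight.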
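The "robust classification" advantage of the Bayesian framework comes from scoring with the predictive distribution rather than a point estimate: integrating the conjugate posterior over the parameters yields a Student-t density, which penalizes outliers far less than a plug-in Gaussian. Again a 1-D Normal-Gamma sketch with invented numbers, not the paper's multivariate setup:

```python
import math

def posterior_params(xs, mu0, kappa0, alpha0, beta0):
    # Conjugate Normal-Gamma update (scalar analogue of Gauss-Wishart).
    n = len(xs)
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    kappa_n = kappa0 + n
    mu_n = (kappa0 * mu0 + n * xbar) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * ss + kappa0 * n * (xbar - mu0) ** 2 / (2.0 * kappa_n)
    return mu_n, kappa_n, alpha_n, beta_n

def log_predictive(x, mu_n, kappa_n, alpha_n, beta_n):
    """Posterior predictive log density: a Student-t with 2*alpha_n
    degrees of freedom, heavier-tailed than any plug-in Gaussian."""
    nu = 2.0 * alpha_n
    s2 = beta_n * (kappa_n + 1.0) / (alpha_n * kappa_n)   # squared scale
    return (math.lgamma((nu + 1.0) / 2.0) - math.lgamma(nu / 2.0)
            - 0.5 * math.log(nu * math.pi * s2)
            - (nu + 1.0) / 2.0 * math.log(1.0 + (x - mu_n) ** 2 / (nu * s2)))

data = [0.9, 1.1, 1.0, 1.2, 0.8]
post = posterior_params(data, mu0=1.0, kappa0=1.0, alpha0=2.0, beta0=0.5)

# Plug-in Gaussian with the ML mean/variance, for comparison.
m = sum(data) / len(data)
v = sum((x - m) ** 2 for x in data) / len(data)
gauss = -0.5 * math.log(2.0 * math.pi * v) - (10.0 - m) ** 2 / (2.0 * v)

print(log_predictive(10.0, *post), gauss)
```

At the outlier x = 10 the plug-in Gaussian log density collapses (roughly −2000 here) while the Student-t predictive stays on the order of −20, which is why predictive scoring classifies more robustly.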