
A Bayesian Approach to HMM-Based Speech Synthesis






Presentation Transcript


  1. A Bayesian Approach to HMM-Based Speech Synthesis
  Kei Hashimoto¹, Heiga Zen¹, Yoshihiko Nankaku¹, Takashi Masuko², and Keiichi Tokuda¹
  ¹Nagoya Institute of Technology  ²Tokyo Institute of Technology

  2. Background
  • HMM-based speech synthesis system
    - Spectrum, excitation, and duration are modeled by HMMs
    - Speech parameter sequences are generated from the trained models
  • Maximum likelihood (ML) criterion
    - Used both to train the HMMs and to generate speech parameters
    - Point estimate ⇒ the over-fitting problem
  • Bayesian approach
    - Estimates the posterior distribution of the model parameters
    - Prior information can be used ⇒ alleviates the over-fitting problem

  3. Outline
  • Bayesian speech synthesis
    - Variational Bayesian method
    - Speech parameter generation
  • Bayesian context clustering
    - Prior distribution using cross validation
  • Experiments
  • Conclusion & Future work

  4. Bayesian speech synthesis (1/2)
  • Model training and speech synthesis under the ML and Bayesian criteria
  • λ: model parameters
  • o: synthesis data sequence
  • l: label sequence for synthesis
  • L: label sequence for training
  • O: training data sequence
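As a sketch, using the conventional notation above (the slide's own equations are assumed, not reproduced verbatim), the two criteria contrast as follows:

```latex
% ML: point estimate of the model, then generation from that single model
\hat{\lambda} = \mathop{\mathrm{argmax}}_{\lambda} \; p(O \mid \lambda, L),
\qquad
\hat{o} = \mathop{\mathrm{argmax}}_{o} \; p(o \mid \hat{\lambda}, l)

% Bayes: generation directly from the predictive distribution,
% with the model parameters marginalized out
\hat{o} = \mathop{\mathrm{argmax}}_{o} \; p(o \mid O, l, L)
```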

  5. Bayesian speech synthesis (2/2)
  • Predictive distribution (marginal likelihood)
  • q: HMM state sequence for the synthesis data
  • Q: HMM state sequence for the training data
  • p(o | q, λ): likelihood of the synthesis data
  • p(O | Q, λ): likelihood of the training data
  • p(λ): prior distribution of the model parameters
  ⇒ Evaluated with the variational Bayesian method [Attias; ’99]
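Combining the quantities listed above, the predictive distribution marginalizes over both state sequences and the model parameters; a sketch, up to normalization by the training-data marginal:

```latex
p(o \mid O, l, L)
\;\propto\; \sum_{q} \sum_{Q} \int
  p(o \mid q, \lambda)\, P(q \mid \lambda, l)\,
  p(O \mid Q, \lambda)\, P(Q \mid \lambda, L)\,
  p(\lambda)\; d\lambda
```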

  6. Variational Bayesian method (1/2)
  • Estimate an approximate posterior distribution Q(q, Q, λ)
    ⇒ Maximize a lower bound ℱ on the log marginal likelihood
  • The bound follows from Jensen’s inequality
  • ⟨·⟩_Q: expectation with respect to Q(q, Q, λ)
  • Q(q, Q, λ): approximate distribution of the true posterior distribution
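A sketch of the bound: for any distribution Q(q, Q, λ), Jensen's inequality gives

```latex
\log p(o, O \mid l, L)
  = \log \sum_{q,Q} \int Q(q,Q,\lambda)\,
      \frac{p(o, q, O, Q, \lambda \mid l, L)}{Q(q,Q,\lambda)}\, d\lambda
  \;\ge\;
  \Bigl\langle \log \frac{p(o, q, O, Q, \lambda \mid l, L)}{Q(q,Q,\lambda)}
  \Bigr\rangle_{Q(q,Q,\lambda)}
  \;=\; \mathcal{F}
```

Equality holds when Q equals the true posterior, so maximizing ℱ with respect to Q drives the approximation toward the posterior.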

  7. Variational Bayesian method (2/2)
  • Assume the random variables are statistically independent:
    Q(q, Q, λ) = Q(q) Q(Q) Q(λ)
  • Optimal posterior distributions are obtained by maximizing ℱ
    (C: normalization terms)
  • The updates are iterated, as in the EM algorithm
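To illustrate the alternating updates, here is a minimal variational-Bayes sketch for a single 1-D Gaussian with unknown mean and precision, a toy stand-in for one HMM state output distribution. The factorization q(μ)q(τ) and the hyperparameter names follow the standard Normal-Gamma conjugate setup, not the paper's exact derivation:

```python
from statistics import fmean

def vb_gaussian(data, mu0=0.0, lam0=1e-3, a0=1e-3, b0=1e-3, iters=50):
    # Factorized approximate posterior Q = q(mu) q(tau) for a 1-D Gaussian
    # with unknown mean mu and precision tau; the two updates are
    # alternated until convergence, mirroring the EM-style iteration.
    n = len(data)
    xbar = fmean(data)
    e_tau = 1.0  # initial guess for E[tau]
    for _ in range(iters):
        # Update q(mu) = N(mu_n, 1/lam_n) given the current E[tau]
        mu_n = (lam0 * mu0 + n * xbar) / (lam0 + n)
        lam_n = (lam0 + n) * e_tau
        # Update q(tau) = Gamma(a_n, b_n) given q(mu)
        a_n = a0 + (n + 1) / 2
        ss = sum((x - mu_n) ** 2 for x in data) + n / lam_n
        b_n = b0 + 0.5 * (ss + lam0 * ((mu_n - mu0) ** 2 + 1.0 / lam_n))
        e_tau = a_n / b_n
    return mu_n, e_tau  # posterior mean of mu, posterior mean of tau
```

With a weak prior, the posterior mean lands near the sample mean and E[τ] near the inverse sample variance, as expected.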

  8. Approximation for speech synthesis
  • Q(λ) depends on the synthesis data
    ⇒ Huge computational cost in the synthesis part
  • Ignore the dependency on the synthesis data
    ⇒ Q(λ) is estimated from the training data only

  9. Prior distribution
  • Conjugate prior distribution
    ⇒ The posterior belongs to the same family of distributions as the prior
  • Determined from the statistics of prior data:
    - number of prior data points
    - dimension of the feature vector
    - mean of the prior data
    - covariance of the prior data
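A minimal sketch of the idea for a 1-D Gaussian mean: the prior hyperparameters are set from the count and mean of the prior data, and conjugacy makes the posterior mean a count-weighted average of prior and training statistics. Function names are illustrative, not from the paper:

```python
from statistics import fmean

def prior_from_data(prior_data):
    # Hyperparameters from prior-data statistics: a pseudo-count tau
    # and a prior mean nu (covariance omitted in this 1-D sketch).
    return len(prior_data), fmean(prior_data)

def posterior_mean(tau, nu, train):
    # Conjugate Gaussian-mean update: the posterior mean is the
    # count-weighted average of the prior mean and the sample mean.
    n = len(train)
    return (tau * nu + n * fmean(train)) / (tau + n)
```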

  10. Speech parameter generation
  • Speech parameters consist of static and dynamic features
    ⇒ Only the static feature sequence is generated
  • Speech parameter generation based on the Bayesian approach
    ⇒ Maximize the lower bound ℱ
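The static-plus-dynamic constraint can be sketched as the classic generation problem: stack static and delta rows into a window matrix W and solve the normal equations for the static sequence. This toy assumes unit variances, point means standing in for the posterior expectations, a simple delta window 0.5·(c[t+1] - c[t-1]), and zero padding at the edges; none of these choices are claimed to match the paper's exact setup:

```python
def mlpg(static_mean, delta_mean, T):
    # Build the 2T x T window matrix W: one static row e_t and one
    # delta row 0.5*(c[t+1] - c[t-1]) per frame, zero-padded at edges.
    rows, targets = [], []
    for t in range(T):
        s = [0.0] * T
        s[t] = 1.0
        rows.append(s)
        targets.append(static_mean[t])
        d = [0.0] * T
        if t - 1 >= 0:
            d[t - 1] = -0.5
        if t + 1 < T:
            d[t + 1] = 0.5
        rows.append(d)
        targets.append(delta_mean[t])
    # Normal equations R c = r with R = W^T W and r = W^T mu
    # (unit variances assumed for brevity).
    n = len(rows)
    R = [[sum(rows[k][i] * rows[k][j] for k in range(n)) for j in range(T)]
         for i in range(T)]
    r = [sum(rows[k][i] * targets[k] for k in range(n)) for i in range(T)]
    # Solve by Gaussian elimination with partial pivoting.
    for i in range(T):
        p = max(range(i, T), key=lambda k: abs(R[k][i]))
        R[i], R[p] = R[p], R[i]
        r[i], r[p] = r[p], r[i]
        for k in range(i + 1, T):
            f = R[k][i] / R[i][i]
            for j in range(i, T):
                R[k][j] -= f * R[i][j]
            r[k] -= f * r[i]
    c = [0.0] * T
    for i in range(T - 1, -1, -1):
        c[i] = (r[i] - sum(R[i][j] * c[j] for j in range(i + 1, T))) / R[i][i]
    return c  # generated static feature sequence
```

The solution trades off matching the static means against keeping the deltas near their means, which is what smooths the generated trajectory.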

  11. Relation between Bayes and ML
  • Compared with the ML criterion:
    - Expectations of the model parameters are used in place of point estimates
    - Generation can be solved in the same fashion as under ML
  • Output distribution: a point estimate under ML, an expectation under Bayes
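For a Gaussian output distribution, the substitution can be sketched as:

```latex
% ML: plug the point estimates into the output distribution
\log p(o_t \mid \hat{\mu}, \hat{\Sigma})

% Bayes: take the expectation under the approximate posterior
\bigl\langle \log p(o_t \mid \mu, \Sigma) \bigr\rangle_{Q(\lambda)}
```

Because the expectation keeps the same quadratic form in o_t, generation reduces to the same kind of optimization as under ML.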

  12. Outline
  • Bayesian speech synthesis
    - Variational Bayesian method
    - Speech parameter generation
  • Bayesian context clustering
    - Prior distribution using cross validation
  • Experiments
  • Conclusion & Future work

  13. Bayesian context clustering
  • Context clustering based on maximizing ℱ
  • Each node asks a yes/no context question, e.g., “Is this phoneme a vowel?”
  • Select the question that gives the largest gain in ℱ
  • Stopping condition: split a node only while the gain is positive
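The question-selection step can be sketched as below, using a sum-of-squared-error reduction as a simple stand-in for the gain in the lower bound ℱ (the real criterion differs, but the greedy select-then-split logic is the same; the sample and question formats are illustrative):

```python
def sse(vals):
    # Sum of squared errors around the node mean; the toy score.
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals)

def best_question(samples, questions):
    # samples: list of (context_dict, value); questions: (name, predicate).
    # Gain of a split = score(parent) - score(yes child) - score(no child);
    # the node is split only if the best gain is positive.
    parent = sse([v for _, v in samples])
    best, best_gain = None, 0.0
    for name, q in questions:
        yes = [v for c, v in samples if q(c)]
        no = [v for c, v in samples if not q(c)]
        gain = parent - sse(yes) - sse(no)
        if gain > best_gain:
            best, best_gain = name, gain
    return best, best_gain
```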

  14. Impact of prior distribution
  • The prior hyperparameters act as tuning parameters that affect model selection
    ⇒ A technique for determining the prior distribution is required
  • Conventional approach: maximize the marginal likelihood
    - Leads to the over-fitting problem, as ML does
    - Tuning parameters are still required
  • Determine the prior distribution using cross validation [Hashimoto; ’08]

  15. Bayesian approach using CV
  • The training data is randomly divided into K groups
  • The prior distribution for each group is constructed from the posterior
    distribution estimated on the remaining K-1 groups
    (e.g., groups {2,3}, {1,3}, and {1,2} for K = 3)
  • The likelihood of each group is calculated under its cross-validation prior
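A sketch of the data split: for each of the K groups, prior statistics (count, mean, variance) are gathered from the other K-1 groups; how such statistics become hyperparameters follows slide 9. The 1-D statistics and round-robin fold assignment here are illustrative:

```python
def cv_prior_stats(data, K):
    # Deal the data into K folds round-robin, then compute prior
    # statistics for each fold from the K-1 folds that exclude it.
    folds = [data[i::K] for i in range(K)]
    priors = []
    for k in range(K):
        rest = [x for j, f in enumerate(folds) if j != k for x in f]
        n = len(rest)
        m = sum(rest) / n
        v = sum((x - m) ** 2 for x in rest) / n
        priors.append((n, m, v))  # (count, mean, variance) for fold k's prior
    return priors
```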

  16. Outline
  • Bayesian speech synthesis
    - Variational Bayesian method
    - Speech parameter generation
  • Bayesian context clustering
    - Prior distribution using cross validation
  • Experiments
  • Conclusion & Future work

  17. Experimental conditions (1/2)

  18. Experimental conditions (2/2)
  • Compared approaches
  • Mean Opinion Score (MOS) test
    - Subjects were 10 Japanese students
    - 20 sentences were chosen at random

  19. Subjective listening test
  [figure: mean opinion scores; chart labels 2,491 / 25,911 / 2,553 / 27,106]

  20. Conclusions and future work
  • A new framework based on the Bayesian approach
    - All processes are derived from a single predictive distribution
    - Improves the naturalness of synthesized speech
  • Future work
    - Introduce HSMMs instead of HMMs
    - Investigate the relation between speech quality and model structure
