1 / 20

Basics of HMM-based speech synthesis

Basics of HMM-based speech synthesis. Contents Speech Parameter Generation Using Dynamic Features Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis Multi-Space Probability Distribution HMM

nbrescia
Download Presentation

Basics of HMM-based speech synthesis

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Basics of HMM-based speech synthesis • Contents • Speech Parameter Generation Using Dynamic Features • Speech Parameter Generation Algorithms for HMM-Based Speech Synthesis • Multi-Space Probability Distribution HMM • Simultaneous Modelling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  2. Speech Parameter Generation using Dynamic Features (1) • 1. Problem vector sequence of speech parameter state sequence of an HMM λ vector of speech parameter at time t ct … static feature vector (e.g. cepstral coefficient) Δct … dynamic feature vector (e.g. delta cepstral coefficient) • Determine parameter sequence c that maximizes Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  3. Speech Parameter Generation using Dynamic Features (2) • 2. Solution to the Problem • For a given q, maximizing ,with respect to c is equivalent to maximizing , with respect to c, since does not depend on O. • … single mixture Gaussian distribution • By setting to maximize we obtain a set of equations • with Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  4. Speech Parameter Generation using Dynamic Features (3) • Solution to the Problem … mean vector of ct … covariance matrix of ct … mean vector of Δct … covariance matrix of Δ ct • To obtain optimal q and c the set of equations has to be solved for every possible state sequence Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  5. Speech Parameter Generation using Dynamic Features (3) • Solution to the Problem Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  6. Speech Parameter Generation using Dynamic Features (4) • Example Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [2] [1], [2]

  7. Speech Parameter Generation Algorithms (1) • Previously: state sequence was assumed to be known • Now: (part of) state sequence is hidden • Goal: determine speech parameter vector sequence • so that is • maximized with respect to O, • where is the state and mixture sequence, i.e. (q, i) indicates the i-th mixture of state q •  … speech parameter vector Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  8. Speech Parameter Generation Algorithms (2) • Derived Algorithm • Based on EM-Algorithm • Find critical point of the likelihood function P(O| λ) … auxiliary function of current and new parameter vector sequence O and O’, respectively Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration … occupancy probability

  9. Speech Parameter Generation Algorithms (3) • Derived Algorithm  Under condition O’=WC’,C’ which maximizes Q(O,O’) is given by  Solved with recursive algorithm for dealing with dynamic features Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  10. Speech Parameter Generation Algorithms (4) • Example • 5-state left-to-right HMMs • Speech sampled at 16 kHz & windowed by 25.6ms Blackman window with 5ms shift • Mel-cepstral coefficients obtained by mel-cepstral analysis • Sentence fragment “kiNzokuhiroo” Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [3]

  11. Multi-Space Probability Distribution HMM (1) • Problem:We cannot apply both the conventional and continuous HMMs to observations which consist of continuous values and discrete symbols [4] Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration Observation o1 consists of a set of space indices X1{1,2,G} and a 3D vector x1 R³ x1 is drawn from one of three spaces Ω1, Ω2, Ω3 R³ Pdf for x1 : w1N1(x)+w2N2(x)+wGNG(x) [4]

  12. Sample space Ω Multi-Space Probability Distribution HMM (2) • Example: A man fishing in a pond Ω1:2D space for length and height of red fish Ω2: 2D space for length and height of blue fish Ω3: 1D space for diameters of tortoises Ω4: 0D space for articles of junk Weights w1 to w4 are determined by ratio of blue and red fish, tortoises and junk in the pond N1, N2: 2D pdfs for sizes and heights of fish N3: 1D pdf for diameter of tortoise Man catches red fish: observation o=({1},x) is made Same by night: o=({1,2},x) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration Ω1=R² Ω2=R² Ω3=R¹ Ω4=Rº 4

  13. Multi-Space Probability Distribution HMM (3) • Algorithm • Each state i has G pdfs NiG and their • weights wiG • The observation probability of O, P(O| λ) • is calculated with the forward-backward • algorithm • Then, we need to maximize the • observation likelihood of P(O| λ) over all parameters • reestimation formulas for the maximum likelihood estimation are calculated in analogy to the Baum-Welch-Algorithm Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [4]

  14. Simultaneous modelling of Spectrum, Pitch and Duration (1) • Simultaneous Modelling • Pitch patterns modelled by MSD-HMM •  observation sequence composed of continuous values and discrete symbols • Spectrum modelled by continuous probability distribution • State duration densities modelled by single Gaussian distributions (dimension of SDD equal to number of HMM states) Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5]

  15. Simultaneous modelling of Spectrum, Pitch and Duration (2) • Context Dependent Model • Many contextual factors taken into account such as: • Position of breath group in sentence • Position of current phoneme in current accentual phrase • {preceding, current succeeding} part of speech • {preceding, current succeeding} phoneme • (etc.) • Decision-tree based context clustering applied because • As contextual factors increase, their combinations increase exponentially  model parameters cannot be estimated with sufficient accuracy by limited training data • Impossible to prepare speech database which includes all combinations of contextual factors Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration

  16. Simultaneous modelling of Spectrum, Pitch and Duration (3) • Context Dependent Model spectrum, pitch and duration have their own influential contextual factors  distributions of the respective parameters are clustered independently Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5]

  17. Simultaneous modelling of Spectrum, Pitch and Duration (4) •  arbitrary text converted to context-based label sequence •  sentence HMM constructed according to label sequence •  State durations determined so as to maximize likelihood of state duration densities •  Sequence of mel-cepstrum coefficients and pitch values generated •  Speech synthesized directly by MLSA filter 3. Text-to-Speech Synthesis System Contents Speech Parameter Generation - Dynamic Features - Algorithms Multi-Space Probability Distribution HMM Simultaneous modelling of Spectrum, Pitch and Duration [5]

  18. References • [1] K. Tokuda, T. Kobayashi and S. Imai, Speech Parameter Generation from HMM using Dynamic Features, Proc. ICASSP, 1995 • [2] http://hts.sp.nitech.ac.jp/?Publications – see ‘Attach file’  tokuda_TTSworkshop2002.pdf • [3] Tokuda, Yoshimura, Masuko, Kobayashi, Kitamura, Speech Parameter Generation Algortihms for HMM-based Speech Synthesis, ICASSP, 2000 • [4] Tokuda, Masuko, Miyazaki, Kobayashi, Multi-Space Probability Distribution HMM, IEICE Trans. Inf. & Syst., 2000

  19. References • [5] Yoshimura, Tokuda, Masuko, Kobayashi, Kitamura, Simultaneous modeling of Spectrum, Pitch and Duration in HMM-based Speech Synthesis, Proc EU-ROSPEECH, 1999

  20. Thanks for the attention

More Related