
Modeling and Generation of Accentual Phrase F 0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions. Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center) Koji Iwano (currently with Tokyo Institute of Technology, Japan)

Presentation Transcript

  1. Modeling and Generation of Accentual Phrase F0 Contours Based on Discrete HMMs Synchronized at Mora-Unit Transitions Atsuhiro Sakurai (Texas Instruments Japan, Tsukuba R&D Center) Koji Iwano (currently with Tokyo Institute of Technology, Japan) Keikichi Hirose (Dep. of Frontier Eng., The University of Tokyo, Japan)

  2. Introduction to Corpus-Based Intonation Modeling • Traditional approach: rules derived from linguistic expertise Human-dependent (too complicated and not satisfactory, because the phenomena involved are not completely understood) • Corpus-based approach: modeling derived from statistical analysis of speech corpora Automatic (potential to improve as better speech corpora become available)

  3. Background • HMMs are widely used in speech recognition, and fast learning algorithms exist • Macroscopic discrete HMMs associated with accentual phrases can store information such as accent type and prosodic structure • Morae are extremely important for describing Japanese intonation - sequences of high and low morae can characterize accent types

  4. Overview of the Method • Definition of HMM and alphabet: • Accent types modeled by discrete HMMs • 2-code mora F0 contour alphabet used as output symbols • State transitions synchronized with mora transitions • Classification of HMMs and training: • HMMs classified according to linguistic attributes • Training by the usual forward-backward algorithm • Generation of F0 contours: • Best sequence of symbols generated by a modified Viterbi algorithm

  5. The Mora-F0 Alphabet • Two codes: stylized mora F0 contours and mora-to-mora F0: 34 symbols each • Obtained by LBG clustering from a 500-sentence database (ATR continuous speech database, speaker MHT) • The entire database is labeled using the 2-code symbols.
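The LBG codebook design used for this alphabet is, in essence, binary-splitting k-means. Below is a minimal sketch with toy 2-dimensional data; in the actual system the input vectors would be stylized mora F0 contours, and the codebook size is 34 (splitting naturally produces powers of two, so a power of two is used here for simplicity).

```python
import numpy as np

def lbg(vectors, n_codes, n_iter=20, eps=0.05):
    # Start from the global centroid, then repeatedly split and refine.
    codebook = vectors.mean(axis=0, keepdims=True)
    while len(codebook) < n_codes:
        # Binary split: perturb every centroid into a +/- pair.
        codebook = np.vstack([codebook + eps, codebook - eps])
        for _ in range(n_iter):
            # Nearest-centroid assignment (squared Euclidean distance).
            d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
            labels = d.argmin(axis=1)
            # Re-estimate centroids; empty cells keep their old centroid.
            for k in range(len(codebook)):
                if (labels == k).any():
                    codebook[k] = vectors[labels == k].mean(axis=0)
    return codebook

# Toy usage: 200 random 2-dimensional "contour" vectors, 4 codes.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 2))
cb = lbg(data, 4)
print(cb.shape)  # (4, 2)
```

Each mora in the database would then be labeled with the index of its nearest codebook entry.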

  6. The Accentual Phrase HMM • Accentual phrases are classified according to: • Accent type • Position of the accentual phrase in the sentence • (Optional: number of morae, part-of-speech, syntactic structure)
 (Diagram: one HMM per accentual phrase; state transitions are synchronized with mora transitions.)

  7. Example: ‘Karewa Tookyookara kuru.’ (He comes from Tokyo)
 (Figure: the sentence is divided into three accentual phrases M1, M2, M3, each annotated with its accent type and position in the sentence, and with a label sequence of [shape, F0] symbol pairs, e.g. [shape1, F01], [shape2, F02], ...)

  8. HMM Topologies (a) Accent types 0 and 1 (b) Other accent types

  9. Training Database • ATR Continuous Speech Database (500 sentences, speaker MHT) • Segmented in mora and accentual phrases • Mora labels using the mora-F0 alphabet: shape (stylized F0 contour), mora F0. • Accentual phrase labels: number of morae, position in the sentence

  10. Output Code Generation How to use the HMM for synthesis? A) Recognition: given one output sequence, compute its likelihood and the best state path B) Synthesis: find the best output sequence together with the best state path

  11. Intonation Modeling Using HMM Viterbi Search for the Recognition Problem:
 for t = 2, 3, ..., T
   for i_t = 1, 2, ..., S
     D_min(t, i_t) = min_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y(t) | i_t)] }
     ψ(t, i_t) = argmin_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y(t) | i_t)] }
   next i_t
 next t
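The recognition-side recursion above can be sketched directly in code. This is a generic Viterbi search in the -log domain with back-pointers (the slide's argmin); the transition matrix, output matrix, and initial distribution below are illustrative toy values, not the paper's trained models.

```python
import numpy as np

def viterbi(a, b, y_seq, pi):
    """a[i_prev, i]: transition probs, b[i, y]: output probs,
    y_seq: observed symbols, pi: initial state probs."""
    S, T = a.shape[0], len(y_seq)
    D = np.full((T, S), np.inf)        # D[t, i]: min accumulated -log prob
    psi = np.zeros((T, S), dtype=int)  # back-pointers (the slide's argmin)
    D[0] = -np.log(pi) - np.log(b[:, y_seq[0]])
    for t in range(1, T):
        for i in range(S):
            cost = D[t - 1] - np.log(a[:, i])
            psi[t, i] = cost.argmin()
            D[t, i] = cost.min() - np.log(b[i, y_seq[t]])
    # Backtrack the best state path from the final argmin.
    path = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1], float(D[-1].min())

# Toy 2-state, 2-symbol example (probabilities are illustrative only).
a = np.array([[0.7, 0.3], [0.4, 0.6]])
b = np.array([[0.9, 0.1], [0.2, 0.8]])
path, cost = viterbi(a, b, [0, 1, 1], np.array([0.5, 0.5]))
print(path)  # → [0, 1, 1]
```

In recognition, the observed symbol sequence y(t) is fixed and the search returns the likelihood and best state path.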

  12. Intonation Modeling Using HMM Modified Viterbi Search for the Synthesis Problem:
 for t = 2, 3, ..., T
   for i_t = 1, 2, ..., S
     D_min(t, i_t) = min_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y_max(t) | i_t)] }
     ψ(t, i_t) = argmin_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + [-log b(y_max(t) | i_t)] }
   next i_t
 next t
 (y_max(t): the most probable output symbol of state i_t)
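The only change from the recognition recursion is that each state now emits its own most probable symbol y_max instead of scoring a given observation, so the search yields a symbol sequence rather than consuming one. A sketch, again with illustrative toy probabilities:

```python
import numpy as np

def viterbi_synthesis(a, b, T, pi):
    """Search for the jointly best state path and output sequence:
    each state i contributes -log b(y_max | i), its best emission cost."""
    S = a.shape[0]
    best_sym = b.argmax(axis=1)           # y_max for each state
    best_cost = -np.log(b.max(axis=1))    # -log b(y_max | i)
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    D[0] = -np.log(pi) + best_cost
    for t in range(1, T):
        for i in range(S):
            c = D[t - 1] - np.log(a[:, i])
            psi[t, i] = c.argmin()
            D[t, i] = c.min() + best_cost[i]
    states = [int(D[-1].argmin())]
    for t in range(T - 1, 0, -1):
        states.append(int(psi[t, states[-1]]))
    states.reverse()
    return [int(best_sym[s]) for s in states]  # generated symbol sequence

# Toy model: state 0 prefers symbol 0, state 1 prefers symbol 1.
a = np.array([[0.6, 0.4], [0.1, 0.9]])
b = np.array([[0.9, 0.1], [0.2, 0.8]])
seq = viterbi_synthesis(a, b, 4, np.array([0.9, 0.1]))
print(seq)  # → [0, 1, 1, 1]
```

In the actual system the emitted symbols are mora-F0 alphabet codes, one per mora-synchronized state transition.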

  13. Use of Bigram Probabilities
 for t = 2, 3, ..., T
   for i_t = 1, 2, ..., S
     D_min(t, i_t) = min_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + min_k { -log b(y_k(t) | y(t-1), i_t) } }
     ψ(t, i_t) = argmin_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + min_k { -log b(y_k(t) | y(t-1), i_t) } }
   next i_t
 next t
 k = 1, ..., K (dimension of y); the output probability is now conditioned on the previous symbol y(t-1)
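With bigram output probabilities, each DP cell must also remember which symbol it emitted, since the next emission is conditioned on it. The sketch below makes one simplifying assumption not spelled out on the slide: the symbol at a cell is chosen greedily given the predecessor cell's symbol, and the first symbol comes from a hypothetical per-state unigram table `uni`.

```python
import numpy as np

def viterbi_bigram_synthesis(a, b2, uni, T, pi):
    """a[i_prev, i]: transition probs; b2[i, y_prev, y]: bigram output probs;
    uni[i, y]: first-symbol probs (assumption); pi: initial state probs."""
    S = a.shape[0]
    D = np.full((T, S), np.inf)
    psi = np.zeros((T, S), dtype=int)
    sym = np.zeros((T, S), dtype=int)   # symbol emitted at each DP cell
    for i in range(S):
        sym[0, i] = int(uni[i].argmax())
        D[0, i] = -np.log(pi[i]) - np.log(uni[i, sym[0, i]])
    for t in range(1, T):
        for i in range(S):
            c = D[t - 1] - np.log(a[:, i])
            j = int(c.argmin())
            psi[t, i] = j
            prev = sym[t - 1, j]
            sym[t, i] = int(b2[i, prev].argmax())   # min_k of -log b(y_k | y_prev, i)
            D[t, i] = c[j] - np.log(b2[i, prev, sym[t, i]])
    i = int(D[-1].argmin())
    out = []
    for t in range(T - 1, -1, -1):       # backtrack path and emitted symbols
        out.append(int(sym[t, i]))
        i = int(psi[t, i])
    return out[::-1]

# Toy model: bigram tables that favor repeating the previous symbol.
a = np.array([[0.6, 0.4], [0.1, 0.9]])
b2 = np.array([[[0.8, 0.2], [0.4, 0.6]],
               [[0.7, 0.3], [0.1, 0.9]]])
uni = np.array([[0.9, 0.1], [0.2, 0.8]])
seq = viterbi_bigram_synthesis(a, b2, uni, 4, np.array([0.9, 0.1]))
print(seq)
```

The bigram term penalizes implausible symbol-to-symbol jumps, which smooths the generated F0 contour (cf. slide 16).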

  14. Accent Type Modeling Using HMM

  15. Phrase Boundary Level Modeling Using HMM
 Bound. Level | Pause (Y/N) | J-ToBI B.I.
 1            | Y           | 3
 2            | N           | 3
 3            | N           | 2

  16. The Effect of Bigrams
 (Figure: F0 contours for phrases PH1_0, PH1_1, and PH1_2, comparing the original model with the bigram model: PH1_x.original vs. PH1_x.bigram)

  17. Comments • We presented a novel approach to intonation modeling for TTS synthesis based on discrete mora-synchronous HMMs. • From now on, more features should be included in the HMM modeling (phonetic context, part-of-speech, etc.), and the approach should be compared with rule-based methods. • Training data scarcity is a major problem to overcome (by feature clustering, an F0 contour generation model, etc.)

  18. Hidden Markov Models (HMM) A Hidden Markov Model (HMM) is a finite state automaton in which both state transitions and outputs are stochastic. At each time period it changes to a new state and generates a new vector according to that state's output distribution. Output symbols: 1, 2, ..., K
 (Diagram: a 4-state model with transition probabilities a11, a12, a13, a22, a23, a33, a34, a44 and output distributions b(1|i)~b(K|i) for each state i)
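The generative behavior described above can be sketched in a few lines: at each step the model emits a symbol from the current state's output distribution, then moves according to the transition row. The 2-state topology and probabilities below are toy values, not the paper's models.

```python
import random

def sample_hmm(a, b, start, steps, rng):
    """a[i][j]: transition probs; b[i][k]: output probs for symbols 1..K."""
    seq, state = [], start
    for _ in range(steps):
        # Emit a symbol from the current state's output distribution.
        symbols = list(range(1, len(b[state]) + 1))
        seq.append(rng.choices(symbols, weights=b[state])[0])
        # Move to the next state according to the transition row.
        state = rng.choices(range(len(a)), weights=a[state])[0]
    return seq

rng = random.Random(0)
a = [[0.7, 0.3], [0.0, 1.0]]   # toy 2-state left-to-right topology
b = [[0.9, 0.1], [0.2, 0.8]]   # K = 2 output symbols
seq = sample_hmm(a, b, 0, 5, rng)
print(seq)
```

The model is "hidden" because only the symbol sequence is observed; the state sequence behind it is not.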

  19. Step 1: Building the Database • Use the ATR continuous speech database (500 sentences, speaker MHT) • Segment into mora units • Attach mora labels • Extract F0 patterns • Cluster them by the LBG method • Assign cluster classes to the entire database

  20. Introducing Bigrams
 for t = 2, 3, ..., T
   for i_t = 1, 2, ..., S
     D_min(t, i_t) = min_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + min_k { -log b(y_k(t) | y(t-1), i_t) } }
     ψ(t, i_t) = argmin_{i_{t-1}} { D_min(t-1, i_{t-1}) + [-log a(i_t | i_{t-1})] + min_k { -log b(y_k(t) | y(t-1), i_t) } }
   next i_t
 next t
 k = 1, ..., K (dimension of y)

  21. Discussion and Future Work • Training data is scarce • Further work is needed before integration into a TTS system: • Take other linguistic information into account (phonemes, number of morae, part-of-speech, etc.) • Devise ways to overcome the lack of data (clustering, etc.) • Study how to connect the models
