
HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS Anna PŘIBILOVÁ



Presentation Transcript


  1. HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS
     Anna PŘIBILOVÁ, Department of Radioelectronics, Slovak University of Technology, Ilkovičova 3, SK-812 19 Bratislava, Slovakia, E-mail: Anna.Pribilova@stuba.sk
     Jiří PŘIBIL, Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic, Chaberská 57, CZ-182 51 Praha 8, Czech Republic, E-mail: Jiri.Pribil@savba.sk
     • Introduction
     • Harmonic speech model with AR parameterization
     • Spectral modifications for emotional synthesis
     • Prosodic modifications for emotional synthesis
     • Listening tests results
     • Conclusion

  2. Harmonic speech model with AR parameterization: voicing transition frequency

  3. Voicing transition frequency

  4. Determination of model parameters: spectral flatness measure
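The slide names the spectral flatness measure but the transcript gives no formula. The sketch below assumes the common textbook definition (geometric mean of the power spectrum divided by its arithmetic mean); the function name `spectral_flatness` and the input format are illustrative assumptions, and the slides do not say how the measure is mapped to the voicing transition frequency.

```python
import math


def spectral_flatness(power_spectrum):
    """Return geometric mean / arithmetic mean of a power spectrum.

    Assumed definition, not taken from the slides. Values close to 1
    indicate a flat, noise-like spectrum; values close to 0 indicate a
    peaky, harmonic spectrum.
    """
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power spectrum must be non-empty and strictly positive")
    n = len(power_spectrum)
    # Compute the geometric mean in the log domain to avoid overflow/underflow.
    log_mean = sum(math.log(p) for p in power_spectrum) / n
    geometric_mean = math.exp(log_mean)
    arithmetic_mean = sum(power_spectrum) / n
    return geometric_mean / arithmetic_mean
```

A flat spectrum such as `[1.0] * 8` gives a value of 1, while a spectrum dominated by a single strong bin gives a value near 0.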

  5. Emotional influence on speech formants
     – pleasant emotions: faucal and pharyngeal expansion, relaxation of tract walls, mouth corners retracted upward (F1 falling, resonances raised)
     – unpleasant emotions: faucal and pharyngeal constriction, tensing of vocal tract walls, mouth corners retracted downward (F1 rising, F2 and F3 falling)
     Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256
     Formant areas (female = male + 20 %):
       Formant   Male                   Female
       F1        250 Hz to 700 Hz       300 Hz to 840 Hz
       F2        700 Hz to 2000 Hz      840 Hz to 2400 Hz
       F3        2000 Hz to 3200 Hz     2400 Hz to 3840 Hz
       F4        3200 Hz to 4000 Hz     3840 Hz to 4800 Hz
     Fant, G.: Speech Acoustics and Phonetics. Kluwer Academic Publishers, Dordrecht (2004)
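The female formant areas on this slide are the male boundaries raised by 20 %. A minimal sketch of that arithmetic follows; the names `MALE_FORMANT_AREAS_HZ` and `scale_areas` are illustrative and do not come from the presentation.

```python
# Male formant areas in Hz, as listed on the slide (after Fant, 2004).
MALE_FORMANT_AREAS_HZ = {
    "F1": (250, 700),
    "F2": (700, 2000),
    "F3": (2000, 3200),
    "F4": (3200, 4000),
}


def scale_areas(areas_hz, percent_increase):
    """Raise every band boundary by a whole-number percentage.

    Integer arithmetic keeps the results exact for the slide's values.
    """
    factor = 100 + percent_increase
    return {
        name: (low * factor // 100, high * factor // 100)
        for name, (low, high) in areas_hz.items()
    }


female_formant_areas_hz = scale_areas(MALE_FORMANT_AREAS_HZ, 20)
```

With a 20 % increase this reproduces the female boundaries quoted on the slide (300, 840, 2400, 3840 and 4800 Hz).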

  6. Spectral modifications for emotional synthesis: frequency scale transformation

  7. Frequency scale transformation
     [Figure: two plots of g [-] against f [kHz], with F1,2 and fs/4 marked on the frequency axis]
     – F1 (< F1,2): increased (decreased)
     – F2, F3, F4 (> F1,2): decreased (increased)

  8. Formant ratio between emotional and neutral speech
     (column headings are not preserved in the transcript; values are listed in their original order)
     First table:
       joyous    +5 %       -30 %
       angry     +35 %      -15 %
       sad       +10 %      -10 %
     Second table:
       joyous    +5.89 %    +3.34 %    -10.18 %   -0.36 %
       angry     +12.89 %   -11.51 %   -13.77 %   -9.88 %
       sad       +4.32 %    -6.17 %    -10.09 %   -9.24 %

  9. Prosody of emotional speech
     Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256
     Our choice of emotional-to-neutral ratios

  10. Linear trend of F0 at the end of sentences (figure panels: joy, anger)
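The slide refers to a linear trend of F0 at the end of sentences, but the transcript does not say how the trend is estimated. One plausible way, assumed here and not confirmed by the source, is an ordinary least-squares line through the final F0 samples. The function name `linear_trend` and the sample values in the usage note are hypothetical.

```python
def linear_trend(times_s, f0_hz):
    """Fit f0 = slope * t + intercept by ordinary least squares.

    Assumption: times are in seconds and F0 values in Hz, taken from the
    voiced frames at the end of a sentence. Returns (slope, intercept),
    with the slope in Hz per second.
    """
    n = len(times_s)
    if n != len(f0_hz) or n < 2:
        raise ValueError("need at least two (time, F0) pairs of equal length")
    mean_t = sum(times_s) / n
    mean_f = sum(f0_hz) / n
    sxx = sum((t - mean_t) ** 2 for t in times_s)
    if sxx == 0:
        raise ValueError("time values must not all be identical")
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in zip(times_s, f0_hz))
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    return slope, intercept
```

For example, hypothetical F0 values of 220, 215, 210 and 205 Hz at 0.0, 0.1, 0.2 and 0.3 s describe a falling trend; a negative slope corresponds to a falling contour and a positive slope to a rising one.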

  11. Listening tests
     “Determination of emotion type”
     – 10 evaluation sets selected randomly from the testing corpus
     – 60 short sentences (1 s to 3.5 s)
     – from Czech stories
     – female professional actors
     – 4 possibilities: “joy”, “anger”, “sadness”, “other”
     20 listeners (16 Czechs and 4 Slovaks, 6 women and 14 men)
     http://www.lef.um.savba.sk/Scripts/itstposl2.dll
     MS ISAPI/NSAPI DLL script
     – runs on server PC
     – communicates with user via HTTP protocol

  12. Listening tests
     http://www.lef.um.savba.sk/Scripts/itstposl2.dll
     MS ISAPI/NSAPI DLL script
     – runs on server PC
     – communicates with user via HTTP protocol

  13. Listening tests results
     – Confusion matrix
     – Successful determination of emotions (summed for all emotions)
     * “Vše co potřeboval.” (“All he needed.”)
     ** “Máš ho mít.” (“You ought to have it.”)
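The confusion matrix itself is not preserved in the transcript. As a sketch of how such a matrix could be tallied from listener answers, the code below uses the four response options named in the listening test description. The function name `confusion_matrix` and the example responses in the usage note are hypothetical and are not the study's data.

```python
from collections import Counter

# Synthesized emotions and the response options offered to listeners.
EMOTIONS = ("joy", "anger", "sadness")
CHOICES = EMOTIONS + ("other",)


def confusion_matrix(responses):
    """Tally (intended, chosen) pairs into row-wise percentages.

    `responses` is an iterable of (intended_emotion, listener_choice)
    tuples. Each row of the result sums to 100 % unless no responses
    were recorded for that emotion, in which case it is all zeros.
    """
    counts = Counter(responses)
    matrix = {}
    for intended in EMOTIONS:
        row_total = sum(counts[(intended, choice)] for choice in CHOICES)
        matrix[intended] = {
            choice: (100.0 * counts[(intended, choice)] / row_total
                     if row_total else 0.0)
            for choice in CHOICES
        }
    return matrix
```

Calling `confusion_matrix([("joy", "joy"), ("joy", "other")])` on these two made-up answers yields a "joy" row with 50 % under "joy" and 50 % under "other"; the diagonal entries are the rates of successful determination.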

  14. Conclusion
     Female voice emotional conversion:
     – harmonic speech model with AR parameterization
     Spectral modifications:
     – spectral envelope: formant shift
     – spectral flatness => voicing transition frequency
     Prosodic modifications:
     – energy, duration, F0 mean, range, linear trend at the end of sentences
     Listening tests:
     – best synthesized: sadness
     – worst synthesized: joy
     Next research:
     – inclusion of microprosodic features in emotional voice conversion
     – modifications of F0 linear trend at the beginning of sentences
