HARMONIC MODEL FOR FEMALE VOICE EMOTIONAL SYNTHESIS Anna PŘIBILOVÁ

Anna PŘIBILOVÁ
Department of Radioelectronics, Slovak University of Technology
Ilkovičova 3, SK-812 19 Bratislava, Slovakia
E-mail: Anna.Pribilova@stuba.sk

Jiří PŘIBIL
Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic
Chaberská 57, CZ-182 51 Praha 8, Czech Republic
E-mail: Jiri.Pribil@savba.sk

• Introduction
• Harmonic speech model with AR parameterization
• Spectral modifications for emotional synthesis
• Prosodic modifications for emotional synthesis
• Listening tests results
• Conclusion

Harmonic speech model with AR parameterization
– voicing transition frequency

Voicing transition frequency

Determination of model parameters
– spectral flatness measure
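The spectral flatness measure named on this slide is commonly defined as the ratio of the geometric mean to the arithmetic mean of the power spectrum. The sketch below shows only that standard definition. The function name and the input format (a list of strictly positive power values) are assumptions, and the way the authors derive the voicing transition frequency from this measure is not given on the slides, so it is not shown.

```python
import math


def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of a power spectrum.

    Assumption: every bin is strictly positive, so the logarithm is defined.
    """
    n = len(power_spectrum)
    # The geometric mean is computed in the log domain to avoid overflow.
    log_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / arithmetic_mean


# A flat (noise-like) spectrum gives a value close to 1,
# a spectrum dominated by one peak gives a value well below 1.
flat = spectral_flatness([2.0] * 8)
peaky = spectral_flatness([100.0, 1.0, 1.0, 1.0])
print(flat, peaky)
```

Values near 1 indicate a noise-like frame and values near 0 a strongly harmonic one, which is why the measure is a natural input for deciding where voicing ends.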

Emotional influence on speech formants

– pleasant emotions: faucal and pharyngeal expansion, relaxation of tract walls, mouth corners retracted upward (F1 falling, resonances raised)
– unpleasant emotions: faucal and pharyngeal constriction, tensing of vocal tract walls, mouth corners retracted downward (F1 rising, F2 and F3 falling)

Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256

Formant areas   Male                  Female (+20 %)
F1              250 Hz to 700 Hz      300 Hz to 840 Hz
F2              700 Hz to 2000 Hz     840 Hz to 2400 Hz
F3              2000 Hz to 3200 Hz    2400 Hz to 3840 Hz
F4              3200 Hz to 4000 Hz    3840 Hz to 4800 Hz

Fant, G.: Speech Acoustics and Phonetics. Kluwer Academic Publishers, Dordrecht (2004)
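The female formant areas follow from the male ones by raising every bound by 20 %. A minimal sketch of that arithmetic is given below. The dictionary layout and the function name are illustrative choices, not part of the original slides.

```python
# Male formant areas in Hz (lower bound, upper bound), as listed on the slide.
MALE_FORMANT_AREAS_HZ = {
    "F1": (250, 700),
    "F2": (700, 2000),
    "F3": (2000, 3200),
    "F4": (3200, 4000),
}


def scale_formant_areas(areas_hz, percent):
    """Raise every bound by `percent` using integer arithmetic.

    Floor division is exact for the values used here; other inputs
    would be rounded down to whole hertz.
    """
    factor = 100 + percent
    return {
        name: (low * factor // 100, high * factor // 100)
        for name, (low, high) in areas_hz.items()
    }


FEMALE_FORMANT_AREAS_HZ = scale_formant_areas(MALE_FORMANT_AREAS_HZ, 20)

for name, (low, high) in FEMALE_FORMANT_AREAS_HZ.items():
    print(f"{name}: {low} Hz to {high} Hz")
```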

Spectral modifications for emotional synthesis
– frequency scale transformation

Frequency scale transformation

– F1 ( < F1,2 ): increased (decreased)
– F2, F3, F4 ( > F1,2 ): decreased (increased)

[Figure: g [-] plotted against f [kHz]; F1,2 and fs/4 are marked on the frequency axis]
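The idea of moving F1 in one direction and F2 to F4 in the other can be illustrated with a deliberately simplified sketch. Everything in it is an assumption: the step-shaped ratio function is not the authors' curve g (whose exact shape is not recoverable from the slide), the ratios 1.10 and 0.90 are illustrative values only, and the 840 Hz boundary is taken from the female F1/F2 border listed among the formant areas.

```python
def shift_ratio(f_hz, boundary_hz, low_ratio, high_ratio):
    """Hypothetical step-shaped ratio: one value below the boundary,
    another at or above it. Not the authors' transformation curve."""
    return low_ratio if f_hz < boundary_hz else high_ratio


def transform_frequency(f_hz, boundary_hz=840.0, low_ratio=1.10, high_ratio=0.90):
    """Map a frequency on the neutral scale to the modified scale.

    The default ratios are illustrative assumptions, not measured values.
    """
    return f_hz * shift_ratio(f_hz, boundary_hz, low_ratio, high_ratio)


# A frequency in the F1 area moves up, one in the F2 area moves down.
print(transform_frequency(500.0))
print(transform_frequency(2000.0))
```

A step function is discontinuous at the boundary, so a practical transformation would need a smooth transition between the two regions.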

Formant ratio between emotional and neutral speech

joyous    + 5 %     - 30 %
angry     + 35 %    - 15 %
sad       + 10 %    - 10 %

joyous    + 5.89 %     + 3.34 %     - 10.18 %    - 0.36 %
angry     + 12.89 %    - 11.51 %    - 13.77 %    - 9.88 %
sad       + 4.32 %     - 6.17 %     - 10.09 %    - 9.24 %

Prosody of emotional speech

Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256

OUR CHOICE OF EMOTIONAL-TO-NEUTRAL RATIOS

Linear trend of F0 at the end of sentences

[Figure panels: JOY, ANGER]

Listening tests

“Determination of emotion type”
– 10 evaluation sets selected randomly from the testing corpus
– 60 short sentences (1 s to 3.5 s)
– from the Czech stories
– female professional actors
– 4 possibilities: “joy”, “anger”, “sadness”, “other”

20 listeners (16 Czechs and 4 Slovaks, 6 women and 14 men)

http://www.lef.um.savba.sk/Scripts/itstposl2.dll
MS ISAPI/NSAPI DLL script
– runs on server PC
– communicates with user via HTTP protocol


Listening tests results

Confusion matrix
Successful determination of emotions (summed for all emotions)

* “Vše co potřeboval.” (“All he needed.”)
** “Máš ho mít.” (“You ought to have it.”)

Conclusion

Female voice emotional conversion:
– harmonic speech model with AR parameterization

Spectral modifications:
– spectral envelope: formant shift
– spectral flatness => voicing transition frequency

Prosodic modifications:
– energy, duration, F0 mean, range, linear trend at the end of sentences

Listening tests:
– best synthesized: sadness
– worst synthesized: joy

Next research:
– inclusion of microprosodic features in emotional voice conversion
– modifications of F0 linear trend at the beginning of sentences
