


HARMONIC MODEL

FOR FEMALE VOICE EMOTIONAL SYNTHESIS

Anna PŘIBILOVÁ

Department of Radioelectronics, Slovak University of Technology

Ilkovičova 3, SK-812 19 Bratislava, Slovakia, E-mail: [email protected]

Jiří PŘIBIL

Institute of Photonics and Electronics, Academy of Sciences of the Czech Republic

Chaberská 57, CZ-182 51 Praha 8, Czech Republic, E-mail: [email protected]

• Introduction

• Harmonic speech model with AR parameterization

• Spectral modifications for emotional synthesis

• Prosodic modifications for emotional synthesis

• Listening tests results

• Conclusion




Determination of model parameters

spectral flatness measure
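As a rough illustration of the measure named on this slide, the sketch below uses the standard definition of spectral flatness: the geometric mean of the power spectrum divided by its arithmetic mean. The function name and the example spectra are illustrative assumptions. The transcript does not show how the authors map this value to the voicing transition frequency, so that step is not attempted here.

```python
import math

def spectral_flatness(power_spectrum):
    """Geometric mean divided by arithmetic mean of the power spectrum bins.

    Values near 1 indicate a flat (noise-like) spectrum,
    values near 0 indicate a peaky (harmonic) spectrum.
    """
    if not power_spectrum or any(p <= 0 for p in power_spectrum):
        raise ValueError("power spectrum must be non-empty and strictly positive")
    n = len(power_spectrum)
    # Average the logarithms to get the geometric mean without overflow.
    log_mean = sum(math.log(p) for p in power_spectrum) / n
    arithmetic_mean = sum(power_spectrum) / n
    return math.exp(log_mean) / arithmetic_mean

# Illustrative spectra (assumed values, not measured data).
flat = spectral_flatness([1.0, 1.0, 1.0, 1.0])
peaky = spectral_flatness([100.0, 0.01, 0.01, 0.01])
```

A flat spectrum gives a value of 1, while a spectrum dominated by a single strong bin gives a value close to 0.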


Emotional influence on speech formants

pleasant emotions – faucal and pharyngeal expansion, relaxation of tract walls, mouth corners retracted upward (F1 falling, resonances raised)

unpleasant emotions – faucal and pharyngeal constriction, tensing of vocal tract walls, mouth corners retracted downward (F1 rising, F2 and F3 falling)

Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256

Formant   Male formant areas     Female formant areas (+20%)

F1        250 Hz to 700 Hz       300 Hz to 840 Hz

F2        700 Hz to 2000 Hz      840 Hz to 2400 Hz

F3        2000 Hz to 3200 Hz     2400 Hz to 3840 Hz

F4        3200 Hz to 4000 Hz     3840 Hz to 4800 Hz


Fant, G.: Speech Acoustics and Phonetics. Kluwer Academic Publishers, Dordrecht (2004)
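The female formant areas follow from the male ones by the +20% shift stated on the slide. A minimal sketch of that arithmetic is given below; the dictionary layout and the function name are illustrative assumptions.

```python
# Male formant areas in Hz, as listed on the slide.
MALE_FORMANT_AREAS_HZ = {
    "F1": (250, 700),
    "F2": (700, 2000),
    "F3": (2000, 3200),
    "F4": (3200, 4000),
}

def scale_formant_areas(areas, percent):
    """Shift both edges of every formant area by the given percentage."""
    return {
        name: (round(low * (100 + percent) / 100),
               round(high * (100 + percent) / 100))
        for name, (low, high) in areas.items()
    }

female_areas = scale_formant_areas(MALE_FORMANT_AREAS_HZ, 20)
```

With a 20% shift, the F1/F2 boundary moves from 700 Hz to 840 Hz.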


Spectral modifications for emotional synthesis

frequency scale transformation


Frequency scale transformation

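[Figure: two plots of the transformation function g [-] against frequency f [kHz], with the frequency axis marked at F1,2 and fs/4. Formants below F1,2 (F1) are increased (decreased); formants above F1,2 (F2, F3, F4) are decreased (increased).]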


Formant ratio between emotional and neutral speech

joyous    + 5 %     - 30 %

angry     + 35 %    - 15 %

sad       + 10 %    - 10 %

joyous    + 5.89 %     + 3.34 %     - 10.18 %    - 0.36 %

angry     + 12.89 %    - 11.51 %    - 13.77 %    - 9.88 %

sad       + 4.32 %     - 6.17 %     - 10.09 %    - 9.24 %
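A pair of percentage shifts could be applied to a set of formant frequencies roughly as sketched below. This is only an illustration under stated assumptions: it assumes the first value of each pair applies to formants below the F1,2 boundary and the second to formants above it, which the transcript does not confirm. It also assumes the boundary is the 840 Hz edge between the female F1 and F2 areas. The neutral formant values and the function name are hypothetical.

```python
def shift_formants(formants_hz, boundary_hz, low_percent, high_percent):
    """Shift formants below the boundary by low_percent and the rest by high_percent."""
    shifted = []
    for f in formants_hz:
        percent = low_percent if f < boundary_hz else high_percent
        shifted.append(f * (100 + percent) / 100)
    return shifted

# Hypothetical neutral formants in Hz (not taken from the presentation).
neutral = [500.0, 1500.0, 2800.0, 4000.0]

# Assumed reading of the "sad" pair: +10 % below the boundary, -10 % above it.
sad = shift_formants(neutral, boundary_hz=840.0, low_percent=10, high_percent=-10)
```

The sketch moves discrete formant frequencies only; warping a whole spectral envelope would need a continuous, monotonic transformation function, which is not reconstructed here.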


Prosody of emotional speech

Scherer, K. R.: Vocal Communication of Emotion: A Review of Research Paradigms. Speech Communication, Vol. 40 (2003) 227-256

OUR CHOICE OF EMOTIONAL-TO-NEUTRAL RATIOS
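One common way to apply emotional-to-neutral ratios to the F0 mean and F0 range of a contour is sketched below. It is an assumption about how such ratios might be used, not the authors' procedure, and the ratio values are hypothetical placeholders because the chosen ratios do not appear in this transcript. Unvoiced frames are assumed to be marked with an F0 of 0.

```python
def modify_f0(f0_hz, mean_ratio, range_ratio):
    """Scale the F0 mean and the spread around it; frames with F0 == 0 stay unvoiced."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return list(f0_hz)
    mean = sum(voiced) / len(voiced)
    new_mean = mean * mean_ratio
    return [
        new_mean + (f - mean) * range_ratio if f > 0 else 0.0
        for f in f0_hz
    ]

# Hypothetical neutral contour in Hz and hypothetical ratios.
neutral_f0 = [200.0, 0.0, 220.0, 180.0]
emotional_f0 = modify_f0(neutral_f0, mean_ratio=1.5, range_ratio=2.0)
```

Scaling the deviation from the mean separately from the mean itself lets the two ratios be set independently.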



Listening tests

“Determination of emotion type”

– 10 evaluation sets selected randomly from the testing corpus

– 60 short sentences (1 s to 3.5 s)

– from Czech stories

– female professional actors

– 4 possibilities: “joy”, “anger”, “sadness”, “other”

20 listeners (16 Czechs and 4 Slovaks, 6 women and 14 men)

http://www.lef.um.savba.sk/Scripts/itstposl2.dll


MS ISAPI/NSAPI DLL script

- runs on server PC

- communicates with user via HTTP protocol




Listening tests results

Confusion matrix

Successful determination of emotions (summed for all emotions)

* “Vše co potřeboval.” (“All he needed.”)

** “Máš ho mít.” (“You ought to have it.”)
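For illustration, a confusion matrix and an overall success rate can be tallied from listener answers as in the sketch below. The four answer choices come from the listening test description; the data layout, function names, and sample answers are hypothetical and do not reproduce the reported results.

```python
from collections import Counter

CHOICES = ("joy", "anger", "sadness", "other")
SYNTHESIZED = ("joy", "anger", "sadness")

def confusion_matrix(answers):
    """Count answers given as (synthesized emotion, listener's choice) pairs."""
    counts = Counter(answers)
    return {
        intended: {chosen: counts[(intended, chosen)] for chosen in CHOICES}
        for intended in SYNTHESIZED
    }

def success_rate(matrix):
    """Fraction of answers where the listener chose the synthesized emotion."""
    correct = sum(matrix[emotion][emotion] for emotion in matrix)
    total = sum(sum(row.values()) for row in matrix.values())
    return correct / total if total else 0.0

# Hypothetical answers, not data from the listening tests.
answers = [("joy", "joy"), ("joy", "other"), ("anger", "anger"), ("sadness", "sadness")]
matrix = confusion_matrix(answers)
rate = success_rate(matrix)
```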


Conclusion

Female voice emotional conversion:

– harmonic speech model with AR parameterization

Spectral modifications:

– spectral envelope: formant shift

– spectral flatness => voicing transition frequency

Prosodic modifications:

– energy, duration, F0 mean, range, linear trend at the end of sentences

Listening tests:

– best synthesized: sadness

– worst synthesized: joy

Next research:

– inclusion of microprosodic features in emotional voice conversion

– modifications of F0 linear trend at the beginning of sentences

