Cues to Emotion: Language


Suzanne Yuen

Monday Oct 5, 2009

COMS 6998

Overview
  • Two-Stream Emotion Recognition for Call Center Monitoring
  • Voice Quality and f0 Cues for Affect Expression: Implications for Synthesis
Two-Stream Emotion Recognition for Call Center Monitoring
  • Background: To aid supervisors in the evaluation of agents at call centers*
  • Objective: To present a two-stream processing technique to detect strong emotion
  • Previous Work:
    • Fernandez categorized affect into four main components: intonation, loudness, rhythm, and voice quality
    • Yang studied feature selection methods in text categorization and suggested that information gain should be used
    • Petrushin and Yacoub examined agitation and calm states in people-machine interaction

*A typical medium-sized call center receives about 100,000 calls per day
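
The information-gain criterion attributed to Yang in the previous-work list can be illustrated with a small sketch. It scores a term by how much knowing whether the term is present reduces the entropy of the class labels. The function names, the whitespace tokenization, and the toy utterances are assumptions made for illustration; they are not taken from the presentation or the cited papers.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(documents, labels, term):
    """Entropy reduction from splitting documents on presence of `term`."""
    with_term, without_term = [], []
    for doc, label in zip(documents, labels):
        # Assumption: lowercase whitespace tokenization is good enough here.
        if term in doc.lower().split():
            with_term.append(label)
        else:
            without_term.append(label)
    n = len(labels)
    conditional = (len(with_term) / n) * entropy(with_term) \
        + (len(without_term) / n) * entropy(without_term)
    return entropy(labels) - conditional


# Hypothetical toy data, not from the paper.
docs = ["this is useless", "thanks a lot", "useless service", "thanks again"]
labels = ["anger", "happy", "anger", "happy"]
scores = {t: information_gain(docs, labels, t) for t in ("useless", "is")}
```

In this toy set, a term that separates the two classes perfectly receives a higher score than a term that appears in only one utterance, which is the ranking behaviour a feature selector relies on.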

Two-Stream Recognition
  • Semantic Stream
    • Performed speech-to-text conversion
    • Text classification algorithms identified phrases such as “pleasure,” “thanks,” “useless,” and “disgusting”
  • Acoustic Stream
    • Extracted features based on pitch and energy
  • Trained on 900 calls (~60 hours of speech)
  • Vocabulary of more than 10,000 words
  • TF-IDF scheme: Term Frequency – Inverse Document Frequency
  • Method:
    • Two streams analyzed separately:
      • speech utterance/acoustic features
      • spoken text/semantics/speech recognition of conversation
    • Confidence levels of two streams combined
    • Examined 3 emotions
      • Neutral
      • Hot-anger
      • Happy
  • Tested two data sets:
    • LDC data
    • 20 real-world call-center calls
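
The slide states only that the confidence levels of the two streams are combined; it does not give the rule. As a minimal sketch, assuming each stream outputs one confidence per emotion and that the combination is a weighted average (an assumption, not the authors' stated method), the fusion step could look like this. `combine_streams` and `acoustic_weight` are hypothetical names.

```python
EMOTIONS = ("neutral", "hot-anger", "happy")


def combine_streams(acoustic, semantic, acoustic_weight=0.5):
    """Fuse per-emotion confidences from the two streams.

    Assumption: a simple weighted average; the paper's actual
    combination rule is not given in the slides.
    """
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must be between 0 and 1")
    combined = {
        emotion: acoustic_weight * acoustic[emotion]
        + (1.0 - acoustic_weight) * semantic[emotion]
        for emotion in EMOTIONS
    }
    # The predicted emotion is the one with the highest fused confidence.
    best = max(combined, key=combined.get)
    return best, combined


# Hypothetical confidences for one utterance.
acoustic_conf = {"neutral": 0.5, "hot-anger": 0.3, "happy": 0.2}
semantic_conf = {"neutral": 0.2, "hot-anger": 0.7, "happy": 0.1}
label, fused = combine_streams(acoustic_conf, semantic_conf)
```

Here the acoustic stream alone would pick "neutral", while the fused scores favour "hot-anger", which shows how a confident semantic stream can override a weak acoustic decision.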
Two-Stream: Conclusion
  • Table 2 suggested that two-stream analysis is more accurate than acoustic or semantic alone
  • Recognition accuracy on LDC data was significantly higher than on real-world data
  • Neutral emotion was recognized less accurately
  • Combining the two streams improved identification of “happy” and “anger” emotions by about 20%
  • Low acoustic-stream accuracy may be attributed to the length of sentences in real-world data: people typically do not exhibit markedly different emotions within long sentences
  • Gupta analyzed three emotions (happy, neutral, hot-anger). Why break it down into these categories? What are the implications? Can this technique be applied to a wider range of emotions, or to other applications?
  • Speech-to-text may not transcribe the complete conversation. Would further examination greatly improve results? What are the pros and cons?
  • The pitch range was 50–400 Hz, so the research may not be applicable outside this range. Do you think it is necessary to examine other frequencies?
  • In this paper, the TF-IDF (Term Frequency – Inverse Document Frequency) technique is used to classify utterances. Accuracy for acoustics only is about 55%. Previous research suggests that alternative techniques may be better. Would implementing them yield better results? What are the pros and cons of using the TF-IDF technique?
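
For readers weighing the TF-IDF question above, a minimal sketch of the weighting may help. It uses one common variant (relative term frequency times the natural log of inverse document frequency); the slides do not say which variant the paper used, and the function name, tokenization, and sample transcripts are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document.

    Assumptions: whitespace tokenization, tf = count / document length,
    idf = ln(N / document frequency).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical transcripts, not from the paper's data.
transcripts = [
    "thanks it was a pleasure",
    "this is useless",
    "thanks this is fine",
]
weights = tf_idf(transcripts)
```

Terms that occur in few transcripts, such as "useless" here, receive higher weights than terms shared across transcripts, and a term present in every transcript would receive a weight of zero. That is one of the trade-offs the discussion question points at: the scheme is simple, but it ignores word order and context.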
Voice Quality and f0 Cues for Affect Expression: Implications for Synthesis
  • Previous work:
    • 1995: Mozziconacci suggested that VQ combined with f0 could create affect
    • 2002: Gobl suggested that synthesized stimuli with VQ can add affective coloring. The study suggested that “VQ + f0” stimuli are more affective than “f0 only”
    • 2003: Gobl tested VQ with a large f0 range but did not examine the contribution of affect-related f0 contours
  • Objective: To examine the effects of VQ and f0 on affect expression
Voice Quality and f0 Cues for Affect Expression: Implications for Synthesis
  • 3 series of stimuli based on the Swedish utterance “ja adjö”:
    • Stimuli exemplifying VQ
    • Stimuli with modal voice quality with different affect-related f0 contours
    • Stimuli combining both
  • Tested parameters exemplifying 5 voice qualities (VQ):
    • Modal voice
    • Breathy voice
    • Whispery voice
    • Lax-creaky voice
    • Tense voice
  • 15 synthesized test stimuli (see Table 1)
What is Voice Quality? Phonation Gestures
  • Derived from a variety of laryngeal and supralaryngeal features
  • Adductive tension: the interarytenoid muscles adduct the arytenoid cartilages
  • Medial compression: adductive force on the vocal processes, adjusting the ligamental glottis
  • Longitudinal tension: tension of the vocal folds
Tense Voice
  • Very strong tension of vocal folds, very high tension in vocal tract
Whispery Voice
  • Very low adductive tension
  • Medial compression moderately high
  • Longitudinal tension moderately high
  • Little or no vocal fold vibration
  • Turbulence generated by friction of air in and above larynx
Creaky Voice
  • Vocal fold vibration at low frequency, irregular
  • Low tension (only ligamental part of glottis vibrates)
  • The vocal folds strongly adducted
  • Longitudinal tension weak
  • Moderately high medial compression
Breathy Voice
  • Tension low
    • Minimal adductive tension
    • Weak medial compression
  • Medium longitudinal vocal fold tension
  • Vocal folds do not come together completely, leading to frication
Modal Voice
  • “Neutral” mode
  • Muscular adjustments moderate
  • Vibration of vocal folds periodic, full closing of glottis, no audible friction
  • Frequency of vibration and loudness in low to mid range for conversational speech
Voice Quality and f0 Cues for Affect Expression: Implications for Synthesis
  • Six sub-tests with 20 native speakers of Hiberno-English.
  • Rated on 12 different affective attributes:
    • Sad – happy
    • Intimate – formal
    • Relaxed – stressed
    • Bored – interested
    • Apologetic – indignant
    • Fearless – scared
  • Participants were asked to mark their responses on a scale



[Figure: rating scale, with one point labeled “No affective load”]

Voice Quality and f0 Test: Conclusion
  • Categorized results into 4 groups. There is no simple one-to-one mapping between voice quality and affect
  • “Happy” was the most difficult to synthesize
  • Suggested that, in addition to f0, VQ should be used to synthesize affectively colored speech. VQ appears to be crucial for expressive synthesis
Voice Quality and f0 Test: Discussion
  • If the scale runs from 1 to 7, then 3.5 should be “neutral”; however, most ratings are less than 2. Do the conclusions (see Fig 2) seem strong?
  • In terms of VQ and f0, the groupings in Fig 2 seem to suggest that certain affects are closely related. What are the implications of this? For example, are happy and indignant affects closer than relaxed or formal? Do you agree?
  • Do you consider an intimate voice more “breathy” or “whispery”? Does your intuition agree with the paper?
  • Yanushevskaya found that VQ accounts for the highest affect ratings overall. How can the range of voice quality be compared with frequency? Do you think they are comparable? Is there a different way to describe these qualities?