

Emotional Speech

CS 4706

Julia Hirschberg (thanks to Jackson Liscombe and Lauren Wilcox for some slides)

Outline
  • Why study emotional speech?
  • Why is modeling emotional speech so difficult?
  • Production and perception studies
  • Voice Quality features: the holy grail

Why study emotional speech?
  • Recognition
    • Customer-care centers
    • Tutoring systems
    • Automated agents (Wildfire)
  • Generation
    • Characteristics of ‘emotional speech’ little understood, so hard to produce: …a voice that sounds friendly, sympathetic, authoritative….
    • TTS systems
    • Games

Emotion in Spoken Dialogue Systems
  • Batliner, Huber, Fischer, Spilker, Nöth (2003)
    • Verbmobil (Wizard of Oz scenarios)
  • Ang, Dhillon, Krupski, Shriberg, Stolcke (2002)
    • DARPA Communicator
  • Liscombe, Riccardi, Hakkani-Tür (2005)
    • “How May I Help You?” call center
  • Lee, Narayanan (2004)
    • Speechworks call-center
  • Liscombe, Hirschberg, Venditti (2005)
    • ITSpoke Tutoring System (physics)

Why is emotional speech so hard to model?
  • Speakers’ and listeners’ colloquial definitions of emotions ≠ technical definitions
  • Utterances may convey multiple emotions simultaneously
  • Result:
    • Human consensus low
    • Hard to get reliable training data

Spontaneous Corpora
  • Unconstrained
    • [Campbell, 2003] [Roach, 2000]
    • [Cowie et al., 2001]
  • Call centers
    • [Vidrascu & Devillers, 2005] [Ang et al., 2002]
    • [Litman and Forbes-Riley, 2004] [Batliner et al., 2003]
    • [Lee & Narayanan, 2005]
  • Meetings
    • [Wrede and Shriberg, 2003]

Acted Corpora

  • happy
  • sad
  • angry
  • confident
  • frustrated
  • friendly
  • interested
  • anxious
  • bored
  • encouraging

LDC Emotional Prosody and Transcripts corpus
  • Semantically neutral (dates and numbers)
  • 8 actors
  • 15 emotions

Are Emotions Mutually Exclusive?
  • User study to classify tokens from LDC Emotional Prosody corpus
  • 10 emotions only:
    • Positive: confident, encouraging, friendly, happy, interested
    • Negative: angry, anxious, bored, frustrated, sad
  • Example

Emotion Intercorrelations

(p < 0.001)

Results
  • Emotions are heavily correlated
    • Positive with positive
    • Negative with negative
  • Emotions are non-exclusive
  • Can they be clustered empirically?
    • Activation
    • Valence
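One way to picture the intercorrelation analysis is a pairwise Pearson correlation over per-token ratings. The sketch below is only an illustration: the rating scale, the `ratings` values, and the helper names `pearson` and `correlation_table` are assumptions, not data or code from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length rating lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical ratings for five tokens (one list per emotion label).
# These numbers are made up for illustration only.
ratings = {
    "happy":      [4, 3, 0, 1, 0],
    "friendly":   [3, 4, 1, 0, 0],
    "frustrated": [0, 0, 3, 4, 2],
}

def correlation_table(ratings):
    """Correlate every emotion's ratings with every other emotion's ratings."""
    emotions = sorted(ratings)
    return {(a, b): pearson(ratings[a], ratings[b])
            for a in emotions for b in emotions}

table = correlation_table(ratings)
```

With ratings shaped like these, same-valence pairs such as `("happy", "friendly")` come out positive and cross-valence pairs such as `("happy", "frustrated")` come out negative, which is the pattern the slide reports.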

Global Pitch Statistics

Different Valence/Activation

Identifying Emotions
  • Automatic acoustic-prosodic [Davitz, 1964] [Huttar, 1968]
    • Global characterization
      • pitch
      • loudness
      • speaking rate
  • Intonational contours [Mozziconacci & Hermes, 1999]
  • Spectral tilt [Banse & Scherer, 1996] [Ang et al., 2002]
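A rough sketch of what "global characterization" can mean in practice: summary statistics over an utterance's F0 contour plus a simple speaking-rate measure. It assumes the F0 track is already extracted as one value per frame, with 0 marking unvoiced frames; the function names and the syllables-per-second definition of rate are illustrative assumptions, not the feature set used in the cited work.

```python
from statistics import mean, pstdev

def global_pitch_stats(f0_hz):
    """Summarize an F0 contour, ignoring unvoiced frames (assumed to be F0 == 0)."""
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        return None  # no voiced frames, so no pitch statistics
    return {
        "mean": mean(voiced),
        "min": min(voiced),
        "max": max(voiced),
        "range": max(voiced) - min(voiced),
        "stdev": pstdev(voiced),
    }

def speaking_rate(num_syllables, duration_s):
    """Speaking rate in syllables per second (one common definition)."""
    return num_syllables / duration_s

# Hypothetical contour in Hz; zeros are unvoiced frames.
stats = global_pitch_stats([0, 100, 200, 0, 300])
rate = speaking_rate(10, 2.0)
```

Loudness can be summarized the same way from a per-frame energy track.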

Machine Learning Experiment
  • RIPPER, 90/10 train/test split
  • Binary classification for each emotion
  • Results
    • 62% average baseline
    • 75% average accuracy
    • Acoustic-prosodic features for activation
    • /H-L%/ for negative; /L-L%/ for positive
    • Spectral tilt for valence?
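The evaluation setup can be sketched as follows, assuming the baseline is the majority-class rate for each one-vs-rest task (the slide does not say how the baseline was computed). RIPPER itself is not implemented here; the labels and helper names are hypothetical.

```python
import random

def one_vs_rest(labels, target):
    """Turn multi-emotion labels into a binary task: is this token `target`?"""
    return [label == target for label in labels]

def majority_baseline(binary_labels):
    """Accuracy of always guessing the more frequent class."""
    positives = sum(binary_labels)
    negatives = len(binary_labels) - positives
    return max(positives, negatives) / len(binary_labels)

def split_90_10(items, seed=0):
    """Shuffle a copy of `items` and split it 90% train / 10% test."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.9 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predicted, actual):
    """Fraction of predictions that match the reference labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

# Hypothetical labels, for illustration only.
labels = ["angry", "happy", "sad", "angry", "happy",
          "angry", "bored", "sad", "angry", "happy"]
angry_baseline = majority_baseline(one_vs_rest(labels, "angry"))
train, test = split_90_10(list(range(20)))
```

Averaging `majority_baseline` over all emotions would give a figure comparable in kind to the "average baseline" on the slide; a trained classifier's predictions would then be scored with `accuracy` on the held-out 10%.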

A Call Center Application
  • AT&T’s “How May I Help You?” system
  • Customers often angry and frustrated

HMIHY Example

Very Frustrated

Somewhat Frustrated

Features
  • Automatic acoustic-prosodic
  • Contextual [Cauldwell, 2000]
  • Lexical [Schröder, 2003] [Brennan, 1995]
  • Pragmatic [Ang et al., 2002] [Lee & Narayanan, 2005]

Results

Tutoring Systems Should Respond to Uncertainty
  • SCoT [Pon-Barry et al. 2006]
    • Responding to uncertainty
      • Active listening
      • Hinting vs. paraphrasing
    • Features examined
      • Latency
      • Filled pauses
      • Hedges
    • Performance metric
      • Learning gain
    • But no improvement by responding to uncertainty
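The three features examined (latency, filled pauses, hedges) could be extracted from a transcribed student turn roughly as below. This is a sketch under stated assumptions: the word lists `FILLED_PAUSES` and `HEDGES` and the function name are hypothetical and are not taken from SCoT.

```python
import re

# Hypothetical word lists; a real system would use a curated inventory.
FILLED_PAUSES = {"um", "uh", "er", "hmm"}
HEDGES = ["i think", "i guess", "maybe", "sort of", "kind of"]

def uncertainty_features(transcript, response_latency_s):
    """Count filled pauses and hedge phrases in one turn; pass latency through."""
    words = re.findall(r"[a-z']+", transcript.lower())
    filled = sum(1 for w in words if w in FILLED_PAUSES)
    # Re-join the tokens so multi-word hedges match regardless of punctuation.
    normalized = " ".join(words)
    hedges = sum(
        len(re.findall(r"\b" + re.escape(h) + r"\b", normalized))
        for h in HEDGES
    )
    return {
        "latency_s": response_latency_s,
        "filled_pauses": filled,
        "hedges": hedges,
    }

# Hypothetical student turn with a 2.5 s response latency.
features = uncertainty_features("Um, I think it is, uh, maybe the mass?", 2.5)
```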

Uncertainty in ITSpoke

um <sigh> I don’t even think I have an idea here ...... now .. mass isn’t weight ...... mass is ................ the .......... space that an object takes up ........ is that mass?

[71-67-1:92-113]

ITSpoke Experiment
  • Human-Human Corpus
  • AdaBoost(C4.5) 90/10 split in WEKA
  • Classes: Uncertain vs Certain vs Neutral
  • Results:

Voice Quality and Emotion
  • Perceptual coloring
    • Derived from a variety of laryngeal and supralaryngeal features
    • modal, creaky, whispered, harsh, breathy, ...
  • Correlates with emotion
    • Laver ’80, Scherer ’86, Murray & Arnott ’93, Laukkanen ’96, Johnstone & Scherer ’99, Fernandez ’00, Gobl & Ní Chasaide ’03

Phonation Gestures
  • Adductive tension: interarytenoid muscles adduct the arytenoid cartilages
  • Medial compression: adductive force on the vocal processes, adjusting the ligamental glottis
  • Longitudinal tension: tension of the vocal folds

Modal Voice
  • “Neutral” mode
  • Muscular adjustments moderate
  • Vibration of vocal folds periodic, full closing of glottis, no audible friction
  • Frequency of vibration and loudness in low to mid range for conversational speech

Tense Voice
  • Very strong tension of vocal folds, very high tension in vocal tract

Whispery Voice
  • Very low adductive tension
  • Medial compression moderately high
  • Longitudinal tension moderately high
  • Little or no vocal fold vibration
  • Turbulence generated by friction of air in and above larynx

Creaky Voice
  • Vocal fold vibration at low frequency, irregular
  • Low tension (only ligamental part of glottis vibrates)
  • The vocal folds strongly adducted
  • Longitudinal tension weak
  • Moderately high medial compression

Breathy Voice
  • Tension low
    • Minimal adductive tension
    • Weak medial compression
  • Medium longitudinal vocal fold tension
  • Vocal folds do not come together completely, leading to frication

Estimating Voice Quality
  • Estimate with respect to a controlled neutral quality
    • But how do we know the control is truly “neutral”?
    • Must match natural laryngeal behavior to the laboratory “neutral”
  • Our knowledge of models of vocal fold movements may be inadequate for describing real phonation
  • Known relationships between acoustic signal and voice source are complex
    • Voicing behavior can only be observed indirectly, so estimates are prone to error
    • Direct source data are obtained by invasive techniques, which may interfere with the signal

Next Class
  • Deceptive Speech
