
Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems

Dr. Aaron C. Elkins, The University of Arizona


Presentation Transcript


  1. Vocalic Markers of Deception and Cognitive Dissonance for Automated Emotion Detection Systems Dr. Aaron C. Elkins The University of Arizona

  2. Emotional Voice

  3. Can computers perceive vocal emotion? • Yes, but: • The science of the emotional voice is young • Communication is complex and dynamic • Moods and emotions switch with context • Emotion is computationally ill-defined • Measuring emotion may inform theory

  4. Emotional Dimensions DISGUST?

  5. Four Components of Speech • Voiced vs. Unvoiced sounds • [v] vs. [f] • Airstream through mouth or nose • [m] vs. [o]

  6. Speech Sounds • Three properties: (1) pitch, (2) loudness, and (3) quality • Sound consists of small variations in air pressure that occur rapidly in succession • For voiced sounds, the vocal folds superimpose a vibration on the outgoing air • The vocal folds vibrate to create a periodic vibration (100–250 Hz) • We measure these features digitally

  7. Recording “Father” – Digital Audio • The waveform captures the pulses of the vocal folds • Based on air pressure disturbance (dB) • Voiced vs. unvoiced (low pressure) • Each peak occurs every 100th of a second (100 Hz)
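Slides 6 and 7 describe pitch as the repetition rate of vocal-fold pulses: a peak every 1/100 s corresponds to 100 Hz. A minimal sketch of one way software can recover that rate is autocorrelation, which finds the lag at which a voiced frame best matches a shifted copy of itself, so that F0 = sample rate / lag. The function name and the 75–500 Hz search range are illustrative assumptions; Praat offers a far more refined autocorrelation method, and this is not a reimplementation of it.

```python
import math

def estimate_f0(samples, sample_rate, f0_min=75.0, f0_max=500.0):
    """Estimate the fundamental frequency (Hz) of one voiced frame by autocorrelation.

    Hypothetical helper: searches lags between sample_rate/f0_max and
    sample_rate/f0_min and returns sample_rate / best_lag.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset
    min_lag = int(sample_rate / f0_max)
    max_lag = min(int(sample_rate / f0_min), n - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Autocorrelation at this lag, averaged over the overlapping samples
        r = sum(x[i] * x[i + lag] for i in range(n - lag)) / (n - lag)
        if r > best_r:
            best_lag, best_r = lag, r
    if best_lag is None:
        return None  # no positive correlation: treat the frame as unvoiced
    return sample_rate / best_lag

sr = 8000
frame = [math.sin(2 * math.pi * 100 * n / sr) for n in range(800)]  # 0.1 s, 100 Hz tone
print(estimate_f0(frame, sr))
```

The search range keeps the double-period lag (160 samples for a 100 Hz tone) out of reach, which sidesteps the classic octave error in this example; real pitch trackers need explicit octave-error handling.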

  8. Vowel Articulation • Source-Filter Theory (Müller, 1848) • The vocal folds vibrate at the same rate (pitch) • Resonances of the vocal tract change to filter frequencies (formants)
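The source-filter idea can be illustrated with a small formant-synthesis sketch: a pulse train at a fixed pitch plays the role of the vocal folds (source), and a cascade of two-pole resonators plays the role of the vocal tract (filter). The 700 Hz and 1200 Hz formants and 80 Hz bandwidths are illustrative values, not measurements from these studies, and all function names are hypothetical.

```python
import cmath
import math

def impulse_train(f0, sample_rate, duration):
    """Source: one unit pulse per pitch period."""
    period = int(round(sample_rate / f0))
    n = int(sample_rate * duration)
    return [1.0 if i % period == 0 else 0.0 for i in range(n)]

def resonator(signal, formant_hz, bandwidth_hz, sample_rate):
    """Filter: two-pole resonator modelling one vocal-tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius from bandwidth
    theta = 2 * math.pi * formant_hz / sample_rate        # pole angle from centre frequency
    a1, a2 = 2 * r * math.cos(theta), -r * r
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

def magnitude_at(signal, freq_hz, sample_rate):
    """Magnitude of a single DFT coefficient at freq_hz."""
    w = -2j * math.pi * freq_hz / sample_rate
    return abs(sum(s * cmath.exp(w * n) for n, s in enumerate(signal)))

fs = 8000
source = impulse_train(100, fs, 0.5)         # 100 Hz "vocal folds", flat harmonics
vowel = source
for formant, bw in [(700, 80), (1200, 80)]:  # illustrative formants (cascade filter)
    vowel = resonator(vowel, formant, bw, fs)

print(magnitude_at(vowel, 700, fs), magnitude_at(vowel, 2500, fs))
```

Swapping in different formant frequencies changes which harmonics are emphasized (the vowel quality), while the 100 Hz pulse spacing (the pitch) stays the same.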

  9. Vocalics • Vocalic Analysis • Examines how it was said • Amplitude • Pitch (frequency) • Response latency • Tempo • Linguistics • Examines what was said
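Two of the vocalic measures listed above, amplitude and response latency, can be sketched directly from raw samples. The sketch assumes the recording starts when the question ends and treats the first 10 ms frame louder than -40 dBFS as speech onset; the frame length, the threshold, and the function names are hypothetical choices, and a deployed system would need noise-robust voice-activity detection.

```python
import math

def frame_levels_db(samples, frame_len, floor=1e-10):
    """RMS level (dB relative to full scale 1.0) of consecutive non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20 * math.log10(max(rms, floor)))  # floor avoids log(0) on silence
    return levels

def response_latency(samples, sample_rate, frame_len, threshold_db=-40.0):
    """Seconds from the start of the recording to the first frame above threshold_db."""
    for i, level in enumerate(frame_levels_db(samples, frame_len)):
        if level > threshold_db:
            return i * frame_len / sample_rate
    return None  # no speech detected

sr = 8000
# 0.5 s of silence followed by a 0.5 s, 200 Hz tone standing in for the answer
recording = [0.0] * 4000 + [0.5 * math.sin(2 * math.pi * 200 * n / sr) for n in range(4000)]
levels = frame_levels_db(recording, frame_len=80)
print(response_latency(recording, sr, frame_len=80))
```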

  10. Sound Production is Complex • When we tense our muscles, such as during stress, our larynx tenses • Higher pitch • The process is complex • Emotions affect its normal operation • Deception takes cognitive resources away and is stressful • More mistakes, lower quality, increased average and variation in pitch • Sympathetic nervous system response • Increased auditory acuity • Heightened arousal

  11. Standard Vocal Measures Calculated with Praat and Custom Signal Processing Software

  12. Nemesysco LVA 6.50: Commercial Vocalic Software Evaluated

  13. Five Vocalic Studies Summarized • Study One (Deception Experiment) • Study Two (Cognitive Dissonance) • Study Three (Embodied Conversational Agent and Trust) • Study Four (Embodied Conversational Agent Security Screening - Bomber) • Study Five (Embodied Conversational Agent Security Screening - Imposter)

  14. Vocal Deception (Study 1) – Experimental Design • N = 96 • $10 reward for appearing credible to a professional interviewer • Two sequences: First sequence: DT DDTT TD TTDD T; Second sequence: DT TTDD TD DDTT T • 13 short-answer questions • Only 8 had variation both within and between subjects • Two types of questions: charged and neutral
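The two question sequences can be checked mechanically. Assuming D marks a question answered deceptively and T one answered truthfully (the slide does not spell this out), the sketch lists the questions whose condition differs between the two sequences, i.e. the items answered deceptively by one group and truthfully by the other. The function names are hypothetical.

```python
FIRST = "DT DDTT TD TTDD T"
SECOND = "DT TTDD TD DDTT T"

def expand(sequence):
    """Turn the spaced sequence notation into one condition per question."""
    return list(sequence.replace(" ", ""))

def questions_varying_between(seq_a, seq_b):
    """1-based question numbers whose condition differs between the two sequences."""
    a, b = expand(seq_a), expand(seq_b)
    if len(a) != len(b):
        raise ValueError("sequences must cover the same questions")
    return [i + 1 for i, (x, y) in enumerate(zip(a, b)) if x != y]

print(questions_varying_between(FIRST, SECOND))
```

It returns questions 3, 4, 5, 6, 9, 10, 11, and 12, which is consistent with the slide's count of 8 questions that varied both within and between subjects; the other 5 questions received the same condition in both sequences.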

  15. Results • Built-in classification performed at chance level • The software's vocal measures, used independently of its built-in classifier, discriminated deception: FMain, AVJ, and SOS • Possible latent variables measuring conflicting thoughts, cognitive effort, and emotional fear • Logistic regression performed best on charged questions • Higher pitch, cognitive effort, and hesitations are predictive of deception in more stressful interactions • The claim that the vocal analysis software measures stress, cognitive effort, or emotion cannot be completely dismissed • Deception and stress can be predicted by acoustic measures of voice quality and pitch when controlling for speaker characteristics
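The results mention controlling for speaker characteristics without showing how. One common approach, shown here only as an illustrative assumption and not as the study's procedure, is to standardize each response's pitch against the same speaker's mean and standard deviation, so a naturally high voice is not mistaken for stress. The function name is hypothetical; the resulting z-scores could then serve as predictors in a logistic regression.

```python
from statistics import mean, pstdev

def speaker_z_scores(pitch_by_response, speaker_ids):
    """Express each response's pitch relative to its own speaker's baseline."""
    by_speaker = {}
    for pitch, spk in zip(pitch_by_response, speaker_ids):
        by_speaker.setdefault(spk, []).append(pitch)
    # Per-speaker mean and (population) standard deviation
    stats = {spk: (mean(vals), pstdev(vals)) for spk, vals in by_speaker.items()}
    z = []
    for pitch, spk in zip(pitch_by_response, speaker_ids):
        mu, sd = stats[spk]
        z.append((pitch - mu) / sd if sd > 0 else 0.0)
    return z

# Two speakers with different natural pitch but the same relative pattern
print(speaker_z_scores([100, 110, 120, 200, 210, 220], ["A", "A", "A", "B", "B", "B"]))
```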

  16. Vocal Dissonance (Study 2) – Experimental Design • Modified induced-compliance paradigm • Participants (N = 52) made two spoken counter-attitudinal arguments for cutting funding for services for the disabled • Choice manipulated (IV): High (N = 24) vs. Low (N = 28) • Participants reported their attitude toward the argument issue (DV)

  17. Arousal (Vocal Pitch) • High choice had a 10Hz higher pitch • F(1,50) = 4.43, p = .04 • All participants reduced their pitch over time • F(1,50) = 4.90, p = .03

  18. Cognitive Difficulty • High Choice had nearly 2x the response latency on argument two • F(1,50) = 4.53, p = .04 • Arousal moderation

  19. Cognitive Difficulty • Participants spoke with 33% more nonfluencies on the second argument • F(1,50) = 4.03, p = .05
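The slides report F statistics alongside p-values. The sketch below recomputes an upper-tail p-value from F and its degrees of freedom using only the Python standard library, via the regularized incomplete beta function evaluated with the Numerical Recipes continued fraction. It is a checking aid, not the authors' analysis code, and the helper names are hypothetical; if SciPy is available, `scipy.stats.f.sf(F, df1, df2)` gives the same quantity.

```python
import math

def _beta_cf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b

def f_test_p_value(f_stat, df1, df2):
    """Upper-tail p-value P(F >= f_stat) for an F(df1, df2) distribution."""
    if f_stat <= 0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))

reported = [  # (F, df1, df2, reported p) from slides 17-19 and 34
    (4.43, 1, 50, .04), (4.90, 1, 50, .03), (4.53, 1, 50, .04),
    (4.03, 1, 50, .05), (0.38, 1, 22, .54), (4.79, 1, 22, .04),
]
for f, d1, d2, p in reported:
    print(f"F({d1},{d2}) = {f:.2f}: p = {f_test_p_value(f, d1, d2):.3f} (reported {p})")
```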

  20. The Importance of Language (Imagery as Abstract Language)

  21. Vocal Dissonance Model • χ²(1, N = 51), p = .49; SRMR = .02 • R² (attitude change) = .17; R² (imagery) = .11

  22. From the lab to the AVATAR

  23. First Kiosk

  24. Kiosk from Last Year

  25. Third-Generation Kiosk

  26. Gender and Demeanor

  27. Vocal Trust (Study 3) – Experimental Design • Participants completed a pre-survey • Packed a bag before the ECA screening interview • Completed the security screening • All responses to the ECA were recorded for vocal analysis

  28. ECA Demeanor and Gender • N = 88 participants (53 males, 35 females) • Repeated-measures Latin square design • All participants interacted with all demeanor and gender ECA combinations • 4 questions per block, 16 questions total

  29. Trust and Time • Multilevel growth model specified with trust as the DV (N = 218) and subject as a random effect (N = 60) • Main effects • Initial trust = 4.09 • Trust rate of change: .04 increase per second, p < .01 • Duration: .05 decrease in trust for every second spent answering the ECA over the 7.6-second average, p < .001
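The reported fixed effects can be combined into a worked prediction. The slide does not give the model equation, so this sketch assumes an additive model in which elapsed time is measured in seconds from the start of the interaction and answer duration is centered at the 7.6-second mean. It ignores the per-subject random effects, and the constant and function names are hypothetical.

```python
INITIAL_TRUST = 4.09     # reported initial trust (intercept)
TIME_SLOPE = 0.04        # reported increase in trust per second
DURATION_SLOPE = -0.05   # reported change per second of answering beyond the mean
MEAN_DURATION_S = 7.6    # reported average answer duration (seconds)

def predicted_trust(elapsed_s, answer_duration_s):
    """Fixed-effects prediction only; subject-level random effects are ignored."""
    return (INITIAL_TRUST
            + TIME_SLOPE * elapsed_s
            + DURATION_SLOPE * (answer_duration_s - MEAN_DURATION_S))

# An average-length answer 10 s into the interaction vs. one that runs 2 s longer
print(predicted_trust(10, 7.6), predicted_trust(10, 9.6))
```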

  30. Vocal Pitch, Time, and Trust • Main Effect of Pitch • For every 1Hz increase in pitch over 156Hz trust drops by .01 • p = .03 • Interaction Pitch and Time • Pitch x Time b = 9.3e-05, p = .03 • Over time pitch predicts trust less and less
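The Pitch x Time interaction means the pitch slope itself changes with time: slope(t) = -0.01 + 9.3e-05 · t. The sketch below, assuming both coefficients come from the same model with pitch centered at 156 Hz, computes that simple slope and the time at which it reaches zero. The time unit is not stated on the slide and -0.01 is rounded, so the crossover is only approximate and should not be read beyond the observed interaction length. Names are hypothetical.

```python
PITCH_SLOPE = -0.01        # reported trust change per 1 Hz above 156 Hz
PITCH_X_TIME = 9.3e-05     # reported Pitch x Time coefficient
PITCH_CENTER_HZ = 156.0

def pitch_slope_at(t):
    """Simple slope of pitch on trust at time t."""
    return PITCH_SLOPE + PITCH_X_TIME * t

def pitch_contribution(pitch_hz, t):
    """Predicted change in trust attributable to pitch at time t."""
    return pitch_slope_at(t) * (pitch_hz - PITCH_CENTER_HZ)

# Time at which pitch stops predicting trust under this linear interaction
crossover_t = -PITCH_SLOPE / PITCH_X_TIME
print(round(crossover_t, 1))
```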

  31. Results • Human perceptions of trust transfer to the ECA • Time plays an important role in the interaction • All participants trusted the ECA more over time, particularly when it smiled • 48 increase in trust when the ECA smiles • Vocal measures of pitch predicted trust, but only early on • For every 1Hz increase in pitch over 156Hz trust drops by .01 • Over time pitch predicts trust less and less

  32. Vocalics of a Bomber (Study 4) – Experimental Design • 29 EU border guards were randomly assigned to build a bomb (N = 16) or to a control condition (N = 13), then pack a bag • Identical to Study 3, but with no breaks in the interview • Only the male, neutral-demeanor ECA interviewed participants • Bomb makers were instructed to smuggle the bomb past the ECA successfully

  33. Vocal Analysis Recorded responses to question: • “Has anyone given you a prohibited substance to transport through this checkpoint?” • Average Response 2.68 sec (SD = 1.66) • Responses such as “No” or “of course not” • Vocal measures of Pitch and Pitch Variation

  34. Results of Vocal Pitch • Voice quality, gender, and intensity included as covariates • No difference in mean vocal pitch, F(1,22) = 0.38, p = .54 • Main effect of pitch variation • Bomb makers had 25.34% more variation, F(1,22) = 4.79, p = .04
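The slides compare pitch variation between groups but do not say how variation was computed. Assuming it is the standard deviation of F0 over voiced frames (one common choice), the sketch below computes it from an F0 contour in which unvoiced frames are coded as 0, plus the percentage difference used to express a result like "25.34% more variation". The contour format and function names are hypothetical.

```python
from statistics import pstdev

def pitch_variation(f0_contour):
    """Standard deviation of F0 (Hz) over voiced frames; unvoiced frames are coded 0."""
    voiced = [f for f in f0_contour if f > 0]
    return pstdev(voiced) if len(voiced) > 1 else 0.0

def percent_more(group_value, reference_value):
    """How much larger group_value is than reference_value, in percent."""
    return 100.0 * (group_value - reference_value) / reference_value

print(pitch_variation([100, 0, 110, 120, 0]))
```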

  35. Pitch Contours

  36. Eye Gaze: Guilty

  37. Eye Gaze: Innocent

  38. Vocalics of an Imposter (Study 5) – Experimental Design • 38 EU border guards • All required to present a visa and passport through multiphase screening • E-gate • Manual processing • AVATAR screening interview • Four participants, randomly assigned as imposters, carried false documents with hostile intentions through screening

  39. AVATAR Interaction Example

  40. iPad Output for Screener

  41. Voice Quality Change from Baseline Question (What is your full name?)

  42. Vocalic Classification Model

  43. Vocalic Resulting Classification • 7 innocents falsely classified as terrorists • 27 correctly classified as innocent • All “guilty” referred to secondary • Overall accuracy = 81% • TPR = 100% • TNR = 79% • FPR = 20% • FNR = 0%
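The rates reported for the vocalic classification follow from a 2×2 confusion matrix. The sketch assumes TP = 4 and FN = 0, inferred from the four imposters and the reported TPR of 100%, with TN = 27 and FP = 7 taken from the slide. The function name is hypothetical.

```python
def classification_rates(tp, fn, tn, fp):
    """Standard rates from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,
        "tpr": tp / (tp + fn),   # sensitivity: imposters flagged
        "tnr": tn / (tn + fp),   # specificity: innocents cleared
        "fpr": fp / (fp + tn),   # innocents wrongly referred to secondary
        "fnr": fn / (fn + tp),   # imposters missed
    }

rates = classification_rates(tp=4, fn=0, tn=27, fp=7)
for name, value in rates.items():
    print(f"{name}: {value:.1%}")
```

With these counts, accuracy is 31/38 ≈ 81.6%, TNR is 27/34 ≈ 79.4%, and FPR is 7/34 ≈ 20.6%, so the slide's 81%, 79%, and 20% appear to be truncated rather than rounded.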

  44. Eye Fixations on Visa

  45. Date of Birth Results – Correct?

  46. Final Decision Model

  47. Vocalic Resulting Classification • 3 innocents falsely classified as terrorists • One of these three was actually lying • Actually a true positive • 31 correctly classified as innocent • All “guilty” referred to secondary • Overall accuracy = 94.47% • TPR = 100% • TNR = 88.24% • FPR = 5.8% (reduced by 3/4) • FNR = 0%

  48. Questions? • Isn’t the voice amazing?
