Cognitive control and intoxicated speech variance Thomas Purnell University of Wisconsin-Madison tcpurnell at wisc.edu
Challenge • How do we model intoxication to know the effect of ethyl alcohol on behavior? • Toxin-induced syndrome • Phonetic control and naturalness in language variation and change • Using intoxication as a tool • Allows for control of behavior • Intoxicated vernacular speech disrupts or severs the speech chain
Programmatic Goals • Examine global and low level measures • Known prosodic or temporal information • Known spectral frequency information • Unknown effects • On dialect • On forensics • Research Question • Can acoustic features reliably correlate with low-level (2 drinks) alcohol intoxication?
Areas of Inquiry • Variation from the mean: Prosody and pacing • Global effects: Long-term average spectra • Lower level effects: Vernacular vowel changes
Intoxicated speech generally • Intoxication affects the motor control of speaking, leading to coarticulatory, precision, and timing differences from sober speech (Klingholz et al. 1988, Pisoni & Martin 1989, Behne & Rivera 1990, Hollien et al. 2001) • In the past, the primary focus was on prosody and consonants (e.g., Hollien et al. 2001)
Gross effects • Dysfluencies are common at the sentence, word, and segmental levels • Duration is increased, speaking rate decreases • Sound substitutions/segmental changes • Lengthening of individual segments • Replacing [s] with [ʃ] is particularly common • Devoicing of final obstruents • Deaffrication • Spirantizing stops
Recordings • Subjects were recorded as they participated in a larger study (Moberg & Curtin 2009) • As such, four recordings • Prior to intoxication (two drinks achieving a BAC of 0.08% in 30 minutes; Curtin & Fairbanks 2003) • In the ascending arm of intoxication (15 minutes after the first drink) • At peak (30 minutes after the first drink) • In the descending arm (15 minutes post-peak intoxication)
Effects at 0.08 BAC • Euphoric condition • Increase in talkativeness • Shortened attention span • Impaired judgment, relaxation • Impaired fine muscle coordination, but not quite ataxia (which begins soon after, by 0.12 BAC)
Acoustics of the actors’ speech simulation of intoxication • Hollien, Liljegren, Martin & DeJong (2001) investigated the acoustics of the actors’ speech • F0 • Mean F0 higher under intoxication and simulated intoxication for the males • Mean F0 higher under intoxication and lower under simulated intoxication for the females • Duration: statistically significantly longer for both real and feigned intoxication • Intensity: showed no systematic change
Pitch and Intoxication • The guiding idea is that intoxication is a family of variances from the mean that exceed the standard variance • E.g., observing someone driving a car involves multiple cues • Hanke & Purnell (2006) • Pitch variation differs widely across the four conditions • Placebo effect: belief and behavior
Comparison of Prosody, S01 M • T1 (137 seconds) • N = 66 phrases • x̄ Dur = 1.5 sec • x̄ Phrase F0 = 117.6 Hz • x̄ Phrase Std = 8.1 Hz • x̄ Min F0 = 102.7 Hz • x̄ Max F0 = 133.4 Hz • T4 (123 seconds) • N = 48 phrases • x̄ Dur = 2.2 sec • x̄ Phrase F0 = 119.0 Hz • x̄ Phrase Std = 15.0 Hz • x̄ Min F0 = 95.6 Hz • x̄ Max F0 = 153.1 Hz
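The T1/T4 comparison for S01 can be made more explicit with a few derived measures. The sketch below is only illustrative: the numbers are the ones on the slide, but the derived quantities (mean F0 range, phrases per minute, ratio of within-phrase standard deviations) and all function and key names are assumptions introduced here, not measures reported in the study.

```python
# Per-condition prosody summaries for speaker S01, copied from the slide.
# T1 = prior to intoxication, T4 = descending arm of intoxication.
T1 = {"total_s": 137, "phrases": 66, "mean_f0_hz": 117.6,
      "mean_sd_hz": 8.1, "mean_min_hz": 102.7, "mean_max_hz": 133.4}
T4 = {"total_s": 123, "phrases": 48, "mean_f0_hz": 119.0,
      "mean_sd_hz": 15.0, "mean_min_hz": 95.6, "mean_max_hz": 153.1}


def derived(cond):
    """Derive two comparison measures (illustrative names, not from the study)."""
    return {
        # Average distance between the phrase-level F0 minimum and maximum.
        "f0_range_hz": round(cond["mean_max_hz"] - cond["mean_min_hz"], 1),
        # Phrasing rate: phrases produced per minute of recording.
        "phrases_per_min": round(60 * cond["phrases"] / cond["total_s"], 1),
    }


def sd_ratio(later, earlier):
    """How many times larger the mean within-phrase F0 SD is in `later`."""
    return round(later["mean_sd_hz"] / earlier["mean_sd_hz"], 2)


for label, cond in (("T1", T1), ("T4", T4)):
    print(label, derived(cond))
print("SD ratio T4/T1:", sd_ratio(T4, T1))
```

On these figures the mean phrase F0 barely moves (117.6 Hz to 119.0 Hz), while the mean F0 range widens from 30.7 Hz to 57.5 Hz and the within-phrase standard deviation grows by a factor of about 1.85. That is consistent with the guiding idea that the change shows up as variance around the mean rather than as a shift in the mean.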
Rainbow, Para 1 • When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a boiling pot of gold at one end. People look, but no one ever finds it. When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. Where do the breaks go?
Rainbow, Para 1 • When the sunlight strikes raindrops in the air, (8 words) • they act as a prism and form a rainbow. (9) • The rainbow is a division of white light into many beautiful colors. (12) • These take the shape of a long round arch, (9) • with its path high above, (5) • and its two ends apparently beyond the horizon. (8) • There is, according to legend, (5) • a boiling pot of gold at one end. (8) • People look, but no one ever finds it. (8) • When a man looks for something beyond his reach, (9) • his friends say he is looking for the pot of gold (11) • at the end of the rainbow. (6)
Rainbow, Para 1, T1 • When the sunlight strikes raindrops in the air, [0.309] they act as a prism [0.170] and [0.140] form a rainbow. [0.898] The rainbow is a division of white light into many beautiful colors. [0.329] These take the shape [0.439] of a long round arch, [0.678] with [0.858] its path high above, [0.289] and its two ends apparently [0.180] beyond the horizon. [0.509] There is, according to legend, a boiling pot of gold at [0.200] one end. [0.758] People look, [0.050] but no one [0.100] ever finds it. [1.177] When a man looks for something beyond his reach, his friends say he is looking for the pot [0.040] of [0.220] gold [0.090] at the end of the [0.020] rainbow [1.207].
Rainbow, Para 1, T4 • When the sunlight strikes [0.359] raindrops in the air, they act as a prism and form a rainbow. [0.259] The rainbow is a division of white light into many beautiful colors. [0.748] These take the shape of a long round [0.110] arch, [0.050] with [0.190] its path high above, [0.529] and its two ends apparently beyond the horizon. [0.788] There is, according to legend, a boiling pot of [0.040] gold [0.040] at one end. [0.629] People look, [0.050] but no one ever finds it. [0.778] When a man looks for something beyond his reach, his friends say he is looking for the pot of gold at the end of the rainbow. [1.856]
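The bracketed pause durations in the two annotated readings can be summarized mechanically. The following is a minimal sketch, assuming the bracketed values are pause durations in seconds; the function name `pause_summary` and the choice of summary statistics are introduced here for illustration and are not part of the original analysis.

```python
import re

# Annotated readings of Rainbow passage, paragraph 1, copied from the slides.
T1_TEXT = (
    "When the sunlight strikes raindrops in the air, [0.309] they act as a prism "
    "[0.170] and [0.140] form a rainbow. [0.898] The rainbow is a division of white "
    "light into many beautiful colors. [0.329] These take the shape [0.439] of a long "
    "round arch, [0.678] with [0.858] its path high above, [0.289] and its two ends "
    "apparently [0.180] beyond the horizon. [0.509] There is, according to legend, a "
    "boiling pot of gold at [0.200] one end. [0.758] People look, [0.050] but no one "
    "[0.100] ever finds it. [1.177] When a man looks for something beyond his reach, "
    "his friends say he is looking for the pot [0.040] of [0.220] gold [0.090] at the "
    "end of the [0.020] rainbow [1.207]."
)
T4_TEXT = (
    "When the sunlight strikes [0.359] raindrops in the air, they act as a prism and "
    "form a rainbow. [0.259] The rainbow is a division of white light into many "
    "beautiful colors. [0.748] These take the shape of a long round [0.110] arch, "
    "[0.050] with [0.190] its path high above, [0.529] and its two ends apparently "
    "beyond the horizon. [0.788] There is, according to legend, a boiling pot of "
    "[0.040] gold [0.040] at one end. [0.629] People look, [0.050] but no one ever "
    "finds it. [0.778] When a man looks for something beyond his reach, his friends "
    "say he is looking for the pot of gold at the end of the rainbow. [1.856]"
)

# A pause is written as a decimal number of seconds inside square brackets.
PAUSE_RE = re.compile(r"\[(\d+\.\d+)\]")


def pause_summary(annotated_text):
    """Count the bracketed pauses and summarize their durations in seconds."""
    pauses = [float(p) for p in PAUSE_RE.findall(annotated_text)]
    if not pauses:
        raise ValueError("no bracketed pauses found")
    total = sum(pauses)
    return {
        "count": len(pauses),
        "total_s": round(total, 3),
        "mean_s": round(total / len(pauses), 3),
        "longest_s": max(pauses),
    }


for label, text in (("T1", T1_TEXT), ("T4", T4_TEXT)):
    print(label, pause_summary(text))
```

For this paragraph the T1 reading contains 21 pauses totaling 8.661 s, and the T4 reading contains 14 pauses totaling 6.426 s. A count and a total say nothing about where the breaks fall, so a fuller treatment would also record whether each pause coincides with one of the expected phrase boundaries.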
Work to do • Verbal stumbling • Need to apply a more thorough family of measures • Better statistical modeling: differences in distribution
Long Term Average Spectra (LTAS) • Measure overall energy envelope • Use longer passages • Gives a better picture of habitual vocal tract behavior of an individual speaker • With enough speech, smooth out situational variation
Spectral envelope “The rainbow is a division of white light into many beautiful colors.”
LTAS & bandwidth • 1,000 Hz • 500 Hz • 100 Hz
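To make the role of bandwidth concrete, here is a minimal sketch of a long-term average spectrum: power spectra of successive frames are averaged over time, and the averaged power is then pooled into bands of a chosen width. This is not the analysis tool used in the study. The frame length, the Hann window, the naive DFT (used only to keep the example dependency-free; real work would use an FFT library or a phonetics package), and the names `frame_power` and `ltas_db` are all assumptions made for illustration.

```python
import cmath
import math


def frame_power(frame):
    """Power at each non-negative frequency bin of one Hann-windowed frame."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    power = []
    for k in range(n // 2 + 1):
        # Naive DFT coefficient for bin k: O(n) per bin, O(n^2) per frame.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed))
        power.append(abs(coeff) ** 2)
    return power


def ltas_db(samples, sample_rate, frame_len, bandwidth_hz):
    """Average frame power over time, then return mean power per band in dB."""
    hz_per_bin = sample_rate / frame_len
    if bandwidth_hz < hz_per_bin:
        raise ValueError("bandwidth is narrower than the frequency resolution")
    starts = range(0, len(samples) - frame_len + 1, frame_len)
    frames = [samples[s:s + frame_len] for s in starts]
    if not frames:
        raise ValueError("signal is shorter than one frame")

    # Long-term average: mean power per frequency bin across all frames.
    mean_power = [0.0] * (frame_len // 2 + 1)
    for frame in frames:
        for k, p in enumerate(frame_power(frame)):
            mean_power[k] += p / len(frames)

    # Pool bins into bands of the requested width (Nyquist bin joins the top band).
    n_bands = math.ceil((sample_rate / 2) / bandwidth_hz)
    band_sum = [0.0] * n_bands
    band_n = [0] * n_bands
    for k, p in enumerate(mean_power):
        band = min(int(k * hz_per_bin // bandwidth_hz), n_bands - 1)
        band_sum[band] += p
        band_n[band] += 1
    # The small constant avoids log10(0) for silent bands.
    return [10 * math.log10(s / c + 1e-12) for s, c in zip(band_sum, band_n)]


# Synthetic check signal (not speech): a 1200 Hz tone sampled at 8 kHz.
RATE = 8000
tone = [math.sin(2 * math.pi * 1200 * t / RATE) for t in range(1600)]
for width in (1000, 500, 100):
    envelope = ltas_db(tone, RATE, 200, width)
    print(width, "Hz bands:", len(envelope), "loudest band:", envelope.index(max(envelope)))
```

Wider bands give a smoother but coarser envelope: with 1,000 Hz bands the tone is only located to the 1-2 kHz band, whereas 100 Hz bands place it in the 1200-1300 Hz band.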
Previous work • Previous studies have largely found changes with no pattern beyond simple variation (e.g., Schiel and Heinrich, 2009) • LTAS has been used in a wide variety of tasks to discriminate between different categories of speech (e.g., Boersma and Kovacic, 2006; Pauk, 2006) • Our question has been asked before, using LTAS to find acoustic cues to intoxication, but the full data are not available, and some experimental balancing was not in place (Klingholz et al. 1988)
Importance for variationists • Seems awkward because it is not a fine-tuned measure • May show gross effects of nasalization, pharyngealization, etc. • Not all detail is lost • Unknown uses such as informing spectral tilt analyses
Hypotheses • Null • No difference in spectral energy between sober and intoxicated states • Test • Difference in spectral energy with intoxicated speech being more variable in energy • Difference in spectral energy with intoxicated speech being lower in energy
Speakers & Task • 10 native speakers of American English from Wisconsin • Divided by gender • Average subject age: 26 (range, 21-36) • 5 placebo subjects • 5 controls who came back after 2 weeks
Acoustic Analysis • Rainbow passage only • Recordings normalized • LTAS, 3 bandwidths • 100 Hz • 500 Hz • 1,000 Hz • Begin with contrastive states • T1 sober • T3 peak intoxication • Paired t-tests
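The paired comparison between T1 and T3 can be sketched as follows. Each subject contributes one sober and one intoxicated value for a given frequency region, and the test is run on the per-subject differences. The dB values below are invented placeholders to show the mechanics and are NOT data from the study. The function name `paired_t` is likewise an assumption, and the sketch stops at the t statistic and degrees of freedom because a p-value needs a t distribution from a statistics library.

```python
import math
import statistics


def paired_t(before, after):
    """Return (mean difference, t statistic, degrees of freedom) for paired data."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    # Work on within-subject differences so each speaker is their own control.
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample SD (n - 1 in the denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, t_stat, len(diffs) - 1


# HYPOTHETICAL band energies in dB for five subjects (placeholders, not study data).
t1_sober_db = [60.0, 58.0, 62.0, 61.0, 59.0]
t3_peak_db = [58.0, 55.0, 60.0, 58.0, 57.0]

mean_diff, t_stat, df = paired_t(t1_sober_db, t3_peak_db)
print(f"mean change = {mean_diff:.1f} dB, t({df}) = {t_stat:.2f}")
```

A negative mean difference here would correspond to lower energy at T3 than at T1. Running the same comparison for every band of an LTAS raises a multiple-comparisons question that this sketch does not address.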
1 kHz Bandwidth • Significant 2.8 dB average reduction under intoxication in the 0-2 kHz and 4-5 kHz regions
500 Hz Bandwidth • Significant 2.0 dB average reduction under intoxication in the 500 Hz-1.5 kHz and 4-5 kHz regions
100 Hz Bandwidth • Significant 2.2 dB average reduction under intoxication in specific areas
Normal Variation • Five WI subjects’ normal variation • Small decrease in amplitude (0.8 dB, cf. 2.2 dB under intoxication)
Placebo Variation • Five placebo subjects’ averaged variation • Small increase in amplitude (0.7 dB)
Subject consistency • Probability of reduction in amplitude for significant frequency ranges
Discussion • Rejects the null hypothesis (“no systematic differences”) and test hypothesis 1 (“intox. more variable”) • Supports test hypothesis 2 (“intox. lowers envelope”) • Found a statistically significant difference between the sober and intoxicated conditions: energy reduced by roughly 2 dB • Two general regions • 300 to 1300 Hz • 4 to 5 kHz • Why?
Low frequencies • Physiological possibility • Alcohol inhibits tongue muscle (Krol, 1984) • Jaw lowering movement for low back vowels (300-1300 Hz) is suppressed • [Figure: vowel chart plotting F2 (vowel backness) against F1 (vowel height) for o, ʌ, a, ɔ]
High frequencies • Potentially known “lush” characteristic • Apical [s] → laminal [ʃ] • Fine motor control replaced by a less accurate gross gesture
Alternative: Overall nasality • If the velopharyngeal port is left open, then expect overall lowering of the spectral envelope • [Figure: two displays of the word “rainbow”]
Intoxicated vowels • Vowel formants in intoxicated speech may reduce (e.g., diminished F1/F2 ratio; Klingholz et al. 1988) and may be variable across speakers (Behne & Rivera 1990, Hollien et al. 2001) • Other studies argue that a low blood alcohol content (BAC) affects vowels less than prosody and consonants (Pisoni & Martin 1989), perhaps because of greater aperture in vowels controlled by jaw movement (Perkell 1969, Stevens 1989)
Sociophonetics of vowels • But we are now more sophisticated in our understanding of vowels • E.g., /aj/, /aw/, /æ/, etc. should be examined by following consonant; /o/ and /u/ in light of the initial consonant, ... • Vowels can be style shifted (Labov 2001); that is, vowels are under some conscious motor planning • Need to take into account a speaker’s dialect, especially its relation to contemporary vowel shifts and mergers
Goal • Describe and contextualize • Vowel changes within specific speakers • Between a speaker-internal control state (not intoxicated) and a speaker-internal test state (intoxicated) • How do ethyl alcohol-induced changes in motor control modify vowel qualities and interact with dialectal variation?
Null Hypothesis • No significant difference across intoxication level for subjects • Vowels in the vowel space should be the same for the two times (Pisoni & Martin 1989) • Perhaps because of the gross motor movement of the jaw involved in vowel articulation rather than fine aperture movements of consonants (Perkell 1969)