Analyzing complementary acoustic cues for signalling prominence in different languages

William J. Barry, Bistra Andreeva, Jacques Koreman

Basis for this presentation
This talk presents related results from three recent presentations:
• Koreman, J., Andreeva, B. & Barry, W.J. (2008). Accentuation cues in French and German. In: P.A. Barbosa, S. Madureira & C. Reis (eds.), Proc. Speech Prosody 2008, Campinas (Brazil), 613-616. Campinas, Brazil: Editora RG/CNPq.
• Koreman, J., Van Dommelen, W., Sikveland, R., Andreeva, B. & Barry, W.J. (in print). Cross-language differences in the production of phrasal prominence in Norwegian and German. Proc. Nordic Prosody 2008, Helsinki (Finland).
• Barry, W.J. & Andreeva, B. (2009). Cross-language and individual differences in the production and perception of syllabic prominence. Annual Meeting SPP 1234 Sprachlautliche Kompetenz 2009, Cologne (Germany).

Why present this here?
This talk only deals with the acoustics of prominence. But because prominence involves several prosodic dimensions, the data analysis may also be relevant to multimodal speech.
• Björn Granström: “Coherence between audio and video?”, e.g. between nodding and F0 in “Båten seglede forbi”.
• Kristiina Jokinen: “To what extent does non-verbal activity, esp. gestures and facial expressions, co-occur with verbal expressions?” (culture dependence, communicative function)
Are there cross-cultural (cross-language) differences in the importance of acoustic and visual cues? (There are for prosodic dimensions.)
Are they complementary? (Prosodic dimensions are.)
What does that mean for synchrony detection? (Trouble?)

Outline
• Research questions
• Recordings
• Measurements
• Statistical analysis
• Results
• Discussion
• Conclusion and possible relevance to COST 2102
(from each of the three presentations)
The ideas about the acoustic realization of prominence that I present here are mainly Bill Barry’s and Bistra Andreeva’s. (This is an acknowledgement, not an attempt to evade responsibility.)

Research questions
• How do different languages exploit the universal means of signalling the varying prominence of words in an utterance?
  • duration
  • fundamental frequency
  • energy
  • spectral properties
• Do the different word-phonological requirements of a language affect the degree to which these properties are exploited?
  • duration (length opposition; word stress)
  • fundamental frequency (tonal word accent)
  • spectral properties (phonologized vowel reduction)

Project
The present work is part of a larger project funded by the German Research Council:
• Cross-language and individual differences in the production and perception of syllabic prominence. Rhythm-typology revisited.
• The languages investigated in the project are German, English, Norwegian, Bulgarian, Russian, French and Japanese.

Recordings
(Results given here, but checked with text versions)
• Six speakers from homogeneous groups in each language
• Comparable production task across languages: varying accentuation due to different focus on critical words (CWs), elicited by questions:
  • broad
  • narrow non-contrastive (early or late)
  • narrow contrastive (early or late)
• Text replies to the questions, followed by a “dada” version
Norwegian sentences:
1. Hun Siv drar med skipet snart.
2. Han Karl tenker på fag nå.
3. Hans far brukte sagen da.
4. Min pasta blir kald til da.
6. Min stabsmann forblir bak nå.
7. Han Krister fikk skiftet mitt.
German sentences:
1. Der Mann fuhr den Wagen vor.
2. Das Bild soll nicht hässlich sein.
3. Das Kind sollte im Bett sein.
4. Der Peter kann den Film gucken.
5. Das Mädchen soll ein Bild malen.
6. Mein Vater kann Türkisch lesen.

Measurements

Statistical analysis FR-GE (Speech Prosody data)
• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (FR, GE)
  • focus (accented, deaccented)
  • number of syllables in CW (1, 2)
• Multivariate ANOVAs per language (FR, GE)
• Stepwise discriminant analyses: cue weighting for CW1 and CW2 separately, for each language separately

Results: MANOVAs
• Main effects for language
• Interactions language × accentuation

Results for duration
[Figure: CW1 syllable duration and CW1 word duration, GE vs. FR]

Results for duration
[Figure: CW2 syllable duration in final foot and CW2 word duration, GE vs. FR]
Effects greater for French than for German

Results: discriminant analyses
[Figure: CW1 and CW2]

Discussion
• The accented-deaccented duration effects in the ANOVAs are greater for French than for German: is the exploitation of duration in German constrained by its segmental vowel length opposition?
• Spectral balance is included as a DA predictor in German: reduction increases the accented-deaccented opposition (but there is no language × accentuation interaction in the ANOVAs).
• But the importance of duration in French compared to German is not so clear in the DA, probably due to correlation between the acoustic cues. DA is therefore not very suitable for analyzing these data.

Statistical analysis NO-GE (Nordic Prosody data)
• Multivariate ANOVAs for CW1 and CW2 separately, with independent variables:
  • language (NO, GE)
  • focus (broad, early narrow, late narrow)
  • number of syllables in CW (1, 2)
• Multivariate ANOVAs per language (NO, GE)

Results
• Main effects for language
• Interactions language × accentuation

Results: MANOVAs per language
η²-values for accentuation (for both CWs, NO and GE)*
* η² = ratio of treatment variance to total variance; η² in red > 0.5; η² in grey n.s.

Results
η²-values are the ratio of treatment variance to total variance, and thus indicate the proportion of the total variance explained by the focus conditions.
• In Norwegian, durational cues (esp. syllable duration) distinguish the three conditions.
• In German, intensity and F0 are the strongest cues to distinguish the three conditions.
• The lack of importance of F0 in Norwegian is most likely an artefact of the different realizations of lexical tone 1 in mono- and disyllabic stimuli.
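As a worked illustration of this definition, the sketch below computes classical one-way η² = SS_between / SS_total for a single cue across the three focus conditions. The durations are hypothetical, and the slides do not say whether classical or partial η² was reported for the MANOVAs, so the one-way classical form is an assumption.

```python
from statistics import mean


def eta_squared(groups: dict[str, list[float]]) -> float:
    """Classical eta squared: share of the total sum of squares explained by the grouping factor."""
    all_values = [v for values in groups.values() for v in values]
    grand_mean = mean(all_values)
    # Total sum of squares: every observation vs. the grand mean
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    # Treatment (between-group) sum of squares: group means vs. grand mean, weighted by group size
    ss_between = sum(len(values) * (mean(values) - grand_mean) ** 2 for values in groups.values())
    return ss_between / ss_total


# Hypothetical syllable durations (ms) for one critical word, NOT data from the study
durations = {
    "broad": [180.0, 190.0, 185.0],
    "early": [230.0, 240.0, 235.0],
    "late": [175.0, 170.0, 180.0],
}
print(round(eta_squared(durations), 3))
```

A value close to 1 means that almost all variation in the cue is accounted for by the focus condition, which is how the red (η² > 0.5) entries above should be read.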

Results for intensity
[Figure: CW1 and CW2 vowel intensity by focus (early, late, broad), German and Norwegian]
• Similar patterns for (normalized) intensity in German and Norwegian
• But greater differences between early, late and broad focus in German than in Norwegian
• In Norwegian, the intensity of CW2 is lower than that of CW1 under late and broad focus, but not in German

Results for duration
[Figure: critical word 1 syllable duration by focus (early, late, broad) for 1σ and 2σ words, Norwegian and German]
• Greater (normalized) durational differences between early, broad and late focus in Norwegian than in German
• Similar effect for CW2 word duration

Results: summary
• German makes strong use of intensity to signal prominence.
• Norwegian uses duration more → but Norwegian also has a vowel length opposition and is classified as the same rhythm type as German (stress-timed), so this disconfirms the hypothesis that the use of acoustic cues depends on their phonological status in a language!
• F0 does play a role (esp. for German), but our measures do not reflect the different accent types well. → There is a difference in peak alignment of early and late/broad focus between Norwegian and German.

Analysis of 6 languages (SPP 1234 data)
• ANOVAs with language as the independent variable
• Dependent variable: mean change in values from broad to contrastive focus
• Mean change is expressed as a percentage (duration, F0) or in dB (intensity)
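A minimal sketch of how such change scores could be computed, assuming per-item pairs of broad- and contrastive-focus measurements and assuming the percentage is taken relative to the broad-focus value (the slides specify neither detail). Intensity values are already logarithmic, so the dB change is a plain difference. Function names and numbers are hypothetical.

```python
from statistics import mean


def _pairs(broad: list[float], contrastive: list[float]) -> list[tuple[float, float]]:
    """Pair each item's broad-focus value with its contrastive-focus value."""
    if len(broad) != len(contrastive):
        raise ValueError("broad and contrastive values must be paired item by item")
    return list(zip(broad, contrastive))


def mean_percent_change(broad: list[float], contrastive: list[float]) -> float:
    """Mean change from broad to contrastive focus, in % of the broad value (duration, F0)."""
    return mean(100.0 * (c - b) / b for b, c in _pairs(broad, contrastive))


def mean_db_change(broad_db: list[float], contrastive_db: list[float]) -> float:
    """Mean change from broad to contrastive focus in dB (intensity is already in dB)."""
    return mean(c - b for b, c in _pairs(broad_db, contrastive_db))


# Hypothetical critical-syllable durations (ms) and intensities (dB) for two items
print(mean_percent_change([200.0, 150.0], [260.0, 180.0]))  # duration
print(mean_db_change([70.0, 68.0], [73.0, 72.0]))  # intensity
```

Averaging per-item changes (rather than comparing condition means) keeps each item's own broad-focus value as its reference; whether the authors did this is not stated.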

Results for syllable duration of [da]
Languages use the acoustic carriers of prominence to different degrees (CS = critical syllable):
• CS1: NO (46%) > FR (32%) > RU (25%) ~ GE (22%) > EN (17%) ~ BU (16%)
• CS2: NO (53%) > FR (38%) > RU (26%) > GE (17%) ~ BU (17%) ~ EN (14%)
• Note: No apparent connection between vowel length opposition and the use of duration for accentuation (in contrast to Rebecca Dauer’s claim)

Results for F0 in text recordings
Languages use the acoustic carriers of prominence to different degrees:
• CS1: FR (72%) > EN (61%) ~ GE (58%) > BU (28%) ~ NO (27%) > RU (20%)
• CS2: GE (64%) ~ FR (62%) > EN (51%) > BU (38%) > RU (31%) > NO (10%)
• Note: Despite some shifts in rank between FR, EN and GE, and between NO and RU, from the early (CS1) to the late position (CS2), the generally high vs. low dynamics of the groups remain (the ranking for [dada] is even more consistent).

Results for intensity in [dada] recordings
Languages use the acoustic carriers of prominence to different degrees (intensity differences in dB):
• CS1: BU (5.8) > FR (3.2) ~ GE (3.0) > RU (2.7) ~ EN (2.5) > NO (1.6)
• CS2: BU (6.5) > FR (5.6) = GE (5.6) > EN (4.2) > RU (3.7) > NO (2.8)
• Note: Larger intensity differences for CS2 than for CS1.
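The rankings on the three results slides can be read directly off the reported mean changes. The sketch below simply sorts the languages by those reported values; it reproduces the order of each ranking but not the distinction between ">" and "~" (or "="), which presumably reflects statistical comparisons that the slides do not detail. The `REPORTED` table and the `hierarchy` function are illustrative names, not part of the original analysis.

```python
# Mean change from broad to contrastive focus, as reported on the slides
# (syllable duration and F0 in %, intensity in dB).
REPORTED = {
    ("syllable duration [da]", "CS1"): {"NO": 46, "FR": 32, "RU": 25, "GE": 22, "EN": 17, "BU": 16},
    ("syllable duration [da]", "CS2"): {"NO": 53, "FR": 38, "RU": 26, "GE": 17, "BU": 17, "EN": 14},
    ("F0, text", "CS1"): {"FR": 72, "EN": 61, "GE": 58, "BU": 28, "NO": 27, "RU": 20},
    ("F0, text", "CS2"): {"GE": 64, "FR": 62, "EN": 51, "BU": 38, "RU": 31, "NO": 10},
    ("intensity [dada]", "CS1"): {"BU": 5.8, "FR": 3.2, "GE": 3.0, "RU": 2.7, "EN": 2.5, "NO": 1.6},
    ("intensity [dada]", "CS2"): {"BU": 6.5, "FR": 5.6, "GE": 5.6, "EN": 4.2, "RU": 3.7, "NO": 2.8},
}


def hierarchy(changes: dict[str, float]) -> list[str]:
    """Order languages from most to least exploitation of one parameter.

    sorted() is stable even with reverse=True, so tied languages keep their input order.
    """
    return sorted(changes, key=lambda lang: changes[lang], reverse=True)


for (parameter, position), changes in REPORTED.items():
    print(f"{parameter:24} {position}: " + " > ".join(hierarchy(changes)))
```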

Conclusion and possible relevance
• For each acoustic parameter, there is a hierarchy of its exploitation for signalling focus-induced prominence in different languages.
• Similar differences may exist between languages/cultures in the way they exploit different gestures (face, hand, arm, etc.) and/or in the relative exploitation of acoustic and visual cues, e.g. to signal focus or other communicative functions.
• Possibly there is not only correlation (synchrony) but also complementarity of parameters.

Thank you for your attention
