On Quantifying Speech Rhythm

UCSD Phonetics Lab
Tristie Ross, Naja Ferjan & Amalia Arvaniti
University of California, San Diego

1. Introduction
• Rhythm categories and rhythm metrics
  • In the past decade or so, the idea that languages can be rhythmically categorized as stress- or syllable-timed on the basis of their temporal properties has acquired new currency. This is due to the development of various metrics of consonantal and vocalic variability, such as %V and ΔC, respectively the percentage of utterance duration taken up by vocalic intervals and the standard deviation of consonantal interval durations (Ramus, Nespor & Mehler 1999), or nPVI and rPVI, normalized and raw pairwise variability indices that reflect vocalic and consonantal variability respectively (Grabe & Low 2002).
  • Both sets of metrics (and others proposed since the publication of Ramus et al. 1999) rest on Dauer's (1983) idea that the rhythmic classification of languages can be deduced from their prevailing phonological patterns, such as the extent to which they allow consonant clusters and reduce vowels in unstressed syllables. The metrics are said to measure this variability as it is manifested in consonant and vowel durations in speech:
    • stress-timed languages are expected to show larger consonantal variability than syllable-timed languages, since they allow more complex syllable structures;
    • vocalic intervals, on the other hand, are said to occupy a smaller proportion of the signal in stress-timed than in syllable-timed languages, because the former show vowel reduction in unstressed syllables while the latter do not; this also means that vocalic intervals are expected to be more variable in stress-timed than in syllable-timed languages.
• Some problems with metrics
  • Different sets of metrics can classify the same language differently: e.g. %V and ΔC classify Thai as syllable-timed, while nPVI and rPVI classify it as stress-timed (Grabe & Low 2002).
  • Many languages are difficult if not impossible to classify on the basis of metric scores; e.g. Grabe & Low (2002:531) conclude that Greek may be unclassifiable, while Catalan and Polish are said to have “mixed” rhythm (Ramus et al. 1999, Grabe & Low 2002).
  • The problems may be related to the fact that much of the research showing metrics to be successful is limited in some way, e.g.:
    • Ramus et al. (1999) used only five (unreported) sentences per language;
    • Grabe & Low (2002) used only one speaker per language;
    • metrics are best at classifying languages that present a “conspiracy of factors” placing them towards one or the other end of the rhythm continuum (Bertinetto 1989).
• Hypotheses
  • Metric scores are likely to vary across speakers, even if the metrics are normalized, like VarCoC and VarCoV (White & Mattys 2007).
  • Metric scores are likely to be affected by the corpus used: if so, more “stress-timed” corpora should yield more “stress-timed” scores.
  • As a result of such effects, scores may vary substantially depending on method of calculation, speaker and corpus, making general claims about their ability to rhythmically classify languages difficult to sustain.
  • Variability in metric scores is likely to be more pronounced in L2 data.

2. Methods
• Materials
  • Languages: Southern Californian (SoCal) English, Std Northern German, Std Italian, Std Greek, Std Korean, Std Mexican Spanish.
  • Three types of materials were recorded:
    • 15 read sentences: five were as “stress-timed” as possible, five were as “syllable-timed” as possible, and five were not controlled for rhythm and were taken from books by popular authors in each language;
    • spontaneous speech: 1-2 min. of talking on a topic of the speaker’s choice or on their parking experiences at UCSD; speakers could instead choose to describe three images from the Calvin and Hobbes cartoons;
    • read running speech: The North Wind and the Sun, with texts taken from the IPA illustrations where appropriate.
  [Figure: the images used]
• Speakers
  • Native speakers of SoCal English (N=3), Std N. German (N=3), Std Italian (N=3), Std Greek (N=1), Std Korean, Std Mexican Spanish.
  • For L2: native speakers of Std Northern German (N=3) and Std Italian (N=3); most speakers were graduate students in the US. They had different lengths of exposure to English and were judged by native SoCal English speakers (N=11) to have a foreign accent (with judgments correlating with L2 speaker intelligibility, but not with fluency or perceived rhythm).
• Procedures
  • The speakers were given time to familiarize themselves with the materials prior to the recording.
  • They read the two types of materials and produced spontaneous speech twice in one session; the order of tasks was counterbalanced within language.
  • The L2 speakers were recorded in separate sessions for L1 and L2; they read only The North Wind and the Sun (as did the English control group, N=3).
• Measurements
  • Consonantal and vocalic intervals were measured on the basis of phonetic criteria:
    • phrase-final intervals were not excluded;
    • glides were included in vocalic intervals if they showed no evidence of frication, e.g. [w] in Italian acqua [akwa] “water”;
    • glides were included in consonantal intervals if they were fricated, e.g. [j] in Greek [jaja] “grandmother”.
• Metrics
  • %V and ΔC, nPVI and rPVI, VarCoV and VarCoC were calculated for the three subsets of read sentences.
• Statistics
  • L1 data:
    • one-way ANOVAs for each language with SENTENCE-SUBSET as the repeated-measures factor and the metrics as dependent variables;
    • one-way between-subjects ANOVAs with LANGUAGE as the categorical variable and the metrics as dependent variables.
  • L2 data: between-subjects ANOVAs with SPEAKER’S L1 as the categorical variable and the metrics as dependent variables.
  • All differences between scores reported below are significant at p < 0.05.
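The measurement criteria and metrics above can be made concrete with a short computation. The Python sketch below is illustrative only and is not the authors' measurement or analysis code: the segment format (kind, duration in ms, fricated flag), all function names, and the use of the sample standard deviation for ΔC and ΔV are assumptions. It applies the glide criterion, merges adjacent same-class segments into vocalic and consonantal intervals, and then computes %V, ΔC, VarCoV and VarCoC (100 × SD / mean), nPVI on vocalic intervals and rPVI on consonantal intervals, following the standard definitions in Ramus et al. (1999), Grabe & Low (2002) and White & Mattys (2007).

```python
from statistics import mean, stdev


def interval_class(kind, fricated=False):
    """Return 'V' or 'C' for one segment (hypothetical labelling scheme).

    Vowels are vocalic; glides are vocalic unless fricated; everything
    else (stops, fricatives, geminates, ...) is consonantal.
    """
    if kind == "vowel":
        return "V"
    if kind == "glide":
        return "C" if fricated else "V"
    return "C"


def to_intervals(segments):
    """Merge runs of same-class segments into (class, duration) intervals.

    `segments` is a list of (kind, duration_ms, fricated) tuples.
    """
    intervals = []
    for kind, dur, fricated in segments:
        cls = interval_class(kind, fricated)
        if intervals and intervals[-1][0] == cls:
            # Same class as the previous interval: extend it.
            intervals[-1] = (cls, intervals[-1][1] + dur)
        else:
            intervals.append((cls, dur))
    return intervals


def rpvi(durs):
    """Raw PVI: mean absolute difference between successive intervals."""
    diffs = [abs(a - b) for a, b in zip(durs, durs[1:])]
    return sum(diffs) / len(diffs)


def npvi(durs):
    """Normalized PVI: each difference divided by the pair mean, times 100."""
    terms = [abs(a - b) / ((a + b) / 2) for a, b in zip(durs, durs[1:])]
    return 100 * sum(terms) / len(terms)


def rhythm_metrics(intervals):
    """Compute the six metrics; needs at least two V and two C intervals."""
    v = [d for cls, d in intervals if cls == "V"]
    c = [d for cls, d in intervals if cls == "C"]
    delta_v, delta_c = stdev(v), stdev(c)  # sample SD (an assumption)
    return {
        "%V": 100 * sum(v) / (sum(v) + sum(c)),
        "deltaC": delta_c,
        "VarCoV": 100 * delta_v / mean(v),
        "VarCoC": 100 * delta_c / mean(c),
        "nPVI": npvi(v),   # vocalic intervals
        "rPVI": rpvi(c),   # consonantal intervals
    }


# Italian acqua [akwa]: the unfricated [w] joins the following vowel.
print(to_intervals([("vowel", 80, False), ("consonant", 120, False),
                    ("glide", 40, False), ("vowel", 90, False)]))
# prints [('V', 80), ('C', 120), ('V', 130)]

# Toy durations in ms, not data from this study.
print(rhythm_metrics([("C", 100), ("V", 50), ("C", 200), ("V", 150)]))
```

With the toy intervals, %V = 40.0, nPVI = 100.0 and rPVI = 100.0, while ΔC ≈ 70.7, VarCoV ≈ 70.7 and VarCoC ≈ 47.1. Because ΔC is an unnormalized standard deviation, it scales with speech rate, which is why the normalized VarCo measures are often reported alongside it.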
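For the between-subjects comparisons (LANGUAGE for the L1 data, SPEAKER’S L1 for the L2 data), the core computation is the one-way ANOVA F ratio. The pure-Python sketch below illustrates that computation only. It is not the authors' statistics code. The function name and input format (a mapping from factor level to one metric score per speaker) are assumptions, and the repeated-measures analysis with SENTENCE-SUBSET is not covered. To obtain a p-value, the F ratio can be referred to an F distribution with the returned degrees of freedom, e.g. with SciPy's `scipy.stats.f.sf`, or the whole test can be run with `scipy.stats.f_oneway`.

```python
def one_way_anova(groups):
    """One-way between-subjects ANOVA (hypothetical helper).

    `groups` maps each factor level (e.g. a language) to a list of scores
    (e.g. one %V value per speaker). Returns (F, df_between, df_within).
    """
    scores = [x for g in groups.values() for x in g]
    grand_mean = sum(scores) / len(scores)
    k, n = len(groups), len(scores)
    # Between-groups sum of squares: group means vs. the grand mean.
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                     for g in groups.values())
    # Within-groups sum of squares: each score vs. its own group mean.
    ss_within = sum((x - sum(g) / len(g)) ** 2
                    for g in groups.values() for x in g)
    df_between, df_within = k - 1, n - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Toy scores (not data from this study): three speakers per level.
print(one_way_anova({"A": [1, 2, 3], "B": [4, 5, 6]}))  # (13.5, 1, 4)
```

With only three speakers per language, as here, df_within is small, so a large F ratio is needed to reach p < 0.05. This is one reason high inter-speaker variation can mask differences between subsets or languages.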

3. Results
L1 RESULTS
Italian
ΔC-%V: no effect for ΔC; the “syllable-timed” subset has the highest %V.
VarCos: no effects.
PVIs: no effect for nPVI; rPVI is highest in the “stress-timed” subset, and higher in the uncontrolled subset than in the “syllable-timed” subset.
English
ΔC-%V: no significant effect, but a trend for the “stress-timed” subset to have lower %V.
VarCos: no effect for VarCoC; VarCoV is lower in the “stress-timed” subset than in the other two subsets.
PVIs: no effect for rPVI; nPVI is higher in the “stress-timed” subset than in the “syllable-timed” subset.
German
There are no significant effects, largely due to high inter-speaker variation.
Greek
The metrics gave different scores for each subset, but not always in the expected direction (since N=1, statistical analysis was not possible).
L1 Grand Mean Results
Most score differences across languages are not statistically significant; those that are do not pattern in the expected direction.
ΔC-%V: %V is lower in German than in English and Italian; there is no difference between English and Italian.
VarCos: German VarCoC is lower than English VarCoC, but not different from Italian; there is no difference between English and Italian.
PVIs: there are no differences across languages.
L2 RESULTS
There were no significant differences for any of the metrics, either between L1 English and L2 English or between the L2 English of the Italian and German speakers.
There is no evidence that the L2 scores correlate either with the speakers' L1 scores or with the impression their accents made on L1 listeners.

4. Discussion and Conclusion
• The results do not present a clear picture, but largely confirm our hypotheses: the scores from all metrics show substantial inter-speaker variation and are affected by the corpus used to calculate them; the variability is more extensive in the L2 data but shows no consistent pattern.
• These results are not easy to interpret. Although most differences within a language are not statistically significant, the same applies across languages too, so that English and Italian are not differentiated by the metrics. Many of the scores for English and German fall within the space supposedly occupied by syllable-timed languages, while the Greek %V-ΔC scores are very close to those that Ramus et al. (1999) report for Japanese and very different from those reported for Greek by Grabe & Low (2002) and Baltazani (2007).
• The statistically significant differences, both across subsets within a language and across languages, do not pattern in the “right” direction:
  • %V suggests that German is more stress-timed than both Italian and English, while VarCoC indicates that German is more syllable-timed than both English and Italian. These differences clearly stem from the variety of English examined here, in which consonant cluster simplification is widespread (e.g. Andy > [æni]), and from the fact that we included geminates in our Italian corpus. This in turn supports the view that the scores depend a great deal on the materials used to calculate them.
• Although more speakers and larger corpora are clearly needed to confirm these results, overall they point to the inability of metrics to reflect even basic timing properties of speech.
• The results also suggest that rhythm cannot easily be reduced to measures of consonantal and vocalic variability, largely because the phonological properties that the metrics purport to capture are too abstract to be directly and solely expressed as segmental duration.
• On the other hand, segmental timing is also affected by factors that are not straightforwardly related to syllable structure, such as the extent of phrase-final lengthening, the phonetic inventory of the language, the extent of vowel elision and so on.
• Thus our results suggest revisiting alternative ways of viewing rhythm, such as the proposals of Dauer (1983) and Arvaniti (1994) that rhythm in all languages is based on grouping, accentuation and the alternation of levels of relative prominence.

References
Arvaniti, A. 1994. Acoustic Features of Greek Rhythmic Structure. Journal of Phonetics 22.239-268.
Baltazani, M. 2007. Prosodic Rhythm and the Status of Vowel Reduction in Greek. Selected Papers on Theoretical and Applied Linguistics from the 17th International Symposium on Theoretical & Applied Linguistics, vol. 1, 31-43. Thessaloniki.
Bertinetto, P. M. 1989. Reflections on the Dichotomy ‘Stress’ vs. ‘Syllable-timing’. Revue de Phonétique Appliquée 91-92-93.99-130.
Dauer, R. M. 1983. Stress-timing and Syllable-timing Reanalyzed. Journal of Phonetics 11.51-62.
Grabe, E. & E. L. Low. 2002. Acoustic Correlates of Rhythm Class. In C. Gussenhoven & N. Warner (eds.), Laboratory Phonology 7, 515-546. Berlin, New York: Mouton de Gruyter.
Ramus, F., M. Nespor & J. Mehler. 1999. Correlates of Linguistic Rhythm in the Speech Signal. Cognition 73.265-292.
White, L. & S. L. Mattys. 2007. Calibrating Rhythm: First Language and Second Language Studies. Journal of Phonetics 35.501-522.
