Understanding Phonetic Categories in Speech Perception

What are “phonetic categories”?

Sarah Hawkins (Phonetics Laboratory, Dept. of Linguistics, University of Cambridge, UK; sh110@cam.ac.uk) & Ingrid Johnsrude (Dept. of Psychology, Queen’s University, Canada; ij4@post.queensu.ca)

The problem

Models of speech perception often emphasize phonetic or phonological categories (features, phonemes, gestures) that:
• are stable, abstract entities;
• result from stripping (irrelevant) variation from the speech stream;
• are prerequisite to the processing of other aspects of speech (grammar and meaning).

But:
1) phonemic category boundaries shift with phonetic context, meaning, and the function of the utterance;
2) much variability in speech sounds is systematic and potentially informative about features of speech other than phonemic categories.

[Figure: the “bottom-up” sequence of processing levels typically assumed. From top to bottom: meaning; grammar; abstract word representations; abstract phonemes; abstract initial categories, e.g. phonological features. The upper levels (meaning, grammar; marked “? ? ?”) are labelled “Usually neglected by models of speech perception”; the lower levels are labelled “Standard domain of models of speech perception”. “Top-down” influences are poorly understood but are typically assumed to be separable from bottom-up processes.]

1) Phonemic category boundaries are context-dependent; thus not stable

Range & frequency effects: category boundaries tend towards the middle of the stimulus series. When stimuli are removed from one end of a continuum, the boundary shifts towards the other end. Stimulus frequency (& the previous stimulus) affect the current decision [1-4].

Meaning: phonemic boundaries favour the phoneme related to the word in word-nonword continua [5]; they favour sensible meanings in word-word continua in sentences [6].

Perceptual learning: rather little exposure to a novel pronunciation is required for a phonemic category boundary to shift [3,7-9].

[Figure: a phonemic category boundary shift due to the Ganong effect, for the continua dask-task and dash-tash. Vertical axis: % /d/ responses (0, 50, 100). Horizontal axis: short VOT (d) to long VOT (t).]

2) Acoustic variability is systematic and potentially informative

The acoustic realization of a phoneme is systematically influenced by [10,11]:
1) allophonic variation:
• position in the syllable (e.g. “tip” vs “pit”)
• boundaries between words (e.g. “grey train” vs “great rain”)
• grammatical status (e.g. the productivity of a morpheme; content vs function words)
2) speaker intent & register (discourse function, casualness, rate)
3) talker identity

Experiments show listeners use much of this systematic variability [12-17].

Examples: Grammatical information conveyed by systematic acoustic-phonetic variation

Speakers exhibit different assimilation in function and content words. E.g. /m/ assimilates to the place of the next consonant in I’m but not in lime or crime [11]:
• I’m blowing / going / watching: aIm, aI)N, aI)w
• lime bark, lime goes, crime wave: aIm

In principle, the acoustic pattern can be used by the listener to inform about the grammatical class of the speech segment being perceived. In its place in an utterance, ‘I’m’ has few or no acoustic competitors.

Syllable-internal spectro-temporal relationships indicate morphemic productivity. Spectrograms of ‘mistimes’ & ‘mistakes’ from ‘I’d be surprised if Tess___ it.’ The first four phonemes (/mist/) are the same. Their acoustic differences produce a different rhythm that may signal that ‘mis’ in ‘mistimes’ is a productive morpheme, whereas ‘mis’ in ‘mistakes’ is not. This systematic acoustic variation has implications for models of word recognition incorporating lexical competition [15,18-20].

[Figure: spectrograms and waveforms with segment labels t ɛ s m ɪ s t a ɪ m z ɪ t (‘Tess mistimes it’, productive morpheme) and t ɛ s m ɪ s t e ɪ k s ɪ t (‘Tess mistakes it’, unproductive morpheme). Events 1-4 are marked on each waveform; event 2 is marked ‘2?’ on one of them.]

Speech acoustics inform about multiple linguistic levels simultaneously [12,13,34]

Perceptual information available in the short sections of sound, ‘mist’, taken from ‘Tess mistimes it’ and ‘Tess mistakes it’. Information about featural, phonemic and lexical identity, and syllabic, morphemic and grammatical structure is conveyed simultaneously in the fine acoustic-phonetic detail, comprising both ‘events’ at segment boundaries and longer-term relationships. In the poster’s table, bold font marks nodes in linguistic structure, i.e. potential perceptual units.

Events (see waveforms)

1. Acoustic cue: periodic, nasal
• mistimes. Perceptual correlate: new syllable (simple onset); morpheme; word; poor segment identity
• mistakes. Perceptual correlate: same as ‘mistimes’

2. Acoustic cue: nasal-oral boundary + formant definition
• mistimes. Nature: abrupt; clear. Perceptual correlate: features for [m]; phoneme /m/? high front vowel?
• mistakes. Nature: unclear; unclear. Perceptual correlate: features for nasal? labial?? high vowel? front vowel??

3. Acoustic cue: frication start
• mistimes. Nature: rel. late. Perceptual correlate: syllable coda starts; rhyme has voiceless coda??; features for [s]; phoneme /s/? syllable is unstressed (weak, light)??
• mistakes. Nature: rel. early. Perceptual correlate: same as ‘mistimes’ except: syllable is unstressed (weak)?

4. Acoustic cue: fricative-silence boundary
• mistimes. Nature: rel. early. Perceptual correlate: phoneme /s/; voiceless coda; coda ends?; new syllable? features for [t]?; morpheme ends?? productive morpheme/same word??
• mistakes. Nature: rel. late. Perceptual correlate: phoneme /s/; features for mis, maybe dis; features for [t]?; syllable coda continues?? morpheme continues (is nonproductive)??

Relationships

Acoustic cue: relative durations, sonorant:sibilant
• mistimes. Nature: 1:1. Perceptual correlate: weak light syllable?
• mistakes. Nature: 1:2. Perceptual correlate: weak heavy syllable

Acoustic cue: relative durations, sonorant:sibilant plus sibilant:silence
• mistimes. Nature: 1:1; 2:1. Perceptual correlate: weak, light syllable; productive morpheme mis? (dis??); silence + intonation heralds new syll. onset, new foot?
• mistakes. Nature: 1:2; 3:1. Perceptual correlate: weak heavy syllable; strong (stressed) syll. onset of same word (monomorphemic polysyllable)?; defocussed verb missed??

Acoustic cue: transient + aspiration
• mistimes. Nature: long. Perceptual correlate: confirms productive morpheme mis (dis??); new strong syllable onset [tʰ], new foot, new morpheme, same polymorphemic word; features for [tʰ]; phoneme /t/
• mistakes. Nature: short. Perceptual correlate: new strong syll. onset [st]; new foot. Confirms monomorphemic word beginning mis(t), vis, bis (dis?); features for [t]; phoneme /t/

Prior knowledge is required for linguistic information, at all levels, to be extracted from sensory input. No unit is identifiable independent of context, and no unit/level is primary. Information is mapped onto prosodic structures linked to grammatical structures [12,13,34] (example at http://kiri.ling.cam.ac.uk/sarah/docs/CNS06trees.pdf).

Anatomical considerations

The anatomical organization of the brain is not consistent with serial, feedforward models of speech perception [19]. The multiple cognitive processes required for speech comprehension probably rely on multiple cortical networks that operate in parallel. This functional organization may, in humans, map onto anatomically segregated, hierarchically organized processing streams similar to those identified in macaque monkeys [21,22]. Different pathways may be differentially specialized to serve different processes or operate on complementary representations of speech (e.g. articulatory; phonological; crossmodal) [23].

[Figure: anatomical organization of the macaque cortex suggests four or five discrete, hierarchically organized stages of auditory processing between primary core and frontal cortex [21].]

[Figure: temporofrontal connections are parallel among multiple levels of auditory cortex (belt to superior temporal sulcus), segregated, bidirectional, and follow a strict anterior-posterior topographic organization [24,25].]

Figure sources: Petrides & Pandya (1988). J Comp Neurol 273: 52-66; Seltzer & Pandya (1989). J Comp Neurol 281: 97-113.

Anatomical considerations (cont)

Results of functional neuroimaging studies of speech perception are consistent with multiple, parallel, cascaded auditory streams of processing [22,23,26-28].

Information flow in the auditory system is not unidirectional. Cortical feedforward connections each have their feedback complement [29-31].

Anatomy suggests converging influences from multiple higher stages of perception, removed from the stage in question by zero, one, or more intervening stages [21].

Neurophysiological studies suggest that information in even core auditory cortex regions is integrated over multiple time domains [32,33].

Conclusions

Phonetic and anatomical data are consistent with the hypotheses that:
• fine phonetic detail informs about perceptual units at multiple linguistic ‘levels’ (phonetics/phonology/grammar/meaning) simultaneously
• and thus over different time domains (variable grain sizes) [12,13,35]

Hence a phonetic category:
• is relational & plastic: each element is bound with other elements (larger, smaller) and no element can be described independently of its prosodic, grammatical, & functional context
• entails cognitively and neuropsychologically distributed processes which operate on different types of information [13,36,37]

Some implications for models of speech perception:
• speech perception, like visual object perception [38,39], may conform to Bayesian models: e.g. hypotheses about speech segmental identity (at multiple scales of temporal integration) may be generated by ‘higher-order’ regions and tested in ‘lower-order’ regions.
• major challenges for the next generation of models include:
  • use of acoustic-phonetic information at all linguistic levels
  • long-range phonetic dependencies, at all linguistic levels

References

1. Parducci, A. (1974) Psychophysical Judgment & Measurement, ed. Carterette/Friedman, 127-141.
2. Rosen, S. (1979) J. Phonetics 7, 393-402.
3. Pastore, R. (1987) Categorical Perception, ed. Harnad. Cambridge. 29-52.
4. Hawkins/Stevens (1985) J. Acoust. Soc. Am. 77, 1560-75.
5. Ganong, W.F. (1980) J. Exp. Psych.: HPP 6, 110-125.
6. Borsky, S. et al. (2000) J. Psycholing. Res. 29, 155-168.
7. Ladefoged/Broadbent (1957) J. Acoust. Soc. Am. 29, 98-104.
8. Norris, D. et al. (2003) Cognit. Psychol. 47, 204-38.
9. Eisner/McQueen (2006) J. Acoust. Soc. Am. 119, 1950-3.
10. Abercrombie, D. (1967) Elements of General Phonetics.
11. Local, J.K. (2003) J. Phonetics 31, 321-339.
12. Hawkins/Smith (2001) Italian J. Linguistics 13, 99-188.
13. Hawkins, S. (2003) J. Phonetics 31, 373-405.
14. Pisoni, D.B. (1997) Talker Variability in Speech Processing, ed. Johnson, JW, Academic. 9-32.
15. Davis et al. (2002) J. Exp. Psych.: HPP 28, 218-244.
16. Kemps, R. et al. (2005) Mem. Cognit. 33, 430-46.
17. Salverda, A. et al. (2003) Cognition 90, 51-89.
18. Marslen-Wilson (1990) In Cognitive Models of Speech Processing, ed. Altmann, Cambridge. 148-172.
19. Norris, D. (1994) Cognition 52, 189-234.
20. McClelland/Elman (1986) Cognit. Psychol. 18, 1-86.
21. Kaas, J. et al. (1999) Curr. Opin. Neurobiol. 9, 164-170.
22. Davis/Johnsrude (2003) J. Neurosci. 23, 3423-31.
23. Scott/Johnsrude (2003) Trends Neurosci. 26, 100-7.
24. Petrides/Pandya (1988) J. Comp. Neurol. 273, 52-66.
25. Seltzer/Pandya (1989) J. Comp. Neurol. 281, 97-113.
26. Davis/Johnsrude/Horwitz (2004) Soc. Neurosci. Ann. Mtg.
27. Rodd, R. et al. (2005) Cereb. Cortex 15, 1261-9.
28. Buchsbaum, B.R. et al. (2005) Neuron 48, 687-97.
29. Pickles, J. (1988) An Introduction to the Physiology of Hearing. London: Academic Press.
30. Pandya, D.N. (1995) Rev. Neurol. 151, 486-494.
31. de la Motte, L. et al. (2006) J. Comp. Neurol. 496, 27-71.
32. Nelken, I. et al. (2003) Biol. Cybern. 89, 397-406.
33. Ulanovsky, N. et al. (2004) J. Neurosci. 24, 10440-53.
34. Ogden, R. et al. (2000) Comput. Sp. & Lang. 14, 177-210.
35. Boemio, A. et al. (2005) Nat. Neurosci. 8, 389-95.
36. Andruski, J. et al. (1994) Cognition 52, 163-187.
37. Blumstein, S. et al. (2005) J. Cog. Neurosci. 17, 1353-66.
38. Murray, S. et al. (2002) Proc. Natl. Acad. Sci. 99, 5164-9.
39. Kersten, D. et al. (2004) Ann. Rev. Psychol. 55, 271-304.

Funded in part by the Leverhulme Trust.
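
Illustrative sketch: the poster reports that a phonemic category boundary shifts with lexical context (the Ganong effect, dask-task vs dash-tash) and suggests that speech perception may conform to Bayesian models. The Python sketch below shows one way those two ideas could fit together; it is not the authors' model. It assumes, purely for illustration, that /d/ and /t/ are equal-variance Gaussian categories along VOT, that lexical context acts as a prior on /d/, and that the means, spread, and prior values (all hypothetical) are as given in the code.

```python
import math

# Hypothetical category parameters (ms of VOT); illustrative values only,
# not taken from the poster or from any data set.
MU_D, MU_T, SD = 20.0, 60.0, 15.0


def gaussian(x, mu, sd):
    """Gaussian probability density at x."""
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def p_d(vot, prior_d=0.5):
    """Posterior probability of /d/ given a VOT value and a prior on /d/.

    The prior stands in for lexical context (an assumption of this sketch):
    it is above 0.5 when only the /d/ reading is a word (dash-tash) and
    below 0.5 when only the /t/ reading is a word (dask-task).
    """
    joint_d = prior_d * gaussian(vot, MU_D, SD)
    joint_t = (1.0 - prior_d) * gaussian(vot, MU_T, SD)
    return joint_d / (joint_d + joint_t)


def boundary(prior_d=0.5):
    """VOT at which the posterior for /d/ equals 0.5.

    For equal-variance Gaussians, setting the log posterior odds to zero
    gives the midpoint plus a term proportional to the log prior odds.
    """
    log_prior_odds = math.log(prior_d / (1.0 - prior_d))
    return (MU_D + MU_T) / 2.0 + SD ** 2 * log_prior_odds / (MU_T - MU_D)


if __name__ == "__main__":
    contexts = [("dash-tash (word favours /d/)", 0.7),
                ("neutral", 0.5),
                ("dask-task (word favours /t/)", 0.3)]
    for label, prior in contexts:
        print(f"{label}: boundary at {boundary(prior):.1f} ms VOT")
```

Under these assumptions the boundary sits at the midpoint between the two category means when the prior is neutral, moves towards longer VOT when the context favours /d/, and moves towards shorter VOT when it favours /t/, which is the direction of the shift shown in the poster's figure.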
