A caveat about ‘cognition’

The Complexities of Understanding Speech in Background Noise
Stuart Rosen
UCL Speech, Hearing and Phonetic Sciences
First International Conference on Cognitive Hearing Science for Communication

A caveat about ‘cognition’
• Important aspects of this problem are not ‘cognitive’, but …
• Cognitive processing …
  • relies on adequate sensory representations, and …
  • can compensate for impoverished sensory representations.

Why is this interesting?
• Most speech is not heard in quiet, anechoic conditions.
• People vary a lot in how well they can understand speech in the presence of other sounds:
  • Effects of hearing impairment
  • Effects of age
  • Auditory processing disorder (APD)?

Some determinants of performance: I
• The nature of the target speech material
  • Predictability
    • context
    • number of alternative utterances
    • frequency of usage
    • size of lexical ‘neighbourhoods’

Some determinants of performance: II
• The configuration of the environment
  • Open air or in a room?
  • How ‘dry’ is the room?
    • effects of reverberation
  • Spatial separation between target and noise
• Or, the transmission system (e.g. mobile telephone)
  • distortion & noise added by the system

Some determinants of performance: III
• Talker characteristics
  • Different talkers vary considerably in intrinsic intelligibility
  • Talkers vary their own speech depending on the demands of the situation
    • the hyper-/hypo-articulation distinction of Lindblom (1990)
  • Match between talker and listener accents

Some determinants of performance: IV
• Listener characteristics
  • Linguistic development
    • vocabulary knowledge
    • ability to use context
    • the presence of language impairments
  • L1 vs L2
  • Hearing sensitivity and any hearing prosthesis used
  • Neuro-developmental disorders
    • Language impairment
    • Autism spectrum disorder
    • APD

Some determinants of performance: V
• The nature of the background noises
  • level (SNR)
  • fluctuations in level
  • spectral characteristics
  • genuine ‘noise’: aperiodic or periodic?
  • and/or other talkers
    • how many there are
    • speaking your own language or a language you don’t know
  • How ‘attention-grabbing’ the background noises are

The simplest case: a steady-state background noise

Much is understood about what makes one steady noise more or less interfering than another:
• spectral shape
• SNR

‘Energetic’ masking
• Noises interfere with speech to the extent that they have energy in the same frequency regions.
• This can be quantified by the ‘articulation index’.
• It reflects the direct interaction of masker and speech in the cochlea, which acts as a frequency analyser.
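A minimal sketch of the idea behind the articulation index: each frequency band contributes in proportion to an importance weight and to how far the speech sits above the noise in that band. The band levels and equal importance weights below are hypothetical, and the (SNR + 15) / 30 audibility mapping is borrowed from the later Speech Intelligibility Index convention; the standard procedures also model spread of masking, hearing thresholds and level effects, all omitted here.

```python
# Simplified articulation-index / SII-style calculation (illustrative sketch).
# Assumptions: band levels in dB; hypothetical importance weights summing to 1;
# audibility mapped linearly over a 30 dB range, as in the SII convention.


def band_audibility(snr_db: float) -> float:
    """Map a band SNR (dB) to an audibility value clipped to [0, 1]."""
    return min(max((snr_db + 15.0) / 30.0, 0.0), 1.0)


def articulation_index(speech_db, noise_db, importance):
    """Importance-weighted sum of band audibilities."""
    if not (len(speech_db) == len(noise_db) == len(importance)):
        raise ValueError("all inputs need one value per band")
    if abs(sum(importance) - 1.0) > 1e-9:
        raise ValueError("band-importance weights should sum to 1")
    return sum(
        w * band_audibility(s - n)
        for s, n, w in zip(speech_db, noise_db, importance)
    )


# Hypothetical 4-band example with equal importance weights.
weights = [0.25, 0.25, 0.25, 0.25]
speech = [60.0, 60.0, 60.0, 60.0]
steady_noise = [75.0, 60.0, 50.0, 40.0]  # band SNRs: -15, 0, +10, +20 dB
print(round(articulation_index(speech, steady_noise, weights), 3))  # 0.583
```

In this example the lowest band is fully masked and contributes nothing, while the highest band is fully audible; changing the noise spectrum, not just its overall level, changes the index.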

But noises are typically not steady …

Fluctuating maskers afford ‘glimpses’ of the target signal
[Figure: target and masker, with the glimpses of the target marked]

‘Dip listening’ or ‘glimpsing’
• People with normal hearing can listen in the ‘dips’ of an amplitude-modulated masker.
[Figure: SRT for VCVs in simple on/off fluctuations as a function of the duration of the fluctuation; arrow marks better performance. Howard-Jones & Rosen (1993), Acustica]
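The simple on/off maskers can be sketched as square-wave gating of a noise. This is a minimal illustration, assuming white (not speech-shaped) noise and a hypothetical `period_s` parameter for one on+off cycle; it is not the stimulus-generation code of the original study.

```python
import numpy as np


def gated_noise(duration_s, fs, period_s, duty=0.5, seed=0):
    """White noise switched fully on and off by a square wave.

    Returns (masker, gate): gate is 1.0 during 'on' samples and 0.0 in the
    'dips'. period_s is the length of one on+off cycle (hypothetical
    parameterisation).
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    # Position within each cycle, in [0, 1); 'on' for the first `duty` fraction.
    phase = (t / period_s) % 1.0
    gate = (phase < duty).astype(float)
    return rng.standard_normal(n) * gate, gate


masker, gate = gated_noise(duration_s=1.0, fs=16000, period_s=0.1)
print(f"fraction of time the masker is on: {gate.mean():.2f}")
```

Shortening `period_s` makes each dip briefer, which is the kind of fluctuation-duration manipulation plotted in the figure.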

‘Dips’ can also be limited in frequency (‘checkerboard noise’)
[Figure: SRT for VCVs in 10 Hz modulations with different numbers of channels; arrow marks better performance. Howard-Jones & Rosen (1993), JASA]
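A ‘checkerboard’ masker can be sketched by splitting noise into frequency bands and gating adjacent bands in antiphase, so that a dip in one band coincides with energy in its neighbours. This sketch assumes brick-wall FFT filtering, the 10 Hz square-wave gate from the slide, and hypothetical band edges; the original stimuli may have been generated differently.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass: zero every FFT bin outside [lo, hi)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def checkerboard_noise(duration_s, fs, edges_hz, mod_rate_hz=10.0, seed=0):
    """Band-limited noises gated by a square wave, alternating phase by band.

    edges_hz lists the band edges, so the number of channels is
    len(edges_hz) - 1 (hypothetical parameterisation).
    """
    rng = np.random.default_rng(seed)
    n = int(round(duration_s * fs))
    t = np.arange(n) / fs
    on = ((t * mod_rate_hz) % 1.0) < 0.5  # square-wave gate, 50% duty
    out = np.zeros(n)
    for i, (lo, hi) in enumerate(zip(edges_hz[:-1], edges_hz[1:])):
        band = bandpass_fft(rng.standard_normal(n), fs, lo, hi)
        gate = on if i % 2 == 0 else ~on  # adjacent bands in antiphase
        out += band * gate
    return out


masker = checkerboard_noise(1.0, 16000, edges_hz=[100, 500, 1500, 4000])
```

Increasing the number of bands makes the spectral dips narrower while keeping the temporal gating rate fixed.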

But maskers can be periodic too, most importantly when the background is itself speech.

Miller (1947), ‘The masking of speech’:
‘It has been said that the best place to hide a leaf is in the forest, and presumably the best place to hide a voice is among other voices.’

Listening to speech in ‘noise’
[Audio examples: Children’s Coordinate Response Measure (‘Bouncy’) in quiet, in steady noise, in modulated noise, and against another talker]

A useful distinction
• Energetic masking
  • maskers interfere with speech to the extent that they have energy in the same time/frequency regions
  • primarily reflects the direct interaction of masker and speech in the cochlea
  • relevance of glimpsing/dip listening: temporal and/or spectral ‘dips’ in the masker allow ‘glimpses’ of the target speech
• Informational masking: everything else!
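The glimpsing idea can be made concrete by counting the time-frequency cells in which the target dominates the masker. This sketch is loosely in the spirit of glimpsing models; the 3 dB local-SNR criterion, the frame sizes, and the use of noise as a stand-in for speech are all assumptions for illustration. Target and masker must have the same length.

```python
import numpy as np


def stft_power(x, frame_len=512, hop=256):
    """Power spectrogram (frames x frequency bins) using a Hann window."""
    x = np.asarray(x, dtype=float)
    if len(x) < frame_len:
        raise ValueError("signal shorter than one analysis frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def glimpse_proportion(target, masker, threshold_db=3.0, eps=1e-12):
    """Fraction of time-frequency cells where the target exceeds the masker
    by more than threshold_db (a crude 'glimpse' count)."""
    local_snr_db = 10 * np.log10(
        (stft_power(target) + eps) / (stft_power(masker) + eps)
    )
    return float(np.mean(local_snr_db > threshold_db))


rng = np.random.default_rng(1)
fs = 16000
target = rng.standard_normal(fs)         # noise stand-in for target speech
steady = rng.standard_normal(fs)
t = np.arange(fs) / fs
on = ((t * 10.0) % 1.0) < 0.5            # 10 Hz on/off gate
modulated = steady * on * np.sqrt(2.0)   # roughly the same overall power as `steady`
print(glimpse_proportion(target, steady), glimpse_proportion(target, modulated))
```

Gating the masker at the same overall power should raise the proportion of glimpsed cells, which is the energetic-masking account of the dip-listening benefit.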

Informational masking
• Something to do with target/masker similarity?
• Signal and masker ‘are both audible but the listener is unable to disentangle the elements of the target speech from a similar-sounding distracter’ (Brungart, 2001)

Informational masking: a finer distinction (Shinn-Cunningham, 2008)
• Problems in ‘object formation’
  • related to auditory scene analysis (ASA)
  • similarities in auditory properties make segregation difficult: voice pitch, timbre, rate
• Problems in ‘object selection’
  • related to attention and distraction
  • the masker may distract attention from the target
  • e.g., more interference from a known as opposed to a foreign language
[Audio examples: 1 woman, 1 man; 2 men]

EM & IM appear to operate at different stages of the auditory pathway
• Energetic masking (EM) at the periphery, in the cochlea
  • Early-developing abilities
  • Increased EM from hearing impairment
  • Unlikely to be a factor in APD
• Informational masking (IM) at higher centres
  • Late-developing abilities?
  • Increased IM in older listeners?
  • Increased IM in developmental disorders?
• But aspects of IM can be made difficult by peripheral factors
  • e.g., cochlear implant (CI) users’ difficulties with auditory scene analysis

Little glimpsing for CI users: Nelson et al. (2003)
[Figure: IEEE sentences with added speech-spectrum-shaped masking noise, square-wave modulated; includes normal listeners; arrow marks better performance]
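Presenting a masker ‘added to’ sentences at a chosen SNR comes down to scaling the masker relative to the target. A minimal sketch, assuming SNR is defined from the overall RMS of each signal; for a square-wave modulated masker, whether the level refers to the whole masker or only its ‘on’ portions is a design choice that changes the effective SNR during the ‘on’ periods.

```python
import numpy as np


def rms(x):
    """Root-mean-square level of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


def mix_at_snr(target, masker, snr_db):
    """Add `masker` to `target`, scaled so that the overall target-to-masker
    RMS ratio equals snr_db. Returns (mixture, scaled_masker)."""
    target = np.asarray(target, dtype=float)
    masker = np.asarray(masker, dtype=float)
    if len(masker) < len(target):
        raise ValueError("masker must be at least as long as the target")
    masker = masker[: len(target)]
    gain = rms(target) / (rms(masker) * 10 ** (snr_db / 20))
    scaled = gain * masker
    return target + scaled, scaled


rng = np.random.default_rng(0)
sentence = rng.standard_normal(16000)  # stand-in for a recorded sentence
noise = rng.standard_normal(20000)
mixture, scaled = mix_at_snr(sentence, noise, snr_db=-6.0)
print(round(20 * np.log10(rms(sentence) / rms(scaled)), 2))  # -6.0
```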

CI users have not only poor frequency selectivity but also a lack of voice-pitch sensation (poor perception of temporal fine structure, TFS), which makes auditory scene analysis difficult: how do you tell the noise from the speech?
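To make the envelope/TFS distinction concrete: a narrow-band signal can be split into a slowly varying Hilbert envelope and a rapidly varying TFS (the cosine of the instantaneous phase). Noise vocoders, commonly used to simulate CI hearing, keep roughly the former and discard the latter. This sketch assumes a single narrow-band input; in practice the split is done per analysis band. The FFT construction is the standard one, also used by `scipy.signal.hilbert`.

```python
import numpy as np


def analytic_signal(x):
    """Analytic signal via the FFT: keep DC (and Nyquist), double positive
    frequencies, remove negative frequencies."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    h = np.zeros(n)
    if n % 2 == 0:
        h[0] = h[n // 2] = 1.0
        h[1 : n // 2] = 2.0
    else:
        h[0] = 1.0
        h[1 : (n + 1) // 2] = 2.0
    return np.fft.ifft(np.fft.fft(x) * h)


def envelope_and_tfs(x):
    """Split a narrow-band signal into Hilbert envelope and fine structure."""
    z = analytic_signal(x)
    return np.abs(z), np.cos(np.angle(z))


fs = 16000
t = np.arange(fs) / fs
# 1 kHz carrier with a 4 Hz amplitude modulation.
am_tone = (1 + 0.5 * np.cos(2 * np.pi * 4 * t)) * np.cos(2 * np.pi * 1000 * t)
env, tfs = envelope_and_tfs(am_tone)
```

Here `env` recovers the 4 Hz modulation and `tfs` the 1 kHz carrier; a vocoder-style CI simulation would keep something like `env` in each band and replace `tfs` with noise.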

But IM can be excessive even with normal hearing …

Children find it hard to ignore another talker

Slow development of abilities that minimise IM

Increased IM in Specific Language Impairment (SLI)
[Figure: CCRM sentences in steady noise and …ed speech; 9 SLI & 10 typically developing (TD) children aged 6-10 years; arrow marks better performance. MSc work of Csaba Redey-Nagy]

Increased IM in some people with High Functioning Autism (HFA)
• Evidence for a temporal processing deficit, but …
• not the crucial factor in excessive masking for speech
[Figure: CCRM sentences in various backgrounds; groups: control, better HFA, worse HFA. PhD work of Katharine Mair]

Increased IM in some people with High Functioning Autism (HFA)
• HFA poor performers (and younger children) are highly susceptible to informational masking … but what aspect? ASA? Attention? Linguistic aspects?
[Figure: CCRM sentences in various backgrounds; groups: control, better HFA, worse HFA. PhD work of Katharine Mair]

An ecologically valid test bed for evaluating the roles of EM and IM: speech in n-talker babble for n = 1, 2, 3 … ∞ talkers
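n-talker babble of this kind can be sketched by summing n different talkers after equalising each to the same RMS level, so that no single voice dominates; as n grows, the dips of individual voices tend to fill in. This is an illustrative sketch with hypothetical inputs (`talkers` is a list of recordings, here replaced by noise); how the original babbles were constructed is not specified here.

```python
import numpy as np


def make_babble(talkers, n, seed=0):
    """Sum n randomly chosen talkers, each equalised to unit RMS, and
    renormalise the result to unit RMS.

    `talkers` is a list of 1-D arrays, one recording per talker
    (hypothetical input format). All are truncated to the shortest length.
    """
    if n > len(talkers):
        raise ValueError("not enough talkers for the requested babble")
    rng = np.random.default_rng(seed)
    length = min(len(x) for x in talkers)
    chosen = rng.choice(len(talkers), size=n, replace=False)
    babble = np.zeros(length)
    for idx in chosen:
        x = np.asarray(talkers[idx], dtype=float)[:length]
        babble += x / np.sqrt(np.mean(x ** 2))
    return babble / np.sqrt(np.mean(babble ** 2))


rng = np.random.default_rng(2)
# Stand-ins for recordings of eight different talkers (hypothetical data).
talkers = [rng.standard_normal(8000 + 100 * k) for k in range(8)]
babble_4 = make_babble(talkers, n=4)
```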

Miller (1947): increasing the number of talkers in the masker
• ‘It is relatively easy for a listener to distinguish between two voices, but as the number of rival voices is increased the desired speech is lost in the general jabber.’
• Target words from multiple male talkers
• Babble: equal numbers of male and female talkers
• (1 VOICE is male)
[Figure: performance against SNR (dB, +12 to -18); arrow marks better performance]

IEEE sentences in n-talker babble
• What happens as n increases?
  • glimpsing opportunities ↓, so EM ↑
  • linguistic content ↓, so IM ↓ (selection?)
  • number of F0 contours ↑, so IM ↑ (ASA)

1-talker voice-pitch source with envelopes derived from n-talker babble
[Examples: babble-modulated 1-talker F0, with envelopes from 1-talker, 2-talker and 16-talker babble (plus a version with an unmodulated envelope)]
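A sketch of how such maskers might be built: a harmonic complex that follows an F0 contour (the ‘voice-pitch source’) is multiplied by the smoothed envelope of a babble. The F0 contour, the 20-harmonic source and the moving-average envelope smoother (roughly 30 Hz) are assumptions for illustration, not the parameters of the original stimuli.

```python
import numpy as np


def harmonic_source(f0_hz, fs, n_harmonics=20):
    """Harmonic complex following a per-sample F0 contour (array in Hz)."""
    f0_hz = np.asarray(f0_hz, dtype=float)
    phase = 2 * np.pi * np.cumsum(f0_hz) / fs  # integrate instantaneous F0
    k = np.arange(1, n_harmonics + 1)[:, None]
    # Drop any harmonic that would exceed the Nyquist frequency.
    below_nyquist = (k * f0_hz[None, :]) < fs / 2
    return np.sum(np.cos(k * phase[None, :]) * below_nyquist, axis=0)


def smoothed_envelope(x, fs, cutoff_hz=30.0):
    """Full-wave rectify, then smooth with a moving average about
    1/cutoff_hz seconds long (a crude low-pass filter)."""
    win = max(1, int(round(fs / cutoff_hz)))
    kernel = np.ones(win) / win
    return np.convolve(np.abs(x), kernel, mode="same")


def babble_modulated_source(f0_hz, babble, fs):
    """Impose the babble envelope on the F0-contour source; unit RMS output."""
    src = harmonic_source(f0_hz, fs)
    env = smoothed_envelope(babble[: len(src)], fs)
    out = src * env
    return out / np.sqrt(np.mean(out ** 2))


fs = 16000
f0 = np.linspace(100.0, 150.0, fs)   # hypothetical gliding F0 contour
rng = np.random.default_rng(3)
babble = rng.standard_normal(fs)     # stand-in for n-talker babble
masker = babble_modulated_source(f0, babble, fs)
unmodulated = harmonic_source(f0, fs)  # 'unmodulated envelope' version
```

Swapping in babbles built from different numbers of talkers changes only the envelope, while the number of F0 contours stays fixed, which is the logic of separating these two factors.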

2-talker voice-pitch source with envelopes derived from n-talker babble
[Examples: babble-modulated 2-talker F0, with envelopes from 1-talker, 2-talker and 16-talker babble (plus a version with an unmodulated envelope)]

Unintelligible maskers on noise-vocoded IEEE sentences
• Periodicity in the maskers leads to better performance, probably through better ASA.
• It’s easier to ignore a single F0 contour than two, but ...
• Why is performance better with steady-state than with 16-talker envelopes?
• Worse still, why is there glimpsing in noise?!
[Figure: legend: noise, 2 F0 contours, 1 F0 contour; arrow marks better performance]
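Noise vocoding of the target sentences can be sketched as: split the signal into bands, extract each band’s envelope, and use it to modulate band-limited noise, which removes the temporal fine structure. This minimal version assumes brick-wall FFT filters, a moving-average envelope smoother of roughly 30 Hz, and hypothetical band edges; published vocoders usually use proper filter banks and often re-filter the modulated carrier.

```python
import numpy as np


def bandpass_fft(x, fs, lo, hi):
    """Brick-wall band-pass: zero every FFT bin outside [lo, hi)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs >= hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))


def noise_vocode(x, fs, edges_hz, env_cutoff_hz=30.0, seed=0):
    """Replace the fine structure in each band with envelope-modulated noise."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    win = max(1, int(round(fs / env_cutoff_hz)))
    kernel = np.ones(win) / win
    out = np.zeros(len(x))
    for lo, hi in zip(edges_hz[:-1], edges_hz[1:]):
        band = bandpass_fft(x, fs, lo, hi)
        # Envelope: full-wave rectification plus moving-average smoothing.
        env = np.convolve(np.abs(band), kernel, mode="same")
        # Carrier: noise limited to the same band, scaled to unit RMS.
        carrier = bandpass_fft(rng.standard_normal(len(x)), fs, lo, hi)
        carrier /= np.sqrt(np.mean(carrier ** 2)) + 1e-12
        out += env * carrier
    # Match the overall RMS level of the input.
    return out * np.sqrt(np.mean(x ** 2)) / (np.sqrt(np.mean(out ** 2)) + 1e-12)


fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 1000 * t)  # stand-in for a sentence
vocoded = noise_vocode(tone, fs, edges_hz=[100, 500, 1500, 4000])
```

A pure tone comes out as a band of noise: its energy stays in the right channel, but its periodicity (TFS) is gone, which is why F0 cues in the target are largely unavailable after vocoding.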

Final remarks
• The balance of EM & IM effects presumably varies with the age and hearing status of the listener.
• The linguistic effects seen may represent a separate aspect of IM, apart from object formation and selection.
• Unravelling the contributions of the various factors to the masking of speech by other sounds is very important …
• … but very complex!

Thank you very much!
Work supported by:
• UCL Speech, Hearing and Phonetic Sciences
• National Institutes of Health DC006014
• Bloedel Hearing Research Center
Thanks to my collaborators:
• Sophie Scott, Katharine Mair, Tim Green, Csaba Redey-Nagy, Jude Barwell, Zoe Lyall & Arooj Majeed of UCL
• Pam Souza, Northwestern University
