Additivity of auditory masking using Gaussian-shaped tones

Acoustics Research Institute Austrian Academy of Sciences • Additivity of auditory masking • using Gaussian-shaped tones • aLaback, B., aBalazs, P., aToupin, G., bNecciari, T., bSavel, S., bMeunier, S., bYstad, S., and bKronland-Martinet, R. • aAcoustics Research Institute, Austrian Acad. of Sciences, Austria • bLaboratoire de Mécanique et d'Acoustique, CNRS Marseille, France • MULAC Meeting, Vienna • Sept 24rd, 2008 • Bernhard.Laback@oeaw.ac.at • http://www.kfs.oeaw.ac.at

Motivation • Both temporal and frequency masking have been studied extensively in the literature • Very little is known about their interaction, i.e., masking in the time-frequency domain • An accompanying study (Necciari et al., this conference) presents data on time-frequency time-frequency masking caused by a Gaussian-shaped tone pulse (“Gaussian”) • Our aim is to study the additivity of masking from multiple Gaussian maskers • Taken together, these data may serve as a basis to model time-frequency masking in complex signals

Tim-frequency masking frequency time

Outline • 3 steps: • Additivity of temporal masking • Additivity of frequency masking • Additivity of time-frequency masking (not presented today)

Experiment design • Both signal and maskers are Gaussian-windowed tones: with Γ: gamma factor: (Γ = α.f0), where f0 is the tone frequency and α the shape factor • Equivalent rectangular bandwidth ( Γ): 600 Hz • Equivalent rectangular duration: 1.7 ms • Good properties of Gaussian in time-frequency domain: • Minimal spread in time-frequency • Gaussian shape in both time and frequency • A study by van Schijndel et al. (1999) has shown that Gaussian-windowed tones with an appropriate alpha factor may fit the auditory time-frequency window.

Experiment design • Procedure: • 3 interval - 3 AFC (oddity task) • Adaptive procedure: 3 down - 1 up rule (estimates the 79.4% threshold) • 12 turnarounds, the last 8 used to calculate the threshold • Stepsize: 5 dB, halved after 2 turnarounds • Repeated measurements to have at least three stable values • Presented in blocks of equivalent number of maskers • Five subjects, normal hearing according to standard audiometric tests

Additivity of temporal maskingDesign • Frequency (target and maskers): 4000 Hz • Four maskers with time shifts: -24, -16, -8, +8 ms • Maskers nearly equally effective (iterative approach) • Amount of masking:  8 dB • Combinations: “M2-M3”, “M3-M4”, “M1-M2-M3”, “M2-M3-M4”, “M1-M2-M3-M4” M1 M2 M3 T M4 time (ms) -24 0 -16 -8 +8 Δt

Waveform of four maskers at equally effective levels(target at masked threshold for single masker) M4 T M3 M2 M1

p << 0.05 p << 0.05 p << 0.05 p >> 0.05 Additivity of temporal maskingAverage results over five subjects Error bars: 95% confidence intervals p >> 0.05 Empty symbols: measured data Filled symbols: linear additivity model

Additivity of temporal maskingAverage results over five subjects Error bars: 95% confidence intervals

Summary of temporal masking data(average) • No difference between forward and backward maskers • Amount of masking increases with number of maskers: • 2 maskers vs. 1 masker: + 18 dB (p << 0.05) • 3 maskers vs. 2 maskers: + 5 dB (p << 0.05) • 4 maskers vs. 3 maskers: + 11 dB (p << 0.05) • Amount of excess masking (nonlinear additivity) increases with number of maskers • 2 maskers: 14 dB • 3 maskers: 17 dB • 4 maskers: 26 dB • Results qualitatively consistent with literature data using stimuli with no or little temporal overlap of maskers

Additivity of frequency maskingDesign • Target frequency: 5611 Hz • Four simultaneous maskers with frequency separations: -7, -5, -3, +3 erbs • Maskers nearly equally effective • Amount of masking: 8 dB • Combinations: as for temporal masking M1 M2 M3 T M4 Frequency(erb) -7 0 -5 -3 +3 Δf

Additivity of frequency maskingDesign • Cochlear distortions (combination tones) could be detection cues • Therefore, lowpass-filtered background noise was added • The most critical condition (M3+T) wastested with/without noise on two subjects • No difference in threshold: so finally NO masking noise!

Additivity of frequency maskingAverage results over five subjects Error bars: 95% CI Empty symbols: measured data Filled symbols: linear additivity model

Summary of frequency masking data(average) • Amount of masking depends on maskers involved: • M2-M3 vs. single: 3 dB (p < 0.05) • M3-M4 vs. single: 15 dB (p << 0.05) • M1-M2-M3 vs. M2-M3: 5 dB (p < 0.05) • M2-M3-M4 vs. M3-M4: 0 dB (p > 0.05) • M2-M3-M4 vs. M2-M3: 14 dB (p << 0.05) • M1-M2-M3-M4 vs. M1-M2-M3: 9 dB (p << 0.05) • M1-M2-M3-M4 vs. M2-M3-M4: 0 dB (p > 0.05) • Excess masking (nonlinear additivity) mainly occurring when higher-frequency masker (M4) included • Pairs: 2-3: 0 dB, 3-4:15 dB • Triples: 1-2-3: 5 dB, 2-3-4: 13 dB • Quadruple: 14 dB

Waveform of four maskers at equally effective levels(target at masked threshold for single masker) M2 M3 M1 M4 T Maskers M1,M2, and M3 overlap with each other, but not with M4

Discussion and Conclusions • Strong excess masking for Gaussian maskers if they are physically non-overlapping • Amount of excess masking increases monotonically with number of non-overlapping maskers • Excess masking is thought to be related to the compressivity of BM vibration (e.g. Humes and Jesteadt, 1989) • Thus, our Gaussians seem to be subject to BM compression, even though they are rather short (ERD = 1.7 ms) • This is consistent with the physiological finding that the BM starts to be highly compressive already 0.5 to 0.7 ms after the onset of a signal (Recio et al., 1998)

Modeling of Results Linear Energy Summation Model • Assumption: Masked threshold proportional to masker energy at out put of integrator stage • Combining two equally effective maskers A and B should produce X + 3 dB of masking • Valid for completely overlapping maskers Nonlinear Model • Assumption: Compressive nonlinearity in auditory system is preceding the integrator stage • Combining maskers A and B results in more than linear additivity (excess masking) • Valid for non-overlapping maskers

Modeling of Results • General form: where • MA, B: Amount of masking produced by maskers A or B MAB: Amount of masking produced by the combination of maskers A and B J: Compressive nonlinearity in peripheral auditory processing

Modeling of Results • Power-law model (Lutfi, 1980): • for p = 1: linear model • for p < 1: compressive model MTX: Masked threshold of masker X • Modified Power-law model (Humes et al., 1989): • Threshold in quiet (QT) considered as “internal noise”

Start with Temporal Masking: → perfect masker separation • Power Model: • best fit for p = 0.2 • Mean error: 1.9 dB • Modified power model: • Prediction always too low

Include Correction for Quiet Threshold: -7 dB • Power Model: • Mean error: 1.9 dB • Modified power model: • Mean error: 1.6 dB Why correction required? → Probably, absolute thresholds for Gaussians are no good approximation for internal noise

Spectral Masking: Using same p-value (0.2) and threshold correction • Power Model: • Good fit only for M3M4 (non-overlapping) • Modified power model: • too high predictions Adjustment of parameters required!

p-values optimized for Modified Power model

Some questions • Can we derive appropriate p-values from amount of overlap between maskers? • Can the (modified) power model be included into the Gabor-Multiplier framework to predict time-frequency masking effects for complex signals?

More experiments to test the model

Acknowledgements • We would like to thank • the subjects for their patience • Piotr Majdak for providing support in the development of the software for the experiments • Work partly supported by WTZ (project AMADEUS) and WWTF (project MULAC)

End of talk

p-values optimized for Power model

Time-frequency conditions frequency time

Additivity of auditory masking using Gaussian-shaped tones

Additivity of auditory masking using Gaussian-shaped tones

Presentation Transcript

Recognition of Isolated Instrument Tones

MASKING

Solution of linear equations using Gaussian elimination

TONES, TONIC SYLLABLES TONES UNITS

Low Cost Crowd Counting using Audio Tones

Earth Tones

Enharmonic Tones

Gaussian 03 Calculations using GridNexus

Speaker Identification using Gaussian Mixture Model

RDBE Spurious Tones

Tones

Analysis of Additivity in OLAP Systems

Masking:

Additivity of Heats of Reaction: Hess’s Law

tones

Solution of linear equations using Gaussian elimination

COMBINATION TONES

tones

The Tones of Mandarin Chinese

Additivity of auditory masking using Gaussian-shaped tones

TRAVERTINE TONES

From Auditory Masking to Binary Classification: Machine Learning for Speech Separation