
Source Segregation


Presentation Transcript


  1. Source Segregation Chris Darwin Experimental Psychology University of Sussex

  2. Need for sound segregation • Ears receive mixture of sounds • We hear each sound source as having its own appropriate timbre, pitch, location • Stored information about sounds (eg acoustic/phonetic relations) probably concerns a single source • Need to make single source properties (eg silence) explicit

  3. Making properties explicit • Single-source properties not explicit in input signal • eg silence (Darwin & Bethel-Fox, JEP:HPP 1977) NB experience of yodelling may alter your susceptibility to this effect

  4. Mechanisms of segregation • Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition” • Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down”

  5. Segregation of simple musical sounds • Successive segregation • Different frequency (or pitch) • Different spatial position • Different timbre • Simultaneous segregation • Different onset-time • Irregular spacing in frequency • Location (rather unreliable) • Uncorrelated FM not used

  6. Successive grouping by frequency Bugandan xylophone music: “Ssematimba ne Kikwabanga” Track 7 Track 8

  7. Not peripheral channelling Streaming occurs for sounds • with same auditory excitation pattern, but different periodicities Vliegen, J. and Oxenham, A. J. (1999). "Sequential stream segregation in the absence of spectral cues," J. Acoust. Soc. Am. 105, 339-46. • with Huggins pitch sounds that are only defined binaurally Carlyon & Akeroyd

  8. "a faint tone" Noise Frequency Time 2π Interaural phase difference 0 500 Hz Frequency Huggins pitch ∆ø

  9. Successive grouping by spatial separation Track 41

  10. Sach & Bailey - rhythm unmasking by ITD or spatial position ? Masker Target • ITD=0, ILD = 0 Target • ITD=0, ILD = +4 dB ITD sufficient but, sequential segregation by spatial position rather than by ITD alone.

  11. Build-up of segregation Horse Morse -LHL-LHL-LHL- --> --H---H---H-- -L-L-L-L-L-L-L • Segregation takes a few seconds to build up. • Then between-stream temporal / rhythmic judgments are very difficult
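A minimal sketch (not from the presentation) of a galloping -LHL- sequence of the kind used in the Horse/Morse demonstration. The tone frequencies (400 and 600 Hz), the 100-ms tone and gap durations, and the number of repeats are assumptions.

```python
import numpy as np

fs = 44100
tone_dur = 0.1                   # 100-ms tones - assumption
gap_dur = 0.1                    # silent gap completing each -LHL- cycle - assumption
f_low, f_high = 400.0, 600.0     # L and H tone frequencies - assumptions
n_repeats = 15                   # several seconds, long enough for streaming to build up

def tone(freq, dur):
    t = np.arange(int(fs * dur)) / fs
    ramp = np.minimum(1.0, np.minimum(t, dur - t) / 0.01)   # 10-ms onset/offset ramps
    return np.sin(2 * np.pi * freq * t) * ramp

silence = np.zeros(int(fs * gap_dur))
L, H = tone(f_low, tone_dur), tone(f_high, tone_dur)

# "-LHL-LHL-...": repeated low-high-low triplets separated by gaps.
sequence = np.concatenate([np.concatenate([silence, L, H, L])
                           for _ in range(n_repeats)])
```

With a small L-H frequency separation the triplets cohere into one galloping rhythm ("Horse"); with a larger separation the sequence splits after a few seconds into separate high and low streams ("Morse"), and between-stream rhythmic judgments become difficult.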

  12. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process.

  13. Attention necessary for build-up of streaming (Carlyon et al, JEP:HPP 2000) Horse Morse -LHL-LHL-LHL- --> --H---H---H-- -L-L-L-L-L-L-L • Horse -> Morse takes a few seconds to segregate • These have to be seconds spent attending to the tone stream • Does this also apply to other types of segregation?

  14. Capturing a component from a mixture by frequency proximity [Demo: A alternating with B vs. A alternating with the mixture BC; capture depends on the frequency separation of A and B, and on the harmonicity & synchrony of B and C]

  15. Simultaneous grouping • What is the timbre / pitch / location of a particular sound source ? • Important grouping cues • continuity • onset time • harmonicity (or regularity of frequency spacing) (Old + New)

  16. Bregman’s Old + New principle • Stimulus: A followed by A+B • -> Percept of: • A as continuous (or repeated) • with B added as separate percept
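A minimal sketch (not from the presentation) of the stimulus the slide describes: tone A presented alone, then together with B. The frequencies and segment durations are placeholder assumptions.

```python
import numpy as np

fs = 44100
seg_dur = 0.4                    # duration of each segment - assumption
fA, fB = 500.0, 625.0            # frequencies of A and B - assumptions

def tone(freq, dur):
    t = np.arange(int(fs * dur)) / fs
    ramp = np.minimum(1.0, np.minimum(t, dur - t) / 0.01)   # 10-ms onset/offset ramps
    return np.sin(2 * np.pi * freq * t) * ramp

A = tone(fA, seg_dur)
A_plus_B = A + tone(fB, seg_dur)          # the "old" A continues, the "new" B is added

# A followed by A+B, cycled: listeners report A as continuing (or repeating)
# through the mixture, with B heard as a separate added sound.
stimulus = np.tile(np.concatenate([A, A_plus_B]), 4)
```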

  17. Old+New heuristic [Figure: tone A alternating with the mixture M = A+B; the "old" A is extracted from M, leaving B as a separate percept]

  18. [Figure: the resulting percept of the mixture M]

  19. Grouping & vowel quality frequency time

  20. Grouping & vowel quality (2) continuation not removed from vowel continuation removed from vowel captor frequency frequency time time frequency frequency time time + + frequency frequency time time

  21. Onset-time: allocation is subtractive not exclusive • Bregman's Old-plus-New heuristic • Indicates importance of coding change.

  22. Asynchrony & vowel quality [Figure: F1 boundary (Hz, 440 to 490) as a function of onset asynchrony T (ms, 0 to 320) of a 500-Hz component in a 90-ms vowel; 8 subjects; reference: no 500-Hz component]

  23. Mistuning & pitch [Figure: mean pitch shift (Hz, -0.2 to 1) as a function of % mistuning (0 to 8) of the 4th harmonic of a 90-ms stimulus, for vowel and complex conditions; 8 subjects]

  24. Onset asynchrony & pitch 1 vowel 0.8 complex 0.6 0.4 0.2 0 -0.2 0 80 160 240 320 ±3% mistuning 8 subjects Mean pitch shift (Hz) T 90 ms Onset Asynchrony T (ms)

  25. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for.

  26. What grouping is for: the effectiveness of a parameter on grouping depends on the task. E.g. a 10-ms onset asynchrony allows a harmonic to be heard out; a 40-ms asynchrony is needed to remove it from vowel quality; >100 ms is needed to remove it from pitch.

  27. Minimum onset asynchrony needed for: • harmonic in a vowel to be heard out: c. 10 ms • harmonic to be removed from the vowel: 40 ms • harmonic to be removed from the pitch: 200 ms

  28. Grouping is not absolute and independent of classification [Figure: interacting "group" and "classify" stages]

  29. Apparent continuity: if B would have been masked if it HAD been there, then you don't notice that it is not there. Track 28

  30. Continuity & grouping [Demo: 1. pulsing complex vs. pulsing high tone + steady low tone, in harmonic and enharmonic versions] Group tones; then decide on continuity.

  31. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for. • Not everything that is obvious on an auditory spectrogram can be used: • FM of F0 irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992)

  32. Carlyon: across-frequency FM coherence (5 Hz, 2.5% FM). Task: odd one out in interval 2 or 3? [Figure: harmonic complex (1500, 2000, 2500 Hz): easy; inharmonic complex (1500, 2100, 2500 Hz): impossible] Carlyon, R. P. (1991). "Discriminating between coherent and incoherent frequency modulation of complex tones," J. Acoust. Soc. Am. 89, 329-340.
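A minimal sketch (not from the presentation) of coherent versus incoherent FM applied to a three-component complex, using the 5-Hz, 2.5% modulation from the slide. The harmonic component frequencies (1500, 2000, 2500 Hz) follow the figure; shifting the modulator phase of the middle component is just one simple way of making the FM incoherent, not necessarily Carlyon's method.

```python
import numpy as np

fs = 44100
dur = 1.0
t = np.arange(int(fs * dur)) / fs

fm_rate = 5.0                    # 5-Hz modulation (from the slide)
fm_depth = 0.025                 # 2.5% frequency excursion (from the slide)
carriers = [1500.0, 2000.0, 2500.0]   # harmonic case; 2100 Hz instead of 2000 makes it inharmonic

def fm_component(fc, mod_phase=0.0):
    # Sinusoidal FM: instantaneous frequency = fc * (1 + depth*sin(2*pi*rate*t + phase));
    # the output phase is the running integral of instantaneous frequency.
    inst_freq = fc * (1.0 + fm_depth * np.sin(2 * np.pi * fm_rate * t + mod_phase))
    return np.sin(2 * np.pi * np.cumsum(inst_freq) / fs)

# Coherent FM: all components share the same modulator.
coherent = sum(fm_component(fc) for fc in carriers)

# Incoherent FM: the middle component's modulator is shifted in phase
# (one simple way to decorrelate the FM - an assumption).
incoherent = (fm_component(carriers[0])
              + fm_component(carriers[1], mod_phase=np.pi)
              + fm_component(carriers[2]))
```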

  33. Role of localisation cues What role do localisation cues play in helping us to hear one voice in the presence of another? • Head shadow increases S/N at the nearer ear (Bronkhorst & Plomp, 1988). • … but this advantage is reduced if high frequencies are inaudible (B & P, 1989) • But do localisation cues also contribute to selectively grouping different sound sources?

  34. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for. • Not everything that is obvious on an auditory spectrogram can be used: • FM of F0 irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992) • Although we can group sounds by ear, ITDs by themselves are remarkably useless for simultaneous grouping. Group first, then localise the grouped object.

  35. Separating two simultaneous sound sources • Noise bands played to different ears group by ear, but... • Noise bands differing only in ITD do not group by ITD

  36. Segregation by ear but not by ITD (Culling & Summerfield 1995). Task: what vowel is on your left? ("ee") [Figure: noise bands forming the vowels "ee", "ar", "er", "oo", distributed either across the two ears or across different ITDs/delays]

  37. Two models of attention

  38. Phase ambiguity. A 500-Hz pure tone (period = 2 ms) leading in the right ear by 1.5 ms is heard on the left side: "R leads by 1.5 ms" is indistinguishable from "L leads by 0.5 ms". The cross-correlation has peaks at +0.5 ms and -1.5 ms, and the auditory system weights the one closest to zero.
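A minimal sketch (not from the presentation) of the ambiguity: cross-correlating the two ear signals for a 500-Hz tone whose right channel leads by 1.5 ms gives equally good peaks at -1.5 ms and +0.5 ms (positive lags meaning the left ear leads). The sample rate and lag range are assumptions.

```python
import numpy as np

fs = 48000
dur = 0.1                         # 0.1 s = an exact number of 500-Hz periods at this fs
t = np.arange(int(fs * dur)) / fs
f = 500.0                         # 500-Hz tone, period 2 ms
itd = 1.5e-3                      # right ear leads by 1.5 ms (3/4 cycle)

left = np.sin(2 * np.pi * f * t)
right = np.sin(2 * np.pi * f * (t + itd))    # advanced copy: the right ear leads

# Circular cross-correlation over lags of +/- 2 ms; positive lag = "left leads".
max_lag = int(2e-3 * fs)
lags = np.arange(-max_lag, max_lag + 1)
xc = np.array([np.sum(left * np.roll(right, -lag)) for lag in lags])

# For a pure tone the correlation is periodic, so two peaks fall in range:
# -1.5 ms (L lags by 1.5 ms) and +0.5 ms (L leads by 0.5 ms).
peaks_ms = lags[xc >= 0.999 * xc.max()] / fs * 1e3
print(peaks_ms)                   # approximately [-1.5  0.5]
```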

  39. Disambiguating phase ambiguity • Narrowband noise at 500 Hz with an ITD of 1.5 ms (3/4 cycle) is heard at the lagging side. • Increasing the noise bandwidth changes the location to the leading side. • Explained by across-frequency consistency of ITD. • (Jeffress, Trahiotis & Stern)

  40. Resolving phase ambiguity [Figure: cross-correlation peaks for noise delayed in one ear by 1.5 ms (the left ear actually lags by 1.5 ms), plotted for auditory filters from 200 to 800 Hz against cross-correlator delay (-2.5 to 3.5 ms). At 500 Hz (period 2 ms) the peaks are ambiguous between "L lags by 1.5 ms" and "L leads by 0.5 ms"; at 300 Hz (period 3.3 ms) between "L lags by 1.5 ms" and "L leads by 1.8 ms". Only the actual 1.5-ms delay is consistent across frequency.]
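A minimal sketch (not from the presentation) of the across-frequency argument: within each frequency channel a narrowband cross-correlator peaks at the true delay plus whole-period ambiguities, and only the true 1.5-ms delay recurs in every channel. The particular set of centre frequencies is an assumption.

```python
import numpy as np

itd = 1.5e-3                      # the actual delay: left ear lags by 1.5 ms
freqs = [300.0, 500.0, 800.0]     # example auditory-filter centre frequencies - assumption
window = 2.5e-3                   # only consider candidate delays within +/- 2.5 ms

# In a narrow channel centred on f, the cross-correlator peaks at itd + k/f
# for integer k: the true delay plus whole-period ambiguities.
candidates = {}
for f in freqs:
    period = 1.0 / f
    ks = np.arange(-5, 6)
    delays = itd + ks * period
    candidates[f] = delays[np.abs(delays) <= window]

# Only the true 1.5-ms delay appears in every channel; the spurious peaks
# (e.g. -0.5 ms at 500 Hz, -1.8 ms at 300 Hz) differ from channel to channel.
for f, d in candidates.items():
    print(f, np.round(d * 1e3, 2), "ms")
```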

  41. Segregation by onset-time [Figure: stimulus schematics, frequency (Hz, 200 to 800) vs. duration (ms, 0 to 400): synchronous vs. asynchronous (80-ms) onset of one component; ITD: ±1.5 ms (3/4 cycle at 500 Hz)]

  42. Segregated tone changes location [Figure: pointer IID (dB, toward R or L) matched to the complex and to the pure tone, as a function of onset asynchrony (0 to 80 ms)]

  43. Segregation by mistuning [Figure: stimulus schematics, frequency (Hz, 200 to 800) vs. duration (ms, 0 to 400): in-tune vs. mistuned component]
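A minimal sketch (not from the presentation) of an in-tune versus mistuned-harmonic complex of the kind used in these experiments. The 100-Hz fundamental, the number of harmonics and the 3% mistuning of the 4th harmonic are assumptions (the slides report mistunings of up to 8%, and asynchronies with ±3% mistuning).

```python
import numpy as np

fs = 44100
dur = 0.4
t = np.arange(int(fs * dur)) / fs

f0 = 100.0                  # fundamental frequency - assumption
n_harmonics = 8             # number of harmonics - assumption
mistuned_harmonic = 4       # the 4th harmonic, as in slide 23
mistuning = 0.03            # 3% mistuning - assumption

def harmonic(k, shift=1.0):
    return np.sin(2 * np.pi * k * f0 * shift * t)

in_tune = sum(harmonic(k) for k in range(1, n_harmonics + 1))

# A slightly mistuned harmonic still contributes to (and shifts) the pitch of the
# complex; with larger mistuning it segregates and stops contributing.
mistuned = sum(harmonic(k, 1.0 + mistuning if k == mistuned_harmonic else 1.0)
               for k in range(1, n_harmonics + 1))
```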

  44. Mistuned tone changes location

  45. Mechanisms of segregation • Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition” • Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down”

  46. Hierarchy of sound sources ? • Orchestra • 1° Violin section • Leader • Chord • Lowest note • Attack • 2° violins… Corresponding hierarchy of constraints ?

  47. Is speech a single sound source ? • Multiple sources of sound: • Vocal folds vibrating • Aspiration • Frication • Burst explosion • Clicks Nama: Baboon's arse

  48. Tuvan throat music

  49. Tuvan throat music
