source segregation l.
Download
Skip this Video
Loading SlideShow in 5 Seconds..
Source Segregation PowerPoint Presentation
Download Presentation
Source Segregation

Loading in 2 Seconds...

play fullscreen
1 / 60

Source Segregation - PowerPoint PPT Presentation


  • 224 Views
  • Updated on

Source Segregation. Chris Darwin Experimental Psychology University of Sussex. Need for sound segregation. Ears receive mixture of sounds We hear each sound source as having its own appropriate timbre, pitch, location

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

Source Segregation


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
    Presentation Transcript
    1. Source Segregation Chris Darwin Experimental Psychology University of Sussex

    2. Need for sound segregation • Ears receive mixture of sounds • We hear each sound source as having its own appropriate timbre, pitch, location • Stored information about sounds (eg acoustic/phonetic relations) probably concerns a single source • Need to make single source properties (eg silence) explicit

    3. Making properties explicit • Single-source properties not explicit in input signal • eg silence (Darwin & Bethel-Fox, JEP:HPP 1977) NB experience of yodelling may alter your susceptibility to this effect

    4. Mechanisms of segregation • Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition” • Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down.

    5. Segregation of simple musical sounds • Successive segregation • Different frequency (or pitch) • Different spatial position • Different timbre • Simultaneous segregation • Different onset-time • Irregular spacing in frequency • Location (rather unreliable) • Uncorrelated FM not used

    6. Successive grouping by frequency Bugandan xylophone music: “Ssematimba ne Kikwabanga” Track 7 Track 8

    7. Not peripheral channelling Streaming occurs for sounds • with same auditory excitation pattern, but different periodicities Vliegen, J. and Oxenham, A. J. (1999). "Sequential stream segregation in the absence of spectral cues," J. Acoust. Soc. Am. 105, 339-46. • with Huggins pitch sounds that are only defined binaurally Carlyon & Akeroyd

    8. "a faint tone" Noise Frequency Time 2π Interaural phase difference 0 500 Hz Frequency Huggins pitch ∆ø

    9. Successive groupingby spatial separation Track 41

    10. Sach & Bailey - rhythm unmasking by ITD or spatial position ? Masker Target • ITD=0, ILD = 0 Target • ITD=0, ILD = +4 dB ITD sufficient but, sequential segregation by spatial position rather than by ITD alone.

    11. Build-up of segregation Horse Morse -LHL-LHL-LHL- --> --H---H---H-- -L-L-L-L-L-L-L • Segregation takes a few seconds to build up. • Then between-stream temporal / rhythmic judgments are very difficult

    12. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process.

    13. Attention necessary for build-up of streaming (Carlyon et al, JEP:HPP 2000) Horse Morse -LHL-LHL-LHL- --> --H---H---H-- -L-L-L-L-L-L-L • Horse -> Morse takes a few seconds to segregate • These have to be seconds spent attending to the tone stream • Does this also apply to other types of segregation?

    14. Capturing a component from a mixture by frequency proximity A-B A-BC Freq separation of AB Harmonicity & synchrony of BC

    15. Simultaneous grouping • What is the timbre / pitch / location of a particular sound source ? • Important grouping cues • continuity • onset time • harmonicity (or regularity of frequency spacing) (Old + New)

    16. Bregman’s Old + New principle • Stimulus: A followed by A+B • -> Percept of: • A as continuous (or repeated) • with B added as separate percept

    17. A A A A M M M M M M M M M B B B B Old+New Heuristic A MAMB B MAMB

    18. Percept M

    19. Grouping & vowel quality frequency time

    20. Grouping & vowel quality (2) continuation not removed from vowel continuation removed from vowel captor frequency frequency time time frequency frequency time time + + frequency frequency time time

    21. Onset-time:allocation is subtractive not exclusive • Bregman’s Old-plus-New heuristic • Indicates importance of coding change.

    22. 490 480 470 460 450 440 0 80 160 240 320 Asynchrony & vowel quality 8 subjects No 500 Hz component F1 boundary (Hz) T 90 ms Onset Asynchrony T (ms)

    23. 1 vowel 0.8 complex 0.6 0.4 0.2 0 -0.2 0 1 2 3 5 8 Mistuning & pitch 90 ms Mean pitch shift (Hz) 8 subjects % Mistuning of 4th Harmonic

    24. Onset asynchrony & pitch 1 vowel 0.8 complex 0.6 0.4 0.2 0 -0.2 0 80 160 240 320 ±3% mistuning 8 subjects Mean pitch shift (Hz) T 90 ms Onset Asynchrony T (ms)

    25. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for.

    26. Grouping for Effectiveness of a parameter on grouping depends on the task. Eg 10-ms onset time allows a harmonic to be heard out 40-ms onset-time needed to remove from vowel quality >100-ms needed to remove it from pitch.

    27. c. 10 ms Harmonic in vowel to be heard out: 40 ms Harmonic to be removed from vowel: 200 ms Harmonic to be removed from pitch: Minimum onset needed for:

    28. Grouping not absolute and independentof classification classify group

    29. Apparent continuity If B would have masked if it HAD been there, then you don’t notice that it is not there. Track 28

    30. Enharmonic Harmonic Continuity & grouping 1. Pulsing complex Pulsing high tone Steady low tone Group tones; then decide on continuity.

    31. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for. • Not everything that is obvious on an auditory spectrogram can be used : • FM of Fo irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992)

    32. Harm Inharmonic 2500 2000 1500 Easy 2500 2100 1500 Impossible frequency 1 2 3 Carlyon: across-frequency FM coherence 5 Hz, 2.5% FM Odd-one in 2 or 3 ? Carlyon, R. P. (1991). "Discriminating between coherent and incoherent frequency modulation of complex tones," J. Acoust. Soc. Am. 89, 329-340.

    33. Role of localisation cues What role do localisation cues play in helping us to hear one voice in the presence of another ? • Head shadow increases S/N at the nearer ear (Bronkhurst & Plomp, 1988). • … but this advantage is reduced if high frequencies inaudible (B & P, 1989) • But do localisation cues also contribute to selectively grouping different sound sources?

    34. Some interesting points: • Sequential streaming may require attention - rather than being a pre-attentive process. • Parametric behaviour of grouping depends on what it is for. • Not everything that is obvious on an auditory spectrogram can be used : • FM of Fo irrelevant for segregation (Carlyon, JASA 1991; Summerfield & Culling 1992) • Although we can group sounds by ear, ITDs by themselves remarkably useless for simultaneous grouping. Group first then localise grouped object.

    35. Separating two simultaneous sound sources • Noise bands played to different ears group by ear, but... • Noise bands differing in ITD do not group by ear

    36. EE AR ear ITD AR EE AR EE delay ER OO Segregation by ear but not by ITD(Culling & Summerfield 1995) Task - what vowel is on your left ? (“ee”)

    37. Two models of attention

    38. Phase Ambiguity 500 Hz: period = 2ms 500-Hz pure tone leading in Right ear by 1.5 ms Heard on Left side R leads by 1.5 ms L leads by 0.5 ms L R L cross-correlation peaks at +0.5ms and -1.5ms auditory system weighted toone closest to zero

    39. Disambiguating phase-ambiguity • Narrowband noise at 500 Hz with ITD of 1.5 ms (3/4 cycle) heard at lagging side. • Increasing noise bandwidth changes location to the leading side. • Explained by across-frequency consistency of ITD. • (Jeffress, Trahiotis & Stern)

    40. Cross-correlation peaks for noise delayed in one ear by 1.5 ms Resolving phase ambiguity Left ear actually lags by 1.5 ms 500 Hz: period = 2ms 300 Hz: period = 3.3ms L R R L R R L lags by 1.5 ms or L leads by 0.5 ms ? L lags by 1.5 ms or L leads by 1.8 ms ? 800 Actual delay 600 Frequency of auditory filter Hz 400 200 -2.5 -0.5 1.5 3.5 Delay of cross-correlator ms

    41. Segregation by onset-time Synchronous Asynchronous 800 600 Frequency (Hz) 400 200 0 400 0 80 400 Duration (ms) Duration (ms) ITD: ± 1.5 ms (3/4 cycle at 500 Hz)

    42. Segregated tone changes location 20 0 Pointer IID (dB) R L Complex Pure -20 0 20 40 80 Onset Asynchrony (ms)

    43. Segregation by mistuning In tune Mistuned 800 600 Frequency (Hz) 400 200 0 400 0 80 400 Duration (ms) Duration (ms)

    44. Mistuned tone changes location

    45. Mechanisms of segregation • Primitive grouping mechanisms based on general heuristics such as harmonicity and onset-time - “bottom-up” / “pure audition” • Schema-based mechanisms based on specific knowledge (general speech constraints?) - “top-down.

    46. Hierarchy of sound sources ? • Orchestra • 1° Violin section • Leader • Chord • Lowest note • Attack • 2° violins… Corresponding hierarchy of constraints ?

    47. Is speech a single sound source ? • Multiple sources of sound: • Vocal folds vibrating • Aspiration • Frication • Burst explosion • Clicks Nama: Baboon's arse

    48. Tuvan throat music

    49. Tuvan throat music