
Auditory scene analysis Day 15




  1. Auditory scene analysis Day 15 Music Cognition MUSC 495.02, NSCI 466, NSCI 710.03 Harry Howard Barbara Jazwinski Tulane University

  2. Course administration • Spend provost's money Music Cognition - Jazwinski & Howard - Tulane University

  3. Goals for today

  4. Statement of the problem

  5. The ball-room problem (Helmholtz, 1863) "In the interior of a ball-room … there are a number of musical instruments in action, speaking men and women, rustling garments, gliding feet, clinking glasses, and so on … a tumbled entanglement [that is] complicated beyond conception. And yet … the ear is able to distinguish all the separate constituent parts of this confused whole."

  6. … which is a well-known problem in speech perception • “One of our most important faculties is our ability to listen to, and follow, one speaker in the presence of others. This is such a common experience that we may take it for granted; we may call it ‘the cocktail party problem’…” (Cherry, 1957) • “For ‘cocktail party’-like situations… when all voices are equally loud, speech remains intelligible for normal-hearing listeners even when there are as many as six interfering talkers” (Bronkhorst & Plomp, 1992)

  7. What would the analog be in music? • The orchestra problem?

  8. Model as sources of intrusion and distortion • Additive noise from other sound sources • Channel distortion • Reverberation from surface reflections
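The intrusion-and-distortion model on this slide can be sketched numerically as y = h * x + n: the target source x passes through a reverberant channel h (convolution) and picks up additive noise n. All signal values below are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 8000                                   # sampling rate (assumed)
t = np.arange(0, 0.5, 1/fs)
source = np.sin(2*np.pi*440*t)              # target source: a 440 Hz tone

# Toy reverberant channel: direct path plus two decaying surface reflections
h = np.zeros(200)
h[0], h[80], h[160] = 1.0, 0.5, 0.25
reverbed = np.convolve(source, h)[:len(source)]   # channel distortion + reverberation

noise = 0.1 * rng.standard_normal(len(source))    # additive intrusion
mixture = reverbed + noise                        # what reaches the ear
```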

  9. Some review with new information about computational/mathematical modeling

  10. The auditory periphery A complex mechanism for transducing pressure variations in the air to neural impulses in auditory nerve fibers

  11. Traveling wave • Different frequencies of sound give rise to maximum vibrations at different places along the basilar membrane. • The frequency of vibration at a given place is equal to that of the nearest stimulus component (resonance). • Hence, the cochlea performs a frequency analysis.

  12. Cochlear filtering model • The gammatone function approximates physiologically recorded impulse responses • n = filter order (4) • b = bandwidth • f0 = centre frequency • φ = phase
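As a sketch, the gammatone function with the parameters listed on the slide can be evaluated directly; the bandwidth, centre frequency, and sampling rate below are illustrative assumptions, not values from the slides.

```python
import numpy as np

def gammatone_ir(t, n=4, b=125.0, f0=1000.0, phi=0.0):
    # g(t) = t^(n-1) * exp(-2*pi*b*t) * cos(2*pi*f0*t + phi)
    return t**(n - 1) * np.exp(-2*np.pi*b*t) * np.cos(2*np.pi*f0*t + phi)

fs = 16000                       # sampling rate (assumed)
t = np.arange(0, 0.05, 1/fs)     # 50 ms of impulse response
g = gammatone_ir(t)              # rises from zero, rings at f0, decays
```

The t^(n-1) term makes the response build up gradually rather than starting at full amplitude, which matches the physiologically recorded impulse responses the slide refers to.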

  13. Gammatone filterbank • Each position on the basilar membrane is simulated by a single gammatone filter with an appropriate centre frequency and bandwidth. • A small number of filters (e.g. 32) is generally sufficient to cover the range 50 Hz to 8 kHz. • Note the variation in bandwidth with frequency (unlike Fourier analysis).

  14. Response to a pure tone • Many channels respond, but those closest to the target tone frequency respond most strongly (place coding). • The interval between successive peaks also encodes the tone frequency (temporal coding). • Note propagation delay along the membrane model.
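The temporal-coding claim (tone frequency is encoded by the interval between successive response peaks) can be checked on a pure tone; the tone frequency and sampling rate below are arbitrary assumptions for illustration.

```python
import numpy as np

fs = 16000
f_tone = 500.0
t = np.arange(0, 0.1, 1/fs)
x = np.sin(2*np.pi*f_tone*t)

# Pick peaks: samples strictly larger than both neighbours
peaks = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
intervals = np.diff(peaks) / fs        # inter-peak intervals in seconds
f_est = 1.0 / np.mean(intervals)       # temporal-coding frequency estimate
```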

  15. Spectrogram vs. cochleogram • Spectrogram • Plot of log energy across time and frequency (linear frequency scale) • ‘Cochleogram’ • Cochlear filtering by the gammatone filterbank (or other models of cochlear filtering) • Quasi-logarithmic frequency scale, and filter bandwidth is frequency-dependent • Previous work suggests better resilience to noise than spectrogram • Let’s call it ‘cochleogram’
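For contrast with the cochleogram, a minimal log-power spectrogram (linear frequency scale, fixed bandwidth in every bin) can be computed with a short-time Fourier transform; the window and hop sizes below are illustrative choices.

```python
import numpy as np

def spectrogram(x, fs, n_fft=256, hop=128):
    """Log-power spectrogram: linear frequency axis, constant bandwidth
    per bin (fs/n_fft Hz), unlike the gammatone cochleogram."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win for i in range(0, len(x) - n_fft, hop)]
    power = np.abs(np.fft.rfft(frames, axis=1))**2
    return 10 * np.log10(power + 1e-12)      # dB, floored to avoid log(0)

fs = 8000
t = np.arange(0, 1.0, 1/fs)
S = spectrogram(np.sin(2*np.pi*1000*t), fs)  # rows = frames, cols = freq bins
```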

  16. Beyond the periphery The auditory system (Source: Arbib, 1989) • The auditory system is complex: four relay stations between periphery and cortex, rather than one as in the visual system • In comparison to the auditory periphery, the central parts of the auditory system are less well understood • The number of neurons in the primary auditory cortex is comparable to that in the primary visual cortex, despite the fact that the auditory nerve has far fewer fibers than the optic nerve (thousands vs. millions)

  17. Auditory scene analysis

  18. Auditory scene analysis (ASA) • Listeners are capable of parsing an acoustic scene to form a mental representation of each sound source – a stream – in the perceptual process of auditory scene analysis (Bregman, 1990) • From events to streams • Two conceptual processes of ASA: • Segmentation • Decompose the acoustic mixture into sensory elements (segments) • Grouping • Combine segments into streams, so that segments in the same stream originate from the same source • Two sorts of temporal organization • Simultaneous • Sequential

  19. Simultaneous organization • Groups sound components that overlap in time. • Some cues for simultaneous organization • Proximity in frequency (spectral proximity) • Common periodicity • Harmonicity • Fine temporal structure • Common spatial location • Common onset (and to a lesser degree, common offset) • Common temporal modulation • Amplitude modulation • Frequency modulation
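One of these cues, harmonicity, lends itself to a simple sketch: group spectral peaks that fall near integer multiples of a candidate fundamental, and leave the rest for other sources. The peak frequencies and tolerance below are invented for illustration.

```python
import numpy as np

def harmonic_group(peaks, f0, tol=0.03):
    """Split spectral peaks into those lying within a relative tolerance
    of an integer multiple of f0 (harmonicity cue) and the remainder."""
    peaks = np.asarray(peaks, dtype=float)
    n = np.round(peaks / f0)                               # nearest harmonic number
    ok = (n >= 1) & (np.abs(peaks - n * f0) <= tol * peaks)
    return peaks[ok], peaks[~ok]

# A 200 Hz harmonic series mixed with two inharmonic intruders
grouped, rest = harmonic_group([200, 400, 430, 600, 750, 800], 200.0)
```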

  20. Sequential organization • Groups sound components across time. • Some cues for sequential organization: • Proximity in time and frequency • Temporal and spectral continuity • Common spatial location; more generally, spatial continuity • Smooth pitch contour • Rhythmic structure • Rhythmic attention theory (Large and Jones, 1999)
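A toy version of grouping by frequency proximity: assign each incoming tone to the stream whose most recent frequency is closest, and start a new stream when the jump is too large. The sequence and threshold are invented; real perceptual streaming also depends on presentation rate, not just frequency separation.

```python
def segregate(tones, max_jump=200.0):
    """Greedy sequential grouping by frequency proximity: each tone joins
    the stream whose last frequency is nearest, unless the jump exceeds
    max_jump Hz, in which case it starts a new stream."""
    streams = []
    for f in tones:
        best = min(streams, key=lambda s: abs(s[-1] - f), default=None)
        if best is not None and abs(best[-1] - f) <= max_jump:
            best.append(f)
        else:
            streams.append([f])
    return streams

# An alternating high/low sequence splits into two streams
streams = segregate([1000, 400, 1000, 420, 1000, 410])
```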

  21. Two processes for grouping • Primitive grouping (bottom-up) • Innate data-driven mechanisms, consistent with those described by Gestalt psychologists for visual perception (proximity, similarity, common fate, good continuation, etc.) • It is domain-general, and exploits intrinsic structure of environmental sound • Grouping cues described earlier are primitive in nature • Schema-driven grouping (model-based or top-down) • Learned knowledge about speech, music and other environmental sounds. • It is domain-specific, e.g. organization of speech sounds into syllables

  22. Organisation in speech: Broadband spectrogram “… pure pleasure … ” • Annotated cues: continuity, onset synchrony, offset synchrony, common AM, harmonicity

  23. Organisation in speech: Narrowband spectrogram “… pure pleasure … ” • Annotated cues: continuity, onset synchrony, offset synchrony, harmonicity

  24. CASA system architecture

  25. Music cognition • Scheirer, E. D., “Bregman's chimerae: Music perception as auditory scene analysis.”

  26. The goal • “… is to explain the human ability to map incoming acoustic data into emotional, music-theoretical, or other high-level cognitive representations, and to provide evidence from psychological experimentation for these explanations.”

  27. A bottom-up model of musical perception and cognition • Boxes contain "facilities" or processes which operate on streams of input and produce streams of output. • Arrows denote these streams and are labeled with a rough indication of the types of information they might contain. • Italicized labels beneath the "music perception" and "music cognition" boxes indicate into which of these categories various musical properties might fall.

  28. More explanation • Acoustic events enter the ear as waves of varying sound-pressure level and are processed by the cochlea into streams of band-passed power levels at various frequencies. • The harmonically-related peaks in the time-frequency spectrum specified by the channels of filterbank output are grouped into "notes" or "complex tones" using auditory grouping rules such as continuation, harmonicity, and common onset time. • Properties of these notes such as timbre, pitch, loudness, and perhaps their rhythmic relationships over time, are determined by a low-level "music perception" facility. • Once the properties of the component notes are known, the relationships they bear to each other and to the ongoing flow of time can be analyzed, and higher-level structures such as melodies, chords, and key centers can be constructed. • These high-level descriptions give rise to the final "emotive" content of the listening experience as well as other forms of high-level understanding and modeling, such as recognition, affective response, and the capacity for theoretical analysis.

  29. One assumption which bears examination • The explicitly mono-directional flow of data from "low-level" processes to "high-level" processes • that is, the implication that higher-level cognitive models have little or no impact on the stages of lower-level processing. • We know from existing experimental data that this upward data-flow model is untrue in particular cases. • For example, frequency contours in melodies can lead to a percept of accent structure, which in turn leads to the belief that the accented notes are louder than the unaccented. • Thus, the high-level process of melodic understanding impacts the "lower-level" process of determining the loudness of notes.

  30. Another assumption which bears examination • In computer-music research, the process of turning a digital-audio signal into a symbolic representation of the same musical content is termed the transcription problem, and has received much study. • The assumption that "notes" are the fundamental mental representations of all musical perception and cognition requires that there be a transcription facility in the brain to produce them. • This assumption, and especially the implied requirement, are largely unsupported by experimental evidence. • We have no percept of most of the individual notes which comprise the chords and rhythms in the densely-scored inner sections of a Beethoven symphonic development. • While highly-trained individuals may be able to "hear out" some of the specific pitches and timbres through a difficult process of listening and deduction, this is surely not the way in which the general experience of hearing music unfolds.

  31. A "top-down" or "prediction-driven" model of music perception and cognition • Boxes again represent processing facilities; • arrows are unlabeled to indicate less knowledge about the exact types of information being passed from box to box.

  32. More explanation • In this model, predictions based on the current musical context are compared against the incoming psychoacoustic cues. • Prediction is dependent on what has been previously heard, and what is known about the musical domain from innate constraints and learned acculturation. • The agreements and/or disagreements between prediction and realization are reconciled and reflected in a new representation of the musical situation. • Note that within this model, the types of representations actually present in a mental taxonomy of musical context are as yet unspecified.

  33. Auditory chimera • One element of the internal representation of music which has been somewhat underexamined is called an auditory chimera by Bregman: • [Music often wants] the listener to accept the simultaneous roll of the drum, clash of the cymbal, and brief pulse of noise from the woodwinds as a single coherent event with its own striking properties. • The sound is chimeric in the sense that it does not belong to any single environmental object. [Bregman 1990 p. 460, emphasis added]

  34. An example • Again arguing from intuition, it seems likely the majority of the inner-part content of a Beethoven symphony is perceived in exactly this manner. • That is, multiple non-melodic voices are grouped together into a single virtual "orchestral" sound object which has certain properties analogous to "timbre" and "harmonic implication", and which is, crucially, irreducible into perceptually smaller units. • It is the combined and continuing experience of these "chimeric" objects which gives the music its particular quality in the large -- that is, what the music "sounds like" on a global level. • In fact, it seems likely that a good portion of the harmonic and textural impact of a piece of complex music is carried by such objects.

  35. Next Monday Prediction in music
