EEL6586 Automatic Speech Processing
Meena Ramani, 04/07/04
Special thanks to Dr. Mark Skowronski
Topics
- Anatomy of the Ear and Hearing
- Auditory perception
- Hearing aids and Cochlear implants
The Incredible Sense of Hearing
“Behind these unprepossessing flaps ... lie structures of such delicacy that they shame the most skillful craftsman”
Stevens, S.S. [Professor of Psychophysics, Harvard University]
Dynamic range is enhanced by an effective amplification structure which extends its low end and by a protective mechanism which extends the high end.
Is sound on your right or left side?
Interaural Time Difference (ITD)
Interaural Intensity Difference (IID)
The direct path from the acoustic source to the two ears will generally be different
- The signal needs to travel further to reach the more distant ear
- The more distant ear is partially occluded by the head
Two types of interaural difference will emerge
- Interaural time difference (ITD)
- Interaural intensity difference (IID)
Schematic illustration of interaural differences
Interaural time differences (ITDs)
Interaural intensity differences (IIDs)
Interaural time differences (ITDs) Low frequencies
Ongoing disparities can only be detected for frequencies up to around 1500 Hz
Sensitivity declines rapidly above 1000 Hz
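As a rough illustration of the size of the ITD cue, the sketch below uses Woodworth's spherical-head approximation, ITD = (r/c)(θ + sin θ). The formula, the head radius of 8.75 cm, and the speed of sound of 343 m/s are assumptions brought in for this example and do not come from the slides.

```python
import math

HEAD_RADIUS_M = 0.0875   # assumed average head radius (not from the slides)
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, m/s

def woodworth_itd(azimuth_deg):
    """Approximate ITD in seconds for a far-field source.

    azimuth_deg: 0 is straight ahead, 90 is directly to one side.
    """
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 30, 60, 90):
    print(f"{az:3d} deg -> {woodworth_itd(az) * 1e6:6.1f} microseconds")
```

Under these assumptions the largest ITD, for a source directly to one side, is roughly 0.66 ms.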
Interaural intensity differences (IIDs) High Frequencies
The amount of attenuation varies across frequency
Is sound above or below?
Pinna Directional Filtering
In a Barn Owl, the left ear opening is higher than the right, so a sound coming from below the Owl's line of sight will reach the right ear first.
Frequency Independent
Project 2: Beamforming and Direction of Arrival
Most DOA algorithms apply an eigendecomposition to the spatial correlation matrix and work with the resulting signal or noise subspace, e.g. MUSIC, ESPRIT
A more biologically inspired DOA algorithm should do better
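One possible reading of "biologically inspired" is an ITD-style estimator: find the delay between two microphone signals by cross-correlation and convert it to an angle. The sketch below is only an illustration of that idea, not the project's method. The function names, the 16 kHz sampling rate, the 20 cm microphone spacing, and the toy pulse are all hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # assumed speed of sound in air, m/s

def best_lag(left, right, max_lag):
    """Lag in samples by which `right` trails `left`.

    Found by maximising the cross-correlation over [-max_lag, max_lag].
    """
    n = len(left)
    best, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += left[i] * right[j]
        if score > best_score:
            best, best_score = lag, score
    return best

def lag_to_angle(lag, fs, mic_spacing_m):
    """Far-field arrival angle in degrees from broadside for a two-mic pair."""
    s = lag / fs * SPEED_OF_SOUND / mic_spacing_m
    s = max(-1.0, min(1.0, s))  # guard against lags beyond the physical limit
    return math.degrees(math.asin(s))

# Toy data: the same short pulse arrives 3 samples later at the right mic.
fs = 16000             # hypothetical sampling rate, Hz
spacing = 0.2          # hypothetical microphone spacing, m
pulse = [1.0, 0.5, -0.3]
left = [0.0] * 64
right = [0.0] * 64
left[20:23] = pulse
right[23:26] = pulse

lag = best_lag(left, right, max_lag=8)
angle = lag_to_angle(lag, fs, spacing)
print(lag, round(angle, 1))
```

A positive lag means the sound reached the left microphone first, so the estimated angle points toward the left side of the pair.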
Functions of Middle Ear
Stapedius reflex (explained later)
The tympanic membrane or "eardrum" receives vibrations traveling up the auditory canal and transfers them through the tiny ossicles to the oval window.
Eardrum → Malleus → Incus → Stapes → Oval Window
Ossicles: 3 bones Malleus (Hammer), Incus (Anvil), Stapes (Stirrup)
Protection against low frequency sounds
Muscles tense → ossicles stiffen and vibrate less → transmitted sound is reduced (20 dB)
Reflex is triggered by loud sounds
The inner ear structure called the cochlea is a snail-shell like structure divided into three fluid-filled parts.
Two are canals (scala tympani and scala vestibuli) for the transmission of pressure. The third contains the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses that travel along the auditory nerve to the brain.
This mid-modiolar section shows the coiling of the cochlear duct (1), the scala vestibuli (2) and the scala tympani (3). The red arrow is from the oval window; the blue arrow points to the round window. Within the modiolus, the spiral ganglion (4) and auditory nerve fibres (5) are seen.
It operates on the incoming sound’s frequencies
The basilar membrane (BM) vibrates in synchrony with the sound entering the ear, producing action potentials (APs) in auditory nerve cells at the same frequency
(e.g., 50 Hz sound -> 50 APs/sec).
Limitation: maximum firing rate = 200 APs/sec.
Use this theory for frequencies < 100 Hz
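The limitation above can be made concrete with a deliberately simplified model: one action potential per cycle of the tone, with a hard ceiling at the 200 APs/sec quoted on the slide. The one-AP-per-cycle rule and the hard ceiling are simplifying assumptions for illustration only.

```python
MAX_FIRING_RATE = 200.0  # APs/sec, the limit quoted in the slides

def firing_rate(tone_hz):
    """Firing rate under a one-AP-per-cycle model with a hard ceiling."""
    return min(float(tone_hz), MAX_FIRING_RATE)

for f in (50, 100, 200, 1000):
    rate = firing_rate(f)
    status = "tracks the tone" if rate == f else "saturated"
    print(f"{f:5d} Hz -> {rate:5.0f} APs/sec ({status})")
```

Above the ceiling, every tone produces the same rate in this model, so the rate alone can no longer identify the frequency.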
32-35 mm long
At the base, the basilar membrane is stiff and thin (more responsive to high frequencies)
At the end or “apex”, the basilar membrane is wide and floppy (more responsive to low frequencies)
Tonotopic map on Cochlea: Cells in different spots on the cochlea respond to different frequencies, with high frequencies near the base, and low frequencies near the apex.
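One commonly used description of this place-to-frequency map is Greenwood's function, f = A(10^(a·x) − k). The sketch below uses the constants usually quoted for the human cochlea (A = 165.4, a = 2.1, k = 0.88, with x the fractional distance from the apex). The function and constants are an assumption brought in for illustration and are not part of the slides.

```python
def greenwood_frequency(x):
    """Characteristic frequency in Hz at relative position x.

    x = 0 is the apex (low frequencies), x = 1 is the base (high frequencies).
    """
    if not 0.0 <= x <= 1.0:
        raise ValueError("x must lie between 0 (apex) and 1 (base)")
    return 165.4 * (10 ** (2.1 * x) - 0.88)

for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(f"x = {x:4.2f} -> {greenwood_frequency(x):9.1f} Hz")
```

With these constants the map runs from about 20 Hz at the apex to about 20 kHz at the base.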
Response curve is a band-pass filter (BPF) with almost constant Q (= f0/BW)
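A constant Q means the bandwidth grows in proportion to the centre frequency, since BW = f0/Q. The short sketch below shows this relationship. The value Q = 8 is a hypothetical number chosen only for the example.

```python
def bandwidth_hz(center_hz, q):
    """Bandwidth of a band-pass filter from its centre frequency and Q."""
    return center_hz / q

Q = 8.0  # hypothetical value, for illustration only

for f0 in (250, 1000, 4000):
    print(f"f0 = {f0:5d} Hz -> BW = {bandwidth_hz(f0, Q):6.1f} Hz")
```

Filters near the base of the cochlea (high f0) are therefore much wider in Hz than filters near the apex.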
The auditory nerve takes electrical impulses from the cochlea and the semicircular canals
Makes connections with both auditory areas of the brain.
Auditory Area of Brain
Information from both ears goes to both sides of the brain - binaural information is present in all of the major relay stations.
----- Left ear information
___ Right ear information
Threshold of hearing
Equal Loudness curves
Bass loss problem
Hearing area is the area between the threshold in quiet and the threshold of pain.
Shift in threshold of quiet for those who listen to loud music
The sound intensity required to be heard is quite different for different frequencies.
Threshold of hearing at 1000 Hz is nominally taken to be 0 dB.
The ear discriminates markedly against low frequencies: about 60 dB is required for a 30 Hz tone to be heard.
The maximum sensitivity at about 3500 to 4000 Hz is related to the resonance of the auditory canal.
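The link to canal resonance can be checked with a back-of-the-envelope model that treats the auditory canal as a tube open at one end and closed at the eardrum, which resonates at f = c/(4L). The tube model, the canal length of 2.5 cm, and the speed of sound of 343 m/s are assumptions for this sketch and are not given in the slides.

```python
SPEED_OF_SOUND = 343.0   # assumed speed of sound in air, m/s
CANAL_LENGTH_M = 0.025   # assumed auditory canal length (about 2.5 cm)

def quarter_wave_resonance(length_m, c=SPEED_OF_SOUND):
    """First resonance of a tube closed at one end: f = c / (4 L)."""
    return c / (4.0 * length_m)

resonance = quarter_wave_resonance(CANAL_LENGTH_M)
print(f"Estimated canal resonance: {resonance:.0f} Hz")
```

Under these assumptions the estimate is about 3.4 kHz, close to the region of maximum sensitivity quoted above.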
The whole recording will last at least 15 minutes
The level must change in fine steps (< 2 dB); otherwise clicks become audible and act as a cue to the listener
Loudness is not simply sound intensity!
Subjective term describing the strength of the ear's perception of a sound.
Have to include the ear's sensitivity to the particular frequencies contained in the sound as in the equal loudness curves.
Sound must be increased in intensity by a factor of ten for the sound to be perceived as twice as loud.
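The rule of thumb above can be written as a small calculation: a tenfold increase in intensity is a 10 dB increase in level, and each 10 dB step doubles the perceived loudness. The sketch below assumes this rule holds exactly, which is an idealisation.

```python
import math

def loudness_ratio(intensity_ratio):
    """Perceived loudness ratio, assuming loudness doubles per 10 dB."""
    level_change_db = 10.0 * math.log10(intensity_ratio)
    return 2.0 ** (level_change_db / 10.0)

for ratio in (10, 100, 1000):
    print(f"intensity x{ratio:<5d} -> loudness x{loudness_ratio(ratio):.1f}")
```

So 100 times the intensity is perceived as only about four times as loud.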
For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies.
For mid-range sounds around 60 phons, the discrimination is not so pronounced
For very loud sounds in the neighborhood of 120 phons, the hearing response is more nearly flat.
E.g. Rock music
Too low → no bass
Too high → too much bass
The sound quality of a complex tone depends ONLY on the amplitudes and NOT on the relative phases of its harmonics.
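The signal-processing side of this statement can be illustrated as follows: two complex tones with the same harmonic amplitudes but different phases have different waveforms yet identical harmonic magnitudes. The sketch says nothing about perception itself. The three-harmonic tone, the amplitudes, and the phase values are hypothetical.

```python
import cmath
import math

N = 64  # samples in one period of the fundamental

def complex_tone(amplitudes, phases):
    """One period of a sum of harmonics with the given amplitudes and phases."""
    return [
        sum(a * math.cos(2 * math.pi * (k + 1) * n / N + p)
            for k, (a, p) in enumerate(zip(amplitudes, phases)))
        for n in range(N)
    ]

def harmonic_magnitudes(signal, count):
    """Amplitudes of the first `count` harmonics, taken from a direct DFT."""
    mags = []
    for k in range(1, count + 1):
        bin_k = sum(x * cmath.exp(-2j * math.pi * k * n / N)
                    for n, x in enumerate(signal))
        mags.append(2 * abs(bin_k) / N)
    return mags

amps = [1.0, 0.5, 0.25]                      # hypothetical harmonic amplitudes
tone_a = complex_tone(amps, [0.0, 0.0, 0.0])
tone_b = complex_tone(amps, [0.0, 1.2, 2.5])  # same amplitudes, shifted phases

mags_a = harmonic_magnitudes(tone_a, 3)
mags_b = harmonic_magnitudes(tone_b, 3)
max_waveform_diff = max(abs(a - b) for a, b in zip(tone_a, tone_b))
print([round(m, 3) for m in mags_a], [round(m, 3) for m in mags_b])
print(round(max_waveform_diff, 3))
```

The two magnitude lists match while the waveforms clearly differ, which is the sense in which an amplitude-only description ignores phase.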
Low frequency detection