Meena Ramani 04/07/04 Special thanks to Dr. Mark Skowronski



EEL6586 Automatic Speech Processing . Meena Ramani 04/07/04 Special thanks to Dr. Mark Skowronski. Topics. Anatomy of the Ear and Hearing Auditory perception Hearing aids and Cochlear implants. The Incredible sense of Hearing.

EEL6586

Automatic Speech Processing

Meena Ramani

04/07/04

Special thanks to Dr. Mark Skowronski

Topics
  • Anatomy of the Ear and Hearing
  • Auditory perception
  • Hearing aids and Cochlear implants.
The Incredible sense of Hearing

“Behind these unprepossessing flaps ... lie structures of such delicacy that they shame the most skillful craftsman”

Stevens, S.S. [Professor of Psychophysics, Harvard University]

Why study hearing?
  • Best example of Speech Recognition
  • Mimic Human Speech Processing
  • Hearing Aids/ Cochlear implants
  • Speech Coding
  • The stapes, or stirrup, is the smallest bone in the body. It is roughly the size of a grain of rice (~2.5 mm).
  • The movement of the eardrum in response to the minimum audible sound is less than the diameter of a hydrogen atom.
  • The inner ear has reached its full adult size and shape when the fetus is 20-22 weeks old.
  • Even during sleep the ear continues to function with incredible efficiency.
  • The ears are responsible for keeping the body in balance.
  • Hearing loss is the number one disability in the world.
  • Percentage of people who lose their hearing at age 19 and over: 76.3%
Dynamic Range of Hearing
  • The practical dynamic range could be said to be from the threshold of hearing to the threshold of pain
  • Sound level measurements in decibels are generally referenced to a standard threshold of hearing at 1000 Hz for the human ear, which can be stated in terms of sound intensity: I0 = 10^-12 W/m^2 (equivalently 10^-16 W/cm^2).

Dynamic range is enhanced by an effective amplification structure which extends its low end and by a protective mechanism which extends the high end.
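To make the decibel reference concrete, here is a minimal sketch of the intensity-level calculation. It assumes the standard reference intensity I0 = 10^-12 W/m^2 for the threshold of hearing at 1000 Hz; the 130 dB figure used for the threshold of pain is an illustrative assumption, not a value taken from the slides.

```python
import math

I0 = 1e-12  # W/m^2, assumed standard threshold of hearing at 1000 Hz


def intensity_level_db(intensity_w_m2, reference=I0):
    """Sound intensity level in dB relative to the reference intensity."""
    return 10.0 * math.log10(intensity_w_m2 / reference)


def intensity_from_db(level_db, reference=I0):
    """Inverse mapping: intensity in W/m^2 for a given level in dB."""
    return reference * 10.0 ** (level_db / 10.0)


# Threshold of hearing sits at 0 dB by definition of the reference.
print(intensity_level_db(I0))
# Assumed threshold of pain (~130 dB, illustrative) expressed as an intensity.
print(intensity_from_db(130.0))
```

Under these assumptions the practical dynamic range spans roughly thirteen orders of magnitude in intensity.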

Anatomy


Outer Ear

Pinna / Auricle, Auditory Canal


  • Focuses sound waves (variations in pressure) into the ear canal
  • Sound spreads out according to Inverse Square Law
  • A larger pinna captures more of the wave and hence more sound energy.
  • Elephants: Hear Low frequency sound from up to 5 miles away
  • Human Pinna structure: Pointed forward & has a number of curves.
  • More sensitive to sounds in front
  • Dogs/ Cats- Movable Pinna => focus on sounds from a particular direction
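The inverse-square spreading and the benefit of a larger pinna can be sketched numerically. This is a simplified model that assumes a point source radiating equally in all directions in free space and treats the pinna as a flat collecting aperture; the function names and example numbers are hypothetical.

```python
import math


def intensity_at_distance(source_power_w, distance_m):
    """Intensity (W/m^2) of an ideal point source at a given distance."""
    return source_power_w / (4.0 * math.pi * distance_m ** 2)


def captured_power(source_power_w, distance_m, aperture_area_m2):
    """Power (W) intercepted by a collecting aperture of the given area."""
    return intensity_at_distance(source_power_w, distance_m) * aperture_area_m2


near = intensity_at_distance(1e-3, 1.0)
far = intensity_at_distance(1e-3, 2.0)
print(far / near)  # doubling the distance quarters the intensity
```

In this model, doubling the aperture area doubles the captured power, which is the sense in which a larger pinna collects more sound energy.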
Outer Ear

Pinna / Auricle, Auditory Canal

Sound Localization

  • Horizontal localization
  • Vertical localization

Horizontal localization: is the sound on your right or left side?

Interaural Differences

  • Interaural Time Difference (ITD)
  • Interaural Intensity Difference (IID)


Interaural differences

The direct path from the acoustic source to the two ears will generally be different

-The signal needs to travel further to more distant ear

-More distant ear partially occluded by the head

Two types of interaural difference will emerge

- Interaural time difference (ITD)

- Interaural intensity difference (IID)

[Figure: schematic illustration of interaural differences. Left-ear and right-ear waveforms are plotted against time from sound onset; successive slides annotate the arrival time difference, the ongoing time difference, and the intensity difference.]

Thresholds

Interaural time differences (ITDs)

  • Threshold ITD ≈ 10-20 µs (~0.7 cm)

Interaural intensity differences (IIDs)

  • Threshold IID ≈ 1 dB
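The path-length figure quoted with the ITD threshold can be checked with a one-line conversion. The sketch below assumes a speed of sound of 343 m/s (air at about 20 °C) and takes the threshold to be on the order of 10-20 microseconds, which is the reading consistent with a path difference of about 0.7 cm.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def path_difference_cm(itd_seconds, c=SPEED_OF_SOUND_M_S):
    """Extra path length (cm) travelled to the far ear for a given ITD."""
    return itd_seconds * c * 100.0


for itd_us in (10, 20):
    print(itd_us, "microseconds ->", path_difference_cm(itd_us * 1e-6), "cm")
```

With these assumptions a 20 microsecond ITD corresponds to a little under 0.7 cm of extra path.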
Duplex Theory

Interaural time differences (ITDs) → low frequencies

Ongoing disparities can only be detected for frequencies up to around 1500 Hz

sensitivity declines rapidly above 1000 Hz

  • Auditory system assumes that the smallest phase difference corresponds to the true ITD
  • For frequencies below 700 Hz, this strategy will always give the correct answer

Interaural intensity differences (IIDs) → high frequencies

The amount of attenuation varies across frequency

  • below 500 Hz, IIDs are negligible (due to diffraction)
  • from 2 – 4 kHz, IIDs of 10 dB occur for sources located at 90º
  • IIDs can reach up to 20 dB at high frequencies
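The "smallest phase difference" rule for ITDs, and why it is only safe at low frequencies, can be illustrated with a short sketch. It assumes the largest ITD a human head can produce is about 0.7 ms (an assumed round number, not from the slides); the rule is unambiguous as long as half a period of the tone is at least that long.

```python
import math

MAX_ITD_S = 0.0007  # assumed largest ITD produced by a human head (~0.7 ms)


def unambiguous_limit_hz(max_itd_s=MAX_ITD_S):
    """Highest frequency whose half period still covers the largest ITD."""
    return 1.0 / (2.0 * max_itd_s)


def itd_from_smallest_phase(true_itd_s, freq_hz):
    """ITD inferred by wrapping the interaural phase into (-pi, pi]."""
    phase = 2.0 * math.pi * freq_hz * true_itd_s
    wrapped = math.atan2(math.sin(phase), math.cos(phase))
    return wrapped / (2.0 * math.pi * freq_hz)


print(unambiguous_limit_hz())                 # close to the 700 Hz quoted above
print(itd_from_smallest_phase(0.0006, 500))   # recovered correctly
print(itd_from_smallest_phase(0.0006, 1200))  # wraps: wrong sign, wrong side
```

At 1200 Hz the wrapped phase points to the opposite side of the head, which is the kind of error the intensity cue helps resolve at higher frequencies.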
Outer Ear

Pinna / Auricle, Auditory Canal

Sound Localization

  • Horizontal localization
  • Vertical localization

Vertical localization: is the sound above or below?

Pinna Directional Filtering

  • The pinna amplifies sounds from above and below differently
  • Curves in its structure selectively amplify certain parts of the sound spectrum
Sound Localization of Barn Owls and Cats

In a Barn Owl, the left ear opening is higher than the right, so a sound coming from below the owl's line of sight will reach the right ear first.

  • Hearing sensitivity comparison of Barn Owls, Cats & Humans
  • Both the cat and the Barn Owl have much more sensitive hearing than the human in the range of about 0.5 to 10 kHz.
  • The cat and Barn Owl have a similar sensitivity up to approximately 7 kHz.
  • Beyond this point the Barn Owl's sensitivity declines sharply.
Project 1: Using Head-Related Transfer Functions to deliver speech for virtual reality applications
  • The simplest spatial audio systems are limited to localizing in azimuth only.
  • To go beyond the limited capabilities of these approaches, we need to use Head-Related Transfer Functions (HRTF's).
  • The impulse response from the source to the ear drum is called the Head-Related Impulse Response (HRIR), and its Fourier transform H(f) is called the Head Related Transfer Function (HRTF)
  • It accounts for diffraction around the head, reflections from the shoulders and most significantly, reflections from the pinnae.
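The HRIR/HRTF relationship described above can be sketched in a few lines: spatialized audio is the mono signal convolved with each ear's HRIR, and the HRTF is the discrete Fourier transform of the HRIR. The two HRIRs below are made-up toy responses (a pure delay and attenuation for the far ear), not measured data, and all function names are hypothetical.

```python
import cmath


def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sequences."""
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for n, x in enumerate(signal):
        for k, h in enumerate(impulse_response):
            out[n + k] += x * h
    return out


def render_binaural(mono, hrir_left, hrir_right):
    """Filter one mono signal through the left-ear and right-ear HRIRs."""
    return convolve(mono, hrir_left), convolve(mono, hrir_right)


def hrtf(hrir):
    """HRTF samples H[k]: the DFT of the head-related impulse response."""
    n = len(hrir)
    return [
        sum(h * cmath.exp(-2j * cmath.pi * k * m / n) for m, h in enumerate(hrir))
        for k in range(n)
    ]


# Toy HRIRs (assumed): source on the left, so the right ear receives the
# sound two samples later and at half the amplitude.
hrir_left = [1.0, 0.0, 0.0]
hrir_right = [0.0, 0.0, 0.5]
left, right = render_binaural([1.0, -1.0, 0.5], hrir_left, hrir_right)
print(left)
print(right)
```

Real HRIRs are hundreds of samples long and differ with both azimuth and elevation, which is what lets them encode the pinna reflections mentioned above.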
Project 2: Beamforming and Direction of Arrival

[Figure labels: Frequency Dependent, Frequency Independent]

Most DOA algorithms (e.g., MUSIC, ESPRIT) apply an eigendecomposition to the spatial correlation matrix and work with the resulting subspaces, such as the noise subspace.

A more biologically inspired DOA algorithm should do better.
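As a rough illustration of what an ear-like DOA estimate might look like, the sketch below uses a two-microphone cross-correlation to find the inter-sensor delay (the analogue of an ITD) and converts it to an angle. This is a deliberately simple stand-in and is not MUSIC or ESPRIT; the far-field model, the microphone spacing, the sample rate, and all names are assumptions made for the example.

```python
import math


def best_lag(x_left, x_right, max_lag):
    """Lag (samples) of x_right relative to x_left maximizing cross-correlation."""
    best, best_score = 0, float("-inf")
    n = len(x_left)
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x_left[i] * x_right[j]
        if score > best_score:
            best, best_score = lag, score
    return best


def doa_degrees(lag_samples, sample_rate_hz, mic_spacing_m, c=343.0):
    """Far-field angle from broadside; positive means nearer the left sensor."""
    s = c * lag_samples / (sample_rate_hz * mic_spacing_m)
    s = max(-1.0, min(1.0, s))  # guard against lags beyond the physical limit
    return math.degrees(math.asin(s))


# Toy signals: the same click reaches the right sensor 3 samples later.
x_left = [0.0] * 32
x_right = [0.0] * 32
x_left[10] = 1.0
x_right[13] = 1.0

lag = best_lag(x_left, x_right, max_lag=8)
print(lag, doa_degrees(lag, sample_rate_hz=48000.0, mic_spacing_m=0.2))
```

A full system would also need the intensity cue and some handling of reverberation; this sketch covers only the time-delay part.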

Outer Ear

Pinna / Auricle, Auditory Canal

  • Auditory canal length: 2.7 cm
  • The canal can be modeled as a ¼-wave resonator
  • Resonance frequency: ~3 kHz
  • Boosts energy between 2 and 5 kHz by up to 15 dB
  • Correspondingly, the hearing threshold curves show a significant dip in the range 2000-5000 Hz, with peak sensitivity around 3500-4000 Hz.
  • The high-sensitivity region at 2-5 kHz is very important for the understanding of speech.
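The quarter-wave model can be checked directly: a tube closed at one end (the eardrum) and open at the other resonates where its length is a quarter of the wavelength. The sketch assumes a speed of sound of 343 m/s; the canal length is the 2.7 cm quoted above.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: air at roughly 20 degrees C


def quarter_wave_resonance_hz(length_m, c=SPEED_OF_SOUND_M_S):
    """Fundamental resonance of a tube closed at one end: f = c / (4 L)."""
    return c / (4.0 * length_m)


print(quarter_wave_resonance_hz(0.027))  # a little above 3 kHz
```

The result lands just above 3 kHz, in line with the resonance frequency listed above.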
Anatomy


Middle Ear

Eardrum, Ossicles, Oval Window

Functions of the Middle Ear


Impedance matching

  • Between vibrations in air and the liquid medium in the inner ear.
  • Acoustic impedance of the fluid is 4000 x that of air. => All but 0.1% would be reflected back.
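The 0.1% figure follows from the standard expression for power transmission across a boundary between two media at normal incidence, T = 4 r / (1 + r)^2, where r is the impedance ratio. The sketch below uses the ratio of 4000 quoted above and assumes plane waves with no matching mechanism in between.

```python
import math


def transmitted_fraction(impedance_ratio):
    """Fraction of incident power transmitted across the boundary."""
    r = impedance_ratio
    return 4.0 * r / (1.0 + r) ** 2


def transmission_loss_db(impedance_ratio):
    """The same quantity expressed as a loss in dB."""
    return -10.0 * math.log10(transmitted_fraction(impedance_ratio))


t = transmitted_fraction(4000.0)
print(t, 1.0 - t)                    # about 0.1% transmitted, the rest reflected
print(transmission_loss_db(4000.0))  # roughly a 30 dB loss without matching
```

This is the loss the middle ear's impedance matching is there to recover.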

Stapedius reflex (explained later)

The tympanic membrane or "eardrum" receives vibrations traveling up the auditory canal and transfers them through the tiny ossicles to the oval window.

Middle Ear

Eardrum, Ossicles, Oval Window

Eardrum → Malleus → Incus → Stapes → Oval Window

Ossicles: 3 bones Malleus (Hammer), Incus (Anvil), Stapes (Stirrup)

  • Amplification by lever action: < 3x
  • Area amplification: 15x
    • Large area of eardrum (55 mm²), small area of stapes (3.2 mm²)
    • Increases effective force per unit area.
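The two amplification mechanisms multiply: the same force concentrated on a smaller area raises the pressure by the area ratio, and the lever action scales the force. The sketch below uses the areas quoted above; the lever ratio of 1.3 is an assumed illustrative value (the slides only give an upper bound of 3x).

```python
import math


def pressure_gain(area_eardrum_mm2, area_stapes_mm2, lever_ratio):
    """Pressure gain from eardrum to oval window: area ratio times lever ratio."""
    return (area_eardrum_mm2 / area_stapes_mm2) * lever_ratio


def gain_db(pressure_ratio):
    """Pressure ratio expressed in dB."""
    return 20.0 * math.log10(pressure_ratio)


area_only = pressure_gain(55.0, 3.2, lever_ratio=1.0)
with_lever = pressure_gain(55.0, 3.2, lever_ratio=1.3)  # 1.3 is an assumption
print(area_only, gain_db(area_only))
print(with_lever, gain_db(with_lever))
```

The quoted areas give a ratio of about 17, in the same range as the 15x figure on the slide; combined with a modest lever ratio, the gain recovers most of the loss from the air-to-fluid impedance mismatch.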

Stapedius Reflex:

Protection against low frequency sounds

Tenses muscles → stiffens vibration of ossicles → reduces sound transmitted (20 dB)

Reflex is triggered by loud sounds

Anatomy


Inner Ear

Semicircular Canals, Cochlea

  • The semicircular canals are the body's balance organs.
  • Hair cells, in the canals, detect movements of the fluid in the canals caused by angular acceleration
  • The canals are connected to the auditory nerve.
Inner Ear

Semicircular Canals, Cochlea

The inner ear structure called the cochlea is a snail-shell like structure divided into three fluid-filled parts.

Two are canals (Scala tympani and Scala Vestibuli) for the transmission of pressure and in the third is the sensitive organ of Corti, which detects pressure impulses and responds with electrical impulses which travel along the auditory nerve to the brain.

[Figure caption] This mid-modiolar section shows the coiling of the cochlear duct (1), the scala vestibuli (2) and the scala tympani (3). The red arrow is from the oval window; the blue arrow points to the round window. Within the modiolus, the spiral ganglion (4) and auditory nerve fibres (5) are seen.

Inner Ear

Semicircular Canals, Cochlea

  • The organ of Corti can be thought of as the body's microphone.
  • Perception of pitch and perception of loudness are connected with this organ.
  • It is situated on the basilar membrane in the cochlear duct.
  • It contains inner hair cells and outer hair cells.
  • There are some 16,000-20,000 hair cells distributed along the basilar membrane.
  • Vibrations of the oval window cause the cochlear fluid to vibrate.
  • This causes the basilar membrane to vibrate, producing a traveling wave.
  • The traveling wave bends the hair cells, which produces generator potentials.
  • If large enough, these potentials stimulate the fibers of the auditory nerve to produce action potentials.
  • The outer hair cells amplify vibrations of the basilar membrane.
The cochlea works as a frequency analyzer

It operates on the incoming sound’s frequencies

Frequency Theory

Place Theory

Frequency Theory

The basilar membrane (BM) vibrates in synchrony with the sound entering the ear, producing action potentials (APs) in auditory nerve cells at the same frequency (e.g., 50 Hz sound → 50 APs/sec).

Limitation: the maximum firing rate is about 200 APs/sec.

This theory is used for frequencies < 100 Hz.

Place Theory

[Figure labels: 4 mm², 1 mm²]

  • High frequency sounds selectively vibrate the BM of the inner ear near the oval window.
  • =>Each position along the BM has a characteristic frequency at which it has maximum vibration.
  • Lower frequencies travel further along the membrane before causing excitation of the membrane.
  • The place along the basilar membrane where maximum excitation of the hair cells occurs determines the perception of pitch

The basilar membrane is 32-35 mm long.

At the base, the basilar membrane is stiff and thin (more responsive to high frequencies).

At the end, or “apex”, the basilar membrane is wide and floppy (more responsive to low frequencies).
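One common way to put numbers on the place-to-frequency map is the Greenwood function, f(x) = A (10^(a x) - k), with x the fractional distance from the apex. This formula and its human parameter values (A = 165.4, a = 2.1, k = 0.88) are not from the slides; they are brought in here as an assumption purely to illustrate the idea of a characteristic frequency at each position.

```python
def greenwood_frequency_hz(x_from_apex, A=165.4, a=2.1, k=0.88):
    """Characteristic frequency at fractional position x (0 = apex, 1 = base)."""
    if not 0.0 <= x_from_apex <= 1.0:
        raise ValueError("x_from_apex must lie between 0 and 1")
    return A * (10.0 ** (a * x_from_apex) - k)


# Sample a few positions from the wide, floppy apex to the stiff base.
for x in (0.0, 0.25, 0.5, 0.75, 1.0):
    print(x, round(greenwood_frequency_hz(x), 1))
```

The map is roughly logarithmic: equal steps along the membrane correspond to roughly equal frequency ratios over most of its length, running from about 20 Hz at the apex to about 20 kHz at the base.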

Tuning curves of auditory nerve fibers

Tonotopic map on Cochlea: Cells in different spots on the cochlea respond to different frequencies, with high frequencies near the base, and low frequencies near the apex.

  • Method to verify
    • Apply 50ms tone bursts every 100ms
    • Increase sound level until discharge rate increases by 1 spike
    • Repeat for all frequencies

The response curve is a band-pass filter (BPF) with almost constant Q (= f0/BW).

Auditory Neuron

The auditory nerve takes electrical impulses from the cochlea and the semicircular canals

Makes connections with both auditory areas of the brain.

Auditory Area of Brain

Information from both ears goes to both sides of the brain - binaural information is present in all of the major relay stations.

----- Left ear information

___ Right ear information

Auditory Neurons Adaptation
  • When a stimulus is suddenly applied, the spike rate of an auditory nerve fiber increases rapidly.
  • If the stimulus remains (e.g., a steady tone), the rate decreases exponentially.
  • Spontaneous rate: neuron firings in the absence of a stimulus.
  • The neuron is more responsive to changes than to steady inputs.
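The adaptation behaviour can be sketched as a simple exponential decay from an onset rate toward a steady-state rate. Every number below (spontaneous, onset and steady rates, and the time constant) is a hypothetical value chosen for illustration, not a measurement.

```python
import math


def firing_rate(t_s, spontaneous=50.0, onset=300.0, steady=150.0, tau_s=0.04):
    """Spike rate (spikes/s) at time t_s for a tone switched on at t = 0.

    Before onset the fiber fires at its spontaneous rate; at onset the rate
    jumps, then decays exponentially toward the steady-state rate.
    """
    if t_s < 0.0:
        return spontaneous
    return steady + (onset - steady) * math.exp(-t_s / tau_s)


for t in (-0.01, 0.0, 0.02, 0.04, 0.2):
    print(t, round(firing_rate(t), 1))
```

The large response at onset followed by a smaller sustained response is what makes the fiber emphasize changes over steady inputs.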
Perception of Sound

Threshold of hearing

  • How it is measured
  • Age effects

Equal Loudness curves

Bass loss problem

Critical bands

Frequency Masking

Temporal Masking

Threshold of Hearing

Hearing area is the area between the Threshold in quiet and the threshold of pain.

Note:

Shift in threshold of quiet for those who listen to loud music

The sound intensity required to be heard is quite different for different frequencies.

Threshold of hearing at 1000 Hz is nominally taken to be 0 dB.

Marked discrimination against low frequencies so that about 60 dB is required to be heard at 30 Hz.

The maximum sensitivity at about 3500 to 4000 Hz is related to the resonance of the auditory canal.

Bekesy Tracking
  • Used to measure the threshold in quiet or the JNL of a test tone
  • Steps:
    • Play a tone
    • Vary its amplitude until it is audible
    • Then reduce the tone's amplitude until it is definitely inaudible, while the frequency is slowly changed
    • Then increase the SPL until you can hear the tone again, and so on.

The whole recording will last at least 15 minutes.

The level changes in fine steps of < 2 dB; otherwise clicks become audible and act as a cue to the listener.
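The up-and-down logic of the tracking procedure can be sketched with a simulated listener. This is a simplified model: the listener is idealized (hears the tone exactly when the level reaches a fixed threshold), the frequency is held constant rather than swept, and the threshold is estimated as the mean of the reversal levels. All names and parameter values are hypothetical.

```python
def bekesy_track(true_threshold_db, start_db=0.0, step_db=1.0, n_reversals=8):
    """Simulate level tracking at one frequency; return (estimate, reversals)."""
    level = start_db
    # The level rises while the tone is inaudible and falls while it is audible.
    direction = -1 if start_db >= true_threshold_db else +1
    reversals = []
    while len(reversals) < n_reversals:
        audible = level >= true_threshold_db
        new_direction = -1 if audible else +1
        if new_direction != direction:
            reversals.append(level)  # the listener's response just changed
            direction = new_direction
        level += direction * step_db
    return sum(reversals) / len(reversals), reversals


estimate, reversals = bekesy_track(true_threshold_db=12.5)
print(estimate, reversals)
```

The trace zigzags around the true threshold, and the step size bounds how finely the threshold can be resolved, which is one reason for keeping the steps small.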

Threshold in Quiet: Variation with Age
  • Hearing sensitivity decreases with age, especially at high frequencies
  • Note that we also lose sensitivity at 3.5-4 kHz
  • Presbycusis: hearing loss because of age
  • Hair cells which process high frequencies are closest to the oval window and are often the first to be damaged.
Equal Loudness Curves

Loudness is not simply sound intensity!

Subjective term describing the strength of the ear's perception of a sound.

Have to include the ear's sensitivity to the particular frequencies contained in the sound as in the equal loudness curves.

Sound must be increased in intensity by a factor of ten for the sound to be perceived as twice as loud.
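The factor-of-ten rule can be turned into a small calculation: a tenfold intensity increase is +10 dB, and each +10 dB is taken to double the perceived loudness. The sketch treats this as a rule of thumb that holds at a fixed frequency and moderate levels; it is an approximation, not a full loudness model.

```python
import math


def level_change_db(intensity_ratio):
    """Level change in dB for a given intensity ratio."""
    return 10.0 * math.log10(intensity_ratio)


def loudness_ratio(delta_db):
    """Perceived loudness ratio under the 'doubling per 10 dB' rule of thumb."""
    return 2.0 ** (delta_db / 10.0)


for ratio in (10.0, 100.0, 1000.0):
    delta = level_change_db(ratio)
    print(ratio, delta, loudness_ratio(delta))
```

Under this rule a hundredfold increase in intensity sounds only about four times as loud, which is why loudness cannot be read directly off intensity.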

The Bass Loss Problem

For very soft sounds, near the threshold of hearing, the ear strongly discriminates against low frequencies.

For mid-range sounds around 60 phons, the discrimination is not so pronounced

For very loud sounds in the neighborhood of 120 phons, the hearing response is more nearly flat.

E.g., rock music

Too low → no bass

Too high → too much bass

Ohm's Law of Hearing

The sound quality of a complex tone depends ONLY on the amplitudes and NOT on the relative phases of its harmonics.

Elephants
  • Sound Production
    • A typical male elephant's rumble has an average minimum frequency of around 12 Hz, a female's rumble around 13 Hz and a calf's around 22 Hz.
    • Produce sounds ranging over more than 10 octaves, from 5 Hz to over 9,000 Hz
    • Produce very gentle, soft sounds as well as extremely powerful sounds. (112dB recorded a meter away)
  • Hearing
    • Wider tympanic membranes
    • Longer ear canals (20 cm)
    • Spacious middle ears.

Low frequency detection
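The "more than 10 octaves" figure for elephant sound production can be verified from the quoted frequency limits, since each octave is a doubling of frequency. The sketch uses only the 5 Hz and 9,000 Hz endpoints given above.

```python
import math


def octaves_between(f_low_hz, f_high_hz):
    """Number of octaves (frequency doublings) spanned by a frequency range."""
    return math.log2(f_high_hz / f_low_hz)


print(octaves_between(5.0, 9000.0))  # a little under 11 octaves
```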
