2. Outline Audio Signals
Sampling
Quantization
Audio file format
WAV/MIDI
Human auditory system
3. What is Sound? Sound is a wave phenomenon, involving molecules of air being compressed and expanded under the action of some physical device.
A speaker in an audio system vibrates back and forth and produces a longitudinal pressure wave that we perceive as sound.
Since sound is a pressure wave, it takes on continuous values, as opposed to digitized ones.
If we wish to use a digital version of sound waves, we must form digitized representations of audio information.
4. Digitization Digitization means conversion to a stream of numbers, and preferably these numbers should be integers for efficiency.
1-dimensional nature of sound: amplitude values (sound pressure/level) depend on a 1D variable, time.
5. Digitization cont’d Digitization must be in both time and amplitude
Sampling: measuring the quantity we are interested in, usually at evenly spaced intervals.
The first kind of sampling, using measurements only at evenly spaced time intervals, is simply called sampling. The rate at which it is performed is called the sampling frequency.
For audio, typical sampling rates are from 8 kHz (8,000 samples per second) to 48 kHz. This range is determined by the Nyquist theorem, discussed later.
Sampling in the amplitude or voltage dimension is called quantization.
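The two steps above (sampling in time, then quantization in amplitude) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `sample_sine` and `quantize`, the 1 kHz test tone, and the assumption that the signal lies in [-1.0, 1.0] are choices made for this example, not part of the slides.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, duration_s):
    """Sample a unit-amplitude sine wave at evenly spaced time instants."""
    n_samples = int(round(sample_rate_hz * duration_s))
    return [math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

def quantize(sample, n_bits):
    """Map a value in [-1.0, 1.0] to the nearest of 2**n_bits integer levels."""
    levels = 2 ** n_bits
    step = 2.0 / (levels - 1)            # spacing between adjacent levels
    index = round((sample + 1.0) / step)
    return max(0, min(levels - 1, index))  # clamp to the valid code range

# Assumed example: a 1 kHz tone sampled at 8 kHz for 1 ms, 8 bits per sample.
samples = sample_sine(freq_hz=1000, sample_rate_hz=8000, duration_s=0.001)
codes = [quantize(s, n_bits=8) for s in samples]
```

Here `samples` holds the time-sampled (still real-valued) amplitudes, and `codes` holds the integer stream that a PCM system would store or transmit.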
6. Sampling and Quantization
7. Audio Digitization (PCM)
8. Parameters in Digitizing To decide how to digitize audio data we need to answer the following questions:
1. What is the sampling rate?
2. How finely is the data to be quantized, and is quantization uniform?
3. How is audio data formatted? (file format)
9. Sampling Rate Signals can be decomposed into a sum of sinusoids.
-- weighted sinusoids can build up quite complex signals
10. Sampling Rate cont’d If the sampling rate just equals the actual frequency,
a false signal (a constant) is detected.
If we sample at 1.5 times the actual frequency,
we obtain an incorrect (alias) frequency that is lower than the correct one:
it is half the correct one -- the wavelength, from peak to peak, is double that of the actual signal.
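The two cases on this slide can be checked numerically. The sketch below assumes a pure tone and uses the standard frequency-folding rule; the function name `apparent_frequency` is hypothetical and chosen only for this example.

```python
def apparent_frequency(signal_hz, sample_rate_hz):
    """Frequency that a pure tone appears to have after sampling."""
    # Fold the true frequency into the range [0, sample_rate / 2].
    remainder = signal_hz % sample_rate_hz
    return min(remainder, sample_rate_hz - remainder)

print(apparent_frequency(1000, 1000))  # 0    -> a constant (false) signal
print(apparent_frequency(1000, 1500))  # 500  -> alias at half the true frequency
print(apparent_frequency(1000, 2500))  # 1000 -> sampled fast enough, no alias
```

With a 1 kHz tone, sampling at 1 kHz yields a constant, sampling at 1.5 kHz yields a 500 Hz alias, and sampling above 2 kHz recovers the true frequency.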
11. Nyquist Theorem For correct sampling we must use a sampling rate equal to at least twice the maximum frequency content in the signal. This rate is called the Nyquist rate.
Sampling theory – Nyquist theorem
If a signal is band-limited (frequency-limited), i.e., there is a lower limit f1 and an upper limit f2 of frequency components in the signal, then the sampling rate should be at least 2(f2 - f1).
Proof and more math: http://en.wikipedia.org/wiki/Nyquist-Shannon_sampling_theorem
12. Quantization (Pulse Code Modulation) At every time interval the sound is converted to a digital equivalent
Using 2 bits the following sound can be digitized
Telephone: 8 bits
CD: 16 bits
13. Digitize audio Each sample is quantized, i.e., rounded
e.g., 2^8 = 256 possible quantized values
Each quantized value is represented by bits
8 bits for 256 values
Example: 8,000 samples/sec, 256 quantized values --> 64,000 bps
Receiver converts it back to an analog signal:
some quality reduction
CD: 1.411 Mbps
MP3: 96, 128, 160 kbps
Internet telephony: 5.3 - 13 kbps
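The uncompressed PCM figures above follow from multiplying sampling rate, bits per sample, and number of channels. A minimal sketch, assuming the helper name `pcm_bit_rate` (hypothetical) and the CD parameters of 44.1 kHz, 16 bits, and 2 channels:

```python
def pcm_bit_rate(sample_rate_hz, levels, channels=1):
    """Bits per second for uncompressed PCM with the given number of levels."""
    bits_per_sample = (levels - 1).bit_length()  # e.g. 256 levels -> 8 bits
    return sample_rate_hz * bits_per_sample * channels

print(pcm_bit_rate(8000, 256))                 # 64000   (telephone-quality example)
print(pcm_bit_rate(44100, 65536, channels=2))  # 1411200 (about 1.411 Mbps, CD)
```

The MP3 and Internet telephony rates in the list are lower than this product because those formats compress the audio rather than storing raw samples.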
14. Audio Quality vs. Data Rate
15. More on Quantization Quantization is lossy!
Roundoff errors => quantization noise/error
16. Quantization Noise Quantization noise: the difference between the actual value of the analog signal, for the particular sampling time, and the nearest quantization interval value.
At most, this error can be as much as half of the interval.
The quality of the quantization is characterized by the Signal to Quantization Noise Ratio (SQNR).
A special case of SNR (Signal to Noise Ratio)
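The half-interval bound on quantization error can be verified with a small experiment. The sketch below assumes a uniform quantizer with an arbitrary step of 0.25; `quantize_uniform` is a name invented for this example.

```python
def quantize_uniform(value, step):
    """Round a value to the nearest multiple of the quantization step."""
    return round(value / step) * step

step = 0.25  # assumed quantization interval
values = [i / 1000 for i in range(-1000, 1001)]
max_error = max(abs(v - quantize_uniform(v, step)) for v in values)

# The worst-case error never exceeds half of the interval.
print(max_error <= step / 2 + 1e-12)  # True
```

Halving the step (equivalently, adding one bit per sample) halves this worst-case error.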
17. Signal to Noise Ratio (SNR) Signal to Noise Ratio (SNR): the ratio of the power of the correct signal to that of the noise
A common measure of the quality of the signal.
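SNR is usually measured in decibels (dB), where 1 dB is 1/10 bel. The SNR value, in units of dB, is defined in terms of base-10 logarithms of squared voltages, as follows:
$\mathrm{SNR} = 10 \log_{10} \frac{V_{signal}^2}{V_{noise}^2} = 20 \log_{10} \frac{V_{signal}}{V_{noise}}$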
18. Signal to Noise Ratio (SNR) cont’d
The actual power in a signal is proportional to the square of the voltage. For example, if the signal voltage Vsignal is 10 times the noise, then the SNR is 20 log10(10)=20dB.
If the power from ten violins is ten times that from one violin playing, then the ratio of power is 10 dB, or 1 B.
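Both worked examples on this slide (a voltage ratio of 10 and a power ratio of 10) can be reproduced directly. This is a small sketch; the function names are hypothetical.

```python
import math

def snr_db_from_voltages(v_signal, v_noise):
    """SNR in dB from a voltage ratio: power goes as voltage squared."""
    return 20 * math.log10(v_signal / v_noise)

def power_ratio_db(p1, p2):
    """Ratio of two powers expressed in dB."""
    return 10 * math.log10(p1 / p2)

print(snr_db_from_voltages(10, 1))  # 20.0 dB: signal voltage is 10x the noise
print(power_ratio_db(10, 1))        # 10.0 dB: ten violins versus one
```

The factor of 20 for voltages and 10 for powers is the same definition seen from two sides, since squaring a ratio doubles its logarithm.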
19. Common sound levels
20. Signal to Quantization Noise Ratio (SQNR) Revisited For a quantization accuracy of N bits per sample, the peak SQNR can be simply expressed:
$\mathrm{SQNR} = 20 \log_{10} \frac{2^{N-1}}{1/2} = 20 N \log_{10} 2 \approx 6.02N\ \text{dB}$
6.02N is the worst case.
If the input signal is sinusoidal, the quantization error is statistically independent, and its magnitude is uniformly distributed between 0 and half of the interval, then it can be shown that the expression for the SQNR becomes:
$\mathrm{SQNR} = 6.02N + 1.76\ \text{dB}$
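As a quick numerical check, the standard results are 20 log10(2^N), about 6.02N dB, for the peak SQNR, and 6.02N + 1.76 dB for a full-scale sinusoid, where 1.76 dB is 10 log10(1.5). The sketch below evaluates both for the 8-bit and 16-bit cases mentioned earlier; the function names are invented for this example.

```python
import math

def peak_sqnr_db(n_bits):
    """Peak SQNR for N-bit uniform quantization: 20*log10(2**N), about 6.02*N dB."""
    return 20 * math.log10(2 ** n_bits)

def sinusoid_sqnr_db(n_bits):
    """SQNR for a full-scale sinusoidal input: about 6.02*N + 1.76 dB."""
    return peak_sqnr_db(n_bits) + 10 * math.log10(1.5)

for n_bits in (8, 16):
    print(n_bits, round(peak_sqnr_db(n_bits), 2), round(sinusoid_sqnr_db(n_bits), 2))
```

Each additional bit adds roughly 6 dB, which is why 16-bit CD audio has about 48 dB more dynamic range than 8-bit telephone audio.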
21. Outline Audio Signals
Audio file format
Human auditory system
22. Audio File Format: .WAV Microsoft format: Interleaved multi-channel samples
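To make "interleaved multi-channel samples" concrete, here is a minimal sketch using Python's standard `wave` and `struct` modules. The helper name `write_stereo_wav`, the 440 Hz test tone, and the choice of 16-bit stereo at 44.1 kHz are assumptions for illustration only.

```python
import io
import math
import struct
import wave

def write_stereo_wav(target, left, right, sample_rate_hz=44100):
    """Write two equal-length lists of 16-bit samples as interleaved stereo."""
    frames = bytearray()
    for l_sample, r_sample in zip(left, right):
        # One frame = left sample then right sample, little-endian int16.
        frames += struct.pack("<hh", l_sample, r_sample)
    with wave.open(target, "wb") as wav_file:
        wav_file.setnchannels(2)
        wav_file.setsampwidth(2)           # bytes per sample
        wav_file.setframerate(sample_rate_hz)
        wav_file.writeframes(bytes(frames))

# Assumed example: 10 ms of a 440 Hz tone on the left, silence on the right.
left = [int(32767 * math.sin(2 * math.pi * 440 * n / 44100)) for n in range(441)]
right = [0] * 441
buffer = io.BytesIO()                      # a file path would also work here
write_stereo_wav(buffer, left, right)
```

In the resulting data, the samples alternate L0, R0, L1, R1, and so on, so a player can read one frame at a time and send each value to its channel.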
24. Audio File Format: MIDI MIDI: Musical Instrument Digital Interface
A simple scripting language and hardware setup
MIDI codes “events” that stand for the production of sounds. E.g., a MIDI event might include values for the pitch of a single note, its duration, and its volume.
MIDI is a standard adopted by the electronic music industry for controlling devices, such as synthesizers and sound cards, that produce music.
Supported by most sound cards
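As an illustration of what a MIDI event looks like on the wire, the sketch below builds Note On and Note Off channel messages: a status byte (0x90 or 0x80 plus the channel number) followed by pitch and velocity, each in the range 0-127. The function names are hypothetical. Note that in the byte stream the duration is not a field of its own; it is the time between a Note On and the matching Note Off.

```python
def _check(channel, pitch, velocity):
    """Validate the value ranges allowed in a MIDI channel message."""
    if not (0 <= channel <= 15 and 0 <= pitch <= 127 and 0 <= velocity <= 127):
        raise ValueError("channel must be 0-15; pitch and velocity must be 0-127")

def note_on(channel, pitch, velocity):
    """Three bytes of a Note On message (start sounding a note)."""
    _check(channel, pitch, velocity)
    return bytes([0x90 | channel, pitch, velocity])

def note_off(channel, pitch, velocity=0):
    """Three bytes of a Note Off message (stop sounding a note)."""
    _check(channel, pitch, velocity)
    return bytes([0x80 | channel, pitch, velocity])

# Middle C (pitch 60) on the first channel at moderate volume.
print(note_on(0, 60, 100).hex())   # 903c64
print(note_off(0, 60).hex())       # 803c00
```

Three bytes per event is why MIDI files are far smaller than sampled audio: they describe what to play, not the waveform itself.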
25. Outline Audio Signals
Audio file format
Human auditory system
26. Computer vs. Ear Multimedia signals are interpreted by humans!
Need to understand human perception
Almost all original multimedia signals are analog signals:
A/D conversion is needed for computer processing
27. Properties of HAS: Human Auditory System Range of human hearing: 20 Hz - 20 kHz
=> Minimal sampling rate for music: 40 kHz (Nyquist rate)
CD audio: 44.1 kHz sampling rate
each sample is represented by a 16-bit signed integer
2 channels are used to create a stereo system
44,100 * 16 * 2 = 1,411,200 bits/second (bps)
Speech signal: 300 Hz – 4 kHz
=> Minimum sampling rate is 8 kHz (as in the telephone system)
28. Properties of Human Auditory System Hearing threshold varies dramatically at different frequencies
Most sensitive around 2 kHz
29. Properties of Human Auditory System Critical Bands:
Our brains perceive sounds through 25 distinct critical bands, whose bandwidth grows logarithmically with frequency.
At 100Hz, the bandwidth is about 160Hz;
At 10kHz it is about 2.5kHz in width.
30. Properties of Human Auditory System Masking effect:
what we hear depends on what audio environment we are in
One strong signal can overwhelm/hide another
31. Properties of Human Auditory System Masking thresholds in the time domain:
32. HAS: Audio Filtering Prior to sampling and AD (Analog-to-Digital) conversion, the audio signal is also usually filtered to remove unwanted frequencies.
For speech, typically from 50Hz to 10kHz is retained, and other frequencies are blocked by the use of a band-pass filter that screens out lower and higher frequencies
An audio music signal will typically contain from about 20Hz up to 20kHz
At the DA converter end, high frequencies may reappear in the output (Why?)
Because of sampling and then quantization, the smooth input signal is replaced by a series of step functions containing all possible frequencies.
So at the decoder side, a lowpass filter is used after the DA circuit
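The effect of the output low-pass filter can be illustrated with a toy example. The sketch below models the DA output as a staircase (zero-order hold) and smooths it with a moving average, which is only a crude stand-in for a real analog low-pass filter; all names and parameter values are assumptions for illustration.

```python
def zero_order_hold(samples, repeat):
    """Hold each sample for `repeat` output points, giving a staircase signal."""
    return [s for s in samples for _ in range(repeat)]

def moving_average(signal, window):
    """Crude low-pass filter: average each point with the points before it."""
    smoothed = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1):i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed

def largest_jump(signal):
    """Biggest change between neighbouring points (a proxy for sharp edges)."""
    return max(abs(b - a) for a, b in zip(signal, signal[1:]))

staircase = zero_order_hold([0.0, 1.0, 0.0, -1.0], repeat=8)
smoothed = moving_average(staircase, window=8)

print(largest_jump(staircase))  # 1.0   (abrupt steps carry high frequencies)
print(largest_jump(smoothed))   # 0.125 (steps spread out after filtering)
```

The abrupt steps are where the extra high-frequency content lives; averaging spreads each step over several points, which is the time-domain view of removing those frequencies.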
33. HAS: Perceptual audio coding The HAS properties can be exploited in audio coding:
Different quantizations for different critical bands
If you can’t hear the sound, don’t encode it
Discard weaker signal if a stronger one exists in the same band (frequency-domain masking)
Discard soft sound after a loud sound (time-domain masking)
Stereo redundancy: At low frequencies, we can’t detect where the sound is coming from. Encode it mono.
More on this later (MP3, APE…)
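The frequency-domain masking rule above can be sketched as a toy filter that keeps only the strongest component in each band. This is a deliberate oversimplification: real perceptual coders compute masking thresholds rather than keeping a single winner per band, and the band edges, levels, and function name below are all illustrative assumptions.

```python
import bisect

def keep_strongest_per_band(components, band_edges_hz):
    """Keep one (frequency_hz, level_db) pair per band: the loudest one."""
    strongest = {}
    for freq, level in components:
        band = bisect.bisect_right(band_edges_hz, freq)  # index of the band
        if band not in strongest or level > strongest[band][1]:
            strongest[band] = (freq, level)
    return sorted(strongest.values())

# Hypothetical band edges (Hz) and signal components (Hz, dB).
edges = [100, 200, 300, 400, 510]
components = [(220, 60), (250, 30), (450, 40)]

print(keep_strongest_per_band(components, edges))
# [(220, 60), (450, 40)]  -> the 250 Hz tone is dropped: it shares a band
#                            with the much louder 220 Hz tone
```

The 450 Hz component survives even though it is quieter than the 220 Hz one, because it falls in a different band and is therefore not masked under this rule.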
34. Further Exploration Links for Chapter 6 in “Further Exploration” of the textbook page
An extensive list of audio file formats.
CD audio file formats are somewhat different. The main music format is called “Red Book audio.” A good description of various CD formats is on the website.
A General MIDI Instrument Patch Map, along with a General MIDI Percussion Key Map.
A link to a good tutorial on MIDI and wave table music synthesis.
A link to a Java program for decoding MIDI streams.
A good multimedia/sound page, including a source for locating Internet sound/music materials.