Multimedia Systems Lecture3 – Digital Audio Representation


- Sound is the brain's interpretation of electrical impulses sent by the inner ear through the nervous system. Some sounds the human ear cannot perceive: those with a very high or very low frequency.
- You can use sound in a multimedia project in two ways. In fact, all sounds fall into two broad categories:
- Content sound: provides information to the audience, for example, dialogue in movies or theater.
- Ambient sound: consists of an array of background sounds and sound effects.

- When an object moves back and forth (vibrates), it pushes the air immediately next to it a bit to one side and, when coming back, creates a slight vacuum. This process of oscillation creates a wave.
- Your voice sounds different in a tape recording than it does to you. This is because, inside your body, sound waves travel through the bones, cartilage, and muscles between your voice box and your inner ear. Sounds from tape recorders (and other people) travel through the air to your eardrum, and thus sound different.

- Amplitude
- Wavelength (w)
- Frequency (f)
- Timbre
- Hearing: [20 Hz – 20 kHz]
- Speech: [200 Hz – 8 kHz]

- Why does the horn of an approaching car sound high-pitched when it is coming close to you, yet suddenly becomes low when it moves away?
- As the car and its horn move toward you, the pushes of sound (the sound waves) get crammed together, which makes them higher pitched. When the car and the horn move away from you, the sound waves are spread further apart, which makes a lower-pitched sound.
- This is called the Doppler effect.

- All multimedia elements must be in digital format. In contrast, other media such as TV programs and films are analog in nature.
- A line drawn on a computer screen is discrete, but because the pixels on screen are very close to each other, our eyes cannot tell the difference and we perceive a continuous line.
- The plants and trees that we see around us are continuous, but their digital pictures are forced to be discrete. Nevertheless, if we include enough data in the digital representation, our eyes cannot tell the difference.

- The sound heard by the ear (also called audio) is analog in nature: a continuous waveform. Acoustic instruments produce analog sounds.
- A computer needs to transform the analog sound wave into a digital representation consisting of discrete numbers.

- Must convert the waveform to digital form via:
- Sampling
- Quantization
- Compression

- Sampling is the reduction of a continuous signal to a discrete signal.
- A sample refers to a value or set of values at a point in time and/or space.
- A sampler is a subsystem or operation that extracts samples from a continuous signal.
- A common example is the conversion of a sound wave (a continuous signal) to a sequence of samples (a discrete-time signal).
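As a minimal sketch of this idea, the snippet below generates discrete samples of a continuous sine wave (the function `sample_signal` and its parameter names are hypothetical, not from the lecture):

```python
import math

def sample_signal(f_signal, f_sample, duration):
    """Extract discrete-time samples from a continuous sine wave.

    f_signal: tone frequency in Hz; f_sample: sampling rate in
    samples/second; duration: length of the recording in seconds.
    """
    n_samples = int(f_sample * duration)
    # Evaluate the continuous signal sin(2*pi*f*t) at t = k / f_sample
    return [math.sin(2 * math.pi * f_signal * k / f_sample)
            for k in range(n_samples)]

# 10 ms of a 440 Hz tone sampled at 8000 samples/second -> 80 samples
samples = sample_signal(440, 8000, 0.01)
```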

Signal sampling representation: the continuous signal is shown in green, whereas the discrete samples are in blue.

- When sampling a sound, the computer takes snapshots of the waveform. The frequency of these snapshots is called the sampling rate; it typically varies from 5,000 to 90,000 samples per second.
- Example: your mother is scolding you for breaking her precious vase. Your sister, not interested in the matter, hears only bits of the conversation. Later you ask your sister whether the scolding was justified, and she replies that she did not hear the whole conversation: she sampled the voices at a very low rate.

- Digitization is the process of assigning a discrete value to each of the sampled values. It is performed by an integrated circuit (IC) called an A-to-D (analog-to-digital) converter. In 8-bit digitization, this value is between 0 and 255; in 16-bit digitization, it is between 0 and 65,535.
- The process of digitization introduces noise into the signal, related to the number of bits per sample. The more bits used to store the sampled value, the more accurate the sample and the less the noise.

- For lossless digitization, the sampling rate should be at least twice the maximum frequency in the signal, to ensure that no information is lost.
- In mathematical terms:

fs > 2·fm, where fs is the sampling frequency and fm is the maximum frequency in the signal.

- Thus, to represent a sound with a frequency of 440 Hz, it is necessary to sample that sound at a minimum rate of 880 samples per second. Therefore: sampling rate = 2 × highest frequency.

- max data rate = 2 · H · log2(V) bits/second, where H = bandwidth (in Hz) and V = number of discrete signal levels
- This gives the maximum number of bits that can be sent per second on a noiseless channel of bandwidth H, if each signal change encodes one of V levels.
- Example: what is the maximum data rate for a 3 kHz channel that transmits data using 2 levels (binary)?
- 2 × 3,000 × log2(2) = 6,000 bits/second
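The same calculation can be sketched in Python (the function name is mine, not the lecture's):

```python
import math

def max_data_rate(bandwidth_hz, levels):
    """Nyquist's theorem for a noiseless channel:
    max rate = 2 * H * log2(V) bits/second."""
    return 2 * bandwidth_hz * math.log2(levels)

# 3 kHz channel with binary (2-level) signalling
rate = max_data_rate(3000, 2)  # 6000.0 bits/second
```

Doubling the number of levels to 4 adds one bit per signal change, doubling the rate to 12,000 bits/second.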

- But what if one cannot sample fast enough?
- Limit the signal bandwidth to half of the maximum sampling frequency: a low-pass filter removes the higher frequencies.
- E.g., if the maximum sampling frequency is 22 kHz, the signal must be low-pass filtered down to 11 kHz.

- Auditory range (human hearing): roughly 20 Hz to 20 kHz
- must sample at 44.1 kHz (capturing frequencies up to 22.05 kHz) to ensure that the information is not lost

- Speech frequency [200 Hz, 8 kHz]
- sample up to 16 kHz
- but typically 4 kHz to 11 kHz is used

- Quantization is the process of approximating a continuous range of values (or a very large set of possible discrete values) by a relatively small set of discrete symbols or integer values.
- Quantization of an audio sample means allocating a number of bits to the height of the sampled waveform.
- Typically used:
- 8 bits = 256 levels
- 16 bits = 65,536 levels

- Ex: Telephone applications frequently use 8 bit quantization.
- Compact Disc uses 16 bit quantization.
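A minimal sketch of uniform quantization in Python (a hypothetical helper, assuming samples normalized to [-1, 1]):

```python
def quantize(sample, bits):
    """Map a sample in [-1.0, 1.0] to an integer code in [0, 2**bits - 1]."""
    levels = 2 ** bits
    # Scale [-1, 1] onto [0, levels - 1] and round to the nearest level
    code = round((sample + 1.0) / 2.0 * (levels - 1))
    return max(0, min(levels - 1, code))

# 8 bits -> 256 levels (codes 0..255); 16 bits -> 65,536 levels (0..65,535)
```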

- How should the levels be distributed:
- Linearly? (PCM)
- Perceptually? (u-Law)
- Differential? (DPCM)
- Adaptively? (ADPCM)

- Pulse-code modulation (PCM) is a method to digitally represent sampled analog signals.
- Pulse modulation in general:
- uses discrete time samples of analog signals
- transmission is composed of analog information sent at different times
- the pulse amplitude or pulse timing is allowed to vary continuously over all values

- It is the standard form for digital audio in computers and various Compact Disc and DVD formats, as well as other uses such as digital telephone systems.

- Divide the amplitude range into N units (for log2(N)-bit quantization).
- LPCM is a particular method of pulse-code modulation which represents an audio waveform as a sequence of amplitude values recorded at a sequence of times.

- Algorithms that reduce the dynamic range of an audio signal.
- Intensity values logarithmically mapped over N quantization units.

[Figure: sound intensity mapped logarithmically to the quantization index]
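The logarithmic mapping can be sketched with the standard μ-law companding formula (μ = 255, as used in 8-bit telephone audio); the function names below are mine:

```python
import math

MU = 255  # mu value used by 8-bit telephone companding

def mu_law_compress(x, mu=MU):
    """Logarithmically compress a linear sample x in [-1, 1]."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y, mu=MU):
    """Inverse mapping: recover the linear sample from the compressed value."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

# Quiet sounds get proportionally more quantization levels than loud ones:
# compress(0.01) is roughly 0.23, while compress(1.0) stays 1.0.
```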

- DPCM is a signal encoder that uses the baseline of PCM but adds functionality based on prediction of the samples of the signal.
- It represents the first sample as a full PCM-coded value, and each following sample as the difference from the previous PCM-coded sample.
- What if we look at sample differences, not the samples themselves?
- d(t) = x(t) - x(t-1)

- Calculates the difference (change) between two adjacent samples.
- Send the first value, then the relative changes.
- the first value uses full bits, the changes use fewer bits
- E.g., 220, 218, 221, 219, 220, 221, 222, 218, ... (all values between 218 and 222)
- Difference sequence sent: 220, -2, +3, -2, +1, +1, +1, -4, ...

- Result: encoding the original sequence of values in 0..255 needs 8 bits per sample;
- Difference coding needs only 3 bits per sample (covering differences from -4 to +3)

- Example: 3 bits for encoding each difference
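Difference coding can be sketched in Python using the lecture's example sequence (function names are mine):

```python
def dpcm_encode(samples):
    """Send the first sample whole, then each change from the previous one."""
    return [samples[0]] + [samples[i] - samples[i - 1]
                           for i in range(1, len(samples))]

def dpcm_decode(encoded):
    """Rebuild the samples by accumulating the differences."""
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

seq = [220, 218, 221, 219, 220, 221, 222, 218]
enc = dpcm_encode(seq)  # [220, -2, 3, -2, 1, 1, 1, -4]
```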

- ADPCM is a variant of DPCM that varies the size of the quantization step, to allow further reduction of the required bandwidth for a given signal-to-noise ratio.
- Encode each difference in 4 bits, but vary the mapping of bits to difference values dynamically.
- If the signal changes rapidly, use large differences
- If it changes slowly, use small differences
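A toy sketch of the adaptive idea (this is not the IMA or any standard ADPCM codec; the 3-bit code range and the double/halve adaptation rule are simplified inventions for illustration):

```python
def adpcm_encode(samples, step=1):
    """Quantize each difference as a 3-bit multiple of the current step,
    doubling the step on rapid change and shrinking it on slow change."""
    codes, pred = [], samples[0]
    for s in samples[1:]:
        code = max(-4, min(3, round((s - pred) / step)))  # 3-bit code
        codes.append(code)
        pred += code * step   # track what the decoder will reconstruct
        step = step * 2 if abs(code) >= 3 else max(1, step // 2)
    return samples[0], codes

def adpcm_decode(first, codes, step=1):
    """Mirror the encoder's step adaptation to rebuild the samples."""
    out = [first]
    for code in codes:
        out.append(out[-1] + code * step)
        step = step * 2 if abs(code) >= 3 else max(1, step // 2)
    return out
```

Because encoder and decoder adapt the step from the same transmitted codes, no step sizes need to be sent; the reconstruction is lossy when the signal jumps faster than the step can grow.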

- Quantization error: the difference between the actual and the quantized sample value
- amplitude in [-A, A], quantization levels = N
- step size Δ = 2A / N

- e.g., if A = 1 and N = 8, Δ = 1/4

- Signal-to-noise ratio (SNR) is a measure used in science and engineering that compares the level of a desired signal to the level of background noise. It is defined as the ratio of signal power (energy) to noise power.
- Measures the strength of the signal relative to the noise:

SNR (in dB) = 10 · log10(P_signal / P_noise)

- Given a sound waveform with amplitude in [-A, A]:
- Signal power = A²/2 (for a full-load sinusoid)
- Noise power = Δ²/12, where Δ = 2A/N is the quantization step
- Signal-to-noise ratio = (A²/2) / (Δ²/12) = (3/2)·N²
- Every bit increases the SNR by ~6 decibels

- Consider a full-load sinusoidal modulating signal of amplitude A, which utilizes all the representation levels provided.
- The average signal power is P = A²/2.
- The total range of the quantizer is 2A, because the modulating signal swings between -A and A. Therefore, for N = 16 levels (a 4-bit quantizer), Δ = 2A/2⁴ = A/8.
- The quantization noise power is Δ²/12 = A²/768.
- The SNR is (A²/2)/(A²/768) = 384; SNR (in dB) = 10·log10(384) ≈ 25.8 dB.
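The derivation above can be checked numerically with a short sketch (the function name is mine):

```python
import math

def sinusoid_snr_db(amplitude, bits):
    """SNR of a full-load sinusoid under uniform quantization:
    signal power A^2/2, noise power delta^2/12, delta = 2A / 2**bits."""
    delta = 2 * amplitude / (2 ** bits)
    signal_power = amplitude ** 2 / 2
    noise_power = delta ** 2 / 12
    return 10 * math.log10(signal_power / noise_power)

# 4-bit quantizer: power ratio 384, i.e. about 25.8 dB;
# each extra bit quadruples the ratio, adding roughly 6 dB
```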

- Data rate = sample rate × quantization bits × channels
- Compare the rates for CD-quality stereo vs. mono telephone-quality audio:
- 8,000 samples/second × 8 bits/sample × 1 channel = 64,000 bits/second = 8 kBytes/second
- 44,100 samples/second × 16 bits/sample × 2 channels = 1,411,200 bits/second ≈ 176 kBytes/second
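The comparison works out as follows (a small sketch; the function name is mine):

```python
def data_rate_bytes(sample_rate, bits_per_sample, channels):
    """Uncompressed audio data rate = sample rate x quantization x channels,
    converted from bits to bytes per second."""
    return sample_rate * bits_per_sample * channels // 8

mono_telephone = data_rate_bytes(8000, 8, 1)  # 8,000 bytes/second
cd_stereo = data_rate_bytes(44100, 16, 2)     # 176,400 bytes/second
```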