Digital Audio Basics “Any signal can be completely reconstructed from samples.” - Harry Nyquist


Know your prefixes! • Giga (G) – billion; Mega (M) – million; kilo (k) – thousand • Bit (b) – a binary digit • Byte (B) – 8 bits • Word – can be several bytes (wordlength), usually measured in bits • Mbps – Mega bits per second • kHz – kilo Hertz • GB – Giga Bytes
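The units above can be tied together with a little arithmetic. The sketch below is only an illustration: the function names are made up for this example, and the parameters (44.1 kHz, 16 bits, 2 channels) are assumed CD-style values rather than anything stated on this slide.

```python
# Assumed CD-style parameters (illustrative values, not taken from this slide).
SAMPLE_RATE_HZ = 44_100   # 44.1 kHz
BIT_DEPTH = 16            # bits per sample (the wordlength)
CHANNELS = 2              # stereo

def bit_rate_bps(sample_rate_hz, bit_depth, channels):
    """Bits per second needed for uncompressed audio."""
    return sample_rate_hz * bit_depth * channels

def storage_bytes(sample_rate_hz, bit_depth, channels, seconds):
    """Bytes needed to store the given duration (8 bits per byte)."""
    return bit_rate_bps(sample_rate_hz, bit_depth, channels) * seconds // 8

rate = bit_rate_bps(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS)
minute = storage_bytes(SAMPLE_RATE_HZ, BIT_DEPTH, CHANNELS, 60)

print(rate / 1_000_000, "Mbps")            # mega = million bits per second
print(minute / 1_000_000, "MB per minute") # mega = million bytes
```

Note the lowercase b (bits) in Mbps and the uppercase B (bytes) in MB: the two differ by a factor of 8.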

Converting Voltage to numbers • Microphone converts sound pressure into voltage; the voltage is constantly changing over time (just like the sound pressure – think analogous) • An Analog to Digital Converter (ADC) measures the voltage at set intervals in time (snapshots) and records each measured voltage as a number (numbers = digits) • This process is called sampling. The number of snapshots the ADC takes every second is called the sampling rate • The computer stores these numbers in such a way that they can be recalled in the order in which they happened • On playback, these samples are converted back to voltages by the Digital to Analog Converter (DAC) • The DAC sends out a pulse the amplitude of which is determined by the value of the sample (Pulse Code Modulation or PCM) • All of these pulses added together recreate the original waveform (motion picture analogy). • This process can take a certain amount of time, which can cause an audible delay in the reproduced audio. This delay, caused by processing, is called latency
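The sampling step described above can be sketched in Python. This is a minimal illustration and not how a real ADC works internally: the function name `sample_sine`, the 1 kHz test tone, and the 8 kHz sampling rate are all assumptions chosen to keep the arithmetic easy to follow.

```python
import math

def sample_sine(freq_hz, sample_rate_hz, num_samples, amplitude=1.0):
    """Take num_samples snapshots of a sine wave, one every 1/sample_rate_hz seconds."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(num_samples)
    ]

# A 1 kHz tone sampled at 8 kHz gives 8 snapshots per cycle.
samples = sample_sine(freq_hz=1_000, sample_rate_hz=8_000, num_samples=8)

# The stored list keeps the snapshots in the order they were taken,
# which is what allows playback to recreate the waveform.
for n, value in enumerate(samples):
    print(n, round(value, 3))
```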

TYPICAL SIGNAL FLOW IN A DIGITAL AUDIO CHAIN • (microphone) → Voltage → Preamp → ADC → Numbers (digits) → Digital Recorder → DAC → Voltage → Power Amp → (speaker) • Often the computer acts as the ADC, Digital Recorder, and DAC • A computer soundcard can be internal or external. Many external soundcards use FireWire or USB. • All computer soundcards have ADCs and DACs built into them. Some even have built-in preamps.

Who was Nyquist and why you should care • “Any signal can be completely reconstructed from samples.” – Harry Nyquist • In order to accurately reproduce the original signal, you must sample the signal at more than twice its highest frequency • Humans hear frequencies between 20 Hz and 20 kHz. In order to accurately reproduce all frequencies in this bandwidth, you must sample at a rate higher than 40 kHz • The sampling rate for CD-quality audio is 44.1 kHz • Before sampling, you must filter out any frequencies above the Nyquist limit (fs/2, half the sampling rate) • If you do not remove frequencies above the limit, aliasing will occur • Aliasing – an artifact where frequencies higher than the Nyquist limit are folded back into the audible spectrum (e.g. Nyquist limit = 20 kHz; 30 kHz becomes 10 kHz) • Generally, the higher the sampling rate, the higher the frequencies you are able to reproduce (greater bandwidth) • Sampling rate corresponds to accuracy in reproducing the frequency component of audio
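The folding behaviour in the aliasing example can be worked out numerically. The sketch below assumes an ideal sampler with no anti-aliasing filter; `alias_frequency` is a name invented for this illustration.

```python
def alias_frequency(freq_hz, sample_rate_hz):
    """Frequency at which a tone shows up after sampling with no anti-aliasing filter."""
    nyquist = sample_rate_hz / 2
    # Sampling cannot tell apart frequencies that differ by a multiple of the rate.
    folded = freq_hz % sample_rate_hz
    # Anything left above the Nyquist limit is mirrored back below it.
    if folded > nyquist:
        return sample_rate_hz - folded
    return folded

# The slide's example: a Nyquist limit of 20 kHz means a 40 kHz sampling rate.
print(alias_frequency(30_000, 40_000))   # tone above the limit folds down
print(alias_frequency(10_000, 40_000))   # tone below the limit is unchanged
```

With a 40 kHz sampling rate, a 30 kHz tone and a 10 kHz tone produce the same samples, which is why the filtering has to happen before the ADC.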

Importance of bit depth • Computers use data that is stored in binary numbers. • All of the sampled measurements of voltage must be converted to binary numbers • A converter has a limited number of fixed values that can be used to represent the measured voltage. • The math: an 8-bit converter has 2^8 or 256 possible values, a 16-bit converter has 2^16 or 65,536 possible values, and a 24-bit converter has 2^24 or 16,777,216 possible values • Measurements that fall between these values are rounded off, affecting the accuracy of the reproduced signals • A higher bit depth means smaller rounding errors, which translates into increased dynamic range, better signal-to-error ratio, more headroom and/or better resolution. • Generally, the higher the bit depth, the more accurate the reproduction of the signal. • Bit depth corresponds to accuracy in reproducing the amplitude component of audio.
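The bit-depth arithmetic and the rounding effect can be sketched as follows. The function names are hypothetical, samples are assumed to be normalized to the range -1.0 to 1.0, and the dynamic range figure is the usual theoretical ratio between full scale and one step (roughly 6 dB per bit), not a measurement of any real converter.

```python
import math

def levels(bits):
    """Number of distinct values a converter of this bit depth can output."""
    return 2 ** bits

def dynamic_range_db(bits):
    """Theoretical ratio between full scale and one step, in dB."""
    return 20 * math.log10(2 ** bits)

def quantize(x, bits):
    """Round a normalized sample to the nearest available level."""
    scale = 2 ** (bits - 1)                    # signed range: -scale .. scale - 1
    code = round(x * scale)
    code = max(-scale, min(scale - 1, code))   # stay inside the available values
    return code / scale

for bits in (8, 16, 24):
    error = abs(quantize(0.3, bits) - 0.3)
    print(bits, levels(bits), round(dynamic_range_db(bits), 1), error)
```

The rounding error for the same input shrinks as the bit depth grows, which is the accuracy gain the slide describes.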

Role of the word clock • Imagine a world where every clock would measure time differently (a minute is 45 seconds here and 65 seconds there) • All digital devices have internal clocks. (ex. Your computer’s processor speed is also known as its clock speed) • When digital devices transfer data, their clocks must be synchronized. In digital audio, this is accomplished using a word clock. • The sending device sends a word clock signal which overrides the internal clock of the receiving device, ensuring that the two devices are “on the same page”. • If the clocks are not synchronized, there will be “clock errors” – audible clicks and pops in the audio. • In many studios, there is one master clock which controls the clocks in all of the digital devices in the studio, ensuring that they are all operating at the same sampling rate and that their clocks are all “ticking” at the same time • Clock signal can be transmitted with the audio or can be sent separately (generally via a coaxial cable with a BNC-type connector)

Different formats = Alphabet Soup • There are many different types of digital audio signals. There are three that are very common. • S/PDIF – Sony/Philips Digital Interface: Mostly uses RCA connectors and carries two channels of digital audio over each connection. Unbalanced – short cable runs only • AES/EBU – Audio Engineering Society/European Broadcasting Union: Mostly uses XLR connectors and carries two channels over each connection. Balanced – can accommodate long cable runs without loss • ADAT – Alesis Digital Audio Tape: A proprietary format from Alesis. Carries 8 channels of digital audio over a single fibre-optic connection at sampling rates up to 48 kHz. Can carry 4 channels at sampling rates of 88.2 kHz and 96 kHz • These three types of connections carry clock signal embedded in the data stream

DSP – mixing is math • DSP – Digital Signal Processing • Every change to your digital audio signal – even something as mundane as changing the volume – is a mathematical operation on the stored digital audio samples. • Mixing two signals together is a simple matter of addition. Changing volume is a multiplication problem. • All kinds of complicated processing are done using math to change the original sampled data • Some DSP is done in “real-time”, while other processing is file-based, meaning that it actually changes the data in the digital audio file.
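The two operations named above, addition for mixing and multiplication for volume, can be sketched in a few lines. The function names are hypothetical, samples are assumed to be normalized floats, and `db_to_gain` uses the standard amplitude conversion 10^(dB/20).

```python
def mix(a, b):
    """Mix two signals by adding them sample by sample."""
    return [x + y for x, y in zip(a, b)]

def apply_gain(samples, gain):
    """Change volume by multiplying every sample by the same factor."""
    return [s * gain for s in samples]

def db_to_gain(db):
    """Convert a level change in dB to an amplitude multiplier."""
    return 10 ** (db / 20)

track_a = [0.25, 0.5, -0.25]
track_b = [0.5, -0.5, 0.25]

mixed = mix(track_a, track_b)
quieter = apply_gain(mixed, db_to_gain(-6.0))   # roughly half the amplitude

print(mixed)
print(quieter)
```

Because mixing is addition, the sum of two loud signals can exceed the largest value the format can hold, so gain is often reduced before or after the mix.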

Digital Zero – yet another dB scale! • Most analog meters are calibrated in Volume Units (dBVU). • 0 dBVU usually corresponds to the voltage of a line level signal (+4 dBu) • Digital meters use dBFS (decibels relative to full scale). A zero on this meter means the converter has run out of numbers to represent the waveform. • If you try to go above this level, you will get digital distortion. • Unlike analog distortion, there is never anything pleasant about digital distortion. • A converter’s sensitivity can be adjusted to correspond to different levels. Common levels are 0 dBVU = -16 dBFS or 0 dBVU = -18 dBFS. • In the first case, the converter would not be able to digitally represent a signal that is greater than +16 dBVU – way off the scale of most analog meters. • On an analog meter, a reading of 0 dB usually means that you still can push a bit more level before you seriously distort the signal. • On a digital meter, 0 dBFS means you have no values left to represent the signal • PROPER GAIN STAGING IS EVEN MORE IMPORTANT IN DIGITAL RECORDING
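The relationship between sample values, dBFS, and the analog reference level can be sketched as below. This assumes 16-bit signed samples with full scale taken as 2^15 = 32768 (conventions differ slightly between tools), and all function names are invented for the illustration.

```python
import math

FULL_SCALE_16 = 2 ** 15   # assumed full-scale reference for 16-bit signed audio

def peak_dbfs(samples, full_scale=FULL_SCALE_16):
    """Peak level of a block of integer samples, relative to full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")   # digital silence
    return 20 * math.log10(peak / full_scale)

def clip(sample, bits=16):
    """What happens above 0 dBFS: the value is forced to the largest available number."""
    highest = 2 ** (bits - 1) - 1
    lowest = -(2 ** (bits - 1))
    return max(lowest, min(highest, sample))

def dbvu_to_dbfs(dbvu, reference_dbfs):
    """Map an analog meter reading to dBFS, given where 0 dBVU is aligned."""
    return dbvu + reference_dbfs

print(round(peak_dbfs([16384, -8000, 120]), 2))   # half of full scale
print(clip(40_000))                                # flattened at the top of the range
print(dbvu_to_dbfs(16, reference_dbfs=-16))        # +16 dBVU lands exactly on full scale
```

With the 0 dBVU = -16 dBFS alignment, the last line shows why +16 dBVU is the ceiling: anything hotter would need values the converter does not have.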

Digital Audio File Formats • There are many different audio file formats. They fall into two general categories: compressed and uncompressed. • Uncompressed formats include Wave (.wav, .bwf), Audio Interchange File Format (AIFF) (.aif), and Sound Designer II (SDII). • These file formats are PCM audio files (Pulse Code Modulation) and they contain ALL of the samples that make up the digital audio file exactly as they were recorded. • Compressed formats include MPEG-1 Layer III (.mp3), MPEG-2 AAC, RealAudio (.ra), Windows Media Audio (.wma), and Ogg Vorbis. • These compressed formats use a process known as lossy data compression – the data that make up the file are completely altered and much of the information is discarded • THIS IS NOT TO BE CONFUSED WITH USING AN AUDIO COMPRESSOR IN A STUDIO!!!!

How data compression works • In general, data compression is a process where a program analyzes a file to see how much of the data can be done away with while still retaining the ability to reconstruct the original data (e.g. a WinZip file) • When compressing an audio file with a lossy format, there is a target bit rate in mind. An algorithm is called upon to see how much audio data must be thrown away to reach this bit rate • In a perceptual coder (e.g. an MP3 encoder), the algorithm is designed to estimate how much of the audio you will actually perceive, based on knowledge of the frequency response of human hearing. Enough energy at one frequency may impair your ability to hear energy at another frequency. Anything that the algorithm thinks that you won’t miss is gotten rid of. • The audio is divided into different frequency bins. Certain frequency bands are often done away with entirely. The audio is often distorted by the process, since there is a tradeoff between accuracy in the frequency and time domains.
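The difference between the two ideas on this slide can be illustrated in Python. The first part uses the standard library's `zlib` module as a stand-in for a WinZip-style lossless compressor; the second part only does the bit-rate arithmetic for a lossy target. The 128 kbps target and the 44.1 kHz, 16-bit stereo source are assumed example values, and `reduction_ratio` is a hypothetical helper; this is not an MP3 encoder.

```python
import zlib

# Lossless case: repetitive data shrinks, and the original comes back bit for bit.
original = bytes([0, 0, 1, 1] * 1000)
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(len(original), "bytes ->", len(packed), "bytes, identical:", restored == original)

def reduction_ratio(source_bps, target_bps):
    """How many times smaller the bit rate must become to reach the target."""
    return source_bps / target_bps

# Lossy case: the encoder starts from a target bit rate and discards data to meet it.
PCM_BPS = 44_100 * 16 * 2   # assumed uncompressed source
TARGET_BPS = 128_000        # assumed target bit rate
print("must shrink by a factor of about", round(reduction_ratio(PCM_BPS, TARGET_BPS), 1))
```

A general-purpose lossless compressor rarely gets close to that factor on real audio, which is why a perceptual coder has to throw information away instead.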

Summary • Analog voltages are converted to digital values through sampling; your sampling rate must be more than 2x the highest frequency in your signal • Bit depth is the number of bits used to encode a single sample; more bits are usually better • Latency is delay caused by processing • PCM audio files (WAV, AIF) preserve every single sample. • Lossy compression formats (MP3) throw away much of the audio information that was originally recorded • When recording from a DIGITAL source (recorders, external converters), make sure your clock is set correctly! • Beware of the red light: DIGITAL DISTORTION IS NOT A GOOD THING!
