Audio Codecs

Audio Codecs Dan Mechanic CS W4995

Why are there different codecs? Each tries to find the best balance between: • Fast processing • Good compression • Quality (accurate) decoding

The best balance can depend on application: Music: WAV (uncompressed PCM) gives up compression • lossless • ~1.4 Mbit/s for Compact Disc audio (16-bit, 44.1kHz, stereo) • Sacrifice: Compression AAC encoder gives up fast processing • technically lossy, but still quality decoding • via sophisticated compression algorithms, e.g. at 320 kbit/s • Sacrifice: Processing
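The ~1.4 Mbit/s figure follows directly from the Compact Disc parameters. A minimal sketch of the arithmetic, assuming two channels (stereo); the helper name `pcm_bitrate` is made up for illustration:

```python
def pcm_bitrate(bits_per_sample: int, sample_rate_hz: int, channels: int) -> int:
    """Bits per second needed for uncompressed PCM audio."""
    return bits_per_sample * sample_rate_hz * channels


# Compact Disc: 16-bit samples, 44.1 kHz, stereo (assumed two channels)
cd_bps = pcm_bitrate(16, 44_100, 2)
print(cd_bps)                      # 1411200, i.e. about 1.4 Mbit/s

# Ratio between the uncompressed stream and a 320 kbit/s lossy stream
print(round(cd_bps / 320_000, 2))  # 4.41
```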

Why are there different codecs? Standards • Recommendations from the ITU (International Telecommunication Union) Existing Technologies • G.711 was created in the early seventies for PSTN lines, using 8-bit samples at 8000 samples per second (64 kbit/s) • Now G.711 can be a good choice for VoIP because it sounds like a traditional land line and has low latency (less processing at the media gateways) Patents End User Expectations

Other constraints… Nyquist Theorem - “When converting from an analog signal to digital (or otherwise sampling a signal at discrete intervals), the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version.” source: http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html
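One way to see why the limit matters: a tone above half the sampling rate produces exactly the same samples as a lower tone, so the decoder cannot tell them apart. The sketch below assumes an 8000 Hz sampling rate and a 5000 Hz test tone; both values and the function name are chosen only for illustration.

```python
import math


def sample_tone(freq_hz: float, sample_rate_hz: float, n_samples: int) -> list[float]:
    """Sample cos(2*pi*f*t) at the instants t = n / sample_rate."""
    return [
        math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
        for n in range(n_samples)
    ]


SAMPLE_RATE = 8000  # Hz, so only tones below 4000 Hz can be reconstructed

too_high = sample_tone(5000, SAMPLE_RATE, 16)  # above half the sampling rate
alias = sample_tone(3000, SAMPLE_RATE, 16)     # 8000 - 5000 = 3000 Hz

# The two sample sequences match, so the 5000 Hz tone "aliases" to 3000 Hz.
indistinguishable = all(
    math.isclose(a, b, abs_tol=1e-9) for a, b in zip(too_high, alias)
)
print(indistinguishable)  # True
```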

What methods do codecs meant for speech use? • Many, many codecs… • only a handful of methodologies.

Pulse Code Modulation image source: http://en.wikipedia.org/wiki/Pulse-code_modulation

Pulse Code Modulation can require a high bitrate. G.711 uses “companding” algorithms to reduce bitrate. • Compression - to reduce audio peaks • Expansion - raise the floor of the audio • Actually performed via a logarithmic transformation of a 13- or 14-bit number to an 8-bit number

μ-law and A-law algorithms μ-law • Used in North America and Japan • converts 14-bit samples to 8 bits A-law • Used in Europe • converts 13-bit samples to 8 bits
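The logarithmic idea can be sketched with the continuous μ-law curve (μ = 255). This is an illustration only: G.711 itself specifies a segmented approximation of this curve with its own bit layout, and the function names and the simple 0..255 code mapping below are assumptions made for the example.

```python
import math

MU = 255  # mu value of the continuous mu-law curve


def mu_compress(x: float) -> float:
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_expand(y: float) -> float:
    """Inverse of mu_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to a code in 0..255 (illustrative mapping)."""
    return round((mu_compress(x) + 1) / 2 * 255)


def decode_8bit(code: int) -> float:
    """Undo the uniform quantization, then expand."""
    return mu_expand(code / 255 * 2 - 1)


quiet = 0.01  # a quiet sample, on a scale where full volume is 1.0
companded_error = abs(decode_8bit(encode_8bit(quiet)) - quiet)
uniform_error = abs(round((quiet + 1) / 2 * 255) / 255 * 2 - 1 - quiet)
print(companded_error < uniform_error)  # quiet samples keep more precision
```

The trade is that loud samples are quantized more coarsely, which is usually less audible.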

Differential Pulse Code Modulation • Waveforms act fairly predictably • We can look at a previous sample and predict the value of the next one. • If coder and decoder agree on what algorithm to predict with, only the difference between prediction and actual needs to be transmitted.
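A minimal sketch of the idea, assuming the simplest possible predictor ("the next sample equals the previous one") and no quantization of the difference. A real DPCM coder would quantize the differences and predict from the reconstructed samples; the function names are hypothetical.

```python
def dpcm_encode(samples: list[int]) -> list[int]:
    """Transmit only the difference between each sample and its prediction."""
    diffs = []
    prediction = 0  # coder and decoder agree to start from 0
    for s in samples:
        diffs.append(s - prediction)
        prediction = s  # predictor: next sample equals this one
    return diffs


def dpcm_decode(diffs: list[int]) -> list[int]:
    """Rebuild the samples by running the same predictor."""
    samples = []
    prediction = 0
    for d in diffs:
        s = prediction + d
        samples.append(s)
        prediction = s
    return samples


pcm = [100, 104, 109, 112, 113, 111, 106]
diffs = dpcm_encode(pcm)
print(diffs)               # [100, 4, 5, 3, 1, -2, -5]
print(dpcm_decode(diffs))  # [100, 104, 109, 112, 113, 111, 106]
```

After the first value, the differences are much smaller than the samples, so they fit in fewer bits.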

Differential Pulse Code Modulation image from “Speech Compression” by Mark Handley: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf

Adaptive Differential Pulse Code Modulation • Algorithms for next-sample prediction can be dynamic to more accurately represent the waveform we are encoding/decoding. • Vary the predictor to adapt to the changing characteristics of the audio being recorded. • G.721 uses the previous 8 samples, and can quantize the difference to 4 bits (32 kbit/s)
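A toy sketch of the adaptive part, assuming a previous-sample predictor and a step size that grows after large codes and shrinks after small ones. This is not the G.721 algorithm; the adaptation rule, the constants and all names are invented for illustration. The point is that the decoder can repeat the same adaptation from the 4-bit codes alone.

```python
def _advance(prediction: float, step: float, code: int, levels: int):
    """State update shared by encoder and decoder (illustrative rule)."""
    prediction += code * step          # reconstructed sample
    if abs(code) >= levels // 2:       # large code: signal moving fast
        step *= 1.5
    elif abs(code) <= 1:               # small code: signal nearly flat
        step = max(1.0, step * 0.75)
    return prediction, step


def adpcm_encode(samples, bits: int = 4) -> list[int]:
    """Quantize each prediction error to a signed `bits`-bit code."""
    levels = 2 ** (bits - 1)           # codes lie in [-levels, levels - 1]
    prediction, step, codes = 0.0, 4.0, []
    for s in samples:
        code = round((s - prediction) / step)
        code = max(-levels, min(levels - 1, code))
        codes.append(code)
        prediction, step = _advance(prediction, step, code, levels)
    return codes


def adpcm_decode(codes, bits: int = 4) -> list[float]:
    """Replay the same state updates to rebuild the samples."""
    levels = 2 ** (bits - 1)
    prediction, step, out = 0.0, 4.0, []
    for code in codes:
        prediction, step = _advance(prediction, step, code, levels)
        out.append(prediction)
    return out


pcm = [0, 63, 125, 187, 249, 309, 368, 426, 482]
codes = adpcm_encode(pcm)
decoded = adpcm_decode(codes)
print(codes)
print([round(v) for v in decoded])
```

With these constants the decoder lags at first, then catches up as the step size grows.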

Sub-Band Differential Pulse Code Modulation “not all frequencies created equal” • Lower frequencies (50Hz-3.5kHz) are important to understanding speech, and are more sensitive to quantization errors. • Higher frequencies (3.5kHz-7kHz) are used for conveying emotion and recognition of the speaker

Sub-Band Differential Pulse Code Modulation “not all frequencies created equal” …so don’t treat them the same • Sample the input at 16kHz and split it into two sub-bands, each down-sampled to 8kHz • Lower frequencies (50Hz-3.5kHz): more important, quantized with more bits per sample • Higher frequencies (3.5kHz-7kHz): less important, quantized with fewer bits per sample • mux these together to get 64 kbit/s… same bitrate as G.711, better decoding quality, at the price of processing • G.722
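The 64 kbit/s total can be checked with a little arithmetic. The sketch assumes the bit allocation of G.722's 64 kbit/s mode (6 bits per sample for the lower band, 2 for the higher band, each band running at 8000 samples per second); those numbers come from that standard, not from the slide.

```python
SUBBAND_RATE_HZ = 8_000  # each sub-band after down-sampling (assumption)
LOW_BAND_BITS = 6        # assumed bits per sample for the lower band
HIGH_BAND_BITS = 2       # assumed bits per sample for the higher band

low_bps = SUBBAND_RATE_HZ * LOW_BAND_BITS
high_bps = SUBBAND_RATE_HZ * HIGH_BAND_BITS
total_bps = low_bps + high_bps

print(low_bps, high_bps, total_bps)  # 48000 16000 64000
```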

Linear Predictive: Source-Filter Speech Model • An algorithm that models speech image source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

Linear Predictive Based on a simple model of human speech • Buzzer - your glottis or vocal cords, provides pitch • Tube - builds resonance and gives rise to ‘formants’ • Hiss and pops - tongue, lips and throat make sibilants and plosives (“s”, “k”, “p”)

Linear Predictive Formants • peaks in the frequency spectrum caused by acoustic resonance. image of the frequency response of the typical vowel sound source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

Linear Predictive Encoding • operates on a short frame of sound (around 20ms) • remove formants, and leave ‘residue’ sound (buzz), determine tone of ‘residue’ • Determine whether sound is voiced or unvoiced • voiced - tonal “m” “v” • unvoiced - sibilance and plosives “s” “k” • the formant filter is described by a series of linear predictive coefficients
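One common way to obtain the coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion; the slides do not say which method is meant, so treat this as a sketch under that assumption. All function names are hypothetical, and the test frame is a synthetic decaying signal rather than speech.

```python
def autocorrelation(frame: list[float], max_lag: int) -> list[float]:
    """r[lag] = sum of frame[i] * frame[i - lag] over the frame."""
    n = len(frame)
    return [
        sum(frame[i] * frame[i - lag] for i in range(lag, n))
        for lag in range(max_lag + 1)
    ]


def levinson_durbin(r: list[float], order: int):
    """Solve for predictor coefficients a[1..order] from autocorrelation r."""
    a = [0.0] * (order + 1)
    error = r[0]  # prediction error energy so far
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1.0 - k * k
    return a[1:], error


def residue(frame: list[float], coeffs: list[float]) -> list[float]:
    """What is left after subtracting the linear prediction from each sample."""
    out = []
    for n, x in enumerate(frame):
        pred = sum(
            c * frame[n - k - 1] for k, c in enumerate(coeffs) if n - k - 1 >= 0
        )
        out.append(x - pred)
    return out


# 160 samples = 20 ms at 8000 samples per second (synthetic test frame)
frame = [0.9 ** n for n in range(160)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
print([round(c, 6) for c in coeffs])
print(max(abs(e) for e in residue(frame, coeffs)[1:]) < 1e-6)
```

For this frame each sample is 0.9 times the previous one, so the predictor captures almost everything and the residue after the first sample is close to zero.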

Linear Predictive Decoding image source: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf “Speech Compression” Mark Handley

Linear Predictive Encoding What’s the limitation? Our speech creation is not in fact so simple. For some sounds, nasal passages create a ‘side-branch’ to our tube.

Code Excited Linear Predictive (CELP) • Instead of sending the residue itself, agree on a ‘codebook’ of excitation signals, and send the index of the entry you are using. • Don’t need a codebook entry for every pitch. One entry can be delayed for lower frequencies. • Speex (open source, patent-free)
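The codebook idea can be sketched as a nearest-entry search: both sides hold the same table, and only the index of the best match is transmitted. The tiny codebook, the plain squared-error measure and the names below are assumptions for illustration; a real CELP coder searches by synthesizing speech from each candidate and comparing with a perceptually weighted error.

```python
def best_codebook_index(target: list[float], codebook: list[list[float]]) -> int:
    """Return the index of the entry with the smallest squared error."""
    def squared_error(entry: list[float]) -> float:
        return sum((t - e) ** 2 for t, e in zip(target, entry))

    return min(range(len(codebook)), key=lambda i: squared_error(codebook[i]))


# Hypothetical 4-entry codebook shared by coder and decoder
CODEBOOK = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, -1.0, -1.0],
    [0.5, -0.5, 0.5, -0.5],
]

excitation = [0.9, 0.1, -1.1, 0.05]  # what the coder would like to send
index = best_codebook_index(excitation, CODEBOOK)
print(index)            # 1
print(CODEBOOK[index])  # the decoder looks up the same entry
```

Four entries need only a 2-bit index, compared with four full sample values for the excitation itself.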

Linear Predictive - Other Variants • Regular-Pulse Excitation Long-Term Prediction (GSM) • Low-Delay Code Excited Linear Prediction (G.728) • Conjugate-Structure Algebraic Code Excited Linear Prediction (G.729)

Many Codecs follow these two basic models

References
• http://www.cs.columbia.edu/~hgs/audio/codecs.html
• http://www.fact-index.com/p/pu/pulse_code_modulation_1.html
• http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html
• http://en.wikipedia.org/wiki/Pulse-code_modulation
• http://www1.cs.columbia.edu/~sedwards/classes/2004/4840/reports/manic.pdf
• http://www-mobile.ecs.soton.ac.uk/speech_codecs/standards/adpcm.html
• http://www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf “Speech Compression” Mark Handley
• http://www.myspace.com/growing_up_is_hard_2_do - speak n spell image
• A good introduction to LPC, Dr. Sung-won Park, Texas A&M University-Kingsville
• http://en.wikipedia.org/wiki/G.711
• ITU-T recommendation G.711
• http://en.wikipedia.org/wiki/%CE%9C-law_algorithm
• Soundfiles: www.Data-Compression.com
• http://www.otolith.com/otolith/olt/lpc.html
