Audio Codecs

Dan Mechanic

CS W4995

Why are there different codecs?

Each trying to find the best balance, between:

  • Fast Processing
  • Good Compression
  • Quality (accurate) decoding
The best balance can depend on application:

Music:

WAV encoder compromises on compression

  • lossless (uncompressed PCM)
  • ~1.4 Mbps
  • Sacrifice: Compression

AAC encoder compromises on fast processing

  • technically lossy, but still quality decoding
  • via sophisticated compression algorithms, e.g. at 320 kbps
  • Sacrifice: Processing

Compact Disc: 16-bit 44.1kHz
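
The ~1.4 Mbps figure follows directly from the Compact Disc parameters. A minimal sketch of the arithmetic, assuming two channels (stereo); the function name `pcm_bitrate` is only illustrative:

```python
def pcm_bitrate(sample_rate_hz, bits_per_sample, channels):
    """Bits per second of uncompressed PCM audio."""
    return sample_rate_hz * bits_per_sample * channels

# Compact Disc: 16-bit samples at 44.1 kHz, assumed stereo (2 channels)
cd_bps = pcm_bitrate(44_100, 16, 2)
print(cd_bps)                      # 1411200, i.e. ~1.4 Mbps

# How much smaller a 320 kbps AAC stream is than the CD stream
print(round(cd_bps / 320_000, 1))  # 4.4
```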

Why are there different codecs?

Standards

  • Recommendations from the ITU (International Telecommunication Union)

Existing Technologies

  • G.711 was created in the early seventies for PSTN lines, supporting 8-bit samples at 8000 samples per second (64 kbit/s)
  • Now G.711 can be a good choice for VoIP because it sounds like a traditional land line and has low latency (less processing at the media gateways)

Patents

End User Expectations

Other constraints…

Nyquist Theorem -

“When converting from an analog signal to digital (or otherwise sampling a signal at discrete intervals), the sampling frequency must be greater than twice the highest frequency of the input signal in order to be able to reconstruct the original perfectly from the sampled version.”

source: http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html
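
What goes wrong below the Nyquist rate can be shown numerically. In this small sketch (the tone frequencies and the helper `sample_tone` are assumptions chosen for illustration), a 5 kHz tone sampled at 8 kHz produces exactly the same samples as a 3 kHz tone, so the two cannot be told apart after sampling:

```python
import math

def sample_tone(freq_hz, sample_rate_hz, n_samples):
    """Sample a cosine of the given frequency at discrete intervals."""
    return [math.cos(2 * math.pi * freq_hz * n / sample_rate_hz)
            for n in range(n_samples)]

fs = 8000                           # sampling rate; fs / 2 = 4000 Hz
above = sample_tone(5000, fs, 16)   # above fs / 2, will alias
alias = sample_tone(3000, fs, 16)   # fs - 5000 = 3000 Hz

# The two sampled sequences agree to within floating-point error
indistinguishable = all(abs(a - b) < 1e-9 for a, b in zip(above, alias))
print(indistinguishable)            # True
```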

What methods do codecs meant for speech use?
  • Many, many codecs…
  • only a handful of methodologies.
Pulse Code Modulation

image source: http://en.wikipedia.org/wiki/Pulse-code_modulation
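
Pulse code modulation amounts to sampling at regular intervals and rounding each sample to the nearest available code. A minimal sketch of a uniform quantizer, assuming samples normalized to [-1.0, 1.0]; the 440 Hz test tone and the function names are illustrative only:

```python
import math

def quantize(x, bits):
    """Map x in [-1.0, 1.0] to a signed integer code of the given width."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, x))
    return round(clipped * max_code)

def dequantize(code, bits):
    """Map an integer code back to an approximate sample value."""
    return code / (2 ** (bits - 1) - 1)

fs = 8000
tone = [math.sin(2 * math.pi * 440 * n / fs) for n in range(80)]
codes = [quantize(x, 8) for x in tone]           # 8-bit PCM codes
restored = [dequantize(c, 8) for c in codes]

# Quantization error never exceeds half a step
worst = max(abs(a - b) for a, b in zip(tone, restored))
print(worst <= 0.5 / 127)
```

More bits per sample shrink the step size and the error, which is why plain PCM needs a high bitrate for good quality.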

Pulse Code Modulation can require a high bitrate

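G.711 uses one of two “companding” algorithms to reduce bitrate.

  • Compression - reduce the dynamic range before quantizing, so quiet sounds keep fine resolution and loud peaks get coarser steps
  • Expansion - restore the original dynamic range at the decoder
  • Actually performed via a logarithmic transformation of a 13- or 14-bit number to an 8-bit number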
μ-law and A-law algorithms

μ-law

  • Used in North America and Japan
  • converts 14-bit linear samples to 8 bits

A-law

  • Used in Europe and most of the rest of the world
  • converts 13-bit linear samples to 8 bits
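
The logarithmic transformation can be sketched with the continuous μ-law formula (μ = 255). This is an illustration only: G.711 itself specifies a segmented, bit-exact approximation of this curve, and the helper names and the scaling to ±127 below are assumptions, not the G.711 code layout.

```python
import math

MU = 255  # value used by mu-law in North America and Japan

def mu_compress(x):
    """Continuous mu-law curve for x in [-1.0, 1.0]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)

def mu_expand(y):
    """Inverse of mu_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)

def encode(x):
    """Compress, then quantize to a signed 8-bit code (illustrative scaling)."""
    return round(mu_compress(x) * 127)

def decode(code):
    return mu_expand(code / 127)

quiet = 0.01                               # a quiet sample
linear_error = abs(round(quiet * 127) / 127 - quiet)
companded_error = abs(decode(encode(quiet)) - quiet)
print(companded_error < linear_error)      # companding keeps quiet sounds accurate
```

The design point is that the same 8 bits are spent unevenly: small amplitudes get fine steps, large amplitudes get coarse ones.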
Differential Pulse Code Modulation
  • Waveforms act fairly predictably
  • We can look at a previous sample and predict the value of the next one.
  • If coder and decoder agree on what algorithm to predict with, only the difference between prediction and actual needs to be transmitted.
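
A minimal sketch of the idea, assuming the simplest possible predictor (the next sample equals the previous one) and no quantization of the difference; the sample values are made up for illustration:

```python
def dpcm_encode(samples):
    """Transmit only the difference between each sample and its prediction."""
    prediction = 0
    residuals = []
    for s in samples:
        residuals.append(s - prediction)
        prediction = s          # predictor: next sample equals this one
    return residuals

def dpcm_decode(residuals):
    """Run the same predictor and add the transmitted differences back."""
    prediction = 0
    samples = []
    for r in residuals:
        s = prediction + r
        samples.append(s)
        prediction = s
    return samples

samples = [100, 104, 109, 113, 114, 112, 107]
residuals = dpcm_encode(samples)
print(residuals)                # [100, 4, 5, 4, 1, -2, -5]
print(dpcm_decode(residuals) == samples)  # True
```

After the first value, the differences are much smaller than the samples themselves, so they can be sent with fewer bits.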
Differential Pulse Code Modulation

image from “Speech Compression” by Mark Handley: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf

Adaptive Differential Pulse Code Modulation
  • Algorithms for next-sample prediction can be dynamic to more accurately represent the waveform we are encoding/decoding.
  • Vary predictor to adapt to the changing characteristics of the audio being recorded.
  • G.721 uses the previous 8 samples, and can quantize the difference to 4 bits (32 kbit/s)
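
The adaptive part can be sketched with a quantizer whose step size grows after large differences and shrinks after small ones. This is a toy scheme, not the G.721 algorithm: the previous-sample predictor, the thresholds in `adapt`, and the starting step are all assumptions made for illustration.

```python
import math

def adapt(step, code):
    """Grow the step after large codes, shrink it after small ones."""
    if abs(code) >= 6:
        return min(step * 2, 2048)
    if abs(code) <= 1:
        return max(step // 2, 1)
    return step

def adpcm_encode(samples, step=4):
    prediction, codes = 0, []
    for s in samples:
        # Quantize the prediction error to a 4-bit code (-8..7)
        code = max(-8, min(7, round((s - prediction) / step)))
        codes.append(code)
        prediction += code * step   # track what the decoder will rebuild
        step = adapt(step, code)
    return codes

def adpcm_decode(codes, step=4):
    prediction, out = 0, []
    for code in codes:
        prediction += code * step
        out.append(prediction)
        step = adapt(step, code)    # same rule, so both sides stay in sync
    return out

tone = [round(1000 * math.sin(2 * math.pi * 200 * n / 8000)) for n in range(160)]
codes = adpcm_encode(tone)
decoded = adpcm_decode(codes)
print(8000 * 4)                     # 32000 bits per second at 4 bits per sample
```

Because the step size is derived only from codes both sides have seen, it never needs to be transmitted.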
Sub-Band Differential Pulse Code Modulation

“not all frequencies created equal”

  • Lower frequencies (50Hz-3.5kHz) are important to understanding speech, and are more sensitive to quantization errors.
  • Higher frequencies (3.5kHz-7kHz) are used for conveying emotion and recognition of the speaker
Sub-Band Differential Pulse Code Modulation

“not all frequencies created equal”

…so don’t treat them the same

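  • Sample the input at 16kHz, split it into two bands, and down-sample each band to 8kHz
  • Lower frequencies (50Hz-3.5kHz), more important, get more bits per sample
  • Higher frequencies (3.5kHz-7kHz), less important, get fewer bits per sample
  • mux these together to get 64 kbit/s: same bitrate as G.711, better decoding quality, at the price of processing
  • G.722 (G.721 and G.726 are plain ADPCM, without the band split)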
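
A two-band split with down-sampling can be sketched with the simplest possible filter pair (pairwise average and difference, i.e. a Haar filter). This is a stand-in for illustration: real sub-band codecs such as G.722 use longer quadrature mirror filters. The bit allocation in the last lines assumes 6 bits per low-band sample and 2 bits per high-band sample, as in the 64 kbit/s mode of G.722.

```python
def split_bands(samples):
    """Split into low and high bands, each at half the input sample rate."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    low = [(a + b) / 2 for a, b in pairs]    # smooth part
    high = [(a - b) / 2 for a, b in pairs]   # detail part
    return low, high

def merge_bands(low, high):
    """Rebuild the full-rate signal from the two half-rate bands."""
    out = []
    for l, h in zip(low, high):
        out += [l + h, l - h]
    return out

x = [10, 12, 13, 13, 11, 8, 4, 1]
low, high = split_bands(x)
print(merge_bands(low, high) == x)   # True: the split loses nothing by itself

# Bits are then spent unevenly across the two 8 kHz bands
low_bits, high_bits = 6, 2
print(8000 * low_bits + 8000 * high_bits)   # 64000
```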
Linear Predictive Source-Filter Speech Model
  • An algorithm that models speech

image source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

Linear Predictive

Based on a simple model of human speech

  • Buzzer - your glottis or vocal cords, provides pitch
  • Tube - builds resonance and gives rise to ‘formants’
  • Hiss and pops - tongue, lips and throat make sibilants and plosives (“s”, “k”, “p”)
Linear Predictive

Formants

  • peaks in the frequency spectrum caused by acoustic resonance.

Image: frequency response of a typical vowel sound. Source: http://mtg.upf.edu/~xserra/cursos/TDP/referencies/Park-LPC-tutorial.pdf

Linear Predictive

Encoding

  • operates on a short frame of sound (around 20ms)
  • remove formants, and leave the ‘residue’ sound (buzz); determine the pitch and loudness of the ‘residue’
  • Determine whether sound is voiced or unvoiced
    • voiced - tonal “m” “v”
    • unvoiced - sibilance and plosives “s” “k”
  • the formants are represented by a series of linear predictive coefficients
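
One common way to obtain the linear predictive coefficients for a frame is the autocorrelation method solved with the Levinson-Durbin recursion. The sketch below is an assumption about how this step could be done, not a description of any particular codec; the test frame is a synthetic decaying signal chosen so the expected answer is known.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum of frame[n] * frame[n - lag]."""
    return [sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Coefficients a[1..order] such that x[n] ~= sum(a[k] * x[n - k])."""
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1 - k * k          # remaining prediction error energy
    return a[1:], error

def residual(frame, coeffs):
    """What is left after removing the predictable (formant) part."""
    out = []
    for n in range(len(frame)):
        pred = sum(c * frame[n - k]
                   for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(frame[n] - pred)
    return out

# Synthetic frame: each sample is 0.9 times the previous one
frame = [0.9 ** n for n in range(200)]
coeffs, err = levinson_durbin(autocorrelation(frame, 2), 2)
res = residual(frame, coeffs)
print([round(c, 3) for c in coeffs])
```

For this frame the recursion recovers a first coefficient close to 0.9 and a second close to zero, and the residual collapses to a single pulse at the start.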
Linear Decoding

img source: www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf

“Speech Compression” Mark Handley
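
Decoding reverses the model: generate a buzz (pulse train) or a hiss (noise) and pass it through the filter described by the coefficients. A minimal sketch, where the pitch period, gain, and coefficient values are hypothetical and chosen only for illustration:

```python
import random

def excitation(n_samples, voiced, pitch_period=80, seed=0):
    """Buzz (pulse train) for voiced frames, hiss (noise) for unvoiced ones."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]

def synthesize(excite, coeffs, gain=1.0):
    """All-pole filter: y[n] = gain * e[n] + sum(a[k] * y[n - k])."""
    out = []
    for n, e in enumerate(excite):
        y = gain * e + sum(c * out[n - k]
                           for k, c in enumerate(coeffs, start=1) if n - k >= 0)
        out.append(y)
    return out

# One 20 ms frame at 8 kHz, voiced, with hypothetical coefficients
frame = synthesize(excitation(160, voiced=True), [1.2, -0.8], gain=0.5)
print(len(frame))   # 160
```

Only the coefficients, the gain, the pitch, and the voiced/unvoiced flag need to be transmitted per frame, which is where the low bitrate comes from.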

Linear Predictive Encoding

What’s the limitation?

Our speech creation is not in fact so simple.

For some sounds, nasal passages create a ‘side-branch’ to our tube.

Code Excited Linear Predictive (CELP)
  • Instead of sending the residue (excitation) sample by sample, agree on a ‘codebook’ of excitation signals, and send a reference to the code you are using.
  • Don’t need a codebook for every pitch. One pitch can be delayed for lower frequencies.
  • Speex (open source, patent-free)
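
The codebook lookup can be sketched as a search for the entry that, once scaled by a gain, best matches a target signal. The toy codebook and target below are assumptions for illustration; real CELP coders pass each candidate through the synthesis filter and a perceptual weighting before comparing, which this sketch omits.

```python
def best_codebook_entry(target, codebook):
    """Return (index, gain, error) of the best-matching codebook vector."""
    best = None
    for index, vec in enumerate(codebook):
        energy = sum(v * v for v in vec)
        if energy == 0:
            continue
        # Gain that minimizes the squared error for this vector
        gain = sum(t * v for t, v in zip(target, vec)) / energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, vec))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

# Toy codebook shared by coder and decoder (hypothetical contents)
codebook = [
    [1, 0, 0, 0],
    [1, -1, 1, -1],
    [1, 1, 1, 1],
    [0, 1, 0, -1],
]
target = [2.1, -1.9, 2.0, -2.2]
index, gain, error = best_codebook_entry(target, codebook)
print(index)   # 1
```

Only the index and the gain are transmitted; the decoder looks up the same entry in its own copy of the codebook.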
Linear Predictive - Other Variants
  • Regular-Pulse Excitation Long-Term Predictor (GSM)
  • Low Delay Code Excited Linear Prediction (G.728)
  • Conjugate Structure Algebraic Code Excited Linear Prediction (G.729)
References
  • http://www.cs.columbia.edu/~hgs/audio/codecs.html
  • http://www.fact-index.com/p/pu/pulse_code_modulation_1.html
  • http://www.fact-index.com/n/ny/nyquist_shannon_sampling_theorem.html
  • http://en.wikipedia.org/wiki/Pulse-code_modulation
  • http://www1.cs.columbia.edu/~sedwards/classes/2004/4840/reports/manic.pdf
  • http://www-mobile.ecs.soton.ac.uk/speech_codecs/standards/adpcm.html
  • http://www.cs.columbia.edu/~hgs/teaching/ais/slides/04-speech-coding.pdf “Speech Compression” Mark Handley
  • http://www.myspace.com/growing_up_is_hard_2_do - speak n spell image
  • A good introduction to LPC Dr. Sung-won Park[2] Texas A&M University-Kingsville
  • http://en.wikipedia.org/wiki/G.711
  • ITU-T Recommendation G.711
  • http://en.wikipedia.org/wiki/%CE%9C-law_algorithm
  • Soundfiles: www.Data-Compression.com
  • http://www.otolith.com/otolith/olt/lpc.html