
Audio

Audio. Henning Schulzrinne Dept. of Computer Science Columbia University Fall 2003. Common narrowband audio codecs. Common wideband audio codecs. iLBC – MOS behavior with packet loss. Recent audio codecs. iLBC: optimized for high packet loss rates (frames encoded independently) AMR-NB


Presentation Transcript


  1. Audio • Henning Schulzrinne • Dept. of Computer Science, Columbia University • Fall 2003

  2. Common narrowband audio codecs

  3. Common wideband audio codecs

  4. iLBC – MOS behavior with packet loss

  5. Recent audio codecs
     • iLBC: optimized for high packet loss rates (frames encoded independently)
     • AMR-NB
       • 3G wireless codec
       • 4.75–12.2 kb/s
       • 20 ms coding delay
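The AMR-NB figures above translate directly into a per-frame bit budget. The sketch below assumes that the 20 ms figure is also the frame duration and that one frame is sent per packet. The function names are illustrative and not part of any codec API.

```python
def bits_per_frame(bitrate_kbps: float, frame_ms: float = 20.0) -> int:
    """Speech bits carried by one frame: kb/s multiplied by ms gives bits."""
    return round(bitrate_kbps * frame_ms)


def packets_per_second(frame_ms: float = 20.0, frames_per_packet: int = 1) -> float:
    """Packet rate when each packet carries a fixed number of frames (assumed)."""
    return 1000.0 / (frame_ms * frames_per_packet)


if __name__ == "__main__":
    # Lowest and highest AMR-NB rates quoted on the slide.
    for rate in (4.75, 12.2):
        print(f"{rate} kb/s: {bits_per_frame(rate)} bits per 20 ms frame")
    print(f"{packets_per_second():.0f} packets/s at one frame per packet")
```

At such small frame sizes, per-packet header overhead can dominate the speech payload, which is one reason packing several frames into a packet is sometimes considered despite the added delay.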

  6. Speex
     • Open-source, patent-free speech codec
     • CELP (code-excited linear prediction) codec
     • operating modes:
       • narrowband (8 kHz sampling rate)
         • 2.15–24.6 kb/s
         • delay of 30 ms
       • wideband (16 kHz sampling rate)
         • 4–44.2 kb/s
         • delay of 34 ms
       • ultra-wideband (32 kHz sampling rate)
     • intensity stereo encoding
     • variable bit rate (VBR) possible
     • voice activity detection (VAD)
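To make the idea of voice activity detection concrete, here is a minimal energy-threshold sketch. It is not the algorithm Speex uses. The 20 ms frame length, the -40 dB threshold, and all function names are assumptions chosen for illustration.

```python
import math


def frame_energy_db(frame):
    """Mean power of one frame in dB relative to full scale (samples in [-1, 1])."""
    power = sum(s * s for s in frame) / len(frame)
    return 10.0 * math.log10(max(power, 1e-12))  # clamp to avoid log(0)


def simple_vad(samples, sample_rate=8000, frame_ms=20, threshold_db=-40.0):
    """Return one True/False 'speech present' flag per complete frame."""
    frame_len = sample_rate * frame_ms // 1000
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy_db(frame) > threshold_db)
    return flags


if __name__ == "__main__":
    # 20 ms of silence followed by 20 ms of a 440 Hz tone at half scale.
    silence = [0.0] * 160
    tone = [0.5 * math.sin(2 * math.pi * 440 * n / 8000) for n in range(160)]
    print(simple_vad(silence + tone))
```

A codec with VAD can send few or no bits for frames flagged as silence, which pairs naturally with variable bit rate operation.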

  7. Ogg Vorbis
     • Similar in application to AAC, MP3, VQF, …, but claims to be free of patents
     • Ogg = container file format (also used for Speex, FLAC)
     • Vorbis = music/speech codec
     • near CD quality = 160 kb/s
     • forward-adaptive modified DCT (discrete cosine transform)
     • overlapping windows
     • floor: carries the frequency representation as a piecewise-linear interpolated curve on a dB amplitude scale and a linear frequency scale
     • residue: subtract out the floor, then apply cascaded (multi-pass) vector quantization
     • entropy (Huffman) coding
     • carries codec parameters in header
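The floor/residue split can be sketched in a few lines. The example below is a simplified illustration, not the Vorbis bitstream procedure: it assumes the spectrum is already available as per-bin dB magnitudes, that the floor is given as (bin, dB) points covering the first and last bin, and that the subtraction is done in the dB domain. All names and numbers are hypothetical.

```python
def floor_curve_db(points, num_bins):
    """Piecewise-linear interpolation of (bin, dB) points over bins 0..num_bins-1.

    Assumes at least two points, with the first at bin 0 and the last at
    bin num_bins-1.
    """
    points = sorted(points)
    curve = []
    seg = 0
    for k in range(num_bins):
        # Advance to the segment whose right end is at or beyond bin k.
        while seg < len(points) - 2 and k > points[seg + 1][0]:
            seg += 1
        (x0, y0), (x1, y1) = points[seg], points[seg + 1]
        t = (k - x0) / (x1 - x0)
        curve.append(y0 + t * (y1 - y0))
    return curve


def residue_db(spectrum_db, floor_db):
    """What is left after the coarse spectral envelope is subtracted out."""
    return [s - f for s, f in zip(spectrum_db, floor_db)]


if __name__ == "__main__":
    floor_points = [(0, -20.0), (4, -40.0), (8, -60.0)]  # made-up envelope
    spectrum = [-18.0, -27.0, -30.0, -33.0, -41.0, -44.0, -52.0, -55.0, -58.0]
    floor = floor_curve_db(floor_points, len(spectrum))
    print(floor)
    print(residue_db(spectrum, floor))
```

The residue values are small compared with the spectrum itself, which is what makes them cheaper to quantize than the raw coefficients.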

  8. Sound localization
     • Human ear uses 3 metrics for stereo localization:
       • intensity
       • time of arrival (TOA) – 7 µs
       • direction filtering and spectral shaping by the outer ear
     • At short wavelengths (4–20 kHz), the head casts an acoustical shadow, giving rise to a lower sound level at the ear farthest from the sound source
     • At long wavelengths (20 Hz – 1 kHz), the head is very small compared to the wavelength
       • In this case, localization is based on perceived interaural time differences (ITD)
     (Source: UCSC CMPE250, Fall 2002)
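The wavelength and time-difference figures above can be checked with a short calculation. This sketch assumes a speed of sound of 343 m/s, an ear spacing of 0.18 m, and the simple path-difference model ITD = d · sin(azimuth) / c. None of these values come from the slides, and a real head adds diffraction effects that this model ignores.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 °C)
EAR_SPACING = 0.18      # m, assumed distance between the ears


def wavelength_m(freq_hz):
    """Acoustic wavelength in metres for a given frequency."""
    return SPEED_OF_SOUND / freq_hz


def itd_seconds(azimuth_deg):
    """Interaural time difference for a distant source at the given azimuth."""
    return EAR_SPACING * math.sin(math.radians(azimuth_deg)) / SPEED_OF_SOUND


def azimuth_for_itd(itd_s):
    """Azimuth (degrees) that produces the given ITD under the same model."""
    return math.degrees(math.asin(itd_s * SPEED_OF_SOUND / EAR_SPACING))


if __name__ == "__main__":
    for f in (20, 1000, 4000, 20000):
        print(f"{f} Hz: wavelength {wavelength_m(f):.4f} m")
    print(f"ITD at 90 degrees: {itd_seconds(90) * 1e6:.0f} µs")
    print(f"Azimuth for a 7 µs ITD: {azimuth_for_itd(7e-6):.2f} degrees")
```

Under these assumptions, a 7 µs time difference corresponds to an angular offset of less than one degree, while the 4–20 kHz wavelengths come out smaller than the assumed ear spacing, consistent with the head-shadow effect described for that band.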

  9. Audio samples
     • http://www.cs.columbia.edu/~hgs/audio/codecs.html
     • Speex: http://www.speex.org/audio/samples/
       • both narrowband and wideband
