
A Tutorial on MPEG/Audio Compression


Presentation Transcript


  1. A Tutorial on MPEG/Audio Compression Davis Pan, IEEE MultiMedia, Summer 1995 Presented by: Randeep Singh Gakhal CMPT 820, Spring 2004

  2. Outline • Introduction • Technical Overview • Polyphase Filter Bank • Psychoacoustic Model • Coding and Bit Allocation • Conclusions and Future Work

  3. Introduction • What does MPEG-1 Audio provide? A lossy audio compression system that is perceptually transparent, built around the weaknesses of the human ear. • Can compress by a factor of about 6 while retaining sound quality. • One part of a three-part standard that covers audio, video, and audio/video synchronization.

  4. Technical Overview

  5. MPEG-I Audio Features • PCM sampling rate of 32, 44.1, or 48 kHz • Four channel modes: • Monophonic and Dual-monophonic • Stereo and Joint-stereo • Three modes (layers in MPEG-I speak): • Layer I: Computationally cheapest, bit rates > 128 kbps • Layer II: Bit rate ~ 128 kbps, used in VCD • Layer III: Most complicated encoding/decoding, bit rates ~ 64 kbps, originally intended for streaming audio

  6. Human Audio System (ear + brain) • Human sensitivity to sound is non-linear across the audible range (20 Hz – 20 kHz) • Audible range is broken into regions within which humans cannot perceive a difference in frequency • called the critical bands

  7. MPEG-I Encoder Architecture[1]

  8. MPEG-I Encoder Architecture • Polyphase Filter Bank: Transforms PCM samples to frequency domain signals in 32 subbands • Psychoacoustic Model: Calculates acoustically irrelevant parts of signal • Bit Allocator: Allots bits to subbands according to input from psychoacoustic calculation. • Frame Creation: Generates an MPEG-I compliant bit stream.

  9. The Polyphase Filter Bank

  10. Polyphase Filter Bank • Divides audio signal into 32 equal width subband streams in the frequency domain. • Inverse filter at decoder cannot recover signal without some, albeit inaudible, loss. • Based on work by Rothweiler[2]. • Standard specifies 512 coefficient analysis window, C[n]

  11. Polyphase Filter Bank • Buffer of 512 PCM samples, with 32 new samples X[n] shifted in every computation cycle • Calculate window samples for i = 0…511: Z[i] = C[i]·X[i] • Partial calculation for i = 0…63: Y[i] = Σ_{j=0…7} Z[i + 64j] • Calculate 32 subband samples for i = 0…31: S[i] = Σ_{k=0…63} M[i][k]·Y[k], where M[i][k] = cos((2i + 1)(k − 16)π/64)
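One computation cycle of the analysis filter bank can be sketched as below. The 512-coefficient window C comes from tables in the standard and is taken here as a plain input parameter; the cosine matrixing formula is the one from the paper.

```python
import numpy as np

def analysis_filterbank(buffer, C):
    """One cycle of the MPEG-1 analysis filter bank (sketch).

    buffer: the 512 most recent PCM samples (X[0] newest, per the standard)
    C:      the 512-coefficient analysis window from the standard's tables
    Returns the 32 subband samples S[0..31].
    """
    # Window the input: Z[i] = C[i] * X[i], i = 0..511
    Z = C * buffer
    # Partial sums: Y[i] = sum_{j=0..7} Z[i + 64j], i = 0..63
    Y = Z.reshape(8, 64).sum(axis=0)
    # Matrixing: S[i] = sum_{k=0..63} M[i][k] * Y[k], i = 0..31
    i = np.arange(32)[:, None]
    k = np.arange(64)[None, :]
    M = np.cos((2 * i + 1) * (k - 16) * np.pi / 64)
    return M @ Y
```

The three steps mirror the slide exactly; the 32×64 matrix multiply at the end accounts for the 32×64 = 2048 of the 2560 multiplies counted on slide 13.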

  12. Polyphase Filter Bank • Visualization of the filter[1]:

  13. Polyphase Filter Bank • The net effect: each subband signal is the input filtered by a cosine-modulated version of a single prototype lowpass filter, decimated by 32 • Analysis matrix: M[i][k] = cos((2i + 1)(k − 16)π/64) • Requires 512 + 32×64 = 2560 multiplies per cycle • Each subband has bandwidth π/32T, centered at odd multiples of π/64T

  14. Polyphase Filter Bank • Shortcomings: • Equal width filters do not correspond with critical band model of auditory system. • Filter bank and its inverse are NOT lossless. • Frequency overlap between subbands.

  15. Polyphase Filter Bank • Comparison of filter banks and critical bands[1]:

  16. Polyphase Filter Bank • Frequency response of one subband[1]:

  17. Psychoacoustic Model

  18. The Weakness of the Human Ear • Frequency dependent resolution: • We do not have the ability to discern minute differences in frequency within the critical bands. • Auditory masking: • When two signals of very close frequency are both present, the louder will mask the softer. • A masked signal must be louder than some threshold for it to be heard, which gives us room to introduce inaudible quantization noise.

  19. MPEG-I Psychoacoustic Models • MPEG-I standard defines two models: • Psychoacoustic Model 1: • Less computationally expensive • Makes some serious compromises in what it assumes a listener cannot hear • Psychoacoustic Model 2: • Provides more features suited for Layer III coding, assuming of course, increased processor bandwidth.

  20. Psychoacoustic Model • Convert samples to frequency domain • Use a Hann weighting and then a DFT • Gives a frequency-domain representation free of edge artifacts (from the finite window size). • Model 1 uses a 512 (Layer I) or 1024 (Layers II and III) sample window. • Model 2 uses a 1024 sample window and two calculations per frame.
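The Hann-plus-DFT step is straightforward; a minimal sketch for the 512-sample Model 1 / Layer I window:

```python
import numpy as np

def perceptual_spectrum(frame):
    """Hann-windowed DFT of one analysis frame (sketch).

    frame: 512 PCM samples (the Model 1, Layer I window size).
    Returns the power spectrum in dB, which the psychoacoustic
    model then splits into tone and noise components.
    """
    window = np.hanning(len(frame))          # Hann weighting
    spectrum = np.fft.rfft(frame * window)   # DFT of the windowed frame
    power = np.abs(spectrum) ** 2
    # Small floor avoids log(0) on silent bins
    return 10 * np.log10(power + 1e-12)
```

The Hann taper drives the frame to zero at both ends, which is what suppresses the edge (spectral leakage) artifacts the slide refers to.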

  21. Psychoacoustic Model • Need to separate sound into “tones” and “noise” components • Model 1: • Local peaks are tones, lump remaining spectrum per critical band into noise at a representative frequency. • Model 2: • Calculate “tonality” index to determine likelihood of each spectral point being a tone • based on previous two analysis windows

  22. Psychoacoustic Model • “Smear” each signal within its critical band • Use either a masking (Model 1) or a spreading function (Model 2). • Adjust calculated threshold by incorporating a “quiet” mask – masking threshold for each frequency when no other frequencies are present.

  23. Psychoacoustic Model • Calculate a masking threshold for each subband in the polyphase filter bank • Model 1: • Selects minima of masking threshold values in range of each subband • Inaccurate at higher frequencies – recall how subbands are linearly distributed, critical bands are NOT! • Model 2: • If subband wider than critical band: • Use minimal masking threshold in subband • If critical band wider than subband: • Use average masking threshold in subband

  24. Psychoacoustic Model • The hard work is done – now, we just calculate the signal-to-mask ratio (SMR) per subband • SMR = signal energy / masking threshold • We pass our result on to the coding unit which can now produce a compressed bitstream
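Since both quantities are normally carried in dB, the energy/threshold ratio becomes a simple per-subband difference:

```python
import numpy as np

def signal_to_mask_ratio(signal_energy_db, mask_threshold_db):
    """SMR per subband, in dB (sketch).

    In dB, SMR = signal energy / masking threshold turns into a
    subtraction. A large positive SMR means the subband needs many
    bits to keep quantization noise below the mask; a negative SMR
    means the subband is entirely masked.
    """
    return np.asarray(signal_energy_db) - np.asarray(mask_threshold_db)
```

Example: a subband with 60 dB of energy against a 45 dB mask has SMR = 15 dB, while one at 40 dB against a 50 dB mask has SMR = -10 dB and can be coded very coarsely.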

  25. Psychoacoustic Model (example) • Input[1]:

  26. Psychoacoustic Model (example) • Transformation to perceptual domain[1]:

  27. Psychoacoustic Model (example) • Calculation of masking thresholds[1]:

  28. Psychoacoustic Model (example) • Signal-to-mask ratios[1]:

  29. Psychoacoustic Model (example) • What we actually send[1]:

  30. Coding and Bit Allocation

  31. Layer Specific Coding • Layer specific frame formats[1]:

  32. Layer Specific Coding • Stream of samples is processed in groups[1]:

  33. Layer I Coding • Group 12 samples from each of the 32 subbands and encode them in each frame (12 × 32 = 384 samples) • Each group encoded with 0–15 bits/sample • Each group has a 6-bit scale factor
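A deliberately simplified sketch of coding one 12-sample subband group. The real standard draws the scale factor from a 63-entry table (indexed by the 6 bits) and uses quantizers with 2^b − 1 levels; here the group's peak magnitude stands in for the table lookup, which is an assumption made only to show the idea.

```python
import numpy as np

def encode_group(samples, bits):
    """Simplified Layer I-style coding of one subband group (sketch)."""
    assert len(samples) == 12
    # Stand-in scale factor: the group's peak magnitude
    # (the standard uses a 63-entry lookup table instead)
    scale = float(np.max(np.abs(samples))) or 1.0
    half = 2 ** (bits - 1) - 1               # gives 2^bits - 1 levels
    q = np.round(samples / scale * half).astype(int)
    return scale, q

def decode_group(scale, q, bits):
    """Inverse of encode_group: rescale the quantized values."""
    half = 2 ** (bits - 1) - 1
    return q / half * scale
```

The reconstruction error is bounded by half a quantizer step, scale / (2 · (2^(bits−1) − 1)), which is exactly the noise the bit allocator must keep under the masking threshold.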

  34. Layer II Coding • Similar to Layer I except: • Groups are now 3 of 12 samples per-subband = 1152 samples per frame • Can have up to 3 scale factors per subband to avoid audible distortion in special cases • Called scale factor selection information (SCFSI)

  35. Layer III Coding • Further subdivides subbands using the Modified Discrete Cosine Transform (MDCT), a transform that is itself lossless (invertible via overlap-add) • Larger frequency resolution => smaller time resolution • possibility of pre-echo • Layer III encoder can detect and reduce pre-echo by “borrowing bits” from future encodings
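The textbook MDCT definition can be written directly; this is a slow direct sketch, not the standard's fast implementation. Layer III applies it to 36-sample (long) or 12-sample (short) windows of each subband's output, yielding 18 or 6 coefficients.

```python
import numpy as np

def mdct(x):
    """Direct (slow) MDCT of a 2N-sample block, returning N coefficients.

    X[k] = sum_{n=0}^{2N-1} x[n] * cos(pi/N * (n + 1/2 + N/2) * (k + 1/2))

    Critically sampled: 2N inputs give only N outputs, but overlap-add
    of 50%-overlapping blocks reconstructs the signal exactly, which is
    why the transform itself is lossless.
    """
    N = len(x) // 2
    n = np.arange(2 * N)[None, :]
    k = np.arange(N)[:, None]
    basis = np.cos(np.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
    return basis @ x
```

With a 36-sample window, each of the 32 subbands yields 18 MDCT lines, giving Layer III its 576-line frequency resolution per granule.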

  36. Bit Allocation • Determine number of bits to allot for each subband given SMR from psychoacoustic model. • Layers I and II: • Calculate mask-to-noise ratio: • MNR = SNR – SMR (in dB) • SNR given by MPEG-I standard (as function of quantization levels) • Now iterate until no bits to allocate left: • Allocate bits to subband with lowest MNR. • Re-calculate MNR for subband allocated more bits.
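The Layer I/II iteration above can be sketched as a greedy loop. The SNR-vs-bits table here is a stand-in parameter for the table given in the standard, and the "one bit at a time" pool is a simplification of the real frame budget.

```python
import numpy as np

def allocate_bits(smr_db, snr_table, bit_pool):
    """Greedy Layer I/II-style bit allocation (sketch).

    smr_db:    SMR per subband from the psychoacoustic model (dB)
    snr_table: snr_table[b] = SNR in dB achieved with b bits/sample
               (stand-in for the table in the MPEG-I standard)
    bit_pool:  total extra bits/sample available to hand out
    """
    bits = np.zeros(len(smr_db), dtype=int)
    max_bits = len(snr_table) - 1
    for _ in range(bit_pool):
        # MNR = SNR - SMR per subband, at the current allocation
        mnr = np.array([snr_table[b] for b in bits]) - np.asarray(smr_db)
        # Only subbands that can still take more bits are candidates
        candidates = np.where(bits < max_bits)[0]
        if len(candidates) == 0:
            break
        worst = candidates[np.argmin(mnr[candidates])]
        bits[worst] += 1                     # give a bit to the worst MNR
    return bits
```

Each added bit raises that subband's SNR (hence its MNR), so the loop naturally spreads bits until the most audible quantization noise has been pushed down the furthest.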

  37. Bit Allocation • Layer III: • Employs “noise allocation” • Quantizes each spectral value and employs Huffman coding • If Huffman encoding results in noise in excess of allowed distortion for a subband, encoder increases resolution on that subband • Whole process repeats until one of three specified stop conditions is met.

  38. Conclusions and Future Work

  39. Conclusions • MPEG-I provides tremendous compression for relatively cheap computation. • Not suitable for archival or audiophile grade music as very seasoned listeners can discern distortion. • Modifying or searching MPEG-I content requires decompression and is not cheap!

  40. Future Work • MPEG-1 audio lays the foundation for all modern audio compression techniques • Lots of progress since then (1994!) • MPEG-2 (1996) extends MPEG audio compression to support 5.1 channel audio • MPEG-4 (1998) attempts to code based on perceived audio objects in the stream • Finally, MPEG-7 (2001) operates at an even higher level of abstraction, focusing on meta-data coding to make content searchable and retrievable

  41. References [1] D. Pan, “A Tutorial on MPEG/Audio Compression,” IEEE MultiMedia, Summer 1995. [2] J. H. Rothweiler, “Polyphase Quadrature Filters – A New Subband Coding Technique,” Proc. IEEE ICASSP, pp. 1280–1283, Boston, 1983.
