
Digital Audio Compression



  1. Digital Audio Compression CIS 465 Spring 2013

  2. Speech Compression • Compression of voice data • We have previously mentioned several methods that are used to compress voice data • mu-law and A-law companding • ADPCM and delta modulation • These are examples of methods which work in the time domain (as opposed to the frequency domain) • Often they are not even considered compression methods
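
To make the time-domain idea concrete, here is a minimal sketch of mu-law companding (using the standard mu = 255 curve; the test tone and 8 kHz rate are just illustrative):

```python
import numpy as np

MU = 255.0  # standard mu for 8-bit telephone audio

def mu_law_encode(x):
    """Compand samples in [-1, 1] and quantize to 8 bits: quiet samples
    keep proportionally more resolution than loud ones."""
    y = np.sign(x) * np.log1p(MU * np.abs(x)) / np.log1p(MU)
    return np.round((y + 1.0) / 2.0 * 255.0).astype(np.uint8)

def mu_law_decode(code):
    """Expand the 8-bit code back to samples in [-1, 1]."""
    y = code.astype(np.float64) / 255.0 * 2.0 - 1.0
    return np.sign(y) * ((1.0 + MU) ** np.abs(y) - 1.0) / MU

t = np.arange(0, 0.01, 1.0 / 8000.0)      # 10 ms at 8 kHz
x = 0.5 * np.sin(2 * np.pi * 440 * t)     # toy input signal
print(np.max(np.abs(x - mu_law_decode(mu_law_encode(x)))))  # small error
```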

  3. Speech Compression • Although the previous techniques are generally applied to speech data, they are not designed specifically for such data • Vocoders, by contrast, are • They can't be used with other analog signals • They model speech so that its salient features can be captured in as few bits as possible • Linear Predictive Coders model the speech waveform in time • Channel vocoders and formant vocoders are other examples • In electronic music, vocoders allow a voice to modulate a musical source (e.g., a synthesizer)

  4. General Audio Compression • If we want to compress general audio (not just speech), different techniques are needed • In particular, music compression is a more general form of audio compression • We make use of psychoacoustical modeling • This enables perceptual encoding based upon an analysis of how the ear and brain perceive sound • Perceptual encoding exploits audio elements that the human ear cannot hear well

  5. Psychoacoustics • If you have been listening to very loud music, you may afterwards have trouble hearing soft sounds (that you could normally hear) • Temporal masking • A loud sound at one frequency (a lead guitar) may drown out a sound at another frequency (the singer) • Frequency masking

  6. Equal-Loudness Relations • If we play two pure tones, sinusoidal sound waves, with the same amplitude but different frequencies • One may sound louder than the other • The ear does not hear low or high frequencies as well as mid-range ones (speech) • This can be shown with equal-loudness curves, which plot the sound level required for equal perceived loudness as a function of frequency

  7. Equal-Loudness Relations • [Figure: equal-loudness curves]

  8. Threshold of Hearing • The threshold of human hearing for pure tones can be plotted against frequency – tones at levels below the curve are inaudible
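
A common closed-form approximation of this curve (due to Terhardt, often used to seed psychoacoustic models) can be sketched as:

```python
import numpy as np

def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing,
    in dB SPL, as a function of frequency in Hz."""
    f = np.asarray(f_hz, dtype=np.float64) / 1000.0  # convert to kHz
    return (3.64 * f ** -0.8
            - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)

# Mid-range tones need far less energy to be audible than the extremes:
for f in (50, 1000, 4000, 15000):
    print(f, "Hz:", round(float(threshold_in_quiet_db(f)), 1), "dB SPL")
```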

  9. Threshold of Hearing • A loud sound raises this threshold near its own frequency, masking other sounds at nearby frequencies

  10. Frequency Masking • We can determine how a pure tone at a particular frequency affects our ability to hear tones at nearby frequencies • Then, if a signal can be decomposed into frequencies, the masking curve tells us, for each partially masked frequency, how much quantization noise can be tolerated before it becomes audible

  11. Critical Bands • The human hearing range divides into critical bands • The human auditory system cannot resolve sounds better than to within about one critical band when other sounds are present • Critical bandwidth represents the ear's resolving power for simultaneous tones • At lower frequencies the bands are narrower than at higher frequencies • Each band corresponds to a region of the inner ear (the basilar membrane) that responds to a particular range of frequencies

  12. Critical Bands • [Table: the critical bands with their center frequencies and bandwidths]

  13. Critical Bands • Generally, the audio frequency range for hearing (20 Hz – 20 kHz) can be partitioned into about 24 critical bands (25 are typically used for coding applications) • The previous slide does not show several of the highest-frequency critical bands • The critical band at the highest audible frequency is over 4000 Hz wide • The ear is not very discriminating within a critical band
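
Zwicker's Bark scale is the usual way to put numbers on this: one Bark corresponds to one critical band. A sketch of the standard Hz-to-Bark approximation:

```python
import numpy as np

def hz_to_bark(f_hz):
    """Zwicker's approximation of the Bark (critical-band) scale:
    equal steps in Bark correspond to equal distances along the cochlea."""
    f = np.asarray(f_hz, dtype=np.float64)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)

print(round(float(hz_to_bark(20000)), 1))  # ~24.6: about 24-25 bands in all
# Bands are narrow at low frequencies, very wide at the top of the range:
print(round(float(hz_to_bark(200) - hz_to_bark(100)), 2))      # ~1 Bark per 100 Hz
print(round(float(hz_to_bark(20000) - hz_to_bark(16000)), 2))  # <1 Bark per 4000 Hz
```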

  14. Temporal Masking • A loud tone causes the hearing receptors in the inner ear to become saturated, and they require time to recover • This leads to the temporal masking effect • After the loud tone we cannot immediately hear another tone – post-masking • The length of the masking depends on the duration of the masking tone • A masking tone can also block sounds played just before – pre-masking (shorter time)

  15. Temporal Masking • MPEG audio compression takes advantage of both temporal and frequency masking to transmit masked frequency components using fewer bits

  16. MPEG Audio Compression • MPEG (Moving Picture Experts Group) is a family of standards for compression of both audio and video data • MPEG-1 (1991) CD-quality audio • MPEG-2 (1994) Multi-channel surround sound • MPEG-4 (1998) Also includes MIDI, speech, etc. • MPEG-7 (2003) Not compression – searching • MPEG-21 (2004) Not compression – digital rights management

  17. MPEG Audio Compression • MPEG-1 defined three downward-compatible layers of audio compression • Each layer uses a more complex psychoacoustic model and hence achieves better compression • Increased complexity leads to increased delay • Compatibility is achieved by shared file header information • Layer 1 – used for the Digital Compact Cassette (DCC) • Layer 2 – proposed for digital audio broadcasting • Layer 3 – music (MPEG-1 Layer 3 == mp3)

  18. MPEG Audio Compression • MPEG audio compression relies on quantization, masking, and critical bands • The encoder uses a bank of 32 filters to decompose the signal into sub-bands • Uniform width – not exactly aligned to the critical bands • Overlapping • A Fourier transform is used for the psychoacoustical model • Layer 3 adds a (modified) DCT after the sub-band filtering, so layers 1 and 2 work in the temporal domain and layer 3 in the frequency domain
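
The analysis step itself is compact. Below is a sketch of one step of the 32-band filter bank: the cosine matrixing follows the form given in ISO/IEC 11172-3, but the 512-tap prototype window used here is a generic stand-in, not the tabulated coefficients from the standard:

```python
import numpy as np

# Matrixing, as in the ISO/IEC 11172-3 analysis pseudocode:
#   M[k][i] = cos((2k + 1) * (i - 16) * pi / 64)
M = np.cos((2 * np.arange(32)[:, None] + 1)
           * (np.arange(64)[None, :] - 16) * np.pi / 64.0)

# Stand-in lowpass prototype (assumption); the standard tabulates C[0..511].
C = np.sinc(np.arange(512) / 64.0 - 4.0) * np.hanning(512) / 64.0

def analyze_step(fifo):
    """One filter-bank step: the 512 most recent PCM samples (newest first)
    in, one output sample for each of the 32 sub-bands out."""
    z = C * fifo                       # window the FIFO
    y = z.reshape(8, 64).sum(axis=0)   # partial sums: y[i] = sum_j z[i + 64j]
    return M @ y                       # 32 sub-band samples

# The FIFO advances by 32 PCM samples per step, so each band's sample
# rate is 1/32 of the input rate (X input samples -> X/32 per band).
```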

  19. MPEG Audio Compression • PCM input is filtered into 32 bands • The PCM is also FFT-transformed for the psychoacoustic (PA) model • Windows of samples (384, 576, or 1152) are coded at a time

  20. MPEG Audio Compression • Since the sub-bands overlap, aliasing may occur • This is overcome by the use of a quadrature mirror filter bank • Attenuation slopes of adjacent bands are mirror images
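
The mirror relationship is simple to state in code. A sketch for a two-band case (the Haar pair here is an arbitrary illustration, not the MPEG prototype):

```python
import numpy as np

def qmf_pair(h0):
    """Build the quadrature-mirror highpass h1[n] = (-1)^n * h0[n] from a
    lowpass prototype h0; its magnitude response mirrors h0 about pi/2,
    so the aliasing each decimated band introduces cancels on synthesis."""
    signs = (-1.0) ** np.arange(len(h0))
    return h0, signs * h0

h0 = np.array([1.0, 1.0]) / np.sqrt(2.0)  # Haar lowpass (illustration only)
h0, h1 = qmf_pair(h0)
print(h1)  # approx [ 0.707 -0.707]
```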

  21. MPEG Audio Algorithm • The PCM audio data is assembled into frames • Header – sync code of 12 1s • SBS format – describes how many sub-band samples (SBS) are in the frame • The SBS themselves (384 in Layer 1, 1152 in Layers 2 and 3) • Ancillary data – e.g. multi-lingual data or surround-sound data
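
As a sketch of what that header looks like on the wire, here is a parser for the fixed 32-bit MPEG-1 audio frame header (field widths per ISO/IEC 11172-3; only the most useful fields are returned):

```python
def parse_mpeg1_audio_header(frame):
    """Unpack the 32-bit header at the start of an MPEG-1 audio frame."""
    h = int.from_bytes(frame[:4], "big")
    if h >> 20 != 0xFFF:                     # sync: 12 bits, all 1s
        raise ValueError("no frame sync")
    return {
        "id":            (h >> 19) & 0x1,    # 1 = MPEG-1
        "layer":         4 - ((h >> 17) & 0x3),  # code 3/2/1 -> Layer 1/2/3
        "crc_present":   ((h >> 16) & 0x1) == 0,
        "bitrate_index": (h >> 12) & 0xF,    # index into a per-layer table
        "sample_rate":   {0: 44100, 1: 48000, 2: 32000}.get((h >> 10) & 0x3),
        "padding":       (h >> 9) & 0x1,
        "mode":          (h >> 6) & 0x3,     # stereo / joint / dual / mono
    }
```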

  22. MPEG Audio Algorithm • The sampling rate determines the frequency range • That range is divided up into 32 overlapping bands • The frames are sent through a corresponding 32-filter filter bank • If X is the number of samples per frame, each filter produces X/32 samples • These are still samples in the temporal domain

  23. MPEG Audio Algorithm • The Fourier transform is performed on a window of samples surrounding the samples in the frame (either 1024 or 2*1024 samples) • This feeds into the psychoacoustic model (along with the subband samples) • Analyze tonal and nontonal elements in each band • Determine spreading functions (how much each band affects another)
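
One widely cited closed form for such a spreading function is due to Schroeder and colleagues; sketched here, with dz the Bark distance from masker to maskee:

```python
import numpy as np

def spreading_db(dz_bark):
    """Schroeder et al.'s spreading-function approximation (in dB):
    how much a masker at Bark z affects a band dz Barks away."""
    x = dz_bark + 0.474
    return 15.81 + 7.5 * x - 17.5 * np.sqrt(1.0 + x * x)

# Masking spreads asymmetrically: the slope toward higher frequencies
# is much shallower than the slope toward lower ones.
for dz in (-2, -1, 0, 1, 2):
    print(dz, "Bark:", round(float(spreading_db(dz)), 1), "dB")
```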

  24. MPEG Audio Algorithm • Find the masking threshold and signal-to-mask ratios for each band • The scaling factor for each band is the maximum amplitude of the samples in that band • The bit-allocation algorithm takes the SMRs and scaling factor and determines how many bits can be allocated (quantization granularity) for each band • In MP3, the bits can be moved from band to band as needed to ensure a minimum amount of compression while achieving higher quality
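
A greedy sketch of that bit-allocation loop (the 6.02 dB-per-bit SNR gain and the SMR values are illustrative simplifications):

```python
import numpy as np

def allocate_bits(smr_db, bit_pool, max_bits=15):
    """Repeatedly give one more bit to the band whose mask-to-noise ratio
    (MNR = SNR - SMR) is currently worst, until the pool runs out."""
    smr = np.asarray(smr_db, dtype=np.float64)
    bits = np.zeros(len(smr), dtype=int)
    for _ in range(bit_pool):
        mnr = 6.02 * bits - smr          # ~6.02 dB of SNR per quantizer bit
        mnr[bits >= max_bits] = np.inf   # band already saturated
        if np.all(np.isinf(mnr)):
            break
        bits[int(np.argmin(mnr))] += 1
    return bits

# Bands that are barely masked (high SMR) get the bits; a fully masked
# band (negative SMR) may get none:
print(allocate_bits([20.0, 5.0, -3.0, 12.0], bit_pool=8))  # [4 1 0 3]
```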

  25. MPEG Audio Algorithm • Layer 1 has 12 samples encoded per band per frame • Layer 2 has 3 groups of 12 (36 samples) per frame • Layer 3 has non-equal frequency bands • Layer 3 also performs a Modified DCT on the filtered data, so we are in the frequency (not time) domain • Layer 3 does non-uniform quantization followed by Huffman coding • All of these modifications make for better (if more complex) performance for MP3
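
A direct (unoptimized) MDCT sketch, using Layer 3's long-block size of 36 samples in, 18 coefficients out:

```python
import numpy as np

def mdct(x):
    """Direct O(N^2) MDCT: 2N windowed samples in, N coefficients out.
    X[k] = sum_n x[n] * cos(pi/N * (n + 0.5 + N/2) * (k + 0.5))"""
    big_n = len(x) // 2
    n = np.arange(2 * big_n)[None, :]
    k = np.arange(big_n)[:, None]
    basis = np.cos(np.pi / big_n * (n + 0.5 + big_n / 2.0) * (k + 0.5))
    return basis @ x

block = np.sin(2 * np.pi * 3 * np.arange(36) / 36.0)  # toy long block
print(mdct(block).shape)  # (18,): 50%-overlapped blocks stay critically sampled
```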

  26. Stereo Encoding • MPEG codes stereo data in several different ways • Joint stereo • Intensity stereo • Etc. • We are not discussing these

  27. MPEG File Format • MPEG audio files have no global file header (so you can start playing/processing anywhere in the file) • They consist of a sequence of frames • Each frame has a header followed by audio data

  28. MPEG File Format • [Figure: an MPEG audio stream as a sequence of frames, each with its own header]

  29. MPEG File Format • ID3 is a metadata container most often used in conjunction with the MP3 audio file format • It allows information such as the title, artist, album, track number, year, genre, and other information about the file to be stored in the file itself • The original ID3v1 tag occupies the last 128 bytes of the file
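
The ID3v1 layout is fixed, which makes it trivial to read. A sketch (the field offsets are the standard ID3v1 ones):

```python
def read_id3v1(path):
    """Read an ID3v1 tag: the last 128 bytes of the file, laid out as
    'TAG' + title(30) + artist(30) + album(30) + year(4) + comment(30)
    + genre(1). Returns None if no tag is present."""
    with open(path, "rb") as f:
        f.seek(-128, 2)              # 128 bytes back from the end
        tag = f.read(128)
    if tag[:3] != b"TAG":
        return None
    text = lambda a, b: tag[a:b].rstrip(b"\x00 ").decode("latin-1")
    return {
        "title":   text(3, 33),
        "artist":  text(33, 63),
        "album":   text(63, 93),
        "year":    text(93, 97),
        "comment": text(97, 127),
        "genre":   tag[127],         # index into the fixed ID3v1 genre list
    }
```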

  30. Bit Rates • Audio (or video) compression schemes can be characterized as either constant bit rate (CBR) or variable bit rate (VBR) • In general, higher compression can be achieved with VBR (at the cost of added complexity to encode/decode) • MPEG-1 Layers 1 and 2 are CBR only • MP3 is either VBR or CBR • Average Bit Rate (ABR) is a compromise

  31. MPEG-2 AAC • MPEG-2 (which is used for encoding DVDs) has an audio component as well • MPEG-2 AAC (Advanced Audio Coding) standard was aimed at transparent sound reproduction for theatres • 320 kbps for five channels (left, right, center, left-surround and right-surround) • 5.1 channel systems include a low-frequency enhancement channel (“woofer”) • AAC can also deliver high-quality stereo sound at bitrates less than 128 kbps

  32. MPEG-2 AAC • AAC is the default audio format for (e.g.) YouTube, iPod (iTunes), PS3, Nintendo DSi, etc. • Compared to MP3 • More sampling frequencies • More channels • More efficient, simpler filterbank (pure MDCT) • Arbitrary bit rates and variable frame lengths • Etc.

  33. MPEG-4 Audio • MPEG-4 audio integrates a number of audio components into one standard • Speech compression • Text-to-speech • MIDI • MPEG-4 AAC (similar to MPEG-2 AAC) • Alternative coders (perceptual coders and structured coders)
