Data Compression II (Codecs and Container Formats)

Codecs • A codec is a device or program capable of performing encoding and decoding on a digital data stream or signal. • The word codec may be a combination of any of the following: 'compressor-decompressor', 'coder-decoder', or 'compression/decompression algorithm'.

Codecs (Usage) • Codecs encode a stream or signal for transmission, storage or encryption and decode it for viewing or editing. Codecs are often used in videoconferencing and streaming media applications. • An audio compressor converts analogue audio signals into digital signals for transmission or storage. A receiving device then converts the digital signals back to analogue using an audio decompressor, for playback.

Codecs • Most codecs are lossy, allowing the compressed data to be made smaller in size. • There are also lossless codecs, but for most purposes the slight increase in quality might not be worth the increase in data size, which is often considerable. • Codecs are often designed to emphasise certain aspects of the media to be encoded (motion vs. color for example).

Codec Compatibility • There are hundreds or even thousands of codecs, ranging from those downloadable for free to ones costing hundreds of dollars or more. This can create compatibility and obsolescence issues. • By contrast, uncompressed PCM audio (44.1 kHz, 16-bit stereo, as represented on an audio CD or in a .wav or .aiff file) offers more of a persistent standard across multiple platforms and over time.

Container Formats • Many multimedia data streams need to contain both audio and video data, and often some form of metadata that permits synchronisation of audio and video. Each of these three streams may be handled by different programs, processes, or hardware; but for the multimedia data stream to be useful in stored or transmitted form, they must be encapsulated together in a container format.

Container Formats • The widespread notion that AVI is a codec is incorrect: AVI is a container format that can hold streams encoded with many different codecs. • There are other well-known alternative containers such as Ogg, ASF, QuickTime, RealMedia, Matroska, AIFF, DivX, FLV and MP4.

Popular Audio Codecs • Most of the popular audio codecs are lossy; FLAC, the last entry below, is lossless. • Dolby Digital (AC3) • Digital Theater System Coherent Acoustics (DTS) • MP3 (MPEG-1 Audio Layer 3) • Advanced Audio Coding (AAC) • Vorbis • Free Lossless Audio Codec (FLAC)

Popular Video Codecs • The most popular video codecs are all lossy. • Cinepak • MPEG-1 Video (VCD) • MPEG-2 Video (DVD) • MPEG-4 ASP (implemented by Xvid, FFmpeg and DivX) • MPEG-4 AVC (x264, HD DVD, Blu-ray) • RealVideo • Windows Media Video (usually stored in the ASF container)

Dolby Digital (AC3) • Dolby Digital, or AC-3, is the common version containing up to six discrete channels of sound, with five channels for normal-range speakers (20 Hz – 20,000 Hz) (right front, center, left front, right rear and left rear) and one channel (20 Hz – 120 Hz) for the subwoofer-driven low-frequency effects. Mono and stereo modes are also supported. AC-3 supports audio sample rates up to 48 kHz. • Batman Returns was the first film to use Dolby Digital technology when it premiered in theaters in summer 1992.

Digital Theater System (DTS) • On the consumer level, DTS is shorthand for the DTS Coherent Acoustics codec, which can be carried over S/PDIF (Sony-Philips Digital Interconnect Format) and is used on DVDs, audio CDs (CD-DA), LaserDiscs (LDs) and in wave files. This is the consumer version of the DTS standard; it uses a similar codec but does not need the separate DTS CD-ROM media used in theaters. • There are significant technical differences between the commercial/theatrical and home variants: the former is a traditional ADPCM compression system called apt-X100, while the latter is a more sophisticated hybrid perceptual and signal-redundancy compressor.

MP3 • MP3 uses a lossy compression algorithm that is designed to greatly reduce the amount of data required to represent audio recordings, yet still sound like a faithful reproduction of the original uncompressed audio to most listeners. An MP3 file created at the mid-range bitrate setting of 128 kbit/s is typically about 1/11th the size of the CD file created from the same audio source. • LAME is a popular open source MP3 encoder.
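
The size reduction quoted above can be checked from the bitrates alone. The sketch below uses the CD parameters given earlier in these slides (44.1 kHz, 16 bit, stereo); the four-minute track length and the function names are illustrative assumptions.

```python
# CD audio parameters as listed earlier: 44.1 kHz, 16-bit, stereo.
SAMPLE_RATE_HZ = 44_100
BITS_PER_SAMPLE = 16
CHANNELS = 2


def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bitrate of uncompressed PCM audio in kbit/s (1 kbit = 1000 bits)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


def size_megabytes(bitrate_kbps, seconds):
    """Size in MB (10**6 bytes) of a constant-bitrate stream."""
    return bitrate_kbps * 1000 * seconds / 8 / 1_000_000


cd_kbps = pcm_bitrate_kbps(SAMPLE_RATE_HZ, BITS_PER_SAMPLE, CHANNELS)
ratio = cd_kbps / 128  # how many times larger CD audio is than 128 kbit/s MP3

print(f"CD audio: {cd_kbps} kbit/s, about {ratio:.1f}x the 128 kbit/s MP3 rate")
print(f"4-minute track: {size_megabytes(cd_kbps, 240):.1f} MB as PCM, "
      f"{size_megabytes(128, 240):.2f} MB as MP3")
```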

MP3 • MP3 represents sound within a short-term time/frequency analysis window, using psychoacoustic models to discard or reduce the precision of components that are less audible to human hearing, and then records the remaining information efficiently. These techniques are called "perceptual coding". • Other techniques such as Huffman coding (lossless), variable bitrate encoding and joint stereo encoding are also part of the MP3 codec.

Variable BitRate (VBR) Encoding • Variable bitrate encoding is designed to optimize the trade-off between size and quality. Silence and other simple passages are less demanding to encode, so it makes sense to drop the bitrate there: there is not much to encode, and a high bitrate would waste space. Where the full orchestra and loud percussion join in, the encoder chooses a higher bitrate appropriate to the demands. Some parts of the music can be encoded at 128 kbps (kilobits per second) without audible quality loss, while other parts get the full 320 kbps to make the best of them.
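
To see what this buys in file size, here is a small sketch comparing a variable bitrate track with a constant 320 kbps encoding of the same length. The segment durations and bitrates are made-up illustrative values, not measurements from a real encoder.

```python
def stream_size_mb(segments):
    """segments: list of (seconds, bitrate_kbps). Returns size in MB (10**6 bytes)."""
    total_kbit = sum(seconds * kbps for seconds, kbps in segments)
    return total_kbit * 1000 / 8 / 1_000_000


def average_bitrate_kbps(segments):
    """Time-weighted average bitrate over all segments."""
    total_seconds = sum(seconds for seconds, _ in segments)
    total_kbit = sum(seconds * kbps for seconds, kbps in segments)
    return total_kbit / total_seconds


# Hypothetical two-minute track: a quiet intro, a dense middle, a calmer ending.
vbr_track = [(30, 128), (60, 320), (30, 192)]
cbr_track = [(120, 320)]

print(f"VBR: {stream_size_mb(vbr_track):.1f} MB "
      f"at an average of {average_bitrate_kbps(vbr_track):.0f} kbps")
print(f"CBR: {stream_size_mb(cbr_track):.1f} MB at 320 kbps")
```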

Joint Stereo Encoding • Joint stereo saves bandwidth by encoding in mono (i.e. only once) those parts of the spectrum for which the human ear has no directional hearing: the very low and very high tones. • The bandwidth is saved by recording a wider sum channel and a narrower difference channel, where the difference channel does not contain these spectral components. • This works very well and produces excellent quality at 128 kbit/s for most pieces of music (but not all!).
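
The sum and difference channels can be sketched as follows. This is only the sum/difference step on raw samples; the frequency-dependent part described above (dropping spectral components from the difference channel) is not shown, and the sample values are invented for illustration.

```python
def encode_mid_side(left, right):
    """Turn left/right samples into a sum (mid) and a difference (side) channel."""
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


def decode_mid_side(mid, side):
    """Recover left/right samples from the mid and side channels."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


# Nearly identical channels, as is common in music: the side channel stays tiny,
# so it needs far fewer bits than a full second channel would.
left = [100, 200, -50, 30]
right = [98, 202, -50, 28]

mid, side = encode_mid_side(left, right)
print("mid: ", mid)
print("side:", side)
```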

Advanced Audio Coding (AAC) • Advanced Audio Coding (AAC) is a standardized, lossy compression and encoding scheme for digital audio. Designed to be the successor of the MP3 format, AAC generally achieves better sound quality than MP3 at the same bitrate, particularly below 192 kbit/s. • AAC's best known use is as the default audio format of Apple's iPhone, iPod, iTunes, and the format used for all iTunes Store audio (with extensions for proprietary digital rights management).

Advanced Audio Coding (AAC) • AAC is a wideband audio coding algorithm that exploits two primary coding strategies to dramatically reduce the amount of data needed to represent high-quality digital audio: • 1. Signal components that are perceptually irrelevant are discarded. • 2. Redundancies in the coded audio signal are eliminated. • To do this, the signal is first transformed with a modified discrete cosine transform (MDCT), using a block length chosen according to the signal's complexity. The overlapping MDCT windows help avoid artifacts at block boundaries.

Vorbis Audio Codec • Vorbis is a free and open source, lossy audio codec project headed by the Xiph.Org Foundation and intended to serve as a replacement for MP3. It is most commonly used in conjunction with the Ogg container and is therefore called Ogg Vorbis. • For many applications, Vorbis has clear advantages over other lossy audio codecs: it is patent-free and has free and open-source implementations, so it is free to use, implement, or modify as one sees fit, yet it produces smaller files than most other codecs at equivalent or higher quality.

Free Lossless Audio Codec • FLAC is an audio format using lossless compression. It is the most popular lossless audio codec. • FLAC typically reduces file size by 30–50% for most music, with significantly greater compression for voice recordings. • FLAC uses linear prediction to convert the audio samples to a series of small, uncorrelated numbers (known as the residual), which are stored efficiently using Golomb-Rice coding (quotient/remainder pairs using a divisor that is a power of two: 2, 4, 8, ...). Run-length encoding is used for blocks of identical samples, such as silent passages.
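
The two steps (prediction, then Golomb-Rice coding of the residual) can be sketched as below. This is a simplified illustration, not the FLAC bitstream: it assumes a first-order predictor that guesses each sample equals the previous one, and the bit layout (zeros for the quotient, a terminating 1, then the remainder) is chosen here for readability.

```python
def residuals(samples):
    """First-order prediction: predict each sample as the previous one."""
    previous = 0
    out = []
    for sample in samples:
        out.append(sample - previous)
        previous = sample
    return out


def zigzag(n):
    """Map signed to unsigned so small magnitudes stay small: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def rice_encode(value, k):
    """Rice code with divisor 2**k: unary quotient, then a k-bit remainder."""
    u = zigzag(value)
    quotient, remainder = u >> k, u & ((1 << k) - 1)
    remainder_bits = format(remainder, f"0{k}b") if k > 0 else ""
    return "0" * quotient + "1" + remainder_bits


def rice_decode(bits, k):
    """Inverse of rice_encode for a single code word."""
    quotient = bits.index("1")
    remainder = int(bits[quotient + 1:quotient + 1 + k], 2) if k > 0 else 0
    u = (quotient << k) | remainder
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


samples = [1000, 1003, 1005, 1004, 1006]
res = residuals(samples)  # the first value is large, the rest are small
print("residuals:", res)
print("codes (k=2):", [rice_encode(r, 2) for r in res[1:]])
```

Small residuals produce short code words, which is why a good predictor matters more than the coder itself.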

Cinepak Video Codec • Cinepak is a video codec developed by SuperMatch, a division of SuperMac Technologies, and released in 1992 as part of Apple Computer's QuickTime video suite. It was designed to encode 320x240 resolution video at 1x (150 kbyte/s) CD-ROM transfer rates. The codec was ported to the Windows platform in 1993. It was also used on first-generation and some second-generation CD-ROM game consoles, such as the Atari Jaguar CD, Sega CD, Sega Saturn, and 3DO.

Cinepak Video Codec • Cinepak is based on vector quantization, which is a significantly different algorithm from the discrete cosine transform (DCT) algorithm used by most current codecs (in particular the MPEG family, as well as JPEG). This permitted implementation on relatively slow CPUs, but tended to result in blocky artifacting at low bitrates.

Cinepak Video Codec • Cinepak divides a movie into key (intra-coded) images and inter-coded images. Each image is divided into a number of horizontal bands, which have individual 256-color palettes transferred in the key images. Each band is subdivided into 4x4 pixel blocks. The compressor uses vector quantization to determine the one or two band palette colors which best match each block, and encodes runs of blocks as either one color byte, or two color bytes plus a 16-bit vector which determines which pixel gets which color.
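
The "one color, or two colors plus a 16-bit vector" idea can be sketched for a single 4x4 block. This is a toy illustration rather than Cinepak's actual algorithm or bitstream: it works on plain intensity values instead of palette entries, and it picks the two colors by splitting the block around its mean, which is an assumption made here for simplicity.

```python
def encode_block_two_colors(block):
    """block: 16 pixel intensities (a 4x4 block in row-major order).

    Returns (colors, mask): one or two representative values, and a 16-bit
    mask whose bit i says whether pixel i uses the second color.
    """
    mean = sum(block) / len(block)
    low = [p for p in block if p <= mean]
    high = [p for p in block if p > mean]
    if not high:  # flat block: a single color is enough
        return (round(mean),), 0
    color0 = round(sum(low) / len(low))
    color1 = round(sum(high) / len(high))
    mask = 0
    for i, p in enumerate(block):
        if p > mean:
            mask |= 1 << i
    return (color0, color1), mask


def decode_block(colors, mask):
    """Rebuild the 16 pixels from the color(s) and the 16-bit mask."""
    if len(colors) == 1:
        return [colors[0]] * 16
    return [colors[(mask >> i) & 1] for i in range(16)]


edge_block = [10] * 8 + [200] * 8  # dark top half, bright bottom half
colors, mask = encode_block_two_colors(edge_block)
print(colors, f"{mask:016b}")
```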

MPEG-1 Video Codec • MPEG-1 video was originally designed to achieve acceptable video quality at data rates of around 1.5 Mbit/s and resolutions of 352x240 pixels (29.97 frames per second) or 352x288 pixels (25 frames per second). While MPEG-1 applications are often low resolution and low bitrate, the standard allows any resolution up to 4095x4095. One big disadvantage of MPEG-1 video is that it supports only progressive pictures, not interlaced ones. This deficiency helped prompt development of the more advanced MPEG-2.

MPEG-1 Video Codec • MPEG-1 starts with a relatively low-resolution video sequence of about 352 by 240 pixels at 30 frames/s, combined with high (CD) quality audio. The images are in color but are converted to YUV space, and the two chrominance channels (U and V) are decimated further to 176 by 120 pixels. • The basic scheme is to predict motion from frame to frame in the temporal direction, and then to use DCTs to organize the redundancy in the spatial directions. The DCTs are done on 8x8 blocks, and the motion prediction is done in the luminance (Y) channel on 16x16 blocks.
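
To illustrate why the DCT helps with spatial redundancy, the sketch below transforms a smooth 8x8 block and quantizes the result. It uses a plain textbook DCT-II and a single uniform quantizer step of 16; both are simplifying assumptions, since real encoders use fast transforms and per-coefficient quantization matrices.

```python
import math

N = 8  # MPEG-1 applies the DCT to 8x8 blocks


def dct_2d(block):
    """Type-II 2-D DCT of an 8x8 block (list of 8 rows of 8 numbers)."""
    def scale(k):
        return math.sqrt(1 / N) if k == 0 else math.sqrt(2 / N)

    out = [[0.0] * N for _ in range(N)]
    for u in range(N):
        for v in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * N)))
            out[u][v] = scale(u) * scale(v) * total
    return out


def quantize(coeffs, step):
    """Uniform quantization: divide by the step size and round."""
    return [[round(value / step) for value in row] for row in coeffs]


# A smooth horizontal ramp: neighbouring pixels are highly correlated.
ramp = [[100 + 2 * y for y in range(N)] for _ in range(N)]
quantized = quantize(dct_2d(ramp), step=16)
nonzero = sum(1 for row in quantized for value in row if value != 0)
print(f"{nonzero} of {N * N} quantized coefficients are non-zero")
```

Most of the 64 coefficients quantize to zero for smooth content, and those zeros are what the later entropy coding stage exploits.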

MPEG-1 Video Codec • In other words, given the 16x16 block in the current frame that you are trying to code, you look for a close match to that block in a previous or future frame. The DCT coefficients (of either the actual data, or the difference between this block and the close match) are quantized. Hopefully, many of the coefficients will then end up being zero. The results of all of this, which include the DCT coefficients, the motion vectors, and the quantization parameters (and other side information), are Huffman coded using fixed tables.
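
The search for a close match can be sketched as an exhaustive block search. The matching criterion (sum of absolute differences), the search range of 4 pixels and the synthetic test frames are assumptions made for this example; the standard does not prescribe how an encoder finds its motion vectors.

```python
def sad(frame_a, ax, ay, frame_b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(frame_a[ay + j][ax + i] - frame_b[by + j][bx + i])
               for j in range(size) for i in range(size))


def best_motion_vector(current, reference, x, y, size=16, search=4):
    """Try every offset within +/-search pixels; return (dx, dy, cost)."""
    height, width = len(reference), len(reference[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(current, x, y, reference, rx, ry, size)
                if best is None or cost < best[2]:
                    best = (dx, dy, cost)
    return best


# Synthetic 32x32 reference frame with a non-repeating pattern.
reference = [[(x * x + 3 * y * y + x * y) % 251 for x in range(32)]
             for y in range(32)]
# Current frame: the same picture moved 2 pixels right and 1 pixel down.
current = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0
            for x in range(32)] for y in range(32)]

dx, dy, cost = best_motion_vector(current, reference, x=8, y=8)
print(f"motion vector ({dx}, {dy}) with cost {cost}")
```

A cost of zero means the block is predicted perfectly, so only the vector needs to be stored and the difference block is empty.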

MPEG-1 Video Codec • There are three types of coded frames. There are I or intra frames. They are simply a frame coded as a still image, not using any past history. You have to start somewhere. • Then there are P or predicted frames. They are predicted from the most recently reconstructed I or P frame. Each macroblock in a P frame can either come with a vector and difference DCT coefficients for a close match in the last I or P, or it can just be "intra" coded (like in the I frames) if there was no good match.

MPEG-1 Video Codec • Lastly, there are B or bidirectional frames. They are predicted from the closest two I or P frames, one in the past and one in the future. You search for matching blocks in those frames, and try three different things to see which works best. You try using the forward vector, the backward vector, and you try averaging the two blocks from the future and past frames, and subtracting that from the block being coded. If none of those work well, you can intra-code the block. B frames add complexity and delay because the image sequence must be transmitted/stored out of order so that the future frame is available to generate the B frames.
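
The out-of-order transmission can be made concrete with a small reordering sketch. The frame pattern "IBBPBBP" is just an example sequence, and the function name is invented for this illustration.

```python
def coding_order(display_types):
    """Reorder frame indices so each B frame follows both of its anchors.

    display_types: frame types in display order, e.g. "IBBPBBP".
    Returns the display indices in the order they would be sent or stored.
    """
    order = []
    pending_b = []
    for index, frame_type in enumerate(display_types):
        if frame_type == "B":
            pending_b.append(index)   # wait until the future anchor arrives
        else:                         # I or P frame: an anchor
            order.append(index)       # send the anchor first...
            order.extend(pending_b)   # ...then the B frames that depend on it
            pending_b = []
    order.extend(pending_b)  # trailing B frames (would need a later anchor)
    return order


display = "IBBPBBP"
order = coding_order(display)
print("display order:", list(range(len(display))))
print("coding order: ", order)
print("coded types:  ", "".join(display[i] for i in order))
```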

MPEG-2 Video Codec • MPEG-2 is widely used as the format of digital television signals that are broadcast by terrestrial (over-the-air), cable, and direct broadcast satellite TV systems. It also specifies the format of movies and other programs that are distributed on DVD and similar discs. As such, TV stations, TV receivers, DVD players, and other equipment are often designed to this standard.

MPEG-2 Video Codec • The Video section, part 2 of MPEG-2, is similar to the previous MPEG-1 standard, but also provides support for interlaced video, the format used by analog broadcast TV systems. MPEG-2 video is not optimized for low bit-rates, especially less than 1 Mbit/s at standard definition resolutions. However, it outperforms MPEG-1 at 3 Mbit/s and above. All standards-compliant MPEG-2 Video decoders are fully capable of playing back MPEG-1 Video streams.

MPEG-4 Video Codec • MPEG-4 is a collection of methods defining compression of audio and visual (AV) digital data. It was introduced in late 1998 and designated a standard for a group of audio and video coding formats and related technology. • MPEG-4 absorbs many of the features of MPEG-1 and MPEG-2 and other related standards, adding new features such as (extended) VRML support for 3D rendering, object-oriented composite files (including audio, video and VRML objects), support for externally-specified Digital Rights Management and various types of interactivity.

Parts of MPEG-4 Video Codec • MPEG-4 Part 2 is a video compression technology developed by MPEG. It is a discrete cosine transform compression standard, similar to previous standards such as MPEG-1 and MPEG-2. Several popular codecs including DivX, Xvid and Nero Digital are implementations of this standard. • H.264 is a standard for video compression, and is equivalent to MPEG-4 Part 10, or MPEG-4 AVC (for Advanced Video Coding). As of 2008, it is the latest block-oriented motion-compensation-based codec standard. x264 is a free software library for encoding H.264/MPEG-4 AVC video streams.

Lumi Masking • Lumi masking is a technique used by video compression software, which reduces quality in very bright or very dark areas of the picture, as quality loss in these areas is less likely to be visible. It is also known as "psychovisual enhancements" or "adaptive quantization". The reduction in quality (and therefore bitrate) in certain areas of the picture caused by using lumi masking allows more bits to be allocated to the rest of the video, thus improving overall quality. Lumi masking is not perfect, however, and in some cases the degradation in quality it causes is visible.
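
A minimal sketch of the idea: pick a coarser quantizer for blocks whose average brightness is extreme. The thresholds (40 and 215 on a 0–255 scale), the boost of 2 and the function name are all hypothetical values chosen for illustration; they are not taken from Xvid or any other encoder.

```python
def block_quantizer(base_quantizer, mean_luma, dark=40, bright=215, boost=2):
    """Return a coarser (larger) quantizer for very dark or very bright blocks."""
    if mean_luma < dark or mean_luma > bright:
        return base_quantizer + boost
    return base_quantizer


# Mean luminance of three hypothetical blocks: shadow, mid-tone, highlight.
for luma in (20, 128, 240):
    print(f"mean luma {luma:3d} -> quantizer {block_quantizer(4, luma)}")
```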

Trellis Quantization • Trellis quantization is an algorithm that can improve data compression in DCT-based encoding methods. It is used to optimize residual DCT coefficients after motion estimation in lossy video compression encoders such as Xvid. • Trellis quantization reduces the size of some DCT coefficients while recovering others to take their place. It effectively searches for the quantization of each block that maximizes the peak signal-to-noise ratio (PSNR) relative to bitrate. Its effectiveness varies depending on the input data and compression method.
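
The peak signal-to-noise ratio that this optimization targets is computed from the mean squared error between the original and the decoded samples. The sketch below shows only that quality measure, not the trellis search itself; the sample values are invented and 8-bit samples (peak 255) are assumed.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(peak ** 2 / mse)


original = [52, 55, 61, 66, 70, 61, 64, 73]
decoded = [50, 56, 60, 68, 70, 60, 66, 72]  # hypothetical lossy reconstruction
print(f"PSNR: {psnr(original, decoded):.1f} dB")
```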

Xvid Video Codec • Xvid is a video codec library following the MPEG-4 standard. Xvid implements MPEG-4 Advanced Simple Profile features such as B-frames, global and quarter-pixel motion compensation, lumi masking, trellis quantization, and H.263, MPEG and custom quantization matrices. • Xvid is a primary competitor of the DivX Pro Codec but, unlike the latter, it is free software. Xvid-encoded files can be written to a CD or DVD and played in a DivX-compatible DVD player. However, Xvid can optionally encode video with advanced features that most DivX Certified set-top players do not support.

FFmpeg • ffmpeg is a command line tool to convert one video file format to another. It also supports grabbing and encoding in real time from a TV card. • libavcodec is a library containing all the FFmpeg audio/video encoders and decoders. Most codecs were developed from scratch to ensure best performance and high code reusability. • libavformat is a library containing demuxers and muxers (multiplexing) for audio/video container formats. • A well-known application built on these libraries is ffdshow.
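
The split between libavformat (containers) and libavcodec (codecs) shows up directly on the ffmpeg command line: a file can be moved to a different container without re-encoding, or re-encoded with chosen codecs. The sketch below only builds the argument lists and prints them; the file names are placeholders, and whether a given encoder such as libx264 is available depends on how the local ffmpeg was built.

```python
import shlex


def remux_command(source, target):
    """Change only the container: copy the encoded streams unchanged."""
    return ["ffmpeg", "-i", source, "-c", "copy", target]


def transcode_command(source, target, video_codec, audio_codec):
    """Re-encode the video and audio streams with the named encoders."""
    return ["ffmpeg", "-i", source,
            "-c:v", video_codec, "-c:a", audio_codec, target]


# Placeholder file names for illustration.
print(shlex.join(remux_command("input.avi", "output.mkv")))
print(shlex.join(transcode_command("input.avi", "output.mp4", "libx264", "aac")))
```

Either list could be passed to `subprocess.run` on a machine where ffmpeg is installed.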

ffdshow • ffdshow is a media decoder and encoder mainly used for the fast and high-quality decoding of video in the MPEG-4 ASP (e.g. encoded with DivX, Xvid or FFmpeg MPEG-4) and AVC (H.264) formats, but supporting numerous other video and audio formats as well. It is free software released under the GPL license, runs on Windows and is implemented as a DirectShow decoding filter. • The project is now known as ffdshow-tryouts, where bug fixes, stability fixes, new features, and codec updates continue.

Container formats : AVI • Audio Video Interleave, known by its acronym AVI, is a multimedia container format introduced by Microsoft in November 1992 as part of its Video for Windows technology. AVI files can contain both audio and video data in a file container that allows synchronous audio-with-video playback. Like the DVD video format, AVI files support multiple streaming audio and video, although these features are seldom used.

Container formats : AVI • The AVI container has no native support for modern MPEG-4 features like B-frames. Hacks are sometimes used to enable modern MPEG-4 features and subtitles; however, these hacks are a source of playback incompatibilities. • AVI files do not contain pixel aspect ratio information, so players render all AVI files with square pixels. Video encoded with non-square pixels therefore appears stretched or squeezed horizontally when the file is played back. Other video container formats can store a non-square pixel aspect ratio.
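
The effect of the missing pixel aspect ratio can be worked through with a little arithmetic. The 720x480 frame size and the 8:9 pixel aspect ratio below are illustrative example values; the point is only that the displayed shape is the stored shape multiplied by the pixel aspect ratio.

```python
from fractions import Fraction


def display_aspect_ratio(width, height, pixel_aspect_ratio):
    """Shape of the picture on screen: stored width/height times pixel shape."""
    return Fraction(width, height) * pixel_aspect_ratio


def square_pixel_width(width, pixel_aspect_ratio):
    """Width to resample to so the picture looks right with square pixels."""
    return round(width * pixel_aspect_ratio)


stored = (720, 480)
intended = display_aspect_ratio(*stored, Fraction(8, 9))   # as authored
as_played = display_aspect_ratio(*stored, Fraction(1, 1))  # square pixels assumed

print(f"intended shape:  {intended}")
print(f"shape as played: {as_played}")
print(f"resample to {square_pixel_width(720, Fraction(8, 9))}x480 to fix it")
```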

Container formats : DIVX • In June 2005, DivX, Inc. released its own container format called DivX Media Format (.divx extension) to succeed the AVI + DivX combo. However, this format is basically an enhanced AVI format (based on the same RIFF structure, for backward compatibility with existing players and devices) and so far, has gained no perceivable consumer traction, even where the DivX codec was once popular (the Xvid codec has instead become the codec of choice among most of the file-sharing groups).
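
The RIFF structure mentioned above begins with a 12-byte header: the four characters "RIFF", a little-endian 32-bit size, and a four-character form type, which is "AVI " (with a trailing space) for AVI files. The sketch below parses that header from bytes built in memory; the function name and the sample size value are assumptions for this example.

```python
import struct


def parse_riff_header(data):
    """Return (form_type, payload_size) from the first 12 bytes of a RIFF file."""
    if len(data) < 12:
        raise ValueError("need at least 12 bytes")
    magic, size, form_type = struct.unpack("<4sI4s", data[:12])
    if magic != b"RIFF":
        raise ValueError("not a RIFF file")
    return form_type.decode("ascii"), size


# A hand-built header standing in for the start of an AVI file.
header = b"RIFF" + struct.pack("<I", 1000) + b"AVI "
print(parse_riff_header(header))
```

For a real file, the same function can be applied to the first 12 bytes read in binary mode.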

Container formats : Matroska • The Matroska Multimedia Container is a free, open-standard container format: a file format that can hold an unlimited number of video, audio, picture or subtitle tracks inside a single file. It is intended to serve as a universal format for storing common multimedia content, like movies or TV shows. • Matroska file types are .MKV for video (with subtitles and audio), .MKA for audio-only files and .MKS for subtitles only. The most common use of .MKV files is to store HD video.

Container formats : MP4 • MPEG-4 Part 14 is a multimedia container format standard specified as a part of MPEG-4. It is most commonly used to store digital audio and digital video streams, especially those defined by MPEG, but can also be used to store other data such as subtitles and still images. Like most modern container formats, MPEG-4 Part 14 allows streaming over the Internet. The official filename extension for MPEG-4 Part 14 files is .mp4, so the container format is often referred to simply as MP4.

Container formats : FLV • FLV files contain video bit streams which are a variant of the H.263 video standard, under the name of Sorenson Spark. Flash Player 8 and newer revisions support the playback of On2 TrueMotion VP6 video bit streams. On2 VP6 can provide a higher visual quality than Sorenson Spark, especially when using lower bit rates. On the other hand it is computationally more complex and therefore will not run as well on certain older system configurations. Flash Player 9 Update 3 includes support for H.264 video standard (MPEG-4 part 10, or AVC) which is even more computationally demanding, but offers significantly better quality/bitrate ratio.

End of lesson
