Concepts of Multimedia Processing and Transmission



  1. Concepts of Multimedia Processing and Transmission IT 481, Lecture #7 Dennis McCaughey, Ph.D. 23 October, 2006

  2. Broadcast Environment IT 481, Fall 2006

  3. H.264 Overview • 1997: ITU-T Video Coding Experts Group began work • 2001: ISO/MPEG joined the ITU-T and formed a Joint Video Team (JVT) that took over the H.264 project • The JVT objective was the creation of a single video coding standard that would simultaneously result in a new part of the MPEG-4 family of standards and a new ITU-T (H.264) Recommendation IT 481, Fall 2006

  4. History IT 481, Fall 2006

  5. H.264 Advantages • Up to 50% savings in bit rate compared to H.263+ or MPEG-4 Simple Profile • High quality video: H.264 offers consistently good video quality at high and low bit rates. • Error resilience: H.264 provides the tools necessary to deal with packet loss in packet networks and bit errors in error-prone wireless networks. • Network friendliness: Through the Network Adaptation Layer, H.264 bit streams can be easily transported over different networks IT 481, Fall 2006

  6. Relationship to Other Standards • Identical specifications have been approved in both ITU-T / VCEG and ISO/IEC / MPEG • In ITU-T / VCEG this is a new & separate standard • ITU-T Recommendation H.264 • ITU-T Systems (H.32x) will be modified to support it • In ISO/IEC / MPEG this is a new “part” in the MPEG-4 suite • Separate codec design from prior MPEG-4 visual • New part 10 called “Advanced Video Coding” (AVC – similar to “AAC” position in MPEG-2 as separate codec) • MPEG-4 Systems / File Format has been modified to support it • H.222.0 | MPEG-2 Systems also modified to support it • IETF finalizing RTP payload packetization IT 481, Fall 2006

  7. Applications • Entertainment Video (1-8+ Mbps, higher latency) • Broadcast / Satellite / Cable / DVD / VoD / FS-VDSL / … • DVB/ATSC/SCTE, DVD Forum, DSL Forum • Conversational Services (usu. <1 Mbps, low latency) • Circuit-switched • H.320 Conversational • 3GPP Conversational H.324/M • Packet-switched • H.323 Conversational Internet/best-effort IP/RTP • 3GPP Conversational IP/RTP/SIP • Streaming Services (usu. lower bit rate, higher latency) • 3GPP Streaming IP/RTP/RTSP • Streaming IP/RTP/RTSP (without TCP fallback) • Other Services • 3GPP Multimedia Messaging Services IT 481, Fall 2006

  8. Profiles and Levels • Many standards contain different configurations of capabilities – often based on “profiles” & “levels” • A profile is usually a set of algorithmic features • A level is usually a degree of capability (e.g. resolution or speed of decoding) • H.264/AVC has three profiles • Baseline (lower capability plus error resilience, e.g., videoconferencing, mobile video) • Main (high compression quality, e.g., broadcast) • Extended (added features for efficient streaming) IT 481, Fall 2006

  9. Grouping of Capabilities into Profiles • Three profiles now: Baseline, Main, and Extended • Baseline (e.g., Videoconferencing & Wireless) • I and P picture types (not B) • In-loop deblocking filter • 1/4-sample motion compensation • Tree-structured motion segmentation down to 4x4 block size • VLC-based entropy coding (CAVLC) • Some enhanced error resilience features • Flexible macroblock ordering/arbitrary slice ordering • Redundant slices • Note: No support for interlaced video in Baseline IT 481, Fall 2006

  10. Non-Baseline Profiles • Main Profile (esp. Broadcast/Entertainment) • All Baseline features except enhanced error resilience features • B pictures • Adaptive weighting for B and P picture prediction • Picture and MB-level frame/field switching • CABAC • Note: Main is not exactly a superset of Baseline • Extended Profile (esp. Streaming/Internet) • All Baseline features • B pictures • Adaptive weighting for B and P picture prediction • Picture and MB-level frame/field switching • More error resilience: Data partitioning • SP/SI switching pictures • Note: Extended is a superset of Baseline (but not of Main) IT 481, Fall 2006

  11. H.264 Encoder IT 481, Fall 2006

  12. H.264 Main Stages • Dividing each video frame into blocks of pixels so that processing of the video frame can be conducted at the block level • Exploiting the spatial redundancies that exist within the video frame by coding some of the original blocks through transform, quantization and entropy coding. • Exploiting the temporal dependencies that exist between blocks in successive frames so that only changes between successive frames need to be encoded. For any given block, a search is performed in the previously coded one or more frames to determine motion vectors that are then used by the encoder and decoder to predict the subject block • Exploiting any remaining spatial redundancies that exist within the video frame by coding residual blocks; i.e. the difference between the original blocks and the corresponding predicted blocks again through transform, quantization and entropy coding. IT 481, Fall 2006
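A minimal sketch of this block-level structure, assuming 16x16 macroblocks, a grayscale frame held in a NumPy array, and a co-located (zero motion vector) prediction taken from the previous frame; the function names are illustrative, not taken from the standard.

import numpy as np

MB = 16  # macroblock size in pixels

def macroblocks(frame):
    """Yield (row, col, 16x16 block) for every macroblock in a frame."""
    h, w = frame.shape
    for r in range(0, h - MB + 1, MB):
        for c in range(0, w - MB + 1, MB):
            yield r, c, frame[r:r + MB, c:c + MB]

def residual(block, prediction):
    """The difference that is subsequently transformed, quantized and entropy coded."""
    return block.astype(np.int16) - prediction.astype(np.int16)

# Toy usage: predict each macroblock from the co-located block of the previous
# frame (a zero motion vector) and form the residual to be coded.
prev = np.random.randint(0, 256, (48, 64), dtype=np.uint8)
curr = np.random.randint(0, 256, (48, 64), dtype=np.uint8)
residuals = [residual(blk, prev[r:r + MB, c:c + MB]) for r, c, blk in macroblocks(curr)]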

  13. H.264 Features • Enhanced motion compensation • Multiple block sizes and shapes (MPEG-2 has only 8x8) • Higher resolution ¼-pixel motion estimation • Multiple reference frame selection and bi-directional mode selection • Employs an integer-based DCT that does not have the mismatch problem in the inverse transform • Improved entropy coding IT 481, Fall 2006

  14. Intra Prediction & Coding • Intra coding refers to the case where only spatial redundancies within a video picture are exploited. • The resulting frame is referred to as an I-picture; I-pictures are typically encoded by directly applying the transform to the different macroblocks in the frame. • Encoded I-pictures are large in size, since a large amount of information is usually present in the frame and no temporal information is used as part of the encoding process. • In order to increase the efficiency of the intra coding process in H.264, spatial correlation between adjacent macroblocks in a given frame is exploited. • This relies on the observation that adjacent macroblocks tend to have similar properties. • The first step in the encoding process for a given macroblock is the prediction of the macroblock of interest from the surrounding macroblocks (typically the ones located on top and to the left of the macroblock of interest, since those macroblocks would have already been encoded). • The difference between the actual macroblock and its prediction is then coded, which results in fewer bits to represent the macroblock of interest. IT 481, Fall 2006

  15. H.264 4x4 Intra Prediction Modes H.264 offers 9 modes for prediction of 4x4 luminance blocks, comprising DC prediction (Mode 2) and 8 directional modes, labeled 0 through 8. This process is illustrated in the slide's figure, in which pixels A to M from neighboring blocks have already been encoded and may be used for prediction. IT 481, Fall 2006

  16. Examples • Mode 0 (Vertical Prediction) • a, e, i and m are equal to A, • b, f, j and n are equal to B, • c, g, k and o are equal to C, and • d, h, l and p are equal to D. • Mode 3 (Diagonal-Down-Left) • a is equal to (A+2B+C+2)/4, • b, e are equal to (B+2C+D+2)/4, • c, f, i are equal to (C+2D+E+2)/4, • d, g, j, m are equal to (D+2E+F+2)/4, • h, k, n are equal to (E+2F+G+2)/4, • l, o are equal to (F+2G+H+2)/4, and • p is equal to (G+3H+2)/4. IT 481, Fall 2006
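The two modes above written out directly as a small sketch; above holds the eight reconstructed neighboring pixels A..H from the blocks above and above-right, and the helper names are illustrative.

import numpy as np

def intra4x4_vertical(above):
    """Mode 0: every row of the predicted block repeats the pixels A, B, C, D directly above."""
    return np.tile(np.asarray(above[:4], dtype=np.int32), (4, 1))

def intra4x4_diag_down_left(above):
    """Mode 3: each predicted pixel is a rounded weighted average taken along
    the down-left diagonal, exactly as in the equations listed above."""
    A, B, C, D, E, F, G, H = above
    p = lambda x, y, z: (x + 2 * y + z + 2) // 4
    return np.array([
        [p(A, B, C), p(B, C, D), p(C, D, E), p(D, E, F)],   # a b c d
        [p(B, C, D), p(C, D, E), p(D, E, F), p(E, F, G)],   # e f g h
        [p(C, D, E), p(D, E, F), p(E, F, G), p(F, G, H)],   # i j k l
        [p(D, E, F), p(E, F, G), p(F, G, H), p(G, H, H)],   # m n o p  (p = (G+3H+2)/4)
    ], dtype=np.int32)

above = [100, 110, 120, 130, 140, 150, 160, 170]   # A..H
print(intra4x4_vertical(above))
print(intra4x4_diag_down_left(above))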

  17. Other Intra Prediction Modes • 8x8 Intra Prediction Modes • Luminance blocks use all nine prediction modes • Chrominance blocks use four prediction modes (DC, Vertical, Horizontal and Planar). • 16x16 Intra Prediction Modes • For regions with less spatial detail (i.e., flat regions), one of four prediction modes (DC, Vertical, Horizontal and Planar) is chosen for the prediction of the entire luminance component of the macroblock. • The prediction mode must be encoded for each block. • The mode for each block is efficiently coded by assigning shorter symbols to more likely modes; • the probability of each mode is determined based on the modes used for coding the surrounding blocks. IT 481, Fall 2006

  18. Inter Prediction and Coding • Inter prediction and coding is based on using motion estimation and compensation to take advantage of the temporal redundancies that exist between successive frames. • Motion estimation in H.264 supports most of the key features adopted in earlier video standards, but with improved efficiency. • Supports P-pictures (with single and multiple reference frames) and B-pictures, and a new inter-stream transitional picture called an SP-picture. • The inclusion of SP-pictures in a bit stream enables efficient switching between bit streams with similar content encoded at different bit rates, as well as random access and fast playback modes. • Four main motion estimation features used in H.264: • (1) the use of various block sizes and shapes, • (2) the use of high-precision sub-pixel motion vectors, • (3) the use of multiple reference frames, and • (4) the use of de-blocking filters in the prediction loop IT 481, Fall 2006
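A minimal integer-pel block-matching sketch of the motion search described above, assuming a single reference frame, a sum-of-absolute-differences (SAD) cost, and an illustrative block size and search range.

import numpy as np

def motion_search(ref, cur, top, left, bsize=8, srange=8):
    """Return the integer-pel motion vector (dy, dx) and its SAD, found by
    exhaustively comparing the current block against every candidate block
    within +/- srange pixels in the reference frame."""
    block = cur[top:top + bsize, left:left + bsize].astype(np.int32)
    best_mv, best_sad = (0, 0), None
    for dy in range(-srange, srange + 1):
        for dx in range(-srange, srange + 1):
            r, c = top + dy, left + dx
            if r < 0 or c < 0 or r + bsize > ref.shape[0] or c + bsize > ref.shape[1]:
                continue
            sad = int(np.abs(block - ref[r:r + bsize, c:c + bsize].astype(np.int32)).sum())
            if best_sad is None or sad < best_sad:
                best_mv, best_sad = (dy, dx), sad
    return best_mv, best_sad

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
cur = np.roll(ref, (2, -3), axis=(0, 1))      # current frame is the reference shifted
print(motion_search(ref, cur, 24, 24))        # expect a motion vector of about (-2, 3)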

  19. Block Sizes The availability of smaller motion compensation blocks improves prediction. The small blocks improve the ability of the model to handle fine motion detail and result in better subjective viewing quality by not producing large blocking artifacts. IT 481, Fall 2006

  20. Tree Structure H.264 allows a combination of 4x8, 8x4, or 4x4 sub-blocks within an 8x8 sub-block as shown for a 16x16 macroblock. Zig-zag scan pattern IT 481, Fall 2006

  21. Motion Estimation Accuracy • The prediction capability of the motion compensation algorithm in H.264 is further improved by allowing motion vectors to be determined with higher levels of spatial accuracy than in existing standards. • Quarter-pixel accurate motion compensation is the lowest-accuracy form of motion compensation in H.264, • in contrast with prior standards based primarily on half-pixel accuracy, with quarter-pixel accuracy only available in the newest version of MPEG-4. • ¼-pixel spatial accuracy can yield as much as 20% in bit rate savings as compared to using integer-pixel spatial accuracy. IT 481, Fall 2006
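A one-dimensional sketch of how the sub-pixel samples are obtained: H.264 derives half-sample luma positions with a 6-tap filter (1, -5, 20, 20, -5, 1) and quarter-sample positions by averaging neighboring integer- and half-sample values. The helpers below illustrate the idea on a single pixel row and are a simplification, not the normative two-dimensional process.

import numpy as np

def half_pel(samples, i):
    """Half-sample value between samples[i] and samples[i+1] (needs two
    integer samples to the left of i and three to the right)."""
    s = samples.astype(np.int32)
    v = s[i-2] - 5*s[i-1] + 20*s[i] + 20*s[i+1] - 5*s[i+2] + s[i+3]
    return int(np.clip((v + 16) >> 5, 0, 255))

def quarter_pel(samples, i):
    """Quarter-sample value between samples[i] and the half-sample to its
    right, obtained by rounding the average of the two."""
    return (int(samples[i]) + half_pel(samples, i) + 1) >> 1

row = np.array([10, 20, 30, 40, 50, 60, 70, 80], dtype=np.uint8)
print(half_pel(row, 3), quarter_pel(row, 3))   # both fall between samples 40 and 50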

  22. Multiple Reference Picture Selection • The H.264 standard offers the option of having multiple reference frames in inter picture coding, • which results in better subjective video quality and more efficient coding of the video frame under consideration. • Multiple reference frames might also help make the H.264 bit stream error resilient. • There would be additional processing delays and higher memory requirements at both the encoder and decoder. • Using 5 reference frames for prediction can yield 5-10% in bit rate savings as compared to using only one reference frame. IT 481, Fall 2006

  23. De-Blocking (Loop) Filter • H.264 specifies the use of an adaptive de-blocking filter that operates on the horizontal and vertical block edges within the prediction loop • Removes artifacts caused by block prediction errors. The filtering is generally based on 4x4 block boundaries, in which two pixels on either side of the boundary may be updated using a different filter. • The rules for applying the de-blocking filter are intricate and quite complex, • but its use is optional for each slice (loosely defined as an integer number of macroblocks). • The de-blocking filter yields a substantial improvement in subjective quality, which often more than justifies the increase in complexity. IT 481, Fall 2006
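A deliberately simplified sketch of the filter's idea: a step across a block boundary is smoothed only when it is small enough to look like a quantization artifact rather than a real image edge. The thresholds (alpha, beta) and the averaging used here are illustrative stand-ins; the normative H.264 filter is considerably more elaborate.

import numpy as np

def filter_edge(line, edge, alpha=12, beta=4):
    """line: 1-D pixel row; edge: index of the first pixel to the right of a
    block boundary. Smooths one pixel on either side of the boundary in place."""
    p1, p0 = int(line[edge - 2]), int(line[edge - 1])   # pixels left of the edge
    q0, q1 = int(line[edge]), int(line[edge + 1])       # pixels right of the edge
    if abs(p0 - q0) < alpha and abs(p1 - p0) < beta and abs(q1 - q0) < beta:
        line[edge - 1] = (p1 + 2 * p0 + q0 + 2) // 4    # soften the step across the edge
        line[edge] = (p0 + 2 * q0 + q1 + 2) // 4
    return line

row = np.array([60, 61, 62, 70, 71, 72], dtype=np.int32)
print(filter_edge(row, 3))   # the 62 -> 70 step is smoothed toward 64 -> 68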

  24. IT 481, Fall 2006

  25. Integer Transform • Prediction error blocks resulting from either intra prediction or inter prediction are then transformed using a new integer DCT. • H.264 is unique in that it employs a purely integer spatial transform • An approximation of the DCT as opposed to the usual floating-point DCT specified with rounding-error tolerances as used in earlier standards. • H.264 allows the use of both 4x4 and 8x8 transform block sizes. • The small shape helps reduce blocking and ringing artifacts, • The precise integer specification eliminates any mismatch issues between the encoder and decoder in the inverse transform. IT 481, Fall 2006
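The 4x4 core of the integer transform can be written down exactly: the matrix below is the standard's forward 4x4 core transform, while the per-coefficient scaling that completes the DCT approximation is folded into the quantizer and omitted in this sketch.

import numpy as np

Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]], dtype=np.int32)

def core_transform(residual4x4):
    """Y = Cf . X . Cf^T, computable exactly in integer arithmetic, so the
    encoder and decoder cannot drift apart in the inverse transform."""
    X = np.asarray(residual4x4, dtype=np.int32)
    return Cf @ X @ Cf.T

res = np.array([[ 5,  3, -2,  0],
                [ 4,  2, -1,  1],
                [ 3,  1,  0, -1],
                [ 2,  0,  1, -2]])
print(core_transform(res))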

  26. Quantization & Transform Coefficient Scanning • The quantization step provides a significant portion of the data compression. • In H.264, the transform coefficients are quantized using scalar quantization. • Fifty-two different quantization step sizes can be chosen on a macroblock basis, far more than in prior standards (H.263 supports thirty-one, for example). • In H.264 the step sizes are increased at a compounding rate of approximately 12.5%, rather than by a constant increment. • The fidelity of chrominance components is improved by using finer quantization step sizes as compared to those used for the luminance. IT 481, Fall 2006
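A small illustration of the 52 step sizes, using the commonly quoted approximation Qstep ≈ 0.625 · 2^(QP/6): each step is roughly 12.5% larger than the previous one, so the step size doubles every 6 values of QP. The constant 0.625 is an approximation; the exact values come from the standard's scaling tables.

qsteps = [0.625 * 2 ** (qp / 6) for qp in range(52)]   # QP = 0 .. 51

print(round(qsteps[0], 3))                  # ~0.625 at QP 0
print(round(qsteps[6], 3))                  # ~1.25 at QP 6: doubled after 6 steps
print(round(qsteps[51], 0))                 # ~226 at QP 51
print(round(qsteps[1] / qsteps[0], 4))      # ~1.1225: the ~12.5% compounding per step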

  27. Entropy Coding • Entropy coding is based on assigning shorter code words to symbols with higher probabilities of occurrence, and longer codewords to symbols with less frequent occurrences. • Parameters to be entropy coded include: • Transform coefficients for the residual data, • Motion vectors and • Other encoder information. • Two types of entropy coding have been adopted: • Variable-Length coding (VLC) • Context-Based Adaptive Binary Arithmetic Coding (CABAC). IT 481, Fall 2006

  28. Variable Length Encoding • In some video coding standards, symbols and the associated codewords are organized in look-up tables, referred to as VLC tables, which are stored at both the encoder and decoder. • In H.263, a number of VLC tables are used, depending on the type of data under consideration (e.g., transform coefficients, motion vectors). • H.264 offers a single Universal VLC (UVLC) table that is to be used in entropy coding of all symbols in the encoder except for the transform coefficients. • This approach is simple, • but it has the disadvantage that a single table is usually derived using a static probability distribution model, which ignores the correlations between the encoder symbols. IT 481, Fall 2006
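The UVLC is structured so that smaller (more probable) code numbers receive shorter codewords: its codewords have the Exp-Golomb form of M leading zeros, a 1, and an M-bit suffix, which a few lines can generate without any stored table. The function name below is illustrative.

def exp_golomb_ue(code_num: int) -> str:
    """Unsigned Exp-Golomb codeword for a non-negative integer."""
    value = code_num + 1
    m = value.bit_length() - 1          # number of leading zeros in the prefix
    return "0" * m + bin(value)[2:]     # prefix zeros + '1' + m suffix bits

for n in range(6):
    print(n, exp_golomb_ue(n))
# 0 1
# 1 010
# 2 011
# 3 00100
# 4 00101
# 5 00110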

  29. VLC-Transform Coefficients • In H.264, the transform coefficients are coded using Context Adaptive Variable Length Coding (CAVLC). • CAVLC is designed to take advantage of several characteristics of quantized 4x4 blocks. • First, non-zero coefficients at the end of the zigzag scan are often equal to +/- 1. CAVLC encodes the number of these coefficients (“trailing 1s”) in a compact way. • Second, CAVLC employs run-level coding efficiently to represent the string of zeros in a quantized 4x4 block. • Moreover, the numbers of non-zero coefficients in neighboring blocks are usually correlated. • The number of non-zero coefficients is encoded using a look-up table that depends on the numbers of non-zero coefficients in neighboring blocks. • Finally, the magnitude (level) of non-zero coefficients gets larger near the DC coefficient and smaller around the high-frequency coefficients. • CAVLC takes advantage of this by making the choice of the VLC look-up table for the level adaptive, where the choice depends on the recently coded levels. IT 481, Fall 2006
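A sketch of the statistics CAVLC works from for one quantized 4x4 block: the coefficients are zig-zag scanned, then the total number of non-zero coefficients, the trailing ±1s, and the run of zeros before each coefficient are extracted. The context-dependent VLC tables that actually encode these values are omitted, and the function name is illustrative.

import numpy as np

ZIGZAG = [(0,0),(0,1),(1,0),(2,0),(1,1),(0,2),(0,3),(1,2),
          (2,1),(3,0),(3,1),(2,2),(1,3),(2,3),(3,2),(3,3)]

def cavlc_stats(block4x4):
    scan = [int(block4x4[r][c]) for r, c in ZIGZAG]
    levels = [v for v in scan if v != 0]          # non-zero levels, low to high frequency
    total_coeffs = len(levels)
    trailing_ones = 0                             # up to 3 trailing +/-1 coefficients
    for v in reversed(levels):
        if abs(v) == 1 and trailing_ones < 3:
            trailing_ones += 1
        else:
            break
    # run of zeros preceding each non-zero coefficient, scanned from high to low frequency
    runs, run = [], 0
    last = max(i for i, v in enumerate(scan) if v != 0) if levels else -1
    for v in reversed(scan[:last + 1]):
        if v == 0:
            run += 1
        else:
            runs.append(run)
            run = 0
    return total_coeffs, trailing_ones, levels, runs

block = np.array([[ 7,  6, -2,  0],
                  [ 3,  0,  1,  0],
                  [-1,  1,  0,  0],
                  [ 0,  0,  0,  0]])
print(cavlc_stats(block))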

  30. Context-Based Adaptive Binary Arithmetic Coding (CABAC) • Arithmetic coding makes use of a probability model at both the encoder and decoder for all the syntax elements • Transform coefficients • Motion vectors. • A process called context modeling increases the coding efficiency of arithmetic coding: • the underlying probability model is adapted to the changing statistics within a video frame. • Context modeling provides estimates of conditional probabilities of the coding symbols. • Suitable context models allow inter-symbol redundancy to be exploited • by switching between different probability models according to already coded symbols in the neighborhood of the current symbol to encode. IT 481, Fall 2006

  31. Context Models • Different models are often maintained for each syntax element (e.g., motion vectors and transform coefficients have different models). • If a given symbol is non-binary valued, it will be mapped onto a sequence of binary decisions, so-called bins. • The actual binarization is done according to a given binary tree – • The UVLC binary tree is often used. • Each binary decision is then encoded with the arithmetic encoder using the new probability estimates, which have been updated during the previous context modeling stage. • After encoding of each bin, we adjust upward the probability estimate for the binary symbol that was just encoded. • Hence, the model keeps track of the actual statistics IT 481, Fall 2006
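A toy context model in the spirit of the bullets above: each context keeps a running probability estimate for a binary symbol and is adjusted after every bin it codes, so it tracks the actual statistics. The real CABAC engine uses a finite-state probability table and a binary arithmetic coder, both omitted here; the class below is purely illustrative.

class ContextModel:
    def __init__(self):
        self.count = [1, 1]                 # Laplace-smoothed counts of 0s and 1s seen so far

    def p_one(self) -> float:
        """Current estimate of P(bin == 1) for this context."""
        return self.count[1] / sum(self.count)

    def update(self, bin_value: int) -> None:
        """Adjust the estimate upward for the symbol that was just coded."""
        self.count[bin_value] += 1

# One model would be kept per syntax element / neighborhood; here a single toy
# context observes a skewed bin stream and its estimate drifts toward the truth.
ctx = ContextModel()
for b in [1, 1, 0, 1, 1, 1, 0, 1]:
    ctx.update(b)
print(round(ctx.p_one(), 2))   # ~0.7 after seeing mostly 1s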

  32. Example IT 481, Fall 2006

  33. Reference • “Emerging H.264 Standard: Overview and TMS320C64x Digital Media Platform Implementation” UB Video Inc White Paper, 2002 IT 481, Fall 2006
