Context-based adaptive binary arithmetic coding in the H.264/AVC video compression

CABAC: Context-based adaptive binary arithmetic coding in the H.264/AVC video compression standard
Detlev Marpe, Heiko Schwarz, and Thomas Wiegand, IEEE CSVT, July 2003
Presented by Chen-hsiu Huang, 2003/11/04

Outline
• Introduction
• The CABAC framework
• Detailed description of CABAC
• Experimental results
• Conclusion

Past deficiencies
• Entropy coding in MPEG-2, H.263, and MPEG-4 (SP) is based on fixed tables of variable-length codes (VLCs).
• With VLCs, coding events with probability > 0.5 cannot be represented efficiently, since every codeword is at least one bit long. (See Q1.)
• The use of fixed VLC tables does not allow adaptation to the actual symbol statistics.
• Since the assignment of VLC tables to syntax elements is fixed, existing inter-symbol redundancies cannot be exploited.

Solutions
• The first hybrid block-based video coding scheme to incorporate an adaptive binary arithmetic coder was presented in [6].
• The first standard to use an arithmetic entropy coder is Annex E of H.263 [4].
• However, Annex E has major drawbacks:
  • It is applied to the same syntax elements as the VLC method of H.263.
  • All probability models are non-adaptive; their underlying probability distributions are assumed to be static.
  • The generic m-ary arithmetic coder used involves a considerable amount of computational complexity.

The CABAC Framework
• Three stages: binarization → context modeling → binary arithmetic coding (Figure 1).
• In the regular coding mode, each bin enters the context modeling stage, where a probability model is selected; the choice may depend on previously encoded syntax elements or bins.
• The bypass coding mode is chosen for selected bins to speed up the whole encoding/decoding process by means of a simplified coding engine that does not use an explicitly assigned model.

Binarization
• Consider the value “3” of mb_type, which signals the macroblock type “P_8x8” and is binarized as “001”.
• The symbol probability p(“3”) is equal to the product of p^(C0)(“0”), p^(C1)(“0”), and p^(C2)(“1”), where C0, C1, and C2 denote the binary probability models of the internal nodes (Figure 2).

• Adaptive m-ary arithmetic coding (m > 2) in general requires at least two multiplications for each symbol to encode, as well as a number of fairly complex operations to perform the probability update [36]. (See Q2.)
• In contrast, there are fast, multiplication-free variants of binary arithmetic coding, one of which was specifically developed for the CABAC framework, as described below.
• Since the probability of symbols with longer bin strings is typically very low, the computational overhead is fairly small and can easily be compensated by using a fast binary coding engine.
• Finally, binarization enables context modeling on a sub-symbol level. For the most frequently observed bins, conditional probabilities can be used, while less frequently observed bins can be treated using a joint, typically zero-order probability model.

Binarization Schemes
• A binary representation for a given non-binary valued syntax element should be close to a minimum-redundancy code.
• Instead of Huffman trees, the design of CABAC (mostly) relies on a few basic code trees whose structure enables a simple on-line computation of all code words without the need for storing any tables:
  • Unary code (U) and truncated unary code (TU)
  • The kth order Exp-Golomb code (EGk)
  • The fixed-length code (FL)
• In these schemes, longer codewords are assigned to less probable values.
• In addition, there are binarization schemes based on a concatenation of these elementary types.
• As an exception, there are five specific binary trees selected manually for the coding of macroblock and sub-macroblock types. Two of them are shown in Figure 2.

Unary and Truncated Unary Binarization
• For each unsigned integer valued symbol x >= 0, the unary code word in CABAC consists of x “1” bits plus a terminating “0” bit.
• The truncated unary (TU) code is only defined for x with 0 <= x <= S, where for x < S the code is given by the unary code, whereas for x = S the terminating “0” bit is omitted.
• For example:
  • U: 5 => 111110
  • TU with S=9: 6 => 1111110, 9 => 111111111
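The two rules on this slide can be sketched in a few lines of Python. This is only an illustration of the definitions above; the function names `unary` and `truncated_unary` are chosen here and do not come from the paper or the standard.

```python
def unary(x: int) -> str:
    """Unary bin string: x ones followed by a terminating zero."""
    if x < 0:
        raise ValueError("x must be non-negative")
    return "1" * x + "0"


def truncated_unary(x: int, s: int) -> str:
    """Truncated unary bin string with cutoff s.

    Identical to the unary code for x < s; for x == s the
    terminating zero is dropped.
    """
    if not 0 <= x <= s:
        raise ValueError("x must satisfy 0 <= x <= s")
    return "1" * x if x == s else unary(x)


print(unary(5))               # 111110
print(truncated_unary(6, 9))  # 1111110
print(truncated_unary(9, 9))  # 111111111
```

The printed strings reproduce the three examples given on the slide.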

kth order Exp-Golomb Binarization
• The prefix part of the EGk codeword consists of a unary code corresponding to the value $l(x) = \lfloor \log_2(x / 2^k + 1) \rfloor$.
• The EGk suffix part is computed as the binary representation of $x + 2^k (1 - 2^{l(x)})$ using $k + l(x)$ significant bits.
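As a rough sketch, the EGk bin string can be built iteratively instead of evaluating the logarithm: every "1" emitted in the loop below raises l(x) by one, and the remaining value is exactly the suffix described above. The function name `exp_golomb` is an assumption made for this example.

```python
def exp_golomb(x: int, k: int) -> str:
    """kth order Exp-Golomb bin string of a non-negative integer x."""
    if x < 0 or k < 0:
        raise ValueError("x and k must be non-negative")
    bins = []
    # Unary prefix: one "1" per doubling of the covered value range.
    while x >= (1 << k):
        bins.append("1")
        x -= 1 << k
        k += 1
    bins.append("0")  # terminating bit of the unary prefix
    # Suffix: the remaining value written with k bits, MSB first.
    if k:
        bins.append(format(x, f"0{k}b"))
    return "".join(bins)


for value in range(4):
    print(value, exp_golomb(value, 0))
```

For k = 0 this gives "0", "100", "101", "11000" for the values 0 to 3, which agrees with the prefix/suffix formulas: for x = 1, l(x) = 1, so the prefix is "10" and the suffix is the value 0 written with one bit.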

Fixed-Length Binarization
• Let x denote a given value of such a syntax element, where 0 <= x < S. Then the FL codeword of x is simply given by the binary representation of x with a fixed (minimum) number $l_{FL} = \lceil \log_2 S \rceil$ of bits.
• Typically, FL binarization is applied to syntax elements with a nearly uniform distribution, or to syntax elements where each bit in the FL binary representation represents a specific coding decision.
• E.g., the part of the coded block pattern symbol related to the luminance residual data.

Concatenation schemes
• Three binarization schemes are derived by concatenation:
  • The first concatenates a 4-bit FL prefix, representing the luminance-related part of the coded block pattern, with a TU suffix with S=2, representing the chrominance-related part of coded_block_pattern.
  • The second and third are both derived from the TU and EGk binarizations. They are referred to as Unary/kth order Exp-Golomb (UEGk) binarizations and are applied to motion vector differences and absolute values of transform coefficient levels.

The design of these concatenated binarization schemes is motivated by the following observations:
• First, the unary code is the simplest prefix-free code in terms of implementation cost.
• Second, it permits a fast adaptation of the individual symbol probabilities in the subsequent context modeling stage, since the arrangement of the nodes in the corresponding tree is typically such that, with increasing distance of the internal nodes from the root node, the corresponding binary probabilities are less skewed.
• These observations are only accurate for small values of the absolute motion vector differences and transform coefficient levels. For larger values, adaptive modeling is of little use, which leads to the idea of concatenating an adaptively coded TU prefix with a non-adaptive Exp-Golomb suffix.

Example: mvd, motion vector difference
• For the prefix part of the UEGk bin string, a TU binarization with a cutoff S=9 is invoked for min(|mvd|, 9).
• If mvd is equal to zero, the bin string consists only of the prefix codeword “0”.
• If the condition |mvd| >= 9 holds, the suffix is constructed as an EG3 codeword for the value of |mvd| - 9, to which the sign of mvd is appended using the sign bit “1” for a negative mvd and “0” otherwise.
• For mvd values with 0 < |mvd| < 9, the suffix consists only of the sign bit.
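The rules on this slide can be put together in a short sketch. It only follows the description above (TU prefix with cutoff 9, EG3 suffix, trailing sign bit); the names `exp_golomb` and `ueg3_mvd` are hypothetical, and the exact bin ordering in the standard should be checked against [1].

```python
def exp_golomb(x: int, k: int) -> str:
    """kth order Exp-Golomb bin string of a non-negative integer x."""
    bins = []
    while x >= (1 << k):
        bins.append("1")
        x -= 1 << k
        k += 1
    bins.append("0")
    if k:
        bins.append(format(x, f"0{k}b"))
    return "".join(bins)


def ueg3_mvd(mvd: int, cutoff: int = 9, k: int = 3) -> str:
    """UEG3 bin string of a motion vector difference, as described on the slide."""
    magnitude = abs(mvd)
    prefix_value = min(magnitude, cutoff)
    # TU prefix: the terminating "0" is dropped when the cutoff is reached.
    prefix = "1" * prefix_value + ("" if prefix_value == cutoff else "0")
    if magnitude == 0:
        return prefix  # a zero mvd has no suffix and no sign bit
    # EG3 suffix only for magnitudes at or above the cutoff.
    suffix = exp_golomb(magnitude - cutoff, k) if magnitude >= cutoff else ""
    sign = "1" if mvd < 0 else "0"
    return prefix + suffix + sign


for mvd in (0, 3, -3, 9, -10):
    print(mvd, ueg3_mvd(mvd))
```

Small magnitudes stay short (mvd = 3 gives "11100"), while values at or beyond the cutoff pay nine prefix bins plus an Exp-Golomb suffix whose length grows only logarithmically.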

• With the choice of the Exp-Golomb parameter k=3, the suffix code words are such that a geometrical increase of the prediction error in units of two samples is captured by a linear increase in the corresponding suffix code word length.
Figure 3. UEG0 binarization for encoding of absolute values of transform coefficient levels.

Context modeling
• Suppose a pre-defined set T of past symbols, a so-called context template, and a related set C = {0, …, C-1} of contexts are given, where the contexts are specified by a modeling function F: T → C operating on the template T.
• For each symbol x to be coded, a conditional probability p(x|F(z)) is estimated by switching between different probability models according to the already coded neighboring symbols z in T. Thus, p(x|F(z)) is estimated on the fly by tracking the actual source statistics.
• The number τ of different conditional probabilities to be estimated for an alphabet size of m is τ = C·(m-1).
• This implies that, as the number C of contexts increases, there is a point where overfitting of the model may occur.

• First, in CABAC only very limited context templates T, consisting of a few neighbors of the current symbol to encode, are employed, so that only a small number of different context models C is used.
• Second, context modeling is restricted to selected bins of the binarized symbols. As a result, the model cost is drastically reduced.
• Four basic design types of context models can be distinguished in CABAC. The first type involves a context template with up to two neighboring syntax elements in the past of the current syntax element to encode.
Figure 4. Illustration of a context template consisting of two neighboring syntax elements A and B to the left and on top of the current syntax element C.

Types of context modeling
• The second type of context models is only defined for the syntax elements mb_type and sub_mb_type.
• For this kind of context model, the values of prior coded bins (b0, b1, ..., bi-1) are used for the choice of a model for a given bin with index i. Note that in CABAC these context models are only used to select different models for different internal nodes of the corresponding binary trees (Figure 2).

• Both the third and fourth types of context models are applied to residual data only. In contrast to all other types of context models, both types depend on the context categories of different block types.
• The third type does not rely on past coded data, but on the position in the scanning path.
  • Used for the significance map.
• For the fourth type, modeling functions are specified that involve the evaluation of the accumulated number of encoded/decoded levels with a specific value prior to the current level bin to encode/decode.
  • Used for the level information.

Context index γ
• The entirety of probability models used in CABAC can be arranged in a linear fashion, indexed by a so-called context index γ.
• Each probability model related to a given context index γ is determined by two values: a 6-bit probability state index $\sigma_\gamma$ and the (binary) value $\beta_\gamma$ of the most probable symbol (MPS).
• Each pair $(\sigma_\gamma, \beta_\gamma)$ for 0 ≤ γ ≤ 398 can be represented as a 7-bit unsigned integer.
Figure 5. Syntax elements and associated range of context indices.

• The context indices in the range from 0 to 72 are related to syntax elements of macroblock type, sub-macroblock type, and prediction modes of spatial and temporal type, as well as slice-based and macroblock-based control information.
• For this type, a corresponding context index γ can be calculated as $\gamma = \Gamma_S + \chi_S$. Here $\Gamma_S$ denotes the context index offset, which is defined as the lower value of the range given in Figure 5, and $\chi_S$ denotes the context index increment of a given syntax element S.
• Context indices from 73 to 398 are related to the coding of residual data.
• The range values in the lower row of the corresponding syntax elements in Figure 5 specify the context indices for field-based coding mode. In pure frame coding, only 277 of the total 399 probability models are actually used.

• For other syntax elements of residual data, a context index γ is given by $\gamma = \Gamma_S + \Delta_S(\text{ctx\_cat}) + \chi_S$. Here the offset $\Delta_S$, which depends on the context category (ctx_cat), is employed (Figure 6).
• Note that only past coded values of syntax elements that belong to the same slice as the current coding process are evaluated.
Figure 6. Basic types with number of coefficients and associated context categories.

Binary arithmetic coding
• Binary arithmetic coding is based on the principle of recursive interval subdivision.
• Suppose that an estimate of the probability $p_{LPS} \in (0, 0.5]$ of the least probable symbol (LPS) is given, and that the current interval is represented by its lower bound L and its width R. Based on this, the given interval is subdivided into two sub-intervals: one of width $R_{LPS} = R \cdot p_{LPS}$ (3), and the dual interval of width $R_{MPS} = R - R_{LPS}$.
• In a practical implementation, the main bottleneck in terms of throughput is the multiplication operation required in (3).
• A significant amount of work has been published aimed at speeding up the required calculation by introducing approximations of either the range R or the probability $p_{LPS}$, such that the multiplication can be avoided [32]-[34].
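To make the recursive subdivision concrete, here is a minimal sketch using exact fractions rather than the integer registers of a real coder. The assignment of the MPS to the lower sub-interval and the LPS to the upper one is an assumption of this example, as is the function name `subdivide`; the probability is held fixed instead of being adapted.

```python
from fractions import Fraction


def subdivide(low: Fraction, width: Fraction, p_lps: Fraction, is_lps: bool):
    """One step of interval subdivision; returns the new (low, width)."""
    r_lps = width * p_lps   # R_LPS = R * p_LPS, the multiplication in (3)
    r_mps = width - r_lps   # R_MPS = R - R_LPS
    if is_lps:
        # Assumed layout: the LPS takes the upper sub-interval.
        return low + r_mps, r_lps
    return low, r_mps


low, width = Fraction(0), Fraction(1)
for is_lps in (False, False, True):  # encode MPS, MPS, LPS
    low, width = subdivide(low, width, Fraction(1, 4), is_lps)

print(low, width)  # 27/64 9/64
```

After three bins the interval width equals the product of the symbol probabilities (3/4 · 3/4 · 1/4 = 9/64), which is why roughly -log2(9/64) ≈ 2.8 bits suffice to identify a point inside it.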

• The Q coder [32] and the QM and MQ coders [35] each have their inefficiencies. An alternative multiplication-free coder, called the modulo coder (M coder), was therefore designed; it has been shown to provide a higher throughput than the MQ coder [36].
• The basic idea of the M coder is to project both the legal range $[R_{min}, R_{max})$ of the interval width R and the probability range associated with the LPS onto small sets of representative values $Q = \{Q_0, \ldots, Q_{K-1}\}$ and $P = \{p_0, \ldots, p_{N-1}\}$. Thus the multiplication on the right-hand side of (3) can be approximated by using a table of K×N pre-computed values.
• A reasonable size of the corresponding table and a sufficiently good approximation were found by using a set Q of K=4 quantized range values together with a set P of N=64 LPS-related probability values.
• Another distinct feature in H.264/AVC, as already mentioned above, is its simplified bypass coding mode for bins that are assumed to be uniformly distributed.

Details of CABAC
• The syntax elements are divided into two categories.
  • The first contains elements related to macroblock type, sub-macroblock type, and information on prediction modes of both spatial and temporal type, as well as slice-based and macroblock-based control information.
  • The second combines all residual data elements, i.e., all syntax elements related to the coding of transform coefficients.
• In addition, a more detailed explanation of the probability estimation process and the table-based binary arithmetic coding engine of CABAC is given.

Coding of macroblock type, prediction mode, and control information
• At the top level of the macroblock layer syntax, the signaling of mb_skip_flag and mb_type is performed. The binary-valued mb_skip_flag indicates whether the current macroblock in a P/SP or B slice is skipped.
• For a given macroblock C, the related context models involve the mb_skip_flag values of the neighboring macroblocks A to the left and B on top, given by:
  • χMbSkip(C) = ((mb_skip_flag(A) != 0) ? 0 : 1) + ((mb_skip_flag(B) != 0) ? 0 : 1)
• If one or both of the neighboring macroblocks A or B are not available, the corresponding mb_skip_flag value is set to 0.
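The context index increment for mb_skip_flag can be written as a small function. This sketch follows the formula on the slide literally, with each conditional term parenthesized so that the two terms are evaluated separately and then added; the function name is hypothetical, and the handling of unavailable neighbors is left to the caller.

```python
def ctx_inc_mb_skip(skip_flag_a: int, skip_flag_b: int) -> int:
    """Context index increment for mb_skip_flag of macroblock C.

    skip_flag_a / skip_flag_b are the mb_skip_flag values of the left (A)
    and top (B) neighbors. Each non-skipped neighbor contributes 1.
    """
    term_a = 0 if skip_flag_a != 0 else 1
    term_b = 0 if skip_flag_b != 0 else 1
    return term_a + term_b


for a in (0, 1):
    for b in (0, 1):
        print(a, b, ctx_inc_mb_skip(a, b))
```

The increment takes one of three values (0, 1, or 2), so three probability models are enough for this syntax element.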

Macroblock type
• As already stated above, Figure 2 shows the binarization trees for mb_type and sub_mb_type that are used in P/SP slices.
• Note that the mb_type value of “4” for P slices is not used in CABAC entropy coding mode. The binarization of the mb_type values “5”-“30” is further specified in [1].
• For coding a bin value corresponding to the binary decision at an internal node shown in Figure 2, separate context models, denoted by C0, ..., C3 for mb_type and C’0, ..., C’3 for sub_mb_type, are employed.

Coding of prediction modes
• Intra prediction modes for luma 4x4: the luminance intra prediction modes for 4x4 blocks are themselves predicted, resulting in the binary-valued syntax element prev_intra4x4_pred_mode_flag and the syntax element rem_intra4x4_pred_mode, where the latter is only present if the former takes a value of 0.
• For coding these syntax elements, two separate probability models are utilized: one for coding the flag and another for coding each bin value of the 3-bit FL binarized value of rem_intra4x4_pred_mode.

• Intra prediction modes for chroma:
  • χChPred(C) = ((ChPredInDcMode(A) != 0) ? 0 : 1) + ((ChPredInDcMode(B) != 0) ? 0 : 1)
• Reference picture index:
  • χRefIdx(C) = ((RefIdxZeroFlag(A) != 0) ? 0 : 1) + 2 × ((RefIdxZeroFlag(B) != 0) ? 0 : 1)
• Components of motion vector differences:
  • mvd(X, cmp) denotes the value of a motion vector difference component of direction cmp in {hori, vert} related to a macroblock or sub-macroblock partition X.

• Macroblock-based quantization parameter change:
  • For updating the quantization parameter on a macroblock level, mb_qp_delta is present for each non-skipped macroblock. For coding the signed value δ(C) of this syntax element, δ(C) is first mapped onto a non-negative value by
  • δ+(C) = 2·|δ(C)| - ((δ(C) > 0) ? 1 : 0)
  • Then δ+(C) is binarized using the unary binarization scheme.
• End of slice flag:
  • For signaling the last macroblock (or macroblock pair) in a slice, the end_of_slice_flag is present for each macroblock (pair).
  • The event of a non-terminating macroblock is associated with the highest possible MPS probability.
• Macroblock pair field flag:
  • χMbField(C) = mb_field_decoding_flag(A) + mb_field_decoding_flag(B)
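The signed-to-unsigned mapping for mb_qp_delta followed by unary binarization can be sketched as follows. The function names are made up for this illustration; only the mapping formula and the unary rule are taken from the slides.

```python
def map_qp_delta(delta: int) -> int:
    """Map a signed mb_qp_delta onto a non-negative integer.

    Positive values map to odd numbers, zero and negative values to even ones.
    """
    return 2 * abs(delta) - (1 if delta > 0 else 0)


def unary(x: int) -> str:
    """Unary bin string: x ones followed by a terminating zero."""
    return "1" * x + "0"


for delta in (0, 1, -1, 2, -2):
    mapped = map_qp_delta(delta)
    print(delta, mapped, unary(mapped))
```

The mapping interleaves the signs (0, 1, -1, 2, -2 become 0, 1, 2, 3, 4), so small changes of either sign get short unary bin strings.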

Coding of residual data
• A one-bit symbol coded_block_flag and a binary-valued significance map are used to indicate the occurrence and the location of nonzero transform coefficients in a given block.
• Nonzero levels are encoded in reverse scanning order.
• Context models for coding of nonzero transform coefficients are chosen based on the number of previously transmitted nonzero levels within the reverse scanning path.

• First, the coded block flag is transmitted for the given block of transform coefficients, unless the coded block pattern or the macroblock mode indicates that the regarded block has no nonzero coefficients.
• If the coded block flag is nonzero, a significance map specifying the positions of significant coefficients is encoded.
• Finally, the absolute value of the level as well as the sign is encoded for each significant transform coefficient (Figure 7).

Encoding process of residual data
• Coded block pattern: For each non-skipped macroblock with prediction mode not equal to intra_16x16, the coded_block_pattern symbol indicates which of the six 8x8 blocks (four luminance and two chrominance) contain nonzero transform coefficients. A given value of coded_block_pattern is binarized using the concatenation of a 4-bit FL and a TU binarization with cutoff value S=2.
• Coded block flag: a one-bit symbol that indicates whether there are significant, i.e., nonzero, coefficients inside a single block of transform coefficients.
• Scanning of transform coefficients: the 2-D arrays of transform coefficient levels of those sub-blocks for which the coded_block_flag indicates nonzero entries are first mapped onto a 1-D list using a given scanning pattern.

• Significance map: If the significant_coeff_flag symbol is one, a further one-bit symbol last_significant_coeff_flag is sent. This symbol indicates whether the current significant coefficient is the last one inside the block or whether further significant coefficients follow.
• Level information: The values of significant coefficients (levels) are encoded using two coding symbols: coeff_abs_level_minus1 and coeff_sign_flag. The UEG0 binarization scheme is used for encoding coeff_abs_level_minus1.
• The levels are transmitted in reverse scanning order, which allows the use of reasonably adjusted context models.

Context models for residual data
• In H.264/AVC, there are 12 types of transform coefficient blocks, which typically have different kinds of statistics. To keep the number of different context models small, they are classified into five categories as in Figure 6.
• For each of these categories, a special set of context models is used for all syntax elements related to residual data.
• coded_block_pattern: For bin indices from 0 to 3, corresponding to the four 8x8 luminance blocks,
  • χCBP(C, bin_idx) = ((CBP_Bit(A) != 0) ? 0 : 1) + 2 × ((CBP_Bit(B) != 0) ? 0 : 1)
• For bin indices 4 and 5, the context index increments are specified in [1].

• Coded block flag: Coding of the coded_block_flag utilizes four different probability models for each of the five categories specified in Figure 6.
  • χCBFlag(C) = coded_block_flag(A) + 2 × coded_block_flag(B)
• Significance map: For encoding the significance map, up to 15 different probability models are used for both significant_coeff_flag and last_significant_coeff_flag.
  • The choice of the models and the context index increments depend on the scanning position:
  • χSIG(coeff[i]) = χLAST(coeff[i]) = i
• Level information: Reverse scanning of the level information allows a more reliable estimation of the statistics, because at the end of the scanning path it is very likely to observe successive so-called trailing 1’s.

Probability estimation
• For CABAC, 64 representative probability values $p_\sigma \in [0.01875, 0.5]$ were derived for the LPS by:
  • $p_\sigma = \alpha \cdot p_{\sigma-1}$ for all $\sigma = 1, \ldots, 63$
  • with $\alpha = (0.01875 / 0.5)^{1/63}$ and $p_0 = 0.5$
Figure 8. LPS probability values and transition rules for updating the probability estimation of each state after observing an LPS (dashed lines in left direction) and an MPS (solid lines in right direction).
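The recursion above is easy to evaluate numerically. The sketch below uses floating point for illustration only; constant and function names are chosen for this example, and a real codec stores the resulting behavior in integer tables rather than computing it at run time.

```python
NUM_STATES = 64
P_MAX = 0.5      # p_0, LPS probability of state 0
P_MIN = 0.01875  # p_63, LPS probability of state 63

# Scaling factor chosen so that 63 steps lead from P_MAX to P_MIN.
ALPHA = (P_MIN / P_MAX) ** (1.0 / (NUM_STATES - 1))


def lps_probabilities() -> list:
    """Return the 64 representative LPS probabilities p_0 .. p_63."""
    probs = [P_MAX]
    for _ in range(1, NUM_STATES):
        probs.append(ALPHA * probs[-1])  # p_sigma = alpha * p_(sigma-1)
    return probs


probs = lps_probabilities()
print(round(ALPHA, 4), probs[0], round(probs[-1], 5))
```

The scaling factor comes out at roughly 0.949, so each step to the next state lowers the LPS probability estimate by about 5%.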

• Both the chosen scaling factor α ≈ 0.95 and the cardinality N=64 of the set of probabilities represent a good compromise between the desire for fast adaptation (α → 0, small N) and a sufficiently stable and accurate estimate (α → 1, large N).
• As a result of this design, each context model in CABAC can be completely determined by two parameters: its current estimate of the LPS probability and its value of the MPS β, being either 0 or 1.
• For a given probability state, the update depends on the state index and on the value of the encoded symbol, identified either as an LPS or an MPS.
• The derivation of the transition rules for the LPS probability is based on the following relation between a given LPS probability $p_{old}$ and its updated counterpart $p_{new}$:
  • $p_{new} = \max(\alpha \cdot p_{old},\, p_{62})$ if an MPS occurs
  • $p_{new} = \alpha \cdot p_{old} + (1 - \alpha)$ if an LPS occurs

Table-based binary arithmetic coding
• The CABAC coding engine consists of two sub-engines, one for the regular coding mode and the other for the bypass coding mode.
• Interval subdivision in regular coding mode: The internal state of the arithmetic encoding engine is, as usual, characterized by two quantities: the current interval range R and the base (lower endpoint) L of the current code interval (Figure 9).

• First, the current interval range R is approximated by a quantized value Q(R), using an equi-partition of the whole range $2^8 \le R < 2^9$ into four cells. Instead of using the corresponding representative quantized values $Q_0$, $Q_1$, $Q_2$, and $Q_3$ directly, Q(R) is only addressed by its quantizer index ρ, i.e., ρ = (R >> 6) & 3.
• This index and the probability state index σ are then used as entries in a 2-D table TabRangeLPS to determine the (approximate) LPS-related sub-interval range $R_{LPS}$. The table TabRangeLPS contains all 64×4 pre-computed product values $p_\sigma \cdot Q_\rho$ for 0 ≤ σ ≤ 63 and 0 ≤ ρ ≤ 3 in 8-bit precision.
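The lookup can be illustrated with a table generated on the fly. Note the assumptions: the table built here is NOT the normative TabRangeLPS from the standard, the representative values `Q_REP` are simply taken as the midpoints of the four cells, and the function names are hypothetical. Only the index computation ρ = (R >> 6) & 3 and the table shape are taken from the slide.

```python
NUM_STATES = 64
ALPHA = (0.01875 / 0.5) ** (1.0 / 63)
P_LPS = [0.5 * ALPHA ** s for s in range(NUM_STATES)]

# Assumed representatives: midpoints of the four cells of [256, 512).
Q_REP = [288, 352, 416, 480]

# Illustrative 64x4 table of rounded products p_sigma * Q_rho.
TAB_RANGE_LPS = [[round(p * q) for q in Q_REP] for p in P_LPS]


def quantizer_index(r: int) -> int:
    """Quantizer index rho of a range value R with 2^8 <= R < 2^9."""
    if not 256 <= r < 512:
        raise ValueError("R must lie in [256, 512)")
    return (r >> 6) & 3


def lps_range(r: int, sigma: int) -> int:
    """Approximate R_LPS by table lookup instead of a multiplication."""
    return TAB_RANGE_LPS[sigma][quantizer_index(r)]


print(quantizer_index(256), quantizer_index(400), quantizer_index(511))
print(lps_range(400, 0))
```

With these assumed representatives every table entry fits into 8 bits (the largest is 0.5 × 480 = 240), which matches the precision stated on the slide.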

• Bypass coding mode: The regular coding process is simplified to speed up the encoding/decoding of symbols for which $R - R_{LPS} \approx R_{LPS} \approx R/2$ is assumed to hold.
• Instead of halving R explicitly, the variable L is doubled before choosing the lower or upper sub-interval, depending on the value of the symbol to encode (0 or 1).
• In this way, the doubling of L and R in the subsequent renormalization is avoided; instead, renormalization in bypass mode operates with a doubled decision threshold (Figure 10).

• Renormalization and carry-over control: A renormalization operation after interval subdivision is required whenever the new interval range R no longer stays within its legal range of $[2^8, 2^9)$.
• For the CABAC engine, the renormalization process and carry-over control of [37] was adopted.
• This implies, in particular, that the encoder has to resolve any carry propagation by monitoring the bits that are outstanding for being emitted.
• More details can be found in [1].

Experimental results
• The experiments compare the coding efficiency of CABAC to that of the baseline entropy coding method of H.264/AVC. The baseline entropy coding method uses the zero-order Exp-Golomb code for all syntax elements with the exception of the residual data, which are coded using CAVLC [1], [2].
• For the range of acceptable video quality for broadcast applications of about 30–38 dB, and averaged over all tested sequences, bit-rate savings of 9% to 14% are achieved, where higher gains are obtained at lower rates.

References
• [1] “Draft ITU-T Recommendation H.264 and Draft ISO/IEC 14 496-10 AVC,” in Joint Video Team of ISO/IEC JTC1/SC29/WG11 & ITU-T SG16/Q.6 Doc. JVT-G050, T. Wiegand, Ed., Pattaya, Thailand, Mar. 2003.
• [2] T. Wiegand, G. J. Sullivan, G. Bjontegaard, and A. Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Trans. Circuits Syst. Video Technol., vol. 13, pp. 560–576, July 2003.
• [4] “Video Coding for Low Bitrate Communications, Version 1,” ITU-T, ITU-T Recommendation H.263, 1995.
• [6] C. A. Gonzales, “DCT Coding of Motion Sequences Including Arithmetic Coder,” ISO/IEC JTC1/SC2/WP8, MPEG 89/187, 1989.
• [32] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon, and R. B. Arps, “An overview of the basic principles of the Q-coder adaptive binary arithmetic coder,” IBM J. Res. Dev., vol. 32, pp. 717–726, 1988.
• [33] J. Rissanen and K. M. Mohiuddin, “A multiplication-free multialphabet arithmetic code,” IEEE Trans. Commun., vol. 37, pp. 93–98, Feb. 1989.
• [34] P. G. Howard and J. S. Vitter, “Practical implementations of arithmetic coding,” in Image and Text Compression, J. A. Storer, Ed. Boston, MA: Kluwer, 1992, pp. 85–112.
• [36] D. Marpe and T. Wiegand, “A highly efficient multiplication-free binary arithmetic coder and its application in video coding,” presented at the IEEE Int. Conf. Image Proc. (ICIP), Barcelona, Spain, Sept. 2003.

Q1.
• The problem with this scheme lies in the fact that Huffman codes have to be an integral number of bits long.
• The optimal number of bits to be used for each symbol is log2(1/p), where p is the probability of a given character.
• Thus, if the probability of a character is 1/256, as would be found in a random byte stream, the optimal number of bits per character is log base 2 of 256, or 8.
• If the probability goes up to 1/2, the optimal number of bits needed to code the character goes down to 1.
• If a statistical method can be developed that assigns a 90% (> 0.5) probability to a given character, the optimal code size would be 0.15 bits. The Huffman coding system would probably assign a 1-bit code to the symbol, which is about 6 times longer than necessary.
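The numbers on this slide follow directly from the formula log2(1/p). A quick check, with a function name chosen for this example:

```python
import math


def optimal_bits(p: float) -> float:
    """Ideal code length in bits for a symbol of probability p."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return math.log2(1.0 / p)


for p in (1 / 256, 1 / 2, 0.9):
    print(p, round(optimal_bits(p), 3))

# Overhead of a 1-bit Huffman codeword for a symbol with p = 0.9.
print(round(1.0 / optimal_bits(0.9), 2))
```

For p = 0.9 the ideal length is about 0.152 bits, so a one-bit codeword is between 6 and 7 times longer than necessary; this is the gap that arithmetic coding closes.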

Q2.
• For each symbol to encode, the upper bound u and the lower bound l of the interval containing the tag for the sequence must be computed.

H.264 / MPEG-4 Part 10: Introduction to CABAC
• When entropy_coding_mode is set to 1, an arithmetic coding system is used to encode and decode H.264 syntax elements.
• The arithmetic coding scheme selected for H.264, Context-based Adaptive Binary Arithmetic Coding or CABAC, achieves good compression performance through
  • selecting probability models for each syntax element according to the element’s context,
  • adapting probability estimates based on local statistics, and
  • using arithmetic coding.

Coding stages
• Binarization
  • CABAC uses binary arithmetic coding, which means that only binary decisions (1 or 0) are encoded.
  • A non-binary-valued symbol (e.g. a transform coefficient or motion vector) is "binarized", or converted into a binary code, prior to arithmetic coding.
  • This process is similar to the process of converting a data symbol into a variable-length code, but the binary code is further encoded (by the arithmetic coder) prior to transmission.

• Context model selection
  • A "context model" is a probability model for one or more bins of the binarized symbol.
  • This model may be chosen from a selection of available models depending on the statistics of recently coded data symbols.
  • The context model stores the probability of each bin being "1" or "0".
• Arithmetic encoding
  • An arithmetic coder encodes each bin according to the selected probability model.
  • Note that there are just two sub-ranges for each bin (corresponding to "0" and "1").
• Probability update
  • The selected context model is updated based on the actual coded value (e.g. if the bin value was "1", the frequency count of "1"s is increased).
• The above stages are repeated for each bit (or “bin”) of the binarized symbol.
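The probability update described here can be pictured with a simple counting model. This is only a conceptual sketch of the frequency-count idea on this slide; the class name and the initial pseudo-counts are assumptions, and the actual CABAC engine uses the 64-state table-driven estimator described in the earlier slides rather than explicit counts.

```python
class CountingContextModel:
    """Toy adaptive model that tracks how often "0" and "1" were coded."""

    def __init__(self) -> None:
        # Assumed initialization: one pseudo-count per bin value,
        # so that neither probability starts at zero.
        self.counts = [1, 1]

    def probability(self, bin_value: int) -> float:
        """Current estimate of the probability of bin_value (0 or 1)."""
        return self.counts[bin_value] / sum(self.counts)

    def update(self, bin_value: int) -> None:
        """Probability update: increase the count of the coded value."""
        self.counts[bin_value] += 1


model = CountingContextModel()
for coded_bin in (1, 1, 1, 0, 1):
    model.update(coded_bin)

print(model.probability(1))
```

After coding four "1" bins and one "0" bin, the estimate for "1" has moved from 1/2 to 5/7, so the next "1" would be assigned a larger sub-range by the arithmetic coder.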
