Introduction to H.264/AVC Video Coding

Introduction to H.264/AVC Video Coding • References: • Thomas Wiegand, Gary J. Sullivan, Gisle Bjøntegaard, and Ajay Luthra, “Overview of the H.264/AVC Video Coding Standard,” IEEE Transactions on Circuits and Systems for Video Technology, Vol. 13, No. 7, July 2003 • Jörn Ostermann, Jan Bormans, Peter List, Detlev Marpe, Matthias Narroschke, Fernando Pereira, Thomas Stockhammer, and Thomas Wedi, “Video coding with H.264/AVC: Tools, Performance, and Complexity,” IEEE Circuits and Systems Magazine, Vol. 4, Issue 1, First Quarter 2004

Outline • Goals of H.264/AVC • Structure of the H.264/AVC video encoder • Design feature highlights • Prediction methods • Transform details and VLC • Robustness in transmission • Video coding layer • Hypothetical reference decoder • Profiles and levels • Network abstraction layer • Comparisons MC-2009 VC Lab

Goals of H.264/AVC • Video Coding Experts Group (VCEG), ITU-T SG16 Q.6 • H.26L project (early 1998) • Target: double the coding efficiency compared with any other existing video coding standard, for a broad variety of applications • Existing standards: H.261, H.262 (MPEG-2), H.263 (H.263+, H.263++)

Structure of H.264/AVC video encoder • [Figure: H.264/AVC conceptual layers. The Video Coding Layer (VCL) encoder and decoder sit on top of the Network Abstraction Layer (NAL) encoder and decoder, connected through the VCL-NAL interface. The NAL encoder/decoder interfaces connect to the transport layer (H.264 to file format, TCP/IP, H.320, MPEG-2, H.324/M, ...), which runs over wired and wireless networks.]

Design feature highlights (1): improved prediction methods • Variable block-size motion compensation with small block sizes • A minimum luma motion compensation block size as small as 4×4 • Quarter-sample-accurate motion compensation • First found in an advanced profile of the MPEG-4 Visual (Part 2) standard; H.264/AVC further reduces the complexity of the interpolation processing compared to the prior design

Design feature highlights (2): improved prediction methods • Motion vectors over picture boundaries • First found as an optional feature in H.263; included in H.264/AVC • Multiple reference picture motion compensation • Decoupling of referencing order from display order • (X)IBBPBBPBBP… => IPBBPBBPBB… • Bounded by a total memory capacity imposed to ensure decoding ability • Enables removing the extra delay previously associated with bi-predictive coding

Design feature highlights (3): improved prediction methods • Decoupling of picture representation methods from picture referencing capability • In earlier standards, B-frames could not be used as references for prediction • H.264/AVC removes this restriction, so the closest pictures can be used as references • Weighted prediction • An innovation in H.264/AVC that allows the motion-compensated prediction signal to be weighted and offset by amounts specified by the encoder • Useful for scene fades, etc.
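The weighting and offset described on this slide can be illustrated with a small sketch. The fixed-point form below (an integer weight, a log2 denominator `log_wd`, rounding, then an additive offset) is an assumption chosen for illustration, and the function and parameter names are hypothetical rather than taken from the slides.

```python
def weighted_pred(sample, weight, offset, log_wd, bit_depth=8):
    """Scale one motion-compensated sample by weight / 2**log_wd, then add offset."""
    max_val = (1 << bit_depth) - 1
    if log_wd >= 1:
        # Add half of the denominator before shifting so the division rounds.
        scaled = (sample * weight + (1 << (log_wd - 1))) >> log_wd
    else:
        scaled = sample * weight
    # Clip the result back into the valid sample range.
    return min(max(scaled + offset, 0), max_val)


# Fade toward black: keep half of the reference brightness (32 / 2**6 = 0.5).
faded = [weighted_pred(s, weight=32, offset=0, log_wd=6) for s in (200, 100, 50)]
print(faded)
```

For a fade, an encoder would pick the weight and offset so that the scaled reference tracks the brightness change, leaving a smaller residual to code.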

Design feature highlights (4): improved prediction methods • Improved “skipped” and “direct” motion inference • Inferring motion in “skipped” areas, which helps with global motion • Enhanced motion inference method for “direct” mode

Design feature highlights (5): improved prediction methods • Directional spatial prediction for intra coding • Allows prediction from neighboring areas that were not coded using intra coding • Something not enabled when using the transform-domain prediction method found in H.263+ and MPEG-4 Visual

Design feature highlights (6): improved prediction methods • In-the-loop deblocking filtering • Building further on a concept from an optional feature of H.263+ • The deblocking filter in the H.264/AVC design is brought within the motion-compensated prediction loop

Design feature highlights (7): other parts • Small block-size transform • The new H.264/AVC design is based primarily on a 4×4 transform • Allowing the encoder to represent signals in a more locally-adaptive fashion, which reduces artifacts known colloquially as “ringing” • Quantization: DPCM for DC terms • Spurious frequencies: truncation mismatch periods

Design feature highlights (8): other parts • Hierarchical block transform • Using a hierarchical transform to extend the effective block size used for low-frequency chroma information to an 8×8 array • Allowing the encoder to select a special coding type for intra coding, enabling extension of the length of the luma transform for low-frequency information to a 16×16 block size

Design feature highlights (9): other parts • Short word-length transform • While previous designs have generally required 32-bit processing, the H.264/AVC design requires only 16-bit arithmetic • Exact-match inverse transform • Building on a path laid out as an optional feature in the H.263++ effort, H.264/AVC is the first standard to achieve exact equality of decoded video content from all decoders • Integer transform

Design feature highlights (10): other parts • Arithmetic entropy coding • While arithmetic coding was previously found as an optional feature of H.263, a more effective use of this technique is found in H.264/AVC to create a very powerful entropy coding method known as CABAC (context-adaptive binary arithmetic coding)

Design feature highlights (11): other parts • Context-adaptive entropy coding • CAVLC (context-adaptive variable-length coding) • CABAC (context-adaptive binary arithmetic coding)

Design feature highlights (12): robustness to data errors/losses and flexibility for operation over a variety of network environments • Parameter set structure • The parameter set design provides for robust and efficient conveyance of header information • NAL unit syntax structure • Each syntax structure in H.264/AVC is placed into a logical data packet called a NAL unit

Design feature highlights (13): robustness to data errors/losses and flexibility for operation over a variety of network environments • Flexible slice size • Unlike the rigid slice structure found in MPEG-2 (which reduces coding efficiency by increasing the quantity of header data and decreasing the effectiveness of prediction), slice sizes in H.264/AVC are highly flexible, as was the case earlier in MPEG-1

Design feature highlights (14): robustness to data errors/losses and flexibility for operation over a variety of network environments • Flexible macroblock ordering (FMO) • Significantly enhances robustness to data losses by managing the spatial relationship between the regions that are coded in each slice • Arbitrary slice ordering (ASO) • Allows sending and receiving the slices of a picture in any order relative to each other • First found in an optional part of H.263+ • Can improve end-to-end delay in real-time applications, particularly when used on networks having out-of-order delivery behavior

Design feature highlights (15): robustness to data errors/losses and flexibility for operation over a variety of network environments • Redundant pictures • Enhance robustness to data loss • A new ability that allows an encoder to send redundant representations of regions of pictures

Design feature highlights (15): robustness to data errors/losses and flexibility for operation over a variety of network environments • Data partitioning • Allows the syntax of each slice to be separated into up to three different partitions for transmission, depending on a categorization of syntax elements • This part of the design builds further on a path taken in MPEG-4 Visual and in an optional part of H.263++ • The design is simplified by having a single syntax, with partitioning of that same syntax controlled by a specified categorization of syntax elements

Design feature highlights (16): robustness to data errors/losses and flexibility for operation over a variety of network environments • SP/SI synchronization/switching pictures • A new feature consisting of picture types that allow exact synchronization of the decoding process of some decoders with an ongoing video stream produced by other decoders, without penalizing all decoders with the loss of efficiency resulting from sending an I picture • Enables switching a decoder between different data rates, recovery from data losses or errors, and trick modes such as fast-forward and fast-reverse

Coded Video Sequences • A coded video sequence consists of a series of access units that are sequential in the NAL unit stream and use only one sequence parameter set • Can be decoded independently • Starts with an instantaneous decoding refresh (IDR) access unit, which must be intra coded • A NAL unit stream may contain one or more coded video sequences

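VCL (Video Coding Layer) • [Figure: hybrid encoder block diagram. The input video is split into 16×16 macroblocks; an intra or inter (motion-compensated) prediction is subtracted, and the residual passes through DCT, quantization (Q), and VLC to form the output bitstream. The embedded decoder applies inverse quantization (IQ), IDCT, clipping, and the de-blocking filter, and stores the output video in frame memory for motion estimation and motion compensation.] • YCbCr color space and 4:2:0 sampling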

Pictures, Frames, and Fields • [Figure: a progressive frame compared with an interlaced frame (top field first), whose top field and bottom field are separated in time by ∆t]

Slices and Slice Groups (1) • [Figure: subdivision of a picture into slices (Slice #0, Slice #1, Slice #2) when not using flexible macroblock ordering (FMO)]

Slices and Slice Groups (2) • [Figure: subdivision of a QCIF frame into slice groups (Slice Group #0, #1, #2) utilizing FMO]

Slice coding types • I Slice • P Slice • B Slice • SP Slice • A switching P slice • Makes efficient switching between different pre-coded pictures possible • SI Slice • A switching I slice • Allows an exact match of a macroblock in an SP slice for random access and error recovery purposes

Adaptive Frame/Field Coding Operation • Three modes can be chosen adaptively for each frame in a sequence • Frame mode • Field mode • Frame mode with field-coded macroblock pairs • Picture-adaptive frame/field (PAFF): choosing between the first two modes for each picture; reported to save 16%~20% over frame-only coding for ITU-R 601 sequences such as “Canoa” and “Rugby” • Macroblock-adaptive frame/field (MBAFF): for a frame that consists of a mix of moving and nonmoving regions • The frame/field encoding decision can be made for each vertical pair of macroblocks (a 16×32 luma region) in a frame • This allows coding the nonmoving regions in frame mode and the moving regions in field mode

Macroblock-adaptive frame/field (MBAFF) • [Figure: a pair of macroblocks in frame mode compared with top/bottom macroblocks in field mode]

PAFF vs. MBAFF • The main idea of MBAFF is to preserve as much spatial consistency as possible • In MBAFF, one field cannot use the macroblocks in the other field of the same frame as a reference for motion prediction • PAFF coding can be more efficient than MBAFF coding in the case of rapid global motion, scene change, or intra picture refresh • MBAFF was reported to reduce bit rates by 14%~16% over PAFF for ITU-R 601 sequences (Mobile and Calendar, MPEG-4 World News)

Intra-Frame Prediction (1) • Intra_4×4 • Well suited for coding parts of a picture with significant detail • Intra_16×16 together with chroma prediction • More suited for coding very smooth areas of a picture • 4 prediction modes • I_PCM • Bypasses prediction and transform coding and sends the values of the encoded samples directly

A B A +B Intra-Frame Prediction(2) • Intra_16  16 • Vertical prediction • Horizontal prediction • DC-prediction • Plane-prediction • Works very well in areas of a gently changing luminance. • Chrominance signals • 8  8 blocks • Very smooth in most cases. • Use the same modes as in Intra_16  16. MC-2009 VC Lab

Intra-Frame Prediction (3) • In H.263+ and MPEG-4 Visual • Intra prediction is conducted in the transform domain • In H.264/AVC • Intra prediction is always conducted in the spatial domain

Intra-Frame Prediction (4) • Intra prediction across slice boundaries is not allowed

Inter-Frame Prediction in P slices (1) • Segmentations of the macroblock • Macroblock types: 16×16, 16×8, 8×16, 8×8 • 8×8 sub-macroblock types: 8×8, 8×4, 4×8, 4×4 • P_Skip • [Figure source: www.vcodex.com, “H.264 / MPEG-4 Part 10: Inter Prediction”]

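Inter-Frame Prediction in P slices (2) • The accuracy of motion compensation • Half-sample positions from a 6-tap filter over integer samples: $b_1 = E - 5F + 20G + 20H - 5I + J$, $h_1 = A - 5C + 20G + 20M - 5R + T$ • $b = (b_1 + 16) \gg 5$, $h = (h_1 + 16) \gg 5$, clipped to 0~255 • Centre position from the unrounded intermediate values: $j_1 = cc - 5\,dd + 20h_1 + 20m_1 - 5\,ee + ff$, $j = (j_1 + 512) \gg 10$, clipped to 0~255 • Quarter-sample positions by averaging: $a = (G + b + 1) \gg 1$, $e = (b + h + 1) \gg 1$ • [Figure: grid of integer sample positions A to U (upper case) and fractional sample positions a to s and aa to hh (lower case)]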
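The half-sample and quarter-sample formulas on this slide can be tried out directly. The sketch below covers only the one-dimensional cases (positions such as b and a); the centre position j, which filters unrounded intermediate values, is omitted, and the function names are illustrative.

```python
def clip255(x):
    """Clip to the 8-bit sample range 0..255."""
    return min(max(x, 0), 255)


def half_sample(e, f, g, h, i, j):
    """Half-sample value between g and h from six integer samples in a line."""
    b1 = e - 5 * f + 20 * g + 20 * h - 5 * i + j   # 6-tap filter (1, -5, 20, 20, -5, 1)
    return clip255((b1 + 16) >> 5)                 # round, divide by 32, clip


def quarter_sample(p, q):
    """Quarter-sample value as the rounded average of two neighbouring samples."""
    return (p + q + 1) >> 1


# A sharp edge between G = 0 and H = 255.
b = half_sample(0, 0, 0, 255, 255, 255)
a = quarter_sample(0, b)
print(b, a)
```

The filter taps sum to 32, which is why the result is shifted right by 5 after adding the rounding term 16.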

Inter-Frame Prediction in P slices (3) • Multiframe motion-compensated prediction • [Figure: the current picture predicted from 4 prior decoded pictures used as references, with blocks referring to pictures at ∆ = 1, ∆ = 2, and ∆ = 4]

Inter-Frame Prediction in B slices • Other pictures can reference pictures containing B slices • Weighted averaging of two distinct motion-compensated predictions • Utilizing two distinct lists of reference pictures (list0, list1) • 4 prediction types: list0, list1, bi-predictive, and direct prediction, plus the B_Skip macroblock mode • For each partition, the prediction type can be chosen separately

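Transform, Scaling, and Quantization (1) • 4×4 and 2×2 DCT • Integer transform matrix • [Figure: transmission order of the blocks in a macroblock. Block -1 holds the luma DC coefficients in Intra_16×16 mode, blocks 0 to 15 are the 4×4 luma (Y) blocks, blocks 16 and 17 hold the Cb and Cr DC coefficients, and blocks 18 to 25 are the chroma (Cb, Cr) AC blocks.] • Transmission order: -1, 0, 1, …, 24, 25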
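The slide names the integer transform matrix without listing it. As a hedged illustration, the sketch below uses the 4×4 forward core matrix commonly given for H.264/AVC (called `CF` here, a name chosen for this example) and checks that a butterfly using only additions, subtractions, and shifts gives the same result as the matrix product. The scaling that normally accompanies the transform is folded into quantization and is not shown.

```python
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def transpose(m):
    return [list(r) for r in zip(*m)]


def forward_matrix(x):
    """Core transform as a matrix product: CF * X * CF^T."""
    return matmul(matmul(CF, x), transpose(CF))


def butterfly_1d(x0, x1, x2, x3):
    """One-dimensional transform using only adds, subtracts, and shifts."""
    s0, s1 = x0 + x3, x1 + x2
    s2, s3 = x1 - x2, x0 - x3
    return [s0 + s1, (s3 << 1) + s2, s0 - s1, s3 - (s2 << 1)]


def forward_butterfly(x):
    rows = [butterfly_1d(*r) for r in x]            # transform every row
    cols = [butterfly_1d(*c) for c in zip(*rows)]   # then every column
    return transpose(cols)


residual = [[5, 11, 8, 10],
            [9, 8, 4, 12],
            [1, 10, 11, 4],
            [19, 6, 15, 7]]
coeffs = forward_butterfly(residual)
flat = forward_butterfly([[10] * 4 for _ in range(4)])
print(coeffs)
```

A flat block ends up with all of its energy in the DC coefficient, which is what makes a second transform over the DC terms of smooth areas worthwhile.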

Transform, Scaling, and Quantization (2): Repeated Transforms • Intra_16×16 and the chroma intra modes are intended for coding smooth areas • The DC coefficients undergo a second transform, with the result that we have transform coefficients covering the whole macroblock • Repeated transform for chroma blocks: the indices 0, 1, 2, 3 correspond to the indices 00, 01, 10, 11 of the 2×2 inverse Hadamard transform
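The second transform applied to the four chroma DC coefficients can be sketched as a 2×2 Hadamard transform. The example assumes the DC terms are arranged as a 2×2 array and omits quantization and scaling; the function name is illustrative.

```python
def hadamard2x2(dc):
    """Compute H * D * H for H = [[1, 1], [1, -1]] using only adds and subtracts."""
    (a, b), (c, d) = dc
    return [[a + b + c + d, a - b + c - d],
            [a + b - c - d, a - b - c + d]]


chroma_dc = [[40, 42], [38, 41]]
transformed = hadamard2x2(chroma_dc)
restored = hadamard2x2(transformed)   # same matrix again: original scaled by 4
print(transformed, restored)
```

Because H multiplied by itself is twice the identity, applying the same operation twice returns the input scaled by 4, so the inverse needs no separate matrix.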

Transform, Scaling, and Quantization (3) • Quantized by a scalar quantizer; the quantization step size is chosen by a so-called quantization parameter (QP) that has 52 values • An increment of QP by 1 increases the quantization step size by approximately 12%, since $2^{1/6} \approx 1.12$ (the step size doubles with each increment of 6 of QP): $Q_{STEP} = 2^{(QP-4)/6}$ • A change of step size by approximately 12% also means roughly a reduction of bit rate by approximately 12%: $R \rightarrow (1/1.12)\,R$ if $Q_{STEP} \rightarrow 1.12\,Q_{STEP}$
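The relation between QP and step size on this slide is easy to check numerically. The sketch below evaluates the slide's closed form for the 52 QP values (0 to 51); the function name is illustrative, and an actual codec works from integer tables rather than floating-point powers.

```python
def qstep(qp):
    """Quantization step size from the closed form 2 ** ((QP - 4) / 6)."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must be in the range 0..51")
    return 2.0 ** ((qp - 4) / 6)


ratio_per_step = qstep(29) / qstep(28)   # about 1.12 for any neighbouring pair
ratio_per_six = qstep(34) / qstep(28)    # doubles every 6 steps
print(round(ratio_per_step, 4), round(ratio_per_six, 4))
```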

Transform, Scaling, and Quantization (4) • Scanning order • Zig-zag scan • For the 2×2 DC coefficients of the chroma components: raster-scan order • All inverse transform operations in H.264/AVC can be implemented using only additions and bit-shifting operations on 16-bit integer values, so there is no drift problem between encoders and decoders • Only 16-bit memory accesses are needed for a good implementation of the forward transform and quantization process in the encoder
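The zig-zag scan mentioned above visits the coefficients of a block along anti-diagonals, from low to high frequency. The sketch below generates that order for a 4×4 block; it covers only the zig-zag pattern itself, and the function names are illustrative.

```python
def zigzag_order(n=4):
    """Return (row, col) positions of an n x n block in zig-zag scan order."""
    order = []
    for s in range(2 * n - 1):
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(diag[::-1] if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block into a list in zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


raster = [[4 * r + c for c in range(4)] for r in range(4)]
scan = zigzag_scan(raster)
print(scan)
```

Scanning this way tends to group the nonzero low-frequency coefficients at the start and leave a run of zeros at the end, which the entropy coder can exploit.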

Entropy Coding • Two methods of entropy coding are supported • An exp-Golomb code: a single infinite-extent codeword table for all syntax elements except the quantized transform coefficients • For quantized transform coefficients: Context-Adaptive Variable Length Coding (CAVLC)
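The "single infinite-extent codeword table" of an exp-Golomb code can be generated on the fly rather than stored. A minimal sketch of the zero-order code follows, with bit strings standing in for a real bit writer; the function names are illustrative, and the signed mapping shown (0, 1, -1, 2, -2, ...) is the usual convention, stated here as an assumption.

```python
def ue(v):
    """Unsigned exp-Golomb codeword for v >= 0, returned as a bit string."""
    if v < 0:
        raise ValueError("ue() needs a non-negative value")
    bits = bin(v + 1)[2:]                    # binary form of v + 1
    return "0" * (len(bits) - 1) + bits      # prefix of zeros, then the value


def se(v):
    """Signed value mapped to an unsigned index: 0, 1, -1, 2, -2, ..."""
    return ue(2 * v - 1 if v > 0 else -2 * v)


def decode_ue(bits, pos=0):
    """Decode one codeword starting at pos; return (value, next position)."""
    zeros = 0
    while bits[pos + zeros] == "0":
        zeros += 1
    end = pos + 2 * zeros + 1
    return int(bits[pos + zeros:end], 2) - 1, end


table = [ue(v) for v in range(4)]
stream = ue(3) + ue(0) + ue(7)
first, nxt = decode_ue(stream)
print(table, first)
```

Small values get short codewords, so the code works well whenever small values are the most probable.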

CAVLC (1) • The number of nonzero quantized coefficients (N) and the actual size and position of the coefficients are coded separately • Example coefficients in scan order: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 • 1) Number of nonzero coefficients (N) and “trailing 1s” (T1s) • T1s = 2, N = 5 • These two values are coded as a combined event. One of 4 VLC tables is used, based on the number of coefficients in neighboring blocks

CAVLC (2) • Example coefficients in scan order: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 • 2) Encoding the value of coefficients • For T1s, only the sign needs to be coded • Coefficient values are coded in reverse order: -2, 6, … • A starting VLC is used for -2, and a new VLC may be used based on the just-coded coefficient. In this way adaptation is obtained in the use of VLC tables • Six exp-Golomb code tables are available for this adaptation

CAVLC (3) • Example coefficients in scan order: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 • 3) Sign information • For T1s, this is sent as a single bit • For the other coefficients, the sign bit is included in the exp-Golomb codes

CAVLC (4) • Example coefficients in scan order: 7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0 • 4) TotalZeroes • The number of zeros between the last nonzero coefficient of the scan and its start • TotalZeroes = 3 • With N = 5, the number must be in the range 0-11; 15 tables are available for N in the range 1-15 (if N = 16 there is no zero coefficient) • 5) RunBefore • In this example it must be specified how the 3 zeros are distributed • The number of 0s before the last coefficient is coded first: 2 (range 0-3), using a suitable VLC • Then the number of 0s before the previous coefficient: 1 (range 0-1)
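The quantities that CAVLC codes for the example block (N, T1s, the remaining levels, TotalZeroes, and the RunBefore values) can be derived with a short script. This sketch only computes the symbols and does not produce any VLC bits; the function name and dictionary keys are hypothetical, and the cap of three trailing ones is stated here as an assumption because the slides do not mention it.

```python
def cavlc_symbols(coeffs):
    """Derive the CAVLC symbols for one block of coefficients in scan order."""
    nonzero_pos = [i for i, c in enumerate(coeffs) if c != 0]
    levels = [coeffs[i] for i in reversed(nonzero_pos)]   # reverse scan order

    # Trailing ones: +/-1 values at the high-frequency end (assumed cap of 3).
    t1s = 0
    for lv in levels:
        if abs(lv) != 1 or t1s == 3:
            break
        t1s += 1

    # Zeros located before the last nonzero coefficient.
    total_zeros = nonzero_pos[-1] + 1 - len(nonzero_pos) if nonzero_pos else 0

    # Run of zeros before each coefficient, last one first, until none remain.
    runs, zeros_left = [], total_zeros
    for k in range(len(nonzero_pos) - 1, 0, -1):
        if zeros_left == 0:
            break
        run = nonzero_pos[k] - nonzero_pos[k - 1] - 1
        runs.append(run)
        zeros_left -= run

    return {"N": len(nonzero_pos), "T1s": t1s,
            "trailing_signs": levels[:t1s], "levels": levels[t1s:],
            "total_zeros": total_zeros, "run_before": runs}


example = [7, 6, -2, 0, -1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
symbols = cavlc_symbols(example)
print(symbols)
```

For the example this reproduces the values on the slides: N = 5, T1s = 2, levels coded as -2, 6, 7, TotalZeroes = 3, and RunBefore values 2 and then 1.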

CAVLC vs CABAC • The efficiency of entropy coding can be improved further if Context-Adaptive Binary Arithmetic Coding (CABAC) is used • Compared to CAVLC, CABAC typically provides a reduction in bit rate of 5%~15% • The highest gains are typically obtained when coding interlaced TV signals

CABAC
