Yao-Chung Lin Image, Video, and Multimedia Systems Group Information Systems Laboratory

Introduction to H.264/SVC Differences, Possibilities, and Limits Yao-Chung Lin Image, Video, and Multimedia Systems Group Information Systems Laboratory May 10 2006

Scalable Video Coding • A research topic for over 20 years • A single bitstream serves diverse clients • Display resolutions (QCIF, CIF, …, HDTV) • Frame rates (15 Hz, 30 Hz, …) • Bit rates/qualities • Developing standard • October 2003: MPEG Call for Proposals • March 2004: 14 proposals submitted and evaluated • 12 proposals are wavelet-based • 2 proposals are extensions of H.264/AVC • October 2004: MPEG selected the HHI proposal as the starting point for H.264/MPEG-4 AVC Amd. 1 • ~2007: final draft to be released

Current Draft • Based on H.264 main profile • MCTF/Hierarchical B-picture (MCTF w/o update step) for temporal scalability • Layered pyramid prediction structure for spatial scalability • Layered, sub bit-plane, and (run,level) coding for SNR scalability

H.264 Profiles

Overall Architecture of SVC A two layer example

Outline • Introduction • Scalabilities • Temporal Scalability • Spatial Scalability • SNR (Quality) Scalability • Other Details • Simulation Results • Conclusion & Discussion

Temporal Scalability • Group of picture (GOP) • Concepts of motion compensated temporal filtering (MCTF) • Hierarchical B-picture

Group of Pictures • Instantaneous decoding refresh (IDR) pictures • Intra coded picture • Also a key picture • A GOP with only one picture • Provide random access • Key pictures • The last picture in a GOP • Intra coded • Or inter coded, predicted from the previous key picture • Provide the lowest temporal resolution • Non-key pictures • Hierarchically predicted B pictures • High-pass signal of MCTF • Provide various temporal resolutions • Note: the number of reference frames cannot exceed 16

Group of Pictures An example of a group of pictures: dyadic, 4 temporal levels [ITU 2006 January, R202]
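The dyadic structure in the example can be sketched in a few lines. This is a minimal illustration, not code from the JSVM: the function name `temporal_level` and the 1-based picture index within the GOP are assumptions, and the GOP size is assumed to be a power of two with the key picture last.

```python
def temporal_level(index: int, gop_size: int) -> int:
    """Temporal level of the picture at 1-based position `index` in a dyadic GOP.

    Level 0 is the key picture (the last picture of the GOP). Each further
    level doubles the frame rate available to the decoder.
    """
    if gop_size <= 0 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two")
    if not 1 <= index <= gop_size:
        raise ValueError("index must lie in 1..gop_size")
    level = gop_size.bit_length() - 1  # log2(gop_size) = highest level
    # Every factor of two in the index moves the picture one level down.
    while index % 2 == 0 and level > 0:
        index //= 2
        level -= 1
    return level


# A GOP of 8 pictures has 4 temporal levels (0..3).
levels = [temporal_level(i, 8) for i in range(1, 9)]
print(levels)
```

Dropping every picture above a chosen level leaves a valid sequence at a lower frame rate, which is how a single bitstream serves several frame rates.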

Concepts of MCTF • Based on the lifting scheme • Ensures perfect reconstruction • Even if non-linear operations are used • Open loop • Non-recursive temporal decomposition • Prevents drift error • Enables efficient scalable coding, especially with FGS
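The perfect-reconstruction property can be checked with a small sketch. It is a simplified stand-in for MCTF: motion compensation is left out (zero motion is assumed), each "frame" is a single integer sample, and the function names are hypothetical. The floor in the update step is the non-linear operation.

```python
def lifting_analysis(even, odd):
    """One Haar-like lifting stage over pairs of frames (zero motion assumed)."""
    # Prediction step: high-pass = odd frame minus its prediction from the even frame.
    high = [o - e for e, o in zip(even, odd)]
    # Update step: low-pass = even frame plus floor(high / 2), a non-linear operation.
    low = [e + (h >> 1) for e, h in zip(even, high)]
    return low, high


def lifting_synthesis(low, high):
    """Invert the stage by undoing the steps in reverse order."""
    even = [l - (h >> 1) for l, h in zip(low, high)]
    odd = [h + e for e, h in zip(even, high)]
    return even, odd


even, odd = [10, 7], [13, 2]
low, high = lifting_analysis(even, odd)
assert lifting_synthesis(low, high) == (even, odd)  # exact despite the rounding
```

Reconstruction is exact because the synthesis subtracts the same rounded quantity that the analysis added, regardless of how the prediction and update operators are defined.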

Lifting Scheme r: reference index m: motion vector Similar to P-picture Similar to B-picture [ITU, 2006 January, R202]

Motion Modes • Variable block-size inter modes from 16x16 to 4x4 • Intra modes: 16x16, 8x8, 4x4 • Direct mode: 16x16, 8x8

Decomposition Structure [HHI Webpage: Scalable Extension of H.264/AVC]

Decomposition Structure • A dyadic decomposition structure has a delay of 2^N − 1 frames, where N = number of temporal decomposition levels • Update steps do not cross the GOP border [HHI Webpage: Scalable Extension of H.264/AVC]
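As a rough illustration of where the delay comes from, the sketch below assumes the delay figure reads 2^N − 1 for a GOP of 2^N pictures and lists one possible coding order for a dyadic GOP. The function names are hypothetical, and the order within a level is only one valid choice.

```python
def structural_delay(levels: int) -> int:
    """Delay in frames for `levels` dyadic decomposition levels (GOP of 2**levels)."""
    return 2 ** levels - 1


def coding_order(levels: int) -> list:
    """Display positions (1-based) in the order they can be coded.

    The key picture at the end of the GOP comes first, then each finer
    temporal level in turn, so every picture is coded after its references.
    """
    gop = 2 ** levels
    order = [gop]
    step = gop
    while step > 1:
        half = step // 2
        order.extend(range(half, gop, step))
        step = half
    return order


print(coding_order(3))      # [8, 4, 2, 6, 1, 3, 5, 7]
print(structural_delay(3))  # 7
```

Picture 1 is displayed first but depends on picture 8, so the encoder must wait for 7 further frames before it can code it.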

Low Delay Support [ITU, 2006 January, R202]

Removal of Update Step • The update step adds high complexity to the decoder • Derivation of the motion information for the update step • Smaller block sizes • 9-bit residual motion compensation • The update step gives no significant coding-efficiency gain over closed-loop coding with hierarchical B pictures (HB) • Rate-distortion performance of closed-loop coding with HB is higher than or similar to that of MCTF-based coding for all test sequences • Except the ‘City’ sequence, where MCTF has a 0.5 dB gain • After temporal pre-filtering of the sequence, the MCTF gain becomes insignificant [ITU, 2005 July, P059]

Two Closed Loops FGS Layer [ITU, 2005 July, P059]

Spatial Scalability • Layered pyramid prediction structure • Inter-layer intra texture prediction • Inter-layer motion prediction • Inter-layer residual prediction • Extended Spatial Scalability • Cropping • Generic upsampling (non-dyadic spatial resampling)

Layered Pyramid Prediction Structure • Same concepts used in H.262/MPEG-2, H.263, MPEG-4 with additional inter-layer prediction • Each spatial resolution is coded as a new layer with texture and motion refinement • Same mechanism for coarse grain SNR scalability (Spatial downsampling ratio=1)

Inheritance of modes Previous Spatial Layer Current Layer For spatial scaling ratio = 2

Inter-layer Intra Texture Prediction • Unrestricted inter-layer intra texture prediction • Decode and predict from all lower layers in the bitstream • Not supported in the standard • Constrained inter-layer intra texture prediction • For MBs in non-key pictures • The co-located block in the previous layer is intra coded • Not supported in the standard • Constrained inter-layer intra texture prediction for single-loop decoding • For MBs in all pictures (including key pictures) • The co-located block in the previous layer is intra coded • Allows decoding (motion compensation) of only the current layer • Supported by the current SVC draft

Generation of Inter-layer Texture Prediction • Directly de-block filtering • 4-sample border extension • Interpolation • 2x: Half-pel interpolation filter of AVC • Otherwise: quarter-pel interpolation filter [Schwarz, ICIP 2005]

Inter-layer Motion Prediction • Intra base layer • If previous layer is inter, use scaled partitioning and motion vectors of base layer • If previous layer is intra, predict from previous layer • Quarter pel refinement • Only for reduced spatial resolution • Refine the scaled motion vector of previous layer by +1, 0, and -1 in quarter-sample precision • Send the refinement • None • Motion vector prediction from neighbor blocks • Motion vector prediction from previous layer
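The quarter-pel refinement can be illustrated as follows. This is a sketch under stated assumptions: motion vectors are (x, y) pairs in quarter-sample units, the spatial scaling ratio is 2, and `predict_enhancement_mv` is a hypothetical helper name, not a JSVM function.

```python
def predict_enhancement_mv(base_mv, refinement=(0, 0), scale=2):
    """Scale a previous-layer motion vector and apply a quarter-sample refinement.

    base_mv and the result are (x, y) in quarter-sample units.
    Each refinement component must be -1, 0, or +1.
    """
    if any(r not in (-1, 0, 1) for r in refinement):
        raise ValueError("refinement components must be -1, 0, or +1")
    return tuple(scale * c + r for c, r in zip(base_mv, refinement))


# Previous-layer vector (3, -2) scaled by 2, then refined by (+1, -1).
print(predict_enhancement_mv((3, -2), (1, -1)))  # (7, -5)
```

Only the small refinement is transmitted, so the enhancement layer spends few bits on motion when the scaled vector is already close.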

Inter-layer Residual Prediction • Predict the residual from previous layer residual • Upsample the residual • 2x: separable bi-linear filter [1,1]/2 • Otherwise: quarter-pel interpolation • Helpful while the motion information is unchanged or slightly changed from previous layer
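A minimal sketch of 2x residual upsampling with the separable [1,1]/2 filter is given below. The sample phase, the rounding offset, and the border replication are simplifying assumptions made for this illustration, and the function names are hypothetical.

```python
def upsample_1d(samples):
    """Double the length of a 1-D signal with the bilinear filter [1, 1] / 2."""
    out = []
    for i, s in enumerate(samples):
        nxt = samples[min(i + 1, len(samples) - 1)]  # replicate the last sample
        out.append(s)                   # copied sample
        out.append((s + nxt + 1) >> 1)  # average of neighbours, rounded
    return out


def upsample_2d(block):
    """Apply the 1-D filter along rows, then along columns (separable)."""
    rows = [upsample_1d(row) for row in block]
    cols = [upsample_1d(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]


print(upsample_1d([0, 4, 8]))  # [0, 2, 4, 6, 8, 8]
```

The upsampled residual is subtracted from the enhancement-layer residual, so only the difference needs to be coded.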

SNR Scalability • Coarse grain scalability (CGS) • Layered coding • The same mechanism as spatial scalability • Re-quantize the coefficients with finer step • Fine grain scalability • Sub-bitplane arithmetic coding • Re-quantize the coefficients with finer step • Provide a continuous refinement from a quality base layer

Coarse Grain Scalability • Same mechanism as spatial scalability • Except no upsampling • Provide discrete quality refinement • Close to single layer RD performance, if dQP > 6

Fine Grain SNR Scalability • Represent the residual between the original prediction error and base layer representation • Quantized to a bisection step size (dQP~6) • Coded in transform domain for single inverse transform at decoder • Adaptive references for FGS (AR-FGS) provide leaky prediction attenuating drift error
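The "bisection step size" follows from the H.264 quantizer design, in which the step size doubles for every increase of 6 in QP. The sketch below uses a plain uniform quantizer on a single transform coefficient. It is an illustration only: the real FGS coding works on sub-bitplanes with arithmetic coding, and the function names are hypothetical.

```python
import math

# H.264 quantizer step sizes for QP 0..5; the pattern doubles every 6 QP.
_QSTEP_BASE = (0.625, 0.6875, 0.8125, 0.875, 1.0, 1.125)


def qstep(qp: int) -> float:
    """Quantizer step size for an H.264 QP in 0..51."""
    if not 0 <= qp <= 51:
        raise ValueError("QP must lie in 0..51")
    return _QSTEP_BASE[qp % 6] * (1 << (qp // 6))


def quantize(value: float, step: float) -> int:
    """Uniform quantizer with round-half-up (a simplification)."""
    return math.floor(value / step + 0.5)


def refine(coeff: float, qp: int):
    """Base-layer reconstruction and its refinement with a halved step (dQP = 6)."""
    base_rec = quantize(coeff, qstep(qp)) * qstep(qp)
    fine_step = qstep(qp - 6)
    refined = base_rec + quantize(coeff - base_rec, fine_step) * fine_step
    return base_rec, refined


print(qstep(28), qstep(22))  # 16.0 8.0
print(refine(37, 28))        # (32.0, 40.0)
```

In this example the refinement layer reduces the reconstruction error of the coefficient from 5 to 3.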

Illustration of AR-FGS Zero Coef. Block [ITU, 2006 Jan. R202]

Outline • Introduction • Scalabilities • Temporal Scalability • Spatial Scalability • SNR (Quality) Scalability • Other details • Simulation Results • Discussion

Other Details • Fidelity range extensions (FRExt) • Support 8x8 transform (High Profile) • Increase coding efficiency, especially for high-resolution sources • Motion search block segment size down to 8x8 only • Weighted prediction • Scale the reference pictures for prediction • Find the weights at the encoder • Explicitly send in syntax • Implicitly derive from temporal distance (an option for B-pictures)
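Implicit weighting by temporal distance can be sketched with linear interpolation. This is a floating-point simplification, not the integer arithmetic of the standard. The function names and the use of display times `t0 < t < t1` for the two references and the current picture are assumptions.

```python
def implicit_weights(t0: float, t1: float, t: float):
    """Weights for references at times t0 and t1 when predicting the picture at t.

    The nearer reference gets the larger weight; the weights sum to 1.
    """
    if t1 == t0:
        return 0.5, 0.5  # equal distance: plain averaging
    w1 = (t - t0) / (t1 - t0)
    return 1.0 - w1, w1


def weighted_prediction(ref0: float, ref1: float, weights) -> float:
    """Bi-predictive sample formed from two reference samples."""
    w0, w1 = weights
    return w0 * ref0 + w1 * ref1


w = implicit_weights(0, 4, 1)
print(w)                                # (0.75, 0.25)
print(weighted_prediction(100, 60, w))  # 90.0
```

Because both encoder and decoder can derive these weights from picture timing, nothing extra has to be sent, unlike the explicit mode.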

Other Details • FGS motion • Progressive refinement slice (FGS slice) contains motion data • Provides better prediction • Adaptive GOP Structure (AGS) • Divides a GOP into several sub-GOPs by appropriate mode decision • Decreases the distance between two low-pass pictures • 0.62 dB gain • Details in [ITU O018] • Loss-aware rate-distortion optimization • The mode/parameter decision considers packet loss • Details in [ITU P057]

JSVM • Written in C++ • Accessible from CVS • Current version: 5.2 • Last update: May 2, 2006

Simulation Results • Temporal Scalability • GOP sizes [ITU, 2005 July, P014] • Open loop MCTF vs. closed loop HB [ITU, 2005 July, P059] • Spatial • Given the same base layer • Examine the inter-layer prediction • SNR • CGS, DQP = 2 or 6 • FGS • Key pictures predict from base representation • FGS motion optimized at 1/3 bit rate • Is open loop MCTF helpful? [ITU, P059]

GOP Sizes

Open Loop vs. Closed Loop

Open Loop vs. Closed Loop

Summary of Temporal Scalability Features • Hierarchical B pictures • B pictures give 0.5~1 dB (IPP -> IBBPBBP) • Hierarchical prediction gives an additional 0.5~1 dB • MCTF • Only ‘CITY’ has a 0.5 dB gain compared to closed-loop HB • The gain is diminished by encoder MCTF pre-filtering • Improvement comes from the hierarchical prediction structure

Simulation Results • Temporal Scalability • GOP sizes [ITU, 2005 July, P014] • Open loop MCTF vs. closed loop HB [ITU, 2005 July, P059] • Spatial • Given the same base layer, examine the inter-layer prediction • Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) [ITU, O074] • SNR • CGS, DQP = 2 or 6 • FGS • Key pictures predict from base representation • FGS motion optimized at 1/3 bit rate

Spatial Scalability [Schwarz, Marpe, and Wiegand, IWSSIP 05]

Spatial Scalability • [Schwarz, Marpe, and Wiegand, IWSSIP 05]

Constrained Inter-Layer Prediction CIF@30 CIF@15 QCIF@15 QCIF@7.5 CIF@15 Foreman, Munich test points

Constrained Inter-Layer Prediction 4CIF@60 CIF@30 QCIF@15 4CIF@30 CIF@30 QCIF@15 Crew, Munich test points

Summary of Inter-layer Prediction Tools • Inter-layer prediction brings ~2 dB gain • Intra prediction ~1 dB • Motion prediction 0.5~1 dB • Residual prediction ~0.5 dB • Constrained inter-layer intra prediction for single-loop decoding • Provides low-complexity decoding • Costs < 0.5 dB

Simulation Results • Temporal Scalability • GOP sizes [ITU, 2005 July, P014] • Open loop MCTF vs. closed loop HB [ITU, 2005 July, P059] • Spatial • Given the same base layer, examine the inter-layer prediction • Multiple-loop decoding vs. single-loop decoding (constrained inter-layer prediction) [ITU, O074] • SNR • CGS, DQP = 2 or 6 • FGS • Key pictures predict from base representation • FGS motion optimized at 1/3 bit rate • Is open loop MCTF helpful? [ITU, P059]

SNR Scalability [Schwarz, Marpe, and Wiegand, IWSSIP 05]

SNR Scalability
