Video Coding Standards
Heejune AHN
Embedded Communications Laboratory
Seoul National Univ. of Technology
Fall 2011 (last updated 2011. 5. 13.)
Agenda
• History and Concepts
• JPEG and JPEG-2000
• MPEG-1 and MPEG-2
• MPEG-4
• H.261 and H.263
• H.264
• Beyond H.264
1. Standards and Standards Bodies
• VCEG (Video Coding Experts Group) in ITU-T (formerly CCITT)
  • Focus on real-time, two-way video communication
• MPEG (Moving Picture Experts Group) and JPEG in ISO
  • Focus on multimedia storage and distribution for entertainment
• Some standards overlap and are published by both bodies

ITU VCEG              ISO MPEG/JPEG
H.261
                      MPEG-1
H.262                 MPEG-2 (adopted as H.262)
                      JPEG
H.263
                      MPEG-4
                      JPEG-2000
H.264                 MPEG-4/AVC (same standard as H.264)
                      MPEG-7
H.264 High Profile
H.264 SVC
H.264 MVC
HEVC (H.265)
                      MPEG-21
History of Video Coding Standards
[Figure: timeline of video coding standards up to 2011, including H.264 High Profile (HP), SVC, MVC, and HEVC]
ISO-MPEG/JPEG
• JPEG (1992): compression of still images (DCT)
• MPEG-1 (1993): real-time playback of VHS-quality video on Video CD (1.4 Mbps)
• MPEG-2 (1995): broadcast-quality video service (3~5 Mbps)
• MPEG-4 (1998): wide range of bit rates (from 20 kbps upward) and object-oriented coding
• JPEG-2000 (2000): better-quality still images
ITU-VCEG
• H.261 (1990): video telephony over ISDN (p×64 kbps)
• H.263 (1995): video telephony over circuit-switched and packet networks, from 20 kbps to high bit rates
• H.264 (2003): multipurpose video coding with better quality
Others
• MPEG-7 (Multimedia Content Description Interface): search and retrieval in multimedia databases
• MPEG-21 (Multimedia Framework): interoperable multimedia delivery
Standards process and usage
• Standards process
  [Figure: standardization process. Stages shown: scope and aim of the standard; proposals from companies and universities; performance and complexity evaluation; test model (documents and reference software); improvement proposals; draft standard; international standard]
• Understanding standards
  • Only the bitstream syntax and the decoder are defined in a standard.
  • The encoder, the application, and the implementation are left open to users.
  • Standards provide “profiles and levels” and recommended usage to help users choose among the many technical options.
2. JPEG
• ISO IS-10918
• By ISO/IEC JTC1/SC29/WG10 (1984~1992)
• Widely used on the WWW and in digital photography
• Motion-JPEG is just a successive stream of JPEG images
Baseline JPEG Codec
• RGB or YCbCr components are coded either separately or in interleaved order
[Figure: baseline JPEG encoder block diagram]
• Input image → 8x8 blocks → level offset ([0,255] => [-128,127]) → 8x8 DCT → uniform scalar quantization (quantization tables)
• DC quantization indices → differential coding → VLC (DC Huffman tables, SSSS value) → bits
• AC quantization indices → zig-zag scan → run-level coding → VLC (AC Huffman tables, RRRRSSSS value) → bits
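The baseline encoder stages up to (but not including) Huffman coding can be sketched in a few lines. This is a minimal illustration, not the standard's reference code: the function names are hypothetical, and a single quantizer step `step` stands in for the 8x8 quantization table, which is an assumption made to keep the example short.

```python
import math

# Zig-zag scan order for an 8x8 block as (row, col) pairs.
ZIGZAG = sorted(((r, c) for r in range(8) for c in range(8)),
                key=lambda p: (p[0] + p[1], p[0] if (p[0] + p[1]) % 2 else p[1]))

def dct8x8(block):
    """Plain 2-D DCT-II of an 8x8 block (slow but explicit)."""
    def scale(k):
        return math.sqrt(0.5) if k == 0 else 1.0
    out = [[0.0] * 8 for _ in range(8)]
    for u in range(8):
        for v in range(8):
            s = sum(block[y][x]
                    * math.cos((2 * y + 1) * u * math.pi / 16)
                    * math.cos((2 * x + 1) * v * math.pi / 16)
                    for y in range(8) for x in range(8))
            out[u][v] = 0.25 * scale(u) * scale(v) * s
    return out

def encode_block(pixels, step=16):
    """Level offset, DCT, uniform quantization, zig-zag, run-level pairs.

    `step` is a single hypothetical quantizer step used for all coefficients;
    real JPEG uses one step per coefficient from a quantization table.
    """
    shifted = [[p - 128 for p in row] for row in pixels]   # [0,255] -> [-128,127]
    coef = dct8x8(shifted)
    indices = [round(coef[r][c] / step) for r, c in ZIGZAG]
    dc, ac = indices[0], indices[1:]
    pairs, run = [], 0
    for level in ac:
        if level == 0:
            run += 1                     # count zeros before the next nonzero level
        else:
            pairs.append((run, level))
            run = 0
    return dc, pairs                     # trailing zeros are implied (end of block)

def dc_differences(dc_indices):
    """Differential coding of DC indices across consecutive blocks."""
    prev, diffs = 0, []
    for dc in dc_indices:
        diffs.append(dc - prev)
        prev = dc
    return diffs
```

A flat block has only a DC index, so `encode_block([[200] * 8 for _ in range(8)])` yields a DC index and an empty list of (run, level) pairs.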
Lossless JPEG
• DPCM is used, with prediction from three neighboring pixels
Optional modes
• Progressive encoding
  • Stores image data in the order DC only, low-frequency AC, high-frequency AC
• Hierarchical encoding
  • Stores image data from low resolution to high resolution
Motion-JPEG
• Just a sequence of JPEG still images
• Low complexity, error tolerance, market awareness
• Used for video conferencing and surveillance before cheap MPEG-1/2/4 solutions became widely available on the market
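The DPCM idea in lossless JPEG can be illustrated with one of its predictors, left + above - above-left. The sketch below is a simplification under stated assumptions: neighbors outside the image are treated as 0 (the standard handles the first row and column differently), and the function names are hypothetical.

```python
def predict(img, y, x):
    """Predict a pixel from its left (a), above (b), and above-left (c) neighbors."""
    a = img[y][x - 1] if x > 0 else 0            # assumption: missing neighbors count as 0
    b = img[y - 1][x] if y > 0 else 0
    c = img[y - 1][x - 1] if x > 0 and y > 0 else 0
    return a + b - c

def dpcm_encode(img):
    """Return the prediction residuals that would be entropy coded."""
    return [[img[y][x] - predict(img, y, x) for x in range(len(img[0]))]
            for y in range(len(img))]

def dpcm_decode(residuals):
    """Rebuild the image in raster order, reusing already decoded neighbors."""
    img = [[0] * len(residuals[0]) for _ in residuals]
    for y in range(len(residuals)):
        for x in range(len(residuals[0])):
            img[y][x] = residuals[y][x] + predict(img, y, x)
    return img
```

Because the decoder forms the same prediction from already reconstructed pixels, decoding the residuals returns the original image exactly.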
JPEG-2000
• Features
  • Better compression performance than JPEG
  • No blocking effects at high compression ratios
  • Good compression for both continuous-tone and bi-level (text) images
  • Both lossless and lossy compression in one framework
  • ROI (region of interest) support
  • Error resilience support (data partitioning)
  • Rather slow on current embedded systems due to complexity
• Encoding process
  • image → (tiling) → wavelet transform → quantizer → arithmetic encoder → bits
Comparison between JPEG and JPEG-2000
[Figure: two versions of Lenna, 256x256 RGB, at the same file size: baseline JPEG (4572 bytes) and JPEG-2000 (4572 bytes)]
MPEG-1/2
• MC-DCT Hybrid Coding
[Figure: MC-DCT hybrid coder block diagram. Blocks: coder control, intra-frame DCT coder (DCT, Quant), intra-frame decoder (DeQ), entropy coder, motion-compensated predictor, intra/inter switch, motion estimator. Signals: control data, DCT coefficients, motion data]
MPEG-1
• MPEG-1
  • Targeted VHS quality (SIF: 352x288 at 25 fps or 352x240 at 30 fps, YCbCr 4:2:0) on VCD (600 MB)
  • 1.4 Mbps (1.2 Mbps video + 0.2 Mbps audio) on VCD, 70 minutes
  • Three parts: Part 1 Systems, Part 2 Video, Part 3 Audio
• Technology
  • MC-DCT hybrid
    • Macroblock (16x16 pixels): motion estimation unit
    • Block (8x8 pixels): DCT and quantization unit
  • GOP structure
    • I, P, B pictures
    • Trade-off between random access and coding efficiency
  • Asymmetric complexity
    • Larger memory and more computation are required at the encoder
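The encoder-side cost comes largely from motion estimation, which the standard leaves open. One common approach is full-search block matching with the sum of absolute differences (SAD); the sketch below assumes that approach, with hypothetical function names and a configurable block size `n` (16 for a macroblock).

```python
def sad(cur, ref, cx, cy, rx, ry, n):
    """Sum of absolute differences between an n x n block in cur and in ref."""
    return sum(abs(cur[cy + j][cx + i] - ref[ry + j][rx + i])
               for j in range(n) for i in range(n))

def full_search(cur, ref, cx, cy, n=16, search=7):
    """Return ((dx, dy), cost) of the best match within +/- search pixels.

    (cx, cy) is the top-left corner of the current block; candidates that
    fall outside the reference frame are skipped.
    """
    height, width = len(ref), len(ref[0])
    best_mv, best_cost = (0, 0), sad(cur, ref, cx, cy, cx, cy, n)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = cx + dx, cy + dy
            if rx < 0 or ry < 0 or rx + n > width or ry + n > height:
                continue
            cost = sad(cur, ref, cx, cy, rx, ry, n)
            if cost < best_cost:
                best_mv, best_cost = (dx, dy), cost
    return best_mv, best_cost
```

The search evaluates up to (2·search + 1)² candidates per block, which is why the encoder is far more expensive than the decoder, which only applies the transmitted vector.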
MPEG-1 Structure
• Syntax hierarchy
  • Sequence layer
  • GOP layer
  • Picture layer
  • Slice layer
  • MB layer
  • Block layer
Picture Coding
• I picture: no interframe prediction
• P picture: interframe prediction from one causal (previous) reference picture
• B picture: interframe prediction from one previous and one future picture
• GOP and picture order
  • Display order (input at the encoder)
  • Transmission order (encoding/decoding order)
[Figure: the same pictures (I1, I2, P1, P2, B1, B2, B4, B5, B6, B7) shown once in display order and once in transmission order]
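The relation between display order and transmission order follows from the B-picture rule: a B picture needs its future reference, so that reference must be sent first. A minimal sketch, assuming each picture is labeled by a string whose first letter is its type (the labels below are illustrative, not the ones in the figure):

```python
def transmission_order(display_order):
    """Reorder pictures so every B picture follows its future anchor (I or P)."""
    out, pending_b = [], []
    for pic in display_order:
        if pic[0] == "B":
            pending_b.append(pic)        # hold until the next anchor is sent
        else:
            out.append(pic)              # anchor goes first ...
            out.extend(pending_b)        # ... then the B pictures that precede it in display
            pending_b = []
    out.extend(pending_b)                # trailing B pictures with no later anchor in this list
    return out
```

For example, the display sequence I0 B1 B2 P3 B4 B5 P6 is transmitted as I0 P3 B1 B2 P6 B4 B5.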
MPEG-2
• Major target application
  • Digital television quality (720x576/480, 25/30 fps) at 3~4 Mbps
• Interlaced video support
  • Frame picture vs. field picture: motion compensation unit
  • Frame DCT vs. field DCT in a frame picture
[Figure: a frame picture split into two field pictures, and the block layout for frame DCT vs. field DCT]
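The difference between frame DCT and field DCT is only in how the 16 luma rows of a macroblock are grouped into 8x8 blocks before the transform. A minimal sketch, with a hypothetical function name and the macroblock given as 16 rows of 16 samples:

```python
def luma_blocks(mb, field_dct):
    """Split a 16x16 luma macroblock into four 8x8 blocks.

    Frame DCT keeps rows in picture order; field DCT first gathers the
    top-field rows (0, 2, 4, ...) and then the bottom-field rows (1, 3, 5, ...).
    """
    rows = mb[0::2] + mb[1::2] if field_dct else mb
    upper, lower = rows[:8], rows[8:]
    return [
        [r[:8] for r in upper],   # block 0: upper left
        [r[8:] for r in upper],   # block 1: upper right
        [r[:8] for r in lower],   # block 2: lower left
        [r[8:] for r in lower],   # block 3: lower right
    ]
```

With field DCT each block contains lines from one field only, which tends to help when the two fields differ because of motion.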
Scalability Support
• Spatial scalability
  • Low resolution at the base layer (BL) and high resolution at the enhancement layer (EL)
  • BL is used for prediction of EL
  • E.g. SD resolution at BL, HD resolution at EL
• Temporal scalability
  • E.g. 30 fps at BL, 60 fps at EL
• SNR scalability
  • Same resolution but different quality
• Data partitioning
  • Coded data is packed into different streams
[Figure: layered codec. The input video is downsampled and coded by the BL encoder into the BL bit stream, which the BL decoder turns into lower-quality video; the EL encoder produces the EL bit stream, which yields higher-quality video]
Profile & Level
• MPEG-2 has many options; not every implementation needs all of them
• Profiles
  • Simple: 4:2:0 input, I and P pictures only, low complexity and low performance
  • Main: 4:2:0 input, I, P, and B pictures, interlaced
  • 4:2:2: 4:2:2 input (chroma keeps the same vertical resolution as luma)
  • SNR: SNR scalable
  • Spatial: spatially scalable
  • High: spatial scalability and 4:2:2
• Levels
  • Low (352x288), Main (720x576), High 1440 (1440x1152), High (1920x1152)
• E.g.
  • MPEG-1: Main profile & Low level
  • SD DTV, DVD: Main profile & Main level
  • HDTV: Main profile & High level (historically MPEG-3’s target application)
MPEG-4
• Features
  • Support for low bit rates (from 20 kbps)
  • Support for object-based coding
    • Reuse of components, composition, and interactivity support
    • In practice, object-based coding is not widely used
• Object-based coding
  • Video object (VO)
  • Shape coding: transparent/opaque region, binary or grey scale
  • Texture coding with arbitrary shape
    • DCT after zero filling in inter blocks and extrapolation in intra blocks
[Figure: a scene composed of three video objects, VO1, VO2, and VO3]
H.261
• ITU mostly focuses on real-time communication
• H.261
  • First video coding standard (1990)
  • N-ISDN (1990s)
    • p×64 kbps (p = 1, ..., 30), typically 64~384 kbps
    • Circuit-switched network: low delay, reliable
• H.261 key features
  • YCbCr 4:2:0 CIF and QCIF input
  • MC-DCT
  • Integer-pel motion
  • Optional loop filter (for deblocking)
    • Filtering at 8x8 block boundaries
  • FEC used
H.261 syntax structure • H.261 Bit structure
H.263
• Versions
  • Version 1 (1995)
    • Improvement on H.261
    • 4 optional modes
  • Version 2 (2000, H.263+)
    • 12 optional modes
  • Version 3 (2002, H.263++)
    • 19 optional modes
• Key features
  • Targets bit rates down to 20 kbps, and packet-based networks as well
  • Half-pel prediction
  • Redesigned 3-D VLC code
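Half-pel prediction means the motion vector can point between integer sample positions, and the predictor is obtained by averaging the neighboring samples. The sketch below assumes plain bilinear interpolation with round-half-up integer arithmetic; the function name and the half-pel coordinate convention (coordinates given in units of half a pixel) are illustrative choices.

```python
def half_pel_sample(ref, x2, y2):
    """Sample the reference picture at (x2 / 2, y2 / 2).

    Integer positions return the sample itself; half positions average the
    two (or four) surrounding integer samples with rounding.
    """
    x, y = x2 // 2, y2 // 2
    fx, fy = x2 % 2, y2 % 2              # 1 when the position is halfway
    a = ref[y][x]
    b = ref[y][x + fx]
    c = ref[y + fy][x]
    d = ref[y + fy][x + fx]
    if fx and fy:
        return (a + b + c + d + 2) // 4  # center of four samples
    if fx:
        return (a + b + 1) // 2          # between horizontal neighbors
    if fy:
        return (a + c + 1) // 2          # between vertical neighbors
    return a
```

Compared with the integer-pel motion of H.261, this doubles the motion vector resolution at the cost of a few additions per sample.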
H.263 Optional Modes
• Annex D: Unrestricted motion vectors
• Annex E: Syntax-based arithmetic coding
• Annex F: Advanced prediction
• Annex G: PB frames
• Annex I: Advanced intra coding
• Annex J: Deblocking filter
• Annex K: Slice structured mode
• Annex L: Supplemental enhancement information
• Annex M: Improved PB frames
• Annex N: Reference picture selection
• Annex O: Scalability
• Annex P: Reference picture resampling
H.263 Optional Modes (continued)
• Annex Q: Reduced resolution update
• Annex R: Independent segment decoding
• Annex S: Alternative inter VLC
• Annex T: Modified quantization
• Annex U: Enhanced reference picture selection
• Annex V: Data partitioned slice
• Annex W: Additional supplemental enhancement information
H.264
• Name
  • ITU H.264 = ISO MPEG-4 Part 10/AVC
  • H.26L: long-term enhancement project, not compatible with H.263
• Now adopted in DMB-T/S and IPTV, replacing many MPEG-2 solutions
• Targets a 50% coding gain over H.263+
Key features
• Smaller processing units (down to 4x4 pixel blocks)
• Intra prediction
• Inter prediction
  • Macroblock-based interframe prediction selection
  • ¼-pixel motion vector support
  • Motion vector options for sub-blocks
• 4x4 integer DCT
• Deblocking filter
• Universal VLC
• CABAC (context-based adaptive binary arithmetic coding)
Intra-frame Prediction
• Luma
  - 4x4: 9 modes
  - 16x16: 4 modes
• Chroma
  - 8x8: 4 modes
  - The same prediction mode is always applied to both chroma blocks
[Figure: 4x4 luma prediction from the neighboring samples A to M (for example vertical, horizontal, and mean modes), and 16x16 prediction from the row H above and the column V to the left (vertical, horizontal, Mean (H, V), ...)]
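Three of the nine 4x4 luma modes (vertical, horizontal, and mean/DC) are simple enough to sketch directly. In the sketch below `above` holds the four samples above the block and `left` the four samples to its left; the function names, the mode strings, and the SAD-based mode choice are illustrative assumptions, since mode decision is an encoder matter.

```python
def intra4x4(mode, above, left):
    """Build a 4x4 prediction block from neighboring reconstructed samples."""
    if mode == "vertical":               # copy the row above downward
        return [list(above) for _ in range(4)]
    if mode == "horizontal":             # copy the left column rightward
        return [[left[r]] * 4 for r in range(4)]
    if mode == "dc":                     # rounded mean of the eight neighbors
        dc = (sum(above) + sum(left) + 4) >> 3
        return [[dc] * 4 for _ in range(4)]
    raise ValueError("unsupported mode: " + mode)

def best_mode(block, above, left, modes=("vertical", "horizontal", "dc")):
    """Pick the mode with the smallest sum of absolute differences."""
    def cost(mode):
        pred = intra4x4(mode, above, left)
        return sum(abs(block[r][c] - pred[r][c]) for r in range(4) for c in range(4))
    return min(modes, key=cost)
```

Only the residual between the block and its prediction is transformed and coded, so a good mode choice directly reduces the bits spent on intra blocks.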
Inter-frame Prediction
[Figure: I, P, and B pictures and their prediction references]
Transform and Quantization
• Integer DCT
  • No encoder/decoder mismatch
• Three types of transform, each followed by quantization
  - Type 1: for the 4x4 array of luma DC coefficients in intra MBs predicted in 16x16 mode (block -1)
  - Type 2: for the 2x2 arrays of chroma DC coefficients (blocks 16-17)
  - Type 3: for all other 4x4 blocks (blocks 0-15, 18-25)
[Figure: block numbering within a macroblock. Block -1: luma DC (16x16 intra mode only). Luma 4x4 blocks, row by row: 0 1 4 5 / 2 3 6 7 / 8 9 12 13 / 10 11 14 15. Blocks 16 and 17: chroma DC. Chroma 4x4 blocks: 18 19 / 20 21 and 22 23 / 24 25]
*Data is transmitted in the numbered order
Transform and Quantization
• 4×4 DCT (X: input, Y: output)
• 4×4 integer transform
  - Forward
  - Backward
• W: transform output before scaling; post-scaling factor (PF)
[Equations: not recoverable from the extracted slide text]
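The core of the 4×4 integer transform can be checked numerically. The sketch below assumes the widely documented forward core matrix for H.264 and computes W = Cf · X · Cfᵀ using integers only. The inverse shown here is an exact mathematical inverse built from the row norms of Cf; it is not the standard's own inverse, which uses a different matrix and folds the scaling (the post-scaling factor) into quantization. All function names are hypothetical.

```python
from fractions import Fraction

# Forward core transform matrix (assumed: the commonly cited H.264 4x4 core).
CF = [[1, 1, 1, 1],
      [2, 1, -1, -2],
      [1, -1, -1, 1],
      [1, -2, 2, -1]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def forward_core(x):
    """W = CF * X * CF^T, integer arithmetic only (no scaling applied)."""
    return matmul(matmul(CF, x), transpose(CF))

def inverse_exact(w):
    """Exact inverse for illustration: X = CF^T * (W / (n_i * n_j)) * CF.

    The rows of CF are orthogonal with squared norms n = (4, 10, 4, 10), so
    dividing W[i][j] by n_i * n_j plays the role of the scaling step.
    """
    norms = [sum(v * v for v in row) for row in CF]
    scaled = [[Fraction(w[i][j], norms[i] * norms[j]) for j in range(4)]
              for i in range(4)]
    return matmul(matmul(transpose(CF), scaled), CF)
```

Because every step is exact, encoder and decoder cannot drift apart through floating-point rounding, which is the "no mismatch" property of the integer transform.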
Deblocking Filters
• A boundary-strength (BS) parameter is assigned to every 4×4 block
  • BS = 0: no filtering
  • BS = 1-3: slight filtering
  • BS = 4: strong filtering
• An edge is filtered only when |P0-Q0| < α, |P1-P0| < β, and |Q1-Q0| < β
• The thresholds α and β depend on the average quantization parameter (QP)
• Deblocking filtering accounts for about 1/3 of the computational complexity of a decoder
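The filtering decision for one line of samples across a block edge can be written as a single predicate. In this sketch p1, p0 are the two samples on one side of the edge and q0, q1 the two on the other; `alpha` and `beta` are passed in by the caller because their QP-dependent tables are not reproduced here, and the function name is hypothetical.

```python
def should_filter(bs, p1, p0, q0, q1, alpha, beta):
    """Return True when the edge between p0 and q0 should be smoothed.

    A small step across the edge is treated as a blocking artifact; a large
    step is assumed to be a real image edge and is left untouched.
    """
    return (bs > 0
            and abs(p0 - q0) < alpha
            and abs(p1 - p0) < beta
            and abs(q1 - q0) < beta)
```

For instance, with alpha = 10 and beta = 4, samples (100, 101 | 105, 106) pass the test, while (100, 101 | 160, 161) do not because the step across the edge is too large to be a coding artifact.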
Network Adaptation
• VCL & NAL
  • VCL (video coding layer)
  • NAL (network abstraction layer)
• Error resilience tools
  • Flexible macroblock ordering (FMO)
    • Allows MBs to be assigned to slices in an order other than scan order
  • Arbitrary slice ordering (ASO)
    • Improves end-to-end delay in real-time applications
  • Redundant slices (RS)
    • Redundant representations are coded using different coding parameters
[Figure: macroblocks of a picture assigned to slice group #0 and slice group #1]
Profile & Level
• Main applications
  • Baseline: video telephony
  • Main: DTV and storage
  • Extended: streaming
• Profiles & tools
Conclusion
• Many video coding standards
  • Standards reflect both coding technology and implementation technology
  • Coding performance has improved more than 4 times since H.261 (1990)
• What’s next
  • SVC (Scalable Video Coding) in H.264 (done)
  • H.264ext (further improvement of H.264)
  • 3-D video and MVC (Multi-View Coding) are ongoing
  • UDTV (Ultra Definition TV: 3840x2160)
  • And what’s next?