ECE 4331, Fall, 2009

ECE 4331, Fall, 2009 • Zhu Han, Department of Electrical and Computer Engineering • Class 22, Nov. 6th, 2009

Motivation – Image compression • What linear combination of 8x8 basis signals produces an 8x8 block in the image?

Learning Objectives • Introduction to the DCT and IDCT. • Decomposition of a 2-D DCT to two 1-D DCTs. • Implementation of a 2-D DCT using a 1-D DCT.

Introduction • To perform JPEG coding, an image (in colour or grey scale) is first subdivided into blocks of 8x8 pixels. • The Discrete Cosine Transform (DCT) is then performed on each block. • This generates 64 coefficients, which are then quantised to reduce their magnitude.

Introduction • The coefficients are then reordered into a one-dimensional array in a zigzag manner before further entropy encoding. • The compression is achieved in two stages; the first is during quantisation and the second during the entropy coding process. • JPEG decoding is the reverse process of coding.
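
The quantisation stage described above can be sketched in a few lines. This is only an illustration: the single step size of 16 and the sample coefficient values are assumptions chosen for the example, not entries from a JPEG quantisation table.

```python
def quantise(coeffs, step):
    """Divide each coefficient by the step size and round to the nearest integer."""
    return [int(round(c / step)) for c in coeffs]


def dequantise(levels, step):
    """Decoder side: scale the integer levels back up (the rounding loss remains)."""
    return [q * step for q in levels]


# Hypothetical DCT coefficients: a large DC term followed by small AC terms.
coeffs = [620.0, -35.5, 12.2, -4.1, 1.7, 0.6]
levels = quantise(coeffs, 16)        # [39, -2, 1, 0, 0, 0]
restored = dequantise(levels, 16)    # [624, -32, 16, 0, 0, 0]
print(levels, restored)
```

The small coefficients collapse to zero, which is what makes the later entropy coding stage effective; the error in each restored value is at most half the step size.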

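Implementation of the DCT • DCT-based codecs use a two-dimensional version of the transform. • The 2-D DCT and its inverse (IDCT) of an N x N block are shown below: • 2-D DCT: $X(u,v) = \frac{2}{N}\, C(u)\, C(v) \sum_{i=0}^{N-1} \sum_{j=0}^{N-1} x(i,j) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right]$ • 2-D IDCT: $x(i,j) = \frac{2}{N} \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} C(u)\, C(v)\, X(u,v) \cos\left[\frac{(2i+1)u\pi}{2N}\right] \cos\left[\frac{(2j+1)v\pi}{2N}\right]$ • Here $C(k) = 1/\sqrt{2}$ for k = 0 and $C(k) = 1$ otherwise. • Note: The DCT is similar to the DFT since it decomposes a signal into a series of harmonic cosine functions.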

2-D DCT using a 1-D DCT Pair • One of the properties of the 2-D DCT is that it is separable, meaning that it can be separated into a pair of 1-D DCTs. • To obtain the 2-D DCT of a block, a 1-D DCT is first performed on the rows of the block, then a 1-D DCT is performed on the columns of the resulting block. • The same applies to the IDCT. • This process is illustrated on the following slide.

2-D DCT using a 1-D DCT Pair

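2-D DCT using a 1-D DCT Pair • 1-D DCT: $X(k) = \sqrt{\frac{2}{N}}\, C(k) \sum_{i=0}^{N-1} x(i) \cos\left[\frac{(2i+1)k\pi}{2N}\right]$ • 1-D IDCT: $x(i) = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} C(k)\, X(k) \cos\left[\frac{(2i+1)k\pi}{2N}\right]$ • k = 0, 1, 2, …, N-1 and i = 0, 1, 2, …, N-1, with $C(k) = 1/\sqrt{2}$ for k = 0 and $C(k) = 1$ otherwise.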
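
As a rough illustration of the row-then-column procedure, the sketch below implements a 1-D DCT/IDCT pair and builds the 2-D transforms from it. It assumes the orthonormal scaling (a factor of sqrt(2/N), with C(0) = 1/sqrt(2) and C(k) = 1 otherwise); the function names are made up for this example, and the direct O(N²) summation is chosen for clarity rather than speed.

```python
import math


def scale_c(k):
    """C(k) under the assumed orthonormal scaling."""
    return 1 / math.sqrt(2) if k == 0 else 1.0


def dct_1d(x):
    n = len(x)
    return [math.sqrt(2 / n) * scale_c(k)
            * sum(x[i] * math.cos((2 * i + 1) * k * math.pi / (2 * n)) for i in range(n))
            for k in range(n)]


def idct_1d(X):
    n = len(X)
    return [math.sqrt(2 / n)
            * sum(scale_c(k) * X[k] * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                  for k in range(n))
            for i in range(n)]


def transpose(m):
    return [list(col) for col in zip(*m)]


def dct_2d(block):
    """1-D DCT on every row, then 1-D DCT on every column of the result."""
    rows = [dct_1d(r) for r in block]
    return transpose([dct_1d(col) for col in transpose(rows)])


def idct_2d(coeffs):
    """Same separable structure, using the 1-D IDCT."""
    rows = [idct_1d(r) for r in coeffs]
    return transpose([idct_1d(col) for col in transpose(rows)])


flat = [[100.0] * 8 for _ in range(8)]
G = dct_2d(flat)
print(round(G[0][0], 6))   # 800.0: a flat block has only a DC coefficient
```

With this scaling, applying `idct_2d` to the output of `dct_2d` returns the original block up to floating-point error.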

Implementation Issues • Precalculate the DCT coefficients and scale them
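
One way to read "precalculate and scale" is to build the cosine table once, scaled to integers, so that the transform itself needs only integer multiply-accumulate and a shift. The sketch below assumes N = 8, the orthonormal scaling, and a hypothetical scale factor of 2^13; none of these values come from the slides.

```python
import math

N = 8
SHIFT = 13  # hypothetical fixed-point scale: table entries are multiplied by 2**13

# TABLE[k][i] = round(2**SHIFT * sqrt(2/N) * C(k) * cos((2i+1) k pi / (2N)))
TABLE = [[int(round((1 << SHIFT) * math.sqrt(2 / N)
                    * (1 / math.sqrt(2) if k == 0 else 1.0)
                    * math.cos((2 * i + 1) * k * math.pi / (2 * N))))
          for i in range(N)]
         for k in range(N)]


def dct_1d_fixed(x):
    """Integer-only 1-D DCT: multiply-accumulate, then shift back with rounding."""
    half = 1 << (SHIFT - 1)
    return [(sum(TABLE[k][i] * x[i] for i in range(N)) + half) >> SHIFT
            for k in range(N)]


print(dct_1d_fixed([100] * 8))   # [283, 0, 0, 0, 0, 0, 0, 0]
```

The exact DC value for this input is about 282.84, so the integer result is off by less than one unit. A larger shift gives more precision at the cost of wider accumulators, which is the trade-off a fixed-point processor forces.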

Block-based DCT and IDCT in C • The most straightforward way of implementing a DCT and an IDCT is to use the 1-D DCT and IDCT. • The following C code shows how to translate the 1-D equations for the DCT and IDCT into C code. • The program also takes into account the numerical issues associated with fixed-point processors.

Using DCT in JPEG • DCT on 8x8 blocks

Using DCT in JPEG • Block size: • Small block: faster; correlation exists between neighboring pixels • Large block: better compression in “flat” regions • Power of 2: for fast implementation

Using DCT in JPEG • DCT – basis

Basis vectors

Comparison of DFT and DCT

Using DCT in JPEG • For an almost flat surface, most Gij = 0. • For a surface that oscillates a lot, many Gij are non-zero. • G00 = DC coefficient. • Numbers at the top left of Gij give the contribution of low-frequency sinusoids to the surface; numbers at the bottom right give the high-frequency contribution.

Using DCT in JPEG • Scan each block in zig-zag order
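
The zig-zag scan can be generated by walking the anti-diagonals of the block and alternating direction, so that low-frequency coefficients come first. The helper names below are made up for this sketch.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in zig-zag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col selects one anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Even diagonals run from bottom-left to top-right, odd ones the other way.
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order


def zigzag_scan(block):
    """Flatten a square block into a 1-D list following the zig-zag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


block = [[8 * r + c for c in range(8)] for r in range(8)]
print(zigzag_scan(block)[:6])   # [0, 1, 8, 16, 9, 2]
```

After quantisation, the tail of the scanned list tends to be a long run of zeros, which the entropy coder can represent compactly.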

Image compression using DCT • DCT enables image compression by concentrating most image information in the low frequencies. • Lose unimportant image information by cutting the Gij at the bottom right. • The decoder computes the inverse DCT (IDCT).

Quantization and Coding Zonal Coding: Coefficients outside the zone mask are zeroed. • The coefficients outside the zone may contain significant energy • Local variations are not reconstructed properly
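
A minimal sketch of zonal coding follows. The triangular zone (keep coefficient (u, v) when u + v is below a threshold) is an assumption for illustration, since the slides do not give the mask shape, and the coefficient values are invented.

```python
def zonal_mask(n, zone):
    """1 inside the assumed triangular low-frequency zone (u + v < zone), else 0."""
    return [[1 if u + v < zone else 0 for v in range(n)] for u in range(n)]


def apply_zone(coeffs, zone):
    """Zero every coefficient that falls outside the zone mask."""
    mask = zonal_mask(len(coeffs), zone)
    return [[c * m for c, m in zip(crow, mrow)] for crow, mrow in zip(coeffs, mask)]


def energy(coeffs):
    return sum(c * c for row in coeffs for c in row)


# Hypothetical block: strong DC, a few low-frequency terms, one strong high-frequency term.
coeffs = [[0.0] * 8 for _ in range(8)]
coeffs[0][0], coeffs[0][1], coeffs[1][0], coeffs[7][7] = 400.0, 30.0, -20.0, 50.0

kept = apply_zone(coeffs, 4)
print(sum(map(sum, zonal_mask(8, 4))))          # 10 coefficients survive
print(energy(coeffs) - energy(kept))            # 2500.0 of energy is discarded
```

The discarded energy comes entirely from the coefficient at (7, 7), which shows the weakness noted above: a fixed zone throws away significant high-frequency detail whenever a block happens to contain it.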

Poor Lena (30:1 compression)

JPEG: Lena survived after 12:1 compression

Motion JPEG • The JPEG system for compressing static images could be applied to a sequence of images, compressing each individually; this is called motion JPEG. • Motion JPEG takes no advantage of any correlation between successive images. • In a typical scene there will be a great deal of similarity between nearby images of the same sequence.

Motion Compensation Approach • Basic idea of motion compensation: • Many “moving” images or image sequences consist of a static background with one or more moving foreground objects. We can get a coding advantage from this. • We code the first frame by baseline JPEG and use this frame as the reference image. • Treat the second image block by block and compare each block with the same block in the reference image. • For blocks that have an identical block in the reference image, we send only a special code instead of the full code for the block. • For other blocks, we just encode them as usual.

Motion Compensation Approach (cont.) • Motion vectors • A static background is a very special case; in general we should consider the displacement of the block. • A motion vector is used to inform the decoder exactly where in the previous image to get the data. • The motion vector would be zero for a static background.

Motion Compensation Approach (cont.) • Block matching: how to find the matching block? • Matching criteria: • In practice we cannot expect to find an exactly identical matching block; instead we look for a close match. • Most motion estimation schemes look for the minimum mean square error (MMSE) between blocks. • Matching block size: • The size of the matching block affects coding efficiency. • Block size used by MPEG: 16×16
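
Block matching with a mean-square-error criterion can be sketched as an exhaustive ("full") search over a small window. The 4×4 block, the ±3 search window and the synthetic frames are assumptions that keep the example short; the function names are hypothetical.

```python
def mse(cur, cy, cx, ref, ry, rx, size):
    """Mean square error between a block in the current frame and one in the reference."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            d = cur[cy + dy][cx + dx] - ref[ry + dy][rx + dx]
            total += d * d
    return total / (size * size)


def best_match(cur, ref, y, x, size, search):
    """Full search: return ((dy, dx), error) of the closest block within +/- search."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            ry, rx = y + dy, x + dx
            if 0 <= ry <= h - size and 0 <= rx <= w - size:
                err = mse(cur, y, x, ref, ry, rx, size)
                if best is None or err < best[1]:
                    best = ((dy, dx), err)
    return best


# Synthetic frames: the current frame is the reference moved 1 pixel down and 2 right.
reference = [[16 * y + x for x in range(12)] for y in range(12)]
current = [[reference[y - 1][x - 2] if y >= 1 and x >= 2 else 0 for x in range(12)]
           for y in range(12)]

print(best_match(current, reference, 4, 4, 4, 3))   # ((-1, -2), 0.0)
```

The motion vector (-1, -2) tells the decoder to fetch the block from one row up and two columns left in the reference image. A full search costs (2·search + 1)² block comparisons per block, which is why practical encoders often use faster search strategies.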

Motion Compensation Approach (cont.) • Search range: • It is reasonable to consider a displacement of 360 pixels/s or about 60 pixels/image in standard-definition television. • In real-world scenes there is usually more or faster motion horizontally than vertically, so generally the width of the search area should be twice the height. • Suggested search range: ±60 pixels × ±30 pixels

Motion Compensation Approach (cont.) • Residuals • The differences between the block being coded and its best match are known as residuals. • The residuals may be encoded and transmitted along with the motion vector, so the decoder will be able to reconstruct the block. • We should compare the bits needed to transmit the motion vector plus the residuals with the bits needed to transmit the block itself, and use the more efficient mechanism.
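
The residual idea can be shown on a tiny block. In this sketch the sample values are invented, and `cost_proxy` (a sum of absolute values) is only a crude, hypothetical stand-in for a real bit count; it ignores the motion vector bits and the DCT and entropy coding stages.

```python
def residual(block, prediction):
    """Encoder side: difference between the block and its best match."""
    return [[b - p for b, p in zip(brow, prow)] for brow, prow in zip(block, prediction)]


def reconstruct(prediction, resid):
    """Decoder side: best match (found via the motion vector) plus the residual."""
    return [[p + r for p, r in zip(prow, rrow)] for prow, rrow in zip(prediction, resid)]


def cost_proxy(values):
    """Hypothetical stand-in for a bit count: sum of absolute sample values."""
    return sum(abs(v) for row in values for v in row)


block = [[120, 122], [119, 125]]        # block being coded
prediction = [[118, 122], [120, 124]]   # best match from the reference frame

resid = residual(block, prediction)     # [[2, 0], [-1, 1]]
use_inter = cost_proxy(resid) < cost_proxy(block)
print(resid, use_inter, reconstruct(prediction, resid) == block)
```

Because the residual values are small, they are far cheaper to code than the block itself, and the decoder still recovers the block exactly when the residual is sent without loss.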

MPEG-1 Introduction • MPEG: Moving Picture Experts Group. • MPEG video addresses the compression of video signals at about 1.5 Mbit/s. • MPEG-1 is an asymmetric system: the complexity of the encoder is much higher than that of the decoder. • Table 1: MPEG-1 Constraints

MPEG Hierarchy • The six layers of the MPEG video bit stream: • Sequence Layer: video clip, complete program item. • Group of Pictures (GOP) Layer: includes frames coded in three different ways. • Frame Layer • Slice Layer: allows recovery in case the data is lost or corrupted. • Macroblock Layer: 16×16 luminance block. • Block Layer (DCT unit)

Frame Types in MPEG • Intra frames (I-frames) • An I-frame is encoded using only information from within that frame (intra coded), with no temporal (inter-coded) compression. • Non-intra frames (P-frames and B-frames) • Motion-compensated information is used for coding. • A P-frame (predicted frame) uses the preceding frame as its reference image. • A B-frame (bidirectional frame) uses both the preceding frame and the following frame as reference images.

Motion estimation for different frames • [Figure: frames X, Y and Z; regions of Y are available from the earlier frame (X) or from the later frame (Z)]

Reconstructing a reference frame that will be the same as at the decoder • [Figure: frame being coded → DCT → Quantize → variable-length coder → transmit buffer; the quantized output also passes through De-quantize → Inverse DCT to form the reference frame]

A typical group of pictures in display order: I B B B P B B B P B B B P (frames 1 to 13) • A typical group of pictures in coding order: I P B B B P B B B P B B B (display numbers 1 5 2 3 4 9 6 7 8 13 10 11 12)
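
The reordering follows from the fact that a B-frame cannot be decoded until both of its reference frames have arrived. A small sketch, assuming the display order I B B B P B B B P B B B P and a made-up function name:

```python
def coding_order(display_types):
    """Move each reference frame (I or P) ahead of the B-frames that precede it in display order."""
    order, pending_b = [], []
    for number, ftype in enumerate(display_types, start=1):
        if ftype == "B":
            pending_b.append(number)      # hold B-frames until their later reference is sent
        else:
            order.append(number)          # send the reference frame first
            order.extend(pending_b)       # then the B-frames that depend on it
            pending_b = []
    return order + pending_b              # trailing B-frames would wait for the next reference


display = list("IBBBPBBBPBBBP")
print(coding_order(display))   # [1, 5, 2, 3, 4, 9, 6, 7, 8, 13, 10, 11, 12]
```

The output matches the frame numbers listed for the coding order, and it also shows why the decoder needs buffering: frame 5 arrives before frames 2 to 4 but is displayed after them.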

Coding of Macroblock • A macroblock consists of six 8×8 blocks: Y blocks 0, 1, 2 and 3, CB block 4 and CR block 5. • [Figure: spatial sampling relationship for MPEG-1, showing luminance samples and colour-difference samples]

Coding of Macroblock (cont.) • Intra coding of macroblocks • Just as JPEG does. • MPEG has two default quantization tables: one for intra coding, another for non-intra coding of residuals. • [Tables: JPEG quantization table (luminance); MPEG quantization table (for intra coding)]

Coding of Macroblock (cont.) • Non-intra coding of macroblocks • The first step is to intra code the macroblock, in case we fail to find a reasonable match in motion estimation. • Then we use motion estimation to find the nearest match and get the motion vector. Only luminance samples are used in motion estimation. • Then each DCT block in the macroblock is treated separately. The residuals are encoded by DCT and quantization (using a flat table) as in intra coding, with the DC coefficient coded along with the AC coefficients. • This process is applied to all six blocks in the macroblock. • Motion vectors are coded predictively.

Coding of Macroblock (cont.) • P-frames • If the block can be skipped, we just send a “skip” code. • Otherwise, we compare the total number of bits for inter and intra coding, choose the more efficient one, and mark the block accordingly. • B-frames • Comparison among three methods of encoding.

A Simplified MPEG encoder • [Figure: block diagram with IN, frame recorder, motion predictor, DCT, quantize, variable-length coder, transmit buffer and OUT; a rate controller sets the scale factor from the buffer fullness; de-quantize and inverse DCT rebuild the reference frame; DC prediction, prediction encoder and motion vectors feed the variable-length coder]

MPEG Standards • MPEG stands for the Moving Picture Experts Group. MPEG is an ISO/IEC working group, established in 1988 to develop standards for digital audio and video formats. There are five MPEG standards being used or in development. Each compression standard was designed with a specific application and bit rate in mind, although MPEG compression scales well with increased bit rates. They include: • MPEG-1 • MPEG-2 • MPEG-4 • MPEG-7 • MPEG-21 • MP3 (Layer 3 of MPEG-1 audio)

MPEG Standards • MPEG-1: Designed for up to 1.5 Mbit/sec. Standard for the compression of moving pictures and audio. This was based on CD-ROM video applications, and is a popular standard for video on the Internet, transmitted as .mpg files. In addition, Layer 3 of MPEG-1 audio is the most popular standard for digital compression of audio, known as MP3. MPEG-1 is the standard of compression for VideoCD, the most popular video distribution format throughout much of Asia. • MPEG-2: Designed for between 1.5 and 15 Mbit/sec. Standard on which digital television set-top boxes and DVD compression are based. It is based on MPEG-1, but designed for the compression and transmission of digital broadcast television. The most significant enhancement from MPEG-1 is its ability to efficiently compress interlaced video. MPEG-2 scales well to HDTV resolution and bit rates, obviating the need for an MPEG-3. • MPEG-4: Standard for multimedia and Web compression. MPEG-4 is based on object-based compression, similar in nature to the Virtual Reality Modeling Language. Individual objects within a scene are tracked separately and compressed together to create an MPEG-4 file. This results in very efficient compression that is very scalable, from low bit rates to very high. It also allows developers to control objects independently in a scene, and therefore introduce interactivity. • MPEG-7: This standard, currently under development, is also called the Multimedia Content Description Interface. When released, the group hopes the standard will provide a framework for multimedia content that will include information on content manipulation, filtering and personalization, as well as the integrity and security of the content. Contrary to the previous MPEG standards, which described actual content, MPEG-7 will represent information about the content. • MPEG-21: Work on this standard, also called the Multimedia Framework, has just begun. MPEG-21 will attempt to describe the elements needed to build an infrastructure for the delivery and consumption of multimedia content, and how they will relate to each other.

JPEG • JPEG stands for Joint Photographic Experts Group. It is also an ISO/IEC working group, but works to build standards for continuous tone image coding. JPEG is a lossy compression technique used for full-color or gray-scale images, by exploiting the fact that the human eye will not notice small color changes. • JPEG 2000 is an initiative that will provide an image coding system using compression techniques based on the use of wavelet technology.

DV • DV is a high-resolution digital video format used with video cameras and camcorders. The standard uses DCT to compress the pixel data and is a form of lossy compression. The resulting video stream is transferred from the recording device via FireWire (IEEE 1394), a high-speed serial bus capable of transferring data up to 50 MB/sec. • H.261 is an ITU standard designed for two-way communication over ISDN lines (video conferencing) and supports data rates which are multiples of 64Kbit/s. The algorithm is based on DCT and can be implemented in hardware or software and uses intraframe and interframe compression. H.261 supports CIF and QCIF resolutions. • H.263 is based on H.261 with enhancements that improve video quality over modems. It supports CIF, QCIF, SQCIF, 4CIF and 16CIF resolutions. • H.264

DivX Compression • DivX is a software application that uses the MPEG-4 standard to compress digital video, so it can be downloaded over a DSL/cable modem connection in a relatively short time with no reduced visual quality. The latest version of the codec, DivX 4.0, is being developed jointly by DivXNetworks and the open source community. DivX works on Windows 98, ME, 2000, CE, Mac and Linux.
