T325: Technologies for digital media

Presentation Transcript


  1. T325: Technologies for digital media Second semester – 2011/2012 Tutorial 5 – Video and Audio Coding (1-2) Arab Open University – Spring 2012

  2. Introduction • Video coding in MPEG-2 • MPEG audio coding Outline Arab Open University – Spring 2012

  3. Introduction Arab Open University – Spring 2012

  4. Digital video coding techniques have been used since the 1970s in television studios, where equipment costs and the large bandwidths required at the time were not major considerations. • Digital vs. Analog • Digital techniques allow much greater processing flexibility than analogue • Digital material can be re-recorded many times over without loss of quality. • BUT the large bandwidth and higher cost of receivers meant that digital video coding was not appropriate for domestic broadcast systems at that time. Digital vs. Analog – At the beginning Arab Open University – Spring 2012

  5. Digital coding has become a practicable possibility for the domestic market due to: • Rapid reduction in the cost of digital processing hardware → reduced equipment cost • Development of highly efficient digital video compression techniques → minimized bandwidth requirements Introduction Arab Open University – Spring 2012

  6. What are the advantages of digital techniques over analogue techniques for broadcast TV? Question Arab Open University – Spring 2012

  7. The effect of transmission impairments on picture quality is far less than in the analogue case → eliminates ‘ghost’ pictures due to the presence of multiple signal transmission paths and reflections. • Digital television allows more channels to be accommodated in a given bandwidth • Different types of program material, such as teletext or subtitles in several languages, can be accommodated much more flexibly with digital coding. Digital vs. Analog for Broadcast TV Arab Open University – Spring 2012

  8. What do you know about the following standards: JPEG, MPEG? • What are the MPEG standards you have used? Questions Arab Open University – Spring 2012

  9. JPEG stands for Joint Photographic Experts Group • Set up to develop standards for the digital coding of still pictures. • ‘Joint’ → done jointly by the CCITT (now ITU-T) and the ISO • ‘Experts’ → were drawn from industry, universities, broadcasting authorities, etc. Introduction Arab Open University – Spring 2012

  10. MPEG stands for Moving Picture Experts Group. • Set up by the ISO to develop coding standards for moving pictures. • Defined a number of standards for the compression of moving pictures • MPEG-1 • MPEG-2 • MPEG-4 • MPEG-7 • MPEG-21 MPEG Standards Arab Open University – Spring 2012

  11. MPEG-1 was designed mainly for the efficient storage of moving pictures on CD-ROM, but in a format not suitable for television. • MPEG-2 is effectively a ‘tool box’ of compression techniques which can cater for a wide range of systems, present and future, including low, standard and high definition systems. • MPEG-1 and -2 also include various audio standards, one of which, the so-called Audio Layer III, is the basis of MP3 coding. MPEG-2 is still widely used in digital television. MPEG standards Arab Open University – Spring 2012

  12. MPEG-4 • Initially intended to provide very high compression rates, allowing for transmission of moving images at rates of 64 kbps or less. • Its aims were later extended to the provision of flexible standards for a wide range of audiovisual material. • It is proposed for the new HDTV planned for many countries over the next few years. • It is already (in 2008) used in many commercial devices, such as domestic video cameras, personal digital assistants (PDAs) and web-based video such as the BBC iPlayer. MPEG standards Arab Open University – Spring 2012

  13. MPEG-7 • Specifies the way multimedia content can be indexed, and thus searched for in a variety of ways relating to the specific medium. • It also has intellectual property aspects • Involves the idea of ‘metadata’: data that describes the nature of the multimedia object to ease searching. • MPEG-21 • Includes additional digital rights management • Will be considered further in Block 2. MPEG standards Arab Open University – Spring 2012

  14. Video Coding in MPEG-2 Arab Open University – Spring 2012

  15. Both in films and television, moving scenes are shown as a series of fixed pictures • Generated at a rate of about 25 per second • The effect of motion being produced by changes from one picture to the next. • There is often very little change between consecutive pictures • MPEG-2 coding takes advantage of this to achieve high degrees of compression (inter-frame compression). • Even in the case of single pictures, there can be a good deal of redundancy → it is possible to remove some fine detail without our perceiving any significant loss of quality (intra-frame compression). Introduction Arab Open University – Spring 2012

  16. Digital audio and video systems are based on the principle of sampling the original sound or image, and processing the samples in order to achieve the desired result, whether transmission, storage or processing of the sound or vision. • The sampling rate ultimately depends on the quantity of ‘information’ in the original signal. • Useful information in an audio or video signal is dependent on the way human beings perceive sound, light intensity and color. Introduction Arab Open University – Spring 2012

  17. What will be covered in this part? • How the luminance and the two chrominance signals are sampled before any compression is applied • Compressed coding of still pictures, which involves the use of JPEG techniques • The way correlation between successive pictures is used by MPEG • Various levels of compression available with MPEG-2 • Forms of audio coding used with MPEG-2 Introduction Arab Open University – Spring 2012

  18. The human eye is less sensitive to color than to brightness → the chrominance signal does not have to be sampled at such a high rate as the luminance signal. Sampling formats Arab Open University – Spring 2012

  19. Figure (a) represents the luminance sampling. • The figure represents part of a camera scanning raster • Circles show the times when the camera output luminance signal is sampled. • The samples are taken consecutively along each line at the sampling rate of 13.5 MHz • a sample is taken every 1/(13.5 × 10⁶) s ≈ 0.074 μs. Sampling formats Arab Open University – Spring 2012

  20. The Cb and Cr chrominance signals are sampled at half the luminance rate • 4:2:2 sampling takes chrominance samples which coincide with alternate luminance ones Sampling formats Arab Open University – Spring 2012

  21. 4:2:0 sampling • The chrominance sample values are obtained by averaging the values for corresponding points on two consecutive scan lines. • They represent the chrominance values half-way between these lines • This averaging avoids the more abrupt changes in color that would result from simply omitting half the chrominance samples. This is one of the main formats used for MPEG-2 coding. Sampling format Arab Open University – Spring 2012
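A minimal NumPy sketch of the two schemes described above (the array names and sizes are hypothetical): 4:2:2 keeps alternate chrominance samples along each line, while 4:2:0 additionally averages the samples of consecutive line pairs so that the stored value represents the color half-way between the two lines.

import numpy as np

def subsample_422(chroma):
    # 4:2:2 - keep every other chrominance sample along each line
    return chroma[:, ::2]

def subsample_420(chroma):
    # 4:2:0 (as described above) - halve horizontally, then average the
    # samples of each pair of consecutive lines
    halved = chroma[:, ::2]
    return (halved[0::2, :] + halved[1::2, :]) / 2.0

cb = np.random.rand(8, 16)        # hypothetical 8-line, 16-sample Cb plane
print(subsample_422(cb).shape)    # (8, 8)
print(subsample_420(cb).shape)    # (4, 8)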

  22. Sampling formats: comparison Arab Open University – Spring 2012

  23. When even lower resolution is acceptable, source intermediate format (SIF) may be used. • Used for MPEG-1 coding. • The quality is comparable with that of a VHS video recorder. Sampling formats Arab Open University – Spring 2012

  24. 4:2:0 sampling vs. SIF sampling • Sixteen luminance samples are replaced by four • Four chrominance samples are replaced by just one. • The net effect is that both the luminance and chrominance resolutions are halved in both the vertical and horizontal directions. Sampling format Arab Open University – Spring 2012
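Continuing the NumPy sketch above, a hedged illustration of the further reduction to SIF: both the luminance and the already subsampled chrominance planes are decimated 2:1 in each direction, so 16 luminance samples become 4 and 4 chrominance samples become 1. A real encoder would normally filter before decimating; simple 2 × 2 averaging is used here only to show the principle.

def to_sif(plane):
    # Halve the resolution in both directions by averaging 2x2 groups
    # (assumes even dimensions; illustrative only)
    return (plane[0::2, 0::2] + plane[0::2, 1::2] +
            plane[1::2, 0::2] + plane[1::2, 1::2]) / 4.0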

  25. MPEG is designed to squeeze out as much redundancy as possible in order to achieve high levels of compression. This is done in two stages: • Spatial compression: • uses the fact that, in most pictures, there is considerable correlation between neighboring areas, to compress each picture in a video sequence separately. • Temporal compression: • uses the fact that, in most picture sequences, there is normally very little change during the 1/25 s interval between one picture and the next. • The resulting high degree of correlation between consecutive pictures allows a considerable amount of further compression. The coding of still pictures Arab Open University – Spring 2012

  26. The first stage of spatial compression uses a variant of the Fourier transform known as the discrete cosine transform (DCT), applied to 8×8 blocks of data. • The luminance information for a row of “n” consecutive pixels will consist of “n” numbers. • Example of a transform: doubling each number → a form of amplification which would double the picture brightness. • A reversible transform is one in which a set of “n” original data values are converted into a new set of “n” values in such a way that the original set can be recovered by applying what is called the inverse transform to the new set. • Because the process is applied to digital data, that is to a set of discrete numbers, the transform is called a discrete transform. Discrete Cosine Transform (DCT) Arab Open University – Spring 2012

  27. The row of pixels is a digital version of the original analogue signal consisting of a time-varying luminance signal • The value of each consecutive pixel representing the signal luminance at each consecutive sampling interval. • If the picture is a meaningful one, abrupt changes in sample values will be relatively rare. • There are a number of transforms that can be applied to digital data samples so as to yield a set of numbers which correspond, in effect, to the amplitudes of the frequency components of the spectrum of the original analogue signal. Discrete Cosine Transform Arab Open University – Spring 2012

  28. What do you think this implies about the frequency spectrum of the luminance signal? • Abrupt changes in a signal correspond to high frequency components in its spectrum. • If there are not many abrupt changes in the luminance values, then the amplitude of high frequency components will, in general, be small compared with that of low frequency components. Activity 5.1 Arab Open University – Spring 2012

  29. DCT is used for JPEG and MPEG coding. • DCT is a reversible transform • Applied to “n” original samples, it yields “n” amplitude values, and applying the reverse transform to these “n” amplitudes enables one to recover the original sample values. • But converting “n” original values into “n” new ones does not achieve anything in terms of compression. Discrete Cosine Transform Arab Open University – Spring 2012

  30. If the high-frequency components are sufficiently small, then setting them to zero before carrying out the reverse transform will produce a picture which, to a human observer, is effectively the same as the original one → this is the essence of the compression process! Discrete Cosine Transform Arab Open University – Spring 2012

  31. At the transmitter • A DCT is applied to sets of “n” picture samples to yield “n” amplitude values. • In most cases, the majority of the amplitudes are negligible. • This results in a data set containing many zero values, and such a data set can be compressed and transmitted using far fewer bits than the original samples • Only the relatively few values that make a significant contribution to the perceived picture are transmitted directly. Discrete Cosine Transform Arab Open University – Spring 2012

  32. At the receiving end • The reverse DCT is applied on a set of “n” samples consisting of the received samples, together with the appropriate number of zero-amplitude samples. • Ignoring the low amplitude, high frequency components means that the overall transform process is no longer reversible in a mathematical sense • BUT this does not matter, so long as the recovered picture is sufficiently like the original one to meet the reproduction quality requirements of the system. The discrete cosine transform Arab Open University – Spring 2012
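A minimal sketch of this forward-transform / discard / reverse-transform round trip on a single row of eight luminance samples, assuming SciPy's orthonormal DCT; the sample values and the threshold are illustrative only, not taken from the standard.

import numpy as np
from scipy.fft import dct, idct

samples = np.array([120, 122, 121, 119, 118, 117, 118, 120], dtype=float)

coeffs = dct(samples, norm='ortho')      # 8 samples -> 8 amplitude values

# 'Transmitter': set the negligible (mostly high-frequency) amplitudes to zero
threshold = 2.0                          # illustrative value only
kept = np.where(np.abs(coeffs) < threshold, 0.0, coeffs)

# 'Receiver': reverse DCT on the kept amplitudes plus the implied zeros
recovered = idct(kept, norm='ortho')

print(np.count_nonzero(kept), 'of 8 amplitudes kept')
print(np.round(recovered, 1))            # close, but not identical, to the original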

  33. Figure (a) shows the variation of luminance along part of a picture line of length w. • The variation with distance, x, along the line obeys a cosine law with one complete ‘period’ of the cosine taking place over distance w. • The variation of luminance with distance, L1, say, can be represented by the equation L1 = A1cos(2πx/w). The discrete cosine transform Arab Open University – Spring 2012

  34. The resulting picture is shown as a thick line in figure (b). • Peak white at either end of the line and the darkest region in the middle. The discrete cosine transform Arab Open University – Spring 2012

  35. Figure (c) shows the case when two complete cycles of luminance just fit into length w of the line. The luminance can be expressed as L2 = A2cos(2π·2x/w) = A2cos(4πx/w) • The resulting pattern is shown in Figure (d). The discrete cosine transform Arab Open University – Spring 2012

  36. Taking “w” as our unit of length, we can think of the pattern for L1 as having a spatial frequency of one cycle per unit length and of L2 as having a spatial frequency of two cycles per unit length. • This idea can be extended to higher frequencies with luminance components of the form Lr = Arcos(2πrx/w) • where r = 1, 2, 3, … and Lr has spatial frequency “r” cycles per unit length. The discrete cosine transform Arab Open University – Spring 2012

  37. Figure shows examples of five cosine patterns. • The spatial frequencies for (a) to (d) are 1, 2, 3 and 4 cycles per unit length respectively. • Figure (e) shows a zero frequency, that is constant luminance, pattern which corresponds to a DC component in terms of the usual frequency spectra. The discrete cosine transform Arab Open University – Spring 2012

  38. By combining spatial luminance patterns in appropriate amounts, that is, by adding components with appropriately chosen amplitudes (so that the rth component is of the form Arcos(2πrx/w), with amplitude Ar), one can reproduce any luminance pattern. • In general, abrupt changes in amplitude values for a set of adjacent pixels will be unlikely. • Because of this, the higher spatial frequency components of the pattern may be negligible and do not need to be transmitted. • Applying the reverse transform at the receiving end will recover a satisfactory picture despite the absence of the higher components. The discrete cosine transform Arab Open University – Spring 2012
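A hedged NumPy sketch of this synthesis: a short luminance line built by summing a dc (mean) level and a few spatial cosine components Arcos(2πrx/w); the amplitude values here are made up purely for illustration.

import numpy as np

w = 1.0                                   # take the line length as the unit of length
x = np.linspace(0.0, w, 8, endpoint=False)

mean_level = 128.0                        # zero-frequency (dc) term
amplitudes = {1: 40.0, 2: 10.0, 3: 2.0}   # illustrative A_r values; higher r negligible

luminance = mean_level + sum(A * np.cos(2 * np.pi * r * x / w)
                             for r, A in amplitudes.items())
print(np.round(luminance, 1))             # one 8-sample luminance line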

  39. DCT yields a finite set of discrete frequency components equal in number to the original number of samples in the segment being analyzed • Frequencies are 1, 2, 3, 4 ... times the lowest frequency, together with a zero-frequency (dc) term. • If a line segment consists of eight samples, the DCT will yield eight amplitudes for components with spatial frequencies of 0, 1, 2, ..., 7 cycles per unit length. The discrete cosine transform Arab Open University – Spring 2012

  40. A much higher degree of compression can be achieved by using the DCT simultaneously horizontally (along lines) and vertically (along columns). • This is done by applying a two-dimensional DCT to rectangular 8 × 8 blocks of pixels. • The two-dimensional DCT applied to the 64 luminance values of an 8 × 8 block yields 64 amplitudes of two-dimensional spatial cosine functions. • The spatial frequencies range from 0 (dc term) to 7 in both directions. • The luminance in each block varies as a cosine function in both the horizontal and vertical directions. The discrete cosine transform – 2D Arab Open University – Spring 2012
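A minimal sketch of the two-dimensional case, assuming SciPy: the 1-D DCT is applied along the rows of an 8 × 8 block and then along the columns, giving the 64 amplitudes; the block contents are arbitrary.

import numpy as np
from scipy.fft import dct, idct

def dct2(block):
    # 2-D DCT: 1-D DCT along each row, then along each column
    return dct(dct(block, axis=1, norm='ortho'), axis=0, norm='ortho')

def idct2(coeffs):
    # inverse 2-D DCT
    return idct(idct(coeffs, axis=0, norm='ortho'), axis=1, norm='ortho')

block = np.full((8, 8), 100.0)
block[:, 4:] += 20.0                       # a simple left/right luminance step

coeffs = dct2(block)
print(round(coeffs[0, 0], 1))              # dc term (8 x average luminance with this scaling)
print(np.allclose(idct2(coeffs), block))   # True: the transform is reversible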

  41. The 64 two-dimensional cosine functions. The discrete cosine transform Arab Open University – Spring 2012

  42. In general, DCTs can be carried out on arrays of n × n amplitude values. But why is n = 8 chosen? Question Arab Open University – Spring 2012

  43. Computation turns out to be more efficient if “n” is a power of 2. So “n” could be chosen to be 2 or 4 or 8 or 16 and so on. • The bigger the value of “n”, the more computation is involved and the more time is taken by the transform process. • Also, the smaller the value of “n”, the greater the inherent errors in the process. • These errors show up as differences between the original amplitudes and the amplitudes obtained by using the reverse DCT on the result of carrying out a DCT on the original amplitudes. • Tests on typical data indicate errors of the order of 5% for a 4 × 4 transform and 1% for an 8 × 8 transform. Beyond this point, the errors drop very slowly with increasing “n”, being of the order of 0.5% for a 256 × 256 transform. • A 1% luminance error is not really perceptible, whereas a 5% error is. So the 8 × 8 transform is often the optimum choice. The discrete cosine transform Arab Open University – Spring 2012

  44. The DCT computation is much more efficient, and hence faster, if the original block is symmetrical in both the horizontal and vertical directions. • Thus, if the DCT is to be applied to the block shown in (a) below, the transform which is used is applied to the extended block of (b). The discrete cosine transform Arab Open University – Spring 2012

  45. The extended block to which the DCT is applied has twice the height and twice the width of the original block. • The original block lies in the top left quarter of the extended block and can be reconstructed by combining the top left quarters of the full two-dimensional cosine functions whose amplitudes have been determined by carrying out the DCT. The discrete cosine transform Arab Open University – Spring 2012

  46. Fig. Example of an 8 × 8 DCT. In the figure above, each amplitude applies to a different component, and the way the amplitudes are ordered in the right-hand transform output block is shown to the right. Arab Open University – Spring 2012

  47. The output block is organized so that the horizontal frequencies increase from left to right and the vertical frequencies increase from top to bottom. • The top-left component, with zero vertical and horizontal frequencies, is the dc term which represents the average luminance of the block. • The minus signs in the DCT output represent phase differences. Arab Open University – Spring 2012

  48. Looking at the DCT output block of the figure above, the dc term A00 = 826, the A20 term = 15 and the A14 term = −2. • It turns out that each of the components can either be in phase, or 180° out of phase, with any of the others. This is a consequence of applying the transform to a symmetrical block such as (b) above. The discrete cosine transform Arab Open University – Spring 2012

  49. Humans are not very sensitive to fine detail at low luminance levels. • This allows higher spatial frequency components below a certain magnitude to be eliminated. • This is known as Thresholding. • The values of components below a certain threshold are each replaced by a zero value. • Threshold tables are stored in the encoder Thresholding and requantization Arab Open University – Spring 2012

  50. Also, in general, humans are less sensitive to the contribution of high-frequency components compared with lower ones. • This is taken into account by using requantization: fewer bits are used for the higher-frequency components which remain after thresholding than for the low-frequency ones. Thresholding and requantization Arab Open University – Spring 2012
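A hedged sketch of these two steps applied to an 8 × 8 block of DCT amplitudes: values below a threshold are replaced by zero, and the survivors are requantized with coarser step sizes for the higher spatial frequencies. The threshold and the step-size 'table' are invented for illustration and are not the tables defined by the standard.

import numpy as np

def threshold_and_requantize(coeffs, threshold=4.0):
    # Zero amplitudes whose magnitude is below the threshold, then requantize
    # the survivors with step sizes that grow with spatial frequency
    n = coeffs.shape[0]
    u, v = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
    steps = 1.0 + u + v                      # illustrative step-size 'table'
    kept = np.where(np.abs(coeffs) < threshold, 0.0, coeffs)
    return np.round(kept / steps) * steps    # fewer levels at higher frequencies

# Hypothetical 8 x 8 DCT output: a large dc term and small high-frequency terms
rng = np.random.default_rng(0)
coeffs = rng.normal(0.0, 3.0, (8, 8))
coeffs[0, 0] = 826.0
out = threshold_and_requantize(coeffs)
print(np.count_nonzero(out), 'of 64 amplitudes survive')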
