Entropy coding (Lempel/Ziv)

Entropy coding (Lempel/Ziv) • The coder and the decoder both build up an equivalent dictionary of metasymbols, each of which represents a whole sequence of input tokens. • If a sequence is repeated after a symbol (index) was found for it, then only the symbol becomes part of the coded data and the sequence of tokens referenced by the symbol becomes part of the decoded data later. • As the dictionary is built up based on the data, it is not necessary to put it into the coded data (as it is with the tree in a Huffman coder).

Example: Dictionary Input string:a b b a a b b a a b a b b a a a a b a a b b a Index(code)entry --------------------------------- 0 a 1 b 2 a b 3 b b 4 b a 5 a a 6 a b b 7 b a a 8 a b a 9 a b b a 10 a a a 11 a a b 12 b a a b 13 b b a

Entropy encoding - efficiency • This method becomes very efficient even on virtually random data. • The average compression on text and program data is about 1:2. • The ratio on image data comes up to 1:8 on the average image. • A high level of input noise degrades efficiency significantly (as in run-length encoding). • Entropy coders are a little tricky to implement, as there are usually a few tables (dictionary, etc.), all built up while the algorithm runs.

Entropy encoding – Ziv-Lempel alg. • The Lempel-Ziv(LZ) family of algorithms are calledsubstitutional compressors • A buffer of the previous context from the source stream is kept. • This is done in a form of a sliding window on previous context or a dictionary collecting some previously encountered sequences (substrings). • Current source stream is matched with the previous context. • If the incoming string shows a match with the previous context, a pointer towards the previous context is encoded instead of the original string.

Lempel-Ziv-Welch (LZW) Algorithm • The LZWalgorithm can be summarized as follows: Initialize the dictionary to contain all characters (blocks of length 1); w=nil; while (read a character k) { if wk exists in the dictionary w=wk; else add wk to the dictionary output the code for w; w=k; }

Example • Input string: a b b a a b b a a b a b b a a a a b a a b b a Dictionary Index entryw=nil; 0 a while (read a character k) 1 b { if wk exists in the dictionary w=wk; else add wk to the dictionary output the code for w; w=k; }

Example Input: a b b a a b b a a b a b b a a a a b a a b b a Output: 0 1 1 0 2 4 2 6 5 5 7 3 0 Index entry 0 a 1 b 2 a b 3 b b 4 b a 5 a a 6 a b b 7 b a a 8 a b a 9 a b b a 10 a a a 11 a a b 12 b a a b 13 b b a

GIF • GIF (Graphics Interchange Format) from CompuServe has been developed for graphic interchange, irrespective of the system. • GIF is a lossless method of compression. • A GIF graphic is stored as a sequence of pixels with RGB colour values. The position of individual pixels does not have to be indicated separately, since they are stored sequentially (from top left to bottom right).

GIF cont. • All GIF files start with a signature (version number, etc). This is followed by the screen definition and the global colour scale of the GIF-generator hardware. • Image definition, local colour scale and raster data follow. • The raster data is compressed according to the LZW algorithm • GIFs can also be animated

GIF cont. • The maximum compression available with a GIF depends on the amount of repetition there is in an image. A flat colour will compress well - sometimes even down to one tenth of the original file size - while a complex, non-repetitive image will fare worse, perhaps only saving only 20%. • There are problems with GIFs. One is that they are limited to a palette of 256 colours or less (more later).

Lossy coding techniques • Lossy image coding techniques normally have three components: • image modellingwhich defines such things as the transformation to be applied to the image • parameter quantisationwhereby the data generated by the transformation is quantised to reduce the amount of information (continuous numbers mapped to one of a set of discrete ones). • encoding, where a code is generated by associating appropriate codewords to the raw data produced by the quantiser. • Each of these operations contributes to the compression.

Modelling • Image modelling is aimed at the exploitation of statistical characteristics of the image • i.e. high correlation, redundancy • Typical examples are transform coding methods, in which the data is represented in a different domain (for example, frequency in the case of the Fourier Transform [FT] ) • A reduced number of coefficients contains most of the original information. In many cases this first phase does not result in any loss of information.

Quantisation and Encoding • The aim of quantisation is to reduce the amount of data used to represent the information within the new domain. • Quantisation is in most cases not a reversible operation: therefore, it belongs to the so called 'lossy' methods. • Encoding is usually error free. It optimises the representation of the information (helping, sometimes, to further reduce the bit rate), and may introduce some error detection codes.

Quantisation and Encoding • Many people don't have full-colour (24 bit per pixel) display hardware. Inexpensive display hardware stores 8 bits per pixel, so it can display at most 256 distinct colours at a time. To display a full-colour image, the computer must choose an appropriate set of representative colours and map the image into these colours. This process is called "colour quantization". • On the other hand, a GIF image is restricted to 256 or fewer colors. A GIF always has a specific number of colours in its palette, and the format doesn't allow more than 256 palette entries.

JPEG • Joint Photographic Experts Group standard • This was (and is) a group of experts nominated by national standards bodies and major companies to work to produce standards for continuous tone image coding. • For lossy compression of monochrome or full-colour (24bit), greyscale, digital still images. • Usually uses a transform method followed by run length encoding followed by Huffmann encoding • It subdivides the image into squares (8X8 pixels, etc.) or blocks • The squares can be seen on badly-compressed JPEGs • Then it uses a Discrete Cosine Transformation to turn the square of data into a set of curves, some small and some big, that together make up the image. This is where the lossy bit comes in: depending on how much you want to compress the image the algorithm throws away the less significant part of the data (the smaller curves) which adds less to the overall "shape" of the image.

JPEG • This means that, unlike GIF, you get a say in how much you want to compress an image by. However the lossy compression method can generate artifacts - unwanted effects such as false colour and blockiness (from subdividing the image into blocks) - if not used carefully. • JPEG is designed to exploit known limitations of the human eye, notably the fact that small colour changes are perceived less accurately than small changes in brightness. Thus, JPEG is intended for compressing images that will be looked at by humans. • If you plan to machine-analyze your images (e.g., for accurate measurements, etc.), the small errors introduced by JPEG may be a problem, even if they are invisible to the eye.

JPEG Vs. GIF • JPEG is not going to displace GIF entirely; for some types of images, GIF is superior in image quality, file size, or both. • Generally speaking, JPEG is superior to GIF for storing full-coluor or gray-scale images of "realistic" scenes (scanned photographs) • JPEG has a hard time with very sharp edges: a row of pure-black pixels adjacent to a row of pure-white pixelstend to come out blurred unless you use a very high quality setting. • Any smooth variation in colour, such as occurs in highlighted or shaded areas, will be represented more faithfully and in less space by JPEG than by GIF. • GIF does significantly better on images with only a few distinct colours, e.g. line drawings and simple cartoons. Not only is GIF lossless for such images, but it often compresses them more than JPEG can. For example, large areas of pixels that are all exactly the same colour are compressed very efficiently by GIF.

Example of how well GIFs handle flat colour This is a JPEG of a bar chart, compressed at the highest quality setting in Photoshop. If you look at this image in thousands or millions of colours it probably looks fine. But someone with a 256 colour graphics set-up sees a considerably dithered image, which is the same image seen under 8 bits, for example, in Netscape.

Example of how well GIFs handle flat colour Turning the graphic into a GIF, which uses a set palette instead of JPEGs inherent 16 million colours, forces Netscape to attempt to display the image with your original flat colours. It is possible to save even mpre space by reducing the bit-depth still further. This is a 4-bit image (eg 16 colours). There's a little false colour where the image was anti-aliased when its size was reduced in Photoshop, but the quality is still quite acceptable.

Example JPEG high quality: uses 43k of disk space medium quality: no visible artifacts , 13K of disk space Same image compressed with lowest quality setting in Photoshop, and then blown up 4 times (3.5k) same picture seen under 256 colours in Netscape

JPEG vs GIF • Summary: GIF uses a lossless compression scheme which is optimized for images with regions of solid colour. JPEG uses a lossy compression scheme which is optimized for images with many mixed colours (like photographs). • When to use GIF • Small, iconic and thumbnail images • Graphics, logos, logotype • Flat colour areas • Some small-area photographic/continuous-tone images • When you absolutely want to support every browser • When to consider not using GIF • Some small-area photographic/continuous-tone images • Large-area photographic/continuous-tone Images • When .gif is bigger than .jpeg • Lots and lots of colours

Entropy coding (Lempel/Ziv)