Image Compression (Chapter 8)

CS474/674 – Prof. Bebis

- Digital images require huge amounts of space for storage and large bandwidths for transmission.
- A 640 x 480 color image requires close to 1MB of space.

- The goal of image compression is to reduce the amount of data required to represent a digital image.
- Reduce storage requirements and increase transmission rates.

- Lossless
- Information preserving
- Low compression ratios

- Lossy
- Not information preserving
- High compression ratios

- Trade-off: image quality vs compression ratio

- Data and information are not synonymous terms!
- Data is the means by which information is conveyed.
- Data compression aims to reduce the amount of data required to represent a given quantity of information while preserving as much information as possible.

- The same amount of information can be represented by various amounts of data, e.g.:

Ex1:

Your wife, Helen, will meet you at Logan Airport in Boston at 5 minutes past 6:00 pm tomorrow night

Your wife will meet you at Logan Airport at 5 minutes past 6:00 pm tomorrow night

Helen will meet you at Logan at 6:00 pm tomorrow night

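Compression ratio: $C = n_1 / n_2$, where $n_1$ and $n_2$ are the amounts of data (e.g., bits) in the original and compressed representations.

- Relative data redundancy: $R_D = 1 - 1/C$

Example: if $C = 10$ (i.e., 10:1), then $R_D = 0.9$, i.e., 90% of the data in the original representation is redundant.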

(1) Coding

(2) Interpixel

(3) Psychovisual

- Compression attempts to reduce one or more of these redundancy types.

Code: a list of symbols (letters, numbers, bits etc.)

Code word: a sequence of symbols used to represent a piece of information or an event (e.g., gray levels).

Code word length: number of symbols in each code word

N x M image

rk: k-th gray level

P(rk): probability of rk

l(rk): # of bits for rk

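Expected value (average code word length): $L_{avg} = E[l(r_k)] = \sum_{k=0}^{L-1} l(r_k) P(r_k)$ bits/pixel, where L is the number of gray levels; the whole image then requires $N M L_{avg}$ bits.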

- l(rk) = constant length

Example:

- l(rk) = variable length
- Consider the probability of the gray levels:

variable length

- Interpixel redundancy implies that any pixel value can be reasonably predicted by its neighbors (i.e., correlated).

autocorrelation: f(x)=g(x)

- To reduce interpixel redundancy, the data must be transformed into another format (i.e., through a transformation)
- e.g., thresholding, differences between adjacent pixels, DFT

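- Example:

[Figure: original image, its intensity profile along line 100, the chosen threshold, and the thresholded (binary) image; the thresholded image is coded as pairs using (1+10) bits/pair]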

- The human eye does not respond with equal sensitivity to all visual information.
- It is more sensitive to the lower frequencies than to the higher frequencies in the visual spectrum.
- Idea: discard data that is perceptually insignificant!

Example: quantization

[Figure: the same image with 256 gray levels, with 16 gray levels, and with 16 gray levels after adding a small pseudo-random number to each pixel prior to quantization]

C=8/4 = 2:1
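The quantization example can be sketched in a few lines. This is a minimal illustration, not the slides' exact procedure: the function names, the random seed, and the choice of a uniform pseudo-random value in [0, 15] are assumptions made here.

```python
import random

def quantize(pixels, bits=4):
    """Requantize 8-bit gray levels to `bits` bits by keeping the high-order bits."""
    shift = 8 - bits
    return [(p >> shift) << shift for p in pixels]

def quantize_with_noise(pixels, bits=4, seed=0):
    """Add a small pseudo-random number to each pixel before quantizing.

    Assumption: the noise is uniform in [0, 2^(8-bits)) and the sum is clipped at 255.
    """
    rng = random.Random(seed)
    shift = 8 - bits
    out = []
    for p in pixels:
        noisy = min(255, p + rng.randrange(1 << shift))
        out.append((noisy >> shift) << shift)
    return out

ramp = list(range(256))            # hypothetical test signal: a gray-level ramp
plain = quantize(ramp)
noisy = quantize_with_noise(ramp)
print(len(set(plain)), "gray levels after quantization")
```

Both outputs use 4 bits per pixel instead of 8, which is where the 2:1 ratio comes from; the added noise only breaks up the false contours that plain quantization produces in smooth regions.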

- What is the information content of a message/image?
- What is the minimum amount of data that is sufficient to describe completely an image without loss of information?

- Information generation is assumed to be a probabilistic process.
- Idea: associate information with probability!

A random event E with probability P(E) contains:

$I(E) = \log \frac{1}{P(E)} = -\log P(E)$ units of information

Note: I(E)=0 when P(E)=1

- Suppose that gray level values are generated by a random variable; then rk contains:

$I(r_k) = -\log P(r_k)$ units of information!

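How much information does an image contain?

- Average information content of an image:

$E = \sum_{k=0}^{L-1} I(r_k) P(r_k)$

using $I(r_k) = -\log P(r_k)$:

Entropy: $H = -\sum_{k=0}^{L-1} P(r_k) \log P(r_k)$ units/pixel

(assumes statistically independent random events)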
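As a rough illustration of the entropy definition, the sketch below estimates P(rk) from relative frequencies and uses base-2 logarithms, so the result is in bits/pixel. The function name and the two toy pixel lists are assumptions made for this example.

```python
from collections import Counter
from math import log2

def first_order_entropy(pixels):
    """Estimate H = -sum P(r_k) log2 P(r_k) from the gray-level histogram."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * log2(c / n) for c in counts.values())

four_levels = [0, 1, 2, 3] * 4   # four equally likely gray levels
constant = [7] * 16              # a single gray level: nothing to convey

print(first_order_entropy(four_levels))
print(first_order_entropy(constant))
```

Equally likely gray levels give the highest entropy for a given number of levels, while a constant image carries no information at all.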

Redundancy (revisited)

- Redundancy: $R = L_{avg} - H$

where: $L_{avg} = \sum_{k=0}^{L-1} l(r_k) P(r_k)$

Note: if Lavg = H, then R = 0 (no redundancy)

- It is not easy to estimate H reliably!

image

First order estimate of H:

- Second order estimate of H:
- Use relative frequencies of pixel blocks :

image

- The first-order estimate provides only a lower-bound on the compression that can be achieved.
- Differences between higher-order estimates of entropy and the first-order estimate indicate the presence of interpixel redundancy!

Need to apply transformations!

- For example, consider differences:

16

- Entropy of difference image:

- Better than before (i.e., H=1.81 for original image)
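The effect of the difference transformation on the entropy estimate can be checked numerically. The 4x8 image below is an assumption: it was chosen for this sketch so that its first-order estimate comes out close to the H=1.81 quoted above, and it is not taken from the slides.

```python
from collections import Counter
from math import log2

def entropy(values):
    """First-order entropy estimate in bits per value."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())

def difference_rows(image):
    """Keep the first pixel of each row, then store differences between adjacent pixels."""
    return [[row[0]] + [row[i] - row[i - 1] for i in range(1, len(row))]
            for row in image]

# Hypothetical image: four identical rows with two flat regions and a ramp
row = [21, 21, 21, 95, 169, 243, 243, 243]
image = [row[:] for _ in range(4)]

h_original = entropy([p for r in image for p in r])
h_difference = entropy([d for r in difference_rows(image) for d in r])
print(round(h_original, 2), round(h_difference, 2))
```

The difference image has a lower first-order estimate because horizontally adjacent pixels are correlated, which is exactly the interpixel redundancy the mapping is meant to expose.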

- However, a better transformation could be found since:

- Mapper: transforms input data in a way that facilitates reduction of interpixel redundancies.

Quantizer: reduces the accuracy of the mapper’s output in accordance with some pre-established fidelity criteria.

Symbol encoder: assigns the shortest code to the most frequently occurring output values.

- Inverse operations are performed.
- But … quantization is irreversible in general.

How close is the reconstructed image $\hat{f}(x,y)$ to the original image $f(x,y)$?

Criteria

Subjective: based on human observers

Objective: mathematically defined criteria

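Root mean square error (RMS): $e_{rms} = \sqrt{\frac{1}{MN} \sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[\hat{f}(x,y) - f(x,y)\right]^2}$

Mean-square signal-to-noise ratio (SNR): $SNR_{ms} = \frac{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \hat{f}(x,y)^2}{\sum_{x=0}^{M-1} \sum_{y=0}^{N-1} \left[\hat{f}(x,y) - f(x,y)\right]^2}$

where $f$ is the original image and $\hat{f}$ is the decompressed image.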
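Both objective criteria are straightforward to compute. The sketch below works on flattened pixel lists; the function names and the four sample values are assumptions made for illustration.

```python
from math import sqrt

def rms_error(original, decoded):
    """Root mean square error between two equally sized pixel lists."""
    n = len(original)
    return sqrt(sum((d - o) ** 2 for o, d in zip(original, decoded)) / n)

def ms_snr(original, decoded):
    """Mean-square SNR: energy of the decoded image over energy of the error."""
    noise = sum((d - o) ** 2 for o, d in zip(original, decoded))
    signal = sum(d ** 2 for d in decoded)
    return float("inf") if noise == 0 else signal / noise

original = [10, 20, 30, 40]   # hypothetical pixel values
decoded = [12, 18, 30, 44]

print(rms_error(original, decoded), ms_snr(original, decoded))
```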

RMSE = 5.17

RMSE = 15.67

RMSE = 14.17

- Huffman coding is a variable-length coding technique.
- Optimal code (i.e., minimizes the number of code symbols per source symbol).
- Assumption: symbols are encoded one at a time!

- Forward Pass
- 1. Sort probabilities per symbol
- 2. Combine the lowest two probabilities
- 3. Repeat Step 2 until only two probabilities remain.

- Backward Pass
Assign code symbols going backwards

- Lavg using Huffman coding:
- Lavg assuming binary codes:

- After the code has been created, coding/decoding can be implemented using a look-up table.
- Note that decoding is done unambiguously.
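The forward and backward passes can be sketched with a priority queue. This is one common way to build a Huffman code, not necessarily the construction shown on the slides; the six symbol probabilities are hypothetical values chosen for this example.

```python
import heapq
from itertools import count

def huffman_code(probs):
    """Return {symbol: bit string} for a dict {symbol: probability}."""
    if len(probs) == 1:
        return {s: "0" for s in probs}
    tiebreak = count()  # keeps heap entries comparable when probabilities tie
    heap = [(p, next(tiebreak), {s: ""}) for s, p in probs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Combine the two lowest probabilities; prepend one bit to each group
        p0, _, group0 = heapq.heappop(heap)
        p1, _, group1 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in group0.items()}
        merged.update({s: "1" + c for s, c in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tiebreak), merged))
    return heap[0][2]

def huffman_decode(bits, code):
    """Decode with a look-up table; works because no code word prefixes another."""
    table = {word: s for s, word in code.items()}
    out, word = [], ""
    for b in bits:
        word += b
        if word in table:
            out.append(table[word])
            word = ""
    return out

probs = {"a1": 0.4, "a2": 0.3, "a3": 0.1, "a4": 0.1, "a5": 0.06, "a6": 0.04}
code = huffman_code(probs)
l_avg = sum(probs[s] * len(code[s]) for s in probs)
message = ["a2", "a1", "a6", "a3"]
encoded = "".join(code[s] for s in message)
print(code, round(l_avg, 2))
```

The exact bit patterns depend on how ties are broken, but the average code word length is the same for every valid Huffman tree built from these probabilities.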

- Arithmetic coding does not assume that source symbols are encoded one at a time.
- Sequences of source symbols are encoded together.
- There is no one-to-one correspondence between source symbols and code words.

- Slower than Huffman coding but typically achieves better compression.

- A sequence of source symbols is assigned a single arithmetic code word which corresponds to a sub-interval in [0,1].
- As the number of symbols in the message increases, the interval used to represent it becomes smaller.
- Smaller intervals require more information units (i.e., bits) to be represented.

Encode message: a1 a2 a3 a3 a4

1) Assume message occupies [0, 1)

2) Subdivide [0, 1) based on the probability of ai

3) Update interval by processing source symbols

Encoding a1 a2 a3 a3 a4 yields the interval [0.06752, 0.0688), or, e.g., the single value 0.068

- The message a1 a2 a3 a3 a4 is encoded using 3 decimal digits or 3/5 = 0.6 decimal digits per source symbol.
- The entropy of this message is:
-(3 x 0.2log10(0.2)+0.4log10(0.4))=0.5786 digits/symbol

Note: finite precision arithmetic might cause problems due to truncations!
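The interval update can be sketched with exact rational arithmetic, which sidesteps the finite-precision issue for a short message. The probabilities (0.2, 0.2, 0.4, 0.2 for a1..a4) are an assumption read off the entropy expression above, and the function names are made up for this example.

```python
from fractions import Fraction

# Assumed source model: P(a1)=P(a2)=P(a4)=0.2, P(a3)=0.4
PROBS = {"a1": Fraction(1, 5), "a2": Fraction(1, 5),
         "a3": Fraction(2, 5), "a4": Fraction(1, 5)}

def cumulative_ranges(probs):
    """Map each symbol to its sub-interval [low, high) of [0, 1)."""
    ranges, low = {}, Fraction(0)
    for symbol, p in probs.items():
        ranges[symbol] = (low, low + p)
        low += p
    return ranges

def arithmetic_encode(message, probs):
    """Return the final interval [low, high) that represents the message."""
    ranges = cumulative_ranges(probs)
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        width = high - low
        s_low, s_high = ranges[symbol]
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = arithmetic_encode(["a1", "a2", "a3", "a3", "a4"], PROBS)
print(float(low), float(high))
```

Any number inside the final interval, such as 0.068, identifies the message, provided the decoder also knows how many symbols to expect.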

Arithmetic Decoding

Decode 0.572

[Figure: successive subdivision of the interval during decoding: [0.0, 1.0) → [0.4, 0.8) → [0.56, 0.72) → [0.56, 0.592) → [0.5664, 0.5728) → [0.57152, 0.5728)]

Decoded message: a3 a3 a1 a2 a4
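Decoding reverses the process: at each step, find the sub-interval that contains the code value and narrow the interval to it. The sketch assumes the same source model as the encoding example (P(a1)=P(a2)=P(a4)=0.2, P(a3)=0.4) and that the decoder is told the message length; both are assumptions, as is the function name.

```python
from fractions import Fraction

# Assumed source model, listed in sub-interval order from 0 upwards
PROBS = {"a1": Fraction(1, 5), "a2": Fraction(1, 5),
         "a3": Fraction(2, 5), "a4": Fraction(1, 5)}

def arithmetic_decode(value, n_symbols, probs):
    """Recover n_symbols source symbols from a code value in [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    message = []
    for _ in range(n_symbols):
        width = high - low
        s_low = low
        for symbol, p in probs.items():
            s_high = s_low + width * p
            if s_low <= value < s_high:
                message.append(symbol)
                low, high = s_low, s_high
                break
            s_low = s_high
    return message

decoded = arithmetic_decode(Fraction("0.572"), 5, PROBS)
print(decoded)
```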

- LZW coding requires no a priori knowledge of the pixel probability distribution.
- Assigns fixed length code words to variable length sequences.
- Patented Algorithm US 4,558,302
- Included in GIF and TIFF and PDF file formats

Dictionary Location : Entry

- 0 : 0
- 1 : 1
- .. : ..
- 255 : 255
- 256 : -
- 511 : -

- A codebook (or dictionary) needs to be constructed.
- Initially, the first 256 entries of the dictionary are assigned to the gray levels 0,1,2,..,255 (i.e., assuming 8 bits/pixel)

Initial Dictionary

Consider a 4x4, 8 bit image

39 39 126 126

39 39 126 126

39 39 126 126

39 39 126 126

- As the encoder examines image pixels, gray level sequences (i.e., blocks) that are not in the dictionary are assigned to a new entry.

- Is 39 in the dictionary? Yes

- What about 39-39? No

- Then add 39-39 as entry 256

Concatenated Sequence: CS = CR + P

where CR is the currently recognized sequence and P is the pixel being processed.

CR = empty

If CS is found:

(1) No Output

(2) CR=CS

else:

(1) Output D(CR)

(2) Add CS to D

(3) CR=P

- The dictionary which was used for encoding need not be sent with the image.
- Can be built on the “fly” by the decoder as it reads the received code words.
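The encoder rules above, and the decoder rebuilding the dictionary as it reads code words, can be sketched as follows. This is a simplified illustration: it ignores the 512-entry limit of the slide's dictionary, and the function and variable names are assumptions.

```python
def lzw_encode(pixels, n_levels=256):
    """Encode a pixel sequence; CR and CS follow the naming used in the rules."""
    dictionary = {(g,): g for g in range(n_levels)}
    cr, out = (), []
    for p in pixels:
        cs = cr + (p,)
        if cs in dictionary:
            cr = cs                               # keep growing the sequence
        else:
            out.append(dictionary[cr])            # output D(CR)
            dictionary[cs] = len(dictionary)      # add CS to D
            cr = (p,)
    if cr:
        out.append(dictionary[cr])                # flush the last sequence
    return out

def lzw_decode(codes, n_levels=256):
    """Rebuild the dictionary on the fly while decoding."""
    if not codes:
        return []
    dictionary = {g: (g,) for g in range(n_levels)}
    prev = dictionary[codes[0]]
    out = list(prev)
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:
            entry = prev + (prev[0],)             # code for the entry being built
        out.extend(entry)
        dictionary[len(dictionary)] = prev + (entry[0],)
        prev = entry
    return out

image = [39, 39, 126, 126] * 4                    # the 4x4 image, row by row
codes = lzw_encode(image)
print(codes)
```

For this image, 16 pixels are replaced by 10 code words; with 9-bit code words (512 dictionary locations) that is 90 bits instead of 128.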

- A predictive coding approach.
- Each pixel value (except at the boundaries) is predicted based on its neighbors (e.g., linear combination) to get a predicted image.
- The difference between the original and predicted images yields a differential or residual image.
- i.e., the residual image has a much smaller dynamic range of pixel values than the original.

- The differential image is encoded using Huffman coding.
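A minimal sketch of the idea, assuming the simplest possible predictor (each pixel is predicted by its left neighbor) rather than a general linear combination; the sample row and function names are hypothetical, and the entropy coding step is left out.

```python
def predictive_encode(row):
    """Send the first pixel as is, then the prediction errors."""
    residual = [row[0]]
    for i in range(1, len(row)):
        prediction = row[i - 1]          # previous-pixel predictor (assumption)
        residual.append(row[i] - prediction)
    return residual

def predictive_decode(residual):
    """Rebuild the row by adding each error to the running prediction."""
    row = [residual[0]]
    for d in residual[1:]:
        row.append(row[-1] + d)
    return row

row = [100, 102, 103, 103, 101, 98, 97, 97]   # hypothetical scan line
residual = predictive_encode(row)
print(residual)
```

Apart from the first value, the residuals cluster around zero, so a variable-length code such as Huffman can represent them with fewer bits than the original pixel values.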

Run-length coding is used to reduce the size of a repeating string of characters (i.e., runs):

1 1 1 1 1 0 0 0 0 0 0 1 → (1,5) (0,6) (1,1)

a a a b b b b b b c c → (a,3) (b,6) (c,2)

Encodes a run of symbols into two bytes: (symbol, count)

Can compress any type of data but cannot achieve high compression ratios compared to other compression methods.
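The two examples above can be reproduced with a short sketch; the function names are assumptions, and the (symbol, count) pairs are kept as Python tuples rather than packed into two bytes.

```python
from itertools import groupby

def rle_encode(symbols):
    """Collapse each run of identical symbols into a (symbol, count) pair."""
    return [(s, len(list(run))) for s, run in groupby(symbols)]

def rle_decode(pairs):
    """Expand (symbol, count) pairs back into the original sequence."""
    return [s for s, n in pairs for _ in range(n)]

bits = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(rle_encode(bits))
print(rle_encode("aaabbbbbbcc"))
```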

- An effective technique to reduce interpixel redundancy is to process each bit plane individually.
(1) Decompose an image into a series of binary images.

(2) Compress each binary image (e.g., using run-length coding)

- Assuming that a message has been encoded using Huffman coding, additional compression can be achieved using run-length coding.

e.g., (0,1)(1,1)(0,1)(1,0)(0,2)(1,4)(0,2)

- Transform the image into a domain where compression can be performed more efficiently (i.e., reduce interpixel redundancies).

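~ (N/n)^2 subimages

The magnitude of the FT decreases as u, v increase, so the reconstruction can keep only the first K << N terms in each direction (i.e., sums over u and v running from 0 to K-1).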

- T(u,v) can be computed using various transformations, for example:
- DFT
- DCT (Discrete Cosine Transform)
- KLT (Karhunen-Loeve Transformation)

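DCT, forward: $T(u,v) = \alpha(u)\alpha(v) \sum_{x=0}^{N-1} \sum_{y=0}^{N-1} f(x,y) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$

DCT, inverse: $f(x,y) = \sum_{u=0}^{N-1} \sum_{v=0}^{N-1} \alpha(u)\alpha(v) T(u,v) \cos\left[\frac{(2x+1)u\pi}{2N}\right] \cos\left[\frac{(2y+1)v\pi}{2N}\right]$

where $\alpha(u) = \sqrt{1/N}$ if u=0 and $\alpha(u) = \sqrt{2/N}$ if u>0; likewise $\alpha(v) = \sqrt{1/N}$ if v=0 and $\alpha(v) = \sqrt{2/N}$ if v>0.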

- Basis set of functions for a 4x4 image (i.e.,cosines of different frequencies).
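A direct (slow) implementation of the 2-D DCT and its inverse helps make the basis functions concrete. The sketch assumes the commonly used orthonormal definition with factors sqrt(1/N) and sqrt(2/N); the helper names and the sample 4x4 block are made up for this example.

```python
from math import cos, pi, sqrt

def alpha(k, n):
    """Normalization factor: sqrt(1/n) for k = 0, sqrt(2/n) otherwise."""
    return sqrt(1.0 / n) if k == 0 else sqrt(2.0 / n)

def basis(x, u, n):
    """1-D cosine basis function of frequency u sampled at position x."""
    return cos((2 * x + 1) * u * pi / (2 * n))

def dct2(f):
    """Forward 2-D DCT of a square block given as a list of rows."""
    n = len(f)
    return [[alpha(u, n) * alpha(v, n)
             * sum(f[x][y] * basis(x, u, n) * basis(y, v, n)
                   for x in range(n) for y in range(n))
             for v in range(n)] for u in range(n)]

def idct2(t):
    """Inverse 2-D DCT: a weighted sum of the basis images."""
    n = len(t)
    return [[sum(alpha(u, n) * alpha(v, n) * t[u][v]
                 * basis(x, u, n) * basis(y, v, n)
                 for u in range(n) for v in range(n))
             for y in range(n)] for x in range(n)]

flat = [[10] * 4 for _ in range(4)]          # constant block: only T(0,0) is non-zero
coeffs = dct2(flat)
block = [[52, 55, 61, 66], [70, 61, 64, 73], [63, 59, 55, 90], [67, 61, 68, 104]]
restored = idct2(dct2(block))
print(round(coeffs[0][0], 6))
```

A constant block puts all of its energy into the single lowest-frequency coefficient, which is the energy compaction that transform coding relies on.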

[Figure: reconstructions using the DFT, WHT, and DCT with 8 x 8 subimages (64 coefficients per subimage) and 50% of the coefficients truncated]

RMS error: 2.32 (DFT), 1.78 (WHT), 1.13 (DCT)

- DCT minimizes "blocking artifacts" (i.e., boundaries between subimages do not become very visible).

- DFT: n-point periodicity gives rise to discontinuities!
- DCT: 2n-point periodicity prevents discontinuities!

- Subimage size selection:

[Figure: original image and reconstructions using 2 x 2, 4 x 4, and 8 x 8 subimages]

- JPEG is an image compression standard which was accepted as an international standard in 1992.
- Developed by the Joint Photographic Experts Group of the ISO/IEC for coding and compression of color/gray scale images.
- Yields acceptable compression in the 10:1 range.
- A scheme for video compression based on JPEG, called Motion JPEG (MJPEG), also exists.

- JPEG uses DCT for handling interpixel redundancy.
- Modes of operation:
(1) Sequential DCT-based encoding

(2) Progressive DCT-based encoding

(3) Lossless encoding

(4) Hierarchical encoding

[Figure: JPEG encoder (ending in the entropy encoder) and JPEG decoder (starting with the entropy decoder)]

1. Divide the image into 8x8 subimages.

For each subimage do:

2. Shift the gray levels to the range [-128, 127]

- The DCT requires the range to be centered around 0

3. Apply the DCT (i.e., 64 coefficients)

- 1 DC coefficient: F(0,0)

- 63 AC coefficients: F(u,v)

(i.e., non-centered spectrum)

4. Quantize the coefficients (i.e., reduce the amplitude of coefficients that do not contribute a lot):

$F_q(u,v) = \mathrm{round}\left(F(u,v) / Q(u,v)\right)$

Q(u,v): quantization table

- Quantization Table Q[i][j]

Quantization
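The level shift and the quantization step can be sketched as below. The table is an assumption: it is the widely reproduced example luminance table from the JPEG standard's informative annex, which may or may not be the table shown on the slide, and the coefficient values are hypothetical. Note that Python's `round` rounds exact halves to the nearest even integer, which a real codec may handle differently.

```python
# Assumed example luminance quantization table (JPEG standard, informative annex)
Q_LUMA = [
    [16, 11, 10, 16, 24, 40, 51, 61],
    [12, 12, 14, 19, 26, 58, 60, 55],
    [14, 13, 16, 24, 40, 57, 69, 56],
    [14, 17, 22, 29, 51, 87, 80, 62],
    [18, 22, 37, 56, 68, 109, 103, 77],
    [24, 35, 55, 64, 81, 104, 113, 92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103, 99],
]

def level_shift(block):
    """Step 2: map gray levels from [0, 255] to [-128, 127]."""
    return [[p - 128 for p in row] for row in block]

def quantize(F, Q=Q_LUMA):
    """Step 4: divide each DCT coefficient by its table entry and round."""
    return [[round(F[u][v] / Q[u][v]) for v in range(8)] for u in range(8)]

def dequantize(Fq, Q=Q_LUMA):
    """Decoder side: multiply back; the rounding error is not recoverable."""
    return [[Fq[u][v] * Q[u][v] for v in range(8)] for u in range(8)]

# Hypothetical DCT coefficients: a large DC term and a few small AC terms
F = [[0.0] * 8 for _ in range(8)]
F[0][0], F[0][1], F[1][0], F[7][7] = -415.0, -30.0, 5.0, 20.0
Fq = quantize(F)
print(Fq[0][:2], dequantize(Fq)[0][:2])
```

Small coefficients, especially high-frequency ones with large table entries, quantize to zero; that is where the loss of information and most of the compression come from.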

5.Order the coefficients using zig-zag ordering

- Place non-zero coefficients first

- Create long runs of zeros (i.e., good for run-length encoding)
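The zig-zag order can be generated by walking the anti-diagonals of the block and alternating direction. This is one way to produce the scan, with indices given as (row, column); the function names and the sample quantized block are assumptions.

```python
def zigzag_indices(n=8):
    """Return the (row, column) visiting order of the zig-zag scan."""
    order = []
    for s in range(2 * n - 1):                      # s = row + column
        diagonal = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Even diagonals run from bottom-left to top-right, odd ones the reverse
        order.extend(reversed(diagonal) if s % 2 == 0 else diagonal)
    return order

def zigzag_scan(block):
    """Flatten a square block into a 1-D list in zig-zag order."""
    return [block[i][j] for i, j in zigzag_indices(len(block))]

def trailing_zeros(values):
    """Length of the final run of zeros."""
    n = 0
    for v in reversed(values):
        if v != 0:
            break
        n += 1
    return n

# Hypothetical quantized block: non-zero values only near the top-left corner
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = -26, -3, 1
scanned = zigzag_scan(block)
print(scanned[:6], trailing_zeros(scanned))
```

Because quantization zeroes most high-frequency coefficients, the scan ends in one long run of zeros that the next step can encode very compactly.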

6. Form intermediate symbol sequence and encode coefficients:

6.1 DC coefficients: predictive encoding

6.2 AC coefficients: variable length coding

DC: symbol_1 (SIZE), symbol_2 (AMPLITUDE), e.g., DC (6) (61)

AC: symbol_1 (RUN-LENGTH, SIZE), symbol_2 (AMPLITUDE), e.g., AC (0,2) (-3)

SIZE: # bits for encoding amplitude

RUN-LENGTH: run of zeros

The symbol sequence of each block ends with an end-of-block symbol.

- DC encoding (predictive coding): values lie in [-2048, 2047] = [-2^11, 2^11 - 1], so 1 ≤ SIZE ≤ 11
- AC encoding: values lie in [-1024, 1023] = [-2^10, 2^10 - 1], so 1 ≤ SIZE ≤ 10

Example: 0 0 0 0 0 0 476 is represented as (6,9)(476)

If RUN-LENGTH > 15, use symbol (15,0), i.e., RUN-LENGTH=16

Symbol_1: Variable Length Code (VLC)

Symbol_2: Variable Length Integer (VLI)

Example: (1,4)(12) is coded as (111110110 1100), i.e., the VLC followed by the VLI

[Figure: the same image at quality settings 10 (8k bytes; worst quality, highest compression), 50 (21k bytes), and 90 (58k bytes; best quality, lowest compression)]

[Figure: quantized and de-quantized DCT coefficients of a block, the reconstructed block in the spatial domain, and the reconstruction error]

- Could apply JPEG on the R/G/B components.
- It is more efficient to describe a color in terms of its luminance and chrominance content separately, to enable more efficient processing.
- YUV

- Chrominance can be subsampled because human vision is less sensitive to it

- Luminance: Perceived brightness of the light (proportional to the total energy in the visible band).
- Chrominance: Describes the perceived color tone of the light (depends on the wavelength composition of the light).
- Hue: Specifies the color tone (i.e., depends on the peak wavelength of the light).
- Saturation: Describes how pure the color is (i.e., depends on the spread or bandwidth of the light spectrum).

- YUV color space
- Y is the luminance component
- Cb and Cr are the chrominance components
- The values in the YUV coordinate system are related to the values in the RGB coordinate system by:
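The conversion matrix did not survive in the text, so the sketch below assumes the full-range YCbCr conversion used by JFIF files; the slide may use slightly different coefficients. The 2x2 averaging used for chrominance subsampling and all function names are likewise assumptions.

```python
def rgb_to_ycbcr(r, g, b):
    """Convert one 8-bit RGB pixel to Y, Cb, Cr (assumed JFIF coefficients)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return y, cb, cr

def subsample_2x2(plane):
    """Average each 2x2 block of a chrominance plane (even dimensions assumed)."""
    h, w = len(plane), len(plane[0])
    return [[(plane[i][j] + plane[i][j + 1]
              + plane[i + 1][j] + plane[i + 1][j + 1]) / 4
             for j in range(0, w, 2)] for i in range(0, h, 2)]

gray = rgb_to_ycbcr(128, 128, 128)     # a neutral gray has no chrominance offset
red = rgb_to_ycbcr(255, 0, 0)
cb_plane = [[100, 102, 90, 90], [98, 100, 90, 90]]   # hypothetical Cb values
print(gray, red, subsample_2x2(cb_plane))
```

Subsampling both chrominance planes this way keeps only a quarter of their samples, before any DCT coding is applied.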

[Figure: encoder and decoder for color images]

- JPEG supports several different modes
- Sequential Mode
- Progressive Mode
- Hierarchical Mode
- Lossless Mode

- Sequential is the default mode
- Each image component is encoded in a single left-to-right, top-to-bottom scan.
- This is the mode we have been describing.

- The image is encoded in multiple scans, in order to produce a quick, rough decoded image when transmission time is long.

[Figure: sequential vs. progressive decoding]

- Send DCT coefficients in multiple scans:
(1) Progressive spectral selection algorithm

(2) Progressive successive approximation algorithm

(3) Hybrid progressive algorithm

(1) Progressive spectral selection algorithm

- Group DCT coefficients into several spectral bands
- Send low-frequency DCT coefficients first
- Send higher-frequency DCT coefficients next

(2) Progressive successive approximation algorithm

- Send all DCT coefficients but with lower precision.
- Refine DCT coefficients in later scans.

(3) Hybrid progressive algorithm

- Combines spectral selection and successive approximation.

[Figure: progressively decoded image after 0.9s, 1.6s, 3.6s, and 7.0s]

- Hierarchical mode encodes the image at several different resolutions.
- Image is transmitted in multiple passes with increased resolution at each pass.

[Figure: resolution pyramid: f4 (N/4 x N/4), f2 (N/2 x N/2), f (N x N)]

- Uses predictive coding (see later)

- Each pixel value (except at the boundaries) is predicted based on certain neighbors (e.g., linear combination) to get a predicted image.
- The difference between the original and predicted images yields a differential or residual image.
- Encode differential image using Huffman coding.

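[Figure: predictive encoder: the predictor produces pm from previous inputs, the difference dm between the input xm and pm is passed to the entropy encoder]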

- Similar to lossless DPCM except that (i) it uses quantization and (ii) the pixels are predicted from the “reconstructed” values of certain neighbors.

- Divide image in non-overlapping blocks of pixels.
- Derive a bitmap (0/1) for each block using thresholding.
- e.g., use mean pixel value in each block as threshold.

- For each group of 1s and 0s, determine reconstruction value
- e.g., average of corresponding pixel values in original block.
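The steps above, commonly known as block truncation coding, can be sketched for a single block. The sketch uses the block mean as threshold and the group averages as reconstruction values, as in the bullets; the function names and the sample block are assumptions.

```python
def btc_encode(block):
    """Return (bitmap, low, high) for one block of pixels."""
    flat = [p for row in block for p in row]
    mean = sum(flat) / len(flat)
    bitmap = [[1 if p >= mean else 0 for p in row] for row in block]
    ones = [p for p in flat if p >= mean]        # never empty: max(flat) >= mean
    zeros = [p for p in flat if p < mean]
    high = sum(ones) / len(ones)
    low = sum(zeros) / len(zeros) if zeros else high
    return bitmap, low, high

def btc_decode(bitmap, low, high):
    """Replace each bit by the reconstruction value of its group."""
    return [[high if bit else low for bit in row] for row in bitmap]

block = [[10, 12, 200, 204],
         [11, 11, 198, 202]]                     # hypothetical block with an edge
bitmap, low, high = btc_encode(block)
print(bitmap, low, high)
```

Each block is stored as one bit per pixel plus two reconstruction values, so the cost per pixel drops as the block size grows, at the price of coarser reconstruction.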

- Analyze image to produce components containing frequencies in well defined bands (i.e., subbands)
- e.g., use wavelet transform.

- Optimize quantization/coding in each subband.

- Develop a dictionary of fixed-size vectors (i.e., code vectors), usually blocks of pixel values.
- Partition image in non-overlapping blocks (i.e., image vectors).
- Encode each image vector by the index of its closest code vector.

- What is a “fractal”?
- A rough or fragmented geometric shape that can be split into parts, each of which is (at least approximately) a reduced-size copy of the whole.

Idea: store images as collections of transformations!

Generated by 4 affine

transformations!

- Decompose the image into segments (i.e., using standard segmentation techniques based on edges, color, texture, etc.) and look them up in a library of IFS codes.
- Best suited for textures and natural images.

- An image coding standard for digitized fingerprints, developed and maintained by:
- FBI
- Los Alamos National Lab (LANL)
- National Institute of Standards and Technology (NIST).

- The standard employs a discrete wavelet transform-based algorithm (Wavelet/Scalar Quantization or WSQ).

- FBI is digitizing fingerprints at 500 dots per inch with 8 bits of grayscale resolution.
- A single fingerprint card turns into about 10 MB of data!

A sample fingerprint image

768 x 768 pixels = 589,824 bytes

The "white" spots in the middle of the black ridges are sweat pores

They’re admissible points of identification in court, as are the little black flesh ‘‘islands’’ in the grooves between the ridges

These details are just a couple pixels wide!

Better use a lossless method to preserve every pixel perfectly.

Unfortunately, in practice lossless methods haven’t done better than 2:1 on fingerprints!

Would JPEG work well for fingerprint compression?

file size 45853 bytes

compression ratio: 12.9

The fine details are pretty much history, and the whole image has this artificial ‘‘blocky’’ pattern superimposed on it.

The blocking artifacts affect the performance of manual or automated systems!

file size 45621 bytes

compression ratio: 12.9

The fine details are preserved better than they are with JPEG.

NO blocking artifacts!

- FBI’s target bit rate is around 0.75 bits per pixel (bpp)
- i.e., corresponds to a target compression ratio of 10.7 (assuming 8-bit images)

- This target bit rate is set via a ‘‘knob’’ on the WSQ algorithm.
- i.e., similar to the "quality" parameter in many JPEG implementations.

- Fingerprints coded with WSQ at a target of 0.75 bpp will actually come in around 15:1

Original image 768 x 768 pixels (589824 bytes)

WSQ image, file size 47619 bytes,

compression ratio 12.4

JPEG image, file size 49658 bytes,

compression ratio 11.9

WSQ image, file size 39270 bytes

compression ratio 15.0

JPEG image, file size 40780 bytes,

compression ratio 14.5

JPEG image, file size 30081 bytes,

compression ratio 19.6

WSQ image, file size 30987 bytes,

compression ratio 19.0