
# Multimedia Compression (2) - PowerPoint PPT Presentation






### Multimedia Compression (2)

Mei-Chen Yeh

03/23/2009

### Review

• Entropy

• Entropy coding

• Huffman coding

• Arithmetic coding

Lossless!

### Outline

• Revisiting Information theory

• Lossy compression

• Quantization

• Transform coding

[Diagram: source s → Encoder → Decoder → reconstruction r]

s = r: lossless

s ≠ r: lossy

### Performance measures

1. Distortion — between the source and the reconstruction

2. Rate

• Lossless coding

• Rate

• Lossy coding

• Rate

• Distortion

• Goals

• Minimize the rate

• Keep the distortion small

• Tradeoffs between the best of both worlds

• Rate-distortion theory

• Provides theoretical bounds for lossy compression

• Suppose we send a symbol x_i and receive y_j

• d(x_i, y_j) ≥ 0

• d(x_i, y_j) = 0 for x_i = y_j

• Average distortion

• D = ∑_i ∑_j p(x_i, y_j) d(x_i, y_j)

• Distortion measurement?

decibels (dB)

• User feedback

• Subjective and may be biased

• Some popular criteria:

• Mean square error (mse)

• Average of the absolute difference

• Signal-to-noise ratio (SNR)

• Peak-signal-to-noise ratio (PSNR)
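Two of these criteria are easy to compute directly. A minimal Python sketch (the function names are mine):

```python
import math

def mse(orig, recon):
    """Mean squared error between two equal-length sample sequences."""
    return sum((o - r) ** 2 for o, r in zip(orig, recon)) / len(orig)

def psnr(orig, recon, peak=255):
    """Peak-signal-to-noise ratio in decibels: 10 * log10(peak^2 / MSE)."""
    m = mse(orig, recon)
    return float("inf") if m == 0 else 10 * math.log10(peak ** 2 / m)
```

For 8-bit images, `peak` is 255; a perfect reconstruction gives an infinite PSNR.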

• Information: reduction in uncertainty

#1: Predict the outcome of a coin flip

#2: Predict the outcome of a die roll

Which has more uncertainty? #2

Next:

#1: You observe the outcome of a coin flip

#2: You observe the outcome of a die roll

Which observation provides more information? #2

• Entropy

• The average self-information

• The average amount of information provided per symbol

• The uncertainty an observer has before seeing the symbol

• The average number of bits needed to communicate each symbol
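For the coin and die above, the entropy follows directly from the definition H = −∑ p log₂ p. A quick sketch:

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

entropy([0.5, 0.5])  # fair coin: 1 bit
entropy([1/6] * 6)   # fair die: log2(6), about 2.585 bits
```

The die has higher entropy, matching the intuition that its outcome is more uncertain.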

• Example: two random variables X and Y

X: College major (Math, CS, H.)

Y: Likes “XBOX” (Yes / No)

H(Y|X=CS) = ? → 0

H(Y|X=Math) = ? → 1

• The conditional entropy H(Y|X)

H(Y|X) = 0.5H(Y|X=Math)+0.25H(Y|X=CS)+0.25H(Y|X=H.)

H(Y|X) = 0.5*1 + 0.25*0 + 0.25*0 = 0.5

conditional probability

H(X) = 1.5

H(Y) = 1

H(Y|X) = 0.5

H(X|Y) = 1

• H(X|Y)

• The amount of uncertainty remaining about X, given we know the value Y

• H(X|Y) vs. H(X)

The amount of information that X and Y convey about each other!

Mutual information

Average mutual information

• Properties

• 0 ≤ I(X; Y) = I(Y; X)

• I(X; Y) ≤ H(X)

• I(Y; X) ≤ H(Y)
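The college-major example can be checked numerically via the chain rule H(Y|X) = H(X,Y) − H(X) and I(X;Y) = H(Y) − H(Y|X). A sketch — the particular Yes/No assignments for CS and H. are my assumption, chosen only to reproduce the slide's numbers (H(X) = 1.5, H(Y) = 1, H(Y|X) = 0.5):

```python
import math

# Assumed joint distribution p(x, y) consistent with the slides:
# Math majors split 50/50 on Y; CS and H. are each deterministic.
joint = {
    ("Math", "Yes"): 0.25, ("Math", "No"): 0.25,
    ("CS", "Yes"): 0.25,
    ("H.", "No"): 0.25,
}

def H(probs):
    """Shannon entropy in bits of a probability list."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Marginals p(x) and p(y) from the joint distribution
px, py = {}, {}
for (x, y), p in joint.items():
    px[x] = px.get(x, 0) + p
    py[y] = py.get(y, 0) + p

HX = H(px.values())        # 1.5
HY = H(py.values())        # 1.0
HXY = H(joint.values())    # joint entropy H(X, Y) = 2.0
H_Y_given_X = HXY - HX     # chain rule: 0.5
I_XY = HY - H_Y_given_X    # mutual information: 0.5
```

Note that I(X;Y) = 0.5 ≤ H(Y) = 1, consistent with the properties listed above.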

Lower the bit rate R by allowing some acceptable distortion D of the signal

Given R, minimize D

Given D, minimize R

Calculates the minimum transmission bit-rate R for a required reconstruction quality

The results do not depend on a specific coding method.

With the assumption of statistical independence between distortion and reconstructed signal

Represents trade-offs between distortion and rate

Definition:

The lowest rate at which the source can be encoded while keeping the distortion less than or equal to D*

Shannon lower bound

Example: Rate distortion function for the Gaussian source

No compression system exists that performs outside the gray area!

• A zero-mean Gaussian pdf with variance σx²

• Distortion: the MSE measure

• D = E[(X − Y)²]

• The rate distortion function is:

R(D) = (1/2) log₂(σx² / D) for 0 ≤ D ≤ σx², and R(D) = 0 for D > σx²
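Evaluating this function is a one-liner; a sketch, assuming MSE distortion as stated:

```python
import math

def gaussian_rd(D, sigma2):
    """R(D) for a zero-mean Gaussian source under MSE distortion:
    0.5 * log2(sigma^2 / D) bits/sample for 0 < D <= sigma^2, else 0."""
    if D >= sigma2:
        return 0.0  # distortion sigma^2 is achievable at rate 0
    return 0.5 * math.log2(sigma2 / D)
```

For example, halving the allowed MSE costs an extra half bit per sample.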

Khalid Sayood. Introduction to Data Compression, 3rd edition, Morgan Kaufmann, 2005.

• Revisiting Information theory

• Lossy compression

• Quantization

• Transform coding

• A practical lossy compression technique

• The process of representing a large (possibly infinite) set of values with a much smaller set

Use -10, -9, …,0, 1, …, 10 (21 values) to represent real numbers (infinite values)

2.47 => 2

3.1415926 => 3

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10

Loss of information!

The reconstruction value 3 could be 2.95, 3.16, 3.05, …
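This 21-level quantizer is just rounding with clamping; a minimal sketch:

```python
def quantize(x, lo=-10, hi=10):
    """Map a real number to the nearest integer in [-10, 10]."""
    return max(lo, min(hi, round(x)))

quantize(2.47)       # -> 2
quantize(3.1415926)  # -> 3
```

The mapping is many-to-one, which is why the information loss cannot be undone: every input in [2.5, 3.5) reconstructs to 3.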

• Inputs and outputs are scalars

• Design of a scalar quantizer

• Construct intervals (Encoder)

• Assign codewords to intervals (Encoder)

• Select reconstruction values (Decoder)

-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10

• Input-output map

[Figure: staircase input-output map of a scalar quantizer — decision boundaries on the input axis, reconstruction levels on the output axis]

[Figure: the same image quantized at different rates]

• 8-bit per pixel: [0, 255]

• 1-bit per pixel: {0, 128, 255}

• 2-bit per pixel: {0, 64, 128, 196, 255}

• 3-bit per pixel: 8 intervals

• Fixed-length coding

Given an input pdf f_X(x) and the number of levels M in the quantizer, find the decision boundaries {b_i} and the reconstruction levels {y_i} so as to minimize the distortion.

• Variable-length coding

Given a distortion constraint, find the decision boundaries {b_i}, the reconstruction levels {y_i}, and binary codes that minimize the rate while satisfying the constraint.


How to generate the codebook?

Operate on blocks of data

The VQ procedure:

Samples may be correlated!

Example: height and weight of individuals

Quantization rule

Quantization regions

The quantization regions are no longer restricted to be rectangles!

[Scatter plot: correlated two-dimensional samples, x vs. y]

The Linde-Buzo-Gray Algorithm (also known as k-means)

Training set

No guarantee that the procedure will converge to the optimal solution!

Sensitive to initial points

Empty-cell

No update in an empty region

End up with an output point that is never used
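The LBG procedure can be sketched in a few lines. This is a minimal 1-D version (real codebooks use vectors); it exhibits the caveats above — the result depends on the random initialization, and an empty region simply gets no update:

```python
import random

def lbg(data, k, iters=20):
    """Minimal 1-D LBG / k-means: alternately assign each training sample
    to its nearest codeword, then move each codeword to the centroid of
    its region."""
    codebook = random.sample(data, k)  # random init: sensitive to this choice
    for _ in range(iters):
        regions = {i: [] for i in range(k)}
        for x in data:
            nearest = min(range(k), key=lambda i: (x - codebook[i]) ** 2)
            regions[nearest].append(x)
        for i, pts in regions.items():
            if pts:  # empty-cell problem: an empty region gets no update
                codebook[i] = sum(pts) / len(pts)
    return sorted(codebook)
```

On well-separated training data the codewords settle onto the cluster centroids, but nothing guarantees the globally optimal codebook.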

• LBG has problems when clusters have different

• Sizes

• Densities

• Non-globular shapes

[Figures: original points vs. LBG results with 2 and 3 clusters, illustrating these failure cases]

Use 4x4 blocks of pixels

Codebook size 16

Codebook size 64

Codebook size 256

Codebook size 1024
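The bit rates implied by these codebook sizes follow from log₂(codebook size) bits per 4x4 block, shared across its 16 pixels (a quick check):

```python
import math

def vq_bpp(codebook_size, pixels_per_block=16):
    """Bits per pixel for VQ: log2(K) bits index one of K codewords,
    amortized over the 16 pixels of a 4x4 block."""
    return math.log2(codebook_size) / pixels_per_block

[vq_bpp(k) for k in (16, 64, 256, 1024)]  # 0.25, 0.375, 0.5, 0.625 bpp
```

Even the 1024-entry codebook costs well under 1 bit per pixel, versus 8 bpp for the original image.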

• Revisiting Information theory

• Lossy compression

• Quantization

• Transform coding and the baseline JPEG

To compact most of the information into a few elements!

Slide credit: Bernd Girod

Transform → Quantization → Binary coding

• Three steps

• Divide a data sequence into blocks of size N and transform each block using a reversible mapping

• Quantize the transformed sequence

• Encode the quantized values

• Data-dependent

• Discrete Karhunen-Loève transform (KLT)

• Data-independent

• Discrete cosine transform (DCT)

• Sub-band coding

• Wavelet transform

• Also known as Principal Component Analysis (PCA), or the Hotelling transform

• Transforms correlated variables into uncorrelated variables

• Basis vectors are eigenvectors of the covariance matrix of the input signal

• Achieves optimum energy concentration

• Dependent on signal statistics

• Not separable

The frequency increases as we go from top to bottom!

Part of many standards (JPEG, MPEG, H.261, …)

The transform matrix C

Visualize the rows of C

Increased variation!
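The transform matrix C can be built directly from the DCT-II definition (the variant used in JPEG); a sketch:

```python
import math

def dct_matrix(N=8):
    """N x N DCT-II matrix: row k holds a cosine of frequency k,
    C[k][n] = a(k) * cos((2n + 1) * k * pi / (2N)),
    with a(0) = sqrt(1/N) and a(k) = sqrt(2/N) for k > 0.
    Rows are orthonormal, and frequency increases from top to bottom."""
    C = []
    for k in range(N):
        a = math.sqrt((1.0 if k == 0 else 2.0) / N)
        C.append([a * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                  for n in range(N)])
    return C
```

Row 0 is constant (the DC basis), and each later row oscillates faster — the increasing variation visualized above.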

2-D basis matrices of DCT

Performs close to the optimum KLT in terms of compaction

• DCT on an 8x8 image block

[Figure: an 8x8 image block maps, via the DCT, to an 8x8 block of coefficients — the DC coefficient at the top-left, AC coefficients of increasing frequency toward the bottom-right]

• Keep DC, remove others (quantize to 0s)

• Remove DC

• Keep DC and the first row of ACs

• Keep DC and the first column of ACs

• Keep DC and the first AC

• Keep DC and the first eight ACs

Transform → Quantization → Binary coding (DC and AC coefficients)

• The bit-rate allocation problem

• Divide the bit rate R among the transform coefficients such that the resulting distortion D is minimized

• The optimal allocation depends on the variance σ²_θk of each transform coefficient θk:

R_k = R/M + (1/2) log₂( σ²_θk / (∏_i σ²_θi)^(1/M) )

[Figure: 8x8 coefficient grid — horizontal frequency increases to the right, vertical frequency downward; the DC component sits at the top-left, the remaining coefficients are AC components; the high-frequency corner is noisy]

• The threshold coding

• Transform coefficients that fall below a threshold are discarded

• Example: a 8x8 image block

The zigzag scan
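The zigzag scan can be generated by walking the anti-diagonals of the block, alternating direction; a sketch (the resulting order matches the standard JPEG scan):

```python
def zigzag_order(N=8):
    """Coordinates of an N x N block in zigzag order: traverse each
    anti-diagonal (constant i + j), alternating direction, so that
    low-frequency coefficients come first and the long runs of zeros
    in the high frequencies cluster at the end of the sequence."""
    order = []
    for s in range(2 * N - 1):
        diagonal = [(i, s - i) for i in range(N) if 0 <= s - i < N]
        order.extend(diagonal if s % 2 else reversed(diagonal))
    return order
```

Reading the quantized coefficients in this order is what produces the long terminal run of zeros seen in the encoded sequence below.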

Sample quantization table

### The Baseline JPEG

Slide credit: Bernd Girod

[Figure: an 8x8 block from the Sena image (large pixel values) and its DCT coefficients (mostly small values)]

• Level shift — subtract the mean

• Subtract 128 from each pixel of an 8-bit image: [0, 255] → [-128, 127]

• 8x8 forward DCT

• Replicate the last column/row until the image size is a multiple of eight

[Figure: forward DCT → quantization → inverse DCT; the reconstruction only approximates the original block]

The quantization step sizes are organized in a table (the quantization table). Larger step sizes produce more quantization errors.

Quantization errors in the DC and lower AC coefficients are more easily detectable than those in the higher AC coefficients.
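The quantization and dequantization steps themselves are just per-coefficient division and rounding by the table entries. A minimal sketch (`qtable` stands in for the sample quantization table above):

```python
def quantize_block(coeffs, qtable):
    """Quantize 8x8 DCT coefficients: divide each coefficient by its
    step size from the quantization table, then round to the nearest
    integer label."""
    return [[round(coeffs[i][j] / qtable[i][j]) for j in range(8)]
            for i in range(8)]

def dequantize_block(labels, qtable):
    """Decoder-side reconstruction: multiply each label by its step size."""
    return [[labels[i][j] * qtable[i][j] for j in range(8)]
            for i in range(8)]
```

JPEG tables use larger step sizes for the high-frequency coefficients, which is how the perceptual observation above is put into practice.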

Encode: remove the mean → DCT → quantize → zigzag scan

-26 -3 0 -3 -2 -6 2 -4 1 -4 1 1 5 1 2 -1 1 -1 2 0 0 -1 1 -1 2 0 0 0 0 0 -1 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

The trailing run of zeros is replaced by an end marker (EOF):

-26 -3 0 -3 -2 -6 2 -4 1 -4 1 1 5 1 2 -1 1 -1 2 0 0 -1 1 -1 2 0 0 0 0 0 -1 -1 EOF

Entropy coding

010001010000101110000011101010011000101111100000……………

Decode

010001010000101110000011101010011000101111100000……………

Entropy decoding recovers the coefficient sequence:

-26 -3 0 -3 -2 -6 2 -4 1 -4 1 1 5 1 2 -1 1 -1 2 0 0 -1 1 -1 2 0 0 0 0 0 -1 -1 EOF

Put into a block (inverse zigzag), dequantize, apply the inverse DCT, and add the mean back; the result approximates the original block.