
Data Compression: Advanced Topics



1. Data Compression: Advanced Topics
• Huffman Coding Algorithm
  • Motivation
  • Procedure
  • Examples
• Unitary Transforms
  • Definition
  • Properties
  • Applications

2. Recall: Variable Length Codes (VLC)
Recall self-information: I(x) = -log2 p(x).
It follows from this formula that a small-probability event contains much information and is therefore worth many bits to represent. Conversely, if some event occurs frequently, it is probably a good idea to use as few bits as possible to represent it. This observation leads to the idea of varying the code lengths based on the events' probabilities:
• Assign a long codeword to an event with small probability
• Assign a short codeword to an event with large probability

3. Two Goals of VLC Design
• Achieve optimal code length (i.e., minimal redundancy). For an event x with probability p(x), the optimal code length is ceil(-log2 p(x)) bits, where ceil(x) denotes the smallest integer not less than x (e.g., ceil(3.4) = 4).
• Code redundancy: r = (average code length) - (source entropy H). Unless the probabilities of the events are all powers of 2, we often have r > 0.
• Satisfy the uniquely decodable (prefix) condition.
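A minimal numeric sketch of the two formulas above (the probabilities are an example of my choosing, not from the slides):

% optimal integer code lengths ceil(-log2 p) and the redundancy r
p = [0.5 0.25 0.125 0.125];   % example probabilities (all powers of 2)
len = ceil(-log2(p));         % optimal code lengths: [1 2 3 3]
H = -sum(p .* log2(p));       % source entropy: 1.75 bps
r = sum(p .* len) - H         % redundancy: 0, since all p are powers of 2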

4. "Big Question"
How can we simultaneously achieve minimum redundancy and the uniquely decodable condition? D. Huffman was the first to think about this problem and come up with a systematic solution.

5. Huffman Coding (Huffman, 1952)
Coding procedure for an N-symbol source (see the sketch below):
• Source reduction
  • List all probabilities in descending order
  • Merge the two symbols with the smallest probabilities into a new compound symbol
  • Repeat the above two steps N-2 times
• Codeword assignment
  • Start from the smallest reduced source and work back to the original source
  • Each merging point corresponds to a node in the binary codeword tree
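The source-reduction step can be sketched in a few lines of MATLAB. This is a minimal illustration that computes only the code lengths; the function name huffman_lengths is mine, not from the slides:

% Huffman source reduction: returns the code length of each symbol
function len = huffman_lengths(p)
len = zeros(1, numel(p));
groups = num2cell(1:numel(p));              % each symbol starts alone
while numel(p) > 1
    [p, idx] = sort(p, 'descend');          % list probabilities in descending order
    groups = groups(idx);
    merged = [groups{end-1}, groups{end}];  % merge the two smallest
    len(merged) = len(merged) + 1;          % each merge adds one bit to its members
    p = [p(1:end-2), p(end-1) + p(end)];
    groups = [groups(1:end-2), {merged}];
end
end

For Example-I below, huffman_lengths([0.5 0.25 0.125 0.125]) returns [1 2 3 3].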

6. Example-I
Step 1: Source reduction (each column lists the reduced source in descending order; compound symbols are in parentheses)

symbol x   p(x)
S          0.5     0.5          0.5
N          0.25    0.25         0.5 (NEW)
E          0.125   0.25 (EW)
W          0.125

7. Example-I (Con't)
Step 2: Codeword assignment. At each merging node, label the two branches "0" and "1" and read the codewords from the root: (EW) gets 11, (NEW) gets 1.

symbol x   p(x)     codeword
S          0.5      0
N          0.25     10
E          0.125    110
W          0.125    111

8. Example-I (Con't)
Two equally valid codeword trees, e.g.:
• S = 0, N = 10, E = 110, W = 111
• S = 1, N = 01, E = 000, W = 001
The codeword assignment is not unique. In fact, at each merging point (node), we can arbitrarily assign "0" and "1" to the two branches; the average code length is the same.
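To see the prefix (uniquely decodable) condition at work, here is a small decoding sketch using the slide-7 codebook; the bit stream is an example of my own:

% prefix decoding: no codeword is a prefix of another, so a symbol
% can be emitted as soon as the buffer matches a codeword
code = {'0', '10', '110', '111'};  symbols = 'SNEW';
bits = '0101101110';               % encodes S, N, E, W, S
out = ''; buf = '';
for b = bits
    buf = [buf b];
    [hit, k] = ismember(buf, code);
    if hit, out = [out symbols(k)]; buf = ''; end
end
out                                % returns 'SNEWS'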

9. Example-II
Step 1: Source reduction (each column lists the reduced source in descending order; compound symbols are in parentheses)

symbol x   p(x)
e          0.4     0.4          0.4          0.6 (aiou)
a          0.2     0.2          0.4 (iou)    0.4
i          0.2     0.2          0.2
o          0.1     0.2 (ou)
u          0.1

10. Example-II (Con't)
Step 2: Codeword assignment (the compound symbols receive the intermediate codewords (aiou) = 0, (iou) = 00, (ou) = 001)

symbol x   p(x)    codeword
e          0.4     1
a          0.2     01
i          0.2     000
o          0.1     0010
u          0.1     0011

11. Example-II (Con't)
Binary codeword tree representation: at the root, 0 → (aiou) and 1 → e; under (aiou), 00 → (iou) and 01 → a; under (iou), 000 → i and 001 → (ou); under (ou), 0010 → o and 0011 → u.

12. Example-II (Con't)

symbol x   p(x)    codeword   length
e          0.4     1          1
a          0.2     01         2
i          0.2     000        3
o          0.1     0010       4
u          0.1     0011       4

The average Huffman code length is 0.4*1 + 0.2*2 + 0.2*3 + 0.1*4 + 0.1*4 = 2.2 bps, against a source entropy of 2.122 bps (redundancy 0.078 bps). If we use fixed-length codes instead, we have to spend three bits per sample, which gives a code redundancy of 3 - 2.122 = 0.878 bps.
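A quick MATLAB check of the numbers above (probabilities and code lengths taken from the table):

p   = [0.4 0.2 0.2 0.1 0.1];   % e a i o u
len = [1 2 3 4 4];             % Huffman code lengths from the table
H = -sum(p .* log2(p));        % source entropy: ~2.122 bps
Lavg = sum(p .* len);          % average Huffman length: 2.2 bps
r_huffman = Lavg - H           % ~0.078 bps
r_fixed = 3 - H                % ~0.878 bps, as stated above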

13. Example-III
Step 1: Source reduction
[The source-reduction table, with its compound symbol, appears only as an image on the original slide and is not recoverable from this transcript.]

14. Example-III (Con't)
Step 2: Codeword assignment
[The codeword-assignment table, with its compound symbol, appears only as an image on the original slide and is not recoverable from this transcript.]

15. Summary of Huffman Coding Algorithm
• Achieves minimal redundancy subject to the constraint that the source symbols are coded one at a time
• Sorting symbols in descending order of probability is the key to the source-reduction step
• The codeword assignment is not unique: exchanging the "0" and "1" labels at any node of the binary codeword tree produces another solution that works equally well
• Only works for a source with a finite number of symbols (otherwise, it does not know where to start)

16. Data Compression: Advanced Topics
• Huffman Coding Algorithm
  • Motivation
  • Procedure
  • Examples
• Unitary Transforms
  • Definition
  • Properties
  • Applications

17. An Example of 1D Transform with Two Variables
[Figure: the point (1,1) in the (x1, x2) plane maps to (1.414, 0) in the (y1, y2) plane.]
One such transform matrix is A = [1 1; 1 -1]/sqrt(2) (the 2-by-2 Haar/Hadamard matrix used later in these slides): y1 = (x1 + x2)/sqrt(2), y2 = (x1 - x2)/sqrt(2), so (1,1) maps to (sqrt(2), 0) ≈ (1.414, 0).

18. Decorrelating Property of Transform
• x1 and x2 are highly correlated: p(x1, x2) ≠ p(x1) p(x2)
• y1 and y2 are less correlated: p(y1, y2) ≈ p(y1) p(y2)
Please use the MATLAB demo program to help you understand why less correlation is desirable for image compression.

19. Transform = Change of Coordinates
• Intuitively speaking, a transform plays the role of facilitating source modeling
• Due to the decorrelating property of transforms, it is easier to model the transform coefficients Y than the pixel values X
• An appropriate choice of transform (transform matrix A) depends on the source statistics P(X)
• We will only consider the class of transforms corresponding to unitary matrices

20. Unitary Matrix
Definition: a matrix A is called unitary if A^{-1} = A^{*T}, where * denotes complex conjugation and T denotes transpose.
Example: A = [1 1; 1 -1]/sqrt(2), for which A^{-1} = A^T = A.
Notes:
• The transpose and the conjugate can be exchanged, i.e., A^{*T} = A^{T*}
• For a real matrix A, it is unitary if A^{-1} = A^T
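A one-line MATLAB check of the definition, using the real example above (A' is MATLAB's conjugate transpose):

A = [1 1; 1 -1]/sqrt(2);   % the real example above
norm(A'*A - eye(2))        % ~0, so A^{-1} = A^{*T} (= A^T here)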

21. Example 1: Discrete Fourier Transform (DFT)
DFT matrix: A = [a_{kn}] with a_{kn} = W^{kn}/sqrt(N), k, n = 0, 1, ..., N-1, where W = e^{-j2*pi/N} is the N-th root of unity (a point on the unit circle in the complex plane).
DFT: y(k) = (1/sqrt(N)) * sum over n of x(n) W^{kn}

22. Discrete Fourier Transform (Con't)
Properties of the DFT matrix:
• Symmetry: A^T = A. Proof: a_{kn} = W^{kn}/sqrt(N) = a_{nk}, since the exponent kn is symmetric in k and n.
• Unitary: A^{-1} = A^*. Proof: if we denote B = A A^{*T}, then b_{kl} = (1/N) * sum over n of W^{(k-l)n}, which is 1 for k = l and 0 otherwise (a geometric series of roots of unity), so B = I (identity matrix).
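Both properties are easy to verify numerically; a sketch, building the unitary DFT matrix defined on the previous slide:

N = 8;
[n, k] = meshgrid(0:N-1);     % all (k, n) index pairs
W = exp(-1j*2*pi/N);          % N-th root of unity
A = W.^(k.*n) / sqrt(N);      % a_{kn} = W^{kn}/sqrt(N)
norm(A - A.', 'fro')          % ~0: symmetry, A^T = A
norm(A*A' - eye(N), 'fro')    % ~0: unitary, A^{-1} = A^{*T}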

23. Example 2: Discrete Cosine Transform (DCT)
C = [c_{kn}] with c_{kn} = alpha(k) cos((2n+1) k pi / (2N)), k, n = 0, 1, ..., N-1, where alpha(0) = sqrt(1/N) and alpha(k) = sqrt(2/N) for k > 0.
Unlike the DFT matrix, the DCT matrix is real. You can check it using the MATLAB demo.

24. DCT Examples
N = 2 (identical to the 2-by-2 Haar transform):
 0.7071  0.7071
 0.7071 -0.7071
N = 4:
 0.5000  0.5000  0.5000  0.5000
 0.6533  0.2706 -0.2706 -0.6533
 0.5000 -0.5000 -0.5000  0.5000
 0.2706 -0.6533  0.6533 -0.2706
Here is a piece of MATLAB code to generate the DCT matrix by yourself (dct requires the Signal Processing Toolbox):

% generate DCT matrix with size of N-by-N
function C = DCT_matrix(N)
C = zeros(N);
for i = 1:N
    x = zeros(N,1); x(i) = 1;   % i-th unit vector
    C(:,i) = dct(x);            % its DCT is the i-th column
end
end

25. Example 3: Hadamard Transform
Here is a piece of MATLAB code to generate the normalized Hadamard matrix (N = 2^n) by yourself:

% generate normalized Hadamard matrix of size N = 2^n
% (note: this shadows MATLAB's built-in hadamard, which is unnormalized)
function H = hadamard(n)
H = [1 1; 1 -1]/sqrt(2);
i = 1;
while i < n
    H = [H H; H -H]/sqrt(2);   % double the size, keeping H unitary
    i = i + 1;
end
end

26. 1D Unitary Transform
When the transform matrix A is unitary, the defined 1D transform is called a unitary transform.
Forward transform: y = A x
Inverse transform: x = A^{-1} y = A^{*T} y
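In MATLAB the forward/inverse pair is a matrix multiply and a conjugate-transposed multiply; a sketch with the 2-by-2 matrix used earlier:

A = [1 1; 1 -1]/sqrt(2);   % any unitary A works the same way
x = [3; 4];
y = A * x;                 % forward transform: y = A x
x_rec = A' * y;            % inverse transform: x = A^{*T} y
norm(x - x_rec)            % ~0: perfect reconstruction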

27. Basis Vectors
• Basis vectors corresponding to the forward transform: the column vectors of the transform matrix A
• Basis vectors corresponding to the inverse transform: the column vectors of A^{*T}

28. From 1D to 2D
Do N 1D transforms in parallel (one 1D transform for each column of the N-by-N block).

29. Definition of 2D Transform
2D forward transform: Y = A X A^T
• A X: 1D column transform (apply A to each column of X)
• (A X) A^T: 1D row transform (apply A to each row of the result)

30. 2D Transform = Two Sequential 1D Transforms
Y = A X A^T can be computed as (A X) A^T (left matrix multiplication first: column transform, then row transform) or as A (X A^T) (right matrix multiplication first: row transform, then column transform).
Conclusions (see the sketch below):
• A 2D separable transform can be decomposed into two sequential 1D transforms
• The ordering of the two 1D transforms does not matter
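A sketch confirming that the two orderings agree (hadamard here is MATLAB's built-in, which returns the unnormalized +/-1 matrix, not the slide-25 function):

N = 4; A = hadamard(N)/sqrt(N);   % normalized (unitary) Hadamard matrix
X = magic(N);                     % an arbitrary N-by-N input block
Y1 = (A * X) * A.';               % column transform first, then row
Y2 = A * (X * A.');               % row transform first, then column
norm(Y1 - Y2, 'fro')              % ~0: the ordering does not matter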

31. Basis Images
Basis image B_ij is the outer product of an N-by-1 basis vector and a 1-by-N (transposed) basis vector. It can be viewed as the response of the linear system (the 2D transform) to a delta-function input delta_ij.

32. Example 1: 8-by-8 Hadamard Transform
[Figure: the 64 basis images B_ij arranged in an 8-by-8 grid indexed by (i, j); the top-left one is the DC basis image.]
In the MATLAB demo, you can generate these 64 basis images and display them (see the sketch below).
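A sketch of how such basis images can be generated, feeding each delta-function input through the 2D inverse transform X = A^{*T} Y A^* (hadamard is MATLAB's built-in):

N = 8; A = hadamard(N)/sqrt(N);    % unitary 8-by-8 Hadamard matrix
B = cell(N, N);
for i = 1:N
    for j = 1:N
        D = zeros(N); D(i,j) = 1;  % delta-function input delta_ij
        B{i,j} = A' * D * conj(A); % basis image B_ij
    end
end
imagesc(cell2mat(B)), colormap gray, axis image   % display the 8x8 grid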

33. Example 2: 8-by-8 DCT
[Figure: the 64 DCT basis images arranged in an 8-by-8 grid indexed by (i, j); the top-left one is the DC basis image.]
In the MATLAB demo, you can generate these 64 basis images and display them.

34. 2D Unitary Transform
Suppose A is a unitary matrix.
Forward transform: Y = A X A^T
Inverse transform: X = A^{*T} Y A^*
Proof: since A is a unitary matrix, we have A^{*T} Y A^* = A^{*T} (A X A^T) A^* = (A^{*T} A) X (A^T A^*) = X.

35. Properties of Unitary Transforms
• Energy compaction: only a small fraction of the transform coefficients have large magnitude. This property is related to the decorrelating capability of unitary transforms.
• Energy conservation: a unitary transform preserves the 2-norm of input vectors. This property essentially comes from the fact that rotating the coordinates does not affect Euclidean distance.

36. Energy Compaction Property
How does a unitary transform compact the energy?
• Assumption: the signal is correlated; no energy compaction can be achieved for white noise, even with a unitary transform
• Advanced mathematical analysis shows that the DCT basis is an approximation of the eigenvectors of an AR(1) process (a good model for correlated signals such as images)
A frequency-domain interpretation:
• Images are a mixture of smooth regions and edges
• Most transform coefficients are small, except those around DC and those corresponding to edges (spatially high-frequency components)

37. Energy Compaction Example in 1D
A coefficient is called significant if its magnitude is above a pre-selected threshold th.
[Figure: a test vector and its Hadamard transform coefficients; with th = 64, only a few coefficients are significant and the rest are insignificant.]
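Since the slide's own vector did not survive the transcript, here is a substitute sketch showing the effect on a smooth (correlated) signal, using MATLAB's built-in hadamard:

N = 8; H = hadamard(N)/sqrt(N);           % unitary Hadamard matrix
x = [100 102 105 108 110 112 115 118]';   % smooth, highly correlated samples
y = H * x;                                % transform coefficients
th = 64;
nnz(abs(y) > th)                          % 1: only the DC coefficient is significant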

38. Energy Compaction Example in 2D
A coefficient is called significant if its magnitude is above a pre-selected threshold th.
[Figure: an example 2D block and its transform coefficients; with th = 64, most coefficients are insignificant.]

39. Image Example
[Figure: the original cameraman image X (left) and its DCT coefficients Y (right); low-frequency coefficients sit in the top-left corner, high-frequency ones toward the bottom-right. Only 2451 coefficients are significant at th = 64.]
Notice the excellent energy compaction property of the DCT.

40. Counter Example
[Figure: an original noise image X and its DCT coefficients Y; the coefficient magnitudes are spread roughly uniformly.]
No energy compaction can be achieved for white noise.

41. Energy Conservation Property in 1D
1D case: if A is unitary and y = A x, then ||y||^2 = ||x||^2.
Proof: ||y||^2 = y^{*T} y = (A x)^{*T} (A x) = x^{*T} A^{*T} A x = x^{*T} x = ||x||^2.

42. Numerical Example
Check that ||y||^2 = ||x||^2 for a concrete x and unitary A. (The slide's own numbers did not survive the transcript; a substitute sketch follows.)
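A = [1 1; 1 -1]/sqrt(2);     % unitary
x = [3; 4];
y = A * x;                   % y = [4.9497; -0.7071]
[norm(x)^2, norm(y)^2]       % both 25: energy is conserved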

43. Implication of Energy Conservation
Transform-domain quantization: x → (T) → y → (Q) → y_hat → (T^{-1}) → x_hat.
By the linearity of the transform and because A is unitary, ||x - x_hat|| = ||A^{-1}(y - y_hat)|| = ||y - y_hat||: the quantization error in the transform domain equals the reconstruction error in the signal domain.

44. Energy Conservation Property in 2D
The 2-norm of a matrix X: ||X||^2 = sum over m, n of |x_{mn}|^2.
Step 1: if A is unitary, then ||A X|| = ||X||.
Proof: A X transforms each column of X by A; applying the energy conservation property in 1D to each column and summing gives ||A X||^2 = ||X||^2.

45. Energy Conservation Property in 2D (Con't)
Step 2: if A is unitary, then ||A X A^T|| = ||X||.
Hint: the 2D transform can be decomposed into two sequential 1D transforms, e.g., a column transform Y1 = A X followed by a row transform Y1 A^T. Use the result obtained in Step 1 and note that ||X^T|| = ||X||: then ||Y1 A^T|| = ||A Y1^T|| = ||Y1^T|| = ||Y1|| = ||X||.

46. Numerical Example
Check that ||Y||^2 = ||X||^2 for Y = A X A^T. (The slide's own numbers did not survive the transcript; a substitute sketch follows.)
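A = [1 1; 1 -1]/sqrt(2);                 % unitary
X = [1 2; 3 4];
Y = A * X * A.';                         % Y = [5 -1; -2 0]
[sum(abs(X(:)).^2), sum(abs(Y(:)).^2)]   % both 30: ||X||^2 = ||Y||^2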

47. Implication of Energy Conservation
Similar to the 1D case, for the chain X → (T) → Y → (Q) → Y_hat → (T^{-1}) → X_hat, the quantization noise in the transform domain has the same energy as that in the spatial domain: ||X - X_hat|| = ||Y - Y_hat||.

48. Why Energy Conservation?
Encoder: image X → forward transform f → coefficients Y → entropy coding (driven by probability estimation) → binary bit stream → "super channel".
Decoder: binary bit stream → entropy decoding → Y_hat → inverse transform f^{-1} → image X_hat.
Energy conservation guarantees that the distortion introduced in the transform domain is exactly the distortion seen in the reconstructed image X_hat.
