Data Communications

Data Communications Compression Techniques

Data Compression • Whether data, fax, video, audio, etc., compression can work wonders • Compression can be lossless or lossy

Huffman Codes • A frequency dependent code • Usually a smaller alphabet • Must know frequency of occurrence of each character in the alphabet • Order characters from highest to lowest or vice versa • Select two smallest percentages; must be adjacent

Huffman Codes Example: a frequency dependent code A 8% B 12% C 10% D 6% E 18% F 7% G 20% H 16% I 3%

Huffman Codes Now send string A B C D E F G
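One way to build a code for the frequencies above is the usual greedy merge: repeatedly combine the two lowest-weight entries until one tree remains. The sketch below is a minimal illustration, not the slides' own worked tree. The name `huffman_codes` and the choice of 0 for the lighter branch and 1 for the heavier one are assumptions, so the exact bit patterns may differ from a hand-drawn tree even though, for these frequencies, the code lengths come out the same.

```python
import heapq

def huffman_codes(freqs):
    """Return {symbol: bitstring} for a {symbol: weight} mapping."""
    # Heap entries are (weight, tie_breaker, {symbol: partial code}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        w1, _, light = heapq.heappop(heap)   # smallest weight
        w2, _, heavy = heapq.heappop(heap)   # second smallest weight
        merged = {s: "0" + c for s, c in light.items()}
        merged.update({s: "1" + c for s, c in heavy.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]

# Frequencies (in percent) from the example slide.
freqs = {"A": 8, "B": 12, "C": 10, "D": 6, "E": 18,
         "F": 7, "G": 20, "H": 16, "I": 3}
codes = huffman_codes(freqs)
encoded = "".join(codes[ch] for ch in "ABCDEFG")
print(codes)
print(encoded, len(encoded), "bits")
```

With these frequencies the string A B C D E F G takes 23 bits, against 28 bits for a fixed 4-bit code per character.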

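Run-length Encoding • Replace runs of 0s with a count of how many 0s precede each 1 • Bit string: 00000000000000100000000011000000000000000000001000…001100000000000 (the elided run is 30 0s) • Run lengths: 14 9 0 20 30 0 11 • Sent as 4-bit groups: 1110 1001 0000 1111 0101 1111 1111 0000 0000 1011 • A run of 15 or more is sent as 1111 followed by the remainder, so 20 = 1111 0101 and 30 = 1111 1111 0000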
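The 4-bit counting scheme in this example can be sketched as follows. The function name `rle_encode` and the handling of a trailing run of 0s (sent as a plain count with no closing 1, as the example appears to do) are assumptions made for illustration.

```python
def rle_encode(bits, width=4):
    """Encode a '0'/'1' string as fixed-width counts of the 0s before each 1."""
    max_count = (1 << width) - 1          # 15 for 4-bit groups
    groups = []

    def emit(run):
        # A group of all 1s means "max_count zeros, and the run continues".
        while run >= max_count:
            groups.append(max_count)
            run -= max_count
        groups.append(run)

    run = 0
    for b in bits:
        if b == "0":
            run += 1
        else:
            emit(run)                     # a 1 closes the current run of 0s
            run = 0
    if run:
        emit(run)                         # trailing 0s with no closing 1
    return " ".join(format(g, f"0{width}b") for g in groups)

# The bit string from the slide, with the elided run written out as 30 zeros.
bits = "0" * 14 + "1" + "0" * 9 + "11" + "0" * 20 + "1" + "0" * 30 + "11" + "0" * 11
print(rle_encode(bits))
```

A real decoder would also need to know whether the final count is followed by a 1; the slide does not say how that is signalled.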

Lempel-Ziv Encoding • Replace character strings with codes • Problems: • How do we make it dynamic? (Whatever the most frequently occurring strings are, compress those.) • How do we find those strings? • How does the receiver know which code stands for THE? • Very popular algorithm – used in PKZIP, V.42bis modems and others

Lempel-Ziv Encoding • Works best on large files • Typical performance of Lempel-Ziv: • Program file: reduces to 44% original size • Text file: reduces to 64% original size • Image file: reduces to 88% original size

Lempel-Ziv Encoding • To begin, store each character with its ASCII value (codes 0 through 127, 128 values in all) • Then set the variable Buff = the first character from the text file and set Next = the next character from the file, and perform the following steps:

Lempel-Ziv Encoding • Temp = concat(Buff, Next) • Is Temp in the code table? • Yes: Buff = Temp, then get the next character as Next • No: send the code associated with Buff; assign a code to Temp and store both in the code table; Buff = Next; get the next character from the input string and assign it to Next • Repeat all steps until end-of-file, then send the code for whatever remains in Buff

Lempel-Ziv Encoding • Try the string: “the thing in this is this” • t = 116, h = 104, e = 101, space = 32

An LZ Encoding Example • Initialize: • Store each character with its ASCII value (codes 0 through 127) • So, the first multiple-character code will be 128 • String = “the thing in this is this” • Send code 116 (‘t’), Store “th” as code 128 • Send code 104 (‘h’), Store “he” as code 129 • Send code 101 (‘e’), Store “e_” as code 130 • Send code 32 (‘_’), Store “_t” as code 131 • Send code 128 (‘th’), Store “thi” as code 132

LZ Encoding Example (cont’d) • String = “the thing in this is this” • Send code 105 (‘i’), Store “in” as code 133 • Send code 110 (‘n’), Store “ng” as code 134 • Send code 103 (‘g’), Store “g_” as code 135 • Send code 32 (‘_’), Store “_i” as code 136 • Send code 133 (‘in’), Store “in_” as code 137 • Send code 131 (‘_t’), Store “_th” as code 138 • Send code 104 (‘h’), Store “hi” as code 139 • Send code 105 (‘i’), Store “is” as code 140 • Send code 115 (‘s’), Store “s_” as code 141

LZ Encoding Example (cont’d) • String = “the thing in this is this” • Send code 136 (‘_i’), Store “_is” as code 142 • Send code 141 (‘s_’), Store “s_t” as code 143 • Send code 132 (‘thi’), Store “this” as code 144 • Send code 115 (‘s’)
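The Buff/Next procedure walked through above can be sketched in a few lines. This is a minimal illustration that assumes 7-bit ASCII input and an unbounded code table; the name `lzw_encode` is hypothetical.

```python
def lzw_encode(text):
    """Return the list of codes for text, following the Buff/Next steps."""
    table = {chr(i): i for i in range(128)}   # single ASCII characters
    next_code = 128                           # first multi-character code
    codes = []
    if not text:
        return codes
    buff = text[0]
    for nxt in text[1:]:
        temp = buff + nxt
        if temp in table:
            buff = temp                       # keep extending the match
        else:
            codes.append(table[buff])         # send the code for Buff
            table[temp] = next_code           # store Temp under a new code
            next_code += 1
            buff = nxt
    codes.append(table[buff])                 # end of input: flush Buff
    return codes

print(lzw_encode("the thing in this is this"))
```

For the example string this yields 18 codes: 116, 104, 101, 32, 128, 105, 110, 103, 32, 133, 131, 104, 105, 115, 136, 141, 132, 115. The 25-character input is thus sent as 18 codes, though each code needs more than 7 bits once the table grows past 128 entries.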

Lempel-Ziv Decoding • After you transmit the string, how is the compressed code decoded? • Note - the ONLY things transmitted are the code values

LZ Decoding • LZ Decoding Algorithm: • Initialize dictionary to contain all single characters and their codes (ASCII) • Repeat • Receive code • Look up associated character block, B, in the dictionary. • Take the last received block plus first character of block B and add this block with a new code to the dictionary • Output the character block B. • Until no more codes are received.

LZ Decoding Example • Receive code 116, Output t • Receive code 104, Store “th” as code 128, Output h • Receive code 101, Store “he” as code 129, Output e • Receive code 32, Store “e_” as code 130, Output _ • Receive code 128, Store “_t” as code 131, Output th • Receive code 105, Store “thi” as code 132, Output i • Receive code 110, Store “in” as code 133, Output n • Receive code 103, Store “ng” as code 134, Output g • Receive code 32, Store “g_” as code 135, Output _ • Receive code 133, Store “_i” as code 136, Output in

LZ Decoding Example (cont’d) • Receive code 131, Store “in_” as code 137, Output _t • Receive code 104, Store “_th” as code 138, Output h • Receive code 105, Store “hi” as code 139, Output i • Receive code 115, Store “is” as code 140, Output s • Receive code 136, Store “s_” as code 141, Output _i • Receive code 141, Store “_is” as code 142, Output s_ • Receive code 132, Store “s_t” as code 143, Output thi • Receive code 115, Store “this” as code 144, Output s • Resulting output: the thing in this is this
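The decoding loop can be sketched the same way: the receiver rebuilds the dictionary from the codes alone. The name `lzw_decode` is hypothetical. One branch goes beyond the algorithm as stated on the slide: a received code that is not yet in the dictionary. That case is handled here in the way standard LZW decoders do, by taking the last block plus its own first character.

```python
def lzw_decode(codes):
    """Rebuild the text from a list of LZ codes."""
    table = {i: chr(i) for i in range(128)}   # single ASCII characters
    next_code = 128
    out = []
    prev = None                               # last received block
    for code in codes:
        if code in table:
            block = table[code]
        elif prev is not None and code == next_code:
            # Not covered by the slide: the code refers to the entry that is
            # about to be created, so it must be prev plus prev's first char.
            block = prev + prev[0]
        else:
            raise ValueError(f"unexpected code {code}")
        if prev is not None:
            table[next_code] = prev + block[0]   # last block + first char of B
            next_code += 1
        out.append(block)
        prev = block
    return "".join(out)

codes = [116, 104, 101, 32, 128, 105, 110, 103, 32,
         133, 131, 104, 105, 115, 136, 141, 132, 115]
print(lzw_decode(codes))
```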

Relative or Differential Encoding • Video does not compress well using Huffman or run-length encoding • In one color video frame, not much is alike • But what about from frame to frame? • Send a frame, store it in a buffer • Next frame is just difference from previous frame • Then store that frame in buffer, etc.

First frame: 5 7 6 2 8 6 6 3 5 6 6 5 7 5 5 6 3 2 4 7 8 4 6 8 5 6 4 8 8 5 5 1 2 9 8 6 5 5 6 6
Second frame: 5 7 6 2 8 6 6 3 5 6 6 5 7 6 5 6 3 2 3 7 8 4 6 8 5 6 4 8 8 5 5 1 3 9 8 6 5 5 7 6
Difference (second minus first): 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 -1 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1 0
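The frame-to-frame idea can be checked directly on the numbers above. In this small sketch the helper names `frame_diff` and `frame_rebuild` are hypothetical, and each frame is treated as a flat list of pixel values.

```python
def frame_diff(prev, curr):
    """Per-pixel difference: current frame minus the buffered previous frame."""
    return [c - p for p, c in zip(prev, curr)]

def frame_rebuild(prev, diff):
    """Receiver side: add the difference back onto the buffered frame."""
    return [p + d for p, d in zip(prev, diff)]

first = [5, 7, 6, 2, 8, 6, 6, 3, 5, 6, 6, 5, 7, 5, 5, 6, 3, 2, 4, 7,
         8, 4, 6, 8, 5, 6, 4, 8, 8, 5, 5, 1, 2, 9, 8, 6, 5, 5, 6, 6]
second = [5, 7, 6, 2, 8, 6, 6, 3, 5, 6, 6, 5, 7, 6, 5, 6, 3, 2, 3, 7,
          8, 4, 6, 8, 5, 6, 4, 8, 8, 5, 5, 1, 3, 9, 8, 6, 5, 5, 7, 6]

diff = frame_diff(first, second)
print(diff)
print(frame_rebuild(first, diff) == second)
```

Only 4 of the 40 difference values are non-zero, which is what makes the difference frame a good candidate for run-length encoding.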

Image Compression • One image - JPEG, or continuous images such as video - MPEG • A color picture can be defined by red/green/blue, or luminance / chrominance / chrominance which are based on RGB values • Either way, you have 3 values, each 8 bits, or 24 bits total (2^24 colors!)

Image Compression • A VGA screen is 640 x 480 pixels • 24 bits x 640 x 480 = 7,372,800 bits. Ouch! • And video comes at you 30 images per second. Double Ouch! • We need compression!
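The arithmetic behind these figures, extended to one second of uncompressed video, is shown in the short calculation below. The constant names are illustrative only.

```python
WIDTH, HEIGHT = 640, 480        # VGA resolution in pixels
BITS_PER_PIXEL = 24             # three 8-bit values per pixel
FRAMES_PER_SECOND = 30

bits_per_frame = BITS_PER_PIXEL * WIDTH * HEIGHT
bits_per_second = bits_per_frame * FRAMES_PER_SECOND

print(f"{bits_per_frame:,} bits per frame")
print(f"{bits_per_second:,} bits per second uncompressed")
```

That is roughly 221 million bits every second before any compression is applied.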

JPEG • Joint Photographic Experts Group • Compresses still images • Lossy • JPEG compression consists of 3 phases: • Discrete cosine transformations (DCT) • Quantization • Encoding

JPEG Step 1 - DCT • Divide image into a series of 8x8 pixel blocks • If the original image was 640x480 pixels, the new picture would be 80 blocks x 60 blocks (next slide) • If B&W, each pixel in 8x8 block is an 8-bit value (0-255)

Figure: 640 x 480 VGA screen image divided into 8 x 8 pixel blocks (80 blocks across, 60 blocks down)

JPEG Step 1 - DCT • If color, each pixel is 24 bits, or 3 8-bit groups • Thus, each 8x8 block of pixels is represented by three 8x8 arrays, one per 8-bit group • B&W or color, the DCT is applied to these 8x8 arrays

JPEG Step 1 - DCT • So what does DCT do? Takes an 8x8 array (P) and produces a new 8x8 array (T) using cosines • T matrix contains a collection of values called spatial frequencies. These spatial frequencies relate directly to how much the pixel values change as a function of their positions in the block

JPEG Step 1 - DCT • An image with uniform color changes (little fine detail) has a P array with closely similar values and a corresponding T array with many zero values (next slide) • An image with large color changes over a small area (lots of fine detail) has a P array with widely changing values, and thus a T array with many non-zero values
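To make the flat-versus-detailed contrast concrete, here is a minimal pure-Python sketch of an 8x8 DCT. The slides do not give the formula, so this assumes the two-dimensional DCT-II with the scaling commonly used for JPEG; the name `dct_2d` and the two test blocks are illustrative.

```python
import math

N = 8

def dct_2d(P):
    """Return the 8x8 array T of spatial frequencies for pixel block P."""
    def c(k):
        return 1 / math.sqrt(2) if k == 0 else 1.0

    T = [[0.0] * N for _ in range(N)]
    for i in range(N):
        for j in range(N):
            total = 0.0
            for x in range(N):
                for y in range(N):
                    total += (P[x][y]
                              * math.cos((2 * x + 1) * i * math.pi / (2 * N))
                              * math.cos((2 * y + 1) * j * math.pi / (2 * N)))
            T[i][j] = 0.25 * c(i) * c(j) * total
    return T

def count_nonzero(T, eps=1e-6):
    return sum(1 for row in T for v in row if abs(v) > eps)

flat = [[100] * N for _ in range(N)]                                        # uniform block
checker = [[255 if (x + y) % 2 else 0 for y in range(N)] for x in range(N)]  # fine detail

print(count_nonzero(dct_2d(flat)), count_nonzero(dct_2d(checker)))
```

The uniform block leaves a single non-zero entry in the upper left corner of T, while the checkerboard spreads energy across many entries.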

JPEG Step 2 - Quantization • The human eye can’t see small differences in color • So take T matrix and divide all values by 10. This will give us more zero entries. More 0s means more compression! • But this is too lossy. And dividing all values by 10 doesn’t take into account that upper left of matrix has more action (the less subtle features of the image, or low spatial frequencies)

JPEG Step 2 - Quantization • So divide T matrix by another matrix (U) with smaller values in upper left corner and larger values in lower right corner (next slide) • Result is matrix Q

U matrix:
1 3 5 7 9 11 13 15
3 5 7 9 11 13 15 17
5 7 9 11 13 15 17 19
7 9 11 13 15 17 19 21
9 11 13 15 17 19 21 23
11 13 15 17 19 21 23 25
13 15 17 19 21 23 25 27
15 17 19 21 23 25 27 29
Q[i][j] = Round(T[i][j] / U[i][j]), for i = 0, 1, 2, …7 and j = 0, 1, 2, …7
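The quantization formula can be tried out directly. In this sketch the U matrix is generated as 2(i + j) + 1, which reproduces the values listed above. The name `quantize` is hypothetical, and Python's built-in `round` sends exact halves to the nearest even integer, which may differ from the Round the slide has in mind.

```python
# U[i][j] = 2 * (i + j) + 1 reproduces the U matrix on the slide.
U = [[2 * (i + j) + 1 for j in range(8)] for i in range(8)]

def quantize(T):
    """Q[i][j] = Round(T[i][j] / U[i][j]) for an 8x8 array T."""
    return [[round(T[i][j] / U[i][j]) for j in range(8)] for i in range(8)]

# Illustrative input only: a T array in which every entry is 10.
T = [[10] * 8 for _ in range(8)]
Q = quantize(T)
for row in Q:
    print(row)
```

Even with identical inputs, the entries toward the lower right are divided by larger values and round to zero first.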

JPEG Step 3 - Encoding • Now take the quantized matrix Q and perform run-length encoding on it • But don’t just go across the rows. Longer runs of zeros if you perform the run-length encoding in a diagonal fashion (next slide, from White text)

JPEG Step 3 - Encoding
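The benefit of the diagonal scan can be seen on a small example. This sketch assumes the usual zigzag ordering over anti-diagonals and a simple (zeros skipped, value) run-length form; the slides do not specify the exact output format, and the names `zigzag_order` and `run_length` are hypothetical.

```python
def zigzag_order(n=8):
    """Return (row, col) pairs in diagonal zigzag order for an n x n block."""
    order = []
    for s in range(2 * n - 1):                 # s = row + col on each diagonal
        cells = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        # Alternate direction from one diagonal to the next.
        order.extend(cells if s % 2 else reversed(cells))
    return order

def run_length(seq):
    """Return ((zeros_skipped, value) pairs, count of trailing zeros)."""
    pairs, run = [], 0
    for v in seq:
        if v == 0:
            run += 1
        else:
            pairs.append((run, v))
            run = 0
    return pairs, run

# Illustrative quantized block: non-zero values only in the upper left corner.
Q = [[0] * 8 for _ in range(8)]
Q[0][0], Q[0][1], Q[1][0], Q[1][1] = 15, -2, 3, 1

diagonal = [Q[i][j] for i, j in zigzag_order()]
by_rows = [v for row in Q for v in row]
print(run_length(diagonal))
print(run_length(by_rows))
```

The diagonal scan reaches all the non-zero values within the first five positions, leaving one run of 59 zeros; scanning across the rows splits the zeros into two runs.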

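JPEG • How do you get the image back? • Undo run-length encoding • Multiply matrix Q by matrix U, entry by entry, yielding an approximation of matrix T (the rounding done during quantization cannot be undone) • Apply the inverse cosine calculations to get a close approximation of the original P matrix back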
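The multiply-back step can be sketched as below, which also shows why JPEG is lossy: the recovered values differ from the original T by up to half of the corresponding U entry. The sample T values and the function names are made up for illustration.

```python
U = [[2 * (i + j) + 1 for j in range(8)] for i in range(8)]   # U matrix from the slides

def quantize(T):
    return [[round(T[i][j] / U[i][j]) for j in range(8)] for i in range(8)]

def dequantize(Q):
    """Multiply each Q entry by the matching U entry to approximate T."""
    return [[Q[i][j] * U[i][j] for j in range(8)] for i in range(8)]

# Illustrative T array: large values in the upper left, small in the lower right.
T = [[100.0 / (1 + i + j) for j in range(8)] for i in range(8)]
T_back = dequantize(quantize(T))

worst = max(abs(T[i][j] - T_back[i][j]) for i in range(8) for j in range(8))
print(f"largest error after the round trip: {worst:.2f}")
```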

MPEG • Moving Picture Experts Group • MPEG-1: CD-ROM video, early broadcast satellite systems • MPEG-2: multimedia entertainment and HDTV • MPEG-3: originally intended for HDTV • MPEG-4: videoconferencing • MPEG is like JPEG but also exploits temporal redundancy

MPEG • Don’t transmit complete frames, just what has changed from last frame • But what happens when a scene changes? Or someone walks thru a door? Or you turn on the TV part way thru the broadcast?

MPEG • I-frame - intrapicture frame - self-contained frame • P-frame - predicted frame - just the differences from the last frame • B-frame - bidirectional frame - similar to P-frame but difference between previous frame and future frame

MPEG • Display order: I B B P B B I • The P frame is the difference from the prior I frame • B frames are interpolated from the previous I frame and the next P frame • Order of transmission: I P B B I B B
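The reordering follows from the dependencies: a B frame cannot be decoded until the I or P frame after it has arrived. The sketch below applies that rule; the name `transmission_order` and the numbered frame labels are hypothetical.

```python
def transmission_order(display):
    """Move each I or P frame ahead of the B frames that precede it."""
    out, pending_b = [], []
    for frame in display:
        if frame.startswith("B"):
            pending_b.append(frame)    # must wait for the next I or P frame
        else:
            out.append(frame)          # send the I or P frame first...
            out.extend(pending_b)      # ...then the B frames that needed it
            pending_b = []
    out.extend(pending_b)              # any B frames left at the end of input
    return out

display = ["I1", "B2", "B3", "P4", "B5", "B6", "I7"]
print(transmission_order(display))
```

For the display order I B B P B B I this gives I P B B I B B, matching the order of transmission on the slide.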

MPEG • P-frames are coded using motion-compensated prediction • Screen reduced to macroblocks • Macroblocks contain values of luminance, chrominance, chrominance

MPEG • MPEG algorithm compares a macroblock from the previous screen with one from the current screen (the 2 macroblocks that look the most alike) and computes a motion vector • The motion vector (what’s changed between macroblocks and how) is stored in a matrix similar to JPEG • Intel MMX technology really helps
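A toy version of the macroblock comparison is sketched below. It assumes an exhaustive search over a small window and uses the sum of absolute differences as the measure of "most alike"; the slides do not name a measure. Block size, search range, frame contents, and all function names are illustrative and far smaller than what a real encoder would use.

```python
def sad(a, ax, ay, b, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(abs(a[ay + r][ax + c] - b[by + r][bx + c])
               for r in range(size) for c in range(size))

def best_motion_vector(prev, curr, x, y, size=2, search=2):
    """Return (dx, dy, cost): where the block at (x, y) in curr came from in prev."""
    h, w = len(prev), len(prev[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            px, py = x + dx, y + dy
            if 0 <= px <= w - size and 0 <= py <= h - size:
                cost = sad(curr, x, y, prev, px, py, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best[1], best[2], best[0]

def make_frame(x, y, w=6, h=6, size=2):
    """Blank frame with one bright size x size block whose corner is (x, y)."""
    frame = [[0] * w for _ in range(h)]
    for r in range(size):
        for c in range(size):
            frame[y + r][x + c] = 9
    return frame

prev = make_frame(1, 1)     # block starts at column 1, row 1
curr = make_frame(3, 2)     # same block, moved 2 right and 1 down
print(best_motion_vector(prev, curr, 3, 2))
```

The search reports that the block came from 2 columns to the left and 1 row up in the previous frame with zero difference, so only the vector needs to be sent for that block.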

Review Questions • Given the following: A (18%), B (36%), C (24%), D (13%), F (9%), calculate a Huffman Code • Perform a run-length encoding on the following: 10000000001000000111000000000000000100 • What are the advantages of Lempel-Ziv? • What are the three steps in JPEG? • What are the different frames in MPEG and how are they used to create a video?
