
Noise, Information Theory, and Entropy (cont.)


  1. Noise, Information Theory, and Entropy (cont.) CS414 – Spring 2007 By Karrie Karahalios, Roger Cheng, Brian Bailey

  2. Coding Intro - revisited • Assume alphabet K of {A, B, C, D, E, F, G, H} • In general, to distinguish n different symbols we need ⌈log2 n⌉ bits per symbol, i.e. 3 here • Can code alphabet K as: A 000 B 001 C 010 D 011 E 100 F 101 G 110 H 111

  3. Coding Intro - revisited • “BACADAEAFABBAAAGAH” is encoded as a string of 54 bits • 001000010000011000100000101000001001000000000110000111 (fixed-length code)
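
A minimal sketch of the fixed-length encoder in Python (the code table and message are from the slides; the function name is illustrative):

```python
# Fixed-length (3-bit) coding of the 8-symbol alphabet from the slide.
CODE = {s: format(i, '03b') for i, s in enumerate('ABCDEFGH')}  # A=000 ... H=111

def encode(message):
    """Concatenate the 3-bit codeword of each symbol."""
    return ''.join(CODE[s] for s in message)

bits = encode('BACADAEAFABBAAAGAH')
print(len(bits), bits)  # 54 bits, matching the slide
```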

  4. Coding Intro • With this coding: A 0 B 100 C 1010 D 1011 E 1100 F 1101 G 1110 H 1111 • 100010100101101100011010100100000111001111 • 42 bits, saving more than 20% in space

  5. Huffman Tree • Symbol frequencies in “BACADAEAFABBAAAGAH”: A (9), B (3), C (1), D (1), E (1), F (1), G (1), H (1)
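
The slide gives only the leaf frequencies, so here is a hedged sketch of Huffman tree construction in Python using a min-heap; with these counts it reproduces the code lengths of slide 4 (A = 1 bit, B = 3 bits, C–H = 4 bits):

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code from {symbol: frequency} via a min-heap.
    Each heap entry carries a unique tiebreak counter so that ties in
    frequency never force Python to compare the code dictionaries."""
    heap = [(f, i, {s: ''}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)   # two least-frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: '0' + c for s, c in c1.items()}   # left branch gets 0
        merged.update({s: '1' + c for s, c in c2.items()})  # right gets 1
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]

freqs = {'A': 9, 'B': 3, 'C': 1, 'D': 1, 'E': 1, 'F': 1, 'G': 1, 'H': 1}
code = huffman_code(freqs)
print(sorted((s, len(c)) for s, c in code.items()))
# Code lengths: A=1, B=3, C..H=4, matching the code on the previous slide
```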

  6. Limitations • Diverges from the entropy lower bound when the probability of a particular symbol becomes high • always uses an integral number of bits per symbol • Must send the code book with the data • lowers overall efficiency • Must determine the frequency distribution • it must remain stable over the data set

  7. Arithmetic Coding • Replace stream of input symbols with a single floating point number • bypasses replacement of symbols with codes • Use probability distribution of symbols (as they appear) to successively narrow the original range • The longer the sequence, the greater the precision of the floating point number • requires infinite precision in principle (slide 21 shows how fixed-width integers emulate it)

  8. Encoding Example • Encode “BILL”: p(B) = 1/4; p(I) = 1/4; p(L) = 2/4 • Assign symbols to ranges within [0.0, 1.0] based on p • Successively reallocate the low-high range based on the sequence of input symbols

  9. Encoding Example • When B appears, compute symbol portion [0.0, 0.25] of current range [0.0, 1.0]

  10. Encoding Example • When I appears, compute symbol portion [0.25, 0.50] of current range [0.0, 0.25]

  11. Encoding Example • When L appears, compute symbol portion [0.50, 1.0] of current range [0.0625, 0.125]

  12. Encoding Example • When L appears, compute symbol portion [0.50, 1.0] of current range [0.09375, 0.125]

  13. Encoding Example • After the final L, the range is [0.109375, 0.125) • The final low range value (0.109375) encodes the entire sequence • Actually, ANY value within the final range will encode the entire sequence

  14. Encoding Algorithm • Set low to 0.0 • Set high to 1.0 • WHILE input symbols remain • Range = high – low • Get symbol • High = low + high_range(symbol)*range • Low = low + low_range(symbol)*range • END while • Output any value in [low, high)
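
A direct Python rendering of the pseudocode above, assuming the BILL model from slide 8 (floats are fine for short messages; a production coder would use the fixed-width integer scheme of slide 21):

```python
# Symbol ranges [low_range, high_range) from slide 8.
RANGES = {'B': (0.0, 0.25), 'I': (0.25, 0.5), 'L': (0.5, 1.0)}

def arith_encode(message, ranges):
    low, high = 0.0, 1.0
    for symbol in message:
        span = high - low                 # Range = high - low
        lo_r, hi_r = ranges[symbol]
        high = low + hi_r * span          # update high first, then low,
        low = low + lo_r * span           # exactly as in the pseudocode
    return low, high                      # any value in [low, high) works

print(arith_encode('BILL', RANGES))  # (0.109375, 0.125), as on slide 13
```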

  15. Decoding Example • E = .109375 • between 0.0, 0.25; output ‘B’ • E = (.109375 – 0.0) / 0.25 = .4375 • .4375 between 0.25, 0.5; output ‘I’ • E = (.4375 – 0.25) / 0.25 = 0.75 • 0.75 between 0.5, 1.0; output ‘L’ • E = (0.75 – 0.5) / 0.5 = 0.5 • 0.5 between 0.5, 1.0; output ‘L’ • E = (0.5 – 0.5) / 0.5 = 0.0 -> STOP

  16. Decoding Algorithm • encoded = Get (encoded number) • DO • Find symbol whose range contains encoded • Output the symbol • range = high(symbol) – low(symbol) • encoded = (encoded – low(symbol)) / range • UNTIL (EOF)
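
And the matching decoder, reusing RANGES from the encoder sketch. One caveat: the slide stops when the value hits 0.0, which works here only because the example terminates cleanly; this sketch instead decodes a known message length, a common convention:

```python
def arith_decode(value, ranges, length):
    """Invert the encoder: find the symbol whose range contains the
    value, output it, then rescale the value into [0, 1) and repeat."""
    out = []
    for _ in range(length):
        for symbol, (lo_r, hi_r) in ranges.items():
            if lo_r <= value < hi_r:
                out.append(symbol)
                value = (value - lo_r) / (hi_r - lo_r)
                break
    return ''.join(out)

print(arith_decode(0.109375, RANGES, 4))  # 'BILL', as on slide 15
```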

  17. Code Transmission • Transmit any number within the final range • choose the number that requires the fewest bits • Recall that the minimum number of bits required to represent an ensemble x1 x2 … xn is ⌈log2(1 / (p(x1)·p(x2)·…·p(xn)))⌉ = ⌈−Σi log2 p(xi)⌉ • Note that we are not comparing directly to H because no code book is generated

  18. Compute Size of Interval • Interval: [L, L + S] • Size of interval: S = p(x1)·p(x2)·…·p(xn) • For ensemble “BILL”: S = 0.25 × 0.25 × 0.5 × 0.5 = 0.015625 • Check against the algorithm’s result: 0.125 − 0.109375 = 0.015625

  19. Number of Bits to Represent S • Requires ⌈log2(1/S)⌉ bits (min) to specify a number within an interval of size S • where S = p(x1)·p(x2)·…·p(xn) • Same as the minimum number of bits on slide 17, since log2(1/S) = −Σi log2 p(xi)

  20. Determine Representation • Compute the midpoint L + S/2 • truncate its binary representation after ⌈log2(1/S)⌉ + 1 bits • The truncated number lies within [L, L+S], as truncation lowers the value by less than 2^−(⌈log2(1/S)⌉+1) ≤ S/2
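
A quick numeric check of slides 18–20 for “BILL”, assuming the ⌈log2(1/S)⌉ + 1 truncation bound as reconstructed above:

```python
import math

p = {'B': 0.25, 'I': 0.25, 'L': 0.5}
S = p['B'] * p['I'] * p['L'] * p['L']            # 0.015625, interval size
L = 0.109375                                      # low end from the encoder
bits = math.ceil(math.log2(1 / S)) + 1            # 7 bits for the midpoint
mid = L + S / 2                                   # 0.1171875
truncated = math.floor(mid * 2**bits) / 2**bits   # keep `bits` binary digits
print(bits, truncated, L <= truncated <= L + S)   # 7 0.1171875 True
```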

  21. Practical Notes • Emulate infinite precision using fixed-width integers as shift registers • represent only the fractional part of each range • as the precision of each range increases, the most significant bits of low and high come to match • shift out the matching MSB and continue the algorithm • Caveat • underflow can occur if the ranges approach the same number without their MSBs being equal

  22. Exercise: Huffman vs Arithmetic • Given message AAAAB where p(A)=.9; p(B)=.1 • Huffman code • (a) compute entropy (H) • (b) build Huffman tree (simple) • (c) compute average codeword length • (d) compute number of bits needed to encode message • Arithmetic coding • (a) compute theoretical min. number of bits to transmit message • (b) compute the final value that represents the message • (c) independent of (b), what is the min number of bits needed to represent the final interval? How does this value compare to (a)? How does this value compare to Huffman part (d)?

  23. Error detection and correction • Error detection is the ability to detect errors caused by noise or other impairments during transmission from the transmitter to the receiver • Error correction additionally enables locating the errors and correcting them • Error detection always precedes error correction

  24. Error Detection • Data transmission can contain errors • Single-bit errors • Burst errors of length n, where n is the distance between the first and last error bits in the data block • How to detect errors • If only the data is transmitted, errors cannot be detected • Send extra information with the data that satisfies a special relationship • Add redundancy

  25. Error Detection Methods • Vertical Redundancy Check (VRC) / Parity Check • Longitudinal Redundancy Check (LRC) • Checksum • Cyclic Redundancy Check

  26. Vertical Redundancy Check (VRC) aka Parity Check • Append a single bit at the end of the data block such that the number of ones is even (even parity; odd parity is similar) • Even parity: 0110011 → 01100110, 0110001 → 01100011 • Odd parity: 0110011 → 01100111, 0110001 → 01100010 • Performance: • Detects all errors that flip an odd number of bits in a data block; misses errors that flip an even number of bits
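
A small parity sketch in Python reproducing the slide's four examples:

```python
def add_parity(bits, even=True):
    """Append one parity bit so the total count of 1s is even (or odd)."""
    ones = bits.count('1')
    parity = ones % 2 if even else 1 - ones % 2
    return bits + str(parity)

print(add_parity('0110011'))              # 01100110 (even parity)
print(add_parity('0110001'))              # 01100011 (even parity)
print(add_parity('0110011', even=False))  # 01100111 (odd parity)
print(add_parity('0110001', even=False))  # 01100010 (odd parity)
```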

  27. 11100111 11011101 00111001 10101001 11100111 11011101 00111001 10101001 10101010 11100111 11011101 00111001 10101001 10101010 LRC Original Data Longitudinal Redundancy Check (LRC) • Longitudinal Redundancy Check (LRC) • Organize data into a table and create a parity for each column

  28. LRC • Performance: • Detects all burst errors up to length n (number of columns) • Misses burst errors of length n+1 if there are n-1 uninverted bits between the first and last bit

  29. Parallel Parity • Combine row and column parity: a single bit error produces two parity failures, one row and one column, so the flipped bit can be located and corrected.

  30. Checksum • Used by upper layer protocols • Similar to LRC, but uses one’s complement arithmetic • Example data stream (hex bytes): 40 05 80 FB 12 00 26 B4 BB 09 B4 12 28 74 11 …
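
The slide names one's complement arithmetic; the best-known instance is the Internet checksum (RFC 1071). A sketch over 16-bit words, grouping the first bytes of the slide's stream into words (the grouping is my assumption):

```python
def ones_complement_checksum(words):
    """Sum 16-bit words in one's complement arithmetic (carries wrap
    around into the low bits); the checksum is the complement of the sum."""
    total = 0
    for w in words:
        total += w
        total = (total & 0xFFFF) + (total >> 16)  # fold carry back in
    return ~total & 0xFFFF

words = [0x4005, 0x80FB, 0x1200]   # illustrative 16-bit words
csum = ones_complement_checksum(words)

# Receiver check: one's complement sum of words + checksum is all ones.
total = sum(words) + csum
while total >> 16:
    total = (total & 0xFFFF) + (total >> 16)
print(hex(csum), hex(total))  # 0x2cff 0xffff
```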

  31. Cyclic Redundancy Check • Powerful error detection scheme • Rather than addition, binary division is used → finite algebra theory (Galois fields) • Can be easily implemented with a small amount of hardware • Shift registers • XOR (for addition and subtraction)

  32. CRC • Let us assume k message bits and n bits of redundancy • Associate bits with coefficients of a polynomial: 1 0 1 1 0 1 1 ↔ 1·x^6 + 0·x^5 + 1·x^4 + 1·x^3 + 0·x^2 + 1·x + 1 = x^6 + x^4 + x^3 + x + 1

  33. CRC • Let M(x) be the message polynomial • Let P(x) be the generator polynomial • P(x) is fixed for a given CRC scheme • P(x) is known both by sender and receiver • Create a block polynomial F(x) based on M(x) and P(x) such that F(x) is divisible by P(x)

  34. CRC • Sending • Multiply M(x) by x^n • Divide x^n·M(x) by P(x) • Ignore the quotient and keep the remainder C(x) • Form and send F(x) = x^n·M(x) + C(x) • Receiving • Receive F’(x) • Divide F’(x) by P(x) • Accept if the remainder is 0, reject otherwise
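
A sketch of the division step as bitwise XOR over GF(2); the message and generator here are a standard textbook example, not from the slide:

```python
def crc_remainder(message_bits, generator_bits):
    """Polynomial long division over GF(2): wherever the current leading
    bit is 1, XOR the generator in; what is left in the last n positions
    is the remainder C(x)."""
    n = len(generator_bits) - 1                # degree of P(x) = CRC width
    bits = [int(b) for b in message_bits] + [0] * n   # x^n * M(x)
    gen = [int(b) for b in generator_bits]
    for i in range(len(message_bits)):
        if bits[i]:
            for j, g in enumerate(gen):
                bits[i + j] ^= g
    return ''.join(map(str, bits[-n:]))

# Hypothetical example: message 1101011011, generator 10011 (x^4 + x + 1)
rem = crc_remainder('1101011011', '10011')
print(rem)  # 1110; sender transmits F(x) = 1101011011 + 1110
```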

  35. Properties of CRC • Sent F(x), but received F’(x) = F(x) + E(x). When will E(x)/P(x) have no remainder, i.e., when does CRC fail to catch an error? • Single bit error: E(x) = x^i. If P(x) has two or more terms, P(x) will not divide E(x) • 2 isolated single bit errors (double errors): E(x) = x^i + x^j with i > j, so E(x) = x^j·(x^(i−j) + 1). Provided that P(x) is not divisible by x, a sufficient condition to detect all double errors is that P(x) does not divide x^t + 1 for any t up to i−j (i.e., the block length)

  36. Properties of CRC • Odd number of bit errors: if x+1 is a factor of P(x), all errors affecting an odd number of bits are detected • Proof (by contradiction): assume E(x) has an odd number of terms yet has x+1 as a factor, i.e., E(x) = (x+1)·T(x). Evaluate at x = 1 (arithmetic mod 2): E(1) = 1, since there are an odd number of terms; but (1+1)·T(1) = 0·T(1) = 0. So E(x) ≠ (x+1)·T(x), a contradiction; hence x+1, and therefore P(x), does not divide E(x), and the error is detected

  37. Properties of CRC • Short burst errors (length t ≤ n, the number of redundant bits): E(x) = x^j·(x^(t−1) + … + 1), i.e., length t starting at bit position j. If P(x) has an x^0 term and t ≤ n, P(x) will not divide E(x) → all bursts up to length n are detected • Long burst errors (length t = n+1): undetectable only if the burst error is the same as P(x). P(x) = x^n + … + 1 has n−1 bits between x^n and x^0 that E(x) = 1 + … + 1 must match, so the probability of not detecting the error is 2^−(n−1) • Longer burst errors (length t > n+1): probability of not detecting the error is 2^−n

  38. Error Correction • Hamming Codes (more next week)
