# Information and Coding Theory. Transmission over lossless channels. Entropy. Compression codes


Information and Coding Theory Transmission over lossless channels. Entropy. Compression codes - Shannon code, Huffman code, arithmetic code. Juris Viksna, 2014. Information transmission. We will focus on compression/decompression parts, assuming that there are no losses during transmission.


#### Presentation Transcript

Information and Coding Theory

Transmission over lossless channels. Entropy. Compression codes: Shannon code, Huffman code, arithmetic code.

Juris Viksna, 2014

### Information transmission

We will focus on the compression/decompression parts, assuming that there are no losses during transmission.

### Noiseless channel

How many bits do we need to transfer a particular piece of information?

All possible n-bit messages, each with probability $1/2^n$.

Obviously n bits will be sufficient.

Also, it is not hard to guess that n bits will be necessary to distinguish between all possible messages.

### Noiseless channel

All possible n-bit messages.

| Msg. | Prob. |
|---|---|
| 000000... | ½ |
| 111111... | ½ |
| other | 0 |

n bits will still be sufficient.

However, we can do quite nicely with just 1 bit!

### Noiseless channel

All possible n-bit messages.

| Msg. | Prob. |
|---|---|
| 00 | ¼ |
| 01 | ¼ |
| 10 | ½ |
| other | 0 |

Try to use 2 bits for “00” and “01” and 1 bit for “10”:

00 → 00

01 → 01

10 → 1

### Noiseless channel

All possible n-bit messages, the probability of message i being $p_i$.

We can try to generalize this by defining entropy (the minimal average number of bits we need to distinguish between messages) in the following way:

$H = -\sum_i p_i \log p_i$

The word is derived from the Greek εντροπία "a turning towards" (εν- "in" + τροπή "a turning").

### Entropy - The idea

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

Example

### Entropy - Definition

NB! Unless explicitly stated otherwise, in this course (as well as in Computer Science in general) expressions log x denote the logarithm to base 2 (i.e. $\log_2 x$).

### Entropy - Definition

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X. For X taking values x with probabilities p(x):

$H(X) = -\sum_x p(x) \log p(x)$

### Binary entropy function

Entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, $H_b(p)$.

The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.
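As a small illustration, the binary entropy function can be evaluated directly. This is a minimal sketch; the function name `binary_entropy` and the sample grid are assumptions made for the example.

```python
from math import log2

def binary_entropy(p):
    """H_b(p) = -p log2 p - (1 - p) log2 (1 - p), with H_b(0) = H_b(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

# Evaluate on a coarse grid of success probabilities.
grid = [i / 10 for i in range(11)]
values = [binary_entropy(p) for p in grid]
best_p = grid[values.index(max(values))]
print(best_p, max(values))
```

On this grid the maximum is reached at p = 0.5, where the function equals 1 bit.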

### Entropy - some properties

Entropy is maximized if the probability distribution is uniform, i.e. all probabilities $p_i$ are equal.

Sketch of proof:

Take any two probabilities p and q of the distribution. Replacing both of them by (p+q)/2 does not decrease the entropy. The contribution of the two terms before and after the replacement is

$H(p,q) = -(p \log p + q \log q)$

$H\left(\frac{p+q}{2}, \frac{p+q}{2}\right) = -(p+q) \log \frac{p+q}{2}$

The function $f(x) = x \log x$ is convex, so $f(p) + f(q) \ge 2 f\left(\frac{p+q}{2}\right)$, that is

$p \log p + q \log q \ge (p+q) \log \frac{p+q}{2}$

and therefore

$H\left(\frac{p+q}{2}, \frac{p+q}{2}\right) - H(p,q) = -(p+q) \log \frac{p+q}{2} + (p \log p + q \log q) \ge 0$
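The averaging step can also be checked numerically. The sketch below uses a hypothetical four-symbol distribution; `entropy` and `average_pair` are helper names introduced only for this illustration.

```python
from math import log2

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

def average_pair(probs, i, j):
    """Replace probabilities i and j by their mean, leaving the rest unchanged."""
    out = list(probs)
    out[i] = out[j] = (probs[i] + probs[j]) / 2
    return out

dist = [0.5, 0.3, 0.1, 0.1]          # hypothetical distribution
smoothed = average_pair(dist, 0, 2)   # first and third become 0.3 each
uniform = [0.25] * 4

print(entropy(dist), entropy(smoothed), entropy(uniform))
```

Each averaging step moves the distribution closer to uniform without lowering the entropy, and the uniform distribution on four symbols reaches log 4 = 2 bits.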

### Joint entropy

Assume that we have a set of symbols $\Sigma$ with known frequencies of symbol occurrences. We have assumed that on average we will need $H(\Sigma)$ bits to distinguish between symbols.

What about sequences of length n of symbols from $\Sigma$ (assuming independent occurrence of each symbol with the given frequency)?

The entropy of $\Sigma^n$ will be:

$H(\Sigma^n) = -\sum_{a_1 \dots a_n \in \Sigma^n} p(a_1) \cdots p(a_n) \log \big(p(a_1) \cdots p(a_n)\big)$

It turns out that $H(\Sigma^n) = nH(\Sigma)$.

Later we will show that (assuming some restrictions) encodings that use $nH(\Sigma)$ bits on average are the best we can get.

### Joint entropy

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing (X,Y):

$H(X,Y) = -\sum_{x,y} p(x,y) \log p(x,y)$

This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

### Conditional entropy

The conditional entropy of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:

$H(X|Y) = \sum_y p(y) H(X|Y=y) = -\sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(y)}$

### Mutual information

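Mutual information measures the amount of information that can be obtained about one random variable by observing another.

$I(X;Y) = \sum_{x,y} p(x,y) \log \frac{p(x,y)}{p(x)\,p(y)}$

Mutual information is symmetric:

$I(X;Y) = I(Y;X) = H(X) + H(Y) - H(X,Y)$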

### Entropy (summarized)

Relations between entropies, conditional entropies, joint entropy and mutual information.
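These relations can be checked on a concrete example. The sketch below uses the standard definitions of joint entropy, conditional entropy and mutual information; the joint distribution `joint` is hypothetical and all function names are introduced only for this illustration.

```python
from math import log2

# Hypothetical joint distribution p(x, y) of two binary variables.
joint = {(0, 0): 0.5, (0, 1): 0.25, (1, 0): 0.0, (1, 1): 0.25}

def entropy(probs):
    """Shannon entropy in bits; zero-probability terms are skipped."""
    return -sum(p * log2(p) for p in probs if p > 0)

def marginal(joint, axis):
    """Marginal distribution of the variable at position `axis` of the pair."""
    out = {}
    for pair, p in joint.items():
        out[pair[axis]] = out.get(pair[axis], 0.0) + p
    return out

def conditional_entropy(joint, given_axis):
    """H of the other variable given the one at `given_axis`, from the definition."""
    given = marginal(joint, given_axis)
    return -sum(p * log2(p / given[pair[given_axis]])
                for pair, p in joint.items() if p > 0)

h_x = entropy(marginal(joint, 0).values())
h_y = entropy(marginal(joint, 1).values())
h_xy = entropy(joint.values())
h_x_given_y = conditional_entropy(joint, 1)
h_y_given_x = conditional_entropy(joint, 0)
mutual_info = h_x + h_y - h_xy

print(h_xy, h_x_given_y, mutual_info)
```

For this distribution the joint entropy is 1.5 bits and H(X|Y) is 0.5 bits, and the mutual information is the same whether it is computed as H(X) - H(X|Y) or as H(Y) - H(Y|X).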

### Binary encoding - The problem

Straightforward approach - use 3 bits to encode each character (e.g. '000' for a, '001' for b, '010' for c, '011' for d, '100' for e, '101' for f).

The length of the data file will then be 300 000 bits.

Can we do better?

### Optimal codes

Is this prefix code optimal?

### Huffman encoding - example 2

Construct Huffman code for symbols with frequencies:

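| Symbol | A | D | F | H | I | M | N | U | V | # |
|---|---|---|---|---|---|---|---|---|---|---|
| Frequency | 15 | 6 | 6 | 3 | 1 | 2 | 2 | 2 | 2 | 7 |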
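One way to work through this example is to run the greedy merging procedure in code. The following is a minimal sketch of Huffman's algorithm using a binary heap; the function name `huffman_code` and the tie-breaking counter are choices made for this illustration, so the individual code words may differ from a tree drawn by hand, although the total encoded length is the same for every Huffman tree.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Return {symbol: bit string} for a dict of positive symbol weights."""
    if len(freqs) == 1:
        return {s: "0" for s in freqs}
    tie = count()  # unique counter so heap entries never compare the dicts
    heap = [(w, next(tie), {s: ""}) for s, w in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Join the two subtrees with the smallest total weight.
        w1, _, c1 = heapq.heappop(heap)
        w2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (w1 + w2, next(tie), merged))
    return heap[0][2]

# Frequencies from the example.
freqs = {"A": 15, "D": 6, "F": 6, "H": 3, "I": 1,
         "M": 2, "N": 2, "U": 2, "V": 2, "#": 7}

code = huffman_code(freqs)
total_bits = sum(freqs[s] * len(code[s]) for s in freqs)
for s in sorted(code, key=lambda s: (len(code[s]), s)):
    print(s, freqs[s], code[s])
print(total_bits)
```

The total comes to 135 bits for the 46 symbols, compared with 184 bits for a fixed-length 4-bit encoding of the ten symbols.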

### Huffman encoding - optimality

Huffman codes are optimal!

### Huffman encoding - optimality (proof 2)

• Proof by induction:

• n = 1: OK

• assume T is obtained by the Huffman algorithm and X is an optimal tree.

• Construct T’ and X’ as described by the lemma. Then:

• $w(T') \le w(X')$ (by the induction hypothesis)

• $w(T) = w(T') + C(n_1) + C(n_2)$

• $w(X) \ge w(X') + C(n_1) + C(n_2)$

• $w(T) \le w(X)$

### Huffman encoding and entropy

• W() - average number of bits used by Huffman code

• H() - entropy

• Then H() W()<H()+1.

• Assume all probabilities are in form 1/2k.

• Then we can prove by induction that H() =W() (we can state that symbol with probability 1/2k. will always be at depth k)

• obvious if ||=1 or ||=2

• otherwise there will always be two symbols having smallest probabilities both equal to 1/2k

• these will be joined by Huffman algorithm, thus we reduced the problem to alphabet containing one symbol less.

### Huffman encoding and entropy

• W() - average number of bits used by Huffman code

• H() - entropy

• Then W()<H()+1.

• Consider symbols a with probabilities 1/2k+1 p(a) < 1/2k

• modify alphabet: for each a reduce its probability to 1/2k+1

• add extra symbols with probabilities in form 1/2k (so that all powers for these are different)

• construct Huffman encoding tree

• the depth of initial symbols will be k+1, thus W() < H()+1

• we can prune the tree deleting extra symbols, this will only

• decrease W()

### Huffman encoding and entropy

Can we claim that H() W()<H()+1?

In general case symbol with probability 1/2k can be at depth other than k:

Consider two symbols with probabilities 1/2k and 1  1/2k, both of them

will be at depth 1. However changing both probabilities to ½ the entropy will only increase.

By induction we can show that all symbol probabilities can be all changed to have a form 1/2k in such a way that entropy does not decrease and the Huffman tree does not change its structure.

Thus we always will have H() W()<H()+1.

### Arithmetic coding

Unlike the variable-length codes described previously, arithmetic coding generates non-block codes. In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist. Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.

The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in proportion to its probability of occurrence. The resulting code length is expected to approach the limit set by entropy.

### Arithmetic coding

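Let the message to be encoded be a1a2a3a3a4, with symbol probabilities p(a1) = p(a2) = p(a4) = 0.2 and p(a3) = 0.4, so that the symbols occupy the subintervals [0, 0.2), [0.2, 0.4), [0.4, 0.8) and [0.8, 1) of [0, 1).

| Symbol encoded | Interval after this symbol |
|---|---|
| (start) | [0, 1) |
| a1 | [0, 0.2) |
| a2 | [0.04, 0.08) |
| a3 | [0.056, 0.072) |
| a3 | [0.0624, 0.0688) |
| a4 | [0.06752, 0.0688) |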

### Arithmetic coding

• So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.

• Here 3 decimal digits are used to represent the 5-symbol source message. This translates into 3/5 or 0.6 decimal digits per source symbol and compares favourably with the entropy of $-(3 \times 0.2 \log_{10} 0.2 + 0.4 \log_{10} 0.4) \approx 0.5786$ digits per symbol.

• As the length of the sequence increases, the resulting arithmetic code approaches the bound set by entropy.

• In practice, the length fails to reach the lower bound, because of:

• the addition of the end-of-message indicator that is needed to separate one message from another

• the use of finite-precision arithmetic
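The interval narrowing for this example can be reproduced with exact rational arithmetic. This is a minimal sketch that sidesteps the finite-precision issue by using `fractions.Fraction`; the names `MODEL`, `cumulative` and `encode` are illustrative, and the symbol probabilities (0.2 for a1, a2, a4 and 0.4 for a3) are the ones implied by the example.

```python
from fractions import Fraction

# Source model assumed from the example: a1, a2, a4 -> 0.2, a3 -> 0.4.
MODEL = [("a1", Fraction(1, 5)), ("a2", Fraction(1, 5)),
         ("a3", Fraction(2, 5)), ("a4", Fraction(1, 5))]

def cumulative(model):
    """Map each symbol to its [low, high) slice of the unit interval."""
    ranges, low = {}, Fraction(0)
    for sym, p in model:
        ranges[sym] = (low, low + p)
        low += p
    return ranges

def encode(message, model=MODEL):
    """Return the final interval [low, high) for a sequence of symbols."""
    ranges = cumulative(model)
    low, high = Fraction(0), Fraction(1)
    for sym in message:
        width = high - low
        s_low, s_high = ranges[sym]
        # Shrink the current interval to the slice belonging to this symbol.
        low, high = low + width * s_low, low + width * s_high
    return low, high

low, high = encode(["a1", "a2", "a3", "a3", "a4"])
print(float(low), float(high))
```

The result is the interval [0.06752, 0.0688), which contains the code word 0.068.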

### Arithmetic coding

Decoding:

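Decode 0.572.

Since 0.4 ≤ 0.572 < 0.8, the first symbol should be a3. The interval [0.4, 0.8) is then subdivided in the same proportions and the step is repeated:

| Step | Current interval | Subinterval containing 0.572 | Symbol |
|---|---|---|---|
| 1 | [0, 1) | [0.4, 0.8) | a3 |
| 2 | [0.4, 0.8) | [0.56, 0.72) | a3 |
| 3 | [0.56, 0.72) | [0.56, 0.592) | a1 |
| 4 | [0.56, 0.592) | [0.5664, 0.5728) | a2 |
| 5 | [0.5664, 0.5728) | [0.57152, 0.5728) | a4 |

Therefore, the message is:

a3a3a1a2a4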
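The decoding steps can be sketched in the same way. This sketch assumes that the decoder knows the message length (5 symbols), where a practical coder would rely on an end-of-message indicator; `RANGES` and `decode` are names introduced for the illustration, and exact fractions are used to avoid rounding at interval boundaries.

```python
from fractions import Fraction

# Symbol slices of [0, 1) assumed from the example.
RANGES = {
    "a1": (Fraction(0), Fraction(1, 5)),
    "a2": (Fraction(1, 5), Fraction(2, 5)),
    "a3": (Fraction(2, 5), Fraction(4, 5)),
    "a4": (Fraction(4, 5), Fraction(1)),
}

def decode(code, length, ranges=RANGES):
    """Recover `length` symbols from a code word lying in [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    out = []
    for _ in range(length):
        width = high - low
        for sym, (s_low, s_high) in ranges.items():
            lo, hi = low + width * s_low, low + width * s_high
            if lo <= code < hi:
                # The code word falls in this symbol's slice: emit it and zoom in.
                out.append(sym)
                low, high = lo, hi
                break
    return out

message = decode(Fraction("0.572"), 5)
print("".join(message))
```

For the code word 0.572 this yields a3a3a1a2a4, matching the result above.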