Information and Coding Theory



Information and Coding Theory

Transmission over lossless channels. Entropy. Compression codes - Shannon code, Huffman code, arithmetic code.

Juris Viksna, 2014



Information transmission

We will focus on the compression/decompression parts, assuming that there are no losses during transmission.

[Adapted from D.MacKay]



Noiseless channel

[Adapted from D.MacKay]



Noiseless channel

How many bits do we need to transfer a particular piece of information?

All possible n-bit messages, each with probability 1/2^n, are sent over the noiseless channel to the receiver.

Obviously n bits will be sufficient.

Also, it is not hard to guess that n bits will be necessary to distinguish between all possible messages.



Noiseless channel

All possible n-bit messages are sent over the noiseless channel to the receiver, with probabilities:

Msg. Prob.
000000... ½
111111... ½
other 0

n bits will still be sufficient.

However, we can do quite nicely with just 1 bit!



Noiseless channel

All possible n-bit messages are sent over the noiseless channel to the receiver, with probabilities:

Msg. Prob.
00 ¼
01 ¼
10 ½
other 0

Try to use 2 bits for “00” and “01” and 1 bit for “10”:

00 → 00
01 → 01
10 → 1
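A quick check of this code's average length (illustrative Python, not from the slides): the expected codeword length equals the entropy of the distribution, 1.5 bits.

import math

# The distribution and code from the slide: "00" and "01" keep 2 bits, "10" gets 1 bit.
probs = {"00": 0.25, "01": 0.25, "10": 0.5}
code = {"00": "00", "01": "01", "10": "1"}

H = -sum(p * math.log2(p) for p in probs.values())
avg_len = sum(p * len(code[m]) for m, p in probs.items())
print(H, avg_len)   # 1.5 1.5 -- this code meets the entropy bound exactly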



Noiseless channel

All possible n-bit messages are sent over the noiseless channel to the receiver, the probability of message i being pi.

We can try to generalize this by defining entropy (the minimal average number of bits we need to distinguish between messages) in the following way:
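The definition referred to here is the standard Shannon entropy: for messages occurring with probabilities p_1, ..., p_m,

H = - \sum_{i=1}^{m} p_i \log_2 p_i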

Derived from the Greek εντροπία "a turning towards" (εν- "in" + τροπή "a turning").



Entropy - The idea

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

[Adapted from T.Mitchell]



Entropy - The idea

[Adapted from T.Mitchell]



Entropy - Definition

Example

[Adapted from D.MacKay]



Entropy - Definition

NB!!!

If not explicitly stated otherwise, in this course (as well as in Computer Science in general) expressions log x denote logarithms of base 2 (i.e. log2 x).

[Adapted from D.MacKay]



Entropy - Definition

The entropy, H, of a discrete random variable X is a measure of the amount of uncertainty associated with the value of X.

[Adapted from T.Mitchell]
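As a small illustration (not from the slides), the definition can be computed directly in Python; the distributions below are arbitrary examples:

import math

def entropy(probs):
    # H(X) = -sum of p(x) * log2 p(x); zero-probability outcomes contribute nothing.
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([1/8] * 8))          # 3.0 bits - uniform over 8 outcomes
print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits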



Entropy - Some examples

[Adapted from T.Mitchell]



Entropy - Some examples

[Adapted from T.Mitchell]



Binary entropy function

Entropy of a Bernoulli trial as a function of success probability, often called the binary entropy function, Hb(p). The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.

[Adapted from www.wikipedia.org]
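A short numerical check of this claim (illustrative Python, not from the slides): Hb(p) evaluated on a grid of p values peaks at p = 0.5.

import math

def binary_entropy(p):
    # Hb(p) = -p*log2(p) - (1-p)*log2(1-p), with Hb(0) = Hb(1) = 0
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

grid = [(p / 100, binary_entropy(p / 100)) for p in range(0, 101, 5)]
print(max(grid, key=lambda pair: pair[1]))   # (0.5, 1.0) - the unbiased coin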



Entropy - some properties

[Adapted from D.MacKay]



Entropy - some properties

Entropy is maximized if the probability distribution is uniform, i.e. all probabilities pi are equal.

Sketch of proof:

Assume two of the probabilities are p and q; replacing both of them by (p+q)/2 does not decrease the entropy. The contribution of these two probabilities is

H(p, q) = – (p log p + q log q)
H((p+q)/2, (p+q)/2) = – ((p+q)/2) log((p+q)/2) – ((p+q)/2) log((p+q)/2) = – (p+q) log((p+q)/2)

Since the function f(x) = – x log x is concave, f((p+q)/2) ≥ (f(p) + f(q))/2, and therefore

H((p+q)/2, (p+q)/2) – H(p, q) = – (p+q) log((p+q)/2) + (p log p + q log q) ≥ 0.

In addition we also need some smoothness assumptions about H.



Joint entropy

Assume that we have a set of symbols Σ with known frequencies of symbol occurrences. We have assumed that on average we will need H(Σ) bits to distinguish between symbols.

What about sequences of length n of symbols from Σ (assuming independent occurrence of each symbol with the given frequency)?

It turns out that the entropy of Σ^n is H(Σ^n) = n·H(Σ).

Later we will show that (assuming some restrictions) encodings that use n·H(Σ) bits on average are the best we can get.
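A quick numerical check of H(Σ^n) = n·H(Σ) for n = 2 (illustrative Python with an arbitrary example distribution, not from the slides):

import math

def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

single = [0.5, 0.25, 0.25]                       # H = 1.5 bits
pairs = [p * q for p in single for q in single]  # independent pairs of symbols
print(entropy(single), entropy(pairs))           # 1.5 3.0, i.e. H doubles for n = 2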



Joint entropy

The joint entropy of two discrete random variables X and Y is merely the entropy of their pairing, (X, Y). This implies that if X and Y are independent, then their joint entropy is the sum of their individual entropies.

[Adapted from D.MacKay]
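In symbols (the standard definition, stated here for reference):

H(X, Y) = - \sum_{x, y} p(x, y) \log_2 p(x, y), \qquad H(X, Y) = H(X) + H(Y) \text{ when } X \text{ and } Y \text{ are independent.}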



Conditional entropy

The conditional entropy of X given random variable Y (also called the equivocation of X about Y) is the average conditional entropy over Y:

[Adapted from D.MacKay]
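The formula referred to above is the standard one:

H(X \mid Y) = \sum_{y} p(y)\, H(X \mid Y = y) = - \sum_{x, y} p(x, y) \log_2 p(x \mid y)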



Conditional entropy

[Adapted from D.MacKay]



Mutual information

Mutual information measures the amount of information that can be obtained about one random variable by observing another.

Mutual information is symmetric:

[Adapted from D.MacKay]
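The standard identities behind these statements:

I(X; Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X, Y) = I(Y; X)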



Entropy (summarized)

Relations between entropies, conditional entropies, joint entropy and mutual information.

[Adapted from D.MacKay]



Entropy - example

[Adapted from D.MacKay]



Binary encoding - The problem

Straightforward approach - use 3 bits to encode each character (e.g. '000' for a, '001' for b, '010' for c, '011' for d, '100' for e, '101' for f).

The length of the encoded data file will then be 300 000 bits.

Can we do better?

[Adapted from S.Cheng]



Variable length codes

[Adapted from S.Cheng]



Encoding

[Adapted from S.Cheng]



Decoding

[Adapted from S.Cheng]



Prefix codes

[Adapted from S.Cheng]



Prefix codes

[Adapted from S.Cheng]



Binary trees and prefix codes

[Adapted from S.Cheng]



Binary trees and prefix codes

[Adapted from S.Cheng]



Optimal codes

Is this prefix code optimal?

[Adapted from S.Cheng]



Optimal codes

[Adapted from S.Cheng]



Shannon encoding

[Adapted from M.Brookes]



Huffman encoding

[Adapted from S.Cheng]



Huffman encoding - example

[Adapted from S.Cheng]



Huffman encoding - example

[Adapted from S.Cheng]



Huffman encoding - example

[Adapted from S.Cheng]



Huffman encoding - example

[Adapted from S.Cheng]



Huffman encoding - example

[Adapted from S.Cheng]



Huffman encoding - example 2

Construct Huffman code for symbols with frequencies:

A 15
D 6
F 6
H 3
I 1
M 2
N 2
U 2
V 2
# 7



Huffman encoding - example 2

[Adapted from H.Lewis, L.Denenberg]



Huffman encoding - algorithm

[Adapted from S.Cheng]



Huffman encoding - optimality

[Adapted from S.Cheng]



Huffman encoding - optimality

[Adapted from S.Cheng]



Huffman encoding - optimality

[Adapted from S.Cheng]



Huffman encoding - optimality

Huffman codes are optimal!

[Adapted from S.Cheng]



Huffman encoding - optimality (proof 2)

[Adapted from H.Lewis and L.Denenberg]



Huffman encoding - optimality (proof 2)

  • Proof by induction:
  • n = 1: OK
  • assume T is obtained by the Huffman algorithm and X is an optimal tree.
  • Construct T′ and X′ as described by the lemma. Then:
  • w(T′) ≤ w(X′) (by the induction hypothesis, T′ is optimal for the reduced alphabet)
  • w(T) = w(T′) + C(n1) + C(n2)
  • w(X) ≥ w(X′) + C(n1) + C(n2)
  • therefore w(T) ≤ w(X)

[Adapted from H.Lewis and L.Denenberg]



Huffman encoding and entropy

  • W(Σ) - the average number of bits used by the Huffman code
  • H(Σ) - the entropy
  • Then H(Σ) ≤ W(Σ) < H(Σ) + 1.
  • Assume all probabilities are of the form 1/2^k.
  • Then we can prove by induction that H(Σ) = W(Σ) (we can state that a symbol with probability 1/2^k will always be at depth k):
  • obvious if |Σ| = 1 or |Σ| = 2
  • otherwise there will always be two symbols having the smallest probabilities, both equal to 1/2^k
  • these will be joined by the Huffman algorithm, thus reducing the problem to an alphabet containing one symbol less.



Huffman encoding and entropy

  • W(Σ) - the average number of bits used by the Huffman code
  • H(Σ) - the entropy
  • Then W(Σ) < H(Σ) + 1.
  • Consider symbols a with probabilities 1/2^(k+1) ≤ p(a) < 1/2^k:
  • modify the alphabet: for each such a, reduce its probability to 1/2^(k+1)
  • add extra symbols with probabilities of the form 1/2^k (so that all the powers for these are different)
  • construct the Huffman encoding tree
  • the depth of the initial symbols will be k+1, thus W(Σ) < H(Σ) + 1
  • we can prune the tree, deleting the extra symbols; this will only decrease W(Σ)



Huffman encoding and entropy

Can we claim that H(Σ) ≤ W(Σ) < H(Σ) + 1?

In the general case a symbol with probability 1/2^k can be at a depth other than k: consider two symbols with probabilities 1/2^k and 1 – 1/2^k; both of them will be at depth 1. However, changing both probabilities to ½ will only increase the entropy.

By induction we can show that all symbol probabilities can be changed to the form 1/2^k in such a way that the entropy does not decrease and the Huffman tree does not change its structure.

Thus we will always have H(Σ) ≤ W(Σ) < H(Σ) + 1.
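A small illustration of the bounds (Python, not from the slides), using the two-symbol distribution mentioned above: both symbols sit at depth 1, so W(Σ) = 1, while H(Σ) can be much smaller, and only H(Σ) ≤ W(Σ) < H(Σ) + 1 holds in general.

import math

for k in range(1, 6):
    p = 1 / 2**k                      # probabilities 1/2^k and 1 - 1/2^k
    H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    W = 1.0                           # the Huffman code assigns 1 bit to each symbol
    print(f"k={k}  H={H:.4f}  W={W}  H <= W < H+1: {H <= W < H + 1}")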



Arithmetic coding

Unlike the variable-length codes described previously, arithmetic coding generates non-block codes. In arithmetic coding there is no one-to-one correspondence between source symbols and code words. Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.

The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller, and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence. The code length is thus supposed to approach the limit set by the entropy.



Arithmetic coding

Let the message to be encoded be a1a2a3a3a4



Arithmetic coding

[Figure: successive subdivision of the coding interval for the message a1a2a3a3a4, with p(a1) = p(a2) = p(a4) = 0.2 and p(a3) = 0.4. The interval narrows from [0, 1) to [0, 0.2), then [0.04, 0.08), [0.056, 0.072), [0.0624, 0.0688) and finally [0.06752, 0.0688).]



Arithmetic coding



Arithmetic coding

  • So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.
  • Here 3 decimal digits are used to represent the 5-symbol source message. This translates into 3/5 = 0.6 decimal digits per source symbol and compares favourably with the entropy of –(3 × 0.2 log10 0.2 + 0.4 log10 0.4) ≈ 0.5786 digits per symbol.
  • As the length of the sequence increases, the resulting arithmetic code approaches the bound set by the entropy.
  • In practice, the length fails to reach the lower bound because of:
  • the addition of the end-of-message indicator that is needed to separate one message from another
  • the use of finite-precision arithmetic
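A minimal sketch of the interval-narrowing encoder for this example in Python; the subinterval table below follows the probabilities used in the worked example (0.2, 0.2, 0.4, 0.2), and the function name is illustrative:

ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def encode(message):
    # Narrow [0, 1) once per symbol, in proportion to the symbol's subinterval.
    low, high = 0.0, 1.0
    for sym in message:
        lo, hi = ranges[sym]
        width = high - low
        low, high = low + lo * width, low + hi * width
    return low, high

print(encode(["a1", "a2", "a3", "a3", "a4"]))
# ~(0.06752, 0.0688) up to floating-point rounding; any number inside,
# e.g. 0.068, identifies the message.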



Arithmetic coding

Decoding:

Decode 0.572.

Since 0.8 > code word > 0.4, the first symbol should be a3.

[Figure: successive interval subdivision while decoding 0.572: a3 gives [0.4, 0.8), the next a3 gives [0.56, 0.72), a1 gives [0.56, 0.592), a2 gives [0.5664, 0.5728), and a4 gives [0.57152, 0.5728).]

Therefore, the message is: a3a3a1a2a4
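A matching decoding sketch (illustrative Python, not from the slides); it recovers a3a3a1a2a4 from 0.572, assuming the message length is known:

ranges = {"a1": (0.0, 0.2), "a2": (0.2, 0.4), "a3": (0.4, 0.8), "a4": (0.8, 1.0)}

def decode(value, n_symbols):
    message = []
    for _ in range(n_symbols):
        for sym, (lo, hi) in ranges.items():
            if lo <= value < hi:                  # which subinterval contains the value?
                message.append(sym)
                value = (value - lo) / (hi - lo)  # rescale back to [0, 1)
                break
    return message

print(decode(0.572, 5))   # ['a3', 'a3', 'a1', 'a2', 'a4']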

