Information and Coding Theory

Transmission over lossless channels. Entropy. Compression codes -

Shannon code, Huffman code, arithmetic code.

Juris Viksna, 2014

Information transmission

We will focus on compression/decompression parts, assuming that there

are no losses during transmission.

[Adapted from D.MacKay]

Noiseless channel

[Adapted from D.MacKay]

Noiseless channel

How many bits do we need to transfer a particular piece of information?

All possible n bit messages, each with probability 1/2^n

Receiver

Noiseless channel

Obviously n bits will be sufficient.

Also, it is not hard to guess that n bits will be necessary to distinguish

between all possible messages.

Noiseless channel

All possible n bit messages.

Msg. Prob.

000000... ½

111111... ½

other 0

Receiver

Noiseless channel

n bits will still be sufficient.

However, we can do quite nicely with just 1 bit!

Noiseless channel
  • All possible n bit messages.
  • Msg. Prob.
  • 00 ¼
  • 01 ¼
  • 10 ½
  • other 0

Receiver

Noiseless channel

Try to use 2 bits for “00” and “01” and 1 bit for “10”:

00 → 00

01 → 01

10 → 1
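A quick check (not on the original slide): the expected code length of this scheme already matches the entropy of the distribution (¼, ¼, ½):

```latex
\bar{L} = \tfrac{1}{4}\cdot 2 + \tfrac{1}{4}\cdot 2 + \tfrac{1}{2}\cdot 1 = 1.5 \ \text{bits},
\qquad
H = \tfrac{1}{4}\log_2 4 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{2}\log_2 2 = 1.5 \ \text{bits}.
```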

Noiseless channel

All possible n bit messages, the probability of message i being pi.

Receiver

Noiseless channel

We can try to generalize this by defining entropy (the minimal average number of bits we need to distinguish between messages) in the following way:
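The formula referred to here appears in the slides only as an image; the standard Shannon definition it points to is:

```latex
H(X) = -\sum_{i} p_i \log_2 p_i
```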

Derived from the Greek εντροπία "a turning towards"

(εν- "in" + τροπή "a turning").

Entropy - The idea

The entropy, H, of a discrete random variable X is a measure of the

amount of uncertainty associated with the value of X.

[Adapted from T.Mitchell]

Entropy - The idea

[Adapted from T.Mitchell]

Entropy - Definition

Example

[Adapted from D.MacKay]

Entropy - Definition

NB!!!

If not explicitly stated otherwise, in this course (as well as in Computer Science in general) expressions log x denote the logarithm of base 2 (i.e. log2 x).

[Adapted from D.MacKay]

Entropy - Definition

The entropy, H, of a discrete random variable X is a measure of the

amount of uncertainty associated with the value of X.

[Adapted from T.Mitchell]

Entropy - Some examples

[Adapted from T.Mitchell]

Entropy - Some examples

[Adapted from T.Mitchell]

Binary entropy function

Entropy of a Bernoulli trial as a function of the success probability, often called the binary entropy function, Hb(p).

The entropy is maximized at 1 bit per trial when the two possible outcomes are equally probable, as in an unbiased coin toss.

[Adapted from www.wikipedia.org]
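In formula form (the plot itself is not reproduced in this transcript):

```latex
H_b(p) = -p \log_2 p - (1-p) \log_2 (1-p), \qquad H_b\!\left(\tfrac{1}{2}\right) = 1 \ \text{bit}.
```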

Entropy - some properties

[Adapted from D.MacKay]

Entropy - some properties

Entropy is maximized if the probability distribution is uniform, i.e. all probabilities pi are equal.

Sketch of proof:

If any two probabilities p and q are replaced by their average (p+q)/2, the entropy does not decrease:

H(p,q) = – (p log p + q log q)

H((p+q)/2, (p+q)/2) = – ((p+q) log ((p+q)/2))

Since x log x is convex, p log p + q log q ≥ (p+q) log ((p+q)/2), hence H((p+q)/2, (p+q)/2) – H(p,q) ≥ 0.

In addition, we also need some smoothness assumptions about H.
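A small numerical illustration of this property (a sketch, not from the slides; Python standard library only): for a fixed number of outcomes, the uniform distribution gives the largest entropy.

```python
import math

def entropy(probs):
    """H = -sum p_i log2 p_i, ignoring zero-probability outcomes."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))           # 1.0 bit - uniform over two outcomes
print(entropy([0.25, 0.25, 0.5]))    # 1.5 bits
print(entropy([1/3, 1/3, 1/3]))      # ~1.585 = log2(3), maximal for three outcomes
print(entropy([0.7, 0.2, 0.1]))      # ~1.157, below log2(3)
```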

Joint entropy

Assume that we have a set of symbols Σ with known frequencies of symbol occurrences. We have assumed that on average we will need H(Σ) bits to distinguish between symbols.

What about sequences of length n of symbols from Σ (assuming independent occurrence of each symbol with the given frequency)?

What will the entropy of Σ^n be? It turns out that H(Σ^n) = nH(Σ).

Later we will show that (assuming some restrictions) encodings that use nH(Σ) bits on average are the best we can get.
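A one-line justification, assuming the n symbol occurrences X1, ..., Xn are independent and identically distributed:

```latex
H(\Sigma^n) = H(X_1, X_2, \ldots, X_n) = \sum_{i=1}^{n} H(X_i) = n\,H(\Sigma)
```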

Joint entropy

The joint entropy of two discrete random variables X and Y is

merely the entropy of their pairing: (X,Y). This implies that if

X and Y are independent, then their joint entropy is the sum of their

individual entropies.

[Adapted from D.MacKay]
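The definition shown on the slide as an image is the standard one:

```latex
H(X,Y) = -\sum_{x,y} p(x,y)\,\log_2 p(x,y),
\qquad
H(X,Y) = H(X) + H(Y) \ \text{if } X \text{ and } Y \text{ are independent}.
```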

Conditional entropy

The conditional entropy of X given random variable Y (also called

the equivocation of X about Y) is the average conditional entropy

over Y:

[Adapted from D.MacKay]
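The formula (an image in the slides) is the standard definition:

```latex
H(X \mid Y) = \sum_{y} p(y)\, H(X \mid Y=y) = -\sum_{x,y} p(x,y)\,\log_2 p(x \mid y)
```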

Conditional entropy

[Adapted from D.MacKay]

Mutual information

Mutual information measures the amount of information that can

be obtained about one random variable by observing another.

Mutual information is symmetric:

[Adapted from D.MacKay]
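The standard identities behind this statement (shown as an image in the slides):

```latex
I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)
```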

Entropy (summarized)

Relations between entropies, conditional entropies, joint entropy and mutual information.

[Adapted from D.MacKay]

Entropy - example

[Adapted from D.MacKay]

Binary encoding - The problem

Straightforward approach - use 3 bits to encode each character (e.g. '000' for a, '001' for b, '010' for c, '011' for d, '100' for e, '101' for f).

The length of the encoded data file then will be 300 000 bits.

Can we do better?

[Adapted from S.Cheng]
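The frequency table behind this number is an image in the slides; presumably this is the usual setting of a file of 100 000 characters over the six characters a, b, c, d, e, f, in which case a fixed-length code needs:

```latex
\lceil \log_2 6 \rceil = 3 \ \text{bits/character}, \qquad 3 \times 100\,000 = 300\,000 \ \text{bits}.
```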

Variable length codes

[Adapted from S.Cheng]

Encoding

[Adapted from S.Cheng]

Decoding

[Adapted from S.Cheng]

Prefix codes

[Adapted from S.Cheng]

Prefix codes

[Adapted from S.Cheng]

Binary trees and prefix codes

[Adapted from S.Cheng]

Binary trees and prefix codes

[Adapted from S.Cheng]

Optimal codes

Is this prefix code optimal?

[Adapted from S.Cheng]

Optimal codes

[Adapted from S.Cheng]

Shannon encoding

[Adapted from M.Brookes]

Huffman encoding

[Adapted from S.Cheng]

Huffman encoding - example

[Adapted from S.Cheng]

Huffman encoding - example

[Adapted from S.Cheng]

Huffman encoding - example

[Adapted from S.Cheng]

Huffman encoding - example

[Adapted from S.Cheng]

Huffman encoding - example

[Adapted from S.Cheng]

Huffman encoding - example 2

Construct Huffman code for symbols with frequencies:

A 15

D 6

F 6

H 3

I 1

M 2

N 2

U 2

V 2

# 7
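A minimal Python sketch of the Huffman construction applied to these frequencies (not the slides' code; function and variable names are my own), repeatedly merging the two least frequent subtrees with a heap:

```python
import heapq

def huffman_code(freqs):
    """Build a Huffman code (symbol -> bit string) from a dict of frequencies."""
    # Heap entries are (weight, tie_breaker, tree); a tree is either a symbol
    # (leaf) or a (left, right) pair (internal node). The tie_breaker keeps
    # entries comparable when weights are equal.
    heap = [(w, i, sym) for i, (sym, w) in enumerate(sorted(freqs.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:                     # degenerate case: one symbol, code "0"
        return {heap[0][2]: "0"}
    counter = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)    # the two least frequent subtrees...
        w2, _, t2 = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, counter, (t1, t2)))  # ...are merged
        counter += 1
    code = {}
    def walk(tree, prefix):
        if isinstance(tree, tuple):        # internal node: append 0 / 1 and recurse
            walk(tree[0], prefix + "0")
            walk(tree[1], prefix + "1")
        else:                              # leaf: record the finished code word
            code[tree] = prefix
    walk(heap[0][2], "")
    return code

freqs = {"A": 15, "D": 6, "F": 6, "H": 3, "I": 1,
         "M": 2, "N": 2, "U": 2, "V": 2, "#": 7}
code = huffman_code(freqs)
total = sum(freqs.values())
avg_len = sum(freqs[s] * len(code[s]) for s in freqs) / total
print(code)
print("average code length:", avg_len)    # bits per symbol
```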

Huffman encoding - example 2

[Adapted from H.Lewis, L.Denenberg]

Huffman encoding - algorithm

[Adapted from S.Cheng]

Huffman encoding - optimality

[Adapted from S.Cheng]

Huffman encoding - optimality

[Adapted from S.Cheng]

Huffman encoding - optimality

[Adapted from S.Cheng]

Huffman encoding - optimality

Huffman codes are optimal!

[Adapted from S.Cheng]

Huffman encoding - optimality (proof 2)

[Adapted from H.Lewis and L.Denenberg]

Huffman encoding - optimality (proof 2)
  • Proof by induction:
  • n = 1: OK
  • assume T is obtained by the Huffman algorithm and X is an optimal tree.
  • Construct T’ and X’ as described by the lemma. Then:
  • w(T’) ≤ w(X’)
  • w(T) = w(T’) + C(n1) + C(n2)
  • w(X) ≥ w(X’) + C(n1) + C(n2)
  • therefore w(T) ≤ w(X)

[Adapted from H.Lewis and L.Denenberg]

Huffman encoding and entropy
  • W() - average number of bits used by Huffman code
  • H() - entropy
  • Then H() W()
  • Assume all probabilities are in form 1/2k.
  • Then we can prove by induction that H() =W() (we can state that symbol with probability 1/2k. will always be at depth k)
  • obvious if ||=1 or ||=2
  • otherwise there will always be two symbols having smallest probabilities both equal to 1/2k
  • these will be joined by Huffman algorithm, thus we reduced the problem to alphabet containing one symbol less.
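The equality for such "dyadic" probabilities can also be written out directly: if every pi = 2^(-ki) and symbol i ends up at depth ki, then

```latex
W(\Sigma) = \sum_i p_i k_i = \sum_i p_i \log_2 \frac{1}{p_i} = H(\Sigma).
```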
Huffman encoding and entropy
  • W() - average number of bits used by Huffman code
  • H() - entropy
  • Then W()
  • Consider symbols a with probabilities 1/2k+1 p(a) < 1/2k
  • modify alphabet: for each a reduce its probability to 1/2k+1
  • add extra symbols with probabilities in form 1/2k (so that all powers for these are different)
  • construct Huffman encoding tree
  • the depth of initial symbols will be k+1, thus W() < H()+1
  • we can prune the tree deleting extra symbols, this will only
  • decrease W()
Huffman encoding and entropy

Can we claim that H(Σ) ≤ W(Σ)?

In the general case a symbol with probability 1/2^k can be at a depth other than k:

consider two symbols with probabilities 1/2^k and 1 − 1/2^k; both of them will be at depth 1. However, changing both probabilities to ½ can only increase the entropy.

By induction we can show that all symbol probabilities can be changed to the form 1/2^k in such a way that the entropy does not decrease and the Huffman tree does not change its structure.

Thus we will always have H(Σ) ≤ W(Σ).
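A quick numerical check of H(Σ) ≤ W(Σ) < H(Σ) + 1 for the example-2 frequencies (a sketch; compare the entropy printed here with the average code length printed by the Huffman sketch above):

```python
import math

freqs = {"A": 15, "D": 6, "F": 6, "H": 3, "I": 1,
         "M": 2, "N": 2, "U": 2, "V": 2, "#": 7}
total = sum(freqs.values())
H = -sum(f / total * math.log2(f / total) for f in freqs.values())
print(H)   # entropy H(Σ) in bits per symbol; the Huffman average length W(Σ)
           # from the earlier sketch should satisfy H <= W < H + 1
```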

Arithmetic coding

Unlike the variable-length codes described previously, arithmetic coding generates non-block codes. In arithmetic coding, a one-to-one correspondence between source symbols and code words does not exist. Instead, an entire sequence of source symbols (or message) is assigned a single arithmetic code word.

The code word itself defines an interval of real numbers between 0 and 1. As the number of symbols in the message increases, the interval used to represent it becomes smaller and the number of information units (say, bits) required to represent the interval becomes larger. Each symbol of the message reduces the size of the interval in accordance with its probability of occurrence, and the resulting code length is supposed to approach the limit set by the entropy.

Arithmetic coding

Let the message to be encoded be a1a2a3a3a4

Arithmetic coding

[Figure: step-by-step subdivision of the unit interval while encoding a1 a2 a3 a3 a4 with p(a1) = p(a2) = p(a4) = 0.2 and p(a3) = 0.4; the working interval shrinks from [0, 0.2) to [0.04, 0.08), [0.056, 0.072), [0.0624, 0.0688) and finally [0.06752, 0.0688).]

Arithmetic coding
  • So, any number in the interval [0.06752, 0.0688), for example 0.068, can be used to represent the message.
  • Here 3 decimal digits are used to represent the 5-symbol source message. This translates into 3/5 or 0.6 decimal digits per source symbol and compares favourably with the entropy of
  • –(3 × 0.2·log₁₀ 0.2 + 0.4·log₁₀ 0.4) ≈ 0.5786 digits per symbol.
  • As the length of the sequence increases, the resulting arithmetic code approaches the bound set by entropy.
  • In practice, the length fails to reach the lower bound because of:
  • the addition of the end-of-message indicator that is needed to separate one message from another
  • the use of finite precision arithmetic
Arithmetic coding

Decoding:

Decode 0.572.

Since 0.8>code word > 0.4, the first symbol should be a3.

[Figure: repeated subdivision while decoding 0.572; at each step the subinterval containing the code value determines the next symbol, and the value is rescaled back into [0, 1).]

Therefore, the message is:

a3a3a1a2a4
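A minimal floating-point sketch of the encoder and decoder used in this example (not the slides' implementation; names and structure are my own, and a practical coder would use integer arithmetic and an end-of-message symbol):

```python
# Symbol probabilities as in the slides: p(a1) = p(a2) = p(a4) = 0.2, p(a3) = 0.4.
PROBS = {"a1": 0.2, "a2": 0.2, "a3": 0.4, "a4": 0.2}

def symbol_intervals(probs):
    """Assign each symbol a half-open subinterval of [0, 1)."""
    intervals, low = {}, 0.0
    for sym, p in probs.items():          # insertion order: a1, a2, a3, a4
        intervals[sym] = (low, low + p)
        low += p
    return intervals

def encode(message, probs):
    """Return the final interval [low, high) representing the whole message."""
    intervals = symbol_intervals(probs)
    low, high = 0.0, 1.0
    for sym in message:
        width = high - low
        s_low, s_high = intervals[sym]
        low, high = low + width * s_low, low + width * s_high
    return low, high

def decode(value, probs, n_symbols):
    """Recover n_symbols symbols from a number lying inside the final interval."""
    intervals = symbol_intervals(probs)
    message = []
    for _ in range(n_symbols):
        for sym, (s_low, s_high) in intervals.items():
            if s_low <= value < s_high:
                message.append(sym)
                value = (value - s_low) / (s_high - s_low)   # rescale back to [0, 1)
                break
    return message

print(encode(["a1", "a2", "a3", "a3", "a4"], PROBS))   # ~(0.06752, 0.0688)
print(decode(0.572, PROBS, 5))                         # ['a3', 'a3', 'a1', 'a2', 'a4']
```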
