
# COT 5611 Operating Systems Design Principles Spring 2012 - PowerPoint PPT Presentation

COT 5611 Operating Systems Design Principles Spring 2012. Dan C. Marinescu. Office: HEC 304. Office hours: M-Wd 5:00-6:00 PM. Lecture 17 – Wednesday March 14, 2012. Reading assignment: Chapter 8 from the on-line text; Claude Shannon’s paper. Last time: Information Theory.



Presentation Transcript

### COT 5611 Operating Systems Design Principles Spring 2012

Dan C. Marinescu

Office: HEC 304

Office hours: M-Wd 5:00-6:00 PM

• Chapter 8 from the on-line text

• Claude Shannon’s paper

• Last time - Information Theory

• Information theory - a statistical theory of communication

• Random variables, probability density functions (PDF), cumulative distribution functions (CDF)

• Thermodynamic entropy

• Shannon entropy

• Joint and conditional entropy

• Mutual information

• Shannon’s source coding theorem

• Channel capacity

Lecture 17

• Information Theory

• Applications of information theory

• Properties of Shannon’s entropy

• Joint and conditional entropy

• Mutual information

• Shannon’s source coding theorem

• Channel capacity

• Error detection and error correction


Error detection and error correction → increase redundancy to protect the message.

Data compression → remove redundancy.

Encryption → transform information to protect it.


For a binary random variable X whose two outcomes have probabilities p and 1 - p, the entropy $H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$ has the following properties:

H(X) > 0 for 0 < p < 1;

H(X) is symmetric about p = 0.5;

$\lim_{p \to 0} H(X) = \lim_{p \to 1} H(X) = 0$;

H(X) is increasing for 0 < p < 0.5, decreasing for 0.5 < p < 1, and has a maximum at p = 0.5.

The binary entropy is a concave function of p, the probability of an outcome.

Note: A function f(x) is convex over an interval (a,b) if

$f[k x_1 + (1-k) x_2] \le k f(x_1) + (1-k) f(x_2)$ for all $x_1, x_2$ in (a,b) and 0 ≤ k ≤ 1.

A function is concave over an interval (a,b) if [-f(x)] is convex over (a,b).
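The properties above can be checked numerically. The following is a minimal sketch, not part of the lecture; the function name `binary_entropy` and the sampling grid are assumptions made for illustration.

```python
import math

def binary_entropy(p: float) -> float:
    """Entropy in bits of a binary source with outcome probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # limiting value: 0 * log 0 is taken as 0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

# Sample the function on an illustrative grid strictly inside (0, 1).
grid = [i / 100 for i in range(1, 100)]
values = [binary_entropy(p) for p in grid]

# The largest sampled value sits at p = 0.5, where the entropy is 1 bit.
p_at_max = grid[values.index(max(values))]
```

Evaluating `binary_entropy(0.2)` and `binary_entropy(0.8)` gives the same value, which illustrates the symmetry about p = 0.5.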


H(X, Y) = H(Y, X) (symmetry of joint entropy)

H(X, Y) ≥ 0 (nonnegativity of joint entropy)

H(X | Y) ≥ 0; H(Y | X) ≥ 0 (nonnegativity of conditional entropy)

H(X | Y) = H(X, Y) - H(Y) (conditional and joint entropy relation)

H(X, Y) ≥ H(Y) (joint entropy vs. entropy of a single random variable)

H(X, Y) ≤ H(X) + H(Y) (subadditivity)

H(X, Y, Z) + H(Y) ≤ H(X, Y) + H(Y, Z) (strong subadditivity)

H(X | Y) ≤ H(X) (reduction of uncertainty by conditioning)

H(X, Y, Z) = H(X) + H(Y | X) + H(Z | X, Y) (chain rule for joint entropy)

H(X, Y | Z) = H(Y | X, Z) + H(X | Z) (chain rule for conditional entropy)
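Several of these relations can be verified on a small example. The sketch below is illustrative only: the joint distribution `joint` is a made-up example, and the helper names `entropy` and `marginal` are assumptions, not something defined in the lecture.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a distribution given as {outcome: probability}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, axis):
    """Marginal of a joint distribution {(x, y): p} along axis 0 (X) or axis 1 (Y)."""
    out = defaultdict(float)
    for pair, p in joint.items():
        out[pair[axis]] += p
    return dict(out)

# Hypothetical joint distribution of two correlated binary random variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}

h_xy = entropy(joint)               # H(X, Y)
h_x = entropy(marginal(joint, 0))   # H(X)
h_y = entropy(marginal(joint, 1))   # H(Y)
h_x_given_y = h_xy - h_y            # H(X | Y) = H(X, Y) - H(Y)
```

For this distribution, `h_x_given_y` is nonnegative and smaller than `h_x`, and `h_xy` lies between `h_y` and `h_x + h_y`, as the listed properties require.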

Lecture 17

I(X; Y) = I(Y; X) (symmetry of mutual information)

I(X; Y) = H(X) - H(X | Y) (mutual information, entropy, and conditional entropy)

I(X; Y) = H(Y) - H(Y | X) (mutual information, entropy, and conditional entropy)

I(X; X) = H(X) (mutual self-information and entropy)

I(X; X) ≥ 0 (nonnegativity of mutual self-information)

I(X; Y) = H(X) + H(Y) - H(X, Y) (mutual information, entropy, and joint entropy)

I(X; Y | Z) = H(X | Z) - H(X | Y, Z) (conditional mutual information and conditional entropy)

I(X, Y; Z) = I(X; Z | Y) + I(Y; Z) (chain rule for mutual information)

I(X; Z) ≤ I(X; Y) if X → Y → Z is a Markov chain (data processing inequality)
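As a rough illustration of the data processing inequality, the sketch below builds a hypothetical Markov chain X → Y → Z in which X is a fair bit and each arrow flips the bit with probability 0.1, then compares I(X;Y) with I(X;Z). The chain, the flip probability, and all function names are assumptions chosen for the example.

```python
import math
from itertools import product

def entropy(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) + H(Y) - H(X,Y) for a joint distribution {(x, y): p}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return entropy(px.values()) + entropy(py.values()) - entropy(joint.values())

def flip(a, b, eps):
    """P(output = b | input = a) for a step that flips a bit with probability eps."""
    return eps if a != b else 1 - eps

eps = 0.1  # assumed flip probability of each step
joint_xy = {(x, y): 0.5 * flip(x, y, eps) for x, y in product((0, 1), repeat=2)}
joint_xz = {
    (x, z): sum(0.5 * flip(x, y, eps) * flip(y, z, eps) for y in (0, 1))
    for x, z in product((0, 1), repeat=2)
}

i_xy = mutual_information(joint_xy)
i_xz = mutual_information(joint_xz)
```

Here `i_xz` comes out smaller than `i_xy`: processing Y into Z cannot increase the information available about X.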


Informally, Shannon's source coding theorem states that a message containing n independent, identically distributed samples of a random variable X with entropy H(X) can be compressed to a length

$l_X(n) = nH(X) + o(n)$

The justification of this theorem is based on the weak law of large numbers: the mean of a large number of independent, identically distributed random variables $x_i$,

$\frac{1}{n} \sum_{i=1}^{n} x_i$

approaches the average, $\bar{x} = E[x_i]$,

with a high probability when n is large:

$P\left( \left| \frac{1}{n} \sum_{i=1}^{n} x_i - \bar{x} \right| \ge \epsilon \right) \le \delta$

with ε > 0 and δ > 0 arbitrary.

Lecture 17

When the source has an alphabet with m symbols and messages consist of n independently selected symbols from this alphabet, a large number of these sequences are typical.

There are $2^{nH(A)}$ typical strings; therefore we need $\log_2 2^{nH(A)} = nH(A)$ bits to encode all possible typical strings. This is the upper bound for the data compression provided by Shannon's source coding theorem.
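The counting argument can be made concrete for a binary source. The sketch below is an illustration under stated assumptions: the source emits 1 with probability p = 0.2, messages have n = 1000 symbols, and a sequence is called typical when its per-symbol surprise is within a tolerance `eps` of the entropy. The function name and these parameter values are not from the lecture.

```python
import math

def typical_set_summary(p: float, n: int, eps: float):
    """For n i.i.d. bits with P(1) = p, return (H, probability, log2(size) / n)
    for the set of sequences whose per-symbol surprise is within eps of H."""
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    prob, count = 0.0, 0
    for k in range(n + 1):  # k = number of 1s; sequences with equal k are equally likely
        surprise = -(k * math.log2(p) + (n - k) * math.log2(1 - p)) / n
        if abs(surprise - h) <= eps:
            ways = math.comb(n, k)  # number of sequences with exactly k ones
            count += ways
            prob += ways * p**k * (1 - p) ** (n - k)
    return h, prob, math.log2(count) / n

h, prob, rate = typical_set_summary(p=0.2, n=1000, eps=0.05)
```

With these values the typical set carries most of the probability mass, while `rate` (bits per symbol needed to index the set) stays below `h + eps`, well under the 1 bit per symbol needed to index all binary strings.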


Discrete memoryless channel:

$C = \max_{p(x)} I(X;Y)$

the maximum of mutual information between the input X and the output Y.

The capacity of a noisy channel

The noisy binary symmetric channel: p is the probability of error; q = Prob(X = 0); logarithms are in base 2

I(X;Y) = H(Y) - H(Y | X)

H(Y | X) = -{ q [p log p + (1 - p) log(1 - p)] + (1 - q) [p log p + (1 - p) log(1 - p)] }

= -[p log p + (1 - p) log(1 - p)]

We maximize I(X;Y) by making H(Y) = 1, which happens for q = 1/2 → C = 1 + [p log p + (1 - p) log(1 - p)]

p = 1/2 → C = 0 because the output is independent of the input;

p = 0 or p = 1 → C = 1; we have a noiseless channel

The capacity of the binary erasure channel, with $p_e$ the probability of erasure:

$C_e = 1 - p_e$
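A short numerical sketch of these capacity results follows. It assumes base-2 logarithms; the function names and the grid search over q are illustrative choices, not part of the lecture.

```python
import math

def binary_entropy(p: float) -> float:
    """-(p log2 p + (1 - p) log2 (1 - p)), with the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def bsc_mutual_information(q: float, p: float) -> float:
    """I(X;Y) = H(Y) - H(Y|X) for a binary symmetric channel with
    error probability p and input distribution Prob(X = 0) = q."""
    prob_y0 = q * (1 - p) + (1 - q) * p
    return binary_entropy(prob_y0) - binary_entropy(p)

def bsc_capacity(p: float) -> float:
    """Capacity of the binary symmetric channel: 1 + [p log p + (1 - p) log(1 - p)]."""
    return 1.0 - binary_entropy(p)

def bec_capacity(pe: float) -> float:
    """Capacity of the binary erasure channel with erasure probability pe."""
    return 1.0 - pe

# Grid search over the input distribution for an assumed error probability of 0.1.
qs = [i / 100 for i in range(101)]
best_q = max(qs, key=lambda q: bsc_mutual_information(q, 0.1))
```

The grid search selects q = 0.5, and the mutual information at that point agrees with `bsc_capacity(0.1)`.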


• Error detection and error correction based on schemes to increase the redundancy of a message.

• A crude analogy is to bubble wrap a fragile item and place it into a box to reduce the chance that the item will be damaged during transport. Redundant information plays the role of the packing materials; it increases the amount of data transmitted, but it also increases the chance that we will be able to restore the original contents of a message distorted during communication.

• Coding corresponds to the selection of both the packing materials and the strategy to optimally pack the fragile item subject to the obvious constraints: use the least amount of packing materials and the least amount of effort to pack and unpack.

• Error detection → compare what you received with the code words from the common dictionary; if there is no match, error(s) have occurred.

• Error correction → map the received message to a valid code word.


• A trivial example of an error detection scheme is the addition of a parity check bit to a word of a given length.

• This is a simple scheme but very powerful; it allows us to detect an odd number of errors, but fails if an even number of errors occur. For example, consider a system that enforces even parity for an eight-bit word. Given the string 10111011, we add one more bit to ensure that the total number of 1s is even, in this case a 0, and we transmit the nine-bit string 101110110. The error detection procedure is to count the number of 1s; we decide that the string is in error if this number is odd.

• This example also hints at the limitations of error detection mechanisms. A code is designed with certain error detection or error correction capabilities and fails to detect, or to correct, error patterns not covered by the original design of the code.

• In the previous example we transmit 101110110; when two errors occur, in the 4th and the 7th bits, we receive 101010010.

• This tuple has even parity (an even number of 1's) and our scheme for error detection fails.
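The even-parity scheme and its blind spot can be sketched in a few lines. The function names `add_even_parity` and `parity_ok` are illustrative; the bit strings are the ones used in the example above.

```python
def add_even_parity(word: str) -> str:
    """Append one bit so that the total number of 1s is even."""
    parity_bit = word.count("1") % 2
    return word + str(parity_bit)

def parity_ok(received: str) -> bool:
    """Accept the string if it contains an even number of 1s."""
    return received.count("1") % 2 == 0

sent = add_even_parity("10111011")   # nine-bit string that is transmitted
one_error = "101010110"              # only the 4th bit flipped
two_errors = "101010010"             # 4th and 7th bits flipped

detected_one = not parity_ok(one_error)
detected_two = not parity_ok(two_errors)
```

A single flipped bit is detected, while the two-bit error pattern passes the check unnoticed.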


n-tuple → an ordered sequence of n symbols from an alphabet A.

Example: A = {0,1,2} and n = 6 → 000000, 211101, 111122, etc.

A = {0,1} (binary alphabet), n = 3 → 000, 001, 010, 100, 110, 101, 011, 111

Code → a set of n-tuples.

Example:

Binary code C → select $2^k$ codewords from the $2^n$ possible binary n-tuples

The sender and the receiver share the knowledge of all the code words in C

Hamming distance → the number of positions in which two code words differ

Distance d of a code C → the minimum distance between any pair of distinct code words of C

Hamming sphere of radius d around a code word w → the set of all n-tuples at distance at most d from w.


A block code C = [n,M] consists of code words of length n and allows the encoding of M messages.

Example: consider binary [n,M] codes; for example n = 6 and M = 4.

The code: C = {c0, c1, c2, c3} with

c0 = 000000, c1 = 101101, c2 = 010110, c3 = 111011

Out of the $2^6 = 64$ possible binary 6-tuples we have selected 4 as code words.

Hamming distance of two code words: the number of bit positions in which they differ

d(c1,c3) = 3

The Hamming distance of the code C is the minimum distance between any pair of code words:

d(C) = 3. Indeed,

d(c0,c1) = 4, d(c0,c2) = 3, d(c0,c3) = 5, d(c1,c2) = 5, d(c1,c3) = 3, d(c2,c3) = 4

To compute the Hamming distance for an [n,M] code, it is necessary to compute the distance between $\binom{M}{2} = M(M-1)/2$ pairs of codewords and then to find the pair with the minimum distance.
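The pairwise procedure can be sketched as follows for the four six-bit code words of the example. The function names are illustrative, and code words are represented as strings of '0' and '1' for simplicity.

```python
from itertools import combinations

def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length tuples differ."""
    if len(a) != len(b):
        raise ValueError("tuples must have the same length")
    return sum(x != y for x, y in zip(a, b))

def code_distance(code) -> int:
    """Minimum Hamming distance over all M(M-1)/2 pairs of distinct code words."""
    return min(hamming_distance(a, b) for a, b in combinations(code, 2))

C = ["000000", "101101", "010110", "111011"]  # c0, c1, c2, c3

# Distance of every pair, keyed by the pair of code words.
pairwise = {(a, b): hamming_distance(a, b) for a, b in combinations(C, 2)}
d = code_distance(C)
```

For M = 4 the dictionary `pairwise` has six entries, and `d` is the smallest of them.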


Encoding → map k information symbols into n = k + r symbols by adding r redundancy symbols

Example: repetition code: encode 0 → 000 and 1 → 111. Then

the two code words are 000 and 111;

the other 3-tuples are: 100, 010, 001, 011, 101, 110

decode any received 3-tuple with one error as follows

100, 010, 001 → 0

011, 101, 110 → 1

The Hamming spheres of radius 1 around 000 and 111: {000, 100, 010, 001} and {111, 011, 101, 110}
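A minimal sketch of this repetition code follows, with majority voting as the decoding rule (equivalent to the decoding table above for three repetitions). The function names and the parameter `r` are assumptions made for illustration.

```python
def encode_repetition(bits: str, r: int = 3) -> str:
    """Repeat every information bit r times (0 -> 000, 1 -> 111 for r = 3)."""
    return "".join(b * r for b in bits)

def decode_repetition(received: str, r: int = 3) -> str:
    """Decode each block of r bits by majority vote."""
    if len(received) % r != 0:
        raise ValueError("length must be a multiple of r")
    blocks = [received[i:i + r] for i in range(0, len(received), r)]
    return "".join("1" if block.count("1") > r // 2 else "0" for block in blocks)

codeword = encode_repetition("01")        # two information bits
corrupted = "100011"                      # one error in each block of three
recovered = decode_repetition(corrupted)
```

Each block with a single error falls inside the Hamming sphere of radius 1 around the code word that was sent, so the original bits are recovered.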


Minimum distance or nearest neighbor decoding. If an n-tuple v is received, and there is a unique codeword c such that d(v,c) is the minimum over all codewords of C, then correct v as the codeword c. If no such c exists, report that errors have been detected, but no correction is possible. Alternatively, if multiple codewords are at the same minimum distance from the received n-tuple, select one of them at random and decode v as that codeword.

Maximum likelihood decoding. Under this decoding policy, of all possible codewords c, the n-tuple v is decoded to that codeword c which maximizes the probability P(v,c) that v is received, given that c is sent.
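Nearest neighbor decoding can be sketched as below. This version implements the first policy (report failure on a tie rather than pick at random); the function names and the use of `None` to signal "errors detected, no correction possible" are assumptions made for the example.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest_neighbor_decode(v: str, code):
    """Return the unique code word closest to v, or None when several
    code words are tied at the minimum distance."""
    distances = [(hamming_distance(v, c), c) for c in code]
    best = min(dist for dist, _ in distances)
    candidates = [c for dist, c in distances if dist == best]
    return candidates[0] if len(candidates) == 1 else None

C = ["000000", "101101", "010110", "111011"]

unique_case = nearest_neighbor_decode("111111", C)  # one code word is strictly closest
tie_case = nearest_neighbor_decode("101000", C)     # equidistant from 000000 and 101101
```

Replacing the `None` branch with a random choice among `candidates` would give the second policy described above.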


Consider the same code C = {c0 = 000000, c1 = 101101, c2 = 010110, c3 = 111011}

Probability of a bit in error is p = 0.15

When we receive v = 111111 we decode it as 111011.

$p(v, 000000) = (0.15)^6 \approx 0.000011$

$p(v, 101101) = (0.15)^2 \times (0.85)^4 \approx 0.011745$

$p(v, 010110) = (0.15)^3 \times (0.85)^3 \approx 0.002073$

$p(v, 111011) = (0.15)^1 \times (0.85)^5 \approx 0.066556$
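The likelihoods in this example can be recomputed with a short sketch. It assumes that bit errors are independent and occur with the same probability p in every position, so the probability of receiving v when c is sent depends only on the Hamming distance between them; the function names are illustrative, and the code words are those listed in the definition of C.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions in which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def likelihood(v: str, c: str, p: float) -> float:
    """Probability that v is received given that c is sent, assuming
    independent bit errors with probability p."""
    d = hamming_distance(v, c)
    return p**d * (1 - p) ** (len(v) - d)

def ml_decode(v: str, code, p: float) -> str:
    """Code word that maximizes the likelihood of the received tuple v."""
    return max(code, key=lambda c: likelihood(v, c, p))

C = ["000000", "101101", "010110", "111011"]
v = "111111"
likelihoods = {c: likelihood(v, c, 0.15) for c in C}
decoded = ml_decode(v, C, 0.15)
```

For p < 0.5 the likelihood decreases as the distance grows, so maximum likelihood decoding picks the same code word as nearest neighbor decoding.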


• The error detection and error correction capabilities of a code are determined by the distance d of the code (the minimum Hamming distance between any pair of code words)

• To detect e errors → d ≥ e + 1

• To correct e errors → d ≥ 2e + 1
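Solving the two conditions for e gives the largest number of errors a code of distance d can handle. The sketch below does that arithmetic; the function names are illustrative.

```python
def max_detectable_errors(d: int) -> int:
    """Largest e with d >= e + 1."""
    return d - 1

def max_correctable_errors(d: int) -> int:
    """Largest e with d >= 2e + 1."""
    return (d - 1) // 2

# A code of distance 3, such as the three-fold repetition code {000, 111}.
detect = max_detectable_errors(3)
correct = max_correctable_errors(3)
```

A code of distance 3 can therefore detect up to two errors or correct a single error.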
