In the beginning was the Word...

Information Theory: offered in Japanese and English in alternate years

  • this year the course will be taught in Japanese, but the slides are in English

  • video-recorded English classes: Lecture Archives 2011

  • this slide can be found at http://apal.naist.jp/~kaji/lecture/

  • test questions are given in both Japanese and English


Information Theory

Information Theory (情報理論)

  • was founded by C. E. Shannon in 1948

  • focuses on the mathematical theory of communication

  • has had an essential impact on today’s digital technology

    • wired/wireless communication/broadcasting

    • CD/DVD/HDD

    • data compression

    • cryptography, linguistics, bioinformatics, games, ...

      In this class, we learn basic subjects of information theory.

      (half undergraduate level + half graduate school level)

Claude E. Shannon (1916-2001)


class plan

This class consists of four chapters (+ this introduction):

  • chapter 0: the summary and the schedule of this course

    (today)

  • chapter 1: measuring information

  • chapter 2: compact representation of information

  • chapter 3: coding for noisy communication

  • chapter 4: cryptography


what’s the problem?

To understand the problem, let us go back to the 1940s...

  • Teletype (電信) was widely used for communication.

  • Morse code: dots ( ∙ ) and dashes ( − )

  • dot = 1 unit long, dash = 3 units long

  • 1 unit silence between marks

  • 3 units silence between letters, etc.

10111000111000000010101010001110111011100011101110001

They already had “digital communication”.
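
A rough sketch of the timing rule above (not from the original slides; the three-letter Morse table and the function morse_to_units are illustrative assumptions):

    # Turn Morse symbols into the on/off unit pattern described above:
    # dot = "1", dash = "111", 1 unit of silence between marks,
    # 3 units of silence between letters.
    MORSE = {"A": ".-", "B": "-...", "C": "-.-."}   # tiny, hypothetical table

    def morse_to_units(text):
        letters = []
        for ch in text.upper():
            marks = ["1" if m == "." else "111" for m in MORSE[ch]]
            letters.append("0".join(marks))          # 1 unit between marks
        return "000".join(letters)                   # 3 units between letters

    print(morse_to_units("ABC"))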


machinery for information processing

No computers yet, but there were “machines”...

Teletype model 14-KTR, 1940

http://www.baudot.net/teletype/M14.htm

Enigma machine

http://enigma.wikispaces.com/

  • They could do something complicated.

  • The transmission/recording of messages was...

    • inefficient... messages should be as short as possible

    • unreliable... messages are often disturbed by noise

      Efficiency and reliability were the two major problems.


the model of communication

A communication system can be modeled as follows:

C.E. Shannon, A Mathematical Theory of Communication,

The Bell System Technical Journal, 27, pp. 379–423, 623–656, 1948.

[Block diagram: information source → encoder (modulator, codec, etc.) → channel (storage medium, etc.) → decoder → destination]


what is the “efficiency”?

A communication is efficient if the size of B is small.

  • subject to A = D, or A ≈ D

  • with, or without noise (B ≠ C, or B = C)

[Diagram: A → encoder → B → channel → C → decoder → D]


problem one: efficiency

Example: You need to record the weather in Tokyo every day.

  • weather = {sunny, cloudy, rainy}

  • You can use “0” and “1”, but you cannot use blank spaces.

  • a 2-bit record every day

  • 200 bits for 100 days

    weather     sunny   cloudy  rainy
    codeword    00      01      10

    example record: 0100011000

Can we shorten the representation?


better code?

Code B gives a shorter representation than code A.

  • Can we decode code B correctly?

    • Yes, as long as the sequence is processed from the beginning (see the sketch after the table).

  • Is there a code which is more compact than code B?

    • No and yes (→ next slide).

    weather     sunny   cloudy  rainy
    code A      00      01      10
    code B      00      01      1

    code A ... 0100011000
    code B ... 010001100
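
A minimal decoding sketch for code B (not from the slides; the names CODE_B and decode are illustrative). It works because no codeword of code B is a prefix of another, so reading from the beginning never requires backtracking:

    CODE_B = {"00": "sunny", "01": "cloudy", "1": "rainy"}

    def decode(bits, code):
        result, buf = [], ""
        for b in bits:
            buf += b
            if buf in code:              # a complete codeword has been read
                result.append(code[buf])
                buf = ""
        return result

    print(decode("010001100", CODE_B))
    # ['cloudy', 'sunny', 'cloudy', 'rainy', 'sunny']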


think average

Sometimes, events are not equally likely...

    weather       sunny   cloudy  rainy
    probability   0.5     0.3     0.2
    code A        00      01      10
    code B        00      01      1
    code C        1       01      00

  • with code A, 2.0 bits/event (always)

  • with code B, 2×0.5 + 2×0.3 + 1×0.2 = 1.8 bits/event on average

  • with code C, 1×0.5 + 2×0.3 + 2×0.2 = 1.5 bits/event on average


the best code?

Can we represent information with 0.00000000001 bits per event?

...probably not.

  • It is likely that there is a “limit” which we cannot get past.

  • Shannon investigated this limit mathematically.

    → For this event set, we need 1.485 bits or more per event.

    weather       sunny   cloudy  rainy
    probability   0.5     0.3     0.2

This is the amount of information

which must be carried by the code.
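
Jumping ahead to chapter 1, the 1.485 figure above is the entropy of this distribution, H = −Σ p·log2 p. A quick check (an illustrative sketch, not part of the slides):

    from math import log2

    p = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
    H = -sum(q * log2(q) for q in p.values())
    print(f"{H:.3f} bits/event")   # ≈ 1.485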


class plan in April

  • chapter 0: the summary and the schedule of this course

  • chapter 1: measuring information

    • We establish a mathematical means to measure

      information in a quantitative manner.

  • chapter 2: compact representation of information

    • We learn several coding techniques which give a compact representation of information.

  • chapter 3: coding for noisy communication

  • chapter 4: cryptography


what is the “reliability”?

A communication is reliable if A = D or A ≈ D.

  • the existence of noise is essential (B ≠ C)

  • How small can we make the size of B?

[Diagram: A → encoder → B → channel → C → decoder → D]


problem two: reliability

Communication is not always reliable.

  • transmitted information ≠ received information

ABCABC (transmitted) → ABCADC (received)

  • Errors of this kind are unavoidable in real communication.

  • In everyday conversation, we sometimes use phonetic codes.

  • ABC → “Alpha, Bravo, Charlie” → (spoken over a noisy channel) → “Alpha, Bravo, Charlie” → ABC

    Japanese examples: あさひの「あ」 (“a” as in asahi), いろはの「い」 (“i” as in iroha)


    phonetic code

    • A phonetic code adds redundant information.

    • The redundant part helps correct possible errors.

      → use this mechanism on 0-1 data, and we can correct errors!

    “Alpha” = “A” (the real information) + “lpha” (redundant (冗長な) information for correcting possible errors)


    redundancy

    Q. Can we add “redundancy” to binary data?

    A. Yes, use parity bits.

    A parity bit is...

    a binary digit added so that the number of 1’s in the data becomes even.

    • 00101 → 001010 (two 1’s → two 1’s)

    • 11010 → 110101 (three 1’s → four 1’s)

      One parity bit can tell you that an odd number of errors occurred,

      but nothing more than that (see the sketch below).
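
    A minimal sketch of the even-parity rule (not from the slides; the function names add_parity and looks_ok are illustrative):

        def add_parity(data):
            # append a bit so that the total number of 1's becomes even
            return data + str(data.count("1") % 2)

        def looks_ok(word):
            # even number of 1's => no error detected (an even number of
            # flipped bits would go unnoticed)
            return word.count("1") % 2 == 0

        print(add_parity("00101"))                     # 001010
        print(add_parity("11010"))                     # 110101
        print(looks_ok("110101"), looks_ok("100101"))  # True False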


    to correct error(s)

    basic idea: use several parity bits to correct errors

    Example: Add five parity bits to the four-bit data (a0, a1, a2, a3).

    codeword = (a0, a1, a2, a3, p0, p1, q0, q1, r)

    [Figure: the nine bits arranged in a grid]

        a0  a1 | p0
        a2  a3 | p1
        -------+---
        q0  q1 | r

    This code corrects a one-bit error, but it is too straightforward.
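
    The slide only gives the grid layout, so the sketch below assumes one natural choice: p0 and p1 are row parities, q0 and q1 are column parities, and r is the parity of all four data bits. A single flipped data bit is then located by the one failing row check and the one failing column check (illustrative only, not necessarily the slides’ exact definition):

        def encode(a):                       # a = [a0, a1, a2, a3], bits 0/1
            a0, a1, a2, a3 = a
            p0, p1 = a0 ^ a1, a2 ^ a3        # row parities
            q0, q1 = a0 ^ a2, a1 ^ a3        # column parities
            r = a0 ^ a1 ^ a2 ^ a3            # overall parity
            return [a0, a1, a2, a3, p0, p1, q0, q1, r]

        def correct(word):                   # repairs a single flipped data bit
            a0, a1, a2, a3, p0, p1, q0, q1, r = word
            rows = [a0 ^ a1 ^ p0, a2 ^ a3 ^ p1]   # 1 marks a failing check
            cols = [a0 ^ a2 ^ q0, a1 ^ a3 ^ q1]
            if 1 in rows and 1 in cols:           # failing row + column locate it
                word[2 * rows.index(1) + cols.index(1)] ^= 1
            return word[:4]

        w = encode([1, 0, 1, 1])
        w[2] ^= 1                             # flip one data bit in transit
        print(correct(w))                     # [1, 0, 1, 1]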


    class plan in May

    • chapter 0: the summary and the schedule of this course

    • chapter 1: measuring information

    • chapter 2: compact representation of information

    • chapter 3: coding for noisy communication

      • We study practical coding techniques for finding and correcting errors.

    • chapter 4: cryptography

      • We review techniques for protecting information against attacks.


    schedule

    • April:  (Mon) Tue 10, 17, 24 / Thu 12, 19, 26

    • report (quiz): will be assigned by the end of April

    • May:  Tue 01, 08, 15, 22, 29 / Thu 03, 10, 17, 24, 31  (dates marked × in the original table have no class)

    • June:  04, 05

    • test: questions given in English/Japanese

    statistics in 2011: A ... 51 / B ... 20 / C ... 18 / did not pass ... 13


    chapter 1: measuring information


    motivation

    “To tell plenty of things, we need more words.”

    ...maybe true, but can you prove this statement?

    We will need to...

    • measure information quantitatively (定量的に測る)

    • observe the relation between the amount of information

      and its representation.

      Chapter 1 focuses on the first step above.


    the uncertainty (不確実さ)

    Information tells what has happened at the information source.

    • Before you receive information, there is much uncertainty.

    • After you receive information, the uncertainty becomes small.

      the difference in uncertainty → the amount of information

      FIRST, we need to measure the uncertainty of the information source.

    [Diagram: much uncertainty (before) → small uncertainty (after); this difference indicates the amount of information]


    the definition of uncertainty

    The uncertainty is defined in terms of statistics (統計量),

    BUT,

    we do not have enough time today....

    In the rest of today’s talk,

    we study two typical information sources.

    • memoryless & stationary information source

    • Markov information source


    assumption

    In this class, we assume that...

    • an information source produces one symbol per unit time

      (discrete-time information source)

    • the set of possible symbols is finite and countable (有限可算)

      (digital information source)

      Note however that, in the real world,

      there are continuous-time and/or analogue information sources.

      • cf. sampling & quantization


    Preliminary (準備)

    • Assume a discrete-time digital information source S:

      • M = {a1, ..., ak}... the set of symbols of S

        (S is said to be a k-ary information source.)

      • Xt...the symbol which S produces at time t

      • The sequence X1, ..., Xn is called a message produced by S.

        Example: S = a fair die; if the message is a particular sequence of faces,
        then X1, X2, ... take those face values.

    memoryless & stationary information source

    A memoryless & stationary information source satisfies...

    • memoryless condition:

      “A symbol is chosen independently from past symbols.”

    • stationary condition: P_Xt = P_X1 for any t

      “The probability distribution is time invariant.”

    For example, over repeated trials of the same source:

        time     1 2 3 4 5 6 ...
        trial 1  a j c g e a ...
        trial 2  g a j k f h ...
        trial 3  w a s d a s ...

    At every time position, the symbol follows the same probability distribution.

    • memoryless = 無記憶

    • stationary = 定常


    memoryless & stationary information source

    Examples of memoryless & stationary information sources:

    • the “dice” example, coin toss, ...

      information sources with memory:

    • English text:

    • wireless communication...burst noise

      non-stationary information sources:

    • weather...P(snow) is large in winter

    • and more?


    Markov information source

    • a simple model of information source with memory

    • The choice of the next symbol depends on

      at most m previous symbols

      (m-th order Markov source)

    Andrey Markov (1856-1922)

    m = 0 → memoryless source

    m = 1 → simple Markov source


    Example of (simple) Markov source

    [Diagram: the output Xt is produced from a memoryless source S and a 1-bit register R that holds the previous output Xt–1]

    S ... memoryless & stationary source with P(0) = q, P(1) = 1 – q

    • if Xt–1 = 0, then R = 0:

      • S = 0 → Xt = 0 ... P(Xt = 0 | Xt–1 = 0) = q

      • S = 1 → Xt = 1 ... P(Xt = 1 | Xt–1 = 0) = 1 – q

    • if Xt–1 = 1, then R = 1:

      • S = 0 → Xt = 1 ... P(Xt = 1 | Xt–1 = 1) = q

      • S = 1 → Xt = 0 ... P(Xt = 0 | Xt–1 = 1) = 1 – q
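
    The four cases above amount to Xt = S XOR Xt–1. A minimal simulation sketch (not from the slides; the function name markov_source is illustrative):

        import random

        def markov_source(q, n, x_prev=0):
            out = []
            for _ in range(n):
                s = 0 if random.random() < q else 1   # memoryless source S
                x_prev ^= s                           # Xt = S XOR Xt-1
                out.append(x_prev)
            return out

        print("".join(map(str, markov_source(q=0.9, n=40))))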


    Markov source as a finite state machine

    [Diagram: the register example drawn as a finite state machine with two states, 0 and 1; edge labels are “output / probability”: from state 0, 0 / q (stay in 0) and 1 / 1–q (go to 1); from state 1, 1 / q (stay in 1) and 0 / 1–q (go to 0)]

    m-th order k-ary Markov source:

    • The next symbol depends on the previous m symbols.

    • The model has one of k^m internal states.

    • The state changes when a new symbol is generated.

      → finite state machine (edges are labeled “generated symbol / probability”)


    two important properties

    • irreducible (既約) Markov source:

      • We can move from any state to any state.

      [Diagram: a three-state example with states A, B, C — this example is NOT irreducible]

    • aperiodic (非周期的) Markov source:

      • It has no periodic behavior (a rigorous definition is omitted here).

      [Diagram: a two-state example with states A, B — this example is NOT aperiodic]

    irreducible + aperiodic = regular


    example of the regular Markov source

    [Diagram: two states A and B; edge labels are “output / probability”: from A, 0 / 0.9 (stay in A) and 1 / 0.1 (go to B); from B, 0 / 0.8 (go to A) and 1 / 0.2 (stay in B)]

    start from state A:

        time    P(state=A)    P(state=B)
        1       1.0           0.0
        2       0.9           0.1
        3       0.89          0.11
        4       0.889         0.111

    start from state B:

        time    P(state=A)    P(state=B)
        1       0.0           1.0
        2       0.8           0.2
        3       0.88          0.12
        4       0.888         0.112

    Both runs converge (収束する) to the same probabilities: the stationary probabilities.


    computation of the stationary probabilities

    [Diagram: the same two-state machine as before, with edge labels 0 / 0.9, 1 / 0.1 (from A) and 0 / 0.8, 1 / 0.2 (from B)]

    • αt : P(state = A) at time t

    • βt : P(state = B) at time t

      αt+1 = 0.9 αt + 0.8 βt

      βt+1 = 0.1 αt + 0.2 βt

      αt+1 + βt+1 = 1

    • If αt and βt converge to α and β, respectively, then we can put αt+1 = αt = α and βt+1 = βt = β:

      α = 0.9 α + 0.8 β

      β = 0.1 α + 0.2 β

      α + β = 1

    ⇒ α = 8/9, β = 1/9


    Markov source as a stationary source

    [Diagram: the same two-state machine as before]

    Markov source as a stationary source

    After enough time has elapsed...

    a regular Markov source can be regarded as a stationary source

    =8/9, =1/9

    0 will be produced with probabilityP(0) = 0.9  + 0.8  = 0.889

    1 will be produced with probabilityP(1) = 0.1 + 0.2 = 0.111


    summary of today’s class

    • overview of this course

      • motivation

      • four chapters

    • typical information sources

      • memoryless & stationary source

      • Markov source


    exercise

    [Diagram: a three-state Markov source with states A, B, C; edge labels (“output / probability”) include 1 / 0.6, 0 / 0.4, 0 / 0.5, 1 / 0.2, 0 / 0.8, 1 / 0.5]

    • Determine the stationary probabilities.

    • Compute the probability that 010 is produced.

    This is to check your understanding.

    This is not a report assignment.

