
Information Theory Ying Nian Wu UCLA Department of Statistics July 9, 2007 IPAM Summer School


Presentation Transcript


  1. Information Theory Ying Nian Wu UCLA Department of Statistics July 9, 2007 IPAM Summer School

  2. Goal: A gentle introduction to the basic concepts in information theory. Emphasis: understanding and interpretations of these concepts. Reference: Elements of Information Theory by Cover and Thomas.

  3. Topics • Entropy and relative entropy • Asymptotic equipartition property • Data compression • Large deviation • Kolmogorov complexity • Entropy rate of a process

  4. Entropy measures the randomness or uncertainty of a probability distribution. Example: a fair coin (p = 1/2) is maximally uncertain, while a heavily biased coin (say p = 0.9) is nearly predictable.

  5. Entropy. Definition: for a discrete random variable X with distribution p, H(X) = -∑_x p(x) log_2 p(x) = E[log_2 1/p(X)].
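The definition can be checked numerically. A minimal Python sketch (the example distributions below are my own, not from the slides):

```python
import math

def entropy(p):
    """Shannon entropy in bits of a discrete distribution given as a list of probabilities."""
    # Terms with p(x) = 0 contribute 0 by convention, so they are skipped.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

h_coin = entropy([0.5, 0.5])              # fair coin: 1 bit
h_die4 = entropy([0.25, 0.25, 0.25, 0.25])  # uniform over 4 outcomes: 2 bits
h_sure = entropy([1.0])                   # a sure event: 0 bits
```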

  6. Entropy. Example: a uniform distribution over 4 outcomes has H = log_2 4 = 2 bits; a fair coin has H = 1 bit; a constant has H = 0.

  7. Entropy. The definition H(X) = E[log 1/p(X)] applies to both discrete and continuous distributions; in the continuous case p is a density and the sum becomes an integral, h(X) = -∫ p(x) log p(x) dx. Recall E[g(X)] = ∑_x p(x) g(x) in the discrete case and ∫ p(x) g(x) dx in the continuous case.

  8. Entropy. Example: for a continuous distribution such as X ~ N(0, σ²), the differential entropy is h(X) = (1/2) log(2πeσ²).

  9. Interpretation 1: cardinality. For a uniform distribution over a finite set A, there are |A| elements in A, and all these choices are equally likely, so H = log |A|. Entropy can thus be interpreted as the log of a volume or size. An n-dimensional cube has 2^n vertices, so entropy can also be interpreted as dimensionality. What if the distribution is not uniform?

  10. Asymptotic equipartition property: any distribution is essentially a uniform distribution in the long run of repetition. Recall that if X_1, …, X_n ~ p independently, then p(X_1, …, X_n) = p(X_1)⋯p(X_n). Random? But in some sense this product is essentially a constant!

  11. Law of large numbers: if X_1, …, X_n ~ p independently, then the long-run average (1/n) ∑_{i=1}^n g(X_i) converges to the expectation E[g(X)].

  12. Asymptotic equipartition property. Intuitively, in the long run, -(1/n) log p(X_1, …, X_n) = (1/n) ∑_{i=1}^n log(1/p(X_i)) → E[log(1/p(X))] = H.

  13. Asymptotic equipartition property: p(X_1, …, X_n) is essentially a constant. Recall that if X_1, …, X_n ~ p independently, then p(X_1, …, X_n) = p(X_1)⋯p(X_n), with -(1/n) log p(X_1, …, X_n) → H. Therefore p(X_1, …, X_n) ≈ 2^{-nH} as n → ∞, as if the sequence were drawn uniformly from a set of about 2^{nH} elements. So the dimensionality per observation is H. We can make this more rigorous.
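The convergence -(1/n) log p(X_1, …, X_n) → H is easy to see in simulation. A sketch with a hypothetical three-symbol source (the distribution and sample size are mine, chosen for illustration):

```python
import math
import random

def entropy(p):
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

random.seed(0)
p = [0.7, 0.2, 0.1]   # hypothetical source distribution
H = entropy(p)
n = 100_000
sample = random.choices(range(3), weights=p, k=n)

# -(1/n) log2 p(X_1, ..., X_n) = (1/n) * sum over i of -log2 p(X_i)
per_symbol = sum(-math.log2(p[x]) for x in sample) / n
# per_symbol should be close to H, even though each p(X_i) is random
```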

  14. Weak law of large numbers: if X_1, …, X_n ~ p independently, then P(|(1/n) ∑_{i=1}^n g(X_i) − E[g(X)]| > ε) → 0 for any ε > 0.

  15. Typical set: sequences (x_1, …, x_n) with p(x_1, …, x_n) ≈ 2^{-nH}. Formally, the typical set is A_ε^{(n)} = {(x_1, …, x_n) : 2^{-n(H+ε)} ≤ p(x_1, …, x_n) ≤ 2^{-n(H−ε)}}.

  16. Typical set A_ε^{(n)}: the set of sequences whose probability lies between 2^{-n(H+ε)} and 2^{-n(H−ε)}. It satisfies P(A_ε^{(n)}) > 1 − ε for sufficiently large n, and |A_ε^{(n)}| ≤ 2^{n(H+ε)}.
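For a small source the typical set can be enumerated exhaustively. A sketch for a hypothetical Bernoulli(0.3) source with n = 16 and ε = 0.2 (all parameters are mine, chosen so the full space of 2^16 sequences is checkable):

```python
import itertools
import math

p1 = 0.3   # hypothetical P(X = 1)
H = -(p1 * math.log2(p1) + (1 - p1) * math.log2(1 - p1))
n, eps = 16, 0.2

typical_count, typical_prob = 0, 0.0
for seq in itertools.product([0, 1], repeat=n):
    k = sum(seq)
    prob = p1**k * (1 - p1)**(n - k)
    # seq is typical if 2^{-n(H+eps)} <= p(seq) <= 2^{-n(H-eps)}
    if 2**(-n * (H + eps)) <= prob <= 2**(-n * (H - eps)):
        typical_count += 1
        typical_prob += prob
# typical_count obeys the bound 2^{n(H+eps)}; typical_prob carries most of the mass
```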

  17. Interpretation 2: coin flipping. Flip a fair coin  {Head, Tail}. Flip a fair coin twice independently  {HH, HT, TH, TT}. …… Flip a fair coin n times independently  2^n equally likely sequences. We may interpret entropy as the number of fair-coin flips: H = log_2(number of equally likely outcomes).

  18. Interpretation 2: coin flipping. Example: the uniform distribution over 4 outcomes above amounts to 2 coin flips, since log_2 4 = 2.

  19. Interpretation 2: coin flipping. With p(X_1, …, X_n) ≈ 2^{-nH}, a sequence of n observations amounts to about nH fair-coin flips, so each observation amounts to H flips.

  20. Interpretation 2: coin flipping

  21. Interpretation 2: coin flipping

  22. Interpretation 3: coding Example

  23. Interpretation 3: coding. With p(x_1, …, x_n) ≈ 2^{-nH}, there are about 2^{nH} typical sequences. How many bits to code the elements in this set? About nH bits, i.e., H bits per observation. This can be made more formal using the typical set.

  24. Prefix code: no codeword is a prefix of any other, so a bit stream parses uniquely into codewords, e.g. 100101100010  abacbd.

  25. Optimal code: 100101100010  abacbd. Under an optimal code the bit stream looks like a sequence of fair-coin flips: a completely random sequence, which cannot be further compressed. Frequent symbols get short codewords, e.g., compare the two words "I" and "probability".

  26. Optimal code. Kraft inequality for a prefix code: ∑_i 2^{-l_i} ≤ 1. Minimize the expected length ∑_i p_i l_i subject to the Kraft inequality; the optimal length is l_i = log_2(1/p_i), giving expected length H.
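Huffman's algorithm constructs an optimal prefix code by repeatedly merging the two least probable subtrees. A sketch using a hypothetical dyadic source (probabilities that are powers of 1/2), for which the optimal expected length equals H exactly:

```python
import heapq
import math

def huffman_lengths(p):
    """Codeword lengths of a Huffman (optimal prefix) code for probabilities p."""
    # Heap entries: (probability, tie-breaking counter, symbol indices in the subtree).
    heap = [(pi, i, [i]) for i, pi in enumerate(p)]
    heapq.heapify(heap)
    lengths = [0] * len(p)
    counter = len(p)
    while len(heap) > 1:
        p1, _, s1 = heapq.heappop(heap)
        p2, _, s2 = heapq.heappop(heap)
        for i in s1 + s2:
            lengths[i] += 1   # each merge adds one bit to every leaf below it
        heapq.heappush(heap, (p1 + p2, counter, s1 + s2))
        counter += 1
    return lengths

p = [0.5, 0.25, 0.125, 0.125]   # hypothetical dyadic source
L = huffman_lengths(p)
kraft = sum(2**-l for l in L)               # Kraft inequality: <= 1 for any prefix code
avg_len = sum(pi * l for pi, l in zip(p, L))
H = -sum(pi * math.log2(pi) for pi in p)    # for a dyadic source, avg_len == H
```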

  27. Wrong model. The optimal code for a wrong distribution q assigns length log_2(1/q(x)); on data from p its expected length is H(p) + D(p‖q), so the redundancy is D(p‖q). Box: all models are wrong, but some are useful.

  28. Relative entropy (Kullback–Leibler divergence): D(p‖q) = ∑_x p(x) log(p(x)/q(x)).
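The coding interpretation can be verified directly: the extra cost of coding data from p with the code optimized for q is exactly D(p‖q). A sketch with hypothetical distributions of my choosing:

```python
import math

def kl(p, q):
    """D(p || q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.25]        # "true" distribution (hypothetical)
q = [1/3, 1/3, 1/3]          # wrong model

H_p = -sum(pi * math.log2(pi) for pi in p)                 # optimal cost: H(p)
cross = -sum(pi * math.log2(qi) for pi, qi in zip(p, q))   # cost using q's code
redundancy = cross - H_p     # should equal D(p || q)
```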

  29. Relative entropy. By Jensen's inequality, D(p‖q) ≥ 0, with equality if and only if p = q.

  30. Types. Let X_1, …, X_n ~ p independently, let n_x be the number of times value x appears, and let q(x) = n_x/n be the normalized frequency; q is called the type of the sequence.

  31. Law of large numbers: the type q converges to p. Refinement: how small is the probability that q deviates from p?

  32. Large deviation: a refinement of the law of large numbers. The probability that the observed type is close to q decays as 2^{-n D(q‖p)}.
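The exponent 2^{-n D(q‖p)} can be checked against exact binomial tail probabilities. A sketch for a fair coin and the event "fraction of heads ≥ 0.8" (the choice of p, q, and n is mine): -log_2 of the tail probability, divided by n, should approach D(q‖p) as n grows.

```python
import math

def kl_bern(q, p):
    """D(Bern(q) || Bern(p)) in bits."""
    return q * math.log2(q / p) + (1 - q) * math.log2((1 - q) / (1 - p))

p, q = 0.5, 0.8   # fair coin; deviation event: empirical frequency of heads >= 0.8
for n in (100, 400):
    k0 = round(q * n)
    # Exact P(sum >= q*n) for Binomial(n, p)
    tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k0, n + 1))
    exponent = -math.log2(tail) / n   # approaches D(q || p) as n grows
```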

  33. Kolmogorov complexity. Example: the string 011011011011…011. Program: for (i = 1 to n/3) write(011) end. This can be translated to binary machine code. Kolmogorov complexity = the length of the shortest machine code that reproduces the string; no probability distribution is involved. If a long sequence is not compressible, then it has all the statistical properties of a sequence of coin flips: string = f(coin flips).
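Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a computable upper bound, which illustrates the contrast between the regular string above and a random one. A sketch using zlib (my choice of proxy, not from the slides):

```python
import random
import zlib

random.seed(0)
compressible = "011" * 2000                                 # highly regular string
random_bits = "".join(random.choice("01") for _ in range(6000))  # coin-flip string

# Compressed size serves as a computable upper bound on complexity:
# the regular string compresses to almost nothing, the random one barely at all.
c_reg = len(zlib.compress(compressible.encode()))
c_rnd = len(zlib.compress(random_bits.encode()))
```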

  34. Joint and conditional entropy. Joint distribution p(x, y); marginal distribution p(x) = ∑_y p(x, y). E.g., eye color and hair color.

  35. Joint and conditional entropy. Conditional distribution p(y|x) = p(x, y)/p(x). Chain rule: p(x, y) = p(x) p(y|x).
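The entropy chain rule H(X, Y) = H(X) + H(Y|X) can be verified on a small example. A sketch with a hypothetical joint distribution on {0,1} × {0,1} (the numbers are mine):

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # hypothetical p(x, y)

px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

H_XY = H(joint.values())
H_X = H(px.values())
H_Y = H(py.values())
# H(Y|X) = sum over x of p(x) * H(Y | X = x)
H_Y_given_X = sum(px[x] * H([joint[(x, y)] / px[x] for y in (0, 1)]) for x in (0, 1))
# Chain rule: H(X, Y) = H(X) + H(Y|X); also conditioning reduces entropy
```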

  36. Joint and conditional entropy: H(X, Y) = E[log 1/p(X, Y)]; conditional entropy H(Y|X) = E[log 1/p(Y|X)] = ∑_x p(x) H(Y|X = x).

  37. Chain rule: H(X, Y) = H(X) + H(Y|X).

  38. Mutual information: I(X; Y) = H(X) + H(Y) − H(X, Y) = H(Y) − H(Y|X) = D(p(x, y) ‖ p(x) p(y)).
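The two expressions for mutual information, via entropies and via relative entropy from the joint to the product of marginals, agree numerically. A sketch reusing a hypothetical joint distribution of my choosing:

```python
import math

def H(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.2, (1, 1): 0.3}  # hypothetical p(x, y)
px = {x: sum(p for (a, b), p in joint.items() if a == x) for x in (0, 1)}
py = {y: sum(p for (a, b), p in joint.items() if b == y) for y in (0, 1)}

# I(X; Y) = H(X) + H(Y) - H(X, Y)
mi_entropies = H(px.values()) + H(py.values()) - H(joint.values())
# Equivalently, I(X; Y) = D( p(x, y) || p(x) p(y) )
mi_kl = sum(p * math.log2(p / (px[x] * py[y])) for (x, y), p in joint.items() if p > 0)
```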

  39. Entropy rate. For a stochastic process X_1, X_2, … whose terms are not necessarily independent, the entropy rate is H = lim_{n→∞} H(X_1, …, X_n)/n (the compression limit per symbol). For a stationary process the limit exists and equals lim_{n→∞} H(X_n | X_{n−1}, …, X_1). For a Markov chain, H(X_n | X_{n−1}, …, X_1) = H(X_n | X_{n−1}); for a stationary Markov chain with transition matrix P and stationary distribution π, H = ∑_i π_i ∑_j P_{ij} log(1/P_{ij}).
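The stationary Markov chain formula is straightforward to evaluate. A sketch for a hypothetical two-state chain (the transition matrix is mine); note the entropy rate comes out below the entropy of the stationary distribution, since knowing the previous state reduces uncertainty:

```python
import math

# Hypothetical two-state Markov chain with transition matrix P[i][j] = P(next=j | now=i)
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution solves pi = pi P; for 2 states: pi_0 = P[1][0] / (P[0][1] + P[1][0])
pi0 = P[1][0] / (P[0][1] + P[1][0])
pi = [pi0, 1 - pi0]

def row_entropy(row):
    return -sum(p * math.log2(p) for p in row if p > 0)

# Entropy rate of a stationary Markov chain: H = sum_i pi_i * H(P[i])
rate = sum(pi[i] * row_entropy(P[i]) for i in range(2))
```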

  40. Shannon, 1948.
  1. Zero-order approximation: XFOML RXKHRJFFJUJ ZLPWCFWKCYJ FFJEYVKCQSGHYD QPAAMKBZAACIBZLHJQD.
  2. First-order approximation: OCRO HLI RGWR NMIELWIS EU LL NBNESEBYA TH EEI ALHENHTTPA OOBTTVA NAH BRL.
  3. Second-order approximation (digram structure as in English): ON IE ANTSOUTINYS ARE T INCTORE ST BE S DEAMY ACHIN D ILONASIVE TUCOOWE AT TEASONARE FUSO TIZIN ANDY TOBE SEACE CTISBE.
  4. Third-order approximation (trigram structure as in English): IN NO IST LAT WHEY CRATICT FROURE BIRS GROCID PONDENOME OF DEMONSTURES OF THE REPTAGIN IS REGOACTIONA OF CRE.
  5. First-order word approximation: REPRESENTING AND SPEEDILY IS AN GOOD APT OR COME CAN DIFFERENT NATURAL HERE HE THE A IN CAME THE TO OF TO EXPERT GRAY COME TO FURNISHES THE LINE MESSAGE HAD BE THESE.
  6. Second-order word approximation: THE HEAD AND IN FRONTAL ATTACK ON AN ENGLISH WRITER THAT THE CHARACTER OF THIS POINT IS THEREFORE ANOTHER METHOD FOR THE LETTERS THAT THE TIME OF WHO EVER TOLD THE PROBLEM FOR AN UNEXPECTED.
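Shannon's n-gram approximations are generated by sampling each symbol from its empirical distribution given the preceding context. A toy sketch of a second-order (digram) letter model; the training text and sample length are mine, and a real experiment would use a large English corpus:

```python
import random
from collections import defaultdict

random.seed(1)
# Tiny hypothetical training text standing in for an English corpus
text = "the head and in frontal attack on an english writer that the character of this point"

# Digram model: record, for each character, the characters observed to follow it
follows = defaultdict(list)
for a, b in zip(text, text[1:]):
    follows[a].append(b)

# Generate: each next character is drawn given the current one
out = [random.choice(text)]
for _ in range(60):
    c = out[-1]
    out.append(random.choice(follows[c]) if follows[c] else random.choice(text))
sample = "".join(out)
```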

  41. Summary • Entropy of a distribution measures its randomness or uncertainty: the log of the number of equally likely choices, the average number of coin flips, the average length of a prefix code (Kolmogorov: the shortest machine code, i.e., randomness without probability). • Relative entropy from one distribution to another measures the departure of the first from the second: coding redundancy, large deviation. • Also: conditional entropy, mutual information, entropy rate.
