An intuitive introduction to information theory



  1. An intuitive introduction to information theory Ivo Grosse Leibniz Institute of Plant Genetics and Crop Plant Research Gatersleben Bioinformatics Centre Gatersleben-Halle

  2. Outline • Why information theory? • An intuitive introduction

  3. History of biology St. Thomas Monastery, Brno

  4. Genetics Gregor Mendel, 1822 – 1884 • 1866: Mendel's laws, the foundation of genetics • Ca. 1900: Biology becomes a quantitative science

  5. 50 years later … 1953 James Watson & Francis Crick

  6. 50 years later … 1953

  7. DNA Watson & Crick 1953 Double helix structure of DNA 1953: Biology becomes a molecular science

  8. 1953 – 2003 … 50 years of revolutionary discoveries

  9. 1989

  10. 1989 Goals: • Identify all of the ca. 30,000 genes • Identify all of the ca. 3,000,000,000 base pairs • Store all information in databases • Develop new software for data analysis

  11. 2003 Human Genome Project officially finished 2003: Biology becomes an information science

  12. 2003 – 2053 … biology = information science

  13. 2003 – 2053 … biology = information science Systems Biology

  14. What is information? • Many intuitive definitions • Most of them wrong • One clean definition since 1948 • Requires 3 steps • Entropy • Conditional entropy • Mutual information

  15. Before starting with entropy … who is the father of information theory? Claude Shannon, 1916 – 2001. “A Mathematical Theory of Communication.” Bell System Technical Journal, 27, 379–423 & 623–656, 1948

  16. Before starting with entropy … who is the grandfather of information theory? Simon bar Kochba, ca. 100 – 135, Jewish guerrilla fighter against the Roman Empire (132 – 135). The guessing game of binary (yes/no) questions, “bar kochba”, is named after him.

  17. Entropy • Given a text composed from an alphabet of 32 letters (each letter equally probable) • Person A chooses a letter X (randomly) • Person B wants to know this letter • B may ask only binary questions • Question: how many binary questions must B ask in order to learn which letter X was chosen by A? • Answer: the entropy H(X) • Here: H(X) = log2(32) = 5 bits
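The 5-bit answer follows from Shannon's formula H(X) = −Σ p·log2 p. A minimal Python sketch (the `entropy` helper is an illustrative name, not from the slides):

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H(X) = -sum(p * log2(p))."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# 32 equally probable letters -> H(X) = log2(32) = 5 bits,
# i.e. B needs 5 binary questions (a binary search over the alphabet).
uniform = [1 / 32] * 32
print(entropy(uniform))  # 5.0
```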

  18. Conditional entropy (1) • The sky is blu_ • How many binary questions? • 5? • No! • Why? • What’s wrong? • The context tells us “something” about the missing letter X

  19. Conditional entropy (2) • Given a text composed from an alphabet of 32 letters (each letter equally probable) • Person A chooses a letter X (randomly) • Person B wants to know this letter • B may ask only binary questions • A may tell B the letter Y preceding X • E.g. • L_ • Q_ • Question: how many binary questions must B ask in order to learn which letter X was chosen by A? • Answer: the conditional entropy H(X|Y)
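For real text, H(X|Y) can be estimated from bigram counts. A minimal sketch under the slide's setup, where Y is the letter immediately preceding X (the function name is an illustrative choice):

```python
import math
from collections import Counter

def conditional_entropy(text):
    """Estimate H(X|Y) in bits, where Y is the letter preceding X."""
    pairs = Counter(zip(text, text[1:]))  # counts of (y, x) bigrams
    contexts = Counter(text[:-1])         # counts of each context letter y
    n = len(text) - 1                     # total number of bigrams
    # H(X|Y) = -sum over (y, x) of p(y, x) * log2 p(x | y)
    return -sum(c / n * math.log2(c / contexts[y])
                for (y, x), c in pairs.items())
```

On English text this estimate comes out well below the 5 bits of the uniform case, which is why “The sky is blu_” needs far fewer than 5 questions.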

  20. Conditional entropy (3) • H(X|Y) <= H(X) • Clear! • In the worst case – namely if B ignores all “information” in Y about X – B needs H(X) binary questions • Under no circumstances does B need more than H(X) binary questions • Knowledge of Y cannot increase the number of binary questions • Knowledge can never harm! (a mathematical statement, perhaps not true in real life)

  21. Mutual information (1) • Compare two situations: • I: learn X without knowing Y • II: learn X knowing Y • How many binary questions in case I? → H(X) • How many binary questions in case II? → H(X|Y) • Question: how many binary questions does B save in case II, i.e. by knowing Y? • Answer: I(X;Y) = H(X) – H(X|Y) • I(X;Y) = information in Y about X
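The definition translates directly into code. A sketch that estimates I(X;Y) from a text where Y precedes X, using the chain-rule identity H(X|Y) = H(X,Y) − H(Y), which is equivalent to the definition above (names again illustrative):

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy in bits from a table of counts."""
    n = sum(counts.values())
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def mutual_information(text):
    """I(X;Y) = H(X) - H(X|Y), with Y the letter preceding X."""
    h_x  = entropy(Counter(text[1:]))             # marginal H(X)
    h_y  = entropy(Counter(text[:-1]))            # marginal H(Y)
    h_xy = entropy(Counter(zip(text, text[1:])))  # joint H(X,Y)
    return h_x - (h_xy - h_y)                     # chain rule: H(X|Y) = H(X,Y) - H(Y)
```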

  22. Mutual information (2) • H(X|Y) <= H(X) → I(X;Y) >= 0 • In the worst case – namely if B ignores all information in Y about X, or if there is no information in Y about X – I(X;Y) = 0 • The information in Y about X can never be negative • Knowledge can never harm! (a mathematical statement, perhaps not true in real life)

  23. Mutual information (3) • Example 1: random sequence composed of A, C, G, T (equally probable) • I(X;Y) = ? • H(X) = 2 bits • H(X|Y) = 2 bits • I(X;Y) = H(X) – H(X|Y) = 0 bits • Example 2: deterministic sequence … ACGT ACGT ACGT ACGT … • I(X;Y) = ? • H(X) = 2 bits • H(X|Y) = 0 bits • I(X;Y) = H(X) – H(X|Y) = 2 bits
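Both examples can be checked numerically by reusing the `mutual_information` sketch above (the sequence lengths are arbitrary choices):

```python
import random

random.seed(0)
random_seq   = "".join(random.choice("ACGT") for _ in range(100_000))
periodic_seq = "ACGT" * 25_000

print(mutual_information(random_seq))    # ~0 bits (tiny positive finite-sample bias)
print(mutual_information(periodic_seq))  # ~2.0 bits: Y determines X, so H(X|Y) = 0
```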

  24. Mutual information (4) • I(X;Y) = I(Y;X) • Always! For any X and any Y! • Information in Y about X = information in X about Y • Examples: • How much information is there in the amino acid sequence about the secondary structure? How much information is there in the secondary structure about the amino acid sequence? • How much information is there in the expression profile about the function of the gene? How much information is there in the function of the gene about the expression profile? • Mutual information
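The symmetry is one line of algebra, via the chain rule H(X,Y) = H(Y) + H(X|Y):

```latex
\begin{align*}
I(X;Y) &= H(X) - H(X \mid Y) \\
       &= H(X) + H(Y) - H(X,Y)  && \text{chain rule; symmetric in } X, Y \\
       &= H(Y) - H(Y \mid X) = I(Y;X)
\end{align*}
```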

  25. Summary • Entropy • Conditional entropy • Mutual information • There is no such thing as information content • Information is not defined for a single variable • 2 random variables are needed to talk about information • Information in Y about X • I(X;Y) = I(Y;X) → info in Y about X = info in X about Y • I(X;Y) >= 0 → information is never negative → knowledge cannot harm • I(X;Y) = 0 if and only if X and Y are statistically independent • I(X;Y) > 0 otherwise
