Molecular Information Theory


Presentation Transcript


  1. Molecular Information Theory Niru Chennagiri Probability and Statistics Fall 2004 Dr. Michael Partensky

  2. Overview • Why do we study Molecular Info. Theory? • What are molecular machines? • Power of Logarithm • Components of a Communication System • Discrete Noiseless System • Channel Capacity • Molecular Machine Capacity

  3. Motivation • A needle-in-a-haystack situation. • How will you go about looking for the needle? • How much energy do you need to spend? • How fast can you find the needle? • Haystack = DNA, needle = binding site, you = the ribosome

  4. What is a Molecular Machine? • One or more molecules or a molecular complex: not a macroscopic reaction. • Performs a specific function. • Energized before the reaction. • Dissipates energy during reaction. • Gains information. • An isothermal engine.

  5. Where is the candy? • Is it in the left four boxes? • Is it in the bottom four boxes? • Is it in the front four boxes? You need answers to three questions to find the candy. Box labels: 000, 001, 010, 011, 100, 101, 110, 111. Need log₂ 8 = 3 bits of information.

  6. More candies… • Box labels: 00, 01, 10, 11, 00, 01, 10, 11 • Candy in both boxes labeled 01. • Need only log₂ 8 - log₂ 2 = 2 bits of information. In general, m boxes with n candies need log₂ m - log₂ n bits of information.

  7. Ribosomes 2600 binding sites out of 4.7 million base pairs. Need log₂(4.7 million) - log₂(2600) ≈ 10.8 bits of information.
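
A quick numerical check of the log rule from the last three slides (a minimal sketch in Python; the numbers are the ones on the slides):

```python
import math

def bits_needed(boxes, candies=1):
    """Bits needed to locate one of `candies` targets among `boxes` boxes."""
    return math.log2(boxes) - math.log2(candies)

print(bits_needed(8))            # 3.0 bits: one candy, eight boxes (slide 5)
print(bits_needed(8, 2))         # 2.0 bits: candy in two of eight boxes (slide 6)
print(bits_needed(4.7e6, 2600))  # ~10.8 bits: ribosome sites in E. coli (slide 7)
```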

  8. Communication System Source → Transmitter → Channel (+ noise) → Receiver → Destination

  9. Information Source • Represented by a stochastic process • Mathematically, a Markov chain • We are interested in ergodic sources: every sufficiently long sequence is statistically the same as every other sequence.
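
For illustration, a toy two-symbol Markov source (the transition probabilities below are invented, not from the talk):

```python
import random

# Hypothetical transition matrix: P(next symbol | current symbol).
TRANSITIONS = {
    "A": {"A": 0.9, "B": 0.1},
    "B": {"A": 0.4, "B": 0.6},
}

def emit(n, state="A"):
    """Generate n symbols from the Markov source."""
    out = []
    for _ in range(n):
        out.append(state)
        r, acc = random.random(), 0.0
        for nxt, p in TRANSITIONS[state].items():
            acc += p
            if r < acc:
                state = nxt
                break
    return "".join(out)

print(emit(40))  # e.g. long runs of A broken by short bursts of B
```

Because the source is ergodic, the symbol statistics measured on one long output like this match the statistics of the whole ensemble.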

  10. How much information is produced? A measure of uncertainty H should be: • Continuous in the probabilities. • A monotonically increasing function of the number of (equally likely) events. • When a choice is broken down into two successive choices, the total H = the weighted sum of the individual H values.

  11. Enter Entropy H = -∑ᵢ pᵢ log₂ pᵢ, the unique measure (up to a constant factor) satisfying the three conditions above.
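
The formula transcribed directly into Python (a minimal sketch):

```python
import math

def entropy(probs):
    """Shannon entropy H = -sum(p * log2 p) in bits; 0·log 0 is taken as 0."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # 1.0 bit: a fair coin
print(entropy([1.0, 0.0]))   # 0.0: no uncertainty
print(entropy([0.25] * 4))   # 2.0 bits: maximal for four equally likely events
```

The three prints also illustrate the first three properties listed on the next slide.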

  12. Properties of Entropy • H is zero iff all but one p are zero. • H is never negative. • H is maximum when all the events are equally probable. • If x and y are two events, H(x,y) ≤ H(x) + H(y). • Conditional entropy: Hₓ(y) ≤ H(y).

  13. Why is entropy important? • Entropy is a measure of uncertainty. • Entropy relation from thermodynamics: S = k_B ln Ω. • Also from thermodynamics: dS = dq/T. • For every bit of information gained, the machine dissipates at least k_B T ln 2 joules.
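
Putting a number on that bound (room temperature is an assumed value):

```python
import math

K_B = 1.380649e-23  # Boltzmann constant, J/K
T = 298.0           # assumed room temperature, K

joules_per_bit = K_B * T * math.log(2)
print(f"{joules_per_bit:.2e} J dissipated per bit gained")  # ~2.85e-21 J
```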

  14. Ribosome binding sites

  15. Information in sequence

  16. Information curve The information gain at position l of the aligned sites is R(l) = 2 - H(l) bits (2 bits is the maximum for the four DNA bases). Plotting this across the sites gives the information curve. For E. coli, the total information is about 11 bits … the same as what the ribosome needs.
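
A sketch of how such a curve could be computed; the four aligned sequences below are invented stand-ins for the real aligned E. coli binding sites, and the small-sample correction used in practice is omitted:

```python
import math
from collections import Counter

# Hypothetical aligned binding-site sequences (real data: 2600 E. coli sites).
sites = ["TAAGGAGG", "TAAGGAGG", "TATGGAGG", "CAAGGAGA"]

def information_curve(aligned):
    """R(l) = 2 - H(l) bits at each aligned position l."""
    curve = []
    for column in zip(*aligned):
        counts = Counter(column)
        n = len(column)
        h = sum(-c / n * math.log2(c / n) for c in counts.values())
        curve.append(2.0 - h)
    return curve

print([round(r, 2) for r in information_curve(sites)])
# Summing over positions gives the total information (~11 bits for real sites).
```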

  17. Sequence Logo

  18. Channel capacity A source transmits 0s and 1s at 1000 symbols/sec; 1 symbol in 100 is received in error. What is the rate of transmission? We need to apply a correction: the uncertainty in x for a given value of y, which is just the conditional entropy ≈ 81 bits/sec, leaving 1000 - 81 = 919 bits/sec.
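
Reproducing the slide's arithmetic (this is Shannon's classic example):

```python
import math

rate = 1000   # symbols per second
p_err = 0.01  # probability a received symbol is wrong

# Conditional entropy H_y(x): remaining uncertainty about x once y is seen.
h_cond = -(p_err * math.log2(p_err) + (1 - p_err) * math.log2(1 - p_err))

correction = rate * h_cond
print(f"correction        ~ {correction:.0f} bits/sec")         # ~81
print(f"transmission rate ~ {rate - correction:.0f} bits/sec")  # ~919
```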

  19. Channel capacity contd. For a continuous source with white noise, C = W log₂(1 + S/N), where W is the bandwidth and S/N the signal-to-noise ratio. Shannon’s theorem: as long as the rate of transmission is below C, the number of errors can be made as small as needed.
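
The formula as code (the bandwidth and S/N values are arbitrary examples):

```python
import math

def channel_capacity(bandwidth_hz, snr):
    """Shannon-Hartley capacity C = W * log2(1 + S/N), in bits/sec."""
    return bandwidth_hz * math.log2(1 + snr)

print(channel_capacity(3000, 1000))  # ~29.9 kbit/sec: 3 kHz line at 30 dB S/N
```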

  20. Molecular Machine Capacity • Lock-and-key mechanism. • Each pin on the ribosome is a simple harmonic oscillator in a thermal bath. • The velocity of each pin is represented by a point in a 2-D velocity space. • More pins → more dimensions. • The distribution of points is spherical.

  21. Machine capacity In larger dimensions, all the points lie in a thin spherical shell. The radius of the shell is the velocity, and hence proportional to the square root of the energy. Before binding: radius ∝ √(P + N) (signal power plus thermal noise). After binding: radius ∝ √N (thermal noise alone).
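
A quick simulation of the thin-shell claim (the dimension counts and sample size are arbitrary):

```python
import math
import random

def radii(dim, n=2000):
    """Distances from the origin of n Gaussian points in dim dimensions."""
    return [math.sqrt(sum(random.gauss(0, 1) ** 2 for _ in range(dim)))
            for _ in range(n)]

for dim in (2, 20, 200):
    rs = radii(dim)
    mean = sum(rs) / len(rs)
    rel_spread = (max(rs) - min(rs)) / mean
    print(f"dim={dim:3d}  mean radius ~ {mean:6.2f}  relative spread ~ {rel_spread:.2f}")

# The relative spread shrinks as dim grows: the points concentrate in a thin
# shell of radius ~ sqrt(dim), i.e. ~ sqrt(total energy).
```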

  22. Number of choices = the number of ‘after’ spheres that can sit inside the ‘before’ sphere = (volume of the before sphere) / (volume of the after sphere) = ((P + N)/N)^(D/2) in D dimensions. Machine capacity = the logarithm of the number of choices: C = (D/2) log₂(1 + P/N).
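
The sphere-packing argument in a few lines (a sketch; D and the P/N ratio below are assumed example values):

```python
import math

def machine_capacity(dims, p_over_n):
    """C = log2(V_before / V_after) = (D/2) * log2(1 + P/N) bits per operation.

    A D-sphere's volume scales as r**D, and r scales as sqrt(energy), so the
    volume ratio is ((P + N) / N) ** (D / 2).
    """
    return dims / 2 * math.log2(1 + p_over_n)

print(machine_capacity(dims=8, p_over_n=15))  # 16.0 bits for these example values
```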

  23. References
