Maximum Entropy in Protein Sequence Modeling

This lecture discusses the challenges of detecting and aligning distantly homologous protein sequences, and introduces the Maximum Entropy approach for modeling protein sequence probability distributions. It explores how coevolution analysis can extract information from multiple sequence alignments and infer residue-residue contacts. The lecture also covers concepts from probability theory and information theory and their application to protein sequence analysis.

Presentation Transcript


  1. Lecture 3: Maximum Entropy approach for modeling protein sequence probability distributions

  2. The general problem • Distant (remote) homology poses challenges: • Different length • Abundant changes at each position • How to detect homology? • How to align sequences?

  3. Learning from variation If the problem can be solved for a set of sequences “representative” of the family, then we can leverage this knowledge to assess whether or not a given sequence “looks like” this group. Can we write down P(x1, x2, …, xL), the probability of observing the sequence x1x2…xL as a member of the family?

  4. What are these lectures about? • We have discussed the nuts and bolts of Hidden Markov Models (HMMs), showing how these models are initialized from a database of sequences and how they can generate multiple sequence alignments (MSAs). • We will now show how to extract information from the multiple sequence alignments generated with the HMMs. We will thus introduce a promising, increasingly used approach: Maximum Entropy modeling for studying coevolution

  5. Analyzing multiple sequence alignments: maximum entropy approach Can the multiple sequence alignment be used to extract information about the 3D structure of proteins?

  6. Contacting residues co-evolve A mutation is accompanied by a compensatory one: can we exploit these correlations to infer residue-residue contacts from multiple sequence alignments? The problem is that correlations have a transitive character, so all the amino acids appear to be connected! To solve this problem we need to model the probability distribution (get a formula) and disentangle direct statistical couplings from indirect ones.
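
To see why transitivity is a problem, here is a small numerical illustration (a Gaussian toy model invented for illustration, not part of the lecture): sites 0 and 2 interact only through their shared partner 1, so they look correlated, yet inverting the covariance matrix recovers the direct couplings.

```python
# Sketch (toy Gaussian model): correlation is transitive, direct couplings are not.
import numpy as np

rng = np.random.default_rng(0)

# Direct interactions only between 0-1 and 1-2 (a chain); entry (0, 2) is zero.
precision = np.array([[1.0, 0.6, 0.0],
                      [0.6, 1.8, 0.6],
                      [0.0, 0.6, 1.0]])
cov = np.linalg.inv(precision)

samples = rng.multivariate_normal(np.zeros(3), cov, size=200_000)

corr = np.corrcoef(samples, rowvar=False)
print(np.round(corr, 2))                   # corr[0, 2] is clearly nonzero (indirect effect)

inferred_precision = np.linalg.inv(np.cov(samples, rowvar=False))
print(np.round(inferred_precision, 2))     # entry (0, 2) is close to zero (no direct coupling)
```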

  7. A Mathematical Theory of Communication (1948) Claude Shannon

  8. Here is the plan for MaxEnt • We will introduce two important concepts in Information Theory (Surprisal and Entropy) • We will review the main idea about constrained optimization with Lagrange multipliers • Finally, we will use these ingredients to generate a probabilistic model for protein sequences and show that this model highlights crucial information about structure and structural dynamics.

  9. MaxEnt modeling and coevolution analysis • De Juan, D., Pazos, F., and Valencia, A. (2013). Emerging methods in protein co-evolution. Nature Reviews Genetics, 14(4), 249. • Pressé, S., Ghosh, K., Lee, J., and Dill, K. A. (2013). Principles of maximum entropy and maximum caliber in statistical physics. Reviews of Modern Physics, 85(3), 1115. • Cover, T. M., and Thomas, J. A. (2012). Elements of Information Theory. John Wiley & Sons.

  10. Now a mathematical interlude…

  11. Let’s brush up on probability… • A probability is a number assigned to each subset (these subsets are called events) of a sample space Ω, satisfying the following rules: • For any event A, 0 ≤ P(A) ≤ 1. • P(Ω) = 1. • If A1, A2, …, An is a partition of A, then P(A) = P(A1) + P(A2) + … + P(An). • (A1, A2, …, An is called a partition of A if A1 ∪ A2 ∪ … ∪ An = A and A1, A2, …, An are mutually exclusive.) “Probability theory is nothing but common sense reduced to calculations” Laplace (1819). [Venn diagram: events A and B overlap and occur together with joint probability P(A∩B).]
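
A minimal sketch of these rules in code, using a fair six-sided die as the sample space (the die example is an illustrative assumption, not taken from the slides):

```python
# Sketch: the probability axioms checked numerically for a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
P = {outcome: 1 / 6 for outcome in sample_space}        # each face gets probability 1/6

def prob(event):
    """Probability of an event, i.e., a subset of the sample space."""
    return sum(P[outcome] for outcome in event)

A = {2, 4, 6}                         # the event "the roll is even"
A1, A2, A3 = {2}, {4}, {6}            # a partition of A into mutually exclusive pieces

assert 0 <= prob(A) <= 1                                        # rule 1
assert abs(prob(sample_space) - 1.0) < 1e-9                     # rule 2: P(Omega) = 1
assert abs(prob(A) - (prob(A1) + prob(A2) + prob(A3))) < 1e-9   # additivity over a partition
print(prob(A))                        # 0.5 (up to floating-point rounding)
```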

  12. Some simple concepts to keep in mind Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B) (mutually exclusive events – the occurrence of one event prevents the occurrence of the other).

  13. Some simple concepts to keep in mind Generalized addition rule: P(A ∪ B) = P(A) + P(B) − P(A ∩ B) (we dropped the requirement that the events be mutually exclusive). [Venn diagram: overlapping events A and B with intersection probability P(A∩B).]

  14. Some simple concepts to keep in mind The “product” of two events: in the process of measurement, we observe both events. Multiplication rule for independent events: P(A and B) = P(A) × P(B) (independent events: the outcome of one event is not affected by the outcome of the other).

  15. Conditional Probability P(A|B) = P(A ∩ B) / P(B). We are restricting the sample space to B (think of P(B) as a normalization factor); in words: what is the probability that A occurs given that B occurred?

  16. Some simple concepts to keep in mind Generalized multiplication rule: P(A ∩ B) = P(A|B) × P(B) = P(B|A) × P(A)
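
A quick numerical check of these rules, using two fair dice as an illustrative scenario (not from the slides):

```python
# Sketch: conditional probability and the product rule, by enumerating two fair dice.
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))     # 36 equally likely (die1, die2) pairs

def prob(event):
    return sum(1 for o in outcomes if event(o)) / len(outcomes)

A = lambda o: o[0] + o[1] == 8          # event A: the two dice sum to 8
B = lambda o: o[0] == 3                 # event B: the first die shows 3
A_and_B = lambda o: A(o) and B(o)

P_A_given_B = prob(A_and_B) / prob(B)   # P(A|B): restrict the sample space to B
P_B_given_A = prob(A_and_B) / prob(A)   # P(B|A): restrict the sample space to A

assert abs(prob(A_and_B) - P_A_given_B * prob(B)) < 1e-12   # P(A∩B) = P(A|B) P(B)
assert abs(prob(A_and_B) - P_B_given_A * prob(A)) < 1e-12   # P(A∩B) = P(B|A) P(A)
print(prob(A_and_B), P_A_given_B, prob(B))   # 1/36, 1/6, 1/6
```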

  17. Collections of probabilities are described by distribution functions

  18. Expected value of a distribution • The expected value is just the average or mean (µ) of a random variable X. • It’s also how we expect X to behave on average over the long run (“frequentist” view again). • It’s sometimes called a “weighted average” because more frequent values of X are weighted more highly in the average.

  19. Expected value, formally Discrete case: E[X] = Σ_x x p(x). Continuous case: E[X] = ∫ x f(x) dx.
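
A small worked example of the discrete formula (fair and loaded dice, chosen only for illustration):

```python
# Sketch: expected value as a probability-weighted average.
values = [1, 2, 3, 4, 5, 6]

p_fair = [1 / 6] * 6
print(sum(x * px for x, px in zip(values, p_fair)))        # 3.5

# A loaded die: more frequent values are weighted more heavily in the average.
p_loaded = [0.05, 0.05, 0.05, 0.05, 0.05, 0.75]
print(sum(x * px for x, px in zip(values, p_loaded)))      # 5.25
```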

  20. Done with the short review on probability…Let’s get back to our problem (information)…

  21. A primer in Information Theory • What is Information? • Information is transferred from an originating entity to a receiving entity (via a message). • Note: if the receiving entity already knows the content of a message with certainty, the amount of information is zero. • Flip of a coin: how much information do we receive when we are told that the outcome is head? • If we already knew the answer, i.e., P(head) = 1, the amount of information is zero! • If it’s a fair coin, i.e., P(head) = P(tail) = 0.5, we say that the amount of information is 1 bit. • If the coin is not fair, e.g., P(head) = 0.9, the amount of information is more than zero but less than one bit! • Intuitively, the amount of information received, on average, is the same if P(head) = 0.9 or P(head) = 0.1.

  22. Self-information or Surprisal 1. I(p) increases as the probability decreases (and vice versa) 2. I(p) ≥ 0 – information is non-negative 3. I(1) = 0 – events that always occur do not communicate information 4. I(p(1&2)) = I(p1) + I(p2) – information due to independent events is additive

  23. Self-information or Surprisal Let’s analyze property 4 more closely. Since we know that for independent events p(1&2) = p1 · p2, property 4 implies that I(p1 · p2) = I(p1) + I(p2). This suggests a unique functional form!
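
The following is a sketch of the standard argument (not reproduced from the slides) for why this functional equation pins down the form of I(p):

```latex
% For independent events probabilities multiply while information must add:
\[
  I(p_1 p_2) = I(p_1) + I(p_2) .
\]
% For a continuous, decreasing, non-negative I, the only solutions of this
% functional equation are logarithms:
\[
  I(p) = -k \log p , \qquad k > 0 ,
\]
% and the choice of base only fixes the unit (base 2 gives bits, base e gives nats):
\[
  I(p) = -\log_2 p \quad \text{(bits)} .
\]
```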

  24. Self-information or Surprisal I(p) = −log₂ p • Anti-monotonic in p • Non-negative • Null if the event is certain • Additive for independent events

  25. Shannon entropy The average amount of information that we receive per event: H = Σ_x p(x) I(p(x)) = −Σ_x p(x) log₂ p(x)

  26. Shannon entropy Entropy as a function of the probability of getting “head” in the coin flip experiment: H(p) = −p log₂ p − (1−p) log₂ (1−p). Entropy is maximum (1 bit, at p = 0.5) when my “prior knowledge” is minimum.
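
A minimal sketch reproducing the numbers behind the plot on this slide (the code itself is illustrative, not from the lecture):

```python
# Sketch: binary entropy of a coin flip peaks at 1 bit for a fair coin and is
# symmetric under p -> 1 - p.
import math

def binary_entropy(p):
    """Average information per flip, in bits, when P(head) = p."""
    if p in (0.0, 1.0):
        return 0.0                    # a certain outcome carries no information
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

for p in (0.5, 0.9, 0.1, 1.0):
    print(f"P(head) = {p}: H = {binary_entropy(p):.3f} bits")
# P(head) = 0.5: H = 1.000 bits
# P(head) = 0.9: H = 0.469 bits
# P(head) = 0.1: H = 0.469 bits
# P(head) = 1.0: H = 0.000 bits
```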

  27. MaxEnt modeling Maximum Entropy • Why maximum entropy? • Maximize entropy = minimize bias • Model all that is known and assume nothing about what is unknown. • Model all that is known: satisfy a set of constraints that must hold • Assume nothing about what is unknown: choose the most “uniform” distribution, i.e., the one with maximum entropy

  28. MaxEnt modeling … the fact that a certain probability distribution maximizes entropy subject to certain constraints representing our incomplete information, is the fundamental property which justifies use of that distribution for inference; it agrees with everything that is known, but carefully avoids assuming anything that is not known. It is a transcription into mathematics of an ancient principle of wisdom … (Jaynes, 1990) [from: A Maximum Entropy Approach to Natural Language Processing, by A. L. Berger, S. A. Della Pietra, and V. J. Della Pietra, Computational Linguistics, Vol. 22, No. 1, 1996]

  29. How do we model what we know? Empirical (observed) probability of x: p̃(x). Model (theoretical) probability of x: p(x). A function of x whose expected value is known: f(x). Observed expectation (empirical counts): Σ_x p̃(x) f(x). Model expectation (theoretical prediction): Σ_x p(x) f(x). We require the model to reproduce the observed statistics, i.e., we impose the constraint Σ_x p(x) f(x) = Σ_x p̃(x) f(x).
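
A tiny illustration of the right-hand side of that constraint (the toy data and the feature values are invented for illustration): the observed expectation is just an average of f over the data.

```python
# Sketch: empirical distribution and observed expectation of a feature f(x).
from collections import Counter

samples = ["A", "B", "A", "C", "A", "B"]                      # observed draws of x
p_tilde = {x: c / len(samples) for x, c in Counter(samples).items()}

f = {"A": 1.0, "B": 2.0, "C": 4.0}                            # an arbitrary feature of x

observed_expectation = sum(p_tilde[x] * f[x] for x in p_tilde)
print(p_tilde)                  # {'A': 0.5, 'B': 0.333..., 'C': 0.166...}
print(observed_expectation)     # 0.5*1 + 0.333*2 + 0.167*4 ≈ 1.83
```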

  30. Constrained optimization [Figure: a surface with its free (unconstrained) maximum and the lower constrained maximum attained along the constraint curve.]

  31. Lagrange multipliers To maximize a function f(x) subject to a constraint g(x) = c, introduce a multiplier λ and look for stationary points of the Lagrangian L(x, λ) = f(x) − λ (g(x) − c).

  32. Using Lagrange multipliers for MaxEnt Maximize L(p) = H(p) − Σ_k λ_k ( Σ_x p(x) f_k(x) − Σ_x p̃(x) f_k(x) ) − μ ( Σ_x p(x) − 1 ). Setting ∂L/∂p(x) = 0 gives p(x) = (1/Z) exp( −Σ_k λ_k f_k(x) ): the probability distribution turns out to be a Boltzmann distribution!
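
A minimal numerical sketch of this result (the classic “loaded die” exercise, chosen for illustration rather than taken from the lecture): constraining a single average produces a Boltzmann distribution whose one Lagrange multiplier can be found by bisection.

```python
# Sketch: the MaxEnt distribution on {1,...,6} with mean constrained to 4.5 is
# p(x) ∝ exp(-lam * x); solve for the Lagrange multiplier lam by bisection.
import math

values = range(1, 7)
target_mean = 4.5

def model_mean(lam):
    weights = [math.exp(-lam * x) for x in values]
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

# model_mean(lam) decreases with lam and equals 3.5 at lam = 0, so bisect on [-5, 5].
lo, hi = -5.0, 5.0
for _ in range(100):
    mid = 0.5 * (lo + hi)
    if model_mean(mid) > target_mean:
        lo = mid
    else:
        hi = mid
lam = 0.5 * (lo + hi)

Z = sum(math.exp(-lam * x) for x in values)
p = {x: math.exp(-lam * x) / Z for x in values}
print(round(lam, 3))                                  # ≈ -0.371 (favors the large faces)
print({x: round(px, 3) for x, px in p.items()})
print(sum(x * px for x, px in p.items()))             # ≈ 4.5, the imposed constraint
```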

  33. MaxEnt for protein sequences Assume a model that is as random as possible, but that agrees with some averages calculated on the data. In our case, univariate and bivariate marginals are constrained to reproduce the empirical frequency counts for single MSA columns and column pairs, with the constraints P_i(a) = f_i(a) and P_ij(a, b) = f_ij(a, b). The model distribution then becomes P(a_1, …, a_L) = (1/Z) exp( Σ_i h_i(a_i) + Σ_{i<j} J_ij(a_i, a_j) ), where the single-site fields h_i and the pair couplings J_ij are the Lagrange multipliers. A unique set of Lagrange multipliers {h_i, J_ij} then satisfies all the constraints. Weigt et al., PNAS 2008
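
A rough sketch of the corresponding pipeline on a toy alignment. The alignment, the pseudocount value, and the inversion step are illustrative assumptions: Weigt et al. solved the constraints by message passing, while the naive mean-field shortcut used below (inverting the correlation matrix) comes from later DCA work.

```python
# Sketch: empirical single-column and column-pair frequencies from an MSA (the
# MaxEnt constraints), followed by a naive mean-field estimate of the couplings.
import numpy as np

ALPHABET = "-ACDEFGHIKLMNPQRSTVWY"            # gap plus the 20 amino acids
A = len(ALPHABET)
aa_index = {a: i for i, a in enumerate(ALPHABET)}

msa = ["MKLV-", "MRLVA", "MKIV-", "MRIVA"]    # toy alignment: 4 sequences, 5 columns
X = np.array([[aa_index[a] for a in seq] for seq in msa])
M, L = X.shape

# One-hot encoding: one indicator per (column, amino acid) pair.
onehot = np.zeros((M, L, A))
onehot[np.arange(M)[:, None], np.arange(L)[None, :], X] = 1.0
onehot = onehot.reshape(M, L * A)

f_single = onehot.mean(axis=0)                    # f_i(a): single-column frequencies
f_pair = (onehot.T @ onehot) / M                  # f_ij(a, b): pair frequencies
C = f_pair - np.outer(f_single, f_single)         # connected correlations

# Naive mean-field approximation: J ≈ -C^{-1} (the ridge term keeps the tiny,
# undersampled toy matrix invertible).
J = -np.linalg.inv(C + 0.1 * np.eye(L * A))

# Score each column pair by the Frobenius norm of its A x A coupling block.
scores = np.linalg.norm(J.reshape(L, A, L, A), axis=(1, 3))
np.fill_diagonal(scores, 0.0)
print(np.round(scores, 2))    # large entries flag candidate contacting column pairs
```

In published pipelines the sequences are additionally reweighted to correct for redundancy and the scores are given an average-product correction before contacts are called.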

  34. Contacting residues are statistically coupled Amino acids interact in pairs (in a statistical sense)!

  35. Protein evolution analogous to disordered spin systems?

  36. From couplings to structure

  37. Structure prediction, at last!

  38. Webservers for structural predictions based on evolutionary coupling analysis Ab initio Structure prediction

  39. How about function?

  40. Beyond structure prediction

  41. Do couplings show an interesting community structure?

  42. Predicting dynamics Granata D, Ponzoni L, Micheletti C, Carnevale V, bioRxiv 109397; doi: https://doi.org/10.1101/109397

  43. Webserver for evolutionary domains

  44. Take home message: • The maximum entropy approach (with constraints on joint frequencies) provides a model that is extremely useful for: • Inferring tertiary and quaternary contacts in proteins and protein complexes; this approach is becoming the standard in structure prediction • Beyond structure: protein dynamics?
