
Whole-genome biophysics, mutations, evolution, chromatin, …

Konstantin Zeldovich, x62354, LRB 1004. In the previous lecture: protein structures and sequences are largely determined by physical chemistry. Ab initio paradigm: sequence + physics = structure (+ function, hopefully).


Presentation Transcript


  1. Whole-genome biophysics, mutations, evolution, chromatin, … Konstantin Zeldovich x62354, LRB 1004

  2. In the previous lecture: protein structures and sequences are largely determined by physical chemistry. Ab initio paradigm: sequence + physics = structure (+ function, hopefully).
     Today:
     • Are physical and chemical constraints discernible at the whole-proteome / whole-genome level?
     • Constraints on amino acid usage in prokaryotes
     • Thermostability
     • Metabolic cost of protein synthesis
     • Mutational robustness of proteins
     • Evolution of protein stability
     • The genetic code is nonrandom
     • Large-scale structure of chromatin, 3C-like methods (J. Dekker lab)

  3. Temperature ranges of modern life
     Psychro-, meso-, thermo-, and hyperthermophilic bacteria/archaea:
     • -10°C (Antarctic ice, permafrost in Siberia and Canada): Colwellia spp., Psychrobacter spp.
     • +110°C (deep-sea hydrothermal vents, hot springs): Pyrococcus spp., Methanococcus spp.
     >250 sequenced genomes.
     Simplest eukaryotes: up to ~60°C (nematode from hot springs).
     Cold-blooded animals:
     • Notothenia spp., Antarctic fish: -1.8°C habitat, dies of overheating at +6°C (≈ 43°F)
     • Desert iguana: up to +60°C
     Very few complete genomes!

  4. Is habitat temperature reflected in the genomes? Existing knowledge
     What is presumably related to thermostability?
     • G+C in DNA increases with temperature (wrong)
     • DNA stabilization by pairing
     • Fraction of charged residues (DEKR) in proteins increases
     • Hydrophobic interactions weaken with temperature
     • Fraction of polar residues decreases
     • ?
     Limitations of the previous work: based on a few (dozen) individual proteins, or on a limited number (~20) of completely sequenced genomes.
     Here: high-throughput analysis, 204 genomes. Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007

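  5. IVYWREL, or LIVEWYR. 86 genomes. Topt = 937·F(IVYWREL) - 335, R = 0.93, rmsd of Topt = 8.9°C, where F(IVYWREL) is the total fraction of the amino acids I, V, Y, W, R, E, L in the proteome and Topt is the optimal growth temperature in °C. Zeldovich, Berezovsky, Shakhnovich, PLOS CB 2007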
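
A minimal sketch of how the regression on this slide could be applied to a set of protein sequences. The function names `ivywrel_fraction` and `predicted_topt` are hypothetical helpers introduced here for illustration; only the coefficients 937 and -335 come from the slide, and the quoted rmsd of 8.9°C suggests treating the output as a rough estimate.

```python
IVYWREL = set("IVYWREL")

def ivywrel_fraction(sequences):
    """Fraction of I, V, Y, W, R, E, L among all residues in the given proteins."""
    total = 0
    hits = 0
    for seq in sequences:
        for aa in seq.upper():
            if aa.isalpha():          # skip gaps, stops, whitespace
                total += 1
                hits += aa in IVYWREL
    if total == 0:
        raise ValueError("no amino acid residues found")
    return hits / total

def predicted_topt(fraction):
    """Optimal growth temperature (°C) from the slide's linear fit."""
    return 937.0 * fraction - 335.0

if __name__ == "__main__":
    toy_proteome = ["MKVLIVEWYR", "AAGGSTLLEE"]   # made-up sequences, not real data
    f = ivywrel_fraction(toy_proteome)
    print(f, predicted_topt(f))
```

The fit was obtained on whole proteomes, so applying it to a single protein or a handful of sequences, as in the toy call above, is only a demonstration of the arithmetic.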

  6. Genomic DNA: any temperature, any GC content. Base pairing is not the bottleneck of thermal adaptation. 204 genomes

  7. DNA adaptation via codon bias. [Figure panels: fraction of A+G; autocorrelation function of A, G.] The fractions of A and G nucleotides change with temperature. Thermal adaptation of proteins and thermal adaptation of DNA are independent processes.
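
The two nucleotide-composition measures discussed on slides 6 and 7 (G+C content and the A+G fraction) can be sketched as follows. `base_fractions` is a hypothetical helper, and counting on a single strand is an assumption of this sketch.

```python
def base_fractions(dna):
    """Return the G+C and A+G fractions of one DNA strand (ambiguous bases ignored)."""
    seq = [b for b in dna.upper() if b in "ACGT"]
    if not seq:
        raise ValueError("no A/C/G/T bases found")
    n = len(seq)
    return {
        "GC": sum(b in "GC" for b in seq) / n,   # base-pairing strength proxy
        "AG": sum(b in "AG" for b in seq) / n,   # purine fraction on this strand
    }

if __name__ == "__main__":
    print(base_fractions("ATGGCGAAGTTTNNN"))
```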

  8. Metabolic cost of protein synthesis. Starting from the same basic precursors, some amino acids are easy to synthesize, while others are hard and require more energy. Hypothesis: energy (ATP) is the limiting factor in amino acid synthesis and thus in survival, so highly expressed proteins must be made of “cheap” amino acids. Amino acid cost can be deduced from pathway maps. Protein expression can be either measured or inferred from codon usage (codon adaptation index). Akashi and Gojobori, PNAS 99:3695 (2002)

  9. Highly expressed proteins are “cheaper”. Akashi and Gojobori, PNAS 99:3695 (2002). MCU (major codon usage) rationale: synonymous codons are used with different frequencies (codon bias). For some reason (translation efficiency?), codon bias is correlated with expression. MCU can be calibrated using a few genes with known expression levels. Kanaya et al, Gene 238:143 (1999). Nowadays, direct measurements of expression are available (PROJECT!)

  10. Possible effects of mutations
      DNA:
      - Exon, nonsynonymous -> see “Protein”
      - Exon, synonymous -> normally neutral
      - Introns, regulatory sequences -> ???
      - Altered protein expression, localization, alternative splicing, …
      - RNA coding regions -> changes in RNA structure/function
      - Chromatin structure?
      Protein (nonsynonymous):
      - Change of stability
      - Possible misfolding or aggregation (-> neurodegenerative diseases)
      - Altered interaction(s) with other protein(s) or small molecule(s)
      - Altered function
      Change of thermodynamic stability is among the easiest to comprehend.

  11. Mutational robustness of proteins. ProTherm database: http://gibk26.bse.kyutech.ac.jp/jouhou/protherm/protherm.html. ~2000 mutations, thermal & chemical unfolding. Average ΔΔG = 1 kcal/mol (destabilizing), variance = 3 (kcal/mol)². Kumar et al, NAR 2006; Zeldovich et al, PNAS 2007
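
To get a feel for the numbers on this slide, the sketch below treats the stability change per mutation as a Gaussian with mean 1 kcal/mol and variance 3 (kcal/mol)². The Gaussian shape is an assumption of this sketch (the slide quotes only the mean and variance), and the function names are hypothetical.

```python
import math

MEAN_DDG = 1.0   # kcal/mol, from the slide
VAR_DDG = 3.0    # (kcal/mol)^2, from the slide

def normal_cdf(x, mean, var):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))

def fraction_destabilizing(mean=MEAN_DDG, var=VAR_DDG):
    """Probability that a random mutation has a positive (destabilizing) effect."""
    return 1.0 - normal_cdf(0.0, mean, var)

def prob_unfold(dg, mean=MEAN_DDG, var=VAR_DDG):
    """Probability that one mutation pushes a protein of stability dg (< 0) above zero."""
    return 1.0 - normal_cdf(-dg, mean, var)

if __name__ == "__main__":
    print(fraction_destabilizing())   # roughly 0.7 under the Gaussian assumption
    print(prob_unfold(-5.0))          # about 1% for a protein with stability -5 kcal/mol
```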

  12. ΔΔG prediction servers and tools
      • FoldX
      • PoPMuSiC
      • MUPro
      • CUPSAT
      • Eris
      (They are all trained on highly overlapping datasets, including ProTherm.)
      More servers listed at http://www.gen2phen.org/wiki/protein-level-predictions-4-stability-changes-prediction

  13. Can we translate this to the organism level?
      • Mutations in a protein change its stability ΔG and occur at cell replication; the magnitudes of ΔΔG can be measured or modeled
      • Proteins must be stable for the function to exist and evolve
      • Essential proteins must be stable (ΔG < 0) in a viable organism
      • Γ essential proteins per genome (~300 in bacteria)
      • For simplicity:
      • No epistasis, all proteins equally essential
      • Locally flat, two-level fitness landscape (life or death)
      • Asexual replication
      Mutations shuffle stability back and forth. (Protein) evolution is a diffusion process in the Γ-dimensional space of stabilities of the cell’s essential proteins. Zeldovich, Chen, Shakhnovich, PNAS 2007

  14. … back in 1930: R.A. Fisher, 1930. [Figure: 2D example, displacement r from the fitness peak, color scale from low to high fitness.]
      • Diffusion in the space of “characters”
      • Single fitness peak at origin
      • Fitness w = w(r)
      • n-dimensional hyperspheres of constant fitness
      • Compensatory mutations and epistasis
      • Soft selection
      • Axes poorly quantified!!!
      Hartl, Taubes 1998; Poon, Otto 2000

  15. “Characters” are protein stabilities, Γ = 2. [Figure: the (ΔG1, ΔG2) plane. The lines ΔG1 = 0 and ΔG2 = 0 separate viable phenotypes from lethal phenotypes (unstable proteins, ΔG > 0); a second boundary marks impossible genotypes (too stable proteins); arrows show replication and mutation.] Replication of the viable organisms must compensate for death due to the flux across the ΔG = 0 absorbing boundary. (Skipping the math; an analytic solution exists, please ask if interested.)
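
The picture on slides 13 to 15 can be explored numerically with a toy Monte Carlo: each organism carries a vector of protein stabilities, every replication adds a random stability change to one protein, and any organism with an unstable protein (stability ≥ 0) dies. This is a rough sketch and not the analytic treatment of the paper. The Gaussian mutation effects, the clamp at `dg_min` standing in for the "too stable" boundary, the starting stability of -5, the constant population size, and the name `evolve` are all assumptions made here.

```python
import random

def evolve(n_genes=2, pop_size=200, generations=50,
           mean=1.0, sd=3 ** 0.5, dg_min=-10.0, seed=0):
    """Toy life-or-death evolution of protein stabilities; returns the final population."""
    rng = random.Random(seed)
    population = [[-5.0] * n_genes for _ in range(pop_size)]
    for _ in range(generations):
        survivors = []
        for genome in population:
            child = list(genome)
            i = rng.randrange(n_genes)             # one mutated protein per replication
            child[i] += rng.gauss(mean, sd)        # random change of stability
            child[i] = max(child[i], dg_min)       # stabilities cannot go below dg_min
            if all(dg < 0 for dg in child):        # absorbing boundary at zero
                survivors.append(child)
        if not survivors:
            return []                              # population went extinct
        # replication of survivors compensates for deaths (constant population size)
        population = [list(rng.choice(survivors)) for _ in range(pop_size)]
    return population

if __name__ == "__main__":
    final = evolve()
    stabilities = [dg for genome in final for dg in genome]
    print(len(final), sum(stabilities) / max(len(stabilities), 1))
```

A histogram of `stabilities` from a long run is the kind of quantity one would compare, qualitatively, with a measured distribution of protein stabilities.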

  16. Prediction: universal distribution of ΔG of all proteins. Line: theory; histogram: ProTherm database, ~200 proteins. Zeldovich, Chen, Shakhnovich, PNAS 2007

  17. Genetic code links genomes and proteomes. Information-theoretical viewpoint: is this 64->20 mapping in any way optimal? Hypothesis: the genetic code minimizes the effect of DNA mutations on protein structure. [Figure: 1,000,000 realizations of the code (64->20); mean-square change of amino acid polarity upon point mutation.] Freeland & Hurst, J. Mol. Evol. 47:238 (1998)
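
A small-scale sketch of the randomization test described on this slide. Two things differ from the cited study and are assumptions of this sketch: the amino acid property is the Kyte-Doolittle hydropathy scale, swapped in here for the polarity measure, and only a few hundred random codes are drawn instead of 1,000,000. Random codes keep the codon block structure and the stop codons of the standard code and only permute which amino acid occupies which block. All function names are hypothetical.

```python
import random

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STANDARD = dict(zip(CODONS, AMINO))

# Kyte-Doolittle hydropathy, used here as a stand-in for amino acid polarity.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def code_cost(code, scale=KD):
    """Mean-square property change over all single-base changes between sense codons."""
    total, count = 0.0, 0
    for codon, aa in code.items():
        if aa == "*":
            continue
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                aa2 = code[codon[:pos] + base + codon[pos + 1:]]
                if aa2 == "*":
                    continue                      # ignore mutations to stop codons
                total += (scale[aa] - scale[aa2]) ** 2
                count += 1
    return total / count

def shuffled_code(rng):
    """Random code: same codon blocks and stops, amino acids permuted among blocks."""
    aas = sorted(set(AMINO) - {"*"})
    perm = aas[:]
    rng.shuffle(perm)
    swap = dict(zip(aas, perm))
    swap["*"] = "*"
    return {codon: swap[aa] for codon, aa in STANDARD.items()}

def fraction_better(n_codes=500, seed=0):
    """Fraction of random codes with a lower cost than the standard code."""
    rng = random.Random(seed)
    ref = code_cost(STANDARD)
    better = sum(code_cost(shuffled_code(rng)) < ref for _ in range(n_codes))
    return better / n_codes

if __name__ == "__main__":
    print(code_cost(STANDARD), fraction_better())
```

A small value of `fraction_better` would mean that few random codes buffer point mutations better than the standard code under this particular scale and cost function.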

  18. Large-scale structure of chromatin. On a small scale, chromatin is tightly packed (nucleosomes, 10- and 30-nm fibers). Large-scale structure? Chromosome Conformation Capture (3C, 5C, Hi-C, …). [Figure, protocol steps: formaldehyde crosslink, digest, ligate, uncrosslink.] The ligated fragments can be counted by qPCR or deep sequencing. Result: a “contact map” of the chromosome, showing which part is spatially close to which. The 3D structure can then be inferred, a la NMR structures of proteins (distance constraints). Dekker et al, Science 2002; Lieberman-Aiden et al, Science 2009
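
As an illustration of what a "contact map" is as a data structure, the sketch below bins pairs of genomic positions (one pair per counted ligation product) into a symmetric matrix of counts. The input format, 0-based coordinates, and the name `contact_map` are assumptions of this sketch; real pipelines add read mapping, filtering, and normalization, none of which is shown.

```python
def contact_map(pairs, chrom_length, bin_size):
    """Symmetric matrix of contact counts between fixed-size bins of one chromosome."""
    n_bins = (chrom_length + bin_size - 1) // bin_size   # ceiling division
    counts = [[0] * n_bins for _ in range(n_bins)]
    for a, b in pairs:
        if not (0 <= a < chrom_length and 0 <= b < chrom_length):
            raise ValueError("position outside the chromosome")
        i, j = a // bin_size, b // bin_size
        counts[i][j] += 1
        if i != j:
            counts[j][i] += 1                            # keep the map symmetric
    return counts

if __name__ == "__main__":
    toy_pairs = [(10, 250), (20, 30), (260, 15)]         # made-up positions, in bp
    for row in contact_map(toy_pairs, chrom_length=300, bin_size=100):
        print(row)
```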

  19. DNA looping and long-range transcriptional control. Murine beta-globin locus, 130 kb. Dekker, TiBS 28:277 (2003). Sequence determinants of the contacts?? (PROJECT!)

  20. Whole chromosome as a polymer? [Figure axis: genomic separation s, bp.] Probability of contact? Theory: Lieberman-Aiden et al, Science 2009
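
One way to compare contact data with polymer theory is to fit a power law P(s) ~ s^alpha to the contact probability as a function of genomic separation s. In Lieberman-Aiden et al. an exponent near -1 is associated with a fractal globule, while an equilibrium globule would give -3/2. The sketch below does an ordinary least-squares fit in log-log space; `fit_power_law` is a hypothetical helper and the data in the usage example are synthetic.

```python
import math

def fit_power_law(separations, probabilities):
    """Least-squares fit of log P = log A + alpha * log s; returns (alpha, A)."""
    if len(separations) != len(probabilities) or len(separations) < 2:
        raise ValueError("need at least two (s, P) points of equal length")
    xs = [math.log(s) for s in separations]
    ys = [math.log(p) for p in probabilities]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    alpha = sxy / sxx
    prefactor = math.exp(my - alpha * mx)
    return alpha, prefactor

if __name__ == "__main__":
    s = [1e5, 3e5, 1e6, 3e6, 1e7]          # separations in bp (synthetic)
    p = [2.0 / x for x in s]               # exact s^-1 decay, for demonstration only
    print(fit_power_law(s, p))
```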
