Fedor Kolpakov Biosoft.Ru, Ltd. Institute of Systems Biology, Ltd. Novosibirsk, Russia

Analysis of ribo-seq data for prediction of translation efficiency and protein quantity from transcriptomics data
July 6th - 11th 2013, St. Petersburg, Russia

Bukharov Aleksandr, Kiselev Ilya
Genome-scale model for prediction of synthesis rates of mRNAs and proteins
• Initial data: Schwanhäusser B, Busse D, Li N, Dittmar G, Schuchhardt J, Wolf J, Chen W, Selbach M. Global quantification of mammalian gene expression control. Nature, 2011, 473(7347):337-342.
• mouse fibroblasts, parallel metabolic pulse labelling
• simultaneously measured absolute mRNA and protein abundance and turnover for 5000+ genes
• first genome-scale quantitative model for prediction of synthesis rates of mRNAs and proteins

Experiment design

Schwanhäusser B., et al., 2011 - Fig. 6: Comparison of synthesis rates of mRNA and proteins assuming the measured levels reflect averages over one cell cycle or steady-state values. For the synthesis rates of mRNA (light gray), the deviation between the two approaches is small, because mRNA half-lives are mostly shorter than the cell cycle time. For protein synthesis (dark gray), the differences are substantial; they can differ by more than one order of magnitude.

P – protein, exp – population mean, ss – steady state.
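The gap between the two assumptions can be illustrated with a minimal sketch. It assumes the simplest first-order model, dP/dt = k_sp·R − k_dp·P, and treats cell division as an extra dilution term ln2/T. This is a common simplification and not necessarily the exact cell-cycle averaging used by Schwanhäusser et al.; the function names and example numbers are hypothetical.

```python
import math

def rate_from_half_life(half_life_h):
    """First-order rate constant (1/h) from a half-life in hours."""
    return math.log(2) / half_life_h

def synthesis_rate_steady_state(protein, mrna, protein_half_life_h):
    """k_sp (proteins per mRNA per hour), assuming synthesis balances degradation only."""
    k_dp = rate_from_half_life(protein_half_life_h)
    return k_dp * protein / mrna

def synthesis_rate_with_dilution(protein, mrna, protein_half_life_h, cell_cycle_h):
    """k_sp when dilution by cell division (ln2 / cell cycle time) is added to degradation."""
    k_dp = rate_from_half_life(protein_half_life_h)
    dilution = math.log(2) / cell_cycle_h
    return (k_dp + dilution) * protein / mrna

if __name__ == "__main__":
    # Hypothetical gene: 50000 protein copies, 17 mRNA copies,
    # protein half-life 46 h, assumed cell cycle time 27.5 h.
    ss = synthesis_rate_steady_state(50000, 17, 46.0)
    dil = synthesis_rate_with_dilution(50000, 17, 46.0, 27.5)
    print(f"steady state: {ss:.1f}/h, with dilution: {dil:.1f}/h, ratio: {dil / ss:.2f}")
```

In this simplified model the correction factor is 1 + (protein half-life / cell cycle time), so it grows with protein stability. This matches the direction of the effect described in the caption: small for short-lived mRNAs, large for long-lived proteins.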

“They do not take into account that gene expression in mammalian cells is non-continuous. In addition, the non-uniform age distribution of cells in culture as described in 19, 23 is neglected, since this effect is expected to be small compared to the deviation obtained by neglecting the cell cycle.” Schwanhäusser B., et al., 2011, supplementary materials

Agent-based model
• 4247 blocks for protein synthesis
• each cell is an agent

Numerical experiment. The initial population size is 200 cells, which divide over 108 hours. The average quantity of protein molecules per cell was calculated. This experiment was repeated for 4247 proteins.
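A numerical experiment of this kind can be sketched as follows. This is a minimal illustration, not the BioUML model itself: it assumes deterministic first-order kinetics inside each cell, a fixed cell cycle length (27.5 h here, an assumed value), equal splitting of protein between daughters, and cells that start with zero protein at random ages. All names and parameter values are hypothetical.

```python
import random

def simulate_population(k_sp, mrna, k_dp, n_cells=200, t_end=108.0,
                        cycle=27.5, dt=0.1, seed=0):
    """Simulate a dividing population; return (mean protein per cell, final cell count).

    Each cell is an agent stored as [age_h, protein_copies].
    """
    rng = random.Random(seed)
    # Assumption: initial ages are uniform over the cell cycle, protein starts at 0.
    cells = [[rng.uniform(0.0, cycle), 0.0] for _ in range(n_cells)]
    for _ in range(int(round(t_end / dt))):
        daughters = []
        for cell in cells:
            # Euler step of dP/dt = k_sp * mRNA - k_dp * P
            cell[1] += dt * (k_sp * mrna - k_dp * cell[1])
            cell[0] += dt
            if cell[0] >= cycle:
                # Division: the mother is reset, protein is split equally.
                cell[0] = 0.0
                cell[1] /= 2.0
                daughters.append([0.0, cell[1]])
        cells.extend(daughters)
    mean_protein = sum(c[1] for c in cells) / len(cells)
    return mean_protein, len(cells)

if __name__ == "__main__":
    # Hypothetical protein with a 46 h half-life.
    k_dp = 0.693 / 46.0
    mean_p, n = simulate_population(k_sp=10.0, mrna=5.0, k_dp=k_dp)
    print(f"{n} cells, mean protein per cell {mean_p:.0f}, "
          f"no-division steady state {10.0 * 5.0 / k_dp:.0f}")
```

For a stable protein the population mean stays well below the no-division steady state k_sp·R/k_dp, whereas for a short-lived protein the two nearly coincide.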

The correlation between experiment and numerical modeling is R = 0.99. Absolute values also agree: for 81.6% of proteins, absolute values differ by less than 7%. The main deviations from experimental values are observed for proteins with extremely low copy numbers, where experimental error can be significant.
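The two agreement measures quoted here can be computed as in the sketch below. It assumes Pearson correlation on the raw values and a relative difference taken with respect to the measured value; the slides do not state whether values were log-transformed or how the 7% was normalised. The data in the usage example are made up.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)

def fraction_within(measured, simulated, tolerance=0.07):
    """Fraction of proteins whose simulated value differs from the measured one
    by less than `tolerance` (relative to the measured value, an assumption)."""
    hits = sum(1 for m, s in zip(measured, simulated)
               if abs(s - m) < tolerance * m)
    return hits / len(measured)

if __name__ == "__main__":
    measured = [100.0, 200.0, 400.0, 800.0]    # made-up copy numbers
    simulated = [103.0, 190.0, 500.0, 810.0]   # made-up model output
    print(pearson_r(measured, simulated), fraction_within(measured, simulated))
```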

Ingolia N. et al., 2011

• The rate of translation is remarkably consistent between different classes of messages (Figures 3D and 3E).
• The kinetics of elongation are independent of length and protein abundance and are the same in secreted proteins, whose translation occurs on the ER surface.
• Translation speed is also independent of codon usage, which is consistent with the absence of pauses at rare codons.
• Although this may be the case for specific examples, they find no evidence for a large effect on the overall rate of elongation.
• An important practical implication of the universality of the average rate of elongation is that ribosome footprint density provides a reliable measure of protein synthesis independent of the particular gene being translated.
Ingolia N. et al., 2011
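If footprint density is taken as a measure of protein synthesis, a per-gene translation efficiency is commonly estimated as the ratio of ribosome footprint density to mRNA density. The sketch below assumes RPKM normalisation over the coding sequence and uses hypothetical function names and read counts; it is not taken from the workflow presented in these slides.

```python
def rpkm(read_count, length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads."""
    return read_count * 1e9 / (length_bp * total_mapped_reads)

def translation_efficiency(ribo_reads, rna_reads, cds_length_bp,
                           ribo_total_reads, rna_total_reads):
    """Ratio of footprint density to mRNA density for one gene.

    Returns None when the gene has no RNA-seq coverage, because the ratio
    is undefined in that case.
    """
    ribo_density = rpkm(ribo_reads, cds_length_bp, ribo_total_reads)
    rna_density = rpkm(rna_reads, cds_length_bp, rna_total_reads)
    if rna_density == 0:
        return None
    return ribo_density / rna_density

if __name__ == "__main__":
    # Hypothetical gene: 200 footprint reads out of 1e6, 100 RNA-seq reads out of 2e6.
    te = translation_efficiency(200, 100, 1000, 1_000_000, 2_000_000)
    print(f"translation efficiency: {te:.2f}")
```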

“Our data are consistent with recent work that indirectly infers translation levels from absolute mRNA and protein abundance measurements (Schwanhäusser et al., 2011). Notably, they found that translation was the single largest contributor to protein abundance, highlighting the value of direct measurements of protein synthesis.” Ingolia N. et al., 2011
R = 0.49

Schwanhäusser et al., 2011: R = -0.41
Ingolia N. et al., 2011: R = -0.17

Current work
• Database on ribo-seq data
• Analysis of lncRNA
• Models of biological pathways involved in translation regulation (for example, mTOR)
• More predictors for translation efficiency
  - protein binding sites
  - miRNA binding sites
  - …

GTRD - Gene Transcription Regulation Database
Initial raw data from the literature and from the GEO, SRA and ENCODE databases were systematically collected and uniformly processed using a specially developed workflow (pipeline) for the BioUML platform:
- sequenced reads were aligned to the reference genome using Bowtie;
- peaks were identified using the MACS and SISSR algorithms;
- the obtained peaks were further refined;
- position weight matrices (PWMs) were constructed by different methods (ChIPMunk, our own methods);
- ROC curves were calculated to estimate and compare the constructed PWMs;
- site models (PWMs + thresholds) were constructed for recognition of TF binding sites.
The TFClass database is used as the core source of information about transcription factors, their classification and cross-linking with Ensembl.
The BioUML platform provides a web interface for access to the GTRD database: information search, browsing, different data views. The built-in genome browser provides powerful visualisation of ChIP-seq data.
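The idea of a site model (PWM + threshold) can be illustrated with a minimal sketch. It builds a log-odds matrix from already aligned sites with a pseudocount and a uniform background, then reports windows that score at or above a threshold. This is a textbook construction, not ChIPMunk and not the GTRD pipeline code; the example sites, sequence and threshold are made up.

```python
import math

BASES = "ACGT"

def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equally long, aligned sites."""
    pwm = []
    for pos in range(len(sites[0])):
        counts = {base: pseudocount for base in BASES}
        for site in sites:
            counts[site[pos]] += 1
        total = sum(counts.values())
        pwm.append({base: math.log2((counts[base] / total) / background)
                    for base in BASES})
    return pwm

def score(pwm, window):
    """Sum of log-odds weights for one window of the same width as the PWM."""
    return sum(column[base] for column, base in zip(pwm, window))

def find_sites(pwm, sequence, threshold):
    """Return (position, score) for every forward-strand window at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        s = score(pwm, sequence[i:i + width])
        if s >= threshold:
            hits.append((i, s))
    return hits

if __name__ == "__main__":
    aligned_sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTCA"]  # made-up sites
    pwm = build_pwm(aligned_sites)
    print(find_sites(pwm, "AAATGACTCAGGG", threshold=5.0))
```

In practice the threshold would be chosen from a ROC curve, as the pipeline description suggests; here it is simply an assumed value.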

Prediction of gene expression level from ChIP-seq data
ChIP-seq peaks (MACS) for histones and transcription factor binding sites were extracted from the GTRD database for 2 cell lines: GM12878 and K562.
Machine learning: Random Forest algorithm. R = 0.72 – 0.77

Ribo-seq experiments Olga Gluschenko, Ivan Yevshin

Workflow for ribo-seq data analyses

Model of the mTOR pathway, based on the model of Dimelow R.J. and Wilkinson S.J. Control of translation initiation: a model-based analysis from limited experimental data. J. R. Soc. Interface (2009) 6, 51–61. doi:10.1098/rsif.2008.0221

Lequieu J, Chakrabarti A, Nayak S, Varner JD (2011) Computational Modeling and Analysis of Insulin Induced Eukaryotic Translation Initiation. PLoS Comput Biol 7(11): e1002263. doi:10.1371/journal.pcbi.1002263

Wanted experiment
For the same cell line and conditions:
• CAGE -> transcription start sites
• RNA-seq
  - polyA +/-; nucleus, cytoplasm, whole cell
• ribo-seq
  - harringtonine -> translation start site
  - cycloheximide -> translation efficiency
• protein MS
  - pulse labelled with heavy amino acids (SILAC, left) -> protein abundance and turnover


Acknowledgements
Ivan Yevshin, Olga Gluschenko, Eseniya Basmanova, Ruslan Sharipov
