Introduction - PowerPoint PPT Presentation

Presentation Transcript

  1. Investigating mRNAs of intrinsically disordered proteins
     Harini Gopalakrishnan
     Advisor: Dr. Predrag Radivojac
     Sections: Previous Work, Backup, Method, Reference, Background, Introduction, Acknowledgments

  2. Basic Facts - mRNA
     1. mRNA: messenger ribonucleic acid
     2. A nucleic acid polymer of nucleotide monomers carrying the bases adenine, guanine, cytosine and uracil
     3. Three important types of RNA
        • rRNA (ribosomal RNA)
        • tRNA (transfer RNA)
        • mRNA (messenger RNA)

  3. Basic Facts - mRNA (contd)
     • Encodes information from DNA and carries it to the site of protein synthesis
     Image: http://en.wikipedia.org/wiki/Image:Mature_mRNA.png

  4. Basic Facts - mRNA (contd)
     • What is the significance of mRNA folding?
        • Secondary structures have been used to explain
           • translational controls
           • regulatory function in the cell, especially for the non-coding mRNA
     • What are the different folding algorithms?
        • Energy minimization
        • Base pair maximization
        • Covariation
        • E.g. Mfold, Vienna Package
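
Of the folding approaches listed on this slide, base pair maximization is the simplest to illustrate. The sketch below is a Nussinov-style dynamic program that counts the maximum number of nested Watson-Crick pairs in an RNA string. It is not what Mfold or the Vienna Package do (those minimize free energy), and the function name `max_base_pairs` and the `min_loop` default of 3 unpaired bases per hairpin are assumptions made for illustration.

```python
def max_base_pairs(seq, min_loop=3):
    """Nussinov-style DP: maximum number of nested base pairs in `seq`.

    Only Watson-Crick pairs (A-U, G-C) are allowed here; wobble pairs are
    deliberately left out to keep the sketch small (an assumption).
    """
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving >= min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(max_base_pairs("GGGAAACCC"))
```

Counting pairs ignores stacking and loop energies, which is why energy-minimization tools generally give more realistic structures than this kind of count.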

  5. Basic Facts - Disordered Protein
     • What is a disordered protein?
        • lacks a well-defined three-dimensional structure
        • conserved between species in composition and sequence
        • low sequence complexity
        • amino acid composition biased away from bulky hydrophobic residues
     • What is the significance of disordered proteins?
        • roles in the regulation of transcription and translation, cellular signal transduction, protein phosphorylation, the storage of small molecules, and the regulation of the self-assembly of large multiprotein complexes such as the bacterial flagellum and the ribosome

  6. Basic Facts - Disordered Protein
     • What is its role in diseases? Famous (or infamous?) disordered proteins in diseases:
        • alpha-synuclein
        • p53
        • proteins in HPVs linked to ovarian cancer
     • Which predictors are used? (all take amino acid sequences as input)
        • VL2, VSL2, PONDR, VLXT
     Image courtesy: http://www.disprot.org

  7. Snapshot from Previous Studies
     • Third codon base and stability
     • Speed of translation and protein secondary structures
        • alpha helices and beta sheets
     • The three bases in the codon
        • 1st base: biosynthetic pathway
        • 2nd base: residue hydrophobicity
        • 3rd base: helix- or beta-strand-forming potential of the amino acid

  8. In a Nutshell
     • Check whether nucleotide composition is biased between ordered and disordered proteins.
     • Check whether the stability of the RNA fold helps differentiate proteins in the two categories.
     • This work differs because no previous study has linked protein disorder with mRNA composition and stability.
     • Establishing the correlation would also open new avenues for studying how protein structure can be inferred directly from its precursor, the mRNA.

  9. Hypothesis
     • Codon usage should differ between the mRNA sequences of ordered and disordered proteins.
     • Folding energy (stability) should differ between the mRNAs of ordered and disordered proteins.
     • There is a correlation between the age of codons and disordered proteins.
     [Figure: Central dogma]

  10. Method
      • Data Collection
      • Implementation
      • Analysis
      • Future Work

  11. Data Collection
      • One of the most important phases, as the significance of the whole analysis rests on the quality of the data set selected for both categories of proteins.
      • After experimentation with various other databases, proteins were finally taken from unigene90, DisProt and PDB.
      • Disorder was predicted using VSL2B.
      [Diagram labels: Dataset; Predicted Dataset (from disorder predictors); True Dataset (experimentally verified)]

  12. Data Collection
      • Once we have the proteins of interest, we use UniProt to mine the web for each protein and its corresponding mRNA, based on their UniGene IDs.
      • Problem!
         • Introns
         • Poly-A tails, which need to be removed
      • We need a clean data set in order to study codon usage and nucleotide composition.
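
The poly-A cleanup step mentioned on this slide can be sketched as follows. This is only an illustration: the function name `trim_poly_a` and the `min_run` threshold of 10 are hypothetical choices, not values from the presentation, and intron removal is not attempted here (the deck handles that through protein-versus-nucleotide alignment).

```python
import re


def trim_poly_a(mrna, min_run=10):
    """Strip a trailing run of at least `min_run` adenines from a sequence.

    `min_run` is a hypothetical threshold chosen for this sketch; shorter
    trailing runs are kept because they may belong to the transcript itself.
    """
    pattern = r"A{%d,}$" % min_run
    return re.sub(pattern, "", mrna.upper())


if __name__ == "__main__":
    print(trim_poly_a("AUGGCCUAA" + "A" * 20))
```

A fixed-length threshold is crude. In practice the tail boundary is usually confirmed against the aligned coding region.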

  13. Solution - Alignment
      • BLAST
         • Proved efficient when aligning the ordered proteins
         • Extremely inefficient when aligning protein vs. mRNA for the disordered set of proteins
         • Disordered proteins have more low-complexity regions
      • WISE
         • Software from EMBL for aligning protein vs. nucleotide data
         • Uses Markov chain methods to make gene predictions and hence identifies introns
         • Extremely efficient, and provided high-quality datasets

  14. Data Collection - Final Input Statistics

  15. Method - Overview
      • Analyzed mainly two characteristics of mRNA
         • Nucleotide composition of mRNA
            • Codon usage
            • Nucleotide composition
         • RNA folding energy and base pair analysis using Mfold
            • number of base pairs formed
            • total minimum free energy per RNA fold between
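
The two composition measures named on this slide, nucleotide composition and codon usage, can be computed along these lines. This is a minimal sketch and not the author's Perl/MATLAB pipeline. The function names are hypothetical, and the input is assumed to be a clean, in-frame coding sequence.

```python
from collections import Counter


def nucleotide_composition(seq):
    """Fraction of A, C, G and U in a sequence (T is treated as U)."""
    seq = seq.upper().replace("T", "U")
    counts = Counter(seq)
    total = sum(counts[b] for b in "ACGU")
    if total == 0:
        return {b: 0.0 for b in "ACGU"}
    return {b: counts[b] / total for b in "ACGU"}


def codon_usage(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(codons)
    return {codon: k / len(codons) for codon, k in counts.items()}


if __name__ == "__main__":
    example = "AUGGCUGCUUAA"
    print(nucleotide_composition(example))
    print(codon_usage(example))
```

Comparing these frequencies between the ordered and disordered sets is what would reveal a composition or codon bias, if one exists.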

  16. Methods - Mfold Snapshot

  17. Mfold - Overview
      • What is Mfold?
         • An mRNA secondary structure prediction algorithm by M. Zuker and N. Markham
      • How does it work?
         • It is based on the nearest-neighbor thermodynamic rules, in which free energies are assigned to loops rather than to base pairs.
         • It tries to predict the optimal structure by minimizing the overall free energy of the structure formed by coaxial stacking of helices.
      • What does it output?
         • Several output files are produced for every optimal and suboptimal fold within the allowable energy range.
         • The energy dot plot (on the right) is one important component of the predictor's output.

  18. Method - Tools Employed
      • Parsing and web mining were done in Perl
      • Analysis and graphs were done in MATLAB
      • Reporting and graphs were done in Excel
      • Disorder prediction from mRNA inputs was done in MATLAB using an SVM

  19. Results

  20. Nucleotide Composition
      [Figure: nucleotide composition for the True Dataset and the Predicted Dataset]

  21. Analysis Based on the Composition of mRNA - Analysis of Codon Age
      [Table: new and old codons per amino acid]
      • For 14 out of 18 amino acids, the disorder-promoting codon is the older one.
      • 2 amino acids (M and W) are neutral, as they have only one codon each.

  22. Base Composition
      • Preferential selection of codons with “g” or “c” for the third base
      [Figure: base composition, Predicted Dataset; statistical verification]
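
The third-base preference reported on this slide is commonly summarized as the fraction of codons ending in G or C. A minimal sketch of that measurement follows. The function name `gc3_fraction` is hypothetical, and the input is assumed to be an in-frame coding sequence with introns and poly-A tail already removed.

```python
def gc3_fraction(cds):
    """Fraction of complete codons whose third base is G or C."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore an incomplete trailing codon
    third_bases = cds[2:usable:3]     # every third base, starting at index 2
    if not third_bases:
        return 0.0
    return sum(base in "GC" for base in third_bases) / len(third_bases)


if __name__ == "__main__":
    print(gc3_fraction("AUGGCUGCC"))
```

Computing this value per transcript for both protein sets gives two distributions that can then be compared with a standard statistical test.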

  23. Energy of Folding and Base Pair - Energy of Folding
      [Figure: energy of folding, Predicted Dataset]

  24. Energy of Folding and Base Pair - Base Pair Analysis
      [Figure: base pair analysis]

  25. Energy of Folding and Base Pair - Sequence Entropy Plot
      [Figure: sequence entropy plot]

  26. Predictions
      Aim: to predict disorder from mRNA based on all of the above information, using Support Vector Machines (SVMs)
      • Based on codon composition
      • Age of codons
      • Base composition
      Accuracies have been good and promising.
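
As an illustration of the first feature type on this slide, the sketch below turns a coding sequence into a fixed-length, 64-dimensional codon-frequency vector of the kind a classifier such as an SVM could take as input. The classifier itself is not shown, and the feature layout (codons in alphabetical A, C, G, U order) and the names `CODONS` and `codon_feature_vector` are assumptions, not the presentation's actual MATLAB setup.

```python
from itertools import product

# All 64 codons in a fixed order, so every sequence maps to the same layout.
CODONS = ["".join(p) for p in product("ACGU", repeat=3)]


def codon_feature_vector(cds):
    """Return 64 codon frequencies for an in-frame coding sequence.

    Codons containing characters other than A, C, G, U (for example N)
    are skipped; T is treated as U.
    """
    cds = cds.upper().replace("T", "U")
    usable = len(cds) - len(cds) % 3
    counts = dict.fromkeys(CODONS, 0)
    n = 0
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in counts:
            counts[codon] += 1
            n += 1
    return [counts[c] / n if n else 0.0 for c in CODONS]


if __name__ == "__main__":
    vec = codon_feature_vector("AUGGCUGCU")
    print(len(vec), vec[CODONS.index("GCU")])
```

Codon age and base composition features could be appended to the same vector before training.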

  27. Acknowledgments
      • Dr. Predrag Radivojac
      • Dr. Haixu Tang
      • Dr. Vladimir Uversky
      • Amrita Mohan
      • Linda Hostetter
      • Informatics faculty and staff
      • My various course professors
      • Friends and fellow students
