Improved RNA Secondary Structure Prediction Using Stochastic Context Free Grammars
Esposito, D., Heitsch, C. E., Poznanovik, S. and Swenson, M. S.
Georgia Institute of Technology
Conclusions
Several characteristics of RNA sequences and structures appear to indicate how accurately current methods will predict a secondary structure.
If a sequence can be shown to be difficult for minimum free energy (MFE) prediction, then our stochastic grammar algorithm will probably predict its secondary structure more accurately.
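Whether a sequence is "difficult" is a question of prediction accuracy, which the figures report as an F-measure. The poster's exact formula is not reproduced in the text, so the following is a minimal sketch assuming the common definition: the harmonic mean of sensitivity and positive predictive value over base pairs. The function names and the dot-bracket input format are illustrative assumptions, not the authors' code.

```python
def base_pairs(dot_bracket):
    """Return the set of (i, j) index pairs encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def f_measure(native, predicted):
    """Harmonic mean of sensitivity and positive predictive value (assumed definition)."""
    native_pairs = base_pairs(native)
    predicted_pairs = base_pairs(predicted)
    true_positives = len(native_pairs & predicted_pairs)
    if true_positives == 0:
        return 0.0
    sensitivity = true_positives / len(native_pairs)  # fraction of native pairs recovered
    ppv = true_positives / len(predicted_pairs)       # fraction of predicted pairs that are native
    return 2 * sensitivity * ppv / (sensitivity + ppv)
```

For example, `f_measure("((..))", "(....)")` compares a native structure with two base pairs against a prediction that recovers only the outer pair.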
Abstract
Accurate RNA secondary structure prediction is an important problem in computational biology. Different RNA nucleotide sequences often fold into similar structures, and the accuracy of current prediction algorithms ranges widely even for RNA strands with similar structures. To understand the origins of these inaccuracies, we trained a stochastic context free grammar on two training sets: a hard-to-predict set of sequences with low prediction accuracy and an easy-to-predict set of sequences with high prediction accuracy.
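The text does not describe the training procedure, so the following is only a sketch of one standard approach: maximum-likelihood estimation of rule probabilities by counting rule uses in the parses of known structures. It assumes the grammar is the Knudsen-Hein (Pfold) grammar, S → LS | L, F → dFd | LS, L → s | dFd, and that structures are given in dot-bracket notation. The names `count_rules` and `rule_probabilities` are hypothetical, and emission probabilities for single bases and base pairs are omitted.

```python
from collections import Counter

# Assumed grammar (Knudsen-Hein / Pfold): two rules per nonterminal.
RULES = {
    "S": ("S->LS", "S->L"),
    "F": ("F->dFd", "F->LS"),
    "L": ("L->s", "L->dFd"),
}


def count_rules(structure):
    """Count the grammar rules used in the unique parse of a dot-bracket structure."""
    partner, stack = {}, []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    counts = Counter()

    def elements(i, j):
        """Split the half-open interval [i, j) into unpaired bases and helices."""
        items, k = [], i
        while k < j:
            end = k if structure[k] == "." else partner[k]
            items.append((k, end))
            k = end + 1
        return items

    def parse_s(i, j):
        items = elements(i, j)
        if not items:
            raise ValueError("the grammar cannot derive an empty region")
        for n, (a, b) in enumerate(items):
            counts["S->LS" if n < len(items) - 1 else "S->L"] += 1
            parse_l(a, b)

    def parse_l(a, b):
        if a == b:
            counts["L->s"] += 1       # single unpaired base
        else:
            counts["L->dFd"] += 1     # base pair (a, b) opening a helix
            parse_f(a + 1, b)

    def parse_f(i, j):
        if j - i >= 2 and structure[i] == "(" and partner[i] == j - 1:
            counts["F->dFd"] += 1     # stacked pair directly inside the previous one
            parse_f(i + 1, j - 1)
        else:
            items = elements(i, j)
            if len(items) < 2:
                raise ValueError("loop too short for F->LS (needs two elements)")
            counts["F->LS"] += 1
            parse_l(*items[0])
            parse_s(items[0][1] + 1, j)

    parse_s(0, len(structure))
    return counts


def rule_probabilities(structures):
    """Normalize pooled rule counts per left-hand-side nonterminal."""
    totals = Counter()
    for structure in structures:
        totals += count_rules(structure)
    probs = {}
    for rules in RULES.values():
        denom = sum(totals[r] for r in rules)
        for r in rules:
            probs[r] = totals[r] / denom if denom else 0.0
    return probs
```

Calling `rule_probabilities` separately on the hard-to-predict and easy-to-predict structure sets would give two parameter sets that can be compared rule by rule. Because the parse recurses once per nested base pair, very deeply nested structures may need a higher Python recursion limit.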
We found interesting statistical differences between the two training sets, both in the nucleotide composition of the sequences and in the distribution of base pairs. Stochastic context free grammars provide a means to quantify subtle differences in the composition of native secondary structures. The discovery of these differences could lead to improvements in current prediction algorithms. We are currently performing a parametric analysis of several prediction methods.
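The kind of composition statistics described here can be tabulated directly from a sequence and its native structure. The sketch below is illustrative only: it assumes dot-bracket input, and it treats A-U, G-C, and G-U wobble pairs as canonical, which is an assumption since the text does not define the term. The function name `composition` is hypothetical.

```python
from collections import Counter

# Assumption: Watson-Crick pairs plus G-U wobble count as canonical.
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}


def composition(sequence, structure):
    """Nucleotide and base-pair frequencies for one sequence/structure pair."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    seq = sequence.upper().replace("T", "U")
    stack, pair_counts = [], Counter()
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            j = stack.pop()
            pair_counts[seq[j] + seq[i]] += 1  # 5' base first, then 3' base
    if stack:
        raise ValueError("unbalanced structure")
    n_pairs = sum(pair_counts.values())
    canonical = sum(c for p, c in pair_counts.items() if p in CANONICAL)
    return {
        "nucleotide_freq": {b: c / len(seq) for b, c in Counter(seq).items()},
        "pair_freq": {p: c / n_pairs for p, c in pair_counts.items()},
        "canonical_fraction": canonical / n_pairs if n_pairs else 0.0,
    }
```

Averaging these dictionaries over each training set would give per-set frequencies that can be compared between the hard-to-predict and easy-to-predict sequences.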
Figure 1. The Pfold grammar.
Figure 2. Pfold grammar parse tree.
Figure 3. F-measure.
Figure 4. 5S and 16S F-measure distribution.
Figure 5. F-measure accuracy: MFE vs. stochastic grammars.
Figure 6. Canonical vs. non-canonical base pair probabilities for 5S and 16S.
Figure 7. p(t→c) vs. p(t→u) for 5S and 16S.