A Bioinformatics Approach to Improving the Prediction of Programmed Ribosomal Frameshifting

Yen Lin Huang • Department of Computer Science, National Tsing-Hua University • 08/27/2007 • InCOB 2007

Outline • Introduction • Our algorithm • Experimental results and discussion • Conclusion

Introduction • Programmed ribosomal frameshifting (PRF) is a recoding event in which a translating ribosome switches from its initial (zero) reading frame to the -1 or +1 reading frame at a specific position and then continues translation. • Figure: -1 PRF of SARS-CoV.

Introduction • Consequently, PRF leads to the expression of an alternative protein that differs from the one produced by standard translation.

Example: -1 PRF of SARS-CoV • Genomic organization of SARS-CoV. • If no frameshifting occurs, polyprotein (pp) 1a is translated from ORF1a; if frameshifting occurs, pp 1a/1b is translated from ORF1a and ORF1b.

Significance of PRF • Many viruses, as well as bacteria, have been found to use the PRF mechanism to increase the diversity of gene expression. • PRF can also be found in a few eukaryotes. • It has been reported that, for viruses, even small changes in frameshifting efficiency can inhibit viral propagation. • This implies that frameshifting sites in viruses may be a potential target for antiviral therapeutics.

Signals of -1 PRF • Two mRNA signals are critical for -1 PRF. • Slippery sequence: the place where the -1 PRF event occurs. • 3’-stimulatory RNA structure: it forces the ribosome to pause over the slippery site, so that the ribosome has a chance to switch from the zero reading frame to the -1 reading frame. • Figure labels: stimulatory RNA structure, slippery sequence.

Model of -1 PRF

Slippery sequence • Usually, the slippery sequence is a hepta-nucleotide (7-mer) of the general form X XXY YYZ. • The spaces in “X XXY YYZ” separate codons in the zero frame. • X and Z are any nucleotide and Y is mostly A or U.
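The X XXY YYZ form above can be sketched as a regular expression. This is only an illustration, not the PRooF implementation: the function name is hypothetical, Y is restricted to A or U (the slide says "mostly"), and no check is made that the match is aligned to the zero frame.

```python
import re

# X XXY YYZ: three identical X, three identical Y, then any Z.
# Assumption: Y is limited to A or U; the slide only says "mostly A or U".
# The lookahead lets overlapping heptamers be reported as well.
SLIPPERY = re.compile(r"(?=(([ACGU])\2\2([AU])\3\3[ACGU]))")

def find_slippery_sites(rna):
    """Return (start, heptamer) for every X XXY YYZ match in an RNA or DNA string."""
    rna = rna.upper().replace("T", "U")
    return [(m.start(), m.group(1)) for m in SLIPPERY.finditer(rna)]

print(find_slippery_sites("GGUUUAAACGG"))
```

A real scan would additionally require the first X to sit at the last position of a zero-frame codon, which is what the spaces in X XXY YYZ express.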

Stimulatory RNA structures • In most cases, the RNA structure is an H-type pseudoknot (or bulged helix), but in some cases it is a simple stem-loop. • Figure labels: H-type pseudoknot, bulged helix, simple stem-loop.

Other factors of -1 PRF • The spacer between the slippery sequence and the RNA structure is also important for -1 PRF. • The length of the spacer alters the location of the paused ribosome and hence influences the probability of shifting. • For some bacteria (such as E. coli), an internal SD-like (Shine-Dalgarno-like) sequence can often be found upstream of the -1 PRF site.

Signals of +1 PRF • +1 PRF events have been observed less often than -1 PRF events. • As a result, no general model has been widely accepted to describe +1 PRF. • The best-known cellular genes with +1 PRF are the prfB and oaz genes. • prfB encodes polypeptide chain release factor 2 (RF2) in E. coli. • oaz (ornithine decarboxylase antizyme) encodes antizyme 1 in mammals.

Signals of +1 PRF • There is no general form of the slippery sequence for +1 PRF. • Example: the slippery sequences in prfB genes are CUU URA C and those in oaz genes are UUU UGA or YCC UGA, where R is A or G, and Y is C or U. • Not all +1 PRF sites have a downstream RNA structure that acts as the stimulator. • Example: the +1 PRF site in the bacterial prfB genes.
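The three +1 patterns quoted above use IUPAC ambiguity codes (R, Y), so they can be expanded into regular expressions mechanically. The sketch below is illustrative only; the helper and table names are hypothetical and the codon frame is not checked.

```python
import re

# IUPAC codes needed for the patterns on the slide (R = A or G, Y = C or U).
IUPAC = {"A": "A", "C": "C", "G": "G", "U": "U", "R": "[AG]", "Y": "[CU]"}

# Patterns as given on the slide; spaces mark zero-frame codon boundaries.
PLUS1_PATTERNS = [("prfB", "CUU URA C"), ("oaz", "UUU UGA"), ("oaz", "YCC UGA")]

def find_plus1_sites(rna):
    """Return sorted (start, gene_label, matched_sequence) tuples."""
    rna = rna.upper().replace("T", "U")
    hits = []
    for label, pattern in PLUS1_PATTERNS:
        core = "".join(IUPAC[c] for c in pattern.replace(" ", ""))
        # Lookahead so that overlapping matches are not skipped.
        for m in re.finditer("(?=(" + core + "))", rna):
            hits.append((m.start(), label, m.group(1)))
    return sorted(hits)

print(find_plus1_sites("AACUUUGACGG"))
```

Because the patterns live in a plain list, a user-defined pattern can be added by appending another (label, pattern) pair.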

Proteins produced by PRF • Protein products are longer than those of standard translation. • This kind of PRF event is observed frequently. • It occurs near the end of the zero reading frame. • The ribosome switches to the new reading frame and extends translation beyond the terminator of the zero reading frame. • Figure labels: 0 reading frame, -1/+1 PRF, -1/+1 reading frame.

Proteins produced by PRF (cont.) • Protein products are shorter than those of standard translation. • Such PRF events have been observed less often so far. • They take place within the zero reading frame. • The ribosome slips and then terminates quickly, because it reaches a stop codon in the new reading frame near the slippery site.

Previous results • Based on the model described above, several computational approaches have been proposed for the prediction of PRFs. • Pattern recognition (Hammell et al., 1999; Moon et al., 2004) • Statistical analysis (Shah et al., 2002) • Machine learning (Bekaert et al., 2003) • Hidden Markov models (Bekaert et al., 2005; 2006) • Hammell, A. B. et al. (2005) is not cited here; references: Hammell, A. B. et al. (1999) Genome Res., 9, 417–427. • Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884–4892. • Shah, A. A. et al. (2002) Bioinformatics, 18, 1046–1053. • Bekaert, M. et al. (2003) Bioinformatics, 19, 327–335. • Bekaert, M. et al. (2005) Mol. Cell, 17, 61–68. • Bekaert, M. et al. (2006) Bioinformatics, 22, 2463–2465.

Our approach and web server • In this study, we improved the pattern recognition method of detecting PRF sites in a genomic sequence by using structural and functional bioinformatics. • In addition, we have implemented this algorithm as a web server, called PRooF (Programmed Ribosomal Frameshifting), that is open to the public for online analysis. • http://bioalgorithm.life.nctu.edu.tw/PROOF

Flowchart of our algorithm

Step 1: Identification of ORFs • All ORFs longer than a threshold (default: 100 nt) are identified in the input sequence.
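A minimal sketch of Step 1 follows. The slide does not say how an ORF is delimited, so this example assumes an ORF runs from an ATG to the first in-frame stop codon on the forward strand; the function name and the (start, end, frame) tuple layout are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_len=100):
    """Return (start, end, frame) for ATG..stop ORFs of at least min_len nt.

    end is exclusive and includes the stop codon; frame is start % 3.
    Assumption: forward strand only, ATG as the only start codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a new ORF
            elif start is not None and codon in STOPS:
                if i + 3 - start >= min_len:   # keep only ORFs above the threshold
                    orfs.append((start, i + 3, frame))
                start = None                   # close the ORF either way
    return orfs

example = "ATG" + "GCA" * 40 + "TAA"   # a made-up 126-nt ORF
print(find_orfs(example))
```

The default of 100 nt mirrors the default threshold mentioned on the slide; passing a different min_len corresponds to changing that parameter.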

Step 2: Detection of slippery sites • For PRFs with longer products: • Find all pairs of partially overlapping ORFs. • Use pattern recognition to detect all possible slippery sites in the overlapping regions. • The slippery sequences must conform to the default patterns or to user-defined patterns.
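Pairing partially overlapping ORFs could look like the sketch below. It assumes ORFs are given as (start, end, frame) tuples on the forward strand with an exclusive end and frame equal to start modulo 3; this representation and the function name are illustrative assumptions, not taken from PRooF.

```python
def overlapping_pairs(orfs):
    """Return (upstream, downstream, shift, overlap) for partially overlapping ORFs.

    orfs: iterable of (start, end, frame), end exclusive, frame = start % 3.
    shift is -1 or +1; overlap is the (start, end) region to scan for slippery sites.
    """
    orfs = list(orfs)
    pairs = []
    for a in orfs:
        for b in orfs:
            a_start, a_end, a_frame = a
            b_start, b_end, b_frame = b
            # a starts first and ends inside b: partial overlap, not containment.
            if a_start < b_start < a_end < b_end and a_frame != b_frame:
                # Moving back one nucleotide lowers the frame index by one (mod 3).
                shift = -1 if (a_frame - b_frame) % 3 == 1 else +1
                pairs.append((a, b, shift, (b_start, a_end)))
    return pairs

print(overlapping_pairs([(0, 300, 0), (200, 500, 2), (600, 900, 0)]))
```

Only the returned overlap interval then needs to be searched for slippery sequences, which keeps the candidate list small.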

Step 2’: Detection of slippery sites • For PRFs with shorter products: • We simply search each identified ORF for possible slippery sites that contain the required slippery sequences.

Detection of slippery sites (cont.) • If the input is a bacterial sequence, we further look for an internal SD-like sequence upstream of each slippery site.

Step 3: Verifying protein function • For all candidate ORFs, the translated protein sequences are checked with InterProScan to see whether they contain protein motifs/domains already registered in the InterPro database. • InterPro is an integrated database of protein families, domains and functional sites. • InterProScan is a tool from InterPro that combines various protein signature recognition methods to detect motifs/domains.

Step 3: Verifying protein function • For longer products: • Each of the two overlapping ORFs is translated into a protein sequence, which is then examined by InterProScan. • For shorter products: • The full-length ORF is cut into two fragments at the slippery site. • The two fragments are then translated into protein sequences and examined by InterProScan.
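For the shorter-product case, the cut-and-translate step might be sketched as follows. The slide does not specify where exactly the cut falls or which frame each fragment is read in, so this example assumes a codon-aligned cut position and reads both fragments in the zero frame; the function names are hypothetical, and the InterProScan call itself is not shown.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def translate(dna):
    """Translate a DNA/RNA string codon by codon; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def split_and_translate(orf, site):
    """Cut an ORF at a codon-aligned position and translate both fragments.

    Assumption: `site` is an offset within the ORF and a multiple of 3.
    """
    if site % 3 != 0:
        raise ValueError("site must be aligned to the zero frame")
    return translate(orf[:site]), translate(orf[site:])

print(split_and_translate("ATGGCTTTTAAACGGTGA", 6))
```

Each returned protein string would then be submitted to the signature search separately.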

Step 4: Predicting RNA structure • We use a heuristic approach that we developed previously (Huang et al., 2005) to detect an H-type pseudoknot in the sequence fragment downstream of the slippery site of each PRF candidate. • Huang, C.-H. et al. (2005) A heuristic approach for detecting RNA H-type pseudoknots, Bioinformatics, 21, 3501–3508.

Step 4: Predicting RNA structure • If no stable H-type pseudoknot is found, we use RNAMotif to search for all possible bulged helixes and choose the most stable one. • RNAMotif is an RNA structural motif search tool that finds fragments capable of forming a given structure.

Step 4: Predicting RNA structure • If neither a stable H-type pseudoknot nor a bulged helix is found, RNAMotif is used to search for simple stem-loops.
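The fallback order of Step 4 (H-type pseudoknot, then bulged helix, then simple stem-loop) can be expressed as a small cascade. The sketch below does not call the pseudoknot heuristic or RNAMotif; the predictors are hypothetical callables supplied by the caller, each returning a structure or None.

```python
def predict_stimulatory_structure(fragment, predictors):
    """Try each (label, predict) pair in order and return the first hit.

    predictors: list of (label, callable); each callable takes the downstream
    fragment and returns a structure (any object) or None if nothing stable is found.
    Returns (label, structure), or (None, None) if every predictor fails.
    """
    for label, predict in predictors:
        structure = predict(fragment)
        if structure is not None:
            return label, structure
    return None, None

# Stand-in predictors for illustration only; real ones would wrap external tools.
cascade = [
    ("H-type pseudoknot", lambda seq: None),
    ("bulged helix", lambda seq: "((..))"),
    ("simple stem-loop", lambda seq: "(...)"),
]
print(predict_stimulatory_structure("ACGUACGU", cascade))
```

Keeping the order in a list makes the preference for pseudoknots and bulged helixes over stem-loops explicit and easy to change.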

Our web server: PRooF • Based on the algorithm described above, we have implemented a web server called PRooF for online analysis. • http://bioalgorithm.life.nctu.edu.tw/PROOF • PRooF was tested on a number of sequences from different species, each with one or two known PRF sites. • The experimental results were compared with those obtained by FSFinder2, which was developed by Moon et al. based on pattern recognition. • Moon, S. et al. (2004) Nucleic Acids Res., 32, 4884–4892. • Byun, Y. et al. (2006) LNCS, 3991, 284–291. • Song, J.J. et al. (2007) Comput. Biol. Chem., 31, 298–302.

Testing data sets • The sequences tested in our experiments were taken from PseudoBase and RECODE. • PseudoBase collects RNA pseudoknots, some of which are thought to function as PRF stimulators. • http://biology.leidenuniv.nl/~batenburg/PKB.html • RECODE contains translational recoding events of PRFs in various biological species. • http://recode.genetics.utah.edu/ • All tests of PRooF and FSFinder2 were run with default parameters, unless otherwise specified.

Testing data sets of -1 PRF

Testing data sets of +1 PRF

Sensitivity and specificity • Sensitivity (Sen) = 100 × TP / (TP + FN) • TP = number of correctly predicted PRF sites • FN = number of known PRF sites that were not predicted • Specificity (Spe) = 100 × TN / (TN + FP) • TN = number of predicted non-PRF sites that possess a required slippery sequence but are not annotated as PRF sites in the database • FP = number of incorrectly predicted PRF sites
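The two measures above translate directly into code. The counts in the usage lines are made-up numbers chosen only to show the arithmetic; they are not results from the talk.

```python
def sensitivity(tp, fn):
    """Sen = 100 * TP / (TP + FN), as a percentage."""
    if tp + fn == 0:
        raise ValueError("no known PRF sites (TP + FN is zero)")
    return 100.0 * tp / (tp + fn)

def specificity(tn, fp):
    """Spe = 100 * TN / (TN + FP), as a percentage."""
    if tn + fp == 0:
        raise ValueError("no negative candidates (TN + FP is zero)")
    return 100.0 * tn / (tn + fp)

# Hypothetical counts, for illustration only.
print(sensitivity(tp=9, fn=1))   # 90.0
print(specificity(tn=3, fp=1))   # 75.0
```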

Average sensitivity and specificity • Indeed, our PRooF greatly improves detection sensitivity when compared with FSFinder2. • For the details, please refer to our paper.

Reduction of false positives • To reduce false positives, FSFinder2 considers only the two pairs of partially overlapping ORFs whose zero reading frames are the two longest. • Moon et al. (2004) reported that these two pairs had the highest probability of containing -1 and +1 PRF sites. • However, there currently seems to be no biological evidence to support this observation.

Reduction of false positives • In contrast, we used InterProScan to screen out the partially overlapping ORFs whose protein sequences contain no functional motifs/domains. • As shown in our experiments, this functional bioinformatics approach is very useful for reducing the number of false positives.

Predicted RNA structures • Most of the RNA structures predicted by PRooF are H-type pseudoknots and bulged helixes, but many RNA structures identified by FSFinder2 are just simple stem-loops. • Both H-type pseudoknots and bulged helixes are believed to be more effective at promoting the efficiency of -1 PRFs and some +1 PRFs. • The reason is that they share a similar bent conformation and are structurally more stable than simple stem-loops.

Predicted RNA structures • The stimulatory structure for the -1 PRF of HIV-1 was first thought to be a simple stem-loop, but it was later shown experimentally to be a bulged helix. • Gaudin, C. et al. (2005) J. Mol. Biol., 349, 1024–1035. • The RNA structure predicted by PRooF for the -1 PRF of HIV-1 is indeed a bulged helix, whereas the one predicted by FSFinder2 is just a simple stem-loop. • It would be worthwhile to determine experimentally the RNA structures in other similar cases, where the structures predicted by PRooF are H-type pseudoknots or bulged helixes but those predicted by FSFinder2 or reported in the literature are just simple stem-loops.

Outline • Introduction • Our algorithm • Experimental results and Discussion • Conclusion

Conclusion • We improved the pattern recognition approach to automatically detecting PRF sites in a given genomic sequence by using both structural bioinformatics and functional bioinformatics. • Based on this approach, we have developed a web server, PRooF, that is open to the public for online analysis. • http://bioalgorithm.life.nctu.edu.tw/PROOF

Conclusion (cont.) • In our experiments, the test results showed that PRooF greatly improves sensitivity when compared with FSFinder2. • Most of the RNA structures predicted by PRooF are H-type pseudoknots and bulged helixes, whereas many of those predicted by FSFinder2 are simple stem-loops. • PRooF was implemented flexibly: the user can modify all the default parameters so that some exceptional PRF sites can still be detected.

Acknowledgement • Prof. Chin Lung Lu • Institute of Bioinformatics & Department of Biological Science and Technology, National Chiao Tung University • Mr. Chia-Jung Wu • Institute of Bioinformatics, National Chiao Tung University • Prof. Hien-Tai Chiu • Department of Biological Science and Technology, National Chiao Tung University • Prof. Chuan Yi Tang • Department of Computer Science, National Tsing-Hua University

Thank you for your attention
