- 114 Views
- Uploaded on

Download Presentation
## RNA Secondary Structure Prediction

**An Image/Link below is provided (as is) to download presentation**

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.

- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -

Presentation Transcript

Introduction to RNA Sequence/Structure Analysis

- RNAs have many structural and functional uses
- Translation
- Transcription
- RNA splicing
- RNA processing and editing
- cellular localization
- catalysis

RNA functions

•RNA functions as

- mRNA
- rRNA
- tRNA
- In nuclear export
- Part of spliceosome: (snRNA)
- Regulatory molecules (RNAi)
- Enzymes
- Viral genomes
- Retrotransposons
- Medicine

Biological Functions of Nucleic Acids

- tRNA (transfer RNA, adaptor in translation)
- rRNA (ribosomal RNA, component of ribosome)
- snRNA (small nuclear RNA, component of splicesome)
- snoRNA (small nucleolar RNA, takes part in processing of rRNA)
- RNase P (ribozyme, processes tRNA)
- SRP RNA (RNA component of signal recognition particle)
- ……..

RNA Sequence Analysis

- RNA sequence analysis different from DNA sequence analysis
- RNA structures fold and base pair to form secondary structures
- not necessarily the sequence but structure conservation is most important with RNA

Secondary Structures of Nucleic Acids

Pseudoknots:

- DNA is primarily in duplex form.
- RNA is normally single stranded which can have a diverse form of secondary structures other than duplex.

Source: Cornelis W. A. Pleij in Gesteland, R. F. and Atkins, J. F. (1993) THE RNA WORLD. Cold Spring Harbor Laboratory Press.

rRNA Secondary Structure Based on Phylogenetic Data

3D Structures of RNA:Catalytic RNA

- Some structural rules:
- Base pairing is stabilizing
- Unpaired sections (loops) destabilize
- 3D conformation with interactions makes up for this

Tertiary Structure

Of Self-splicing RNA

Secondary Structure

Of Self-splicing RNA

RNA secondary structure

- E. coli Rnase P RNA secondary structure

Image source: www.mbio.ncsu.edu/JWB/MB409/lecture/ lecture05/lecture05.htm

RNA Variations

- Variations in RNA sequence maintain base-pairing patterns for secondary structures
- when a nucleotide in one base changes, the base it pairs to must also change to maintain the same structure
- Such variation is referred to as covariation.

Covariance

- secondary structure prediction in RNA takes into account conserved patterns of base-pairing
- Positions of covariance are conserved matches, since they maintain the secondary structure
- computationally challenging

Features of RNA

- RNA: polymer composed of a combination of four nucleotides
- adenine (A)
- cytosine (C)
- guanine (G)
- uracil (U)

Features of RNA

- G-C and A-U form complementary hydrogen bonded base pairs (canonical Watson-Crick)
- G-C base pairs being more stable (3 hydrogen bonds) A-U base pairs less stable (2 bonds)
- non-canonical pairs can occur in RNA -- most common is G-U

Features of RNA

- RNA typically produced as a single stranded molecule (unlike DNA)
- Strand folds upon itself to form base pairs
- secondary structure of the RNA

Features of RNA

- intermediary between a linear molecule and a three-dimensional structure
- Secondary structure mainly composed of double-stranded RNA regions formed by folding the single-stranded RNA molecule back on itself

Stem Loops (Hairpins)

- Loops generally at least 4 bases long

Bulge Loops

- occur when bases on one side of the structure cannot form base pairs

Interior Loops

- occur when bases on both sides of the structure cannot form base pairs

Junctions (Multiloops)

- two or more double-stranded regions converge to form a closed structure

Tertiary Interactions

- tertiary interactions can be present as well
- located using covariance analysis

Kissing Hairpins

- unpaired bases of two separate hairpin loops base pair with one another

RNA structure prediction methods

- Dot Plot Analysis
- Base-Pair Maximization
- Free Energy Methods
- Covariance Models

How RNA Prediction Methods Were Developed

- Mount p. 334
- Since Tinoco et al. measured energy associated with regions of ss a few energy based algorithms were developed
- Nussinov and Jacobson (1980), Zuker and Stiegler (1981), Trifonov and Bolshoi (1983) ….

Main approaches to RNA secondary structure prediction

- Energy minimization
- dynamic programming approach
- does not require prior sequence alignment
- require estimation of energy terms contributing to secondary structure
- Comparative sequence analysis
- Using sequence alignment to find conserved residues and covariant base pairs.
- most trusted

Circular Representation

- base pairs of a secondary structure represented by a circle
- arc drawn for each base pairing in the structure
- If any arcs cross, a pseudoknot is present

Circular Representation

- Image source: http://www.finchcms.edu/cms/biochem/Walters/rna_folding.html

Base-Pair Maximization

- Find structure with the most base pairs
- Efficient dynamic programming approach to this problem introduced by Ruth Nussinov (Tel-Aviv, 1970s).
- Tutorial in the classroom: let us try to reconstruct Nussinov’s algorithm

Nussinov Algorithm

- Four ways to get the best structure between position i and j from the best structures of the smaller subsequences

1)Add i,j pair onto best structure found for subsequence i+1, j-1

2)add unpaired position i onto best structure for subsequence i+1, j

3)add unpaired position j onto best structure for subsequence i, j-1

4)combine two optimal structures i,k and k+1, j

Nussinov Algorithm

- compares a sequence against itself in a dynamic programming matrix
- Four rules for scoring the structure at a particular point
- Since structure folds upon itself, only necessary to calculate half the matrix

Nussinov Algorithm

- Initialization: score for matches along main diagonal and diagonal just below it are set to zero
- Formally, the scoring matrix, M, is initialized:
- M[i][i] = 0 for i = 1 to L (L is sequence length)
- M[i][i-1] = 0 for i = 2 to L

Nussinov Algorithm

- Using the sequence GGGAAAUCC, the matrix now looks like the following, such that sequences of length 1 will score 0:

Nussinov Algorithm

- Matrix Fill:
- M[i][j] = max of the following :
- M[i+1][j] (ith residue is hanging off by itself)
- M[i][j-1] (jth residue is hanging off by itself)
- M[i+1][j-1] + S(xi, xj) (ith and jth residue are paired; if xi = complement of xj, then S(xi, xj) = 1; otherwise it is 0.)
- M[i][j] = MAXi<k<j (M[i][k] + M[k+1][j]) (merging two substructures)

Nussinov Algorithm

- The final filled matrix is as follows:

Nussinov Algorithm

- Traceback (P 271, Durbin et al) leads to the following structure:

Nussinov Algorithm

- Web Interface:
- http://ludwig-sun2.unil.ch/~bsondere/nussinov/

Evaluation of Maximizing Basepairs

- Simplistic approach
- Does not give accurate structure predictions
- nearest neighbor interactions
- stacking interactions
- loop length preferences

Free Energy Minimization RNA Structure Prediction

- All possible choices of complementary sequences are considered
- Set(s) providing the most energetically stable molecules are chosen
- When RNA is folded, some bases are paired with other while others remain free, forming “loops” in the molecule.
- Speaking qualitatively, bases that are bonded tend to stabilize the RNA (i.e., have negative free energy), whereas unpaired bases form destabilizing loops (positive free energy).
- Through thermodynamics experiments, it has been possible to estimate the free energy of some of the common types of loops that arise.
- Because the secondary structure is related to the function of the RNA, we would like to be able to predict the secondary structure.
- Given an RNA sequence, the RNA Folding Problem is to predict the secondary structure that minimizes the total free energy of the folded RNA molecule.

Prediction of Minimum-Energy RNA Structure is Limited

- In predicting minimum energy RNA secondary structure, several simplifying assumptions are made.
- The most likely structure is identical to the energetically preferable structure
- Nearest-neighbor energy calculations give reliable estimates of an experimentally achievable energy measurements
- Usually we can neglect pseudoknots

Assumptions in secondary Structure Prediction

- most likely structure similar to energetically most stable structure
- Energy associated with any position is only influenced by local sequence and structure

Structure formed does not produce pseudoknots

Predicting Structure From a Single Sequence

- RNA molecule only 200 bases long has 1050 possible secondary structures
- Find self-complementary regions in an RNA sequence using a dot-plot of the sequence against its complement
- repeat regions can potentially base pair to form secondary structures
- advanced dot-plot techniques incorporate free energy measures

Dot Plot

- Image Source: http://www.finchcms.edu/cms/biochem/Walters/rna_folding.html

Energy Minimization Methods

- RNA folding is determined by biophysical properties
- Energy minimization algorithm predicts the correct secondary structure by minimizing the free energy (G)
- G calculated as sum of individual contributions of:
- loops
- base pairs
- secondary structure elements
- Energies of stems calculated as stacking contributions between neighboring base pairs

Energy Minimization Methods

- Free-energy values (kcal/mole at 37oC ) are as follows:

Energy Minimization Methods

- Free-energy values (kcal/mole at 37oC ) are as follows:

Energy Minimization Methods

- Given the energy tables, and a folding, the free energy can be calculated for a structure

Calculating Best Structure

- sequence is compared against itself using a dynamic programming approach
- similar to the maximum base-paired structure
- instead of using a scoring scheme, the score is based upon the free energy values
- Gaps represent some form of a loop
- The most widely used software that incorporates this minimum free energy algorithm is MFOLD.

Free Energy Minimization RNA Structure Prediction

- http://www.bioinfo.rpi.edu/~zukerm/Bio-5495/RNAfold-html/

Calculating Best Structure

- most widely used software incorporating minimum free energy algorithm is MFOLD
- http://www.bioinfo.rpi.edu/applications/mfold/
- http://www.bioinfo.rpi.edu/applications/mfold/old/rna/

Example Sequence

GCTTACGACCATATCACGTTGAATGCACGC

CATCCCGTCCGATCTGGCAAGTTAAGCAAC

GTTGAGTCCAGTTAGTACTTGGATCGGAGA

CGGCCTGGGAATCCTGGATGTTGTAAGCT

Suboptimal Folds

- The correct structure is not necessarily structure with optimal free energy
- within a certain threshold of the calculated minimum energy
- MFOLD updated to report suboptimal folds

Inferring Structure By Comparative Sequence Analysis

- first step is to calculate a multiple sequence alignment
- Requires sequences be similar enough so that they can be initially aligned
- Sequences should be dissimilar enough for covarying substitutions to be detected

Mutual Information

- fxi : frequency of a base in column i
- fxixj: joint (pairwise) frequency of a base pair between columns i and j
- Information ranges from 0 and 2 bits
- If i and j are uncorrelated, mutual information is 0

Frameshifting

- Virology. 2005 Feb 20;332(2):498-510
- Programmed ribosomal frameshifting in decoding the SARS-CoV genome.
- Baranov PV, Henderson CM, Anderson CB, Gesteland RF, Atkins JF, Howard MT.Department of Human Genetics, University of Utah, 15 N 2030 E, Room 7410, Salt Lake City, UT 84112-5330, USA.Programmed ribosomal frameshifting is an essential mechanism used for the expression of orf1b in coronaviruses. Comparative analysis of the frameshift region reveals a universal shift site U_UUA_AAC, followed by a predicted downstream RNA structure in the form of either a pseudoknot or kissing stem loops. Frameshifting in SARS-CoV has been characterized in cultured mammalian cells using a dual luciferase reporter system and mass spectrometry. Mutagenic analysis of the SARS-CoV shift site and mass spectrometry of an affinity tagged frameshift product confirmed tandem tRNA slippage on the sequence U_UUA_AAC. Analysis of the downstream pseudoknot stimulator of frameshifting in SARS-CoV shows that a proposed RNA secondary structure in loop II and two unpaired nucleotides at the stem I-stem II junction in SARS-CoV are important for frameshift stimulation. These results demonstrate key sequences required for efficient frameshifting, and the utility of mass spectrometry to study ribosomal frameshifting.

Frameshifting

- RNA-struct-frameshift.pdf
- frameshifts.pdf
- hepatitisC-frameshift.pdf

Covariance Models

- 7 approaches to locate covarying sites offered in Mount, p225
- key to covariance is mutual information content
- mutual information content can be plotted on a motif logo

Mutual Information

- Image source: http://www.cbs.dtu.dk/~gorodkin/appl/slogo.html

Covariance Models

- A formal covariance model, COVE, devised by Eddy and Durbin
- Provides very accurate results
- extremely slow and unsuitable for searching large genomes

SCFGs

- Stochastic Context Free Grammars (SCFGs) have also been used to model RNA secondary structure
- Examples
- tRNAScan-SE
- program created to find snoRNAs
- Grammars are created by using a training set of data, and then the grammars are applied to potential sequences to see if they fit into the language

SCFGs

- SCFGs allow the detection of sequences belonging to a family
- tRNAs
- group I introns
- snoRNAs
- snRNAs

SCFGs

- base-paired columns modeled by pairwise emitting non terminals
- aWu; aWa; aWc; aWg; ...
- single-stranded columns modeled by leftwise emitting nonterminals (when possible)
- aW; cW; gW; uW; ..., when possible

SCFGs

- Any RNA structure can be reduced to a SCFG (see Durbin, et al., p 278-279)

Transformational Grammars

- First described by linguist Noam Chomsky in the 1950’s.
- (Yes, the same Noam Chomsky who has expressed various dissident political views throughout the years!)

Transformational Grammars

- Very important in computer science, most notably in compiler design
- Covered in detail in compiler and automaton classes

Transformational Grammars

- Idea: take a set of outputs (sentence, RNA structure) and determine if it can be produced using a set of rules
- consist of a set of symbols and production rules
- The symbols can terminal (emitting) symbols or non-terminal symbols

Grammar for Palindromes

- Consider palindromic DNA sequences
- Five possible terminal symbols: {A, C, G, T, ) ( represents the blank terminal symbol)

Grammar for Palindromes

- Production Rules, where S and W are non-terminal symbols:
- SW
- W aWa | cWc | gWg | tWt
- W a | c| g | t |

Derivation of Sequences

- Using these production rules, a derivation of the palindromic sequence acttgttca follows:
- S W aWa acWcaactWtca acttWttca acttgttca

Parse Trees

- A context-free grammar can be aligned to a sequence using a parse tree
- Root of the tree is the non-terminal start symbol, S
- Leaves are terminal symbols
- Internal nodes are the nonterminals
- Leaves can be parsed from left to right to view the results of production

RNA Structure SCFG

- SW
- WWW(bifurcation)
- W aWu | cWg | gWc | uWa (stems)
- W gWu | uWg
- W aW | cW | gW | uW (bulges)
- WWa | Wc | Wg | Wu (bulges)
- W a | c| g | t |

Example of SCFG

- structure for the RNA structure for the sequence produced by MFOLD, can be constructed (5’ to 3’):
- GCUUACGACCAUAUCACGUUGAAUGCACGCCAUCCCGUCCGAUCUGGCAAGUUAAGCAACGUUGAGUCCAGUUAGUACUUGGAUCGGAGACGGCCUGGGAAUCCUGGAUGUUGUAAGCU

Example Construction

- S
- W
- Wu
- gWcu
- gcWgcu
- gcuWagcu
- gcuuWaagcu
- gcuuaWuaagcu
- gcuuacWguaagcu
- gcuuacgWuguaagcu
- gcuuacgaWuuguaagcu
- gcuuacgacWguuguaagcu
- gcuuacgaccWguuguaagcu
- gcuuacgaccaWguuguaagcu....

Other Programs

- RNA Movies
- http://bibiserv.techfak.uni-bielefeld.de/rnamovies/
- (Visualization of RNA secondary structure)
- RNA LOGOS
- http://www.cbs.dtu.dk/~gorodkin/appl/slogo.html

Download Presentation

Connecting to Server..