Learn about the basics of RNA structure, secondary structure prediction, pseudoknots, and base pair maximization using dynamic programming algorithms. Understand the challenges posed by pseudoknots and explore the process of maximizing base pairings in RNA structures.
RNA Secondary Structure Prediction: Dynamic Programming Approaches • Sarah Aerni • http://www.tbi.univie.ac.at/
Outline • RNA folding • Dynamic programming for RNA secondary structure prediction
RNA Basics • RNA bases: A, C, G, U • Canonical base pairs: A-U and G-C • Each base can pair with at most one other base. Image: http://www.bioalgorithms.info/
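The pairing rules above can be sketched in a few lines of Python. This is only an illustration: the names `CANONICAL_PAIRS`, `can_pair`, and `is_valid_pairing` are hypothetical, positions are assumed to be 0-based, and only the two canonical pairs listed on the slide are allowed.

```python
# Canonical base pairs from the slide (both orientations).
CANONICAL_PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def can_pair(x: str, y: str) -> bool:
    """Return True if bases x and y form a canonical pair (A-U or G-C)."""
    return (x.upper(), y.upper()) in CANONICAL_PAIRS


def is_valid_pairing(seq: str, pairs: list[tuple[int, int]]) -> bool:
    """Check that every pair is canonical and every base is used at most once.

    `pairs` holds 0-based index pairs (i, j) into `seq` (an assumed convention).
    """
    used = set()
    for i, j in pairs:
        if i == j or i in used or j in used:
            return False  # a base cannot pair with itself or pair twice
        if not can_pair(seq[i], seq[j]):
            return False  # not A-U or G-C
        used.update((i, j))
    return True
```

For example, `is_valid_pairing("GAAAC", [(0, 4)])` accepts the single G-C pair, while a second pair that reuses position 4 would be rejected.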
RNA Secondary Structure • Structural elements labeled in the figure: pseudoknot, stem, interior loop, single-stranded region, bulge loop, junction (multiloop), hairpin loop. Image – Wuchty
Circle Plot • A linear RNA strand folds back on itself to create secondary structure • The circularized representation takes advantage of this • Arcs represent base pairing • All loops must have at least 3 bases in them • Equivalent to requiring at least 3 bases between the two ends of every arc • Exception: the location where the beginning and end of the RNA come together in the circularized representation. Images – David Mount
Trouble with Pseudoknots • Pseudoknots break the dynamic programming algorithm. • A pseudoknot contains crossing (non-nested) base pairs, so forming one requires checking that a base is not already paired elsewhere in the structure. The score of a subsequence then depends on bases outside it, which breaks the recurrence relations. Images – David Mount
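One way to make the pseudoknot problem concrete is to test a set of base pairs for crossing arcs. The sketch below is an assumption-laden illustration, not part of the original slides: the function name `has_pseudoknot` is hypothetical, and pairs are assumed to be 0-based index tuples.

```python
def has_pseudoknot(pairs: list[tuple[int, int]]) -> bool:
    """Return True if any two base pairs cross, i.e. i < k < j < l.

    Nested pairs (i < k < l < j) and side-by-side pairs (i < j < k < l)
    are fine for the dynamic programming recurrence; crossing pairs are not.
    """
    # Normalize each pair so the smaller index comes first, then sort by it.
    arcs = sorted((min(i, j), max(i, j)) for i, j in pairs)
    for a, (i, j) in enumerate(arcs):
        for k, l in arcs[a + 1:]:
            if i < k < j < l:  # second arc opens inside the first, closes outside
                return True
    return False
```

Nested pairs such as `[(0, 9), (1, 8)]` pass the check, while `[(0, 5), (3, 8)]` is reported as a pseudoknot because the two arcs cross in the circle plot.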
Base Pair Maximization • Problem: Find the RNA structure with the maximum (weighted) number of nested pairings • Example sequence: ACCACGCUUAAGACACCUAGCUUGUGUCCUGGAGGUCUAUAAGUCAGACCGCGAGAGGGAAGACUCGUAUAAGCG • [Figure: folded structure of the example sequence]
Base Pair Maximization – Dynamic Programming Algorithm • S(i,j) is the score (number of base pairs) of the folding of the subsequence from index i to index j that has the highest number of base pairs • Simple example: maximizing base pairing • Four cases: unmatched at i; unmatched at j; base pair at i and j; bifurcation. Images – Sean Eddy
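The four cases above can be written as a single recurrence. The following is a sketch under stated assumptions: the indicator δ(i,j) is notation introduced here (1 if bases i and j can pair, 0 otherwise), and the bifurcation index is taken to range over i < k < j.

```latex
S(i,j) = \max
\begin{cases}
S(i+1,\, j) & \text{unmatched at } i \\
S(i,\, j-1) & \text{unmatched at } j \\
S(i+1,\, j-1) + \delta(i,j) & \text{base pair at } i \text{ and } j \\
\max_{i < k < j} \bigl[\, S(i,k) + S(k+1,\, j) \,\bigr] & \text{bifurcation}
\end{cases}
```

When bases i and j cannot pair, δ(i,j) = 0 and the third case never beats the first two, so it is effectively switched off.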
Base Pair Maximization – Dynamic Programming Algorithm • Alignment Method • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension • Initialize first two diagonal arrays to 0 • Fill in squares sweeping diagonally • Dynamic programming, possible paths: S(i + 1, j); S(i, j - 1); S(i + 1, j - 1) + 1 • Bases cannot pair: similar to unmatched alignment • Bases can pair: similar to matched alignment. Images – Sean Eddy
Base Pair Maximization – Dynamic Programming Algorithm • Alignment Method • Align RNA strand to itself • Score increases for feasible base pairs • Each score independent of overall structure • Bifurcation adds extra dimension • Initialize first two diagonal arrays to 0 • Fill in squares sweeping diagonally • Bases cannot pair: similar to unmatched alignment • Bases can pair: similar to matched alignment • Dynamic programming, possible paths: bifurcation, add values for all k • Reminder: for all k, compute S(i,k) + S(k + 1, j) and take the maximum • Example: k = 0 gives the bifurcation maximum in this case. Images – Sean Eddy
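The fill and traceback described on these slides can be sketched as follows. This is a minimal illustration, not the presenter's code: the names `can_pair` and `max_base_pairs` are hypothetical, indices are 0-based, each pair scores +1 (unweighted), and the optional `min_loop` parameter is an added assumption that lets the minimum loop size from the circle plot be enforced (the default of 0 matches the plain recurrence).

```python
def can_pair(x: str, y: str) -> bool:
    """Canonical pairs only: A-U and G-C."""
    return {x, y} in ({"A", "U"}, {"G", "C"})


def max_base_pairs(seq: str, min_loop: int = 0):
    """Return (max number of nested base pairs, one optimal list of pairs)."""
    n = len(seq)
    if n == 0:
        return 0, []
    # An all-zero n x n matrix also initializes the first two diagonals
    # S(i, i) and S(i, i - 1) to 0.
    S = [[0] * n for _ in range(n)]

    def pairable(i: int, j: int) -> bool:
        return j - i - 1 >= min_loop and can_pair(seq[i], seq[j])

    # Fill diagonally: span is the distance j - i.
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(S[i + 1][j], S[i][j - 1])      # unmatched at i or at j
            if pairable(i, j):
                best = max(best, S[i + 1][j - 1] + 1)  # base pair at i and j
            for k in range(i + 1, j):                  # bifurcation, all k
                best = max(best, S[i][k] + S[k + 1][j])
            S[i][j] = best

    # Traceback with an explicit stack to recover one optimal structure.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if S[i][j] == S[i + 1][j]:
            stack.append((i + 1, j))
        elif S[i][j] == S[i][j - 1]:
            stack.append((i, j - 1))
        elif pairable(i, j) and S[i][j] == S[i + 1][j - 1] + 1:
            pairs.append((i, j))
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if S[i][j] == S[i][k] + S[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return S[0][n - 1], sorted(pairs)
```

Calling `max_base_pairs("GGGAAAUCC")` returns a score of 3, while `max_base_pairs("GGGAAAUCC", min_loop=3)` returns a score of 2 because the only U sits too close to every A. The triple loop makes the fill O(n³) in time and O(n²) in memory.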
Base Pair Maximization - Drawbacks • Base pair maximization will not necessarily lead to the most stable structure • May create structure with many interior loops or hairpins which are energetically unfavorable • Not biologically reasonable
References • S.R. Eddy. How Do RNA Folding Algorithms Work? Nature Biotechnology, 22:1457-1458, 2004.