
RNA secondary structure prediction

Doug Raiford Lesson 7. RNA secondary structure prediction. Why do we care. RNA World Hypothesis RNA world evolved into the DNA and protein world DNA advantage: greater chemical stability Protein advantage: more flexible and efficient enzymes ( biomolecules that catalyze)



  1. Doug Raiford Lesson 7 RNA secondary structure prediction

  2. Why do we care • RNA World Hypothesis • RNA world evolved into the DNA and protein world • DNA advantage: greater chemical stability • Protein advantage: more flexible and efficient enzymes (biomolecules that catalyze) • 20 amino acids vs. 4 nucleotides • Chemically, more diverse • Remnants remain in ribosomes, nucleases, polymerases, and splicing molecules

  3. Primary, secondary, tertiary • Primary: sequence • Secondary: double-stranded regions • Reverse complements • Tertiary: three-dimensional structure >tRNA. Carries amino acid for isoleucine AGGCUUGUAGCUCAGGUGGUUAGAGCGCACCCCUGAUAAGGGUGAGGUCGGUGGUUCAAGUCCACUCAGGCCUACCA [Figure: tRNA cloverleaf labeled CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

  4. If you were tasked… • How would you find regions of reverse complementarity? • What do we have? • Sequence • A’s like pairing with U’s and G’s like pairing with C’s • Stronger bond between G’s and C’s (3 hydrogen bonds, vs. 2 for A-U) • Pairing should result in the lowest free energy (most negative enthalpy)

  5. Already know one RNA molecule • tRNA • Transports amino acid to the ribosome [Figure: tRNA cloverleaf labeled CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

  6. Exploratory data analysis • Visualization

  7. Dot plot methods • Good at finding longer base-pairings (stacked base-pairs) • Need to find the conformation that provides the minimal total free energy • RNA often has many alternate conformations at different temperatures • Stacked base-pairs add stability • Loops/bulges introduce positive free energy and are destabilizing
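A dot plot of a sequence against itself can be sketched in a few lines. The following is a minimal illustration, not the method of any particular tool: the pairing rules (Watson-Crick plus G-U wobble), the helper names, and the `min_loop` parameter are all assumptions made for this example. It marks every position pair that could base-pair, then scans anti-diagonals for the longest run of stacked pairs.

```python
# Allowed pairs: Watson-Crick plus the G-U wobble (an assumption of this sketch).
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def dot_plot(seq):
    """Return an n x n boolean matrix; cell (i, j) is True if seq[i] can pair with seq[j]."""
    return [[(a, b) in PAIRS for b in seq] for a in seq]


def longest_stem(seq, min_loop=3):
    """Find the longest run of stacked pairs (i, j), (i+1, j-1), ...

    Returns (length, i, j), where (i, j) is the outermost pair of the best run.
    min_loop is a hypothetical minimum number of unpaired bases left in the loop.
    """
    plot = dot_plot(seq)
    best = (0, None, None)
    n = len(seq)
    for i in range(n):
        for j in range(i + 1, n):
            length = 0
            # Walk inward along the anti-diagonal while the bases can still pair.
            while (j - length) - (i + length) > min_loop and plot[i + length][j - length]:
                length += 1
            if length > best[0]:
                best = (length, i, j)
    return best


def render(seq):
    """Text rendering of the dot plot: '*' marks a possible pair, '.' marks none."""
    rows = ["  " + seq]
    for base, row in zip(seq, dot_plot(seq)):
        rows.append(base + " " + "".join("*" if cell else "." for cell in row))
    return "\n".join(rows)


print(render("GGGAAAUCC"))
print(longest_stem("GGGAAAUCC"))
```

Long anti-diagonal runs in the rendered plot correspond to candidate stems. The scan only finds stacked pairs; it does not score loops or bulges, so it is a starting point for visual exploration rather than an energy calculation.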

  8. Zuker (1981) recursive solution • Recurrence relations • First nucleotide base-pairs with last: recurse on rest • First nucleotide base-pairs with some other (other than last) nucleotide, or with none: recurse on every possible set of two strings
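The two cases above can be written as a short recursion. This is a sketch under a deliberately simplified energy model, assumed for illustration only: every allowed pair (A-U, G-C, G-U) contributes -1 and nothing else is scored, which is closer to simple pair counting than to a real thermodynamic model. The names `pair_energy` and `min_energy` are hypothetical.

```python
from functools import lru_cache

# Simplified model (assumption): each allowed pair is worth -1, everything else 0.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_energy(a, b):
    """Energy contribution of pairing base a with base b under the toy model."""
    return -1 if (a, b) in PAIRS else 0


def min_energy(seq):
    """Minimum energy of seq, computed by the two-case recursion."""

    @lru_cache(maxsize=None)
    def energy(i, j):
        # Substrings of length 0 or 1 cannot contain a pair.
        if i >= j:
            return 0
        # Case 1: first nucleotide pairs with last, recurse on the rest.
        best = energy(i + 1, j - 1) + pair_energy(seq[i], seq[j])
        # Case 2: split into two substrings at every possible point.
        for k in range(i, j):
            best = min(best, energy(i, k) + energy(k + 1, j))
        return best

    return energy(0, len(seq) - 1)


print(min_energy("GGGAAAUCC"))
```

Without the `lru_cache` memoization the same substrings would be re-evaluated exponentially many times, which is the motivation for a table-based formulation.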

  9. Dynamic programming solution • As luck would have it… • Zuker came up with a dynamic programming solution [Figure: matrix indexed by i and j]

  10. Dynamic programming solution • Start with zeros on diagonal • Populate diagonally [Figure: matrix indexed by i and j]

  11. Calculate recurrence relations • Will look at last value to illustrate • Match first and last character, recurse on rest [Figure: matrix indexed by i and j]

  12. All possible pairs of substrings • Min of all pairs of substrings [Figure: matrix for GGGAAAUCC indexed by i and j, with final value -3 and candidate foldings of the sequence]
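The table fill described on the preceding slides can be sketched as follows. As an assumption for illustration, the energy model is reduced to -1 per allowed pair (A-U, G-C, G-U) with no loop or stacking terms, so this is not the full thermodynamic model; the function names `fold` and `traceback` are hypothetical.

```python
# Simplified model (assumption): each allowed pair is worth -1, everything else 0.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def pair_energy(a, b):
    return -1 if (a, b) in PAIRS else 0


def fold(seq):
    """Fill the matrix diagonal by diagonal; E[i][j] is the best energy of seq[i..j]."""
    n = len(seq)
    E = [[0] * n for _ in range(n)]  # zeros on the diagonal (and below it)
    for span in range(1, n):  # distance from the main diagonal
        for i in range(n - span):
            j = i + span
            # Match first and last character, recurse on the rest.
            best = E[i + 1][j - 1] + pair_energy(seq[i], seq[j])
            # Minimum over all ways to split into two substrings.
            for k in range(i, j):
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E


def traceback(seq, E):
    """Recover one optimal structure as a string of nested parentheses and dots."""
    structure = ["."] * len(seq)
    stack = [(0, len(seq) - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if pair_energy(seq[i], seq[j]) < 0 and E[i][j] == E[i + 1][j - 1] - 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
            continue
        for k in range(i, j):
            if E[i][j] == E[i][k] + E[k + 1][j]:
                stack.append((i, k))
                stack.append((k + 1, j))
                break
    return "".join(structure)


seq = "GGGAAAUCC"
E = fold(seq)
print(E[0][len(seq) - 1])
print(traceback(seq, E))
```

The value of interest ends up in the top-right cell, `E[0][n-1]`. The traceback returns only one of possibly several structures with the same score.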

  13. Complexity • n² cells, plus up to 2n lookups for each visited cell • So O(n³): populate the matrix, plus traverse a row and a column for each cell

  14. Loops and bulges: differing energies • Any prediction method must account for these

  15. Complexity • Now O(n⁴) • Interior loops most expensive • Can exploit the fact that along diagonals, loops have the same size • Can calculate once • Limits search space • Back to O(n³)

  16. MFOLD website • Zuker’s site • tRNA for leucine in E. coli, a prototypical organism • Codon: uua • Anti-codon: aat
       1 gccgaggtggtggaattggtagacacgctaccttgaggtggtagtgcccaatagggctta
      61 cgggttcaagtcccgtcctcggtacca
      [Figure: predicted cloverleaf labeled CCA tail, acceptor stem, D arm, anticodon arm, anticodon, T arm]

  17. Paired mutations • Just like proteins: conformation determines function • What if a U-A base pair mutates to a G-C? • Still the same function • What would this do to a search or sequence alignment?
      GCAGGACCAUAUA
      |||||||||||||
      CGUCCUGGUAUAU

      GCAGGACCAGAUA
      |||||||||||||
      CGUCCUGGUCUAU

  18. Exploiting co-varying nucleotides to locate base-pairs • Phenomenon known as covariance (not to be confused with statistical covariance)
      GCAGGACCAUAUA
      |||||||||||||
      CGUCCUGGUAUAU

      GCAGGACCAGAUA
      |||||||||||||
      CGUCCUGGUCUAU

  19. Using covariant base-pairs to identify secondary structure • How might we locate covariant pairs? • MSA, then compare all pair-wise combinations of columns • High degree of agreement in two columns (G’s match with C’s, A’s match with U’s) is an indication of base-pairing • χ² test: compare to the expected number of pairings given sequence composition
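The column comparison can be sketched as below. This is an illustration under stated assumptions: the four-sequence alignment is a made-up toy example, the function names are hypothetical, and instead of a full χ² test the sketch only reports the observed number of complementary pairs next to the number expected from column composition. A proper statistical test would need many more sequences.

```python
from collections import Counter
from itertools import combinations

# Watson-Crick complements only (an assumption; wobble pairs are ignored here).
COMPLEMENT = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def pairing_score(msa, i, j):
    """Compare columns i and j of an alignment.

    Returns (observed, expected, distinct):
      observed: sequences whose bases at i and j are complementary
      expected: complementary pairs expected from the column compositions alone
      distinct: number of different complementary pair types seen
    """
    n = len(msa)
    col_i = [s[i] for s in msa]
    col_j = [s[j] for s in msa]
    observed_pairs = [(a, b) for a, b in zip(col_i, col_j) if (a, b) in COMPLEMENT]
    freq_i, freq_j = Counter(col_i), Counter(col_j)
    expected = sum(freq_i[a] * freq_j[b] for a, b in COMPLEMENT) / n
    return len(observed_pairs), expected, len(set(observed_pairs))


def covarying_columns(msa):
    """Column pairs that are complementary in every sequence AND vary between sequences."""
    width = len(msa[0])
    hits = []
    for i, j in combinations(range(width), 2):
        observed, _expected, distinct = pairing_score(msa, i, j)
        if observed == len(msa) and distinct >= 2:
            hits.append((i, j))
    return hits


# Hypothetical toy alignment: a 3-pair stem around a UUCG loop, with compensating changes.
msa = [
    "GCAUUCGUGC",
    "GCGUUCGCGC",
    "GGAUUCGUCC",
    "ACAUUCGUGU",
]
print(covarying_columns(msa))
print(pairing_score(msa, 0, 9))
```

Requiring at least two distinct pair types is what separates covariation from plain conservation: columns 5 and 6 of the toy alignment are complementary in every sequence but never change, so they carry no covariation signal and are not reported.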

  20. Other representations • Pairing depicted with nested parentheses AAGACUUCGGUCUGGCGACAUUC ((( ))) (( ( )))
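Nested-parentheses notation is easy to read back into a list of base pairs with a stack. The sketch below assumes the common convention that `.` marks an unpaired base; the function name and the example structure are chosen for illustration.

```python
def parse_dot_bracket(structure):
    """Return the sorted list of (i, j) base pairs encoded by a parentheses string."""
    stack = []
    pairs = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(pos)  # remember the opening partner
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))  # innermost open bracket closes first
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


print(parse_dot_bracket("(((...)))"))
```

Because the brackets must nest, this notation cannot express pseudoknots, where two stems cross each other.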

  21. Other representations • Mountain plots A mountain plot represents a secondary structure in a plot of height versus position, where the height m(k) is given by the number of base pairs enclosing the base at position k. I.e. loops correspond to plateaus (hairpin loops are peaks), helices to slopes.
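The height m(k) can be computed directly from a nested-parentheses string. This sketch assumes the structure is given in that notation with `.` for unpaired bases, and counts only pairs that strictly enclose position k; the function names are hypothetical.

```python
def mountain_heights(structure):
    """Return m(k) for every position k: the number of base pairs enclosing k."""
    heights = []
    depth = 0
    for symbol in structure:
        if symbol == "(":
            heights.append(depth)  # the opening base is not enclosed by its own pair
            depth += 1
        elif symbol == ")":
            depth -= 1
            heights.append(depth)  # nor is the closing base
        else:
            heights.append(depth)
    return heights


def render_mountain(structure):
    """Crude text plot: one row per height level, '#' where m(k) reaches that level."""
    heights = mountain_heights(structure)
    rows = []
    for level in range(max(heights, default=0), 0, -1):
        rows.append("".join("#" if h >= level else " " for h in heights))
    rows.append(structure)
    return "\n".join(rows)


print(mountain_heights("(((...)))"))
print(render_mountain("(((...)))"))
```

In the output, the rising and falling runs are the two strands of the helix and the flat top is the hairpin loop.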

  22. Other representations • Circle plot

  23. Tree representations • Data structure capable of capturing secondary structure • Ordered Binary Tree

  24. Context free grammars • Productions • S → aSu | uSa | cSg | gSc • S → aS | cS | gS | uS • S → Sa | Sc | Sg | Su • S → SS • S → ε

  25. Context free grammars • Derivation S ⇒ aS ⇒ aSc ⇒ aScc ⇒ acSgcc ⇒ acgScgcc ⇒ acggSccgcc ⇒ acgggScccgcc ⇒ acggggSccccgcc ⇒ acgggguSccccgcc ⇒ acgggguuSccccgcc ⇒ acgggguucSccccgcc ⇒ acgggguucgSccccgcc ⇒ acgggguucgaSccccgcc ⇒ acgggguucgaaSccccgcc ⇒ acgggguucgaauSccccgcc ⇒ acgggguucgaauccccgcc
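The derivation can be replayed mechanically by rewriting the nonterminal S one production at a time. The sketch below is illustrative: the function name and the list-of-right-hand-sides encoding are assumptions, and it always rewrites the leftmost S, which is sufficient here because this derivation never uses S → SS.

```python
# Right-hand sides allowed by the grammar; "" stands for the empty production.
PRODUCTIONS = {
    "aSu", "uSa", "cSg", "gSc",  # paired bases
    "aS", "cS", "gS", "uS",      # unpaired base on the left
    "Sa", "Sc", "Sg", "Su",      # unpaired base on the right
    "SS",                        # bifurcation
    "",                          # termination
}


def derive(steps, start="S"):
    """Apply each right-hand side in steps to the leftmost S; return all sentential forms."""
    forms = [start]
    current = start
    for rhs in steps:
        if rhs not in PRODUCTIONS:
            raise ValueError(f"{rhs!r} is not a production of the grammar")
        if "S" not in current:
            raise ValueError("no nonterminal left to rewrite")
        current = current.replace("S", rhs, 1)
        forms.append(current)
    return forms


steps = ["aS", "Sc", "Sc", "cSg", "gSc", "gSc", "gSc", "gSc",
         "uS", "uS", "cS", "gS", "aS", "aS", "uS", ""]
for form in derive(steps):
    print(form)
```

Each paired production (for example `gSc`) emits both partners of a base pair at once, which is how the grammar encodes nested secondary structure.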

  26. Context free grammars • Parse tree [Figure: parse tree for the derivation of acgggguucgaauccccgcc]

  27. Summary • Conformation of RNA dictates function • Determining secondary structure can help determine tertiary structure • Dynamic programming approach to identifying minimum energy conformations • Zuker MFOLD • View using dot plots, nested parens, mountain or circular plots • Covariance: base-pairs mutate but still form pairs, exploit to find pairings
