
Improving Free Energy Functions for RNA Folding


dorie

Presentation Transcript


  1. Improving Free Energy Functions for RNA Folding: RNA Secondary Structure Prediction

  2. Why RNA is Important • Machinery of protein construction • Catalytic role in cells • It may be possible to destroy specific RNA sequences (to interrupt protein production) • RNase P (Cech/Altman, c. 1981)

  3. RNA Structural Levels: Primary, Secondary, Tertiary [Figure: the primary sequence AAUCG...CUUCUUCCA alongside images of secondary and tertiary structure] Image sources: Secondary: http://anx12.bio.uci.edu/~hudel/bs99a/lecture21/lecture2_2.html Tertiary: http://www.leeds.ac.uk/bmb/courses/teachers/trnballs.html

  4. Abstracting the problem [Figure: example sequence A G C G C A U C] Zuker (1981) Nucleic Acids Research 9(1) 133-149
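A minimal Python sketch of this abstraction, in which a secondary structure is simply a set of base pairs (i, j) over the sequence. The allowed pairs (Watson-Crick plus G-U wobble), the minimum hairpin size of 3 and the name `is_valid_structure` are illustrative assumptions, not taken from the slides.

```python
# Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3  # assumed minimum number of unpaired bases enclosed by a pair


def is_valid_structure(seq: str, pairs: set[tuple[int, int]]) -> bool:
    """Check that a set of (i, j) pairs is a pseudoknot-free secondary structure of seq."""
    used: set[int] = set()
    for i, j in pairs:
        if not (0 <= i < j < len(seq)):
            return False
        # Pair must be allowed and enclose a large enough hairpin loop.
        if (seq[i], seq[j]) not in CANONICAL or j - i <= MIN_HAIRPIN:
            return False
        if i in used or j in used:  # each base pairs at most once
            return False
        used.update((i, j))
    # No two pairs may cross: i < k < j < l would be a pseudoknot.
    return not any(i < k < j < l for i, j in pairs for k, l in pairs)
```

For example, on `"GGAAAACC"` the nested pairs `{(0, 7), (1, 6)}` are accepted, while the crossing pairs `{(0, 6), (1, 7)}` are rejected.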

  5. Why it is hard • Large search space (the number of possible structures grows exponentially with sequence length, so they cannot all be enumerated) Hofacker et al. (1994) Monat. Chem. 125 167-188
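To make the size of the search space concrete, the sketch below counts the pseudoknot-free secondary structures of a sequence with a simple recursion: either the first base of an interval is unpaired, or it pairs with some later base k, which splits the interval into an inside and an outside part. The pairing rules, the minimum hairpin size and the name `count_structures` are assumptions for illustration.

```python
from functools import lru_cache

# Assumed pairing rules and minimum hairpin size.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
MIN_HAIRPIN = 3


def count_structures(seq: str) -> int:
    """Count pseudoknot-free secondary structures of seq (open chain included)."""

    @lru_cache(maxsize=None)
    def count(i: int, j: int) -> int:
        # Intervals too short to hold any pair admit only the open structure.
        if j - i < MIN_HAIRPIN + 1:
            return 1
        total = count(i + 1, j)  # case 1: base i is unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            if (seq[i], seq[k]) in CANONICAL:
                # case 2: i pairs with k; inside and outside are independent
                total += count(i + 1, k - 1) * count(k + 1, j)
        return total

    return count(0, len(seq) - 1)
```

For example, `count_structures("GGAAAACC")` returns 6: the open chain, the four single pairs and the two-pair helix. Because these counts grow exponentially with length, practical algorithms use dynamic programming rather than enumeration.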

  6. Why it is hard • Secondary structure does not exist as a directly observable structure • Unlike proteins • Known structures are putative (prone to revision) • Quality of energy functions • Discussed later

  7. Current Algorithms • Single-Strand • Minimum Free Energy (Zuker et al. 1981) • Partition Functions (McCaskill 1990) • Comparative Sequence Analysis • Max. Weighted Matching (Nussinov et al. 1978) • Stochastic CFG (Sakakibara et al. 1994) • Phylogenetic Trees (Gulko et al. 1995) • Statistical Significance (Noller & Woese, early 1980s) See proposal for references
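Nussinov et al. (1978) is usually associated with base-pair maximization by dynamic programming. A minimal sketch of that recursion follows; the pairing rules, the `min_loop` default and the name `nussinov_max_pairs` are assumptions, and it returns only the maximum number of pairs (no traceback).

```python
# Assumed pairing rules: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq: str, min_loop: int = 3) -> int:
    """Maximum number of nested base pairs in seq (Nussinov-style DP)."""
    n = len(seq)
    # N[i][j] = max pairs within seq[i..j]; short intervals stay 0.
    N = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = N[i + 1][j]  # i left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:
                    inside = N[i + 1][k - 1]
                    outside = N[k + 1][j] if k + 1 <= j else 0
                    best = max(best, 1 + inside + outside)
            N[i][j] = best
    return N[0][n - 1] if n else 0
```

For instance, `nussinov_max_pairs("GGGGAAAACCCC")` is 4. Maximizing pairs ignores energies entirely, which is the gap the free-energy methods in the same list address.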

  8. MFE / Tinoco Hypothesis: The free energy of a secondary structure equals the sum of the free energies of its loops and stacked pairs. Tinoco et al. (1971) Nature 230 362-367.
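Under this hypothesis the energy of a structure decomposes into independent terms. The sketch below scores a dot-bracket structure by adding one term per base pair: a stacking energy when the pair directly encloses another pair, otherwise a flat loop penalty. All energy values are hypothetical placeholders in arbitrary units, not measured (e.g. Turner) parameters, and real models score hairpins, bulges, interior loops and multiloops separately.

```python
GC_PAIRS = {("G", "C"), ("C", "G")}
LOOP_PENALTY = 3.0  # hypothetical flat cost of any loop that is not a stack


def stack_energy(outer: tuple[str, str], inner: tuple[str, str]) -> float:
    """Hypothetical stacking energy: each G-C pair makes the stack more stable."""
    return -1.0 - sum(p in GC_PAIRS for p in (outer, inner))


def parse_dot_bracket(db: str) -> dict[int, int]:
    """Map every paired position to its partner."""
    partner: dict[int, int] = {}
    stack: list[int] = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            i = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def structure_energy(seq: str, db: str) -> float:
    """Additive energy: each pair (i, j) closes either a stack or some other loop."""
    partner = parse_dot_bracket(db)
    total = 0.0
    for i, j in partner.items():
        if i > j:
            continue  # visit each pair once, from its 5' end
        if partner.get(i + 1) == j - 1:
            total += stack_energy((seq[i], seq[j]), (seq[i + 1], seq[j - 1]))
        else:
            total += LOOP_PENALTY  # hairpin, bulge/interior loop or multiloop
    return total
```

For `structure_energy("GGGGAAAACCCC", "((((....))))")` this gives three G-C stacks at -3.0 plus one hairpin penalty of 3.0, i.e. -6.0; the unpaired exterior loop contributes nothing.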

  9. Secondary Structures: Proposed System [Figure: flow diagram linking the sequence AAUCG...CUUCUUCCA, MFE (E), GA (E’) and the resulting secondary structures; steps numbered 1-3]

  10. Step I - Calc MFE Structure • Given a sequence, apply the MFE algorithm • This generates a secondary structure S
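Step I can be sketched as a Zuker-style dynamic program over a toy additive model: V(i, j) is the best energy when i pairs with j, W(i, j) is the best energy of the interval, and a traceback recovers S in dot-bracket notation. The energy values, the single flat penalty for every non-stacked loop and the name `mfe_fold` are assumptions for illustration; a real MFE implementation uses the full nearest-neighbor parameter set.

```python
import math
from functools import lru_cache

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
GC_PAIRS = {("G", "C"), ("C", "G")}
MIN_HAIRPIN = 3     # assumed minimum hairpin loop size
LOOP_PENALTY = 3.0  # hypothetical flat cost of any non-stacked loop


def stack_energy(outer, inner):
    """Hypothetical stacking energy (not Turner parameters)."""
    return -1.0 - sum(p in GC_PAIRS for p in (outer, inner))


def mfe_fold(seq: str) -> tuple[float, str]:
    """Minimum-energy structure under the toy model: (energy, dot-bracket)."""
    n = len(seq)

    def can_pair(i, j):
        return 0 <= i < j < n and j - i > MIN_HAIRPIN and (seq[i], seq[j]) in PAIRS

    def stack(i, j):  # energy of pair (i, j) stacked on (i + 1, j - 1)
        return stack_energy((seq[i], seq[j]), (seq[i + 1], seq[j - 1]))

    @lru_cache(maxsize=None)
    def V(i, j):  # best energy of seq[i..j] given that i pairs with j
        if not can_pair(i, j):
            return math.inf
        best = LOOP_PENALTY + W(i + 1, j - 1)  # (i, j) closes a non-stacked loop
        if can_pair(i + 1, j - 1):
            best = min(best, stack(i, j) + V(i + 1, j - 1))
        return best

    @lru_cache(maxsize=None)
    def W(i, j):  # best energy of seq[i..j] with no constraint
        if j - i <= MIN_HAIRPIN:
            return 0.0
        best = W(i + 1, j)  # i unpaired
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            best = min(best, V(i, k) + W(k + 1, j))  # i pairs with k
        return best

    db = ["."] * n

    def trace_W(i, j):
        if j - i <= MIN_HAIRPIN:
            return
        if W(i, j) == W(i + 1, j):
            trace_W(i + 1, j)
            return
        for k in range(i + MIN_HAIRPIN + 1, j + 1):
            if W(i, j) == V(i, k) + W(k + 1, j):
                trace_V(i, k)
                trace_W(k + 1, j)
                return

    def trace_V(i, j):
        db[i], db[j] = "(", ")"
        if can_pair(i + 1, j - 1) and V(i, j) == stack(i, j) + V(i + 1, j - 1):
            trace_V(i + 1, j - 1)
        else:
            trace_W(i + 1, j - 1)

    energy = W(0, n - 1)
    trace_W(0, n - 1)
    return energy, "".join(db)
```

With these placeholder values, `mfe_fold("GGGGAAAACCCC")` returns `(-6.0, "((((....))))")`. The memoized recursion is adequate for short sequences; long ones would need iterative tables to stay within Python's recursion limit.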

  11. Step II - Structural Similarity • Given a database of experimentally verified RNA structures • Let Q be the database structure most similar to S • Based on RNase P Database (Brown 1999)
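The slides do not say how structural similarity is measured. As a hypothetical stand-in, the sketch below uses the Jaccard index of the two base-pair sets and assumes S and every database structure are written over a shared coordinate system (e.g. via an alignment); comparing structures of different sequences directly would instead need something like structural alignment or tree edit distance. The names `similarity` and `most_similar` are illustrative.

```python
def base_pairs(db: str) -> set[tuple[int, int]]:
    """Convert a dot-bracket string into a set of (i, j) base pairs."""
    pairs: set[tuple[int, int]] = set()
    stack: list[int] = []
    for j, ch in enumerate(db):
        if ch == "(":
            stack.append(j)
        elif ch == ")":
            pairs.add((stack.pop(), j))
    return pairs


def similarity(s: str, q: str) -> float:
    """Hypothetical similarity: Jaccard index of the two base-pair sets."""
    ps, pq = base_pairs(s), base_pairs(q)
    if not ps and not pq:
        return 1.0  # two open chains are treated as identical
    return len(ps & pq) / len(ps | pq)


def most_similar(s: str, database: dict[str, str]) -> str:
    """Return the name of the database structure Q most similar to S."""
    return max(database, key=lambda name: similarity(s, database[name]))
```

For example, `"((....))"` and `"(......)"` share one of their two distinct pairs, so their similarity is 0.5.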

  12. Step III - Construct E’ • Create a new energy function:

  13. Discussion on E’ • E’ has global information: it depends on the whole structure, so it cannot be split into independent loop terms • Global information therefore precludes the use of dynamic programming (MFE, partition function) • That leaves (stochastic) combinatorial optimization • Gradient descent (no E/S) • Genetic algorithms / simulated annealing

  14. Step IV - Genetic Algorithm • RNA structure prediction by GA • Input: a sequence • Output: the structure for that sequence that minimizes E’ (i.e. maximizes the fitness -E’) • Steady-state genetic algorithm • Pseudoknots forbidden (treated as conflicts) • Fitness = -E’ • The effect of Similarity(Q, S) diminishes with each generation (pseudo-SA).
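A sketch of the steady-state loop described above, with fitness = -E’. The definition of E’ is not preserved in this transcript, so `make_fitness` assumes a hypothetical form E’(S) = E(S) - lam0 * decay**g * Similarity(Q, S), where the similarity weight shrinks geometrically with the generation g to mirror the pseudo-SA remark. Binary tournament selection and replace-the-worst insertion are also assumptions; the structure representation and the operators are passed in as functions.

```python
import random
from typing import Callable, List, TypeVar

S = TypeVar("S")  # a candidate structure, in any representation


def make_fitness(energy: Callable[[S], float], similarity: Callable[[S], float],
                 lam0: float = 1.0, decay: float = 0.95) -> Callable[[S, int], float]:
    """HYPOTHETICAL E'(S, g) = E(S) - lam0 * decay**g * Similarity(Q, S); fitness = -E'."""
    def fitness(s: S, generation: int) -> float:
        weight = lam0 * decay ** generation  # similarity term fades each generation
        return -(energy(s) - weight * similarity(s))
    return fitness


def steady_state_ga(population: List[S], fitness: Callable[[S, int], float],
                    crossover: Callable[[S, S, random.Random], S],
                    mutate: Callable[[S, random.Random], S],
                    generations: int, rng: random.Random) -> S:
    """Each generation makes one child, which replaces the worst member if fitter."""
    pop = list(population)
    for g in range(generations):
        def score(s: S) -> float:
            return fitness(s, g)
        # Binary tournament selection for each parent.
        p1 = max(rng.sample(pop, 2), key=score)
        p2 = max(rng.sample(pop, 2), key=score)
        child = mutate(crossover(p1, p2, rng), rng)
        worst = min(range(len(pop)), key=lambda k: score(pop[k]))
        if score(child) > score(pop[worst]):
            pop[worst] = child
    return max(pop, key=lambda s: fitness(s, generations))
```

Replacing only the worst member means a good structure is never lost to a single bad child, which is the usual motivation for the steady-state variant over generational replacement.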

  15. Genetic Algorithm - Representation [Figure: a stem between positions 23 and 52, encoded as the tuple (23 52 3 3.2), with fields labeled length, start, end, weight] • Stem-loop representation (Chen et al. 2000) • Window method (EMBOSS Palindrome)
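A possible encoding of the stem-loop representation, assuming the tuple (23 52 3 3.2) reads as start, end, length, weight, so that a stem holds the pairs (start + k, end - k). `stem_pool` is a simple window-style scan for maximal stems; it is not EMBOSS Palindrome's actual algorithm, and its weight (the number of pairs) is a placeholder.

```python
from dataclasses import dataclass

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


@dataclass(frozen=True)
class Stem:
    start: int    # 5'-most base of the left strand
    end: int      # 3'-most base of the right strand
    length: int   # number of stacked base pairs
    weight: float

    def pairs(self) -> list[tuple[int, int]]:
        return [(self.start + k, self.end - k) for k in range(self.length)]


def conflicts(a: Stem, b: Stem) -> bool:
    """True if the stems share a base or cross each other (a pseudoknot)."""
    pa, pb = a.pairs(), b.pairs()
    bases_a = {x for p in pa for x in p}
    if any(x in bases_a for p in pb for x in p):
        return True
    return any(i < k < j < l or k < i < l < j for i, j in pa for k, l in pb)


def stem_pool(seq: str, min_len: int = 3, min_loop: int = 3) -> list[Stem]:
    """Collect maximal stems of at least min_len pairs (weight = pair count, hypothetical)."""
    n, pool = len(seq), []
    for i in range(n):
        for j in range(i + min_loop + 1, n):
            # Only start from the outermost pair of a stem.
            if i > 0 and j + 1 < n and (seq[i - 1], seq[j + 1]) in PAIRS:
                continue
            length = 0
            # Extend inward while the next pair is allowed and still encloses a loop.
            while (j - i - 2 * length > min_loop
                   and (seq[i + length], seq[j - length]) in PAIRS):
                length += 1
            if length >= min_len:
                pool.append(Stem(i, j, length, float(length)))
    return pool
```

For `"GGGAAAACCC"` the pool contains a single stem, `Stem(0, 9, 3, 3.0)`; nested stems do not conflict, while stems that share a base or cross do.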

  16. Genetic Algorithm - Operators • Mutation • Add a stem from the stem pool to a child • Crossover • Fit stems of P2 into C1 or C2 randomly; placement must be conflict free [Figure: parents P1, P2 and children C1, C2]
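A sketch of the two operators over individuals represented as lists of mutually conflict-free stems (start, end, length). How the children are initialised is not recoverable from the transcript, so this crossover assumes P1's stems are dealt randomly to C1 and C2 before each stem of P2 is offered to a randomly chosen child. `conflicts` treats shared bases and crossing pairs (pseudoknots) as conflicts; all names are hypothetical.

```python
import random

Stem = tuple[int, int, int]  # (start, end, length): pairs (start + k, end - k)


def stem_pairs(s: Stem) -> list[tuple[int, int]]:
    start, end, length = s
    return [(start + k, end - k) for k in range(length)]


def conflicts(a: Stem, b: Stem) -> bool:
    """True if the stems share a base or cross (pseudoknot)."""
    pa, pb = stem_pairs(a), stem_pairs(b)
    if {x for p in pa for x in p} & {x for p in pb for x in p}:
        return True
    return any(i < k < j < l or k < i < l < j for i, j in pa for k, l in pb)


def fits(stem: Stem, individual: list[Stem]) -> bool:
    return not any(conflicts(stem, other) for other in individual)


def mutate(individual: list[Stem], pool: list[Stem], rng: random.Random) -> list[Stem]:
    """Add one random stem from the pool if it can be placed without conflict."""
    stem = rng.choice(pool)
    if stem not in individual and fits(stem, individual):
        return individual + [stem]
    return list(individual)


def crossover(p1: list[Stem], p2: list[Stem],
              rng: random.Random) -> tuple[list[Stem], list[Stem]]:
    """Assumed scheme: deal P1's stems to the children, then fit P2's stems randomly."""
    c1: list[Stem] = []
    c2: list[Stem] = []
    for stem in p1:
        rng.choice((c1, c2)).append(stem)
    for stem in p2:
        child = rng.choice((c1, c2))
        if stem not in child and fits(stem, child):  # placement must be conflict free
            child.append(stem)
    return c1, c2
```

Because P1 is itself conflict free and every P2 stem is checked before placement, both children remain valid pseudoknot-free structures.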

  17. Preliminary Results • E’ does not lead to a drastic speed-up • The genetic algorithm is very slow if the initial population is generated randomly from the stem pool • Use suboptimal foldings for the initial population

  18. Preliminary Results Explained • The real structure is usually very similar to the Tinoco optimal structure. • View E’ as a way of choosing among the suboptimal structures.
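Read this way, E’ acts as a re-ranking of suboptimal foldings. A minimal sketch, assuming each candidate already carries its energy E (e.g. from a suboptimal-folding run) and a precomputed Similarity(Q, S), and using a hypothetical form E’ = E - weight * Similarity(Q, S), since the transcript does not preserve the actual definition.

```python
def choose_structure(candidates: list[tuple[str, float, float]], weight: float = 1.0) -> str:
    """Pick the candidate minimizing the hypothetical E' = E - weight * Similarity(Q, S).

    Each candidate is (dot-bracket, E, Similarity(Q, S)).
    """
    return min(candidates, key=lambda c: c[1] - weight * c[2])[0]
```

With `weight=0.0` this reduces to picking the lowest-energy (MFE) candidate; a positive weight lets a slightly less stable structure win if it is much closer to Q.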

  19. Future Work • More testing on the entire RNase P Database (> 400 structures) • Tune E’ • Accuracy comparison to MFE and Partition Function Algorithms • Parallelize genetic algorithm

  20. END
