
RNA Structure Prediction Including Pseudoknots Based on Stochastic Multiple Context-Free Grammar

PMSB2006, June 18, Tuusula, Finland. Yuki Kato, Hiroyuki Seki and Tadao Kasami, Graduate School of Information Science, Nara Institute of Science and Technology (NAIST).

Presentation Transcript


  1. RNA Structure Prediction Including Pseudoknots Based on Stochastic Multiple Context-Free Grammar PMSB2006, June 18, Tuusula, Finland Yuki Kato, Hiroyuki Seki and Tadao Kasami Graduate School of Information Science, Nara Institute of Science and Technology (NAIST)

  2. NAIST

  3. Table of Contents • Background • Grammatical approach to RNA structure modeling • Model • Stochastic multiple context-free grammar • Algorithms • Parsing and parameter estimation • Experimental results • RNA pseudoknot prediction • Summary

  4. RNA Secondary Structure: Stem-Loop • Complementary base pairs: A•U, G•C • [Figure: a stem-loop, consisting of a stem of stacked base pairs closed by a loop. Connecting the base pairs with arcs gives nested arcs.]

  5. Modeling RNA Secondary Structure by Context-Free Grammar (CFG) • RNA secondary structure can be modeled by the parse structure of a CFG, so structure prediction corresponds to parsing. • Example of CFG rules: [Figure: the secondary structure of the sequence u u c a u c a g a a and the corresponding derivation tree rooted at S.]
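As a rough illustration of why nested structures fit a context-free model, the sketch below recovers base pairs from a structure written in dot-bracket notation using a stack, the same mechanism a pushdown parser for a CFG relies on. The dot-bracket notation, the function name `nested_pairs`, and the structure string assigned to the slide's example sequence are assumptions made for this sketch; the slides draw the structure as a figure instead.

```python
def nested_pairs(structure):
    """Return the base pairs (i, j), 0-based, encoded by a dot-bracket string.

    '(' opens a pair, ')' closes the most recently opened one, '.' is unpaired.
    Only nested (non-crossing) structures can be written this way.
    """
    stack, pairs = [], []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), pos))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


seq = "uucaucagaa"          # sequence shown on the slide
structure = "(((....)))"    # assumed annotation: a three-pair stem closing a loop
pairs = nested_pairs(structure)
paired_bases = [(seq[i], seq[j]) for i, j in pairs]
```

With the assumed annotation, every recovered pair joins complementary bases (u with a, c with g), and each pair lies strictly inside the previous one.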

  6. RNA Secondary Structure: Pseudoknot • CFGs cannot represent pseudoknots. • [Figure: a pseudoknot. Connecting the base pairs with arcs gives crossed arcs.]
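The crossed-arc condition can be stated concretely: two base pairs (i, j) and (k, l) cross when i < k < j < l. A minimal sketch of that check follows; the function name `has_crossing` and the example pair lists are hypothetical.

```python
from itertools import combinations


def has_crossing(pairs):
    """Return True if any two base pairs cross, i.e. i < k < j < l."""
    # Normalize each pair so the smaller index comes first, then sort by it.
    ordered = sorted((min(a, b), max(a, b)) for a, b in pairs)
    for (i, j), (k, l) in combinations(ordered, 2):
        if i < k < j < l:
            return True
    return False


stem_loop = [(0, 9), (1, 8), (2, 7)]      # nested arcs only
pseudoknot = [(0, 5), (1, 4), (3, 8)]     # (0, 5) and (3, 8) cross
```

A structure for which `has_crossing` returns True contains a pseudoknot in the sense of the slide, and a structure for which it returns False is purely nested.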

  7. Early Studies n: sequence length

  8. Early Studies (cont.) • Grammars for fully describing RNA pseudoknots: • SL-TAG and ESL-TAG [Uemura et al., 1999] • RPG [Rivas and Eddy, 2000] • These grammars have been identified as subclasses of multiple context-free grammars. [Kato et al., 2005]

  9. Motivation • Multiple context-free grammar (MCFG): • Natural extension of CFG • Easy to compare generative power and design algorithms • Generative power to represent pseudoknots • Polynomial time parsing algorithm • We have shown a candidate subclass of the minimum grammars of MCFGs for representing pseudoknots. [Kato et al., 2005]

  10. What’s New in the Present Work • Extension of MCFGs to a probabilistic model (stochastic MCFG, SMCFG) • Design of polynomial time parsing and parameter estimation algorithms for the subclass of SMCFGs • Experiments on RNA pseudoknot prediction

  11. Early Studies and Present Work

  12. Table of Contents • Background • Grammatical approach to RNA structure modeling • Model • Stochastic multiple context-free grammar • Algorithms • Parsing and parameter estimation • Experimental results • RNA pseudoknot prediction • Summary

  13. Relation between SMCFG and Major Probabilistic Models • [Figure: grammars ordered by generative power from weak to strong (FA, CFG, MCFG) with their probabilistic extensions (HMM, SCFG, SMCFG), and example applications: gene finding, stem-loop, pseudoknot.]

  14. From HMM to SCFG

  15. Stochastic Multiple Context-Free Grammar (SMCFG) • G = (N, T, F, P, S) N: finite set of nonterminals, T: finite set of terminals, F: finite set of functions, P: finite set of rules with probabilities, S ∈ N: start symbol

  16. Functions of SMCFG • Example:

  17. Rules of SMCFG • Rule: • Each rule has a probability that the rule is applied. • The probabilities of the rules with the same left-hand side must sum to one. • Example:
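The normalization constraint on rule probabilities can be checked mechanically. The sketch below assumes a hypothetical rule table (the nonterminals, function names, and probabilities are invented for illustration and are not the grammar from the talk) and verifies that the probabilities sharing a left-hand side sum to one.

```python
import math
from collections import defaultdict

# Hypothetical rules: (left-hand side, function name, right-hand side, probability)
RULES = [
    ("A", "f1", ("B",), 0.8),
    ("A", "f2", (), 0.2),
    ("B", "g1", ("A",), 0.5),
    ("B", "g2", (), 0.5),
]


def is_normalized(rules, tol=1e-9):
    """Return True if, for every left-hand side, the rule probabilities sum to 1."""
    totals = defaultdict(float)
    for lhs, _func, _rhs, prob in rules:
        totals[lhs] += prob
    return all(math.isclose(total, 1.0, abs_tol=tol) for total in totals.values())
```

A small tolerance is used because the probabilities are floating-point numbers.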

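  18. Derivation Trees in SMCFG • [Figure: a derivation tree whose root is labeled A: f and whose subtrees are rooted at A1, …, Ak with probabilities p1, …, pk; the probability of the whole tree is shown at the root.]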

  19. Modeling Pseudoknot by SMCFG • Functions: UP2La[(x1, x2)] = (x1, ax2), UP2Ru[(x1, x2)] = (x1, x2u) • [Figure: derivation tree in which A derives (ag, cu) with prob. 0.7, B derives (ag, acu) with prob. 0.35, and A derives (ag, acuu) with prob. 0.28.]
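The derivation on this slide can be replayed step by step. In the sketch below, the two functions follow the definitions on the slide, while the per-rule probabilities 0.5 and 0.8 are assumptions inferred from the ratios of the subtree probabilities shown (0.35 / 0.7 and 0.28 / 0.35); the helper `derive` is hypothetical.

```python
def up2la(pair):
    """UP2La[(x1, x2)] = (x1, a x2): prepend 'a' to the second component."""
    x1, x2 = pair
    return (x1, "a" + x2)


def up2ru(pair):
    """UP2Ru[(x1, x2)] = (x1, x2 u): append 'u' to the second component."""
    x1, x2 = pair
    return (x1, x2 + "u")


# (nonterminal on the left-hand side, function, assumed rule probability)
STEPS = [("B", up2la, 0.5), ("A", up2ru, 0.8)]


def derive(start_pair, start_prob, steps):
    """Apply each function in turn, multiplying in the rule probability."""
    pair, prob = start_pair, start_prob
    trace = [(pair, prob)]
    for _nonterminal, func, rule_prob in steps:
        pair = func(pair)
        prob *= rule_prob
        trace.append((pair, prob))
    return trace


# Start from the subtree in which A derives (ag, cu) with probability 0.7.
trace = derive(("ag", "cu"), 0.7, STEPS)
```

Because a nonterminal derives a pair of strings rather than a single string, bases added to the second component can later pair with bases in the first component, which is what allows crossing dependencies.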

  20. SMCFG for RNA Pseudoknot Modeling • W1, …, Wm: nonterminals • Note: W1 is the start symbol. • For each rule, two real values called transition probability p1 (0 < p1 ≤ 1) and emission probability p2 (0 < p2 ≤ 1) are specified. • Probability of each rule is defined as

  21. SMCFG Gs

  22. Table of Contents • Background • Grammatical approach to RNA structure modeling • Model • Stochastic multiple context-free grammar • Algorithms • Parsing and parameter estimation • Experimental results • RNA pseudoknot prediction • Summary

  23. Algorithms for SMCFG • CYK algorithm calculates the optimal alignment of a sequence to an SMCFG (the most likely derivation tree). • Inside algorithm calculates the probability of a sequence given an SMCFG. • Inside-outside algorithm estimates optimal probability parameters for an SMCFG given a set of example sequences.
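The difference between the CYK and inside computations is easiest to see on an ordinary SCFG, where the same dynamic-programming table is filled with either a maximum or a sum. The sketch below is a simplified analog only, not the SMCFG algorithm from the talk: it uses a hypothetical ambiguous toy grammar in Chomsky normal form (S → S S with probability 0.3, S → a with probability 0.7), works with plain probabilities rather than log probabilities, and omits both traceback and the outside pass needed for parameter estimation.

```python
import operator

# Hypothetical toy SCFG: (lhs, left child, right child) -> prob, (lhs, terminal) -> prob
BINARY = {("S", "S", "S"): 0.3}
LEXICAL = {("S", "a"): 0.7}


def fill_chart(seq, combine):
    """Fill a CYK-style table over spans [i, j) and return the score of S.

    combine=max gives the probability of the most likely derivation tree,
    combine=operator.add gives the total probability of the sequence (inside).
    """
    n = len(seq)
    table = [[{} for _ in range(n + 1)] for _ in range(n + 1)]
    for i, sym in enumerate(seq):
        for (lhs, term), p in LEXICAL.items():
            if term == sym:
                table[i][i + 1][lhs] = p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            cell = table[i][j]
            for k in range(i + 1, j):  # split point between the two children
                for (lhs, left, right), p in BINARY.items():
                    if left in table[i][k] and right in table[k][j]:
                        score = p * table[i][k][left] * table[k][j][right]
                        cell[lhs] = combine(cell.get(lhs, 0.0), score)
    return table[0][n].get("S", 0.0)


best = fill_chart("aaa", max)            # most likely single derivation
total = fill_chart("aaa", operator.add)  # sum over all derivations
```

The string "aaa" has two derivation trees of equal probability under this grammar, so the inside value is twice the CYK value.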

  24. CYK Algorithm • Input: • The following are calculated by dynamic programming: • : log maximum probability that Wv generates • : log maximum probability that Wy generates

  25. CYK Algorithm (cont.) • Output: log maximum probability that W1 generates i.e. • : the most likely derivation tree • : entire set of probability parameters

  26. Algorithm [CYK] • Initialization: for i ← 1 to n+1, j ← i to n+1, v ← 1 to m do if // ε: empty sequence then else • Iteration: for i ← n downto 1, j ← i−1 to n, k ← n+1 downto j+1, l ← k−1 to n, v ← 1 to m // Some examples are shown.

  27. Algorithm [CYK] (cont.) • if • [Figure: Wv built from Wy and Wz; sequence positions 1, i, h, h+1, j, k, l, n; components x1, x21, x22.]

  28. Wv Wy l1 i k 1 i+1 j l n Algorithm [CYK] (cont.) • if ai x1 x2 al

  29. Complexity of CYK Algorithm • m: # of nonterminals (m = a+b) • n: sequence length • Time complexity: O(amn^4 + bn^5) • Space complexity: O(mn^4)

  30. Table of Contents • Background • Grammatical approach to RNA structure modeling • Model • Stochastic multiple context-free grammar • Algorithms • Parsing and parameter estimation • Experimental results • RNA pseudoknot prediction • Summary

  31. Experimental Method • [Figure: construction of a model: sample sequences with structure annotation from an RNA family database are used to build an SMCFG. Secondary structure prediction: a test sequence is parsed against the SMCFG with the CYK algorithm.]

  32. Data Sets for Experiments • Three viral RNA families including pseudoknots from Rfam ver. 7.0

  33. Corona_pk_3 in Rfam ver. 7.0 • Coronavirus 3' UTR pseudoknot • Sequence length: 62–64 • [Figure: consensus structure]

  34. HDV_ribozyme in Rfam ver. 7.0 • Hepatitis delta virus ribozyme • Sequence length: 87–91 • [Figure: consensus structure]

  35. Tombus_3_IV in Rfam ver. 7.0 • Tombusvirus 3' UTR region IV • Sequence length: 89–92 • [Figure: consensus structure]

  36. Evaluation for Prediction Results • precision = (# of correct base pairs predicted by the algorithm) / (# of predicted base pairs) • recall = (# of correct base pairs predicted by the algorithm) / (# of base pairs specified by the annotation)
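These two measures can be computed directly from sets of base pairs. A minimal sketch follows; the function name `precision_recall`, the convention of returning 0.0 for an empty denominator, and the example pairs are assumptions made for illustration, not data from the experiments.

```python
def precision_recall(predicted, annotated):
    """Compute (precision, recall) over base pairs given as (i, j) tuples."""
    predicted, annotated = set(predicted), set(annotated)
    correct = len(predicted & annotated)  # predicted pairs that match the annotation
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(annotated) if annotated else 0.0
    return precision, recall


# Hypothetical example: 3 predicted pairs, 4 annotated pairs, 2 in common.
predicted = [(0, 9), (1, 8), (2, 6)]
annotated = [(0, 9), (1, 8), (2, 7), (3, 6)]
precision, recall = precision_recall(predicted, annotated)
```

In this example two of the three predicted pairs are correct and two of the four annotated pairs are recovered.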

  37. Experimental Results • Prediction accuracy

  38. Experimental Results (cont.) • Running time *: Implementation in ANSI C on a machine with Intel Pentium D CPU 2.80 GHz and 2.00 GB RAM

  39. Pair Stochastic Tree Adjoining Grammar (PSTAG) [MSS05] • [Figure: a sample sequence with structure annotation from an RNA family database gives a derivation tree representing the known structure; the PSTAG algorithm aligns a test sequence to it to predict the secondary structure.] • [MSS05] Matsui et al., “Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures,” Bioinformatics, 2005.

  40. Comparison with PSTAG

  41. Summary • A new probabilistic model called SMCFG has been proposed for RNA pseudoknot modeling. • Polynomial time parsing and parameter estimation algorithms have been designed. • Experimental results on RNA pseudoknot prediction have shown good prediction accuracy.
