
CISC 467/667 Intro to Bioinformatics (Spring 2007) RNA secondary structure


Presentation Transcript


  1. CISC 467/667 Intro to Bioinformatics (Spring 2007): RNA secondary structure. CISC667, S07, Lec19, Liao

  2. Facts about RNAs
     • Mainly as “information carrier” in protein synthesis
       • mRNA
       • tRNA
     • Also as catalysts
       • Ribozymes
     • Also as regulators
       • RNAi
     • Single-stranded
       • Intra-molecular pairing
       • Secondary structure
         • Essential for sequence stability and function
         • Similarity in secondary structure plays a more important role than primary sequence in determining homology for RNAs

  6. RNA stem-loop hybridization
     Pairing: A-U and C-G, where “-” stands for a pairing and “x” for no pairing.

     seq1  C A G G A A A C U G
     seq2  G C U G C A A A G C
     seq3  G C U G C A A C U G

     Pairing of the two ends, from the outside inward:
     seq1: C-G, A-U, G-C (loop G A A A)
     seq2: G-C, C-G, U-A (loop G C A A)
     seq3: G x G, C x U, U x C

     In the above example, seq1 and seq2 fold into a similar structure, whereas seq3 does not. Pairwise alignments that disregard such structural restrictions may be misleading: seq2 and seq3 have 70% sequence identity and seq1 and seq3 have 60%, whereas seq1 and seq2 have only 30%.
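
The identity figures and the pairing pattern on this slide can be checked with a few lines of Python. This is a minimal sketch, not part of the lecture: the function names `percent_identity` and `stem_length` are hypothetical, the sequences are compared position by position without gaps, and only the A-U and C-G pairs named on the slide are allowed.

```python
# Only the Watson-Crick pairs named on the slide (assumption: no G-U wobble).
WATSON_CRICK = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}

SEQS = {
    "seq1": "CAGGAAACUG",
    "seq2": "GCUGCAAAGC",
    "seq3": "GCUGCAACUG",
}


def percent_identity(a, b):
    """Percentage of positions at which two equal-length sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def stem_length(seq):
    """Number of consecutive complementary pairs, counted from both ends inward."""
    i, j = 0, len(seq) - 1
    pairs = 0
    while i < j and (seq[i], seq[j]) in WATSON_CRICK:
        pairs += 1
        i += 1
        j -= 1
    return pairs


if __name__ == "__main__":
    for x, y in [("seq2", "seq3"), ("seq1", "seq3"), ("seq1", "seq2")]:
        print(x, y, percent_identity(SEQS[x], SEQS[y]))
    for name, seq in SEQS.items():
        print(name, "stem pairs:", stem_length(seq))
```

Under these assumptions seq1 and seq2 each close a three-pair stem while seq3 closes none, even though seq3 is the closest to both by sequence identity.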

  7. Representations for palindromic structures
     1) Stem-loop drawing (figure not recovered): stem C-G, A-U, G-C closing the loop G A A A
     2) Linear sequence (pairing marks from the figure not recovered):
        C A G G A A A C U G
     3) Parenthesis notation:
        C A G G A A A C U G
        ( ( ( . . . . ) ) )
     There is a one-to-one correspondence between RNA secondary structures and well-balanced parenthesis expressions, where the balancing parentheses correspond to base pairings via hydrogen bonds.
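
The correspondence between balanced parentheses and base pairs can be made concrete with a stack. The sketch below is illustrative only; the function name `pairs_from_dot_bracket` is hypothetical, and positions are reported 1-based to match the slides.

```python
def pairs_from_dot_bracket(structure):
    """Return the (i, j) base pairs, 1-based, encoded by a dot-bracket string."""
    stack = []   # positions of '(' still waiting for a partner
    pairs = []
    for pos, ch in enumerate(structure, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % pos)
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError("unexpected character %r" % ch)
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)


if __name__ == "__main__":
    seq = "CAGGAAACUG"
    for i, j in pairs_from_dot_bracket("(((....)))"):
        print(i, j, seq[i - 1] + "-" + seq[j - 1])
```

For the structure on this slide the pairs are (1, 10), (2, 9) and (3, 8), that is C-G, A-U and G-C.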

  8. Secondary structure prediction: base pair maximization algorithm [Nussinov]
     $m_{i,j}$: maximal number of base pairs that can be formed for the subsequence $s_i \dots s_j$.
     $d_{i,j} = 1$ if $s_i$ and $s_j$ can pair, and $d_{i,j} = 0$ otherwise.
     Initialization:
       $m_{i,i-1} = 0$ for $i = 2, \dots, L$
       $m_{i,i} = 0$ for $i = 1, \dots, L$
     Recursion:
       $m_{i,j} = \max \{\, m_{i+1,j},\; m_{i,j-1},\; m_{i+1,j-1} + d_{i,j},\; \max_{i<k<j} [\, m_{i,k} + m_{k+1,j} \,] \,\}$
     The last term is the bifurcation case.
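
The recursion translates almost line by line into a table-filling loop. The following is a minimal sketch under stated assumptions: the function name `nussinov_table` is hypothetical, only A-U and C-G pairs count (no G-U wobble), and no minimum hairpin loop length is enforced, so adjacent bases may pair.

```python
# Assumption: only the Watson-Crick pairs A-U and C-G contribute.
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def nussinov_table(seq, pairs=PAIRS):
    """Fill m[i][j] (1-based) with the maximal number of pairs in s_i..s_j."""
    L = len(seq)
    # Padded table: the zero entries already give m[i][i-1] = m[i][i] = 0.
    m = [[0] * (L + 2) for _ in range(L + 2)]
    for span in range(1, L):                 # span = j - i
        for i in range(1, L - span + 1):
            j = i + span
            d = 1 if (seq[i - 1], seq[j - 1]) in pairs else 0
            best = max(m[i + 1][j],          # s_i unpaired
                       m[i][j - 1],          # s_j unpaired
                       m[i + 1][j - 1] + d)  # s_i paired with s_j
            for k in range(i + 1, j):        # bifurcation
                best = max(best, m[i][k] + m[k + 1][j])
            m[i][j] = best
    return m


if __name__ == "__main__":
    table = nussinov_table("GGGAAAUCC")
    print(table[1][9])   # maximal number of pairs for the whole sequence
```

Filling the table by increasing span guarantees that every entry on the right-hand side of the recursion is already available when $m_{i,j}$ is computed.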

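  9. Dynamic programming table for the sequence G G G A A A U C C

              1  2  3  4  5  6  7  8  9
              G  G  G  A  A  A  U  C  C
       1  G   0  0  0  0  0  0  1  2  3
       2  G      0  0  0  0  0  1  2  3
       3  G         0  0  0  0  1  2  2
       4  A            0  0  0  1  1  1
       5  A               0  0  1  1  1
       6  A                  0  1  1  1
       7  U                     0  0  0
       8  C                        0  0
       9  C                           0

     One optimal traceback gives three base pairs (G-C, G-C and A-U), leaving one G and the two loop bases A A unpaired.
     Time complexity: $O(L^3)$, where $L$ is the sequence length.
     Space complexity: $O(L^2)$.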
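
Once the table is filled, a structure is recovered by walking back from $m_{1,L}$. The sketch below is one possible traceback, not necessarily the one drawn on the slide: the names `fill` and `traceback` are hypothetical, only A-U and C-G pairs are allowed, and the pairing case is tried first, which decides which of several equally good structures is returned.

```python
PAIRS = {("A", "U"), ("U", "A"), ("C", "G"), ("G", "C")}


def fill(seq):
    """Base pair maximization table, 1-based, padded with zeros."""
    L = len(seq)
    m = [[0] * (L + 2) for _ in range(L + 2)]
    for span in range(1, L):
        for i in range(1, L - span + 1):
            j = i + span
            d = 1 if (seq[i - 1], seq[j - 1]) in PAIRS else 0
            split = max((m[i][k] + m[k + 1][j] for k in range(i + 1, j)),
                        default=0)
            m[i][j] = max(m[i + 1][j], m[i][j - 1], m[i + 1][j - 1] + d, split)
    return m


def traceback(seq):
    """Return one optimal structure in dot-bracket notation."""
    L = len(seq)
    m = fill(seq)
    structure = ["."] * L
    stack = [(1, L)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if (seq[i - 1], seq[j - 1]) in PAIRS and m[i + 1][j - 1] + 1 == m[i][j]:
            structure[i - 1], structure[j - 1] = "(", ")"   # s_i pairs with s_j
            stack.append((i + 1, j - 1))
        elif m[i + 1][j] == m[i][j]:                        # s_i unpaired
            stack.append((i + 1, j))
        elif m[i][j - 1] == m[i][j]:                        # s_j unpaired
            stack.append((i, j - 1))
        else:                                               # bifurcation
            for k in range(i + 1, j):
                if m[i][k] + m[k + 1][j] == m[i][j]:
                    stack.append((k + 1, j))
                    stack.append((i, k))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(traceback("GGGAAAUCC"))
```

Changing the order in which the four cases are tested yields other structures with the same number of pairs, which is why the table alone does not determine a unique fold.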

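  10. Secondary structure prediction: minimum energy algorithm [Zuker]
      $E_{i,j}$: minimum energy of an optimal fold formed by the subsequence $s_i \dots s_j$.
      $e_{i,j} = -5$ if $s_i, s_j$ is CG or GC
      $e_{i,j} = -4$ if $s_i, s_j$ is AU or UA
      $e_{i,j} = -1$ if $s_i, s_j$ is GU or UG
      Initialization:
        $E_{i,i-1} = 0$ for $i = 2, \dots, L$
        $E_{i,i} = 0$ for $i = 1, \dots, L$
      Recursion:
        $E_{i,j} = \min \{\, E_{i+1,j},\; E_{i,j-1},\; E_{i+1,j-1} + e_{i,j},\; \min_{i<k<j} [\, E_{i,k} + E_{k+1,j} \,] \,\}$
      The last term is the bifurcation case; the term $E_{i+1,j-1} + e_{i,j}$ applies only when $s_i$ and $s_j$ form one of the pairs listed above.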
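
The simplified energy recursion on this slide has the same shape as base pair maximization, with `min` in place of `max` and pair energies in place of counts. A minimal sketch follows; the function name `min_energy` is hypothetical, the energies are the three values from the slide, and no loop or stacking terms are modeled, so this is far simpler than a full nearest-neighbor energy model.

```python
# Pair energies from the slide; any other combination cannot pair.
ENERGY = {
    ("C", "G"): -5, ("G", "C"): -5,
    ("A", "U"): -4, ("U", "A"): -4,
    ("G", "U"): -1, ("U", "G"): -1,
}


def min_energy(seq):
    """Minimum energy of a fold of seq under the simplified pair energies."""
    L = len(seq)
    # Padded 1-based table: zeros give E[i][i-1] = E[i][i] = 0.
    E = [[0] * (L + 2) for _ in range(L + 2)]
    for span in range(1, L):
        for i in range(1, L - span + 1):
            j = i + span
            best = min(E[i + 1][j], E[i][j - 1])
            e = ENERGY.get((seq[i - 1], seq[j - 1]))
            if e is not None:                    # s_i and s_j can pair
                best = min(best, E[i + 1][j - 1] + e)
            for k in range(i + 1, j):            # bifurcation
                best = min(best, E[i][k] + E[k + 1][j])
            E[i][j] = best
    return E[1][L]


if __name__ == "__main__":
    print(min_energy("GGGAAAUCC"))
```

For G G G A A A U C C the minimum comes from two G-C pairs and one A-U pair, which is preferred over any fold that uses the weaker G-U pair.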

  11. Motivation:
      • Both DNA and protein sequences can be considered as sentences in a “language”.
      • We know the “language” is not random.
      • What is the grammar of the language?
      • “The linguistics of DNA”, David Searls, American Scientist, 80 (1992) 579.

  12. Chomsky hierarchy of transformational grammars (1956)
      • Regular grammars (no recursive structure, cannot handle palindromes)
        • $W \to aW$, or $W \to a$
      • Context-free
        • $W \to \beta$
      • Context-sensitive
        • $\alpha_1 W \alpha_2 \to \alpha_1 \beta \alpha_2$
      • Unrestricted grammars
        • $\alpha_1 W \alpha_2 \to \gamma$
      Notation: $a$: any terminal; $\alpha$, $\gamma$: any string of non-terminals and/or terminals, including the null string; $\beta$: any string of non-terminals and/or terminals, not including the null string.
      Note: Chomsky normal form: any CFG can be written such that all productions have one of the following forms:
        $W_v \to W_y W_z$
        $W_v \to a$

  13. seq1  C A G G A A A C U G
      seq2  G C U G C A A A G C
      To capture the palindromic structure, a context-free grammar can be given as follows.
        $S \to aW_1u \mid cW_1g \mid gW_1c \mid uW_1a$
        $W_1 \to aW_2u \mid cW_2g \mid gW_2c \mid uW_2a$
        $W_2 \to aW_3u \mid cW_3g \mid gW_3c \mid uW_3a$
        $W_3 \to gaaa \mid gcaa$
      Parse of seq1: $S \Rightarrow cW_1g \Rightarrow caW_2ug \Rightarrow cagW_3cug \Rightarrow caggaaacug$
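
Because every production of S, W1 and W2 wraps a complementary pair around the next non-terminal, membership in this particular language can be tested by peeling three pairs off the ends and checking what remains against W3. The sketch below is specific to this grammar and is not a general CFG parser; the names `derives`, `COMPLEMENT` and `LOOPS` are hypothetical.

```python
COMPLEMENT = {"a": "u", "c": "g", "g": "c", "u": "a"}
LOOPS = {"gaaa", "gcaa"}          # right-hand sides of W3


def derives(seq):
    """True if seq can be derived from S in the stem-loop grammar."""
    s = seq.lower()
    for _ in range(3):            # one complementary pair each for S, W1, W2
        if len(s) < 2 or COMPLEMENT.get(s[0]) != s[-1]:
            return False
        s = s[1:-1]
    return s in LOOPS             # W3 -> gaaa | gcaa


if __name__ == "__main__":
    for name, seq in [("seq1", "CAGGAAACUG"),
                      ("seq2", "GCUGCAAAGC"),
                      ("seq3", "GCUGCAACUG")]:
        print(name, derives(seq))
```

seq1 and seq2 are accepted and seq3 is rejected, which mirrors the folding behavior of the three sequences.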

  14. Stochastic grammars
      Motivations:
      • Irregularities (or exceptions to the grammar) in languages.
      • Such irregularities can be taken care of by “growing” the grammar: introducing new non-terminals and new production rules.
      • It is useful to differentiate productions that account for a large portion of the language from those that are rare exceptions.
      • This can be achieved by assigning probabilities to the various productions.
      • For any non-terminal, the probabilities of all its possible productions must add up to 1.
      • Given a sentence $x$, a stochastic grammar $\theta$ parsing the sentence will assign a probability $P(x \mid \theta)$ that the sentence belongs to the language specified by the grammar, whereas a conventional grammar will just give a yes-or-no answer.
      • $\sum_{x \in \text{language}} P(x \mid \theta) = 1$
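
To see the normalization at work, probabilities can be attached to a small palindromic stem-loop grammar (three nested complementary pairs around a loop of gaaa or gcaa). The numbers below are invented for illustration and are not from the lecture; the sketch also assumes the three pair-emitting non-terminals share one probability table, and the names `PAIR_PROB`, `LOOP_PROB`, `sequence_probability` and `language` are hypothetical. Since each sentence of this grammar has exactly one derivation, $P(x \mid \theta)$ is the product of the probabilities of the productions used.

```python
import itertools

COMPLEMENT = {"a": "u", "c": "g", "g": "c", "u": "a"}
# Hypothetical production probabilities, keyed by the left base of the pair.
PAIR_PROB = {"a": 0.2, "c": 0.3, "g": 0.3, "u": 0.2}   # adds up to 1
LOOP_PROB = {"gaaa": 0.6, "gcaa": 0.4}                 # adds up to 1


def sequence_probability(seq):
    """P(x | theta) for the stem-loop grammar; 0.0 if x is not in the language."""
    s = seq.lower()
    p = 1.0
    for _ in range(3):                     # three pair-emitting productions
        if len(s) < 2 or COMPLEMENT.get(s[0]) != s[-1]:
            return 0.0
        p *= PAIR_PROB[s[0]]
        s = s[1:-1]
    return p * LOOP_PROB.get(s, 0.0)       # loop production


def language():
    """Enumerate every sentence the grammar can generate (4 * 4 * 4 * 2 = 128)."""
    for a, b, c in itertools.product("acgu", repeat=3):
        for loop in LOOP_PROB:
            yield a + b + c + loop + COMPLEMENT[c] + COMPLEMENT[b] + COMPLEMENT[a]


if __name__ == "__main__":
    print(sequence_probability("CAGGAAACUG"))
    print(sum(sequence_probability(x) for x in language()))
```

Summing over all 128 sentences gives 1 up to floating-point rounding, as required, while a sequence outside the language receives probability 0.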

  15. Stochastic CFG for sequence modeling
      • Calculate the optimal alignment of a sequence to a parameterized SCFG. [Cocke-Younger-Kasami algorithm]
      • Calculate the probability for a given sequence to belong to the language specified by the SCFG. [inside algorithm, and inside-outside algorithm]
      • Given a set of example sequences/structures, estimate optimal probability parameters for an SCFG.
      Note:
      • The issues addressed above by SCFGs are quite similar to those for HMMs; in fact, it can be proved that HMMs are equivalent to stochastic regular grammars.
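
The second task, computing the probability of a sequence under an SCFG, can be sketched for a grammar in Chomsky normal form. The toy grammar below is invented for illustration (it generates a^n u^n with n >= 1) and is not from the lecture; the names `BINARY`, `UNARY` and `inside` are hypothetical. The inside value for (i, j, V) is the probability that non-terminal V derives the subsequence from position i to position j.

```python
from collections import defaultdict

# Hypothetical toy SCFG in Chomsky normal form.
# S -> A T (0.6) | A U (0.4);  T -> S U (1.0);  A -> 'a' (1.0);  U -> 'u' (1.0)
BINARY = {("S", "A", "T"): 0.6, ("S", "A", "U"): 0.4, ("T", "S", "U"): 1.0}
UNARY = {("A", "a"): 1.0, ("U", "u"): 1.0}


def inside(seq, binary=BINARY, unary=UNARY, start="S"):
    """Probability that the grammar generates seq, summed over all parses."""
    n = len(seq)
    if n == 0:
        return 0.0
    # alpha[i][j][V]: probability that V derives seq[i..j] (0-based, inclusive).
    alpha = [[defaultdict(float) for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(seq):
        for (v, terminal), p in unary.items():
            if terminal == ch:
                alpha[i][i][v] += p
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for (v, y, z), p in binary.items():
                for k in range(i, j):     # split point between Y and Z
                    alpha[i][j][v] += p * alpha[i][k][y] * alpha[k + 1][j][z]
    return alpha[0][n - 1][start]


if __name__ == "__main__":
    for s in ["au", "aauu", "aau"]:
        print(s, inside(s))
```

Replacing the sum over split points and productions with a maximum (and recording the argmax) turns the same table into a CYK-style search for the single most probable parse.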

  16. Resources
      • Lecture notes: http://www.bioinfo.rpi.edu/~zukerm/lectures/RNAfold-html/
      • RNA informatics: http://www-lbit.iro.umontreal.ca/RNA_Links/RNA.shtml
      • Software: Vienna RNA fold
