
Michal Ziv-Ukelson New Tools for Comparative Structural RNAomics

Michal Ziv-Ukelson New Tools for Comparative Structural RNAomics. RNA Structure, Dimensions 1-3: Folding. Bioinformatic structural witnesses for RNA functionality. Witness 1: Structure Stability. Witness 2: Sequence/Structure Conservation.


Presentation Transcript


  1. Michal Ziv-Ukelson New Tools for Comparative Structural RNAomics

  2. RNA Structure, Dimensions 1-3: Folding

  3. Bioinformatic structural witnesses for RNA functionality • Witness 1: Structure Stability. • Witness 2: Sequence/Structure Conservation (within the structural context). • Witness 3: Structure Conservation.

  4. Structural Cis-Elements: Purine Riboswitch

  5. Structural Cis-Elements: Purine Riboswitch. [Mandal et al., 2003] predicted a potential pseudoknot between the two arms of the purine riboswitch aptamer. (Figure labels: “GGUAU”, “CCGUA”.)

  6. Witness 1: Stability of Structure (2D, predicted). Example sequences: AUCCCCGUAUCGAUC AAAAUCCAUGGGUACCCUAGUGAAAGUGUA UAUACGUGCUCUGAU UCUUUACUGAGGAGU CAGUGAACGAACUGA RNA Secondary Structure Prediction: O(N^3) [Nussinov-Jacobson 1980, Zuker-Stiegler 1981] MFOLD: http://www.rpi.edu/~zukerm Vienna RNA Package: http://www.tbi.univie.ac.at/~ivo/RNA
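The O(N^3) bound quoted on this slide can be illustrated with a small dynamic program in the spirit of Nussinov-Jacobson. This is only a sketch: it maximizes the number of nested base pairs rather than minimizing free energy as MFOLD and the Vienna RNA Package do, and the function name `nussinov` and the `min_loop` parameter are assumptions made for illustration.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def nussinov(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (O(N^3) time, O(N^2) space).

    min_loop is the minimum number of unpaired bases enclosed by a pair
    (an assumed hairpin constraint, not taken from the slides).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]          # dp[i][j]: best score on seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]               # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in PAIRS:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]

print(nussinov("GGGAAAUCC"))
```

The three nested loops (span, start position, pairing partner) are where the cubic running time comes from.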

  7. Witness 2: Sequence Conservation (e.g., in binding sites). Lactobacillus acidophilus → Lactobacillus delbrueckii: GGUAU → GGUAU, CCGUA → CCGUA

  8. Witness 3: Compensatory Mutations (in stems). Lactobacillus acidophilus → Lactobacillus delbrueckii: G-U → U-A

  9. Witness 3: Compensatory Mutations (in stems). Lactobacillus acidophilus → Lactobacillus delbrueckii: G-C → C-G
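As a rough illustration of Witnesses 2 and 3, the sketch below classifies one paired stem position compared across two homologous sequences. The function name and the three category names ("conserved", "consistent", "compensatory") are assumptions chosen for this example; they are not taken from the slides.

```python
# Canonical Watson-Crick pairs plus the G-U wobble pair.
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def classify_stem_position(pair_a, pair_b):
    """Compare the same stem position in two homologous sequences.

    pair_a and pair_b are (5' base, 3' base) tuples. Returns a label:
      conserved     both bases identical (sequence conservation)
      consistent    one base changed, pairing still possible
      compensatory  both bases changed, pairing still possible
    """
    if pair_a not in CANONICAL or pair_b not in CANONICAL:
        return "not paired in both"
    changed = sum(x != y for x, y in zip(pair_a, pair_b))
    return {0: "conserved", 1: "consistent", 2: "compensatory"}[changed]

# The two examples from the slides: G-U -> U-A and G-C -> C-G.
print(classify_stem_position(("G", "U"), ("U", "A")))
print(classify_stem_position(("G", "C"), ("C", "G")))
```

Both slide examples change both bases while keeping the pair intact, which is the pattern that supports a conserved stem.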

  10. Three Approaches to Structural RNA Comparative Analysis. Input: homologous RNA sequences. • A: sequence alignment (T-Coffee, ClustalW, Prm) → aligned sequences → fold alignment (RNAalifold, Pfold, ILM) → aligned structures. • B: simultaneous fold and alignment (Sankoff: LocARNA, Foldalign, Dynalign, Carnac, PMcomp) → aligned structures. • C: fold sequences (crystallography/NMR, MFE prediction) → homologous RNA secondary structures → structure alignment (RNAforester, MARNA) → aligned structures.

  11. Approach A to Structural RNA Comparative Analysis [Giegerich-2004]. Homologous RNA sequences → sequence alignment (T-Coffee, ClustalW, Prm): Witness 3: Sequence Conservation... but without the structural context! → aligned sequences → fold alignment (RNAalifold, Pfold, ILM): Witness 2: Structural Conservation; Witness 1: Structure Stability → aligned structures. (Figure: example sequence alignment.)

  12. Approach A to Structural RNA Comparative Analysis [Giegerich-2004]. Homologous RNA sequences → sequence alignment (T-Coffee, ClustalW, Prm) → aligned sequences → fold alignment (RNAalifold, Pfold, ILM) → aligned structures. • Sequences need to be similar enough that they can be aligned initially. • Yet sequences should be dissimilar enough for co-varying substitutions to be detected! (Figure: example sequence alignment.)

  13. Three Approaches to Structural RNA Comparative Analysis (the A/B/C diagram of slide 10, repeated)

  14. Approach C to Structural RNA Comparative Analysis [Giegerich-2004]. Homologous RNA sequences → fold sequences (crystallography/NMR, MFE prediction, machine learning) → homologous RNA secondary structures → structure alignment (RNAforester, MARNA) → aligned structures.

  15. Approach C to Structural RNA Comparative Analysis [Giegerich-2004]. Homologous RNA sequences (AUCCCCGUAUCGAUC AAAAUCCAUGGGUACCCUAGUGAAAGUGUA UAUACGUGCUCUGAU UCUUUACUGAGGAGU CAGUGAACGAACUGA) → fold sequences (crystallography/NMR, MFE prediction, machine learning): Witness 1: Structure Stability → homologous RNA secondary structures → structure alignment (RNAforester, MARNA): Witness 2: Structural Conservation; Witness 3: Sequence Conservation (within the structural context) → aligned structures. The witnesses are separated into two stages and cannot consult each other! (Figure: aligned structure trees with nodes labeled R, M, H, B, I.)

  16. The problem. Query: an RNA with known sequence/structure. Target: an RNA sequence whose structure is not known; consider its top-ranking suboptimal folding predictions.

  17. Outline • Previously: RNA folding. Now: RNA search • RNA’s structure representations • Approaches to Tree Comparisons • Algorithm for Approximate Labeled Subtree Isomorphism/Homeomorphism • Results

  18. Non-Coding RNA Families • They are only partially conserved in sequence, but they are conserved in structure. • They have a role in regulating gene expression. • Examples: tRNA, rRNA, snoRNA, microRNA, siRNA, riboswitch. (Figure labels: Structure, Function.)

  19. Our Goal. Query: an RNA structure. Target: a genome sequence of millions of nucleotides (ACGCUGACGUAGUCAGUAGACGAC AGACAGAUACGUCACCGCAGAUAC GCAUAGUAGCAGUAGCAGAUGACG ...). Are there any occurrences of this structure in the genome? Discover ncRNA templates in a sequence database.

  20. QUERY Example: Purine Riboswitch family consensus from the RFAM Database (Seed: 133, Full: 2,427)

  21. Alternative search approach

  22. Matan Drori the Riboswitch Hunter

  23. The tool: STRMS (Structural RNA Motif Search). Input: (1) secondary structure of the query, including local sequence and structure constraints, and (2) a target sequence database. Output: all occurrences of the query in the target, ranked by their similarity to the query [in an HTML file]. The tool is flexible and takes into account a large number of sequence options. Our approach combines: • pre-folding with MFOLD (Zuker, 2003) • an RNA pattern matching algorithm [O(mn)] based on subtree homeomorphism for ordered, rooted trees.

  24. Isana Veksler Lublinsky. Veksler-Lublinsky, I., Ziv-Ukelson, M., Barash, D., & Kedem, K. (2007). A structure-based flexible search method for motifs in RNA. Journal of Computational Biology, 14(7), 908-926.

  25. Our method consists of two phases:

  26. RNA’s Secondary Structure (((((((..((((.......)))).(((((.......))))).....(((((.......))))))))))))
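The bracket string on this slide is the usual dot-bracket notation: matching parentheses are paired bases and dots are unpaired bases. A minimal sketch of how it can be read into a pairing table with a stack follows; the function name `pair_table` is an assumption, and the string is the slide's structure written with plain dots.

```python
def pair_table(structure):
    """Return a list where entry i is the partner of position i, or None if unpaired."""
    stack = []
    partner = [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)                   # remember the opening position
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()                   # most recent unmatched '('
            partner[i], partner[j] = j, i
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return partner

# The structure from the slide, assembled piece by piece: an outer stem of 7 pairs
# closing a multiloop with three hairpins.
slide_structure = ("(" * 7 + ".." + "(" * 4 + "." * 7 + ")" * 4 + "."
                   + "(" * 5 + "." * 7 + ")" * 5 + "." * 5
                   + "(" * 5 + "." * 7 + ")" * 12)

table = pair_table(slide_structure)
print(len(slide_structure), sum(p is not None for p in table) // 2)
```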

  27. RNA’s Secondary Structure Graph

  28. Comparison of ordered rooted trees • Trees are among the most common and well-studied combinatorial structures in computer science. In particular, the problem of comparing trees occurs in several diverse areas such as: • computational biology • structured text databases • image analysis • automatic theorem proving • compiler optimization.

  29. What is a labeled tree? • Tree: a connected acyclic graph. • Each node in a labeled tree is assigned a label from a certain alphabet. (Figure: a tree with nodes labeled a, b, c, d, f, g.)

  30. Tree Matching - Grammar A simple parse tree:

  31. RNA Trees?

  32. RNA’s Secondary Structure. Elements: pseudoknot, stem, interior loop, single-stranded region, bulge loop, junction (multiloop), hairpin loop. Image: Wuchty

  33. Ordered rooted tree Shapiro, 1988: • The nodes correspond to elements of secondary structure (hairpin loop, bulge, internal loop or multi-loop). • The edges correspond to base-paired (stem) regions. Zhang, 1998: • The nodes of the tree represent either unpaired bases (leaves) or paired bases (internal nodes). Each node is labeled with a base or a pair of bases, respectively. • Two kinds of edges, alternatively connecting either consecutive stem base-pairs or a leaf base with the last base-pair in the corresponding stem.
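To make the coarse-grained (Shapiro-style) idea concrete, here is a minimal sketch that turns a dot-bracket string into an ordered rooted tree whose nodes are loops and whose edges are stems. The node labels R, H, B, I, M (root, hairpin, bulge, interior loop, multiloop) and all function names are assumptions for illustration; this is neither Shapiro's exact encoding nor the representation used by the authors.

```python
def _partners(structure):
    """Partner index per position (None if unpaired), via a stack."""
    stack, partner = [], [None] * len(structure)
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner

def loop_tree(structure):
    """Return the tree as nested (label, [children]) tuples."""
    partner = _partners(structure)

    def scan(lo, hi):
        """Top-level pairs in [lo, hi) and the unpaired run lengths around them."""
        pairs, gaps, run, k = [], [], 0, lo
        while k < hi:
            if partner[k] is None:
                run, k = run + 1, k + 1
            else:
                gaps.append(run)
                run = 0
                pairs.append((k, partner[k]))
                k = partner[k] + 1            # jump over the enclosed region
        gaps.append(run)
        return pairs, gaps

    def build(i, j):
        # Slide down stacked pairs: a whole stem collapses into one edge.
        while i + 1 < j - 1 and partner[i + 1] == j - 1:
            i, j = i + 1, j - 1
        pairs, gaps = scan(i + 1, j)
        if not pairs:
            label = "H"                       # hairpin loop
        elif len(pairs) == 1:
            label = "I" if gaps[0] and gaps[1] else "B"   # two-sided vs one-sided
        else:
            label = "M"                       # multiloop
        return (label, [build(a, b) for a, b in pairs])

    pairs, _ = scan(0, len(structure))
    return ("R", [build(a, b) for a, b in pairs])

print(loop_tree("((..((...))..))"))
```

A whole stem becomes a single edge here, which is what makes this encoding much more compact than a tree with one node per base or base pair.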

  34. Our tree representation • Compressed as in [Shapiro, 1988], plus a node for every single-strand component in multiloops. • Includes additional information on nodes and on edges for the purpose of sequence analysis. • It is more informative than Shapiro’s tree representation and more compact than Zhang’s. • This leads to a precise screening of the target text by first selecting candidates whose structural tree representation is similar to that of the query, and then further filtering these candidates by applying sequence considerations.

  35. Our tree representation

  36. Alignment (Mapping) Properties: preservation of ancestors. (Figure: two trees with nodes labeled a, b, d, e.)

  37. Mapping Aspects: rooting. (Figure: two trees with nodes labeled a, d, e, f, g.)

  38. Ordered Rooted Tree comparison • The following operations are defined on ordered trees: • relabel - Change the label of a node v in T. • delete - Delete a non-root node v in T with parent v′, making the children of v become the children of v′. The children are inserted in the place of v as a subsequence in the left-to-right order of the children of v′. • insert - The complement of delete. Insert a node v as a child of v′ in T making v the parent of a consecutive subsequence of the children of v′.
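The three operations can be sketched on a small mutable tree type. The `Node` class and the function signatures below are assumptions made for illustration: a node is addressed through its parent and a child index, and `insert` adopts the consecutive run of children `children[start:end]`.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def relabel(v, new_label):
    """relabel: change the label of node v."""
    v.label = new_label

def delete(parent, index):
    """delete: remove parent's child at index; its children take its place, in order."""
    v = parent.children[index]
    parent.children[index:index + 1] = v.children

def insert(parent, label, start, end):
    """insert: add a new child of parent that becomes the parent of children[start:end]."""
    v = Node(label, parent.children[start:end])
    parent.children[start:end] = [v]
    return v

def as_tuple(node):
    """Nested-tuple view of a tree, convenient for printing and comparing."""
    return (node.label, [as_tuple(c) for c in node.children])

tree = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e")])
delete(tree, 0)            # b disappears; c and d move up next to e
print(as_tuple(tree))
insert(tree, "b", 0, 2)    # the complementary insert restores the original shape
print(as_tuple(tree))
```

Calling `insert` with `start == end` adds a new leaf, since the adopted run of children is empty.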

  39. 1. Edit distance • An edit script S between T1 and T2 is a sequence of edit operations turning T1 into T2. • The tree edit distance problem is to compute the edit distance and a corresponding edit script. (Edit script in Tree Comparison corresponds to generating the actual alignment in Sequence Comparison).
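A compact way to see the definition at work is the textbook recursion on rightmost roots of ordered forests, memoized. The sketch below assumes unit costs for relabel, delete and insert, and trees given as hashable `(label, (children...))` tuples; it returns only the distance, not the edit script, and it is not tuned like the published polynomial-time algorithms.

```python
from functools import lru_cache

def tree_edit_distance(t1, t2):
    """Unit-cost edit distance between two ordered labeled trees."""

    @lru_cache(maxsize=None)
    def dist(f1, f2):
        # f1 and f2 are ordered forests: tuples of (label, children) trees.
        if not f1 and not f2:
            return 0
        if not f1:                                   # only insertions remain
            _, kids2 = f2[-1]
            return dist(f1, f2[:-1] + kids2) + 1
        if not f2:                                   # only deletions remain
            _, kids1 = f1[-1]
            return dist(f1[:-1] + kids1, f2) + 1
        (label1, kids1), (label2, kids2) = f1[-1], f2[-1]
        return min(
            dist(f1[:-1] + kids1, f2) + 1,           # delete rightmost root of f1
            dist(f1, f2[:-1] + kids2) + 1,           # insert rightmost root of f2
            dist(kids1, kids2) + dist(f1[:-1], f2[:-1])
            + (label1 != label2),                    # match the roots, relabel if needed
        )

    return dist((t1,), (t2,))

leaf = lambda label: (label, ())
t1 = ("a", (("b", (leaf("c"), leaf("d"))),))
t2 = ("a", (leaf("c"), leaf("d")))
print(tree_edit_distance(t1, t2))
```

In the example the two trees differ only by the node `b`, so a single delete turns `t1` into `t2`.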

  40. 1. Edit distance

  41. 1. Edit distance

  42. 1. Edit distance

  43. 1. Edit distance

  44. 1. Edit distance

  45. 2. Tree Inclusion T1 is included in T2 if there is a sequence of delete operations performed on T2 which makes T2 isomorphic to T1. The tree inclusion problem is to decide if T1 is included in T2.

  46. 2. Tree Inclusion T1 is included in T2 if there is a sequence of delete operations performed on T2 which makes T2 isomorphic to T1. The tree inclusion problem is to decide if T1 is included in T2.

  47. Polynomial time algorithms exist for these problems. They are all based on the classical technique of dynamic programming and most of them are simple combinatorial algorithms.

  48. Comparison of ordered rooted trees • Ordered tree comparison is generally computed by tree edit distance, which allows various forms of deletions and insertions in both query and target. • The search for small non-coding RNAs naturally yields a more specific tree search formulation, since we do not allow deletions in the query. • In our method we apply a weighted pattern matching algorithm for finding the best homeomorphic mapping between two rooted ordered trees. • Specific constraints on the searched structure can be defined in the input to the search: structural constraints (lengths), allowing or forbidding element deletion in the target, and sequence constraints (locally conserved sequence segments, etc.).

  49. The Algorithm • The subtree isomorphism problem [Matula, 1968, 1978]: Given a pattern tree P and a text tree T, find a subtree of T which is isomorphic to P, i.e., find a subtree of T that can be made identical in structure to P by removing entire subtrees of T, or decide that there is no such subtree. • The subtree homeomorphism problem [Chung, 1987; Reyner, 1977; Pinter et al., 2004]: a variant of the former problem, where degree-2 nodes can be deleted from the text tree. (Figure: homeomorphism example.)

  50. The Algorithm - Motivation • Point-mutation events could easily result in an extra bulge in an RNA structure. • However, in some cases the functional homology to the original, non-mutated structure is still preserved. • The suggested alignment should be flexible enough to allow the deletion of degree-2 nodes from the target tree. (Figure: a riboswitch and its functional homologue with an extra bulge.)
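The difference between the two problems, and why homeomorphism tolerates an extra bulge, can be shown with a deliberately simple recursive check. Everything here is an assumption made for illustration: trees are `(label, (children...))` tuples, labels must match exactly, a text node counts as degree-2 when it has exactly one child, and the search is a naive recursion rather than the weighted O(mn) algorithm named on the slides.

```python
def _reach(p, t, homeo):
    """Can pattern subtree p be mapped at text node t, or below a chain of degree-2 nodes?"""
    if _embeds(p, t, homeo):
        return True
    # Homeomorphism only: delete a text node with a single child and try that child.
    return homeo and len(t[1]) == 1 and _reach(p, t[1][0], homeo)

def _embeds(p, t, homeo):
    """Map root of p to t; p's children must map, in order, into t's children.

    Unmatched children of t correspond to removing entire subtrees of the text.
    """
    if p[0] != t[0]:
        return False
    text_children = iter(t[1])    # shared iterator: greedy left-to-right matching
    return all(any(_reach(pc, tc, homeo) for tc in text_children) for pc in p[1])

def find_match(pattern, text, homeo=False):
    """True if pattern occurs somewhere in text under the chosen matching notion."""
    return _embeds(pattern, text, homeo) or any(
        find_match(pattern, child, homeo) for child in text[1]
    )

H = ("H", ())                                  # hairpin loop
query = ("R", (("M", (H, H)),))                # multiloop with two hairpins
target = ("R", (("M", (("B", (H,)), H)),))     # same, with an extra bulge on one arm

print(find_match(query, target, homeo=False))
print(find_match(query, target, homeo=True))
```

The query is not found under plain subtree isomorphism because of the bulge node `B`, but it is found once degree-2 text nodes may be deleted.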
