
Mira Abraham-Cohen and Haim J. Wolfson



Presentation Transcript


  1. Mira Abraham-Cohen and Haim J. Wolfson, Blavatnik School of Computer Science, Tel Aviv University, Tel Aviv, Israel

  2. Why RNA? RNA (ribonucleic acid) is: • not solely a carrier of genetic information (non-coding RNAs) [Figure: The Central Dogma of Molecular Biology, relating DNA, RNA and Protein]

  3. Why RNA? RNA (ribonucleic acid) is: • not solely a carrier of genetic information (non-coding RNAs) • a key player in essential cellular processes (e.g. protein synthesis and transport, gene silencing) • involved in pathological processes (e.g. cancerous tumors, AIDS) • a potential drug or drug target (e.g. RNAi, bacterial ribosomes as antibiotic targets)

  4. RNA Structure: 1D, 2D, 3D

  5. Why RNA secondary structure? • “RNA structure” usually refers to 2D structure • Easier to obtain (more common than 3D structures) • Secondary structure elements • Helix • Loop

  6. Secondary structure elements • Helix • Internal loop • Bulge • Multi-branch loop • Hairpin

  7. Sequence and its 2D structure in dot-bracket notation:
     GUCUGUCCCCACACGACAGAUAAUCGGGUGCAACUCCCGCCCCUUUUCCGAGGGUCAUCGGAACCA
     .((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))...
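
The dot-bracket string on slide 7 can be turned into base pairs and helices with a small stack-based parser. The following is a minimal sketch, not HARP's own code: the function names `parse_dot_bracket` and `find_helices` are hypothetical, positions are 0-based, and a helix is assumed here to be a maximal run of stacked pairs (i, j), (i+1, j-1), and so on.

```python
def parse_dot_bracket(structure):
    """Return sorted base pairs (i, j), 0-based with i < j."""
    closers = {")": "(", "]": "["}
    stacks = {"(": [], "[": []}  # one stack per bracket type
    pairs = []
    for pos, ch in enumerate(structure):
        if ch in stacks:
            stacks[ch].append(pos)
        elif ch in closers:
            pairs.append((stacks[closers[ch]].pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at {pos}")
    if any(stacks.values()):
        raise ValueError("unbalanced brackets")
    return sorted(pairs)


def find_helices(pairs):
    """Group stacked pairs into helices (start, end, length)."""
    pair_set = set(pairs)
    helices = []
    for i, j in sorted(pairs):
        if (i - 1, j + 1) in pair_set:
            continue  # not the outermost pair of its stack
        length = 1
        while (i + length, j - length) in pair_set:
            length += 1
        helices.append((i, j, length))
    return helices


# Structure string copied from slide 7.
STRUCTURE = ".((((((.......))))))....((((.......)))).[[[..((((((]]]...))))))..."
pairs = parse_dot_bracket(STRUCTURE)
helices = find_helices(pairs)
print(len(pairs), helices)
```

Keeping one stack per bracket type is what lets the square-bracket helix interleave with the round-bracket helix that follows it.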

  8. Pseudoknot structural motif • Important for the function of many RNAs • Helix 1 (i1, j1) and helix 2 (i2, j2) form a pseudoknot when i1 < i2 < j1 < j2 • RNA 2D structure alignment • Disregarding pseudoknots: O(n^4) [Zhang and Shasha 1989] • Including pseudoknots: NP-hard [Zhang et al. 1999]
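
The crossing condition i1 < i2 < j1 < j2 translates directly into a check over helix pairs. This is a small sketch under the assumption that each helix is summarized by its outermost base pair (i, j) with i < j; the names `helices_cross` and `has_pseudoknot` are hypothetical, and the example coordinates are the 0-based outermost pairs read off the dot-bracket string on slide 7.

```python
from itertools import combinations


def helices_cross(h1, h2):
    """True if the two helices interleave: i1 < i2 < j1 < j2."""
    (i1, j1), (i2, j2) = sorted([h1, h2])  # order by 5' position
    return i1 < i2 < j1 < j2


def has_pseudoknot(helices):
    """True if any two helices in the list cross each other."""
    return any(helices_cross(a, b) for a, b in combinations(helices, 2))


# Outermost pairs of the four helices on slide 7 (0-based positions).
slide7 = [(1, 19), (24, 38), (40, 53), (45, 62)]
print(has_pseudoknot(slide7))      # the last two helices interleave
print(has_pseudoknot(slide7[:2]))  # two separate hairpins do not
```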

  9. Why do pseudoknots make a difference? • Are they common? Over 30% of the functional groups • Less than 70% 2D similarity

  10. Previous work – RNA 2D alignment • Methods disregarding pseudoknots • RNAforester [Hofacker et al. 2004] • Migals [Allali and Sagot 2005] • MARNA [Siebert and Backofen 2005] • Methods that deal with limited cases • rna_align (DP) [Jiang et al. 2001] • pkalign (DP) [Mohl et al. 2009]

  11. Previous work – RNA 2D alignment • A method that deals with the general problem • LARA (ILP) [Bauer et al. 2007] • All current methods dealing with pseudoknots • High time and memory complexity • Impractical for big structures • rna_align < 150 nts • pkalign < 800 nts • LARA < 1600 nts on pc-wolfson1 (2GB RAM)

  12. HARP Motivation • Preserved 3D structure, preserved function • Preserved relative 3D distances, preserved function • Preserved relative 2D distances, preserved function?

  13. HARP • Aligns RNA 2D structures with no limitation on the pseudoknot type • Exploits inherent RNA distance constraints • Distances between 2D elements are usually conserved • Pseudoknots often create spatial distance constraints • Goal: Finding the largest set of conserved helices • Heuristic method based on an analog of Geometric Hashing

  14. Geometric hashing • Each pair of points defines a “view” • Voting table [Figure labels: point of “view”, voting table]
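
For readers unfamiliar with the classical technique, here is a minimal sketch of 2D geometric hashing over point sets, in which every ordered pair of points serves as a basis ("view") and the remaining points vote through a hash table. It illustrates the general idea only and is not HARP's variant: the function names, the quantization step of 0.25, and the toy point sets are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import permutations


def to_basis(p, a, b):
    """Coordinates of p in the frame where a maps to (0, 0) and b to (1, 0)."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    norm2 = dx * dx + dy * dy
    px, py = p[0] - a[0], p[1] - a[1]
    return ((px * dx + py * dy) / norm2, (py * dx - px * dy) / norm2)


def quantize(coords, step=0.25):
    """Map continuous coordinates to a hash-table bin."""
    return (round(coords[0] / step), round(coords[1] / step))


def build_table(model):
    """Store, for every basis pair, the bins of all other model points."""
    table = defaultdict(list)
    for ia, ib in permutations(range(len(model)), 2):
        for ip, p in enumerate(model):
            if ip not in (ia, ib):
                table[quantize(to_basis(p, model[ia], model[ib]))].append((ia, ib))
    return table


def vote(table, scene, sa, sb):
    """Count votes for each model basis, viewing the scene from (sa, sb)."""
    votes = defaultdict(int)
    for ip, p in enumerate(scene):
        if ip not in (sa, sb):
            key = quantize(to_basis(p, scene[sa], scene[sb]))
            for basis in table.get(key, ()):
                votes[basis] += 1
    return votes


model = [(0, 0), (2, 0), (2, 1), (0, 3)]
# The same shape rotated by 90 degrees, scaled by 2 and shifted by (5, 5).
scene = [(5, 5), (5, 9), (3, 9), (-1, 5)]
votes = vote(build_table(model), scene, 0, 1)
best_basis, best_count = max(votes.items(), key=lambda kv: kv[1])
print(best_basis, best_count)
```

Because the coordinates are expressed relative to a basis pair, the vote is unaffected by rotation, scaling and translation of the scene.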

  15. HARP - Overview • Input: RNA structures R1 and R2 • Generate reduced “helix” graph representations G1 and G2 • Build a look-up table of geodesic distances in all bases • Query the look-up table • Refine alignments and extend the match

  16. Reduced graph representation • Vertices: stable helices • Helix beginning, termination and length • Edges connect adjacent helices • Direction: polymerization direction • Weight: minimal number of nucleotides needed for connection
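
One possible reading of this representation is sketched below. The slide does not spell out the exact edge rule, so the sketch makes explicit assumptions: each helix contributes a 5' strand and a 3' strand, strands that follow one another along the backbone are treated as "adjacent", edges point in the 5' to 3' direction, and the weight is the number of unpaired nucleotides between the two strands. The `Helix` class and `build_helix_graph` are hypothetical names, not HARP's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Helix:
    start: int   # 5' position of the outermost base pair (0-based)
    end: int     # 3' position of the outermost base pair
    length: int  # number of stacked base pairs


def build_helix_graph(helices):
    """Return {(u, v): weight}, where u and v index into `helices`."""
    segments = []
    for idx, h in enumerate(helices):
        segments.append((h.start, h.start + h.length - 1, idx))  # 5' strand
        segments.append((h.end - h.length + 1, h.end, idx))      # 3' strand
    segments.sort()  # walk the backbone in polymerization direction
    edges = {}
    for (_, prev_last, u), (next_first, _, v) in zip(segments, segments[1:]):
        if u == v:
            continue  # hairpin loop closing a single helix, no edge
        gap = next_first - prev_last - 1  # unpaired nucleotides in between
        edges[(u, v)] = min(gap, edges.get((u, v), gap))
    return edges


# Helices read off the dot-bracket string on slide 7.
helices = [Helix(1, 19, 6), Helix(24, 38, 4), Helix(40, 53, 3), Helix(45, 62, 6)]
edges = build_helix_graph(helices)
print(edges)
```

When two helices are connected more than once, as the pseudoknotted pair is here, the sketch keeps the smallest gap, which is one way to interpret "minimal number of nucleotides needed for connection".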

  17. Graph representation

  18. Graph representation [Figure: helices i, j and k connected by forward and backward edges, with example edge weights 11, 20, 4, 4, 7, 16]

  19. Building a look-up table • Shortest path between any two vertices (forward and backward) • Any two vertices (i,j) define a “view”
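
Shortest paths between all pairs of vertices can be computed with the Floyd-Warshall algorithm, which is a standard choice for small graphs; the slides do not say which algorithm HARP uses, so this is a swapped-in illustration. The edge dictionary is a hypothetical toy graph with directed edges and nucleotide-count weights.

```python
import math


def all_pairs_shortest_paths(n, edges):
    """Floyd-Warshall on a directed graph given as {(u, v): weight}."""
    dist = [[0 if u == v else math.inf for v in range(n)] for u in range(n)]
    for (u, v), w in edges.items():
        dist[u][v] = min(dist[u][v], w)
    for k in range(n):          # allow paths through vertex k
        for u in range(n):
            for v in range(n):
                if dist[u][k] + dist[k][v] < dist[u][v]:
                    dist[u][v] = dist[u][k] + dist[k][v]
    return dist


# Hypothetical toy helix graph with four vertices.
toy_edges = {(0, 1): 4, (1, 2): 1, (2, 3): 2, (3, 2): 0}
dist = all_pairs_shortest_paths(4, toy_edges)
print(dist[0][3], dist[3][2], dist[2][0])
```

Unreachable pairs stay at `math.inf`, so a caller that needs both forward and backward distances would have to run the same routine on the reversed edges as well.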

  20. Similar views • Inserting G1 triangles • Querying with G2 triangles

  21. Querying the vote table • Querying the table with the indexing edges of G2 • Filtering by • Triangle type F/B • ε-vicinity [Figure labels: indexing edges, basis edge]
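
The insert-then-query scheme with an ε-vicinity can be sketched as follows. This is an illustration under stated assumptions rather than HARP's implementation: a triangle is taken to be a basis pair (a, b) plus a third vertex c, its two indexing edges are the distances d(a, c) and d(c, b), the forward/backward triangle-type filter is omitted, and all distances are assumed finite. The function names and the two toy distance matrices are hypothetical.

```python
from collections import defaultdict
from itertools import permutations


def build_distance_table(dist, eps):
    """Insert every triangle of graph 1, bucketed by its indexing edges."""
    table = defaultdict(list)
    for a, b, c in permutations(range(len(dist)), 3):
        key = (dist[a][c], dist[c][b])
        bucket = (int(key[0] // eps), int(key[1] // eps))
        table[bucket].append((key, (a, b)))
    return table


def query_distance_table(table, dist, eps):
    """Vote for (basis in graph 1, basis in graph 2) pairings."""
    votes = defaultdict(int)
    for a, b, c in permutations(range(len(dist)), 3):
        q = (dist[a][c], dist[c][b])
        qb = (int(q[0] // eps), int(q[1] // eps))
        # Entries within eps can only sit in the same or a neighbouring bucket.
        for di in (-1, 0, 1):
            for dj in (-1, 0, 1):
                for key, basis1 in table.get((qb[0] + di, qb[1] + dj), ()):
                    if abs(key[0] - q[0]) <= eps and abs(key[1] - q[1]) <= eps:
                        votes[(basis1, (a, b))] += 1
    return votes


# Toy symmetric distance matrices for two four-vertex graphs.
dist1 = [[0, 4, 9, 15], [4, 0, 5, 11], [9, 5, 0, 6], [15, 11, 6, 0]]
dist2 = [[0, 5, 9, 16], [5, 0, 4, 11], [9, 4, 0, 7], [16, 11, 7, 0]]
votes = query_distance_table(build_distance_table(dist1, 1), dist2, 1)
print(votes[((0, 1), (0, 1))])
```

A pairing of bases that collects many votes suggests that the two graphs look alike from those two "views", which is what a later refinement step would start from.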

  22. Alignment refinement • Matching between the vertices of G1 and G2, solved with the Hungarian algorithm • Weight w combines the distance between the vertices and the correlation between helices’ lengths
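
The Hungarian algorithm finds a minimum-cost one-to-one assignment between two vertex sets. Below is a compact sketch of the standard O(n^3) potential-based formulation; the cost matrix is a made-up example, and how HARP actually derives the weight w from distances and helix lengths is not shown in the slides.

```python
def hungarian(cost):
    """Minimum-cost assignment for an n x m cost matrix with n <= m.

    Returns a sorted list of (row, column) pairs.
    """
    n, m = len(cost), len(cost[0])
    inf = float("inf")
    u = [0] * (n + 1)    # row potentials
    v = [0] * (m + 1)    # column potentials
    p = [0] * (m + 1)    # p[j]: row matched to column j (0 means free)
    way = [0] * (m + 1)  # back-pointers along the augmenting path
    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [inf] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], inf, 0
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:
                break
        while j0:  # flip the matching along the augmenting path
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1
    return sorted((p[j] - 1, j - 1) for j in range(1, m + 1) if p[j])


# Hypothetical costs: rows are G1 helices, columns are G2 helices.
cost = [[4, 1, 3],
        [2, 0, 5],
        [3, 2, 2]]
assignment = hungarian(cost)
total = sum(cost[r][c] for r, c in assignment)
print(assignment, total)
```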

  23. Alignment extension and scoring • Greedy approach • Starting with the largest (pair of bases) match • Extending by adding the pair that contributes most to the extension • Score

  24. Complexity • Generating reduced graph representations • Building a look-up table • Querying the look-up table • Generating alignments: • Alignment refinement • Alignment extension • In practice: • Average-size structures: less than a second • Big structures (~2800 nucleotides): less than a minute and 10 MB

  25. Results • HARP’s statistics • Average score and p-value • Comparison with LARA • Alignment examples

  26. HARP’s statistics

  27. Similar 2D yet different function: 5S ribosomal RNA and SRP

  28. Comparison with LARA 23 rRNA

  29. Comparison with LARA [Plot: sensitivity against 1 - specificity for HARP and LARA] • Sensitivity = TP / P = TP / (TP + FN) • 1 - Specificity = FPR = FP / N = FP / (FP + TN)
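
The two rates on this slide follow directly from confusion-matrix counts. The sketch below uses made-up counts purely to show the arithmetic; they are not results from the presentation.

```python
def sensitivity(tp, fn):
    """True positive rate: TP / P = TP / (TP + FN)."""
    return tp / (tp + fn)


def false_positive_rate(fp, tn):
    """1 - specificity: FP / N = FP / (FP + TN)."""
    return fp / (fp + tn)


# Hypothetical counts for one score threshold.
tp, fn, fp, tn = 45, 5, 10, 90
tpr = sensitivity(tp, fn)
fpr = false_positive_rate(fp, tn)
print(tpr, fpr)
```

Sweeping the score threshold and plotting each (FPR, sensitivity) point gives the kind of curve on which two methods can be compared.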

  30. Self-splicing group I introns, 68.9% similarity • (left) PDB id 1zzn chain B, 10 stable helices • (right) PDB id 1y0q chain A, 13 stable helices

  31. Catalytic domains of ribonuclease P • (left) PDB id 2a2e chain A, 19 stable helices • (right) PDB id 2a64 chain A, 16 stable helices

  32. Conclusions • HARP is a tool for the alignment of RNA secondary structures, which may include pseudoknots • Accurate: capable of distinguishing between homologous and non-homologous structures • Highly efficient • Takes less than a second for average-size structures • Less than a minute and 10 MB for very big structures • Web server: http://bioinfo3d.cs.tau.ac.il/HARP

  33. Thank you for your attention!
