
Predicting RNA Secondary Structures: A Lattice Walk Approach

This research paper discusses a lattice walk approach to predicting RNA secondary structures, with a focus on modeling sequences within the HIV-1 RNA structure. It explores the challenges of infectious diseases in Africa and the role of mathematical modeling in understanding biological functions.


Presentation Transcript


  1. Predicting RNA Secondary Structures: A Lattice Walk Approach to Modeling Sequences Within the HIV-1 RNA Structure • Facing the Challenge of Infectious Diseases in Africa: The Role of Mathematical Modeling • University of the Witwatersrand, Johannesburg, South Africa • September 25-27, 2006 • Asamoah Nkwanta, Ph.D., Morgan State University • Nkwanta@jewel.morgan.edu

  2. TOPICS • RNA Prediction & Molecular Biology • RNA Combinatorics • Certain Class of Random Walks • Matrix Theory • Connection Between Walks & RNA • Modeling HIV-1 RNA Sequences

  3. RNA Secondary Structure Prediction • “The Human Genome Project and related efforts have generated enormous amounts of raw biological sequence data. However, understanding how biological sequences encode structural information remains a fundamental scientific challenge. For instance, understanding the base pairing, or secondary structure, of single-stranded RNA sequences is crucial to advancing knowledge of their novel biochemical functions.” • C. E. Heitsch, Combinatorics on Plane Trees, Motivated by RNA Secondary Structure Configuration (preprint, 2005)

  4. What is RNA Secondary Structure Prediction?

  5. RNA Secondary Structure Prediction • Given a primary sequence, we want to find the biological function of the related secondary structure. To achieve this goal we predict (model) its secondary structure. • Most methods predict secondary structure rather than tertiary structure. The three-dimensional shape is important for biological function, but it is harder to predict.

  6. Molecular Biology (Cont.) 3-D structure of Haloarcula marismortui 5S ribosomal RNA in large ribosomal subunit

  7. Molecular Biology • Central Dogma • DNA → RNA → Protein • Transcription / Translation

  8. Molecular Biology (Cont.)

  9. Molecular Biology (Cont.) However, the "Central Dogma" has had to be revised a bit.  It turns out that you CAN go back from RNA to DNA, and that RNA can also make copies of itself.  It is still NOT possible to go from Proteins back to RNA or DNA, and no known mechanism has yet been demonstrated for proteins making copies of themselves.

  10. Molecular Biology (cont.) • HIV is one of a group of atypical viruses called retroviruses that maintain their genetic information in the form of RNA. Retroviruses are capable of producing DNA from RNA.

  11. Molecular Biology (Cont.)

  12. Molecular Biology (cont.) • Ribonucleic acid (RNA) molecule: Three main categories • mRNA (messenger) – carries genetic information from the genes to the ribosomes • tRNA (transfer) – carries amino acids to a ribosome (the cell's machinery for making proteins) • rRNA (ribosomal) – part of the structure of a ribosome

  13. Molecular Biology (cont.) • Other types of RNA molecules: • snRNA (small nuclear RNA) – takes part in splicing pre-mRNA in the nucleus • miRNA (micro RNA) – regulates gene expression by binding to target mRNAs • iRNA (immune RNA) (Important for HIV studies)

  14. RNA Secondary Structure • “RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigation.” • B. Knudsen and J. Hein, Pfold: RNA secondary structure prediction using stochastic context-free grammars (Nucleic Acids Research, 2003) • There are published examples involving tRNA, rRNA, and other types of RNA

  15. RNA Secondary Structure (cont.) • A ribonucleic acid (RNA) molecule consists of a sequence of ribonucleotides (typically single stranded) • Each ribonucleotide contains one of four bases: adenine (A), cytosine (C), guanine (G), and uracil (U)

  16. Secondary Structure (cont.) • Note U is replaced by thymine (T) in DNA • As the molecule forms, chemical bonds join A-U and C-G pairs. These are called the Watson-Crick pairs. The less stable G-U (wobble) pair can also form.

  17. Secondary Structure (cont.) • Primary Structure – The linear sequence of bases in an RNA molecule • Secondary Structure – The folding or coiling of the sequence due to bonded nucleotide pairs: A-U, G-C • Tertiary Structure – The three dimensional configuration of an RNA molecule

  18. Primary RNA Sequence • CAGCAUCACAUCCGCGGGGUAAACGCU • Nucleotide Length, 27 bases

  19. Geometric Representation • Secondary structure is a graph defined on a set of n labeled points • (M.S. Waterman, 1978) • Biological • Combinatorial/Graph Theoretic • Random Walk

  20. RNA COMBINATORICS • RNA Numbers 1,1,1,2,4,8,17,37,82,185,423,978,… • These numbers count various combinatorial objects including RNA secondary structures of length n.

  21. RNA COMBINATORICS (cont.) • The number of RNA secondary structures for the sequence [1,n] is the coefficient of z^n in the formal power series s(z). The coefficients are: • (1,1,1,2,4,8,17,37,82,185,423,978,…)

  22. RNA COMBINATORICS (cont.) • The number of lattice paths of length n with unit steps R (right), U (up) & D (down) that start at (0,0), remain in the first quadrant of the coordinate plane, and return to the x-axis, under the restriction that a U step is never immediately followed by a D step, is the nth RNA number: • (1,1,1,2,4,8,17,37,82,185,423,978,…)

  23. RNA COMBINATORICS (cont.) • The number of RNA sequences of length n that can be formed over the alphabet {A,U,G,C} such that the letters A & U are never adjacent is equal to: • What a remarkable formula for an integer: when n = 1 we get 4, and when n = 2 we get 14.
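
The two values quoted on this slide (4 and 14) can be checked directly. The sketch below is an illustration, not the slide's own formula: it counts the sequences by brute-force enumeration, by a recurrence a(n) = 3a(n-1) + 2a(n-2) derived here from the adjacency rule, and by the closed form that this recurrence implies, which involves the square root of 17. The function names are hypothetical.

```python
from itertools import product
from math import sqrt


def count_by_enumeration(n):
    """Count words of length n over A, U, G, C with no adjacent A-U or U-A."""
    total = 0
    for letters in product("AUGC", repeat=n):
        word = "".join(letters)
        if "AU" not in word and "UA" not in word:
            total += 1
    return total


def count_by_recurrence(n):
    """Assumed recurrence (derived here): a(n) = 3 a(n-1) + 2 a(n-2), a(0)=1, a(1)=4."""
    a, b = 1, 4
    for _ in range(n):
        a, b = b, 3 * b + 2 * a
    return a


def count_by_closed_form(n):
    """Closed form implied by the recurrence; rounded because it uses floats."""
    root = sqrt(17)
    r1, r2 = (3 + root) / 2, (3 - root) / 2
    c1, c2 = (17 + 5 * root) / 34, (17 - 5 * root) / 34
    return round(c1 * r1**n + c2 * r2**n)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_by_enumeration(n), count_by_recurrence(n), count_by_closed_form(n))
```

The enumeration is exponential in n and is only meant as a cross-check for small lengths.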

  24. Counting Sequence Database • The On-line Encyclopedia of Integer Sequences: http://www.research.att.com/~njas/sequences/index.html • N.J.A. Sloane & S. Plouffe, The Encyclopedia of Integer Sequences, Academic Press, 1995.

  25. RNA EQUATIONS • Recurrence Relations:
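
A minimal sketch of one recurrence that reproduces the RNA numbers listed in this talk, assuming the standard form s(n+1) = s(n) + Σ_{k=1}^{n-1} s(k-1) s(n-k) with s(0) = s(1) = s(2) = 1. This form is supplied here as an assumption; the function name is hypothetical.

```python
def rna_numbers(count):
    """Return [s(0), ..., s(count-1)] from the assumed recurrence.

    s(n+1) = s(n) + sum_{k=1}^{n-1} s(k-1) * s(n-k), with s(0)=s(1)=s(2)=1.
    The first term leaves the last base unpaired; the sum pairs it with base k.
    """
    s = [1, 1, 1][:count]
    while len(s) < count:
        n = len(s) - 1  # the next value appended is s(n+1)
        s.append(s[n] + sum(s[k - 1] * s[n - k] for k in range(1, n)))
    return s


if __name__ == "__main__":
    print(rna_numbers(12))
```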

  26. RNA EQUATIONS (cont.) • Generating Function: • 1,1,1,2,4,8,17,37,82,185,423,978,…
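
For reference, a closed form of the generating function that expands to the sequence above. It is supplied here as the standard expression for these numbers and may differ in presentation from the one shown on the slide.

```latex
s(z) \;=\; \sum_{n \ge 0} s(n)\, z^{n}
     \;=\; \frac{1 - z + z^{2} - \sqrt{1 - 2z - z^{2} - 2z^{3} + z^{4}}}{2z^{2}}
     \;=\; 1 + z + z^{2} + 2z^{3} + 4z^{4} + 8z^{5} + 17z^{6} + \cdots
```

Equivalently, s(z) satisfies the quadratic z^2 s(z)^2 - (1 - z + z^2) s(z) + 1 = 0.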

  27. RNA EQUATIONS (cont.) • Exact Formula:

  28. RNA EQUATIONS (cont.) • s(n,k) is the number of structures of length n with exactly k base pairs: For n,k > 0,
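
A sketch of the count by number of base pairs, assuming the closed form s(n,k) = (1/k) C(n-k, k+1) C(n-k-1, k-1) for k > 0 that is commonly given in the literature, with s(n,0) = 1. The formula is an assumption supplied here, and the function name is hypothetical. Summing over k should recover the RNA numbers.

```python
from math import comb


def structures_with_k_pairs(n, k):
    """Assumed closed form for the number of structures of length n with k base pairs."""
    if k == 0:
        return 1  # the fully unpaired structure
    if n - k - 1 < 0:
        return 0
    # The product is always divisible by k, so integer division is exact.
    return comb(n - k, k + 1) * comb(n - k - 1, k - 1) // k


def total_structures(n):
    """Sum over all possible numbers of base pairs."""
    return sum(structures_with_k_pairs(n, k) for k in range(n // 2 + 1))


if __name__ == "__main__":
    print([total_structures(n) for n in range(12)])
```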

  29. RNA EQUATIONS (cont.) • Asymptotic Estimate: As n grows without bound
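
The standard asymptotic estimate for the RNA numbers is given below as an assumption about what the slide displayed. It follows from the square-root singularity of the generating function at z = (3 - √5)/2.

```latex
s(n) \;\sim\; \sqrt{\frac{15 + 7\sqrt{5}}{8\pi}}\; n^{-3/2} \left( \frac{3 + \sqrt{5}}{2} \right)^{n}
\qquad \text{as } n \to \infty .
```

The growth rate (3 + √5)/2 is about 2.618, so the number of possible structures grows exponentially with sequence length.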

  30. Random Walk • A random walk is a lattice path from one point to another such that steps are allowed in a discrete number of directions and are of a certain length

  31. RNA Walk – Type I • NSE* Walks – Unit step walks starting at the origin (0,0) with steps up, down, and right • No walks pass below the x-axis and there are no consecutive NS steps

  32. RNA Walk – Type I (cont.) • N = (0,1) up • S = (0,-1) down • E = (1,0) right

  33. Type I Walk Array (n x k)

  34. RNA Walk – Type II • NSE** Walks – Unit-step walks starting at the origin (0,0) with steps up, down, and right such that no walks pass below the x-axis and there are no consecutive SN steps

  35. Type II Walk Array (n x k)

  36. Examples • Type I: ENNESNESSE • Type II: NEEENSEEES
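
The two example walks can be checked against the Type I and Type II definitions with a small validator. This is a sketch only; `STEPS`, `is_rna_walk`, and `final_height` are hypothetical names introduced for illustration.

```python
# Vertical displacement of each unit step: N = up, S = down, E = right.
STEPS = {"N": 1, "S": -1, "E": 0}


def is_rna_walk(walk, forbidden):
    """True if the walk never passes below the x-axis and avoids the forbidden pair.

    Use forbidden="NS" for Type I (NSE*) and forbidden="SN" for Type II (NSE**).
    """
    height = 0
    for step in walk:
        height += STEPS[step]
        if height < 0:
            return False
    return forbidden not in walk


def final_height(walk):
    """Height k at which the walk ends."""
    return sum(STEPS[step] for step in walk)


if __name__ == "__main__":
    for walk in ("ENNESNESSE", "NEEENSEEES"):
        print(walk, is_rna_walk(walk, "NS"), is_rna_walk(walk, "SN"), final_height(walk))
```

Both examples have length 10 and end at height 0. Each is valid for only one of the two types, because the first contains SN and the second contains NS.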

  37. RNA Walk Bijection • Theorem: There is a bijection between the set of NSE* walks of length n+1 ending at height k = 0 and the set of NSE** walks of length n ending at height k = 0. • Source: Lattice paths, generating functions, and the Riordan group, Ph.D. Thesis, Howard University, Washington, DC, 1997

  38. Matrices Count Lattice Walks
  • Type I Walks
       1   0   0   0   0   0   0  ...
       1   1   0   0   0   0   0  ...
       1   2   1   0   0   0   0  ...
       2   3   3   1   0   0   0  ...
       4   6   6   4   1   0   0  ...
       8  13  13  10   5   1   0  ...
      17  28  30  24  15   6   1  ...
     ...
  • Type II Walks
       1   0   0   0   0   0   0  ...
       1   1   0   0   0   0   0  ...
       2   2   1   0   0   0   0  ...
       4   4   3   1   0   0   0  ...
       8   9   7   4   1   0   0  ...
      17  20  17  11   5   1   0  ...
      37  45  41  29  16   6   1  ...
     ...
  • The (i, j)th entry is the number of walks of length i that end at height j (rows and columns numbered from 0).
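
Both arrays can be regenerated by counting walks directly from the definitions. The sketch below is an illustrative dynamic program, not the author's method: it tracks the height and the last step taken, so that the forbidden pair (NS for Type I, SN for Type II) can be excluded. The name `walk_array` is hypothetical.

```python
def walk_array(size, forbidden):
    """Return a size x size array whose (i, j) entry counts walks of length i ending at height j.

    forbidden is "NS" for Type I walks and "SN" for Type II walks.
    """
    first, second = forbidden
    delta = {"N": 1, "S": -1, "E": 0}
    # counts[(height, last_step)] = number of walks of the current length
    counts = {(0, ""): 1}
    rows = []
    for _ in range(size):
        row = [0] * size
        for (height, _last), c in counts.items():
            row[height] += c
        rows.append(row)
        nxt = {}
        for (height, last), c in counts.items():
            for step, d in delta.items():
                if height + d < 0:
                    continue  # walks may not pass below the x-axis
                if last == first and step == second:
                    continue  # forbidden consecutive pair
                key = (height + d, step)
                nxt[key] = nxt.get(key, 0) + c
        counts = nxt
    return rows


if __name__ == "__main__":
    for name, pair in (("Type I", "NS"), ("Type II", "SN")):
        print(name)
        for row in walk_array(7, pair):
            print(" ".join(f"{x:3d}" for x in row))
```

The first column of the Type I array gives the RNA numbers 1, 1, 1, 2, 4, 8, 17, and the first column of the Type II array gives the same numbers shifted by one place.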

  39. Type I Formation Rule (Recurrence)
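
One recurrence consistent with the Type I definition is sketched below. The symbol m(n,k), for the number of Type I walks of length n ending at height k, is introduced here for illustration, and the rule is a derivation from the definition that may differ in form from the one on the slide.

```latex
m(n,k) \;=\; m(n-1,k-1) + m(n-1,k) + m(n-1,k+1) - m(n-2,k),
\qquad m(0,0) = 1,
```

with m(n,k) = 0 whenever n < 0, k < 0, or k > n. The first three terms extend a shorter walk by an N, E, or S step. The subtracted term removes the walks whose last two steps would be N followed by S. For example, m(3,0) = m(2,0) + m(2,1) - m(1,0) = 1 + 2 - 1 = 2.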

  40. The Connection Between RNA and the Walks • Theorem: There is a bijection between the set of RNA secondary structures of length n and the set of NSE* walks of length n ending at height k = 0. • Source: Lattice paths and RNA secondary structures, DIMACS Series in Discrete Math. & Theoretical Computer Science 34 (1997) 137-147. (CAARMS2 Proceedings)

  41. HIV-1 RNA Sequence Prediction • We want to construct a lattice walk method to predict secondary RNA sequences that code for regions of the SL2 and SL3 domains within the HIV-1 5’ UTR RNA molecule. • These domains are important for HIV genomic packaging

  42. HIV-1 RNA Structural Components

  43. Components of Secondary Structure • Base pairs • Bulges • Interior Loops • End loops • Hairpin • Multibranch loops – junctions where more than one hairpin or more complex secondary structures are appended.

  44. HIV-1 Sequence (SL2 & SL3) • The following sequence was obtained from the NCBI website. The first 363 nucleotides were extracted from the entire HIV-1 RNA genomic sequence: • GGUCUCUCUGGUUAGACCAGAUCUGAGCCUGGGAGCUCUCUGGCUAACUAGGGAACCCACUGCUUAAGCCUCAAUAAAGCUUGCCUUGAGUGCUUCAAGUAGUGUGUGCCCGUCUGUUGUGUGACUCUGGUAACUAGAGAUCCCUCAGACCCUUUUAGUCAGUGUGGAAAAUCUCUAGCAGUGGCGCCCGAACAGGGACCUGAAAGCGAAAGGGAAACCAGAGGAGCUCUCUCGACGCAGGACUCGGCUUGCUGAAGCGCGCACGGCAAGAGGCGAGGGGCGGCGACUGGUGAGUACGCCAAAAAUUUUGACUAGCGGAGGCUAGAAGGAGAGAGAUGGGUGCGAGAGCGUCAGUAUUAAGCG • Color key: • SL2 – yellow • SL3 - red

  45. Known Sequence of the SL2 Domain • Sequence: GGCGACUGGUGAGUACGCC • mfe: -7.1 • Figure panels: Type I Walk, Type II Walk, Original Structure

  46. Lattice Walk Model • Start with an RNA primary sequence • Perform RNA combinatorial analysis on the given sequence • Connect lattice walks to the given sequence using Type I and II walks • Calculate identified sequences to find the minimum free energy • Predict secondary sequence • Conduct laboratory experiments for biological functionality
