Using rna secondary structure to guide sequence motif finding toward single stranded regions
Download
1 / 15

- PowerPoint PPT Presentation


  • 124 Views
  • Uploaded on

Using RNA secondary structure to guide sequence motif finding toward single-stranded regions. Michael Hiller, Rainer Pudimat, Anke Busch and Rolf Backofen Rachel Brower-Sinning. Motivation:.

loader
I am the owner, or an agent authorized to act on behalf of the owner, of the copyrighted work described.
capcha
Download Presentation

PowerPoint Slideshow about '' - kostya


An Image/Link below is provided (as is) to download presentation

Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author.While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server.


- - - - - - - - - - - - - - - - - - - - - - - - - - E N D - - - - - - - - - - - - - - - - - - - - - - - - - -
Presentation Transcript
Using rna secondary structure to guide sequence motif finding toward single stranded regions l.jpg

Using RNA secondary structure to guide sequence motif finding toward single-stranded regions

Michael Hiller, Rainer Pudimat, Anke Busch and Rolf Backofen

Rachel Brower-Sinning


Motivation l.jpg
Motivation: finding toward single-stranded regions

RNA binding proteins are an integral part of pre-mRNA processing (splicing, etc.) and regulate mRNA processes (transport, stability, etc)

On the mRNA molecule, there might be multiple binding sites according to the sequence specificity, but the structure of the strand will determine which site is accessible to the BP

Current programs, such as MEME, search for motifs using a PSPM to look at the probability of each letter at each position in the pattern, but do not look at the secondary structure of the sequence


Approach l.jpg
Approach finding toward single-stranded regions

The approach taken by MEMERIS to to first pre-compute the “single-strandedness” of a substring in the RNA sequence from position a to b; allowing for the choice of two measurements-

The first being the probability that all bases in the string are unpaired (PUab)

And the second being the expected fraction of bases in the substring [a,b] that do not form base pairs (EFab)


Approach con t l.jpg
Approach con’t finding toward single-stranded regions

Next, this secondary structure information is integrated in with MEME

MEME is a program for finding motifs in a set of unaligned sequences (X = X1, X2, … ,Xn), where the motif is defined as a PSPM, Θ1 = (P1, P2, … ,PW) where W is the length of the motif and the vector Pi is the probability distribution of the letters at position i.

A given sequence Xi is modeled as consisting of two different parts:

- a non-negative number of non-overlapping motif occurrences sampled from the matrix

- random samples from a background probability distribution for the remaining sequence positions


Approach con t5 l.jpg
Approach con’t finding toward single-stranded regions

MEME considers three different models-

- exactly one motif occurrence per sequence (OOPS model)

- zero or one motif occurrences per sequence (ZOOPS model)

- zero or more motif occurrences per sequence (TCM model)

To find the motif(s) an expectation maximization algorithm is used to perform a maximum likelihood estimation of the model given the data


Approach con t6 l.jpg
Approach con’t finding toward single-stranded regions

OOPs model

-MEME uses where

-which changes to

in MEMERIS, where


Approach con t7 l.jpg
Approach finding toward single-stranded regionscon’t

The expectation of the hidden variables is computed as

And the ML estimation remains unchanged


Effects l.jpg
Effects finding toward single-stranded regions


Results l.jpg
Results finding toward single-stranded regions

  • PU values are stricter than the EF values; using PU values will favour single strandedness more than using EF values

  • Trade off: PU values are dependent on motif length: an increase in the length of the motif results in a decrease in PU

    While EF values are independent of motif length


Results con t l.jpg
Results con’t finding toward single-stranded regions


Results con t11 l.jpg
Results con’t finding toward single-stranded regions

Using the SELEX data, which contains 33 TCAT or ACAT repeats in hairpin loops, binding sites of the neuron-specific splicing factor Nova-1

MEMERIS correctly identifies these described 33 TCAT and ACAT his

MEME identifies the correct motifs, but also motifs outside the hairpins (not binding sites)


Results con t testing on selex data l.jpg
Results con’t finding toward single-stranded regions(testing on SELEX data)


Results con t pie rfam data l.jpg
Results con’t finding toward single-stranded regions(PIE Rfam data)


Results con t tar rfam l.jpg
Results con’t finding toward single-stranded regions(TAR Rfam)


Discussion l.jpg
Discussion finding toward single-stranded regions

RNA binding proteins bind in a sequence specific manner, but have a preferred structural characteristic to the binding site (with the motif occurring in dsRNA having been found to eliminate protein binding)

MEMERIS can both look for sequence motif and incorporate secondary structure information

MEMERIS has been shown to identify ssRNA motifs that often are the protein binding motif


ad