slide1

Local Exact Pattern Matching for Non-fixed RNA Structures

Mika Amit, Rolf Backofen, Steffen Heyne, Gad M. Landau, Mathias Mohl, Christina Schmiedl, Sebastian Will

slide2

RNA

RNA R is an ordered pair (S,B) where:

S is a sequence defined over 𝚺 = {A,C,G,U}

B is a set of base pairs C-G, G-C, A-U, or U-A

[Figure: an example RNA drawn as a secondary structure. Legend: base pair, single base, backbone connection]

CPM 2012, Helsinki
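
As a rough illustration of the definition above, an RNA R = (S, B) can be held in a small Python structure. The class name `RNA`, the 0-based positions, and the `validate` helper are assumptions made for this sketch; they do not come from the presentation.

```python
from dataclasses import dataclass

# Complementary pairs allowed by the definition: C-G, G-C, A-U, U-A.
ALLOWED_PAIRS = {("C", "G"), ("G", "C"), ("A", "U"), ("U", "A")}


@dataclass(frozen=True)
class RNA:
    """Hypothetical container for R = (S, B); positions are 0-based."""
    S: str            # primary structure over {A, C, G, U}
    B: frozenset      # secondary structure: set of (i, j) with i < j

    def validate(self) -> bool:
        """Check the alphabet of S and that every pair in B is complementary."""
        if any(ch not in "ACGU" for ch in self.S):
            return False
        for i, j in self.B:
            if not (0 <= i < j < len(self.S)):
                return False
            if (self.S[i], self.S[j]) not in ALLOWED_PAIRS:
                return False
        return True


# A short hairpin: a G-C and a C-G pair closing a loop of three single bases.
example = RNA("GCAAAGC", frozenset({(0, 6), (1, 5)}))
```

Storing B as a set of index pairs keeps the sequence and the structure separate, which mirrors the ordered-pair definition.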

slide3

RNA

RNA R is an ordered pair (S,B) where:

S represents the primary structure of R

B represents the secondary structure of R

slide4

RNA Representations

[Figure: an RNA shown in two representations: tree and arc-annotated string]

slide5

RNA Secondary Structure

  • Determines the activity and functionality of the RNA
  • Usually more conserved during evolution

The secondary structure of RNA is highly researched

slide6

RNA Structure

  • Predicting the secondary structure of an RNA molecule is a difficult task
  • The structure is sometimes given in a non-fixed form, where each base pair has a probability ≤ 1 of existing in the RNA

slide7

Nested Structure

In all of these examples, the structure of R is nested: each base can be connected by a bond to at most one other base, and there are no crossing arcs
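
The two conditions of a nested structure can be checked mechanically. The sketch below assumes base pairs are given as 0-based `(i, j)` tuples with `i < j`; the function name `is_nested` is hypothetical.

```python
def is_nested(pairs):
    """Return True if no base has two partners and no two arcs cross.

    `pairs` is an iterable of (i, j) positions with i < j.
    """
    pairs = list(pairs)
    used = set()
    for i, j in pairs:
        if i in used or j in used:      # a base with more than one partner
            return False
        used.update((i, j))

    stack = []                          # right endpoints of currently open arcs
    for i, j in sorted(pairs):
        while stack and stack[-1] < i:  # drop arcs that closed before i
            stack.pop()
        if stack and stack[-1] < j:     # (i, j) starts inside an arc but ends outside it
            return False
        stack.append(j)
    return True
```

For example, `is_nested([(0, 6), (1, 5)])` holds, while `[(0, 4), (2, 6)]` fails because the two arcs cross.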

slide8

Unlimited Structure

Arc-annotated substrings can represent Unlimited structures as well

slide9

Bounded-Unlimited Structure

Arc-annotated substrings can represent Bounded-Unlimited structures: each base can be connected to a constant number of other bases, and crossing arcs are allowed

slide10

RNA Similarity Algorithms

Many algorithms for finding similarity between RNA molecules use tree similarity algorithms

  • Tree Edit Distance:
    • Tai (’79) O(n^6)
    • Zhang & Shasha (’89) O(n^4)
    • Klein (’98) O(n^3 log n)
    • Ma et al. (’99) O(n^3 log n)
    • Demaine et al. (’07) O(n^3)

slide11

RNA Similarity Algorithms

Many algorithms for finding similarity between RNA molecules use tree similarity algorithms

  • Tree Alignment:
    • Jiang et al. (’95)
    • Schirmer & Giegerich (’11)
    • Backofen et al. (’07)
    • Mohl et al. (’09)

slide12

RNA Similarity Algorithms

Many algorithms for finding similarity between RNA molecules use tree similarity algorithms

  • Longest Arc Preserving Common Subsequence:
    • Evans (’99)
    • Lin et al. (’02)
    • Alber et al. (’04)
    • Jiang et al. (’04)

slide13

RNA Similarity Algorithms

Many algorithms for finding similarity between RNA molecules use tree similarity algorithms

  • Similar Subforests:
    • Jansson & Peng (’11)

slide14

Exact Pattern Matching Problem

In this work, we search for local common sequence-structure regions (patterns) between two given RNA molecules

[Figure: example pattern]

slide15

Patterns in RNAs

In this work, we search for local common sequence-structure regions (patterns) between two given RNA molecules


slide16

Exact Pattern Matching Problem

Finding all maximal common structure-sequence regions between two RNAs

Solved by Backofen & Siebert in O(n^2) for fixed Nested x Nested structures

[Figure: two arc-annotated RNAs compared. Labels: single base match, left endpoint match, type mismatch]

slide17

Exact Pattern Matching Problem

In this work, we solve the problem for non-fixed Nested x Nested structures

[Figure: two arc-annotated RNAs. Label: arc breaking]

slide18

Arc Breaking Operation

  • We support the operation of arc-breaking, in which a base pair can be deleted, with no penalty
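
A minimal sketch of arc breaking, assuming base pairs are stored as a set of 0-based `(i, j)` tuples. The helper names `break_arc` and `single_bases` are hypothetical. Breaking an arc leaves the sequence unchanged and turns the two endpoints into single bases.

```python
def break_arc(pairs, arc):
    """Return a copy of the base-pair set with `arc` removed.

    The sequence is untouched: the two endpoints of `arc` simply become
    single (unpaired) bases.
    """
    if arc not in pairs:
        raise ValueError(f"{arc} is not a base pair of this structure")
    return frozenset(pairs) - {arc}


def single_bases(length, pairs):
    """Positions of the sequence that are not an endpoint of any base pair."""
    paired = {p for arc in pairs for p in arc}
    return [k for k in range(length) if k not in paired]


pairs = frozenset({(0, 6), (1, 5)})
broken = break_arc(pairs, (1, 5))
```

Here `single_bases(7, pairs)` lists only the three loop positions, while `single_bases(7, broken)` also includes positions 1 and 5.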

[Figure: arc-breaking example. Labels: base pair, single bases]

slide19

Arc Breaking Operation

  • We support the operation of arc-breaking, in which a base pair can be deleted, with no penalty

[Figure: a second arc-breaking example. Labels: base pair, single bases]

slide20

Arc Breaking

  • We support the operation of arc-breaking, in which a base pair can be deleted, with no penalty

[Figure: arc breaking shown on the tree representation]

slide21

Arc Breaking

Patterns are now less restrictive

slide22

Exact Pattern Matching Algorithms

We describe three algorithms for finding the local exact pattern matching between two RNAs:

  • A simple O(n^4) algorithm
    • (using ideas from Zhang & Shasha (’89))
  • An improved O(n^3 log n) algorithm
    • (using ideas from Klein (’98))
  • An O(n^3) algorithm
    • (using ideas from Demaine, Weimann et al. (’07))

slide23

Exact Pattern Matching Algorithm

Input: R1=(S1,B1) and R2=(S2,B2), |R1|=n, |R2|=m, n>m

Output: Local exact pattern matching between R1 and R2


slide24

Exact Pattern Matching Algorithm

We compare each base pair from R1 with each base pair from R2, in increasing order of their sizes
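
One possible way to produce this comparison order is sketched below. It assumes base pairs are 0-based `(i, j)` tuples and takes the size of a base pair to be the span `j - i`. Both helper names are hypothetical.

```python
from itertools import product


def by_increasing_size(pairs):
    """Order base pairs (i, j) by the span j - i of the substring they enclose."""
    return sorted(pairs, key=lambda arc: (arc[1] - arc[0], arc))


def comparison_order(pairs1, pairs2):
    """Yield every (bp1, bp2) combination, smaller base pairs first.

    With a nested structure, any base pair lying inside bp1 is shorter
    than bp1, so it is visited before bp1 is.
    """
    yield from product(by_increasing_size(pairs1), by_increasing_size(pairs2))
```

Visiting smaller base pairs first means that results for inner base pairs are already available when an enclosing base pair is processed.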

slide25

Exact Pattern Matching Algorithm

For each two base pairs we compute the matching inside the base pairs, and the extensions to their outsides


slide26

Matching Inside the Base Pairs

  • Dynamic programming algorithm
  • Similar to the LCS / edit distance algorithms for strings
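
For readers unfamiliar with the string algorithms mentioned above, here is the classic LCS dynamic program on plain strings. It is only the string analog: it covers single-base matches and deletions, and does not model base pairs or the presentation's scoring.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence of two plain strings."""
    n, m = len(a), len(b)
    # score[i][j] = LCS length of the prefixes a[:i] and b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = score[i - 1][j - 1] + 1 if a[i - 1] == b[j - 1] else 0
            score[i][j] = max(
                match,                # match the two last characters
                score[i - 1][j],      # delete the last character of a
                score[i][j - 1],      # delete the last character of b
            )
    return score[n][m]
```

Each table cell depends only on three neighbouring cells, so every comparison takes constant time once the smaller prefixes have been handled.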

slide27

Matching Inside the Base Pairs

On each comparison we compute only prefixes of the substrings and select the maximal score over 4 expressions:

Match base pairs

[Figure: prefixes ending at position i inside bp1 and position j inside bp2, with the test S1(i)==S2(j)?]

slide28

Matching Inside the Base Pairs

Match single bases

[Figure: prefixes ending at position i inside bp1 and position j inside bp2, with the test S1(i)==S2(j)?]

slide29

Matching Inside the Base Pairs

Delete from R1

Delete from R2

[Figure: positions 1, i-1 and i inside bp1; positions 1 and j inside bp2]

slide30

Matching Inside the Base Pairs

On each comparison we compute the maximal match from left-to-right


slide31

Matching Inside the Base Pairs

On each comparison we compute the maximal match from right-to-left


slide32

Matching Inside the Base Pairs

  • There are two tricky parts here:
    • What happens when a mismatch occurs?

slide33

Matching Inside the Base Pairs

  • There are two tricky parts here:
    • What happens when the matchings overlap?

slide34

Matching Inside the Base Pairs

The solution: on each comparison we compute the best score going from both right-to-left and left-to-right


slide35

Time Complexity

  • We only compare prefixes of the base pairs
  • There are O(n^2) prefixes for each RNA
  • Each comparison is computed in O(1) time
  • The total time is O(n^4)
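
The bound follows from multiplying the counts in the list above. Written out, with constants dropped and assuming both RNAs have length at most n:

```latex
\[
\underbrace{O(n^2)}_{\text{prefixes of } R_1}
\times
\underbrace{O(n^2)}_{\text{prefixes of } R_2}
\times
\underbrace{O(1)}_{\text{per comparison}}
= O(n^4)
\]
```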

slide36

Extending the Match

We compute the maximal pattern extension for all bases in R1 and all bases in R2 in one run.

The time complexity: O(n^2)

[Figure: R1 of length n with position i marked, R2 of length m with position j marked]
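
To see why one pass over all position pairs can be enough, consider a sequence-only simplification. The table below stores, for every pair of start positions, the length of the longest common run to the right. This sketch ignores base pairs entirely, so it is not the structure-aware extension of the presentation. The function name is hypothetical.

```python
def common_extensions(s1: str, s2: str):
    """right[i][j] = length of the longest common run starting at s1[i], s2[j].

    Sequence-only simplification (an assumption of this sketch): base pairs
    are ignored.
    """
    n, m = len(s1), len(s2)
    right = [[0] * (m + 1) for _ in range(n + 1)]
    # Fill from the ends so that right[i + 1][j + 1] is ready when needed.
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if s1[i] == s2[j]:
                right[i][j] = right[i + 1][j + 1] + 1
    return right
```

Each of the n x m cells is filled in constant time from one neighbouring cell, which gives the quadratic bound for this simplified setting.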

slide37

Total Time Complexity

Computing the pattern match inside all base pairs is done in O(n^4)

Computing the pattern match extensions to the right and to the left is done in O(n^2)

The total time complexity is O(n^4) + O(n^2) = O(n^4)

slide38

An O(n^3 log n) Algorithm

We use ideas from Klein’s Tree Edit Distance algorithm (’98): we decompose the larger RNA into heavy paths:

The root base pair is marked light, and we continue recursively:

Select the maximal child base pair and mark it as heavy,

mark the rest of the children as light
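
The marking rule above can be sketched as follows for a nested structure. The sketch assumes base pairs are 0-based `(i, j)` tuples and that "maximal" means the largest span `j - i`. The function name `heavy_light_marks` is hypothetical.

```python
def heavy_light_marks(pairs):
    """Mark each base pair of a nested structure as 'heavy' or 'light'.

    `pairs` holds (i, j) positions with i < j and no crossing arcs.
    The size of a base pair is taken to be its span j - i (an assumption).
    """
    ordered = sorted(pairs)                 # parents come before their children
    children = {arc: [] for arc in ordered}
    roots, stack = [], []
    for arc in ordered:
        while stack and stack[-1][1] < arc[0]:
            stack.pop()                     # arcs that ended before this one starts
        (children[stack[-1]] if stack else roots).append(arc)
        stack.append(arc)

    marks = {arc: "light" for arc in roots}  # root base pairs are light
    todo = list(roots)
    while todo:
        parent = todo.pop()
        kids = children[parent]
        if not kids:
            continue
        heavy = max(kids, key=lambda a: a[1] - a[0])   # largest child
        for kid in kids:
            marks[kid] = "heavy" if kid == heavy else "light"
        todo.extend(kids)
    return marks
```

Following heavy children from any light base pair traces one heavy path. Ties between equally large children are broken arbitrarily here (the first one wins).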


slide39

Special Substrings

For each base pair we define its special substrings

The number of special substrings of a base pair is: |bp| - |hp| + 1

Lemma (Sleator & Tarjan ’83): There are O(n log n) special substrings in an RNA R of size n

[Figure: a base pair and its special substrings listed one per row. Labels: bp, hp, a, x, y, b]

slide40

An O(n^3 log n) Algorithm

We compare all O(n^2) substrings of R2 with the O(n log n) special substrings of R1

slide41

An O(n^3 log n) Algorithm

The comparisons are made between the rightmost or leftmost bases, according to the special substring

slide42

An O(n^3 log n) Algorithm

The total number of compared substrings is O(n^3 log n), each one computed in O(1) time, which gives a total running time of O(n^3 log n).

This algorithm also works for Nested x Bounded-Unlimited structures.

slide43

An O(n^3) Algorithm

Based on the algorithm of Demaine et al. (’07), we decompose both RNAs into heavy paths.

The special substrings are decided on each base pair comparison: the base pair that has the largest root light base pair is the dominant one

[Figure: R1 with base pairs numbered 1 to 9 and R2 with base pairs labeled A to F]

slide44

An O(n^3) Algorithm

The number of compared substrings is O(n^3)

This algorithm works with Nested x Nested structures only

slide45

More Algorithms

  • Find the local approximate pattern matching between Nested x Nested structures in O(n^3 k^2) for k allowed mismatches
  • Find the local approximate pattern matching between Nested x Bounded-Unlimited structures in O(n^3 k^2 log n) for k allowed mismatches
  • Find the most similar sibling substructures between Nested x Nested structures in O(n^3)

slide46

THANK YOU!