THE RNA DETECTIVE GAME: FINDING RNA CHAINS FROM FRAGMENTS

Presentation Transcript
slide1

THE RNA DETECTIVE GAME:

FINDING RNA CHAINS FROM FRAGMENTS

RNA

Detective

Fred Roberts, Rutgers University

slide2

DNA and RNA

Deoxyribonucleic acid, DNA, is the basic building block of inheritance.

DNA can be thought of as a chain consisting of bases.

Each base is one of four possible chemicals:

Thymine (T), Cytosine (C), Adenine (A), Guanine (G)

slide3

DNA and RNA

Some DNA chains:

GGATCCTGG, TTCGCAAAAAGAATC

Real DNA chains are long:

Algae (P. salina): 6.6x10^5 bases long

Slime mold (D. discoideum): 5.4x10^7 bases long

slide4

DNA and RNA

Insect (D. melanogaster – fruit fly): 1.4x10^8 bases long

Bird (G. domesticus): 1.2x10^9 bases long

slide5

DNA and RNA

Human (H. sapiens): 3.3x10^9 bases long

The sequence of bases in DNA encodes certain genetic information.

In particular, it determines long chains of amino acids known as proteins.

slide6

DNA and RNA

How many possible DNA chains are there in humans?

slide7

Aside: Counting

Fundamental methods of combinatorics are important in mathematical biology.

slide8

The Product Rule

How many sequences of 0’s and 1’s are there of length 2?

There are 2 ways to choose the first digit and no matter how we choose the first digit, there are two ways to choose the second digit.

Thus, there are 2x2 = 2^2 = 4 ways to choose the sequence.

00, 01, 10, 11

How many sequences are there of length 3?

By similar reasoning: 2x2x2 = 2^3 = 8.

slide9

The Product Rule

Is this interesting?

slide11

The Product Rule

Really boring!

slide12

The Product Rule

Counting may be boring at times, but we will see that it can be really powerful.

slide13

The Product Rule

Product Rule: If something can happen in n1 ways and no matter how the first thing happens, a second thing can happen in n2 ways, then the two things together can happen in n1 x n2 ways.

More generally, if something can happen in n1 ways and no matter how the first thing happens, a second thing can happen in n2 ways, and no matter how the first two things happen a third thing can happen in n3 ways, … then all the things together can happen in n1 x n2 x n3 x … ways.
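The product rule can be checked by brute force on small cases. The sketch below is only an illustration: the helper name `count_by_product_rule` is made up for this example and does not come from the slides.

```python
from itertools import product
from math import prod

def count_by_product_rule(choices_per_step):
    """Multiply the number of choices available at each step."""
    return prod(choices_per_step)

# Enumerate every 0/1 sequence of length 2 and of length 3.
binary_len2 = ["".join(bits) for bits in product("01", repeat=2)]
binary_len3 = ["".join(bits) for bits in product("01", repeat=3)]

print(binary_len2)                       # ['00', '01', '10', '11']
print(len(binary_len3))                  # 8
print(count_by_product_rule([2, 2, 2]))  # 8
```

The enumeration and the product agree, which is all the rule claims.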

slide14

DNA and RNA

How many possible DNA chains are there in humans?

How many DNA chains are there with two bases?

Answer (Product Rule): 4x4 = 4^2 = 16.

There are 4 choices for the first base and, for each such choice, 4 choices for the second base.

How many with 3 bases?

How many with n bases?

slide15

DNA and RNA

How many with 3 bases? 4^3 = 64

How many with n bases? 4^n

How many human DNA chains are possible?

4^(3.3x10^9)

This is greater than 10^(1.98x10^9)

(1 followed by 1.98 billion zeroes!)
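The number 4^(3.3x10^9) is far too large to write out, but its size can be estimated with logarithms, since log10(4^n) = n x log10(4). A minimal sketch, with variable names chosen only for this example:

```python
from math import log10

human_bases = 3.3e9                 # chain length quoted on the slide
exponent = human_bases * log10(4)   # log10 of 4^(3.3x10^9)

print(4 ** 2, 4 ** 3)               # 16 64
print(f"4^(3.3x10^9) is about 10^({exponent:.3e})")
```

The exponent comes out just under 1.99x10^9, so the count has roughly two billion digits.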

slide16

DNA and RNA

RNA is a “messenger molecule” whose links are defined from DNA.

An RNA chain has at each link one of four bases.

The possible bases are the same as those in DNA except that the base Uracil (U) replaces the base Thymine (T).

slide17

The RNA Detective Game

Sample RNA chains:

GGCAUUGGA, UAUAUGCGGCUUC

RNA chains are very long.

Can we discover what they look like without actually observing them?

Trick: Use enzymes.

slide18

The RNA Detective Game

Some enzymes break up an RNA chain into fragments after each G link.

Some enzymes break up the chain after each C or U link.

Consider the chain

CCGGUCCGAAAG

Applying the G enzyme breaks the chain into the following fragments:

G fragments: CCG, G, UCCG, AAAG

We know that these are the fragments, but we do not know the order in which they appear.

How many possible chains have these four fragments?

slide19

The RNA Detective Game

Chain: CCGGUCCGAAAG

G fragments: CCG, G, UCCG, AAAG

Product rule again: 4 choices for the first fragment, for each such choice 3 choices for second fragment, …

There are 4x3x2x1 = 4! = 24 possible chains.

One chain corresponding to each permutation of these four fragments.

One such chain different from the original:

UCCGGCCGAAAG
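A complete digest is easy to simulate. The sketch below is an illustration only: the function `digest` and its parameter `cut_after` are names invented for this example. It cuts the chain after every G and then counts the chains obtained by reordering the fragments.

```python
from itertools import permutations

def digest(chain, cut_after):
    """Split chain after every base listed in cut_after."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:  # trailing piece that does not end in a cut base
        fragments.append(current)
    return fragments

g_fragments = digest("CCGGUCCGAAAG", "G")
print(g_fragments)  # ['CCG', 'G', 'UCCG', 'AAAG']

# One candidate chain per ordering of the four distinct fragments.
candidates = {"".join(order) for order in permutations(g_fragments)}
print(len(candidates))  # 24
```

The set contains the original chain as well as UCCGGCCGAAAG.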

slide20

The RNA Detective Game

Chain: CCGGUCCGAAAG

Suppose we instead apply the U,C enzyme.

We get the following fragments:

U,C fragments: C, C, GGU, C, C, GAAAG

How many chains are there with these fragments?

Is 6! = 720 the correct answer???

Consider two of the permutations: the one that takes the fragments in the order given, and the one that swaps the first two fragments and keeps all the others in the same order.

They give rise to the same chain, because the first two fragments are both C.

slide21

The RNA Detective Game

So 6! is wrong.

What is the answer??

What if the fragments were

C, C, C, C, C

There are 5! permutations of these fragments, but only one RNA chain with these fragments:

CCCCC

slide23

Multinomial Coefficients

Putting n distinguishable balls into k distinguishable boxes:

The number of ways to put n1 balls into the first box, n2 balls into the second box, …, nk balls into the kth box is denoted by C(n;n1,n2,…,nk), where n = n1 + n2 + … + nk.

slide24

Multinomial Coefficients

Theorem: C(n;n1,n2,…,nk) = n!/(n1! n2! … nk!)

Example: How many RNA chains of length 6 have 3 C’s and 3 A’s?

Think of 2 boxes, a C box and an A box. How many ways are there to put 3 positions (balls) into the C box and 3 into the A box?

Answer: C(6;3,3) = 6!/(3! 3!) = 20.

Some of these are: CACACA, ACACAC, AAACCC.
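The formula can be compared with a direct enumeration. This is a sketch for illustration; the function name `multinomial` is an assumption of this example, not notation from the slides.

```python
from itertools import product
from math import factorial

def multinomial(*counts):
    """C(n; n1, ..., nk) = n! / (n1! n2! ... nk!), with n = n1 + ... + nk."""
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)  # every intermediate quotient is an integer
    return result

# Every length-6 chain over {C, A} that has exactly 3 C's (and so 3 A's).
chains = ["".join(p) for p in product("CA", repeat=6) if p.count("C") == 3]

print(multinomial(3, 3))  # 20
print(len(chains))        # 20
```

The 20 enumerated chains include CACACA, ACACAC, and AAACCC.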

slide25

Multinomial Coefficients

If a 6-link RNA chain is chosen at random, what is the probability of obtaining one with 3 C’s and 3 A’s?

Answer: There are 4^6 possible RNA chains of length 6.

The probability is therefore

C(6;3,3)/4^6 = 20/4096 ≈ .005.

slide26

Multinomial Coefficients

The number of 10-link RNA chains consisting of 3 A’s, 2 C’s, 2 U’s, and 3 G’s is

C(10;3,2,2,3) = 25,200

What if we know they end in AAG?

Then, only the first 7 positions need to be filled, and 2 A’s and one G are already used up. Hence, the answer is

C(7;1,2,2,2) = 630

Notice how knowing the end of a chain can dramatically reduce the number of possible chains.
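Both counts follow from the multinomial formula once the bases used by the known ending are subtracted. A minimal sketch, with helper and variable names invented for this example:

```python
from collections import Counter
from math import factorial

def multinomial(counts):
    """n! / (n1! n2! ... nk!) for an iterable of box sizes."""
    counts = list(counts)
    result = factorial(sum(counts))
    for c in counts:
        result //= factorial(c)
    return result

composition = Counter({"A": 3, "C": 2, "U": 2, "G": 3})
total = multinomial(composition.values())

# Bases still to be placed once the chain is known to end in AAG.
remaining = composition - Counter("AAG")
with_known_ending = multinomial(remaining.values())

print(total, with_known_ending)  # 25200 630
```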

slide28

The RNA Detective Game

Recall that we have the following U,C fragments:

C, C, GGU, C, C, GAAAG

The number of RNA chains with these fragments is not 6! = 720.

Think of having 6 positions (there are 6 fragments) and assigning 4 positions to the C box, 1 to the GGU box, and one to the GAAAG box.

Then the number of ways of doing this is

C(6;4,1,1) = 6!/(4! 1! 1!) = 30
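The count of 30 can be confirmed by generating all 720 orderings of the six fragments and keeping only the distinct results. This is a brute-force check for illustration, not part of the original argument.

```python
from itertools import permutations

uc_fragments = ["C", "C", "GGU", "C", "C", "GAAAG"]

orderings = list(permutations(uc_fragments))
distinct_chains = {"".join(order) for order in orderings}

print(len(orderings))        # 720
print(len(distinct_chains))  # 30
```

Each distinct chain arises from 4! = 24 orderings, one for every way of shuffling the four identical C fragments.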

slide29

The RNA Detective Game

U,C fragments: C, C, GGU, C, C, GAAAG

Actually, this computation is still a bit off, though not because the combinatorial argument is wrong.

Notice that the fragment GAAAG does not end in U or C.

Thus, we know it comes last.

There are 5 remaining U,C fragments.

The number of chains beginning with these 5 fragments is given by

C(5;4,1) = 5

Beginning of the chains: CCCCGGU, CCCGGUC, CCGGUCC, CGGUCCC, GGUCCCC

slide30

The RNA Detective Game

We get all chains with the given U,C fragments by adding GAAAG to the end of each of these:

CCCCGGUGAAAG

CCCGGUCGAAAG

CCGGUCCGAAAG

CGGUCCCGAAAG

GGUCCCCGAAAG
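The same five chains can be produced mechanically: put the fragment that does not end in U or C last and arrange the remaining fragments in front of it. A sketch under the assumption that exactly one such fragment exists, as in this example:

```python
from itertools import permutations

uc_fragments = ["C", "C", "GGU", "C", "C", "GAAAG"]

# Fragments that end in U or C can sit anywhere; the other one must be last.
normal = [f for f in uc_fragments if f[-1] in "UC"]
abnormal = [f for f in uc_fragments if f[-1] not in "UC"]

chains = sorted({"".join(order) + abnormal[0] for order in permutations(normal)})
for chain in chains:
    print(chain)
```

This prints the five chains listed above, in the same order.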

slide31

The RNA Detective Game

Thus, there are 24 possible chains with the given G fragments and 5 with the given U,C fragments.

But: We have not yet combined our knowledge of both G and U,C fragments.

G fragments: CCG, G, UCCG, AAAG

U,C fragments: C, C, GGU, C, C, GAAAG

Which of the 5 chains with these U,C fragments has the right G fragments?

slide32

The RNA Detective Game

G fragments: CCG, G, UCCG, AAAG

U,C fragments: C, C, GGU, C, C, GAAAG

Which of the 5 chains with these U,C fragments has the right G fragments?

CCCCGGUGAAAG

CCCGGUCGAAAG

CCGGUCCGAAAG

CGGUCCCGAAAG

GGUCCCCGAAAG

CCCCGGUGAAAG does not: It has CCCCG as a G fragment.

What about the others?

slide33

The RNA Detective Game

Checking the remaining 4 possible RNA chains with the given U,C fragments shows that only the third one,

CCGGUCCGAAAG

has the given G fragments.

Hence, we have recovered the initial chain.

This is an example of recovery of an RNA chain given a complete digest by enzymes.

How remarkable is it that we could recover the initial RNA chain this way?

slide34

The RNA Detective Game

CCGGUCCGAAAG

How many RNA chains are there with the same bases as this chain?

There are 12 bases: 4 C’s, 4 G’s, 3 A’s, and 1 U.

The number of chains with these bases is given by C(12;4,4,3,1) = 138,600

Thus, knowing how many times each base occurs is not nearly as useful as knowing the fragments.

slide35

The RNA Detective Game

Another example.

G fragments: UG, ACG, AC

U,C fragments: U, GAC, GAC

Step 1: Does any fragment have to come last?

slide36

The RNA Detective Game

G fragments: UG, ACG, AC

U,C fragments: U, GAC, GAC

Step 1: Does any fragment have to come last?

None of the U,C fragments has to come last.

However, the G fragment AC has to come last.

Thus, the other two G fragments come first in some order and there are only two possible RNA chains with these G fragments: UGACGAC, ACGUGAC

slide37

The RNA Detective Game

G fragments: UG, ACG, AC

U,C fragments: U, GAC, GAC

There are only two possible RNA chains with these G fragments: UGACGAC, ACGUGAC

The latter has AC as a U,C fragment. So, the former is the correct chain.
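For small examples like the two above, the detective work can be done by brute force: try every ordering of the U,C fragments and keep the chains whose two digests match the given fragments. The function names `digest` and `reconstruct` are assumptions of this sketch, and this exhaustive search is a stand-in for the reasoning on the slides, not a method they describe.

```python
from itertools import permutations

def digest(chain, cut_after):
    """Split chain after every base listed in cut_after."""
    fragments, current = [], ""
    for base in chain:
        current += base
        if base in cut_after:
            fragments.append(current)
            current = ""
    if current:
        fragments.append(current)
    return fragments

def reconstruct(g_fragments, uc_fragments):
    """Return all chains whose G and U,C digests match the given fragments."""
    found = set()
    for order in permutations(uc_fragments):
        chain = "".join(order)
        if (sorted(digest(chain, "G")) == sorted(g_fragments)
                and sorted(digest(chain, "UC")) == sorted(uc_fragments)):
            found.add(chain)
    return sorted(found)

first = reconstruct(["CCG", "G", "UCCG", "AAAG"],
                    ["C", "C", "GGU", "C", "C", "GAAAG"])
second = reconstruct(["UG", "ACG", "AC"], ["U", "GAC", "GAC"])
print(first)   # ['CCGGUCCGAAAG']
print(second)  # ['UGACGAC']
```

The search grows factorially with the number of fragments, so it is only practical for toy inputs.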

slide38

The RNA Detective Game

Is it always possible to completely recover the original RNA chain given its G fragments and U,C fragments?


slide39

The RNA Detective Game

Is it always possible to completely recover the original RNA chain given its G fragments and U,C fragments?

No: sometimes the solution is ambiguous.

Exercise: Find two RNA chains with the same G and U,C fragments.

slide40

Eulerian Paths

Surprisingly, eulerian paths in multidigraphs can be used to help with the RNA detective game.

When a digraph is allowed to have more than one arc from vertex x to vertex y, we call it a multidigraph.

A path in a multidigraph is called eulerian if it uses every arc once and only once. (Recall the Königsberg Bridge Problem.)

A closed path (one that ends where it starts) is eulerian if it is eulerian as a path.

slide41

Eulerian Paths

(Figure: multidigraph on the vertices a, b, c, d, e.)

eulerian closed path: a, b, c, d, b, e, a

slide42

Eulerian Paths

(Figure: multidigraph on the vertices a, b, c, d, e.)

eulerian path: a, b, c, d, b, e

slide43

Eulerian Paths

When does a multidigraph have an eulerian path or closed path?

Theorem (I.J. Good, 1946): A connected multidigraph has an eulerian closed path iff for every vertex, the indegree (number of incoming arcs) equals the outdegree (number of outgoing arcs).

Theorem (I.J. Good, 1946): A connected multidigraph has an eulerian path iff indegree equals outdegree at every vertex, with the possible exception of two vertices: one whose outdegree exceeds its indegree by one (where the path starts) and one whose indegree exceeds its outdegree by one (where the path ends).

slide45

Eulerian Paths

Note that these theorems hold if there are loops from a vertex to itself.

A loop adds 1 to indegree and 1 to outdegree.

Thus, loops do not affect the existence of eulerian paths or closed paths.
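The degree conditions are simple to test in code. The sketch below assumes the multidigraph is connected (it does not check this) and represents it as a list of (tail, head) arcs. The arc lists are an assumption: they are read off the two example paths a, b, c, d, b, e, a and a, b, c, d, b, e rather than taken from the original figures.

```python
from collections import Counter

def degree_imbalance(arcs):
    """Map each vertex to outdegree minus indegree."""
    balance = Counter()
    for tail, head in arcs:
        balance[tail] += 1
        balance[head] -= 1
    return dict(balance)

def has_eulerian_closed_path(arcs):
    # Assumes connectivity; checks indegree == outdegree everywhere.
    return all(v == 0 for v in degree_imbalance(arcs).values())

def has_eulerian_path(arcs):
    # Assumes connectivity; allows one start vertex (+1) and one end vertex (-1).
    nonzero = sorted(v for v in degree_imbalance(arcs).values() if v != 0)
    return nonzero in ([], [-1, 1])

closed_example = [("a", "b"), ("b", "c"), ("c", "d"),
                  ("d", "b"), ("b", "e"), ("e", "a")]
open_example = closed_example[:-1]          # drop the arc from e back to a

print(has_eulerian_closed_path(closed_example))  # True
print(has_eulerian_closed_path(open_example))    # False
print(has_eulerian_path(open_example))           # True
# A loop adds 1 to both indegree and outdegree, so nothing changes.
print(has_eulerian_closed_path(closed_example + [("a", "a")]))  # True
```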

slide46

Eulerian Paths and the RNA Detective Game

Assume that there are at least two G fragments and at least two U,C fragments. Otherwise, one of the enzymes leaves the chain unbroken, so its single fragment is the original chain.

Example:

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

slide47

Eulerian Paths and the RNA Detective Game

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

Step 1: Break down each fragment after each G, U, or C.

E.g. (x marks each break):

GAAAGAA becomes GxAAAGxAA

GGU becomes GxGxU

UCACG becomes UxCxACxG

Each piece is called an extended base.

All extended bases in a fragment except first and last are called interior extended bases.
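Step 1 amounts to cutting after every G, U, or C. A minimal sketch, with the function names `extended_bases` and `interior_extended_bases` chosen for this example:

```python
def extended_bases(fragment):
    """Split a fragment after each G, U, or C."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:  # a trailing run of A's forms the last extended base
        pieces.append(current)
    return pieces

def interior_extended_bases(fragment):
    """All extended bases except the first and the last."""
    return extended_bases(fragment)[1:-1]

print(extended_bases("GAAAGAA"))          # ['G', 'AAAG', 'AA']
print(extended_bases("GGU"))              # ['G', 'G', 'U']
print(extended_bases("UCACG"))            # ['U', 'C', 'AC', 'G']
print(interior_extended_bases("UCACG"))   # ['C', 'AC']
```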

slide48

Eulerian Paths and the RNA Detective Game

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

Step 2: Use the extended base breakup of fragments to find the beginning and end of the RNA chain.

Start by making two lists

All interior extended bases of all fragments:

C, C, AC, G, AAAG

Fragments with one extended base:

G, AAAG, AA, C, C, C, AC

slide49

Eulerian Paths and the RNA Detective Game

All interior extended bases of all fragments:

C, C, AC, G, AAAG

Fragments with one extended base:

G, AAAG, AA, C, C, C, AC

Theorem: Every entry on the first list is on the second list. There are always exactly two entries on the second list not on the first. One of these is the first extended base of the entire RNA chain and the other is the last.

Thus: chain begins in AA or C and ends in AA or C.

How do you tell how it ends?

slide50

Eulerian Paths and the RNA Detective Game

Thus: chain begins in AA or C and ends in AA or C.

How do you tell how it ends?

One of these must be from an abnormal fragment: a G fragment that doesn’t end in G or a U,C fragment that doesn’t end in U or C.

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

AA is such an abnormal fragment.

An abnormal fragment marks the end of the chain.

So: chain ends in AA and begins in C.
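Step 2 can be sketched with multisets. The code builds the two lists, subtracts the first from the second, and uses the abnormal fragments to decide which leftover entry ends the chain. All names are illustrative, and the sketch assumes, as in this example, that an abnormal fragment exists.

```python
from collections import Counter

def extended_bases(fragment):
    """Split a fragment after each G, U, or C."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

g_fragments = ["CCG", "G", "UCACG", "AAAG", "AA"]
uc_fragments = ["C", "C", "GGU", "C", "AC", "GAAAGAA"]

interior = Counter()   # list 1: interior extended bases of all fragments
singles = Counter()    # list 2: fragments made of one extended base
for fragment in g_fragments + uc_fragments:
    pieces = extended_bases(fragment)
    interior.update(pieces[1:-1])
    if len(pieces) == 1:
        singles[fragment] += 1

endpoints = sorted((singles - interior).elements())
print(endpoints)  # ['AA', 'C']

# Abnormal: a G fragment not ending in G, or a U,C fragment not ending in U or C.
abnormal = ([f for f in g_fragments if f[-1] != "G"]
            + [f for f in uc_fragments if f[-1] not in "UC"])
last = next(e for e in endpoints
            if any(extended_bases(f)[-1] == e for f in abnormal))
rest = list(endpoints)
rest.remove(last)
first = rest[0]
print(first, last)  # C AA
```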

slide51

Eulerian Paths and the RNA Detective Game

Step 3: Build a multidigraph.

First, identify all normal fragments with more than one extended base. From each such fragment, use the first and last extended bases as vertices and draw an arc from the first to the last.

Label the arc with the corresponding fragment.

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

Fragment UCACG gives rise to vertices U and G and we include an arc from U to G labeled UCACG.

slide53

Eulerian Paths and the RNA Detective Game

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

Fragment CCG means that we include an arc from C to G labeled CCG.

Fragment GGU means that we include an arc from G to U labeled GGU.

slide55

Eulerian Paths and the RNA Detective Game

There might be several arcs from a given extended base to another if there are several normal fragments from the first to the second. That is why we get a multidigraph.

Step 4: We add one additional arc.

Identify the longest abnormal fragment.

Include an arc from the first (and perhaps only) extended base in this fragment to the first extended base in the chain.

Label this arc X*Y, where X is the longest abnormal fragment and Y is the first extended base in the chain.

slide56

Eulerian Paths and the RNA Detective Game

G fragments: CCG, G, UCACG, AAAG, AA

U,C fragments: C, C, GGU, C, AC, GAAAGAA

GAAAGAA is the longest abnormal fragment.

Put in an arc from G (first extended base in this fragment) to C (first extended base in the chain).

Label the arc as GAAAGAA*C
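Steps 3 and 4 can be sketched together. The function `build_multidigraph` and the (tail, head, label) representation of arcs are assumptions of this example; the first extended base of the chain is passed in as already known from Step 2.

```python
def extended_bases(fragment):
    """Split a fragment after each G, U, or C."""
    pieces, current = [], ""
    for base in fragment:
        current += base
        if base in "GUC":
            pieces.append(current)
            current = ""
    if current:
        pieces.append(current)
    return pieces

def build_multidigraph(g_fragments, uc_fragments, first_extended_base):
    """Return the arcs as (tail, head, label) triples."""
    arcs, abnormal = [], []
    for fragments, cutters in ((g_fragments, "G"), (uc_fragments, "UC")):
        for fragment in fragments:
            pieces = extended_bases(fragment)
            if fragment[-1] not in cutters:
                abnormal.append(fragment)          # handled in Step 4
            elif len(pieces) > 1:
                arcs.append((pieces[0], pieces[-1], fragment))  # Step 3
    # Step 4: one extra arc for the longest abnormal fragment.
    longest = max(abnormal, key=len)
    arcs.append((extended_bases(longest)[0], first_extended_base,
                 longest + "*" + first_extended_base))
    return arcs

arcs = build_multidigraph(["CCG", "G", "UCACG", "AAAG", "AA"],
                          ["C", "C", "GGU", "C", "AC", "GAAAGAA"],
                          first_extended_base="C")
for arc in arcs:
    print(arc)
```

For this example the result is four arcs: C to G (CCG), U to G (UCACG), G to U (GGU), and G to C (GAAAGAA*C).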

slide58

Eulerian Paths and the RNA Detective Game

Theorem: This multidigraph has an eulerian closed path. The RNA chains with the given G and U,C fragments correspond to eulerian closed paths that end with the special arc X*Y.

In our example, it is easy to check it has an eulerian closed path. (Use I.J. Good’s Theorem.)

slide59

Eulerian Paths and the RNA Detective Game

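(Figure: multidigraph on the vertices C, G, U with arcs CCG from C to G, GGU from G to U, UCACG from U to G, and GAAAGAA*C from G to C.)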

The only eulerian closed path that ends in GAAAGAA*C goes from C to G to U to G to C.

slide60

Eulerian Paths and the RNA Detective Game

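(Figure: multidigraph on the vertices C, G, U with arcs CCG from C to G, GGU from G to U, UCACG from U to G, and GAAAGAA*C from G to C.)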

Step 5: Use the corresponding labeling of arcs to obtain the chain:

CCGGUCACGAAAGAA

It is easy to check this has the right G and U,C fragments.
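The last two steps can be sketched as a small backtracking search. The arcs are typed in by hand from the example. The way labels are merged (dropping the extended base that consecutive arcs share) is one reading of Step 5, stated here as an assumption, and the function names are invented for this example.

```python
def eulerian_closed_paths(arcs, start, last_arc):
    """Yield arc orderings that start at `start`, use every arc once,
    return to `start`, and finish with `last_arc`."""
    def extend(vertex, used, path):
        if len(path) == len(arcs):
            if vertex == start and path[-1] == last_arc:
                yield list(path)
            return
        for i, arc in enumerate(arcs):
            tail, head, _ = arc
            if i not in used and tail == vertex:
                used.add(i)
                path.append(arc)
                yield from extend(head, used, path)
                path.pop()
                used.remove(i)
    yield from extend(start, set(), [])

def chain_from_path(path):
    """Concatenate arc labels, dropping the shared first extended base."""
    chain = path[0][2]
    for tail, _, label in path[1:]:
        fragment = label.split("*")[0]   # X*Y contributes only X
        chain += fragment[len(tail):]    # tail is the fragment's first extended base
    return chain

special = ("G", "C", "GAAAGAA*C")
arcs = [("C", "G", "CCG"), ("U", "G", "UCACG"), ("G", "U", "GGU"), special]

paths = list(eulerian_closed_paths(arcs, start="C", last_arc=special))
chains = sorted({chain_from_path(p) for p in paths})
print(len(paths))  # 1
print(chains)      # ['CCGGUCACGAAAGAA']
```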

slide61

The RNA Detective Game: Concluding Comments

The “fragmentation stratagem” we have described was used by R.W. Holley and his colleagues at Cornell in 1965 to determine the first nucleic acid sequence.

The method is not used anymore and was only used for a short time before other, more efficient methods were adopted.

However, it has great historical significance and illustrates an important role for mathematical methods in biology.

slide62

The RNA Detective Game: Concluding Comments

Nowadays, by use of radioactive marking and high-speed computer analysis, it is possible to sequence long RNA and DNA chains rather quickly.

slide63

The RNA Detective Game: Concluding Comments

The mathematical power of the fragmentation stratagem, nevertheless, is a good illustration of the use of methods of discrete mathematics in modern molecular biology.

slide64

The RNA Detective Game: Concluding Comments

And of the power of counting!