
Vladimir V. Ufimtsev - PowerPoint PPT Presentation


Presentation Transcript
slide1

GENERATION OF DNA CODES

Vladimir V. Ufimtsev

Adviser: Dr. V. Rykov

slide2

Historical Background

1948

A Mathematical Theory of Communication C.E. Shannon

Main result: Entropy function - average value of information obtained from a channel.

1950

Error Detecting and Error Correcting Codes

R.W. Hamming

Main result: Matrices that can be used to encode messages and provide more reliable transmission across a channel.

1953

A Structure for Deoxyribose Nucleic Acid

J. D. Watson, F. H. C. Crick, M. H. F. Wilkins, R. E. Franklin

Main result: Structure found for the building block of life.

1959

There's Plenty of Room at the Bottom

R.P. Feynman

Main result: Anticipated science at the nanoscale ($10^{-9}$ meters).

slide3

Basic Coding Theory

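Let $A_q^n$ denote the set consisting of all vectors (codewords) of length n built over the alphabet $A_q = \{0, 1, \dots, q-1\}$,

i.e. $A_q^n = \{(x_1, \dots, x_n) : x_i \in A_q\}$.

Let $\rho : A_q^n \times A_q^n \to \mathbb{R}$ be a metric, i.e. a function such that for all $x, y, z \in A_q^n$:

1) $\rho(x, y) \ge 0$, with equality if and only if $x = y$

2) $\rho(x, y) = \rho(y, x)$

3) $\rho(x, z) \le \rho(x, y) + \rho(y, z)$

Let $C \subseteq A_q^n$ be such that $|C| = M$ and $\min_{x \ne y \in C} \rho(x, y) = d$.

C is referred to as a code of length n, size M, and minimum distance d.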
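The three code parameters can be read off a small example directly. The sketch below is only an illustration: it assumes the Hamming metric as a stand-in for the metric, and `example_code` is a made-up 4-ary code over the letters A, C, G, T, not one taken from the slides.

```python
from itertools import combinations


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def code_parameters(code, metric=hamming):
    """Return (n, M, d): length, size and minimum distance of a code."""
    words = list(code)
    n = len(words[0])
    M = len(words)
    # Minimum distance: smallest metric value over all pairs of distinct codewords.
    d = min(metric(x, y) for x, y in combinations(words, 2))
    return n, M, d


# Hypothetical example code, chosen only for illustration.
example_code = ["AAAA", "ACGT", "CATG", "GTAC"]
print(code_parameters(example_code))  # (4, 4, 3)
```

Any other metric on equal-length words can be passed as `metric` without changing the rest of the function.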

slide4

Spheres

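A sphere in $A_q^n$ centered at x having radius d: $S_d(x) = \{ y \in A_q^n : \rho(x, y) \le d \}$, where $\rho$ is the metric on the space $A_q^n$ of words of length n.

Volume of the sphere around x, of radius d: $V_d(x) = |S_d(x)|$

Spaces

A space is HOMOGENEOUS when the volume of a sphere does not depend on where it is centered, i.e. $V_d(x) = V_d$ for all x.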

A space is NON - HOMOGENEOUS when the volume of a sphere does depend on where it is centered.
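The difference between the two kinds of space can be checked by brute force on a tiny example. The sketch below makes two assumptions that are not in the slides: the homogeneous case is illustrated with the Hamming metric, and the non-homogeneous case with the distance n - LCS(x, y) on words of equal length n over the alphabet A, C, G, T.

```python
from itertools import product

ALPHABET = "ACGT"


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def lcs_length(x, y):
    """Length of a longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def deletion_distance(x, y):
    """Assumed metric for equal-length words: n - LCS(x, y)."""
    return len(x) - lcs_length(x, y)


def sphere_volume(center, radius, metric, n):
    """Count the words of length n within the given radius of the center."""
    return sum(1 for y in product(ALPHABET, repeat=n) if metric(center, y) <= radius)


def volumes_by_center(radius, metric, n):
    """Map every center (as a string) to the volume of its sphere."""
    return {
        "".join(x): sphere_volume(x, radius, metric, n)
        for x in product(ALPHABET, repeat=n)
    }


hamming_volumes = volumes_by_center(1, hamming, 3)
deletion_volumes = volumes_by_center(1, deletion_distance, 3)
print(set(hamming_volumes.values()))                     # {10}
print(deletion_volumes["AAA"], deletion_volumes["ACG"])  # 10 26
```

With radius 1 and n = 3, every Hamming sphere holds 10 words, while the spheres of the LCS-based distance hold 10 words around AAA but 26 around ACG.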

slide5

The Main Coding Theory Problem

For any code there are 3 conflicting parameters:

Length: n

Size: M

Minimum distance: d

The aim of coding theory is:

Given any 2 parameters, find the optimal value for the 3rd. We need small n for fast transmission, large M so that as much information as possible can be encoded, and large d so that we can detect and correct many errors.

slide6

Bounds in Coding Theory

Exact formulas for sphere volumes and code sizes are sometimes extremely difficult to obtain. In most cases only upper and lower bounds can be obtained for these parameters.

We will be working in a NON-HOMOGENEOUS space, which makes obtaining exact formulas for sphere volumes and code sizes VERY HARD.

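Hamming Upper Bound on Code Size in $A_q^n$ with any metric: $M \le \dfrac{q^n}{\min_x V_t(x)}$, where $t = \lfloor (d-1)/2 \rfloor$ and $V_t(x)$ is the volume of the sphere of radius t centered at x.

Varshamov-Gilbert Lower Bound on Code Size in $A_q^n$ with any metric: $M \ge \dfrac{q^n}{\bar{V}_{d-1}}$, where $\bar{V}_{d-1} = q^{-n} \sum_x V_{d-1}(x)$ is the average volume of a sphere of radius d - 1.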
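A lower bound of the Varshamov-Gilbert type can be evaluated numerically on a small space. The sketch below assumes the bound has the form "size of the space divided by the average volume of a sphere of radius d - 1", and it uses the Hamming metric on 4-ary words of length 3 only because the numbers are easy to verify by hand. The greedy construction is a separate, standard way to produce a valid code. It is not the argument used in the slides, and in a non-homogeneous space it is only guaranteed to reach the space size divided by the largest sphere volume.

```python
from itertools import combinations, product


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def average_sphere_volume(space, radius, metric):
    """Average over all centers of the number of words within the radius."""
    total = sum(1 for x in space for y in space if metric(x, y) <= radius)
    return total / len(space)


def gv_lower_bound(space, d, metric):
    """Assumed form of the bound: |space| / (average volume of radius d - 1)."""
    return len(space) / average_sphere_volume(space, d - 1, metric)


def greedy_code(space, d, metric):
    """Scan the space once, keeping each word at distance >= d from all kept words."""
    code = []
    for x in space:
        if all(metric(x, c) >= d for c in code):
            code.append(x)
    return code


space = ["".join(w) for w in product("ACGT", repeat=3)]
bound = gv_lower_bound(space, 2, hamming)  # 64 / 10 = 6.4
code = greedy_code(space, 2, hamming)
print(bound, len(code))
```

Swapping `hamming` for another metric on equal-length words leaves `gv_lower_bound` unchanged, which is the point of stating the bound for any metric.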

slide7

Turán's Theorem

Let G be a simple graph on N vertices and e edges. G contains an M-clique if: $e > \dfrac{M-2}{M-1} \cdot \dfrac{N^2}{2}$

CLIQUES:

slide8

From Turan to Varshamov-Gilbert

If:

Then there exists a code of size M.

slide9

Let

Then:

Hence there exists a code of size M and so:
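The formulas for this step did not survive in the transcript. The following is a sketch of how the argument can go, not necessarily the slides' own derivation. Assumed notation: the graph has the $q^n$ words as vertices, with an edge between x and y whenever their distance is at least d; the metric $\rho$ is integer-valued; $V_{d-1}(x)$ is the volume of the sphere of radius d - 1 centered at x; and $\bar{V}_{d-1}$ is the average of $V_{d-1}(x)$ over all x. Turán's theorem is used in the form "more than $\frac{M-2}{M-1} \cdot \frac{N^2}{2}$ edges on N vertices forces an M-clique", with $N = q^n$ and $M \ge 2$.

```latex
% Assumed notation: e = number of edges, V_{d-1}(x) = sphere volume,
% \bar{V}_{d-1} = average sphere volume over all q^n centers.
\begin{align*}
e &= \frac{1}{2}\sum_{x}\bigl(q^n - V_{d-1}(x)\bigr)
   = \frac{q^n}{2}\bigl(q^n - \bar{V}_{d-1}\bigr), \\
\text{Let } M &= \left\lceil \frac{q^n}{\bar{V}_{d-1}} \right\rceil
   \;\Longrightarrow\; M - 1 < \frac{q^n}{\bar{V}_{d-1}}
   \;\Longrightarrow\; \bar{V}_{d-1} < \frac{q^n}{M-1}, \\
e &> \frac{q^n}{2}\Bigl(q^n - \frac{q^n}{M-1}\Bigr)
   = \frac{M-2}{M-1}\cdot\frac{q^{2n}}{2}.
\end{align*}
```

An M-clique in this graph is a set of M words that are pairwise at distance at least d, so under these assumptions a code of size $M \ge q^n / \bar{V}_{d-1}$ exists.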

slide10

DNA Structure

The rules of base pairing (nucleotide pairing):

  • A - T: adenine (A) always pairs with thymine (T)

  • C - G: cytosine (C) always pairs with guanine (G)

slide11

Watson-Crick complements

  • Each base has a bonding surface
  • Bonding surface of A is complementary to that of T (2 bonds)
  • Bonding surface of G is complementary to that of C (3 bonds)
  • Hybridization is a process that joins two complementary, opposite-polarity single strands into a double strand through hydrogen bonds.
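The pairing rules and the opposite-polarity requirement can be expressed in a few lines. The sketch below assumes strands are written as strings over A, C, G, T, both read in the same chemical direction, so that a perfect partner is the reverse complement. The function names are illustrative and do not come from the slides.

```python
# Watson-Crick pairing rules: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Complement every base and reverse the order (opposite polarity)."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def is_watson_crick_duplex(strand_a, strand_b):
    """True when strand_b is exactly the reverse complement of strand_a."""
    return strand_b == reverse_complement(strand_a)


print(reverse_complement("ATCTGAT"))                 # ATCAGAT
print(is_watson_crick_duplex("ATCTGAT", "ATCAGAT"))  # True
```

Taking the reverse complement twice returns the original strand, which makes the relation symmetric between the two strands.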
slide13

Types of Hybridization

Direct

Shifted

Folded

Loop

slide14

DNA Computing

Interest in DNA computing was sparked in 1994 by Len Adleman.

Adleman showed how DNA molecules can be used to solve a mathematical problem (the Hamiltonian path problem).

DNA computing relies on the fact that DNA strands can be represented as sequences of bases (4-ary sequences) and on the property of hybridization.

In hybridization, errors can occur. Thus, error-correcting codes are required for efficient synthesis of DNA strands to be used in computing.

slide15

Similarity

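Sequence $z = (z_1, \dots, z_k)$ is a subsequence of $x = (x_1, \dots, x_n)$ if and only if there exists a strictly increasing sequence of indices:

$1 \le i_1 < i_2 < \dots < i_k \le n$

such that:

$z_j = x_{i_j}$ for $j = 1, \dots, k$.

$L(x, y)$ is defined to be the set of longest common subsequences of x and y.

$LCS(x, y)$ is defined to be the length of the longest common subsequence of x and y.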

slide16

Example of LCS

  • X = ( A T C T G A T )

Z = ( T C G T ) is a subsequence of X

  • X = ( A T C T G A T )

Y = ( T G C A T A )

( T C A T ) ∈ L(X, Y)

LCS(X, Y) = 4
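The example can be checked with the textbook dynamic program for longest common subsequences. This is a minimal sketch using a standard algorithm, not code from the presentation. `one_lcs` returns a single member of L(X, Y), which need not be the particular sequence shown above.

```python
def is_subsequence(z, x):
    """True if z can be obtained from x by deleting symbols."""
    remaining = iter(x)
    # Each membership test consumes the iterator up to the matched symbol.
    return all(symbol in remaining for symbol in z)


def lcs_table(x, y):
    """table[i][j] = LCS length of the prefixes x[:i] and y[:j]."""
    table = [[0] * (len(y) + 1) for _ in range(len(x) + 1)]
    for i, a in enumerate(x, start=1):
        for j, b in enumerate(y, start=1):
            if a == b:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table


def lcs_length(x, y):
    """Length of a longest common subsequence of x and y."""
    return lcs_table(x, y)[-1][-1]


def one_lcs(x, y):
    """Trace back through the table to recover one longest common subsequence."""
    table = lcs_table(x, y)
    i, j, out = len(x), len(y), []
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


X, Y = "ATCTGAT", "TGCATA"
print(is_subsequence("TCGT", X))  # True
print(lcs_length(X, Y))           # 4
```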

slide17

Insertion-Deletion Metric

Original Insertion-Deletion metric (Levenshtein 1966):

This metric results from the number of deletions and insertions that need to be made to obtain 'y' from 'x'.

For vectors that have the same length n:

the number of deletions that will be made is: $n - LCS(x, y)$

likewise, the number of insertions that will be made is: $n - LCS(x, y)$
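The deletion and insertion counts follow directly from the length of a longest common subsequence. The sketch below assumes the distance is taken as the total number of deletions plus insertions. For equal-length words some authors instead use the single value n - LCS(x, y), which is half of this total. The function names are illustrative.

```python
def lcs_length(x, y):
    """Length of a longest common subsequence (row-by-row dynamic program)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def indel_counts(x, y):
    """Return (deletions from x, insertions needed to reach y)."""
    common = lcs_length(x, y)
    return len(x) - common, len(y) - common


def indel_distance(x, y):
    """Assumed convention: total number of deletions plus insertions."""
    deletions, insertions = indel_counts(x, y)
    return deletions + insertions


print(indel_counts("ACGT", "AGCT"))         # (1, 1)
print(indel_distance("ATCTGAT", "TGCATA"))  # 5
```

For two words of the same length the two counts are always equal, as in the first call.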

slide18

Longest Common Stacked Pair Subsequence

A common subsequence z is called a common stacked pair subsequence of length $|z|$ between x and y if any two elements $z_i$, $z_{i+1}$ are consecutive in x and consecutive in y or, if they are non-consecutive in x or non-consecutive in y, then $z_{i+1}$ and $z_{i+2}$ are consecutive in x and y.

Let $S(x, y)$ denote the length of the longest sequence occurring as a common stacked pair subsequence z between sequences x and y. The number $S(x, y)$ is called the similarity of blocks between x and y. The metric is defined to be

slide19

Stacked Pair Metric Bounds

The upper bound for the average sphere volume in this metric will be:

The Varshamov-Gilbert bound becomes:

slide20

Insertion-deletion stacked pair thermodynamic metric

Thermodynamic weight of virtual stacked pairs.

  • Can use statistical estimation of sphere volume.
slide21

Sense of Direction

  • There are many possibilities for metrics on the space of DNA sequences.
  • All discussed metrics are non-homogeneous, i.e. the sizes of the spheres in the metric spaces depend on the location of their centers.
  • A universal method that will allow us to calculate lower bounds for optimal code sizes was given.
slide22

Bounds for Stacked Pair Metric

Minimum distance (d) = 6

slide23

Bounds for Stacked Pair Metric

Minimum distance (d) = 7

slide24

Bounds for Stacked Pair Metric

Minimum distance (d) = 8

slide25

Bounds for Stacked Pair Metric

Minimum distance (d) = 9

slide26

Bounds for Stacked Pair Metric

Minimum distance (d) = 10
