GENERATION OF DNA CODES
Vladimir V. Ufimtsev
Adviser: Dr. V. Rykov
Historical Background
1948. A Mathematical Theory of Communication, C. E. Shannon
Main result: Entropy function - average value of information obtained from a channel.
1950. Error Detecting and Error Correcting Codes, R. W. Hamming
Main result: Matrices that can be used to encode messages and provide more reliable transmission across a channel.
1953. A Structure for Deoxyribose Nucleic Acid
J. D. Watson, F. H. C. Crick, M. H. F. Wilkins, R. E. Franklin
Main result: Structure found for the building block of life.
1959. There’s Plenty of Room at the Bottom, R. P. Feynman
Main result: Anticipated science at the nanoscale ($10^{-9}$ meters).
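Let $\mathcal{A}^n$ denote a set consisting of all vectors (codewords) of length n built over a q-ary alphabet $\mathcal{A} = \{0, 1, \dots, q-1\}$.
Let $\rho : \mathcal{A}^n \times \mathcal{A}^n \to \mathbb{R}$ be a metric, i.e. such that for all $x, y, z \in \mathcal{A}^n$:
$\rho(x, y) \ge 0$, with $\rho(x, y) = 0$ if and only if $x = y$;
$\rho(x, y) = \rho(y, x)$;
$\rho(x, z) \le \rho(x, y) + \rho(y, z)$.
Let $C \subseteq \mathcal{A}^n$ be such that:
$|C| = M$ and $\min_{x, y \in C,\ x \ne y} \rho(x, y) = d$.
$C$ is referred to as a code of length n, size M, and minimum distance d.
A sphere in $\mathcal{A}^n$ centered at x having radius d:
$S_d(x) = \{ y \in \mathcal{A}^n : \rho(x, y) \le d \}$
Volume of the sphere around x, of radius d:
$V_d(x) = |S_d(x)|$
A space is HOMOGENEOUS when the volume of a sphere does not depend on where it is centered, i.e. $V_d(x) = V_d(y)$ for all $x, y \in \mathcal{A}^n$.
A space is NON-HOMOGENEOUS when the volume of a sphere does depend on where it is centered.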
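For small n, the sphere volume and the homogeneity condition can be checked by brute force. A minimal sketch, assuming the Hamming metric as a stand-in metric and the alphabet {A, C, G, T}; the function names are illustrative and do not come from the slides.

```python
from itertools import product


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def sphere_volume(center, radius, alphabet, dist):
    """Count the words of length len(center) within `radius` of `center`."""
    n = len(center)
    return sum(1 for y in product(alphabet, repeat=n) if dist(center, y) <= radius)


def is_homogeneous(n, radius, alphabet, dist):
    """True when every sphere of the given radius has the same volume."""
    volumes = {
        sphere_volume(x, radius, alphabet, dist)
        for x in product(alphabet, repeat=n)
    }
    return len(volumes) == 1


if __name__ == "__main__":
    print(sphere_volume("AAA", 1, "ACGT", hamming))
    print(is_homogeneous(3, 1, "ACGT", hamming))
```

Under the Hamming metric every center gives the same volume, so the space is homogeneous; a metric for which `is_homogeneous` returns False is non-homogeneous in the sense above.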
For any code there are 3 conflicting parameters:
Length: n
Size: M
Minimum distance: d
The aim of coding theory is:
Given any 2 parameters, find the optimal value for the 3rd. We need small n for fast transmission, large M for as much information as possible to be encoded, and large d so that we can detect and correct many errors.
Exact formulas for sphere volumes and code sizes are often extremely difficult to obtain. In most cases only upper and lower bounds can be obtained for these parameters.
We will be working in a NON-HOMOGENEOUS space, which makes exact formulas for sphere volumes and code sizes VERY HARD to obtain.
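Hamming Upper Bound on Code Size in $\mathcal{A}^n$ with any metric: for a code of size M and minimum distance d,
$M \cdot \min_{x} V_t(x) \le q^n$, with $t = \lfloor (d-1)/2 \rfloor$,
where $V_t(x)$ is the volume of the sphere of radius t centered at x and $q^n$ is the number of words of length n. Spheres of radius t around distinct codewords are disjoint.
Varshamov-Gilbert Lower Bound on Code Size in $\mathcal{A}^n$ with any metric: there exists a code with minimum distance at least d and size
$M \ge q^n / \bar{V}_{d-1}$, where $\bar{V}_{d-1} = q^{-n} \sum_{x} V_{d-1}(x)$ is the average sphere volume.
Let G be a simple graph on N vertices and e edges. G contains an M-clique if:
$e > \frac{M-2}{M-1} \cdot \frac{N^2}{2}$ (Turán's theorem).
Take the $N = q^n$ words as vertices and join two words by an edge when their distance is at least d. An M-clique in G is a set of M words with pairwise distance at least d.
Then there exists a code of size M.
Each vertex x has $q^n - V_{d-1}(x)$ neighbours, so $e = \frac{1}{2} q^n (q^n - \bar{V}_{d-1})$, and the clique condition holds whenever $M - 1 < q^n / \bar{V}_{d-1}$.
Hence there exists a code of size M and so the maximal code size satisfies:
$M \ge q^n / \bar{V}_{d-1}$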
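The clique argument can be tried on a small example: words are vertices, two words are joined when their distance is at least d, and any clique is a code. A minimal sketch, assuming the Hamming metric as a stand-in and assuming the lower bound is used in its average-sphere-volume form (number of words divided by the average volume of a sphere of radius d - 1). The greedy search is one simple way to build a clique; it is not claimed to be the construction used in this work.

```python
from itertools import product


def hamming(x, y):
    """Number of positions in which two equal-length words differ."""
    return sum(a != b for a, b in zip(x, y))


def gv_lower_bound(n, d, alphabet, dist):
    """Number of words divided by the average volume of a radius d-1 sphere."""
    words = list(product(alphabet, repeat=n))
    # Sum of sphere volumes over all centers (each sphere includes its center).
    total = sum(1 for x in words for y in words if dist(x, y) <= d - 1)
    average_volume = total / len(words)
    return len(words) / average_volume


def greedy_code(n, d, alphabet, dist):
    """Scan all words in order, keeping each one that is at distance >= d
    from every word kept so far. The result is a clique in the graph."""
    code = []
    for word in product(alphabet, repeat=n):
        if all(dist(word, c) >= d for c in code):
            code.append(word)
    return code


if __name__ == "__main__":
    code = greedy_code(3, 2, "01", hamming)
    print(len(code), gv_lower_bound(3, 2, "01", hamming))
```

For a homogeneous metric such as Hamming, the greedy code is never smaller than this bound; in a non-homogeneous space the greedy scan only guarantees the weaker bound obtained from the largest sphere volume.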
The rules of base pairing
A (adenine) always pairs with T (thymine)
C (cytosine) always pairs with G (guanine)
Orientation of single DNA strands is important for hybridization.
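Base pairing together with strand orientation means that a strand binds perfectly to its reverse complement. A minimal sketch, assuming strands are written 5' to 3' as strings over A, C, G, T; the function names are illustrative.

```python
# Watson-Crick pairing: A with T, C with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand):
    """Complement every base and reverse the order, so that the result
    is again written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def is_perfect_duplex(x, y):
    """True when y is exactly the reverse complement of x."""
    return y == reverse_complement(x)


if __name__ == "__main__":
    print(reverse_complement("ATCG"))
    print(is_perfect_duplex("ATCG", "CGAT"))
```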
Interest in DNA computing was sparked in 1994 by Len Adleman.
Adleman showed how DNA molecules can be used to solve a mathematical problem (the Hamiltonian path problem).
DNA computing relies on the property of hybridization and on the fact that DNA strands can be represented as sequences of bases (4-ary sequences).
Errors can occur during hybridization. Thus, error-correcting codes are required for efficient synthesis of DNA strands to be used in computing.
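$Z = (z_1, \dots, z_k)$ is a subsequence of $X = (x_1, \dots, x_n)$
if and only if there exists a strictly increasing sequence of indices:
$1 \le i_1 < i_2 < \dots < i_k \le n$ such that $z_j = x_{i_j}$ for $j = 1, \dots, k$.
$L(X, Y)$ is defined to be the set of longest common subsequences of X and Y.
$LCS(X, Y)$ is defined to be the length of the longest common subsequence of X and Y.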
Z = ( T C G T ) - subsequence of X
Y = ( T G C A T A )
( T C A T ) ∈ L(X, Y)
LCS(X ,Y) = 4
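The subsequence test and the LCS length can be computed directly. A minimal sketch using the standard dynamic programming recurrence. Y is the sequence from the example; the string used for X is a hypothetical choice made only so that it is compatible with the statements in the example, and it should not be read as the X used here.

```python
def is_subsequence(z, x):
    """True when z can be obtained from x by deleting symbols."""
    remaining = iter(x)
    # `symbol in remaining` consumes the iterator up to the first match,
    # which enforces strictly increasing indices.
    return all(symbol in remaining for symbol in z)


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    X = "ATCGAT"  # hypothetical X, an assumption for illustration only
    Y = "TGCATA"  # Y from the example
    print(is_subsequence("TCGT", X))
    print(is_subsequence("TCAT", X), is_subsequence("TCAT", Y))
    print(lcs_length(X, Y))
```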
Original Insertion-Deletion metric (Levenshtein 1966):
This metric results from the number of deletions and insertions that need to be made to obtain y from x.
For vectors x and y that have the same length n:
the number of deletions that will be made is: $n - LCS(x, y)$
likewise, the number of insertions that will be made is: $n - LCS(x, y)$
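The deletion and insertion counts follow from the LCS length: keep one longest common subsequence, delete the other symbols of x, and insert the missing symbols of y. A minimal sketch, assuming the distance between equal-length words is taken as the number of deletions, n - LCS(x, y); function names are illustrative. The sphere volumes show that the space is non-homogeneous under this distance.

```python
from itertools import product


def lcs_length(x, y):
    """Length of the longest common subsequence of x and y."""
    prev = [0] * (len(y) + 1)
    for a in x:
        curr = [0]
        for j, b in enumerate(y, start=1):
            if a == b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def deletions_needed(x, y):
    """Symbols of x that are not part of a longest common subsequence."""
    return len(x) - lcs_length(x, y)


def insertions_needed(x, y):
    """Symbols of y that are not part of a longest common subsequence."""
    return len(y) - lcs_length(x, y)


def sphere_volume(center, radius, alphabet="ACGT"):
    """Count equal-length words y with n - LCS(center, y) <= radius."""
    n = len(center)
    return sum(
        1
        for y in product(alphabet, repeat=n)
        if deletions_needed(center, y) <= radius
    )


if __name__ == "__main__":
    print(deletions_needed("ATCGAT", "TGCATA"), insertions_needed("ATCGAT", "TGCATA"))
    print(sphere_volume("AAA", 1), sphere_volume("ACG", 1))
```

For n = 3 and radius 1, the sphere around AAA contains 10 words and the sphere around ACG contains 26, so the volume depends on the center.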
A common subsequence $z = (z_1, \dots, z_{|z|})$ is called a common stacked pair subsequence of length $|z|$ between x and y if two elements $z_i$, $z_{i+1}$ are consecutive in x and consecutive in y, or, if they are non-consecutive in x and/or non-consecutive in y, then $z_{i+1}$ and $z_{i+2}$ are consecutive in x and y.
Let $S(x, y)$ denote the length of the longest sequence occurring as a common stacked pair subsequence z between sequences x and y. The number $S(x, y)$ is called a similarity of blocks between x and y. The metric is defined to be
The upper bound for the average sphere volume in this metric will be:
The Varshamov-Gilbert bound becomes:
Thermodynamic weight of virtual stacked pairs.
Minimum distance (d) = 6
Minimum distance (d) = 7
Minimum distance (d) = 8
Minimum distance (d) = 9
Minimum distance (d) = 10