
Encoding Information for DNA computing

Shinnosuke Seki



Purpose

  • What is the advantage of encoding?

  • To make a “good”, i.e. tractable, code set for DNA computing.

  • To develop polynomial-time algorithms that decide whether a given code set is “good” or “bad”.


Claude Elwood Shannon

  • The father of information theory (Shannon’s entropy)

  • Showed that Boolean algebra and binary arithmetic make it possible to simplify electromechanical relay circuits

  • In “A mathematical theory of communication” [Shannon48], he showed that information can be sent with an arbitrarily small error rate even over a noisy channel.

  • Chess program using a minimax evaluation procedure

  • etc. …

Shannon’s information channel

  • R > C: overflow.

  • R ≤ C: we can make the error rate as small as we like.

  • To attain R = C in a noisy channel, we need to find a “good” code.

[Figure: an information channel of capacity C carrying information flow R, subject to positive noise and negative noise]

Biological perspective

  • A biological reaction can be described in terms of the information channel model.

    • Example: the case of heredity

  • Over billions of years, has Mother Nature developed a wonderful code system?

  • Biology -> Computer Science

[Figure: Natural Selection]







Review: in vitro DNA computing

  • Encode a given problem into single- or double-stranded DNA (ssDNA, dsDNA).

  • Compute by a succession of bio-operations.

  • Decode the resulting solution and extract its output.

Review: WK-complementarity

  • Hydrogen bonds form between complementary bases (A with T, C with G).

  • Two strands which are

    • complementary to each other

    • with opposite directions

      can form a (complete) dsDNA.

  • Example

5’ - A T C G G T C A A C T G C C C T A A T G - 3’

3’ - T A G C C A G T T G A C G G G A T T A C - 5’
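The WK-complement of a strand can be computed mechanically: complement each base, then reverse the result so that both strands are read 5’ to 3’. A minimal sketch, assuming strands are plain uppercase strings over A, C, G, T (the function names are illustrative, not from the slides):

```python
# Watson-Crick base pairing: A pairs with T, C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Base-by-base complement, keeping the written order."""
    return "".join(COMPLEMENT[base] for base in strand)


def wk_complement(strand_5to3: str) -> str:
    """Partner strand read 5'->3': complement every base, then reverse."""
    return complement(strand_5to3)[::-1]


top = "ATCGGTCAACTGCCCTAATG"   # upper strand of the slide's example, 5' -> 3'
print(complement(top))         # partner strand written 3' -> 5'
print(wk_complement(top))      # the same partner strand written 5' -> 3'
```

Applying `wk_complement` twice returns the original strand.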

Adleman’s first trial

  • Find a solution to the Hamiltonian path problem, in a test-tube solution, in time polynomial in the size of the input graph.

  • The test-tube solution is filled with oligonucleotides that encode the problem.

[Figure: 1 -> 2, 2 -> 3, 3 -> 4]



What’s a good code set?

  • No code word (oligonucleotide) should form an undesirable structure on its own.

    • Such a structure may make the code word inert.

  • Code words should not interact with each other in undesirable ways.

  • Structure formation is governed by

    • WK-complementarity

    • Gibbs free energy








What’s a good code set? (cont.)

  • Uniform melting temperature

  • Preventing undesirable hybridizations

  • Other constraints

    • Avoiding repeated bases

    • Forbidden subsequences

      • When a restriction enzyme is used, its recognition site should appear only at the intended positions.

    • Using only 3 of the 4 nucleotide types: A, C, T

Melting temperature

  • The melting temperature Tm of a dsDNA is

    • the temperature at which half of the dsDNA molecules are denatured.

    • The higher Tm is, the more stable the dsDNA is.

    • Tm is computed from the following quantities:

      • R: the gas constant,

      • Ct: the total oligo concentration,

      • ΔH and ΔS: enthalpy and entropy,

      • α: 1 for self-complementary and 4 for non-self-complementary sequences.
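The symbols listed above match the standard two-state expression Tm = ΔH / (ΔS + R ln(Ct / α)). That the slide used exactly this form is an assumption, since the formula itself is not in the transcript. A minimal sketch, with units chosen for illustration (cal/mol, cal/(K·mol), mol/L):

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(dH: float, dS: float, Ct: float,
                        self_complementary: bool = False) -> float:
    """Two-state melting temperature in kelvin.

    Tm = dH / (dS + R * ln(Ct / alpha)), with dH in cal/mol,
    dS in cal/(K*mol) and Ct in mol/L. alpha is 1 for a
    self-complementary sequence and 4 otherwise.
    """
    alpha = 1.0 if self_complementary else 4.0
    return dH / (dS + R_CAL * math.log(Ct / alpha))


# Illustrative inputs only; these are NOT measured thermodynamic values.
tm = melting_temperature(dH=-150_000.0, dS=-400.0, Ct=1e-6)
print(f"Tm = {tm:.1f} K")
```

In practice ΔH and ΔS would come from a nearest-neighbor parameter table rather than being supplied by hand.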

Melting temperature (cont.)

  • Uniform melting temperature

    • Making Tm uniform across code words eliminates hybridization bias.

  • GC content

    • The ratio of the number of G’s and C’s to the total number of nucleotides in a sequence

    • A G-C pair is more stable than an A-T pair.

    • Higher GC content implies higher Tm.

    • Sequences are designed with 50% GC content.
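GC content as defined above is a simple ratio. A minimal sketch, assuming sequences are strings over A, C, G, T (`gc_content` is an illustrative name):

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in seq (0.0 for the empty string)."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


# Upper strand of the WK-complementarity example from the review slide.
print(gc_content("ATCGGTCAACTGCCCTAATG"))
```

A design tool would typically reject any candidate whose value is not 0.5, or not within a tolerance of it.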

Gibbs free energy (ΔG)

  • A well-known indicator of the stability of DNA structures

    • A structure with lower ΔG is more stable.

    • The ΔG of an entire structure is the sum of the ΔG of its substructures [ZuSt81].
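The additivity of ΔG is what makes it computable: for a perfect duplex, one sums a contribution for each pair of adjacent base pairs. A minimal sketch of that summation, assuming the caller supplies the parameter table. The uniform table below is a placeholder used to show the arithmetic, not measured data:

```python
from typing import Dict


def duplex_delta_g(seq: str, stack_dg: Dict[str, float],
                   init_dg: float = 0.0) -> float:
    """Sum nearest-neighbor contributions over adjacent bases of seq.

    stack_dg maps each dinucleotide step (e.g. "AT") to its free-energy
    contribution; init_dg is a one-off initiation term.
    """
    steps = (seq[i:i + 2] for i in range(len(seq) - 1))
    return init_dg + sum(stack_dg[step] for step in steps)


# Placeholder table: every step contributes -1.0. NOT measured values.
uniform = {a + b: -1.0 for a in "ACGT" for b in "ACGT"}
print(duplex_delta_g("ATCG", uniform))
```

Published nearest-neighbor tables such as those cited in [AlSa97] and [TKY04] would replace `uniform` in real use.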

Nearest-neighbor method

Refer to [AlSa97], [TKY04] ([8], [9] in the slide’s table)


Secondary structures look like…

Template method [ArKo02]

  • Prepare two bit sequences, each of which has some desirable property

    • (e.g., 50% GC content, error correction).

  • Using a conversion rule, construct a DNA sequence from these two bit sequences.
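The construction can be sketched as a position-wise lookup on (template bit, map bit) pairs. The particular rule below is an assumption made for illustration: template bit 1 selects the G/C class and 0 the A/T class, and the map bit picks the base within the class. The slides do not spell out the rule, and the template and map words are hypothetical.

```python
# Assumed conversion rule: (template bit, map bit) -> base.
RULE = {("0", "0"): "A", ("0", "1"): "T",
        ("1", "0"): "C", ("1", "1"): "G"}


def combine(template: str, word: str) -> str:
    """Merge a template and a map word of equal length into a DNA word."""
    if len(template) != len(word):
        raise ValueError("template and map word must have equal length")
    return "".join(RULE[(t, m)] for t, m in zip(template, word))


template = "110100"                           # hypothetical: three 1s of six
for word in ("000000", "010101", "111000"):   # hypothetical map words
    print(word, "->", combine(template, word))
```

Under this rule every generated word inherits its GC content from the template alone, while the mismatches between words come from the map code.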

Template method (cont.)

  • Design criteria

    • Template

      • A template x should have at least d mismatches with xR, xx, xRxR, xxR, and xRx.

      • A good template is found by exhaustive search.

    • Map (error-correcting code)

      • A code whose words have at least k mismatches with one another.

      • e.g. a BCH code

  • Drawback

    • It cannot prevent sequences from forming secondary structures.

AG-templates, GC-templates [KKA03]

  • GC-templates

    • The template contains the same number of 0’s and 1’s (50% GC content).

    • The map is an error-correcting code.

  • AG-templates

    • The map is a constant-weight code (50% GC content).

    • Results in a bigger set of sequences.

Other approaches

  • DNASequenceGenerator [FBR00]

    • Software with a GUI

    • Creates sequences with a given melting temperature and GC content and with no palindromes, start codons, or restriction sites.

Other approaches

  • Suyama’s approach [YoSu00]

    • Generate sequences randomly, and add each one to the sequence set iff it satisfies all of the following constraints:

      • Uniform melting temperature

      • No mis-hybridization

      • No formation of stable secondary structure

    • Drawback: it easily falls into local optima.
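The generate-and-test loop can be sketched as follows. The three constraints are replaced here by crude stand-ins, which are assumptions rather than the authors’ actual tests: exactly 50% GC content stands in for uniform Tm, and a minimum Hamming distance to every accepted word, to its reverse complement, and to the candidate’s own reverse complement stands in for mis-hybridization and self-structure.

```python
import random
from typing import List

COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def revcomp(s: str) -> str:
    return "".join(COMP[b] for b in reversed(s))


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def acceptable(cand: str, code: List[str], min_dist: int) -> bool:
    if 2 * sum(b in "GC" for b in cand) != len(cand):
        return False                  # stand-in for uniform Tm
    if hamming(cand, revcomp(cand)) < min_dist:
        return False                  # stand-in for self-structure
    return all(hamming(cand, w) >= min_dist and
               hamming(cand, revcomp(w)) >= min_dist for w in code)


def generate_code(size: int, length: int, min_dist: int,
                  tries: int = 100_000, seed: int = 0) -> List[str]:
    """Draw random words of even length; keep those passing all checks."""
    rng = random.Random(seed)
    code: List[str] = []
    for _ in range(tries):
        if len(code) == size:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if acceptable(cand, code, min_dist):
            code.append(cand)
    return code


print(generate_code(size=4, length=8, min_dist=3))
```

Because accepted words are never revised, an unlucky early choice can block later candidates, which is one way to read the local-optimum drawback noted above.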

Other approaches

  • Hybrid randomized neighborhoods [TuHo03]

    • Stochastic local search (SLS) algorithm

    • With probability ε, it searches neighbors by randomly mutating the current best sequences.

    • With probability 1-ε, it moves in the direction in which the number of constraint conflicts decreases the most.
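The ε-mixture of random and greedy moves can be sketched as below. This is a generic stochastic local search, not the algorithm of [TuHo03]: the only constraint is an assumed minimum pairwise Hamming distance, and a neighbor is any code that differs in a single base.

```python
import random
from typing import List


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def conflicts(code: List[str], min_dist: int) -> int:
    """Number of word pairs that are closer than min_dist."""
    n = len(code)
    return sum(hamming(code[i], code[j]) < min_dist
               for i in range(n) for j in range(i + 1, n))


def mutate(code: List[str], i: int, pos: int, base: str) -> List[str]:
    new = list(code)
    new[i] = new[i][:pos] + base + new[i][pos + 1:]
    return new


def sls(code: List[str], min_dist: int, eps: float = 0.2,
        steps: int = 200, seed: int = 0) -> List[str]:
    rng = random.Random(seed)
    current, best = list(code), list(code)
    for _ in range(steps):
        if conflicts(best, min_dist) == 0:
            break
        if rng.random() < eps:        # random move, probability eps
            i = rng.randrange(len(current))
            pos = rng.randrange(len(current[i]))
            current = mutate(current, i, pos, rng.choice("ACGT"))
        else:                         # greedy move, probability 1 - eps
            neighbors = [mutate(current, i, p, b)
                         for i in range(len(current))
                         for p in range(len(current[i]))
                         for b in "ACGT" if b != current[i][p]]
            current = min(neighbors, key=lambda c: conflicts(c, min_dist))
        if conflicts(current, min_dist) < conflicts(best, min_dist):
            best = list(current)
    return best


start = ["AAAAAA"] * 4
result = sls(start, min_dist=3)
print(result, conflicts(result, 3))
```

The function returns the best code seen, so the conflict count of the result never exceeds that of the starting code.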

Other approaches

  • GA (genetic algorithm)-based approach [ANH00]

    • Uses a GA, which evaluates the fitness of candidate solutions

    • Fitness criteria

      • Restriction sites

      • GC-content

      • Hamming distance

      • Same base repetition
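The four criteria can be folded into a single fitness score for a GA to maximize. The sketch below covers the fitness function only, not the GA loop. The equal weighting and the `max_run` threshold are assumptions made for illustration and are not taken from [ANH00].

```python
from itertools import combinations, groupby
from typing import Sequence


def hamming(a: str, b: str) -> int:
    return sum(x != y for x, y in zip(a, b))


def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def fitness(code: Sequence[str], sites: Sequence[str],
            max_run: int = 3) -> float:
    """Higher is better: minimum pairwise distance minus penalties."""
    penalty = 0.0
    for word in code:
        # Restriction sites (non-overlapping occurrences).
        penalty += sum(word.count(site) for site in sites)
        # Deviation of GC content from 50%.
        gc = sum(b in "GC" for b in word) / len(word)
        penalty += abs(gc - 0.5)
        # Same-base repetition beyond the allowed run length.
        penalty += max(0, longest_run(word) - max_run)
    pairs = list(combinations(code, 2))
    min_dist = min((hamming(a, b) for a, b in pairs), default=0)
    return min_dist - penalty


sites = ["GAATTC"]   # EcoRI recognition site
print(fitness(["ACGTACGT", "TGCATGCA"], sites))
print(fitness(["GAATTCAA", "GAATTCAA"], sites))
```

A GA would call `fitness` on each candidate code set and keep the higher-scoring ones for mutation and crossover.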

Other approaches

  • Gibbs free energy-based approach [TKY05], [KNO08]

    • Taking thermodynamics into consideration

    • Gibbs free energy as a stability measure

    • Advantage

      • Greater accuracy because it takes into account stability of loops or stacking between base-pairs

    • Disadvantage

      • More computational time to calculate free energy

    • How to decrease this computational complexity?

A formal language approach

  • Design a set of structure-free codes in terms of WK-complementarity.

  • Advantage

    • More reliable codes than the free-energy approach

    • More efficient algorithms for decision problems

  • Disadvantage

    • Need to consider each structure separately.

A formal language approach (cont.)

  • Abstraction of biological concepts

    • {A, C, G, T} → an alphabet V,

    • WK-complementarity → an antimorphic involution θ

      • Involution

        • A mapping θ s.t. θ² is the identity (symmetry).

      • Antimorphism

        • θ(xy) = θ(y)θ(x) (opposite direction).



Bond-free properties[KKS05]

  • θ-non-overlapping:

  • θ-compliant:

    • Strictly (a): property (a) combined with θ-non-overlapping


Bond-free properties[KKS05]

  • θ-p-compliant:

  • θ-s-compliant:


Bond-free properties[KKS05]

  • θ-free:

  • θ-sticky-free:


Bond-free properties[KKS05]

  • θ-3’-overhang-free:

  • θ-5’-overhang-free:

  • θ-overhang-free: both of these

Decidability [KKS05]

  • Theorem

    • The following problem is decidable in time quadratic in |A|:

      • Input: an NFA A,

      • Output: Yes/No depending on whether L(A) satisfies a given one of the following properties (or its strict version):

        • θ-compliant, θ-p-compliant, θ-s-compliant,

        • θ-sticky-free,

        • θ-3’-overhang-free, θ-5’-overhang-free, θ-overhang-free.

Decidability and maximality [KKS05]

  • Theorem

    • Let M be a regular language and L be a regular subset of M with a property ρ, where

      • ρ is one of the following:

        • θ-compliant,

        • θ-p-compliant,

        • θ-s-compliant, or

        • θ-sticky-free.

    • Then it is decidable whether L is a maximal subset of M satisfying ρ.


Secondary structure prevention

  • Secondary structures:

    • Hairpin-loop (or simply hairpin)

    • Internal loop

    • Multiple-branch loop

    • Pseudoknot

  • They can be undesirable,

    • e.g. in Adleman’s encoding technique for the Hamiltonian Path Problem (HPP).

Secondary Structures

[Figure: secondary structures, with panels labeled “Hairpin frame (multiple loop)” and “Internal loop”]

Hairpin-free language

  • A formal model of a hairpin: x v y θ(v) z.

  • Hairpin freeness

    • Intuitively, it is almost impossible to prevent hairpins of short stack length (say 2 or 3).

    • Our aim is to prevent any hairpin whose stack length is at least some given parameter k.

[Figure: a hairpin x v y θ(v) z]

Hairpin-free language [KKL06]

  • A word w is (θ, k)-hairpin-free (abbr. hp(θ, k)-free) iff w = x v y θ(v) z implies |v| < k.

  • hpf(θ, k) : the set of all hp(θ, k)-free words in Σ*

  • hp(θ, k) : Σ* - hpf(θ, k).

  • A language L is called (θ, k)-hairpin-free iff L ⊆ hpf(θ, k).
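For a single word, hairpin-freeness can be tested directly from the model x v y θ(v) z. The sketch below assumes the reading suggested by the preceding slide: a word is hp(θ, k)-free when it has no such factorization with |v| ≥ k. It also assumes θ is the DNA reverse complement and k ≥ 1. A stem of length at least k contains one of length exactly k, so only |v| = k needs checking.

```python
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(v: str) -> str:
    """Antimorphic involution for DNA: reverse, then complement."""
    return "".join(COMP[b] for b in reversed(v))


def is_hairpin_free(w: str, k: int) -> bool:
    """True iff w has no factorization x v y theta(v) z with |v| >= k."""
    for i in range(len(w) - 2 * k + 1):
        v = w[i:i + k]
        # theta(v) must start at or after the end of v (y may be empty).
        if w.find(theta(v), i + k) != -1:
            return False
    return True


print(is_hairpin_free("ACGTTTTACGT", 4))   # stem ACGT pairs with ACGT
print(is_hairpin_free("ACGTTTTACGT", 5))
```

This brute-force check handles one word at a time; deciding the property for a whole regular language requires the automaton-based algorithms discussed in the following slides.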

Regularity of hairpin languages

  • hp(θ, k) and hpf(θ, k) are regular.

  • Hence there exists a finite automaton M s.t. L(M) = hpf(θ, k).

Hairpin Freeness Problems

  • Hairpin-Freeness problem

    • Input: A nondeterministic automaton M.

    • Output: Y/N depending on whether L(M) is hp(θ, k)-free.

  • Maximal Hairpin-Freeness problem

    • Input: A deterministic automaton M1, and an NFA M2.

    • Output: Y/N depending on whether there is a word s.t. is hp(θ, k)-free.



  • The hairpin-freeness problem for regular languages is decidable in time.

  • The maximal hairpin-freeness problem for regular languages is decidable in time.

Hairpin Frames

  • Also called a multiple loop

  • hp-frame of degree n:

  • The figure on the right is an example of an hp-frame of degree 3.

  • A word u is an hp(fr, j)-word if it contains an hp-frame of degree j.

Regularity & decidability

  • hp(θ, fr, j) : the set of all hp(fr, j)-words in Σ*

  • hpf(θ, fr, j) : its complement in Σ*

  • The languages hp(θ, fr, j) & hpf(θ, fr, j) are regular.

  • The hp(fr, j)-freeness problem is decidable in linear time.

  • The maximal hp(fr, j)-freeness problem is decidable in time.

Application: DNA-HRAMs



  • An n-bit DNA-HRAM consists of n hairpins.

  • Each hairpin stores 1 bit of information by forming or unfolding the hairpin, as shown above.
















n-bit DNA-HRAM

  • A concatenation of n 1-bit RAMs, which is equivalent to an hp-frame of degree n.

  • In order for this word to work as an n-bit RAM, the following subword should be hp(θ, 20)-free.

  • A DNA memory with 4 hairpins was proposed in [KYO08].



Reference

  • [AlSa97] Allawi, H.T., SantaLucia, J.: Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36(34) (1997) 10581-10594

  • [ArKo02] Arita, M., Kobayashi, S.: DNA sequence design using templates. New Generation Computing 20 (2002) 263-277

  • [ANH00] Arita, M., Nishikawa, A., Hagiya, M., Komiya, K., Gouzu, H., Sakamoto, K.: Improving sequence design for DNA computing. Proc. Genetic and Evolutionary Computation Conference (2000) 875-882

  • [FBR00] Feldkamp, U., Saghafi, S., Rauhe, H.: A DNA sequence compiler. Proc. DNA6 (2000)

  • [KKS05] Kari, L., Konstantinidis, S., Sosik, P.: Preventing undesirable bonds between DNA codewords. Proc. DNA10, LNCS 3384 (2005) 182-191

  • [KKL06] Kari, L., Konstantinidis, S., Losseva, E., Sosik, P., Thierrin, G.: A formal language analysis of DNA hairpin structures. Fundamenta Informaticae 71 (2006) 453-475

  • [KKA03] Kobayashi, S., Kondo, T., Arita, M.: On template method for DNA sequence design. Proc. DNA8, LNCS 2568 (2003) 205-214

Reference (cont.)

  • [KNO08] Kawashimo, S., Ng, Y-K., Ono, H., Sadakane, K., Yamashita, M.: Speeding up local-search type algorithms for designing DNA sequences under thermodynamical constraints. Proc. DNA14 (2008) 152-161

  • [KYO08] Kameda, A., Yamamoto, M., Ohuchi, A., Yaegashi, S., Hagiya, M.: Unravel four hairpins! Natural Computing 7 (2008) 287-298

  • [RFL01] Ruben, A. J., Freeland, S. J., Landweber, L. F.: PUNCH: An evolutionary algorithm for optimizing bit set selection. DNA7 (2001) 150-160

  • [Shannon48] Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948) 379-423, 623-656

  • [TKY04] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Thermodynamic parameters based on a nearest-neighbor model for DNA sequences with a single-bulge loop. Biochemistry 43(22) (2004) 7143-7150

  • [TKY05] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Design of nucleic acid sequences for DNA computing based on a thermodynamic approach. Nucleic Acids Res. 33(3) (2005) 903-911

Reference (cont.)

  • [TuHo03] Tulpan, D., Hoos, H.: Hybrid randomised neighbourhoods improve stochastic local search for DNA code design. In Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence, 2671 (2003) 418-433

  • [YoSu00] Yoshida, H., Suyama, A.: Solution to 3-SAT by breadth first search. Proc. the 5th DIMACS Workshop on DNA Based Computers, 54 (2000) 9-22

  • [ZuSt81] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1) (1981) 133-148
