
Encoding Information for DNA computing


Shinnosuke Seki

- What’s an advantage of encoding?
- To make a “good” or tractable code set for DNA computing.
- Development of polynomial-time algorithms which decide whether a given code set is “good” or “bad”.

- The father of information theory (Shannon’s entropy)
- Boolean algebra with binary arithmetic makes it possible to simplify electromechanical relays
- In “A mathematical theory of communication” [Shannon48], he showed that information can be sent with an arbitrarily small error rate even over a noisy channel.
- Chess program using minimax evaluation procedure
- etc. …

Positive Noise

- R > C: overflow
- R ≤ C: we can make the error rate as small as desired.
- To attain R = C in the noisy channel, we need to find a ‘good’ code.

[Figure: channel model. sender → encoder → decoder → receiver, with information flow R through a channel of capacity C]

Negative Noise

- A biological reaction can be described in terms of an information channel model.
- Example: the case of heredity

- Over billions of years, has Mother Nature developed a wonderful code system?
- Biology -> Computer Science

[Figure: heredity as an information channel from parent DNA to child DNA, with mutation and natural selection]

- Encode a given problem into single or double-stranded DNAs (ssDNAs, dsDNAs)
- Computation by a succession of bio-operations.
- Decode the resulting solution and extract its output.

Nucleotides: A, T, C, G

5’-A T C G G T C A A C T G C C C T A A T G-3’

3’-T A G C C A G T T G A C G G G A T T A C-5’

- Two strands that are complementary to each other and run in opposite directions can form a (complete) dsDNA.
- The two strands are held together by hydrogen bonds.
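The pairing rule above can be checked mechanically. The following is a minimal Python sketch using the two strands shown on the slide; the function names `reverse_complement` and `forms_complete_duplex` are illustrative choices, not part of the original presentation.

```python
# Watson-Crick (WK) pairing: A pairs with T, C pairs with G.
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the WK-complementary strand, written 5'->3' like the input."""
    return "".join(WK[base] for base in reversed(strand))


def forms_complete_duplex(s1: str, s2: str) -> bool:
    """True if two strands (both written 5'->3') pair along their full length."""
    return s2 == reverse_complement(s1)


top = "ATCGGTCAACTGCCCTAATG"           # upper strand from the slide, 5'->3'
bottom_3to5 = "TAGCCAGTTGACGGGATTAC"   # lower strand as drawn, 3'->5'
bottom = bottom_3to5[::-1]             # the same strand rewritten 5'->3'

print(forms_complete_duplex(top, bottom))
```

Reversing the lower strand before comparing reflects the "opposite directions" condition: the strands are complementary position by position only when one of them is read backwards.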

- Example

- Solve the Hamiltonian path problem in a (chemical) solution, in time polynomial in the size of the input graph.
- The solution is filled with oligonucleotides that encode the graph.

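[Figure: input graph with vertices 1, 2, 3, 4 and edges 1 -> 2, 2 -> 3, 3 -> 4]

- Vertex code words: 1: ACG CTT, 2: ATA GAT, 3: CGG TTA, 4: ACT TAA
- Edge code words: 1 -> 2: GAA TAT, 2 -> 3: CTA GCC, 3 -> 4: AAT TGA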
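The code words on this slide follow a simple pattern that can be reproduced in a few lines. The sketch below assumes that the first four code words on the slide belong to vertices 1 to 4 in order (this assignment is inferred, not stated), and that an edge i -> j is the base-by-base complement of the second half of vertex i followed by the first half of vertex j, written as drawn in the figure.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(seq: str) -> str:
    """Base-by-base WK complement, without reversing (as drawn on the slide)."""
    return "".join(COMPLEMENT[base] for base in seq)


# Vertex code words as (first half, second half); the vertex numbering is an
# assumption based on the order in which the words appear on the slide.
vertices = {
    1: ("ACG", "CTT"),
    2: ("ATA", "GAT"),
    3: ("CGG", "TTA"),
    4: ("ACT", "TAA"),
}


def edge_strand(i: int, j: int) -> str:
    """Strand that bridges vertex i and vertex j by pairing with both halves."""
    return complement(vertices[i][1]) + complement(vertices[j][0])


for i, j in [(1, 2), (2, 3), (3, 4)]:
    print(f"{i} -> {j}: {edge_strand(i, j)}")
```

Under these assumptions the three computed strands coincide with the remaining three code words on the slide, which is why an edge strand can glue two consecutive vertex strands into a path.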

- Each code word (oligonucleotide) shouldn’t form any undesirable structure with itself.
- Such a structure may make the code word inert.
- Code words shouldn’t interact with each other in undesirable ways.
- Structure formation is governed by
- WK-complementarity
- Gibbs free energy

[Figure: code word 2 (ATA GAT) forming an undesirable structure with itself]

- Uniform melting temperature
- Preventing undesirable hybridizations
- Other constraints
- Avoiding repeated bases
- Forbidden subsequences
- When a restriction enzyme is used, its recognition site should appear only at the intended positions

- Using only 3 types of nucleotides A, C, T

- The melting temperature Tm of a dsDNA is the temperature at which half of the dsDNA molecules are denatured.
- The higher Tm is, the more stable the dsDNA is.
- R: gas constant,
- Ct: total oligo concentration,
- ΔH & ΔS : enthalpy & entropy
- α: 1 for self-complementary and 4 for non-self
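The formula itself is not present in the slide text. The symbols listed above match the commonly used two-state expression Tm = ΔH / (ΔS + R ln(Ct/α)), so the sketch below assumes that this is the intended formula. The numerical values of ΔH and ΔS are hypothetical placeholders, not measured parameters.

```python
import math

R_CAL = 1.987  # gas constant in cal/(K*mol)


def melting_temperature_kelvin(delta_h, delta_s, total_conc,
                               self_complementary=False):
    """Two-state melting temperature in kelvin (assumed formula).

    delta_h: enthalpy in cal/mol, delta_s: entropy in cal/(K*mol),
    total_conc: total oligo concentration Ct in mol/L.
    """
    alpha = 1.0 if self_complementary else 4.0
    return delta_h / (delta_s + R_CAL * math.log(total_conc / alpha))


# Hypothetical thermodynamic values, for illustration only.
tm = melting_temperature_kelvin(delta_h=-50_000.0, delta_s=-140.0,
                                total_conc=1e-6)
print(f"Tm = {tm:.1f} K = {tm - 273.15:.1f} °C")
```

In this expression a higher oligo concentration gives a higher Tm, which is one reason the concentration has to be fixed before code words are compared.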

- Uniform melting temperature
- Making Tm uniform across the code words eliminates bias in hybridization.

- GC content
- The ratio of the # of G’s and C’s over the total # of nucleotides in a sequence
- G-C pair is more stable than A-T pair.
- Higher GC content implies higher Tm.
- Sequences are designed with 50% GC content.
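The GC-content definition above translates directly into code. This is a minimal sketch; the function name `gc_content` is an illustrative choice, and the example sequence is the upper strand shown earlier in the slides.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C nucleotides in a sequence."""
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(base in "GC" for base in seq.upper()) / len(seq)


print(gc_content("ATCGGTCAACTGCCCTAATG"))
```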

- A well-known indicator of stability for DNA structures
- A structure with lower ΔG is more stable.
- The ΔG of an entire structure is the sum of the ΔG values of its substructures [ZuSt81].

Refer to [AlSa97], [TKY04] ([8], [9] in this table)

- Prepare two bit sequences, each of which has some desirable property
- (e.g., 50% GC content, error correction).

- Using a conversion rule, construct one DNA sequence from these two bit sequences.

- Design criteria
- Template
- A template x should have at least d mismatches with x^R, xx, x^R x^R, x x^R, and x^R x.
- Good templates are found by exhaustive search.

- Map (error-correcting code)
- A code in which any two words have at least k mismatches.
- e.g. BCH code

- Template
- Drawback
- It cannot prevent sequences from forming secondary structures.

- GC-template
- The template contains the same # of 0’s and 1’s (50% GC content).
- The map is an error-correcting code.

- AG-template
- The map is a constant-weight code (50% GC content).
- Results in a bigger set of sequences.
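The way a template and a map word combine into one DNA sequence can be sketched as follows. The conversion rule in `GC_RULE` is an assumption made for illustration (template bit 1 selects a G/C position, template bit 0 an A/T position, and the map bit picks the base within that class); the slides do not give the actual rule, and the two bit strings are hypothetical.

```python
# Hypothetical conversion rule (an assumption, not taken from the slides):
# (template bit, map bit) -> nucleotide.
GC_RULE = {(0, 0): "A", (0, 1): "T", (1, 0): "C", (1, 1): "G"}


def combine(template: str, map_word: str, rule=GC_RULE) -> str:
    """Merge a template and a map word, bit by bit, into a DNA sequence."""
    if len(template) != len(map_word):
        raise ValueError("template and map word must have the same length")
    return "".join(rule[(int(t), int(m))] for t, m in zip(template, map_word))


template = "110100"   # three 1's out of six bits, so 50% GC content
map_word = "101001"   # would be a code word of an error-correcting code
seq = combine(template, map_word)
print(seq)
```

With a rule of this shape, the GC content of every generated sequence is fixed by the template alone, while the mismatches between sequences come from the distance between map words.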

- DNASequenceGenerator [FBR00]
- Software with a GUI
- Creates sequences with a specified melting temperature and GC content, and without palindromes, start codons, or restriction sites.

- Suyama’s approach [YoSu00]
- Generate sequences randomly, and add a sequence to the sequence set iff it satisfies all of the following constraints:
- Uniform melting temperature
- No mis-hybridization
- No formation of stable secondary structure

- Drawback: it easily falls into local optima.
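A generate-and-test loop of this kind can be sketched as follows. This is a simplified stand-in, not the procedure of [YoSu00]: it assumes 50% GC content as a proxy for uniform melting temperature and a minimum Hamming distance (to other words and to reverse complements) as a proxy for "no mis-hybridization", and it omits the secondary-structure check. All function names and parameters are illustrative.

```python
import random

WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(s):
    return "".join(WK[b] for b in reversed(s))


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def gc_content(s):
    return sum(b in "GC" for b in s) / len(s)


def accept(candidate, pool, min_dist):
    """Check the (simplified) constraints against the current pool."""
    if gc_content(candidate) != 0.5:            # proxy for uniform Tm
        return False
    if hamming(candidate, reverse_complement(candidate)) < min_dist:
        return False                            # would pair with itself
    for word in pool:
        if hamming(candidate, word) < min_dist:
            return False
        if hamming(candidate, reverse_complement(word)) < min_dist:
            return False                        # would mis-hybridize
    return True


def generate(n_words, length, min_dist, max_tries=100_000, seed=0):
    """Draw random sequences and keep those that satisfy all constraints."""
    rng = random.Random(seed)
    pool = []
    for _ in range(max_tries):
        if len(pool) == n_words:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if accept(candidate, pool, min_dist):
            pool.append(candidate)
    return pool


pool = generate(n_words=5, length=8, min_dist=3)
print(pool)
```

Because accepted words are never revised, an unlucky early choice can block many later candidates, which is one way to read the drawback noted above.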

- Hybrid randomized neighborhoods [TuHo03]
- Stochastic local search (SLS) algorithm
- With probability ε, it explores neighbors by randomly mutating the current best sequences.
- With probability 1-ε, it moves in the direction that decreases the # of constraint conflicts the most.
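The two kinds of moves can be illustrated with a small sketch. This is a simplified stand-in under stated assumptions, not the algorithm of [TuHo03]: the only constraint is a minimum pairwise Hamming distance, a neighbor is a single-base change in one word, and the names `sls`, `conflicts`, `eps`, and `steps` are illustrative.

```python
import random


def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def conflicts(words, min_dist):
    """Number of word pairs that are closer than min_dist."""
    return sum(1 for i in range(len(words)) for j in range(i + 1, len(words))
               if hamming(words[i], words[j]) < min_dist)


def sls(n_words, length, min_dist, eps=0.2, steps=2000, seed=0):
    rng = random.Random(seed)
    words = ["".join(rng.choice("ACGT") for _ in range(length))
             for _ in range(n_words)]
    for _ in range(steps):
        current = conflicts(words, min_dist)
        if current == 0:
            break
        i = rng.randrange(n_words)
        if rng.random() < eps:
            # Random move: mutate one base of word i, even if it gets worse.
            pos = rng.randrange(length)
            words[i] = words[i][:pos] + rng.choice("ACGT") + words[i][pos + 1:]
        else:
            # Greedy move: single-base change of word i that removes the
            # most conflicts (the word is kept if nothing improves).
            best, best_score = words[i], current
            for pos in range(length):
                for base in "ACGT":
                    cand = words[i][:pos] + base + words[i][pos + 1:]
                    score = conflicts(words[:i] + [cand] + words[i + 1:],
                                      min_dist)
                    if score < best_score:
                        best, best_score = cand, score
            words[i] = best
    return words, conflicts(words, min_dist)


words, remaining = sls(n_words=4, length=6, min_dist=3)
print(words, remaining)
```

The random moves are what lets the search leave a local optimum that a purely greedy procedure would be stuck in.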

- GA (genetic algorithm)-based approach [ANH00]
- Uses a GA whose fitness function evaluates candidate solutions
- Criteria:
- Restriction sites
- GC-content
- Hamming distance
- Same base repetition
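A fitness function built from these four criteria might look like the sketch below. The weights, thresholds, and the choice of GAATTC (the EcoRI recognition site) as the forbidden restriction site are assumptions for illustration, not values from [ANH00].

```python
def hamming(a, b):
    return sum(x != y for x, y in zip(a, b))


def gc_content(seq):
    return sum(b in "GC" for b in seq) / len(seq)


def longest_run(seq):
    """Length of the longest run of one repeated base."""
    best = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best


def fitness(seq, others, site="GAATTC", min_dist=3, max_run=3):
    """Higher is better; 0 means no criterion is violated (assumed weights)."""
    penalty = 0.0
    penalty += 10.0 * seq.count(site)                     # restriction sites
    penalty += abs(gc_content(seq) - 0.5) * len(seq)      # GC content
    penalty += sum(max(0, min_dist - hamming(seq, o))     # Hamming distance
                   for o in others)
    penalty += max(0, longest_run(seq) - max_run)         # base repetition
    return -penalty


print(fitness("ACGTACGT", ["ACGTACGA"]))
```

A GA would then keep and recombine the sequences with the highest fitness from one generation to the next.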

- Gibbs free energy-based approach [TKY05], [KNO08]
- Taking thermodynamics into consideration
- Gibbs free energy as a stability measure
- Advantage
- Greater accuracy because it takes into account stability of loops or stacking between base-pairs

- Disadvantage
- More computational time to calculate free energy

- How to decrease this computational complexity?

- Design a set of structure-free codes in terms of WK-complementarity.
- Advantage
- More reliable codes than with the free-energy approach
- More efficient algorithms for the decision problems

- Disadvantage
- Each structure needs to be considered separately.

5’-TCATCCGATTTCGGG-3’

3’-AGTAGGCTAAAGCCC-5’

- Abstraction of biological concepts
- {A, C, G, T} → an alphabet V
- WK-complementarity → an antimorphic involution θ
- Involution
- A mapping θ s.t. θ² is the identity (symmetry).

- Antimorphism
- θ(xy) = θ(y)θ(x) (opposite direction).

- e.g. θ(TCATCCGATTTCGGG) = CCCGAAATCGGATGA
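Both defining properties can be checked on the slide's example. This sketch models θ over the DNA alphabet as "reverse, then complement each base"; the split of the example word into `x` and `y` is an arbitrary choice for illustration.

```python
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """WK-complementarity as an antimorphic involution on {A, C, G, T}*."""
    return "".join(WK[base] for base in reversed(word))


x, y = "TCATCCG", "ATTTCGGG"   # arbitrary split of the example word

assert theta(theta(x + y)) == x + y          # involution: theta^2 is identity
assert theta(x + y) == theta(y) + theta(x)   # antimorphism: order is reversed

print(theta("TCATCCGATTTCGGG"))  # CCCGAAATCGGATGA
```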

- θ-non-overlapping:
- θ-compliant:
- Strictly (a) : a property (a) with θ-non-overlapping

- θ-p-compliant:
- θ-s-compliant:

- θ-free:
- θ-sticky-free:

- θ-3’-overhang-free:
- θ-5’-overhang-free:
- θ-overhang-free: both of these

- Theorem
- The following problem is decidable in quadratic time w.r.t. |A|:
- Input: an NFA A,
- Output: Yes/No depending on whether L(A) satisfies any of the following properties (or their strict versions):
- θ-compliant, θ-p-compliant, θ-s-compliant,
- θ-sticky-free,
- θ-3’-overhang-free, θ-5’-overhang-free, θ-overhang-free.

- Theorem
- Let M be a regular language and L be a regular subset of M with a property ρ, where ρ is one of the following:
- θ-compliant,
- θ-p-compliant,
- θ-s-compliant, or
- θ-sticky-free

- Then it is decidable whether L is a maximal subset of M satisfying ρ.

- Secondary structures:
- Hairpin-loop (or simply hairpin)
- Internal loop
- Multiple-branch loop
- Pseudoknot

- They can be undesirable
- e.g. for Adleman’s encoding technique for Hamiltonian Path Problem (HPP).

[Figure: examples of secondary structures: hairpin, hairpin frame (multiple loop), and internal loop, with the 5’ and 3’ strand ends marked; example strand TAA---ACG---CGTTA---CGT---CGGT]

- A formal model of a hairpin: a word of the form x v y θ(v) z, where v and θ(v) bind to form the stack.
- Hairpin freeness
- Intuitively it’s almost impossible to prevent hairpins of short stack length (say 2 or 3).
- Our goal is to prevent any hairpin whose stack length is at least some given parameter k.

- A word w is (θ, k)-hairpin-free (abbr. hp(θ, k)-free) iff w = x v y θ(v) z implies |v| < k.
- hpf(θ, k): the set of all hp(θ, k)-free words in Σ*
- hp(θ, k): Σ* − hpf(θ, k)
- A language L is called (θ, k)-hairpin-free iff every word in L is hp(θ, k)-free.
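For a single word, hairpin-freeness in the sense of the formal model x v y θ(v) z can be tested directly. The sketch below assumes θ is WK-complementarity on {A, C, G, T} and relies on the observation that a stack of length at least k always contains a stack of length exactly k, so only factors v with |v| = k need to be tried. It is a brute-force check for individual words, not the automaton-based decision procedure for regular languages.

```python
WK = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    return "".join(WK[base] for base in reversed(word))


def has_hairpin(w: str, k: int) -> bool:
    """True if w = x v y theta(v) z for some v with |v| >= k."""
    for i in range(len(w) - 2 * k + 1):
        v = w[i:i + k]
        if theta(v) in w[i + k:]:   # theta(v) must occur to the right of v
            return True
    return False


def is_hairpin_free(words, k: int) -> bool:
    """A finite language is hp(theta, k)-free iff each of its words is."""
    return not any(has_hairpin(w, k) for w in words)


# Example strand from the secondary-structure slide, dashes removed.
print(has_hairpin("TAAACGCGTTACGTCGGT", 3))
```

In the example strand, v = ACG is followed later by θ(v) = CGT, so the word is not hp(θ, 3)-free.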


- hp(θ, k) and hpf(θ, k) are regular.
- For a hp(θ, k)-free language L, there exists a finite automaton M s.t. L = L(M).

- Hairpin-Freeness problem
- Input: A nondeterministic automaton M.
- Output: Y/N depending on whether L(M) is hp(θ, k)-free.

- Maximal Hairpin-Freeness problem
- Input: A deterministic automaton M1 and an NFA M2.
- Output: Y/N depending on whether there is a word … s.t. … is hp(θ, k)-free.

- The hairpin-freeness problem for regular languages is decidable in time.
- The maximal hairpin-freeness problem for regular languages is decidable in time.

- So-called Multiple loop
- hp-frame of degree n:
- The right figure is an example of hp-frame of degree 3.
- A word u is an hp(fr, j)-word if it contains an hp-frame of degree j.

- hp(θ, fr, j) : the set of all hp(fr, j)-words on Σ*
- hpf(θ, fr, j) : its complement in Σ*
- The languages hp(θ, fr, j) & hpf(θ, fr, j) are regular.
- The hp(fr, j)-freeness problem is decidable in linear time.
- The maximal hp(fr, j)-freeness problem is decidable in time.


- An n-bit DNA-HRAM consists of n hairpins.
- Each hairpin stores 1 bit of information by forming and deforming (closing and opening), as shown in the figure.

[Figure: a hairpin formed on the strand --A-C-T-G-T-C-G-A-C-A-G-T--; opening and closing the hairpin switches between the bit values 0 and 1]

- An n-bit RAM is a concatenation of n 1-bit RAMs, which is equivalent to an hp-frame of degree n.
- In order for this word to work as an n-bit RAM, the following subword should be hp(θ, 20)-free.
- A DNA memory with 4 hairpins was proposed in [KYO08].

- [AlSa97] Allawi, H.T., SantaLucia, J.: Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36(34) (1997) 10581-10594
- [ArKo02] Arita, M., Kobayashi, S.: DNA sequence design using templates. New Generation Computing 20 (2002) 263-277
- [ANH00] Arita, M., Nishikawa, A., Hagiya, M., Komiya, K., Gouzu, H., Sakamoto, K.: Improving sequence design for DNA computing. Proc. Genetic and Evolutionary Computation Conference (2000) 875-882
- [FBR00] Feldkamp, U., Saghafi, S., Rauhe, H.: A DNA sequence compiler. Proc. DNA6, (2000)
- [KKS05] Kari, L., Konstantinidis, S., Sosik, P.: Preventing undesirable bonds between DNA codewords. Proc. DNA10, LNCS 3384 (2005) 182-191
- [KKL06] Kari, L., Konstantinidis, S., Losseva, E., Sosik, P., Thierrin, G.: A formal language analysis of DNA hairpin structures. Fundamenta Informaticae 71 (2006) 453-475
- [KKA03] Kobayashi, S., Kondo, T., Arita, M.: On template method for DNA sequence design. Proc. DNA8, LNCS 2568 (2003) 205-214

- [KNO08] Kawashimo, S., Ng, Y-K., Ono, H., Sadakane, K., Yamashita, M.: Speeding up local-search type algorithms for designing DNA sequences under thermodynamical constraints. Proc. DNA14 (2008) 152-161
- [KYO08] Kameda, A., Yamamoto, M., Ohuchi, A., Yaegashi, S., Hagiya, M.: Unravel four hairpins! Natural Computing 7 (2008) 287-298
- [RFL01] Ruben, A. J., Freeland, S. J., Landweber, L. F.: PUNCH: An evolutionary algorithm for optimizing bit set selection. DNA7 (2001) 150-160
- [Shannon48] Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948) 379-423, 623-656
- [TKY04] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Thermodynamic parameters based on a nearest-neighbor model for DNA sequences with a single-bulge loop. Biochemistry 43(22) (2004) 7143-7150
- [TKY05] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Design of nucleic acid sequences for DNA computing based on a thermodynamic approach. Nucleic Acids Res. 33(3) (2005) 903-911

- [TuHo03] Tulpan, D., Hoos, H.: Hybrid randomised neighbourhoods improve stochastic local search for DNA code design. In Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence, 2671 (2003) 418-433
- [YoSu00] Yoshida, H., Suyama, A.: Solution to 3-sat by breadth first search. Proc. the 5th DIMACS Workshop on DNA Based Computers, 54 (2000) 9-22
- [ZuSt81] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1) (1981) 133-148