Encoding Information for DNA computing


Shinnosuke Seki

Purpose
  • What’s an advantage of encoding?
  • To make a “good” or tractable code set for DNA computing.
  • Development of polynomial-time algorithms which decide whether a given code set is “good” or “bad”.
Claude Elwood Shannon
  • The father of information theory (Shannon’s entropy)
  • Showed that Boolean algebra with binary arithmetic makes it possible to simplify circuits of electromechanical relays
  • In “A mathematical theory of communication” [Shannon48], he showed that information can be sent with an arbitrarily small error rate even over a noisy channel.
  • Chess program using minimax evaluation procedure
  • etc. …
Shannon’s information channel
  • R > C: overflow.
  • R ≤ C: the error rate can be made as small as desired.
  • To attain R = C in a noisy channel, we need to find a ‘good’ code.

[Figure: sender → encoder → channel of capacity C → decoder → receiver, carrying information flow R; positive and negative noise act on the channel]


Biological perspective
  • A biological reaction can be described in terms of the information channel model.
    • Example: the case of heredity
  • Over billions of years, has Mother Nature developed a wonderful code system?
  • Biology -> Computer Science

[Figure: heredity as an information channel carrying DNA from parent to child, with mutation and natural selection marked on it]

Review: in vitro DNA computing
  • Encode a given problem into single or double-stranded DNAs (ssDNAs, dsDNAs)
  • Computation by a succession of bio-operations.
  • Decode the resulting solution and extract its output.
Review: WK-complementarity
  • Hydrogen bonds pair A with T and C with G.
  • Two strands which are
    • complementary to each other
    • with opposite directions
    can form a (complete) dsDNA.
  • Example
    5’ - A T C G G T C A A C T G C C C T A A T G - 3’
    3’ - T A G C C A G T T G A C G G G A T T A C - 5’
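
The pairing rule can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: the function name `forms_complete_duplex` is hypothetical, and the lower strand is assumed to be written 3’ to 5’ so that it already runs antiparallel to the upper one.

```python
# Watson-Crick pairs: A with T, C with G.
WK_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def forms_complete_duplex(top_5to3: str, bottom_3to5: str) -> bool:
    """Return True if the two strands pair base by base over their whole length.

    The bottom strand is given 3' to 5', i.e. already antiparallel to the top.
    """
    if len(top_5to3) != len(bottom_3to5):
        return False
    return all(WK_PAIR[t] == b for t, b in zip(top_5to3, bottom_3to5))


top = "ATCGGTCAACTGCCCTAATG"     # 5' -> 3', from the example slide
bottom = "TAGCCAGTTGACGGGATTAC"  # 3' -> 5', from the example slide
print(forms_complete_duplex(top, bottom))
```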
Adleman’s first trial
  • Find a solution of the Hamiltonian path problem in a test-tube solution, in time polynomial in the size of the input graph.
  • The solution is filled with oligonucleotides that encode the graph.

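[Figure: input graph on the vertices 1, 2, 3, 4 with its encoding oligonucleotides]
  • Vertices: 1 = ACG CTT, 2 = ATA GAT, 3 = CGG TTA, 4 = ACT TAA
  • Edges: 1 -> 2 = GAA TAT, 2 -> 3 = CTA GCC, 3 -> 4 = AAT TGA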
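
The sequences in this figure fit a simple construction, sketched below in Python. This is an illustration under assumptions, not the author's code: the pairing of vertex labels with 6-mers is read off the figure, and the rule in the hypothetical helper `edge_oligo` (complement of the second half of the source vertex followed by the first half of the target vertex) is inferred from those sequences.

```python
WK_PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}

# Vertex code words as read off the slide (6-mers shown there as two 3-mers).
VERTEX = {1: "ACGCTT", 2: "ATAGAT", 3: "CGGTTA", 4: "ACTTAA"}


def complement(seq: str) -> str:
    """Base-by-base Watson-Crick complement (no reversal)."""
    return "".join(WK_PAIR[b] for b in seq)


def edge_oligo(u: int, v: int) -> str:
    """Assumed rule: complement of (second half of u) + (first half of v)."""
    half = len(VERTEX[u]) // 2
    return complement(VERTEX[u][half:] + VERTEX[v][:half])


for u, v in [(1, 2), (2, 3), (3, 4)]:
    print(f"{u} -> {v}: {edge_oligo(u, v)}")
```

Under this rule an edge oligonucleotide can hybridize to the tail of one vertex word and the head of the next, holding the two vertex words side by side.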

What’s a good code set?
  • Each code word (oligonucleotide) shouldn’t form any undesirable structure.
  • Such a structure may make the code word inert.
  • Code words don’t interact with each other in an undesirable way.
  • Structure formation is due to
    • WK-complementarity
    • Gibbs free energy

[Figure: code word 2 (ATA GAT) folding back on itself, with its A and T bases paired around the unpaired A and G]

What’s a good code set? (cont.)
  • Uniform melting temperature
  • Preventing undesirable hybridizations
  • Other constraints
    • Avoiding repeated bases
    • Forbidden subsequences
      • When a restriction enzyme is used, its recognition site should appear only at the intended positions
    • Using only 3 types of nucleotides: A, C, T
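
The combinatorial constraints in this list are easy to test per code word. Here is a hedged Python sketch: the function names, the run-length limit of 3, and the default forbidden site are illustrative assumptions (GAATTC is the EcoRI recognition site, used only as an example).

```python
def longest_run(seq: str) -> int:
    """Length of the longest run of one repeated base (0 for an empty string)."""
    longest = run = 0
    prev = None
    for base in seq:
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def satisfies_constraints(seq, forbidden=("GAATTC",), max_run=3, alphabet="ACGT"):
    """Check alphabet restriction, repeated bases, and forbidden subsequences."""
    if set(seq) - set(alphabet):
        return False                       # uses a base outside the allowed set
    if longest_run(seq) > max_run:
        return False                       # too many identical bases in a row
    return not any(site in seq for site in forbidden)


print(satisfies_constraints("ACTCAT", alphabet="ACT"))
```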
Melting temperature
  • Melting temperature Tm of a dsDNA is
    • the temperature at which half of the dsDNAs are denatured.
    • The higher Tm is, the more stable the dsDNA is.
    • Tm = ΔH / (ΔS + R ln(Ct / α)), where
        • R: gas constant,
        • Ct: total oligo concentration,
        • ΔH & ΔS: enthalpy & entropy,
        • α: 1 for self-complementary and 4 for non-self-complementary strands
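
As a worked illustration, the Python sketch below evaluates Tm from the symbols listed on this slide, assuming the standard two-state expression Tm = ΔH / (ΔS + R ln(Ct / α)). The function name and the numeric inputs are hypothetical placeholders, not measured thermodynamic parameters.

```python
import math

R = 1.987  # gas constant in cal/(K*mol)


def melting_temperature(delta_h, delta_s, total_conc, self_complementary=False):
    """Tm in kelvin.

    delta_h in cal/mol, delta_s in cal/(K*mol), total_conc (Ct) in mol/L.
    alpha is 1 for self-complementary strands and 4 otherwise.
    """
    alpha = 1 if self_complementary else 4
    return delta_h / (delta_s + R * math.log(total_conc / alpha))


# Placeholder inputs chosen only to exercise the formula.
tm = melting_temperature(-60000.0, -170.0, 1e-6)
print(f"Tm = {tm:.1f} K")
```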
Melting temperature (cont.)
  • Uniform melting temperature
    • Making Tm uniform across code words eliminates hybridization bias.
  • GC content
    • The ratio of the # of G’s and C’s over the total # of nucleotides in a sequence
    • G-C pair is more stable than A-T pair.
    • Higher GC content implies higher Tm.
    • Sequences are designed with 50% GC content.
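
GC content is a one-line computation. The sketch below (the function name `gc_content` is an assumption) applies it to the 20-mer from the WK-complementarity example and to two of the 6-mers from the Adleman slide.

```python
def gc_content(seq: str) -> float:
    """Fraction of bases in seq that are G or C."""
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


for word in ["ATCGGTCAACTGCCCTAATG", "ACGCTT", "ATAGAT"]:
    print(word, gc_content(word))
```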
Gibbs free energy (ΔG)
  • A well-known indicator of stability for DNA structures
    • A structure with lower ΔG is more stable.
    • The ΔG of an entire structure is the sum of the ΔG values of its substructures [ZuSt81].
Nearest-neighbor method

Refer to [AlSa97], [TKY04] ([8], [9] in this table)
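
The additivity idea behind this method can be sketched in a few lines of Python: the ΔG of a duplex is approximated by summing one term per pair of adjacent bases. The function name is hypothetical and the table `TOY_STACKS` holds made-up numbers for illustration only; real parameters must be taken from the cited tables.

```python
def total_free_energy(seq: str, stack_energy: dict) -> float:
    """Sum the energy terms of all adjacent base pairs (dinucleotide steps)."""
    return sum(stack_energy[seq[i:i + 2]] for i in range(len(seq) - 1))


# Made-up values for illustration, NOT measured nearest-neighbor parameters.
TOY_STACKS = {"AT": -1.0, "TC": -1.5, "CG": -2.0}

print(total_free_energy("ATCG", TOY_STACKS))
```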

Template method [ArKo02]
  • Prepare 2 bit sequences, each of which has some desirable property
    • (e.g., 50%-GC content, error-correction).
  • Using a conversion rule, we construct one DNA sequence from these 2 bit sequences.
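
One possible conversion rule is sketched below in Python. It is an assumption for illustration and may differ from the exact rule in [ArKo02]: the template bit selects the class of the base (0 for A/T, 1 for G/C) and the map bit selects the base within that class. The names `CONVERT` and `combine` are hypothetical.

```python
# (template bit, map bit) -> base; an assumed conversion table.
CONVERT = {
    ("0", "0"): "A",
    ("0", "1"): "T",
    ("1", "0"): "C",
    ("1", "1"): "G",
}


def combine(template: str, map_word: str) -> str:
    """Merge a template and a map word of equal length into a DNA sequence."""
    if len(template) != len(map_word):
        raise ValueError("template and map word must have the same length")
    return "".join(CONVERT[t, m] for t, m in zip(template, map_word))


print(combine("0110", "0101"))
```

With this table, the GC content of every generated sequence equals the fraction of 1's in the template, whatever map word is used.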
Template method (cont.)
  • Design criteria
    • Template
      • A template x should have at least d mismatches with x^R, xx, x^R x^R, x x^R, and x^R x, where x^R is the reversal of x.
      • An exhaustive search is used to find a good template.
    • Map (error-correcting code)
      • A code whose words have at least k mismatches with each other.
      • e.g. BCH code
  • Drawback
    • It cannot prevent sequences from forming secondary structures.
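
Both criteria reduce to counting mismatches between equal-length words. A minimal Python sketch follows; the function names and the example words are illustrative assumptions.

```python
from itertools import combinations


def mismatches(x: str, y: str) -> int:
    """Number of positions at which two equal-length words differ."""
    if len(x) != len(y):
        raise ValueError("words must have the same length")
    return sum(a != b for a, b in zip(x, y))


def min_distance(code) -> int:
    """Smallest mismatch count over all pairs of distinct code words."""
    return min(mismatches(x, y) for x, y in combinations(code, 2))


template = "110100"
print(mismatches(template, template[::-1]))           # template vs. its reversal
print(min_distance(["0000", "0011", "1100", "1111"]))  # map code
```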
AG-templates, GC-templates [KKA03]
  • GC-template
    • Template contains the same # of 0’s and 1’s (50% GC-content).
    • Map is an error-correcting code.
  • AG-template
    • Map is a constant-weight code (50% GC-content).
    • Results in a bigger set of sequences.
Other approaches
  • DNASequenceGenerator [FBR00]
    • Software with a GUI
    • Creates sequences with a given melting temperature and GC-content, and without palindromes, start codons, or restriction sites.
Other approaches
  • Suyama’s approach [YoSu00]
    • Generate sequences randomly and add each one to the sequence set iff it satisfies all of the following constraints:
      • Uniform melting temperature
      • No mis-hybridization
      • No formation of stable secondary structure
    • Drawback: it easily falls into local optima.
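
The generate-and-test loop can be sketched as follows. This Python example is a simplification under loudly stated assumptions, not the method of [YoSu00]: exact 50% GC content stands in for uniform melting temperature, Hamming distance to the other words and their reverse complements stands in for mis-hybridization, and distance from a word's own reverse complement is only a crude stand-in for secondary-structure prevention. All names and parameter values are hypothetical.

```python
import random

COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def revcomp(s):
    return "".join(COMP[b] for b in reversed(s))


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def gc_ok(s):
    """Proxy for uniform Tm: exactly half of the bases are G or C."""
    return 2 * sum(b in "GC" for b in s) == len(s)


def accept(cand, pool, d):
    """Accept cand iff it passes every (simplified) constraint."""
    if not gc_ok(cand):
        return False
    if hamming(cand, revcomp(cand)) < d:       # crude self-structure proxy
        return False
    return all(hamming(cand, w) >= d and hamming(cand, revcomp(w)) >= d
               for w in pool)


def generate(n_words, length, d, tries=10000, seed=0):
    """Draw random words and keep those that satisfy all constraints."""
    rng = random.Random(seed)
    pool = []
    for _ in range(tries):
        if len(pool) == n_words:
            break
        cand = "".join(rng.choice("ACGT") for _ in range(length))
        if accept(cand, pool, d):
            pool.append(cand)
    return pool


pool = generate(n_words=5, length=8, d=3)
print(pool)
```

Because accepted words are never revised, an unlucky early choice can block later candidates, which is one way to read the local-optimum drawback noted above.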
Other approaches
  • Hybrid randomized neighborhoods [TuHo03]
    • Stochastic local search (SLS) algorithm
    • With probability ε, it searches neighbors by randomly mutating the current best sequences.
    • With probability 1-ε, it moves in the direction that decreases the # of constraint conflicts the most.
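
A single step of such a search might look like the Python sketch below. It is a simplified illustration, not the algorithm of [TuHo03]: a conflict is assumed to be a pair of words at Hamming distance below d, a neighbor is a word with one base substituted, and all function names are hypothetical.

```python
import random
from itertools import combinations


def hamming(x, y):
    return sum(a != b for a, b in zip(x, y))


def conflicts(words, d):
    """Index pairs of words that are closer than the required distance d."""
    return [(i, j) for i, j in combinations(range(len(words)), 2)
            if hamming(words[i], words[j]) < d]


def neighbours(word):
    """All words obtained from word by substituting a single base."""
    return [word[:p] + b + word[p + 1:]
            for p in range(len(word)) for b in "ACGT" if b != word[p]]


def sls_step(words, d, eps, rng):
    """One move: random mutation with probability eps, greedy otherwise."""
    bad = conflicts(words, d)
    if not bad:
        return list(words)
    k = rng.choice(rng.choice(bad))            # a word from a conflicting pair
    options = neighbours(words[k])
    if rng.random() < eps:
        new_word = rng.choice(options)         # random-walk move
    else:                                      # greedy move: fewest conflicts
        new_word = min(
            options,
            key=lambda w: len(conflicts(words[:k] + [w] + words[k + 1:], d)),
        )
    return words[:k] + [new_word] + words[k + 1:]


rng = random.Random(1)
result = sls_step(["AAAA", "AAAT"], d=2, eps=0.0, rng=rng)
print(result, len(conflicts(result, 2)))
```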
Other approaches
  • GA (genetic algorithm)-based approach [ANH00]
    • Use a GA in which the fitness of candidate solutions is evaluated
    • Fitness criteria
      • Restriction sites
      • GC-content
      • Hamming distance
      • Same base repetition
Other approaches
  • Gibbs free energy-based approach [TKY05], [KNO08]
    • Taking thermodynamics into consideration
    • Gibbs free energy as a stability measure
    • Advantage
      • Greater accuracy because it takes into account stability of loops or stacking between base-pairs
    • Disadvantage
      • More computational time to calculate free energy
    • How to decrease this computational complexity?
A formal language approach
  • Design a set of structure-free codes in terms of WK-complementarity.
  • Advantage
    • More reliable codes than with the free-energy approach
    • More efficient algorithms for decision problems
  • Disadvantage
    • Each structure needs to be considered separately.
A formal language approach (cont.)
  • Abstraction of biological concepts
    • {A, C, G, T} → an alphabet V,
    • WK-complementarity → an antimorphic involution θ
      • Involution
        • A mapping θ s.t. θ² is the identity (symmetry).
      • Antimorphism
        • θ(xy) = θ(y)θ(x) (opposite direction).
    • e.g. θ(TCATCCGATTTCGGG) = CCCGAAATCGGATGA

[Figure: TCATCCGATTTCGGG paired base by base with its complement AGTAGGCTAAAGCCC]
Bond-free properties [KKS05]
  • θ-non-overlapping:
  • θ-compliant:
    • Strictly (a): property (a) together with θ-non-overlapping
Bond-free properties [KKS05]
  • θ-p-compliant:
  • θ-s-compliant:
Bond-free properties [KKS05]
  • θ-free:
  • θ-sticky-free:
Bond-free properties [KKS05]
  • θ-3’-overhang-free:
  • θ-5’-overhang-free:
  • θ-overhang-free: both of these
Decidability [KKS05]
  • Theorem
    • The following problem is decidable in quadratic time w.r.t. |A|:
      • Input: an NFA A,
      • Output: Yes/No depending on whether L(A) satisfies any of the following properties (or their strict versions):
        • θ-compliant, θ-p-compliant, θ-s-compliant,
        • θ-sticky-free,
        • θ-3’-overhang-free, θ-5’-overhang-free, θ-overhang-free.
Decidability and maximality [KKS05]
  • Theorem
    • Let M be a regular language and L be a regular subset of M with a property ρ,
      • where ρ is one of the following:
        • θ-compliant,
        • θ-p-compliant,
        • θ-s-compliant, or
        • θ-sticky-free
    • Then it is decidable whether L is a maximal subset of M satisfying ρ.
Secondary structure prevention
  • Secondary structures:
    • Hairpin-loop (or simply hairpin)
    • Internal loop
    • Multiple-branch loop
    • Pseudoknot
  • They can be undesirable
    • e.g. for Adleman’s encoding technique for Hamiltonian Path Problem (HPP).
Secondary Structures

[Figure: a hairpin, an internal loop, and a hairpin frame (multiple loop), with the 5’ and 3’ ends of the strands marked]
Hairpin-free language
  • A formal model of hairpin: x v y θ(v) z.
    • e.g. TAA---ACG---CGTTA---CGT---CGGT, with x = TAA, v = ACG, y = CGTTA, θ(v) = CGT, z = CGGT
  • Hairpin freeness
    • Intuitively, it’s almost impossible to prevent hairpins of short stack length (say 2 or 3).
    • The goal is to prevent any hairpin whose stack length is at least some given parameter k.

Hairpin-free language [KKL06]
  • A word w is (θ, k)-hairpin-free (abbr. hp(θ, k)-free) iff w = x v y θ(v) z implies |v| < k.
  • hpf(θ, k): the set of all hp(θ, k)-free words in Σ*
  • hp(θ, k): Σ* - hpf(θ, k).
  • A language L is called (θ, k)-hairpin-free iff every word of L is hp(θ, k)-free.
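
For a single word, hairpin-freeness can be decided by a direct scan. The Python sketch below assumes the model x v y θ(v) z with x, y, z possibly empty and θ the WK reverse complement; the function names are hypothetical. Checking stems of length exactly k suffices, because any longer stem contains one of length k.

```python
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    """WK antimorphic involution: reverse the word and complement each base."""
    return "".join(COMP[b] for b in reversed(word))


def is_hairpin_free(word: str, k: int) -> bool:
    """True iff word has no factorization x v y theta(v) z with |v| >= k."""
    for i in range(len(word) - 2 * k + 1):
        v = word[i:i + k]
        if theta(v) in word[i + k:]:   # theta(v) occurs after v, without overlap
            return False
    return True


w = "TAAACGCGTTACGTCGGT"   # the example word TAA-ACG-CGTTA-CGT-CGGT
print(is_hairpin_free(w, 3), is_hairpin_free(w, 5))
```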
Regularity of hairpin languages
  • hp(θ, k) and hpf(θ, k) are regular.
  • That is, there exists a finite automaton M s.t. L(M) = hpf(θ, k).

[Figure: a word containing w and θ(w), with the remaining parts marked X]
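
One way to see the regularity is to write hp(θ, k) as a finite union of languages Σ* v Σ* θ(v) Σ*, one for each word v of length k. The Python sketch below builds the corresponding regular expression; it assumes θ is the WK reverse complement over {A, C, G, T}, and the name `hairpin_regex` is hypothetical.

```python
import re
from itertools import product

COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}


def theta(word: str) -> str:
    return "".join(COMP[b] for b in reversed(word))


def hairpin_regex(k: int):
    """Regex matching a factor v y theta(v) for some stem v of length k."""
    stems = ("".join(p) for p in product("ACGT", repeat=k))
    return re.compile("|".join(v + "[ACGT]*" + theta(v) for v in stems))


w = "TAAACGCGTTACGTCGGT"
print(hairpin_regex(3).search(w) is not None)   # w lies in hp(theta, 3)
print(hairpin_regex(5).search(w) is None)       # w lies in hpf(theta, 5)
```

The expression has 4^k alternatives, so this construction only illustrates regularity and is not meant as an efficient decision procedure.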
Hairpin Freeness Problems
  • Hairpin-Freeness problem
    • Input: A nondeterministic automaton M.
    • Output: Y/N depending on whether L(M) is hp(θ, k)-free.
  • Maximal Hairpin-Freeness problem
    • Input: A deterministic automaton M1 and an NFA M2.
    • Output: Y/N depending on whether there is a word w ∈ L(M2) - L(M1) s.t. L(M1) ∪ {w} is hp(θ, k)-free.

Decidability
  • The hairpin-freeness problem for regular languages is decidable in time.
  • The maximal hairpin-freeness problem for regular languages is decidable in time.
Hairpin Frames
  • Also known as a multiple loop
  • hp-frame of degree n:
  • The figure on the right is an example of an hp-frame of degree 3.
  • A word u is an hp(fr, j)-word if it contains an hp-frame of degree j.
Regularity & decidability
  • hp(θ, fr, j) : the set of all hp(fr, j)-words on Σ*
  • hpf(θ, fr, j) : its complement in Σ*
  • The languages hp(θ, fr, j) & hpf(θ, fr, j) are regular.
  • The hp(fr, j)-freeness problem is decidable in linear time.
  • The maximal hp(fr, j)-freeness problem is decidable in time.
Application: DNA-HRAMs
  • n-bit DNA-HRAM consists of n hairpins.
  • Each hairpin stores 1-bit information by closing into a hairpin or opening out of it.

[Figure: the strand --A-C-T-G-T-C-G-A-C-A-G-T-- shown open and closed as a hairpin with A-T and C-G pairs; the “opening” and “closing” transitions switch between the states 0 and 1]


n-bit DNA-HRAM
  • A concatenation of n 1-bit RAMs, which is equivalent to an hp-frame of degree n.
  • For this word to work as an n-bit RAM, the following subword should be hp(θ, 20)-free.
  • DNA memory with 4 hairpins was proposed in [KYO08].
Reference
  • [AlSa97] Allawi, H.T., SantaLucia, J.: Thermodynamics and NMR of internal G·T mismatches in DNA. Biochemistry 36(34) (1997) 10581-10594
  • [ArKo02] Arita, M., Kobayashi, S.: DNA sequence design using templates. New Generation Computing 20 (2002) 263-277
  • [ANH00] Arita, M., Nishikawa, A., Hagiya, M., Komiya, K., Gouzu, H., Sakamoto, K.: Improving sequence design for DNA computing. Proc. Genetic and Evolutionary Computation Conference (2000) 875-882
  • [FBR00] Feldkamp, U., Saghafi, S., Rauhe, H.: A DNA sequence compiler. Proc. DNA6 (2000)
  • [KKS05] Kari, L., Konstantinidis, S., Sosik, P.: Preventing undesirable bonds between DNA codewords. Proc. DNA10, LNCS 3384 (2005) 182-191
  • [KKL06] Kari, L., Konstantinidis, S., Losseva, E., Sosik, P., Thierrin, G.: A formal language analysis of DNA hairpin structures. Fundamenta Informaticae 71 (2006) 453-475
  • [KKA03] Kobayashi, S., Kondo, T., Arita, M.: On template method for DNA sequence design. Proc. DNA8, LNCS 2568 (2003) 205-214
Reference (cont.)
  • [KNO08] Kawashimo, S., Ng, Y-K., Ono, H., Sadakane, K., Yamashita, M.: Speeding up local-search type algorithms for designing DNA sequences under thermodynamical constraints. Proc. DNA14 (2008) 152-161
  • [KYO08] Kameda, A., Yamamoto, M., Ohuchi, A., Yaegashi, S., Hagiya, M.: Unravel four hairpins! Natural Computing 7 (2008) 287-298
  • [RFL01] Ruben, A. J., Freeland, S. J., Landweber, L. F.: PUNCH: An evolutionary algorithm for optimizing bit set selection. DNA7 (2001) 150-160
  • [Shannon48] Shannon, C.E.: A mathematical theory of communication. Bell System Technical Journal 27 (1948) 379-423, 623-656
  • [TKY04] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Thermodynamic parameters based on a nearest-neighbor model for DNA sequences with a single-bulge loop. Biochemistry 43(22) (2004) 7143-7150
  • [TKY05] Tanaka, F., Kameda, A., Yamamoto, M., Ohuchi, A.: Design of nucleic acid sequences for DNA computing based on a thermodynamic approach. Nucleic Acids Res. 33(3) (2005) 903-911
Reference (cont.)
  • [TuHo03] Tulpan, D., Hoos, H.: Hybrid randomised neighbourhoods improve stochastic local search for DNA code design. In Advances in Artificial Intelligence: 16th Conference of the Canadian Society for Computational Studies of Intelligence, 2671 (2003) 418-433
  • [YoSu00] Yoshida, H., Suyama, A.: Solution to 3-SAT by breadth first search. Proc. the 5th DIMACS Workshop on DNA Based Computers, 54 (2000) 9-22
  • [ZuSt81] Zuker, M., Stiegler, P.: Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. 9(1) (1981) 133-148