Challenges in DNA Computing: Preventing Pseudoknot Formation

Towards the sequence design preventing pseudoknot formation
2nd International Workshop on Natural Computing, Dec. 10-12, 2007
Noyori Conference Hall, Nagoya University, Japan
Lila Kari and Shinnosuke Seki
Biocomputing Laboratory, Department of Computer Science, University of Western Ontario, London, ON, Canada
http://www.csd.uwo.ca/~lila/, ~sseki
lila, sseki@csd.uwo.ca

DNA computing: brief overview
• Process flow
  • Information is encoded into DNA strings over adenine (A), cytosine (C), guanine (G), and thymine (T).
  • A succession of intermolecular reactions among the encoding DNA strings (bio-operations) proceeds in the expected manner, driven by the base pairings A-T and C-G (Watson-Crick complementarity).
  • The resulting DNA strings are decoded using techniques such as gel electrophoresis and PCR (polymerase chain reaction).
• Advantages
  • Massive parallelism
  • Applicable to NP-complete problems (e.g. Adleman's Hamiltonian path experiment)
  • Massive storage capacity
  • Energy efficiency

Watson-Crick complementarity
• C - G, A - T (or A - U in RNA)
• Two DNA single strands of opposite orientation (5’ -> 3’ and 3’ -> 5’) can bind to each other via Watson-Crick complementarity.
• The bio-operations of DNA computing depend strongly on this biochemical property.
  5’ A C C G T A G 3’
  3’ T G G C A T C 5’
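
A minimal Python sketch of the pairing rule, assuming both strands are written left to right as in the diagram above (top strand 5’ -> 3’, bottom strand 3’ -> 5’), so that paired bases share an index. `PAIR` and `binds` are hypothetical names introduced only for illustration.

```python
# Watson-Crick base pairs for DNA (either order): A-T and C-G.
PAIR = {("A", "T"), ("T", "A"), ("C", "G"), ("G", "C")}

def binds(top_5to3: str, bottom_3to5: str) -> bool:
    """Return True if every aligned position forms a Watson-Crick pair.

    The top strand is read 5'->3' and the bottom strand 3'->5', so the
    strands are antiparallel and position i of one pairs with position i
    of the other.
    """
    if len(top_5to3) != len(bottom_3to5):
        return False
    return all((a, b) in PAIR for a, b in zip(top_5to3, bottom_3to5))

# The duplex from the slide: 5'-ACCGTAG-3' over 3'-TGGCATC-5'.
print(binds("ACCGTAG", "TGGCATC"))  # True
print(binds("ACCGTAG", "TGGCATG"))  # False (last pair is G-G)
```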

Adleman’s first DNA computation [Adleman, 1994]: solving the Hamiltonian Path Problem (an NP-complete problem)
(Figure: the example graph, with vertices 0 to 6.)
• Hamiltonian path
  • A path that visits each vertex exactly once.
• Hamiltonian Path Problem
  • Decide whether a given graph contains a Hamiltonian path.
• How can this problem be encoded in a test tube, the so-called DNA computer?
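
The problem statement can be made precise with a brute-force checker. This is an ordinary exhaustive search over vertex orders, shown only to pin down the definition; it is not Adleman's molecular procedure, although it loosely mirrors the generate-and-filter idea his experiment ran in parallel. The 4-vertex directed graph is hypothetical, since the slide's edges are not recoverable, and `has_hamiltonian_path` is our own name.

```python
from itertools import permutations

def has_hamiltonian_path(n, edges):
    """Does some ordering of the vertices 0..n-1 follow directed edges?

    Exponential in n: all n! vertex orders are tried in turn.
    """
    edge_set = set(edges)
    for order in permutations(range(n)):
        if all((order[i], order[i + 1]) in edge_set for i in range(n - 1)):
            return True
    return False

# Hypothetical digraph (not Adleman's 7-vertex instance).
edges = [(0, 1), (1, 2), (2, 3), (3, 1)]
print(has_hamiltonian_path(4, edges))             # True: 0 -> 1 -> 2 -> 3
print(has_hamiltonian_path(4, [(0, 1), (2, 3)]))  # False
```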

Encoding oligonucleotides (Adleman’s first DNA computation [Adleman, 1994], Hamiltonian Path Problem)
• Vertices 2 and 3 are encoded as
  • O2 = TATCGGATCGGTATATCCGA
  • O3 = GCTATTCGAGCTTAAAGCTA
• The edge 2 -> 3 is encoded as
  • O2->3 = CTCGAATAGCTCGGATATAC
  • Its halves CTCGAATAGC and TCGGATATAC are complementary to the 5’ half of O3 (GCTATTCGAG) and the 3’ half of O2 (GTATATCCGA), so O2->3 holds O2 and O3 together end to end.
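
The relation between the three sequences can be checked with a short sketch. It assumes Watson-Crick complementarity is applied antiparallel (complement each base, then reverse) and that each 20-mer splits into two 10-mer halves; `theta` and `edge_oligo` are our own illustrative names, and the construction is only checked against the three strings on this slide.

```python
# Watson-Crick complement of each base.
COMP = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(s: str) -> str:
    """Antiparallel complement: complement every base and reverse the string."""
    return "".join(COMP[b] for b in reversed(s))

def edge_oligo(o_from: str, o_to: str) -> str:
    """Hypothetical helper: theta of (3' half of o_from + 5' half of o_to)."""
    return theta(o_from[len(o_from) // 2:] + o_to[:len(o_to) // 2])

O2 = "TATCGGATCGGTATATCCGA"
O3 = "GCTATTCGAGCTTAAAGCTA"
print(edge_oligo(O2, O3))  # CTCGAATAGCTCGGATATAC
```

Applying `theta` once more to the result recovers the junction GTATATCCGAGCTATTCGAG, which is the region of O2 followed by O3 that the edge strand binds.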

What are the challenges in DNA computing?
• How should the encoding DNA strands be designed?
• Any kind of intramolecular structure is undesirable in an encoding DNA single strand:
  • intramolecular structures deprive a strand of its ability to interact with other strands (bio-operations).
• On the other hand, DNA single strands tend to form intramolecular structures, because doing so makes them thermodynamically more stable.

Intramolecular structures (figure: E. coli transfer-messenger RNA)
• hairpin loops
• bulge loops
• internal loops
• multiple loops
• pseudoknots

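Intramolecular structure freeness problem: a formal-language-theoretic approach
• DNA alphabet: {A, C, G, T}
• d-morphic involution θ: a map on words that is either a morphism or an antimorphism, and is an involution
  • Antimorphism: θ(uv) = θ(v)θ(u) (cf. morphism: θ(uv) = θ(u)θ(v))
  • Involution: θ(θ(u)) = u for every word u
• Watson-Crick complementarity
  • An antimorphic involution with θ(A) = T, θ(T) = A, θ(C) = G, θ(G) = C; e.g. θ(ATC) = GAT.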
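
The definitions can be exercised directly. In this sketch (the names `theta` and `mu` are ours), `theta` extends Watson-Crick complementarity antimorphically and `mu` extends it morphically for contrast; the random words only spot-check the identities, they do not prove them.

```python
import random

# Watson-Crick complementarity on single letters.
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w: str) -> str:
    """Antimorphic extension: theta(uv) = theta(v) theta(u)."""
    return "".join(WC[a] for a in reversed(w))

def mu(w: str) -> str:
    """Morphic extension, for contrast: mu(uv) = mu(u) mu(v)."""
    return "".join(WC[a] for a in w)

rng = random.Random(0)
for _ in range(1000):
    u = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 8)))
    v = "".join(rng.choice("ACGT") for _ in range(rng.randint(0, 8)))
    assert theta(u + v) == theta(v) + theta(u)  # antimorphism
    assert theta(theta(u)) == u                 # involution
    assert mu(u + v) == mu(u) + mu(v)           # morphism
    assert mu(mu(u)) == u                       # also an involution

print(theta("ATC"), mu("ATC"))  # GAT TAG
```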

Intramolecular structure freeness problem: a formal-language-theoretic approach (cont.)
• Hairpin structure
  • The hairpin is the best-known and hence most intensively investigated intramolecular structure (e.g. [2]).
  • A DNA strand that forms a hairpin can be modeled as xαθ(x): for example, in ACG AA CGT, x = ACG pairs with θ(x) = CGT and AA forms the loop.

θ-bordered words [Kari et al. 2007]
• θ-border
  • v is a θ-border of w if w = vα = βθ(v) for some words α, β.
  • w is θ-bordered if it has a non-empty θ-border; otherwise it is θ-unbordered.
• A DNA strand that forms a hairpin is modeled as w = xαθ(x), so x is a θ-border of w.
• Therefore, θ-unbordered words do not form hairpins in this sense.
• We also consider the set of all θ-borders of w.
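
A direct enumeration of θ-borders for the Watson-Crick θ, written as a sketch. It assumes borders are proper (shorter than w), which is our reading of the definition; `theta_borders` and `is_theta_bordered` are hypothetical helper names. On the hairpin ACG AA CGT it reports x = ACG, and also the shorter borders A and AC, because those prefixes also meet their images under θ at the other end of the word.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w):
    """Watson-Crick antimorphic involution (complement, then reverse)."""
    return "".join(WC[a] for a in reversed(w))

def theta_borders(w):
    """Non-empty proper prefixes v of w with w = v alpha = beta theta(v)."""
    return [w[:k] for k in range(1, len(w)) if w.endswith(theta(w[:k]))]

def is_theta_bordered(w):
    return bool(theta_borders(w))

hairpin = "ACG" + "AA" + "CGT"        # x alpha theta(x) with x = ACG
print(theta_borders(hairpin))          # ['A', 'AC', 'ACG']
print(theta_borders("AATTTTATA"))      # []  (theta-unbordered)
```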

Pseudoknots
• Pseudoknot is a generic term for cross-dependent structures; the simplest, and hence most common, type consists of two stems, x paired with θ(x) and y paired with θ(y), whose pairings cross.
• From the formal-language viewpoint, a strand that forms a pseudoknot of this type can be modeled as ρ x α y γ θ(x) δ θ(y) σ.
• In this paper, we consider the case α = δ = λ (the empty word), i.e., ρ x y γ θ(x) θ(y) σ.

θ-Pseudoknot-bordered words
• θ-pseudoknot-border
  • v is a θ-pseudoknot-border of w if v = xy and w = xyα = βθ(x)θ(y) for some words α, β.
  • w is θ-pseudoknot-bordered if it has a non-empty θ-pseudoknot-border; otherwise it is θ-pseudoknot-unbordered.
• In particular, for an antimorphic θ we have θ(x)θ(y) = θ(yx), so if xy is a θ-pseudoknot-border of w, then w = xyα = βθ(yx).
• θ-pseudoknot-unbordered words never form a pseudoknot of the type xyγθ(x)θ(y).
• We also consider the set of all θ-pseudoknot-borders of w.
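
A brute-force search for θ-pseudoknot-borders, as a sketch under two assumptions of ours: θ is the Watson-Crick involution, and both x and y are non-empty while α and β may be empty. The function name `pk_witnesses` is hypothetical; it returns every split (x, y) that makes xy a θ-pseudoknot-border. The test word is built as xyγθ(x)θ(y) with x = AC, y = GG, γ = T.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w):
    """Watson-Crick antimorphic involution (complement, then reverse)."""
    return "".join(WC[a] for a in reversed(w))

def pk_witnesses(w):
    """All (x, y), both non-empty, with w = x y alpha = beta theta(x) theta(y)."""
    found = []
    for k in range(2, len(w) + 1):        # k = |xy|; alpha and beta may be empty
        v, tail = w[:k], w[len(w) - k:]
        for i in range(1, k):             # split v = x y
            x, y = v[:i], v[i:]
            if tail == theta(x) + theta(y):
                found.append((x, y))
    return found

def is_pk_bordered(w):
    return bool(pk_witnesses(w))

w = "AC" + "GG" + "T" + theta("AC") + theta("GG")   # ACGGTGTCC
print(pk_witnesses(w))                                # [('AC', 'GG')]
# For antimorphic theta, theta(x) theta(y) equals theta(yx):
print(theta("AC") + theta("GG") == theta("GG" + "AC"))  # True
```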

Example of θ-pseudoknot-bordered words
• Let θ be the Watson-Crick complementarity.
• In fact, by letting
• Also by letting
(Figure: a strand with letters A A T A T T T and regions labeled x, θ(x), y, θ(y).)
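
Only the letters A A T A T T T survive from this slide's figure. Assuming, purely as our guess, that they spell the example word w = AATATTT, the same kind of brute-force search (sketch; Watson-Crick θ; x and y non-empty; `pk_witnesses` is a hypothetical name) finds exactly two witnesses, (x, y) = (A, A) and (AAT, A), which would fit the two choices the slide alludes to.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w):
    """Watson-Crick antimorphic involution (complement, then reverse)."""
    return "".join(WC[a] for a in reversed(w))

def pk_witnesses(w):
    """All (x, y), both non-empty, with w = x y alpha = beta theta(x) theta(y)."""
    n = len(w)
    return [(w[:i], w[i:k])
            for k in range(2, n + 1)
            for i in range(1, k)
            if w[n - k:] == theta(w[:i]) + theta(w[i:k])]

w = "AATATTT"            # assumed example word, read off the figure residue
print(pk_witnesses(w))   # [('A', 'A'), ('AAT', 'A')]
```

With x = A, y = A the word starts with AA and ends with θ(A)θ(A) = TT; with x = AAT, y = A it starts with AATA and ends with ATT·T = ATTT, the two occurrences overlapping by one letter.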

θ-border and θ-pseudoknot-border
• Lemma 2
  • Let θ be a d-morphic involution, and . Then
• Let be the set of all θ-unbordered words.
• Let be the set of all θ-pseudoknot-unbordered words.
• Proposition 1
  • Let θ be a d-morphic involution. Then

Properties of θ-unbordered words [Kari et al. 1997]
• Pref(u), Suff(u): the sets of all prefixes (suffixes) of u.
• Lemma [Kari et al. 1997]
  • Let θ be an antimorphism and . Then u is θ-unbordered iff
• Lemma 5
  • Let θ be an antimorphism and . Then
• Corollary 2
  • Let θ be an antimorphism and . Then
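
An observation that follows from the definitions and may help when reading the lemmas above: for an antimorphic θ, if v is a non-empty θ-border of u, then θ(v) ends with θ of the first letter of u, so that first letter is already a θ-border. Hence, assuming proper borders, u is θ-bordered iff |u| ≥ 2 and u ends with θ of its first letter. This is our own addition, not a reconstruction of Lemma 5 or Corollary 2; the sketch below (helper names are hypothetical) checks it exhaustively for the Watson-Crick θ on all words up to length 6.

```python
from itertools import product

WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w):
    """Watson-Crick antimorphic involution (complement, then reverse)."""
    return "".join(WC[a] for a in reversed(w))

def is_theta_bordered(w):
    """Some non-empty proper prefix v of w has theta(v) as a suffix of w."""
    return any(w.endswith(theta(w[:k])) for k in range(1, len(w)))

def first_last_test(w):
    """Shortcut: theta(v) always ends with theta(v[0]) = theta(w[0])."""
    return len(w) >= 2 and w[-1] == WC[w[0]]

for n in range(1, 7):
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        assert is_theta_bordered(w) == first_last_test(w)
print("the two tests agree on all words of length 1..6")
```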

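Properties of θ-pseudoknot-unbordered words
• In general, we cannot conclude that uu is θ-pseudoknot-unbordered whenever u is.
• Example 7
  • Let u = AATTTTATA and let θ be the Watson-Crick complementarity; u is θ-pseudoknot-unbordered.
  • However, uu = AATTTTATA AATTTTATA is θ-pseudoknot-bordered.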
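
Example 7 can be verified mechanically. The sketch (Watson-Crick θ; x and y non-empty; `pk_witnesses` is a hypothetical name) confirms that u has no θ-pseudoknot-border, and that xy = AATTTTATAA with x = AATTT, y = TATAA is one for uu, since uu ends with θ(x)θ(y) = AAATT·TTATA. This particular witness comes from our own search and may differ from the one shown on the slide.

```python
WC = {"A": "T", "T": "A", "C": "G", "G": "C"}

def theta(w):
    """Watson-Crick antimorphic involution (complement, then reverse)."""
    return "".join(WC[a] for a in reversed(w))

def pk_witnesses(w):
    """All (x, y), both non-empty, with w = x y alpha = beta theta(x) theta(y)."""
    n = len(w)
    return [(w[:i], w[i:k])
            for k in range(2, n + 1)
            for i in range(1, k)
            if w[n - k:] == theta(w[:i]) + theta(w[i:k])]

u = "AATTTTATA"
uu = u + u
print(pk_witnesses(u))                          # []
print(("AATTT", "TATAA") in pk_witnesses(uu))   # True
# The border AATTTTATAA of uu admits only one split into x and y:
print([p for p in pk_witnesses(uu) if p[0] + p[1] == "AATTTTATAA"])
# [('AATTT', 'TATAA')]
```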

Repetitions of a word and the θ-pseudoknot-unbordered property (cont.)
• Proposition 2
  • Let θ be an antimorphism. Then for , if , then for any ,
• Corollary 3
  • For a word ,

Primitive & θ-pseudoknot-unbordered words
• A word u is primitive if u = wⁱ implies i = 1.
• Lemma 7
  • Let θ be an antimorphism and . If u² is θ-pseudoknot-bordered, then u is primitive.
• Corollary 4
  • If and u is not primitive, then u² is θ-pseudoknot-unbordered. This implies
• Example 8
  • Let . Neither u nor uu is θ-pseudoknot-bordered. This implies that
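
Primitivity can be tested without trying every root w: a non-empty word u is primitive iff its first occurrence in uu after position 0 is at position |u|. The sketch below (function names are ours) compares that classic test with the definition on all DNA words up to length 6; u from Example 7 comes out primitive, which is consistent with Lemma 7 since uu is θ-pseudoknot-bordered.

```python
from itertools import product

def is_primitive(u: str) -> bool:
    """Classic test: u occurs in uu only at positions 0 and |u|."""
    if not u:
        raise ValueError("u must be non-empty")
    return (u + u).find(u, 1) == len(u)

def is_primitive_direct(u: str) -> bool:
    """Definition: u is primitive if u = w^i implies i = 1."""
    n = len(u)
    return not any(n % d == 0 and u == u[:d] * (n // d) for d in range(1, n))

# The two tests agree on every DNA word of length 1..6.
for n in range(1, 7):
    for letters in product("ACGT", repeat=n):
        w = "".join(letters)
        assert is_primitive(w) == is_primitive_direct(w)

print(is_primitive("AATTTTATA"))  # True (u from Example 7)
print(is_primitive("ATAT"))       # False, since ATAT = (AT)^2
```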

Primitive & θ-pseudoknot-unbordered words (cont.)
• Theorem 1
  • Let θ be an antimorphism and satisfying Then any θ-pseudoknot-border of u² is primitive.
• Proposition 3
  • Let θ be an antimorphism and . If w is a θ-pseudoknot-border of u², then the factorization of w into x and y such that u² = xyα = βθ(x)θ(y) is unique.

Future work
• How can the required condition be relaxed? The current condition is too strict for practical use.
• How can we guarantee that the concatenation of two different θ-pseudoknot-unbordered words is θ-pseudoknot-unbordered? Concatenation is a basic operation in DNA computing.
• How can more complicated pseudoknots be modeled?
