Molecular Computing: Challenges across the two tracks in Theoretical Computer Science

Masami Hagiya

Outline
• Japanese Molecular Computer Project
• Adleman-Lipton Paradigm and Improvements
  • Suyama’s Dynamic Programming DNA Computer
• Autonomous Molecular Computing
  • Sakamoto’s Hairpin Engines
• Analysis of Computational Power of Molecules
  • Complexity of Molecular Computation
  • Molecular Computation as Randomized Algorithm
• Towards New Computational Paradigms
  • Molecular, Chemical, Cell, and Amorphous Computing
  • Importance of Engineering Viewpoint: Programming

JSPS Project on Molecular Computing
• Project Leader: Masami Hagiya (Computer Science)
• Members
  • Takashi Yokomori (Computer Science)
  • Masayuki Yamamura (Computer Science)
  • Masanori Arita (Genome Informatics)
  • Akira Suyama (Biophysics)
  • Yuzuru Husimi (Biophysics)
  • Kensaku Sakamoto (Biochemistry)
  • Shigeyuki Yokoyama (Biochemistry)
• October 1996 - March 2001
• Funded by the Japan Society for the Promotion of Science, Research for the Future Program

Goals of Molecular Computing
• Analyses and Applications of Computational Power of Biomolecules
• Understanding Life from the Viewpoint of Computation
  • computational mystery of life: life is computationally very efficient
• Engineering Applications (not restricted to computation)
  • combinatorial optimization
  • (computationally inspired) biotechnology
  • nanotechnology, nanomachines
  • cryptography
  • medical and pharmaceutical applications in the future
• New Computational Model, New Simulation Technology

Related Fields
• Genome Informatics
  • applying computer science techniques to analyze genomic information
  • part of the Human Genome Project
  • Molecular computing goes the other way round, applying biomolecules to computation, but genome informatics is a good application area for molecular computing.
• Quantum Computing
  • massively parallel computation by quantum superposition
• Artificial Life
  • Artificial Molecular Evolution

Major Achievements of the Project
• Suyama’s Dynamic Programming DNA Computers
  • reduction of molecules by breadth-first search
  • automation by robots
• Sakamoto’s Hairpin Engines
  • Whiplash PCR and SAT Engine
  • molecular computation by hairpin formation
  • autonomous molecular computation
• Theoretical Studies by Yokomori’s Group
• Nishikawa’s Simulator for DNA Computations
• Arita’s New Tool for Code Design
• Husimi’s 3SR-Based Evolutionary Reactor
• Yamamura’s Aqueous Computing (with Head)

Dynamic Programming DNA Computers

Adleman-Lipton Paradigm
• Adleman (Science 1994): solving the Hamiltonian Path Problem by DNA
• Lipton, et al.: solving the SAT Problem by DNA
• Massively Parallel Computation by Molecules
  • mainly for combinatorial optimization
  • random generation by self-assembly: solution candidate = DNA molecule
  • selection by molecular biology experiments
• Scaling up ⇒ efforts to increase yields and reduce errors: robots and chemical ICs

cf. Hamiltonian Path Problem by Adleman
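The generate-and-select structure of the paradigm can be imitated in software. This is a toy simulation, not a model of the chemistry: random walks stand in for self-assembled path molecules, and each selection step of Adleman’s 1994 experiment (PCR with start and end primers, separation by length, affinity separation for each vertex) becomes a filter on the pool. The function names, the random-walk model and the four-vertex example graph are assumptions made for illustration.

```python
import random


def self_assemble(edges, n_vertices, n_molecules, rng):
    """Toy stand-in for ligation: each 'molecule' is a random walk of random length."""
    adj = {v: [] for v in range(n_vertices)}
    for u, v in edges:
        adj[u].append(v)
    pool = []
    for _ in range(n_molecules):
        length = rng.randint(1, n_vertices)      # target length, 1..n vertices
        path = [rng.randrange(n_vertices)]       # random start vertex
        while len(path) < length and adj[path[-1]]:
            path.append(rng.choice(adj[path[-1]]))
        pool.append(tuple(path))
    return pool


def select(pool, n_vertices, start, end):
    """Adleman-style selection, each laboratory step modelled as a filter."""
    # Keep paths that begin at `start` and end at `end` (PCR with both primers).
    pool = [p for p in pool if p[0] == start and p[-1] == end]
    # Keep paths with exactly n vertices (separation by length).
    pool = [p for p in pool if len(p) == n_vertices]
    # Keep paths that visit every vertex (one affinity separation per vertex).
    for v in range(n_vertices):
        pool = [p for p in pool if v in p]
    return set(pool)


if __name__ == "__main__":
    rng = random.Random(0)
    edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 0)]
    pool = self_assemble(edges, 4, 5000, rng)
    print(select(pool, 4, start=0, end=3))  # {(0, 1, 2, 3)}
```

On this graph the only Hamiltonian path from 0 to 3 is 0→1→2→3, and each simulated molecule encodes it with probability 1/64, so with 5,000 molecules it is virtually certain to be present. Generation is random, but the filters are deterministic: nothing except a Hamiltonian path can survive them.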

Suyama’s Dynamic Programming DNA Computer
• “Counting” (Ogihara and Ray)
  • $O(2^{0.4n})$ molecules for n-variable 3-SAT
• “Dynamic programming” (Suyama)
  • iteration of generation and selection
    • generation of candidates of partial solutions
    • selection of partial solutions
  • The order of computational complexity does not decrease, but the number of molecules required is drastically reduced.
• 3-SAT

DP algorithm for 3CNF-SAT on DNA Computers

3-CNF SAT Solution on DP DNA Computer

DP algorithm for 3CNF-SAT
Clauses: $(x_1 \lor \neg x_2 \lor x_3)$, $(x_1 \lor x_2 \lor x_3)$
k’s loop: k ranges over variable indices
  j’s loop: j ranges over clause indices
    if xk is the 3rd literal of the j-th clause
    then remove those assignments which satisfy neither the 1st nor the 2nd literal
  append XkF to the remaining assignments
  (do similarly if ¬xk is the 3rd literal)
Example, k = 3, literal x3: X1TX2T, X1FX2T, X1TX2F, X1FX2F → X1TX2TX3F, X1TX2FX3F

DP algorithm for 3CNF-SAT
Clauses: $(\neg x_1 \lor \neg x_2 \lor \neg x_3)$, $(x_1 \lor x_2 \lor x_3)$
k’s loop: k ranges over variable indices
  j’s loop: j ranges over clause indices
    if xk is the 3rd literal of the j-th clause
    then remove those assignments which satisfy neither the 1st nor the 2nd literal
  append XkF to the remaining assignments
  (do similarly if ¬xk is the 3rd literal)
Example, k = 3, literal ¬x3: X1TX2T, X1FX2T, X1TX2F, X1FX2F → X1FX2TX3T, X1FX2FX3T

DP algorithm for 3CNF-SAT
Clauses: $(x_2 \lor \neg x_3 \lor x_4)$, $(\neg x_2 \lor \neg x_3 \lor x_4)$, $(x_2 \lor x_3 \lor x_4)$
k’s loop: k ranges over variable indices
  j’s loop: j ranges over clause indices
    if xk is the 3rd literal of the j-th clause
    then remove those assignments which satisfy neither the 1st nor the 2nd literal
  append XkF to the remaining assignments
  (do similarly if ¬xk is the 3rd literal)
Example, k = 4, literal x4: X1FX2TX3T, X1FX2FX3T, X1TX2TX3F, X1TX2FX3F → X1TX2TX3FX4F
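The k/j loop above can be simulated in silico to see which partial assignments survive. The sketch below is a plain Python reading of the slides’ pseudocode, not Suyama’s laboratory protocol: a pool is a list instead of a tube, a strand is a tuple of truth values, literals are (variable index, polarity) pairs, and the name `dp_3sat` is hypothetical. It assumes each clause mentions three distinct variables and treats the literal on the highest-indexed variable as the “3rd literal”.

```python
def dp_3sat(n_vars, clauses):
    """Variable-by-variable DP for 3-SAT, following the slides' loop.

    clauses: lists of three (var_index, is_positive) literals on distinct
    variables in range(n_vars). Returns (pool, sizes): the satisfying
    assignments as bool tuples, and the pool size after each variable.
    """
    # Sort literals so the "3rd literal" is the one on the highest-indexed variable.
    ordered = [sorted(c, key=lambda lit: lit[0]) for c in clauses]
    pool = [()]      # partial assignments to x_0 .. x_{k-1}; starts with the empty strand
    sizes = []
    for k in range(n_vars):
        next_pool = []
        for value in (True, False):
            # Clauses whose 3rd literal is on x_k and is falsified by x_k = value:
            # their 1st or 2nd literal must already be satisfied.
            pending = [c for c in ordered if c[2][0] == k and c[2][1] != value]
            for a in pool:
                if all(a[c[0][0]] == c[0][1] or a[c[1][0]] == c[1][1]
                       for c in pending):
                    next_pool.append(a + (value,))   # "append XkT" / "append XkF"
        pool = next_pool
        sizes.append(len(pool))
    return pool, sizes
```

Each clause is checked exactly once, at the step that assigns its last variable, so the final pool holds exactly the satisfying assignments; `sizes` tracks the pool size, the quantity the dynamic programming approach keeps small compared with generating all $2^n$ assignments at once.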

Implementation of Basic Operations
• append(T, s, e), amplify(T, T1, T2, …, Tn), get(T, +s), get(T, -s)
• [Figure: laboratory steps for these operations: annealing and ligation (Taq DNA ligase), PCR, immobilization and cold wash, hot wash, and hot wash and divide into T1, T2, …, Tn; for get, the cold-wash fraction gives get(T, -s) and the hot-wash fraction gives get(T, +s)]
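The abstract behaviour of these operations can be sketched with a test tube modelled as a multiset of strands. This is an idealized model with hypothetical function names: strands are tuples of sequence words such as "X1T", `get` never errs, `amplify` copies perfectly, and `append` takes a single word because the role of the third argument e of append(T, s, e) is not spelled out on the slide.

```python
from collections import Counter


def get(tube, word, present=True):
    """get(T, +s) when present is True, get(T, -s) otherwise."""
    return Counter({s: n for s, n in tube.items() if (word in s) == present})


def amplify(tube, n_tubes):
    """amplify(T, T1, ..., Tn): idealized PCR, then divide into n identical tubes."""
    return [Counter(tube) for _ in range(n_tubes)]


def append(tube, word):
    """Simplified append: ligate one word onto the end of every strand."""
    return Counter({s + (word,): n for s, n in tube.items()})


if __name__ == "__main__":
    tube = Counter({("X1T", "X2T"): 3, ("X1T", "X2F"): 2, ("X1F", "X2T"): 1})
    print(get(tube, "X1T"))
    print(append(get(tube, "X2T"), "X3F"))
```

Counts stand for copy numbers of each strand, so the loss and error effects discussed later in the talk could be added by making `get` probabilistic.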

On Scaling Up the Size of Computations
• Suyama’s estimation for 100-variable 3-SAT
  • $2 \times 10^{-3}$ g of DNA with the dynamic programming algorithm
  • $2 \times 10^{12}$ g of DNA with the Adleman-Lipton approach
• Current status: 4-variable 10-clause 3-SAT
• Project goal: 30-variable 100-clause 3-SAT
• Ultimate goal: 100-variable 400-clause 3-SAT
• Still, 100 variables are not many.
• A number of breakthroughs (in algorithms and experimental techniques) are required to outperform electronic computers. Robots, for example, …

Robot for DNA Computing Based on MAGTRATION™

Automatic Operation of get Command on DNA Computer Robot
[Figure: get(T, +s) and get(T, -s) on the robot: annealing, immobilization, then cold wash, giving get(T, -s), and hot wash, giving get(T, +s)]

Programming in DNA Computer (protocol level, script level, Pascal/C level)
Protocol level:
  [Instrument]
  [Reset Counter] 0
  [Home Position] 0
  [MJ-Open Lid]
  …
  [Get1(0)]
  [Get2(1)]
  [Append(2)]
  …
  [Exit]
Script level, (1-1-4) [MJ-Open Lid]:
  Do 2
    _SEND "LID OPEN"
    Do 10
      _SEND "LID?"
      Wait_msec 500
      _CMP_GSTR "OPEN"
      IF_Goto EQ 0 ;open
      Wait_msec 1000
    Loop
  Loop
  ; Time out
  End
;open
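Read as control flow, the [MJ-Open Lid] script sends the open command, polls the lid status until it reports OPEN, and repeats the whole sequence once before timing out. The Python sketch below mirrors that reading at the Pascal/C level of abstraction; `send` is a hypothetical callable standing in for the robot’s `_SEND` instruction, and treating `_CMP_GSTR` as a comparison with the last reply is an assumption. Injecting `sleep` lets the loop be tested without real delays.

```python
import time


def open_lid(send, attempts=2, polls=10, settle_s=0.5, retry_s=1.0, sleep=time.sleep):
    """Sketch of the [MJ-Open Lid] script: command the lid open, then poll its status.

    send: hypothetical callable that transmits a command string and returns the
    instrument's reply. Returns True once the lid reports OPEN, False on time-out.
    """
    for _ in range(attempts):             # Do 2 ... Loop
        send("LID OPEN")                  # _SEND "LID OPEN"
        for _ in range(polls):            # Do 10 ... Loop
            reply = send("LID?")          # _SEND "LID?"
            sleep(settle_s)               # Wait_msec 500
            if reply == "OPEN":           # _CMP_GSTR "OPEN" / IF_Goto EQ 0 ;open
                return True
            sleep(retry_s)                # Wait_msec 1000
    return False                          # ; Time out
```

In the worst case the sketch issues 2 × (1 + 10) = 22 commands before giving up, matching the nested Do 2 / Do 10 loops.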

Hairpin Engines

Autonomous Molecular Computing
• Adleman-Lipton Paradigm
  • generation of candidates = autonomous reaction
  • selection of solutions = many operations from outside
• One-Pot Reaction ⇒ Autonomous Computation: computation by successive autonomous reactions of molecules
  • Winfree’s DNA Tile
  • Sakamoto’s Hairpin Engines: Whiplash PCR and SAT Engine
• Applications
  • nanotechnology, nanomachines
  • (computationally inspired) biotechnology

cf. Winfree’s DNA Tile

Hairpin Engines
• Molecular Computation by Hairpin Formation
  • hairpin: a typical secondary structure
• Whiplash PCR
  • DNA Automaton: State Machine by DNA
  • 5 Transitions in a Control Experiment
• SAT Engine
  • Selection by Hairpin Structures of DNA
  • 3-SAT: 6-Variable 10-Clause Formula

SAT Engine
• Sakamoto et al., Science, May 19, 2000.
• Selection by Hairpin Structures of DNA
  • digestion by restriction enzyme
  • exclusive PCR
• 3-SAT
  • ssDNA consisting of literals, each selected from a clause
  • complementary literal = complementary sequence
  • detection of inconsistency ⇒ hairpin
• The essential part of the SAT computation is done by hairpin formation.
• Autonomous Molecular Computation

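(a∨b∨c)∧(¬d∨e∨¬f)∧ … ∧(¬c∨¬b∨a)∧ …
[Figure: a strand carrying the literals e, b, ¬b; b and ¬b pair to form a hairpin, which is eliminated by digestion by restriction enzyme and by exclusive PCR]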

Selection by Hairpin Structures
• Digestion by Restriction Enzyme
  • Hairpins are cut at the restriction site inserted in each literal sequence.
• Exclusive PCR
  • PCR is inefficient for hairpins.
  • In exclusive PCR, the solution is diluted in each cycle to preserve the difference in amplification.
• The number of steps is independent of the number of variables or clauses.

6-Variable 10-Clause Formula
(a∨b∨!c)∧(a∨c∨d)∧(a∨!c∨!d)∧(!a∨!c∨d)∧(a∨!c∨e)∧(a∨d∨!f)∧(!a∨c∨d)∧(a∨c∨!d)∧(!a∨!c∨!d)∧(!a∨c∨!d), where ! = ¬
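The selection principle of the SAT Engine, that a strand carrying a literal and its complement folds into a hairpin and is eliminated, can be mimicked by exhaustive enumeration. This is an in-silico sketch, not the wet-lab protocol: it builds every strand that picks one literal per clause, discards strands containing a complementary pair, and reads an assignment off each survivor. The "!" encoding follows the slide; the function names are hypothetical.

```python
from itertools import product

# The slide's formula; "!" marks negation, as on the slide.
FORMULA = [
    ("a", "b", "!c"), ("a", "c", "d"), ("a", "!c", "!d"), ("!a", "!c", "d"),
    ("a", "!c", "e"), ("a", "d", "!f"), ("!a", "c", "d"), ("a", "c", "!d"),
    ("!a", "!c", "!d"), ("!a", "c", "!d"),
]


def complement(lit):
    """Complementary literal, standing in for the complementary DNA sequence."""
    return lit[1:] if lit.startswith("!") else "!" + lit


def forms_hairpin(strand):
    """A strand folds into a hairpin if it carries some literal and its complement."""
    present = set(strand)
    return any(complement(lit) in present for lit in present)


def sat_engine(formula):
    """Enumerate all one-literal-per-clause strands; keep those without hairpins."""
    return [s for s in product(*formula) if not forms_hairpin(s)]


def to_assignment(strand):
    """Read a (partial) truth assignment off a hairpin-free strand."""
    return {lit.lstrip("!"): not lit.startswith("!") for lit in strand}


if __name__ == "__main__":
    survivors = sat_engine(FORMULA)
    print(len(survivors))  # 24
    print(to_assignment(survivors[0]))
```

On this formula 24 of the $3^{10} = 59{,}049$ strands survive, and every survivor encodes the same assignment, a = F, b = T, c = T, d = F, e = T, f = F, which is therefore the formula’s only satisfying assignment. The enumeration here is serial; in the molecular version the screening happens across all strands in the tube at once.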

Solution of a 6-Variable 10-Clause Formula

Whiplash PCR
• DNA Automaton: State Machine by DNA
  • polymerization of hairpin
  • polymerization stop
• Autonomous MIMD Computation of Boolean μ-formulas
• Solving NP-Complete Problems in O(1) Steps
  • e.g., vertex cover: vertex cover candidate = transition table = ssDNA; vertex cover = transition table that reaches the final state
• 5 Transitions in a Control Experiment

Whiplash PCR
[Figure, four panels: successive hairpin formation and polymerization steps on the DNA automaton; strand segments labelled a, b, c, A, B, C and x]
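The state-machine idea can be sketched at the slides’ level of abstraction: the ssDNA carries a transition table, its 3′ end holds the current state, and each round of hairpin formation plus polymerization writes the next state. The sketch below is a toy model with a hypothetical function name; it ignores the chemistry entirely and simply stops when no rule matches or the round budget runs out.

```python
def whiplash_pcr(table, start, rounds):
    """Toy whiplash-PCR automaton.

    table: (current_state, next_state) pairs standing in for the transition
    rules encoded along one ssDNA. The last element of `states` plays the
    role of the 3' end. Returns the states written, in order.
    """
    states = [start]
    for _ in range(rounds):               # one round = anneal (hairpin) + extend + denature
        current = states[-1]
        # Hairpin formation: the 3' end anneals to the rule for its current state.
        nxt = [n for c, n in table if c == current]
        if not nxt:
            break                         # no rule matches: no hairpin, no extension
        # Polymerase copies the next state; the polymerization stop ends extension there.
        states.append(nxt[0])
    return states


if __name__ == "__main__":
    table = [("s0", "s1"), ("s1", "s2"), ("s2", "s3"), ("s3", "s4"), ("s4", "s5")]
    print(whiplash_pcr(table, "s0", rounds=10))  # ['s0', 's1', 's2', 's3', 's4', 's5']
```

With a five-rule chain the automaton makes five transitions and then halts because no rule starts from s5, the same count as in the control experiment, though the states here are invented labels.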

5 Transitions in a Control Experiment

Analysis of Computational Power of Molecules

Complexity of Molecular Computation
• Time
  • number of laboratory operations
  • time for each operation: more essential for the analysis of the computational power of molecules
• Space (= Parallelism)
  • number of molecules: maximum number, total number
  • size (length) of molecules
• Analysis of the Trade-Off

Some Classical Results
• Reif (SPAA’95)
  • “A nondeterministic Turing machine computation with input size n, space s and time $2^{O(s)}$ can be executed in our PAM Model using $O(s)$ PA-Match steps and $O(s \log s)$ other PAM steps, employing aggregates of length $O(s)$.”
• Beaver (DNA1, 1995)
  • Polynomial-step molecular computers compute PSPACE.
• Rooß and Wagner (I&C, 1996)
  • Exactly the problems in $P^{NP} = \Delta_2^p$ can be solved in polynomial time using Lipton’s model.

Yield and Error in Reactions
• Yield
  • equilibrium: equilibrium constant (K)
  • time to reach equilibrium: rate constants (k, $k_{-1}$)
  • example: $A \rightleftharpoons B$, starting from pure A, with [B] as a fraction of the initial [A]: $[B] = \frac{K}{1+K}\left(1 - e^{-(k + k_{-1})t}\right)$, where $K = k/k_{-1}$
• Error
  • example: mis-hybridization
  • Error probability is never zero.
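The yield formula for $A \rightleftharpoons B$ (forward rate constant $k$, backward $k_{-1}$) can be checked numerically. The sketch integrates the rate equation $d[B]/dt = k[A] - k_{-1}[B]$ with $[A] + [B] = 1$ by explicit Euler steps and compares it with the closed form; `k_f` and `k_b` are our names for $k$ and $k_{-1}$, and the step count is an arbitrary choice.

```python
import math


def yield_closed_form(t, k_f, k_b):
    """Fraction of B at time t for A <-> B, starting from pure A."""
    K = k_f / k_b                                   # equilibrium constant
    return K / (1 + K) * (1 - math.exp(-(k_f + k_b) * t))


def yield_euler(t, k_f, k_b, steps=100_000):
    """Integrate d[B]/dt = k_f*[A] - k_b*[B], with [A] = 1 - [B], by explicit Euler."""
    dt = t / steps
    b = 0.0
    for _ in range(steps):
        b += dt * (k_f * (1 - b) - k_b * b)
    return b


if __name__ == "__main__":
    for t in (0.5, 1.0, 5.0):
        print(t, yield_closed_form(t, 2.0, 0.5), yield_euler(t, 2.0, 0.5))
```

The two columns agree closely, and both approach $K/(1+K)$: with $K = 4$ the yield can never exceed 80%, however long the reaction runs, which is why yield appears alongside error as a limiting factor.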

Reduction of Errors
• Iteration of Laboratory Operations
  • increase in computation time
  • increase in loss of molecules
  • increase in number of molecules
• Reduction of Error Probability
  • appropriate conditions: temperature, salt concentration
    • Low temperature leads to frequent mis-hybridization.
    • However, high temperature decreases the yield.
  • good encoding
    • A number of papers have been published on designing good encodings.
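The trade-off behind iterating laboratory operations can be made concrete with a simple expected-value model. It assumes, hypothetically, that each extraction independently keeps a correct strand with probability `keep_good` and wrongly keeps an incorrect one with probability `keep_bad`; neither value comes from the slides, and real errors need not be independent across rounds.

```python
def repeated_extraction(n_good, n_bad, keep_good, keep_bad, rounds):
    """Expected strand counts after repeating an imperfect extraction.

    keep_good: chance one extraction keeps a correct strand (hypothetical value).
    keep_bad: chance it wrongly keeps an incorrect strand (hypothetical value).
    Rounds are assumed independent. Returns (good, bad, fraction_good).
    """
    good = n_good * keep_good ** rounds
    bad = n_bad * keep_bad ** rounds
    return good, bad, good / (good + bad)


if __name__ == "__main__":
    for r in range(1, 4):
        good, bad, purity = repeated_extraction(1000, 10**6, 0.9, 0.05, r)
        print(f"rounds={r}: good={good:.0f} bad={bad:.0f} purity={purity:.3f}")
```

With 1,000 correct strands among a million incorrect ones, `keep_good = 0.9` and `keep_bad = 0.05`, three rounds raise the fraction of correct strands from under 2% to about 85%, while about 27% of the correct strands are lost, which is the increase in loss (and hence in molecules needed) listed above.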

Some Analyses
• Karp, Kenyon and Waarts (SODA’96)
  • The number of extract operations required for achieving error-resilient bit evaluation is $\Theta(\lceil \log_\varepsilon \delta \rceil \cdot \lceil \log_\gamma \delta \rceil)$.
• Kurtz (DNA2, 1996)
  • thermodynamic analysis of path formation in Adleman’s experiment
  • time needed to form a Hamiltonian path: $\Omega(n^2)$
• Winfree (1998, Ph.D. Thesis)
  • thermodynamic analysis of DNA Tiling
• Rose, et al. (GECCO’99)
  • Computational Incoherency (thermodynamic analysis of mis-hybridization)

Efficiency of SAT Engine: Tentative Analysis
• Parameters
  • n: number of clauses
  • ε: the probability that a satisfying assignment cannot be detected
• Orders
  • Time: $O(n^{2.5})$
  • Number of molecules: $O(4^n \ln(1/\varepsilon))$
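One way to read the $\ln(1/\varepsilon)$ factor in the molecule count (n clauses, miss probability ε) is the standard repeated-trials bound. Suppose, as an assumption not stated on the slide, that each strand in the pool independently carries a detectable satisfying literal selection with probability at least $p$. Then, for $N$ strands:

$$
\Pr[\text{no strand detects a solution}] \le (1-p)^N \le e^{-pN} \le \varepsilon
\quad\text{whenever}\quad N \ge \frac{\ln(1/\varepsilon)}{p}.
$$

If $p \ge 4^{-n}$, an assumption chosen here only to match the stated order, then $N = \lceil 4^{n} \ln(1/\varepsilon) \rceil$ strands suffice, that is, $O(4^n \ln(1/\varepsilon))$ molecules.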

Molecular Computation and Randomized Algorithms
• Randomized Algorithms with Molecules
  • massive parallelism
  • random operations: very easy to implement by chemical reactions
• Error in Non-Random Operations
  • Error in non-random operations should not undermine the error reducibility of a randomized algorithm.
  • Such error should be compensated by random operations.

Some Recent Results
• Chen and Ramachandran (DNA6, 2000): k-SAT by Paturi et al.
• Díaz, Esteban and Ogihara (DNA6, 2000): k-SAT by Schöning
• Sakakibara (DNA6, 2000): PAC Learning of DNF Formulas, Approximate Consistent Learning
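Schöning’s random-walk algorithm for k-SAT, the algorithm behind the Díaz, Esteban and Ogihara entry, shows what a randomized algorithm of this kind looks like. The sketch is a conventional in-silico version, not a molecular implementation; the literal encoding, the function names and the `tries` parameter are assumptions. Each try starts from a random assignment and makes up to 3n random flips, as in Schöning’s analysis.

```python
import random


def satisfied(assign, clause):
    """A clause (list of (var, is_positive) literals) holds if some literal is true."""
    return any(assign[v] == pos for v, pos in clause)


def schoening(n_vars, clauses, tries, rng):
    """Schöning's random-walk algorithm for k-SAT (in-silico sketch).

    Each try starts from a uniformly random assignment and makes up to 3n
    random flips. Returns a satisfying assignment, or None if none is found
    (for a satisfiable formula this happens only with small probability).
    """
    for _ in range(tries):
        assign = [rng.random() < 0.5 for _ in range(n_vars)]
        for _ in range(3 * n_vars):
            unsat = [c for c in clauses if not satisfied(assign, c)]
            if not unsat:
                return assign
            # Pick an unsatisfied clause, then flip one of its variables at random.
            var, _pos = rng.choice(rng.choice(unsat))
            assign[var] = not assign[var]
        if all(satisfied(assign, c) for c in clauses):
            return assign
    return None
```

The random choices (initial assignment, clause, literal) are the cheap “random operations” in the molecular setting, and independent tries play the role of parallel molecules: the failure probability shrinks exponentially in the number of tries, so errors elsewhere can be compensated by adding more of them.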
