1 / 42

Joint work with and

The Biochemical Abstract Machine BIOCHAM Logic programming steps towards formal biology François Fages, INRIA Rocquencourt http://contraintes.inria.fr/. Joint work with and Nathalie Chabrier-Rivier Sylvain Soliman

lora
Download Presentation

Joint work with and

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. The Biochemical Abstract Machine BIOCHAMLogic programming steps towards formal biologyFrançois Fages, INRIA Rocquencourthttp://contraintes.inria.fr/ • Joint work with and • Nathalie Chabrier-Rivier Sylvain Soliman • ARC CPBIO “Process Calculi and Biology of Molecular Networks” • Alexander Bockmayr, LORIA Nancy, Vincent Danos, CNRS Paris PPS, • Vincent Schächter, Genoscope Evry, et al. • http://contraintes.inria.fr/cpbio/

  2. Current Revolution in Systems Biology • Elucidation of high-level biological processes • in terms of their biochemical basis at the molecular level. • Mass production of genomic and post-genomic data: • ARN expression, protein synthesis, protein-protein interactions,… • Need for a strong parallel effort on the formal representation of biological processes: Systems Biology. • Need for formal tools for modeling and reasoning about their global behavior.

  3. Formalisms for Modeling Biochemical Systems • Diagrammatic notation • Boolean networks [Thomas 73] • Milner’s pi–calculus [Regev-Silverman-Shapiro 99-01, Nagasali et al. 00] • Concurrent transition systems [Chabrier-Chiaverini-Danos-Fages-Schachter 03] • Biochemical abstract machine BIOCHAM [Chabrier-Fages-Soliman 03] • Pathway logic [Eker-Knapp-Laderoute-Lincoln-Meseguer-Sonmez 02] • Bio-ambients [Regev-Panina-Silverman-Cardelli-Shapiro 03] • Differential equations • Hybrid Petri nets [Hofestadt-Thelen 98, Matsuno et al. 00] • Hybrid automata [Alur et al. 01, Ghosh-Tomlin 01] • Hybrid concurrent constraint languages [Bockmayr-Courtois 01]

  4. Our Goal • Beyond simulation, provide formal tools for querying, validating and completing biological models. • Our proposal: • Use of temporal logic CTL as a query language for models of biological processes; • Use of concurrent transition systems for their modeling; • Use of symbolic and constraint-based model checkers for automatically evaluating CTL queries in qualitative and quantitative models. • Use of inductive logic programming for learning models • In course, learn and teach bits of biology with logic programs.

  5. References • A wonderful textbook: • Molecular Cell Biology. 5th Edition, 1100 pages+CD, Freeman Publ. • Lodish, Berk, Zipursky, Matsudaira, Baltimore, Darnell. Nov. 2003. • Genes and signals. Ptashne, Gann. CSHL Press. 2002. • Modeling dynamic phenomena in molecular and cellular biology. • Segel. Cambridge Univ. Press. 1987. • Modeling and querying bio-molecular interaction networks. • Chabrier, Chiaverini, Danos, Fages, Schächter. To appear in TCS. 2004. • The biochemical abstract machine BIOCHAM. Chabrier, Fages, Soliman. http://contraintes.inria.fr/BIOCHAM

  6. Plan of the Talk • Introduction • A simple algebra of cell molecules • Concurrent transition systems of biochemical reactions • Example of the mammalian cell cycle control • Temporal logic CTL as a query language • Computational results with BIOCHAM • Learning reaction rules • An experiment with inductive logic programming • Kinetics models • Simulation with differential equations • Hybrid systems • Conclusion

  7. 2. A Simple Algebra of Cell Molecules • Small molecules: covalent bonds (outer electrons shared) 50-200 kcal/mol • 70% water • 1% ions • 6% amino acids (20), nucleotides (5), • fats, sugars, ATP, ADP, … • Macromolecules: hydrogen bonds, ionic, hydrophobic, Waals 1-5 kcal/mol • Stability and bindings determined by the number of weak bonds: 3D shape • 20% proteins (50-104 amino acids) • RNA (102-104 nucleotides AGCU) • DNA (102-106 nucleotides AGCT)

  8. Structure Levels of Proteins • 1) Primary structure: word of n amino acids residues (20n possibilities) • linked with C-N bonds • ICLP • Isoleucine Cysteine Leucine Proline • 2) Secondary: word of m a-helix, b-strands, random coils,… (3m-10m) • stabilized by hydrogen bonds H---O • 3) Tertiary 3D structure: spatial folding • stabilized by • hydrophobic • interactions

  9. Formal proteins • Cyclin dependent kinase 1 Cdk1 • (free, inactive) • Complex Cdk1-Cyclin B Cdk1–CycB • (low activity) • Phosphorylated form Cdk1~{thr161}-CycB • at site threonine 161 • (high activity) • (BIOCHAM syntax)

  10. Abstraction of gene expression DNA  RNA  protein • DNA:word over 4 nucleotides Adenine, Guanine, Cytosine, Thymine • double helix of pairs A--T and C---G • Replication: DNA synthesis • Genes: parts of DNA • Transcription: RNA copying from a gene • #ERCC1-(PRB-JUN-CFOS)

  11. BIOCHAM Algebra of Cell Molecules • E ::= Name|E-E|E~{E,…,E}|(E) S ::= _|E+S • Names: molecules, proteins, #gene binding sites, abstract @processes… • - : binding operator for protein complexes, gene binding sites, … • Associative and commutative. • ~{…}: modification operator for phosphorylated sites, … • Set of modified sites (Associative, Commutative, Idempotent). • + : solution operator, “soup aspect”, Assoc. Comm. Idempotent, Neutral _ • No membranes, no transport formalized. Bitonal calculi [Cardelli 03].

  12. Plan of the talk • Introduction • A simple algebra of cell molecules • Concurrent transition systems of biochemical reactions • Example of the mammalian cell cycle control • Temporal logic CTL as a query language • Computational results with BIOCHAM • Learning reaction rules • An experiment with inductive logic programming • Kinetics models • Simulation with differential equations • Hybrid systems • Conclusion

  13. 3. Concurrent Transition Syst. of Biochemical Reactions • Enzymatic reactions: • R ::= S=>S | S=[E]=>S | S=[R]=>S | S<=>S | S<=[E]=>S • (where A<=>B stands for A=>BB=>A and A=[C]=>B for A+C=>B+C, etc.) • define a concurrent transition system over integers denoting the multiplicity of the molecules (multiset rewriting). • One can associate a finite abstract CTS over booleanstate variables denoting the presence/absence of molecules • which correctly over-approximates the set of all possible behaviors • a reaction A+B=>C+D is translated with 4 rules for possible consumption: • A+BA+B+C+D A+BA+B +C+D • A+BA+B+C+D A+BA+B+C+D

  14. Six Rule Schemas • Complexation: A + B => A-B Decomplexation A-B => A + B • Cdk1+CycB => Cdk1–CycB • Phosphorylation: A =[C]=> A~{p} Dephosphorylation A~{p} =[C]=> A • Cdk1–CycB =[Myt1]=> Cdk1~{thr161}-CycB • Cdk1~{thr14,tyr15}-CycB =[Cdc25~{Nterm}]=> Cdk1-CycB • Synthesis: _ =[C]=> A. • _ =[#Ge2-E2f13-Dp12]=> CycA • Degradation: A =[C]=> _. • CycE =[@UbiPro]=> _ (not for CycE-Cdk2 which is stable)

  15. MAPK Signaling Pathway • RAF + RAFK <=> RAF-RAFK. • RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} • where p2 not in $P. • MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} • where p2 not in $P. • MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • RAF-RAFK => RAFK + RAF~{p1}. • RAF~{p1}-RAFPH => RAF + RAFPH. • MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MEK~{p1}-MEKPH => MEK + MEKPH. • MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2}+ MEK~{p1,p2}. • MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.

  16. MAPK Signaling Pathway • RAF + RAFK <=> RAF-RAFK. • RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} • where p2 not in $P. • MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} • where p2 not in $P. • MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • RAF-RAFK => RAFK + RAF~{p1}. • RAF~{p1}-RAFPH => RAF + RAFPH. • MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MEK~{p1}-MEKPH => MEK + MEKPH. • MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2}+ MEK~{p1,p2}. • MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.

  17. Cell Cycle: G1  DNA Synthesis  G2  Mitosis • G1: CdK4-CycD • Cdk6-CycD • Cdk2-CycE • S: Cdk2-CycA • G2 • M: Cdk1-CycA • Cdk1-CycB

  18. Mammalian Cell Cycle Control Map [Kohn 99]

  19. Kohn’s map detail for Cdk2 • Complexation with CycA and CycE • Phosphorylation sites PY15 and P • Biocham Rules: • cdk2~$P + cycA-$C => cdk2~$P-cycA-$C • where $C in {_,cks1} . • cdk2~$P + cycE~$Q-$C => cdk2~$P-cycE~$Q-$C • where $C in {_,cks1} . • p57 + cdk2~$P-cycA-$C => p57-cdk2~$P-cycA-$C • where $C in {_, cks1}. • cycE-$C =[cdk2~{p2}-cycE-$S]=> cycE~{T380}-$C • where $S in {_, cks1} and $C in {_, cdk2~?, cdk2~?-cks1} • 147-2733 rules, 165 proteins and genes, 500 variables, 2500 states.

  20. Plan of the talk • Introduction • A simple algebra of cell molecules • Concurrent transition systems of biochemical reactions • Example of the mammalian cell cycle control • Temporal logic CTL as a query language • Expressivity and computational results • Learning reaction rules • An experiment with inductive logic programming • Kinetics models • Simulation with differential equations • Hybrid systems • Conclusion

  21. E, A Non-determinism AG EU EF F,G,U Time 4. Temporal Logic CTL as a Query Language • Computation Tree Logic

  22. Kripke Structures • A Kripke structure K is a triple (S; R; L) where S is a set of states, and RSxS is a total relation. • s |= f if f is true in s, • s |= E f if there is a path  from s such that  |= f, • s |= A f if for every path  from s,  |= f, •  |= f if s |= f where s is the starting state of , •  |= X f if 1 |= f, •  |= F f if there exists k >0 such that k |= f, •  |= G f if for every k >0, k |= f, •  |= f1 U f2 iff there exists k>0 such that k |= f for all j < k j |= f. • Following [Emerson 90] we identify a formula f to the set of states which satisfy it f ~ {sS : s |= f}.

  23. Symbolic Model Checking • Model Checking is an algorithm for computing, in a given finite Kripke structure the set of states satisfying a CTL formula: {sS : s |= f}. • Basic algorithm: represent K as a graph and iteratively label the nodes with the subformulas of f which are true in that node. • Add f to the states satisfying f • Add EF f(EX f) to the (immediate) predecessors of states labeled by f • Add E(f1 U f2 ) to the predecessor states of f2 while they satisfy f1 • Add EG f to the states for which there exists a path leading to a non trivial strongly connected component of the subgraph of states satisfying f • Symbolic model checking: use OBDDs to represent states and transitions as boolean formulas (S is finite).

  24. Biological Queries (1/3) • About reachability: • Given an initial state init, can the cell produce some protein P? init  EF(P) • Which are the states from which a set of products P1,. . . , Pn can be produced simultaneously? EF(P1^…^Pn) • About pathways: • Can the cell reach a state s while passing by another state s2? init  EF(s2^EFs) • Is state s2 a necessary checkpoint for reaching state s? EF(s2U s) • Is it possible to produce P without using nor creating Q? EF(Q U s) • Can the cell reach a state s without violating some constraints c? init  EF(c U s)

  25. Biological Queries (2/3) • About stability: • Is a certain (partially described) state s a stable state? sAG(s)sAG(s) (s denotes both the state and the formula describing it). • Is s a steady state (with possibility of escaping) ? sEG(s) • Can the cell reach a stable state? initEF(AG(s))not a LTL formula. • Must the cell reach a stable state? initAF(AG(s)) • What are the stable states? Not expressible in CTL [Chan 00]. • Can the system exhibit a cyclic behavior w.r.t. the presence of P ? init  EG((P  EF P) ^ (P  EF P))

  26. Biological Queries (3/3) • About the correctness of the model: • Can one see the inaccuracies of the model and correct them? • Exhibit a counterexample pathway or a witness. Suggest refinements of the model or biological experiments to validate/invalidate the property of the model. • About durations: • How long does it take for a molecule to become activated? • In a given time, how many Cyclins A can be accumulated? • What is the duration of a given cell cycle’s phase? • CTL operators abstract from durations. Time intervals can be modeled in FO by adding numerical arguments for start times and durations.

  27. MAPK Signaling Pathway • RAF + RAFK <=> RAF-RAFK. • RAF~{p1} + RAFPH <=> RAF~{p1}-RAFPH. • MEK~$P + RAF~{p1} <=> MEK~$P-RAF~{p1} • where p2 not in $P. • MEKPH + MEK~{p1}~$P <=> MEK~{p1}~$P-MEKPH. • MAPK~$P + MEK~{p1,p2} <=> MAPK~$P-MEK~{p1,p2} • where p2 not in $P. • MAPKPH + MAPK~{p1}~$P <=> MAPK~{p1}~$P-MAPKPH. • RAF-RAFK => RAFK + RAF~{p1}. • RAF~{p1}-RAFPH => RAF + RAFPH. • MEK~{p1}-RAF~{p1} => MEK~{p1,p2} + RAF~{p1}. • MEK-RAF~{p1} => MEK~{p1} + RAF~{p1}. • MEK~{p1}-MEKPH => MEK + MEKPH. • MEK~{p1,p2}-MEKPH => MEK~{p1} + MEKPH. • MAPK-MEK~{p1,p2} => MAPK~{p1} + MEK~{p1,p2}. • MAPK~{p1}-MEK~{p1,p2} => MAPK~{p1,p2}+ MEK~{p1,p2}. • MAPK~{p1}-MAPKPH => MAPK + MAPKPH. • MAPK~{p1,p2}-MAPKPH => MAPK~{p1} + MAPKPH.

  28. MAPK Signaling Pathway • MEK~{p1} is a checkpoint for producing MAPK~{p1,p2} • biocham: !E(!MEK~{p1} U MAPK~{p1,p2}) • True • The PH complexes are not compulsory for the cascade • biocham: !E(!MEK~{p1}-MEKPH U MAPK~{p1,p2}) • false • Step 1 rule 15 • Step 2 rule 1 RAF-RAFK present • Step 3 rule 21 RAF~{p1} present • Step 4 rule 5 MEK-RAF~{p1} present • Step 5 rule 24 MEK~{p1} present • Step 6 rule 7 MEK~{p1}-RAF~{p1} present • Step 7 rule 23 MEK~{p1,p2} present • Step 8 rule 13 MAPK-MEK~{p1,p2} present • Step 9 rule 27 MAPK~{p1} present • Step 10 rule 15 MAPK~{p1}-MEK~{p1,p2} present • Step 11 rule 28 MAPK~{p1,p2} present

  29. Mammalian Cell Cycle Control Benchmark • 700 rules, 165 proteins and genes, 500 variables, 2500 states. • BIOCHAM NuSMV model-checker time in seconds:

  30. Plan of the talk • Introduction • A simple algebra of cell molecules • Concurrent transition systems of biochemical reactions • Example of the mammalian cell cycle control • Temporal logic CTL as a query language • Computational results with BIOCHAM • Learning reaction rules • An experiment with inductive logic programming • Kinetics models • Simulation with differential equations • Hybrid systems • Conclusion

  31. 5. Learning Reaction Weights and Rules • Idea 1: learning reaction weights from temporal properties • reaction weights restricts the non-determinism (Markov models) • Idea 2: learn reaction rules from temporal properties of the system. • Learning of cell cycle reaction rules from reachability properties and counterexamples with Progol [Muggleton 00]. • reaction([m_CP,m_Y],[m_pM]). • reaction([m_CP],[m_C2]). • % reaction([m_pM],[m_M]). • reaction([m_M],[m_C2,m_YP]). • reaction([m_C2],[m_CP]). • reaction([m_YP],[]). • reaction([],[m_Y]). • pathway(S1,S2) :- same(S1,S2). • pathway(S1,S2) :- reaction(L1,L2), transition(S1,L1,S3,L2), pathway(S3,S2).

  32. Inductive Logic Programming • reaction([m_pM],[m_M]) learned… • 6th PCRD APRIL 2 “Applications of Probabilistic Inductive Logic Progr.” Luc de Raedt, Univ. Freiburg, Stephen Muggleton, Imperial College London.

  33. Plan of the talk • Introduction • A simple algebra of cell molecules • Concurrent transition systems of biochemical reactions • Example of the mammalian cell cycle control • Temporal logic CTL as a query language • Computational results with BIOCHAM • Learning reaction rules • An experiment with inductive logic programming • Kinetics models • Simulation with differential equations • Hybrid system • Conclusion

  34. 6. Kinetics Models • Enzymatic reactions with rates k1 k2 k3 • E+S k1 C k2 E+P • E+S k3 C • can be compiled by the law of mass action into a system of • Michaelis-Menten Ordinary Differential Equations (non-linear) • dE/dt = -k1ES+(k2+k3)C • dS/dt = -k1ES+k3C • dC/dt = k1ES-(k2+k3)C • dP/dt = k2C

  35. MAPK kinetics model

  36. Gene Interaction Networks • Gene interaction example [Bockmayr-Courtois 01] • Hybrid Concurrent Constraint Programming HCC [Saraswat et al.] • 2 genes x and y. • Hybrid linear approximation • dx/dt = 0.01 – 0.02*x if y < 0.8 • dx/dt = – 0.02*x if y ≥ 0.8 • dy/dt = 0.01*x

  37. Concurrent Transition System • Time discretization using Euler’s method: • y < 0.8  x’ = x + dt*(0.01-0.02*x) , y’ = y + dt*0.01*x • y ≥ 0.8  x’ = x + dt*(0.01-0.02*x) , y’ = y + dt*0.01*x • Initial condition: x=0, y=0. • Translation into a CLP(R) program (dt=1) • Init :- X=0, Y=0, p(X,Y). • p(X,Y):-X>=0, Y>=0, Y<0.8, • X1=X-0.02*X+0.01, Y1=Y+0.01*X, p(X1,Y1). • p(X,Y):-X>=0, Y>=0, Y>=0.8, • X1=X-0.02*X, Y1=Y+0.01*X, p(X1,Y1).

  38. Proving CTL properties by computing fixpoints of Constraint Logic Programs Theorem [Delzanno Podelski 99] EF(f)=lfp(TP{p(x):-f}), EG(f)=gfp(TPf ). Safety property AG(f) iff EF(f) iff initlfp(TP{f}) Liveness property AG(f1AF(f2)) iff initlfp(TPf1gfp(T P{f2} ) ) Implementation in Sicstus-Prolog CLP(R,B) [Delzanno 00]

  39. Deductive Model Checker DMC: Gene Interaction • r(init, p(s_s,A,B), {A=0,B=0}). • r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0.8,C=A-0.02*A,D=B+0.01*A}). • r(p(s_s,A,B), p(s_s,C,D), {A>=0,B>=0,B<0.8, • C=A-0.02*A+0.01,D=B+0.01*A}). • | ?- prop(P,S). • P = unsafe, S = p:s*(x>=0.6) • | ?- ti. • Property satisfied. Execution time 0.0 • | ?- ls. • s(0, p(s_s,A,_), {A>=0.6}, 1, (0,0)).

  40. Demonstration DMC (continued) • | ?- prop(P,S). • P = unsafe, S = p:s*(x>=0.2) ? • | ?- ti. • Property NOT satisfied. Execution time 1.5 • | ?- ls. • s(0, p(s_s,A,_), {A>=0.2}, 1, (0,0)). • s(1, p(s_s,A,B), {B<0.8,B>=-0.0,A>=0.19387755102040816}, 2, (2,1)). • … • s(26, p(s_s,A,B), {B>=0.0,A>=0.0, • B+0.1982676351105516*A<0.7741338175552753}, 27, (2,26)). • s(27, init, {}, 28, (1,27)).

  41. Conclusion • The biochemical abstract machine BIOCHAM provides: • A first-order-rule-based language for modeling biochemical systems • A powerful query language based on temporal logic CTL • Models of complex biochemical processes, • intracellular and extracellular signaling, • cell-cycle control,… • a repository of models http://contraintes.inria.fr/CMBSlib • Implementation in Prolog + model-checkers NuSMV and DMC • Learning techniques investigated in APrIL 2 • PILP-based learning of reaction weights from temporal properties • PILP-based learning of reaction rules from temporal properties

  42. Perspectives • Collaboration with biologists on BIOCHAM models of the cell-cycle control • Colon cancer therapies, Domenjoud, UHP Nancy • Chronotherapies, Clairambault, INSERM • Hybrid concurrent constraint logic programming[Bockmayr Courtois 01, Saraswat 04] • Multi-scale molecular-electro-physiological models[Sorine et al. 03] • http://www-rocq.inria.fr/sosso/icema2 • http://www.sci.sdsu.edu/movies

More Related