1 / 45

Joint work with :

Biological Validation as Model-Checking François Fages, INRIA Rocquencourt http://contraintes.inria.fr/. Joint work with : Laurence Nathalie Sylvain

zalman
Download Presentation

Joint work with :

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Biological Validation as Model-Checking François Fages, INRIA Rocquencourthttp://contraintes.inria.fr/ • Joint work with: • Laurence Nathalie Sylvain • Calzone Chabrier-Rivier Soliman • Implementation: The Biochemical Abstract Machine BIOCHAM 2.4

  2. Transposing Programming Language Notions to Systems Biology • Formally, “the” behavior of a system depends on our choice of observables. • ? ?

  3. Transposing Programming Language Notions to Systems Biology • Formally, “the” behavior of a system depends on our choice of observables. • boolean semantics 0 1

  4. Transposing Programming Language Notions to Systems Biology • Formally, “the” behavior of a system depends on our choice of observables. • concentration semantics y ý

  5. Transposing Programming Language Notions to Systems Biology • Formally, “the” behavior of a system depends on our choice of observables. • set semantics • { } Set of behaviors in different/all initial conditions or in different/all contexts

  6. Theory of Abstract Interpretation [Cousot Cousot 77] abstraction Hierarchy of models Models for answering queries The more abstract the better Boolean model Discrete model Differential model Stochastic model

  7. Query: What are the Stationary States ? abstraction Boolean analysis abstraction Discrete analysis Boolean model abstraction Differential analysis Discrete model abstraction Stochastic analysis Differential model abstraction Issue of correctness / optimality of an abstraction w.r.t. a query Stochastic model

  8. Modularity and Compositionality • When the behavior of a complex system is a function of the behaviors of its components…

  9. Modularity and Compositionality • When the behavior of a complex system is a function of the behaviors of its components… • Notion relative to « behaviors » hence to our choices of observables for the complex system and for the components. • Complex systems may be modular for some observables, and not for other choices. • Mathematically, any system can be made modular by considering the behaviors of its components in all contexts. (« emergence » can always be formally eliminated with an appropriate choice of semantics for the components)

  10. The Thesis • Biochemical model = Transition system • Biological property = Temporal Logic formula • Biological validation = Model-checking • [Lincoln et al. 02] [Chabrier Fages 03] [Bernot et al. 04] [Alon et al. 04] …

  11. The Thesis • Biochemical model = Transition system • Biological property = Temporal Logic formula • Biological validation = Model-checking • Biochemical Abstract Machine environment [Fages et al. 04] • Model: BIOCHAM Biological Properties: • - Boolean - simulation - CTL • - Concentration - query evaluation - LTL with constraints • - Population - rule learning - PCTL with constraints • - parameter search • (SBML)

  12. The Thesis • Biochemical model = Transition system • Biological property = Temporal Logic formula • Biological validation = Model-checking • Biochemical Abstract Machine environment [Fages et al. 04] • Model: BIOCHAM Biological Properties: • - Boolean - simulation - CTL • - Concentration - query evaluation - LTL with constraints • - Population - rule learning - PCTL with constraints • - parameter search • (SBML)

  13. Plan of the Presentation • Rule-based Language for Modeling Biochemical Systems • Syntax of objects and reactions • Semantics at 3 abstraction levels: boolean, concentration, population • Temporal Logic Language for Formalizing Biological Properties • Computation Tree Logic for the boolean semantics • Constraint Linear Time Logic for the concentration semantics • Probabilistic Constraint Temporal Logic for the population semantics • Machine Learning Rules from Temporal Properties • Learning reaction rules • Learning kinetic parameter values • Conclusion

  14. 1. Syntax of Reaction Rules • Complexation A + B => A-B Cell cycle control model [Tyson 91] • Decomplexation A-B => A + B k6*[Cdc2-Cyclin~{p1}] for • Cdc2-Cyclin~{p1} => Cdc2+Cyclin~{p1} • Phosphorylation A =[B]=> A~{p} k8*[Cdc2] for Cdc2 => Cdc2~{p1} • Dephosphorylation A~{p} =[B]=> A • Synthesis _ =[B]=> A k1 for _ => Cyclin. • Degradation A =[B]=> _ k2*[Cyclin] for Cyclin => _. • Transport A::L1 => A::L2

  15. BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • A reaction A+B=>C+D is translated into 4 transition rules considering the possible consumption of the reactants by the reaction: • A+BA+B+C+D • A+BA+B +C+D • A+BA+B+C+D • A+BA+B+C+D

  16. BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • 2. Concentration Semantics: number / volume • Ordinary Differential Equations or hybrid automaton (deterministic) • dxk/dt = ΣXi=1n ri(xk) * ei − ΣXj=1n lj(xk) * ej • where ri (resp. lj) is the stochiometric coefficient of xk in S’i (resp. Si) multiplied by the volume ratio of the location of xk.

  17. BIOCHAM Semantics of a Rule Set {ei for Si => S’i} • Boolean Semantics: presence-absence of molecules • Concurrent Transition System (asynchronous, non-deterministic) • 2. Concentration Semantics: number / volume • Ordinary Differential Equations or hybrid automaton (deterministic) • 3. Population of molecules: number of molecules • Continuous time Markov chain the ei’s giving transition probabilities

  18. Example: Cell Cycle Control Model [Tyson 91] • k1 for _ => Cyclin. • k2*[Cyclin] for Cyclin => _. • k7*[Cyclin~{p1}] for Cyclin~{p1} => _. • k8*[Cdc2] for Cdc2 => Cdc2~{p1}. • k9*[Cdc2~{p1}] for Cdc2~{p1} =>Cdc2. • k3*[Cyclin]*[Cdc2~{p1}] for Cyclin+Cdc2~{p1} => Cdc2~{p1}-Cyclin~{p1}. • k4p*[Cdc2~{p1}-Cyclin~{p1}] for Cdc2~{p1}-Cyclin~{p1} => Cdc2-Cyclin~{p1}. • k4*[Cdc2-Cyclin~{p1}]^2*[Cdc2~{p1}-Cyclin~{p1}] for • Cdc2~{p1}-Cyclin~{p1} =[Cdc2-Cyclin~{p1}] => Cdc2-Cyclin~{p1}. • k5*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1} => Cdc2~{p1}-Cyclin~{p1}. • k6*[Cdc2-Cyclin~{p1}] for Cdc2-Cyclin~{p1} => Cdc2+Cyclin~{p1}.

  19. Concentration Simulation

  20. Boolean Simulation

  21. Interaction Graph

  22. 2. Formalizing Biological Properties in Temporal Logics • Biological property = Temporal Logic Formula • Biological validation = Model-checking • Express properties in: • Computation Tree Logic CTL for the boolean semantics • Linear Time Logic with numerical constraints for the concentration semantics • Probabilistic CTL with numerical constraints for the stochastic semantics

  23. E, A Non-determinism AG EU EF F,G,U Time Computation Tree Logic CTL • Extension of propositional (or first-order) logic with operators for time and choices

  24. Biological Properties formalized in CTL [Chabrier Fages 03] • Aboutreachability: • Can the cell produce some protein P? reachable(P)==EF(P) • Aboutpathways: • Is it possible to produce P without having Q? E(Q U P) • Is state s2 a necessary checkpoint for reaching state s? • checkpoint(s2,s)== E(s2U s) • Aboutstationarity: • Is a (partially described) state s a stable state? stable(s)== AG(s) • Is s a steady state (with possibility of escaping) ? steady(s)==EG(s) • Can the cell reach a stable state? EF(stable(s)) • Aboutoscillations (approximation without strong fairness): • Can the system exhibit a cyclic behavior w.r.t. the presence of P ? oscillation(P)== EG((P  EF P) ^ (P  EF P))

  25. CTL Properties Satisfied in the Cell Cycle Model • reachable(Cdc2~{p1}) • reachable(Cyclin) • reachable(Cyclin~{p1}) • reachable(Cdc2-Cyclin~{p1}) • reachable(Cdc2~{p1}-Cyclin~{p1}) • oscil(Cdc2) • oscil(Cdc2~{p1}) • oscil(Cdc2~{p1}-Cyclin~{p1}) • oscil(Cyclin) • AG((!(Cdc2-Cyclin~{p1}))->checkpoint(Cdc2~{p1}-Cyclin~{p1},Cdc2-Cyclin~{p1})) • … • Automatically checked / generated by model-checking techniques (NuSMV BDD)

  26. Mammalian Cell Cycle Control Map [Kohn 99]

  27. Transcription of Kohn’s Map • _ =[ E2F13-DP12-gE2 ]=> cycA. • ... • cycB =[ APC~{p1} ]=>_. • cdk1~{p1,p2,p3} + cycA => cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p2,p3} + cycB => cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p3}-cycA =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p1,p3}-cycB =[ Wee1 ]=> cdk1~{p1,p2,p3}-cycB. • cdk1~{p2,p3}-cycA =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycA. • cdk1~{p2,p3}-cycB =[ Myt1 ]=> cdk1~{p1,p2,p3}-cycB. • ... • cdk1~{p1,p2,p3} =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}. • cdk1~{p1,p2,p3}-cycA =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycA. • cdk1~{p1,p2,p3}-cycB =[ cdc25C~{p1,p2} ]=> cdk1~{p1,p3}-cycB. 165 proteins and genes, 500 variables, 800 rules [Chiaverini Danos 03]

  28. Cell Cycle Model-Checking • biocham: check_reachable(cdk46~{p1,p2}-cycD~{p1}). • Ei(EF(cdk46~{p1,p2}-cycD~{p1})) is true • biocham: check_checkpoint(cdc25C~{p1,p2}, cdk1~{p1,p3}-cycB). • Ai(!(E(!(cdc25C~{p1,p2}) U cdk1~{p1,p3}-cycB))) is true • biocham: nusmv(Ai(AG(!(cdk1~{p1,p2,p3}-cycB) -> checkpoint(Wee1, cdk1~{p1,p2,p3}-cycB))))). • Ai(AG(!(cdk1~{p1,p2,p3}-cycB)->!(E(!(Wee1) U cdk1~{p1,p2,p3}-cycB)))) is false • biocham: why. • -- Loop starts here • cycB-cdk1~{p1,p2,p3} is present • cdk7 is present • cycH is present • cdk1 is present • Myt1 is present • cdc25C~{p1} is present • rule_114 cycB-cdk1~{p1,p2,p3}=[cdc25C~{p1}]=>cycB-cdk1~{p2,p3}. • cycB-cdk1~{p2,p3} is present • cycB-cdk1~{p1,p2,p3} is absent • rule_74 cycB-cdk1~{p2,p3}=[Myt1]=>cycB-cdk1~{p1,p2,p3}. • cycB-cdk1~{p2,p3} is absent • cycB-cdk1~{p1,p2,p3} is present

  29. Mammalian Cell Cycle Control Benchmark • 500 variables, 2500 states. • BIOCHAM NuSMV model-checker time in sec. [Chabrier et al. TCS 04]

  30. Temporal Properties with Numerical Constraints • Constraints over concentrations and derivatives as FOL formulae over the reals: • [M] > 0.2 • [M]+[P] > [Q] • d([M])/dt < 0 • Constraint LTL operators for time X, F, U, G (no non-determinism). • F([M]>0.2) • FG([M]>0.2) • F ([M]>2 & F (d([M])/dt<0 & F ([M]<2 & d([M])/dt>0 & F(d([M])/dt<0)))) • oscil(M,n) defined as at least n alternances of sign of the derivative • Period(A,75)= t v F(T = t & [A] = v & d([A])/dt > 0 & X(d([A])/dt < 0) & F(T = t + 75 & [A] = v & d([A])/dt > 0 & X(d([A])/dt < 0)))

  31. Traces from Numerical Simulation • From a system of Ordinary Differential Equations • dX/dt = f(X) • Numerical integration methods produce a discretization of time (adaptive step size Runge-Kutta or Rosenbrock’s method for stiff syst.) • The trace is a linear Kripke structure…: • (t0,X0,dX0/dt), (t1,X1,dX1/dt), …, (tn,Xn,dXn/dt), … • over concentrations and their derivatives at discrete time points

  32. Trace-Based Constraint LTL Model Checking • Hypothesis 1: the initial state is completely known • Hypothesis 2: the formula can be checked over a finite period of time [0,T] • Run the numerical integration from 0 to T producing a trace of values at a finite sequence of time points • Iteratively label the time points with the sub-formulae of f that are true: • Add f to the time points where a FOL formula f is true, • Add F f(X f) to the (immediate) previous time points labeled by f, • Add f1 U f2to the predecessor time points of f2 while they satisfy f1, • (Add G f to the states satisfying f until T) • Model checker and numerical integration implemented in Prolog

  33. 3. Searching Parameters from Temporal Properties • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, • oscil(Cdc2-Cyclin~{p1},3),150).

  34. 3. Searching Parameters from Temporal Properties • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, • oscil(Cdc2-Cyclin~{p1},3),150). • First values found : • parameter(k3,10). • parameter(k4,70).

  35. Searching Parameters from Temporal Properties • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, • oscil(Cdc2-Cyclin~{p1},3) & F([Cdc2-Cyclin~{p1}]>0.15), 150). • First values found : • parameter(k3,10). • parameter(k4,120).

  36. Learning Parameter Values from LTL Specification • biocham: learn_parameter([k3,k4],[(0,200),(0,200)],20, • period(Cdc2-Cyclin~{p1},35), 150). • First values found: • parameter(k3,10). • parameter(k4,280).

  37. Learning One Reaction Rule from CTL Specification • Suppose that the MPF activation rule is missing in the model • biocham: delete_rule(MPF~{p}=[cdc25C~{p1,p2}]=>MPF). • biocham: check_all. • The specification is not satisfied. • This formula is the first not verified: Ei(EF(MPF)) • Rules can be searched to correct the model w.r.t. specification: • biocham: learn_one_rule(all_elementary_interaction_rules).

  38. Learning One Reaction Rule from CTL Specification • Suppose that the MPF activation rule is missing in the model • biocham: delete_rule(MPF~{p}=[cdc25C~{p1,p2}]=>MPF). • biocham: check_all. • The specification is not satisfied. • This formula is the first not verified: Ei(EF(MPF)) • Rules can be searched to correct the model w.r.t. specification: • biocham: learn_one_rule(all_elementary_interaction_rules). • _=[cdc25C~{p1,p2}]=>MPF • MPF~{p}=[cdc25C~{p1,p2}]=>MPF • CKI+MPF~{p}=[cdc25C~{p1,p2}]=>CKI-MPF

  39. Learning Several Rules from CTL Properties • Classification of CTL formulae: • ECTL formulae contain only E quantifiers: reachability, oscillation, … • If false, remain false after deleting a rule  add rule • ACTL formulae contain only A quantifiers: checkpoint,… • If false, remains false after adding a rule  delete rule • Remove a rule on the path given by the model checker (why command) • Unclassified CTL formulae • Mixed E and A quantifiers

  40. Model Revision Algorithm • Treat ECTL and UCTL formulas first (by adding one rule) • Treat ACTL formulas by removing rules (possibly insatisfying E/UCTL) • Iterate keeping ACTL formulae satisfied • Enumerates sets of rule additions/deletions which correctly satisfy the CTL specification • Terminates (search depth bounded by |A|*|EU| transitions) • Incomplete: may fail to find a model revision when one exists • (only one rule addition is tried to satisfy ECTL and UCTL formulae)

  41. Learning Reaction Rules from CTL Specification • Example: finding an intermediary step between MPF and APC activation • biocham: absent(X). add_rule(_=>X). add_rule(X=>_). • biocham: add_specs({ Ei(reachable(X)), Ai(oscil(X)), • Ai(AG(!APC->checkpoint(X,APC))), • Ai(AG(!X->checkpoint(MPF,X))) }). • biocham: check_all. • False. First formula not verified: Ai(AG(!APC->!(E(!X U APC)))) • Biocham searches for revisions of the model satisfying the specification • biocham: revise_model.

  42. Learning Reaction Rules from CTL Specification • Example: finding an intermediary step between MPF and APC activation • biocham: absent(X). add_rule(_=>X). add_rule(X=>_). • biocham: add_specs({ Ei(reachable(X)), Ai(oscil(X)), • Ai(AG(!APC->checkpoint(X,APC))), • Ai(AG(!X->checkpoint(MPF,X))) }). • biocham: check_all. • False. First formula not verified: Ai(AG(!APC->!(E(!X U APC)))) • Biocham searches for revisions of the model satisfying the specification • biocham: revise_model. • Deletion(s): _=[MPF]=>APC. _=>X. • Addition(s): _=[X]=>APC. _=[MPF]=>X.

  43. Methodology • Specify the biological properties of the system known from experiments in temporal logic, either qualitative or quantitative. • Build and re-use models satisfying the specification • Infer rules from boolean CTL properties • Infer parameters from numerical LTL properties • Refine models keeping their specification • Couple models with explicit coupled specification • Publish not only the model but also its « specification » i.e. the relevant biological captured by the model • Compare and discuss models on a formal ground

  44. Conclusion • Two Formal languages in BIOCHAM: • Rule language  modeling biochemical systems(SBML compatible) • Temporal logic  formalizing their biological properties • Automated reasoning tools in BIOCHAM: • Model-checking temporal logic properties  model validation • Searching parameter values satisfying TL prop. parameter search • Learning reaction rules from TL prop.  model revision • Pros: Practical for the boolean and concentration semantics • Cons: not for the stochastic semantics (the simulation times with our implementation of Gillespie or Gibson algorithms are too costly) • Cons: Boolean abstraction very crude (need to restrict non-determinism)

  45. Collaborations • STREP APRIL 2: Applications of probabilistic inductive logic programming • Luc de Raedt, Freiburg, Stephen Muggleton, Imperial College London,… • Learning in a probabilistic logic setting • NoE REWERSE: Reasoning on the web with rules and semantics • François Bry, Münich, Rolf Backofen Jena, Mike Schroeder Dresden,… • Connecting Biocham to the semantic web: gene and protein ontologies • INRIA Bang, Jean Clairambault, Benoît Perthame • INSERM Villejuif, Francis Lévi “Cancer chronotherapies” • Coupled models of cell cycle, circadian cycle, cytotoxic drugs.

More Related