
Identification of Transition Models of Biological Systems in the Presence of Transition Noise

This article discusses the modeling and identification of transition models in biological networks, focusing on Petri Nets and Logical Guarded Transition Systems (LGTSs). The identification process is examined, and the use of background knowledge and noisy data is explored.





Presentation Transcript


  1. Identification of Transition Models of Biological Systems in the Presence of Transition Noise • A. Srinivasan, M. Bain, D. Vatsa, S. Agarwal

  2. Part 1: Transition Models in Biology

  3. Networks in Biology • Biological processes are often represented as networks • Gene-regulatory networks, signal-transduction networks, metabolic networks, protein-protein interaction networks, phylogenetic trees, food-webs, ecosystems • Modelling, visualisation and analysis of these networks is a fundamental part of modern Biology • Here, we will be looking at one kind of model for networks in Biology (transition models) • Most well known: Petri Net (and variants) • Generalisation to Logical Guarded Transition Systems (LGTSs)

  4. Some Examples of Networks

  5. Discrete System Observations

  6. Petri Net Models

  7. From Extended PNs to LGTSs

  8. Part 2: Model Identification

  9. Identification of Petri Nets • Durzinsky et al. have proposed an algorithm that enumerates all Petri nets consistent with a set of discrete state-pairs • These are called conformal networks • This work has since been extended to a procedure that enumerates conformal extended PNs (i.e. Petri nets with read/write arcs) • Limitations • Does not allow any explicit inclusion of background knowledge, though some constraints are "hard-wired" • Some technical limitations when data are Boolean-valued • Unclear whether the technique scales to arbitrary combinations of read/write arcs, and it does not extend to other forms of PNs

  10. PN Identification

  11. LGTS and FSMs • With a bound on the number of tokens allowed in each place, the LGTS model for a sequence of observations S can be computed by a DFA (Takahashi, 1992) • The DFA is a transducer that reads zero or one input symbols (observations) and writes out the tuples $T_j = (t_j, r_j, m_{j-1}, m_j)$ • This view of an LGTS will be useful when looking at noisy data

  12. LGTS Identification • System states are as in Petri nets (i.e., place-value vectors). • System behaviours are sequences of system states $S_i = (s_{i,0}, s_{i,1}, \ldots, s_{i,n})$ or, equivalently, sets of state-pairs $\{(s_{i,0}, s_{i,1}), (s_{i,1}, s_{i,2}), \ldots, (s_{i,n-1}, s_{i,n})\}$. Let StatePairs be the union of the sets of state-pairs for a set of sequences $S = \{S_1, S_2, \ldots, S_j\}$. • An LGTS trace for a state-pair $(s_i, s_f)$ is a set $\mathit{Trace}(s_i, s_f) = \{T_1, T_2, \ldots, T_k\}$, where $T_1 = (t_1, r_1, m_0, m_1)$, $T_2 = (t_2, r_2, m_1, m_2)$, …, $T_k = (t_k, r_k, m_{k-1}, m_k)$. • (a) Each $t_j$ is a guarded transition; (b) $r_j = m_j - m_{j-1}$; (c) $s_i = m_0$; and (d) $s_f = m_k$. • $m_1, m_2, \ldots, m_{k-1}$ are intermediate states. • An LGTS model for a state-pair $(s_i, s_f)$ is $T(s_i, s_f) = \{(t, r) : (t, r, m_a, m_b) \in \mathit{Trace}(s_i, s_f)\}$. • Given a set of sequences $S = \{S_1, S_2, \ldots, S_j\}$, let TracePairs be $\bigcup_{(s_i, s_f) \in \mathit{StatePairs}(S)} \mathit{Trace}(s_i, s_f)$. • Then $\mathit{LGTS}(S) = \{(t, r) : (t, r, m_a, m_b) \in \mathit{TracePairs}\}$.
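The definitions on slide 12 can be illustrated with a small Python sketch (Python is used here only for illustration; the slides report an implementation in YAP and PRISM). It assumes states are integer tuples, a guarded transition is a hypothetical `Transition` record holding a guard predicate and an effect vector r, and a bounded breadth-first search returns one shortest trace per state-pair rather than every trace. The names `find_trace`, `lgts_model`, `max_len` and `bound` are illustrative and not taken from the slides.

```python
from collections import deque
from typing import Callable, NamedTuple

State = tuple[int, ...]

class Transition(NamedTuple):
    name: str
    guard: Callable[[State], bool]  # must hold in the pre-state m_{j-1}
    effect: State                   # r_j = m_j - m_{j-1}

def find_trace(transitions, s_i, s_f, max_len=3, bound=2):
    """Return one shortest trace [(t, r, m_prev, m_next), ...] from s_i to s_f, or None."""
    queue = deque([(s_i, [])])
    while queue:
        m, trace = queue.popleft()
        if m == s_f and trace:          # a trace needs at least one transition
            return trace
        if len(trace) == max_len:       # bounded search depth (assumption)
            continue
        for t in transitions:
            if t.guard(m):
                m_next = tuple(a + b for a, b in zip(m, t.effect))
                # Keep every place within the token bound (assumption).
                if all(0 <= v <= bound for v in m_next):
                    queue.append((m_next, trace + [(t.name, t.effect, m, m_next)]))
    return None

def lgts_model(transitions, state_pairs, **kw):
    """Collect the (t, r) pairs used by the traces of all state-pairs; None if a pair is unexplained."""
    model = set()
    for s_i, s_f in state_pairs:
        trace = find_trace(transitions, s_i, s_f, **kw)
        if trace is None:
            return None
        model |= {(t, r) for t, r, _, _ in trace}
    return model

# Toy system with places (A, B, C): t1 moves a token A -> B, t2 moves B -> C.
toy = [
    Transition("t1", lambda m: m[0] >= 1, (-1, 1, 0)),
    Transition("t2", lambda m: m[1] >= 1, (0, -1, 1)),
]
# The pair below is only explained via the intermediate state (0, 1, 0).
model = lgts_model(toy, [((1, 0, 0), (0, 0, 1))])
```

In this toy run the observed pair skips the intermediate state, and the search recovers it by chaining t1 and t2.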

  13. System Identification Setting • A 2×2 grid of settings: Data (Perfect or Imperfect) against Background Knowledge (Perfect or Imperfect)

  14. Identification of LGTSs • We can formulate this as logical consequence-finding • Given: (a) a set S of sequences of states, representing observations of the system behaviour; (b) background knowledge B containing generic and domain-specific constraints and definitions of guarded transitions; and (c) the definition G of a relation lgts(S,T) that is TRUE for all pairs S and T s.t. T is an LGTS model of S, i.e., T = LGTS(S) • Find: all T s.t. $B \wedge G \models \mathit{lgts}(S,T)$ • If B and G can be encoded as logic programs, then the T's can be computed using the usual theorem prover used by logic programming systems

  15. LGTS Identification: Completeness and Correctness • If B is complete and correct, and G is correct, then all T's that satisfy the equation will be found by the system (refutation-completeness of resolution) • Every T found by the system will correctly explain S, in the sense that lgts(S,T) will be TRUE (soundness of resolution) • Given a data sequence S, for every (extended or normal) PN found by Durzinsky et al., there is some background knowledge B and an LGTS model T s.t. lgts(S,T) is a logical consequence of B and G

  16. Background Knowledge • The constraints provided as background knowledge can greatly reduce the search-space of possible answers to the system-identification task • For example, we can restrict chemical reactions to those that break no more than 3 bonds (on grounds that any more would require too much energy in a cell) • This along with the mass-balance restrictions can provide very effective constraints on the search
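As a rough illustration of how a mass-balance restriction prunes candidate reactions, the following Python sketch checks that both sides of a reaction contain the same atoms. The species table `ATOMS`, the function names and the candidate list are hypothetical and chosen only to make the example self-contained. The bond-breaking limit is not modelled here.

```python
from collections import Counter

# Hypothetical atom counts per species (illustrative only).
ATOMS = {"H2": {"H": 2}, "O2": {"O": 2}, "H2O": {"H": 2, "O": 1}}

def atom_total(side):
    """Sum atom counts over one side of a reaction given as {species: coefficient}."""
    total = Counter()
    for species, coeff in side.items():
        for atom, n in ATOMS[species].items():
            total[atom] += coeff * n
    return total

def mass_balanced(reactants, products):
    """True when both sides contain exactly the same number of each atom."""
    return atom_total(reactants) == atom_total(products)

candidates = [
    ({"H2": 2, "O2": 1}, {"H2O": 2}),  # 2 H2 + O2 -> 2 H2O
    ({"H2": 1, "O2": 1}, {"H2O": 1}),  # unbalanced: one O atom is lost
]
# Only mass-balanced candidates survive as possible transitions.
kept = [c for c in candidates if mass_balanced(*c)]
```

A filter of this kind would run before the search over transitions, so unbalanced candidates never enter the search space.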

  17. Part 3: Model Identification with Transition Noise

  18. System Identification Setting • Data: Perfect, Incomplete or Incorrect • Background Knowledge: Perfect or Imperfect • Perfect data with perfect background knowledge: LP (logic programming)

  19. System identification with noisy data • [Pipeline diagram; labels: Discretiser, Sequence of Discrete System States, Background Knowledge (generic and problem-specific constraints; guarded transitions), LGTS Identifier, LGTS Trace, LGTS Model, Automaton Builder, PFA, Viterbi Estimator, Ranked Transition Sequences, Model Filtering, LGTS model selection]

  20. Two kinds of incompleteness • Data are missing intermediate states • States are missing place values • Of these, the first can be handled adequately by the capability of obtaining LGTS models with intermediate states. In DFA terms, this means allowing ε-transitions that do not consume input observations but still produce T-tuples as outputs • The second kind of incompleteness is handled by abduction

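  21. System Identification Setting • Data: Perfect, Incomplete or Incorrect • Background Knowledge: Perfect or Imperfect • With perfect background knowledge: perfect data is handled by LP; incomplete data by LP (missing intermediate states) and ALP, i.e. abductive LP (missing place values)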

  22. “Noise” • Chemical equations are symbolic representations of what may happen, not what must happen • Filling a balloon with hydrogen and oxygen will not necessarily result in a balloon full of water vapour (the temperature has to be right) • Reactions are subject to extrinsic and intrinsic sources of “noise” • External conditions may not be suitable • Molecular collisions may not happen properly for a reaction to take place • In addition, data are subject to errors of observation, recording etc.

  23. Noise and System Identification • 3 kinds of incorrectness in the data • (1) Signal noise: time-series data have noise • (2) State noise: values of places have errors • (3) Transition noise: outputs of transitions do not follow the usual patterns • In principle, if we assume all states are the output of some transition, then it is possible to model both (2) and (3) using a discrete probability model • We will use the term transition noise for both kinds of errors

  24. Transition Noise • Transitions have some probability of going to unexpected states • Transition noise: unexpected states are related to the post-state of the transition • State noise: unexpected states are unrelated to the post-state of the transition • If transition $T = (t, r, s_{pre}, s_{post})$, then transition non-determinacy gives the transition set $T' = \{(t, r, s_{pre}, s'_{post})\}$, where $\mathrm{Hamming}(s_{post}, s'_{post}) \geq 0$ • A probability distribution on the set T' gives a probabilistic transition • Implemented in PRISM [4] as a probabilistic finite automaton (PFA)
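To make the idea of a probabilistic transition concrete, here is a minimal Python sketch that builds a distribution over noisy post-states. It assumes Boolean-valued places and an independent per-place flip probability `flip_prob`. Both are assumptions made for this example, since the slides only require some probability distribution over the set T'.

```python
from itertools import product

def hamming(a, b):
    """Number of places whose values differ between two states."""
    return sum(x != y for x, y in zip(a, b))

def noisy_post_states(s_post, flip_prob):
    """Map every Boolean state s' to P(s' | s_post) under independent per-place flips."""
    n = len(s_post)
    dist = {}
    for s in product((0, 1), repeat=n):
        h = hamming(s_post, s)
        # h places flipped, n - h places unchanged (assumed noise model).
        dist[s] = flip_prob ** h * (1 - flip_prob) ** (n - h)
    return dist

dist = noisy_post_states((0, 1, 1), flip_prob=0.1)
most_likely = max(dist, key=dist.get)  # the intended post-state when flip_prob < 0.5
```

Under this model, states at a larger Hamming distance from the intended post-state receive lower probability, which matches the notion of unexpected states that are related to the post-state.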

  25. LGTS models with noisy data • With noisy data, there may not be any known transition between a pair of noisy states $s_0$ and $s_1$ • That is, with $S = (s_0, s_1)$, there is no T s.t. $B \wedge G \models \mathit{lgts}(S,T)$ • But allowing the abduction of new transitions will allow finding a T • $T_{new} = (t_{new}, r, s_0, s_1)$, where $r = s_1 - s_0$ and the guards of $t_{new}$ are always TRUE • A new transition is abduced for each "unexpected" state-pair • With logic programs this is similar to what is done when extending SLD-resolution to SOLD-resolution [7]
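The abduction step on slide 25 can be sketched in Python as follows. The function name `abduce_transitions`, the `explained` predicate and the `t_new{k}` naming are hypothetical. The guard of each abduced transition is taken to be always TRUE, as on the slide, so it is not stored explicitly.

```python
def abduce_transitions(state_pairs, explained):
    """Abduce one new transition (t_new, r, s0, s1) for each unexplained state-pair.

    `explained(s0, s1)` is an assumed predicate that is True when some known
    guarded transition (or trace) already accounts for the pair.
    """
    abduced = []
    unexpected = [p for p in state_pairs if not explained(*p)]
    for k, (s0, s1) in enumerate(unexpected):
        r = tuple(b - a for a, b in zip(s0, s1))  # r = s1 - s0
        abduced.append((f"t_new{k}", r, s0, s1))
    return abduced

# Toy data: the first pair is covered by a known transition, the second is not.
known = {((1, 0), (0, 1))}
pairs = [((1, 0), (0, 1)), ((1, 0), (1, 1))]
new_transitions = abduce_transitions(pairs, lambda a, b: (a, b) in known)
```

Because an abduced transition exists for every unexplained pair, some explanation of the data can always be constructed, at the cost of transitions that may reflect noise rather than system behaviour.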

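  26. System Identification Setting • Data: Perfect, Incomplete or Incorrect • Background Knowledge: Perfect or Imperfect • With perfect background knowledge: perfect data is handled by LP; incomplete data by LP and ALP (abductive LP); incorrect data by PLP (probabilistic LP)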

  27. PFA Identification from LGTS with Noisy Transitions • With abduction, it will always be possible to obtain a T s.t. $B \wedge G \models \mathit{lgts}(S,T)$ • The corresponding NFA will contain the abduced transitions as output • But some transitions may be more likely than others • From the noisy data sequences we determine the parameters for transitions in a PFA using PRISM (Viterbi probability for an HMM where state-pairs are observed data and transitions are internal states) • We show a worked example on the following slides
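A simplified view of parameter estimation is sketched below in Python. It is not PRISM's Viterbi-based estimation: it swaps in plain relative-frequency counting, which is adequate only under the assumption that each observed state-pair has already been labelled with a single explaining transition. The function name `estimate_pfa` and the input format are hypothetical.

```python
from collections import Counter, defaultdict

def estimate_pfa(labelled_pairs):
    """Estimate P(transition, s_post | s_pre) by relative frequency.

    `labelled_pairs` is an iterable of (transition_name, s_pre, s_post) tuples
    (assumed input format, one explaining transition per observed pair).
    """
    counts = defaultdict(Counter)
    for name, s_pre, s_post in labelled_pairs:
        counts[s_pre][(name, s_post)] += 1
    return {
        s_pre: {key: c / sum(ctr.values()) for key, c in ctr.items()}
        for s_pre, ctr in counts.items()
    }

# Toy data: three expected firings of t1 and one abduced "unexpected" transition.
data = [("t1", (1, 0), (0, 1))] * 3 + [("t_new0", (1, 0), (1, 1))]
pfa = estimate_pfa(data)
```

In this toy run the abduced transition receives a low probability relative to t1, which is the kind of signal that can be used to rank or filter transitions.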

  28. From “Noisy” to Probabilistic Transitions

  29. From “Noisy” to Probabilistic Transitions

  30. From “Noisy” to Probabilistic Transitions

  31. From “Noisy” to Probabilistic Transitions

  32. Experiments • Evaluating identification is hard on unknown systems, so we use reconstruction • 3 standard biological models: Water, MAPK and Glycolysis • We vary the noise level (low, medium and high) and the sample size (small and large), with multiple replicates • Implementation: LGTS in YAP, with data generation and Viterbi estimation in PRISM

  33. Error (FNR) and Viterbi probability of transitions

  34. Transitions in LGTS and probabilistic model

  35. Related Work • Durzinsky et al. (2011) • Petri net identification as optimisation • Inoue (2011) and Inoue et al. (2014) • Learning from interpretation transition • Bioinformatics and systems biology • Probabilistic network identification

  36. Conclusion • Dynamic qualitative model identification • Identification as logical consequence finding using logic programming (DFA) • Transition model incompleteness • Abductive LP (NFA) • Transition model incorrectness • Probabilistic LP (PFA) • Future work • Generalisation of probabilistic transitions

  37. References [1] M. Durzinsky, A. Wagler, and W. Marwan. Reconstruction of extended Petri nets from time series data and its application to signal transduction and to gene regulatory networks. BMC Systems Biology, 5:113, 2011. [2] K. Inoue, T. Ribeiro, and C. Sakama. Learning from interpretation transition. Machine Learning, 94(1):51-79, 2014. [3] R. King, K. Whelan, F. Jones, P. Reiser, C. Bryant, S. Muggleton, D. Kell, and S. Oliver. Functional genomic hypothesis generation and experimentation by a robot scientist. Nature, 427:247-252, 2004. [4] T. Sato and Y. Kameya. PRISM: A symbolic-statistical modeling language. In Proc. 15th Intl. Joint Conf. on Artificial Intelligence (IJCAI97), pp. 1330-1335, 1997. [5] A. Srinivasan and M. Bain. Knowledge-Guided Identification of Petri Net Models of Large Biological Systems. In S. Muggleton, A. Tamaddoni-Nezhad, and F. Lisi, (Eds.), Proc. 21st Intl. Conf. on Inductive Logic Programming (ILP 2011) LNCS 7207 pp. 317-331, Springer, 2012. [6] A. Srinivasan and M. Bain. Identification of Transition-Based Models of Biological Systems using Logic Programming. Technical Report UNSW-CSE-TR-201425, University of New South Wales, Sydney, Australia, 2014. [7] A. Yamamoto. Representing Inductive Inference with SOLD-Resolution. In Proceedings of the IJCAI'97 Workshop on Abduction and Induction in AI, 1997.
