Abstract

AN APPROACH TO DETERMINE THE APPLICATION DOMAIN OF GROUP CONTRIBUTION MODELS

Nina Jeliazkova (1), Joanna Jaworska (2)
(1) IPP, Bulgarian Academy of Sciences, Sofia, Bulgaria
(2) Central Product Safety, Procter & Gamble, Brussels, Belgium

Abstract

There is a practical need for an automatic (computerized) procedure to determine the application domain of a QSAR model. In this paper we attempt to address this need and focus on defining the application domain of group contribution methods. These methods are characterized by a high number of descriptors, i.e. high dimensionality. For feasibility reasons we propose to estimate the application domain as the parameter space bounded by the training set parameter ranges. We then demonstrate how to apply this approach in practice, using the Syracuse Research Corporation (SRC) KOWWIN model as an example.

Methods

Atom Fragment Contribution (AFC) method
• Uses counts of fragments as descriptors;
• Uses very simple fragments (each non-hydrogen atom is the core of a fragment; this minimizes the possibility of missing fragments);
• In addition to simple fragments, uses correction factors (these are complex fragments, always larger than a single atom);
• Two-stage multivariate regression.

SRC KOWWIN full text output

SMILES : Oc(c(cc(c1)Cc(cc(c(O)c2C(C)(C)C)C(C)(C)C)c2)C(C)(C)C)c1C(C)(C)C
CHEM   : Phenol, 4,4'-methylenebis 2,6-bis(1,1-dimethylethyl)-
MOL FOR: C29 H44 O2
MOL WT : 424.67
-------+-----+--------------------------------------------+---------+--------
 TYPE  | NUM | LOGKOW FRAGMENT DESCRIPTION                |  COEFF  |  VALUE
-------+-----+--------------------------------------------+---------+--------
 Frag  |  12 | -CH3 [aliphatic carbon]                    |  0.5473 |  6.5676
 Frag  |   1 | -CH2- [aliphatic carbon]                   |  0.4911 |  0.4911
 Frag  |  12 | Aromatic Carbon                            |  0.2940 |  3.5280
 Frag  |   2 | -OH [hydroxy, aromatic attach]             | -0.4802 | -0.9604
 Frag  |   4 | -tert Carbon [3 or more carbon attach]     |  0.2676 |  1.0704
 Factor|   1 | -CH2- (aliphatic), 2 phenyl attach correc  | -0.2326 | -0.2326
 Factor|   2 | Ring rx: -OH / di-ortho;sec- or t- carbon  | -0.8500 | -1.7000
 Const |     | Equation Constant                          |         |  0.2290
-------+-----+--------------------------------------------+---------+--------
                                                          Log Kow =    8.9931

The estimate is the sum of the fragment and correction factor contributions plus the equation constant listed in the output:

$$\log K_{ow} = \sum_i f_i n_i + \sum_j c_j n_j + 0.229$$

where
f_i is the coefficient for each fragment;
n_i is the number of times the fragment occurs in the structure;
c_j is the coefficient for each correction factor;
n_j is the number of times the correction factor occurs or is applied in the structure.

Data

The KOWWIN training and validation sets were provided by Syracuse Research Corp. Software was developed to read the full text output of the SRC KOWWIN program and extract the fragment and correction factor statistics of the training and validation sets.

[Table: An excerpt from the list of 508 KOWWIN fragments and their representation in the training and validation sets]
[Figure: Overlay between training and validation sets]

Approach

• Approximate the application domain by ranges determined from the training set:
  • Fragment and correction factor ranges;
  • Log Kow range, because the combination of fragments can be out of range.
• Analyse the KOWWIN training set and obtain fragment and correction factor statistics for the training and validation sets.
• Compare the training and validation sets of the KOWWIN model.

Discussion

Application domain and prediction error
• The AFC method is representative of group contribution methods, which rest on two fundamental assumptions:
  • Additivity: each structural component of a compound makes a separate and additive contribution to the property of interest for the compound. Additivity is a widely accepted hypothesis, with supporting evidence from empirical studies and contemporary quantum theories.
  • Transferability: these contributions are the same across a wide variety of compounds.
• The property of a single compound is modelled as a sum of the contributions associated with each atom or fragment (additivity), assuming that the contributions of identical atoms or fragments are the same as in the original compounds used to derive these contributions (transferability).
• Examples where the assumptions fail:
  • Molecules in which the same fragment occurs many times (e.g. a long aliphatic chain): additivity is extrapolated beyond the training set.
  • Molecules with "uncommon" functional groups: transferability is difficult to establish because of poor statistics.
• Complex structures are not always sufficiently represented, because the AFC method uses very simple fragments (e.g. compounds with large aliphatic rings are treated like aliphatic chains).
• The average prediction error outside the application domain defined by the training set ranges is twice as large as the prediction error inside the domain. Note that this holds only on average: there are many individual compounds with low error outside the domain, as well as individual compounds with high error inside the domain.
• The training space, as defined by the fragment and correction factor ranges, consists of 5.44E+41 unique points. Of this enormous space the training set uses only 2113 unique points (some of the 2434 points coincide). This means that only 3.88E-37 % of the training space is covered by the training set points! The good practical experience with the model indicates that the additivity assumption is working within the training set space.

These observations support the view that, to determine the applicability of a (QSAR) model, it is essential to evaluate the model assumptions.
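The range-based domain check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software developed for this work: the coefficients, counts and equation constant are copied from the KOWWIN output shown in Methods, while the function names, the shortened descriptor names, and every value in `HYPOTHETICAL_RANGES` and `HYPOTHETICAL_LOGKOW_RANGE` are placeholders invented for the example and are not the actual KOWWIN training set ranges.

```python
from typing import Dict, List, Tuple

# Coefficients and equation constant copied from the KOWWIN output in Methods.
# The shortened descriptor names are labels chosen for this sketch.
COEFF: Dict[str, float] = {
    "-CH3": 0.5473,
    "-CH2-": 0.4911,
    "Aromatic Carbon": 0.2940,
    "-OH aromatic": -0.4802,
    "-tert Carbon": 0.2676,
    "corr: -CH2- 2 phenyl": -0.2326,
    "corr: ring -OH di-ortho": -0.8500,
}
EQUATION_CONSTANT = 0.2290

# Fragment and correction factor counts of the example compound in Methods.
EXAMPLE_COUNTS: Dict[str, int] = {
    "-CH3": 12, "-CH2-": 1, "Aromatic Carbon": 12, "-OH aromatic": 2,
    "-tert Carbon": 4, "corr: -CH2- 2 phenyl": 1, "corr: ring -OH di-ortho": 2,
}

# HYPOTHETICAL (min, max) counts per descriptor; NOT the real training ranges.
HYPOTHETICAL_RANGES: Dict[str, Tuple[int, int]] = {
    "-CH3": (0, 13), "-CH2-": (0, 18), "Aromatic Carbon": (0, 24),
    "-OH aromatic": (0, 6), "-tert Carbon": (0, 5),
    "corr: -CH2- 2 phenyl": (0, 1), "corr: ring -OH di-ortho": (0, 2),
}
# HYPOTHETICAL log Kow range of the training set; NOT the real range.
HYPOTHETICAL_LOGKOW_RANGE: Tuple[float, float] = (-5.0, 10.0)


def log_kow(counts: Dict[str, int]) -> float:
    """Sum of coefficient * count over all descriptors, plus the constant."""
    return sum(COEFF[name] * n for name, n in counts.items()) + EQUATION_CONSTANT


def out_of_range(counts: Dict[str, int],
                 ranges: Dict[str, Tuple[int, int]]) -> List[str]:
    """Return the descriptors whose count lies outside the given ranges.

    A descriptor absent from the compound has count 0; a descriptor absent
    from the ranges is treated as having the range (0, 0).
    """
    violations = []
    for name in sorted(set(counts) | set(ranges)):
        low, high = ranges.get(name, (0, 0))
        if not low <= counts.get(name, 0) <= high:
            violations.append(name)
    return violations


def in_domain(counts: Dict[str, int],
              ranges: Dict[str, Tuple[int, int]],
              logkow_range: Tuple[float, float]) -> bool:
    """True if every count and the estimated log Kow are inside the ranges."""
    low, high = logkow_range
    return not out_of_range(counts, ranges) and low <= log_kow(counts) <= high


def coverage_percent(unique_points: int, space_size: float) -> float:
    """Percentage of the range-defined space occupied by training points."""
    return unique_points / space_size * 100


if __name__ == "__main__":
    # Reproduces the Log Kow value reported in the KOWWIN output (8.9931).
    print(round(log_kow(EXAMPLE_COUNTS), 4))
    print(in_domain(EXAMPLE_COUNTS, HYPOTHETICAL_RANGES, HYPOTHETICAL_LOGKOW_RANGE))
    # A long aliphatic chain exceeds the placeholder -CH2- range.
    long_chain = {"-CH3": 2, "-CH2-": 40}
    print(out_of_range(long_chain, HYPOTHETICAL_RANGES))
    # Coverage figures quoted in the Discussion: 2113 points in 5.44E+41.
    print(coverage_percent(2113, 5.44e41))
```

Checking each descriptor independently keeps the test feasible in a space with hundreds of dimensions, which is the reason given above for approximating the domain by ranges. The price is that a compound can pass every per-descriptor check and still sit in a region with no nearby training points, which is why the log Kow range is checked as well.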
