
Training pKa and logP prediction

Training pKa and logP prediction. Jozsef Szegezdi, Solutions for Cheminformatics. logP calculation models in Marvin: three models are provided in Marvin. They share the same atom type definitions, taken from Viswanadhan, V. N., et al. J. Chem. Inf. Comput. Sci., 1989, 29(3), 163-172.

Presentation Transcript


  1. Training pKa and logP prediction. Jozsef Szegezdi, Solutions for Cheminformatics

  2. logP calculation models in Marvin. Three models are provided in Marvin. They share the same atom type definitions, taken from Viswanadhan, V. N., et al. J. Chem. Inf. Comput. Sci., 1989, 29(3), 163-172. Unfortunately, we cannot tell in advance which model will be better for a molecule that is not included in the training set.
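
Because the three models share one set of atom type definitions, they can be pictured as additive schemes in which every typed atom contributes a fixed increment to logP. The following is a minimal sketch of such a scheme, assuming a plain sum of per-atom-type contributions. The atom-type labels and the contribution values are hypothetical placeholders. They are not the Viswanadhan et al. parameters and not Marvin's internal values.

```python
from collections import Counter

# Hypothetical per-atom-type logP contributions (illustrative only,
# NOT the Viswanadhan et al. parameters used by Marvin).
CONTRIBUTIONS = {
    "C_aromatic": 0.30,
    "H_on_C": 0.12,
    "O_hydroxyl": -0.40,
    "H_on_O": -0.10,
}


def additive_logp(atom_types, contributions=CONTRIBUTIONS):
    """Sum the contribution of every typed atom in a molecule.

    atom_types: iterable of atom-type labels, one per atom.
    Raises KeyError if an atom type has no parameter.
    """
    counts = Counter(atom_types)
    return sum(n * contributions[t] for t, n in counts.items())


# Toy phenol-like typing: 6 aromatic C, 5 H on C, one OH group.
phenol_like = ["C_aromatic"] * 6 + ["H_on_C"] * 5 + ["O_hydroxyl", "H_on_O"]
print(round(additive_logp(phenol_like), 2))
```

The KeyError on an unknown type is deliberate. An atom type with no parameter is exactly the kind of gap the next slide describes.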

  3. Problems with logP models. Frequently occurring problems in constructing logP models:
     - the logP training set is too small
     - the logP training set is unrepresentative
     - the specification of atom types and interactions is subjective
     - the number of logP parameters is restricted in order to ensure 'predictive power'
     As a result, the models will be missing some interactions and atom types.
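
You can check for missing atom types mechanically before trusting a prediction. Any atom type that appears in a query molecule but never occurs in the training set has no fitted parameter. The following is a minimal sketch, assuming molecules have already been reduced to lists of atom-type labels. The labels are hypothetical. Detecting missing interactions would need pair or fragment descriptors, and this sketch does not cover them.

```python
def uncovered_atom_types(query_types, training_set):
    """Return the atom types of a query molecule that never occur in training.

    query_types: list of atom-type labels for the query molecule.
    training_set: list of molecules, each a list of atom-type labels.
    """
    seen = set()
    for molecule in training_set:
        seen.update(molecule)
    return sorted(set(query_types) - seen)


# Hypothetical typed molecules.
training = [
    ["C_aromatic", "H_on_C", "O_hydroxyl", "H_on_O"],
    ["C_sp3", "H_on_C", "O_hydroxyl", "H_on_O"],
]
query = ["C_aromatic", "H_on_C", "N_amine", "H_on_N"]
print(uncovered_atom_types(query, training))  # ['H_on_N', 'N_amine']
```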

  4. Example for creating a local logP model [Figure: 25 molecules, each annotated with its logP value]

  5. Example for creating a local logP model. The figure below shows the logP values of the molecules calculated with the standard 'weighted' method. The 'principle of uniformity of nature' would suggest that other 'OH'-containing molecules can also be predicted reasonably well by the standard 'weighted' method. Is this true? We test it on hydroquinone.
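
The slides do not give a formula for the 'weighted' method. One plausible reading, which is an assumption here, is a weighted average of the predictions of the three models. The following is a minimal sketch of that reading. The model names, the predictions, and the weights are all hypothetical.

```python
def weighted_logp(predictions, weights):
    """Weighted average of per-model logP predictions.

    predictions, weights: dicts keyed by (hypothetical) model name.
    Only models present in both dicts contribute.
    """
    common = predictions.keys() & weights.keys()
    total_weight = sum(weights[m] for m in common)
    if total_weight <= 0:
        raise ValueError("weights of the available models must sum to a positive value")
    return sum(weights[m] * predictions[m] for m in common) / total_weight


preds = {"model_A": 0.40, "model_B": 0.80, "model_C": 0.60}
w = {"model_A": 1.0, "model_B": 1.0, "model_C": 2.0}
print(round(weighted_logp(preds, w), 2))
```

With equal weights this reduces to the plain mean of the three predictions.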

  6. Test of standard models. The logP value of hydroquinone is 0.59. The next table summarizes the logP errors of the standard models. The errors of the standard models are relatively large. How can the accuracy of the prediction be improved? The prediction error can be reduced by creating a local model with linear regression on the 25 molecules mentioned above. Command line call for creating the local model: cxcalc -T logP -t LOGP -o logPparameters.txt training25.sdf
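
The slide states only that the local model is built by linear regression on the 25 molecules. One common way to set this up, assumed here rather than taken from the slides, works as follows:

- Use the atom-type counts of each molecule as the regressors.
- Use the experimental logP values as the targets.
- Solve the least-squares normal equations for one contribution per atom type.

The sketch uses plain Python and synthetic data generated from known, hypothetical contributions, so you can check the fit. It is not the training algorithm behind cxcalc.

```python
def fit_contributions(count_rows, logp_values):
    """Fit one logP contribution per atom type by ordinary least squares.

    count_rows: one row per molecule; row[j] is the number of atoms of type j.
    logp_values: experimental logP of each molecule (same order).
    Solves the normal equations (X^T X) a = X^T y by Gaussian elimination.
    """
    n_types = len(count_rows[0])
    # Normal equations: A = X^T X, b = X^T y.
    a_mat = [[float(sum(row[i] * row[j] for row in count_rows)) for j in range(n_types)]
             for i in range(n_types)]
    b_vec = [float(sum(row[i] * y for row, y in zip(count_rows, logp_values)))
             for i in range(n_types)]
    aug = [a_row + [b] for a_row, b in zip(a_mat, b_vec)]
    # Forward elimination with partial pivoting.
    for col in range(n_types):
        pivot = max(range(col, n_types), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular system: an atom type is absent or collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n_types):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n_types + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coef = [0.0] * n_types
    for i in range(n_types - 1, -1, -1):
        tail = sum(aug[i][j] * coef[j] for j in range(i + 1, n_types))
        coef[i] = (aug[i][n_types] - tail) / aug[i][i]
    return coef


# Synthetic check: generate logP values from known (hypothetical) contributions
# for three atom types, then recover them from the counts alone.
true_contrib = [0.30, 0.12, -0.45]
counts = [[6, 5, 1], [6, 4, 2], [2, 6, 1], [3, 8, 1], [4, 4, 2]]
logp = [sum(n * a for n, a in zip(row, true_contrib)) for row in counts]
print([round(a, 3) for a in fit_contributions(counts, logp)])
```

The singular-system error fires when an atom type never occurs in the training rows. This is the regression-side view of the missing atom types problem.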

  7. User’s model. The figure below shows the logP values of the 25 'OH'-containing molecules calculated with the 'user defined' method after logP training. Comparison of the standard and the user model: the user-trained local model based on 25 molecules outperforms all of the standard models.
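
"Outperforms" can be made concrete by computing the same error statistic for every model on the same molecules. The following is a minimal sketch, assuming root-mean-square error (RMSE) as the criterion. The slides do not say which statistic was used. The model names and values are hypothetical.

```python
import math


def rmse(predicted, experimental):
    """Root-mean-square error between two equal-length sequences."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((p - e) ** 2 for p, e in zip(predicted, experimental))
    return math.sqrt(squared / len(predicted))


def rank_models(predictions_by_model, experimental):
    """Return (model, RMSE) pairs sorted from best to worst."""
    scores = {m: rmse(p, experimental) for m, p in predictions_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1])


exp_values = [0.5, 1.0, 2.0]
models = {
    "standard": [1.0, 1.5, 2.5],  # every value off by 0.5
    "user": [0.6, 0.9, 2.1],      # every value off by 0.1
}
print(rank_models(models, exp_values))
```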

  8. Conclusions. The local model based on 25 molecules is more accurate than any of the standard global models. Depending on the training set, different parameter values will be assigned to the same atom type; this is one of the main characteristics of the user model. A 'carefully' created set of local models must be superior to any 'large' model. We plan to develop a model that combines many local models.

  9. Apparent pKa and ionization%-pH curve. [Figure: structure annotated with the pKa values 4.30, 2.49, 10.28 and 5.10] The ionization%-pH curves are shown in blue for basic centers and in red for acidic centers.
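
For a single ionizable center, the Henderson-Hasselbalch relation gives the ionization percentage at a given pH:

- An acid is 100/(1 + 10^(pKa - pH)) percent ionized.
- A base is 100/(1 + 10^(pH - pKa)) percent ionized, where pKa is that of the conjugate acid.

The sketch below tabulates both curves for hypothetical centers. It treats each center independently, which is a simplification. In molecules with several interacting centers, Marvin derives its curves from its own calculation rather than from this single-center formula.

```python
def percent_ionized(ph, pka, kind):
    """Percentage of a single ionizable center in its ionized form.

    kind: "acid" (HA <-> A- + H+) or "base" (BH+ <-> B + H+, pKa of BH+).
    Henderson-Hasselbalch relation; interactions between centers are ignored.
    """
    if kind == "acid":
        return 100.0 / (1.0 + 10 ** (pka - ph))
    if kind == "base":
        return 100.0 / (1.0 + 10 ** (ph - pka))
    raise ValueError("kind must be 'acid' or 'base'")


# Hypothetical centers: an acid with pKa 4.0 and a base with pKa 9.0.
for ph in range(0, 15, 2):
    acid = percent_ionized(ph, 4.0, "acid")
    base = percent_ionized(ph, 9.0, "base")
    print(f"pH {ph:2d}: acid {acid:6.2f}%  base {base:6.2f}%")
```

Each curve passes through 50% exactly at pH = pKa. This is why the apparent pKa can be read off the ionization%-pH plot.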

  10. Method for predicting pKa and training
     • Marvin’s prediction model considers:
        • partial charges
        • polarizability
        • the effect of ionizable centers on each other
     • Training refines the existing parameters for ionizable centers and, at the same time, creates new modifier parameters based on the structures and experimental values specified by the user.
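
The simplest possible correction illustrates how user data can refine a base model. For each type of ionizable center, shift the default prediction by the mean residual (experimental minus calculated) in the user's training set. This mean is the least-squares estimate of a constant offset. This is a hypothetical stand-in, not Marvin's training procedure. Marvin refines its own charge- and polarizability-based parameters and adds modifier terms. The center-type labels and values are invented.

```python
from collections import defaultdict


def fit_offsets(training_rows):
    """Least-squares constant correction per ionizable-center type.

    training_rows: iterable of (center_type, calculated_pka, experimental_pka).
    Returns {center_type: mean(experimental - calculated)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for center_type, calc, exp in training_rows:
        sums[center_type] += exp - calc
        counts[center_type] += 1
    return {t: sums[t] / counts[t] for t in sums}


def refined_pka(center_type, calculated_pka, offsets):
    """Apply the learned offset; untrained center types keep the default value."""
    return calculated_pka + offsets.get(center_type, 0.0)


rows = [
    ("phenol_OH", 9.8, 10.1),
    ("phenol_OH", 9.5, 9.9),
    ("carboxylic_acid", 4.4, 4.2),
]
offsets = fit_offsets(rows)
print(offsets)
print(refined_pka("phenol_OH", 9.0, offsets))
print(refined_pka("aliphatic_amine", 10.6, offsets))  # no training data: unchanged
```

Center types without training data fall back to the default prediction. This matches the idea that the trained model only refines the base model.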

  11. Example for training pKa prediction [Figure: 15 numbered example structures]

  12. Experimental vs. calculated pKa values

  13. Curating experimental pKa data. The input SDF file may be created in IJC (Instant JChem). The training can be run with this command line: cxcalc -T pka -o c:/output InputpKadata.sdf
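
The training input is an SD file in which each record carries its experimental value as a data field. The following is a minimal sketch of how such fields are appended to existing molfile blocks, assuming the field name `pKa`. This name is hypothetical. The slides do not say which tag the training reads or how several pKa values per molecule are encoded. The sketch only writes the SDF data section. It expects valid molblocks from elsewhere, for example an IJC export.

```python
def sdf_record(molblock, fields):
    """Build one SD-file record: molfile block, data items, then the $$$$ delimiter.

    molblock: a complete molfile ending with its 'M  END' line.
    fields: dict of data-field name -> value; each becomes
            '> <name>', the value, and a blank line.
    """
    lines = [molblock.rstrip("\n")]
    for name, value in fields.items():
        lines.append(f"> <{name}>")
        lines.append(str(value))
        lines.append("")
    lines.append("$$$$")
    return "\n".join(lines) + "\n"


def write_sdf(path, records):
    """Write (molblock, fields) pairs to an SD file."""
    with open(path, "w", encoding="utf-8") as handle:
        for molblock, fields in records:
            handle.write(sdf_record(molblock, fields))


# Placeholder V2000 molblock with no atoms, only to show the record layout.
EMPTY_MOLBLOCK = (
    "example\n"
    "  hypothetical\n"
    "\n"
    "  0  0  0  0  0  0  0  0  0  0999 V2000\n"
    "M  END"
)
print(sdf_record(EMPTY_MOLBLOCK, {"pKa": 9.85}), end="")
```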

  14. Conclusions
     • The user-defined pKa model is more accurate than the built-in default model.
     • IJC can be used to curate the input data for the training.
     • The new model is only a refinement of the default model, so the training assumes a robust base model, which Marvin provides.
