CORALSEA Workflow
The software “CORALSEA” is a tool to build up quantitative structure-property/activity relationships (QSPRs/QSARs). The representation of the molecular structure that is used in CORALSEA is SMILES (simplified molecular input-line entry system). For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html
Here we use for the demo of CORALSEA our model from the article “THE DEFINITION OF THE MOLECULAR STRUCTURE FOR POTENTIAL ANTI-MALARIA AGENTS BY THE MONTE CARLO METHOD”, Struct. Chem. 2013; 24:1369–1381. You can develop a better model, but for now please follow our suggestions.
The first action is the preparation of the SMILES file, which is the input for CORALSEA. Each compound should be represented by:
(1) The type = [+, -, #];
(2) The ID: it can be a CAS (Chemical Abstracts Service) number or a plain number;
(3) SMILES; and
(4) Endpoint value.
“+” is the indicator of the sub-training set; “-” is the indicator of the calibration set; “#” is the indicator of the test set.
The role of the sub-training set is developer of the model; the role of the calibration set is critic of the model; the role of the test set is estimator of the model.
MyFile.txt:
+1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
+3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
+4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
+7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
+8 O=C1c2ccccc2NC(C)=C1Br 7.110
-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
+10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
+12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
+13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
-14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
#15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
+17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
+18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
#19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
-21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
-22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
-23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
+24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
-25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
#26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
#28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
-29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
-30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
-31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
#33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
#34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
+35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
+36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
+37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
+38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
#39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
+41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
-44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
+45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
-46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
#47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
+48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
+49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
-50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
#53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271
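Before loading such a file into CORALSEA, it can help to check each record. The following is only a minimal Python sketch and not a part of CORALSEA: the names `Record`, `parse_record`, and `parse_records` are hypothetical, and it assumes one record per line with whitespace-separated fields and the ID written directly after the set marker, as in the listing above.

```python
from typing import NamedTuple

# Set markers as described on the slides ("*" marks the "invisible" validation set).
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test", "*": "validation"}


class Record(NamedTuple):
    set_name: str
    compound_id: str
    smiles: str
    endpoint: float


def parse_record(line: str) -> Record:
    """Parse one record such as '+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903'."""
    # Assumption: exactly three whitespace-separated fields per record.
    head, smiles, value = line.split()
    marker, compound_id = head[0], head[1:]
    if marker not in SET_NAMES:
        raise ValueError(f"unknown set marker {marker!r} in line: {line!r}")
    return Record(SET_NAMES[marker], compound_id, smiles, float(value))


def parse_records(lines: list[str]) -> list[Record]:
    """Parse all non-empty lines."""
    return [parse_record(ln) for ln in lines if ln.strip()]


# Raw strings keep the backslashes used in SMILES for double-bond geometry.
sample = [
    r"+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903",
    r"#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652",
    r"-9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824",
]
records = parse_records(sample)
for rec in records:
    print(rec.set_name, rec.compound_id, rec.endpoint)
```

A line with a missing field or an unknown marker raises `ValueError`, so a typing mistake is caught before the file reaches the program.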
It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model.
The format of the file for this validation set is the following:
(1) The number of compounds;
(2) The list of compounds in the above-mentioned format: type, ID, SMILES, endpoint value.
MyInput.txt:
10
*11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728
*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
*20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
*27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805
*32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456
*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
*42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530
*43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779
*51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830
*52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975
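The validation file differs from the training file only by the count on its first line, so the count and the list can easily get out of step. As a hedged illustration (the helper names `build_validation_file` and `read_validation_file` are hypothetical, and one record per line is an assumption), the file can be written and checked in Python like this:

```python
def build_validation_file(records: list[str]) -> str:
    """Return the file text: the compound count first, then one record per line."""
    return "\n".join([str(len(records)), *records]) + "\n"


def read_validation_file(text: str) -> list[str]:
    """Return the records and check them against the declared count."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared = int(lines[0])  # first non-empty line: the number of compounds
    records = lines[1:]
    if declared != len(records):
        raise ValueError(f"file declares {declared} compounds but lists {len(records)}")
    return records


# Two of the validation compounds from the slide.
validation = [
    "*16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900",
    "*40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559",
]
file_text = build_validation_file(validation)
print(file_text, end="")
```

Reading the text back with `read_validation_file` raises `ValueError` when a compound was added or removed without updating the first line.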
To start your work, download CORALSEA.zip from www.insilico.eu/coral. When the download is done, place the folder "CORALSEA" on your computer:
…and insert your data (i.e. “MyTRNCLBTST.txt”) in folder “MyCORALSEA”:
In order to carry out QSPR/QSAR analysis of data for a CLASSIFICATION MODEL, one should do the following:
• Insert “#TRNCLBTST-1.txt” in the folder;
• Insert “#Input-1.txt” in the folder;
• Click CORALSEA.exe.
“#TRNCLBTST.txt” is the file which contains the training (TRN), calibration (CLB), and test (TST) sets.
“#Input.txt” is the data which are not visible while the model is being built.
It appears on your screen: Click the “Load method” button…
It appears on your screen: Insert the name “#TRNCLBTST-1.txt” in the text box.
It appears on your screen: Click “SAVE SYSTEM”.
It appears on your screen: Restart the program and click “Load system”.
It appears on your screen: Click “OK”.
It appears on your screen: This plot relates to the external “invisible” validation set.
It appears on your screen: The file “#Output-1.txt” contains statistical characteristics for the validation set (“#Output-1.txt” is placed in the folder “Model”).
In order to carry out QSPR/QSAR analysis of data for a REGRESSION MODEL, one should do the following:
• Insert “#TRNCLBTST.txt” in the folder;
• Insert “#Input-1.txt” in the folder;
• Click CORALSEA.exe.
“#TRNCLBTST.txt” is the file which contains the training (TRN), calibration (CLB), and test (TST) sets.
“#Input.txt” is the data which are not visible while the model is being built.
It appears on your screen: Insert the name “#TRNCLBTST-1.txt” in the text box (INSERT). After this, please select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation (SELECT).
It appears on your screen: Two actions: (1) Define method and (2) Save method.
It appears on your screen: You can involve graph invariants in addition to SMILES attributes.
It appears on your screen: You can use the “classic scheme”, the balance of correlations, and ideal slopes C1, C1’.
It appears on your screen: You can choose your mode, e.g. (1) define Dstart=0.25; (2) define Nepoch=20; after this you must (3) click “Save method”, otherwise the method remains the same.
It appears on your screen: Click “Search for preferable model (T*,N*)”.
It appears on your screen: The program will carry out the Monte Carlo optimization with various thresholds and numbers of epochs. The preferable values of the threshold and the number of epochs can be found in the file “Search/BestMDL.txt” when the calculation is completed.
The contents of the file “search/BestMDL.txt” will be approximately the following: One can see that the preferable threshold (T*) is 2, and the preferable number of epochs (N*) is 15. One can use this information to build up a robust model.
An attempt to build up a robust model…
• Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”);
• Run CORALSEA.exe in this folder “MyCORALSEA-T2-N15”;
• Click “Load method”.
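The folder copy can be made by hand; if you repeat it for several (T*, N*) pairs, a small script may save time. This is a sketch only: the function `clone_work_folder` is a hypothetical helper, not a part of CORALSEA, and it assumes that the folder name follows the pattern “MyCORALSEA-T2-N15” used above.

```python
import shutil
from pathlib import Path


def clone_work_folder(source: Path, threshold: int, n_epochs: int) -> Path:
    """Copy `source` to a sibling folder named '<source>-T<threshold>-N<n_epochs>'."""
    target = source.with_name(f"{source.name}-T{threshold}-N{n_epochs}")
    # copytree raises FileExistsError if the target folder already exists,
    # so an earlier model is never overwritten silently.
    shutil.copytree(source, target)
    return target


# Example call (the folder "MyCORALSEA" must exist in the current directory):
# clone_work_folder(Path("MyCORALSEA"), threshold=2, n_epochs=15)
```

After the copy, CORALSEA.exe is started from the new folder as described in the steps above.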
It appears on your screen (T*=2, N*=15): (1) Insert Nepoch=15; (2) click “Building up preferable model (T*,N*)”; (3) insert Threshold=2; and (4) click “Continue”.
It appears on your screen: Click “Yes”.
When the model is ready, the screen will be the following: Click “Save system”.
The folder “Model” contains the parameters of the QSPR/QSAR model. The file “#Output-1.txt” contains statistics for the invisible validation set.
When the model is ready, the screen will be the following: Click “Load system”.
It will appear on the screen: (1) Insert the name “MyInput.txt” instead of “#Input-1.txt”; (2) click “Start of DCW and Endpoint calculation for SMILES input file”.
It will appear on the screen: After these actions, the file “model/Output.txt” will contain the results of the calculation for the compounds from “MyInput.txt”. Click “OK”.
It will appear on the screen: You will see the graphical representation for the sub-training, calibration, test, and validation sets.
The contents of “model/Output.txt” will be the following:
Last, but not least…
One can calculate the model for an individual SMILES: (1) insert the SMILES in the indicated box; (2) click “Start of DCW and Endpoint Calculation for Inserted SMILES”.
It appears on your screen: See the file “Model/DemoDesc.txt”.
The contents of “Model/DemoDesc.txt” are the following: DCW is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412. This example is only a demo; NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.
These slides have shown the "technology", but to understand the "philosophy", please read the file "ReadMe.pdf".
Some definitions
Thank you for your attention! CORALSEA TEAM