
CORAL SEA

CORAL SEA workflow. The software “CORALSEA” is a tool for building quantitative structure-property/activity relationships (QSPRs/QSARs). CORALSEA represents molecular structure with SMILES.


Presentation Transcript


  1. CORALSEA Workflow

  2. The software “CORALSEA” is a tool for building quantitative structure-property/activity relationships (QSPRs/QSARs). The representation of the molecular structure used in CORALSEA is SMILES (simplified molecular input-line entry system). For details, please see http://www.daylight.com/dayhtml/doc/theory/theory.smiles.html

  3. For this demo of CORALSEA we use our model from the article “The definition of the molecular structure for potential anti-malaria agents by the Monte Carlo method”, Struct. Chem. 2013; 24:1369–1381. You can develop a better model, but for now please follow our suggestions.

  4. The first step is to prepare the SMILES file, which is the input for CORALSEA. Each compound should be represented by (1) the type, one of [+, -, #]; (2) the ID, which can be a CAS (Chemical Abstracts Service) number or a plain number; (3) the SMILES; and (4) the endpoint value. “+” marks the sub-training set, “-” marks the calibration set, and “#” marks the test set. The sub-training set acts as the developer of the model, the calibration set as its critic, and the test set as its estimator. Example file (MyFile.txt):

     +1 COc1ccc2c(c1)NC(C)=C(CCCCCCC)C2=O 7.332
     +2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
     +3 O=C1c2ccccc2NC(C)=C1CCCCCCC 6.979
     +4 O=C1c2ccccc2NC(C)=C1CCCCCCCCC 7.400
     #5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
     -6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
     +7 O=C2c3ccccc3NC(C)=C2Cc1ccccc1 5.207
     +8 O=C1c2ccccc2NC(C)=C1Br 7.110
     -9 O=C1c2ccccc2NC(C)=C1\C=C\CCCCCCC 7.824
     +10 C=C(CCCCCCC)C=1C(=O)c2ccccc2NC=1C 7.472
     +12 O=C2c3ccccc3NC(C)=C2/C=C/c1ccccc1 5.827
     +13 COc1ccc2NC(C)=C(Br)C(=O)c2c1 5.934
     -14 Cc1ccc2NC(C)=C(Br)C(=O)c2c1 6.583
     #15 Brc1ccc2NC(C)=C(Br)C(=O)c2c1 6.470
     +17 Fc1ccc2NC(C)=C(Br)C(=O)c2c1 6.903
     +18 Clc1ccc2NC(C)=C(C#CCCCC)C(=O)c2c1 4.336
     #19 COc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.675
     -21 COc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 5.859
     -22 COc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.295
     -23 COc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 6.570
     +24 COc3cccc1c3NC(C)=C(C1=O)c2ccccc2 5.779
     -25 Clc2cccc3NC(C)=C(Cc1ccccc1)C(=O)c23 5.279
     #26 Clc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 5.485
     #28 Clc1cccc2NC(C)=C(C(=O)c12)c3ccccc3 5.324
     -29 Clc1ccc2NC(C)=C(C(=O)c2c1)c3ccccc3 6.110
     -30 Clc1ccc2c(c1)NC(C)=C(C2=O)c3ccccc3 5.731
     -31 Clc1ccc2NC(C)=C(C(=O)c2c1Cl)c3ccccc3 5.493
     #33 Clc1cc2NC(C)=C(C(=O)c2c(Cl)c1)c3ccccc3 5.464
     #34 COc1ccc3c(c1)C(=O)C(Cc2ccccc2)=C(C)N3C 5.094
     +35 COc1ccc3c(c1)N(C)C(C)=C(Cc2ccccc2)C3=O 5.106
     +36 Fc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.081
     +37 Clc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.815
     +38 Brc1cc2c(cc1OC)NC(C)=C(C2=O)c3ccccc3 7.602
     #39 Fc1cc2c(cc1OC)NC(C)=C(CC)C2=O 6.793
     +41 Brc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.440
     -44 Clc1cc2c(cc1OC)NC(C)=C(C2=O)C3CCCCC3 6.401
     +45 Clc1cc3c(cc1OC)NC(C)=C(Cc2ccccc2)C3=O 7.164
     -46 Clc1cc2c(cc1OC)NC(C)=C(C)C2=O 7.564
     #47 CC(C)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 6.712
     +48 CC(CC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.199
     +49 Clc1cc2c(cc1OC)NC(C)=CC2=O 5.731
     -50 Clc1cc2c(cc1OC)NC(C)=C(C#CCCCC)C2=O 5.376
     #53 CC(C)(C)OC(=O)/C=C/C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.271
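The four-field record format described on this slide (type, ID, SMILES, endpoint) can be checked with a short script before the file is handed to CORALSEA. The following is a minimal sketch, not part of CORALSEA itself. It assumes that the type marker is fused with the ID and that the fields are separated by whitespace, as the slide text suggests; the real file layout may differ. The names `Record`, `parse_record`, and `parse_records` are hypothetical.

```python
from collections import Counter
from typing import NamedTuple

# Set markers as described in the slides. "*" is the marker the slides
# use for the external ("invisible") validation set.
SET_NAMES = {"+": "sub-training", "-": "calibration", "#": "test", "*": "validation"}


class Record(NamedTuple):
    set_name: str
    compound_id: str
    smiles: str
    endpoint: float


def parse_record(line: str) -> Record:
    """Parse one line such as '+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903'.

    Assumption: three whitespace-separated fields, with the set marker
    as the first character of the first field.
    """
    tag, smiles, value = line.split()
    marker, compound_id = tag[0], tag[1:]
    if marker not in SET_NAMES:
        raise ValueError(f"unknown set marker {marker!r} in line: {line!r}")
    return Record(SET_NAMES[marker], compound_id, smiles, float(value))


def parse_records(text: str) -> list[Record]:
    """Parse every non-empty line of a file's text."""
    return [parse_record(ln) for ln in text.splitlines() if ln.strip()]


# Three compounds taken from the example file on this slide.
sample = """\
+2 COc1ccc2c(c1)NC(C)=CC2=O 4.903
#5 O=C1c3ccccc3NC(C)=C1C2CCCCC2 5.652
-6 O=C1c3ccccc3NC(C)=C1c2ccccc2 6.270
"""
records = parse_records(sample)
set_sizes = Counter(r.set_name for r in records)
print(set_sizes)
```

Counting the records per set in this way is a quick check that every compound carries a recognised marker and that no set is empty.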

  5. It is a good idea to reserve some substances as an "invisible" validation set for the final estimation of the model. The format of the file for this validation set is: (1) the number of compounds; (2) the list of compounds in the above-mentioned format: type, ID, SMILES, endpoint value. Example file (MyInput.txt):

     10
     *11 O=C1c2ccccc2NC(C)=C1C\C=C\CCCCCC 6.728
     *16 Clc1ccc2NC(C)=C(Br)C(=O)c2c1 6.900
     *20 COc2ccc3NC(C)=C(Cc1ccccc1)C(=O)c3c2 4.624
     *27 Clc1ccc3c(c1)NC(C)=C(Cc2ccccc2)C3=O 4.805
     *32 Clc1cc2c(cc1Cl)NC(C)=C(C2=O)c3ccccc3 6.456
     *40 Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O 7.559
     *42 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCC)C2=O 8.530
     *43 Clc1cc2c(cc1OC)NC(C)=C(CCCCCCCCC)C2=O 8.779
     *51 C=C(CCCCC)C=1C(=O)c2cc(Cl)c(cc2NC=1C)OC 7.830
     *52 Clc1cc2c(cc1OC)NC(C)=C(\C=C\CCCCC)C2=O 7.975
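The validation file layout described on this slide (a compound count on the first line, then one compound per line) can be produced and checked with a few lines of code. This is a sketch under assumptions: single spaces between fields and endpoints written with three decimals, as in the slide's example. The function names `format_validation_file` and `check_validation_file` are hypothetical and not part of CORALSEA.

```python
def format_validation_file(compounds):
    """Build the text of a validation file.

    compounds: iterable of (compound_id, smiles, endpoint) tuples.
    """
    compounds = list(compounds)
    lines = [str(len(compounds))]  # first line: the number of compounds
    for compound_id, smiles, endpoint in compounds:
        # "*" is the marker the slides use for validation compounds.
        lines.append(f"*{compound_id} {smiles} {endpoint:.3f}")
    return "\n".join(lines) + "\n"


def check_validation_file(text):
    """Return the compound lines after verifying the declared count."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    declared, entries = int(lines[0]), lines[1:]
    if declared != len(entries):
        raise ValueError(
            f"header says {declared} compounds, found {len(entries)}"
        )
    return entries


# Two compounds taken from the validation set on this slide.
demo = format_validation_file([
    (16, "Clc1ccc2NC(C)=C(Br)C(=O)c2c1", 6.900),
    (40, "Clc1cc2c(cc1OC)NC(C)=C(CC)C2=O", 7.559),
])
print(demo, end="")
```

A mismatch between the declared count and the number of compound lines is an easy mistake when compounds are added or removed by hand, which is why the check is kept separate from the writer.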

  6. To start your work, download CORALSEA.zip from www.insilico.eu/coral. When the download is complete, place the folder "CORALSEA" on your computer:

  7. …and put your data (i.e. “MyTRNCLBTST.txt”) in the folder “MyCORALSEA”:

  8. The contents of MyCORALSEA are the following:

  9. To carry out a QSPR/QSAR analysis of data prepared for a CLASSIFICATION MODEL, do the following: • Insert “#TRNCLBTST-1.txt” in the folder; • Insert “#Input-1.txt” in the folder; • Click CORALSEA.exe. “#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. “#Input.txt” contains data that are not visible while the model is being built.

  10. This appears on your screen: Click the button “Load method”…

  11. This appears on your screen: Insert the name “#TRNCLBTST-1.txt” in the text box

  12. This appears on your screen: Click “SAVE SYSTEM”

  13. This appears on your screen: Restart the program and click “Load system”

  14. This appears on your screen: Click “OK”

  15. This appears on your screen: This plot relates to the external “invisible” validation set

  16. This appears on your screen: The file “#Output-1.txt” contains statistical characteristics for the validation set (“#Output-1.txt” is placed in the folder “Model”)

  17. To carry out a QSPR/QSAR analysis of data prepared for a REGRESSION MODEL, do the following: • Insert “#TRNCLBTST.txt” in the folder; • Insert “#Input-1.txt” in the folder; • Click CORALSEA.exe. “#TRNCLBTST.txt” is the file that contains the training (TRN), calibration (CLB), and test (TST) sets. “#Input.txt” contains data that are not visible while the model is being built.

  18. This appears on your screen: Insert the name “#TRNCLBTST-1.txt” in the text box. After this, please select “Classic Scheme” or “Balance of Correlation” for your QSPR/QSAR investigation

  19. This appears on your screen: Two actions: (1) Define method and (2) Save method

  20. This appears on your screen: You can involve graph invariants in addition to SMILES attributes

  21. This appears on your screen: You can use the “classic scheme”, the balance of correlations, and the ideal slopes C1, C1’

  22. This appears on your screen: You can choose your mode, e.g. (1) define Dstart=0.25; (2) define Nepoch=20; after this you must (3) click “Save method”, otherwise the method remains the same

  23. This appears on your screen: Click “Search for preferable model (T*,N*)”

  24. This appears on your screen: The program will carry out the Monte Carlo optimization with various thresholds and numbers of epochs. The preferable values of the threshold and the number of epochs can be found in the file “Search/BestMDL.txt” when the calculation is completed.

  25. The contents of the file “search/BestMDL.txt” will be approximately the following: One can see that the preferable threshold (T*) is 2 and the preferable number of epochs (N*) is 15. One can use this information to build a robust model.

  26. An attempt to build a robust model… • Create the folder “MyCORALSEA-T2-N15” (a copy of “MyCORALSEA”); • Run CORALSEA.exe in this folder “MyCORALSEA-T2-N15”; • Click “Load method”
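Copying the working folder under a name that records T* and N* can also be scripted, which helps when several (T*, N*) pairs are tried. The following is a minimal sketch using Python's standard library; the helper name `clone_workdir` is hypothetical, and the demo runs in a temporary directory with a placeholder file rather than a real CORALSEA folder.

```python
import shutil
import tempfile
from pathlib import Path


def clone_workdir(src: Path, threshold: int, n_epochs: int) -> Path:
    """Copy `src` to a sibling folder named '<src>-T<threshold>-N<n_epochs>'."""
    dst = src.with_name(f"{src.name}-T{threshold}-N{n_epochs}")
    # copytree raises FileExistsError if dst already exists, so an
    # earlier run is never silently overwritten.
    shutil.copytree(src, dst)
    return dst


# Demo in a throwaway directory; "placeholder.txt" stands in for the
# real contents of the working folder.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "MyCORALSEA"
    src.mkdir()
    (src / "placeholder.txt").write_text("demo\n")
    dst = clone_workdir(src, threshold=2, n_epochs=15)
    copied = sorted(p.name for p in dst.iterdir())
    print(dst.name, copied)
```

CORALSEA.exe is then started from the copied folder by hand, as the slide describes.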

  27. This appears on your screen (T*=2, N*=15): (1) Insert Nepoch=15; (2) Click “Building up preferable model (T*,N*)”; (3) Insert Threshold=2; and (4) Click “Continue”

  28. This appears on your screen: Click “Yes”

  29. The program gradually calculates the model:

  30. When the model is ready, the screen will look as follows: Click “Save system”

  31. The folder “Model” contains the parameters of the QSPR/QSAR model. The file “#Output-1.txt” contains statistics for the invisible validation set
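The slides do not say which statistics “#Output-1.txt” reports or how the file is laid out. As an illustration only, two measures commonly used to judge a regression model on a validation set are the squared correlation coefficient and the root-mean-square error between observed and predicted endpoints. The sketch below computes both from two plain lists; it does not read any CORALSEA file, and the function name `r2_and_rmse` and the toy numbers are hypothetical.

```python
import math


def r2_and_rmse(observed, predicted):
    """Return (squared Pearson correlation, root-mean-square error)."""
    n = len(observed)
    if n != len(predicted) or n < 2:
        raise ValueError("need two equally long lists with at least 2 values")
    mean_obs = sum(observed) / n
    mean_pred = sum(predicted) / n
    s_op = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    r2 = s_op ** 2 / (s_oo * s_pp)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)
    return r2, rmse


# Toy values (not CORALSEA results): predictions shifted by exactly 1.
r2, rmse = r2_and_rmse([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
print(r2, rmse)
```

The toy case shows why both numbers are worth reading together: a constant offset leaves the correlation perfect while the error stays at 1.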

  32. When the model is ready, the screen will look as follows: Click “Load system”

  33. This will appear on the screen: (1) Insert the name “MyInput.txt” instead of “#Input-1.txt”; (2) Click “Start of DCW and Endpoint calculation for SMILES input file”

  34. This will appear on the screen: After these actions, the file “model/Output.txt” will contain the results of the calculation for the compounds from “MyInput.txt”. Click “OK”

  35. This will appear on the screen: You will see a graphical representation of the sub-training, calibration, test, and validation sets.

  36. The contents of “model/Output.txt” will be the following: Last, but not least…

  37. One can calculate the model for an individual SMILES: (1) Insert the SMILES in the indicated box; (2) Click “Start of DCW and Endpoint Calculation for Inserted SMILES”

  38. This appears on your screen: See the file “Model/DemoDesc.txt”

  39. The contents of “Model/DemoDesc.txt” are the following: DCW is DCW(2,15) for NC(CCCNC(N)=N)C(O)=O; Endpoint=2.9412. This example is only a demo; NC(CCCNC(N)=N)C(O)=O is apparently outside the domain of applicability.

  40. These slides have shown the "technology"; to understand the "philosophy", please read the file "ReadMe.pdf"

  41. Some definitions. Thank you for your attention! CORALSEA TEAM
