MayaChemTools: An open source package for computational discovery Manish Sud

MayaChemTools: An open source package for computational discovery. Manish Sud. COMP Poster #306, 243rd ACS National Meeting & Exposition, March 25-29, 2012, San Diego, CA.

Presentation Transcript


  1. MayaChemTools: An open source package for computational discovery Manish Sud COMP Poster #306, 243rd ACS National Meeting & Exposition, March 25-29 2012, San Diego, CA

  2. Introduction
     • A growing collection of Perl scripts, modules and classes to support day-to-day computational drug discovery needs
     • Freely available under the terms of the LGPL at www.MayaChemTools.org

  3. Introduction
     • Manipulation and analysis of data in SD, CSV/TSV, sequence/alignment, PDB and fingerprints files
     • Properties of periodic table elements, amino acids and nucleic acids
     • Calculation of physicochemical properties such as hydrogen bond donors and acceptors, SLogP and topological polar surface area
     • Generation of fingerprints corresponding to atom neighborhoods, atom types, E-state indices, extended connectivity, MACCS keys, path lengths, topological atom pairs/triplets/torsions and topological pharmacophore atom pairs/triplets
     • Similarity searching and calculation of similarity matrices
     • An extensive set of modules and classes available for custom development

  4. Software architecture
     • bin: Out of the box scripts, Custom scripts
     • lib: Classes, Modules & Packages
     • lib/data, lib/Jmol: Data files, Third party: Jmol

  5. Physicochemical properties profiling

  6. Physicochemical properties profiling
     SD files → CalculatePhysicochemicalProperties.pl → Analyze data & generate plots

  7. Physicochemical properties profiling Distribution of physicochemical properties for a subset (7447) of NCGC pharmaceutical collection data set Scripts used: FilterSDFiles.pl, ExtractFromSDFiles.pl, ExtractFromTextFiles.pl, CalculatePhysicochemicalProperties.pl, Rscript; Data set URL: tripod.nih.gov/npc

  9. 2D Fingerprints
     Atom identifier atom types: Atomic invariants, Functional class, DREIDING, EState, MMFF94, SLogP, SYBYL, TPSA and UFF
     Atomic invariants: AS (Atom symbol), X (Num of heavy atom neighbors), BO (Sum of bond orders to heavy atoms), LBO (Largest bond order to heavy atoms), SB (Num of single bonds to heavy atoms), DB (Num of double bonds to heavy atoms), TB (Num of triple bonds to heavy atoms), H (Num of implicit and explicit hydrogens), Ar (Aromatic), RA (Ring atom), FC (Formal charge), MN (Mass number), SM (Spin multiplicity)
     Functional class: HBD (Hydrogen bond donor), HBA (Hydrogen bond acceptor), PI (Positively ionizable), NI (Negatively ionizable), Ar (Aromatic), Hal (Halogen), H (Hydrophobic), RA (Ring atom), CA (Chain atom)

  11. 2D Fingerprints
      SD files → Generate fingerprints → 2D fingerprints (SD, FP, CSV/TSV)
      Scripts: MACCSKeysFingerprints.pl, ExtendedConnectivityFingerprints.pl, PathLengthFingerprints.pl, TopologicalPharmacophoreAtomPairs.pl, … … …

  12. Fingerprints comparisons
      Fingerprints bit-vectors:
      Na = Num of bits set to "1" in A
      Nb = Num of bits set to "1" in B
      Nc = Num of bits set to "1" in both A and B
      Nd = Num of bits set to "0" in both A and B
      Nt = Num of bits set to "1" or "0" in A or B (the bit-vector size) = Na + Nb - Nc + Nd
      Na - Nc = Num of bits set to "1" in A but not in B
      Nb - Nc = Num of bits set to "1" in B but not in A
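
The counts defined on this slide can be checked with a few lines of code. The following is a minimal Python sketch, not MayaChemTools' Perl implementation: the function names `bit_counts` and `tanimoto` are hypothetical, bit-vectors are represented as plain strings of "0" and "1", and the Tanimoto coefficient Nc / (Na + Nb - Nc) is a standard measure used here for illustration rather than something stated on the slide.

```python
def bit_counts(a: str, b: str) -> dict:
    """Return Na, Nb, Nc, Nd and Nt for two equal-length bit strings."""
    if len(a) != len(b):
        raise ValueError("bit-vectors must have the same size")
    na = a.count("1")  # bits set to "1" in A
    nb = b.count("1")  # bits set to "1" in B
    nc = sum(1 for x, y in zip(a, b) if x == "1" and y == "1")  # "1" in both
    nd = sum(1 for x, y in zip(a, b) if x == "0" and y == "0")  # "0" in both
    return {"Na": na, "Nb": nb, "Nc": nc, "Nd": nd, "Nt": na + nb - nc + nd}


def tanimoto(a: str, b: str) -> float:
    """Tanimoto coefficient Nc / (Na + Nb - Nc); illustrative, not from the slide."""
    c = bit_counts(a, b)
    union = c["Na"] + c["Nb"] - c["Nc"]
    # Returning 1.0 for two all-zero vectors is an assumed convention.
    return c["Nc"] / union if union else 1.0


counts = bit_counts("11010010", "10011000")
print(counts)                        # Nt equals the vector size, 8
print(tanimoto("11010010", "10011000"))
```

Because every bit position falls into exactly one of the four classes (on in both, on in A only, on in B only, off in both), Nt always equals the vector length.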

  14. Fingerprints comparisons
      Fingerprints vectors containing ordered numerical, numerical or alphanumerical values:
      N = Num of values
      SUM = Sum over values
      Xa = Values of vector A
      Xai = Value of ith element in A
      Xb = Values of vector B
      Xbi = Value of ith element in B
      Na = Num of bits set to "1" in A = SUM(Xai)
      Nb = Num of bits set to "1" in B = SUM(Xbi)
      Nc = Num of bits set to "1" in both A and B = SUM(Xai*Xbi)
      Nd = Num of bits set to "0" in both A and B = SUM(1 - Xai - Xbi + Xai*Xbi)
      SetIntersectionXaXb = SUM(MIN(Xai, Xbi))
      SetDifferenceXaXb = SUM(Xa) + SUM(Xb) - SUM(MIN(Xai, Xbi))
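
The vector formulas on this slide translate directly into sums over paired elements. Here is a minimal Python sketch under stated assumptions: the function names are hypothetical, vectors are plain lists of numbers of equal length, and the Na/Nb/Nc/Nd expressions are only meaningful as bit counts when every value is 0 or 1.

```python
def binary_counts(xa, xb):
    """Na, Nb, Nc, Nd for vectors whose values are all 0 or 1."""
    na = sum(xa)                                         # SUM(Xai)
    nb = sum(xb)                                         # SUM(Xbi)
    nc = sum(a * b for a, b in zip(xa, xb))              # SUM(Xai*Xbi)
    nd = sum(1 - a - b + a * b for a, b in zip(xa, xb))  # SUM(1 - Xai - Xbi + Xai*Xbi)
    return na, nb, nc, nd


def set_intersection(xa, xb):
    """SetIntersectionXaXb = SUM(MIN(Xai, Xbi))."""
    return sum(min(a, b) for a, b in zip(xa, xb))


def set_difference(xa, xb):
    """SetDifferenceXaXb = SUM(Xa) + SUM(Xb) - SUM(MIN(Xai, Xbi))."""
    return sum(xa) + sum(xb) - set_intersection(xa, xb)


binary = binary_counts([1, 1, 0, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0, 0, 0])
shared = set_intersection([3, 0, 2, 1], [1, 1, 2, 0])
total = set_difference([3, 0, 2, 1], [1, 1, 2, 0])
print(binary, shared, total)
```

Note that the term 1 - Xai - Xbi + Xai*Xbi factors as (1 - Xai)(1 - Xbi), which is 1 only when both elements are 0.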

  15. Fingerprints comparisons
      Fingerprints vectors containing ordered numerical, numerical or alphanumerical values:
      N = Num of values
      SUM = Sum over values
      Xa = Values of vector A
      Xai = Value of ith element in A
      Xb = Values of vector B
      Xbi = Value of ith element in B
      Na = Num of bits set to "1" in A = SUM(Xai)
      Nb = Num of bits set to "1" in B = SUM(Xbi)
      SetIntersectionXaXb = SUM(MIN(Xai, Xbi))
      SetDifferenceXaXb = SUM(Xa) + SUM(Xb) - SUM(MIN(Xai, Xbi))

  16. Similarity matrices
      Fingerprints (SD, FP, CSV/TSV) → SimilarityMatricesFingerprints.pl → Similarity matrix: full, upper or lower (CSV/TSV)
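
To make the full, upper and lower matrix options concrete, here is a minimal Python sketch. It is an illustration only and does not reproduce SimilarityMatricesFingerprints.pl or its options: the function names, the `mode` parameter, the compound IDs and the use of the Tanimoto coefficient on sets of "on" bit positions are all assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of on-bit positions."""
    union = len(a | b)
    # Returning 1.0 for two empty fingerprints is an assumed convention.
    return len(a & b) / union if union else 1.0


def similarity_matrix_rows(ids, fingerprints, mode="full"):
    """Return (id_i, id_j, similarity) rows for a full, upper or lower matrix.

    The diagonal is included in all three modes.
    """
    if mode not in ("full", "upper", "lower"):
        raise ValueError("mode must be 'full', 'upper' or 'lower'")
    rows = []
    for i, id_i in enumerate(ids):
        for j, id_j in enumerate(ids):
            if mode == "upper" and j < i:
                continue  # skip entries below the diagonal
            if mode == "lower" and j > i:
                continue  # skip entries above the diagonal
            rows.append((id_i, id_j, tanimoto(fingerprints[i], fingerprints[j])))
    return rows


ids = ["cmpd1", "cmpd2", "cmpd3"]            # hypothetical compound IDs
fps = [{1, 2, 3}, {2, 3, 4}, {7}]            # hypothetical on-bit positions
full_rows = similarity_matrix_rows(ids, fps, "full")
upper_rows = similarity_matrix_rows(ids, fps, "upper")
lower_rows = similarity_matrix_rows(ids, fps, "lower")
for row in upper_rows:
    print(",".join(str(value) for value in row))
```

Since the measure is symmetric, the upper and lower forms carry the same information as the full matrix in roughly half the rows.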

  17. Similarity matrices Scripts used: ExtendedConnectivityFingerprints.pl, SimilarityMatricesFingerprints.pl, TextFilesToHTML.pl

  18. Similarity searching
      Reference fingerprints and database fingerprints (SD, FP, CSV/TSV) → SimilaritySearchingFingerprints.pl → Neighbors of reference compounds (SD, FP, CSV/TSV)
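
The workflow on this slide, finding database neighbors of a reference compound, can be sketched as a ranked filter. The Python below is a hedged illustration, not the behavior of SimilaritySearchingFingerprints.pl: the function names, the `cutoff` and `max_hits` parameters, the sample data and the choice of Tanimoto similarity are all assumptions.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient for fingerprints stored as sets of on-bit positions."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0  # assumed convention for empty sets


def find_neighbors(reference: set, database: dict, cutoff=0.5, max_hits=10):
    """Return up to max_hits (id, similarity) pairs with similarity >= cutoff."""
    hits = []
    for compound_id, fingerprint in database.items():
        similarity = tanimoto(reference, fingerprint)
        if similarity >= cutoff:
            hits.append((compound_id, similarity))
    # Most similar first; ties are broken by compound ID for stable output.
    hits.sort(key=lambda hit: (-hit[1], hit[0]))
    return hits[:max_hits]


reference_fp = {1, 2, 3, 4}                  # hypothetical reference fingerprint
database_fps = {                             # hypothetical database fingerprints
    "a": {1, 2, 3, 4},
    "b": {1, 2, 3},
    "c": {1, 2, 9, 10},
    "d": {8},
}
neighbors = find_neighbors(reference_fp, database_fps, cutoff=0.5)
print(neighbors)
```

A cutoff keeps only close analogs, while `max_hits` bounds the output size for large databases; either can be relaxed depending on the screening goal.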

  19. Similarity searching Scripts used: PathLengthFingerprints.pl, SimilaritySearchingFingerprints.pl, SDFilesToHTML.pl

  20. File data info, manipulation & analysis
      Input files: SD
      Operations: Analyze, Extract, Filter, Info, Join, Merge, Modify, ToHTML, ToMOL, Sort, Split
      Output files: SD, CSV/TSV text or HTML
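
As a rough illustration of what extract and filter operations on SD files involve, the Python sketch below splits SD text into records and reads their data fields. It relies only on two facts about the SD format: records end with a `$$$$` line, and each data item starts with a `> <FieldName>` header followed by value lines and a blank line. The function names, the `MolWeight` field and the sample records (with empty placeholder molblocks) are hypothetical, and this is not how FilterSDFiles.pl or ExtractFromSDFiles.pl are implemented.

```python
def split_sd_records(text: str) -> list:
    """Split SD file text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    return records  # text after the last '$$$$' is ignored


def data_fields(record: str) -> dict:
    """Collect '> <Name>' data items of one record into a dict of strings."""
    fields = {}
    lines = record.splitlines()
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.startswith(">") and "<" in line and ">" in line[1:]:
            name = line[line.index("<") + 1 : line.rindex(">")]
            values = []
            i += 1
            while i < len(lines) and lines[i].strip():  # values end at a blank line
                values.append(lines[i])
                i += 1
            fields[name] = "\n".join(values)
        else:
            i += 1
    return fields


# Two abbreviated records with empty placeholder molblocks (illustration only).
sample = """\
ethanol
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MolWeight>
46.07

$$$$
caffeine
  sketch

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <MolWeight>
194.19

$$$$
"""

records = split_sd_records(sample)
light = [r.splitlines()[0] for r in records
         if float(data_fields(r)["MolWeight"]) < 100]
print(light)
```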

  21. File data info, manipulation & analysis
      Input files: CSV/TSV text
      Operations: Analyze, Extract, Info, Join, Merge, Modify, Sort, Split, ToHTML, ToSD
      Output files: CSV/TSV text or HTML

  22. File data info, manipulation & analysis
      Input files: Sequence & alignment
      Operations: Analyze, Extract, Info
      Output files: Sequence & alignment

  23. File data info, manipulation & analysis
      Input files: PDB
      Operations: Extract, Info, Modify
      Output files: PDB

  24. Data retrieval from databases
      Scripts: DBSQLToTextFiles.pl, DBSchemaTablesToTextFiles.pl, DBTablesToTextFiles.pl
      Database access: Perl DBI
      Output: CSV/TSV text files

  25. Information for periodic table elements
      Input: Name, symbol, number, group name/number, group label, period number
      Script: InfoPeriodicTableElements.pl
      Output: Atomic number: 6; Element symbol: C; Element name: Carbon; Atomic weight: 12.0107; … … …

  26. Information for amino acids
      Input: One letter code, three letter code, Name
      Script: InfoAminoAcids.pl
      Output: Three letter code: Glu; One letter code: E; Name: Glutamic acid; Molecular weight: 147.1308; … … …

  27. Information for nucleic acids
      Input: Code, Name, Type
      Script: InfoNucleicAcids.pl
      Output: Code: Ado; Other codes: A; Name: Adenosine; Type: Nucleoside; Molecular weight: 267.2413; … … …
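
The molecular weight reported for adenosine can be reproduced from its molecular formula, C10H13N5O4, by summing atomic weights. The Python sketch below is a worked check, not the code behind InfoNucleicAcids.pl: the carbon weight 12.0107 is the value shown on the periodic table slide, while the weights for H, N and O are commonly tabulated values assumed here, and the function name and the simple formula parser are hypothetical.

```python
import re

# C is taken from the periodic table slide; H, N and O are assumed values.
ATOMIC_WEIGHTS = {"C": 12.0107, "H": 1.00794, "N": 14.0067, "O": 15.9994}


def molecular_weight(formula: str) -> float:
    """Sum atomic weights for a simple formula such as 'C10H13N5O4'.

    Handles element symbols followed by optional counts; no parentheses
    or charges. Raises KeyError for elements missing from ATOMIC_WEIGHTS.
    """
    total = 0.0
    for symbol, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += ATOMIC_WEIGHTS[symbol] * (int(count) if count else 1)
    return total


adenosine_mw = molecular_weight("C10H13N5O4")
print(round(adenosine_mw, 4))  # 267.2413, matching the value on the slide
```

The term-by-term arithmetic is 10 × 12.0107 + 13 × 1.00794 + 5 × 14.0067 + 4 × 15.9994 = 120.107 + 13.10322 + 70.0335 + 63.9976 = 267.24132.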

  28. Your feedback is welcome: msud@san.rr.com

  29. The End
