
Crystallographic Informatics

Crystallographic Informatics. Similarity and Statistics. Simon Coles Associate Professor Director, UK National Crystallography Service Dr Graham Tizzard (UK National Crystallography Service) (Dr) Philip Adler (now Haverford College, PA) ACS Spring Meeting 2016, San Diego.


Presentation Transcript


  1. Crystallographic Informatics Similarity and Statistics Simon Coles Associate Professor Director, UK National Crystallography Service Dr Graham Tizzard (UK National Crystallography Service) (Dr) Philip Adler (now Haverford College, PA) ACS Spring Meeting 2016, San Diego

  2. It’s a Grand Challenge!

  3. Two Approaches • Beyond the Molecule • Conventional wisdom: ‘it’s all about H-bonding’ • Hasn’t ‘shape’ been forgotten in all this hype? • Intermolecular interactions are about so much more than directed ‘bonding’ • Chemical, and hence solid-state, space is sparsely populated… • Build tailor-made families of homologous compounds • Adopt a more holistic, statistical approach

  4. Engineering interactions • Substitute systematically with F. Simple synthesis. Simple shape - stacking dimers? • Direct complementary H…F overlap • Variety of complementary H…F overlap • Frustrated or clashing F…F overlap

  5. F-Substitution Possibilities 100% complementary H…F overlap 100% clashing F…F overlap Varying degree of clashing F…F or overlapping H…F

  6. Some expectations come true 0-23456 345-26 • 100% Complementary H…F Overlap • 100% Clashing F…F Overlap • Varying Clashing & Overlap 25-25 35-35 236-24 245-25

  7. Avoidance tactics? • Tapes - short “side on” interactions (ca. 20 observations) • Threads - short “end on” interactions (25 observations)

  8. Isostructurality • 35-246, 4-2356

  9. Manual comparisons • Unpredicted motifs found - painfully • Some predominant themes occur simultaneously in the same structure • Need a way to automate and scale the process • Independent of ‘traditional’ synthon-type approaches

  10. High Throughput Systematics X Y Cryst. Eng. Comm., 2005

  11. Evaluating Similarity - XPac • Similarity – supramolecular constructs (SCs) • 3D: Isostructurality • 2D: Sheets • 1D: Chains, Tapes • 0D: Discrete (e.g. dimer) • [Figure: similarity diagram for X~I, X~CF3 and CH3~CF3 compounds (X = CF3, I, Br, Cl, F, H), showing constructs C1, C2, C3 and an I-dimer]

  12. What does this look like?

  13. Substituted Mandelic Acids • X = H, F, Cl, Br, I, CF3, Me, OMe • Simple chiral molecule – 2 x H-bond donors, 3 x acceptors • Substituted at 2, 3 and 4 positions • Substituents: no H-bond donors, sterically undemanding, mono-substituted • Cryst. Growth & Des., 2014 (x2)

  14. The Bigger Picture • Part of larger project • Quasiracemates, diastereoisomers and racemate/enantiomers • ~2000 structures

  15. Structural Relationship Plot

  16. A-type Constructs (1) • Based on COOH dimer • Exhibited in 11 structures • 3 x 1D constructs • 1 x 2D construct

  17. Structural Relationship Plot

  18. B-type Constructs (1) • Based on C=O and chain OH dimer • More prevalent than A-type – 20 structures • 2 x 1D constructs

  19. B-type Constructs (2) • B11 basis for 2 x 2D constructs – B21 & B22 • B21 + B22 → B32, largest isostructural group • B21→B31, B22→B33 • B12 basis for 4 x 2D constructs B23 & B24 and hybrid AB21 & AB22

  20. Structural Relationship Plot

  21. AB-type Constructs • 2 x 2D constructs combining A11 & B12 stacks • AB21 – A11 & B12 H-bond via available OH → ABAB… bilayer • AB22 – stacks of A11 + 2 x B12 BAB bilayers linked by Hal-Hal interactions

  22. Polymorphs • 9 substituents yielded polymorphs (so far!) • 7 have no relationship or a common dimer only • 2Cl – common A13 1D dimer chain • 3Br and 3Cl – common B22 2D sheet • 3Cl-1 and 3Cl-2 – isostructural!

  23. 3Cl – Isostructural Polymorphs • Phase transition T dependent • Reversible but subject to hysteresis effect • Bond lengths and angles and lattice overlay near identical

  24. Isostructural 2F and 3F • Isostructural within B32 group • Swapping between ortho and meta gives same structure • Why is isostructurality between 2- and 3-substituted heteroaromatics not more common?

  25. Further Observations • Prevalence of structures based on 10-membered rings vs 8-membered rings • Identify ‘missing’ structures to target with cross-seeding • [Figure labels: 3Cl ‘seed’; 3Me new polymorph]

  26. Chalcones – 55 structures

  27. Acylanilides Y:p-X Y:m-X Y:o-X

  28. Crystallisation Trends • Synthesis and crystallisation: 220 acylanilides, 400 reactions, 300 crystalline samples, 260 XRD data sets, 40 side products • [Figure: substituents plotted against increasing time: CH3, C(Me)3, CF3, NH2, H, OC2H5, OCH3, C2H5, C3H7] • Organic Process Research & Development, 2009

  29. Statistics to the Rescue? • XPac relies on n pairwise comparisons of structures • Can we extract, generate or find features from any/all structures independently? • Build statistical models • Look for correlations • Rationalise sets we already have • Predict what might happen • Correlate features (structure-property, different aspects of structure, etc.) • QSAR for crystal structures?

  30. What does Crystallographic Descriptor Space Look Like? • 1000s of potential descriptors • Develop appropriate descriptors • Molecular (well explored) • Ordinary: a single value • Spectral: calculated a priori and cannot be directly compared

  31. What does Crystallographic Descriptor Space Look Like? • Crystallographers use ‘quantities’ - these are not necessarily statistical descriptors! • Response descriptors must be invariant • OK: Energy calculations, Specific geometry • Dubious: Graph sets, Space groups • Therefore we invented: Graph-based descriptor
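The invariance requirement on this slide can be illustrated with a small sketch. The talk does not define its graph-based descriptor, so the code below is only an assumption-laden stand-in: it treats a structure as a hypothetical contact graph (nodes are molecules, edges are contacts) and uses the sorted degree sequence as a toy descriptor. It also assumes that "invariant" means unaffected by arbitrary choices such as how the nodes are labelled or listed. All names here are illustrative.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def degree_sequence(edges: Iterable[Edge]) -> Tuple[int, ...]:
    """Toy graph descriptor: node degrees sorted in descending order.

    The result depends only on connectivity, not on node names or
    on the order in which the edges are listed.
    """
    degrees: Counter = Counter()
    for a, b in edges:
        degrees[a] += 1
        degrees[b] += 1
    return tuple(sorted(degrees.values(), reverse=True))


# Hypothetical contact graph (not data from the talk).
contacts = [("m1", "m2"), ("m2", "m3"), ("m3", "m1"), ("m3", "m4")]

# The same graph with renamed nodes, reversed edge order and flipped edges.
relabel = {"m1": "q", "m2": "r", "m3": "p", "m4": "s"}
relabelled = [(relabel[b], relabel[a]) for a, b in reversed(contacts)]

original_descriptor = degree_sequence(contacts)
relabelled_descriptor = degree_sequence(relabelled)
print(original_descriptor, relabelled_descriptor)
```

A quantity such as a space-group symbol, by contrast, can change with the chosen setting, which may be one reason the slide lists it as dubious.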

  32. Correlations - n pairwise comparisons, a Big Data problem? • Correlation?! • 1000s of structures
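The scaling concern can be made concrete with a rough count. This is a minimal sketch, assuming every structure is compared once with every other structure; the slides say only "n pairwise comparisons" and do not state how XPac schedules them. The function name is illustrative.

```python
import math


def pairwise_comparisons(n_structures: int) -> int:
    """Number of unordered pairs among n structures: n * (n - 1) / 2."""
    return math.comb(n_structures, 2)


# Set sizes taken from the slides: "1000s of structures" and "~2000 structures".
for n in (1000, 2000):
    print(n, pairwise_comparisons(n))
```

Under that assumption, doubling the number of structures roughly quadruples the number of comparisons, which is the motivation for descriptors that can be computed for each structure independently.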

  33. Cambridge - we have a problem… • Solid-state descriptor space is really quite big • Chemical complexity space is really, really big • We have only just dipped into this space. • There is a vast wide open space - draw a line between SAN and NYC: how much is populated?! • Simply too much uncovered territory for statistics to be meaningful – publishing “negative results” would help (a bit) • Look at ‘constrained’ space (regions we have control over)

  34. Something more tractable? • Trials with F-anil compounds vs melting point • Reasonable model fit, but not 100% conclusive – still a complex problem • So what question are we asking? • Let’s try a yes/no problem instead

  35. Statistical Prediction • Pharmaceutical co-crystal formation – screen design

  36. Getting somewhere? • Training set of co-crystals (CSD) used to find descriptors that discriminate for formation • 3D geometry, complexity of bonding, LogP, shape • Decision tree
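The idea of finding descriptors that discriminate for formation can be sketched with the simplest possible decision tree: a single split (a "stump") chosen by Gini impurity. This is not the model from the talk. The two features, their values and the labels below are invented for illustration, and a real screen would use a full tree trained on CSD-derived data.

```python
from typing import List, Sequence, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary (0/1) labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_split(rows: Sequence[Sequence[float]],
               labels: Sequence[int]) -> Tuple[int, float, float]:
    """Return (feature index, threshold, weighted impurity) of the best split."""
    n = len(rows)
    best = (-1, 0.0, float("inf"))
    for j in range(len(rows[0])):
        values = sorted({r[j] for r in rows})
        # Candidate thresholds are midpoints between adjacent distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2.0
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical descriptors per molecule pair: (logP difference, shape mismatch).
rows = [(0.2, 0.9), (0.5, 0.1), (0.4, 0.8), (2.5, 0.2), (3.1, 0.7), (2.8, 0.3)]
# Hypothetical outcome: 1 = co-crystal formed, 0 = no co-crystal.
labels = [1, 1, 1, 0, 0, 0]

feature, threshold, impurity = best_split(rows, labels)
print(feature, threshold, impurity)
```

On this toy set the split lands on the first feature, which separates the two outcomes completely. With real data the interesting output is which descriptors the tree chooses to split on, since those are the ones that discriminate.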

  37. Conclusions • From experiments, everything is not what it might seem – unexpected isostructurality between family members. • Similarity in unexpected places • Can use an understanding of similarity in solid-state space to engineer that which doesn’t readily form (seeding etc) • For stats to be meaningful we need better sampling of chemical space • Constrained problems beginning to get meaningful answers
