Integrating Informatics with First Principle Calculations: Building a Materials Genome project

The Third International Workshop on DFT Applied to Metals and Alloys. Integrating Informatics with First Principle Calculations: Building a Materials Genome project. Krishna Rajan, NSF International Materials Institute, Combinatorial Sciences and Materials Informatics Collaboratory, CoSMIC-IMI.

  1. The Third International Workshop on DFT Applied to Metals and Alloys. Integrating Informatics with First Principle Calculations: Building a Materials Genome project
     • Krishna Rajan
     • NSF International Materials Institute
     • Combinatorial Sciences and Materials Informatics Collaboratory
     • CoSMIC-IMI
     • Department of Materials Science and Engineering
     • Iowa State University
     • May 3rd 2007

  2. OVERVIEW
     • What is “materials informatics”?
     • What is the link between first-principles calculations and informatics?
     • Mining latent variables, e.g. CASTEP
     • Establish virtual libraries
     • DFT as a source of data for subsequent data mining
     • Clustering analysis of attributes: use DFT for final screening of stability, suggesting crystal structures that have yet to be determined
     • Challenge for international collaboration
     • A virtual DFT toolkit combining data, DFT codes and informatics algorithms
     • CoSMIC infrastructure

  3. WHY MATERIALS INFORMATICS?
     • Potential of informatics:
         • Management of informational complexity
         • Accelerated discovery
         • Identifying new pathways
         • Building new learning communities through cyber-infrastructure
     • Realizing the potential:
         • Data mining and statistical learning
         • Cyber-infrastructure
         • Research platforms
         • Impact on education: a new paradigm for materials education

  4. DATA MINING and KNOWLEDGE DISCOVERY
     [Figure: knowledge-discovery pipeline. Labels: Original Data, Data Warehousing, Transformed Data, Feature Extraction, Patterns, Data Mining & Visualization, Interpretation, Knowledge]
     Reducing the dimensionality of data helps to:
     • Identify the strongest patterns in the data
     • Capture most of the variability of the data with a small fraction of the total set of dimensions
     • Eliminate much of the noise in the data, which benefits both data mining and other data analysis algorithms
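The idea of capturing most of the variability with few dimensions can be made quantitative. As a sketch, assuming a linear reduction such as the PCA introduced on slide 7 (the symbols below are illustrative and do not appear on the slides): if the covariance matrix of the N original descriptors has eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_N ≥ 0, the fraction of total variance retained by the first k components is

```latex
\[
  f_k \;=\; \frac{\sum_{i=1}^{k} \lambda_i}{\sum_{i=1}^{N} \lambda_i},
  \qquad 1 \le k \le N .
\]
```

A value of f_k close to 1 for small k is what "a small fraction of the total set of dimensions" means in practice.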

  5. DATA DRIVEN MATERIALS SCIENCE
     Data + Correlations + Theory = Knowledge Discovery
     • Combinatorial experimentation
     • Digital libraries & databases
     • Atomistic-based calculations
     • Continuum-based theories
     • Materials discovery
     • Structure-property-processing relationships
     • Hidden data trends
     • Data mining
     • Dimensionality reduction
     Information is multivariate, diverse, and very large, and access and expertise are globally distributed.

  6. INFORMATICS BASED DESIGN STRATEGIES (Ideker and Lauffenburger, 2003)

  7. PRINCIPAL COMPONENT ANALYSIS: PCA
     From a set of N correlated descriptors, we can derive a set of N uncorrelated descriptors (the principal components). Each principal component (PC) is a suitable linear combination of all the original descriptors. PCA reduces the dimensionality of large arrays of data with minimal loss of information.
     (From Nature Reviews Drug Discovery 1, 882-894 (2002): "Integration of Virtual and High Throughput Screening", Jürgen Bajorath; and Materials Today: "Materials Informatics", K. Rajan, October 2005.)

  8. Functionality 1 = F(x1, x2, x3, x4, x5, x6, x7, x8, ...)
     Functionality 2 = F(x1, x2, x3, x4, x5, x6, x7, x8, ...)
     ...
     x1 = f(x2)
     x2 = g(x3)
     x3 = h(x4)
     PC1 = A1 x1 + A2 x2 + A3 x3 + A4 x4 + ...
     PC2 = B1 x1 + B2 x2 + B3 x3 + B4 x4 + ...
     PC3 = C1 x1 + C2 x2 + C3 x3 + C4 x4 + ...
     ...
     [Slide labels: I, II, III]
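The construction on this slide, where each PC is a weighted sum of correlated descriptors, can be illustrated for the simplest case of two descriptors, for which the principal axes follow from a single rotation angle. This is a minimal sketch in plain Python, not the procedure used in the talk; the helper `pca_2d` and the toy descriptor values are made up for illustration.

```python
import math


def pca_2d(xs, ys):
    """Closed-form PCA for two descriptors (hypothetical helper, illustration only)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Mean-center each descriptor.
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    # Entries of the 2x2 sample covariance matrix.
    cxx = sum(a * a for a in dx) / (n - 1)
    cyy = sum(b * b for b in dy) / (n - 1)
    cxy = sum(a * b for a, b in zip(dx, dy)) / (n - 1)
    # Rotation angle that diagonalizes the covariance matrix.
    theta = 0.5 * math.atan2(2.0 * cxy, cxx - cyy)
    c, s = math.cos(theta), math.sin(theta)
    # Rows are the weights: PC1 = c*x1 + s*x2, PC2 = -s*x1 + c*x2.
    loadings = [(c, s), (-s, c)]
    pc1 = [c * a + s * b for a, b in zip(dx, dy)]
    pc2 = [-s * a + c * b for a, b in zip(dx, dy)]
    var1 = sum(p * p for p in pc1) / (n - 1)
    var2 = sum(q * q for q in pc2) / (n - 1)
    cov12 = sum(p * q for p, q in zip(pc1, pc2)) / (n - 1)
    return loadings, pc1, pc2, var1, var2, cov12


# Toy, strongly correlated descriptors (made-up numbers).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1]

loadings, pc1, pc2, var1, var2, cov12 = pca_2d(x1, x2)
explained = var1 / (var1 + var2)

print("PC1 weights:", loadings[0])
print("PC2 weights:", loadings[1])
print(f"covariance between PC1 and PC2: {cov12:.2e}")
print(f"fraction of variance carried by PC1: {explained:.4f}")
```

The two scores come out uncorrelated, and nearly all of the variance sits in PC1, which is why one of the two original descriptors is effectively redundant. For more than two descriptors the same idea requires a full eigen-decomposition of the covariance matrix.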

  9. Structure maps shown:
     • Mooser-Pearson map (’59)
     • Miedema map (’73)
     • Villars map (’83)
     • Makino map (’94)

  10. [Flowchart: Stage 1 and Stage 2. Labels: Elemental properties (E.N.)A, (E.N.)B, (V.E.)A, (V.E.)B, (Rs+p)A, (Rs+p)B; Mixing rules; Modeling & calculation; Compound properties (structure descriptors) E.N., V.E., Rs+p; Experimental database; AxBy compounds]
     • For AB2, AB3, A2B3, A3B5: 89 elements × 88 = 7832 compounds
     • For AB compounds: ½ × (89 elements × 88) = 3916 compounds
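The two counts differ only in whether swapping A and B gives a new compound. A short sketch of that arithmetic follows; the element labels are placeholders, and only the number 89 comes from the slide.

```python
from itertools import combinations, permutations

N_ELEMENTS = 89  # number of elements considered on the slide

# Placeholder labels (hypothetical); only the count matters here.
elements = [f"E{i}" for i in range(1, N_ELEMENTS + 1)]

# AB2, AB3, A2B3, A3B5: swapping A and B gives a different compound,
# so every ordered pair of distinct elements is counted.
ordered_pairs = sum(1 for _ in permutations(elements, 2))

# AB: A-B and B-A are the same compound, so unordered pairs are counted.
unordered_pairs = sum(1 for _ in combinations(elements, 2))

print(ordered_pairs)    # 7832
print(unordered_pairs)  # 3916
```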

  11. HIGH DIMENSIONAL STRUCTURE MAPS: 3D
     [Figure labels: Unknown compound; Possible structure type candidates]

  12. CLASSIFICATION in HIGH DIMENSIONAL STRUCTURE MAPS: AB2 Compounds

  13. ASSOCIATION MINING: Establishing association rules for crystal chemistry

  14. TRACKING the PATHWAY for a CRYSTAL STRUCTURE
     • INPUT: AuBe2 elemental parameters
         ∆X = 0.85978
         ΣVE = -0.30361
         ∆R^z_(s+p) = 0.61403
         ∆nav = -1.61573
         ∆n_ws^(1/3) = -0.93887
         2x∆X = -0.14441
         ∆Φ* = -1.68529
     • Structure decision route
     • Possible structure types of AuBe2
     • OUTPUT: Structure type candidates list
         1. MgCu2 (-3.65757 eV)
         2. PbCl2 (-3.60992 eV)
         3. OsGe2 (-3.58157 eV)
         4. CaF2 (-3.46498 eV)
         5. AlB2 (-3.46430 eV)
     • First principles calculations
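The final screening step amounts to ordering the candidate structure types by their calculated energies. The sketch below uses the five values listed on the slide. It assumes the energies are directly comparable and that a lower value means a more stable structure; the slide does not say whether they are per atom or per formula unit.

```python
# Energies (eV) for the AuBe2 candidate structure types, as listed on the slide.
candidates = {
    "CaF2": -3.46498,
    "MgCu2": -3.65757,
    "AlB2": -3.46430,
    "OsGe2": -3.58157,
    "PbCl2": -3.60992,
}

# Assumption: lower (more negative) energy means more stable.
ranked = sorted(candidates.items(), key=lambda item: item[1])

# Energy of each candidate relative to the lowest-energy one.
e_min = ranked[0][1]
relative = [(name, round(energy - e_min, 5)) for name, energy in ranked]

for rank, (name, delta) in enumerate(relative, start=1):
    print(f"{rank}. {name}: +{delta:.5f} eV above the lowest candidate")
```

Listing the differences rather than the absolute values makes it easier to see that CaF2 and AlB2 are nearly degenerate in this data.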

  15. Crystal Structure of Spinel (AB2X4)
     [Figure: spinel unit cell with A, B and X sites labeled]

  16. CRYSTAL CHEMISTRY DESIGN (Ching et al., J. Amer. Ceram. Soc. 85, 75-80 (2002))
     1. Assess the influence of latent variables (i.e. electronic structure parameters) on properties of known data
     2. Establish heuristic relationships on a database of all input variables, instead of phenomenological relationships in a bivariate manner
     3. Use statistical learning to predict new materials behavior on new multivariate input data
     4. Use an inverse-problem approach to formulate quantitative structure-property relationships

  17. Lattice sites along the cubic unit cell body diagonal in the spinel, AB2X4
     • Bond length (along body diagonal of unit cell, Wyckoff notation)
     • Polyhedral volume
     • Interbond angle: A-X-B, X-A-X, B-X-B, X-B-X, A-X-A

  18. Combinatorial selection of Spinel Nitride (AB2N4)
     • Bond length
     • Interbond angle: A-X-B, X-A-X, B-X-B, X-B-X, A-X-A
     • Polyhedral volume

  19. QSAR

  20. The intensity of a DOS peak increases with its distance from the origin of the PCA plot.
     From the plot we can quickly determine:
     1) Co-Al and Ni-Al alloys have DOS mostly below EF, while Ti-Al alloys and Al have DOS mostly above EF.
     2) DOS at EF is highest for Co3Al and the Ti-Al alloys.
     All of these conclusions are correct. This demonstrates the usefulness of PCA for screening these points, and it becomes more useful as the number of systems examined increases.
     [Plot labels: Co3Al; TiAl/TiAl3; Al; CoAl/NiAl; Ni3Al. Blue: E < EF; red: E > EF]

  21. Benefits of using PCA to examine entire DOS curves:
     • Accomplished:
         1) Remove energies that only provide background, leaving fewer energies to consider
         2) New way to visualize DOS curves: many figures are converted to one figure, which is much more convenient once its interpretation is understood
         3) Classify alloy structures based on quantum structure
         4) Quickly screen and visualize which DOS peaks are below or above EF, and determine which alloys have the largest DOS value at EF
     • Longer term:
         1) Use the plot to quickly visualize other effects, such as peak shifts, double peaks, etc.
         2) Compare with other properties to determine which DOS values, at which energies, determine those properties
         3) Understand many more features of the DOS curve

  22. Normalize the DOS curves by shifting them so that EF of each alloy is at the same energy.
     Reason: two quantities are commonly examined in the literature on DOS curves, the DOS relative to EF and the DOS value at EF.
     • We can quickly screen whether an alloy’s DOS lies primarily above or below EF.
     • Among a series of alloys, we can say which alloys have a high DOS value at EF.
     [Plot label: EF]
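The shift described on this slide can be sketched in a few lines: subtract each alloy's EF from its energy grid, then read off the two quantities of interest. The helper names and the toy curves below are hypothetical and are not data from the talk; a uniform energy grid is assumed so that summing DOS values approximates integrating them.

```python
def shift_to_fermi(energies, e_fermi):
    """Return energies measured relative to the Fermi level (E - EF)."""
    return [e - e_fermi for e in energies]


def fraction_below_fermi(rel_energies, dos):
    """Fraction of the total DOS weight at E < EF (uniform grid assumed)."""
    total = sum(dos)
    below = sum(d for e, d in zip(rel_energies, dos) if e < 0)
    return below / total


def dos_at_fermi(rel_energies, dos):
    """DOS value at the grid point closest to EF."""
    idx = min(range(len(rel_energies)), key=lambda i: abs(rel_energies[i]))
    return dos[idx]


# Toy curves on different absolute energy scales (made-up numbers).
curves = {
    "alloy_a": {"energies": [4.0, 5.0, 6.0, 7.0, 8.0],
                "dos": [3.0, 2.0, 1.0, 0.5, 0.5], "ef": 7.0},
    "alloy_b": {"energies": [-2.0, -1.0, 0.0, 1.0, 2.0],
                "dos": [0.5, 0.5, 2.0, 3.0, 1.0], "ef": -1.0},
}

summary = {}
for name, curve in curves.items():
    rel = shift_to_fermi(curve["energies"], curve["ef"])
    summary[name] = (fraction_below_fermi(rel, curve["dos"]),
                     dos_at_fermi(rel, curve["dos"]))

for name, (frac, at_ef) in summary.items():
    side = "below" if frac > 0.5 else "above"
    print(f"{name}: DOS mostly {side} EF, DOS at EF = {at_ef}")
```

After the shift, EF sits at zero for every alloy, so curves that started on different absolute energy scales can be compared point by point.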

  23. Schematic Procedure of Crystal Structure Prediction

  24. Computational Informatics
     • Searching for patterns of behavior among multivariate data sets
     • Can pattern recognition lead to predictions?
     • Establish new correlations
     • Identify outliers
     • Enlarge database / virtual libraries
     • Evaluate databases
     • Establish predictions

  25. • Objective / Rationale:
         • Accelerated discovery of new materials, processes and mechanisms using informatics / combinatorial methods
         • Global “E-science” virtual laboratory and educational portal
         • International research / learning community
     • International Materials Network:
         • Database and data mining network
         • Materials domain networks
         • IMI network
         • Network of NSF leveraged programs:
             • Intl. professional materials societies / Intl. agencies network
             • Intl. student exchange / research network / diversity
     • Approaches:
         • “Rational drug design” approach to materials discovery (in-silico materials science)
         • Combinatorial high-throughput screening
         • Informatics and data mining
     • Cyber infrastructure:
         • Ultra large scale databases
         • Data sharing and high performance computing
     • Research / Education Accomplishments:
         • Materials discovery
         • Materials education
         • Informatics for materials science
     http://www.mse.iastate.edu/cosmic
