
Looking for a (standard) Common Format for (Quantum)

Motivation, Vocabulary, Wrappers. Looking for a (standard) Common Format for (Quantum) Computational Chemistry. A WG activity within COST action 23 (WG D23/0006/01). Elda Rossi, Andrew Emerson – CINECA. Gian Luigi Bendazzoli, Antonio Monari – Università di Bologna.


Presentation Transcript


  1. Looking for a (standard) Common Format for (Quantum) Computational Chemistry
     A WG activity within COST action 23 (WG D23/0006/01)
     • Elda Rossi, Andrew Emerson – CINECA
     • Gian Luigi Bendazzoli, Antonio Monari – Università di Bologna
     • Renzo Cimiraglia, Celestino Angeli, Stefano Borini – Università di Ferrara
     • Daniel Maynau, Stefano Evangelisti – IRSAMC, Toulouse
     • José Sanchez-Marin – Universitat de Valencia
     • Peter Szalay – Eötvös Loránd University
     • Rosa Caballol – Universitat Rovira i Virgili, Tarragona

  2. Motivation for the work
     To build a meta-system for supporting research collaboration in the field of “Localised Orbitals in post-SCF methods … Linear Scaling methods in a Multi-Reference context”

  3. The scenario
     • Different laboratories need to collaborate
     • Different “home-made” codes need to be used together, since they give different views of the same problem
     • General-purpose “basic” codes are needed to pre-compute data in a sort of pipeline
     • Programmes should remain on their original sites under the responsibility of their authors
     • Different platforms
     • Network connections (grid architecture)
     • Workflow

  4. The need for a Common Format
     The first problem we faced: how can different codes (on different platforms) communicate?
     We need a Common Format for (at least) Quantum Chemistry codes.

  5. Preliminary steps
     • Looking around …
     • CML has been available for a long time
     • XML is used by Accelrys for internal files
     • XML is used by ArgusLab for internal files
     None of them is completely suited to computational chemistry: they cover mainly structural chemistry, with no Quantum Chemistry properties.
     • XML seems the best technology, so we decided to try another XML-based format
     • HDF5 looked nice for storing the large binary data typical of QC

  6. How the engine should work
     • Leaves the program unchanged
     • One wrapper for each program: if a code is added, only one wrapper has to be written
     Diagram: Data Repository (XML/HDF) → IN-wrapper → IN-files → Program → OUT-files → OUT-wrapper → Data Repository (XML/HDF)
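The wrapper idea on this slide can be sketched in Python as follows. This is only an illustration under stated assumptions: the repository is modelled as a plain dictionary rather than an XML/HDF store, the function name `run_wrapped` is hypothetical, and the "program" is a stand-in one-liner instead of a real chemistry code.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_wrapped(repository, in_wrapper, command, out_wrapper):
    """Run an unmodified program between an IN-wrapper and an OUT-wrapper.

    `repository` is a dict standing in for the XML/HDF data repository
    (an assumption made for this sketch).
    """
    with tempfile.TemporaryDirectory() as workdir:
        in_file = Path(workdir) / "program.in"
        out_file = Path(workdir) / "program.out"
        # IN-wrapper: common repository -> native input file.
        in_file.write_text(in_wrapper(repository))
        # The program itself is untouched; it only sees its native files.
        with in_file.open() as stdin, out_file.open("w") as stdout:
            subprocess.run(command, stdin=stdin, stdout=stdout, check=True)
        # OUT-wrapper: native output file -> new entries for the repository.
        repository.update(out_wrapper(out_file.read_text()))
    return repository


# Stand-in "program": reads its input and prints it in upper case.
toy_program = [sys.executable, "-c", "import sys; print(sys.stdin.read().upper())"]

repo = run_wrapped(
    {"title": "water"},
    in_wrapper=lambda r: r["title"],
    command=toy_program,
    out_wrapper=lambda text: {"result": text.strip()},
)
```

Adding a new code to the meta-system then means supplying one new `in_wrapper`/`out_wrapper` pair, while the program and the repository stay as they are.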

  7. QCML: an XML format for QC
     • In order to be as general as possible, we need to write down a hierarchical schema of Quantum Chemistry quantities
     • As a first approximation, three domains can be identified:
     • Base FACTS: initial data for describing the physics of the system
     • DERIVED: quantities computed from FACTS using QC algorithms (energies, properties, integrals, coefficients, …)
     • W-FLOW: which codes are in the pipeline, specific input parameters, …
     • A base fact is a fact that is a given in the world and is remembered (stored) in the system.
     • A derived fact is created by an inference or a mathematical calculation from terms, facts, other derivations, or even action assertions.

  8. FACT: molecule
     <system title date program author>
       <molecule nElectrons charge spinMultiplicity spaceSymmetry>
         <symmetry groupName/>
         <geometry type unit numAtoms symmetryRef>
           <atom symbol isotope x3 y3 z3/>
         <basis name type numOrbitals>
           <atomBase angularMomMAX symbol>
             <angularMom value symbol numOrbitals>
               <orbital id numPrimitives>
                 <exps/>
                 <coeffs/>
     • Symmetry: group name & other symmetry data
     • Geometry: only Cartesian, full or unique for symmetry
     • Basis: by name or fully defined
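As an illustration of the FACT domain, a document using the element and attribute names listed on this slide could be built with Python's standard `xml.etree.ElementTree`. The nesting, the attribute values and the water geometry are assumptions chosen for this sketch; they are not taken from the QCML schema itself.

```python
import xml.etree.ElementTree as ET

# Element and attribute names follow the slide; all values are illustrative.
system = ET.Element("system", title="water, illustrative values", program="none")
molecule = ET.SubElement(
    system, "molecule",
    nElectrons="10", charge="0", spinMultiplicity="1", spaceSymmetry="A1",
)
ET.SubElement(molecule, "symmetry", groupName="C2v")
geometry = ET.SubElement(molecule, "geometry", type="full", unit="angstrom", numAtoms="3")

# Approximate Cartesian coordinates, used here only as example data.
atoms = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.469),
    ("H", 0.0, -0.757, -0.469),
]
for symbol, x, y, z in atoms:
    ET.SubElement(geometry, "atom", symbol=symbol, x3=str(x), y3=str(y), z3=str(z))

# Basis given "by name", one of the two options mentioned on the slide.
ET.SubElement(molecule, "basis", name="STO-3G")

qcml_text = ET.tostring(system, encoding="unicode")
```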

  9. DERIVED data: computedData
     <system …>
       <computedData>
         <energy unit levelOfTheory quality value>
           <state spaceSymmetry spinMultiplicity excitationLevel/>
         <property unit levelOfTheory quality value>
           <state “bra” spaceSymmetry spinMultiplicity excitationLevel/>
           <state “ket” spaceSymmetry spinMultiplicity excitationLevel/>
           <operator order name/>
         <file address URL/>
     A “schema” has been written for QCML
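A reader of the DERIVED domain might look like the sketch below. The sample document is hypothetical: the attribute names come from the slide, while the values, the file URL and the helper name `read_energies` are assumptions made for illustration. Note that the large array is only referenced through `<file>`, not embedded.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: attribute names from the slide, values invented.
QCML = """
<system title="illustrative example">
  <computedData>
    <energy unit="hartree" levelOfTheory="SCF" quality="converged" value="-74.96">
      <state spaceSymmetry="A1" spinMultiplicity="1" excitationLevel="0"/>
    </energy>
    <file address="/MO/bi" URL="file:///scratch/example.h5"/>
  </computedData>
</system>
"""


def read_energies(xml_text):
    """Return (levelOfTheory, value, unit) for every <energy> element."""
    root = ET.fromstring(xml_text)
    return [
        (e.get("levelOfTheory"), float(e.get("value")), e.get("unit"))
        for e in root.iter("energy")
    ]


def read_file_references(xml_text):
    """Return (URL, address) pairs pointing at externally stored binary data."""
    root = ET.fromstring(xml_text)
    return [(f.get("URL"), f.get("address")) for f in root.iter("file")]


energies = read_energies(QCML)
references = read_file_references(QCML)
```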

  10. DERIVED: computedData/file
     Two possible strategies:
     • Leave data in their native format and translate them only when needed; maintain different versions (formats) of the same data
     • Define a “standard” format for binary data and convert them anyway
     • Problem with large binary datasets: include the reference, not the actual data
     • The second was the solution of choice
     • HDF5 appears to be a good solution

  11. HDF Mission
     To develop, promote, deploy, and support open and free technologies that facilitate scientific data storage, exchange, access, analysis and discovery.
     • Format and software for scientific data
     • Stores images, multidimensional arrays, tables, etc.
     • Emphasis on storage and I/O efficiency
     • Free and commercial software support
     • Emphasis on standards
     • Users from many engineering and scientific fields

  12. Example HDF5 file
     Diagram labels: “/” (root), “/AO”, “/MO”, “/mono”, “/bi”, “/coefficients”; Property, Overlap, Repulsion, Kinetic, Kinetic+Repulsion; 4-D array; Table:
     Orb | occ | energy
     ----|-----|-------
     1   | 0   | 0.35
     2   | 0.5 | 0.26
     3   | 2.  | 0.69

  13. HDF file structure for QC
     • Root: Norb, Name, QCML_ref
     • AO: Norb, <i/j>, <i/T/j>, <i/Vnuc/j>, <i/T/j>+<i/Vnuc/j>, <ij/kl>
     • MO: <i/T/j>, <i/V/j>, <i/T/j>+<i/Vnuc/j>, <ij/kl>, coeff(i,j)
     • Property: <i/p/j>
     • Spin Polar.: a=b, a, b
     • Orb Classif: Core, Active, Virtual
     • Orb Energies
     • Orb Symm
     • [1-order]
     + format metadata (integer, binary, endianness, …)
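To see why the two-electron arrays are kept out of the XML, a rough size estimate helps. The sketch below assumes dense, double-precision arrays with no symmetry packing, and the path names are one possible reading of the diagram labels (`/AO`, `/MO`, `/mono`, `/bi`, `/coefficients`), not a confirmed layout.

```python
BYTES_PER_REAL = 8  # assumption: double-precision values


def planned_layout(norb):
    """Map hypothetical dataset paths to array shapes for `norb` orbitals."""
    one_electron = (norb, norb)              # e.g. <i/T/j>
    two_electron = (norb, norb, norb, norb)  # <ij/kl>, the 4-D array
    return {
        "/AO/mono": one_electron,
        "/AO/bi": two_electron,
        "/MO/mono": one_electron,
        "/MO/bi": two_electron,
        "/MO/coefficients": one_electron,    # coeff(i,j)
    }


def size_in_bytes(shape):
    """Size of a dense array of reals with the given shape."""
    count = 1
    for extent in shape:
        count *= extent
    return count * BYTES_PER_REAL


layout = planned_layout(100)
mono_bytes = size_in_bytes(layout["/MO/mono"])  # 100**2 * 8 bytes
bi_bytes = size_in_bytes(layout["/MO/bi"])      # 100**4 * 8 bytes
```

With 100 orbitals a one-electron matrix takes 80 kB, while a dense two-electron array takes 800 MB, which is the kind of data better stored in a binary container and referenced from the XML.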

  14. QCML processing: wrappers
     • One pair of wrappers for each code in the meta-system
     • They should be written and maintained by the authors of the chemical codes
     • XML processing (DOM) can be used, but in what language?
     • Fortran: no easy and stable DOM available
     • Scripting languages (Perl/Python/Java): not known by chemists
     • We tried both ways (Fortran & Python)
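For comparison with the Fortran route, DOM-style processing in Python needs only the standard library. The sketch below uses `xml.dom.minidom`; the sample document and the helper name `read_atoms` are assumptions for illustration, with attribute names borrowed from the molecule slide.

```python
from xml.dom import minidom

# Hypothetical QCML fragment with illustrative values.
QCML = """<system title="illustrative">
  <molecule charge="0" spinMultiplicity="1">
    <geometry unit="angstrom" numAtoms="2">
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.74"/>
    </geometry>
  </molecule>
</system>"""


def read_atoms(xml_text):
    """Return [(symbol, (x, y, z)), ...] using plain DOM calls."""
    document = minidom.parseString(xml_text)
    atoms = []
    for atom in document.getElementsByTagName("atom"):
        position = tuple(
            float(atom.getAttribute(axis)) for axis in ("x3", "y3", "z3")
        )
        atoms.append((atom.getAttribute("symbol"), position))
    return atoms


atoms = read_atoms(QCML)
```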

  15. Fortran DOM: drawbacks
     • The only problem is the Fortran binding
     • It doesn’t exist (at least as of last year …)
     • DOM is OO and Fortran is not
     • A C binding exists (Gdome2)
     • Gdome2 was installed, with very hard work, on a mainframe platform (it was conceived for Linux)
     • We are currently converting it to Fortran, by adopting the DOM recommendations (simplified …)

  16. Why Fortran
     GOOD
     • Users don't need to learn a new language
     • Homogeneous environment
     BAD
     • Tricky: needs an external library (f77xml) built on top of gdome2
     • Porting problems for gdome2/libxml2 may arise

  17. F77xml library
     Still in development
     • v0.4 is out (experimental, with limited features)
     • v1.0 upcoming, API changed to be nearly DOM2 compliant
     • Written in C on top of gdome2 (http://gdome2.cs.unibo.it/index.html)
     • Designed for interfacing to F77 (also F90 soon)
     • Reduced namespace pollution
     Cons:
     • F77 syntax is difficult (DOM2 + tricks)
     • F90 syntax is simpler
     • A pre-processor will convert F90 syntax to F77
     http://freshmeat.net/projects/f77xml

  18. F77xml library - V1.0 example
     Gdome2 (C):
       GdomeNode* gdome_el_firstChild (GdomeElement *self, GdomeException *exc);
     F90:
       Call f77xml_el_firstChild(nodeCode, elemCode, exc)
       First position: return value
       nodeCode, elemCode, exc mapped to INTEGER
     F77:
       Func='el_firstChild'
       Call xp3t1(nodeCode,func,elemCode,exc)
     Multiplexer function x:
     • p3: 3 parameters (+ function name)
     • t1: type 1 parameter schema (code/code/error)
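The two tricks on this slide, object handles mapped to integers and one multiplexer routine selected by function name, can be mimicked in Python to show the idea. This is not the f77xml implementation: the handle table, the `register` helper and the error convention (0 for success, 1 for failure) are assumptions of this sketch.

```python
from xml.dom import minidom

# Integer code -> DOM node, so a language without objects can hold "handles".
handles = {}


def register(node):
    """Store a node and return the integer code that stands for it."""
    code = len(handles) + 1
    handles[code] = node
    return code


def el_first_child(elem_code):
    """Integer-in, integer-out counterpart of a DOM firstChild call."""
    child = handles[elem_code].firstChild
    return register(child) if child is not None else 0


# One entry per "type 1" function: (code in) -> (code out, error flag).
DISPATCH = {"el_firstChild": el_first_child}


def xp3t1(func, elem_code):
    """Multiplexer: pick the routine by name, return (node_code, exc)."""
    try:
        return DISPATCH[func](elem_code), 0
    except KeyError:
        # Unknown function name or unknown handle.
        return 0, 1


document = minidom.parseString("<a><b/></a>")
root_code = register(document.documentElement)
node_code, exc = xp3t1("el_firstChild", root_code)
bad_code, bad_exc = xp3t1("no_such_function", root_code)
```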

  19. Why Python
     GOOD
     • Very easy object-oriented language
     • Works well with strings
     • Simple and efficient DOM interface for XML
     • Present in almost all UNIX/LINUX distributions
     BAD
     • Users do need to learn a new language
     • Maybe less powerful than Perl
     • Usually not used by chemists

  20. Python Wrapper
     At present, a prototype works with the MOLPRO-FCI chain:
     • It takes information from the XML repository
     • Writes the proper MOLPRO and FCI input
     • Starts the two programs
     With a different XML file, users only need to specify the file name and some simple parameters (orbital guess for FCI)
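The input-writing step of such a wrapper could be sketched as below. The template is a generic plain-text layout invented for this example; it is not actual MOLPRO or FCI input syntax, and the function name `write_input` and the `orbital_guess` parameter are likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical native input layout (not real MOLPRO/FCI syntax).
TEMPLATE = """title: {title}
charge: {charge}
multiplicity: {multiplicity}
geometry ({unit}):
{atoms}
orbital guess: {orbital_guess}
"""


def write_input(xml_text, orbital_guess="default"):
    """Turn a QCML-like document into the text of a native input file."""
    system = ET.fromstring(xml_text)
    molecule = system.find("molecule")
    geometry = molecule.find("geometry")
    atom_lines = "\n".join(
        "  {symbol} {x3} {y3} {z3}".format(**atom.attrib)
        for atom in geometry.findall("atom")
    )
    return TEMPLATE.format(
        title=system.get("title"),
        charge=molecule.get("charge"),
        multiplicity=molecule.get("spinMultiplicity"),
        unit=geometry.get("unit"),
        atoms=atom_lines,
        orbital_guess=orbital_guess,
    )


# Illustrative repository entry; values are example data only.
QCML = """<system title="H2 example">
  <molecule charge="0" spinMultiplicity="1">
    <geometry unit="angstrom" numAtoms="2">
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.0"/>
      <atom symbol="H" x3="0.0" y3="0.0" z3="0.74"/>
    </geometry>
  </molecule>
</system>"""

input_text = write_input(QCML, orbital_guess="core")
```

Switching to another molecule only changes the XML passed in; the user-supplied part shrinks to the file name and a few parameters such as the orbital guess.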

  21. Python or not
     • Python is very simple to learn and works very efficiently with XML
     • Scripts written in Python (at least for prototypes) are quite clear, linear and easy to maintain or upgrade
     • The possibility of a GUI could make our project much more user-friendly

  22. What we have done …
     Single platform: IBM SP4
     Two code chains:
     • MolPro to FCI
     • MolPro to CasDI
     Diagram labels: Start here; MolPro IN-file; IN-wrapper; MolPro; OUT-wrapper; FCIDUMP; QCML Repository; HDF5 Repository; IN-wrapper; Bin file for FCI; FCI IN-file; IN-wrapper; FCI; Stop here

  23. In conclusion …
     Two important hints on data:
     • Use some XML dialect for describing simple structured data
     • Use HDF5 for storing large arrays and binary data
     Open issues:
     • Need for a good and easy API to XML & HDF
     • How to manage the workflow
     • How to manage the grid connection
