
Preliminary CPMD Benchmarks


Presentation Transcript


  1. Preliminary CPMD Benchmarks On Ranger, Pople, and Abe TG AUS Materials Science Project Matt McKenzie LONI

  2. What is CPMD?
  • Car-Parrinello Molecular Dynamics
  • www.cpmd.org
  • Parallelized plane-wave / pseudopotential implementation of Density Functional Theory
  • Common chemical systems: liquids, solids, interfaces, gas clusters, reactions
  • Handles large systems, ~500 atoms
  • Cost scales with the number of electrons, NOT the number of atoms

  3. Key Points in Optimizing CPMD
  • Developers have done a lot of work here
  • The Intel compiler is used in this study
  • BLAS/LAPACK
    • BLAS levels 1 (vector ops) and 3 (matrix-matrix ops)
    • Some level 2 (vector-matrix)
  • Integrated optimized FFT library
    • Compiler flag: -DFFT_DEFAULT

  4. Benchmarking CPMD is difficult because…
  • The nature of the modeled chemical system
    • Solids, liquids, and interfaces require different parameters, stressing memory along the way
    • Volume and number of electrons matter
  • The choice of pseudopotential (psp)
    • Norm-conserving, 'soft', non-linear core correction (++memory)
  • The type of simulation conducted
    • CPMD, BOMD, Path Integral, Simulated Annealing, etc.
  • CPMD is a robust code, but results are very specific to the chemical system
    • Any one CPMD simulation cannot easily be compared to another
    • However, THERE ARE TRENDS
  • FOCUS: simple wave-function optimization timing
    • This is a common ab initio calculation

  5. Probing Memory Limitations
  • For any ab initio calculation:
    • Accuracy is proportional to the number of basis functions used
    • These are stored in matrices, requiring increased RAM
  • The energy cutoff determines the size of the plane-wave basis set (see the sketch below):
    N_PW = (1/2π²) Ω E_cut^(3/2)
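As a quick illustration, here is a minimal Python sketch of this relationship applied to the 63-atom Si benchmark from slide 7. The supercell volume is an assumption (a 2×2×2 supercell of the 5.43 Å silicon unit cell), so the absolute counts only roughly track the slide's reported N_PW ≈ 134,000 and ≈ 222,000; the 70/50 ratio, (70/50)^(3/2) ≈ 1.66, is exact.

```python
import math

def n_plane_waves(omega_bohr3: float, ecut_ry: float) -> int:
    """Plane-wave count from the slide's formula:
    N_PW ~ (1 / (2 * pi^2)) * Omega * Ecut^(3/2),
    with the cell volume Omega in bohr^3 and the cutoff Ecut in Rydberg."""
    return round(omega_bohr3 * ecut_ry ** 1.5 / (2.0 * math.pi ** 2))

# Assumed cell: 2x2x2 supercell of the 5.43 Angstrom Si unit cell, in bohr^3.
omega = (2.0 * 5.43 / 0.52918) ** 3

for ecut in (50.0, 70.0):
    print(f"Ecut = {ecut:.0f} Ryd -> N_PW ~ {n_plane_waves(omega, ecut):,}")
```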

  6. Model Accuracy & Memory Overview
  [Image from the CPMD user's manual: pseudopotential convergence behavior w.r.t. basis-set size (cutoff)]
  • NOTE: The choice of psp is important; i.e., a 'softer' psp = lower cutoff = loss of transferability
  • VASP specializes in soft psp's; CPMD works with any psp

  7. Memory Comparison: Ψ optimization, 63 Si atoms, SGS psp
  • Ecut = 50 Ryd: N_PW ≈ 134,000, Memory = 1.0 GB
  • Ecut = 70 Ryd: N_PW ≈ 222,000, Memory = 1.8 GB
  • This is a well-known CPMD benchmarking model: www.cpmd.org
  • Results can be shown either by (see the sketch after this list):
    • Wall time = (n steps × iteration time/step) + network overhead
      • Typical results / interpretations, nothing new here
    • Iteration time = the fundamental unit, used throughout any given CPMD calculation
      • It neglects the network, yet results are comparable
      • Note: CPMD runs well on a few nodes connected with gigabit Ethernet
  • Two important factors affect CPMD performance:
    • MEMORY BANDWIDTH
    • FLOATING-POINT performance
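The two timing views on this slide can be summarized in a short sketch; the step count, iteration time, and overhead below are hypothetical values, used purely to show how the decomposition works.

```python
def wall_time(n_steps: int, t_iter: float, overhead_s: float = 0.0) -> float:
    """Wall time per the slide: (n steps x iteration time/step) + network overhead."""
    return n_steps * t_iter + overhead_s

# Hypothetical numbers (not measured): 100 optimization steps at 12.5 s/step
# plus 30 s of network overhead.
print(f"wall time ~ {wall_time(100, 12.5, 30.0):.0f} s")  # -> 1280 s

# Comparing machines on t_iter alone drops the overhead term, which is why
# iteration time neglects the network yet remains comparable across runs.
```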

  8. Pople, Abe, Ranger CPMD Benchmarks

  9. Results I
  • All calculations ran no longer than 2 hours
  • Ranger is not the preferred machine for CPMD
  • Scales well between 8 and 96 cores
    • This is a common CPMD trend
  • CPMD is known to scale super-linearly above ~1000 processors
    • We will look into this
    • The chemical system would have to change, as this smaller simulation is unlikely to scale in that manner

  10. Results II
  • Pople and Abe gave the best performance
    • IF a system requires more than 96 procs, Abe would be a slightly better choice
  • Given the difficulties in benchmarking CPMD (psp, volume, system phase, simulation protocol), this benchmark is not a good representation of all possible uses of CPMD
    • It explores only one part of the code
  • How each system performs when taxed with additional memory requirements is a better indicator of CPMD's performance
    • To increase system accuracy, increase Ecut

  11. Percent Difference between 70 and 50 Ryd
  %Diff = [(t70 − t50) / t50] × 100
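A one-function sketch of this metric; the two iteration times below are hypothetical and serve only to exercise the formula.

```python
def pct_diff(t50: float, t70: float) -> float:
    """%Diff = [(t70 - t50) / t50] * 100, per the slide."""
    return (t70 - t50) / t50 * 100.0

# Hypothetical iteration times in seconds (not measured values):
print(f"{pct_diff(t50=10.0, t70=14.2):.1f} %")  # -> 42.0 %
```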

  12. Conclusions
  RANGER
  • Re-ran the Ranger calculations
  • Lower performance may be linked to the Intel compiler on AMD chips
    • The PGI compiler could show an improvement
    • Nothing over 5% is expected; Ranger would still be the slowest
    • We wanted to use the same compiler/math libraries on all machines
  ABE
  • Possible super-linear scaling: t_Abe(256 procs) < t_others(256 procs)
  • Memory-size effects hinder performance below 96 procs
  POPLE
  • The best system for wave-function optimization
  • Shows a (relatively) stable, modest slowdown as the memory requirement is increased; it is the recommended system

  13. Future Work
  • Half-node benchmarking
  • Profiling tools
  • Test the MD part of CPMD
    • Force calculations involving the non-local parts of the psp will increase memory
    • Extensive level-3 BLAS & some level 2
    • Many FFT all-to-all calls; now the network plays a role
    • Memory > 2 GB
  • A new variable! Monitor the fictitious electron mass
  • Changing the model
    • Metallic system (lots of electrons; change of psp and Ecut)
    • Check super-linear scaling
