
ACESIII

Outline: Design philosophy, Implementation, Results, Conclusions. Collaborators: Mr. Mark Ponton, ACES Q. C. (SIP/SIAL/Compiler); Dr. Norbert Flocke, QTP (Integral package); Dr. Erik Deumens, QTP (Architect); Dr. Ajith Perera, QTP; Dr. H. Lei, ACES Q. C. (Compiler); Dr. Anthony Yau, HPTi


Presentation Transcript


  1. ACESIII. Outline: Design philosophy, Implementation, Results, Conclusions. Collaborators: Mr. Mark Ponton, ACES Q. C. (SIP/SIAL/Compiler); Dr. Norbert Flocke, QTP (Integral package); Dr. Erik Deumens, QTP (Architect); Dr. Ajith Perera, QTP; Dr. H. Lei, ACES Q. C. (Compiler); Dr. Anthony Yau, HPTi; Dr. Rodney Bartlett, QTP, ACES Q. C.

  2. Traditional design: code

  3. ACESIII design: control, compute, communication, disk I/O; code; hardware

  4. ACESIII design. High level: problem, performance. Low level: concepts, communication, data structures, algorithms, input/output. Super Instruction Assembly Language (SIAL); Super Instruction Processor (SIP, xaces3); input; output

  5. Key features: index segmentation, data blocking, task isolation. Advantages: flexibility; tunability (fast optimization); new methods implemented in reduced time; portability. SIAL (Super Instruction Assembly Language)
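The slides name index segmentation and data blocking without showing how they work. As a minimal sketch of the general idea only (not ACESIII's or SIAL's actual implementation), the Python below splits each index range into fixed-size segments and enumerates the blocks of a multi-index array as combinations of segments. The names `segment_index` and `enumerate_blocks` and the segment size are hypothetical.

```python
from itertools import product


def segment_index(n, seg_size):
    """Split the index range 0..n-1 into contiguous (start, stop) segments.

    Hypothetical helper: the last segment is shorter when seg_size does
    not divide n evenly.
    """
    if n < 0 or seg_size <= 0:
        raise ValueError("n must be >= 0 and seg_size must be > 0")
    return [(start, min(start + seg_size, n)) for start in range(0, n, seg_size)]


def enumerate_blocks(shape, seg_size):
    """List every block of an array with the given shape.

    Each block is a tuple holding one (start, stop) segment per index,
    so a block can be handled as one unit of data and work.
    """
    per_index = [segment_index(n, seg_size) for n in shape]
    return list(product(*per_index))


# Illustrative sizes only: a 10 x 10 array with segments of length 4.
segments = segment_index(10, 4)
blocks = enumerate_blocks((10, 10), 4)
print(segments)
print(len(blocks), "blocks")
```

Because each block is addressed only by its segments, a scheduler could hand blocks to different processors independently, which is one plausible reading of the task isolation listed on the slide.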

  6. Implemented: SCF, MBPT(2) gradient, CCSD gradient, CCSD(T), MBPT(2) Hessian, EOM CCSD (Tomasz Kus). Reference types, in slide order: RHF, UHF; RHF, UHF, ROHF; RHF, UHF; RHF, UHF; RHF, UHF, ROHF; RHF, UHF

  7. CCSD(T) steps: SCF (easy if you have a good integrals package); transformation (hard, but small cost); CCSD (hard, as it is highly nonlinear); (T) (trivial! At least that is the common wisdom)

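  8. (T) strategy: occupied orbitals split into groups o1, o2, o3, o4; partial energies E1, E2, E3, E4; E(TOTAL)

  9. (T) strategy: occupied orbitals split into groups o1, o2, o3, o4; partial energies E1, E2, E3, E4; E(TOTAL)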

  10. Advantages of DUAL layer parallelism • Less data replication and fewer I/O bottlenecks • Trivial restart capability • Better turnaround due to queuing • Since more processors are used, the effective (T) time is comparable to the CCSD time, making the CCSD as important as, or more important than, the (T)!
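The (T) strategy slides show the occupied orbitals split into groups o1 to o4, each yielding a partial energy E1 to E4 that adds up to E(TOTAL). A minimal sketch of that pattern follows, assuming the total is a plain sum of independent per-orbital contributions. The function `toy_contribution` is a made-up stand-in, not the real (T) expression, and the count of 46 occupied orbitals is borrowed from the luciferin example purely for illustration.

```python
def partition(indices, n_jobs):
    """Split a list into n_jobs contiguous groups of nearly equal size."""
    q, r = divmod(len(indices), n_jobs)
    groups, start = [], 0
    for job in range(n_jobs):
        size = q + (1 if job < r else 0)  # first r groups get one extra item
        groups.append(indices[start:start + size])
        start += size
    return groups


def toy_contribution(i):
    """Hypothetical per-orbital energy term; NOT the real (T) formula."""
    return -1.0 / (i + 1) ** 2


def partial_energy(group):
    """Energy from one group; each group could run as a separate job."""
    return sum(toy_contribution(i) for i in group)


occupied = list(range(46))                      # illustrative orbital count
groups = partition(occupied, 4)                 # o1, o2, o3, o4
partials = [partial_energy(g) for g in groups]  # E1, E2, E3, E4
e_total = sum(partials)                         # E(TOTAL)
print([len(g) for g in groups], e_total)
```

Each partial energy depends only on its own group, so a failed job can be rerun on its own, which matches the restart advantage listed on the slide.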

  11. CCSD(T). Luciferin (C11H8O3S2N2): RHF, C1 symmetry, basis = aug-cc-pvdz (498 bf), Ncorrocc = 46. Sucrose (C12H22O11): RHF, C1 symmetry, basis = 6-311G** (546 bf), Ncorrocc = 68

  12. (Plot; processor counts shown: 32, 256)

  13. (Plot; processor counts shown: 32, 256)

  14. (Plot; processor counts shown: 32, 512)

  15. DMMP + OH: H10C3O4P, C1 symmetry, 208 basis functions, 75 electrons. Number of processors = 64. Time CCSD = 69 minutes (3.8 min/iter). Time (T) = 111 minutes (7 dual jobs)

  16. Systematic set of benchmarks • Why? To remove confusion over technological versus algorithmic advances. • Allow users to make informed choices. • Provide a set of calculations to evaluate each program so that strengths and weaknesses become evident. • Remove ambiguity in the literature.

  17. ArN cluster benchmarks (performance). Specifications (mine!): N = 6, UHF, C1 symmetry, basis = aug-cc-pvtz (300 bf), Ncorrocc = 54, R = 5 bohr. Methods: MBPT(2) gradient, CCSD gradient, CCSD(T) (core dropped), MBPT(2) Hessian (RHF)

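  18. (Plot; processor counts shown: 32, 256)

  19. (Plots; processor counts shown: 32, 256)

  20. (Plots; processor counts shown: 32, 256)

  21. (Plot; processor counts shown: 32, 256)

  22. (Plots; processor counts shown: 32, 256)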

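  23. MBPT(2) Hessian perturbations: $\frac{d}{dq}\left[\frac{d}{dp}\left(V_{abij} V_{abij} / D_{abij}\right)\right]$, with terms $\frac{dV}{dp}\cdot\frac{dV}{dq}$ and $V\cdot\frac{d^2V}{dp\,dq}$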

  24. Details of calculation • Number of basis functions = 300 • Number of correlated occupied = 54 • Number of Hessian elements = 324/2 • Number of processors = 128 • RHF reference
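The figure 324/2 on this slide is consistent with the Ar6 benchmark from the specifications slide, assuming the perturbations p and q are the Cartesian nuclear displacements (the slides do not state this explicitly). The short check below works through that arithmetic; all variable names are illustrative.

```python
n_atoms = 6                      # Ar6 cluster (N = 6 on the specifications slide)
n_coords = 3 * n_atoms           # assumed: one perturbation per Cartesian coordinate
n_pairs = n_coords ** 2          # all (p, q) pairs
slide_count = n_pairs // 2       # the slide's "324/2"

# For comparison: a symmetric matrix of this size, diagonal included,
# has n(n+1)/2 unique elements.
unique_symmetric = n_coords * (n_coords + 1) // 2

print(n_coords, n_pairs, slide_count, unique_symmetric)
```

Under this assumption the slide's count is 162 elements, while the exact number of unique elements of a symmetric 18 x 18 matrix is 171, so 324/2 is best read as an approximate figure.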

  25. Results. Terms: V*d2V/dpdq, dV/dp, dV/dq, dV/dp*dV/dq. T = 381 minutes; 155 sec / pert p; 330 sec / pert q; 16 sec / element

  26. Observations • Ideally suited for dual layer parallelization, with the ‘dual’ layer being over the perturbations. • The dual layer strategy is not optimal from an operation-count viewpoint, as some computation must be repeated, but it has many advantages: restart capability, real time of calculation, queuing, data storage.

  27. Conclusions • ACESIII provides an ideal parallel environment in which to implement computationally intense methods. • MBPT(2) gradient achieved over 90% scaling until work was exhausted. • CCSD achieved better than ideal scaling up to 512 processors (32 as reference), indicating that an optimal range of processors exists for each computation. • CCSD(T) perturbative triples can be computed quite effectively using a dual layer parallelization strategy, so that (T) and CCSD are comparable to compute in a pragmatic way. • CCSD gradients (Ar6) exhibit ideal scaling from 32 to 256 processors.
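The scaling statements above are quoted relative to a 32-processor reference run. As a sketch of how such figures are commonly computed (the timings below are invented for illustration and are not taken from the slides), speedup is the reference time divided by the measured time, and efficiency is that speedup divided by the ideal ratio of processor counts. The name `relative_scaling` is hypothetical.

```python
def relative_scaling(timings, ref_procs):
    """Return {procs: (speedup, efficiency)} relative to a reference run.

    timings maps processor count to wall time. An efficiency above 1.0
    corresponds to "better than ideal" scaling.
    """
    t_ref = timings[ref_procs]
    result = {}
    for procs, t in sorted(timings.items()):
        speedup = t_ref / t
        ideal = procs / ref_procs
        result[procs] = (speedup, speedup / ideal)
    return result


# Hypothetical wall times in minutes; NOT data from the presentation.
hypothetical_timings = {32: 800.0, 64: 400.0, 128: 190.0, 256: 100.0}
scaling = relative_scaling(hypothetical_timings, ref_procs=32)
for procs, (speedup, efficiency) in scaling.items():
    print(procs, round(speedup, 2), round(efficiency, 3))
```

In this made-up data the 128-processor run has an efficiency above 1.0, which is the kind of result the slide describes as better than ideal scaling.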

  28. Conclusions • MBPT(2) Hessians (and others also) benefit from dual layer parallelism, but care must be taken to segment the work optimally. • A set of benchmark calculations would be very valuable to the quantum chemistry community to remove ambiguities among various programs. • ACESIII has been successfully ported to the following systems: IBM SP4, SP5, ALTIX, Linux cluster, and Opteron cluster, and is available on many DOD machines. • ACESIII benefits from ‘many’ processors, indicating potential in the massively parallel regime. • The flexibility offered by the ACESIII environment allows for rapid tuning and implementation of codes.
