
ACES III and SIAL: technologies for petascale computing in chemistry and materials physics

Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett. AcesQC, LLC; QTP, University of Florida, Gainesville, Florida.


Presentation Transcript


  1. ACES III and SIAL: technologies for petascale computing in chemistry and materials physics Erik Deumens, Victor Lotrich, Mark Ponton, Tomasz Kus, Norbert Flocke, Ajith Perera, Rod Bartlett AcesQC, LLC QTP, University of Florida Gainesville, Florida ACES III and SIAL

  2. Outline of the talk • Performance results • What can be done today? • Design of petascale capable software • How does SIAL work? • What makes it different? • Outlook

  3. ACES III software • Developed under CHSSI CBD-03 • Parallel for shared and distributed memory • Capabilities • Hartree-Fock (RHF, UHF) • MBPT(2) energy, gradient, hessian • CCSD(T) energy and gradient (DROPMO) • EOM-CC excited state energies

  4. Two examples • Luciferin (C11H8O3S2N2): RHF, C1 symmetry, basis = aug-cc-pVDZ (494 bf), Ncorrocc = 46 • Sucrose (C12H22O11): RHF, C1 symmetry, basis = 6-311G** (546 bf), =91

  5. Luciferin CCSD(T) • CCSD on 128 processors • One iteration: 23 min • Total 12 iterations: 275 min • (T) • Hardest 8 occupied orbitals: 420 min on 128 processors • Total 48 correlated orbitals: 420 min on 768 processors

  6. Luciferin CCSD scaling (plot: min per iteration; 12 iterations; two versions)

  7. Sucrose CCSD scaling (plot: min per iteration; 8 iterations; on Cray XT4)

  8. (H2O)21H+ scaling (plot: min per iteration; 657 bf; 84 corr occ)

  9. Outline of the talk • Performance results • What can be done today? • Design of petascale capable software • How does SIAL work? • What makes it different? • Outlook

  10. A computer with a single CPU • Basic data item: 64 bit number • High level language: Fortran, C • c = a + b • Assembly language • ADD dest,src • ADD is an operation code • dest and src are registers

  11. The ACES III parallel machine • Basic data item: data block of 10,000 64 bit numbers -> super number • High level language: being developed • Assembly language: SIAL (super instruction assembly language) • R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L) • xaces3 -> super instruction processor
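
The super instruction on slide 11 can be illustrated with a small Python sketch. This is not SIAL and not the ACES III implementation: the dictionary-of-blocks layout, the tiny segment size `SEG`, and the helper names `contract_block`, `super_instruction` and `filled` are assumptions made only for illustration. Each "super number" is a 4-index block, and one super instruction contracts whole blocks over the shared block indices C and D.

```python
from itertools import product

SEG = 2  # elements per index segment; an assumed toy size (slide 11 quotes ~10,000 numbers per block)

def contract_block(r, v, t):
    """r(i,j,k,l) += sum over c,d of v(i,j,c,d) * t(c,d,k,l) for one triple of blocks."""
    for i, j, k, l in product(range(SEG), repeat=4):
        r[i, j, k, l] += sum(v[i, j, c, d] * t[c, d, k, l]
                             for c, d in product(range(SEG), repeat=2))

def super_instruction(R, V, T, nseg):
    """R(I,J,K,L) += V(I,J,C,D) * T(C,D,K,L), looping over block indices."""
    for I, J, K, L in product(range(nseg), repeat=4):
        for C, D in product(range(nseg), repeat=2):
            contract_block(R[I, J, K, L], V[I, J, C, D], T[C, D, K, L])

def filled(nseg, value):
    """Hypothetical helper: a blocked 4-index array with every element set to value."""
    return {B: {idx: value for idx in product(range(SEG), repeat=4)}
            for B in product(range(nseg), repeat=4)}

NSEG = 2
V = filled(NSEG, 1.0)
T = filled(NSEG, 2.0)
R = filled(NSEG, 0.0)
super_instruction(R, V, T, NSEG)
```

With these inputs every element of R accumulates (NSEG*SEG)^2 = 16 products of 1.0 and 2.0. The point of the sketch is only that the outer loops see blocks, not individual numbers.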

  12. User level execution flow (diagram) • algo.sial -> SIAL compiler -> algo.sio • input + algo.sio -> ACES III

  13. Coarse grain parallelism • Executing super instructions in SIAL algorithm • Example: memory super instruction • GET block • Can be from • Local node RAM • Other node RAM • Time for data to become available differs

  14. Fine grain parallelism • Inside super instructions • Example: Compute super instruction • * (contractions) • compute_integrals • Can use multiple cores • Can use accelerators • GPGPUs and Cell processors • FPGAs (field programmable gate arrays)

  15. Super instruction flow (diagram) • Worker i: GET a -> ask j; …; d=b*c; …; wait for a?; <- a arrives; e=a*d; … • Worker j: …; <- send a; …

  16. Super instruction performance • Super instructions are asynchronous • Makes execution very elastic • Helps maintain consistent performance on many parallel architectures
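
The flow on slides 15 and 16 (request block a, keep computing, wait only when a is needed) can be sketched with Python futures. This is an illustration under stated assumptions, not how SIP is implemented: a thread stands in for worker j, `worker_j_send` and `scale` are hypothetical helpers, and the 0.05 s latency is an invented value.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def worker_j_send(block_store, name, delay=0.05):
    """Stand-in for worker j: hand back the requested block after some latency."""
    time.sleep(delay)  # simulated communication latency (assumed value)
    return block_store[name]

def scale(x, y):
    """Toy 'super instruction': element-wise product of two blocks."""
    return [p * q for p, q in zip(x, y)]

remote = {"a": [1.0, 2.0, 3.0]}          # blocks held by worker j
b, c = [2.0, 2.0, 2.0], [3.0, 3.0, 3.0]  # blocks local to worker i

with ThreadPoolExecutor(max_workers=1) as net:
    pending_a = net.submit(worker_j_send, remote, "a")  # GET a: returns at once
    d = scale(b, c)                                     # d = b*c while a is in flight
    a = pending_a.result()                              # wait for a only when it is needed
    e = scale(a, d)                                     # e = a*d
```

Because the request is issued before the independent work on b and c, the latency of fetching a is hidden whenever that work takes at least as long as the transfer.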

  17. Distributed data • N worker tasks, each with local RAM • Data distributed in RAM of workers • AO-based: direct use of integrals • MO-based: use transformed integrals • Array blocks are spread over all workers
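
One way to picture "array blocks are spread over all workers" on slide 17 is a mapping from block index to owning worker. The slides do not say which mapping SIP uses, so the round-robin rule below is purely an assumption, and `owner` and `distribute` are hypothetical names.

```python
from itertools import product

def owner(block_index, nseg, nworkers):
    """Map a block index tuple to a worker rank (round-robin; an assumed policy)."""
    linear = 0
    for i in block_index:
        linear = linear * nseg + i  # row-major linear block number
    return linear % nworkers

def distribute(nseg, nworkers, ndim=2):
    """List, for every worker, the blocks it would hold in local RAM."""
    layout = {rank: [] for rank in range(nworkers)}
    for block in product(range(nseg), repeat=ndim):
        layout[owner(block, nseg, nworkers)].append(block)
    return layout

layout = distribute(nseg=4, nworkers=3)
```

Any worker can compute `owner(...)` locally, so a request for a block can be sent straight to the worker that holds it without a central directory.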

  18. Served (disk resident) data • M server tasks • have access to local or global disk storage • accept, store and retrieve blocks • also can compute integrals when asked • Data served to and from disk
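
The "accept, store and retrieve blocks" role of a server task on slide 18 can be sketched as a small class that keeps each block in its own file. The class name `BlockServer`, the file naming scheme, and the use of `pickle` are all assumptions for illustration; the actual ACES III on-disk format is not described in the slides.

```python
import os
import pickle
import tempfile

class BlockServer:
    """Toy server task: accepts, stores, and retrieves blocks on disk."""

    def __init__(self, directory):
        self.directory = directory

    def _path(self, array, block_index):
        # One file per block; the naming scheme is hypothetical.
        name = array + "_" + "_".join(map(str, block_index)) + ".blk"
        return os.path.join(self.directory, name)

    def put(self, array, block_index, data):
        with open(self._path(array, block_index), "wb") as fh:
            pickle.dump(data, fh)

    def get(self, array, block_index):
        with open(self._path(array, block_index), "rb") as fh:
            return pickle.load(fh)

with tempfile.TemporaryDirectory() as scratch:
    server = BlockServer(scratch)
    server.put("T2", (0, 1), [1.0, 2.0, 3.0])  # worker hands a block to the server
    fetched = server.get("T2", (0, 1))         # later, a worker requests it back
```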

  19. ACES III design (diagram) • High level: problem concepts, data structures, algorithms -> super instruction assembly language (SIAL) • Low level: performance, communication, input/output -> super instruction processor SIP (xaces3) • input -> output

  20. Outline of the talk • Performance results • What can be done today? • Design of petascale capable software • How does SIAL work? • What makes it different? • Outlook

  21. Clear divisions • Extreme object oriented approach • High level = problem domain specific • Concepts • Data structures • Algorithms • Low level = focus on performance • Processor and memory speed • Communication latency and bandwidth

  22. Super Instruction Coding • Write algorithm in high level super instruction assembly language • Declare (block) arrays, (block) indices • DO - END DO construct • PARDO – END PARDO construct • Basic operations: add and multiply and contract • SIP_BARRIER • Each line maps to a few super instructions
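
The PARDO and SIP_BARRIER constructs named on slide 22 can be pictured as a loop over block indices whose iterations are divided among workers, followed by a point where all workers wait. The sketch below is Python, not SIAL; threads stand in for worker tasks, the strided split of iterations is an assumption (the slides do not say how SIP schedules them), and `pardo` is a hypothetical helper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def pardo(index_ranges, body, nworkers):
    """Run body(*indices) for every index tuple, split across workers."""
    iterations = list(product(*index_ranges))
    chunks = [iterations[w::nworkers] for w in range(nworkers)]  # assumed static split

    def run(chunk):
        for idx in chunk:
            body(*idx)

    with ThreadPoolExecutor(max_workers=nworkers) as pool:
        list(pool.map(run, chunks))  # all chunks finish before returning: barrier-like

result = {}

def body(I, J):
    result[I, J] = 10 * I + J  # each iteration writes only its own block slot

pardo([range(3), range(4)], body, nworkers=4)
```

Each iteration touches a distinct key, so no locking is needed here; in a real algorithm the barrier is what makes it safe to read results produced by other workers.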

  23. Optimize and Tune • Optimize with traditional techniques • optimize the basic contraction operations by mapping them to DGEMM calls • create fast code to generate integrals • optimize memory allocation by using multiple block stacks • optimize execution and data movement
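
Mapping a contraction to a DGEMM call, as slide 23 mentions, amounts to viewing each 4-index block as a matrix: V(ij, cd) times T(cd, kl) gives R(ij, kl). The sketch below checks that identity on a toy block. A plain triple loop (`matmul`) stands in for DGEMM, and the extent `N` and helper names are assumptions for illustration.

```python
from itertools import product

N = 2  # extent of every index in this toy block (assumed)

def matmul(A, B):
    """Plain triple loop standing in for a DGEMM call."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[x] * B[x][c] for x in range(inner)) for c in range(cols)]
            for row in A]

def as_matrix(block):
    """View block[(p, q, r, s)] as a matrix with row index pq and column index rs."""
    return [[block[p, q, r, s] for r, s in product(range(N), repeat=2)]
            for p, q in product(range(N), repeat=2)]

def contract_direct(v, t):
    """Reference: r(i,j,k,l) = sum over c,d of v(i,j,c,d) * t(c,d,k,l)."""
    return {(i, j, k, l): sum(v[i, j, c, d] * t[c, d, k, l]
                              for c, d in product(range(N), repeat=2))
            for i, j, k, l in product(range(N), repeat=4)}

v = {idx: float(sum(idx)) for idx in product(range(N), repeat=4)}
t = {idx: float(idx[0] + 2 * idx[3]) for idx in product(range(N), repeat=4)}

R_mat = matmul(as_matrix(v), as_matrix(t))  # one matrix multiply per block pair
R_ref = contract_direct(v, t)               # same contraction, index by index
```

Element (i*N + j, k*N + l) of `R_mat` corresponds to r(i,j,k,l), so a tuned matrix multiply can do the work of the four nested index loops.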

  24. Programmer productivity: Other • Other tools for parallel development • UPC (Unified Parallel C) • CAF (Co-Array Fortran) • GA (Global Arrays Toolkit) • DDI (Distributed Data Interface) • Simple syntax • Specify precise data layout • PGAS (partitioned global address space) • Rigorous array blocking

  25. Programmer productivity: SIAL • SIAL has simple syntax • Experience shows it is more expressive • Exact data layout is done by SIP • Allows runtime tuning and optimization • SIAL has rich set of data structures • Distributed array • Served array • Temporary array • Local array

  26. Outline of the talk • Performance results • What can be done today? • Design of petascale capable software • How does SIAL work? • What makes it different? • Outlook

  27. New SIAL developer tools coming • Develop higher level programming language • Programmer support • Eclipse as IDE (integrated development environment) for SIAL coding • Understands SIAL syntax • Code refactoring tools • Rewrite code • Help improve performance

  28. New algorithms being explored • SIAL: Data staging • Huge served array • Copy section in distributed array • Work efficiently on distributed array • Similar to BLAS-3 management of cache • ACES III: Linear scaling • Localized orbitals

  29. New domains being explored • Need • A domain specialist, or a few of them • Willingness and expertise to explore alternative algorithms • Apply “super instruction” design pattern • Find “super number”, the basic data item in the domain • “Super instructions” then follow

  30. Towards petascale computing • ACES III • Ready for real work • Has run on 8,192 processors • SIAL • Useful in electronic structure • Can be used in other domains
