
Porting Chemistry Applications to Abe: Lessons Learned



Presentation Transcript


  1. Porting Chemistry Applications to Abe: Lessons Learned
     Dodi Heryadi, Advanced Application Support Group

  2. Outline
     • A very brief overview of Abe
     • Chemistry applications on Abe
     • Porting an OpenMP code
     • Porting an MPI code
     • Debugging on Abe
     Imaginations unbound

  3. A very brief overview of a multi-core system
     [Figure] A simplified diagram of a system with two CPU sockets, main memory, and PCI expansion boards such as an MPI network card for InfiniBand or Myrinet, an Ethernet card, or a graphics card. An example is tungsten.ncsa.uiuc.edu.
     [Figure] The core layout with L2 cache (shown in blue) for the two CPU sockets containing quad-core Intel Xeon processors on Abe at NCSA [abe.ncsa.uiuc.edu].

  4. Comparing Abe and Tungsten
     • Abe: 8 cores per node
     • Tungsten: 2 cores per node
       #SUs for Abe      = 8 * #Nodes * Wall_Time
       #SUs for Tungsten = 2 * #Nodes * Wall_Time
     • For the same #Nodes and Wall_Time, a job running on Abe is charged four times as many SUs as the same job on Tungsten
     • To cost no more in SUs on the same number of nodes, an application on Abe should ideally run at least four times faster than it does on Tungsten
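The charging formulas on this slide can be checked with a few lines of Python. This is only an illustrative sketch: the function name `service_units`, the choice of hours as the unit of Wall_Time, and the sample job size (4 nodes, 2 hours) are assumptions made for the example, not values from the slides.

```python
# Sketch of the SU formulas from the slide:
#   SUs = cores_per_node * nodes * wall_time
# Assumption: wall_time is measured in hours (the slide does not state the unit).

ABE_CORES_PER_NODE = 8       # from the slide
TUNGSTEN_CORES_PER_NODE = 2  # from the slide


def service_units(cores_per_node: int, nodes: int, wall_time: float) -> float:
    """Return the SUs charged for a job, following the slide's formula."""
    return cores_per_node * nodes * wall_time


# Hypothetical job: 4 nodes for 2 hours on each machine.
nodes, wall_time = 4, 2.0
abe_su = service_units(ABE_CORES_PER_NODE, nodes, wall_time)
tungsten_su = service_units(TUNGSTEN_CORES_PER_NODE, nodes, wall_time)

print("Abe SUs:", abe_su)
print("Tungsten SUs:", tungsten_su)
print("Abe / Tungsten:", abe_su / tungsten_su)

# If the same job finishes four times faster on Abe (same node count),
# the charge matches what Tungsten would have cost.
abe_su_4x_faster = service_units(ABE_CORES_PER_NODE, nodes, wall_time / 4)
print("Abe SUs at 4x speed:", abe_su_4x_faster)
```

The last calculation is the break-even point behind the "four times faster" bullet: anything slower than that costs more SUs on Abe than on Tungsten for the same node count.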

  5. Chemistry Applications on Abe: Available and Planned
     Quantum Chemistry
     • Gaussian (OpenMP)
     • GAMESS (MPI)
     • NWChem (Global Arrays with MPI)
     • Molpro (Global Arrays and OpenMP)
     Classical Molecular Dynamics
     • Amber (MPI)
     • Gromacs (MPI)
     • CHARMM (MPI)
     • NAMD (Charm++ with MPI)
     Ab-initio Molecular Dynamics
     • CPMD (MPI)
     • VASP (MPI)
     • Wien2k (MPI)

  6. Porting an OpenMP Code: Gaussian
     • Perhaps the most widely used computational chemistry package in the world
     • Well known for consuming most of the available computing resources at supercomputer centers
     • Gaussian users have been migrating from Tungsten since its retirement

  7. Very Brief Overview of the Gaussian Code
     • Under development since the 1970s (over 1 million lines of code, mostly in Fortran with some C)
     • Memory is allocated in one big chunk (through malloc)

  8. Parallelization of Gaussian
     Older version (Gaussian 98):
     • DMP: Linda
     • SMP: fork, shmget
     New version (Gaussian 03):
     • Linda
     • OpenMP
     • Hybrid

  9. Porting Gaussian 03 to Abe
     • Supports PGI compilers for EM64T
     • Used the makefile for IA64 (with some modifications)

  10. Initial Gaussian 03 Benchmarks (Valinomycin Force Calculations): Wall time (seconds)

  11. Initial Gaussian 03 Benchmarks (Valinomycin Force Calculations): Speed-up

  12. Improving Gaussian 03 Performance with Cache Blocking
     • Reordering memory accesses to increase temporal locality
     • Used a block size of 2 MB (the size of the L2 cache per core)
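As a generic illustration of cache blocking (loop tiling), the sketch below multiplies two small matrices block by block so that each tile is reused while it would still be resident in cache. It is not Gaussian's code, and the slides do not say which loops were blocked; the function names, the tiny tile size, and the sample matrices are assumptions for the example. Pure Python will not show the speed-up; the point is only the reordered access pattern.

```python
# Generic cache-blocking (loop tiling) sketch. NOT Gaussian's implementation.

L2_BYTES_PER_CORE = 2 * 1024 * 1024    # 2 MB block size quoted on the slide
DOUBLES_PER_BLOCK = L2_BYTES_PER_CORE // 8  # 8-byte doubles that fit in one block


def matmul_naive(a, b):
    """Reference triple loop: walks a whole column of b for every output element."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                c[i][j] += a[i][k] * b[k][j]
    return c


def matmul_blocked(a, b, block=2):
    """Same arithmetic, but iterated tile by tile.

    Each (ii, kk, jj) step touches only block x block tiles of a, b and c,
    so the data is reused while it is still close at hand (temporal locality).
    `block` is a hypothetical tile edge length, tiny here for readability.
    """
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                for i in range(ii, min(ii + block, n)):
                    for k in range(kk, min(kk + block, m)):
                        aik = a[i][k]
                        for j in range(jj, min(jj + block, p)):
                            c[i][j] += aik * b[k][j]
    return c


# Made-up sample data: a is 5x4, b is 4x3.
a = [[float(i + k) for k in range(4)] for i in range(5)]
b = [[float(k * j + 1) for j in range(3)] for k in range(4)]

print("doubles per 2 MB block:", DOUBLES_PER_BLOCK)
print("blocked result matches naive:", matmul_blocked(a, b) == matmul_naive(a, b))
```

In a compiled code the tile size would be chosen so that the tiles in flight fit within the 2 MB of L2 cache available per core.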

  13. Gaussian 03 Benchmarks on Abe: Before and After Cache Blocking

  14. Porting an MPI Code: Amber
     • A set of molecular mechanical force fields for the simulation of biomolecules (which are in the public domain, and are used in a variety of simulation programs)
     • A package of molecular simulation programs which includes source code and demos
     (http://amber.scripps.edu/)

  15. Porting Amber to Abe
     • One of the first applications ported to Abe
     • Tested with three different MPI implementations: VMI, MVAPICH, and OpenMPI
     • Performance with VMI and MVAPICH was comparable
     • Performance with OpenMPI was the worst of the three

  16. Amber Benchmarks: cellulose fiber solvated in TIP3P water in a periodic box (408 K atoms); wall time in seconds

  17. Debugging on Abe with the gdbwhere.pl and ssh_pbs.pl commands
     (http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/CommonDoc/gdbwhere.html)
     “…The gdbwhere.pl command will run a gdb backtrace [(gdb) where] for the running processes on a machine [state R from the ps command]…”

  18. Debugging on Abe. Case: the job is running, but no output is written
     1. Check the job status

     [dodi@honest2 lev]$ qstat -u dodi

     abem5.ncsa.uiuc.edu:
                                                                        Req'd  Req'd   Elap
     Job ID               Username Queue    Jobname    SessID NDS   TSK Memory Time  S Time
     -------------------- -------- -------- ---------- ------ ----- --- ------ ----- - -----
     456689.abem5.ncsa.ui dodi     normal   nwchem.234     --     5   1     -- 02:00 R 00:12

  19. Debugging on Abe
     2. Find the compute node(s) where the job is running

     [dodi@honest2 lev]$ qstat -f 456689
     Job Id: 456689.abem5.ncsa.uiuc.edu
         Job_Name = nwchem.23415
         Job_Owner = dodi@abe1197
         job_state = R
         queue = normal
         Error_Path = honest1.ncsa.uiuc.edu:/u/ncsa/dodi/scratch-global/lev/lithium
             _xtal_2x2x2.err
         exec_host = abe0236/7+abe0236/6+abe0236/5+abe0236/4+abe0236/3+abe0236/2+ab
             e0236/1+abe0236/0+abe0228/7+abe0228/6+abe0228/5+abe0228/4+abe0228/3+ab
             e0228/2+abe0228/1+abe0228/0+abe0191/7+abe0191/6+abe0191/5+abe0191/4+ab
             e0191/3+abe0191/2+abe0191/1+abe0191/0+abe0180/7+abe0180/6+abe0180/5+ab
             e0180/4+abe0180/3+abe0180/2+abe0180/1+abe0180/0+abe0125/7+abe0125/6+ab
             e0125/5+abe0125/4+abe0125/3+abe0125/2+abe0125/1+abe0125/0
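Because qstat wraps the exec_host value in the middle of host names, picking out the nodes by eye is error-prone. The following is a small sketch of one way to recover the node list, assuming the value has been copied into a string; the function name `nodes_from_exec_host` and the shortened sample string are made up for the example.

```python
# Sketch: turn a wrapped exec_host value into a per-node core count.
# Assumption: each entry has the form "<host>/<core index>" joined by "+",
# as in the qstat -f output on the slide.


def nodes_from_exec_host(exec_host: str) -> dict:
    """Return {host: number of cores assigned}, in order of first appearance."""
    # qstat wraps long values mid-token, so drop all whitespace first.
    joined = "".join(exec_host.split())
    counts = {}
    for entry in joined.split("+"):
        host, _, _core = entry.partition("/")
        counts[host] = counts.get(host, 0) + 1
    return counts


# Shortened, hypothetical sample in the same wrapped style as the slide.
sample = """abe0236/1+abe0236/0+ab
    e0228/1+abe0228/0+abe0191/1+ab
    e0191/0"""

node_cores = nodes_from_exec_host(sample)
for host, cores in node_cores.items():
    print(host, cores)
```

Any of the hosts returned this way is a candidate for the ssh step on the next slide.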

  20. Debugging on Abe
     3. ssh to one of the compute nodes and type top

     [dodi@honest2 lev]$ ssh abe0236
     [dodi@abe0236 ~]$ top
     top - 10:56:38 up 35 days, 11:30,  2 users,  load average: 7.99, 6.47, 4.74
     Tasks: 346 total,  10 running, 336 sleeping,   0 stopped,   0 zombie
     Cpu(s): 99.6% us,  0.4% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
     Mem:  16269968k total,  5745372k used, 10524596k free,     6292k buffers
     Swap:  8393952k total,      736k used,  8393216k free,  4331528k cached

       PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
      8610 dodi      25   0 1714m  89m 5004 R  100  0.6   6:49.59 nwchem
      8611 dodi      25   0 1714m  89m 4960 R  100  0.6   6:49.30 nwchem
      8613 dodi      25   0 1714m  89m 4908 R  100  0.6   6:49.54 nwchem
      8615 dodi      25   0 1714m  89m 4832 R  100  0.6   6:48.69 nwchem
      8614 dodi      25   0 1714m  89m 4952 R  100  0.6   6:49.84 nwchem
      8617 dodi      25   0 1895m 538m 301m R  100  3.4   6:49.44 nwchem
      8612 dodi      25   0 1714m  88m 5016 R  100  0.6   6:48.98 nwchem
      8616 dodi      25   0 1714m  88m 4908 R   99  0.6   6:48.88 nwchem
      8747 dodi      16   0  6420 1388  876 R    1  0.0   0:00.97 top
      8283 dodi      16   0  6000 1456  660 S    0  0.0   0:00.11 tcsh
      8307 dodi      16   0  5288  936  328 S    0  0.0   0:00.00 pbs_demux
      8424 dodi      16   0  6020 1456  644 S    0  0.0   0:00.11 456689.abem.SC
      8585 dodi      16   0 39784 6800 1136 S    0  0.0   0:00.02 python2.3

  21. Debugging on Abe
     4. Debug with gdbwhere.pl and ssh_pbs.pl

     ssh_pbs.pl 456689 "~consult/debug/gdbwhere.pl" > mygdb.out &

     (http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/CommonDoc/gdbwhere.html)

  22. mygdb.out

     abe0236: PROCESS ID: 8610
     Using host libthread_db library "/usr/local/lib64/tls/libthread_db.so.1".
     [Thread debugging using libthread_db enabled]
     [New Thread 182920302432 (LWP 8610)]
     [New Thread 1084229984 (LWP 8623)]
     0x0000002a95e0fee0 in PMPI_Comm_rank () from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
     #0  0x0000002a95e0fee0 in PMPI_Comm_rank () from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
     #1  0x00000000023b433f in armci_util_spin (n=1140850688, notused=0x7fbfffc094) at message.c:225
     #2  0x000000000238e3b4 in armci_util_wait_int ()
     #3  0x00000000023b3a51 in armci_smp_bcast (x=0x44000000, n=-1073758060, root=1) at message.c:565
     #4  0x00000000023b3c83 in armci_msg_bcast (buf=0x44000000, len=-1073758060, root=1) at message.c:682
     #5  0x00000000021cfb68 in ga_brdcst_ ()
     #6  0x000000000093b65d in rtdb_broadcast ()
     #7  0x000000000093bb38 in rtdb_get ()
     #8  0x000000000093b0fa in rtdb_get_ ()
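When mygdb.out contains backtraces for many processes, it can help to reduce each one to its chain of function names. The sketch below does this for frame lines shaped like the ones above; the regular expression, the function name `frame_functions`, and the three-line sample are assumptions for illustration and are not part of gdbwhere.pl.

```python
import re

# Sketch: pull (frame number, function name) pairs out of gdb "where" output.
# Assumption: frame lines look like "#N  0xADDR in function (args) ...",
# as in the mygdb.out excerpt on the slide.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+\s+in\s+)?(\w+)\s*\(")


def frame_functions(backtrace: str) -> list:
    """Return [(frame_number, function_name), ...] for each frame line found."""
    frames = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            frames.append((int(match.group(1)), match.group(2)))
    return frames


# Three frames copied from the slide's output.
sample = """\
#0  0x0000002a95e0fee0 in PMPI_Comm_rank () from /usr/local/mvapich2-0.9.8p2patched-intel-ofed-1.2/lib/libmpich.so
#1  0x00000000023b433f in armci_util_spin (n=1140850688, notused=0x7fbfffc094) at message.c:225
#2  0x000000000238e3b4 in armci_util_wait_int ()
"""

frames = frame_functions(sample)
# Print the chain from the outermost caller in the sample down to frame #0.
print(" -> ".join(name for _, name in reversed(frames)))
```

Here the innermost frames show the process sitting in armci_util_spin, called from armci_util_wait_int, which is the busy-wait routine in message.c.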

  23. message.c

     . . .
     /*\ busy wait
      *  n represents number of time delay units
      *  notused is useful to fool compiler by passing address of sensitive variable
     \*/
     #define DUMMY_INIT 1.0001
     double _armci_dummy_work=DUMMY_INIT;
     void armci_util_spin(int n, void *notused)
     {
     int i;
         for(i=0; i<n; i++)
             if(armci_msg_me()>-1) _armci_dummy_work *=DUMMY_INIT;
         if(_armci_dummy_work>(double)armci_msg_nproc())_armci_dummy_work=DUMMY_INIT;
     }

     /***************************Barrier Code*************************************/

     void armci_msg_barr_init(){
     "message.c" 2017 lines --10%--                                 225,20        10%
