What we do: High performance computing, university owned and operated center

Presentation Transcript

  1. What we do: • High performance computing, university owned and operated center • DoD funded through the HPCMP • Provide HPC resources and support • Conduct research locally and globally Who we are: Committed to helping scientists seek understanding of our past, present and future by applying computational technology to advance discovery, analysis and prediction

  2. University Research Center • ARSC computational, visualization and network resources enable broad opportunities for students and faculty • HPC training, outreach activities, internships and science application workshops • HPC facility providing open accessibility for research and education • Developing, evaluating and using acceleration technologies to drive leading-edge HPC • Employs undergraduates, graduate students, post-doctoral fellows, faculty and staff

  3. New and Current HPC Users in Training • Introduce new and current users to HPC and visualization resources • Provide programming skills for successful use of ARSC resources • In-depth instruction, hands-on assistance developing codes • Collaborative discussions with other users and parallel computing experts • Early adoption and assessment of software tools

  4. Outline • The computers • How we access them • Storage environment and policies • Environment on midnight • Submitting Jobs • Hurricane Katrina Example

  5. Arctic-powered HPC • Pingo • Cray XT5 dedicated March 2009 • 3,456 processors • 31.8 teraflops of peak computing power (nearly 32 trillion arithmetic calculations a second) • 13.5 terabytes of memory • Midnight • Sun Opteron cluster • 12.02 teraflops of peak computing power (12 trillion arithmetic calculations a second) • 9.2 terabytes of memory

  6. Accessing midnight • Authenticate to the ARSC Kerberos server with your password/SecurID passcode to obtain a Kerberos ticket, which is stored on your local machine • Use Kerberized software to connect (e.g. ssh -X -Y midnight.arsc.edu); this software sends the valid Kerberos ticket and allows you to connect
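A typical login session might look something like the sketch below (the username and Kerberos realm are placeholders, and it is assumed the ARSC-supplied Kerberized kinit and ssh clients are installed on your local machine):

kinit username@ARSC.EDU        # prompts for your PIN + SecurID passcode and stores a ticket locally
klist                          # optional: verify that a valid ticket is present
ssh -X -Y midnight.arsc.edu    # Kerberized ssh forwards the ticket; -X/-Y enable X11 forwarding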

  7. Accessing midnight

  8. ARSC storage • ARSC provides storage in three primary locations. Environment variables are defined for each location. • $HOME • $WRKDIR or $WORKDIR (and/or $SCRATCH) • $ARCHIVE or $ARCHIVE_HOME • Available via special request: • $DATADIR
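A quick way to see where each of these areas lives is simply to echo the variables from a login node, for example:

echo $HOME      # home directory (small quota, backed up)
echo $WRKDIR    # work/scratch space (large, purged, not backed up)
echo $ARCHIVE   # long-term archival storage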

  9. $HOME • Purpose: location to store configuration files and commonly used executables. • Quota: 100 MB by default on the IBMs; 512 MB on midnight. • Backed Up: yes • Purged: no • Notes: Available from computational nodes and login nodes.

  10. $WRKDIR • Purpose: place to run jobs and store temporary files. • Quota: varies by system; 100 GB on midnight. • Backed Up: no • Purged: yes • Notes: Available from computational nodes and login nodes. Can be increased on most systems if you need more space.

  11. StorageTek Silo & Sun Fire 6800s

  12. $ARCHIVE • Purpose: place to store files long term. • Quota: no quota • Backed Up: yes • Purged: no • Notes: May not be available from all computational nodes. Available from login nodes. Files can be offline. Network Filesystem (NFS) hosted by two Sun Fire 6800 systems: nanook and seawolf.
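Because $WRKDIR is purged and $ARCHIVE may stage files to tape, a common pattern is to bundle results and copy them to the archive once a run finishes. A minimal sketch (the directory and file names are hypothetical):

cd $WRKDIR/my_run                 # hypothetical run directory in scratch space
tar czf run01.tar.gz output/      # bundle many small files before archiving
cp run01.tar.gz $ARCHIVE/         # copy to long-term storage; the file may later be offline (tape)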

  13. Midnight • Manufacturer: Sun Microsystems • Operating System: Linux (SLES 9) • Interconnect: Voltaire InfiniBand • Processors: 2.6 GHz AMD Opteron (dual core), 2,312 total compute cores

  14. Midnight Hardware • Sun Fire X4600: login and compute nodes • Sun Fire X2200: compute nodes • Sun Fire X4500: temporary filesystem

  15. X4600 Login Nodes • 2 X4600 login nodes • called midnight.arsc.edu (or midnight1.arsc.edu) and midnight2.arsc.edu. • Each node has: • 4 AMD Opteron 2.6 GHz dual core processors • 32 GB of shared memory • 1 4X InfiniBand network card • QFS access to long term storage (i.e. $ARCHIVE) on seawolf. • Linux Operating System (SLES 9)

  16. X4600 Compute Nodes • 55 X4600 nodes • Each node has: • 8 AMD Opteron 2.6 GHz dual core processors • 64 GB of shared memory • 1 4X InfiniBand network card • Linux Operating System (SLES 9)

  17. X2200 Compute Nodes • 358 X2200 nodes • Each node has: • 2 AMD Opteron 2.6 GHz dual core processors • 16 GB of shared memory • 1 4X InfiniBand network card • Linux Operating System (SLES 9)

  18. Modules Environment • Midnight has the modules package installed (not to be confused with Fortran 90 modules). • This package allows you to quickly switch between different versions of a software package (e.g. compilers).

  19. Modules Environment • ARSC also uses modules for packages that require one or more environment variables to be set to function properly. This hopefully makes such packages easier to use.

  20. Modules • When you log on to midnight the PrgEnv module is loaded by default. This module adds the PathScale compilers to your PATH along with the corresponding MPI libraries. • The “module load” is done in either the .profile or the .login file.
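As a rough sketch, the default load might appear in a bash .profile along the lines shown below (the location of the modules init script is an assumption and may differ on midnight):

# hypothetical modules initialization; the actual path is site-specific
if [ -f /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
fi
module load PrgEnv    # PathScale compilers plus the matching MPI library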

  21. Sample Module Commands
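Typical commands include the following (illustrative; module names such as ncl and PrgEnv.path-3.0 follow the examples used elsewhere in this talk):

module list                           # show currently loaded modules
module avail                          # list modules available on the system
module load ncl                       # add a package to your environment
module switch PrgEnv PrgEnv.path-3.0  # swap one module for another
module unload ncl                     # remove a package from your environment
module show PrgEnv                    # display what a module sets (paths, variables)
module purge                          # unload all modules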

  22. Available Modules

  23. Sample Module Use
mg56 % which pathcc
/usr/local/pkg/pathscale/pathscale-2.5/bin/pathcc
mg56 % module list
Currently Loaded Modulefiles:
  1) voltairempi-S-1.pathcc   2) pathscale-2.5   3) PrgEnv
mg56 % module switch PrgEnv PrgEnv.path-3.0
mg56 % which pathcc
/usr/local/pkg/pathscale/pathscale-3.0/bin/pathcc
mg56 % module list
Currently Loaded Modulefiles:
  1) PrgEnv.path-3.0   2) pathscale-3.0   3) voltairempi-S-1.pathcc

  24. Module usage for WRF – my recommendation
module purge
module load PrgEnv.pgi
module load ncl

  25. More information on modules • Midnight how to page http://www.arsc.edu/support/howtos/usingsun.html#modules • HPC Users’ Newsletter http://www.arsc.edu/support/news/HPCnews/HPCnews342.shtml • Modules Documentation http://modules.sourceforge.net/ http://modules.sourceforge.net/man/module1.html

  26. Queuing System • Jobs on midnight are queued and scheduled using the PBS queuing system. • PBS allows you to submit jobs, remove jobs, etc.

  27. Common PBS commands • qsub job.pbs - submit the script “job.pbs” to be run by PBS. • qstat - list jobs which haven’t yet completed. • qdel jobid - delete a job from the queue. • qmap - show a graphical list of the current work on nodes.
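An illustrative round trip with these commands (the job id shown is a placeholder; PBS prints the real one when you submit):

qsub job.pbs        # submit the job script; PBS prints its job id
qstat -u $USER      # list only your own pending and running jobs
qdel 12345          # remove the job, using the numeric part of the id
qmap                # view how work is currently laid out across nodes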

  28. qsub • tells the queuing system: • how many processors your job will need • what kind of nodes • what queue to use • how much walltime the job will need • what to do with stdout and stderr • and more...

  29. Common Queues • debug - for quick-turnaround debugging work. • standard - regular work. This queue requires that you have an allocation of CPU time. • background - lowest priority queue, but doesn’t require that you have an allocation. • data - queue which allows data to be transferred to long term storage (i.e. $ARCHIVE_HOME).
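For instance, a small script for the data queue might look like the following sketch (queue limits and any required resource requests are assumptions; the directory names are hypothetical):

#!/bin/bash
#PBS -q data
#PBS -l walltime=2:00:00
#PBS -j oe

# Copy finished results from the purged work directory to long-term storage.
cd $PBS_O_WORKDIR
cp -r results/ $ARCHIVE_HOME/my_project/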

  30. PBS script - MPI job using X2200 (4way) nodes
#!/bin/bash
#PBS -q standard
#PBS -l select=8:ncpus=4:node_type=4way
#PBS -l walltime=8:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np 32 ./myprog

# This script requests 8 chunks with 4 cpus
# each on 4way nodes (a.k.a. X2200).

  31. PBS script - MPI job using X4600 (16way) nodes
#!/bin/bash
#PBS -q standard
#PBS -l select=2:ncpus=16:node_type=16way
#PBS -l walltime=8:00:00
#PBS -j oe

cd $PBS_O_WORKDIR
mpirun -np 32 ./myprog

# This script requests 2 chunks with 16 cpus
# each on 16way nodes (a.k.a. X4600).

  32. Additional PBS Resources • Midnight How To Guide: http://www.arsc.edu/support/howtos/usingsun.html#batch • ARSC HPC Users’ Newsletter - Job Chaining Articles: http://www.arsc.edu/support/news/HPCnews/HPCnews322.shtml#article2 http://www.arsc.edu/support/news/HPCnews/HPCnews320.shtml#article3 http://www.arsc.edu/support/news/HPCnews/HPCnews319.shtml#article1

  33. Additional Resources • Midnight How-to Page: http://www.arsc.edu/support/howtos/usingsun.html • ARSC HPC Users’ Newsletter: http://www.arsc.edu/support/news/ • Pathscale Documentation: http://pathscale.com/docs.html • ARSC Help Desk • Phone: (907) 450-8602 • Email: consult@arsc.edu • Web: http://www.arsc.edu/support • Some Exercises based on this talk: http://people.arsc.edu/~bahls/classes/midnight_intro.tar.gz