
Running jobs on SDSC Resources


Presentation Transcript


  1. Running jobs on SDSC Resources Krishna Muriki, Oct 16, 2006, kmuriki@sdsc.edu, SDSC User Services

  2. Path directions • DataStar system overview • Batch Job environment • Simple job compilation • Job queues/scripts • Job submission • Access to HPSS resources. • Access to IA64 cluster, job management.

  3. Batch/Interactive computing • Batch job environment • Job Manager – Load Leveler (tool from IBM) • Job Scheduler – Catalina (SDSC internal tool) • Job Monitoring – Various commands • Batch & Interactive use on different nodes. • DataStar Login Nodes • dslogin.sdsc.edu • dspoe.sdsc.edu • dsdirect.sdsc.edu
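
For reference, reaching these machines is a plain ssh session to one of the login nodes above (substitute your own SDSC username; the role of each node is covered on the next slide):

    ssh username@dslogin.sdsc.edu    # production runs (normal / normal32 queues)
    ssh username@dspoe.sdsc.edu      # interactive and express queues
    ssh username@dsdirect.sdsc.edu   # special-needs access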

  4. Queues & nodes • Start with dspoe (interactive queues) • Do production runs from dslogin (normal & normal32 queues) • Use the express queues from dspoe when you need a job to run right away • Use dsdirect for special needs.

  5. Now let's do it! • Example files are located here: • /gpfs/projects/workshop/running_jobs • Copy the whole directory • Use the Makefile to compile the source code • Edit the parameters in the job submission scripts • Communicate with the job manager using its command language (sketch below).
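
As a rough sketch of those steps, the workflow looks like the following. The script name run.ll, the executable name, and the exact LoadLeveler keyword values are assumptions; treat the scripts shipped in the example directory as the authoritative versions.

    cp -r /gpfs/projects/workshop/running_jobs ~    # copy the example directory
    cd ~/running_jobs
    make                                            # build the example code with the provided Makefile

    # run.ll - minimal LoadLeveler job script sketch
    #@ job_type         = parallel
    #@ class            = normal          # queue/class to run in
    #@ node             = 2               # number of nodes
    #@ tasks_per_node   = 8               # MPI tasks per node
    #@ wall_clock_limit = 00:30:00
    #@ output           = job.out
    #@ error            = job.err
    #@ queue
    poe ./my_program                      # poe launches the parallel executable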

  6. Job Manager language • Ask it to show the queue: llq • Ask it to submit your job to the queue: llsubmit • Ask it to cancel your job in the queue: llcancel • Special, more useful commands from SDSC's in-house tool, Catalina: • 'showq' to look at the status of the queue • 'show_bf' to look at backfill window opportunities.
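
A short session sketch tying those commands together (the job script name and the form of the job id are illustrative):

    llsubmit run.ll        # submit the job; LoadLeveler prints the job id
    llq                    # show the whole queue
    llq -u $USER           # show only your own jobs
    llcancel <job_id>      # cancel a job using the id printed at submission
    showq                  # Catalina: overall queue status
    show_bf                # Catalina: current backfill windows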

  7. Access to HPSS - 1 • What is HPSS: the centralized, long-term data storage system at SDSC is the High Performance Storage System (HPSS) • Currently stores more than 3 PB of data (as of June 2006) • Total system capacity of 7.2 PB • Data added at an average rate of 100 TB per month (between Aug '05 and Feb '06).

  8. Access to HPSS - 2 • First, set up your authentication: • run the 'get_hpss_keytab' script • Then talk to HPSS with its client tools: • hsi • htar
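
Typical usage, as a sketch (the file and archive names are made up; hsi and htar are the HPSS clients named on the slide):

    get_hpss_keytab                      # one-time authentication setup
    hsi put results.dat                  # copy a file from disk into HPSS
    hsi get results.dat                  # copy it back from HPSS
    hsi ls                               # list your HPSS directory
    htar -cvf results.tar results/       # bundle a directory into a tar archive stored in HPSS
    htar -xvf results.tar                # extract that archive back to disk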

  9. SDSC IA64 cluster

  10. Batch/Interactive computing on IA64 • Batch job environment • Job Manager – PBS (open source tool) • Job Scheduler – Catalina (SDSC internal tool) • Job Monitoring – various commands & 'Clumon' • Batch & Interactive use on different nodes. • IA64 Login Nodes • tg-login1.sdsc.edu (alias to tg-login.sdsc.edu) • tg-login2.sdsc.edu • tg-c127.sdsc.edu, tg-c128.sdsc.edu, • tg-c129.sdsc.edu & tg-c130.sdsc.edu.

  11. Queues & Nodes • Around 260 nodes in total • With 2 processors each • All in a single batch queue – 'dque' • That's sufficient; now let's do it! • Example files in • /gpfs/projects/workshop/running_jobs • PBS commands – qstat, qsub, qdel (script sketch below).
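
A minimal PBS batch script sketch for the IA64 cluster, assembled from the pieces on this and the next slide (the script name is a placeholder; the resource request mirrors the interactive example on slide 12):

    #!/bin/sh
    # run.pbs - minimal PBS job script sketch
    #PBS -q dque                         # the single batch queue
    #PBS -l nodes=4:ppn=2                # 4 nodes, 2 processors per node
    #PBS -l walltime=00:30:00
    #PBS -o job.out
    #PBS -e job.err
    #PBS -V                              # export the current environment to the job
    cd $PBS_O_WORKDIR
    mpirun -np 8 -machinefile $PBS_NODEFILE ./parallel-test

    qsub run.pbs          # submit
    qstat -a              # check queue status
    qdel <job_id>         # delete a job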

  12. Running Interactive • Interactive use is via PBS: qsub -I -V -l walltime=00:30:00 -l nodes=4:ppn=2 • This request is for 4 nodes for interactive use (using 2 cpus/node) for a maximum wall-clock time of 30 minutes. Once the scheduler can honor the request, PBS responds with "ready" and gives the node names. • Once the nodes are assigned, the user can run any interactive command. For example, to run an MPI program, parallel-test, on the 4 nodes (8 cpus): mpirun -np 8 -machinefile $PBS_NODEFILE parallel-test
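
The interactive session then looks roughly like this (the cat step is simply a convenient way to see which nodes were assigned):

    qsub -I -V -l walltime=00:30:00 -l nodes=4:ppn=2
    # ... wait for PBS to print "ready" on one of the assigned nodes ...
    cat $PBS_NODEFILE                                       # list the assigned nodes/cpus
    mpirun -np 8 -machinefile $PBS_NODEFILE parallel-test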

  13. References • See all web links at • http://www.sdsc.edu/user_services
